Introduction

The conclusions drawn from data analysis are only as robust as the data they rest on. After obtaining raw text, the natural next step is to validate and clean it carefully, since even a slight bias can compromise the integrity of the results. We should therefore inspect the data thoroughly and run sanity checks on it before we try to interpret it. This section is a starting point for cleaning data in Haskell.

Real-world data often contains impurities that need to be addressed before it can be processed. For example, extraneous whitespace or punctuation can clutter data and make it difficult to parse. Duplicate records and conflicting values are other unintended consequences of reading real-world data. Sometimes it is simply reassuring to confirm, through sanity checks, that the data makes sense. Examples of such checks include matching values against regular expressions and detecting outliers by establishing a measure of distance. In this chapter, we will cover each of these topics.
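As a rough preview of the kind of cleaning described above, here is a minimal sketch that uses only the base library. The names `trim`, `cleanRecords`, and `outliers` are hypothetical helpers introduced for illustration, not functions from this book or from any particular package, and the fixed distance-from-the-mean threshold is an assumed, deliberately simple notion of "distance". Regular-expression matching is left out because it requires an additional library.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd, nub)

-- Strip leading and trailing whitespace from a string.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Trim every record, drop records that are empty after trimming,
-- and remove exact duplicates (the first occurrence is kept).
cleanRecords :: [String] -> [String]
cleanRecords = nub . filter (not . null) . map trim

-- Arithmetic mean; only called on non-empty lists below.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Return the values whose absolute distance from the mean exceeds
-- the given threshold (an assumed, illustrative outlier rule).
outliers :: Double -> [Double] -> [Double]
outliers threshold xs
  | null xs   = []
  | otherwise = filter (\x -> abs (x - m) > threshold) xs
  where
    m = mean xs

main :: IO ()
main = do
  print (cleanRecords ["  alice ", "bob", "alice", "   ", "bob  "])
  print (outliers 20 [1, 2, 3, 2, 50])
```

Note that `nub` is quadratic in the length of the list, which is fine for a sketch but may be too slow for large inputs.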
