Introduction

In the real world, data rarely matches textbook definitions and examples. We have to deal with issues such as faulty hardware, uncooperative customers, and disgruntled colleagues. It is difficult to predict what kind of issues you will run into, but it is safe to assume that they will be plentiful and challenging. In this chapter, I will sketch some common approaches to dealing with noisy data, which are based more on rules of thumb than on strict science. Luckily, the trial-and-error part of data analysis is limited.

Most of this chapter is about outlier management. Outliers are values that we consider to be abnormal. Of course, this is not the only issue that you will encounter, but it is a sneaky one. Another common issue is missing or invalid values, so I will briefly mention masked arrays and pandas features such as the dropna() function, which I have used throughout this book.
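As a rough illustration of the two approaches to missing values mentioned above, the following minimal sketch masks invalid entries with a NumPy masked array and drops them with pandas dropna(). The sample values and variable names are made up for this example and are not taken from the recipes in this chapter.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements with one missing value (NaN)
raw = np.array([1.0, 2.0, np.nan, 4.0])

# A masked array hides invalid entries (NaN or inf) from computations
# while keeping the original shape of the data
masked = np.ma.masked_invalid(raw)
masked_mean = masked.mean()  # mean of the three valid values only

# pandas dropna() removes the missing entries and keeps the original index labels
series = pd.Series(raw)
cleaned = series.dropna()

print("Valid values:", masked.count())
print("Masked mean:", masked_mean)
print("Remaining index labels:", list(cleaned.index))
```

Masking keeps the position of each observation, which helps when arrays have to stay aligned. Dropping is simpler when the missing rows are not needed afterwards.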

I have also written two recipes about using mpmath for arbitrary precision calculations. Because of the performance penalty you have to pay, I don't recommend using mpmath unless you really have to. Usually we can work around numerical issues, so arbitrary precision libraries are rarely needed.
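To make the trade-off concrete, here is a small sketch that sums 0.1 ten times in three ways: with ordinary floats, with mpmath at 50 decimal digits, and with the standard library's math.fsum() as one possible workaround. The precision setting and the choice of math.fsum() are assumptions made for this illustration, not necessarily what the recipes in this chapter use.

```python
import math
from mpmath import mp, mpf

values = [0.1] * 10

# Ordinary floating point accumulates rounding error
naive_sum = sum(values)

# mpmath with 50 significant decimal digits (illustrative setting);
# note that the values are created from strings to avoid float rounding
mp.dps = 50
mp_sum = sum(mpf("0.1") for _ in range(10))

# A workaround without arbitrary precision: math.fsum tracks
# partial sums to avoid loss of precision
compensated_sum = math.fsum(values)

print("Naive sum equals 1.0:", naive_sum == 1.0)
print("mpmath sum:", mp_sum)
print("fsum equals 1.0:", compensated_sum == 1.0)
```

In this case the standard library workaround is enough, which is typical of the situations where an arbitrary precision library can be avoided.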
