
Missing values

Another constraint with scikit-learn is that most of its estimators cannot handle data with missing values. Therefore, before doing anything else, we must check whether any column in our dataset contains missing values. We can do this by using the following code: 

# Check every column for missing values
df.isnull().any()

This produces this output: 

Here we note that every column contains at least one missing value. 
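Because any() only reports whether a column has missing values, it can help to also count them. The following is a minimal sketch on a small made-up DataFrame; toy_df and its column names are hypothetical stand-ins for the chapter's df, not the actual dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the chapter's DataFrame `df`
toy_df = pd.DataFrame({
    "amount": [100.0, np.nan, 250.0, 80.0],
    "balance": [np.nan, 500.0, np.nan, 40.0],
    "label": [0, 1, 0, 0],
})

# True/False per column: does the column contain any missing value?
has_missing = toy_df.isnull().any()

# Number of missing values per column
missing_counts = toy_df.isnull().sum()

# Fraction of rows that are missing per column
missing_share = toy_df.isnull().mean()

print(has_missing)
print(missing_counts)
print(missing_share)
```

Knowing how many values are missing in each column, rather than only whether some are, makes it easier to judge how much a given imputation choice will affect the data.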

Missing values can be handled in a variety of ways, such as the following:

  • Median imputation
  • Mean imputation
  • Filling them with the majority value

The number of available techniques is quite large, and the right choice depends on the nature of your dataset. Handling features with missing values in this way is one part of a broader process called feature engineering.
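The three strategies listed above can be sketched with plain pandas. This is only an illustration under stated assumptions: toy_df, its columns, and its values are made up for the example and are not the chapter's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one numerical and one categorical column
toy_df = pd.DataFrame({
    "amount": [10.0, np.nan, 30.0, 100.0],
    "type": ["PAYMENT", "TRANSFER", None, "PAYMENT"],
})

# Median imputation: replace missing numbers with the column median
median_filled = toy_df["amount"].fillna(toy_df["amount"].median())

# Mean imputation: replace missing numbers with the column mean
mean_filled = toy_df["amount"].fillna(toy_df["amount"].mean())

# Majority-value imputation: replace missing categories with the
# most frequent value (the mode) of the column
mode_filled = toy_df["type"].fillna(toy_df["type"].mode()[0])

print(median_filled.tolist())
print(mean_filled.tolist())
print(mode_filled.tolist())
```

The median is less sensitive to extreme values than the mean, which is one reason the choice of technique depends on the data; the majority value is the usual counterpart for categorical columns.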

Feature engineering can be done for both categorical and numerical columns and would require an entire book to explain the various methodologies that comprise the topic. 

Because this book focuses on applying the machine learning algorithms that scikit-learn offers, feature engineering will not be covered in depth. 

To keep the focus on the algorithms themselves, we will simply replace every missing value with zero.

We can do this by using the following code: 

# Impute the missing values with 0
df = df.fillna(0)
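As a quick sanity check, the effect of zero imputation can be verified on a small example. The sketch below uses a hypothetical toy_df rather than the chapter's df:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the chapter's DataFrame `df`
toy_df = pd.DataFrame({
    "amount": [100.0, np.nan, 200.0],
    "balance": [np.nan, 40.0, 60.0],
})

# Column mean before imputation (pandas skips missing values by default)
mean_before = toy_df["amount"].mean()

# Replace every missing value with 0, as in the chapter
filled = toy_df.fillna(0)

# Column mean after imputation: the zeros now count as real observations
mean_after = filled["amount"].mean()

# Confirm that no missing values remain anywhere in the DataFrame
remaining_missing = int(filled.isnull().sum().sum())

print(mean_before, mean_after, remaining_missing)
```

Zero imputation is simple and leaves no missing values behind, but it does shift column statistics such as the mean, which is worth keeping in mind when interpreting results later.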

We now have a dataset that is ready for machine learning with scikit-learn, and we will use it in all of the remaining chapters. To make that easier, we will export the dataset as a .csv file and store it in the same directory as the Jupyter Notebook you are working in.

We can do this by using the following code: 

df.to_csv('fraud_prediction.csv')

This will create a .csv file of this dataset in the directory that you are working in, which you can load into the notebook again using pandas. 
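Loading the file back can be sketched as follows. The example uses a hypothetical toy_df and a temporary file named toy_export.csv instead of the chapter's fraud_prediction.csv, so it can be run without touching your working directory. Note that to_csv writes the row index as an extra first column by default, so reading the file back without further arguments yields an additional column named "Unnamed: 0"; passing index_col=0 to read_csv is one way to restore it as the index.

```python
import os
import tempfile

import pandas as pd

# Hypothetical toy data standing in for the chapter's DataFrame `df`
toy_df = pd.DataFrame({"amount": [100.0, 0.0], "label": [0, 1]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name used only for this illustration
    path = os.path.join(tmp_dir, "toy_export.csv")

    # Export with default settings, as in the chapter
    toy_df.to_csv(path)

    # Plain reload: the saved index comes back as an extra column
    reloaded = pd.read_csv(path)

    # Reload treating the first column as the index
    reloaded_indexed = pd.read_csv(path, index_col=0)

print(list(reloaded.columns))
print(list(reloaded_indexed.columns))
```

An alternative, if the index carries no information, is to call to_csv with index=False so that the extra column is never written.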
