
Reducing the size of the data

The dataset that we are working with contains over 6 million rows of data. Most machine learning algorithms will take a large amount of time to work with a dataset of this size. In order to shorten our execution time, we will reduce the size of the dataset to about 20,000 rows. We can do this by using the following code:

#Storing the fraudulent data into a dataframe
df_fraud = df[df['isFraud'] == 1]

#Storing the non-fraudulent data into a dataframe
df_nofraud = df[df['isFraud'] == 0]

#Storing 12,000 rows of non-fraudulent data
df_nofraud = df_nofraud.head(12000)

#Joining both datasets together
df = pd.concat([df_fraud, df_nofraud], axis = 0)

In the preceding code, the fraudulent rows are stored in one dataframe, which contains a little over 8,000 rows. The first 12,000 non-fraudulent rows are stored in another dataframe; note that the head method selects the first 12,000 rows rather than a random sample. The two dataframes are then joined together using the concat method from pandas.

This results in a dataframe with a little over 20,000 rows, over which we can now execute our algorithms relatively quickly. 
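The downsampling pattern above can be sketched end to end on synthetic data. The dataframe below is a hypothetical stand-in for the real dataset (its column values are invented for illustration); the filtering, head, and concat steps mirror the code shown earlier:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real dataset: ~1% fraudulent rows
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'isFraud': rng.choice([0, 1], size=100_000, p=[0.99, 0.01]),
    'amount': rng.random(100_000),
})

# Keep every fraudulent row
df_fraud = df[df['isFraud'] == 1]

# Keep only the first 12,000 non-fraudulent rows
df_nofraud = df[df['isFraud'] == 0].head(12000)

# Join both subsets back into a single dataframe
df_small = pd.concat([df_fraud, df_nofraud], axis=0)

# The result holds all fraud rows plus exactly 12,000 non-fraud rows
print(df_small['isFraud'].value_counts())
```

Because head takes the first rows in order, the non-fraudulent subset is not randomized; if row order carries information (for example, time), `df_nofraud.sample(12000)` would give an unbiased subset instead.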
