- Machine Learning with scikit-learn Quick Start Guide
- Kevin Jolly
Reducing the size of the data
The dataset that we are working with contains over 6 million rows of data. Most machine learning algorithms will take a large amount of time to work with a dataset of this size. In order to make our execution time quicker, we will reduce the size of the dataset to 20,000 rows. We can do this by using the following code:
import pandas as pd

# Storing the fraudulent data in a dataframe
df_fraud = df[df['isFraud'] == 1]

# Storing the non-fraudulent data in a dataframe
df_nofraud = df[df['isFraud'] == 0]

# Keeping only the first 12,000 rows of non-fraudulent data
df_nofraud = df_nofraud.head(12000)

# Joining both dataframes together
df = pd.concat([df_fraud, df_nofraud], axis=0)
In the preceding code, the fraudulent rows are stored in one dataframe, which contains a little over 8,000 rows. The first 12,000 non-fraudulent rows are stored in another dataframe, and the two dataframes are then joined together using the concat function from pandas.
This results in a dataframe with a little over 20,000 rows, over which we can now execute our algorithms relatively quickly.
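The effect of this subsampling can be verified by checking the class balance of the reduced dataframe. The sketch below uses a small synthetic stand-in for the dataset (the real PaySim data is not loaded here); the row counts mirror the sizes described above and are assumptions for illustration only:

```python
import pandas as pd

# Synthetic stand-in for the reduced dataframe: roughly 8,000 fraudulent
# rows plus the 12,000 retained non-fraudulent rows (illustrative counts).
df = pd.concat([
    pd.DataFrame({"isFraud": [1] * 8000}),
    pd.DataFrame({"isFraud": [0] * 12000}),
], axis=0)

# Inspect the class balance of the reduced dataset.
counts = df["isFraud"].value_counts()
print(counts.to_dict())  # {0: 12000, 1: 8000}
```

A quick `value_counts()` check like this is a useful habit after any subsampling step, since it confirms that no class was dropped accidentally.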