- Machine Learning with scikit-learn Quick Start Guide
- Kevin Jolly
- 2021-06-24 18:15:55
Preparing a dataset for machine learning with scikit-learn
The first step in implementing any machine learning algorithm with scikit-learn is data preparation. Scikit-learn imposes a set of implementation constraints that will be discussed later in this section. The dataset that we will be using is based on mobile payments and is available on Kaggle, the world's most popular competitive machine learning website.
You can download the dataset from: https://www.kaggle.com/ntnu-testimon/paysim1.
Once downloaded, open a new Jupyter Notebook by running the following command in Terminal (macOS/Linux) or Anaconda Prompt/PowerShell (Windows):
jupyter notebook
The goal with this dataset is to predict whether a mobile transaction is fraudulent. To do this, we first need a brief understanding of the contents of our data. To explore the dataset, we will use the pandas package in Python. You can install pandas by running the following command in Terminal (macOS/Linux) or PowerShell (Windows):
pip3 install pandas
On Windows machines, pandas can also be installed from an Anaconda Prompt with the following command:
conda install pandas
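To check that the installation worked, one option is to import pandas and print its version from a notebook cell. This is only a minimal sketch; the variable name `pandas_version` is chosen for illustration, and the version printed depends on your installation.

```python
import pandas as pd

# The version string varies by installation, so no particular value is expected.
pandas_version = pd.__version__
print(pandas_version)
```

If the import fails with a ModuleNotFoundError, pandas was most likely installed into a different Python environment from the one the notebook is running in.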
We can now read the dataset into our Jupyter Notebook by using the following code:
# Package imports
import pandas as pd
# Reading in the dataset (the CSV file must be in the notebook's working directory)
df = pd.read_csv('PS_20174392719_1491204439457_log.csv')
# Viewing the first 5 rows of the dataset
df.head()
This produces an output as illustrated in the following screenshot:
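Beyond the first five rows, a few summary figures help build the brief understanding of the data mentioned earlier. The following is a minimal sketch, not the book's own code. It assumes the fraud label is stored in a column named `isFraud`, which this excerpt does not confirm, and it runs on a small made-up stand-in DataFrame so that it works without the CSV file. The helper name `summarize` is hypothetical.

```python
import pandas as pd

# Tiny stand-in for the mobile payments data so the sketch runs without the CSV.
# The column names are assumptions and the values are made up for illustration.
sample = pd.DataFrame({
    "step": [1, 1, 1, 2],
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT", "PAYMENT"],
    "amount": [9839.64, 181.00, 181.00, 1864.28],
    "isFraud": [0, 1, 1, 0],
})


def summarize(frame, target="isFraud"):
    """Return basic facts about a transactions DataFrame (hypothetical helper)."""
    return {
        "n_rows": frame.shape[0],                              # number of transactions
        "n_columns": frame.shape[1],                           # number of features, including the label
        "missing_values": int(frame.isnull().sum().sum()),     # total count of empty cells
        "fraud_rate": float(frame[target].mean()),             # share of rows labeled as fraud
    }


summary = summarize(sample)
print(summary)
```

With the real data loaded, the same helper could be called as `summarize(df)`, provided the label column name matches.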