- Ensemble Machine Learning Cookbook
- Dipayan Sarkar Vijayalakshmi Natarajan
- 2021-07-02 13:21:58
Getting ready
In Chapter 1, Get Closer to your Data, we manipulated and prepared the data from the HousePrices.csv file and dealt with the missing values. In this example, we're going to use the final dataset to demonstrate these sampling and resampling techniques.
You can get the prepared dataset from GitHub.
We'll import the required libraries. We'll read the data and take a look at the dimensions of our dataset:
# import os for operating system dependent functionalities
import os
# import other required libraries
import pandas as pd
from sklearn.model_selection import train_test_split
# Set your working directory according to your requirement
os.chdir(".../Chapter 3/Resampling Methods")
os.getcwd()
Let's read our data. We'll prefix the DataFrame name with df_ to make it easier to understand:
df_housingdata = pd.read_csv("Final_HousePrices.csv")
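To take a look at the dimensions after reading the file, the DataFrame's shape attribute can be inspected. The snippet below is a minimal sketch that substitutes a tiny in-memory CSV for Final_HousePrices.csv so that it runs on its own; the name df_sample and the three columns are illustrative assumptions, not the actual contents of the prepared dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for Final_HousePrices.csv; the columns and values
# are made up for illustration and do not reflect the real file.
csv_text = io.StringIO(
    "LotArea,OverallQual,SalePrice\n"
    "8450,7,208500\n"
    "9600,6,181500\n"
    "11250,7,223500\n"
)
df_sample = pd.read_csv(csv_text)

# shape is a (number_of_rows, number_of_columns) tuple
n_rows, n_cols = df_sample.shape
print("rows:", n_rows, "columns:", n_cols)
```

With the real data, the same check would be df_housingdata.shape.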
In the next section, we'll look at how to use train_test_split() from sklearn.model_selection to split our data into random training and testing subsets.
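As a rough preview of what such a split looks like, here is a minimal sketch on a toy DataFrame. The names df_toy, feature, and target, the 80/20 ratio, and the random_state value are all assumptions chosen for illustration; they are not taken from the recipe itself.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data with 10 rows; the column names are hypothetical.
df_toy = pd.DataFrame({"feature": range(10), "target": range(10, 20)})

X = df_toy[["feature"]]  # predictor columns
y = df_toy["target"]     # response column

# Hold out 20% of the rows for testing; random_state fixes the shuffle
# so that the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(len(X_train), len(X_test))
```

Because the rows are shuffled before splitting, which rows land in each subset depends on random_state, while the subset sizes depend only on test_size.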