Getting ready

In Chapter 1, Get Closer to your Data, we manipulated and prepared the data from the HousePrices.csv file and dealt with the missing values. In this example, we're going to use the final dataset to demonstrate these sampling and resampling techniques.

You can get the prepared dataset from GitHub.

We'll import the required libraries. We'll read the data and take a look at the dimensions of our dataset:

# import os for operating system dependent functionalities
import os

# import other required libraries
import pandas as pd
from sklearn.model_selection import train_test_split

# Set your working directory according to your requirement
os.chdir(".../Chapter 3/Resampling Methods")
os.getcwd()

Let's read our data. We'll prefix the DataFrame name with df_ to make it easier to understand:

df_housingdata = pd.read_csv("Final_HousePrices.csv")
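
To look at the dimensions of a DataFrame, its shape attribute returns a (rows, columns) tuple. The following is a minimal sketch that uses a small stand-in DataFrame, so it runs without the CSV file; the name df_example, the column names, and the values are made up for illustration. With the real data, you would call df_housingdata.shape instead:

```python
import pandas as pd

# Hypothetical stand-in for the prepared dataset; columns and values are made up
df_example = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550],
    "SalePrice": [208500, 181500, 223500, 140000],
})

# shape is a (number of rows, number of columns) tuple
n_rows, n_cols = df_example.shape
print(f"rows: {n_rows}, columns: {n_cols}")
```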

In the next section, we'll look at how to use train_test_split() from sklearn.model_selection to split our data into random training and testing subsets.
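
As a quick preview, a call to train_test_split() might look like the following sketch. It assumes a toy DataFrame with a hypothetical SalePrice target column and made-up values; the 80/20 split and the random_state value are arbitrary choices for illustration, not settings taken from this recipe:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the prepared dataset
df_example = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 14260, 14115, 10084, 10382, 6120, 7420],
    "OverallQual": [7, 6, 7, 7, 8, 5, 8, 7, 7, 5],
    "SalePrice": [208500, 181500, 223500, 140000, 250000,
                  143000, 307000, 200000, 129900, 118000],
})

# Separate the features from the (assumed) target column
X = df_example.drop(columns=["SalePrice"])
y = df_example["SalePrice"]

# Hold out 20% of the rows for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

print(X_train.shape, X_test.shape)
```

Because the rows are shuffled before splitting, fixing random_state is what keeps the two subsets the same from run to run.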