
Generating Python datasets

To generate a Python dataset, we use the pandas to_pickle() method, which serializes a DataFrame to a binary file. The dataset we plan to create is called adult.pkl, as shown in the following screenshot:

The related Python code is given here:

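import pandas as pd
# Location of the UCI adult dataset
path = "http://archive.ics.uci.edu/ml/machine-learning-databases/"
dataSet = "adult/adult.data"
inFile = path + dataSet
# The raw file has no header row, so pandas numbers the columns 0-14
x = pd.read_csv(inFile, header=None)
adult = pd.DataFrame(x, index=None)
# Replace the numeric column labels with descriptive names
adult = adult.rename(columns={0: 'age', 1: 'workclass',
    2: 'fnlwgt', 3: 'education', 4: 'education-num',
    5: 'marital-status', 6: 'occupation', 7: 'relationship',
    8: 'race', 9: 'sex', 10: 'capital-gain', 11: 'capital-loss',
    12: 'hours-per-week', 13: 'native-country', 14: 'class'})
# Save the DataFrame as a pickle file (adjust the path for your system)
adult.to_pickle("c:/temp/adult.pkl")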
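
As a variant of the code above, the column names can also be passed directly to read_csv() through its names parameter, which removes the separate rename() step; skipinitialspace=True additionally strips the blank that follows each comma in adult.data. The sketch below assumes two sample rows written in the same comma-plus-space layout and reads them from an in-memory io.StringIO object instead of the remote file, so it runs offline:

```python
import io

import pandas as pd

# Column names in the order used by adult.data (same as the rename() above)
cols = ['age', 'workclass', 'fnlwgt', 'education', 'education-num',
        'marital-status', 'occupation', 'relationship', 'race', 'sex',
        'capital-gain', 'capital-loss', 'hours-per-week', 'native-country',
        'class']

# Two sample rows in the adult.data layout (a comma followed by a space);
# with the real file you would pass inFile instead of this StringIO object
sample = io.StringIO(
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "50, Self-emp-not-inc, 83311, Bachelors, 13, Married-civ-spouse, "
    "Exec-managerial, Husband, White, Male, 0, 0, 13, United-States, <=50K\n"
)

# names= labels the columns while reading, so no rename() is needed;
# skipinitialspace=True drops the blank after each comma
adult = pd.read_csv(sample, header=None, names=cols, skipinitialspace=True)
print(adult.head())
```

Without skipinitialspace=True, string values such as ' State-gov' would keep their leading blank, which can make later comparisons and filters fail silently.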

To show the first several observations, we call x.head(), which returns the first five rows by default, as shown in the following screenshot:

Note that a backup copy of the dataset is available on the author's website at http://canisius.edu/~yany/data/adult.data.txt.
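
Once saved, the pickle file can be loaded back with pd.read_pickle(). The sketch below is a minimal round trip under stated assumptions: it uses a tiny hypothetical stand-in for the adult DataFrame (three columns only) and writes to a temporary directory instead of c:/temp/adult.pkl, so it runs on any operating system:

```python
import os
import tempfile

import pandas as pd

# A tiny stand-in for the adult DataFrame (hypothetical values, three columns only)
df = pd.DataFrame({'age': [39, 50, 38],
                   'education': ['Bachelors', 'Bachelors', 'HS-grad'],
                   'class': ['<=50K', '<=50K', '<=50K']})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'adult.pkl')
    df.to_pickle(path)               # serialize the DataFrame to disk
    restored = pd.read_pickle(path)  # load it back into memory

# The round trip preserves values, column names, and dtypes
print(restored.equals(df))  # True
print(restored.head(2))     # first two rows; head() with no argument shows five
```

Because a pickle stores the DataFrame object itself, column names and dtypes come back unchanged, so the rename step does not have to be repeated after reloading. Only load pickle files from sources you trust.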
