  • Machine Learning with Swift
  • Alexander Sosnovshchenko
  • 2021-06-24 18:54:55

Loading the dataset

Create and open a new IPython notebook. In the chapter's supplementary materials, you will find the file extraterrestrials.csv. Copy it to the folder where you created your notebook. In the first cell of the notebook, execute the magic command:

In []: 
%matplotlib inline 

This makes the plots we create later appear inline, right in the notebook.

The library we are using for loading and manipulating datasets is pandas. Let's import it and load the .csv file:

In []: 
import pandas as pd 
df = pd.read_csv('extraterrestrials.csv', sep='\t', encoding='utf-8', index_col=0) 
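
To make the arguments of `read_csv` more concrete: `sep` names the delimiter between columns (a tab character, written `'\t'` in Python, for tab-separated files), and `index_col=0` uses the first column as the row index instead of treating it as data. The following is only a minimal sketch on a tiny in-memory table. The rows, the `label` column name, and the `other` class are made up for illustration and are not taken from extraterrestrials.csv.

```python
import io
import pandas as pd

# Hypothetical stand-in for a tab-separated file whose first, unnamed
# column holds the row index. All values here are invented.
raw = (
    "\tlength\tcolor\tfluffy\tlabel\n"
    "0\t27.5\tpink gold\tTrue\trabbosaurus\n"
    "1\t12.1\tlight black\tFalse\tother\n"
    "2\t30.2\tpurple polka-dot\tTrue\trabbosaurus\n"
)

# sep="\t" splits on tabs; index_col=0 turns the first column into the index.
demo_df = pd.read_csv(io.StringIO(raw), sep="\t", index_col=0)

print(demo_df.shape)           # (rows, columns), index not counted
print(list(demo_df.columns))   # column names that remain as data
```

Without `index_col=0`, the first column would most likely show up as an extra data column instead of serving as the index.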

The df object is a data frame: a table-like data structure designed for efficient manipulation of different data types. To see what's inside, execute:

In []: 
df.head() 
Out[]: 

This prints the first five rows of the table. The first three columns (length, color, and fluffy) are features, and the last one is the class label.

How many samples do we have in total? Run this code to find out:

In []: 
len(df) 
Out[]: 
1000 

It looks like most of the samples at the beginning are rabbosauruses. Let's fetch five samples at random to see whether this holds true in other parts of the dataset:

In []: 
df.sample(5) 
Out[]: 

Well, this isn't very helpful: analyzing the table content a few rows at a time would be too tedious. We need more advanced tools for descriptive statistics and data visualization.
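
As a small step in that direction, one possible approach is to make the random draw repeatable and to count the rows per class rather than eyeballing them. This is a rough sketch on invented toy data. It assumes the class label lives in a column named `label`, which may not match the real file, and the `other` class is a placeholder.

```python
import pandas as pd

# Hypothetical toy data; the real dataset has 1000 rows and its label
# column may have a different name.
toy = pd.DataFrame({
    "length": [27.5, 12.1, 30.2, 25.0, 11.3, 28.8],
    "fluffy": [True, False, True, True, False, True],
    "label": ["rabbosaurus", "other", "rabbosaurus",
              "rabbosaurus", "other", "rabbosaurus"],
})

# A fixed random_state makes the random draw repeatable between runs.
picked = toy.sample(3, random_state=0)
print(picked)

# Count how many rows belong to each class across the whole table.
counts = toy["label"].value_counts()
print(counts)
```

On the real data frame, the same two calls would be `df.sample(5, random_state=0)` and `value_counts()` on whatever the label column is actually called.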
