- Learning Data Mining with Python (Second Edition)
- Robert Layton
- 2021-07-02 23:40:08
Using pandas to load the dataset
The pandas library is used for loading, managing, and manipulating data. It handles the underlying data structures behind the scenes and supports data analysis functions, such as computing the mean and grouping data by value.
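As a small illustration of the two operations just mentioned, the following sketch builds a tiny DataFrame by hand, then computes an overall mean and a mean per group. The column names and values are made up for this example and are not taken from the basketball dataset.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
scores = pd.DataFrame({
    "team": ["A", "A", "B", "B"],
    "points": [100, 90, 80, 120],
})

# Mean of a single column
overall_mean = scores["points"].mean()

# Group rows by the value in "team", then take the mean of "points" per group
mean_by_team = scores.groupby("team")["points"].mean()

print(overall_mean)
print(mean_by_team)
```

Here `groupby` returns one value per distinct team, indexed by the team label, so `mean_by_team["A"]` gives the average for team A.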
When doing multiple data mining experiments, you will find that you write many of the same functions again and again, such as reading files and extracting features. Each time you reimplement them, you run the risk of introducing bugs. Using a high-quality library such as pandas significantly reduces the amount of work needed to implement these functions, and gives you more confidence because well-tested code underlies your own programs.
Throughout this book, we will be using pandas a lot, introducing use cases as we go and new functions as needed.
We can load the dataset using the read_csv function:
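import pandas as pd
# Path to the CSV file; adjust it if the file is stored elsewhere
data_filename = "basketball.csv"
# Read the file into a DataFrame using the default parameters
dataset = pd.read_csv(data_filename)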
The result is a pandas DataFrame, which provides some useful functions that we will use later on. Looking at the resulting dataset, we can see some issues. Run the following code to see the first five rows of the dataset:
dataset.head(5)
Here's the output:

Reading the data with the default parameters produced a fairly usable dataset, but it has some issues, which we will address in the next section.
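One way to look for such issues before fixing them is to inspect the column names, shape, and column types of the loaded DataFrame. The sketch below is only an illustration: it reads a short, invented CSV string through io.StringIO instead of the real basketball.csv file, and its column names and values are assumptions rather than the actual contents of the dataset.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a real file; columns and values are invented
csv_text = (
    "Date,Visitor,PTS\n"
    "2013-10-29,Team A,87\n"
    "2013-10-29,Team B,95\n"
)

# read_csv accepts a file-like object as well as a filename
sample = pd.read_csv(io.StringIO(csv_text))

column_names = list(sample.columns)  # header names as read from the first line
n_rows, n_cols = sample.shape        # number of rows and columns

print(sample.head(5))
print(column_names)
print(sample.dtypes)  # dates stay plain strings unless they are parsed explicitly
```

With the default parameters, numeric columns such as PTS are converted to numbers automatically, while the Date column is left as text. This is the kind of detail that is worth checking right after loading.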