
Collecting the data

We will use the NBA match history for the 2015-2016 season. The website http://basketball-reference.com hosts a large collection of resources and statistics from the NBA and other leagues. To download the dataset, perform the following steps:

  1. Navigate to http://www.basketball-reference.com/leagues/NBA_2016_games.html in your web browser.
  2. Click Share & more.
  3. Click Get table as CSV (for Excel).
  4. Copy the data, including the heading, into a text file named basketball.csv.
  5. Repeat this process for each of the other months, appending the data to the same file but leaving out the heading.

This gives you a CSV file containing the result of every game in this NBA season. The file should contain 1316 games, for a total of 1317 lines including the header line.
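If copying by hand proves error-prone, the merge described in the steps above can be sketched in a few lines of Python. This is only an illustration: the function name combine_monthly_csv, the column names, and the sample rows are made up for the example and are not the real export from the website.

```python
def combine_monthly_csv(chunks):
    """Join per-month CSV texts, keeping the header line only once."""
    combined = []
    for chunk in chunks:
        # Drop blank lines that often sneak in when copying and pasting.
        lines = [line for line in chunk.splitlines() if line.strip()]
        if not lines:
            continue
        # Skip a repeated header if this month's text starts with one.
        if combined and lines[0] == combined[0]:
            lines = lines[1:]
        combined.extend(lines)
    return combined


# Hypothetical sample data (placeholder teams and scores, not real results).
header = "Date,Visitor,VisitorPts,Home,HomePts"
october = header + "\n2015-10-27,Team A,95,Team B,97\n"
november = header + "\n2015-11-01,Team C,101,Team D,99\n2015-11-02,Team A,88,Team D,90\n"

lines = combine_monthly_csv([october, november])
games = len(lines) - 1  # every line except the header is one game
print(f"{games} games, {len(lines)} lines")
```

Run against the real monthly exports, the same check should report 1316 games and 1317 lines if nothing was missed or duplicated.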

CSV files are text files where each line contains a new row and each value is separated by a comma (hence the name). CSV files can be created manually by typing into a text editor and saving with a .csv extension. They can be opened in any program that can read text files but can also be opened in Excel as a spreadsheet. Excel (and other spreadsheet programs) can usually convert a spreadsheet to CSV as well.

We will load the file with the pandas library, which is widely used for manipulating data. Python also has a built-in module called csv that supports reading and writing CSV files. However, pandas provides more powerful functions, which we will use later in the chapter to create new features.
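For comparison, here is a minimal sketch of reading CSV data with the built-in csv module. The column names and rows are hypothetical placeholders, not the real dataset, and read_games is a helper name invented for this example.

```python
import csv
import io

# Hypothetical sample in CSV form: a header line, then one game per line.
sample = (
    "Date,Visitor,VisitorPts,Home,HomePts\n"
    "2015-10-27,Team A,95,Team B,97\n"
    "2015-10-27,Team C,101,Team D,99\n"
)


def read_games(text):
    """Parse CSV text into a list of dictionaries keyed by the header names."""
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)


rows = read_games(sample)

# The csv module returns every value as a string, so numeric
# comparisons need an explicit conversion.
home_wins = [row for row in rows if int(row["HomePts"]) > int(row["VisitorPts"])]
print(len(rows), "games,", len(home_wins), "home win(s)")
```

Every value comes back as a string, so types have to be converted by hand. That bookkeeping is one practical reason to prefer a higher-level library for larger analyses.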

For this chapter, you will need to install pandas. The easiest way is to use Anaconda's conda installer, as you did in Chapter 1, Getting Started with Data Mining, to install scikit-learn:
$ conda install pandas
If you have difficulty installing pandas, head to the project's website at http://pandas.pydata.org/getpandas.html and read the installation instructions for your system.
