Obtaining the dataset

Since the inception of the Netflix Prize, GroupLens, a research group at the University of Minnesota, has released several datasets that are often used for testing algorithms in this area. Among them is a movie rating dataset published in several sizes: one version with 100,000 reviews, one with 1 million reviews, and one with 10 million reviews.

The datasets are available from http://grouplens.org/datasets/movielens/. In this chapter we use the MovieLens 100K dataset (with 100,000 reviews). Download this dataset and unzip it into your data folder, then start a new Jupyter Notebook and type the following code:

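import os
import pandas as pd

# Folder where the MovieLens 100K archive was unzipped (adjust to your setup)
data_folder = os.path.join(os.path.expanduser("~"), "Data", "ml-100k")
# Tab-separated ratings file shipped with the dataset
ratings_filename = os.path.join(data_folder, "u.data")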

Ensure that ratings_filename points to the u.data file in the unzipped folder.
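One way to check the path before going further is sketched below. The helper name looks_like_ratings_file is hypothetical and not part of the original code. It assumes that each line of u.data holds four tab-separated integer fields (user ID, item ID, rating, and timestamp), and it inspects only the first line, so treat it as a quick sanity check rather than a full validation.

```python
import os


def looks_like_ratings_file(path):
    """Return True if path is a file whose first line has four
    tab-separated integer fields (assumed layout of u.data)."""
    if not os.path.isfile(path):
        return False
    with open(path, encoding="utf-8") as f:
        first_line = f.readline().rstrip("\n")
    fields = first_line.split("\t")
    return len(fields) == 4 and all(field.strip().isdigit() for field in fields)


# Same path construction as in the chapter's code
data_folder = os.path.join(os.path.expanduser("~"), "Data", "ml-100k")
ratings_filename = os.path.join(data_folder, "u.data")

# Prints the path and whether it passes the check
print(ratings_filename, "->", looks_like_ratings_file(ratings_filename))
```

If the check reports False, the archive was probably unzipped somewhere else; adjust data_folder to match that location.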
