Loading with pandas

The MovieLens dataset is in good shape; however, we need to change a few of the default options in pandas.read_csv. To start with, the data is separated by tabs, not commas. Next, there is no header row. This means the first line in the file is actually data, so we need to set the column names manually.

When loading the file, we set the delimiter parameter to the tab character, pass header=None so that pandas does not treat the first row as a header, and supply the column names through the names parameter. Let's look at the following code:

all_ratings = pd.read_csv(ratings_filename, delimiter="\t", header=None,
                          names=["UserID", "MovieID", "Rating", "Datetime"])
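To see what these options do without the dataset file at hand, here is a minimal sketch that reads a small in-memory string instead. The two rows in sample are made up for illustration and only mimic the tab-separated, header-less layout described above; the name ratings is likewise a placeholder.

```python
import io

import pandas as pd

# Two made-up rows (not real MovieLens data) in a tab-separated layout
# with no header row.
sample = "1\t10\t4\t880000000\n2\t20\t5\t880000060\n"

ratings = pd.read_csv(
    io.StringIO(sample),  # file-like object standing in for the ratings file
    delimiter="\t",       # fields are separated by tabs, not commas
    header=None,          # the first line is data, not column names
    names=["UserID", "MovieID", "Rating", "Datetime"],
)

print(ratings)
```

Without header=None, pandas would consume the first data row as column names, and the frame would silently lose one rating.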

While we won't use it in this chapter, you can parse the timestamp column into proper dates using the following line. Review dates can be an important feature in recommendation prediction, as movies that are rated together often have more similar rankings than movies ranked separately. Accounting for this can improve models significantly.

all_ratings["Datetime"] = pd.to_datetime(all_ratings['Datetime'], unit='s')
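The unit='s' argument tells pandas to interpret the integers as seconds since the Unix epoch (January 1, 1970). A small sketch of that conversion, using two hand-picked values rather than anything from the dataset:

```python
import pandas as pd

# Hand-picked values for illustration: 0 seconds and one day (86400 seconds)
# after the Unix epoch.
timestamps = pd.Series([0, 86400])

# unit="s" makes pandas read the integers as seconds since 1970-01-01.
dates = pd.to_datetime(timestamps, unit="s")

print(dates)
```

Once the column holds datetime values, components such as the year or weekday are available through the .dt accessor, for example dates.dt.year.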

You can view the first few records by running the following in a new cell:

all_ratings.head()

The result shows the first five rows of the dataset under the UserID, MovieID, Rating, and Datetime columns.