
Running the ratings counter script

If you go to the Tools menu in Canopy, there's a shortcut for Command Prompt that you can use, or you can open Command Prompt any other way. Once it's open, make sure you get into the SparkCourse directory where you downloaded the script we're going to be using. So, type cd C:\SparkCourse (or navigate to the directory if it's in a different location), then type dir and you should see the contents of the directory. The ratings-counter.py script and the ml-100k folder should both be in there.

All I need to do to run it is type in spark-submit ratings-counter.py. Follow along with me here.

I'm going to hit Enter, and that will run this saved script that I wrote for Spark. Off it goes, and we soon get our results. So it made short work of those 100,000 ratings. 100,000 ratings isn't really big data, but we're just playing around on our desktop for now.

The results are kind of interesting. It turns out that the most common rating is four stars, so people are most generous with four-star ratings, with 34,000 of them in the dataset. People seem to reserve one star for the worst of the worst, with only about 6,000 one-star ratings out of our 100,000 ratings. It might be fun to go and see what actually got rated one star if you want to find some really bad movies to watch.
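To make the idea of counting ratings concrete, here is a minimal plain-Python sketch of the tallying logic. It is not the ratings-counter.py script itself and it does not use Spark. It assumes each line has four tab-separated fields (user ID, movie ID, rating, timestamp), which is the layout of the u.data file in ml-100k. The names SAMPLE_LINES, count_ratings, and format_histogram are hypothetical, and the sample rows are made up for illustration.

```python
from collections import Counter

# Made-up rows in the assumed layout: user ID, movie ID, rating, timestamp,
# separated by tabs. These are illustrative values, not real dataset rows.
SAMPLE_LINES = [
    "1\t10\t4\t880000000",
    "1\t11\t3\t880000001",
    "2\t10\t4\t880000002",
    "2\t12\t1\t880000003",
    "3\t11\t5\t880000004",
    "3\t12\t4\t880000005",
]


def count_ratings(lines):
    """Return a Counter mapping each rating value to how often it occurs."""
    # The rating is the third tab-separated field; blank lines are skipped.
    ratings = (line.split("\t")[2] for line in lines if line.strip())
    return Counter(ratings)


def format_histogram(counts):
    """Return 'rating count' strings, sorted by rating value."""
    return [f"{rating} {count}" for rating, count in sorted(counts.items())]


if __name__ == "__main__":
    for row in format_histogram(count_ratings(SAMPLE_LINES)):
        print(row)
```

On the made-up sample, the four-star rating comes out on top with three occurrences. Running the same kind of tally over the full dataset is what produces the histogram of star ratings discussed above.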
