- Frank Kane's Taming Big Data with Apache Spark and Python
- Frank Kane
- 252 words
- 2021-07-02 21:12:19
Running the ratings counter script
If you go to the Tools menu in Canopy, there's a shortcut for Command Prompt that you can use, or you can open Command Prompt any other way. Once it's open, make sure you get into the SparkCourse directory where you downloaded the script we're going to be using. So, type cd C:\SparkCourse (or navigate to the directory if it's in a different location), then type dir, and you should see the contents of the directory. The ratings-counter.py script and the ml-100k folder should both be in there.

All I need to do to run it is type spark-submit ratings-counter.py. Follow along with me here.

I'm going to hit Enter, and that runs the saved script that I wrote for Spark. Off it goes, and we soon get our results. It made short work of those 100,000 ratings. 100,000 ratings isn't really big data, but we're just playing around on our desktop for now.

The results are kind of interesting. It turns out that the most common rating is four stars, with about 34,000 of them in the dataset, so people are most generous with four-star ratings. People seem to reserve one star for the worst of the worst: there are only about 6,000 one-star ratings out of our 100,000 ratings. It might be fun to go and see what actually got rated one star if you want to find some really bad movies to watch.
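If you want to see what this kind of tally boils down to, here's a minimal plain-Python sketch of a ratings histogram. To be clear, this is not the ratings-counter.py script itself, which runs on Spark. It's just an illustration of the counting idea. The function name count_ratings is hypothetical, the sample rows are made up, and I'm assuming each line holds whitespace-separated fields with the rating in the third column.

```python
from collections import Counter


def count_ratings(lines):
    """Tally how many times each rating value appears.

    Assumes each line has whitespace-separated fields and that
    the third field is the rating (an assumption for this sketch).
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        counts[fields[2]] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample rows, only here to show the shape of the input.
    sample = [
        "1\t10\t4\t0",
        "2\t10\t4\t0",
        "3\t11\t1\t0",
        "4\t12\t5\t0",
    ]
    histogram = count_ratings(sample)
    # Print the ratings in sorted order, one "rating count" pair per line.
    for rating, count in sorted(histogram.items()):
        print(rating, count)
```

On a desktop-sized file, a loop like this is fine on its own. The point of doing the same job with Spark is that the counting can be spread across a cluster when the data gets too big for one machine.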