官术网_书友最值得收藏!

Summary

This chapter introduced some of the key tools used for data science. In particular, it demonstrated how to download and install the virtual machine for the Cloudera Distribution of Hadoop (CDH), Spark, R, RStudio, and Python. Although the user can download the source code of Hadoop and install it on, say, a Unix system, it is usually fraught with issues and requires a fair amount of debugging. Using a VM instead allows the user to begin using and learning Hadoop with minimal effort as it is a complete preconfigured environment.

Additionally, R and Python are the two most commonly used languages for machine learning and in general, analytics. They are available for all popular operating systems. Although they can be installed in the VM, the user is encouraged to try and install them on their local machines (laptop/workstation) if feasible as it will have relatively higher performance.

In the next chapter, we will pe deeper into the details of Hadoop and its core components and concepts.

主站蜘蛛池模板: 通化市| 洛川县| 邢台县| 哈尔滨市| 五原县| 渝北区| 桦甸市| 香格里拉县| 紫金县| 潢川县| 延吉市| 马公市| 林口县| 漳浦县| 观塘区| 汤阴县| 循化| 醴陵市| 榆中县| 大洼县| 五常市| 左云县| 台南县| 托里县| 安新县| 长乐市| 东平县| 牡丹江市| 山阴县| 彭泽县| 尚志市| 克拉玛依市| 大理市| 赫章县| 邹城市| 云霄县| 西和县| 信宜市| 淮北市| 漠河县| 凭祥市|