
Summary

Apache Hadoop provides a reliable and scalable framework (HDFS) for Big Data storage and a powerful cluster resource management framework (YARN) for running and managing multiple Big Data applications. Apache Spark provides in-memory performance for Big Data processing, along with libraries and APIs for interactive exploratory analytics, real-time analytics, machine learning, and graph analytics. Although MapReduce (MR) was the primary processing engine on top of Hadoop, it had multiple drawbacks, such as poor performance and inflexibility in designing applications. Apache Spark replaces MR as the processing engine while continuing to work with HDFS and YARN. MR-based tools such as Hive, Pig, Mahout, and Crunch have already started offering Apache Spark as an additional execution engine alongside MR.

Big Data projects are now being implemented in many businesses, from large Fortune 500 companies to small start-ups. Organizations gain an edge if they can move quickly from raw data to decisions, using easy-to-use tools to develop applications and explore data. Apache Spark brings this speed and sophistication to Hadoop clusters.

In the next chapter, we will dive deep into Spark.
