Spark components

As discussed earlier in this chapter, the main philosophy behind Spark is to provide a unified engine for creating different types of big data applications. Spark provides a variety of libraries to work with batch analytics, streaming, machine learning, and graph analysis.

These kinds of processing were done before Spark, but every new big data problem brought a new tool to the market. For batch analysis, we had MapReduce, Hive, and Pig; for streaming, we had Apache Storm; for machine learning, we had Mahout. Although these tools solve the problems they were designed for, each of them has its own learning curve. This is where Spark has an advantage. Spark provides a unified stack for solving all of these problems, with components designed for each kind of big data processing. It also provides libraries for reading and writing data in different formats, such as JSON, CSV, and Parquet.

The following diagram shows the Spark stack:

Spark stack

Having a unified stack brings several advantages:

  • First, code sharing and reusability. Components developed by the data engineering team can easily be integrated by the data science team, which avoids redundant code.
  • Second, new tools keep arriving on the market to solve different big data use cases, and most developers struggle to learn each one and gain the expertise to use it efficiently. With Spark, developers only have to learn the basic concepts, which then lets them work on different big data use cases.
  • Third, the unified stack lets developers explore new ideas without installing new tools.

The following diagram provides a high-level overview of different big-data applications powered by Spark:

Spark use cases