- Apache Spark Quick Start Guide
- Shrey Mehrotra, Akash Grover
- 2021-07-02 13:39:55
Spark components
As discussed earlier in this chapter, the main philosophy behind Spark is to provide a unified engine for creating different types of big data applications. Spark provides a variety of libraries to work with batch analytics, streaming, machine learning, and graph analysis.
These kinds of processing were all possible before Spark, but every new big data problem brought a new tool to the market. For batch analysis, we had MapReduce, Hive, and Pig; for streaming, we had Apache Storm; for machine learning, we had Mahout. Although these tools solve the problems they were designed for, each one has its own learning curve. This is where Spark has the advantage: it provides a unified stack for solving all of these problems, with components designed for each kind of big data processing. It also provides many libraries for reading and writing different data formats, such as JSON, CSV, and Parquet.
Here is an example of a Spark stack:

Having a unified stack brings several advantages:
- First is code sharing and reusability. Components developed by the data engineering team can easily be integrated by the data science team, which avoids redundant code.
- Secondly, new tools keep arriving on the market, each solving a different big data use case. Most developers struggle to learn each new tool and gain enough expertise to use it efficiently. With Spark, developers only have to learn the basic concepts once, and can then work on many different big data use cases.
- Thirdly, the unified stack lets developers explore new ideas without installing new tools.
The following diagram provides a high-level overview of different big-data applications powered by Spark:
