- Learning Apache Spark 2
- Muhammad Asif Abbasi
- 2021-07-09 18:46:01
How is Spark being used?
Matei Zaharia is the creator of the Apache Spark project and a co-founder of Databricks, the company formed by the creators of Apache Spark. In his keynote at Spark Summit Europe in the fall of 2015, Matei shared some key metrics on how Spark is being used in various runtime environments. The numbers were a bit surprising to me, as I had expected Spark on YARN to have a higher share than what was presented. Here are the key figures:
- Spark in Standalone mode - 48%
- Spark on YARN - 40%
- Spark on Mesos - 11%
As we can see from the numbers, almost 90% of Apache Spark installations (48% + 40% = 88%) run in standalone mode or on YARN. When Spark is configured on YARN, we can assume that the organization has chosen Hadoop as its data operating system and is planning to move its data onto Hadoop, which means our primary sources for data ingestion are likely to be Hive, HDFS, HBase, or other NoSQL systems.
When Apache Spark is installed in standalone mode, the range of possible primary sources widens, but data on HDFS remains a strong possibility: the customer may well have a Hadoop installation but wish to keep Spark separate as a discovery platform.
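In practice, the runtime environment is selected through the master URL handed to Spark, for example via the `--master` option of `spark-submit`. The following is a minimal sketch in plain Python of how the three modes above map to master URLs. The helper name `master_url` and the host names are hypothetical and only serve as illustration; the default ports (7077 for standalone, 5050 for Mesos) are the commonly used defaults and may differ in a given cluster.

```python
def master_url(mode, host=None, port=None):
    """Build a Spark master URL for a deployment mode (illustrative helper)."""
    mode = mode.lower()
    if mode == "yarn":
        # On YARN the cluster location comes from the Hadoop configuration,
        # so the master URL carries no host or port.
        return "yarn"

    # URL scheme and commonly used default port per cluster manager.
    schemes = {"standalone": ("spark", 7077), "mesos": ("mesos", 5050)}
    if mode not in schemes:
        raise ValueError(f"unknown deployment mode: {mode}")
    if host is None:
        raise ValueError(f"{mode} mode needs the host of the cluster master")

    scheme, default_port = schemes[mode]
    return f"{scheme}://{host}:{port or default_port}"


if __name__ == "__main__":
    # Hypothetical host names, used only for illustration.
    print(master_url("standalone", "spark-master.example.com"))
    print(master_url("yarn"))
    print(master_url("mesos", "mesos-master.example.com"))
```

The resulting string is what would be passed as `--master` when submitting an application; the rest of the application code stays the same regardless of the cluster manager.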
Spark can work with a variety of sources. Let's look at the most common sources that we come across:
- File Formats
- File Systems
- Structured Data Sources / Databases
- Key/Value Stores