Reasons to choose Apache Spark

Apache Spark is very popular in the big data community these days. Here are some of the most prominent reasons for using Apache Spark in big data modeling and computation:

  • Speed: Speed matters when processing large datasets. Spark can run computations up to one hundred times faster than Hadoop MapReduce in memory, or ten times faster on disk.
  • Accessibility: Spark was developed to be highly accessible, offering simple APIs in Python, Java, Scala, and SQL, along with rich built-in libraries. It also integrates with other big data tools, including Hadoop clusters and data sources such as Cassandra.
  • Platform support: Apache Spark was built to run on Hadoop and Mesos, standalone, or in the cloud. It can access diverse data sources, including HDFS, Cassandra, HBase, and S3.
  • Generality: Spark was developed to cover a wide range of workloads, including batch applications, iterative algorithms, interactive queries, and streaming. By supporting these workloads in the same engine, Spark makes it easy and inexpensive to combine different processing types, which is often necessary for data analysis production pipelines.