  • Big Data Analytics
  • Venkat Ankam
  • 2021-08-20 10:32:19

Preface

Big Data Analytics aims to provide the fundamentals of Apache Spark and Hadoop, and to show in an accessible way how they integrate with the most commonly used tools and techniques. All Spark components (Spark Core, Spark SQL, DataFrames, Datasets, conventional Streaming, Structured Streaming, MLlib, and GraphX) and Hadoop core components (HDFS, MapReduce, and YARN) are explored in great depth with implementation examples on Spark + Hadoop clusters.

The Big Data Analytics industry is moving away from MapReduce to Spark. The advantages of Spark over MapReduce are therefore explained in great depth, so that readers can reap the benefits of in-memory speeds. The DataFrames API, the Data Sources API, and the new Dataset API are explained for building Big Data analytical applications. Real-time data analytics using Spark Streaming with Apache Kafka and HBase is covered to help in building streaming applications. The new Structured Streaming concept is explained with an Internet of Things (IoT) use case. Machine learning techniques are covered using MLlib, ML Pipelines, and SparkR; graph analytics is covered with the GraphX and GraphFrames components of Spark.

This book also introduces web-based notebooks such as Jupyter and Apache Zeppelin, and the data flow tool Apache NiFi, to analyze and visualize data, and shows how to offer Spark as a Service using Livy Server.
