
What this book covers

Chapter 1, Installing Pyspark and Setting up Your Development Environment, covers the installation of PySpark and introduces core Spark concepts, including resilient distributed datasets (RDDs), SparkContext, and Spark tools such as SparkConf and SparkShell.
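
As a small taste of these concepts, here is a minimal sketch that builds a SparkConf, starts a local SparkContext, and creates an RDD (the application name is illustrative):

    from pyspark import SparkConf, SparkContext

    # Build a minimal local configuration; the app name is illustrative.
    conf = SparkConf().setAppName("preview").setMaster("local[*]")
    sc = SparkContext(conf=conf)

    # An RDD is Spark's basic distributed collection.
    rdd = sc.parallelize([1, 2, 3, 4])
    print(rdd.count())  # 4

    sc.stop()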

Chapter 2, Getting Your Big Data into the Spark Environment Using RDDs, explains how to get your big data into the Spark environment as RDDs, and introduces the wide array of tools available for interacting with and modifying this data so that useful insights can be extracted.
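
In outline, loading data into an RDD looks like the following sketch (the input file name is hypothetical):

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "load-preview")

    # "data.csv" is a hypothetical input file on the local filesystem.
    lines = sc.textFile("data.csv")
    rows = lines.map(lambda line: line.split(","))
    print(rows.take(5))  # inspect the first few parsed records

    sc.stop()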

Chapter 3, Big Data Cleaning and Wrangling with Spark Notebooks, covers how to use Spark in notebook applications, thereby facilitating the effective use of RDDs.

Chapter 4, Aggregating and Summarizing Data into Useful Reports, describes how to calculate averages with the map and reduce functions, perform faster average computations, and use pivot tables with key/value pair data points.
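
For instance, a per-key average can be computed in a single pass by reducing (sum, count) pairs, roughly as in this sketch (the data is illustrative):

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "average-preview")
    pairs = sc.parallelize([("a", 1.0), ("a", 3.0), ("b", 2.0)])

    # Map each value to a (sum, count) pair, reduce per key, then divide.
    totals = pairs.mapValues(lambda v: (v, 1)) \
                  .reduceByKey(lambda x, y: (x[0] + y[0], x[1] + y[1]))
    averages = totals.mapValues(lambda t: t[0] / t[1])
    print(averages.collect())  # [('a', 2.0), ('b', 2.0)] (order may vary)

    sc.stop()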

Chapter 5, Powerful Exploratory Data Analysis with MLlib, examines Spark's ability to perform regression tasks with models including linear regression and SVMs.
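
A sketch of what fitting a linear regression looks like with the DataFrame-based MLlib API, on toy data:

    from pyspark.sql import SparkSession
    from pyspark.ml.linalg import Vectors
    from pyspark.ml.regression import LinearRegression

    spark = SparkSession.builder.master("local[*]").appName("ml-preview").getOrCreate()

    # Toy training data: a label and a one-dimensional feature vector.
    df = spark.createDataFrame(
        [(1.0, Vectors.dense(1.0)), (2.0, Vectors.dense(2.0)), (3.0, Vectors.dense(3.0))],
        ["label", "features"])

    model = LinearRegression(maxIter=10).fit(df)
    print(model.coefficients, model.intercept)

    spark.stop()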

Chapter 6, Putting Structure on Your Big Data with SparkSQL, explains how to manipulate DataFrames with Spark SQL schemas, and use the Spark DSL to build queries for structured data operations.
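
The flavor of this, in a minimal sketch: declare a schema, build a DataFrame, and query it with the DSL (column names and data are illustrative):

    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType
    from pyspark.sql.functions import col

    spark = SparkSession.builder.master("local[*]").appName("sql-preview").getOrCreate()

    schema = StructType([
        StructField("name", StringType(), True),
        StructField("age", IntegerType(), True),
    ])
    df = spark.createDataFrame([("alice", 34), ("bob", 29)], schema)

    # The DSL builds queries from column expressions instead of SQL strings.
    df.select("name").filter(col("age") > 30).show()

    spark.stop()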

Chapter 7, Transformations and Actions, looks at how Spark transformations defer computation and then considers which transformations should be avoided. We will also use the reduce and reduceByKey methods to carry out calculations on a dataset.
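
The deferred-computation idea in brief: transformations only build a plan, and an action triggers the job. A minimal sketch:

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "lazy-preview")

    numbers = sc.parallelize([1, 2, 3, 4])
    doubled = numbers.map(lambda x: x * 2)     # transformation: nothing runs yet
    print(doubled.reduce(lambda a, b: a + b))  # action: triggers the job -> 20

    pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])
    print(pairs.reduceByKey(lambda a, b: a + b).collect())  # [('a', 4), ('b', 2)]

    sc.stop()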

Chapter 8, Immutable Design, explains how to use DataFrame operations for transformations, and discusses why immutability matters in a highly concurrent environment.
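
The heart of the immutability argument, sketched on an RDD: a transformation returns a new dataset and leaves its input untouched, which is what makes concurrent use safe:

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "immutable-preview")

    source = sc.parallelize([1, 2, 3])
    transformed = source.map(lambda x: x + 1)  # a new RDD; `source` is unchanged

    print(source.collect())       # [1, 2, 3]
    print(transformed.collect())  # [2, 3, 4]

    sc.stop()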

Chapter 9, Avoid Shuffle and Reduce Operational Expenses, covers shuffling and which Spark API operations should be used to minimize it. We will then test operations that cause a shuffle in Apache Spark to learn which operations should be avoided.
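
One well-known example of the trade-off: reduceByKey combines values on each partition before the shuffle, whereas groupByKey ships every value across the network. A sketch, using toDebugString to inspect the lineage:

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "shuffle-preview")

    pairs = sc.parallelize([("a", 1), ("a", 2), ("b", 3)])
    summed = pairs.reduceByKey(lambda a, b: a + b)  # pre-aggregates per partition

    # The lineage shows the stage boundary introduced by the shuffle.
    print(summed.toDebugString())

    sc.stop()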

Chapter 10, Saving Data in the Correct Format, explains how to save data in the correct format, including saving data as plain text using Spark's standard API.
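
A sketch of both routes, assuming writable local output paths (the paths are illustrative):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").appName("save-preview").getOrCreate()
    sc = spark.sparkContext

    # Plain text via the core RDD API.
    sc.parallelize(["first line", "second line"]).saveAsTextFile("out/plain-text")

    # Columnar formats such as Parquet go through the DataFrame writer.
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    df.write.parquet("out/parquet")

    spark.stop()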

Chapter 11, Working with the Spark Key/Value API, discusses the transformations available on key/value pairs. We will look at actions on key/value pairs and examine the partitioners available for key/value data.
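
A short sketch touching all three themes; note that in PySpark, partitioning is expressed as a partition count plus an optional hash function rather than a Partitioner object:

    from pyspark import SparkContext

    sc = SparkContext("local[*]", "kv-preview")
    pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])

    print(pairs.mapValues(lambda v: v * 10).collect())  # transformation
    print(dict(pairs.countByKey()))                     # action: {'a': 2, 'b': 1}

    repartitioned = pairs.partitionBy(4)                # hash-partition by key
    print(repartitioned.getNumPartitions())             # 4

    sc.stop()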

Chapter 12, Testing Apache Spark Jobs, goes into further detail about testing Apache Spark jobs in different versions of Spark.
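
A minimal flavor of such a test, written here in a pytest style; the word_count job under test is hypothetical:

    from pyspark import SparkContext

    # A hypothetical job under test.
    def word_count(rdd):
        return rdd.flatMap(lambda line: line.split()) \
                  .map(lambda w: (w, 1)) \
                  .reduceByKey(lambda a, b: a + b)

    def test_word_count():
        sc = SparkContext("local[2]", "test")
        try:
            result = dict(word_count(sc.parallelize(["a b a"])).collect())
            assert result == {"a": 2, "b": 1}
        finally:
            sc.stop()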

Chapter 13, Leveraging the Spark GraphX API, covers how to leverage the Spark GraphX API. We will carry out experiments with the Edge API and Vertex API.
