Fast Data Processing with Spark 2 (Third Edition)

Updated: 2021-08-20 10:27:33

This book is for developers with little to no knowledge of Spark, but with a background in Scala/Java programming. It's recommended that you have experience in dealing and working with big data and a strong interest in data science.

Table of Contents (118 entries)

Cover
Copyright Page
Credits
About the Author
About the Reviewers
www.PacktPub.com
Why subscribe?
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Chapter 1. Installing Spark and Setting Up Your Cluster
Directory organization and convention
Installing the prebuilt distribution
Building Spark from source
Spark topology
A single machine
Running Spark on EC2
Deploying Spark with Chef (Opscode)
Deploying Spark on Mesos
Spark on YARN
Spark standalone mode
References
Summary
Chapter 2. Using the Spark Shell
The Spark shell
Loading a simple text file
Interactively loading data from S3
Summary
Chapter 3. Building and Running a Spark Application
Building Spark applications
Data wrangling with iPython
Developing Spark with Eclipse
Developing Spark with other IDEs
Building your Spark job with Maven
Building your Spark job with something else
References
Summary
Chapter 4. Creating a SparkSession Object
SparkSession versus SparkContext
Building a SparkSession object
SparkContext - metadata
Shared Java and Scala APIs
Python
iPython
Reference
Summary
Chapter 5. Loading and Saving Data in Spark
Spark abstractions
Data modalities
Data modalities and Datasets/DataFrames/RDDs
Loading data into an RDD
Saving your data
References
Summary
Chapter 6. Manipulating Your RDD
Manipulating your RDD in Scala and Java
Manipulating your RDD in Python
References
Summary
Chapter 7. Spark 2.0 Concepts
Code and Datasets for the rest of the book
The data scientist and Spark features
Spark v2.0 and beyond
Apache Spark - evolution
Apache Spark - the full stack
The art of a big data store - Parquet
References
Summary
Chapter 8. Spark SQL
The Spark SQL architecture
Spark SQL how-to in a nutshell
Spark SQL programming
References
Summary
Chapter 9. Foundations of Datasets/DataFrames – The Proverbial Workhorse for Data Scientists
Datasets - a quick introduction
Dataset APIs - an overview
Dataset interfaces and functions
References
Summary
Chapter 10. Spark with Big Data
Parquet - an efficient and interoperable big data format
HBase
Reference
Summary
Chapter 11. Machine Learning with Spark ML Pipelines
Spark's machine learning algorithm table
Spark machine learning APIs - ML pipelines and MLlib
ML pipelines
Spark ML examples
The API organization
Basic statistics
Linear regression
Classification
Clustering
Recommendation
Hyper parameters
The final thing
References
Summary
Chapter 12. GraphX
Graphs and graph processing - an introduction
Spark GraphX
GraphX - computational model
The first example - graph
Building graphs
The GraphX API landscape
Structural APIs
Community affiliation and strengths
Algorithms
Partition strategy
Case study - AlphaGo tweets analytics
References
Summary