Table of Contents (118 chapters)
- Cover
- Copyright page
- Credits
- About the Author
- About the Reviewers
- www.PacktPub.com
- Why subscribe?
- Preface
- What this book covers
- What you need for this book
- Who this book is for
- Conventions
- Reader feedback
- Customer support
- Chapter 1. Installing Spark and Setting Up Your Cluster
- Directory organization and convention
- Installing the prebuilt distribution
- Building Spark from source
- Spark topology
- A single machine
- Running Spark on EC2
- Deploying Spark with Chef (Opscode)
- Deploying Spark on Mesos
- Spark on YARN
- Spark standalone mode
- References
- Summary
- Chapter 2. Using the Spark Shell
- The Spark shell
- Loading a simple text file
- Interactively loading data from S3
- Summary
- Chapter 3. Building and Running a Spark Application
- Building Spark applications
- Data wrangling with iPython
- Developing Spark with Eclipse
- Developing Spark with other IDEs
- Building your Spark job with Maven
- Building your Spark job with something else
- References
- Summary
- Chapter 4. Creating a SparkSession Object
- SparkSession versus SparkContext
- Building a SparkSession object
- SparkContext - metadata
- Shared Java and Scala APIs
- Python
- iPython
- Reference
- Summary
- Chapter 5. Loading and Saving Data in Spark
- Spark abstractions
- Data modalities
- Data modalities and Datasets/DataFrames/RDDs
- Loading data into an RDD
- Saving your data
- References
- Summary
- Chapter 6. Manipulating Your RDD
- Manipulating your RDD in Scala and Java
- Manipulating your RDD in Python
- References
- Summary
- Chapter 7. Spark 2.0 Concepts
- Code and Datasets for the rest of the book
- The data scientist and Spark features
- Spark v2.0 and beyond
- Apache Spark - evolution
- Apache Spark - the full stack
- The art of a big data store - Parquet
- References
- Summary
- Chapter 8. Spark SQL
- The Spark SQL architecture
- Spark SQL how-to in a nutshell
- Spark SQL programming
- References
- Summary
- Chapter 9. Foundations of Datasets/DataFrames – The Proverbial Workhorse for Data Scientists
- Datasets - a quick introduction
- Dataset APIs - an overview
- Dataset interfaces and functions
- References
- Summary
- Chapter 10. Spark with Big Data
- Parquet - an efficient and interoperable big data format
- HBase
- Reference
- Summary
- Chapter 11. Machine Learning with Spark ML Pipelines
- Spark's machine learning algorithm table
- Spark machine learning APIs - ML pipelines and MLlib
- ML pipelines
- Spark ML examples
- The API organization
- Basic statistics
- Linear regression
- Classification
- Clustering
- Recommendation
- Hyper parameters
- The final thing
- References
- Summary
- Chapter 12. GraphX
- Graphs and graph processing - an introduction
- Spark GraphX
- GraphX - computational model
- The first example - graph
- Building graphs
- The GraphX API landscape
- Structural APIs
- Community affiliation and strengths
- Algorithms
- Partition strategy
- Case study - AlphaGo tweets analytics
- References
- Summary

Updated: 2021-08-20 10:27:33