Learning Apache Spark 2
This guide appeals to big data engineers, analysts, architects, software engineers, and even technical managers who need to perform efficient data processing on Hadoop in real time. Basic familiarity with Java or Scala will be helpful. The assumption is that readers will come from a mixed background, but will typically be people with a background in engineering or data science who have no prior Spark experience and want to understand how Spark can help them on their analytics journey.
Table of Contents (121 chapters)
- Cover
- Copyright Information
- Credits
- About the Author
- About the Reviewers
- www.packtpub.com
- Customer Feedback
- Preface
- Chapter 1. Architecture and Installation
- Apache Spark architecture overview
- Installing Apache Spark
- Writing your first Spark program
- Spark architecture
- Apache Spark cluster manager types
- Running Spark examples
- Brain teasers
- References
- Summary
- Chapter 2. Transformations and Actions with Spark RDDs
- What is an RDD?
- Operations on RDD
- Passing functions to Spark (Scala)
- Passing functions to Spark (Java)
- Passing functions to Spark (Python)
- Transformations
- Set operations in Spark
- Actions
- PairRDDs
- Shared variables
- References
- Summary
- Chapter 3. ETL with Spark
- What is ETL?
- How is Spark being used?
- Commonly Supported File Formats
- Commonly supported file systems
- Structured Data sources and Databases
- References
- Summary
- Chapter 4. Spark SQL
- What is Spark SQL?
- What is DataFrame API?
- What is DataSet API?
- What's new in Spark 2.0?
- The SparkSession
- Creating a DataFrame
- Parquet files
- Working with Hive
- SparkSQL CLI
- References
- Summary
- Chapter 5. Spark Streaming
- What is Spark Streaming?
- Steps involved in a streaming app
- Architecture of Spark Streaming
- Caching and persistence
- Checkpointing
- DStream best practices
- Fault tolerance
- What is Structured Streaming?
- References
- Summary
- Chapter 6. Machine Learning with Spark
- What is machine learning?
- Why machine learning?
- Types of machine learning
- Introduction to Spark MLLib
- Why do we need the Pipeline API?
- How does it work?
- Feature engineering
- Classification and regression
- Clustering
- Collaborative filtering
- ML-tuning - model selection and hyperparameter tuning
- References
- Summary
- Chapter 7. GraphX
- Graphs in everyday life
- What is a graph?
- Why are Graphs elegant?
- What is GraphX?
- Creating your first Graph (RDD API)
- Basic graph operators (RDD API)
- Caching and uncaching of graphs
- Graph algorithms in GraphX
- GraphFrames
- Comparison between GraphFrames and GraphX
- References
- Summary
- Chapter 8. Operating in Clustered Mode
- Clusters nodes and daemons
- Running Spark in standalone mode
- Using the Cluster Launch Scripts to Start a Standalone Cluster
- Running Spark in YARN
- Running Spark in Mesos
- References
- Summary
- Chapter 9. Building a Recommendation System
- What is a recommendation system?
- User specific recommendations
- Key issues with recommendation systems
- Recommendation system in Spark
- References
- Summary
- Chapter 10. Customer Churn Prediction
- Overview of customer churn
- Why is predicting customer churn important?
- How do we predict customer churn with Spark?
- Exploring customer service calls
- References
- Summary
- Appendix. There's More with Spark
- Performance tuning
- I/O tuning
- Sizing up your executors
- The skew problem
- Security configuration in Spark
- Setting up Jupyter Notebook with Spark
- Shared variables
- References
- Summary

Updated: 2021-07-09 18:46:26