Apache Spark Quick Start Guide
Apache Spark is a flexible framework that allows processing of batch and real-time data. Its unified engine has made it quite popular for big data use cases. This book will help you to get started with Apache Spark 2.0 and write big data applications for a variety of use cases. It will also introduce you to Apache Spark, one of the most popular big data processing frameworks. Although this book is intended to help you get started with Apache Spark, it also focuses on explaining the core concepts. This practical guide provides a quick start to the Spark 2.0 architecture and its components. It teaches you how to set up Spark on your local machine. As we move ahead, you will be introduced to resilient distributed datasets (RDDs) and DataFrame APIs, and their corresponding transformations and actions. Then, we move on to the life cycle of a Spark application and learn about the techniques used to debug slow-running applications. You will also go through Spark's built-in modules for SQL, streaming, machine learning, and graph analysis. Finally, the book will lay out the best practices and optimization techniques that are key for writing efficient Spark applications. By the end of this book, you will have a sound fundamental understanding of the Apache Spark framework and you will be able to write and optimize Spark applications.
Table of Contents (188 chapters)
- Cover page
- Title Page
- Copyright and Credits
- Apache Spark Quick Start Guide
- About Packt
- Why subscribe?
- Packt.com
- Contributors
- About the authors
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- Introduction to Apache Spark
- What is Spark?
- Spark architecture overview
- Spark language APIs
- Scala
- Java
- Python
- R
- SQL
- Spark components
- Spark Core
- Spark SQL
- Spark Streaming
- Spark machine learning
- Spark graph processing
- Cluster manager
- Standalone scheduler
- YARN
- Mesos
- Kubernetes
- Making the most of Hadoop and Spark
- Summary
- Apache Spark Installation
- AWS elastic compute cloud (EC2)
- Creating a free account on AWS
- Connecting to your Linux instance
- Configuring Spark
- Prerequisites
- Installing Java
- Installing Scala
- Installing Python
- Installing Spark
- Using Spark components
- Different modes of execution
- Spark sandbox
- Summary
- Spark RDD
- What is an RDD?
- Resilient metadata
- Programming using RDDs
- Transformations and actions
- Transformation
- Narrow transformations
- map()
- flatMap()
- filter()
- union()
- mapPartitions()
- Wide transformations
- distinct()
- sortBy()
- intersection()
- subtract()
- cartesian()
- Action
- collect()
- count()
- take()
- top()
- takeOrdered()
- first()
- countByValue()
- reduce()
- saveAsTextFile()
- foreach()
- Types of RDDs
- Pair RDDs
- groupByKey()
- reduceByKey()
- sortByKey()
- join()
- Caching and checkpointing
- Caching
- Checkpointing
- Understanding partitions
- repartition() versus coalesce()
- partitionBy()
- Drawbacks of using RDDs
- Summary
- Spark DataFrame and Dataset
- DataFrames
- Creating DataFrames
- Data sources
- DataFrame operations and associated functions
- Running SQL on DataFrames
- Temporary views on DataFrames
- Global temporary views on DataFrames
- Datasets
- Encoders
- Internal row
- Creating custom encoders
- Summary
- Spark Architecture and Application Execution Flow
- A sample application
- DAG constructor
- Stage
- Tasks
- Task scheduler
- FIFO
- FAIR
- Application execution modes
- Local mode
- Client mode
- Cluster mode
- Application monitoring
- Spark UI
- Application logs
- External monitoring solution
- Summary
- Spark SQL
- Spark SQL
- Spark metastore
- Using the Hive metastore in Spark SQL
- Hive configuration with Spark
- SQL language manual
- Database
- Table and view
- Load data
- Creating UDFs
- SQL database using JDBC
- Summary
- Spark Streaming, Machine Learning, and Graph Analysis
- Spark Streaming
- Use cases
- Data sources
- Stream processing
- Microbatch
- DStreams
- Streaming architecture
- Streaming example
- Machine learning
- MLlib
- ML
- Graph processing
- GraphX
- mapVertices
- mapEdges
- subgraph
- GraphFrames
- degrees
- subgraphs
- Graph algorithms
- PageRank
- Summary
- Spark Optimizations
- Cluster-level optimizations
- Memory
- Disk
- CPU cores
- Project Tungsten
- Application optimizations
- Language choice
- Structured versus unstructured APIs
- File format choice
- RDD optimizations
- Choosing the right transformations
- Serializing and compressing
- Broadcast variables
- DataFrame and dataset optimizations
- Catalyst optimizer
- Storage
- Parallelism
- Join performance
- Code generation
- Speculative execution
- Summary
- Other Books You May Enjoy
- Leave a review - let other readers know what you think

Last updated: 2021-07-02 13:40:24