PySpark Cookbook
The PySpark Cookbook is for you if you are a Python developer looking for hands-on recipes for using the Apache Spark 2.x ecosystem in the best possible way. A thorough understanding of Python (and some familiarity with Spark) will help you get the best out of the book.
- Cover
- Copyright information
- Packt Upsell
- Why subscribe?
- PacktPub.com
- Contributors
- About the authors
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Sections
- Getting ready
- How to do it...
- How it works...
- There's more...
- See also
- Get in touch
- Reviews
- Installing and Configuring Spark
- Introduction
- Installing Spark requirements
- Getting ready
- How to do it...
- How it works...
- There's more...
- Installing Java
- Installing Python
- Installing R
- Installing Scala
- Installing Maven
- Updating PATH
- Installing Spark from sources
- Getting ready
- How to do it...
- How it works...
- There's more...
- See also
- Installing Spark from binaries
- Getting ready
- How to do it...
- How it works...
- There's more...
- Configuring a local instance of Spark
- Getting ready
- How to do it...
- How it works...
- See also
- Configuring a multi-node instance of Spark
- Getting ready
- How to do it...
- How it works...
- See also
- Installing Jupyter
- Getting ready
- How to do it...
- How it works...
- There's more...
- See also
- Configuring a session in Jupyter
- Getting ready
- How to do it...
- How it works...
- There's more...
- See also
- Working with Cloudera Spark images
- Getting ready
- How to do it...
- How it works...
- Abstracting Data with RDDs
- Introduction
- Creating RDDs
- Getting ready
- How to do it...
- How it works...
- Spark context parallelize method
- .take(...) method
- Reading data from files
- Getting ready
- How to do it...
- How it works...
- .textFile(...) method
- .map(...) method
- Partitions and performance
- Overview of RDD transformations
- Getting ready
- How to do it...
- .map(...) transformation
- .filter(...) transformation
- .flatMap(...) transformation
- .distinct() transformation
- .sample(...) transformation
- .join(...) transformation
- .repartition(...) transformation
- .zipWithIndex() transformation
- .reduceByKey(...) transformation
- .sortByKey(...) transformation
- .union(...) transformation
- .mapPartitionsWithIndex(...) transformation
- How it works...
- Overview of RDD actions
- Getting ready
- How to do it...
- .take(...) action
- .collect() action
- .reduce(...) action
- .count() action
- .saveAsTextFile(...) action
- How it works...
- Pitfalls of using RDDs
- Getting ready
- How to do it...
- How it works...
- Abstracting Data with DataFrames
- Introduction
- Creating DataFrames
- Getting ready
- How to do it...
- How it works...
- There's more...
- From JSON
- From CSV
- See also
- Accessing underlying RDDs
- Getting ready
- How to do it...
- How it works...
- Performance optimizations
- Getting ready
- How to do it...
- How it works...
- There's more...
- See also
- Inferring the schema using reflection
- Getting ready
- How to do it...
- How it works...
- See also
- Specifying the schema programmatically
- Getting ready
- How to do it...
- How it works...
- See also
- Creating a temporary table
- Getting ready
- How to do it...
- How it works...
- There's more...
- Using SQL to interact with DataFrames
- Getting ready
- How to do it...
- How it works...
- There's more...
- Overview of DataFrame transformations
- Getting ready
- How to do it...
- The .select(...) transformation
- The .filter(...) transformation
- The .groupBy(...) transformation
- The .orderBy(...) transformation
- The .withColumn(...) transformation
- The .join(...) transformation
- The .unionAll(...) transformation
- The .distinct(...) transformation
- The .repartition(...) transformation
- The .fillna(...) transformation
- The .dropna(...) transformation
- The .dropDuplicates(...) transformation
- The .summary() and .describe() transformations
- The .freqItems(...) transformation
- See also
- Overview of DataFrame actions
- Getting ready
- How to do it...
- The .show(...) action
- The .collect() action
- The .take(...) action
- The .toPandas() action
- See also
- Preparing Data for Modeling
- Introduction
- Handling duplicates
- Getting ready
- How to do it...
- How it works...
- There's more...
- Only IDs differ
- ID collisions
- Handling missing observations
- Getting ready
- How to do it...
- How it works...
- Missing observations per row
- Missing observations per column
- There's more...
- See also
- Handling outliers
- Getting ready
- How to do it...
- How it works...
- See also
- Exploring descriptive statistics
- Getting ready
- How to do it...
- How it works...
- There's more...
- Descriptive statistics for aggregated columns
- See also
- Computing correlations
- Getting ready
- How to do it...
- How it works...
- There's more...
- Drawing histograms
- Getting ready
- How to do it...
- How it works...
- There's more...
- See also
- Visualizing interactions between features
- Getting ready
- How to do it...
- How it works...
- There's more...
- Machine Learning with MLlib
- Loading the data
- Getting ready
- How to do it...
- How it works...
- There's more...
- Exploring the data
- Getting ready
- How to do it...
- How it works...
- Numerical features
- Categorical features
- There's more...
- See also
- Testing the data
- Getting ready
- How to do it...
- How it works...
- See also...
- Transforming the data
- Getting ready
- How to do it...
- How it works...
- There's more...
- See also...
- Standardizing the data
- Getting ready
- How to do it...
- How it works...
- Creating an RDD for training
- Getting ready
- How to do it...
- Classification
- Regression
- How it works...
- There's more...
- See also
- Predicting hours of work for census respondents
- Getting ready
- How to do it...
- How it works...
- Forecasting the income levels of census respondents
- Getting ready
- How to do it...
- How it works...
- There's more...
- Building a clustering model
- Getting ready
- How to do it...
- How it works...
- There's more...
- See also
- Computing performance statistics
- Getting ready
- How to do it...
- How it works...
- Regression metrics
- Classification metrics
- See also
- Machine Learning with the ML Module
- Introducing Transformers
- Getting ready
- How to do it...
- How it works...
- There's more...
- See also
- Introducing Estimators
- Getting ready
- How to do it...
- How it works...
- There's more...
- Introducing Pipelines
- Getting ready
- How to do it...
- How it works...
- See also
- Selecting the most predictable features
- Getting ready
- How to do it...
- How it works...
- There's more...
- See also
- Predicting forest coverage types
- Getting ready
- How to do it...
- How it works...
- There's more...
- Estimating forest elevation
- Getting ready
- How to do it...
- How it works...
- There's more...
- Clustering forest cover types
- Getting ready
- How to do it...
- How it works...
- See also
- Tuning hyperparameters
- Getting ready
- How to do it...
- How it works...
- There's more...
- Extracting features from text
- Getting ready
- How to do it...
- How it works...
- There's more...
- See also
- Discretizing continuous variables
- Getting ready
- How to do it...
- How it works...
- Standardizing continuous variables
- Getting ready
- How to do it...
- How it works...
- Topic mining
- Getting ready
- How to do it...
- How it works...
- Structured Streaming with PySpark
- Introduction
- Understanding Spark Streaming
- Understanding DStreams
- Getting ready
- How to do it...
- Terminal 1 – Netcat window
- Terminal 2 – Spark Streaming window
- How it works...
- There's more...
- Understanding global aggregations
- Getting ready
- How to do it...
- Terminal 1 – Netcat window
- Terminal 2 – Spark Streaming window
- How it works...
- Continuous aggregation with structured streaming
- Getting ready
- How to do it...
- Terminal 1 – Netcat window
- Terminal 2 – Spark Streaming window
- How it works...
- GraphFrames – Graph Theory with PySpark
- Introduction
- Installing GraphFrames
- Getting ready
- How to do it...
- How it works...
- Preparing the data
- Getting ready
- How to do it...
- How it works...
- There's more...
- Building the graph
- How to do it...
- How it works...
- Running queries against the graph
- Getting ready
- How to do it...
- How it works...
- Understanding the graph
- Getting ready
- How to do it...
- How it works...
- Using PageRank to determine airport ranking
- Getting ready
- How to do it...
- How it works...
- Finding the fewest number of connections
- Getting ready
- How to do it...
- How it works...
- There's more...
- See also
- Visualizing the graph
- Getting ready
- How to do it...
- How it works... (updated: 2021-06-18 19:07:31)