Mastering Machine Learning with Spark 2.x
Are you a developer with a background in machine learning and statistics who is feeling limited by the current slow and “small data” machine learning tools? Then this is the book for you! In this book, you will create scalable machine learning applications to power a modern data-driven business using Spark. We assume that you already know the machine learning concepts and algorithms and have Spark up and running (whether on a cluster or locally) and have a basic knowledge of the various libraries contained in Spark.
Table of Contents (183 chapters)
- cover
- Title Page
- Copyright
- Mastering Machine Learning with Spark 2.x
- Credits
- About the Authors
- About the Reviewer
- www.PacktPub.com
- Why subscribe?
- Customer Feedback
- Preface
- What this book covers
- What you need for this book
- Who this book is for
- Conventions
- Reader feedback
- Customer support
- Downloading the example code
- Downloading the color images of this book
- Errata
- Piracy
- Questions
- Introduction to Large-Scale Machine Learning and Spark
- Data science
- The sexiest role of the 21st century – data scientist?
- A day in the life of a data scientist
- Working with big data
- The machine learning algorithm using a distributed environment
- Splitting of data into multiple machines
- From Hadoop MapReduce to Spark
- What is Databricks?
- Inside the box
- Introducing H2O.ai
- Design of Sparkling Water
- What's the difference between H2O and Spark's MLlib?
- Data munging
- Data science - an iterative process
- Summary
- Detecting Dark Matter - The Higgs-Boson Particle
- Type I versus type II error
- Finding the Higgs-Boson particle
- The LHC and data creation
- The theory behind the Higgs-Boson
- Measuring for the Higgs-Boson
- The dataset
- Spark start and data load
- Labeled point vector
- Data caching
- Creating a training and testing set
- What about cross-validation?
- Our first model – decision tree
- Gini versus Entropy
- Next model – tree ensembles
- Random forest model
- Grid search
- Gradient boosting machine
- Last model - H2O deep learning
- Build a 3-layer DNN
- Adding more layers
- Building models and inspecting results
- Summary
- Ensemble Methods for Multi-Class Classification
- Data
- Modeling goal
- Challenges
- Machine learning workflow
- Starting Spark shell
- Exploring data
- Missing data
- Summary of missing value analysis
- Data unification
- Missing values
- Categorical values
- Final transformation
- Modelling data with Random Forest
- Building a classification model using Spark RandomForest
- Classification model evaluation
- Spark model metrics
- Building a classification model using H2O RandomForest
- Summary
- Predicting Movie Reviews Using NLP and Spark Streaming
- NLP - a brief primer
- The dataset
- Dataset preparation
- Feature extraction
- Feature extraction method – bag-of-words model
- Text tokenization
- Declaring our stopwords list
- Stemming and lemmatization
- Featurization - feature hashing
- Term Frequency - Inverse Document Frequency (TF-IDF) weighting scheme
- Let's do some (model) training!
- Spark decision tree model
- Spark Naive Bayes model
- Spark random forest model
- Spark GBM model
- Super-learner model
- Super learner
- Composing all transformations together
- Using the super-learner model
- Summary
- Word2vec for Prediction and Clustering
- Motivation of word vectors
- Word2vec explained
- What is a word vector?
- The CBOW model
- The skip-gram model
- Fun with word vectors
- Cosine similarity
- Doc2vec explained
- The distributed-memory model
- The distributed bag-of-words model
- Applying word2vec and exploring our data with vectors
- Creating document vectors
- Supervised learning task
- Summary
- Extracting Patterns from Clickstream Data
- Frequent pattern mining
- Pattern mining terminology
- Frequent pattern mining problem
- The association rule mining problem
- The sequential pattern mining problem
- Pattern mining with Spark MLlib
- Frequent pattern mining with FP-growth
- Association rule mining
- Sequential pattern mining with prefix span
- Pattern mining on MSNBC clickstream data
- Deploying a pattern mining application
- The Spark Streaming module
- Summary
- Graph Analytics with GraphX
- Basic graph theory
- Graphs
- Directed and undirected graphs
- Order and degree
- Directed acyclic graphs
- Connected components
- Trees
- Multigraphs
- Property graphs
- GraphX distributed graph processing engine
- Graph representation in GraphX
- Graph properties and operations
- Building and loading graphs
- Visualizing graphs with Gephi
- Gephi
- Creating GEXF files from GraphX graphs
- Advanced graph processing
- Aggregating messages
- Pregel
- GraphFrames
- Graph algorithms and applications
- Clustering
- Vertex importance
- GraphX in context
- Summary
- Lending Club Loan Prediction
- Motivation
- Goal
- Data
- Data dictionary
- Preparation of the environment
- Data load
- Exploration – data analysis
- Basic clean up
- Useless columns
- String columns
- Loan progress columns
- Categorical columns
- Text columns
- Missing data
- Prediction targets
- Loan status model
- Base model
- The emp_title column transformation
- The desc column transformation
- Interest Rate Model
- Using models for scoring
- Model deployment
- Stream creation
- Stream transformation
- Stream output
- Summary
Updated: 2021-07-02 18:46:37