Feature Engineering Made Easy
If you are a data science professional or a machine learning engineer looking to strengthen your predictive analytics model, then this book is a perfect guide for you. A basic understanding of machine learning concepts and Python scripting is enough to get started with this book.
Table of Contents (174 sections)
- Cover
- Copyright Information
- Packt Upsell
- Why subscribe?
- PacktPub.com
- Contributors
- About the authors
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- Introduction to Feature Engineering
- Motivating example – AI-powered communications
- Why feature engineering matters
- What is feature engineering?
- Understanding the basics of data and machine learning
- Supervised learning
- Unsupervised learning
- Unsupervised learning example – marketing segments
- Evaluation of machine learning algorithms and feature engineering procedures
- Example of feature engineering procedures – can anyone really predict the weather?
- Steps to evaluate a feature engineering procedure
- Evaluating supervised learning algorithms
- Evaluating unsupervised learning algorithms
- Feature understanding – what’s in my dataset?
- Feature improvement – cleaning datasets
- Feature selection – say no to bad attributes
- Feature construction – can we build it?
- Feature transformation – enter math-man
- Feature learning – using AI to better our AI
- Summary
- Feature Understanding – What's in My Dataset?
- The structure or lack thereof of data
- An example of unstructured data – server logs
- Quantitative versus qualitative data
- Salary ranges by job classification
- The four levels of data
- The nominal level
- Mathematical operations allowed
- The ordinal level
- Mathematical operations allowed
- The interval level
- Mathematical operations allowed
- Plotting two columns at the interval level
- The ratio level
- Mathematical operations allowed
- Recap of the levels of data
- Summary
- Feature Improvement – Cleaning Datasets
- Identifying missing values in data
- The Pima Indian Diabetes Prediction dataset
- The exploratory data analysis (EDA)
- Dealing with missing values in a dataset
- Removing harmful rows of data
- Imputing the missing values in data
- Imputing values in a machine learning pipeline
- Pipelines in machine learning
- Standardization and normalization
- Z-score standardization
- The min-max scaling method
- The row normalization method
- Putting it all together
- Summary
- Feature Construction
- Examining our dataset
- Imputing categorical features
- Custom imputers
- Custom category imputer
- Custom quantitative imputer
- Encoding categorical variables
- Encoding at the nominal level
- Encoding at the ordinal level
- Bucketing continuous features into categories
- Creating our pipeline
- Extending numerical features
- Activity recognition from the Single Chest-Mounted Accelerometer dataset
- Polynomial features
- Parameters
- Exploratory data analysis
- Text-specific feature construction
- Bag of words representation
- CountVectorizer
- CountVectorizer parameters
- The Tf-idf vectorizer
- Using text in machine learning pipelines
- Summary
- Feature Selection
- Achieving better performance in feature engineering
- A case study – a credit card defaulting dataset
- Creating a baseline machine learning pipeline
- The types of feature selection
- Statistical-based feature selection
- Using Pearson correlation to select features
- Feature selection using hypothesis testing
- Interpreting the p-value
- Ranking the p-value
- Model-based feature selection
- A brief refresher on natural language processing
- Using machine learning to select features
- Tree-based model feature selection metrics
- Linear models and regularization
- A brief introduction to regularization
- Linear model coefficients as another feature importance metric
- Choosing the right feature selection method
- Summary
- Feature Transformations
- Dimension reduction – feature transformations versus feature selection versus feature construction
- Principal Component Analysis
- How PCA works
- PCA with the Iris dataset – manual example
- Creating the covariance matrix of the dataset
- Calculating the eigenvalues of the covariance matrix
- Keeping the top k eigenvalues (sorted by the descending eigenvalues)
- Using the kept eigenvectors to transform new data-points
- Scikit-learn's PCA
- How centering and scaling data affects PCA
- A deeper look into the principal components
- Linear Discriminant Analysis
- How LDA works
- Calculating the mean vectors of each class
- Calculating within-class and between-class scatter matrices
- Calculating eigenvalues and eigenvectors for S_W?1S_B
- Keeping the top k eigenvectors by ordering them by descending eigenvalues
- Using the top eigenvectors to project onto the new space
- How to use LDA in scikit-learn
- LDA versus PCA – Iris dataset
- Summary
- Feature Learning
- Parametric assumptions of data
- Non-parametric fallacy
- The algorithms of this chapter
- Restricted Boltzmann Machines
- Not necessarily dimension reduction
- The graph of a Restricted Boltzmann Machine
- The restriction of a Boltzmann Machine
- Reconstructing the data
- MNIST dataset
- The BernoulliRBM
- Extracting PCA components from MNIST
- Extracting RBM components from MNIST
- Using RBMs in a machine learning pipeline
- Using a linear model on raw pixel values
- Using a linear model on extracted PCA components
- Using a linear model on extracted RBM components
- Learning text features – word vectorizations
- Word embeddings
- Two approaches to word embeddings – Word2vec and GloVe
- Word2vec – another shallow neural network
- The gensim package for creating Word2vec embeddings
- Application of word embeddings – information retrieval
- Summary
- Case Studies
- Case study 1 – facial recognition
- Applications of facial recognition
- The data
- Some data exploration
- Applied facial recognition
- Case study 2 – predicting topics of hotel reviews data
- Applications of text clustering
- Hotel review data
- Exploration of the data
- The clustering model
- SVD versus PCA components
- Latent semantic analysis
- Summary
- Other Books You May Enjoy
- Leave a review – let other readers know what you think