Python Data Mining Quick Start Guide
Data mining is a necessary and predictable response to the dawn of the information age. It is typically defined as the pattern and/or trend discovery phase in the data mining pipeline, and Python is a popular choice for these tasks because it offers a wide variety of data mining tools. This book serves as a quick introduction to the concept of data mining and to putting it to practical use with the help of popular Python packages and libraries. You will get a hands-on demonstration of working with different real-world datasets and extracting useful insights from them using popular Python libraries such as NumPy, pandas, scikit-learn, and matplotlib. You will then learn the different stages of data mining, such as data loading, cleaning, analysis, and visualization. You will also get a full conceptual description of popular data transformation, clustering, and classification techniques. By the end of this book, you will be able to build an efficient data mining pipeline using Python without any hassle.
Table of Contents (168 entries)
- Cover Page
- Title Page
- Copyright and Credits
- Python Data Mining Quick Start Guide
- Dedication
- About Packt
- Why subscribe?
- Packt.com
- Contributors
- About the author
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- Data Mining and Getting Started with Python Tools
- Descriptive, predictive, and prescriptive analytics
- What will and will not be covered in this book
- Recommended readings for further explanation
- Setting up Python environments for data mining
- Installing the Anaconda distribution and Conda package manager
- Installing on Linux
- Installing on Windows
- Installing on macOS
- Launching the Spyder IDE
- Launching a Jupyter Notebook
- Installing a high-performance Python distribution
- Recommended libraries and how to install them
- Recommended libraries
- Summary
- Basic Terminology and Our End-to-End Example
- Basic data terminology
- Sample spaces
- Variable types
- Data types
- Basic summary statistics
- An end-to-end example of data mining in Python
- Loading data into memory – viewing and managing with ease using pandas
- Plotting and exploring data – harnessing the power of Seaborn
- Transforming data – PCA and LDA with scikit-learn
- Quantifying separations – k-means clustering and the silhouette score
- Making decisions or predictions
- Summary
- Collecting, Exploring, and Visualizing Data
- Types of data sources and loading into pandas
- Databases
- Basic Structured Query Language (SQL) queries
- Disks
- Web sources
- From URLs
- From scikit-learn and Seaborn-included sets
- Access, search, and sanity checks with pandas
- Basic plotting in Seaborn
- Popular types of plots for visualizing data
- Scatter plots
- Histograms
- Jointplots
- Violin plots
- Pairplots
- Summary
- Cleaning and Readying Data for Analysis
- The scikit-learn transformer API
- Cleaning input data
- Missing values
- Finding and removing missing values
- Imputing to replace the missing values
- Feature scaling
- Normalization
- Standardization
- Handling categorical data
- Ordinal encoding
- One-hot encoding
- Label encoding
- High-dimensional data
- Dimension reduction
- Feature selection
- Feature filtering
- The variance threshold
- The correlation coefficient
- Wrapper methods
- Sequential feature selection
- Transformation
- PCA
- LDA
- Summary
- Grouping and Clustering Data
- Introducing clustering concepts
- Location of the group
- Euclidean space (centroids)
- Non-Euclidean space (medoids)
- Similarity
- Euclidean space
- The Euclidean distance
- The Manhattan distance
- Maximum distance
- Non-Euclidean space
- The cosine distance
- The Jaccard distance
- Termination condition
- With a known number of groupings
- Without a known number of groupings
- Quality score and silhouette score
- Clustering methods
- Means separation
- K-means
- Finding k
- K-means++
- Mini-batch K-means
- Hierarchical clustering
- Reuse the dendrogram to find the number of clusters
- Plot dendrogram
- Density clustering
- Spectral clustering
- Summary
- Prediction with Regression and Classification
- Scikit-learn Estimator API
- Introducing prediction concepts
- Prediction nomenclature
- Mathematical machinery
- Loss function
- Gradient descent
- Fit quality regimes
- Regression
- Metrics of regression model prediction
- Regression example dataset
- Linear regression
- Extension to multivariate form
- Regularization with penalized regression
- Regularization penalties
- Classification
- Classification example dataset
- Metrics of classification model prediction
- Multi-class classification
- One-versus-all
- One-versus-one
- Logistic regression
- Regularized logistic regression
- Support vector machines
- Soft-margin with C
- The kernel trick
- Tree-based classification
- Decision trees
- Node splitting with Gini
- Random forest
- Avoid overfitting and speed up the fits
- Built-in validation with bagging
- Tuning a prediction model
- Cross-validation
- Introduction of the validation set
- Multiple validation sets with the k-fold method
- Grid search for hyperparameter tuning
- Summary
- Advanced Topics - Building a Data Processing Pipeline and Deploying It
- Pipelining your analysis
- Scikit-learn's pipeline object
- Deploying the model
- Serializing a model and storing it with the pickle module
- Loading a serialized model and predicting
- Python-specific deployment concerns
- Summary
- Other Books You May Enjoy
- Leave a review - let other readers know what you think

Last updated: 2021-06-24 15:20:20