- Python: Advanced Predictive Analytics
- Ashish Kumar, Joseph Babcock
- 2021-07-02 20:09:23
Summary
The main learning outcomes of this chapter are summarized as follows:
- Various methods and variations in importing a dataset using pandas: read_csv and its variations, reading a dataset using the open method in Python, reading a file in chunks using the open method, reading directly from a URL, specifying the column names from a list, changing the delimiter of a dataset, and so on.
- Basic exploratory analysis of data: observing a thumbnail of data, shape, column names, column types, and summary statistics for numerical variables.
- Handling missing values: why missing values arise in a dataset, why it is important to treat them properly, how to treat them by deletion and imputation, and various methods of imputing data.
- Creating dummy variables: creating dummy variables for categorical variables to be used in the predictive models.
- Basic plotting: scatter plots, histograms, and boxplots; their meaning and relevance; and how they are plotted.
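
As a rough illustration of the import, exploration, missing-value, and dummy-variable steps listed above, here is a minimal sketch that assumes pandas is installed. The inline data, the column names, and the semicolon delimiter are hypothetical and chosen only for this example. Chunked reading is shown with the chunksize option of read_csv, which is a different route from the open-based approach mentioned in the list.

```python
import io

import pandas as pd

# Hypothetical data standing in for a file on disk (assumption for this
# sketch); one Age and one Gender value are left empty on purpose.
RAW = "Asha;34;F;52000\nBen;;M;48000\nChen;29;;61000\nDina;41;F;58000\n"

# Import: custom delimiter, no header row, column names supplied from a list.
columns = ["Name", "Age", "Gender", "Income"]
df = pd.read_csv(io.StringIO(RAW), sep=";", header=None, names=columns)

# Import in chunks: with chunksize, read_csv yields one DataFrame per chunk.
chunk_reader = pd.read_csv(
    io.StringIO(RAW), sep=";", header=None, names=columns, chunksize=3
)
chunk_sizes = [len(chunk) for chunk in chunk_reader]

# Basic exploratory analysis.
print(df.head())            # thumbnail of the data
print(df.shape)             # (rows, columns)
print(df.columns.tolist())  # column names
print(df.dtypes)            # column types
print(df.describe())        # summary statistics for numerical columns

# Missing values: count them, then treat by deletion or by imputation.
missing_per_column = df.isnull().sum()
dropped = df.dropna()  # deletion: keep only fully populated rows
imputed = df.copy()
imputed["Age"] = imputed["Age"].fillna(imputed["Age"].mean())  # mean imputation
imputed["Gender"] = imputed["Gender"].fillna("Unknown")        # constant label

# Dummy variables: one indicator column per category of Gender.
dummies = pd.get_dummies(imputed["Gender"], prefix="Gender")
model_ready = pd.concat([imputed.drop(columns="Gender"), dummies], axis=1)
print(model_ready)
```

Mean imputation and a constant "Unknown" label are only two of several possible imputation choices; which one suits a given column depends on the data and the model.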
This chapter is a first step in our journey to explore data and wrangle it until it is ready for modelling. The next chapter goes deeper in this pursuit: we will learn to aggregate values for categorical variables, subset a dataset, merge two datasets, generate random numbers, and sample a dataset.
Data cleaning, as we saw in the last chapter, takes about 80% of the modelling time, so it is of critical importance, and the methods we are learning will come in handy in that work.