Python and its packages for predictive modelling

In this section, we will discuss some commonly used packages for predictive modelling.

pandas: pandas is arguably the most important and versatile package in the data science domain, so it is no surprise to see import pandas at the top of almost any data science code snippet, in this book and elsewhere. Among other things, the pandas package facilitates:

  • Reading a dataset into a usable format (a data frame, in the case of Python)
  • Calculating basic statistics
  • Running basic operations such as sub-setting a dataset, merging or concatenating two datasets, handling missing data, and so on

The various methods in pandas will be explained in this book as and when we use them.

Note

To get an overview, navigate to the official page of pandas here: http://pandas.pydata.org/index.html
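
The three capabilities listed above can be sketched in a few lines. This is only an illustration under stated assumptions: the CSV content, the column names (customer_id, age, spend, region), and the choice of filling missing values with the median are hypothetical and do not come from any dataset used in this book.

```python
import io

import pandas as pd

# Hypothetical CSV content, used here in place of a file on disk.
csv_text = """customer_id,age,spend
1,34,120.5
2,45,
3,29,80.0
4,52,200.0
"""

# Read the dataset into a data frame.
customers = pd.read_csv(io.StringIO(csv_text))

# Basic statistics for a numeric column.
mean_age = customers["age"].mean()

# Sub-setting: keep only the rows where age exceeds 30.
over_30 = customers[customers["age"] > 30]

# Handling missing data: fill the missing spend with the column median.
customers["spend"] = customers["spend"].fillna(customers["spend"].median())

# Merging with a second (hypothetical) dataset on a shared key.
regions = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "region": ["north", "south", "north", "east"],
})
merged = customers.merge(regions, on="customer_id", how="left")

print(merged)
```

With a real file, the io.StringIO object would simply be replaced by the path to the CSV file.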

NumPy: NumPy, in many ways, is a MATLAB equivalent in the Python environment. It has powerful methods to do mathematical calculations and simulations. The following are some of its features:

  • A powerful and widely used N-d array object
  • A collection of powerful mathematical functions for linear algebra, Fourier transforms, and random number generation
  • Random number generators combined with N-d arrays, which can be used to generate dummy datasets to demonstrate various procedures, a practice we will follow extensively in this book

Note

To get an overview, navigate to the official page of NumPy at http://www.NumPy.org/
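
As a minimal sketch of the features listed above, the following snippet builds a dummy dataset from a random number generator and an N-d array, then applies a linear algebra routine and a Fourier transform to it. The seed, array sizes, and coefficient values are arbitrary assumptions chosen for illustration.

```python
import numpy as np

# Seeded generator so that the dummy data is reproducible.
rng = np.random.default_rng(seed=42)

# Dummy dataset: 100 rows and 3 features drawn from a standard normal distribution.
X = rng.normal(size=(100, 3))

# Hypothetical "true" coefficients and a noisy linear response built from them.
true_coef = np.array([2.0, -1.0, 0.5])
y = X @ true_coef + rng.normal(scale=0.1, size=100)

# Linear algebra: least-squares estimate of the coefficients.
est_coef, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

# Fourier transform: a sine wave completing 4 cycles over 64 samples.
t = np.arange(64)
signal = np.sin(2 * np.pi * 4 * t / 64)
spectrum = np.abs(np.fft.rfft(signal))
dominant_bin = int(np.argmax(spectrum))

print(est_coef)       # estimates close to true_coef
print(dominant_bin)   # index of the strongest frequency component
```

Because the noise added to y is small, the estimated coefficients should land close to the hypothetical ones used to generate the data.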

matplotlib: matplotlib is a Python library that easily generates high-quality 2-D plots. Again, it is very similar to MATLAB.

  • It can be used to draw all kinds of common plots, such as histograms, stacked and unstacked bar charts, scatterplots, heat maps, box plots, power spectra, error charts, and so on
  • It can be used to edit and manipulate plot properties such as the title, axes properties, color, scale, and so on

Note

To get an overview, navigate to the official page of matplotlib at: http://matplotlib.org
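
The two points above, drawing common plot types and adjusting their properties, can be illustrated with a short sketch. The data is randomly generated dummy data, and the titles, colors, and figure size are arbitrary assumptions. The Agg backend is selected only so that the snippet runs without a display; that line can be dropped in a notebook or a desktop session.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit in a notebook or desktop session

import matplotlib.pyplot as plt
import numpy as np

# Dummy data for the two plots.
rng = np.random.default_rng(seed=0)
values = rng.normal(loc=50, scale=10, size=500)
x = rng.uniform(1, 100, size=200)
y = x ** 2 * rng.uniform(0.8, 1.2, size=200)

# One figure with two side-by-side axes.
fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Histogram with a custom color, title, and axis labels.
ax_hist.hist(values, bins=20, color="steelblue")
ax_hist.set_title("Histogram of dummy values")
ax_hist.set_xlabel("Value")
ax_hist.set_ylabel("Frequency")

# Scatterplot with a logarithmic y-axis.
ax_scatter.scatter(x, y, color="darkorange", s=10)
ax_scatter.set_title("Scatterplot on a log scale")
ax_scatter.set_xlabel("x")
ax_scatter.set_ylabel("y")
ax_scatter.set_yscale("log")

fig.tight_layout()
```

In an interactive session, calling plt.show() after the last line displays the figure.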

IPython: IPython provides an environment for interactive computing.

It provides a browser-based notebook that serves as a development environment supporting code, rich media, inline plots, and model summaries. Notebooks and their contents can be saved and used later to demonstrate results as they are, or the code can be saved separately and executed. The notebook has emerged as a powerful tool for web-based tutorials because the code and its results follow one another in the same document. We will use this environment in many places in this book.

Note

To get an overview, navigate to the official page of IPython here: http://ipython.org/

scikit-learn: scikit-learn is the mainstay of predictive modelling in Python. It is a robust collection of widely used machine learning algorithms, together with the methods needed to apply them. Some of the features of scikit-learn are as follows:

  • It is built on NumPy and SciPy, and it works well alongside pandas and matplotlib
  • It is very simple and efficient to use
  • It has methods to implement most of the predictive modelling techniques, such as linear regression, logistic regression, clustering, and decision trees
  • It provides concise methods to predict the outcome from a fitted model and to measure the accuracy of the predictions

Note

To get an overview, navigate to the official page of scikit-learn here: http://scikit-learn.org/stable/index.html
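
The fit, predict, and measure workflow described above can be sketched as follows. This is a minimal illustration, not a recipe used later in the book: the dummy dataset, the rule that defines the class label, the 25% test split, and the choice of logistic regression with default settings are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(seed=1)

# Dummy dataset: two features; the class is 1 when their sum is positive.
X = rng.normal(size=(400, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out a quarter of the rows to measure accuracy on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the model on the training rows.
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict the outcome for the held-out rows and measure the accuracy.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

print(accuracy)
```

Most scikit-learn estimators share the same fit and predict methods, so swapping in another model usually requires changing only the line that creates it.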

Any other Python packages used in this book will be introduced as the situation requires and can be installed using the method described earlier in this section.
