
Installing software and setting up

For most projects in this book, we need scikit-learn (refer to http://scikit-learn.org/stable/install.html) and matplotlib (refer to http://matplotlib.org/users/installing.html). Both packages require NumPy, but we also need SciPy for sparse matrices, as mentioned before. The scikit-learn library is a machine learning package that is optimized for performance; a lot of its code runs almost as fast as equivalent C code. The same is true for NumPy and SciPy. There are various ways to speed up the code further; however, they are out of scope for this book, so if you want to know more, please consult the documentation.

matplotlib is a plotting and visualization package. We can also use the seaborn package for visualization; seaborn uses matplotlib under the hood. There are several other Python visualization packages that cover different usage scenarios. matplotlib and seaborn are mostly useful for visualizing small to medium datasets.

The NumPy package offers the ndarray class and various useful array functions. An ndarray is an array that can have one or more dimensions. The class also has several subclasses representing matrices, masked arrays, and heterogeneous record arrays. In machine learning, we mainly use NumPy arrays to store feature vectors or matrices composed of feature vectors. SciPy builds on NumPy arrays and offers a variety of scientific and mathematical functions. We also require the pandas library for data wrangling.
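As a minimal sketch of how feature vectors are typically stored, the following snippet builds a one-dimensional ndarray for a single sample and a two-dimensional ndarray for a small dataset. The feature values and variable names are made up for illustration and do not come from any dataset used in this book.

```python
import numpy as np

# One sample described by three features (illustrative values).
feature_vector = np.array([5.1, 3.5, 1.4])

# A small dataset: each row is the feature vector of one sample.
feature_matrix = np.array([
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.2, 3.4, 5.4],
])

print(feature_vector.ndim, feature_vector.shape)  # 1 (3,)
print(feature_matrix.ndim, feature_matrix.shape)  # 2 (3, 3)

# Array functions operate on whole arrays; axis=0 aggregates over the samples,
# giving one mean per feature column.
column_means = feature_matrix.mean(axis=0)
print(column_means)
```

Storing samples as rows and features as columns is the layout that scikit-learn estimators generally expect.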

In this book, we will use Python 3. As you may know, Python 2 is no longer supported as of January 1, 2020, so I strongly recommend switching to Python 3. If you are stuck with Python 2, you should still be able to modify the example code to work for you. In my opinion, the Anaconda Python 3 distribution is the best option. Anaconda is a free Python distribution for data analysis and scientific computing. It has its own package manager, conda. The distribution includes more than 200 Python packages, which makes it very convenient. For casual users, the Miniconda distribution may be the better choice, since it contains only the conda package manager and Python.
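If you are unsure which interpreter runs your scripts, a small check along the following lines may help. This is only a sketch; the helper name is_python3 is hypothetical and not part of any library.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple belongs to Python 3 or later."""
    return version_info[0] >= 3


# Stop early with a readable message instead of failing later on a syntax error.
if not is_python3():
    raise SystemExit("This example code targets Python 3; found " + sys.version.split()[0])

print("Running Python", sys.version.split()[0])
```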

The procedures to install Anaconda and Miniconda are similar; Anaconda simply takes more disk space because it bundles more packages. Follow the instructions from the Anaconda website at http://conda.pydata.org/docs/install/quick.html. First, download the appropriate installer for your operating system and Python version. Sometimes you can choose between a GUI and a command-line installer. I used the Python 3 installer, although my system Python version is 2.7. This is possible because Anaconda comes with its own Python. On my machine, the Anaconda installer created an anaconda directory in my home directory and required about 900 MB. The Miniconda installer creates a miniconda directory in your home directory.

Installation instructions for NumPy are at http://docs.scipy.org/doc/numpy/user/install.html.

Alternatively, install NumPy with pip as follows:

$ [sudo] pip install numpy

The command for Anaconda users is:

$ conda install numpy

To install the other dependencies, replace numpy in these commands with the name of the appropriate package. Please read the documentation carefully, because not all options work equally well on every operating system. The pandas installation documentation is at http://pandas.pydata.org/pandas-docs/dev/install.html.
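After installing, you may want to confirm that everything is importable. The following sketch uses only the standard library (importlib.metadata requires Python 3.8 or later). The REQUIRED mapping and the helper names installed_version and check_packages are assumptions made for this illustration. Note that scikit-learn is imported as sklearn.

```python
import importlib.util
from importlib import metadata

# Import name -> distribution name used by pip and conda.
# The two differ for scikit-learn, which is imported as "sklearn".
REQUIRED = {
    "numpy": "numpy",
    "scipy": "scipy",
    "sklearn": "scikit-learn",
    "matplotlib": "matplotlib",
    "pandas": "pandas",
}


def installed_version(import_name, dist_name):
    """Return the installed version string, or None if the package is missing."""
    if importlib.util.find_spec(import_name) is None:
        return None
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata was found under this name.
        return "unknown"


def check_packages(required=REQUIRED):
    """Map each distribution name to its installed version (None if missing)."""
    return {dist: installed_version(imp, dist) for imp, dist in required.items()}


if __name__ == "__main__":
    for dist, version in check_packages().items():
        print(f"{dist:15s} {version or 'MISSING'}")
```

Any package reported as MISSING can then be installed with the pip or conda command shown earlier.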
