
Using and understanding IPython (Jupyter) Notebooks

Congratulations on your installation! Let's now explore Jupyter Notebooks, also known as IPython Notebooks. Jupyter Notebook is the more modern name, but a lot of people still say IPython Notebook, and for working developers I consider the two names interchangeable. The name IPython Notebook also helps me remember the notebook file suffix, .ipynb, which you'll get to know very well in this book!

Okay, so let's take it from the top with our first exploration of the IPython/Jupyter Notebook. If you haven't done so yet, navigate to the DataScience folder where you downloaded all the materials for this book. For me, that's E:\DataScience. If you didn't do so during the preceding installation section, double-click the Outliers.ipynb file now to open it.

When you double-click this .ipynb file, it first starts up Canopy (if it isn't running already) and then launches a web browser. This is how the full Outliers notebook webpage looks within my browser:

As you can see, notebooks let me intersperse notes and commentary with the actual code, and you can run that code right in your web browser! That makes them a handy format for giving you a reference you can come back to later, both to remind yourself how the algorithms we're going to discuss work and to experiment and play with them yourself.

IPython/Jupyter Notebook files run inside your browser like a webpage, but they're backed by the Python engine you installed, which is what actually executes the code. You should be seeing a screen similar to the one shown in the previous screenshot.
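Behind the browser view, an .ipynb file is plain JSON: a top-level `cells` list in which each cell has a `cell_type` (`"markdown"` for notes and commentary, `"code"` for runnable code) and a `source`. The sketch below assumes the nbformat 4 layout and uses a tiny hypothetical notebook built in memory, not the real Outliers.ipynb; the helper name `code_cells` is also made up for illustration. It extracts just the code cells using only the standard-library `json` module.

```python
import json

# A tiny hypothetical notebook in nbformat 4 layout (not the real Outliers.ipynb)
notebook_text = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 2,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["A note explaining the next cell."]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["incomes.mean()"]},
    ],
})

def code_cells(text):
    """Return the source of every code cell in a notebook's JSON text."""
    nb = json.loads(text)
    sources = []
    for cell in nb["cells"]:
        if cell["cell_type"] == "code":
            src = cell["source"]
            # nbformat allows source to be one string or a list of line strings
            sources.append(src if isinstance(src, str) else "".join(src))
    return sources

print(code_cells(notebook_text))  # ['incomes.mean()']
```

To try it on a real notebook, you could pass in the file's contents, for example `code_cells(open("Outliers.ipynb", encoding="utf-8").read())`.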

As you scroll down the notebook in your browser, you'll notice code blocks; they're easy to spot because they contain the actual code. Find the code block containing the following code, quite near the top of the Outliers notebook:

%matplotlib inline 
import numpy as np 
 
incomes = np.random.normal(27000, 15000, 10000) 
incomes = np.append(incomes, [1000000000]) 
 
import matplotlib.pyplot as plt 
plt.hist(incomes, 50) 
plt.show() 

Let's take a quick look at this code while we're here. It sets up a simple income distribution: np.random.normal(27000, 15000, 10000) simulates the incomes of 10,000 people, normally distributed around a mean of $27,000 with a standard deviation of $15,000. To illustrate the effect an outlier can have on that distribution, np.append() then adds a single income of $1,000,000,000, simulating Donald Trump entering the mix and messing up the mean of the income distribution. Finally, plt.hist() plots the result as a histogram with 50 bins. By the way, I'm not making a political statement; this was all done before Trump became a politician. So, full disclosure there.
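If you want to reproduce this data outside the notebook, or get identical numbers every time you run it, here is a minimal plot-free sketch. It assumes NumPy's newer `np.random.default_rng` Generator API (the notebook itself uses the older `np.random.normal`) and an arbitrary seed of 42; both are illustrative choices, not part of the original code. It builds the same 10,000 normally distributed incomes plus the single billion-dollar outlier and compares the median with the mean.

```python
import numpy as np

# Seeded generator so repeated runs give identical data (seed 42 is arbitrary)
rng = np.random.default_rng(42)

# 10,000 incomes: mean $27,000, standard deviation $15,000
incomes = rng.normal(27000, 15000, 10000)

# Append one extreme value as the outlier
incomes = np.append(incomes, [1000000000])

print(len(incomes))        # 10001
print(np.median(incomes))  # close to 27,000; the outlier barely moves it
print(np.mean(incomes))    # pulled up to roughly 127,000
```

The median is robust to the single extreme value, while the mean is dragged far above the typical income.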

You can select any code block in the notebook by clicking on it. So click in the code block containing the code we just looked at, and then hit the Run button at the top to run it. Here's the area at the top of the screen where you'll find the Run button:

Hitting the Run button with the code block selected regenerates this graph:

Similarly, click on the next code block a little further down; you'll spot it because it contains this single line of code:

incomes.mean() 

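If you select the code block containing this line and hit the Run button, you'll see the output below it. It ends up being a very large value, far above the $27,000 center of our distribution, because of the effect of that outlier. Since the incomes are generated randomly, your exact digits will differ, but it should be something like this: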

127148.50796177129
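To see why the mean lands near 127,000 rather than 27,000, you can estimate it by hand: with 10,000 incomes averaging about $27,000 plus one $1,000,000,000 value, the mean of all 10,001 values is roughly (10,000 times 27,000 plus 1,000,000,000) divided by 10,001. The quick calculation below plugs in the simulation's nominal parameters rather than the random sample, so it will differ slightly from the value your notebook prints.

```python
# Nominal parameters from the simulation
n_typical = 10000          # number of normally distributed incomes
typical_mean = 27000       # mean of those incomes
outlier = 1000000000       # the single extreme income

# Mean of all 10,001 values if the typical incomes averaged exactly 27,000
expected_mean = (n_typical * typical_mean + outlier) / (n_typical + 1)
print(round(expected_mean, 2))  # 126987.3

# The part of that mean contributed by the one outlier alone
outlier_share = outlier / (n_typical + 1)
print(round(outlier_share))     # 99990
```

Almost 100,000 of the mean comes from a single person, which is exactly the distortion the next step tries to remove.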

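Let's keep going and have some fun. In the next code block down, you'll see the following code, which tries to detect outliers like Donald Trump and remove them from the dataset. It computes the median (u) and standard deviation (s) of the data and keeps only the values that lie within two standard deviations of the median: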

def reject_outliers(data): 
    u = np.median(data) 
    s = np.std(data) 
    filtered = [e for e in data if (u - 2 * s < e < u + 2 * s)] 
    return filtered 
 
filtered = reject_outliers(incomes) 
plt.hist(filtered, 50) 
plt.show() 

Select the corresponding code block in the notebook and press the Run button again. When you do, you'll see this graph instead:

Now we see a much better histogram that represents the more typical American, now that we've taken out the outlier that was messing things up.
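The list comprehension in `reject_outliers` loops over every value in Python. A NumPy boolean mask applies the same rule (keep values strictly within two standard deviations of the median) in one vectorized step and returns an array instead of a list. The sketch below is an alternative written for illustration, not the notebook's code; the function name `reject_outliers_np`, its `m` parameter, and the seed are assumptions. Note that the standard deviation is computed with the outlier still in the data, so it is inflated to roughly ten million; the filter drops the billion-dollar value here only because that value is so extreme.

```python
import numpy as np

def reject_outliers_np(data, m=2.0):
    """Keep values strictly within m standard deviations of the median."""
    data = np.asarray(data)
    u = np.median(data)
    s = np.std(data)
    # Boolean mask: True where the value lies inside the band
    mask = (data > u - m * s) & (data < u + m * s)
    return data[mask]

# Rebuild the same kind of data as the notebook, with an arbitrary fixed seed
rng = np.random.default_rng(0)
incomes = np.append(rng.normal(27000, 15000, 10000), [1000000000])

filtered = reject_outliers_np(incomes)
print(len(incomes) - len(filtered))  # 1: only the billion-dollar value is dropped
print(filtered.mean())               # back near 27,000
```

Because the band is so wide, this rule (in either form) keeps every ordinary income, including the few negative values the normal distribution produces; it only catches outliers that dwarf everything else.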

At this point, you have everything you need to get started: all the data, all the scripts, and a development environment for Python and Python notebooks. So, let's rock and roll. Up next is a little crash course on Python itself. Even if you're already familiar with Python, it might be a good refresher, so you may want to go through it regardless. Let's dive in and learn Python.
