
Mathematical operations allowed

Remember, at the interval level, we have addition and subtraction to work with. This is a real game-changer. With the ability to add and subtract values, we can introduce two familiar concepts: the arithmetic mean (referred to simply as the mean) and the standard deviation. To see a great example of this, let's pull in a new dataset, one about climate change:

import pandas as pd

# load in the data set
climate = pd.read_csv('../data/GlobalLandTemperaturesByCity.csv')
climate.head()

Let us have a look at the following table for a better understanding:

Let's see if we have any missing values with the following line of code:

climate.isnull().sum()

dt                               0
AverageTemperature               0
AverageTemperatureUncertainty    0
City                             0
Country                          0
Latitude                         0
Longitude                        0
year                             0
dtype: int64

# All good
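The isnull().sum() idiom works because isnull() returns a DataFrame of Booleans (True wherever a value is missing), and summing Booleans counts each True as 1. Here is a minimal sketch on a tiny, made-up frame; the city labels and values are hypothetical and not taken from the climate data:

```python
import numpy as np
import pandas as pd

# A tiny, hypothetical frame with two missing temperatures (not from the climate dataset)
df = pd.DataFrame({
    "City": ["City A", "City B", "City C"],
    "AverageTemperature": [6.1, np.nan, np.nan],
})

# isnull() marks each missing cell as True; sum() then counts the Trues per column
missing = df.isnull().sum()
print(missing)  # City: 0 missing, AverageTemperature: 2 missing
```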

The column in question is called AverageTemperature. Temperature is data at the interval level, and with data like this a bar or pie chart is impractical because there are far too many distinct values:

# show us the number of unique items
climate['AverageTemperature'].nunique()

111994

Plotting 111,994 distinct values as bars or pie slices would be absurd, and it also makes little sense because we know that the data is quantitative. The most common graph to use starting at this level is likely the histogram. This graph is a cousin of the bar graph: it groups quantities into buckets (bins) and shows how often values fall into each bucket.
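To make the idea of buckets concrete, here is a minimal sketch using NumPy's np.histogram on ten made-up temperatures (hypothetical values, not from the dataset). It splits the range into four equal-width buckets and counts how many values fall into each, which is what the bars of a histogram represent:

```python
import numpy as np

# Ten hypothetical temperatures in degrees Celsius (illustrative only)
temps = np.array([-5.0, 2.5, 8.0, 12.0, 14.5, 19.0, 21.0, 22.5, 24.0, 30.0])

# Split the range [-5, 30] into 4 equal-width buckets and count the values in each
counts, edges = np.histogram(temps, bins=4)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.2f} to {right:6.2f}: {n}")
# counts per bucket: 2, 2, 3, 3
```

pandas' hist() draws the same kind of counts as bars, using 10 buckets by default unless you pass a different bins value.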

Let's look at a histogram of AverageTemperature around the world to get a holistic view of the distribution of temperatures:

climate['AverageTemperature'].hist()

The following is the output of the preceding code:

Here, it looks as though the average value is around 20°C. Let's confirm this:

climate['AverageTemperature'].describe()

count    8.235082e+06
mean     1.672743e+01
std      1.035344e+01
min     -4.270400e+01
25%      1.029900e+01
50%      1.883100e+01
75%      2.521000e+01
max      3.965100e+01
Name: AverageTemperature, dtype: float64
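The mean and std rows of describe() rely on exactly the operations the interval level gives us. A minimal sketch with five made-up temperatures (hypothetical values) computes both by hand: the mean needs addition, and the standard deviation needs subtraction from the mean. It divides by n - 1, matching the sample standard deviation that pandas reports by default:

```python
import math

# Five hypothetical monthly temperatures in degrees Celsius (illustrative only)
temps = [10.0, 12.0, 14.0, 16.0, 18.0]

# Mean: add the values together, then divide by how many there are
mean = sum(temps) / len(temps)

# Sample standard deviation: subtract the mean from each value, square the
# deviations, sum them, divide by n - 1, and take the square root
variance = sum((t - mean) ** 2 for t in temps) / (len(temps) - 1)
std = math.sqrt(variance)

print(mean, round(std, 4))  # 14.0 3.1623
```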

We were in the right range, but the mean is actually about 16.7°C. Let's make this a bit more fun by adding new columns called year and century, and by subsetting the data to only the temperatures recorded in the US:

# Convert the dt column to datetime and extract the year
climate['dt'] = pd.to_datetime(climate['dt'])
climate['year'] = climate['dt'].dt.year

# Subset the data to just the US (.copy() avoids a SettingWithCopyWarning below)
climate_sub_us = climate.loc[climate['Country'] == 'United States'].copy()

# Integer division (//) maps each year to its century label
climate_sub_us['century'] = climate_sub_us['year'] // 100 + 1
# 1983 would become 20
# 1750 would become 18
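Mapping a year to a century label needs integer (floor) division. In Python 3, the / operator always returns a float, so 1983 / 100 + 1 is about 20.83 rather than 20. The helper below, year_to_century, is a hypothetical name used only for illustration; the // operator it relies on works the same way on a whole pandas column:

```python
def year_to_century(year: int) -> int:
    """Map a calendar year to a century label (hypothetical helper): 1983 -> 20."""
    # // discards the fractional part, so 1983 // 100 is 19, not 19.83
    return year // 100 + 1

for y in (1750, 1899, 1983, 2013):
    print(y, year_to_century(y))
# 1750 18, 1899 19, 1983 20, 2013 21
```

Note that this convention puts 1900 in the 20th century, whereas the strict calendar definition counts 1900 as the last year of the 19th.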

With the new column century, let's plot four histograms of temperature, one for each century:

climate_sub_us['AverageTemperature'].hist(by=climate_sub_us['century'],
                                          sharex=True, sharey=True,
                                          figsize=(10, 10),
                                          bins=20)

The following is the output of the preceding code:

Here, we have our four histograms, showing that the AverageTemperature is going up slightly. Let's confirm this:

climate_sub_us.groupby('century')['AverageTemperature'].mean().plot(kind='line')

The following is the output of the preceding code:

Interesting! And because differences are meaningful at the interval level, we can answer the question of how much, on average, the temperature has risen in the US since the 18th century. Let's first store the average temperature for each century as its own pandas Series object:

century_changes = climate_sub_us.groupby('century')['AverageTemperature'].mean()

century_changes

century
18    12.073243
19    13.662870
20    14.386622
21    15.197692
Name: AverageTemperature, dtype: float64

Now let's use the Series index to subtract the 18th-century value from the 21st-century value, which gives the difference in average temperature:

# 21st century average temp in US minus 18th century average temp in US
century_changes[21] - century_changes[18]

# average difference in monthly recorded temperature in the US since the 18th century
3.12444911546
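Why is this subtraction meaningful while a statement such as "the 21st century is 1.26 times as warm" is not? On an interval scale the zero point is arbitrary, so changing units rescales differences consistently but does not preserve ratios. A minimal sketch using the two per-century means from the century_changes output above (the c_to_f helper is a hypothetical name introduced here) converts them to Fahrenheit:

```python
# Per-century US means in degrees Celsius, copied from the century_changes output
c18, c21 = 12.073243, 15.197692

def c_to_f(c: float) -> float:
    """Convert Celsius to Fahrenheit (hypothetical helper)."""
    return c * 9 / 5 + 32

# A difference survives the change of scale: it is simply multiplied by 9/5
diff_c = c21 - c18
diff_f = c_to_f(c21) - c_to_f(c18)
print(round(diff_c, 4), round(diff_f, 4))  # 3.1244 5.624

# A ratio does not: the two scales disagree about how many "times warmer" it is
print(c21 / c18, c_to_f(c21) / c_to_f(c18))
```

The Celsius difference of about 3.12 degrees becomes about 5.62 degrees Fahrenheit, exactly 9/5 as large, but the ratio of the two means is roughly 1.26 in Celsius and roughly 1.10 in Fahrenheit, so only the difference carries meaning.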