
  • Feature Engineering Made Easy
  • Sinan Ozdemir, Divya Susarla
  • 324 words
  • 2021-06-25 22:45:55

Plotting two columns at the interval level

One large advantage of having two columns of data at the interval level, or higher, is that it opens us up to using scatter plots, where we graph one column on each axis and visualize data points as literal points on the graph. The year and AverageTemperature columns of our climate change dataset are both at the interval level, because differences between their values are meaningful. So let's take a crack at plotting all of the monthly recorded US temperatures as a scatter plot, where the x axis will be the year and the y axis will be the temperature. We hope to notice a trending increase in temperature, as the line graph previously suggested:

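import matplotlib.pyplot as plt

# Each row becomes one point: year on the x axis, temperature on the y axis
x = climate_sub_us['year']
y = climate_sub_us['AverageTemperature']
fig, ax = plt.subplots(figsize=(10, 5))
ax.scatter(x, y)
plt.show()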

The following is the output of the preceding code:

Oof, that's not pretty. There is a lot of noise, and that is to be expected: every year has multiple towns reporting multiple average temperatures, so it makes sense that we see a vertical column of points at each year.
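To see where those vertical columns come from, it can help to count the readings per year and look at their spread. The sketch below uses a tiny hypothetical DataFrame. Its values are made up for illustration and are not taken from the climate dataset, but it has the same `year` and `AverageTemperature` column names. Each year's min-to-max range is the height of that year's column of points in the scatter plot.

```python
import pandas as pd

# Hypothetical toy data (made-up values): several readings per year,
# mimicking many towns/months reporting within the same year.
toy = pd.DataFrame({
    'year': [1900, 1900, 1900, 1901, 1901, 1901],
    'AverageTemperature': [2.0, 12.0, 22.0, 3.0, 13.0, 23.0],
})

# Each row is one scatter point, so each year gets `count` points
# stacked vertically between its `min` and `max` temperature.
spread = toy.groupby('year')['AverageTemperature'].agg(['count', 'min', 'max'])
print(spread)
```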

Let's group by the year column and average the temperatures to remove much of this noise:

# Let's use a groupby to reduce the amount of noise in the US data:
# every reading within a year is averaged down to a single value
climate_sub_us.groupby('year')['AverageTemperature'].mean().plot()
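The groupby step collapses all rows that share a year into one averaged value, so the plot ends up with a single point per year. The minimal sketch below runs the operation on the same kind of hypothetical toy data (made-up values). Selecting `AverageTemperature` before calling `.mean()` also avoids averaging non-numeric columns, which recent pandas releases reject by default.

```python
import pandas as pd

# Hypothetical toy data (made-up values), several readings per year.
toy = pd.DataFrame({
    'year': [1900, 1900, 1900, 1901, 1901, 1901],
    'AverageTemperature': [2.0, 12.0, 22.0, 3.0, 13.0, 23.0],
})

# Select the column first, then aggregate: one mean per year.
yearly = toy.groupby('year')['AverageTemperature'].mean()
print(yearly)
# year
# 1900    12.0
# 1901    13.0
# Name: AverageTemperature, dtype: float64
```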

The following is the output of the preceding code:

Better! We can clearly see the increase over the years, but let's smooth the line a little more by taking a 10-year rolling mean:

# A 10-year moving average of the yearly means to smooth it all out:
climate_sub_us.groupby('year')['AverageTemperature'].mean().rolling(10).mean().plot()
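`rolling(10).mean()` replaces each yearly value with the average of that year and the nine years before it. The first nine years have no full window, so they come out as NaN, which is why the smoothed line starts later than the raw one. The sketch below uses a hypothetical series of yearly means (simply 0.0 to 11.0, not real temperatures) so the arithmetic is easy to check by hand.

```python
import pandas as pd

# Hypothetical yearly means: 0.0, 1.0, ..., 11.0 for the years 2000..2011.
yearly = pd.Series([float(v) for v in range(12)], index=range(2000, 2012))

# Trailing 10-year window, as in rolling(10).mean() above.
smoothed = yearly.rolling(10).mean()

# The first 9 entries lack a full 10-year window, so they are NaN.
print(smoothed.isna().sum())  # 9
# 2009 averages 2000..2009: (0 + 1 + ... + 9) / 10 = 4.5
print(smoothed.loc[2009])     # 4.5
# 2011 averages 2002..2011: (2 + 3 + ... + 11) / 10 = 6.5
print(smoothed.loc[2011])     # 6.5
```

Passing `min_periods` (for example `rolling(10, min_periods=1)`) would instead fill the early years with partial-window averages. Those early values would be noisier than the rest of the line.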

The following is the output of the preceding code:

So, our ability to plot two columns of data at the interval level has reconfirmed what the previous line graph suggested: there does seem to be a general upward trend in average temperature across the US.

The interval level of data provides a whole new level of understanding of our data, but we aren't done yet.
