官术网_书友最值得收藏!

  • Practical Time Series Analysis
  • Dr. Avishek Pal Dr. PKS Prakash
  • 680字
  • 2021-07-08 10:18:26

General trend

When a time series exhibits an upward or downward movement in the long run, it is said to have a general trend. A quick way to check the presence of general trend is to plot the time series as in the following figure, which shows CO2 concentrations in air measured during 1974 through 1987:

Figure 1.5: Time series of CO2 readings with an upward trend

However, general trend might not be evident over a short run of the series. Short run effects such as seasonal fluctuations and irregular variations cause the time series to revisit lower or higher values observed in the past and hence can temporarily obfuscate any general trend. This is evident in the same time series of CO2 concentrations when zoomed in over the period of 1979 through 1981, as shown in the following figure. Hence to reveal general trend, we need a time series that dates substantially back in the past.

Figure 1.6: Shorter run of CO2 readings time series which is not able to reveal general trend

The general trend in the time series is due to fundamental shifts or systemic changes of the process or system it represents. For example, the upward movement of CO2 concentrations during 1974 through 1987 can be attributed to the gradual rise in automobiles and industrialization over these years.

A general trend is commonly modeled by setting up the time series as a regression against time and other known factors as explanatory variables. The regression or trend line can then be used as a prediction of the long run movement of the time series. Residuals left by the trend line is further analyzed for other interesting properties such as seasonality, cyclical behavior, and irregular variations.

Now, let us go through the code that generated the preceding plots on CO2 concentrations. We will also show how to build a trend model using linear regression on the time index (which in this case is the index of the year in the data) as explanatory variable and the CO2 concentration as the dependent variable. But first, let us load the data in a pandas.DataFrame.

The data for this example is in the Excel file Monthly_CO2_Concentrations.xlsx under the datasets folder of the GitHub repo.

We start by importing the required packages as follows:

from __future__ import print_function 
import os 
import pandas as pd 
import numpy as np 
%matplotlib inline 
from matplotlib import pyplot as plt 
import seaborn as sns 
os.chdir('D:\Practical Time Series') 
data = pd.read_excel('datasets/Monthly_CO2_Concentrations.xlsx',                                  converters={'Year': np.int32, 'Month':  np.int32}) 
data.head() 

We have passed the argument converters to the read_excel function in order to make sure that columns Year and Month are assigned the integer (np.int32) datatype. The preceding lines of code will generate the following table:

Before plotting we must remove all columns having missing values. Besides, the DataFrame is sorted in ascending order of Year and Month. These are done as follows:

data = data.ix[(~pd.isnull(data['CO2']))&\ 
               (~pd.isnull(data['Year']))&\ 
               (~pd.isnull(data['Month']))] 
data.sort_values(['Year', 'Month'], inplace=True) 

Finally, the plot for the time period 1974 to 1987 is generated by executing the following lines:

plt.figure(figsize=(5.5, 5.5))
data['CO2'].plot(color='b')
plt.title('Monthly CO2 concentrations')
plt.xlabel('Time')
plt.ylabel('CO2 concentratition')
plt.xticks(rotation=30)

The zoomed-in version of the data for the time period 1980 to 1981 is generated by after the DataFrame for these three years:

plt.figure(figsize=(5.5, 5.5))
data['CO2'].loc[(data['Year']==1980) | (data['Year']==1981)].plot(color='b')
plt.title('Monthly CO2 concentrations')
plt.xlabel('Time')
plt.ylabel('CO2 concentratition')
plt.xticks(rotation=30)

Next, let us fit the trend line. For this we import the LinearRegression class from scikit-learn and fit a linear model on the time index:

from sklearn.linear_model import LinearRegression 
trend_model = LinearRegression(normalize=True, fit_intercept=True) 
trend_model.fit(np.array(data.index).reshape((-1,1)), data['CO2']) 
print('Trend model coefficient={} and intercept={}'.format(trend_model.coef_[0], 
                                                           trend_model.intercept_) 
      ) 

This produces the following output:

Trend model coefficient=0.111822078545 and intercept=329.455422234 

The residuals obtained from the trend line model are shown in the following figure and appear to have seasonal behaviour, which is discussed in the next sub section.

The residuals are calculated and plotted in the preceding by the following lines of code:

residuals = np.array(data['CO2']) - trend_model.predict(np.array(data.index).reshape((-1,1)))
plt.figure(figsize=(5.5, 5.5))
pd.Series(data=residuals, index=data.index).plot(color='b')
plt.title('Residuals of trend model for CO2 concentrations')
plt.xlabel('Time')
plt.ylabel('CO2 concentratition')
plt.xticks(rotation=30)
Figure 1.7: Residuals from a linear model of the general trend in CO2 readings
主站蜘蛛池模板: 富源县| 康保县| 苏尼特右旗| 抚州市| 大英县| 九江县| 卓资县| 佛山市| 无锡市| 博乐市| 望江县| 布拖县| 夏河县| 鲁甸县| 新泰市| 大城县| 利辛县| 常熟市| 宁海县| 凤翔县| 滕州市| 武功县| 陈巴尔虎旗| 安达市| 水富县| 北川| 新安县| 喜德县| 介休市| 电白县| 兖州市| 长葛市| 潍坊市| 方正县| 牙克石市| 皋兰县| 比如县| 石河子市| 巴里| 丰城市| 陵川县|