
Panel data

So far, we have seen data taken from multiple individuals at one point in time (cross-sectional) or taken from a single entity over multiple points in time (time series). If we observe multiple entities over multiple points in time, we get panel data, also known as longitudinal data. Extending our earlier example about military expenditure, let us now consider four countries over the same period of 1960-2010. The resulting data is a panel dataset. The following figure illustrates the panel data in this scenario. Rows with missing values, corresponding to the period 1960 to 1987, were dropped before plotting the data.

Figure 1.4: Example of panel data
A generic panel data regression model can be stated as $y_{it} = W x_{it} + b + \epsilon_{it}$, which expresses the dependent variable $y_{it}$ as a linear function of the explanatory variable $x_{it}$, where $W$ holds the weights of $x_{it}$, $b$ is the bias term, and $\epsilon_{it}$ is the error. Here, $i$ indexes the individuals for whom data is collected and $t$ indexes the points in time. This type of panel data analysis therefore seeks to model variation across both individuals and points in time. The variation is reflected in $\epsilon_{it}$, and the assumptions made about it determine the necessary mathematical treatment. For example, if $\epsilon_{it}$ is assumed to vary non-stochastically with respect to $i$ and $t$, the model becomes analogous to a dummy variable model, with the remaining error treated as random noise. This type of analysis is known as a fixed effects model. On the other hand, if $\epsilon_{it}$ varies stochastically over $i$ and $t$, the error requires special treatment, which is handled by a random effects model.
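To make the fixed effects idea concrete, the following is a minimal sketch of a least squares dummy variable fit using NumPy. It is not the book's code: the helper name `fit_fixed_effects`, the synthetic data, and the choice to use dummies for individuals only (not for time points) are all assumptions made for illustration.

```python
import numpy as np

def fit_fixed_effects(y, x, entity_ids):
    """Fit y_it = w * x_it + b_i + e_it by ordinary least squares.

    Hypothetical helper: each individual i gets its own dummy column,
    so its non-stochastic effect is absorbed into an intercept b_i.
    """
    entities = np.unique(entity_ids)
    # One 0/1 dummy column per individual
    dummies = (entity_ids[:, None] == entities[None, :]).astype(float)
    design = np.column_stack([x, dummies])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0], dict(zip(entities.tolist(), coef[1:].tolist()))

# Synthetic panel with made-up numbers: 3 individuals over 20 periods
rng = np.random.default_rng(0)
n_entities, n_periods = 3, 20
entity_ids = np.repeat(np.arange(n_entities), n_periods)
true_slope = 2.0
true_intercepts = np.array([1.0, -0.5, 3.0])
x = rng.normal(size=n_entities * n_periods)
noise = rng.normal(scale=0.1, size=x.size)
y = true_slope * x + true_intercepts[entity_ids] + noise

slope, intercepts = fit_fixed_effects(y, x, entity_ids)
print(slope, intercepts)
```

With this setup, the estimated slope and per-individual intercepts should land close to the values used to generate the data. A random effects model would instead treat the individual effects as random draws and would need a different estimator, which this sketch does not cover.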

Let us prepare the data required to plot the preceding figure. We will continue to expand the code used earlier in this chapter for the cross-sectional and time series data. We start by extracting the data for each of the four countries shown in the preceding plot. This is done as follows:

import pandas as pd

# data, index0 and index1 carry over from the earlier examples in this chapter.
# index0 and index1 are assumed to be the integer positions of the first and
# last year columns. DataFrame.ix was removed in pandas 1.0, so rows are
# selected with .loc and the year columns with .iloc.
is_mil_exp = data['Indicator Name']=='Military expenditure (% of GDP)'

# China
chn = data.loc[is_mil_exp & (data['Country Code']=='CHN')].iloc[:, index0:index1+1]
chn = pd.Series(data=chn.values[0], index=chn.columns)
chn.dropna(inplace=True)

# United States
usa = data.loc[is_mil_exp & (data['Country Code']=='USA')].iloc[:, index0:index1+1]
usa = pd.Series(data=usa.values[0], index=usa.columns)
usa.dropna(inplace=True)

# India
ind = data.loc[is_mil_exp & (data['Country Code']=='IND')].iloc[:, index0:index1+1]
ind = pd.Series(data=ind.values[0], index=ind.columns)
ind.dropna(inplace=True)

# United Kingdom
gbr = data.loc[is_mil_exp & (data['Country Code']=='GBR')].iloc[:, index0:index1+1]
gbr = pd.Series(data=gbr.values[0], index=gbr.columns)
gbr.dropna(inplace=True)
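The per-country series can also be combined into a single table indexed by country and year, which is the usual shape of a panel dataset. The following is a minimal sketch under stated assumptions: the values are made-up placeholders rather than real expenditure figures, only two countries are shown, and the column name `military_expenditure_pct_gdp` is hypothetical.

```python
import pandas as pd

# Made-up placeholder values standing in for the per-country Series
# built above; each Series is indexed by year
usa = pd.Series([1.0, 2.0, 3.0], index=['1988', '1989', '1990'])
chn = pd.Series([4.0, 5.0], index=['1989', '1990'])

# Long format: one row per (country, year) observation
panel = pd.concat({'USA': usa, 'CHN': chn})
panel = panel.rename_axis(['country', 'year'])
panel = panel.rename('military_expenditure_pct_gdp').to_frame()

# Wide format: one row per year and one column per country,
# with NaN where a country has no observation for that year
wide = panel['military_expenditure_pct_gdp'].unstack('country')
print(panel)
print(wide)
```

Selecting one year from the long table gives a cross-sectional slice, and selecting one country gives a time series, which is how panel data relates to the two types seen earlier.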

Now that the data is ready for all four countries, we will plot them using the following code:

plt.figure(figsize=(5.5, 5.5))
usa.plot(linestyle='-', marker='*', color='b')
chn.plot(linestyle='-', marker='*', color='r')
gbr.plot(linestyle='-', marker='*', color='g')
ind.plot(linestyle='-', marker='*', color='y')
plt.legend(['USA','CHINA','UK','INDIA'], loc=1)
plt.title('Military expenditure of 4 countries')
plt.ylabel('Military expenditure (% of GDP)')
plt.xlabel('Years')
The Jupyter notebook that has the code used for generating all the preceding figures is Chapter_1_Different_Types_of_Data.ipynb under the code folder in the GitHub repo.

The discussion about different types of data sets the stage for a closer look at time series. We will start by examining the special properties typically found in time series data, or in panel data that contains time series.
