- Python:Data Analytics and Visualization
- Phuong Vo.T.H Martin Czygan Ashish Kumar Kirthi Raman
- 2021-07-09 18:51:43
Upsampling time series data
In upsampling, the frequency of the time series is increased. As a result, we have more sample points than data points. One of the main questions is how to account for the entries in the series where we have no measurement.
Let's start with hourly data for a single day:
>>> import numpy as np
>>> import pandas as pd
>>> rng = pd.date_range('4/29/2015 8:00', periods=10, freq='H')
>>> ts = pd.Series(np.random.randint(0, 100, len(rng)), index=rng)
>>> ts.head()
2015-04-29 08:00:00    30
2015-04-29 09:00:00    27
2015-04-29 10:00:00    54
2015-04-29 11:00:00     9
2015-04-29 12:00:00    48
Freq: H, dtype: int64
If we upsample to data points taken every 15 minutes, our time series will be extended with NaN values:
>>> ts.resample('15min').head()
2015-04-29 08:00:00     30
2015-04-29 08:15:00    NaN
2015-04-29 08:30:00    NaN
2015-04-29 08:45:00    NaN
2015-04-29 09:00:00     27
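The same step can be sketched for a recent pandas release, where `resample` returns a Resampler object rather than a series and `asfreq()` produces the new 15-minute grid. This is only an illustration: the five values are hard-coded from the output printed above so the result is reproducible, and the hourly frequency is spelled `"60min"` as an assumption to keep the alias portable across pandas versions.

```python
import pandas as pd

# Hard-coded stand-in for the random hourly series (first five values only).
rng = pd.date_range("2015-04-29 08:00", periods=5, freq="60min")
ts = pd.Series([30, 27, 54, 9, 48], index=rng)

# asfreq() builds the 15-minute index and leaves unobserved slots as NaN.
upsampled = ts.resample("15min").asfreq()

print(upsampled.head())
print(len(ts), "->", len(upsampled))  # 5 hourly points become 17 quarter-hour points
```

Each hour contributes one measured value and three empty slots, so the series grows while the amount of actual information stays the same.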
There are various ways to deal with missing values, which can be controlled by the fill_method keyword argument to resample. Values can be filled either forward or backward:
>>> ts.resample('15min', fill_method='ffill').head()
2015-04-29 08:00:00    30
2015-04-29 08:15:00    30
2015-04-29 08:30:00    30
2015-04-29 08:45:00    30
2015-04-29 09:00:00    27
Freq: 15T, dtype: int64
>>> ts.resample('15min', fill_method='bfill').head()
2015-04-29 08:00:00    30
2015-04-29 08:15:00    27
2015-04-29 08:30:00    27
2015-04-29 08:45:00    27
2015-04-29 09:00:00    27
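Recent pandas releases no longer accept the `fill_method` keyword; forward and backward filling are methods on the Resampler object instead. A minimal sketch of the equivalent calls, again assuming the five hard-coded values from the earlier output:

```python
import pandas as pd

# Hard-coded stand-in for the random hourly series (first five values only).
rng = pd.date_range("2015-04-29 08:00", periods=5, freq="60min")
ts = pd.Series([30, 27, 54, 9, 48], index=rng)

# Forward fill: each empty slot repeats the last observed value.
forward = ts.resample("15min").ffill()

# Backward fill: each empty slot takes the next observed value.
backward = ts.resample("15min").bfill()

print(forward.head())
print(backward.head())
```

Forward filling is the usual choice when a reading should stay valid until the next one arrives; backward filling uses information from the future, which matters if the series feeds a forecasting model.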
With the limit parameter, it is possible to control the number of missing values to be filled:
>>> ts.resample('15min', fill_method='ffill', limit=2).head()
2015-04-29 08:00:00     30
2015-04-29 08:15:00     30
2015-04-29 08:30:00     30
2015-04-29 08:45:00    NaN
2015-04-29 09:00:00     27
Freq: 15T, dtype: float64
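In recent pandas releases the limit is passed to the fill method itself rather than to `resample`. The sketch below assumes the same five hard-coded values and shows that only the first two empty slots after each measurement are filled:

```python
import pandas as pd

# Hard-coded stand-in for the random hourly series (first five values only).
rng = pd.date_range("2015-04-29 08:00", periods=5, freq="60min")
ts = pd.Series([30, 27, 54, 9, 48], index=rng)

# Fill at most two consecutive missing slots after each observed value.
limited = ts.resample("15min").ffill(limit=2)

print(limited.head())
print("still missing:", int(limited.isna().sum()))
```

With three empty slots per hour and a limit of two, exactly one slot per hourly gap remains NaN, which is why the result is a float series.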
If you want to adjust the labels during resampling, you can use the loffset keyword argument:
>>> ts.resample('15min', fill_method='ffill', limit=2, loffset='5min').head()
2015-04-29 08:05:00     30
2015-04-29 08:20:00     30
2015-04-29 08:35:00     30
2015-04-29 08:50:00    NaN
2015-04-29 09:05:00     27
Freq: 15T, dtype: float64
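The `loffset` keyword is not available in recent pandas releases. One way to get the same label shift, sketched here under the assumption of the same five hard-coded values, is to add a time delta to the index after resampling:

```python
import pandas as pd

# Hard-coded stand-in for the random hourly series (first five values only).
rng = pd.date_range("2015-04-29 08:00", periods=5, freq="60min")
ts = pd.Series([30, 27, 54, 9, 48], index=rng)

shifted = ts.resample("15min").ffill(limit=2)

# Move every label five minutes later; the values themselves are unchanged.
shifted.index = shifted.index + pd.Timedelta(minutes=5)

print(shifted.head())
```

Only the labels move: the value originally measured at 08:00 is now reported at 08:05.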
There is another way to fill in missing values. We could employ an algorithm to construct new data points that would somehow fit the existing points, for some definition of somehow. This process is called interpolation.
We can ask Pandas to interpolate a time series for us:
>>> tsx = ts.resample('15min')
>>> tsx.interpolate().head()
2015-04-29 08:00:00    30.00
2015-04-29 08:15:00    29.25
2015-04-29 08:30:00    28.50
2015-04-29 08:45:00    27.75
2015-04-29 09:00:00    27.00
Freq: 15T, dtype: float64
We have just seen the default interpolate method, linear interpolation, in action: Pandas assumes a linear relationship between each pair of neighboring existing points and places the new values on the straight line connecting them.
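To make the linear assumption concrete, the sketch below recomputes the first interpolated hour by hand and compares it with what pandas returns. It assumes the five hard-coded values from the earlier output and uses `asfreq()` followed by `Series.interpolate()`, which is one way to write this in a recent pandas release.

```python
import pandas as pd

# Hard-coded stand-in for the random hourly series (first five values only).
rng = pd.date_range("2015-04-29 08:00", periods=5, freq="60min")
ts = pd.Series([30, 27, 54, 9, 48], index=rng)

# Upsample to the 15-minute grid, then fill the NaN slots linearly.
interpolated = ts.resample("15min").asfreq().interpolate()

# Manual check for 08:00 to 09:00: move from 30 to 27 in four equal steps.
by_hand = [30 + (27 - 30) * k / 4 for k in range(5)]

print(interpolated.head())
print(by_hand)
```

Each quarter-hour step changes the value by (27 - 30) / 4 = -0.75, which reproduces the 29.25, 28.50, 27.75 sequence shown in the output above.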
Pandas supports over a dozen interpolation methods, some of which require the scipy library to be installed. We will not cover interpolation methods in this chapter, but we encourage you to explore the various methods yourself. The right interpolation method will depend on the requirements of your application.
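As a small illustration of why the choice of method matters, the sketch below compares two methods that do not need scipy, `"linear"` and `"time"`, on a hypothetical series with unevenly spaced timestamps. The series and its values are made up for this example and are not from the text above.

```python
import numpy as np
import pandas as pd

# Hypothetical series: the gap sits 15 minutes after the first reading
# and 45 minutes before the second one.
idx = pd.to_datetime(
    ["2015-04-29 08:00", "2015-04-29 08:15", "2015-04-29 09:00"]
)
gappy = pd.Series([30.0, np.nan, 27.0], index=idx)

# "linear" ignores the index and treats the points as equally spaced.
by_position = gappy.interpolate(method="linear")

# "time" weights the estimate by the actual time elapsed.
by_time = gappy.interpolate(method="time")

print(by_position.iloc[1])
print(by_time.iloc[1])
```

The `"linear"` method lands halfway between the two readings (28.5), while `"time"` stays closer to the nearer reading (29.25). On an evenly spaced grid, such as the 15-minute series used earlier, the two methods agree.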