Downsampling reduces the number of samples in the data. During this reduction, we are able to apply aggregations over data points. Let's imagine a busy airport with thousands of people passing through every hour. The airport administration has installed a visitor counter in the main area, to get an impression of exactly how busy their airport is.
They are receiving data from the counter device every minute. Here are the hypothetical measurements for a day, beginning at 08:00, ending 600 minutes later at 18:00:
To get a better picture of the day, we can downsample this time series to larger intervals, for example, 10 minutes. We can choose an aggregation function as well. The default aggregation is to take all the values and calculate the mean:
In our airport example, we are also interested in the sum of the values, that is, the combined number of visitors for a given time frame. We can choose the aggregation function by passing a function or a function name to the how parameter works:
Or we can define a custom function if we are interested in more unusual metrics. For example, we could be interested in selecting a random sample for each hour:
If you specify a function by string, Pandas uses highly optimized versions.
The built-in functions that can be used as argument to how are: sum, mean, std, sem, max, min, median, first, last, ohlc. The ohlc metric is popular in finance. It stands for open-high-low-close. An OHLC chart is a typical way to illustrate movements in the price of a financial instrument over time.
While in our airport this metric might not be that valuable, we can compute it nonetheless: