- Python Data Analysis(Second Edition)
- Armando Fandango
- 2021-07-09 19:04:08
Dealing with dates
Dates are complicated. Just think of the Y2K bug, the pending Year 2038 problem, and the confusion caused by time zones. It's a mess. We encounter dates naturally when dealing with time-series data. Pandas can create date ranges, resample time-series data, and perform date arithmetic operations.
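As a quick illustration of resampling and date arithmetic, here is a minimal sketch. The series values and the variable names (`days`, `series`, `weekly_sum`, `next_week`) are made up for this example and are not taken from the book's code bundle.

```python
import pandas as pd

# Fourteen consecutive days starting on Monday, January 1, 1900.
days = pd.date_range('1/1/1900', periods=14, freq='D')

# Hypothetical data: the values 0..13, one per day.
series = pd.Series(range(14), index=days)

# Resample the daily values into weekly sums ('W' bins end on Sunday).
weekly_sum = series.resample('W').sum()
print("Weekly sums", weekly_sum)

# Date arithmetic: add seven days to the first date.
next_week = days[0] + pd.Timedelta(days=7)
print("One week later", next_week)
```

The first two weeks of 1900 each run from Monday to Sunday, so every weekly bin here holds exactly seven daily values.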
Create a range of dates starting from January 1, 1900 and lasting 42 days, as follows:
print("Date range", pd.date_range('1/1/1900', periods=42, freq='D'))
January has fewer than 42 days, so the end date falls in February, as you can check for yourself:
Date range <class 'pandas.tseries.index.DatetimeIndex'> [1900-01-01, ..., 1900-02-11] Length: 42, Freq: D, Timezone: None
The following table from the Pandas official documentation (refer to http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases) describes the frequencies used in Pandas:
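A few of the offset aliases from that documentation page can be tried directly. The sketch below uses only `'D'` (calendar day), `'B'` (business day), and `'W'` (weekly, anchored on Sunday). Other aliases have been renamed between Pandas releases, so check the documentation for the version you have installed before relying on them.

```python
import pandas as pd

# 'D': every calendar day, as in the example above.
daily = pd.date_range('1/1/1900', periods=42, freq='D')

# 'B': business days only (Monday to Friday).
business = pd.date_range('1/1/1900', periods=5, freq='B')

# 'W': weekly, each date falling on a Sunday.
weekly = pd.date_range('1/1/1900', periods=3, freq='W')

print("Last daily date", daily[-1])
print("Business days", list(business.strftime('%Y-%m-%d')))
print("Weekly dates", list(weekly.strftime('%Y-%m-%d')))
```

January 1, 1900 was a Monday, so the five business days run through Friday, January 5, and the first weekly date is Sunday, January 7.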

Date ranges have their limits in Pandas. Timestamps in Pandas (based on the NumPy datetime64 data type) are represented by a 64-bit integer with nanosecond resolution (a billionth of a second). This limits legal timestamps to dates in the range approximately between the year 1677 and 2262 (not all dates in these years are valid). The exact midpoint of this range is at January 1, 1970. For example, January 1, 1677 cannot be defined with a Pandas timestamp, while September 30, 1677 can, as demonstrated in the following code snippet:
import sys

try:
    print("Date range", pd.date_range('1/1/1677', periods=4, freq='D'))
except Exception:
    # Retrieve the type and value of the exception that was just raised
    etype, value, _ = sys.exc_info()
    print("Error encountered", etype, value)
The code snippet prints the following error message:
Error encountered <class 'pandas.tslib.OutOfBoundsDatetime'> Out of bounds nanosecond timestamp: 1677-01-01 00:00:00
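The nanosecond limits can also be read directly from `pd.Timestamp.min` and `pd.Timestamp.max`. The following is a minimal sketch; the variable names are chosen for this example. Pandas 2.0 and later can also store timestamps at coarser resolutions, so whether an out-of-range date raises an error may depend on the installed version.

```python
import pandas as pd

# Smallest and largest timestamps representable with nanosecond resolution.
earliest = pd.Timestamp.min
latest = pd.Timestamp.max
print("Earliest timestamp", earliest)
print("Latest timestamp", latest)

# September 30, 1677 lies inside the nanosecond range.
sept_30 = pd.Timestamp('1677-09-30')
print("September 30 1677 in range:", earliest <= sept_30 <= latest)
```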
Given all the previous information, calculate the allowed date range with Pandas DateOffset as follows:
from pandas.tseries.offsets import DateOffset

# Half of the 64-bit nanosecond range, expressed in whole seconds
offset = DateOffset(seconds=2 ** 63 // 10 ** 9)
mid = pd.to_datetime('1/1/1970')
print("Start valid range", mid - offset)
print("End valid range", mid + offset)
We get the following range values:
Start valid range 1677-09-21 00:12:44
End valid range 2262-04-11 23:47:16
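The 64-bit nanosecond representation described above also explains why the range spans roughly 292 years on either side of 1970. This can be checked with plain arithmetic. The sketch assumes an average Gregorian year of 365.2425 days, and the constant names are illustrative.

```python
# A signed 64-bit integer leaves 63 bits for the magnitude, in nanoseconds.
NANOSECONDS_PER_SECOND = 10 ** 9
SECONDS_PER_YEAR = 365.2425 * 24 * 3600  # average Gregorian year (assumption)

half_range_seconds = 2 ** 63 // NANOSECONDS_PER_SECOND
half_range_years = half_range_seconds / SECONDS_PER_YEAR

print("Seconds on each side of 1970:", half_range_seconds)
print("Years on each side of 1970: %.2f" % half_range_years)
print("Approximate range: %.0f to %.0f" % (1970 - half_range_years,
                                           1970 + half_range_years))
```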
We can convert a list of strings to dates with Pandas. Of course, not all strings can be converted. If Pandas is unable to convert a string, an error is often reported. Sometimes, ambiguities can arise due to differences in the way dates are defined in different locales. In this case, use a format string, as follows:
print("With format", pd.to_datetime(['19021112', '19031230'], format='%Y%m%d'))
The strings should be converted without an error occurring:
With format [datetime.datetime(1902, 11, 12, 0, 0) datetime.datetime(1903, 12, 30, 0, 0)]
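To see why a format string matters, consider a date that reads differently in different locales. This is a minimal sketch; the string `'01/02/1903'` is made up for illustration.

```python
import pandas as pd

# The same string means February 1 or January 2, depending on the locale.
ambiguous = '01/02/1903'

# Day first, as written in much of Europe.
day_first = pd.to_datetime(ambiguous, format='%d/%m/%Y')

# Month first, as written in the United States.
month_first = pd.to_datetime(ambiguous, format='%m/%d/%Y')

print("Day first", day_first)
print("Month first", month_first)
```

An explicit format removes the guesswork, because Pandas no longer has to infer which field is the day.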
If we try to convert a string that is clearly not a date, by default the string is not converted:
print("Illegal date", pd.to_datetime(['1902-11-12', 'not a date']))
The second string in the list should not be converted:
Illegal date ['1902-11-12' 'not a date']
To force conversion, set the errors parameter to 'coerce':
print("Illegal date coerced", pd.to_datetime(['1902-11-12', 'not a date'], errors='coerce'))
Obviously, the second string still cannot be converted to a date, so the only valid value we can give it is NaT ('not a time'):
Illegal date coerced <class 'pandas.tseries.index.DatetimeIndex'> [1902-11-12, NaT] Length: 2, Freq: None, Timezone: None
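After coercion, the NaT entries can be located with `isna()` and filtered out. The sketch below uses illustrative variable names. In recent Pandas releases the default is `errors='raise'`, so the same call without `errors='coerce'` may raise an exception instead of returning the strings unchanged. That depends on the installed version, which is why the last part of the sketch only reports what happens.

```python
import pandas as pd

raw = ['1902-11-12', 'not a date']

# Unparseable entries become NaT instead of stopping the conversion.
parsed = pd.to_datetime(raw, errors='coerce')

# Boolean mask marking the entries that could not be parsed.
mask = parsed.isna()
valid = parsed[~mask]
print("Unparseable entries", int(mask.sum()))
print("Valid dates", valid)

# Version-dependent: report what the default error handling does.
try:
    pd.to_datetime(raw)
    print("Default call returned without raising")
except ValueError as error:
    print("Default call raised", type(error).__name__)
```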
The code for this example is in ch-03.ipynb of this book's code bundle.