When working with time series in Pandas, the methods we commonly use are:
pd.to_datetime()
pd.date_range()
```python
import pandas as pd

# pd.date_range(): 20 consecutive days (default freq='D')
index = pd.date_range("20210101", periods=20)
index
# DatetimeIndex(['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04',
#                '2021-01-05', '2021-01-06', '2021-01-07', '2021-01-08',
#                '2021-01-09', '2021-01-10', '2021-01-11', '2021-01-12',
#                '2021-01-13', '2021-01-14', '2021-01-15', '2021-01-16',
#                '2021-01-17', '2021-01-18', '2021-01-19', '2021-01-20'],
#               dtype='datetime64[ns]', freq='D')

# pd.to_datetime(): parse integers such as 20210101 with an explicit format
df = pd.DataFrame(data=range(20210101, 20210128), columns=["period"])
df["aa"] = pd.to_datetime(df["period"], format="%Y%m%d")
df
#       period         aa
# 0   20210101 2021-01-01
# 1   20210102 2021-01-02
# 2   20210103 2021-01-03
# 3   20210104 2021-01-04
# 4   20210105 2021-01-05
# 5   20210106 2021-01-06
# 6   20210107 2021-01-07
# 7   20210108 2021-01-08
# 8   20210109 2021-01-09
# 9   20210110 2021-01-10
# 10  20210111 2021-01-11
# 11  20210112 2021-01-12
# 12  20210113 2021-01-13
# 13  20210114 2021-01-14
# 14  20210115 2021-01-15
# 15  20210116 2021-01-16
# 16  20210117 2021-01-17
# 17  20210118 2021-01-18
# 18  20210119 2021-01-19
# 19  20210120 2021-01-20
# 20  20210121 2021-01-21
# 21  20210122 2021-01-22
# 22  20210123 2021-01-23
# 23  20210124 2021-01-24
# 24  20210125 2021-01-25
# 25  20210126 2021-01-26
# 26  20210127 2021-01-27

# Elements of both objects are pandas Timestamps
# (outputs from an older pandas 1.x session; pandas 2.x no longer shows freq on a Timestamp)
index[1]                  # Timestamp('2021-01-02 00:00:00', freq='D')
df["aa"][1]               # Timestamp('2021-01-02 00:00:00')
df["aa"][1] == index[1]   # True
type(df["aa"][1])         # pandas._libs.tslibs.timestamps.Timestamp
type(index[1])            # pandas._libs.tslibs.timestamps.Timestamp
```
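As the code above shows, pandas represents these values as pandas._libs.tslibs.timestamps.Timestamp (exposed publicly as pd.Timestamp).
In plain Python, however, the usual time type is datetime.datetime.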
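The two types are closely related: `pd.Timestamp` is a subclass of `datetime.datetime`, so a Timestamp passes `isinstance` checks for `datetime` and compares equal to a datetime holding the same value, even though its concrete type differs. A minimal check, assuming a tz-naive date:

```python
from datetime import datetime

import pandas as pd

ts = pd.Timestamp("2021-01-02")
dt = datetime(2021, 1, 2)

# Timestamp inherits from datetime, so isinstance() accepts it...
print(isinstance(ts, datetime))  # True
# ...but its concrete type is not datetime itself
print(type(ts) is datetime)  # False
# Values representing the same moment compare equal across the two types
print(ts == dt)  # True
```

This is why most code that accepts a `datetime` also works with a Timestamp; an explicit conversion is mainly needed when code checks the exact type or needs a plain `datetime` object.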
to_pydatetime()
```python
from datetime import datetime

t = datetime(2021, 1, 2)
type(t)      # datetime.datetime
t            # datetime.datetime(2021, 1, 2, 0, 0)

# `index` is the DatetimeIndex created with pd.date_range() above
r = index[1].to_pydatetime()
type(r)      # datetime.datetime
t == r       # True
```
Converting a pandas Timestamp to a datetime:
```python
import pandas as pd

ts = pd.Timestamp('2014-01-23 00:00:00', tz=None)
ts.to_pydatetime()   # datetime.datetime(2014, 1, 23, 0, 0)
```
It's also available on a DatetimeIndex:
```python
rng = pd.date_range('1/10/2011', periods=3, freq='D')
rng.to_pydatetime()
# array([datetime.datetime(2011, 1, 10, 0, 0),
#        datetime.datetime(2011, 1, 11, 0, 0),
#        datetime.datetime(2011, 1, 12, 0, 0)], dtype=object)
```
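Going the other way, `pd.Timestamp()` accepts a `datetime` directly. One caveat: `datetime` only stores microseconds, so `to_pydatetime()` drops any nanoseconds (by default it emits a `UserWarning` when it does, and `warn=False` suppresses that). A small sketch of both directions:

```python
from datetime import datetime

import pandas as pd

# datetime -> Timestamp -> datetime round trip
dt = datetime(2014, 1, 23)
ts = pd.Timestamp(dt)
back = ts.to_pydatetime()
print(type(back) is datetime, back == dt)  # True True

# A Timestamp with nanosecond precision loses the nanoseconds on conversion
ts_ns = pd.Timestamp("2014-01-23 00:00:00.000000123")
print(ts_ns.nanosecond)  # 123
truncated = ts_ns.to_pydatetime(warn=False)  # warn=False: no UserWarning
print(truncated)  # 2014-01-23 00:00:00
```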
Official documentation: http://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#from-timestamps-to-epoch
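Recently I needed to get, for a given time of day, the number of minutes elapsed since 0:00. After going through the documentation, I came up with the following approach: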
```python
import pandas as pd

# Four hourly timestamps, so the index frequency is hourly ('H'; pandas 2.2+ shows 'h')
stamps = pd.date_range('2012-10-08 18:15:05', periods=4, freq='h')
stamps
# DatetimeIndex(['2012-10-08 18:15:05', '2012-10-08 19:15:05',
#                '2012-10-08 20:15:05', '2012-10-08 21:15:05'],
#               dtype='datetime64[ns]', freq='H')
```
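First, get the number of seconds elapsed since 1970-01-01:
```python
# Subtracting the epoch gives a TimedeltaIndex; floor division by 1 s yields whole seconds
# (outputs from pandas 1.x; pandas 2.x prints Index(...) instead of Int64Index/Float64Index)
(stamps - pd.Timestamp("1970-01-01")) // pd.Timedelta('1s')
# Int64Index([1349720105, 1349723705, 1349727305, 1349730905], dtype='int64')
```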
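As a cross-check, `Timestamp.timestamp()` returns the same POSIX seconds for a single value: pandas treats a tz-naive Timestamp as UTC, which matches subtracting `1970-01-01` above. (Plain `datetime.timestamp()` on a naive datetime interprets it as local time instead, so results can differ after `to_pydatetime()`.) A minimal sketch:

```python
import pandas as pd

stamps = pd.date_range("2012-10-08 18:15:05", periods=4, freq="h")

# Whole seconds since the Unix epoch for every element
epoch_seconds = (stamps - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")

# Same value for one element; a naive Timestamp is treated as UTC here
first = stamps[0].timestamp()
print(int(epoch_seconds[0]), first)  # 1349720105 1349720105.0
```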
Take the remainder modulo one day (86400 seconds) to get the seconds since 0:00:
```python
(stamps - pd.Timestamp("1970-01-01")) // pd.Timedelta('1s') % 86400
# Int64Index([65705, 69305, 72905, 76505], dtype='int64')
```
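An alternative that avoids epoch arithmetic, offered as a sketch rather than as the method above: subtract each timestamp's own midnight, obtained with `DatetimeIndex.normalize()`. Because `normalize()` works in the index's own time zone, this also gives the local time of day for tz-aware data, where subtracting the naive `pd.Timestamp("1970-01-01")` would raise a `TypeError`. The fixed UTC+8 offset below is only an illustrative assumption.

```python
from datetime import timedelta, timezone

import pandas as pd

stamps = pd.date_range("2012-10-08 18:15:05", periods=4, freq="h")

# Time since each day's midnight, in whole seconds
secs = (stamps - stamps.normalize()) // pd.Timedelta("1s")
print(secs.tolist())  # [65705, 69305, 72905, 76505]

# Same wall-clock times in an assumed fixed UTC+8 zone: normalize() keeps the
# zone, so the result is still seconds since *local* midnight
local = stamps.tz_localize(timezone(timedelta(hours=8)))
local_secs = (local - local.normalize()) // pd.Timedelta("1s")
print(local_secs.tolist())  # [65705, 69305, 72905, 76505]
```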
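Divide by 60 to get the minutes since 0:00:
```python
# //, % and / share one precedence level and apply left to right,
# so this is ((... // 1s) % 86400) / 60
(stamps - pd.Timestamp("1970-01-01")) // pd.Timedelta('1s') % 86400 / 60
# Float64Index([1095.0833333333333, 1155.0833333333333, 1215.0833333333333,
#               1275.0833333333333], dtype='float64')
```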
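Another option, again only a sketch: build the minutes since midnight from the `hour`, `minute` and `second` attributes of the DatetimeIndex. Like the calculation above, it ignores fractions of a second.

```python
import pandas as pd

stamps = pd.date_range("2012-10-08 18:15:05", periods=4, freq="h")

# hour/minute/second are integer indexes; dividing seconds by 60 makes the result float
minutes = stamps.hour * 60 + stamps.minute + stamps.second / 60
print(minutes.tolist())
```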
Likewise, divide by 3600 to get the (fractional) hours since 0:00:
```python
(stamps - pd.Timestamp("1970-01-01")) // pd.Timedelta('1s') % 86400 / 3600
# Float64Index([18.25138888888889, 19.25138888888889, 20.25138888888889,
#               21.25138888888889], dtype='float64')
```
To get whole hours, floor-divide by 3600 instead (there are, of course, other ways to get the hour):
```python
(stamps - pd.Timestamp("1970-01-01")) // pd.Timedelta('1s') % 86400 // 3600
# Int64Index([18, 19, 20, 21], dtype='int64')
```
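One of those other ways: a DatetimeIndex exposes the hour directly through its `hour` attribute, and a datetime Series offers the same through the `.dt` accessor. A minimal sketch (the Series is built here only for illustration):

```python
import pandas as pd

stamps = pd.date_range("2012-10-08 18:15:05", periods=4, freq="h")

# Integer hour of each timestamp, no epoch arithmetic needed
print(stamps.hour.tolist())  # [18, 19, 20, 21]

# The same attribute on a datetime Series goes through the .dt accessor
s = pd.Series(stamps)
print(s.dt.hour.tolist())  # [18, 19, 20, 21]
```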
The above is based on my own experience; I hope it serves as a useful reference.