Resample a DataFrame, computing the min, max, first and last values of different columns within each interval

2024-03-26

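I have a database of 1-minute trading data entries with open, high, low and close prices. I want to write a function that drops some of the timestamps so that only the 30-minute entries remain. However, the values at each remaining timestamp need to be updated to reflect the new period.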

I ran the following code, but there is a problem:

def time_stamp(minutes):
    Start_stamp=1609459200000
    End_stamp=1622505540000
    Interval=60000*minutes
    list_stamp=np.arange(Start_stamp, End_stamp+1, Interval).tolist()
    for i in range(minutes,End_stamp,minutes):
        df.loc[i,'High']=df['High'].loc[-minutes:].max()
        df.loc[i,'Low']=df.loc[-minutes:,'Low'].min()
        df.loc[i,'Open']=df.loc[-minutes:,'Close']
    df.drop(df.loc[~df['t'].isin(list_stamp)].index, inplace=True)
    return df
time_stamp(30)

ValueError: Incompatible indexer with Series

Can anyone give me some advice? Thank you!


There is a more pandas-like way to do this.

Since there was no data sample, I had to make one. Here is the code that does that:

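import numpy as np
import pandas as pd

# Make the data: one day of 1-minute bars
index = pd.date_range('1/1/2000', periods=60*24, freq='min')
df = pd.DataFrame(np.random.rand(60*24, 2), columns=['open', 'close'], index=index)
df = df + 1  # doing this to avoid the possibility of negative values in the next steps
# high = max(open, close) plus noise; low = min(open, close) minus noise (high is never the row minimum)
df['high'] = df.apply(lambda row: row.max() + np.random.random(), axis=1)
df['low'] = df.apply(lambda row: row.min() - np.random.random(), axis=1)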

State of df at this point (df.head(10)):

,open,close,high,low
2000-01-01 00:00:00,1.5236619202496442,1.151985535527245,1.7477467456279827,0.3031985970254675
2000-01-01 00:01:00,1.7567707020541863,1.844917989219291,2.3157262902092053,0.781678343968321
2000-01-01 00:02:00,1.4329459219698644,1.5715643667517165,2.2800512080007325,0.4385068358774301
2000-01-01 00:03:00,1.6278939890163286,1.4967963857419173,2.4514762537932637,0.7483790156969329
2000-01-01 00:04:00,1.7696997962274348,1.7981539004095517,2.1609841398138325,1.4423796609201727
2000-01-01 00:05:00,1.3156416756165012,1.6792424542358473,2.6725022251661867,1.263416934678443
2000-01-01 00:06:00,1.4611709821585714,1.3417705793465275,1.7269143465983203,0.6447125825749427
2000-01-01 00:07:00,1.1353922264378535,1.3576210147951089,1.8826801353270626,0.49493624242983736
2000-01-01 00:08:00,1.6827074173849588,1.2127513631592481,2.4320709664997366,1.015161578142598
2000-01-01 00:09:00,1.277323428018112,1.379928215762615,2.1107247913266804,0.7283856978040806

Then we can do what you need (note that the timestamp is the index):
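In the question, the timestamps live in a column t as epoch milliseconds (1609459200000 is 2021-01-01 00:00 UTC) rather than in the index. A minimal sketch, assuming that layout and the question's column names (the sample rows are made up), of turning t into a DatetimeIndex before resampling:

```python
import pandas as pd

# Hypothetical sample in the question's layout: 't' holds epoch milliseconds
raw = pd.DataFrame({
    "t": [1609459200000, 1609459260000, 1609459320000],
    "Open": [1.0, 2.0, 3.0],
    "High": [1.5, 2.5, 3.5],
    "Low": [0.5, 1.5, 2.5],
    "Close": [2.0, 3.0, 4.0],
})

# Interpret 't' as milliseconds since the Unix epoch and use it as the index
raw.index = pd.to_datetime(raw["t"], unit="ms")
raw = raw.drop(columns="t")
print(raw.index[0])  # 2021-01-01 00:00:00
```

If you would rather keep the timestamps in a column, resample also accepts an on= argument naming a datetime-like column, so the conversion can be stored in a new column instead of the index.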

df.resample (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.resample.html#pandas.DataFrame.resample) "resamples" the time series to a frequency of your choosing. I used 30 minutes, as specified in the question.
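To see what resampling does on its own, here is a small sketch with deterministic, made-up values: 90 one-minute rows fall into three 30-minute bins. With the default settings for this frequency, each bin is labelled by its start and holds the rows in [start, start + 30 minutes).

```python
import pandas as pd

# 90 one-minute rows with values 0..89 (illustrative data only)
idx = pd.date_range("2000-01-01", periods=90, freq="min")
s = pd.Series(range(90), index=idx)

# Group the rows into 30-minute bins and count the rows in each bin
bins = s.resample("30min")
counts = bins.size()
print(counts)
```

Nothing is aggregated yet; resample only defines the groups, and the aggregation step decides what each group collapses to.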

.agg (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.agg.html) lets us do exactly what you want: take the first open, the last close, the highest high and the lowest low of each interval:

df = df.resample('30min').agg({'open': 'first', 'close': 'last', 'high': 'max', 'low': 'min'})
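The dictionary passed to .agg maps each column to its own aggregation. The sketch below checks that mapping on deterministic, made-up values so the expected numbers can be worked out by hand. Note that '30min' is the offset alias for 30 minutes; older code often writes '30T', which recent pandas versions deprecate.

```python
import pandas as pd

# 60 one-minute bars with easy-to-verify values (illustrative data only)
idx = pd.date_range("2000-01-01", periods=60, freq="min")
df = pd.DataFrame({
    "open": range(60),           # row i -> i
    "close": range(100, 160),    # row i -> 100 + i
    "high": range(200, 260),     # row i -> 200 + i
    "low": range(-60, 0),        # row i -> -60 + i
}, index=idx)

# One aggregation per column: first open, last close, max high, min low
out = df.resample("30min").agg({"open": "first", "close": "last", "high": "max", "low": "min"})
print(out)
# 00:00 bin (rows 0-29):  open 0,  close 129, high 229, low -60
# 00:30 bin (rows 30-59): open 30, close 159, high 259, low -30
```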

Final output (first 10 rows):

,open,close,high,low
2000-01-01 00:00:00,1.5236619202496442,1.9399515432326182,2.7830658255264904,0.11963392810868156
2000-01-01 00:30:00,1.3507487064130956,1.943836375991639,2.8756089239367886,0.18513880795935822
2000-01-01 01:00:00,1.3521982535896768,1.3917486576623297,2.8566136804896236,0.1750201985909
2000-01-01 01:30:00,1.0429129450145977,1.653875051452551,2.903310168048458,0.12223652926377937
2000-01-01 02:00:00,1.724667336487399,1.3501859745845943,2.7883533771155182,0.10617913875428453
2000-01-01 02:30:00,1.0951747626878743,1.9314727636907452,2.704938040638077,0.2811809746810251
2000-01-01 03:00:00,1.2706302627630148,1.7120392033624894,2.909430407567025,0.11251041513367666
2000-01-01 03:30:00,1.2979020670054455,1.1065439262276353,2.7908377681443057,0.3071618087183765
2000-01-01 04:00:00,1.2146422040399025,1.3758650428561257,2.906605257212037,0.2757186485567582
2000-01-01 04:30:00,1.2791605232157812,1.3337224908227947,2.968804134958828,0.1021661248014647
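Real 1-minute trading data often has gaps (market closures, missing bars). resample produces a row for every 30-minute bin between the first and last timestamp, so a bin with no rows comes back as all NaN. A minimal sketch, using made-up data with a 30-minute gap, of spotting and dropping such rows with dropna(how="all"); whether to drop or keep them is a choice that depends on your use case.

```python
import pandas as pd

# Two 30-minute blocks of 1-minute bars with an empty 00:30-01:00 interval (illustrative data)
idx = pd.date_range("2000-01-01 00:00", periods=30, freq="min").append(
    pd.date_range("2000-01-01 01:00", periods=30, freq="min"))
df = pd.DataFrame({"open": 1.0, "close": 2.0, "high": 3.0, "low": 0.5}, index=idx)

out = df.resample("30min").agg({"open": "first", "close": "last", "high": "max", "low": "min"})
# The 00:30 bin contains no rows, so every column is NaN there
clean = out.dropna(how="all")
print(out)
print(clean)
```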