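When computing stock factors, we often need data from the previous day or the previous N days, and this is where the Pandas shift method comes in. This article uses the first 10 daily bars of 2021 for sz.002407 (多氟多), shown in the table below, to demonstrate how shift is used. We first load the data into a DataFrame object named df.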
date | open | high | low | close | preclose | volume | amount | adjustflag | turn | tradestatus | pctChg | peTTM | pbMRQ | psTTM | pcfNcfTTM | isST |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2021-01-04 | 20.0300000000 | 21.4000000000 | 19.9300000000 | 20.7300000000 | 20.0300000000 | 69639314 | 1442558998.1400 | 2 | 12.026700 | 1 | 3.494800 | -28.520577 | 5.001054 | 3.874122 | -336.943766 | 0 |
2021-01-05 | 20.5100000000 | 21.3100000000 | 19.9500000000 | 20.6800000000 | 20.7300000000 | 68155914 | 1404877488.0600 | 2 | 11.770500 | 1 | -0.241200 | -28.451786 | 4.988992 | 3.864778 | -336.131070 | 0 |
2021-01-06 | 20.8300000000 | 21.2800000000 | 19.0600000000 | 19.7700000000 | 20.6800000000 | 69108077 | 1375667187.1400 | 2 | 11.935000 | 1 | -4.400400 | -27.199798 | 4.769457 | 3.694713 | -321.340003 | 0 |
2021-01-07 | 19.5500000000 | 20.4400000000 | 19.2100000000 | 19.8900000000 | 19.7700000000 | 57333679 | 1145163382.0700 | 2 | 9.901500 | 1 | 0.607000 | -27.364895 | 4.798406 | 3.717139 | -323.290473 | 0 |
2021-01-08 | 19.8900000000 | 20.1300000000 | 18.6000000000 | 19.4900000000 | 19.8900000000 | 55294505 | 1063035986.1300 | 2 | 9.549300 | 1 | -2.011100 | -26.814570 | 4.701908 | 3.642385 | -316.788905 | 0 |
2021-01-11 | 18.9000000000 | 18.9000000000 | 17.5400000000 | 17.5500000000 | 19.4900000000 | 77040397 | 1371939544.2400 | 2 | 13.304900 | 1 | -9.953800 | -24.145496 | 4.233888 | 3.279829 | -285.256300 | 0 |
2021-01-12 | 17.2000000000 | 18.3500000000 | 16.8300000000 | 18.1300000000 | 17.5500000000 | 55091872 | 970628000.0300 | 2 | 9.514400 | 1 | 3.304800 | -24.943466 | 4.373811 | 3.388222 | -294.683573 | 0 |
2021-01-13 | 18.2000000000 | 19.5300000000 | 18.2000000000 | 18.7600000000 | 18.1300000000 | 68535355 | 1303591461.9500 | 2 | 11.836000 | 1 | 3.474900 | -25.810228 | 4.525797 | 3.505959 | -304.923543 | 0 |
2021-01-14 | 18.6900000000 | 19.6200000000 | 18.1200000000 | 19.4000000000 | 18.7600000000 | 58518764 | 1109560714.8600 | 2 | 10.106200 | 1 | 3.411500 | -26.690747 | 4.680195 | 3.625566 | -315.326052 | 0 |
2021-01-15 | 19.0700000000 | 19.6800000000 | 18.1300000000 | 19.3000000000 | 19.4000000000 | 55752774 | 1052651676.0200 | 2 | 9.628500 | 1 | -0.515500 | -26.553166 | 4.656071 | 3.606877 | -313.700660 | 0 |
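The article does not show how df was loaded. As a minimal sketch, assuming only a few columns are needed to follow along, df can be built directly from the values in the table above. In practice the data would more likely come from a file or a data API, which the original does not specify, and the full df has 17 columns rather than the 4 used here.

```python
import pandas as pd

# A few columns copied from the table above; the other columns are omitted for brevity.
data = {
    "date": ["2021-01-04", "2021-01-05", "2021-01-06", "2021-01-07", "2021-01-08",
             "2021-01-11", "2021-01-12", "2021-01-13", "2021-01-14", "2021-01-15"],
    "high": [21.40, 21.31, 21.28, 20.44, 20.13, 18.90, 18.35, 19.53, 19.62, 19.68],
    "close": [20.73, 20.68, 19.77, 19.89, 19.49, 17.55, 18.13, 18.76, 19.40, 19.30],
    "preclose": [20.03, 20.73, 20.68, 19.77, 19.89, 19.49, 17.55, 18.13, 18.76, 19.40],
}
df = pd.DataFrame(data)

print(df.shape)
print(df.dtypes)
```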
The shift method of Series
We take the close column of df, call shift on it, and print the result:
ss = df['close'].shift(1)
print(ss)
0 NaN
1 20.73
2 20.68
3 19.77
4 19.89
5 19.49
6 17.55
7 18.13
8 18.76
9 19.40
Name: close, dtype: float64
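Series.shift returns another Series. Its argument is the number of steps to move, with a default of 1. In the example above the data moved down by 1 step, so each row now holds the previous row's value, and the first row, which has no predecessor, becomes NaN. Passing a negative number moves the data up instead: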
ss = df['close'].shift(-2)
print(ss)
0 19.77
1 19.89
2 19.49
3 17.55
4 18.13
5 18.76
6 19.40
7 19.30
8 NaN
9 NaN
Name: close, dtype: float64
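To see both directions side by side, here is a small self-contained sketch using the first five closing prices from the table. The `fill_value` argument is not used in the article; it is shown only as an optional way to replace the NaN in the vacated slots, and the choice of 0.0 is purely illustrative.

```python
import pandas as pd

close = pd.Series([20.73, 20.68, 19.77, 19.89, 19.49], name="close")

prev_close = close.shift(1)     # value from 1 row earlier; the first row becomes NaN
next2_close = close.shift(-2)   # value from 2 rows later; the last two rows become NaN
filled = close.shift(1, fill_value=0.0)  # vacated slot gets 0.0 instead of NaN

print(pd.DataFrame({
    "close": close,
    "shift(1)": prev_close,
    "shift(-2)": next2_close,
}))
```

The result always has the same length and index as the input; only the values move.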
We can add the shifted result back to the original df as follows:
df['close_1a'] = df['close'].shift(1)
print(df)
date open high low ... psTTM pcfNcfTTM isST close_1a
0 2021-01-04 20.03 21.40 19.93 ... 3.874122 -336.943766 0 NaN
1 2021-01-05 20.51 21.31 19.95 ... 3.864778 -336.131070 0 20.73
2 2021-01-06 20.83 21.28 19.06 ... 3.694713 -321.340003 0 20.68
3 2021-01-07 19.55 20.44 19.21 ... 3.717139 -323.290473 0 19.77
4 2021-01-08 19.89 20.13 18.60 ... 3.642385 -316.788905 0 19.89
5 2021-01-11 18.90 18.90 17.54 ... 3.279829 -285.256300 0 19.49
6 2021-01-12 17.20 18.35 16.83 ... 3.388222 -294.683573 0 17.55
7 2021-01-13 18.20 19.53 18.20 ... 3.505959 -304.923543 0 18.13
8 2021-01-14 18.69 19.62 18.12 ... 3.625566 -315.326052 0 18.76
9 2021-01-15 19.07 19.68 18.13 ... 3.606877 -313.700660 0 19.40
[10 rows x 18 columns]
The last column of df now holds the previous day's closing price for each date.
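As a quick sanity check, the shifted column can be compared with the preclose column that already ships with the data, and then used to recompute the daily percentage change. The sketch below uses the first five rows of the table; the column name `pct_from_shift` is a hypothetical name chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "close":    [20.73, 20.68, 19.77, 19.89, 19.49],
    "preclose": [20.03, 20.73, 20.68, 19.77, 19.89],
})

df["close_1a"] = df["close"].shift(1)

# From the second row on, the shifted close should equal the supplied preclose.
matches = (df["close_1a"].iloc[1:] == df["preclose"].iloc[1:]).all()
print(matches)

# Daily percentage change derived from the shifted column (hypothetical column name).
df["pct_from_shift"] = (df["close"] / df["close_1a"] - 1) * 100
print(df.round(4))
```

The first row has no previous close within this sample, so its computed change is NaN, whereas the preclose column still carries a value for that day.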
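The shift method of DataFrame
DataFrame.shift works much like Series.shift, except that the axis parameter controls the direction of movement. The default, axis=0, shifts every column down along the index; axis=1 shifts every row to the right across the columns.
- Column-wise (axis=0)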
print(df.shift(1, axis=0))
date open high low ... pbMRQ psTTM pcfNcfTTM isST
0 NaN NaN NaN NaN ... NaN NaN NaN NaN
1 2021-01-04 20.03 21.40 19.93 ... 5.001054 3.874122 -336.943766 0.0
2 2021-01-05 20.51 21.31 19.95 ... 4.988992 3.864778 -336.131070 0.0
3 2021-01-06 20.83 21.28 19.06 ... 4.769457 3.694713 -321.340003 0.0
4 2021-01-07 19.55 20.44 19.21 ... 4.798406 3.717139 -323.290473 0.0
5 2021-01-08 19.89 20.13 18.60 ... 4.701908 3.642385 -316.788905 0.0
6 2021-01-11 18.90 18.90 17.54 ... 4.233888 3.279829 -285.256300 0.0
7 2021-01-12 17.20 18.35 16.83 ... 4.373811 3.388222 -294.683573 0.0
8 2021-01-13 18.20 19.53 18.20 ... 4.525797 3.505959 -304.923543 0.0
9 2021-01-14 18.69 19.62 18.12 ... 4.680195 3.625566 -315.326052 0.0
[10 rows x 17 columns]
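Every column moves down by 1 step. Note that the integer column isST is now displayed as 0.0, because introducing NaN converts it to a float column.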
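Suppose we want to add both the previous day's high and the previous day's close to each row. Starting from the originally loaded df (without the close_1a column added earlier), this can be done as follows: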
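import pandas as pd

def shift_i(df, factor_list, i):
    # Shift the selected factor columns by i rows
    shift_df = df[factor_list].shift(i)
    # Rename each shifted column, e.g. high -> high_1a when i = 1
    shift_df = shift_df.rename(columns={x: '{}_{}a'.format(x, i) for x in factor_list})
    # Append the shifted columns to the right of the original columns
    return pd.concat([df, shift_df], axis=1)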
df = shift_i(df, ['high', 'close'], 1)
print(df)
date open high low ... pcfNcfTTM isST high_1a close_1a
0 2021-01-04 20.03 21.40 19.93 ... -336.943766 0 NaN NaN
1 2021-01-05 20.51 21.31 19.95 ... -336.131070 0 21.40 20.73
2 2021-01-06 20.83 21.28 19.06 ... -321.340003 0 21.31 20.68
3 2021-01-07 19.55 20.44 19.21 ... -323.290473 0 21.28 19.77
4 2021-01-08 19.89 20.13 18.60 ... -316.788905 0 20.44 19.89
5 2021-01-11 18.90 18.90 17.54 ... -285.256300 0 20.13 19.49
6 2021-01-12 17.20 18.35 16.83 ... -294.683573 0 18.90 17.55
7 2021-01-13 18.20 19.53 18.20 ... -304.923543 0 18.35 18.13
8 2021-01-14 18.69 19.62 18.12 ... -315.326052 0 19.53 18.76
9 2021-01-15 19.07 19.68 18.13 ... -313.700660 0 19.62 19.40
[10 rows x 19 columns]
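The shift_i function applies shift to every factor in factor_list and renames the new columns; for example, the column produced by moving high down by 1 step is named high_1a. For the 2021-01-05 row, the value in high_1a is therefore the previous day's high.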
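If several lags are needed at once, the same idea extends naturally. The following is one possible sketch, not the author's code: `add_lags` is a hypothetical helper that loops over a list of lags and uses `DataFrame.add_suffix` for the renaming, keeping the `_{i}a` naming convention from shift_i.

```python
import pandas as pd

def add_lags(df, factor_list, lags):
    """Return a new frame with one shifted column per (factor, lag) pair."""
    pieces = [df]
    for i in lags:
        # add_suffix renames every shifted column at once, e.g. high -> high_1a
        pieces.append(df[factor_list].shift(i).add_suffix("_{}a".format(i)))
    return pd.concat(pieces, axis=1)

df = pd.DataFrame({
    "high":  [21.40, 21.31, 21.28, 20.44],
    "close": [20.73, 20.68, 19.77, 19.89],
})
out = add_lags(df, ["high", "close"], [1, 2])
print(out.columns.tolist())
print(out)
```

Collecting the pieces in a list and concatenating once leaves the input df unmodified and avoids growing the frame column by column.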
- Row-wise (axis=1)
print(df.shift(1, axis=1))
date open high low ... pbMRQ psTTM pcfNcfTTM isST
0 NaN 2021-01-04 20.03 21.40 ... -28.520577 5.001054 3.874122 -336.943766
1 NaN 2021-01-05 20.51 21.31 ... -28.451786 4.988992 3.864778 -336.131070
2 NaN 2021-01-06 20.83 21.28 ... -27.199798 4.769457 3.694713 -321.340003
3 NaN 2021-01-07 19.55 20.44 ... -27.364895 4.798406 3.717139 -323.290473
4 NaN 2021-01-08 19.89 20.13 ... -26.814570 4.701908 3.642385 -316.788905
5 NaN 2021-01-11 18.90 18.90 ... -24.145496 4.233888 3.279829 -285.256300
6 NaN 2021-01-12 17.20 18.35 ... -24.943466 4.373811 3.388222 -294.683573
7 NaN 2021-01-13 18.20 19.53 ... -25.810228 4.525797 3.505959 -304.923543
8 NaN 2021-01-14 18.69 19.62 ... -26.690747 4.680195 3.625566 -315.326052
9 NaN 2021-01-15 19.07 19.68 ... -26.553166 4.656071 3.606877 -313.700660
[10 rows x 17 columns]
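Every row moves right by 1 step: the date column becomes NaN, and each remaining column takes the value of the column to its left.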
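The direction of movement is easier to see on a tiny, purely numeric frame. This sketch uses the open, high and close values of the first two rows of the table and keeps all columns as floats to sidestep mixed-type behaviour.

```python
import pandas as pd

df = pd.DataFrame({
    "open":  [20.03, 20.51],
    "high":  [21.40, 21.31],
    "close": [20.73, 20.68],
})

# Column labels stay in place; the values slide one column to the right.
shifted = df.shift(1, axis=1)
print(shifted)
```

Here the open column is all NaN, high holds the original open values, and close holds the original high values.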
So far our stock-data processing has not required a row-wise shift, so we will not go into further detail here.