The Pandas shift Method

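When computing stock factors, we often need data from the previous day or from N days earlier, and this is where the Pandas shift method comes in. This article uses the first 10 daily bars of 2021 for sz.002407 (多氟多), shown in the table below, to demonstrate how shift works. We start by loading the data into a DataFrame object named df.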

date open high low close preclose volume amount adjustflag turn tradestatus pctChg peTTM pbMRQ psTTM pcfNcfTTM isST
2021-01-04 20.0300000000 21.4000000000 19.9300000000 20.7300000000 20.0300000000 69639314 1442558998.1400 2 12.026700 1 3.494800 -28.520577 5.001054 3.874122 -336.943766 0
2021-01-05 20.5100000000 21.3100000000 19.9500000000 20.6800000000 20.7300000000 68155914 1404877488.0600 2 11.770500 1 -0.241200 -28.451786 4.988992 3.864778 -336.131070 0
2021-01-06 20.8300000000 21.2800000000 19.0600000000 19.7700000000 20.6800000000 69108077 1375667187.1400 2 11.935000 1 -4.400400 -27.199798 4.769457 3.694713 -321.340003 0
2021-01-07 19.5500000000 20.4400000000 19.2100000000 19.8900000000 19.7700000000 57333679 1145163382.0700 2 9.901500 1 0.607000 -27.364895 4.798406 3.717139 -323.290473 0
2021-01-08 19.8900000000 20.1300000000 18.6000000000 19.4900000000 19.8900000000 55294505 1063035986.1300 2 9.549300 1 -2.011100 -26.814570 4.701908 3.642385 -316.788905 0
2021-01-11 18.9000000000 18.9000000000 17.5400000000 17.5500000000 19.4900000000 77040397 1371939544.2400 2 13.304900 1 -9.953800 -24.145496 4.233888 3.279829 -285.256300 0
2021-01-12 17.2000000000 18.3500000000 16.8300000000 18.1300000000 17.5500000000 55091872 970628000.0300 2 9.514400 1 3.304800 -24.943466 4.373811 3.388222 -294.683573 0
2021-01-13 18.2000000000 19.5300000000 18.2000000000 18.7600000000 18.1300000000 68535355 1303591461.9500 2 11.836000 1 3.474900 -25.810228 4.525797 3.505959 -304.923543 0
2021-01-14 18.6900000000 19.6200000000 18.1200000000 19.4000000000 18.7600000000 58518764 1109560714.8600 2 10.106200 1 3.411500 -26.690747 4.680195 3.625566 -315.326052 0
2021-01-15 19.0700000000 19.6800000000 18.1300000000 19.3000000000 19.4000000000 55752774 1052651676.0200 2 9.628500 1 -0.515500 -26.553166 4.656071 3.606877 -313.700660 0
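
The article does not show how df is loaded, so the following is only a minimal sketch of one way to do it. It assumes the table is available as whitespace-separated text; for brevity only the date, open, high, low and close columns are embedded here, and the name raw is purely illustrative.

```python
import io

import pandas as pd

# Assumption: a whitespace-separated excerpt of the table above,
# reduced to five columns to keep the example short.
raw = """date open high low close
2021-01-04 20.03 21.40 19.93 20.73
2021-01-05 20.51 21.31 19.95 20.68
2021-01-06 20.83 21.28 19.06 19.77
2021-01-07 19.55 20.44 19.21 19.89
2021-01-08 19.89 20.13 18.60 19.49
2021-01-11 18.90 18.90 17.54 17.55
2021-01-12 17.20 18.35 16.83 18.13
2021-01-13 18.20 19.53 18.20 18.76
2021-01-14 18.69 19.62 18.12 19.40
2021-01-15 19.07 19.68 18.13 19.30
"""

# sep=r'\s+' splits on any run of whitespace
df = pd.read_csv(io.StringIO(raw), sep=r'\s+')
print(df.shape)
```

With the full 17-column data the same call would apply unchanged; the date column stays a plain string unless parse_dates is requested.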

The shift Method of a Series

We take the close column of df, call shift on it, and print the result:

ss = df['close'].shift(1)
print(ss)
0      NaN
1    20.73
2    20.68
3    19.77
4    19.89
5    19.49
6    17.55
7    18.13
8    18.76
9    19.40
Name: close, dtype: float64

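Calling shift on a Series returns another Series. The argument (periods) is the number of positions to move, with a default of 1. In the example above the data moved down by one position, so the first row becomes NaN and the last value (19.30) drops out. We can also pass a negative number to move the data up: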

ss = df['close'].shift(-2)
print(ss)
0    19.77
1    19.89
2    19.49
3    17.55
4    18.13
5    18.76
6    19.40
7    19.30
8      NaN
9      NaN
Name: close, dtype: float64
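
By default the positions vacated by a shift are filled with NaN. If a different placeholder is wanted, shift also accepts a fill_value argument. The sketch below uses the first five closing prices from the table; filling with the first close is only an illustrative choice, not something the article prescribes.

```python
import pandas as pd

close = pd.Series([20.73, 20.68, 19.77, 19.89, 19.49], name='close')

# Default behaviour: the vacated first position becomes NaN
default_shift = close.shift(1)

# fill_value replaces the NaN placeholder with a value of our choosing
# (here the first close, purely for illustration)
filled_shift = close.shift(1, fill_value=close.iloc[0])

# A negative step moves data up, so the NaN values appear at the end
up_shift = close.shift(-2)

print(default_shift)
print(filled_shift)
print(up_shift)
```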

The shifted result can be added back to the original df like this:

df['close_1a'] = df['close'].shift(1)
print(df)
         date   open   high    low  ...     psTTM   pcfNcfTTM  isST  close_1a
0  2021-01-04  20.03  21.40  19.93  ...  3.874122 -336.943766     0       NaN
1  2021-01-05  20.51  21.31  19.95  ...  3.864778 -336.131070     0     20.73
2  2021-01-06  20.83  21.28  19.06  ...  3.694713 -321.340003     0     20.68
3  2021-01-07  19.55  20.44  19.21  ...  3.717139 -323.290473     0     19.77
4  2021-01-08  19.89  20.13  18.60  ...  3.642385 -316.788905     0     19.89
5  2021-01-11  18.90  18.90  17.54  ...  3.279829 -285.256300     0     19.49
6  2021-01-12  17.20  18.35  16.83  ...  3.388222 -294.683573     0     17.55
7  2021-01-13  18.20  19.53  18.20  ...  3.505959 -304.923543     0     18.13
8  2021-01-14  18.69  19.62  18.12  ...  3.625566 -315.326052     0     18.76
9  2021-01-15  19.07  19.68  18.13  ...  3.606877 -313.700660     0     19.40

[10 rows x 18 columns]

The last column of df now holds the previous day's closing price for each date.
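
As an example of why the previous close is useful, the sketch below derives the daily percentage change from close and close_1a. It uses four closing prices from the table; the names df_demo and ret_pct are illustrative. The results should agree with the pctChg column of the source data up to rounding.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'date': ['2021-01-04', '2021-01-05', '2021-01-06', '2021-01-07'],
    'close': [20.73, 20.68, 19.77, 19.89],
})

# Previous day's close, aligned to the current row
df_demo['close_1a'] = df_demo['close'].shift(1)

# Daily change in percent; the first row is NaN because no prior close exists
df_demo['ret_pct'] = (df_demo['close'] / df_demo['close_1a'] - 1) * 100

print(df_demo)
```

The first row stays NaN here because this excerpt has no earlier bar; in the full data set the preclose column supplies that value.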


The shift Method of a DataFrame

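DataFrame.shift works much like Series.shift, with an additional axis parameter that controls the direction of movement. The default, axis=0, shifts the values down each column (along the row index); axis=1 shifts the values across each row (along the columns).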
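
The two directions are easiest to see on a tiny all-numeric frame. The sketch below uses a made-up 3x3 DataFrame named toy, not the stock data.

```python
import pandas as pd

toy = pd.DataFrame({
    'a': [1.0, 2.0, 3.0],
    'b': [4.0, 5.0, 6.0],
    'c': [7.0, 8.0, 9.0],
})

# axis=0: each column moves down one row; the first row becomes NaN
down = toy.shift(1, axis=0)

# axis=1: each row moves right one column; column 'a' becomes NaN
right = toy.shift(1, axis=1)

print(down)
print(right)
```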

  1. Shifting down the columns (axis=0)
print(df.shift(1, axis=0))
         date   open   high    low  ...     pbMRQ     psTTM   pcfNcfTTM  isST
0         NaN    NaN    NaN    NaN  ...       NaN       NaN         NaN   NaN
1  2021-01-04  20.03  21.40  19.93  ...  5.001054  3.874122 -336.943766   0.0
2  2021-01-05  20.51  21.31  19.95  ...  4.988992  3.864778 -336.131070   0.0
3  2021-01-06  20.83  21.28  19.06  ...  4.769457  3.694713 -321.340003   0.0
4  2021-01-07  19.55  20.44  19.21  ...  4.798406  3.717139 -323.290473   0.0
5  2021-01-08  19.89  20.13  18.60  ...  4.701908  3.642385 -316.788905   0.0
6  2021-01-11  18.90  18.90  17.54  ...  4.233888  3.279829 -285.256300   0.0
7  2021-01-12  17.20  18.35  16.83  ...  4.373811  3.388222 -294.683573   0.0
8  2021-01-13  18.20  19.53  18.20  ...  4.525797  3.505959 -304.923543   0.0
9  2021-01-14  18.69  19.62  18.12  ...  4.680195  3.625566 -315.326052   0.0

[10 rows x 17 columns]

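Every column is shifted down by one row. Note that isST now prints as 0.0 rather than 0: the NaN introduced in the first row forces the integer column to float.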

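Suppose we want to add the previous day's high and closing price to each row. Starting again from the original 17-column df (without the close_1a column added earlier), we can do it as follows: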

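import pandas as pd

def shift_i(df, factor_list, i):
    # Shift the selected factor columns down by i rows
    shift_df = df[factor_list].shift(i)
    # Rename each shifted column, e.g. high -> high_1a when i = 1
    shift_df = shift_df.rename(columns={x: '{}_{}a'.format(x, i) for x in factor_list})
    # Append the shifted columns to the right of the original frame
    return pd.concat([df, shift_df], axis=1)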

df = shift_i(df, ['high', 'close'], 1)
print(df)
         date   open   high    low  ...   pcfNcfTTM  isST  high_1a  close_1a
0  2021-01-04  20.03  21.40  19.93  ... -336.943766     0      NaN       NaN
1  2021-01-05  20.51  21.31  19.95  ... -336.131070     0    21.40     20.73
2  2021-01-06  20.83  21.28  19.06  ... -321.340003     0    21.31     20.68
3  2021-01-07  19.55  20.44  19.21  ... -323.290473     0    21.28     19.77
4  2021-01-08  19.89  20.13  18.60  ... -316.788905     0    20.44     19.89
5  2021-01-11  18.90  18.90  17.54  ... -285.256300     0    20.13     19.49
6  2021-01-12  17.20  18.35  16.83  ... -294.683573     0    18.90     17.55
7  2021-01-13  18.20  19.53  18.20  ... -304.923543     0    18.35     18.13
8  2021-01-14  18.69  19.62  18.12  ... -315.326052     0    19.53     18.76
9  2021-01-15  19.07  19.68  18.13  ... -313.700660     0    19.62     19.40

[10 rows x 19 columns]

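The shift_i function shifts every factor in factor_list and renames the resulting columns. For example, the high column shifted down by one step is renamed high_1a, so in the row for 2021-01-05 the high_1a value (21.40) is the previous day's high.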
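
Since factors often need the previous N days rather than just one, the same idea extends to several lags at once. The helper add_lags below is hypothetical (it is not part of the original article); it follows the same naming scheme as shift_i and is only a sketch on a four-row excerpt of the high and close columns.

```python
import pandas as pd


def add_lags(df, factor_list, max_lag):
    """Hypothetical helper: append lagged copies of factor_list for lags 1..max_lag."""
    lagged = [
        # Same renaming rule as shift_i: high shifted by 2 becomes high_2a
        df[factor_list].shift(i).rename(
            columns={x: '{}_{}a'.format(x, i) for x in factor_list}
        )
        for i in range(1, max_lag + 1)
    ]
    # A single concat keeps the original columns first, then each lag in order
    return pd.concat([df] + lagged, axis=1)


demo = pd.DataFrame({
    'high': [21.40, 21.31, 21.28, 20.44],
    'close': [20.73, 20.68, 19.77, 19.89],
})

result = add_lags(demo, ['high', 'close'], 2)
print(result)
```

Building all lagged frames first and concatenating once avoids repeatedly copying df, which matters more as the number of lags grows.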

  2. Shifting across the rows (axis=1)
print(df.shift(1, axis=1))
  date        open   high    low  ...      pbMRQ     psTTM  pcfNcfTTM        isST
0  NaN  2021-01-04  20.03  21.40  ... -28.520577  5.001054   3.874122 -336.943766
1  NaN  2021-01-05  20.51  21.31  ... -28.451786  4.988992   3.864778 -336.131070
2  NaN  2021-01-06  20.83  21.28  ... -27.199798  4.769457   3.694713 -321.340003
3  NaN  2021-01-07  19.55  20.44  ... -27.364895  4.798406   3.717139 -323.290473
4  NaN  2021-01-08  19.89  20.13  ... -26.814570  4.701908   3.642385 -316.788905
5  NaN  2021-01-11  18.90  18.90  ... -24.145496  4.233888   3.279829 -285.256300
6  NaN  2021-01-12  17.20  18.35  ... -24.943466  4.373811   3.388222 -294.683573
7  NaN  2021-01-13  18.20  19.53  ... -25.810228  4.525797   3.505959 -304.923543
8  NaN  2021-01-14  18.69  19.62  ... -26.690747  4.680195   3.625566 -315.326052
9  NaN  2021-01-15  19.07  19.68  ... -26.553166  4.656071   3.606877 -313.700660

[10 rows x 17 columns]

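Every row's values move one column to the right: the date column becomes NaN, the open column now holds the dates, and so on.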

So far our stock-data processing has not needed a row-wise shift, so we will not cover it further here.
