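When working with stock data, you often need to apply a function across several rows or columns. This is what the apply method of Series and DataFrame is for. This article uses the first 10 daily bars of sz.002407 (多氟多) in 2021, shown in the table below, to demonstrate how apply works. We first load the data into a DataFrame object named df.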

date	open	high	low	close	preclose	volume	amount	adjustflag	turn	tradestatus	pctChg	peTTM	pbMRQ	psTTM	pcfNcfTTM
2021-01-04	20.0300000000	21.4000000000	19.9300000000	20.7300000000	20.0300000000	69639314	1442558998.1400	2	12.026700	1	3.494800	-28.520577	5.001054	3.874122	-336.943766
2021-01-05	20.5100000000	21.3100000000	19.9500000000	20.6800000000	20.7300000000	68155914	1404877488.0600	2	11.770500	1	-0.241200	-28.451786	4.988992	3.864778	-336.131070
2021-01-06	20.8300000000	21.2800000000	19.0600000000	19.7700000000	20.6800000000	69108077	1375667187.1400	2	11.935000	1	-4.400400	-27.199798	4.769457	3.694713	-321.340003
2021-01-07	19.5500000000	20.4400000000	19.2100000000	19.8900000000	19.7700000000	57333679	1145163382.0700	2	9.901500	1	0.607000	-27.364895	4.798406	3.717139	-323.290473
2021-01-08	19.8900000000	20.1300000000	18.6000000000	19.4900000000	19.8900000000	55294505	1063035986.1300	2	9.549300	1	-2.011100	-26.814570	4.701908	3.642385	-316.788905
2021-01-11	18.9000000000	18.9000000000	17.5400000000	17.5500000000	19.4900000000	77040397	1371939544.2400	2	13.304900	1	-9.953800	-24.145496	4.233888	3.279829	-285.256300
2021-01-12	17.2000000000	18.3500000000	16.8300000000	18.1300000000	17.5500000000	55091872	970628000.0300	2	9.514400	1	3.304800	-24.943466	4.373811	3.388222	-294.683573
2021-01-13	18.2000000000	19.5300000000	18.2000000000	18.7600000000	18.1300000000	68535355	1303591461.9500	2	11.836000	1	3.474900	-25.810228	4.525797	3.505959	-304.923543
2021-01-14	18.6900000000	19.6200000000	18.1200000000	19.4000000000	18.7600000000	58518764	1109560714.8600	2	10.106200	1	3.411500	-26.690747	4.680195	3.625566	-315.326052
2021-01-15	19.0700000000	19.6800000000	18.1300000000	19.3000000000	19.4000000000	55752774	1052651676.0200	2	9.628500	1	-0.515500	-26.553166	4.656071	3.606877	-313.700660
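The article does not show how the data is loaded into df. Here is a minimal sketch. It assumes the table above is available as tab-separated text, which is inlined here with io.StringIO and cut down to a few rows and columns for brevity (the variable name `raw` is hypothetical). It reads the data with pandas.read_csv, using sep='\t' and keeping date as a str:

```python
import io

import pandas as pd

# Hypothetical inline copy of the first rows of the table above,
# trimmed to a few columns; in practice you would read the real file.
raw = (
    "date\topen\thigh\tlow\tclose\tpreclose\n"
    "2021-01-04\t20.03\t21.40\t19.93\t20.73\t20.03\n"
    "2021-01-05\t20.51\t21.31\t19.95\t20.68\t20.73\n"
    "2021-01-06\t20.83\t21.28\t19.06\t19.77\t20.68\n"
)

# sep="\t" because the columns are tab-separated; dtype keeps date as str
df = pd.read_csv(io.StringIO(raw), sep="\t", dtype={"date": str})
print(df.dtypes)
```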

The Series apply method

In pandas, selecting a single row or a single column from a DataFrame returns a Series object, for example:

# Take one column
print(type(df['date']))

<class 'pandas.core.series.Series'>

# Take one row
print(type(df.iloc[0]))

<class 'pandas.core.series.Series'>
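The following sketch makes this concrete with a toy frame whose values are copied from the first two rows of the table. A single label in [] gives a Series, and a list of labels gives a DataFrame. A row taken with iloc is a Series whose name is the row label and whose index holds the column names:

```python
import pandas as pd

# Toy frame built from the first two rows of the table
df = pd.DataFrame({
    "date": ["2021-01-04", "2021-01-05"],
    "open": [20.03, 20.51],
    "close": [20.73, 20.68],
})

col = df["date"]       # one column label -> Series
cols = df[["date"]]    # list of labels -> DataFrame, even with one label
row = df.iloc[0]       # one row by position -> Series

print(type(col).__name__, type(cols).__name__, type(row).__name__)
print(row.name)         # the row's index label
print(list(row.index))  # the column names
```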

We will use the date column to introduce the Series apply method.

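In the original df, every element of the date column is a str. Our first task is to remove the "-" separators between year, month and day, i.e. to turn "2021-01-04" into "20210104".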

First, print the original date column:

print(df['date'])

0    2021-01-04
1    2021-01-05
2    2021-01-06
3    2021-01-07
4    2021-01-08
5    2021-01-11
6    2021-01-12
7    2021-01-13
8    2021-01-14
9    2021-01-15
Name: date, dtype: object

Next we define a function that takes the original string and returns it with the replacement applied. Internally it simply calls the replace method of str:

def str_replace(s):
    return s.replace('-', '')

We then pass our str_replace function to the Series apply method, which calls it on every element of the Series:

sr = df['date'].apply(str_replace)
print(sr)

0    20210104
1    20210105
2    20210106
3    20210107
4    20210108
5    20210111
6    20210112
7    20210113
8    20210114
9    20210115
Name: date, dtype: object

The output shows that every date was modified as expected.
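Note that apply builds and returns a new Series and leaves df['date'] unchanged. The quick check below uses a toy frame holding the first three dates:

```python
import pandas as pd

def str_replace(s):
    # Remove every '-' from the date string
    return s.replace("-", "")

df = pd.DataFrame({"date": ["2021-01-04", "2021-01-05", "2021-01-06"]})

sr = df["date"].apply(str_replace)
print(sr.tolist())          # ['20210104', '20210105', '20210106']
print(df["date"].tolist())  # unchanged: ['2021-01-04', '2021-01-05', '2021-01-06']
```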

A function that takes extra parameters can also be applied to every element of a Series with apply. Let's extend str_replace:

def str_replace_v2(s, c=''):
    return s.replace('-', c)

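By default this function replaces "-" with the empty string, but a different replacement string c can be passed. When using it with apply, just supply c as a keyword argument: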

sr = df['date'].apply(str_replace_v2, c='/')
print(sr)

0    2021/01/04
1    2021/01/05
2    2021/01/06
3    2021/01/07
4    2021/01/08
5    2021/01/11
6    2021/01/12
7    2021/01/13
8    2021/01/14
9    2021/01/15
Name: date, dtype: object
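Series.apply also accepts extra positional arguments through its args parameter, a tuple passed after each element. The sketch below reuses str_replace_v2 to show that both styles give the same result:

```python
import pandas as pd

def str_replace_v2(s, c=""):
    # Replace every '-' in s with c (empty string by default)
    return s.replace("-", c)

dates = pd.Series(["2021-01-04", "2021-01-05"], name="date")

by_kw = dates.apply(str_replace_v2, c="/")        # keyword argument
by_pos = dates.apply(str_replace_v2, args=("/",))  # positional, via args

print(by_pos.tolist())  # ['2021/01/04', '2021/01/05']
```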

Alternatively, the same result can be obtained with the Series str accessor, without defining a function at all:

sr = df['date'].str.replace('-', '', regex=False)
print(sr)
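The sketch below compares the two approaches on a toy Series. Passing regex=False makes str.replace treat the pattern as a literal string, so the behavior does not depend on the pandas version's default for regex:

```python
import pandas as pd

dates = pd.Series(["2021-01-04", "2021-01-05", "2021-01-06"], name="date")

# Element-wise Python function via apply
via_apply = dates.apply(lambda s: s.replace("-", ""))
# Vectorized string method; regex=False treats '-' literally
via_str = dates.str.replace("-", "", regex=False)

print(via_apply.tolist() == via_str.tolist())  # True
```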

The result of an operation on a column Series can be stored as a new column of df:

df['new_date'] = df['date'].apply(str_replace_v2, c='/')
print(df)

         date   open   high    low  ...     psTTM   pcfNcfTTM  isST    new_date
0  2021-01-04  20.03  21.40  19.93  ...  3.874122 -336.943766     0  2021/01/04
1  2021-01-05  20.51  21.31  19.95  ...  3.864778 -336.131070     0  2021/01/05
2  2021-01-06  20.83  21.28  19.06  ...  3.694713 -321.340003     0  2021/01/06
3  2021-01-07  19.55  20.44  19.21  ...  3.717139 -323.290473     0  2021/01/07
4  2021-01-08  19.89  20.13  18.60  ...  3.642385 -316.788905     0  2021/01/08
5  2021-01-11  18.90  18.90  17.54  ...  3.279829 -285.256300     0  2021/01/11
6  2021-01-12  17.20  18.35  16.83  ...  3.388222 -294.683573     0  2021/01/12
7  2021-01-13  18.20  19.53  18.20  ...  3.505959 -304.923543     0  2021/01/13
8  2021-01-14  18.69  19.62  18.12  ...  3.625566 -315.326052     0  2021/01/14
9  2021-01-15  19.07  19.68  18.13  ...  3.606877 -313.700660     0  2021/01/15

[10 rows x 18 columns]

The output shows that df now has a new column, new_date, holding the result of apply.
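If you would rather not modify df in place, DataFrame.assign is a related option. It returns a new DataFrame with the extra column and leaves the original unchanged. The sketch below shows this on a toy frame:

```python
import pandas as pd

def str_replace_v2(s, c=""):
    # Replace every '-' in s with c
    return s.replace("-", c)

df = pd.DataFrame({"date": ["2021-01-04", "2021-01-05"], "close": [20.73, 20.68]})

# assign returns a new DataFrame; df keeps its original two columns
df2 = df.assign(new_date=df["date"].apply(str_replace_v2, c="/"))
print(df2.columns.tolist())  # ['date', 'close', 'new_date']
print(df.columns.tolist())   # ['date', 'close']
```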

The DataFrame apply method

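DataFrame.apply works much like Series.apply, but you must also specify whether the function is applied column by column or row by row. This is set with the axis parameter. The default, axis=0, calls the function once per column, passing that column as a Series. With axis=1, the function is called once per row, receiving that row as a Series.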
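To see what the function actually receives, the sketch below simply returns the name of each Series it is given. With axis=0 that name is the column label, and with axis=1 it is the row label:

```python
import pandas as pd

df = pd.DataFrame({
    "open": [20.03, 20.51, 20.83],
    "close": [20.73, 20.68, 19.77],
})

# axis=0 (default): the function is called once per column
per_col = df.apply(lambda s: s.name)
# axis=1: the function is called once per row
per_row = df.apply(lambda s: s.name, axis=1)

print(per_col.tolist())  # ['open', 'close']
print(per_row.tolist())  # [0, 1, 2]
```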

Applying by column

Suppose we want the mean of each of the open, high, low and close columns. First select these four columns and check the type of the result:

print(type(df[['open', 'high', 'low', 'close']]))

<class 'pandas.core.frame.DataFrame'>

So df[['open', 'high', 'low', 'close']] is a DataFrame.

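Next, define a function that computes the mean:

import numpy

def mean_value(col):
    # With the default axis=0, col is one column of the DataFrame (a Series)
    return numpy.mean(col)

numpy's mean function does the actual computation here. Finally, use apply to compute the mean of each column: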

print(df[['open', 'high', 'low', 'close']].apply(mean_value))

open     19.287
high     20.064
low      18.557
close    19.370
dtype: float64

Once again, apply returns a Series, this time holding the mean of each column.

The code above only demonstrates how apply works. The same result can be obtained more directly with:

print(df[['open', 'high', 'low', 'close']].mean())
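The sketch below checks that apply with numpy.mean and the built-in DataFrame.mean agree on the ten open/high/low/close rows of the table:

```python
import numpy as np
import pandas as pd

# open/high/low/close of the ten rows in the table
df = pd.DataFrame({
    "open":  [20.03, 20.51, 20.83, 19.55, 19.89, 18.90, 17.20, 18.20, 18.69, 19.07],
    "high":  [21.40, 21.31, 21.28, 20.44, 20.13, 18.90, 18.35, 19.53, 19.62, 19.68],
    "low":   [19.93, 19.95, 19.06, 19.21, 18.60, 17.54, 16.83, 18.20, 18.12, 18.13],
    "close": [20.73, 20.68, 19.77, 19.89, 19.49, 17.55, 18.13, 18.76, 19.40, 19.30],
})

def mean_value(col):
    # Called once per column with the default axis=0
    return np.mean(col)

via_apply = df.apply(mean_value)
via_method = df.mean()  # built-in per-column mean

print(via_method.round(3).tolist())         # [19.287, 20.064, 18.557, 19.37]
print(np.allclose(via_apply, via_method))   # True
```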

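To access individual elements inside the applied function, use the index. In our example the rows are indexed 0 to 9. To compute the mean of rows 0, 2 and 5, the code can be modified as follows:

def mean_value_v2(col):
    # col is one column; col[0], col[2], col[5] look up index labels 0, 2 and 5
    return (col[0] + col[2] + col[5]) / 3

print(df[['open', 'high', 'low', 'close']].apply(mean_value_v2))

open     19.920000
high     20.526667
low      18.843333
close    19.350000
dtype: float64

As the code shows, individual elements are accessed as col[0], col[2] and so on. Because the index here is the integer range 0 to 9, col[0] looks up the index label 0, which happens to coincide with position 0. col.iloc[0] always selects by position.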
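On a Series with an integer index, [] and .loc look elements up by label, not by position. Label and position coincide only when the index is 0, 1, 2, and so on. Once a frame has been sorted or filtered, they can differ. The sketch below uses a deliberately reversed index (hypothetical data) to show the difference between .loc (labels) and .iloc (positions):

```python
import pandas as pd

# open prices of the first six rows, with the index reversed (5..0)
col = pd.Series([20.03, 20.51, 20.83, 19.55, 19.89, 18.90], index=[5, 4, 3, 2, 1, 0])

print(col.loc[0])   # label 0 -> the last value, 18.9
print(col.iloc[0])  # position 0 -> the first value, 20.03
```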

Applying by row

Applying by row lets you compute values from one or more columns within each row of a DataFrame.

Take the amplitude, amp = (high - low) / preclose, as an example:

def amp(row):
    # With axis=1, row is one row of df (a Series indexed by column names)
    return (row['high'] - row['low']) / row['preclose']

print(df.apply(amp, axis=1))

0    0.073390
1    0.065605
2    0.107350
3    0.062215
4    0.076923
5    0.069779
6    0.086610
7    0.073359
8    0.079957
9    0.079897
dtype: float64
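For simple arithmetic like this, the row-wise apply can be replaced by vectorized column arithmetic. This operates on whole columns at once and is generally much faster on large frames. The sketch below checks on the ten rows of the table that both give the amplitudes printed above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "high":     [21.40, 21.31, 21.28, 20.44, 20.13, 18.90, 18.35, 19.53, 19.62, 19.68],
    "low":      [19.93, 19.95, 19.06, 19.21, 18.60, 17.54, 16.83, 18.20, 18.12, 18.13],
    "preclose": [20.03, 20.73, 20.68, 19.77, 19.89, 19.49, 17.55, 18.13, 18.76, 19.40],
})

# Row-wise: the lambda is called once per row
row_wise = df.apply(lambda x: (x["high"] - x["low"]) / x["preclose"], axis=1)
# Vectorized: one arithmetic operation per column
vectorized = (df["high"] - df["low"]) / df["preclose"]

print(np.allclose(row_wise, vectorized))  # True
```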

The same result can also be obtained with either of the following lambda expressions:

print(df.apply(lambda x: (x.high - x.low) / x.preclose, axis=1))
print(df.apply(lambda x: (x['high'] - x['low']) / x['preclose'], axis=1))
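Attribute access such as x.high is convenient, but it only works when the column name is a valid Python identifier that does not clash with an existing Series attribute. The sketch below uses a hypothetical column called name. Inside a row-wise apply, x.name is the row's index label, not the column value. x['name'] returns the column value:

```python
import pandas as pd

# Hypothetical frame: the column 'name' clashes with the Series.name attribute
df = pd.DataFrame(
    {"name": ["sz.002407", "sz.002407"], "high": [21.40, 21.31]},
    index=[10, 11],
)

via_attr = df.apply(lambda x: x.name, axis=1)     # row label, not the column
via_item = df.apply(lambda x: x["name"], axis=1)  # the column value

print(via_attr.tolist())  # [10, 11]
print(via_item.tolist())  # ['sz.002407', 'sz.002407']
```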

To save the result as a column of df:

df['amp'] = df.apply(lambda x: (x.high - x.low) / x.preclose, axis=1)
print(df)

         date   open   high    low  ...     psTTM   pcfNcfTTM  isST       amp
0  2021-01-04  20.03  21.40  19.93  ...  3.874122 -336.943766     0  0.073390
1  2021-01-05  20.51  21.31  19.95  ...  3.864778 -336.131070     0  0.065605
2  2021-01-06  20.83  21.28  19.06  ...  3.694713 -321.340003     0  0.107350
3  2021-01-07  19.55  20.44  19.21  ...  3.717139 -323.290473     0  0.062215
4  2021-01-08  19.89  20.13  18.60  ...  3.642385 -316.788905     0  0.076923
5  2021-01-11  18.90  18.90  17.54  ...  3.279829 -285.256300     0  0.069779
6  2021-01-12  17.20  18.35  16.83  ...  3.388222 -294.683573     0  0.086610
7  2021-01-13  18.20  19.53  18.20  ...  3.505959 -304.923543     0  0.073359
8  2021-01-14  18.69  19.62  18.12  ...  3.625566 -315.326052     0  0.079957
9  2021-01-15  19.07  19.68  18.13  ...  3.606877 -313.700660     0  0.079897

[10 rows x 18 columns]
