

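I want to add percentages on top of the bars, calculated by hue. That means all the red bars together equal 100%, and all the blue bars together equal 100%.

I can make the blue bars equal 100%, but not the red bars. Which parts need to be changed?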


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# sample data
rows = 100000
data = {'Call_ID': np.random.normal(10000, 8000, size=rows).astype(int),
        'with_client_nmbr': np.random.choice([False, True], size=rows, p=[.17, .83]),
        'Type_of_Caller': np.random.choice(['Agency', 'EE', 'ER'], size=rows, p=[.06, .77, .17])}
all_call = pd.DataFrame(data)

   Call_ID  with_client_nmbr Type_of_Caller
0    11343              True             EE
1    14188              True         Agency
2    16539             False             EE
3    23630              True             ER
4    -7175              True             EE


df_agg= all_call.groupby(['Type_of_Caller','with_client_nmbr'])['Call_ID'].nunique().reset_index()

ax = sns.barplot(x='Type_of_Caller', y='Call_ID', hue='with_client_nmbr',
                 data=df_agg,palette=['orangered', 'skyblue'])

hue_order = all_call['with_client_nmbr'].unique()
df_f = sum(all_call.query("with_client_nmbr==False").groupby('Type_of_Caller')['Call_ID'].nunique())
df_t = sum(all_call.query("with_client_nmbr==True").groupby('Type_of_Caller')['Call_ID'].nunique())

for bars in ax.containers:
    if bars.get_label() == hue_order[0]:
        group_total = df_f
    else:
        group_total = df_t
    for p in ax.patches:
        width = p.get_width()
        height = p.get_height()
        x, y = p.get_xy()
        ax.annotate(f'{(height/group_total):.1%}', (x + width/2, y + height*1.02), ha='center')
  • print(hue_order) is ['False', 'True']

  • It's typically not necessary to use seaborn to plot grouped bars; it's just a matter of shaping the dataframe, usually with .pivot https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot.html or .pivot_table https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pivot_table.html. See How to create a grouped bar plot https://stackoverflow.com/q/47796264/7758804 for more examples.
    • In this case, using pandas.DataFrame.plot https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html with a wide dataframe is easier than using seaborn.barplot https://seaborn.pydata.org/generated/seaborn.barplot.html with a long dataframe, because the column / bar order and totals coincide.
    • This reduces the code from 16 lines to 8 lines.
  • See this answer https://stackoverflow.com/a/68851142/7758804 for adding annotations as a percent of the entire population.
  • Tested in python 3.8.11, pandas 1.3.1, and matplotlib 3.4.2


import pandas as pd
import matplotlib.pyplot as plt

# transform the sample data from the OP with pivot_table
dfp = all_call.pivot_table(index='Type_of_Caller', columns='with_client_nmbr', values='Call_ID', aggfunc='nunique')

# display(dfp)
with_client_nmbr  False   True
Agency              994   4593
EE                10554  27455
ER                 2748  11296
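
As a rough sketch of the arithmetic the labels will encode: dividing each column of the pivoted frame by its own total gives percentages that sum to 100 within each hue group. The frame here is rebuilt by hand from the sample output above so the snippet runs on its own; the names `col_totals` and `pct` are illustrative only.

```python
import pandas as pd

# counts copied from the sample pivot_table output shown above
dfp = pd.DataFrame(
    {False: [994, 10554, 2748], True: [4593, 27455, 11296]},
    index=pd.Index(['Agency', 'EE', 'ER'], name='Type_of_Caller'),
)
dfp.columns.name = 'with_client_nmbr'

# one total per hue group (i.e. per column)
col_totals = dfp.sum()

# divide every column by its own total, so each column of pct sums to 100
pct = dfp.div(col_totals, axis=1) * 100

print(pct.round(2))
```

Because the totals are taken per column, the False bars and the True bars are each normalised against their own group rather than against the whole population.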

Use matplotlib.pyplot.bar_label https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.bar_label.html

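  • Requires matplotlib >= 3.4.2
  • Each column is plotted in order, and the pandas.Series created by dfp.sum() has the same order as the dataframe columns. Therefore, zip totals to the plot containers and use the value, tot, in labels to calculate the percentage by hue group.
  • Add custom annotations based on percent by hue group, by using the labels parameter.
    • (v.get_height()/tot)*100 in the list comprehension calculates the percentage.
  • See this answer https://stackoverflow.com/a/67561982/7758804 for other options using .bar_label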
# get the total value for the column
totals = dfp.sum()

# plot
p1 = dfp.plot(kind='bar', figsize=(8, 4), rot=0, color=['orangered', 'skyblue'], ylabel='Value of Bar', title="The value and percentage (by hue group)")

# add annotations
for tot, p in zip(totals, p1.containers):
    labels = [f'{(v.get_height()/tot)*100:0.2f}%' for v in p]
    p1.bar_label(p, labels=labels, label_type='edge', fontsize=8, rotation=0, padding=2)
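
If you would rather keep the ax.annotate approach from the question, the part to change is the inner loop: it walks ax.patches (every bar on the axes) inside the loop over ax.containers, so each bar is annotated once per container. A minimal sketch of the fix follows, under some assumptions: plain matplotlib bars stand in for the seaborn plot, the counts are copied from the sample output above, and names such as `counts`, `group_totals` and `annotations` are made up for illustration.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# counts copied from the sample pivot_table output above
callers = ['Agency', 'EE', 'ER']
counts = {False: [994, 10554, 2748], True: [4593, 27455, 11296]}
colors = {False: 'orangered', True: 'skyblue'}

fig, ax = plt.subplots(figsize=(8, 4))
x = np.arange(len(callers))
width = 0.4

# draw one bar container per hue level, side by side
for i, (hue, values) in enumerate(counts.items()):
    ax.bar(x + (i - 0.5) * width, values, width, color=colors[hue], label=str(hue))
ax.set_xticks(x)
ax.set_xticklabels(callers)
ax.legend(title='with_client_nmbr')

# pair each container with the total of its own hue group,
# and annotate only the bars that belong to that container
group_totals = [sum(v) for v in counts.values()]
annotations = []
for bars, group_total in zip(ax.containers, group_totals):
    for p in bars:
        height = p.get_height()
        text = f'{height / group_total:.1%}'
        ax.annotate(text, (p.get_x() + p.get_width() / 2, height),
                    ha='center', va='bottom')
        annotations.append(text)
```

Zipping the containers with the totals also removes the need to compare bars.get_label() against hue_order, since the pairing is done by position.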


