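You can first create a mask from the MultiIndex http://pandas.pydata.org/pandas-docs/stable/generated/pandas.MultiIndex.html: compare each level with 0 and check whether at least one comparison is True (that is, at least one level equals 0) with any http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.any.html: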
mask = (pd.DataFrame(df.index.values.tolist(), index=df.index) == 0).any(axis=1)
print (mask)
mun  loc  geo  block
1    0    0    0         True
     1    0    0         True
          1    0         True
               1        False
               2        False
          2    0         True
               1        False
               2        False
     2    1    1        False
               2        False
2    1    1    1        False
               2        False
          2    1        False
               2        False
dtype: bool
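The answer does not show df itself, so here is a minimal sketch of a frame that reproduces the mask above. The index labels follow the printed output, while the data1 values are an assumption inferred from the results shown in this answer. It also checks that MultiIndex.to_frame gives the same mask as the list-based construction.

```python
import pandas as pd

# Hypothetical sample frame: index labels follow the printed output,
# data1 values are inferred from the results shown in this answer.
index = pd.MultiIndex.from_tuples(
    [
        (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1), (1, 1, 1, 2),
        (1, 1, 2, 0), (1, 1, 2, 1), (1, 1, 2, 2), (1, 2, 1, 1), (1, 2, 1, 2),
        (2, 1, 1, 1), (2, 1, 1, 2), (2, 1, 2, 1), (2, 1, 2, 2),
    ],
    names=["mun", "loc", "geo", "block"],
)
df = pd.DataFrame(
    {"data1": [20, 10, 10, 3, 4, 30, 1, 3, 10, 12, 123, 7, 6, 1]},
    index=index,
)

# One column per index level, compared with 0; a row is flagged if any level is 0.
levels = pd.DataFrame(df.index.values.tolist(), index=df.index)
mask = (levels == 0).any(axis=1)

# Same idea using MultiIndex.to_frame, which keeps the level names as columns.
mask_alt = (df.index.to_frame() == 0).any(axis=1)

print(mask.sum())  # number of flagged rows
```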
Then get the max http://pandas.pydata.org/pandas-docs/stable/generated/pandas.core.groupby.GroupBy.max.html values with groupby http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html over the first, second and third index levels, but first filter with boolean indexing http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing so that only the rows that are not True in mask are used:
df1 = df.loc[~mask, 'data1'].groupby(level=['mun','loc','geo']).max()
print (df1)
mun  loc  geo
1    1    1        4
          2        3
     2    1       12
2    1    1      123
          2        6
Name: data1, dtype: int64
Then reindex http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reindex.html df1 by df.index with its last level removed by reset_index http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.reset_index.html, use mask http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.mask.html to blank out the rows flagged in mask, which must stay unchanged (the last level of mask has to be removed too), and fillna http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.fillna.html with 1, because dividing by 1 returns the same value.
df1 = (df1.reindex(df.reset_index(level=3, drop=True).index)
          .mask(mask.reset_index(level=3, drop=True))
          .fillna(1))
print (df1)
mun  loc  geo
1    0    0        1.0
     1    0        1.0
          1        1.0
          1        4.0
          1        4.0
          2        1.0
          2        3.0
          2        3.0
     2    1       12.0
          1       12.0
2    1    1      123.0
          1      123.0
          2        6.0
          2        6.0
Name: data1, dtype: float64
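To see why the reindex, mask and fillna chain yields a usable divisor, here is a toy sketch on a single-level index. The names group_max, row_keys and skip are hypothetical and not part of the answer. The mask is passed as a NumPy array here, which sidesteps label alignment on the repeated index.

```python
import pandas as pd

# Hypothetical per-group maxima, keyed by a single group label.
group_max = pd.Series({"a": 4, "b": 12})

# Row-level keys: labels repeat, and "z" has no group maximum.
row_keys = pd.Index(["z", "a", "a", "b"])

# Rows that must keep their own value (the role of `mask` in the answer).
skip = pd.Series([True, True, False, False], index=row_keys)

divisor = (
    group_max.reindex(row_keys)  # repeat each maximum per row; "z" becomes NaN
    .mask(skip.to_numpy())       # blank out the flagged rows as well
    .fillna(1)                   # dividing by 1 leaves those rows unchanged
)
print(divisor.tolist())  # [1.0, 1.0, 4.0, 12.0]
```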
Finally, divide with div http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.div.html:
print (df['data1'].div(df1.values,axis=0))
mun  loc  geo  block
1    0    0    0        20.000000
     1    0    0        10.000000
          1    0        10.000000
               1         0.750000
               2         1.000000
          2    0        30.000000
               1         0.333333
               2         1.000000
     2    1    1         0.833333
               2         1.000000
2    1    1    1         1.000000
               2         0.056911
          2    1         1.000000
               2         0.166667
dtype: float64
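Putting the steps together, the whole approach can be sketched as one helper. The function name divide_by_group_max and its parameters are hypothetical, and the sample frame is an assumption: its data1 values are inferred from the outputs printed in this answer, not taken from the original question.

```python
import pandas as pd

def divide_by_group_max(df, column, drop_level):
    """Divide each value by its group maximum; rows with any 0 index level stay unchanged."""
    # Flag rows in which any index level equals 0.
    levels = pd.DataFrame(df.index.values.tolist(), index=df.index)
    mask = (levels == 0).any(axis=1)

    # Group by every index level except `drop_level`, using unflagged rows only.
    group_levels = [name for name in df.index.names if name != drop_level]
    group_max = df.loc[~mask, column].groupby(level=group_levels).max()

    # One group key per original row, then neutralise flagged and missing rows.
    row_keys = df.index.droplevel(drop_level)
    divisor = group_max.reindex(row_keys).mask(mask.to_numpy()).fillna(1)

    # Positional division, so the original 4-level index is kept.
    return df[column].div(divisor.to_numpy(), axis=0)

# Hypothetical sample data inferred from the outputs in this answer.
index = pd.MultiIndex.from_tuples(
    [
        (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1), (1, 1, 1, 2),
        (1, 1, 2, 0), (1, 1, 2, 1), (1, 1, 2, 2), (1, 2, 1, 1), (1, 2, 1, 2),
        (2, 1, 1, 1), (2, 1, 1, 2), (2, 1, 2, 1), (2, 1, 2, 2),
    ],
    names=["mun", "loc", "geo", "block"],
)
df = pd.DataFrame(
    {"data1": [20, 10, 10, 3, 4, 30, 1, 3, 10, 12, 123, 7, 6, 1]},
    index=index,
)

result = divide_by_group_max(df, column="data1", drop_level="block")
print(result)
```

Passing the divisor as a NumPy array mirrors the df1.values call above: the division is positional, so the repeated three-level labels never need to be aligned against the four-level index.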