Chained indexing
As the documentation (https://pandas.pydata.org/pandas-docs/stable/indexing.html#returning-a-view-versus-a-copy) and a couple of other answers on this site ([1] https://stackoverflow.com/questions/21463589/pandas-chained-assignments, [2] https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas) suggest, chained indexing is considered bad practice and should be avoided.
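Since there does not seem to be a graceful way of making assignments with integer-position-based indexing (i.e. .iloc) without violating the chained indexing rules (as of pandas v0.23.4), it is advisable to use label-based indexing (i.e. .loc) for assignment purposes whenever possible.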
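As a minimal sketch of what label-based assignment looks like, assuming a small frame with a string index (the index labels 'x', 'y', 'z' and the name `last_label` are illustrative, not from the question):

```python
import pandas as pd

df = pd.DataFrame({'a': [4, 5, 6], 'c': [3, 2, 1]}, index=['x', 'y', 'z'])

# One .loc call with both the row label and the column label:
# no intermediate object is created, so this is not chained indexing.
df.loc['z', 'c'] = 42

# If only the position of the row is known, its label can be looked up first.
last_label = df.index[-1]
df.loc[last_label, 'a'] = 0
```

Looking the label up through `df.index[-1]` only identifies a single row if the index labels are unique.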
However, if you absolutely need to access data by row number, you can do
df.iloc[-1, df.columns.get_loc('c')] = 42
or
df.iloc[[-1, 1], df.columns.get_indexer(['a', 'c'])] = 42
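To make the two calls above concrete, here is a small sketch that shows what `get_loc` and `get_indexer` return and where the values end up. The frame and the names `col_pos`, `col_positions` and `after_single` are assumptions for illustration, and the second assignment writes 0 instead of 42 so the two steps can be told apart:

```python
import pandas as pd

df = pd.DataFrame({'a': [4, 5, 6], 'c': [3, 2, 1]})

# get_loc translates one column label into its integer position
# (an int as long as the column labels are unique).
col_pos = df.columns.get_loc('c')

# Row and column are both given by position in a single .iloc call.
df.iloc[-1, col_pos] = 42
after_single = df['c'].tolist()

# get_indexer does the same for several labels at once and returns an array.
col_positions = df.columns.get_indexer(['a', 'c'])
df.iloc[[-1, 1], col_positions] = 0
```

Because each assignment is a single indexing operation on `df` itself, the write goes to the original frame rather than to a temporary object.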
Pandas behaving oddly
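From my understanding, you are absolutely right to expect the warning when trying to reproduce the error artificially.
What I have found so far is that it depends on how the dataframe is constructed: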
df = pd.DataFrame({'a': [4, 5, 6], 'c': [3, 2, 1]})
df.iloc[-1]['c'] = 42 # no warning
df = pd.DataFrame({'a': ['x', 'y', 'z'], 'c': ['t', 'u', 'v']})
df.iloc[-1]['c'] = 'f' # no warning
df = pd.DataFrame({'a': ['x', 'y', 'z'], 'c': [3, 2, 1]})
df.iloc[-1]['c'] = 42 # SettingWithCopyWarning: ...
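The mixed-dtype case is also the one where the chained write is most likely to be lost. The sketch below assumes a pandas setup where chained assignment produces a warning rather than an exception (the default), and suppresses that warning only to keep the output quiet. The names `after_chained` and `after_direct` are illustrative:

```python
import warnings

import pandas as pd

df = pd.DataFrame({'a': ['x', 'y', 'z'], 'c': [3, 2, 1]})

with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # silence the chained-assignment warning
    # df.iloc[-1] has to build a new Series out of columns with different
    # dtypes, so this write is expected to land on a temporary object.
    df.iloc[-1]['c'] = 42

after_chained = df['c'].tolist()

# A single .iloc call addresses the frame itself.
df.iloc[-1, df.columns.get_loc('c')] = 42
after_direct = df['c'].tolist()
```

Comparing `after_chained` with `after_direct` shows whether the chained form actually reached `df`. For single-dtype frames the outcome can differ between pandas versions, which is one more reason not to rely on it.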
It seems that pandas (at least v0.23.4) handles mixed-type and single-type dataframes differently when it comes to chained assignments [3] https://github.com/pandas-dev/pandas/blob/d9c814fd38f6ff73c53f286fdc71ca9512b81aef/pandas/core/generic.py#L3159
def _check_is_chained_assignment_possible(self):
    """
    Check if we are a view, have a cacher, and are of mixed type.
    If so, then force a setitem_copy check.
    Should be called just near setting a value
    Will return a boolean if it we are a view and are cached, but a
    single-dtype meaning that the cacher should be updated following
    setting.
    """
    if self._is_view and self._is_cached:
        ref = self._get_cacher()
        if ref is not None and ref._is_mixed_type:
            self._check_setitem_copy(stacklevel=4, t='referant',
                                     force=True)
        return True
    elif self._is_copy:
        self._check_setitem_copy(stacklevel=4, t='referant')
    return False
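The check above branches on the private attribute `_is_mixed_type` of the parent frame. As a rough public-API approximation (an assumption on my part, not what pandas does internally), one can count the distinct column dtypes to see which of the three example frames fall into the mixed-type branch. `has_mixed_dtypes` is a hypothetical helper:

```python
import pandas as pd


def has_mixed_dtypes(frame):
    """Hypothetical helper: True if the columns do not all share one dtype."""
    return frame.dtypes.nunique() > 1


all_ints = pd.DataFrame({'a': [4, 5, 6], 'c': [3, 2, 1]})
all_strings = pd.DataFrame({'a': ['x', 'y', 'z'], 'c': ['t', 'u', 'v']})
mixed = pd.DataFrame({'a': ['x', 'y', 'z'], 'c': [3, 2, 1]})

results = {
    'all_ints': has_mixed_dtypes(all_ints),
    'all_strings': has_mixed_dtypes(all_strings),
    'mixed': has_mixed_dtypes(mixed),
}
```

Only the third frame is reported as mixed, which matches the one case above that produced the SettingWithCopyWarning.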
This appears really odd to me, although I am not sure whether it is unintended.
However, there is an old bug (https://github.com/pandas-dev/pandas/issues/9767) with similar behaviour.
UPDATE
According to the developers (https://github.com/pandas-dev/pandas/issues/24315), the behaviour described above is expected.