How to compute precision, recall, accuracy and f1-score for the multiclass case with scikit learn?

2024-02-19

I'm working on a sentiment analysis problem. The data looks like this:

label instances
    5    1190
    4     838
    3     239
    1     204
    2     127

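So my data is unbalanced, since 1190 instances are labeled with 5. For the classification I'm using scikit's SVC http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html. The problem is that I don't know how to balance my data in the right way in order to accurately compute the precision, recall, accuracy and f1-score for the multiclass case. So I tried the following approaches: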

First:

    wclf = SVC(kernel='linear', C= 1, class_weight={1: 10})
    wclf.fit(X, y)
    weighted_prediction = wclf.predict(X_test)

print 'Accuracy:', accuracy_score(y_test, weighted_prediction)
print 'F1 score:', f1_score(y_test, weighted_prediction,average='weighted')
print 'Recall:', recall_score(y_test, weighted_prediction,
                              average='weighted')
print 'Precision:', precision_score(y_test, weighted_prediction,
                                    average='weighted')
print '\n clasification report:\n', classification_report(y_test, weighted_prediction)
print '\n confussion matrix:\n',confusion_matrix(y_test, weighted_prediction)

Second:

auto_wclf = SVC(kernel='linear', C= 1, class_weight='auto')
auto_wclf.fit(X, y)
auto_weighted_prediction = auto_wclf.predict(X_test)

print 'Accuracy:', accuracy_score(y_test, auto_weighted_prediction)

print 'F1 score:', f1_score(y_test, auto_weighted_prediction,
                            average='weighted')

print 'Recall:', recall_score(y_test, auto_weighted_prediction,
                              average='weighted')

print 'Precision:', precision_score(y_test, auto_weighted_prediction,
                                    average='weighted')

print '\n clasification report:\n', classification_report(y_test,auto_weighted_prediction)

print '\n confussion matrix:\n',confusion_matrix(y_test, auto_weighted_prediction)

Third:

clf = SVC(kernel='linear', C= 1)
clf.fit(X, y)
prediction = clf.predict(X_test)


from sklearn.metrics import precision_score, \
    recall_score, confusion_matrix, classification_report, \
    accuracy_score, f1_score

print 'Accuracy:', accuracy_score(y_test, prediction)
print 'F1 score:', f1_score(y_test, prediction)
print 'Recall:', recall_score(y_test, prediction)
print 'Precision:', precision_score(y_test, prediction)
print '\n clasification report:\n', classification_report(y_test,prediction)
print '\n confussion matrix:\n',confusion_matrix(y_test, prediction)


F1 score:/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:676: DeprecationWarning: The default `weighted` averaging is deprecated, and from version 0.18, use of precision, recall or F-score with multiclass or multilabel data or pos_label=None will result in an exception. Please set an explicit value for `average`, one of (None, 'micro', 'macro', 'weighted', 'samples'). In cross validation use, for instance, scoring="f1_weighted" instead of scoring="f1".
  sample_weight=sample_weight)
/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:1172: DeprecationWarning: The default `weighted` averaging is deprecated, and from version 0.18, use of precision, recall or F-score with multiclass or multilabel data or pos_label=None will result in an exception. Please set an explicit value for `average`, one of (None, 'micro', 'macro', 'weighted', 'samples'). In cross validation use, for instance, scoring="f1_weighted" instead of scoring="f1".
  sample_weight=sample_weight)
/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:1082: DeprecationWarning: The default `weighted` averaging is deprecated, and from version 0.18, use of precision, recall or F-score with multiclass or multilabel data or pos_label=None will result in an exception. Please set an explicit value for `average`, one of (None, 'micro', 'macro', 'weighted', 'samples'). In cross validation use, for instance, scoring="f1_weighted" instead of scoring="f1".
  sample_weight=sample_weight)
 0.930416613529

However, I'm getting warnings like this:

/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:1172:
DeprecationWarning: The default `weighted` averaging is deprecated,
and from version 0.18, use of precision, recall or F-score with 
multiclass or multilabel data or pos_label=None will result in an 
exception. Please set an explicit value for `average`, one of (None, 
'micro', 'macro', 'weighted', 'samples'). In cross validation use, for 
instance, scoring="f1_weighted" instead of scoring="f1"

How can I deal correctly with my unbalanced data so that the classifier's metrics are computed in the right way?

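I think there is a lot of confusion about which weights are used for what. I am not sure I know precisely what bothers you, so I am going to cover different topics, bear with me ;).

Class weights

The weights from the class_weight parameter are used to train the classifier. They are not used in the calculation of any of the metrics you are using: with different class weights, the numbers will be different simply because the classifier is different.

Basically in every scikit-learn classifier, the class weights are used to tell your model how important a class is. That means that during training, the classifier will make extra efforts to correctly classify the classes with high weights.
How they do that is algorithm-specific. If you want details about how it works for SVC and the doc does not make sense to you, feel free to mention it.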
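To make the point about class weights concrete, here is a minimal sketch. It assumes a reasonably recent scikit-learn in which the 'balanced' heuristic is available (as far as I know it replaced the 'auto' option used in the question). The labels are rebuilt from the counts given in the question, and the weights dictionary is only handed to the estimator; no metric ever sees it.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.utils.class_weight import compute_class_weight

# Label counts taken from the question: class -> number of instances.
counts = {5: 1190, 4: 838, 3: 239, 1: 204, 2: 127}
y = np.concatenate([np.full(n, label) for label, n in counts.items()])

classes = np.unique(y)  # sorted labels: [1, 2, 3, 4, 5]
# "balanced" weights are n_samples / (n_classes * count_of_class),
# so rare classes receive larger weights than frequent ones.
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
class_weight = {int(c): float(w) for c, w in zip(classes, weights)}
for label in classes:
    print(int(label), round(class_weight[int(label)], 3))

# The weights only influence fitting; class_weight="balanced" would
# compute the same values internally from the training labels.
clf = SVC(kernel="linear", C=1, class_weight=class_weight)
```

Passing an explicit dictionary such as {1: 10}, as in the question's first attempt, works the same way: classes that are not listed keep a weight of 1.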

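The metrics

Once you have a classifier, you want to know how well it is performing. Here you can use the metrics you mentioned: accuracy, recall_score, f1_score...

Usually when the class distribution is unbalanced, accuracy is considered a poor choice, as it gives high scores to models that just predict the most frequent class.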
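A quick way to see this is to score a "model" that always answers with the majority class. The sketch below rebuilds the labels from the counts in the question and assumes scikit-learn 0.22 or later for the zero_division argument. On this distribution the constant predictor reaches an accuracy of roughly 0.46 without learning anything, while its macro-averaged f1-score is far lower; the more skewed the data, the wider that gap becomes.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Label counts from the question.
counts = {5: 1190, 4: 838, 3: 239, 1: 204, 2: 127}
y_true = np.concatenate([np.full(n, label) for label, n in counts.items()])

# A "model" that ignores its input and always predicts the majority class.
y_pred = np.full_like(y_true, 5)

acc = accuracy_score(y_true, y_pred)
# zero_division=0 scores the classes that are never predicted as 0
# instead of emitting a warning for them.
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)

print("accuracy:", round(acc, 3))
print("macro F1:", round(macro_f1, 3))
```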

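I will not detail all these metrics, but note that, with the exception of accuracy, they are naturally applied at the class level: as you can see in this print of a classification report, they are defined for each class. They rely on concepts such as true positives or false negatives, which require defining which class is the positive one.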

             precision    recall  f1-score   support

          0       0.65      1.00      0.79        17
          1       0.57      0.75      0.65        16
          2       0.33      0.06      0.10        17
avg / total       0.52      0.60      0.51        50
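The per-class numbers in such a report can be rebuilt from the confusion matrix by treating each class in turn as the positive one. The following is a minimal sketch on made-up labels (not the data behind the report above); passing average=None to the scikit-learn functions returns one value per class, which lets the manual computation be compared with the library's.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Tiny made-up labels, for illustration only.
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 1, 1, 1, 2, 2, 2, 0, 1]

cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class
tp = np.diag(cm)              # instances of each class that were predicted correctly
fp = cm.sum(axis=0) - tp      # predicted as the class but belonging to another one
fn = cm.sum(axis=1) - tp      # belonging to the class but predicted as another one

precision_per_class = tp / (tp + fp)
recall_per_class = tp / (tp + fn)

# average=None asks scikit-learn for one value per class instead of a summary.
print(precision_per_class, precision_score(y_true, y_pred, average=None))
print(recall_per_class, recall_score(y_true, y_pred, average=None))
```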

The warning

F1 score:/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:676: DeprecationWarning: The 
default `weighted` averaging is deprecated, and from version 0.18, 
use of precision, recall or F-score with multiclass or multilabel data  
or pos_label=None will result in an exception. Please set an explicit 
value for `average`, one of (None, 'micro', 'macro', 'weighted', 
'samples'). In cross validation use, for instance, 
scoring="f1_weighted" instead of scoring="f1".

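You get this warning because you are using the f1-score, recall and precision without defining how they should be computed! The question could be rephrased: from the above classification report, how do you output one global number for the f1-score? You could:

1. Take the average of the f1-score for each class. This is called macro averaging. In the report above it gives the same 0.51 as the avg / total row, because the three classes have almost equal support (scikit-learn versions of that era computed that row as a support-weighted average).
2. Compute the f1-score using the global count of true positives / false negatives, etc. (you sum the number of true positives / false negatives for each class). Aka micro averaging.
3. Compute a weighted average of the f1-score. Using 'weighted' in scikit-learn will weigh the f1-score by the support of the class: the more elements a class has, the more important the f1-score for this class in the computation.

These are 3 of the options in scikit-learn; the warning is there to say you have to pick one. So you have to specify an average argument for the score method.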
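The three averaging schemes can be written out by hand and checked against f1_score. This is a sketch on made-up, deliberately unbalanced labels; the variable names are mine, and the only scikit-learn calls are confusion_matrix and f1_score with an explicit average argument.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

# Made-up, deliberately unbalanced labels (6 / 3 / 1 instances per class).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 0, 0, 0, 1, 1, 0, 0, 2]

cm = confusion_matrix(y_true, y_pred)
tp = np.diag(cm)
fp = cm.sum(axis=0) - tp
fn = cm.sum(axis=1) - tp
support = cm.sum(axis=1)  # number of true instances per class

f1_per_class = 2 * tp / (2 * tp + fp + fn)

macro = f1_per_class.mean()                                   # every class counts the same
micro = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())   # pool the counts first
weighted = np.average(f1_per_class, weights=support)          # weight by class size

for name, manual in [("macro", macro), ("micro", micro), ("weighted", weighted)]:
    print(name, round(manual, 4), round(f1_score(y_true, y_pred, average=name), 4))
```

Each line prints the manual value next to the one returned by scikit-learn; the three schemes give three different numbers on this data.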

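Which one you choose depends on how you want to measure the performance of the classifier: for instance, macro averaging does not take class imbalance into account, so the f1-score of class 1 will be just as important as the f1-score of class 5. If you use weighted averaging, however, class 5 will carry more importance.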
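To see how much the choice can matter on the question's class distribution, here is a sketch with entirely hypothetical predictions: the two large classes (4 and 5) are always right, and half of each small class is mistaken for class 5. The prediction pattern is an assumption made up for illustration, not a result from any real classifier.

```python
import numpy as np
from sklearn.metrics import f1_score

# Label counts from the question.
counts = {5: 1190, 4: 838, 3: 239, 1: 204, 2: 127}

y_true, y_pred = [], []
for label, n in counts.items():
    y_true += [label] * n
    if label in (4, 5):
        # Hypothetical behaviour: the two big classes are always right.
        y_pred += [label] * n
    else:
        # Hypothetical behaviour: half of each small class is mistaken for 5.
        y_pred += [label] * (n // 2) + [5] * (n - n // 2)

labels = [1, 2, 3, 4, 5]
per_class = f1_score(y_true, y_pred, labels=labels, average=None)
macro = f1_score(y_true, y_pred, average="macro")
weighted = f1_score(y_true, y_pred, average="weighted")

print(dict(zip(labels, per_class.round(3))))
print("macro:", round(macro, 3), "weighted:", round(weighted, 3))
```

The weighted score comes out higher than the macro score here, because the well-classified large classes dominate the weighted average while the macro average gives the poorly classified small classes an equal say.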

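The whole argument specification in these metrics is not super clear in scikit-learn right now; according to the docs, it will get better in version 0.18. They are removing some non-obvious default behavior, and they are issuing warnings so that developers notice it.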

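Computing scores

The last thing I want to mention (feel free to skip it if you're aware of it) is that scores are only meaningful if they are computed on data that the classifier has never seen. This is extremely important, as any score you get on data that was used in fitting the classifier is completely irrelevant.

Here's a way to do it using StratifiedShuffleSplit, which gives you random splits of your data (after shuffling) that preserve the label distribution.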

from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix
from sklearn.svm import SVC

# We use a utility to generate artificial classification data.
X, y = make_classification(n_samples=100, n_informative=10, n_classes=3)
# One stratified 50/50 split; the test half is never used for fitting.
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.5, random_state=0)
# Assumed here: the same SVC settings as in the question.
svc = SVC(kernel='linear', C=1)
for train_idx, test_idx in sss.split(X, y):
    X_train, X_test, y_train, y_test = X[train_idx], X[test_idx], y[train_idx], y[test_idx]
    svc.fit(X_train, y_train)
    y_pred = svc.predict(X_test)
    # Scores are computed on the held-out half only, with an explicit average.
    print(f1_score(y_test, y_pred, average="macro"))
    print(precision_score(y_test, y_pred, average="macro"))
    print(recall_score(y_test, y_pred, average="macro"))
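The warning also mentions cross-validation, where the averaging choice is carried by the scoring string rather than by an average argument. A minimal sketch, assuming a scikit-learn version that provides sklearn.model_selection, and using artificial data with a fixed random_state that I picked only for reproducibility:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Artificial 3-class data, for illustration only.
X, y = make_classification(n_samples=300, n_informative=10, n_classes=3,
                           random_state=0)

# Stratified folds keep the label distribution roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
svc = SVC(kernel="linear", C=1)

# The scoring string names the averaging: f1_macro, f1_micro or f1_weighted.
scores = cross_val_score(svc, X, y, cv=cv, scoring="f1_macro")
print(scores.mean(), scores.std())
```

Every fold is scored on data the classifier was not fitted on, so the mean across folds is an estimate of performance on unseen data.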
