Evaluation Metrics | AmeBob

Reference:

mAP in Object Detection: Mean Average Precision Explained

📊 Basic Definitions

TP: True Positive
FP: False Positive
FN: False Negative
TN: True Negative

📈 Core Metric Formulas

Precision

How often the model is right when it makes a prediction.

"When your model guesses, how often does it guess correctly?"

$Precision = \frac{TP}{TP + FP}$

Recall

The share of objects the model should have found that it actually found.

Recall is a measure of "has your model guessed every time that it should have guessed?"

$Recall = \frac{TP}{TP + FN}$

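F1 Score (F1-Score)

Balances Precision and Recall.

It is the harmonic mean of Precision and Recall at one specific confidence threshold, so it describes performance at a single operating point on the PR curve.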

$F1 = 2 \cdot \frac{Precision \cdot Recall}{Precision + Recall}$
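As a quick illustration of how the three numbers relate, here is a minimal sketch that computes Precision, Recall and F1 from raw counts. The function name `precision_recall_f1` and the example counts are made up for this note, and returning 0.0 when a denominator is zero is just one possible convention.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Return (precision, recall, f1) computed from raw TP / FP / FN counts."""
    # Precision: of everything the model predicted, how much was correct.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    # Recall: of everything that should have been found, how much was found.
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    # F1: harmonic mean of precision and recall (0.0 if both are zero).
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


# Hypothetical counts: 8 correct detections, 2 spurious ones, 4 missed objects.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, it is pulled toward the smaller of the two values, so a model cannot score well on F1 by maximizing only one of them.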

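Intersection over Union (IoU) and mAP

The IoU threshold sets the standard for whether a prediction counts as correct; the confidence threshold determines which point on the PR curve you are at.

Intersection over Union (IoU) measures how much the predicted box (Prediction) overlaps the ground-truth box (Ground Truth). Commonly used thresholds are 0.5 (50% overlap), 0.75 (75% overlap), and so on.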

$IoU = \frac{Area \ of \ Overlap}{Area \ of \ Union}$

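Before computing mAP, we first have to answer one question: is this box prediction accurate enough?

If $IoU \ge$ threshold (e.g. 0.5), the prediction is counted as a TP (True Positive).
If $IoU <$ threshold, the prediction is counted as an FP (False Positive).
A ground-truth box that no prediction matches counts as an FN (False Negative).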
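The rule above can be sketched in a few lines. This assumes axis-aligned boxes given as `(x1, y1, x2, y2)` corner coordinates, which is only one of several common box formats; the helper names `iou` and `label_prediction` and the example boxes are hypothetical. It also compares one prediction against one ground-truth box, whereas a real evaluator additionally has to match many predictions to many ground truths.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width/height are clamped to 0 when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_prediction(pred_box, gt_box, iou_threshold=0.5):
    """Label one prediction as 'TP' or 'FP' against one ground-truth box."""
    return "TP" if iou(pred_box, gt_box) >= iou_threshold else "FP"


gt = (0, 0, 10, 10)
good = (0, 0, 10, 8)    # overlap 80, union 100
poor = (5, 5, 15, 15)   # overlap 25, union 175

print(iou(good, gt), label_prediction(good, gt))
print(iou(poor, gt), label_prediction(poor, gt))
print(label_prediction(good, gt, iou_threshold=0.9))  # stricter threshold
```

The same box can flip from TP to FP just by raising the IoU threshold, which is why an mAP number is only meaningful together with the threshold it was computed at.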

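AP: AP is the area under the PR curve (Precision-Recall Curve). It sweeps over all possible confidence thresholds and averages the precision the model achieves at different recall levels, so AP captures the model's overall performance. Even if two models have the same best F1 score, the one with the larger AP usually holds higher precision across recall levels.

mAP computes the AP for each class and then takes the mean.

For example, mAP@50 is the mean of the per-class APs with the IoU threshold set to 0.5.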
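To make the AP and mAP definitions concrete, here is a rough sketch. It assumes each detection has already been labeled TP or FP at a fixed IoU threshold (0.5 here, so the result corresponds to mAP@50), and it uses the "all-point" interpolation of the PR curve, which is only one of several conventions used by different benchmarks. The function names, class names and data are invented for illustration.

```python
def average_precision(detections, num_gt):
    """AP for one class.

    detections: list of (confidence, is_tp) pairs, already labeled at a fixed IoU threshold.
    num_gt: number of ground-truth objects of this class.
    """
    if num_gt == 0:
        return 0.0
    # Walk through detections from most to least confident; each step is one PR point.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for _, is_tp in ranked:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Precision envelope: make precision non-increasing as recall grows.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the (interpolated) PR curve as a sum of rectangles.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """per_class: dict mapping class name -> (detections, num_gt)."""
    aps = [average_precision(dets, n) for dets, n in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0


# Hypothetical two-class example.
per_class = {
    "rbc": ([(0.9, True), (0.8, False), (0.7, True)], 2),
    "wbc": ([(0.95, True), (0.6, True)], 2),
}
ap_rbc = average_precision(*per_class["rbc"])
ap_wbc = average_precision(*per_class["wbc"])
map50 = mean_average_precision(per_class)
print(f"AP(rbc)={ap_rbc:.3f} AP(wbc)={ap_wbc:.3f} mAP@50={map50:.3f}")
```

Note that the confidence scores only matter through the ranking they induce: AP rewards models that place their correct detections ahead of their mistakes.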

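confidence

The confidence threshold tunes how "bold" the model is. The higher the threshold, the more conservative the model: it produces fewer FPs, so precision tends to rise. A conservative model also misses more objects, so FN goes up and Recall goes down.

⚠️ Note: higher precision does not necessarily mean lower recall. Improving the model as a whole can raise both at once.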

if the model is in a situation where avoiding false positives (stating a RBC is present when the cell was a WBC) is more important than avoiding false negatives, it can set its confidence threshold higher to encourage the model to only produce high precision predictions at the expense of lowering its amount of coverage (recall).
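The trade-off described above can be seen by sweeping the confidence threshold over a fixed set of detections. This is a toy sketch: the detections, their TP/FP labels, the ground-truth count and the function name `precision_recall_at` are all invented, and treating precision as 0.0 when nothing passes the threshold is just a convention chosen here.

```python
def precision_recall_at(detections, num_gt, threshold):
    """Precision and recall when only detections with confidence >= threshold are kept."""
    kept = [is_tp for conf, is_tp in detections if conf >= threshold]
    tp = sum(kept)
    fp = len(kept) - tp
    # Precision is undefined with no predictions; 0.0 is used here by convention.
    precision = tp / (tp + fp) if kept else 0.0
    recall = tp / num_gt if num_gt > 0 else 0.0
    return precision, recall


# Hypothetical detections as (confidence, is_tp), with 5 ground-truth objects in total.
detections = [
    (0.95, True), (0.90, True), (0.80, False),
    (0.60, True), (0.40, False), (0.30, True),
]
num_gt = 5

for threshold in (0.25, 0.50, 0.85):
    p, r = precision_recall_at(detections, num_gt, threshold)
    print(f"threshold={threshold:.2f} precision={p:.3f} recall={r:.3f}")
```

In this toy data precision climbs and recall falls as the threshold rises; on real data precision is not guaranteed to rise at every step, only to trend that way.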
