Igniters, Alexey
I want to discuss ticket IGNITE-10371 [1]. Currently, we calculate 4 numbers (true positives, true negatives, false positives, false negatives) for each "point metric" like accuracy, recall, F-score, and precision, for each label. So for the full score we need to calculate those 4 numbers 8 times, although a single pass could produce all 8 metrics (4 for the first label and 4 for the second label).

I suggest introducing a new API with "point metrics" for metrics like those 4 (accuracy, recall, F-score, and precision) and "integral metrics" for metrics like ROC AUC [2].

Any thoughts would be appreciated.

[1] https://issues.apache.org/jira/browse/IGNITE-10371
[2] https://issues.apache.org/jira/browse/IGNITE-10145
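To make the proposed split concrete, here is a minimal sketch of what the two metric kinds could look like. All names and signatures below are illustrative assumptions, not the actual Ignite ML API:

```java
/**
 * Illustrative sketch only -- the interface and constant names here are
 * assumptions for discussion, not the actual Ignite ML API.
 */
public class MetricKinds {
    /** A "point metric" needs only the four confusion counters. */
    interface PointMetric {
        double score(long tp, long tn, long fp, long fn);
    }

    /**
     * An "integral metric" (e.g. ROC AUC) cannot be derived from the four
     * counters alone: it sweeps over all decision thresholds, so it needs
     * the raw model scores and the true labels.
     */
    interface IntegralMetric {
        double score(double[] modelScores, boolean[] truth);
    }

    /** Point metrics become one-liners over the shared counters. */
    static final PointMetric PRECISION = (tp, tn, fp, fn) -> tp / (double) (tp + fp);
    static final PointMetric ACCURACY  = (tp, tn, fp, fn) -> (tp + tn) / (double) (tp + tn + fp + fn);
}
```

With such a split, the four counters are computed once per label and every point metric is evaluated from them, instead of recounting the counters for each metric.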
You can compute just the TP (true positive), FP, TN, and FN counters and use them to evaluate recall, precision, accuracy, etc. If you want to specify a class for precision evaluation, then you can compute precision for the first label as TP/(TP+FP) and for the second label as TN/(TN+FN), for example. After that we can unify all one-point metric evaluations.

In my opinion we can redesign the metrics calculation and provide one-point metrics (like precision and recall) and integral metrics like ROC AUC, where the one-point metrics can be calculated from TP, FP, etc.

Maybe you should design a class BinaryClassificationMetric that computes these counters and provides methods like recall :: () -> double, precision :: () -> double, etc.

Thu, Dec 13, 2018 at 13:26, Yuriy Babak <[hidden email]> wrote:
> […]
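The suggestion above could be sketched roughly as follows. The class name echoes the proposed BinaryClassificationMetric, but the exact fields and signatures are assumptions for illustration, not the final implementation:

```java
/**
 * Rough sketch of the suggestion above (names and signatures are
 * assumptions): compute the four confusion counters in one pass,
 * then derive every point metric from them by simple arithmetic.
 */
public class BinaryClassificationMetric {
    private long tp, tn, fp, fn;

    /** One pass over the data; "true" stands for the first label. */
    public BinaryClassificationMetric(boolean[] truth, boolean[] predicted) {
        for (int i = 0; i < truth.length; i++) {
            if (truth[i] && predicted[i]) tp++;
            else if (!truth[i] && !predicted[i]) tn++;
            else if (!truth[i] && predicted[i]) fp++;
            else fn++;
        }
    }

    /** Precision for the first label; for the second it would be tn / (tn + fn). */
    public double precision() { return tp / (double) (tp + fp); }

    /** Recall for the first label; for the second it would be tn / (tn + fp). */
    public double recall() { return tp / (double) (tp + fn); }

    public double accuracy() { return (tp + tn) / (double) (tp + tn + fp + fn); }

    public double fScore() {
        double p = precision(), r = recall();
        return 2 * p * r / (p + r);
    }
}
```

The data is scanned once in the constructor; each metric method is then O(1), so adding more point metrics costs nothing extra per metric.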
So, I agree that we should avoid inefficient metric calculations.
I think that in the 2.8 release we should have:

1. BinaryClassificationMetric with all metrics from Wikipedia
2. A Metric interface with one or two implementations, like ROC AUC and accuracy, in the examples folder or in the metric package
3. BinaryClassificationMetric and MultiClassClassificationMetrics implementing a new MetricGroup interface

I will rework the current PR according to your recommendations.

Thu, Dec 13, 2018 at 16:06, Алексей Платонов <[hidden email]> wrote:
> […]
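The three-point plan above could be sketched along these lines. The interfaces and method signatures are assumptions for discussion, not the final design:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch of the plan above; names and signatures are assumptions, not the final API. */
public class MetricsDesignSketch {
    /** A single-valued metric (item 2 of the plan), e.g. accuracy or ROC AUC. */
    interface Metric {
        double calculate(double[] truth, double[] predicted);
    }

    /** Item 3: a group of related metrics computed together from shared counters. */
    interface MetricGroup {
        Map<String, Double> calculateAll(double[] truth, double[] predicted);
    }

    /** Item 1, reduced to three metrics for brevity: one counting pass, many results. */
    static class BinaryClassificationMetrics implements MetricGroup {
        @Override public Map<String, Double> calculateAll(double[] truth, double[] predicted) {
            long tp = 0, tn = 0, fp = 0, fn = 0;
            for (int i = 0; i < truth.length; i++) {
                boolean t = truth[i] == 1.0, p = predicted[i] == 1.0;
                if (t && p) tp++;
                else if (!t && !p) tn++;
                else if (!t && p) fp++;
                else fn++;
            }
            Map<String, Double> res = new LinkedHashMap<>();
            res.put("accuracy", (tp + tn) / (double) (tp + tn + fp + fn));
            res.put("precision", tp / (double) (tp + fp));
            res.put("recall", tp / (double) (tp + fn));
            return res;
        }
    }
}
```

A MultiClassClassificationMetrics implementation could satisfy the same MetricGroup contract by keeping one set of counters per label.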
Folks, I sometimes hear complaints about the metrics and their clarity for end users.

Would you add a couple of words about each value to the wiki/readme.io?

Thu, Dec 13, 2018 at 17:13, Alexey Zinoviev <[hidden email]> wrote:
> […]
Dmitriy,

Sure, all changes in the ML module will be described on the readme.io site with the next release (2.8).

Best regards,
Yuriy Babak

Thu, Dec 13, 2018 at 17:21, Dmitriy Pavlov <[hidden email]> wrote:
> […]
Please have a look at the new version in my PR, where I've implemented the approach described above: https://github.com/apache/ignite/pull/5612

Thu, Dec 13, 2018 at 17:21, Dmitriy Pavlov <[hidden email]> wrote:
> […]