Evaluation and Performance Metrics
Leaderboard PulseDB - Performance Comparison Results
| Model | Contributor | Set | SBP MAE (MASE) | SBP Grades (IEEE 1708a-2019) | DBP MAE (MASE) | DBP Grades (IEEE 1708a-2019) | Model Info |
|---|---|---|---|---|---|---|---|
| BaseLine | UOL | calib vital | 14.91 (1.00) | A 20.67%, B 4.12%, C 3.98%, D 71.24% | 9.52 (1.00) | A 32.31%, B 5.87%, C 5.87%, D 55.95% | |
| XResNet1d101 | UOL | calib vital | 9.08 (0.60) | A 39.50%, B 6.38%, C 6.00%, D 48.12% | 6.08 (0.63) | A 53.11%, B 7.59%, C 6.52%, D 32.77% | |
| XResNet1d50 | UOL | calib vital | 9.49 (0.63) | A 36.81%, B 6.16%, C 5.77%, D 51.26% | 6.33 (0.66) | A 50.72%, B 7.60%, C 6.60%, D 35.08% | |
| Inception1d | UOL | calib vital | 9.65 (0.64) | A 35.50%, B 6.05%, C 5.82%, D 52.63% | 6.52 (0.68) | A 48.48%, B 7.44%, C 6.85%, D 37.23% | |
| LeNet1d | UOL | calib vital | 11.61 (0.77) | A 27.99%, B 5.31%, C 4.98%, D 61.72% | 7.70 (0.80) | A 40.33%, B 7.10%, C 6.62%, D 45.95% | |
| TCN + MLP | LNE | calib vital | 12.25 (0.82) | | 7.97 (0.83) | | |
| WAVELET + MLP | LNE | calib vital | 13.62 (0.91) | A 23.89%, B 4.56%, C 4.47%, D 67.06% | 8.84 (0.92) | A 35.58%, B 6.54%, C 5.96%, D 51.90% | |
| XResNet1d50 / MCD | NPL | pulsedb_calib | 8.62 (0.58) | A 45.37%, B 6.08%, C 5.30%, D 43.26% | 5.29 (TBD) | A 62.51%, B 6.04%, C 5.03%, D 26.41% | |
| XResNet1d50 / MCD | NPL | calib vital | 7.94 (0.53) | A 48.00%, B 6.18%, C 5.30%, D 40.52% | 5.07 (0.53) | A 63.81%, B 5.89%, C 4.89%, D 25.41% | |
| Alexnet (mean) | PTB | calib vital | 10.03 (0.67) | A 35.19%, B 5.90%, C 5.56%, D 53.33% | 6.47 (0.68) | A 50.51%, B 7.08%, C 6.17%, D 36.34% | |
| Alexnet DE | PTB | calib vital | 9.65 (0.65) | A 36.75%, B 6.04%, C 5.54%, D 51.68% | 6.21 (0.65) | A 52.34%, B 7.09%, C 6.01%, D 34.55% | |
| Minirocket (mean) | PTB | calib vital | 11.43 (0.77) | A 28.60%, B 5.32%, C 5.16%, D 60.91% | 7.47 (0.78) | A 42.15%, B 7.17%, C 6.55%, D 44.13% | |
| Minirocket DE | PTB | calib vital | 11.34 (0.76) | A 28.78%, B 5.34%, C 5.22%, D 60.66% | 7.41 (0.78) | A 42.61%, B 7.14%, C 6.56%, D 43.69% | |
| GPR | KTU | calib vital | 12.22 (0.82) | A 26.60%, B 5.04%, C 4.84%, D 62.37% | 7.78 (0.82) | A 39.81%, B 6.99%, C 6.58%, D 45.47% | |
| GPR + PCA | KTU | calib vital | 12.25 (0.82) | A 26.52%, B 4.92%, C 4.79%, D 62.61% | 7.82 (0.82) | A 39.44%, B 6.96%, C 6.42%, D 46.02% | |
| GPR + LASSO | KTU | calib vital | 12.20 (0.82) | A 26.47%, B 4.88%, C 4.94%, D 62.55% | 7.79 (0.82) | A 39.62%, B 6.94%, C 6.50%, D 45.78% | |
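
The grade columns report the share of test samples that fall into each accuracy band. The following is a minimal sketch of how such a distribution could be computed. It assumes the commonly cited IEEE 1708a-2019 thresholds on the absolute error (A: up to 5 mmHg, B: up to 6 mmHg, C: up to 7 mmHg, D: above 7 mmHg); the boundary handling and the function names `grade` and `grade_distribution` are assumptions for illustration and are not taken from the project's metrics module.

```python
from collections import Counter

# Assumed upper bounds (mmHg) of the absolute error for grades A, B and C;
# any error above the last bound is graded D.
GRADE_BOUNDS = (("A", 5.0), ("B", 6.0), ("C", 7.0))


def grade(abs_error):
    """Return the grade letter for one absolute error in mmHg."""
    for letter, upper in GRADE_BOUNDS:
        if abs_error <= upper:
            return letter
    return "D"


def grade_distribution(y_true, y_pred):
    """Return the percentage of samples per grade as a dict."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    counts = Counter(grade(abs(t - p)) for t, p in zip(y_true, y_pred))
    return {g: 100.0 * counts[g] / len(y_true) for g in "ABCD"}


# Illustrative values only, not taken from PulseDB.
reference = [120.0, 118.0, 131.0, 125.0]
estimate = [123.0, 123.5, 137.5, 140.0]
print(grade_distribution(reference, estimate))
```

Because B and C each cover a band only 1 mmHg wide, their shares stay small compared with A and D, which matches the pattern in the table.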
Leaderboard PulseDB - Performance Comparison Results
| Model | Contributor | Set | SBP MAE (MASE) | SBP Grades (IEEE 1708a-2019) | DBP MAE (MASE) | DBP Grades (IEEE 1708a-2019) | Model Info |
|---|---|---|---|---|---|---|---|
| BaseLine | UOL | calibfree vital | 14.87 (1.00) | A 20.63%, B 4.07%, C 3.88%, D 71.42% | 9.43 (1.00) | A 32.98%, B 6.35%, C 5.88%, D 54.79% | |
| XResNet1d101 | UOL | calibfree vital | 12.70 (0.85) | A 25.75%, B 4.78%, C 4.75%, D 64.72% | 8.05 (0.85) | A 38.73%, B 6.70%, C 6.27%, D 48.31% | |
| XResNet1d50 | UOL | calibfree vital | 12.40 (0.83) | A 24.61%, B 5.01%, C 4.76%, D 65.62% | 7.84 (0.83) | A 39.69%, B 6.96%, C 6.40%, D 46.95% | |
| Inception1d | UOL | calibfree vital | 14.97 (1.00) | A 21.26%, B 4.17%, C 4.11%, D 70.46% | 8.98 (0.95) | A 27.93%, B 5.32%, C 5.34%, D 61.41% | |
| LeNet1d | UOL | calibfree vital | 12.37 (0.83) | A 24.81%, B 4.91%, C 4.76%, D 65.52% | 7.89 (0.83) | A 39.01%, B 7.16%, C 6.44%, D 47.39% | |
| WAVELET + MLP | LNE | calib vital | 14.21 (0.95) | A 22.21%, B 4.33%, C 4.34%, D 69.10% | 8.89 (0.94) | A 35.02%, B 6.24%, C 6.23%, D 52.49% | |
| Alexnet (mean) | PTB | calibfree vital | 12.46 (0.84) | A 25.29%, B 4.81%, C 4.77%, D 65.13% | 7.95 (0.84) | A 39.12%, B 6.76%, C 6.36%, D 47.76% | |
| Alexnet DE | PTB | calibfree vital | 12.34 (0.83) | A 25.33%, B 5.00%, C 4.87%, D 64.80% | 7.88 (0.84) | A 39.45%, B 6.91%, C 6.26%, D 47.38% | |
| Minirocket (mean) | PTB | calibfree vital | 12.44 (0.84) | A 25.57%, B 4.88%, C 4.78%, D 64.78% | 7.96 (0.84) | A 38.60%, B 6.91%, C 6.39%, D 48.09% | |
| Minirocket DE | PTB | calibfree vital | 12.35 (0.83) | A 25.80%, B 4.90%, C 4.81%, D 64.49% | 7.91 (0.84) | A 38.91%, B 6.92%, C 6.57%, D 47.60% | |
| GPR | KTU | calibfree vital | 12.90 (0.87) | A 24.62%, B 4.70%, C 4.44%, D 65.38% | 8.15 (0.86) | A 38.06%, B 6.61%, C 6.37%, D 48.10% | |
| GPR + PCA | KTU | calibfree vital | 12.90 (0.87) | A 24.55%, B 4.78%, C 4.34%, D 65.47% | 8.17 (0.87) | A 37.95%, B 6.50%, C 6.21%, D 48.47% | |
| GPR + LASSO | KTU | calibfree vital | 12.81 (0.86) | A 24.71%, B 4.78%, C 4.56%, D 65.09% | 8.13 (0.86) | A 37.95%, B 6.76%, C 6.36%, D 48.06% | |
| XResNet1d50 / MCD | NPL | calibfree vital | 12.48 (0.84) | A 25.55%, B 4.91%, C 4.64%, D 64.90% | 8.16 (0.87) | A 38.41%, B 6.76%, C 6.18%, D 48.65% | |
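
The value in parentheses next to each MAE is the MASE. The BaseLine rows show 1.00, which suggests that the MAE is scaled by the error of a baseline predictor. The sketch below works under that assumption and uses a constant prediction as a hypothetical baseline. The project's actual baseline and scaling may differ, so this will not necessarily reproduce the published values; the function names are illustrative stand-ins, not the project's metrics functions.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference and prediction."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_scaled_error(y_true, y_pred, y_baseline):
    """MAE of the model divided by the MAE of a baseline predictor (assumed definition)."""
    baseline_mae = mean_absolute_error(y_true, y_baseline)
    if baseline_mae == 0:
        raise ValueError("baseline error is zero, MASE is undefined")
    return mean_absolute_error(y_true, y_pred) / baseline_mae


# Illustrative values only, not taken from PulseDB.
reference = [120.0, 130.0, 110.0, 140.0]
model_output = [122.0, 127.0, 113.0, 134.0]
# Hypothetical baseline: always predict a constant (here 125 mmHg).
baseline_output = [125.0] * len(reference)

print(mean_absolute_scaled_error(reference, model_output, baseline_output))
```

A value below 1 then means the model beats the baseline, and a value of 1 means it does no better.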
Leaderboard DeepBeat - Performance Comparison Results
| Model | Contributor | AUC | F1 (0.5) | Specificity (sensitivity > 0.8) | Sensitivity (specificity > 0.8) | MCC (sensitivity > 0.8) | MCC (specificity > 0.8) | Model Info |
|---|---|---|---|---|---|---|---|---|
| XResNet1d101 | UOL | 0.86 | 0.70 | 0.76 | 0.73 | 0.55 | 0.52 | |
| XResNet1d50 | UOL | 0.87 | 0.69 | 0.78 | 0.78 | 0.57 | 0.57 | |
| Inception1d | UOL | 0.87 | 0.72 | 0.79 | 0.79 | 0.58 | 0.58 | |
| S4 | UOL | 0.87 | 0.69 | 0.79 | 0.79 | 0.58 | 0.57 | |
| LeNet1d | UOL | 0.76 | 0.55 | 0.58 | 0.50 | 0.37 | 0.32 | |
| WAVELET + MLP | LNE | 0.77 | 0.61 | 0.59 | 0.52 | 0.38 | 0.33 | |
| MLP | KTU | 0.52 | 0.34 | 0.19 | 0.32 | | | |
| MLP* | KTU | 0.74 | 0.59 | 0.58 | 0.49 | | | |
| MLP** | KTU | 0.92 | 0.59 | 0.88 | 0.95 | | | |
| XResNet1d50 / MCD | NPL | 0.87 | 0.70 | 0.78 | 0.77 | 0.56 | 0.56 | |
| SPAR / tinyVGG | KCL | 0.85 | 0.90 | 0.78 | 0.79 | 0.50 | 0.50 | |
| Alexnet (mean of 5) | PTB | 0.83 | 0.67 | 0.71 | 0.70 | 0.50 | 0.50 | |
| Alexnet DE (5) | PTB | 0.84 | 0.67 | 0.73 | 0.71 | 0.52 | 0.50 | |
| Minirocket (mean of 5) | PTB | 0.82 | 0.63 | 0.68 | 0.66 | 0.46 | 0.46 | |
| Minirocket DE (5) | PTB | 0.82 | 0.65 | 0.68 | 0.67 | 0.47 | 0.47 | |

MLP* - when removing segments that were assessed as bad quality. MLP** - when setting labels to non-AF where PPG segments are assessed as bad quality.
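
Several columns above report a metric at an operating point constrained by another metric, for example the specificity reached while the sensitivity stays above 0.8. The sketch below shows one plausible way to compute such a value together with the MCC at the same threshold. The threshold search (trying every observed score and keeping the best admissible one) and all function names are assumptions for illustration, not the project's implementation.

```python
import math


def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN when predicting positive for score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = score >= threshold
        if predicted and label == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 if any marginal is empty."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


def specificity_at_sensitivity(y_true, scores, min_sensitivity=0.8):
    """Return (specificity, MCC, threshold) with the highest specificity
    among thresholds whose sensitivity exceeds min_sensitivity, or None."""
    if 0 not in y_true or 1 not in y_true:
        raise ValueError("both classes must be present")
    best = None
    for threshold in sorted(set(scores)):
        tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        if sensitivity > min_sensitivity and (best is None or specificity > best[0]):
            best = (specificity, matthews_corrcoef(tp, fp, tn, fn), threshold)
    return best


# Illustrative labels (1 = AF) and scores, not taken from DeepBeat.
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.7, 0.4, 0.65, 0.8, 0.9, 0.95]
print(specificity_at_sensitivity(labels, scores))
```

Swapping the roles of sensitivity and specificity in the search gives the mirrored columns (sensitivity and MCC at specificity above 0.8).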
How to create comparable performance evaluations
To create uncertainty evaluations that can be compared meaningfully with those of the other models, use the corresponding functions in the metrics module.
For example, use metrics.mean_absolute_error to compute the MAE for a regression problem, or metrics.f1_score to compute the F1 score for a classification problem.
All of the evaluation metrics follow the same input structure.
For Python users: pass the test output dataset and your model's output on the test input dataset to the metrics you need.
For non-Python users:
To compute a metric, evaluate your model on the test input dataset and save its output in a common file format.
Then load the test output dataset and your model's output into a Python script and call the metrics suitable for the problem.
You can use app/tutorial_metrics.py as a starting point.
For more details, read or run app/tutorial_metrics.py, or check the metrics documentation.
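
For the non-Python route, the hand-off could look roughly like the sketch below. It assumes the predictions were exported as a CSV file with the hypothetical column names `y_true` and `y_pred`. The local `mean_absolute_error` and `f1_score` are plain-Python stand-ins for illustration only; they are not the project's metrics functions, whose actual interface is shown in app/tutorial_metrics.py.

```python
import csv
import io


def load_columns(handle, true_col="y_true", pred_col="y_pred"):
    """Read two numeric columns from an open CSV file handle."""
    y_true, y_pred = [], []
    for row in csv.DictReader(handle):
        y_true.append(float(row[true_col]))
        y_pred.append(float(row[pred_col]))
    return y_true, y_pred


def mean_absolute_error(y_true, y_pred):
    """Stand-in MAE for a regression problem."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_prob, threshold=0.5):
    """Stand-in F1 score for binary labels and predicted probabilities."""
    y_hat = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, h in zip(y_true, y_hat) if t == 1 and h == 1)
    fp = sum(1 for t, h in zip(y_true, y_hat) if t == 0 and h == 1)
    fn = sum(1 for t, h in zip(y_true, y_hat) if t == 1 and h == 0)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Stands in for a file written by a non-Python model; values are made up.
exported = io.StringIO("y_true,y_pred\n120.0,118.0\n80.0,85.0\n135.0,131.5\n")
bp_true, bp_pred = load_columns(exported)
mae = mean_absolute_error(bp_true, bp_pred)

# Classification labels and probabilities, also made up.
f1 = f1_score([1, 0, 1, 1, 0], [0.9, 0.6, 0.4, 0.7, 0.2])
print(f"MAE: {mae:.2f}, F1: {f1:.2f}")
```

In practice, replace the `io.StringIO` object with `open("your_export.csv", newline="")` and call the project's own metrics in place of the stand-ins so that the results stay comparable with the leaderboard.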
Model Meta Information
PCV
| key | value |
|---|---|
| dataset | PulseDB |
| model | BaseLine/XResNet1d101/XResNet1d50/Inception1d/LeNet1d |
| script | |
| commit SHA | |
| command | |
| comment | See gitlab.com/qumphy/wp1-raw-time-series/README.md for more info. |
PCFV
| key | value |
|---|---|
| dataset | PulseDB |
| model | BaseLine/XResNet1d101/XResNet1d50/Inception1d/LeNet1d |
| script | |
| commit SHA | |
| command | |
| comment | See gitlab.com/qumphy/wp1-raw-time-series/README.md for more info. |
DB
| key | value |
|---|---|
| dataset | DeepBeat |
| model | BaseLine/XResNet1d101/XResNet1d50/Inception1d/LeNet1d/S4 |
| script | |
| commit SHA | |
| command | |
| comment | See gitlab.com/qumphy/wp1-raw-time-series/README.md for more info. |
DBMCD-1
| key | value |
|---|---|
| dataset | DeepBeat |
| model | XResNet1d50 modified for Monte Carlo dropout |
| training/eval script | |
| model script | |
| commit SHA | |
| train command | |
| eval command | |
| comment | We add an auxiliary noise branch to the xresnet1d50 (from the penultimate layer) and add dropout liberally throughout the architecture. The training/eval script is customised for implementing MCD… |
PCVMCD-1
| key | value |
|---|---|
| dataset | PulseDB |
| model | XResNet1d50 modified for Monte Carlo dropout |
| training/eval script | |
| model script | |
| commit SHA | |
| train command (systolic) | |
| eval command (systolic) | |
| train command (diastolic) | |
| eval command (diastolic) | |
| comment | We train separate but identically configured models (aside from the dropout rates) for systolic and diastolic estimation. We add an auxiliary noise branch to the xresnet1d50 (from the penultimate layer) and add dropout liberally throughout the architecture. The training/eval script is customised for implementing MCD… |
PCVMCD-2
| key | value |
|---|---|
| dataset | PulseDB |
| model | XResNet1d50 modified for Monte Carlo dropout |
| training/eval script | |
| model script | |
| commit SHA | 5ab9828f5d4737fc845586b53e894351fb9df933 |
| train command (systolic) | |
| eval command (systolic) | |
| train command (diastolic) | |
| eval command (diastolic) | |
| comment | We train separate but identically configured models (aside from the dropout rates) for systolic and diastolic estimation. We add an auxiliary noise branch to the xresnet1d50 (from the penultimate layer) and add dropout liberally throughout the architecture. The training/eval script is customised for implementing MCD… |
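
Monte Carlo dropout (MCD) keeps dropout active at inference time and aggregates several stochastic forward passes into a predictive mean and a spread. The toy sketch below illustrates only that aggregation step. It uses a hypothetical four-weight linear model with dropout on the inputs, not the modified XResNet1d50 described above, and does not include the auxiliary noise branch; all names and values are assumptions for illustration.

```python
import random
import statistics

# Hypothetical weights of a toy linear model (not the real network).
WEIGHTS = [0.5, 1.5, -1.0, 2.0]


def stochastic_forward(features, drop_rate, rng):
    """One forward pass with inverted dropout applied to the inputs."""
    keep = 1.0 - drop_rate
    total = 0.0
    for weight, value in zip(WEIGHTS, features):
        # Keep each input with probability `keep` and rescale it so the
        # expected output matches the pass without dropout.
        if rng.random() < keep:
            total += weight * value / keep
    return total


def mc_dropout_predict(features, n_passes=200, drop_rate=0.2, seed=0):
    """Return (mean, standard deviation) over n_passes stochastic passes."""
    rng = random.Random(seed)
    samples = [stochastic_forward(features, drop_rate, rng) for _ in range(n_passes)]
    return statistics.mean(samples), statistics.stdev(samples)


mean, spread = mc_dropout_predict([1.0, 2.0, 3.0, 4.0])
print(f"prediction: {mean:.2f}, uncertainty (std): {spread:.2f}")
```

With `drop_rate=0.0` every pass is identical, so the spread collapses to zero and the mean equals the deterministic output.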