Evaluation and Performance Metrics

Table of Contents

  1. Leaderboard Deepbeat

  2. How to create comparable performance evaluations

  3. How to run the code for performance evaluation

  4. Model Meta Information

Leaderboard PulseDB (calib) - Performance Comparison Results

| Model | Contributor | Set | SBP MAE in mmHg (MASE) | SBP Grades (IEEE 1708a-2019) | DBP MAE in mmHg (MASE) | DBP Grades (IEEE 1708a-2019) | Model Info |
|---|---|---|---|---|---|---|---|
| BaseLine | UOL | calib vital | 14.91 (1.00) | A 20.67%, B 4.12%, C 3.98%, D 71.24% | 9.52 (1.00) | A 32.31%, B 5.87%, C 5.87%, D 55.95% | PCV |
| XResNet1d101 | UOL | calib vital | 9.08 (0.60) | A 39.50%, B 6.38%, C 6.00%, D 48.12% | 6.08 (0.63) | A 53.11%, B 7.59%, C 6.52%, D 32.77% | PCV |
| XResNet1d50 | UOL | calib vital | 9.49 (0.63) | A 36.81%, B 6.16%, C 5.77%, D 51.26% | 6.33 (0.66) | A 50.72%, B 7.60%, C 6.60%, D 35.08% | PCV |
| Inception1d | UOL | calib vital | 9.65 (0.64) | A 35.50%, B 6.05%, C 5.82%, D 52.63% | 6.52 (0.68) | A 48.48%, B 7.44%, C 6.85%, D 37.23% | PCV |
| LeNet1d | UOL | calib vital | 11.61 (0.77) | A 27.99%, B 5.31%, C 4.98%, D 61.72% | 7.70 (0.80) | A 40.33%, B 7.10%, C 6.62%, D 45.95% | PCV |
| TCN + MLP | LNE | calib vital | 12.25 (0.82) |  | 7.97 (0.83) |  |  |
| WAVELET + MLP | LNE | calib vital | 13.62 (0.91) | A 23.89%, B 4.56%, C 4.47%, D 67.06% | 8.84 (0.92) | A 35.58%, B 6.54%, C 5.96%, D 51.90% |  |
| XResNet1d50 / MCD | NPL | pulsedb_calib | 8.62 (0.58) | A 45.37%, B 6.08%, C 5.30%, D 43.26% | 5.29 (TBD) | A 62.51%, B 6.04%, C 5.03%, D 26.41% | PCVMCD-1 |
| XResNet1d50 / MCD | NPL | calib vital | 7.94 (0.53) | A 48.00%, B 6.18%, C 5.30%, D 40.52% | 5.07 (0.53) | A 63.81%, B 5.89%, C 4.89%, D 25.41% | PCVMCD-2 |
| Alexnet (mean) | PTB | calib vital | 10.03 (0.67) | A 35.19%, B 5.90%, C 5.56%, D 53.33% | 6.47 (0.68) | A 50.51%, B 7.08%, C 6.17%, D 36.34% |  |
| Alexnet DE | PTB | calib vital | 9.65 (0.65) | A 36.75%, B 6.04%, C 5.54%, D 51.68% | 6.21 (0.65) | A 52.34%, B 7.09%, C 6.01%, D 34.55% |  |
| Minirocket (mean) | PTB | calib vital | 11.43 (0.77) | A 28.60%, B 5.32%, C 5.16%, D 60.91% | 7.47 (0.78) | A 42.15%, B 7.17%, C 6.55%, D 44.13% |  |
| Minirocket DE | PTB | calib vital | 11.34 (0.76) | A 28.78%, B 5.34%, C 5.22%, D 60.66% | 7.41 (0.78) | A 42.61%, B 7.14%, C 6.56%, D 43.69% |  |
| GPR | KTU | calib vital | 12.22 (0.82) | A 26.60%, B 5.04%, C 4.84%, D 62.37% | 7.78 (0.82) | A 39.81%, B 6.99%, C 6.58%, D 45.47% |  |
| GPR + PCA | KTU | calib vital | 12.25 (0.82) | A 26.52%, B 4.92%, C 4.79%, D 62.61% | 7.82 (0.82) | A 39.44%, B 6.96%, C 6.42%, D 46.02% |  |
| GPR + LASSO | KTU | calib vital | 12.20 (0.82) | A 26.47%, B 4.88%, C 4.94%, D 62.55% | 7.79 (0.82) | A 39.62%, B 6.94%, C 6.50%, D 45.78% |  |

Leaderboard PulseDB (calibfree) - Performance Comparison Results

| Model | Contributor | Set | SBP MAE in mmHg (MASE) | SBP Grades (IEEE 1708a-2019) | DBP MAE in mmHg (MASE) | DBP Grades (IEEE 1708a-2019) | Model Info |
|---|---|---|---|---|---|---|---|
| BaseLine | UOL | calibfree vital | 14.87 (1.00) | A 20.63%, B 4.07%, C 3.88%, D 71.42% | 9.43 (1.00) | A 32.98%, B 6.35%, C 5.88%, D 54.79% | PCFV |
| XResNet1d101 | UOL | calibfree vital | 12.70 (0.85) | A 25.75%, B 4.78%, C 4.75%, D 64.72% | 8.05 (0.85) | A 38.73%, B 6.70%, C 6.27%, D 48.31% | PCFV |
| XResNet1d50 | UOL | calibfree vital | 12.40 (0.83) | A 24.61%, B 5.01%, C 4.76%, D 65.62% | 7.84 (0.83) | A 39.69%, B 6.96%, C 6.40%, D 46.95% | PCFV |
| Inception1d | UOL | calibfree vital | 14.97 (1.00) | A 21.26%, B 4.17%, C 4.11%, D 70.46% | 8.98 (0.95) | A 27.93%, B 5.32%, C 5.34%, D 61.41% | PCFV |
| LeNet1d | UOL | calibfree vital | 12.37 (0.83) | A 24.81%, B 4.91%, C 4.76%, D 65.52% | 7.89 (0.83) | A 39.01%, B 7.16%, C 6.44%, D 47.39% | PCFV |
| WAVELET + MLP | LNE | calib vital | 14.21 (0.95) | A 22.21%, B 4.33%, C 4.34%, D 69.10% | 8.89 (0.94) | A 35.02%, B 6.24%, C 6.23%, D 52.49% |  |
| Alexnet (mean) | PTB | calibfree vital | 12.46 (0.84) | A 25.29%, B 4.81%, C 4.77%, D 65.13% | 7.95 (0.84) | A 39.12%, B 6.76%, C 6.36%, D 47.76% |  |
| Alexnet DE | PTB | calibfree vital | 12.34 (0.83) | A 25.33%, B 5.00%, C 4.87%, D 64.80% | 7.88 (0.84) | A 39.45%, B 6.91%, C 6.26%, D 47.38% |  |
| Minirocket (mean) | PTB | calibfree vital | 12.44 (0.84) | A 25.57%, B 4.88%, C 4.78%, D 64.78% | 7.96 (0.84) | A 38.60%, B 6.91%, C 6.39%, D 48.09% |  |
| Minirocket DE | PTB | calibfree vital | 12.35 (0.83) | A 25.80%, B 4.90%, C 4.81%, D 64.49% | 7.91 (0.84) | A 38.91%, B 6.92%, C 6.57%, D 47.60% |  |
| GPR | KTU | calibfree vital | 12.90 (0.87) | A 24.62%, B 4.70%, C 4.44%, D 65.38% | 8.15 (0.86) | A 38.06%, B 6.61%, C 6.37%, D 48.10% |  |
| GPR + PCA | KTU | calibfree vital | 12.90 (0.87) | A 24.55%, B 4.78%, C 4.34%, D 65.47% | 8.17 (0.87) | A 37.95%, B 6.50%, C 6.21%, D 48.47% |  |
| GPR + LASSO | KTU | calibfree vital | 12.81 (0.86) | A 24.71%, B 4.78%, C 4.56%, D 65.09% | 8.13 (0.86) | A 37.95%, B 6.76%, C 6.36%, D 48.06% |  |
| XResNet1d50 / MCD | NPL | calibfree vital | 12.48 (0.84) | A 25.55%, B 4.91%, C 4.64%, D 64.90% | 8.16 (0.87) | A 38.41%, B 6.76%, C 6.18%, D 48.65% | PCVMCD-2 |

Leaderboard Deepbeat - Performance Comparison Results

| Model | Contributor | AUC | F1 (0.5) | Specificity (sensitivity > 0.8) | Sensitivity (specificity > 0.8) | MCC (sensitivity > 0.8) | MCC (specificity > 0.8) | Model Info |
|---|---|---|---|---|---|---|---|---|
| XResNet1d101 | UOL | 0.86 | 0.70 | 0.76 | 0.73 | 0.55 | 0.52 | DB |
| XResNet1d50 | UOL | 0.87 | 0.69 | 0.78 | 0.78 | 0.57 | 0.57 | DB |
| Inception1d | UOL | 0.87 | 0.72 | 0.79 | 0.79 | 0.58 | 0.58 | DB |
| S4 | UOL | 0.87 | 0.69 | 0.79 | 0.79 | 0.58 | 0.57 | DB |
| LeNet1d | UOL | 0.76 | 0.55 | 0.58 | 0.50 | 0.37 | 0.32 | DB |
| WAVELET + MLP | LNE | 0.77 | 0.61 | 0.59 | 0.52 | 0.38 | 0.33 |  |
| MLP | KTU | 0.52 | 0.34 | 0.19 | 0.32 |  |  |  |
| MLP* | KTU | 0.74 | 0.59 | 0.58 | 0.49 |  |  |  |
| MLP** | KTU | 0.92 | 0.59 | 0.88 | 0.95 |  |  |  |
| XResNet1d50 / MCD | NPL | 0.87 | 0.70 | 0.78 | 0.77 | 0.56 | 0.56 | DBMCD-1 |
| SPAR / tinyVGG | KCL | 0.85 | 0.90 | 0.78 | 0.79 | 0.50 | 0.50 |  |
| Alexnet (mean of 5) | PTB | 0.83 | 0.67 | 0.71 | 0.70 | 0.50 | 0.50 |  |
| Alexnet DE (5) | PTB | 0.84 | 0.67 | 0.73 | 0.71 | 0.52 | 0.50 |  |
| Minirocket (mean of 5) | PTB | 0.82 | 0.63 | 0.68 | 0.66 | 0.46 | 0.46 |  |
| Minirocket DE (5) | PTB | 0.82 | 0.65 | 0.68 | 0.67 | 0.47 | 0.47 |  |

\* When removing segments that were assessed as bad quality.

\*\* When setting labels to non-AF where PPG segments are assessed as bad quality.
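The Deepbeat leaderboard reports specificity and MCC at an operating point where sensitivity exceeds 0.8. The sketch below shows one plausible way to compute such a pair from binary labels and model scores in plain Python. It is an illustration only: the function names are hypothetical, the threshold search is an assumption about how the operating point is chosen, and it is not the project's metrics module. It also assumes that both classes occur in the labels.

```python
import math

def confusion_counts(y_true, y_score, threshold):
    """Count TP, FP, TN, FN when scores >= threshold are predicted positive."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, y_score):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def specificity_at_sensitivity(y_true, y_score, min_sensitivity=0.8):
    """Return (specificity, MCC, threshold) for the threshold with the highest
    specificity among those whose sensitivity is strictly above min_sensitivity."""
    best = None
    for threshold in sorted(set(y_score)):
        tp, fp, tn, fn = confusion_counts(y_true, y_score, threshold)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        if sensitivity > min_sensitivity and (best is None or specificity > best[0]):
            best = (specificity, mcc(tp, fp, tn, fn), threshold)
    return best

# Toy data (hypothetical): 1 = AF, 0 = non-AF
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_score = [0.1, 0.2, 0.3, 0.6, 0.7, 0.4, 0.65, 0.8, 0.9, 0.95]
result = specificity_at_sensitivity(y_true, y_score)
```

Swapping the roles of sensitivity and specificity in the search gives the "Sensitivity (specificity > 0.8)" variant.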

How to create comparable performance evaluations

To create evaluations that can be compared in a meaningful way with the other models, use the respective functions in the metrics module. For example, use metrics.mean_absolute_error to compute the MAE for a regression problem, or metrics.f1_score to compute the F1 score for a classification problem. All of the evaluation metrics follow the same input structure.
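To make the PulseDB leaderboard columns concrete, here is a minimal stand-alone sketch of the three quantities they report. It does not use the project's metrics module, and all function names are hypothetical. Two assumptions are made: the MASE column is read as the model's MAE divided by the MAE of a reference predictor (the BaseLine row has MASE 1.00), and the grades are read as the share of samples whose absolute error falls into the bands A (at most 5 mmHg), B (5 to 6 mmHg), C (6 to 7 mmHg) and D (above 7 mmHg).

```python
def mae(y_true, y_pred):
    """Mean absolute error between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def scaled_mae(y_true, y_pred, y_reference):
    """MAE of the model divided by the MAE of a reference predictor (assumed MASE)."""
    return mae(y_true, y_pred) / mae(y_true, y_reference)

def grade_shares(y_true, y_pred):
    """Percentage of samples per error band (band limits are an assumption)."""
    counts = {"A": 0, "B": 0, "C": 0, "D": 0}
    for t, p in zip(y_true, y_pred):
        err = abs(t - p)
        if err <= 5:
            counts["A"] += 1
        elif err <= 6:
            counts["B"] += 1
        elif err <= 7:
            counts["C"] += 1
        else:
            counts["D"] += 1
    n = len(y_true)
    return {grade: 100.0 * count / n for grade, count in counts.items()}

# Toy systolic values in mmHg (hypothetical)
sbp_true = [120, 130, 110, 140]
sbp_pred = [123, 124, 117, 150]
sbp_reference = [125, 125, 125, 125]  # constant reference predictor

sbp_mae = mae(sbp_true, sbp_pred)
sbp_mase = scaled_mae(sbp_true, sbp_pred, sbp_reference)
sbp_grades = grade_shares(sbp_true, sbp_pred)
```

For leaderboard submissions, the functions in the metrics module remain the reference; this sketch only illustrates what the columns mean.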

For Python users: Pass the test output dataset and your model's output on the test input dataset to the metrics you need.

For non-Python users: To compute the metric of your choice, evaluate your model on the test input dataset and save its output in a common file format. Then load the test output dataset and your model's output into a Python script and call all the metrics suitable to the problem. You can use app/tutorial_metrics.py as a starting point.
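As a rough illustration of that workflow, the sketch below reads targets and predictions from a CSV export and computes an MAE. The file layout, the column names y_true and y_pred, and the helper names are all hypothetical. The inline MAE only keeps the example self-contained; in practice you would call the corresponding function from the metrics module.

```python
import csv
import io

def load_predictions(file_obj, true_col="y_true", pred_col="y_pred"):
    """Read two numeric columns from a CSV file object into parallel lists."""
    y_true, y_pred = [], []
    for row in csv.DictReader(file_obj):
        y_true.append(float(row[true_col]))
        y_pred.append(float(row[pred_col]))
    return y_true, y_pred

def mean_abs_error(y_true, y_pred):
    """Stand-in for a metric call; not the project's implementation."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Stand-in for a file written by a non-Python model (hypothetical values)
exported = io.StringIO("y_true,y_pred\n120,118\n135,140\n98,101\n")
y_true, y_pred = load_predictions(exported)
mae_value = mean_abs_error(y_true, y_pred)
```

With a real export, replace the io.StringIO object with open("your_file.csv", newline="").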

For more details, read or run app/tutorial_metrics.py, or consult the metrics documentation.

Model Meta Information

PCV

| key | value |
|---|---|
| dataset | PulseDB |
| model | BaseLine/XResNet1d101/XResNet1d50/Inception1d/LeNet1d |
| script | gitlab.com/qumphy/wp1-raw-time-series/main_ppg.py |
| commit SHA |  |
| command | `python main_ppg.py --data ./pulsedb/memmap --input-size 1250 --architecture baseline/xresnet1d101/xresnet1d50/inception1d/lenet1d --finetune-dataset pulsedb_calib_vital` |
| comment | See gitlab.com/qumphy/wp1-raw-time-series/README.md for more info. |

PCFV

| key | value |
|---|---|
| dataset | PulseDB |
| model | BaseLine/XResNet1d101/XResNet1d50/Inception1d/LeNet1d |
| script | gitlab.com/qumphy/wp1-raw-time-series/main_ppg.py |
| commit SHA |  |
| command | `python main_ppg.py --data ./pulsedb/memmap --input-size 1250 --architecture baseline/xresnet1d101/xresnet1d50/inception1d/lenet1d --finetune-dataset pulsedb_calibfree_vital` |
| comment | See gitlab.com/qumphy/wp1-raw-time-series/README.md for more info. |

DB

| key | value |
|---|---|
| dataset | DeepBeat |
| model | BaseLine/XResNet1d101/XResNet1d50/Inception1d/LeNet1d/S4 |
| script | gitlab.com/qumphy/wp1-raw-time-series/main_ppg.py |
| commit SHA |  |
| command | `python main_ppg.py --data ./deepbeat/memmap --input-size 800 --architecture baseline/xresnet1d101/xresnet1d50/inception1d/lenet1d --finetune-dataset deepbeat` |
| command (S4) | `python main_ppg_s4.py --data ./deepBeat/memmap --input-size 800 --architecture s4 --precision 32 --s4-n 8 --s4-h 512 --batch-size 32 --finetune-dataset deepbeat --lr-find --refresh-rate 1` |
| comment | See gitlab.com/qumphy/wp1-raw-time-series/README.md for more info. |

DBMCD-1

| key | value |
|---|---|
| dataset | DeepBeat |
| model | XResNet1d50 modified for Monte Carlo dropout |
| training/eval script | https://gitlab.com/qumphy/wp1-raw-time-series/-/blob/MCD_reg/custom_branch_arch/af/main_ppg_MCD.py?ref_type=heads |
| model script | https://gitlab.com/qumphy/wp1-raw-time-series/-/blob/MCD_reg/custom_branch_arch/af/clinical_ts/xresnet1d_MCD.py?ref_type=heads |
| commit SHA |  |
| train command | `options="main_ppg_MCD.py --data /home/cb25/hpc-work/reg_wp1/wp1-raw-time-series/data --input-size 800 --architecture xresnet1d50_MCD --finetune-dataset deepbeat --refresh-rate 1 --executable mcd --UQ-method MCD --epochs 20 --dropout 0.03"` |
| eval command | `"main_ppg_MCD.py --data /home/cb25/hpc-work/reg_wp1/wp1-raw-time-series/data --input-size 800 --architecture xresnet1d50_MCD --finetune-dataset deepbeat --refresh-rate 1 --executable mcd --UQ-method MCD --eval-only ./best_model.ckpt --UQ-iters 100 --dropout 0.03"` |
| comment | We add an auxiliary noise branch to the xresnet1d50 (from the penultimate layer) and add dropout liberally throughout the architecture. The training/eval script is customised for implementing MCD… |

PCVMCD-1

| key | value |
|---|---|
| dataset | PulseDB |
| model | XResNet1d50 modified for Monte Carlo dropout |
| training/eval script | https://gitlab.com/qumphy/wp1-raw-time-series/-/blob/MCD_reg/custom_branch_arch/sys/BP_script_model_info/main_ppg_MCD.py?ref_type=heads |
| model script | https://gitlab.com/qumphy/wp1-raw-time-series/-/blob/MCD_reg/custom_branch_arch/sys/BP_script_model_info/clinical_ts/xresnet1d_MCD.py?ref_type=heads |
| commit SHA |  |
| train command (systolic) | `main_ppg_MCD.py --data /home/cb25/hpc-work/reg_wp1/wp1-raw-time-series/pulsedb_data --input-size 1250 --architecture xresnet1d50_MCD --finetune-dataset pulsedb_calib --refresh-rate 1 --UQ-method MCD --sys-dia-index 0 --executable mcd --dropout 0.05` |
| eval command (systolic) | `"main_ppg_MCD.py --data /home/cb25/hpc-work/reg_wp1/wp1-raw-time-series/pulsedb_data --input-size 1250 --architecture xresnet1d50_MCD --finetune-dataset pulsedb_calib --refresh-rate 1 --UQ-method MCD --eval-only ./best_model.ckpt --UQ-method MCD --sys-dia-index 0 --executable mcd --UQ-iters 100 --dropout 0.05"` |
| train command (diastolic) | `main_ppg_MCD.py --data /home/cb25/hpc-work/reg_wp1/wp1-raw-time-series/pulsedb_data --input-size 1250 --architecture xresnet1d50_MCD --finetune-dataset pulsedb_calib --refresh-rate 1 --UQ-method MCD --sys-dia-index 1 --executable mcd --dropout 0.04` |
| eval command (diastolic) | `main_ppg_MCD.py --data /home/cb25/hpc-work/reg_wp1/wp1-raw-time-series/pulsedb_data --input-size 1250 --architecture xresnet1d50_MCD --finetune-dataset pulsedb_calib --refresh-rate 1 --eval-only ./best_model.ckpt --UQ-method MCD --sys-dia-index 1 --executable mcd --UQ-iters 100 --dropout 0.04` |
| comment | We train separate but identically configured models (aside from the dropout rates) for systolic and diastolic estimation. We add an auxiliary noise branch to the xresnet1d50 (from the penultimate layer) and add dropout liberally throughout the architecture. The training/eval script is customised for implementing MCD… |

PCVMCD-2

| key | value |
|---|---|
| dataset | PulseDB |
| model | XResNet1d50 modified for Monte Carlo dropout |
| training/eval script | https://gitlab.com/qumphy/wp1-raw-time-series/-/tree/MCD_reg/custom_branch_arch/sys/BP_script_model_info/v2?ref_type=heads |
| model script | https://gitlab.com/qumphy/wp1-raw-time-series/-/blob/MCD_reg/custom_branch_arch/sys/BP_script_model_info/v2/clinical_ts/xresnet1d_MCD.py?ref_type=heads |
| commit SHA | 5ab9828f5d4737fc845586b53e894351fb9df933 |
| train command (systolic) | `main_ppg_MCD.py --data /home/cb25/hpc-work/reg_wp1/wp1-raw-time-series/pulsedb_data --input-size 1250 --architecture xresnet1d50_MCD --finetune-dataset pulsedb_calib_vital --refresh-rate 1 --UQ-method MCD --sys-dia-index 0 --executable mcd --dropout 0.05` |
| eval command (systolic) | `"main_ppg_MCD.py --data /home/cb25/hpc-work/reg_wp1/wp1-raw-time-series/pulsedb_data --input-size 1250 --architecture xresnet1d50_MCD --finetune-dataset pulsedb_calib_vital --refresh-rate 1 --UQ-method MCD --eval-only ./best_model.ckpt --UQ-method MCD --sys-dia-index 0 --executable mcd --UQ-iters 100 --dropout 0.05"` |
| train command (diastolic) | `main_ppg_MCD.py --data /home/cb25/hpc-work/reg_wp1/wp1-raw-time-series/pulsedb_data --input-size 1250 --architecture xresnet1d50_MCD --finetune-dataset pulsedb_calib_vital --refresh-rate 1 --UQ-method MCD --sys-dia-index 1 --executable mcd --dropout 0.04` |
| eval command (diastolic) | `main_ppg_MCD.py --data /home/cb25/hpc-work/reg_wp1/wp1-raw-time-series/pulsedb_data --input-size 1250 --architecture xresnet1d50_MCD --finetune-dataset pulsedb_calib_vital --refresh-rate 1 --eval-only ./best_model.ckpt --UQ-method MCD --sys-dia-index 1 --executable mcd --UQ-iters 100 --dropout 0.04` |
| comment | We train separate but identically configured models (aside from the dropout rates) for systolic and diastolic estimation. We add an auxiliary noise branch to the xresnet1d50 (from the penultimate layer) and add dropout liberally throughout the architecture. The training/eval script is customised for implementing MCD… |
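
The MCD entries above evaluate with dropout kept active and a fixed number of iterations (--UQ-iters 100). As a rough illustration of the general idea behind Monte Carlo dropout, the toy sketch below runs repeated stochastic forward passes through a tiny linear model and summarises them by mean and standard deviation. It is not the code in main_ppg_MCD.py: the model, the function names and the reading of --UQ-iters as the number of stochastic passes are all assumptions.

```python
import random
import statistics

def stochastic_forward(features, weights, dropout, rng):
    """One forward pass of a toy linear model with dropout left switched on."""
    keep = 1.0 - dropout
    total = 0.0
    for x, w in zip(features, weights):
        if rng.random() < keep:        # unit survives with probability `keep`
            total += (x * w) / keep    # inverted-dropout rescaling
    return total

def mc_dropout_predict(features, weights, dropout, n_iters=100, seed=0):
    """Return (mean, standard deviation) over n_iters stochastic passes."""
    rng = random.Random(seed)
    samples = [stochastic_forward(features, weights, dropout, rng)
               for _ in range(n_iters)]
    return statistics.mean(samples), statistics.pstdev(samples)

# Hypothetical inputs and weights
features = [1.0, 2.0, 3.0]
weights = [0.5, -1.0, 2.0]

mean_det, std_det = mc_dropout_predict(features, weights, dropout=0.0)
mean_mc, std_mc = mc_dropout_predict(features, weights, dropout=0.5)
```

With dropout set to 0.0 every pass is identical, so the spread is zero. With dropout active, the standard deviation across passes serves as a simple uncertainty estimate.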