Evaluation and Performance Metrics

Table of Contents

Leaderboard Deepbeat
How to create comparable performance evaluations
How to run the code for performance evaluation
Model Meta Information

Leaderboard PulseDB (calib) - Performance Comparison Results

Model	Contributor	Set	SBP MAE (MASE)	SBP Grades (IEEE 1708a-2019)	DBP MAE (MASE)	DBP Grades (IEEE 1708a-2019)	Model Info
BaseLine	UOL	calib vital	14.91 (1.00)	A 20.67%, B 4.12%, C 3.98%, D 71.24%	9.52 (1.00)	A 32.31%, B 5.87%, C 5.87%, D 55.95%	PCV
XResNet1d101	UOL	calib vital	9.08 (0.60)	A 39.50%, B 6.38%, C 6.00%, D 48.12%	6.08 (0.63)	A 53.11%, B 7.59%, C 6.52%, D 32.77%	PCV
XResNet1d50	UOL	calib vital	9.49 (0.63)	A 36.81%, B 6.16%, C 5.77%, D 51.26%	6.33 (0.66)	A 50.72%, B 7.60%, C 6.60%, D 35.08%	PCV
Inception1d	UOL	calib vital	9.65 (0.64)	A 35.50%, B 6.05%, C 5.82%, D 52.63%	6.52 (0.68)	A 48.48%, B 7.44%, C 6.85%, D 37.23%	PCV
LeNet1d	UOL	calib vital	11.61 (0.77)	A 27.99%, B 5.31%, C 4.98%, D 61.72%	7.70 (0.80)	A 40.33%, B 7.10%, C 6.62%, D 45.95%	PCV
TCN + MLP	LNE	calib vital	12.25 (0.82)		7.97 (0.83)
WAVELET + MLP	LNE	calib vital	13.62 (0.91)	A 23.89%, B 4.56%, C 4.47%, D 67.06%	8.84 (0.92)	A 35.58%, B 6.54%, C 5.96%, D 51.90%
XResNet1d50 / MCD	NPL	pulsedb_calib	8.62 (0.58)	A 45.37%, B 6.08%, C 5.30%, D 43.26%	5.29 (TBD)	A 62.51%, B 6.04%, C 5.03%, D 26.41%	PCVMCD-1
XResNet1d50 / MCD	NPL	calib vital	7.94 (0.53)	A 48.00%, B 6.18%, C 5.30%, D 40.52%	5.07 (0.53)	A 63.81%, B 5.89%, C 4.89%, D 25.41%	PCVMCD-2
Alexnet (mean)	PTB	calib vital	10.03 (0.67)	A 35.19%, B 5.90%, C 5.56%, D 53.33%	6.47 (0.68)	A 50.51%, B 7.08%, C 6.17%, D 36.34%
Alexnet DE	PTB	calib vital	9.65 (0.65)	A 36.75%, B 6.04%, C 5.54%, D 51.68%	6.21 (0.65)	A 52.34%, B 7.09%, C 6.01%, D 34.55%
Minirocket (mean)	PTB	calib vital	11.43 (0.77)	A 28.60%, B 5.32%, C 5.16%, D 60.91%	7.47 (0.78)	A 42.15%, B 7.17%, C 6.55%, D 44.13%
Minirocket DE	PTB	calib vital	11.34 (0.76)	A 28.78%, B 5.34%, C 5.22%, D 60.66%	7.41 (0.78)	A 42.61%, B 7.14%, C 6.56%, D 43.69%
GPR	KTU	calib vital	12.22 (0.82)	A 26.60%, B 5.04%, C 4.84%, D 62.37%	7.78 (0.82)	A 39.81%, B 6.99%, C 6.58%, D 45.47%
GPR + PCA	KTU	calib vital	12.25 (0.82)	A 26.52%, B 4.92%, C 4.79%, D 62.61%	7.82 (0.82)	A 39.44%, B 6.96%, C 6.42%, D 46.02%
GPR + LASSO	KTU	calib vital	12.20 (0.82)	A 26.47%, B 4.88%, C 4.94%, D 62.55%	7.79 (0.82)	A 39.62%, B 6.94%, C 6.50%, D 45.78%
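
As a rough illustration of how the columns in the table above relate to per-sample predictions, the following sketch computes an MAE, a scaled MAE, and grade percentages on made-up values. It assumes that MASE is the model MAE divided by the MAE of a reference predictor, and that the grades bin each sample's absolute error at 5, 6, and 7 mmHg. Neither definition is stated in this document, so check the metrics module for the exact conventions. All function and variable names here are illustrative and are not the metrics module's API.

```python
from statistics import mean

# Assumed grade boundaries on the absolute error in mmHg:
# A <= 5, B <= 6, C <= 7, D > 7. Verify against the metrics module.
GRADE_BOUNDS = (("A", 5.0), ("B", 6.0), ("C", 7.0))


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error between two equally long sequences."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def mase(y_true, y_pred, y_reference):
    """Model MAE divided by the MAE of a reference predictor (assumed definition)."""
    return mean_absolute_error(y_true, y_pred) / mean_absolute_error(y_true, y_reference)


def grade_percentages(y_true, y_pred):
    """Percentage of samples whose absolute error falls into each grade."""
    counts = {"A": 0, "B": 0, "C": 0, "D": 0}
    for t, p in zip(y_true, y_pred):
        err = abs(t - p)
        grade = next((g for g, bound in GRADE_BOUNDS if err <= bound), "D")
        counts[grade] += 1
    n = len(y_true)
    return {g: 100.0 * c / n for g, c in counts.items()}


# Toy systolic values in mmHg, made up for illustration only.
sbp_true = [120.0, 135.0, 110.0, 142.0]
sbp_pred = [123.0, 129.5, 116.5, 130.0]
sbp_ref = [125.0] * 4  # hypothetical constant reference predictor

print(mean_absolute_error(sbp_true, sbp_pred))
print(round(mase(sbp_true, sbp_pred, sbp_ref), 2))
print(grade_percentages(sbp_true, sbp_pred))
```

A MASE below 1.00 then means the model has a lower MAE than the reference predictor under this assumed definition.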

Leaderboard PulseDB (calibfree) - Performance Comparison Results

Model	Contributor	Set	SBP MAE (MASE)	SBP Grades (IEEE 1708a-2019)	DBP MAE (MASE)	DBP Grades (IEEE 1708a-2019)	Model Info
BaseLine	UOL	calibfree vital	14.87 (1.00)	A 20.63%, B 4.07%, C 3.88%, D 71.42%	9.43 (1.00)	A 32.98%, B 6.35%, C 5.88%, D 54.79%	PCFV
XResNet1d101	UOL	calibfree vital	12.70 (0.85)	A 25.75%, B 4.78%, C 4.75%, D 64.72%	8.05 (0.85)	A 38.73%, B 6.70%, C 6.27%, D 48.31%	PCFV
XResNet1d50	UOL	calibfree vital	12.40 (0.83)	A 24.61%, B 5.01%, C 4.76%, D 65.62%	7.84 (0.83)	A 39.69%, B 6.96%, C 6.40%, D 46.95%	PCFV
Inception1d	UOL	calibfree vital	14.97 (1.00)	A 21.26%, B 4.17%, C 4.11%, D 70.46%	8.98 (0.95)	A 27.93%, B 5.32%, C 5.34%, D 61.41%	PCFV
LeNet1d	UOL	calibfree vital	12.37 (0.83)	A 24.81%, B 4.91%, C 4.76%, D 65.52%	7.89 (0.83)	A 39.01%, B 7.16%, C 6.44%, D 47.39%	PCFV
WAVELET + MLP	LNE	calib vital	14.21 (0.95)	A 22.21%, B 4.33%, C 4.34%, D 69.10%	8.89 (0.94)	A 35.02%, B 6.24%, C 6.23%, D 52.49%
Alexnet (mean)	PTB	calibfree vital	12.46 (0.84)	A 25.29%, B 4.81%, C 4.77%, D 65.13%	7.95 (0.84)	A 39.12%, B 6.76%, C 6.36%, D 47.76%
Alexnet DE	PTB	calibfree vital	12.34 (0.83)	A 25.33%, B 5.00%, C 4.87%, D 64.80%	7.88 (0.84)	A 39.45%, B 6.91%, C 6.26%, D 47.38%
Minirocket (mean)	PTB	calibfree vital	12.44 (0.84)	A 25.57%, B 4.88%, C 4.78%, D 64.78%	7.96 (0.84)	A 38.60%, B 6.91%, C 6.39%, D 48.09%
Minirocket DE	PTB	calibfree vital	12.35 (0.83)	A 25.80%, B 4.90%, C 4.81%, D 64.49%	7.91 (0.84)	A 38.91%, B 6.92%, C 6.57%, D 47.60%
GPR	KTU	calibfree vital	12.90 (0.87)	A 24.62%, B 4.70%, C 4.44%, D 65.38%	8.15 (0.86)	A 38.06%, B 6.61%, C 6.37%, D 48.10%
GPR + PCA	KTU	calibfree vital	12.90 (0.87)	A 24.55%, B 4.78%, C 4.34%, D 65.47%	8.17 (0.87)	A 37.95%, B 6.50%, C 6.21%, D 48.47%
GPR + LASSO	KTU	calibfree vital	12.81 (0.86)	A 24.71%, B 4.78%, C 4.56%, D 65.09%	8.13 (0.86)	A 37.95%, B 6.76%, C 6.36%, D 48.06%
XResNet1d50 / MCD	NPL	calibfree vital	12.48 (0.84)	A 25.55%, B 4.91%, C 4.64%, D 64.90%	8.16 (0.87)	A 38.41%, B 6.76%, C 6.18%, D 48.65%	PCVMCD-2

Leaderboard Deepbeat - Performance Comparison Results

Model	Contributor	AUC	F1 (0.5)	Specificity (sensitivity > 0.8)	Sensitivity (specificity > 0.8)	MCC (sensitivity > 0.8)	MCC (specificity > 0.8)	Model Info
XResNet1d101	UOL	0.86	0.70	0.76	0.73	0.55	0.52	DB
XResNet1d50	UOL	0.87	0.69	0.78	0.78	0.57	0.57	DB
Inception1d	UOL	0.87	0.72	0.79	0.79	0.58	0.58	DB
S4	UOL	0.87	0.69	0.79	0.79	0.58	0.57	DB
LeNet1d	UOL	0.76	0.55	0.58	0.50	0.37	0.32	DB
WAVELET + MLP	LNE	0.77	0.61	0.59	0.52	0.38	0.33
MLP	KTU	0.52	0.34	0.19	0.32
MLP*	KTU	0.74	0.59	0.58	0.49
MLP**	KTU	0.92	0.59	0.88	0.95
XResNet1d50 / MCD	NPL	0.87	0.70	0.78	0.77	0.56	0.56	DBMCD-1
SPAR / tinyVGG	KCL	0.85	0.90	0.78	0.79	0.50	0.50
Alexnet (mean of 5)	PTB	0.83	0.67	0.71	0.70	0.50	0.50
Alexnet DE (5)	PTB	0.84	0.67	0.73	0.71	0.52	0.50
Minirocket (mean of 5)	PTB	0.82	0.63	0.68	0.66	0.46	0.46
Minirocket DE (5)	PTB	0.82	0.65	0.68	0.67	0.47	0.47

* - when removing segments which were assessed as bad quality. ** - when setting labels to non-AF where PPG segments are assessed as bad quality.
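
The threshold-dependent columns of the Deepbeat leaderboard can be illustrated with a small sketch. It assumes that "F1 (0.5)" is the F1 score at a score threshold of 0.5, and that "Specificity (sensitivity > 0.8)" is the highest specificity among all thresholds whose sensitivity exceeds 0.8, with MCC reported at that same operating point. The exact selection rule is not stated in this document, so treat this as one plausible reading. The data and all function names are made up for illustration.

```python
from math import sqrt


def confusion(y_true, scores, threshold):
    """Count (tp, fp, tn, fn), predicting positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted:
            fp += 1
        elif label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f1(tp, fp, fn):
    """F1 score from confusion counts; 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 if the denominator vanishes."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def specificity_at_sensitivity(y_true, scores, min_sensitivity=0.8):
    """Return (threshold, specificity, mcc) with the highest specificity
    among thresholds whose sensitivity is strictly above min_sensitivity.
    Assumes y_true contains at least one positive and one negative label."""
    best = None
    for threshold in sorted(set(scores)):
        tp, fp, tn, fn = confusion(y_true, scores, threshold)
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        if sensitivity > min_sensitivity and (best is None or specificity > best[1]):
            best = (threshold, specificity, mcc(tp, fp, tn, fn))
    return best


# Toy labels (1 = AF, 0 = non-AF) and model scores, made up for illustration.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.65, 0.4, 0.2, 0.1, 0.05]

tp, fp, tn, fn = confusion(labels, scores, 0.5)
f1_at_half = f1(tp, fp, fn)
operating_point = specificity_at_sensitivity(labels, scores)
print(f1_at_half, operating_point)
```

The "Sensitivity (specificity > 0.8)" column would follow the same pattern with the roles of sensitivity and specificity swapped.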

How to create comparable performance evaluations

To create performance evaluations that can be compared meaningfully with those of the other models, use the corresponding functions in the metrics module. For example, use metrics.mean_absolute_error to compute the MAE for a regression problem, or metrics.f1_score to compute the F1 score for a classification problem. All evaluation metrics follow the same input structure.

For Python users: Pass the test output dataset and your model's output on the test input dataset to the metrics you want.

For non-Python users: Evaluate your model on the test input dataset and save the output in a common file format. Then load the test output dataset and your model's output into a Python script and call the metrics suitable for the problem. You can use app/tutorial_metrics.py as a starting point.

For more details, see or run app/tutorial_metrics.py, or check out the metrics documentation.
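
For the non-Python route, a minimal sketch of the loading step might look like the following. The CSV layout, the column names `sample_id` and `sbp_pred`, and the in-memory stand-ins for the exported file and the test output dataset are all assumptions made for illustration. In the project itself, the aligned targets and predictions would be passed to the functions of the metrics module instead of the inline MAE computed here; consult app/tutorial_metrics.py for the actual calls.

```python
import csv
import io

# Stand-in for a file exported by a non-Python tool (hypothetical layout).
exported = io.StringIO(
    "sample_id,sbp_pred\n"
    "0,118.0\n"
    "1,131.0\n"
    "2,109.0\n"
)

# Stand-in for the test output dataset, keyed by sample id (made-up values).
targets = {0: 120.0, 1: 135.0, 2: 110.0}


def load_predictions(handle, column):
    """Read one prediction column from a CSV handle into {sample_id: value}."""
    reader = csv.DictReader(handle)
    return {int(row["sample_id"]): float(row[column]) for row in reader}


predictions = load_predictions(exported, "sbp_pred")

# Align targets and predictions by sample id before computing any metric.
ids = sorted(targets)
y_true = [targets[i] for i in ids]
y_pred = [predictions[i] for i in ids]

# Inline MAE as a placeholder for the project's metric functions.
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(ids)
print(mae)
```

Aligning by an explicit sample id, rather than relying on row order, guards against silently comparing predictions with the wrong targets.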

Model Meta Information

PCV

key	value
dataset	PulseDB
model	BaseLine/XResNet1d101/XResNet1d50/Inception1d/LeNet1d
script	gitlab.com/qumphy/wp1-raw-time-series/main_ppg.py
commit SHA
command	`python main_ppg.py --data ./pulsedb/memmap --input-size 1250 --architecture baseline/xresnet1d101/xresnet1d50/inception1d/lenet1d --finetune-dataset pulsedb_calib_vital`
comment	See gitlab.com/qumphy/wp1-raw-time-series/README.md for more info.

PCFV

key	value
dataset	PulseDB
model	BaseLine/XResNet1d101/XResNet1d50/Inception1d/LeNet1d
script	gitlab.com/qumphy/wp1-raw-time-series/main_ppg.py
commit SHA
command	`python main_ppg.py --data ./pulsedb/memmap --input-size 1250 --architecture baseline/xresnet1d101/xresnet1d50/inception1d/lenet1d --finetune-dataset pulsedb_calibfree_vital`
comment	See gitlab.com/qumphy/wp1-raw-time-series/README.md for more info.

DB

key	value
dataset	DeepBeat
model	BaseLine/XResNet1d101/XResNet1d50/Inception1d/LeNet1d/S4
script	gitlab.com/qumphy/wp1-raw-time-series/main_ppg.py
commit SHA
command	`python main_ppg.py --data ./deepbeat/memmap --input-size 800 --architecture baseline/xresnet1d101/xresnet1d50/inception1d/lenet1d --finetune-dataset deepbeat`; for S4: `python main_ppg_s4.py --data ./deepBeat/memmap --input-size 800 --architecture s4 --precision 32 --s4-n 8 --s4-h 512 --batch-size 32 --finetune-dataset deepbeat --lr-find --refresh-rate 1`
comment	See gitlab.com/qumphy/wp1-raw-time-series/README.md for more info.

DBMCD-1

key	value
dataset	DeepBeat
model	XResNet1d50 modified for Monte Carlo dropout
training/eval script	https://gitlab.com/qumphy/wp1-raw-time-series/-/blob/MCD_reg/custom_branch_arch/af/main_ppg_MCD.py?ref_type=heads
model script	https://gitlab.com/qumphy/wp1-raw-time-series/-/blob/MCD_reg/custom_branch_arch/af/clinical_ts/xresnet1d_MCD.py?ref_type=heads
commit SHA
train command	`options="main_ppg_MCD.py --data /home/cb25/hpc-work/reg_wp1/wp1-raw-time-series/data --input-size 800 --architecture xresnet1d50_MCD --finetune-dataset deepbeat --refresh-rate 1 --executable mcd --UQ-method MCD --epochs 20 --dropout 0.03"`
eval command	`"main_ppg_MCD.py --data /home/cb25/hpc-work/reg_wp1/wp1-raw-time-series/data --input-size 800 --architecture xresnet1d50_MCD --finetune-dataset deepbeat --refresh-rate 1 --executable mcd --UQ-method MCD --eval-only ./best_model.ckpt --UQ-iters 100 --dropout 0.03"`
comment	We add an auxiliary noise branch to the xresnet1d50 (from the penultimate layer) and add dropout liberally throughout the architecture. The training/eval script is customised for implementing MCD…
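
For readers unfamiliar with Monte Carlo dropout (MCD), the general idea can be sketched on a toy model: dropout stays active at evaluation time, the model is run several times on the same input, and the spread of the outputs serves as an uncertainty estimate. This is not the code in main_ppg_MCD.py. The one-layer model, the function names, and the reading of `--UQ-iters` as the number of stochastic forward passes are assumptions made for illustration, and the auxiliary noise branch is not modelled.

```python
import random
from statistics import mean, stdev


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def forward(x, weights, rate, rng):
    """Toy one-layer regressor with dropout applied to the input features."""
    dropped = dropout(x, rate, rng)
    return sum(w * v for w, v in zip(weights, dropped))


def mc_dropout_predict(x, weights, rate, n_iters, seed=0):
    """Run n_iters stochastic forward passes with dropout left on.

    Returns the mean prediction and the sample standard deviation,
    the latter serving as a simple uncertainty estimate.
    """
    rng = random.Random(seed)
    samples = [forward(x, weights, rate, rng) for _ in range(n_iters)]
    return mean(samples), stdev(samples)


# Made-up input and weights; rate and n_iters mirror the style of the
# --dropout and --UQ-iters options but are not tied to the real script.
x = [1.0, 2.0, 3.0]
weights = [0.5, -0.25, 1.0]
prediction, uncertainty = mc_dropout_predict(x, weights, rate=0.05, n_iters=100)
print(prediction, uncertainty)
```

With a dropout rate of 0, every pass returns the same value and the estimated uncertainty collapses to zero, which is a quick sanity check for any MCD implementation.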

PCVMCD-1

key	value
dataset	PulseDB
model	XResNet1d50 modified for Monte Carlo dropout
training/eval script	https://gitlab.com/qumphy/wp1-raw-time-series/-/blob/MCD_reg/custom_branch_arch/sys/BP_script_model_info/main_ppg_MCD.py?ref_type=heads
model script	https://gitlab.com/qumphy/wp1-raw-time-series/-/blob/MCD_reg/custom_branch_arch/sys/BP_script_model_info/clinical_ts/xresnet1d_MCD.py?ref_type=heads
commit SHA
train command (systolic)	`main_ppg_MCD.py --data /home/cb25/hpc-work/reg_wp1/wp1-raw-time-series/pulsedb_data --input-size 1250 --architecture xresnet1d50_MCD --finetune-dataset pulsedb_calib --refresh-rate 1 --UQ-method MCD --sys-dia-index 0 --executable mcd --dropout 0.05`
eval command (systolic)	`"main_ppg_MCD.py --data /home/cb25/hpc-work/reg_wp1/wp1-raw-time-series/pulsedb_data --input-size 1250 --architecture xresnet1d50_MCD --finetune-dataset pulsedb_calib --refresh-rate 1 --UQ-method MCD --eval-only ./best_model.ckpt --UQ-method MCD --sys-dia-index 0 --executable mcd --UQ-iters 100 --dropout 0.05"`
train command (diastolic)	`main_ppg_MCD.py --data /home/cb25/hpc-work/reg_wp1/wp1-raw-time-series/pulsedb_data --input-size 1250 --architecture xresnet1d50_MCD --finetune-dataset pulsedb_calib --refresh-rate 1 --UQ-method MCD --sys-dia-index 1 --executable mcd --dropout 0.04`
eval command (diastolic)	`main_ppg_MCD.py --data /home/cb25/hpc-work/reg_wp1/wp1-raw-time-series/pulsedb_data --input-size 1250 --architecture xresnet1d50_MCD --finetune-dataset pulsedb_calib --refresh-rate 1 --eval-only ./best_model.ckpt --UQ-method MCD --sys-dia-index 1 --executable mcd --UQ-iters 100 --dropout 0.04`
comment	We train separate but identically configured models (aside from the dropout rates) for systolic and diastolic estimation. We add an auxiliary noise branch to the xresnet1d50 (from the penultimate layer) and add dropout liberally throughout the architecture. The training/eval script is customised for implementing MCD…

PCVMCD-2

key	value
dataset	PulseDB
model	XResNet1d50 modified for Monte Carlo dropout
training/eval script	https://gitlab.com/qumphy/wp1-raw-time-series/-/tree/MCD_reg/custom_branch_arch/sys/BP_script_model_info/v2?ref_type=heads
model script	https://gitlab.com/qumphy/wp1-raw-time-series/-/blob/MCD_reg/custom_branch_arch/sys/BP_script_model_info/v2/clinical_ts/xresnet1d_MCD.py?ref_type=heads
commit SHA	5ab9828f5d4737fc845586b53e894351fb9df933
train command (systolic)	`main_ppg_MCD.py --data /home/cb25/hpc-work/reg_wp1/wp1-raw-time-series/pulsedb_data --input-size 1250 --architecture xresnet1d50_MCD --finetune-dataset pulsedb_calib_vital --refresh-rate 1 --UQ-method MCD --sys-dia-index 0 --executable mcd --dropout 0.05`
eval command (systolic)	`"main_ppg_MCD.py --data /home/cb25/hpc-work/reg_wp1/wp1-raw-time-series/pulsedb_data --input-size 1250 --architecture xresnet1d50_MCD --finetune-dataset pulsedb_calib_vital --refresh-rate 1 --UQ-method MCD --eval-only ./best_model.ckpt --UQ-method MCD --sys-dia-index 0 --executable mcd --UQ-iters 100 --dropout 0.05"`
train command (diastolic)	`main_ppg_MCD.py --data /home/cb25/hpc-work/reg_wp1/wp1-raw-time-series/pulsedb_data --input-size 1250 --architecture xresnet1d50_MCD --finetune-dataset pulsedb_calib_vital --refresh-rate 1 --UQ-method MCD --sys-dia-index 1 --executable mcd --dropout 0.04`
eval command (diastolic)	`main_ppg_MCD.py --data /home/cb25/hpc-work/reg_wp1/wp1-raw-time-series/pulsedb_data --input-size 1250 --architecture xresnet1d50_MCD --finetune-dataset pulsedb_calib_vital --refresh-rate 1 --eval-only ./best_model.ckpt --UQ-method MCD --sys-dia-index 1 --executable mcd --UQ-iters 100 --dropout 0.04`
comment	We train separate but identically configured models (aside from the dropout rates) for systolic and diastolic estimation. We add an auxiliary noise branch to the xresnet1d50 (from the penultimate layer) and add dropout liberally throughout the architecture. The training/eval script is customised for implementing MCD…