qumphy.data.pulsedb module

File: qumphy/data/pulsedb.py
Project: 22HLT01 QUMPHY
Contact: oskar.pfeffer@ptb.de
Gitlab: https://gitlab.com/qumphy
Description: Functions handling PulseDB data.

class qumphy.data.pulsedb.PulseDBDataModule(data_directory, dataset, source, batch_size, num_workers, sampling_rate, prefetch_factor=8, **dskwargs)[source]

Bases: LightningDataModule

LightningDataModule implementation for the PulseDB dataset.

get_target_stats()[source]

setup(stage)[source]

test_dataloader()[source]

train_dataloader()[source]

val_dataloader()[source]

class qumphy.data.pulsedb.PulseDBDataset(data_directory, dataset, subset, source='all', pressure='both', normalize=False, dtype=torch.float32, load_data=True, data_fraction=1.0, filter_params=None, noise_params=None, input_sampling_rate=None, split_to_input_sampling_rate=None, target_sampling_rate=None)[source]

Bases: Dataset

Dataset Class for the PulseDB dataset.

To use the PulseDB dataset, place the data in a directory containing signals.npy and metadata.csv. Before using the dataset, run write_target_stats_yaml(data_directory) to create a stats.yaml file containing the target statistics, i.e., the mean, median, standard deviation, and baseline measures of the systolic (SBP) and diastolic (DBP) blood pressure.

calculate_baseline_measures()[source]

calculate_target_stats()[source]

get_data()[source]

get_labels()[source]

get_target_stats()[source]

Returns the statistics of the target.

Returns: (BP_mean, BP_std, BP_median, MAE_baseline, RMSE_baseline)
Return type: tuple
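
The page does not say how the baseline measures are defined. The sketch below assumes a common convention: the MAE baseline is the error of always predicting the training median, and the RMSE baseline is the error of always predicting the training mean. The function name target_stats and the example values are hypothetical and are not taken from the qumphy source.

```python
import numpy as np

def target_stats(bp):
    """Return (mean, std, median, MAE baseline, RMSE baseline) per column.

    Assumed convention (not confirmed by the qumphy docs):
    - MAE baseline: mean absolute error of a constant median predictor
    - RMSE baseline: root mean squared error of a constant mean predictor
    """
    bp = np.asarray(bp, dtype=np.float64)
    mean = bp.mean(axis=0)
    std = bp.std(axis=0)
    median = np.median(bp, axis=0)
    mae_baseline = np.abs(bp - median).mean(axis=0)
    rmse_baseline = np.sqrt(((bp - mean) ** 2).mean(axis=0))
    return mean, std, median, mae_baseline, rmse_baseline

# Columns are (SBP, DBP) in mmHg; the values are made up for illustration.
bp = np.array([[110.0, 60.0], [120.0, 70.0], [130.0, 80.0], [160.0, 90.0]])
mean, std, median, mae_baseline, rmse_baseline = target_stats(bp)
```

Under this assumption the RMSE baseline equals the population standard deviation, so a model only adds value if its RMSE falls below BP_std.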

load_data(data_directory, filter_params=None, noise_params=None)[source]

load_target_stats(data_directory)[source]

Reads the target statistics of the PulseDB training datasets from a YAML file.

Parameters: data_directory (pathlib.Path) – The directory where the data is located.
Return type: None

normalize_data(data)[source]

Rescales the data to the range [-1, 1].

Parameters: data (array) – The data to be rescaled.
Returns: The normalized data in the range [-1, 1].
Return type: array
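
As an illustration of rescaling to [-1, 1], here is a minimal min-max sketch. It assumes the minimum and maximum are taken over the whole input array. The actual method may scale per segment or per channel, and the function name rescale_to_unit_range is hypothetical.

```python
import numpy as np

def rescale_to_unit_range(data):
    """Linearly map an array onto [-1, 1] using its own min and max."""
    data = np.asarray(data, dtype=np.float64)
    lo, hi = data.min(), data.max()
    if hi == lo:
        # A constant signal has no range; map it to zeros to avoid dividing by 0.
        return np.zeros_like(data)
    return 2.0 * (data - lo) / (hi - lo) - 1.0

signal = np.array([0.0, 2.5, 5.0, 10.0])
scaled = rescale_to_unit_range(signal)  # array([-1. , -0.5,  0. ,  1. ])
```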

normalize_target(target, denormalize=False)[source]

Rescales the target to 0 mean and 1 standard deviation, using the BP_mean and BP_std attributes.

Parameters:

target (np.ndarray) – The input target to be normalized.
denormalize (bool, optional) – If True, the target will be denormalized back to its original scale.

Returns:

The (de)normalized target.

Return type:

np.ndarray
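
The standardization described above can be sketched as a standalone function. In this sketch the mean and standard deviation are passed in explicitly, whereas the method reads them from the BP_mean and BP_std attributes. The numeric values are made up for illustration.

```python
import numpy as np

def normalize_target(target, bp_mean, bp_std, denormalize=False):
    """Standardize targets to zero mean and unit std, or undo that scaling."""
    target = np.asarray(target, dtype=np.float64)
    if denormalize:
        return target * bp_std + bp_mean
    return (target - bp_mean) / bp_std

bp_mean = np.array([120.0, 70.0])   # illustrative (SBP, DBP) means
bp_std = np.array([20.0, 10.0])     # illustrative (SBP, DBP) stds
targets = np.array([[140.0, 60.0], [100.0, 80.0]])

z = normalize_target(targets, bp_mean, bp_std)
restored = normalize_target(z, bp_mean, bp_std, denormalize=True)
```

Calling the function with denormalize=True on model outputs converts predictions back to the original blood pressure scale before errors are reported.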

select_subset_indices(metadata)[source]

Return an index mask of the memmapped dataset based on the selected subset.

Parameters: metadata (pd.core.frame.DataFrame) – The metadata dataframe.
Returns: The index mask.
Return type: pd.core.indexes.base.Index
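
A minimal sketch of selecting rows through a pandas index follows. The column names "subset" and "source" and the helper select_indices are hypothetical, since the layout of metadata.csv is not documented here.

```python
import pandas as pd

# Hypothetical metadata layout; the real metadata.csv columns may differ.
metadata = pd.DataFrame({
    "subset": ["train", "train", "test", "train"],
    "source": ["mimiciii", "vital", "vital", "vital"],
})

def select_indices(metadata, subset, source="all"):
    """Return the index labels of the rows matching subset and source."""
    mask = metadata["subset"] == subset
    if source != "all":
        mask &= metadata["source"] == source
    return metadata.index[mask]

idx = select_indices(metadata, "train", "vital")
```

An index like this can be used to read only the matching rows from a memory-mapped array such as one opened with np.load(..., mmap_mode="r"), so the full signal file never has to be loaded into memory.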

qumphy.data.pulsedb.get_target_stats(data_directory, dataset, source, dtype=numpy.float16)[source]

Reads the target statistics of the PulseDB training datasets from a YAML file.

Parameters:

data_directory (string) – Full path of the directory containing the data.
dataset (string) – One of the “calibfree”, “calib”, “aami”, or “mini” datasets.
source (string) – One of “mimiciii”, “vital”, or “all” (both sources).
dtype (np.dtype) – The data type of the target statistics.

Returns:

The target statistics as a dictionary.

Return type:

dict

qumphy.data.pulsedb.write_target_stats_yaml(data_directory, verbose=True)[source]

Writes the target statistics of the PulseDB training datasets to a YAML file.

Parameters: data_directory (string) – The directory where the data is located.
Returns: None
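
The round trip between writing stats.yaml and reading it back with a chosen dtype might look like the sketch below. It assumes PyYAML is installed and uses a hypothetical nesting of dataset, then source, then statistic name. The real file layout and the helpers write_stats and read_stats are not taken from the qumphy source.

```python
import tempfile
from pathlib import Path

import numpy as np
import yaml  # PyYAML

# Hypothetical layout: stats[dataset][source][name] -> [SBP, DBP]
stats = {"calib": {"all": {"BP_mean": [120.0, 70.0], "BP_std": [20.0, 10.0]}}}

def write_stats(data_directory, stats):
    """Write the statistics to stats.yaml inside data_directory."""
    path = Path(data_directory) / "stats.yaml"
    with path.open("w") as fh:
        yaml.safe_dump(stats, fh)
    return path

def read_stats(data_directory, dataset, source, dtype=np.float16):
    """Read the statistics of one dataset/source pair as arrays of dtype."""
    with (Path(data_directory) / "stats.yaml").open() as fh:
        all_stats = yaml.safe_load(fh)
    return {name: np.asarray(values, dtype=dtype)
            for name, values in all_stats[dataset][source].items()}

with tempfile.TemporaryDirectory() as tmp:
    write_stats(tmp, stats)
    loaded = read_stats(tmp, "calib", "all")
```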