asteroid.data.wham_dataset module

asteroid.data.wham_dataset.normalize_tensor_wav(wav_tensor, eps=1e-08, std=None)
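
The signature alone does not say what the function computes. A minimal sketch, assuming it subtracts the mean along the time axis and divides by a standard deviation (the waveform's own when std is None, otherwise the value passed in), might look like the following. It works on plain Python lists rather than tensors, and normalize_wav is a hypothetical stand-in, not the library function.

```python
import statistics


def normalize_wav(wav, eps=1e-8, std=None):
    """Hypothetical list-based stand-in for normalize_tensor_wav (assumed behaviour)."""
    mean = statistics.fmean(wav)
    if std is None:
        # Assumption: fall back to the waveform's own sample standard deviation.
        std = statistics.stdev(wav)
    # eps keeps the division finite for silent (all-zero) segments.
    return [(x - mean) / (std + eps) for x in wav]


mixture = [0.0, 2.0, 4.0, 6.0]
source = [0.0, 1.0, 2.0, 3.0]

# Normalize the mixture with its own statistics, then reuse its standard
# deviation for the source so both stay on the same scale.
mix_std = statistics.stdev(mixture)
norm_mixture = normalize_wav(mixture)
norm_source = normalize_wav(source, std=mix_std)
```

Passing the mixture's standard deviation through std is what would let sources and mixture share one scale, which matches the normalize_audio option of WhamDataset.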

class asteroid.data.wham_dataset.WhamDataset(json_dir, task, sample_rate=8000, segment=4.0, nondefault_nsrc=None, normalize_audio=False)

Bases: torch.utils.data.Dataset

Dataset class for WHAM source separation and speech enhancement tasks.

Parameters:

json_dir (str) – The path to the directory containing the json files.
task (str) – One of 'enh_single', 'enh_both', 'sep_clean' or 'sep_noisy'.
- 'enh_single' for single-speaker speech enhancement.
- 'enh_both' for multi-speaker speech enhancement.
- 'sep_clean' for two-speaker clean source separation.
- 'sep_noisy' for two-speaker noisy source separation.
sample_rate (int, optional) – The sampling rate of the wav files, in Hz.
segment (float, optional) – Length of the segments used for training, in seconds. If None, use full utterances (e.g. for test).
nondefault_nsrc (int, optional) – Number of sources in the training targets. If None, defaults to one for enhancement tasks and two for separation tasks.
normalize_audio (bool, optional) – If True, both the sources and the mixture are normalized by the standard deviation of the mixture.

References: “WHAM!: Extending Speech Separation to Noisy Environments”, Wichern et al. 2019
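
How task, nondefault_nsrc, segment and sample_rate combine can be sketched as follows. This is an illustration under stated assumptions, not the library's code: DEFAULT_NSRC, resolve_nsrc and segment_length are hypothetical names, and truncating the segment length to a whole number of samples is assumed.

```python
# Default number of target sources per task, as described in the parameter list:
# one for enhancement tasks, two for separation tasks.
DEFAULT_NSRC = {
    "enh_single": 1,
    "enh_both": 1,
    "sep_clean": 2,
    "sep_noisy": 2,
}


def resolve_nsrc(task, nondefault_nsrc=None):
    """Hypothetical helper: pick the number of sources in the training targets."""
    if task not in DEFAULT_NSRC:
        raise ValueError(f"Unknown task {task!r}, expected one of {sorted(DEFAULT_NSRC)}")
    if nondefault_nsrc is None:
        return DEFAULT_NSRC[task]
    return nondefault_nsrc


def segment_length(segment=4.0, sample_rate=8000):
    """Hypothetical helper: segment length in samples, or None for full utterances."""
    if segment is None:
        return None
    # Assumption: the length is truncated to a whole number of samples.
    return int(segment * sample_rate)
```

With the default arguments, a 4.0 s segment at 8000 Hz corresponds to 32000 samples per training example.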

dataset_name = 'WHAM'

__getitem__(idx)

Gets a mixture/sources pair.

Returns:	mixture, vstack([source_arrays])
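
To make the return contract concrete, the sketch below uses a hypothetical stand-in class, FakeWham, that returns a mixture together with its stacked sources, as __getitem__ is documented to do. Plain lists replace tensors, and building the mixture as the sample-wise sum of the sources is a simplification for illustration. With a real WhamDataset, the stacked sources would presumably have shape (n_src, time).

```python
class FakeWham:
    """Hypothetical stand-in that mimics the documented return value."""

    def __init__(self, n_src=2, seg_len=8):
        self.n_src = n_src
        self.seg_len = seg_len

    def __len__(self):
        return 3

    def __getitem__(self, idx):
        # Each source is a constant signal: 1.0 for the first, 2.0 for the second, ...
        sources = [[float(s + 1)] * self.seg_len for s in range(self.n_src)]
        # Simplification: the mixture is the sample-wise sum of the sources.
        mixture = [sum(samples) for samples in zip(*sources)]
        return mixture, sources


dataset = FakeWham()
mixture, sources = dataset[0]

# One row per source, each as long as the mixture.
shape = (len(sources), len(sources[0]))
```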

get_infos()

Get dataset infos (for publishing models).

Returns:	dict, dataset infos with keys dataset, task and licences.