asteroid.losses.pmsqe module
class asteroid.losses.pmsqe.SingleSrcPMSQE(window_name='sqrt_hann', window_weight=1.0, bark_eq=True, gain_eq=True, sample_rate=16000)
Bases: sphinx.ext.autodoc.importer._MockObject
Computes the Perceptual Metric for Speech Quality Evaluation (PMSQE) as described in [1]. This version is designed only for 16 kHz input (512-point DFT). It could be adapted to 8 kHz by changing the parameters of the class (see the TensorFlow implementation). The SLL, frequency equalization, and gain equalization are applied to each sequence independently.
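As a quick check of the figures above, the sketch below works out the frame duration and the number of one-sided frequency bins implied by a 512-point DFT at 16 kHz. It uses only the standard library. The helper names `frame_duration_ms` and `n_onesided_bins` are hypothetical and not part of asteroid, and the 256-point DFT used for the 8 kHz case is an assumption chosen to keep the same frame duration, not a value taken from the class.

```python
def frame_duration_ms(n_fft: int, sample_rate: int) -> float:
    """Duration in milliseconds of one analysis frame of n_fft samples."""
    return 1000.0 * n_fft / sample_rate


def n_onesided_bins(n_fft: int) -> int:
    """Number of non-redundant frequency bins of a real-input DFT."""
    return n_fft // 2 + 1


# 16 kHz configuration described in the text: 512-point DFT.
dur_16k = frame_duration_ms(512, 16000)
bins_16k = n_onesided_bins(512)

# Assumed 8 kHz configuration (illustrative only): 256-point DFT.
dur_8k = frame_duration_ms(256, 8000)

print(dur_16k, bins_16k, dur_8k)
```

Under these assumptions both configurations cover the same time span per frame, which is why halving the DFT length is a natural starting point for an 8 kHz adaptation.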
Parameters:
- window_name (str) – Name of the window function used, which selects the correction factor to apply. One of [‘rect’, ‘hann’, ‘sqrt_hann’, ‘hamming’, ‘flatTop’]. Defaults to the square-root Hann window (‘sqrt_hann’).
- window_weight (float, optional) – Correction applied to the window factor.
- bark_eq (bool, optional) – Whether to apply bark equalization.
- gain_eq (bool, optional) – Whether to apply gain equalization.
- sample_rate (int) – Sample rate of the input audio.
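The parameter list does not say how the window factor is defined. One common convention, assumed here purely for illustration, is a power-compensation factor equal to 1 / mean(w²), which undoes the energy loss caused by tapering. The sketch below computes that quantity for four of the listed window names using periodic window definitions. The helpers `make_window` and `power_correction` are hypothetical, not asteroid functions, and ‘flatTop’ is left out because its coefficients differ between definitions.

```python
import math


def make_window(name: str, n: int) -> list:
    """Periodic window of length n (hypothetical helper, not asteroid API)."""
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]
    if name == "rect":
        return [1.0] * n
    if name == "hann":
        return hann
    if name == "sqrt_hann":
        # max() guards against tiny negative values from rounding.
        return [math.sqrt(max(v, 0.0)) for v in hann]
    if name == "hamming":
        return [0.54 - 0.46 * math.cos(2 * math.pi * k / n) for k in range(n)]
    raise ValueError(f"window not covered by this sketch: {name}")


def power_correction(name: str, n: int = 512) -> float:
    """Assumed convention: 1 / mean(w**2) over one window length."""
    w = make_window(name, n)
    return n / sum(v * v for v in w)


for name in ("rect", "hann", "sqrt_hann", "hamming"):
    print(name, round(power_correction(name), 4))
```

If this convention matches the one used by the class, window_weight would act as an additional multiplicative adjustment on top of the per-window value; that relationship is also an assumption.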
References
[1] J.M. Martin, A.M. Gomez, J.A. Gonzalez, A.M. Peinado, ‘A Deep Learning Loss Function based on the Perceptual Evaluation of the Speech Quality’, IEEE Signal Processing Letters, 2018.
Implemented by Juan M. Martin. Contact: mdjuamart@ugr.es
Copyright 2019: University of Granada, Signal Processing, Multimedia Transmission and Speech/Audio Technologies (SigMAT) Group.
Note
Inspired by the Perceptual Evaluation of Speech Quality (PESQ) algorithm, this function consists of two regularization factors: the symmetrical and asymmetrical distortion in the loudness domain.
Examples
>>> import torch
>>> from asteroid.filterbanks import STFTFB, Encoder, transforms
>>> from asteroid.losses import PITLossWrapper, SingleSrcPMSQE
>>> stft = Encoder(STFTFB(kernel_size=512, n_filters=512, stride=256))
>>> # Usage by itself
>>> ref, est = torch.randn(2, 1, 16000), torch.randn(2, 1, 16000)
>>> ref_spec = transforms.take_mag(stft(ref))
>>> est_spec = transforms.take_mag(stft(est))
>>> loss_func = SingleSrcPMSQE()
>>> loss_value = loss_func(est_spec, ref_spec)
>>> # Usage with PITLossWrapper
>>> loss_func = PITLossWrapper(SingleSrcPMSQE(), pit_from='pw_pt')
>>> ref, est = torch.randn(2, 3, 16000), torch.randn(2, 3, 16000)
>>> ref_spec = transforms.take_mag(stft(ref))
>>> est_spec = transforms.take_mag(stft(est))
>>> # Estimates come first, as in the standalone call above
>>> loss_value = loss_func(est_spec, ref_spec)
bark_freq_equalization(ref_bark_spectra, deg_bark_spectra)
This version applies the equalization directly to the degraded signal.
forward(est_targets, targets, pad_mask=None)
Args
- est_targets (torch.Tensor): Dimensions (B, T, F). Padded degraded power spectrum in the time-frequency domain.
- targets (torch.Tensor): Dimensions (B, T, F). Zero-padded reference power spectrum in the time-frequency domain.
- pad_mask (torch.Tensor, optional): Dimensions (B, T, 1). Mask indicating the padding frames. Defaults to all ones.
Dimensions
- B: Number of sequences in the batch.
- T: Number of time frames.
- F: Number of frequency bins.
Returns
- torch.Tensor of shape (B, ), wD + 0.309 * wDA
Notes
- Dimensions (B, F, T) are also supported by SingleSrcPMSQE but are less efficient because the input tensors are transposed (not in place).
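To make the returned expression wD + 0.309 * wDA and the role of pad_mask concrete, here is a minimal pure-Python sketch for a single sequence. It assumes that wD and wDA are obtained by averaging per-frame symmetric and asymmetric disturbances over the frames where the mask is 1. That averaging step, the helper names `masked_mean` and `combine_disturbances`, and the sample values are all assumptions for illustration, not the library's implementation.

```python
def masked_mean(values, mask):
    """Average per-frame values over the frames where mask == 1."""
    kept = sum(mask)
    if kept == 0:
        raise ValueError("mask selects no frames")
    return sum(v * m for v, m in zip(values, mask)) / kept


def combine_disturbances(sym_frames, asym_frames, mask, asym_weight=0.309):
    """Return wD + asym_weight * wDA for one sequence (assumed averaging)."""
    w_d = masked_mean(sym_frames, mask)
    w_da = masked_mean(asym_frames, mask)
    return w_d + asym_weight * w_da


# Made-up per-frame disturbances; the last frame is padding.
sym = [1.0, 2.0, 3.0, 0.0]
asym = [2.0, 2.0, 2.0, 0.0]
mask = [1, 1, 1, 0]

score = combine_disturbances(sym, asym, mask)
print(score)
```

In this sketch the padded frame contributes to neither the numerator nor the frame count, which is the behaviour a padding mask is normally meant to provide.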
Examples