asteroid.losses.pmsqe module

class asteroid.losses.pmsqe.SingleSrcPMSQE(window_name='sqrt_hann', window_weight=1.0, bark_eq=True, gain_eq=True, sample_rate=16000)[source]

Bases: torch.nn.Module

Computes the Perceptual Metric for Speech Quality Evaluation (PMSQE) as described in [1]. This version is designed for 16 kHz only (512-length DFT). Adaptation to 8 kHz can be done by changing the parameters of the class (see the TensorFlow implementation and the sketch after the parameter list below). The SLL, frequency and gain equalization are applied to each sequence independently.

Parameters:
  • window_name (str) – Name of the analysis window, used to apply the correct power correction factor. One of ['rect', 'hann', 'sqrt_hann', 'hamming', 'flatTop']. Defaults to 'sqrt_hann'.
  • window_weight (float, optional) – Correction applied to the window factor. Defaults to 1.0.
  • bark_eq (bool, optional) – Whether to apply bark equalization.
  • gain_eq (bool, optional) – Whether to apply gain equalization.
  • sample_rate (int) – Sample rate of the input audio.
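
A minimal sketch of the 8 kHz adaptation mentioned above, assuming that passing sample_rate=8000 is enough to load the 8 kHz constants (cf. populate_constants() and register_8k_constants() below); the 256-point DFT settings (kernel_size=256, n_filters=256, stride=128) are an illustrative assumption, not values prescribed by the library:

>>> import torch
>>> from asteroid.filterbanks import STFTFB, Encoder, transforms
>>> from asteroid.losses import SingleSrcPMSQE
>>> stft_8k = Encoder(STFTFB(kernel_size=256, n_filters=256, stride=128))
>>> loss_func_8k = SingleSrcPMSQE(sample_rate=8000)
>>> ref, est = torch.randn(2, 1, 8000), torch.randn(2, 1, 8000)
>>> ref_spec = transforms.take_mag(stft_8k(ref))
>>> est_spec = transforms.take_mag(stft_8k(est))
>>> loss_value = loss_func_8k(est_spec, ref_spec)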

References

[1] J. M. Martin, A. M. Gomez, J. A. Gonzalez, A. M. Peinado, "A Deep Learning Loss Function based on the Perceptual Evaluation of the Speech Quality", IEEE Signal Processing Letters, 2018.

Implemented by Juan M. Martin. Contact: mdjuamart@ugr.es. Copyright 2019: University of Granada, Signal Processing, Multimedia Transmission and Speech/Audio Technologies (SigMAT) Group.

Note

Inspired by the Perceptual Evaluation of the Speech Quality (PESQ) algorithm, this loss function consists of two regularization factors: the symmetric and asymmetric distortions in the loudness domain.
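
Concretely, writing wD for the weighted symmetric distortion and wDA for the weighted asymmetric distortion accumulated over the frames of a sequence (the same notation as in the return value of forward() below), the per-sequence loss is

    wD + 0.309 * wDA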

Examples

>>> import torch
>>> from asteroid.filterbanks import STFTFB, Encoder, transforms
>>> from asteroid.losses import PITLossWrapper, SingleSrcPMSQE
>>> stft = Encoder(STFTFB(kernel_size=512, n_filters=512, stride=256))
>>> # Usage by itself
>>> ref, est = torch.randn(2, 1, 16000), torch.randn(2, 1, 16000)
>>> ref_spec = transforms.take_mag(stft(ref))
>>> est_spec = transforms.take_mag(stft(est))
>>> loss_func = SingleSrcPMSQE()
>>> loss_value = loss_func(est_spec, ref_spec)
>>> # Usage with PITLossWrapper
>>> loss_func = PITLossWrapper(SingleSrcPMSQE(), pit_from='pw_pt')
>>> ref, est = torch.randn(2, 3, 16000), torch.randn(2, 3, 16000)
>>> ref_spec = transforms.take_mag(stft(ref))
>>> est_spec = transforms.take_mag(stft(est))
>>> loss_value = loss_func(est_spec, ref_spec)
bark_computation(spectra)[source]
bark_freq_equalization(ref_bark_spectra, deg_bark_spectra)[source]

This version applies the equalization directly to the degraded Bark spectra.

bark_gain_equalization(ref_bark_spectra, deg_bark_spectra)[source]
compute_audible_power(bark_spectra, factor=1.0)[source]
compute_distortion_tensors(ref_bark_spec, deg_bark_spec)[source]
forward(est_targets, targets, pad_mask=None)[source]

Parameters:
  • est_targets (torch.Tensor) – Dimensions (B, T, F). Padded degraded power spectrum in time-frequency domain.
  • targets (torch.Tensor) – Dimensions (B, T, F). Zero-padded reference power spectrum in time-frequency domain.
  • pad_mask (torch.Tensor, optional) – Dimensions (B, T, 1). Mask indicating the padding frames. Defaults to all ones.

Dimensions: B is the number of sequences in the batch, T the number of time frames and F the number of frequency bins.

Returns:
torch.Tensor of shape (B,), wD + 0.309 * wDA

Note

Dimensions (B, F, T) are also supported by SingleSrcPMSQE but are less efficient because the input tensors are transposed (not in-place).

Examples
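
A minimal sketch of calling forward() with an explicit pad_mask, assuming (B, T, F) inputs with F = 257 bins (512-length DFT at 16 kHz, as described above); the random power spectra and the padding pattern are purely illustrative:

>>> import torch
>>> from asteroid.losses import SingleSrcPMSQE
>>> loss_func = SingleSrcPMSQE()
>>> est_spec = torch.rand(2, 100, 257)  # (B, T, F) degraded power spectrum
>>> ref_spec = torch.rand(2, 100, 257)  # (B, T, F) reference power spectrum
>>> pad_mask = torch.ones(2, 100, 1)
>>> pad_mask[:, 80:, :] = 0.  # mark the last 20 frames as padding (illustrative)
>>> loss_value = loss_func(est_spec, ref_spec, pad_mask=pad_mask)  # shape (B,)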

static get_correction_factor(window_name)[source]

Returns the power correction factor depending on the window.
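
An illustrative call (the window name must be one of those listed in the class parameters above):

>>> factor = SingleSrcPMSQE.get_correction_factor('sqrt_hann')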

loudness_computation(bark_spectra)[source]
magnitude_at_sll(spectra, pad_mask)[source]
per_frame_distortion(sym_d, asym_d, total_power_ref)[source]
populate_constants(sample_rate)[source]
register_16k_constants()[source]
register_8k_constants()[source]