asteroid.filterbanks.griffin_lim module¶

asteroid.filterbanks.griffin_lim.griffin_lim(mag_specgram, stft_enc, angles=None, istft_dec=None, n_iter=6, momentum=0.9)[source]¶

Estimates matching phase from magnitude spectogram using the ‘fast’ Griffin Lim algorithm [1].

Parameters:

mag_specgram (torch.Tensor) – (any, dim, ension, freq, frames) as returned by Encoder(STFTFB), the magnitude spectrogram to be inverted.
stft_enc (Encoder[STFTFB]) – The Encoder(STFTFB()) object that was used to compute the input mag_spec.
angles (None or Tensor) – Angles to use to initialize the algorithm. If None (default), angles are init with uniform ditribution.
istft_dec (None or Decoder[STFTFB]) – Optional Decoder to use to get back to the time domain. If None (default), a perfect reconstruction Decoder is built from stft_enc.
n_iter (int) – Number of griffin-lim iterations to run.
momentum (float) – The momentum of fast Griffin-Lim. Original Griffin-Lim is obtained for momentum=0.

Returns:

torch.Tensor – estimated waveforms of shape (any, dim, ension, time).

Examples

>>> stft = Encoder(STFTFB(n_filters=256, kernel_size=256, stride=128))
>>> wav = torch.randn(2, 1, 8000)
>>> spec = stft(wav)
>>> masked_spec = spec * torch.sigmoid(torch.randn_like(spec))
>>> mag = transforms.take_mag(masked_spec, -2)
>>> est_wav = griffin_lim(mag, stft, n_iter=32)

References

[1] Perraudin et al. “A fast Griffin-Lim algorithm,” WASPAA 2013. [2] D. W. Griffin and J. S. Lim: “Signal estimation from modified short-time Fourier transform,” ASSP 1984.

asteroid.filterbanks.griffin_lim.misi(mixture_wav, mag_specgrams, stft_enc, angles=None, istft_dec=None, n_iter=6, momentum=0.0, src_weights=None, dim=1)[source]¶

Jointly estimates matching phase from magnitude spectograms using the Multiple Input Spectrogram Inversion (MISI) algorithm [1].

Parameters:

mixture_wav (torch.Tensor) – (batch, time)
mag_specgrams (torch.Tensor) – (batch, n_src, freq, frames) as returned by Encoder(STFTFB), the magnitude spectrograms to be jointly inverted using MISI (modified or not).
stft_enc (Encoder[STFTFB]) – The Encoder(STFTFB()) object that was used to compute the input mag_spec.
angles (None or Tensor) – Angles to use to initialize the algorithm. If None (default), angles are init with uniform ditribution.
istft_dec (None or Decoder[STFTFB]) – Optional Decoder to use to get back to the time domain. If None (default), a perfect reconstruction Decoder is built from stft_enc.
n_iter (int) – Number of MISI iterations to run.
momentum (float) – Momentum on updates (this argument comes from GriffinLim). Defaults to 0 as it was never proposed anywhere.
src_weights (None or torch.Tensor) – Consistency weight for each source. Shape needs to be broadcastable to istft_dec(mag_specgrams). We make sure that the weights sum up to 1 along dim dim. If src_weights is None, compute them based on relative power.
dim (int) – Axis which contains the sources in mag_specgrams. Used for consistency constraint.

Returns:

torch.Tensor – estimated waveforms of shape (batch, n_src, time).

Examples

>>> stft = Encoder(STFTFB(n_filters=256, kernel_size=256, stride=128))
>>> wav = torch.randn(2, 3, 8000)
>>> specs = stft(wav)
>>> masked_specs = specs * torch.sigmoid(torch.randn_like(specs))
>>> mag = transforms.take_mag(masked_specs, -2)
>>> est_wav = misi(wav.sum(1), mag, stft, n_iter=32)

References

[1] Gunawan and Sen, “Iterative Phase Estimation for the Synthesis of Separated Sources From Single-Channel Mixtures,” in IEEE Signal Processing Letters, 2010. [2] Wang, LeRoux et al. “End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction.” Interspeech 2018 (2018)