asteroid.filterbanks.griffin_lim module
asteroid.filterbanks.griffin_lim.griffin_lim(mag_specgram, stft_enc, angles=None, istft_dec=None, n_iter=6, momentum=0.9)
Estimates a matching phase from a magnitude spectrogram using the "fast" Griffin-Lim algorithm [1].
Parameters: - mag_specgram (torch.Tensor) – (any, dim, ension, freq, frames) as returned by Encoder(STFTFB), the magnitude spectrogram to be inverted.
- stft_enc (Encoder[STFTFB]) – The Encoder(STFTFB()) object that was used to compute the input mag_specgram.
- angles (None or Tensor) – Angles used to initialize the algorithm. If None (default), the angles are initialized from a uniform distribution.
- istft_dec (None or Decoder[STFTFB]) – Optional Decoder to use to get back to the time domain. If None (default), a perfect reconstruction Decoder is built from stft_enc.
- n_iter (int) – Number of Griffin-Lim iterations to run.
- momentum (float) – The momentum of fast Griffin-Lim. Setting momentum=0 recovers the original Griffin-Lim algorithm [2].
Returns: torch.Tensor – estimated waveforms of shape (any, dim, ension, time).
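To make the roles of n_iter and momentum concrete, here is a minimal pure-Python sketch of a fast Griffin-Lim loop on a toy signal. It is an illustration under stated assumptions, not asteroid's implementation: the tiny stft/istft helpers, the names from_magphase and fast_griffin_lim, the seed argument, the frame sizes and the momentum / (1 + momentum) weighting (a form found in some open-source implementations) are all choices made for this example.

```python
import cmath
import math
import random

N_FFT, HOP = 8, 4  # toy frame length and hop size (assumed for this sketch)
# Periodic Hann analysis window.
WINDOW = [0.5 - 0.5 * math.cos(2 * math.pi * n / N_FFT) for n in range(N_FFT)]


def stft(wav):
    """Windowed DFT of each frame; returns frames of N_FFT complex bins."""
    spec = []
    for start in range(0, len(wav) - N_FFT + 1, HOP):
        seg = [wav[start + n] * WINDOW[n] for n in range(N_FFT)]
        spec.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / N_FFT)
                for n in range(N_FFT))
            for k in range(N_FFT)
        ])
    return spec


def istft(spec, length):
    """Least-squares inverse: windowed overlap-add over summed squared window."""
    out, norm = [0.0] * length, [0.0] * length
    for i, frame in enumerate(spec):
        for n in range(N_FFT):
            sample = sum(frame[k] * cmath.exp(2j * math.pi * k * n / N_FFT)
                         for k in range(N_FFT)) / N_FFT
            out[i * HOP + n] += sample.real * WINDOW[n]
            norm[i * HOP + n] += WINDOW[n] ** 2
    # Samples that no window covers carry no information; set them to zero.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]


def from_magphase(mag, angles):
    """Combine magnitudes and phase angles into complex bins."""
    return [[m * cmath.exp(1j * a) for m, a in zip(mf, af)]
            for mf, af in zip(mag, angles)]


def fast_griffin_lim(mag, length, n_iter=6, momentum=0.9, seed=0):
    """Hypothetical helper: estimate a waveform whose STFT magnitude matches mag."""
    rng = random.Random(seed)
    # Uniformly distributed initial phases, as when angles=None.
    angles = [[rng.uniform(-math.pi, math.pi) for _ in frame] for frame in mag]
    prev = [[0j] * N_FFT for _ in mag]
    for _ in range(n_iter):
        # Project onto consistent spectrograms: go to time domain and back.
        rebuilt = stft(istft(from_magphase(mag, angles), length))
        # Momentum step (assumed form); momentum=0 keeps only the new phase.
        coef = momentum / (1 + momentum)
        angles = [[cmath.phase(r - coef * p) for r, p in zip(rf, pf)]
                  for rf, pf in zip(rebuilt, prev)]
        prev = rebuilt
    return istft(from_magphase(mag, angles), length)


wav = [math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t) for t in range(64)]
mag = [[abs(c) for c in frame] for frame in stft(wav)]
est_wav = fast_griffin_lim(mag, length=len(wav), n_iter=32)
```

In this sketch each iteration costs one inverse and one forward STFT, so n_iter trades run time against how closely the magnitude of the estimate's STFT matches the target.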
Examples
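>>> import torch
>>> from asteroid.filterbanks import Encoder, STFTFB, transforms
>>> from asteroid.filterbanks.griffin_lim import griffin_lim
>>> stft = Encoder(STFTFB(n_filters=256, kernel_size=256, stride=128))
>>> wav = torch.randn(2, 1, 8000)
>>> spec = stft(wav)
>>> masked_spec = spec * torch.sigmoid(torch.randn_like(spec))
>>> mag = transforms.take_mag(masked_spec, -2)
>>> est_wav = griffin_lim(mag, stft, n_iter=32)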
References
[1] Perraudin et al., "A fast Griffin-Lim algorithm," WASPAA 2013.
[2] D. W. Griffin and J. S. Lim, "Signal estimation from modified short-time Fourier transform," ASSP 1984.
asteroid.filterbanks.griffin_lim.misi(mixture_wav, mag_specgrams, stft_enc, angles=None, istft_dec=None, n_iter=6, momentum=0.0, src_weights=None, dim=1)
Jointly estimates matching phases from magnitude spectrograms using the Multiple Input Spectrogram Inversion (MISI) algorithm [1].
Parameters: - mixture_wav (torch.Tensor) – (batch, time), the mixture waveform.
- mag_specgrams (torch.Tensor) – (batch, n_src, freq, frames) as returned by Encoder(STFTFB), the magnitude spectrograms to be jointly inverted using MISI (modified or not).
- stft_enc (Encoder[STFTFB]) – The Encoder(STFTFB()) object that was used to compute the input mag_specgrams.
- angles (None or Tensor) – Angles used to initialize the algorithm. If None (default), the angles are initialized from a uniform distribution.
- istft_dec (None or Decoder[STFTFB]) – Optional Decoder to use to get back to the time domain. If None (default), a perfect reconstruction Decoder is built from stft_enc.
- n_iter (int) – Number of MISI iterations to run.
- momentum (float) – Momentum applied to the updates (this argument is borrowed from Griffin-Lim). Defaults to 0 because momentum has not been proposed for MISI in the literature.
- src_weights (None or torch.Tensor) – Consistency weight for each source. The shape must be broadcastable to istft_dec(mag_specgrams). The weights are made to sum to 1 along axis dim. If src_weights is None, they are computed from relative power.
- dim (int) – Axis which contains the sources in mag_specgrams. Used for consistency constraint.
Returns: torch.Tensor – estimated waveforms of shape (batch, n_src, time).
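The consistency constraint controlled by src_weights can be sketched in the time domain as follows. This is a simplified illustration, not asteroid's code: the helper names are hypothetical, waveforms are plain Python lists, each source gets a single scalar weight, and "relative power" is assumed here to mean the power of each current waveform estimate divided by the total.

```python
def normalize_weights(weights):
    """Scale the weights so that they sum to 1 across sources."""
    total = sum(weights)
    if total == 0:
        # Assumed fallback for this sketch: share the residual equally.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def relative_power_weights(est_wavs):
    """Weight each source by its share of the total power (assumption)."""
    powers = [sum(s * s for s in wav) for wav in est_wavs]
    return normalize_weights(powers)


def enforce_mixture_consistency(mixture, est_wavs, src_weights=None):
    """Distribute the mixture residual over the source estimates."""
    if src_weights is None:
        src_weights = relative_power_weights(est_wavs)
    else:
        src_weights = normalize_weights(src_weights)
    # Residual between the mixture and the sum of the current estimates.
    residual = [m - sum(col) for m, col in zip(mixture, zip(*est_wavs))]
    # Each source receives its weighted share of the residual.
    return [[s + w * r for s, r in zip(wav, residual)]
            for wav, w in zip(est_wavs, src_weights)]


mixture = [1.0, 2.0, 3.0, 4.0]
est_wavs = [[0.5, 1.0, 1.0, 2.0], [0.0, 0.5, 1.0, 1.0]]
consistent = enforce_mixture_consistency(mixture, est_wavs)
```

Because the weights sum to 1, the corrected sources add up to the mixture exactly. In a MISI-style loop, a step of this kind would sit between the inverse STFT of the current estimates and the forward STFT that supplies the next phase estimates.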
Examples
>>> import torch
>>> from asteroid.filterbanks import Encoder, STFTFB, transforms
>>> from asteroid.filterbanks.griffin_lim import misi
>>> stft = Encoder(STFTFB(n_filters=256, kernel_size=256, stride=128))
>>> wav = torch.randn(2, 3, 8000)
>>> specs = stft(wav)
>>> masked_specs = specs * torch.sigmoid(torch.randn_like(specs))
>>> mag = transforms.take_mag(masked_specs, -2)
>>> est_wav = misi(wav.sum(1), mag, stft, n_iter=32)
References
[1] Gunawan and Sen, "Iterative Phase Estimation for the Synthesis of Separated Sources From Single-Channel Mixtures," IEEE Signal Processing Letters, 2010.
[2] Wang, Le Roux et al., "End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction," Interspeech 2018.