asteroid.filterbanks package

class asteroid.filterbanks.Filterbank(n_filters, kernel_size, stride=None)
Bases: sphinx.ext.autodoc.importer._MockObject
Base Filterbank class. Each subclass has to implement a filters property.
Variables:
- n_feats_out (int) – Number of output filters.

filters
Abstract method for filters.

class asteroid.filterbanks.Encoder(filterbank, is_pinv=False, as_conv1d=True, padding=0)
Bases: asteroid.filterbanks.enc_dec._EncDec
Encoder class.
Adds encoding methods to Filterbank classes. Not intended to be subclassed.
Parameters:
- filterbank (Filterbank) – The filterbank to use as an encoder.
- is_pinv (bool) – Whether to be the pseudo-inverse of filterbank.
- as_conv1d (bool) – Whether to behave like nn.Conv1d. If True (default), forwarding an input of shape (batch, 1, time) outputs a tensor of shape (batch, freq, conv_time). If False, it outputs a tensor of shape (batch, 1, freq, conv_time).
- padding (int) – Zero-padding added to both sides of the input.

forward(waveform)
Convolves the input waveform with the filters from a filterbank.
Parameters:
- waveform – Any tensor with samples along the last dimension: the waveform representation, with any batch/channel etc. dimensions.
Returns: torch.Tensor – The corresponding TF domain signal.
Shapes:
>>> (time, ) --> (freq, conv_time)
>>> (batch, time) --> (batch, freq, conv_time)  # Avoid
>>> if as_conv1d:
>>>     (batch, 1, time) --> (batch, freq, conv_time)
>>>     (batch, chan, time) --> (batch, chan, freq, conv_time)
>>> else:
>>>     (batch, chan, time) --> (batch, chan, freq, conv_time)
>>> (batch, any, dim, time) --> (batch, any, dim, freq, conv_time)
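The shape rules listed for Encoder.forward can be checked with a small dependency-free sketch. The helper names below (conv_time, encoder_output_shape) are hypothetical and not part of asteroid. The frame count uses the standard nn.Conv1d length formula with dilation 1, and the sketch assumes that the channel squeeze only applies to 3D inputs whose channel dimension is 1.

```python
def conv_time(time, kernel_size, stride, padding=0):
    """Number of frames produced by a 1D convolution (dilation 1)."""
    padded = time + 2 * padding
    if padded < kernel_size:
        raise ValueError("input is shorter than one filter")
    return (padded - kernel_size) // stride + 1


def encoder_output_shape(input_shape, n_feats_out, kernel_size, stride,
                         padding=0, as_conv1d=True):
    """Hypothetical helper: expected output shape for a given input shape."""
    *lead, time = input_shape
    frames = conv_time(time, kernel_size, stride, padding)
    # Assumption: with as_conv1d=True, only a (batch, 1, time) input
    # loses its singleton channel dimension.
    if as_conv1d and len(lead) == 2 and lead[1] == 1:
        lead = lead[:1]
    return (*lead, n_feats_out, frames)


# n_feats_out=512 is an arbitrary value chosen for illustration.
print(encoder_output_shape((2, 1, 8000), 512, 256, 128))                   # (2, 512, 61)
print(encoder_output_shape((2, 1, 8000), 512, 256, 128, as_conv1d=False))  # (2, 1, 512, 61)
print(encoder_output_shape((8000,), 512, 256, 128))                        # (512, 61)
```

The freq dimension is written as n_feats_out rather than n_filters because the two are documented separately and need not be equal for every filterbank.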

classmethod pinv_of(filterbank, **kwargs)
Returns an Encoder, pseudo-inverse of a Filterbank or Decoder.

class asteroid.filterbanks.Decoder(filterbank, is_pinv=False, padding=0, output_padding=0)
Bases: asteroid.filterbanks.enc_dec._EncDec
Decoder class.
Adds decoding methods to Filterbank classes. Not intended to be subclassed.
Parameters:
- filterbank (Filterbank) – The filterbank to use as a decoder.
- is_pinv (bool) – Whether to be the pseudo-inverse of filterbank.
- padding (int) – Zero-padding added to both sides of the input.
- output_padding (int) – Additional size added to one side of the output shape.
Notes
- The padding and output_padding arguments are passed directly to F.conv_transpose1d.

forward(spec)
Applies transposed convolution to a TF representation. This is equivalent to overlap-add.
Parameters:
- spec (torch.Tensor) – 3D or 4D Tensor. The TF representation (output of Encoder.forward()).
Returns: torch.Tensor – The corresponding time domain signal.
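Because padding and output_padding go straight to F.conv_transpose1d, the length of the decoded signal follows the usual transposed-convolution formula (dilation 1). The sketch below uses a hypothetical helper name, decoded_time, and shows how output_padding can recover samples that did not fill a whole hop at encoding time.

```python
def decoded_time(conv_time, kernel_size, stride, padding=0, output_padding=0):
    """Output length of a 1D transposed convolution (dilation 1)."""
    return (conv_time - 1) * stride - 2 * padding + kernel_size + output_padding


# An 8000-sample input with kernel_size=256 and stride=128 yields 61 frames.
frames = (8000 - 256) // 128 + 1
print(decoded_time(frames, 256, 128))  # 7936

# The 64 leftover samples can be restored through output_padding.
leftover = (8000 - 256) % 128
print(decoded_time(frames, 256, 128, output_padding=leftover))  # 8000
```

In PyTorch, output_padding has to be smaller than the stride (or the dilation), so this trick only covers a remainder of less than one hop.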

class asteroid.filterbanks.FreeFB(n_filters, kernel_size, stride=None, **kwargs)
Bases: asteroid.filterbanks.enc_dec.Filterbank
Free filterbank without any constraints. Equivalent to nn.Conv1d.
Variables:
- n_feats_out (int) – Number of output filters.
References
[1] “Filterbank design for end-to-end speech separation”. Submitted to ICASSP 2020. Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent.

filters
Abstract method for filters.

class asteroid.filterbanks.STFTFB(n_filters, kernel_size, stride=None, window=None, **kwargs)
Bases: asteroid.filterbanks.enc_dec.Filterbank
STFT filterbank.
Parameters:
- n_filters (int) – Number of filters. Determines the length of the STFT filters before windowing.
- kernel_size (int) – Length of the filters (i.e. the window).
- stride (int, optional) – Stride of the convolution (hop size). If None (default), set to kernel_size // 2.
- window (numpy.ndarray, optional) – If None, defaults to np.sqrt(np.hanning()).
Variables:
- n_feats_out (int) – Number of output filters.

filters
Abstract method for filters.
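The two STFTFB defaults (a square-root Hann window and a hop of half the kernel size) can be sketched without NumPy. The helper names are hypothetical. The window length is assumed to equal kernel_size, since the argument of np.hanning() is not given above, and the formula mirrors NumPy's symmetric Hann definition.

```python
import math


def sqrt_hann_window(length):
    """Square root of a symmetric Hann window, like np.sqrt(np.hanning(length))."""
    if length < 1:
        return []
    if length == 1:
        return [1.0]
    return [
        # max() guards against tiny negative values from rounding.
        math.sqrt(max(0.0, 0.5 - 0.5 * math.cos(2.0 * math.pi * n / (length - 1))))
        for n in range(length)
    ]


def default_stride(kernel_size, stride=None):
    """Hop size rule from the parameter list: kernel_size // 2 when stride is None."""
    return kernel_size // 2 if stride is None else stride


window = sqrt_hann_window(256)  # assumption: window length == kernel_size
hop = default_stride(256)       # 128
```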

class asteroid.filterbanks.AnalyticFreeFB(n_filters, kernel_size, stride=None, **kwargs)
Bases: asteroid.filterbanks.enc_dec.Filterbank
Free analytic (fully learned with analyticity constraints) filterbank. For more details, see [1].
Variables:
- n_feats_out (int) – Number of output filters.
References
[1] “Filterbank design for end-to-end speech separation”. Submitted to ICASSP 2020. Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent.

filters
Abstract method for filters.

class asteroid.filterbanks.ParamSincFB(n_filters, kernel_size, stride=None, sample_rate=16000, min_low_hz=50, min_band_hz=50)
Bases: asteroid.filterbanks.enc_dec.Filterbank
Extension of the parameterized filterbank from [1] proposed in [2]. Modified and extended from https://github.com/mravanelli/SincNet
Parameters:
- n_filters (int) – Number of filters. Half of n_filters (the real parts) will have parameters, the other half will correspond to the imaginary parts. n_filters should be even.
- kernel_size (int) – Length of the filters.
- stride (int, optional) – Stride of the convolution. If None (default), set to kernel_size // 2.
- sample_rate (int, optional) – The sample rate (used for initialization).
- min_low_hz (int, optional) – Lowest low frequency allowed (Hz).
- min_band_hz (int, optional) – Lowest band frequency allowed (Hz).
Variables:
- n_feats_out (int) – Number of output filters.
References
[1] “Speaker Recognition from raw waveform with SincNet”. SLT 2018. Mirco Ravanelli, Yoshua Bengio. https://arxiv.org/abs/1808.00158
[2] “Filterbank design for end-to-end speech separation”. Submitted to ICASSP 2020. Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent. https://arxiv.org/abs/1910.10400

filters
Compute filters from parameters.
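To illustrate what min_low_hz and min_band_hz control, here is a rough sketch of the band-edge constraint used in the SincNet reference code linked above. The function name and its scalar interface are hypothetical, and whether ParamSincFB applies exactly this rule is an assumption.

```python
def constrain_band_edges(low_param, band_param, sample_rate=16000,
                         min_low_hz=50, min_band_hz=50):
    """Map unconstrained parameters (in Hz) to valid band edges (low, high)."""
    # The lower cut-off can never fall below min_low_hz.
    low = min_low_hz + abs(low_param)
    # The band is at least min_band_hz wide and capped at the Nyquist frequency.
    high = low + min_band_hz + abs(band_param)
    high = min(max(high, min_low_hz), sample_rate / 2)
    return low, high


print(constrain_band_edges(-30.0, 100.0))  # (80.0, 230.0)
print(constrain_band_edges(0.0, 1e6))      # (50.0, 8000.0)
```

Taking absolute values keeps the learned parameters unconstrained while the resulting filters stay inside the valid frequency range.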

class asteroid.filterbanks.MultiphaseGammatoneFB(n_filters=128, kernel_size=16, sample_rate=8000, stride=None, **kwargs)
Bases: asteroid.filterbanks.enc_dec.Filterbank
Multi-Phase Gammatone Filterbank as described in [1]. Please cite [1] whenever using this. Original code repository: <https://github.com/sp-uhh/mp-gtf>
References
[1] David Ditter, Timo Gerkmann, “A Multi-Phase Gammatone Filterbank for Speech Separation via TasNet”, ICASSP 2020. Available: <https://ieeexplore.ieee.org/document/9053602/>

filters
Abstract method for filters.

asteroid.filterbanks.griffin_lim(mag_specgram, stft_enc, angles=None, istft_dec=None, n_iter=6, momentum=0.9)
Estimates matching phase from a magnitude spectrogram using the ‘fast’ Griffin-Lim algorithm [1].
Parameters:
- mag_specgram (torch.Tensor) – (any, dim, ension, freq, frames) as returned by Encoder(STFTFB), the magnitude spectrogram to be inverted.
- stft_enc (Encoder[STFTFB]) – The Encoder(STFTFB()) object that was used to compute the input mag_specgram.
- angles (None or Tensor) – Angles used to initialize the algorithm. If None (default), angles are initialized from a uniform distribution.
- istft_dec (None or Decoder[STFTFB]) – Optional Decoder to use to get back to the time domain. If None (default), a perfect reconstruction Decoder is built from stft_enc.
- n_iter (int) – Number of Griffin-Lim iterations to run.
- momentum (float) – The momentum of fast Griffin-Lim. The original Griffin-Lim algorithm [2] is obtained for momentum=0.
Returns: torch.Tensor – Estimated waveforms of shape (any, dim, ension, time).
Examples
>>> import torch
>>> from asteroid.filterbanks import Encoder, STFTFB, griffin_lim, transforms
>>> stft = Encoder(STFTFB(n_filters=256, kernel_size=256, stride=128))
>>> wav = torch.randn(2, 1, 8000)
>>> spec = stft(wav)
>>> masked_spec = spec * torch.sigmoid(torch.randn_like(spec))
>>> mag = transforms.take_mag(masked_spec, -2)
>>> est_wav = griffin_lim(mag, stft, n_iter=32)
References
[1] Perraudin et al. “A fast Griffin-Lim algorithm,” WASPAA 2013.
[2] D. W. Griffin and J. S. Lim: “Signal estimation from modified short-time Fourier transform,” ASSP 1984.
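For orientation, the fast Griffin-Lim iteration of [1] is usually written as below. The symbols are introduced here for illustration only: s stands for the target magnitude (mag_specgram), c_n for the complex spectrogram estimate at iteration n, and α for momentum. Whether the implementation follows this form term by term is an assumption.

```latex
\begin{aligned}
t_n &= P_{C_1}\bigl(P_{C_2}(c_{n-1})\bigr), \\
c_n &= t_n + \alpha\,(t_n - t_{n-1}), \\
P_{C_1}(c) &= \mathrm{STFT}\bigl(\mathrm{iSTFT}(c)\bigr), \qquad
P_{C_2}(c) = s \odot \frac{c}{|c|}
\end{aligned}
```

With α = 0 the second line reduces to c_n = t_n, which is the original Griffin-Lim iteration, consistent with the description of momentum above.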

asteroid.filterbanks.misi(mixture_wav, mag_specgrams, stft_enc, angles=None, istft_dec=None, n_iter=6, momentum=0.0, src_weights=None, dim=1)
Jointly estimates matching phase from magnitude spectrograms using the Multiple Input Spectrogram Inversion (MISI) algorithm [1].
Parameters:
- mixture_wav (torch.Tensor) – (batch, time)
- mag_specgrams (torch.Tensor) – (batch, n_src, freq, frames) as returned by Encoder(STFTFB), the magnitude spectrograms to be jointly inverted using MISI (modified or not).
- stft_enc (Encoder[STFTFB]) – The Encoder(STFTFB()) object that was used to compute the input mag_specgrams.
- angles (None or Tensor) – Angles used to initialize the algorithm. If None (default), angles are initialized from a uniform distribution.
- istft_dec (None or Decoder[STFTFB]) – Optional Decoder to use to get back to the time domain. If None (default), a perfect reconstruction Decoder is built from stft_enc.
- n_iter (int) – Number of MISI iterations to run.
- momentum (float) – Momentum on updates (this argument comes from Griffin-Lim). Defaults to 0, since momentum was never proposed for MISI.
- src_weights (None or torch.Tensor) – Consistency weight for each source. Shape needs to be broadcastable to istft_dec(mag_specgrams). The weights are made to sum to 1 along dimension dim. If src_weights is None, they are computed from relative power.
- dim (int) – Axis which contains the sources in mag_specgrams. Used for the consistency constraint.
Returns: torch.Tensor – Estimated waveforms of shape (batch, n_src, time).
Examples
>>> import torch
>>> from asteroid.filterbanks import Encoder, STFTFB, misi, transforms
>>> stft = Encoder(STFTFB(n_filters=256, kernel_size=256, stride=128))
>>> wav = torch.randn(2, 3, 8000)
>>> specs = stft(wav)
>>> masked_specs = specs * torch.sigmoid(torch.randn_like(specs))
>>> mag = transforms.take_mag(masked_specs, -2)
>>> est_wav = misi(wav.sum(1), mag, stft, n_iter=32)
References
[1] Gunawan and Sen, “Iterative Phase Estimation for the Synthesis of Separated Sources From Single-Channel Mixtures,” in IEEE Signal Processing Letters, 2010.
[2] Wang, LeRoux et al. “End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction.” Interspeech 2018.
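The consistency constraint behind src_weights can be sketched on plain Python lists: the part of the mixture not explained by the current estimates is shared among the sources according to weights that sum to 1. Both function names are hypothetical, and computing the default weights from the power of the time-domain estimates is an assumption about what "relative power" means here. The STFT and magnitude-replacement steps of a full MISI iteration are left out.

```python
def relative_power_weights(estimates):
    """One weight per source, proportional to its power, summing to 1."""
    powers = [sum(x * x for x in est) for est in estimates]
    total = sum(powers)
    if total == 0.0:
        # Fall back to uniform weights for all-zero estimates.
        return [1.0 / len(estimates)] * len(estimates)
    return [p / total for p in powers]


def enforce_mixture_consistency(mixture, estimates, weights=None):
    """Distribute the residual mixture - sum(estimates) across the sources."""
    if weights is None:
        weights = relative_power_weights(estimates)
    residual = [m - sum(est[t] for est in estimates) for t, m in enumerate(mixture)]
    return [[x + w * r for x, r in zip(est, residual)]
            for est, w in zip(estimates, weights)]


mixture = [1.0, 2.0, 3.0]
estimates = [[0.5, 1.0, 1.0], [0.0, 0.5, 1.0]]
consistent = enforce_mixture_consistency(mixture, estimates)
# After the update, the corrected estimates add up to the mixture again.
```

Because the weights sum to 1, the corrected sources always add up to the mixture, whatever the individual weights are.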

asteroid.filterbanks.make_enc_dec(fb_name, n_filters, kernel_size, stride=None, who_is_pinv=None, padding=0, output_padding=0, **kwargs)
Creates congruent encoder and decoder from the same filterbank family.
Parameters:
- fb_name (str, className) – Filterbank family from which to make the encoder and decoder. To choose among ['free', 'analytic_free', 'param_sinc', 'stft']. Can also be a class defined in a submodule of this subpackage (e.g. FreeFB).
- n_filters (int) – Number of filters.
- kernel_size (int) – Length of the filters.
- stride (int, optional) – Stride of the convolution. If None (default), set to kernel_size // 2.
- who_is_pinv (str, optional) – If None, no pseudo-inverse filters will be used. If a string (among ['encoder', 'decoder']), decides which of Encoder or Decoder will be the pseudo-inverse of the other one.
- padding (int) – Zero-padding added to both sides of the input. Passed to Encoder and Decoder.
- output_padding (int) – Additional size added to one side of the output shape. Passed to Decoder.
- **kwargs – Arguments passed to the filterbank class in addition to the usual n_filters, kernel_size and stride. Depends on the filterbank family.
Returns: The Encoder and Decoder.

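The argument rules of make_enc_dec can be summarized in a small stand-alone sketch. The helper resolve_enc_dec_args and the KNOWN_FAMILIES tuple are hypothetical and only restate the parameter list above. Raising ValueError on invalid values is an assumption, not documented behavior.

```python
# Family names as listed in the fb_name description.
KNOWN_FAMILIES = ("free", "analytic_free", "param_sinc", "stft")


def resolve_enc_dec_args(fb_name, kernel_size, stride=None, who_is_pinv=None):
    """Hypothetical helper: apply the documented defaults and choices."""
    # fb_name may also be a class; only string names are checked here.
    if isinstance(fb_name, str) and fb_name not in KNOWN_FAMILIES:
        raise ValueError(f"unknown filterbank family: {fb_name!r}")
    if who_is_pinv not in (None, "encoder", "decoder"):
        raise ValueError(f"who_is_pinv must be None, 'encoder' or 'decoder', got {who_is_pinv!r}")
    if stride is None:
        stride = kernel_size // 2
    return {
        "fb_name": fb_name,
        "stride": stride,
        "encoder_is_pinv": who_is_pinv == "encoder",
        "decoder_is_pinv": who_is_pinv == "decoder",
    }


print(resolve_enc_dec_args("stft", kernel_size=256, who_is_pinv="decoder"))
```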
Submodules
- asteroid.filterbanks.analytic_free_fb module
- asteroid.filterbanks.enc_dec module
- asteroid.filterbanks.free_fb module
- asteroid.filterbanks.griffin_lim module
- asteroid.filterbanks.multiphase_gammatone_fb module
- asteroid.filterbanks.param_sinc_fb module
- asteroid.filterbanks.stft_fb module
- asteroid.filterbanks.transforms module