asteroid.filterbanks package

class asteroid.filterbanks.Filterbank(n_filters, kernel_size, stride=None)

Bases: torch.nn.Module

Base Filterbank class. Each subclass has to implement a filters property.

Parameters:

n_filters (int) – Number of filters.
kernel_size (int) – Length of the filters.
stride (int, optional) – Stride of the conv or transposed conv (hop size). If None (default), set to `kernel_size // 2`.

Variables:

n_feats_out (int) – Number of output filters.

get_config(): Returns a dictionary of arguments to re-instantiate the class.

filters: Abstract method for filters.
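
The stride default and the `get_config()` round trip can be sketched without PyTorch. `ToyFilterbank` below is a hypothetical stand-in, not asteroid's class; it only mirrors the documented constructor arguments.

```python
class ToyFilterbank:
    """Hypothetical stand-in mirroring the documented constructor arguments."""

    def __init__(self, n_filters, kernel_size, stride=None):
        self.n_filters = n_filters
        self.kernel_size = kernel_size
        # Documented default: hop size is half the filter length.
        self.stride = stride if stride is not None else kernel_size // 2

    def get_config(self):
        # Everything needed to rebuild an equivalent instance.
        return {
            "n_filters": self.n_filters,
            "kernel_size": self.kernel_size,
            "stride": self.stride,
        }


fb = ToyFilterbank(n_filters=512, kernel_size=16)
clone = ToyFilterbank(**fb.get_config())
print(fb.stride, clone.get_config() == fb.get_config())  # 8 True
```

Storing the resolved stride (8 here) rather than `None` keeps the saved configuration explicit; whether the real class does the same is not stated above.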

class asteroid.filterbanks.Encoder(filterbank, is_pinv=False, as_conv1d=True, padding=0)

Bases: asteroid.filterbanks.enc_dec._EncDec

Encoder class.

Add encoding methods to Filterbank classes. Not intended to be subclassed.

Parameters:

filterbank (Filterbank) – The filterbank to use as an encoder.
is_pinv (bool) – Whether to be the pseudo inverse of filterbank.
as_conv1d (bool) – Whether to behave like nn.Conv1d. If True (default), forwarding input with shape (batch, 1, time) will output a tensor of shape (batch, freq, conv_time). If False, will output a tensor of shape (batch, 1, freq, conv_time).
padding (int) – Zero-padding added to both sides of the input.

batch_1d_conv(inp, filters)

forward(waveform)

Convolve input waveform with the filters from a filterbank.

Parameters:	waveform (`torch.Tensor`) – Any tensor with samples along the last dimension. The waveform representation, with any batch/channel etc. dimensions.

Returns:	`torch.Tensor` – The corresponding TF domain signal.

Shapes:

>>> (time, ) --> (freq, conv_time)
>>> (batch, time) --> (batch, freq, conv_time)  # Avoid
>>> if as_conv1d:
>>>     (batch, 1, time) --> (batch, freq, conv_time)
>>>     (batch, chan, time) --> (batch, chan, freq, conv_time)
>>> else:
>>>     (batch, chan, time) --> (batch, chan, freq, conv_time)
>>> (batch, any, dim, time) --> (batch, any, dim, freq, conv_time)

classmethod pinv_of(filterbank, **kwargs): Returns an Encoder, pseudo inverse of a Filterbank or Decoder.
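
The shape table under `Encoder.forward` can be checked with plain integer arithmetic. The sketch below is not part of the library: `conv_time` uses the standard output-length formula of a 1D convolution with dilation 1, and `freq` stands for the filterbank's `n_feats_out`, which is passed in rather than derived.

```python
def conv_time(time, kernel_size, stride=None, padding=0):
    """Number of output frames of a 1D convolution (dilation 1)."""
    stride = kernel_size // 2 if stride is None else stride
    return (time + 2 * padding - kernel_size) // stride + 1


def encoder_out_shape(in_shape, freq, kernel_size, stride=None,
                      padding=0, as_conv1d=True):
    """Apply the documented shape table to an input shape tuple."""
    *lead, time = in_shape
    frames = conv_time(time, kernel_size, stride, padding)
    # Only the (batch, 1, time) case with as_conv1d=True drops the channel axis.
    if as_conv1d and len(in_shape) == 3 and in_shape[1] == 1:
        lead = lead[:1]
    return (*lead, freq, frames)


print(encoder_out_shape((2, 1, 8000), freq=256, kernel_size=256, stride=128))
# (2, 256, 61)
print(encoder_out_shape((2, 1, 8000), freq=256, kernel_size=256, stride=128,
                        as_conv1d=False))
# (2, 1, 256, 61)
print(encoder_out_shape((8000,), freq=256, kernel_size=256, stride=128))
# (256, 61)
```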

class asteroid.filterbanks.Decoder(filterbank, is_pinv=False, padding=0, output_padding=0)

Bases: asteroid.filterbanks.enc_dec._EncDec

Decoder class.

Add decoding methods to Filterbank classes. Not intended to be subclassed.

Parameters:

filterbank (`Filterbank`) – The filterbank to use as a decoder.
is_pinv (bool) – Whether to be the pseudo inverse of filterbank.
padding (int) – Zero-padding added to both sides of the input.
output_padding (int) – Additional size added to one side of the output shape.

Notes: padding and output_padding arguments are directly passed to F.conv_transpose1d.

forward(spec)

Applies transposed convolution to a TF representation.

This is equivalent to overlap-add.

Parameters:	spec (`torch.Tensor`) – 3D or 4D Tensor. The TF representation. (Output of `Encoder.forward()`).
Returns:	`torch.Tensor` – The corresponding time domain signal.
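
The equivalence between transposed convolution and overlap-add can be illustrated on plain Python lists. This is a minimal sketch, not the library's implementation: it ignores `padding` and `output_padding` in the summation and uses the standard output-length formula of a 1D transposed convolution with dilation 1.

```python
def transposed_conv_length(n_frames, kernel_size, stride,
                           padding=0, output_padding=0):
    """Output length of a 1D transposed convolution (dilation 1)."""
    return (n_frames - 1) * stride - 2 * padding + kernel_size + output_padding


def overlap_add(spec, filters, stride):
    """spec[f][t] scales a copy of filters[f] placed at sample t * stride."""
    n_frames, kernel_size = len(spec[0]), len(filters[0])
    out = [0.0] * transposed_conv_length(n_frames, kernel_size, stride)
    for coeffs, filt in zip(spec, filters):
        for t, c in enumerate(coeffs):
            for k, h in enumerate(filt):
                out[t * stride + k] += c * h
    return out


filters = [[1.0, 1.0, 1.0, 1.0]]  # one filter of length 4
spec = [[1.0, 2.0, 3.0]]          # three frames
print(overlap_add(spec, filters, stride=2))
# [1.0, 1.0, 3.0, 3.0, 5.0, 5.0, 3.0, 3.0]
```

With 61 frames, `kernel_size=256` and `stride=128`, the same formula gives 7936 output samples.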

classmethod pinv_of(filterbank): Returns a Decoder, pseudo inverse of a filterbank or Encoder.

class asteroid.filterbanks.FreeFB(n_filters, kernel_size, stride=None, **kwargs)

Bases: asteroid.filterbanks.enc_dec.Filterbank

Free filterbank without any constraints. Equivalent to nn.Conv1d.

Parameters:

n_filters (int) – Number of filters.
kernel_size (int) – Length of the filters.
stride (int, optional) – Stride of the convolution. If None (default), set to `kernel_size // 2`.

Variables:

n_feats_out (int) – Number of output filters.

References

[1] : “Filterbank design for end-to-end speech separation”. Submitted to ICASSP 2020. Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent.

filters: Abstract method for filters.

class asteroid.filterbanks.STFTFB(n_filters, kernel_size, stride=None, window=None, **kwargs)

Bases: asteroid.filterbanks.enc_dec.Filterbank

STFT filterbank.

Parameters:

n_filters (int) – Number of filters. Determines the length of the STFT filters before windowing.
kernel_size (int) – Length of the filters (i.e. the window).
stride (int, optional) – Stride of the convolution (hop size). If None (default), set to `kernel_size // 2`.
window (`numpy.ndarray`, optional) – If None, defaults to `np.sqrt(np.hanning())`.

Variables:

n_feats_out (int) – Number of output filters.

filters: Abstract method for filters.
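
A square-root Hann window pairs naturally with the default stride of `kernel_size // 2`: when the same window is applied at analysis and synthesis, the overlapping squared windows sum to one. The sketch below checks this for the periodic Hann window. That the library uses the periodic rather than the symmetric variant returned by `np.hanning` is an assumption, and `sqrt_hann` is a hypothetical helper.

```python
import math


def sqrt_hann(kernel_size):
    """Square root of a periodic Hann window of length kernel_size."""
    return [math.sqrt(0.5 - 0.5 * math.cos(2 * math.pi * n / kernel_size))
            for n in range(kernel_size)]


kernel_size = 16
stride = kernel_size // 2  # documented default hop size
win = sqrt_hann(kernel_size)

# Analysis window times synthesis window, summed over the two frames
# that cover each sample.
overlap_sum = [win[n] ** 2 + win[n + stride] ** 2 for n in range(stride)]
print(all(abs(s - 1.0) < 1e-12 for s in overlap_sum))  # True
```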

class asteroid.filterbanks.AnalyticFreeFB(n_filters, kernel_size, stride=None, **kwargs)

Bases: asteroid.filterbanks.enc_dec.Filterbank

Free analytic (fully learned with analyticity constraints) filterbank. For more details, see [1].

Parameters:

n_filters (int) – Number of filters. Half of n_filters will have parameters, the other half will be the Hilbert transforms. n_filters should be even.
kernel_size (int) – Length of the filters.
stride (int, optional) – Stride of the convolution. If None (default), set to `kernel_size // 2`.

Variables:

n_feats_out (int) – Number of output filters.

References

[1] : “Filterbank design for end-to-end speech separation”. Submitted to ICASSP 2020. Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent.

filters: Abstract method for filters.

class asteroid.filterbanks.ParamSincFB(n_filters, kernel_size, stride=None, sample_rate=16000, min_low_hz=50, min_band_hz=50)

Bases: asteroid.filterbanks.enc_dec.Filterbank

Extension of the parameterized filterbank from [1] proposed in [2]. Modified and extended from https://github.com/mravanelli/SincNet

Parameters:

n_filters (int) – Number of filters. Half of n_filters (the real parts) will have parameters, the other half will correspond to the imaginary parts. n_filters should be even.
kernel_size (int) – Length of the filters.
stride (int, optional) – Stride of the convolution. If None (default), set to kernel_size // 2.
sample_rate (int, optional) – The sample rate (used for initialization).
min_low_hz (int, optional) – Lowest low frequency allowed (Hz).
min_band_hz (int, optional) – Lowest band frequency allowed (Hz).

Variables:

n_feats_out (int) – Number of output filters.

References

[1] : “Speaker Recognition from raw waveform with SincNet”. SLT 2018. Mirco Ravanelli, Yoshua Bengio. https://arxiv.org/abs/1808.00158

[2] : “Filterbank design for end-to-end speech separation”. Submitted to ICASSP 2020. Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent. https://arxiv.org/abs/1910.10400

get_config(): Returns a dictionary of arguments to re-instantiate the class.

make_filters(low, high, filt_type='cos')

static to_hz(mel)

static to_mel(hz)

filters: Compute filters from parameters.
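
`to_mel` and `to_hz` are listed above without a description. The sketch below assumes they implement the common HTK-style mel scale, as the SincNet code this class derives from does; the text here does not confirm it. `mel_spaced_edges` and its `high_hz` argument are hypothetical, shown only to illustrate how band edges could be spread on a mel scale at initialization.

```python
import math


def to_mel(hz):
    # HTK-style mel scale; assumed here, not confirmed by the text above.
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def to_hz(mel):
    # Inverse of to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_edges(n_bands, min_low_hz=50.0, high_hz=8000.0):
    """n_bands + 1 band edges, evenly spaced on the mel scale."""
    lo, hi = to_mel(min_low_hz), to_mel(high_hz)
    step = (hi - lo) / n_bands
    return [to_hz(lo + i * step) for i in range(n_bands + 1)]


edges = mel_spaced_edges(n_bands=4)
print([round(e) for e in edges])
```

Low-frequency bands come out narrower than high-frequency ones.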

class asteroid.filterbanks.MultiphaseGammatoneFB(n_filters=128, kernel_size=16, sample_rate=8000, stride=None, **kwargs)

Bases: asteroid.filterbanks.enc_dec.Filterbank

Multi-Phase Gammatone Filterbank as described in [1]. Please cite [1] whenever using this. Original code repository: <https://github.com/sp-uhh/mp-gtf>

Parameters:

n_filters (int) – Number of filters.
kernel_size (int) – Length of the filters.
sample_rate (int, optional) – The sample rate (used for initialization).
stride (int, optional) – Stride of the convolution. If None (default), set to `kernel_size // 2`.

References

[1] David Ditter, Timo Gerkmann, “A Multi-Phase Gammatone Filterbank for Speech Separation via TasNet”, ICASSP 2020. Available: <https://ieeexplore.ieee.org/document/9053602/>

filters: Abstract method for filters.

asteroid.filterbanks.griffin_lim(mag_specgram, stft_enc, angles=None, istft_dec=None, n_iter=6, momentum=0.9)

Estimates matching phase from a magnitude spectrogram using the ‘fast’ Griffin-Lim algorithm [1].

Parameters:

mag_specgram (torch.Tensor) – (any, dim, ension, freq, frames) as returned by Encoder(STFTFB), the magnitude spectrogram to be inverted.
stft_enc (Encoder[STFTFB]) – The Encoder(STFTFB()) object that was used to compute the input mag_specgram.
angles (None or Tensor) – Angles to use to initialize the algorithm. If None (default), angles are initialized from a uniform distribution.
istft_dec (None or Decoder[STFTFB]) – Optional Decoder to use to get back to the time domain. If None (default), a perfect reconstruction Decoder is built from stft_enc.
n_iter (int) – Number of Griffin-Lim iterations to run.
momentum (float) – The momentum of fast Griffin-Lim. Original Griffin-Lim is obtained for momentum=0.

Returns:

torch.Tensor – estimated waveforms of shape (any, dim, ension, time).

Examples

>>> import torch
>>> from asteroid.filterbanks import Encoder, STFTFB, griffin_lim, transforms
>>> stft = Encoder(STFTFB(n_filters=256, kernel_size=256, stride=128))
>>> wav = torch.randn(2, 1, 8000)
>>> spec = stft(wav)
>>> masked_spec = spec * torch.sigmoid(torch.randn_like(spec))
>>> mag = transforms.take_mag(masked_spec, -2)
>>> est_wav = griffin_lim(mag, stft, n_iter=32)

References

[1] Perraudin et al. “A fast Griffin-Lim algorithm,” WASPAA 2013.

[2] D. W. Griffin and J. S. Lim: “Signal estimation from modified short-time Fourier transform,” ASSP 1984.
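
The role of `momentum` in `griffin_lim` can be illustrated by a single phase update on a flat list of time-frequency bins. This is a sketch under stated assumptions, not asteroid's code: the `momentum / (1 + momentum)` scaling follows the formulation used in librosa's Griffin-Lim, the STFT and inverse STFT projections that surround this step are left out, and `fast_gl_phase_update` is a hypothetical name.

```python
import cmath


def fast_gl_phase_update(mag, rebuilt, prev_rebuilt, momentum=0.9):
    """One phase update on flat lists of TF bins.

    mag: target magnitudes; rebuilt / prev_rebuilt: complex STFTs of the
    current and previous time-domain estimates.
    """
    alpha = momentum / (1.0 + momentum)
    updated = []
    for m, cur, prev in zip(mag, rebuilt, prev_rebuilt):
        # Extrapolate away from the previous estimate, keep only the phase.
        phase = cmath.phase(cur - alpha * prev)
        # Re-impose the known magnitude.
        updated.append(cmath.rect(m, phase))
    return updated


mag = [1.0, 2.0]
rebuilt = [0.5 + 0.5j, -1.0j]
prev = [0.4 + 0.6j, 0.2 - 0.9j]
plain = fast_gl_phase_update(mag, rebuilt, prev, momentum=0.0)
fast = fast_gl_phase_update(mag, rebuilt, prev, momentum=0.9)
print([abs(z) for z in fast])
```

With `momentum=0.0` the update keeps the phase of `rebuilt` unchanged, which matches the statement above that the original Griffin-Lim is obtained for `momentum=0`.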

asteroid.filterbanks.misi(mixture_wav, mag_specgrams, stft_enc, angles=None, istft_dec=None, n_iter=6, momentum=0.0, src_weights=None, dim=1)

Jointly estimates matching phase from magnitude spectrograms using the Multiple Input Spectrogram Inversion (MISI) algorithm [1].

Parameters:

mixture_wav (torch.Tensor) – (batch, time)
mag_specgrams (torch.Tensor) – (batch, n_src, freq, frames) as returned by Encoder(STFTFB), the magnitude spectrograms to be jointly inverted using MISI (modified or not).
stft_enc (Encoder[STFTFB]) – The Encoder(STFTFB()) object that was used to compute the input mag_specgrams.
angles (None or Tensor) – Angles to use to initialize the algorithm. If None (default), angles are initialized from a uniform distribution.
istft_dec (None or Decoder[STFTFB]) – Optional Decoder to use to get back to the time domain. If None (default), a perfect reconstruction Decoder is built from stft_enc.
n_iter (int) – Number of MISI iterations to run.
momentum (float) – Momentum on updates (this argument comes from Griffin-Lim). Defaults to 0, as using momentum with MISI was never proposed anywhere.
src_weights (None or torch.Tensor) – Consistency weight for each source. Shape needs to be broadcastable to istft_dec(mag_specgrams). The weights are normalized to sum to 1 along dimension dim. If src_weights is None, they are computed from the relative power of the sources.
dim (int) – Axis which contains the sources in mag_specgrams. Used for consistency constraint.

Returns:

torch.Tensor – estimated waveforms of shape (batch, n_src, time).

Examples

>>> import torch
>>> from asteroid.filterbanks import Encoder, STFTFB, misi, transforms
>>> stft = Encoder(STFTFB(n_filters=256, kernel_size=256, stride=128))
>>> wav = torch.randn(2, 3, 8000)
>>> specs = stft(wav)
>>> masked_specs = specs * torch.sigmoid(torch.randn_like(specs))
>>> mag = transforms.take_mag(masked_specs, -2)
>>> est_wav = misi(wav.sum(1), mag, stft, n_iter=32)

References

[1] Gunawan and Sen, “Iterative Phase Estimation for the Synthesis of Separated Sources From Single-Channel Mixtures,” in IEEE Signal Processing Letters, 2010.

[2] Wang, LeRoux et al. “End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction.” Interspeech 2018 (2018)
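
The consistency constraint behind `src_weights` in `misi` can be sketched as one time-domain step on plain lists: the difference between the mixture and the sum of the source estimates is shared between the sources according to weights that sum to 1. This is an illustration only. The function names are hypothetical, the way relative power is computed is an assumption, and the library works on tensors and alternates this step with a magnitude projection in the STFT domain.

```python
def relative_power_weights(estimates):
    """One weight per source, proportional to its power, summing to 1."""
    powers = [sum(x * x for x in src) for src in estimates]
    total = sum(powers)
    return [p / total for p in powers]


def misi_consistency_step(mixture, estimates, weights=None):
    """Share the mixture residual between the source estimates."""
    if weights is None:
        weights = relative_power_weights(estimates)
    residual = [m - sum(src[t] for src in estimates)
                for t, m in enumerate(mixture)]
    return [[x + w * r for x, r in zip(src, residual)]
            for src, w in zip(estimates, weights)]


mixture = [1.0, 0.0, -1.0]
estimates = [[0.5, 0.1, -0.2], [0.1, 0.1, -0.4]]
corrected = misi_consistency_step(mixture, estimates)
# After the step, the corrected sources add up to the mixture again.
print([sum(src[t] for src in corrected) for t in range(len(mixture))])
```

Because the weights sum to 1, the whole residual is redistributed and the corrected estimates are consistent with the mixture up to rounding.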

asteroid.filterbanks.make_enc_dec(fb_name, n_filters, kernel_size, stride=None, who_is_pinv=None, padding=0, output_padding=0, **kwargs)

Creates congruent encoder and decoder from the same filterbank family.

Parameters:

fb_name (str, className) – Filterbank family from which to make encoder and decoder. Choose among ['free', 'analytic_free', 'param_sinc', 'stft']. Can also be a class defined in a submodule of this subpackage (e.g. FreeFB).
n_filters (int) – Number of filters.
kernel_size (int) – Length of the filters.
stride (int, optional) – Stride of the convolution. If None (default), set to kernel_size // 2.
who_is_pinv (str, optional) – If None, no pseudo-inverse filters will be used. If string (among ['encoder', 'decoder']), decides which of Encoder or Decoder will be the pseudo inverse of the other one.
padding (int) – Zero-padding added to both sides of the input. Passed to Encoder and Decoder.
output_padding (int) – Additional size added to one side of the output shape. Passed to Decoder.
**kwargs – Arguments which will be passed to the filterbank class additionally to the usual n_filters, kernel_size and stride. Depends on the filterbank family.

Returns:

Encoder, Decoder
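
The effect of `who_is_pinv` can be sketched with toy stand-ins for the real classes. `ToyFB`, `ToyEncoder`, `ToyDecoder` and `make_enc_dec_sketch` are hypothetical names. That the pseudo-inverse case shares one filterbank while the default case builds two separate ones is an assumption about the behaviour, not something stated above.

```python
from dataclasses import dataclass


@dataclass
class ToyFB:
    n_filters: int
    kernel_size: int
    stride: int


@dataclass
class ToyEncoder:
    filterbank: ToyFB
    is_pinv: bool = False


@dataclass
class ToyDecoder:
    filterbank: ToyFB
    is_pinv: bool = False


def make_enc_dec_sketch(n_filters, kernel_size, stride=None, who_is_pinv=None):
    stride = kernel_size // 2 if stride is None else stride
    if who_is_pinv not in (None, "encoder", "decoder"):
        raise ValueError(f"Unexpected who_is_pinv: {who_is_pinv!r}")
    fb = ToyFB(n_filters, kernel_size, stride)
    if who_is_pinv is None:
        # Two independent filterbanks from the same family.
        return ToyEncoder(fb), ToyDecoder(ToyFB(n_filters, kernel_size, stride))
    # One shared filterbank; one side is marked as its pseudo-inverse.
    return (ToyEncoder(fb, is_pinv=who_is_pinv == "encoder"),
            ToyDecoder(fb, is_pinv=who_is_pinv == "decoder"))


enc, dec = make_enc_dec_sketch(512, 16, who_is_pinv="decoder")
print(enc.is_pinv, dec.is_pinv, enc.filterbank is dec.filterbank)
# False True True
```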

Submodules