asteroid.filterbanks package

class asteroid.filterbanks.Filterbank(n_filters, kernel_size, stride=None)[source]

Bases: torch.nn.Module

Base Filterbank class. Each subclass has to implement a filters property.

Parameters:
  • n_filters (int) – Number of filters.
  • kernel_size (int) – Length of the filters.
  • stride (int, optional) – Stride of the conv or transposed conv (hop size). If None (default), set to kernel_size // 2.
Variables:

n_feats_out (int) – Number of output filters.

get_config()[source]

Returns dictionary of arguments to re-instantiate the class.
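
The round trip that get_config() enables can be sketched with a stand-in class. ToyFilterbank below is hypothetical and is not part of asteroid; it only mirrors the documented constructor arguments and the documented stride default of kernel_size // 2.

```python
class ToyFilterbank:
    """Hypothetical stand-in that mimics the documented constructor arguments."""

    def __init__(self, n_filters, kernel_size, stride=None):
        self.n_filters = n_filters
        self.kernel_size = kernel_size
        # Documented default: half-overlapping frames.
        self.stride = stride if stride is not None else kernel_size // 2

    def get_config(self):
        # Everything needed to rebuild an equivalent instance.
        return {
            "n_filters": self.n_filters,
            "kernel_size": self.kernel_size,
            "stride": self.stride,
        }


fb = ToyFilterbank(n_filters=512, kernel_size=16)
clone = ToyFilterbank(**fb.get_config())
```

Storing the resolved stride rather than None keeps the saved configuration explicit, so a rebuilt instance does not depend on the default staying the same.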

filters

Abstract method for filters.

class asteroid.filterbanks.Encoder(filterbank, is_pinv=False, as_conv1d=True, padding=0)[source]

Bases: asteroid.filterbanks.enc_dec._EncDec

Encoder class.

Add encoding methods to Filterbank classes. Not intended to be subclassed.

Parameters:
  • filterbank (Filterbank) – The filterbank to use as an encoder.
  • is_pinv (bool) – Whether to be the pseudo inverse of filterbank.
  • as_conv1d (bool) – Whether to behave like nn.Conv1d. If True (default), forwarding input with shape (batch, 1, time) will output a tensor of shape (batch, freq, conv_time). If False, will output a tensor of shape (batch, 1, freq, conv_time).
  • padding (int) – Zero-padding added to both sides of the input.
batch_1d_conv(inp, filters)[source]
forward(waveform)[source]

Convolve input waveform with the filters from a filterbank.

Parameters:waveform (torch.Tensor) – Any tensor with samples along the last dimension. The waveform representation, with any batch/channel etc. dimensions.
Returns:torch.Tensor – The corresponding TF domain signal.
Shapes:
>>> (time, ) --> (freq, conv_time)
>>> (batch, time) --> (batch, freq, conv_time)  # Avoid
>>> if as_conv1d:
>>>     (batch, 1, time) --> (batch, freq, conv_time)
>>>     (batch, chan, time) --> (batch, chan, freq, conv_time)
>>> else:
>>>     (batch, chan, time) --> (batch, chan, freq, conv_time)
>>> (batch, any, dim, time) --> (batch, any, dim, freq, conv_time)
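
The shape table above can be turned into a small helper for sanity checks. This is a sketch, not library code: conv_frames and encoder_output_shape are hypothetical names, conv_time is assumed to follow the standard 1D convolution length formula (dilation 1), and n_feats stands for the filterbank's n_feats_out.

```python
def conv_frames(time, kernel_size, stride=None, padding=0):
    """Number of frames produced by a 1D convolution (dilation 1)."""
    if stride is None:
        stride = kernel_size // 2  # documented default
    return (time + 2 * padding - kernel_size) // stride + 1


def encoder_output_shape(in_shape, n_feats, kernel_size, stride=None,
                         padding=0, as_conv1d=True):
    """Predict the output shape from the documented shape table."""
    *lead, time = in_shape
    frames = conv_frames(time, kernel_size, stride, padding)
    # With as_conv1d=True, a (batch, 1, time) input loses its singleton channel.
    if as_conv1d and len(in_shape) == 3 and in_shape[1] == 1:
        lead = lead[:1]
    return (*lead, n_feats, frames)


print(encoder_output_shape((2, 1, 8000), n_feats=512, kernel_size=16))
print(encoder_output_shape((2, 1, 8000), n_feats=512, kernel_size=16,
                           as_conv1d=False))
```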
classmethod pinv_of(filterbank, **kwargs)[source]

Returns an Encoder, pseudo inverse of a Filterbank or Decoder.

class asteroid.filterbanks.Decoder(filterbank, is_pinv=False, padding=0, output_padding=0)[source]

Bases: asteroid.filterbanks.enc_dec._EncDec

Decoder class.

Add decoding methods to Filterbank classes. Not intended to be subclassed.

Parameters:
  • filterbank (Filterbank) – The filterbank to use as a decoder.
  • is_pinv (bool) – Whether to be the pseudo inverse of filterbank.
  • padding (int) – Zero-padding added to both sides of the input.
  • output_padding (int) – Additional size added to one side of the output shape.
Notes
padding and output_padding arguments are directly passed to F.conv_transpose1d.
forward(spec)[source]

Applies transposed convolution to a TF representation.

This is equivalent to overlap-add.

Parameters:spec (torch.Tensor) – 3D or 4D Tensor. The TF representation. (Output of Encoder.forward()).
Returns:torch.Tensor – The corresponding time domain signal.
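
The equivalence with overlap-add can be illustrated in plain Python. The sketch below is not the library implementation: transposed_conv1d and transposed_conv_length are hypothetical helpers, handling a single unbatched signal with no padding, and the length formula is the usual one for a transposed 1D convolution with dilation 1.

```python
def transposed_conv_length(n_frames, kernel_size, stride,
                           padding=0, output_padding=0):
    """Output length of a transposed 1D convolution (dilation 1)."""
    return (n_frames - 1) * stride - 2 * padding + kernel_size + output_padding


def transposed_conv1d(spec, filters, stride):
    """Overlap-add synthesis.

    spec:    list of n_filters lists, each holding n_frames coefficients.
    filters: list of n_filters lists, each holding kernel_size taps.
    """
    n_frames = len(spec[0])
    kernel_size = len(filters[0])
    out = [0.0] * transposed_conv_length(n_frames, kernel_size, stride)
    for coeffs, filt in zip(spec, filters):
        for i, coeff in enumerate(coeffs):
            # Each frame adds a scaled copy of the filter, shifted by one hop.
            for j, tap in enumerate(filt):
                out[i * stride + j] += coeff * tap
    return out


signal = transposed_conv1d([[1.0, 1.0, 1.0]], [[1.0, 1.0, 1.0, 1.0]], stride=2)
```

With a length-4 filter and a hop of 2, neighbouring frames overlap by half, so interior samples receive two contributions while the edges receive one.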
classmethod pinv_of(filterbank)[source]

Returns a Decoder, pseudo inverse of a filterbank or Encoder.

class asteroid.filterbanks.FreeFB(n_filters, kernel_size, stride=None, **kwargs)[source]

Bases: asteroid.filterbanks.enc_dec.Filterbank

Free filterbank without any constraints. Equivalent to nn.Conv1d.

Parameters:
  • n_filters (int) – Number of filters.
  • kernel_size (int) – Length of the filters.
  • stride (int, optional) – Stride of the convolution. If None (default), set to kernel_size // 2.
Variables:

n_feats_out (int) – Number of output filters.

References

[1] : “Filterbank design for end-to-end speech separation”. Submitted to ICASSP 2020. Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent.

filters

Abstract method for filters.

class asteroid.filterbanks.STFTFB(n_filters, kernel_size, stride=None, window=None, **kwargs)[source]

Bases: asteroid.filterbanks.enc_dec.Filterbank

STFT filterbank.

Parameters:
  • n_filters (int) – Number of filters. Determines the length of the STFT filters before windowing.
  • kernel_size (int) – Length of the filters (i.e. the window).
  • stride (int, optional) – Stride of the convolution (hop size). If None (default), set to kernel_size // 2.
  • window (numpy.ndarray, optional) – If None, defaults to np.sqrt(np.hanning()).
Variables:

n_feats_out (int) – Number of output filters.
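
One reason a square-root Hann window is a common default for analysis/synthesis pairs: applied once in the encoder and once in the decoder, the squared windows sum to one at 50% overlap. The sketch below assumes the periodic Hann definition; np.hanning itself returns the symmetric variant, and how the library trims it is not stated on this page. sqrt_hann is a hypothetical helper.

```python
import math


def sqrt_hann(kernel_size):
    """Square root of a periodic Hann window (an assumption, see lead-in)."""
    return [
        math.sqrt(0.5 - 0.5 * math.cos(2 * math.pi * n / kernel_size))
        for n in range(kernel_size)
    ]


kernel_size = 8
hop = kernel_size // 2  # documented default stride
window = sqrt_hann(kernel_size)

# Window applied twice (analysis and synthesis), summed over overlapping frames.
overlap_sums = [window[n] ** 2 + window[n + hop] ** 2 for n in range(hop)]
```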

filters

Abstract method for filters.

class asteroid.filterbanks.AnalyticFreeFB(n_filters, kernel_size, stride=None, **kwargs)[source]

Bases: asteroid.filterbanks.enc_dec.Filterbank

Free analytic (fully learned with analyticity constraints) filterbank. For more details, see [1].

Parameters:
  • n_filters (int) – Number of filters. Half of n_filters will have parameters, the other half will be their Hilbert transforms. n_filters should be even.
  • kernel_size (int) – Length of the filters.
  • stride (int, optional) – Stride of the convolution. If None (default), set to kernel_size // 2.
Variables:

n_feats_out (int) – Number of output filters.
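
The pairing of each learned filter with its Hilbert transform can be illustrated without any dependency. This is a sketch under stated assumptions, not the library's code: hilbert_transform is a hypothetical helper built on a naive DFT, which is fine for short filters but slow for long ones.

```python
import cmath
import math


def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def hilbert_transform(x):
    """Shift every frequency component by 90 degrees."""
    n = len(x)
    shifted = []
    for k, coeff in enumerate(dft(x)):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            shifted.append(0j)           # DC and Nyquist have no quadrature part
        elif k < n / 2:
            shifted.append(-1j * coeff)  # positive frequencies
        else:
            shifted.append(1j * coeff)   # negative frequencies
    return [value.real for value in idft(shifted)]


kernel_size = 16
real_filter = [math.cos(2 * math.pi * 2 * t / kernel_size)
               for t in range(kernel_size)]
imag_filter = hilbert_transform(real_filter)
```

For a cosine, the quadrature filter is the matching sine, so the pair behaves like the real and imaginary parts of one complex filter.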

References

[1] : “Filterbank design for end-to-end speech separation”. Submitted to ICASSP 2020. Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent.

filters

Abstract method for filters.

class asteroid.filterbanks.ParamSincFB(n_filters, kernel_size, stride=None, sample_rate=16000, min_low_hz=50, min_band_hz=50)[source]

Bases: asteroid.filterbanks.enc_dec.Filterbank

Extension of the parameterized filterbank from [1] proposed in [2]. Modified and extended from https://github.com/mravanelli/SincNet

Parameters:
  • n_filters (int) – Number of filters. Half of n_filters (the real parts) will have parameters, the other half will correspond to the imaginary parts. n_filters should be even.
  • kernel_size (int) – Length of the filters.
  • stride (int, optional) – Stride of the convolution. If None (default), set to kernel_size // 2.
  • sample_rate (int, optional) – The sample rate (used for initialization).
  • min_low_hz (int, optional) – Lowest low frequency allowed (Hz).
  • min_band_hz (int, optional) – Lowest band frequency allowed (Hz).
Variables:

n_feats_out (int) – Number of output filters.

References

[1] : “Speaker Recognition from raw waveform with SincNet”. SLT 2018. Mirco Ravanelli, Yoshua Bengio. https://arxiv.org/abs/1808.00158

[2] : “Filterbank design for end-to-end speech separation”. Submitted to ICASSP 2020. Manuel Pariente, Samuele Cornell, Antoine Deleforge, Emmanuel Vincent. https://arxiv.org/abs/1910.10400

get_config()[source]

Returns dictionary of arguments to re-instantiate the class.

make_filters(low, high, filt_type='cos')[source]
static to_hz(mel)[source]
static to_mel(hz)[source]
filters

Compute filters from parameters
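
The to_mel and to_hz helpers convert between Hz and the mel scale. The page does not give their formulas, so the sketch below assumes the widely used 2595 * log10(1 + f / 700) convention; the constants are an assumption, not a statement about the library.

```python
import math


def to_mel(hz):
    """Hz to mel, assuming the common 2595 * log10(1 + f / 700) convention."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def to_hz(mel):
    """Inverse of to_mel under the same assumption."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# Band edges spaced evenly on the mel scale between 50 Hz and Nyquist (8 kHz).
n_bands = 4
low_mel, high_mel = to_mel(50.0), to_mel(8000.0)
edges_hz = [to_hz(low_mel + i * (high_mel - low_mel) / n_bands)
            for i in range(n_bands + 1)]
```

Spacing band edges evenly in mel gives narrow bands at low frequencies and wider ones at high frequencies, which is the usual motivation for initializing from this scale.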

class asteroid.filterbanks.MultiphaseGammatoneFB(n_filters=128, kernel_size=16, sample_rate=8000, stride=None, **kwargs)[source]

Bases: asteroid.filterbanks.enc_dec.Filterbank

Multi-Phase Gammatone Filterbank as described in [1]. Please cite [1] whenever using this. Original code repository: <https://github.com/sp-uhh/mp-gtf>

Parameters:
  • n_filters (int) – Number of filters.
  • kernel_size (int) – Length of the filters.
  • sample_rate (int, optional) – The sample rate (used for initialization).
  • stride (int, optional) – Stride of the convolution. If None (default), set to kernel_size // 2.

References

[1] : David Ditter, Timo Gerkmann. “A Multi-Phase Gammatone Filterbank for Speech Separation via TasNet”. ICASSP 2020. Available: <https://ieeexplore.ieee.org/document/9053602/>

filters

Abstract method for filters.

asteroid.filterbanks.griffin_lim(mag_specgram, stft_enc, angles=None, istft_dec=None, n_iter=6, momentum=0.9)[source]

Estimates matching phase from magnitude spectrogram using the ‘fast’ Griffin-Lim algorithm [1].

Parameters:
  • mag_specgram (torch.Tensor) – (any, dim, ension, freq, frames) as returned by Encoder(STFTFB), the magnitude spectrogram to be inverted.
  • stft_enc (Encoder[STFTFB]) – The Encoder(STFTFB()) object that was used to compute the input mag_spec.
  • angles (None or Tensor) – Angles to use to initialize the algorithm. If None (default), angles are initialized with a uniform distribution.
  • istft_dec (None or Decoder[STFTFB]) – Optional Decoder to use to get back to the time domain. If None (default), a perfect reconstruction Decoder is built from stft_enc.
  • n_iter (int) – Number of griffin-lim iterations to run.
  • momentum (float) – The momentum of fast Griffin-Lim. Original Griffin-Lim is obtained for momentum=0.
Returns:

torch.Tensor – estimated waveforms of shape (any, dim, ension, time).

Examples

>>> import torch
>>> from asteroid.filterbanks import Encoder, STFTFB, griffin_lim, transforms
>>> stft = Encoder(STFTFB(n_filters=256, kernel_size=256, stride=128))
>>> wav = torch.randn(2, 1, 8000)
>>> spec = stft(wav)
>>> masked_spec = spec * torch.sigmoid(torch.randn_like(spec))
>>> mag = transforms.take_mag(masked_spec, -2)
>>> est_wav = griffin_lim(mag, stft, n_iter=32)

References

[1] Perraudin et al. “A fast Griffin-Lim algorithm,” WASPAA 2013.

[2] D. W. Griffin and J. S. Lim: “Signal estimation from modified short-time Fourier transform,” ASSP 1984.

asteroid.filterbanks.misi(mixture_wav, mag_specgrams, stft_enc, angles=None, istft_dec=None, n_iter=6, momentum=0.0, src_weights=None, dim=1)[source]

Jointly estimates matching phase from magnitude spectrograms using the Multiple Input Spectrogram Inversion (MISI) algorithm [1].

Parameters:
  • mixture_wav (torch.Tensor) – (batch, time)
  • mag_specgrams (torch.Tensor) – (batch, n_src, freq, frames) as returned by Encoder(STFTFB), the magnitude spectrograms to be jointly inverted using MISI (modified or not).
  • stft_enc (Encoder[STFTFB]) – The Encoder(STFTFB()) object that was used to compute the input mag_spec.
  • angles (None or Tensor) – Angles to use to initialize the algorithm. If None (default), angles are initialized with a uniform distribution.
  • istft_dec (None or Decoder[STFTFB]) – Optional Decoder to use to get back to the time domain. If None (default), a perfect reconstruction Decoder is built from stft_enc.
  • n_iter (int) – Number of MISI iterations to run.
  • momentum (float) – Momentum on updates (this argument comes from GriffinLim). Defaults to 0 as it was never proposed anywhere.
  • src_weights (None or torch.Tensor) – Consistency weight for each source. Shape needs to be broadcastable to istft_dec(mag_specgrams). The weights are normalized to sum to 1 along dimension dim. If src_weights is None, they are computed from the relative power of each source.
  • dim (int) – Axis which contains the sources in mag_specgrams. Used for consistency constraint.
Returns:

torch.Tensor – estimated waveforms of shape (batch, n_src, time).
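
The consistency constraint behind src_weights can be sketched on plain lists. This illustrates the idea only and is not the library's implementation: both helper names are hypothetical, and the relative-power weighting follows the parameter description above. The residual between the mixture and the sum of the current estimates is shared among the sources according to their weights.

```python
def relative_power_weights(estimates):
    """One weight per source, proportional to its power; weights sum to 1."""
    powers = [sum(v * v for v in est) for est in estimates]
    total = sum(powers)
    if total == 0.0:
        return [1.0 / len(estimates)] * len(estimates)  # fall back to equal split
    return [p / total for p in powers]


def enforce_mixture_consistency(mixture, estimates, weights=None):
    """Redistribute the mixture residual so the estimates sum to the mixture."""
    if weights is None:
        weights = relative_power_weights(estimates)
    residual = [m - sum(est[t] for est in estimates)
                for t, m in enumerate(mixture)]
    return [[v + w * r for v, r in zip(est, residual)]
            for est, w in zip(estimates, weights)]


mixture = [1.0, 2.0, 3.0]
estimates = [[0.5, 1.0, 1.0], [0.0, 0.5, 1.0]]
consistent = enforce_mixture_consistency(mixture, estimates)
```

Because the weights sum to one, the corrected estimates add up exactly to the mixture, whichever weighting is chosen.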

Examples

>>> import torch
>>> from asteroid.filterbanks import Encoder, STFTFB, misi, transforms
>>> stft = Encoder(STFTFB(n_filters=256, kernel_size=256, stride=128))
>>> wav = torch.randn(2, 3, 8000)
>>> specs = stft(wav)
>>> masked_specs = specs * torch.sigmoid(torch.randn_like(specs))
>>> mag = transforms.take_mag(masked_specs, -2)
>>> est_wav = misi(wav.sum(1), mag, stft, n_iter=32)

References

[1] Gunawan and Sen, “Iterative Phase Estimation for the Synthesis of Separated Sources From Single-Channel Mixtures,” in IEEE Signal Processing Letters, 2010.

[2] Wang, LeRoux et al. “End-to-End Speech Separation with Unfolded Iterative Phase Reconstruction.” Interspeech 2018.

asteroid.filterbanks.make_enc_dec(fb_name, n_filters, kernel_size, stride=None, who_is_pinv=None, padding=0, output_padding=0, **kwargs)[source]

Creates congruent encoder and decoder from the same filterbank family.

Parameters:
  • fb_name (str, className) – Filterbank family from which to make encoder and decoder. To choose among ['free', 'analytic_free', 'param_sinc', 'stft']. Can also be a class defined in a submodule of this subpackage (e.g. FreeFB).
  • n_filters (int) – Number of filters.
  • kernel_size (int) – Length of the filters.
  • stride (int, optional) – Stride of the convolution. If None (default), set to kernel_size // 2.
  • who_is_pinv (str, optional) – If None, no pseudo-inverse filters will be used. If string (among ['encoder', 'decoder']), decides which of Encoder or Decoder will be the pseudo inverse of the other one.
  • padding (int) – Zero-padding added to both sides of the input. Passed to Encoder and Decoder.
  • output_padding (int) – Additional size added to one side of the output shape. Passed to Decoder.
  • **kwargs – Arguments passed to the filterbank class in addition to the usual n_filters, kernel_size and stride. Depends on the filterbank family.
Returns:

Encoder, Decoder
