asteroid.models package

class asteroid.models.ConvTasNet(n_src, out_chan=None, n_blocks=8, n_repeats=3, bn_chan=128, hid_chan=512, skip_chan=128, conv_kernel_size=3, norm_type='gLN', mask_act='sigmoid', in_chan=None, fb_name='free', kernel_size=16, n_filters=512, stride=8, encoder_activation='relu', **fb_kwargs)[source]

Bases: asteroid.models.base_models.BaseTasNet

ConvTasNet separation model, as described in [1].

Parameters:
  • n_src (int) – Number of sources in the input mixtures.
  • out_chan (int, optional) – Number of bins in the estimated masks. If None, out_chan = in_chan.
  • n_blocks (int, optional) – Number of convolutional blocks in each repeat. Defaults to 8.
  • n_repeats (int, optional) – Number of repeats. Defaults to 3.
  • bn_chan (int, optional) – Number of channels after the bottleneck.
  • hid_chan (int, optional) – Number of channels in the convolutional blocks.
  • skip_chan (int, optional) – Number of channels in the skip connections. If 0 or None, TDConvNet won’t have any skip connections and the masks will be computed from the residual output. Corresponds to the ConvTasNet architecture in v1 of the paper.
  • conv_kernel_size (int, optional) – Kernel size in convolutional blocks.
  • norm_type (str, optional) – To choose from 'BN', 'gLN', 'cLN'.
  • mask_act (str, optional) – Which non-linear function to use to generate the masks.
  • in_chan (int, optional) – Number of input channels, should be equal to n_filters.
  • fb_name (str, className) – Filterbank family from which to make encoder and decoder. To choose among ['free', 'analytic_free', 'param_sinc', 'stft'].
  • n_filters (int) – Number of filters / Input dimension of the masker net.
  • kernel_size (int) – Length of the filters.
  • stride (int, optional) – Stride of the convolution. If None (default), set to kernel_size // 2.
  • **fb_kwargs (dict) – Additional kwargs to pass to the filterbank creation.

References

[1] Yi Luo, Nima Mesgarani, “Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation”, TASLP 2019. https://arxiv.org/abs/1809.07454
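
Example (an illustrative sketch; the (batch, time) input and (batch, n_src, time) output shapes are assumptions based on the usual asteroid waveform convention):

>>> import torch
>>> from asteroid.models import ConvTasNet
>>> model = ConvTasNet(n_src=2)
>>> mixture = torch.randn(1, 8000)   # assumed (batch, time) mixture waveform
>>> est_sources = model(mixture)     # assumed (batch, n_src, time) source estimates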

class asteroid.models.DPRNNTasNet(n_src, out_chan=None, bn_chan=128, hid_size=128, chunk_size=100, hop_size=None, n_repeats=6, norm_type='gLN', mask_act='sigmoid', bidirectional=True, rnn_type='LSTM', num_layers=1, dropout=0, in_chan=None, fb_name='free', kernel_size=16, n_filters=64, stride=8, encoder_activation='relu', **fb_kwargs)[source]

Bases: asteroid.models.base_models.BaseTasNet

DPRNN separation model, as described in [1].

Parameters:
  • n_src (int) – Number of masks to estimate.
  • out_chan (int or None) – Number of bins in the estimated masks. Defaults to in_chan.
  • bn_chan (int) – Number of channels after the bottleneck. Defaults to 128.
  • hid_size (int) – Number of neurons in the RNNs cell state. Defaults to 128.
  • chunk_size (int) – Window size of overlap-and-add processing. Defaults to 100.
  • hop_size (int or None) – Hop size (stride) of overlap-and-add processing. Defaults to chunk_size // 2 (50% overlap).
  • n_repeats (int) – Number of repeats. Defaults to 6.
  • norm_type (str, optional) –

    Type of normalization to use. To choose from

    • 'gLN': global Layernorm
    • 'cLN': channelwise Layernorm
  • mask_act (str, optional) – Which non-linear function to use to generate the masks.
  • bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN (Intra-Chunk is always bidirectional).
  • rnn_type (str, optional) – Type of RNN used. Choose between 'RNN', 'LSTM' and 'GRU'.
  • num_layers (int, optional) – Number of layers in each RNN.
  • dropout (float, optional) – Dropout ratio, must be in [0,1].
  • in_chan (int, optional) – Number of input channels, should be equal to n_filters.
  • fb_name (str, className) – Filterbank family from which to make encoder and decoder. To choose among ['free', 'analytic_free', 'param_sinc', 'stft'].
  • n_filters (int) – Number of filters / Input dimension of the masker net.
  • kernel_size (int) – Length of the filters.
  • stride (int, optional) – Stride of the convolution. If None (default), set to kernel_size // 2.
  • **fb_kwargs (dict) – Additional kwargs to pass to the filterbank creation.

References

[1] Yi Luo, Zhuo Chen, Takuya Yoshioka, “Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation”. https://arxiv.org/abs/1910.06379
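
Example (an illustrative sketch; the parameter values are arbitrary and the I/O shapes are assumed to follow the usual asteroid convention):

>>> import torch
>>> from asteroid.models import DPRNNTasNet
>>> # chunk_size/hop_size only need to be set when tuning memory or latency; 50% overlap is the default.
>>> model = DPRNNTasNet(n_src=2, chunk_size=100, hop_size=50)
>>> est_sources = model(torch.randn(1, 8000))   # assumed (batch, n_src, time)
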
class asteroid.models.SuDORMRFImprovedNet(n_src, bn_chan=128, num_blocks=16, upsampling_depth=4, mask_act='relu', in_chan=None, fb_name='free', kernel_size=21, n_filters=512, stride=None, **fb_kwargs)[source]

Bases: asteroid.models.base_models.BaseTasNet

Improved SuDORMRF separation model, as described in [1].

Parameters:
  • n_src (int) – Number of sources in the input mixtures.
  • bn_chan (int, optional) – Number of bins in the bottleneck layer and the UNet blocks.
  • num_blocks (int) – Number of UBlocks.
  • upsampling_depth (int) – Depth of upsampling.
  • mask_act (str) – Name of output activation.
  • in_chan (int, optional) – Number of input channels, should be equal to n_filters.
  • fb_name (str, className) – Filterbank family from which to make encoder and decoder. To choose among ['free', 'analytic_free', 'param_sinc', 'stft'].
  • n_filters (int) – Number of filters / Input dimension of the masker net.
  • kernel_size (int) – Length of the filters.
  • stride (int, optional) – Stride of the convolution. If None (default), set to kernel_size // 2.
  • **fb_kwargs (dict) – Additional kwargs to pass to the filterbank creation.

References

[1] Tzinis et al., “Sudo rm -rf: Efficient Networks for Universal Audio Source Separation”, MLSP 2020.
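
Example (illustrative only; the forward-pass convention and the reduced num_blocks value are assumptions for the sketch):

>>> import torch
>>> from asteroid.models import SuDORMRFImprovedNet
>>> model = SuDORMRFImprovedNet(n_src=2, num_blocks=8)   # fewer UBlocks than the default 16
>>> est_sources = model(torch.randn(1, 8000))            # assumed (batch, n_src, time)
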
class asteroid.models.SuDORMRFNet(n_src, bn_chan=128, num_blocks=16, upsampling_depth=4, mask_act='softmax', in_chan=None, fb_name='free', kernel_size=21, n_filters=512, stride=None, **fb_kwargs)[source]

Bases: asteroid.models.base_models.BaseTasNet

SuDORMRF separation model, as described in [1].

Parameters:
  • n_src (int) – Number of sources in the input mixtures.
  • bn_chan (int, optional) – Number of bins in the bottleneck layer and the UNet blocks.
  • num_blocks (int) – Number of UBlocks.
  • upsampling_depth (int) – Depth of upsampling.
  • mask_act (str) – Name of output activation.
  • in_chan (int, optional) – Number of input channels, should be equal to n_filters.
  • fb_name (str, className) – Filterbank family from which to make encoder and decoder. To choose among ['free', 'analytic_free', 'param_sinc', 'stft'].
  • n_filters (int) – Number of filters / Input dimension of the masker net.
  • kernel_size (int) – Length of the filters.
  • stride (int, optional) – Stride of the convolution. If None (default), set to kernel_size // 2.
  • **fb_kwargs (dict) – Additional kwargs to pass to the filterbank creation.

References

[1] Tzinis et al., “Sudo rm -rf: Efficient Networks for Universal Audio Source Separation”, MLSP 2020.
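
Example (illustrative sketch; based on the two signatures above, the only default that differs from SuDORMRFImprovedNet is the mask activation, 'softmax' here versus 'relu' there):

>>> from asteroid.models import SuDORMRFNet
>>> model = SuDORMRFNet(n_src=2)   # uses the default softmax mask activation
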
class asteroid.models.DPTNet(n_src, ff_hid=256, chunk_size=100, hop_size=None, n_repeats=6, norm_type='gLN', ff_activation='relu', encoder_activation='relu', mask_act='relu', bidirectional=True, dropout=0, in_chan=None, fb_name='free', kernel_size=16, n_filters=64, stride=8, **fb_kwargs)[source]

Bases: asteroid.models.base_models.BaseTasNet

DPTNet separation model, as described in [1].

Parameters:
  • n_src (int) – Number of masks to estimate.
  • ff_hid (int) – Number of neurons in the RNN cell state of the improved transformer layers. Defaults to 256.
  • chunk_size (int) – Window size of overlap-and-add processing. Defaults to 100.
  • hop_size (int or None) – Hop size (stride) of overlap-and-add processing. Defaults to chunk_size // 2 (50% overlap).
  • n_repeats (int) – Number of repeats. Defaults to 6.
  • norm_type (str, optional) –

    Type of normalization to use. To choose from

    • 'gLN': global Layernorm
    • 'cLN': channelwise Layernorm
  • ff_activation (str, optional) – Activation function applied at the output of the RNN in the improved transformer layers.
  • encoder_activation (str, optional) – Activation applied to the encoder output.
  • mask_act (str, optional) – Which non-linear function to use to generate the masks.
  • bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN (Intra-Chunk is always bidirectional).
  • dropout (float, optional) – Dropout ratio, must be in [0,1].
  • in_chan (int, optional) – Number of input channels, should be equal to n_filters.
  • fb_name (str, className) – Filterbank family from which to make encoder and decoder. To choose among ['free', 'analytic_free', 'param_sinc', 'stft'].
  • n_filters (int) – Number of filters / Input dimension of the masker net.
  • kernel_size (int) – Length of the filters.
  • stride (int, optional) – Stride of the convolution. If None (default), set to kernel_size // 2.
  • **fb_kwargs (dict) – Additional kwargs to pass to the filterbank creation.

References

[1] Jingjing Chen et al., “Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation”, Interspeech 2020.
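
Example (an illustrative sketch, assuming the usual asteroid waveform-in / waveform-out convention):

>>> import torch
>>> from asteroid.models import DPTNet
>>> model = DPTNet(n_src=2, ff_hid=256, chunk_size=100)
>>> est_sources = model(torch.randn(1, 8000))   # assumed (batch, n_src, time)
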
class asteroid.models.LSTMTasNet(n_src, out_chan=None, rnn_type='lstm', n_layers=4, hid_size=512, dropout=0.3, mask_act='sigmoid', bidirectional=True, in_chan=None, fb_name='free', n_filters=64, kernel_size=16, stride=8, encoder_activation=None, **fb_kwargs)[source]

Bases: asteroid.models.base_models.BaseTasNet

TasNet separation model, as described in [1].

Parameters:
  • n_src (int) – Number of masks to estimate.
  • out_chan (int or None) – Number of bins in the estimated masks. Defaults to in_chan.
  • hid_size (int) – Number of neurons in the RNNs cell state. Defaults to 512.
  • mask_act (str, optional) – Which non-linear function to use to generate the masks.
  • bidirectional (bool, optional) – Whether the RNN layers are bidirectional.
  • rnn_type (str, optional) – Type of RNN used. Choose between 'RNN', 'LSTM' and 'GRU'.
  • n_layers (int, optional) – Number of layers in each RNN.
  • dropout (float, optional) – Dropout ratio, must be in [0,1].
  • in_chan (int, optional) – Number of input channels, should be equal to n_filters.
  • fb_name (str, className) – Filterbank family from which to make encoder and decoder. To choose among ['free', 'analytic_free', 'param_sinc', 'stft'].
  • n_filters (int) – Number of filters / Input dimension of the masker net.
  • kernel_size (int) – Length of the filters.
  • stride (int, optional) – Stride of the convolution. If None (default), set to kernel_size // 2.
  • **fb_kwargs (dict) – Additional kwargs to pass to the filterbank creation.

References

[1] Yi Luo et al., “Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network”, Interspeech 2018.
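
Example (illustrative sketch; the parameter values and I/O shapes are assumptions):

>>> import torch
>>> from asteroid.models import LSTMTasNet
>>> model = LSTMTasNet(n_src=2, n_layers=2, dropout=0.3)
>>> est_sources = model(torch.randn(1, 8000))   # assumed (batch, n_src, time)
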
asteroid.models.save_publishable(publish_dir, model_dict, metrics=None, train_conf=None)[source]

Save models to prepare for publication / model sharing.

Parameters:
  • publish_dir (str) – Path to the publishing directory. Usually under exp/exp_name/publish_dir
  • model_dict (dict) – Dict with at least the keys model_args, state_dict, dataset and licenses.
  • metrics (dict) – dict with evaluation metrics.
  • train_conf (dict) – Training configuration dict (from conf.yml).
Returns:

dict, same as model_dict with added fields.

Raises:

AssertionError when any of model_args, state_dict, dataset or licenses is not present in model_dict.keys().
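
Example (a hedged sketch of assembling model_dict by hand; the key names come from the argument description above, but the values shown are placeholders, not a working publication config):

>>> from asteroid.models import ConvTasNet, save_publishable
>>> model = ConvTasNet(n_src=2)
>>> model_dict = {
...     "model_args": {"n_src": 2},
...     "state_dict": model.state_dict(),
...     "dataset": "LibriMix",   # placeholder dataset name
...     "licenses": [],          # placeholder; fill in the licenses of your training data
... }
>>> publishable = save_publishable("exp/my_exp/publish_dir", model_dict,
...                                metrics={"si_sdr": 15.0}, train_conf={})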

asteroid.models.upload_publishable(publish_dir, uploader=None, affiliation=None, git_username=None, token=None, force_publish=False, use_sandbox=False, unit_test=False)[source]

Entry point to upload a publishable model.

Parameters:
  • publish_dir (str) – Path to the publishing directory. Usually under exp/exp_name/publish_dir
  • uploader (str) – Full name of the uploader (e.g. Manuel Pariente).
  • affiliation (str, optional) – Affiliation (no accent).
  • git_username (str, optional) – GitHub username.
  • token (str) – Access token generated to upload depositions.
  • force_publish (bool) – Whether to directly publish without asking confirmation before. Defaults to False.
  • use_sandbox (bool) – Whether to use Zenodo’s sandbox instead of the official Zenodo.
  • unit_test (bool) – If True, we do not ask user input and do not publish.
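
Example (an illustrative call against Zenodo’s sandbox; the token and personal details are placeholders):

>>> from asteroid.models import upload_publishable
>>> upload_publishable(
...     "exp/my_exp/publish_dir",
...     uploader="Jane Doe",
...     affiliation="Some University",
...     git_username="janedoe",
...     token="YOUR_ZENODO_TOKEN",   # placeholder access token
...     use_sandbox=True,            # test against the sandbox before publishing for real
... )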