asteroid.models package
class asteroid.models.ConvTasNet(n_src, out_chan=None, n_blocks=8, n_repeats=3, bn_chan=128, hid_chan=512, skip_chan=128, conv_kernel_size=3, norm_type='gLN', mask_act='sigmoid', in_chan=None, fb_name='free', kernel_size=16, n_filters=512, stride=8, encoder_activation=None, **fb_kwargs)
Bases: asteroid.models.base_models.BaseEncoderMaskerDecoder
ConvTasNet separation model, as described in [1].
Parameters:
- n_src (int) – Number of sources in the input mixtures.
- out_chan (int, optional) – Number of bins in the estimated masks. If None, out_chan = in_chan.
- n_blocks (int, optional) – Number of convolutional blocks in each repeat. Defaults to 8.
- n_repeats (int, optional) – Number of repeats. Defaults to 3.
- bn_chan (int, optional) – Number of channels after the bottleneck.
- hid_chan (int, optional) – Number of channels in the convolutional blocks.
- skip_chan (int, optional) – Number of channels in the skip connections. If 0 or None, TDConvNet won’t have any skip connections and the masks will be computed from the residual output. Corresponds to the ConvTasnet architecture in v1 or the paper.
- conv_kernel_size (int, optional) – Kernel size in convolutional blocks.
- norm_type (str, optional) – To choose from 'BN', 'gLN', 'cLN'.
- mask_act (str, optional) – Non-linear function used to generate the mask.
- in_chan (int, optional) – Number of input channels, should be equal to n_filters.
- fb_name (str, className) – Filterbank family from which to make encoder and decoder. To choose among ['free', 'analytic_free', 'param_sinc', 'stft'].
- n_filters (int) – Number of filters / Input dimension of the masker net.
- kernel_size (int) – Length of the filters.
- stride (int, optional) – Stride of the convolution. If None, set to kernel_size // 2.
- **fb_kwargs (dict) – Additional kwargs to pass to the filterbank creation.
References
[1] : “Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation” TASLP 2019 Yi Luo, Nima Mesgarani https://arxiv.org/abs/1809.07454
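The channel defaults described above (out_chan falling back to in_chan, and in_chan expected to equal n_filters) can be sketched in plain Python. This is only an illustration under stated assumptions: `resolve_channels` is a hypothetical helper, not part of asteroid, and treating a missing in_chan as n_filters is an assumption that fits a 'free' filterbank.

```python
from typing import Optional, Tuple


def resolve_channels(
    n_filters: int,
    in_chan: Optional[int] = None,
    out_chan: Optional[int] = None,
) -> Tuple[int, int]:
    """Hypothetical helper (not asteroid API) mirroring the documented defaults."""
    # Assumption: when in_chan is left as None, use n_filters.
    if in_chan is None:
        in_chan = n_filters
    elif in_chan != n_filters:
        # The docs say in_chan "should be equal to n_filters".
        raise ValueError(
            f"in_chan ({in_chan}) should be equal to n_filters ({n_filters})"
        )
    # Documented rule: if out_chan is None, out_chan = in_chan.
    if out_chan is None:
        out_chan = in_chan
    return in_chan, out_chan


print(resolve_channels(n_filters=512))  # (512, 512)
```

Passing an explicit out_chan, for example `resolve_channels(512, out_chan=256)`, keeps in_chan at 512 and only changes the mask dimension.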
class asteroid.models.DPRNNTasNet(n_src, out_chan=None, bn_chan=128, hid_size=128, chunk_size=100, hop_size=None, n_repeats=6, norm_type='gLN', mask_act='sigmoid', bidirectional=True, rnn_type='LSTM', num_layers=1, dropout=0, in_chan=None, fb_name='free', kernel_size=16, n_filters=64, stride=8, encoder_activation=None, **fb_kwargs)
Bases: asteroid.models.base_models.BaseEncoderMaskerDecoder
DPRNN separation model, as described in [1].
Parameters:
- n_src (int) – Number of masks to estimate.
- out_chan (int or None) – Number of bins in the estimated masks. Defaults to in_chan.
- bn_chan (int) – Number of channels after the bottleneck. Defaults to 128.
- hid_size (int) – Number of neurons in the RNN cell state. Defaults to 128.
- chunk_size (int) – Window size of overlap and add processing. Defaults to 100.
- hop_size (int or None) – Hop size (stride) of overlap and add processing. Defaults to chunk_size // 2 (50% overlap).
- n_repeats (int) – Number of repeats. Defaults to 6.
- norm_type (str, optional) – Type of normalization to use. To choose from 'gLN' (global Layernorm) and 'cLN' (channelwise Layernorm).
- mask_act (str, optional) – Non-linear function used to generate the mask.
- bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN (Intra-Chunk is always bidirectional).
- rnn_type (str, optional) – Type of RNN used. Choose between 'RNN', 'LSTM' and 'GRU'.
- num_layers (int, optional) – Number of layers in each RNN.
- dropout (float, optional) – Dropout ratio, must be in [0,1].
- in_chan (int, optional) – Number of input channels, should be equal to n_filters.
- fb_name (str, className) – Filterbank family from which to make encoder and decoder. To choose among ['free', 'analytic_free', 'param_sinc', 'stft'].
- n_filters (int) – Number of filters / Input dimension of the masker net.
- kernel_size (int) – Length of the filters.
- stride (int, optional) – Stride of the convolution. If None, set to kernel_size // 2.
- **fb_kwargs (dict) – Additional kwargs to pass to the filterbank creation.
References
[1] “Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation”, Yi Luo, Zhuo Chen and Takuya Yoshioka. https://arxiv.org/abs/1910.06379
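To make the chunk_size and hop_size parameters concrete, here is a minimal sketch of splitting a sequence of frames into overlapping chunks, with hop_size defaulting to chunk_size // 2 as documented. `split_into_chunks` is a hypothetical helper, not asteroid's implementation. It assumes no padding, so trailing frames that do not fill a complete chunk are dropped; the actual model may pad instead.

```python
from typing import List, Optional, Sequence


def split_into_chunks(
    frames: Sequence[float],
    chunk_size: int = 100,
    hop_size: Optional[int] = None,
) -> List[List[float]]:
    """Hypothetical helper: overlapping chunks without padding."""
    # Documented default: 50% overlap between consecutive chunks.
    if hop_size is None:
        hop_size = chunk_size // 2
    if chunk_size <= 0 or hop_size <= 0:
        raise ValueError("chunk_size and hop_size must be positive")
    chunks = []
    # Only start positions that leave room for a full chunk are used.
    for start in range(0, len(frames) - chunk_size + 1, hop_size):
        chunks.append(list(frames[start:start + chunk_size]))
    return chunks


chunks = split_into_chunks(list(range(10)), chunk_size=4)
print(len(chunks))  # 4
print(chunks[1])    # [2, 3, 4, 5]
```

With chunk_size=4 the hop is 2, so each chunk shares its last two frames with the next one.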
class asteroid.models.SuDORMRFImprovedNet(n_src, bn_chan=128, num_blocks=16, upsampling_depth=4, mask_act='relu', in_chan=None, fb_name='free', kernel_size=21, n_filters=512, stride=None, **fb_kwargs)
Bases: asteroid.models.base_models.BaseEncoderMaskerDecoder
Improved SuDORMRF separation model, as described in [1].
Parameters:
- n_src (int) – Number of sources in the input mixtures.
- bn_chan (int, optional) – Number of bins in the bottleneck layer and the UNet blocks.
- num_blocks (int) – Number of UBlocks.
- upsampling_depth (int) – Depth of upsampling.
- mask_act (str) – Name of output activation.
- in_chan (int, optional) – Number of input channels, should be equal to n_filters.
- fb_name (str, className) – Filterbank family from which to make encoder and decoder. To choose among ['free', 'analytic_free', 'param_sinc', 'stft'].
- n_filters (int) – Number of filters / Input dimension of the masker net.
- kernel_size (int) – Length of the filters.
- stride (int, optional) – Stride of the convolution. If None (default), set to kernel_size // 2.
- **fb_kwargs (dict) – Additional kwargs to pass to the filterbank creation.
References
[1] “Sudo rm -rf: Efficient Networks for Universal Audio Source Separation”, Tzinis et al. MLSP 2020.
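The stride rule above (None resolves to kernel_size // 2) can be sketched as follows, together with the number of encoder frames it would yield. Both functions are hypothetical helpers, not asteroid API. The frame count uses the standard output-length formula for an unpadded strided 1-D convolution, and assuming the encoder applies no padding is a simplification that may not hold for the actual model.

```python
from typing import Optional


def resolve_stride(kernel_size: int, stride: Optional[int] = None) -> int:
    """Hypothetical helper: documented default of kernel_size // 2."""
    if stride is None:
        return kernel_size // 2
    return stride


def count_frames(
    n_samples: int, kernel_size: int, stride: Optional[int] = None
) -> int:
    """Frames produced by an unpadded strided 1-D convolution (assumption)."""
    hop = resolve_stride(kernel_size, stride)
    if n_samples < kernel_size:
        return 0
    return (n_samples - kernel_size) // hop + 1


print(resolve_stride(21))                  # 10
print(count_frames(8000, kernel_size=21))  # 798
```

With the signature defaults shown here (kernel_size=21, stride=None), the effective stride is 10 samples.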
class asteroid.models.SuDORMRFNet(n_src, bn_chan=128, num_blocks=16, upsampling_depth=4, mask_act='softmax', in_chan=None, fb_name='free', kernel_size=21, n_filters=512, stride=None, **fb_kwargs)
Bases: asteroid.models.base_models.BaseEncoderMaskerDecoder
SuDORMRF separation model, as described in [1].
Parameters:
- n_src (int) – Number of sources in the input mixtures.
- bn_chan (int, optional) – Number of bins in the bottleneck layer and the UNet blocks.
- num_blocks (int) – Number of UBlocks.
- upsampling_depth (int) – Depth of upsampling.
- mask_act (str) – Name of output activation.
- in_chan (int, optional) – Number of input channels, should be equal to n_filters.
- fb_name (str, className) – Filterbank family from which to make encoder and decoder. To choose among ['free', 'analytic_free', 'param_sinc', 'stft'].
- n_filters (int) – Number of filters / Input dimension of the masker net.
- kernel_size (int) – Length of the filters.
- stride (int, optional) – Stride of the convolution. If None (default), set to kernel_size // 2.
- **fb_kwargs (dict) – Additional kwargs to pass to the filterbank creation.
References
[1] “Sudo rm -rf: Efficient Networks for Universal Audio Source Separation”, Tzinis et al. MLSP 2020.
class asteroid.models.DPTNet(n_src, ff_hid=256, chunk_size=100, hop_size=None, n_repeats=6, norm_type='gLN', ff_activation='relu', encoder_activation='relu', mask_act='relu', bidirectional=True, dropout=0, in_chan=None, fb_name='free', kernel_size=16, n_filters=64, stride=8, **fb_kwargs)
Bases: asteroid.models.base_models.BaseEncoderMaskerDecoder
DPTNet separation model, as described in [1].
Parameters:
- n_src (int) – Number of masks to estimate.
- out_chan (int or None) – Number of bins in the estimated masks. Defaults to in_chan.
- bn_chan (int) – Number of channels after the bottleneck. Defaults to 128.
- hid_size (int) – Number of neurons in the RNN cell state. Defaults to 128.
- chunk_size (int) – Window size of overlap and add processing. Defaults to 100.
- hop_size (int or None) – Hop size (stride) of overlap and add processing. Defaults to chunk_size // 2 (50% overlap).
- n_repeats (int) – Number of repeats. Defaults to 6.
- norm_type (str, optional) – Type of normalization to use. To choose from 'gLN' (global Layernorm) and 'cLN' (channelwise Layernorm).
- mask_act (str, optional) – Non-linear function used to generate the mask.
- bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN (Intra-Chunk is always bidirectional).
- rnn_type (str, optional) – Type of RNN used. Choose between 'RNN', 'LSTM' and 'GRU'.
- num_layers (int, optional) – Number of layers in each RNN.
- dropout (float, optional) – Dropout ratio, must be in [0,1].
- in_chan (int, optional) – Number of input channels, should be equal to n_filters.
- fb_name (str, className) – Filterbank family from which to make encoder and decoder. To choose among ['free', 'analytic_free', 'param_sinc', 'stft'].
- n_filters (int) – Number of filters / Input dimension of the masker net.
- kernel_size (int) – Length of the filters.
- stride (int, optional) – Stride of the convolution. If None, set to kernel_size // 2.
- **fb_kwargs (dict) – Additional kwargs to pass to the filterbank creation.
References
[1] Jingjing Chen et al. “Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation”, Interspeech 2020.
class asteroid.models.LSTMTasNet(n_src, out_chan=None, rnn_type='lstm', n_layers=4, hid_size=512, dropout=0.3, mask_act='sigmoid', bidirectional=True, in_chan=None, fb_name='free', n_filters=64, kernel_size=16, stride=8, encoder_activation=None, **fb_kwargs)
Bases: asteroid.models.base_models.BaseEncoderMaskerDecoder
TasNet separation model, as described in [1].
Parameters:
- n_src (int) – Number of masks to estimate.
- out_chan (int or None) – Number of bins in the estimated masks. Defaults to in_chan.
- hid_size (int) – Number of neurons in the RNN cell state. Defaults to 512.
- mask_act (str, optional) – Non-linear function used to generate the mask.
- bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN (Intra-Chunk is always bidirectional).
- rnn_type (str, optional) – Type of RNN used. Choose between 'RNN', 'LSTM' and 'GRU'.
- n_layers (int, optional) – Number of layers in each RNN.
- dropout (float, optional) – Dropout ratio, must be in [0,1].
- in_chan (int, optional) – Number of input channels, should be equal to n_filters.
- fb_name (str, className) – Filterbank family from which to make encoder and decoder. To choose among ['free', 'analytic_free', 'param_sinc', 'stft'].
- n_filters (int) – Number of filters / Input dimension of the masker net.
- kernel_size (int) – Length of the filters.
- stride (int, optional) – Stride of the convolution. If None, set to kernel_size // 2.
- **fb_kwargs (dict) – Additional kwargs to pass to the filterbank creation.
References
[1] Yi Luo et al. “Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network”, Interspeech 2018.
class asteroid.models.DeMask(input_type='mag', output_type='mag', hidden_dims=[1024], dropout=0, activation='relu', mask_act='relu', norm_type='gLN', fb_type='stft', n_filters=512, stride=256, kernel_size=512, **fb_kwargs)
Bases: asteroid.models.base_models.BaseModel
Simple MLP model for surgical mask speech enhancement. A transformed-domain masking approach is used.
Parameters:
- input_type (str, optional) – Whether the magnitude spectrogram “mag” or both real and imaginary parts “reim” are passed as features to the masker network. A concatenation of “mag” and “reim” can also be used by passing “cat”.
- output_type (str, optional) – Whether the masker outputs a mask for the magnitude spectrogram “mag” or for both real and imaginary parts “reim”.
- hidden_dims (list, optional) – List of MLP hidden layer sizes.
- dropout (float, optional) – Dropout probability.
- activation (str, optional) – Type of activation used in hidden MLP layers.
- mask_act (str, optional) – Non-linear function used to generate the mask.
- norm_type (str, optional) – To choose from 'BN', 'gLN', 'cLN'.
- fb_name (str) – Type of analysis and synthesis filterbanks used, choose between [“stft”, “free”, “analytic_free”].
- n_filters (int) – Number of filters in the analysis and synthesis filterbanks.
- stride (int) – Filterbank filters stride.
- kernel_size (int) – Length of filters in the filterbank.
- encoder_activation (str) –
- **fb_kwargs (dict) – Additional kwargs to pass to the filterbank creation.
class asteroid.models.DCUNet(architecture, stft_kernel_size=512, stft_stride=None, masknet_kwargs=None)
Bases: asteroid.models.dcunet.BaseDCUNet
DCUNet as proposed in [1].
References
[1] : “Phase-aware Speech Enhancement with Deep Complex U-Net”, Hyeong-Seok Choi et al. https://arxiv.org/abs/1903.03107
masknet_class
class asteroid.models.DCCRNet(*args, stft_kernel_size=512, masknet_kwargs=None, **kwargs)
Bases: asteroid.models.dcunet.BaseDCUNet
DCCRNet as proposed in [1].
References
[1] : “DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement”, Yanxin Hu et al. https://arxiv.org/abs/2008.00264
masknet_class
asteroid.models.save_publishable(publish_dir, model_dict, metrics=None, train_conf=None, recipe=None)
Save models to prepare for publication / model sharing.
Parameters:
- publish_dir (str) – Path to the publishing directory. Usually under exp/exp_name/publish_dir.
- model_dict (dict) – dict with at least the keys model_args, state_dict, dataset and licenses.
- metrics (dict) – dict with evaluation metrics.
- train_conf (dict) – Training configuration dict (from conf.yml).
- recipe (str) – Name of the recipe.
Returns: dict, same as model_dict with added fields.
Raises: AssertionError – when any of model_args, state_dict, dataset or licenses is not present in model_dict.keys().
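The key check described in the Raises entry can be illustrated with a short standalone sketch. `check_model_dict` and `REQUIRED_KEYS` are hypothetical names introduced here for illustration; this is not asteroid's implementation, only one way such a validation might look given the four keys listed above.

```python
from typing import Any, Dict

# Keys named in the documentation above; the constant itself is hypothetical.
REQUIRED_KEYS = ("model_args", "state_dict", "dataset", "licenses")


def check_model_dict(model_dict: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical check: raise AssertionError if a required key is missing."""
    missing = [key for key in REQUIRED_KEYS if key not in model_dict]
    assert not missing, f"Missing keys in model_dict: {missing}"
    return model_dict


# Placeholder values, only meant to show the expected top-level keys.
example = {
    "model_args": {"n_src": 2},
    "state_dict": {},
    "dataset": "my_dataset",
    "licenses": {},
}
check_model_dict(example)  # passes silently
```

Removing any of the four keys from `example` makes the call raise an AssertionError that names the missing key.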
asteroid.models.upload_publishable(publish_dir, uploader=None, affiliation=None, git_username=None, token=None, force_publish=False, use_sandbox=False, unit_test=False)
Entry point to upload publishable model.
Parameters:
- publish_dir (str) – Path to the publishing directory. Usually under exp/exp_name/publish_dir.
- uploader (str) – Full name of the uploader (Ex: Manuel Pariente)
- affiliation (str, optional) – Affiliation (no accent).
- git_username (str, optional) – GitHub username.
- token (str) – Access token generated to upload depositions.
- force_publish (bool) – Whether to directly publish without asking confirmation before. Defaults to False.
- use_sandbox (bool) – Whether to use Zenodo’s sandbox instead of the official Zenodo.
- unit_test (bool) – If True, we do not ask user input and do not publish.
Submodules
- asteroid.models.base_models module
- asteroid.models.conv_tasnet module
- asteroid.models.dccrnet module
- asteroid.models.dcunet module
- asteroid.models.demask module
- asteroid.models.dprnn_tasnet module
- asteroid.models.dptnet module
- asteroid.models.lstm_tasnet module
- asteroid.models.publisher module
- asteroid.models.sudormrf module
- asteroid.models.zenodo module