DNN building blocks
Mask estimators
Ready-to-use
asteroid.masknn.blocks.TDConvNet(*args, **kwargs)
Temporal convolutional network used in ConvTasNet.
Parameters:
- in_chan (int) – Number of input filters.
- n_src (int) – Number of masks to estimate.
- out_chan (int, optional) – Number of bins in the estimated masks. If None, out_chan = in_chan.
- n_blocks (int, optional) – Number of convolutional blocks in each repeat. Defaults to 8.
- n_repeats (int, optional) – Number of repeats. Defaults to 3.
- bn_chan (int, optional) – Number of channels after the bottleneck.
- hid_chan (int, optional) – Number of channels in the convolutional blocks.
- skip_chan (int, optional) – Number of channels in the skip connections. If 0 or None, TDConvNet has no skip connections and the masks are computed from the residual output. This corresponds to the ConvTasNet architecture in v1 of the paper.
- conv_kernel_size (int, optional) – Kernel size in convolutional blocks.
- norm_type (str, optional) – Type of normalization to use. To choose from 'BN', 'gLN', 'cLN'.
- mask_act (str, optional) – Non-linear function used to generate the masks.
References
[1] “Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation”, Yi Luo and Nima Mesgarani, TASLP 2019. https://arxiv.org/abs/1809.07454
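To get a feel for how n_blocks, n_repeats and conv_kernel_size interact, the sketch below estimates the temporal receptive field of such a network. It assumes that block x of every repeat uses a dilation of 2**x, as described in the Conv-TasNet paper; that dilation schedule, the helper name tcn_receptive_field and the kernel size of 3 are assumptions for illustration, not something stated in this reference.

```python
def tcn_receptive_field(n_blocks=8, n_repeats=3, conv_kernel_size=3):
    """Receptive field, in frames, of a stack of dilated 1-D conv blocks.

    Assumption: block x of every repeat uses dilation 2**x (Conv-TasNet
    paper). The kernel size default of 3 is also an assumption.
    """
    receptive_field = 1
    for _ in range(n_repeats):
        for x in range(n_blocks):
            dilation = 2 ** x
            # Each dilated convolution widens the field by (kernel - 1) * dilation.
            receptive_field += (conv_kernel_size - 1) * dilation
    return receptive_field


if __name__ == "__main__":
    # Documented defaults for n_blocks and n_repeats, assumed kernel size of 3.
    print(tcn_receptive_field())
    # A smaller, hypothetical configuration for comparison.
    print(tcn_receptive_field(n_blocks=4, n_repeats=2))
```

Under these assumptions, the documented defaults (n_blocks=8, n_repeats=3) cover 1531 frames, which shows why adding repeats grows the context linearly while adding blocks grows it exponentially.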
asteroid.masknn.blocks.DPRNN(*args, **kwargs)
Dual-path RNN network for single-channel source separation, introduced in [1].
Parameters:
- in_chan (int) – Number of input filters.
- n_src (int) – Number of masks to estimate.
- out_chan (int or None) – Number of bins in the estimated masks. Defaults to in_chan.
- bn_chan (int) – Number of channels after the bottleneck. Defaults to 128.
- hid_size (int) – Number of neurons in the RNN cell state. Defaults to 128.
- chunk_size (int) – Window size of overlap-and-add processing. Defaults to 100.
- hop_size (int or None) – Hop size (stride) of overlap-and-add processing. Defaults to chunk_size // 2 (50% overlap).
- n_repeats (int) – Number of repeats. Defaults to 6.
- norm_type (str, optional) – Type of normalization to use. To choose from
  - 'gLN': global Layernorm
  - 'cLN': channelwise Layernorm
- mask_act (str, optional) – Non-linear function used to generate the masks.
- bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN (Intra-Chunk is always bidirectional).
- rnn_type (str, optional) – Type of RNN used. Choose between 'RNN', 'LSTM' and 'GRU'.
- num_layers (int, optional) – Number of layers in each RNN.
- dropout (float, optional) – Dropout ratio, must be in [0, 1].
References
[1] “Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation”, Yi Luo, Zhuo Chen and Takuya Yoshioka. https://arxiv.org/abs/1910.06379
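The chunk_size and hop_size parameters describe how the frame axis is cut into overlapping chunks and later recombined by overlap-add. The sketch below illustrates that idea on a plain Python list. The helpers segment and overlap_add are hypothetical, the zero-padding of the tail and the averaging of overlapped samples are choices made here so that reconstruction is exact, and none of it is taken from the library's implementation.

```python
def segment(frames, chunk_size, hop_size=None):
    """Split a 1-D list into overlapping chunks, zero-padding the tail."""
    # Mirror the documented default: 50% overlap when hop_size is None.
    hop_size = hop_size or chunk_size // 2
    # Ceil division to count the hops needed to reach the end of the sequence.
    n_chunks = max(1, -(-(len(frames) - chunk_size) // hop_size) + 1)
    padded_len = (n_chunks - 1) * hop_size + chunk_size
    padded = list(frames) + [0.0] * (padded_len - len(frames))
    return [padded[i * hop_size:i * hop_size + chunk_size] for i in range(n_chunks)]


def overlap_add(chunks, hop_size, length):
    """Recombine chunks, averaging samples that are covered more than once."""
    chunk_size = len(chunks[0])
    total = [0.0] * ((len(chunks) - 1) * hop_size + chunk_size)
    count = [0] * len(total)
    for i, chunk in enumerate(chunks):
        for j, value in enumerate(chunk):
            total[i * hop_size + j] += value
            count[i * hop_size + j] += 1
    # Trim the zero-padding added by segment().
    return [t / c for t, c in zip(total, count)][:length]


if __name__ == "__main__":
    frames = [float(i) for i in range(11)]
    chunks = segment(frames, chunk_size=4)          # hop_size falls back to 2
    restored = overlap_add(chunks, hop_size=2, length=len(frames))
    print(len(chunks), restored == frames)
```

In a dual-path model the intra-chunk RNN would run along each chunk and the inter-chunk RNN across chunks, so chunk_size trades off the lengths of those two sequences.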
Layers
asteroid.masknn.blocks.Conv1DBlock(*args, **kwargs)
One-dimensional convolutional block, as proposed in [1].
Parameters:
- in_chan (int) – Number of input channels.
- hid_chan (int) – Number of hidden channels in the depth-wise convolution.
- skip_out_chan (int) – Number of channels in the skip convolution. If 0 or None, Conv1DBlock has no skip connections, which corresponds to the block in v1 of the paper. In this case, forward returns res instead of [res, skip].
- kernel_size (int) – Size of the depth-wise convolutional kernel.
- padding (int) – Padding of the depth-wise convolution.
- dilation (int) – Dilation of the depth-wise convolution.
- norm_type (str, optional) – Type of normalization to use. To choose from
  - 'gLN': global Layernorm
  - 'cLN': channelwise Layernorm
  - 'cgLN': cumulative global Layernorm
References
[1] “Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation”, Yi Luo and Nima Mesgarani, TASLP 2019. https://arxiv.org/abs/1809.07454
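Because kernel_size, padding and dilation are passed separately, it can help to check that a given combination keeps the number of frames unchanged. The sketch below uses the standard output-length formula for 1-D convolutions (the one documented for torch.nn.Conv1d). The helper names, and the rule padding = (kernel_size - 1) * dilation // 2 for an odd kernel and stride 1, are illustrative assumptions rather than part of this API.

```python
def conv1d_out_len(n_frames, kernel_size, dilation=1, padding=0, stride=1):
    """Output length of a 1-D convolution (standard formula)."""
    return (n_frames + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def same_padding(kernel_size, dilation=1):
    """Padding that preserves length for an odd kernel_size and stride 1."""
    return (kernel_size - 1) * dilation // 2


if __name__ == "__main__":
    n_frames, kernel_size = 100, 3
    for dilation in (1, 2, 4, 8):
        padding = same_padding(kernel_size, dilation)
        out_len = conv1d_out_len(n_frames, kernel_size, dilation, padding)
        print(dilation, padding, out_len)
```

Keeping the length fixed matters when residual and skip outputs from several blocks are summed, since all of them must share the same number of frames.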
asteroid.masknn.blocks.SingleRNN(*args, **kwargs)
Module for an RNN block.
Inspired by https://github.com/yluo42/TAC/blob/master/utility/models.py, licensed under CC BY-NC-SA 3.0 US.
Parameters:
- rnn_type (str) – Select from 'RNN', 'LSTM', 'GRU'. Can also be passed in lowercase letters.
- input_size (int) – Dimension of the input feature. The input should have shape [batch, seq_len, input_size].
- hidden_size (int) – Dimension of the hidden state.
- n_layers (int, optional) – Number of layers used in RNN. Default is 1.
- dropout (float, optional) – Dropout ratio. Default is 0.
- bidirectional (bool, optional) – Whether the RNN layers are bidirectional. Default is False.
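The interplay between hidden_size and bidirectional is easiest to see from the expected output shape. The sketch below assumes the block wraps a standard batch-first recurrent layer, whose output feature size is hidden_size times the number of directions (as for torch.nn.LSTM). The helper rnn_output_shape and its validation logic are hypothetical and only mirror the parameter descriptions above.

```python
VALID_RNN_TYPES = ("RNN", "LSTM", "GRU")


def rnn_output_shape(rnn_type, batch, seq_len, hidden_size, bidirectional=False):
    """Return the expected output shape [batch, seq_len, features]."""
    # The parameter description says lowercase names are accepted.
    name = rnn_type.upper()
    if name not in VALID_RNN_TYPES:
        raise ValueError(f"rnn_type must be one of {VALID_RNN_TYPES}, got {rnn_type!r}")
    # Assumption: a bidirectional layer concatenates forward and backward states.
    n_directions = 2 if bidirectional else 1
    return (batch, seq_len, n_directions * hidden_size)


if __name__ == "__main__":
    print(rnn_output_shape("lstm", batch=4, seq_len=50, hidden_size=128))
    print(rnn_output_shape("GRU", batch=4, seq_len=50, hidden_size=128, bidirectional=True))
```

Any layer that consumes this output needs to account for the doubled feature size when bidirectional is True.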
asteroid.masknn.blocks.DPRNNBlock(*args, **kwargs)
Dual-Path RNN Block as proposed in [1].
Parameters:
- in_chan (int) – Number of input channels.
- hid_size (int) – Number of hidden neurons in the RNNs.
- norm_type (str, optional) – Type of normalization to use. To choose from
  - 'gLN': global Layernorm
  - 'cLN': channelwise Layernorm
- bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN.
- rnn_type (str, optional) – Type of RNN used. Choose from 'RNN', 'LSTM' and 'GRU'.
- num_layers (int, optional) – Number of layers used in each RNN.
- dropout (float, optional) – Dropout ratio. Must be in [0, 1].
References
[1] “Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation”, Yi Luo, Zhuo Chen and Takuya Yoshioka. https://arxiv.org/abs/1910.06379