DNN building blocks
Convolutional blocks
class asteroid.masknn.convolutional.Conv1DBlock(in_chan, hid_chan, skip_out_chan, kernel_size, padding, dilation, norm_type='gLN')
Bases: sphinx.ext.autodoc.importer._MockObject
One-dimensional convolutional block, as proposed in [1].
Parameters: - in_chan (int) – Number of input channels.
- hid_chan (int) – Number of hidden channels in the depth-wise convolution.
- skip_out_chan (int) – Number of channels in the skip convolution. If 0 or None, Conv1DBlock won’t have any skip connections. Corresponds to the block in v1 of the paper. The forward method returns res instead of [res, skip] in this case.
- kernel_size (int) – Size of the depth-wise convolutional kernel.
- padding (int) – Padding of the depth-wise convolution.
- dilation (int) – Dilation of the depth-wise convolution.
- norm_type (str, optional) – Type of normalization to use. To choose from:
  - 'gLN': global Layernorm
  - 'cLN': channelwise Layernorm
  - 'cgLN': cumulative global Layernorm
References
[1] : “Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation” TASLP 2019 Yi Luo, Nima Mesgarani https://arxiv.org/abs/1809.07454
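To make the interaction between kernel_size, padding and dilation concrete, here is a minimal sketch of the standard output-length arithmetic for a 1-D convolution. The helper names conv1d_out_len and same_length_padding are hypothetical, and the padding rule (kernel_size - 1) * dilation // 2 is an assumption for odd kernels, not something this page specifies.

```python
def conv1d_out_len(n_frames, kernel_size, padding, dilation, stride=1):
    """Output length of a 1-D convolution (standard Conv1d arithmetic)."""
    return (n_frames + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def same_length_padding(kernel_size, dilation):
    """Assumed rule: padding that keeps the length unchanged for odd kernels."""
    return (kernel_size - 1) * dilation // 2


if __name__ == "__main__":
    # With a kernel of size 3, the length stays at 100 for every dilation.
    for dilation in (1, 2, 4, 8):
        pad = same_length_padding(3, dilation)
        print(dilation, pad, conv1d_out_len(100, 3, pad, dilation))
```

Without padding, the same dilated convolution shortens the sequence by dilation * (kernel_size - 1) frames.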
class asteroid.masknn.convolutional.DCUMaskNet(encoders, decoders, mask_bound='tanh', **kwargs)
Bases: asteroid.masknn.base.BaseDCUMaskNet
Masking part of DCUNet, as proposed in [1].
Valid architecture values for the default_architecture classmethod are: “Large-DCUNet-20”, “DCUNet-20”, “DCUNet-16”, “DCUNet-10”.
References
[1] : “Phase-aware Speech Enhancement with Deep Complex U-Net”, Hyeong-Seok Choi et al. https://arxiv.org/abs/1903.03107
class asteroid.masknn.convolutional.DCUNetComplexDecoderBlock(in_chan, out_chan, kernel_size, stride, padding, norm_type='bN', activation='leaky_relu')
Bases: sphinx.ext.autodoc.importer._MockObject
Decoder block as proposed in [1].
Parameters: - in_chan (int) – Number of input channels.
- out_chan (int) – Number of output channels.
- kernel_size (Tuple[int, int]) – Convolution kernel size.
- stride (Tuple[int, int]) – Convolution stride.
- padding (Tuple[int, int]) – Convolution padding.
- norm_type (str, optional) – Type of normalization to use. See asteroid.masknn.norms for valid values.
- activation (str, optional) – Type of activation to use. See asteroid.masknn.activations for valid values.
References
[1] : “Phase-aware Speech Enhancement with Deep Complex U-Net”, Hyeong-Seok Choi et al. https://arxiv.org/abs/1903.03107
class asteroid.masknn.convolutional.DCUNetComplexEncoderBlock(in_chan, out_chan, kernel_size, stride, padding, norm_type='bN', activation='leaky_relu')
Bases: sphinx.ext.autodoc.importer._MockObject
Encoder block as proposed in [1].
Parameters: - in_chan (int) – Number of input channels.
- out_chan (int) – Number of output channels.
- kernel_size (Tuple[int, int]) – Convolution kernel size.
- stride (Tuple[int, int]) – Convolution stride.
- padding (Tuple[int, int]) – Convolution padding.
- norm_type (str, optional) – Type of normalization to use. See asteroid.masknn.norms for valid values.
- activation (str, optional) – Type of activation to use. See asteroid.masknn.activations for valid values.
References
[1] : “Phase-aware Speech Enhancement with Deep Complex U-Net”, Hyeong-Seok Choi et al. https://arxiv.org/abs/1903.03107
class asteroid.masknn.convolutional.SuDORMRF(in_chan, n_src, bn_chan=128, num_blocks=16, upsampling_depth=4, mask_act='softmax')
Bases: sphinx.ext.autodoc.importer._MockObject
SuDORMRF mask network, as described in [1].
Parameters: - in_chan (int) – Number of input channels. Also number of output channels.
- n_src (int) – Number of sources in the input mixtures.
- bn_chan (int, optional) – Number of bins in the bottleneck layer and the UNet blocks.
- num_blocks (int) – Number of UBlocks.
- upsampling_depth (int) – Depth of upsampling.
- mask_act (str) – Name of output activation.
References
[1] : “Sudo rm -rf: Efficient Networks for Universal Audio Source Separation”, Tzinis et al. MLSP 2020.
class asteroid.masknn.convolutional.SuDORMRFImproved(in_chan, n_src, bn_chan=128, num_blocks=16, upsampling_depth=4, mask_act='relu')
Bases: sphinx.ext.autodoc.importer._MockObject
Improved SuDORMRF mask network, as described in [1].
Parameters: - in_chan (int) – Number of input channels. Also number of output channels.
- n_src (int) – Number of sources in the input mixtures.
- bn_chan (int, optional) – Number of bins in the bottleneck layer and the UNet blocks.
- num_blocks (int) – Number of UBlocks.
- upsampling_depth (int) – Depth of upsampling.
- mask_act (str) – Name of output activation.
References
[1] : “Sudo rm -rf: Efficient Networks for Universal Audio Source Separation”, Tzinis et al. MLSP 2020.
class asteroid.masknn.convolutional.TDConvNet(in_chan, n_src, out_chan=None, n_blocks=8, n_repeats=3, bn_chan=128, hid_chan=512, skip_chan=128, conv_kernel_size=3, norm_type='gLN', mask_act='relu', kernel_size=None)
Bases: sphinx.ext.autodoc.importer._MockObject
Temporal Convolutional network used in ConvTasnet.
Parameters: - in_chan (int) – Number of input filters.
- n_src (int) – Number of masks to estimate.
- out_chan (int, optional) – Number of bins in the estimated masks. If None, out_chan = in_chan.
- n_blocks (int, optional) – Number of convolutional blocks in each repeat. Defaults to 8.
- n_repeats (int, optional) – Number of repeats. Defaults to 3.
- bn_chan (int, optional) – Number of channels after the bottleneck.
- hid_chan (int, optional) – Number of channels in the convolutional blocks.
- skip_chan (int, optional) – Number of channels in the skip connections. If 0 or None, TDConvNet won’t have any skip connections and the masks will be computed from the residual output. Corresponds to the ConvTasnet architecture in v1 of the paper.
- conv_kernel_size (int, optional) – Kernel size in convolutional blocks.
- norm_type (str, optional) – To choose from 'BN', 'gLN', 'cLN'.
- mask_act (str, optional) – Which non-linear function to generate mask.
References
[1] : “Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation” TASLP 2019 Yi Luo, Nima Mesgarani https://arxiv.org/abs/1809.07454
forward(mixture_w)
Parameters: mixture_w (torch.Tensor) – Tensor of shape [batch, n_filters, n_frames].
Returns: torch.Tensor – estimated mask of shape [batch, n_src, n_filters, n_frames].
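The effect of n_blocks, n_repeats and conv_kernel_size on temporal context can be estimated with a short calculation. The sketch below assumes that block x of each repeat uses dilation 2**x, as in the Conv-TasNet paper; this page does not state the dilation schedule, and the function name tcn_receptive_field is hypothetical.

```python
def tcn_receptive_field(n_blocks=8, n_repeats=3, conv_kernel_size=3):
    """Receptive field in frames of a stack of dilated convolutions.

    ASSUMPTION: block x in each repeat uses dilation 2**x and stride 1,
    so each block widens the receptive field by (kernel - 1) * 2**x frames.
    """
    per_repeat = sum((conv_kernel_size - 1) * 2 ** x for x in range(n_blocks))
    return 1 + n_repeats * per_repeat


if __name__ == "__main__":
    # Default constructor values from the signature above.
    print(tcn_receptive_field())
```

Under that assumption, the default configuration sees 1531 encoder frames per output frame.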
class asteroid.masknn.convolutional.TDConvNetpp(in_chan, n_src, out_chan=None, n_blocks=8, n_repeats=3, bn_chan=128, hid_chan=512, skip_chan=128, conv_kernel_size=3, norm_type='fgLN', mask_act='relu')
Bases: sphinx.ext.autodoc.importer._MockObject
Improved Temporal Convolutional network used in [1] (TDCN++).
Parameters: - in_chan (int) – Number of input filters.
- n_src (int) – Number of masks to estimate.
- out_chan (int, optional) – Number of bins in the estimated masks. If None, out_chan = in_chan.
- n_blocks (int, optional) – Number of convolutional blocks in each repeat. Defaults to 8.
- n_repeats (int, optional) – Number of repeats. Defaults to 3.
- bn_chan (int, optional) – Number of channels after the bottleneck.
- hid_chan (int, optional) – Number of channels in the convolutional blocks.
- skip_chan (int, optional) – Number of channels in the skip connections. If 0 or None, TDConvNetpp won’t have any skip connections and the masks will be computed from the residual output. Corresponds to the ConvTasnet architecture in v1 of the paper.
- conv_kernel_size (int, optional) – Kernel size in convolutional blocks.
- norm_type (str, optional) – To choose from 'BN', 'gLN', 'cLN'.
- mask_act (str, optional) – Which non-linear function to generate mask.
References
[1] : Kavalerov, Ilya et al. “Universal Sound Separation.” in WASPAA 2019
Notes
The differences with respect to ConvTasnet’s TCN are:
1. Channel-wise layer norm instead of global.
2. Longer-range skip-residual connections from earlier repeat inputs to later repeat inputs after passing them through a dense layer.
3. Learnable scaling parameter after each dense layer. The scaling parameter for the second dense layer in each convolutional block (which is applied right before the residual connection) is initialized to an exponentially decaying scalar equal to 0.9**L, where L is the layer or block index.
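The 0.9**L initialization from the notes can be tabulated in a few lines. This is only a sketch: the function name initial_residual_scales is hypothetical, and starting the index L at 0 is an assumption, since the notes do not say whether indexing is zero-based.

```python
def initial_residual_scales(n_layers, base=0.9):
    """Initial value of the residual scaling parameter for each layer index L.

    ASSUMPTION: L is zero-based, so the first block starts with a scale of 1.0.
    """
    return [base ** layer for layer in range(n_layers)]


if __name__ == "__main__":
    # 8 blocks x 3 repeats = 24 layers with the default constructor values.
    for layer, scale in enumerate(initial_residual_scales(24)):
        print(layer, round(scale, 4))
```

Deeper blocks therefore start with a smaller contribution to the residual path, and the scale remains learnable afterwards.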
forward(mixture_w)
Parameters: mixture_w (torch.Tensor) – Tensor of shape [batch, n_filters, n_frames].
Returns: torch.Tensor – estimated mask of shape [batch, n_src, n_filters, n_frames].
class asteroid.masknn.convolutional.UBlock(out_chan=128, in_chan=512, upsampling_depth=4)
Bases: asteroid.masknn.convolutional._BaseUBlock
Upsampling block. Based on the following principle:
REDUCE ---> SPLIT ---> TRANSFORM ---> MERGE
Recurrent blocks
class asteroid.masknn.recurrent.DCCRMaskNet(encoders, decoders, n_freqs, **kwargs)
Bases: asteroid.masknn.base.BaseDCUMaskNet
Masking part of DCCRNet, as proposed in [1].
Valid architecture values for the default_architecture classmethod are: “DCCRN”.
Parameters: - encoders (list of length N of tuples of (in_chan, out_chan, kernel_size, stride, padding)) – Arguments of the encoders of the u-net.
- decoders (list of length N of tuples of (in_chan, out_chan, kernel_size, stride, padding)) – Arguments of the decoders of the u-net.
- n_freqs (int) – Number of frequencies (dim 1) of the input to .forward(). n_freqs - 1 must be divisible by f_0 * f_1 * … * f_N, where f_k are the frequency strides of the encoders.
References
[1] : “DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement”, Yanxin Hu et al. https://arxiv.org/abs/2008.00264
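The divisibility constraint on n_freqs can be checked before building the network. The sketch below is illustrative only: the function name n_freqs_is_valid and the example encoder tuples are hypothetical, and it assumes the frequency stride is the first element of each encoder's stride tuple.

```python
def n_freqs_is_valid(n_freqs, encoders):
    """Check that n_freqs - 1 is divisible by the product of frequency strides.

    ASSUMPTION: each encoder is (in_chan, out_chan, kernel_size, stride, padding)
    and stride[0] is the stride along the frequency axis.
    """
    product = 1
    for _in_chan, _out_chan, _kernel_size, stride, _padding in encoders:
        product *= stride[0]
    return (n_freqs - 1) % product == 0


if __name__ == "__main__":
    # Hypothetical three-encoder stack, each halving the frequency axis.
    encoders = [
        (1, 16, (5, 2), (2, 1), (2, 0)),
        (16, 32, (5, 2), (2, 1), (2, 0)),
        (32, 64, (5, 2), (2, 1), (2, 0)),
    ]
    print(n_freqs_is_valid(257, encoders))  # 256 is divisible by 2 * 2 * 2
    print(n_freqs_is_valid(256, encoders))  # 255 is not
```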
class asteroid.masknn.recurrent.DCCRMaskNetRNN(in_size, hid_size=128, rnn_type='LSTM', norm_type=None)
Bases: sphinx.ext.autodoc.importer._MockObject
RNN (LSTM) layer between encoders and decoders introduced in [1].
Parameters: - in_size (int) – Number of inputs to the RNN. Must be the product of non-batch, non-time dimensions of output shape of last encoder, i.e. if the last encoder output shape is [batch, n_chans, n_freqs, time], in_size must be n_chans * n_freqs.
- hid_size (int, optional) – Number of units in RNN.
- rnn_type (str, optional) – Type of RNN to use. See SingleRNN for valid values.
- norm_type (Optional[str], optional) – Norm to use after linear. See asteroid.masknn.norms for valid values. (Not used in [1]).
References
[1] : “DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement”, Yanxin Hu et al. https://arxiv.org/abs/2008.00264
class asteroid.masknn.recurrent.DPRNN(in_chan, n_src, out_chan=None, bn_chan=128, hid_size=128, chunk_size=100, hop_size=None, n_repeats=6, norm_type='gLN', mask_act='relu', bidirectional=True, rnn_type='LSTM', num_layers=1, dropout=0)
Bases: sphinx.ext.autodoc.importer._MockObject
Dual-path RNN Network for Single-Channel Source Separation introduced in [1].
Parameters: - in_chan (int) – Number of input filters.
- n_src (int) – Number of masks to estimate.
- out_chan (int or None) – Number of bins in the estimated masks. Defaults to in_chan.
- bn_chan (int) – Number of channels after the bottleneck. Defaults to 128.
- hid_size (int) – Number of neurons in the RNNs cell state. Defaults to 128.
- chunk_size (int) – Window size of overlap and add processing. Defaults to 100.
- hop_size (int or None) – Hop size (stride) of overlap and add processing. Defaults to chunk_size // 2 (50% overlap).
- n_repeats (int) – Number of repeats. Defaults to 6.
- norm_type (str, optional) – Type of normalization to use. To choose from:
  - 'gLN': global Layernorm
  - 'cLN': channelwise Layernorm
- mask_act (str, optional) – Which non-linear function to generate mask.
- bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN (Intra-Chunk is always bidirectional).
- rnn_type (str, optional) – Type of RNN used. Choose between 'RNN', 'LSTM' and 'GRU'.
- num_layers (int, optional) – Number of layers in each RNN.
- dropout (float, optional) – Dropout ratio, must be in [0,1].
References
[1] “Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation”, Yi Luo, Zhuo Chen and Takuya Yoshioka. https://arxiv.org/abs/1910.06379
forward(mixture_w)
Parameters: mixture_w (torch.Tensor) – Tensor of shape [batch, n_filters, n_frames].
Returns: torch.Tensor – estimated mask of shape [batch, n_src, n_filters, n_frames].
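The roles of chunk_size and hop_size are easier to see on a toy sequence. The following is a generic overlap-and-add sketch in plain Python, not the library's implementation: the function names, the zero-padding of the tail and the averaging of overlapping frames are all assumptions made for illustration.

```python
def split_into_chunks(frames, chunk_size, hop_size=None):
    """Split a 1-D sequence into overlapping chunks, zero-padding the tail."""
    if hop_size is None:
        hop_size = chunk_size // 2  # 50% overlap, the documented default
    remaining = max(len(frames) - chunk_size, 0)
    n_chunks = -(-remaining // hop_size) + 1  # ceil(remaining / hop_size) + 1
    padded_len = (n_chunks - 1) * hop_size + chunk_size
    padded = list(frames) + [0.0] * (padded_len - len(frames))
    return [padded[i * hop_size:i * hop_size + chunk_size] for i in range(n_chunks)]


def overlap_add(chunks, hop_size, n_frames):
    """Rebuild a sequence by summing chunks and averaging overlapping frames."""
    chunk_size = len(chunks[0])
    total = (len(chunks) - 1) * hop_size + chunk_size
    summed = [0.0] * total
    counts = [0] * total
    for i, chunk in enumerate(chunks):
        for j, value in enumerate(chunk):
            summed[i * hop_size + j] += value
            counts[i * hop_size + j] += 1
    return [s / c for s, c in zip(summed, counts)][:n_frames]


if __name__ == "__main__":
    signal = [float(i) for i in range(11)]
    chunks = split_into_chunks(signal, chunk_size=4)  # hop_size defaults to 2
    print(len(chunks), overlap_add(chunks, 2, len(signal)) == signal)
```

In a dual-path model, the intra-chunk RNN runs along each chunk and the inter-chunk RNN runs across chunks before the overlap-and-add step restores the original frame axis.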
class asteroid.masknn.recurrent.DPRNNBlock(in_chan, hid_size, norm_type='gLN', bidirectional=True, rnn_type='LSTM', num_layers=1, dropout=0)
Bases: sphinx.ext.autodoc.importer._MockObject
Dual-Path RNN Block as proposed in [1].
Parameters: - in_chan (int) – Number of input channels.
- hid_size (int) – Number of hidden neurons in the RNNs.
- norm_type (str, optional) – Type of normalization to use. To choose from:
  - 'gLN': global Layernorm
  - 'cLN': channelwise Layernorm
- bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN.
- rnn_type (str, optional) – Type of RNN used. Choose from 'RNN', 'LSTM' and 'GRU'.
- num_layers (int, optional) – Number of layers used in each RNN.
- dropout (float, optional) – Dropout ratio. Must be in [0, 1].
References
[1] “Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation”, Yi Luo, Zhuo Chen and Takuya Yoshioka. https://arxiv.org/abs/1910.06379
class asteroid.masknn.recurrent.LSTMMasker(in_chan, n_src, out_chan=None, rnn_type='lstm', n_layers=4, hid_size=512, dropout=0.3, mask_act='sigmoid', bidirectional=True)
Bases: sphinx.ext.autodoc.importer._MockObject
LSTM mask network introduced in [1], without skip connections.
Parameters: - in_chan (int) – Number of input filters.
- n_src (int) – Number of masks to estimate.
- out_chan (int or None) – Number of bins in the estimated masks. Defaults to in_chan.
- rnn_type (str, optional) – Type of RNN used. Choose between 'RNN', 'LSTM' and 'GRU'.
- n_layers (int, optional) – Number of layers in each RNN.
- hid_size (int) – Number of neurons in the RNNs cell state.
- mask_act (str, optional) – Which non-linear function to generate mask.
- bidirectional (bool, optional) – Whether to use a bidirectional RNN (BiLSTM when rnn_type is 'LSTM').
- dropout (float, optional) – Dropout ratio, must be in [0,1].
References
[1]: Yi Luo et al. “Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network”, Interspeech 2018
class asteroid.masknn.recurrent.SingleRNN(rnn_type, input_size, hidden_size, n_layers=1, dropout=0, bidirectional=False)
Bases: sphinx.ext.autodoc.importer._MockObject
Module for a RNN block.
Inspired from https://github.com/yluo42/TAC/blob/master/utility/models.py Licensed under CC BY-NC-SA 3.0 US.
Parameters: - rnn_type (str) – Select from 'RNN', 'LSTM', 'GRU'. Can also be passed in lowercase letters.
- input_size (int) – Dimension of the input feature. The input should have shape [batch, seq_len, input_size].
- hidden_size (int) – Dimension of the hidden state.
- n_layers (int, optional) – Number of layers used in RNN. Default is 1.
- dropout (float, optional) – Dropout ratio. Default is 0.
- bidirectional (bool, optional) – Whether the RNN layers are bidirectional. Default is False.
class asteroid.masknn.recurrent.StackedResidualBiRNN(rnn_type, n_units, n_layers=4, dropout=0.0, bidirectional=True)
Bases: sphinx.ext.autodoc.importer._MockObject
Stacked Bidirectional RNN with builtin residual connection. Residual connections are applied on both RNN directions. Only supports bidirectional RNNs. See StackedResidualRNN for unidirectional ones.
Parameters: - rnn_type (str) – Select from 'RNN', 'LSTM', 'GRU'. Can also be passed in lowercase letters.
- n_units (int) – Number of units in recurrent layers. This will also be the expected input size.
- n_layers (int) – Number of recurrent layers.
- dropout (float) – Dropout value, between 0. and 1. (Default: 0.)
- bidirectional (bool) – If True, use bidirectional RNN, else unidirectional. (Default: True)
class asteroid.masknn.recurrent.StackedResidualRNN(rnn_type, n_units, n_layers=4, dropout=0.0, bidirectional=False)
Bases: sphinx.ext.autodoc.importer._MockObject
Stacked RNN with builtin residual connection. Only supports forward RNNs. See StackedResidualBiRNN for bidirectional ones.
Parameters: - rnn_type (str) – Select from 'RNN', 'LSTM', 'GRU'. Can also be passed in lowercase letters.
- n_units (int) – Number of units in recurrent layers. This will also be the expected input size.
- n_layers (int) – Number of recurrent layers.
- dropout (float) – Dropout value, between 0. and 1. (Default: 0.)
- bidirectional (bool) – If True, use bidirectional RNN, else unidirectional. (Default: False)
Norms
class asteroid.masknn.norms.BatchNorm(*args, **kwargs)
Bases: sphinx.ext.autodoc.importer._MockObject
Wrapper class for PyTorch BatchNorm1D and BatchNorm2D.
class asteroid.masknn.norms.ChanLN(channel_size)
Bases: asteroid.masknn.norms._LayerNorm
Channel-wise Layer Normalization (chanLN).
forward(x)
Applies forward pass.
Works for any input size > 2D.
Parameters: x (torch.Tensor) – Shape [batch, chan, *].
Returns: torch.Tensor – chanLN_x [batch, chan, *].
class asteroid.masknn.norms.CumLN(channel_size)
Bases: asteroid.masknn.norms._LayerNorm
Cumulative Global layer normalization (cumLN).
forward(x)
Parameters: x (torch.Tensor) – Shape [batch, channels, length].
Returns: torch.Tensor – cumLN_x [batch, channels, length].
class asteroid.masknn.norms.FeatsGlobLN(channel_size)
Bases: asteroid.masknn.norms._LayerNorm
Feature-wise global Layer Normalization (FeatsGlobLN). Applies normalization over frames for each channel.
forward(x)
Applies forward pass.
Works for any input size > 2D.
Parameters: x (torch.Tensor) – Shape [batch, chan, time].
Returns: torch.Tensor – chanLN_x [batch, chan, time].
class asteroid.masknn.norms.GlobLN(channel_size)
Bases: asteroid.masknn.norms._LayerNorm
Global Layer Normalization (globLN).
forward(x)
Applies forward pass.
Works for any input size > 2D.
Parameters: x (torch.Tensor) – Shape [batch, chan, *].
Returns: torch.Tensor – gLN_x [batch, chan, *].
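The layer norms above differ mainly in which axes the mean and variance are computed over. The plain-Python sketch below shows this for a single [chan][time] example. It is not the library code: the function names are hypothetical, the learnable gain and bias are omitted, and the EPS value and the use of population variance are assumptions.

```python
EPS = 1e-8  # assumed numerical-stability constant


def _mean_var(values):
    """Mean and population variance of a flat list of floats."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def _normalize(value, mean, var):
    return (value - mean) / (var + EPS) ** 0.5


def glob_ln(x):
    """Global: one mean/variance over all channels and all frames."""
    mean, var = _mean_var([v for row in x for v in row])
    return [[_normalize(v, mean, var) for v in row] for row in x]


def chan_ln(x):
    """Channel-wise: one mean/variance per frame, computed across channels."""
    stats = [_mean_var([row[t] for row in x]) for t in range(len(x[0]))]
    return [[_normalize(v, *stats[t]) for t, v in enumerate(row)] for row in x]


def feats_glob_ln(x):
    """Feature-wise global: one mean/variance per channel, computed over frames."""
    out = []
    for row in x:
        mean, var = _mean_var(row)
        out.append([_normalize(v, mean, var) for v in row])
    return out


if __name__ == "__main__":
    x = [[1.0, 2.0, 3.0], [4.0, 5.0, 9.0]]  # 2 channels, 3 frames
    print(glob_ln(x))
    print(chan_ln(x))
    print(feats_glob_ln(x))
```

The cumulative variant follows the same pattern, with statistics accumulated over all channels and the frames seen so far.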
asteroid.masknn.norms.bN
alias of asteroid.masknn.norms.BatchNorm
asteroid.masknn.norms.cLN
alias of asteroid.masknn.norms.ChanLN
asteroid.masknn.norms.cgLN
alias of asteroid.masknn.norms.CumLN
asteroid.masknn.norms.fgLN
alias of asteroid.masknn.norms.FeatsGlobLN
asteroid.masknn.norms.gLN
alias of asteroid.masknn.norms.GlobLN
asteroid.masknn.norms.get(identifier)
Returns a norm class from a string. Returns its input if it is callable (already a _LayerNorm for example).
Parameters: identifier (str or Callable or None) – the norm identifier.
Returns: _LayerNorm or None