DNN building blocks

Convolutional blocks

class asteroid.masknn.convolutional.Conv1DBlock(in_chan, hid_chan, skip_out_chan, kernel_size, padding, dilation, norm_type='gLN')[source]

Bases: torch.nn.Module

One dimensional convolutional block, as proposed in [1].

Parameters:
  • in_chan (int) – Number of input channels.
  • hid_chan (int) – Number of hidden channels in the depth-wise convolution.
  • skip_out_chan (int) – Number of channels in the skip convolution. If 0 or None, Conv1DBlock won’t have any skip connections, which corresponds to the block in v1 of the paper. In this case, forward returns res instead of [res, skip].
  • kernel_size (int) – Size of the depth-wise convolutional kernel.
  • padding (int) – Padding of the depth-wise convolution.
  • dilation (int) – Dilation of the depth-wise convolution.
  • norm_type (str, optional) –

    Type of normalization to use. To choose from

    • 'gLN': global Layernorm
    • 'cLN': channelwise Layernorm
    • 'cgLN': cumulative global Layernorm

References

[1] : “Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation” TASLP 2019 Yi Luo, Nima Mesgarani https://arxiv.org/abs/1809.07454

forward(x)[source]

Input shape [batch, feats, seq]
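
As a rough illustration of how kernel_size, padding and dilation interact, the sketch below uses the standard 1-D convolution length formula to pick a padding that keeps the sequence length unchanged. The helper names are hypothetical and not part of asteroid, and the padding rule assumes an odd kernel_size and a stride of 1.

```python
def conv1d_out_len(seq_len, kernel_size, padding, dilation, stride=1):
    """Output length of a 1-D convolution (standard formula)."""
    effective_kernel = dilation * (kernel_size - 1) + 1
    return (seq_len + 2 * padding - effective_kernel) // stride + 1


def same_length_padding(kernel_size, dilation):
    """Padding that preserves the length for an odd kernel and stride 1."""
    return (kernel_size - 1) * dilation // 2


# With kernel_size=3, the output length stays at 100 for every dilation.
for dilation in (1, 2, 4, 8):
    pad = same_length_padding(3, dilation)
    print(dilation, pad, conv1d_out_len(100, 3, pad, dilation))
```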

class asteroid.masknn.convolutional.DCUMaskNet(encoders, decoders, mask_bound='tanh', **kwargs)[source]

Bases: asteroid.masknn.base.BaseDCUMaskNet

Masking part of DCUNet, as proposed in [1].

Valid architecture values for the default_architecture classmethod are: “Large-DCUNet-20”, “DCUNet-20”, “DCUNet-16”, “DCUNet-10”.

References

[1] : “Phase-aware Speech Enhancement with Deep Complex U-Net”, Hyeong-Seok Choi et al. https://arxiv.org/abs/1903.03107

class asteroid.masknn.convolutional.DCUNetComplexDecoderBlock(in_chan, out_chan, kernel_size, stride, padding, norm_type='bN', activation='leaky_relu')[source]

Bases: torch.nn.Module

Decoder block as proposed in [1].

Parameters:
  • in_chan (int) – Number of input channels.
  • out_chan (int) – Number of output channels.
  • kernel_size (Tuple[int, int]) – Convolution kernel size.
  • stride (Tuple[int, int]) – Convolution stride.
  • padding (Tuple[int, int]) – Convolution padding.
  • norm_type (str, optional) – Type of normalization to use. See asteroid.masknn.norms for valid values.
  • activation (str, optional) – Type of activation to use. See asteroid.masknn.activations for valid values.

References

[1] : “Phase-aware Speech Enhancement with Deep Complex U-Net”, Hyeong-Seok Choi et al. https://arxiv.org/abs/1903.03107

class asteroid.masknn.convolutional.DCUNetComplexEncoderBlock(in_chan, out_chan, kernel_size, stride, padding, norm_type='bN', activation='leaky_relu')[source]

Bases: torch.nn.Module

Encoder block as proposed in [1].

Parameters:
  • in_chan (int) – Number of input channels.
  • out_chan (int) – Number of output channels.
  • kernel_size (Tuple[int, int]) – Convolution kernel size.
  • stride (Tuple[int, int]) – Convolution stride.
  • padding (Tuple[int, int]) – Convolution padding.
  • norm_type (str, optional) – Type of normalization to use. See asteroid.masknn.norms for valid values.
  • activation (str, optional) – Type of activation to use. See asteroid.masknn.activations for valid values.

References

[1] : “Phase-aware Speech Enhancement with Deep Complex U-Net”, Hyeong-Seok Choi et al. https://arxiv.org/abs/1903.03107

class asteroid.masknn.convolutional.SuDORMRF(in_chan, n_src, bn_chan=128, num_blocks=16, upsampling_depth=4, mask_act='softmax')[source]

Bases: torch.nn.Module

SuDORMRF mask network, as described in [1].

Parameters:
  • in_chan (int) – Number of input channels. Also number of output channels.
  • n_src (int) – Number of sources in the input mixtures.
  • bn_chan (int, optional) – Number of bins in the bottleneck layer and the UNet blocks.
  • num_blocks (int) – Number of UBlocks.
  • upsampling_depth (int) – Depth of upsampling.
  • mask_act (str) – Name of output activation.

References

[1] : “Sudo rm -rf: Efficient Networks for Universal Audio Source Separation”, Tzinis et al. MLSP 2020.

class asteroid.masknn.convolutional.SuDORMRFImproved(in_chan, n_src, bn_chan=128, num_blocks=16, upsampling_depth=4, mask_act='relu')[source]

Bases: torch.nn.Module

Improved SuDORMRF mask network, as described in [1].

Parameters:
  • in_chan (int) – Number of input channels. Also number of output channels.
  • n_src (int) – Number of sources in the input mixtures.
  • bn_chan (int, optional) – Number of bins in the bottleneck layer and the UNet blocks.
  • num_blocks (int) – Number of UBlocks.
  • upsampling_depth (int) – Depth of upsampling.
  • mask_act (str) – Name of output activation.

References

[1] : “Sudo rm -rf: Efficient Networks for Universal Audio Source Separation”, Tzinis et al. MLSP 2020.

class asteroid.masknn.convolutional.TDConvNet(in_chan, n_src, out_chan=None, n_blocks=8, n_repeats=3, bn_chan=128, hid_chan=512, skip_chan=128, conv_kernel_size=3, norm_type='gLN', mask_act='relu', kernel_size=None)[source]

Bases: torch.nn.Module

Temporal Convolutional network used in ConvTasnet.

Parameters:
  • in_chan (int) – Number of input filters.
  • n_src (int) – Number of masks to estimate.
  • out_chan (int, optional) – Number of bins in the estimated masks. If None, out_chan = in_chan.
  • n_blocks (int, optional) – Number of convolutional blocks in each repeat. Defaults to 8.
  • n_repeats (int, optional) – Number of repeats. Defaults to 3.
  • bn_chan (int, optional) – Number of channels after the bottleneck.
  • hid_chan (int, optional) – Number of channels in the convolutional blocks.
  • skip_chan (int, optional) – Number of channels in the skip connections. If 0 or None, TDConvNet won’t have any skip connections and the masks will be computed from the residual output. Corresponds to the ConvTasnet architecture in v1 of the paper.
  • conv_kernel_size (int, optional) – Kernel size in convolutional blocks.
  • norm_type (str, optional) – To choose from 'BN', 'gLN', 'cLN'.
  • mask_act (str, optional) – Which non-linear function to generate mask.

References

[1] : “Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation” TASLP 2019 Yi Luo, Nima Mesgarani https://arxiv.org/abs/1809.07454

forward(mixture_w)[source]

Parameters: mixture_w (torch.Tensor) – Tensor of shape [batch, n_filters, n_frames]
Returns: torch.Tensor – estimated mask of shape [batch, n_src, n_filters, n_frames]

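
The default arguments of TDConvNet can be put in perspective with a back-of-the-envelope receptive field calculation. This sketch assumes, as in the Conv-TasNet paper, that the dilation doubles with each block inside a repeat (1, 2, 4, ...) and that every convolution has stride 1. The function name is hypothetical and not part of asteroid.

```python
def tcn_receptive_field(n_blocks=8, n_repeats=3, conv_kernel_size=3):
    """Receptive field (in frames) of stacked dilated convolutions.

    Assumes block x of each repeat uses dilation 2**x and stride 1.
    """
    field = 1
    for _ in range(n_repeats):
        for x in range(n_blocks):
            field += (conv_kernel_size - 1) * 2 ** x
    return field


print(tcn_receptive_field())  # 1531 frames with the default arguments
```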
class asteroid.masknn.convolutional.TDConvNetpp(in_chan, n_src, out_chan=None, n_blocks=8, n_repeats=3, bn_chan=128, hid_chan=512, skip_chan=128, conv_kernel_size=3, norm_type='fgLN', mask_act='relu')[source]

Bases: torch.nn.Module

Improved temporal convolutional network (TDCN++) used in [1].

Parameters:
  • in_chan (int) – Number of input filters.
  • n_src (int) – Number of masks to estimate.
  • out_chan (int, optional) – Number of bins in the estimated masks. If None, out_chan = in_chan.
  • n_blocks (int, optional) – Number of convolutional blocks in each repeat. Defaults to 8.
  • n_repeats (int, optional) – Number of repeats. Defaults to 3.
  • bn_chan (int, optional) – Number of channels after the bottleneck.
  • hid_chan (int, optional) – Number of channels in the convolutional blocks.
  • skip_chan (int, optional) – Number of channels in the skip connections. If 0 or None, TDConvNetpp won’t have any skip connections and the masks will be computed from the residual output. Corresponds to the ConvTasnet architecture in v1 of the paper.
  • conv_kernel_size (int, optional) – Kernel size in convolutional blocks.
  • norm_type (str, optional) – To choose from 'BN', 'gLN', 'cLN', 'fgLN' (the default).
  • mask_act (str, optional) – Which non-linear function to generate mask.

References

[1] : Kavalerov, Ilya et al. “Universal Sound Separation.” in WASPAA 2019

Notes

The differences with respect to ConvTasnet’s TCN are:

  1. Channel-wise layer norm instead of global.
  2. Longer-range skip-residual connections from earlier repeat inputs to later repeat inputs after passing them through a dense layer.
  3. Learnable scaling parameter after each dense layer. The scaling parameter for the second dense layer in each convolutional block (which is applied right before the residual connection) is initialized to an exponentially decaying scalar equal to 0.9**L, where L is the layer or block index.
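
To give a feel for the 0.9**L initialization, the sketch below lists the initial scale of each block. It assumes that L counts blocks from 0 across all repeats, which the note does not state explicitly, and the function name is hypothetical.

```python
def initial_scales(n_blocks=8, n_repeats=3, base=0.9):
    """Initial scale of the second dense layer for every block.

    Assumption: the index L runs from 0 over all n_blocks * n_repeats blocks.
    """
    return [base ** layer for layer in range(n_blocks * n_repeats)]


scales = initial_scales()
# The first block starts unscaled; deeper blocks start progressively smaller.
print(len(scales), scales[0], round(scales[-1], 4))
```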

forward(mixture_w)[source]

Parameters: mixture_w (torch.Tensor) – Tensor of shape [batch, n_filters, n_frames]
Returns: torch.Tensor – estimated mask of shape [batch, n_src, n_filters, n_frames]

class asteroid.masknn.convolutional.UBlock(out_chan=128, in_chan=512, upsampling_depth=4)[source]

Bases: asteroid.masknn.convolutional._BaseUBlock

Upsampling block.

Based on the following principle:
REDUCE ---> SPLIT ---> TRANSFORM ---> MERGE

forward(x)[source]

Parameters: x – input feature map
Returns: transformed feature map

class asteroid.masknn.convolutional.UConvBlock(out_chan=128, in_chan=512, upsampling_depth=4)[source]

Bases: asteroid.masknn.convolutional._BaseUBlock

Block which performs successive downsampling and upsampling in order to be able to analyze the input features in multiple resolutions.

forward(x)[source]

Parameters: x – input feature map
Returns: transformed feature map

Recurrent blocks

class asteroid.masknn.recurrent.DCCRMaskNet(encoders, decoders, n_freqs, **kwargs)[source]

Bases: asteroid.masknn.base.BaseDCUMaskNet

Masking part of DCCRNet, as proposed in [1].

Valid architecture values for the default_architecture classmethod are: “DCCRN”.

Parameters:
  • encoders (list of length N of tuples of (in_chan, out_chan, kernel_size, stride, padding)) – Arguments of encoders of the u-net
  • decoders (list of length N of tuples of (in_chan, out_chan, kernel_size, stride, padding)) – Arguments of decoders of the u-net
  • n_freqs (int) – Number of frequencies (dim 1) of the input to .forward(). n_freqs - 1 must be divisible by f_0 * f_1 * … * f_N, where f_k are the frequency strides of the encoders.
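
The divisibility constraint on n_freqs can be checked before building the network. The sketch below is only an illustration: the helper name and the encoder values are hypothetical, and it assumes that each stride is a (frequency, time) pair so that the frequency stride is its first element.

```python
from math import prod


def check_n_freqs(n_freqs, encoders):
    """Return True if n_freqs - 1 is divisible by the product of frequency strides.

    Each encoder is (in_chan, out_chan, kernel_size, stride, padding), and
    stride is assumed to be a (freq_stride, time_stride) pair.
    """
    freq_prod = prod(stride[0] for _, _, _, stride, _ in encoders)
    return (n_freqs - 1) % freq_prod == 0


# Hypothetical encoder stack with a frequency stride of 2 in every layer.
encoders = [
    (1, 16, (5, 2), (2, 1), (2, 0)),
    (16, 32, (5, 2), (2, 1), (2, 0)),
    (32, 64, (5, 2), (2, 1), (2, 0)),
]
print(check_n_freqs(257, encoders))  # True: 256 is divisible by 2 * 2 * 2
print(check_n_freqs(250, encoders))  # False: 249 is not divisible by 8
```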

References

[1] : “DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement”, Yanxin Hu et al. https://arxiv.org/abs/2008.00264

class asteroid.masknn.recurrent.DCCRMaskNetRNN(in_size, hid_size=128, rnn_type='LSTM', norm_type=None)[source]

Bases: torch.nn.Module

RNN (LSTM) layer between encoders and decoders introduced in [1].

Parameters:
  • in_size (int) – Number of inputs to the RNN. Must be the product of non-batch, non-time dimensions of output shape of last encoder, i.e. if the last encoder output shape is [batch, n_chans, n_freqs, time], in_size must be n_chans * n_freqs.
  • hid_size (int, optional) – Number of units in RNN.
  • rnn_type (str, optional) – Type of RNN to use. See SingleRNN for valid values.
  • norm_type (Optional[str], optional) – Norm to use after linear. See asteroid.masknn.norms for valid values. (Not used in [1]).
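
The rule for in_size amounts to a one-line computation. In this minimal sketch, the helper name and the example shape are hypothetical.

```python
def rnn_in_size(last_encoder_shape):
    """in_size for a last encoder output of shape [batch, n_chans, n_freqs, time]."""
    _, n_chans, n_freqs, _ = last_encoder_shape
    return n_chans * n_freqs


print(rnn_in_size((4, 128, 4, 100)))  # 512
```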

References

[1] : “DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement”, Yanxin Hu et al. https://arxiv.org/abs/2008.00264

forward(x: torch.Tensor)[source]

Input shape: [batch, …, time]

class asteroid.masknn.recurrent.DPRNN(in_chan, n_src, out_chan=None, bn_chan=128, hid_size=128, chunk_size=100, hop_size=None, n_repeats=6, norm_type='gLN', mask_act='relu', bidirectional=True, rnn_type='LSTM', num_layers=1, dropout=0)[source]

Bases: torch.nn.Module

Dual-path RNN Network for Single-Channel Source Separation introduced in [1].

Parameters:
  • in_chan (int) – Number of input filters.
  • n_src (int) – Number of masks to estimate.
  • out_chan (int or None) – Number of bins in the estimated masks. Defaults to in_chan.
  • bn_chan (int) – Number of channels after the bottleneck. Defaults to 128.
  • hid_size (int) – Number of neurons in the RNN cell state. Defaults to 128.
  • chunk_size (int) – Window size of overlap and add processing. Defaults to 100.
  • hop_size (int or None) – Hop size (stride) of overlap and add processing. Defaults to chunk_size // 2 (50% overlap).
  • n_repeats (int) – Number of repeats. Defaults to 6.
  • norm_type (str, optional) –

    Type of normalization to use. To choose from

    • 'gLN': global Layernorm
    • 'cLN': channelwise Layernorm
  • mask_act (str, optional) – Which non-linear function to generate mask.
  • bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN (Intra-Chunk is always bidirectional).
  • rnn_type (str, optional) – Type of RNN used. Choose between 'RNN', 'LSTM' and 'GRU'.
  • num_layers (int, optional) – Number of layers in each RNN.
  • dropout (float, optional) – Dropout ratio, must be in [0,1].
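
To illustrate how chunk_size and hop_size segment a sequence, the sketch below lists the start index of each overlapping chunk. It assumes the sequence is zero-padded at the end so that the last chunk is full; the actual padding scheme used by DPRNN may differ, and the helper name is hypothetical.

```python
import math


def chunk_starts(n_frames, chunk_size=100, hop_size=None):
    """Start indices of the overlapping chunks covering n_frames.

    Assumption: the sequence is zero-padded at the end so that the last
    chunk is full; the real implementation may pad differently.
    """
    if hop_size is None:
        hop_size = chunk_size // 2  # 50% overlap, as in the default
    if n_frames <= chunk_size:
        return [0]
    n_chunks = math.ceil((n_frames - chunk_size) / hop_size) + 1
    return [i * hop_size for i in range(n_chunks)]


starts = chunk_starts(n_frames=330)
print(starts)  # [0, 50, 100, 150, 200, 250]
```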

References

[1] “Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation”, Yi Luo, Zhuo Chen and Takuya Yoshioka. https://arxiv.org/abs/1910.06379

forward(mixture_w)[source]

Parameters: mixture_w (torch.Tensor) – Tensor of shape [batch, n_filters, n_frames]
Returns: torch.Tensor – estimated mask of shape [batch, n_src, n_filters, n_frames]

class asteroid.masknn.recurrent.DPRNNBlock(in_chan, hid_size, norm_type='gLN', bidirectional=True, rnn_type='LSTM', num_layers=1, dropout=0)[source]

Bases: torch.nn.Module

Dual-Path RNN Block as proposed in [1].

Parameters:
  • in_chan (int) – Number of input channels.
  • hid_size (int) – Number of hidden neurons in the RNNs.
  • norm_type (str, optional) –

    Type of normalization to use. To choose from

    • 'gLN': global Layernorm
    • 'cLN': channelwise Layernorm
  • bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN.
  • rnn_type (str, optional) – Type of RNN used. Choose from 'RNN', 'LSTM' and 'GRU'.
  • num_layers (int, optional) – Number of layers used in each RNN.
  • dropout (float, optional) – Dropout ratio. Must be in [0, 1].

References

[1] “Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation”, Yi Luo, Zhuo Chen and Takuya Yoshioka. https://arxiv.org/abs/1910.06379

forward(x)[source]

Input shape : [batch, feats, chunk_size, num_chunks]

class asteroid.masknn.recurrent.LSTMMasker(in_chan, n_src, out_chan=None, rnn_type='lstm', n_layers=4, hid_size=512, dropout=0.3, mask_act='sigmoid', bidirectional=True)[source]

Bases: torch.nn.Module

LSTM mask network introduced in [1], without skip connections.

Parameters:
  • in_chan (int) – Number of input filters.
  • n_src (int) – Number of masks to estimate.
  • out_chan (int or None) – Number of bins in the estimated masks. Defaults to in_chan.
  • rnn_type (str, optional) – Type of RNN used. Choose between 'RNN', 'LSTM' and 'GRU'.
  • n_layers (int, optional) – Number of layers in each RNN.
  • hid_size (int) – Number of neurons in the RNNs cell state.
  • mask_act (str, optional) – Which non-linear function to generate mask.
  • bidirectional (bool, optional) – Whether to use BiLSTM.
  • dropout (float, optional) – Dropout ratio, must be in [0,1].

References

[1]: Yi Luo et al. “Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network”, Interspeech 2018

class asteroid.masknn.recurrent.SingleRNN(rnn_type, input_size, hidden_size, n_layers=1, dropout=0, bidirectional=False)[source]

Bases: torch.nn.Module

Module for a RNN block.

Inspired by https://github.com/yluo42/TAC/blob/master/utility/models.py, licensed under CC BY-NC-SA 3.0 US.

Parameters:
  • rnn_type (str) – Select from 'RNN', 'LSTM', 'GRU'. Can also be passed in lowercase letters.
  • input_size (int) – Dimension of the input feature. The input should have shape [batch, seq_len, input_size].
  • hidden_size (int) – Dimension of the hidden state.
  • n_layers (int, optional) – Number of layers used in RNN. Default is 1.
  • dropout (float, optional) – Dropout ratio. Default is 0.
  • bidirectional (bool, optional) – Whether the RNN layers are bidirectional. Default is False.

forward(inp)[source]

Input shape [batch, seq, feats]

class asteroid.masknn.recurrent.StackedResidualBiRNN(rnn_type, n_units, n_layers=4, dropout=0.0, bidirectional=True)[source]

Bases: torch.nn.Module

Stacked Bidirectional RNN with builtin residual connection. Residual connections are applied on both RNN directions. Only supports bidirectional RNNs. See StackedResidualRNN for unidirectional ones.

Parameters:
  • rnn_type (str) – Select from 'RNN', 'LSTM', 'GRU'. Can also be passed in lowercase letters.
  • n_units (int) – Number of units in recurrent layers. This will also be the expected input size.
  • n_layers (int) – Number of recurrent layers.
  • dropout (float) – Dropout value, between 0. and 1. (Default: 0.)
  • bidirectional (bool) – If True, use bidirectional RNN, else unidirectional. (Default: True)

forward(x)[source]

Applies the built-in residual connections, with dropout applied before each residual addition. Input shape: [batch, time_axis, feat_axis]

class asteroid.masknn.recurrent.StackedResidualRNN(rnn_type, n_units, n_layers=4, dropout=0.0, bidirectional=False)[source]

Bases: torch.nn.Module

Stacked RNN with builtin residual connection. Only supports forward RNNs. See StackedResidualBiRNN for bidirectional ones.

Parameters:
  • rnn_type (str) – Select from 'RNN', 'LSTM', 'GRU'. Can also be passed in lowercase letters.
  • n_units (int) – Number of units in recurrent layers. This will also be the expected input size.
  • n_layers (int) – Number of recurrent layers.
  • dropout (float) – Dropout value, between 0. and 1. (Default: 0.)
  • bidirectional (bool) – If True, use bidirectional RNN, else unidirectional. (Default: False)

forward(x)[source]

Applies the built-in residual connections, with dropout applied before each residual addition. Input shape: [batch, time_axis, feat_axis]
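
The order of operations in forward (layer, then dropout, then residual addition) can be sketched on plain Python lists. This is not the library's implementation: simple functions stand in for the recurrent layers, and all names are hypothetical.

```python
import random


def dropout(values, p, rng):
    """Inverted dropout on a list of floats (training-mode behaviour)."""
    if p == 0:
        return list(values)
    return [0.0 if rng.random() < p else v / (1 - p) for v in values]


def stacked_residual(x, layers, p=0.0, seed=0):
    """Apply each layer, drop out its output, then add the layer input back."""
    rng = random.Random(seed)
    for layer in layers:
        out = dropout(layer(x), p, rng)
        x = [xi + oi for xi, oi in zip(x, out)]
    return x


# Stand-ins for recurrent layers: any function that preserves the feature size.
layers = [lambda v: [2 * e for e in v], lambda v: [e + 1 for e in v]]
print(stacked_residual([1.0, 2.0], layers))  # [7.0, 13.0]
```

Because the residual adds the layer input to its output, each layer must return as many features as it receives, which is why n_units is also the expected input size.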

Norms

class asteroid.masknn.norms.BatchNorm(*args, **kwargs)[source]

Bases: torch.nn.modules.batchnorm._BatchNorm

Wrapper class for PyTorch BatchNorm1d and BatchNorm2d.

class asteroid.masknn.norms.ChanLN(channel_size)[source]

Bases: asteroid.masknn.norms._LayerNorm

Channel-wise Layer Normalization (chanLN).

forward(x)[source]

Applies forward pass.

Works for any input size > 2D.

Parameters: x (torch.Tensor) – [batch, chan, *]
Returns: torch.Tensor – chanLN_x [batch, chan, *]

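
As a minimal sketch of channel-wise layer normalization, the function below normalizes a single [chan, time] example across channels, independently for each time step. It works on nested lists, omits the learnable gain and bias, and is not the library's implementation.

```python
def chan_layer_norm(x, eps=1e-8):
    """Normalize a [chan, time] nested list across channels, per time step.

    Sketch only: the learnable gain and bias are omitted.
    """
    n_chan, n_time = len(x), len(x[0])
    out = [[0.0] * n_time for _ in range(n_chan)]
    for t in range(n_time):
        column = [x[c][t] for c in range(n_chan)]
        mean = sum(column) / n_chan
        var = sum((v - mean) ** 2 for v in column) / n_chan
        for c in range(n_chan):
            out[c][t] = (x[c][t] - mean) / (var + eps) ** 0.5
    return out


y = chan_layer_norm([[1.0, 10.0], [3.0, 30.0]])
print(y)  # each time step is close to [-1, 1] across the two channels
```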
class asteroid.masknn.norms.CumLN(channel_size)[source]

Bases: asteroid.masknn.norms._LayerNorm

Cumulative Global Layer Normalization (cumLN).

forward(x)[source]

Parameters: x (torch.Tensor) – Shape [batch, channels, length]
Returns: torch.Tensor – cumLN_x [batch, channels, length]

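
A cumulative normalization is causal: the statistics used at frame t only depend on frames up to t. The sketch below shows this for a single [channels, length] example on nested lists, following the cumulative layer norm described in the Conv-TasNet paper. The learnable gain and bias are omitted, and the code is an illustration rather than the library's implementation.

```python
def cum_layer_norm(x, eps=1e-8):
    """Causal normalization of a [chan, time] nested list.

    The statistics at frame t are computed over all channels and all
    frames up to and including t. Gain and bias are omitted in this sketch.
    """
    n_chan, n_time = len(x), len(x[0])
    out = [[0.0] * n_time for _ in range(n_chan)]
    total = total_sq = 0.0
    for t in range(n_time):
        for c in range(n_chan):
            total += x[c][t]
            total_sq += x[c][t] ** 2
        count = n_chan * (t + 1)
        mean = total / count
        var = max(total_sq / count - mean ** 2, 0.0)  # guard against rounding
        for c in range(n_chan):
            out[c][t] = (x[c][t] - mean) / (var + eps) ** 0.5
    return out


a = cum_layer_norm([[1.0, 5.0], [3.0, 7.0]])
b = cum_layer_norm([[1.0, 50.0], [3.0, 70.0]])
print(a[0][0] == b[0][0])  # True: frame 0 is unaffected by later frames
```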
class asteroid.masknn.norms.FeatsGlobLN(channel_size)[source]

Bases: asteroid.masknn.norms._LayerNorm

Feature-wise global Layer Normalization (FeatsGlobLN). Applies normalization over frames for each channel.

forward(x)[source]

Applies forward pass.

Works for any input size > 2D.

Parameters: x (torch.Tensor) – [batch, chan, time]
Returns: torch.Tensor – chanLN_x [batch, chan, time]

class asteroid.masknn.norms.GlobLN(channel_size)[source]

Bases: asteroid.masknn.norms._LayerNorm

Global Layer Normalization (globLN).

forward(x)[source]

Applies forward pass.

Works for any input size > 2D.

Parameters: x (torch.Tensor) – Shape [batch, chan, *]
Returns: torch.Tensor – gLN_x [batch, chan, *]

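
Global layer normalization uses a single mean and variance per example, computed over every channel and frame. The sketch below shows this for one [chan, time] example on nested lists; it omits the learnable gain and bias and is not the library's implementation.

```python
def glob_layer_norm(x, eps=1e-8):
    """Normalize a [chan, time] nested list with one mean and variance.

    Sketch only: statistics are taken over every channel and frame of one
    example, and the learnable gain and bias are omitted.
    """
    values = [v for row in x for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [[(v - mean) / (var + eps) ** 0.5 for v in row] for row in x]


g = glob_layer_norm([[1.0, 2.0], [3.0, 4.0]])
flat = [v for row in g for v in row]
print(sum(flat) / len(flat))  # close to 0
```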
asteroid.masknn.norms.bN

alias of asteroid.masknn.norms.BatchNorm

asteroid.masknn.norms.cLN

alias of asteroid.masknn.norms.ChanLN

asteroid.masknn.norms.cgLN

alias of asteroid.masknn.norms.CumLN

asteroid.masknn.norms.fgLN

alias of asteroid.masknn.norms.FeatsGlobLN

asteroid.masknn.norms.gLN

alias of asteroid.masknn.norms.GlobLN

asteroid.masknn.norms.get(identifier)[source]

Returns a norm class from a string. Returns its input if it is callable (already a _LayerNorm for example).

Parameters: identifier (str or Callable or None) – the norm identifier.
Returns: _LayerNorm or None

asteroid.masknn.norms.get_complex(identifier)[source]

Like .get but returns a complex norm created with asteroid.complex_nn.OnReIm.

asteroid.masknn.norms.register_norm(custom_norm)[source]

Register a custom norm, gettable with norms.get.

Parameters: custom_norm – Custom norm to register.
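
The behaviour described for get and register_norm follows a common registry pattern, which can be sketched in a few lines. This is a generic illustration and not asteroid's code: the _NORMS dictionary, the error messages, and the MyNorm class are all hypothetical.

```python
_NORMS = {}  # hypothetical registry mapping names to norm classes


def register_norm(custom_norm):
    """Make custom_norm retrievable by its class name."""
    if custom_norm.__name__ in _NORMS:
        raise ValueError(f"{custom_norm.__name__} is already registered")
    _NORMS[custom_norm.__name__] = custom_norm


def get(identifier):
    """Resolve None, a callable, or a registered name to a norm class."""
    if identifier is None:
        return None
    if callable(identifier):
        return identifier  # already a class or factory, returned as-is
    if isinstance(identifier, str) and identifier in _NORMS:
        return _NORMS[identifier]
    raise ValueError(f"Could not interpret norm identifier: {identifier!r}")


class MyNorm:  # hypothetical stand-in for a custom norm module
    pass


register_norm(MyNorm)
print(get("MyNorm") is MyNorm, get(MyNorm) is MyNorm, get(None))
```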