
asteroid.masknn.recurrent module

class asteroid.masknn.recurrent.SingleRNN(rnn_type, input_size, hidden_size, n_layers=1, dropout=0, bidirectional=False)[source]

Bases: torch.nn.Module

Module for a RNN block.

Inspired by https://github.com/yluo42/TAC/blob/master/utility/models.py, licensed under CC BY-NC-SA 3.0 US.

Parameters:
  • rnn_type (str) – Select from 'RNN', 'LSTM', 'GRU'. Can also be passed in lowercase letters.
  • input_size (int) – Dimension of the input feature. The input should have shape [batch, seq_len, input_size].
  • hidden_size (int) – Dimension of the hidden state.
  • n_layers (int, optional) – Number of layers used in RNN. Default is 1.
  • dropout (float, optional) – Dropout ratio. Default is 0.
  • bidirectional (bool, optional) – Whether the RNN layers are bidirectional. Default is False.
output_size[source]
forward(inp)[source]

Input shape [batch, seq, feats]
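Example (a minimal usage sketch; the tensor sizes are illustrative, not required by the class):

    import torch
    from asteroid.masknn.recurrent import SingleRNN

    rnn = SingleRNN("LSTM", input_size=64, hidden_size=128, bidirectional=True)
    x = torch.randn(8, 100, 64)   # [batch, seq, feats] with feats == input_size
    out = rnn(x)                  # [8, 100, rnn.output_size]
    # For a bidirectional RNN, output_size is expected to be 2 * hidden_size.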

class asteroid.masknn.recurrent.MulCatRNN(rnn_type, input_size, hidden_size, n_layers=1, dropout=0, bidirectional=False)[source]

Bases: torch.nn.Module

MulCat RNN block from [1].

Composed of two RNNs, returns cat([RNN_1(x) * RNN_2(x), x]).

Parameters:
  • rnn_type (str) – Select from 'RNN', 'LSTM', 'GRU'. Can also be passed in lowercase letters.
  • input_size (int) – Dimension of the input feature. The input should have shape [batch, seq_len, input_size].
  • hidden_size (int) – Dimension of the hidden state.
  • n_layers (int, optional) – Number of layers used in RNN. Default is 1.
  • dropout (float, optional) – Dropout ratio. Default is 0.
  • bidirectional (bool, optional) – Whether the RNN layers are bidirectional. Default is False.
References
[1] Eliya Nachmani, Yossi Adi, & Lior Wolf. (2020). Voice Separation with an Unknown Number of Multiple Speakers.
output_size[source]
forward(inp)[source]

Input shape [batch, seq, feats]
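Example (a minimal sketch; the sizes are illustrative, and the last output dimension follows from the concatenation described above):

    import torch
    from asteroid.masknn.recurrent import MulCatRNN

    rnn = MulCatRNN("GRU", input_size=64, hidden_size=128, bidirectional=False)
    x = torch.randn(8, 100, 64)   # [batch, seq, feats] with feats == input_size
    out = rnn(x)                  # cat([RNN_1(x) * RNN_2(x), x], dim=-1)
    # Last dimension is expected to be hidden_size + input_size (192 here);
    # with bidirectional=True it would be 2 * hidden_size + input_size.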

class asteroid.masknn.recurrent.StackedResidualRNN(rnn_type, n_units, n_layers=4, dropout=0.0, bidirectional=False)[source]

Bases: torch.nn.Module

Stacked RNN with built-in residual connections. Only supports forward (unidirectional) RNNs. See StackedResidualBiRNN for bidirectional ones.

Parameters:
  • rnn_type (str) – Select from 'RNN', 'LSTM', 'GRU'. Can also be passed in lowercase letters.
  • n_units (int) – Number of units in recurrent layers. This will also be the expected input size.
  • n_layers (int) – Number of recurrent layers.
  • dropout (float) – Dropout value, between 0. and 1. (Default: 0.)
  • bidirectional (bool) – If True, use bidirectional RNN, else unidirectional. (Default: False)
forward(x)[source]

Built-in residual connections, with dropout applied before the residual sum. Input shape: [batch, time_axis, feat_axis]
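Example (a minimal sketch; because of the residual sum, the feature axis must equal n_units):

    import torch
    from asteroid.masknn.recurrent import StackedResidualRNN

    net = StackedResidualRNN("LSTM", n_units=128, n_layers=4, dropout=0.2)
    x = torch.randn(8, 100, 128)  # [batch, time_axis, feat_axis] with feat_axis == n_units
    y = net(x)                    # residual connections keep the output shape equal to x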

class asteroid.masknn.recurrent.StackedResidualBiRNN(rnn_type, n_units, n_layers=4, dropout=0.0, bidirectional=True)[source]

Bases: torch.nn.Module

Stacked bidirectional RNN with built-in residual connections. Residual connections are applied on both RNN directions. Only supports bidirectional RNNs. See StackedResidualRNN for unidirectional ones.

Parameters:
  • rnn_type (str) – Select from 'RNN', 'LSTM', 'GRU'. Can also be passed in lowercase letters.
  • n_units (int) – Number of units in recurrent layers. This will also be the expected input size.
  • n_layers (int) – Number of recurrent layers.
  • dropout (float) – Dropout value, between 0. and 1. (Default: 0.)
  • bidirectional (bool) – If True, use bidirectional RNN, else unidirectional. (Default: True)
forward(x)[source]

Built-in residual connections, with dropout applied before the residual sum. Input shape: [batch, time_axis, feat_axis]

class asteroid.masknn.recurrent.DPRNNBlock(in_chan, hid_size, norm_type='gLN', bidirectional=True, rnn_type='LSTM', use_mulcat=False, num_layers=1, dropout=0)[source]

Bases: torch.nn.Module

Dual-Path RNN Block as proposed in [1].

Parameters:
  • in_chan (int) – Number of input channels.
  • hid_size (int) – Number of hidden neurons in the RNNs.
  • norm_type (str, optional) – Type of normalization to use. Choose from 'gLN' (global Layernorm) or 'cLN' (channelwise Layernorm).
  • bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN.
  • rnn_type (str, optional) – Type of RNN used. Choose from 'RNN', 'LSTM' and 'GRU'.
  • num_layers (int, optional) – Number of layers used in each RNN.
  • dropout (float, optional) – Dropout ratio. Must be in [0, 1].
References
[1] “Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation”, Yi Luo, Zhuo Chen and Takuya Yoshioka. https://arxiv.org/abs/1910.06379
forward(x)[source]

Input shape : [batch, feats, chunk_size, num_chunks]
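Example (a minimal sketch; sizes are illustrative, and the block is expected to preserve its input shape):

    import torch
    from asteroid.masknn.recurrent import DPRNNBlock

    block = DPRNNBlock(in_chan=64, hid_size=128)
    x = torch.randn(4, 64, 100, 20)   # [batch, feats, chunk_size, num_chunks]
    y = block(x)                      # same shape as the input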

class asteroid.masknn.recurrent.DPRNN(in_chan, n_src, out_chan=None, bn_chan=128, hid_size=128, chunk_size=100, hop_size=None, n_repeats=6, norm_type='gLN', mask_act='relu', bidirectional=True, rnn_type='LSTM', use_mulcat=False, num_layers=1, dropout=0)[source]

Bases: torch.nn.Module

Dual-path RNN Network for Single-Channel Source Separation, introduced in [1].

Parameters:
  • in_chan (int) – Number of input filters.
  • n_src (int) – Number of masks to estimate.
  • out_chan (int or None) – Number of bins in the estimated masks. Defaults to in_chan.
  • bn_chan (int) – Number of channels after the bottleneck. Defaults to 128.
  • hid_size (int) – Number of neurons in the RNNs cell state. Defaults to 128.
  • chunk_size (int) – Window size of overlap-and-add processing. Defaults to 100.
  • hop_size (int or None) – Hop size (stride) of overlap-and-add processing. Defaults to chunk_size // 2 (50% overlap).
  • n_repeats (int) – Number of repeats. Defaults to 6.
  • norm_type (str, optional) – Type of normalization to use. Choose from 'gLN' (global Layernorm) or 'cLN' (channelwise Layernorm).
  • mask_act (str, optional) – Which non-linear function to use to generate the masks.
  • bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN (Intra-Chunk is always bidirectional).
  • rnn_type (str, optional) – Type of RNN used. Choose between 'RNN', 'LSTM' and 'GRU'.
  • num_layers (int, optional) – Number of layers in each RNN.
  • dropout (float, optional) – Dropout ratio, must be in [0,1].
References
[1] “Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation”, Yi Luo, Zhuo Chen and Takuya Yoshioka. https://arxiv.org/abs/1910.06379
forward(mixture_w)[source]

Forward.

Parameters: mixture_w (torch.Tensor) – Tensor of shape $(batch, nfilters, nframes)$
Returns: torch.Tensor – Estimated masks of shape $(batch, nsrc, nfilters, nframes)$
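Example (a minimal sketch using the shapes documented above; concrete sizes are illustrative):

    import torch
    from asteroid.masknn.recurrent import DPRNN

    masker = DPRNN(in_chan=64, n_src=2)
    mixture_w = torch.randn(4, 64, 500)   # (batch, nfilters, nframes)
    est_masks = masker(mixture_w)         # (batch, nsrc, nfilters, nframes)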
get_config()[source]
class asteroid.masknn.recurrent.LSTMMasker(in_chan, n_src, out_chan=None, rnn_type='lstm', n_layers=4, hid_size=512, dropout=0.3, mask_act='sigmoid', bidirectional=True)[source]

Bases: torch.nn.Module

LSTM mask network introduced in [1], without skip connections.

Parameters:
  • in_chan (int) – Number of input filters.
  • n_src (int) – Number of masks to estimate.
  • out_chan (int or None) – Number of bins in the estimated masks. Defaults to in_chan.
  • rnn_type (str, optional) – Type of RNN used. Choose between 'RNN', 'LSTM' and 'GRU'.
  • n_layers (int, optional) – Number of layers in each RNN.
  • hid_size (int) – Number of neurons in the RNNs cell state.
  • mask_act (str, optional) – Which non-linear function to use to generate the masks.
  • bidirectional (bool, optional) – Whether to use a bidirectional LSTM. Default: True.
  • dropout (float, optional) – Dropout ratio, must be in [0,1].
References
[1]: Yi Luo et al. “Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network”, Interspeech 2018
forward(x)[source]
get_config()[source]
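Example (a minimal sketch; it assumes the same tensor convention as DPRNN above, i.e. input of shape (batch, nfilters, nframes) and masks of shape (batch, nsrc, nfilters, nframes)):

    import torch
    from asteroid.masknn.recurrent import LSTMMasker

    masker = LSTMMasker(in_chan=512, n_src=2)
    tf_rep = torch.randn(4, 512, 300)   # assumed (batch, in_chan, nframes)
    est_masks = masker(tf_rep)          # assumed (batch, n_src, in_chan, nframes)
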
class asteroid.masknn.recurrent.DCCRMaskNetRNN(in_size, hid_size=128, rnn_type='LSTM', n_layers=2, norm_type=None, **rnn_kwargs)[source]

Bases: torch.nn.Module

RNN (LSTM) layer between encoders and decoders introduced in [1].

Parameters:
  • in_size (int) – Number of inputs to the RNN. Must be the product of the non-batch, non-time dimensions of the last encoder's output shape, i.e. if the last encoder output shape is $(batch, nchans, nfreqs, time)$, in_size must be $nchans * nfreqs$.
  • hid_size (int, optional) – Number of units in RNN.
  • rnn_type (str, optional) – Type of RNN to use. See SingleRNN for valid values.
  • n_layers (int, optional) – Number of layers used in RNN.
  • norm_type (Optional[str], optional) – Norm to use after linear. See asteroid.masknn.norms for valid values. (Not used in [1]).
  • rnn_kwargs (optional) – Passed to SingleRNN().
References
[1] : “DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement”, Yanxin Hu et al. https://arxiv.org/abs/2008.00264
forward(x)[source]

Input shape: [batch, …, time]

class asteroid.masknn.recurrent.DCCRMaskNet(encoders, decoders, n_freqs, **kwargs)[source]

Bases: asteroid.masknn.base.BaseDCUMaskNet

Masking part of DCCRNet, as proposed in [1].

Valid architecture values for the default_architecture classmethod are: “DCCRN” and “mini”.

Parameters:
  • encoders (list of length N of tuples of (in_chan, out_chan, kernel_size, stride, padding)) – Arguments of the U-Net encoders.
  • decoders (list of length N of tuples of (in_chan, out_chan, kernel_size, stride, padding)) – Arguments of the U-Net decoders.
  • n_freqs (int) – Number of frequencies (dim 1) of input to .forward(). Must be divisible by $f_0 * f_1 * … * f_N$ where $f_k$ are the frequency strides of the encoders (e.g. with frequency strides of 2, 2 and 2, n_freqs must be a multiple of 8).

Input shape is expected to be $(batch, nfreqs, time)$, with $nfreqs$ divisible by $f_0 * f_1 * … * f_N$ where $f_k$ are the frequency strides of the encoders.

References
[1] : “DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement”, Yanxin Hu et al. https://arxiv.org/abs/2008.00264
fix_input_dims(x)[source]

Override this in subclasses to implement input dimension checks.
