asteroid.masknn package

class asteroid.masknn.TDConvNet(in_chan, n_src, out_chan=None, n_blocks=8, n_repeats=3, bn_chan=128, hid_chan=512, skip_chan=128, conv_kernel_size=3, norm_type='gLN', mask_act='relu', kernel_size=None)

Bases: torch.nn.Module

Temporal convolutional network used in ConvTasNet, introduced in [1].

Parameters:

in_chan (int) – Number of input filters.
n_src (int) – Number of masks to estimate.
out_chan (int, optional) – Number of bins in the estimated masks. If None, out_chan = in_chan.
n_blocks (int, optional) – Number of convolutional blocks in each repeat. Defaults to 8.
n_repeats (int, optional) – Number of repeats. Defaults to 3.
bn_chan (int, optional) – Number of channels after the bottleneck. Defaults to 128.
hid_chan (int, optional) – Number of channels in the convolutional blocks. Defaults to 512.
skip_chan (int, optional) – Number of channels in the skip connections. If 0 or None, TDConvNet won't have any skip connections and the masks will be computed from the residual output. Corresponds to the ConvTasnet architecture in v1 or the paper. Defaults to 128.
conv_kernel_size (int, optional) – Kernel size in convolutional blocks. Defaults to 3.
norm_type (str, optional) – Type of normalization to use. One of 'BN', 'gLN', 'cLN'. Defaults to 'gLN'.
mask_act (str, optional) – Non-linear function used to generate the mask. Defaults to 'relu'.
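
To get a feel for how n_blocks, n_repeats and conv_kernel_size interact, the following sketch estimates the receptive field of the convolutional stack in frames. It assumes, as in the Conv-TasNet paper, that block b of every repeat uses dilation 2 ** b with stride 1; the helper name tcn_receptive_field is hypothetical and not part of asteroid.

```python
def tcn_receptive_field(n_blocks=8, n_repeats=3, conv_kernel_size=3):
    """Estimate the receptive field (in frames) of a stack of dilated convolutions.

    Assumption: block ``b`` in every repeat uses dilation ``2 ** b`` and stride 1.
    """
    # Each dilated convolution widens the receptive field by (kernel - 1) * dilation.
    growth_per_repeat = sum(
        (conv_kernel_size - 1) * 2 ** b for b in range(n_blocks)
    )
    return 1 + n_repeats * growth_per_repeat


if __name__ == "__main__":
    # Defaults from the signature: 8 blocks, 3 repeats, kernel size 3.
    print(tcn_receptive_field())
```

Under that assumption, adding a block roughly doubles the context seen by each repeat, whereas adding a repeat only grows it linearly.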

References

[1] Yi Luo and Nima Mesgarani, “Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation”, TASLP 2019. https://arxiv.org/abs/1809.07454

forward(mixture_w)

Parameters:	mixture_w (`torch.Tensor`) – Tensor of shape [batch, n_filters, n_frames]
Returns:	`torch.Tensor` – estimated mask of shape [batch, n_src, n_filters, n_frames]
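
The estimated masks are typically multiplied element-wise with the mixture representation, once per source. The sketch below illustrates that step on plain nested lists for a single batch item, assuming out_chan equals in_chan; apply_masks is a hypothetical helper, and with tensors the same operation would usually be a broadcasted multiplication.

```python
def apply_masks(mixture_w, masks):
    """Multiply one mixture representation by each source mask.

    mixture_w: nested list of shape [n_filters][n_frames]
    masks:     nested list of shape [n_src][n_filters][n_frames]
    returns:   nested list of shape [n_src][n_filters][n_frames]
    """
    return [
        [
            [m * x for m, x in zip(mask_row, mix_row)]
            for mask_row, mix_row in zip(mask, mixture_w)
        ]
        for mask in masks
    ]


# Toy input: 2 filters, 3 frames, 2 sources.
mixture_w = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
masks = [
    [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5]],  # mask for source 0
    [[0.0, 1.0, 0.5], [1.0, 0.0, 0.5]],  # mask for source 1
]
masked = apply_masks(mixture_w, masks)
```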

get_config()

class asteroid.masknn.DPRNN(in_chan, n_src, out_chan=None, bn_chan=128, hid_size=128, chunk_size=100, hop_size=None, n_repeats=6, norm_type='gLN', mask_act='relu', bidirectional=True, rnn_type='LSTM', num_layers=1, dropout=0)

Bases: torch.nn.Module

Dual-path RNN network for single-channel source separation, introduced in [1].

Parameters:

in_chan (int) – Number of input filters.
n_src (int) – Number of masks to estimate.
out_chan (int or None) – Number of bins in the estimated masks. Defaults to in_chan.
bn_chan (int) – Number of channels after the bottleneck. Defaults to 128.
hid_size (int) – Number of neurons in the RNN cell state. Defaults to 128.
chunk_size (int) – Window size of overlap-and-add processing. Defaults to 100.
hop_size (int or None) – Hop size (stride) of overlap-and-add processing. Defaults to chunk_size // 2 (50% overlap).
n_repeats (int) – Number of repeats. Defaults to 6.
norm_type (str, optional) – Type of normalization to use. One of:
- 'gLN': global Layernorm
- 'cLN': channelwise Layernorm
mask_act (str, optional) – Non-linear function used to generate the mask.
bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN (Intra-Chunk is always bidirectional).
rnn_type (str, optional) – Type of RNN used. Choose between 'RNN', 'LSTM' and 'GRU'.
num_layers (int, optional) – Number of layers in each RNN.
dropout (float, optional) – Dropout ratio, must be in [0,1].
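
The relationship between chunk_size and hop_size can be illustrated with a small sketch that cuts a sequence of frames into overlapping chunks and reassembles it. This is only an illustration under simplifying assumptions: no padding is applied, any trailing remainder is dropped, and overlapping regions are averaged so that the round trip is exact (a plain overlap-add would only sum). The helpers segment and overlap_add are hypothetical names, not the library's functions.

```python
def segment(frames, chunk_size, hop_size=None):
    """Split a sequence into overlapping chunks (no padding, remainder dropped)."""
    if hop_size is None:
        # Default mirrors the documented behaviour: 50% overlap (at least 1 frame).
        hop_size = max(1, chunk_size // 2)
    starts = range(0, len(frames) - chunk_size + 1, hop_size)
    return [frames[s:s + chunk_size] for s in starts], hop_size


def overlap_add(chunks, hop_size, n_frames):
    """Sum chunks back at their offsets and average where they overlap."""
    total = [0.0] * n_frames
    count = [0] * n_frames
    for i, chunk in enumerate(chunks):
        for j, value in enumerate(chunk):
            total[i * hop_size + j] += value
            count[i * hop_size + j] += 1
    return [t / c if c else 0.0 for t, c in zip(total, count)]


frames = [float(i) for i in range(10)]
chunks, hop = segment(frames, chunk_size=4)  # hop_size falls back to 4 // 2 = 2
restored = overlap_add(chunks, hop, len(frames))
```

With the default 50% overlap, every interior frame belongs to two chunks, which is why the reconstruction needs the per-frame count.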

References

[1] Yi Luo, Zhuo Chen and Takuya Yoshioka, “Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation”. https://arxiv.org/abs/1910.06379

forward(mixture_w)

Parameters:	mixture_w (`torch.Tensor`) – Tensor of shape [batch, n_filters, n_frames]
Returns:	`torch.Tensor` – estimated mask of shape [batch, n_src, n_filters, n_frames]

get_config()

class asteroid.masknn.DPTransformer(in_chan, n_src, n_heads=4, ff_hid=256, chunk_size=100, hop_size=None, n_repeats=6, norm_type='gLN', ff_activation='relu', mask_act='relu', bidirectional=True, dropout=0)

Bases: torch.nn.Module

Dual-path Transformer, introduced in [1].

Parameters:

in_chan (int) – Number of input filters.
n_src (int) – Number of masks to estimate.
n_heads (int) – Number of attention heads.
ff_hid (int) – Number of neurons in the RNN cell state. Defaults to 256.
chunk_size (int) – Window size of overlap-and-add processing. Defaults to 100.
hop_size (int or None) – Hop size (stride) of overlap-and-add processing. Defaults to chunk_size // 2 (50% overlap).
n_repeats (int) – Number of repeats. Defaults to 6.
norm_type (str, optional) – Type of normalization to use.
ff_activation (str, optional) – Activation function applied at the output of the RNN.
mask_act (str, optional) – Non-linear function used to generate the mask.
bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN (Intra-Chunk is always bidirectional).
dropout (float, optional) – Dropout ratio, must be in [0,1].

References

[1] Chen, Jingjing, Qirong Mao, and Dong Liu. “Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation.” arXiv preprint arXiv:2007.13975 (2020).

forward(mixture_w)

Parameters:	mixture_w (`torch.Tensor`) – Tensor of shape [batch, n_filters, n_frames]
Returns:	`torch.Tensor` – estimated mask of shape [batch, n_src, n_filters, n_frames]

get_config()

class asteroid.masknn.LSTMMasker(in_chan, n_src, out_chan=None, rnn_type='lstm', n_layers=4, hid_size=512, dropout=0.3, mask_act='sigmoid', bidirectional=True)

Bases: torch.nn.Module

LSTM mask network introduced in [1], without skip connections.

Parameters:

in_chan (int) – Number of input filters.
n_src (int) – Number of masks to estimate.
out_chan (int or None) – Number of bins in the estimated masks. Defaults to in_chan.
rnn_type (str, optional) – Type of RNN used. Choose between 'RNN', 'LSTM' and 'GRU'.
n_layers (int, optional) – Number of layers in each RNN.
hid_size (int) – Number of neurons in the RNN cell state.
mask_act (str, optional) – Non-linear function used to generate the mask.
bidirectional (bool, optional) – Whether to use a bidirectional LSTM (BiLSTM).
dropout (float, optional) – Dropout ratio, must be in [0,1].

References

[1] Yi Luo et al., “Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network”, Interspeech 2018.

forward(x)

get_config()

class asteroid.masknn.SuDORMRF(in_chan, n_src, bn_chan=128, num_blocks=16, upsampling_depth=4, mask_act='softmax')

Bases: torch.nn.Module

SuDORMRF mask network, as described in [1].

Parameters:

in_chan (int) – Number of input channels. Also number of output channels.
n_src (int) – Number of sources in the input mixtures.
bn_chan (int, optional) – Number of bins in the bottleneck layer and the UNet blocks.
num_blocks (int) – Number of UBlocks.
upsampling_depth (int) – Depth of upsampling.
mask_act (str) – Name of output activation.
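
Because the UBlocks repeatedly downsample and upsample along the time axis, it can help to keep the number of frames a multiple of the overall resampling factor. The sketch below assumes that factor is 2 ** upsampling_depth, which is an assumption made for illustration and not something stated in this reference; padded_length is a hypothetical helper.

```python
def padded_length(n_frames, upsampling_depth=4):
    """Smallest length >= n_frames that is a multiple of 2 ** upsampling_depth.

    Assumption: the network resamples by a total factor of 2 ** upsampling_depth.
    """
    factor = 2 ** upsampling_depth
    # Ceiling division written with integer arithmetic, then scaled back up.
    return -(-n_frames // factor) * factor


if __name__ == "__main__":
    print(padded_length(100))  # number of frames after padding for depth 4
```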

References

[1] Tzinis et al., “Sudo rm -rf: Efficient Networks for Universal Audio Source Separation”, MLSP 2020.

forward(x)

get_config()

class asteroid.masknn.SuDORMRFImproved(in_chan, n_src, bn_chan=128, num_blocks=16, upsampling_depth=4, mask_act='relu')

Bases: torch.nn.Module

Improved SuDORMRF mask network, as described in [1].

Parameters:

in_chan (int) – Number of input channels. Also number of output channels.
n_src (int) – Number of sources in the input mixtures.
bn_chan (int, optional) – Number of bins in the bottleneck layer and the UNet blocks.
num_blocks (int) – Number of UBlocks.
upsampling_depth (int) – Depth of upsampling.
mask_act (str) – Name of output activation.

References

[1] Tzinis et al., “Sudo rm -rf: Efficient Networks for Universal Audio Source Separation”, MLSP 2020.

forward(x)

get_config()

Submodules