asteroid.masknn.convolutional module
class asteroid.masknn.convolutional.Conv1DBlock(in_chan, hid_chan, skip_out_chan, kernel_size, padding, dilation, norm_type='gLN')

One-dimensional convolutional block, as proposed in [1].
Parameters:
- in_chan (int) – Number of input channels.
- hid_chan (int) – Number of hidden channels in the depth-wise convolution.
- skip_out_chan (int) – Number of channels in the skip convolution. If 0 or None, Conv1DBlock won't have any skip connections. Corresponds to the block in v1 of the paper. In this case, forward returns res instead of [res, skip].
- kernel_size (int) – Size of the depth-wise convolutional kernel.
- padding (int) – Padding of the depth-wise convolution.
- dilation (int) – Dilation of the depth-wise convolution.
- norm_type (str, optional) – Type of normalization to use. To choose from:
  - 'gLN': global Layernorm.
  - 'cLN': channelwise Layernorm.
  - 'cgLN': cumulative global Layernorm.
  - Any norm supported by get().
- References
- [1] : “Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation” TASLP 2019 Yi Luo, Nima Mesgarani https://arxiv.org/abs/1809.07454
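To illustrate how the padding and dilation arguments interact, the following sketch (plain arithmetic, not the library's code; it assumes PyTorch's standard 1-D convolution length formula) computes the output frame count and the padding that keeps it unchanged:

```python
def conv1d_out_len(n_frames, kernel_size, padding, dilation, stride=1):
    """Output length of a 1-D convolution (PyTorch convention)."""
    return (n_frames + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

def same_padding(kernel_size, dilation):
    """Padding that preserves the frame count (odd kernel, stride 1),
    i.e. the value typically passed as `padding` for a block like this."""
    return dilation * (kernel_size - 1) // 2

# With kernel_size=3 and dilation=4, padding=4 keeps 100 frames at 100.
print(conv1d_out_len(100, 3, same_padding(3, 4), 4))
```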
class asteroid.masknn.convolutional.TDConvNet(in_chan, n_src, out_chan=None, n_blocks=8, n_repeats=3, bn_chan=128, hid_chan=512, skip_chan=128, conv_kernel_size=3, norm_type='gLN', mask_act='relu')

Temporal Convolutional network used in ConvTasnet.
Parameters:
- in_chan (int) – Number of input filters.
- n_src (int) – Number of masks to estimate.
- out_chan (int, optional) – Number of bins in the estimated masks. If None, out_chan = in_chan.
- n_blocks (int, optional) – Number of convolutional blocks in each repeat. Defaults to 8.
- n_repeats (int, optional) – Number of repeats. Defaults to 3.
- bn_chan (int, optional) – Number of channels after the bottleneck.
- hid_chan (int, optional) – Number of channels in the convolutional blocks.
- skip_chan (int, optional) – Number of channels in the skip connections. If 0 or None, TDConvNet won't have any skip connections and the masks will be computed from the residual output. Corresponds to the ConvTasnet architecture in v1 of the paper.
- conv_kernel_size (int, optional) – Kernel size in convolutional blocks.
- norm_type (str, optional) – To choose from 'BN', 'gLN', 'cLN'.
- mask_act (str, optional) – Which non-linear function to use to generate the mask.
- References
- [1] : “Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation” TASLP 2019 Yi Luo, Nima Mesgarani https://arxiv.org/abs/1809.07454
forward(mixture_w)

Forward.

Parameters: mixture_w (torch.Tensor) – Tensor of shape $(batch, nfilters, nframes)$
Returns: torch.Tensor – estimated mask of shape $(batch, nsrc, nfilters, nframes)$
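One useful consequence of n_blocks and n_repeats is the temporal receptive field of the mask estimate. As a sketch (assuming, as in the Conv-TasNet paper, that the dilation doubles at each block within a repeat: d = 1, 2, 4, …), the receptive field in frames is:

```python
def tcn_receptive_field(n_blocks=8, n_repeats=3, kernel_size=3):
    """Receptive field (in frames) of a stack of dilated conv blocks,
    assuming dilation 2**b at block b of each repeat (Conv-TasNet scheme)."""
    per_repeat = sum((kernel_size - 1) * 2 ** b for b in range(n_blocks))
    return 1 + n_repeats * per_repeat

# Default TDConvNet configuration: 1 + 3 * 2 * (2**8 - 1) = 1531 frames.
print(tcn_receptive_field())
```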
class asteroid.masknn.convolutional.TDConvNetpp(in_chan, n_src, out_chan=None, n_blocks=8, n_repeats=3, bn_chan=128, hid_chan=512, skip_chan=128, conv_kernel_size=3, norm_type='fgLN', mask_act='relu')

Improved Temporal Convolutional network used in [1] (TDCN++).
Parameters:
- in_chan (int) – Number of input filters.
- n_src (int) – Number of masks to estimate.
- out_chan (int, optional) – Number of bins in the estimated masks. If None, out_chan = in_chan.
- n_blocks (int, optional) – Number of convolutional blocks in each repeat. Defaults to 8.
- n_repeats (int, optional) – Number of repeats. Defaults to 3.
- bn_chan (int, optional) – Number of channels after the bottleneck.
- hid_chan (int, optional) – Number of channels in the convolutional blocks.
- skip_chan (int, optional) – Number of channels in the skip connections. If 0 or None, TDConvNetpp won't have any skip connections and the masks will be computed from the residual output. Corresponds to the ConvTasnet architecture in v1 of the paper.
- kernel_size (int, optional) – Kernel size in convolutional blocks.
- norm_type (str, optional) – To choose from 'BN', 'gLN', 'cLN'.
- mask_act (str, optional) – Which non-linear function to use to generate the mask.
- References
- [1] : Kavalerov, Ilya et al. “Universal Sound Separation.” in WASPAA 2019
Note
The differences with respect to ConvTasnet's TCN are:
- Channel-wise layer norm instead of global.
- Longer-range skip-residual connections from earlier repeat inputs to later repeat inputs, after passing them through a dense layer.
- A learnable scaling parameter after each dense layer. The scaling parameter for the second dense layer in each convolutional block (which is applied right before the residual connection) is initialized to an exponentially decaying scalar equal to 0.9**L, where L is the layer or block index.
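The exponentially decaying initialization in the last point can be sketched in a few lines (assuming 0-based block indexing, which is a guess not stated above):

```python
def scale_init(n_layers):
    """Initial values of the learnable per-block scaling parameters,
    0.9**L for block index L, as described for TDCN++."""
    return [0.9 ** L for L in range(n_layers)]

# First blocks start near 1.0, deeper blocks start progressively smaller,
# which damps the residual contribution of deep blocks at initialization.
print(scale_init(3))
```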
forward(mixture_w)

Forward.

Parameters: mixture_w (torch.Tensor) – Tensor of shape $(batch, nfilters, nframes)$
Returns: torch.Tensor – estimated mask of shape $(batch, nsrc, nfilters, nframes)$
class asteroid.masknn.convolutional.DCUNetComplexEncoderBlock(in_chan, out_chan, kernel_size, stride, padding, norm_type='bN', activation='leaky_relu')

Encoder block as proposed in [1].
Parameters:
- in_chan (int) – Number of input channels.
- out_chan (int) – Number of output channels.
- kernel_size (Tuple[int, int]) – Convolution kernel size.
- stride (Tuple[int, int]) – Convolution stride.
- padding (Tuple[int, int]) – Convolution padding.
- norm_type (str, optional) – Type of normalization to use. See norms for valid values.
- activation (str, optional) – Type of activation to use. See activations for valid values.
- References
- [1] : “Phase-aware Speech Enhancement with Deep Complex U-Net”, Hyeong-Seok Choi et al. https://arxiv.org/abs/1903.03107
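Since kernel_size, stride and padding are all (freq, time) tuples here, it helps to see how they determine the encoder's output spectrogram size. A minimal sketch, assuming PyTorch's standard 2-D convolution shape formula (the example numbers are illustrative, not from the paper):

```python
def conv2d_out_shape(in_shape, kernel_size, stride, padding):
    """Spatial (freq, time) output shape of a 2-D convolution
    (PyTorch convention, dilation 1)."""
    return tuple(
        (i + 2 * p - k) // s + 1
        for i, k, s, p in zip(in_shape, kernel_size, stride, padding)
    )

# A 257 x 100 spectrogram through a (7, 5) kernel, stride (2, 2),
# padding (3, 2) comes out as 129 x 50.
print(conv2d_out_shape((257, 100), (7, 5), (2, 2), (3, 2)))
```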
class asteroid.masknn.convolutional.DCUNetComplexDecoderBlock(in_chan, out_chan, kernel_size, stride, padding, output_padding=(0, 0), norm_type='bN', activation='leaky_relu')

Decoder block as proposed in [1].
Parameters:
- in_chan (int) – Number of input channels.
- out_chan (int) – Number of output channels.
- kernel_size (Tuple[int, int]) – Convolution kernel size.
- stride (Tuple[int, int]) – Convolution stride.
- padding (Tuple[int, int]) – Convolution padding.
- norm_type (str, optional) – Type of normalization to use. See norms for valid values.
- activation (str, optional) – Type of activation to use. See activations for valid values.
- References
- [1] : “Phase-aware Speech Enhancement with Deep Complex U-Net”, Hyeong-Seok Choi et al. https://arxiv.org/abs/1903.03107
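The decoder's extra output_padding argument exists because a strided convolution maps several input sizes to the same output size, so the transposed convolution needs a hint to recover the exact original size. A sketch using PyTorch's transposed-convolution shape formula (illustrative numbers, not from the paper):

```python
def conv_transpose2d_out_shape(in_shape, kernel_size, stride, padding,
                               output_padding=(0, 0)):
    """Spatial (freq, time) output shape of a 2-D transposed convolution
    (PyTorch convention, dilation 1)."""
    return tuple(
        (i - 1) * s - 2 * p + k + op
        for i, k, s, p, op in zip(in_shape, kernel_size, stride, padding,
                                  output_padding)
    )

# Without output_padding the time axis comes back one frame short (99);
# output_padding=(0, 1) restores the original 257 x 100.
print(conv_transpose2d_out_shape((129, 50), (7, 5), (2, 2), (3, 2)))
print(conv_transpose2d_out_shape((129, 50), (7, 5), (2, 2), (3, 2), (0, 1)))
```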
class asteroid.masknn.convolutional.DCUMaskNet(encoders, decoders, fix_length_mode=None, **kwargs)

Bases: asteroid.masknn.base.BaseDCUMaskNet

Masking part of DCUNet, as proposed in [1].
Valid architecture values for the default_architecture classmethod are: “Large-DCUNet-20”, “DCUNet-20”, “DCUNet-16”, “DCUNet-10” and “mini”.

Valid fix_length_mode values are [None, “pad”, “trim”].

Input shape is expected to be $(batch, nfreqs, time)$, with $nfreqs - 1$ divisible by $f_0 * f_1 * … * f_N$, where $f_k$ are the frequency strides of the encoders, and $time - 1$ divisible by $t_0 * t_1 * … * t_N$, where $t_k$ are the time strides of the encoders.
- References
- [1] : “Phase-aware Speech Enhancement with Deep Complex U-Net”, Hyeong-Seok Choi et al. https://arxiv.org/abs/1903.03107
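The divisibility constraint above can be checked up front before building an input. A small sketch (the stride lists are hypothetical examples, not a particular DCUNet architecture):

```python
from math import prod

def valid_dcunet_input(n_freqs, n_time, freq_strides, time_strides):
    """Check the DCUMaskNet input-shape constraint: (n_freqs - 1) must be
    divisible by the product of the encoder frequency strides, and
    (n_time - 1) by the product of the time strides."""
    return ((n_freqs - 1) % prod(freq_strides) == 0
            and (n_time - 1) % prod(time_strides) == 0)

# 257 x 65 works for three stride-2 encoders (256 and 64 are both
# divisible by 8); 256 x 65 does not.
print(valid_dcunet_input(257, 65, [2, 2, 2], [2, 2, 2]))
print(valid_dcunet_input(256, 65, [2, 2, 2], [2, 2, 2]))
```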
class asteroid.masknn.convolutional.SuDORMRF(in_chan, n_src, bn_chan=128, num_blocks=16, upsampling_depth=4, mask_act='softmax')

SuDORMRF mask network, as described in [1].
Parameters: - in_chan (int) – Number of input channels. Also number of output channels.
- n_src (int) – Number of sources in the input mixtures.
- bn_chan (int, optional) – Number of bins in the bottleneck layer and the UNet blocks.
- num_blocks (int) – Number of UBlocks.
- upsampling_depth (int) – Depth of upsampling.
- mask_act (str) – Name of output activation.
- References
- [1] : “Sudo rm -rf: Efficient Networks for Universal Audio Source Separation”, Tzinis et al. MLSP 2020.
class asteroid.masknn.convolutional.SuDORMRFImproved(in_chan, n_src, bn_chan=128, num_blocks=16, upsampling_depth=4, mask_act='relu')

Improved SuDORMRF mask network, as described in [1].
Parameters: - in_chan (int) – Number of input channels. Also number of output channels.
- n_src (int) – Number of sources in the input mixtures.
- bn_chan (int, optional) – Number of bins in the bottleneck layer and the UNet blocks.
- num_blocks (int) – Number of UBlocks.
- upsampling_depth (int) – Depth of upsampling.
- mask_act (str) – Name of output activation.
- References
- [1] : “Sudo rm -rf: Efficient Networks for Universal Audio Source Separation”, Tzinis et al. MLSP 2020.
class asteroid.masknn.convolutional.UBlock(out_chan=128, in_chan=512, upsampling_depth=4)

Bases: asteroid.masknn.convolutional._BaseUBlock

Upsampling block.
Based on the following principle:
REDUCE ---> SPLIT ---> TRANSFORM ---> MERGE
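The principle can be illustrated with a deliberately toy sketch on a plain Python list (this is not the UBlock implementation; the reduce/transform steps are stand-ins chosen only to make the four stages visible):

```python
def ublock_principle(x, depth=3):
    """Toy illustration of REDUCE -> SPLIT -> TRANSFORM -> MERGE
    on a 1-D signal whose length is a power of two."""
    # REDUCE: project the input to a cheaper representation (here: halve).
    reduced = [v / 2 for v in x]
    # SPLIT: build successively downsampled scales (stride-2 max pooling).
    scales = [reduced]
    for _ in range(depth - 1):
        prev = scales[-1]
        scales.append([max(prev[i], prev[i + 1])
                       for i in range(0, len(prev) - 1, 2)])
    # TRANSFORM: process each scale independently (here: add a constant).
    transformed = [[v + 1 for v in s] for s in scales]
    # MERGE: upsample every scale back to full length and sum.
    out = [0.0] * len(x)
    for s in transformed:
        factor = len(x) // len(s)  # nearest-neighbor upsampling
        for i in range(len(x)):
            out[i] += s[min(i // factor, len(s) - 1)]
    return out

print(ublock_principle([1.0] * 8, depth=3))
```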