asteroid.masknn.convolutional module
class asteroid.masknn.convolutional.Conv1DBlock(in_chan, hid_chan, skip_out_chan, kernel_size, padding, dilation, norm_type='gLN')
One dimensional convolutional block, as proposed in [1].
Parameters: - in_chan (int) – Number of input channels.
- hid_chan (int) – Number of hidden channels in the depth-wise convolution.
- skip_out_chan (int) – Number of channels in the skip convolution. If 0 or None, Conv1DBlock won’t have any skip connections. Corresponds to the block in v1 of the paper. forward returns res instead of [res, skip] in this case.
- kernel_size (int) – Size of the depth-wise convolutional kernel.
- padding (int) – Padding of the depth-wise convolution.
- dilation (int) – Dilation of the depth-wise convolution.
- norm_type (str, optional) – Type of normalization to use. To choose from:
  'gLN': global Layernorm
  'cLN': channelwise Layernorm
  'cgLN': cumulative global Layernorm
References
[1] : “Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation” TASLP 2019 Yi Luo, Nima Mesgarani https://arxiv.org/abs/1809.07454
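Because Conv1DBlock takes padding and dilation explicitly, the caller computes a length-preserving padding itself. A minimal sketch of that arithmetic, assuming odd kernels and the standard 1-D convolution length formula (the helper names are ours, not part of asteroid's API):

```python
def same_padding(kernel_size: int, dilation: int) -> int:
    """Padding that keeps the depth-wise convolution length-preserving (odd kernels)."""
    return (kernel_size - 1) * dilation // 2

def conv1d_out_len(n: int, kernel_size: int, padding: int, dilation: int, stride: int = 1) -> int:
    """Standard 1-D convolution output-length formula."""
    return (n + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# With kernel_size=3 and dilations doubling per block (as in Conv-TasNet),
# each block stays length-preserving:
for d in (1, 2, 4, 8):
    p = same_padding(3, d)
    assert conv1d_out_len(100, 3, p, d) == 100
```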
class asteroid.masknn.convolutional.DCUMaskNet(encoders, decoders, mask_bound='tanh', **kwargs)
Bases: asteroid.masknn.base.BaseDCUMaskNet
Masking part of DCUNet, as proposed in [1].
Valid architecture values for the default_architecture classmethod are: “Large-DCUNet-20”, “DCUNet-20”, “DCUNet-16”, “DCUNet-10”.
References
[1] : “Phase-aware Speech Enhancement with Deep Complex U-Net”, Hyeong-Seok Choi et al. https://arxiv.org/abs/1903.03107
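A small, hypothetical guard over the preset names listed above; only the name list comes from this documentation, while the helper itself is ours, not asteroid code:

```python
# The architecture presets accepted by DCUMaskNet.default_architecture,
# per the documentation above.
VALID_ARCHITECTURES = ("Large-DCUNet-20", "DCUNet-20", "DCUNet-16", "DCUNet-10")

def check_architecture(name: str) -> str:
    """Hypothetical helper: validate a preset name before dispatching to it."""
    if name not in VALID_ARCHITECTURES:
        raise ValueError(f"Unknown DCUNet architecture: {name!r}")
    return name

assert check_architecture("DCUNet-16") == "DCUNet-16"
```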
class asteroid.masknn.convolutional.DCUNetComplexDecoderBlock(in_chan, out_chan, kernel_size, stride, padding, norm_type='bN', activation='leaky_relu')
Decoder block as proposed in [1].
Parameters: - in_chan (int) – Number of input channels.
- out_chan (int) – Number of output channels.
- kernel_size (Tuple[int, int]) – Convolution kernel size.
- stride (Tuple[int, int]) – Convolution stride.
- padding (Tuple[int, int]) – Convolution padding.
- norm_type (str, optional) – Type of normalization to use. See asteroid.masknn.norms for valid values.
- activation (str, optional) – Type of activation to use. See asteroid.masknn.activations for valid values.
References
[1] : “Phase-aware Speech Enhancement with Deep Complex U-Net”, Hyeong-Seok Choi et al. https://arxiv.org/abs/1903.03107
class asteroid.masknn.convolutional.DCUNetComplexEncoderBlock(in_chan, out_chan, kernel_size, stride, padding, norm_type='bN', activation='leaky_relu')
Encoder block as proposed in [1].
Parameters: - in_chan (int) – Number of input channels.
- out_chan (int) – Number of output channels.
- kernel_size (Tuple[int, int]) – Convolution kernel size.
- stride (Tuple[int, int]) – Convolution stride.
- padding (Tuple[int, int]) – Convolution padding.
- norm_type (str, optional) – Type of normalization to use. See asteroid.masknn.norms for valid values.
- activation (str, optional) – Type of activation to use. See asteroid.masknn.activations for valid values.
References
[1] : “Phase-aware Speech Enhancement with Deep Complex U-Net”, Hyeong-Seok Choi et al. https://arxiv.org/abs/1903.03107
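Both the encoder and decoder blocks above take (freq, time) tuples for kernel_size, stride, and padding. A minimal sketch of the resulting 2-D output-shape arithmetic, assuming the standard convolution formula with dilation 1 (the helper and the example numbers are ours, not asteroid's):

```python
def conv2d_out_shape(in_shape, kernel_size, stride, padding):
    """Output (H, W) of a standard 2-D convolution, dilation = 1."""
    return tuple(
        (n + 2 * p - k) // s + 1
        for n, k, s, p in zip(in_shape, kernel_size, stride, padding)
    )

# e.g. a spectrogram of 257 frequency bins x 200 frames, with a (7, 5)
# kernel, (2, 2) stride, and (3, 2) padding, halves both dimensions:
assert conv2d_out_shape((257, 200), (7, 5), (2, 2), (3, 2)) == (129, 100)
```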
class asteroid.masknn.convolutional.SuDORMRF(in_chan, n_src, bn_chan=128, num_blocks=16, upsampling_depth=4, mask_act='softmax')
SuDORMRF mask network, as described in [1].
Parameters: - in_chan (int) – Number of input channels. Also number of output channels.
- n_src (int) – Number of sources in the input mixtures.
- bn_chan (int, optional) – Number of bins in the bottleneck layer and the UNet blocks.
- num_blocks (int) – Number of UBlocks.
- upsampling_depth (int) – Depth of upsampling.
- mask_act (str) – Name of output activation.
References
[1] : “Sudo rm -rf: Efficient Networks for Universal Audio Source Separation”, Tzinis et al. MLSP 2020.
class asteroid.masknn.convolutional.SuDORMRFImproved(in_chan, n_src, bn_chan=128, num_blocks=16, upsampling_depth=4, mask_act='relu')
Improved SuDORMRF mask network, as described in [1].
Parameters: - in_chan (int) – Number of input channels. Also number of output channels.
- n_src (int) – Number of sources in the input mixtures.
- bn_chan (int, optional) – Number of bins in the bottleneck layer and the UNet blocks.
- num_blocks (int) – Number of UBlocks.
- upsampling_depth (int) – Depth of upsampling.
- mask_act (str) – Name of output activation.
References
[1] : “Sudo rm -rf: Efficient Networks for Universal Audio Source Separation”, Tzinis et al. MLSP 2020.
class asteroid.masknn.convolutional.TDConvNet(in_chan, n_src, out_chan=None, n_blocks=8, n_repeats=3, bn_chan=128, hid_chan=512, skip_chan=128, conv_kernel_size=3, norm_type='gLN', mask_act='relu', kernel_size=None)
Temporal Convolutional network used in ConvTasnet.
Parameters: - in_chan (int) – Number of input filters.
- n_src (int) – Number of masks to estimate.
- out_chan (int, optional) – Number of bins in the estimated masks. If None, out_chan = in_chan.
- n_blocks (int, optional) – Number of convolutional blocks in each repeat. Defaults to 8.
- n_repeats (int, optional) – Number of repeats. Defaults to 3.
- bn_chan (int, optional) – Number of channels after the bottleneck.
- hid_chan (int, optional) – Number of channels in the convolutional blocks.
- skip_chan (int, optional) – Number of channels in the skip connections. If 0 or None, TDConvNet won’t have any skip connections and the masks will be computed from the residual output. Corresponds to the ConvTasnet architecture in v1 of the paper.
- conv_kernel_size (int, optional) – Kernel size in convolutional blocks.
- norm_type (str, optional) – To choose from 'BN', 'gLN', 'cLN'.
- mask_act (str, optional) – Which non-linear function to generate mask.
References
[1] : “Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation” TASLP 2019 Yi Luo, Nima Mesgarani https://arxiv.org/abs/1809.07454
forward(mixture_w)
Parameters: mixture_w (torch.Tensor) – Tensor of shape [batch, n_filters, n_frames]
Returns: torch.Tensor – estimated mask of shape [batch, n_src, n_filters, n_frames]
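The masks are estimated from a stack of n_repeats × n_blocks dilated convolutional blocks. Assuming, as in Conv-TasNet, that the dilation doubles with each block within a repeat (1, 2, 4, …, 2**(n_blocks-1)), the network's receptive field in frames can be sketched as follows (the helper is illustrative, not asteroid API):

```python
def tcn_receptive_field(n_blocks: int = 8, n_repeats: int = 3, kernel_size: int = 3) -> int:
    """Receptive field, in frames, of stacked dilated 1-D convolutions
    whose dilation doubles per block within each repeat."""
    per_repeat = sum((kernel_size - 1) * 2 ** b for b in range(n_blocks))
    return 1 + n_repeats * per_repeat

# With the defaults above (8 blocks, 3 repeats, kernel 3):
assert tcn_receptive_field() == 1 + 3 * 2 * (2 ** 8 - 1)  # 1531 frames
```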
class asteroid.masknn.convolutional.TDConvNetpp(in_chan, n_src, out_chan=None, n_blocks=8, n_repeats=3, bn_chan=128, hid_chan=512, skip_chan=128, conv_kernel_size=3, norm_type='fgLN', mask_act='relu')
Improved Temporal Convolutional network (TDCN++), as used in [1].
Parameters: - in_chan (int) – Number of input filters.
- n_src (int) – Number of masks to estimate.
- out_chan (int, optional) – Number of bins in the estimated masks. If None, out_chan = in_chan.
- n_blocks (int, optional) – Number of convolutional blocks in each repeat. Defaults to 8.
- n_repeats (int, optional) – Number of repeats. Defaults to 3.
- bn_chan (int, optional) – Number of channels after the bottleneck.
- hid_chan (int, optional) – Number of channels in the convolutional blocks.
- skip_chan (int, optional) – Number of channels in the skip connections. If 0 or None, TDConvNet won’t have any skip connections and the masks will be computed from the residual output. Corresponds to the ConvTasnet architecture in v1 of the paper.
- conv_kernel_size (int, optional) – Kernel size in convolutional blocks.
- norm_type (str, optional) – To choose from 'BN', 'gLN', 'cLN'.
- mask_act (str, optional) – Which non-linear function to generate mask.
References
[1] : Kavalerov, Ilya et al. “Universal Sound Separation.” in WASPAA 2019
Notes
The differences with respect to ConvTasnet’s TCN are:
1. Channel-wise layer norm instead of global.
2. Longer-range skip-residual connections from earlier repeat inputs to later repeat inputs, after passing them through a dense layer.
3. Learnable scaling parameter after each dense layer. The scaling parameter for the second dense layer in each convolutional block (which is applied right before the residual connection) is initialized to an exponentially decaying scalar equal to 0.9**L, where L is the layer or block index.
forward(mixture_w)
Parameters: mixture_w (torch.Tensor) – Tensor of shape [batch, n_filters, n_frames]
Returns: torch.Tensor – estimated mask of shape [batch, n_src, n_filters, n_frames]
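The exponentially decaying scale initialization described in the notes above can be sketched in a few lines; this is illustrative only, not asteroid code:

```python
# The notes say the second dense layer's scale in block L is
# initialized to 0.9**L, decaying exponentially with depth.
n_blocks = 8  # e.g. the default number of blocks per repeat
scales = [0.9 ** L for L in range(n_blocks)]

assert scales[0] == 1.0                 # first block starts at full scale
assert abs(scales[3] - 0.729) < 1e-12   # 0.9**3
```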
class asteroid.masknn.convolutional.UBlock(out_chan=128, in_chan=512, upsampling_depth=4)
Bases: asteroid.masknn.convolutional._BaseUBlock
Upsampling block.
Based on the following principle:
REDUCE ---> SPLIT ---> TRANSFORM ---> MERGE
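A schematic view of the principle, assuming (as in the SuDO-RM-RF paper) that SPLIT produces upsampling_depth resolutions by successive stride-2 downsampling before TRANSFORM and MERGE; this is pure shape bookkeeping, not asteroid code:

```python
def ublock_resolutions(n_frames: int, upsampling_depth: int = 4) -> list:
    """Frame counts at each scale produced by successive stride-2 downsampling:
    the full resolution plus upsampling_depth - 1 halvings."""
    res = [n_frames]
    for _ in range(upsampling_depth - 1):
        res.append(res[-1] // 2)
    return res

# With the default depth of 4, a 3200-frame input is processed at:
assert ublock_resolutions(3200, upsampling_depth=4) == [3200, 1600, 800, 400]
```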