DNN building blocks¶
Convolutional blocks¶
-
class
asteroid.masknn.convolutional.
Conv1DBlock
(in_chan, hid_chan, skip_out_chan, kernel_size, padding, dilation, norm_type='gLN', causal=False)[source]¶ Bases:
sphinx.ext.autodoc.importer._MockObject
One dimensional convolutional block, as proposed in [1].
Parameters: - in_chan (int) – Number of input channels.
- hid_chan (int) – Number of hidden channels in the depth-wise convolution.
- skip_out_chan (int) – Number of channels in the skip convolution. If 0 or None, Conv1DBlock won’t have any skip connections. Corresponds to the block in v1 of the paper. The forward pass returns res instead of [res, skip] in this case.
- kernel_size (int) – Size of the depth-wise convolutional kernel.
- padding (int) – Padding of the depth-wise convolution.
- dilation (int) – Dilation of the depth-wise convolution.
- norm_type (str, optional) – Type of normalization to use. To choose from:
  - 'gLN': global Layernorm.
  - 'cLN': channelwise Layernorm.
  - 'cgLN': cumulative global Layernorm.
  - Any norm supported by get().
- causal (bool, optional) – Whether or not the convolutions are causal.
- References
- [1] : “Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation” TASLP 2019 Yi Luo, Nima Mesgarani https://arxiv.org/abs/1809.07454
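The relation between padding, kernel_size and dilation is easiest to see with numbers. The following is a minimal sketch, assuming a stride of 1 and the usual 1-D convolution output-length formula. The helper names `conv1d_out_len` and `same_padding` are hypothetical and not part of Asteroid, and the causal variant assumes that the surplus trailing frames are trimmed after the convolution.

```python
def conv1d_out_len(n_frames, kernel_size, padding, dilation):
    """Output length of a stride-1 1-D convolution padded on both sides."""
    return n_frames + 2 * padding - dilation * (kernel_size - 1)


def same_padding(kernel_size, dilation, causal=False):
    """Padding that lets the depth-wise convolution keep the frame count.

    Non-causal: half of the dilated kernel span on each side.
    Causal: the full span; the extra trailing frames are assumed to be
    trimmed after the convolution (an assumption of this sketch).
    """
    span = (kernel_size - 1) * dilation
    return span if causal else span // 2


pad = same_padding(kernel_size=3, dilation=4)
n_out = conv1d_out_len(100, kernel_size=3, padding=pad, dilation=4)

causal_pad = same_padding(kernel_size=3, dilation=4, causal=True)
causal_out = conv1d_out_len(100, kernel_size=3, padding=causal_pad, dilation=4)
causal_trimmed = causal_out - causal_pad  # drop the look-ahead frames
```

With an odd kernel_size, the non-causal padding keeps the number of frames unchanged, which is what a residual connection around the block requires.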
-
class
asteroid.masknn.convolutional.
TDConvNet
(in_chan, n_src, out_chan=None, n_blocks=8, n_repeats=3, bn_chan=128, hid_chan=512, skip_chan=128, conv_kernel_size=3, norm_type='gLN', mask_act='relu', causal=False)[source]¶ Bases:
sphinx.ext.autodoc.importer._MockObject
Temporal Convolutional network used in ConvTasnet.
Parameters: - in_chan (int) – Number of input filters.
- n_src (int) – Number of masks to estimate.
- out_chan (int, optional) – Number of bins in the estimated masks. If None, out_chan = in_chan.
- n_blocks (int, optional) – Number of convolutional blocks in each repeat. Defaults to 8.
- n_repeats (int, optional) – Number of repeats. Defaults to 3.
- bn_chan (int, optional) – Number of channels after the bottleneck.
- hid_chan (int, optional) – Number of channels in the convolutional blocks.
- skip_chan (int, optional) – Number of channels in the skip connections. If 0 or None, TDConvNet won’t have any skip connections and the masks will be computed from the residual output. Corresponds to the ConvTasnet architecture in v1 of the paper.
- conv_kernel_size (int, optional) – Kernel size in convolutional blocks.
- norm_type (str, optional) – To choose from 'BN', 'gLN', 'cLN'.
- mask_act (str, optional) – Which non-linear function to use to generate the mask.
- causal (bool, optional) – Whether or not the convolutions are causal.
- References
- [1] : “Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation” TASLP 2019 Yi Luo, Nima Mesgarani https://arxiv.org/abs/1809.07454
-
forward
(mixture_w)[source]¶ Forward.
Parameters: mixture_w (torch.Tensor) – Tensor of shape $(batch, nfilters, nframes)$
Returns: torch.Tensor – estimated mask of shape $(batch, nsrc, nfilters, nframes)$
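To get a feel for how n_blocks, n_repeats and conv_kernel_size interact, the temporal context of the network can be estimated by hand. The sketch below assumes that block x of every repeat uses a dilation of 2**x, as described in the Conv-TasNet paper [1]; the dilation schedule is not stated in the parameter list above, and the function names are hypothetical.

```python
def tcn_dilations(n_blocks=8, n_repeats=3):
    """Dilation of every block, assuming block x of each repeat uses 2**x."""
    return [2 ** x for _ in range(n_repeats) for x in range(n_blocks)]


def receptive_field(n_blocks=8, n_repeats=3, conv_kernel_size=3):
    """Number of input frames that can influence one output frame."""
    dilations = tcn_dilations(n_blocks, n_repeats)
    # Each stride-1 dilated convolution widens the context by (k - 1) * dilation.
    return 1 + sum((conv_kernel_size - 1) * d for d in dilations)


dilations = tcn_dilations()
rf = receptive_field()
```

Under these assumptions the default configuration covers 1531 frames, and adding one more block per repeat roughly doubles that context.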
-
class
asteroid.masknn.convolutional.
TDConvNetpp
(in_chan, n_src, out_chan=None, n_blocks=8, n_repeats=3, bn_chan=128, hid_chan=512, skip_chan=128, conv_kernel_size=3, norm_type='fgLN', mask_act='relu')[source]¶ Bases:
sphinx.ext.autodoc.importer._MockObject
Improved Temporal Convolutional network used in [1] (TDCN++)
Parameters: - in_chan (int) – Number of input filters.
- n_src (int) – Number of masks to estimate.
- out_chan (int, optional) – Number of bins in the estimated masks. If None, out_chan = in_chan.
- n_blocks (int, optional) – Number of convolutional blocks in each repeat. Defaults to 8.
- n_repeats (int, optional) – Number of repeats. Defaults to 3.
- bn_chan (int, optional) – Number of channels after the bottleneck.
- hid_chan (int, optional) – Number of channels in the convolutional blocks.
- skip_chan (int, optional) – Number of channels in the skip connections. If 0 or None, TDConvNetpp won’t have any skip connections and the masks will be computed from the residual output. Corresponds to the ConvTasnet architecture in v1 of the paper.
- conv_kernel_size (int, optional) – Kernel size in convolutional blocks.
- norm_type (str, optional) – To choose from 'BN', 'gLN', 'cLN'.
- mask_act (str, optional) – Which non-linear function to use to generate the mask.
- References
- [1] : Kavalerov, Ilya et al. “Universal Sound Separation.” in WASPAA 2019
Note
The differences with respect to ConvTasnet’s TCN are:
- Channel wise layer norm instead of global
- Longer-range skip-residual connections from earlier repeat inputs to later repeat inputs after passing them through dense layer.
- Learnable scaling parameter after each dense layer. The scaling parameter for the second dense layer in each convolutional block (which is applied right before the residual connection) is initialized to an exponentially decaying scalar equal to 0.9**L, where L is the layer or block index.
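As a quick illustration of the exponentially decaying initialization mentioned in the last item, the sketch below lists the initial values 0.9**L. It assumes that L starts at 0; which blocks are indexed, and whether the index restarts at every repeat, is not stated here, so `n_layers` and the function name are hypothetical.

```python
def initial_scales(n_layers, base=0.9):
    """Initial scale base**L for layer indices L = 0 .. n_layers - 1."""
    return [base ** layer for layer in range(n_layers)]


scales = initial_scales(8)
```

Deeper layers therefore start with a smaller contribution to the residual path, and the learnable parameter is free to move away from this value during training.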
-
forward
(mixture_w)[source]¶ Forward.
Parameters: mixture_w (torch.Tensor) – Tensor of shape $(batch, nfilters, nframes)$
Returns: torch.Tensor – estimated mask of shape $(batch, nsrc, nfilters, nframes)$
-
class
asteroid.masknn.convolutional.
DCUNetComplexEncoderBlock
(in_chan, out_chan, kernel_size, stride, padding, norm_type='bN', activation='leaky_relu')[source]¶ Bases:
sphinx.ext.autodoc.importer._MockObject
Encoder block as proposed in [1].
Parameters: - in_chan (int) – Number of input channels.
- out_chan (int) – Number of output channels.
- kernel_size (Tuple[int, int]) – Convolution kernel size.
- stride (Tuple[int, int]) – Convolution stride.
- padding (Tuple[int, int]) – Convolution padding.
- norm_type (str, optional) – Type of normalization to use. See norms for valid values.
- activation (str, optional) – Type of activation to use. See activations for valid values.
- References
- [1] : “Phase-aware Speech Enhancement with Deep Complex U-Net”, Hyeong-Seok Choi et al. https://arxiv.org/abs/1903.03107
-
class
asteroid.masknn.convolutional.
DCUNetComplexDecoderBlock
(in_chan, out_chan, kernel_size, stride, padding, output_padding=(0, 0), norm_type='bN', activation='leaky_relu')[source]¶ Bases:
sphinx.ext.autodoc.importer._MockObject
Decoder block as proposed in [1].
Parameters: - in_chan (int) – Number of input channels.
- out_chan (int) – Number of output channels.
- kernel_size (Tuple[int, int]) – Convolution kernel size.
- stride (Tuple[int, int]) – Convolution stride.
- padding (Tuple[int, int]) – Convolution padding.
- norm_type (str, optional) – Type of normalization to use. See norms for valid values.
- activation (str, optional) – Type of activation to use. See activations for valid values.
- References
- [1] : “Phase-aware Speech Enhancement with Deep Complex U-Net”, Hyeong-Seok Choi et al. https://arxiv.org/abs/1903.03107
-
class
asteroid.masknn.convolutional.
DCUMaskNet
(encoders, decoders, fix_length_mode=None, **kwargs)[source]¶ Bases:
asteroid.masknn.base.BaseDCUMaskNet
Masking part of DCUNet, as proposed in [1].
Valid architecture values for the default_architecture classmethod are: “Large-DCUNet-20”, “DCUNet-20”, “DCUNet-16”, “DCUNet-10” and “mini”.
Valid fix_length_mode values are [None, “pad”, “trim”].
Input shape is expected to be $(batch, nfreqs, time)$, with $nfreqs - 1$ divisible by $f_0 * f_1 * … * f_N$ where $f_k$ are the frequency strides of the encoders, and $time - 1$ divisible by $t_0 * t_1 * … * t_N$ where $t_k$ are the time strides of the encoders.
- References
- [1] : “Phase-aware Speech Enhancement with Deep Complex U-Net”, Hyeong-Seok Choi et al. https://arxiv.org/abs/1903.03107
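The shape constraint on the DCUMaskNet input can be checked before calling the network. The sketch below is plain Python and not Asteroid code: the helper names and the example strides are hypothetical, and it assumes that “pad” grows and “trim” shrinks an axis to the nearest size that satisfies the constraint.

```python
from math import prod


def is_valid_size(size, strides):
    """True if size - 1 is divisible by the product of the encoder strides."""
    return (size - 1) % prod(strides) == 0


def fixed_size(size, strides, mode):
    """Nearest valid size: rounded up for "pad", rounded down for "trim"."""
    total = prod(strides)
    n_steps, remainder = divmod(size - 1, total)
    if remainder == 0:
        return size
    if mode == "pad":
        return (n_steps + 1) * total + 1
    if mode == "trim":
        return n_steps * total + 1
    raise ValueError(f"Unknown mode: {mode!r}")


time_strides = [1, 2, 2, 2]  # hypothetical time strides of four encoders
padded = fixed_size(100, time_strides, "pad")
trimmed = fixed_size(100, time_strides, "trim")
```

The same check applies to the frequency axis with the frequency strides in place of the time strides.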
-
class
asteroid.masknn.convolutional.
SuDORMRF
(in_chan, n_src, bn_chan=128, num_blocks=16, upsampling_depth=4, mask_act='softmax')[source]¶ Bases:
sphinx.ext.autodoc.importer._MockObject
SuDORMRF mask network, as described in [1].
Parameters: - in_chan (int) – Number of input channels. Also number of output channels.
- n_src (int) – Number of sources in the input mixtures.
- bn_chan (int, optional) – Number of bins in the bottleneck layer and the UNet blocks.
- num_blocks (int) – Number of UBlocks.
- upsampling_depth (int) – Depth of upsampling.
- mask_act (str) – Name of output activation.
- References
- [1] : “Sudo rm -rf: Efficient Networks for Universal Audio Source Separation”, Tzinis et al. MLSP 2020.
-
class
asteroid.masknn.convolutional.
SuDORMRFImproved
(in_chan, n_src, bn_chan=128, num_blocks=16, upsampling_depth=4, mask_act='relu')[source]¶ Bases:
sphinx.ext.autodoc.importer._MockObject
Improved SuDORMRF mask network, as described in [1].
Parameters: - in_chan (int) – Number of input channels. Also number of output channels.
- n_src (int) – Number of sources in the input mixtures.
- bn_chan (int, optional) – Number of bins in the bottleneck layer and the UNet blocks.
- num_blocks (int) – Number of UBlocks.
- upsampling_depth (int) – Depth of upsampling.
- mask_act (str) – Name of output activation.
- References
- [1] : “Sudo rm -rf: Efficient Networks for Universal Audio Source Separation”, Tzinis et al. MLSP 2020.
-
class
asteroid.masknn.convolutional.
UBlock
(out_chan=128, in_chan=512, upsampling_depth=4)[source]¶ Bases:
asteroid.masknn.convolutional._BaseUBlock
Upsampling block.
Based on the following principle:
REDUCE ---> SPLIT ---> TRANSFORM --> MERGE
Recurrent blocks¶
-
class
asteroid.masknn.recurrent.
SingleRNN
(rnn_type, input_size, hidden_size, n_layers=1, dropout=0, bidirectional=False)[source]¶ Bases:
sphinx.ext.autodoc.importer._MockObject
Module for a RNN block.
Inspired from https://github.com/yluo42/TAC/blob/master/utility/models.py Licensed under CC BY-NC-SA 3.0 US.
Parameters: - rnn_type (str) – Select from 'RNN', 'LSTM', 'GRU'. Can also be passed in lowercase letters.
- input_size (int) – Dimension of the input feature. The input should have shape [batch, seq_len, input_size].
- hidden_size (int) – Dimension of the hidden state.
- n_layers (int, optional) – Number of layers used in RNN. Default is 1.
- dropout (float, optional) – Dropout ratio. Default is 0.
- bidirectional (bool, optional) – Whether the RNN layers are bidirectional. Default is False.
-
class
asteroid.masknn.recurrent.
MulCatRNN
(rnn_type, input_size, hidden_size, n_layers=1, dropout=0, bidirectional=False)[source]¶ Bases:
sphinx.ext.autodoc.importer._MockObject
MulCat RNN block from [1].
Composed of two RNNs, returns cat([RNN_1(x) * RNN_2(x), x]).
Parameters: - rnn_type (str) – Select from 'RNN', 'LSTM', 'GRU'. Can also be passed in lowercase letters.
- input_size (int) – Dimension of the input feature. The input should have shape [batch, seq_len, input_size].
- hidden_size (int) – Dimension of the hidden state.
- n_layers (int, optional) – Number of layers used in RNN. Default is 1.
- dropout (float, optional) – Dropout ratio. Default is 0.
- bidirectional (bool, optional) – Whether the RNN layers are bidirectional. Default is False.
- References
- [1] Eliya Nachmani, Yossi Adi, & Lior Wolf. (2020). Voice Separation with an Unknown Number of Multiple Speakers.
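The multiply-and-concatenate rule cat([RNN_1(x) * RNN_2(x), x]) can be illustrated on a single feature vector. In the sketch below the two RNNs are replaced by hypothetical stand-in functions that map 3 input features to 2 output features; only the combination rule is taken from the description above.

```python
def mulcat(x, rnn_1, rnn_2):
    """Return cat([rnn_1(x) * rnn_2(x), x]) for one feature vector x (a list)."""
    gated = [a * b for a, b in zip(rnn_1(x), rnn_2(x))]
    return gated + list(x)


def double(x):
    """Stand-in for RNN_1: 3 input features -> 2 output features."""
    return [2.0 * v for v in x[:2]]


def shift(x):
    """Stand-in for RNN_2: 3 input features -> 2 output features."""
    return [v + 1.0 for v in x[:2]]


out = mulcat([1.0, 2.0, 3.0], double, shift)
```

The output feature size is the RNN output size plus input_size, so a layer that follows this block has to expect that wider input.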
-
class
asteroid.masknn.recurrent.
StackedResidualRNN
(rnn_type, n_units, n_layers=4, dropout=0.0, bidirectional=False)[source]¶ Bases:
sphinx.ext.autodoc.importer._MockObject
Stacked RNN with builtin residual connection. Only supports forward RNNs. See StackedResidualBiRNN for bidirectional ones.
Parameters: - rnn_type (str) – Select from 'RNN', 'LSTM', 'GRU'. Can also be passed in lowercase letters.
- n_units (int) – Number of units in recurrent layers. This will also be the expected input size.
- n_layers (int) – Number of recurrent layers.
- dropout (float) – Dropout value, between 0. and 1. (Default: 0.)
- bidirectional (bool) – If True, use bidirectional RNN, else unidirectional. (Default: False)
-
class
asteroid.masknn.recurrent.
StackedResidualBiRNN
(rnn_type, n_units, n_layers=4, dropout=0.0, bidirectional=True)[source]¶ Bases:
sphinx.ext.autodoc.importer._MockObject
Stacked Bidirectional RNN with builtin residual connection. Residual connections are applied on both RNN directions. Only supports bidirectional RNNs. See StackedResidualRNN for unidirectional ones.
Parameters: - rnn_type (str) – Select from 'RNN', 'LSTM', 'GRU'. Can also be passed in lowercase letters.
- n_units (int) – Number of units in recurrent layers. This will also be the expected input size.
- n_layers (int) – Number of recurrent layers.
- dropout (float) – Dropout value, between 0. and 1. (Default: 0.)
- bidirectional (bool) – If True, use bidirectional RNN, else unidirectional. (Default: True)
-
class
asteroid.masknn.recurrent.
DPRNNBlock
(in_chan, hid_size, norm_type='gLN', bidirectional=True, rnn_type='LSTM', use_mulcat=False, num_layers=1, dropout=0)[source]¶ Bases:
sphinx.ext.autodoc.importer._MockObject
Dual-Path RNN Block as proposed in [1].
Parameters: - in_chan (int) – Number of input channels.
- hid_size (int) – Number of hidden neurons in the RNNs.
- norm_type (str, optional) – Type of normalization to use. To choose from:
  - 'gLN': global Layernorm
  - 'cLN': channelwise Layernorm
- bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN.
- rnn_type (str, optional) – Type of RNN used. Choose from 'RNN', 'LSTM' and 'GRU'.
- num_layers (int, optional) – Number of layers used in each RNN.
- dropout (float, optional) – Dropout ratio. Must be in [0, 1].
- References
- [1] “Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation”, Yi Luo, Zhuo Chen and Takuya Yoshioka. https://arxiv.org/abs/1910.06379
-
class
asteroid.masknn.recurrent.
DPRNN
(in_chan, n_src, out_chan=None, bn_chan=128, hid_size=128, chunk_size=100, hop_size=None, n_repeats=6, norm_type='gLN', mask_act='relu', bidirectional=True, rnn_type='LSTM', use_mulcat=False, num_layers=1, dropout=0)[source]¶ Bases:
sphinx.ext.autodoc.importer._MockObject
Dual-path RNN Network for Single-Channel Source Separation introduced in [1].
Parameters: - in_chan (int) – Number of input filters.
- n_src (int) – Number of masks to estimate.
- out_chan (int or None) – Number of bins in the estimated masks. Defaults to in_chan.
- bn_chan (int) – Number of channels after the bottleneck. Defaults to 128.
- hid_size (int) – Number of neurons in the RNNs cell state. Defaults to 128.
- chunk_size (int) – Window size of overlap and add processing. Defaults to 100.
- hop_size (int or None) – Hop size (stride) of overlap and add processing. Defaults to chunk_size // 2 (50% overlap).
- n_repeats (int) – Number of repeats. Defaults to 6.
- norm_type (str, optional) – Type of normalization to use. To choose from:
  - 'gLN': global Layernorm
  - 'cLN': channelwise Layernorm
- mask_act (str, optional) – Which non-linear function to generate mask.
- bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN (Intra-Chunk is always bidirectional).
- rnn_type (str, optional) – Type of RNN used. Choose between 'RNN', 'LSTM' and 'GRU'.
- num_layers (int, optional) – Number of layers in each RNN.
- dropout (float, optional) – Dropout ratio, must be in [0,1].
- References
- [1] “Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation”, Yi Luo, Zhuo Chen and Takuya Yoshioka. https://arxiv.org/abs/1910.06379
-
forward
(mixture_w)[source]¶ Forward.
Parameters: mixture_w (torch.Tensor) – Tensor of shape $(batch, nfilters, nframes)$
Returns: torch.Tensor – estimated mask of shape $(batch, nsrc, nfilters, nframes)$
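The roles of chunk_size and hop_size in the overlap and add processing can be illustrated on a plain list of frames. This is a simplified sketch and not Asteroid's implementation: it pads only at the end of the sequence, and it averages overlapping frames when adding the chunks back, both of which are choices made for this example. It also assumes hop_size <= chunk_size.

```python
import math


def segment(frames, chunk_size=100, hop_size=None):
    """Split frames into overlapping chunks, zero-padding the tail."""
    hop = chunk_size // 2 if hop_size is None else hop_size
    n = len(frames)
    n_chunks = 1 if n <= chunk_size else 1 + math.ceil((n - chunk_size) / hop)
    padded_len = chunk_size + (n_chunks - 1) * hop
    padded = list(frames) + [0.0] * (padded_len - n)
    return [padded[i * hop : i * hop + chunk_size] for i in range(n_chunks)]


def overlap_add(chunks, hop_size, n_frames):
    """Add the chunks back at their offsets and average the overlaps."""
    chunk_size = len(chunks[0])
    total = chunk_size + (len(chunks) - 1) * hop_size
    summed = [0.0] * total
    counts = [0] * total
    for i, chunk in enumerate(chunks):
        for j, value in enumerate(chunk):
            summed[i * hop_size + j] += value
            counts[i * hop_size + j] += 1
    return [s / c for s, c in zip(summed, counts)][:n_frames]


frames = [float(t) for t in range(260)]
chunks = segment(frames)  # chunk_size=100, hop of 50 frames
restored = overlap_add(chunks, hop_size=50, n_frames=len(frames))
```

With the default 50% overlap, most frames appear in two chunks, so the intra-chunk and inter-chunk RNNs see every frame in two different local contexts.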
-
class
asteroid.masknn.recurrent.
LSTMMasker
(in_chan, n_src, out_chan=None, rnn_type='lstm', n_layers=4, hid_size=512, dropout=0.3, mask_act='sigmoid', bidirectional=True)[source]¶ Bases:
sphinx.ext.autodoc.importer._MockObject
LSTM mask network introduced in [1], without skip connections.
Parameters: - in_chan (int) – Number of input filters.
- n_src (int) – Number of masks to estimate.
- out_chan (int or None) – Number of bins in the estimated masks. Defaults to in_chan.
- rnn_type (str, optional) – Type of RNN used. Choose between 'RNN', 'LSTM' and 'GRU'.
- n_layers (int, optional) – Number of layers in each RNN.
- hid_size (int) – Number of neurons in the RNNs cell state.
- mask_act (str, optional) – Which non-linear function to generate mask.
- bidirectional (bool, optional) – Whether to use BiLSTM
- dropout (float, optional) – Dropout ratio, must be in [0,1].
- References
- [1]: Yi Luo et al. “Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network”, Interspeech 2018
-
class
asteroid.masknn.recurrent.
DCCRMaskNetRNN
(in_size, hid_size=128, rnn_type='LSTM', n_layers=2, norm_type=None, **rnn_kwargs)[source]¶ Bases:
sphinx.ext.autodoc.importer._MockObject
RNN (LSTM) layer between encoders and decoders introduced in [1].
Parameters: - in_size (int) – Number of inputs to the RNN. Must be the product of non-batch, non-time dimensions of output shape of last encoder, i.e. if the last encoder output shape is $(batch, nchans, nfreqs, time)$, in_size must be $nchans * nfreqs$.
- hid_size (int, optional) – Number of units in RNN.
- rnn_type (str, optional) – Type of RNN to use. See SingleRNN for valid values.
- n_layers (int, optional) – Number of layers used in RNN.
- norm_type (Optional[str], optional) – Norm to use after linear. See asteroid.masknn.norms for valid values. (Not used in [1]).
- rnn_kwargs (optional) – Passed to SingleRNN().
- References
- [1] : “DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement”, Yanxin Hu et al. https://arxiv.org/abs/2008.00264
-
class
asteroid.masknn.recurrent.
DCCRMaskNet
(encoders, decoders, n_freqs, **kwargs)[source]¶ Bases:
asteroid.masknn.base.BaseDCUMaskNet
Masking part of DCCRNet, as proposed in [1].
Valid architecture values for the default_architecture classmethod are: “DCCRN” and “mini”.
Parameters: - encoders (list of length N of tuples of (in_chan, out_chan, kernel_size, stride, padding)) – Arguments of encoders of the u-net
- decoders (list of length N of tuples of (in_chan, out_chan, kernel_size, stride, padding)) – Arguments of decoders of the u-net
- n_freqs (int) – Number of frequencies (dim 1) of input to .forward(). Must be divisible by $f_0 * f_1 * … * f_N$ where $f_k$ are the frequency strides of the encoders.
Input shape is expected to be $(batch, nfreqs, time)$, with $nfreqs$ divisible by $f_0 * f_1 * … * f_N$ where $f_k$ are the frequency strides of the encoders.
- References
- [1] : “DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement”, Yanxin Hu et al. https://arxiv.org/abs/2008.00264
Attention blocks¶
-
class
asteroid.masknn.attention.
ImprovedTransformedLayer
(embed_dim, n_heads, dim_ff, dropout=0.0, activation='relu', bidirectional=True, norm='gLN')[source]¶ Bases:
sphinx.ext.autodoc.importer._MockObject
Improved Transformer module as used in [1]. It is Multi-Head self-attention followed by LSTM, activation and linear projection layer.
Parameters: - embed_dim (int) – Number of input channels.
- n_heads (int) – Number of attention heads.
- dim_ff (int) – Number of neurons in the RNNs cell state. Defaults to 256. RNN here replaces standard FF linear layer in plain Transformer.
- dropout (float, optional) – Dropout ratio, must be in [0,1].
- activation (str, optional) – activation function applied at the output of RNN.
- bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN (Intra-Chunk is always bidirectional).
- norm (str, optional) – Type of normalization to use.
- References
- [1] Chen, Jingjing, Qirong Mao, and Dong Liu. “Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation.” arXiv (2020).
-
class
asteroid.masknn.attention.
DPTransformer
(in_chan, n_src, n_heads=4, ff_hid=256, chunk_size=100, hop_size=None, n_repeats=6, norm_type='gLN', ff_activation='relu', mask_act='relu', bidirectional=True, dropout=0)[source]¶ Bases:
sphinx.ext.autodoc.importer._MockObject
Dual-path Transformer introduced in [1].
Parameters: - in_chan (int) – Number of input filters.
- n_src (int) – Number of masks to estimate.
- n_heads (int) – Number of attention heads.
- ff_hid (int) – Number of neurons in the RNNs cell state. Defaults to 256.
- chunk_size (int) – Window size of overlap and add processing. Defaults to 100.
- hop_size (int or None) – Hop size (stride) of overlap and add processing. Defaults to chunk_size // 2 (50% overlap).
- n_repeats (int) – Number of repeats. Defaults to 6.
- norm_type (str, optional) – Type of normalization to use.
- ff_activation (str, optional) – activation function applied at the output of RNN.
- mask_act (str, optional) – Which non-linear function to generate mask.
- bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN (Intra-Chunk is always bidirectional).
- dropout (float, optional) – Dropout ratio, must be in [0,1].
- References
- [1] Chen, Jingjing, Qirong Mao, and Dong Liu. “Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation.” arXiv (2020).
-
forward
(mixture_w)[source]¶ Forward.
Parameters: mixture_w (torch.Tensor) – Tensor of shape $(batch, nfilters, nframes)$
Returns: torch.Tensor – estimated mask of shape $(batch, nsrc, nfilters, nframes)$
Norms¶
-
class
asteroid.masknn.norms.
GlobLN
(channel_size)[source]¶ Bases:
asteroid.masknn.norms._LayerNorm
Global Layer Normalization (globLN).
-
forward
(x, EPS: float = 1e-08)[source]¶ Applies forward pass.
Works for any input size > 2D.
Parameters: x (torch.Tensor) – Shape [batch, chan, *]
Returns: torch.Tensor – gLN_x [batch, chan, *]
-
-
class
asteroid.masknn.norms.
ChanLN
(channel_size)[source]¶ Bases:
asteroid.masknn.norms._LayerNorm
Channel-wise Layer Normalization (chanLN).
-
forward
(x, EPS: float = 1e-08)[source]¶ Applies forward pass.
Works for any input size > 2D.
Parameters: x (torch.Tensor) – [batch, chan, *]
Returns: torch.Tensor – chanLN_x [batch, chan, *]
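The difference between the global and the channel-wise normalization is which axes the statistics are computed over. The sketch below works on a single example x[chan][time] stored as nested lists. It is a simplified illustration and not the Asteroid code: the learnable gain and bias are left out, the variance is the biased (population) variance, and the function names are hypothetical.

```python
import math


def glob_ln(x, eps=1e-8):
    """Global layer norm: one mean/variance over all channels and frames."""
    values = [v for row in x for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [[(v - mean) / math.sqrt(var + eps) for v in row] for row in x]


def chan_ln(x, eps=1e-8):
    """Channel-wise layer norm: a separate mean/variance for every frame."""
    n_chan, n_time = len(x), len(x[0])
    out = [[0.0] * n_time for _ in range(n_chan)]
    for t in range(n_time):
        column = [x[c][t] for c in range(n_chan)]
        mean = sum(column) / n_chan
        var = sum((v - mean) ** 2 for v in column) / n_chan
        for c in range(n_chan):
            out[c][t] = (x[c][t] - mean) / math.sqrt(var + eps)
    return out


x = [[1.0, 2.0, 3.0], [3.0, 6.0, 9.0]]  # 2 channels, 3 frames
g = glob_ln(x)
c = chan_ln(x)
```

Because the channel-wise statistics of a frame do not depend on any other frame, this variant can be used in causal models, whereas the global variant needs the whole sequence.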
-
-
class
asteroid.masknn.norms.
CumLN
(channel_size)[source]¶ Bases:
asteroid.masknn.norms._LayerNorm
Cumulative Global layer normalization (cumLN).
-
forward
(x, EPS: float = 1e-08)[source]¶
Parameters: x (torch.Tensor) – Shape [batch, channels, length]
Returns: torch.Tensor – cumLN_x [batch, channels, length]
-
-
class
asteroid.masknn.norms.
FeatsGlobLN
(channel_size)[source]¶ Bases:
asteroid.masknn.norms._LayerNorm
Feature-wise global Layer Normalization (FeatsGlobLN). Applies normalization over frames for each channel.
-
forward
(x, EPS: float = 1e-08)[source]¶ Applies forward pass.
Works for any input size > 2D.
Parameters: x (torch.Tensor) – [batch, chan, time]
Returns: torch.Tensor – chanLN_x [batch, chan, time]
-
-
class
asteroid.masknn.norms.
BatchNorm
(*args, **kwargs)[source]¶ Bases:
sphinx.ext.autodoc.importer._MockObject
Wrapper class for PyTorch BatchNorm1d and BatchNorm2d.
-
asteroid.masknn.norms.
gLN
[source]¶ alias of
asteroid.masknn.norms.GlobLN
-
asteroid.masknn.norms.
fgLN
[source]¶ alias of
asteroid.masknn.norms.FeatsGlobLN
-
asteroid.masknn.norms.
cLN
[source]¶ alias of
asteroid.masknn.norms.ChanLN
-
asteroid.masknn.norms.
cgLN
[source]¶ alias of
asteroid.masknn.norms.CumLN
-
asteroid.masknn.norms.
bN
[source]¶ alias of
asteroid.masknn.norms.BatchNorm
-
asteroid.masknn.norms.
register_norm
(custom_norm)[source]¶ Register a custom norm, gettable with norms.get.
Parameters: custom_norm – Custom norm to register.
Complex number support¶
Complex building blocks that work with PyTorch native (!) complex tensors, i.e. dtypes complex64/complex128, or tensors for which .is_complex() returns True.
Note that Asteroid code has two other representations of complex numbers:
- Torchaudio representation […, 2] where […, 0] and […, 1] are real and imaginary components, respectively
- Asteroid style representation, identical to the Torchaudio representation, but with the last dimension concatenated: tensor([r1, r2, …, rn, i1, i2, …, in]). The concatenated (2 * n) dimension may be at an arbitrary position, i.e. the tensor is of shape […, 2 * n, …]. See asteroid_filterbanks.transforms for details.
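The two real-valued layouts can be illustrated with plain Python lists in place of tensors. This is only a sketch of the layouts described in the list above, using a single 1-D vector; the helper names are hypothetical and not part of Asteroid.

```python
def to_torchaudio_style(values):
    """[c1, ..., cn] -> [[r1, i1], ..., [rn, in]] (trailing dimension of size 2)."""
    return [[v.real, v.imag] for v in values]


def to_asteroid_style(values):
    """[c1, ..., cn] -> [r1, ..., rn, i1, ..., in] (halves concatenated)."""
    return [v.real for v in values] + [v.imag for v in values]


def from_asteroid_style(flat):
    """Inverse of to_asteroid_style for a flat list of length 2 * n."""
    half = len(flat) // 2
    return [complex(r, i) for r, i in zip(flat[:half], flat[half:])]


z = [1 + 2j, 3 - 4j]
pairs = to_torchaudio_style(z)
flat = to_asteroid_style(z)
restored = from_asteroid_style(flat)
```

Both layouts hold the same numbers; they differ only in whether the real and imaginary parts are interleaved along a new trailing axis or concatenated along an existing one.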
-
asteroid.complex_nn.
as_torch_complex
(x, asteroid_dim: int = -2)[source]¶ Convert complex x to a PyTorch native complex tensor. Input may be one of:
- PyTorch native complex
- Torchaudio style complex
- Asteroid style complex
- Tuple or list of (real, imaginary) components
Parameters: asteroid_dim (int, optional) – Dimension to check for Asteroid-style complex.
Raises: ValueError – If type of x is not understood.
-
asteroid.complex_nn.
on_reim
(f)[source]¶ Make a complex-valued function callable from a real-valued one by applying it to the real and imaginary components independently.
Returns: cf(x), complex version of f – A function that applies f to the real and imaginary components of x and returns the result as PyTorch complex tensor.
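The idea behind on_reim can be sketched with Python's built-in complex numbers in place of PyTorch tensors. The name `on_reim_sketch` is hypothetical, and the example only mirrors the behaviour described above rather than the library code.

```python
import math


def on_reim_sketch(f):
    """Wrap a real-valued function so it acts on real and imaginary parts."""

    def cf(x):
        # Apply f to each component independently and rebuild a complex number.
        return complex(f(x.real), f(x.imag))

    return cf


tanh_on_parts = on_reim_sketch(math.tanh)
y = tanh_on_parts(0.5 - 2.0j)
```

Note that this is not the same as the complex-analytic extension of the function: applying tanh to each component differs from `cmath.tanh` applied to the complex number.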
-
class
asteroid.complex_nn.
OnReIm
(module_cls, *args, **kwargs)[source]¶ Bases:
sphinx.ext.autodoc.importer._MockObject
Like on_reim, but for stateful modules.
Parameters: module_cls (callable) – A class or function that returns a Torch module/functional. Called 2x with *args, **kwargs, to construct the real and imaginary component modules.
-
class
asteroid.complex_nn.
ComplexMultiplicationWrapper
(module_cls, *args, **kwargs)[source]¶ Bases:
sphinx.ext.autodoc.importer._MockObject
Make a complex-valued module F from a real-valued module f by applying complex multiplication rules:
F(a + i b) = f1(a) - f2(b) + i (f1(b) + f2(a))
where f1, f2 are instances of f that do not share weights.
Parameters: module_cls (callable) – A class or function that returns a Torch module/functional. Constructor of f in the formula above. Called 2x with *args, **kwargs, to construct the real and imaginary component modules.
-
class
asteroid.complex_nn.
ComplexSingleRNN
(rnn_type, input_size, hidden_size, n_layers=1, dropout=0, bidirectional=False)[source]¶ Bases:
sphinx.ext.autodoc.importer._MockObject
Module for a complex RNN block.
This is similar to asteroid.masknn.recurrent.SingleRNN but uses complex multiplication as described in [1]. Arguments are identical to those of SingleRNN, except for dropout, which is not yet supported.
Parameters: - rnn_type (str) – Select from 'RNN', 'LSTM', 'GRU'. Can also be passed in lowercase letters.
- input_size (int) – Dimension of the input feature. The input should have shape [batch, seq_len, input_size].
- hidden_size (int) – Dimension of the hidden state.
- n_layers (int, optional) – Number of layers used in RNN. Default is 1.
- bidirectional (bool, optional) – Whether the RNN layers are bidirectional. Default is False.
- dropout – Not yet supported.
- References
- [1] : “DCCRN: Deep Complex Convolution Recurrent Network for Phase-Aware Speech Enhancement”, Yanxin Hu et al. https://arxiv.org/abs/2008.00264
-
class
asteroid.complex_nn.
BoundComplexMask
(bound_type)[source]¶ Bases:
sphinx.ext.autodoc.importer._MockObject
Module version of bound_complex_mask
-
asteroid.complex_nn.
bound_complex_mask
(mask, bound_type='tanh')[source]¶ Bound a complex mask, as proposed in [1], section 3.2.
Valid bound types, for a complex mask \(M = |M| ⋅ e^{i φ(M)}\):
- Unbounded (“UBD”): \(M_{\mathrm{UBD}} = M\)
- Sigmoid (“BDSS”): \(M_{\mathrm{BDSS}} = σ(|M|) e^{i σ(φ(M))}\)
- Tanh (“BDT”): \(M_{\mathrm{BDT}} = \mathrm{tanh}(|M|) e^{i φ(M)}\)
Parameters: bound_type (str or None) – The type of bound to use, either of “tanh”/“bdt” (default), “sigmoid”/“bdss” or None/“ubd”.
- References
- [1] : “Phase-aware Speech Enhancement with Deep Complex U-Net”, Hyeong-Seok Choi et al. https://arxiv.org/abs/1903.03107
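The three bound types can be written out for a single complex number with the standard library. The sketch below follows the formulas listed above literally; the function name is hypothetical and it operates on a Python complex rather than a tensor, so it is an illustration and not the library implementation.

```python
import cmath
import math


def sigmoid(v):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-v))


def bound_complex_mask_sketch(m, bound_type="tanh"):
    """Bound one complex mask value m = |m| * exp(i * phase)."""
    if bound_type in (None, "ubd"):
        return m  # unbounded: mask is returned unchanged
    magnitude, phase = cmath.polar(m)
    if bound_type in ("sigmoid", "bdss"):
        # Sigmoid applied to both the magnitude and the phase.
        return cmath.rect(sigmoid(magnitude), sigmoid(phase))
    if bound_type in ("tanh", "bdt"):
        # Magnitude squashed into [0, 1), phase left untouched.
        return cmath.rect(math.tanh(magnitude), phase)
    raise ValueError(f"Unknown bound_type: {bound_type!r}")


m = 3 + 4j
bdt = bound_complex_mask_sketch(m)
bdss = bound_complex_mask_sketch(m, "sigmoid")
ubd = bound_complex_mask_sketch(m, None)
```

The tanh bound keeps the phase of the mask and only limits its magnitude, which is why the mask can still rotate the phase of the mixture freely.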