asteroid.masknn.attention module¶
- class asteroid.masknn.attention.ImprovedTransformedLayer(embed_dim, n_heads, dim_ff, dropout=0.0, activation='relu', bidirectional=True, norm='gLN')[source]¶
  Improved Transformer module as used in [1]: multi-head self-attention followed by an LSTM, an activation function, and a linear projection layer.
Parameters: - embed_dim (int) – Number of input channels.
- n_heads (int) – Number of attention heads.
- dim_ff (int) – Number of neurons in the RNN's hidden state. Defaults to 256. The RNN here replaces the standard feed-forward linear layer of a plain Transformer.
- dropout (float, optional) – Dropout ratio, must be in [0,1].
- activation (str, optional) – Activation function applied at the output of the RNN.
- bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN (Intra-Chunk is always bidirectional).
- norm (str, optional) – Type of normalization to use.
- References
- [1] Chen, Jingjing, Qirong Mao, and Dong Liu. “Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation.” arXiv (2020).
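The block above can be sketched with plain PyTorch building blocks. This is a hedged illustration, not asteroid's implementation: `ImprovedTransformerSketch` and its internals are hypothetical names, `nn.LayerNorm` stands in for the configurable `norm` option, and residual connections are assumed around both sublayers as in a standard Transformer.

```python
# Hedged sketch (not asteroid code) of the "improved transformer" block
# described above: multi-head self-attention, then a bidirectional LSTM,
# activation, and linear projection in place of the usual feed-forward
# sublayer. LayerNorm is an assumption standing in for the `norm` option.
import torch
import torch.nn as nn

class ImprovedTransformerSketch(nn.Module):
    def __init__(self, embed_dim, n_heads, dim_ff, dropout=0.0, bidirectional=True):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, dropout=dropout)
        self.norm1 = nn.LayerNorm(embed_dim)
        self.rnn = nn.LSTM(embed_dim, dim_ff, bidirectional=bidirectional)
        rnn_out_dim = dim_ff * (2 if bidirectional else 1)
        self.act = nn.ReLU()
        self.proj = nn.Linear(rnn_out_dim, embed_dim)
        self.norm2 = nn.LayerNorm(embed_dim)

    def forward(self, x):
        # x: (seq_len, batch, embed_dim)
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)          # residual around self-attention
        rnn_out, _ = self.rnn(x)              # (seq_len, batch, rnn_out_dim)
        return self.norm2(x + self.proj(self.act(rnn_out)))

layer = ImprovedTransformerSketch(embed_dim=64, n_heads=4, dim_ff=256)
out = layer(torch.randn(100, 2, 64))
print(tuple(out.shape))  # (100, 2, 64)
```

Note that the layer is shape-preserving: the LSTM+projection sublayer maps back to `embed_dim`, so blocks can be stacked.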
- class asteroid.masknn.attention.DPTransformer(in_chan, n_src, n_heads=4, ff_hid=256, chunk_size=100, hop_size=None, n_repeats=6, norm_type='gLN', ff_activation='relu', mask_act='relu', bidirectional=True, dropout=0)[source]¶
  Dual-path Transformer introduced in [1].
Parameters: - in_chan (int) – Number of input filters.
- n_src (int) – Number of masks to estimate.
- n_heads (int) – Number of attention heads.
- ff_hid (int) – Number of neurons in the RNN's hidden state. Defaults to 256.
- chunk_size (int) – Window size of the overlap-and-add processing. Defaults to 100.
- hop_size (int or None) – Hop size (stride) of the overlap-and-add processing. Defaults to chunk_size // 2 (50% overlap).
- n_repeats (int) – Number of repeats. Defaults to 6.
- norm_type (str, optional) – Type of normalization to use.
- ff_activation (str, optional) – Activation function applied at the output of the RNN.
- mask_act (str, optional) – Non-linear function used to generate the masks.
- bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN (Intra-Chunk is always bidirectional).
- dropout (float, optional) – Dropout ratio, must be in [0,1].
- References
- [1] Chen, Jingjing, Qirong Mao, and Dong Liu. “Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation.” arXiv (2020).
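The interaction of `chunk_size` and `hop_size` can be illustrated with a small arithmetic sketch. This is an assumption-labeled illustration of the dual-path chunking, not asteroid's code: `n_chunks` is a hypothetical helper, and it assumes the frame axis is zero-padded so every frame falls inside at least one chunk.

```python
# Hedged illustration (not asteroid code) of the dual-path chunking:
# with chunk_size=100 and the default hop_size=chunk_size // 2, the
# frame axis is split into 50%-overlapping windows; the end is assumed
# to be padded so the last chunk is full.
import math

def n_chunks(n_frames, chunk_size=100, hop_size=None):
    if hop_size is None:
        hop_size = chunk_size // 2  # documented default: 50% overlap
    if n_frames <= chunk_size:
        return 1
    # After the first chunk, each hop covers `hop_size` new frames.
    return math.ceil((n_frames - chunk_size) / hop_size) + 1

print(n_chunks(500))  # 9 chunks of 100 frames at 50-frame hops
print(n_chunks(120))  # 2 chunks (the second is padded)
```

Each chunk is then processed by an intra-chunk (always bidirectional) and an inter-chunk transformer layer, repeated `n_repeats` times.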
- forward(mixture_w)[source]¶
  Forward.
  Parameters: mixture_w (torch.Tensor) – Tensor of shape $(batch, nfilters, nframes)$
  Returns: torch.Tensor – Estimated masks of shape $(batch, nsrc, nfilters, nframes)$
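The shape contract of `forward` can be demonstrated with a minimal stand-in module. This is a hedged sketch, not the real `DPTransformer`: `MaskShapeStub` is a hypothetical name, and a 1x1 convolution plus reshape merely reproduces the documented input/output shapes without any dual-path processing.

```python
# Hedged stand-in (not the real DPTransformer) showing the documented
# shape contract: mixture_w of shape (batch, n_filters, n_frames) maps
# to masks of shape (batch, n_src, n_filters, n_frames). A 1x1 conv
# expands the channel axis by n_src, then a view splits it out.
import torch
import torch.nn as nn

class MaskShapeStub(nn.Module):
    def __init__(self, in_chan, n_src):
        super().__init__()
        self.n_src = n_src
        self.conv = nn.Conv1d(in_chan, n_src * in_chan, kernel_size=1)
        self.mask_act = nn.ReLU()  # assumed stand-in for mask_act='relu'

    def forward(self, mixture_w):
        batch, n_filters, n_frames = mixture_w.shape
        est = self.conv(mixture_w)  # (batch, n_src * n_filters, n_frames)
        est = est.view(batch, self.n_src, n_filters, n_frames)
        return self.mask_act(est)

net = MaskShapeStub(in_chan=64, n_src=2)
masks = net(torch.randn(3, 64, 500))
print(tuple(masks.shape))  # (3, 2, 64, 500)
```

The returned masks are typically multiplied with the encoded mixture (one copy per source) before decoding each separated signal.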