asteroid.masknn.attention module
class asteroid.masknn.attention.DPTransformer(in_chan, n_src, n_heads=4, ff_hid=256, chunk_size=100, hop_size=None, n_repeats=6, norm_type='gLN', ff_activation='relu', mask_act='relu', bidirectional=True, dropout=0)

Dual-path Transformer introduced in [1].
Parameters:
- in_chan (int) – Number of input filters.
- n_src (int) – Number of masks to estimate.
- n_heads (int) – Number of attention heads. Defaults to 4.
- ff_hid (int) – Number of neurons in the RNN's cell state. Defaults to 256.
- chunk_size (int) – Window size of the overlap-and-add processing. Defaults to 100.
- hop_size (int or None) – Hop size (stride) of the overlap-and-add processing. Defaults to chunk_size // 2 (50% overlap).
- n_repeats (int) – Number of repeats. Defaults to 6.
- norm_type (str, optional) – Type of normalization to use. Defaults to 'gLN'.
- ff_activation (str, optional) – Activation function applied at the output of the RNN. Defaults to 'relu'.
- mask_act (str, optional) – Non-linear function used to generate the mask. Defaults to 'relu'.
- bidirectional (bool, optional) – True for a bidirectional inter-chunk RNN (the intra-chunk RNN is always bidirectional). Defaults to True.
- dropout (float, optional) – Dropout ratio, must be in [0, 1]. Defaults to 0.
References

[1] Chen, Jingjing, Qirong Mao, and Dong Liu. "Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation." arXiv preprint arXiv:2007.13975 (2020).
forward(mixture_w)

Parameters: mixture_w (torch.Tensor) – Tensor of shape [batch, n_filters, n_frames].
Returns: torch.Tensor – Estimated masks of shape [batch, n_src, n_filters, n_frames].
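A minimal usage sketch, assuming only the documented constructor and forward signatures; the random tensor stands in for learned encoder features, and the sizes are illustrative:

```python
import torch
from asteroid.masknn.attention import DPTransformer

# Stand-in for encoder output: [batch, n_filters, n_frames].
batch, n_filters, n_frames = 2, 64, 300
mixture_w = torch.randn(batch, n_filters, n_frames)

# Estimate masks for two sources, keeping the documented defaults.
masker = DPTransformer(in_chan=n_filters, n_src=2)
est_masks = masker(mixture_w)

print(est_masks.shape)  # expected [batch, n_src, n_filters, n_frames] = [2, 2, 64, 300]
```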
class asteroid.masknn.attention.ImprovedTransformedLayer(embed_dim, n_heads, dim_ff, dropout=0.0, activation='relu', bidirectional=True, norm='gLN')
Improved Transformer module as used in [1]: multi-head self-attention followed by an LSTM, an activation function, and a linear projection layer.
Parameters:
- embed_dim (int) – Number of input channels.
- n_heads (int) – Number of attention heads.
- dim_ff (int) – Number of neurons in the RNN's cell state; the RNN here replaces the standard feed-forward linear layer of a plain Transformer.
- dropout (float, optional) – Dropout ratio, must be in [0, 1]. Defaults to 0.0.
- activation (str, optional) – Activation function applied at the output of the RNN. Defaults to 'relu'.
- bidirectional (bool, optional) – True for a bidirectional inter-chunk RNN (the intra-chunk RNN is always bidirectional). Defaults to True.
- norm (str, optional) – Type of normalization to use. Defaults to 'gLN'.
References

[1] Chen, Jingjing, Qirong Mao, and Dong Liu. "Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation." arXiv preprint arXiv:2007.13975 (2020).
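For illustration, a minimal self-contained sketch of the layer pattern described above: multi-head self-attention, then a bidirectional LSTM standing in for the feed-forward block, then activation and a linear projection back to embed_dim. The (batch, seq_len, embed_dim) input layout, the plain LayerNorm, and the residual wiring are assumptions made for this sketch, not the asteroid implementation itself (which uses the configured norm, e.g. 'gLN'):

```python
import torch
from torch import nn

class MHALSTMLayerSketch(nn.Module):
    """Hypothetical re-implementation of the MHA -> LSTM -> activation -> linear pattern."""

    def __init__(self, embed_dim, n_heads, dim_ff, dropout=0.0, bidirectional=True):
        super().__init__()
        self.mha = nn.MultiheadAttention(embed_dim, n_heads, dropout=dropout, batch_first=True)
        self.norm_mha = nn.LayerNorm(embed_dim)  # assumption: plain LayerNorm instead of gLN
        # The LSTM replaces the position-wise feed-forward block of a plain Transformer.
        self.rnn = nn.LSTM(embed_dim, dim_ff, batch_first=True, bidirectional=bidirectional)
        rnn_out = dim_ff * (2 if bidirectional else 1)
        self.activation = nn.ReLU()
        self.proj = nn.Linear(rnn_out, embed_dim)  # project back to the embedding size
        self.norm_ff = nn.LayerNorm(embed_dim)

    def forward(self, x):
        # x: (batch, seq_len, embed_dim) -- an assumed layout for this sketch.
        attn_out, _ = self.mha(x, x, x)
        x = self.norm_mha(x + attn_out)   # residual around self-attention
        rnn_out, _ = self.rnn(x)
        ff_out = self.proj(self.activation(rnn_out))
        return self.norm_ff(x + ff_out)   # residual around the RNN "feed-forward"

# Quick shape check.
layer = MHALSTMLayerSketch(embed_dim=64, n_heads=4, dim_ff=256)
x = torch.randn(2, 100, 64)
print(layer(x).shape)  # torch.Size([2, 100, 64])
```

Swapping the position-wise feed-forward block for an RNN lets each layer mix information across positions twice, once through attention and once through recurrence, which is the core modification of the improved Transformer layer described in [1].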