
asteroid.masknn.attention module

class asteroid.masknn.attention.DPTransformer(in_chan, n_src, n_heads=4, ff_hid=256, chunk_size=100, hop_size=None, n_repeats=6, norm_type='gLN', ff_activation='relu', mask_act='relu', bidirectional=True, dropout=0)[source]

Bases: torch.nn.Module

Dual-path Transformer introduced in [1].
Parameters:
  • in_chan (int) – Number of input filters.
  • n_src (int) – Number of masks to estimate.
  • n_heads (int) – Number of attention heads.
  • ff_hid (int) – Number of neurons in the RNN cell state. Defaults to 256.
  • chunk_size (int) – Window size of overlap-and-add processing. Defaults to 100.
  • hop_size (int or None) – Hop size (stride) of overlap-and-add processing. Defaults to chunk_size // 2 (50% overlap).
  • n_repeats (int) – Number of repeats. Defaults to 6.
  • norm_type (str, optional) – Type of normalization to use.
  • ff_activation (str, optional) – Activation function applied at the output of the RNN.
  • mask_act (str, optional) – Which non-linear function to use to generate the mask.
  • bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN (Intra-Chunk is always bidirectional).
  • dropout (float, optional) – Dropout ratio, must be in [0,1].

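Example (a minimal sketch of instantiating the masker; the filter and source counts below are illustrative choices, not library defaults):

    import torch
    from asteroid.masknn.attention import DPTransformer

    # 64 encoder filters and 2 sources are assumptions made for this example.
    masker = DPTransformer(in_chan=64, n_src=2, n_heads=4, ff_hid=256,
                           chunk_size=100, n_repeats=6, mask_act="relu")
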
References

[1] Chen, Jingjing, Qirong Mao, and Dong Liu. “Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation.” arXiv preprint arXiv:2007.13975 (2020).
forward(mixture_w)[source]
Parameters: mixture_w (torch.Tensor) – Tensor of shape [batch, n_filters, n_frames]
Returns: torch.Tensor – Estimated masks of shape [batch, n_src, n_filters, n_frames]
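
A minimal sketch of a forward pass, reusing the masker built above and assuming an encoder output with 64 filters over 1000 frames (both sizes are illustrative):

    mixture_w = torch.randn(8, 64, 1000)   # [batch, n_filters, n_frames]
    est_masks = masker(mixture_w)
    # est_masks has shape [8, 2, 64, 1000], i.e. [batch, n_src, n_filters, n_frames]
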
get_config()[source]
class asteroid.masknn.attention.ImprovedTransformedLayer(embed_dim, n_heads, dim_ff, dropout=0.0, activation='relu', bidirectional=True, norm='gLN')[source]

Bases: torch.nn.Module

Improved Transformer layer as used in [1]: multi-head self-attention followed by an LSTM, an activation function, and a linear projection layer.

Parameters:
  • embed_dim (int) – Number of input channels.
  • n_heads (int) – Number of attention heads.
  • dim_ff (int) – Number of neurons in the RNN cell state. Defaults to 256. The RNN here replaces the standard feed-forward linear layer of the plain Transformer.
  • dropout (float, optional) – Dropout ratio, must be in [0,1].
  • activation (str, optional) – Activation function applied at the output of the RNN.
  • bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN (Intra-Chunk is always bidirectional).
  • norm (str, optional) – Type of normalization to use.

References

[1] Chen, Jingjing, Qirong Mao, and Dong Liu. “Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation.” arXiv preprint arXiv:2007.13975 (2020).
forward(x)[source]
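
A minimal sketch of using the layer on its own. The input layout is not documented here; the [batch, embed_dim, time] shape below is an assumption based on how DPTransformer feeds its chunked representation, so verify it against the source before relying on it:

    import torch
    from asteroid.masknn.attention import ImprovedTransformedLayer

    layer = ImprovedTransformedLayer(embed_dim=64, n_heads=4, dim_ff=256)
    x = torch.randn(8, 64, 100)   # assumed [batch, embed_dim, time] layout
    out = layer(x)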