
asteroid.masknn.attention module

class asteroid.masknn.attention.DPTransformer(in_chan, n_src, n_heads=4, ff_hid=256, chunk_size=100, hop_size=None, n_repeats=6, norm_type='gLN', ff_activation='relu', mask_act='relu', bidirectional=True, dropout=0)[source]

Bases: torch.nn.Module

Dual-path Transformer introduced in [1].
Parameters:
  • in_chan (int) – Number of input filters.
  • n_src (int) – Number of masks to estimate.
  • n_heads (int) – Number of attention heads.
  • ff_hid (int) – Number of neurons in the RNN cell state. Defaults to 256.
  • chunk_size (int) – Window size of overlap-and-add processing. Defaults to 100.
  • hop_size (int or None) – Hop size (stride) of overlap-and-add processing. Defaults to chunk_size // 2 (50% overlap).
  • n_repeats (int) – Number of repeats. Defaults to 6.
  • norm_type (str, optional) – Type of normalization to use.
  • ff_activation (str, optional) – Activation function applied at the output of the RNN.
  • mask_act (str, optional) – Which non-linear function to use to generate the mask.
  • bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN (Intra-Chunk is always bidirectional).
  • dropout (float, optional) – Dropout ratio, must be in [0,1].

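Example (a minimal sketch of instantiating the masker; the filter and source counts below are illustrative choices, not library defaults):

    import torch
    from asteroid.masknn.attention import DPTransformer

    # 64 encoder filters and 2 sources are assumptions made for this example.
    masker = DPTransformer(in_chan=64, n_src=2, n_heads=4, ff_hid=256,
                           chunk_size=100, n_repeats=6, mask_act="relu")
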
References

[1] Chen, Jingjing, Qirong Mao, and Dong Liu. “Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation.” arXiv preprint arXiv:2007.13975 (2020).
forward(mixture_w)[source]
Parameters: mixture_w (torch.Tensor) – Tensor of shape [batch, n_filters, n_frames]
Returns: torch.Tensor – Estimated masks of shape [batch, n_src, n_filters, n_frames]
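
A minimal sketch of a forward pass, reusing the masker built above and assuming an encoder output with 64 filters over 1000 frames (both sizes are illustrative):

    mixture_w = torch.randn(8, 64, 1000)   # [batch, n_filters, n_frames]
    est_masks = masker(mixture_w)
    # est_masks has shape [8, 2, 64, 1000], i.e. [batch, n_src, n_filters, n_frames]
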
get_config()[source]
class asteroid.masknn.attention.ImprovedTransformedLayer(embed_dim, n_heads, dim_ff, dropout=0.0, activation='relu', bidirectional=True, norm='gLN')[source]

Bases: torch.nn.Module

Improved Transformer layer as used in [1]: multi-head self-attention followed by an LSTM, an activation function, and a linear projection layer.

Parameters:
  • embed_dim (int) – Number of input channels.
  • n_heads (int) – Number of attention heads.
  • dim_ff (int) – Number of neurons in the RNN cell state. Defaults to 256. The RNN here replaces the standard feed-forward linear layer of the plain Transformer.
  • dropout (float, optional) – Dropout ratio, must be in [0,1].
  • activation (str, optional) – Activation function applied at the output of the RNN.
  • bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN (Intra-Chunk is always bidirectional).
  • norm (str, optional) – Type of normalization to use.

References

[1] Chen, Jingjing, Qirong Mao, and Dong Liu. “Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation.” arXiv preprint arXiv:2007.13975 (2020).
forward(x)[source]
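
A minimal sketch of using the layer on its own. The input layout is not documented here; the [batch, embed_dim, time] shape below is an assumption based on how DPTransformer feeds its chunked representation, so verify it against the source before relying on it:

    import torch
    from asteroid.masknn.attention import ImprovedTransformedLayer

    layer = ImprovedTransformedLayer(embed_dim=64, n_heads=4, dim_ff=256)
    x = torch.randn(8, 64, 100)   # assumed [batch, embed_dim, time] layout
    out = layer(x)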