asteroid.models.fasnet module

class asteroid.models.fasnet.FasNetTAC(n_src, enc_dim=64, feature_dim=64, hidden_dim=128, n_layers=4, window_ms=4, stride=None, context_ms=16, sample_rate=16000, tac_hidden_dim=384, norm_type='gLN', chunk_size=50, hop_size=25, bidirectional=True, rnn_type='LSTM', dropout=0.0, use_tac=True)

Bases: asteroid.models.base_models.BaseModel

FasNetTAC separation model with optional Transform-Average-Concatenate (TAC) module [1].

Parameters:
  • n_src (int) – Maximum number of sources the model can separate.
  • enc_dim (int, optional) – Length of analysis filter. Defaults to 64.
  • feature_dim (int, optional) – Size of hidden representation in DPRNN blocks after bottleneck. Defaults to 64.
  • hidden_dim (int, optional) – Number of neurons in the RNN cell state in DPRNN blocks. Defaults to 128.
  • n_layers (int, optional) – Number of DPRNN blocks. Defaults to 4.
  • window_ms (int, optional) – Beamformer window length in milliseconds. Defaults to 4.
  • stride (int, optional) – Stride for Beamforming windows. Defaults to window_ms // 2.
  • context_ms (int, optional) – Context for each Beamforming window. Defaults to 16. Effective window is 2*context_ms+window_ms.
  • sample_rate (int, optional) – Sample rate of the input signal in Hz. Defaults to 16000.
  • tac_hidden_dim (int, optional) – Hidden dimension size of the TAC module. Defaults to 384.
  • norm_type (str, optional) – Normalization layer used. Defaults to 'gLN' (global Layer Normalization).
  • chunk_size (int, optional) – Chunk size used for dual-path processing in DPRNN blocks. Defaults to 50.
  • hop_size (int, optional) – Hop size used for dual-path processing in DPRNN blocks. Defaults to 25 (chunk_size // 2 for the default chunk_size, i.e. 50% overlap).
  • bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN (Intra-Chunk is always bidirectional).
  • rnn_type (str, optional) – Type of RNN used. Choose between 'RNN', 'LSTM' and 'GRU'.
  • dropout (float, optional) – Dropout ratio, must be in [0,1].
  • use_tac (bool, optional) – Whether to use Transform-Average-Concatenate for communication between microphone channels. Defaults to True.
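The window, context and stride parameters above are given in milliseconds, so it can help to see what they amount to in samples. The following is a minimal sketch of that arithmetic for the default values. The helper ms_to_samples is hypothetical (it is not part of asteroid), and the rounding the library applies internally is an assumption.

```python
def ms_to_samples(duration_ms, sample_rate):
    """Convert a duration in milliseconds to a whole number of samples.

    Hypothetical helper for illustration; truncation via int() is an assumption.
    """
    return int(sample_rate * duration_ms / 1000)


sample_rate = 16000
window_ms = 4
context_ms = 16
stride_ms = window_ms // 2  # documented default when stride is None

window = ms_to_samples(window_ms, sample_rate)
context = ms_to_samples(context_ms, sample_rate)
stride = ms_to_samples(stride_ms, sample_rate)

# Documented relation: effective window = 2 * context + window
effective_window = 2 * context + window

# Dual-path chunking: a hop of chunk_size // 2 gives 50% overlap
chunk_size = 50
hop_size = chunk_size // 2
overlap = 1 - hop_size / chunk_size

print(window, context, stride, effective_window, overlap)
```

With the defaults, each beamforming window covers 64 samples, is surrounded by 256 samples of context on each side, and the effective window is 576 samples.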
References
[1] Luo, Yi, et al. “End-to-end microphone permutation and number invariant multi-channel speech separation.” ICASSP 2020.
static windowing_with_context(x, window, context)
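This method is not described further here. As a rough illustration of what "a window with context" means under the parameter descriptions above, the sketch below slices a single-channel signal into windows and, for each one, also returns the wider span of 2*context + window samples around it. It is a conceptual example on plain Python lists, not the library's implementation: the function name windows_with_context, the explicit stride argument and the zero-padding at the edges are all assumptions.

```python
def windows_with_context(signal, window, context, stride):
    """Return a list of (center, with_context) pairs, one per window.

    Illustrative only: edges are zero-padded, which may differ from
    how the library pads its input.
    """
    padded = [0.0] * context + list(signal) + [0.0] * context
    frames = []
    for start in range(0, len(signal) - window + 1, stride):
        # The window itself, without context
        center = padded[start + context : start + context + window]
        # The same window with `context` samples on each side
        with_context = padded[start : start + window + 2 * context]
        frames.append((center, with_context))
    return frames


signal = [float(i) for i in range(1, 9)]  # 8 samples: 1.0 .. 8.0
frames = windows_with_context(signal, window=4, context=2, stride=2)
for center, with_context in frames:
    print(center, with_context)
```

Every context span here has length 4 + 2*2 = 8, matching the "effective window" relation given for context_ms.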
forward(x, valid_mics=None)
Parameters:
  • x (torch.Tensor) – Multi-channel input signal. Shape: \((batch, mic\_channels, samples)\).
  • valid_mics (torch.LongTensor) – Tensor containing the effective number of microphones for each example in the batch. A batch can contain examples from arrays with different numbers of microphones, in which case the mic_channels dimension is padded. E.g. torch.tensor([4, 3]) means the first example has 4 channels and the second has 3. Shape: \((batch)\).
Returns:

bf_signal (torch.Tensor) – beamformed signal with shape \((batch, n\_src, samples)\).
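The relationship between a padded batch and valid_mics can be sketched as follows. This example uses nested Python lists rather than tensors so that it runs without any dependencies. The helper pad_mic_channels is hypothetical, and padding with zeros is an assumption, since the description above only says that the mic_channels dimension is padded.

```python
def pad_mic_channels(examples):
    """Pad each example to the largest microphone count in the batch.

    Each example is a list of channels, each channel a list of samples.
    Returns (batch, valid_mics). Zero-padding is an assumption.
    """
    max_mics = max(len(example) for example in examples)
    n_samples = len(examples[0][0])
    valid_mics = [len(example) for example in examples]
    batch = [
        [list(channel) for channel in example]
        + [[0.0] * n_samples for _ in range(max_mics - len(example))]
        for example in examples
    ]
    return batch, valid_mics


example_a = [[0.1] * 5 for _ in range(4)]  # 4 microphones, 5 samples each
example_b = [[0.2] * 5 for _ in range(3)]  # 3 microphones, 5 samples each
batch, valid_mics = pad_mic_channels([example_a, example_b])
print(valid_mics)
```

In practice the padded batch and the microphone counts would be converted to tensors, for instance with torch.tensor(valid_mics), before being passed to forward.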

get_model_args()

Should return args to re-instantiate the class.
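The contract described here, returning whatever arguments are needed to rebuild the object, can be illustrated with a small stand-alone class. TinyModel below is hypothetical and not part of asteroid. It only shows the general pattern of storing constructor arguments and passing them back to the constructor.

```python
class TinyModel:
    """Hypothetical stand-in for a model class; not part of asteroid."""

    def __init__(self, n_src, hidden_dim=128):
        self.n_src = n_src
        self.hidden_dim = hidden_dim

    def get_model_args(self):
        """Return the constructor arguments needed to rebuild this instance."""
        return {"n_src": self.n_src, "hidden_dim": self.hidden_dim}


model = TinyModel(n_src=2, hidden_dim=64)
# Re-instantiate an equivalent object from the saved arguments
clone = TinyModel(**model.get_model_args())
print(clone.get_model_args())
```

Keeping the returned dictionary in sync with the constructor signature is what makes this kind of round trip possible.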
