asteroid.models.fasnet module

class asteroid.models.fasnet.FasNetTAC(n_src, enc_dim=64, feature_dim=64, hidden_dim=128, n_layers=4, window_ms=4, stride=None, context_ms=16, sample_rate=16000, tac_hidden_dim=384, norm_type='gLN', chunk_size=50, hop_size=25, bidirectional=True, rnn_type='LSTM', dropout=0.0, use_tac=True)

Bases: asteroid.models.base_models.BaseModel

FasNetTAC separation model with an optional Transform-Average-Concatenate (TAC) module [1].
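The TAC idea can be illustrated with a minimal NumPy sketch for a single frame. It is not asteroid's implementation: the function name tac_sketch and the weight matrices w1, w2, w3 are hypothetical placeholders, the PReLU slope is fixed, normalization is omitted, and the residual connection is an assumption. It shows the three steps (a transform shared by every microphone, an average across microphones, a concatenation of each microphone's features with that average).

```python
import numpy as np


def prelu(x, alpha=0.25):
    # PReLU with a fixed negative slope (learnable in a real model).
    return np.where(x > 0, x, alpha * x)


def tac_sketch(x, w1, w2, w3):
    """Hypothetical Transform-Average-Concatenate step for one frame.

    x:  (n_mics, feat) features, one row per microphone.
    w1: (feat, hid) transform shared by all microphones.
    w2: (hid, hid) transform applied to the cross-microphone average.
    w3: (2 * hid, feat) projection applied after concatenation.
    """
    h = prelu(x @ w1)                         # Transform: same weights for every mic
    avg = prelu(h.mean(axis=0) @ w2)          # Average across mics, then transform
    avg = np.broadcast_to(avg, h.shape)       # give every mic a copy of the average
    cat = np.concatenate([h, avg], axis=1)    # Concatenate: (n_mics, 2 * hid)
    return x + prelu(cat @ w3)                # residual connection (assumed)


rng = np.random.default_rng(0)
feat, hid = 8, 16
w1 = rng.standard_normal((feat, hid))
w2 = rng.standard_normal((hid, hid))
w3 = rng.standard_normal((2 * hid, feat))

x = rng.standard_normal((4, feat))            # 4 microphones
y = tac_sketch(x, w1, w2, w3)
perm = [2, 0, 3, 1]
y_perm = tac_sketch(x[perm], w1, w2, w3)      # same mics, different order
y_three = tac_sketch(x[:3], w1, w2, w3)       # fewer mics, same weights
```

Because the only cross-microphone operation is a mean, permuting the input rows permutes the output rows in the same way, and the same weights apply to any number of microphones.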

Parameters:

n_src (int) – Maximum number of sources the model can separate.
enc_dim (int, optional) – Length of analysis filter. Defaults to 64.
feature_dim (int, optional) – Size of hidden representation in DPRNN blocks after bottleneck. Defaults to 64.
hidden_dim (int, optional) – Number of neurons in the RNN's cell state in DPRNN blocks. Defaults to 128.
n_layers (int, optional) – Number of DPRNN blocks. Defaults to 4.
window_ms (int, optional) – Beamformer window length in milliseconds. Defaults to 4.
stride (int, optional) – Stride between beamforming windows. Defaults to window_ms // 2.
context_ms (int, optional) – Context for each beamforming window, in milliseconds. Defaults to 16. The effective window is 2 * context_ms + window_ms.
sample_rate (int, optional) – Sample rate of the input signal. Defaults to 16000.
tac_hidden_dim (int, optional) – Hidden dimension of the TAC module. Defaults to 384.
norm_type (str, optional) – Normalization layer used. Defaults to 'gLN' (global Layer Normalization).
chunk_size (int, optional) – Chunk size used for dual-path processing in DPRNN blocks. Defaults to 50.
hop_size (int, optional) – Hop size used for dual-path processing in DPRNN blocks. Defaults to 25, i.e. chunk_size // 2 (50% overlap).
bidirectional (bool, optional) – True for a bidirectional inter-chunk RNN (the intra-chunk RNN is always bidirectional). Defaults to True.
rnn_type (str, optional) – Type of RNN used. Choose among 'RNN', 'LSTM' and 'GRU'. Defaults to 'LSTM'.
dropout (float, optional) – Dropout ratio, must be in [0, 1]. Defaults to 0.0.
use_tac (bool, optional) – Whether to use Transform-Average-Concatenate for communication between microphone channels. Defaults to True.
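With the default values, the millisecond settings translate into sample counts as worked out below. The helpers fasnet_lengths and n_chunks are hypothetical: they assume each duration is converted with sample_rate * ms // 1000 and that the default stride is half a window, and n_chunks counts full dual-path chunks over a sequence while ignoring any padding the real DPRNN may apply at the sequence edges.

```python
def fasnet_lengths(window_ms=4, context_ms=16, sample_rate=16000):
    """Hypothetical conversion of FasNetTAC's millisecond settings to samples."""
    window = sample_rate * window_ms // 1000    # samples per beamforming window
    context = sample_rate * context_ms // 1000  # context samples on each side
    stride = window // 2                        # default stride: window_ms // 2
    effective = 2 * context + window            # 2 * context_ms + window_ms, in samples
    return window, context, stride, effective


def n_chunks(seq_len, chunk_size=50, hop_size=25):
    """Number of full dual-path chunks over a sequence, without edge padding."""
    if seq_len < chunk_size:
        return 0
    return (seq_len - chunk_size) // hop_size + 1


window, context, stride, effective = fasnet_lengths()
print(window, context, stride, effective)   # 64 256 32 576
print(n_chunks(500))                        # 19
```

At 16 kHz the default effective window is therefore 576 samples, i.e. 36 ms.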

References: [1] Luo, Yi, et al. “End-to-end microphone permutation and number invariant multi-channel speech separation.” ICASSP 2020.

static windowing_with_context(x, window, context)
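Judging from its name and the window_ms and context_ms parameters, this method cuts each channel into overlapping windows that keep extra context on both sides. The NumPy sketch below illustrates that idea only; the function name windowing_with_context_sketch, the zero padding at the edges and the half-window hop are assumptions, not the library's code.

```python
import numpy as np


def windowing_with_context_sketch(x, window, context, hop=None):
    """Frame x of shape (batch, mics, samples) into windows with left/right context.

    Returns (center, full): center has shape (batch, mics, n_frames, window),
    full has shape (batch, mics, n_frames, window + 2 * context).
    """
    hop = window // 2 if hop is None else hop
    # Zero-pad so the first and last windows also get a full context.
    padded = np.pad(x, ((0, 0), (0, 0), (context, context)))
    span = window + 2 * context
    n_frames = (padded.shape[-1] - span) // hop + 1
    starts = np.arange(n_frames) * hop
    idx = starts[:, None] + np.arange(span)[None, :]   # (n_frames, span)
    full = padded[..., idx]                            # (batch, mics, n_frames, span)
    center = full[..., context:context + window]       # drop the context samples
    return center, full


x = np.arange(2 * 3 * 160, dtype=float).reshape(2, 3, 160)   # (batch, mics, samples)
center, full = windowing_with_context_sketch(x, window=8, context=16)
print(center.shape, full.shape)   # (2, 3, 39, 8) (2, 3, 39, 40)
```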

forward(x, valid_mics=None)

Parameters:

x (torch.Tensor) – Multi-channel input signal. Shape: \((batch, mic\_channels, samples)\).
valid_mics (torch.LongTensor) – Effective number of microphones for each example in the batch. A batch can combine examples recorded with arrays that have different numbers of microphones, so the mic_channels dimension is padded. E.g. torch.tensor([4, 3]) means the first example has 4 channels and the second has 3. Shape: \((batch)\).
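One way to use such a tensor is to turn it into a boolean mask so that averages across microphones (as in a TAC averaging step) skip the padded channels. The NumPy sketch below shows this for the torch.tensor([4, 3]) example; it illustrates the idea under that assumption and does not describe how FasNetTAC handles valid_mics internally.

```python
import numpy as np

valid_mics = np.array([4, 3])          # example from the text: 4 mics, then 3
batch, max_mics, feat = 2, 4, 5

# True for real channels, False for padded ones: shape (batch, mic_channels).
mask = np.arange(max_mics)[None, :] < valid_mics[:, None]

rng = np.random.default_rng(0)
feats = rng.standard_normal((batch, max_mics, feat))
feats[1, 3] = 100.0                    # garbage in the padded channel

# Average over microphones that only counts the valid channels.
masked_mean = (feats * mask[:, :, None]).sum(axis=1) / valid_mics[:, None]
```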

Returns:

bf_signal (torch.Tensor) – Beamformed signal with shape \((batch, n\_src, samples)\).

get_model_args()

Returns the arguments needed to re-instantiate the class.
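The contract is that the returned dictionary can be passed straight back to the constructor. The class below is hypothetical (TinySeparator is not part of asteroid) and only shows the pattern with a few of FasNetTAC's argument names.

```python
class TinySeparator:
    """Hypothetical model illustrating the get_model_args() round trip."""

    def __init__(self, n_src, enc_dim=64, use_tac=True):
        self.n_src = n_src
        self.enc_dim = enc_dim
        self.use_tac = use_tac

    def get_model_args(self):
        # Every constructor argument, keyed by its parameter name.
        return {"n_src": self.n_src, "enc_dim": self.enc_dim, "use_tac": self.use_tac}


model = TinySeparator(n_src=2, use_tac=False)
clone = TinySeparator(**model.get_model_args())   # rebuilds an identically configured model
```

Saving this dictionary next to the trained weights makes it possible to rebuild the same architecture before loading them.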