asteroid.models.fasnet module
class asteroid.models.fasnet.FasNetTAC(n_src, enc_dim=64, feature_dim=64, hidden_dim=128, n_layers=4, window_ms=4, stride=None, context_ms=16, sample_rate=16000, tac_hidden_dim=384, norm_type='gLN', chunk_size=50, hop_size=25, bidirectional=True, rnn_type='LSTM', dropout=0.0, use_tac=True)
Bases: asteroid.models.base_models.BaseModel
FasNetTAC separation model with an optional Transform-Average-Concatenate (TAC) module [1].
Parameters: - n_src (int) – Maximum number of sources the model can separate.
- enc_dim (int, optional) – Length of analysis filter. Defaults to 64.
- feature_dim (int, optional) – Size of hidden representation in DPRNN blocks after bottleneck. Defaults to 64.
- hidden_dim (int, optional) – Number of neurons in the RNNs cell state in DPRNN blocks. Defaults to 128.
- n_layers (int, optional) – Number of DPRNN blocks. Defaults to 4.
- window_ms (int, optional) – Beamformer window length in milliseconds. Defaults to 4.
- stride (int, optional) – Stride for Beamforming windows. Defaults to window_ms // 2.
- context_ms (int, optional) – Context for each Beamforming window. Defaults to 16. Effective window is 2*context_ms+window_ms.
- sample_rate (int, optional) – Sample rate of the input signal. Defaults to 16000.
- tac_hidden_dim (int, optional) – Size of the TAC module hidden dimensions. Defaults to 384.
- norm_type (str, optional) – Normalization layer used. Defaults to 'gLN' (global Layer Normalization).
- chunk_size (int, optional) – Chunk size used for dual-path processing in DPRNN blocks. Defaults to 50 samples.
- hop_size (int, optional) – Hop size used for dual-path processing in DPRNN blocks. Defaults to chunk_size // 2 (50% overlap).
- bidirectional (bool, optional) – True for a bidirectional Inter-Chunk RNN (the Intra-Chunk RNN is always bidirectional). Defaults to True.
- rnn_type (str, optional) – Type of RNN used. Choose between 'RNN', 'LSTM' and 'GRU'. Defaults to 'LSTM'.
- dropout (float, optional) – Dropout ratio, must be in [0, 1]. Defaults to 0.0.
- use_tac (bool, optional) – Whether to use Transform-Average-Concatenate for communication between microphone channels. Defaults to True.
References
[1] Luo, Yi, et al. “End-to-end microphone permutation and number invariant multi-channel speech separation.” ICASSP 2020.
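The millisecond parameters above translate into sample counts once sample_rate is fixed. The following is a minimal sketch of that arithmetic, not the library's code: the helper name beamformer_sizes is hypothetical, and both the conversion sample_rate * ms / 1000 and the half-window default stride expressed in samples are assumptions.

```python
def beamformer_sizes(window_ms=4, context_ms=16, sample_rate=16000, stride=None):
    """Hypothetical helper: convert FasNetTAC-style ms settings to samples."""
    # Assumed conversion from milliseconds to samples.
    window = int(sample_rate * window_ms / 1000)
    context = int(sample_rate * context_ms / 1000)
    # Assumed default: half-window stride, mirroring "window_ms // 2".
    if stride is None:
        stride = window // 2
    # Effective window as documented: context on both sides plus the window.
    effective = 2 * context + window
    return {
        "window": window,
        "context": context,
        "stride": stride,
        "effective": effective,
    }


sizes = beamformer_sizes()
print(sizes)
```

With the documented defaults, this gives a 64-sample window, 256 samples of context on each side, and a 576-sample (36 ms) effective window.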
forward(x, valid_mics=None)
Parameters: - x (torch.Tensor) – Multi-channel input signal. Shape: \((batch, mic\_channels, samples)\).
- valid_mics (torch.LongTensor) – Tensor containing the effective number of microphones for each example in the batch. Batches can be composed of examples coming from arrays with different numbers of microphones, so the mic_channels dimension is padded. E.g. torch.tensor([4, 3]) means the first example has 4 channels and the second has 3. Shape: \((batch)\).
Returns: bf_signal (torch.Tensor) – Beamformed signal with shape \((batch, n\_src, samples)\).
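As an illustration of the padding convention described for valid_mics, the sketch below builds a batch from two recordings with different microphone counts. The helper pad_mic_batch is hypothetical, and zero-padding along the microphone axis with equal-length recordings is an assumption; the documentation only states that the mic_channels dimension is padded.

```python
import torch


def pad_mic_batch(examples):
    """Hypothetical helper: stack (mics, samples) tensors into one batch.

    Assumes every example has the same number of samples and that padded
    microphone channels are filled with zeros.
    """
    max_mics = max(e.shape[0] for e in examples)
    n_samples = examples[0].shape[1]
    batch = torch.zeros(len(examples), max_mics, n_samples)
    for i, e in enumerate(examples):
        # Copy the real channels; the remaining rows stay zero.
        batch[i, : e.shape[0]] = e
    # One entry per example: how many channels are real.
    valid_mics = torch.tensor([e.shape[0] for e in examples], dtype=torch.long)
    return batch, valid_mics


# One second of audio at 16 kHz from a 4-mic array and a 3-mic array.
examples = [torch.randn(4, 16000), torch.randn(3, 16000)]
batch, valid_mics = pad_mic_batch(examples)
print(batch.shape, valid_mics)
```

Under these assumptions, batch and valid_mics have the shapes that forward documents for x and valid_mics, so they could be passed as model(batch, valid_mics) to a FasNetTAC instance.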