asteroid.masknn.tac module

class asteroid.masknn.tac.TAC(input_dim, hidden_dim=384, activation='prelu', norm_type='gLN')

Bases: torch.nn.Module

Transform-Average-Concatenate (TAC) block for permutation-invariant communication between microphone channels [1].

Parameters:

input_dim (int) – Number of features of the input representation.
hidden_dim (int, optional) – Size of the hidden layers in the TAC operations.
activation (str, optional) – Type of activation used. See asteroid.masknn.activations.
norm_type (str, optional) – Type of normalization layer used. See asteroid.masknn.norms.

Note

Supports inputs of shape \((batch, mic\_channels, features, chunk\_size, n\_chunks)\), as in FasNet-TAC. The operations are applied independently at each position along chunk_size and n_chunks. The output has the same shape as the input.

References:

[1] Luo, Yi, et al. “End-to-end microphone permutation and number invariant multi-channel speech separation.” ICASSP 2020.
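The class description can be illustrated with a minimal PyTorch sketch of the three TAC steps described in [1]: a transform shared across microphones and applied to each channel, an average over the microphone axis followed by its own transform, and a concatenation of the two that is projected back to input_dim. This is a hypothetical sketch, not asteroid's implementation: the class name TACSketch is invented, each transform is assumed to be a linear layer followed by PReLU (the default activation), the normalization layer selected by norm_type is omitted, valid_mics is ignored (all channels are averaged), and a residual connection is assumed.

```python
import torch
from torch import nn


class TACSketch(nn.Module):
    """Hypothetical Transform-Average-Concatenate block (not asteroid's code).

    Input and output shape: (batch, mic_channels, features, chunk_size, n_chunks).
    Normalization and valid_mics handling are deliberately left out.
    """

    def __init__(self, input_dim: int, hidden_dim: int = 384):
        super().__init__()
        # Transform: one set of weights shared by every microphone channel.
        self.transform = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.PReLU())
        # Transform applied to the average over microphones.
        self.average = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.PReLU())
        # Concatenate per-channel and averaged features, project back to input_dim.
        self.concat = nn.Sequential(nn.Linear(2 * hidden_dim, input_dim), nn.PReLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (batch, mics, feats, chunk, n_chunks) -> (batch, chunk, n_chunks, mics, feats)
        # so that nn.Linear acts on the feature axis at every chunk position.
        h = self.transform(x.permute(0, 3, 4, 1, 2))
        # Average over the microphone axis (dim 3); keepdim allows broadcasting back.
        avg = self.average(h.mean(dim=3, keepdim=True)).expand_as(h)
        out = self.concat(torch.cat([h, avg], dim=-1))
        # Restore (batch, mics, feats, chunk, n_chunks) and add the assumed residual.
        return out.permute(0, 3, 4, 1, 2) + x


torch.manual_seed(0)
tac = TACSketch(input_dim=64, hidden_dim=128)
x = torch.randn(2, 4, 64, 10, 5)  # batch=2, 4 mics, 64 features, 10 x 5 chunks
y = tac(x)
print(y.shape)  # torch.Size([2, 4, 64, 10, 5])
```

Because the per-channel transforms share weights and the mean does not depend on channel order, permuting the microphones of x permutes the rows of y in the same way, which is what makes the block usable with arbitrarily ordered arrays.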

forward(x, valid_mics=None)

Parameters:

x (torch.Tensor) – Input multi-channel DPRNN features. Shape: \((batch, mic\_channels, features, chunk\_size, n\_chunks)\).
valid_mics (torch.LongTensor, optional) – Effective number of microphones for each example in the batch. A batch can contain examples from arrays with different numbers of microphones, in which case the mic_channels dimension is padded. For example, torch.tensor([4, 3]) means that the first example has 4 channels and the second has 3. Shape: \((batch)\).

Returns:

output (torch.Tensor) – Features for each mic_channel after TAC inter-channel processing. Shape: \((batch, mic\_channels, features, chunk\_size, n\_chunks)\).
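To make the role of valid_mics concrete, here is a minimal sketch of an average over the microphone axis that skips padded channels. It assumes, as the torch.tensor([4, 3]) example suggests, that padding occupies the trailing positions of the mic_channels axis; the helper masked_mic_mean is hypothetical and is not part of asteroid's API.

```python
import torch


def masked_mic_mean(h: torch.Tensor, valid_mics: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: mean over the mic axis (dim 1), ignoring padding.

    h: (batch, mic_channels, ...) features, padded along mic_channels.
    valid_mics: (batch,) LongTensor with the number of real mics per example.
    Returns a tensor of shape (batch, ...).
    """
    n_mics = h.shape[1]
    extra = [1] * (h.dim() - 2)  # singleton dims for the trailing feature axes
    # mask[b, m] is True for the first valid_mics[b] channels of example b.
    mask = torch.arange(n_mics, device=h.device)[None, :] < valid_mics[:, None]
    mask = mask.view(*mask.shape, *extra)
    # masked_fill zeroes padded channels even if they hold NaN or garbage.
    summed = h.masked_fill(~mask, 0.0).sum(dim=1)
    counts = valid_mics.to(h.dtype).view(-1, *extra)
    return summed / counts


# batch=2, mic_channels=4 (second example padded with 99.0), features=1
h = torch.tensor([[[1.0], [2.0], [3.0], [4.0]],
                  [[1.0], [2.0], [3.0], [99.0]]])
valid_mics = torch.tensor([4, 3])
out = masked_mic_mean(h, valid_mics)
# -> 2.5 for the first example, 2.0 for the second (the padded 99.0 is ignored)
```

Dividing by valid_mics rather than by mic_channels is what keeps the average independent of how much padding was added to reach a common batch shape.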