DSP Modules

class asteroid.dsp.LambdaOverlapAdd(nnet, n_src, window_size, hop_size=None, window='hanning', reorder_chunks=True, enable_grad=False)

Bases: torch.nn.Module

Overlap-add with lambda transform on segments.

Segment input signal, apply lambda function (a neural network for example) and combine with OLA.

Parameters:

nnet (callable) – Function to apply to each segment.
n_src (int) – Number of sources in the output of nnet.
window_size (int) – Size of segmenting window.
hop_size (int) – Segmentation hop size.
window (str) – Name of the window (see scipy.signal.get_window) used for the synthesis.
reorder_chunks (bool) – Whether to reorder each consecutive segment. This can be useful when nnet is permutation invariant, as source assignments might change output channel from one segment to the next (in classic speech separation, for example). Reordering is based on the correlation between the overlapping parts of consecutive segments.
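
The correlation-based reordering described for reorder_chunks can be illustrated with a small NumPy sketch. This is not the library's implementation: the helper name reorder_chunk, the exhaustive search over permutations, and the plain dot product used as a correlation score are all assumptions made for illustration, and hop_size is assumed to be smaller than the window size.

```python
import itertools

import numpy as np


def reorder_chunk(prev, cur, hop_size):
    """Permute the sources of `cur` so they line up with `prev`.

    Hypothetical helper. `prev` and `cur` are consecutive segments of
    shape (n_src, window_size) that overlap by window_size - hop_size samples.
    """
    n_src, window_size = prev.shape
    overlap = window_size - hop_size
    prev_tail = prev[:, hop_size:]  # last `overlap` samples of the previous segment
    cur_head = cur[:, :overlap]     # first `overlap` samples of the current segment

    def score(perm):
        # Total correlation when source i of `prev` is paired with source perm[i] of `cur`.
        return sum(np.dot(prev_tail[i], cur_head[j]) for i, j in enumerate(perm))

    best_perm = max(itertools.permutations(range(n_src)), key=score)
    return cur[list(best_perm)]
```

An exhaustive search is only practical for a small number of sources, since the number of permutations grows factorially with n_src.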

forward(x)

Forward module: segment signal, apply func, combine with OLA.

Parameters:	x (`torch.Tensor`) – waveform signal of shape (batch, 1, time).
Returns:	`torch.Tensor` – The output of the lambda OLA.

ola_forward(x)

Heart of the class: segment signal, apply func, combine with OLA.
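
The segment, apply, and overlap-add steps can be sketched in NumPy for a single 1D signal. This is a minimal illustration under stated assumptions, not the class itself: the function name lambda_overlap_add is hypothetical, the window is a strictly positive Hann-like window, the output is normalized by the summed window, and the n_src and reorder_chunks handling is left out.

```python
import numpy as np


def lambda_overlap_add(x, func, window_size, hop_size=None):
    """Apply `func` to each segment of a 1D signal and recombine with OLA."""
    hop_size = window_size // 2 if hop_size is None else hop_size
    # Drop the zero endpoints so the summed window is never zero.
    window = np.hanning(window_size + 2)[1:-1]

    n = len(x)
    # Number of frames needed to cover the whole signal (ceil division).
    n_frames = max(0, -(-(n - window_size) // hop_size)) + 1
    padded_len = (n_frames - 1) * hop_size + window_size
    padded = np.pad(x, (0, padded_len - n))

    out = np.zeros(padded_len)
    norm = np.zeros(padded_len)
    for i in range(n_frames):
        start = i * hop_size
        segment = padded[start:start + window_size]
        # Window the processed segment at synthesis time and accumulate.
        out[start:start + window_size] += window * func(segment)
        norm[start:start + window_size] += window
    # Normalize by the summed window and remove the padding.
    return (out / norm)[:n]
```

With an identity function the sketch returns the input unchanged, which is a convenient sanity check for the windowing and normalization.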

class asteroid.dsp.DualPathProcessing(chunk_size, hop_size)

Bases: torch.nn.Module

Perform Dual-Path processing via overlap-add as in DPRNN [1].

Args:

chunk_size (int): Size of segmenting window.
hop_size (int): Segmentation hop size.

References

[1] “Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation”, Yi Luo, Zhuo Chen and Takuya Yoshioka. https://arxiv.org/abs/1910.06379

fold(x, output_size=None)

Folds back the spliced feature tensor.

Converts an input of shape (batch, channels, chunk_size, n_chunks) back to the original shape (batch, channels, time) using overlap-add.

Parameters:

x (torch.Tensor) – spliced feature tensor of shape (batch, channels, chunk_size, n_chunks).
output_size (int, optional) – sequence length of original feature tensor. If None, the original length cached by the previous call of unfold will be used.

Returns:

x (torch.Tensor) – feature tensor of shape (batch, channels, time).

Note

unfold caches the original length of its input, which fold uses when output_size is None.

static inter_process(x, module)

Performs inter-chunk processing.

Parameters:

x (torch.Tensor) – spliced feature tensor of shape (batch, channels, chunk_size, n_chunks).
module (torch.nn.Module) – module to apply between the chunks of the spliced feature tensor.

Returns:

x (torch.Tensor) – processed spliced feature tensor of shape (batch, channels, chunk_size, n_chunks).

Note

the module should have the channel first convention and accept a 3D tensor of shape (batch, channels, time).

static intra_process(x, module)

Performs intra-chunk processing.

Parameters:

x (torch.Tensor) – spliced feature tensor of shape (batch, channels, chunk_size, n_chunks).
module (torch.nn.Module) – module to apply to each chunk of the spliced feature tensor.

Returns:

x (torch.Tensor) – processed spliced feature tensor of shape (batch, channels, chunk_size, n_chunks).

Note

the module should have the channel first convention and accept a 3D tensor of shape (batch, channels, time).
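
The difference between intra-chunk and inter-chunk processing comes down to which axis is presented to the module as "time". The NumPy sketch below illustrates the reshaping involved. It is an illustration, not the library code: arrays stand in for tensors, any callable stands in for the module, and the module is assumed to preserve the shape of its 3D input.

```python
import numpy as np


def intra_process(x, module):
    """Apply `module` within each chunk (time axis = chunk_size)."""
    batch, channels, chunk_size, n_chunks = x.shape
    # Fold n_chunks into the batch axis: (batch * n_chunks, channels, chunk_size).
    y = x.transpose(0, 3, 1, 2).reshape(batch * n_chunks, channels, chunk_size)
    y = module(y)
    # Restore (batch, channels, chunk_size, n_chunks).
    return y.reshape(batch, n_chunks, channels, chunk_size).transpose(0, 2, 3, 1)


def inter_process(x, module):
    """Apply `module` across chunks (time axis = n_chunks)."""
    batch, channels, chunk_size, n_chunks = x.shape
    # Fold chunk_size into the batch axis: (batch * chunk_size, channels, n_chunks).
    y = x.transpose(0, 2, 1, 3).reshape(batch * chunk_size, channels, n_chunks)
    y = module(y)
    # Restore (batch, channels, chunk_size, n_chunks).
    return y.reshape(batch, chunk_size, channels, n_chunks).transpose(0, 2, 1, 3)
```

Using a cumulative sum along the last axis as the stand-in module makes the difference visible: the intra version accumulates along chunk_size, the inter version along n_chunks.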

unfold(x)

Unfold the feature tensor from (batch, channels, time) to (batch, channels, chunk_size, n_chunks).

Parameters:	x – (`torch.Tensor`): feature tensor of shape (batch, channels, time).
Returns:	x – (`torch.Tensor`): spliced feature tensor of shape (batch, channels, chunk_size, n_chunks).
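
The round trip between unfold and fold can be sketched in NumPy as follows. This is a sketch under assumptions rather than the library's implementation: the input is zero-padded by chunk_size on both sides, hop_size is assumed to divide chunk_size, the folded signal is normalized by chunk_size / hop_size, and the original length is passed explicitly instead of being cached.

```python
import numpy as np


def unfold(x, chunk_size, hop_size):
    """(batch, channels, time) -> (batch, channels, chunk_size, n_chunks)."""
    # Pad so that every original sample is covered by the same number of chunks.
    padded = np.pad(x, ((0, 0), (0, 0), (chunk_size, chunk_size)))
    n_chunks = (padded.shape[-1] - chunk_size) // hop_size + 1
    chunks = [
        padded[..., i * hop_size:i * hop_size + chunk_size]
        for i in range(n_chunks)
    ]
    return np.stack(chunks, axis=-1)


def fold(chunks, output_size, hop_size):
    """(batch, channels, chunk_size, n_chunks) -> (batch, channels, output_size)."""
    batch, channels, chunk_size, n_chunks = chunks.shape
    out = np.zeros((batch, channels, (n_chunks - 1) * hop_size + chunk_size))
    for i in range(n_chunks):
        # Overlap-add each chunk at its original position.
        out[..., i * hop_size:i * hop_size + chunk_size] += chunks[..., i]
    # Drop the left padding, keep the original length, undo the overlap gain.
    out = out[..., chunk_size:chunk_size + output_size]
    return out / (chunk_size / hop_size)
```

When the chunks are left unmodified between the two calls, fold recovers the original signal exactly under these assumptions.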

asteroid.dsp.mixture_consistency(mixture, est_sources, src_weights=None, dim=1)

Applies mixture consistency to a tensor of estimated sources.

Args

mixture (torch.Tensor): Mixture waveform or TF representation.

est_sources (torch.Tensor): Estimated sources waveforms or TF representations.

src_weights (torch.Tensor): Consistency weight for each source. Shape needs to be broadcastable to est_sources. The weights are made to sum to 1 along dimension dim. If src_weights is None, they are computed based on relative power.

dim (int): Axis which contains the sources in est_sources.

Returns

torch.Tensor with same shape as est_sources, after applying mixture consistency.

Notes

This method can be used only in ‘complete’ separation tasks, otherwise the residual error will contain unwanted sources. For example, this won’t work with the task sep_noisy from WHAM.

Examples

>>> import torch
>>> from asteroid.dsp import mixture_consistency
>>> # Works on waveforms
>>> mix = torch.randn(10, 16000)
>>> est_sources = torch.randn(10, 2, 16000)
>>> new_est_sources = mixture_consistency(mix, est_sources, dim=1)
>>> # Also works on spectrograms
>>> mix = torch.randn(10, 514, 400)
>>> est_sources = torch.randn(10, 2, 514, 400)
>>> new_est_sources = mixture_consistency(mix, est_sources, dim=1)
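
To make the projection itself concrete, here is a minimal NumPy sketch of the idea: the residual between the mixture and the sum of the estimates is redistributed across the sources according to weights that sum to 1. The function name mixture_consistency_sketch, the small epsilon, and the way relative power is averaged over all axes other than batch (assumed to be axis 0) and dim are assumptions for illustration, not the library's implementation.

```python
import numpy as np


def mixture_consistency_sketch(mixture, est_sources, src_weights=None, dim=1):
    """Redistribute the mixture residual over the estimated sources."""
    if mixture.ndim == est_sources.ndim - 1:
        # Add a source axis so the mixture broadcasts against est_sources.
        mixture = np.expand_dims(mixture, dim)
    residual = mixture - est_sources.sum(axis=dim, keepdims=True)
    if src_weights is None:
        # Relative power of each source, averaged over non-batch, non-source axes.
        axes = tuple(i for i in range(est_sources.ndim) if i not in (0, dim))
        power = np.mean(est_sources ** 2, axis=axes, keepdims=True)
        src_weights = power / (power.sum(axis=dim, keepdims=True) + 1e-8)
    return est_sources + src_weights * residual
```

Because the weights sum to 1 along dim (up to the epsilon), the corrected sources add up to the mixture.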

References

Scott Wisdom, John R Hershey, Kevin Wilson, Jeremy Thorpe, Michael Chinen, Brian Patton, and Rif A Saurous. “Differentiable consistency constraints for improved deep speech enhancement”, ICASSP 2019.