DSP Modules

Beamforming

class asteroid.dsp.beamforming.Beamformer(*args, **kwargs)

Bases: torch.nn.Module

Base class for beamforming modules.

static apply_beamforming_vector(bf_vector, mix)

Apply the beamforming vector to the mixture. Output (batch, freqs, frames).

Parameters:
  • bf_vector – shape (batch, mics, freqs)
  • mix – shape (batch, mics, freqs, frames).
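
To make the shapes above concrete, here is a minimal PyTorch sketch of applying a beamforming vector to a mixture. It assumes the usual \(\mathbf{w}^H \mathbf{y}\) convention (conjugated weights summed over microphones); apply_bf_vector is a hypothetical stand-in, not the library's own code.

```python
import torch


def apply_bf_vector(bf_vector, mix):
    """bf_vector: (batch, mics, freqs); mix: (batch, mics, freqs, frames)."""
    # w^H y: conjugate the weights, then sum over the microphone axis.
    return torch.einsum("bmf,bmft->bft", bf_vector.conj(), mix)


batch, mics, freqs, frames = 2, 4, 5, 7
mix = torch.randn(batch, mics, freqs, frames, dtype=torch.complex64)
# A one-hot vector on microphone 0 simply selects that channel.
bf_vector = torch.zeros(batch, mics, freqs, dtype=torch.complex64)
bf_vector[:, 0] = 1.0
out = apply_bf_vector(bf_vector, mix)  # (batch, freqs, frames)
```
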
static get_reference_mic_vects(ref_mic, bf_mat, target_scm=None, noise_scm=None)

Return the reference channel indices over the batch.

Parameters:
  • ref_mic (Optional[Union[int, torch.Tensor]]) – The reference channel. If a torch.Tensor with ndim > 1, it is taken to be the reference mic vector and returned as is. If a torch.LongTensor of size batch, it selects an independent reference mic for each batch element. If an int, it selects the corresponding reference mic. If None, the optimal reference mics are computed with get_optimal_reference_mic(). If None and either SCM is None, ref_mic is set to 0.
  • bf_mat – beamforming matrix of shape (batch, freqs, mics, mics).
  • target_scm (torch.ComplexTensor) – (batch, freqs, mics, mics).
  • noise_scm (torch.ComplexTensor) – (batch, freqs, mics, mics).
Returns:

torch.LongTensor of size batch holding the reference channel index for each batch element.

class asteroid.dsp.beamforming.SDWMWFBeamformer(mu=1.0)

Bases: asteroid.dsp.beamforming.Beamformer

forward(mix, target_scm, noise_scm, ref_mic=None)

Compute and apply SDW-MWF beamformer.

\(\mathbf{w} = \displaystyle (\Sigma_{ss} + \mu \Sigma_{nn})^{-1} \Sigma_{ss}\).

Parameters:
  • mix (torch.ComplexTensor) – shape (batch, mics, freqs, frames)
  • target_scm (torch.ComplexTensor) – (batch, mics, mics, freqs)
  • noise_scm (torch.ComplexTensor) – (batch, mics, mics, freqs)
  • ref_mic (int) – reference microphone.
Returns:

Filtered mixture. torch.ComplexTensor (batch, freqs, frames)
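
The SDW-MWF formula above can be sketched in a few lines of PyTorch. This is an illustrative sketch only: sdw_mwf_vector and random_scm are hypothetical names, and picking the column of the beamforming matrix that corresponds to a fixed integer ref_mic is an assumption about how the reference channel is applied.

```python
import torch


def random_scm(batch, mics, freqs, frames=50):
    """Hypothetical helper: a random Hermitian positive-definite SCM."""
    x = torch.randn(batch, mics, freqs, frames, dtype=torch.complex128)
    return torch.einsum("bmft,bnft->bmnf", x, x.conj()) / frames


def sdw_mwf_vector(target_scm, noise_scm, ref_mic=0, mu=1.0):
    """SCMs: (batch, mics, mics, freqs) -> weights (batch, mics, freqs)."""
    # Move frequency next to batch so each (mics, mics) matrix is trailing.
    scm_s = target_scm.permute(0, 3, 1, 2)
    scm_n = noise_scm.permute(0, 3, 1, 2)
    # Solve (S_ss + mu * S_nn) W = S_ss rather than forming the inverse.
    bf_mat = torch.linalg.solve(scm_s + mu * scm_n, scm_s)
    # Column `ref_mic` of W is the filter for that reference channel.
    return bf_mat[..., ref_mic].transpose(1, 2)


torch.manual_seed(0)
target_scm = random_scm(2, 4, 5)
noise_scm = random_scm(2, 4, 5)
bf_vect = sdw_mwf_vector(target_scm, noise_scm, ref_mic=1, mu=1.0)
```

With mu=0 the matrix reduces to the identity, so the filter just selects the reference microphone; larger values of mu trade speech distortion for more noise reduction.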

class asteroid.dsp.beamforming.GEVBeamformer(*args, **kwargs)

Bases: asteroid.dsp.beamforming.Beamformer

forward(mix, target_scm, noise_scm)

Compute and apply the GEV beamformer.

\(\mathbf{w} = \displaystyle MaxEig\{ \Sigma_{nn}^{-1}\Sigma_{ss} \}\), where MaxEig extracts the eigenvector corresponding to the maximum eigenvalue (using the GEV decomposition).

Parameters:
  • mix – shape (batch, mics, freqs, frames)
  • target_scm – (batch, mics, mics, freqs)
  • noise_scm – (batch, mics, mics, freqs)
Returns:

Filtered mixture. torch.ComplexTensor (batch, freqs, frames)
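
One way to obtain the principal generalized eigenvector described above is to whiten the target SCM with the Cholesky factor of the noise SCM and run a Hermitian eigendecomposition. The sketch below follows that route; it is an assumption about one possible implementation, and gev_vector and random_scm are hypothetical names.

```python
import torch


def random_scm(batch, mics, freqs, frames=50):
    """Hypothetical helper: a random Hermitian positive-definite SCM."""
    x = torch.randn(batch, mics, freqs, frames, dtype=torch.complex128)
    return torch.einsum("bmft,bnft->bmnf", x, x.conj()) / frames


def gev_vector(target_scm, noise_scm):
    """SCMs: (batch, mics, mics, freqs) -> (weights (batch, mics, freqs), max eigenvalues)."""
    scm_s = target_scm.permute(0, 3, 1, 2)
    scm_n = noise_scm.permute(0, 3, 1, 2)
    # Cholesky factor of the noise SCM: S_nn = L L^H.
    chol = torch.linalg.cholesky(scm_n)
    chol_h = chol.conj().transpose(-1, -2)
    # Whitened target SCM: L^{-1} S_ss L^{-H} (uses S_ss being Hermitian).
    tmp = torch.linalg.solve(chol, scm_s)
    whitened = torch.linalg.solve(chol, tmp.conj().transpose(-1, -2))
    # eigh returns eigenvalues in ascending order, so the last is the largest.
    eigvals, eigvecs = torch.linalg.eigh(whitened)
    # Undo the whitening: w = L^{-H} v.
    bf_vect = torch.linalg.solve(chol_h, eigvecs[..., -1:])
    return bf_vect.squeeze(-1).transpose(1, 2), eigvals[..., -1]


torch.manual_seed(0)
target_scm = random_scm(2, 4, 5)
noise_scm = random_scm(2, 4, 5)
bf_vect, max_eig = gev_vector(target_scm, noise_scm)
```
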

class asteroid.dsp.beamforming.RTFMVDRBeamformer(*args, **kwargs)

Bases: asteroid.dsp.beamforming.Beamformer

forward(mix, target_scm, noise_scm)

Compute and apply MVDR beamformer from the speech and noise SCM matrices.

\(\mathbf{w} = \displaystyle \frac{\Sigma_{nn}^{-1} \mathbf{a}}{ \mathbf{a}^H \Sigma_{nn}^{-1} \mathbf{a}}\) where \(\mathbf{a}\) is the ATF estimated from the target SCM.

Parameters:
  • mix (torch.ComplexTensor) – shape (batch, mics, freqs, frames)
  • target_scm (torch.ComplexTensor) – (batch, mics, mics, freqs)
  • noise_scm (torch.ComplexTensor) – (batch, mics, mics, freqs)
Returns:

Filtered mixture. torch.ComplexTensor (batch, freqs, frames)

from_rtf_vect(mix, rtf_vec, noise_scm)

Compute and apply MVDR beamformer from the ATF vector and noise SCM matrix.

Parameters:
  • mix (torch.ComplexTensor) – shape (batch, mics, freqs, frames)
  • rtf_vec (torch.ComplexTensor) – (batch, mics, freqs)
  • noise_scm (torch.ComplexTensor) – (batch, mics, mics, freqs)
Returns:

Filtered mixture. torch.ComplexTensor (batch, freqs, frames)
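
As an illustration of the MVDR formula given for this class, the sketch below computes the weights from a transfer-function vector and a noise SCM. mvdr_from_rtf is a hypothetical name and the random inputs are placeholders; the final check relies on the distortionless property \(\mathbf{w}^H \mathbf{a} = 1\) that follows from the formula.

```python
import torch


def mvdr_from_rtf(rtf_vec, noise_scm):
    """rtf_vec: (batch, mics, freqs); noise_scm: (batch, mics, mics, freqs)."""
    scm_n = noise_scm.permute(0, 3, 1, 2)       # (batch, freqs, mics, mics)
    a = rtf_vec.transpose(1, 2).unsqueeze(-1)   # (batch, freqs, mics, 1)
    numerator = torch.linalg.solve(scm_n, a)    # S_nn^{-1} a
    # a^H S_nn^{-1} a, one scalar per (batch, freq).
    denominator = (a.conj() * numerator).sum(dim=-2, keepdim=True)
    return (numerator / denominator).squeeze(-1).transpose(1, 2)


torch.manual_seed(0)
x = torch.randn(2, 4, 5, 50, dtype=torch.complex128)
noise_scm = torch.einsum("bmft,bnft->bmnf", x, x.conj()) / 50
rtf_vec = torch.randn(2, 4, 5, dtype=torch.complex128)
bf_vect = mvdr_from_rtf(rtf_vec, noise_scm)
# Response of the filter in the direction of `rtf_vec`: w^H a.
gain = torch.einsum("bmf,bmf->bf", bf_vect.conj(), rtf_vec)
```
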

class asteroid.dsp.beamforming.SoudenMVDRBeamformer(*args, **kwargs)

Bases: asteroid.dsp.beamforming.Beamformer

forward(mix, target_scm, noise_scm, ref_mic=0, eps=1e-08)

Compute and apply MVDR beamformer from the speech and noise SCM matrices. This class uses Souden’s formulation [1].

\(\mathbf{w} = \displaystyle \frac{\Sigma_{nn}^{-1} \Sigma_{ss}}{ Tr\left( \Sigma_{nn}^{-1} \Sigma_{ss} \right) }\mathbf{u}\) where \(\mathbf{u}\) is a one-hot vector selecting the reference microphone.

Parameters:
  • mix (torch.ComplexTensor) – shape (batch, mics, freqs, frames)
  • target_scm (torch.ComplexTensor) – (batch, mics, mics, freqs)
  • noise_scm (torch.ComplexTensor) – (batch, mics, mics, freqs)
  • ref_mic (int) – reference microphone.
  • eps – numerical stabilizer.
Returns:

Filtered mixture. torch.ComplexTensor (batch, freqs, frames)

References
[1] Souden, M., Benesty, J., & Affes, S. (2009). On optimal frequency-domain multichannel linear filtering for noise reduction. IEEE Transactions on audio, speech, and language processing, 18(2), 260-276.
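
Souden's formulation can be sketched directly from the formula for this class. In the sketch below, souden_mvdr_vector is a hypothetical name, and multiplying by the selection vector is done by picking column ref_mic of the matrix. The usage part builds a rank-1 target SCM from a placeholder steering vector a, in which case the filter response to a equals the reference-microphone entry of a.

```python
import torch


def souden_mvdr_vector(target_scm, noise_scm, ref_mic=0, eps=1e-8):
    """SCMs: (batch, mics, mics, freqs) -> weights (batch, mics, freqs)."""
    scm_s = target_scm.permute(0, 3, 1, 2)
    scm_n = noise_scm.permute(0, 3, 1, 2)
    numerator = torch.linalg.solve(scm_n, scm_s)            # S_nn^{-1} S_ss
    trace = numerator.diagonal(dim1=-2, dim2=-1).sum(-1)    # (batch, freqs)
    bf_mat = numerator / (trace[..., None, None] + eps)
    # Multiplying by a one-hot vector picks one column of the matrix.
    return bf_mat[..., ref_mic].transpose(1, 2)


torch.manual_seed(0)
x = torch.randn(2, 4, 5, 50, dtype=torch.complex128)
noise_scm = torch.einsum("bmft,bnft->bmnf", x, x.conj()) / 50
a = torch.randn(2, 4, 5, dtype=torch.complex128)
target_scm = torch.einsum("bmf,bnf->bmnf", a, a.conj())     # rank-1 target SCM
bf_vect = souden_mvdr_vector(target_scm, noise_scm, ref_mic=0)
# Response of the filter to the steering vector: w^H a.
response = torch.einsum("bmf,bmf->bf", bf_vect.conj(), a)
```
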
class asteroid.dsp.beamforming.SCM(*args, **kwargs)

Bases: torch.nn.Module

forward(x, mask=None, normalize: bool = True)

See compute_scm().
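
For readers unfamiliar with spatial covariance matrices, the following sketch shows one common way to estimate them from a multichannel STFT. It is not the library's compute_scm(): the function name, the mask shape (batch, 1, freqs, frames), and the normalization by the summed mask are all assumptions made for illustration.

```python
import torch


def compute_scm_sketch(x, mask=None, normalize=True, eps=1e-8):
    """x: (batch, mics, freqs, frames) -> SCM of shape (batch, mics, mics, freqs)."""
    batch, mics, freqs, frames = x.shape
    if mask is None:
        mask = torch.ones(batch, 1, freqs, frames)
    # Masked sum over frames of the outer products x x^H.
    scm = torch.einsum("bmft,bnft->bmnf", mask * x, x.conj())
    if normalize:
        # Divide by the total mask weight of each frequency bin.
        scm = scm / (mask.sum(dim=-1).unsqueeze(1) + eps)
    return scm


x = torch.randn(2, 4, 5, 30, dtype=torch.complex64)
scm = compute_scm_sketch(x)  # (2, 4, 4, 5), Hermitian in the two mic axes
```
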

LambdaOverlapAdd

class asteroid.dsp.LambdaOverlapAdd(nnet, n_src, window_size, hop_size=None, window='hanning', reorder_chunks=True, enable_grad=False)

Bases: torch.nn.Module

Overlap-add with lambda transform on segments (not scriptable).

Segment input signal, apply lambda function (a neural network for example) and combine with OLA.

LambdaOverlapAdd can be used with asteroid.separate and the asteroid-infer CLI.

Parameters:
  • nnet (callable) – Function to apply to each segment.
  • n_src (Optional[int]) – Number of sources in the output of nnet. If None, the number of sources is determined by the network’s output, but some correctness checks cannot be performed.
  • window_size (int) – Size of segmenting window.
  • hop_size (int) – Segmentation hop size.
  • window (str) – Name of the window (see scipy.signal.get_window) used for the synthesis.
  • reorder_chunks – Whether to reorder each consecutive segment. This might be useful when nnet is permutation invariant, as source assignments might change output channel from one segment to the next (in classic speech separation for example). Reordering is performed based on the correlation between the overlapped parts of consecutive segments.
ola_forward(x)

Heart of the class: segment the signal, apply nnet to each segment, and combine with OLA.

forward(x)

Forward module: segment the signal, apply nnet to each segment, and combine with OLA.

Parameters: x (torch.Tensor) – waveform signal of shape (batch, 1, time).
Returns: torch.Tensor – The output of the lambda OLA.
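
The segment / apply / overlap-add idea can be sketched on a single 1-D waveform as follows. This is a simplified illustration, not the class's implementation: lambda_overlap_add is a hypothetical function, it handles neither batches nor multiple sources nor chunk reordering, and it assumes the signal length fits a whole number of hops.

```python
import torch


def lambda_overlap_add(fn, x, window_size, hop_size):
    """x: 1-D waveform whose length fits a whole number of hops."""
    window = torch.hann_window(window_size, periodic=True)
    out = torch.zeros_like(x)
    norm = torch.zeros_like(x)
    for start in range(0, x.shape[-1] - window_size + 1, hop_size):
        segment = x[start:start + window_size]
        # Apply the function to the segment, then the synthesis window.
        out[start:start + window_size] += window * fn(segment)
        norm[start:start + window_size] += window
    # Normalize by the summed window so overlapping frames average out.
    return out / norm.clamp_min(1e-8)


x = torch.randn(256)
y = lambda_overlap_add(lambda seg: 2.0 * seg, x, window_size=64, hop_size=32)
```

Because the Hann window is zero at its first sample, the very first output sample is not recovered in this sketch; every other sample is the windowed average of the per-segment outputs.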

DualPath Processing

class asteroid.dsp.DualPathProcessing(chunk_size, hop_size)

Bases: torch.nn.Module

Perform Dual-Path processing via overlap-add as in DPRNN [1].

Parameters:
  • chunk_size (int) – Size of segmenting window.
  • hop_size (int) – segmentation hop size.
References
[1] Yi Luo, Zhuo Chen and Takuya Yoshioka. “Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation” https://arxiv.org/abs/1910.06379
unfold(x)

Unfold the feature tensor from (batch, channels, time) to (batch, channels, chunk_size, n_chunks).

Parameters: x (torch.Tensor) – feature tensor of shape (batch, channels, time).
Returns: torch.Tensor – spliced feature tensor of shape (batch, channels, chunk_size, n_chunks).
fold(x, output_size=None)

Fold the spliced feature tensor of shape (batch, channels, chunk_size, n_chunks) back to its original shape (batch, channels, time) using overlap-add.

Parameters:
  • x (torch.Tensor) – spliced feature tensor of shape (batch, channels, chunk_size, n_chunks).
  • output_size (int, optional) – sequence length of original feature tensor. If None, the original length cached by the previous call of unfold() will be used.
Returns:

torch.Tensor – feature tensor of shape (batch, channels, time).

Note

unfold() caches the original length of the input, and fold() reuses it when output_size is None.
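
The unfold and fold steps can be sketched with torch.nn.functional.unfold and torch.nn.functional.fold. This is a sketch under assumptions: unfold_chunks and fold_chunks are hypothetical names, the signal is zero-padded by chunk_size on both sides, and the overlap-add result is normalized by the per-sample overlap count, which may differ from the normalization the class uses.

```python
import torch
import torch.nn.functional as F


def unfold_chunks(x, chunk_size, hop_size):
    """(batch, channels, time) -> (batch, channels, chunk_size, n_chunks)."""
    batch, chan, _ = x.shape
    cols = F.unfold(
        x.unsqueeze(-1),
        kernel_size=(chunk_size, 1),
        padding=(chunk_size, 0),
        stride=(hop_size, 1),
    )  # (batch, channels * chunk_size, n_chunks)
    return cols.reshape(batch, chan, chunk_size, -1)


def fold_chunks(chunks, output_size, hop_size):
    """(batch, channels, chunk_size, n_chunks) -> (batch, channels, time)."""
    batch, chan, chunk_size, n_chunks = chunks.shape
    cols = chunks.reshape(batch, chan * chunk_size, n_chunks)
    kwargs = dict(
        output_size=(output_size, 1),
        kernel_size=(chunk_size, 1),
        padding=(chunk_size, 0),
        stride=(hop_size, 1),
    )
    summed = F.fold(cols, **kwargs)
    # Number of chunks that overlap each time step.
    count = F.fold(torch.ones_like(cols), **kwargs)
    return (summed / count).squeeze(-1)


x = torch.randn(2, 3, 100)
chunks = unfold_chunks(x, chunk_size=20, hop_size=10)
restored = fold_chunks(chunks, output_size=100, hop_size=10)
```
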

static intra_process(x, module)

Performs intra-chunk processing.

Parameters:
  • x (torch.Tensor) – spliced feature tensor of shape (batch, channels, chunk_size, n_chunks).
  • module (torch.nn.Module) – module to apply to each chunk of the spliced feature tensor.
Returns:

torch.Tensor – processed spliced feature tensor of shape (batch, channels, chunk_size, n_chunks).

Note

The module should follow the channel-first convention and accept a 3D tensor of shape (batch, channels, time).

static inter_process(x, module)

Performs inter-chunk processing.

Parameters:
  • x (torch.Tensor) – spliced feature tensor of shape (batch, channels, chunk_size, n_chunks).
  • module (torch.nn.Module) – module to apply across the chunks of the spliced feature tensor.
Returns:

torch.Tensor – processed spliced feature tensor of shape (batch, channels, chunk_size, n_chunks).

Note

The module should follow the channel-first convention and accept a 3D tensor of shape (batch, channels, time).
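
The difference between intra- and inter-chunk processing comes down to which axis is folded into the batch before calling the module. The sketch below illustrates this reshaping; the two functions are hypothetical re-implementations written for clarity and may differ from the library's code.

```python
import torch
from torch import nn


def intra_process(x, module):
    """Run `module` along chunk_size, treating every chunk as a sequence."""
    batch, chan, chunk_size, n_chunks = x.shape
    seqs = x.permute(0, 3, 1, 2).reshape(batch * n_chunks, chan, chunk_size)
    out = module(seqs)  # (batch * n_chunks, channels_out, chunk_size)
    return out.reshape(batch, n_chunks, -1, chunk_size).permute(0, 2, 3, 1)


def inter_process(x, module):
    """Run `module` along n_chunks, treating every in-chunk position as a sequence."""
    batch, chan, chunk_size, n_chunks = x.shape
    seqs = x.permute(0, 2, 1, 3).reshape(batch * chunk_size, chan, n_chunks)
    out = module(seqs)  # (batch * chunk_size, channels_out, n_chunks)
    return out.reshape(batch, chunk_size, -1, n_chunks).permute(0, 2, 1, 3)


x = torch.randn(2, 3, 20, 13)
conv = nn.Conv1d(3, 5, kernel_size=1)  # any channel-first 3D module works
intra_out = intra_process(x, conv)
inter_out = inter_process(x, conv)
```
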

Mixture Consistency

asteroid.dsp.mixture_consistency(mixture: torch.Tensor, est_sources: torch.Tensor, src_weights: Optional[torch.Tensor] = None, dim: int = 1) → torch.Tensor

Applies mixture consistency to a tensor of estimated sources.

Parameters:
  • mixture (torch.Tensor) – Mixture waveform or TF representation.
  • est_sources (torch.Tensor) – Estimated sources waveforms or TF representations.
  • src_weights (torch.Tensor) – Consistency weight for each source. Shape needs to be broadcastable to est_sources. We make sure that the weights sum to 1 along the dimension dim. If src_weights is None, they are computed from the relative power of the estimated sources.
  • dim (int) – Axis which contains the sources in est_sources.
Returns:
torch.Tensor with same shape as est_sources, after applying mixture consistency.
Examples
>>> import torch
>>> from asteroid.dsp import mixture_consistency
>>> # Works on waveforms
>>> mix = torch.randn(10, 16000)
>>> est_sources = torch.randn(10, 2, 16000)
>>> new_est_sources = mixture_consistency(mix, est_sources, dim=1)
>>> # Also works on spectrograms
>>> mix = torch.randn(10, 514, 400)
>>> est_sources = torch.randn(10, 2, 514, 400)
>>> new_est_sources = mixture_consistency(mix, est_sources, dim=1)

Note

This method can be used only in ‘complete’ separation tasks, otherwise the residual error will contain unwanted sources. For example, this won’t work with the task “sep_noisy” from WHAM.
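
The idea behind mixture consistency is to redistribute the residual between the mixture and the summed estimates across the sources. The sketch below illustrates this under assumptions: mixture_consistency_sketch is a hypothetical function, and the default weights are taken as each source's share of the total power, averaged over every axis except the batch and source axes.

```python
import torch


def mixture_consistency_sketch(mixture, est_sources, src_weights=None, dim=1):
    """Add a weighted share of the residual to every estimated source."""
    # What the estimates fail to explain, broadcastable to est_sources.
    residual = mixture.unsqueeze(dim) - est_sources.sum(dim, keepdim=True)
    if src_weights is None:
        # Relative power of each source (assumption: averaged over all
        # axes except batch and source).
        reduce_dims = [d for d in range(est_sources.ndim) if d not in (0, dim)]
        power = est_sources.pow(2).mean(dim=reduce_dims, keepdim=True)
        src_weights = power / (power.sum(dim, keepdim=True) + 1e-8)
    return est_sources + src_weights * residual


mix = torch.randn(10, 16000)
est_sources = torch.randn(10, 2, 16000)
new_est_sources = mixture_consistency_sketch(mix, est_sources, dim=1)
```

Since the weights sum to (almost exactly) one over the source axis, the corrected sources add up to the mixture.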

References
Scott Wisdom et al. “Differentiable consistency constraints for improved deep speech enhancement”, ICASSP 2019.

VAD

asteroid.dsp.vad.ebased_vad(mag_spec, th_db: int = 40)

Compute energy-based VAD from a magnitude spectrogram (or equivalent).

Parameters:
  • mag_spec (torch.Tensor) – the spectrogram to perform VAD on. Expected shape (batch, *, freq, time). The VAD mask is computed independently for each entry of the leading dimensions, over the last two dimensions. The order of those last two dimensions does not matter.
  • th_db (int) – The threshold in dB from which a TF-bin is considered silent.
Returns:

torch.BoolTensor, the VAD mask.

Examples
>>> import torch
>>> from asteroid.dsp.vad import ebased_vad
>>> mag_spec = torch.abs(torch.randn(10, 2, 65, 16))
>>> batch_src_mask = ebased_vad(mag_spec)
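
A minimal sketch of an energy-based VAD is given below. It assumes that the threshold is measured relative to the loudest TF-bin of each spectrogram, which is one plausible reading of th_db; energy_vad is a hypothetical function, not the library's implementation.

```python
import torch


def energy_vad(mag_spec, th_db=40.0):
    """Boolean mask of TF-bins within `th_db` dB of the loudest bin."""
    # Magnitude in dB, clamped to avoid log(0).
    log_mag = 20 * torch.log10(mag_spec.clamp_min(1e-12))
    # Peak over the last two (freq, time) axes, kept for broadcasting.
    peak = log_mag.amax(dim=(-2, -1), keepdim=True)
    return log_mag > peak - th_db


mag_spec = torch.ones(1, 4, 3)
mag_spec[0, 0, 0] = 1e-3  # 60 dB below the peak, so flagged as silent
vad_mask = energy_vad(mag_spec, th_db=40.0)
```
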

Delta Features

asteroid.dsp.deltas.compute_delta(feats: torch.Tensor, dim: int = -1) → torch.Tensor

Compute delta coefficients of a tensor.

Parameters:
  • feats – Input features to compute deltas with.
  • dim – feature dimension in the feats tensor.
Returns:

Tensor – Tensor of deltas.

Examples
>>> import torch
>>> from asteroid.dsp.deltas import compute_delta
>>> phase = torch.randn(2, 257, 100)
>>> # Compute instantaneous frequency
>>> inst_freq = compute_delta(phase, dim=-1)
>>> # Or group delay
>>> group_delay = compute_delta(phase, dim=-2)
asteroid.dsp.deltas.concat_deltas(feats: torch.Tensor, order: int = 1, dim: int = -1) → torch.Tensor

Concatenate delta coefficients of a tensor to itself.

Parameters:
  • feats – Input features to compute deltas with.
  • order – Order of the delta, e.g. with order=2, the delta of the delta is computed as well.
  • dim – feature dimension in the feats tensor.
Returns:

Tensor – Concatenation of the features, the deltas and subsequent deltas.

Examples
>>> import torch
>>> from asteroid.dsp.deltas import concat_deltas
>>> phase = torch.randn(2, 257, 100)
>>> # Compute second order instantaneous frequency
>>> phase_and_inst_freq = concat_deltas(phase, order=2, dim=-1)
>>> # Or group delay
>>> phase_and_group_delay = concat_deltas(phase, order=2, dim=-2)
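
To illustrate what delta features are, the sketch below computes a first-order difference along one axis and stacks successive deltas. It rests on assumptions: first_difference and stack_deltas are hypothetical functions, the first delta entry is zero-padded, and the concatenation is done along the same axis dim. The library's functions may differ on any of these points.

```python
import torch


def first_difference(feats, dim=-1):
    """Difference between neighbouring entries along `dim`, zero at index 0."""
    moved = feats.transpose(dim, -1)
    diff = moved[..., 1:] - moved[..., :-1]
    # Zero-pad the first position so the output keeps the input shape.
    delta = torch.cat([torch.zeros_like(moved[..., :1]), diff], dim=-1)
    return delta.transpose(dim, -1)


def stack_deltas(feats, order=1, dim=-1):
    """Concatenate feats with its deltas up to `order` along `dim`."""
    all_feats = [feats]
    for _ in range(order):
        all_feats.append(first_difference(all_feats[-1], dim=dim))
    return torch.cat(all_feats, dim=dim)


ramp = torch.arange(5.0).reshape(1, 5)
delta = first_difference(ramp)          # slope of a ramp is constant
stacked = stack_deltas(ramp, order=2)   # ramp, delta, delta of delta
```
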