Models

Base classes

class asteroid.models.base_models.BaseTasNet(encoder, masker, decoder, encoder_activation=None)

Bases: torch.nn.Module

Base class for encoder-masker-decoder separation models.

Parameters:

encoder (Encoder) – Encoder instance.
masker (nn.Module) – Masker network.
decoder (Decoder) – Decoder instance.

forward(wav)

Forward pass of the encoder-masker-decoder model.

Parameters:	wav (torch.Tensor) – waveform tensor. 1D, 2D or 3D tensor, time last.
Returns:	torch.Tensor, of shape (batch, n_src, time) or (n_src, time).

classmethod from_pretrained(pretrained_model_conf_or_path, *args, **kwargs)

Instantiate separation model from a model config (file or dict).

Parameters:	pretrained_model_conf_or_path (Union[dict, str]) – Model conf as returned by serialize, or path to it. Must contain the model_args and state_dict keys.
Returns:	Instance of BaseTasNet.
Raises:	ValueError – if the input config doesn’t contain the keys model_args and state_dict.

separate(wav)

Infer separated sources from input waveforms. Also supports filenames.

Parameters:	wav (Union[torch.Tensor, numpy.ndarray, str]) – Waveform array/tensor, or a filename. Arrays and tensors can be 1D, 2D or 3D, time last.
Returns:	Union[torch.Tensor, numpy.ndarray, None], the estimated sources, of shape (batch, n_src, time), or (n_src, time) without the batch dimension.

serialize()

Serialize model and output dictionary.

Returns:	dict, serialized model with keys model_args and state_dict.
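The dictionary contract shared by serialize and from_pretrained (both rely on the model_args and state_dict keys) can be illustrated with a minimal sketch. The helper name check_model_conf and the sample values are hypothetical and are not part of asteroid; the sketch only mirrors the documented ValueError behaviour.

```python
REQUIRED_KEYS = ("model_args", "state_dict")


def check_model_conf(conf):
    """Hypothetical helper: validate a serialized-model dictionary.

    Mirrors the documented contract only; this is not asteroid's code.
    """
    missing = [key for key in REQUIRED_KEYS if key not in conf]
    if missing:
        raise ValueError(f"Model config is missing keys: {missing}")
    return conf


# Placeholder values: a real state_dict would hold the model weights.
conf = {"model_args": {"n_src": 2}, "state_dict": {}}
check_model_conf(conf)  # passes silently

try:
    check_model_conf({"model_args": {"n_src": 2}})
except ValueError as err:
    print(err)
```

A dictionary that passes this check has the shape that serialize is documented to return and that from_pretrained is documented to accept.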

Ready-to-use models

class asteroid.models.conv_tasnet.ConvTasNet(n_src, out_chan=None, n_blocks=8, n_repeats=3, bn_chan=128, hid_chan=512, skip_chan=128, conv_kernel_size=3, norm_type='gLN', mask_act='sigmoid', in_chan=None, fb_name='free', kernel_size=16, n_filters=512, stride=8, encoder_activation='relu', **fb_kwargs)

Bases: asteroid.models.base_models.BaseTasNet

ConvTasNet separation model, as described in [1].

Parameters:

n_src (int) – Number of sources in the input mixtures.
out_chan (int, optional) – Number of bins in the estimated masks. If None, out_chan = in_chan.
n_blocks (int, optional) – Number of convolutional blocks in each repeat. Defaults to 8.
n_repeats (int, optional) – Number of repeats. Defaults to 3.
bn_chan (int, optional) – Number of channels after the bottleneck.
hid_chan (int, optional) – Number of channels in the convolutional blocks.
skip_chan (int, optional) – Number of channels in the skip connections. If 0 or None, TDConvNet won’t have any skip connections and the masks will be computed from the residual output. This corresponds to the ConvTasnet architecture in v1 of the paper.
conv_kernel_size (int, optional) – Kernel size in convolutional blocks.
norm_type (str, optional) – To choose from 'BN', 'gLN', 'cLN'.
mask_act (str, optional) – Which non-linear function to generate mask.
in_chan (int, optional) – Number of input channels, should be equal to n_filters.
fb_name (str, className) – Filterbank family from which to make encoder and decoder. To choose among ['free', 'analytic_free', 'param_sinc', 'stft'].
n_filters (int) – Number of filters / Input dimension of the masker net.
kernel_size (int) – Length of the filters.
stride (int, optional) – Stride of the convolution. If None, set to kernel_size // 2.
**fb_kwargs (dict) – Additional kwargs to pass to the filterbank creation.
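How kernel_size, stride, n_blocks and n_repeats interact can be shown with a little arithmetic. The sketch below assumes the encoder behaves like an unpadded 1-D convolution, which is an assumption rather than something stated in this reference; resolve_stride and n_frames are hypothetical helper names.

```python
def resolve_stride(kernel_size, stride=None):
    """Apply the documented fallback: kernel_size // 2 when stride is None."""
    return kernel_size // 2 if stride is None else stride


def n_frames(n_samples, kernel_size, stride):
    """Frames produced by an unpadded 1-D convolution (assumed encoder)."""
    if n_samples < kernel_size:
        return 0
    return (n_samples - kernel_size) // stride + 1


kernel_size = 16                       # default in the signature above
stride = resolve_stride(kernel_size)   # 16 // 2 == 8
frames = n_frames(16000, kernel_size, stride)

# n_blocks convolutional blocks in each of n_repeats repeats.
n_blocks, n_repeats = 8, 3
total_blocks = n_blocks * n_repeats

print(stride, frames, total_blocks)
```

With the defaults, a 16000-sample input is encoded into 1999 frames under this assumption, and the masker stacks 24 convolutional blocks.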

References

[1] “Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation”, Yi Luo and Nima Mesgarani, TASLP 2019. https://arxiv.org/abs/1809.07454

class asteroid.models.dprnn_tasnet.DPRNNTasNet(n_src, out_chan=None, bn_chan=128, hid_size=128, chunk_size=100, hop_size=None, n_repeats=6, norm_type='gLN', mask_act='sigmoid', bidirectional=True, rnn_type='LSTM', num_layers=1, dropout=0, in_chan=None, fb_name='free', kernel_size=16, n_filters=64, stride=8, encoder_activation='relu', **fb_kwargs)

Bases: asteroid.models.base_models.BaseTasNet

DPRNN separation model, as described in [1].

Parameters:

n_src (int) – Number of masks to estimate.
out_chan (int or None) – Number of bins in the estimated masks. Defaults to in_chan.
bn_chan (int) – Number of channels after the bottleneck. Defaults to 128.
hid_size (int) – Number of neurons in the RNNs cell state. Defaults to 128.
chunk_size (int) – window size of overlap and add processing. Defaults to 100.
hop_size (int or None) – Hop size (stride) of overlap and add processing. Defaults to chunk_size // 2 (50% overlap).
n_repeats (int) – Number of repeats. Defaults to 6.
norm_type (str, optional) – Type of normalization to use. To choose from:
- 'gLN': global Layernorm
- 'cLN': channelwise Layernorm
mask_act (str, optional) – Which non-linear function to generate mask.
bidirectional (bool, optional) – True for bidirectional Inter-Chunk RNN (Intra-Chunk is always bidirectional).
rnn_type (str, optional) – Type of RNN used. Choose between 'RNN', 'LSTM' and 'GRU'.
num_layers (int, optional) – Number of layers in each RNN.
dropout (float, optional) – Dropout ratio, must be in [0,1].
in_chan (int, optional) – Number of input channels, should be equal to n_filters.
fb_name (str, className) – Filterbank family from which to make encoder and decoder. To choose among ['free', 'analytic_free', 'param_sinc', 'stft'].
n_filters (int) – Number of filters / Input dimension of the masker net.
kernel_size (int) – Length of the filters.
stride (int, optional) – Stride of the convolution. If None, set to kernel_size // 2.
**fb_kwargs (dict) – Additional kwargs to pass to the filterbank creation.
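The relationship between chunk_size and hop_size can be sketched by listing where overlapping chunks would start along a sequence of encoder frames. This is only an illustration of the 50% overlap default: chunk_starts is a hypothetical helper, and the padding the actual model applies around the sequence is not described here and may differ.

```python
def chunk_starts(n_frames, chunk_size=100, hop_size=None):
    """Start indices of overlapping chunks covering n_frames frames.

    hop_size falls back to chunk_size // 2, as documented above.
    Assumes the last chunk may run past the end (i.e. gets padded).
    """
    hop = chunk_size // 2 if hop_size is None else hop_size
    starts = [0]
    while starts[-1] + chunk_size < n_frames:
        starts.append(starts[-1] + hop)
    return starts


print(chunk_starts(250))                # 50% overlap
print(chunk_starts(250, hop_size=100))  # no overlap
```

With the default hop, every frame away from the edges belongs to two chunks, which is what lets overlap and add reconstruct the full-length sequence after intra-chunk and inter-chunk processing.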

References

[1] “Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation”, Yi Luo, Zhuo Chen and Takuya Yoshioka. https://arxiv.org/abs/1910.06379

Publishing models

class asteroid.models.zenodo.Zenodo(api_key=None, use_sandbox=True)

Bases: object

Facilitate the use of Zenodo’s REST API.

Parameters:

api_key (str) – Access token generated to upload depositions.
use_sandbox (bool) – Whether to use the sandbox (default: True). Note that the sandbox uses a different api_key from the official Zenodo.

Methods (all methods return the requests response):

create_new_deposition
change_metadata_in_deposition
upload_new_file_to_deposition
publish_deposition
get_deposition
remove_deposition
remove_all_depositions

Note

A Zenodo record is something that is public and cannot be deleted. A Zenodo deposit has not yet been published, is private and can be deleted.

change_metadata_in_deposition(dep_id, metadata)

Set or replace metadata in the given deposition.

Parameters:

dep_id (int) – Deposition id. You can get it with r = create_new_deposition(); dep_id = r.json()['id']
metadata (dict) – Metadata dict.

Examples

metadata = {
    'title': 'My first upload',
    'upload_type': 'poster',
    'description': 'This is my first upload',
    'creators': [{'name': 'Doe, John',
                  'affiliation': 'Zenodo'}]
}

create_new_deposition(metadata=None)

Creates a new deposition.

Parameters:	metadata (dict, optional) – Metadata dict to upload on the new deposition.

get_deposition(dep_id=-1)

Get deposition by deposition id. Gets all depositions if dep_id is -1 (default).

publish_deposition(dep_id)

Publish the given deposition. Once published, it cannot be deleted!

Parameters:	dep_id (int) – Deposition id. You can get it with r = create_new_deposition(); dep_id = r.json()['id']

remove_all_depositions()

Remove all unpublished depositions (not records).

remove_deposition(dep_id)

Remove the deposition with deposition id dep_id.

upload_new_file_to_deposition(dep_id, file, name=None)

Upload one file to an existing deposition.

Parameters:

dep_id (int) – Deposition id. You can get it with r = create_new_deposition(); dep_id = r.json()['id']
file (str or io.BufferedReader) – Path to a file, or an already opened file (path preferred).
name (str, optional) – Name given to the uploaded file. Defaults to the path.

(More: https://developers.zenodo.org/#deposition-files)
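The order in which the methods above are typically chained (create a deposition, read its id from the response, set metadata, upload a file) can be sketched without network access. FakeZenodo and FakeResponse below are in-memory stand-ins written for this illustration only; they mirror the documented method names and the r.json()['id'] pattern but are not asteroid's implementation, and the file name model.pth is a placeholder.

```python
class FakeResponse:
    """Stand-in for a requests response; only json() is modelled."""

    def __init__(self, payload):
        self._payload = payload

    def json(self):
        return self._payload


class FakeZenodo:
    """In-memory stand-in mirroring the method names documented above."""

    def __init__(self):
        self.depositions = {}
        self._next_id = 1

    def create_new_deposition(self, metadata=None):
        dep_id = self._next_id
        self._next_id += 1
        self.depositions[dep_id] = {
            "id": dep_id, "metadata": metadata or {}, "files": []
        }
        return FakeResponse(self.depositions[dep_id])

    def change_metadata_in_deposition(self, dep_id, metadata):
        self.depositions[dep_id]["metadata"] = metadata
        return FakeResponse(self.depositions[dep_id])

    def upload_new_file_to_deposition(self, dep_id, file, name=None):
        # The uploaded name defaults to the path, as documented.
        self.depositions[dep_id]["files"].append(name or file)
        return FakeResponse(self.depositions[dep_id])


zen = FakeZenodo()
r = zen.create_new_deposition()
dep_id = r.json()["id"]
metadata = {"title": "My first upload", "upload_type": "poster"}
zen.change_metadata_in_deposition(dep_id, metadata)
zen.upload_new_file_to_deposition(dep_id, "model.pth")
print(zen.depositions[dep_id])
```

With the real class, the same sequence would start from Zenodo(api_key, use_sandbox=True). Calling publish_deposition is best left as a deliberate last step, since a published record cannot be deleted.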

asteroid.models.publisher.display_one_level_dict(dic)

Single-level dict to HTML.

Parameters:	dic (dict) – Single-level dict.
Returns:	str, the HTML-encoded single-level dict.
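As a rough illustration of what turning a single-level dict into HTML involves, the sketch below renders each key/value pair as a list item and escapes special characters. The function one_level_dict_to_html and its list markup are assumptions made for this example; the markup actually produced by display_one_level_dict is not specified here and may differ.

```python
import html


def one_level_dict_to_html(dic):
    """Hypothetical renderer: one <li> per key/value pair, HTML-escaped."""
    items = [
        f"<li>{html.escape(str(key))}: {html.escape(str(value))}</li>"
        for key, value in dic.items()
    ]
    return "<ul>" + "".join(items) + "</ul>"


# Placeholder content, chosen only to show the escaping of '<'.
print(one_level_dict_to_html({"metric": 1.5, "note": "a < b"}))
```

Escaping matters because the values end up in a description rendered as HTML, where a raw < or & would otherwise be interpreted as markup.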

asteroid.models.publisher.get_username()

Get git or FS username for upload.

asteroid.models.publisher.make_license_notice(model_name, licenses, uploader=None)

Make license notice based on license dicts.

Parameters:

model_name (str) – Name of the model.
licenses (List[dict]) – List of dicts with keys (title, title_link, author, author_link, licence, licence_link).
uploader (str) – Name of the uploader such as “Manuel Pariente”.

Returns:	str, the license notice describing the model, its attribution, the original licenses, what we license it under and the licensor.
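The shape expected for the licenses argument can be made concrete with a small example. All values below are placeholders, and check_license_entry is a hypothetical helper rather than part of asteroid; only the six key names come from the parameter description above (note the “licence” spelling in two of them).

```python
LICENSE_KEYS = (
    "title", "title_link", "author", "author_link", "licence", "licence_link"
)


def check_license_entry(entry):
    """Hypothetical helper: list the documented keys missing from one entry."""
    return [key for key in LICENSE_KEYS if key not in entry]


# Placeholder entry; real entries describe the data the model was trained on.
licenses = [
    {
        "title": "Example dataset",
        "title_link": "https://example.org/dataset",
        "author": "Example Author",
        "author_link": "https://example.org/author",
        "licence": "Example licence",
        "licence_link": "https://example.org/licence",
    }
]

for entry in licenses:
    print(check_license_entry(entry))  # empty list when nothing is missing
```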

asteroid.models.publisher.make_metadata_from_model(model)

Create Zenodo deposit metadata for a given publishable model.

Parameters:	model (dict) – Dictionary with all the information needed to publish. More info to come.
Returns:	dict, the metadata to create the Zenodo deposit with.

asteroid.models.publisher.save_publishable(publish_dir, model_dict, metrics=None, train_conf=None)

Save models to prepare for publication / model sharing.

Parameters:

publish_dir (str) – Path to the publishing directory. Usually under exp/exp_name/publish_dir
model_dict (dict) – dict at least with keys model_args, state_dict, dataset or licenses.
metrics (dict) – dict with evaluation metrics.
train_conf (dict) – Training configuration dict (from conf.yml).

Returns:	dict, same as model_dict with added fields.
Raises:	AssertionError – when either model_args, state_dict, dataset or licenses are not present in model_dict.keys().

asteroid.models.publisher.two_level_dict_html(dic)

Two-level dict to HTML.

Parameters:	dic (dict) – Two-level dict.
Returns:	str, the HTML-encoded two-level dict.

asteroid.models.publisher.upload_publishable(publish_dir, uploader=None, affiliation=None, git_username=None, token=None, force_publish=False, use_sandbox=False, unit_test=False)

Entry point to upload publishable model.

Parameters:

publish_dir (str) – Path to the publishing directory. Usually under exp/exp_name/publish_dir
uploader (str) – Full name of the uploader (Ex: Manuel Pariente)
affiliation (str, optional) – Affiliation (no accent).
git_username (str, optional) – GitHub username.
token (str) – Access token generated to upload depositions.
force_publish (bool) – Whether to directly publish without asking confirmation before. Defaults to False.
use_sandbox (bool) – Whether to use Zenodo’s sandbox instead of the official Zenodo.
unit_test (bool) – If True, we do not ask user input and do not publish.

asteroid.models.publisher.zenodo_upload(model, token, model_path=None, use_sandbox=False)

Create deposit and upload metadata + model.

Parameters:

model (dict) –
token (str) – Access token.
model_path (str) – Saved model path.
use_sandbox (bool) – Whether to use Zenodo’s sandbox instead of the official Zenodo.

Returns:

Zenodo (Zenodo instance with access token)
int (deposit ID)