Models
Base classes
- class asteroid.models.base_models.BaseEncoderMaskerDecoder(encoder, masker, decoder, encoder_activation=None)
Bases: asteroid.models.base_models.BaseModel
Base class for encoder-masker-decoder separation models.
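Parameters:
- encoder – encoder, mapping waveforms to a time-frequency representation.
- masker – masker network, estimating one mask per source from the encoded representation.
- decoder – decoder, mapping masked representations back to waveforms.
- encoder_activation (optional) – activation applied to the encoder output before masking.
- forward(wav)
Enc/Mask/Dec model forward.
Parameters: wav (torch.Tensor) – waveform tensor. 1D, 2D or 3D tensor, time last.
Returns: torch.Tensor of shape (batch, n_src, time), or (n_src, time).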
- postprocess_decoded(decoded)
Hook to perform transformations on the decoded, time-domain representation (output of the decoder) before the original shape is reconstructed.
Parameters: decoded (Tensor of shape (batch, n_src, time)) – Output of the decoder, before original shape reconstruction.
Returns: Transformed decoded.
- postprocess_encoded(tf_rep)
Hook to perform transformations on the encoded, time-frequency domain representation (output of the encoder) before the encoder activation is applied.
Parameters: tf_rep (Tensor of shape (batch, freq, time)) – Output of the encoder, before the encoder activation is applied.
Returns: Transformed tf_rep.
- postprocess_masked(masked_tf_rep)
Hook to perform transformations on the masked time-frequency domain representation (result of masking in the time-frequency domain) before decoding.
Parameters: masked_tf_rep (Tensor of shape (batch, n_src, freq, time)) – Masked time-frequency representation, before decoding.
Returns: Transformed masked_tf_rep.
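The hooks above run at fixed points of the forward pass. The sketch below is a hypothetical pure-Python stand-in, not asteroid's implementation: plain lists replace tensors, ToyEncoderMaskerDecoder and its toy encoder, masker and decoder are made up for illustration, and masks are assumed to be applied by element-wise multiplication. It records the order in which the three hooks fire.

```python
from typing import Callable, List, Optional

Vector = List[float]


class ToyEncoderMaskerDecoder:
    """Hypothetical pure-Python stand-in tracing where each hook runs."""

    def __init__(
        self,
        encoder: Callable[[Vector], Vector],
        masker: Callable[[Vector], List[Vector]],
        decoder: Callable[[Vector], Vector],
        encoder_activation: Optional[Callable[[Vector], Vector]] = None,
    ) -> None:
        self.encoder = encoder
        self.masker = masker
        self.decoder = decoder
        # Identity when no activation is given (assumption for this sketch).
        self.encoder_activation = encoder_activation or (lambda x: x)
        self.trace: List[str] = []

    # Default hooks are identities that only record that they ran.
    def postprocess_encoded(self, tf_rep: Vector) -> Vector:
        self.trace.append("postprocess_encoded")
        return tf_rep

    def postprocess_masked(self, masked: List[Vector]) -> List[Vector]:
        self.trace.append("postprocess_masked")
        return masked

    def postprocess_decoded(self, decoded: List[Vector]) -> List[Vector]:
        self.trace.append("postprocess_decoded")
        return decoded

    def forward(self, wav: Vector) -> List[Vector]:
        tf_rep = self.postprocess_encoded(self.encoder(wav))
        # The encoder activation runs after postprocess_encoded.
        tf_rep = self.encoder_activation(tf_rep)
        masks = self.masker(tf_rep)
        # One masked representation per source.
        masked = [[m * x for m, x in zip(mask, tf_rep)] for mask in masks]
        masked = self.postprocess_masked(masked)
        decoded = [self.decoder(src) for src in masked]
        return self.postprocess_decoded(decoded)


model = ToyEncoderMaskerDecoder(
    encoder=lambda wav: [2.0 * x for x in wav],
    masker=lambda tf: [[0.25] * len(tf), [0.75] * len(tf)],
    decoder=lambda rep: [x / 2.0 for x in rep],
)
estimates = model.forward([1.0, 2.0])
print(estimates)  # [[0.25, 0.5], [0.75, 1.5]]
print(model.trace)
```

Because the two toy masks sum to one, the estimates add back up to the input [1.0, 2.0]. The real model additionally reconstructs the original input shape after postprocess_decoded, which this sketch omits.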
- class asteroid.models.base_models.BaseModel
Bases: torch.nn.Module
- file_separate(filename: str, output_dir=None, force_overwrite=False, **kwargs) → None
Filename interface to separate.
- classmethod from_pretrained(pretrained_model_conf_or_path, *args, **kwargs)
Instantiate a separation model from a model config (file or dict).
Parameters:
- pretrained_model_conf_or_path (Union[dict, str]) – model conf as returned by serialize, or path to it. Needs to contain the model_args and state_dict keys.
- *args – Positional arguments to be passed to the model.
- **kwargs – Keyword arguments to be passed to the model. They overwrite the ones in the model package.
Returns: nn.Module corresponding to the pretrained model conf/URL.
Raises: ValueError if the input config file doesn’t contain the keys model_name, model_args or state_dict.
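The validation and override rules described for from_pretrained can be sketched as follows. check_model_conf is a hypothetical helper, not part of asteroid, and the example conf is made up; the helper only checks the keys listed under Raises and merges keyword arguments over the stored model_args. It does not download, read or load any weights.

```python
from typing import Any, Dict

# Keys listed in the "Raises" entry of from_pretrained.
REQUIRED_KEYS = ("model_name", "model_args", "state_dict")


def check_model_conf(conf: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]:
    """Hypothetical helper: validate a model conf and merge keyword overrides.

    Returns the model arguments that would be passed to the constructor.
    """
    missing = [key for key in REQUIRED_KEYS if key not in conf]
    if missing:
        raise ValueError(f"Model conf is missing required keys: {missing}")
    # Keyword arguments overwrite the ones stored in the model package.
    return {**conf["model_args"], **kwargs}


conf = {
    "model_name": "ConvTasNet",
    "model_args": {"n_src": 2, "n_filters": 512},
    "state_dict": {},
}
print(check_model_conf(conf, n_filters=256))  # {'n_src': 2, 'n_filters': 256}
```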
- numpy_separate(wav: numpy.ndarray, **kwargs) → numpy.ndarray
Numpy interface to separate.
- separate(wav, output_dir=None, force_overwrite=False, **kwargs)
Infer separated sources from input waveforms. Also supports filenames.
Parameters:
- wav (Union[torch.Tensor, numpy.ndarray, str]) – waveform array/tensor, or a filename. Shape: 1D, 2D or 3D tensor, time last.
- output_dir (str) – path to save all the wav files when wav is a filename. If None, estimated sources will be saved next to the original ones.
- force_overwrite (bool) – whether to overwrite existing files.
- **kwargs – keyword arguments to be passed to _separate.
Returns: Union[torch.Tensor, numpy.ndarray, None], the estimated sources, of shape (batch, n_src, time), or (n_src, time) without the batch dimension.
Note
By default, separate calls _separate, which calls forward. For models whose forward doesn’t return waveform tensors, override _separate to return waveform tensors.
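The Note above can be illustrated with a pure-Python sketch. ToyBaseModel and ModelWithExtras are hypothetical classes that only mimic the separate → _separate → forward chain; file and array handling are omitted. The subclass's forward returns a (waveforms, masks) tuple, so it overrides _separate to hand back the waveforms only.

```python
from typing import Any, List, Tuple

Waveforms = List[List[float]]


class ToyBaseModel:
    """Hypothetical stand-in for the separate -> _separate -> forward chain."""

    def forward(self, wav: List[float], **kwargs: Any) -> Any:
        raise NotImplementedError

    def _separate(self, wav: List[float], **kwargs: Any) -> Waveforms:
        # Default behaviour: forward is assumed to return waveforms directly.
        return self.forward(wav, **kwargs)

    def separate(self, wav: List[float], **kwargs: Any) -> Waveforms:
        # File and array handling of the real method is omitted here.
        return self._separate(wav, **kwargs)


class ModelWithExtras(ToyBaseModel):
    """Its forward returns (waveforms, masks), so _separate is overridden."""

    def forward(self, wav: List[float], **kwargs: Any) -> Tuple[Waveforms, Waveforms]:
        masks = [[0.5] * len(wav), [0.5] * len(wav)]
        estimates = [[m * x for m, x in zip(mask, wav)] for mask in masks]
        return estimates, masks

    def _separate(self, wav: List[float], **kwargs: Any) -> Waveforms:
        estimates, _masks = self.forward(wav, **kwargs)
        return estimates  # waveforms only, as separate expects


print(ModelWithExtras().separate([2.0, 4.0]))  # [[1.0, 2.0], [1.0, 2.0]]
```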
- asteroid.models.base_models.BaseTasNet
alias of asteroid.models.base_models.BaseEncoderMaskerDecoder
Ready-to-use models
- class asteroid.models.conv_tasnet.ConvTasNet(n_src, out_chan=None, n_blocks=8, n_repeats=3, bn_chan=128, hid_chan=512, skip_chan=128, conv_kernel_size=3, norm_type='gLN', mask_act='sigmoid', in_chan=None, fb_name='free', kernel_size=16, n_filters=512, stride=8, encoder_activation=None, **fb_kwargs)
Bases: asteroid.models.base_models.BaseEncoderMaskerDecoder
ConvTasNet separation model, as described in [1].
Parameters:
- n_src (int) – Number of sources in the input mixtures.
- out_chan (int, optional) – Number of bins in the estimated masks. If None, out_chan = in_chan.
- n_blocks (int, optional) – Number of convolutional blocks in each repeat. Defaults to 8.
- n_repeats (int, optional) – Number of repeats. Defaults to 3.
- bn_chan (int, optional) – Number of channels after the bottleneck. Defaults to 128.
- hid_chan (int, optional) – Number of channels in the convolutional blocks. Defaults to 512.
- skip_chan (int, optional) – Number of channels in the skip connections. Defaults to 128. If 0 or None, TDConvNet won’t have any skip connections and the masks will be computed from the residual output. This corresponds to the ConvTasNet architecture in v1 or the paper.
- conv_kernel_size (int, optional) – Kernel size in the convolutional blocks. Defaults to 3.
- norm_type (str, optional) – To choose from 'BN', 'gLN', 'cLN'. Defaults to 'gLN'.
- mask_act (str, optional) – Which non-linear function to use to generate the masks. Defaults to 'sigmoid'.
- in_chan (int, optional) – Number of input channels, should be equal to n_filters.
- fb_name (str, className) – Filterbank family from which to make the encoder and decoder. To choose among ['free', 'analytic_free', 'param_sinc', 'stft'].
- n_filters (int) – Number of filters / input dimension of the masker net. Defaults to 512.
- kernel_size (int) – Length of the filters. Defaults to 16.
- stride (int, optional) – Stride of the convolution. If None, set to kernel_size // 2. Defaults to 8.
- **fb_kwargs (dict) – Additional kwargs to pass to the filterbank creation.
References
[1] “Conv-TasNet: Surpassing ideal time-frequency magnitude masking for speech separation”, Yi Luo and Nima Mesgarani, TASLP 2019. https://arxiv.org/abs/1809.07454
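The defaults n_blocks=8, n_repeats=3 and conv_kernel_size=3 determine how much temporal context the masker sees. The sketch below computes that receptive field under an assumption not stated on this page: dilations double from 1 to 2 ** (n_blocks - 1) within each repeat, as in the Conv-TasNet paper [1]. The helper names are hypothetical.

```python
def tdconvnet_receptive_field(
    n_blocks: int = 8, n_repeats: int = 3, conv_kernel_size: int = 3
) -> int:
    """Receptive field, in encoder frames, of stacked dilated 1D convolutions.

    Assumes dilations 1, 2, ..., 2 ** (n_blocks - 1) in every repeat; each
    block adds (conv_kernel_size - 1) * dilation frames of context.
    """
    per_repeat = sum((conv_kernel_size - 1) * 2 ** b for b in range(n_blocks))
    return 1 + n_repeats * per_repeat


def frames_to_samples(n_frames: int, kernel_size: int = 16, stride: int = 8) -> int:
    """Number of input samples covered by n_frames frames of a strided encoder."""
    return (n_frames - 1) * stride + kernel_size


frames = tdconvnet_receptive_field()
samples = frames_to_samples(frames)
print(frames, samples)  # 1531 12256
print(f"{samples / 8000:.3f} s at 8 kHz")  # 1.532 s at 8 kHz
```

Under these assumptions the default masker sees about 1.5 s of an 8 kHz signal; changing stride or kernel_size changes the conversion from frames to samples but not the frame count.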
- class asteroid.models.dprnn_tasnet.DPRNNTasNet(n_src, out_chan=None, bn_chan=128, hid_size=128, chunk_size=100, hop_size=None, n_repeats=6, norm_type='gLN', mask_act='sigmoid', bidirectional=True, rnn_type='LSTM', num_layers=1, dropout=0, in_chan=None, fb_name='free', kernel_size=16, n_filters=64, stride=8, encoder_activation=None, **fb_kwargs)
Bases: asteroid.models.base_models.BaseEncoderMaskerDecoder
DPRNN separation model, as described in [1].
Parameters:
- n_src (int) – Number of masks to estimate.
- out_chan (int or None) – Number of bins in the estimated masks. Defaults to in_chan.
- bn_chan (int) – Number of channels after the bottleneck. Defaults to 128.
- hid_size (int) – Number of neurons in the RNNs cell state. Defaults to 128.
- chunk_size (int) – Window size of overlap-and-add processing. Defaults to 100.
- hop_size (int or None) – Hop size (stride) of overlap-and-add processing. Defaults to chunk_size // 2 (50% overlap).
- n_repeats (int) – Number of repeats. Defaults to 6.
- norm_type (str, optional) – Type of normalization to use. To choose from 'gLN' (global layer norm) or 'cLN' (channel-wise layer norm). Defaults to 'gLN'.
- mask_act (str, optional) – Which non-linear function to use to generate the masks. Defaults to 'sigmoid'.
- bidirectional (bool, optional) – True for a bidirectional inter-chunk RNN (the intra-chunk RNN is always bidirectional). Defaults to True.
- rnn_type (str, optional) – Type of RNN used. Choose between 'RNN', 'LSTM' and 'GRU'. Defaults to 'LSTM'.
- num_layers (int, optional) – Number of layers in each RNN. Defaults to 1.
- dropout (float, optional) – Dropout ratio, must be in [0, 1]. Defaults to 0.
- in_chan (int, optional) – Number of input channels, should be equal to n_filters.
- fb_name (str, className) – Filterbank family from which to make the encoder and decoder. To choose among ['free', 'analytic_free', 'param_sinc', 'stft'].
- n_filters (int) – Number of filters / input dimension of the masker net. Defaults to 64.
- kernel_size (int) – Length of the filters. Defaults to 16.
- stride (int, optional) – Stride of the convolution. If None, set to kernel_size // 2. Defaults to 8.
- **fb_kwargs (dict) – Additional kwargs to pass to the filterbank creation.
References
[1] “Dual-path RNN: efficient long sequence modeling for time-domain single-channel speech separation”, Yi Luo, Zhuo Chen and Takuya Yoshioka. https://arxiv.org/abs/1910.06379
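DPRNN splits the encoded sequence into overlapping chunks of chunk_size frames with a hop of hop_size, processes them, and merges them back by overlap-add. The following is a generic sketch of that segmentation on a single 1D sequence, assuming zero-padding at the end and averaging where chunks overlap; asteroid's own padding and merging may differ, and segment and overlap_add are hypothetical names.

```python
from typing import List, Optional


def segment(
    frames: List[float], chunk_size: int, hop_size: Optional[int] = None
) -> List[List[float]]:
    """Split a sequence into overlapping chunks, zero-padding the end."""
    hop = hop_size if hop_size is not None else chunk_size // 2  # 50% overlap
    n = len(frames)
    # Smallest number of chunks whose span covers all n frames (ceil division).
    n_chunks = -(-max(n - chunk_size, 0) // hop) + 1
    padded_len = (n_chunks - 1) * hop + chunk_size
    padded = frames + [0.0] * (padded_len - n)
    return [padded[i * hop : i * hop + chunk_size] for i in range(n_chunks)]


def overlap_add(chunks: List[List[float]], hop_size: int, length: int) -> List[float]:
    """Merge chunks, averaging over the number of chunks covering each position."""
    chunk_size = len(chunks[0])
    total = (len(chunks) - 1) * hop_size + chunk_size
    acc = [0.0] * total
    count = [0] * total
    for i, chunk in enumerate(chunks):
        for j, value in enumerate(chunk):
            acc[i * hop_size + j] += value
            count[i * hop_size + j] += 1
    # Keep only the first `length` positions, dropping the zero padding.
    return [acc[t] / count[t] for t in range(length)]


frames = [float(t) for t in range(10)]
chunks = segment(frames, chunk_size=4)
print(len(chunks), chunks[1])  # 4 [2.0, 3.0, 4.0, 5.0]
print(overlap_add(chunks, hop_size=2, length=len(frames)) == frames)  # True
```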
Publishing models
- class asteroid.models.zenodo.Zenodo(api_key=None, use_sandbox=True)
Bases: object
Facilitate Zenodo’s REST API.
Methods (all methods return the requests response): create_new_deposition, change_metadata_in_deposition, upload_new_file_to_deposition, publish_deposition, get_deposition, remove_deposition, remove_all_depositions.
Note
A Zenodo record is public and cannot be deleted. A Zenodo deposit has not yet been published, is private and can be deleted.
- change_metadata_in_deposition(dep_id, metadata)
Set or replace the metadata of the given deposition.
Parameters:
- dep_id (int) – deposition id.
- metadata (dict) – metadata dict.
Examples
metadata = {
    'title': 'My first upload',
    'upload_type': 'poster',
    'description': 'This is my first upload',
    'creators': [{'name': 'Doe, John', 'affiliation': 'Zenodo'}]
}
- create_new_deposition(metadata=None)
Create a new deposition.
Parameters: metadata (dict, optional) – Metadata dict to upload on the new deposition.
- publish_deposition(dep_id)
Publish the given deposition. A published deposition cannot be deleted!
Parameters: dep_id (int) – deposition id. You can get it with r = create_new_deposition(); dep_id = r.json()['id']
- upload_new_file_to_deposition(dep_id, file, name=None)
Upload one file to an existing deposition.
Parameters:
- dep_id (int) – deposition id. You can get it with r = create_new_deposition(); dep_id = r.json()['id']
- file (str or io.BufferedReader) – path to a file, or an already opened file (path preferred).
- name (str, optional) – name given to the uploaded file. Defaults to the path.
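As a sketch of the REST calls behind this class, the snippet below only builds request descriptions (method, URL, JSON body) without sending anything. The endpoint paths follow Zenodo's public deposit API documentation rather than asteroid's source, deposition_calls and PreparedCall are hypothetical names, and the access token (api_key) is left out; Zenodo expects it as an access_token query parameter or a bearer token.

```python
import json
from typing import Any, Dict, NamedTuple, Optional


class PreparedCall(NamedTuple):
    """Description of an HTTP request that is built but never sent."""

    method: str
    url: str
    body: Optional[str]  # JSON-encoded body, if any


def deposition_calls(
    dep_id: int, metadata: Dict[str, Any], use_sandbox: bool = True
) -> Dict[str, PreparedCall]:
    """Map Zenodo deposit actions to the REST calls they would make."""
    host = "https://sandbox.zenodo.org" if use_sandbox else "https://zenodo.org"
    base = f"{host}/api/deposit/depositions"
    return {
        "create": PreparedCall("POST", base, json.dumps({})),
        "change_metadata": PreparedCall(
            "PUT", f"{base}/{dep_id}", json.dumps({"metadata": metadata})
        ),
        "get": PreparedCall("GET", f"{base}/{dep_id}", None),
        "publish": PreparedCall("POST", f"{base}/{dep_id}/actions/publish", None),
        "remove": PreparedCall("DELETE", f"{base}/{dep_id}", None),
    }


calls = deposition_calls(1234, {"title": "My first upload", "upload_type": "poster"})
print(calls["publish"].method, calls["publish"].url)
# POST https://sandbox.zenodo.org/api/deposit/depositions/1234/actions/publish
```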
- asteroid.models.publisher.display_one_level_dict(dic)
Single-level dict to HTML.
Parameters: dic (dict) – single-level dict.
Returns: str, the HTML-encoded single-level dict.
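A minimal sketch of rendering a single-level dict as HTML, assuming an unordered list; the exact markup asteroid produces is not documented here, and one_level_dict_to_html is a hypothetical name. Keys and values are escaped with html.escape.

```python
from html import escape
from typing import Any, Dict


def one_level_dict_to_html(dic: Dict[str, Any]) -> str:
    """Render a flat dict as an HTML unordered list (markup is illustrative)."""
    items = "".join(
        f"<li>{escape(str(key))}: {escape(str(value))}</li>" for key, value in dic.items()
    )
    return f"<ul>{items}</ul>"


print(one_level_dict_to_html({"n_src": 2, "sample_rate": 8000}))
# <ul><li>n_src: 2</li><li>sample_rate: 8000</li></ul>
```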
- asteroid.models.publisher.make_license_notice(model_name, licenses, uploader=None)
Make a license notice based on license dicts.
Returns: str, the license notice describing the model, its attribution, the original licenses, what we license it under and the licensor.
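A sketch of how such a notice could be assembled. The license dict keys (title, author, license) and the extra own_license parameter are assumptions made for this example, as are all names and licenses in the usage; the wording of asteroid's real notice may differ.

```python
from typing import Dict, List, Optional


def license_notice(
    model_name: str,
    licenses: List[Dict[str, str]],
    own_license: str,
    uploader: Optional[str] = None,
) -> str:
    """Hypothetical notice: attribution for each source, then the model's own license."""
    parts = [
        f'"{model_name}" is a derivative of "{lic["title"]}" by {lic["author"]}, '
        f'used under {lic["license"]}.'
        for lic in licenses
    ]
    licensor = uploader if uploader is not None else "the uploader"
    parts.append(f'"{model_name}" is licensed under {own_license} by {licensor}.')
    return " ".join(parts)


notice = license_notice(
    "my_model",
    [{"title": "MyDataset", "author": "A. Author", "license": "CC BY 4.0"}],
    own_license="CC BY-SA 3.0",
    uploader="Jane Doe",
)
print(notice)
```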
- asteroid.models.publisher.make_metadata_from_model(model)
Create Zenodo deposit metadata for a given publishable model.
Parameters: model (dict) – Dictionary with all the information needed to publish. More info to come.
Returns: dict, the metadata to create the Zenodo deposit with.
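A sketch of mapping a publishable model dict to Zenodo deposit metadata. The output keys (title, upload_type, description, creators) are standard Zenodo deposition metadata fields; the input keys (model_name, dataset, uploader, affiliation, description) and the title format are hypothetical, since the expected model dict is not specified above.

```python
from typing import Any, Dict


def metadata_from_model(model: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical mapping from a publishable model dict to Zenodo metadata."""
    creator: Dict[str, str] = {"name": model["uploader"]}
    if model.get("affiliation"):
        creator["affiliation"] = model["affiliation"]
    return {
        "title": f"{model['model_name']} trained on {model['dataset']}",
        "upload_type": "software",
        "description": model.get("description", ""),
        "creators": [creator],
    }


meta = metadata_from_model(
    {"model_name": "ConvTasNet", "dataset": "MyDataset", "uploader": "Doe, John"}
)
print(meta["title"], meta["creators"])
# ConvTasNet trained on MyDataset [{'name': 'Doe, John'}]
```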
- asteroid.models.publisher.save_publishable(publish_dir, model_dict, metrics=None, train_conf=None, recipe=None)
Save models to prepare for publication / model sharing.
Parameters:
- publish_dir (str) – Path to the publishing directory. Usually under exp/exp_name/publish_dir.
- model_dict (dict) – dict with at least the keys model_args, state_dict, dataset and licenses.
- metrics (dict) – dict with evaluation metrics.
- train_conf (dict) – Training configuration dict (from conf.yml).
- recipe (str) – Name of the recipe.
Returns: dict, same as model_dict with added fields.
Raises: AssertionError when any of model_args, state_dict, dataset or licenses is not present in model_dict.keys().
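The key check and the returned dict can be sketched as below. prepare_publishable is a hypothetical helper that performs no file I/O, and the names of the added fields (metrics, train_conf, recipe) are assumptions; only the four required keys come from the documentation above.

```python
from typing import Any, Dict, Optional

# Required keys, as listed under "Raises" for save_publishable.
REQUIRED_KEYS = ("model_args", "state_dict", "dataset", "licenses")


def prepare_publishable(
    model_dict: Dict[str, Any],
    metrics: Optional[Dict[str, float]] = None,
    train_conf: Optional[Dict[str, Any]] = None,
    recipe: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: check required keys and return an augmented copy."""
    missing = [key for key in REQUIRED_KEYS if key not in model_dict]
    # Mirrors the documented AssertionError (asserts are skipped under python -O).
    assert not missing, f"model_dict is missing keys: {missing}"
    publishable = dict(model_dict)  # shallow copy, so the input is left untouched
    publishable["metrics"] = metrics
    publishable["train_conf"] = train_conf
    publishable["recipe"] = recipe
    return publishable


model_dict = {"model_args": {"n_src": 2}, "state_dict": {}, "dataset": "MyDataset", "licenses": []}
print(sorted(prepare_publishable(model_dict, recipe="my_recipe")))
# ['dataset', 'licenses', 'metrics', 'model_args', 'recipe', 'state_dict', 'train_conf']
```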
- asteroid.models.publisher.two_level_dict_html(dic)
Two-level dict to HTML.
Parameters: dic (dict) – two-level dict.
Returns: str, the HTML-encoded two-level dict.
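A sketch for nested dicts, assuming nested unordered lists as markup; two_level_dict_to_html is a hypothetical name, not asteroid's function, and keys and values are escaped with html.escape.

```python
from html import escape
from typing import Any, Dict


def two_level_dict_to_html(dic: Dict[str, Dict[str, Any]]) -> str:
    """Render a dict of dicts as nested HTML lists (markup is illustrative)."""
    html = "<ul>"
    for section, inner in dic.items():
        items = "".join(
            f"<li>{escape(str(key))}: {escape(str(value))}</li>" for key, value in inner.items()
        )
        # Each outer key becomes a list item holding its own inner list.
        html += f"<li>{escape(str(section))}<ul>{items}</ul></li>"
    return html + "</ul>"


print(two_level_dict_to_html({"training": {"lr": 0.001}, "data": {"sample_rate": 8000}}))
```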
- asteroid.models.publisher.upload_publishable(publish_dir, uploader=None, affiliation=None, git_username=None, token=None, force_publish=False, use_sandbox=False, unit_test=False)
Entry point to upload a publishable model.
Parameters:
- publish_dir (str) – Path to the publishing directory. Usually under exp/exp_name/publish_dir.
- uploader (str) – Full name of the uploader (e.g. Manuel Pariente).
- affiliation (str, optional) – Affiliation (no accent).
- git_username (str, optional) – GitHub username.
- token (str) – Access token generated to upload depositions.
- force_publish (bool) – Whether to publish directly without asking for confirmation. Defaults to False.
- use_sandbox (bool) – Whether to use Zenodo’s sandbox instead of the official Zenodo. Defaults to False.
- unit_test (bool) – If True, do not ask for user input and do not publish.
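The interplay of force_publish and unit_test described above can be sketched as a small decision function. publish_decision is hypothetical and the confirmation prompt is made up; ask defaults to the built-in input so that tests can inject answers.

```python
from typing import Callable


def publish_decision(
    force_publish: bool, unit_test: bool, ask: Callable[[str], str] = input
) -> bool:
    """Decide whether to publish after uploading (hypothetical helper).

    unit_test: never prompt and never publish, even if force_publish is set.
    force_publish: publish without asking for confirmation.
    Otherwise: publish only if the user confirms.
    """
    if unit_test:
        return False
    if force_publish:
        return True
    answer = ask("Publish the deposition? It cannot be deleted afterwards. [y/N] ")
    return answer.strip().lower() in {"y", "yes"}


print(publish_decision(force_publish=False, unit_test=False, ask=lambda _: "yes"))  # True
```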