
asteroid.data.avspeech_dataset module

asteroid.data.avspeech_dataset.get_frames(video)[source]

Read the frames of an opened video stream and return them.
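A minimal usage sketch, assuming video is an opened cv2.VideoCapture (inferred from the signature, not stated in the docs); the path below is a placeholder:

    import cv2  # assumption: OpenCV is used to open the clip

    from asteroid.data.avspeech_dataset import get_frames

    capture = cv2.VideoCapture("clips/speaker0.mp4")  # placeholder path
    frames = get_frames(capture)  # decoded frames of the clip
    capture.release()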
class asteroid.data.avspeech_dataset.Signal(video_path: Union[str, pathlib.Path], audio_path: Union[str, pathlib.Path], embed_dir: Union[str, pathlib.Path], sr=16000, video_start_length=0, fps=25, signal_len=3)[source]

Bases: object

This class holds the video frames and the audio signal.

Parameters:
  • video_path (str, Path) – Path to the video file (mp4).
  • audio_path (str, Path) – Path to the audio file (wav).
  • embed_dir (str, Path) – Path to the directory that stores the embeddings.
  • sr (int) – Sampling rate of the audio.
  • video_start_length (int) – Index of the video part to use (see the note below).
  • fps (int) – Frame rate of the video.
  • signal_len (int) – Length of the signal in seconds.

Note

Each video consists of multiple parts, each containing fps * signal_len frames.

get_embed()[source]

Return the face embedding for this part, loaded from embed_dir.

get_audio()[source]

Return the audio samples of this part.
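A minimal usage sketch for Signal (the file paths are placeholders, and the comments on the returned values are assumptions based on the defaults above):

    from asteroid.data.avspeech_dataset import Signal

    # One 3-second part of a clip: 16 kHz audio, 25 fps video.
    signal = Signal(
        video_path="clips/speaker0.mp4",  # placeholder path
        audio_path="clips/speaker0.wav",  # placeholder path
        embed_dir="embeddings/",          # placeholder directory
        sr=16000,
        video_start_length=0,  # first part, i.e. the first fps * signal_len = 75 frames
        fps=25,
        signal_len=3,
    )

    audio = signal.get_audio()  # assumed: sr * signal_len = 48000 samples
    embed = signal.get_embed()  # assumed: pre-computed face embedding for this part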
class asteroid.data.avspeech_dataset.AVSpeechDataset(input_df_path: Union[str, pathlib.Path], embed_dir: Union[str, pathlib.Path], n_src=2)[source]

Bases: torch.utils.data.Dataset

Audio-visual speech separation dataset as described in [1].

Parameters:
  • input_df_path (str, Path) – Path to the CSV file listing the mixture combinations.
  • embed_dir (str, Path) – Path to the directory where the embeddings are stored.
  • n_src (int) – Number of sources.
References
[1] Ariel Ephrat et al., “Looking to Listen at the Cocktail Party: A Speaker-Independent Audio-Visual Model for Speech Separation”, https://arxiv.org/abs/1804.03619
dataset_name = 'AVSpeech'[source]
static encode(x: numpy.ndarray, p=0.3, stft_encoder=None, EPS=1e-08)[source]

Transform a waveform into a power-law compressed time-frequency representation: an STFT is taken (a default encoder is built if stft_encoder is None) and its values are compressed with exponent p, following [1].

static decode(tf_rep: numpy.ndarray, p=0.3, stft_decoder=None, final_len=48000)[source]

Invert encode(): undo the power-law compression, apply the inverse STFT (a default decoder is built if stft_decoder is None), and trim or pad the waveform to final_len samples.
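A minimal end-to-end sketch, assuming a PyTorch DataLoader over the dataset and the default STFT settings in encode/decode; the CSV path, the embedding directory, and the comments on shapes are placeholders/assumptions:

    import numpy as np
    from torch.utils.data import DataLoader

    from asteroid.data.avspeech_dataset import AVSpeechDataset

    dataset = AVSpeechDataset(
        input_df_path="data/train_combinations.csv",  # placeholder path
        embed_dir="embeddings/",                      # placeholder directory
        n_src=2,
    )
    loader = DataLoader(dataset, batch_size=4, shuffle=True)

    # Round-trip a dummy 3-second, 16 kHz waveform through the
    # power-law compressed STFT representation (p=0.3, as in [1]).
    x = np.random.randn(48000).astype("float32")
    tf_rep = AVSpeechDataset.encode(x, p=0.3)
    x_hat = AVSpeechDataset.decode(tf_rep, p=0.3, final_len=48000)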