API

Load a Dataset

To get data from a supported dataset, you only need one function:

SpeechDatasets.dataset — Method

dataset(name::AbstractString, inputdir::AbstractString, outputdir::AbstractString; <keyword arguments>)

Extract recordings and annotations for the desired dataset.

Return a SpeechDataset object.

Create the outputdir folder, with:

recordings.jsonl containing each audio file path and associated metadata
annotations-<subset>.jsonl containing each annotation and associated metadata

Arguments

name Name of the dataset. Supported names are ["AVID", "INA Diachrony", "Mini LibriSpeech", "Multilingual LibriSpeech", "TIMIT", "Speech2Tex"].
inputdir Path to the dataset directory. If the directory does not exist, it is created and the data is downloaded if possible. Not all datasets can be downloaded; for example, proprietary datasets do not implement a download function.
outputdir Output directory for the manifest files.

Keyword Arguments

Common kwargs are

subset Part of the dataset to load (for example "train" or "test").
lang ISO 639-3 code of the language.

Other kwargs may be available depending on the dataset. They can be listed with get_dataset_kwargs(name::String).

Base.summary — Method

Base.summary(dataset::SpeechDataset)

Display information about the given SpeechDataset.

SpeechDatasets.get_dataset_kwargs — Method

get_dataset_kwargs(name::String)

Return a NamedTuple containing each supported kwarg and its default value for a dataset identified by name.

Types

SpeechDataset

SpeechDatasets.SpeechDatasetInfos — Type

struct SpeechDatasetInfos

Store metadata about a dataset.

Fields

name Dataset official name
lang Language or list of languages (ISO 639-3 code)
license License name
source URL to the dataset publication or content
authors List of authors
description A few sentences describing the content or main purpose
subsets List of available subsets (for example ["train", "test"])

SpeechDatasets.SpeechDatasetInfos — Method

SpeechDatasetInfos(name::AbstractString)

Construct a SpeechDatasetInfos from the dataset name.

SpeechDatasets.SpeechDataset — Type

struct SpeechDataset <: MLUtils.AbstractDataContainer

Store all dataset recordings and annotations.

Iterating over it yields a Tuple{Recording, Annotation} for each entry. Elements can be indexed by integer position or by id.

Fields

infos::SpeechDatasetInfos
idxs::Vector{AbstractString} id indexes to access elements
annotations::Dict{AbstractString, Annotation} Annotation for each index
recordings::Dict{AbstractString, Recording} Recording for each index

SpeechDatasets.SpeechDataset — Method

SpeechDataset(infos::SpeechDatasetInfos, manifestroot::AbstractString, subset::AbstractString)

Create a SpeechDataset from manifest files and subset.

Access a single element with integer or id indexing

# ds::SpeechDataset
ds[1]
ds["1988-147956-0027"]

Access several elements by providing a list

ds[[1,4,7]]
ds[[8, 2, "777-126732-0015"]]

Get all annotations

ds.annotations
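
As an illustration of the mixed integer/id indexing shown above, the following toy container mirrors the documented field layout (an ordered id vector plus a dictionary keyed by id). ToyDataset and its contents are hypothetical and not part of SpeechDatasets; the actual implementation may differ.

```julia
# Toy container, NOT the SpeechDatasets implementation: it only mirrors the
# documented layout (an ordered id vector plus a Dict keyed by id).
struct ToyDataset
    idxs::Vector{String}
    items::Dict{String, String}
end

# Integer index: look up the id stored at that position, then the item.
Base.getindex(d::ToyDataset, i::Integer) = d.items[d.idxs[i]]
# String index: the id is used directly as the dictionary key.
Base.getindex(d::ToyDataset, id::AbstractString) = d.items[id]
# A list may mix integers and ids; each element is resolved on its own.
Base.getindex(d::ToyDataset, keys::AbstractVector) = [d[k] for k in keys]

toy = ToyDataset(
    ["utt-a", "utt-b", "utt-c"],
    Dict("utt-a" => "hello", "utt-b" => "world", "utt-c" => "again"),
)

first_item = toy[1]            # resolved through idxs
by_id = toy["utt-b"]           # resolved directly by id
mixed = toy[[3, "utt-a"]]      # positions and ids in the same list
```

Keeping the ids in a separate ordered vector is what makes positional access possible, since a Dict alone has no stable order.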

Manifest items

SpeechDatasets.ManifestItem — Type

abstract type ManifestItem end

Base type for all manifest items. Every manifest item should have an id field.

SpeechDatasets.Recording — Type

struct Recording{Ts<:AbstractAudioSource} <: ManifestItem
    id::AbstractString
    source::Ts
    channels::Vector{Int}
    samplerate::Int
end

A recording is an audio source associated with an id.

Constructors

Recording(id, source, channels, samplerate)
Recording(id, source[; channels = missing, samplerate = missing])

If the channels or the sample rate are not provided then they will be read from source.

Warning

When preparing a large corpus, omitting the channels and/or the sample rate can drastically slow things down, because it forces each source to be read.

SpeechDatasets.Annotation — Type

struct Annotation <: ManifestItem
    id::AbstractString
    recording_id::AbstractString
    start::Float64
    duration::Float64
    channel::Union{Vector, Colon}
    data::Dict
end

An "annotation" defines a segment of a recording on a single channel. The data field is an arbitrary dictionary holding the content of the annotation. start and duration (in seconds) define where the segment is located within the recording recording_id.

Constructor

Annotation(id, recording_id, start, duration, channel, data)
Annotation(id, recording_id[; channel = missing, start = -1, duration = -1, data = missing])

If start and/or duration are negative, the segment is considered to be the whole sequence length of the recording.
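
The rule above can be sketched as a small helper that converts a segment given in seconds to a range of sample indices. segment_samples, its rounding convention, and the sample counts used below are assumptions made for illustration only; the package may compute segment boundaries differently.

```julia
# Hypothetical helper (not part of SpeechDatasets): map a segment given in
# seconds to a 1-based range of sample indices.
function segment_samples(start::Real, duration::Real,
                         samplerate::Integer, nsamples::Integer)
    # Documented rule: a negative start or duration selects the whole recording.
    (start < 0 || duration < 0) && return 1:nsamples
    # Assumed convention: round boundaries down and clip to the signal length.
    lo = floor(Int, start * samplerate) + 1
    hi = min(nsamples, floor(Int, (start + duration) * samplerate))
    return lo:hi
end

# A 3-second recording at 16 kHz has 48_000 samples.
whole = segment_samples(-1, -1, 16_000, 48_000)    # default values: everything
part = segment_samples(0.5, 1.0, 16_000, 48_000)   # one second starting at 0.5 s
```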

AudioSources.load — Method

load(recording::Recording [; start = -1, duration = -1, channels = recording.channels])
load(recording, annotation)

Load the signal from a recording. start and duration are expressed in seconds.

The function returns a tuple (x, sr) where x is an $N \times C$ array ($N$ is the length of the signal and $C$ is the number of channels) and sr is the sampling rate of the signal.

AudioSources.load — Method

load(r::Recording, a::Annotation)
load(t::Tuple{Recording, Annotation})

Load only a segment of the recording referenced in the annotation.

SpeechDatasets.load_manifest — Method

load_manifest(Annotation, path)
load_manifest(Recording, path)

Load Recording/Annotation manifest from path.

Lexicons

SpeechDatasets.CMUDICT — Method

CMUDICT(path)

Return the pronunciation dictionary loaded from the CMU Sphinx dictionary. The CMU dictionary is downloaded and stored at path. Subsequent calls only read the file at path, without downloading the data again.
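
The download-once behaviour described above follows a common caching pattern, sketched below. cached_read and fake_fetch are hypothetical names, the fetch step is stubbed so that the example runs offline, and the file content is a stand-in; this is not the CMUDICT implementation.

```julia
# Hypothetical helper illustrating the download-once pattern. `fetch` stands in
# for whatever actually retrieves the file and is an assumption of this sketch.
function cached_read(fetch::Function, path::AbstractString)
    if !isfile(path)
        dir = dirname(path)
        isempty(dir) || mkpath(dir)   # create parent folders when needed
        fetch(path)                   # expected to write the file at `path`
    end
    return read(path, String)         # later calls only hit this line
end

# Stub that counts how often it is invoked instead of using the network.
calls = Ref(0)
fake_fetch(path) = (calls[] += 1; write(path, "HELLO  HH AH0 L OW1\n"))

target = joinpath(mktempdir(), "lexicon", "cmudict.dict")
txt1 = cached_read(fake_fetch, target)   # file missing: fetch runs once
txt2 = cached_read(fake_fetch, target)   # file present: fetch is skipped
```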

SpeechDatasets.TIMITDICT — Method

TIMITDICT(timitdir)

Return the pronunciation dictionary provided with the TIMIT corpus (located in timitdir).

SpeechDatasets.MFAFRDICT — Method

MFAFRDICT(path)

Return the French pronunciation dictionary provided by MFA (french_mfa v2.0.0a).

Index

SpeechDatasets.Annotation
SpeechDatasets.DatasetBuilder
SpeechDatasets.DatasetBuilder
SpeechDatasets.ManifestItem
SpeechDatasets.Recording
SpeechDatasets.SpeechDataset
SpeechDatasets.SpeechDataset
SpeechDatasets.SpeechDatasetInfos
SpeechDatasets.SpeechDatasetInfos
AudioSources.load
AudioSources.load
Base.download
Base.summary
SpeechDatasets.CMUDICT
SpeechDatasets.MFAFRDICT
SpeechDatasets.TIMITDICT
SpeechDatasets.dataset
SpeechDatasets.declareBuilder
SpeechDatasets.get_dataset_kwargs
SpeechDatasets.get_kwargs
SpeechDatasets.get_nametype
SpeechDatasets.load_manifest
SpeechDatasets.prepare