API
Load a Dataset
To get data from a supported dataset, you only need one function:
SpeechDatasets.dataset — Method

dataset(name::AbstractString, inputdir::AbstractString, outputdir::AbstractString; <keyword arguments>)

Extract recordings and annotations for the desired dataset.
Return a SpeechDataset object.
Create the outputdir folder, with:
- recordings.jsonl: each audio file path and associated metadata
- annotations-<subset>.jsonl: each annotation and associated metadata
Arguments
- name: Name of the dataset. Supported names are ["AVID", "INA Diachrony", "Mini LibriSpeech", "Multilingual LibriSpeech", "TIMIT", "Speech2Tex"].
- inputdir: Name of the dataset directory. If the directory does not exist, it is created and the data is downloaded if possible. Not all datasets can be downloaded; proprietary datasets, for example, do not implement a download function.
- outputdir: Output directory for the manifest files.
Keyword Arguments
Common kwargs are:

- subset: Part of the dataset to load (for example "train" or "test").
- lang: ISO 639-3 code of the language.
Other kwargs may be available depending on the dataset; they can be accessed with get_dataset_kwargs(name::String).
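A minimal sketch of a typical call (the directory names and the subset value are illustrative, and downloading only happens for datasets that support it):

```julia
using SpeechDatasets

# Download/extract Mini LibriSpeech into "data" (if needed) and write
# recordings.jsonl and annotations-train.jsonl into "manifests".
ds = dataset("Mini LibriSpeech", "data", "manifests"; subset = "train")
```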
Base.summary — Method

Base.summary(dataset::SpeechDataset)

Display information about the given SpeechDataset.
SpeechDatasets.get_dataset_kwargs — Method

get_dataset_kwargs(name::String)

Return a NamedTuple containing each supported kwarg and its default value for a dataset identified by name.
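For example, one way to inspect a dataset's options before calling dataset, and to summarize a loaded dataset (the names and defaults returned depend on the dataset; the comment below is only illustrative):

```julia
# Supported keyword arguments and their defaults for a given dataset,
# e.g. something like (subset = "train", lang = "eng") — illustrative only.
kw = get_dataset_kwargs("Multilingual LibriSpeech")

# Short, human-readable description of a loaded dataset.
summary(ds)
```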
Types
SpeechDataset
SpeechDatasets.SpeechDatasetInfos — Type

struct SpeechDatasetInfos

Store metadata about a dataset.
Fields
- name: Dataset official name
- lang: Language or list of languages (ISO 639-3 code)
- license: License name
- source: URL to the dataset publication or content
- authors: List of authors
- description: A few sentences describing the content or main purpose
- subsets: List of available subsets (for example ["train", "test"])
SpeechDatasets.SpeechDatasetInfos — Method

SpeechDatasetInfos(name::AbstractString)

Construct a SpeechDatasetInfos from the dataset name.
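A small sketch of building the metadata for a supported dataset and reading a couple of its fields (the values shown in comments are illustrative):

```julia
infos = SpeechDatasetInfos("TIMIT")
infos.name      # "TIMIT"
infos.subsets   # e.g. ["train", "test"] — illustrative
```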
SpeechDatasets.SpeechDataset — Type

struct SpeechDataset <: MLUtils.AbstractDataContainer

Store all dataset recordings and annotations.
It can be iterated, yielding a Tuple{Recording, Annotation} for each entry. Indexing can be done with an integer or an id.
Fields
- infos::SpeechDatasetInfos
- idxs::Vector{AbstractString}: id indexes to access elements
- annotations::Dict{AbstractString, Annotation}: Annotation for each index
- recordings::Dict{AbstractString, Recording}: Recording for each index
SpeechDatasets.SpeechDataset — Method

SpeechDataset(infos::SpeechDatasetInfos, manifestroot::AbstractString, subset::AbstractString)

Create a SpeechDataset from manifest files and a subset.
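A sketch of rebuilding a dataset directly from previously written manifest files (the directory name and subset are illustrative):

```julia
# "manifests" must contain recordings.jsonl and annotations-test.jsonl.
infos = SpeechDatasetInfos("TIMIT")
ds = SpeechDataset(infos, "manifests", "test")
```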
Access a single element with integer or id indexing

# ds::SpeechDataset
ds[1]
ds["1988-147956-0027"]

Access several elements by providing a list

ds[[1, 4, 7]]
ds[[8, 2, "777-126732-0015"]]

Get all annotations

ds.annotations

Manifest items
SpeechDatasets.ManifestItem — Type

abstract type ManifestItem end

Base type for all manifest items. Every manifest item should have an id attribute.
SpeechDatasets.Recording — Type

struct Recording{Ts<:AbstractAudioSource} <: ManifestItem
    id::AbstractString
    source::Ts
    channels::Vector{Int}
    samplerate::Int
end

A recording is an audio source associated with an id.
Constructors
Recording(id, source, channels, samplerate)
Recording(id, source[; channels = missing, samplerate = missing])

If the channels or the sample rate are not provided, they will be read from source.
When preparing a large corpus, not providing the channels and/or the sample rate can drastically reduce speed, as it forces source to be read.
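A minimal sketch of creating a Recording by hand. FileAudioSource is used here only as a placeholder for whatever concrete AbstractAudioSource you have available; the type name and the path are assumptions, not part of this reference:

```julia
# Hypothetical concrete source type pointing to a WAV file on disk.
src = FileAudioSource("/data/audio/utt-0001.wav")

# Passing channels and samplerate explicitly avoids opening the file here.
rec = Recording("utt-0001", src; channels = [1], samplerate = 16000)
```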
SpeechDatasets.Annotation — Type

struct Annotation <: ManifestItem
    id::AbstractString
    recording_id::AbstractString
    start::Float64
    duration::Float64
    channel::Union{Vector, Colon}
    data::Dict
end

An "annotation" defines a segment of a recording on a single channel. The data field is an arbitrary dictionary holding the nature of the annotation. start and duration (in seconds) define where the segment is located within the recording recording_id.
Constructor
Annotation(id, recording_id, start, duration, channel, data)
Annotation(id, recording_id[; channel = missing, start = -1, duration = -1, data = missing])

If start and/or duration are negative, the segment is considered to be the whole sequence length of the recording.
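A sketch of annotating a segment of the recording created above (the id values and the "text" key are illustrative; data can hold whatever keys you need):

```julia
ann = Annotation("utt-0001-seg1", "utt-0001";
                 start = 2.0, duration = 2.5,
                 channel = [1], data = Dict("text" => "hello world"))
```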
AudioSources.load — Method

load(recording::Recording[; start = -1, duration = -1, channels = recording.channels])
load(recording, annotation)

Load the signal from a recording. start and duration (in seconds) select the segment to load.

The function returns a tuple (x, sr) where:

- x is an N×C array, where N is the length of the signal and C is the number of channels
- sr is the sampling rate of the signal.
AudioSources.load — Method

load(r::Recording, a::Annotation)
load(t::Tuple{Recording, Annotation})

Load only a segment of the recording referenced in the annotation.
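For instance, loading a whole recording versus only the annotated segment (here rec and ann stand for a Recording and an Annotation, such as one item of a SpeechDataset):

```julia
# Full signal of the recording.
x, sr = load(rec)

# Only the segment described by the annotation; ds[1] returns such a
# (Recording, Annotation) pair, so load(ds[1]) works the same way.
xseg, sr = load(rec, ann)
```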
SpeechDatasets.load_manifest — Method

load_manifest(Annotation, path)
load_manifest(Recording, path)

Load a Recording/Annotation manifest from path.
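A sketch of reading back the manifests written by dataset (the file names follow the layout described above; the exact paths are illustrative):

```julia
recs = load_manifest(Recording, "manifests/recordings.jsonl")
anns = load_manifest(Annotation, "manifests/annotations-train.jsonl")
```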
Lexicons
SpeechDatasets.CMUDICT — Method

CMUDICT(path)

Return the pronunciation dictionary loaded from the CMU Sphinx dictionary. The CMU dictionary will be downloaded and stored at path. Subsequent calls will only read the file at path without downloading the data again.
SpeechDatasets.TIMITDICT — Method

TIMITDICT(timitdir)

Return the pronunciation dictionary as provided by the TIMIT corpus (located in timitdir).
SpeechDatasets.MFAFRDICT — Method

MFAFRDICT(path)

Return the French pronunciation dictionary as provided by MFA (french_mfa v2.0.0a).
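As an example, loading the CMU dictionary and looking up a word. The file name is illustrative, and the lookup assumes the returned object supports word-to-pronunciation indexing like a Dict:

```julia
lexicon = CMUDICT("cmudict.txt")   # downloads on the first call only
lexicon["water"]                   # pronunciation(s) for "water" — assuming Dict-like lookup
```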
Index
- SpeechDatasets.Annotation
- SpeechDatasets.DatasetBuilder
- SpeechDatasets.DatasetBuilder
- SpeechDatasets.ManifestItem
- SpeechDatasets.Recording
- SpeechDatasets.SpeechDataset
- SpeechDatasets.SpeechDataset
- SpeechDatasets.SpeechDatasetInfos
- SpeechDatasets.SpeechDatasetInfos
- AudioSources.load
- AudioSources.load
- Base.download
- Base.summary
- SpeechDatasets.CMUDICT
- SpeechDatasets.MFAFRDICT
- SpeechDatasets.TIMITDICT
- SpeechDatasets.dataset
- SpeechDatasets.declareBuilder
- SpeechDatasets.get_dataset_kwargs
- SpeechDatasets.get_kwargs
- SpeechDatasets.get_nametype
- SpeechDatasets.load_manifest
- SpeechDatasets.prepare