API
Load a Dataset
To get data from a supported dataset, you only need one function:
SpeechDatasets.dataset — Function
dataset(dataset, inputdir::AbstractString, outputdir::AbstractString; kwargs...)
Create a SpeechDataset object for dataset. inputdir is the directory containing the raw data. If inputdir does not exist and the data is freely available, it will be downloaded automatically and placed in inputdir. outputdir is the directory where summary files will be stored. kwargs... are dataset-specific arguments passed to dataset.
See metadata with
Base.summary — Method
summary(ds::SpeechDataset)
summary(key::String)
summary(key::Symbol)
Display dataset metadata, adapting to the current MIME type (HTML or plain text).
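One common way to adapt output to the MIME type in Julia is to define one `Base.show` method per MIME type. The sketch below illustrates that pattern only; `ToyInfo` and `render` are hypothetical names, and nothing here is taken from the SpeechDatasets implementation of summary.

```julia
# Hypothetical metadata holder; not a SpeechDatasets type.
struct ToyInfo
    name::String
    hours::Float64
end

# Plain-text rendering, used by terminals and the REPL.
Base.show(io::IO, ::MIME"text/plain", info::ToyInfo) =
    print(io, info.name, ": ", info.hours, " h")

# HTML rendering, used by notebook-style front ends.
Base.show(io::IO, ::MIME"text/html", info::ToyInfo) =
    print(io, "<b>", info.name, "</b>: ", info.hours, " h")

# Hypothetical helper: render a value to a String for a given MIME type.
render(mime::AbstractString, x) = sprint(show, MIME(mime), x)

info = ToyInfo("toy", 1.5)
plain = render("text/plain", info)
html = render("text/html", info)
```

The display system picks the richest MIME type the front end supports, so the same value can print as text in a terminal and as HTML in a notebook.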
Access citation in BibTeX format with
SpeechDatasets.cite — Method
cite(ds::SpeechDataset)
cite(key::String)
cite(key::Symbol)
Get the citation for a given dataset in BibTeX format, if available. The output is a multiline string that can be appended directly to a .bib file. If you want a different format, you can use the external package Bibliography.jl or its components BibInternal.jl and BibParser.jl. For example, you can parse the bib string to a BibInternal.Entry object and then convert it to JSON:
import BibParser, JSON
parsed = BibParser.parse_entry(cite(ds))
JSON.json(parsed[ds.citekey], omit_empty=true)
Types
SpeechDataset
SpeechDatasets.SpeechDataset — Type
SpeechDataset
Store metadata about a speech dataset.
SpeechDataset objects are iterable. You can also access a single element by indexing with its id:
# ds::SpeechDataset
recording, annotation = ds["msmr0_si1405"]
As it is an AbstractDict subtype, you can use the following functions:
length(ds)
keys(ds)
values(ds)
get(ds, "key", defaultValue)
Manifest items
SpeechDatasets.ManifestItem — Type
abstract type ManifestItem end
Base class for all manifest items. Every manifest item should have an id attribute.
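To make the id requirement and the dictionary-style access concrete, here is a minimal, self-contained sketch. All `Demo*` names are hypothetical stand-ins rather than the package's types, and keying the dataset by annotation id is an assumption made for illustration. It shows how an AbstractDict subtype that stores (recording, annotation) pairs gets length, keys, values and indexing from a few method definitions.

```julia
# Hypothetical toy types; they only mirror the shape of the documented API.
abstract type DemoItem end

struct DemoRecording <: DemoItem
    id::String
    path::String
end

struct DemoAnnotation <: DemoItem
    id::String
    recording_id::String
    text::String
end

const DemoEntry = Tuple{DemoRecording,DemoAnnotation}

# A dictionary-like dataset keyed by annotation id (an assumption of this sketch).
struct DemoDataset <: AbstractDict{String,DemoEntry}
    items::Dict{String,DemoEntry}
end

# Pair every annotation with the recording it refers to.
function DemoDataset(recordings::Vector{DemoRecording}, annotations::Vector{DemoAnnotation})
    by_id = Dict(r.id => r for r in recordings)
    items = Dict{String,DemoEntry}(a.id => (by_id[a.recording_id], a) for a in annotations)
    return DemoDataset(items)
end

# Minimal AbstractDict interface; keys, values and indexing build on these.
Base.length(ds::DemoDataset) = length(ds.items)
Base.iterate(ds::DemoDataset, state...) = iterate(ds.items, state...)
Base.get(ds::DemoDataset, key, default) = get(ds.items, key, default)

recs = [DemoRecording("rec1", "audio/rec1.wav")]
anns = [DemoAnnotation("utt1", "rec1", "hello"), DemoAnnotation("utt2", "rec1", "world")]
ds = DemoDataset(recs, anns)

recording, annotation = ds["utt1"]   # indexing falls back to get
ids = sort(collect(keys(ds)))        # Dict order is unspecified, so sort
fallback = get(ds, "unknown", nothing)
```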
SpeechDatasets.Recording — Type
struct Recording{Ts<:AbstractAudioSource} <: ManifestItem
id::AbstractString
source::Ts
channels::Vector{Int}
samplerate::Int
end
A recording is an audio source associated with an id.
Constructors
Recording(id, source, channels, samplerate)
Recording(id, source[; channels = missing, samplerate = missing])
If the channels or the sample rate are not provided, they are read from source.
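The fallback behaviour of the keyword constructor can be sketched as follows. `ToySource` and `ToyRecording` are hypothetical types, and the 1-based channel numbering is an assumption of this sketch. How the real constructor queries an AbstractAudioSource is not shown in this documentation.

```julia
# Hypothetical stand-in for an audio source that knows its own layout.
struct ToySource
    nchannels::Int
    samplerate::Int
end

struct ToyRecording
    id::String
    source::ToySource
    channels::Vector{Int}
    samplerate::Int
end

# Keyword constructor: anything left as `missing` is read from the source.
function ToyRecording(id, source::ToySource; channels = missing, samplerate = missing)
    ch = ismissing(channels) ? collect(1:source.nchannels) : channels
    sr = ismissing(samplerate) ? source.samplerate : samplerate
    return ToyRecording(id, source, ch, sr)
end

src = ToySource(2, 16000)
full = ToyRecording("utt1", src)                  # both values come from src
mono = ToyRecording("utt1", src; channels = [1])  # explicit channel selection
```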
SpeechDatasets.Annotation — Type
struct Annotation <: ManifestItem
id::AbstractString
recording_id::AbstractString
start::Float64
duration::Float64
channel::Union{Vector, Colon}
data::Dict
end
An "annotation" defines a segment of a recording on a single channel. The data field is an arbitrary dictionary holding the nature of the annotation. start and duration (in seconds) define where the segment is located within the recording recording_id.
Constructors
Annotation(id, recording_id, start, duration, channel, data)
Annotation(id, recording_id[; channel = missing, start = -1, duration = -1, data = missing])
If start and/or duration are negative, the segment is considered to span the whole recording.
AudioSources.load — Method
load(recording::Recording [; start = -1, duration = -1, channels = recording.channels])
load(recording, annotation)
Load the signal from a recording. start and duration are given in seconds.
The function returns a tuple (x, sr), where x is an $N×C$ array ($N$ is the length of the signal and $C$ is the number of channels) and sr is the sampling rate of the signal.
AudioSources.load — Method
load(r::Recording, a::Annotation)
load(t::Tuple{Recording, Annotation})
Load only a segment of the recording referenced in the annotation.
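The relationship between start, duration (in seconds) and the rows of the $N×C$ signal array can be sketched as below. `segment_range` is a hypothetical helper and the rounding convention is an assumption. The package may compute sample boundaries differently.

```julia
# Hypothetical helper: map a (start, duration) pair in seconds to a row range
# of an N×C signal array. Negative values select the whole recording, as
# described for Annotation above.
function segment_range(nsamples::Int, samplerate::Int; start::Real = -1, duration::Real = -1)
    (start < 0 || duration < 0) && return 1:nsamples
    lo = floor(Int, start * samplerate) + 1                       # first sample (1-based)
    hi = min(nsamples, floor(Int, (start + duration) * samplerate))
    return lo:hi
end

sr = 16000
x = zeros(Float32, 3 * sr, 2)   # 3 seconds of silent stereo audio, N×C layout

rows = segment_range(size(x, 1), sr; start = 0.5, duration = 1.0)
segment = x[rows, :]            # keeps all channels, only the selected rows
whole = segment_range(size(x, 1), sr)
```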
SpeechDatasets.load_manifest — Method
load_manifest(Annotation, path)
load_manifest(Recording, path)
Load a Recording or Annotation manifest from path.
Lexicons
Datasets with lexicons or other language files (units, wordcount, topology) should provide a lang/ directory as an artifact. It is loaded when the dataset is instantiated.
Index
SpeechDatasets.Annotation
SpeechDatasets.ManifestItem
SpeechDatasets.Recording
SpeechDatasets.SpeechDataset
AudioSources.load
AudioSources.load
Base.summary
SpeechDatasets.cite
SpeechDatasets.dataset
SpeechDatasets.get_artifact
SpeechDatasets.get_dataset_kwargs
SpeechDatasets.get_download_kwargs
SpeechDatasets.get_kwargs
SpeechDatasets.load_manifest
SpeechDatasets.prepare