
Add the v1.3 #7

Open

mekaneeky wants to merge 11 commits into SunbirdAI:main from mekaneeky:main

Conversation

@mekaneeky

What does this PR do?

Adds the crowdsourced ASR data collected under the supervision of @solomonnsumba.

Description of the data/datasets to be added

Multi-speaker ASR data for all six languages.

Motivation behind this addition

More data = Better Models = Better Tools

Screenshots (optional)

@solomonnsumba

@mekaneeky The asr-speech audio for Luganda appears to be missing from the salt-url-train-v1.3.jsonl dataset. The lug field contains incomplete URLs such as "https://storage.cloud.google.com/speech_collection_bucket/VALIDATED/", which stops at the folder prefix and names no audio file.

Similarly, the salt-relative-train-v1.3.jsonl dataset lacks the asr-speech audio for Luganda: the lug field is empty ("lug": "").

Please make sure to provide the correct URLs for the audio files in both datasets. Thank you.
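A quick way to find every affected record in both files before re-pushing is a small scan like the sketch below. It assumes each JSONL line is a JSON object with "lug" as a top-level key holding an audio URL or relative path; the actual SALT schema may nest this differently, and find_missing_audio and scan_file are hypothetical helpers, not existing SALT tooling.

```python
from __future__ import annotations

import json
from typing import Iterable, Iterator


def find_missing_audio(lines: Iterable[str], field: str = "lug") -> Iterator[tuple[int, object]]:
    """Yield (line_number, value) for records whose audio field looks unusable.

    Assumption: `field` is a top-level key in each JSON object. A value is
    flagged if it is absent, not a string, empty/whitespace, or ends with "/"
    (a folder prefix such as ".../VALIDATED/" rather than a file).
    """
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines in the JSONL file
        record = json.loads(line)
        value = record.get(field)
        if not isinstance(value, str) or not value.strip() or value.endswith("/"):
            yield line_no, value


def scan_file(path: str, field: str = "lug") -> list[tuple[int, object]]:
    """Hypothetical helper: run find_missing_audio over a JSONL file on disk."""
    with open(path, encoding="utf-8") as f:
        return list(find_missing_audio(f, field))
```

For example, scan_file("salt-url-train-v1.3.jsonl") and scan_file("salt-relative-train-v1.3.jsonl") would list the line numbers still needing a real audio reference. Flagging a trailing "/" is a heuristic for the bucket-prefix case reported here, not a full URL validity check.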
