Add the v1.3 by mekaneeky · Pull Request #7 · SunbirdAI/salt-data-archive

mekaneeky · 2023-07-19T16:45:37Z

What does this PR do?

Added the ASR crowdsourced data collected under the supervision of @solomonnsumba

Description of the data/datasets to be added

ASR multispeaker data for all 6 languages

Motivation behind this addition

More data = Better Models = Better Tools

Screenshots (optional)

solomonnsumba · 2023-07-19T21:59:00Z

@mekaneeky asr-speech for Luganda: It appears that we are missing the audio files in the salt-url-train-v1.3.jsonl dataset. The lug field has the following example incomplete URL: "https://storage.cloud.google.com/speech_collection_bucket/VALIDATED/".

Similarly, the salt-relative-train-v1.3.jsonl dataset also lacks the asr-speech audio files for Luganda. The lug field in this dataset has the following empty value: "lug": ""

Please make sure to provide the correct URLs for the audio files in both datasets. Thank you.
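
The check described above could be scripted along these lines. This is only a minimal sketch: it assumes each line of the JSONL file is a JSON object with a top-level `lug` string field (the thread does not show the full record layout), and the function name `find_incomplete_audio_refs` and the `example.wav` file name in the sample are hypothetical. A value is flagged if it is missing, empty, or ends with "/", which matches the two symptoms reported here.

```python
import json
from typing import Any, Iterable, List, Tuple


def find_incomplete_audio_refs(
    lines: Iterable[str], field: str = "lug"
) -> List[Tuple[int, Any]]:
    """Return (line_number, value) pairs whose `field` looks incomplete.

    Assumption: `field` is a top-level key holding a URL or relative path.
    A value counts as incomplete if it is missing, not a string, empty,
    or ends with "/" (a directory prefix with no file name).
    """
    problems = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        record = json.loads(line)
        value = record.get(field)
        if not isinstance(value, str) or not value or value.endswith("/"):
            problems.append((line_number, value))
    return problems


# Sample records: the first two mirror the values quoted in the review,
# the third uses a hypothetical file name to show a value that passes.
sample = [
    '{"lug": "https://storage.cloud.google.com/speech_collection_bucket/VALIDATED/"}',
    '{"lug": ""}',
    '{"lug": "https://storage.cloud.google.com/speech_collection_bucket/VALIDATED/example.wav"}',
]

for line_number, value in find_incomplete_audio_refs(sample):
    print(f"line {line_number}: incomplete value {value!r}")
```

To run it against a real file, pass an open file handle instead of the sample list, for example `find_incomplete_audio_refs(open("salt-url-train-v1.3.jsonl"))`. If the `lug` field turns out to be nested inside another object, the lookup would need to be adjusted accordingly.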

Add the v1.3

cb1e06b

mekaneeky and others added 10 commits July 20, 2023 23:16

Small bugfix w/relative paths + urls

17a34ec

Small bugfix w/relative paths + urls

6f48c14

removed separate files

9396b5e

removed separate files

b7df427

small fixes

cf3739a

added single folder multi only

9589b65

added id row + clear speech tag

5080e61

Added Igbo

69ecc74

Fix Igbo

67933cd

Added Igbo + Swahili

9742af5

mekaneeky wants to merge 11 commits into SunbirdAI:main from mekaneeky:main
