This is all good stuff, but a bit too broad. I want datasets only in English, for example.

For English-only data, you might be interested in the English portion of Mozilla Common Voice. The full dataset contains 7,335 validated hours of speech across 60 languages, and clips can carry demographic metadata such as age and sex, which makes it easier to cover a wide variety of English speakers^[1].
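If you want to check how varied the English speakers are, the demographic metadata can be tallied directly from the clip metadata file. The sketch below assumes a tab-separated file shaped like Common Voice's `validated.tsv`, with `locale`, `age`, and `gender` columns; those column names, the sample rows, and the helper `english_demographics` are illustrative assumptions, so check the header of the release you actually download. Common Voice is distributed per language, so the `locale` filter mainly acts as a safeguard if you merge several languages into one file.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt shaped like a Common Voice "validated.tsv" file.
# Column names are an assumption; real releases may add or rename columns.
SAMPLE_TSV = (
    "client_id\tpath\tsentence\tage\tgender\tlocale\n"
    "a1\tclip_001.mp3\tHello there.\ttwenties\tfemale\ten\n"
    "b2\tclip_002.mp3\tGood morning.\tfifties\tmale\ten\n"
    "c3\tclip_003.mp3\tBonjour.\tthirties\tfemale\tfr\n"
    "d4\tclip_004.mp3\tSee you soon.\t\t\ten\n"
)


def english_demographics(tsv_text: str) -> dict:
    """Tally age and gender labels for English ("en") clips only.

    Empty age or gender fields are counted as "unknown", because
    contributors are not required to provide demographic metadata.
    """
    # QUOTE_NONE keeps quote characters inside transcripts from
    # being interpreted as CSV quoting.
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    ages: Counter = Counter()
    genders: Counter = Counter()
    for row in reader:
        if row.get("locale") != "en":
            continue  # skip non-English clips
        ages[row.get("age") or "unknown"] += 1
        genders[row.get("gender") or "unknown"] += 1
    return {"age": ages, "gender": genders}


if __name__ == "__main__":
    stats = english_demographics(SAMPLE_TSV)
    print(stats["age"])
    print(stats["gender"])
```

For a real download, you would read the file from disk and pass its text to the same function; a heavily skewed tally is a sign you may want to rebalance speakers before training.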

Another excellent option is the People’s Speech dataset, described as one of the world's largest English speech recognition corpora. It is available under CC-BY-SA and CC-BY 4.0 licenses and comprises over 30,000 hours of transcribed English speech from diverse speakers^[2].

Curated by jimmy

