You can meet me at the Common Voice stall before noon at the @fossasia 2025 venue.
The whole team
Come and experience something special with us!
We invite everyone to join the event.
The event includes talks with speakers as well as community booths.
One of them is the booth
Has anyone here received this email from Mozilla regarding Common Voice?
>>quote
Mozilla has always fought for an open, accessible internet that puts people in control, no matter the obstacles. Today, we need to share a significant challenge: Mozilla Common Voice is at risk of losing $1.05 million in U.S. government funding due to Donald Trump and Elon Musk's interference with science and technology grants. [1, 2]
Mozilla Common Voice is the largest open, crowd-sourced speech recognition dataset, designed to make voice-enabled technology available to the world’s 7000 languages. This funding was meant to help expand our work over the next three years, letting us build features in response to community demand — like code-switching datasets and Indigenous language licensing. Now, we don’t know if we’ll receive any of that support.
But we’re not backing down. We’re adapting our roadmap, staying nimble, and finding new ways to sustain this work. And we can’t do it alone. That’s why we’re turning to you for support.
Last year, 100,558 people contributed to Mozilla. That's an incredible show of support. If you've been waiting for the right moment to donate, now is the time. Your contribution will help sustain work like Common Voice and advance an open and accessible internet for all.
Make a $10 USD contribution to Mozilla today to help build an internet that puts people first.
<< end quote
#Mozilla #commonvoice #firefox #Trump #funding #Musk
https://commonvoice.mozilla.org/en?form=US-funding-freeze-3&amount=10&currency=USD
I'm tired. Really tired. Completely worn out. And I want to donate my worn-out voice to #CommonVoice
It's been another big year as I work towards completing my #dissertation on voice dataset documentation and how it influences how well #speech technologies work for all voices at the #ANU School of Cybernetics - with big thanks to my supervisors, Elizabeth Williams, Alexandra Zafiroglu, Jofish Kaye and Paul Wong 黃仲熙.
I've wrapped up a partnership with Mozilla's #CommonVoice team, which let me explore the #dataset in a lot more detail - big thanks to EM Lewis-Jong, @jessie and Dmitrij Feller in particular.
It was an incredible honor to keynote #FF24 at the National Film and Sound Archive of Australia alongside Peter-Lucas Jones of Te Hiku Media, expertly facilitated by Keir Winesmith - thanks @ingridbmason and team for the opportunity - and stay tuned for a little project we are working on - we know you're all eager for the video of this keynote, but we're adding a little more magic.
I helped out with @everythingopen Media and Comms this year, and am looking forward to speaking in January in Adelaide.
A huge thanks to my fellow #PhD buddies - Lorenn Ruster, @nedcpr, Glen Berman, Tom Chan, Danny Bettay, Charlotte Bradley, @Amirasadi, Memunat Ajoke Ibrahim and the later cohorts for all your support, shut up and write sessions and intellectual growth.
When was the last time that you actually contributed to an open source project?
I'm certain that you've heard of Common Voice at Mozilla.
In case you haven't: the languages that need more data are all of them. So even contributing 15 samples a day does a lot on the whole.
I had slacked off on my Common Voice contributions, but I'm now picking it up again
The Mozilla #CommonVoice #dataset v20 was released yesterday - the largest open #speech dataset in the world. My #dataviz, linked below, shows a continuation of patterns seen for some years now:
What are your interpretations of the dataset?
https://observablehq.com/@kathyreid/mozilla-common-voice-v20-dataset-metadata-coverage
On this final day of @parinux's #PSLXXL event, we are presenting #Pontoon #translations #Nightly #CommonVoice #PDF in Firefox
If you're a #language nerd like I am, then you won't have missed the @mozilla #CommonVoice v19 #speech #dataset release - which now features 131 languages! Here's my #dataviz, done in @observablehq of the v19 #metadata coverage.
I've updated the visualisation this time around with human-readable language names instead of their ISO-639 or BCP-47 language codes to make it easier to read.
There are some interesting observations:
What do you make of the data visualisation? Are there any other insights you can see?
Big thanks to the CV team for all their efforts - EM, Jessica Rose, Dmitrij Feller and Justin Grant.
https://observablehq.com/@kathyreid/mozilla-common-voice-v19-dataset-metadata-coverage
The latest ambition of #CommonVoice from @mozilla@mozilla.social: voice tools that understand natural conversations and everyday language https://foundation.mozilla.org/en/blog/common-voice-spontaneous-speech/
Each quarter, when the new @mozilla #CommonVoice #dataset is released, I do a #dataviz using @observablehq of its #metadata coverage, across all 100+ languages, based on the JSON summary that is part of the release.
Some of my observations from the v18 release are:
Catalan also appears to have the highest percentage of audio recordings by older speakers - e.g. speakers in their forties, fifties and older. Again, this highlights the diversity of speakers in the Catalan dataset.
Big thanks to all data contributors in this release for your donated utterances, and to Dmitrij Feller, @jessie, Gina Moape, EM Lewis-Jong and the team for all your efforts.
What are your thoughts? What conclusions do you draw?
https://observablehq.com/@kathyreid/mozilla-common-voice-v18-dataset-metadata-coverage
Delighted to be able to publicise a paper that was presented at the @ALTAnlp 2023 Workshop at the end of last year, co-authored with my #PhD supervisor, Associate Professor @eltwilliams, and written as part of my research at #ANU School of Cybernetics.
Titled "Right the docs: Characterising voice dataset documentation practices used in machine learning", it combines exploratory interviews and documentation analysis to characterise how large voice datasets - e.g. #LibriSpeech, @mozilla's #CommonVoice, and several others - document their #metadata.
Unsurprisingly, it finds that the #dataset #documentation practices seen currently do not meet the needs of the #ML practitioners who use these datasets.
We show, once again, in the words of Nithya Sambasivan - "everyone wants to do the model work, but nobody wants to do the data work" ...
https://aclanthology.org/2023.alta-1.6/
Citation:
Reid, K., Williams, E.T., 2023. Right the docs: Characterising voice dataset documentation practices used in machine learning, in: Muresan, S., Chen, V., Casey, K., David, V., Nina, D., Koji, I., Erik, E., Stefan, U. (Eds.), Proceedings of the 21st Annual Workshop of the Australasian Language Technology Association. Association for Computational Linguistics, Melbourne, Australia, pp. 51–66.
For the past couple of years, as each new @mozilla #CommonVoice dataset of #voice #data is released, I've been using @observablehq to visualise the #metadata coverage across the 100+ languages in the dataset.
Version 17 was released yesterday (big ups to the team - EM Lewis-Jong, @jessie, Gina Moape, Dmitrij Feller) and there are some super interesting insights from the visualisation:
See the visualisation here and let me know your thoughts below!
Last week, as part of my #PhD program at the #ANU School of #cybernetics, I gave my final presentation, which is a summary of my methods and #research findings. I covered my interview work, the #dataset documentation analysis work I've been doing and my analysis work around #accents in @mozilla's #CommonVoice platform.
There were some insightful and thought-provoking questions from my panel and audience members, and of course - so many ideas for future research inquiry!
A huge thanks to my panel, chaired so well by Professor Alexandra Zafiroglu, to Dr Elizabeth Williams, my meticulous, methodical and always-encouraging Primary Supervisor, and to my co-supervisors Dr Jofish Kaye and Dr Paul Wong 黃仲熙 for their deep expertise in #HCI and #data respectively.
Similarly, a huge thank you to my #PhD cohort - Charlotte Bradley, Tom Chan, Danny Bettay and Sam Backwell - as well as the other cohorts in the School - for your encouragement and intellectual journeying.
I'm delighted to be presenting this paper, joint work with my doctoral supervisor, @eltwilliams, at the upcoming #EAAMO23 @ACM conference (presenting remotely to Boston from Australia - how good is hybrid?!)
#CommonVoice and #accent choice: data contributors describe their spoken accents in diverse ways
The paper reports on an analysis of accent data in #CommonVoice, and the ways in which data contributors self-describe their accents - a feature which has been available in the platform since 2022.
https://dl.acm.org/doi/10.1145/3617694.3623258
If you'd like to see the @observablehq code behind the #dataviz in the paper, you can access it here:
https://observablehq.com/@kathyreid/phd-mozilla-cv-accent-relationships-v13-eaamo
Good morning everyone! Here's my latest #Connections #Introduction #Introductions #TwitterMigration post, where I curate interesting accounts for you to follow from across the #Fediverse
@maryrobinette is a #writer #author, and I am listening to her incredible #LadyAstronaut series at the moment. If you love #SciFi (esp hard scifi) you should read it, too!
@sayashk is a #ComputerScience #PhD candidate at #Princeton, who is researching failures in #ML (he's also co-running a workshop on open #FoundationModels in about 15 hours, see my previous posts for more info)
@michcampbell is Dr Micha Campbell and she is a #PalaeoClimate #PostDoc living on #Dharawal country
@mthv is a #Research #Engineer who works in #GIS at #CNRS
@astrolori is Lori and she is into #OpenSource, #fashion, #space and #tech #WomenInSTEM
@pandas_dev is the official account for #pandas, the #Python #DataAnalysis tool
@jessie is a lover of #languages and helps run #CommonVoice, @mozilla's open #voice #data set, which now supports over 100 languages. She also teaches #WebDev and loves #hiking. She's awesome; you should follow her.
That's all for now, please do share your own lists so we can create deeper connections, and a tightly-connected community here
I'm reminded here of @maryrobinette's short story - "Red Rockets" - "She built something better than fireworks. She built community."
It's been a while since I did an #Introductions #Connections #Introduction #TwitterMigration post, where I curate a list of interesting people in the #Fediverse you might want to follow - helping us create valuable communities and connections.
@nrennie is a #lecturer #researcher in #health #DataScience at #LancasterUniversity, and she does amazing work in #DataViz, primarily with #RStats
@mkohler is a #SoftwareEngineer and #EngineeringManager, and a long-time contributor to all things @mozilla, and in particular, the #CommonVoice project
@isomeme is a #SoftwareEngineer too and she practices Hermetic #Magick
@skc is Scott Kingsley Clark, who is also a #SoftwareEngineer and Lead Dev of the #Pods framework for #WordPress
@aehdeschaine is interested in #libraries #archives #architecture and #PaleoGeography
@hclarke is a Senior #Research Fellow at #UniMelb. He researches #WildFire and #ClimateChange
@blogdiva is Liza, and she, well she's generally awesome and shares my views on Space Karen / ApartheidBoi
I'd like to take this opportunity to remind you that if you want voice assistants such as Siri, Alexa, OK Google, etc. to speak our language, one way to make that happen is by taking part in @mozilla's @commonvoicecat
You can read the sentences the application offers and record your voice, or you can validate other users' voice clips to check that they match the text on screen.
A lot of help and effort is needed, so let's keep it up!!
#CommonVoice #CommonVoiceCAT