veganism.social is one of the many independent Mastodon servers you can use to participate in the fediverse.
Veganism Social is a welcoming space on the internet for vegans to connect and engage with the broader decentralized social media community.


#speechrecognition


Now that my #wake_word_detection #research has borne fruit, I plan to keep working in the voice domain. I would love to train a #TTS model with a #British accent so I could use it for practice.

I was wondering whether I could run inference on the #A311D #NPU. However, as I skim papers on different models, reasonable inference performance on the A311D seems unlikely. Even training these models on my entry-level #IntelArc #GPU would be painful.

Maybe I could just fine-tune an existing model. I am also thinking about using #GeneticProgramming for some components of these TTS models to see whether it improves inference performance.

#FastSpeech2 and #SpeedySpeech look promising. I wonder how natural their accents will sound, but they would be good starting points.

BTW, if anyone needs open-source models, I would love to work as a freelancer and have an #opensource job. Even if someone could just provide access to compute resources, that would help.

#forhire #opensourcejob #job #hiring

Speech recognition systems struggle with accents and dialects, risking problems in critical fields like healthcare and emergency services. Imagine calling 911 and the AI used to screen out non-emergency calls can’t understand you.

A Spanish language professor explains: theconversation.com/sorry-i-di #AI #speechrecognition

The Conversation: ‘Sorry, I didn’t get that’: AI misunderstands some people’s words more than others

#UnplugTrump - Tip 5:
Say goodbye to Alexa and other voice assistants that listen to and analyze your conversations. Instead, use a privacy-friendly alternative such as OpenVoiceOS, an open-source voice assistant that is developed by an active community and runs on a Raspberry Pi. That way, you keep control of your data.

Hey folks :FediverseSymbol:

We've actually done an unscripted, off-the-cuff Trans Voice Friday recording today :TransHeart:

We've not listened back to it, because voice dysphoria, but we've added full alt text.

In case you're wondering how we did that without listening back to it, we once again used an amazing tool called Subtitle Edit, which has audio-to-text functionality via the Whisper speech recognition engine.

We used the large-v3 model, which is about 3.1 GB, but gives incredibly accurate transcription.

In case anyone can't access the alt text, we've added the full transcript below too.

#TransVoiceFriday #TransVoice #voice #VoiceFeminisation #VoiceFeminization #VoiceTraining #trans #transgender #TransFem #VoiceDysphoria #SubtitleEdit #PurfviewWhisper #AudioToText #SpeechToText #SpeechRecognition

Hey folks, I know that we haven't done a voice note in forever, and that's been for a multitude of reasons, some of which are related to mental health, some of which are related to work, stress, anxiety, depression, etc, things like that, which comes under mental health anyway, yeah, partly due to poor time management, yay for being AuDHD! But not gonna lie, some of it does come down to underlying voice dysphoria, because this is the best we've managed to get since December 2021. And just for anyone who hasn't heard roughly what we sounded like beforehand, we haven't exactly moved our voice up a lot. I mean, the base level would just be down here. So I can move my voice back up here easily now, and this is the comfortable, this is the default voice. But, um... It's not where I want it to be, it's not in the female range, and I can't easily push the pitch up higher without it sounding wrong. But yeah, there's been a lot of stuff going on recently, um, a lot of bad stuff for everyone, don't want to talk about all of that. But, um, let's just focus on supporting each other, helping each other, um, being kind to ourselves and others right now, and being compassionate and empathetic. That's all I've really got to say. I'm trying to do the same thing with ourselves, but yeah, it's hard sometimes. Anyway, ta-ta for now.


@alcinnz

> So please don't talk to me like "AI" is new...

Yeah, I wonder how many people know that #Windows Vista came with a #SpeechToText / #SpeechRecognition program out of the box!

It even has its own Wikipedia page!

en.wikipedia.org/wiki/Windows_

According to the article, it was released November 2006, more than 17 years ago!

I've tried it in Windows 10 and it worked beautifully (e.g. dictating text and executing commands such as opening a program). All locally!

en.wikipedia.org: Windows Speech Recognition - Wikipedia

Quick comparison between AWS's and Google's speech recognition.

Google has a superior UI. Click to upload a file and then a bunch of options.
AWS makes you go to a different site to upload the file to S3, and offers very few options.

But AWS is *amazingly* accurate, whereas Google is quite dumb.

Take the phrase "Fourteen pounds".
AWS: "£14"
Google: "14 LB"

WTAF?

Both were told to process as en-GB, and there are a few quirks in both. But AWS is excellent.

Today in my web browsing history: the tech sector waxing lyrical about #OpenAI's upgrades to #ChatGPT, which include #SpeechRecognition and #SpeechSynthesis capabilities, while Wikipedia, which was scraped to train many LLMs, begs for donations.

All of the content we've placed online has been mined, processed, refined, and is being sold back to us.

For many of us, that's a new experience.

But I suspect that for those from the global South, it's a repeat of centuries of colonisation.

Reposting my #introduction after the SDF database crash:

Hi everybody! I'm Will and I enjoy all things #languages and #code. My day job is in #nlp (natural language processing #nlproc) and #speechrecognition for language education. In grad school I worked on #Bayesian #pragmatics with #deeplearning.

I speak English natively, #español #Spanish / #中文 #Chinese / #العربية #Arabic passably, and lots of others poorly. Talk to me in your language!