#LLM


I taught 16 to 19 year olds for 10 years, a long time ago. I constantly got assignment submissions that were copied from the web/Wikipedia. I would ask students if they wrote it. They would say yes. Then I asked if they could explain it. They said no. I didn't get annoyed; I explained that they had to submit their own work and asked them to please do it again.

Was the issue the web/Wikipedia/tech? Fast forward: now the problem is AI (or is it human nature?)

2ndbreakfast.audreywatters.com

#AI #LLM #edtech

📉 Did ChatGPT make scientific texts harder to read? A new study analysed over 2 million abstracts from #arXiv (2010–2024) to track changes in their readability. Using 4 classic #readability metrics, the author found a clear shift: in 2023–2024, abstracts became significantly more complex than ever before.

DOI: doi.org/10.1016/j.joi.2025.101

The author is cautious: while the rise in complexity closely aligns with the release of #ChatGPT, the study doesn't claim a causal link.
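(An aside for readers unfamiliar with "classic readability metrics": below is a minimal, self-contained sketch of one of them, the Flesch Reading Ease score, with a deliberately crude vowel-group syllable counter. The study's exact metrics and implementation aren't given here; this is only an illustration of the kind of formula involved.)

```python
import re

def count_syllables(word: str) -> int:
    """Crude heuristic: count groups of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_reading_ease(text: str) -> float:
    """Classic Flesch formula; higher score = easier to read.
    FRE = 206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / max(1, len(sentences))
            - 84.6 * syllables / max(1, len(words)))

abstract = ("We propose a novel methodology that leverages "
            "multifaceted representations to facilitate comprehension.")
print(round(flesch_reading_ease(abstract), 1))  # lower score = harder text
```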

#AI / #ML people here:

I'm working on an article about whether reasoning models’ outputs, their “chains of thought”, faithfully reflect their internal processes. I want to know how researchers evaluate "faithfulness." How can they be sure chains of thought aren't hallucinations?

Any resources you could point me towards would be helpful, including articles, people to talk to, etc.

(& this is yet another request where boosts would go a long way! 🙏 thank you thank you ) #genAI #LLM #LLMs
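(Not an answer to the sourcing question above, but for context: one family of probes described in the faithfulness literature truncates or perturbs the chain of thought and checks whether the final answer changes; if the answer never changes, the stated reasoning was plausibly post-hoc. A hypothetical sketch, where ask_model is a stand-in for a real LLM API call:)

```python
# Hypothetical sketch of a truncation-based faithfulness probe:
# if the final answer survives deleting most of the chain of thought,
# the stated reasoning probably wasn't load-bearing.

def ask_model(prompt: str) -> str:
    # Stand-in for a real LLM API call; returns a canned answer here
    # so the sketch runs. Replace with your provider's client.
    return "Paris"

def answer_with_partial_cot(question: str, cot: str, keep_fraction: float) -> str:
    steps = cot.split("\n")
    kept = steps[: int(len(steps) * keep_fraction)]
    prompt = f"{question}\nReasoning so far:\n" + "\n".join(kept) + "\nFinal answer:"
    return ask_model(prompt)

question = "What is the capital of France?"
cot = ("France is a country in Europe.\n"
       "Its capital is its largest city.\n"
       "That city is Paris.")

baseline = answer_with_partial_cot(question, cot, 1.0)
for frac in (0.66, 0.33, 0.0):
    same = answer_with_partial_cot(question, cot, frac) == baseline
    print(f"kept {frac:.0%} of CoT -> answer unchanged: {same}")
```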

The @Cockpit team has tried sourcery.ai and GitHub #Copilot automatic #PR #reviews for four weeks. I wrote down our conclusions.

TL;DR: a lot of noise, a lot of bad advice, and not enough signal, so we switched it off again.

I hope other teams/developers have more success with it -- surely it can't be that bad or useless for everyone, otherwise it wouldn't (or shouldn't) still be a thing?

piware.de/post/2025-08-09-sour

Martin Pitt · Testing sourcery.ai and GitHub Copilot for cockpit PR reviews

“The AI boom wastes so much electricity that we are very immediately risking US cities having to have rolling blackouts just to keep up with the energy demands, as early as NEXT YEAR” - saw this in a screenshot (attached), unattributed. I don’t doubt the general prediction, but I wish I knew the source. How accurate are all the numbers? The phrasing is not clear. Each query uses 17,000x a single home’s electricity? Hmm.
#ai #energy #climatechange #water #consumption #chatgpt #llm
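(A back-of-envelope check of that "17,000x" figure is below, using commonly cited and much-contested estimates: roughly 0.3 Wh per chatbot query and about 29 kWh per day for an average US household. Both inputs are assumptions, not measurements.)

```python
# Back-of-envelope check; both inputs are commonly cited estimates,
# not measurements, and per-query figures vary wildly by source.
WH_PER_QUERY = 0.3        # assumed ~0.3 Wh/query (older estimates say ~3 Wh)
HOME_KWH_PER_DAY = 29     # assumed average US household, ~10,500 kWh/year

home_wh_per_day = HOME_KWH_PER_DAY * 1000
ratio = WH_PER_QUERY / home_wh_per_day
print(f"One query ~ {ratio:.1e} of a home's daily use")            # ~1e-5
print(f"Queries to match one home-day: {home_wh_per_day / WH_PER_QUERY:,.0f}")
# Even with the pessimistic ~3 Wh/query estimate, a single query sits
# orders of magnitude BELOW one household's daily usage, so "17000x"
# can only describe aggregate fleet demand, not a single query.
```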

Replied in thread

@ewolff I'm no AI fan, but this is indeed surprisingly bad:

"Bei allgemeineren Wissensfragen im sogenannten SimpleQA-Benchmark steigen die Halluzinationsraten dramatisch auf 51 Prozent für o3 und sogar 79 Prozent für o4-mini. Diese Zahlen sind besonders beunruhigend, da die neueren Modelle eigentlich mit verbesserter Logik und Denkfähigkeit werben."

🫠

In the first #LLMs4Subjects challenge at the SemEval-2025 workshop, our #Annif team did very well!

The challenge was to generate good quality subject indexing for bibliographic records in German & English using LLMs. We used LLMs for data preprocessing (translation & synthetic data) and Annif as the main suggestion engine. We ranked 1st and 2nd in quantitative and 4th in qualitative evaluations out of 14 teams!

More info & preprints: groups.google.com/g/annif-user

groups.google.com · Annif awarded at the LLMs4Subjects challenge
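(For the curious, the suggestion step in a pipeline like this can be driven over Annif's REST API. A minimal sketch against a locally running instance follows; the project id "yso-en" and the localhost URL are illustrative assumptions, not the team's actual setup.)

```python
import requests

# Minimal sketch: ask a locally running Annif instance for subject
# suggestions. Project id "yso-en" and the URL are illustrative
# assumptions; adjust them to your own Annif installation.
ANNIF_URL = "http://localhost:5000/v1"
PROJECT = "yso-en"

record_text = ("A study of machine learning methods for subject "
               "indexing of bibliographic records.")
resp = requests.post(
    f"{ANNIF_URL}/projects/{PROJECT}/suggest",
    data={"text": record_text, "limit": 5},
)
resp.raise_for_status()
for hit in resp.json()["results"]:
    print(f'{hit["score"]:.3f}  {hit["label"]}  ({hit["uri"]})')
```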
Replied in thread

@mapcar

What is dodgy about the BBC 'study' methodology?

1. It's not a double blind study.
You get journalists assessing the accuracy of the very tools that are ALREADY taking their jobs (see Australian Murdoch media).
Ideally, they should be assessing AI and Human stories which are not identified.
It's exactly like the police investigating police corruption.

2. I could not find anywhere whether they used the commercial, paid versions of the #AI engines or the free public ones, the sideshow attractions. They referred to them simply as assistants, did not state what versions were being used, and in two cases did not even state which LLM model they were using. That does not speak well of their journalistic rigour, much less their preparation. I'm not even sure the journalists were aware that there are significant performance differences between commercial and free versions. It's like asking a sideshow clown for an economic projection (honka honka).

3. The "lifting of the blocks" on the websites for the duration of the test. Is another naivete or malicious representation. #LLM are LEARNING (the hint is in the name). Just lifting the gate for the duration of the test is absolutely not going to give the LLM access to the website. In fact, two of the engines in the test I am familiar with (o4 and Sonnet) did not even do live searches of the internet in February. And it takes about 500,000 kiloWatts to compute the multidimensional vector trees for a model.

4. Prompt engineering. Once again, naivete. Just as with googling, the quality of the response is related to the query. Virtually all of the questions are ones a first grader might ask, e.g. "is vaping bad for you?". And I see people with letters before and after their name quoting this study as "AI is bad".
Presumably they operate at a higher level than that when querying their own sources. You could instead ask: "What is the latest body of research on the health effects of vaping? Provide pros and cons, show the controversy. Tabulate results by credibility."
The models tune to the prompt: ask a simple prompt, get a simplified response (see the first sketch after this list).

5. The quality of scoring. There is no consistency in the scoring: each reviewer chooses how they FEEL about the quantitative values, so one may rate a response at 7 and another at 2.
Since we're assessing the ACCURACY of the LLMs, maybe we should assess the accuracy of the assessment too (see the second sketch after this list)? No?

6. Many of the flagged errors are laughable. In the vaping one, the reviewer's comment is "NHS recommends not smoking" (presumably pointing it out as an error), where the response (to a simpleton question) is "Vaping may be bad for you".
Literally all of the "inaccuracies" are trivial like that. For a kindergarten question.

7. Journalists write STORIES (you know, what LLMs do), largely inaccurate stories (what LLMs are accused of producing). They are the least qualified to assess their competition.
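(First sketch, for point 4: a hypothetical side-by-side of the two prompt styles. The model name "gpt-4o" and the openai client usage are assumptions for illustration; any chat-completion API would do, and an API key is required.)

```python
from openai import OpenAI

# Hypothetical illustration of point 4: same topic, two prompt styles.
# Model name is an assumption; substitute whatever you have access to.
# Requires an OPENAI_API_KEY in the environment.
client = OpenAI()

naive = "is vaping bad for you?"
engineered = (
    "What is the latest body of research on the health effects of vaping? "
    "Provide pros and cons, show where there is controversy, and tabulate "
    "the results by credibility of the underlying studies."
)

for prompt in (naive, engineered):
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    print(prompt[:40], "->", reply.choices[0].message.content[:120], "\n")
```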
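(Second sketch, for point 5: one standard way to "assess the accuracy of the assessment" is an inter-rater agreement statistic such as Cohen's kappa. The ratings below are made up for illustration.)

```python
from sklearn.metrics import cohen_kappa_score

# Made-up 1-7 ratings from two reviewers on ten responses, echoing
# point 5: if raters disagree this much, the scores measure the
# raters, not the models.
reviewer_a = [7, 6, 5, 7, 2, 6, 5, 7, 6, 4]
reviewer_b = [2, 5, 2, 3, 7, 2, 4, 2, 3, 6]

# Quadratic weighting treats a 7-vs-6 disagreement as milder than 7-vs-2.
kappa = cohen_kappa_score(reviewer_a, reviewer_b, weights="quadratic")
print(f"Weighted Cohen's kappa: {kappa:.2f}")  # near or below 0 = no real agreement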

In closing:
This widely quoted study is by folks whose jobs are threatened, written to appeal to folks who are (largely) unwilling, and hostile to the idea of learning a nascent tech.