
#languagemodels

1 post · 1 participant · 0 posts today

🚀Somebody decided that tuning the knobs on large language models wasn't enough, so they invented "Inference-Aware Fine-Tuning for Best-of-N Sampling"—because that's what the world needed, more jargon. 🙄 Meanwhile, our brains are staggering under the weight of acronyms, wondering if the Simons Foundation can fund a cure for their strain.💡
arxiv.org/abs/2412.15287 #InferenceAwareFineTuning #BestOfNSampling #LanguageModels #AIJargon #SimonsFoundation #HackerNews #ngated

arXiv.org: Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models
Recent studies have indicated that effectively utilizing inference-time compute is crucial for attaining better performance from large language models (LLMs). In this work, we propose a novel inference-aware fine-tuning paradigm, in which the model is fine-tuned in a manner that directly optimizes the performance of the inference-time strategy. We study this paradigm using the simple yet effective Best-of-N (BoN) inference strategy, in which a verifier selects the best out of a set of LLM-generated responses. We devise the first imitation learning and reinforcement learning (RL) methods for BoN-aware fine-tuning, overcoming the challenging, non-differentiable argmax operator within BoN. We empirically demonstrate that our BoN-aware models implicitly learn a meta-strategy that interleaves best responses with more diverse responses that might be better suited to a test-time input -- a process reminiscent of the exploration-exploitation trade-off in RL. Our experiments demonstrate the effectiveness of BoN-aware fine-tuning in terms of improved performance and inference-time compute. In particular, we show that our methods improve the Bo32 performance of Gemma 2B on Hendrycks MATH from 26.8% to 30.8%, and pass@32 from 60.0% to 67.0%, as well as the pass@16 on HumanEval from 61.6% to 67.1%.
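For context, the Best-of-N strategy itself is simple to state: sample N candidate responses, score each with a verifier, and keep the best one. Below is a minimal Python sketch of that selection step; `generate` and `score` are hypothetical stand-ins for the LLM sampler and the verifier, and this is not the paper's code, which fine-tunes the model so that this selection works better.

# Minimal sketch of Best-of-N (BoN) sampling, not the paper's code.
# `generate` and `score` are hypothetical stand-ins for an LLM sampler
# and a verifier; the selection below is the non-differentiable argmax
# the paper's fine-tuning methods have to work around.
def best_of_n(prompt, generate, score, n=32):
    candidates = [generate(prompt) for _ in range(n)]   # sample N responses
    scores = [score(prompt, c) for c in candidates]     # verifier scores each one
    best = max(range(n), key=lambda i: scores[i])       # argmax over scores
    return candidates[best]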

🚀 Oh, look! Yet another 'groundbreaking' platform trying to democratize AI by letting anyone and everyone play with large language models... as long as they're willing to pretend Python isn't a thing. 🤦‍♂️ Blessed by the almighty #Mozilla, because nothing screams innovation like clunky open-source projects with dreams of world domination. 🌍✨
transformerlab.ai/ #AIinnovation #OpenSource #LanguageModels #TechTrends #HackerNews #ngated

transformerlab.ai: Hello from Transformer Lab | Transformer Lab
Documentation for LLM Toolkit, Transformer Lab

Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB

arxiv.org/abs/2504.01157

arXiv.org: Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB
Knowledge-intensive analytical applications retrieve context from both structured tabular data and unstructured, free-text documents for effective decision-making. Large language models (LLMs) have made it significantly easier to prototype such retrieval and reasoning data pipelines. However, implementing these pipelines efficiently still demands significant effort and has several challenges. This often involves orchestrating heterogeneous data systems, managing data movement, and handling low-level implementation details, e.g., LLM context management. To address these challenges, we introduce FlockMTL: an extension for DBMSs that deeply integrates LLM capabilities and retrieval-augmented generation (RAG). FlockMTL includes model-driven scalar and aggregate functions, enabling chained predictions through tuple-level mappings and reductions. Drawing inspiration from the relational model, FlockMTL incorporates: (i) cost-based optimizations, which seamlessly apply techniques such as batching and caching; and (ii) resource independence, enabled through novel SQL DDL abstractions: PROMPT and MODEL, introduced as first-class schema objects alongside TABLE. FlockMTL streamlines the development of knowledge-intensive analytical applications, and its optimizations ease the implementation burden.
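To make the "model-driven scalar function" idea concrete, here is a rough Python sketch that approximates it with DuckDB's user-defined-function API. This is not FlockMTL's actual syntax (the extension adds PROMPT and MODEL as first-class SQL schema objects and handles batching and caching inside the engine), and `call_llm` is a hypothetical placeholder for a real model client.

# Conceptual sketch only: approximating a model-driven scalar function
# with a plain DuckDB Python UDF that forwards each tuple's text to an LLM.
import duckdb
from duckdb.typing import VARCHAR

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder; swap in a real LLM client here.
    return "summary: " + prompt[:40]

def summarize(text: str) -> str:
    return call_llm("Summarize in one sentence: " + text)

con = duckdb.connect()
con.create_function("summarize", summarize, [VARCHAR], VARCHAR)
con.execute("CREATE TABLE docs(id INTEGER, body VARCHAR)")
con.execute("INSERT INTO docs VALUES (1, 'DuckDB is an in-process analytical database.')")
print(con.execute("SELECT id, summarize(body) AS gist FROM docs").fetchall())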

In a groundbreaking feat of #AI wizardry, #UCSD triumphantly declares that their language models have finally learned to convincingly imitate humans. 🎉 Apparently, these digital chatterboxes are now capable of hoodwinking us mere mortals—because who needs real human interaction anyway? 🤖💬
arxiv.org/abs/2503.23674 #Innovation #LanguageModels #HumanImitation #TechTrends #HackerNews #ngated

arXiv.org: Large Language Models Pass the Turing Test
We evaluated 4 systems (ELIZA, GPT-4o, LLaMa-3.1-405B, and GPT-4.5) in two randomised, controlled, and pre-registered Turing tests on independent populations. Participants had 5 minute conversations simultaneously with another human participant and one of these systems before judging which conversational partner they thought was human. When prompted to adopt a humanlike persona, GPT-4.5 was judged to be the human 73% of the time: significantly more often than interrogators selected the real human participant. LLaMa-3.1, with the same prompt, was judged to be the human 56% of the time -- not significantly more or less often than the humans they were being compared to -- while baseline models (ELIZA and GPT-4o) achieved win rates significantly below chance (23% and 21% respectively). The results constitute the first empirical evidence that any artificial system passes a standard three-party Turing test. The results have implications for debates about what kind of intelligence is exhibited by Large Language Models (LLMs), and the social and economic impacts these systems are likely to have.

Dear #AI #Fediverse, there's been some buzz recently about #LanguageModels that are not gigantic black boxes, and about #MachineLearning more broadly, developed as #FLOSS.

There's this Google internal document, for example, that points out that the FLOSS community is close to eating Google's and OpenAI's cake:
https://www.semianalysis.com/p/google-we-have-no-moat-and-neither

So here is my question to you:

What are the best examples of *useful*, *small*, *on-device* models already out there?

:boost_requested: