
#languagemodels

1 post · 1 participant · 0 posts today

🚀Somebody decided that tuning the knobs on large language models wasn't enough, so they invented "Inference-Aware Fine-Tuning for Best-of-N Sampling"—because that's what the world needed, more jargon. 🙄 Meanwhile, our brains are staggering under the weight of acronyms, wondering if the Simons Foundation can fund a cure for their strain.💡
arxiv.org/abs/2412.15287 #InferenceAwareFineTuning #BestOfNSampling #LanguageModels #AIJargon #SimonsFoundation #HackerNews #ngated

arXiv.org: Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models
Recent studies have indicated that effectively utilizing inference-time compute is crucial for attaining better performance from large language models (LLMs). In this work, we propose a novel inference-aware fine-tuning paradigm, in which the model is fine-tuned in a manner that directly optimizes the performance of the inference-time strategy. We study this paradigm using the simple yet effective Best-of-N (BoN) inference strategy, in which a verifier selects the best out of a set of LLM-generated responses. We devise the first imitation learning and reinforcement learning (RL) methods for BoN-aware fine-tuning, overcoming the challenging, non-differentiable argmax operator within BoN. We empirically demonstrate that our BoN-aware models implicitly learn a meta-strategy that interleaves best responses with more diverse responses that might be better suited to a test-time input -- a process reminiscent of the exploration-exploitation trade-off in RL. Our experiments demonstrate the effectiveness of BoN-aware fine-tuning in terms of improved performance and inference-time compute. In particular, we show that our methods improve the Bo32 performance of Gemma 2B on Hendrycks MATH from 26.8% to 30.8%, and pass@32 from 60.0% to 67.0%, as well as the pass@16 on HumanEval from 61.6% to 67.1%.
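For context, the Best-of-N strategy itself is simple to state: sample N candidate responses, score each with a verifier, and keep the best one. Below is a minimal Python sketch of that selection step; `generate` and `score` are hypothetical stand-ins for the LLM sampler and the verifier, and this is not the paper's code, which fine-tunes the model so that this selection works better.

# Minimal sketch of Best-of-N (BoN) sampling, not the paper's code.
# `generate` and `score` are hypothetical stand-ins for an LLM sampler
# and a verifier; the selection below is the non-differentiable argmax
# the paper's fine-tuning methods have to work around.
def best_of_n(prompt, generate, score, n=32):
    candidates = [generate(prompt) for _ in range(n)]   # sample N responses
    scores = [score(prompt, c) for c in candidates]     # verifier scores each one
    best = max(range(n), key=lambda i: scores[i])       # argmax over scores
    return candidates[best]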

🚀 Oh, look! Yet another 'groundbreaking' platform trying to democratize AI by letting anyone and everyone play with large language models... as long as they're willing to pretend Python isn't a thing. 🤦‍♂️ Blessed by the almighty #Mozilla, because nothing screams innovation like clunky open-source projects with dreams of world domination. 🌍✨
transformerlab.ai/ #AIinnovation #OpenSource #LanguageModels #TechTrends #HackerNews #ngated

transformerlab.ai: Hello from Transformer Lab | Transformer Lab
Documentation for LLM Toolkit, Transformer Lab

Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB

arxiv.org/abs/2504.01157

arXiv.org: Beyond Quacking: Deep Integration of Language Models and RAG into DuckDB
Knowledge-intensive analytical applications retrieve context from both structured tabular data and unstructured, free-text documents for effective decision-making. Large language models (LLMs) have made it significantly easier to prototype such retrieval and reasoning data pipelines. However, implementing these pipelines efficiently still demands significant effort and has several challenges. This often involves orchestrating heterogeneous data systems, managing data movement, and handling low-level implementation details, e.g., LLM context management. To address these challenges, we introduce FlockMTL: an extension for DBMSs that deeply integrates LLM capabilities and retrieval-augmented generation (RAG). FlockMTL includes model-driven scalar and aggregate functions, enabling chained predictions through tuple-level mappings and reductions. Drawing inspiration from the relational model, FlockMTL incorporates: (i) cost-based optimizations, which seamlessly apply techniques such as batching and caching; and (ii) resource independence, enabled through novel SQL DDL abstractions: PROMPT and MODEL, introduced as first-class schema objects alongside TABLE. FlockMTL streamlines the development of knowledge-intensive analytical applications, and its optimizations ease the implementation burden.
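To make the "model-driven scalar function" idea concrete, here is a rough Python sketch that approximates it with DuckDB's user-defined-function API. This is not FlockMTL's actual syntax (the extension adds PROMPT and MODEL as first-class SQL schema objects and handles batching and caching inside the engine), and `call_llm` is a hypothetical placeholder for a real model client.

# Conceptual sketch only: approximating a model-driven scalar function
# with a plain DuckDB Python UDF that forwards each tuple's text to an LLM.
import duckdb
from duckdb.typing import VARCHAR

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder; swap in a real LLM client here.
    return "summary: " + prompt[:40]

def summarize(text: str) -> str:
    return call_llm("Summarize in one sentence: " + text)

con = duckdb.connect()
con.create_function("summarize", summarize, [VARCHAR], VARCHAR)
con.execute("CREATE TABLE docs(id INTEGER, body VARCHAR)")
con.execute("INSERT INTO docs VALUES (1, 'DuckDB is an in-process analytical database.')")
print(con.execute("SELECT id, summarize(body) AS gist FROM docs").fetchall())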

In a groundbreaking feat of #AI wizardry, #UCSD triumphantly declares that their language models have finally learned to convincingly imitate humans. 🎉 Apparently, these digital chatterboxes are now capable of hoodwinking us mere mortals—because who needs real human interaction anyway? 🤖💬
arxiv.org/abs/2503.23674 #Innovation #LanguageModels #HumanImitation #TechTrends #HackerNews #ngated

arXiv.org: Large Language Models Pass the Turing Test
We evaluated 4 systems (ELIZA, GPT-4o, LLaMa-3.1-405B, and GPT-4.5) in two randomised, controlled, and pre-registered Turing tests on independent populations. Participants had 5 minute conversations simultaneously with another human participant and one of these systems before judging which conversational partner they thought was human. When prompted to adopt a humanlike persona, GPT-4.5 was judged to be the human 73% of the time: significantly more often than interrogators selected the real human participant. LLaMa-3.1, with the same prompt, was judged to be the human 56% of the time -- not significantly more or less often than the humans they were being compared to -- while baseline models (ELIZA and GPT-4o) achieved win rates significantly below chance (23% and 21% respectively). The results constitute the first empirical evidence that any artificial system passes a standard three-party Turing test. The results have implications for debates about what kind of intelligence is exhibited by Large Language Models (LLMs), and the social and economic impacts these systems are likely to have.

Dear #AI #Fediverse, there's been some buzz recently about #LanguageModels that are not gigantic black boxes, and about #MachineLearning more broadly, developed as #FLOSS.

There's this Google internal document, for example, that points out that the FLOSS community is close to eating Google's and OpenAI's cake:
https://www.semianalysis.com/p/google-we-have-no-moat-and-neither

So here is my question to you:

What are the best examples of *useful*, *small*, *on-device* models already out there?

:boost_requested: