Nume MacAroon Ⓥ @nm

**trndgtr.com** @trndgtr@mastodon.social · 2d

trndgtr.com @trndgtr@mastodon.social

Training AI to Persuade? - Jeremie & Edouard Harris on JRE

#reinforcementlearning #ai #aipersuasion

**RAIswarms.com** @raiswarms@mastodon.social · 3d

RAIswarms.com @raiswarms@mastodon.social

Building Agents That Orchestrate Production + Inventory Flow

https://raiswarms.com/building-agents-that-orchestrate-production-inventory-flow/
#DevRAI #IndustryRAI #AI #AutonomousFactories #ConstraintSatisfaction #Industry50 #InventoryOptimization #MultiagentSystems #ProductionScheduling #ReinforcementLearning

**Victoria Stuart** @persagen@mastodon.social · Apr 22 *

Apr 22 *

Victoria Stuart @persagen@mastodon.social

[AGI discussion, DeepMind] Welcome to the Era of Experience
https://storage.googleapis.com/deepmind-media/Era-of-Experience%20/The%20Era%20of%20Experience%20Paper.pdf
https://old.reddit.com/r/MachineLearning/comments/1k4zr1i/r_deepmind_welcome_to_the_era_of_experience

* threshold of new era in AI that promises unprecedented level of ability
* new generation of agents will acquire superhuman capabilities, learning predominantly f. experience
* paradigm shift, accompanied by algorithmic advancements in RL, will unlock new supra-human capabilities

#Google #DeepMind #AI

**Antonio** @qolorao@mastodon.social · Apr 15

Apr 15

Antonio @qolorao@mastodon.social

Nuestro último artículo "MELGYM: A dynamic control interface for MELCOR simulations" ha sido publicado en la revista SoftwareX.

https://www.sciencedirect.com/science/article/pii/S2352711025001153

Presentamos MELGYM, una interfaz en Python que permite el control interactivo de simulaciones con MELCOR, un código ampliamente utilizado para el análisis de seguridad en instalaciones nucleares como IFMIF-DONES.

#reinforcementlearning #ai #nuclear

**Europe Says** @europesays@pubeurope.com · Apr 12

Apr 12

Europe Says @europesays@pubeurope.com

https://www.europesays.com/1988179/ Exploring Artificial Intelligence Applications in Agriculture #AI #ArtificialIntelligence #ComputerVision #ExpertSystems #MachineLearning #NaturalLanguageProcessing #PennStateExtension #ReinforcementLearning #Robotics

**N-gated Hacker News** @ngate@mastodon.social · Apr 8

Apr 8

N-gated Hacker News @ngate@mastodon.social

Oh, look! Another groundbreaking study in which #academia leans on #buzzwords like "reinforcement learning" to suggest that someday, maybe, #AI will conquer more than just calculus and compiling code. It's like a toddler boasting about mastering finger painting and claiming they’ll soon create the next Mona Lisa.
https://arxiv.org/abs/2503.23829 #ReinforcementLearning #GroundbreakingStudy #TechTrends #HackerNews #ngated

arXiv.orgCrossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse DomainsReinforcement learning with verifiable rewards (RLVR) has demonstrated significant success in enhancing mathematical reasoning and coding performance of large language models (LLMs), especially when structured reference answers are accessible for verification. However, its extension to broader, less structured domains remains unexplored. In this work, we investigate the effectiveness and scalability of RLVR across diverse real-world domains including medicine, chemistry, psychology, economics, and education, where structured reference answers are typically unavailable. We reveal that binary verification judgments on broad-domain tasks exhibit high consistency across various LLMs provided expert-written reference answers exist. Motivated by this finding, we utilize a generative scoring technique that yields soft, model-based reward signals to overcome limitations posed by binary verifications, especially in free-form, unstructured answer scenarios. We further demonstrate the feasibility of training cross-domain generative reward models using relatively small (7B) LLMs without the need for extensive domain-specific annotation. Through comprehensive experiments, our RLVR framework establishes clear performance gains, significantly outperforming state-of-the-art open-source aligned models such as Qwen2.5-72B and DeepSeek-R1-Distill-Qwen-32B across domains in free-form settings. Our approach notably enhances the robustness, flexibility, and scalability of RLVR, representing a substantial step towards practical reinforcement learning applications in complex, noisy-label scenarios.

**Hacker News** @h4ckernews@mastodon.social · Apr 8

Apr 8

Hacker News @h4ckernews@mastodon.social

Can reinforcement learning for LLMs scale beyond math and coding tasks? Probably

https://arxiv.org/abs/2503.23829

#HackerNews #reinforcementlearning #LLMs

**JesseTong** @natsume_shokogami@mastodon.world · Apr 6

Apr 6

JesseTong @natsume_shokogami@mastodon.world

@lianna Well, most #AIs and #robots in fiction I think their inputs are mostly or fully sensory-based, and they learn in real time through #ReinforcementLearning - esque techniques. AIs like LLMs are frozen in place (they never update and are just replaced over time), and they do not have any meanful interaction to the real world, nor like reflection.

I'd think that robots like #Sophia a few years ago would be more closer to the former than the latter, but #AIBros love conflating the twos.

**Antonio Lieto** @antoniolieto@fediscience.org · Apr 1

Apr 1

Antonio Lieto @antoniolieto@fediscience.org

Happy birthday to Cognitive Design for Artificial Minds (https://lnkd.in/gZtzwDn3) that was released 4 years ago!

Since then its ideas have been presented and discussed widely in the research fields of AI/Cognitive Science/Robotics and - nowadays - both the possibilities and the limitations of: #LLMs, #GenerativeAI and #ReinforcementLearning (already envisioned and discussed in the book) have become a common topic of research interests in the AI community and beyond.
Similarly also the topic concerning the evaluation - in human-like and human-level terms - of the current AI systems has become a critical theme related to the problem Anthropomorphic interpretation of AI output (see e.g. https://lnkd.in/dVi9Qf_k ).
Book reviews have been published on ACM Computing Reviews (2021) https://lnkd.in/dWQpJdkV and on Argumenta (2023): https://lnkd.in/derH3VKN

I have been invited to present the content of the book in over 20 official scientific events in international conferences, Ph.D Schools in US, China, Japan, Finland, Germany, Sweden, France, Brazil, Poland, Austria and, of course, Italy.

A news I am happy to share is that Routledge/Taylor & Francis contacted me few weeks ago for a second edition! Stay tuned!

The #book is available in many webstores:
- Routledge: https://lnkd.in/dPrC26p
- Taylor & Francis: https://lnkd.in/dprVF2w
- Amazon: https://lnkd.in/dC8rEzPi

@academicchatter @cognition
#AI #minimalcognitivegrid #CognitiveAI #cognitivescience #cognitivesystems

**Python Weekly** @python_discussions@mastodon.social · Mar 31

Mar 31

Python Weekly @python_discussions@mastodon.social

Implemented 18 RL Algorithms in a Simpler Way

https://github.com/FareedKhan-dev/all-rl-algorithms

Discussions: https://discu.eu/q/https://github.com/FareedKhan-dev/all-rl-algorithms

#programming #python #reinforcementlearning

**nemo™** @nemo@mas.to · Mar 8

Mar 8

nemo™ @nemo@mas.to

Check out QwQ-32B! A 32B parameter model that rivals DeepSeek-R1 in performance, thanks to Reinforcement Learning. It excels in math, coding & general problem-solving. Open-weight on Hugging Face & ModelScope! Explore the future of AGI!
https://qwenlm.github.io/blog/qwq-32b/ #AI #ReinforcementLearning #MachineLearning #Qwen #OpenSource

Qwen · Mar 5QwQ-32B: Embracing the Power of Reinforcement LearningQWEN CHAT Hugging Face ModelScope DEMO DISCORD Scaling Reinforcement Learning (RL) has the potential to enhance model performance beyond conventional pretraining and post-training methods. Recent studies have demonstrated that RL can significantly improve the reasoning capabilities of models. For instance, DeepSeek R1 has achieved state-of-the-art performance by integrating cold-start data and multi-stage training, enabling deep thinking and complex reasoning. Our research explores the scalability of Reinforcement Learning (RL) and its impact on enhancing the intelligence of large language models.

**Dr. Anna Latour** @anna@mathstodon.xyz · Mar 4

Mar 4

Dr. Anna Latour @anna@mathstodon.xyz

My colleagues at TU Delft are seeking to hire a postdoc to work on Applied Planning and Scheduling under Uncertainty, with applications in modelling supply chain scenarios for offshore wind farm installation: https://careers.tudelft.nl/job/Delft-Postdoc-in-Applied-Planning-and-Scheduling-under-Uncertainty-2628-CD/814890902/

careers.tudelft.nlPostdoc in Applied Planning and Scheduling under UncertaintyPostdoc in Applied Planning and Scheduling under Uncertainty

#AcademicMastodon #PostdocLife #Hiring

**Compsci Weekly** @compsci_discussions@mastodon.social · Feb 24

Feb 24

Compsci Weekly @compsci_discussions@mastodon.social

[R] Literally recreated Mathematical reasoning and Deepseek’s aha moment in less than 10$ via end to end Simple Reinforcement Learning

https://medium.com/@rjusnba/overnight-end-to-end-rl-training-a-3b-model-on-a-grade-school-math-dataset-leads-to-reasoning-df61410c04c6

Discussions: https://discu.eu/q/https://medium.com/%40rjusnba/overnight-end-to-end-rl-training-a-3b-model-on-a-grade-school-math-dataset-leads-to-reasoning-df61410c04c6

Medium · Feb 18Recreating DeepSeek’s ‘Wait!/Aha Moment’ for Under $10: A Breakthrough in Affordable AI Reasoning-> Reinforce-Lite⚡️By Raz

#compsci #machinelearning #reinforcementlearning

**Tero Keski-Valkama** @tero@rukii.net · Feb 13

Feb 13

Tero Keski-Valkama @tero@rukii.net

How to formulate exploration-exploitation trade-off better than all the hacks on top of Bellman equation?

We can first of all simply estimate the advantage of exploration by Monte-Carlo in a swarm setting: Pitting fully exploitative agents against fully exploitative agents which have the benefit of recent exploration. This can be easily done by lagging policy models.

Of course the advantage of exploration needs to be divided by the cost of exploration, which is linear to the number of agents used in the swarm to explore at a particular state.

Note that the advantage of exploration depends on the state of the agent, so we might want to define an explorative critic to estimate this.

What's beautiful in this formulation is that we can incorporate autoregressive #WorldModels naturally, as the exploitative agents only learn from rewards, but the explorative agents choose their actions in a way which maximizes the improvement of the auto-regressive World Model.

It brings these two concepts together as sides of the same coin.

Exploitation is reward-guided action, exploration is auto-regressive state transition model improvement guided action.

Balancing the two is a swarm dynamic which encourages branching where exploration has an expected value in reward terms. This can be estimated by computing the advantage of exploitative agents utilizing recent exploration versus agents which do not, and returning this advantage to the points of divergence between the two.

#mathematics #ReinforcementLearning #RL

**Tino Eberl** @tinoeberl@mastodon.online · Jan 16

Jan 16

Tino Eberl @tinoeberl@mastodon.online

#KINutzen #Retröt
Eine neue #KI der TU #Wien verbessert die Behandlung von #Sepsis: Mithilfe von #ReinforcementLearning analysiert die KI #Patientendaten und schlägt optimale #Behandlungsschritte vor. Erste Tests zeigen eine um 3 % höhere #Heilungsrate. Die KI unterstützt #Ärzte, trifft aber keine Entscheidungen – die Verantwortung bleibt beim Menschen.

#MedizinischeKI #Intensivmedizin #Innovation

https://tino-eberl.de/nutzen-kuenstlicher-intelligenz/neue-ki-der-tu-wien-steigert-heilungsraten-bei-blutvergiftungen/

Tino Eberl · Aug 8, 2023Neue KI der TU Wien steigert Heilungsraten bei BlutvergiftungenEine an der TU Wien entwickelte KI schlägt Behandlungsstrategien bei Sepsis vor und steigerte die Heilungsrate um 3 %.

**Compsci Weekly** @compsci_discussions@mastodon.social · Dec 12, 2024

Dec 12, 2024

Compsci Weekly @compsci_discussions@mastodon.social

[R] Evaluating the world model implicit in a generative model

https://arxiv.org/pdf/2406.03689

Discussions: https://discu.eu/q/https://arxiv.org/pdf/2406.03689

#compsci #machinelearning #reinforcementlearning

**Brandon Rohrer** @brohrer@recsys.social · Nov 24, 2024

Nov 24, 2024

Brandon Rohrer @brohrer@recsys.social

Adding my love letter to

arxiv.org/pdf/2304.01315

Empirical Design in Reinforcement Learning
by
Andrew Patterson, Samuel Neumann, Martha White, Adam White

JMLR 25 (2024) 1-63

#ReinforcementLearning

These aren’t the heroes we deserve, but they are the heroes we need.

**Brandon Rohrer** @brohrer@recsys.social · Nov 19, 2024

Nov 19, 2024

Brandon Rohrer @brohrer@recsys.social

If you've ever worked with a physical robot and #ReinforcementLearning you've had to deal with delays. Thinking takes time, even at computer speeds, and the world doesn't stop.

One way to minimize the delays is for the to world to act on new commands mid-cycle, rather than wait for its next turn.

https://www.brandonrohrer.com/rl_noninteger_delay

Stochastic non-integer delay in real-time reinforcement learning
Brandon Rohrer
When turn-taking breaks down
Synchronization between agents in worlds gets trickier when the world is operating in real time, as with robots and other physical systems. In non real-time RL, the world can wait for the agent to plan its next action. It is a turn-taking scenario where the agent waits for the world to make its next move and vice versa. But for physical hardware the world doesn't wait for the agent to compute. The instant it reports its sensor readings to the agent, they are already obsolete. The world has moved on in its machinations. Time and tide wait for no agent.

If the world moves slowly compared to the speed at which the agent computes, this mismatch can be mostly ignored. But if computations are intensive, as with some vision applications and nearly all transformer-based agents, or compute power is limited, as with many mobile robots, then this delay can become considerable. It can render modern algorithms infeasible.

This problem is described in this 2019 paper as well as here and here. The terminology used to describe it varies, but if you want to explore the literature, the search terms "stochastic delay" and "non-integer delay" will get you started.

**Python Weekly** @python_discussions@mastodon.social · Nov 3, 2024

Nov 3, 2024

Python Weekly @python_discussions@mastodon.social

[Project] PyMAB: An exploratory Python Library for Multi-Armed Bandits

https://github.com/danielaLopes/pymab

Discussions: https://discu.eu/q/https://github.com/danielaLopes/pymab

GitHubGitHub - danielaLopes/pymab: Python library for Multi-Armed Bandit algorithmsPython library for Multi-Armed Bandit algorithms. Contribute to danielaLopes/pymab development by creating an account on GitHub.

#programming #python #reinforcementlearning

**Brandon Rohrer** @brohrer@recsys.social · Jul 6, 2024

Jul 6, 2024

Brandon Rohrer @brohrer@recsys.social

A fun part of working on a #ReinforcementLearning workbench is that I get to think about how to connect different kinds of agents to different kinds of worlds – representation, interfaces, abstraction.

Something I’m stumbling on is representing models and planners.
Is there such a thing as a planner distinct from a model? Or is planning just something a model does?
In object-oriented programming terms, would a planner be a separate class from a model? Or would it be a method in a model class?

Recent searches

Search options

Administered by:

Server stats:

#reinforcementlearning