Accepted Papers
We accepted 48 papers to the workshop (5 orals, 6 spotlights, and 37 posters), of which 34 are archival and 14 are non-archival.
Oral Papers
Safe in Isolation, Dangerous Together: Agent-Driven Multi-Turn Decomposition Jailbreaks on LLMs Devansh Srivastav, Xiao Zhang (archival)
CAMPHOR: Collaborative Agents for Multi-input Planning and High-Order Reasoning On Device Yicheng Fu, Raviteja Anantha, Jianpeng Cheng (archival)
TALES: Text Adventure Learning Environment Suite Christopher Zhang Cui, Xingdi Yuan, Ziang Xiao, Prithviraj Ammanabrolu, Marc-Alexandre Côté (non-archival)
Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games David Guzman Piedrahita, Yongjin Yang, Mrinmaya Sachan, Giorgia Ramponi, Bernhard Schölkopf, Zhijing Jin (non-archival)
AgentAda: Skill-Adaptive Data Analytics for Tailored Insight Discovery Amirhossein Abaskohi, Amrutha Varshini Ramesh, Shailesh Nanisetty, Chirag Goel, David Vazquez, Christopher Pal, Spandana Gella, Giuseppe Carenini, Issam H. Laradji (non-archival)
Spotlight Papers
Do Large Language Models Learn Human-Like Strategic Preferences? Jesse Roberts, Kyle Moore, Douglas Fisher (archival)
A Study on Leveraging Search and Self-Feedback for Agent Reasoning Karthikeyan K, Michelle Yuan, Elman Mansimov, Katerina Margatina, Anurag Pratik, Daniele Bonadiman, Monica Sunkara, Yi Zhang, Yassine Benajiba (archival)
BEARCUBS: A benchmark for computer-using web agents Yixiao Song, Katherine Thai, Chau Minh Pham, Yapei Chang, Mazin Nadaf, Mohit Iyyer (non-archival)
GitGoodBench: A Novel Benchmark For Evaluating Agentic Performance On Git Tobias Lindenbauer, Egor Bogomolov, Yaroslav Zharov (archival)
Fleet of Agents: Coordinated Problem Solving with Large Language Models Nearchos Potamitis, Lars Henning Klein, Roland Aydin, Robert West, Caglar Gulcehre, Akhil Arora (non-archival)
Preventing Rogue Agents Improves Multi-Agent Collaboration Ohav Barbi, Ori Yoran, Mor Geva (archival)
Posters
Archival (28 papers)
Are You Sure You’re Positive? Consolidating Chain-of-Thought Agents with Uncertainty Quantification for Aspect-Category Sentiment Analysis Filippos Ventirozos, Peter A. Appleby, Matthew Shardlow
Snap Out of It: A Dual-Process Approach to Mitigating Overthinking in Language Model Reasoning Ashish Pandian, Nelson Lojo, Wei Xun Lai, Jackson Lukas
ToolReflection: Improving Large Language Models for Real-World API Calls with Self-Generated Data Gregory Polyakov, Ilseyar Alimova, Dmitry Abulkhanov, Ivan Sedykh, Andrey Bout, Sergey Nikolenko, Irina Piontkovskaya
PAARS: Persona Aligned Agentic Retail Shoppers Saab Mansour, Leonardo Perelli, Lorenzo Mainetti, George Davidson, Stefano D'Amato
Inherent and emergent liability issues in LLM-based agentic systems: a principal-agent perspective Garry A. Gabison, R. Patrick Xian
A Conversational Agent Framework for Multimodal Knowledge Retrieval: A Case Study in FHWA InfoHighway Web Portal Queries Sai Surya Gadiraju, Duoduo Liao, Zijie He
Bridging the Digital Divide: Empowering Elderly Smartphone Users with Intelligent and Human-Centered Design in Agemate Liangliang Chen, Yongzhen Mu
RL + Transformer = A General-Purpose Problem Solver Micah Rentschler, Jesse Roberts
AID-Agent: An LLM-Agent for Advanced Extraction and Integration of Documents Bin Li, Jannis Conen, Felix Aller
Leveraging LLM-based sentiment analysis for portfolio optimization with proximal policy optimization Kemal Kirtac, Guido Germano
Conditional Multi-Stage Failure Recovery for Embodied Agents Youmna Farag, Svetlana Stoyanchev, Mohan Li, Simon Keizer, Rama Doddipatla
Decentralized Low-Rank Fine-Tuning of Large Language Models Sajjad Ghiasvand, Mahnoosh Alizadeh, Ramtin Pedarsani
A Multi-AI Agent System for Autonomous Optimization of Agentic AI Solutions via Iterative Refinement and LLM-Driven Feedback Loops Kamer Ali Yuksel, Thiago Castro Ferreira, Mohamed Al-Badrashiny, Hassan Sawaf
From Knowledge to Noise: CTIM-Rover and the Pitfalls of Episodic Memory in Software Engineering Agents Tobias Lindenbauer, Georg Groh, Hinrich Schuetze
Hidden Forms: A Dataset to Fill Masked Interfaces from Language Commands Anirudh Sundar, Christopher Gordon Richardson, William Gay, Benjamin Reichman, Larry Heck
Positive Experience Reflection for Agents in Interactive Text Environments Philip Lippmann, Matthijs T. J. Spaan, Jie Yang
Weight-of-Thought Reasoning: Exploring Neural Network Weights for Enhanced LLM Reasoning Saif Punjwani, Larry Heck
StateAct: Enhancing LLM Base Agents via Self-prompting and State-tracking Nikolai Rozanov, Marek Rei
DFLOW: Diverse Dialogue Flow Simulation with Large Language Models Wanyu Du, Song Feng, James Gung, Lijia Sun, Yi Zhang, Saab Mansour, Yanjun Qi
Prompt-based Personality Profiling: Reinforcement Learning for Relevance Filtering Jan Hofmann, Cornelia Sindermann, Roman Klinger
TCQA$^2$: A Tiered Conversational Q&A Agent in Gaming Ze Chen, Chengcheng Wei, Jiewen Zheng, Jiarong He
Oversight Structures for Agentic AI in Public-Sector Organizations Chris Schmitz, Jonathan Rystrøm, Jan Batzner
DIAMOND: An LLM-Driven Agent for Context-Aware Baseball Highlight Summarization Jeonghun Kang, Soonmok Kwon, Joonseok Lee, Byung-Hak Kim
Measuring temporal effects of agent knowledge by date-controlled tool use R. Patrick Xian, Qiming Cui, Stefan Bauer, Reza Abbasi-Asl
The Power of Simplicity in LLM-Based Event Forecasting Meiru Zhang, Auss Abbood, Zaiqiao Meng, Nigel Collier
The Art of Tool Interface Design Yunnan Wu, Qile P. Chen, Deshank Baranwal, Jinlong Zhou, Jian Yuan
FrontierScience Bench: Evaluating AI Research Capabilities in LLMs Matthew Li, Santiago Torres-Garcia, Shayan Halder, Phani Kuppa, Sean O'Brien, Vasu Sharma, Kevin Zhu, Sunishchal Dev
VisTRA: Visual Tool-use Reasoning Analyzer for Small Object Visual Question Answering Hiroaki Sugiyama, Ko Koga, Toshifumi Nishijima
Non-archival (9 papers)
Chain-of-Conceptual-Thought Elicits Daily Conversation in Large Language Models Qingqing Gu, Dan Wang, Yue Zhao, Xiaoyu Wang, Zhonglin Jiang, Yong Chen, Luo Ji
WriteHERE: Adaptive Long-form Writing via Heterogeneous Recursive Planning with Language Models Ruibin Xiong, Yimeng Chen, Dmitrii Khizbullin, Mingchen Zhuge, Jürgen Schmidhuber
Collaborating Action by Action: A Multi-agent LLM Framework for Embodied Reasoning Isadora White, Kolby Nottingham, Ayush Parasbhai Maniar, Max Robinson, Mehul Maheshwari, Hansen Lillemark, Lianhui Qin, Prithviraj Ammanabrolu
WorldGUI: An Interactive Benchmark for Desktop GUI Automation from Any Starting Point Henry Hengyuan Zhao, Kaiming Yang, Wendi Yu, Difei Gao, Mike Zheng Shou
Stepwise Informativeness Search for Improving LLM Reasoning Siyuan Wang, Enda Zhao, Zhongyu Wei, Xiang Ren
MASCA: LLM based-Multi Agents System for Credit Assessment Gautam Jajoo, Pranjal A Chitale, Saksham Aggarwal
MMA-RAG: A Novel Modular RAG Framework With LLM-based Multi-agents Reinforcement Learning Xiangwen Deng, Shenao Jiang, Yufeng Wang, Feice Huang, Jiawei Zhou, Peng Jiao, Yong Ge, Haoqian Wang
Gödel Agent: A Self-Referential Agent Framework for Recursively Self-Improvement Xunjian Yin, Xinyi Wang, Liangming Pan, Li Lin, Xiaojun Wan, William Yang Wang
Scaling Web Agent Training through Automatic Data Generation and Fine-grained Evaluation Lajanugen Logeswaran, Jaekyeom Kim, Sungryull Sohn, Creighton Glasscock, Honglak Lee