Vision & Multimodal

Vision-language models, multimodal AI, visual question answering, and image-text alignment.

30 papers in the last 30 days

A Reproducibility Study of LLM-Based Query Reformulation

Amin Bigdeli, Radin Hamidi Rad, Hai Son Le et al.

cs.IR · cs.CL · Apr 30, 2026

Large Language Models (LLMs) are now widely used for query reformulation and expansion in Information Retrieval, with many studies reporting substantial effectiveness gains. However, these results are

Heterogeneous Scientific Foundation Model Collaboration

Zihao Li, Jiaru Zou, Feihao Fang et al.

cs.AI · cs.CL · cs.LG · Apr 30, 2026

Agentic large language model systems have demonstrated strong capabilities. However, their reliance on language as the universal interface fundamentally limits their applicability to many real-world p

GUI Agents with Reinforcement Learning: Toward Digital Inhabitants

Junan Hu, Jian Liu, Jingxiang Lai et al.

cs.AI · cs.CV · Apr 30, 2026

Graphical User Interface (GUI) agents have emerged as a promising paradigm for intelligent systems that perceive and interact with graphical interfaces visually. Yet supervised fine-tuning alone canno

On the Proper Treatment of Units in Surprisal Theory

Samuel Kiegeland, Vésteinn Snæbjarnarson, Tim Vieira et al.

cs.CL · Apr 30, 2026

Surprisal theory links human processing effort to the predictability of an upcoming linguistic unit, but empirical work often leaves the notion of a unit underspecified. In practice, experimental stim

Universal statistical laws governing culinary design

Ganesh Bagler, Gopal Krishna Tewari, Aditya Raj Yadav et al.

physics.soc-ph · cs.CL · Apr 30, 2026

Cooking is a cultural expression of human creativity that transcends geography and time through the orchestration of ingredients and techniques, much like languages do through words and syntax. Yet, b

Exploration Hacking: Can LLMs Learn to Resist RL Training?

Eyon Jang, Damon Falck, Joschka Braun et al.

cs.LG · cs.CL · Apr 30, 2026

Reinforcement learning (RL) has become essential to the post-training of large language models (LLMs) for reasoning, agentic capabilities and alignment. Successful RL relies on sufficient exploration
