Vision & Multimodal

Vision-language models, multimodal AI, visual question answering, and image-text alignment.

30 papers in the last 30 days
NavTrust: Benchmarking Trustworthiness for Embodied Navigation

Huaide Jiang, Yash Chaudhary, Yuping Wang et al.

cs.RO, cs.AI, cs.CV, cs.LG · Mar 19, 2026

There are two major categories of embodied navigation: Vision-Language Navigation (VLN), where agents navigate by following natural language instructions; and Object-Goal Navigation (OGN), where agent

How Uncertainty Estimation Scales with Sampling in Reasoning Models

Maksym Del, Markus Kängsepp, Marharyta Domnich et al.

cs.AI, cs.CL, cs.LG · Mar 19, 2026

Uncertainty estimation is critical for deploying reasoning language models, yet remains poorly understood under extended chain-of-thought reasoning. We study parallel sampling as a fully black-box app

FinTradeBench: A Financial Reasoning Benchmark for LLMs

Yogesh Agrawal, Aniruddha Dutta, Md Mahadi Hasan et al.

cs.CE, cs.AI, cs.CL, cs.IR · Mar 19, 2026

Real-world financial decision-making is a challenging problem that requires reasoning over heterogeneous signals, including company fundamentals derived from regulatory filings and trading signals com

Hypothesis-Conditioned Query Rewriting for Decision-Useful Retrieval

Hangeol Chang, Changsun Lee, Seungjoon Rho et al.

cs.CL, cs.AI, cs.LG · Mar 19, 2026

Retrieval-Augmented Generation (RAG) improves Large Language Models (LLMs) by grounding generation in external, non-parametric knowledge. However, when a task requires choosing among competing options

Motion-o: Trajectory-Grounded Video Reasoning

Bishoy Galoaa, Shayda Moezzi, Xiangyu Bai et al.

cs.CV, cs.AI · Mar 19, 2026

Recent research has made substantial progress on video reasoning, with many models leveraging spatio-temporal evidence chains to strengthen their inference capabilities. At the same time, a growing se

Are complicated loss functions necessary for teaching LLMs to reason?

Gabriele Carrino, Andrea Sassella, Nicolo Brunello et al.

cs.LG, cs.AI, cs.CL · Mar 19, 2026

Recent advances in large language models (LLMs) highlight the importance of post-training techniques for improving reasoning and mathematical ability. Group Relative Policy Optimization (GRPO) has sho
