DiscoPhon: Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units
Maxime Poli, Manel Khentout, Angelo Ortiz Tandazo et al.
cs.CL, cs.SD, eess.AS | Mar 19, 2026
We introduce DiscoPhon, a multilingual benchmark for evaluating unsupervised phoneme discovery from discrete speech units. DiscoPhon covers 6 dev and 6 test languages, chosen to span a wide range of p…
F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World
Ziyin Zhang, Zihan Liao, Hang Yu et al.
cs.CL, cs.AI | Mar 19, 2026
We present F2LLM-v2, a new family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B. Trained on a newly curated composite of 60 million publicly available h…
Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation
Zhuolin Yang, Zihan Liu, Yang Chen et al.
cs.CL, cs.AI, cs.LG | Mar 19, 2026
We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical an…
NavTrust: Benchmarking Trustworthiness for Embodied Navigation
Huaide Jiang, Yash Chaudhary, Yuping Wang et al.
cs.RO, cs.AI, cs.CV, cs.LG | Mar 19, 2026
There are two major categories of embodied navigation: Vision-Language Navigation (VLN), where agents navigate by following natural language instructions; and Object-Goal Navigation (OGN), where agent…
DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative Latent Denoising
Tianjiao Yu, Xinzhuo Li, Muntasir Wahed et al.
cs.CV, cs.AI, cs.LG | Mar 19, 2026
Understanding and generating 3D objects as compositions of meaningful parts is fundamental to human perception and reasoning. However, most text-to-3D methods overlook the semantic and functional stru…
Serendipity by Design: Evaluating the Impact of Cross-domain Mappings on Human and LLM Creativity
Qiawen Ella Liu, Marina Dubova, Henry Conklin et al.
cs.AI, cs.CL | Mar 19, 2026
Are large language models (LLMs) creative in the same way humans are, and can the same interventions increase creativity in both? We evaluate a promising but largely untested intervention for creativi…
How Uncertainty Estimation Scales with Sampling in Reasoning Models
Maksym Del, Markus Kängsepp, Marharyta Domnich et al.
cs.AI, cs.CL, cs.LG | Mar 19, 2026
Uncertainty estimation is critical for deploying reasoning language models, yet remains poorly understood under extended chain-of-thought reasoning. We study parallel sampling as a fully black-box app…
Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Language Navigation
Swagat Padhan, Lakshya Jain, Bhavya Minesh Shah et al.
cs.RO, cs.AI, cs.CL, cs.CV | Mar 19, 2026
Robots collaborating with humans must convert natural language goals into actionable, physically grounded decisions. For example, executing a command such as "go two meters to the right of the fridge"…
SAVeS: Steering Safety Judgments in Vision-Language Models via Semantic Cues
Carlos Hinojosa, Clemens Grange, Bernard Ghanem
cs.CV, cs.AI, cs.CL, cs.LG | Mar 19, 2026
Vision-language models (VLMs) are increasingly deployed in real-world and embodied settings where safety decisions depend on visual context. However, it remains unclear which visual evidence drives th…
DaPT: A Dual-Path Framework for Multilingual Multi-hop Question Answering
Yilin Wang, Yuchun Fan, Jiaoyang Li et al.
cs.CL, cs.AI | Mar 19, 2026
Retrieval-augmented generation (RAG) systems have made significant progress in solving complex multi-hop question answering (QA) tasks in the English scenario. However, RAG systems inevitably face the…
CustomTex: High-fidelity Indoor Scene Texturing via Multi-Reference Customization
Weilin Chen, Jiahao Rao, Wenhao Wang et al.
cs.CV, cs.AI | Mar 19, 2026
The creation of high-fidelity, customizable 3D indoor scene textures remains a significant challenge. While text-driven methods offer flexibility, they lack the precision for fine-grained, instance-le…
FinTradeBench: A Financial Reasoning Benchmark for LLMs
Yogesh Agrawal, Aniruddha Dutta, Md Mahadi Hasan et al.
cs.CE, cs.AI, cs.CL, cs.IR | Mar 19, 2026
Real-world financial decision-making is a challenging problem that requires reasoning over heterogeneous signals, including company fundamentals derived from regulatory filings and trading signals com…
Parallelograms Strike Back: LLMs Generate Better Analogies than People
Qiawen Ella Liu, Raja Marjieh, Jian-Qiao Zhu et al.
cs.CL, cs.AI | Mar 19, 2026
Four-term word analogies (A:B::C:D) are classically modeled geometrically as "parallelograms," yet recent work suggests this model poorly captures how humans produce analogies, with simple local-sim…
Box Maze: A Process-Control Architecture for Reliable LLM Reasoning
Zou Qiang
cs.AI, cs.CL | Mar 19, 2026
Large language models (LLMs) demonstrate strong generative capabilities but remain vulnerable to hallucination and unreliable reasoning under adversarial prompting. Existing safety approaches -- such …
VEPO: Variable Entropy Policy Optimization for Low-Resource Language Foundation Models
Chonghan Liu, Yimin Du, Qi An et al.
cs.CL, cs.AI | Mar 19, 2026
Large language models frequently exhibit suboptimal performance on low-resource languages, primarily due to inefficient subword segmentation and systemic training data imbalances. In this paper, we pr…
A Dataset and Resources for Identifying Patient Health Literacy Information from Clinical Notes
Madeline Bittner, Dina Demner-Fushman, Yasmeen Shabazz et al.
cs.CL | Mar 19, 2026
Health literacy is a critical determinant of patient outcomes, yet current screening tools are not always feasible and differ considerably in the number of items, question format, and dimensions of he…
ARIADNE: A Perception-Reasoning Synergy Framework for Trustworthy Coronary Angiography Analysis
Zhan Jin, Yu Luo, Yizhou Zhang et al.
cs.CV, cs.AI | Mar 19, 2026
Conventional pixel-wise loss functions fail to enforce topological constraints in coronary vessel segmentation, producing fragmented vascular trees despite high pixel-level accuracy. We present ARIADN…
Hypothesis-Conditioned Query Rewriting for Decision-Useful Retrieval
Hangeol Chang, Changsun Lee, Seungjoon Rho et al.
cs.CL, cs.AI, cs.LG | Mar 19, 2026
Retrieval-Augmented Generation (RAG) improves Large Language Models (LLMs) by grounding generation in external, non-parametric knowledge. However, when a task requires choosing among competing options…
UGID: Unified Graph Isomorphism for Debiasing Large Language Models
Zikang Ding, Junchi Yao, Junhao Li et al.
cs.CL, cs.AI | Mar 19, 2026
Large language models (LLMs) exhibit pronounced social biases. Output-level or data-optimization-based debiasing methods cannot fully resolve these biases, and many prior works have shown that biases…
Em-Garde: A Propose-Match Framework for Proactive Streaming Video Understanding
Yikai Zheng, Xin Ding, Yifan Yang et al.
cs.CV, cs.AI | Mar 19, 2026
Recent advances in Streaming Video Understanding have enabled a new interaction paradigm where models respond proactively to user queries. Current proactive VideoLLMs rely on per-frame triggering decis…
SEM: Sparse Embedding Modulation for Post-Hoc Debiasing of Vision-Language Models
Quentin Guimard, Federico Bartsch, Simone Caldarella et al.
cs.CV, cs.AI, cs.LG | Mar 19, 2026
Models that bridge vision and language, such as CLIP, are key components of multimodal AI, yet their large-scale, uncurated training data introduce severe social and spurious biases. Existing post-hoc…
What Really Controls Temporal Reasoning in Large Language Models: Tokenisation or Representation of Time?
Gagan Bhatia, Ahmad Muhammad Isa, Maxime Peyrard et al.
cs.CL, cs.AI | Mar 19, 2026
We present MultiTempBench, a multilingual temporal reasoning benchmark spanning three tasks (date arithmetic, time zone conversion, and temporal relation extraction) across five languages (English, Ger…
Translating MRI to PET through Conditional Diffusion Models with Enhanced Pathology Awareness
Yitong Li, Igor Yakushev, Dennis M. Hedderich et al.
cs.CV, cs.AI | Mar 19, 2026
Positron emission tomography (PET) is a widely recognized technique for diagnosing neurodegenerative diseases, offering critical functional insights. However, its high costs and radiation exposure hin…
How Auditory Knowledge in LLM Backbones Shapes Audio Language Models: A Holistic Evaluation
Ke-Han Lu, Szu-Wei Fu, Chao-Han Huck Yang et al.
eess.AS, cs.CL, cs.SD | Mar 19, 2026
Large language models (LLMs) have been widely used as knowledge backbones of Large Audio Language Models (LALMs), yet how much auditory knowledge they encode through text-only pre-training and how thi…
MultihopSpatial: Multi-hop Compositional Spatial Reasoning Benchmark for Vision-Language Model
Youngwan Lee, Soojin Jang, Yoorhim Cho et al.
cs.CV, cs.AI | Mar 19, 2026
Spatial reasoning is foundational for Vision-Language Models (VLMs), particularly when deployed as Vision-Language-Action (VLA) agents in physical environments. However, existing benchmarks predominan…
RewardFlow: Topology-Aware Reward Propagation on State Graphs for Agentic RL with Large Language Models
Xiao Feng, Bo Han, Zhanke Zhou et al.
cs.AI, cs.CL, cs.LG | Mar 19, 2026
Reinforcement learning (RL) holds significant promise for enhancing the agentic reasoning capabilities of large language models (LLMs) with external environments. However, the inherent sparsity of ter…
Motion-o: Trajectory-Grounded Video Reasoning
Bishoy Galoaa, Shayda Moezzi, Xiangyu Bai et al.
cs.CV, cs.AI | Mar 19, 2026
Recent research has made substantial progress on video reasoning, with many models leveraging spatio-temporal evidence chains to strengthen their inference capabilities. At the same time, a growing se…
Are complicated loss functions necessary for teaching LLMs to reason?
Gabriele Carrino, Andrea Sassella, Nicolo Brunello et al.
cs.LG, cs.AI, cs.CL | Mar 19, 2026
Recent advances in large language models (LLMs) highlight the importance of post-training techniques for improving reasoning and mathematical ability. Group Relative Policy Optimization (GRPO) has sho…
Why Better Cross-Lingual Alignment Fails for Better Cross-Lingual Transfer: Case of Encoders
Yana Veitsman, Yihong Liu, Hinrich Schütze
cs.CL | Mar 19, 2026
Better cross-lingual alignment is often assumed to yield better cross-lingual transfer. However, explicit alignment techniques -- despite increasing embedding similarity -- frequently fail to improve …
Implicit Grading Bias in Large Language Models: How Writing Style Affects Automated Assessment Across Math, Programming, and Essay Tasks
Rudra Jadhav, Janhavi Danve, Sonalika Shaw
cs.CL | Mar 19, 2026
As large language models (LLMs) are increasingly deployed as automated graders in educational settings, concerns about fairness and bias in their evaluations have become critical. This study investiga…