Rigorous Error Certification for Neural PDE Solvers: From Empirical Residuals to Solution Guarantees
Amartya Mukherjee, Maxwell Fitzsimmons, David C. Del Rey Fernández et al.
cs.LG, math.AP, math.FA | Mar 19, 2026
Uncertainty quantification for partial differential equations is traditionally grounded in discretization theory, where solution error is controlled via mesh/grid refinement. Physics-informed neural n…
SOL-ExecBench: Speed-of-Light Benchmarking for Real-World GPU Kernels Against Hardware Limits
Edward Lin, Sahil Modi, Siva Kumar Sastry Hari et al.
cs.LG, cs.AI | Mar 19, 2026
As agentic AI systems become increasingly capable of generating and optimizing GPU kernels, progress is constrained by benchmarks that reward speedup over software baselines rather than proximity to h…
Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation
Zhuolin Yang, Zihan Liu, Yang Chen et al.
cs.CL, cs.AI, cs.LG | Mar 19, 2026
We introduce Nemotron-Cascade 2, an open 30B MoE model with 3B activated parameters that delivers best-in-class reasoning and strong agentic capabilities. Despite its compact size, its mathematical an…
NavTrust: Benchmarking Trustworthiness for Embodied Navigation
Huaide Jiang, Yash Chaudhary, Yuping Wang et al.
cs.RO, cs.AI, cs.CV, cs.LG | Mar 19, 2026
There are two major categories of embodied navigation: Vision-Language Navigation (VLN), where agents navigate by following natural language instructions; and Object-Goal Navigation (OGN), where agent…
DreamPartGen: Semantically Grounded Part-Level 3D Generation via Collaborative Latent Denoising
Tianjiao Yu, Xinzhuo Li, Muntasir Wahed et al.
cs.CV, cs.AI, cs.LG | Mar 19, 2026
Understanding and generating 3D objects as compositions of meaningful parts is fundamental to human perception and reasoning. However, most text-to-3D methods overlook the semantic and functional stru…
How Uncertainty Estimation Scales with Sampling in Reasoning Models
Maksym Del, Markus Kängsepp, Marharyta Domnich et al.
cs.AI, cs.CL, cs.LG | Mar 19, 2026
Uncertainty estimation is critical for deploying reasoning language models, yet remains poorly understood under extended chain-of-thought reasoning. We study parallel sampling as a fully black-box app…
Meanings and Measurements: Multi-Agent Probabilistic Grounding for Vision-Language Navigation
Swagat Padhan, Lakshya Jain, Bhavya Minesh Shah et al.
cs.RO, cs.AI, cs.CL, cs.CV | Mar 19, 2026
Robots collaborating with humans must convert natural language goals into actionable, physically grounded decisions. For example, executing a command such as "go two meters to the right of the fridge"…
SAVeS: Steering Safety Judgments in Vision-Language Models via Semantic Cues
Carlos Hinojosa, Clemens Grange, Bernard Ghanem
cs.CV, cs.AI, cs.CL, cs.LG | Mar 19, 2026
Vision-language models (VLMs) are increasingly deployed in real-world and embodied settings where safety decisions depend on visual context. However, it remains unclear which visual evidence drives th…
Adaptive Regime-Aware Stock Price Prediction Using Autoencoder-Gated Dual Node Transformers with Reinforcement Learning Control
Mohammad Al Ridhawi, Mahtab Haj Ali, Hussein Al Osman
cs.LG, cs.AI, q-fin.ST | Mar 19, 2026
Stock markets exhibit regime-dependent behavior where prediction models optimized for stable conditions often fail during volatile periods. Existing approaches typically treat all market states unifor…
D5P4: Partition Determinantal Point Process for Diversity in Parallel Discrete Diffusion Decoding
Jonathan Lys, Vincent Gripon, Bastien Pasdeloup et al.
cs.AI, cs.LG | Mar 19, 2026
Discrete diffusion models are promising alternatives to autoregressive approaches for text generation, yet their decoding methods remain under-studied. Standard decoding methods for autoregressive mod…
Hypothesis-Conditioned Query Rewriting for Decision-Useful Retrieval
Hangeol Chang, Changsun Lee, Seungjoon Rho et al.
cs.CL, cs.AI, cs.LG | Mar 19, 2026
Retrieval-Augmented Generation (RAG) improves Large Language Models (LLMs) by grounding generation in external, non-parametric knowledge. However, when a task requires choosing among competing options…
Regret Bounds for Competitive Resource Allocation with Endogenous Costs
Rui Chai
cs.AI, cs.DS, cs.GT, cs.LG | Mar 19, 2026
We study online resource allocation among N interacting modules over T rounds. Unlike standard online optimization, costs are endogenous: they depend on the full allocation vector through an interacti…
From Accuracy to Readiness: Metrics and Benchmarks for Human-AI Decision-Making
Min Hun Lee
cs.HC, cs.AI, cs.LG | Mar 19, 2026
Artificial intelligence (AI) systems are deployed as collaborators in human decision-making. Yet, evaluation practices focus primarily on model accuracy rather than whether human-AI teams are prepared…
Foundations of Schrödinger Bridges for Generative Modeling
Sophia Tang
cs.LG, cs.AI | Mar 19, 2026
At the core of modern generative modeling frameworks, including diffusion models, score-based models, and flow matching, is the task of transforming a simple prior distribution into a complex target d…
AgentDS Technical Report: Benchmarking the Future of Human-AI Collaboration in Domain-Specific Data Science
An Luo, Jin Du, Xun Xian et al.
cs.LG, cs.AI, stat.ME | Mar 19, 2026
Data science plays a critical role in transforming complex data into actionable insights across numerous domains. Recent developments in large language models (LLMs) and artificial intelligence (AI) a…
SEM: Sparse Embedding Modulation for Post-Hoc Debiasing of Vision-Language Models
Quentin Guimard, Federico Bartsch, Simone Caldarella et al.
cs.CV, cs.AI, cs.LG | Mar 19, 2026
Models that bridge vision and language, such as CLIP, are key components of multimodal AI, yet their large-scale, uncurated training data introduce severe social and spurious biases. Existing post-hoc…
FedTrident: Resilient Road Condition Classification Against Poisoning Attacks in Federated Learning
Sheng Liu, Panos Papadimitratos
cs.CR, cs.AI, cs.DC, cs.LG | Mar 19, 2026
FL has emerged as a transformative paradigm for ITS, notably camera-based Road Condition Classification (RCC). However, by enabling collaboration, FL-based RCC exposes the system to adversarial partic…
Evaluating Game Difficulty in Tetris Block Puzzle
Chun-Jui Wang, Jian-Ting Guo, Hung Guei et al.
cs.AI, cs.LG | Mar 19, 2026
Tetris Block Puzzle is a single player stochastic puzzle in which a player places blocks on an 8 x 8 grid to complete lines; its popular variants have amassed tens of millions of downloads. Despite th…
Teleological Inference in Structural Causal Models via Intentional Interventions
Dario Compagno, Fabio Massimo Zennaro
cs.AI | Mar 19, 2026
Structural causal models (SCMs) were conceived to formulate and answer causal questions. This paper shows that SCMs can also be used to formulate and answer teleological questions, concerning the inte…
RewardFlow: Topology-Aware Reward Propagation on State Graphs for Agentic RL with Large Language Models
Xiao Feng, Bo Han, Zhanke Zhou et al.
cs.AI, cs.CL, cs.LG | Mar 19, 2026
Reinforcement learning (RL) holds significant promise for enhancing the agentic reasoning capabilities of large language models (LLMs) with external environments. However, the inherent sparsity of ter…
Improving moment tensor solutions under Earth structure uncertainty with simulation-based inference
A. A. Saoulis, T.-S. Pham, A. M. G. Ferreira
physics.geo-ph, cs.AI | Mar 19, 2026
Bayesian inference represents a principled way to incorporate Earth structure uncertainty in full-waveform moment tensor inversions, but traditional approaches generally require significant approximat…
Are complicated loss functions necessary for teaching LLMs to reason?
Gabriele Carrino, Andrea Sassella, Nicolo Brunello et al.
cs.LG, cs.AI, cs.CL | Mar 19, 2026
Recent advances in large language models (LLMs) highlight the importance of post training techniques for improving reasoning and mathematical ability. Group Relative Policy Optimization (GRPO) has sho…
Online Learning and Equilibrium Computation with Ranking Feedback
Mingyang Liu, Yongshan Chen, Zhiyuan Fan et al.
cs.LG, cs.CL, cs.GT | Mar 19, 2026
Online learning in arbitrary, and possibly adversarial, environments has been extensively studied in sequential decision-making, and it is closely connected to equilibrium computation in game theory. …
Optimal Splitting of Language Models from Mixtures to Specialized Domains
Skyler Seto, Pierre Ablin, Anastasiia Filippova et al.
cs.CL, cs.LG | Mar 19, 2026
Language models achieve impressive performance on a variety of knowledge, language, and reasoning tasks due to the scale and diversity of pretraining data available. The standard training recipe is a …
Entropy trajectory shape predicts LLM reasoning reliability: A diagnostic study of uncertainty dynamics in chain-of-thought
Xinghao Zhao
cs.CL, cs.LG | Mar 19, 2026
Chain-of-thought (CoT) reasoning improves LLM accuracy, yet detecting failures cheaply remains elusive. We study whether the shape of uncertainty dynamics across reasoning steps--captured by sampling …
STEP: Scientific Time-Series Encoder Pretraining via Cross-Domain Distillation
Chen Zhang, Liwei Liu, Jun Tao et al.
cs.LG, cs.CL | Mar 19, 2026
Scientific time series are central to scientific AI but are typically sparse, highly heterogeneous, and limited in scale, making unified representation learning particularly challenging. Meanwhile, fo…
CausalRM: Causal-Theoretic Reward Modeling for RLHF from Observational User Feedbacks
Hao Wang, Licheng Pan, Zhichao Chen et al.
cs.LG, cs.AI, cs.CL, stat.ML | Mar 19, 2026
Despite the success of reinforcement learning from human feedback (RLHF) in aligning language models, current reward modeling heavily relies on experimental feedback data collected from human annotato…
SpecForge: A Flexible and Efficient Open-Source Training Framework for Speculative Decoding
Shenggui Li, Chao Wang, Yikai Zhu et al.
cs.LG, cs.AI, cs.CL | Mar 19, 2026
Large language models incur high inference latency due to sequential autoregressive decoding. Speculative decoding alleviates this bottleneck by using a lightweight draft model to propose multiple tok…
Memento-Skills: Let Agents Design Agents
Huichi Zhou, Siyuan Guo, Anjie Liu et al.
cs.AI, cs.CL, cs.LG | Mar 19, 2026
We introduce Memento-Skills, a generalist, continually-learnable LLM agent system that functions as an agent-designing agent: it autonomously constructs, adapts, and improves task-specif…
HISR: Hindsight Information Modulated Segmental Process Rewards For Multi-turn Agentic Reinforcement Learning
Zhicong Lu, Zichuan Lin, Wei Jia et al.
cs.LG, cs.AI, cs.CL | Mar 19, 2026
While large language models excel in diverse domains, their performance on complex long-horizon agentic decision-making tasks remains limited. Most existing methods concentrate on designing effective r…