
Efficient Inference

Quantization, pruning, speculative decoding, KV cache, and fast LLM serving.

30 papers in the last 30 days
NavTrust: Benchmarking Trustworthiness for Embodied Navigation

Huaide Jiang, Yash Chaudhary, Yuping Wang et al.

cs.RO · cs.AI · cs.CV · cs.LG · Mar 19, 2026

There are two major categories of embodied navigation: Vision-Language Navigation (VLN), where agents navigate by following natural language instructions; and Object-Goal Navigation (OGN), where agent…

How Uncertainty Estimation Scales with Sampling in Reasoning Models

Maksym Del, Markus Kängsepp, Marharyta Domnich et al.

cs.AI · cs.CL · cs.LG · Mar 19, 2026

Uncertainty estimation is critical for deploying reasoning language models, yet remains poorly understood under extended chain-of-thought reasoning. We study parallel sampling as a fully black-box app…

Hypothesis-Conditioned Query Rewriting for Decision-Useful Retrieval

Hangeol Chang, Changsun Lee, Seungjoon Rho et al.

cs.CL · cs.AI · cs.LG · Mar 19, 2026

Retrieval-Augmented Generation (RAG) improves Large Language Models (LLMs) by grounding generation in external, non-parametric knowledge. However, when a task requires choosing among competing options…

Foundations of Schrödinger Bridges for Generative Modeling

Sophia Tang

cs.LG · cs.AI · Mar 19, 2026

At the core of modern generative modeling frameworks, including diffusion models, score-based models, and flow matching, is the task of transforming a simple prior distribution into a complex target d…

Evaluating Game Difficulty in Tetris Block Puzzle

Chun-Jui Wang, Jian-Ting Guo, Hung Guei et al.

cs.AI · cs.LG · Mar 19, 2026

Tetris Block Puzzle is a single-player stochastic puzzle in which a player places blocks on an 8 x 8 grid to complete lines; its popular variants have amassed tens of millions of downloads. Despite th…

Are complicated loss functions necessary for teaching LLMs to reason?

Gabriele Carrino, Andrea Sassella, Nicolo Brunello et al.

cs.LG · cs.AI · cs.CL · Mar 19, 2026

Recent advances in large language models (LLMs) highlight the importance of post-training techniques for improving reasoning and mathematical ability. Group Relative Policy Optimization (GRPO) has sho…

Online Learning and Equilibrium Computation with Ranking Feedback

Mingyang Liu, Yongshan Chen, Zhiyuan Fan et al.

cs.LG · cs.CL · cs.GT · Mar 19, 2026

Online learning in arbitrary, and possibly adversarial, environments has been extensively studied in sequential decision-making, and it is closely connected to equilibrium computation in game theory.

Memento-Skills: Let Agents Design Agents

Huichi Zhou, Siyuan Guo, Anjie Liu et al.

cs.AI · cs.CL · cs.LG · Mar 19, 2026

We introduce Memento-Skills, a generalist, continually-learnable LLM agent system that functions as an agent-designing agent: it autonomously constructs, adapts, and improves task-specif…
