Most Cited 2025 "language model confidence" Papers
MiniPLM: Knowledge Distillation for Pre-training Language Models
Yuxian Gu, Hao Zhou, Fandong Meng et al.
Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts
Hongcheng Gao, Tianyu Pang, Chao Du et al.
Structured Preconditioners in Adaptive Optimization: A Unified Analysis
Shuo Xie, Tianhao Wang, Sashank J. Reddi et al.
Reasoning of Large Language Models over Knowledge Graphs with Super-Relations
Song Wang, Junhong Lin, Xiaojie Guo et al.
Compression of 3D Gaussian Splatting with Optimized Feature Planes and Standard Video Codecs
Soonbin Lee, Fangwen Shu, Yago Sanchez de la Fuente et al.
Improving Reasoning Performance in Large Language Models via Representation Engineering
Bertram Højer, Oliver Jarvis, Stefan Heinrich
3DRealCar: An In-the-wild RGB-D Car Dataset with 360-degree Views
Xiaobiao Du, Yida Wang, Haiyang Sun et al.
Wasserstein Flow Matching: Generative Modeling Over Families of Distributions
Doron Haviv, Aram-Alexandre Pooladian, Dana Pe'er et al.
Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling
Junha Hyung, Kinam Kim, Susung Hong et al.
ProteinBench: A Holistic Evaluation of Protein Foundation Models
Fei Ye, Zaixiang Zheng, Dongyu Xue et al.
Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving
Xiang Li, Pengfei Li, Yupeng Zheng et al.
Boosting MLLM Reasoning with Text-Debiased Hint-GRPO
Qihan Huang, Weilong Dai, Jinlong Liu et al.
AgentAuditor: Human-level Safety and Security Evaluation for LLM Agents
Hanjun Luo, Shenyu Dai, Chiming Ni et al.
Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency
Jerry Yao-Chieh Hu, Wei-Po Wang, Ammar Gilani et al.
MagicArticulate: Make Your 3D Models Articulation-Ready
Chaoyue Song, Jianfeng Zhang, Xiu Li et al.
DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra
Montgomery Bohde, Mrunali Manjrekar, Runzhong Wang et al.
S^3cMath: Spontaneous Step-Level Self-Correction Makes Large Language Models Better Mathematical Reasoners
Yuchen Yan, Jin Jiang, Yang Liu et al.
TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching
Wenxiang Guo, Yu Zhang, Changhao Pan et al.
Ref-GS: Directional Factorization for 2D Gaussian Splatting
Youjia Zhang, Anpei Chen, Yumin Wan et al.
What Are Step-Level Reward Models Rewarding? Counterintuitive Findings from MCTS-Boosted Mathematical Reasoning
Yiran Ma, Zui Chen, Tianqiao Liu et al.
VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation
Shoubin Yu, Difan Liu, Ziqiao Ma et al.
INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations
Yongming Zhu, Longhao Zhang, Zhengkun Rong et al.
Scalable Influence and Fact Tracing for Large Language Model Pretraining
Tyler Chang, Dheeraj Rajagopal, Tolga Bolukbasi et al.
LangTime: A Language-Guided Unified Model for Time Series Forecasting with Proximal Policy Optimization
Wenzhe Niu, Zongxia Xie, Yanru Sun et al.
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
Jiangyong Huang, Baoxiong Jia, Yan Wang et al.
Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval
Yuanmin Tang, Jue Zhang, Xiaoting Qin et al.
Spiking Transformer with Spatial-Temporal Attention
Donghyun Lee, Yuhang Li, Youngeun Kim et al.
Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding
Weiyu Guo, Ziyang Chen, Shaoguang Wang et al.
GigaHands: A Massive Annotated Dataset of Bimanual Hand Activities
Rao Fu, Dingxi Zhang, Alex Jiang et al.
UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing
Yiheng Li, RuiBing Hou, Hong Chang et al.
Score as Action: Fine Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning
Hanyang Zhao, Haoxian Chen, Ji Zhang et al.
Ada-R1: Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization
Haotian Luo, Haiying He, Yibo Wang et al.
PO3AD: Predicting Point Offsets toward Better 3D Point Cloud Anomaly Detection
Jianan Ye, Weiguang Zhao, Xi Yang et al.
Discretization-invariance? On the Discretization Mismatch Errors in Neural Operators
Wenhan Gao, Ruichen Xu, Yuefan Deng et al.
Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment
Xiaojun Jia, Sensen Gao, Simeng Qin et al.
StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements
Mingkun Lei, Xue Song, Beier Zhu et al.
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Zhaolin Gao, Wenhao Zhan, Jonathan Chang et al.
Training-Free Efficient Video Generation via Dynamic Token Carving
Yuechen Zhang, Jinbo Xing, Bin Xia et al.
VeriThinker: Learning to Verify Makes Reasoning Model Efficient
Zigeng Chen, Xinyin Ma, Gongfan Fang et al.
ActionPiece: Contextually Tokenizing Action Sequences for Generative Recommendation
Yupeng Hou, Jianmo Ni, Zhankui He et al.
MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization
Kangyu Zhu, Peng Xia, Yun Li et al.
PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability
Weijie Zhou, Manli Tao, Chaoyang Zhao et al.
Collab: Controlled Decoding using Mixture of Agents for LLM Alignment
Souradip Chakraborty, Sujay Bhatt, Udari Sehwag et al.
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
Xiaoyuan Liu, Tian Liang, Zhiwei He et al.
EuroBERT: Scaling Multilingual Encoders for European Languages
Nicolas Boizard, Hippolyte Gisserot-Boukhlef, Duarte Miguel Alves et al.
FatesGS: Fast and Accurate Sparse-View Surface Reconstruction Using Gaussian Splatting with Depth-Feature Consistency
Han Huang, Yulun Wu, Chao Deng et al.
Lightning-Fast Image Inversion and Editing for Text-to-Image Diffusion Models
Dvir Samuel, Barak Meiri, Haggai Maron et al.
Boosting Neural Combinatorial Optimization for Large-Scale Vehicle Routing Problems
Fu Luo, Xi Lin, Yaoxin Wu et al.
Better Instruction-Following Through Minimum Bayes Risk
Ian Wu, Patrick Fernandes, Amanda Bertsch et al.
MagicMan: Generative Novel View Synthesis of Humans with 3D-Aware Diffusion and Iterative Refinement
Xu He, Zhiyong Wu, Xiaoyu Li et al.
One-Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models
Viacheslav Surkov, Chris Wendler, Antonio Mari et al.
Distillation of Discrete Diffusion through Dimensional Correlations
Satoshi Hayakawa, Yuhta Takida, Masaaki Imaizumi et al.
HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation
Boyuan Wang, Xiaofeng Wang, Chaojun Ni et al.
Adversarial Reasoning at Jailbreaking Time
Mahdi Sabbaghi, Paul Kassianik, George Pappas et al.
Any6D: Model-free 6D Pose Estimation of Novel Object
Taeyeop Lee, Bowen Wen, Minjun Kang et al.
Air Quality Prediction with Physics-Guided Dual Neural ODEs in Open Systems
Jindong Tian, Yuxuan Liang, Ronghui Xu et al.
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion
Zhenwei Wang, Tengfei Wang, Zexin He et al.
EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing
Haotian Sun, Tao Lei, Bowen Zhang et al.
A Quantum Circuit-Based Compression Perspective for Parameter-Efficient Learning
Chen-Yu Liu, Chao-Han Huck Yang, Hsi-Sheng Goan et al.
LoRA-FAIR: Federated LoRA Fine-Tuning with Aggregation and Initialization Refinement
Jieming Bian, Lei Wang, Letian Zhang et al.
TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing
Stefan Lionar, Jiabin Liang, Gim Hee Lee
AttackBench: Evaluating Gradient-based Attacks for Adversarial Examples
Antonio Emanuele Cinà, Jérôme Rony, Maura Pintor et al.
Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
Soumya Suvra Ghosal, Souradip Chakraborty, Vaibhav Singh et al.
Merging on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging
Anke Tang, Enneng Yang, Li Shen et al.
Cubify Anything: Scaling Indoor 3D Object Detection
Justin Lazarow, David Griffiths, Gefen Kohavi et al.
MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning
Suning Huang, Zheyu Zhang, Tianhai Liang et al.
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees
Zhiyuan Zeng, Yizhong Wang, Hannaneh Hajishirzi et al.
4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion
Chaoyang Wang, Peiye Zhuang, Tuan Duc Ngo et al.
Learnings from Scaling Visual Tokenizers for Reconstruction and Generation
Philippe Hansen-Estruch, David Yan, Ching-Yao Chuang et al.
Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation and Reconstruction
Yuanhao Cai, He Zhang, Kai Zhang et al.
FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution
Junyang Chen, Jinshan Pan, Jiangxin Dong
4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration
Jiahui Zhang, Yurui Chen, Yueming Xu et al.
Scaling Laws for Pre-training Agents and World Models
Tim Pearce, Tabish Rashid, David Bignell et al.
Understanding Long Videos with Multimodal Language Models
Kanchana Ranasinghe, Xiang Li, Kumara Kahatapitiya et al.
TIME-FS: Joint Learning of Tensorial Incomplete Multi-View Unsupervised Feature Selection and Missing-View Imputation
Yanyong Huang, Minghui Lu, Wei Huang et al.
Tracing Representation Progression: Analyzing and Enhancing Layer-Wise Similarity
Jiachen Jiang, Jinxin Zhou, Zhihui Zhu
CognitionCapturer: Decoding Visual Stimuli from Human EEG Signal with Multimodal Information
Kaifan Zhang, Lihuo He, Xin Jiang et al.
Is Artificial Intelligence Generated Image Detection a Solved Problem?
Ziqiang Li, Jiazhen Yan, Ziwen He et al.
Incentivizing Dual Process Thinking for Efficient Large Language Model Reasoning
Xiaoxue Cheng, Junyi Li, Zhenduo Zhang et al.
Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models
Rui Ye, Jingyi Chai, Xiangrui Liu et al.
DarkBench: Benchmarking Dark Patterns in Large Language Models
Esben Kran, Hieu Minh Nguyen, Akash Kundu et al.
RoboScape: Physics-informed Embodied World Model
Yu Shang, Xin Zhang, Yinzhou Tang et al.
GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation
Tao Feng, Yihang Sun, Jiaxuan You
Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory
Nikola Zubic, Federico Soldà, Aurelio Sulser et al.
A Label-free Heterophily-guided Approach for Unsupervised Graph Fraud Detection
Junjun Pan, Yixin Liu, Xin Zheng et al.
Perm: A Parametric Representation for Multi-Style 3D Hair Modeling
Chengan He, Xin Sun, Zhixin Shu et al.
Exploring the Limits of Vision-Language-Action Manipulation in Cross-task Generalization
Jiaming Zhou, Ke Ye, Jiayi Liu et al.
No Preference Left Behind: Group Distributional Preference Optimization
Binwei Yao, Zefan Cai, Yun-Shiuan Chuang et al.
Generalization through variance: how noise shapes inductive biases in diffusion models
John Vastola
Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting
Tong Ye, Yangkai Du, Tengfei Ma et al.
SLMRec: Distilling Large Language Models into Small for Sequential Recommendation
Wujiang Xu, Qitian Wu, Zujie Liang et al.
MMDisCo: Multi-Modal Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation
Akio Hayakawa, Masato Ishii, Takashi Shibuya et al.
Bridging Context Gaps: Leveraging Coreference Resolution for Long Contextual Understanding
Yanming Liu, Xinyue Peng, Jiannan Cao et al.
Video-Bench: Human-Aligned Video Generation Benchmark
Hui Han, Siyuan Li, Jiaqi Chen et al.
Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds
Hao Liang, Zhiquan Luo
KVTuner: Sensitivity-Aware Layer-Wise Mixed-Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
Xing Li, Zeyu Xing, Yiming Li et al.
The Language of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion
Changan Chen, Juze Zhang, Shrinidhi Kowshika Lakshmikanth et al.
Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks
Haijin Zeng, Xiangming Wang, Yongyong Chen et al.
Mitra: Mixed Synthetic Priors for Enhancing Tabular Foundation Models
Xiyuan Zhang, Danielle Maddix Robinson, Junming Yin et al.
Interpreting Object-level Foundation Models via Visual Precision Search
Ruoyu Chen, Siyuan Liang, Jingzhi Li et al.
Adaptive Length Image Tokenization via Recurrent Allocation
Shivam Duggal, Phillip Isola, Antonio Torralba et al.
Aioli: A Unified Optimization Framework for Language Model Data Mixing
Mayee Chen, Michael Hu, Nicholas Lourie et al.
ContextAgent: Context-Aware Proactive LLM Agents with Open-world Sensory Perceptions
Bufang Yang, Lilin Xu, Liekang Zeng et al.
Structure-Adaptive Multi-View Graph Clustering for Remote Sensing Data
Renxiang Guan, Wenxuan Tu, Siwei Wang et al.
Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset
Xiao Wang, Yu Jin, Wentao Wu et al.
MCU: An Evaluation Framework for Open-Ended Game Agents
Xinyue Zheng, Haowei Lin, Kaichen He et al.
Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters
Yuan Wang, Ouxiang Li, Tingting Mu et al.
DINO-Foresight: Looking into the Future with DINO
Efstathios Karypidis, Ioannis Kakogeorgiou, Spyridon Gidaris et al.
Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
Yiran Guo, Lijie Xu, Jie Liu et al.
Navigating Neural Space: Revisiting Concept Activation Vectors to Overcome Directional Divergence
Frederik Pahde, Maximilian Dreyer, Moritz Weckbecker et al.
A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1
Zhaoyi Li, Xiaohan Zhao, Dong-Dong Wu et al.
MSP-MVS: Multi-Granularity Segmentation Prior Guided Multi-View Stereo
Zhenlong Yuan, Cong Liu, Fei Shen et al.
Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families
Felipe Maia Polo, Seamus Somerstep, Leshem Choshen et al.
Test-Time Learning for Large Language Models
Jinwu Hu, Zitian Zhang, Guohao Chen et al.
ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks
Saurabh Jha, Rohan Arora, Yuji Watanabe et al.
IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification
Yuhao Wang, Yongfeng Lv, Pingping Zhang et al.
VLM-R³: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought
Chaoya Jiang, Yongrui Heng, Wei Ye et al.
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
Guangzhi Sun, Yudong Yang, Jimin Zhuang et al.
Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization
Phillip Guo, Aaquib Syed, Abhay Sheshadri et al.
FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning
Hang Guo, Yawei Li, Taolin Zhang et al.
MMGDreamer: Mixed-Modality Graph for Geometry-Controllable 3D Indoor Scene Generation
Zhifei Yang, Keyang Lu, Chao Zhang et al.
Towards Adversarially Robust Dataset Distillation by Curvature Regularization
Eric Xue, Yijiang Li, Haoyang Liu et al.
DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution
Zheng-Peng Duan, Jiawei Zhang, Xin Jin et al.
DataDecide: How to Predict Best Pretraining Data with Small Experiments
Ian Magnusson, Tai Nguyen, Ben Bogin et al.
Whoever Started the Interference Should End It: Guiding Data-Free Model Merging via Task Vectors
Runxi Cheng, Feng Xiong, Yongxian Wei et al.
MMAD: A Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection
Xi Jiang, Jian Li, Hanqiu Deng et al.
ReSim: Reliable World Simulation for Autonomous Driving
Jiazhi Yang, Kashyap Chitta, Shenyuan Gao et al.
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
Yuqing Wang, Zhijie Lin, Yao Teng et al.
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning
Yuhao Zhou, Yiheng Wang, Xuming He et al.
In-Context Editing: Learning Knowledge from Self-Induced Distributions
Siyuan Qi, Bangcheng Yang, Kailin Jiang et al.
Palu: KV-Cache Compression with Low-Rank Projection
Chi-Chih Chang, Wei-Cheng Lin, Chien-Yu Lin et al.
Active Learning for Neural PDE Solvers
Daniel Musekamp, Marimuthu Kalimuthu, David Holzmüller et al.
CLIBD: Bridging Vision and Genomics for Biodiversity Monitoring at Scale
ZeMing Gong, Austin Wang, Xiaoliang Huo et al.
GS-CPR: Efficient Camera Pose Refinement via 3D Gaussian Splatting
Changkun Liu, Shuai Chen, Yash Bhalgat et al.
MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects
Lei Fan, Dongdong Fan, Zhiguang Hu et al.
Patch-wise Structural Loss for Time Series Forecasting
Dilfira Kudrat, Zongxia Xie, Yanru Sun et al.
Iterative Predictor-Critic Code Decoding for Real-World Image Dehazing
Jiayi Fu, Siyu Liu, Zikun Liu et al.
Patient-Level Anatomy Meets Scanning-Level Physics: Personalized Federated Low-Dose CT Denoising Empowered by Large Language Model
Ziyuan Yang, Yingyu Chen, Zhiwen Wang et al.
HRAvatar: High-Quality and Relightable Gaussian Head Avatar
Dongbin Zhang, Yunfei Liu, Lijian Lin et al.
JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration
Yunlong Lin, Zixu Lin, Haoyu Chen et al.
Adam Exploits $\ell_\infty$-geometry of Loss Landscape via Coordinate-wise Adaptivity
Shuo Xie, Mohamad Amin Mohamadi, Zhiyuan Li
OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision
Junjie Wang, Bin Chen, Bin Kang et al.
DeLLMa: Decision Making Under Uncertainty with Large Language Models
Ollie Liu, Deqing Fu, Dani Yogatama et al.
The Scene Language: Representing Scenes with Programs, Words, and Embeddings
Yunzhi Zhang, Zizhang Li, Matt Zhou et al.
A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules
Kairong Luo, Haodong Wen, Shengding Hu et al.
Discrete to Continuous: Generating Smooth Transition Poses from Sign Language Observations
Shengeng Tang, Jiayi He, Lechao Cheng et al.
A Probabilistic Perspective on Unlearning and Alignment for Large Language Models
Yan Scholten, Stephan Günnemann, Leo Schwinn
Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters
Zhiyang Guo, Jinxu Xiang, Kai Ma et al.
G-VEval: A Versatile Metric for Evaluating Image and Video Captions Using GPT-4o
Tony Cheng Tong, Sirui He, Zhiwen Shao et al.
GraphMaster: Automated Graph Synthesis via LLM Agents in Data-Limited Environments
Enjun Du, Xunkai Li, Tian Jin et al.
BEV-TSR: Text-Scene Retrieval in BEV Space for Autonomous Driving
Tao Tang, Dafeng Wei, Zhengyu Jia et al.
The VLLM Safety Paradox: Dual Ease in Jailbreak Attack and Defense
Yangyang Guo, Fangkai Jiao, Liqiang Nie et al.
Learning Clustering-based Prototypes for Compositional Zero-Shot Learning
Hongyu Qu, Jianan Wei, Xiangbo Shu et al.
Emergence and scaling laws in SGD learning of shallow neural networks
Yunwei Ren, Eshaan Nichani, Denny Wu et al.
Block-Attention for Efficient Prefilling
Dongyang Ma, Yan Wang, Tian Lan
Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs
Lei Zhang, Yunshui Li, Jiaming Li et al.
Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset
Yiqun Mei, Mingming He, Li Ma et al.
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
Kaiyue Wen, Huaqing Zhang, Hongzhou Lin et al.
Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
Vaishnavh Nagarajan, Chen Wu, Charles Ding et al.
Spiking Vision Transformer with Saccadic Attention
Shuai Wang, Malu Zhang, Dehao Zhang et al.
Force Prompting: Video Generation Models Can Learn And Generalize Physics-based Control Signals
Nate Gillman, Charles Herrmann, Michael Freeman et al.
MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning
Yaming Yang, Dilxat Muhtar, Yelong Shen et al.
Quamba: A Post-Training Quantization Recipe for Selective State Space Models
Hung-Yueh Chiang, Chi-Chih Chang, Natalia Frumkin et al.
GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation
Hongyin Zhang, Pengxiang Ding, Shangke Lyu et al.
Magic Insert: Style-Aware Drag-and-Drop
Nataniel Ruiz, Yuanzhen Li, Neal Wadhwa et al.
Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures
Junxuan Wang, Xuyang Ge, Wentao Shu et al.
WyckoffDiff -- A Generative Diffusion Model for Crystal Symmetry
Filip Ekström Kelvinius, Oskar Andersson, Abhijith Parackal et al.
Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera
Yuliang Guo, Sparsh Garg, S. Mahdi H. Miangoleh et al.
Learning Graph Quantized Tokenizers
Limei Wang, Kaveh Hassani, Si Zhang et al.
Reasoning Limitations of Multimodal Large Language Models. A case study of Bongard Problems
Mikołaj Małkiński, Szymon Pawlonka, Jacek Mańdziuk
Physics-Constrained Flow Matching: Sampling Generative Models with Hard Constraints
Utkarsh Utkarsh, Pengfei Cai, Alan Edelman et al.
AllTracker: Efficient Dense Point Tracking at High Resolution
Adam Harley, Yang You, Yang Zheng et al.
Controllable Context Sensitivity and the Knob Behind It
Julian Minder, Kevin Du, Niklas Stoehr et al.
Learning Efficient Positional Encodings with Graph Neural Networks
Charilaos Kanatsoulis, Evelyn Choi, Stefanie Jegelka et al.
Training Neural Networks as Recognizers of Formal Languages
Alexandra Butoi, Ghazal Khalighinejad, Anej Svete et al.
Synthetic Face Datasets Generation via Latent Space Exploration from Brownian Identity Diffusion
David Geissbühler, Hatef Otroshi Shahreza, Sébastien Marcel
Power Lines: Scaling laws for weight decay and batch size in LLM pre-training
Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.
TCPFormer: Learning Temporal Correlation with Implicit Pose Proxy for 3D Human Pose Estimation
Jiajie Liu, Mengyuan Liu, Hong Liu et al.
Swift4D: Adaptive divide-and-conquer Gaussian Splatting for compact and efficient reconstruction of dynamic scene
Jiahao Wu, Rui Peng, Zhiyan Wang et al.
Can LVLMs Obtain a Driver’s License? A Benchmark Towards Reliable AGI for Autonomous Driving
Yuhang Lu, Yichen Yao, Jiadong Tu et al.
Falcon: Faster and Parallel Inference of Large Language Models Through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree
Xiangxiang Gao, Weisheng Xie, Yiwei Xiang et al.
Locality-aware Gaussian Compression for Fast and High-quality Rendering
Seungjoo Shin, Jaesik Park, Sunghyun Cho
xLSTM-Mixer: Multivariate Time Series Forecasting by Mixing via Scalar Memories
Maurice Kraus, Felix Divo, Devendra Singh Dhami et al.
Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark
Yili Wang, Yixin Liu, Xu Shen et al.
FD2-Net: Frequency-Driven Feature Decomposition Network for Infrared-Visible Object Detection
Ke Li, Di Wang, Zhangyuan Hu et al.
DexVLG: Dexterous Vision-Language-Grasp Model at Scale
Jiawei He, Danshi Li, Xinqiang Yu et al.
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
Haifeng Huang, Xinyi Chen, Yilun Chen et al.
Video Diffusion Models Are Strong Video Inpainter
Minhyeok Lee, Suhwan Cho, Chajin Shin et al.
FunBO: Discovering Acquisition Functions for Bayesian Optimization with FunSearch
Virginia Aglietti, Ira Ktena, Jessica Schrouff et al.
Efficient Learning with Sine-Activated Low-Rank Matrices
Yiping Ji, Hemanth Saratchandran, Cameron Gordon et al.
UniDet3D: Multi-dataset Indoor 3D Object Detection
Maksim Kolodiazhnyi, Anna Vorontsova, Matvey Skripkin et al.
Boosting Generative Image Modeling via Joint Image-Feature Synthesis
Theodoros Kouzelis, Efstathios Karypidis, Ioannis Kakogeorgiou et al.
Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention
Weitai Kang, Mengxue Qu, Jyoti Kini et al.
Text2midi: Generating Symbolic Music from Captions
Keshav Bhandari, Abhinaba Roy, Kyra Wang et al.
Multi-Turn Code Generation Through Single-Step Rewards
Arnav Kumar Jain, Gonzalo Gonzalez-Pumariega, Wayne Chen et al.
Learning 3D Persistent Embodied World Models
Siyuan Zhou, Yilun Du, Yuncong Yang et al.
Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training
Brian Bartoldson, Siddarth Venkatraman, James Diffenderfer et al.
Simulating Human-like Daily Activities with Desire-driven Autonomy
Yiding Wang, Yuxuan Chen, Fangwei Zhong et al.
EpiCoder: Encompassing Diversity and Complexity in Code Generation
Yaoxiang Wang, Haoling Li, Xin Zhang et al.
Prompting Fairness: Integrating Causality to Debias Large Language Models
Jingling Li, Zeyu Tang, Xiaoyu Liu et al.