Most Cited 2025 "large reasoning models" Papers
22,274 papers found
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale
Joya Chen, Yiqi Lin, Ziyun Zeng et al.
Simplification Is All You Need against Out-of-Distribution Overconfidence
Keke Tang, Chao Hou, Weilong Peng et al.
Token Cropr: Faster ViTs for Quite a Few Tasks
Benjamin Bergner, Christoph Lippert, Aravindh Mahendran
SafeVid: Toward Safety Aligned Video Large Multimodal Models
Yixu Wang, Jiaxin Song, Yifeng Gao et al.
KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation
Jiajun Shi, Jian Yang, Jiaheng Liu et al.
CADRef: Robust Out-of-Distribution Detection via Class-Aware Decoupled Relative Feature Leveraging
Zhiwei Ling, Yachen Chang, Hailiang Zhao et al.
GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning
Kelin Yu, Sheng Zhang, Harshit Soora et al.
JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers
Kwon Byung-Ki, Qi Dai, Lee Hyoseok et al.
GauCho: Gaussian Distributions with Cholesky Decomposition for Oriented Object Detection
Jeffri Erwin Murrugarra Llerena, José Henrique Marques, Claudio Jung
DaCapo: Score Distillation as Stacked Bridge for Fast and High-quality 3D Editing
Yufei Huang, Bangyan Liao, Yuqi Hu et al.
On the Out-Of-Distribution Generalization of Large Multimodal Models
Xingxuan Zhang, Jiansheng Li, Wenjing Chu et al.
Neural Shell Texture Splatting: More Details and Fewer Primitives
Xin Zhang, Anpei Chen, Jincheng Xiong et al.
ViewSRD: 3D Visual Grounding via Structured Multi-View Decomposition
Ronggang Huang, Haoxin Yang, Yan Cai et al.
Spectral Convolutional Conditional Neural Process
Peiman Mohseni, Nick Duffield
Fact-R1: Towards Explainable Video Misinformation Detection with Deep Reasoning
Fanrui Zhang, Dian Li, Qiang Zhang et al.
MARS: A Malignity-Aware Backdoor Defense in Federated Learning
Wei Wan, Ning Yuxuan, Zhicong Huang et al.
SPRINT: Enabling Interleaved Planning and Parallelized Execution in Reasoning Models
Emil Biju, Shayan Talaei, Zhemin Huang et al.
Permissioned LLMs: Enforcing Access Control in Large Language Models
Bargav Jayaraman, Virendra Marathe, Hamid Mozaffari et al.
TrajAgent: An LLM-Agent Framework for Trajectory Modeling via Large-and-Small Model Collaboration
Yuwei Du, Jie Feng, Jie Zhao et al.
Advanced Sign Language Video Generation with Compressed and Quantized Multi-Condition Tokenization
Cong Wang, Zexuan Deng, Zhiwei Jiang et al.
DRoP: Distributionally Robust Data Pruning
Artem Vysogorets, Kartik Ahuja, Julia Kempe
LARGO: Latent Adversarial Reflection through Gradient Optimization for Jailbreaking LLMs
Ran Li, Hao Wang, Chengzhi Mao
CASAGPT: Cuboid Arrangement and Scene Assembly for Interior Design
Weitao Feng, Hang Zhou, Jing Liao et al.
AttentionPredictor: Temporal Patterns Matter for KV Cache Compression
Qingyue Yang, Jie Wang, Xing Li et al.
ProtInvTree: Deliberate Protein Inverse Folding with Reward-guided Tree Search
Mengdi Liu, Xiaoxue Cheng, Zhangyang Gao et al.
MaintainCoder: Maintainable Code Generation Under Dynamic Requirements
Zhengren Wang, Rui Ling, Chufan Wang et al.
Flexible Language Modeling in Continuous Space with Transformer-based Autoregressive Flows
Ruixiang Zhang, Shuangfei Zhai, Jiatao Gu et al.
Channel Consistency Prior and Self-Reconstruction Strategy Based Unsupervised Image Deraining
Guanglu Dong, Tianheng Zheng, Yuanzhouhan Cao et al.
End-to-End Vision Tokenizer Tuning
Wenxuan Wang, Fan Zhang, Yufeng Cui et al.
ReRAW: RGB-to-RAW Image Reconstruction via Stratified Sampling for Efficient Object Detection on the Edge
Radu Berdan, Beril Besbinar, Christoph Reinders et al.
GIVEPose: Gradual Intra-class Variation Elimination for RGB-based Category-Level Object Pose Estimation
Ziqin Huang, Gu Wang, Chenyangguang Zhang et al.
Towards Effective Federated Graph Foundation Model via Mitigating Knowledge Entanglement
Yinlin Zhu, Xunkai Li, Jishuo Jia et al.
NSD-Imagery: A Benchmark Dataset for Extending fMRI Vision Decoding Methods to Mental Imagery
Reese Kneeland, Paul Scotti, Ghislain St-Yves et al.
LayerD: Decomposing Raster Graphic Designs into Layers
Tomoyuki Suzuki, Kang-Jun Liu, Naoto Inoue et al.
Multiscale guidance of protein structure prediction with heterogeneous cryo-EM data
Rishwanth Raghu, Axel Levy, Gordon Wetzstein et al.
Statistical inference for Linear Stochastic Approximation with Markovian Noise
Sergey Samsonov, Marina Sheshukova, Eric Moulines et al.
Diffusion Model is Effectively Its Own Teacher
Xinyin Ma, Runpeng Yu, Songhua Liu et al.
Certifying Language Model Robustness with Fuzzed Randomized Smoothing: An Efficient Defense Against Backdoor Attacks
Bowei He, Lihao Yin, Huiling Zhen et al.
Knowledge Distillation with Multi-granularity Mixture of Priors for Image Super-Resolution
Simiao Li, Yun Zhang, Wei Li et al.
Effortless, Simulation-Efficient Bayesian Inference using Tabular Foundation Models
Julius Vetter, Manuel Gloeckler, Daniel Gedon et al.
BetaConform: Efficient MAP Estimation of LLM Ensemble Judgment Performance with Prior Transfer
Huaizhi Qu, Inyoung Choi, Zhen Tan et al.
Boosting Multimodal Learning via Disentangled Gradient Learning
Shicai Wei, Chunbo Luo, Yang Luo
LightSwitch: Multi-view Relighting with Material-guided Diffusion
Yehonathan Litman, Fernando De la Torre, Shubham Tulsiani
3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection
Yung-Hsu Yang, Luigi Piccinelli, Mattia Segu et al.
PRM: Photometric Stereo based Large Reconstruction Model
Wenhang Ge, Jiantao Lin, Guibao Shen et al.
Gene Regulatory Network Inference in the Presence of Selection Bias and Latent Confounders
Gongxu Luo, Haoyue Dai, Longkang Li et al.
GRAVER: Generative Graph Vocabularies for Robust Graph Foundation Models Fine-tuning
Haonan Yuan, Qingyun Sun, Junhua Shi et al.
DSV-LFS: Unifying LLM-Driven Semantic Cues with Visual Features for Robust Few-Shot Segmentation
Amin Karimi, Charalambos Poullis
Characterizing the Expressivity of Fixed-Precision Transformer Language Models
Jiaoda Li, Ryan Cotterell
ASHiTA: Automatic Scene-grounded HIerarchical Task Analysis
Yun Chang, Leonor Fermoselle, Duy Ta et al.
Convergence and Implicit Bias of Gradient Descent on Continual Linear Classification
Hyunji Jung, Hanseul Cho, Chulhee Yun
HORT: Monocular Hand-held Objects Reconstruction with Transformers
Zerui Chen, Rolandos Alexandros Potamias, Shizhe Chen et al.
AltLoRA: Towards Better Gradient Approximation in Low-Rank Adaptation with Alternating Projections
Xin Yu, Yujia Wang, Jinghui Chen et al.
Learning to Discover Regulatory Elements for Gene Expression Prediction
Xingyu Su, Haiyang Yu, Degui Zhi et al.
What You Have is What You Track: Adaptive and Robust Multimodal Tracking
Yuedong Tan, Jiawei Shao, Eduard Zamfir et al.
Conditional Panoramic Image Generation via Masked Autoregressive Modeling
Chaoyang Wang, Xiangtai Li, Lu Qi et al.
PlayerOne: Egocentric World Simulator
Yuanpeng Tu, Hao Luo, Xi Chen et al.
Praxis-VLM: Vision-Grounded Decision Making via Text-Driven Reinforcement Learning
Zhe Hu, Jing Li, Zhongzhu Pu et al.
Unbiased Video Scene Graph Generation via Visual and Semantic Dual Debiasing
Yanjun Li, Zhaoyang Li, Honghui Chen et al.
Doppelgangers++: Improved Visual Disambiguation with Geometric 3D Features
Yuanbo Xiangli, Ruojin Cai, Hanyu Chen et al.
SViM3D: Stable Video Material Diffusion for Single Image 3D Generation
Andreas Engelhardt, Mark Boss, Vikram Voleti et al.
Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation
Xiaoying Xing, Avinab Saha, Junfeng He et al.
One-Step Diffusion-Based Image Compression with Semantic Distillation
Naifu Xue, Zhaoyang Jia, Jiahao Li et al.
EgoThinker: Unveiling Egocentric Reasoning with Spatio-Temporal CoT
Baoqi Pei, Yifei Huang, Jilan Xu et al.
Training-Free Dataset Pruning for Instance Segmentation
Yalun Dai, Lingao Xiao, Ivor Tsang et al.
Knowledge-based Visual Question Answer with Multimodal Processing, Retrieval and Filtering
Yuyang Hong, Jiaqi Gu, Yang Qi et al.
Sharp Matrix Empirical Bernstein Inequalities
Hongjian Wang, Aaditya Ramdas
A Closer Look at Model Collapse: From a Generalization-to-Memorization Perspective
Lianghe Shi, Meng Wu, Huijie Zhang et al.
FineMotion: A Dataset and Benchmark with both Spatial and Temporal Annotation for Fine-grained Motion Generation and Editing
Bizhu Wu, Jinheng Xie, Meidan Ding et al.
Embracing Contradiction: Theoretical Inconsistency Will Not Impede the Road of Building Responsible AI Systems
Gordon Dai, Yunze Xiao
Contractive Dynamical Imitation Policies for Efficient Out-of-Sample Recovery
Amin Soleimani Abyaneh, Mahrokh Boroujeni, Hsiu-Chin Lin et al.
VOVTrack: Exploring the Potentiality in Raw Videos for Open-Vocabulary Multi-Object Tracking
Zekun Qian, Ruize Han, Junhui Hou et al.
Test-time Adaptation for Foundation Medical Segmentation Model Without Parametric Updates
Kecheng Chen, Xinyu Luo, Tiexin Qin et al.
Self-supervised Learning of Hybrid Part-aware 3D Representations of 2D Gaussians and Superquadrics
Zhirui Gao, Renjiao Yi, Yuhang Huang et al.
Nearly Zero-Cost Protection Against Mimicry by Personalized Diffusion Models
Namhyuk Ahn, KiYoon Yoo, Wonhyuk Ahn et al.
Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment
Bryan Sangwoo Kim, Jeongsol Kim, Jong Chul Ye
SceneMI: Motion In-betweening for Modeling Human-Scene Interaction
Inwoo Hwang, Bing Zhou, Young Min Kim et al.
LISAt: Language-Instructed Segmentation Assistant for Satellite Imagery
Jerome Quenum, Wen-Han Hsieh, Tsung-Han (Patrick) Wu et al.
On the Emergence of Linear Analogies in Word Embeddings
Daniel Korchinski, Dhruva Karkada, Yasaman Bahri et al.
SCAN: Self-Denoising Monte Carlo Annotation for Robust Process Reward Learning
Yuyang Ding, Xinyu Shi, Juntao Li et al.
Precise Event Spotting in Sports Videos: Solving Long-Range Dependency and Class Imbalance
Sanchayan Santra, Vishal Chudasama, Pankaj Wasnik et al.
Protecting Your Video Content: Disrupting Automated Video-based LLM Annotations
Haitong Liu, Kuofeng Gao, Yang Bai et al.
Inference-Scale Complexity in ANN-SNN Conversion for High-Performance and Low-Power Applications
Tong Bu, Maohua Li, Zhaofei Yu
Balanced Image Stylization with Style Matching Score
Yuxin Jiang, Liming Jiang, Shuai Yang et al.
Measure gradients, not activations! Enhancing neuronal activity in deep reinforcement learning
Jiashun Liu, Zihao Wu, Johan Obando Ceron et al.
Steady Progress Beats Stagnation: Mutual Aid of Foundation and Conventional Models in Mixed Domain Semi-Supervised Medical Image Segmentation
Qinghe Ma, Jian Zhang, Zekun Li et al.
ATP: Adaptive Threshold Pruning for Efficient Data Encoding in Quantum Neural Networks
Mohamed Afane, Gabrielle Ebbrecht, Ying Wang et al.
GENIUS: A Generative Framework for Universal Multimodal Search
Sungyeon Kim, Xinliang Zhu, Xiaofan Lin et al.
PersonaBooth: Personalized Text-to-Motion Generation
Boeun Kim, Hea In Jeong, JungHoon Sung et al.
dFLMoE: Decentralized Federated Learning via Mixture of Experts for Medical Data Analysis
Luyuan Xie, Tianyu Luan, Wenyuan Cai et al.
Incentivizing LLMs to Self-Verify Their Answers
Fuxiang Zhang, Jiacheng Xu, Chaojie Wang et al.
Towards All-in-One Medical Image Re-Identification
Yuan Tian, Kaiyuan Ji, Rongzhao Zhang et al.
Continuous Diffusion Model for Language Modeling
Jaehyeong Jo, Sung Ju Hwang
Vision Function Layer in Multimodal LLMs
Cheng Shi, Yizhou Yu, Sibei Yang
Stop the Nonconsensual Use of Nude Images in Research
Princessa Cintaqia, Arshia Arya, Elissa Redmiles et al.
TimeFormer: Capturing Temporal Relationships of Deformable 3D Gaussians for Robust Reconstruction
Dadong Jiang, Zhi Hou, Zhihui Ke et al.
MALinZero: Efficient Low-Dimensional Search for Mastering Complex Multi-Agent Planning
Sizhe Tang, Jiayu Chen, Tian Lan
HumanOLAT: A Large-Scale Dataset for Full-Body Human Relighting and Novel-View Synthesis
Timo Teufel, Xilong Zhou, Umar Iqbal et al.
Efficient Pre-Training of LLMs via Topology-Aware Communication Alignment on More Than 9600 GPUs
Guoliang He, Youhe Jiang, Wencong Xiao et al.
DeClotH: Decomposable 3D Cloth and Human Body Reconstruction from a Single Image
Hyeongjin Nam, Donghwan Kim, Jeongtaek Oh et al.
Proxy Target: Bridging the Gap Between Discrete Spiking Neural Networks and Continuous Control
Zijie Xu, Tong Bu, Zecheng Hao et al.
CoDA: Coordinated Diffusion Noise Optimization for Whole-Body Manipulation of Articulated Objects
Huaijin Pi, Zhi Cen, Zhiyang Dou et al.
Better Estimation of the Kullback-Leibler Divergence Between Language Models
Afra Amini, Tim Vieira, Ryan Cotterell
Codifying Character Logic in Role-Playing
Letian Peng, Jingbo Shang
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
Rui Zhao, Weijia Mao, Mike Zheng Shou
Articulate3D: Holistic Understanding of 3D Scenes as Universal Scene Description
Anna-Maria Halacheva, Yang Miao, Jan-Nico Zaech et al.
BridgeDepth: Bridging Monocular and Stereo Reasoning with Latent Alignment
Tongfan Guan, Jiaxin Guo, Chen Wang et al.
Multi-Label Prototype Visual Spatial Search for Weakly Supervised Semantic Segmentation
Songsong Duan, Xi Yang, Nannan Wang
Balanced Token Pruning: Accelerating Vision Language Models Beyond Local Optimization
Kaiyuan Li, Xiaoyue Chen, Chen Gao et al.
USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting
Kang Chen, Jiyuan Zhang, Zecheng Hao et al.
Auto-Vocabulary Semantic Segmentation
Osman Ülger, Maksymilian Kulicki, Yuki Asano et al.
GECKO: Gigapixel Vision-Concept Contrastive Pretraining in Histopathology
Saarthak Kapse, Pushpak Pati, Srikar Yellapragada et al.
ResGS: Residual Densification of 3D Gaussian for Efficient Detail Recovery
Yanzhe Lyu, Kai Cheng, Kang Xin et al.
Lost in Transmission: When and Why LLMs Fail to Reason Globally
Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan et al.
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning
Xin Wen, Bingchen Zhao, Yilun Chen et al.
VoteFlow: Enforcing Local Rigidity in Self-Supervised Scene Flow
Yancong Lin, Shiming Wang, Liangliang Nan et al.
Robust Simulation-Based Inference under Missing Data via Neural Processes
Yogesh Verma, Ayush Bharti, Vikas Garg
Sampling from multi-modal distributions with polynomial query complexity in fixed dimension via reverse diffusion
Adrien Vacher, Omar Chehab, Anna Korba
Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene
Shengqiong Wu, Hao Fei, Jingkang Yang et al.
Understanding protein function with a multimodal retrieval-augmented foundation model
Timothy Truong Jr, Tristan Bepler
GAP: Gaussianize Any Point Clouds with Text Guidance
Weiqi Zhang, Junsheng Zhou, Haotian Geng et al.
Pursuing Better Decision Boundaries for Long-Tailed Object Detection via Category Information Amount
Yanbiao Ma, Wei Dai, Jiayi Chen
VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges
Yuxuan Wang, Yiqi Song, Cihang Xie et al.
EquiTabPFN: A Target-Permutation Equivariant Prior Fitted Network
Michael Arbel, David Salinas, Frank Hutter
Brain-Like Processing Pathways Form in Models With Heterogeneous Experts
Jack Cook, Danyal Akarca, Rui Costa et al.
PolarFree: Polarization-based Reflection-Free Imaging
Mingde Yao, Menglu Wang, King Man Tam et al.
Can Video LLMs Refuse to Answer? Alignment for Answerability in Video Large Language Models
Eunseop Yoon, Hee Suk Yoon, Mark Hasegawa-Johnson et al.
A Simple Linear Patch Revives Layer-Pruned Large Language Models
Xinrui Chen, Haoli Bai, Tao Yuan et al.
Is Limited Participant Diversity Impeding EEG-based Machine Learning?
Philipp Bomatter, Henry Gouk
LUNA: Efficient and Topology-Agnostic Foundation Model for EEG Signal Analysis
Berkay Döner, Thorir Mar Ingolfsson, Luca Benini et al.
Algorithms and SQ Lower Bounds for Robustly Learning Real-valued Multi-Index Models
Ilias Diakonikolas, Giannis Iakovidis, Daniel Kane et al.
DREAM: Drafting with Refined Target Features and Entropy-Adaptive Cross-Attention Fusion for Multimodal Speculative Decoding
Yunhai Hu, Tianhua Xia, Zining Liu et al.
Know What You Don't Know: Uncertainty Calibration of Process Reward Models
Young-Jin Park, Kristjan Greenewald, Kaveh Alimohammadi et al.
BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation
Ruotong Wang, Mingli Zhu, Jiarong Ou et al.
GenDataAgent: On-the-fly Dataset Augmentation with Synthetic Data
Zhiteng Li, Lele Chen, Jerone Andrews et al.
Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attack on Breast Ultrasound Images
Yasamin Medghalchi, Moein Heidari, Clayton Allard et al.
Introducing FOReCAst: The Future Outcome Reasoning and Confidence Assessment Benchmark
Zhangdie Yuan, Zifeng Ding, Andreas Vlachos
Riemannian Flow Matching for Brain Connectivity Matrices via Pullback Geometry
Antoine Collas, Ce Ju, Nicolas Salvy et al.
WISE: A Framework for Gigapixel Whole-Slide-Image Lossless Compression
Yu Mao, Jun Wang, Nan Guan et al.
VerbDiff: Text-Only Diffusion Models with Enhanced Interaction Awareness
SeungJu Cha, Kwanyoung Lee, Ye-Chan Kim et al.
SAD Neural Networks: Divergent Gradient Flows and Asymptotic Optimality via o-minimal Structures
Julian Kranz, Davide Gallon, Steffen Dereich et al.
Beyond Value Functions: Single-Loop Bilevel Optimization under Flatness Conditions
Liuyuan Jiang, Quan Xiao, Lisha Chen et al.
MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction
Cheng Tan, Zhenxiao Cao, Zhangyang Gao et al.
iManip: Skill-Incremental Learning for Robotic Manipulation
Zexin Zheng, Jia-Feng Cai, Xiao-Ming Wu et al.
Generative Zoo
Tomasz Niewiadomski, Anastasios Yiannakidis, Hanz Cuevas Velasquez et al.
Towards Robust Parameter-Efficient Fine-Tuning for Federated Learning
Xiuwen Fang, Mang Ye
H-MoRe: Learning Human-centric Motion Representation for Action Analysis
Zhanbo Huang, Xiaoming Liu, Yu Kong
Self-supervised contrastive learning performs non-linear system identification
Rodrigo Gonzalez Laiz, Tobias Schmidt, Steffen Schneider
KVCOMM: Online Cross-context KV-cache Communication for Efficient LLM-based Multi-agent Systems
Hancheng Ye, Zhengqi Gao, Mingyuan Ma et al.
Position: Towards Bidirectional Human-AI Alignment
Hua Shen, Tiffany Knearem, Reshmi Ghosh et al.
BIGS: Bimanual Category-agnostic Interaction Reconstruction from Monocular Videos via 3D Gaussian Splatting
Jeongwan On, Kyeonghwan Gwak, Gunyoung Kang et al.
MultiMorph: On-demand Atlas Construction
Mazdak Abulnaga, Andrew Hoopes, Neel Dey et al.
Symmetry Strikes Back: From Single-Image Symmetry Detection to 3D Generation
Xiang Li, Zixuan Huang, Anh Thai et al.
Learning Neural Exposure Fields for View Synthesis
Michael Niemeyer, Fabian Manhardt, Marie-Julie Rakotosaona et al.
Reasoning to Attend: Try to Understand How <SEG> Token Works
Rui Qian, Xin Yin, Dejing Dou
Universal Sequence Preconditioning
Annie Marsden, Elad Hazan
End-to-End Implicit Neural Representations for Classification
Alexander Gielisse, Jan van Gemert
3D-RAD: A Comprehensive 3D Radiology Med-VQA Dataset with Multi-Temporal Analysis and Diverse Diagnostic Tasks
Xiaotang Gai, Jiaxiang Liu, Yichen Li et al.
Dual-view X-ray Detection: Can AI Detect Prohibited Items from Dual-view X-ray Images like Humans?
Renshuai Tao, Haoyu Wang, Yuzhe Guo et al.
Safe and Stable Control via Lyapunov-Guided Diffusion Models
Xiaoyuan Cheng, Xiaohang Tang, Yiming Yang
Understanding Multi-Task Activities from Single-Task Videos
Yuhan Shen, Ehsan Elhamifar
InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts
Weipeng Zhong, Peizhou Cao, Yichen Jin et al.
Better Training Data Attribution via Better Inverse Hessian-Vector Products
Andrew Wang, Elisa Nguyen, Runshi Yang et al.
Improving Time Series Forecasting via Instance-aware Post-hoc Revision
Zhiding Liu, Mingyue Cheng, Guanhao Zhao et al.
Secret Lies in Color: Enhancing AI-Generated Images Detection with Color Distribution Analysis
Zexi Jia, Chuanwei Huang, Yeshuang Zhu et al.
SURGEON: Memory-Adaptive Fully Test-Time Adaptation via Dynamic Activation Sparsity
Ke Ma, Jiaqi Tang, Bin Guo et al.
CustomKD: Customizing Large Vision Foundation for Edge Model Improvement via Knowledge Distillation
Jungsoo Lee, Debasmit Das, Munawar Hayat et al.
A Large-Scale 3D Face Mesh Video Dataset via Neural Re-parameterized Optimization
Kim Youwang, Lee Hyun, Kim Sung-Bin et al.
SRA-CL: Semantic Retrieval Augmented Contrastive Learning for Sequential Recommendation
Ziqiang Cui, Yunpeng Weng, Xing Tang et al.
JAMUN: Bridging Smoothed Molecular Dynamics and Score-Based Learning for Conformational Ensemble Generation
Ameya Daigavane, Bodhi Vani, Darcy Davidson et al.
Plug-and-Play Context Feature Reuse for Efficient Masked Generation
Xuejie Liu, Anji Liu, Guy Van den Broeck et al.
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs
Lucas Ventura, Antoine Yang, Cordelia Schmid et al.
Region-based Cluster Discrimination for Visual Representation Learning
Yin Xie, Kaicheng Yang, Xiang An et al.
VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction
Ziyue Zhu, Shenlong Wang, Jin Xie et al.
SpecEdge: Scalable Edge-Assisted Serving Framework for Interactive LLMs
Jinwoo Park, Seunggeun Cho, Dongsu Han
LATTE-MV: Learning to Anticipate Table Tennis Hits from Monocular Videos
Daniel Etaat, Dvij Rajesh Kalaria, Nima Rahmanian et al.
WALL-E: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents
Siyu Zhou, Tianyi Zhou, Yijun Yang et al.
Multi-Token Prediction Needs Registers
Anastasios Gerontopoulos, Spyridon Gidaris, Nikos Komodakis
MuTri: Multi-view Tri-alignment for OCT to OCTA 3D Image Translation
Zhuangzhuang Chen, Hualiang Wang, Chubin Ou et al.
REVE: A Foundation Model for EEG - Adapting to Any Setup with Large-Scale Pretraining on 25,000 Subjects
Yassine El Ouahidi, Jonathan Lys, Philipp Thölke et al.
Go With the Flow: Fast Diffusion for Gaussian Mixture Models
George Rapakoulias, Ali Reza Pedram, Fengjiao Liu et al.
Dynamic Multimodal Prototype Learning in Vision-Language Models
Xingyu Zhu, Shuo Wang, Beier Zhu et al.
Memory Mosaics at scale
Jianyu Zhang, Leon Bottou
Neural Networks Generalize on Low Complexity Data
Sourav Chatterjee, Timothy Sudijono
Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia
Chandler Smith, Marwa Abdulhai, Manfred Díaz et al.
Sequential Gaussian Avatars with Hierarchical Motion Context
Wangze Xu, Yifan Zhan, Zhihang Zhong et al.
Scaling Laws For Scalable Oversight
Joshua Engels, David Baek, Subhash Kantamneni et al.
UniGist: Towards General and Hardware-aligned Sequence-level Long Context Compression
Chenlong Deng, Zhisong Zhang, Kelong Mao et al.
SPARS3R: Semantic Prior Alignment and Regularization for Sparse 3D Reconstruction
Yutao Tang, Yuxiang Guo, Deming Li et al.
UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning
Ye Liu, Zongyang Ma, Junfu Pu et al.
Controllable Weather Synthesis and Removal with Video Diffusion Models
Chih-Hao Lin, Zian Wang, Ruofan Liang et al.
Acknowledging Focus Ambiguity in Visual Questions
Chongyan Chen, Yu-Yun Tseng, Zhuoheng Li et al.
EmoNet-Face: An Expert-Annotated Benchmark for Synthetic Emotion Recognition
Christoph Schuhmann, Robert Kaczmarczyk, Gollam Rabby et al.
DINGO: Constrained Inference for Diffusion LLMs
Tarun Suresh, Debangshu Banerjee, Shubham Ugare et al.
DistinctAD: Distinctive Audio Description Generation in Contexts
Bo Fang, Wenhao Wu, Qiangqiang Wu et al.
Non-Asymptotic Guarantees for Average-Reward Q-Learning with Adaptive Stepsizes
Zaiwei Chen
Scalable and Cost-Efficient de Novo Template-Based Molecular Generation
Piotr Gaiński, Oussama Boussif, Andrei Rekesh et al.
Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving
Hao Zhou, Zhanning Gao, Zhili Chen et al.
DuoLoRA: Cycle-consistent and Rank-disentangled Content-Style Personalization
Aniket Roy, Shubhankar Borse, Shreya Kadambi et al.
Pause Tokens Strictly Increase the Expressivity of Constant-Depth Transformers
Charles London, Varun Kanade