Most Cited 2025 "message-passing paradigm" Papers
22,274 papers found • Page 20 of 112
Conference
Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions
Siqiao Mu, Diego Klabjan
Learning-Order Autoregressive Models with Application to Molecular Graph Generation
Zhe Wang, Jiaxin Shi, Nicolas Heess et al.
AdaRankGrad: Adaptive Gradient Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning
Yehonathan Refael, Jonathan Svirsky, Boris Shustin et al.
AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference
Zhuomin He, Yizhen Yao, Pengfei Zuo et al.
Vec2Face: Scaling Face Dataset Generation with Loosely Constrained Vectors
Haiyu Wu, Jaskirat Singh, Sicong Tian et al.
Strategy Coopetition Explains the Emergence and Transience of In-Context Learning
Aaditya Singh, Ted Moskovitz, Sara Dragutinović et al.
Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers
Haoran You, Connelly Barnes, Yuqian Zhou et al.
TopoNets: High performing vision and language models with brain-like topography
Mayukh Deb, Mainak Deb, Apurva Murty
Epistemic Alignment: A Mediating Framework for User-LLM Knowledge Delivery
Nicholas Clark, Hua Shen, Bill Howe et al.
FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors
Yabo Zhang, xinpeng zhou, Yihan Zeng et al.
Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model
Dongki Kim, Wonbin Lee, Sung Ju Hwang
Localized Concept Erasure for Text-to-Image Diffusion Models Using Training-Free Gated Low-Rank Adaptation
Byung Hyun Lee, Sungjin Lim, Se Young Chun
Revisiting Zeroth-Order Optimization: Minimum-Variance Two-Point Estimators and Directionally Aligned Perturbations
Shaocong Ma, Heng Huang
Breaking Latent Prior Bias in Detectors for Generalizable AIGC Image Detection
Yue Zhou, Xinan He, Kaiqing Lin et al.
Towards Training-free Anomaly Detection with Vision and Language Foundation Models
Jinjin Zhang, Guodong Wang, yizhou jin et al.
LIMOPro: Reasoning Refinement for Efficient and Effective Test-time Scaling
Yang Xiao, Jiashuo WANG, Ruifeng Yuan et al.
CLIPer: Hierarchically Improving Spatial Representation of CLIP for Open-Vocabulary Semantic Segmentation
Lin Sun, Jiale Cao, Jin Xie et al.
Learning 2D Invariant Affordance Knowledge for 3D Affordance Grounding
Xianqiang Gao, Pingrui Zhang, Delin Qu et al.
Enhancing Multilingual LLM Pretraining with Model-Based Data Selection
Bettina Messmer, Vinko Sabolčec, Martin Jaggi
BFANet: Revisiting 3D Semantic Segmentation with Boundary Feature Analysis
Weiguang Zhao, Rui Zhang, Qiufeng Wang et al.
Brain Mapping with Dense Features: Grounding Cortical Semantic Selectivity in Natural Images With Vision Transformers
Andrew Luo, Jacob Yeung, Rushikesh Zawar et al.
ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing
Yulin Pan, Xiangteng He, Chaojie Mao et al.
Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards
Yangsibo Huang, Milad Nasr, Anastasios Angelopoulos et al.
Adapter Merging with Centroid Prototype Mapping for Scalable Class-Incremental Learning
Takuma Fukuda, Hiroshi Kera, Kazuhiko Kawamoto
CharaConsist: Fine-Grained Consistent Character Generation
Mengyu Wang, Henghui Ding, Jianing Peng et al.
Post-pre-training for Modality Alignment in Vision-Language Foundation Models
Shin'ya Yamaguchi, Dewei Feng, Sekitoshi Kanai et al.
Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?
Tianyuan Qu, Longxiang Tang, Bohao PENG et al.
AC-DiT: Adaptive Coordination Diffusion Transformer for Mobile Manipulation
Sixiang Chen, Jiaming Liu, Siyuan Qian et al.
Long-Sequence Recommendation Models Need Decoupled Embeddings
Ningya Feng, Junwei Pan, Jialong Wu et al.
KTAE: A Model-Free Algorithm to Key-Tokens Advantage Estimation in Mathematical Reasoning
Wei Sun, Wen Yang, Pu Jian et al.
Diffusion Models for Attribution
Xiongren Chen, Jiuyong Li, Jixue Liu et al.
DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses
Yatian Pang, Bin Zhu, Bin Lin et al.
CLIMB: Data Foundations for Large Scale Multimodal Clinical Foundation Models
David Dai, Peilin Chen, Malinda Lu et al.
Neuro-3D: Towards 3D Visual Decoding from EEG Signals
Zhanqiang Guo, Jiamin Wu, Yonghao Song et al.
MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection
Bokai Lin, Zihao Zeng, Zipeng Xiao et al.
Potemkin Understanding in Large Language Models
Marina Mancoridis, Bec Weeks, Keyon Vafa et al.
What's the Move? Hybrid Imitation Learning via Salient Points
Priya Sundaresan, Hengyuan Hu, Quan Vuong et al.
Accelerating Large Language Model Reasoning via Speculative Search
Zhihai Wang, Jie Wang, Jilai Pan et al.
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
Ming Li, Xin Gu, Fan Chen et al.
Plastic Learning with Deep Fourier Features
Alex Lewandowski, Dale Schuurmans, Marlos C. Machado
TopoLM: brain-like spatio-functional organization in a topographic language model
Neil Rathi, Johannes Mehrer, Badr AlKhamissi et al.
ScribbleLight: Single Image Indoor Relighting with Scribbles
Jun Myeong Choi, Annie N. Wang, Pieter Peers et al.
Efficient 3D Recognition with Event-driven Spike Sparse Convolution
Xuerui Qiu, Man Yao, Jieyuan Zhang et al.
DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation
Hongbin Lin, Zilu Guo, Yifan Zhang et al.
Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling
Michal Balcerak, Tamaz Amiranashvili, Antonio Terpin et al.
EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision
Yiming Zhao, Taein Kwon, Paul Streli et al.
On the Effect of Negative Gradient in Group Relative Deep Reinforcement Optimization
wenlong deng, Yi Ren, Muchen Li et al.
OmniCount: Multi-label Object Counting with Semantic-Geometric Priors
Anindya Mondal, Sauradip Nag, Xiatian Zhu et al.
No Metric to Rule Them All: Toward Principled Evaluations of Graph-Learning Datasets
Corinna Coupette, Jeremy Wayland, Emily Simons et al.
3D Mesh Editing using Masked LRMs
William Gao, Dilin Wang, Yuchen Fan et al.
Transformer-Squared: Self-adaptive LLMs
Qi Sun, Edoardo Cetin, Yujin Tang
Advancing Spiking Neural Networks Towards Multiscale Spatiotemporal Interaction Learning
Yimeng Shan, Malu Zhang, Rui-jie Zhu et al.
Training LLMs over Neurally Compressed Text
Brian Lester, Jaehoon Lee, Jeffrey Pennington et al.
Online Video Understanding: OVBench and VideoChat-Online
Zhenpeng Huang, Xinhao Li, Jiaqi Li et al.
Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models
Can Demircan, Tankred Saanum, Akshay Jagadish et al.
Conformal Thresholded Intervals for Efficient Regression
Rui Luo, Zhixin Zhou
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
Wei Chen, Lin Li, Yongqi Yang et al.
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment
Chenghao Fan, zhenyi lu, Sichen Liu et al.
Certified Unlearning for Neural Networks
Anastasiia Koloskova, Youssef Allouah, Animesh Jha et al.
BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers
Hui Zhang, Tingwei Gao, Jie Shao et al.
Coreset Selection via Reducible Loss in Continual Learning
Ruilin Tong, Yuhang Liu, Javen Qinfeng Shi et al.
RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation
Mingfei Han, Liang Ma, Kamila Zhumakhanova et al.
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
Baoqi Pei, Yifei Huang, Jilan Xu et al.
Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
Antonia Wüst, Tim Woydt, Lukas Helff et al.
TGB-Seq Benchmark: Challenging Temporal GNNs with Complex Sequential Dynamics
Lu Yi, Jie Peng, Yanping Zheng et al.
Revisiting Tampered Scene Text Detection in the Era of Generative AI
Chenfan Qu, Yiwu Zhong, Fengjun Guo et al.
DyG-Mamba: Continuous State Space Modeling on Dynamic Graphs
Dongyuan Li, Shiyin Tan, Ying Zhang et al.
NoT: Federated Unlearning via Weight Negation
Yasser Khalil, Leo Maxime Brunswic, Soufiane Lamghari et al.
Scaling Laws for Gradient Descent and Sign Descent for Linear Bigram Models under Zipf’s Law
Frederik Kunstner, Francis Bach
CityGS-X: A Scalable Architecture for Efficient and Geometrically Accurate Large-Scale Scene Reconstruction
Yuanyuan Gao, Hao Li, Jiaqi Chen et al.
SV-RAG: LoRA-Contextualizing Adaptation of MLLMs for Long Document Understanding
Jian Chen, Ruiyi Zhang, Yufan Zhou et al.
POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding
Alexey Skrynnik, Anton Andreychuk, Anatolii Borzilov et al.
Geometry of Lightning Self-Attention: Identifiability and Dimension
Nathan Henry, Giovanni Luca Marchetti, Kathlén Kohn
UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation
Rui Tian, Mingfei Gao, Mingze Xu et al.
Unlocking Efficient, Scalable, and Continual Knowledge Editing with Basis-Level Representation Fine-Tuning
Tianci Liu, Ruirui Li, Yunzhe Qi et al.
Efficient Rectification of Neuro-Symbolic Reasoning Inconsistencies by Abductive Reflection
Wen-Chao Hu, Wang-Zhou Dai, Yuan Jiang et al.
Graph Mixture of Experts and Memory-augmented Routers for Multivariate Time Series Anomaly Detection
Xiaoyu Huang, Weidong Chen, Bo Hu et al.
Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes
Lihan Jiang, Kerui Ren, Mulin Yu et al.
Formation of Representations in Neural Networks
Liu Ziyin, Isaac Chuang, Tomer Galanti et al.
MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild
Xi Fang, Jiankun Wang, Xiaochen Cai et al.
Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions
Yik Siu Chan, Narutatsu Ri, Yuxin Xiao et al.
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
Md Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang et al.
Black-Box Adversarial Attacks on LLM-Based Code Completion
Slobodan Jenko, Niels Mündler, Jingxuan He et al.
RoboTron-Drive: All-in-One Large Multimodal Model for Autonomous Driving
Zhijian Huang, Chengjian Feng, Baihui Xiao et al.
Exploring Vacant Classes in Label-Skewed Federated Learning
Kuangpu Guo, Yuhe Ding, Jian Liang et al.
KBLaM: Knowledge Base augmented Language Model
Xi Wang, Taketomo Isazawa, Liana Mikaelyan et al.
Learning Adversarial MDPs with Stochastic Hard Constraints
Francesco Emanuele Stradi, Matteo Castiglioni, Alberto Marchesi et al.
RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics
Jie Zhang, Cezara Petrui, Kristina Nikolić et al.
From Commands to Prompts: LLM-based Semantic File System for AIOS
Zeru Shi, Kai Mei, Mingyu Jin et al.
Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding
Yue Fan, Xiaojian Ma, Rongpeng Su et al.
LiveXiv - A Multi-Modal live benchmark based on Arxiv papers content
Nimrod Shabtay, Felipe Maia Polo, Sivan Doveh et al.
RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors
Fengshuo Bai, Runze Liu, Yali Du et al.
Scaling Laws for Differentially Private Language Models
Ryan McKenna, Yangsibo Huang, Amer Sinha et al.
Proxy Denoising for Source-Free Domain Adaptation
Song Tang, Wenxin Su, Yan Gan et al.
MG-MotionLLM: A Unified Framework for Motion Comprehension and Generation across Multiple Granularities
Bizhu Wu, Jinheng Xie, Keming Shen et al.
ReSi: A Comprehensive Benchmark for Representational Similarity Measures
Max Klabunde, Tassilo Wald, Tobias Schumacher et al.
Debiased Multimodal Understanding for Human Language Sequences
Zhi Xu, Dingkang Yang, Mingcheng Li et al.
Skill Expansion and Composition in Parameter Space
Tenglong Liu, Jianxiong Li, Yinan Zheng et al.
UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents
Han Xiao, Guozhi Wang, Yuxiang Chai et al.
EmotiCrafter: Text-to-Emotional-Image Generation based on Valence-Arousal Model
Shengqi Dang, Yi He, Long Ling et al.
How Far are AI-generated Videos from Simulating the 3D Visual World: A Learned 3D Evaluation Approach
Chirui CHANG, Jiahui Liu, Zhengzhe Liu et al.
Law of the Weakest Link: Cross Capabilities of Large Language Models
Ming Zhong, Aston Zhang, Xuewei Wang et al.
VORTA: Efficient Video Diffusion via Routing Sparse Attention
Wenhao Sun, Rong-Cheng Tu, Yifu Ding et al.
Latent-EnSF: A Latent Ensemble Score Filter for High-Dimensional Data Assimilation with Sparse Observation Data
Phillip Si, Peng Chen
A Meta-Learning Approach to Bayesian Causal Discovery
Anish Dhir, Matthew Ashman, James Requeima et al.
Learning Personalized Decision Support Policies
Umang Bhatt, Valerie Chen, Katherine M. Collins et al.
Towards Graph Foundation Models: Learning Generalities Across Graphs via Task-Trees
Zehong Wang, Zheyuan Zhang, Tianyi MA et al.
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models
Amir Mohammad Karimi Mamaghan, Samuele Papa, Karl H. Johansson et al.
Improving the Sparse Structure Learning of Spiking Neural Networks from the View of Compression Efficiency
Jiangrong Shen, Qi Xu, Gang Pan et al.
LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models
Jian Liang, Wenke Huang, Guancheng Wan et al.
Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages
Zui Chen, Tianqiao Liu, Tongqing et al.
DocVLM: Make Your VLM an Efficient Reader
Mor Shpigel Nacson, Aviad Aberdam, Roy Ganz et al.
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
Henghui Du, Guangyao Li, Chang Zhou et al.
P-SPIKESSM: HARNESSING PROBABILISTIC SPIKING STATE SPACE MODELS FOR LONG-RANGE DEPENDENCY TASKS
Malyaban Bal, Abhronil Sengupta
SplatTalk: 3D VQA with Gaussian Splatting
Anh Thai, Kyle Genova, Songyou Peng et al.
Text-to-LoRA: Instant Transformer Adaption
Rujikorn Charakorn, Edoardo Cetin, Yujin Tang et al.
Solving New Tasks by Adapting Internet Video Knowledge
Calvin Luo, Zilai Zeng, Yilun Du et al.
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
Wanpeng Zhang, Zilong Xie, Yicheng Feng et al.
QoQ-Med: Building Multimodal Clinical Foundation Models with Domain-Aware GRPO Training
David Dai, Peilin Chen, Chanakya Ekbote et al.
LinPrim: Linear Primitives for Differentiable Volumetric Rendering
Nicolas von Lützow, Matthias Niessner
MaskControl: Spatio-Temporal Control for Masked Motion Synthesis
Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Korrawe Karunratanakul et al.
EgoLM: Multi-Modal Language Model of Egocentric Motions
Fangzhou Hong, Vladimir Guzov, Hyo Jin Kim et al.
LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models
Haiwen Huang, Anpei Chen, Volodymyr Havrylov et al.
Towards Trustworthy Knowledge Graph Reasoning: An Uncertainty Aware Perspective
Bo Ni, Yu Wang, Lu Cheng et al.
LightningDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos
Yujun Shi, Jun Hao Liew, Hanshu Yan et al.
One-for-More: Continual Diffusion Model for Anomaly Detection
Xiaofan Li, Xin Tan, Zhuo Chen et al.
STD-PLM: Understanding Both Spatial and Temporal Properties of Spatial-Temporal Data with PLM
Yiheng Huang, Xiaowei Mao, Shengnan Guo et al.
LeanAgent: Lifelong Learning for Formal Theorem Proving
Adarsh Kumarappan, Mohit Tiwari, Peiyang Song et al.
GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution
Fengxiang Wang, Mingshuo Chen, Yueying Li et al.
MaestroMotif: Skill Design from Artificial Intelligence Feedback
Martin Klissarov, Mikael Henaff, Roberta Raileanu et al.
MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA
Hanrong Ye, Haotian Zhang, Erik Daxberger et al.
TLDR: Token-Level Detective Reward Model for Large Vision Language Models
Deqing Fu, Tong Xiao, Rui Wang et al.
RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion
Xiaomeng Chu, Jiajun Deng, Guoliang You et al.
TimeDART: A Diffusion Autoregressive Transformer for Self-Supervised Time Series Representation
Daoyu Wang, Mingyue Cheng, Zhiding Liu et al.
In-context Time Series Predictor
Jiecheng Lu, Yan Sun, Shihao Yang
Toward Real-world BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting
Shu-Wei Lu, Yi-Hsuan Tsai, Yi-Ting Chen
Light-T2M: A Lightweight and Fast Model for Text-to-motion Generation
Ling-An Zeng, Guohong Huang, Gaojie Wu et al.
ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation
Daniel Winter, Asaf Shul, Matan Cohen et al.
Flow-Based Policy for Online Reinforcement Learning
Lei Lv, Yunfei Li, Yu Luo et al.
VAGEN: Reinforcing World Model Reasoning for Multi-Turn VLM Agents
Kangrui Wang, Pingyue Zhang, Zihan Wang et al.
Effective Training Data Synthesis for Improving MLLM Chart Understanding
Yuwei Yang, Zeyu Zhang, Yunzhong Hou et al.
Achieving Dimension-Free Communication in Federated Learning via Zeroth-Order Optimization
Zhe Li, Bicheng Ying, Zidong Liu et al.
Backdoor Attacks Against No-Reference Image Quality Assessment Models via a Scalable Trigger
Yi Yu, Song Xia, Xun Lin et al.
Yuan: Yielding Unblemished Aesthetics Through a Unified Network for Visual Imperfections Removal in Generated Images
Zhenyu Yu, Chee Seng Chan
NTPP: Generative Speech Language Modeling for Dual-Channel Spoken Dialogue via Next-Token-Pair Prediction
Qichao Wang, Ziqiao Meng, Wenqian Cui et al.
Stacking Brick by Brick: Aligned Feature Isolation for Incremental Face Forgery Detection
Jikang Cheng, Zhiyuan Yan, Ying Zhang et al.
Predictive Data Selection: The Data That Predicts Is the Data That Teaches
KaShun SHUM, Yuzhen Huang, Hongjian Zou et al.
Test-Time Scaling of Diffusion Models via Noise Trajectory Search
Vignav Ramesh, Morteza Mardani
Everything, Everywhere, All at Once: Is Mechanistic Interpretability Identifiable?
Maxime Méloux, Silviu Maniu, François Portet et al.
Manifold Learning by Mixture Models of VAEs for Inverse Problems
Giovanni S. Alberti, Johannes Hertrich, Matteo Santacesaria et al.
Generative Classifiers Avoid Shortcut Solutions
Alexander Li, Ananya Kumar, Deepak Pathak
Precise Localization of Memories: A Fine-grained Neuron-level Knowledge Editing Technique for LLMs
Haowen Pan, Xiaozhi Wang, Yixin Cao et al.
FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video
Yue Gao, Hong-Xing Yu, Bo Zhu et al.
Noise Stability Optimization for Finding Flat Minima: A Hessian-based Regularization Approach
Haotian Ju, Hongyang Zhang, Dongyue Li
The Illusion of Unlearning: The Unstable Nature of Machine Unlearning in Text-to-Image Diffusion Models
Naveen George, Karthik Nandan Dasaraju, Rutheesh Reddy Chittepu et al.
DiffPuter: Empowering Diffusion Models for Missing Data Imputation
Hengrui Zhang, Liancheng Fang, Qitian Wu et al.
Unbounded: A Generative Infinite Game of Character Life Simulation
Jialu Li, Yuanzhen Li, Neal Wadhwa et al.
SymmCompletion: High-Fidelity and High-Consistency Point Cloud Completion with Symmetry Guidance
Hongyu Yan, Zijun Li, Kunming Luo et al.
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Kianté Brantley, Mingyu Chen, Zhaolin Gao et al.
AdaManip: Adaptive Articulated Object Manipulation Environments and Policy Learning
Yuanfei Wang, Xiaojie Zhang, Ruihai Wu et al.
Edge Prompt Tuning for Graph Neural Networks
Xingbo Fu, Yinhan He, Jundong Li
SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models
Jiawei Zhang, Xuan Yang, Taiqi Wang et al.
MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation
Yukang Lin, Hokit Fung, Jianjin Xu et al.
FLAME: Learning to Navigate with Multimodal LLM in Urban Environments
Yunzhe Xu, Yiyuan Pan, Zhe Liu et al.
HiP-AD: Hierarchical and Multi-Granularity Planning with Deformable Attention for Autonomous Driving in a Single Decoder
Yingqi Tang, Zhuoran Xu, Zhaotie Meng et al.
Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaptation
Anqi Li, Feng Li, Yuxi Liu et al.
MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model
Chenjie Cao, Chaohui Yu, Shang Liu et al.
Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur Prior
Haitao Wu, Qing Li, Changqing Zhang et al.
Generative Monoculture in Large Language Models
Fan Wu, Emily Black, Varun Chandrasekaran
OmniSR: Shadow Removal Under Direct and Indirect Lighting
Jiamin Xu, Zelong Li, Yuxin Zheng et al.
DASK: Distribution Rehearsing via Adaptive Style Kernel Learning for Exemplar-Free Lifelong Person Re-Identification
Kunlun Xu, Chenghao Jiang, Peixi Xiong et al.
Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models
Yifang Xu, Yunzhuo Sun, Benxiang Zhai et al.
HashAttention: Semantic Sparsity for Faster Inference
Aditya Desai, Shuo Yang, Alejandro Cuadron et al.
OmniStyle: Filtering High Quality Style Transfer Data at Scale
Ye Wang, Ruiqi Liu, Jiang Lin et al.
GRPose: Learning Graph Relations for Human Image Generation with Pose Priors
Xiangchen Yin, Donglin Di, Lei Fan et al.
Debiased All-in-one Image Restoration with Task Uncertainty Regularization
Gang Wu, Junjun Jiang, Yijun Wang et al.
NexusGS: Sparse View Synthesis with Epipolar Depth Priors in 3D Gaussian Splatting
Yulong Zheng, Zicheng Jiang, Shengfeng He et al.
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
Jiange Yang, Haoyi Zhu, Yating Wang et al.
Bag of Tricks for Inference-time Computation of LLM Reasoning
Fan LIU, Wen-Shuo Chao, Naiqiang Tan et al.
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Chenyu Yang, Xuan Dong, Xizhou Zhu et al.
Detecting Backdoor Attacks in Federated Learning via Direction Alignment Inspection
Jiahao Xu, Zikai Zhang, Rui Hu
The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking
Yuchun Miao, Sen Zhang, Liang Ding et al.
HOIGPT: Learning Long-Sequence Hand-Object Interaction with Language Models
Mingzhen Huang, Fu-Jen Chu, Bugra Tekin et al.
UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing
Tsu-Jui Fu, Yusu Qian, Chen Chen et al.
GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration
Yuchen Sun, Shanhui Zhao, Tao Yu et al.
Scaling Inference-Efficient Language Models
Song Bian, Minghao Yan, Shivaram Venkataraman
Taming Knowledge Conflicts in Language Models
Gaotang Li, Yuzhong Chen, Hanghang Tong
Exploring the limits of strong membership inference attacks on large language models
Jamie Hayes, I Shumailov, Christopher A. Choquette-Choo et al.
GaussianSpa: An “Optimizing-Sparsifying” Simplification Framework for Compact and High-Quality 3D Gaussian Splatting
Yangming Zhang, Wenqi Jia, Wei Niu et al.
AdvPrefix: An Objective for Nuanced LLM Jailbreaks
Sicheng Zhu, Brandon Amos, Yuandong Tian et al.
SmartRAG: Jointly Learn RAG-Related Tasks From the Environment Feedback
Jingsheng Gao, Linxu Li, Ke Ji et al.
From 2D CAD Drawings to 3D Parametric Models: A Vision-Language Approach
Xilin Wang, Jia Zheng, Yuanchao Hu et al.
Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis
Bowen Zhang, Sicheng Xu, Chuxin Wang et al.
DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation
Zhixuan Liang, Yao Mu, Yixiao Wang et al.
Trusted Multi-View Classification via Evolutionary Multi-View Fusion
Xinyan Liang, Pinhan Fu, Yuhua Qian et al.
MADGEN: Mass-Spec attends to De Novo Molecular generation
Yinkai Wang, Xiaohui Chen, Liping Liu et al.
Unlocking the Capabilities of Large Vision-Language Models for Generalizable and Explainable Deepfake Detection
Peipeng Yu, Jianwei Fei, Hui Gao et al.
MoEQuant: Enhancing Quantization for Mixture-of-Experts Large Language Models via Expert-Balanced Sampling and Affinity Guidance
Zhixuan Chen, Xing Hu, Dawei Yang et al.
Conditioning Diffusions Using Malliavin Calculus
Jakiw Pidstrigach, Elizabeth Baker, Carles Domingo i Enrich et al.
WikiBigEdit: Understanding the Limits of Lifelong Knowledge Editing in LLMs
Lukas Thede, Karsten Roth, Matthias Bethge et al.