Most Cited 2025 "key-value pair reuse" Papers
22,274 papers found • Page 29 of 112
Conference
Transformers Learn Low Sensitivity Functions: Investigations and Implications
Bhavya Vasudeva, Deqing Fu, Tianyi Zhou et al.
ZoomLDM: Latent Diffusion Model for Multi-scale Image Generation
Srikar Yellapragada, Alexandros Graikos, Kostas Triaridis et al.
Accurate Differential Operators for Hybrid Neural Fields
Aditya Chetan, Guandao Yang, Zichen Wang et al.
PhysX-3D: Physical-Grounded 3D Asset Generation
Ziang Cao, Zhaoxi Chen, Liang Pan et al.
Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration
Zilong Huang, Jun He, Junyan Ye et al.
DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion
Qingcheng Zhao, Xiang Zhang, Haiyang Xu et al.
Spectral Informed Mamba for Robust Point Cloud Processing
Ali Bahri, Moslem Yazdanpanah, Mehrdad Noori et al.
On the Sample Complexity Bounds of Bilevel Reinforcement Learning
Mudit Gaur, Utsav Singh, Amrit Singh Bedi et al.
MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation
Jinnan Chen, Lingting Zhu, Zeyu HU et al.
Embodied Web Agents: Bridging Physical-Digital Realms for Integrated Agent Intelligence
Yining Hong, Rui Sun, Bingxuan Li et al.
NullSwap: Proactive Identity Cloaking Against Deepfake Face Swapping
Tianyi Wang, Shuaicheng Niu, Harry Cheng et al.
Lost in Latent Space: An Empirical Study of Latent Diffusion Models for Physics Emulation
François Rozet, Ruben Ohana, Michael McCabe et al.
ReDit: Reward Dithering for Improved LLM Policy Optimization
Chenxing Wei, Jiarui Yu, Ying He et al.
Monocular and Generalizable Gaussian Talking Head Animation
Shengjie Gong, Haojie Li, Jiapeng Tang et al.
Contextual AD Narration with Interleaved Multimodal Sequence
Hanlin Wang, Zhan Tong, Kecheng Zheng et al.
Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach
Yunuo Chen, Junli Cao, Vidit Goel et al.
Instruction-based Image Manipulation by Watching How Things Move
Mingdeng Cao, Xuaner Zhang, Yinqiang Zheng et al.
Towards Generalizable Scene Change Detection
Jae-Woo KIM, Ue-Hwan Kim
Oasis: One Image is All You Need for Multimodal Instruction Data Synthesis
Letian Zhang, Quan Cui, Bingchen Zhao et al.
Stable Port-Hamiltonian Neural Networks
Fabian J. Roth, Dominik K. Klein, Maximilian Kannapinn et al.
Thinker: Learning to Think Fast and Slow
Stephen Chung, Wenyu Du, Jie Fu
p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay
Jun Zhang, Desen Meng, Zhengming Zhang et al.
Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment
Ziteng Cui, Xuangeng Chu, Tatsuya Harada
SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications
Jinyang Li, Xiaolong Li, Ge Qu et al.
Orchid: Image Latent Diffusion for Joint Appearance and Geometry Generation
Akshay Krishnan, Xinchen Yan, Vincent Casser et al.
The Art of Deception: Color Visual Illusions and Diffusion Models
Alexandra Gomez-Villa, Kai Wang, C.Alejandro Parraga et al.
Causally Reliable Concept Bottleneck Models
Giovanni De Felice, Arianna Casanova Flores, Francesco De Santis et al.
FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding
Chongjun Tu, Lin Zhang, pengtao chen et al.
Understanding Adam Requires Better Rotation Dependent Assumptions
Tianyue Zhang, Lucas Maes, Alan Milligan et al.
Unified Multimodal Understanding via Byte-Pair Visual Encoding
Wanpeng Zhang, Yicheng Feng, Hao Luo et al.
PERSE: Personalized 3D Generative Avatars from A Single Portrait
Hyunsoo Cha, Inhee Lee, Hanbyul Joo
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
Tong Wei, Yijun Yang, Junliang Xing et al.
Root Cause Analysis of Outliers with Missing Structural Knowledge
William Roy Orchard, Nastaran Okati, Sergio Garrido Mejia et al.
Why LVLMs Are More Prone to Hallucinations in Longer Responses: The Role of Context
Ge Zheng, Jiaye Qian, Jiajin Tang et al.
LMM-Det: Make Large Multimodal Models Excel in Object Detection
Jincheng Li, Chunyu Xie, Ji Ao et al.
Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion
Alan Amin, Nate Gruver, Andrew Wilson
FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding
Thanh-Dat Truong, Utsav Prabhu, Bhiksha Raj et al.
Effective SAM Combination for Open-Vocabulary Semantic Segmentation
Minhyeok Lee, Suhwan Cho, Jungho Lee et al.
CorrCLIP: Reconstructing Patch Correlations in CLIP for Open-Vocabulary Semantic Segmentation
Dengke Zhang, Fagui Liu, Quan Tang
ModSkill: Physical Character Skill Modularization
Yiming Huang, Zhiyang Dou, Lingjie Liu
Enhancing SAM with Efficient Prompting and Preference Optimization for Semi-supervised Medical Image Segmentation
Aishik Konwer, Zhijian Yang, Erhan Bas et al.
Discrete Neural Flow Samplers with Locally Equivariant Transformer
Zijing Ou, Ruixiang Zhang, Yingzhen Li
AMO Sampler: Enhancing Text Rendering with Overshooting
Xixi Hu, Keyang Xu, Bo Liu et al.
MonoSplat: Generalizable 3D Gaussian Splatting from Monocular Depth Foundation Models
Yifan Liu, Keyu Fan, Weihao Yu et al.
Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing
Hanhui Wang, Yihua Zhang, Ruizheng Bai et al.
LLM Strategic Reasoning: Agentic Study through Behavioral Game Theory
Jingru Jia, Zehua Yuan, Junhao Pan et al.
CPathAgent: An Agent-based Foundation Model for Interpretable High-Resolution Pathology Image Analysis Mimicking Pathologists' Diagnostic Logic
YUXUAN SUN, Yixuan Si, Chenglu Zhu et al.
Among Us: A Sandbox for Measuring and Detecting Agentic Deception
Satvik Golechha, Adrià Garriga-Alonso
RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training
Raktim Gautam Goswami, Prashanth Krishnamurthy, Yann LeCun et al.
Can DPO Learn Diverse Human Values? A Theoretical Scaling Law
Shawn Im, Sharon Li
ChatReID: Open-ended Interactive Person Retrieval via Hierarchical Progressive Tuning for Vision Language Models
Ke Niu, Haiyang Yu, Mengyang Zhao et al.
A multiscale analysis of mean-field transformers in the moderate interaction regime
Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi
MUNBa: Machine Unlearning via Nash Bargaining
Jing Wu, Mehrtash Harandi
Interpretable Image Classification via Non-parametric Part Prototype Learning
Zhijie Zhu, Lei Fan, Maurice Pagnucco et al.
Noise Modeling in One Hour: Minimizing Preparation Efforts for Self-supervised Low-Light RAW Image Denoising
Feiran Li, Haiyang Jiang, Daisuke Iso
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs
Xiaoqin Wang, Xusen Ma, Xianxu Hou et al.
DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion
Qitao Zhao, Amy Lin, Jeff Tan et al.
ProbPose: A Probabilistic Approach to 2D Human Pose Estimation
Miroslav Purkrábek, Jiri Matas
WHAT MAKES MATH PROBLEMS HARD FOR REINFORCEMENT LEARNING: A CASE STUDY
Ali Shehper, Anibal Medina-Mardones, Lucas Fagan et al.
DnLUT: Ultra-Efficient Color Image Denoising via Channel-Aware Lookup Tables
Sidi Yang, Binxiao Huang, Yulun Zhang et al.
T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation
Chieh-Yun Chen, Min Shi, Gong Zhang et al.
ZIM: Zero-Shot Image Matting for Anything
Beomyoung Kim, Chanyong Shin, Joonhyun Jeong et al.
GraphLand: Evaluating Graph Machine Learning Models on Diverse Industrial Data
Gleb Bazhenov, Oleg Platonov, Liudmila Prokhorenkova
LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS
Wanhua Li, Yujie Zhao, Minghan Qin et al.
HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned Guidance
Jiazi Bu, Pengyang Ling, Yujie Zhou et al.
MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI
Huanjin Yao, Jiaxing Huang, Yawen Qiu et al.
GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation
Ning Gao, Yilun Chen, Shuai Yang et al.
DeNVeR: Deformable Neural Vessel Representations for Unsupervised Video Vessel Segmentation
Chun-Hung Wu, Shih-Hong Chen, Chih Yao Hu et al.
Erase Diffusion: Empowering Object Removal Through Calibrating Diffusion Pathways
Yi Liu, Hao Zhou, Benlei Cui et al.
Exploring Temporally-Aware Features for Point Tracking
Inès Hyeonsu Kim, Seokju Cho, Gabriel Huang et al.
Generating Multimodal Driving Scenes via Next-Scene Prediction
Yanhao Wu, Haoyang Zhang, Tianwei Lin et al.
COCONut-PanCap: Joint Panoptic Segmentation and Grounded Captions for Fine-Grained Understanding and Generation
Xueqing Deng, Linjie Yang, Qihang Yu et al.
Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images
Kazi Sajeed Mehrab, M. Maruf, Arka Daw et al.
RoME: Domain-Robust Mixture-of-Experts for MILP Solution Prediction across Domains
Tianle Pu, Zijie Geng, Haoyang Liu et al.
Cross-modal Causal Relation Alignment for Video Question Grounding
weixing chen, Yang Liu, Binglin Chen et al.
SyncDiff: Synchronized Motion Diffusion for Multi-Body Human-Object Interaction Synthesis
Wenkun He, Yun Liu, Ruitao Liu et al.
Bringing RNNs Back to Efficient Open-Ended Video Understanding
Weili Xu, Enxin Song, Wenhao Chai et al.
Overcoming Challenges of Long-Horizon Prediction in Driving World Models
Arian Mousakhan, Sudhanshu Mittal, Silvio Galesso et al.
AFL: A Single-Round Analytic Approach for Federated Learning with Pre-trained Models
Run He, Kai Tong, Di Fang et al.
Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition
Chengxiang Huang, Yake Wei, Zequn Yang et al.
From Judgment to Interference: Early Stopping LLM Harmful Outputs via Streaming Content Monitoring
Yang Li, Qiang Sheng, Yehan Yang et al.
nvBench 2.0: Resolving Ambiguity in Text-to-Visualization through Stepwise Reasoning
Tianqi Luo, Chuhan Huang, Leixian Shen et al.
ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer
Jiayi Gao, Zijin Yin, Changcheng Hua et al.
LangBridge: Interpreting Image as a Combination of Language Embeddings
Jiaqi Liao, Yuwei Niu, Fanqing Meng et al.
HAMoBE: Hierarchical and Adaptive Mixture of Biometric Experts for Video-based Person ReID
Yiyang Su, Yunping Shi, Feng Liu et al.
CrossOver: 3D Scene Cross-Modal Alignment
Sayan Deb Sarkar, Ondrej Miksik, Marc Pollefeys et al.
ProKeR: A Kernel Perspective on Few-Shot Adaptation of Large Vision-Language Models
Yassir Bendou, Amine Ouasfi, Vincent Gripon et al.
Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning
Saemi Moon, Minjong Lee, Sangdon Park et al.
GS-2DGS: Geometrically Supervised 2DGS for Reflective Object Reconstruction
Jinguang Tong, Xuesong li, Fahira Afzal Maken et al.
PurpCode: Reasoning for Safer Code Generation
Jiawei Liu, Nirav Diwan, Zhe Wang et al.
Deeply Supervised Flow-Based Generative Models
Inkyu Shin, Chenglin Yang, Liang-Chieh Chen
3D-GSW: 3D Gaussian Splatting for Robust Watermarking
Youngdong Jang, Hyunje Park, Feng Yang et al.
Learning normalized image densities via dual score matching
Florentin Guth, Zahra Kadkhodaie, Eero Simoncelli
Do Automatic Factuality Metrics Measure Factuality? A Critical Evaluation
Sanjana Ramprasad, Byron Wallace
Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation
Zelin Peng, Zhengqin Xu, Zhilin Zeng et al.
DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery
Jiadong Tang, Yu Gao, Dianyi Yang et al.
GaussianFusion: Gaussian-Based Multi-Sensor Fusion for End-to-End Autonomous Driving
Shuai Liu, Quanmin Liang, Zefeng Li et al.
ROD-MLLM: Towards More Reliable Object Detection in Multimodal Large Language Models
Heng Yin, Yuqiang Ren, Ke Yan et al.
The Change You Want To Detect: Semantic Change Detection In Earth Observation With Hybrid Data Generationf
Yanis Benidir, Nicolas Gonthier, Clement Mallet
Focusing on Tracks for Online Multi-Object Tracking
Kyujin Shim, Kangwook Ko, YuJin Yang et al.
BANet: Bilateral Aggregation Network for Mobile Stereo Matching
Gangwei Xu, Jiaxin Liu, Xianqi Wang et al.
SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets
Yuhang Yang, Fengqi Liu, Yixing Lu et al.
MimeQA: Towards Socially-Intelligent Nonverbal Foundation Models
Hengzhi Li, Megan Tjandrasuwita, Yi R. (May) Fung et al.
Extrapolated Urban View Synthesis Benchmark
Xiangyu Han, Zhen Jia, Boyi Li et al.
Solving Inverse Problems with FLAIR
Julius Erbach, Dominik Narnhofer, Andreas Dombos et al.
Split Gibbs Discrete Diffusion Posterior Sampling
Wenda Chu, Zihui Wu, Yifan Chen et al.
3D Student Splatting and Scooping
Jialin Zhu, Jiangbei Yue, Feixiang He et al.
MMLongBench: Benchmarking Long-Context Vision-Language Models Effectively and Thoroughly
Zhaowei Wang, Wenhao Yu, Xiyu REN et al.
True Zero-Shot Inference of Dynamical Systems Preserving Long-Term Statistics
Christoph Jürgen Hemmer, Daniel Durstewitz
Decouple and Track: Benchmarking and Improving Video Diffusion Transformers For Motion Transfer
Qingyu Shi, Jianzong Wu, Jinbin Bai et al.
RAD: Region-Aware Diffusion Models for Image Inpainting
Sora Kim, Sungho Suh, Minsik Lee
Boost Your Human Image Generation Model via Direct Preference Optimization
Sanghyeon Na, Yonggyu Kim, Hyunjoon Lee
Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion
Zexin He, Tengfei Wang, Xin Huang et al.
Exploring CLIP's Dense Knowledge for Weakly Supervised Semantic Segmentation
Zhiwei Yang, Yucong Meng, Kexue Fu et al.
CRISP: Object Pose and Shape Estimation with Test-Time Adaptation
Jingnan Shi, Rajat Talak, Harry Zhang et al.
DynaGuide: Steering Diffusion Polices with Active Dynamic Guidance
Maximilian Du, Shuran Song
Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory
Wenliang Zhong, Haoyu Tang, Qinghai Zheng et al.
g3D-LF: Generalizable 3D-Language Feature Fields for Embodied Tasks
Zihan Wang, Gim Hee Lee
Edge-SD-SR: Low Latency and Parameter Efficient On-device Super-Resolution with Stable Diffusion via Bidirectional Conditioning
Isma Hadji, Mehdi Noroozi, Victor Escorcia et al.
SimpleStrat: Diversifying Language Model Generation with Stratification
Justin Wong, Yury Orlovskiy, Alexander Shypula et al.
MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios
Yang Shi, Huanqian Wang, Xie et al.
MLZero: A Multi-Agent System for End-to-end Machine Learning Automation
Haoyang Fang, Boran Han, Nick Erickson et al.
Training-Free Text-Guided Image Editing with Visual Autoregressive Model
Yufei Wang, Lanqing Guo, Zhihao Li et al.
UniMamba: Unified Spatial-Channel Representation Learning with Group-Efficient Mamba for LiDAR-based 3D Object Detection
Xin Jin, Haisheng Su, Kai Liu et al.
Sparse2DGS: Geometry-Prioritized Gaussian Splatting for Surface Reconstruction from Sparse Views
Jiang Wu, Rui Li, Yu Zhu et al.
DyCON: Dynamic Uncertainty-aware Consistency and Contrastive Learning for Semi-supervised Medical Image Segmentation
Maregu Assefa, Muzammal Naseer, IYYAKUTTI IYAPPAN GANAPATHI et al.
FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation
Ariel Shaulov, Itay Hazan, Lior Wolf et al.
DTGBrepGen: A Novel B-rep Generative Model through Decoupling Topology and Geometry
Jing Li, Yihang Fu, Falai Chen
IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering
Hengyu Liu, Chenxin Li, Zhengxin Li et al.
DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability
Xirui Hu, Jiahao Wang, Hao chen et al.
Analyzing Finetuning Representation Shift for Multimodal LLMs Steering
Pegah KHAYATAN, Mustafa Shukor, Jayneel Parekh et al.
VideoMAR: Autoregressive Video Generation with Continuous Tokens
Hu Yu, Biao Gong, Hangjie Yuan et al.
A Smooth Sea Never Made a Skilled SAILOR: Robust Imitation via Learning to Search
Arnav Kumar Jain, Vibhakar Mohta, Subin Kim et al.
Composite Flow Matching for Reinforcement Learning with Shifted-Dynamics Data
Lingkai Kong, Haichuan Wang, Tonghan Wang et al.
Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning
Pengxiang Li, Zhi Gao, Bofei Zhang et al.
LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living
Dominick Reilly, Rajatsubhra Chakraborty, Arkaprava Sinha et al.
Finite-Sample Analysis of Policy Evaluation for Robust Average Reward Reinforcement Learning
Yang Xu, Washim Mondal, Vaneet Aggarwal
DAViD: Modeling Dynamic Affordance of 3D Objects Using Pre-trained Video Diffusion Models
Hyeonwoo Kim, Sangwon Baik, Hanbyul Joo
HouseCrafter: Lifting Floorplans to 3D Scenes with 2D Diffusion Models
YIWEN CHEN, Hieu Nguyen, Vikram Voleti et al.
CL-LoRA: Continual Low-Rank Adaptation for Rehearsal-Free Class-Incremental Learning
Jiangpeng He, Zhihao Duan, Fengqing Zhu
SeqGrowGraph: Learning Lane Topology as a Chain of Graph Expansions
Mengwei Xie, Shuang Zeng, Xinyuan Chang et al.
Web-Shepherd: Advancing PRMs for Reinforcing Web Agents
Hyungjoo Chae, Seonghwan Kim, Junhee Cho et al.
Multi-modal Knowledge Distillation-based Human Trajectory Forecasting
Jaewoo Jeong, Seohee Lee, Daehee Park et al.
GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks
Haoqiang Kang, Enna Sachdeva, Piyush Gupta et al.
Learned Image Compression with Hierarchical Progressive Context Modeling
Yuqi Li, Haotian Zhang, Li Li et al.
Mask^2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation
Tianhao Qi, Jianlong Yuan, Wanquan Feng et al.
MPS-Prover: Advancing Stepwise Theorem Proving by Multi-Perspective Search and Data Curation
Zhenwen Liang, Linfeng Song, Yang Li et al.
AnomalyNCD: Towards Novel Anomaly Class Discovery in Industrial Scenarios
Ziming Huang, Xurui Li, Haotian Liu et al.
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
Jinlong Li, Cristiano Saltori, Fabio Poiesi et al.
Visual Jenga: Discovering Object Dependencies via Counterfactual Inpainting
Anand Bhattad, Konpat Preechakul, Alexei Efros
Generative Zero-Shot Composed Image Retrieval
Lan Wang, Wei Ao, Vishnu Naresh Boddeti et al.
GC4NC: A Benchmark Framework for Graph Condensation on Node Classification with New Insights
Shengbo Gong, Juntong Ni, Noveen Sachdeva et al.
Always Skip Attention
Yiping Ji, Hemanth Saratchandran, Peyman Moghadam et al.
Extrapolation by Association: Length Generalization Transfer In Transformers
Ziyang Cai, Nayoung Lee, Avi Schwarzschild et al.
LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty
Christoforos N. Spartalis, Theodoros Semertzidis, Efstratios Gavves et al.
Token Activation Map to Visually Explain Multimodal LLMs
Yi Li, Hualiang Wang, Xinpeng Ding et al.
Linear Attention Modeling for Learned Image Compression
Donghui Feng, Zhengxue Cheng, Shen Wang et al.
Multi-Granularity Class Prototype Topology Distillation for Class-Incremental Source-Free Unsupervised Domain Adaptation
Peihua Deng, Jiehua Zhang, Xichun Sheng et al.
OpenGU: A Comprehensive Benchmark for Graph Unlearning
Bowen Fan, Yuming Ai, Xunkai Li et al.
H3D-DGS: Exploring Heterogeneous 3D Motion Representation for Deformable 3D Gaussian Splatting
Bing He, Yunuo Chen, Guo Lu et al.
PRE-Mamba: A 4D State Space Model for Ultra-High-Frequent Event Camera Deraining
Ciyu Ruan, Ruishan Guo, Zihang GONG et al.
RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model
Huiyang Hu, Peijin Wang, Hanbo Bi et al.
DA-VPT: Semantic-Guided Visual Prompt Tuning for Vision Transformers
Li Ren, Chen Chen, Liqiang Wang et al.
Token Embeddings Violate the Manifold Hypothesis
Michael Robinson, Sourya Dey, Tony Chiang
Training-Free Safe Denoisers for Safe Use of Diffusion Models
Mingyu Kim, Dongjun Kim, Amman Yusuf et al.
DOLLAR: Few-Step Video Generation via Distillation and Latent Reward Optimization
Zihan Ding, Chi Jin, Difan Liu et al.
DiC: Rethinking Conv3x3 Designs in Diffusion Models
Yuchuan Tian, Jing Han, Chengcheng Wang et al.
PEACE: Empowering Geologic Map Holistic Understanding with MLLMs
Yangyu Huang, Tianyi Gao, Haoran Xu et al.
InfiniDreamer: Arbitrarily Long Human Motion Generation via Segment Score Distillation
Wenjie Zhuo, Fan Ma, Hehe Fan
GaussRender: Learning 3D Occupancy with Gaussian Rendering
Loick Chambon, Eloi Zablocki, Alexandre Boulch et al.
TimE: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenarios
Shaohang Wei, Wei Li, Feifan Song et al.
Interactive Medical Image Analysis with Concept-based Similarity Reasoning
Ta Duc Huy, Sen Kim Tran, Phan Nguyen et al.
DNAEdit: Direct Noise Alignment for Text-Guided Rectified Flow Editing
Chenxi Xie, Minghan Li, Shuai Li et al.
Optical-Flow Guided Prompt Optimization for Coherent Video Generation
Hyelin Nam, Jaemin Kim, Dohun Lee et al.
CoTMR: Chain-of-Thought Multi-Scale Reasoning for Training-Free Zero-Shot Composed Image Retrieval
Zelong Sun, Dong Jing, Zhiwu Lu
Scalable Fingerprinting of Large Language Models
Anshul Nasery, Jonathan Hayase, Creston Brooks et al.
Erasing More Than Intended? How Concept Erasure Degrades the Generation of Non-Target Concepts
Ibtihel Amara, Ahmed Imtiaz Humayun, Ivana Kajic et al.
POp-GS: Next Best View in 3D-Gaussian Splatting with P-Optimality
Joey Wilson, Marcelino M. de Almeida, Sachit Mahajan et al.
Kinetics: Rethinking Test-Time Scaling Law
Ranajoy Sadhukhan, Zhuoming Chen, Haizhong Zheng et al.
Joint Out-of-Distribution Filtering and Data Discovery Active Learning
Sebastian Schmidt, Leonard Schenk, Leo Schwinn et al.
Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
Maximilian Beck, Korbinian Pöppel, Phillip Lippe et al.
Non-equilibrium Annealed Adjoint Sampler
Jaemoo Choi, Yongxin Chen, Molei Tao et al.
CacheQuant: Comprehensively Accelerated Diffusion Models
Xuewen Liu, Zhikai Li, Qingyi Gu
MEGA: Masked Generative Autoencoder for Human Mesh Recovery
Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda et al.
PhysSplat: Efficient Physics Simulation for 3D Scenes via MLLM-Guided Gaussian Splatting
Haoyu Zhao, Hao Wang, Xingyue Zhao et al.
Self-Cross Diffusion Guidance for Text-to-Image Synthesis of Similar Subjects
Weimin Qiu, Jieke Wang, Meng Tang
One-Step Diffusion for Detail-Rich and Temporally Consistent Video Super-Resolution
Yujing Sun, Lingchen Sun, Shuaizheng Liu et al.
Adversarial Paraphrasing: A Universal Attack for Humanizing AI-Generated Text
Yize Cheng, Vinu Sankar Sadasivan, Mehrdad Saberi et al.
ARM: Appearance Reconstruction Model for Relightable 3D Generation
Xiang Feng, Chang Yu, Zoubin Bi et al.
SemGeoMo: Dynamic Contextual Human Motion Generation with Semantic and Geometric Guidance
Peishan Cong, Ziyi Wang, Yuexin Ma et al.
Compression-Aware One-Step Diffusion Model for JPEG Artifact Removal
Jinpei Guo, Zheng Chen, Wenbo Li et al.
DISC: Dynamic Decomposition Improves LLM Inference Scaling
Jonathan Li, Wei Cheng, Benjamin Riviere et al.
AV-Flow: Transforming Text to Audio-Visual Human-like Interactions
Aggelina Chatziagapi, Louis-Philippe Morency, Hongyu Gong et al.
4KAgent: Agentic Any Image to 4K Super-Resolution
Yushen Zuo, Qi Zheng, Mingyang Wu et al.
Tartan IMU: A Light Foundation Model for Inertial Positioning in Robotics
Shibo Zhao, Sifan Zhou, Raphael Blanchard et al.
StateSpaceDiffuser: Bringing Long Context to Diffusion World Models
Nedko Savov, Naser Kazemi, Deheng Zhang et al.
Foundations of Top-$k$ Decoding for Language Models
Georgy Noarov, Soham Mallick, Tao Wang et al.
Hybrid Latent Reasoning via Reinforcement Learning
Zhenrui Yue, Bowen Jin, Huimin Zeng et al.
Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation
Bolin Lai, Felix Juefei-Xu, Miao Liu et al.
Enhancing Creative Generation on Stable Diffusion-based Models
Jiyeon Han, Dahee Kwon, Gayoung Lee et al.