Most Cited 2025 "style-content separation" Papers
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models
Dewei Zhou, Mingwei Li, Zongxin Yang et al.
Streamlining Redundant Layers to Compress Large Language Models
Xiaodong Chen, Yuxuan Hu, Jing Zhang et al.
Multi-Domain Graph Foundation Models: Robust Knowledge Transfer via Topology Alignment
Shuo Wang, Bokui Wang, Zhixiang Shen et al.
Logically Consistent Language Models via Neuro-Symbolic Integration
Diego Calanzone, Stefano Teso, Antonio Vergari
Generating Multi-Image Synthetic Data for Text-to-Image Customization
Nupur Kumari, Xi Yin, Jun-Yan Zhu et al.
FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution
Junyang Chen, Jinshan Pan, Jiangxin Dong
CoTFormer: A Chain of Thought Driven Architecture with Budget-Adaptive Computation Cost at Inference
Amirkeivan Mohtashami, Matteo Pagliardini, Martin Jaggi
ET-SEED: Efficient Trajectory-Level SE(3) Equivariant Diffusion Policy
Chenrui Tie, Yue Chen, Ruihai Wu et al.
Logical Consistency of Large Language Models in Fact-Checking
Bishwamittra Ghosh, Sarah Hasan, Naheed Anjum Arafat et al.
AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration
Andy Zhou, Kevin Wu, Francesco Pinto et al.
TimeDP: Learning to Generate Multi-Domain Time Series with Domain Prompts
Yu-Hao Huang, Chang Xu, Yueying Wu et al.
RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers
Yan Gong, Yiren Song, Yicheng Li et al.
GaussianFlowOcc: Sparse and Weakly Supervised Occupancy Estimation using Gaussian Splatting and Temporal Flow
Simon Boeder, Fabian Gigengack, Benjamin Risse
ILIAS: Instance-Level Image retrieval At Scale
Giorgos Kordopatis-Zilos, Vladan Stojnić, Anna Manko et al.
PuzzleFusion++: Auto-agglomerative 3D Fracture Assembly by Denoise and Verify
Zhengqing Wang, Jiacheng Chen, Yasutaka Furukawa
Dynamic Camera Poses and Where to Find Them
Chris Rockwell, Joseph Tung, Tsung-Yi Lin et al.
SOLVE: Synergy of Language-Vision and End-to-End Networks for Autonomous Driving
Xuesong Chen, Linjiang Huang, Tao Ma et al.
RoboScape: Physics-informed Embodied World Model
Yu Shang, Xin Zhang, Yinzhou Tang et al.
Spiking Vision Transformer with Saccadic Attention
Shuai Wang, Malu Zhang, Dehao Zhang et al.
Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
Yiran Guo, Lijie Xu, Jie Liu et al.
From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications
Ajay Jaiswal, Yifan Wang, Lu Yin et al.
Revisiting Nearest Neighbor for Tabular Data: A Deep Tabular Baseline Two Decades Later
Han-Jia Ye, Huai-Hong Yin, De-Chuan Zhan et al.
DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation
Changdae Oh, Yixuan Li, Kyungwoo Song et al.
Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach
Jiancong Xiao, Bojian Hou, Zhanliang Wang et al.
FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model
Jun Zhou, Jiahao Li, Zunnan Xu et al.
Stochastic Deep Restoration Priors for Imaging Inverse Problems
Yuyang Hu, Albert Peng, Weijie Gan et al.
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Zhaolin Gao, Wenhao Zhan, Jonathan Chang et al.
Thermalizer: Stable autoregressive neural emulation of spatiotemporal chaos
Chris Pedersen, Laure Zanna, Joan Bruna
MangaNinja: Line Art Colorization with Precise Reference Following
Zhiheng Liu, Ka Leong Cheng, Xi Chen et al.
Is Artificial Intelligence Generated Image Detection a Solved Problem?
Ziqiang Li, Jiazhen Yan, Ziwen He et al.
Breaking the Low-Rank Dilemma of Linear Attention
Qihang Fan, Huaibo Huang, Ran He
Personalized Federated Learning for Spatio-Temporal Forecasting: A Dual Semantic Alignment-Based Contrastive Approach
Qingxiang Liu, Sheng Sun, Yuxuan Liang et al.
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
Sagnik Mukherjee, Lifan Yuan, Dilek Hakkani-Tur et al.
Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs
Sungmin Cha, Sungjun Cho, Dasol Hwang et al.
One-for-All Few-Shot Anomaly Detection via Instance-Induced Prompt Learning
Wenxi Lv, Qinliang Su, Wenchao Xu
Black-Box Detection of Language Model Watermarks
Thibaud Gloaguen, Nikola Jovanović, Robin Staab et al.
VMBench: A Benchmark for Perception-Aligned Video Motion Generation
Xinran Ling, Chen Zhu, Meiqi Wu et al.
Erwin: A Tree-based Hierarchical Transformer for Large-scale Physical Systems
Maksim Zhdanov, Max Welling, Jan-Willem van de Meent
Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization
Zichen Miao, Zhengyuan Yang, Kevin Lin et al.
Dynamic Gaussians Mesh: Consistent Mesh Reconstruction from Dynamic Scenes
Isabella Liu, Hao Su, Xiaolong Wang
TimeKAN: KAN-based Frequency Decomposition Learning Architecture for Long-term Time Series Forecasting
Songtao Huang, Zhen Zhao, Can Li et al.
Wasserstein Flow Matching: Generative Modeling Over Families of Distributions
Doron Haviv, Aram-Alexandre Pooladian, Dana Pe'er et al.
BillBoard Splatting (BBSplat): Learnable Textured Primitives for Novel View Synthesis
David Svitov, Pietro Morerio, Lourdes Agapito et al.
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models
Dhouib Mohamed, Davide Buscaldi, Vanier Sonia et al.
UNSURE: self-supervised learning with Unknown Noise level and Stein's Unbiased Risk Estimate
Julián Tachella, Mike Davies, Laurent Jacques
IncEventGS: Pose-Free Gaussian Splatting from a Single Event Camera
Jian Huang, Chengrui Dong, Xuanhua Chen et al.
AIR-BENCH 2024: A Safety Benchmark based on Regulation and Policies Specified Risk Categories
Yi Zeng, Yu Yang, Andy Zhou et al.
AutoToM: Scaling Model-based Mental Inference via Automated Agent Modeling
Zhining Zhang, Chuanyang Jin, Mung Yao Jia et al.
Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards
Charles Arnal, Gaëtan Narozniak, Vivien Cabannes et al.
CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images
Chen Cheng, Jiacheng Wei, Tianrun Chen et al.
MANTA: A Large-Scale Multi-View and Visual-Text Anomaly Detection Dataset for Tiny Objects
Lei Fan, Dongdong Fan, Zhiguang Hu et al.
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He, Qihang Yu, Qihao Liu et al.
SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes
Cheng-De Fan, Chen-Wei Chang, Yi-Ruei Liu et al.
Two-stream Beats One-stream: Asymmetric Siamese Network for Efficient Visual Tracking
Jiawen Zhu, Huayi Tang, Xin Chen et al.
AdaGrad under Anisotropic Smoothness
Yuxing Liu, Rui Pan, Tong Zhang
Interpreting the linear structure of vision-language model embedding spaces
Isabel Papadimitriou, Huangyuan Su, Thomas Fel et al.
An Empirical Analysis of Uncertainty in Large Language Model Evaluations
Qiujie Xie, Qingqiu Li, Zhuohao Yu et al.
ActionPiece: Contextually Tokenizing Action Sequences for Generative Recommendation
Yupeng Hou, Jianmo Ni, Zhankui He et al.
Presto! Distilling Steps and Layers for Accelerating Music Generation
Zachary Novack, Ge Zhu, Jonah Casebeer et al.
SITCOM: Step-wise Triple-Consistent Diffusion Sampling For Inverse Problems
Ismail Alkhouri, Shijun Liang, Cheng-Han Huang et al.
Can A Society of Generative Agents Simulate Human Behavior and Inform Public Health Policy? A Case Study on Vaccine Hesitancy
Abe Bohan Hou, Hongru Du, Yichen Wang et al.
Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization
XiangCheng Zhang, Fang Kong, Baoxiang Wang et al.
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Julie Kallini, Shikhar Murty, Christopher Manning et al.
Efficient Track Anything
Yunyang Xiong, Chong Zhou, Xiaoyu Xiang et al.
Toward Understanding In-context vs. In-weight Learning
Bryan Chan, Xinyi Chen, Andras Gyorgy et al.
Diffusion on Language Model Encodings for Protein Sequence Generation
Viacheslav Meshchaninov, Pavel Strashnov, Andrey Shevtsov et al.
Learning to Generate Unit Tests for Automated Debugging
Archiki Prasad, Elias Stengel-Eskin, Justin Chen et al.
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning
Xiao Liang, Zhong-Zhi Li, Yeyun Gong et al.
When Large Vision-Language Model Meets Large Remote Sensing Imagery: Coarse-to-Fine Text-Guided Token Pruning
Junwei Luo, Yingying Zhang, Xue Yang et al.
CofCA: A Step-wise Counterfactual Multi-hop QA Benchmark
Jian Wu, Linyi Yang, Zhen Wang et al.
Optimizing Temperature for Language Models with Multi-Sample Inference
Weihua Du, Yiming Yang, Sean Welleck
KVTuner: Sensitivity-Aware Layer-Wise Mixed-Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
Xing Li, Zeyu Xing, Yiming Li et al.
UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving
Yuping Wang, Xiangyu Huang, Xiaokang Sun et al.
Refine Knowledge of Large Language Models via Adaptive Contrastive Learning
Yinghui Li, Haojing Huang, Jiayi Kuang et al.
MVSAnywhere: Zero-Shot Multi-View Stereo
Sergio Izquierdo, Mohamed Sayed, Michael Firman et al.
Explore In-Context Segmentation via Latent Diffusion Models
Chaoyang Wang, Xiangtai Li, Henghui Ding et al.
Optimal transport-based conformal prediction
Gauthier Thurin, Kimia Nadjahi, Claire Boyer
Probabilistic Language-Image Pre-Training
Sanghyuk Chun, Wonjae Kim, Song Park et al.
Provably Accurate Shapley Value Estimation via Leverage Score Sampling
Christopher Musco, R. Teal Witter
Quantized Spike-driven Transformer
Xuerui Qiu, Malu Zhang, Jieyuan Zhang et al.
To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning
Tian Qin, David Alvarez-Melis, Samy Jelassi et al.
R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning
Lijun Sheng, Jian Liang, Zilei Wang et al.
Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks
Hongyuan Tao, Ying Zhang, Zhenhao Tang et al.
Unified Parameter-Efficient Unlearning for LLMs
Chenlu Ding, Jiancan Wu, Yancheng Yuan et al.
Backdoor Attacks on Dense Retrieval via Public and Unintentional Triggers
Quanyu Long, Yue Deng, Leilei Gan et al.
LONG3R: Long Sequence Streaming 3D Reconstruction
Zhuoguang Chen, Minghui Qin, Tianyuan Yuan et al.
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis
Jiatao Gu, Tianrong Chen, David Berthelot et al.
FlowDec: A flow-based full-band general audio codec with high perceptual quality
Simon Welker, Matthew Le, Ricky T. Q. Chen et al.
AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis
Khiem Vuong, Anurag Ghosh, Deva Ramanan et al.
UGMathBench: A Diverse and Dynamic Benchmark for Undergraduate-Level Mathematical Reasoning with Large Language Models
Xin Xu, Jiaxin Zhang, Tianhao Chen et al.
Can Transformers Learn Full Bayesian Inference in Context?
Arik Reuter, Tim G. J. Rudner, Vincent Fortuin et al.
Samba: Synchronized Set-of-Sequences Modeling for Multiple Object Tracking
Mattia Segu, Luigi Piccinelli, Siyuan Li et al.
Pippo: High-Resolution Multi-View Humans from a Single Image
Yash Kant, Ethan Weber, Jin Kyu Kim et al.
Patch-level Sounding Object Tracking for Audio-Visual Question Answering
Zhangbin Li, Jinxing Zhou, Jing Zhang et al.
DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling
Xin Xie, Dong Gong
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
Weijia Wu, Mingyu Liu, Zeyu Zhu et al.
Robust Function-Calling for On-Device Language Model via Function Masking
Qiqiang Lin, Muning Wen, Qiuying Peng et al.
Provable weak-to-strong generalization via benign overfitting
David Wu, Anant Sahai
HiLo: A Learning Framework for Generalized Category Discovery Robust to Domain Shifts
Hongjun Wang, Sagar Vaze, Kai Han
Joint Velocity-Growth Flow Matching for Single-Cell Dynamics Modeling
Dongyi Wang, Yuanwei Jiang, Zhenyi Zhang et al.
Reversible Decoupling Network for Single Image Reflection Removal
Hao Zhao, Mingjia Li, Qiming Hu et al.
FaceShot: Bring Any Character into Life
Junyao Gao, Yanan Sun, Fei Shen et al.
Position: Don't Use the CLT in LLM Evals With Fewer Than a Few Hundred Datapoints
Sam Bowyer, Laurence Aitchison, Desi Ivanova
GigaHands: A Massive Annotated Dataset of Bimanual Hand Activities
Rao Fu, Dingxi Zhang, Alex Jiang et al.
Personalized Preference Fine-tuning of Diffusion Models
Meihua Dang, Anikait Singh, Linqi Zhou et al.
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface
Hao Tang, Chen-Wei Xie, Haiyang Wang et al.
Bridging Modalities: Improving Universal Multimodal Retrieval by Multimodal Large Language Models
Xin Zhang, Yanzhao Zhang, Wen Xie et al.
Video Diffusion Models Are Strong Video Inpainter
Minhyeok Lee, Suhwan Cho, Chajin Shin et al.
NitroFusion: High-Fidelity Single-Step Diffusion through Dynamic Adversarial Training
Dar-Yen Chen, Hmrishav Bandyopadhyay, Kai Zou et al.
IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation
Yiren Song, Pei Yang, Hai Ci et al.
E-Valuating Classifier Two-Sample Tests
Tim Bakker, Christian A. Naesseth, Patrick Forré et al.
Assessing and Learning Alignment of Unimodal Vision and Language Models
Le Zhang, Qian Yang, Aishwarya Agrawal
X-Dyna: Expressive Dynamic Human Image Animation
Di Chang, Hongyi Xu, You Xie et al.
DRAWER: Digital Reconstruction and Articulation With Environment Realism
Hongchi Xia, Entong Su, Marius Memmel et al.
Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting
Runsong Zhu, Shi Qiu, Zhengzhe Liu et al.
Unleashing Vecset Diffusion Model for Fast Shape Generation
Zeqiang Lai, Zhao Yunfei, Zibo Zhao et al.
WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models
Shengda Fan, Xin Cong, Yuepeng Fu et al.
DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes
Jinxiu Liu, Shaoheng Lin, Yinxiao Li et al.
FunBO: Discovering Acquisition Functions for Bayesian Optimization with FunSearch
Virginia Aglietti, Ira Ktena, Jessica Schrouff et al.
Know "No" Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP
Junsung Park, Jungbeom Lee, Jongyoon Song et al.
Layout-your-3D: Controllable and Precise 3D Generation with 2D Blueprint
Junwei Zhou, Xueting Li, Lu Qi et al.
SAIST: Segment Any Infrared Small Target Model Guided by Contrastive Language-Image Pretraining
Mingjin Zhang, Xiaolong Li, Fei Gao et al.
CircuitFusion: Multimodal Circuit Representation Learning for Agile Chip Design
Wenji Fang, Shang Liu, Jing Wang et al.
Knowledge Editing with Dynamic Knowledge Graphs for Multi-Hop Question Answering
Yifan Lu, Yigeng Zhou, Jing Li et al.
Implicit Search via Discrete Diffusion: A Study on Chess
Jiacheng Ye, Zhenyu Wu, Jiahui Gao et al.
Weighted-Reward Preference Optimization for Implicit Model Fusion
Ziyi Yang, Fanqi Wan, Longguang Zhong et al.
Pitfalls of Evidence-Based AI Policy
Stephen Casper, David Krueger, Dylan Hadfield-Menell
A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules
Kairong Luo, Haodong Wen, Shengding Hu et al.
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research
James Burgess, Jeffrey J Nirschl, Laura Bravo-Sánchez et al.
Multi-Turn Jailbreaking Large Language Models via Attention Shifting
Xiaohu Du, Fan Mo, Ming Wen et al.
Implicit Reward as the Bridge: A Unified View of SFT and DPO Connections
Bo Wang, Qinyuan Cheng, Runyu Peng et al.
Docopilot: Improving Multimodal Models for Document-Level Understanding
Yuchen Duan, Zhe Chen, Yusong Hu et al.
Towards a Theoretical Understanding of Synthetic Data in LLM Post-Training: A Reverse-Bottleneck Perspective
Zeyu Gan, Yong Liu
Dissecting and Mitigating Diffusion Bias via Mechanistic Interpretability
Yingdong Shi, Changming Li, Yifan Wang et al.
HGSFusion: Radar-Camera Fusion with Hybrid Generation and Synchronization for 3D Object Detection
Zijian Gu, Jianwei Ma, Yan Huang et al.
Synthetic-to-Real Self-supervised Robust Depth Estimation via Learning with Motion and Structure Priors
Weilong Yan, Ming Li, Li Haipeng et al.
Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models
Shicheng Xu, Liang Pang, Yunchang Zhu et al.
DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing
William June Suk Choi, Kyungmin Lee, Jongheon Jeong et al.
Mixture of Parrots: Experts improve memorization more than reasoning
Samy Jelassi, Clara Mohri, David Brandfonbrener et al.
AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks
Zekang Yang, Wang Zeng, Sheng Jin et al.
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
Lucas Bandarkar, Benjamin Muller, Pritish Yuvraj et al.
CO-SPY: Combining Semantic and Pixel Features to Detect Synthetic Images by AI
Siyuan Cheng, Lingjuan Lyu, Zhenting Wang et al.
Nested Learning: The Illusion of Deep Learning Architectures
Ali Behrouz, Meisam Razaviyayn, Peilin Zhong et al.
Active Data Curation Effectively Distills Large-Scale Multimodal Models
Vishaal Udandarao, Nikhil Parthasarathy, Muhammad Ferjad Naeem et al.
SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models
Haotian Xia, Zhengbang Yang, Junbo Zou et al.
DPCore: Dynamic Prompt Coreset for Continual Test-Time Adaptation
Yunbei Zhang, Akshay Mehra, Shuaicheng Niu et al.
NEST: A Neuromodulated Small-world Hypergraph Trajectory Prediction Model for Autonomous Driving
Chengyue Wang, Haicheng Liao, Bonan Wang et al.
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation
Ziyang Luo, Haoning Wu, Dongxu Li et al.
Spike2Former: Efficient Spiking Transformer for High-performance Image Segmentation
Zhenxin Lei, Man Yao, Jiakui Hu et al.
SQLFixAgent: Towards Semantic-Accurate Text-to-SQL Parsing via Consistency-Enhanced Multi-Agent Collaboration
Jipeng Cen, Jiaxin Liu, Zhixu Li et al.
SimpleTM: A Simple Baseline for Multivariate Time Series Forecasting
Hui Chen, Viet Luong, Lopamudra Mukherjee et al.
Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation
Slava Elizarov, Ciara Rowles, Simon Donné
Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts
Minh Le, Chau Nguyen, Huy Nguyen et al.
Weak-to-Strong Generalization Through the Data-Centric Lens
Changho Shin, John Cooper, Frederic Sala
BingoGuard: LLM Content Moderation Tools with Risk Levels
Fan Yin, Philippe Laban, Xiangyu Peng et al.
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
Tiehan Fan, Kepan Nan, Rui Xie et al.
Deep Distributed Optimization for Large-Scale Quadratic Programming
Augustinos Saravanos, Hunter Kuperman, Alex Oshin et al.
A Second-Order Perspective on Model Compositionality and Incremental Learning
Angelo Porrello, Lorenzo Bonicelli, Pietro Buzzega et al.
The Power of LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions
Stefan Sylvius Wagner, Maike Behrendt, Marc Ziegele et al.
SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models
Zilan Wang, Junfeng Guo, Jiacheng Zhu et al.
Overcoming Lower-Level Constraints in Bilevel Optimization: A Novel Approach with Regularized Gap Functions
Wei Yao, Haian Yin, Shangzhi Zeng et al.
Towards Universal Soccer Video Understanding
Jiayuan Rao, Haoning Wu, Hao Jiang et al.
Retrieval-Augmented Dynamic Prompt Tuning for Incomplete Multimodal Learning
Jian Lang, Zhangtao Cheng, Ting Zhong et al.
SkillMimic: Learning Basketball Interaction Skills from Demonstrations
Yinhuai Wang, Qihan Zhao, Runyi Yu et al.
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
Wenhong Zhu, Zhiwei He, Xiaofeng Wang et al.
Block-Attention for Efficient Prefilling
Dongyang Ma, Yan Wang, Tian Lan
Hybrid-Level Instruction Injection for Video Token Compression in Multi-modal Large Language Models
Zhihang Liu, Chen-Wei Xie, Pandeng Li et al.
Toward Adaptive Large Language Models Structured Pruning via Hybrid-grained Weight Importance Assessment
Jun Liu, Zhenglun Kong, Pu Zhao et al.
OmniSync: Towards Universal Lip Synchronization via Diffusion Transformers
Ziqiao Peng, Jiwen Liu, Haoxian Zhang et al.
DPU: Dynamic Prototype Updating for Multimodal Out-of-Distribution Detection
Li Li, Huixian Gong, Hao Dong et al.
RelGNN: Composite Message Passing for Relational Deep Learning
Tianlang Chen, Charilaos Kanatsoulis, Jure Leskovec
MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents
Yanqi Dai, Huanran Hu, Lei Wang et al.
Large Language Model Meets Graph Neural Network in Knowledge Distillation
Shengxiang Hu, Guobing Zou, Song Yang et al.
Inference-Time Hyper-Scaling with KV Cache Compression
Adrian Łańcucki, Konrad Staniszewski, Piotr Nawrot et al.
Optimization with Access to Auxiliary Information
El Mahdi Chayti, Sai Karimireddy
NFIG: Multi-Scale Autoregressive Image Generation via Frequency Ordering
Zhihao Huang, Xi Qiu, Yukuo Ma et al.
Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks
Haijin Zeng, Xiangming Wang, Yongyong Chen et al.
BotSim: LLM-Powered Malicious Social Botnet Simulation
Boyu Qiao, Kun Li, Wei Zhou et al.
Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video
David Yifan Yao, Albert J. Zhai, Shenlong Wang
Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues
Youngjoon Jang, Haran Raajesh, Liliane Momeni et al.
Geolocation Representation from Large Language Models Are Generic Enhancers for Spatio-Temporal Learning
Junlin He, Tong Nie, Wei Ma
Retrieving Semantics from the Deep: an RAG Solution for Gesture Synthesis
M. Hamza Mughal, Rishabh Dabral, Merel CJ Scholman et al.
Semantic Convergence: Harmonizing Recommender Systems via Two-Stage Alignment and Behavioral Semantic Tokenization
Guanghan Li, Xun Zhang, Yufei Zhang et al.
Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation
Zhi Cen, Huaijin Pi, Sida Peng et al.
Adaptive Classifier-Free Guidance via Dynamic Low-Confidence Masking
Pengxiang Li, Shilin Yan, Jiayin Cai et al.
REArtGS: Reconstructing and Generating Articulated Objects via 3D Gaussian Splatting with Geometric and Motion Constraints
Di Wu, Liu Liu, Zhou Linli et al.
Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos
Chiara Plizzari, Alessio Tonioni, Yongqin Xian et al.
Mechanistic Permutability: Match Features Across Layers
Nikita Balagansky, Ian Maksimov, Daniil Gavrilov
Video Anomaly Detection with Motion and Appearance Guided Patch Diffusion Model
Hang Zhou, Jiale Cai, Yuteng Ye et al.
ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning
Shulin Huang, Linyi Yang, Yan Song et al.
4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos
Zhen Xu, Zhengqin Li, Zhao Dong et al.
ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement
Xiangyu Peng, Congying Xia, Xinyi Yang et al.
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
Yana Wei, Liang Zhao, Jianjian Sun et al.
OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics
Vineeth Dorna, Anmol Mekala, Wenlong Zhao et al.
Online Reasoning Video Segmentation with Just-in-Time Digital Twins
Yiqing Shen, Bohan Liu, Chenjia Li et al.
Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning
Zenan Li, Zhaoyu Li, Wen Tang et al.
JamMa: Ultra-lightweight Local Feature Matching with Joint Mamba
Xiaoyong Lu, Songlin Du
Beyond Canonicalization: How Tensorial Messages Improve Equivariant Message Passing
Peter Lippmann, Gerrit Gerhartz, Roman Remme et al.
MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks
Yinghao Zhu, Ziyi He, Haoran Hu et al.
Low-Light Image Enhancement via Generative Perceptual Priors
Han Zhou, Wei Dong, Xiaohong Liu et al.