Most Cited 2025 "js divergence" Papers
Broken Tokens? Your Language Model can Secretly Handle Non-Canonical Tokenizations
Brian Zheng, Alisa Liu, Orevaoghene Ahia et al.
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
Jiashuo Yu, Yue Wu, Meng Chu et al.
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
Tao Wang, Changxu Cheng, Lingfeng Wang et al.
Task Generalization with Autoregressive Compositional Structure: Can Learning from $D$ Tasks Generalize to $D^T$ Tasks?
Amirhesam Abedsoltan, Huaqing Zhang, Kaiyue Wen et al.
Relieving Universal Label Noise for Unsupervised Visible-Infrared Person Re-Identification by Inferring from Neighbors
Xiao Teng, Long Lan, Dingyao Chen et al.
MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement
Jaehyun Nam, Jinsung Yoon, Jiefeng Chen et al.
DPC: Dual-Prompt Collaboration for Tuning Vision-Language Models
Haoyang Li, Liang Wang, Chao Wang et al.
Compositional Risk Minimization
Divyat Mahajan, Mohammad Pezeshki, Charles Arnal et al.
Grounding Language with Vision: A Conditional Mutual Information Calibrated Decoding Strategy for Reducing Hallucinations in LVLMs
Hao Fang, Changle Zhou, Jiawei Kong et al.
MUSE: Mamba Is Efficient Multi-scale Learner for Text-video Retrieval
Haoran Tang, Meng Cao, Jinfa Huang et al.
From Specificity to Generality: Revisiting Generalizable Artifacts in Detecting Face Deepfakes
Long Ma, Zhiyuan Yan, Jin Xu et al.
Evaluating Large Language Models through Role-Guide and Self-Reflection: A Comparative Study
Lili Zhao, Yang Wang, Qi Liu et al.
DepthCues: Evaluating Monocular Depth Perception in Large Vision Models
Duolikun Danier, Mehmet Aygun, Changjian Li et al.
DELIFT: Data Efficient Language model Instruction Fine-Tuning
Ishika Agarwal, Krishnateja Killamsetty, Lucian Popa et al.
Do Visual Imaginations Improve Vision-and-Language Navigation Agents?
Akhil Perincherry, Jacob Krantz, Stefan Lee
TabFlex: Scaling Tabular Learning to Millions with Linear Attention
Yuchen Zeng, Tuan Dinh, Wonjun Kang et al.
ManiVideo: Generating Hand-Object Manipulation Video with Dexterous and Generalizable Grasping
Youxin Pang, Ruizhi Shao, Jiajun Zhang et al.
Neighbor Does Matter: Density-Aware Contrastive Learning for Medical Semi-supervised Segmentation
Feilong Tang, Zhongxing Xu, Ming Hu et al.
Combining Cost Constrained Runtime Monitors for AI Safety
Tim Hua, James Baskerville, Henri Lemoine et al.
CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians
Chongjian Ge, Chenfeng Xu, Yuanfeng Ji et al.
Chimera: Improving Generalist Model with Domain-Specific Experts
Tianshuo Peng, Mingsheng Li, Jiakang Yuan et al.
Hierarchical Vector Quantization for Unsupervised Action Segmentation
Federico Spurio, Emad Bahrami, Gianpiero Francesca et al.
RepLDM: Reprogramming Pretrained Latent Diffusion Models for High-Quality, High-Efficiency, High-Resolution Image Generation
Boyuan Cao, Jiaxin Ye, Yujie Wei et al.
HELM: Hierarchical Encoding for mRNA Language Modeling
Mehdi Yazdani-Jahromi, Mangal Prakash, Tommaso Mansi et al.
DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image
Qingxuan Wu, Zhiyang Dou, Sirui Xu et al.
Rapidly Adapting Policies to the Real-World via Simulation-Guided Fine-Tuning
Patrick Yin, Tyler Westenbroek, Ching-An Cheng et al.
DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data
Ruiqi Wu, Xinjie Wang, Liu Liu et al.
ActiveGAMER: Active GAussian Mapping through Efficient Rendering
Liyan Chen, Huangying Zhan, Kevin Chen et al.
Mimic In-Context Learning for Multimodal Tasks
Yuchu Jiang, Jiale Fu, Chenduo Hao et al.
BoA: Attention-aware Post-training Quantization without Backpropagation
Junhan Kim, Ho-young Kim, Eulrang Cho et al.
Multi-Focus Image Fusion via Explicit Defocus Blur Modelling
Yuhui Quan, Xi Wan, Zitao Tang et al.
See It from My Perspective: How Language Affects Cultural Bias in Image Understanding
Amith Ananthram, Elias Stengel-Eskin, Mohit Bansal et al.
Multi-Modal and Multi-Attribute Generation of Single Cells with CFGen
Alessandro Palma, Till Richter, Hanyi Zhang et al.
Seurat: From Moving Points to Depth
Seokju Cho, Gabriel Huang, Seungryong Kim et al.
TransPixeler: Advancing Text-to-Video Generation with Transparency
Luozhou Wang, Yijun Li, ZhiFei Chen et al.
A General Adaptive Dual-level Weighting Mechanism for Remote Sensing Pansharpening
Jie Huang, Haorui Chen, Jiaxuan Ren et al.
MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments
Matthieu Cord, Antonin Vobecky, Oriane Siméoni et al.
Lightweight Dataset Pruning without Full Training via Example Difficulty and Prediction Uncertainty
Yeseul Cho, Baekrok Shin, Changmin Kang et al.
iMoT: Inertial Motion Transformer for Inertial Navigation
Son Minh Nguyen, Duc Viet Le, Paul Havinga
MP-SfM: Monocular Surface Priors for Robust Structure-from-Motion
Zador Pataki, Paul-Edouard Sarlin, Johannes Schönberger et al.
ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos
Tanveer Hannan, Md Mohaiminul Islam, Jindong Gu et al.
From Debate to Equilibrium: Belief-Driven Multi-Agent LLM Reasoning via Bayesian Nash Equilibrium
Yi Xie, Zhanke Zhou, Chentao Cao et al.
PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment
Daiwei Chen, Yi Chen, Aniket Rege et al.
Progressive Compositionality in Text-to-Image Generative Models
Xu Han, Linghao Jin, Xiaofeng Liu et al.
SegLLM: Multi-round Reasoning Segmentation with Large Language Models
Xudong Wang, Shaolun Zhang, Shufan Li et al.
Ringmaster ASGD: The First Asynchronous SGD with Optimal Time Complexity
Artavazd Maranjyan, Alexander Tyurin, Peter Richtarik
Can Textual Gradient Work in Federated Learning?
Minghui Chen, Ruinan Jin, Wenlong Deng et al.
DORNet: A Degradation Oriented and Regularized Network for Blind Depth Super-Resolution
Zhengxue Wang, Zhiqiang Yan, Jinshan Pan et al.
ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation
Ali Athar, Xueqing Deng, Liang-Chieh Chen
Toward a Unified Theory of Gradient Descent under Generalized Smoothness
Alexander Tyurin
Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation
Junha Lee, Chunghyun Park, Jaesung Choe et al.
Adversarial Generative Flow Network for Solving Vehicle Routing Problems
Ni Zhang, Jingfeng Yang, Zhiguang Cao et al.
Breaking AR’s Sampling Bottleneck: Provable Acceleration via Diffusion Language Models
Gen Li, Changxiao Cai
REGENT: A Retrieval-Augmented Generalist Agent That Can Act In-Context in New Environments
Kaustubh Sridhar, Souradeep Dutta, Dinesh Jayaraman et al.
vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation
Bastian Wittmann, Yannick Wattenberg, Tamaz Amiranashvili et al.
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model
Benlin Liu, Yuhao Dong, Yiqin Wang et al.
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?
Simon Park, Abhishek Panigrahi, Yun Cheng et al.
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
Yuying Ge, Yizhuo Li, Yixiao Ge et al.
Decomposition Polyhedra of Piecewise Linear Functions
Marie-Charlotte Brandenburg, Moritz Grillo, Christoph Hertrich
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding
Zining Wang, Tongkun Guan, Pei Fu et al.
Offline Model-Based Optimization by Learning to Rank
Rong-Xi Tan, Ke Xue, Shen-Huan Lyu et al.
EdgeTAM: On-Device Track Anything Model
Chong Zhou, Chenchen Zhu, Yunyang Xiong et al.
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback
Zaid Khan, Elias Stengel-Eskin, Jaemin Cho et al.
MVREC: A General Few-shot Defect Classification Model Using Multi-View Region-Context
Shuai Lyu, Rongchen Zhang, Zeqi Ma et al.
DualOpt: A Dual Divide-and-Optimize Algorithm for the Large-scale Traveling Salesman Problem
Shipei Zhou, Yuandong Ding, Chi Zhang et al.
MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes
Ruijie Lu, Yixin Chen, Junfeng Ni et al.
Rendering-Aware Reinforcement Learning for Vector Graphics Generation
Juan Rodriguez, Haotian Zhang, Abhay Puri et al.
Thinking Racial Bias in Fair Forgery Detection: Models, Datasets and Evaluations
Decheng Liu, Zongqi Wang, Chunlei Peng et al.
Local-Prompt: Extensible Local Prompts for Few-Shot Out-of-Distribution Detection
Fanhu Zeng, Zhen Cheng, Fei Zhu et al.
DriveEditor: A Unified 3D Information-Guided Framework for Controllable Object Editing in Driving Scenes
Yiyuan Liang, Zhiying Yan, Liqun Chen et al.
Show and Segment: Universal Medical Image Segmentation via In-Context Learning
Yunhe Gao, Di Liu, Zhuowei Li et al.
Scent of Knowledge: Optimizing Search-Enhanced Reasoning with Information Foraging
Hongjin Qian, Zheng Liu
DreamPRM: Domain-reweighted Process Reward Model for Multimodal Reasoning
Qi Cao, Ruiyi Wang, Ruiyi Zhang et al.
EventSplat: 3D Gaussian Splatting from Moving Event Cameras for Real-time Rendering
Toshiya Yura, Ashkan Mirzaei, Igor Gilitschenski
Feature Denoising Diffusion Model for Blind Image Quality Assessment
Xudong Li, Yan Zhang, Yunhang Shen et al.
Near-Optimal Sample Complexity for MDPs via Anchoring
Jongmin Lee, Mario Bravo, Roberto Cominetti
Energy-based Backdoor Defense Against Federated Graph Learning
Guancheng Wan, Zitong Shi, Wenke Huang et al.
Is Your Video Language Model a Reliable Judge?
Ming Liu, Wensheng Zhang
AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation
Guanxing Lu, Tengbo Yu, Haoyuan Deng et al.
PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting
Cheng Zhang, Haofei Xu, Qianyi Wu et al.
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
Asmar Nadeem, Faegheh Sardari, Robert Dawes et al.
FixTalk: Taming Identity Leakage for High-Quality Talking Head Generation in Extreme Cases
Shuai Tan, Bill Gong, Bin Ji et al.
Gumbel Counterfactual Generation From Language Models
Shauli Ravfogel, Anej Svete, Vésteinn Snæbjarnarson et al.
Shortcuts and Identifiability in Concept-based Models from a Neuro-Symbolic Lens
Samuele Bortolotti, Emanuele Marconato, Paolo Morettin et al.
Superposition Yields Robust Neural Scaling
Yizhou Liu, Ziming Liu, Jeff Gore
MindAligner: Explicit Brain Functional Alignment for Cross-Subject Visual Decoding from Limited fMRI Data
Yuqin Dai, Zhouheng Yao, Chunfeng Song et al.
Towards Autonomous Micromobility through Scalable Urban Simulation
Wayne Wu, Honglin He, Chaoyuan Zhang et al.
Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space
Hyunjee Lee, Youngsik Yun, Jeongmin Bae et al.
Highly Compressed Tokenizer Can Generate Without Training
Lukas Lao Beyer, Tianhong Li, Xinlei Chen et al.
VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning
Ji Soo Lee, Jongha Kim, Jeehye Na et al.
Correcting Deviations from Normality: A Reformulated Diffusion Model for Multi-Class Unsupervised Anomaly Detection
Farzad Beizaee, Gregory A. Lodygensky, Christian Desrosiers et al.
Beyond Sequence: Impact of Geometric Context for RNA Property Prediction
Junjie Xu, Artem Moskalev, Tommaso Mansi et al.
Improved Finite-Particle Convergence Rates for Stein Variational Gradient Descent
Sayan Banerjee, Krishna Balasubramanian, Promit Ghosal
AudSemThinker: Enhancing Audio-Language Models Through Reasoning over Semantics of Sound
Gijs Wijngaard, Elia Formisano, Michele Esposito et al.
RBench-V: A Primary Assessment for Visual Reasoning Models with Multimodal Outputs
Meng-Hao Guo, Xuanyu Chu, Qianrui Yang et al.
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
Xin Ding, Hao Wu, Yifan Yang et al.
VideoAuteur: Towards Long Narrative Video Generation
Junfei Xiao, Feng Cheng, Lu Qi et al.
Multi-Granular Multimodal Clue Fusion for Meme Understanding
Li Zheng, Hao Fei, Ting Dai et al.
MAGE: Model-Level Graph Neural Networks Explanations via Motif-based Graph Generation
Zhaoning Yu, Hongyang Gao
APOLLO: Automated LLM and Lean Collaboration for Advanced Formal Reasoning
Azim Ospanov, Farzan Farnia, Roozbeh Yousefzadeh
Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification
Yang Qin, Chao Chen, Zhihang Fu et al.
DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation
Jisoo Kim, Jungbin Cho, Joonho Park et al.
Near, far: Patch-ordering enhances vision foundation models' scene understanding
Valentinos Pariza, Mohammadreza Salehi, Gertjan J Burghouts et al.
Assessing the Creativity of LLMs in Proposing Novel Solutions to Mathematical Problems
Junyi Ye, Jingyi Gu, Xinyun Zhao et al.
Distilling Structural Representations into Protein Sequence Models
Jeffrey Ouyang-Zhang, Chengyue Gong, Yue Zhao et al.
Understanding and Improving Length Generalization in Recurrent Models
Ricardo Buitrago Ruiz, Albert Gu
PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations
Benjamin Holzschuh, Qiang Liu, Georg Kohl et al.
MSE-Adapter: A Lightweight Plugin Endowing LLMs with the Capability to Perform Multimodal Sentiment Analysis and Emotion Recognition
Yang Yang, Xunde Dong, Yupeng Qiang
OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection
Zhongyu Xia, Jishuo Li, Zhiwei Lin et al.
Prioritized Generative Replay
Ren Wang, Kevin Frans, Pieter Abbeel et al.
InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment
Yunhong Lu, Qichao Wang, Hengyuan Cao et al.
Value-Based Deep RL Scales Predictably
Oleh Rybkin, Michal Nauman, Preston Fu et al.
InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct
Yutong Wu, Di Huang, Wenxuan Shi et al.
VRVVC: Variable-Rate NeRF-Based Volumetric Video Compression
Qiang Hu, Houqiang Zhong, Zihan Zheng et al.
Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition
Zheyang Xiong, Jack Cai, John Cooper et al.
FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling
Zhengqiang Zhang, Ruihuang Li, Lei Zhang
Distilling Monocular Foundation Model for Fine-grained Depth Completion
Yingping Liang, Yutao Hu, Wenqi Shao et al.
ReAttention: Training-Free Infinite Context with Finite Attention Scope
Xiaoran Liu, Ruixiao Li, Zhigeng Liu et al.
An Interpretable N-gram Perplexity Threat Model for Large Language Model Jailbreaks
Valentyn Boreiko, Alexander Panfilov, Václav Voráček et al.
Information-Driven Design of Imaging Systems
Henry Pinkard, Leyla Kabuli, Eric Markley et al.
NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields
Amandine Brunetto, Sascha Hornauer, Fabien Moutarde
VidTwin: Video VAE with Decoupled Structure and Dynamics
Yuchi Wang, Junliang Guo, Xinyi Xie et al.
WildFake: A Large-Scale and Hierarchical Dataset for AI-Generated Images Detection
Yan Hong, Jianming Feng, Haoxing Chen et al.
DiffCalib: Reformulating Monocular Camera Calibration as Diffusion-Based Dense Incident Map Generation
Xiankang He, Guangkai Xu, Bo Zhang et al.
HOPE for a Robust Parameterization of Long-memory State Space Models
Annan Yu, Michael W Mahoney, N. Benjamin Erichson
Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation
Jiantao Lin, Xin Yang, Meixi Chen et al.
Probability Density Geodesics in Image Diffusion Latent Space
Qingtao Yu, Jaskirat Singh, Zhaoyuan Yang et al.
Generative Inbetweening through Frame-wise Conditions-Driven Video Generation
Tianyi Zhu, Dongwei Ren, Qilong Wang et al.
TEncDM: Understanding the Properties of the Diffusion Model in the Space of Language Model Encodings
Alexander Shabalin, Viacheslav Meshchaninov, Egor Chimbulatov et al.
PatientSim: A Persona-Driven Simulator for Realistic Doctor-Patient Interactions
Daeun Kyung, Hyunseung Chung, Seongsu Bae et al.
REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning
Jihyun Lee, Weipeng Xu, Alexander Richard et al.
Preference-Guided Diffusion for Multi-Objective Offline Optimization
Yashas Annadani, Syrine Belakaria, Stefano Ermon et al.
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent
Bingrui Li, Wei Huang, Andi Han et al.
GENTEEL-NEGOTIATOR: LLM-Enhanced Mixture-of-Expert-Based Reinforcement Learning Approach for Polite Negotiation Dialogue
Priyanshu Priya, Rishikant Chigrupaatii, Mauajama Firdaus et al.
SMARTIES: Spectrum-Aware Multi-Sensor Auto-Encoder for Remote Sensing Images
Gencer Sumbul, Chang Xu, Emanuele Dalsasso et al.
Monet: Mixture of Monosemantic Experts for Transformers
Jungwoo Park, Young Jin Ahn, Kee-Eung Kim et al.
Aligned Datasets Improve Detection of Latent Diffusion-Generated Images
Anirudh Sundara Rajan, Utkarsh Ojha, Jedidiah Schloesser et al.
HaDeMiF: Hallucination Detection and Mitigation in Large Language Models
Xiaoling Zhou, Mingjie Zhang, Zhemg Lee et al.
(Mis)Fitting Scaling Laws: A Survey of Scaling Law Fitting Techniques in Deep Learning
Margaret Li, Sneha Kudugunta, Luke Zettlemoyer
Multi-Pair Temporal Sentence Grounding via Multi-Thread Knowledge Transfer Network
Xiang Fang, Wanlong Fang, Changshuo Wang et al.
Sample complexity of data-driven tuning of model hyperparameters in neural networks with structured parameter-dependent dual function
Maria-Florina Balcan, Anh Nguyen, Dravyansh Sharma
Depth-Centric Dehazing and Depth-Estimation from Real-World Hazy Driving Video
Junkai Fan, Kun Wang, Zhiqiang Yan et al.
Fine-structure Preserved Real-world Image Super-resolution via Transfer VAE Training
Qiaosi Yi, Shuai Li, Rongyuan Wu et al.
OS-ATLAS: Foundation Action Model for Generalist GUI Agents
Zhiyong Wu, Zhenyu Wu, Fangzhi Xu et al.
Fast Think-on-Graph: Wider, Deeper and Faster Reasoning of Large Language Model on Knowledge Graph
Xujian Liang, Zhaoquan Gu
Spectral-Refiner: Accurate Fine-Tuning of Spatiotemporal Fourier Neural Operator for Turbulent Flows
Shuhao Cao, Francesco Brarda, Ruipeng Li et al.
Neural Approximate Mirror Maps for Constrained Diffusion Models
Berthy Feng, Ricardo Baptista, Katherine Bouman
LOMA: Language-assisted Semantic Occupancy Network via Triplane Mamba
Yubo Cui, Zhiheng Li, Jiaqiang Wang et al.
MultiGO: Towards Multi-level Geometry Learning for Monocular 3D Textured Human Reconstruction
Gangjian Zhang, Nanjie Yao, Shunsi Zhang et al.
Breaking Free from MMI: A New Frontier in Rationalization by Probing Input Utilization
Wei Liu, Zhiying Deng, Zhongyu Niu et al.
Knowledge Distillation with Refined Logits
Wujie Sun, Defang Chen, Siwei Lyu et al.
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation
Gao Peng, Le Zhuo, Dongyang Liu et al.
Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness
Rongzhe Wei, Peizhi Niu, Hans Hao-Hsun Hsu et al.
SPARTAN: A Sparse Transformer World Model Attending to What Matters
Anson Lei, Bernhard Schölkopf, Ingmar Posner
Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification
Jiaxiang Gou, Luping Ji, Pei Liu et al.
Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events
Aditya Chinchure, Sahithya Ravi, Raymond Ng et al.
M3Net: Multimodal Multi-task Learning for 3D Detection, Segmentation, and Occupancy Prediction in Autonomous Driving
Xuesong Chen, Shaoshuai Shi, Tao Ma et al.
Does Editing Provide Evidence for Localization?
Zihao Wang, Victor Veitch
An End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models
Wentao Qu, Jing Wang, Yongshun Gong et al.
LLM+AL: Bridging Large Language Models and Action Languages for Complex Reasoning About Actions
Adam Ishay, Joohyung Lee
Measuring what Matters: Construct Validity in Large Language Model Benchmarks
Andrew M. Bean, Ryan Othniel Kearns, Angelika Romanou et al.
A Training-Free Sub-quadratic Cost Transformer Model Serving Framework with Hierarchically Pruned Attention
Heejun Lee, Geon Park, Youngwan Lee et al.
RNG: Relightable Neural Gaussians
Jiahui Fan, Fujun Luan, Jian Yang et al.
Cross-View Referring Multi-Object Tracking
Sijia Chen, En Yu, Wenbing Tao
U-REPA: Aligning Diffusion U-Nets to ViTs
Yuchuan Tian, Hanting Chen, Mengyu Zheng et al.
Learning Safety Constraints for Large Language Models
Xin Chen, Yarden As, Andreas Krause
CustomContrast: A Multilevel Contrastive Perspective for Subject-Driven Text-to-Image Customization
Nan Chen, Mengqi Huang, Zhuowei Chen et al.
SimulPL: Aligning Human Preferences in Simultaneous Machine Translation
Donglei Yu, Yang Zhao, Jie Zhu et al.
Simulation-Free Hierarchical Latent Policy Planning for Proactive Dialogues
Tao He, Lizi Liao, Yixin Cao et al.
ChainHOI: Joint-based Kinematic Chain Modeling for Human-Object Interaction Generation
Ling-An Zeng, Guohong Huang, Yi-Lin Wei et al.
DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
Jinwei Yao, Kaiqi Chen, Kexun Zhang et al.
BrainUICL: An Unsupervised Individual Continual Learning Framework for EEG Applications
Yangxuan Zhou, Sha Zhao, Jiquan Wang et al.
Generative Multiview Relighting for 3D Reconstruction under Extreme Illumination Variation
Hadi Alzayer, Philipp Henzler, Jonathan T. Barron et al.
ForestFormer3D: A Unified Framework for End-to-End Segmentation of Forest LiDAR 3D Point Clouds
Binbin Xiang, Maciej Wielgosz, Stefano Puliti et al.
SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes
Yuji Wang, Haoran Xu, Yong Liu et al.
AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation
Jingkun An, Yinghao Zhu, Zongjian Li et al.
In-Context Deep Learning via Transformer Models
Weimin Wu, Maojiang Su, Jerry Yao-Chieh Hu et al.
Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control
Hejia Chen, Haoxian Zhang, Shoulong Zhang et al.
ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering
Yuki Imajuku, Kohki Horie, Yoichi Iwata et al.
Progressive distillation induces an implicit curriculum
Abhishek Panigrahi, Bingbin Liu, Sadhika Malladi et al.
UAVScenes: A Multi-Modal Dataset for UAVs
Sijie Wang, Siqi Li, Yawei Zhang et al.
DRESSing Up LLM: Efficient Stylized Question-Answering via Style Subspace Editing
Xinyu Ma, Yifeng Xu, Yang Lin et al.
Decentralized Diffusion Models
David McAllister, Matthew Tancik, Jiaming Song et al.
AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws
Oren Neumann, Claudius Gros
CP-Guard: Malicious Agent Detection and Defense in Collaborative Bird’s Eye View Perception
Senkang Hu, Yihang Tao, Guowen Xu et al.
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Generation
Zheng Anlin, Xin Wen, Xuanyang Zhang et al.
Towards Accurate Binary Spiking Neural Networks: Learning with Adaptive Gradient Modulation Mechanism
Yu Liang, Wenjie Wei, Ammar Belatreche et al.
GVMGen: A General Video-to-Music Generation Model with Hierarchical Attentions
Heda Zuo, Weitao You, Junxian Wu et al.
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
Zichen Wen, Shaobo Wang, Yufa Zhou et al.
MolSpectra: Pre-training 3D Molecular Representation with Multi-modal Energy Spectra
Liang Wang, Shaozhen Liu, Yu Rong et al.
ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Predictions
Dubing Chen, Jin Fang, Wencheng Han et al.
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions
Yekun Chai, Haoran Sun, Huang Fang et al.
MalCL: Leveraging GAN-Based Generative Replay to Combat Catastrophic Forgetting in Malware Classification
Jimin Park, AHyun Ji, Minji Park et al.
MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining
Yunze Liu, Li Yi
Asynchronous Federated Clustering with Unknown Number of Clusters
Yunfan Zhang, Yiqun Zhang, Yang Lu et al.
Residual-MPPI: Online Policy Customization for Continuous Control
Pengcheng Wang, Chenran Li, Catherine Weaver et al.
Compression via Pre-trained Transformers: A Study on Byte-Level Multimodal Data
David Heurtel-Depeiges, Anian Ruoss, Joel Veness et al.
BiDeV: Bilateral Defusing Verification for Complex Claim Fact-Checking
Yuxuan Liu, Hongda Sun, Wenya Guo et al.
GeoSplatting: Towards Geometry Guided Gaussian Splatting for Physically-based Inverse Rendering
Kai Ye, Chong Gao, Guanbin Li et al.
Hotspot-Driven Peptide Design via Multi-Fragment Autoregressive Extension
Jiahan Li, Tong Chen, Shitong Luo et al.