Most Cited 2025 "visual distractors" Papers
22,274 papers found • Page 15 of 112
Conference
Preference Optimization on Pareto Sets: On a Theory of Multi-Objective Optimization
Abhishek Roy, Geelon So, Yian Ma
VTDexManip: A Dataset and Benchmark for Visual-tactile Pretraining and Dexterous Manipulation with Reinforcement Learning
Qingtao Liu, Yu Cui, Zhengnan Sun et al.
DELTA: Pre-Train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment
Haitao Li, Qingyao Ai, Xinyan Han et al.
ViSpeak: Visual Instruction Feedback in Streaming Videos
Shenghao Fu, Qize Yang, Yuan-Ming Li et al.
Learning the RoPEs: Better 2D and 3D Position Encodings with STRING
Connor Schenck, Isaac Reid, Mithun Jacob et al.
Measuring Human and AI Values Based on Generative Psychometrics with Large Language Models
Haoran Ye, Yuhang Xie, Yuanyi Ren et al.
Consistency Checks for Language Model Forecasters
Daniel Paleka, Abhimanyu Pallavi Sudhir, Alejandro Alvarez et al.
Beyond Walking: A Large-Scale Image-Text Benchmark for Text-based Person Anomaly Search
Shuyu Yang, Yaxiong Wang, Li Zhu et al.
Efficient Model Editing with Task-Localized Sparse Fine-tuning
Leonardo Iurada, Marco Ciccone, Tatiana Tommasi
Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks
Peng Xie, Yequan Bie, Jianda Mao et al.
Emergent Temporal Correspondences from Video Diffusion Transformers
Jisu Nam, Soowon Son, Dahyun Chung et al.
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics
Dongyoung Kim, Huiwon Jang, Sumin Park et al.
DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation
Jiangran Lyu, Ziming Li, Xuesong Shi et al.
RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing
Fengxiang Wang, Yulin Wang, Mingshuo Chen et al.
SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes
Tony Alex, Sara Atito, Armin Mustafa et al.
AdvPrefix: An Objective for Nuanced LLM Jailbreaks
Sicheng Zhu, Brandon Amos, Yuandong Tian et al.
Neural Exploratory Landscape Analysis for Meta-Black-Box-Optimization
Zeyuan Ma, Jiacheng Chen, Hongshu Guo et al.
MIMO: A Medical Vision Language Model with Visual Referring Multimodal Input and Pixel Grounding Multimodal Output
Yanyuan Chen, Dexuan Xu, Yu Huang et al.
VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving
Haiming Zhang, Wending Zhou, Shenzhen The Chinese University of Hongkong et al.
Deep Linear Probe Generators for Weight Space Learning
Jonathan Kahana, Eliahu Horwitz, Imri Shuval et al.
Towards Training-free Anomaly Detection with Vision and Language Foundation Models
Jinjin Zhang, Guodong Wang, yizhou jin et al.
Vector-ICL: In-context Learning with Continuous Vector Representations
Yufan Zhuang, Chandan Singh, Liyuan Liu et al.
SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving
Su Sun, Cheng Zhao, Zhuoyang Sun et al.
Flexible Frame Selection for Efficient Video Reasoning
Shyamal Buch, Arsha Nagrani, Anurag Arnab et al.
Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion
Jingyuan Chen, Fuchen Long, Jie An et al.
GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement
Peiye Zhuang, Songfang Han, Chaoyang Wang et al.
Confidence Estimation for Error Detection in Text-to-SQL Systems
Oleg Somov, Elena Tutubalina
EllieSQL: Cost-Efficient Text-to-SQL with Complexity-Aware Routing
Yizhang Zhu, Runzhi JIANG, Boyan Li et al.
S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation
Yichen Xie, Runsheng Xu, Tong He et al.
Scalable and Certifiable Graph Unlearning: Overcoming the Approximation Error Barrier
Lu Yi, Zhewei Wei
Unifying Autoregressive and Diffusion-Based Sequence Generation
Nima Fathi, Torsten Scholak, Pierre-Andre Noel
LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models
Minqian Liu, Zhiyang Xu, Xinyi Zhang et al.
Unleashing Hour-Scale Video Training for Long Video-Language Understanding
Jingyang Lin, Jialian Wu, Ximeng Sun et al.
AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference
Zhuomin He, Yizhen Yao, Pengfei Zuo et al.
EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space
Jianrong Zhang, Hehe Fan, Yi Yang
Seeing Your Speech Style: A Novel Zero-Shot Identity-Disentanglement Face-based Voice Conversion
Yan Rong, Li Liu
Data-Driven Performance Guarantees for Classical and Learned Optimizers
Rajiv Sambharya, Bartolomeo Stellato
WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image
Yuci Liang, Xinheng Lyu, Meidan Ding et al.
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
Jin Wang, Chenghui Lv, Xian Li et al.
LLM Unlearning Reveals a Stronger-Than-Expected Coreset Effect in Current Benchmarks
Soumyadeep Pal, Changsheng Wang, James Diffenderfer et al.
LoRe: Personalizing LLMs via Low-Rank Reward Modeling
Avinandan Bose, Zhihan Xiong, Yuejie Chi et al.
Law of the Weakest Link: Cross Capabilities of Large Language Models
Ming Zhong, Aston Zhang, Xuewei Wang et al.
ID-Patch: Robust ID Association for Group Photo Personalization
Yimeng Zhang, Tiancheng Zhi, Jing Liu et al.
Think Then React: Towards Unconstrained Action-to-Reaction Motion Generation
Wenhui Tan, Boyuan Li, Chuhao Jin et al.
CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP
Songlong Xing, Zhengyu Zhao, Nicu Sebe
Latent-EnSF: A Latent Ensemble Score Filter for High-Dimensional Data Assimilation with Sparse Observation Data
Phillip Si, Peng Chen
Scalable Surrogate Verification of Image-Based Neural Network Control Systems Using Composition and Unrolling
Feiyang Cai, Chuchu Fan, Stanley Bak
Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought'' Control
Hannah Cyberey, David Evans
BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation
Yulu Pan, Ce Zhang, Gedas Bertasius
Di[M]O: Distilling Masked Diffusion Models into One-step Generator
Yuanzhi Zhu, Xi WANG, Stéphane Lathuilière et al.
Breaking the Data Barrier -- Building GUI Agents Through Task Generalization
Junlei Zhang, Zichen Ding, Chang Ma et al.
3D Mesh Editing using Masked LRMs
William Gao, Dilin Wang, Yuchen Fan et al.
Sample Efficient Preference Alignment in LLMs via Active Exploration
Viraj Mehta, Syrine Belakaria, Vikramjeet Das et al.
MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge
yuntao du, Kailin Jiang, Zhi Gao et al.
ADAM: An Embodied Causal Agent in Open-World Environments
Shu Yu, Chaochao Lu
ConfTuner: Training Large Language Models to Express Their Confidence Verbally
Yibo Li, Miao Xiong, Jiaying Wu et al.
Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data
Seiji Maekawa, Hayate Iso, Nikita Bhutani
Advancing Language Multi-Agent Learning with Credit Re-Assignment for Interactive Environment Generalization
Zhitao He, Zijun Liu, Peng Li et al.
General Scene Adaptation for Vision-and-Language Navigation
Haodong Hong, Yanyuan Qiao, Sen Wang et al.
Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations
Yiyou Sun, Yu Gai, Lijie Chen et al.
STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction
Zhimin Liao, Ping Wei, Shuaijia Chen et al.
Diff2Flow: Training Flow Matching Models via Diffusion Model Alignment
Johannes Schusterbauer, Ming Gui, Frank Fundel et al.
More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness
Aaron J. Li, Satyapriya Krishna, Hima Lakkaraju
Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise
Brayan Monroy, Jorge Bacca, Julián Tachella
Fluid Language Model Benchmarking
Valentin Hofmann, David Heineman, Ian Magnusson et al.
Label-Free Backdoor Attacks in Vertical Federated Learning
Wei Shen, Wenke Huang, Guancheng Wan et al.
First SFT, Second RL, Third UPT: Continual Improving Multi-Modal LLM Reasoning via Unsupervised Post-Training
Lai Wei, Yuting Li, Chen Wang et al.
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text
Nikhil Kandpal, Brian Lester, Colin Raffel et al.
Aligning Human Motion Generation with Human Perceptions
Haoru Wang, Wentao Zhu, Luyi Miao et al.
Towards Multiple Character Image Animation Through Enhancing Implicit Decoupling
Jingyun Xue, WANG HongFa, Qi Tian et al.
EqNIO: Subequivariant Neural Inertial Odometry
Royina Karegoudra Jayanth, Yinshuang Xu, Ziyun Wang et al.
ResCLIP: Residual Attention for Training-free Dense Vision-language Inference
Jinhong Deng, Yuhang Yang, Wen Li et al.
A Single Goal is All You Need: Skills and Exploration Emerge from Contrastive RL without Rewards, Demonstrations, or Subgoals
Grace Liu, Michael Tang, Benjamin Eysenbach
Deep MMD Gradient Flow without adversarial training
Alexandre Galashov, Valentin De Bortoli, Arthur Gretton
MagicColor: Multi-instance Sketch Colorization
yinhan Zhang, Yue Ma, Bingyuan Wang et al.
Gaussian Eigen Models for Human Heads
Wojciech Zielonka, Timo Bolkart, Thabo Beeler et al.
Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance
Sachin Goyal, Christina Baek, Zico Kolter et al.
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program
Minghe Gao, Xuqi Liu, Zhongqi Yue et al.
Breach By A Thousand Leaks: Unsafe Information Leakage in 'Safe' AI Responses
David Glukhov, Ziwen Han, I Shumailov et al.
Causal Inference over Visual-Semantic-Aligned Graph for Image Classification
Lei Meng, Xiangxian Li, Xiaoshuo Yan et al.
Rethinking Visual Counterfactual Explanations Through Region Constraint
Bartlomiej Sobieski, Jakub Grzywaczewski, Bartłomiej Sadlej et al.
Multiple Heads are Better than One: Mixture of Modality Knowledge Experts for Entity Representation Learning
Yichi Zhang, Zhuo Chen, Lingbing Guo et al.
The 3D-PC: a benchmark for visual perspective taking in humans and machines
Drew Linsley, Peisen Zhou, Alekh Ashok et al.
Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning of Vision Language Models
Huajie Tan, Yuheng Ji, Xiaoshuai Hao et al.
EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation
Zihao Zhang, Haoran Chen, Haoyu Zhao et al.
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
Federico Cocchi, Nicholas Moratelli, Marcella Cornia et al.
MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL
Claas Voelcker, Marcel Hussing, ERIC EATON et al.
Temporal Separation with Entropy Regularization for Knowledge Distillation in Spiking Neural Networks
Kairong Yu, Chengting Yu, Tianqing Zhang et al.
DocVLM: Make Your VLM an Efficient Reader
Mor Shpigel Nacson, Aviad Aberdam, Roy Ganz et al.
Pareto Set Learning for Multi-Objective Reinforcement Learning
Erlong Liu, Yu-Chang Wu, Xiaobin Huang et al.
RGBAvatar: Reduced Gaussian Blendshapes for Online Modeling of Head Avatars
Linzhou Li, Yumeng Li, Yanlin Weng et al.
V2X-R: Cooperative LiDAR-4D Radar Fusion with Denoising Diffusion for 3D Object Detection
Xun Huang, Jinlong Wang, Qiming Xia et al.
Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting
Suraj Anand, Michael Lepori, Jack Merullo et al.
Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis
Weikai Li, Ding Wang, Zijian Ding et al.
Visual Generation Without Guidance
Huayu Chen, Kai Jiang, Kaiwen Zheng et al.
Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs
Yikang Zhou, Tao Zhang, Shilin Xu et al.
PCDreamer: Point Cloud Completion Through Multi-view Diffusion Priors
Guangshun Wei, Yuan Feng, Long Ma et al.
Harnessing Massive Satellite Imagery with Efficient Masked Image Modeling
Fengxiang Wang, Hongzhen Wang, Di Wang et al.
Noise Stability Optimization for Finding Flat Minima: A Hessian-based Regularization Approach
Haotian Ju, Hongyang Zhang, Dongyue Li
WildSAT: Learning Satellite Image Representations from Wildlife Observations
Rangel Daroya, Elijah Cole, Oisin Mac Aodha et al.
VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction
Zijian He, Yuwei Ning, Yipeng Qin et al.
SuperDec: 3D Scene Decomposition with Superquadrics Primitives
Elisabetta Fedele, Boyang Sun, Francis Engelmann et al.
LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models
Haiwen Huang, Anpei Chen, Volodymyr Havrylov et al.
$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training
Jin Zhou, Kaiwen Wang, Jonathan Chang et al.
ParZC: Parametric Zero-Cost Proxies for Efficient NAS
Peijie Dong, Lujun Li, Zhenheng Tang et al.
Generative Monoculture in Large Language Models
Fan Wu, Emily Black, Varun Chandrasekaran
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
Gang Li, Ming Lin, Tomer Galanti et al.
Knowledge Graph Completion with Relation-Aware Anchor Enhancement
Duanyang Yuan, Sihang Zhou, Xiaoshu Chen et al.
Video Summarization with Large Language Models
Min Jung Lee, Dayoung Gong, Minsu Cho
Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model
Zhaochong An, Guolei Sun, Yun Liu et al.
DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution
Zheng Chen, Zichen Zou, Kewei Zhang et al.
Revisiting Multimodal Fusion for 3D Anomaly Detection from an Architectural Perspective
Kaifang Long, Guoyang Xie, Lianbo Ma et al.
Generalized and Efficient 2D Gaussian Splatting for Arbitrary-scale Super-Resolution
Du Chen, Liyi Chen, Zhengqiang ZHANG et al.
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11)
Tianyi Zhang, Mohsen Hariri, Shaochen (Henry) Zhong et al.
ReCap: Better Gaussian Relighting with Cross-Environment Captures
Jingzhi Li, Zongwei Wu, Eduard Zamfir et al.
Graph Neural Preconditioners for Iterative Solutions of Sparse Linear Systems
Jie Chen
Open-World Amodal Appearance Completion
Jiayang Ao, Yanbei Jiang, Qiuhong Ke et al.
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data
Yiren Song, Cheng Liu, Mike Zheng Shou
Anyprefer: An Agentic Framework for Preference Data Synthesis
Yiyang Zhou, Zhaoyang Wang, Tianle Wang et al.
DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers
Xuanlei Zhao, Shenggan Cheng, Chang Chen et al.
Budgeted Online Continual Learning by Adaptive Layer Freezing and Frequency-based Sampling
Minhyuk Seo, Hyunseo Koh, Jonghyun Choi
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Haiyang Wang, Yue Fan, Muhammad Ferjad Naeem et al.
MINERVA: Evaluating Complex Video Reasoning
Arsha Nagrani, Sachit Menon, Ahmet Iscen et al.
Efficiently Parameterized Neural Metriplectic Systems
Anthony Gruber, Kookjin Lee, Haksoo Lim et al.
MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-text Decoding
Weikang Qiu, Zheng Huang, Haoyu Hu et al.
A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
Thomas Schmied, Thomas Adler, Vihang Patil et al.
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling
Tsung-Han (Patrick) Wu, Heekyung Lee, Jiaxin Ge et al.
SlerpFace: Face Template Protection via Spherical Linear Interpolation
Zhizhou Zhong, Yuxi Mi, Yuge Huang et al.
HIIF: Hierarchical Encoding based Implicit Image Function for Continuous Super-resolution
Yuxuan Jiang, Ho Man Kwan, jasmine peng et al.
MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards
Sheng Wang, Liheng Chen, Pengan CHEN et al.
DOTA: Distributional Test-time Adaptation of Vision-Language Models
Zongbo Han, Jialong Yang, Guangyu Wang et al.
Probing the Latent Hierarchical Structure of Data via Diffusion Models
Antonio Sclocchi, Alessandro Favero, Noam Levi et al.
SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input
Zhen Lv, Yangqi Long, Congzhentao Huang et al.
DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation
Jing He, Haodong Li, huyongzhe et al.
Training-Free Guidance Beyond Differentiability: Scalable Path Steering with Tree Search in Diffusion and Flow Models
Yingqing Guo, Yukang Yang, Hui Yuan et al.
The Double-Ellipsoid Geometry of CLIP
Meir Yossef Levi, Guy Gilboa
CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering
Tianyu Huai, Jie Zhou, Xingjiao Wu et al.
GRPose: Learning Graph Relations for Human Image Generation with Pose Priors
Xiangchen Yin, Donglin Di, Lei Fan et al.
X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing
Xinyan Chen, Jianfei Yang
Trajectory Mamba: Efficient Attention-Mamba Forecasting Model Based on Selective SSM
Yizhou Huang, Yihua Cheng, Kezhi Wang
Light-T2M: A Lightweight and Fast Model for Text-to-motion Generation
Ling-An Zeng, Guohong Huang, Gaojie Wu et al.
Measuring memorization in RLHF for code completion
Jamie Hayes, I Shumailov, Billy Porter et al.
Visual Test-time Scaling for GUI Agent Grounding
Tiange Luo, Lajanugen Logeswaran, Justin Johnson et al.
PhyMPGN: Physics-encoded Message Passing Graph Network for spatiotemporal PDE systems
Bocheng Zeng, Qi Wang, Mengtao Yan et al.
Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation
Ting Liu, Siyuan Li
Radiology Report Generation via Multi-objective Preference Optimization
Ting Xiao, Lei Shi, Peng Liu et al.
Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning
Qitao Tan, Jun Liu, Zheng Zhan et al.
HOIGPT: Learning Long-Sequence Hand-Object Interaction with Language Models
Mingzhen Huang, Fu-Jen Chu, Bugra Tekin et al.
Layered Image Vectorization via Semantic Simplification
Zhenyu Wang, Jianxi Huang, Zhida Sun et al.
GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration
Yuchen Sun, Shanhui Zhao, Tao Yu et al.
Task-Agnostic Guided Feature Expansion for Class-Incremental Learning
Bowen Zheng, Da-Wei Zhou, Han-Jia Ye et al.
Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models
Ruiyu Wang, Yu Yuan, Shizhao Sun et al.
VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding
Yujie Liang, Xiaobin Hu, Boyuan Jiang et al.
Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM
Yatai Ji, Jiacheng Zhang, Jie Wu et al.
CCIN: Compositional Conflict Identification and Neutralization for Composed Image Retrieval
Likai Tian, Jian Zhao, Zechao Hu et al.
Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think
Zhenyi Lu, Xiaoye Qu, Zhenyi Lu et al.
Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions
Siqiao Mu, Diego Klabjan
Node-Time Conditional Prompt Learning in Dynamic Graphs
Xingtong Yu, Zhenghao Liu, Xinming Zhang et al.
Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model
Dongki Kim, Wonbin Lee, Sung Ju Hwang
GraphAvatar: Compact Head Avatars with GNN-Generated 3D Gaussians
Xiaobao Wei, Peng Chen, Ming Lu et al.
Spectral Image Tokenizer
Carlos Esteves, Mohammed Suhail, Ameesh Makadia
VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation
Wenhao Wang, Yi Yang
RomanTex: Decoupling 3D-aware Rotary Positional Embedded Multi-Attention Network for Texture Synthesis
yifei feng, Mx Yang, Shuhui Yang et al.
Point2RBox-v2: Rethinking Point-supervised Oriented Object Detection with Spatial Layout Among Instances
Yi Yu, Botao Ren, Peiyuan Zhang et al.
DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents
Hao Li, Xiaogeng Liu, CHIU Chun et al.
CoreMatching: A Co-adaptive Sparse Inference Framework with Token and Neuron Pruning for Comprehensive Acceleration of Vision-Language Models
Qinsi Wang, Hancheng Ye, Ming-Yu Chung et al.
Attention as a Hypernetwork
Simon Schug, Seijin Kobayashi, Yassir Akram et al.
Beyond Graphs: Can Large Language Models Comprehend Hypergraphs?
Yifan Feng, Chengwu Yang, Xingliang Hou et al.
Reliable and Efficient Amortized Model-based Evaluation
Sang Truong, Yuheng Tu, Percy Liang et al.
CLEVER: A Curated Benchmark for Formally Verified Code Generation
Amitayush Thakur, Jasper Lee, George Tsoukalas et al.
Tuning the Frequencies: Robust Training for Sinusoidal Neural Networks
Tiago Novello, Diana Aldana Moreno, André Araujo et al.
ADBA: Approximation Decision Boundary Approach for Black-Box Adversarial Attacks
Feiyang Wang, Xingquan Zuo, Hai Huang et al.
The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization
Jae-Won Chung, Jeff J. Ma, Ruofan Wu et al.
Needle Threading: Can LLMs Follow Threads Through Near-Million-Scale Haystacks?
Jonathan Roberts, Kai Han, Samuel Albanie
AlphaPO: Reward Shape Matters for LLM Alignment
Aman Gupta, Shao Tang, Qingquan Song et al.
Taylor Series-Inspired Local Structure Fitting Network for Few-shot Point Cloud Semantic Segmentation
Changshuo Wang, Shuting He, Xiang Fang et al.
Unifying 2D and 3D Vision-Language Understanding
Ayush Jain, Alexander Swerdlow, Yuzhou Wang et al.
A Closer Look at TabPFN v2: Understanding Its Strengths and Extending Its Capabilities
Han-Jia Ye, Si-Yang Liu, Wei-Lun (Harry) Chao
Unlearn and Burn: Adversarial Machine Unlearning Requests Destroy Model Accuracy
Yangsibo Huang, Daogao Liu, Lynn Chua et al.
QERA: an Analytical Framework for Quantization Error Reconstruction
Cheng Zhang, Jeffrey T. H. Wong, Can Xiao et al.
Transformer Copilot: Learning from The Mistake Log in LLM Fine-tuning
Jiaru Zou, Yikun Ban, Zihao Li et al.
CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models
Hao He, Ceyuan Yang, Shanchuan Lin et al.
DreamRelation: Bridging Customization and Relation Generation
Qingyu Shi, Lu Qi, Jianzong Wu et al.
Periodic Materials Generation using Text-Guided Joint Diffusion Model
KISHALAY DAS, Subhojyoti Khastagir, Pawan Goyal et al.
FormalAlign: Automated Alignment Evaluation for Autoformalization
Jianqiao Lu, Yingjia Wan, Yinya Huang et al.
PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation
Qihan Huang, Weilong Dai, Jinlong Liu et al.
Scaling Laws for Optimal Data Mixtures
Mustafa Shukor, Louis Bethune, Dan Busbridge et al.
Boosting Latent Diffusion with Perceptual Objectives
Tariq Berrada, Pietro Astolfi, Melissa Hall et al.
AdaRankGrad: Adaptive Gradient Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning
Yehonathan Refael, Jonathan Svirsky, Boris Shustin et al.
UCF-Crime-DVS: A Novel Event-Based Dataset for Video Anomaly Detection with Spiking Neural Networks
Yuanbin Qian, Shuhan Ye, Chong Wang et al.
Attention layers provably solve single-location regression
Pierre Marion, Raphaël Berthier, Gérard Biau et al.
Efficient Attention-Sharing Information Distillation Transformer for Lightweight Single Image Super-Resolution
Karam Park, Jae Woong Soh, Nam Ik Cho
Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation
Yuxuan Wang, Xuanyu Yi, Haohan Weng et al.
X-Dancer: Expressive Music to Human Dance Video Generation
Zeyuan Chen, Hongyi Xu, Guoxian Song et al.
Topological Blindspots: Understanding and Extending Topological Deep Learning Through the Lens of Expressivity
Yam Eitan, Yoav Gelberg, Guy Bar-Shalom et al.
DM-Adapter: Domain-Aware Mixture-of-Adapters for Text-Based Person Retrieval
Yating Liu, Zimo Liu, Xiangyuan Lan et al.
RayFlow: Instance-Aware Diffusion Acceleration via Adaptive Flow Trajectories
Huiyang Shao, Xin Xia, Yuhong Yang et al.
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
Yumeng Li, William H Beluch, Margret Keuper et al.
Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds
Zhiyong Wang, Dongruo Zhou, John C.S. Lui et al.
Hydra-SGG: Hybrid Relation Assignment for One-stage Scene Graph Generation
Minghan Chen, Guikun Chen, Wenguan Wang et al.