Most Cited 2025 "embedding space deduplication" Papers
22,274 papers found
SDFit: 3D Object Pose and Shape by Fitting a Morphable SDF to a Single Image
Dimitrije Antić, Georgios Paschalidis, Shashank Tripathi et al.
Towards Immersive Human-X Interaction: A Real-Time Framework for Physically Plausible Motion Synthesis
Kaiyang Ji, Ye Shi, Zichen Jin et al.
EgoM2P: Egocentric Multimodal Multitask Pretraining
Gen Li, Yutong Chen, Yiqian Wu et al.
Temporal Unlearnable Examples: Preventing Personal Video Data from Unauthorized Exploitation by Object Tracking
Qiangqiang Wu, Yi Yu, Chenqi Kong et al.
G-DexGrasp: Generalizable Dexterous Grasping Synthesis Via Part-Aware Prior Retrieval and Prior-Assisted Generation
Juntao Jian, Xiuping Liu, Zixuanchen Zixuanchen et al.
DH-FaceVid-1K: A Large-Scale High-Quality Dataset for Face Video Generation
Donglin Di, He Feng, Wenzhang Sun et al.
Adaptive Hyper-Graph Convolution Network for Skeleton-based Human Action Recognition with Virtual Connections
Youwei Zhou, Tianyang Xu, Cong Wu et al.
FaceLift: Learning Generalizable Single Image 3D Face Reconstruction from Synthetic Heads
Weijie Lyu, Yi Zhou, Ming-Hsuan Yang et al.
Precise Action-to-Video Generation Through Visual Action Prompts
Yuang Wang, Chao Wen, Haoyu Guo et al.
Latent-Reframe: Enabling Camera Control for Video Diffusion Models without Training
Zhenghong Zhou, Jie An, Jiebo Luo
Frequency-Guided Posterior Sampling for Diffusion-Based Image Restoration
Darshan Thaker, Abhishek Goyal, Rene Vidal
A Quality-Guided Mixture of Score-Fusion Experts Framework for Human Recognition
Jie Zhu, Yiyang Su, Minchul Kim et al.
Motion Synthesis with Sparse and Flexible Keyjoint Control
Inwoo Hwang, Jinseok Bae, Donggeun Lim et al.
IDFace: Face Template Protection for Efficient and Secure Identification
Sunpill Kim, Seunghun Paik, Chanwoo Hwang et al.
MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh
Shuangkang Fang, I-Chao Shen, Yufeng Wang et al.
2HandedAfforder: Learning Precise Actionable Bimanual Affordances from Human Videos
Marvin Heidinger, Snehal Jauhri, Vignesh Prasad et al.
Edicho: Consistent Image Editing in the Wild
Qingyan Bai, Hao Ouyang, Yinghao Xu et al.
DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization
Wenchuan Wang, Mengqi Huang, Yijing Tu et al.
Versatile Transition Generation with Image-to-Video Diffusion
Zuhao Yang, Jiahui Zhang, Yingchen Yu et al.
AnyI2V: Animating Any Conditional Image with Motion Control
Ziye Li, Xincheng Shuai, Hao Luo et al.
The Silent Assistant: NoiseQuery as Implicit Guidance for Goal-Driven Image Generation
Ruoyu Wang, Huayang Huang, Ye Zhu et al.
Disrupting Model Merging: A Parameter-Level Defense Without Sacrificing Accuracy
Junhao Wei, Yu Zhe, Jun Sakuma
Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers
Divyansh Srivastava, Xiang Zhang, He Wen et al.
Contrastive Test-Time Composition of Multiple LoRA Models for Image Generation
Tuna Meral, Enis Simsar, Federico Tombari et al.
Benchmarking and Learning Multi-Dimensional Quality Evaluator for Text-to-3D Generation
Yujie Zhang, Bingyang Cui, Qi Yang et al.
Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation
Youwei Zheng, Yuxi Ren, Xin Xia et al.
Exploring Multimodal Diffusion Transformers for Enhanced Prompt-based Image Editing
Joonghyuk Shin, Alchan Hwang, Yujin Kim et al.
EDiT: Efficient Diffusion Transformers with Linear Compressed Attention
Philipp Becker, Abhinav Mehrotra, Ruchika Chavhan et al.
Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding
Yuanhan Zhang, Yunice Chew, Yuhao Dong et al.
MultiverSeg: Scalable Interactive Segmentation of Biomedical Imaging Datasets with In-Context Guidance
Hallee Wong, Jose Javier Gonzalez Ortiz, John Guttag et al.
DisTime: Distribution-based Time Representation for Video Large Language Models
Yingsen Zeng, Zepeng Huang, Yujie Zhong et al.
Dynamic Dictionary Learning for Remote Sensing Image Segmentation
Xuechao Zou, Yue Li, Shun Zhang et al.
Streaming VideoLLMs for Real-Time Procedural Video Understanding
Dibyadip Chatterjee, Edoardo Remelli, Yale Song et al.
SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs
Jiahui Wang, Zuyan Liu, Yongming Rao et al.
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment
Yucheng Suo, Fan Ma, Linchao Zhu et al.
VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization
Sihan Yang, Runsen Xu, Chenhang Cui et al.
Scaling Tumor Segmentation: Best Lessons from Real and Synthetic Data
Qi Chen, Xinze Zhou, Chen Liu et al.
SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality
Sijie Li, Chen Chen, Jungong Han
Street Gaussians without 3D Object Tracker
Ruida Zhang, Chengxi Li, Chenyangguang Zhang et al.
StochasticSplats: Stochastic Rasterization for Sorting-Free 3D Gaussian Splatting
Shakiba Kheradmand, Delio Vicini, George Kopanas et al.
Curve-Aware Gaussian Splatting for 3D Parametric Curve Reconstruction
Zhirui Gao, Renjiao Yi, YaQiao Dai et al.
AAA-Gaussians: Anti-Aliased and Artifact-Free 3D Gaussian Rendering
Michael Steiner, Thomas Köhler, Lukas Radl et al.
Sat2City: 3D City Generation from A Single Satellite Image with Cascaded Latent Diffusion
Tongyan Hua, Lutao Jiang, Ying-Cong Chen et al.
Leveraging BEV Paradigm for Ground-to-Aerial Image Synthesis
Junyan Ye, Jun He, Weijia Li et al.
Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving
Junhao Ge, Zuhong Liu, Longteng Fan et al.
MikuDance: Animating Character Art with Mixed Motion Dynamics
Jiaxu Zhang, Xianfang Zeng, Xin Chen et al.
ObjectRelator: Enabling Cross-View Object Relation Understanding Across Ego-Centric and Exo-Centric Perspectives
Yuqian Fu, Runze Wang, Bin Ren et al.
CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation
Zhuoyan Luo, Yinghao Wu, Tianheng Cheng et al.
Not All Frame Features Are Equal: Video-to-4D Generation via Decoupling Dynamic-Static Features
Liying Yang, Chen Liu, Zhenwei Zhu et al.
Synergistic Prompting for Robust Visual Recognition with Missing Modalities
Zhihui Zhang, Luanyuan Dai, Qika Lin et al.
PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Model
Jinhua Zhang, Hualian Sheng, Sijia Cai et al.
HERO: Human Reaction Generation from Videos
Chengjun Yu, Wei Zhai, Yuhang Yang et al.
Improving Multimodal Learning via Imbalanced Learning
Shicai Wei, Chunbo Luo, Yang Luo
InterGSEdit: Interactive 3D Gaussian Splatting Editing with 3D Geometry-Consistent Attention Prior
Minghao Wen, Shengjie Wu, Kangkan Wang et al.
CaO2: Rectifying Inconsistencies in Diffusion-Based Dataset Distillation
Haoxuan Wang, Zhenghao Zhao, Junyi Wu et al.
CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning
Duo Wu, Jinghe Wang, Yuan Meng et al.
Learning Normal Flow Directly From Events
Dehao Yuan, Levi Burner, Jiayi Wu et al.
Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning
Qi Wang, Zhipeng Zhang, Baao Xie et al.
DiffVSR: Revealing an Effective Recipe for Taming Robust Video Super-Resolution Against Complex Degradations
Xiaohui Li, Yihao Liu, Shuo Cao et al.
SpatialCrafter: Unleashing the Imagination of Video Diffusion Models for Scene Reconstruction from Limited Observations
Songchun Zhang, Huiyao Xu, Sitong Guo et al.
TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition
Xingsong Ye, Yongkun Du, Yunbo Tao et al.
MOSCATO: Predicting Multiple Object State Change Through Actions
Parnian Zameni, Yuhan Shen, Ehsan Elhamifar
AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction
Junhao Cheng, Yuying Ge, Yixiao Ge et al.
Exploring the Visual Feature Space for Multimodal Neural Decoding
Weihao Xia, Cengiz Oztireli
RAGNet: Large-scale Reasoning-based Affordance Segmentation Benchmark towards General Grasping
Dongming Wu, Yanping Fu, Saike Huang et al.
Scendi Score: Prompt-Aware Diversity Evaluation via Schur Complement of CLIP Embeddings
Azim Ospanov, Mohammad Jalali, Farzan Farnia
VisNumBench: Evaluating Number Sense of Multimodal Large Language Models
Tengjin Weng, Jingyi Wang, Wenhao Jiang et al.
NeurOp-Diff: Continuous Remote Sensing Image Super-Resolution via Neural Operator Diffusion
Zihao Xu, Yuzhi Tang, Bowen Xu et al.
Φ-GAN: Physics-Inspired GAN for Generating SAR Images Under Limited Data
Xidan Zhang, Yihan Zhuang, Qian Guo et al.
GaSLight: Gaussian Splats for Spatially-Varying Lighting in HDR
Christophe Bolduc, Yannick Hold-Geoffroy, Jean-Francois Lalonde
GUAVA: Generalizable Upper Body 3D Gaussian Avatar
Dongbin Zhang, Yunfei Liu, Lijian Lin et al.
DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer
Yecheng Wu, Han Cai, Junyu Chen et al.
X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation
Jian Ma, Qirong Peng, Xu Guo et al.
WaveMamba: Wavelet-Driven Mamba Fusion for RGB-Infrared Object Detection
Haodong Zhu, Wenhao Dong, Linlin Yang et al.
SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts
Gengze Zhou, Yicong Hong, Zun Wang et al.
DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models
Hongji Yang, Wencheng Han, Yucheng Zhou et al.
Adding Additional Control to One-Step Diffusion with Joint Distribution Matching
Yihong Luo, Tianyang Hu, Yifan Song et al.
Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration
Junyuan Deng, Wei Yin, Xiaoyang Guo et al.
SignRep: Enhancing Self-Supervised Sign Representations
Ryan Wong, Necati Cihan Camgoz, Richard Bowden
Frequency-Aware Autoregressive Modeling for Efficient High-Resolution Image Synthesis
Zhuokun Chen, Jugang Fan, Zhuowei Yu et al.
LongAnimation: Long Animation Generation with Dynamic Global-Local Memory
Nan Chen, Mengqi Huang, Yihao Meng et al.
SEGS-SLAM: Structure-enhanced 3D Gaussian Splatting SLAM with Appearance Embedding
Tianci Wen, Zhiang Liu, Yongchun Fang
SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering
Byeongjun Park, Hyojun Go, Hyelin Nam et al.
GaussianUpdate: Continual 3D Gaussian Splatting Update for Changing Environments
Lin Zeng, Boming Zhao, Jiarui Hu et al.
Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion
Shengyuan Zhang, An Zhao, Ling Yang et al.
MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network
Jianfei Jiang, Qiankun Liu, Haochen Yu et al.
PhysRig: Differentiable Physics-Based Skinning and Rigging Framework for Realistic Articulated Object Modeling
Hao Zhang, Haolan Xu, Chun Feng et al.
Latent Diffusion Models with Masked AutoEncoders
Junho Lee, Jeongwoo Shin, Hyungwook Choi et al.
TikZero: Zero-Shot Text-Guided Graphics Program Synthesis
Jonas Belouadi, Eddy Ilg, Margret Keuper et al.
Capturing Individual Human Preferences with Reward Features
Andre Barreto, Vincent Dumoulin, Yiran Mao et al.
Understanding Prompt Tuning and In-Context Learning via Meta-Learning
Tim Genewein, Kevin Li, Jordi Grau-Moya et al.
Tight Lower Bounds and Improved Convergence in Performative Prediction
Pedram Khorsandi, Rushil Gupta, Mehrnaz Mofakhami et al.
MIRAGE: A Benchmark for Multimodal Information-Seeking and Reasoning in Agricultural Expert-Guided Conversations
Vardhan Dongre, Chi Gui, Shubham Garg et al.
MindGYM: What Matters in Question Synthesis for Thinking-Centric Fine-Tuning?
Zhe Xu, Daoyuan Chen, Zhenqing Ling et al.
EgoExOR: An Ego-Exo-Centric Operating Room Dataset for Surgical Activity Understanding
Ege Özsoy, Arda Mamur, Felix Tristram et al.
BEDLAM2.0: Synthetic humans and cameras in motion
Joachim Tesch, Giorgio Becherini, Prerana Achar et al.
Privacy Reasoning in Ambiguous Contexts
Ren Yi, Octavian Suciu, Adrian Gascon et al.
Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding
Hanyin Wang, Zhenbang Wu, Gururaj Kolar et al.
Mitigating Forgetting in LLM Fine-Tuning via Low-Perplexity Token Learning
Chao-Chung Wu, Zhi Rui Tam, Chieh-Yen Lin et al.
FlowDAS: A Stochastic Interpolant-based Framework for Data Assimilation
Siyi Chen, Yixuan Jia, Qing Qu et al.
Backward Conformal Prediction
Etienne Gauthier, Francis Bach, Michael Jordan
FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency
Yifei Su, Ning Liu, Dong Chen et al.
Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms
Baran Hashemi, Kurt Pasque, Chris Teska et al.
GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains
Chun Wang, Xiaojun Ye, Xiaoran Pan et al.
In-Context Learning Strategies Emerge Rationally
Daniel Wurgaft, Ekdeep S Lubana, Core Francisco Park et al.
Logic.py: Bridging the Gap between LLMs and Constraint Solvers
Pascal Kesseli, Peter O'Hearn, Ricardo Cabral
BeliefMapNav: 3D Voxel-Based Belief Map for Zero-Shot Object Navigation
Zibo Zhou, Yue Hu, Lingkai Zhang et al.
PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms
Yifei Xia, Shuchen Weng, Siqi Yang et al.
MINGLE: Mixture of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging
Zihuan Qiu, Yi Xu, Chiyuan He et al.
Treatment Effect Estimation for Optimal Decision-Making
Dennis Frauen, Valentyn Melnychuk, Jonas Schweisthal et al.
ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding
Muye Huang, Lingling Zhang, Jie Ma et al.
Exploring Diffusion Transformer Designs via Grafting
Keshigeyan Chandrasegaran, Michael Poli, Dan Fu et al.
MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks
Sanjoy Chowdhury, Mohamed Elmoghany, Yohan Abeysinghe et al.
NeurIPT: Foundation Model for Neural Interfaces
Zitao Fang, Chenxuan Li, Hongting Zhou et al.
Learning Diffusion Models with Flexible Representation Guidance
Chenyu Wang, Cai Zhou, Sharut Gupta et al.
Efficient Data Selection at Scale via Influence Distillation
Mahdi Nikdan, Vincent Cohen-Addad, Dan Alistarh et al.
Scaling Up Liquid-Resistance Liquid-Capacitance Networks for Efficient Sequence Modeling
Mónika Farsang, Radu Grosu
Entropic Time Schedulers for Generative Diffusion Models
Dejan Stancevic, Florian Handke, Luca Ambrogioni
Modeling Microenvironment Trajectories on Spatial Transcriptomics with NicheFlow
Kristiyan Sakalyan, Alessandro Palma, Filippo Guerranti et al.
Better Language Model Inversion by Compactly Representing Next-Token Distributions
Murtaza Nazir, Matthew Finlayson, John Morris et al.
Distribution-Aligned Decoding for Efficient LLM Task Adaptation
Senkang Hu, Xudong Han, Jinqi Jiang et al.
System Prompt Optimization with Meta-Learning
Yumin Choi, Jinheon Baek, Sung Ju Hwang
InfinityStar: Unified Spacetime AutoRegressive Modeling for Visual Generation
Jinlai Liu, Jian Han, Bin Yan et al.
Understanding and Rectifying Safety Perception Distortion in VLMs
Xiaohan Zou, Jian Kang, George Kesidis et al.
Multilevel neural simulation-based inference
Yuga Hikida, Ayush Bharti, Niall Jeffrey et al.
One Subgoal at a Time: Zero-Shot Generalization to Arbitrary Linear Temporal Logic Requirements in Multi-Task Reinforcement Learning
Zijian Guo, İlker Işık, H M Sabbir Ahmad et al.
Neural Thermodynamics: Entropic Forces in Deep and Universal Representation Learning
Liu Ziyin, Yizhou Xu, Isaac Chuang
Improving Generalization of Neural Combinatorial Optimization for Vehicle Routing Problems via Test-Time Projection Learning
Yuanyao Chen, Rongsheng Chen, Fu Luo et al.
Fine-grained List-wise Alignment for Generative Medication Recommendation
Chenxiao Fan, Chongming Gao, Wentao Shi et al.
Improved Balanced Classification with Theoretically Grounded Loss Functions
Corinna Cortes, Mehryar Mohri, Yutao Zhong
Spiral: Semantic-Aware Progressive LiDAR Scene Generation and Understanding
Dekai Zhu, Yixuan Hu, Youquan Liu et al.
The Fluorescent Veil: A Stealthy and Effective Physical Adversarial Patch Against Traffic Sign Recognition
Shuai Yuan, Xingshuo Han, Hongwei Li et al.
ExGra-Med: Extended Context Graph Alignment for Medical Vision-Language Models
Duy M. H. Nguyen, Nghiem Diep, Trung Nguyen et al.
Transformer brain encoders explain human high-level visual responses
Hossein Adeli, Sun Minni, Nikolaus Kriegeskorte
When Are Concepts Erased From Diffusion Models?
Kevin Lu, Nicky Kriplani, Rohit Gandikota et al.
R$^2$ec: Towards Large Recommender Models with Reasoning
Runyang You, Yongqi Li, Xinyu Lin et al.
Convergence of Clipped SGD on Convex $(L_0,L_1)$-Smooth Functions
Ofir Gaash, Kfir Y. Levy, Yair Carmon
RadarQA: Multi-modal Quality Analysis of Weather Radar Forecasts
Xuming He, Zhiyuan You, Junchao Gong et al.
The Rich and the Simple: On the Implicit Bias of Adam and SGD
Bhavya Vasudeva, Jung Lee, Vatsal Sharan et al.
macOSWorld: A Multilingual Interactive Benchmark for GUI Agents
Pei Yang, Hai Ci, Mike Zheng Shou
CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models
Xiao An, Jiaxing Sun, Zihan Gui et al.
Joint Relational Database Generation via Graph-Conditional Diffusion Models
Mohamed Amine Ketata, David Lüdke, Leo Schwinn et al.
Generating Computational Cognitive models using Large Language Models
Milena Rmus, Akshay Kumar Jagadish, Marvin Mathony et al.
Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers
Yixiao Huang, Hanlin Zhu, Tianyu Guo et al.
Failure Prediction at Runtime for Generative Robot Policies
Ralf Römer, Adrian Kobras, Luca Worbis et al.
Predicting Empirical AI Research Outcomes with Language Models
Jiaxin Wen, Chenglei Si, Yueh-Han Chen et al.
Transformers Provably Learn Chain-of-Thought Reasoning with Length Generalization
Yu Huang, Zixin Wen, Aarti Singh et al.
Towards Understanding the Mechanisms of Classifier-Free Guidance
Xiang Li, Rongrong Wang, Qing Qu
Learning Orthogonal Multi-Index Models: A Fine-Grained Information Exponent Analysis
Yunwei Ren, Jason Lee
DrVD-Bench: Do Vision-Language Models Reason Like Human Doctors in Medical Image Diagnosis?
Tianhong Zhou, Xu Yin, Yingtao Zhu et al.
Refusal Direction is Universal Across Safety-Aligned Languages
Xinpeng Wang, Mingyang Wang, Yihong Liu et al.
Systematic Reward Gap Optimization for Mitigating VLM Hallucinations
Lehan He, Zeren Chen, Zhelun Shi et al.
Revisiting End-to-End Learning with Slide-level Supervision in Computational Pathology
Wenhao Tang, Rong Qin, Heng Fang et al.
Language Models Can Predict Their Own Behavior
Dhananjay Ashok, Jonathan May
AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play
Ran Xu, Yuchen Zhuang, Zihan Dong et al.
Object-centric 3D Motion Field for Robot Learning from Human Videos
Zhao-Heng Yin, Sherry Yang, Pieter Abbeel
Planning without Search: Refining Frontier LLMs with Offline Goal-Conditioned RL
Joey Hong, Anca Dragan, Sergey Levine
LuxDiT: Lighting Estimation with Video Diffusion Transformer
Ruofan Liang, Kai He, Zan Gojcic et al.
Bisecle: Binding and Separation in Continual Learning for Video Language Understanding
Yue Tan, Xiaoqian Hu, Hao Xue et al.
Online Learning of Neural Networks
Amit Daniely, Idan Mehalel, Elchanan Mossel
VeriThoughts: Enabling Automated Verilog Code Generation using Reasoning and Formal Verification
Patrick Yubeaton, Andre Nakkab, Weihua Xiao et al.
FlexEvent: Towards Flexible Event-Frame Object Detection at Varying Operational Frequencies
Dongyue Lu, Lingdong Kong, Gim Hee Lee et al.
Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction
Yifei Wang, Weimin Bai, Colin Zhang et al.
Multi-head Transformers Provably Learn Symbolic Multi-step Reasoning via Gradient Descent
Tong Yang, Yu Huang, Yingbin Liang et al.
WorldWeaver: Generating Long-Horizon Video Worlds via Rich Perception
Zhiheng Liu, Xueqing Deng, Shoufa Chen et al.
Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs
Kejia Zhang, Keda Tao, Jiasheng Tang et al.
The Structural Complexity of Matrix-Vector Multiplication
Emile Anand, Jan van den Brand, Rose McCarty
Neurosymbolic Diffusion Models
Emile van Krieken, Pasquale Minervini, Edoardo Maria Ponti et al.
Head Pursuit: Probing Attention Specialization in Multimodal Transformers
Lorenzo Basile, Valentino Maiorca, Diego Doimo et al.
Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs
Yifan Shen, Yuanzhe Liu, Jingyuan Zhu et al.
Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation
Wenbo Zhang, Tianrun Hu, Hanbo Zhang et al.
VideoVLA: Video Generators Can Be Generalizable Robot Manipulators
Yichao Shen, Fangyun Wei, Zhiying Du et al.
Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective
Yang Zhang, Xinran Li, Jianing Ye et al.
AutoData: A Multi-Agent System for Open Web Data Collection
Tianyi Ma, Yiyue Qian, Zheyuan Zhang et al.
E2Former: An Efficient and Equivariant Transformer with Linear-Scaling Tensor Products
Yunyang Li, Lin Huang, Zhihao Ding et al.
Do different prompting methods yield a common task representation in language models?
Guy Davidson, Todd Gureckis, Brenden Lake et al.
Alligat0R: Pre-Training through Covisibility Segmentation for Relative Camera Pose Regression
Thibaut Loiseau, Guillaume Bourmaud, Vincent Lepetit
PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning
Yizhen Zhang, Yang Ding, Shuoshuo Zhang et al.
Let Me Think! A Long Chain of Thought Can Be Worth Exponentially Many Short Ones
Parsa Mirtaheri, Ezra Edelman, Samy Jelassi et al.
Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign Language Translation
Edward Fish, Richard Bowden
Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks
Debargha Ganguly, Vikash Singh, Sreehari Sankar et al.
SAVVY: Spatial Awareness via Audio-Visual LLMs through Seeing and Hearing
Mingfei Chen, Zijun Cui, Xiulong Liu et al.
Flexible MOF Generation with Torsion-Aware Flow Matching
Nayoung Kim, Seongsu Kim, Sungsoo Ahn
MedMax: Mixed-Modal Instruction Tuning for Training Biomedical Assistants
Hritik Bansal, Daniel Israel, Siyan Zhao et al.
FALCON: An ML Framework for Fully Automated Layout-Constrained Analog Circuit Design
Asal Mehradfar, Xuzhe Zhao, Yilun Huang et al.
Functional Scaling Laws in Kernel Regression: Loss Dynamics and Learning Rate Schedules
Binghui Li, Fengling Chen, Zixun Huang et al.
On the Edge of Memorization in Diffusion Models
Sam Buchanan, Druv Pai, Yi Ma et al.
CellVerse: Do Large Language Models Really Understand Cell Biology?
Fan Zhang, Tianyu Liu, Zhihong Zhu et al.
Orthogonal Survival Learners for Estimating Heterogeneous Treatment Effects from Time-to-Event Data
Dennis Frauen, Maresa Schröder, Konstantin Hess et al.
Watermarking Autoregressive Image Generation
Nikola Jovanović, Ismail Labiad, Tomas Soucek et al.
Top-H Decoding: Adapting the Creativity and Coherence with Bounded Entropy in Text Generation
Erfan Baghaei Potraghloo, Seyedarmin Azizi, Souvik Kundu et al.
Efficient Quadratic Corrections for Frank-Wolfe Algorithms
Jannis Halbey, Seta Rakotomandimby, Mathieu Besançon et al.
Toward Engineering AGI: Benchmarking the Engineering Design Capabilities of LLMs
Xingang Guo, Yaxin Li, XiangYi Kong et al.
Unleashing Diffusion Transformers for Visual Correspondence by Modulating Massive Activations
Chaofan Gan, Yuanpeng Tu, Xi Chen et al.
L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models
Xiaohao Liu, Xiaobo Xia, Weixiang Zhao et al.
Physics-Driven Spatiotemporal Modeling for AI-Generated Video Detection
Shuhai Zhang, ZiHao Lian, Jiahao Yang et al.
MaterialRefGS: Reflective Gaussian Splatting with Multi-view Consistent Material Inference
Wenyuan Zhang, Jimin Tang, Weiqi Zhang et al.
Surprise3D: A Dataset for Spatial Understanding and Reasoning in Complex 3D Scenes
Jiaxin Huang, Ziwen Li, Hanlue Zhang et al.
RTV-Bench: Benchmarking MLLM Continuous Perception, Understanding and Reasoning through Real-Time Video
ShuHang Xun, Sicheng Tao, Jungang Li et al.
DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding
Yue Jiang, Jichu Li, Yang Liu et al.