Most Cited 2025 "covariance structure" Papers
22,274 papers found • Page 41 of 112
Conference
NightAdapter: Learning a Frequency Adapter for Generalizable Night-time Scene Segmentation
Qi Bi, Jingjun Yi, Huimin Huang et al.
MERGE: Multi-faceted Hierarchical Graph-based GNN for Gene Expression Prediction from Whole Slide Histopathology Images
Aniruddha Ganguly, Debolina Chatterjee, Wentao Huang et al.
3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement
Yihang Luo, Shangchen Zhou, Yushi Lan et al.
ABC-Former: Auxiliary Bimodal Cross-domain Transformer with Interactive Channel Attention for White Balance
Yu-Cheng Chiu, GUAN-RONG CHEN, Zihao Chen et al.
Vid-SME: Membership Inference Attacks against Large Video Understanding Models
Qi Li, Runpeng Yu, Xinchao Wang
BUFFER-X: Towards Zero-Shot Point Cloud Registration in Diverse Scenes
Minkyun Seo, Hyungtae Lim, Kanghee Lee et al.
PanoWan: Lifting Diffusion Video Generation Models to 360$^\circ$ with Latitude/Longitude-aware Mechanisms
Yifei Xia, Shuchen Weng, Siqi Yang et al.
Improved Balanced Classification with Theoretically Grounded Loss Functions
Corinna Cortes, Mehryar Mohri, Yutao Zhong
Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving
Junhao Ge, Zuhong Liu, Longteng Fan et al.
VisNumBench: Evaluating Number Sense of Multimodal Large Language Models
Tengjin Weng, Jingyi Wang, Wenhao Jiang et al.
Spiral: Semantic-Aware Progressive LiDAR Scene Generation and Understanding
Dekai Zhu, Yixuan Hu, Youquan Liu et al.
BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence
Xuewu Lin, Tianwei Lin, Alan Huang et al.
DecoupledGaussian: Object-Scene Decoupling for Physics-Based Interaction
Miaowei Wang, Yibo Zhang, Rui Ma et al.
IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments
Can Zhang, Gim Hee Lee
MagicHOI: Leveraging 3D Priors for Accurate Hand-object Reconstruction from Short Monocular Video Clips
SHIBO WANG, Haonan He, Maria Parelli et al.
Deterministic Image-to-Image Translation via Denoising Brownian Bridge Models with Dual Approximators
Bohan Xiao, PEIYONG WANG, Qisheng He et al.
MEGADance: Mixture-of-Experts Architecture for Genre-Aware 3D Dance Generation
kaixing yang, Xulong Tang, Ziqiao Peng et al.
Scendi Score: Prompt‑Aware Diversity Evaluation via Schur Complement of CLIP Embeddings
Azim Ospanov, Mohammad Jalali, Farzan Farnia
BEDLAM2.0: Synthetic humans and cameras in motion
Joachim Tesch, Giorgio Becherini, Prerana Achar et al.
Constrained Optimization From a Control Perspective via Feedback Linearization
Runyu Zhang, Arvind Raghunathan, Jeff Shamma et al.
MEMOIR: Lifelong Model Editing with Minimal Overwrite and Informed Retention for LLMs
Ke Wang, Yiming QIN, Nikolaos Dimitriadis et al.
The Fluorescent Veil: A Stealthy and Effective Physical Adversarial Patch Against Traffic Sign Recognition
Shuai Yuan, Xingshuo Han, Hongwei Li et al.
Learning to Highlight Audio by Watching Movies
Chao Huang, Ruohan Gao, J. M. F. Tsang et al.
MaterialRefGS: Reflective Gaussian Splatting with Multi-view Consistent Material Inference
Wenyuan Zhang, Jimin Tang, Weiqi Zhang et al.
Modeling Thousands of Human Annotators for Generalizable Text-to-Image Person Re-identification
Jiayu Jiang, Changxing Ding, Wentao Tan et al.
Flow Matching-Based Autonomous Driving Planning with Advanced Interactive Behavior Modeling
Tianyi Tan, Yinan Zheng, Ruiming Liang et al.
Dynamic Dictionary Learning for Remote Sensing Image Segmentation
Xuechao Zou, Yue Li, Shun Zhang et al.
WorldWeaver: Generating Long-Horizon Video Worlds via Rich Perception
Zhiheng Liu, Xueqing Deng, Shoufa Chen et al.
Context-Enhanced Memory-Refined Transformer for Online Action Detection
Zhanzhong Pang, Fadime Sener, Angela Yao
Toward Robust Neural Reconstruction from Sparse Point Sets
Amine Ouasfi, Shubhendu Jena, Eric Marchand et al.
Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation
Wenbo Zhang, Tianrun Hu, Hanbo Zhang et al.
BecomingLit: Relightable Gaussian Avatars with Hybrid Neural Shading
Jonathan Schmidt, Simon Giebenhain, Matthias Niessner
ETCH: Generalizing Body Fitting to Clothed Humans via Equivariant Tightness
Boqian Li, Zeyu Cai, Michael Black et al.
Glocal Information Bottleneck for Time Series Imputation
Jie Yang, Kexin Zhang, Guibin Zhang et al.
Understanding and Rectifying Safety Perception Distortion in VLMs
Xiaohan Zou, Jian Kang, George Kesidis et al.
HyperNet Fields: Efficiently Training Hypernetworks without Ground Truth by Learning Weight Trajectories
Eric Hedlin, Munawar Hayat, Fatih Porikli et al.
Understanding Fine-tuning CLIP for Open-vocabulary Semantic Segmentation in Hyperbolic Space
Zelin Peng, Zhengqin Xu, Zhilin Zeng et al.
ExGra-Med: Extended Context Graph Alignment for Medical Vision-Language Models
Duy M. H. Nguyen, Nghiem Diep, Trung Nguyen et al.
HyperLoRA: Parameter-Efficient Adaptive Generation for Portrait Synthesis
Mengtian Li, Jinshu Chen, Wanquan Feng et al.
L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models
Xiaohao Liu, Xiaobo Xia, Weixiang Zhao et al.
Zero-Shot 4D Lidar Panoptic Segmentation
Yushan Zhang, Aljoša Ošep, Laura Leal-Taixe et al.
Precise Action-to-Video Generation Through Visual Action Prompts
Yuang Wang, Chao Wen, Haoyu Guo et al.
Let Me Think! A Long Chain of Thought Can Be Worth Exponentially Many Short Ones
Parsa Mirtaheri, Ezra Edelman, Samy Jelassi et al.
The Structural Complexity of Matrix-Vector Multiplication
Emile Anand, Jan van den Brand, Rose McCarty
Improving Transferable Targeted Attacks with Feature Tuning Mixup
Kaisheng Liang, Xuelong Dai, Yanjie Li et al.
The Silent Assistant: NoiseQuery as Implicit Guidance for Goal-Driven Image Generation
Ruoyu Wang, Huayang Huang, Ye Zhu et al.
VeriThoughts: Enabling Automated Verilog Code Generation using Reasoning and Formal Verification
Patrick Yubeaton, Andre Nakkab, Weihua Xiao et al.
Transformer brain encoders explain human high-level visual responses
Hossein Adeli, Sun Minni, Nikolaus Kriegeskorte
Leveraging BEV Paradigm for Ground-to-Aerial Image Synthesis
Junyan Ye, Jun He, Weijia Li et al.
Unleashing Diffusion Transformers for Visual Correspondence by Modulating Massive Activations
Chaofan Gan, Yuanpeng Tu, Xi Chen et al.
Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs
Kejia Zhang, Keda TAO, Jiasheng Tang et al.
Uni-Renderer: Unifying Rendering and Inverse Rendering Via Dual Stream Diffusion
ZhiFei Chen, Tianshuo Xu, Wenhang Ge et al.
When Are Concepts Erased From Diffusion Models?
Kevin Lu, Nicky Kriplani, Rohit Gandikota et al.
FedVLA: Federated Vision-Language-Action Learning with Dual Gating Mixture-of-Experts for Robotic Manipulation
Cui Miao, Tao Chang, meihan wu et al.
FaceLift: Learning Generalizable Single Image 3D Face Reconstruction from Synthetic Heads
Weijie Lyu, Yi Zhou, Ming-Hsuan Yang et al.
AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play
Ran Xu, Yuchen Zhuang, Zihan Dong et al.
Information Density Principle for MLLM Benchmarks
Chunyi Li, Xiaozhe Li, Zicheng Zhang et al.
Disrupting Model Merging: A Parameter-Level Defense Without Sacrificing Accuracy
JUNHAO WEI, YU ZHE, Jun Sakuma
DiffVsgg: Diffusion-Driven Online Video Scene Graph Generation
Mu Chen, Liulei Li, Wenguan Wang et al.
RAGNet: Large-scale Reasoning-based Affordance Segmentation Benchmark towards General Grasping
Dongming Wu, Yanping Fu, Saike Huang et al.
Open-World Objectness Modeling Unifies Novel Object Detection
Shan Zhang, Yao Ni, Jinhao Du et al.
FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity
Jinxi Li, Ziyang Song, Siyuan Zhou et al.
Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents
Yunseok Jang, Yeda Song, Sungryull Sohn et al.
ZeroGrasp: Zero-Shot Shape Reconstruction Enabled Robotic Grasping
Shun Iwase, Muhammad Zubair Irshad, Katherine Liu et al.
Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign Language Translation
Edward Fish, Richard Bowden
PerLA: Perceptive 3D Language Assistant
Guofeng Mei, Wei Lin, Luigi Riz et al.
Motion Synthesis with Sparse and Flexible Keyjoint Control
Inwoo Hwang, Jinseok Bae, Donggeun Lim et al.
R$^2$ec: Towards Large Recommender Models with Reasoning
Runyang You, Yongqi Li, Xinyu Lin et al.
AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?
Ori Press, Brandon Amos, Haoyu Zhao et al.
Exploring the Visual Feature Space for Multimodal Neural Decoding
Weihao Xia, Cengiz Oztireli
One2Any: One-Reference 6D Pose Estimation for Any Object
Mengya Liu, Siyuan Li, Ajad Chhatkuli et al.
Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers
Divyansh Srivastava, Xiang Zhang, He Wen et al.
Contrastive Test-Time Composition of Multiple LoRA Models for Image Generation
Tuna Meral, Enis Simsar, Federico Tombari et al.
SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs
Jiahui Wang, Zuyan Liu, Yongming Rao et al.
Adaptive Hyper-Graph Convolution Network for Skeleton-based Human Action Recognition with Virtual Connections
Youwei Zhou, Tianyang Xu, Cong Wu et al.
FALCON: An ML Framework for Fully Automated Layout-Constrained Analog Circuit Design
Asal Mehradfar, Xuzhe Zhao, Yilun Huang et al.
Vid2Sim: Generalizable, Video-based Reconstruction of Appearance, Geometry and Physics for Mesh-free Simulation
Chuhao Chen, Zhiyang Dou, Chen Wang et al.
Concept Replacer: Replacing Sensitive Concepts in Diffusion Models via Precision Localization
lingyun zhang, Yu Xie, Yanwei Fu et al.
EgoExOR: An Ego-Exo-Centric Operating Room Dataset for Surgical Activity Understanding
Ege Özsoy, Arda Mamur, Felix Tristram et al.
Convergence of Clipped SGD on Convex $(L_0,L_1)$-Smooth Functions
Ofir Gaash, Kfir Y. Levy, Yair Carmon
On the Zero-shot Adversarial Robustness of Vision-Language Models: A Truly Zero-shot and Training-free Approach
Baoshun Tong, Hanjiang Lai, Yan Pan et al.
Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations
Jeong Hun Yeo, Minsu Kim, Chae Won Kim et al.
Birth and Death of a Rose
Chen Geng, Yunzhi Zhang, Shangzhe Wu et al.
Exact: Exploring Space-Time Perceptive Clues for Weakly Supervised Satellite Image Time Series Semantic Segmentation
Hao Zhu, Yan Zhu, Jiayu Xiao et al.
Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models
Yulei Qin, Gang Li, Zongyi Li et al.
MindGYM: What Matters in Question Synthesis for Thinking-Centric Fine-Tuning?
Zhe Xu, Daoyuan Chen, Zhenqing Ling et al.
MedMax: Mixed-Modal Instruction Tuning for Training Biomedical Assistants
Hritik Bansal, Daniel Israel, Siyan Zhao et al.
Modeling Multiple Normal Action Representations for Error Detection in Procedural Tasks
Wei-Jin Huang, Yuan-Ming Li, Zhi-Wei Xia et al.
Modeling Microenvironment Trajectories on Spatial Transcriptomics with NicheFlow
Kristiyan Sakalyan, Alessandro Palma, Filippo Guerranti et al.
RadarQA: Multi-modal Quality Analysis of Weather Radar Forecasts
Xuming He, Zhiyuan You, Junchao Gong et al.
JTD-UAV: MLLM-Enhanced Joint Tracking and Description Framework for Anti-UAV Systems
Yifan Wang, Jian Zhao, Zhaoxin Fan et al.
Audio-Visual Semantic Graph Network for Audio-Visual Event Localization
Liang Liu, Shuaiyong Li, Yongqiang Zhu
CLOC: Contrastive Learning for Ordinal Classification with Multi-Margin N-pair Loss
Dileepa Pitawela, Gustavo Carneiro, Hsiang-Ting Chen
The Rich and the Simple: On the Implicit Bias of Adam and SGD
Bhavya Vasudeva, Jung Lee, Vatsal Sharan et al.
CoHD: A Counting-Aware Hierarchical Decoding Framework for Generalized Referring Expression Segmentation
Zhuoyan Luo, Yinghao Wu, Tianheng Cheng et al.
Unsupervised Joint Learning of Optical Flow and Intensity with Event Cameras
Shuang Guo, Friedhelm Hamann, Guillermo Gallego
MIRAGE: A Benchmark for Multimodal Information-Seeking and Reasoning in Agricultural Expert-Guided Conversations
Vardhan Dongre, Chi Gui, Shubham Garg et al.
FDS: Frequency-Aware Denoising Score for Text-Guided Latent Diffusion Image Editing
Yufan Ren, Zicong Jiang, Tong Zhang et al.
D^3-Human: Dynamic Disentangled Digital Human from Monocular Video
Honghu Chen, Bo Peng, Yunfan Tao et al.
Convex Relaxation for Robust Vanishing Point Estimation in Manhattan World
Bangyan Liao, Zhenjun Zhao, Haoang Li et al.
Adversarial Domain Prompt Tuning and Generation for Single Domain Generalization
Zhipeng Xu, De Cheng, XINYANG JIANG et al.
Better Language Model Inversion by Compactly Representing Next-Token Distributions
Murtaza Nazir, Matthew Finlayson, John Morris et al.
Dynamic Stereotype Theory Induced Micro-expression Recognition with Oriented Deformation
Bohao Zhang, Xuejiao Wang, Changbo Wang et al.
macOSWorld: A Multilingual Interactive Benchmark for GUI Agents
Pei Yang, Hai Ci, Mike Zheng Shou
Interpretable Generative Models through Post-hoc Concept Bottlenecks
Akshay R. Kulkarni, Ge Yan, Chung-En Sun et al.
Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs
Yifan Shen, Yuanzhe Liu, Jingyuan Zhu et al.
Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models
Sangwon Jang, June Suk Choi, Jaehyeong Jo et al.
VideoHandles: Editing 3D Object Compositions in Videos Using Video Generative Priors
Juil Koo, Paul Guerrero, Chun-Hao P. Huang et al.
AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction
Junhao Cheng, Yuying Ge, Yixiao Ge et al.
IDFace: Face Template Protection for Efficient and Secure Identification
Sunpill Kim, Seunghun Paik, Chanwoo Hwang et al.
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Wenqi Zhang, Hang Zhang, Xin Li et al.
Multi-Resolution Pathology-Language Pre-training Model with Text-Guided Visual Representation
Shahad Albastaki, Anabia Sohail, IYYAKUTTI IYAPPAN GANAPATHI et al.
Rate-In: Information-Driven Adaptive Dropout Rates for Improved Inference-Time Uncertainty Estimation
Tal Zeevi, Ravid Shwartz-Ziv, Yann LeCun et al.
CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models
Xiao An, Jiaxing Sun, Zihan Gui et al.
Conformal Prediction for Zero-Shot Models
Julio Silva-Rodríguez, Ismail Ben Ayed, Jose Dolz
MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh
Shuangkang Fang, I-Chao Shen, Yufeng Wang et al.
Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning
Huabin Liu, Filip Ilievski, Cees G. M. Snoek
Single Domain Generalization for Few-Shot Counting via Universal Representation Matching
Xianing Chen, Si Huo, Borui Jiang et al.
Flexible MOF Generation with Torsion-Aware Flow Matching
Nayoung Kim, Seongsu Kim, Sungsoo Ahn
Dynamic Integration of Task-Specific Adapters for Class Incremental Learning
Jiashuo Li, Shaokun Wang, Bo Qian et al.
WeatherGen: A Unified Diverse Weather Generator for LiDAR Point Clouds via Spider Mamba Diffusion
Yang Wu, Yun Zhu, Kaihua Zhang et al.
Joint Relational Database Generation via Graph-Conditional Diffusion Models
Mohamed Amine Ketata, David Lüdke, Leo Schwinn et al.
ProbeSDF: Light Field Probes For Neural Surface Reconstruction
Briac Toussaint, Diego Thomas, Jean-Sébastien Franco
Entropic Time Schedulers for Generative Diffusion Models
Dejan Stancevic, Florian Handke, Luca Ambrogioni
Generating Computational Cognitive models using Large Language Models
Milena Rmus, Akshay Kumar Jagadish, Marvin Mathony et al.
SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis
Junho Kim, Hyunjun Kim, Hosu Lee et al.
Logic.py: Bridging the Gap between LLMs and Constraint Solvers
Pascal Kesseli, Peter O'Hearn, Ricardo Cabral
Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking
Junxi Chen, Junhao Dong, Xiaohua Xie
Enhancing Facial Privacy Protection via Weakening Diffusion Purification
Ali Salar, Qing Liu, Yingli Tian et al.
Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers
Yixiao Huang, Hanlin Zhu, Tianyu Guo et al.
Failure Prediction at Runtime for Generative Robot Policies
Ralf Römer, Adrian Kobras, Luca Worbis et al.
MOSCATO: Predicting Multiple Object State Change Through Actions
Parnian Zameni, Yuhan Shen, Ehsan Elhamifar
The Overthinker's DIET: Cutting Token Calories with DIfficulty-AwarE Training
Weize Chen, Jiarui yuan, Jin Tailin et al.
Scaling Up Liquid-Resistance Liquid-Capacitance Networks for Efficient Sequence Modeling
Mónika Farsang, Radu Grosu
Sat2City: 3D City Generation from A Single Satellite Image with Cascaded Latent Diffusion
Tongyan Hua, Lutao Jiang, Ying-Cong Chen et al.
Spotting the Unexpected (STU): A 3D LiDAR Dataset for Anomaly Segmentation in Autonomous Driving
Alexey Nekrasov, Malcolm Burdorf, Stewart Worrall et al.
DeGauss: Dynamic-Static Decomposition with Gaussian Splatting for Distractor-free 3D Reconstruction
Rui Wang, Quentin Lohmeyer, Mirko Meboldt et al.
Exploring Multimodal Diffusion Transformers for Enhanced Prompt-based Image Editing
Joonghyuk Shin, Alchan Hwang, Yujin Kim et al.
AutoLUT: LUT-Based Image Super-Resolution with Automatic Sampling and Adaptive Residual Learning
Yuheng Xu, Shijie Yang, Xin Liu et al.
TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition
Xingsong Ye, Yongkun Du, Yunbo Tao et al.
Manual-PA: Learning 3D Part Assembly from Instruction Diagrams
Jiahao Zhang, Anoop Cherian, Cristian Rodriguez-Opazo et al.
Toward Engineering AGI: Benchmarking the Engineering Design Capabilities of LLMs
Xingang Guo, Yaxin Li, XiangYi Kong et al.
2HandedAfforder: Learning Precise Actionable Bimanual Affordances from Human Videos
Marvin Heidinger, Snehal Jauhri, Vignesh Prasad et al.
Efficient Quadratic Corrections for Frank-Wolfe Algorithms
Jannis Halbey, Seta Rakotomandimby, Mathieu Besançon et al.
LongDiff: Training-Free Long Video Generation in One Go
Zhuoling Li, Hossein Rahmani, Qiuhong Ke et al.
Predicting Empirical AI Research Outcomes with Language Models
Jiaxin Wen, Chenglei Si, Yueh-Han Chen et al.
DEAL: Data-Efficient Adversarial Learning for High-Quality Infrared Imaging
Zhu Liu, Zijun Wang, Jinyuan Liu et al.
PoLAR: Polar-Decomposed Low-Rank Adapter Representation
Kai Lion, Liang Zhang, Bingcong Li et al.
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment
Yucheng Suo, Fan Ma, Linchao Zhu et al.
Semi-Supervised State-Space Model with Dynamic Stacking Filter for Real-World Video Deraining
Shangquan Sun, Wenqi Ren, Juxiang Zhou et al.
Minority-Focused Text-to-Image Generation via Prompt Optimization
Soobin Um, Jong Chul Ye
DiTASK: Multi-Task Fine-Tuning with Diffeomorphic Transformations
Krishna Sri Ipsit Mantri, Carola-Bibiane Schönlieb, Bruno Ribeiro et al.
Jailbreaking the Non-Transferable Barrier via Test-Time Data Disguising
Yongli Xiang, Ziming Hong, Lina Yao et al.
Multiplayer Federated Learning: Reaching Equilibrium with Less Communication
TaeHo Yoon, Sayantan Choudhury, Nicolas Loizou
PeRL: Permutation-Enhanced Reinforcement Learning for Interleaved Vision-Language Reasoning
Yizhen Zhang, Yang Ding, Shuoshuo Zhang et al.
MIRA: Medical Time Series Foundation Model for Real-World Health Data
Hao Li, Bowen Deng, Chang Xu et al.
Efficient Data Selection at Scale via Influence Distillation
Mahdi Nikdan, Vincent Cohen-Addad, Dan Alistarh et al.
Transformers Provably Learn Chain-of-Thought Reasoning with Length Generalization
Yu Huang, Zixin Wen, Aarti Singh et al.
VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization
Sihan Yang, Runsen Xu, Chenhang Cui et al.
Neural Motion Simulator Pushing the Limit of World Models in Reinforcement Learning
Chenjie Hao, Weyl Lu, Yifan Xu et al.
DH-FaceVid-1K: A Large-Scale High-Quality Dataset for Face Video Generation
Donglin Di, He Feng, Wenzhang SUN et al.
LightsOut: Diffusion-based Outpainting for Enhanced Lens Flare Removal
Shr-Ruei Tsai, Wei-Cheng Chang, Jie-Ying Lee et al.
How Benchmark Prediction from Fewer Data Misses the Mark
Guanhua Zhang, Florian E. Dorner, Moritz Hardt
DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis
Yuming Gu, Phong Tran, Yujian Zheng et al.
Top-H Decoding: Adapting the Creativity and Coherence with Bounded Entropy in Text Generation
Erfan Baghaei Potraghloo, Seyedarmin Azizi, Souvik Kundu et al.
NLPrompt: Noise-Label Prompt Learning for Vision-Language Models
Bikang Pan, Qun Li, Xiaoying Tang et al.
Watermarking Autoregressive Image Generation
Nikola Jovanović, Ismail Labiad, Tomas Soucek et al.
Pos3R: 6D Pose Estimation for Unseen Objects Made Easy
Weijian Deng, Dylan Campbell, Chunyi Sun et al.
Street Gaussians without 3D Object Tracker
Ruida Zhang, Chengxi Li, Chenyangguang Zhang et al.
egoPPG: Heart Rate Estimation from Eye-Tracking Cameras in Egocentric Systems to Benefit Downstream Vision Tasks
Björn Braun, Rayan Armani, Manuel Meier et al.
Tight Lower Bounds and Improved Convergence in Performative Prediction
Pedram Khorsandi, Rushil Gupta, Mehrnaz Mofakhami et al.
A Polarization-Aided Transformer for Image Deblurring via Motion Vector Decomposition
Duosheng Chen, Shihao Zhou, Jinshan Pan et al.
IterIS: Iterative Inference-Solving Alignment for LoRA Merging
Hongxu chen, Zhen Wang, Runshi Li et al.
Robust Message Embedding via Attention Flow-Based Steganography
Huayuan Ye, Shenzhuo Zhang, Shiqi Jiang et al.
SphereUFormer: A U-Shaped Transformer for Spherical 360 Perception
Yaniv Benny, Lior Wolf
Learning from Neighbors: Category Extrapolation for Long-Tail Learning
Shizhen Zhao, Xin Wen, Jiahui Liu et al.
SpatialCrafter: Unleashing the Imagination of Video Diffusion Models for Scene Reconstruction from Limited Observations
Songchun Zhang, Huiyao Xu, Sitong Guo et al.
Bringing CLIP to the Clinic: Dynamic Soft Labels and Negation-Aware Learning for Medical Analysis
Hanbin Ko, Chang Min Park
L$^2$M: Mutual Information Scaling Law for Long-Context Language Modeling
Zhuo Chen, Oriol Comas, Zhuotao Jin et al.
Frame In-N-Out: Unbounded Controllable Image-to-Video Generation
Boyang Wang, Xuweiyi Chen, Matheus Gadelha et al.
TacoDepth: Towards Efficient Radar-Camera Depth Estimation with One-stage Fusion
Yiran Wang, Jiaqi Li, Chaoyi Hong et al.
Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis
Yousef Yeganeh, Ioannis Charisiadis, Marta Hasny et al.
Object-centric 3D Motion Field for Robot Learning from Human Videos
Zhao-Heng Yin, Sherry Yang, Pieter Abbeel
ChemPile: A 250 GB Diverse and Curated Dataset for Chemical Foundation Models
Adrian Mirza, Nawaf Alampara, Martiño Ríos-García et al.
Detect Any Mirrors: Boosting Learning Reliability on Large-Scale Unlabeled Data with an Iterative Data Engine
Zhaohu Xing, Lihao Liu, Yijun Yang et al.
Enhancing Testing-Time Robustness for Trusted Multi-View Classification in the Wild
Wei Liu, Yufei Chen, Xiaodong Yue
BeliefMapNav: 3D Voxel-Based Belief Map for Zero-Shot Object Navigation
Zibo Zhou, Yue Hu, Lingkai Zhang et al.
Towards Understanding the Mechanisms of Classifier-Free Guidance
Xiang Li, Rongrong Wang, Qing Qu
Understanding Prompt Tuning and In-Context Learning via Meta-Learning
Tim Genewein, Kevin Li, Jordi Grau-Moya et al.
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval
Leqi Shen, Guoqiang Gong, Tianxiang Hao et al.
NoPain: No-box Point Cloud Attack via Optimal Transport Singular Boundary
Zezeng Li, Xiaoyu Du, Na Lei et al.
Locality-Aware Zero-Shot Human-Object Interaction Detection
Sanghyun Kim, Deunsol Jung, Minsu Cho
EnvPoser: Environment-aware Realistic Human Motion Estimation from Sparse Observations with Uncertainty Modeling
Songpengcheng Xia, Yu Zhang, Zhuo Su et al.
RULE: Reinforcement UnLEarning Achieves Forget-retain Pareto Optimality
Chenlong Zhang, Zhuoran Jin, Hongbang Yuan et al.
SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions
Xianzhe Fan, Xuhui Zhou, Chuanyang Jin et al.
Enforcing Hard Linear Constraints in Deep Learning Models with Decision Rules
Gonzalo E. Constante, Hao Chen, Can Li
Flash-Split: 2D Reflection Removal with Flash Cues and Latent Diffusion Separation
Tianfu Wang, Mingyang Xie, Haoming Cai et al.
Novel View Synthesis with Pixel-Space Diffusion Models
Noam Elata, Bahjat Kawar, Yaron Ostrovsky-Berman et al.
VASparse: Towards Efficient Visual Hallucination Mitigation via Visual-Aware Token Sparsification
Xianwei Zhuang, Zhihong Zhu, Yuxin Xie et al.
OSLoPrompt: Bridging Low-Supervision Challenges and Open-Set Domain Generalization in CLIP
Mohamad Hassan N C, Divyam Gupta, Mainak Singha et al.