Most Cited 2025 "covariance structure" Papers
AGO: Adaptive Grounding for Open World 3D Occupancy Prediction
Peizheng Li, Shuxiao Ding, You Zhou et al.
ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval
Eric Xing, Pranavi Kolouju, Robert Pless et al.
WeakMCN: Multi-task Collaborative Network for Weakly Supervised Referring Expression Comprehension and Segmentation
Silin Cheng, Yang Liu, Xinwei He et al.
Local Dense Logit Relations for Enhanced Knowledge Distillation
Liuchi Xu, Kang Liu, Jinshuai Liu et al.
Uni-LoRA: One Vector is All You Need
Kaiyang Li, Shaobo Han, Qing Su et al.
Continual Release Moment Estimation with Differential Privacy
Nikita Kalinin, Jalaj Upadhyay, Christoph Lampert
Feel-Good Thompson Sampling for Contextual Bandits: a Markov Chain Monte Carlo Showdown
Emile Anand, Sarah Liaw
SFDM: Robust Decomposition of Geometry and Reflectance for Realistic Face Rendering from Sparse-view Images
Daisheng Jin, Jiangbei Hu, Baixin Xu et al.
Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis
Boming Miao, Chunxiao Li, Xiaoxiao Wang et al.
FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization
Hao Chen, Shell Xu Hu, Wayne Luk et al.
FFR: Frequency Feature Rectification for Weakly Supervised Semantic Segmentation
Ziqian Yang, Xinqiao Zhao, Xiaolei Wang et al.
TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition
Yilong Wang, Zilin Gao, Qilong Wang et al.
GG-SSMs: Graph-Generating State Space Models
Nikola Zubic, Davide Scaramuzza
Cheb-GR: Rethinking K-nearest Neighbor Search in Re-ranking for Person Re-identification
Jinxi Yang, He Li, Bo Du et al.
Hierarchical Retrieval: The Geometry and a Pretrain-Finetune Recipe
Chong You, Rajesh Jayaram, Ananda Theertha Suresh et al.
GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data
Wentao Wang, Hang Ye, Fangzhou Hong et al.
Learning to Instruct for Visual Instruction Tuning
Zhihan Zhou, Feng Hong, Jiaan Luo et al.
DOF-GS: Adjustable Depth-of-Field 3D Gaussian Splatting for Post-Capture Refocusing, Defocus Rendering and Blur Removal
Yujie Wang, Praneeth Chakravarthula, Baoquan Chen
Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle
Miroslav Purkrabek, Jiri Matas
UA-Pose: Uncertainty-Aware 6D Object Pose Estimation and Online Object Completion with Partial References
Ming-Feng Li, Xin Yang, Fu-En Wang et al.
3D-AVS: LiDAR-based 3D Auto-Vocabulary Segmentation
Weijie Wei, Osman Ülger, Fatemeh Karimi Nejadasl et al.
Does Object Binding Naturally Emerge in Large Pretrained Vision Transformers?
Yihao Li, Saeed Salehi, Lyle Ungar et al.
Classifier-to-Bias: Toward Unsupervised Automatic Bias Detection for Visual Classifiers
Quentin Guimard, Moreno D'Incà, Massimiliano Mancini et al.
See Further When Clear: Curriculum Consistency Model
Yunpeng Liu, Boxiao Liu, Yi Zhang et al.
SynBrain: Enhancing Visual-to-fMRI Synthesis via Probabilistic Representation Learning
Weijian Mai, Jiamin Wu, Yu Zhu et al.
RGB-to-Polarization Estimation: A New Task and Benchmark Study
Beibei Lin, Zifeng Yuan, Tingting Chen
How Well Can Differential Privacy Be Audited in One Run?
Amit Keinan, Moshe Shenfeld, Katrina Ligett
PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction
Eduard Poesina, Adriana Valentina Costache, Adrian-Gabriel Chifu et al.
SplitFlow: Flow Decomposition for Inversion-Free Text-to-Image Editing
Sung-Hoon Yoon, Minghan Li, Gaspard Beaudouin et al.
AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models
Kwan Yun, Seokhyeon Hong, Chaelin Kim et al.
RefLoRA: Refactored Low-Rank Adaptation for Efficient Fine-Tuning of Large Models
Yilang Zhang, Bingcong Li, Georgios Giannakis
Unlearned but Not Forgotten: Data Extraction after Exact Unlearning in LLM
Xiaoyu Wu, Yifei Pang, Terrance Liu et al.
GEOPARD: Geometric Pretraining for Articulation Prediction in 3D Shapes
Pradyumn Goyal, Dmitrii Petrov, Sheldon Andrews et al.
SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios
Lingwei Dang, Ruizhi Shao, Hongwen Zhang et al.
T2SMark: Balancing Robustness and Diversity in Noise-as-Watermark for Diffusion Models
Jindong Yang, Han Fang, Weiming Zhang et al.
EDM: Equirectangular Projection-Oriented Dense Kernelized Feature Matching
Dongki Jung, Jaehoon Choi, Yonghan Lee et al.
DAMamba: Vision State Space Model with Dynamic Adaptive Scan
Tanzhe Li, Caoshuo Li, Jiayi Lyu et al.
DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery
Utkarsh Mall, Cheng Perng Phoo, Mia Chiquier et al.
STEP: A Unified Spiking Transformer Evaluation Platform for Fair and Reproducible Benchmarking
Sicheng Shen, Dongcheng Zhao, Linghao Feng et al.
HybridMQA: Exploring Geometry-Texture Interactions for Colored Mesh Quality Assessment
Armin Shafiee Sarvestani, Sheyang Tang, Zhou Wang
MV-CoLight: Efficient Object Compositing with Consistent Lighting and Shadow Generation
Kerui Ren, Jiayang Bai, Linning Xu et al.
Spectral Analysis of Representational Similarity with Limited Neurons
Hyunmo Kang, Abdulkadir Canatar, SueYeon Chung
VITRIX-UniViTAR: Unified Vision Transformer with Native Resolution
Limeng Qiao, Yiyang Gan, Bairui Wang et al.
ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism
Zedong Liu, Shenggan Cheng, Guangming Tan et al.
FIction: 4D Future Interaction Prediction from Video
Kumar Ashutosh, Georgios Pavlakos, Kristen Grauman
TopNet: Transformer-Efficient Occupancy Prediction Network for Octree-Structured Point Cloud Geometry Compression
Xinjie Wang, Yifan Zhang, Ting Liu et al.
How To Make Your Cell Tracker Say "I dunno!"
Richard D Paul, Johannes Seiffarth, David Rügamer et al.
MetaSlot: Break Through the Fixed Number of Slots in Object-Centric Learning
Hongjia Liu, Rongzhen Zhao, Haohan Chen et al.
Option-aware Temporally Abstracted Value for Offline Goal-Conditioned Reinforcement Learning
Hongjoon Ahn, Heewoong Choi, Jisu Han et al.
FACE: Faithful Automatic Concept Extraction
Dipkamal Bhusal, Michael Clifford, Sara Rampazzi et al.
RC-AutoCalib: An End-to-End Radar-Camera Automatic Calibration Network
Van-Tin Luu, Yong-Lin Cai, Vu-Hoang Tran et al.
Avoiding exp(R) scaling in RLHF through Preference-based Exploration
Mingyu Chen, Yiding Chen, Wen Sun et al.
Instant GaussianImage: A Generalizable and Self-Adaptive Image Representation via 2D Gaussian Splatting
Zhaojie Zeng, Yuesong Wang, Chao Yang et al.
BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent
Shaojie Zhang, Ruoceng Zhang, Pei Fu et al.
ATA: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting
Yizhe Tang, Zhimin Sun, Yuzhen Du et al.
DAViD: Data-efficient and Accurate Vision Models from Synthetic Data
Fatemeh Saleh, Sadegh Aliakbarian, Charlie Hewitt et al.
TITAN: A Trajectory-Informed Technique for Adaptive Parameter Freezing in Large-Scale VQE
Yifeng Peng, Xinyi Li, Samuel Yen-Chi Chen et al.
ReDi: Rectified Discrete Flow
Jaehoon Yoo, Wonjung Kim, Seunghoon Hong
Hallucinatory Image Tokens: A Training-free EAZY Approach to Detecting and Mitigating Object Hallucinations in LVLMs
Liwei Che, Qingze T Liu, Jing Jia et al.
SPAZER: Spatial-Semantic Progressive Reasoning Agent for Zero-shot 3D Visual Grounding
Zhao Jin, Rong-Cheng Tu, Jingyi Liao et al.
M2SFormer: Multi-Spectral and Multi-Scale Attention with Edge-Aware Difficulty Guidance for Image Forgery Localization
Ju-Hyeon Nam, Dong-Hyun Moon, Sang-Chul Lee
Constrained Entropic Unlearning: A Primal-Dual Framework for Large Language Models
Taha Entesari, Arman Hatami, Rinat Khaziev et al.
DFM: Differentiable Feature Matching for Anomaly Detection
Wu Sheng, Yimi Wang, Xudong Liu et al.
Regret Analysis of Average-Reward Unichain MDPs via an Actor-Critic Approach
Swetha Ganesh, Vaneet Aggarwal
VFRTok: Variable Frame Rates Video Tokenizer with Duration-Proportional Information Assumption
Tianxiong Zhong, Xingye Tian, Boyuan Jiang et al.
CrossAD: Time Series Anomaly Detection with Cross-scale Associations and Cross-window Modeling
Beibu Li, Qichao Shentu, Yang Shu et al.
High Temporal Consistency through Semantic Similarity Propagation in Semi-Supervised Video Semantic Segmentation for Autonomous Flight
Cédric Vincent, Taehyoung Kim, Henri Meeß
FedSPA: Generalizable Federated Graph Learning under Homophily Heterogeneity
Zihan Tan, Guancheng Wan, Wenke Huang et al.
Faster and Better 3D Splatting via Group Training
Chengbo Wang, Guozheng Ma, Yizhen Lao et al.
Towards Source-Free Machine Unlearning
Sk Miraj Ahmed, Umit Basaran, Dripta S. Raychaudhuri et al.
Finsler Multi-Dimensional Scaling: Manifold Learning for Asymmetric Dimensionality Reduction and Embedding
Thomas Dagès, Simon Weber, Ya-Wei Eileen Lin et al.
Flow4Agent: Long-form Video Understanding via Motion Prior from Optical Flow
Ruyang Liu, Shangkun Sun, Haoran Tang et al.
GGTalker: Talking Head Synthesis with Generalizable Gaussian Priors and Identity-Specific Adaptation
Wentao Hu, Shunkai Li, Ziqiao Peng et al.
ConformalSAM: Unlocking the Potential of Foundational Segmentation Models in Semi-Supervised Semantic Segmentation with Conformal Prediction
Danhui Chen, Ziquan Liu, Chuxi Yang et al.
End-to-End Multi-Modal Diffusion Mamba
Chunhao Lu, Qiang Lu, Meichen Dong et al.
DocSAM: Unified Document Image Segmentation via Query Decomposition and Heterogeneous Mixed Learning
Xiao-Hui Li, Fei Yin, Cheng-Lin Liu
ImageSentinel: Protecting Visual Datasets from Unauthorized Retrieval-Augmented Image Generation
Ziyuan Luo, Yangyi Zhao, Ka Chun Cheung et al.
THUNDER: Tile-level Histopathology image UNDERstanding benchmark
Pierre Marza, Leo Fillioux, Sofiène Boutaj et al.
Underwater Visual SLAM with Depth Uncertainty and Medium Modeling
Rui Liu, Sheng Fan, Wenguan Wang et al.
Electromyography-Informed Facial Expression Reconstruction for Physiological-Based Synthesis and Analysis
Tim Büchner, Christoph Anders, Orlando Guntinas-Lichius et al.
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights
Zhiyuan Liang, Dongwen Tang, Yuhao Zhou et al.
Robust Audio-Visual Segmentation via Audio-Guided Visual Convergent Alignment
Chen Liu, Peike Li, Liying Yang et al.
EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow
Yixiang Chen, Peiyan Li, Yan Huang et al.
Flatness is Necessary, Neural Collapse is Not: Rethinking Generalization via Grokking
Ting Han, Linara Adilova, Henning Petzka et al.
Enhanced Visual-Semantic Interaction with Tailored Prompts for Pedestrian Attribute Recognition
Junyi Wu, Yan Huang, Min Gao et al.
Learning from Synchronization: Self-Supervised Uncalibrated Multi-View Person Association in Challenging Scenes
Keqi Chen, Vinkle Srivastav, Didier Mutter et al.
GenM3: Generative Pretrained Multi-path Motion Model for Text Conditional Human Motion Generation
Junyu Shi, Lijiang Liu, Yong Sun et al.
TRAP: Targeted Redirecting of Agentic Preferences
Hangoo Kang, Jehyeok Yeon, Gagandeep Singh
Notes-guided MLLM Reasoning: Enhancing MLLM with Knowledge and Visual Notes for Visual Question Answering
Wenlong Fang, Qiaofeng Wu, Jing Chen et al.
RTMap: Real-Time Recursive Mapping with Change Detection and Localization
Yuheng Du, Sheng Yang, Lingxuan Wang et al.
Constructing Ophthalmic MLLM for Positioning-diagnosis Collaboration Through Clinical Cognitive Chain Reasoning
Xinyao Liu, Diping Song
Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning
Jiuyang Dong, Junjun Jiang, Kui Jiang et al.
Towards Straggler-Resilient Split Federated Learning: An Unbalanced Update Approach
Dandan Liang, Jianing Zhang, Evan Chen et al.
UniRes: Universal Image Restoration for Complex Degradations
Mo Zhou, Keren Ye, Mauricio Delbracio et al.
FiRe: Fixed-points of Restoration Priors for Solving Inverse Problems
Matthieu Terris, Ulugbek Kamilov, Thomas Moreau
Plug-and-Play Versatile Compressed Video Enhancement
Huimin Zeng, Jiacheng Li, Zhiwei Xiong
SMMILE: An expert-driven benchmark for multimodal medical in-context learning
Melanie Rieff, Maya Varma, Ossian Rabow et al.
Learning to price with resource constraints: from full information to machine-learned prices
Ruicheng Ao, Jiashuo Jiang, David Simchi-Levi
StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion
Ziyu Guo, Young-Yoon Lee, Joseph Liu et al.
HalLoc: Token-level Localization of Hallucinations for Vision Language Models
Eunkyu Park, Minyeong Kim, Gunhee Kim
VAGUE: Visual Contexts Clarify Ambiguous Expressions
Heejeong Nam, Jinwoo Ahn, Keummin Ka et al.
Action Detail Matters: Refining Video Recognition with Local Action Queries
Mengmeng Wang, Zeyi Huang, Xiangjie Kong et al.
MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation
Sankalp Sinha, Mohammad Sadil Khan, Muhammad Usama et al.
PDEfuncta: Spectrally-Aware Neural Representation for PDE Solution Modeling
Minju Jo, Woojin Cho, Uvini Balasuriya Mudiyanselage et al.
LightFair: Towards an Efficient Alternative for Fair T2I Diffusion via Debiasing Pre-trained Text Encoders
Boyu Han, Qianqian Xu, Shilong Bao et al.
COALA: Numerically Stable and Efficient Framework for Context-Aware Low-Rank Approximation
Uliana Parkina, Maxim Rakhuba
Attention! Your Vision Language Model Could Be Maliciously Manipulated
Xiaosen Wang, Shaokang Wang, Zhijin Ge et al.
AdaHuman: Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion
Yangyi Huang, Ye Yuan, Xueting Li et al.
GO-N3RDet: Geometry Optimized NeRF-enhanced 3D Object Detector
Zechuan Li, Hongshan Yu, Yihao Ding et al.
DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution
Yuzhong Zhao, Feng Liu, Yue Liu et al.
LibraGrad: Balancing Gradient Flow for Universally Better Vision Transformer Attributions
Faridoun Mehri, Mahdieh Baghshah, Mohammad Taher Pilehvar
AutoDiscovery: Open-ended Scientific Discovery via Bayesian Surprise
Dhruv Agarwal, Bodhisattwa Prasad Majumder, Reece Adamson et al.
The Complexity of Symmetric Equilibria in Min-Max Optimization and Team Zero-Sum Games
Ioannis Anagnostides, Ioannis Panageas, Tuomas Sandholm et al.
Generative Photomontage
Sean J. Liu, Nupur Kumari, Ariel Shamir et al.
Deep RL Needs Deep Behavior Analysis: Exploring Implicit Planning by Model-Free Agents in Open-Ended Environments
Riley Simmons-Edler, Ryan Badman, Felix Berg et al.
GPS as a Control Signal for Image Generation
Chao Feng, Ziyang Chen, Aleksander Holynski et al.
DMesh++: An Efficient Differentiable Mesh for Complex Shapes
Sanghyun Son, Matheus Gadelha, Yang Zhou et al.
Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM
Zinuo Li, Xian Zhang, Yongxin Guo et al.
Who You Are Matters: Bridging Interests and Social Roles via LLM-Enhanced Logic Recommendation
Qing Yu, Xiaobei Wang, Shuchang Liu et al.
Diving into the Fusion of Monocular Priors for Generalized Stereo Matching
Chengtang Yao, Lidong Yu, Zhidan Liu et al.
I2VGuard: Safeguarding Images against Misuse in Diffusion-based Image-to-Video Models
Dongnan Gui, Xun Guo, Wengang Zhou et al.
SGFormer: Satellite-Ground Fusion for 3D Semantic Scene Completion
Xiyue Guo, Jiarui Hu, Junjie Hu et al.
One Sample is Enough to Make Conformal Prediction Robust
Soroush H. Zargarbashi, Mohammad Sadegh Akhondzadeh, Aleksandar Bojchevski
DAGSM: Disentangled Avatar Generation with GS-enhanced Mesh
Jingyu Zhuang, Di Kang, Linchao Bao et al.
Video Individual Counting for Moving Drones
Yaowu Fan, Jia Wan, Tao Han et al.
Moderating the Generalization of Score-based Generative Model
Wan Jiang, He Wang, Xin Zhang et al.
SemGes: Semantics-aware Co-Speech Gesture Generation using Semantic Coherence and Relevance Learning
Lanmiao Liu, Esam Ghaleb, Asli Ozyurek et al.
Pinco: Position-induced Consistent Adapter for Diffusion Transformer in Foreground-conditioned Inpainting
Guangben Lu, Yuzhen, Zhimin Sun et al.
Rethinking Layered Graphic Design Generation with a Top-Down Approach
Jingye Chen, Zhaowen Wang, Nanxuan Zhao et al.
Exact and Linear Convergence for Federated Learning under Arbitrary Client Participation is Attainable
Bicheng Ying, Zhe Li, Haibo Yang
SceneCrafter: Controllable Multi-View Driving Scene Editing
Zehao Zhu, Yuliang Zou, Chiyu “Max” Jiang et al.
Monitoring Risks in Test-Time Adaptation
Mona Schirmer, Metod Jazbec, Christian Andersson Naesseth et al.
Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation
Yusuke Hirota, Ryo Hachiuma, Boyi Li et al.
Traversal Verification for Speculative Tree Decoding
Yepeng Weng, Qiao Hu, Xujie Chen et al.
Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion
Songsong Yu, Yuxin Chen, Zhongang Qi et al.
SHeaP: Self-supervised Head Geometry Predictor Learned via 2D Gaussians
Liam Schoneveld, Zhe Chen, Davide Davoli et al.
TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis
Tri Ton, Ji Woo Hong, Chang Yoo
You Think, You ACT: The New Task of Arbitrary Text to Motion Generation
Runqi Wang, Caoyuan Ma, Guopeng Li et al.
VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions
Marko Mihajlovic, Siwei Zhang, Gen Li et al.
Project-Probe-Aggregate: Efficient Fine-Tuning for Group Robustness
Beier Zhu, Jiequan Cui, Hanwang Zhang et al.
Local-Global Associative Frames for Symmetry-Preserving Crystal Structure Modeling
Haowei Hua, Wanyu Lin
Constraint-Aware Feature Learning for Parametric Point Cloud
Xi Cheng, Ruiqi Lei, Di Huang et al.
On the Robustness of Transformers against Context Hijacking for Linear Classification
Tianle Li, Chenyang Zhang, Xingwu Chen et al.
RoboTron-Sim: Improving Real-World Driving via Simulated Hard-Case
Baihui Xiao, Chengjian Feng, Zhijian Huang et al.
PHATNet: A Physics-guided Haze Transfer Network for Domain-adaptive Real-world Image Dehazing
Fu-Jen Tsai, Yan-Tsung Peng, Yen-Yu Lin et al.
Stable Part Diffusion 4D: Multi-View RGB and Kinematic Parts Video Generation
Hao Zhang, Chun-Han Yao, Simon Donné et al.
Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning
Maosen Zhao, Pengtao Chen, Chong Yu et al.
NeuraLeaf: Neural Parametric Leaf Models with Shape and Deformation Disentanglement
Yang Yang, Dongni Mao, Hiroaki Santo et al.
Diffusion-based Realistic Listening Head Generation via Hybrid Motion Modeling
Yinuo Wang, Yanbo Fan, Xuan Wang et al.
Measuring Scientific Capabilities of Language Models with a Systems Biology Dry Lab
Haonan Duan, Stephen Lu, Caitlin F Harrigan et al.
PMA: Towards Parameter-Efficient Point Cloud Understanding via Point Mamba Adapter
Yaohua Zha, Yanzi Wang, Hang Guo et al.
Who is a Better Talker: Subjective and Objective Quality Assessment for AI-Generated Talking Heads
Yingjie Zhou, Jiezhang Cao, Zicheng Zhang et al.
Revisiting Image Fusion for Multi-Illuminant White-Balance Correction
David Serrano, Aditya Arora, Luis Herranz et al.
Breaking the Encoder Barrier for Seamless Video-Language Understanding
Handong Li, Yiyuan Zhang, Longteng Guo et al.
Open-ended Hierarchical Streaming Video Understanding with Vision Language Models
Hyolim Kang, Yunsu Park, Youngbeom Yoo et al.
AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing
Niu Lian, Jun Li, Jinpeng Wang et al.
SCAN: Bootstrapping Contrastive Pre-training for Data Efficiency
Yangyang Guo, Mohan Kankanhalli
Robust and Efficient 3D Gaussian Splatting for Urban Scene Reconstruction
Zhensheng Yuan, Haozhi Huang, Zhen Xiong et al.
CoralVQA: A Large-Scale Visual Question Answering Dataset for Coral Reef Image Understanding
Hongyong Han, Wei Wang, Gaowei Zhang et al.
Empowering Large Language Models with 3D Situation Awareness
Zhihao Yuan, Yibo Peng, Jinke Ren et al.
CVFusion: Cross-View Fusion of 4D Radar and Camera for 3D Object Detection
Hanzhi Zhong, Zhiyu Xiang, Ruoyu Xu et al.
X2-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction
Weihao Yu, Yuanhao Cai, Ruyi Zha et al.
Emergence of Linear Truth Encodings in Language Models
Shauli Ravfogel, Gilad Yehudai, Tal Linzen et al.
Parameterized Blur Kernel Prior Learning for Local Motion Deblurring
Zhenxuan Fang, Fangfang Wu, Tao Huang et al.
Compositional Caching for Training-free Open-vocabulary Attribute Detection
Marco Garosi, Alessandro Conti, Gaowen Liu et al.
Signs as Tokens: A Retrieval-Enhanced Multilingual Sign Language Generator
Ronglai Zuo, Rolandos Alexandros Potamias, Evangelos Ververas et al.
Face Forgery Video Detection via Temporal Forgery Cue Unraveling
Zonghui Guo, YingJie Liu, Jie Zhang et al.
Social Debiasing for Fair Multi-modal LLMs
Harry Cheng, Yangyang Guo, Qingpei Guo et al.
CAP: Evaluation of Persuasive and Creative Image Generation
Aysan Aghazadeh, Adriana Kovashka
GT-Loc: Unifying When and Where in Images through a Joint Embedding Space
David G. Shatwell, Ishan Rajendrakumar Dave, Swetha Sirnam et al.
GraSS: Scalable Data Attribution with Gradient Sparsification and Sparse Projection
Pingbang Hu, Joseph Melkonian, Weijing Tang et al.
I2-NeRF: Learning Neural Radiance Fields Under Physically-Grounded Media Interactions
Shuhong Liu, Lin Gu, Ziteng Cui et al.
DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion
Maksim Siniukov, Di Chang, Minh Tran et al.
Contrastive Representations for Temporal Reasoning
Alicja Ziarko, Michał Bortkiewicz, Michał Zawalski et al.
SAM-REF: Introducing Image-Prompt Synergy during Interaction for Detail Enhancement in the Segment Anything Model
Chongkai Yu, Ting Liu, Li Anqi et al.
Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction
Jeffrey Willette, Heejun Lee, Sung Ju Hwang
PS-Diffusion: Photorealistic Subject-Driven Image Editing with Disentangled Control and Attention
Weicheng Wang, Guoli Jia, Zhongqi Zhang et al.
Dimension-Reduction Attack! Video Generative Models are Experts on Controllable Image Synthesis
Hengyuan Cao, Yutong Feng, Biao Gong et al.
Scale Your Instructions: Enhance the Instruction-Following Fidelity of Unified Image Generation Model by Self-Adaptive Attention Scaling
Chao Zhou, Tianyi Wei, Nenghai Yu
Color Matching Using Hypernetwork-Based Kolmogorov-Arnold Networks
Artem Nikonorov, Georgy Perevozchikov, Andrei Korepanov et al.
A Visual Leap in CLIP Compositionality Reasoning through Generation of Counterfactual Sets
Zexi Jia, Chuanwei Huang, Yeshuang Zhu et al.
Resonance: Learning to Predict Social-Aware Pedestrian Trajectories as Co-Vibrations
Conghao Wong, Ziqian Zou, Beihao Xia
CodeCrash: Exposing LLM Fragility to Misleading Natural Language in Code Reasoning
Man Ho Lam, Chaozheng Wang, Jen-Tse Huang et al.
On the creation of narrow AI: hierarchy and nonlocality of neural network skills
Eric Michaud, Asher Parker-Sartori, Max Tegmark
Kernel Density Steering: Inference-Time Scaling via Mode Seeking for Image Restoration
Yuyang Hu, Kangfu Mei, Mojtaba Ardakani et al.
DoF-Gaussian: Controllable Depth-of-Field for 3D Gaussian Splatting
Liao Shen, Tianqi Liu, Huiqiang Sun et al.
Generalizable Object Re-Identification via Visual In-Context Prompting
Zhizhong Huang, Xiaoming Liu
Is 'Right' Right? Enhancing Object Orientation Understanding in Multimodal Large Language Models through Egocentric Instruction Tuning
JiHyeok Jung, EunTae Kim, SeoYeon Kim et al.
Grouped Speculative Decoding for Autoregressive Image Generation
Junhyuk So, Juncheol Shin, Hyunho Kook et al.
Online Language Splatting
Saimouli Katragadda, Cho-Ying Wu, Yuliang Guo et al.
PiKE: Adaptive Data Mixing for Large-Scale Multi-Task Learning Under Low Gradient Conflicts
Zeman Li, Yuan Deng, Peilin Zhong et al.
Invisible Backdoor Attack against Self-supervised Learning
Hanrong Zhang, Zhenting Wang, Boheng Li et al.
VA-MoE: Variables-Adaptive Mixture of Experts for Incremental Weather Forecasting
Hao Chen, Tao Han, Song Guo et al.
HoliGS: Holistic Gaussian Splatting for Embodied View Synthesis
Xiaoyuan Wang, Yizhou Zhao, Botao Ye et al.
Topology-Aware Conformal Prediction for Stream Networks
Jifan Zhang, Fangxin Wang, Zihe Song et al.
Spherical Manifold Guided Diffusion Model for Panoramic Image Generation
Xiancheng Sun, Mai Xu, Shengxi Li et al.
Performative Validity of Recourse Explanations
Gunnar König, Hidde Fokkema, Timo Freiesleben et al.
PARCO: Parallel AutoRegressive Models for Multi-Agent Combinatorial Optimization
Federico Berto, Chuanbo Hua, Laurin Luttmann et al.
BlockScan: Detecting Anomalies in Blockchain Transactions
Jiahao Yu, Xian Wu, Hao Liu et al.
Causal Head Gating: A Framework for Interpreting Roles of Attention Heads in Transformers
Andrew Nam, Henry Conklin, Yukang Yang et al.