Most Cited CVPR "high-level semantic understanding" Papers
5,589 papers found • Page 28 of 28
Conference
SVFR: A Unified Framework for Generalized Video Face Restoration
Zhiyao Wang, Xu Chen, Chengming Xu et al.
Decoupling Fine Detail and Global Geometry for Compressed Depth Map Super-Resolution
Huan Zheng, Wencheng Han, Jianbing Shen
Multi-Scale Neighborhood Occupancy Masked Autoencoder for Self-Supervised Learning in LiDAR Point Clouds
Mohamed Abdelsamad, Michael Ulrich, Claudius Glaeser et al.
CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image
Jingshun Huang, Haitao Lin, Tianyu Wang et al.
Plug-and-Play PPO: An Adaptive Point Prompt Optimizer Making SAM Greater
Xueyu Liu, Rui Wang, Yexin Lai et al.
Patch Matters: Training-free Fine-grained Image Caption Enhancement via Local Perception
ruotian peng, Haiying He, Yake Wei et al.
SCAP: Transductive Test-Time Adaptation via Supportive Clique-based Attribute Prompting
Chenyu Zhang, Kunlun Xu, Zichen Liu et al.
Neuro-3D: Towards 3D Visual Decoding from EEG Signals
Zhanqiang Guo, Jiamin Wu, Yonghao Song et al.
Adventurer: Optimizing Vision Mamba Architecture Designs for Efficiency
Feng Wang, Timing Yang, Yaodong Yu et al.
WeGen: A Unified Model for Interactive Multimodal Generation as We Chat
Zhipeng Huang, Shaobin Zhuang, Canmiao Fu et al.
Reducing Class-wise Confusion for Incremental Learning with Disentangled Manifolds
Huitong Chen, Yu Wang, Yan Fan et al.
Beyond Words: Augmenting Discriminative Richness via Diffusions in Unsupervised Prompt Learning
Hairui Ren, Fan Tang, He Zhao et al.
TokenMotion: Decoupled Motion Control via Token Disentanglement for Human-centric Video Generation
Ruineng Li, Daitao Xing, Huiming Sun et al.
Task-Aware Clustering for Prompting Vision-Language Models
Fusheng Hao, Fengxiang He, Fuxiang Wu et al.
Hunyuan-Portrait: Implicit Condition Control for Enhanced Portrait Animation
Zunnan Xu, Zhentao Yu, Zixiang Zhou et al.
MeshArt: Generating Articulated Meshes with Structure-Guided Transformers
Daoyi Gao, Mohd Yawar Nihal Siddiqui, Lei Li et al.
Non-Natural Image Understanding with Advancing Frequency-based Vision Encoders
Wang Lin, Qingsong Wang, Yueying Feng et al.
Navigating the Unseen: Zero-shot Scene Graph Generation via Capsule-Based Equivariant Features
Wenhuan Huang, Yi JI, guiqian zhu et al.
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea
Qifan Yu, Wei Chow, Zhongqi Yue et al.
RefPose: Leveraging Reference Geometric Correspondences for Accurate 6D Pose Estimation of Unseen Objects
Jaeguk Kim, Jaewoo Park, Keuntek Lee et al.
GRAE-3DMOT: Geometry Relation-Aware Encoder for Online 3D Multi-Object Tracking
Hyunseop Kim, Hyo-Jun Lee, Yonguk Lee et al.
Generative Modeling of Class Probability for Multi-Modal Representation Learning
JungKyoo Shin, Bumsoo Kim, Eunwoo Kim
Unified Medical Lesion Segmentation via Self-referring Indicator
Shijie Chang, Xiaoqi Zhao, Lihe Zhang et al.
SGSST: Scaling Gaussian Splatting Style Transfer
Bruno Galerne, Jianling WANG, Lara Raad et al.
DIO: Decomposable Implicit 4D Occupancy-Flow World Model
Christopher Diehl, Quinlan Sykora, Ben Agro et al.
HERA: Hybrid Explicit Representation for Ultra-Realistic Head Avatars
Hongrui Cai, Yuting Xiao, Xuan Wang et al.
Hierarchical Adaptive Filtering Network for Text Image Specular Highlight Removal
Zhi Jiang, Jingbo Hu, Ling Zhang et al.
Conformal Prediction and MLLM aided Uncertainty Quantification in Scene Graph Generation
Sayak Nag, Udita Ghosh, Calvin-Khang Ta et al.
Move-in-2D: 2D-Conditioned Human Motion Generation
Hsin-Ping Huang, Yang Zhou, Jui-Hsien Wang et al.
Boosting Domain Incremental Learning: Selecting the Optimal Parameters is All You Need
Qiang Wang, Xiang Song, Yuhang He et al.
Phoenix: A Motion-based Self-Reflection Framework for Fine-grained Robotic Action Correction
Wenke Xia, Ruoxuan Feng, Dong Wang et al.
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
Wanhua Li, Renping Zhou, Jiawei Zhou et al.
Quaffure: Real-Time Quasi-Static Neural Hair Simulation
Tuur Stuyck, Gene Wei-Chin Lin, Egor Larionov et al.
Implicit Bias Injection Attacks against Text-to-Image Diffusion Models
Huayang Huang, Xiangye Jin, Jiaxu Miao et al.
Reversing Flow for Image Restoration
Haina Qin, Wenyang Luo, Bing Li et al.
MTADiffusion: Mask Text Alignment Diffusion Model for Object Inpainting
jun huang, Ting Liu, Yihang Wu et al.
DAGSM: Disentangled Avatar Generation with GS-enhanced Mesh
Jingyu Zhuang, Di Kang, Linchao Bao et al.
Open-Canopy: Towards Very High Resolution Forest Monitoring
Fajwel Fogel, Yohann PERRON, Nikola Besic et al.
S2D-LFE: Sparse-to-Dense Light Field Event Generation
Yutong Liu, Wenming Weng, Yueyi Zhang et al.
Beyond Generation: A Diffusion-based Low-level Feature Extractor for Detecting AI-generated Images
Nan Zhong, Haoyu Chen, Yiran Xu et al.
Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation
Pu Cao, Feng Zhou, Lu Yang et al.
MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations
Ziyang Zhang, Yang Yu, Yucheng Chen et al.
Inference-Scale Complexity in ANN-SNN Conversion for High-Performance and Low-Power Applications
Tong Bu, Maohua Li, Zhaofei Yu
GazeGene: Large-scale Synthetic Gaze Dataset with 3D Eyeball Annotations
Yiwei Bao, Zhiming Wang, Feng Lu
Multirate Neural Image Compression with Adaptive Lattice Vector Quantization
Hao Xu, Xiaolin Wu, Xi Zhang
VideoGEM: Training-free Action Grounding in Videos
Felix Vogel, Walid Bousselham, Anna Kukleva et al.
SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation
Aleksei Bokhovkin, Quan Meng, Shubham Tulsiani et al.
DefMamba: Deformable Visual State Space Model
Leiye Liu, Miao Zhang, Jihao Yin et al.
PEER Pressure: Model-to-Model Regularization for Single Source Domain Generalization
Dongkyu Cho, Inwoo Hwang, Sanghack Lee
Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks
Han Wang, Gang Wang, Huan Zhang
Less is More: Efficient Image Vectorization with Adaptive Parameterization
Kaibo Zhao, Liang Bao, Yufei Li et al.
Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
Wenbin An, Feng Tian, Sicong Leng et al.
PERSE: Personalized 3D Generative Avatars from A Single Portrait
Hyunsoo Cha, Inhee Lee, Hanbyul Joo
Towards Explainable and Unprecedented Accuracy in Matching Challenging Finger Crease Patterns
Zhenyu Zhou, Chengdong Dong, Ajay Kumar
Animate and Sound an Image
Xihua Wang, Ruihua Song, Chongxuan Li et al.
Enhancing Few-Shot Class-Incremental Learning via Training-Free Bi-Level Modality Calibration
Yiyang Chen, Tianyu Ding, Lei Wang et al.
Discovering Hidden Visual Concepts Beyond Linguistic Input in Infant Learning
Xueyi Ke, Satoshi Tsutsui, Yayun Zhang et al.
BEVDiffuser: Plug-and-Play Diffusion Model for BEV Denoising with Ground-Truth Guidance
Xin Ye, Burhan Yaman, Sheng Cheng et al.
Stop Learning it all to Mitigate Visual Hallucination, Focus on the Hallucination Target.
Dokyoon Yoon, Youngsook Song, Woomyoung Park
LeanGaussian: Breaking Pixel or Point Cloud Correspondence in Modeling 3D Gaussians
Jiamin WU, Kenkun Liu, Han Gao et al.
Rethinking Temporal Fusion with a Unified Gradient Descent View for 3D Semantic Occupancy Prediction
Dubing Chen, Huan Zheng, Jin Fang et al.
Star with Bilinear Mapping
Zelin Peng, Yu Huang, Zhengqin Xu et al.
OSMamba: Omnidirectional Spectral Mamba with Dual-Domain Prior Generator for Exposure Correction
Gehui Li, Bin Chen, Chen Zhao et al.
ArticulatedGS: Self-supervised Digital Twin Modeling of Articulated Objects using 3D Gaussian Splatting
Guo Junfu, Yu Xin, Gaoyi Liu et al.
Motions as Queries: One-Stage Multi-Person Holistic Human Motion Capture
Kenkun Liu, Yurong Fu, Weihao Yuan et al.
Large-scale Multi-view Tensor Clustering with Implicit Linear Kernels
Jiyuan Liu, Xinwang Liu, chuankun Li et al.
Accurate Scene Text Recognition with Efficient Model Scaling and Cloze Self-Distillation
Andrea Maracani, Savas Ozkan, Sijun Cho et al.
Analyzing the Synthetic-to-Real Domain Gap in 3D Hand Pose Estimation
Zhuoran ZHAO, Linlin Yang, Pengzhan Sun et al.
Zero-Shot 4D Lidar Panoptic Segmentation
Yushan Zhang, Aljoša Ošep, Laura Leal-Taixe et al.
POPEN: Preference-Based Optimization and Ensemble for LVLM-Based Reasoning Segmentation
Lanyun Zhu, Tianrun Chen, Qianxiong Xu et al.
HyperNet Fields: Efficiently Training Hypernetworks without Ground Truth by Learning Weight Trajectories
Eric Hedlin, Munawar Hayat, Fatih Porikli et al.
RICCARDO: Radar Hit Prediction and Convolution for Camera-Radar 3D Object Detection
Yunfei Long, Abhinav Kumar, Xiaoming Liu et al.
IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification
Yuhao Wang, Yongfeng Lv, Pingping Zhang et al.
MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices
Jianwen Jiang, Gaojie Lin, Zhengkun Rong et al.
Jailbreaking the Non-Transferable Barrier via Test-Time Data Disguising
Yongli Xiang, Ziming Hong, Lina Yao et al.
Joint Vision-Language Social Bias Removal for CLIP
Haoyu Zhang, Yangyang Guo, Mohan Kankanhalli
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning
Yang Yue, Yulin Wang, Chenxin Tao et al.
Knowledge Bridger: Towards Training-Free Missing Modality Completion
Guanzhou Ke, Shengfeng He, Xiao-Li Wang et al.
SkySense-O: Towards Open-World Remote Sensing Interpretation with Vision-Centric Visual-Language Modeling
Qi Zhu, Jiangwei Lao, Deyi Ji et al.
AdaDARE-gamma: Balancing Stability and Plasticity in Multi-modal LLMs through Efficient Adaptation
Jingyi Xie, Jintao Yang, Zhunchen Luo et al.
Fingerprinting Denoising Diffusion Probabilistic Models
Huan Teng, Yuhui Quan, Chengyu Wang et al.
Query Efficient Black-Box Visual Prompting with Subspace Learning
Haozhen Zhang, Zhaogeng Liu, Hualin Zhang et al.
Efficient Depth Estimation for Unstable Stereo Camera Systems on AR Glasses
Yongfan Liu, Hyoukjun Kwon
Seeing is Not Believing: Adversarial Natural Object Optimization for Hard-Label 3D Scene Attacks
Daizong Liu, Wei Hu
Once-Tuning-Multiple-Variants: Tuning Once and Expanded as Multiple Vision-Language Model Variants
Chong Yu, Tao Chen, Zhongxue Gan
Heterogeneous Skeleton-Based Action Representation Learning
Xiaoyan Ma, jidong kuang, Hongsong Wang et al.
Recurrent Feature Mining and Keypoint Mixup Padding for Category-Agnostic Pose Estimation
Junjie Chen, Weilong Chen, Yifan Zuo et al.
Seeing Speech and Sound: Distinguishing and Locating Audio Sources in Visual Scenes
Hyeonggon Ryu, Seongyu Kim, Joon Chung et al.
FASTer: Focal token Acquiring-and-Scaling Transformer for Long-term 3D Objection Detection
Chenxu Dang, Pei An, Xinmin Zhang et al.
Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes
Aodi Li, Liansheng Zhuang, Xiao Long et al.
Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization
Feifei Li, Mi Zhang, Yiming Sun et al.
Beyond Local Sharpness: Communication-Efficient Global Sharpness-aware Minimization for Federated Learning
Debora Caldarola, Pietro Cagnasso, Barbara Caputo et al.
LoKi: Low-dimensional KAN for Efficient Fine-tuning Image Models
Xuan Cai, Renjie Pan, Hua Yang
DIV-FF: Dynamic Image-Video Feature Fields For Environment Understanding in Egocentric Videos
Lorenzo Mur-Labadia, Jose J. Guerrero, Ruben Martinez-Cantin
AdMiT: Adaptive Multi-Source Tuning in Dynamic Environments
Xiangyu Chang, Fahim Faisal Niloy, Sk Miraj Ahmed et al.
Viewpoint Rosetta Stone: Unlocking Unpaired Ego-Exo Videos for View-invariant Representation Learning
Mi Luo, Zihui Xue, Alex Dimakis et al.
GIF: Generative Inspiration for Face Recognition at Scale
Mohammad Saadabadi Saadabadi, Sahar Rahimi Malakshan, Ali Dabouei et al.
CrossSDF: 3D Reconstruction of Thin Structures From Cross-Sections
Thomas Walker, Salvatore Esposito, Daniel Rebain et al.
Investigating the Role of Weight Decay in Enhancing Nonconvex SGD
Tao Sun, Yuhao Huang, Li Shen et al.
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction
Jingcheng Ni, Yuxin Guo, Yichen Liu et al.
AVQACL: A Novel Benchmark for Audio-Visual Question Answering Continual Learning
Kaixuan Wu, Xinde Li, Xinglin Li et al.
Divide and Conquer: Heterogeneous Noise Integration for Diffusion-based Adversarial Purification
Gaozheng Pei, Shaojie Lyu, Gong Chen et al.
Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows
Shentong Mo, Yibing Song
DL2G: Degradation-guided Local-to-Global Restoration for Eyeglass Reflection Removal
Yizhilv, Xiao Lu, Hong Ding et al.
Efficient Decoupled Feature 3D Gaussian Splatting via Hierarchical Compression
Zhenqi Dai, Ting Liu, Yanning Zhang
Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes
Yiming Dou, Wonseok Oh, Yuqing Luo et al.
Domain Generalization in CLIP via Learning with Diverse Text Prompts
Changsong Wen, Zelin Peng, Yu Huang et al.
Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space
Yifan Zhou, Zeqi Xiao, Shuai Yang et al.
Weakly Supervised Contrastive Adversarial Training for Learning Robust Features from Semi-supervised Data
Lilin Zhang, Chengpei Wu, Ning Yang
CarPlanner: Consistent Auto-regressive Trajectory Planning for Large-Scale Reinforcement Learning in Autonomous Driving
Dongkun Zhang, Jiaming Liang, Ke Guo et al.
UCM-VeID V2: A Richer Dataset and A Pre-training Method for UAV Cross-Modality Vehicle Re-Identification
Xingyue Liu, Jiahao Qi, Chen Chen et al.
Unboxed: Geometrically and Temporally Consistent Video Outpainting
Zhongrui Yu, Martina Megaro-Boldini, Robert Sumner et al.
Visual Lexicon: Rich Image Features in Language Space
XuDong Wang, Xingyi Zhou, Alireza Fathi et al.
Continual SFT Matches Multimodal RLHF with Negative Supervision
Ke Zhu, Yu Wang, Yanpeng Sun et al.
Decoupled Motion Expression Video Segmentation
Hao Fang, Runmin Cong, Xiankai Lu et al.
Mixture of Submodules for Domain Adaptive Person Search
Minsu Kim, Seungryong Kim, Kwanghoon Sohn
Unsupervised Discovery of Facial Landmarks and Head Pose
Satyajit Tourani, Siddharth Tourani, Arif Mahmood et al.
Dynamic Integration of Task-Specific Adapters for Class Incremental Learning
Jiashuo Li, Shaokun Wang, Bo Qian et al.
Test-time Augmentation Improves Efficiency in Conformal Prediction
Divya M Shanmugam, Helen Lu, Swami Sankaranarayanan et al.
GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding
Yawen Shao, Wei Zhai, Yuhang Yang et al.
Dual Diffusion for Unified Image Generation and Understanding
Zijie Li, Henry Li, Yichun Shi et al.
Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning
Huabin Liu, Filip Ilievski, Cees G. M. Snoek
Enduring, Efficient and Robust Trajectory Prediction Attack in Autonomous Driving via Optimization-Driven Multi-Frame Perturbation Framework
Yi Yu, Weizhen Han, Libing Wu et al.
UNEM: UNrolled Generalized EM for Transductive Few-Shot Learning
Long Zhou, Fereshteh Shakeri, Aymen Sadraoui et al.
Flash-Split: 2D Reflection Removal with Flash Cues and Latent Diffusion Separation
Tianfu Wang, Mingyang Xie, Haoming Cai et al.
Test-Time Backdoor Detection for Object Detection Models
Hangtao Zhang, Yichen Wang, Shihui Yan et al.
Classifier-Free Guidance Inside the Attraction Basin May Cause Memorization
Anubhav Jain, Yuya Kobayashi, Takashi Shibuya et al.
Reconciling Stochastic and Deterministic Strategies for Zero-shot Image Restoration using Diffusion Model in Dual
Chong Wang, Lanqing Guo, Zixuan Fu et al.
Harnessing Frequency Spectrum Insights for Image Copyright Protection Against Diffusion Models
Zhenguang Liu, Chao Shuai, Shaojing Fan et al.
Gain from Neighbors: Boosting Model Robustness in the Wild via Adversarial Perturbations Toward Neighboring Classes
Zhou Yang, Mingtao Feng, Tao Huang et al.
Enhancing Creative Generation on Stable Diffusion-based Models
Jiyeon Han, Dahee Kwon, Gayoung Lee et al.
EquiPose: Exploiting Permutation Equivariance for Relative Camera Pose Estimation
Yuzhen Liu, Qiulei Dong
EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues
Sagar Soni, Akshay Dudhane, Hiyam Debary et al.
Visual Consensus Prompting for Co-Salient Object Detection
Jie Wang, Nana Yu, Zihao Zhang et al.
Classifier-guided CLIP Distillation for Unsupervised Multi-label Classification
Dongseob Kim, Hyunjung Shim
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
Hang Yin, Xiuwei Xu, Linqing Zhao et al.
Floating No More: Object-Ground Reconstruction from a Single Image
Yunze Man, Yichen Sheng, Jianming Zhang et al.
pFedMxF: Personalized Federated Class-Incremental Learning with Mixture of Frequency Aggregation
Yifei Zhang, Hao Zhu, Alysa Ziying Tan et al.
The Art of Deception: Color Visual Illusions and Diffusion Models
Alexandra Gomez-Villa, Kai Wang, C.Alejandro Parraga et al.
ACAttack: Adaptive Cross Attacking RGB-T Tracker via Multi-Modal Response Decoupling
Xinyu Xiang, Qinglong Yan, HAO ZHANG et al.
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
Bingda Tang, Sayak Paul, Boyang Zheng et al.
SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting
Dongliang Luo, Hanshen Zhu, Ziyang Zhang et al.
Distilling Long-tailed Datasets
Zhenghao Zhao, Haoxuan Wang, Yuzhang Shang et al.
Knowledge Memorization and Rumination for Pre-trained Model-based Class-Incremental Learning
Zijian Gao, Wangwang Jia, Xingxing Zhang et al.
PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction
Eduard Poesina, Adriana Valentina Costache, Adrian-Gabriel Chifu et al.
CheXwhatsApp: A Dataset for Exploring Challenges in the Diagnosis of Chest X-rays through Mobile Devices
Mariamma Antony, Rajiv Porana, Sahil M. Lathiya et al.
From Prototypes to General Distributions: An Efficient Curriculum for Masked Image Modeling
Jinhong Lin, Cheng-En Wu, Huanran Li et al.
SINR: Sparsity Driven Compressed Implicit Neural Representations
Dhananjaya Jayasundara, Sudarshan Rajagopalan, Yasiru Ranasinghe et al.
Erase Diffusion: Empowering Object Removal Through Calibrating Diffusion Pathways
Yi Liu, Hao Zhou, Benlei Cui et al.
DVHGNN: Multi-Scale Dilated Vision HGNN for Efficient Vision Recognition
Caoshuo Li, Tanzhe Li, Xiaobin Hu et al.
RobSense: A Robust Multi-modal Foundation Model for Remote Sensing with Static, Temporal, and Incomplete Data Adaptability
Minh Kha Do, Kang Han, Phu Lai et al.
MESC-3D:Mining Effective Semantic Cues for 3D Reconstruction from a Single Image
Shaoming Li, Qing Cai, Songqi KONG et al.
Probing the Mid-level Vision Capabilities of Self-Supervised Learning
Xuweiyi Chen, Markus Marks, Zezhou Cheng
ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation
Zirun Guo, Tao Jin
Diffusion Bridge: Leveraging Diffusion Model to Reduce the Modality Gap Between Text and Vision for Zero-Shot Image Captioning
Jeongryong Lee, Yejee Shin, Geonhui Son et al.
The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation
Bingjie Gao, Xinyu Gao, Xiaoxue Wu et al.
VLMs-Guided Representation Distillation for Efficient Vision-Based Reinforcement Learning
Haoran Xu, Peixi Peng, Guang Tan et al.
Beyond Image Classification: A Video Benchmark and Dual-Branch Hybrid Discrimination Framework for Compositional Zero-Shot Learning
Dongyao Jiang, Haodong Jing, Yongqiang Ma et al.
AniGrad: Anisotropic Gradient-Adaptive Sampling for 3D Reconstruction From Monocular Video
Noah Stier, Alex Rich, Pradeep Sen et al.
Easy-editable Image Vectorization with Multi-layer Multi-scale Distributed Visual Feature Embedding
Ye Chen, Zhangli Hu, Zhongyin Zhao et al.
DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding
Yudong Han, Qingpei Guo, Liyuan Pan et al.
Automated Proof of Polynomial Inequalities via Reinforcement Learning
Banglong Liu, Niuniu Qi, Xia Zeng et al.
BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence
Xuewu Lin, Tianwei Lin, Alan Huang et al.
MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors
Riku Murai, Eric Dexheimer, Andrew J. Davison
Toward Robust Neural Reconstruction from Sparse Point Sets
Amine Ouasfi, Shubhendu Jena, Eric Marchand et al.
Just Dance with pi! A Poly-modal Inductor for Weakly-supervised Video Anomaly Detection
Snehashis Majhi, Giacomo D'Amicantonio, Antitza Dantcheva et al.
Gaussian Splashing: Unified Particles for Versatile Motion Synthesis and Rendering
Yutao Feng, Xiang Feng, Yintong Shang et al.
Improving Accuracy and Calibration via Differentiated Deep Mutual Learning
Han Liu, Peng Cui, Bingning Wang et al.
Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer
Jiahao Cui, Hui Li, Qingkun Su et al.
Improving the Transferability of Adversarial Attacks on Face Recognition with Diverse Parameters Augmentation
Fengfan Zhou, Bangjie Yin, Hefei Ling et al.
Reward Fine-Tuning Two-Step Diffusion Models via Learning Differentiable Latent-Space Surrogate Reward
Zhiwei Jia, Yuesong Nan, Huixi Zhao et al.
CaricatureBooth: Data-Free Interactive Caricature Generation in a Photo Booth
Zhiyu Qu, Yunqi Miao, Zhensong Zhang et al.
Channel-wise Noise Scheduled Diffusion for Inverse Rendering in Indoor Scenes
JunYong Choi, Min-Cheol Sagong, SeokYeong Lee et al.
Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models
Yan Xie, Zequn Zeng, Hao Zhang et al.
Embodied Scene Understanding for Vision Language Models via MetaVQA
Weizhen Wang, Chenda Duan, Zhenghao Peng et al.
Learning Flow Fields in Attention for Controllable Person Image Generation
Zijian Zhou, Shikun Liu, Xiao Han et al.
Forming Auxiliary High-confident Instance-level Loss to Promote Learning from Label Proportions
Tianhao Ma, Han Chen, Juncheng Hu et al.
Towards Open-Vocabulary Audio-Visual Event Localization
Jinxing Zhou, Dan Guo, Ruohao Guo et al.
ADU: Adaptive Detection of Unknown Categories in Black-Box Domain Adaptation
Yushan Lai, Guowen Li, Haoyuan Liang et al.
Coeff-Tuning: A Graph Filter Subspace View for Tuning Attention-Based Large Models
Zichen Miao, WEI CHEN, Qiang Qiu
MVBoost: Boost 3D Reconstruction with Multi-View Refinement
Xiangyu Liu, Xiaomei Zhang, Zhiyuan Ma et al.
CARL: A Framework for Equivariant Image Registration
Hastings Greer, Lin Tian, François-Xavier Vialard et al.
AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation
Datao Tang, Xiangyong Cao, Xuan Wu et al.
UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines
Chen Tang, Xinzhu Ma, Encheng Su et al.
RENO: Real-Time Neural Compression for 3D LiDAR Point Clouds
Kang You, Tong Chen, Dandan Ding et al.
MaskGaussian: Adaptive 3D Gaussian Representation from Probabilistic Masks
Yifei Liu, Zhihang Zhong, Yifan Zhan et al.
Forensics Adapter: Adapting CLIP for Generalizable Face Forgery Detection
Xinjie Cui, Yuezun Li, Ao Luo et al.
Co-Speech Gesture Video Generation with Implicit Motion-Audio Entanglement
Xinjie Li, Ziyi Chen, Xinlu Yu et al.
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation
Chengyue Wu, Xiaokang Chen, Zhiyu Wu et al.