Most Cited CVPR "image-based 3d generation" Papers
Countering Personalized Text-to-Image Generation with Influence Watermarks
Hanwen Liu, Zhicheng Sun, Yadong Mu
T-VSL: Text-Guided Visual Sound Source Localization in Mixtures
Tanvir Mahmud, Yapeng Tian, Diana Marculescu
Initialization Matters for Adversarial Transfer Learning
Andong Hua, Jindong Gu, Zhiyu Xue et al.
MindBridge: A Cross-Subject Brain Decoding Framework
Shizun Wang, Songhua Liu, Zhenxiong Tan et al.
Narrative Action Evaluation with Prompt-Guided Multimodal Interaction
Shiyi Zhang, Sule Bai, Guangyi Chen et al.
JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments
Duy Tho Le, Chenhui Gou, Stavya Datta et al.
Minimal Perspective Autocalibration
Andrea Porfiri Dal Cin, Timothy Duff, Luca Magri et al.
RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos
Hongchi Xia, Yang Fu, Sifei Liu et al.
Aligning and Prompting Everything All at Once for Universal Visual Perception
Yunhang Shen, Chaoyou Fu, Peixian Chen et al.
Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption
Buzhen Huang, Chen Li, Chongyang Xu et al.
Label Propagation for Zero-shot Classification with Vision-Language Models
Vladan Stojnić, Yannis Kalantidis, Giorgos Tolias
IQ-VFI: Implicit Quadratic Motion Estimation for Video Frame Interpolation
Mengshun Hu, Kui Jiang, Zhihang Zhong et al.
Efficient Dataset Distillation via Minimax Diffusion
Jianyang Gu, Saeed Vahidian, Vyacheslav Kungurtsev et al.
Enhancing Video Super-Resolution via Implicit Resampling-based Alignment
Kai Xu, Ziwei Yu, Xin Wang et al.
Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis
Bichen Wu, Ching-Yao Chuang, Xiaoyan Wang et al.
RNb-NeuS: Reflectance and Normal-based Multi-View 3D Reconstruction
Baptiste Brument, Robin Bruneau, Yvain Queau et al.
Prompt-Enhanced Multiple Instance Learning for Weakly Supervised Video Anomaly Detection
Junxi Chen, Liang Li, Li Su et al.
DiPrompT: Disentangled Prompt Tuning for Multiple Latent Domain Generalization in Federated Learning
Sikai Bai, Jie Zhang, Song Guo et al.
HIMap: HybrId Representation Learning for End-to-end Vectorized HD Map Construction
Yi Zhou, Hui Zhang, Jiaqian Yu et al.
LTA-PCS: Learnable Task-Agnostic Point Cloud Sampling
Jiaheng Liu, Jianhao Li, Kaisiyuan Wang et al.
SRTube: Video-Language Pre-Training with Action-Centric Video Tube Features and Semantic Role Labeling
Juhee Lee, Jewon Kang
Generative Quanta Color Imaging
Vishal Purohit, Junjie Luo, Yiheng Chi et al.
SI-MIL: Taming Deep MIL for Self-Interpretability in Gigapixel Histopathology
Saarthak Kapse, Pushpak Pati, Srijan Das et al.
MINIMA: Modality Invariant Image Matching
Jiangwei Ren, Xingyu Jiang, Zizhuo Li et al.
Mind the Gap: Confidence Discrepancy Can Guide Federated Semi-Supervised Learning Across Pseudo-Mismatch
Yijie Liu, Xinyi Shang, Yiqun Zhang et al.
SATA: Spatial Autocorrelation Token Analysis for Enhancing the Robustness of Vision Transformers
Nikaan Nikzad, Yi Liao, Yongsheng Gao et al.
Tiled Diffusion
Or Madar, Ohad Fried
AI-Face: A Million-Scale Demographically Annotated AI-Generated Face Dataset and Fairness Benchmark
Li Lin, Santosh Santosh, Mingyang Wu et al.
Advancing Generalizable Tumor Segmentation with Anomaly-Aware Open-Vocabulary Attention Maps and Frozen Foundation Diffusion Models
Yankai Jiang, Peng Zhang, Donglin Yang et al.
Condensing Action Segmentation Datasets via Generative Network Inversion
Guodong Ding, Rongyu Chen, Angela Yao
HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation
Hermann Kumbong, Xian Liu, Tsung-Yi Lin et al.
CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models
Felix Taubner, Ruihang Zhang, Mathieu Tuli et al.
Task-Specific Gradient Adaptation for Few-Shot One-Class Classification
Yunlong Li, Xiabi Liu, Liyuan Pan et al.
Good, Cheap, and Fast: Overfitted Image Compression with Wasserstein Distortion
Jona Ballé, Luca Versari, Emilien Dupont et al.
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
Sili Chen, Hengkai Guo, Shengnan Zhu et al.
Enhancing Facial Privacy Protection via Weakening Diffusion Purification
Ali Salar, Qing Liu, Yingli Tian et al.
Structured 3D Latents for Scalable and Versatile 3D Generation
Jianfeng Xiang, Zelong Lv, Sicheng Xu et al.
Practical Solutions to the Relative Pose of Three Calibrated Cameras
Charalambos Tzamos, Viktor Kocur, Yaqing Ding et al.
Temporal Alignment-Free Video Matching for Few-shot Action Recognition
SuBeen Lee, WonJun Moon, Hyun Seok Seong et al.
CRISP: Object Pose and Shape Estimation with Test-Time Adaptation
Jingnan Shi, Rajat Talak, Harry Zhang et al.
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
Keda Tao, Can Qin, Haoxuan You et al.
CoLLM: A Large Language Model for Composed Image Retrieval
Chuong Huynh, Jinyu Yang, Ashish Tawari et al.
CLIP-driven Coarse-to-fine Semantic Guidance for Fine-grained Open-set Semi-supervised Learning
Xiaokun Li, Yaping Huang, Qingji Guan
Relative Pose Estimation through Affine Corrections of Monocular Depth Priors
Yifan Yu, Shaohui Liu, Rémi Pautrat et al.
Locality-Aware Zero-Shot Human-Object Interaction Detection
Sanghyun Kim, Deunsol Jung, Minsu Cho
Arc2Avatar: Generating Expressive 3D Avatars from a Single Image via ID Guidance
Dimitrios Gerogiannis, Foivos Paraperas Papantoniou, Rolandos Alexandros Potamias et al.
LIM: Large Interpolator Model for Dynamic Reconstruction
Remy Sabathier, Niloy J. Mitra, David Novotny
MoFlow: One-Step Flow Matching for Human Trajectory Forecasting via Implicit Maximum Likelihood Estimation based Distillation
Yuxiang Fu, Qi Yan, Ke Li et al.
Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions
Chan Hur, Jeong-hun Hong, Dong-hun Lee et al.
SAMWISE: Infusing Wisdom in SAM2 for Text-Driven Video Segmentation
Claudia Cuttano, Gabriele Trivigno, Gabriele Rosi et al.
MITracker: Multi-View Integration for Visual Object Tracking
Mengjie Xu, Yitao Zhu, Haotian Jiang et al.
Bias for Action: Video Implicit Neural Representations with Bias Modulation
Alper Kayabasi, Anil Kumar Vadathya, Guha Balakrishnan et al.
LotusFilter: Fast Diverse Nearest Neighbor Search via a Learned Cutoff Table
Yusuke Matsui
Towards Effective and Sparse Adversarial Attack on Spiking Neural Networks via Breaking Invisible Surrogate Gradients
Li Lun, Kunyu Feng, Qinglong Ni et al.
DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis
Yuming Gu, Phong Tran, Yujian Zheng et al.
h-Edit: Effective and Flexible Diffusion-Based Editing via Doob's h-Transform
Toan Nguyen, Kien Do, Duc Kieu et al.
EDM: Equirectangular Projection-Oriented Dense Kernelized Feature Matching
Dongki Jung, Jaehoon Choi, Yonghan Lee et al.
SegEarth-OV: Towards Training-Free Open-Vocabulary Segmentation for Remote Sensing Images
Kaiyu Li, Ruixun Liu, Xiangyong Cao et al.
RigGS: Rigging of 3D Gaussians for Modeling Articulated Objects in Videos
Yuxin Yao, Zhi Deng, Junhui Hou
Make-It-Animatable: An Efficient Framework for Authoring Animation-Ready 3D Characters
Zhiyang Guo, Jinxu Xiang, Kai Ma et al.
Unlocking the Potential of Unlabeled Data in Semi-Supervised Domain Generalization
Dongkwan Lee, Kyomin Hwang, Nojun Kwak
Exposure-slot: Exposure-centric Representations Learning with Slot-in-Slot Attention for Region-aware Exposure Correction
Donggoo Jung, Daehyun Kim, Guanghui Wang et al.
Explainable Saliency: Articulating Reasoning with Contextual Prioritization
Nuo Chen, Ming Jiang, Qi Zhao
VisionZip: Longer is Better but Not Necessary in Vision Language Models
Senqiao Yang, Yukang Chen, Zhuotao Tian et al.
HistoFS: Non-IID Histopathologic Whole Slide Image Classification via Federated Style Transfer with RoI-Preserving
Farchan Hakim Raswa, Chun-Shien Lu, Jia-Ching Wang
3D-Mem: 3D Scene Memory for Embodied Exploration and Reasoning
Yuncong Yang, Han Yang, Jiachen Zhou et al.
EffiDec3D: An Optimized Decoder for High-Performance and Efficient 3D Medical Image Segmentation
Md Mostafijur Rahman, Radu Marculescu
The Language of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion
Changan Chen, Juze Zhang, Shrinidhi Kowshika Lakshmikanth et al.
Parallelized Autoregressive Visual Generation
Yuqing Wang, Shuhuai Ren, Zhijie Lin et al.
One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion
Chunyang Cheng, Tianyang Xu, Zhenhua Feng et al.
Using Diffusion Priors for Video Amodal Segmentation
Kaihua Chen, Deva Ramanan, Tarasha Khurana
Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation
Jiantao Lin, Xin Yang, Meixi Chen et al.
ChatGarment: Garment Estimation, Generation and Editing via Large Language Models
Siyuan Bian, Chenghao Xu, Yuliang Xiu et al.
SpecTRe-GS: Modeling Highly Specular Surfaces with Reflected Nearby Objects by Tracing Rays in 3D Gaussian Splatting
Jiajun Tang, Fan Fei, Zhihao Li et al.
Self-supervised ControlNet with Spatio-Temporal Mamba for Real-world Video Super-resolution
Shijun Shi, Jing Xu, Lijing Lu et al.
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training
Jierun Chen, Dongting Hu, Xijie Huang et al.
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
Jihoon Kim, Jeongsoo Choi, Jaehun Kim et al.
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Byung-Kwan Lee, Ryo Hachiuma, Yu-Chiang Frank Wang et al.
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Ho Kei Cheng, Masato Ishii, Akio Hayakawa et al.
Consistency Posterior Sampling for Diverse Image Synthesis
Vishal Purohit, Matthew Repasky, Jianfeng Lu et al.
Text Embedding is Not All You Need: Attention Control for Text-to-Image Semantic Alignment with Text Self-Attention Maps
Jeeyung Kim, Erfan Esmaeili Fakhabi, Qiang Qiu
Co-op: Correspondence-based Novel Object Pose Estimation
Sungphill Moon, Hyeontae Son, Dongcheol Hur et al.
Effective Cloud Removal for Remote Sensing Images by an Improved Mean-Reverting Denoising Model with Elucidated Design Space
Yi Liu, Wengen Li, Jihong Guan et al.
StageDesigner: Artistic Stage Generation for Scenography via Theater Scripts
Zhaoxing Gan, Mengtian Li, Ruhua Chen et al.
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
Yuhao Dong, Zuyan Liu, Hai-Long Sun et al.
Scaling Mesh Generation via Compressive Tokenization
Haohan Weng, Zibo Zhao, Biwen Lei et al.
3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes
Jan Held, Renaud Vandeghen, Abdullah J Hamdi et al.
Disco4D: Disentangled 4D Human Generation and Animation from a Single Image
Hui En Pang, Shuai Liu, Zhongang Cai et al.
ShowMak3r: Compositional TV Show Reconstruction
Sangmin Kim, Seunguk Do, Jaesik Park
Vision-Language Model IP Protection via Prompt-based Learning
Lianyu Wang, Meng Wang, Huazhu Fu et al.
Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera
Zhengdi Yu, Stefanos Zafeiriou, Tolga Birdal
Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration
Junseong Kim, GeonU Kim, Kim Yu-Ji et al.
GauCho: Gaussian Distributions with Cholesky Decomposition for Oriented Object Detection
Jeffri Erwin Murrugarra Llerena, José Henrique Marques, Claudio Jung
Seeing the Abstract: Translating the Abstract Language for Vision Language Models
Davide Talon, Federico Girella, Ziyue Liu et al.
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts
Adnen Abdessaied, Anna Rohrbach, Marcus Rohrbach et al.
UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image
Xingyu Liu, Gu Wang, Ruida Zhang et al.
Conformal Prediction for Zero-Shot Models
Julio Silva-Rodríguez, Ismail Ben Ayed, Jose Dolz
PhysAnimator: Physics-Guided Generative Cartoon Animation
Tianyi Xie, Yiwei Zhao, Ying Jiang et al.
Pathways on the Image Manifold: Image Editing via Video Generation
Noam Rotstein, Gal Yona, Daniel Silver et al.
Silent Branding Attack: Trigger-free Data Poisoning Attack on Text-to-Image Diffusion Models
Sangwon Jang, June Suk Choi, Jaehyeong Jo et al.
Generative Omnimatte: Learning to Decompose Video into Layers
Yao-Chih Lee, Erika Lu, Sarah Rumbley et al.
SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking
Wenrui Cai, Qingjie Liu, Yunhong Wang
Iterative Predictor-Critic Code Decoding for Real-World Image Dehazing
Jiayi Fu, Siyu Liu, Zikun Liu et al.
LT3SD: Latent Trees for 3D Scene Diffusion
Quan Meng, Lei Li, Matthias Nießner et al.
Towards Enhanced Image Inpainting: Mitigating Unwanted Object Insertion and Preserving Color Consistency
Yikai Wang, Chenjie Cao, Junqiu Yu et al.
GenAssets: Generating in-the-wild 3D Assets in Latent Space
Ze Yang, Jingkang Wang, Haowei Zhang et al.
PerLA: Perceptive 3D Language Assistant
Guofeng Mei, Wei Lin, Luigi Riz et al.
HyperNVD: Accelerating Neural Video Decomposition via Hypernetworks
Maria Pilligua, Danna Xue, Javier Vazquez-Corral
Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding
Yan Wang, Baoxiong Jia, Ziyu Zhu et al.
GLane3D: Detecting Lanes with Graph of 3D Keypoints
Halil İbrahim Öztürk, Muhammet Esat Kalfaoglu, Ozsel Kilinc
Hyperbolic Safety-Aware Vision-Language Models
Tobia Poppi, Tejaswi Kasarla, Pascal Mettes et al.
PanoGS: Gaussian-based Panoptic Segmentation for 3D Open Vocabulary Scene Understanding
Hongjia Zhai, Hai Li, Zhenzhe Li et al.
Community Forensics: Using Thousands of Generators to Train Fake Image Detectors
Jeongsoo Park, Andrew Owens
Can Large Vision-Language Models Correct Semantic Grounding Errors By Themselves?
Yuan-Hong Liao, Rafid Mahmood, Sanja Fidler et al.
Universal Domain Adaptation for Semantic Segmentation
Seun-An Choe, Keon Hee Park, Jinwoo Choi et al.
Simulator HC: Regression-based Online Simulation of Starting Problem-Solution Pairs for Homotopy Continuation in Geometric Vision
Xinyue Zhang, Zijia Dai, Wanting Xu et al.
ArtFormer: Controllable Generation of Diverse 3D Articulated Objects
Jiayi Su, Youhe Feng, Zheng Li et al.
Faster Parameter-Efficient Tuning with Token Redundancy Reduction
Kwonyoung Kim, Jungin Park, Jin Kim et al.
PhD: A ChatGPT-Prompted Visual Hallucination Evaluation Dataset
Jiazhen Liu, Yuhan Fu, Ruobing Xie et al.
A Bias-Free Training Paradigm for More General AI-generated Image Detection
Fabrizio Guillaro, Giada Zingarini, Ben Usman et al.
MERGE: Multi-faceted Hierarchical Graph-based GNN for Gene Expression Prediction from Whole Slide Histopathology Images
Aniruddha Ganguly, Debolina Chatterjee, Wentao Huang et al.
Accurate Differential Operators for Hybrid Neural Fields
Aditya Chetan, Guandao Yang, Zichen Wang et al.
Do Computer Vision Foundation Models Learn the Low-level Characteristics of the Human Visual System?
Yancheng Cai, Fei Yin, Dounia Hammou et al.
DeDe: Detecting Backdoor Samples for SSL Encoders via Decoders
Sizai Hou, Songze Li, Duanyi Yao
FoundHand: Large-Scale Domain-Specific Learning for Controllable Hand Image Generation
Kefan Chen, Chaerin Min, Linguang Zhang et al.
RelationField: Relate Anything in Radiance Fields
Sebastian Koch, Johanna Wald, Mirco Colosi et al.
Multitwine: Multi-Object Compositing with Text and Layout Control
Gemma Canet Tarrés, Zhe Lin, Zhifei Zhang et al.
DepthSplat: Connecting Gaussian Splatting and Depth
Haofei Xu, Songyou Peng, Fangjinhua Wang et al.
Any3DIS: Class-Agnostic 3D Instance Segmentation by 2D Mask Tracking
Phuc Nguyen, Minh Luu, Anh Tran et al.
MultiMorph: On-demand Atlas Construction
Mazdak Abulnaga, Andrew Hoopes, Neel Dey et al.
Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
Soumya Suvra Ghosal, Souradip Chakraborty, Vaibhav Singh et al.
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding
Zining Wang, Tongkun Guan, Pei Fu et al.
Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur Prior
Haitao Wu, Qing Li, Changqing Zhang et al.
Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents
Yunseok Jang, Yeda Song, Sungryull Sohn et al.
Progressive Focused Transformer for Single Image Super-Resolution
Wei Long, Xingyu Zhou, Leheng Zhang et al.
FreeCloth: Free-form Generation Enhances Challenging Clothed Human Modeling
Hang Ye, Xiaoxuan Ma, Hai Ci et al.
From Zero to Detail: Deconstructing Ultra-High-Definition Image Restoration from Progressive Spectral Perspective
Chen Zhao, Zhizhou Chen, Yunzhe Xu et al.
Volumetrically Consistent 3D Gaussian Rasterization
Chinmay Talegaonkar, Yash Belhe, Ravi Ramamoorthi et al.
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
Hongjie Wang, Chih-Yao Ma, Yen-Cheng Liu et al.
EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting
Dong In Lee, Hyeongcheol Park, Jiyoung Seo et al.
Motion Prompting: Controlling Video Generation with Motion Trajectories
Daniel Geng, Charles Herrmann, Junhwa Hur et al.
ProHOC: Probabilistic Hierarchical Out-of-Distribution Classification via Multi-Depth Networks
Erik Wallin, Fredrik Kahl, Lars Hammarstrand
Flash3D: Super-scaling Point Transformers through Joint Hardware-Geometry Locality
Liyan Chen, Gregory P. Meyer, Zaiwei Zhang et al.
Exploration-Driven Generative Interactive Environments
Nedko Savov, Naser Kazemi, Mohammad Mahdi et al.
GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery
Enguang Wang, Zhimao Peng, Zhengyuan Xie et al.
ActiveGAMER: Active GAussian Mapping through Efficient Rendering
Liyan Chen, Huangying Zhan, Kevin Chen et al.
SeaLion: Semantic Part-Aware Latent Point Diffusion Models for 3D Generation
Dekai Zhu, Yan Di, Stefan Gavranovic et al.
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
Wenbo Hu, Xiangjun Gao, Xiaoyu Li et al.
Rethinking Epistemic and Aleatoric Uncertainty for Active Open-Set Annotation: An Energy-Based Approach
Chen-Chen Zong, Sheng-Jun Huang
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models
Sai Kumar Dwivedi, Dimitrije Antić, Shashank Tripathi et al.
Reconstructing Animals and the Wild
Peter Kulits, Michael J. Black, Silvia Zuffi
Controllable Human Image Generation with Personalized Multi-Garments
Yisol Choi, Sangkyung Kwak, Sihyun Yu et al.
PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability
Weijie Zhou, Manli Tao, Chaoyang Zhao et al.
MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors
Fanqi Pu, Yifan Wang, Jiru Deng et al.
Steepest Descent Density Control for Compact 3D Gaussian Splatting
Peihao Wang, Yuehao Wang, Dilin Wang et al.
Interactive Medical Image Analysis with Concept-based Similarity Reasoning
Ta Duc Huy, Sen Kim Tran, Phan Nguyen et al.
StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements
Mingkun Lei, Xue Song, Beier Zhu et al.
Convex Relaxation for Robust Vanishing Point Estimation in Manhattan World
Bangyan Liao, Zhenjun Zhao, Haoang Li et al.
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
Henghui Du, Guangyao Li, Chang Zhou et al.
AutoLUT: LUT-Based Image Super-Resolution with Automatic Sampling and Adaptive Residual Learning
Yuheng Xu, Shijie Yang, Xin Liu et al.
Arbitrary-steps Image Super-resolution via Diffusion Inversion
Zongsheng Yue, Kang Liao, Chen Change Loy
MAGiC-SLAM: Multi-Agent Gaussian Globally Consistent SLAM
Vladimir Yugay, Theo Gevers, Martin R. Oswald
LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation
Vladan Stojnić, Yannis Kalantidis, Jiri Matas et al.
EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion
Haotian Wang, Yuzhe Weng, Yueyan Li et al.
Subnet-Aware Dynamic Supernet Training for Neural Architecture Search
Jeimin Jeon, Youngmin Oh, Junghyup Lee et al.
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
Aniket Rajiv Didolkar, Andrii Zadaianchuk, Rabiul Awal et al.
Detecting Open World Objects via Partial Attribute Assignment
Muli Yang, Gabriel James Goenawan, Huaiyuan Qin et al.
Optical-Flow Guided Prompt Optimization for Coherent Video Generation
Hyelin Nam, Jaemin Kim, Dohun Lee et al.
Let's Verify and Reinforce Image Generation Step by Step
Renrui Zhang, Chengzhuo Tong, Zhizheng Zhao et al.
Blurred LiDAR for Sharper 3D: Robust Handheld 3D Scanning with Diffuse LiDAR and RGB
Nikhil Behari, Aaron Young, Siddharth Somasundaram et al.
LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding
Hongyu Li, Jinyu Chen, Ziyu Wei et al.
Dynamic Neural Surfaces for Elastic 4D Shape Representation and Analysis
Awais Nizamani, Hamid Laga, Guanjin Wang et al.
SGC-Net: Stratified Granular Comparison Network for Open-Vocabulary HOI Detection
Xin Lin, Chong Shi, Zuopeng Yang et al.
FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training
Anjia Cao, Xing Wei, Zhiheng Ma
ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding
Qihang Peng, Henry Zheng, Gao Huang
Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression
Xiaoyi Qu, David Aponte, Colby Banbury et al.
VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
Lei Li, Wei Yuancheng, Zhihui Xie et al.
Feature-Preserving Mesh Decimation for Normal Integration
Moritz Heep, Sven Behnke, Eduard Zell
BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models
Taha Koleilat, Hojat Asgariandehkordi, Hassan Rivaz et al.
Conical Visual Concentration for Efficient Large Vision-Language Models
Long Xing, Qidong Huang, Xiaoyi Dong et al.
Prior-free 3D Object Tracking
Xiuqiang Song, Li Jin, Zhengxian Zhang et al.
OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels
Meng Lou, Yizhou Yu
Cross-modal Information Flow in Multimodal Large Language Models
Zhi Zhang, Srishti Yadav, Fengze Han et al.
Sparse Point Cloud Patches Rendering via Splitting 2D Gaussians
Changfeng Ma, Ran Bi, Jie Guo et al.
Digital Twin Catalog: A Large-Scale Photorealistic 3D Object Digital Twin Dataset
Zhao Dong, Ka Chen, Zhaoyang Lv et al.
RORem: Training a Robust Object Remover with Human-in-the-Loop
Ruibin Li, Tao Yang, Song Guo et al.
FG^2: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching
Zimin Xia, Alex Alahi
MatAnyone: Stable Video Matting with Consistent Memory Propagation
Peiqing Yang, Shangchen Zhou, Jixin Zhao et al.
Hierarchical Compact Clustering Attention (COCA) for Unsupervised Object-Centric Learning
Can Küçüksözen, Yucel Yemez
Pay Attention to the Foreground in Object-Centric Learning
Pinzhuo Tian, Shengjie Yang, Hang Yu et al.
MetaShadow: Object-Centered Shadow Detection, Removal, and Synthesis
Tianyu Wang, Jianming Zhang, Haitian Zheng et al.
Population Normalization for Federated Learning
Zhuoyao Wang, Fan Yi, Peizhu Gong et al.
DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture
Qianlong Xiang, Miao Zhang, Yuzhang Shang et al.
Satellite to GroundScape - Large-scale Consistent Ground View Generation from Satellite Views
Ningli Xu, Rongjun Qin
QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge
Xuan Shen, Weize Ma, Jing Liu et al.
HUNet: Homotopy Unfolding Network for Image Compressive Sensing
Feiyang Shen, Hongping Gan
The Devil is in Low-Level Features for Cross-Domain Few-Shot Segmentation
Yuhan Liu, Yixiong Zou, Yuhua Li et al.
A Physics-Informed Blur Learning Framework for Imaging Systems
Liqun Chen, Yuxuan Li, Jun Dai et al.
Hiding Images in Diffusion Models by Editing Learned Score Functions
Haoyu Chen, Yunqiao Yang, Nan Zhong et al.