Most Cited CVPR "higher-resolution generation" Papers
5,589 papers found • Page 9 of 28
Towards Transformer-Based Aligned Generation with Self-Coherence Guidance
Shulei Wang, Wang Lin, Hai Huang et al.
MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes
Ruijie Lu, Yixin Chen, Junfeng Ni et al.
EventGPT: Event Stream Understanding with Multimodal Large Language Models
Shaoyu Liu, Jianing Li, Guanghui Zhao et al.
RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method
Ming Yan, Yan Zhang, Shuqiang Cai et al.
MP-GUI: Modality Perception with MLLMs for GUI Understanding
Ziwei Wang, Weizhi Chen, Leyang Yang et al.
Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes
Diandian Guo, Deng-Ping Fan, Tongyu Lu et al.
InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment
Yunhong Lu, Qichao Wang, Hengyuan Cao et al.
Neural Super-Resolution for Real-time Rendering with Radiance Demodulation
Jia Li, Ziling Chen, Xiaolong Wu et al.
GenDeg: Diffusion-based Degradation Synthesis for Generalizable All-In-One Image Restoration
Sudarshan Rajagopalan, Nithin Gopalakrishnan Nair, Jay Paranjape et al.
Probability Density Geodesics in Image Diffusion Latent Space
Qingtao Yu, Jaskirat Singh, Zhaoyuan Yang et al.
Language Guided Concept Bottleneck Models for Interpretable Continual Learning
Lu Yu, HaoYu Han, Zhe Tao et al.
O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models
Ashshak Sharifdeen, Muhammad Akhtar Munir, Sanoojan Baliah et al.
REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning
Jihyun Lee, Weipeng Xu, Alexander Richard et al.
Clockwork Diffusion: Efficient Generation With Model-Step Distillation
Amirhossein Habibian, Amir Ghodrati, Noor Fathima et al.
FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation
Dong Zhao, Jinlong Li, Shuang Wang et al.
Motion Diversification Networks
Hee Jae Kim, Eshed Ohn-Bar
Hierarchical Correlation Clustering and Tree Preserving Embedding
Morteza Haghir Chehreghani, Mostafa Haghir Chehreghani
MultiGO: Towards Multi-level Geometry Learning for Monocular 3D Textured Human Reconstruction
Gangjian Zhang, Nanjie Yao, Shunsi Zhang et al.
RNG: Relightable Neural Gaussians
Jiahui Fan, Fujun Luan, Jian Yang et al.
Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
Arun Reddy, Alexander Martin, Eugene Yang et al.
FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication
Eric Slyman, Stefan Lee, Scott Cohen et al.
Post-pre-training for Modality Alignment in Vision-Language Foundation Models
Shin'ya Yamaguchi, Dewei Feng, Sekitoshi Kanai et al.
DiG-IN: Diffusion Guidance for Investigating Networks - Uncovering Classifier Differences, Neuron Visualisations, and Visual Counterfactual Explanations
Maximilian Augustin, Yannic Neuhaus, Matthias Hein
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Miran Heo, Min-Hung Chen, De-An Huang et al.
Improving Physics-Augmented Continuum Neural Radiance Field-Based Geometry-Agnostic System Identification with Lagrangian Particle Optimization
Takuhiro Kaneko
UnSAMFlow: Unsupervised Optical Flow Guided by Segment Anything Model
Shuai Yuan, Lei Luo, Zhuo Hui et al.
TurboSL: Dense, Accurate and Fast 3D by Neural Inverse Structured Light
Parsa Mirdehghan, Maxx Wu, Wenzheng Chen et al.
DIFFER: Disentangling Identity Features via Semantic Cues for Clothes-Changing Person Re-ID
Xin Liang, Yogesh S. Rawat
Seurat: From Moving Points to Depth
Seokju Cho, Gabriel Huang, Seungryong Kim et al.
HumanRig: Learning Automatic Rigging for Humanoid Character in a Large Scale Dataset
Zedong Chu, Feng Xiong, Meiduo Liu et al.
Circumventing Shortcuts in Audio-visual Deepfake Detection Datasets with Unsupervised Learning
Stefan Smeu, Dragos-Alexandru Boldisor, Dan Oneata et al.
Boosting Flow-based Generative Super-Resolution Models via Learned Prior
Li-Yuan Tsao, Yi-Chen Lo, Chia-Che Chang et al.
GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency
Dongyue Lu, Lingdong Kong, Tianxin Huang et al.
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences
Zhikai Li, Xuewen Liu, Dongrong Joe Fu et al.
ScribbleLight: Single Image Indoor Relighting with Scribbles
Jun Myeong Choi, Annie N. Wang, Pieter Peers et al.
Active Object Detection with Knowledge Aggregation and Distillation from Large Models
Dejie Yang, Yang Liu
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
Weiming Ren, Huan Yang, Jie Min et al.
PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes
Bin Tan, Rui Yu, Yujun Shen et al.
LesionLocator: Zero-Shot Universal Tumor Segmentation and Tracking in 3D Whole-Body Imaging
Maximilian Rokuss, Yannick Kirchhoff, Seval Akbal et al.
SketchVideo: Sketch-based Video Generation and Editing
Feng-Lin Liu, Hongbo Fu, Xintao Wang et al.
Mamba4D: Efficient 4D Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models
Jiuming Liu, Jinru Han, Lihao Liu et al.
Memory-Scalable and Simplified Functional Map Learning
Robin Magnet, Maks Ovsjanikov
Efficient Fine-Tuning and Concept Suppression for Pruned Diffusion Models
Reza Shirkavand, Peiran Yu, Shangqian Gao et al.
Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing
Hanhui Wang, Yihua Zhang, Ruizheng Bai et al.
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs
Xiaoqin Wang, Xusen Ma, Xianxu Hou et al.
Automatic Controllable Colorization via Imagination
Xiaoyan Cong, Yue Wu, Qifeng Chen et al.
Image Quality Assessment: Investigating Causal Perceptual Effects with Abductive Counterfactual Inference
Wenhao Shen, Mingliang Zhou, Yu Chen et al.
MedBN: Robust Test-Time Adaptation against Malicious Test Samples
Hyejin Park, Jeongyeon Hwang, Sunung Mun et al.
A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition
Yusheng Dai, Hang Chen, Jun Du et al.
RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models
Haoran Hao, Jiaming Han, Changsheng Li et al.
ACE: Anti-Editing Concept Erasure in Text-to-Image Models
Zihao Wang, Yuxiang Wei, Fan Li et al.
AMO Sampler: Enhancing Text Rendering with Overshooting
Xixi Hu, Keyang Xu, Bo Liu et al.
Instruction-based Image Manipulation by Watching How Things Move
Mingdeng Cao, Xuaner Zhang, Yinqiang Zheng et al.
FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding
Thanh-Dat Truong, Utsav Prabhu, Bhiksha Raj et al.
Edge-SD-SR: Low Latency and Parameter Efficient On-device Super-Resolution with Stable Diffusion via Bidirectional Conditioning
Isma Hadji, Mehdi Noroozi, Victor Escorcia et al.
BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting
Yiren Lu, Yunlai Zhou, Disheng Liu et al.
3D-GSW: 3D Gaussian Splatting for Robust Watermarking
Youngdong Jang, Hyunje Park, Feng Yang et al.
DORNet: A Degradation Oriented and Regularized Network for Blind Depth Super-Resolution
Zhengxue Wang, Zhiqiang Yan, Jinshan Pan et al.
SapiensID: Foundation for Human Recognition
Minchul Kim, Dingqiang Ye, Yiyang Su et al.
Entity-NeRF: Detecting and Removing Moving Entities in Urban Scenes
Takashi Otonari, Satoshi Ikehata, Kiyoharu Aizawa
Diversity-aware Channel Pruning for StyleGAN Compression
Jiwoo Chung, Sangeek Hyun, Sang-Heon Shim et al.
Focusing on Tracks for Online Multi-Object Tracking
Kyujin Shim, Kangwook Ko, YuJin Yang et al.
Synthetic Prior for Few-Shot Drivable Head Avatar Inversion
Wojciech Zielonka, Stephan J. Garbin, Alexandros Lattas et al.
DiffFNO: Diffusion Fourier Neural Operator
Xiaoyi Liu, Hao Tang
Hypergraph Vision Transformers: Images are More than Nodes, More than Edges
Joshua Fixelle
FFF: Fixing Flawed Foundations in Contrastive Pre-Training Results in Very Strong Vision-Language Models
Adrian Bulat, Yassine Ouali, Georgios Tzimiropoulos
Noise Calibration and Spatial-Frequency Interactive Network for STEM Image Enhancement
Hesong Li, Ziqi Wu, Ruiwen Shao et al.
UltraFusion: Ultra High Dynamic Imaging using Exposure Fusion
Zixuan Chen, Yujin Wang, Xin Cai et al.
Unveiling Differences in Generative Models: A Scalable Differential Clustering Approach
Jingwei Zhang, Mohammad Jalali, Cheuk Ting Li et al.
Learning Discriminative Dynamics with Label Corruption for Noisy Label Detection
Suyeon Kim, Dongha Lee, SeongKu Kang et al.
Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition
Chengxiang Huang, Yake Wei, Zequn Yang et al.
Sparse2DGS: Geometry-Prioritized Gaussian Splatting for Surface Reconstruction from Sparse Views
Jiang Wu, Rui Li, Yu Zhu et al.
Joint Out-of-Distribution Filtering and Data Discovery Active Learning
Sebastian Schmidt, Leonard Schenk, Leo Schwinn et al.
Generative Zero-Shot Composed Image Retrieval
Lan Wang, Wei Ao, Vishnu Naresh Boddeti et al.
Task-Aware Encoder Control for Deep Video Compression
Xingtong Ge, Jixiang Luo, Xinjie Zhang et al.
RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training
Raktim Gautam Goswami, Prashanth Krishnamurthy, Yann LeCun et al.
PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation
HsiaoYuan Hsu, Yuxin Peng
Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model
Yingying Fan, Quanwei Yang, Kaisiyuan Wang et al.
FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation
Sen Wang, Le Wang, Sanping Zhou et al.
DTGBrepGen: A Novel B-rep Generative Model through Decoupling Topology and Geometry
Jing Li, Yihang Fu, Falai Chen
Mask^2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation
Tianhao Qi, Jianlong Yuan, Wanquan Feng et al.
DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery
Jiadong Tang, Yu Gao, Dianyi Yang et al.
VidLA: Video-Language Alignment at Scale
Mamshad Nayeem Rizve, Fan Fei, Jayakrishnan Unnikrishnan et al.
ROD-MLLM: Towards More Reliable Object Detection in Multimodal Large Language Models
Heng Yin, Yuqiang Ren, Ke Yan et al.
PICO: Reconstructing 3D People In Contact with Objects
Alpár Cseke, Shashank Tripathi, Sai Kumar Dwivedi et al.
Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation
Yongkang Li, Tianheng Cheng, Bin Feng et al.
Tartan IMU: A Light Foundation Model for Inertial Positioning in Robotics
Shibo Zhao, Sifan Zhou, Raphael Blanchard et al.
BHViT: Binarized Hybrid Vision Transformer
Tian Gao, Yu Zhang, Zhiyuan Zhang et al.
Finer-CAM: Spotting the Difference Reveals Finer Details for Visual Explanation
Ziheng Zhang, Jianyang Gu, Arpita Chowdhury et al.
MammAlps: A Multi-view Video Behavior Monitoring Dataset of Wild Mammals in the Swiss Alps
Valentin Gabeff, Haozhe Qi, Brendan Flaherty et al.
ProKeR: A Kernel Perspective on Few-Shot Adaptation of Large Vision-Language Models
Yassir Bendou, Amine Ouasfi, Vincent Gripon et al.
The Change You Want To Detect: Semantic Change Detection In Earth Observation With Hybrid Data Generation
Yanis Benidir, Nicolas Gonthier, Clement Mallet
PELA: Learning Parameter-Efficient Models with Low-Rank Approximation
Yangyang Guo, Guangzhi Wang, Mohan Kankanhalli
Do Visual Imaginations Improve Vision-and-Language Navigation Agents?
Akhil Perincherry, Jacob Krantz, Stefan Lee
Boost Your Human Image Generation Model via Direct Preference Optimization
Sanghyeon Na, Yonggyu Kim, Hyunjoon Lee
Deep Imbalanced Regression via Hierarchical Classification Adjustment
Haipeng Xiong, Angela Yao
Exploring CLIP's Dense Knowledge for Weakly Supervised Semantic Segmentation
Zhiwei Yang, Yucong Meng, Kexue Fu et al.
vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation
Bastian Wittmann, Yannick Wattenberg, Tamaz Amiranashvili et al.
Rashomon Sets for Prototypical-Part Networks: Editing Interpretable Models in Real-Time
Jon Donnelly, Zhicheng Guo, Alina Jade Barnett et al.
Self-Cross Diffusion Guidance for Text-to-Image Synthesis of Similar Subjects
Weimin Qiu, Jieke Wang, Meng Tang
AFL: A Single-Round Analytic Approach for Federated Learning with Pre-trained Models
Run He, Kai Tong, Di Fang et al.
Task-Adaptive Saliency Guidance for Exemplar-free Class Incremental Learning
Xialei Liu, Jiang-Tian Zhai, Andrew Bagdanov et al.
Adversarially Robust Few-shot Learning via Parameter Co-distillation of Similarity and Class Concept Learners
Junhao Dong, Piotr Koniusz, Junxi Chen et al.
EdgeTAM: On-Device Track Anything Model
Chong Zhou, Chenchen Zhu, Yunyang Xiong et al.
General Point Model Pretraining with Autoencoding and Autoregressive
Zhe Li, Zhangyang Gao, Cheng Tan et al.
Clustering for Protein Representation Learning
Ruijie Quan, Wenguan Wang, Fan Ma et al.
GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement
Linfang Zheng, Tze Ho Elden Tse, Chen Wang et al.
RAD: Region-Aware Diffusion Models for Image Inpainting
Sora Kim, Sungho Suh, Minsik Lee
Cross-Dimension Affinity Distillation for 3D EM Neuron Segmentation
Xiaoyu Liu, Miaomiao Cai, Yinda Chen et al.
Show and Segment: Universal Medical Image Segmentation via In-Context Learning
Yunhe Gao, Di Liu, Zhuowei Li et al.
Meta-Point Learning and Refining for Category-Agnostic Pose Estimation
Junjie Chen, Jiebin Yan, Yuming Fang et al.
Diffusion-FOF: Single-View Clothed Human Reconstruction via Diffusion-Based Fourier Occupancy Field
Yuanzhen Li, Fei Luo, Chunxia Xiao
DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation
Haonan Lin
Event-based Structure-from-Orbit
Ethan Elms, Yasir Latif, Tae Ha Park et al.
SoMA: Singular Value Decomposed Minor Components Adaptation for Domain Generalizable Representation Learning
Seokju Yun, Seunghye Chae, Dongheon Lee et al.
Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification
Yang Qin, Chao Chen, Zhihang Fu et al.
Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images
Kazi Sajeed Mehrab, M. Maruf, Arka Daw et al.
StreamingFlow: Streaming Occupancy Forecasting with Asynchronous Multi-modal Data Streams via Neural Ordinary Differential Equation
Yining Shi, Kun Jiang, Ke Wang et al.
Lessons and Insights from a Unifying Study of Parameter-Efficient Fine-Tuning (PEFT) in Visual Recognition
Zheda Mai, Ping Zhang, Cheng-Hao Tu et al.
AirPlanes: Accurate Plane Estimation via 3D-Consistent Embeddings
Jamie Watson, Filippo Aleotti, Mohamed Sayed et al.
MonoSplat: Generalizable 3D Gaussian Splatting from Monocular Depth Foundation Models
Yifan Liu, Keyu Fan, Weihao Yu et al.
Cross Initialization for Face Personalization of Text-to-Image Models
Lianyu Pang, Jian Yin, Haoran Xie et al.
DeCoTR: Enhancing Depth Completion with 2D and 3D Attentions
Yunxiao Shi, Manish Singh, Hong Cai et al.
Multi-Granularity Class Prototype Topology Distillation for Class-Incremental Source-Free Unsupervised Domain Adaptation
Peihua Deng, Jiehua Zhang, Xichun Sheng et al.
AniMer: Animal Pose and Shape Estimation Using Family Aware Transformer
Jin Lyu, Tianyi Zhu, Yi Gu et al.
Time-Efficient Light-Field Acquisition Using Coded Aperture and Events
Shuji Habuchi, Keita Takahashi, Chihiro Tsutake et al.
Multirate Neural Image Compression with Adaptive Lattice Vector Quantization
Hao Xu, Xiaolin Wu, Xi Zhang
Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration
Zilong Huang, Jun He, Junyan Ye et al.
Relational Matching for Weakly Semi-Supervised Oriented Object Detection
Wenhao Wu, Hau San Wong, Si Wu et al.
In-N-Out: Faithful 3D GAN Inversion with Volumetric Decomposition for Face Editing
Yiran Xu, Zhixin Shu, Cameron Smith et al.
g3D-LF: Generalizable 3D-Language Feature Fields for Embodied Tasks
Zihan Wang, Gim Hee Lee
DiC: Rethinking Conv3x3 Designs in Diffusion Models
Yuchuan Tian, Jing Han, Chengcheng Wang et al.
LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty
Christoforos N. Spartalis, Theodoros Semertzidis, Efstratios Gavves et al.
T2ICount: Enhancing Cross-modal Understanding for Zero-Shot Counting
Yifei Qian, Zhongliang Guo, Bowen Deng et al.
Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves
Shihan Wu, Ji Zhang, Pengpeng Zeng et al.
SfM-Free 3D Gaussian Splatting via Hierarchical Training
Bo Ji, Angela Yao
An N-Point Linear Solver for Line and Motion Estimation with Event Cameras
Ling Gao, Daniel Gehrig, Hang Su et al.
AVF-MAE++: Scaling Affective Video Facial Masked Autoencoders via Efficient Audio-Visual Self-Supervised Learning
Xuecheng Wu, Heli Sun, Yifan Wang et al.
Gaussian Splatting for Efficient Satellite Image Photogrammetry
Luca Savant Aira, Gabriele Facciolo, Thibaud Ehret
DeNVeR: Deformable Neural Vessel Representations for Unsupervised Video Vessel Segmentation
Chun-Hung Wu, Shih-Hong Chen, Chih Yao Hu et al.
ProbPose: A Probabilistic Approach to 2D Human Pose Estimation
Miroslav Purkrábek, Jiri Matas
Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization
Zefeng Zhang, Hengzhu Tang, Jiawei Sheng et al.
Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment
Ziteng Cui, Xuangeng Chu, Tatsuya Harada
Monocular Identity-Conditioned Facial Reflectance Reconstruction
Xingyu Ren, Jiankang Deng, Yuhao Cheng et al.
Language Model Guided Interpretable Video Action Reasoning
Ning Wang, Guangming Zhu, Hongsheng Li et al.
Do Computer Vision Foundation Models Learn the Low-level Characteristics of the Human Visual System?
Yancheng Cai, Fei Yin, Dounia Hammou et al.
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation
Yuying Ge, Yizhuo Li, Yixiao Ge et al.
Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation
Zelin Peng, Zhengqin Xu, Zhilin Zeng et al.
BiTT: Bi-directional Texture Reconstruction of Interacting Two Hands from a Single Image
Minje Kim, Tae-Kyun Kim
SuperPrimitive: Scene Reconstruction at a Primitive Level
Kirill Mazur, Gwangbin Bae, Andrew J. Davison
Cross-modal Causal Relation Alignment for Video Question Grounding
Weixing Chen, Yang Liu, Binglin Chen et al.
Hyperbolic Category Discovery
Yuanpei Liu, Zhenqi He, Kai Han
Steganographic Passport: An Owner and User Verifiable Credential for Deep Model IP Protection Without Retraining
Qi Cui, Ruohan Meng, Chaohui Xu et al.
Dynamic Updates for Language Adaptation in Visual-Language Tracking
Xiaohai Li, Bineng Zhong, Qihua Liang et al.
LookCloser: Frequency-aware Radiance Field for Tiny-Detail Scene
Xiaoyu Zhang, Weihong Pan, Chong Bao et al.
Facial Identity Anonymization via Intrinsic and Extrinsic Attention Distraction
Zhenzhong Kuang, Xiaochen Yang, Yingjie Shen et al.
Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory
Wenliang Zhong, Haoyu Tang, Qinghai Zheng et al.
Panorama Generation From NFoV Image Done Right
Dian Zheng, Cheng Zhang, Xiao-Ming Wu et al.
Enhancing 3D Gaze Estimation in the Wild using Weak Supervision with Gaze Following Labels
Pierre Vuillecard, Jean-Marc Odobez
Artist-Friendly Relightable and Animatable Neural Heads
Yingyan Xu, Prashanth Chandran, Sebastian Weiss et al.
Detail-Preserving Latent Diffusion for Stable Shadow Removal
Jiamin Xu, Yuxin Zheng, Zelong Li et al.
Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image Generative Models
Takami Sato, Justin Yue, Nanze Chen et al.
L-MAGIC: Language Model Assisted Generation of Images with Coherence
Zhipeng Cai, Matthias Mueller, Reiner Birkl et al.
Towards Generalizable Scene Change Detection
Jae-Woo Kim, Ue-Hwan Kim
DiffLocks: Generating 3D Hair from a Single Image using Diffusion Models
Radu Alexandru Rosu, Keyu Wu, Yao Feng et al.
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization
Zitang Zhou, Ke Mei, Yu Lu et al.
Progress-Aware Video Frame Captioning
Zihui Xue, Joungbin An, Xitong Yang et al.
DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting
Seungjun Lee, Gim Hee Lee
Instance-based Max-margin for Practical Few-shot Recognition
Minghao Fu, Ke Zhu
InteractAnything: Zero-shot Human Object Interaction Synthesis via LLM Feedback and Object Affordance Parsing
Jinlu Zhang, Yixin Chen, Zan Wang et al.
CL-LoRA: Continual Low-Rank Adaptation for Rehearsal-Free Class-Incremental Learning
Jiangpeng He, Zhihao Duan, Fengqing Zhu
Gradient-Guided Annealing for Domain Generalization
Aristotelis Ballas, Christos Diou
FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement
Ian Huang, Yanan Bao, Karen Truong et al.
Single View Refractive Index Tomography with Neural Fields
Brandon Zhao, Aviad Levis, Liam Connor et al.
Benchmarking Segmentation Models with Mask-Preserved Attribute Editing
Zijin Yin, Kongming Liang, Bing Li et al.
Generating Illustrated Instructions
Sachit Menon, Ishan Misra, Rohit Girdhar
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization
Hongrui Jia, Chaoya Jiang, Haiyang Xu et al.
Stable Neighbor Denoising for Source-free Domain Adaptive Segmentation
Dong Zhao, Shuang Wang, Qi Zang et al.
Contextual AD Narration with Interleaved Multimodal Sequence
Hanlin Wang, Zhan Tong, Kecheng Zheng et al.
Adaptive Part Learning for Fine-Grained Generalized Category Discovery: A Plug-and-Play Enhancement
Qiyuan Dai, Hanzhuo Huang, Yu Wu et al.
StdGEN: Semantic-Decomposed 3D Character Generation from Single Images
Yuze He, Yanning Zhou, Wang Zhao et al.
SemGeoMo: Dynamic Contextual Human Motion Generation with Semantic and Geometric Guidance
Peishan Cong, Ziyi Wang, Yuexin Ma et al.
FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion
Haosen Yang, Adrian Bulat, Isma Hadji et al.
G3DR: Generative 3D Reconstruction in ImageNet
Pradyumna Reddy, Ismail Elezi, Jiankang Deng
BrainWash: A Poisoning Attack to Forget in Continual Learning
Ali Abbasi, Parsa Nooralinejad, Hamed Pirsiavash et al.
A Generative Approach for Wikipedia-Scale Visual Entity Recognition
Mathilde Caron, Ahmet Iscen, Alireza Fathi et al.
CLIPtone: Unsupervised Learning for Text-based Image Tone Adjustment
Hyeongmin Lee, Kyoungkook Kang, Jungseul Ok et al.
TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation
Yabiao Wang, Shuo Wang, Jiangning Zhang et al.
Inversion Circle Interpolation: Diffusion-based Image Augmentation for Data-scarce Classification
Yanghao Wang, Long Chen
GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks
Haoqiang Kang, Enna Sachdeva, Piyush Gupta et al.
InterAct: Advancing Large-Scale Versatile 3D Human-Object Interaction Generation
Sirui Xu, Dongting Li, Yucheng Zhang et al.
A Simple yet Effective Layout Token in Large Language Models for Document Understanding
Zhaoqing Zhu, Chuwei Luo, Zirui Shao et al.
STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding
Aaryan Garg, Akash Kumar, Yogesh S. Rawat
Towards Backward-Compatible Continual Learning of Image Compression
Zhihao Duan, Ming Lu, Justin Yang et al.
PBR-NeRF: Inverse Rendering with Physics-Based Neural Fields
Sean Wu, Shamik Basu, Tim Broedermann et al.
ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer
Jiayi Gao, Zijin Yin, Changcheng Hua et al.
Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model
Shengjun Zhang, Jinzhao Li, Xin Fei et al.
When Visual Grounding Meets Gigapixel-level Large-scale Scenes: Benchmark and Approach
Tao Ma, Bing Bai, Haozhe Lin et al.
Robust Multimodal Survival Prediction with Conditional Latent Differentiation Variational AutoEncoder
Junjie Zhou, Jiao Tang, Yingli Zuo et al.
Object-Shot Enhanced Grounding Network for Egocentric Video
Yisen Feng, Haoyu Zhang, Meng Liu et al.