Most Cited CVPR "reliability" Papers
BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics
Wenqian Zhang, Molin Huang, Yuxuan Zhou et al.
DiC: Rethinking Conv3x3 Designs in Diffusion Models
Yuchuan Tian, Jing Han, Chengcheng Wang et al.
MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning
Matteo Farina, Massimiliano Mancini, Elia Cunegatti et al.
Improving Out-of-Distribution Generalization in Graphs via Hierarchical Semantic Environments
Yinhua Piao, Sangseon Lee, Yijingxiu Lu et al.
Tartan IMU: A Light Foundation Model for Inertial Positioning in Robotics
Shibo Zhao, Sifan Zhou, Raphael Blanchard et al.
Unleashing In-context Learning of Autoregressive Models for Few-shot Image Manipulation
Bolin Lai, Felix Juefei-Xu, Miao Liu et al.
MedBN: Robust Test-Time Adaptation against Malicious Test Samples
Hyejin Park, Jeongyeon Hwang, Sunung Mun et al.
WaveMo: Learning Wavefront Modulations to See Through Scattering
Mingyang Xie, Haiyun Guo, Brandon Y. Feng et al.
CacheQuant: Comprehensively Accelerated Diffusion Models
Xuewen Liu, Zhikai Li, Qingyi Gu
Self-Cross Diffusion Guidance for Text-to-Image Synthesis of Similar Subjects
Weimin Qiu, Jieke Wang, Meng Tang
An N-Point Linear Solver for Line and Motion Estimation with Event Cameras
Ling Gao, Daniel Gehrig, Hang Su et al.
Cross-Dimension Affinity Distillation for 3D EM Neuron Segmentation
Xiaoyu Liu, Miaomiao Cai, Yinda Chen et al.
Activity-Biometrics: Person Identification from Daily Activities
Shehreen Azad, Yogesh S. Rawat
Domain-Specific Block Selection and Paired-View Pseudo-Labeling for Online Test-Time Adaptation
Yeonguk Yu, Sungho Shin, Seunghyeok Back et al.
HumMUSS: Human Motion Understanding using State Space Models
Arnab Mondal, Stefano Alletto, Denis Tome
AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models
Xinghui Li, Qichao Sun, Pengze Zhang et al.
Selectively Informative Description can Reduce Undesired Embedding Entanglements in Text-to-Image Personalization
Jimyeong Kim, Jungwon Park, Wonjong Rhee
EventFly: Event Camera Perception from Ground to the Sky
Lingdong Kong, Dongyue Lu, Xiang Xu et al.
Deciphering ‘What’ and ‘Where’ Visual Pathways from Spectral Clustering of Layer-Distributed Neural Representations
Xiao Zhang, David Yunis, Michael Maire
LLaMA-Excitor: General Instruction Tuning via Indirect Feature Interaction
Bo Zou, Chao Yang, Yu Qiao et al.
MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation
Jinnan Chen, Lingting Zhu, Zeyu Hu et al.
GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation
Ning Gao, Yilun Chen, Shuai Yang et al.
ZoomLDM: Latent Diffusion Model for Multi-scale Image Generation
Srikar Yellapragada, Alexandros Graikos, Kostas Triaridis et al.
DTGBrepGen: A Novel B-rep Generative Model through Decoupling Topology and Geometry
Jing Li, Yihang Fu, Falai Chen
General Point Model Pretraining with Autoencoding and Autoregressive
Zhe Li, Zhangyang Gao, Cheng Tan et al.
Make-It-Vivid: Dressing Your Animatable Biped Cartoon Characters from Text
Junshu Tang, Yanhong Zeng, Ke Fan et al.
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
Jinlong Li, Cristiano Saltori, Fabio Poiesi et al.
PEACE: Empowering Geologic Map Holistic Understanding with MLLMs
Yangyu Huang, Tianyi Gao, Haoran Xu et al.
Masked Spatial Propagation Network for Sparsity-Adaptive Depth Refinement
Jinyoung Jun, Jae-Han Lee, Chang-Su Kim
VA3: Virtually Assured Amplification Attack on Probabilistic Copyright Protection for Text-to-Image Generative Models
Xiang Li, Qianli Shen, Kenji Kawaguchi
StreamingFlow: Streaming Occupancy Forecasting with Asynchronous Multi-modal Data Streams via Neural Ordinary Differential Equation
Yining Shi, Kun Jiang, Ke Wang et al.
Stable Neighbor Denoising for Source-free Domain Adaptive Segmentation
Dong Zhao, Shuang Wang, Qi Zang et al.
Cross Initialization for Face Personalization of Text-to-Image Models
Lianyu Pang, Jian Yin, Haoran Xie et al.
3D LiDAR Mapping in Dynamic Environments using a 4D Implicit Neural Representation
Xingguang Zhong, Yue Pan, Cyrill Stachniss et al.
Task-Aware Encoder Control for Deep Video Compression
Xingtong Ge, Jixiang Luo, Xinjie Zhang et al.
Physics-Aware Hand-Object Interaction Denoising
Haowen Luo, Yunze Liu, Li Yi
MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval
Bowen Zhang, Xiaojie Jin, Weibo Gong et al.
AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos
Felix Wimbauer, Weirong Chen, Dominik Muhle et al.
Spectral and Polarization Vision: Spectro-polarimetric Real-world Dataset
Yujin Jeon, Eunsue Choi, Youngchan Kim et al.
RigGS: Rigging of 3D Gaussians for Modeling Articulated Objects in Videos
Yuxin Yao, Zhi Deng, Junhui Hou
Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs
Zeyi Huang, Yuyang Ji, Xiaofang Wang et al.
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization
Hongrui Jia, Chaoya Jiang, Haiyang Xu et al.
Gradient-Guided Annealing for Domain Generalization
Aristotelis Ballas, Christos Diou
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization
Zitang Zhou, Ke Mei, Yu Lu et al.
MI-DETR: An Object Detection Model with Multi-time Inquiries Mechanism
Zhixiong Nan, Xianghong Li, Tao Xiang et al.
LookCloser: Frequency-aware Radiance Field for Tiny-Detail Scene
Xiaoyu Zhang, Weihong Pan, Chong Bao et al.
AVF-MAE++: Scaling Affective Video Facial Masked Autoencoders via Efficient Audio-Visual Self-Supervised Learning
Xuecheng Wu, Heli Sun, Yifan Wang et al.
Detours for Navigating Instructional Videos
Kumar Ashutosh, Zihui Xue, Tushar Nagarajan et al.
Reference-Based 3D-Aware Image Editing with Triplanes
Bahri Batuhan Bilecen, Yiğit Yalın, Ning Yu et al.
Coeff-Tuning: A Graph Filter Subspace View for Tuning Attention-Based Large Models
Zichen Miao, Wei Chen, Qiang Qiu
UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image
Xingyu Liu, Gu Wang, Ruida Zhang et al.
Reconstructing Humans with a Biomechanically Accurate Skeleton
Yan Xia, Xiaowei Zhou, Etienne Vouga et al.
CLIPtone: Unsupervised Learning for Text-based Image Tone Adjustment
Hyeongmin Lee, Kyoungkook Kang, Jungseul Ok et al.
FedAWA: Adaptive Optimization of Aggregation Weights in Federated Learning Using Client Vectors
Changlong Shi, He Zhao, Bingjie Zhang et al.
A Generative Approach for Wikipedia-Scale Visual Entity Recognition
Mathilde Caron, Ahmet Iscen, Alireza Fathi et al.
MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval
Reno Kriz, Kate Sanders, David Etter et al.
Task-Conditioned Adaptation of Visual Features in Multi-Task Policy Learning
Pierre Marza, Laetitia Matignon, Olivier Simonin et al.
DiffLocks: Generating 3D Hair from a Single Image using Diffusion Models
Radu Alexandru Rosu, Keyu Wu, Yao Feng et al.
Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models
Yan Xie, Zequn Zeng, Hao Zhang et al.
SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model
Shuhan Tan, John Wheatley Lambert, Hong Jeon et al.
GOAL: Global-local Object Alignment Learning
Hyungyu Choi, Young Kyun Jang, Chanho Eom
L-MAGIC: Language Model Assisted Generation of Images with Coherence
Zhipeng Cai, Matthias Mueller, Reiner Birkl et al.
GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill
Jieming Cui, Tengyu Liu, Ziyu Meng et al.
Boosting Order-Preserving and Transferability for Neural Architecture Search: a Joint Architecture Refined Search and Fine-tuning Approach
Beichen Zhang, Xiaoxing Wang, Xiaohan Qin et al.
VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models
Chi-Pin Huang, Yen-Siang Wu, Hung-Kai Chung et al.
Coherent Temporal Synthesis for Incremental Action Segmentation
Guodong Ding, Hans Golong, Angela Yao
EvEnhancer: Empowering Effectiveness, Efficiency and Generalizability for Continuous Space-Time Video Super-Resolution with Events
Shuoyan Wei, Feng Li, Shengeng Tang et al.
MotiF: Making Text Count in Image Animation with Motion Focal Loss
Shijie Wang, Samaneh Azadi, Rohit Girdhar et al.
TurboFill: Adapting Few-step Text-to-image Model for Fast Image Inpainting
Liangbin Xie, Daniil Pakhomov, Zhonghao Wang et al.
ChatHuman: Chatting about 3D Humans with Tools
Jing Lin, Yao Feng, Weiyang Liu et al.
Intriguing Properties of Diffusion Models: An Empirical Study of the Natural Attack Capability in Text-to-Image Generative Models
Takami Sato, Justin Yue, Nanze Chen et al.
The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation
Bingjie Gao, Xinyu Gao, Xiaoxue Wu et al.
Neural Lineage
Runpeng Yu, Xinchao Wang
PICD: Versatile Perceptual Image Compression with Diffusion Rendering
Tongda Xu, Jiahao Li, Bin Li et al.
Diffeomorphic Template Registration for Atmospheric Turbulence Mitigation
Dong Lao, Congli Wang, Alex Wong et al.
Do Computer Vision Foundation Models Learn the Low-level Characteristics of the Human Visual System?
Yancheng Cai, Fei Yin, Dounia Hammou et al.
BiTT: Bi-directional Texture Reconstruction of Interacting Two Hands from a Single Image
Minje Kim, Tae-Kyun Kim
LIRM: Large Inverse Rendering Model for Progressive Reconstruction of Shape, Materials and View-dependent Radiance Fields
Zhengqin Li, Dilin Wang, Ka Chen et al.
Language Model Guided Interpretable Video Action Reasoning
Ning Wang, Guangming Zhu, Hongsheng Li et al.
M3amba: Memory Mamba is All You Need for Whole Slide Image Classification
Tingting Zheng, Kui Jiang, Yi Xiao et al.
AETTA: Label-Free Accuracy Estimation for Test-Time Adaptation
Taeckyung Lee, Sorn Chottananurak, Taesik Gong et al.
RelationField: Relate Anything in Radiance Fields
Sebastian Koch, Johanna Wald, Mirco Colosi et al.
Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics
Lee Chae-Yeon, Oh Hyun-Bin, Han EunGi et al.
Question-Aware Gaussian Experts for Audio-Visual Question Answering
Hongyeob Kim, Inyoung Jung, Dayoon Suh et al.
GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation
Weihang Li, Hongli Xu, Junwen Huang et al.
Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models
Chen Chen, Daochang Liu, Mubarak Shah et al.
Adaptive Part Learning for Fine-Grained Generalized Category Discovery: A Plug-and-Play Enhancement
Qiyuan Dai, Hanzhuo Huang, Yu Wu et al.
Multi-View Attentive Contextualization for Multi-View 3D Object Detection
Xianpeng Liu, Ce Zheng, Ming Qian et al.
Diffusion Reflectance Map: Single-Image Stochastic Inverse Rendering of Illumination and Reflectance
Yuto Enyo, Ko Nishino
StdGEN: Semantic-Decomposed 3D Character Generation from Single Images
Yuze He, Yanning Zhou, Wang Zhao et al.
Artist-Friendly Relightable and Animatable Neural Heads
Yingyan Xu, Prashanth Chandran, Sebastian Weiss et al.
Image Quality Assessment: From Human to Machine Preference
Chunyi Li, Yuan Tian, Xiaoyue Ling et al.
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
Lirui Zhao, Yue Yang, Kaipeng Zhang et al.
Classifier-Free Guidance Inside the Attraction Basin May Cause Memorization
Anubhav Jain, Yuya Kobayashi, Takashi Shibuya et al.
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models
Sai Kumar Dwivedi, Dimitrije Antić, Shashank Tripathi et al.
Generating Illustrated Instructions
Sachit Menon, Ishan Misra, Rohit Girdhar
Generative Densification: Learning to Densify Gaussians for High-Fidelity Generalizable 3D Reconstruction
Seungtae Nam, Xiangyu Sun, Gyeongjin Kang et al.
CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoor Object Detection from Multi-view Images
Guanlin Shen, Jingwei Huang, Zhihua Hu et al.
Pose Priors from Language Models
Sanjay Subramanian, Evonne Ng, Lea Müller et al.
Active Open-Vocabulary Recognition: Let Intelligent Moving Mitigate CLIP Limitations
Lei Fan, Jianxiong Zhou, Xiaoying Xing et al.
A Simple yet Effective Layout Token in Large Language Models for Document Understanding
Zhaoqing Zhu, Chuwei Luo, Zirui Shao et al.
Inversion Circle Interpolation: Diffusion-based Image Augmentation for Data-scarce Classification
Yanghao Wang, Long Chen
DiverseFlow: Sample-Efficient Diverse Mode Coverage in Flows
Mashrur M. Morshed, Vishnu Naresh Boddeti
Steepest Descent Density Control for Compact 3D Gaussian Splatting
Peihao Wang, Yuehao Wang, Dilin Wang et al.
OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos
Dongyoung Choi, Hyeonjoong Jang, Min H. Kim
Querying as Prompt: Parameter-Efficient Learning for Multimodal Language Model
Tian Liang, Jing Huang, Ming Kong et al.
CoMapGS: Covisibility Map-based Gaussian Splatting for Sparse Novel View Synthesis
Youngkyoon Jang, Eduardo Pérez-Pellitero
PBR-NeRF: Inverse Rendering with Physics-Based Neural Fields
Sean Wu, Shamik Basu, Tim Broedermann et al.
HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation
Hongwei Zheng, Han Li, Wenrui Dai et al.
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations
Hmrishav Bandyopadhyay, Yi-Zhe Song
Visual Lexicon: Rich Image Features in Language Space
XuDong Wang, Xingyi Zhou, Alireza Fathi et al.
Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations
Ahmad Rahimi, Po-Chien Luan, Yuejiang Liu et al.
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
Siyuan Li, Luyuan Zhang, Zedong Wang et al.
STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding
Aaryan Garg, Akash Kumar, Yogesh S. Rawat
LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation
Vladan Stojnić, Yannis Kalantidis, Jiri Matas et al.
Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model
Shengjun Zhang, Jinzhao Li, Xin Fei et al.
KITRO: Refining Human Mesh by 2D Clues and Kinematic-tree Rotation
Fengyuan Yang, Kerui Gu, Angela Yao
Relative Pose Estimation through Affine Corrections of Monocular Depth Priors
Yifan Yu, Shaohui Liu, Rémi Pautrat et al.
Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange
Yanhao Wu, Tong Zhang, Wei Ke et al.
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
Aniket Rajiv Didolkar, Andrii Zadaianchuk, Rabiul Awal et al.
Multi-modal Vision Pre-training for Medical Image Analysis
Shaohao Rui, Lingzhi Chen, Zhenyu Tang et al.
Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map
Xinyuan Chang, Maixuan Xue, Xinran Liu et al.
Robust Multimodal Survival Prediction with Conditional Latent Differentiation Variational AutoEncoder
Junjie Zhou, Jiao Tang, Yingli Zuo et al.
MOS: Modeling Object-Scene Associations in Generalized Category Discovery
Zhengyuan Peng, Jinpeng Ma, Zhimin Sun et al.
Towards a Perceptual Evaluation Framework for Lighting Estimation
Justine Giroux, Mohammad Reza Karimi Dastjerdi, Yannick Hold-Geoffroy et al.
Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion
Vitor Guizilini, Muhammad Zubair Irshad, Dian Chen et al.
DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting
Seungjun Lee, Gim Hee Lee
Progress-Aware Video Frame Captioning
Zihui Xue, Joungbin An, Xitong Yang et al.
Temporal Alignment-Free Video Matching for Few-shot Action Recognition
SuBeen Lee, WonJun Moon, Hyun Seok Seong et al.
Detail-Preserving Latent Diffusion for Stable Shadow Removal
Jiamin Xu, Yuxin Zheng, Zelong Li et al.
FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training
Anjia Cao, Xing Wei, Zhiheng Ma
Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression
Xiaoyi Qu, David Aponte, Colby Banbury et al.
Improving Distant 3D Object Detection Using 2D Box Supervision
Zetong Yang, Zhiding Yu, Christopher Choy et al.
Enhancing 3D Gaze Estimation in the Wild using Weak Supervision with Gaze Following Labels
Pierre Vuillecard, Jean-Marc Odobez
Semantic Shield: Defending Vision-Language Models Against Backdooring and Poisoning via Fine-grained Knowledge Alignment
Alvi Md Ishmam, Chris Thomas
How Do I Do That? Synthesizing 3D Hand Motion and Contacts for Everyday Interactions
Aditya Prakash, Benjamin E Lundell, Dmitry Andreychuk et al.
GroupMamba: Efficient Group-Based Visual State Space Model
Abdelrahman Shaker, Syed Talal Wasim, Salman Khan et al.
Object-Shot Enhanced Grounding Network for Egocentric Video
Yisen Feng, Haoyu Zhang, Meng Liu et al.
PixelRNN: In-pixel Recurrent Neural Networks for End-to-end–optimized Perception with Neural Sensors
Haley So, Laurie Bose, Piotr Dudek et al.
LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking
Jialin Li, Qiang Nie, Weifu Fu et al.
FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance
Dian Shao, Mingfei Shi, Shengda Xu et al.
Global Latent Neural Rendering
Thomas Tanay, Matteo Maggioni
FG^2: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching
Zimin Xia, Alex Alahi
Towards Backward-Compatible Continual Learning of Image Compression
Zhihao Duan, Ming Lu, Justin Yang et al.
3D Gaussian Head Avatars with Expressive Dynamic Appearances by Compact Tensorial Representations
Yating Wang, Xuan Wang, Ran Yi et al.
Rethinking Temporal Fusion with a Unified Gradient Descent View for 3D Semantic Occupancy Prediction
Dubing Chen, Huan Zheng, Jin Fang et al.
Semantic Human Mesh Reconstruction with Textures
Xiaoyu Zhan, Jianxin Yang, Yuanqi Li et al.
Think Small, Act Big: Primitive Prompt Learning for Lifelong Robot Manipulation
Yuanqi Yao, Siao Liu, Haoming Song et al.
GauSTAR: Gaussian Surface Tracking and Reconstruction
Chengwei Zheng, Lixin Xue, Juan Jose Zarate et al.
Implicit Motion Function
Yue Gao, Jiahao Li, Lei Chu et al.
BodyMAP - Jointly Predicting Body Mesh and 3D Applied Pressure Map for People in Bed
Abhishek Tandon, Anujraaj Goyal, Henry M. Clever et al.
CGI-DM: Digital Copyright Authentication for Diffusion Models via Contrasting Gradient Inversion
Xiaoyu Wu, Yang Hua, Chumeng Liang et al.
Exploring Historical Information for RGBE Visual Tracking with Mamba
Chuanyu Sun, Jiqing Zhang, Yang Wang et al.
Steganographic Passport: An Owner and User Verifiable Credential for Deep Model IP Protection Without Retraining
Qi Cui, Ruohan Meng, Chaohui Xu et al.
SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation
Chen Sichen, Yingyi Zhang, Siming Huang et al.
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Sanghwan Kim, Rui Xiao, Iuliana Georgescu et al.
Free on the Fly: Enhancing Flexibility in Test-Time Adaptation with Online EM
Qiyuan Dai, Sibei Yang
QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge
Xuan Shen, Weize Ma, Jing Liu et al.
Federated Learning with Domain Shift Eraser
Zheng Wang, Zihui Wang, Zheng Wang et al.
SuperPrimitive: Scene Reconstruction at a Primitive Level
Kirill Mazur, Gwangbin Bae, Andrew J. Davison
Gaussian Splatting for Efficient Satellite Image Photogrammetry
Luca Savant Aira, Gabriele Facciolo, Thibaud Ehret
Image is All You Need to Empower Large-scale Diffusion Models for In-Domain Generation
Pu Cao, Feng Zhou, Lu Yang et al.
Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions
He Zhu, Quyu Kong, Kechun Xu et al.
UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation
Lunhao Duan, Shanshan Zhao, Wenjun Yan et al.
ROICtrl: Boosting Instance Control for Visual Generation
Yuchao Gu, Yipin Zhou, Yunfan Ye et al.
U-VAP: User-specified Visual Appearance Personalization via Decoupled Self Augmentation
You Wu, Kean Liu, Xiaoyue Mi et al.
Epistemic Uncertainty Quantification For Pre-Trained Neural Networks
Hanjing Wang, Qiang Ji
ModeSeq: Taming Sparse Multimodal Motion Prediction with Sequential Mode Modeling
Zikang Zhou, Hengjian Zhou, Haibo Hu et al.
ArtiScene: Language-Driven Artistic 3D Scene Generation Through Image Intermediary
Zeqi Gu, Yin Cui, Max Li et al.
Real-time High-fidelity Gaussian Human Avatars with Position-based Interpolation of Spatially Distributed MLPs
Youyi Zhan, Tianjia Shao, Yin Yang et al.
Defense without Forgetting: Continual Adversarial Defense with Anisotropic & Isotropic Pseudo Replay
Yuhang Zhou, Zhongyun Hua
Guiding Human-Object Interactions with Rich Geometry and Relations
Mengqing Xue, Yifei Liu, Ling Guo et al.
Makeup Prior Models for 3D Facial Makeup Estimation and Applications
Xingchao Yang, Takafumi Taketomi, Yuki Endo et al.
Learning Bijective Surface Parameterization for Inferring Signed Distance Functions from Sparse Point Clouds with Grid Deformation
Takeshi Noda, Chao Chen, Junsheng Zhou et al.
FocSAM: Delving Deeply into Focused Objects in Segmenting Anything
You Huang, Zongyu Lan, Liujuan Cao et al.
MICap: A Unified Model for Identity-Aware Movie Descriptions
Haran Raajesh, Naveen Reddy Desanur, Zeeshan Khan et al.
RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives
Chirag Parikh, Deepti Rawat, Rakshitha R. T. et al.
DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation
Wang Zhao, Yan-Pei Cao, Jiale Xu et al.
Fully Exploiting Every Real Sample: SuperPixel Sample Gradient Model Stealing
Yunlong Zhao, Xiaoheng Deng, Yijing Liu et al.
Instance-based Max-margin for Practical Few-shot Recognition
Minghao Fu, Ke Zhu
SEAL: Semantic Attention Learning for Long Video Representation
Lan Wang, Yujia Chen, Wen-Sheng Chu et al.
Exploiting Temporal State Space Sharing for Video Semantic Segmentation
Hesham Syed, Yun Liu, Guolei Sun et al.
Fine-Grained Erasure in Text-to-Image Diffusion-based Foundation Models
Kartik Thakral, Tamar Glaser, Tal Hassner et al.
Do We Always Need the Simplicity Bias? Looking for Optimal Inductive Biases in the Wild
Damien Teney, Liangze Jiang, Florin Gogianu et al.
SCAP: Transductive Test-Time Adaptation via Supportive Clique-based Attribute Prompting
Chenyu Zhang, Kunlun Xu, Zichen Liu et al.
Don't Look into the Dark: Latent Codes for Pluralistic Image Inpainting
Haiwei Chen, Yajie Zhao
No More Ambiguity in 360° Room Layout via Bi-Layout Estimation
Yu-Ju Tsai, Jin-Cheng Jhang, Jingjing Zheng et al.
Constrained Layout Generation with Factor Graphs
Mohammed Haroon Dupty, Yanfei Dong, Sicong Leng et al.
Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models?
Yanbo Wang, Jiyang Guan, Jian Liang et al.
Turbo3D: Ultra-fast Text-to-3D Generation
Hanzhe Hu, Tianwei Yin, Fujun Luan et al.
ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting
Chengyou Jia, Changliang Xia, Zhuohang Dang et al.
Taming the Tail in Class-Conditional GANs: Knowledge Sharing via Unconditional Training at Lower Resolutions
Saeed Khorram, Mingqi Jiang, Mohamad Shahbazi et al.
ADFactory: An Effective Framework for Generalizing Optical Flow with NeRF
Han Ling, Quansen Sun, Yinghui Sun et al.
GRAPHGPT-O: Synergistic Multimodal Comprehension and Generation on Graphs
Yi Fang, Bowen Jin, Jiacheng Shen et al.
Building Vision Models upon Heat Conduction
Zhaozhi Wang, Yue Liu, Yunjie Tian et al.
In Search of a Data Transformation That Accelerates Neural Field Training
Junwon Seo, Sangyoon Lee, Kwang In Kim et al.
Filter Images First, Generate Instructions Later: Pre-Instruction Data Selection for Visual Instruction Tuning
Bardia Safaei, Faizan Siddiqui, Jiacong Xu et al.
Binarized Mamba-Transformer for Lightweight Quad Bayer HybridEVS Demosaicing
Shiyang Zhou, Haijin Zeng, Yunfan Lu et al.
Dynamic Updates for Language Adaptation in Visual-Language Tracking
Xiaohai Li, Bineng Zhong, Qihua Liang et al.