Most Cited CVPR "iterative warping" Papers
5,589 papers found • Page 11 of 28
Conference
TSAM: Temporal SAM Augmented with Multimodal Prompts for Referring Audio-Visual Segmentation
Abduljalil Radman, Jorma Laaksonen
TCFG: Tangential Damping Classifier-free Guidance
Mingi Kwon, Shin seong Kim, Jaeseok Jeong et al.
Robust 3D Shape Reconstruction in Zero-Shot from a Single Image in the Wild
Junhyeong Cho, Kim Youwang, Hunmin Yang et al.
Interpretable Image Classification via Non-parametric Part Prototype Learning
Zhijie Zhu, Lei Fan, Maurice Pagnucco et al.
It’s a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data
Dominik Schnaus, Nikita Araslanov, Daniel Cremers
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations
Hmrishav Bandyopadhyay, Yi-Zhe Song
Augmented Deep Contexts for Spatially Embedded Video Coding
Yifan Bian, Chuanbo Tang, Li Li et al.
PEACE: Empowering Geologic Map Holistic Understanding with MLLMs
Yangyu Huang, Tianyi Gao, Haoran Xu et al.
Rethinking Spiking Self-Attention Mechanism: Implementing α-XNOR Similarity Calculation in Spiking Transformers
Yichen Xiao, Shuai Wang, Dehao Zhang et al.
Laplacian-guided Entropy Model in Neural Codec with Blur-dissipated Synthesis
Atefeh Khoshkhahtinat, Ali Zafari, Piyush Mehta et al.
RealEdit: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations
Peter Sushko, Ayana Bharadwaj, Zhi Yang Lim et al.
DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction
Ben Kaye, Tomas Jakab, Shangzhe Wu et al.
Diffusion Reflectance Map: Single-Image Stochastic Inverse Rendering of Illumination and Reflectance
Yuto Enyo, Ko Nishino
4DTAM: Non-Rigid Tracking and Mapping via Dynamic Surface Gaussians
Hidenobu Matsuki, Gwangbin Bae, Andrew J. Davison
Boosting Order-Preserving and Transferability for Neural Architecture Search: a Joint Architecture Refined Search and Fine-tuning Approach
Beichen Zhang, Xiaoxing Wang, Xiaohan Qin et al.
Learning to Rank Patches for Unbiased Image Redundancy Reduction
Yang Luo, Zhineng Chen, Peng Zhou et al.
RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings
Aayush Dhakal, Srikumar Sastry, Subash Khanal et al.
TurboFill: Adapting Few-step Text-to-image Model for Fast Image Inpainting
Liangbin Xie, Daniil Pakhomov, Zhonghao Wang et al.
Revisiting Sampson Approximations for Geometric Estimation Problems
Felix Rydell, Angelica Torres, Viktor Larsson
FluxSpace: Disentangled Semantic Editing in Rectified Flow Models
Yusuf Dalva, Kavana Venkatesh, Pinar Yanardag
COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection
Jinqi Xiao, Shen Sang, Tiancheng Zhi et al.
Stable-SCore: A Stable Registration-based Framework for 3D Shape Correspondence
Haolin Liu, Xiaohang Zhan, Zizheng Yan et al.
Multi-Session SLAM with Differentiable Wide-Baseline Pose Optimization
Lahav Lipson, Jia Deng
3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View Transformation
Gyeongrok Oh, Sung June Kim, Heeju Ko et al.
DefectFill: Realistic Defect Generation with Inpainting Diffusion Model for Visual Inspection
Jaewoo Song, Daemin Park, Kanghyun Baek et al.
A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation
Andrew Z Wang, Songwei Ge, Tero Karras et al.
BF-STVSR: B-Splines and Fourier---Best Friends for High Fidelity Spatial-Temporal Video Super-Resolution
Eunjin Kim, HYEONJIN KIM, Kyong Hwan Jin et al.
Guiding Human-Object Interactions with Rich Geometry and Relations
Mengqing Xue, Yifei Liu, Ling Guo et al.
StyleSSP: Sampling StartPoint Enhancement for Training-free Diffusion-based Method for Style Transfer
ruojun xu, Weijie Xi, Xiaodi Wang et al.
FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions
Jiong WANG, Fengyu Yang, Bingliang Li et al.
Generative Sparse-View Gaussian Splatting
Hanyang Kong, Xingyi Yang, Xinchao Wang
Multi-modal Knowledge Distillation-based Human Trajectory Forecasting
Jaewoo Jeong, Seohee Lee, Daehee Park et al.
SDGOCC: Semantic and Depth-Guided Bird's-Eye View Transformation for 3D Multimodal Occupancy Prediction
ZaiPeng Duan, Xuzhong Hu, Pei An et al.
AlphaPre: Amplitude-Phase Disentanglement Model for Precipitation Nowcasting
Kenghong Lin, Baoquan Zhang, Demin Yu et al.
Fully Geometric Panoramic Localization
Junho Kim, Jiwon Jeong, Young Min Kim
PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model
Mingju Gao, Yike Pan, Huan-ang Gao et al.
Satellite Observations Guided Diffusion Model for Accurate Meteorological States at Arbitrary Resolution
Siwei Tu, Ben Fei, Weidong Yang et al.
Federated Learning with Domain Shift Eraser
Zheng Wang, Zihui Wang, Zheng Wang et al.
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation
Weinan Jia, Mengqi Huang, Nan Chen et al.
GroupMamba: Efficient Group-Based Visual State Space Model
Abdelrahman Shaker, Syed Talal Wasim, Salman Khan et al.
Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions
He Zhu, Quyu Kong, Kechun Xu et al.
How Do I Do That? Synthesizing 3D Hand Motion and Contacts for Everyday Interactions
Aditya Prakash, Benjamin E Lundell, Dmitry Andreychuk et al.
Density-guided Translator Boosts Synthetic-to-Real Unsupervised Domain Adaptive Segmentation of 3D Point Clouds
Zhimin Yuan, Wankang Zeng, Yanfei Su et al.
Learning from Neighbors: Category Extrapolation for Long-Tail Learning
Shizhen Zhao, Xin Wen, Jiahui Liu et al.
Bringing CLIP to the Clinic: Dynamic Soft Labels and Negation-Aware Learning for Medical Analysis
Hanbin Ko, Chang Min Park
Audio-Visual Semantic Graph Network for Audio-Visual Event Localization
Liang Liu, Shuaiyong Li, Yongqiang Zhu
Detect Any Mirrors: Boosting Learning Reliability on Large-Scale Unlabeled Data with an Iterative Data Engine
Zhaohu Xing, Lihao Liu, Yijun Yang et al.
AniMo: Species-Aware Model for Text-Driven Animal Motion Generation
Xuan Wang, Kai Ruan, Xing Zhang et al.
SimVS: Simulating World Inconsistencies for Robust View Synthesis
Alex Trevithick, Roni Paiss, Philipp Henzler et al.
Learned Scanpaths Aid Blind Panoramic Video Quality Assessment
Kanglong FAN, Wen Wen, Mu Li et al.
Point-to-Region Loss for Semi-Supervised Point-Based Crowd Counting
Wei Lin, Chenyang ZHAO, Antoni B. Chan
NECA: Neural Customizable Human Avatar
Junjin Xiao, Qing Zhang, Zhan Xu et al.
GarmentPile: Point-Level Visual Affordance Guided Retrieval and Adaptation for Cluttered Garments Manipulation
Ruihai Wu, Ziyu Zhu, Yuran Wang et al.
Pseudo Label Refinery for Unsupervised Domain Adaptation on Cross-dataset 3D Object Detection
Zhanwei Zhang, Minghao Chen, Shuai Xiao et al.
Shape My Moves: Text-Driven Shape-Aware Synthesis of Human Motions
Ting-Hsuan Liao, Yi Zhou, Yu Shen et al.
Zero-shot 3D Question Answering via Voxel-based Dynamic Token Compression
Hsiang-Wei Huang, Fu-Chen Chen, Wenhao Chai et al.
Seg2Reg: Differentiable 2D Segmentation to 1D Regression Rendering for 360 Room Layout Reconstruction
Cheng Sun, Wei-En Tai, Yu-Lin Shih et al.
EdgeDiff: Edge-aware Diffusion Network for Building Reconstruction from Point Clouds
Yujun Liu, Ruisheng Wang, Shangfeng Huang et al.
Boosting Adversarial Transferability through Augmentation in Hypothesis Space
Yu Guo, Weiquan Liu, Qingshan Xu et al.
HeMoRa: Unsupervised Heuristic Consensus Sampling for Robust Point Cloud Registration
Shaocheng Yan, Yiming Wang, Kaiyan Zhao et al.
FutureHuman3D: Forecasting Complex Long-Term 3D Human Behavior from Video Observations
Christian Diller, Thomas Funkhouser, Angela Dai
FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation
Tianyun Zhong, Chao Liang, Jianwen Jiang et al.
Efficient ANN-Guided Distillation: Aligning Rate-based Features of Spiking Neural Networks through Hybrid Block-wise Replacement
Shu Yang, Chengting Yu, Lei Liu et al.
IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments
Can Zhang, Gim Hee Lee
Exploiting Temporal State Space Sharing for Video Semantic Segmentation
Hesham Syed, Yun Liu, Guolei Sun et al.
Towards RAW Object Detection in Diverse Conditions
Zhong-Yu Li, Xin Jin, Bo-Yuan Sun et al.
Complementary Advantages: Exploiting Cross-Field Frequency Correlation for NIR-Assisted Image Denoising
Yuchen Wang, Hongyuan Wang, Lizhi Wang et al.
Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis
Tianci Bi, Xiaoyi Zhang, Zhizheng Zhang et al.
High-Fidelity Relightable Monocular Portrait Animation with Lighting-Controllable Video Diffusion Model
Mingtao Guo, Guanyu Xing, Yanli Liu
DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering
Yihao Wang, Marcus Klasson, Matias Turkulainen et al.
D^3-Human: Dynamic Disentangled Digital Human from Monocular Video
Honghu Chen, Bo Peng, Yunfan Tao et al.
Stationary Representations: Optimally Approximating Compatibility and Implications for Improved Model Replacements
Niccolò Biondi, Federico Pernici, Simone Ricci et al.
Training-free Dense-Aligned Diffusion Guidance for Modular Conditional Image Synthesis
Zixuan Wang, DUO PENG, Feng Chen et al.
SCULPT: Shape-Conditioned Unpaired Learning of Pose-dependent Clothed and Textured Human Meshes
Soubhik Sanyal, Partha Ghosh, Jinlong Yang et al.
Unsegment Anything by Simulating Deformation
Jiahao Lu, Xingyi Yang, Xinchao Wang
ShapeWalk: Compositional Shape Editing Through Language-Guided Chains
Habib Slim, Mohamed Elhoseiny
On the Consistency of Video Large Language Models in Temporal Comprehension
Minjoon Jung, Junbin Xiao, Byoung-Tak Zhang et al.
MotionPRO: Exploring the Role of Pressure in Human MoCap and Beyond
Shenghao Ren, Yi Lu, Jiayi Huang et al.
A Polarization-Aided Transformer for Image Deblurring via Motion Vector Decomposition
Duosheng Chen, Shihao Zhou, Jinshan Pan et al.
Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking
Junxi Chen, Junhao Dong, Xiaohua Xie
One is Plenty: A Polymorphic Feature Interpreter for Immutable Heterogeneous Collaborative Perception
Yuchen Xia, Quan Yuan, Guiyang Luo et al.
ProtoDepth: Unsupervised Continual Depth Completion with Prototypes
Patrick Rim, Hyoungseob Park, Suchisrit Gangopadhyay et al.
ACT-Diffusion: Efficient Adversarial Consistency Training for One-step Diffusion Models
Fei Kong, Jinhao Duan, Lichao Sun et al.
AesthetiQ: Enhancing Graphic Layout Design via Aesthetic-Aware Preference Alignment of Multi-modal Large Language Models
Sohan Patnaik, Rishabh Jain, Balaji Krishnamurthy et al.
ZeroGrasp: Zero-Shot Shape Reconstruction Enabled Robotic Grasping
Shun Iwase, Muhammad Zubair Irshad, Katherine Liu et al.
Hyperspherical Classification with Dynamic Label-to-Prototype Assignment
Mohammad Saadabadi Saadabadi, Ali Dabouei, Sahar Rahimi Malakshan et al.
Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis
Yousef Yeganeh, Ioannis Charisiadis, Marta Hasny et al.
DiaLoc: An Iterative Approach to Embodied Dialog Localization
Chao Zhang, Mohan Li, Ignas Budvytis et al.
DiTASK: Multi-Task Fine-Tuning with Diffeomorphic Transformations
Krishna Sri Ipsit Mantri, Carola-Bibiane Schönlieb, Bruno Ribeiro et al.
GIFStream: 4D Gaussian-based Immersive Video with Feature Stream
Hao Li, Sicheng Li, Xiang Gao et al.
Intensity-Robust Autofocus for Spike Camera
Changqing Su, Zhiyuan Ye, Yongsheng Xiao et al.
Self-Learning Hyperspectral and Multispectral Image Fusion via Adaptive Residual Guided Subspace Diffusion Model
Jian Zhu, He Wang, Yang Xu et al.
Makeup Prior Models for 3D Facial Makeup Estimation and Applications
Xingchao Yang, Takafumi Taketomi, Yuki Endo et al.
Neural Motion Simulator Pushing the Limit of World Models in Reinforcement Learning
Chenjie Hao, Weyl Lu, Yifan Xu et al.
Mosaic of Modalities: A Comprehensive Benchmark for Multimodal Graph Learning
Jing Zhu, Yuhang Zhou, Shengyi Qian et al.
FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity
Jinxi Li, Ziyang Song, Siyuan Zhou et al.
Robust Message Embedding via Attention Flow-Based Steganography
Huayuan Ye, Shenzhuo Zhang, Shiqi Jiang et al.
Multi-modal Vision Pre-training for Medical Image Analysis
Shaohao Rui, Lingzhi Chen, Zhenyu Tang et al.
AIpparel: A Multimodal Foundation Model for Digital Garments
Kiyohiro Nakayama, Jan Ackermann, Timur Levent Kesdogan et al.
ITA-MDT: Image-Timestep-Adaptive Masked Diffusion Transformer Framework for Image-Based Virtual Try-On
Ji Woo Hong, Tri Ton, Trung X. Pham et al.
UHD-processer: Unified UHD Image Restoration with Progressive Frequency Learning and Degradation-aware Prompts
Yidi Liu, Dong Li, Xueyang Fu et al.
NoPain: No-box Point Cloud Attack via Optimal Transport Singular Boundary
Zezeng Li, Xiaoyu Du, Na Lei et al.
Revisiting Source-Free Domain Adaptation: Insights into Representativeness, Generalization, and Variety
Ronghang Zhu, Mengxuan Hu, Weiming Zhuang et al.
Spotting the Unexpected (STU): A 3D LiDAR Dataset for Anomaly Segmentation in Autonomous Driving
Alexey Nekrasov, Malcolm Burdorf, Stewart Worrall et al.
SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis
Junho Kim, Hyunjun Kim, Hosu Lee et al.
ReNeg: Learning Negative Embedding with Reward Guidance
Xiaomin Li, yixuan liu, Takashi Isobe et al.
CLOC: Contrastive Learning for Ordinal Classification with Multi-Margin N-pair Loss
Dileepa Pitawela, Gustavo Carneiro, Hsiang-Ting Chen
Edit One for All: Interactive Batch Image Editing
Thao Nguyen, Utkarsh Ojha, Yuheng Li et al.
HandOS: 3D Hand Reconstruction in One Stage
Xingyu Chen, Zhuheng Song, Xiaoke Jiang et al.
InteractionMap: Improving Online Vectorized HDMap Construction with Interaction
Kuang Wu, Chuan Yang, Zhanbin Li
Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery
Yuqi Zhang, Guanying Chen, Jiaxing Chen et al.
RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments
Haisheng Su, Feixiang Song, CONG MA et al.
Efficient Hyperparameter Optimization with Adaptive Fidelity Identification
Jiantong Jiang, Zeyi Wen, Atif Mansoor et al.
V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents
Zhengrong Yue, Shaobin Zhuang, Kunchang Li et al.
Binarized Neural Network for Multi-spectral Image Fusion
Junming Hou, Xiaoyu Chen, Ran Ran et al.
Enhancing Testing-Time Robustness for Trusted Multi-View Classification in the Wild
Wei Liu, Yufei Chen, Xiaodong Yue
Pos3R: 6D Pose Estimation for Unseen Objects Made Easy
Weijian Deng, Dylan Campbell, Chunyi Sun et al.
CTRL-D: Controllable Dynamic 3D Scene Editing with Personalized 2D Diffusion
Kai He, Chin-Hsuan Wu, Igor Gilitschenski
Omnidirectional Multi-Object Tracking
Kai Luo, Hao Shi, Sheng Wu et al.
A Unified Framework for Human-centric Point Cloud Video Understanding
Yiteng Xu, Kecheng Ye, xiao han et al.
GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation
Ning Gao, Yilun Chen, Shuai Yang et al.
ProbeSDF: Light Field Probes For Neural Surface Reconstruction
Briac Toussaint, Diego Thomas, Jean-Sébastien Franco
Advancing Semantic Future Prediction through Multimodal Visual Sequence Transformers
Efstathios Karypidis, Ioannis Kakogeorgiou, Spyros Gidaris et al.
Dynamic Stereotype Theory Induced Micro-expression Recognition with Oriented Deformation
Bohao Zhang, Xuejiao Wang, Changbo Wang et al.
Robust Self-calibration of Focal Lengths from the Fundamental Matrix
Viktor Kocur, Daniel Kyselica, Zuzana Kukelova
FDS: Frequency-Aware Denoising Score for Text-Guided Latent Diffusion Image Editing
Yufan Ren, Zicong Jiang, Tong Zhang et al.
Generalized Diffusion Detector: Mining Robust Features from Diffusion Models for Domain-Generalized Detection
Boyong He, Yuxiang Ji, Qianwen Ye et al.
OCRT: Boosting Foundation Models in the Open World with Object-Concept-Relation Triad
Luyao Tang, Chaoqi Chen, Yuxuan Yuan et al.
Visual Objectification in Films: Towards a New AI Task for Video Interpretation
Julie Tores, Lucile Sassatelli, Hui-Yin Wu et al.
Learning Visual Generative Priors without Text
Shuailei Ma, Kecheng Zheng, Ying Wei et al.
MAtCha Gaussians: Atlas of Charts for High-Quality Geometry and Photorealism From Sparse Views
Antoine Guédon, Tomoki Ichikawa, Kohei Yamashita et al.
HUSH: Holistic Panoramic 3D Scene Understanding using Spherical Harmonics
Jongsung Lee, HARIN PARK, Byeong-Uk Lee et al.
LUCAS: Layered Universal Codec Avatars
Di Liu, Teng Deng, Giljoo Nam et al.
Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement
Yuchen Ren, Zhengyu Zhao, Chenhao Lin et al.
Detection-Friendly Nonuniformity Correction: A Union Framework for Infrared UAV Target Detection
Houzhang Fang, Xiaolin Wang, Zengyang Li et al.
Context-Enhanced Memory-Refined Transformer for Online Action Detection
Zhanzhong Pang, Fadime Sener, Angela Yao
Projecting Trackable Thermal Patterns for Dynamic Computer Vision
Mark Sheinin, Aswin C. Sankaranarayanan, Srinivasa G. Narasimhan
Modeling Thousands of Human Annotators for Generalizable Text-to-Image Person Re-identification
Jiayu Jiang, Changxing Ding, Wentao Tan et al.
Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing
Zhedong Zhang, Liang Li, Chenggang Yan et al.
On the Zero-shot Adversarial Robustness of Vision-Language Models: A Truly Zero-shot and Training-free Approach
Baoshun Tong, Hanjiang Lai, Yan Pan et al.
3DEnhancer: Consistent Multi-View Diffusion for 3D Enhancement
Yihang Luo, Shangchen Zhou, Yushi Lan et al.
Concept Replacer: Replacing Sensitive Concepts in Diffusion Models via Precision Localization
lingyun zhang, Yu Xie, Yanwei Fu et al.
Understanding Fine-tuning CLIP for Open-vocabulary Semantic Segmentation in Hyperbolic Space
Zelin Peng, Zhengqin Xu, Zhilin Zeng et al.
CoMBO: Conflict Mitigation via Branched Optimization for Class Incremental Segmentation
Kai Fang, Anqi Zhang, Guangyu Gao et al.
Question-Aware Gaussian Experts for Audio-Visual Question Answering
Hongyeob Kim, Inyoung Jung, Dayoon Suh et al.
Finding Local Diffusion Schrödinger Bridge using Kolmogorov-Arnold Network
Xingyu Qiu, Mengying Yang, Xinghua Ma et al.
OnlineAnySeg: Online Zero-Shot 3D Segmentation by Visual Foundation Model Guided 2D Mask Merging
Yijie Tang, Jiazhao Zhang, Yuqing Lan et al.
Learning to Navigate Efficiently and Precisely in Real Environments
Guillaume Bono, Hervé Poirier, Leonid Antsfeld et al.
Object-aware Sound Source Localization via Audio-Visual Scene Understanding
Sung Jin Um, Dongjin Kim, Sangmin Lee et al.
PanDA: Towards Panoramic Depth Anything with Unlabeled Panoramas and Mobius Spatial Augmentation
Zidong Cao, Jinjing Zhu, Weiming Zhang et al.
Cropper: Vision-Language Model for Image Cropping through In-Context Learning
Seung Hyun Lee, Jijun jiang, Yiran Xu et al.
Spatial Transport Optimization by Repositioning Attention Map for Training-Free Text-to-Image Synthesis
Woojung Han, Yeonkyung Lee, Chanyoung Kim et al.
Interpretable Generative Models through Post-hoc Concept Bottlenecks
Akshay R. Kulkarni, Ge Yan, Chung-En Sun et al.
Distilled Prompt Learning for Incomplete Multimodal Survival Prediction
Yingxue Xu, Fengtao ZHOU, Chenyu Zhao et al.
Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input
Jian Wang, Rishabh Dabral, Diogo Luvizon et al.
IterIS: Iterative Inference-Solving Alignment for LoRA Merging
Hongxu chen, Zhen Wang, Runshi Li et al.
Fully Exploiting Every Real Sample: SuperPixel Sample Gradient Model Stealing
Yunlong Zhao, Xiaoheng Deng, Yijing Liu et al.
AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward
Haonan Han, Xiangzuo Wu, Huan Liao et al.
EchoONE: Segmenting Multiple Echocardiography Planes in One Model
Jiongtong Hu, Wei Zhuo, Jun Cheng et al.
Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection
Gensheng Pei, Tao Chen, Yujia Wang et al.
Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views
Chong Bao, Xiyu Zhang, Zehao Yu et al.
4Deform: Neural Surface Deformation for Robust Shape Interpolation
Lu Sang, Zehranaz Canfes, Dongliang Cao et al.
DecoupledGaussian: Object-Scene Decoupling for Physics-Based Interaction
Miaowei Wang, Yibo Zhang, Rui Ma et al.
Universal Robustness via Median Randomized Smoothing for Real-World Super-Resolution
Zakariya Chaouai, Mohamed Tamaazousti
Multi-party Collaborative Attention Control for Image Customization
Han Yang, Chuanguang Yang, Qiuli Wang et al.
From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport
Quentin Bouniot, Ievgen Redko, Anton Mallasto et al.
Hardware-Rasterized Ray-Based Gaussian Splatting
Samuel Rota Bulò, Lorenzo Porzi, Nemanja Bartolovic et al.
Flow-Guided Online Stereo Rectification for Wide Baseline Stereo
Anush Kumar, Fahim Mannan, Omid Hosseini Jafari et al.
Removing Reflections from RAW Photos
Eric Kee, Adam Pikielny, Kevin Blackburn-Matzen et al.
Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts
Qizhou Chen, Chengyu Wang, Dakan Wang et al.
Learning Heterogeneous Tissues with Mixture of Experts for Gigapixel Whole Slide Images
Junxian Wu, Minheng Chen, Xinyi Ke et al.
Semantic and Expressive Variations in Image Captions Across Languages
Andre Ye, Sebastin Santy, Jena D. Hwang et al.
UIBDiffusion: Universal Imperceptible Backdoor Attack for Diffusion Models
Yuning Han, Bingyin Zhao, Rui Chu et al.
SocialGesture: Delving into Multi-person Gesture Understanding
Xu Cao, Pranav Virupaksha, Wenqi Jia et al.
TFCustom: Customized Image Generation with Time-Aware Frequency Feature Guidance
Mushui Liu, Dong She, Qihan Huang et al.
Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation
Xin Yan, Yuxuan Cai, Qiuyue Wang et al.
FaceLift: Semi-supervised 3D Facial Landmark Localization
David Ferman, Pablo Garrido, Gaurav Bharaj
Uni-Renderer: Unifying Rendering and Inverse Rendering Via Dual Stream Diffusion
ZhiFei Chen, Tianshuo Xu, Wenhang Ge et al.
Improving Transferable Targeted Attacks with Feature Tuning Mixup
Kaisheng Liang, Xuelong Dai, Yanjie Li et al.
Unleashing Network Potentials for Semantic Scene Completion
Fengyun Wang, Qianru Sun, Dong Zhang et al.
From Elements to Design: A Layered Approach for Automatic Graphic Design Composition
Jiawei Lin, Shizhao Sun, Danqing Huang et al.
DiffVsgg: Diffusion-Driven Online Video Scene Graph Generation
Mu Chen, Liulei Li, Wenguan Wang et al.
Open-World Objectness Modeling Unifies Novel Object Detection
Shan Zhang, Yao Ni, Jinhao Du et al.
Normalizing Flows on the Product Space of SO(3) Manifolds for Probabilistic Human Pose Modeling
Olaf Dünkel, Tim Salzmann, Florian Pfaff
Generalizable Face Landmarking Guided by Conditional Face Warping
Jiayi Liang, Haotian Liu, Hongteng Xu et al.
One2Any: One-Reference 6D Pose Estimation for Any Object
Mengya Liu, Siyuan Li, Ajad Chhatkuli et al.
JTD-UAV: MLLM-Enhanced Joint Tracking and Description Framework for Anti-UAV Systems
Yifan Wang, Jian Zhao, Zhaoxin Fan et al.
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Chenxin Tao, Shiqian Su, Xizhou Zhu et al.
Birth and Death of a Rose
Chen Geng, Yunzhi Zhang, Shangzhe Wu et al.
Towards Robust 3D Pose Transfer with Adversarial Learning
Haoyu Chen, Hao Tang, Ehsan Adeli et al.
BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation
Qihang Zhang, Yinghao Xu, Yujun Shen et al.
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute
Sotiris Anagnostidis, Gregor Bachmann, Yeongmin Kim et al.
Adversarial Domain Prompt Tuning and Generation for Single Domain Generalization
Zhipeng Xu, De Cheng, XINYANG JIANG et al.
Multi-Resolution Pathology-Language Pre-training Model with Text-Guided Visual Representation
Shahad Albastaki, Anabia Sohail, IYYAKUTTI IYAPPAN GANAPATHI et al.
Learning with Unreliability: Fast Few-shot Voxel Radiance Fields with Relative Geometric Consistency
Xu Yingjie, Bangzhen Liu, Hao Tang et al.
NightAdapter: Learning a Frequency Adapter for Generalizable Night-time Scene Segmentation
Qi Bi, Jingjun Yi, Huimin Huang et al.
Probabilistic Sampling of Balanced K-Means using Adiabatic Quantum Computing
Jan-Nico Zaech, Martin Danelljan, Tolga Birdal et al.
Adaptive Non-Uniform Timestep Sampling for Accelerating Diffusion Model Training
Myunsoo Kim, Donghyeon Ki, Seong-Woong Shim et al.
EFormer: Enhanced Transformer towards Semantic-Contour Features of Foreground for Portraits Matting
Zitao Wang, Qiguang Miao, Yue Xi et al.
LongDiff: Training-Free Long Video Generation in One Go
Zhuoling Li, Hossein Rahmani, Qiuhong Ke et al.