Most Cited ECCV "koopman operator" Papers
2,387 papers found • Page 4 of 12
Long-term Temporal Context Gathering for Neural Video Compression
Linfeng Qi, Zhaoyang Jia, Jiahao Li et al.
Norface: Improving Facial Expression Analysis by Identity Normalization
Hanwei Liu, Rudong An, Zhimeng Zhang et al.
HumanRefiner: Benchmarking Abnormal Human Generation and Refining with Coarse-to-fine Pose-Reversible Guidance
Guian Fang, Wenbiao Yan, Yuanfan Guo et al.
NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation
Ruikai Cui, Weizhe Liu, Weixuan Sun et al.
DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition
Qi Wang, Zhou Xu, Yuming Lin et al.
Improving Text-guided Object Inpainting with Semantic Pre-inpainting
Yifu Chen, Jingwen Chen, Yingwei Pan et al.
Foster Adaptivity and Balance in Learning with Noisy Labels
Mengmeng Sheng, Zeren Sun, Tao Chen et al.
X-Pose: Detecting Any Keypoints
Jie Yang, Ailing Zeng, Ruimao Zhang et al.
EmoTalk3D: High-Fidelity Free-View Synthesis of Emotional 3D Talking Head
Qianyun He, Xinya Ji, Yicheng Gong et al.
Rate-Distortion-Cognition Controllable Versatile Neural Image Compression
Jinming Liu, Ruoyu Feng, Yunpeng Qi et al.
AttnZero: Efficient Attention Discovery for Vision Transformers
Lujun Li, Zimian Wei, Peijie Dong et al.
Event-Adapted Video Super-Resolution
Zeyu Xiao, Dachun Kai, Yueyi Zhang et al.
Referring Atomic Video Action Recognition
Kunyu Peng, Jia Fu, Kailun Yang et al.
MoVideo: Motion-Aware Video Generation with Diffusion Models
Jingyun Liang, Yuchen Fan, Kai Zhang et al.
MoEAD: A Parameter-efficient Model for Multi-class Anomaly Detection
Shiyuan Meng, Wenchao Meng, Qihang Zhou et al.
UniCode: Learning a Unified Codebook for Multimodal Large Language Models
Sipeng Zheng, Bohan Zhou, Yicheng Feng et al.
Finding Visual Task Vectors
Alberto Hojel, Yutong Bai, Trevor Darrell et al.
Non-Exemplar Domain Incremental Learning via Cross-Domain Concept Integration
Qiang Wang, Yuhang He, Songlin Dong et al.
Vision-Language Action Knowledge Learning for Semantic-Aware Action Quality Assessment
Huangbiao Xu, Xiao Ke, Yuezhou Li et al.
CLIFF: Continual Latent Diffusion for Open-Vocabulary Object Detection
Wuyang Li, Xinyu Liu, Jiayi Ma et al.
Free-Editor: Zero-shot Text-driven 3D Scene Editing
Md Nazmul Karim, Hasan Iqbal, Umar Khalid et al.
Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments
Djamahl Etchegaray, Zi Helen Huang, Tatsuya Harada et al.
TetraDiffusion: Tetrahedral Diffusion Models for 3D Shape Generation
Nikolai Kalischek, Torben Peters, Jan Dirk Wegner et al.
Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
Shihao Zhao, Shaozhe Hao, Bojia Zi et al.
LEIA: Latent View-invariant Embeddings for Implicit 3D Articulation
Archana Swaminathan, Anubhav Anubhav, Kamal Gupta et al.
FineMatch: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction
Hang Hua, Jing Shi, Kushal Kafle et al.
LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer
Ning Yu, Chia-Chih Chen, Zeyuan Chen et al.
Free-VSC: Free Semantics from Visual Foundation Models for Unsupervised Video Semantic Compression
Yuan Tian, Guo Lu, Guangtao Zhai
HUMOS: Human Motion Model Conditioned on Body Shape
Shashank Tripathi, Omid Taheri, Christoph Lassner et al.
Temporal Event Stereo via Joint Learning with Stereoscopic Flow
Hoonhee Cho, Jae-young Kang, Kuk-Jin Yoon
LaMI-DETR: Open-Vocabulary Detection with Language Model Instruction
Penghui Du, Yu Wang, Yifan Sun et al.
PointNeRF++: A multi-scale, point-based Neural Radiance Field
Weiwei Sun, Eduard Trulls, Yang-Che Tseng et al.
Reinforcement Learning Meets Visual Odometry
Nico Messikommer, Giovanni Cioffi, Mathias Gehrig et al.
Neural Volumetric World Models for Autonomous Driving
Zanming Huang, Jimuyang Zhang, Eshed Ohn-Bar
Bridge Past and Future: Overcoming Information Asymmetry in Incremental Object Detection
Qijie Mo, Yipeng Gao, Shenghao Fu et al.
Grounding Language Models for Visual Entity Recognition
Zilin Xiao, Ming Gong, Paola Cascante-Bonilla et al.
Every Pixel Has its Moments: Ultra-High-Resolution Unpaired Image-to-Image Translation via Dense Normalization
Ming-Yang Ho, Che-Ming Wu, Min-Sheng Wu et al.
Editable Image Elements for Controllable Synthesis
Jiteng Mu, Michael Gharbi, Richard Zhang et al.
DreamMover: Leveraging the Prior of Diffusion Models for Image Interpolation with Large Motion
Liao Shen, Tianqi Liu, Huiqiang Sun et al.
InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser
Xing Cui, Zekun Li, Peipei Li et al.
ScanTalk: 3D Talking Heads from Unregistered Scans
Federico Nocentini, Thomas Besnier, Claudio Ferrari et al.
Learning Representations of Satellite Images From Metadata Supervision
Jules Bourcier, Gohar Dashyan, Karteek Alahari et al.
MagicEraser: Erasing Any Objects via Semantics-Aware Control
Fan Li, Zixiao Zhang, Yi Huang et al.
On the Utility of 3D Hand Poses for Action Recognition
Md Salman Shamil, Dibyadip Chatterjee, Fadime Sener et al.
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
Junyi Li, Junfeng Wu, Weizhi Zhao et al.
3iGS: Factorised Tensorial Illumination for 3D Gaussian Splatting
Zhe Jun Tang, Tat-Jen Cham
Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models
Chen Ju, Haicheng Wang, Haozhe Cheng et al.
Kalman-Inspired Feature Propagation for Video Face Super-Resolution
Ruicheng Feng, Chongyi Li, Chen Change Loy
Teddy: Efficient Large-Scale Dataset Distillation via Taylor-Approximated Matching
Ruonan Yu, Songhua Liu, Jingwen Ye et al.
Boosting 3D Single Object Tracking with 2D Matching Distillation and 3D Pre-training
Qiangqiang Wu, Yan Xia, Jia Wan et al.
Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation
Peng Jin, Hao Li, Zesen Cheng et al.
Representing Topological Self-Similarity Using Fractal Feature Maps for Accurate Segmentation of Tubular Structures
Jiaxing Huang, Yanfeng Zhou, Yaoru Luo et al.
WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language
Zhenxiang Lin, Xidong Peng, Peishan Cong et al.
Semi-supervised Segmentation of Histopathology Images with Noise-Aware Topological Consistency
Meilong Xu, Xiaoling Hu, Saumya Gupta et al.
Harnessing Text-to-Image Diffusion Models for Category-Agnostic Pose Estimation
Duo Peng, Zhengbo Zhang, Ping Hu et al.
OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation
Kwanyoung Kim, Yujin Oh, Jong Chul Ye
Brain-ID: Learning Contrast-agnostic Anatomical Representations for Brain Imaging
Peirong Liu, Oula Puonti, Xiaoling Hu et al.
Where am I? Scene Retrieval with Language
Jiaqi Chen, Daniel Barath, Iro Armeni et al.
3DGazeNet: Generalizing Gaze Estimation with Weak Supervision from Synthetic Views
Evangelos Ververas, Polydefkis Gkagkos, Jiankang Deng et al.
UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding
Bowen Shi, Peisen Zhao, Zichen Wang et al.
Robust Multimodal Learning via Representation Decoupling
Shicai Wei, Yang Luo, Yuji Wang et al.
DiffuMatting: Synthesizing Arbitrary Objects with Matting-level Annotation
Xiaobin Hu, Xu Peng, Donghao Luo et al.
Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather
Junsung Park, Kyungmin Kim, Hyunjung Shim
ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer
Jiazhi Guan, Zhiliang Xu, Hang Zhou et al.
STAMP: Outlier-Aware Test-Time Adaptation with Stable Memory Replay
Yu Yongcan, Lijun Sheng, Ran He et al.
MultiDelete for Multimodal Machine Unlearning
Jiali Cheng, Hadi Amiri
Self-Guided Generation of Minority Samples Using Diffusion Models
Soobin Um, Jong Chul Ye
3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation
Zihao Xiao, Longlong Jing, Shangxuan Wu et al.
ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments
Taewoong Kim, Cheolhong Min, Byeonghwi Kim et al.
IMMA: Immunizing text-to-image Models against Malicious Adaptation
Amber Yijia Zheng, Raymond Yeh
InstructGIE: Towards Generalizable Image Editing
Zichong Meng, Changdi Yang, Jun Liu et al.
OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models
Zijian Zhou, Zheng Zhu, Holger Caesar et al.
MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models
Nithin Gopalakrishnan Nair, Jeya Maria Jose Valanarasu, Vishal Patel
Just a Hint: Point-Supervised Camouflaged Object Detection
Huafeng Chen, Dian Shao, Guangqian Guo et al.
BKDSNN: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation
Zekai Xu, Kang You, Qinghai Guo et al.
Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance
Reyhane Askari Hemmat, Melissa Hall, Alicia Yi Sun et al.
Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL
Fangwei Zhong, Kui Wu, Hai Ci et al.
Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image
Pengkun Jiao, Na Zhao, Jingjing Chen et al.
ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention
Chenhang He, Ruihuang Li, Guowen Zhang et al.
Adaptive Compressed Sensing with Diffusion-Based Posterior Sampling
Noam Elata, Tomer Michaeli, Michael Elad
Physical-Based Event Camera Simulator
Haiqian Han, Jiacheng Lyu, Jianing Li et al.
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
Sara Sarto, Marcella Cornia, Lorenzo Baraldi et al.
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos
Tanveer Hannan, Mohaiminul Islam, Thomas Seidl et al.
Multi-Label Cluster Discrimination for Visual Representation Learning
Xiang An, Kaicheng Yang, Xiangzi Dai et al.
PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance
Aoming Liu, Zhong Li, Zhang Chen et al.
SAMFusion: Sensor-Adaptive Multimodal Fusion for 3D Object Detection in Adverse Weather
Edoardo Palladin, Roland Dietze, Praveen Narayanan et al.
Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation
Zhengyuan Xie, Haiquan Lu, Jia-wen Xiao et al.
OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection
Jinghua Hou, Tong Wang, Xiaoqing Ye et al.
Improving Feature Stability during Upsampling -- Spectral Artifacts and the Importance of Spatial Context
Shashank Agnihotri, Julia Grabinski, Margret Keuper
ChEX: Interactive Localization and Region Description in Chest X-rays
Philip Müller, Georgios Kaissis, Daniel Rueckert
∞-Brush: Controllable Large Image Synthesis with Diffusion Models in Infinite Dimensions
Minh Quan Le, Alexandros Graikos, Srikar Yellapragada et al.
CamoTeacher: Dual-Rotation Consistency Learning for Semi-Supervised Camouflaged Object Detection
Xunfa Lai, Zhiyu Yang, Jie Hu et al.
Closed-Loop Unsupervised Representation Disentanglement with β-VAE Distillation and Diffusion Probabilistic Feedback
Xin Jin, Bohan Li, Baao Xie et al.
UpFusion: Novel View Diffusion from Unposed Sparse View Observations
Bharath Raj Nagoor Kani, Hsin-Ying Lee, Sergey Tulyakov et al.
CrossGLG: LLM Guides One-shot Skeleton-based 3D Action Recognition in a Cross-level Manner
Tingbing Yan, Wenzheng Zeng, Yang Xiao et al.
PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation
Shilin Yan, Xiaohao Xu, Renrui Zhang et al.
DeTra: A Unified Model for Object Detection and Trajectory Forecasting
Sergio Casas, Ben T Agro, Jiageng Mao et al.
Multi-Sentence Grounding for Long-term Instructional Video
Zeqian Li, Qirui Chen, Tengda Han et al.
Adapting Fine-Grained Cross-View Localization to Areas without Fine Ground Truth
Zimin Xia, Yujiao Shi, Hongdong Li et al.
Can OOD Object Detectors Learn from Foundation Models?
Jiahui Liu, Xin Wen, Shizhen Zhao et al.
Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models
Jiaqi Xu, Mengyang Wu, Xiaowei Hu et al.
Kernel Diffusion: An Alternate Approach to Blind Deconvolution
Yash Sanghvi, Yiheng Chi, Stanley Chan
Modeling and Driving Human Body Soundfields through Acoustic Primitives
Chao Huang, Dejan Markovic, Chenliang Xu et al.
COHO: Context-Sensitive City-Scale Hierarchical Urban Layout Generation
Liu He, Daniel Aliaga
DoughNet: A Visual Predictive Model for Topological Manipulation of Deformable Objects
Dominik Bauer, Zhenjia Xu, Shuran Song
SINDER: Repairing the Singular Defects of DINOv2
Haoqi Wang, Tong Zhang, Mathieu Salzmann
Mixture of Efficient Diffusion Experts Through Automatic Interval and Sub-Network Selection
Alireza Ganjdanesh, Yan Kang, Yuchen Liu et al.
DiffFAS: Face Anti-Spoofing via Generative Diffusion Models
Xinxu Ge, Xin Liu, Zitong Yu et al.
MICDrop: Masking Image and Depth Features via Complementary Dropout for Domain-Adaptive Semantic Segmentation
Linyan Yang, Lukas Hoyer, Mark Weber et al.
BAFFLE: A Baseline of Backpropagation-Free Federated Learning
Haozhe Feng, Tianyu Pang, Chao Du et al.
Which Model Generated This Image? A Model-Agnostic Approach for Origin Attribution
Fengyuan Liu, Haochen Luo, Yiming Li et al.
Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation
Seonghoon Yu, Paul Hongsuck Seo, Jeany Son
Multi-modal Crowd Counting via a Broker Modality
Haoliang Meng, Xiaopeng Hong, Chenhao Wang et al.
CMTA: Cross-Modal Temporal Alignment for Event-guided Video Deblurring
Taewoo Kim, Hoonhee Cho, Kuk-Jin Yoon
RICA²: Rubric-Informed, Calibrated Assessment of Actions
Abrar Majeedi, Viswanatha Reddy Gajjala, Satya Sai Srinath Namburi GNVV et al.
Real Appearance Modeling for More General Deepfake Detection
Jiahe Tian, Yu Cai, Xi Wang et al.
Mitigating Background Shift in Class-Incremental Semantic Segmentation
Gilhan Park, WonJun Moon, SuBeen Lee et al.
Contrastive ground-level image and remote sensing pre-training improves representation learning for natural world imagery
Andy V Huynh, Lauren Gillespie, Jael Lopez-Saucedo et al.
Eliminating Warping Shakes for Unsupervised Online Video Stitching
Lang Nie, Chunyu Lin, Kang Liao et al.
Learning Video Context as Interleaved Multimodal Sequences
Qinghong Lin, Pengchuan Zhang, Difei Gao et al.
Explorative Inbetweening of Time and Space
Haiwen Feng, Zheng Ding, Zhihao Xia et al.
FutureDepth: Learning to Predict the Future Improves Video Depth Estimation
Rajeev Yasarla, Manish Kumar Singh, Hong Cai et al.
Temporally Consistent Stereo Matching
Jiaxi Zeng, Chengtang Yao, Yuwei Wu et al.
FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior
Zhekai Chen, Wen Wang, Zhen Yang et al.
DreamDiffusion: High-Quality EEG-to-Image Generation with Temporal Masked Signal Modeling and CLIP Alignment
Yunpeng Bai, Xintao Wang, Yanpei Cao et al.
Towards Reliable Evaluation and Fast Training of Robust Semantic Segmentation Models
Francesco Croce, Naman D. Singh, Matthias Hein
UCIP: A Universal Framework for Compressed Image Super-Resolution using Dynamic Prompt
Xin Li, Bingchen Li, Yeying Jin et al.
C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
Rongchang Li, Zhenhua Feng, Tianyang Xu et al.
Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models
Xiaoyu Zhu, Hao Zhou, Pengfei Xing et al.
ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders
Jefferson Hernandez, Ruben Villegas, Vicente Ordonez
CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts
Yichao Cai, Yuhang Liu, Zhen Zhang et al.
TimeCraft: Navigate Weakly-Supervised Temporal Grounded Video Question Answering via Bi-directional Reasoning
Huabin Liu, Xiao Ma, Cheng Zhong et al.
IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation
Yuanhao Zhai, Kevin Lin, Linjie Li et al.
NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model
Zhongqun Zhang, Hengfei Wang, Ziwei Yu et al.
Training-Free Model Merging for Multi-target Domain Adaptation
Wenyi Li, Huan-ang Gao, Mingju Gao et al.
Deblur e-NeRF: NeRF from Motion-Blurred Events under High-speed or Low-light Conditions
Weng Fei Low, Gim Hee Lee
TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models
Jeongho Kim, Min-Jung Kim, Junsoo Lee et al.
DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation
Haibo Yang, Yang Chen, Yingwei Pan et al.
Self-Supervised Any-Point Tracking by Contrastive Random Walks
Ayush Shrivastava, Andrew Owens
Few-Shot Image Generation by Conditional Relaxing Diffusion Inversion
Yu Cao, Shaogang Gong
DiffusionPen: Towards Controlling the Style of Handwritten Text Generation
Konstantina Nikolaidou, George Retsinas, Giorgos Sfikas et al.
CarFormer: Self-Driving with Learned Object-Centric Representations
Shadi Hamdan, Fatma Guney
Monocular Occupancy Prediction for Scalable Indoor Scenes
Hongxiao Yu, Yuqi Wang, Yuntao Chen et al.
Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph
Zhengcen Li, Xinle Chang, Yueran Li et al.
Unlocking Attributes' Contribution to Successful Camouflage: A Combined Textual and Visual Analysis Strategy
Hong Zhang, Yixuan Lyu, Qian Yu et al.
Diff-Reg: Diffusion Model in Doubly Stochastic Matrix Space for Registration Problem
Qianliang Wu, Haobo Jiang, Lei Luo et al.
OneVOS: Unifying Video Object Segmentation with All-in-One Transformer Framework
Wanyun Li, Pinxue Guo, Xinyu Zhou et al.
TF-FAS: Twofold-Element Fine-Grained Semantic Guidance for Generalizable Face Anti-Spoofing
Xudong Wang, Ke-Yue Zhang, Taiping Yao et al.
EDformer: Transformer-Based Event Denoising Across Varied Noise Levels
Bin Jiang, Bo Xiong, Bohan Qu et al.
FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models
Wei Wu, Qingnan Fan, Shuai Qin et al.
Rethinking Features-Fused-Pyramid-Neck for Object Detection
Hulin Li
Global Counterfactual Directions
Bartlomiej Sobieski, Przemyslaw Biecek
Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation
Shuangrui Ding, Rui Qian, Haohang Xu et al.
Unleashing the Potential of the Semantic Latent Space in Diffusion Models for Image Dehazing
Zizheng Yang, Hu Yu, Bing Li et al.
LabelDistill: Label-guided Cross-modal Knowledge Distillation for Camera-based 3D Object Detection
Sanmin Kim, Youngseok Kim, Sihwan Hwang et al.
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning
Haiwen Diao, Bo Wan, Xu Jia et al.
DGD: Dynamic 3D Gaussians Distillation
Isaac Labe, Noam Issachar, Itai Lang et al.
Class-Agnostic Object Counting with Text-to-Image Diffusion Model
Xiaofei Hui, Qian Wu, Hossein Rahmani et al.
Diffusion Models are Geometry Critics: Single Image 3D Editing Using Pre-Trained Diffusion Priors
Ruicheng Wang, Jianfeng Xiang, Jiaolong Yang et al.
Subspace Prototype Guidance for Mitigating Class Imbalance in Point Cloud Semantic Segmentation
Jiawei Han, Kaiqi Liu, Wei Li et al.
MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion
Lehong Wu, Lilang Lin, Jiahang Zhang et al.
Dataset Quantization with Active Learning based Adaptive Sampling
Zhenghao Zhao, Yuzhang Shang, Junyi Wu et al.
Fairness-aware Vision Transformer via Debiased Self-Attention
Yao Qiang, Chengyin Li, Prashant Khanduri et al.
INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding
Jiha Jang, Hoigi Seo, Se Young Chun
KFD-NeRF: Rethinking Dynamic NeRF with Kalman Filter
Yifan Zhan, Zhuoxiao Li, Muyao Niu et al.
MTA-CLIP: Language-Guided Semantic Segmentation with Mask-Text Alignment
Anurag Das, Xinting Hu, Li Jiang et al.
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
Ian Huang, Guandao Yang, Leonidas Guibas
DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation
Yi-Hao Peng, Faria Huq, Yue Jiang et al.
RoadPainter: Points Are Ideal Navigators for Topology transformER
Zhongxing Ma, Liang Shuang, Yongkun Wen et al.
How to Train the Teacher Model for Effective Knowledge Distillation
Shayan Mohajer Hamidi, Xizhen Deng, Renhao Tan et al.
Timestep-Aware Correction for Quantized Diffusion Models
Yuzhe Yao, Feng Tian, Jun Chen et al.
OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model
Runyi Li, Xuhan Sheng, Weiqi Li et al.
3x2: 3D Object Part Segmentation by 2D Semantic Correspondences
Anh Thai, Weiyao Wang, Hao Tang et al.
Placing Objects in Context via Inpainting for Out-of-distribution Segmentation
Pau de Jorge Aranda, Riccardo Volpi, Puneet Dokania et al.
Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation
Kihong Kim, Haneol Lee, Jihye Park et al.
The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization
Jiafeng Mao, Xueting Wang, Kiyoharu Aizawa
Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding
Ruihuang Li, Zhengqiang Zhang, Chenhang He et al.
RT-Pose: A 4D Radar-Tensor based 3D Human Pose Estimation and Localization Benchmark
Yuan-Hao Ho, Jen-Hao Cheng, Sheng Yao Kuan et al.
Sparse Beats Dense: Rethinking Supervision in Radar-Camera Depth Completion
Huadong Li, Minhao Jing, Jin Wang et al.
Benchmarking Spurious Bias in Few-Shot Image Classifiers
Guangtao Zheng, Wenqian Ye, Aidong Zhang
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Image Editing
Jing Gu, Nanxuan Zhao, Wei Xiong et al.
Inf-DiT: Upsampling any-resolution image with memory-efficient diffusion transformer
Zhuoyi Yang, Heyang Jiang, Wenyi Hong et al.
Real-time Holistic Robot Pose Estimation with Unknown States
Shikun Ban, Juling Fan, Xiaoxuan Ma et al.
SAVE: Protagonist Diversification with Structure Agnostic Video Editing
Yeji Song, Wonsik Shin, Junsoo Lee et al.
Efficient Few-Shot Action Recognition via Multi-Level Post-Reasoning
Cong Wu, Xiao-Jun Wu, Linze Li et al.
Volumetric Rendering with Baked Quadrature Fields
Gopal Sharma, Daniel Rebain, Kwang Moo Yi et al.
BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models
Ye-Bin Moon, Nam Hyeon-Woo, Wonseok Choi et al.
Domain Shifting: A Generalized Solution for Heterogeneous Cross-Modality Person Re-Identification
Yan Jiang, Xu Cheng, Hao Yu et al.
Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation
Pengfei Wang, Yuxi Wang, Shuai Li et al.
Part2Object: Hierarchical Unsupervised 3D Instance Segmentation
Cheng Shi, Yulin Zhang, Bin Yang et al.
DiscoMatch: Fast Discrete Optimisation for Geometrically Consistent 3D Shape Matching
Paul Roetzer, Ahmed Abbas, Dongliang Cao et al.
Self-Adapting Large Visual-Language Models to Edge Devices across Visual Modalities
Kaiwen Cai, ZheKai Duan, Gaowen Liu et al.
Idempotent Unsupervised Representation Learning for Skeleton-Based Action Recognition
Lilang Lin, Lehong Wu, Jiahang Zhang et al.
Motion and Structure from Event-based Normal Flow
Zhongyang Ren, Bangyan Liao, Delei Kong et al.
KDProR: A Knowledge-Decoupling Probabilistic Framework for Video-Text Retrieval
Xianwei Zhuang, Hongxiang Li, Xuxin Cheng et al.
VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving
Yibo Liu, Zheyuan Yang, Guile Wu et al.
Multiscale Sliced Wasserstein Distances as Perceptual Color Difference Measures
Jiaqi He, Zhihua Wang, Leon Wang et al.
Imaging Interiors: An Implicit Solution to Electromagnetic Inverse Scattering Problems
Ziyuan Luo, Boxin Shi, Haoliang Li et al.
Adversarially Robust Distillation by Reducing the Student-Teacher Variance Gap
Junhao Dong, Piotr Koniusz, Junxi Chen et al.
Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models
Minchan Kim, Minyeong Kim, Junik Bae et al.