Most Cited ECCV "clinical task adaptation" Papers
2,387 papers found • Page 4 of 12
Conference
X-Pose: Detecting Any Keypoints
Jie Yang, AILING ZENG, Ruimao Zhang et al.
LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer
Ning Yu, Chia-Chih Chen, Zeyuan Chen et al.
GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity
Shuo Cao, Yihao Liu, Wenlong Zhang et al.
UniCode : Learning a Unified Codebook for Multimodal Large Language Models
Sipeng Zheng, Bohan Zhou, Yicheng Feng et al.
Free-VSC: Free Semantics from Visual Foundation Models for Unsupervised Video Semantic Compression
Yuan Tian, Guo Lu, Guangtao Zhai
Reinforcement Learning Meets Visual Odometry
Nico Messikommer, Giovanni Cioffi, Mathias Gehrig et al.
PointNeRF++: A multi-scale, point-based Neural Radiance Field
Weiwei Sun, Eduard Trulls, Yang-Che Tseng et al.
Iterative Ensemble Training with Anti-Gradient Control for Mitigating Memorization in Diffusion Models
Xiao Liu, Xiaoliu Guan, Yu Wu et al.
RoDUS: Robust Decomposition of Static and Dynamic Elements in Urban Scenes
Thang-Anh-Quan Nguyen, Luis G Roldao Jimenez, Nathan Piasco et al.
Finding Visual Task Vectors
Alberto Hojel, Yutong Bai, Trevor Darrell et al.
NeuSDFusion: A Spatial-Aware Generative Model for 3D Shape Completion, Reconstruction, and Generation
Ruikai Cui, Weizhe Liu, Weixuan Sun et al.
Reinforcement Learning Friendly Vision-Language Model for Minecraft
Haobin Jiang, Junpeng Yue, Hao Luo et al.
Long-term Temporal Context Gathering for Neural Video Compression
Linfeng Qi, Zhaoyang Jia, Jiahao Li et al.
Image Demoireing in RAW and sRGB Domains
Shuning Xu, Binbin Song, Xiangyu Chen et al.
Temporal Event Stereo via Joint Learning with Stereoscopic Flow
Hoonhee Cho, Jae-young Kang, Kuk-Jin Yoon
ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention
Chenhang He, Ruihuang Li, Guowen Zhang et al.
Brain-ID: Learning Contrast-agnostic Anatomical Representations for Brain Imaging
Peirong Liu, Oula Puonti, Xiaoling Hu et al.
DreamMover: Leveraging the Prior of Diffusion Models for Image Interpolation with Large Motion
Liao Shen, Tianqi Liu, Huiqiang Sun et al.
UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding
Bowen Shi, Peisen Zhao, Zichen Wang et al.
Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation
Peng Jin, Hao Li, Zesen Cheng et al.
InstructGIE: Towards Generalizable Image Editing
Zichong Meng, Changdi Yang, Jun Liu et al.
Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL
Fangwei Zhong, Kui Wu, Hai Ci et al.
SAFNet: Selective Alignment Fusion Network for Efficient HDR Imaging
Lingtong Kong, Bo Li, Yike Xiong et al.
BKDSNN: Enhancing the Performance of Learning-based Spiking Neural Networks Training with Blurred Knowledge Distillation
Zekai Xu, Kang You, Qinghai Guo et al.
CardiacNet: Learning to Reconstruct Abnormalities for Cardiac Disease Assessment from Echocardiogram Videos
JIEWEN YANG, Yiqun Lin, Bin Pu et al.
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
Junyi Li, Junfeng Wu, Weizhi Zhao et al.
Adaptive Compressed Sensing with Diffusion-Based Posterior Sampling
Noam Elata, Tomer Michaeli, Michael Elad
Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image
Pengkun Jiao, Na Zhao, Jingjing Chen et al.
OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation
Kwanyoung Kim, Yujin Oh, Jong Chul Ye
Robust Multimodal Learning via Representation Decoupling
Shicai Wei, Yang Luo, Yuji Wang et al.
Grounding Language Models for Visual Entity Recognition
Zilin Xiao, Ming Gong, Paola Cascante-Bonilla et al.
InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser
Xing Cui, Zekun Li, Peipei Li et al.
3iGS: Factorised Tensorial Illumination for 3D Gaussian Splatting
Zhe Jun Tang, Tat-Jen Cham
MultiDelete for Multimodal Machine Unlearning
Jiali Cheng, Hadi Amiri
Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance
Reyhane Askari Hemmat, Melissa Hall, Alicia Yi Sun et al.
Self-Guided Generation of Minority Samples Using Diffusion Models
Soobin Um, Jong Chul Ye
Rethinking Data Augmentation for Robust LiDAR Semantic Segmentation in Adverse Weather
Junsung Park, Kyungmin Kim, Hyunjung Shim
3D Open-Vocabulary Panoptic Segmentation with 2D-3D Vision-Language Distillation
Zihao Xiao, Longlong Jing, Shangxuan Wu et al.
OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models
Zijian Zhou, Zheng Zhu, Holger Caesar et al.
ReALFRED: An Embodied Instruction Following Benchmark in Photo-Realistic Environments
Taewoong Kim, Cheolhong Min, Byeonghwi Kim et al.
IMMA: Immunizing text-to-image Models against Malicious Adaptation
Amber Yijia Zheng, Raymond Yeh
Kalman-Inspired Feature Propagation for Video Face Super-Resolution
Ruicheng Feng, Chongyi Li, Chen Change Loy
WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language
Zhenxiang Lin, Xidong Peng, peishan cong et al.
Boosting 3D Single Object Tracking with 2D Matching Distillation and 3D Pre-training
qiangqiang wu, Yan Xia, Jia Wan et al.
MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models
Nithin Gopalakrishnan Nair, Jeya Maria Jose Valanarasu, Vishal Patel
3DGazeNet: Generalizing Gaze Estimation with Weak Supervision from Synthetic Views
Evangelos Ververas, Polydefkis Gkagkos, Jiankang Deng et al.
EvSign: Sign Language Recognition and Translation with Streaming Events
Pengyu Zhang, Hao Yin, Zeren Wang et al.
Editable Image Elements for Controllable Synthesis
Jiteng Mu, Michael Gharbi, Richard Zhang et al.
Semi-supervised Segmentation of Histopathology Images with Noise-Aware Topological Consistency
Meilong Xu, Xiaoling Hu, Saumya Gupta et al.
Harnessing Text-to-Image Diffusion Models for Category-Agnostic Pose Estimation
Duo Peng, Zhengbo Zhang, Ping Hu et al.
Where am I? Scene Retrieval with Language
Jiaqi Chen, Daniel Barath, Iro Armeni et al.
ReSyncer: Rewiring Style-based Generator for Unified Audio-Visually Synced Facial Performer
Jiazhi Guan, Zhiliang Xu, Hang Zhou et al.
Every Pixel Has its Moments: Ultra-High-Resolution Unpaired Image-to-Image Translation via Dense Normalization
Ming-Yang Ho, Che-Ming Wu, Min-Sheng Wu et al.
Teddy: Efficient Large-Scale Dataset Distillation via Taylor-Approximated Matching
Ruonan Yu, Songhua Liu, Jingwen Ye et al.
Just a Hint: Point-Supervised Camouflaged Object Detection
Huafeng Chen, Dian SHAO, Guangqian Guo et al.
Learning Representations of Satellite Images From Metadata Supervision
Jules Bourcier, Gohar Dashyan, Karteek Alahari et al.
DiffuMatting: Synthesizing Arbitrary Objects with Matting-level Annotation
Xiaobin Hu, Xu Peng, Donghao Luo et al.
Representing Topological Self-Similarity Using Fractal Feature Maps for Accurate Segmentation of Tubular Structures
Jiaxing Huang, Yanfeng Zhou, Yaoru Luo et al.
MagicEraser: Erasing Any Objects via Semantics-Aware Control
FAN LI, Zixiao Zhang, Yi Huang et al.
ScanTalk: 3D Talking Heads from Unregistered Scans
Federico Nocentini, Thomas Besnier, Claudio Ferrari et al.
STAMP: Outlier-Aware Test-Time Adaptation with Stable Memory Replay
Yu Yongcan, Lijun Sheng, Ran He et al.
On the Utility of 3D Hand Poses for Action Recognition
Md Salman Shamil, Dibyadip Chatterjee, Fadime Sener et al.
Early Preparation Pays Off: New Classifier Pre-tuning for Class Incremental Semantic Segmentation
Zhengyuan Xie, Haiquan Lu, Jia-wen Xiao et al.
DreamDiffusion: High-Quality EEG-to-Image Generation with Temporal Masked Signal Modeling and CLIP Alignment
Yunpeng Bai, Xintao Wang, Yanpei Cao et al.
DiffFAS: Face Anti-Spoofing via Generative Diffusion Models
Xinxu Ge, Xin Liu, Zitong Yu et al.
OPEN: Object-wise Position Embedding for Multi-view 3D Object Detection
Jinghua Hou, Tong Wang, Xiaoqing Ye et al.
UpFusion: Novel View Diffusion from Unposed Sparse View Observations
Bharath Raj Nagoor Kani, Hsin-Ying Lee, Sergey Tulyakov et al.
Adapting Fine-Grained Cross-View Localization to Areas without Fine Ground Truth
Zimin Xia, Yujiao Shi, HONGDONG LI et al.
FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior
Zhekai Chen, Wen Wang, Zhen Yang et al.
Modeling and Driving Human Body Soundfields through Acoustic Primitives
Chao Huang, Dejan Markovic, Chenliang Xu et al.
DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition
Qi Wang, Zhou Xu, Yuming Lin et al.
C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition
Rongchang Li, Zhenhua Feng, Tianyang Xu et al.
Can OOD Object Detectors Learn from Foundation Models?
Jiahui Liu, Xin Wen, Shizhen Zhao et al.
Explorative Inbetweening of Time and Space
Haiwen Feng, Zheng Ding, Zhihao Xia et al.
BRIDGE: Bridging Gaps in Image Captioning Evaluation with Stronger Visual Cues
Sara Sarto, Marcella Cornia, Lorenzo Baraldi et al.
Mitigating Background Shift in Class-Incremental Semantic Segmentation
gilhan Park, WonJun Moon, SuBeen Lee et al.
∞-Brush: Controllable Large Image Synthesis with Diffusion Models in Infinite Dimensions
Minh Quan Le, Alexandros Graikos, Srikar Yellapragada et al.
Real Appearance Modeling for More General Deepfake Detection
Jiahe Tian, Yu Cai, Xi Wang et al.
Closed-Loop Unsupervised Representation Disentanglement with $\beta$-VAE Distillation and Diffusion Probabilistic Feedback
Xin Jin, Bohan Li, Baao Xie et al.
BAFFLE: A Baseline of Backpropagation-Free Federated Learning
Haozhe Feng, Tianyu Pang, Chao Du et al.
Contrastive ground-level image and remote sensing pre-training improves representation learning for natural world imagery
Andy V Huynh, Lauren Gillespie, Jael Lopez-Saucedo et al.
RICA^2: Rubric-Informed, Calibrated Assessment of Actions
Abrar Majeedi, Viswanatha Reddy Gajjala, Satya Sai Srinath Namburi GNVV et al.
Multi-Label Cluster Discrimination for Visual Representation Learning
Xiang An, Kaicheng Yang, Xiangzi Dai et al.
Learning Video Context as Interleaved Multimodal Sequences
Qinghong Lin, Pengchuan Zhang, Difei Gao et al.
Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation
Seonghoon Yu, Paul Hongsuck Seo, Jeany Son
MICDrop: Masking Image and Depth Features via Complementary Dropout for Domain-Adaptive Semantic Segmentation
Linyan Yang, Lukas Hoyer, Mark Weber et al.
SAMFusion: Sensor-Adaptive Multimodal Fusion for 3D Object Detection in Adverse Weather
Edoardo Palladin, Roland Dietze, Praveen Narayanan et al.
CMTA: Cross-Modal Temporal Alignment for Event-guided Video Deblurring
Taewoo Kim, Hoonhee Cho, Kuk-Jin Yoon
UCIP: A Universal Framework for Compressed Image Super-Resolution using Dynamic Prompt
Xin Li, Bingchen Li, Yeying Jin et al.
Temporally Consistent Stereo Matching
Jiaxi Zeng, Chengtang Yao, Yuwei Wu et al.
Eliminating Warping Shakes for Unsupervised Online Video Stitching
Lang Nie, Chunyu Lin, Kang Liao et al.
Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Large Models
Chen Ju, Haicheng Wang, Haozhe Cheng et al.
FutureDepth: Learning to Predict the Future Improves Video Depth Estimation
Rajeev Yasarla, Manish Kumar Singh, Hong Cai et al.
Kernel Diffusion: An Alternate Approach to Blind Deconvolution
Yash Sanghvi, Yiheng Chi, Stanley Chan
CamoTeacher: Dual-Rotation Consistency Learning for Semi-Supervised Camouflaged Object Detection
Xunfa Lai, Zhiyu Yang, Jie Hu et al.
Towards Reliable Evaluation and Fast Training of Robust Semantic Segmentation Models
Francesco Croce, Naman D. Singh, Matthias Hein
Which Model Generated This Image? A Model-Agnostic Approach for Origin Attribution
Fengyuan Liu, Haochen Luo, Yiming Li et al.
SINDER: Repairing the Singular Defects of DINOv2
Haoqi Wang, Tong Zhang, Mathieu Salzmann
Towards Real-World Adverse Weather Image Restoration: Enhancing Clearness and Semantics with Vision-Language Models
Jiaqi Xu, Mengyang Wu, Xiaowei Hu et al.
ChEX: Interactive Localization and Region Description in Chest X-rays
Philip Müller, Georgios Kaissis, Daniel Rueckert
PanoFree: Tuning-Free Holistic Multi-view Image Generation with Cross-view Self-Guidance
Aoming Liu, Zhong Li, Zhang Chen et al.
Multi-modal Crowd Counting via a Broker Modality
Haoliang Meng, Xiaopeng Hong, Chenhao Wang et al.
COHO: Context-Sensitive City-Scale Hierarchical Urban Layout Generation
Liu He, Daniel Aliaga
MoEAD: A Parameter-efficient Model for Multi-class Anomaly Detection
Shiyuan Meng, Wenchao Meng, Qihang Zhou et al.
DoughNet: A Visual Predictive Model for Topological Manipulation of Deformable Objects
Dominik Bauer, Zhenjia Xu, Shuran Song
Improving Feature Stability during Upsampling -- Spectral Artifacts and the Importance of Spatial Context
Shashank Agnihotri, Julia Grabinski, Margret Keuper
CrossGLG: LLM Guides One-shot Skeleton-based 3D Action Recognition in a Cross-level Manner
Tingbing Yan, Wenzheng Zeng, Yang Xiao et al.
PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation
Shilin Yan, Xiaohao Xu, Renrui Zhang et al.
Rate-Distortion-Cognition Controllable Versatile Neural Image Compression
Jinming Liu, Ruoyu Feng, Yunpeng Qi et al.
Multi-Sentence Grounding for Long-term Instructional Video
Zeqian Li, QIRUI CHEN, Tengda Han et al.
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos
Tanveer Hannan, Mohaiminul Islam, Thomas Seidl et al.
Mixture of Efficient Diffusion Experts Through Automatic Interval and Sub-Network Selection
Alireza Ganjdanesh, Yan Kang, Yuchen Liu et al.
DeTra: A Unified Model for Object Detection and Trajectory Forecasting
Sergio Casas, Ben T Agro, Jiageng Mao et al.
DiffusionPen: Towards Controlling the Style of Handwritten Text Generation
KONSTANTINA NIKOLAIDOU, George Retsinas, Giorgos Sfikas et al.
EDformer: Transformer-Based Event Denoising Across Varied Noise Levels
Bin Jiang, Bo Xiong, Bohan Qu et al.
Dataset Quantization with Active Learning based Adaptive Sampling
Zhenghao Zhao, Yuzhang Shang, Junyi Wu et al.
Benchmarking Spurious Bias in Few-Shot Image Classifiers
Guangtao Zheng, Wenqian Ye, Aidong Zhang
Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation
Kihong Kim, Haneol Lee, Jihye Park et al.
Class-Agnostic Object Counting with Text-to-Image Diffusion Model
Xiaofei Hui, Qian Wu, Hossein Rahmani et al.
Global Counterfactual Directions
Bartlomiej Sobieski, Przemyslaw Biecek
FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models
Wei WU, Qingnan Fan, Shuai Qin et al.
MTA-CLIP: Language-Guided Semantic Segmentation with Mask-Text Alignment
Anurag Das, Xinting Hu, Li Jiang et al.
TCAN: Animating Human Images with Temporally Consistent Pose Guidance using Diffusion Models
Jeongho Kim, Min-Jung Kim, Junsoo Lee et al.
Fairness-aware Vision Transformer via Debiased Self-Attention
Yao Qiang, Chengyin Li, Prashant Khanduri et al.
Unleashing the Potential of the Semantic Latent Space in Diffusion Models for Image Dehazing
Zizheng Yang, Hu Yu, Bing Li et al.
Diff-Reg: Diffusion Model in Doubly Stochastic Matrix Space for Registration Problem
Qianliang Wu, Haobo Jiang, Lei Luo et al.
SAVE: Protagonist Diversification with Structure Agnostic Video Editing
Yeji Song, Wonsik Shin, Junsoo Lee et al.
Monocular Occupancy Prediction for Scalable Indoor Scenes
Hongxiao Yu, Yuqi Wang, Yuntao Chen et al.
IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation
Yuanhao Zhai, Kevin Lin, Linjie Li et al.
The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization
Jiafeng Mao, Xueting Wang, Kiyoharu Aizawa
NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model
Zhongqun Zhang, Hengfei Wang, Ziwei Yu et al.
Skeleton-based Group Activity Recognition via Spatial-Temporal Panoramic Graph
Zhengcen Li, Xinle Chang, Yueran Li et al.
RoadPainter: Points Are Ideal Navigators for Topology transformER
Zhongxing Ma, Liang Shuang, Yongkun Wen et al.
KFD-NeRF: Rethinking Dynamic NeRF with Kalman Filter
Yifan Zhan, Zhuoxiao Li, Muyao Niu et al.
Dense Multimodal Alignment for Open-Vocabulary 3D Scene Understanding
Ruihuang Li, Zhengqiang ZHANG, Chenhang He et al.
3x2: 3D Object Part Segmentation by 2D Semantic Correspondences
Anh Thai, Weiyao Wang, Hao Tang et al.
TF-FAS: Twofold-Element Fine-Grained Semantic Guidance for Generalizable Face Anti-Spoofing
Xudong Wang, Ke-Yue Zhang, Taiping Yao et al.
Real-time Holistic Robot Pose Estimation with Unknown States
Shikun Ban, Juling Fan, Xiaoxuan Ma et al.
Sparse Beats Dense: Rethinking Supervision in Radar-Camera Depth Completion
Huadong Li, Minhao Jing, Jin Wang et al.
CarFormer: Self-Driving with Learned Object-Centric Representations
Shadi Hamdan, Fatma Guney
RT-Pose: A 4D Radar-Tensor based 3D Human Pose Estimation and Localization Benchmark
Yuan-Hao Ho, Jen-Hao Cheng, Sheng Yao Kuan et al.
DGD: Dynamic 3D Gaussians Distillation
Isaac Labe, Noam Issachar, Itai Lang et al.
Self-Supervised Any-Point Tracking by Contrastive Random Walks
Ayush Shrivastava, Andrew Owens
Inf-DiT: Upsampling any-resolution image with memory-efficient diffusion transformer.
Zhuoyi Yang, Heyang Jiang, Wenyi Hong et al.
Rethinking Features-Fused-Pyramid-Neck for Object Detection
Hulin Li
MacDiff: Unified Skeleton Modeling with Masked Conditional Diffusion
Lehong Wu, Lilang Lin, Jiahang Zhang et al.
OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model
Runyi Li, Xuhan SHENG, Weiqi Li et al.
CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts
Yichao Cai, Yuhang Liu, Zhen Zhang et al.
Efficient Few-Shot Action Recognition via Multi-Level Post-Reasoning
Cong Wu, Xiao-Jun Wu, Linze Li et al.
Few-Shot Image Generation by Conditional Relaxing Diffusion Inversion
Yu Cao, Shaogang Gong
DreamMesh: Jointly Manipulating and Texturing Triangle Meshes for Text-to-3D Generation
Haibo Yang, Yang Chen, Yingwei Pan et al.
SwapAnything: Enabling Arbitrary Object Swapping in Personalized Image Editing
Jing Gu, Nanxuan Zhao, Wei Xiong et al.
How to Train the Teacher Model for Effective Knowledge Distillation
Shayan Mohajer Hamidi, Xizhen Deng, Renhao Tan et al.
Unlocking Attributes' Contribution to Successful Camouflage: A Combined Textual and Visual Analysis Strategy
Hong Zhang, Yixuan Lyu, Qian Yu et al.
OneVOS: Unifying Video Object Segmentation with All-in-One Transformer Framework
Wanyun Li, Pinxue Guo, Xinyu Zhou et al.
ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders
Jefferson Hernandez, Ruben Villegas, Vicente Ordonez
Diffusion Models are Geometry Critics: Single Image 3D Editing Using Pre-Trained Diffusion Priors
Ruicheng Wang, Jianfeng Xiang, Jiaolong Yang et al.
TimeCraft: Navigate Weakly-Supervised Temporal Grounded Video Question Answering via Bi-directional Reasoning
Huabin Liu, Xiao Ma, Cheng Zhong et al.
Training-Free Model Merging for Multi-target Domain Adaptation
Wenyi Li, Huan-ang Gao, Mingju Gao et al.
Timestep-Aware Correction for Quantized Diffusion Models
Yuzhe YAO, Feng Tian, Jun Chen et al.
LabelDistill: Label-guided Cross-modal Knowledge Distillation for Camera-based 3D Object Detection
Sanmin Kim, Youngseok Kim, Sihwan Hwang et al.
Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models
Xiaoyu Zhu, Hao Zhou, Pengfei Xing et al.
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning
Haiwen Diao, Bo Wan, XU JIA et al.
Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation
Shuangrui Ding, Rui Qian, Haohang Xu et al.
Multi-Person Pose Forecasting with Individual Interaction Perceptron and Prior Learning
Peng Xiao, Yi Xie, Xuemiao Xu et al.
Ponymation: Learning Articulated 3D Animal Motions from Unlabeled Online Videos
Keqiang Sun, Dori Litvak, Yunzhi Zhang et al.
Free Lunch for Gait Recognition: A Novel Relation Descriptor
Jilong Wang, Saihui Hou, Yan Huang et al.
Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos
Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang et al.
Real-time 3D-aware Portrait Editing from a Single Image
Qingyan Bai, Zifan Shi, Yinghao Xu et al.
VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space
Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda et al.
Enhancing Recipe Retrieval with Foundation Models: A Data Augmentation Perspective
Fangzhou Song, Bin Zhu, Yanbin Hao et al.
NOVUM: Neural Object Volumes for Robust Object Classification
Artur Jesslen, Guofeng Zhang, Angtian Wang et al.
CoSIGN: Few-Step Guidance of ConSIstency Model to Solve General INverse Problems
Jiankun Zhao, Bowen Song, Liyue Shen
MAGR: Manifold-Aligned Graph Regularization for Continual Action Quality Assessment
Kanglei Zhou, Liyuan Wang, Xingxing Zhang et al.
Leveraging temporal contextualization for video action recognition
Minji Kim, Dongyoon Han, Taekyung Kim et al.
VQA-Diff: Exploiting VQA and Diffusion for Zero-Shot Image-to-3D Vehicle Asset Generation in Autonomous Driving
Yibo Liu, Zheyuan Yang, Guile Wu et al.
Towards More Practical Group Activity Detection: A New Benchmark and Model
Dongkeun Kim, Youngkil Song, Minsu Cho et al.
Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
Jinrui Zhang, Teng Wang, Haigang Zhang et al.
Distributionally Robust Loss for Long-Tailed Multi-Label Image Classification
Dekun Lin, Zhe Cui, Rui Chen et al.
Uncertainty-aware sign language video retrieval with probability distribution modeling
Xuan Wu, Hongxiang Li, yuanjiang luo et al.
ViPer: Visual Personalization of Generative Models via Individual Preference Learning
Sogand Salehi, Mahdi Shafiei, Roman Bachmann et al.
Deblur e-NeRF: NeRF from Motion-Blurred Events under High-speed or Low-light Conditions
Weng Fei Low, Gim Hee Lee
Few-shot NeRF by Adaptive Rendering Loss Regularization
Qingshan Xu, Xuanyu Yi, Jianyao Xu et al.
Volumetric Rendering with Baked Quadrature Fields
Gopal Sharma, Daniel Rebain, Kwang Moo Yi et al.
Open Vocabulary 3D Scene Understanding via Geometry Guided Self-Distillation
Pengfei Wang, Yuxi Wang, Shuai Li et al.
Domain Shifting: A Generalized Solution for Heterogeneous Cross-Modality Person Re-Identification
Yan Jiang, Xu Cheng, Hao Yu et al.
Leveraging Thermal Modality to Enhance Reconstruction in Low-Light Conditions
Jiacong Xu, Mingqian Liao, Ram Prabhakar Kathirvel et al.
Unrolled Decomposed Unpaired Learning for Controllable Low-Light Video Enhancement
Lingyu Zhu, Wenhan Yang, Baoliang Chen et al.
Surf-D: Generating High-Quality Surfaces of Arbitrary Topologies Using Diffusion Models
Zhengming Yu, Zhiyang Dou, Xiaoxiao Long et al.
DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation
Yi-Hao Peng, Faria Huq, Yue Jiang et al.
Adversarially Robust Distillation by Reducing the Student-Teacher Variance Gap
Junhao Dong, Piotr Koniusz, Junxi Chen et al.
Learn from the Learnt: Source-Free Active Domain Adaptation via Contrastive Sampling and Visual Persistence
Mengyao Lyu, Tianxiang Hao, Xinhao Xu et al.
Training-free Composite Scene Generation for Layout-to-Image Synthesis
Jiaqi Liu, Tao Huang, Chang Xu
EgoPet: Egomotion and Interaction Data from an Animal's Perspective
Amir Bar, Arya Bakhtiar, Danny L Tran et al.
Graph Neural Network Causal Explanation via Neural Causal Models
Arman Behnam, Binghui Wang
DiscoMatch: Fast Discrete Optimisation for Geometrically Consistent 3D Shape Matching
Paul Roetzer, Ahmed Abbas, Dongliang Cao et al.
Motion and Structure from Event-based Normal Flow
Zhongyang Ren, Bangyan Liao, Delei Kong et al.
Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation
Yingshan Chang, Yasi Zhang, Zhiyuan Fang et al.
Generating Physically Realistic and Directable Human Motions from Multi-Modal Inputs
Aayam Shrestha, Pan Liu, German Ros et al.
PMT: Progressive Mean Teacher via Exploring Temporal Consistency for Semi-Supervised Medical Image Segmentation
Ning Gao, Sanping Zhou, Le Wang et al.