Most Cited CVPR "neural source separation" Papers
5,589 papers found • Page 23 of 28
Conference
UV-IDM: Identity-Conditioned Latent Diffusion Model for Face UV-Texture Generation
Hong Li, Yutang Feng, Song Xue et al.
PBWR: Parametric-Building-Wireframe Reconstruction from Aerial LiDAR Point Clouds
Shangfeng Huang, Ruisheng Wang, Bo Guo et al.
GenH2R: Learning Generalizable Human-to-Robot Handover via Scalable Simulation Demonstration and Imitation
Zifan Wang, Junyu Chen, Ziqing Chen et al.
UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image
Xingyu Liu, Gu Wang, Ruida Zhang et al.
Content-Style Decoupling for Unsupervised Makeup Transfer without Generating Pseudo Ground Truth
Zhaoyang Sun, Shengwu Xiong, Yaxiong Chen et al.
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Yanwu Xu, Yang Zhao, Zhisheng Xiao et al.
AuraFusion360: Augmented Unseen Region Alignment for Reference-based 360° Unbounded Scene Inpainting
Chung-Ho Wu, Yang-Jung Chen, Ying-Huan Chen et al.
SyncVP: Joint Diffusion for Synchronous Multi-Modal Video Prediction
Enrico Pallotta, Sina Mokhtarzadeh Azar, Shuai Li et al.
SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation
Keqi Chen, vinkle srivastav, Nicolas Padoy
Context-Aware Integration of Language and Visual References for Natural Language Tracking
Yanyan Shao, Shuting He, Qi Ye et al.
Open-Vocabulary Segmentation with Semantic-Assisted Calibration
Yong Liu, Sule Bai, Guanbin Li et al.
V^2Dial: Unification of Video and Visual Dialog via Multimodal Experts
Adnen Abdessaied, Anna Rohrbach, Marcus Rohrbach et al.
Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts
Cansu Korkmaz, Ahmet Murat Tekalp, Zafer Dogan
RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception
Ruiyang Hao, Siqi Fan, Yingru Dai et al.
Task-Customized Mixture of Adapters for General Image Fusion
Pengfei Zhu, Yang Sun, Bing Cao et al.
PointBeV: A Sparse Approach for BeV Predictions
Loick Chambon, Éloi Zablocki, Mickaël Chen et al.
Learning Spatial Adaptation and Temporal Coherence in Diffusion Models for Video Super-Resolution
Zhikai Chen, Fuchen Long, Zhaofan Qiu et al.
Ensemble Diversity Facilitates Adversarial Transferability
Bowen Tang, Zheng Wang, Yi Bin et al.
CFAT: Unleashing Triangular Windows for Image Super-resolution
Abhisek Ray, Gaurav Kumar, Maheshkumar Kolekar
Convolutional Prompting meets Language Models for Continual Learning
Anurag Roy, Riddhiman Moulick, Vinay Verma et al.
Mitigating Noisy Correspondence by Geometrical Structure Consistency Learning
Zihua Zhao, Mengxi Chen, Tianjie Dai et al.
Seeing the Abstract: Translating the Abstract Language for Vision Language Models
Davide Talon, Federico Girella, Ziyue Liu et al.
Contextual Augmented Global Contrast for Multimodal Intent Recognition
Kaili Sun, Zhiwen Xie, Mang Ye et al.
Intraoperative 2D/3D Image Registration via Differentiable X-ray Rendering
Vivek Gopalakrishnan, Neel Dey, Polina Golland
Mind the Gap: Detecting Black-box Adversarial Attacks in the Making through Query Update Analysis
Jeonghwan Park, Niall McLaughlin, Ihsen Alouani
GauCho: Gaussian Distributions with Cholesky Decomposition for Oriented Object Detection
Jeffri Erwin Murrugarra Llerena, José Henrique Marques, Claudio Jung
Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration
JUNSEONG KIM, GeonU Kim, Kim Yu-Ji et al.
EgoPressure: A Dataset for Hand Pressure and Pose Estimation in Egocentric Vision
Yiming Zhao, Taein Kwon, Paul Streli et al.
Dyn-HaMR: Recovering 4D Interacting Hand Motion from a Dynamic Camera
Zhengdi Yu, Stefanos Zafeiriou, Tolga Birdal
Relaxed Contrastive Learning for Federated Learning
Seonguk Seo, Jinkyu Kim, Geeho Kim et al.
EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
Jiaxuan Li, Duc Minh Vo, Akihiro Sugimoto et al.
LLM4SGG: Large Language Models for Weakly Supervised Scene Graph Generation
Kibum Kim, Kanghoon Yoon, Jaehyeong Jeon et al.
Geometrically-driven Aggregation for Zero-shot 3D Point Cloud Understanding
Guofeng Mei, Luigi Riz, Yiming Wang et al.
ShapeWords: Guiding Text-to-Image Synthesis with 3D Shape-Aware Prompts
Dmitrii M Petrov, Pradyumn Goyal, Divyansh Shivashok et al.
ProAPO: Progressively Automatic Prompt Optimization for Visual Classification
Xiangyan Qu, Gaopeng Gou, Jiamin Zhuang et al.
Beyond Textual Constraints: Learning Novel Diffusion Conditions with Fewer Examples
Yuyang Yu, Bangzhen Liu, Chenxi Zheng et al.
Task-aligned Part-aware Panoptic Segmentation through Joint Object-Part Representations
Daan de Geus, Gijs Dubbelman
FreeKD: Knowledge Distillation via Semantic Frequency Prompt
Yuan Zhang, Tao Huang, Jiaming Liu et al.
Enhanced Motion-Text Alignment for Image-to-Video Transfer Learning
Wei Zhang, Chaoqun Wan, Tongliang Liu et al.
Temporal Score Analysis for Understanding and Correcting Diffusion Artifacts
Yu Cao, Zengqun Zhao, Ioannis Patras et al.
Targeted Representation Alignment for Open-World Semi-Supervised Learning
Ruixuan Xiao, Lei Feng, Kai Tang et al.
Nonisotropic Gaussian Diffusion for Realistic 3D Human Motion Prediction
Cecilia Curreli, Dominik Muhle, Abhishek Saroha et al.
SNIDA: Unlocking Few-Shot Object Detection with Non-linear Semantic Decoupling Augmentation
Yanjie Wang, Xu Zou, Luxin Yan et al.
3D Gaussian Inpainting with Depth-Guided Cross-View Consistency
Sheng-Yu Huang, Zi-Ting Chou, Yu-Chiang Frank Wang
Free-viewpoint Human Animation with Pose-correlated Reference Selection
Fa-Ting Hong, Zhan Xu, Haiyang Liu et al.
Adaptive Multi-Modal Cross-Entropy Loss for Stereo Matching
Peng Xu, Zhiyu Xiang, Chengyu Qiao et al.
OSDFace: One-Step Diffusion Model for Face Restoration
Jingkai Wang, Jue Gong, Lin Zhang et al.
Driving-Video Dehazing with Non-Aligned Regularization for Safety Assistance
Junkai Fan, Jiangwei Weng, Kun Wang et al.
Exploring Region-Word Alignment in Built-in Detector for Open-Vocabulary Object Detection
Heng Zhang, Qiuyu Zhao, Linyu Zheng et al.
L0-Sampler: An L0 Model Guided Volume Sampling for NeRF
Liangchen Li, Juyong Zhang
Towards Lossless Implicit Neural Representation via Bit Plane Decomposition
Woo Kyoung Han, Byeonghun Lee, Hyunmin Cho et al.
DPFlow: Adaptive Optical Flow Estimation with a Dual-Pyramid Framework
Henrique Morimitsu, Xiaobin Zhu, Roberto M. Cesar Jr et al.
Diffusion 3D Features (Diff3F): Decorating Untextured Shapes with Distilled Semantic Features
Niladri Shekhar Dutt, Sanjeev Muralikrishnan, Niloy J. Mitra
Unsupervised Occupancy Learning from Sparse Point Cloud
Amine Ouasfi, Adnane Boukhayma
GLOW: Global Layout Aware Attacks on Object Detection
Jun Bao, Buyu Liu, Kui Ren et al.
Context-based and Diversity-driven Specificity in Compositional Zero-Shot Learning
Yun Li, Zhe Liu, Hang Chen et al.
Vision-Language Model IP Protection via Prompt-based Learning
Lianyu Wang, Meng Wang, Huazhu Fu et al.
Neural Underwater Scene Representation
Yunkai Tang, Chengxuan Zhu, Renjie Wan et al.
Scaled Decoupled Distillation
Shicai Wei, Chunbo Luo, Yang Luo
VISTA-LLAMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens
Fan Ma, Xiaojie Jin, Heng Wang et al.
S^3-Face: SSS-Compliant Facial Reflectance Estimation via Diffusion Priors
Xingyu Ren, Jiankang Deng, Yuhao Cheng et al.
Hierarchical Intra-modal Correlation Learning for Label-free 3D Semantic Segmentation
Xin Kang, Lei Chu, Jiahao Li et al.
ShowMak3r: Compositional TV Show Reconstruction
Sangmin Kim, Seunguk Do, Jaesik Park
Disco4D: Disentangled 4D Human Generation and Animation from a Single Image
Hui En Pang, Shuai Liu, Zhongang Cai et al.
3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes
Jan Held, Renaud Vandeghen, Abdullah J Hamdi et al.
PARA-Drive: Parallelized Architecture for Real-time Autonomous Driving
Xinshuo Weng, Boris Ivanovic, Yan Wang et al.
Towards Generalizable Tumor Synthesis
Qi Chen, Xiaoxi Chen, Haorui Song et al.
Scaling Mesh Generation via Compressive Tokenization
Haohan Weng, Zibo Zhao, Biwen Lei et al.
Adaptive Hyper-graph Aggregation for Modality-Agnostic Federated Learning
Fan Qi, Shuai Li
Bi-SSC: Geometric-Semantic Bidirectional Fusion for Camera-based 3D Semantic Scene Completion
Yujie Xue, Ruihui Li, F anWu et al.
Efficient and Effective Weakly-Supervised Action Segmentation via Action-Transition-Aware Boundary Alignment
Angchi Xu, Wei-Shi Zheng
Depth-Aware Concealed Crop Detection in Dense Agricultural Scenes
Liqiong Wang, Jinyu Yang, Yanfu Zhang et al.
FC-GNN: Recovering Reliable and Accurate Correspondences from Interferences
Haobo Xu, Jun Zhou, Hua Yang et al.
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
Yuhao Dong, Zuyan Liu, Hai-Long Sun et al.
MoMask: Generative Masked Modeling of 3D Human Motions
chuan guo, Yuxuan Mu, Muhammad Gohar Javed et al.
CapsFusion: Rethinking Image-Text Data at Scale
Qiying Yu, Quan Sun, Xiaosong Zhang et al.
UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio Video Point Cloud Time-Series and Image Recognition
Xiaohan Ding, Yiyuan Zhang, Yixiao Ge et al.
A General and Efficient Training for Transformer via Token Expansion
Wenxuan Huang, Yunhang Shen, Jiao Xie et al.
BigGait: Learning Gait Representation You Want by Large Vision Models
Dingqiang Ye, Chao Fan, Jingzhe Ma et al.
Event-based Visible and Infrared Fusion via Multi-task Collaboration
Mengyue Geng, Lin Zhu, Lizhi Wang et al.
Breathing Life Into Sketches Using Text-to-Video Priors
Rinon Gal, Yael Vinker, Yuval Alaluf et al.
Gaussian Shell Maps for Efficient 3D Human Generation
Rameen Abdal, Wang Yifan, Zifan Shi et al.
World-consistent Video Diffusion with Explicit 3D Modeling
Qihang Zhang, Shuangfei Zhai, Miguel Ángel Bautista et al.
StageDesigner: Artistic Stage Generation for Scenography via Theater Scripts
Zhaoxing Gan, Mengtian Li, Ruhua Chen et al.
Byzantine-robust Decentralized Federated Learning via Dual-domain Clustering and Trust Bootstrapping
Peng Sun, Xinyang Liu, Zhibo Wang et al.
MotionEditor: Editing Video Motion via Content-Aware Diffusion
Shuyuan Tu, Qi Dai, Zhi-Qi Cheng et al.
State Space Models for Event Cameras
Nikola Zubic, Mathias Gehrig, Davide Scaramuzza
DiffInDScene: Diffusion-based High-Quality 3D Indoor Scene Generation
Xiaoliang Ju, Zhaoyang Huang, Yijin Li et al.
Towards Calibrated Multi-label Deep Neural Networks
Jiacheng Cheng, Nuno Vasconcelos
TIM: A Time Interval Machine for Audio-Visual Action Recognition
Jacob Chalk, Jaesung Huh, Evangelos Kazakos et al.
Test-Time Linear Out-of-Distribution Detection
Ke Fan, Tong Liu, Xingyu Qiu et al.
Blood Flow Speed Estimation with Optical Coherence Tomography Angiography Images
Wensheng Cheng, Zhenghong Li, Jiaxiang Ren et al.
Exploiting Style Latent Flows for Generalizing Deepfake Video Detection
Jongwook Choi, Taehoon Kim, Yonghyun Jeong et al.
Associative Transformer
Yuwei Sun, Hideya Ochiai, Zhirong Wu et al.
LeGO: Leveraging a Surface Deformation Network for Animatable Stylized Face Generation with One Example
Soyeon Yoon, Kwan Yun, Kwanggyoon Seo et al.
Identity-Preserving Text-to-Video Generation by Frequency Decomposition
Shenghai Yuan, Jinfa Huang, Xianyi He et al.
RestorGS: Depth-aware Gaussian Splatting for Efficient 3D Scene Restoration
Yuanjian Qiao, Mingwen Shao, Lingzhuang Meng et al.
Leveraging Predicate and Triplet Learning for Scene Graph Generation
Jiankai Li, Yunhong Wang, Xiefan Guo et al.
Unsupervised Semantic Segmentation Through Depth-Guided Feature Correlation and Sampling
Leon Sick, Dominik Engel, Pedro Hermosilla et al.
High Dynamic Range Video Compression: A Large-Scale Benchmark Dataset and A Learned Bit-depth Scalable Compression Algorithm
Zhaoyi Tian, Feifeng Wang, Shiwei Wang et al.
HHMR: Holistic Hand Mesh Recovery by Enhancing the Multimodal Controllability of Graph Diffusion Models
Mengcheng Li, Hongwen Zhang, Yuxiang Zhang et al.
Effective Cloud Removal for Remote Sensing Images by an Improved Mean-Reverting Denoising Model with Elucidated Design Space
Yi Liu, Wengen Li, Jihong Guan et al.
APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers
Zhuguanyu Wu, Jiayi Zhang, Jiaxin Chen et al.
Enhancing Visual Continual Learning with Language-Guided Supervision
Bolin Ni, Hongbo Zhao, Chenghao Zhang et al.
PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI
Yandan Yang, Baoxiong Jia, Peiyuan Zhi et al.
MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection
Hou-I Liu, Christine Wu, Jen-Hao Cheng et al.
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts
Peijie Wang, Zhong-Zhi Li, Fei Yin et al.
Co-op: Correspondence-based Novel Object Pose Estimation
Sungphill Moon, Hyeontae Son, Dongcheol Hur et al.
What’s in the Image? A Deep-Dive into the Vision of Vision Language Models
Omri Kaduri, Shai Bagon, Tali Dekel
Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer
Yuang Ai, Xiaoqiang Zhou, Huaibo Huang et al.
Generalizing 6-DoF Grasp Detection via Domain Prior Knowledge
Haoxiang Ma, Modi Shi, Boyang GAO et al.
Text Embedding is Not All You Need: Attention Control for Text-to-Image Semantic Alignment with Text Self-Attention Maps
Jeeyung Kim, Erfan Esmaeili Fakhabi, Qiang Qiu
Making Vision Transformers Truly Shift-Equivariant
Renan A. Rojas-Gomez, Teck-Yian Lim, Minh Do et al.
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
Yujie Wei, Shiwei Zhang, Zhiwu Qing et al.
SimLTD: Simple Supervised and Semi-Supervised Long-Tailed Object Detection
Phi Vu Tran
RankED: Addressing Imbalance and Uncertainty in Edge Detection Using Ranking-based Losses
bedrettin cetinkaya, Sinan Kalkan, Emre Akbas
Fine-Grained Bipartite Concept Factorization for Clustering
Chong Peng, Pengfei Zhang, Yongyong Chen et al.
Generalized Event Cameras
Varun Sundar, Matthew Dutson, Andrei Ardelean et al.
Multimodal Prompt Perceiver: Empower Adaptiveness Generalizability and Fidelity for All-in-One Image Restoration
Yuang Ai, Huaibo Huang, Xiaoqiang Zhou et al.
BEVSpread: Spread Voxel Pooling for Bird’s-Eye-View Representation in Vision-based Roadside 3D Object Detection
Wenjie Wang, Yehao Lu, Guangcong Zheng et al.
Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval
Rohan Sarkar, Avinash Kak
Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer
Zhen Zhao, Jingqun Tang, Chunhui Lin et al.
NIVeL: Neural Implicit Vector Layers for Text-to-Vector Generation
Vikas Thamizharasan, Difan Liu, Matthew Fisher et al.
Hyperbolic Anomaly Detection
Huimin Li, Zhentao Chen, Yunhao Xu et al.
Selective Nonlinearities Removal from Digital Signals
Krzysztof Maliszewski, Magdalena Urbanska, Varvara Vetrova et al.
Backdoor Defense via Test-Time Detecting and Repairing
Jiyang Guan, Jian Liang, Ran He
Towards a Perceptual Evaluation Framework for Lighting Estimation
Justine Giroux, Mohammad Reza Karimi Dastjerdi, Yannick Hold-Geoffroy et al.
DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance
Zixuan Wang, Jia Jia, Shikun Sun et al.
Consistency Posterior Sampling for Diverse Image Synthesis
Vishal Purohit, Matthew Repasky, Jianfeng Lu et al.
HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative
CONG MA, Qiao Lei, Chengkai Zhu et al.
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Ho Kei Cheng, Masato Ishii, Akio Hayakawa et al.
What Sketch Explainability Really Means for Downstream Tasks?
Hmrishav Bandyopadhyay, Pinaki Nath Chowdhury, Ayan Kumar Bhunia et al.
Leveraging Frame Affinity for sRGB-to-RAW Video De-rendering
Chen Zhang, Wencheng Han, Yang Zhou et al.
Learning CNN on ViT: A Hybrid Model to Explicitly Class-specific Boundaries for Domain Adaptation
Ba Hung Ngo, Nhat-Tuong Do-Tran, Tuan-Ngoc Nguyen et al.
GoMVS: Geometrically Consistent Cost Aggregation for Multi-View Stereo
Jiang Wu, Rui Li, Haofei Xu et al.
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Byung-Kwan Lee, Ryo Hachiuma, Yu-Chiang Frank Wang et al.
From Correspondences to Pose: Non-minimal Certifiably Optimal Relative Pose without Disambiguation
Javier Tirado-Garín, Javier Civera
CommonCanvas: Open Diffusion Models Trained on Creative-Commons Images
Aaron Gokaslan, A. Feder Cooper, Jasmine Collins et al.
Choose What You Need: Disentangled Representation Learning for Scene Text Recognition Removal and Editing
Boqiang Zhang, Hongtao Xie, Zuan Gao et al.
Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding
Wenxuan Guo, Xiuwei Xu, Ziwei Wang et al.
Memory-based Adapters for Online 3D Scene Perception
Xiuwei Xu, Chong Xia, Ziwei Wang et al.
CacheQuant: Comprehensively Accelerated Diffusion Models
Xuewen Liu, Zhikai Li, Qingyi Gu
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
Jihoon Kim, Jeongsoo Choi, Jaehun Kim et al.
Cross-spectral Gated-RGB Stereo Depth Estimation
Samuel Brucker, Stefanie Walz, Mario Bijelic et al.
Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos
Kumaranage Ravindu Nagasinghe, Honglu Zhou, Malitha Gunawardhana et al.
Token Cropr: Faster ViTs for Quite a Few Tasks
Benjamin Bergner, Christoph Lippert, Aravindh Mahendran
A Unified Latent Schrödinger Bridge Diffusion Model for Unsupervised Anomaly Detection and Localization
Shilhora Akshay, Niveditha Lakshmi Narasimhan, Jacob George et al.
EASE-DETR: Easing the Competition among Object Queries
Yulu Gao, Yifan Sun, Xudong Ding et al.
GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding
Zi-Ting Chou, Sheng-Yu Huang, I-Jieh Liu et al.
CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model
Jianhao Zeng, Dan Song, Weizhi Nie et al.
Readout Guidance: Learning Control from Diffusion Features
Grace Luo, Trevor Darrell, Oliver Wang et al.
Continuous Locomotive Crowd Behavior Generation
Inhwan Bae, Junoh Lee, Hae-Gon Jeon
Action Detection via an Image Diffusion Process
Lin Geng Foo, Tianjiao Li, Hossein Rahmani et al.
Transcriptomics-guided Slide Representation Learning in Computational Pathology
Guillaume Jaume, Lukas Oldenburg, Anurag Vaidya et al.
SurroundSDF: Implicit 3D Scene Understanding Based on Signed Distance Field
Lizhe Liu, Bohua Wang, Hongwei Xie et al.
MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning
Zhe Li, Laurence Yang, Bocheng Ren et al.
Active Open-Vocabulary Recognition: Let Intelligent Moving Mitigate CLIP Limitations
Lei Fan, Jianxiong Zhou, Xiaoying Xing et al.
Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model
Yuting Zhang, Hao Lu, Qingyong Hu et al.
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training
Jierun Chen, Dongting Hu, Xijie Huang et al.
DyBluRF: Dynamic Neural Radiance Fields from Blurry Monocular Video
Huiqiang Sun, Xingyi Li, Liao Shen et al.
VoCo-LLaMA: Towards Vision Compression with Large Language Models
Xubing Ye, Yukang Gan, Xiaoke Huang et al.
Unraveling Normal Anatomy via Fluid-Driven Anomaly Randomization
Peirong Liu, Ana Lawry Aguila, Juan Iglesias
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion
Zhaoxi Chen, Jiaxiang Tang, Yuhao Dong et al.
SAOR: Single-View Articulated Object Reconstruction
Mehmet Aygun, Oisin Mac Aodha
GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos
Tomas Soucek, Dima Damen, Michael Wray et al.
Self-supervised ControlNet with Spatio-Temporal Mamba for Real-world Video Super-resolution
Shijun Shi, Jing Xu, Lijing Lu et al.
Density-Adaptive Model Based on Motif Matrix for Multi-Agent Trajectory Prediction
Di Wen, Haoran Xu, Zhaocheng He et al.
Towards Accurate Post-training Quantization for Diffusion Models
Changyuan Wang, Ziwei Wang, Xiuwei Xu et al.
MoST: Multi-Modality Scene Tokenization for Motion Prediction
Norman Mu, Jingwei Ji, Zhenpei Yang et al.
Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions
Stefan Andreas Baumann, Felix Krause, Michael Neumayr et al.
Gear-NeRF: Free-Viewpoint Rendering and Tracking with Motion-aware Spatio-Temporal Sampling
Xinhang Liu, Yu-Wing Tai, Chi-Keung Tang et al.
SpecTRe-GS: Modeling Highly Specular Surfaces with Reflected Nearby Objects by Tracing Rays in 3D Gaussian Splatting
Jiajun Tang, Fan Fei, Zhihao Li et al.
MultiDiff: Consistent Novel View Synthesis from a Single Image
Norman Müller, Katja Schwarz, Barbara Roessle et al.
ChatGarment: Garment Estimation, Generation and Editing via Large Language Models
Siyuan Bian, Chenghao Xu, Yuliang Xiu et al.
Multi-Scale Video Anomaly Detection by Multi-Grained Spatio-Temporal Representation Learning
Menghao Zhang, Jingyu Wang, Qi Qi et al.
Uncertainty-aware Action Decoupling Transformer for Action Anticipation
Hongji Guo, Nakul Agarwal, Shao-Yuan Lo et al.
VidTwin: Video VAE with Decoupled Structure and Dynamics
Yuchi Wang, Junliang Guo, Xinyi Xie et al.
PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection
Kuan-Chih Huang, Weijie Lyu, Ming-Hsuan Yang et al.
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Haoxin Chen, Yong Zhang, Xiaodong Cun et al.
TextNeRF: A Novel Scene-Text Image Synthesis Method based on Neural Radiance Fields
Jialei Cui, Jianwei Du, Wenzhuo Liu et al.
Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation
Jiantao Lin, Xin Yang, Meixi Chen et al.
Using Diffusion Priors for Video Amodal Segmentation
Kaihua Chen, Deva Ramanan, Tarasha Khurana
One Model for ALL: Low-Level Task Interaction Is a Key to Task-Agnostic Image Fusion
Chunyang Cheng, Tianyang Xu, Zhenhua Feng et al.
An Asymmetric Augmented Self-Supervised Learning Method for Unsupervised Fine-Grained Image Hashing
Feiran Hu, Chenlin Zhang, Jiangliang GUO et al.
MimicDiffusion: Purifying Adversarial Perturbation via Mimicking Clean Diffusion Model
Kaiyu Song, Hanjiang Lai, Yan Pan et al.
Action Scene Graphs for Long-Form Understanding of Egocentric Videos
Ivan Rodin, Antonino Furnari, Kyle Min et al.
Parallelized Autoregressive Visual Generation
Yuqing Wang, Shuhuai Ren, Zhijie Lin et al.
DiffusionTrack: Point Set Diffusion Model for Visual Object Tracking
Fei Xie, Zhongdao Wang, Chao Ma
EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models
Jingyuan Yang, Jiawei Feng, Hui Huang
MMRL: Multi-Modal Representation Learning for Vision-Language Models
Yuncheng Guo, Xiaodong Gu
SpiderMatch: 3D Shape Matching with Global Optimality and Geometric Consistency
Paul Roetzer, Florian Bernard
Realigning Confidence with Temporal Saliency Information for Point-Level Weakly-Supervised Temporal Action Localization
Ziying Xia, Jian Cheng, Siyu Liu et al.
3D Facial Expressions through Analysis-by-Neural-Synthesis
George Retsinas, Panagiotis Filntisis, Radek Danecek et al.
Segment and Caption Anything
Xiaoke Huang, Jianfeng Wang, Yansong Tang et al.
Brush2Prompt: Contextual Prompt Generator for Object Inpainting
Mang Tik Chiu, Yuqian Zhou, Lingzhi Zhang et al.
G^3-LQ: Marrying Hyperbolic Alignment with Explicit Semantic-Geometric Modeling for 3D Visual Grounding
Yuan Wang, Yali Li, Shengjin Wang
Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images
Chaoqin Huang, Aofan Jiang, Jinghao Feng et al.
The Language of Motion: Unifying Verbal and Non-verbal Language of 3D Human Motion
Changan Chen, Juze Zhang, Shrinidhi Kowshika Lakshmikanth et al.
Sparse Views Near Light: A Practical Paradigm for Uncalibrated Point-light Photometric Stereo
Mohammed Brahimi, Bjoern Haefner, Zhenzhang Ye et al.
Total Selfie: Generating Full-Body Selfies
Bowei Chen, Brian Curless, Ira Kemelmacher-Shlizerman et al.