Most Cited 2024 "hierarchical interaction model" Papers
12,324 papers found • Page 50 of 62
Conference
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
Ruyang Liu, Chen Li, Yixiao Ge et al.
Video Frame Interpolation via Direct Synthesis with the Event-based Reference
Yuhan Liu, Yongjian Deng, Hao Chen et al.
Lane2Seq: Towards Unified Lane Detection via Sequence Generation
Kunyang Zhou
CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation
Bo-Yuan Sun, Yuqi Yang, Le Zhang et al.
Rethinking Boundary Discontinuity Problem for Oriented Object Detection
Hang Xu, Xinyuan Liu, Haonan Xu et al.
MCNet: Rethinking the Core Ingredients for Accurate and Efficient Homography Estimation
Haokai Zhu, Si-Yuan Cao, Jianxin Hu et al.
UniDepth: Universal Monocular Metric Depth Estimation
Luigi Piccinelli, Yung-Hsu Yang, Christos Sakaridis et al.
Diffusion Model Alignment Using Direct Preference Optimization
Bram Wallace, Meihua Dang, Rafael Rafailov et al.
SD4Match: Learning to Prompt Stable Diffusion Model for Semantic Matching
Xinghui Li, Jingyi Lu, Kai Han et al.
Uncertainty-Guided Never-Ending Learning to Drive
Lei Lai, Eshed Ohn-Bar, Sanjay Arora et al.
Feedback-Guided Autonomous Driving
Jimuyang Zhang, Zanming Huang, Arijit Ray et al.
Small Steps and Level Sets: Fitting Neural Surface Models with Point Guidance
Chamin Hewa Koneputugodage, Yizhak Ben-Shabat, Dylan Campbell et al.
Adapt or Perish: Adaptive Sparse Transformer with Attentive Feature Refinement for Image Restoration
Shihao Zhou, Duosheng Chen, Jinshan Pan et al.
3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos
Jiakai Sun, Han Jiao, Guangyuan Li et al.
LTM: Lightweight Textured Mesh Extraction and Refinement of Large Unbounded Scenes for Efficient Storage and Real-time Rendering
Jaehoon Choi, Rajvi Shah, Qinbo Li et al.
Geometry Transfer for Stylizing Radiance Fields
Hyunyoung Jung, Seonghyeon Nam, Nikolaos Sarafianos et al.
3D Human Pose Perception from Egocentric Stereo Videos
Hiroyasu Akada, Jian Wang, Vladislav Golyanik et al.
QN-Mixer: A Quasi-Newton MLP-Mixer Model for Sparse-View CT Reconstruction
Ishak Ayad, Nicolas Larue, Mai K. Nguyen
Check Locate Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation
Biao Gong, Siteng Huang, Yutong Feng et al.
Prompt3D: Random Prompt Assisted Weakly-Supervised 3D Object Detection
Xiaohong Zhang, Huisheng Ye, Jingwen Li et al.
Boosting Self-Supervision for Single-View Scene Completion via Knowledge Distillation
Keonhee Han, Dominik Muhle, Felix Wimbauer et al.
Volumetric Environment Representation for Vision-Language Navigation
Liu, Wenguan Wang, Yi Yang
CrossKD: Cross-Head Knowledge Distillation for Object Detection
JiaBao Wang, yuming chen, Zhaohui Zheng et al.
Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation
Jiaming Liu, Ran Xu, Senqiao Yang et al.
TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing
Sherry X. Chen, Yaron Vaxman, Elad Ben Baruch et al.
Leveraging Camera Triplets for Efficient and Accurate Structure-from-Motion
Lalit Manam, Venu Madhav Govindu
CG-HOI: Contact-Guided 3D Human-Object Interaction Generation
Christian Diller, Angela Dai
Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?
Hanxin Zhu, Tianyu He, Xin Li et al.
Resurrecting Old Classes with New Data for Exemplar-Free Continual Learning
Dipam Goswami, Albin Soutif, Yuyang Liu et al.
DIEM: Decomposition-Integration Enhancing Multimodal Insights
Xinyi Jiang, Guoming Wang, Junhao Guo et al.
Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters
Jiazuo Yu, Yunzhi Zhuge, Lu Zhang et al.
HOI-M^3: Capture Multiple Humans and Objects Interaction within Contextual Environment
Juze Zhang, Jingyan Zhang, Zining Song et al.
CORES: Convolutional Response-based Score for Out-of-distribution Detection
Keke Tang, Chao Hou, Weilong Peng et al.
Equivariant Multi-Modality Image Fusion
Zixiang Zhao, Haowen Bai, Jiangshe Zhang et al.
PDF: A Probability-Driven Framework for Open World 3D Point Cloud Semantic Segmentation
Jinfeng Xu, Siyuan Yang, Xianzhi Li et al.
NeISF: Neural Incident Stokes Field for Geometry and Material Estimation
Chenhao Li, Taishi Ono, Takeshi Uemori et al.
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
Zheng Li, Xiang Li, xinyi fu et al.
DeMatch: Deep Decomposition of Motion Field for Two-View Correspondence Learning
Shihua Zhang, Zizhuo Li, Yuan Gao et al.
Domain Gap Embeddings for Generative Dataset Augmentation
Yinong Oliver Wang, Younjoon Chung, Chen Henry Wu et al.
Domain-Agnostic Mutual Prompting for Unsupervised Domain Adaptation
Zhekai Du, Xinyao Li, Fengling Li et al.
TransLoc4D: Transformer-based 4D Radar Place Recognition
Guohao Peng, Heshan Li, Yangyang Zhao et al.
Higher-order Relational Reasoning for Pedestrian Trajectory Prediction
Sungjune Kim, Hyung-gun Chi, Hyerin Lim et al.
Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation
Jingyun Wang, Guoliang Kang
Absolute Pose from One or Two Scaled and Oriented Features
Jonathan Ventura, Zuzana Kukelova, Torsten Sattler et al.
Draw Step by Step: Reconstructing CAD Construction Sequences from Point Clouds via Multimodal Diffusion.
Weijian Ma, Shuaiqi Chen, Yunzhong Lou et al.
DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation
Zeeshan Hayder, Xuming He
Open-Vocabulary 3D Semantic Segmentation with Foundation Models
Li Jiang, Shaoshuai Shi, Bernt Schiele
Training Vision Transformers for Semi-Supervised Semantic Segmentation
Xinting Hu, Li Jiang, Bernt Schiele
APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation
Weizhao He, Yang Zhang, Wei Zhuo et al.
Design2Cloth: 3D Cloth Generation from 2D Masks
Jiali Zheng, Rolandos Alexandros Potamias, Stefanos Zafeiriou
S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes
Xingyi Li, Zhiguo Cao, Yizheng Wu et al.
SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation
Aysim Toker, Marvin Eisenberger, Daniel Cremers et al.
Dual-Consistency Model Inversion for Non-Exemplar Class Incremental Learning
Zihuan Qiu, Yi Xu, Fanman Meng et al.
DS-NeRV: Implicit Neural Video Representation with Decomposed Static and Dynamic Codes
Hao Yan, Zhihui Ke, Xiaobo Zhou et al.
Rolling Shutter Correction with Intermediate Distortion Flow Estimation
Mingdeng Cao, Sidi Yang, Yujiu Yang et al.
Towards Transferable Targeted 3D Adversarial Attack in the Physical World
Yao Huang, Yinpeng Dong, Shouwei Ruan et al.
Hybrid Functional Maps for Crease-Aware Non-Isometric Shape Matching
Lennart Bastian, Yizheng Xie, Nassir Navab et al.
Class Tokens Infusion for Weakly Supervised Semantic Segmentation
Sung-Hoon Yoon, Hoyong Kwon, Hyeonseong Kim et al.
SFOD: Spiking Fusion Object Detector
Yimeng Fan, Wei Zhang, Changsong Liu et al.
AnyDoor: Zero-shot Object-level Image Customization
Xi Chen, Lianghua Huang, Yu Liu et al.
SeD: Semantic-Aware Discriminator for Image Super-Resolution
Bingchen Li, Xin Li, Hanxin Zhu et al.
InstanceDiffusion: Instance-level Control for Image Generation
XuDong Wang, Trevor Darrell, Sai Saketh Rambhatla et al.
Robust Emotion Recognition in Context Debiasing
Dingkang Yang, Kun Yang, Mingcheng Li et al.
Improving Training Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architecture
Huijie Zhang, Yifu Lu, Ismail Alkhouri et al.
Balancing Act: Distribution-Guided Debiasing in Diffusion Models
Rishubh Parihar, Abhijnya Bhat, Abhipsa Basu et al.
Sieve: Multimodal Dataset Pruning using Image Captioning Models
Anas Mahmoud, Mostafa Elhoushi, Amro Abbas et al.
Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation
Song Wang, Jiawei Yu, Wentong Li et al.
Towards Fairness-Aware Adversarial Learning
Yanghao Zhang, Tianle Zhang, Ronghui Mu et al.
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
Andong Wang, Bo Wu, Sunli Chen et al.
MuRF: Multi-Baseline Radiance Fields
Haofei Xu, Anpei Chen, Yuedong Chen et al.
Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans
Romain Loiseau, Elliot Vincent, Mathieu Aubry et al.
Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds
Tianrui Lou, Xiaojun Jia, Jindong Gu et al.
PIGEON: Predicting Image Geolocations
Lukas Haas, Michal Skreta, Silas Alberti et al.
JoAPR: Cleaning the Lens of Prompt Learning for Vision-Language Models
YUNCHENG GUO, Xiaodong Gu
Retrieval-Augmented Egocentric Video Captioning
Jilan Xu, Yifei Huang, Junlin Hou et al.
GPLD3D: Latent Diffusion of 3D Shape Generative Models by Enforcing Geometric and Physical Priors
Yuan Dong, Qi Zuo, Xiaodong Gu et al.
Low-Rank Knowledge Decomposition for Medical Foundation Models
Yuhang Zhou, Haolin li, Siyuan Du et al.
Pixel-level Semantic Correspondence through Layout-aware Representation Learning and Multi-scale Matching Integration
Yixuan Sun, Zhangyue Yin, Haibo Wang et al.
View From Above: Orthogonal-View aware Cross-view Localization
Shan Wang, Chuong Nguyen, Jiawei Liu et al.
WorDepth: Variational Language Prior for Monocular Depth Estimation
Ziyao Zeng, Hyoungseob Park, Fengyu Yang et al.
Event-assisted Low-Light Video Object Segmentation
Li Hebei, Jin Wang, Jiahui Yuan et al.
3DToonify: Creating Your High-Fidelity 3D Stylized Avatar Easily from 2D Portrait Images
Yifang Men, Hanxi Liu, Yuan Yao et al.
Synthesize Diagnose and Optimize: Towards Fine-Grained Vision-Language Understanding
Wujian Peng, Sicheng Xie, Zuyao You et al.
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
Tai Wang, Xiaohan Mao, Chenming Zhu et al.
DIOD: Self-Distillation Meets Object Discovery
Sandra Kara, Hejer AMMAR, Julien Denize et al.
FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models
LIn Zhao, Tianchen Zhao, Zinan Lin et al.
COLMAP-Free 3D Gaussian Splatting
Yang Fu, Sifei Liu, Amey Kulkarni et al.
SNED: Superposition Network Architecture Search for Efficient Video Diffusion Model
Zhengang Li, Yan Kang, Yuchen Liu et al.
Personalized Residuals for Concept-Driven Text-to-Image Generation
Cusuh Ham, Matthew Fisher, James Hays et al.
CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation
Seokju Cho, Heeseong Shin, Sunghwan Hong et al.
Deep Generative Model based Rate-Distortion for Image Downscaling Assessment
yuanbang liang, Bhavesh Garg, Paul L. Rosin et al.
Forecasting of 3D Whole-body Human Poses with Grasping Objects
yan haitao, Qiongjie Cui, Jiexin Xie et al.
VA3: Virtually Assured Amplification Attack on Probabilistic Copyright Protection for Text-to-Image Generative Models
Xiang Li, Qianli Shen, Kenji Kawaguchi
PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF
Yutao Feng, Yintong Shang, Xuan Li et al.
SNI-SLAM: Semantic Neural Implicit SLAM
Siting Zhu, Guangming Wang, Hermann Blum et al.
Edge-Aware 3D Instance Segmentation Network with Intelligent Semantic Prior
Wonseok Roh, Hwanhee Jung, Giljoo Nam et al.
TextureDreamer: Image-Guided Texture Synthesis Through Geometry-Aware Diffusion
Yu-Ying Yeh, Jia-Bin Huang, Changil Kim et al.
MAFA: Managing False Negatives for Vision-Language Pre-training
Jaeseok Byun, Dohoon Kim, Taesup Moon
Blur2Blur: Blur Conversion for Unsupervised Image Deblurring on Unknown Domains
Bang-Dang Pham, Phong Tran, Anh Tran et al.
RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models
Ozgur Kara, Bariscan Kurtkaya, Hidir Yesiltepe et al.
ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles
Jiawei Zhang, Chejian Xu, Bo Li
MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos
Jielin Qiu, Jiacheng Zhu, William Han et al.
Generalizable Novel-View Synthesis using a Stereo Camera
Haechan Lee, Wonjoon Jin, Seung-Hwan Baek et al.
Learning Structure-from-Motion with Graph Attention Networks
Lucas Brynte, José Pedro Iglesias, Carl Olsson et al.
Don’t Drop Your Samples! Coherence-Aware Training Benefits Conditional Diffusion
Nicolas Dufour, Victor Besnier, Vicky Kalogeiton et al.
SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection
Peng Qi, Zehong Yan, Wynne Hsu et al.
Spatial-Aware Regression for Keypoint Localization
Dongkai Wang, Shiliang Zhang
Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes
Chi-Hsi Kung, 書緯 呂, Yi-Hsuan Tsai et al.
Diff-BGM: A Diffusion Model for Video Background Music Generation
Sizhe Li, Yiming Qin, Minghang Zheng et al.
ArtAdapter: Text-to-Image Style Transfer using Multi-Level Style Encoder and Explicit Adaptation
Dar-Yen Chen, Hamish Tennent, Ching-Wen Hsu
EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars
Nikita Drobyshev, Antoni Bigata Casademunt, Konstantinos Vougioukas et al.
Shadow-Enlightened Image Outpainting
Hang Yu, Ruilin Li, Shaorong Xie et al.
Specularity Factorization for Low-Light Enhancement
Saurabh Saini, P. J. Narayanan
Latent Modulated Function for Computational Optimal Continuous Image Representation
Zongyao He, Zhi Jin
Domain-Rectifying Adapter for Cross-Domain Few-Shot Segmentation
Jiapeng Su, Qi Fan, Wenjie Pei et al.
Shallow-Deep Collaborative Learning for Unsupervised Visible-Infrared Person Re-Identification
Bin Yang, Jun Chen, Mang Ye
L2B: Learning to Bootstrap Robust Models for Combating Label Noise
Yuyin Zhou, Xianhang li, Fengze Liu et al.
OED: Towards One-stage End-to-End Dynamic Scene Graph Generation
Guan Wang, Zhimin Li, Qingchao Chen et al.
SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis
Ziqiao Peng, Wentao Hu, Yue Shi et al.
Attack To Defend: Exploiting Adversarial Attacks for Detecting Poisoned Models
Samar Fares, Karthik Nandakumar
ViewFusion: Towards Multi-View Consistency via Interpolated Denoising
Xianghui Yang, Gil Avraham, Yan Zuo et al.
D3still: Decoupled Differential Distillation for Asymmetric Image Retrieval
Yi Xie, Yihong Lin, Wenjie Cai et al.
LiDAR-Net: A Real-scanned 3D Point Cloud Dataset for Indoor Scenes
Yanwen Guo, Yuanqi Li, Dayong Ren et al.
Delving into the Trajectory Long-tail Distribution for Muti-object Tracking
Sijia Chen, En Yu, Jinyang Li et al.
Non-autoregressive Sequence-to-Sequence Vision-Language Models
Kunyu Shi, Qi Dong, Luis Goncalves et al.
No More Ambiguity in 360° Room Layout via Bi-Layout Estimation
Yu-Ju Tsai, Jin-Cheng Jhang, JINGJING ZHENG et al.
MTLoRA: Low-Rank Adaptation Approach for Efficient Multi-Task Learning
Ahmed Agiza, Marina Neseem, Sherief Reda
HUNTER: Unsupervised Human-centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes
Yichen Yao, Zimo Jiang, YUJING SUN et al.
Streaming Dense Video Captioning
Xingyi Zhou, Anurag Arnab, Shyamal Buch et al.
3D LiDAR Mapping in Dynamic Environments using a 4D Implicit Neural Representation
Xingguang Zhong, Yue Pan, Cyrill Stachniss et al.
SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction
Yuanhui Huang, Wenzhao Zheng, Borui Zhang et al.
On the Scalability of Diffusion-based Text-to-Image Generation
Hao Li, Yang Zou, Ying Wang et al.
Bootstrapping Autonomous Driving Radars with Self-Supervised Learning
Yiduo Hao, Sohrab Madani, Junfeng Guan et al.
Analyzing and Improving the Training Dynamics of Diffusion Models
Tero Karras, Miika Aittala, Jaakko Lehtinen et al.
DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaptation by Combining 3D GANs and Diffusion Priors
Biwen Lei, Kai Yu, Mengyang Feng et al.
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han, Kaixiong Gong, Yiyuan Zhang et al.
LAFS: Landmark-based Facial Self-supervised Learning for Face Recognition
Zhonglin Sun, Chen Feng, Ioannis Patras et al.
See Say and Segment: Teaching LMMs to Overcome False Premises
Tsung-Han Wu, Giscard Biamby, David Chan et al.
PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models
Fei Deng, Qifei Wang, Wei Wei et al.
Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relationship Detection
Jongha Kim, Jihwan Park, Jinyoung Park et al.
MMVP: A Multimodal MoCap Dataset with Vision and Pressure Sensors
He Zhang, Shenghao Ren, Haolei Yuan et al.
SD2Event:Self-supervised Learning of Dynamic Detectors and Contextual Descriptors for Event Cameras
Yuan Gao, Yuqing Zhu, Xinjun Li et al.
Tuning Stable Rank Shrinkage: Aiming at the Overlooked Structural Risk in Fine-tuning
Sicong Shen, Yang Zhou, Bingzheng Wei et al.
DiSR-NeRF: Diffusion-Guided View-Consistent Super-Resolution NeRF
Jie Long Lee, Chen Li, Gim Hee Lee
Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
Junyi Ma, Xieyuanli Chen, Jiawei Huang et al.
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
Li Hu
Relightable and Animatable Neural Avatar from Sparse-View Video
Zhen Xu, Sida Peng, Chen Geng et al.
Objects as Volumes: A Stochastic Geometry View of Opaque Solids
Bailey Miller, Hanyu Chen, Alice Lai et al.
Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation
Xiao Lin, Wenfei Yang, Yuan Gao et al.
PaReNeRF: Toward Fast Large-scale Dynamic NeRF with Patch-based Reference
Xiao Tang, Min Yang, Penghui Sun et al.
PostureHMR: Posture Transformation for 3D Human Mesh Recovery
Yu-Pei Song, Xiao WU, Zhaoquan Yuan et al.
Desigen: A Pipeline for Controllable Design Template Generation
Haohan Weng, Danqing Huang, YU QIAO et al.
WANDR: Intention-guided Human Motion Generation
Markos Diomataris, Nikos Athanasiou, Omid Taheri et al.
WWW: A Unified Framework for Explaining What Where and Why of Neural Networks by Interpretation of Neuron Concepts
Yong Hyun Ahn, Hyeon Bae Kim, Seong Tae Kim
Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval
Minkuk Kim, Hyeon Bae Kim, Jinyoung Moon et al.
Rich Human Feedback for Text-to-Image Generation
Youwei Liang, Junfeng He, Gang Li et al.
Dr. Bokeh: DiffeRentiable Occlusion-aware Bokeh Rendering
Yichen Sheng, Zixun Yu, Lu Ling et al.
SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
Yuxuan Zhang, Yiren Song, Jiaming Liu et al.
SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting
Hoon Kim, Minje Jang, Wonjun Yoon et al.
CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor
Shuyang Sun, Runjia Li, Philip H.S. Torr et al.
Learning from Observer Gaze: Zero-Shot Attention Prediction Oriented by Human-Object Interaction Recognition
Yuchen Zhou, Linkai Liu, Chao Gou
Super-Resolution Reconstruction from Bayer-Pattern Spike Streams
Yanchen Dong, Ruiqin Xiong, Jian Zhang et al.
Image Neural Field Diffusion Models
Yinbo Chen, Oliver Wang, Richard Zhang et al.
Denoising Point Clouds in Latent Space via Graph Convolution and Invertible Neural Network
Aihua Mao, Biao Yan, Zijing Ma et al.
Dual-View Visual Contextualization for Web Navigation
Jihyung Kil, Chan Hee Song, Boyuan Zheng et al.
Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation
Haojie Zhang, Yongyi Su, Xun Xu et al.
Language-guided Image Reflection Separation
Haofeng Zhong, Yuchen Hong, Shuchen Weng et al.
SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation
Thuan Nguyen, Anh Tran
Looking 3D: Anomaly Detection with 2D-3D Alignment
Ankan Kumar Bhunia, Changjian Li, Hakan Bilen
EventPS: Real-Time Photometric Stereo Using an Event Camera
Bohan Yu, Jieji Ren, Jin Han et al.
Diffusion Handles Enabling 3D Edits for Diffusion Models by Lifting Activations to 3D
Karran Pandey, Paul Guerrero, Matheus Gadelha et al.
Circuit Design and Efficient Simulation of Quantum Inner Product and Empirical Studies of Its Effect on Near-Term Hybrid Quantum-Classic Machine Learning
Hao Xiong, Yehui Tang, Xinyu Ye et al.
Uncovering What Why and How: A Comprehensive Benchmark for Causation Understanding of Video Anomaly
Hang Du, Sicheng Zhang, Binzhu Xie et al.
CrowdDiff: Multi-hypothesis Crowd Density Estimation using Diffusion Models
Yasiru Ranasinghe, Nithin Gopalakrishnan Nair, Wele Gedara Chaminda Bandara et al.
PSDPM: Prototype-based Secondary Discriminative Pixels Mining for Weakly Supervised Semantic Segmentation
Xinqiao Zhao, Ziqian Yang, Tianhong Dai et al.
Towards 3D Vision with Low-Cost Single-Photon Cameras
Fangzhou Mu, Carter Sifferman, Sacha Jungerman et al.
Shadows Don't Lie and Lines Can't Bend! Generative Models don't know Projective Geometry...for now
Ayush Sarkar, Hanlin Mai, Amitabh Mahapatra et al.
Aligning Logits Generatively for Principled Black-Box Knowledge Distillation
Jing Ma, Xiang Xiang, Ke Wang et al.
Semantically-Shifted Incremental Adapter-Tuning is A Continual ViTransformer
Yuwen Tan, Qinhao Zhou, Xiang Xiang et al.
Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models
David Stotko, Nils Wandel, Reinhard Klein
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
Tong Wu, Guandao Yang, Zhibing Li et al.
On Train-Test Class Overlap and Detection for Image Retrieval
Chull Hwan Song, Jooyoung Yoon, Taebaek Hwang et al.
Orthogonal Adaptation for Modular Customization of Diffusion Models
Ryan Po, Guandao Yang, Kfir Aberman et al.
Permutation Equivariance of Transformers and Its Applications
Hengyuan Xu, Liyao Xiang, Hangyu Ye et al.
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jieneng Chen, Qihang Yu, Xiaohui Shen et al.
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
Yiming Zhang, Zhening Xing, Yanhong Zeng et al.
DemoCaricature: Democratising Caricature Generation with a Rough Sketch
Dar-Yen Chen, Ayan Kumar Bhunia, Subhadeep Koley et al.
End-to-End Spatio-Temporal Action Localisation with Video Transformers
Alexey Gritsenko, Xuehan Xiong, Josip Djolonga et al.
TRINS: Towards Multimodal Language Models that Can Read
Ruiyi Zhang, Yanzhe Zhang, Jian Chen et al.
EFHQ: Multi-purpose ExtremePose-Face-HQ dataset
Trung Dao, Duc H Vu, Cuong Pham et al.
Logarithmic Lenses: Exploring Log RGB Data for Image Classification
Bruce Maxwell, Sumegha Singhania, Avnish Patel et al.
Unlocking Pre-trained Image Backbones for Semantic Image Synthesis
Tariq Berrada, Jakob Verbeek, camille couprie et al.
TokenCompose: Text-to-Image Diffusion with Token-level Supervision
Zirui Wang, Zhizhou Sha, Zheng Ding et al.
Language-driven Object Fusion into Neural Radiance Fields with Pose-Conditioned Dataset Updates
Ka Chun SHUM, Jaeyeon Kim, Binh-Son Hua et al.
Infer from What You Have Seen Before: Temporally-dependent Classifier for Semi-supervised Video Segmentation
Jiafan Zhuang, Zilei Wang, Yixin Zhang et al.
Seeing the World through Your Eyes
Hadi Alzayer, Kevin Zhang, Brandon Y. Feng et al.
Learning Vision from Models Rivals Learning Vision from Data
Yonglong Tian, Lijie Fan, Kaifeng Chen et al.
RegionGPT: Towards Region Understanding Vision Language Model
Qiushan Guo, Shalini De Mello, Danny Yin et al.
Unlocking the Potential of Pre-trained Vision Transformers for Few-Shot Semantic Segmentation through Relationship Descriptors
Ziqin Zhou, Hai-Ming Xu, Yangyang Shu et al.
Holo-Relighting: Controllable Volumetric Portrait Relighting from a Single Image
Yiqun Mei, Yu Zeng, He Zhang et al.