Most Cited CVPR "neural network analysis" Papers
Robotic Visual Instruction
Yanbang Li, ZiYang Gong, Haoyang Li et al.
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data
Haoxin Li, Boyang Li
AnyMap: Learning a General Camera Model for Structure-from-Motion with Unknown Distortion in Dynamic Scenes
Andrea Porfiri Dal Cin, Georgi Dikov, Jihong Ju et al.
Learning Physics From Video: Unsupervised Physical Parameter Estimation for Continuous Dynamical Systems
Alejandro Castañeda Garcia, Jan Warchocki, Jan van Gemert et al.
DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion
Qitao Zhao, Amy Lin, Jeff Tan et al.
Navigating Image Restoration with VAR’s Distribution Alignment Prior
Siyang Wang, Naishan Zheng, Jie Huang et al.
Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition
Yang Chen, Jingcai Guo, Song Guo et al.
Two by Two: Learning Multi-Task Pairwise Objects Assembly for Generalizable Robot Manipulation
Yu Qi, Yuanchen Ju, Tianming Wei et al.
ALIEN: Implicit Neural Representations for Human Motion Prediction under Arbitrary Latency
Dong Wei, Xiaoning Sun, Xizhan Gao et al.
Decentralized Diffusion Models
David McAllister, Matthew Tancik, Jiaming Song et al.
TinyFusion: Diffusion Transformers Learned Shallow
Gongfan Fang, Kunjun Li, Xinyin Ma et al.
Poly-Autoregressive Prediction for Modeling Interactions
Neerja Thakkar, Tara Sadjadpour, Jathushan Rajasegaran et al.
DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery
Utkarsh Mall, Cheng Perng Phoo, Mia Chiquier et al.
UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping
Aashish Rai, Dilin Wang, Mihir Jain et al.
Z-Magic: Zero-shot Multiple Attributes Guided Image Creator
Yingying Deng, Xiangyu He, Fan Tang et al.
Science-T2I: Addressing Scientific Illusions in Image Synthesis
Jialuo Li, Wenhao Chai, Xingyu Fu et al.
Minding Fuzzy Regions: A Data-driven Alternating Learning Paradigm for Stable Lesion Segmentation
Lexin Fang, Yunyang Xu, Xiang Ma et al.
LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene Relighting
Xiaoyan Xing, Konrad Groh, Sezer Karaoglu et al.
ProjAttacker: A Configurable Physical Adversarial Attack for Face Recognition via Projector
Yuanwei Liu, Hui Wei, Chengyu Jia et al.
Towards Improved Text-Aligned Codebook Learning: Multi-Hierarchical Codebook-Text Alignment with Long Text
Guotao Liang, Baoquan Zhang, Zhiyuan Wen et al.
StyleMaster: Stylize Your Video with Artistic Generation and Translation
Zixuan Ye, Huijuan Huang, Xintao Wang et al.
BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing
Yunqi Gu, Ian Huang, Jihyeon Je et al.
Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition
Zhiyuan Chen, Keyi Li, Yifan Jia et al.
RoomPainter: View-Integrated Diffusion for Consistent Indoor Scene Texturing
Zhipeng Huang, Wangbo Yu, Xinhua Cheng et al.
A Simple Data Augmentation for Feature Distribution Skewed Federated Learning
Yunlu Yan, Huazhu Fu, Yuexiang Li et al.
Graph-Embedded Structure-Aware Perceptual Hashing for Neural Network Protection and Piracy Detection
Ruiheng Liu, Haozhe Chen, Boyao Zhao et al.
Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation
Dingcheng Zhen, Shunshun Yin, Shiyang Qin et al.
K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs
Ziheng Ouyang, Zhen Li, Qibin Hou
Less is More: Efficient Model Merging with Binary Task Switch
Biqing Qi, Fangyuan Li, Zhen Wang et al.
Cross-Modal 3D Representation with Multi-View Images and Point Clouds
Ziyang Zhou, Pinghui Wang, Zi Liang et al.
HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation
Hongwei Zheng, Han Li, Wenrui Dai et al.
Exploring Contextual Attribute Density in Referring Expression Counting
Zhicheng Wang, Zhiyu Pan, Zhan Peng et al.
FSHNet: Fully Sparse Hybrid Network for 3D Object Detection
Shuai Liu, Mingyue Cui, Boyang Li et al.
DiverseFlow: Sample-Efficient Diverse Mode Coverage in Flows
Mashrur M. Morshed, Vishnu Naresh Boddeti
Learning Compatible Multi-Prize Subnetworks for Asymmetric Retrieval
Yushuai Sun, Zikun Zhou, Dongmei Jiang et al.
Opportunistic Single-Photon Time of Flight
Sotiris Nousias, Mian Wei, Howard Xiao et al.
Volumetric Surfaces: Representing Fuzzy Geometries with Layered Meshes
Stefano Esposito, Anpei Chen, Christian Reiser et al.
MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data
Hanwen Jiang, Zexiang Xu, Desai Xie et al.
Pose Priors from Language Models
Sanjay Subramanian, Evonne Ng, Lea Müller et al.
Generative Densification: Learning to Densify Gaussians for High-Fidelity Generalizable 3D Reconstruction
Seungtae Nam, Xiangyu Sun, Gyeongjin Kang et al.
Towards Optimizing Large-Scale Multi-Graph Matching in Bioimaging
Max Kahl, Sebastian Stricker, Lisa Hutschenreiter et al.
Image Quality Assessment: From Human to Machine Preference
Chunyi Li, Yuan Tian, Xiaoyue Ling et al.
MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
Ruicheng Wang, Sicheng Xu, Cassie Lee Dai et al.
DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving
Bencheng Liao, Shaoyu Chen, Haoran Yin et al.
Higher-Order Ratio Cycles for Fast and Globally Optimal Shape Matching
Paul Roetzer, Viktoria Ehm, Daniel Cremers et al.
Acc3D: Accelerating Single Image to 3D Diffusion Models via Edge Consistency Guided Score Distillation
Kendong Liu, Zhiyu Zhu, Hui Liu et al.
ReDiffDet: Rotation-equivariant Diffusion Model for Oriented Object Detection
Jiaqi Zhao, Zeyu Ding, Yong Zhou et al.
SKE-Layout: Spatial Knowledge Enhanced Layout Generation with LLMs
Junsheng Wang, Nieqing Cao, Yan Ding et al.
Continuous Adverse Weather Removal via Degradation-Aware Distillation
Xin Lu, Jie Xiao, Yurui Zhu et al.
Subspace Constraint and Contribution Estimation for Heterogeneous Federated Learning
Xiangtao Zhang, Sheng Li, Ao Li et al.
Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios
Kai Wang, Zekai Li, Zhi-Qi Cheng et al.
Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing
Pengcheng Xu, Boyuan Jiang, Xiaobin Hu et al.
SeqMvRL: A Sequential Fusion Framework for Multi-view Representation Learning
Ren Wang, Haoliang Sun, Yuxiu Lin et al.
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
Yicheng Chen, Xiangtai Li, Yining Li et al.
Structure-from-Motion with a Non-Parametric Camera Model
Yihan Wang, Linfei Pan, Marc Pollefeys et al.
Sea-ing in Low-light
Nisha Varghese, A. N. Rajagopalan
Towards Autonomous Micromobility through Scalable Urban Simulation
Wayne Wu, Honglin He, Chaoyuan Zhang et al.
Learning-enabled Polynomial Lyapunov Function Synthesis via High-Accuracy Counterexample-Guided Framework
Hanrui Zhao, Niuniu Qi, Mengxin Ren et al.
Advancing Adversarial Robustness in GNeRFs: The IL2-NeRF Attack
Nicole Meng, Caleb Manicke, Ronak Sahu et al.
Spiking Transformer: Introducing Accurate Addition-Only Spiking Self-Attention for Transformer
Yufei Guo, Xiaode Liu, Yuanpei Chen et al.
NoiseCtrl: A Sampling-Algorithm-Agnostic Conditional Generation Method for Diffusion Models
Longquan Dai, He Wang, Jinhui Tang
PCM: Picard Consistency Model for Fast Parallel Sampling of Diffusion Models
Junhyuk So, Jiwoong Shin, Chaeyeon Jang et al.
Towards Precise Scaling Laws for Video Diffusion Transformers
Yuanyang Yin, Yaqi Zhao, Mingwu Zheng et al.
Sonic: Shifting Focus to Global Audio Perception in Portrait Animation
Xiaozhong Ji, Xiaobin Hu, Zhihong Xu et al.
T2SG: Traffic Topology Scene Graph for Topology Reasoning in Autonomous Driving
Changsheng Lv, Mengshi Qi, Liang Liu et al.
MODfinity: Unsupervised Domain Adaptation with Multimodal Information Flow Intertwining
Shanglin Liu, Jianming Lv, Jingdan Kang et al.
Volume Tells: Dual Cycle-Consistent Diffusion for 3D Fluorescence Microscopy De-noising and Super-Resolution
Zelin Li, Chenwei Wang, Zhaoke Huang et al.
DreamTrack: Dreaming the Future for Multimodal Visual Object Tracking
Mingzhe Guo, Weiping Tan, Wenyu Ran et al.
Learned Image Compression with Dictionary-based Entropy Model
Jingbo Lu, Leheng Zhang, Xingyu Zhou et al.
Training Data Provenance Verification: Did Your Model Use Synthetic Data from My Generative Model for Training?
Yuechen Xie, Jie Song, Huiqiong Wang et al.
Shadow Generation Using Diffusion Model with Geometry Prior
Haonan Zhao, Qingyang Liu, Xinhao Tao et al.
How to Merge Your Multimodal Models Over Time?
Sebastian Dziadzio, Vishaal Udandarao, Karsten Roth et al.
Active Hyperspectral Imaging Using an Event Camera
Bohan Yu, Jinxiu Liang, Zhuofeng Wang et al.
Bridging the Gap between Gaussian Diffusion Models and Universal Quantization for Image Compression
Lucas Relic, Roberto Azevedo, Yang Zhang et al.
InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception
Haijie Li, Yanmin Wu, Jiarui Meng et al.
Online Task-Free Continual Learning via Dynamic Expansionable Memory Distribution
Fei Ye, Adrian Bors
Plug-and-Play Interpretable Responsible Text-to-Image Generation via Dual-Space Multi-facet Concept Control
Basim Azam, Naveed Akhtar
Generalized Gaussian Entropy Model for Point Cloud Attribute Compression with Dynamic Likelihood Intervals
Changhao Peng
HSI-GPT: A General-Purpose Large Scene-Motion-Language Model for Human Scene Interaction
Yuan Wang, Yali Li, Lixiang Li et al.
SoftShadow: Leveraging Soft Masks for Penumbra-Aware Shadow Removal
Xinrui Wang, Lanqing Guo, Xiyu Wang et al.
Towards Satellite Image Road Graph Extraction: A Global-Scale Dataset and A Novel Method
Pan Yin, Kaiyu Li, Xiangyong Cao et al.
ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning
Kailin Li, Puhao Li, Tengyu Liu et al.
COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting
Jiaxin Zhang, Junjun Jiang, Youyu Chen et al.
GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill
Jieming Cui, Tengyu Liu, Ziyu Meng et al.
OPTICAL: Leveraging Optimal Transport for Contribution Allocation in Dataset Distillation
Xiao Cui, Yulei Qin, Wengang Zhou et al.
Incremental Object Keypoint Learning
Mingfu Liang, Jiahuan Zhou, Xu Zou et al.
SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model
Shuhan Tan, John Wheatley Lambert, Hong Jeon et al.
Learning Extremely High Density Crowds as Active Matters
Feixiang He, Jiangbei Yue, Jialin Zhu et al.
Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation
Junha Lee, Chunghyun Park, Jaesung Choe et al.
MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data
Zifan Wang, Ziqing Chen, Junyu Chen et al.
Is this Generated Person Existed in Real-world? Fine-grained Detecting and Calibrating Abnormal Human-body
Zeqing Wang, Qingyang Ma, Wentao Wan et al.
Shape and Texture: What Influences Reliable Optical Flow Estimation?
Libo Long, Xiao Hu, Jochen Lang
Rate-In: Information-Driven Adaptive Dropout Rates for Improved Inference-Time Uncertainty Estimation
Tal Zeevi, Ravid Shwartz-Ziv, Yann LeCun et al.
RoboTwin: Dual-Arm Robot Benchmark with Generative Digital Twins
Yao Mu, Tianxing Chen, Zanxin Chen et al.
Beyond Background Shift: Rethinking Instance Replay in Continual Semantic Segmentation
Hongmei Yin, Tingliang Feng, Fan Lyu et al.
DKC: Differentiated Knowledge Consolidation for Cloth-Hybrid Lifelong Person Re-identification
Zhenyu Cui, Jiahuan Zhou, Yuxin Peng
Rectification-specific Supervision and Constrained Estimator for Online Stereo Rectification
Rui Gong, Kim-Hui Yap, Weide Liu et al.
Dual Focus-Attention Transformer for Robust Point Cloud Registration
Kexue Fu, Ming'zhi Yuan, Changwei Wang et al.
Traversing Distortion-Perception Tradeoff using a Single Score-Based Generative Model
Yuhan Wang, Suzhi Bi, Ying-Jun Angela Zhang et al.
SLVR: Super-Light Visual Reconstruction via Blueprint Controllable Convolutions and Exploring Feature Diversity Representation
Ning Ni, Libao Zhang
MobileMamba: Lightweight Multi-Receptive Visual Mamba Network
Haoyang He, Jiangning Zhang, Yuxuan Cai et al.
Learning Endogenous Attention for Incremental Object Detection
Xiang Song, Yuhang He, Jingyuan Li et al.
Minimizing Labeled, Maximizing Unlabeled: An Image-Driven Approach for Video Instance Segmentation
Fangyun Wei, Jinjing Zhao, Kun Yan et al.
DeClotH: Decomposable 3D Cloth and Human Body Reconstruction from a Single Image
Hyeongjin Nam, Donghwan Kim, Jeongtaek Oh et al.
Learning to Sample Effective and Diverse Prompts for Text-to-Image Generation
Taeyoung Yun, Dinghuai Zhang, Jinkyoo Park et al.
Perceptual Inductive Bias Is What You Need Before Contrastive Learning
Junru Zhao, Tianqin Li, Dunhan Jiang et al.
Diffusion Self-Distillation for Zero-Shot Customized Image Generation
Shengqu Cai, Eric Ryan Chan, Yunzhi Zhang et al.
EnvPoser: Environment-aware Realistic Human Motion Estimation from Sparse Observations with Uncertainty Modeling
Songpengcheng Xia, Yu Zhang, Zhuo Su et al.
HOIAnimator: Generating Text-prompt Human-object Animations using Novel Perceptive Diffusion Models
Wenfeng Song, Xinyu Zhang, Shuai Li et al.
HDQMF: Holographic Feature Decomposition Using Quantum Algorithms
Prathyush Poduval, Zhuowen Zou, Mohsen Imani
DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes
Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan et al.
PortraitBooth: A Versatile Portrait Model for Fast Identity-preserved Personalization
Xu Peng, Junwei Zhu, Boyuan Jiang et al.
Dr.Hair: Reconstructing Scalp-Connected Hair Strands without Pre-Training via Differentiable Rendering of Line Segments
Yusuke Takimoto, Hikari Takehara, Hiroyuki Sato et al.
Cache Me if You Can: Accelerating Diffusion Models through Block Caching
Felix Wimbauer, Bichen Wu, Edgar Schoenfeld et al.
SparseOcc: Rethinking Sparse Latent Representation for Vision-Based Semantic Occupancy Prediction
Pin Tang, Zhongdao Wang, Guoqing Wang et al.
Beyond First-Order Tweedie: Solving Inverse Problems using Latent Diffusion
Litu Rout, Yujia Chen, Abhishek Kumar et al.
MAPLM: A Real-World Large-Scale Vision-Language Benchmark for Map and Traffic Scene Understanding
Xu Cao, Tong Zhou, Yunsheng Ma et al.
Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training
Runze He, Shaofei Huang, Xuecheng Nie et al.
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft
Hao Li, Xue Yang, Zhaokai Wang et al.
Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion
Yuanxun Lu, Jingyang Zhang, Shiwei Li et al.
Gradient-based Parameter Selection for Efficient Fine-Tuning
Zhi Zhang, Qizhe Zhang, Zijun Gao et al.
HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation
Linglin Jing, Yiming Ding, Yunpeng Gao et al.
Promptable Behaviors: Personalizing Multi-Objective Rewards from Human Preferences
Minyoung Hwang, Luca Weihs, Chanwoo Park et al.
Fourier Priors-Guided Diffusion for Zero-Shot Joint Low-Light Enhancement and Deblurring
Xiaoqian Lv, Shengping Zhang, Chenyang Wang et al.
From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding
Yonglu Li, Xiaoqian Wu, Xinpeng Liu et al.
LowRankOcc: Tensor Decomposition and Low-Rank Recovery for Vision-based 3D Semantic Occupancy Prediction
Linqing Zhao, Xiuwei Xu, Ziwei Wang et al.
UniDepth: Universal Monocular Metric Depth Estimation
Luigi Piccinelli, Yung-Hsu Yang, Christos Sakaridis et al.
Small Steps and Level Sets: Fitting Neural Surface Models with Point Guidance
Chamin Hewa Koneputugodage, Yizhak Ben-Shabat, Dylan Campbell et al.
Adapt or Perish: Adaptive Sparse Transformer with Attentive Feature Refinement for Image Restoration
Shihao Zhou, Duosheng Chen, Jinshan Pan et al.
3D Human Pose Perception from Egocentric Stereo Videos
Hiroyasu Akada, Jian Wang, Vladislav Golyanik et al.
Check Locate Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation
Biao Gong, Siteng Huang, Yutong Feng et al.
Volumetric Environment Representation for Vision-Language Navigation
Liu, Wenguan Wang, Yi Yang
Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation
Jiaming Liu, Ran Xu, Senqiao Yang et al.
Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?
Hanxin Zhu, Tianyu He, Xin Li et al.
DIEM: Decomposition-Integration Enhancing Multimodal Insights
Xinyi Jiang, Guoming Wang, Junhao Guo et al.
DeMatch: Deep Decomposition of Motion Field for Two-View Correspondence Learning
Shihua Zhang, Zizhuo Li, Yuan Gao et al.
Domain-Agnostic Mutual Prompting for Unsupervised Domain Adaptation
Zhekai Du, Xinyao Li, Fengling Li et al.
Absolute Pose from One or Two Scaled and Oriented Features
Jonathan Ventura, Zuzana Kukelova, Torsten Sattler et al.
Training Vision Transformers for Semi-Supervised Semantic Segmentation
Xinting Hu, Li Jiang, Bernt Schiele
APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation
Weizhao He, Yang Zhang, Wei Zhuo et al.
SFOD: Spiking Fusion Object Detector
Yimeng Fan, Wei Zhang, Changsong Liu et al.
InstanceDiffusion: Instance-level Control for Image Generation
XuDong Wang, Trevor Darrell, Sai Saketh Rambhatla et al.
Robust Emotion Recognition in Context Debiasing
Dingkang Yang, Kun Yang, Mingcheng Li et al.
Balancing Act: Distribution-Guided Debiasing in Diffusion Models
Rishubh Parihar, Abhijnya Bhat, Abhipsa Basu et al.
Sieve: Multimodal Dataset Pruning using Image Captioning Models
Anas Mahmoud, Mostafa Elhoushi, Amro Abbas et al.
Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation
Song Wang, Jiawei Yu, Wentong Li et al.
Towards Fairness-Aware Adversarial Learning
Yanghao Zhang, Tianle Zhang, Ronghui Mu et al.
SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World Knowledge
Andong Wang, Bo Wu, Sunli Chen et al.
MuRF: Multi-Baseline Radiance Fields
Haofei Xu, Anpei Chen, Yuedong Chen et al.
Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds
Tianrui Lou, Xiaojun Jia, Jindong Gu et al.
Retrieval-Augmented Egocentric Video Captioning
Jilan Xu, Yifei Huang, Junlin Hou et al.
Low-Rank Knowledge Decomposition for Medical Foundation Models
Yuhang Zhou, Haolin Li, Siyuan Du et al.
Pixel-level Semantic Correspondence through Layout-aware Representation Learning and Multi-scale Matching Integration
Yixuan Sun, Zhangyue Yin, Haibo Wang et al.
Event-assisted Low-Light Video Object Segmentation
Li Hebei, Jin Wang, Jiahui Yuan et al.
3DToonify: Creating Your High-Fidelity 3D Stylized Avatar Easily from 2D Portrait Images
Yifang Men, Hanxi Liu, Yuan Yao et al.
CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation
Seokju Cho, Heeseong Shin, Sunghwan Hong et al.
PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF
Yutao Feng, Yintong Shang, Xuan Li et al.
MAFA: Managing False Negatives for Vision-Language Pre-training
Jaeseok Byun, Dohoon Kim, Taesup Moon
ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles
Jiawei Zhang, Chejian Xu, Bo Li
MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos
Jielin Qiu, Jiacheng Zhu, William Han et al.
Learning Structure-from-Motion with Graph Attention Networks
Lucas Brynte, José Pedro Iglesias, Carl Olsson et al.
SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection
Peng Qi, Zehong Yan, Wynne Hsu et al.
Spatial-Aware Regression for Keypoint Localization
Dongkai Wang, Shiliang Zhang
EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars
Nikita Drobyshev, Antoni Bigata Casademunt, Konstantinos Vougioukas et al.
Latent Modulated Function for Computational Optimal Continuous Image Representation
Zongyao He, Zhi Jin
Domain-Rectifying Adapter for Cross-Domain Few-Shot Segmentation
Jiapeng Su, Qi Fan, Wenjie Pei et al.
L2B: Learning to Bootstrap Robust Models for Combating Label Noise
Yuyin Zhou, Xianhang Li, Fengze Liu et al.
OED: Towards One-stage End-to-End Dynamic Scene Graph Generation
Guan Wang, Zhimin Li, Qingchao Chen et al.
ViewFusion: Towards Multi-View Consistency via Interpolated Denoising
Xianghui Yang, Gil Avraham, Yan Zuo et al.
Delving into the Trajectory Long-tail Distribution for Multi-object Tracking
Sijia Chen, En Yu, Jinyang Li et al.
Streaming Dense Video Captioning
Xingyi Zhou, Anurag Arnab, Shyamal Buch et al.
On the Scalability of Diffusion-based Text-to-Image Generation
Hao Li, Yang Zou, Ying Wang et al.
Bootstrapping Autonomous Driving Radars with Self-Supervised Learning
Yiduo Hao, Sohrab Madani, Junfeng Guan et al.
OneLLM: One Framework to Align All Modalities with Language
Jiaming Han, Kaixiong Gong, Yiyuan Zhang et al.
LAFS: Landmark-based Facial Self-supervised Learning for Face Recognition
Zhonglin Sun, Chen Feng, Ioannis Patras et al.
PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models
Fei Deng, Qifei Wang, Wei Wei et al.
MMVP: A Multimodal MoCap Dataset with Vision and Pressure Sensors
He Zhang, Shenghao Ren, Haolei Yuan et al.
Tuning Stable Rank Shrinkage: Aiming at the Overlooked Structural Risk in Fine-tuning
Sicong Shen, Yang Zhou, Bingzheng Wei et al.
Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
Junyi Ma, Xieyuanli Chen, Jiawei Huang et al.
Relightable and Animatable Neural Avatar from Sparse-View Video
Zhen Xu, Sida Peng, Chen Geng et al.
Objects as Volumes: A Stochastic Geometry View of Opaque Solids
Bailey Miller, Hanyu Chen, Alice Lai et al.
Do You Remember? Dense Video Captioning with Cross-Modal Memory Retrieval
Minkuk Kim, Hyeon Bae Kim, Jinyoung Moon et al.
SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting
Hoon Kim, Minje Jang, Wonjun Yoon et al.
CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor
Shuyang Sun, Runjia Li, Philip H.S. Torr et al.
Image Neural Field Diffusion Models
Yinbo Chen, Oliver Wang, Richard Zhang et al.
Dual-View Visual Contextualization for Web Navigation
Jihyung Kil, Chan Hee Song, Boyuan Zheng et al.
Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation
Haojie Zhang, Yongyi Su, Xun Xu et al.
Language-guided Image Reflection Separation
Haofeng Zhong, Yuchen Hong, Shuchen Weng et al.
CrowdDiff: Multi-hypothesis Crowd Density Estimation using Diffusion Models
Yasiru Ranasinghe, Nithin Gopalakrishnan Nair, Wele Gedara Chaminda Bandara et al.
Semantically-Shifted Incremental Adapter-Tuning is A Continual ViTransformer
Yuwen Tan, Qinhao Zhou, Xiang Xiang et al.
Physics-guided Shape-from-Template: Monocular Video Perception through Neural Surrogate Models
David Stotko, Nils Wandel, Reinhard Klein
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
Tong Wu, Guandao Yang, Zhibing Li et al.
Orthogonal Adaptation for Modular Customization of Diffusion Models
Ryan Po, Guandao Yang, Kfir Aberman et al.
End-to-End Spatio-Temporal Action Localisation with Video Transformers
Alexey Gritsenko, Xuehan Xiong, Josip Djolonga et al.
TRINS: Towards Multimodal Language Models that Can Read
Ruiyi Zhang, Yanzhe Zhang, Jian Chen et al.
Unlocking Pre-trained Image Backbones for Semantic Image Synthesis
Tariq Berrada, Jakob Verbeek, Camille Couprie et al.
Infer from What You Have Seen Before: Temporally-dependent Classifier for Semi-supervised Video Segmentation
Jiafan Zhuang, Zilei Wang, Yixin Zhang et al.
RegionGPT: Towards Region Understanding Vision Language Model
Qiushan Guo, Shalini De Mello, Danny Yin et al.
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Xiang Yue, Yuansheng Ni, Kai Zhang et al.
Navigate Beyond Shortcuts: Debiased Learning Through the Lens of Neural Collapse
Yining Wang, Junjie Sun, Chenyue Wang et al.