Most Cited CVPR "markov chain estimation" Papers
5,589 papers found • Page 10 of 28
Conference
PICD: Versatile Perceptual Image Compression with Diffusion Rendering
Tongda Xu, Jiahao Li, Bin Li et al.
Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification
Yang Qin, Chao Chen, Zhihang Fu et al.
Generating Multimodal Driving Scenes via Next-Scene Prediction
Yanhao Wu, Haoyang Zhang, Tianwei Lin et al.
Logits DeConfusion with CLIP for Few-Shot Learning
Shuo Li, Fang Liu, Zehua Hao et al.
FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations
Hmrishav Bandyopadhyay, Yi-Zhe Song
Density-guided Translator Boosts Synthetic-to-Real Unsupervised Domain Adaptive Segmentation of 3D Point Clouds
Zhimin Yuan, Wankang Zeng, Yanfei Su et al.
Robust 3D Shape Reconstruction in Zero-Shot from a Single Image in the Wild
Junhyeong Cho, Kim Youwang, Hunmin Yang et al.
Joint Out-of-Distribution Filtering and Data Discovery Active Learning
Sebastian Schmidt, Leonard Schenk, Leo Schwinn et al.
Large Self-Supervised Models Bridge the Gap in Domain Adaptive Object Detection
Marc-Antoine Lavoie, Anas Mahmoud, Steven L. Waslander
It’s a (Blind) Match! Towards Vision-Language Correspondence without Parallel Data
Dominik Schnaus, Nikita Araslanov, Daniel Cremers
MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception
Wenzhuo Liu, Wenshuo Wang, Yicheng Qiao et al.
Golden Cudgel Network for Real-Time Semantic Segmentation
Guoyu Yang, Yuan Wang, Daming Shi et al.
Unveiling the Unknown: Unleashing the Power of Unknown to Known in Open-Set Source-Free Domain Adaptation
Fuli Wan, Han Zhao, Xu Yang et al.
RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings
Aayush Dhakal, Srikumar Sastry, Subash Khanal et al.
Boosting Order-Preserving and Transferability for Neural Architecture Search: a Joint Architecture Refined Search and Fine-tuning Approach
Beichen Zhang, Xiaoxing Wang, Xiaohan Qin et al.
RePerformer: Immersive Human-centric Volumetric Videos from Playback to Photoreal Reperformance
Yuheng Jiang, Zhehao Shen, Chengcheng Guo et al.
Accurate Training Data for Occupancy Map Prediction in Automated Driving Using Evidence Theory
Jonas Kälble, Sascha Wirges, Maxim Tatarchenko et al.
Inversion Circle Interpolation: Diffusion-based Image Augmentation for Data-scarce Classification
Yanghao Wang, Long Chen
4DTAM: Non-Rigid Tracking and Mapping via Dynamic Surface Gaussians
Hidenobu Matsuki, Gwangbin Bae, Andrew J. Davison
EVOS: Efficient Implicit Neural Training via EVOlutionary Selector
Weixiang Zhang, Shuzhao Xie, Chengwei Ren et al.
TurboFill: Adapting Few-step Text-to-image Model for Fast Image Inpainting
Liangbin Xie, Daniil Pakhomov, Zhonghao Wang et al.
Augmented Deep Contexts for Spatially Embedded Video Coding
Yifan Bian, Chuanbo Tang, Li Li et al.
Coupled Laplacian Eigenmaps for Locally-Aware 3D Rigid Point Cloud Matching
Matteo Bastico, Etienne Decencière, Laurent Corté et al.
BHViT: Binarized Hybrid Vision Transformer
Tian Gao, Yu Zhang, Zhiyuan Zhang et al.
Multi-Session SLAM with Differentiable Wide-Baseline Pose Optimization
Lahav Lipson, Jia Deng
COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection
Jinqi Xiao, Shen Sang, Tiancheng Zhi et al.
Prompt Augmentation for Self-supervised Text-guided Image Manipulation
Rumeysa Bodur, Binod Bhattarai, Tae-Kyun Kim
ProKeR: A Kernel Perspective on Few-Shot Adaptation of Large Vision-Language Models
Yassir Bendou, Amine Ouasfi, Vincent Gripon et al.
Geometry-aware Reconstruction and Fusion-refined Rendering for Generalizable Neural Radiance Fields
Tianqi Liu, Xinyi Ye, Min Shi et al.
Generative Sparse-View Gaussian Splatting
Hanyang Kong, Xingyi Yang, Xinchao Wang
EEE-Bench: A Comprehensive Multimodal Electrical And Electronics Engineering Benchmark
Ming Li, Jike Zhong, Tianle Chen et al.
FreeMan: Towards Benchmarking 3D Human Pose Estimation under Real-World Conditions
Jiong WANG, Fengyu Yang, Bingliang Li et al.
VasTSD: Learning 3D Vascular Tree-state Space Diffusion Model for Angiography Synthesis
Zhifeng Wang, Renjiao Yi, Xin Wen et al.
SparseAlign: a Fully Sparse Framework for Cooperative Object Detection
Yunshuang Yuan, Yan Xia, Daniel Cremers et al.
FedAWA: Adaptive Optimization of Aggregation Weights in Federated Learning Using Client Vectors
Changlong Shi, He Zhao, Bingjie Zhang et al.
3D-MVP: 3D Multiview Pretraining for Manipulation
Shengyi Qian, Kaichun Mo, Valts Blukis et al.
DefectFill: Realistic Defect Generation with Inpainting Diffusion Model for Visual Inspection
Jaewoo Song, Daemin Park, Kanghyun Baek et al.
Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation
Shivam Duggal, Yushi Hu, Oscar Michel et al.
GroupMamba: Efficient Group-Based Visual State Space Model
Abdelrahman Shaker, Syed Talal Wasim, Salman Khan et al.
Revisiting Sampson Approximations for Geometric Estimation Problems
Felix Rydell, Angelica Torres, Viktor Larsson
POT: Prototypical Optimal Transport for Weakly Supervised Semantic Segmentation
Jian Wang, Tianhong Dai, Bingfeng Zhang et al.
Diffusion Reflectance Map: Single-Image Stochastic Inverse Rendering of Illumination and Reflectance
Yuto Enyo, Ko Nishino
Incorporating Geo-Diverse Knowledge into Prompting for Increased Geographical Robustness in Object Recognition
Kyle Buettner, Sina Malakouti, Xiang Li et al.
Learning to Rank Patches for Unbiased Image Redundancy Reduction
Yang Luo, Zhineng Chen, Peng Zhou et al.
3D-GSW: 3D Gaussian Splatting for Robust Watermarking
Youngdong Jang, Hyunje Park, Feng Yang et al.
PEACE: Empowering Geologic Map Holistic Understanding with MLLMs
Yangyu Huang, Tianyi Gao, Haoran Xu et al.
Implicit Motion Function
Yue Gao, Jiahao Li, Lei Chu et al.
Fully Geometric Panoramic Localization
Junho Kim, Jiwon Jeong, Young Min Kim
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation
Weinan Jia, Mengqi Huang, Nan Chen et al.
Gaussian Splatting for Efficient Satellite Image Photogrammetry
Luca Savant Aira, Gabriele Facciolo, Thibaud Ehret
Defense without Forgetting: Continual Adversarial Defense with Anisotropic & Isotropic Pseudo Replay
Yuhang Zhou, Zhongyun Hua
Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation
Reza Qorbani, Gianluca Villani, Theodoros Panagiotakopoulos et al.
MambaVLT: Time-Evolving Multimodal State Space Model for Vision-Language Tracking
Xinqi Liu, Li Zhou, Zikun Zhou et al.
OpticalDR: A Deep Optical Imaging Model for Privacy-Protective Depression Recognition
Yuchen Pan, Junjun Jiang, Kui Jiang et al.
CMA: A Chromaticity Map Adapter for Robust Detection of Screen-Recapture Document Images
Changsheng Chen, Liangwei Lin, Yongqi Chen et al.
UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting
Ziyi Wang, Yanran Zhang, Jie Zhou et al.
TetraSphere: A Neural Descriptor for O(3)-Invariant Point Cloud Analysis
Pavlo Melnyk, Andreas Robinson, Michael Felsberg et al.
Scene-Centric Unsupervised Panoptic Segmentation
Oliver Hahn, Christoph Reich, Nikita Araslanov et al.
3D-Aware Face Editing via Warping-Guided Latent Direction Learning
Yuhao Cheng, Zhuo Chen, Xingyu Ren et al.
Point Clouds Meets Physics: Dynamic Acoustic Field Fitting Network for Point Cloud Understanding
Changshuo Wang, Shuting He, Xiang Fang et al.
PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model
Mingju Gao, Yike Pan, Huan-ang Gao et al.
MATCHA: Towards Matching Anything
Fei Xue, Sven Elflein, Laura Leal-Taixe et al.
Rethinking Spiking Self-Attention Mechanism: Implementing α-XNOR Similarity Calculation in Spiking Transformers
Yichen Xiao, Shuai Wang, Dehao Zhang et al.
CL-LoRA: Continual Low-Rank Adaptation for Rehearsal-Free Class-Incremental Learning
Jiangpeng He, Zhihao Duan, Fengqing Zhu
Grounding 3D Object Affordance with Language Instructions, Visual Observations and Interactions
He Zhu, Quyu Kong, Kechun Xu et al.
Dual-Enhanced Coreset Selection with Class-wise Collaboration for Online Blurry Class Incremental Learning
Yutian Luo, Shiqi Zhao, Haoran Wu et al.
Improving Out-of-Distribution Generalization in Graphs via Hierarchical Semantic Environments
Yinhua Piao, Sangseon Lee, Yijingxiu Lu et al.
Guiding Human-Object Interactions with Rich Geometry and Relations
Mengqing Xue, Yifei Liu, Ling Guo et al.
MIRE: Matched Implicit Neural Representations
Dhananjaya Jayasundara, Heng Zhao, Demetrio Labate et al.
ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
Tomas Soucek, Prajwal Gatti, Michael Wray et al.
Flexible Depth Completion for Sparse and Varying Point Densities
Jinhyung Park, Yu-Jhe Li, Kris Kitani
Hearing Anywhere in Any Environment
Xiulong Liu, Anurag Kumar, Paul Calamia et al.
Turbo3D: Ultra-fast Text-to-3D Generation
Hanzhe Hu, Tianwei Yin, Fujun Luan et al.
TCFG: Tangential Damping Classifier-free Guidance
Mingi Kwon, Shin seong Kim, Jaeseok Jeong et al.
Enhancing SAM with Efficient Prompting and Preference Optimization for Semi-supervised Medical Image Segmentation
Aishik Konwer, Zhijian Yang, Erhan Bas et al.
CNC-Net: Self-Supervised Learning for CNC Machining Operations
Mohsen Yavartanoo, Sangmin Hong, Reyhaneh Neshatavar et al.
Denoising Functional Maps: Diffusion Models for Shape Correspondence
Aleksei Zhuravlev, Zorah Lähner, Vladislav Golyanik
Binarized Mamba-Transformer for Lightweight Quad Bayer HybridEVS Demosaicing
Shiyang Zhou, Haijin Zeng, Yunfan Lu et al.
Laplacian-guided Entropy Model in Neural Codec with Blur-dissipated Synthesis
Atefeh Khoshkhahtinat, Ali Zafari, Piyush Mehta et al.
AutoURDF: Unsupervised Robot Modeling from Point Cloud Frames Using Cluster Registration
Jiong Lin, Lechen Zhang, Kwansoo Lee et al.
NVComposer: Boosting Generative Novel View Synthesis with Multiple Sparse and Unposed Images
Lingen Li, Zhaoyang Zhang, Yaowei Li et al.
SimMotionEdit: Text-Based Human Motion Editing with Motion Similarity Prediction
Zhengyuan Li, Kai Cheng, Anindita Ghosh et al.
Seq2Time: Sequential Knowledge Transfer for Video LLM Temporal Grounding
Andong Deng, Zhongpai Gao, Anwesa Choudhuri et al.
Functionality Understanding and Segmentation in 3D Scenes
Jaime Corsetti, Francesco Giuliari, Alice Fasoli et al.
Exploring Simple Open-Vocabulary Semantic Segmentation
Zihang Lai
Lift3D Policy: Lifting 2D Foundation Models for Robust 3D Robotic Manipulation
Yueru Jia, Jiaming Liu, Sixiang Chen et al.
AnomalyNCD: Towards Novel Anomaly Class Discovery in Industrial Scenarios
Ziming Huang, Xurui Li, Haotian Liu et al.
CoG-DQA: Chain-of-Guiding Learning with Large Language Models for Diagram Question Answering
Shaowei Wang, Lingling Zhang, Longji Zhu et al.
Epistemic Uncertainty Quantification For Pre-Trained Neural Networks
Hanjing Wang, Qiang Ji
PEGASUS: Personalized Generative 3D Avatars with Composable Attributes
Hyunsoo Cha, Byungjun Kim, Hanbyul Joo
Multi-modal Knowledge Distillation-based Human Trajectory Forecasting
Jaewoo Jeong, Seohee Lee, Daehee Park et al.
Detail-Preserving Latent Diffusion for Stable Shadow Removal
Jiamin Xu, Yuxin Zheng, Zelong Li et al.
FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion
Haosen Yang, Adrian Bulat, Isma Hadji et al.
AlphaPre: Amplitude-Phase Disentanglement Model for Precipitation Nowcasting
Kenghong Lin, Baoquan Zhang, Demin Yu et al.
From Sparse to Dense: Camera Relocalization with Scene-Specific Detector from Feature Gaussian Splatting
Zhiwei Huang, Hailin Yu, Yichun Shentu et al.
Uncertain Multimodal Intention and Emotion Understanding in the Wild
Qu Yang, QingHongYa Shi, Tongxin Wang et al.
A Unified Model for Compressed Sensing MRI Across Undersampling Patterns
Armeet Singh Jatyani, Jiayun Wang, Aditi Chandrashekar et al.
StyleSSP: Sampling StartPoint Enhancement for Training-free Diffusion-based Method for Style Transfer
ruojun xu, Weijie Xi, Xiaodi Wang et al.
VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment
Darshana Saravanan, Varun Gupta, Darshan Singh S et al.
Cross-view and Cross-pose Completion for 3D Human Understanding
Matthieu Armando, Salma Galaaoui, Fabien Baradel et al.
MOS: Modeling Object-Scene Associations in Generalized Category Discovery
Zhengyuan Peng, Jinpeng Ma, Zhimin Sun et al.
How Do I Do That? Synthesizing 3D Hand Motion and Contacts for Everyday Interactions
Aditya Prakash, Benjamin E Lundell, Dmitry Andreychuk et al.
Improving Gaussian Splatting with Localized Points Management
Haosen Yang, Chenhao Zhang, Wenqing Wang et al.
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
Siyuan Li, Luyuan Zhang, Zedong Wang et al.
Combining Frame and GOP Embeddings for Neural Video Representation
Jens Eirik Saethre, Roberto Azevedo, Christopher Schroers
FreqDebias: Towards Generalizable Deepfake Detection via Consistency-Driven Frequency Debiasing
Hossein Kashiani, Niloufar Alipour Talemi, Fatemeh Afghah
DyCON: Dynamic Uncertainty-aware Consistency and Contrastive Learning for Semi-supervised Medical Image Segmentation
Maregu Assefa, Muzammal Naseer, IYYAKUTTI IYAPPAN GANAPATHI et al.
Motion Modes: What Could Happen Next?
Karran Pandey, Yannick Hold-Geoffroy, Matheus Gadelha et al.
RCL: Reliable Continual Learning for Unified Failure Detection
Fei Zhu, Zhen Cheng, Xu-Yao Zhang et al.
HybridGS: Decoupling Transients and Statics with 2D and 3D Gaussian Splatting
Jingyu Lin, Jiaqi Gu, Lubin Fan et al.
Efficient Multitask Dense Predictor via Binarization
Yuzhang Shang, Dan Xu, Gaowen Liu et al.
Show and Tell: Visually Explainable Deep Neural Nets via Spatially-Aware Concept Bottleneck Models
Itay Benou, Tammy Riklin Raviv
ArtiScene: Language-Driven Artistic 3D Scene Generation Through Image Intermediary
Zeqi Gu, Yin Cui, Max Li et al.
Mind Artist: Creating Artistic Snapshots with Human Thought
Jiaxuan Chen, Yu Qi, Yueming Wang et al.
Federated Learning with Domain Shift Eraser
Zheng Wang, Zihui Wang, Zheng Wang et al.
Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis
Arpita Chowdhury, Dipanjyoti Paul, Zheda Mai et al.
In-Context Matting
He Guo, Zixuan Ye, Zhiguo Cao et al.
3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View Transformation
Gyeongrok Oh, Sung June Kim, Heeju Ko et al.
Neuro-Symbolic Evaluation of Text-to-Video Models using Formal Verification
S P Sharan, Minkyu Choi, Sahil Shah et al.
GaussHDR: High Dynamic Range Gaussian Splatting via Learning Unified 3D and 2D Local Tone Mapping
Jinfeng Liu, Lingtong Kong, Bo Li et al.
SaMam: Style-aware State Space Model for Arbitrary Image Style Transfer
Hongda Liu, Longguang Wang, Ye Zhang et al.
Enhancing Privacy-Utility Trade-offs to Mitigate Memorization in Diffusion Models
Chen Chen, Daochang Liu, Mubarak Shah et al.
Language-Guided Audio-Visual Learning for Long-Term Sports Assessment
Huangbiao Xu, Xiao Ke, Huanqi Wu et al.
DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models
Yongqi Huang, Peng Ye, Chenyu Huang et al.
Chat-based Person Retrieval via Dialogue-Refined Cross-Modal Alignment
Yang Bai, Yucheng Ji, Min Cao et al.
MultiVENT 2.0: A Massive Multilingual Benchmark for Event-Centric Video Retrieval
Reno Kriz, Kate Sanders, David Etter et al.
GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation
Weihang Li, Hongli XU, Junwen Huang et al.
Satellite Observations Guided Diffusion Model for Accurate Meteorological States at Arbitrary Resolution
Siwei Tu, Ben Fei, Weidong Yang et al.
Towards In-the-wild 3D Plane Reconstruction from a Single Image
Jiachen Liu, Rui Yu, Sili Chen et al.
Question-Aware Gaussian Experts for Audio-Visual Question Answering
Hongyeob Kim, Inyoung Jung, Dayoon Suh et al.
UHD-processer: Unified UHD Image Restoration with Progressive Frequency Learning and Degradation-aware Prompts
Yidi Liu, Dong Li, Xueyang Fu et al.
CoMBO: Conflict Mitigation via Branched Optimization for Class Incremental Segmentation
Kai Fang, Anqi Zhang, Guangyu Gao et al.
Interpretable Generative Models through Post-hoc Concept Bottlenecks
Akshay R. Kulkarni, Ge Yan, Chung-En Sun et al.
RealEdit: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations
Peter Sushko, Ayana Bharadwaj, Zhi Yang Lim et al.
DiTASK: Multi-Task Fine-Tuning with Diffeomorphic Transformations
Krishna Sri Ipsit Mantri, Carola-Bibiane Schönlieb, Bruno Ribeiro et al.
Revisiting Source-Free Domain Adaptation: Insights into Representativeness, Generalization, and Variety
Ronghang Zhu, Mengxuan Hu, Weiming Zhuang et al.
Fully Exploiting Every Real Sample: SuperPixel Sample Gradient Model Stealing
Yunlong Zhao, Xiaoheng Deng, Yijing Liu et al.
EFormer: Enhanced Transformer towards Semantic-Contour Features of Foreground for Portraits Matting
Zitao Wang, Qiguang Miao, Yue Xi et al.
OnlineAnySeg: Online Zero-Shot 3D Segmentation by Visual Foundation Model Guided 2D Mask Merging
Yijie Tang, Jiazhao Zhang, Yuqing Lan et al.
Normalizing Flows on the Product Space of SO(3) Manifolds for Probabilistic Human Pose Modeling
Olaf Dünkel, Tim Salzmann, Florian Pfaff
FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity
Jinxi Li, Ziyang Song, Siyuan Zhou et al.
Interpretable Image Classification via Non-parametric Part Prototype Learning
Zhijie Zhu, Lei Fan, Maurice Pagnucco et al.
Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input
Jian Wang, Rishabh Dabral, Diogo Luvizon et al.
Hardware-Rasterized Ray-Based Gaussian Splatting
Samuel Rota Bulò, Lorenzo Porzi, Nemanja Bartolovic et al.
CTRL-D: Controllable Dynamic 3D Scene Editing with Personalized 2D Diffusion
Kai He, Chin-Hsuan Wu, Igor Gilitschenski
OCRT: Boosting Foundation Models in the Open World with Object-Concept-Relation Triad
Luyao Tang, Chaoqi Chen, Yuxuan Yuan et al.
NECA: Neural Customizable Human Avatar
Junjin Xiao, Qing Zhang, Zhan Xu et al.
EchoONE: Segmenting Multiple Echocardiography Planes in One Model
Jiongtong Hu, Wei Zhuo, Jun Cheng et al.
Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection
Gensheng Pei, Tao Chen, Yujia Wang et al.
Unsegment Anything by Simulating Deformation
Jiahao Lu, Xingyi Yang, Xinchao Wang
4Deform: Neural Surface Deformation for Robust Shape Interpolation
Lu Sang, Zehranaz Canfes, Dongliang Cao et al.
From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport
Quentin Bouniot, Ievgen Redko, Anton Mallasto et al.
Intensity-Robust Autofocus for Spike Camera
Changqing Su, Zhiyuan Ye, Yongsheng Xiao et al.
IMFine: 3D Inpainting via Geometry-guided Multi-view Refinement
Zhihao Shi, Dong Huo, Yuhongze Zhou et al.
DecoupledGaussian: Object-Scene Decoupling for Physics-Based Interaction
Miaowei Wang, Yibo Zhang, Rui Ma et al.
RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments
Haisheng Su, Feixiang Song, CONG MA et al.
Stable-SCore: A Stable Registration-based Framework for 3D Shape Correspondence
Haolin Liu, Xiaohang Zhan, Zizheng Yan et al.
Flow-Guided Online Stereo Rectification for Wide Baseline Stereo
Anush Kumar, Fahim Mannan, Omid Hosseini Jafari et al.
On the Zero-shot Adversarial Robustness of Vision-Language Models: A Truly Zero-shot and Training-free Approach
Baoshun Tong, Hanjiang Lai, Yan Pan et al.
Self-Learning Hyperspectral and Multispectral Image Fusion via Adaptive Residual Guided Subspace Diffusion Model
Jian Zhu, He Wang, Yang Xu et al.
Understanding Fine-tuning CLIP for Open-vocabulary Semantic Segmentation in Hyperbolic Space
Zelin Peng, Zhengqin Xu, Zhilin Zeng et al.
UIBDiffusion: Universal Imperceptible Backdoor Attack for Diffusion Models
Yuning Han, Bingyin Zhao, Rui Chu et al.
Learning Heterogeneous Tissues with Mixture of Experts for Gigapixel Whole Slide Images
Junxian Wu, Minheng Chen, Xinyi Ke et al.
Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts
Qizhou Chen, Chengyu Wang, Dakan Wang et al.
Generalizable Face Landmarking Guided by Conditional Face Warping
Jiayi Liang, Haotian Liu, Hongteng Xu et al.
A Unified Framework for Human-centric Point Cloud Video Understanding
Yiteng Xu, Kecheng Ye, xiao han et al.
Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation
Xin Yan, Yuxuan Cai, Qiuyue Wang et al.
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Chenxin Tao, Shiqian Su, Xizhou Zhu et al.
SocialGesture: Delving into Multi-person Gesture Understanding
Xu Cao, Pranav Virupaksha, Wenqi Jia et al.
ProbeSDF: Light Field Probes For Neural Surface Reconstruction
Briac Toussaint, Diego Thomas, Jean-Sébastien Franco
Probabilistic Sampling of Balanced K-Means using Adiabatic Quantum Computing
Jan-Nico Zaech, Martin Danelljan, Tolga Birdal et al.
DA-VPT: Semantic-Guided Visual Prompt Tuning for Vision Transformers
Li Ren, Chen Chen, Liqiang Wang et al.
AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward
Haonan Han, Xiangzuo Wu, Huan Liao et al.
TSAM: Temporal SAM Augmented with Multimodal Prompts for Referring Audio-Visual Segmentation
Abduljalil Radman, Jorma Laaksonen
Learning to Navigate Efficiently and Precisely in Real Environments
Guillaume Bono, Hervé Poirier, Leonid Antsfeld et al.
BF-STVSR: B-Splines and Fourier---Best Friends for High Fidelity Spatial-Temporal Video Super-Resolution
Eunjin Kim, HYEONJIN KIM, Kyong Hwan Jin et al.
Context-Enhanced Memory-Refined Transformer for Online Action Detection
Zhanzhong Pang, Fadime Sener, Angela Yao
HandOS: 3D Hand Reconstruction in One Stage
Xingyu Chen, Zhuheng Song, Xiaoke Jiang et al.
FluxSpace: Disentangled Semantic Editing in Rectified Flow Models
Yusuf Dalva, Kavana Venkatesh, Pinar Yanardag
Uni-Renderer: Unifying Rendering and Inverse Rendering Via Dual Stream Diffusion
ZhiFei Chen, Tianshuo Xu, Wenhang Ge et al.
Open-World Objectness Modeling Unifies Novel Object Detection
Shan Zhang, Yao Ni, Jinhao Du et al.
Removing Reflections from RAW Photos
Eric Kee, Adam Pikielny, Kevin Blackburn-Matzen et al.
One2Any: One-Reference 6D Pose Estimation for Any Object
Mengya Liu, Siyuan Li, Ajad Chhatkuli et al.
Multi-party Collaborative Attention Control for Image Customization
Han Yang, Chuanguang Yang, Qiuli Wang et al.
BerfScene: Bev-conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation
Qihang Zhang, Yinghao Xu, Yujun Shen et al.
Edit One for All: Interactive Batch Image Editing
Thao Nguyen, Utkarsh Ojha, Yuheng Li et al.
Exact: Exploring Space-Time Perceptive Clues for Weakly Supervised Satellite Image Time Series Semantic Segmentation
Hao Zhu, Yan Zhu, Jiayu Xiao et al.
Towards Robust 3D Pose Transfer with Adversarial Learning
Haoyu Chen, Hao Tang, Ehsan Adeli et al.
Multi-Resolution Pathology-Language Pre-training Model with Text-Guided Visual Representation
Shahad Albastaki, Anabia Sohail, IYYAKUTTI IYAPPAN GANAPATHI et al.
Adaptive Non-Uniform Timestep Sampling for Accelerating Diffusion Model Training
Myunsoo Kim, Donghyeon Ki, Seong-Woong Shim et al.
From Elements to Design: A Layered Approach for Automatic Graphic Design Composition
Jiawei Lin, Shizhao Sun, Danqing Huang et al.
Semantic and Expressive Variations in Image Captions Across Languages
Andre Ye, Sebastin Santy, Jena D. Hwang et al.
Learning with Unreliability: Fast Few-shot Voxel Radiance Fields with Relative Geometric Consistency
Xu Yingjie, Bangzhen Liu, Hao Tang et al.
Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery
Yuqi Zhang, Guanying Chen, Jiaxing Chen et al.
Self-supervised Debiasing Using Low Rank Regularization
Geon Yeong Park, Chanyong Jung, Sangmin Lee et al.
Audio-Visual Semantic Graph Network for Audio-Visual Event Localization
Liang Liu, Shuaiyong Li, Yongqiang Zhu
Robust Self-calibration of Focal Lengths from the Fundamental Matrix
Viktor Kocur, Daniel Kyselica, Zuzana Kukelova
InteractionMap: Improving Online Vectorized HDMap Construction with Interaction
Kuang Wu, Chuan Yang, Zhanbin Li
HUSH: Holistic Panoramic 3D Scene Understanding using Spherical Harmonics
Jongsung Lee, HARIN PARK, Byeong-Uk Lee et al.
Visual Objectification in Films: Towards a New AI Task for Video Interpretation
Julie Tores, Lucile Sassatelli, Hui-Yin Wu et al.