Most Cited CVPR "mesh retrieval" Papers
5,589 papers found • Page 8 of 28
Conference
BSNet: Box-Supervised Simulation-assisted Mean Teacher for 3D Instance Segmentation
Jiahao Lu, Jiacheng Deng, Tianzhu Zhang
Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation
Ting Liu, Siyuan Li
GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration
Yuchen Sun, Shanhui Zhao, Tao Yu et al.
Characteristics Matching Based Hash Codes Generation for Efficient Fine-grained Image Retrieval
Zhen-Duo Chen, Li-Jun Zhao, Zi-Chao Zhang et al.
EMOE: Modality-Specific Enhanced Dynamic Emotion Experts
Yiyang Fang, Wenke Huang, Guancheng Wan et al.
HIIF: Hierarchical Encoding based Implicit Image Function for Continuous Super-resolution
Yuxuan Jiang, Ho Man Kwan, jasmine peng et al.
Distilling ODE Solvers of Diffusion Models into Smaller Steps
Sanghwan Kim, Hao Tang, Fisher Yu
VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding
Yujie Liang, Xiaobin Hu, Boyuan Jiang et al.
Gazing Into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities
Michele Mazzamuto, Antonino Furnari, Yoichi Sato et al.
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding
Wenhui Liao, Jiapeng Wang, Hongliang Li et al.
CADDreamer: CAD Object Generation from Single-view Images
Yuan Li, Cheng Lin, Yuan Liu et al.
SIRA: Scalable Inter-frame Relation and Association for Radar Perception
Ryoma Yataka, Pu Wang, Petros Boufounos et al.
CCIN: Compositional Conflict Identification and Neutralization for Composed Image Retrieval
Likai Tian, Jian Zhao, Zechao Hu et al.
FedMIA: An Effective Membership Inference Attack Exploiting "All for One" Principle in Federated Learning
Gongxi Zhu, Donghao Li, Hanlin Gu et al.
Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think
Zhenyi Lu, Xiaoye Qu, Zhenyi Lu et al.
S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation
Yichen Xie, Runsheng Xu, Tong He et al.
SaCo Loss: Sample-wise Affinity Consistency for Vision-Language Pre-training
WU Sitong, Haoru Tan, Zhuotao Tian et al.
SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection
Mingxuan Liu, Tyler Hayes, Elisa Ricci et al.
Towards Training-free Anomaly Detection with Vision and Language Foundation Models
Jinjin Zhang, Guodong Wang, yizhou jin et al.
VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction
Zijian He, Yuwei Ning, Yipeng Qin et al.
FSC: Few-point Shape Completion
Xianzu Wu, Xianfeng Wu, Tianyu Luan et al.
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
Jin Wang, Chenghui Lv, Xian Li et al.
EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space
Jianrong Zhang, Hehe Fan, Yi Yang
When the Future Becomes the Past: Taming Temporal Correspondence for Self-supervised Video Representation Learning
Yang Liu, Qianqian Xu, Peisong Wen et al.
SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining
Chull Hwan Song, Taebaek Hwang, Jooyoung Yoon et al.
DreamRelation: Bridging Customization and Relation Generation
Qingyu Shi, Lu Qi, Jianzong Wu et al.
Step Differences in Instructional Video
Tushar Nagarajan, Lorenzo Torresani
Flexible Frame Selection for Efficient Video Reasoning
Shyamal Buch, Arsha Nagrani, Anurag Arnab et al.
Dual-Scale Transformer for Large-Scale Single-Pixel Imaging
Gang Qu, Ping Wang, Xin Yuan
PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation
Qihan Huang, Weilong Dai, Jinlong Liu et al.
DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation
Hongbin Lin, Zilu Guo, Yifan Zhang et al.
Smart Help: Strategic Opponent Modeling for Proactive and Adaptive Robot Assistance in Households
Zhihao Cao, ZiDong Wang, Siwen Xie et al.
ID-Patch: Robust ID Association for Group Photo Personalization
Yimeng Zhang, Tiancheng Zhi, Jing Liu et al.
CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP
Songlong Xing, Zhengyu Zhao, Nicu Sebe
Disentangled Pre-training for Human-Object Interaction Detection
Zhuolong Li, Xingao Li, Changxing Ding et al.
Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution
Qihao Liu, Xi Yin, Alan L. Yuille et al.
Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise
Brayan Monroy, Jorge Bacca, Julián Tachella
Diff2Flow: Training Flow Matching Models via Diffusion Model Alignment
Johannes Schusterbauer, Ming Gui, Frank Fundel et al.
A Theory of Joint Light and Heat Transport for Lambertian Scenes
Mani Ramanagopal, Sriram Narayanan, Aswin C. Sankaranarayanan et al.
RayFlow: Instance-Aware Diffusion Acceleration via Adaptive Flow Trajectories
Huiyang Shao, Xin Xia, Yuhong Yang et al.
MIMO: A Medical Vision Language Model with Visual Referring Multimodal Input and Pixel Grounding Multimodal Output
Yanyuan Chen, Dexuan Xu, Yu Huang et al.
OrthCaps: An Orthogonal CapsNet with Sparse Attention Routing and Pruning
Geng Xinyu, Jiaming Wang, Jiawei Gong et al.
ReCap: Better Gaussian Relighting with Cross-Environment Captures
Jingzhi Li, Zongwei Wu, Eduard Zamfir et al.
Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis
Marianna Ohanyan, Hayk Manukyan, Zhangyang Wang et al.
Spectrum AUC Difference (SAUCD): Human-aligned 3D Shape Evaluation
Tianyu Luan, Zhong Li, Lele Chen et al.
LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images
Jing Zhang, Irving Fang, Hao Wu et al.
Towards Automated Movie Trailer Generation
Dawit Argaw Argaw, Mattia Soldan, Alejandro Pardo et al.
PLeaS - Merging Models with Permutations and Least Squares
Anshul Nasery, Jonathan Hayase, Pang Wei Koh et al.
CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering
Tianyu Huai, Jie Zhou, Xingjiao Wu et al.
Anchor-based Robust Finetuning of Vision-Language Models
Jinwei Han, Zhiwen Lin, Zhongyisun Sun et al.
Gaussian Eigen Models for Human Heads
Wojciech Zielonka, Timo Bolkart, Thabo Beeler et al.
ContextSeg: Sketch Semantic Segmentation by Querying the Context with Attention
Jiawei Wang, Changjian Li
Integrating Efficient Optimal Transport and Functional Maps For Unsupervised Shape Correspondence Learning
Tung Le, Khai Nguyen, Shanlin Sun et al.
MemoNav: Working Memory Model for Visual Navigation
Hongxin Li, Zeyu Wang, Xu Yang et al.
SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving
Su Sun, Cheng Zhao, Zhuoyang Sun et al.
PCDreamer: Point Cloud Completion Through Multi-view Diffusion Priors
Guangshun Wei, Yuan Feng, Long Ma et al.
VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving
Haiming Zhang, Wending Zhou, Shenzhen The Chinese University of Hongkong et al.
V2X-R: Cooperative LiDAR-4D Radar Fusion with Denoising Diffusion for 3D Object Detection
Xun Huang, Jinlong Wang, Qiming Xia et al.
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
Federico Cocchi, Nicholas Moratelli, Marcella Cornia et al.
Neural Point Cloud Diffusion for Disentangled 3D Shape and Appearance Generation
Philipp Schröppel, Christopher Wewer, Jan Lenssen et al.
Text-guided Explorable Image Super-resolution
Kanchana Vaishnavi Gandikota, Paramanand Chandramouli
BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation
Yuyang Peng, Shishi Xiao, Keming Wu et al.
Single Mesh Diffusion Models with Field Latents for Texture Generation
Thomas W. Mitchel, Carlos Esteves, Ameesh Makadia
GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image
Chong Bao, Yinda Zhang, Yuan Li et al.
Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model
Zhaochong An, Guolei Sun, Yun Liu et al.
STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction
Zhimin Liao, Ping Wei, Shuaijia Chen et al.
Zero-Shot Structure-Preserving Diffusion Model for High Dynamic Range Tone Mapping
Ruoxi Zhu, Shusong Xu, Peiye Liu et al.
SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input
Zhen Lv, Yangqi Long, Congzhentao Huang et al.
Towards Accurate and Robust Architectures via Neural Architecture Search
Yuwei Ou, Yuqi Feng, Yanan Sun
DocVLM: Make Your VLM an Efficient Reader
Mor Shpigel Nacson, Aviad Aberdam, Roy Ganz et al.
YolOOD: Utilizing Object Detection Concepts for Multi-Label Out-of-Distribution Detection
Alon Zolfi, Guy AmiT, Amit Baras et al.
Open-World Amodal Appearance Completion
Jiayang Ao, Yanbei Jiang, Qiuhong Ke et al.
Mind The Edge: Refining Depth Edges in Sparsely-Supervised Monocular Depth Estimation
Lior Talker, Aviad Cohen, Erez Yosef et al.
HiMoR: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation
Yiming Liang, Tianhan Xu, Yuta Kikuchi
UnCommon Objects in 3D
Xingchen Liu, Piyush Tayal, Jianyuan Wang et al.
InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment
Yunhong Lu, Qichao Wang, Hengyuan Cao et al.
Language Guided Concept Bottleneck Models for Interpretable Continual Learning
Lu Yu, HaoYu Han, Zhe Tao et al.
FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication
Eric Slyman, Stefan Lee, Scott Cohen et al.
BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models
Zenghui Yuan, Jiawen Shi, Pan Zhou et al.
RELI11D: A Comprehensive Multimodal Human Motion Dataset and Method
Ming Yan, Yan Zhang, Shuqiang Cai et al.
O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models
Ashshak Sharifdeen, Muhammad Akhtar Munir, Sanoojan Baliah et al.
RNG: Relightable Neural Gaussians
Jiahui Fan, Fujun Luan, Jian Yang et al.
High-Quality Facial Geometry and Appearance Capture at Home
Yuxuan Han, Junfeng Lyu, Feng Xu
TurboSL: Dense Accurate and Fast 3D by Neural Inverse Structured Light
Parsa Mirdehghan, Maxx Wu, Wenzheng Chen et al.
Neural Video Compression with Context Modulation
Chuanbo Tang, Zhuoyuan Li, Yifan Bian et al.
Bayesian Test-Time Adaptation for Vision-Language Models
Lihua Zhou, Mao Ye, Shuaifeng Li et al.
MultiGO: Towards Multi-level Geometry Learning for Monocular 3D Textured Human Reconstruction
Gangjian Zhang, Nanjie Yao, Shunsi Zhang et al.
MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining
Yunze Liu, Li Yi
Circumventing Shortcuts in Audio-visual Deepfake Detection Datasets with Unsupervised Learning
Stefan Smeu, Dragos-Alexandru Boldisor, Dan Oneata et al.
Memory-Scalable and Simplified Functional Map Learning
Robin Magnet, Maks Ovsjanikov
GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency
Dongyue Lu, Lingdong Kong, Tianxin Huang et al.
CoA: Towards Real Image Dehazing via Compression-and-Adaptation
Long Ma, Yuxin Feng, Yan Zhang et al.
Visual-Instructed Degradation Diffusion for All-in-One Image Restoration
Haina Qin, Wenyang Luo, Zewen Chen et al.
DIFFER: Disentangling Identity Features via Semantic Cues for Clothes-Changing Person Re-ID
Xin Liang, Yogesh S. Rawat
DreamText: High Fidelity Scene Text Synthesis
Yibin Wang, Weizhong Zhang, honghui xu et al.
Probability Density Geodesics in Image Diffusion Latent Space
Qingtao Yu, Jaskirat Singh, Zhaoyuan Yang et al.
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
Weiming Ren, Huan Yang, Jie Min et al.
Rethinking Query-based Transformer for Continual Image Segmentation
Yuchen Zhu, Cheng Shi, Dingyou Wang et al.
Fully Convolutional Slice-to-Volume Reconstruction for Single-Stack MRI
Sean I. Young, Yaël Balbastre, Bruce Fischl et al.
LoRA Subtraction for Drift-Resistant Space in Exemplar-Free Continual Learning
Xuan Liu, Xiaobin Chang
Trajectory Mamba: Efficient Attention-Mamba Forecasting Model Based on Selective SSM
Yizhou Huang, Yihua Cheng, Kezhi Wang
Semantics-aware Motion Retargeting with Vision-Language Models
Haodong Zhang, ZhiKe Chen, Haocheng Xu et al.
A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives
Simone Alberto Peirone, Francesca Pistilli, Antonio Alliegro et al.
CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools
Chinedu Innocent Nwoye, Kareem elgohary, Anvita A. Srinivas et al.
SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments
Yue Cao, Yun Xing, Jie Zhang et al.
CDFormer: When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution
Qingguo Liu, Chenyi Zhuang, Pan Gao et al.
Layered Image Vectorization via Semantic Simplification
Zhenyu Wang, Jianxi Huang, Zhida Sun et al.
Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks
Peng Xie, Yequan Bie, Jianda Mao et al.
Clockwork Diffusion: Efficient Generation With Model-Step Distillation
Amirhossein Habibian, Amir Ghodrati, Noor Fathima et al.
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training
Haicheng Wang, Chen Ju, Weixiong Lin et al.
Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
Arun Reddy, Alexander Martin, Eugene Yang et al.
Post-pre-training for Modality Alignment in Vision-Language Foundation Models
Shin'ya Yamaguchi, Dewei Feng, Sekitoshi Kanai et al.
Efficient 3D Implicit Head Avatar with Mesh-anchored Hash Table Blendshapes
Ziqian Bai, Feitong Tan, Sean Fanello et al.
Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression
Hancheng Ye, Chong Yu, Peng Ye et al.
From Poses to Identity: Training-Free Person Re-Identification via Feature Centralization
Chao Yuan, Guiwei Zhang, Changxiao Ma et al.
Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy
You Li, Fan Ma, Yi Yang
Relation Rectification in Diffusion Model
Yinwei Wu, Xingyi Yang, Xinchao Wang
LiDAR-RT: Gaussian-based Ray Tracing for Dynamic LiDAR Re-simulation
Chenxu Zhou, Lvchang Fu, Sida Peng et al.
ScribbleLight: Single Image Indoor Relighting with Scribbles
Jun Myeong Choi, Annie N. Wang, Pieter Peers et al.
PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution
Zhu Li Bo, Jianze Li, Haotong Qin et al.
GLASS: Guided Latent Slot Diffusion for Object-Centric Learning
Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth
Distilling Monocular Foundation Model for Fine-grained Depth Completion
Yingping Liang, Yutao Hu, Wenqi Shao et al.
MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking
Haolin Qin, Tingfa Xu, Tianhao Li et al.
Neural Super-Resolution for Real-time Rendering with Radiance Demodulation
Jia Li, Ziling Chen, Xiaolong Wu et al.
GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields
Fangyin Wei, Hanlin Chen, Gim Hee Lee
PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes
Bin Tan, Rui Yu, Yujun Shen et al.
TraF-Align: Trajectory-aware Feature Alignment for Asynchronous Multi-agent Perception
Zhiying Song, Lei Yang, Fuxi Wen et al.
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model
Benlin Liu, Yuhao Dong, Yiqin Wang et al.
BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation
Yulu Pan, Ce Zhang, Gedas Bertasius
Improving Physics-Augmented Continuum Neural Radiance Field-Based Geometry-Agnostic System Identification with Lagrangian Particle Optimization
Takuhiro Kaneko
PreciseCam: Precise Camera Control for Text-to-Image Generation
Edurne Bernal-Berdun, Ana Serrano, Belen Masia et al.
An End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models
Wentao Qu, Jing Wang, Yongshun Gong et al.
SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation
Leigang Qu, Haochuan Li, Wenjie Wang et al.
Active Object Detection with Knowledge Aggregation and Distillation from Large Models
Dejie Yang, Yang Liu
LITA-GS: Illumination-Agnostic Novel View Synthesis via Reference-Free 3D Gaussian Splatting and Physical Priors
Han Zhou, Wei Dong, Jun Chen
Making Visual Sense of Oracle Bones for You and Me
Runqi Qiao, LAN YANG, Kaiyue Pang et al.
KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation
Antoni Bigata Casademunt, Michał Stypułkowski, Rodrigo Mira et al.
Towards More Accurate Diffusion Model Acceleration with A Timestep Tuner
Mengfei Xia, Yujun Shen, Changsong Lei et al.
One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls
Minghui Hu, Jianbin Zheng, Chuanxia Zheng et al.
Fusing Personal and Environmental Cues for Identification and Segmentation of First-Person Camera Wearers in Third-Person Views
Ziwei Zhao, Yuchen Wang, Chuhua Wang
FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation
Dong Zhao, Jinlong Li, Shuang Wang et al.
Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners
Chun Feng, Joy Hsu, Weiyu Liu et al.
Knowledge-Aligned Counterfactual-Enhancement Diffusion Perception for Unsupervised Cross-Domain Visual Emotion Recognition
Wen Yin, Yong Wang, Guiduo Duan et al.
Vanishing-Point-Guided Video Semantic Segmentation of Driving Scenes
Diandian Guo, Deng-Ping Fan, Tongyu Lu et al.
Hierarchical Correlation Clustering and Tree Preserving Embedding
Morteza Haghir Chehreghani, Mostafa Haghir Chehreghani
Anyattack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models
Jiaming Zhang, Junhong Ye, Xingjun Ma et al.
Physical Plausibility-aware Trajectory Prediction via Locomotion Embodiment
Hiromu Taketsugu, Takeru Oba, Takahiro Maeda et al.
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Miran Heo, Min-Hung Chen, De-An Huang et al.
C3Net: Compound Conditioned ControlNet for Multimodal Content Generation
Juntao Zhang, Yuehuai LIU, Yu-Wing Tai et al.
SuperPC: A Single Diffusion Model for Point Cloud Completion, Upsampling, Denoising, and Colorization
Yi Du, Zhipeng Zhao, Shaoshu Su et al.
SAM2Object: Consolidating View Consistency via SAM2 for Zero-Shot 3D Instance Segmentation
Jihuai Zhao, Junbao Zhuo, Jiansheng Chen et al.
AAMDM: Accelerated Auto-regressive Motion Diffusion Model
Tianyu Li, Calvin Zhuhan Qiao, Ren Guanqiao et al.
LesionLocator: Zero-Shot Universal Tumor Segmentation and Tracking in 3D Whole-Body Imaging
Maximilian Rokuss, Yannick Kirchhoff, Seval Akbal et al.
DiG-IN: Diffusion Guidance for Investigating Networks - Uncovering Classifier Differences Neuron Visualisations and Visual Counterfactual Explanations
Maximilian Augustin, Yannic Neuhaus, Matthias Hein
DepthCues: Evaluating Monocular Depth Perception in Large Vision Models
Duolikun Danier, Mehmet Aygun, Changjian Li et al.
UnSAMFlow: Unsupervised Optical Flow Guided by Segment Anything Model
Shuai Yuan, Lei Luo, Zhuo Hui et al.
Dual Prompting Image Restoration with Diffusion Transformers
Dehong Kong, Fan Li, Zhixin Wang et al.
ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos
Tanveer Hannan, Md Mohaiminul Islam, Jindong Gu et al.
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters
Jianping Jiang, Weiye Xiao, Zhengyu Lin et al.
Boosting Flow-based Generative Super-Resolution Models via Learned Prior
Li-Yuan Tsao, Yi-Chen Lo, Chia-Che Chang et al.
HyperGS: Hyperspectral 3D Gaussian Splatting
Christopher Thirgood, Oscar Mendez, Erin Chao Ling et al.
MP-GUI: Modality Perception with MLLMs for GUI Understanding
Ziwei Wang, Weizhi Chen, Leyang Yang et al.
MP-SfM: Monocular Surface Priors for Robust Structure-from-Motion
Zador Pataki, Paul-Edouard Sarlin, Johannes Schönberger et al.
Zero-Shot Styled Text Image Generation, but Make It Autoregressive
Vittorio Pippi, Fabio Quattrini, Silvia Cascianelli et al.
Motion Diversification Networks
Hee Jae Kim, Eshed Ohn-Bar
Seurat: From Moving Points to Depth
Seokju Cho, Gabriel Huang, Seungryong Kim et al.
EventGPT: Event Stream Understanding with Multimodal Large Language Models
shaoyu liu, Jianing Li, guanghui zhao et al.
Bilateral Event Mining and Complementary for Event Stream Super-Resolution
Zhilin Huang, Quanmin Liang, Yijie Yu et al.
ProMotion: Prototypes As Motion Learners
Yawen Lu, Dongfang Liu, Qifan Wang et al.
Online Video Understanding: OVBench and VideoChat-Online
Zhenpeng Huang, Xinhao Li, Jiaqi Li et al.
A Unified and Interpretable Emotion Representation and Expression Generation
Reni Paskaleva, Mykyta Holubakha, Andela Ilic et al.
Lessons and Insights from a Unifying Study of Parameter-Efficient Fine-Tuning (PEFT) in Visual Recognition
Zheda Mai, Ping Zhang, Cheng-Hao Tu et al.
Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation
Yongkang Li, Tianheng Cheng, Bin Feng et al.
UltraFusion: Ultra High Dynamic Imaging using Exposure Fusion
Zixuan Chen, Yujin Wang, Xin Cai et al.
REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning
Jihyun Lee, Weipeng Xu, Alexander Richard et al.
LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty
Christoforos N. Spartalis, Theodoros Semertzidis, Efstratios Gavves et al.
BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting
Yiren Lu, Yunlai Zhou, Disheng Liu et al.
RAD: Region-Aware Diffusion Models for Image Inpainting
Sora Kim, Sungho Suh, Minsik Lee
Automatic Controllable Colorization via Imagination
Xiaoyan Cong, Yue Wu, Qifeng Chen et al.
Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation
Chanyoung Kim, Dayun Ju, Woojung Han et al.
Deformable Radial Kernel Splatting
Yihua Huang, Mingxian Lin, Yangtian Sun et al.
MonoSplat: Generalizable 3D Gaussian Splatting from Monocular Depth Foundation Models
Yifan Liu, Keyu Fan, Weihao Yu et al.
Show and Segment: Universal Medical Image Segmentation via In-Context Learning
Yunhe Gao, Di Liu, Zhuowei Li et al.
PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation
HsiaoYuan Hsu, Yuxin Peng
Rashomon Sets for Prototypical-Part Networks: Editing Interpretable Models in Real-Time
Jon Donnelly, Zhicheng Guo, Alina Jade Barnett et al.
ROD-MLLM: Towards More Reliable Object Detection in Multimodal Large Language Models
Heng Yin, Yuqiang Ren, Ke Yan et al.
HumanRig: Learning Automatic Rigging for Humanoid Character in a Large Scale Dataset
Zedong Chu, Feng Xiong, Meiduo Liu et al.
AniMer: Animal Pose and Shape Estimation Using Family Aware Transformer
Jin Lyu, Tianyi Zhu, Yi Gu et al.
EdgeTAM: On-Device Track Anything Model
Chong Zhou, Chenchen Zhu, Yunyang Xiong et al.
g3D-LF: Generalizable 3D-Language Feature Fields for Embodied Tasks
Zihan Wang, Gim Hee Lee
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs
Xiaoqin Wang, Xusen Ma, Xianxu Hou et al.
SoMA: Singular Value Decomposed Minor Components Adaptation for Domain Generalizable Representation Learning
Seokju Yun, Seunghye Chae, Dongheon Lee et al.
SketchVideo: Sketch-based Video Generation and Editing
Feng-Lin Liu, Hongbo Fu, Xintao Wang et al.
Image Quality Assessment: Investigating Causal Perceptual Effects with Abductive Counterfactual Inference
Wenhao Shen, Mingliang Zhou, Yu Chen et al.
Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing
Hanhui Wang, Yihua Zhang, Ruizheng Bai et al.
RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models
Haoran Hao, Jiaming Han, Changsheng Li et al.
MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation
Mingcheng Li, Xiaolu Hou, Ziyang Liu et al.
ACE: Anti-Editing Concept Erasure in Text-to-Image Models
Zihao Wang, Yuxiang Wei, Fan Li et al.
GenDeg: Diffusion-based Degradation Synthesis for Generalizable All-In-One Image Restoration
Sudarshan Rajagopalan, Nithin Gopalakrishnan Nair, Jay Paranjape et al.
Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images
Kazi Sajeed Mehrab, M. Maruf, Arka Daw et al.