Most Cited CVPR "orientation-aligned 3d generation" Papers
5,589 papers found • Page 8 of 28
Conference
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
Muzhi Zhu, Yuzhuo Tian, Hao Chen et al.
Privacy-Preserving Optics for Enhancing Protection in Face De-Identification
Jhon Lopez, Carlos Hinojosa, Henry Arguello et al.
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
Jiange Yang, Haoyi Zhu, Yating Wang et al.
From Activation to Initialization: Scaling Insights for Optimizing Neural Fields
Hemanth Saratchandran, Sameera Ramasinghe, Simon Lucey
Detecting Backdoor Attacks in Federated Learning via Direction Alignment Inspection
Jiahao Xu, Zikai Zhang, Rui Hu
Robust Depth Enhancement via Polarization Prompt Fusion Tuning
Kei IKEMURA, Yiming Huang, Felix Heide et al.
GaussianSpa: An “Optimizing-Sparsifying” Simplification Framework for Compact and High-Quality 3D Gaussian Splatting
Yangming Zhang, Wenqi Jia, Wei Niu et al.
Rectified Diffusion Guidance for Conditional Generation
Mengfei Xia, Nan Xue, Yujun Shen et al.
NeRFPrior: Learning Neural Radiance Field as a Prior for Indoor Scene Reconstruction
Wenyuan Zhang, Emily Yue-ting Jia, Junsheng Zhou et al.
GenPC: Zero-shot Point Cloud Completion via 3D Generative Priors
An Li, Zhe Zhu, Mingqiang Wei
Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models
Jinho Jeong, Sangmin Han, Jinwoo Kim et al.
GaussianUDF: Inferring Unsigned Distance Functions through 3D Gaussian Splatting
Shujuan Li, Yu-Shen Liu, Zhizhong Han
BiPer: Binary Neural Networks using a Periodic Function
Edwin Vargas, Claudia Correa, Carlos Hinojosa et al.
Bridging the Gap Between End-to-End and Two-Step Text Spotting
Mingxin Huang, Hongliang Li, Yuliang Liu et al.
UltrAvatar: A Realistic Animatable 3D Avatar Diffusion Model with Authenticity Guided Textures
Mingyuan Zhou, Rakib Hyder, Ziwei Xuan et al.
4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video
Qiang Hu, Zihan Zheng, Houqiang Zhong et al.
DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation
Hongbin Lin, Zilu Guo, Yifan Zhang et al.
EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis
Sheng Miao, Jiaxin Huang, Dongfeng Bai et al.
Towards Understanding and Improving Adversarial Robustness of Vision Transformers
Samyak Jain, Tanima Dutta
TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting
Jianchuan Chen, Jingchuan Hu, Gaige Wang et al.
BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers
Hui Zhang, Tingwei Gao, Jie Shao et al.
NoT: Federated Unlearning via Weight Negation
Yasser Khalil, Leo Maxime Brunswic, Soufiane Lamghari et al.
SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding
Yangliu Hu, Zikai Song, Na Feng et al.
SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos
Changan Chen, Kumar Ashutosh, Rohit Girdhar et al.
SIGNeRF: Scene Integrated Generation for Neural Radiance Fields
Jan-Niklas Dihlmann, Andreas Engelhardt, Hendrik Lensch
VladVA: Discriminative Fine-tuning of LVLMs
Yassine Ouali, Adrian Bulat, ALEXANDROS XENOS et al.
TopoCellGen: Generating Histopathology Cell Topology with a Diffusion Model
Meilong Xu, Saumya Gupta, Xiaoling Hu et al.
SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation
Duc-Hai Pham, Tung Do, Phong Nguyen et al.
Data-Efficient Unsupervised Interpolation Without Any Intermediate Frame for 4D Medical Images
JungEun Kim, Hangyul Yoon, Geondo Park et al.
Data Synthesis with Diverse Styles for Face Recognition via 3DMM-Guided Diffusion
Yuxi Mi, Zhizhou Zhong, Yuge Huang et al.
Semantic and Sequential Alignment for Referring Video Object Segmentation
Feiyu Pan, Hao Fang, Fangkai Li et al.
StraightPCF: Straight Point Cloud Filtering
Dasith de Silva Edirimuni, Xuequan Lu, Gang Li et al.
SwiftEdit: Lightning Fast Text-Guided Image Editing via One-Step Diffusion
Trong-Tung Nguyen, Quang Nguyen, Khoi Nguyen et al.
Finsler-Laplace-Beltrami Operators with Application to Shape Analysis
Simon Weber, Thomas Dagès, Maolin Gao et al.
Multi-view Reconstruction via SfM-guided Monocular Depth Estimation
Haoyu Guo, He Zhu, Sida Peng et al.
SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining
Chull Hwan Song, Taebaek Hwang, Jooyoung Yoon et al.
Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes
Lihan Jiang, Kerui Ren, Mulin Yu et al.
MIMO: A Medical Vision Language Model with Visual Referring Multimodal Input and Pixel Grounding Multimodal Output
Yanyuan Chen, Dexuan Xu, Yu Huang et al.
DreamRelation: Bridging Customization and Relation Generation
Qingyu Shi, Lu Qi, Jianzong Wu et al.
Towards Training-free Anomaly Detection with Vision and Language Foundation Models
Jinjin Zhang, Guodong Wang, yizhou jin et al.
Flexible Frame Selection for Efficient Video Reasoning
Shyamal Buch, Arsha Nagrani, Anurag Arnab et al.
Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis
Marianna Ohanyan, Hayk Manukyan, Zhangyang Wang et al.
S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation
Yichen Xie, Runsheng Xu, Tong He et al.
OrthCaps: An Orthogonal CapsNet with Sparse Attention Routing and Pruning
Geng Xinyu, Jiaming Wang, Jiawei Gong et al.
LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images
Jing Zhang, Irving Fang, Hao Wu et al.
HIIF: Hierarchical Encoding based Implicit Image Function for Continuous Super-resolution
Yuxuan Jiang, Ho Man Kwan, jasmine peng et al.
Unsigned Orthogonal Distance Fields: An Accurate Neural Implicit Representation for Diverse 3D Shapes
YuJie Lu, Long Wan, Nayu Ding et al.
PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation
Qihan Huang, Weilong Dai, Jinlong Liu et al.
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
Jin Wang, Chenghui Lv, Xian Li et al.
ID-Patch: Robust ID Association for Group Photo Personalization
Yimeng Zhang, Tiancheng Zhi, Jing Liu et al.
CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP
Songlong Xing, Zhengyu Zhao, Nicu Sebe
Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping
Hyeongjun Kwon, Jinhyun Jang, Jin Kim et al.
Towards Accurate and Robust Architectures via Neural Architecture Search
Yuwei Ou, Yuqi Feng, Yanan Sun
Anchor-based Robust Finetuning of Vision-Language Models
Jinwei Han, Zhiwen Lin, Zhongyisun Sun et al.
RayFlow: Instance-Aware Diffusion Acceleration via Adaptive Flow Trajectories
Huiyang Shao, Xin Xia, Yuhong Yang et al.
Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise
Brayan Monroy, Jorge Bacca, Julián Tachella
ResCLIP: Residual Attention for Training-free Dense Vision-language Inference
Jinhong Deng, Yuhang Yang, Wen Li et al.
Gaussian Eigen Models for Human Heads
Wojciech Zielonka, Timo Bolkart, Thabo Beeler et al.
Continuous Optical Zooming: A Benchmark for Arbitrary-Scale Image Super-Resolution in Real World
Huiyuan Fu, Fei Peng, Xianwei Li et al.
PLeaS - Merging Models with Permutations and Least Squares
Anshul Nasery, Jonathan Hayase, Pang Wei Koh et al.
CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering
Tianyu Huai, Jie Zhou, Xingjiao Wu et al.
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
Federico Cocchi, Nicholas Moratelli, Marcella Cornia et al.
EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation
Zihao Zhang, Haoran Chen, Haoyu Zhao et al.
Temporal Separation with Entropy Regularization for Knowledge Distillation in Spiking Neural Networks
Kairong Yu, Chengting Yu, Tianqing Zhang et al.
DocVLM: Make Your VLM an Efficient Reader
Mor Shpigel Nacson, Aviad Aberdam, Roy Ganz et al.
RGBAvatar: Reduced Gaussian Blendshapes for Online Modeling of Head Avatars
Linzhou Li, Yumeng Li, Yanlin Weng et al.
V2X-R: Cooperative LiDAR-4D Radar Fusion with Denoising Diffusion for 3D Object Detection
Xun Huang, Jinlong Wang, Qiming Xia et al.
Mind The Edge: Refining Depth Edges in Sparsely-Supervised Monocular Depth Estimation
Lior Talker, Aviad Cohen, Erez Yosef et al.
PCDreamer: Point Cloud Completion Through Multi-view Diffusion Priors
Guangshun Wei, Yuan Feng, Long Ma et al.
Trajectory Mamba: Efficient Attention-Mamba Forecasting Model Based on Selective SSM
Yizhou Huang, Yihua Cheng, Kezhi Wang
DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation
Zhiqiang Shen, Ammar Sherif, Zeyuan Yin et al.
Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM
Tongyan Hua, Addison, Lin Wang
VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction
Zijian He, Yuwei Ning, Yipeng Qin et al.
Synergistic Global-space Camera and Human Reconstruction from Videos
Yizhou Zhao, Tuanfeng Y. Wang, Bhiksha Raj et al.
Spectrum AUC Difference (SAUCD): Human-aligned 3D Shape Evaluation
Tianyu Luan, Zhong Li, Lele Chen et al.
Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation
Ting Liu, Siyuan Li
Video Summarization with Large Language Models
Min Jung Lee, Dayoung Gong, Minsu Cho
Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model
Zhaochong An, Guolei Sun, Yun Liu et al.
Reconstructing People, Places, and Cameras
Lea Müller, Hongsuk Choi, Anthony Zhang et al.
ReCap: Better Gaussian Relighting with Cross-Environment Captures
Jingzhi Li, Zongwei Wu, Eduard Zamfir et al.
BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation
Yuyang Peng, Shishi Xiao, Keming Wu et al.
Layered Image Vectorization via Semantic Simplification
Zhenyu Wang, Jianxi Huang, Zhida Sun et al.
Open-World Amodal Appearance Completion
Jiayang Ao, Yanbei Jiang, Qiuhong Ke et al.
InsightEdit: Towards Better Instruction Following for Image Editing
Yingjing Xu, Jie Kong, Jiazhi Wang et al.
Task-Agnostic Guided Feature Expansion for Class-Incremental Learning
Bowen Zheng, Da-Wei Zhou, Han-Jia Ye et al.
Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks
Peng Xie, Yequan Bie, Jianda Mao et al.
CCIN: Compositional Conflict Identification and Neutralization for Composed Image Retrieval
Likai Tian, Jian Zhao, Zechao Hu et al.
VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding
Yujie Liang, Xiaobin Hu, Boyuan Jiang et al.
SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input
Zhen Lv, Yangqi Long, Congzhentao Huang et al.
Data-Efficient Multimodal Fusion on a Single GPU
Noël Vouitsis, Zhaoyan Liu, Satya Krishna Gorti et al.
VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving
Haiming Zhang, Wending Zhou, Shenzhen The Chinese University of Hongkong et al.
ContextSeg: Sketch Semantic Segmentation by Querying the Context with Attention
Jiawei Wang, Changjian Li
Integrating Efficient Optimal Transport and Functional Maps For Unsupervised Shape Correspondence Learning
Tung Le, Khai Nguyen, Shanlin Sun et al.
SpikeNeRF: Learning Neural Radiance Fields from Continuous Spike Stream
Lin Zhu, Kangmin Jia, Yifan Zhao et al.
SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving
Su Sun, Cheng Zhao, Zhuoyang Sun et al.
HOIGPT: Learning Long-Sequence Hand-Object Interaction with Language Models
Mingzhen Huang, Fu-Jen Chu, Bugra Tekin et al.
MemoNav: Working Memory Model for Visual Navigation
Hongxin Li, Zeyu Wang, Xu Yang et al.
GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration
Yuchen Sun, Shanhui Zhao, Tao Yu et al.
BSNet: Box-Supervised Simulation-assisted Mean Teacher for 3D Instance Segmentation
Jiahao Lu, Jiacheng Deng, Tianzhu Zhang
Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think
Zhenyi Lu, Xiaoye Qu, Zhenyi Lu et al.
Distilling ODE Solvers of Diffusion Models into Smaller Steps
Sanghwan Kim, Hao Tang, Fisher Yu
Learning from One Continuous Video Stream
Joao Carreira, Michael King, Viorica Patraucean et al.
Smart Help: Strategic Opponent Modeling for Proactive and Adaptive Robot Assistance in Households
Zhihao Cao, ZiDong Wang, Siwen Xie et al.
Point2RBox-v2: Rethinking Point-supervised Oriented Object Detection with Spatial Layout Among Instances
Yi Yu, Botao Ren, Peiyuan Zhang et al.
Tuning the Frequencies: Robust Training for Sinusoidal Neural Networks
Tiago Novello, Diana Aldana Moreno, André Araujo et al.
Dual-Scale Transformer for Large-Scale Single-Pixel Imaging
Gang Qu, Ping Wang, Xin Yuan
Efficient Privacy-Preserving Visual Localization Using 3D Ray Clouds
Heejoon Moon, Chunghwan Lee, Je Hyeong Hong
EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space
Jianrong Zhang, Hehe Fan, Yi Yang
Towards Automated Movie Trailer Generation
Dawit Argaw Argaw, Mattia Soldan, Alejandro Pardo et al.
LTA-PCS: Learnable Task-Agnostic Point Cloud Sampling
Jiaheng Liu, Jianhao Li, Kaisiyuan Wang et al.
Disentangled Pre-training for Human-Object Interaction Detection
Zhuolong Li, Xingao Li, Changxing Ding et al.
Zero-Shot Structure-Preserving Diffusion Model for High Dynamic Range Tone Mapping
Ruoxi Zhu, Shusong Xu, Peiye Liu et al.
BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation
Yulu Pan, Ce Zhang, Gedas Bertasius
SIRA: Scalable Inter-frame Relation and Association for Radar Perception
Ryoma Yataka, Pu Wang, Petros Boufounos et al.
SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection
Mingxuan Liu, Tyler Hayes, Elisa Ricci et al.
SaCo Loss: Sample-wise Affinity Consistency for Vision-Language Pre-training
WU Sitong, Haoru Tan, Zhuotao Tian et al.
Characteristics Matching Based Hash Codes Generation for Efficient Fine-grained Image Retrieval
Zhen-Duo Chen, Li-Jun Zhao, Zi-Chao Zhang et al.
GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image
Chong Bao, Yinda Zhang, Yuan Li et al.
Neural Point Cloud Diffusion for Disentangled 3D Shape and Appearance Generation
Philipp Schröppel, Christopher Wewer, Jan Lenssen et al.
Diff2Flow: Training Flow Matching Models via Diffusion Model Alignment
Johannes Schusterbauer, Ming Gui, Frank Fundel et al.
YolOOD: Utilizing Object Detection Concepts for Multi-Label Out-of-Distribution Detection
Alon Zolfi, Guy AmiT, Amit Baras et al.
Single Mesh Diffusion Models with Field Latents for Texture Generation
Thomas W. Mitchel, Carlos Esteves, Ameesh Makadia
Text-guided Explorable Image Super-resolution
Kanchana Vaishnavi Gandikota, Paramanand Chandramouli
STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction
Zhimin Liao, Ping Wei, Shuaijia Chen et al.
FSC: Few-point Shape Completion
Xianzu Wu, Xianfeng Wu, Tianyu Luan et al.
FedMIA: An Effective Membership Inference Attack Exploiting "All for One" Principle in Federated Learning
Gongxi Zhu, Donghao Li, Hanlin Gu et al.
Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution
Qihao Liu, Xi Yin, Alan L. Yuille et al.
CADDreamer: CAD Object Generation from Single-view Images
Yuan Li, Cheng Lin, Yuan Liu et al.
Step Differences in Instructional Video
Tushar Nagarajan, Lorenzo Torresani
ObjectMover: Generative Object Movement with Video Prior
Xin Yu, Tianyu Wang, Soo Ye Kim et al.
STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding
Zichen Liu, Kunlun Xu, Bing Su et al.
A Theory of Joint Light and Heat Transport for Lambertian Scenes
Mani Ramanagopal, Sriram Narayanan, Aswin C. Sankaranarayanan et al.
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding
Wenhui Liao, Jiapeng Wang, Hongliang Li et al.
Splatter-360: Generalizable 360 Gaussian Splatting for Wide-baseline Panoramic Images
Zheng Chen, Chenming Wu, Zhelun Shen et al.
Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy
You Li, Fan Ma, Yi Yang
CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools
Chinedu Innocent Nwoye, Kareem elgohary, Anvita A. Srinivas et al.
LoRA Subtraction for Drift-Resistant Space in Exemplar-Free Continual Learning
Xuan Liu, Xiaobin Chang
One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion Schedule Flaws and Enhancing Low-Frequency Controls
Minghui Hu, Jianbin Zheng, Chuanxia Zheng et al.
SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments
Yue Cao, Yun Xing, Jie Zhang et al.
ProMotion: Prototypes As Motion Learners
Yawen Lu, Dongfang Liu, Qifan Wang et al.
Seurat: From Moving Points to Depth
Seokju Cho, Gabriel Huang, Seungryong Kim et al.
Distilling Monocular Foundation Model for Fine-grained Depth Completion
Yingping Liang, Yutao Hu, Wenqi Shao et al.
Fusing Personal and Environmental Cues for Identification and Segmentation of First-Person Camera Wearers in Third-Person Views
Ziwei Zhao, Yuchen Wang, Chuhua Wang
GLASS: Guided Latent Slot Diffusion for Object-Centric Learning
Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
Sagnik Majumder, Ziad Al-Halah, Kristen Grauman
Bilateral Event Mining and Complementary for Event Stream Super-Resolution
Zhilin Huang, Quanmin Liang, Yijie Yu et al.
SuperPC: A Single Diffusion Model for Point Cloud Completion, Upsampling, Denoising, and Colorization
Yi Du, Zhipeng Zhao, Shaoshu Su et al.
From Poses to Identity: Training-Free Person Re-Identification via Feature Centralization
Chao Yuan, Guiwei Zhang, Changxiao Ma et al.
Neural Video Compression with Context Modulation
Chuanbo Tang, Zhuoyuan Li, Yifan Bian et al.
MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking
Haolin Qin, Tingfa Xu, Tianhao Li et al.
PreciseCam: Precise Camera Control for Text-to-Image Generation
Edurne Bernal-Berdun, Ana Serrano, Belen Masia et al.
An End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models
Wentao Qu, Jing Wang, Yongshun Gong et al.
Online Video Understanding: OVBench and VideoChat-Online
Zhenpeng Huang, Xinhao Li, Jiaqi Li et al.
A Unified and Interpretable Emotion Representation and Expression Generation
Reni Paskaleva, Mykyta Holubakha, Andela Ilic et al.
TraF-Align: Trajectory-aware Feature Alignment for Asynchronous Multi-agent Perception
Zhiying Song, Lei Yang, Fuxi Wen et al.
Making Visual Sense of Oracle Bones for You and Me
Runqi Qiao, LAN YANG, Kaiyue Pang et al.
SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation
Leigang Qu, Haochuan Li, Wenjie Wang et al.
KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation
Antoni Bigata Casademunt, Michał Stypułkowski, Rodrigo Mira et al.
HiMoR: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation
Yiming Liang, Tianhan Xu, Yuta Kikuchi
Towards More Accurate Diffusion Model Acceleration with A Timestep Tuner
Mengfei Xia, Yujun Shen, Changsong Lei et al.
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training
Haicheng Wang, Chen Ju, Weixiong Lin et al.
Semantics-aware Motion Retargeting with Vision-Language Models
Haodong Zhang, ZhiKe Chen, Haocheng Xu et al.
MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining
Yunze Liu, Li Yi
LITA-GS: Illumination-Agnostic Novel View Synthesis via Reference-Free 3D Gaussian Splatting and Physical Priors
Han Zhou, Wei Dong, Jun Chen
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model
Benlin Liu, Yuhao Dong, Yiqin Wang et al.
CDFormer: When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution
Qingguo Liu, Chenyi Zhuang, Pan Gao et al.
Knowledge-Aligned Counterfactual-Enhancement Diffusion Perception for Unsupervised Cross-Domain Visual Emotion Recognition
Wen Yin, Yong Wang, Guiduo Duan et al.
CoA: Towards Real Image Dehazing via Compression-and-Adaptation
Long Ma, Yuxin Feng, Yan Zhang et al.
UnCommon Objects in 3D
Xingchen Liu, Piyush Tayal, Jianyuan Wang et al.
High-Quality Facial Geometry and Appearance Capture at Home
Yuxuan Han, Junfeng Lyu, Feng Xu
Physical Plausibility-aware Trajectory Prediction via Locomotion Embodiment
Hiromu Taketsugu, Takeru Oba, Takahiro Maeda et al.
DreamText: High Fidelity Scene Text Synthesis
Yibin Wang, Weizhong Zhang, honghui xu et al.
Rethinking Query-based Transformer for Continual Image Segmentation
Yuchen Zhu, Cheng Shi, Dingyou Wang et al.
Anyattack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models
Jiaming Zhang, Junhong Ye, Xingjun Ma et al.
Towards Transformer-Based Aligned Generation with Self-Coherence Guidance
Shulei Wang, Wang Lin, Hai Huang et al.
FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication
Eric Slyman, Stefan Lee, Scott Cohen et al.
EventGPT: Event Stream Understanding with Multimodal Large Language Models
shaoyu liu, Jianing Li, guanghui zhao et al.
Turb-Seg-Res: A Segment-then-Restore Pipeline for Dynamic Videos with Atmospheric Turbulence
Ripon Saha, Dehao Qin, Nianyi Li et al.
Dual Prompting Image Restoration with Diffusion Transformers
Dehong Kong, Fan Li, Zhixin Wang et al.
SAM2Object: Consolidating View Consistency via SAM2 for Zero-Shot 3D Instance Segmentation
Jihuai Zhao, Junbao Zhuo, Jiansheng Chen et al.
DepthCues: Evaluating Monocular Depth Perception in Large Vision Models
Duolikun Danier, Mehmet Aygun, Changjian Li et al.
TurboSL: Dense Accurate and Fast 3D by Neural Inverse Structured Light
Parsa Mirdehghan, Maxx Wu, Wenzheng Chen et al.
Clockwork Diffusion: Efficient Generation With Model-Step Distillation
Amirhossein Habibian, Amir Ghodrati, Noor Fathima et al.
Mimic In-Context Learning for Multimodal Tasks
Yuchu Jiang, Jiale Fu, chenduo hao et al.
Memory-Scalable and Simplified Functional Map Learning
Robin Magnet, Maks Ovsjanikov
VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models
Muchao Ye, Weiyang Liu, Pan He
MP-SfM: Monocular Surface Priors for Robust Structure-from-Motion
Zador Pataki, Paul-Edouard Sarlin, Johannes Schönberger et al.
ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos
Tanveer Hannan, Md Mohaiminul Islam, Jindong Gu et al.
FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation
Dong Zhao, Jinlong Li, Shuang Wang et al.
GenDeg: Diffusion-based Degradation Synthesis for Generalizable All-In-One Image Restoration
Sudarshan Rajagopalan, Nithin Gopalakrishnan Nair, Jay Paranjape et al.
Bayesian Test-Time Adaptation for Vision-Language Models
Lihua Zhou, Mao Ye, Shuaifeng Li et al.
AAMDM: Accelerated Auto-regressive Motion Diffusion Model
Tianyu Li, Calvin Zhuhan Qiao, Ren Guanqiao et al.
A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives
Simone Alberto Peirone, Francesca Pistilli, Antonio Alliegro et al.
C3Net: Compound Conditioned ControlNet for Multimodal Content Generation
Juntao Zhang, Yuehuai LIU, Yu-Wing Tai et al.
MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes
Ruijie Lu, Yixin Chen, Junfeng Ni et al.
Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
Arun Reddy, Alexander Martin, Eugene Yang et al.
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Miran Heo, Min-Hung Chen, De-An Huang et al.
Post-pre-training for Modality Alignment in Vision-Language Foundation Models
Shin'ya Yamaguchi, Dewei Feng, Sekitoshi Kanai et al.
Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners
Chun Feng, Joy Hsu, Weiyu Liu et al.
MP-GUI: Modality Perception with MLLMs for GUI Understanding
Ziwei Wang, Weizhi Chen, Leyang Yang et al.