Most Cited CVPR "higher-resolution generation" Papers
5,589 papers found • Page 15 of 28
Conference
Pseudo Visible Feature Fine-Grained Fusion for Thermal Object Detection
Ting Li, Mao Ye, Tianwen Wu et al.
MDP: Multidimensional Vision Model Pruning with Latency Constraint
Xinglong Sun, Barath Lakshmanan, Maying Shen et al.
Depth-Guided Bundle Sampling for Efficient Generalizable Neural Radiance Field Reconstruction
Li Fang, Hao Zhu, Longlong Chen et al.
Dual Energy-Based Model with Open-World Uncertainty Estimation for Out-of-distribution Detection
Qi Chen, Hu Ding
STEPS: Sequential Probability Tensor Estimation for Text-to-Image Hard Prompt Search
Yuning Qiu, Andong Wang, Chao Li et al.
Data-free Universal Adversarial Perturbation with Pseudo-semantic Prior
Chanhui Lee, Yeonghwan Song, Jeany Son
Foveated Instance Segmentation
Hongyi Zeng, Wenxuan Liu, Tianhua Xia et al.
Uncertainty-Instructed Structure Injection for Generalizable HD Map Construction
Xiaolu Liu, Ruizi Yang, Song Wang et al.
DiGIT: Multi-Dilated Gated Encoder and Central-Adjacent Region Integrated Decoder for Temporal Action Detection Transformer
Ho-Joong Kim, Yearang Lee, Jung-Ho Hong et al.
Understanding Multi-layered Transmission Matrices
Marina Alterman, Anat Levin
Explicit Depth-Aware Blurry Video Frame Interpolation Guided by Differential Curves
yan zaoming, pengcheng lei, Tingting Wang et al.
NN-Former: Rethinking Graph Structure in Neural Architecture Representation
Ruihan Xu, Haokui Zhang, Yaowei Wang et al.
DiSRT-In-Bed: Diffusion-Based Sim-to-Real Transfer Framework for In-Bed Human Mesh Recovery
Jing Gao, Ce Zheng, Laszlo Jeni et al.
Let Samples Speak: Mitigating Spurious Correlation by Exploiting the Clusterness of Samples
WEIWEI LI, Junzhuo Liu, Yuanyuan Ren et al.
HELVIPAD: A Real-World Dataset for Omnidirectional Stereo Depth Estimation
Mehdi Zayene, Albias Havolli, Jannik Endres et al.
RipVIS: Rip Currents Video Instance Segmentation Benchmark for Beach Monitoring and Safety
Andrei Dumitriu, Florin Tatui, Florin Miron et al.
HiLoTs: High-Low Temporal Sensitive Representation Learning for Semi-Supervised LiDAR Segmentation in Autonomous Driving
R.D. Lin, Pengcheng Weng, Yinqiao Wang et al.
Medusa: A Multi-Scale High-order Contrastive Dual-Diffusion Approach for Multi-View Clustering
Liang Chen, Zhe Xue, Yawen Li et al.
Dual Semantic Guidance for Open Vocabulary Semantic Segmentation
ZhengYang Wang, Tingliang Feng, Fan Lyu et al.
Decouple-Then-Merge: Finetune Diffusion Models as Multi-Task Learning
Qianli Ma, Xuefei Ning, Dongrui Liu et al.
SPARC: Score Prompting and Adaptive Fusion for Zero-Shot Multi-Label Recognition in Vision-Language Models
Kevin Miller, Aditya Gangrade, Samarth Mishra et al.
SyncSDE: A Probabilistic Framework for Diffusion Synchronization
Hyunjun Lee, Hyunsoo Lee, Sookwan Han
Dense Dispersed Structured Light for Hyperspectral 3D Imaging of Dynamic Scenes
Suhyun Shin, Seungwoo Yoon, Ryota Maeda et al.
PURA: Parameter Update-Recovery Test-Time Adaption for RGB-T Tracking
Zekai Shao, Yufan Hu, Bin Fan et al.
CamPoint: Boosting Point Cloud Segmentation with Virtual Camera
Jianhui Zhang, Luo Yizhi, Zicheng Zhang et al.
Improving Editability in Image Generation with Layer-wise Memory
Daneul Kim, Jaeah Lee, Jaesik Park
Neural 3D Strokes: Creating Stylized 3D Scenes with Vectorized 3D Strokes
Haobin Duan, Miao Wang, Yanxun Li et al.
Identity-preserving Distillation Sampling by Fixed-Point Iterator
SeonHwa Kim, Jiwon Kim, Soobin Park et al.
Boost the Inference with Co-training: A Depth-guided Mutual Learning Framework for Semi-supervised Medical Polyp Segmentation
Yuxin Li, Zihao Zhu, Yuxiang Zhang et al.
Two is Better than One: Efficient Ensemble Defense for Robust and Compact Models
Yoojin Jung, Byung Cheol Song
AirRoom: Objects Matter in Room Reidentification
Runmao Yao, Yi Du, Zhuoqun Chen et al.
CaMuViD: Calibration-Free Multi-View Detection
Amir Etefaghi Daryani, M. Usman Maqbool Bhutta, Byron Hernandez et al.
Link to the Past: Temporal Propagation for Fast 3D Human Reconstruction from Monocular Video
Marchellus Matthew, Nadhira Noor, In Kyu Park
Targeted Forgetting of Image Subgroups in CLIP Models
Zeliang Zhang, Gaowen Liu, Charles Fleming et al.
Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark
Hao Guo, Xugong Qin, Jun Jie Ou Yang et al.
Revisiting Generative Replay for Class Incremental Object Detection
Shizhou Zhang, Xueqiang Lv, Yinghui Xing et al.
Creating Your Editable 3D Photorealistic Avatar with Tetrahedron-constrained Gaussian Splatting
Hanxi Liu, Yifang Men, Zhouhui Lian
HiFi-Portrait: Zero-shot Identity-preserved Portrait Generation with High-fidelity Multi-face Fusion
Yifang Xu, BenXiang Zhai, Yunzhuo Sun et al.
Spk2SRImgNet: Super-Resolve Dynamic Scene from Spike Stream via Motion Aligned Collaborative Filtering
Yuanlin Wang, Yiyang Zhang, Ruiqin Xiong et al.
Visual and Semantic Prompt Collaboration for Generalized Zero-Shot Learning
Huajie Jiang, Zhengxian Li, Xiaohan Yu et al.
De^2Gaze: Deformable and Decoupled Representation Learning for 3D Gaze Estimation
Yunfeng Xiao, Xiaowei Bai, Baojun Chen et al.
Integral Fast Fourier Color Constancy
Wenjun Wei, Yanlin Qian, Huaian Chen et al.
COUNTS: Benchmarking Object Detectors and Multimodal Large Language Models under Distribution Shifts
Jiansheng Li, Xingxuan Zhang, Hao Zou et al.
Feature Spectrum Learning for Remote Sensing Change Detection
Qi Zang, Dong Zhao, Shuang Wang et al.
Relation-Rich Visual Document Generator for Visual Information Extraction
Zi-Han Jiang, Chien-Wei Lin, WeiHua Li et al.
OpticalNet: An Optical Imaging Dataset and Benchmark Beyond the Diffraction Limit
Benquan Wang, Ruyi An, Jin-Kyu So et al.
PHGC: Procedural Heterogeneous Graph Completion for Natural Language Task Verification in Egocentric Videos
Xun Jiang, Zhiyi Huang, Xing Xu et al.
Keep the Balance: A Parameter-Efficient Symmetrical Framework for RGB+X Semantic Segmentation
Jiaxin Cai, Jingze Su, Qi Li et al.
Semantic Line Combination Detector
JINWON KO, Dongkwon Jin, Chang-Su Kim
Difference Inversion: Interpolate and Isolate the Difference with Token Consistency for Image Analogy Generation
Hyunsoo Kim, Donghyun Kim, Suhyun Kim
Sketchtopia: A Dataset and Foundational Agents for Benchmarking Asynchronous Multimodal Communication with Iconic Feedback
Mohd Hozaifa Khan, Ravi Kiran Sarvadevabhatla
Probabilistic Prompt Distribution Learning for Animal Pose Estimation
Jiyong Rao, Brian Nlong Zhao, Yu Wang
Point-Cache: Test-time Dynamic and Hierarchical Cache for Robust and Generalizable Point Cloud Analysis
Hongyu Sun, Qiuhong Ke, Ming Cheng et al.
MEET: Towards Memory-Efficient Temporal Sparse Deep Neural Networks
Zeqi Zhu, Ibrahim Batuhan Akkaya, Luc Waeijen et al.
Self-Supervised Large Scale Point Cloud Completion for Archaeological Site Restoration
Aocheng Li, James R. Zimmer-Dauphinee, Rajesh Kalyanam et al.
Non-Rigid Structure-from-Motion: Temporally-Smooth Procrustean Alignment and Spatially-Variant Deformation Modeling
Jiawei Shi, Hui Deng, Yuchao Dai
Self-Supervised Learning for Color Spike Camera Reconstruction
Yanchen Dong, Ruiqin Xiong, Xiaopeng Fan et al.
UMFN: Unified Multi-Domain Face Normalization for Joint Cross-domain Prototype Learning and Heterogeneous Face Recognition
Meng Pang, Wenjun Zhang, Nanrun Zhou et al.
Towards Cost-Effective Learning: A Synergy of Semi-Supervised and Active Learning
Tianxiang Yin, Ningzhong Liu, Han Sun
Saliuitl: Ensemble Salience Guided Recovery of Adversarial Patches against CNNs
Mauricio Byrd Victorica, György Dán, Henrik Sandberg
Percept, Memory, and Imagine: World Feature Simulating for Open-Domain Unknown Object Detection
Aming Wu, Cheng Deng
Dynamic Group Normalization: Spatio-Temporal Adaptation to Evolving Data Statistics
Yair Smadar, Assaf Hoogi
HyperPose: Hypernetwork-Infused Camera Pose Localization and an Extended Cambridge Landmarks Dataset
Ron Ferens, Yosi Keller
Libra-Merging: Importance-redundancy and Pruning-merging Trade-off for Acceleration Plug-in in Large Vision-Language Model
Longrong Yang, Dong Shen, Chaoxiang Cai et al.
FSboard: Over 3 Million Characters of ASL Fingerspelling Collected via Smartphones
Manfred Georg, Garrett Tanzer, Esha Uboweja et al.
TADFormer: Task-Adaptive Dynamic TransFormer for Efficient Multi-Task Learning
Seungmin Baek, Soyul Lee, Hayeon Jo et al.
OFER: Occluded Face Expression Reconstruction
Pratheba Selvaraju, Victoria Abrevaya, Timo Bolkart et al.
Object Dynamics Modeling with Hierarchical Point Cloud-based Representations
Chanho Kim, Li Fuxin
Classic Video Denoising in a Machine Learning World: Robust, Fast, and Controllable
Xin Jin, Simon Niklaus, Zhoutong Zhang et al.
ScaleLSD: Scalable Deep Line Segment Detection Streamlined
Zeran Ke, Bin Tan, Xianwei Zheng et al.
FATE: Full-head Gaussian Avatar with Textural Editing from Monocular Video
Jiawei Zhang, Zijian Wu, Zhiyang Liang et al.
BioX-CPath: Biologically-driven Explainable Diagnostics for Multistain IHC Computational Pathology
Amaya Gallagher-Syed, Henry Senior, Omnia Alwazzan et al.
Unlocking Generalization Power in LiDAR Point Cloud Registration
Zhenxuan Zeng, Qiao Wu, Xiyu Zhang et al.
MultimodalStudio: A Heterogeneous Sensor Dataset and Framework for Neural Rendering across Multiple Imaging Modalities
Federico Lincetto, Gianluca Agresti, Mattia Rossi et al.
Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy
Zaijing Li, Yuquan Xie, Rui Shao et al.
Sketch Down the FLOPs: Towards Efficient Networks for Human Sketch
Aneeshan Sain, Subhajit Maity, Pinaki Nath Chowdhury et al.
Occlusion-aware Text-Image-Point Cloud Pretraining for Open-World 3D Object Recognition
Khanh Nguyen, Ghulam Mubashar Hassan, Ajmal Mian
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
Chan Hee Song, Valts Blukis, Jonathan Tremblay et al.
Instant Gaussian Stream: Fast and Generalizable Streaming of Dynamic Scene Reconstruction via Gaussian Splatting
Jinbo Yan, Rui Peng, Zhiyan Wang et al.
Structure-Aware Correspondence Learning for Relative Pose Estimation
Yihan Chen, Wenfei Yang, Huan Ren et al.
Shape Abstraction via Marching Differentiable Support Functions
Sunkyung Park, Jeongmin Lee, Dongjun Lee
CH3Depth: Efficient and Flexible Depth Foundation Model with Flow Matching
Jiaqi Li, Yiran Wang, Jinghong Zheng et al.
R2C: Mapping Room to Chessboard to Unlock LLM As Low-Level Action Planner
Ziyi Bai, Hanxuan Li, Bin Fu et al.
ODHSR: Online Dense 3D Reconstruction of Humans and Scenes from Monocular Videos
Zetong Zhang, Manuel Kaufmann, Lixin Xue et al.
Split Adaptation for Pre-trained Vision Transformers
Lixu Wang, Bingqi Shang, Yi Li et al.
Reanimating Images using Neural Representations of Dynamic Stimuli
Jacob Yeung, Andrew Luo, Gabriel Sarch et al.
Relation3D : Enhancing Relation Modeling for Point Cloud Instance Segmentation
Edward LOO, Jiacheng Deng
High Dynamic Range Video Compression: A Large-Scale Benchmark Dataset and A Learned Bit-depth Scalable Compression Algorithm
Zhaoyi Tian, Feifeng Wang, Shiwei Wang et al.
Towards Lossless Implicit Neural Representation via Bit Plane Decomposition
Woo Kyoung Han, Byeonghun Lee, Hyunmin Cho et al.
Mind the Trojan Horse: Image Prompt Adapter Enabling Scalable and Deceptive Jailbreaking
Junxi Chen, Junhao Dong, Xiaohua Xie
Mind the Gap: Detecting Black-box Adversarial Attacks in the Making through Query Update Analysis
Jeonghwan Park, Niall McLaughlin, Ihsen Alouani
Reconstructing Humans with a Biomechanically Accurate Skeleton
Yan Xia, Xiaowei Zhou, Etienne Vouga et al.
Change3D: Revisiting Change Detection and Captioning from A Video Modeling Perspective
Duowang Zhu, Xiaohu Huang, Haiyan Huang et al.
ProbPose: A Probabilistic Approach to 2D Human Pose Estimation
Miroslav Purkrábek, Jiri Matas
Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves
Shihan Wu, Ji Zhang, Pengpeng Zeng et al.
Improving Transferable Targeted Attacks with Feature Tuning Mixup
Kaisheng Liang, Xuelong Dai, Yanjie Li et al.
Low-Biased General Annotated Dataset Generation
Dengyang Jiang, Haoyu Wang, Lei Zhang et al.
ChatHuman: Chatting about 3D Humans with Tools
Jing Lin, Yao Feng, Weiyang Liu et al.
Open Ad-hoc Categorization with Contextualized Feature Learning
Zilin Wang, Sangwoo Mo, Stella X. Yu et al.
MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes
Ruijie Lu, Yixin Chen, Junfeng Ni et al.
Tora: Trajectory-oriented Diffusion Transformer for Video Generation
Zhenghao Zhang, Junchao Liao, Menghao Li et al.
SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video
Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello et al.
EVPGS: Enhanced View Prior Guidance for Splatting-based Extrapolated View Synthesis
Jiahe Li, Feiyu Wang, Xiaochao Qu et al.
Diffusion Renderer: Neural Inverse and Forward Rendering with Video Diffusion Models
Ruofan Liang, Žan Gojčič, Huan Ling et al.
COSMIC: Clique-Oriented Semantic Multi-space Integration for Robust CLIP Test-Time Adaptation
Fanding Huang, Jingyan Jiang, Qinting Jiang et al.
MAGE : Single Image to Material-Aware 3D via the Multi-View G-Buffer Estimation Model
Haoyuan Wang, Zhenwei Wang, Xiaoxiao Long et al.
Hyperbolic Uncertainty-Aware Few-Shot Incremental Point Cloud Segmentation
Tanuj Sur, Samrat Mukherjee, Kaizer Rahaman et al.
Optimal Transport-Guided Source-Free Adaptation for Face Anti-Spoofing
Zhuowei Li, Tianchen Zhao, Xiang Xu et al.
Mamba4D: Efficient 4D Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models
Jiuming Liu, Jinru Han, Lihao Liu et al.
Anatomical Consistency and Adaptive Prior-informed Transformation for Multi-contrast MR Image Synthesis via Diffusion Model
Yejee Shin, Yeeun Lee, Hanbyol Jang et al.
Towards Universal AI-Generated Image Detection by Variational Information Bottleneck Network
Haifeng Zhang, Qinghui He, Xiuli Bi et al.
LLM-driven Multimodal and Multi-Identity Listening Head Generation
Peiwen Lai, Weizhi Zhong, Yipeng Qin et al.
Mono3DVLT: Monocular-Video-Based 3D Visual Language Tracking
Hongkai Wei, YANG YANG, Shijie Sun et al.
Glossy Object Reconstruction with Cost-effective Polarized Acquisition
Bojian Wu, YIFAN PENG, Ruizhen Hu et al.
ColabSfM: Collaborative Structure-from-Motion by Point Cloud Registration
Johan Edstedt, André Mateus, Alberto Jaenal
Scaling up Image Segmentation across Data and Tasks
Pei Wang, Zhaowei Cai, Hao Yang et al.
SEEN-DA: SEmantic ENtropy guided Domain-aware Attention for Domain Adaptive Object Detection
Haochen Li, Rui Zhang, Hantao Yao et al.
Pixel-level and Semantic-level Adjustable Super-resolution: A Dual-LoRA Approach
Lingchen Sun, Rongyuan Wu, Zhiyuan Ma et al.
EntitySAM: Segment Everything in Video
Mingqiao Ye, Seoung Wug Oh, Lei Ke et al.
Towards Consistent Multi-Task Learning: Unlocking the Potential of Task-Specific Parameters
Xiaohan Qin, Xiaoxing Wang, Junchi Yan
Supervising Sound Localization by In-the-wild Egomotion
Anna Min, Ziyang Chen, Hang Zhao et al.
Concept Replacer: Replacing Sensitive Concepts in Diffusion Models via Precision Localization
lingyun zhang, Yu Xie, Yanwei Fu et al.
Matrix-Free Shared Intrinsics Bundle Adjustment
Daniel Safari
Doppelgängers and Adversarial Vulnerability
George Kamberov
A4A: Adapter for Adapter Transfer via All-for-All Mapping for Cross-Architecture Models
Keyu Tu, Mengqi Huang, Zhuowei Chen et al.
CASP: Consistency-aware Audio-induced Saliency Prediction Model for Omnidirectional Video
Zhaolin Wan, Han Qin, Zhiyang Li et al.
Hybrid Reciprocal Transformer with Triplet Feature Alignment for Scene Graph Generation
Jiawei Fu, ZHANG Tiantian, Kai Chen et al.
Camera Resection from Known Line Pencils and a Radially Distorted Scanline
Juan Carlos Dibene Simental, Enrique Dunn
Generalized Zero-Shot Classification via Semantics-Free Inter-Class Feature Generation
Libiao Chen, Dong Nie, Junjun Pan et al.
Redefining <Creative> in Dictionary: Towards an Enhanced Semantic Understanding of Creative Generation
Fu Feng, Yucheng Xie, Xu Yang et al.
Collaborative Tree Search for Enhancing Embodied Multi-Agent Collaboration
Lizheng Zu, Lin Lin, Song Fu et al.
Generative Map Priors for Collaborative BEV Semantic Segmentation
Jiahui Fu, Yue Gong, Luting Wang et al.
SOAP: Vision-Centric 3D Semantic Scene Completion with Scene-Adaptive Decoder and Occluded Region-Aware View Projection
Hyo-Jun Lee, Yeong Jun Koh, Hanul Kim et al.
beta-FFT: Nonlinear Interpolation and Differentiated Training Strategies for Semi-Supervised Medical Image Segmentation
Ming Hu, Jianfu Yin, Zhuangzhuang Ma et al.
GliaNet: Adaptive Neural Network Structure Learning with Glia-Driven
Mengqiao Han, Liyuan Pan, Xiabi Liu
WISH: Weakly Supervised Instance Segmentation using Heterogeneous Labels
Hyeokjun Kweon, Kuk-Jin Yoon
Frequency-Biased Synergistic Design for Image Compression and Compensation
Jiaming Liu, Qi Zheng, Zihao Liu et al.
Hierarchical Gaussian Mixture Model Splatting for Efficient and Part Controllable 3D Generation
Qitong Yang, Mingtao Feng, Zijie Wu et al.
Quad-Pixel Image Defocus Deblurring: A New Benchmark and Model
Hang Chen, Yin Xie, Xiaoxiu Peng et al.
D^3CTTA: Domain-Dependent Decorrelation for Continual Test-Time Adaption of 3D LiDAR Segmentation
Jichun Zhao, Haiyong Jiang, Haoxuan Song et al.
Empowering LLMs to Understand and Generate Complex Vector Graphics
XiMing Xing, Juncheng Hu, Guotao Liang et al.
IM-Zero: Instance-level Motion Controllable Video Generation in a Zero-shot Manner
Yuyang Huang, Yabo Chen, Li Ding et al.
MaSS13K: A Matting-level Semantic Segmentation Benchmark
Chenxi Xie, Minghan LI, Hui Zeng et al.
OmniStereo: Real-time Omnidireactional Depth Estimation with Multiview Fisheye Cameras
Jiaxi Deng, Yushen Wang, Haitao Meng et al.
MetaWriter: Personalized Handwritten Text Recognition Using Meta-Learned Prompt Tuning
Wenhao Gu, Li Gu, Ching Suen et al.
Remote Photoplethysmography in Real-World and Extreme Lighting Scenarios
Hang Shao, lei luo, Jianjun Qian et al.
Adapting Pre-trained 3D Models for Point Cloud Video Understanding via Cross-frame Spatio-temporal Perception
Baixuan Lv, Yaohua Zha, Tao Dai et al.
UWAV: Uncertainty-weighted Weakly-supervised Audio-Visual Video Parsing
Yung-Hsuan Lai, Janek Ebbers, Yu-Chiang Frank Wang et al.
A Regularization-Guided Equivariant Approach for Image Restoration
Yulu Bai, Jiahong Fu, Qi Xie et al.
DejaVid: Encoder-Agnostic Learned Temporal Matching for Video Classification
Darryl Ho, Samuel Madden
CoSER: Towards Consistent Dense Multiview Text-to-Image Generator for 3D Creation
Bonan Li, Zicheng Zhang, Xingyi Yang et al.
ADD: Attribution-Driven Data Augmentation Framework for Boosting Image Super-Resolution
Zeyu Mi, Yu-Bin Yang
All-Day Multi-Camera Multi-Target Tracking
Huijie Fan, Yu Qiao, Yihao Zhen et al.
Enhancing Online Continual Learning with Plug-and-Play State Space Model and Class-Conditional Mixture of Discretization
Sihao Liu, Yibo Yang, Xiaojie Li et al.
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
Ashmal Vayani, Dinura Dissanayake, Hasindri Watawana et al.
Text Augmented Correlation Transformer For Few-shot Classification & Segmentation
Srinivasa Rao Nandam, Sara Atito, Zhenhua Feng et al.
ODA-GAN: Orthogonal Decoupling Alignment GAN Assisted by Weakly-supervised Learning for Virtual Immunohistochemistry Staining
Tong Wang, Mingkang Wang, Zhongze Wang et al.
Harnessing Global-Local Collaborative Adversarial Perturbation for Anti-Customization
Long Xu, Jiakai Wang, Haojie Hao et al.
GBlobs: Explicit Local Structure via Gaussian Blobs for Improved Cross-Domain LiDAR-based 3D Object Detection
Dušan Malić, Christian Fruhwirth-Reisinger, Samuel Schulter et al.
Segmenting Maxillofacial Structures in CBCT Volumes
Federico Bolelli, Kevin Marchesini, Niels van Nistelrooij et al.
Activating Sparse Part Concepts for 3D Class Incremental Learning
Zhenya Tian, Jun Xiao, Liu lupeng et al.
Data-Free Group-Wise Fully Quantized Winograd Convolution via Learnable Scales
Shuokai Pan, Gerti Tuzi, Sudarshan Sreeram et al.
CorrBEV: Multi-View 3D Object Detection by Correlation Learning with Multi-modal Prototypes
ziteng xue, Mingzhe Guo, Heng Fan et al.
ACL: Activating Capability of Linear Attention for Image Restoration
Yubin Gu, Yuan Meng, Jiayi Ji et al.
PARC: A Quantitative Framework Uncovering the Symmetries within Vision Language Models
Jenny Schmalfuss, Nadine Chang, Vibashan VS et al.
Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding
Duo Zheng, Shijia Huang, Liwei Wang
MExD: An Expert-Infused Diffusion Model for Whole-Slide Image Classification
Jianwei Zhao, XIN LI, Fan Yang et al.
DynScene: Scalable Generation of Dynamic Robotic Manipulation Scenes for Embodied AI
Sangmin Lee, Sungyong Park, Heewon Kim
Adversarial Domain Prompt Tuning and Generation for Single Domain Generalization
Zhipeng Xu, De Cheng, XINYANG JIANG et al.
M3GYM: A Large-Scale Multimodal Multi-view Multi-person Pose Dataset for Fitness Activity Understanding in Real-world Settings
Qingzheng Xu, Ru Cao, Xin Shen et al.
D2SP: Dynamic Dual-Stage Purification Framework for Dual Noise Mitigation in Vision-based Affective Recognition.
Haoran Wang, Xinji Mai, Zeng Tao et al.
DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation
Zhiqiang Shen, Ammar Sherif, Zeyuan Yin et al.
Towards Transformer-Based Aligned Generation with Self-Coherence Guidance
Shulei Wang, Wang Lin, Hai Huang et al.
SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language
zehan wang, Sashuai zhou, Shaoxuan He et al.
MonSter: Marry Monodepth to Stereo Unleashes Power
JunDa Cheng, Longliang Liu, Gangwei Xu et al.
Insightful Instance Features for 3D Instance Segmentation
Wonseok Roh, Hwanhee Jung, Giljoo Nam et al.
NightAdapter: Learning a Frequency Adapter for Generalizable Night-time Scene Segmentation
Qi Bi, Jingjun Yi, Huimin Huang et al.
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Mutimodal Models
Xingrui Wang, Wufei Ma, Tiezheng Zhang et al.
Consistency-aware Self-Training for Iterative-based Stereo Matching
Jingyi Zhou, Peng Ye, Haoyu Zhang et al.
HomoGen: Enhanced Video Inpainting via Homography Propagation and Diffusion
Ding Ding, Yueming Pan, Ruoyu Feng et al.
Ev-3DOD: Pushing the Temporal Boundaries of 3D Object Detection with Event Cameras
Hoonhee Cho, Jae-Young Kang, Youngho Kim et al.
Fortifying Federated Learning Towards Trustworthiness via Auditable Data Valuation and Verifiable Client Contribution
Naveen Kumar Kummari, Ranjeet Ranjan Jha, Krishna Mohan Chalavadi et al.
TIDE: Training Locally Interpretable Domain Generalization Models Enables Test-time Correction
Aishwarya Agarwal, Srikrishna Karanam, Vineet Gandhi
Parameterized Blur Kernel Prior Learning for Local Motion Deblurring
Zhenxuan Fang, Fangfang Wu, Tao Huang et al.
The Change You Want To Detect: Semantic Change Detection In Earth Observation With Hybrid Data Generationf
Yanis Benidir, Nicolas Gonthier, Clement Mallet
VideoGigaGAN: Towards Detail-rich Video Super-Resolution
Yiran Xu, Taesung Park, Richard Zhang et al.
MoST: Efficient Monarch Sparse Tuning for 3D Representation Learning
Xu Han, Yuan Tang, Jinfeng Xu et al.
DeformCL: Learning Deformable Centerline Representation for Vessel Extraction in 3D Medical Image
Ziwei Zhao, Zhixing Zhang, Yuhang Liu et al.
Towards Continual Universal Segmentation
Zihan Lin, Zilei Wang, Xu Wang
Multi-Group Proportional Representations for Text-to-Image Models
Sangwon Jung, Alex Oesterling, Claudio Mayrink Verdun et al.
Multimodal Autoregressive Pre-training of Large Vision Encoders
Enrico Fini, Mustafa Shukor, Xiujun Li et al.
GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks
Haoqiang Kang, Enna Sachdeva, Piyush Gupta et al.
IndoorGS: Geometric Cues Guided Gaussian Splatting for Indoor Scene Reconstruction
Cong Ruan, Yuesong Wang, Bin Zhang et al.
Boosting Point-Supervised Temporal Action Localization through Integrating Query Reformation and Optimal Transport
Mengnan Liu, Le Wang, Sanping Zhou et al.
M^3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation
Zixuan Chen, Jiaxin Li, Junxuan Liang et al.
FeedEdit: Text-Based Image Editing with Dynamic Feedback Regulation
Fengyi Fu, Lei Zhang, Mengqi Huang et al.
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Models
Yongting Zhang, Lu Chen, Guodong Zheng et al.
EZSR: Event-based Zero-Shot Recognition
Yan Yang, Liyuan Pan, Dongxu Li et al.
DiffLO: Semantic-Aware LiDAR Odometry with Diffusion-Based Refinement
huang yongshu, Chen Liu, Minghang Zhu et al.
High-fidelity 3D Object Generation from Single Image with RGBN-Volume Gaussian Reconstruction Model
Yiyang Shen, Kun Zhou, He Wang et al.