Most Cited CVPR "preprocessing algorithms" Papers
5,589 papers found • Page 21 of 28
Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression
Xiaoyi Qu, David Aponte, Colby Banbury et al.
Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity
Ruijie Quan, Wenguan Wang, Zhibo Tian et al.
G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images
Zixiong Huang, Qi Chen, Libo Sun et al.
Active Prompt Learning in Vision Language Models
Jihwan Bang, Sumyeong Ahn, Jae-Gil Lee
ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding
Qihang Peng, Henry Zheng, Gao Huang
DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis
Yuming Gu, Phong Tran, Yujian Zheng et al.
FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training
Anjia Cao, Xing Wei, Zhiheng Ma
Generating Handwritten Mathematical Expressions From Symbol Graphs: An End-to-End Pipeline
Yu Chen, Fei Gao, Yanguang Zhang et al.
On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation
Agneet Chatterjee, Tejas Gokhale, Chitta Baral et al.
SingularTrajectory: Universal Trajectory Predictor Using Diffusion Model
Inhwan Bae, Young-Jae Park, Hae-Gon Jeon
Domain Separation Graph Neural Networks for Saliency Object Ranking
Zijian Wu, Jun Lu, Jing Han et al.
Solving the Catastrophic Forgetting Problem in Generalized Category Discovery
Xinzi Cao, Xiawu Zheng, Guanhong Wang et al.
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
Qirui Jiao, Daoyuan Chen, Yilun Huang et al.
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving
Yuqi Wang, Jiawei He, Lue Fan et al.
WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
Changhoon Kim, Kyle Min, Maitreya Patel et al.
MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images
Junwen Huang, Hao Yu, Kuan-Ting Yu et al.
Resource-Efficient Transformer Pruning for Finetuning of Large Models
Fatih Ilhan, Gong Su, Selim Tekin et al.
Link-Context Learning for Multimodal LLMs
Yan Tai, Weichen Fan, Zhao Zhang et al.
The Manga Whisperer: Automatically Generating Transcriptions for Comics
Ragav Sachdeva, Andrew Zisserman
SGC-Net: Stratified Granular Comparison Network for Open-Vocabulary HOI Detection
Xin Lin, Chong Shi, Zuopeng Yang et al.
Deep-TROJ: An Inference Stage Trojan Insertion Algorithm through Efficient Weight Replacement Attack
Sabbir Ahmed, Ranyang Zhou, Shaahin Angizi et al.
Dynamic Neural Surfaces for Elastic 4D Shape Representation and Analysis
Awais Nizamani, Hamid Laga, Guanjin Wang et al.
Dynamic LiDAR Re-simulation using Compositional Neural Fields
Hanfeng Wu, Xingxing Zuo, Stefan Leutenegger et al.
Learning with Noisy Triplet Correspondence for Composed Image Retrieval
Shuxian Li, Changhao He, Xiting Liu et al.
LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding
Hongyu Li, Jinyu Chen, Ziyu Wei et al.
Language-aware Visual Semantic Distillation for Video Question Answering
Bo Zou, Chao Yang, Yu Qiao et al.
3DInAction: Understanding Human Actions in 3D Point Clouds
Yizhak Ben-Shabat, Oren Shrout, Stephen Gould
AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models
Kwan Yun, Seokhyeon Hong, Chaelin Kim et al.
DiLiGenRT: A Photometric Stereo Dataset with Quantified Roughness and Translucency
Heng Guo, Jieji Ren, Feishi Wang et al.
StyLitGAN: Image-Based Relighting via Latent Control
Anand Bhattad, James Soole, David Forsyth
Label-Efficient Group Robustness via Out-of-Distribution Concept Curation
Yiwei Yang, Anthony Liu, Robert Wolfe et al.
Explaining in Diffusion: Explaining a Classifier with Diffusion Semantics
Tahira Kazimi, Ritika Allada, Pinar Yanardag
Blurred LiDAR for Sharper 3D: Robust Handheld 3D Scanning with Diffuse LiDAR and RGB
Nikhil Behari, Aaron Young, Siddharth Somasundaram et al.
Unsupervised Universal Image Segmentation
XuDong Wang, Dantong Niu, Xinyang Han et al.
Batch Normalization Alleviates the Spectral Bias in Coordinate Networks
Zhicheng Cai, Hao Zhu, Qiu Shen et al.
Let's Verify and Reinforce Image Generation Step by Step
Renrui Zhang, Chengzhuo Tong, Zhizheng Zhao et al.
Not All Classes Stand on Same Embeddings: Calibrating a Semantic Distance with Metric Tensor
Jae Hyeon Park, Gyoomin Lee, Seunggi Park et al.
CodedEvents: Optimal Point-Spread-Function Engineering for 3D-Tracking with Event Cameras
Sachin Shah, Matthew Chan, Haoming Cai et al.
Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling
Zhe Li, Zerong Zheng, Lizhen Wang et al.
Dynamic Cues-Assisted Transformer for Robust Point Cloud Registration
Hong Chen, Pei Yan, Sihe Xiang et al.
Retrieval-Augmented Open-Vocabulary Object Detection
Jooyeon Kim, Eulrang Cho, Sehyung Kim et al.
NB-GTR: Narrow-Band Guided Turbulence Removal
Yifei Xia, Chu Zhou, Chengxuan Zhu et al.
LangSplat: 3D Language Gaussian Splatting
Minghan Qin, Wanhua Li, Jiawei Zhou et al.
Positive-Unlabeled Learning by Latent Group-Aware Meta Disambiguation
Lin Long, Haobo Wang, Zhijie Jiang et al.
Text-conditional Attribute Alignment across Latent Spaces for 3D Controllable Face Image Synthesis
FeiFan Xu, Rui Li, Si Wu et al.
EVPGS: Enhanced View Prior Guidance for Splatting-based Extrapolated View Synthesis
Jiahe Li, Feiyu Wang, Xiaochao Qu et al.
Optical-Flow Guided Prompt Optimization for Coherent Video Generation
Hyelin Nam, Jaemin Kim, Dohun Lee et al.
SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering
Antoine Guédon, Vincent Lepetit
DiffusionPoser: Real-time Human Motion Reconstruction From Arbitrary Sparse Sensors Using Autoregressive Diffusion
Tom Van Wouwe, Seunghwan Lee, Antoine Falisse et al.
HumanRef: Single Image to 3D Human Generation via Reference-Guided Diffusion
Jingbo Zhang, Xiaoyu Li, Qi Zhang et al.
CurveCloudNet: Processing Point Clouds with 1D Structure
Colton Stearns, Alex Fu, Jiateng Liu et al.
Detecting Open World Objects via Partial Attribute Assignment
Muli Yang, Gabriel James Goenawan, Huaiyuan Qin et al.
Harnessing Meta-Learning for Improving Full-Frame Video Stabilization
Muhammad Kashif Ali, Eun Woo Im, Dongjin Kim et al.
OpenMIBOOD: Open Medical Imaging Benchmarks for Out-Of-Distribution Detection
Max Gutbrod, David Rauber, Danilo Weber Nunes et al.
Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving
Junhao Zheng, Chenhao Lin, Jiahao Sun et al.
SeaBird: Segmentation in Bird’s View with Dice Loss Improves Monocular 3D Detection of Large Objects
Abhinav Kumar, Yuliang Guo, Xinyu Huang et al.
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
Aniket Rajiv Didolkar, Andrii Zadaianchuk, Rabiul Awal et al.
MoML: Online Meta Adaptation for 3D Human Motion Prediction
Xiaoning Sun, Huaijiang Sun, Bin Li et al.
Subnet-Aware Dynamic Supernet Training for Neural Architecture Search
Jeimin Jeon, Youngmin Oh, Junghyup Lee et al.
MARVEL-40M+: Multi-Level Visual Elaboration for High-Fidelity Text-to-3D Content Creation
Sankalp Sinha, Mohammad Sadil Khan, Muhammad Usama et al.
PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation
Qiyao Xue, Xiangyu Yin, Boyuan Yang et al.
Learning with Structural Labels for Learning with Noisy Labels
Noo-ri Kim, Jin-Seop Lee, Jee-Hyong Lee
AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing
Niu Lian, Jun Li, Jinpeng Wang et al.
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models
Letian Zhang, Xiaotong Zhai, Zhongkai Zhao et al.
EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion
Haotian Wang, Yuzhe Weng, Yueyan Li et al.
Incremental Nuclei Segmentation from Histopathological Images via Future-class Awareness and Compatibility-inspired Distillation
Huyong Wang, Huisi Wu, Jing Qin
Model Inversion Robustness: Can Transfer Learning Help?
Sy-Tuyen Ho, Koh Jun Hao, Keshigeyan Chandrasegaran et al.
Scene-adaptive and Region-aware Multi-modal Prompt for Open Vocabulary Object Detection
Xiaowei Zhao, Xianglong Liu, Duorui Wang et al.
LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation
Vladan Stojnić, Yannis Kalantidis, Jiri Matas et al.
InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models
Jiun Tian Hoe, Xudong Jiang, Chee Seng Chan et al.
MAGiC-SLAM: Multi-Agent Gaussian Globally Consistent SLAM
Vladimir Yugay, Theo Gevers, Martin R. Oswald
MAP: MAsk-Pruning for Source-Free Model Intellectual Property Protection
Boyang Peng, Sanqing Qu, Yong Wu et al.
Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis
Xin Zhou, Dingkang Liang, Wei Xu et al.
EMCAD: Efficient Multi-scale Convolutional Attention Decoding for Medical Image Segmentation
Md Mostafijur Rahman, Mustafa Munir, Radu Marculescu
Arbitrary-steps Image Super-resolution via Diffusion Inversion
Zongsheng Yue, Kang Liao, Chen Change Loy
On Exact Inversion of DPM-Solvers
Seongmin Hong, Kyeonghyun Lee, Suh Yoon Jeon et al.
Generate Like Experts: Multi-Stage Font Generation by Incorporating Font Transfer Process into Diffusion Models
Bin Fu, Fanghua Yu, Anran Liu et al.
A Unified Diffusion Framework for Scene-aware Human Motion Estimation from Sparse Signals
Jiangnan Tang, Jingya Wang, Kaiyang Ji et al.
PICTURE: PhotorealistIC virtual Try-on from UnconstRained dEsigns
Shuliang Ning, Duomin Wang, Yipeng Qin et al.
MaskCLR: Attention-Guided Contrastive Learning for Robust Action Representation Learning
Mohamed Abdelfattah, Mariam Hassan, Alex Alahi
AutoLUT: LUT-Based Image Super-Resolution with Automatic Sampling and Adaptive Residual Learning
Yuheng Xu, Shijie Yang, Xin Liu et al.
D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection
Dinh Phat Do, Taehoon Kim, Jaemin Na et al.
MAGICK: A Large-scale Captioned Dataset from Matting Generated Images using Chroma Keying
Ryan Burgert, Brian Price, Jason Kuen et al.
Intrinsic Image Diffusion for Indoor Single-view Material Estimation
Peter Kocsis, Vincent Sitzmann, Matthias Nießner
Prompt Highlighter: Interactive Control for Multi-Modal LLMs
Yuechen Zhang, Shengju Qian, Bohao Peng et al.
Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation
Henghui Du, Guangyao Li, Chang Zhou et al.
Can Protective Perturbation Safeguard Personal Data from Being Exploited by Stable Diffusion?
Zhengyue Zhao, Jinhao Duan, Kaidi Xu et al.
NetTrack: Tracking Highly Dynamic Objects with a Net
Guangze Zheng, Shijie Lin, Haobo Zuo et al.
Scaling Up Video Summarization Pretraining with Large Language Models
Dawit Argaw Argaw, Seunghyun Yoon, Fabian Caba Heilbron et al.
Video Recognition in Portrait Mode
Mingfei Han, Linjie Yang, Xiaojie Jin et al.
Online Task-Free Continual Generative and Discriminative Learning via Dynamic Cluster Memory
Fei Ye, Adrian Bors
FADES: Fair Disentanglement with Sensitive Relevance
Taeuk Jang, Xiaoqian Wang
Versatile Navigation Under Partial Observability via Value-guided Diffusion Policy
Gengyu Zhang, Hao Tang, Yan Yan
Improving Depth Completion via Depth Feature Upsampling
Yufei Wang, Ge Zhang, Shaoqian Wang et al.
EgoLife: Towards Egocentric Life Assistant
Jingkang Yang, Shuai Liu, Hongming Guo et al.
Deep Single Image Camera Calibration by Heatmap Regression to Recover Fisheye Images Under Manhattan World Assumption
Nobuhiko Wakai, Satoshi Sato, Yasunori Ishii et al.
Convex Relaxation for Robust Vanishing Point Estimation in Manhattan World
Bangyan Liao, Zhenjun Zhao, Haoang Li et al.
StyleStudio: Text-Driven Style Transfer with Selective Control of Style Elements
Mingkun Lei, Xue Song, Beier Zhu et al.
MRFS: Mutually Reinforcing Image Fusion and Segmentation
Hao Zhang, Xuhui Zuo, Jie Jiang et al.
Multi-agent Long-term 3D Human Pose Forecasting via Interaction-aware Trajectory Conditioning
Jaewoo Jeong, Daehee Park, Kuk-Jin Yoon
Invisible Backdoor Attack against Self-supervised Learning
Hanrong Zhang, Zhenting Wang, Boheng Li et al.
OrCo: Towards Better Generalization via Orthogonality and Contrast for Few-Shot Class-Incremental Learning
Noor Ahmed, Anna Kukleva, Bernt Schiele
3D-LFM: Lifting Foundation Model
Mosam Dabhi, László A. Jeni, Simon Lucey
SketchFusion: Learning Universal Sketch Features through Fusing Foundation Models
Subhadeep Koley, Tapas Kumar Dutta, Aneeshan Sain et al.
CoMapGS: Covisibility Map-based Gaussian Splatting for Sparse Novel View Synthesis
Youngkyoon Jang, Eduardo Pérez-Pellitero
Interactive Medical Image Analysis with Concept-based Similarity Reasoning
Ta Duc Huy, Sen Kim Tran, Phan Nguyen et al.
LASIL: Learner-Aware Supervised Imitation Learning For Long-term Microscopic Traffic Simulation
Ke Guo, Zhenwei Miao, Wei Jing et al.
Steepest Descent Density Control for Compact 3D Gaussian Splatting
Peihao Wang, Yuehao Wang, Dilin Wang et al.
HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces
Haithem Turki, Vasu Agrawal, Samuel Rota Bulò et al.
IIRP-Net: Iterative Inference Residual Pyramid Network for Enhanced Image Registration
Tai Ma, Suwei Zhang, Jiafeng Li et al.
SEED-Bench: Benchmarking Multimodal Large Language Models
Bohao Li, Yuying Ge, Yixiao Ge et al.
Style Aligned Image Generation via Shared Attention
Amir Hertz, Andrey Voynov, Shlomi Fruchter et al.
NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows
Zhenggang Tang, Jason Ren, Xiaoming Zhao et al.
InterMimic: Towards Universal Whole-Body Control for Physics-Based Human-Object Interactions
Sirui Xu, Hung Yu Ling, Yu-Xiong Wang et al.
EdgeMovingNet: Edge-preserving Point Cloud Reconstruction via Joint Geometry Features
Xinran Yang, Donghao Ji, Yuanqi Li et al.
MonoDGP: Monocular 3D Object Detection with Decoupled-Query and Geometry-Error Priors
Fanqi Pu, Yifan Wang, Jiru Deng et al.
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
Fengyuan Shi, Jiaxi Gu, Hang Xu et al.
PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability
Weijie Zhou, Manli Tao, Chaoyang Zhao et al.
Active Domain Adaptation with False Negative Prediction for Object Detection
Yuzuru Nakamura, Yasunori Ishii, Takayoshi Yamashita
LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content
Qihao Zhao, Yalun Dai, Hao Li et al.
How to Train Neural Field Representations: A Comprehensive Study and Benchmark
Samuele Papa, Riccardo Valperga, David Knigge et al.
Preserving Clusters in Prompt Learning for Unsupervised Domain Adaptation
Long Tung Vuong, Hoang Phan, Vy Vo et al.
Controllable Human Image Generation with Personalized Multi-Garments
Yisol Choi, Sangkyung Kwak, Sihyun Yu et al.
Reconstructing Animals and the Wild
Peter Kulits, Michael J. Black, Silvia Zuffi
Semantic-Aware Multi-Label Adversarial Attacks
Hassan Mahmood, Ehsan Elhamifar
Motion-adaptive Separable Collaborative Filters for Blind Motion Deblurring
Chengxu Liu, Xuan Wang, Xiangyu Xu et al.
SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation
Zhixuan Liu, Peter Schaldenbrand, Beverley-Claire Okogwu et al.
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models
Sai Kumar Dwivedi, Dimitrije Antić, Shashank Tripathi et al.
CoralSCOP: Segment any COral Image on this Planet
Zheng Ziqiang, Liang Haixin, Binh-Son Hua et al.
Reg-PTQ: Regression-specialized Post-training Quantization for Fully Quantized Object Detector
Yifu Ding, Weilun Feng, Chuyan Chen et al.
Rethinking Epistemic and Aleatoric Uncertainty for Active Open-Set Annotation: An Energy-Based Approach
Chen-Chen Zong, Sheng-Jun Huang
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos
Wenbo Hu, Xiangjun Gao, Xiaoyu Li et al.
FREE: Faster and Better Data-Free Meta-Learning
Yongxian Wei, Zixuan Hu, Zhenyi Wang et al.
SeaLion: Semantic Part-Aware Latent Point Diffusion Models for 3D Generation
Dekai Zhu, Yan Di, Stefan Gavranovic et al.
Open Vocabulary Semantic Scene Sketch Understanding
Ahmed Bourouis, Judith Fan, Yulia Gryaditskaya
You Only Need Less Attention at Each Stage in Vision Transformers
Shuoxi Zhang, Hanpeng Liu, Stephen Lin et al.
Hierarchical Patch Diffusion Models for High-Resolution Video Generation
Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin et al.
Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection
Chuangchuang Tan, Huan Liu, Yao Zhao et al.
Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception
Haoming Chen, Zhizhong Zhang, Yanyun Qu et al.
ActiveGAMER: Active GAussian Mapping through Efficient Rendering
Liyan Chen, Huangying Zhan, Kevin Chen et al.
BoQ: A Place is Worth a Bag of Learnable Queries
Amar Ali-bey, Brahim Chaib-draa, Philippe Giguère
PromptHash: Affinity-Prompted Collaborative Cross-Modal Learning for Adaptive Hashing Retrieval
Qiang Zou, Shuli Cheng, Jiayi Chen
UFC-Net: Unrolling Fixed-point Continuous Network for Deep Compressive Sensing
Xiaoyang Wang, Hongping Gan
Symphonize 3D Semantic Scene Completion with Contextual Instance Queries
Haoyi Jiang, Tianheng Cheng, Naiyu Gao et al.
GET: Unlocking the Multi-modal Potential of CLIP for Generalized Category Discovery
Enguang Wang, Zhimao Peng, Zhengyuan Xie et al.
Exploration-Driven Generative Interactive Environments
Nedko Savov, Naser Kazemi, Mohammad Mahdi et al.
CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment
Sajid Javed, Arif Mahmood, Iyyakutti Iyappan Ganapathi et al.
Extreme Rotation Estimation in the Wild
Hana Bezalel, Dotan Ankri, Ruojin Cai et al.
Flash3D: Super-scaling Point Transformers through Joint Hardware-Geometry Locality
Liyan Chen, Gregory P. Meyer, Zaiwei Zhang et al.
ProHOC: Probabilistic Hierarchical Out-of-Distribution Classification via Multi-Depth Networks
Erik Wallin, Fredrik Kahl, Lars Hammarstrand
Motion Prompting: Controlling Video Generation with Motion Trajectories
Daniel Geng, Charles Herrmann, Junhwa Hur et al.
EditSplat: Multi-View Fusion and Attention-Guided Optimization for View-Consistent 3D Scene Editing with 3D Gaussian Splatting
Dong In Lee, Hyeongcheol Park, Jiyoung Seo et al.
MaskPLAN: Masked Generative Layout Planning from Partial Input
Hang Zhang, Anton Savov, Benjamin Dillenburger
Solving Masked Jigsaw Puzzles with Diffusion Vision Transformers
Jinyang Liu, Wondmgezahu Teshome, Sandesh Ghimire et al.
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
Hongjie Wang, Chih-Yao Ma, Yen-Cheng Liu et al.
Towards Memorization-Free Diffusion Models
Chen Chen, Daochang Liu, Chang Xu
Volumetrically Consistent 3D Gaussian Rasterization
Chinmay Talegaonkar, Yash Belhe, Ravi Ramamoorthi et al.
AV-RIR: Audio-Visual Room Impulse Response Estimation
Anton Ratnarajah, Sreyan Ghosh, Sonal Kumar et al.
Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields
Zhiyuan Min, Yawei Luo, Wei Yang et al.
A-Teacher: Asymmetric Network for 3D Semi-Supervised Object Detection
Hanshi Wang, Zhipeng Zhang, Jin Gao et al.
HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances
Supreeth Narasimhaswamy, Uttaran Bhattacharya, Xiang Chen et al.
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Bin Xiao, Haiping Wu, Weijian Xu et al.
From Zero to Detail: Deconstructing Ultra-High-Definition Image Restoration from Progressive Spectral Perspective
Chen Zhao, Zhizhou Chen, Yunzhe Xu et al.
DMR: Decomposed Multi-Modality Representations for Frames and Events Fusion in Visual Reinforcement Learning
Haoran Xu, Peixi Peng, Guang Tan et al.
3D Feature Tracking via Event Camera
Siqi Li, Zhou Zhikuan, Zhou Xue et al.
Frequency-aware Event-based Video Deblurring for Real-World Motion Blur
Taewoo Kim, Hoonhee Cho, Kuk-Jin Yoon
FedHCA2: Towards Hetero-Client Federated Multi-Task Learning
Yuxiang Lu, Suizhi Huang, Yuwen Yang et al.
Improving Unsupervised Hierarchical Representation with Reinforcement Learning
Ruyi An, Yewen Li, Xu He et al.
FreeCloth: Free-form Generation Enhances Challenging Clothed Human Modeling
Hang Ye, Xiaoxuan Ma, Hai Ci et al.
Global Latent Neural Rendering
Thomas Tanay, Matteo Maggioni
Data Poisoning based Backdoor Attacks to Contrastive Learning
Jinghuai Zhang, Hongbin Liu, Jinyuan Jia et al.
Progressive Focused Transformer for Single Image Super-Resolution
Wei Long, Xingyu Zhou, Leheng Zhang et al.
RoHM: Robust Human Motion Reconstruction via Diffusion
Siwei Zhang, Bharat Lal Bhatnagar, Yuanlu Xu et al.
KVQ: Boosting Video Quality Assessment via Saliency-guided Local Perception
Yunpeng Qu, Kun Yuan, Qizhi Xie et al.
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
Tao Wu, Runyu He, Gangshan Wu et al.
Classes Are Not Equal: An Empirical Study on Image Recognition Fairness
Jiequan Cui, Beier Zhu, Xin Wen et al.
Efficient Personalization of Quantized Diffusion Model without Backpropagation
Hoigi Seo, Wongi Jeong, Kyungryeol Lee et al.
From Sparse Signal to Smooth Motion: Real-Time Motion Generation with Rolling Prediction Models
German Barquero, Nadine Bertsch, Manojkumar Marramreddy et al.
ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object
Chenshuang Zhang, Fei Pan, Junmo Kim et al.
Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents
Yunseok Jang, Yeda Song, Sungryull Sohn et al.
BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition
Yuxuan Zhou, Xudong Yan, Zhi-Qi Cheng et al.
Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors
Yu Zhang, Songpengcheng Xia, Lei Chu et al.
Person-in-WiFi 3D: End-to-End Multi-Person 3D Pose Estimation with Wi-Fi
Kangwei Yan, Fei Wang, Bo Qian et al.
Bridging the Vision-Brain Gap with an Uncertainty-Aware Blur Prior
Haitao Wu, Qing Li, Changqing Zhang et al.
ERMVP: Communication-Efficient and Collaboration-Robust Multi-Vehicle Perception in Challenging Environments
Jingyu Zhang, Kun Yang, Yilei Wang et al.
GRAM: Global Reasoning for Multi-Page VQA
Itshak Blau, Sharon Fogel, Roi Ronen et al.
Free Lunch Enhancements for Multi-modal Crowd Counting
Haoliang Meng, Xiaopeng Hong, Zhengqin Lai et al.
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data
Qifan Yu, Juncheng Li, Longhui Wei et al.
Tri-Perspective View Decomposition for Geometry-Aware Depth Completion
Zhiqiang Yan, Yuankai Lin, Kun Wang et al.
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding
Zining Wang, Tongkun Guan, Pei Fu et al.
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
Haokun Lin, Haoli Bai, Zhili Liu et al.
PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting
Alex Hanson, Allen Tu, Vasu Singla et al.
Immune: Improving Safety Against Jailbreaks in Multi-modal LLMs via Inference-Time Alignment
Soumya Suvra Ghosal, Souradip Chakraborty, Vaibhav Singh et al.
DiffusionRegPose: Enhancing Multi-Person Pose Estimation using a Diffusion-Based End-to-End Regression Approach
Dayi Tan, Hansheng Chen, Wei Tian et al.
PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models
Chenyu Yang, Xuan Dong, Xizhou Zhu et al.
Tumor Micro-environment Interactions Guided Graph Learning for Survival Analysis of Human Cancers from Whole-slide Pathological Images
Wei Shao, YangYang Shi, Daoqiang Zhang et al.
Perception-Oriented Video Frame Interpolation via Asymmetric Blending
Guangyang Wu, Xin Tao, Changlin Li et al.
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
Qi Yang, Xing Nie, Tong Li et al.
Exact Fusion via Feature Distribution Matching for Few-shot Image Generation
Yingbo Zhou, Yutong Ye, Pengyu Zhang et al.
Fooling Polarization-Based Vision using Locally Controllable Polarizing Projection
Zhuoxiao Li, Zhihang Zhong, Shohei Nobuhara et al.