Most Cited CVPR "sub-gaussian design" Papers
5,589 papers found • Page 5 of 28
Conference
Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation
Razvan Pasca, Alexey Gavryushin, Muhammad Hamza et al.
Domain Prompt Learning with Quaternion Networks
Qinglong Cao, Zhengqin Xu, Yuntian Chen et al.
Rethinking Multi-view Representation Learning via Distilled Disentangling
Guanzhou Ke, Bo Wang, Xiao-Li Wang et al.
Unifying Top-down and Bottom-up Scanpath Prediction Using Transformers
Zhibo Yang, Sounak Mondal, Seoyoung Ahn et al.
Seeing Far and Clearly: Mitigating Hallucinations in MLLMs with Attention Causal Decoding
feilong tang, Chengzhi Liu, Zhongxing Xu et al.
Semantic-aware SAM for Point-Prompted Instance Segmentation
Zhaoyang Wei, Pengfei Chen, Xuehui Yu et al.
CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology
Yuxuan Sun, Yixuan Si, Chenglu Zhu et al.
SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation
Yunxiang Fu, Meng Lou, Yizhou Yu
F-LMM: Grounding Frozen Large Multimodal Models
Size Wu, Sheng Jin, Wenwei Zhang et al.
Large Language Models are Good Prompt Learners for Low-Shot Image Classification
Zhaoheng Zheng, Jingmin Wei, Xuefeng Hu et al.
Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation
Yudi Shi, Shangzhe Di, Qirui Chen et al.
StegoGAN: Leveraging Steganography for Non-Bijective Image-to-Image Translation
Sidi Wu, Yizi Chen, Loic Landrieu et al.
Material Anything: Generating Materials for Any 3D Object via Diffusion
Xin Huang, Tengfei Wang, Ziwei Liu et al.
GEARS: Local Geometry-aware Hand-object Interaction Synthesis
Keyang Zhou, Bharat Lal Bhatnagar, Jan Lenssen et al.
GAFusion: Adaptive Fusing LiDAR and Camera with Multiple Guidance for 3D Object Detection
Xiaotian Li, Baojie Fan, Jiandong Tian et al.
BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training
Xuanpu Zhang, Dan Song, pengxin zhan et al.
Category-Level Multi-Part Multi-Joint 3D Shape Assembly
Yichen Li, Kaichun Mo, Yueqi Duan et al.
Mind the Time: Temporally-Controlled Multi-Event Video Generation
Ziyi Wu, Aliaksandr Siarohin, Willi Menapace et al.
Spatio-Temporal Turbulence Mitigation: A Translational Perspective
Xingguang Zhang, Nicholas M Chimitt, Yiheng Chi et al.
HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations
Peng Dai, Yang Zhang, Tao Liu et al.
Language-Driven Anchors for Zero-Shot Adversarial Robustness
Xiao Li, Wei Zhang, Yining Liu et al.
PrPSeg: Universal Proposition Learning for Panoramic Renal Pathology Segmentation
Ruining Deng, Quan Liu, Can Cui et al.
Real-time 3D-aware Portrait Video Relighting
Ziqi Cai, Kaiwen Jiang, Shu-Yu Chen et al.
Guided Slot Attention for Unsupervised Video Object Segmentation
Minhyeok Lee, Suhwan Cho, Dogyoon Lee et al.
Rethinking Few-shot 3D Point Cloud Semantic Segmentation
Zhaochong An, Guolei Sun, Yun Liu et al.
MonoHair: High-Fidelity Hair Modeling from a Monocular Video
Keyu Wu, LINGCHEN YANG, Zhiyi Kuang et al.
Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy
Zaijing Li, Yuquan Xie, Rui Shao et al.
NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors
Yannan He, Garvita Tiwari, Tolga Birdal et al.
PracticalDG: Perturbation Distillation on Vision-Language Models for Hybrid Domain Generalization
Zining Chen, Weiqiu Wang, Zhicheng Zhao et al.
Composed Video Retrieval via Enriched Context and Discriminative Embeddings
Omkar Thawakar, Muzammal Naseer, Rao Anwer et al.
GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion
Jiapeng Tang, Davide Davoli, Tobias Kirschstein et al.
PromptHMR: Promptable Human Mesh Recovery
Yufu Wang, Yu Sun, Priyanka Patel et al.
Self-Supervised Multi-Object Tracking with Path Consistency
Zijia Lu, Bing Shuai, Yanbei Chen et al.
Neural Spline Fields for Burst Image Fusion and Layer Separation
Ilya Chugunov, David Shustin, Ruyu Yan et al.
Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval
Young Kyun Jang, Donghyun Kim, Zihang Meng et al.
SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE
YONGWEI CHEN, Yushi Lan, Shangchen Zhou et al.
PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering
Yifan Gao, Zihang Lin, Chuanbin Liu et al.
Unlocking the Potential of Prompt-Tuning in Bridging Generalized and Personalized Federated Learning
wenlong deng, Christos Thrampoulidis, Xiaoxiao Li
VideoRF: Rendering Dynamic Radiance Fields as 2D Feature Video Streams
Liao Wang, Kaixin Yao, Chengcheng Guo et al.
Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach
Mir Rayat Imtiaz Hossain, Mennatullah Siam, Leonid Sigal et al.
GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmentation
Lang Lin, Xueyang Yu, Ziqi Pang et al.
TI2V-Zero: Zero-Shot Image Conditioning for Text-to-Video Diffusion Models
Haomiao Ni, Bernhard Egger, Suhas Lohit et al.
UVEB: A Large-scale Benchmark and Baseline Towards Real-World Underwater Video Enhancement
yaofeng xie, Lingwei Kong, Kai Chen et al.
Matrix3D: Large Photogrammetry Model All-in-One
Yuanxun Lu, Jingyang Zhang, Tian Fang et al.
Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation
Yuanbo Yang, Jiahao Shao, Xinyang Li et al.
DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling
Linqi Zhou, Andy Shih, Chenlin Meng et al.
MC^2: Multi-concept Guidance for Customized Multi-concept Generation
Jiaxiu Jiang, Yabo Zhang, Kailai Feng et al.
Clustering Propagation for Universal Medical Image Segmentation
Yuhang Ding, Liulei Li, Wenguan Wang et al.
3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation
Songchun Zhang, Yibo Zhang, Quan Zheng et al.
Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning
Desai Xie, Jiahao Li, Hao Tan et al.
TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model
Cheng Yang, Yang Sui, Jinqi Xiao et al.
HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery
Jingtao Li, Yingyi Liu, XINYU WANG et al.
Dual Prior Unfolding for Snapshot Compressive Imaging
Jiancheng Zhang, Haijin Zeng, Jiezhang Cao et al.
Improving Plasticity in Online Continual Learning via Collaborative Learning
Maorong Wang, Nicolas Michel, Ling Xiao et al.
Generative Image Layer Decomposition with Visual Effects
Jinrui Yang, Qing Liu, Yijun Li et al.
VideoMAC: Video Masked Autoencoders Meet ConvNets
Gensheng Pei, Tao Chen, Xiruo Jiang et al.
Adv-CPG: A Customized Portrait Generation Framework with Facial Adversarial Attacks
Junying Wang, Hongyuan Zhang, Yuan Yuan
Loose Inertial Poser: Motion Capture with IMU-attached Loose-Wear Jacket
Chengxu Zuo, Yiming Wang, Lishuang Zhan et al.
Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use
Imad Eddine Toubal, Aditya Avinash, Neil Alldrin et al.
One-Class Face Anti-spoofing via Spoof Cue Map-Guided Feature Learning
Pei-Kai Huang, Cheng-Hsuan Chiang, Tzu-Hsien Chen et al.
Video Depth without Video Models
Bingxin Ke, Dominik Narnhofer, Shengyu Huang et al.
ZePT: Zero-Shot Pan-Tumor Segmentation via Query-Disentangling and Self-Prompting
Yankai Jiang, Zhongzhen Huang, Rongzhao Zhang et al.
SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device
Yushu Wu, Zhixing Zhang, Yanyu Li et al.
Steerers: A Framework for Rotation Equivariant Keypoint Descriptors
Georg Bökman, Johan Edstedt, Michael Felsberg et al.
AlignMiF: Geometry-Aligned Multimodal Implicit Field for LiDAR-Camera Joint Synthesis
Tao Tang, Guangrun Wang, Yixing Lao et al.
ASAM: Boosting Segment Anything Model with Adversarial Tuning
Bo Li, Haoke Xiao, Lv Tang
SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video
Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello et al.
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
Zigeng Chen, Xinyin Ma, Gongfan Fang et al.
NeRF-HuGS: Improved Neural Radiance Fields in Non-static Scenes Using Heuristics-Guided Segmentation
Jiahao Chen, Yipeng Qin, Lingjie Liu et al.
FlowTrack: Revisiting Optical Flow for Long-Range Dense Tracking
Seokju Cho, Gabriel Huang, Seungryong Kim et al.
Instant Gaussian Stream: Fast and Generalizable Streaming of Dynamic Scene Reconstruction via Gaussian Splatting
Jinbo Yan, Rui Peng, Zhiyan Wang et al.
You'll Never Walk Alone: A Sketch and Text Duet for Fine-Grained Image Retrieval
Subhadeep Koley, Ayan Kumar Bhunia, Aneeshan Sain et al.
360Loc: A Dataset and Benchmark for Omnidirectional Visual Localization with Cross-device Queries
Huajian Huang, Changkun Liu, Yipeng Zhu et al.
DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness
Yiming Zhong, Qi Jiang, Jingyi Yu et al.
Generative Video Propagation
Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang et al.
OccMamba: Semantic Occupancy Prediction with State Space Models
Heng Li, Yuenan Hou, Xiaohan Xing et al.
Enhanced Contrastive Learning with Multi-view Longitudinal Data for Chest X-ray Report Generation
Kang Liu, Zhuoqi Ma, Xiaolu Kang et al.
D^3: Scaling Up Deepfake Detection by Learning from Discrepancy
Yongqi Yang, Zhihao Qian, Ye Zhu et al.
DIBS: Enhancing Dense Video Captioning with Unlabeled Videos via Pseudo Boundary Enrichment and Online Refinement
Hao Wu, Huabin Liu, Yu Qiao et al.
A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark
Jakub Paplham, Vojtech Franc
Distilling Vision-Language Models on Millions of Videos
Yue Zhao, Long Zhao, Xingyi Zhou et al.
Long-Tailed Anomaly Detection with Learnable Class Names
Chih-Hui Ho, Kuan-Chuan Peng, Nuno Vasconcelos
LSNet: See Large, Focus Small
Ao Wang, Hui Chen, Zijia Lin et al.
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
Yifan Pu, Yiming Zhao, Zhicong Tang et al.
Structure-Guided Adversarial Training of Diffusion Models
Ling Yang, Haotian Qian, Zhilong Zhang et al.
2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification
Jingwei Zhang, Anh Tien Nguyen, Xi Han et al.
TeTriRF: Temporal Tri-Plane Radiance Fields for Efficient Free-Viewpoint Video
Minye Wu, Zehao Wang, Georgios Kouros et al.
Learning to Predict Activity Progress by Self-Supervised Video Alignment
Gerard Donahue, Ehsan Elhamifar
Multi-Sensor Object Anomaly Detection: Unifying Appearance, Geometry, and Internal Properties
wenqiao Li, BoZhong Zheng, Xiaohao Xu et al.
PerAda: Parameter-Efficient Federated Learning Personalization with Generalization Guarantees
Chulin Xie, De-An Huang, Wenda Chu et al.
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
Xudong LU, Yinghao Chen, chencheng Chen et al.
Taming Teacher Forcing for Masked Autoregressive Video Generation
Deyu Zhou, Quan Sun, Yuang Peng et al.
An Upload-Efficient Scheme for Transferring Knowledge From a Server-Side Pre-trained Generator to Clients in Heterogeneous Federated Learning
Jianqing Zhang, Yang Liu, Yang Hua et al.
Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions
Zeyu Han, Fangrui Zhu, Qianru Lao et al.
Grid Diffusion Models for Text-to-Video Generation
Taegyeong Lee, Soyeong Kwon, Taehwan Kim
Multi-Level Neural Scene Graphs for Dynamic Urban Environments
Tobias Fischer, Lorenzo Porzi, Samuel Rota Bulò et al.
Any-Resolution AI-Generated Image Detection by Spectral Learning
Dimitrios Karageorgiou, Symeon Papadopoulos, Ioannis Kompatsiaris et al.
Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models
Andreas Müller, Denis Lukovnikov, Jonas Thietke et al.
MLP Can Be A Good Transformer Learner
Sihao Lin, Pumeng Lyu, Dongrui Liu et al.
NViST: In the Wild New View Synthesis from a Single Image with Transformers
Wonbong Jang, Lourdes Agapito
Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards
Zijing Hu, Fengda Zhang, Long Chen et al.
Dexterous Grasp Transformer
Guo-Hao Xu, Yi-Lin Wei, Dian Zheng et al.
DreamComposer: Controllable 3D Object Generation via Multi-View Conditions
Yunhan Yang, Yukun Huang, Xiaoyang Wu et al.
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
Yunze Man, De-An Huang, Guilin Liu et al.
Theoretically Achieving Continuous Representation of Oriented Bounding Boxes
Zikai Xiao, Guo-Ye Yang, Xue Yang et al.
As-Plausible-As-Possible: Plausibility-Aware Mesh Deformation Using 2D Diffusion Priors
Seungwoo Yoo, Kunho Kim, Vladimir G. Kim et al.
Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection
Jiacheng Zhang, Jiaming Li, Xiangru Lin et al.
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation
Sitong Gong, Yunzhi Zhuge, Lu Zhang et al.
LAN: Learning to Adapt Noise for Image Denoising
Changjin Kim, Tae Hyun Kim, Sungyong Baik
ODIN: A Single Model for 2D and 3D Segmentation
Ayush Jain, Pushkal Katara, Nikolaos Gkanatsios et al.
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
Pengfei Zhou, Xiaopeng Peng, Jiajun Song et al.
Segmenting Maxillofacial Structures in CBCT Volumes
Federico Bolelli, Kevin Marchesini, Niels van Nistelrooij et al.
Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly
Yexin Liu, Zhengyang Liang, Yueze Wang et al.
Joint Reconstruction of 3D Human and Object via Contact-Based Refinement Transformer
Hyeongjin Nam, Daniel Jung, Gyeongsik Moon et al.
Feat2GS: Probing Visual Foundation Models with Gaussian Splatting
Yue Chen, Xingyu Chen, Anpei Chen et al.
Distraction is All You Need for Multimodal Large Language Model Jailbreaking
Zuopeng Yang, Jiluan Fan, Anli Yan et al.
QMambaBSR: Burst Image Super-Resolution with Query State Space Model
Xin Di, Long Peng, Peizhe Xia et al.
Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation
Ji-Jia Wu, Andy Chia-Hao Chang, Chieh-Yu Chuang et al.
DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer
Wei-Ting Chen, Gurunandan Krishnan, Qiang Gao et al.
ParamISP: Learned Forward and Inverse ISPs using Camera Parameters
Woohyeok Kim, Geonu Kim, Junyong Lee et al.
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
ziang yan, Zhilin Li, Yinan He et al.
Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation
Xinyao Li, Yuke Li, Zhekai Du et al.
Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification
Chao Yi, Lu Ren, De-Chuan Zhan et al.
GlitchBench: Can Large Multimodal Models Detect Video Game Glitches?
Mohammad Reza Taesiri, Tianjun Feng, Cor-Paul Bezemer et al.
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders
Fiona Ryan, Ajay Bati, Sangmin Lee et al.
Mixed-Precision Quantization for Federated Learning on Resource-Constrained Heterogeneous Devices
Huancheng Chen, Haris Vikalo
The Unreasonable Effectiveness of Pre-Trained Features for Camera Pose Refinement
Gabriele Trivigno, Carlo Masone, Barbara Caputo et al.
SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis
Hyojun Go, byeongjun park, Jiho Jang et al.
RecDiffusion: Rectangling for Image Stitching with Diffusion Models
Tianhao Zhou, Li Haipeng, Ziyi Wang et al.
Improving Transferable Targeted Adversarial Attacks with Model Self-Enhancement
Han Wu, Guanyan Ou, Weibin Wu et al.
Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking
You Wu, Xucheng Wang, Xiangyang Yang et al.
Task-driven Image Fusion with Learnable Fusion Loss
Haowen Bai, Jiangshe Zhang, Zixiang Zhao et al.
Unsupervised Keypoints from Pretrained Diffusion Models
Eric Hedlin, Gopal Sharma, Shweta Mahajan et al.
LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels
Tuo Feng, Wenguan Wang, Fan Ma et al.
LiDAR-based Person Re-identification
Wenxuan Guo, Zhiyu Pan, Yingping Liang et al.
Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation
Zilyu Ye, Zhiyang Chen, Tiancheng Li et al.
CA-Jaccard: Camera-aware Jaccard Distance for Person Re-identification
Yiyu Chen, Zheyi Fan, Zhaoru Chen et al.
LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion
Pancheng Zhao, Peng Xu, Pengda Qin et al.
Robust Image Denoising through Adversarial Frequency Mixup
Donghun Ryou, Inju Ha, Hyewon Yoo et al.
Improved Video VAE for Latent Video Diffusion Model
Pingyu Wu, Kai Zhu, Yu Liu et al.
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D
Wei Cheng, Juncheng Mu, Xianfang Zeng et al.
EmoEdit: Evoking Emotions through Image Manipulation
Jingyuan Yang, Jiawei Feng, Weibin Luo et al.
Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach
Wei Dong, Xing Zhang, Bihui Chen et al.
DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving
Zhenhua Xu, Yan Bai, Yujia Zhang et al.
Scalable 3D Registration via Truncated Entry-wise Absolute Residuals
Tianyu Huang, Liangzu Peng, Rene Vidal et al.
DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models
Zhendong Wang, Jianmin Bao, Shuyang Gu et al.
LaVin-DiT: Large Vision Diffusion Transformer
Zhaoqing Wang, Xiaobo Xia, Runnan Chen et al.
TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation
Hongxiang Zhao, Xingchen Liu, Mutian Xu et al.
VTQA: Visual Text Question Answering via Entity Alignment and Cross-Media Reasoning
Kang Chen, Xiangqian Wu
Discriminability-Driven Channel Selection for Out-of-Distribution Detection
Yue Yuan, Rundong He, Yicong Dong et al.
CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset
Xiao Wang, Fuling Wang, Yuehang Li et al.
HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields
Haozhe Qi, Chen Zhao, Mathieu Salzmann et al.
ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval
Fang Kaipeng, Jingkuan Song, Lianli Gao et al.
Textured Gaussians for Enhanced 3D Scene Appearance Modeling
Brian Chao, Hung-Yu Tseng, Lorenzo Porzi et al.
Exploring Efficient Asymmetric Blind-Spots for Self-Supervised Denoising in Real-World Scenarios
Shiyan Chen, Jiyuan Zhang, Zhaofei Yu et al.
UniK3D: Universal Camera Monocular 3D Estimation
Luigi Piccinelli, Christos Sakaridis, Mattia Segu et al.
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
Kaihang Pan, Wang Lin, Zhongqi Yue et al.
DriveTrack: A Benchmark for Long-Range Point Tracking in Real-World Videos
Arjun Balasingam, Joseph Chandler, Chenning Li et al.
Open-Set Domain Adaptation for Semantic Segmentation
Seun-An Choe, Ah-Hyung Shin, Keon Hee Park et al.
TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting
Bojun Xiong, Jialun Liu, JiaKui Hu et al.
Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach
Jing Bi, Lianggong Bruce Wen, Zhang Liu et al.
MESA: Matching Everything by Segmenting Anything
Yesheng Zhang, Xu Zhao
PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos
Yufei Zhang, Jeffrey Kephart, Zijun Cui et al.
HiPose: Hierarchical Binary Surface Encoding and Correspondence Pruning for RGB-D 6DoF Object Pose Estimation
Yongliang Lin, Yongzhi Su, Praveen Nathan et al.
FreePoint: Unsupervised Point Cloud Instance Segmentation
Zhikai Zhang, Jian Ding, Li Jiang et al.
Decompositional Neural Scene Reconstruction with Generative Diffusion Prior
Junfeng Ni, Yu Liu, Ruijie Lu et al.
Fair-VPT: Fair Visual Prompt Tuning for Image Classification
Sungho Park, Hyeran Byun
UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing
Yiheng Li, RuiBing Hou, Hong Chang et al.
Adaptive Rectangular Convolution for Remote Sensing Pansharpening
Xueyang Wang, Zhixin Zheng, Jiandong Shao et al.
Video Motion Transfer with Diffusion Transformers
Alexander Pondaven, Aliaksandr Siarohin, Sergey Tulyakov et al.
Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy
Joonhyun Jeong, Seyun Bae, Yeonsung Jung et al.
SuperSVG: Superpixel-based Scalable Vector Graphics Synthesis
Teng Hu, Ran Yi, Baihong Qian et al.
Ref-GS: Directional Factorization for 2D Gaussian Splatting
Youjia Zhang, Anpei Chen, Yumin Wan et al.
4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion
Chaoyang Wang, Peiye Zhuang, Tuan Duc Ngo et al.
Composing Object Relations and Attributes for Image-Text Matching
Khoi Pham, Chuong Huynh, Ser-Nam Lim et al.
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
Jiangyong Huang, Baoxiong Jia, Yan Wang et al.
Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior
Chen Guo, Junxuan Li, Yash Kant et al.
FedMef: Towards Memory-efficient Federated Dynamic Pruning
Hong Huang, Weiming Zhuang, Chen Chen et al.
RMem: Restricted Memory Banks Improve Video Object Segmentation
Junbao Zhou, Ziqi Pang, Yu-Xiong Wang
Rethinking the Evaluation Protocol of Domain Generalization
Han Yu, Xingxuan Zhang, Renzhe Xu et al.
GenFusion: Closing the Loop between Reconstruction and Generation via Videos
Sibo Wu, Congrong Xu, Binbin Huang et al.
TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding
Zhihao Zhang, Shengcao Cao, Yu-Xiong Wang
Rotation-Agnostic Image Representation Learning for Digital Pathology
Saghir Alfasly, Abubakr Shafique, Peyman Nejat et al.
Blur-aware Spatio-temporal Sparse Transformer for Video Deblurring
Huicong Zhang, Haozhe Xie, Hongxun Yao
SketchAgent: Language-Driven Sequential Sketch Generation
Yael Vinker, Tamar Rott Shaham, Kristine Zheng et al.
Unmixing Diffusion for Self-Supervised Hyperspectral Image Denoising
Haijin Zeng, Jiezhang Cao, Yongyong Chen et al.
HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation
Boyuan Wang, Xiaofeng Wang, Chaojun Ni et al.
TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing
Stefan Lionar, Jiabin Liang, Gim Hee Lee
FAR: Flexible Accurate and Robust 6DoF Relative Camera Pose Estimation
Chris Rockwell, Nilesh Kulkarni, Linyi Jin et al.
FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio
Chao Xu, Yang Liu, Jiazheng Xing et al.
Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning
Fan Lu, Wei Wu, Kecheng Zheng et al.
AssistGUI: Task-Oriented PC Graphical User Interface Automation
Difei Gao, Lei Ji, Zechen Bai et al.
Correspondence-Free Non-Rigid Point Set Registration Using Unsupervised Clustering Analysis
Mingyang Zhao, Jiang Jingen, Lei Ma et al.
Shadow Generation for Composite Image Using Diffusion Model
Qingyang Liu, Junqi You, Jian-Ting Wang et al.
OmniGuard: Hybrid Manipulation Localization via Augmented Versatile Deep Image Watermarking
Xuanyu Zhang, Zecheng Tang, Zhipei Xu et al.
Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset
Xiao Wang, Yu Jin, Wentao Wu et al.
T4P: Test-Time Training of Trajectory Prediction via Masked Autoencoder and Actor-specific Token Memory
Daehee Park, Jaeseok Jeong, Sung-Hoon Yoon et al.
A Simple and Effective Point-based Network for Event Camera 6-DOFs Pose Relocalization
Hongwei Ren, Jiadong Zhu, Yue Zhou et al.
Hash3D: Training-free Acceleration for 3D Generation
Xingyi Yang, Songhua Liu, Xinchao Wang
Text-Enhanced Data-free Approach for Federated Class-Incremental Learning
Minh-Tuan Tran, Trung Le, Xuan-May Le et al.