Most Cited CVPR "gradient-boosted decision trees" Papers
5,589 papers found • Page 22 of 28
Conference
Self-Calibrating Vicinal Risk Minimisation for Model Calibration
Jiawei Liu, Changkun Ye, Ruikai Cui et al.
Closely Interactive Human Reconstruction with Proxemics and Physics-Guided Adaption
Buzhen Huang, Chen Li, Chongyang Xu et al.
Cinematic Behavior Transfer via NeRF-based Differentiable Filming
Xuekun Jiang, Anyi Rao, Jingbo Wang et al.
Event Ellipsometer: Event-based Mueller-Matrix Video Imaging
Ryota Maeda, Yunseong Moon, Seung-Hwan Baek
Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning
Leonardo Iurada, Marco Ciccone, Tatiana Tommasi
CORE-MPI: Consistency Object Removal with Embedding MultiPlane Image
Donggeun Yoon, Donghyeon Cho
ScoreHypo: Probabilistic Human Mesh Estimation with Hypothesis Scoring
Yuan Xu, Xiaoxuan Ma, Jiajun Su et al.
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
Nataniel Ruiz, Yuanzhen Li, Varun Jampani et al.
Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models
Chang Liu, Haoning Wu, Yujie Zhong et al.
UniMix: Towards Domain Adaptive and Generalizable LiDAR Semantic Segmentation in Adverse Weather
Haimei Zhao, Jing Zhang, Zhuo Chen et al.
LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs
Yunsheng Ma, Can Cui, Xu Cao et al.
Dynamic Policy-Driven Adaptive Multi-Instance Learning for Whole Slide Image Classification
Tingting Zheng, Kui Jiang, Hongxun Yao
A Physics-Informed Blur Learning Framework for Imaging Systems
liqun.chen, Yuxuan Li, Jun Dai et al.
BilevelPruning: Unified Dynamic and Static Channel Pruning for Convolutional Neural Networks
Shangqian Gao, Yanfu Zhang, Feihu Huang et al.
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
Jiayi Guo, Xingqian Xu, Yifan Pu et al.
Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents
Yuxi Wei, Zi Wang, Yifan Lu et al.
Label Propagation for Zero-shot Classification with Vision-Language Models
Vladan Stojnić, Yannis Kalantidis, Giorgos Tolias
Learning Continuous 3D Words for Text-to-Image Generation
Ta-Ying Cheng, Matheus Gadelha, Thibault Groueix et al.
Content-Adaptive Non-Local Convolution for Remote Sensing Pansharpening
Yule Duan, Xiao Wu, Haoyu Deng et al.
A Conditional Denoising Diffusion Probabilistic Model for Point Cloud Upsampling
Wentao Qu, Yuantian Shao, Lingwu Meng et al.
HUNet: Homotopy Unfolding Network for Image Compressive Sensing
Feiyang Shen, Hongping Gan
Vid2Sim: Generalizable, Video-based Reconstruction of Appearance, Geometry and Physics for Mesh-free Simulation
Chuhao Chen, Zhiyang Dou, Chen Wang et al.
APISR: Anime Production Inspired Real-World Anime Super-Resolution
Boyang Wang, Fengyu Yang, Xihang Yu et al.
Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions
Weizhen He, Yiheng Deng, SHIXIANG TANG et al.
Device-Wise Federated Network Pruning
Shangqian Gao, Junyi Li, Zeyu Zhang et al.
SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers
Ioannis Kakogeorgiou, Spyros Gidaris, Konstantinos Karantzalos et al.
QuartDepth: Post-Training Quantization for Real-Time Depth Estimation on the Edge
Xuan Shen, Weize Ma, Jing Liu et al.
IQ-VFI: Implicit Quadratic Motion Estimation for Video Frame Interpolation
Mengshun Hu, Kui Jiang, Zhihang Zhong et al.
Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition
Anqi Zhu, Qiuhong Ke, Mingming Gong et al.
M^3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation
Zixuan Chen, Jiaxin Li, Junxuan Liang et al.
MRC-Net: 6-DoF Pose Estimation with MultiScale Residual Correlation
Yuelong Li, Yafei Mao, Raja Bala et al.
Diffusion-ES: Gradient-free Planning with Diffusion for Autonomous and Instruction-guided Driving
Brian Yang, Huangyuan Su, Nikolaos Gkanatsios et al.
Progress-Aware Online Action Segmentation for Egocentric Procedural Task Videos
Yuhan Shen, Ehsan Elhamifar
MOHO: Learning Single-view Hand-held Object Reconstruction with Multi-view Occlusion-Aware Supervision
Chenyangguang Zhang, Guanlong Jiao, Yan Di et al.
OralXrays-9: Towards Hospital-Scale Panoramic X-ray Anomaly Detection via Personalized Multi-Object Query-Aware Mining
Bingzhi Chen, Sisi Fu, Xiaocheng Fang et al.
Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
Youngjoon Jang, Jihoon Kim, Junseok Ahn et al.
Learning to Segment Referred Objects from Narrated Egocentric Videos
Yuhan Shen, Huiyu Wang, Xitong Yang et al.
EGTR: Extracting Graph from Transformer for Scene Graph Generation
Jinbae Im, JeongYeon Nam, Nokyung Park et al.
Distributionally Generative Augmentation for Fair Facial Attribute Classification
Fengda Zhang, Qianpei He, Kun Kuang et al.
PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks
Marina Neseem, Conor McCullough, Randy Hsin et al.
THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models
Prannay Kaul, Zhizhong Li, Hao Yang et al.
UniVS: Unified and Universal Video Segmentation with Prompts as Queries
Minghan LI, Shuai Li, Xindong Zhang et al.
Free on the Fly: Enhancing Flexibility in Test-Time Adaptation with Online EM
Qiyuan Dai, Sibei Yang
Parametric Point Cloud Completion for Polygonal Surface Reconstruction
Zhaiyu Chen, Yuqing Wang, Liangliang Nan et al.
Inlier Confidence Calibration for Point Cloud Registration
Yongzhe Yuan, Yue Wu, Xiaolong Fan et al.
CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow
Chenbin Pan, Burhan Yaman, Senem Velipasalar et al.
Everything to the Synthetic: Diffusion-driven Test-time Adaptation via Synthetic-Domain Alignment
Jiayi Guo, Zhao Junhao, Chaoqun Du et al.
ADFactory: An Effective Framework for Generalizing Optical Flow with NeRF
Han Ling, Quansen Sun, Yinghui Sun et al.
3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions
Weijia Li, Haote Yang, Zhenghao Hu et al.
In Search of a Data Transformation That Accelerates Neural Field Training
Junwon Seo, Sangyoon Lee, Kwang In Kim et al.
FastMAC: Stochastic Spectral Sampling of Correspondence Graph
Yifei Zhang, Hao Zhao, Hongyang Li et al.
Geometry-guided Online 3D Video Synthesis with Multi-View Temporal Consistency
Hyunho Ha, Lei Xiao, Christian Richardt et al.
Satellite to GroundScape - Large-scale Consistent Ground View Generation from Satellite Views
Ningli Xu, Rongjun Qin
Continuous Space-Time Video Resampling with Invertible Motion Steganography
Yuantong zhang, Zhenzhong Chen
Localized Concept Erasure for Text-to-Image Diffusion Models Using Training-Free Gated Low-Rank Adaptation
Byung Hyun Lee, Sungjin Lim, Se Young Chun
PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution
Honghao Chen, Xiangxiang Chu, Renyongjian et al.
Towards Generalizing to Unseen Domains with Few Labels
Chamuditha Jayanga Galappaththige, Sanoojan Baliah, Malitha Gunawardhana et al.
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
Jialin Wu, Xia Hu, Yaqing Wang et al.
Generative Hard Example Augmentation for Semantic Point Cloud Segmentation
Qi Zhang, Jibin Peng, Zhao Huang et al.
Improving Semantic Correspondence with Viewpoint-Guided Spherical Maps
Octave Mariotti, Oisin Mac Aodha, Hakan Bilen
Focal Split: Untethered Snapshot Depth from Differential Defocus
Junjie Luo, John Mamish, Alan Fu et al.
Learning Degradation-Independent Representations for Camera ISP Pipelines
Yanhui Guo, Fangzhou Luo, Xiaolin Wu
DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture
Qianlong Xiang, Miao Zhang, Yuzhang Shang et al.
Structured Model Probing: Empowering Efficient Transfer Learning by Structured Regularization
Zhi-Fan Wu, Chaojie Mao, Xue Wang et al.
A Subspace-Constrained Tyler's Estimator and its Applications to Structure from Motion
Feng Yu, Teng Zhang, Gilad Lerman
A Theory of Learning Unified Model via Knowledge Integration from Label Space Varying Domains
Dexuan Zhang, Thomas Westfechtel, Tatsuya Harada
Low-Resource Vision Challenges for Foundation Models
Yunhua Zhang, Hazel Doughty, Cees G. M. Snoek
T-FAKE: Synthesizing Thermal Images for Facial Landmarking
Philipp Flotho, Moritz Piening, Anna Kukleva et al.
Low-Rank Adaptation in Multilinear Operator Networks for Security-Preserving Incremental Learning
Huu Binh Ta, Duc Nguyen, Quyen Tran et al.
Low-Latency Neural Stereo Streaming
Qiqi Hou, Farzad Farhadzadeh, Amir Said et al.
Population Normalization for Federated Learning
Zhuoyao Wang, Fan Yi, Peizhu Gong et al.
MetaShadow: Object-Centered Shadow Detection, Removal, and Synthesis
Tianyu Wang, Jianming Zhang, Haitian Zheng et al.
Your Transferability Barrier is Fragile: Free-Lunch for Transferring the Non-Transferable Learning
Ziming Hong, Li Shen, Tongliang Liu
ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe
Yifan Bai, Zeyang Zhao, Yihong Gong et al.
DPHMs: Diffusion Parametric Head Models for Depth-based Tracking
Jiapeng Tang, Angela Dai, Yinyu Nie et al.
MaxQ: Multi-Axis Query for N:M Sparsity Network
Jingyang Xiang, Siqi Li, Junhao Chen et al.
CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation
Lingjun Zhao, Jingyu Song, Katherine Skinner
The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes
Myeongseob Ko, Feiyang Kang, Weiyan Shi et al.
Pay Attention to the Foreground in Object-Centric Learning
Pinzhuo Tian, Shengjie Yang, Hang Yu et al.
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Haoning Wu, Zicheng Zhang, Erli Zhang et al.
Hierarchical Compact Clustering Attention (COCA) for Unsupervised Object-Centric Learning
Can Küçüksözen, Yucel Yemez
Efficient Scene Recovery Using Luminous Flux Prior
ZhongYu Li, Lei Zhang
CountLLM: Towards Generalizable Repetitive Action Counting via Large Language Model
Ziyu Yao, Xuxin Cheng, Zhiqi Huang et al.
Structured Gradient-based Interpretations via Norm-Regularized Adversarial Training
Shizhan Gong, Qi Dou, Farzan Farnia
Revisiting Global Translation Estimation with Feature Tracks
Peilin Tao, Hainan Cui, Mengqi Rong et al.
Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection
Huan Liu, Zichang Tan, Chuangchuang Tan et al.
MatAnyone: Stable Video Matting with Consistent Memory Propagation
Peiqing Yang, Shangchen Zhou, Jixin Zhao et al.
MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng, Yan Xie, Hao Zhang et al.
MuseChat: A Conversational Music Recommendation System for Videos
Zhikang Dong, Bin Chen, Xiulong Liu et al.
CAP4D: Creating Animatable 4D Portrait Avatars with Morphable Multi-View Diffusion Models
Felix Taubner, Ruihang Zhang, Mathieu Tuli et al.
Novel View Synthesis with View-Dependent Effects from a Single Image
Juan Luis Gonzalez Bello, Munchurl Kim
Orchestrate Latent Expertise: Advancing Online Continual Learning with Multi-Level Supervision and Reverse Self-Distillation
Hongwei Yan, Liyuan Wang, Kaisheng Ma et al.
FG^2: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching
Zimin Xia, Alex Alahi
DisCo: Disentangled Control for Realistic Human Dance Generation
Tan Wang, Linjie Li, Kevin Lin et al.
Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing
Xun Lin, Shuai Wang, RIZHAO CAI et al.
Constructing and Exploring Intermediate Domains in Mixed Domain Semi-supervised Medical Image Segmentation
Qinghe Ma, Jian Zhang, Lei Qi et al.
LAMP: Learn A Motion Pattern for Few-Shot Video Generation
Rui-Qi Wu, Liangyu Chen, Tong Yang et al.
RORem: Training a Robust Object Remover with Human-in-the-Loop
Ruibin Li, Tao Yang, Song Guo et al.
PixelLM: Pixel Reasoning with Large Multimodal Model
Zhongwei Ren, Zhicheng Huang, Yunchao Wei et al.
Towards CLIP-driven Language-free 3D Visual Grounding via 2D-3D Relational Enhancement and Consistency
Yuqi Zhang, Han Luo, Yinjie Lei
iKUN: Speak to Trackers without Retraining
Yunhao Du, Cheng Lei, Zhicheng Zhao et al.
Neural Fields as Distributions: Signal Processing Beyond Euclidean Space
Daniel Rebain, Soroosh Yazdani, Kwang Moo Yi et al.
Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation
Xingqun Qi, Jiahao Pan, Peng Li et al.
LaRE^2: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection
Yunpeng Luo, Junlong Du, Ke Yan et al.
Stratified Avatar Generation from Sparse Observations
Han Feng, Wenchao Ma, Quankai Gao et al.
Few-shot Learner Parameterization by Diffusion Time-steps
Zhongqi Yue, Pan Zhou, Richang Hong et al.
Global and Hierarchical Geometry Consistency Priors for Few-shot NeRFs in Indoor Scenes
Xiaotian Sun, Qingshan Xu, Xinjie Yang et al.
Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation
Zihan Wang, Xiangyang Li, Jiahao Yang et al.
Compressed 3D Gaussian Splatting for Accelerated Novel View Synthesis
Simon Niedermayr, Josef Stumpfegger, rüdiger westermann
The STVchrono Dataset: Towards Continuous Change Recognition in Time
Yanjun Sun, Yue Qiu, Mariia Khan et al.
Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing
Bingyan Liu, Chengyu Wang, Tingfeng Cao et al.
Unleashing Channel Potential: Space-Frequency Selection Convolution for SAR Object Detection
Ke Li, Di Wang, Zhangyuan Hu et al.
Motion Blur Decomposition with Cross-shutter Guidance
Xiang Ji, Haiyang Jiang, Yinqiang Zheng
Gain from Neighbors: Boosting Model Robustness in the Wild via Adversarial Perturbations Toward Neighboring Classes
Zhou Yang, Mingtao Feng, Tao Huang et al.
Motion-Grounded Video Reasoning: Understanding and Perceiving Motion at Pixel Level
Andong Deng, Tongjia Chen, Shoubin Yu et al.
Curriculum Coarse-to-Fine Selection for High-IPC Dataset Distillation
Yanda Chen, Gongwei Chen, Miao Zhang et al.
IEEE Computer Society
LMO: Linear Mamba Operator for MRI Reconstruction
Wei Li, jiawei jiang, Jie Wu et al.
LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
Gongwei Chen, Leyang Shen, Rui Shao et al.
Pixel-Aligned Language Model
Jiarui Xu, Xingyi Zhou, Shen Yan et al.
Digital Twin Catalog: A Large-Scale Photorealistic 3D Object Digital Twin Dataset
Zhao Dong, Ka chen, Zhaoyang Lv et al.
Eclipse: Disambiguating Illumination and Materials using Unintended Shadows
Dor Verbin, Ben Mildenhall, Peter Hedman et al.
ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis
Muhammad Hamza Mughal, Rishabh Dabral, Ikhsanul Habibie et al.
2S-UDF: A Novel Two-stage UDF Learning Method for Robust Non-watertight Model Reconstruction from Multi-view Images
Junkai Deng, Fei Hou, Xuhui Chen et al.
HVI: A New Color Space for Low-light Image Enhancement
Qingsen Yan, Yixu Feng, Cheng Zhang et al.
Sparse Point Cloud Patches Rendering via Splitting 2D Gaussians
Changfeng Ma, Ran Bi, Jie Guo et al.
ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models
Lukas Höllein, Aljaž Božič, Norman Müller et al.
FinePhys: Fine-grained Human Action Generation by Explicitly Incorporating Physical Laws for Effective Skeletal Guidance
Dian Shao, Mingfei Shi, Shengda Xu et al.
TULIP: Transformer for Upsampling of LiDAR Point Clouds
Bin Yang, Patrick Pfreundschuh, Roland Siegwart et al.
Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos
Leonhard Sommer, Artur Jesslen, Eddy Ilg et al.
Harnessing Frequency Spectrum Insights for Image Copyright Protection Against Diffusion Models
Zhenguang Liu, Chao Shuai, Shaojing Fan et al.
Incremental Residual Concept Bottleneck Models
Chenming Shang, Shiji Zhou, Hengyuan Zhang et al.
HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation
Hermann Kumbong, Xian Liu, Tsung-Yi Lin et al.
Taming Stable Diffusion for Text to 360 Panorama Image Generation
Cheng Zhang, Qianyi Wu, Camilo Cruz Gambardella et al.
Cross-modal Information Flow in Multimodal Large Language Models
Zhi Zhang, Srishti Yadav, Fengze Han et al.
CAMEL: CAusal Motion Enhancement Tailored for Lifting Text-driven Video Editing
Guiwei Zhang, Tianyu Zhang, Guanglin Niu et al.
OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels
Meng Lou, Yizhou Yu
DuPL: Dual Student with Trustworthy Progressive Learning for Robust Weakly Supervised Semantic Segmentation
Yuanchen Wu, Xichen Ye, KequanYang et al.
A Physics-informed Low-rank Deep Neural Network for Blind and Universal Lens Aberration Correction
Jin Gong, Runzhao Yang, Weihang Zhang et al.
Multi-View Pose-Agnostic Change Localization with Zero Labels
Chamuditha Jayanga Galappaththige, Jason Lai, Lloyd Windrim et al.
NAPGuard: Towards Detecting Naturalistic Adversarial Patches
Siyang Wu, Jiakai Wang, Jiejie Zhao et al.
Descriptor and Word Soups: Overcoming the Parameter Efficiency Accuracy Tradeoff for Out-of-Distribution Few-shot Learning
Christopher Liao, Theodoros Tsiligkaridis, Brian Kulis
A Stealthy Wrongdoer: Feature-Oriented Reconstruction Attack against Split Learning
Xiaoyang Xu, Mengda Yang, Wenzhe Yi et al.
Bootstrapping SparseFormers from Vision Foundation Models
Ziteng Gao, Zhan Tong, Kevin Qinghong Lin et al.
DViN: Dynamic Visual Routing Network for Weakly Supervised Referring Expression Comprehension
Xiaofu Chen, Yaxin Luo, Luo et al.
Prior-free 3D Object Tracking
Xiuqiang Song, Li Jin, Zhengxian Zhang et al.
Conical Visual Concentration for Efficient Large Vision-Language Models
Long Xing, Qidong Huang, Xiaoyi Dong et al.
Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity
Ruijie Quan, Wenguan Wang, Zhibo Tian et al.
G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images
Zixiong Huang, Qi Chen, Libo Sun et al.
Active Prompt Learning in Vision Language Models
Jihwan Bang, Sumyeong Ahn, Jae-Gil Lee
Efficient Dataset Distillation via Minimax Diffusion
Jianyang Gu, Saeed Vahidian, Vyacheslav Kungurtsev et al.
BiomedCoOp: Learning to Prompt for Biomedical Vision-Language Models
Taha Koleilat, Hojat Asgariandehkordi, Hassan Rivaz et al.
HOT3D: Hand and Object Tracking in 3D from Egocentric Multi-View Videos
Prithviraj Banerjee, Sindi Shkodrani, Pierre Moulon et al.
DUSt3R: Geometric 3D Vision Made Easy
Shuzhe Wang, Vincent Leroy, Yohann Cabon et al.
Generating Handwritten Mathematical Expressions From Symbol Graphs: An End-to-End Pipeline
Yu chen, Fei Gao, YanguangZhang et al.
On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation
Agneet Chatterjee, Tejas Gokhale, Chitta Baral et al.
SingularTrajectory: Universal Trajectory Predictor Using Diffusion Model
Inhwan Bae, Young-Jae Park, Hae-Gon Jeon
Feature-Preserving Mesh Decimation for Normal Integration
Moritz Heep, Sven Behnke, Eduard Zell
Enhancing Video Super-Resolution via Implicit Resampling-based Alignment
Kai Xu, Ziwei Yu, Xin Wang et al.
Domain Separation Graph Neural Networks for Saliency Object Ranking
Zijian Wu, Jun Lu, Jing Han et al.
Solving the Catastrophic Forgetting Problem in Generalized Category Discovery
Xinzi Cao, Xiawu Zheng, Guanhong Wang et al.
VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models
Lei Li, wei yuancheng, Zhihui Xie et al.
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving
Yuqi Wang, Jiawei He, Lue Fan et al.
WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
Changhoon Kim, Kyle Min, Maitreya Patel et al.
MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images
Junwen Huang, Hao Yu, Kuan-Ting Yu et al.
Resource-Efficient Transformer Pruning for Finetuning of Large Models
Fatih Ilhan, Gong Su, Selim Tekin et al.
Link-Context Learning for Multimodal LLMs
Yan Tai, Weichen Fan, Zhao Zhang et al.
The Manga Whisperer: Automatically Generating Transcriptions for Comics
Ragav Sachdeva, Andrew Zisserman
When Domain Generalization meets Generalized Category Discovery: An Adaptive Task-Arithmetic Driven Approach
Vaibhav Rathore, Shubhranil B, Saikat Dutta et al.
Deep-TROJ: An Inference Stage Trojan Insertion Algorithm through Efficient Weight Replacement Attack
Sabbir Ahmed, RANYANG ZHOU, Shaahin Angizi et al.
Automatic Joint Structured Pruning and Quantization for Efficient Neural Network Training and Compression
Xiaoyi Qu, David Aponte, Colby Banbury et al.
Dynamic LiDAR Re-simulation using Compositional Neural Fields
Hanfeng Wu, Xingxing Zuo, Stefan Leutenegger et al.
Adversarial Text to Continuous Image Generation
Kilichbek Haydarov, Aashiq Muhamed, Xiaoqian Shen et al.
Language-aware Visual Semantic Distillation for Video Question Answering
Bo Zou, Chao Yang, Yu Qiao et al.
3DInAction: Understanding Human Actions in 3D Point Clouds
Yizhak Ben-Shabat, Oren Shrout, Stephen Gould
ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding
Qihang Peng, Henry Zheng, Gao Huang
DiLiGenRT: A Photometric Stereo Dataset with Quantified Roughness and Translucency
Heng Guo, Jieji Ren, Feishi Wang et al.
StyLitGAN: Image-Based Relighting via Latent Control
Anand Bhattad, James Soole, David Forsyth
Label-Efficient Group Robustness via Out-of-Distribution Concept Curation
Yiwei Yang, Anthony Liu, Robert Wolfe et al.
DiffPortrait360: Consistent Portrait Diffusion for 360 View Synthesis
Yuming Gu, Phong Tran, Yujian Zheng et al.
FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training
Anjia Cao, Xing Wei, Zhiheng Ma
Divide and Conquer: Heterogeneous Noise Integration for Diffusion-based Adversarial Purification
Gaozheng Pei, Shaojie Lyu, Gong Chen et al.
Unsupervised Universal Image Segmentation
XuDong Wang, Dantong Niu, Xinyang Han et al.
Batch Normalization Alleviates the Spectral Bias in Coordinate Networks
Zhicheng Cai, Hao Zhu, Qiu Shen et al.
Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows
Shentong Mo, Yibing Song
StyleMaster: Stylize Your Video with Artistic Generation and Translation
Zixuan Ye, Huijuan Huang, Xintao Wang et al.
Not All Classes Stand on Same Embeddings: Calibrating a Semantic Distance with Metric Tensor
Jae Hyeon Park, Gyoomin Lee, Seunggi Park et al.
CodedEvents: Optimal Point-Spread-Function Engineering for 3D-Tracking with Event Cameras
Sachin Shah, Matthew Chan, Haoming Cai et al.
Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling
Zhe Li, Zerong Zheng, Lizhen Wang et al.
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models
Qirui Jiao, Daoyuan Chen, Yilun Huang et al.
Retrieval-Augmented Open-Vocabulary Object Detection
Jooyeon Kim, Eulrang Cho, Sehyung Kim et al.
SGC-Net: Stratified Granular Comparison Network for Open-Vocabulary HOI Detection
Xin Lin, Chong Shi, Zuopeng Yang et al.
NB-GTR: Narrow-Band Guided Turbulence Removal
Yifei Xia, Chu Zhou, Chengxuan Zhu et al.
LangSplat: 3D Language Gaussian Splatting
Minghan Qin, Wanhua Li, Jiawei ZHOU et al.
Positive-Unlabeled Learning by Latent Group-Aware Meta Disambiguation
Lin Long, Haobo Wang, Zhijie Jiang et al.
Dynamic Neural Surfaces for Elastic 4D Shape Representation and Analysis
Awais Nizamani, Hamid Laga, Guanjin Wang et al.
Text-conditional Attribute Alignment across Latent Spaces for 3D Controllable Face Image Synthesis
FeiFan Xu, Rui Li, Si Wu et al.
Learning with Noisy Triplet Correspondence for Composed Image Retrieval
Shuxian Li, Changhao He, XitingLiu et al.
LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding
Hongyu Li, Jinyu Chen, Ziyu Wei et al.
AnyMoLe: Any Character Motion In-betweening Leveraging Video Diffusion Models
Kwan Yun, Seokhyeon Hong, Chaelin Kim et al.