Most Cited 2024 "sequential classification" Papers

12,324 papers found • Page 52 of 62

#10201

MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning

Matteo Farina, Massimiliano Mancini, Elia Cunegatti et al.

CVPR 2024posterarXiv:2404.05621
#10202

Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization

Guopeng Li, Ming Qian, Gui-Song Xia

CVPR 2024posterarXiv:2403.14198
#10203

CURSOR: Scalable Mixed-Order Hypergraph Matching with CUR Decomposition

Qixuan Zheng, Ming Zhang, Hong Yan

CVPR 2024posterarXiv:2402.16594
#10204

FCS: Feature Calibration and Separation for Non-Exemplar Class Incremental Learning

Qiwei Li, Yuxin Peng, Jiahuan Zhou

CVPR 2024poster
#10205

Exploring Regional Clues in CLIP for Zero-Shot Semantic Segmentation

Yi Zhang, Meng-Hao Guo, Miao Wang et al.

CVPR 2024poster
#10206

GALA: Generating Animatable Layered Assets from a Single Scan

Taeksoo Kim, Byungjun Kim, Shunsuke Saito et al.

CVPR 2024posterarXiv:2401.12979
#10207

Improving Graph Contrastive Learning via Adaptive Positive Sampling

Jiaming Zhuo, Feiyang Qin, Can Cui et al.

CVPR 2024poster
#10208

Hearing Anything Anywhere

Mason Wang, Ryosuke Sawata, Samuel Clarke et al.

CVPR 2024posterarXiv:2406.07532
#10209

Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning

Chen Zhao, Shuming Liu, Karttikeya Mangalam et al.

CVPR 2024poster
#10210

Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation

Hyunwoo Ryu, Jiwoo Kim, Hyunseok An et al.

CVPR 2024highlightarXiv:2309.02685
#10211

BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics

Wenqian Zhang, Molin Huang, Yuxuan Zhou et al.

CVPR 2024posterarXiv:2312.07937
#10212

Bayesian Exploration of Pre-trained Models for Low-shot Image Classification

Yibo Miao, Yu lei, Feng Zhou et al.

CVPR 2024posterarXiv:2404.00312
#10213

Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions

Runhao Zeng, Xiaoyong Chen, Jiaming Liang et al.

CVPR 2024posterarXiv:2403.20254
#10214

RepKPU: Point Cloud Upsampling with Kernel Point Representation and Deformation

Yi Rong, Haoran Zhou, Kang Xia et al.

CVPR 2024poster
#10215

4K4D: Real-Time 4D View Synthesis at 4K Resolution

Zhen Xu, Sida Peng, Haotong Lin et al.

CVPR 2024posterarXiv:2310.11448
#10216

Context-Guided Spatio-Temporal Video Grounding

Xin Gu, Heng Fan, Yan Huang et al.

CVPR 2024posterarXiv:2401.01578
#10217

TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation

Sai Kumar Dwivedi, Yu Sun, Priyanka Patel et al.

CVPR 2024posterarXiv:2404.16752
#10218

Re-thinking Data Availability Attacks Against Deep Neural Networks

Bin Fang, Bo Li, Shuang Wu et al.

CVPR 2024poster
#10219

Logit Standardization in Knowledge Distillation

Shangquan Sun, Wenqi Ren, Jingzhi Li et al.

CVPR 2024highlightarXiv:2403.01427
#10220

A Unified Approach for Text- and Image-guided 4D Scene Generation

Yufeng Zheng, Xueting Li, Koki Nagano et al.

CVPR 2024posterarXiv:2311.16854
#10221

CONFORM: Contrast is All You Need for High-Fidelity Text-to-Image Diffusion Models

Tuna Han Salih Meral, Enis Simsar, Federico Tombari et al.

CVPR 2024posterarXiv:2312.06059
#10222

SPECAT: SPatial-spEctral Cumulative-Attention Transformer for High-Resolution Hyperspectral Image Reconstruction

Zhiyang Yao, Shuyang Liu, Xiaoyun Yuan et al.

CVPR 2024poster
#10223

Video-Based Human Pose Regression via Decoupled Space-Time Aggregation

Jijie He, Wenwu Yang

CVPR 2024posterarXiv:2403.19926
#10224

Neural Refinement for Absolute Pose Regression with Feature Synthesis

Shuai Chen, Yash Bhalgat, Xinghui Li et al.

CVPR 2024posterarXiv:2303.10087
#10225

VGGSfM: Visual Geometry Grounded Deep Structure From Motion

Jianyuan Wang, Nikita Karaev, Christian Rupprecht et al.

CVPR 2024highlight
#10226

Boosting Image Restoration via Priors from Pre-trained Models

Xiaogang Xu, Shu Kong, Tao Hu et al.

CVPR 2024posterarXiv:2403.06793
#10227

CPP-Net: Embracing Multi-Scale Feature Fusion into Deep Unfolding CP-PPA Network for Compressive Sensing

Zhen Guo, Hongping Gan

CVPR 2024poster
#10228

GenFlow: Generalizable Recurrent Flow for 6D Pose Refinement of Novel Objects

Sungphill Moon, Hyeontae Son, Dongcheol Hur et al.

CVPR 2024posterarXiv:2403.11510
#10229

PKU-DyMVHumans: A Multi-View Video Benchmark for High-Fidelity Dynamic Human Modeling

Xiaoyun Zheng, Liwei Liao, Xufeng Li et al.

CVPR 2024posterarXiv:2403.16080
#10230

MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers

Yawar Siddiqui, Antonio Alliegro, Alexey Artemov et al.

CVPR 2024highlightarXiv:2311.15475
#10231

RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features

Geonho Bang, Kwangjin Choi, Jisong Kim et al.

CVPR 2024posterarXiv:2403.05061
#10232

Task-Conditioned Adaptation of Visual Features in Multi-Task Policy Learning

Pierre Marza, Laetitia Matignon, Olivier Simonin et al.

CVPR 2024posterarXiv:2402.07739
#10233

EasyDrag: Efficient Point-based Manipulation on Diffusion Models

Xingzhong Hou, Boxiao Liu, Yi Zhang et al.

CVPR 2024poster
#10234

Learned Lossless Image Compression based on Bit Plane Slicing

Zhe Zhang, Huairui Wang, Zhenzhong Chen et al.

CVPR 2024poster
#10235

BEM: Balanced and Entropy-based Mix for Long-Tailed Semi-Supervised Learning

Hongwei Zheng, Linyuan Zhou, Han Li et al.

CVPR 2024posterarXiv:2404.01179
#10236

Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement

Ziyu Wang, Yue Xu, Cewu Lu et al.

CVPR 2024posterarXiv:2312.00362
#10237

SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models

Tongtian Yue, Jie Cheng, Longteng Guo et al.

CVPR 2024posterarXiv:2403.13263
#10238

Frequency-Adaptive Dilated Convolution for Semantic Segmentation

Linwei Chen, Lin Gu, Dezhi Zheng et al.

CVPR 2024highlightarXiv:2403.05369
#10239

TexTile: A Differentiable Metric for Texture Tileability

Carlos Rodriguez-Pardo, Dan Casas, Elena Garces et al.

CVPR 2024posterarXiv:2403.12961
#10240

MatSynth: A Modern PBR Materials Dataset

Giuseppe Vecchio, Valentin Deschaintre

CVPR 2024posterarXiv:2401.06056
#10241

Image Processing GNN: Breaking Rigidity in Super-Resolution

Yuchuan Tian, Hanting Chen, Chao Xu et al.

CVPR 2024poster
#10242

ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation

Suraj Patni, Aradhye Agarwal, Chetan Arora

CVPR 2024posterarXiv:2403.18807
#10243

Riemannian Multinomial Logistics Regression for SPD Neural Networks

Ziheng Chen, Yue Song, Gaowen Liu et al.

CVPR 2024posterarXiv:2305.11288
#10244

LED: A Large-scale Real-world Paired Dataset for Event Camera Denoising

Yuxing Duan

CVPR 2024posterarXiv:2405.19718
#10245

NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging

Takahiro Shirakawa, Seiichi Uchida

CVPR 2024posterarXiv:2403.03485
#10246

OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM

Yutao Hu, Tianbin, Quanfeng Lu et al.

CVPR 2024posterarXiv:2402.09181
#10247

Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models

Daniel Geng, Inbum Park, Andrew Owens

CVPR 2024posterarXiv:2311.17919
#10248

Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding

Zhihao Yuan, Jinke Ren, Chun-Mei Feng et al.

CVPR 2024posterarXiv:2311.15383
#10249

Towards HDR and HFR Video from Rolling-Mixed-Bit Spikings

Yakun Chang, Yeliduosi Xiaokaiti, Yujia Liu et al.

CVPR 2024poster
#10250

Learn from View Correlation: An Anchor Enhancement Strategy for Multi-view Clustering

Suyuan Liu, KE LIANG, Zhibin Dong et al.

CVPR 2024poster
#10251

Passive Snapshot Coded Aperture Dual-Pixel RGB-D Imaging

Bhargav Ghanekar, Salman Siddique Khan, Pranav Sharma et al.

CVPR 2024posterarXiv:2402.18102
#10252

UniPAD: A Universal Pre-training Paradigm for Autonomous Driving

Honghui Yang, Sha Zhang, Di Huang et al.

CVPR 2024posterarXiv:2310.08370
#10253

ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations

Maitreya Patel, Changhoon Kim, Sheng Cheng et al.

CVPR 2024posterarXiv:2312.04655
#10254

Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning

Haoyu Chen, Wenbo Li, Jinjin Gu et al.

CVPR 2024posterarXiv:2403.02601
#10255

Neural Video Compression with Feature Modulation

Jiahao Li, Bin Li, Yan Lu

CVPR 2024posterarXiv:2402.17414
#10256

Nearest is Not Dearest: Towards Practical Defense against Quantization-conditioned Backdoor Attacks

Boheng Li, Yishuo Cai, Haowei Li et al.

CVPR 2024posterarXiv:2405.12725
#10257

Dual DETRs for Multi-Label Temporal Action Detection

Yuhan Zhu, Guozhen Zhang, Jing Tan et al.

CVPR 2024posterarXiv:2404.00653
#10258

Discriminative Probing and Tuning for Text-to-Image Generation

Leigang Qu, Wenjie Wang, Yongqi Li et al.

CVPR 2024posterarXiv:2403.04321
#10259

GigaTraj: Predicting Long-term Trajectories of Hundreds of Pedestrians in Gigapixel Complex Scenes

Haozhe Lin, Chunyu Wei, Li He et al.

CVPR 2024poster
#10260

Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata

Dongsu Zhang, Francis Williams, Žan Gojčič et al.

CVPR 2024highlightarXiv:2406.08292
#10261

Comparing the Decision-Making Mechanisms by Transformers and CNNs via Explanation Methods

Mingqi Jiang, Saeed Khorram, Li Fuxin

CVPR 2024posterarXiv:2212.06872
#10262

Continual Segmentation with Disentangled Objectness Learning and Class Recognition

Yizheng Gong, Siyue Yu, Xiaoyang Wang et al.

CVPR 2024posterarXiv:2403.03477
#10263

Image Sculpting: Precise Object Editing with 3D Geometry Control

Jiraphon Yenphraphai, Xichen Pan, Sainan Liu et al.

CVPR 2024posterarXiv:2401.01702
#10264

Attribute-Guided Pedestrian Retrieval: Bridging Person Re-ID with Internal Attribute Variability

Yan Huang, Zhang Zhang, Qiang Wu et al.

CVPR 2024poster
#10265

Weakly Misalignment-free Adaptive Feature Alignment for UAVs-based Multimodal Object Detection

Chen Chen, Jiahao Qi, Xingyue Liu et al.

CVPR 2024poster
#10266

Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization

Deng Li, Aming Wu, Yaowei Wang et al.

CVPR 2024posterarXiv:2402.18447
#10267

EscherNet: A Generative Model for Scalable View Synthesis

Xin Kong, Shikun Liu, Xiaoyang Lyu et al.

CVPR 2024posterarXiv:2402.03908
#10268

MVCPS-NeuS: Multi-view Constrained Photometric Stereo for Neural Surface Reconstruction

Hiroaki Santo, Fumio Okura, Yasuyuki Matsushita

CVPR 2024poster
#10269

OHTA: One-shot Hand Avatar via Data-driven Implicit Priors

Xiaozheng Zheng, Chao Wen, Zhuo Su et al.

CVPR 2024posterarXiv:2402.18969
#10270

E-GPS: Explainable Geometry Problem Solving via Top-Down Solver and Bottom-Up Generator

Wenjun Wu, Lingling Zhang, Jun Liu et al.

CVPR 2024poster
#10271

MultiPhys: Multi-Person Physics-aware 3D Motion Estimation

Nicolás Ugrinovic, Boxiao Pan, Georgios Pavlakos et al.

CVPR 2024posterarXiv:2404.11987
#10272

LMDrive: Closed-Loop End-to-End Driving with Large Language Models

Hao Shao, Yuxuan Hu, Letian Wang et al.

CVPR 2024posterarXiv:2312.07488
#10273

ID-Blau: Image Deblurring by Implicit Diffusion-based reBLurring AUgmentation

Jia-Hao Wu, Fu-Jen Tsai, Yan-Tsung Peng et al.

CVPR 2024posterarXiv:2312.10998
#10274

GauHuman: Articulated Gaussian Splatting from Monocular Human Videos

Shoukang Hu, Tao Hu, Ziwei Liu

CVPR 2024posterarXiv:2312.02973
#10275

BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection

Zhenxin Li, Shiyi Lan, Jose M. Alvarez et al.

CVPR 2024posterarXiv:2312.01696
#10276

AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents

Jieming Cui, Tengyu Liu, Nian Liu et al.

CVPR 2024posterarXiv:2403.12835
#10277

HumanNeRF-SE: A Simple yet Effective Approach to Animate HumanNeRF with Diverse Poses

Caoyuan Ma, Yu-Lun Liu, Zhixiang Wang et al.

CVPR 2024posterarXiv:2312.02232
#10278

SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering

Tao Hu, Fangzhou Hong, Ziwei Liu

CVPR 2024posterarXiv:2404.01225
#10279

LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model

Chenjie Cao, Yunuo Cai, Qiaole Dong et al.

CVPR 2024posterarXiv:2305.11577
#10280

ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation

Xiaoqi Li, Mingxu Zhang, Yiran Geng et al.

CVPR 2024posterarXiv:2312.16217
#10281

Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation

Wenxuan Wang, Tongtian Yue, Yisi Zhang et al.

CVPR 2024poster
#10282

PanoPose: Self-supervised Relative Pose Estimation for Panoramic Images

Diantao Tu, Hainan Cui, Xianwei Zheng et al.

CVPR 2024highlight
#10283

Mask4Align: Aligned Entity Prompting with Color Masks for Multi-Entity Localization Problems

Haoquan Zhang, Ronggang Huang, Yi Xie et al.

CVPR 2024poster
#10284

Global and Local Prompts Cooperation via Optimal Transport for Federated Learning

Hongxia Li, Wei Huang, Jingya Wang et al.

CVPR 2024posterarXiv:2403.00041
#10285

VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning

Ziyang Luo, Nian Liu, Wangbo Zhao et al.

CVPR 2024posterarXiv:2311.15011
#10286

Dense Optical Tracking: Connecting the Dots

Guillaume Le Moing, Jean Ponce, Cordelia Schmid

CVPR 2024highlightarXiv:2312.00786
#10287

Multi-agent Collaborative Perception via Motion-aware Robust Communication Network

Shixin Hong, Yu LIU, Zhi Li et al.

CVPR 2024poster
#10288

Ungeneralizable Examples

Jingwen Ye, Xinchao Wang

CVPR 2024posterarXiv:2404.14016
#10289

Language-only Training of Zero-shot Composed Image Retrieval

Geonmo Gu, Sanghyuk Chun, Wonjae Kim et al.

CVPR 2024poster
#10290

Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models

Shitian Zhao, Zhuowan Li, YadongLu et al.

CVPR 2024highlightarXiv:2312.06685
#10291

Rapid Motor Adaptation for Robotic Manipulator Arms

Yichao Liang, Kevin Ellis, João F. Henriques

CVPR 2024posterarXiv:2312.04670
#10292

Instruct-Imagen: Image Generation with Multi-modal Instruction

Hexiang Hu, Kelvin C.K. Chan, Yu-Chuan Su et al.

CVPR 2024posterarXiv:2401.01952
#10293

Diffeomorphic Template Registration for Atmospheric Turbulence Mitigation

Dong Lao, Congli Wang, Alex Wong et al.

CVPR 2024highlightarXiv:2405.03662
#10294

Adapting to Length Shift: FlexiLength Network for Trajectory Prediction

Yi Xu, Yun Fu

CVPR 2024posterarXiv:2404.00742
#10295

CausalPC: Improving the Robustness of Point Cloud Classification by Causal Effect Identification

Yuanmin Huang, Mi Zhang, Daizong Ding et al.

CVPR 2024poster
#10296

LiSA: LiDAR Localization with Semantic Awareness

Bochun Yang, Zijun Li, Wen Li et al.

CVPR 2024highlight
#10297

Rethinking Prior Information Generation with CLIP for Few-Shot Segmentation

Jin Wang, Bingfeng Zhang, Jian Pang et al.

CVPR 2024posterarXiv:2405.08458
#10298

Adaptive Slot Attention: Object Discovery with Dynamic Slot Number

Ke Fan, Zechen Bai, Tianjun Xiao et al.

CVPR 2024posterarXiv:2406.09196
#10299

Learning Coupled Dictionaries from Unpaired Data for Image Super-Resolution

Longguang Wang, Juncheng Li, Yingqian Wang et al.

CVPR 2024poster
#10300

C3: High-Performance and Low-Complexity Neural Compression from a Single Image or Video

Hyunjik Kim, Matthias Bauer, Lucas Theis et al.

CVPR 2024posterarXiv:2312.02753
#10301

AttriHuman-3D: Editable 3D Human Avatar Generation with Attribute Decomposition and Indexing

Fan Yang, Tianyi Chen, XIAOSHENG HE et al.

CVPR 2024posterarXiv:2312.02209
#10302

iToF-flow-based High Frame Rate Depth Imaging

Yu Meng, Zhou Xue, Xu Chang et al.

CVPR 2024poster
#10303

Rethinking Human Motion Prediction with Symplectic Integral

Haipeng Chen, Kedi L yu, Zhenguang Liu et al.

CVPR 2024poster
#10304

Detector-Free Structure from Motion

Xingyi He, Jiaming Sun, Yifan Wang et al.

CVPR 2024posterarXiv:2306.15669
#10305

Holodeck: Language Guided Generation of 3D Embodied AI Environments

Yue Yang, Fan-Yun Sun, Luca Weihs et al.

CVPR 2024posterarXiv:2312.09067
#10306

DiVAS: Video and Audio Synchronization with Dynamic Frame Rates

Clara Maria Fernandez Labrador, Mertcan Akcay, Eitan Abecassis et al.

CVPR 2024poster
#10307

Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos

Chen Liu, Peike Li, Qingtao Yu et al.

CVPR 2024poster
#10308

Inter-X: Towards Versatile Human-Human Interaction Analysis

Liang Xu, Xintao Lv, Yichao Yan et al.

CVPR 2024posterarXiv:2312.16051
#10309

Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld

Yijun Yang, Tianyi Zhou, kanxue Li et al.

CVPR 2024posterarXiv:2311.16714
#10310

One-Shot Open Affordance Learning with Foundation Models

Gen Li, Deqing Sun, Laura Sevilla-Lara et al.

CVPR 2024posterarXiv:2311.17776
#10311

SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks

Xinyu Shi, Zecheng Hao, Zhaofei Yu

CVPR 2024posterarXiv:2403.14302
#10312

Tactile-Augmented Radiance Fields

Yiming Dou, Fengyu Yang, Yi Liu et al.

CVPR 2024posterarXiv:2405.04534
#10313

Mean-Shift Feature Transformer

Takumi Kobayashi

CVPR 2024poster
#10314

Consistent Prompting for Rehearsal-Free Continual Learning

Zhanxin Gao, Jun Cen, Xiaobin Chang

CVPR 2024posterarXiv:2403.08568
#10315

KVQ: Kwai Video Quality Assessment for Short-form Videos

Yiting Lu, Xin Li, Yajing Pei et al.

CVPR 2024posterarXiv:2402.07220
#10316

Purified and Unified Steganographic Network

GuoBiao Li, Sheng Li, Zicong Luo et al.

CVPR 2024posterarXiv:2402.17210
#10317

Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model

Dian Zheng, Xiao-Ming Wu, Shuzhou Yang et al.

CVPR 2024posterarXiv:2403.11157
#10318

Fast Adaptation for Human Pose Estimation via Meta-Optimization

Shengxiang Hu, Huaijiang Sun, Bin Li et al.

CVPR 2024poster
#10319

Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection

Jiawen Zhu, Choubo Ding, Yu Tian et al.

CVPR 2024posterarXiv:2310.12790
#10320

L4D-Track: Language-to-4D Modeling Towards 6-DoF Tracking and Shape Reconstruction in 3D Point Cloud Stream

Jingtao Sun, Yaonan Wang, Mingtao Feng et al.

CVPR 2024poster
#10321

MAPSeg: Unified Unsupervised Domain Adaptation for Heterogeneous Medical Image Segmentation Based on 3D Masked Autoencoding and Pseudo-Labeling

Xuzhe Zhang, Yuhao Wu, Elsa Angelini et al.

CVPR 2024posterarXiv:2303.09373
#10322

IBD-SLAM: Learning Image-Based Depth Fusion for Generalizable SLAM

Minghao Yin, Shangzhe Wu, Kai Han

CVPR 2024poster
#10323

Segment Any Event Streams via Weighted Adaptation of Pivotal Tokens

Zhiwen Chen, Zhiyu Zhu, Yifan Zhang et al.

CVPR 2024poster
#10324

Boosting Image Quality Assessment through Efficient Transformer Adaptation with Local Feature Enhancement

Kangmin Xu, Liang Liao, Jing Xiao et al.

CVPR 2024poster
#10325

Learning without Exact Guidance: Updating Large-scale High-resolution Land Cover Maps from Low-resolution Historical Labels

Zhuohong Li, Wei He, Jiepan Li et al.

CVPR 2024highlightarXiv:2403.02746
#10326

Multi-Modal Hallucination Control by Visual Information Grounding

Alessandro Favero, Luca Zancato, Matthew Trager et al.

CVPR 2024posterarXiv:2403.14003
#10327

Exploring Orthogonality in Open World Object Detection

Zhicheng Sun, Jinghan Li, Yadong Mu

CVPR 2024poster
#10328

DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing

Jia-Wei Liu, Yan-Pei Cao, Jay Zhangjie Wu et al.

CVPR 2024posterarXiv:2310.10624
#10329

SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer

Rui Zhu, Yingwei Pan, Yehao Li et al.

CVPR 2024posterarXiv:2403.17004
#10330

Traffic Scene Parsing through the TSP6K Dataset

Peng-Tao Jiang, Yuqi Yang, Yang Cao et al.

CVPR 2024posterarXiv:2303.02835
#10331

KPConvX: Modernizing Kernel Point Convolution with Kernel Attention

Hugues Thomas, Yao-Hung Hubert Tsai, Timothy Barfoot et al.

CVPR 2024posterarXiv:2405.13194
#10332

Latency Correction for Event-guided Deblurring and Frame Interpolation

Yixin Yang, Jinxiu Liang, Bohan Yu et al.

CVPR 2024poster
#10333

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

Muyang Li, Tianle Cai, Jiaxin Cao et al.

CVPR 2024highlightarXiv:2402.19481
#10334

MoReVQA: Exploring Modular Reasoning Models for Video Question Answering

Juhong Min, Shyamal Buch, Arsha Nagrani et al.

CVPR 2024posterarXiv:2404.06511
#10335

DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models

Nastaran Saadati, Minh Pham, Nasla Saleem et al.

CVPR 2024posterarXiv:2404.08079
#10336

NTO3D: Neural Target Object 3D Reconstruction with Segment Anything

Xiaobao Wei, Renrui Zhang, Jiarui Wu et al.

CVPR 2024posterarXiv:2309.12790
#10337

Text-Driven Image Editing via Learnable Regions

Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai et al.

CVPR 2024posterarXiv:2311.16432
#10338

ZERO-IG: Zero-Shot Illumination-Guided Joint Denoising and Adaptive Enhancement for Low-Light Images

Yiqi Shi, Duo Liu, Liguo Zhang et al.

CVPR 2024poster
#10339

Self-Supervised Representation Learning from Arbitrary Scenarios

Zhaowen Li, Yousong Zhu, Zhiyang Chen et al.

CVPR 2024poster
#10340

Rethinking Multi-domain Generalization with A General Learning Objective

Zhaorui Tan, Xi Yang, Kaizhu Huang

CVPR 2024posterarXiv:2402.18853
#10341

Adversarial Distillation Based on Slack Matching and Attribution Region Alignment

Shenglin Yin, Zhen Xiao, Mingxuan Song et al.

CVPR 2024poster
#10342

The Neglected Tails in Vision-Language Models

Shubham Parashar, Tian Liu, Zhiqiu Lin et al.

CVPR 2024posterarXiv:2401.12425
#10343

Multi-View Attentive Contextualization for Multi-View 3D Object Detection

Xianpeng Liu, Ce Zheng, Ming Qian et al.

CVPR 2024posterarXiv:2405.12200
#10344

SODA: Bottleneck Diffusion Models for Representation Learning

Drew Hudson, Daniel Zoran, Mateusz Malinowski et al.

CVPR 2024posterarXiv:2311.17901
#10345

AHIVE: Anatomy-aware Hierarchical Vision Encoding for Interactive Radiology Report Retrieval

Sixing Yan, William K. Cheung, Ivor Tsang et al.

CVPR 2024poster
#10346

SPU-PMD: Self-Supervised Point Cloud Upsampling via Progressive Mesh Deformation

Yanzhe Liu, Rong Chen, Yushi Li et al.

CVPR 2024poster
#10347

Enhancing the Power of OOD Detection via Sample-Aware Model Selection

Feng Xue, Zi He, Yuan Zhang et al.

CVPR 2024poster
#10348

Self-Supervised Class-Agnostic Motion Prediction with Spatial and Temporal Consistency Regularizations

Kewei Wang, Yizheng Wu, Jun Cen et al.

CVPR 2024posterarXiv:2403.13261
#10349

MMA: Multi-Modal Adapter for Vision-Language Models

Lingxiao Yang, Ru-Yuan Zhang, Yanchen Wang et al.

CVPR 2024poster
#10350

Grounding and Enhancing Grid-based Models for Neural Fields

Zelin Zhao, FENGLEI FAN, Wenlong Liao et al.

CVPR 2024posterarXiv:2403.20002
#10351

A Category Agnostic Model for Visual Rearrangment

Yuyi Liu, Xinhang Song, Weijie Li et al.

CVPR 2024poster
#10352

Towards More Unified In-context Visual Understanding

Dianmo Sheng, Dongdong Chen, Zhentao Tan et al.

CVPR 2024posterarXiv:2312.02520
#10353

Towards Progressive Multi-Frequency Representation for Image Warping

Jun Xiao, Zihang Lyu, Cong Zhang et al.

CVPR 2024poster
#10354

Fourier-basis Functions to Bridge Augmentation Gap: Rethinking Frequency Augmentation in Image Classification

Mei Vaish, Shunxin Wang, Nicola Strisciuglio

CVPR 2024posterarXiv:2403.01944
#10355

VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction

Jiaqi Lin, Zhihao Li, Xiao Tang et al.

CVPR 2024posterarXiv:2402.17427
#10356

What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation

Yihua Cheng, Yaning Zhu, Zongji Wang et al.

CVPR 2024posterarXiv:2403.15664
#10357

Molecular Data Programming: Towards Molecule Pseudo-labeling with Systematic Weak Supervision

Xin Juan, Kaixiong Zhou, Ninghao Liu et al.

CVPR 2024poster
#10358

OTE: Exploring Accurate Scene Text Recognition Using One Token

Jianjun Xu, Yuxin Wang, Hongtao Xie et al.

CVPR 2024poster
#10359

TTA-EVF: Test-Time Adaptation for Event-based Video Frame Interpolation via Reliable Pixel and Sample Estimation

Hoonhee Cho, Taewoo Kim, Yuhwan Jeong et al.

CVPR 2024poster
#10360

HUGS: Human Gaussian Splats

Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel et al.

CVPR 2024posterarXiv:2311.17910
#10361

Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models

Yushi Hu, Otilia Stretcu, Chun-Ta Lu et al.

CVPR 2024posterarXiv:2312.03052
#10362

Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera

Jiye Lee, Hanbyul Joo

CVPR 2024posterarXiv:2401.00847
#10363

DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses

Chen Zhao, Tong Zhang, Zheng Dang et al.

CVPR 2024poster
#10364

AVID: Any-Length Video Inpainting with Diffusion Model

Zhixing Zhang, Bichen Wu, Xiaoyan Wang et al.

CVPR 2024posterarXiv:2312.03816
#10365

Hyper-MD: Mesh Denoising with Customized Parameters Aware of Noise Intensity and Geometric Characteristics

Xingtao Wang, Hongliang Wei, Xiaopeng Fan et al.

CVPR 2024poster
#10366

KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation

Jihua Peng, Yanghong Zhou, Tracy P Y Mok

CVPR 2024posterarXiv:2404.00658
#10367

MFP: Making Full Use of Probability Maps for Interactive Image Segmentation

Chaewon Lee, Seon-Ho Lee, Chang-Su Kim

CVPR 2024posterarXiv:2404.18448
#10368

Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining

Jiahao Nie, Yun Xing, Gongjie Zhang et al.

CVPR 2024posterarXiv:2401.08407
#10369

Strong Transferable Adversarial Attacks via Ensembled Asymptotically Normal Distribution Learning

Zhengwei Fang, Rui Wang, Tao Huang et al.

CVPR 2024highlightarXiv:2209.11964
#10370

An Empirical Study of Scaling Law for Scene Text Recognition

Miao Rang, Zhenni Bi, Chuanjian Liu et al.

CVPR 2024poster
#10371

Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching

Xianqi Wang, Gangwei Xu, Hao Jia et al.

CVPR 2024highlightarXiv:2403.00486
#10372

When StyleGAN Meets Stable Diffusion: a W+ Adapter for Personalized Image Generation

Xiaoming Li, Xinyu Hou, Chen Change Loy

CVPR 2024poster
#10373

Differentiable Neural Surface Refinement for Modeling Transparent Objects

Weijian Deng, Dylan Campbell, Chunyi Sun et al.

CVPR 2024poster
#10374

Low-power Continuous Remote Behavioral Localization with Event Cameras

Friedhelm Hamann, Suman Ghosh, Ignacio Juarez Martinez et al.

CVPR 2024posterarXiv:2312.03799
#10375

Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting

Taeho Kang, Youngki Lee

CVPR 2024highlightarXiv:2402.18330
#10376

AM-RADIO: Agglomerative Vision Foundation Model Reduce All Domains Into One

Mike Ranzinger, Greg Heinrich, Jan Kautz et al.

CVPR 2024posterarXiv:2312.06709
#10377

Towards Co-Evaluation of Cameras HDR and Algorithms for Industrial-Grade 6DoF Pose Estimation

Agastya Kalra, Guy Stoppi, Dmitrii Marin et al.

CVPR 2024poster
#10378

Tune-An-Ellipse: CLIP Has Potential to Find What You Want

Jinheng Xie, Songhe Deng, Bing Li et al.

CVPR 2024highlight
#10379

BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model

song yiran, Qianyu Zhou, Xiangtai Li et al.

CVPR 2024posterarXiv:2401.02317
#10380

EarthLoc: Astronaut Photography Localization by Indexing Earth from Space

Gabriele Berton, Alex Stoken, Barbara Caputo et al.

CVPR 2024posterarXiv:2403.06758
#10381

PairDETR : Joint Detection and Association of Human Bodies and Faces

Ammar Ali, Georgii Gaikov, Denis Rybalchenko et al.

CVPR 2024poster
#10382

Close Imitation of Expert Retouching for Black-and-White Photography

Seunghyun Shin, Jisu Shin, Jihwan Bae et al.

CVPR 2024poster
#10383

OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos

Dongyoung Choi, Hyeonjoong Jang, Min H. Kim

CVPR 2024posterarXiv:2404.00676
#10384

Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model

Kai Yang, Jian Tao, Jiafei Lyu et al.

CVPR 2024posterarXiv:2311.13231
#10385

Reconstructing Hands in 3D with Transformers

Georgios Pavlakos, Dandan Shan, Ilija Radosavovic et al.

CVPR 2024posterarXiv:2312.05251
#10386

XFeat: Accelerated Features for Lightweight Image Matching

Guilherme Potje, Felipe Cadar, André Araujo et al.

CVPR 2024posterarXiv:2404.19174
#10387

Systematic Comparison of Semi-supervised and Self-supervised Learning for Medical Image Classification

Zhe Huang, Ruijie Jiang, Shuchin Aeron et al.

CVPR 2024posterarXiv:2307.08919
#10388

GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation

WEIMING ZHANG, Yexin Liu, Xu Zheng et al.

CVPR 2024posterarXiv:2403.16370
#10389

VRP-SAM: SAM with Visual Reference Prompt

Yanpeng Sun, Jiahui Chen, Shan Zhang et al.

CVPR 2024posterarXiv:2402.17726
#10390

DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis

Jiapeng Tang, Yinyu Nie, Lev Markhasin et al.

CVPR 2024posterarXiv:2303.14207
#10391

Looking Similar Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning

Nikhil Singh, Chih-Wei Wu, Iroro Orife et al.

CVPR 2024posterarXiv:2304.05600
#10392

YOLO-World: Real-Time Open-Vocabulary Object Detection

Tianheng Cheng, Lin Song, Yixiao Ge et al.

CVPR 2024posterarXiv:2401.17270
#10393

Bézier Everywhere All at Once: Learning Drivable Lanes as Bézier Graphs

Hugh Blayney, Hanlin Tian, Hamish Scott et al.

CVPR 2024poster
#10394

Taming Self-Training for Open-Vocabulary Object Detection

Shiyu Zhao, Samuel Schulter, Long Zhao et al.

CVPR 2024posterarXiv:2308.06412
#10395

Ink Dot-Oriented Differentiable Optimization for Neural Image Halftoning

Hao Jiang, Bingfeng Zhou, Yadong Mu

CVPR 2024poster
#10396

GeoChat: Grounded Large Vision-Language Model for Remote Sensing

Kartik Kuckreja, Muhammad Sohail Danish, Muzammal Naseer et al.

CVPR 2024posterarXiv:2311.15826
#10397

FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Action Segmentation

Zijia Lu, Ehsan Elhamifar

CVPR 2024poster
#10398

GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians

Liangxiao Hu, Hongwen Zhang, Yuxiang Zhang et al.

CVPR 2024posterarXiv:2312.02134
#10399

ShapeMatcher: Self-Supervised Joint Shape Canonicalization Segmentation Retrieval and Deformation

Yan Di, Chenyangguang Zhang, Chaowei Wang et al.

CVPR 2024poster
#10400

Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning

Rui Li, Tobias Fischer, Mattia Segu et al.

CVPR 2024posterarXiv:2404.03658