Most Cited CVPR "exploration rate" Papers
5,589 papers found
MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation
Petru-Daniel Tudosiu, Yongxin Yang, Shifeng Zhang et al.
Collaborative Learning of Anomalies with Privacy (CLAP) for Unsupervised Video Anomaly Detection: A New Baseline
Anas Al-lahham, Muhammad Zaigham Zaheer, Nurbek Tastan et al.
DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data
Chengxiang Fan, Muzhi Zhu, Hao Chen et al.
SPAD: Spatially Aware Multi-View Diffusers
Yash Kant, Aliaksandr Siarohin, Ziyi Wu et al.
DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation
Chenyang Wang, Zerong Zheng, Tao Yu et al.
Unsupervised Video Domain Adaptation with Masked Pre-Training and Collaborative Self-Training
Arun Reddy, William Paul, Corban Rivera et al.
RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection
Zhiwei Lin, Zhe Liu, Zhongyu Xia et al.
FineParser: A Fine-grained Spatio-temporal Action Parser for Human-centric Action Quality Assessment
Jinglin Xu, Sibo Yin, Guohao Zhao et al.
SceneFun3D: Fine-Grained Functionality and Affordance Understanding in 3D Scenes
Alexandros Delitzas, Ayça Takmaz, Federico Tombari et al.
Do Vision and Language Encoders Represent the World Similarly?
Mayug Maniparambil, Raiymbek Akshulakov, Yasser Abdelaziz Dahou Djilali et al.
Construct to Associate: Cooperative Context Learning for Domain Adaptive Point Cloud Segmentation
Guangrui Li
Wavelet-based Fourier Information Interaction with Frequency Diffusion Adjustment for Underwater Image Restoration
Chen Zhao, Weiling Cai, Chenyu Dong et al.
Map-Relative Pose Regression for Visual Re-Localization
Shuai Chen, Tommaso Cavallari, Victor Adrian Prisacariu et al.
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov et al.
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
Zigang Geng, Binxin Yang, Tiankai Hang et al.
Towards General Robustness Verification of MaxPool-based Convolutional Neural Networks via Tightening Linear Approximation
Yuan Xiao, Shiqing Ma, Juan Zhai et al.
Overcoming Generic Knowledge Loss with Selective Parameter Update
Wenxuan Zhang, Paul Janson, Rahaf Aljundi et al.
Lane2Seq: Towards Unified Lane Detection via Sequence Generation
Kunyang Zhou
Diffusion Model Alignment Using Direct Preference Optimization
Bram Wallace, Meihua Dang, Rafael Rafailov et al.
SD4Match: Learning to Prompt Stable Diffusion Model for Semantic Matching
Xinghui Li, Jingyi Lu, Kai Han et al.
LTM: Lightweight Textured Mesh Extraction and Refinement of Large Unbounded Scenes for Efficient Storage and Real-time Rendering
Jaehoon Choi, Rajvi Shah, Qinbo Li et al.
Geometry Transfer for Stylizing Radiance Fields
Hyunyoung Jung, Seonghyeon Nam, Nikolaos Sarafianos et al.
Boosting Self-Supervision for Single-View Scene Completion via Knowledge Distillation
Keonhee Han, Dominik Muhle, Felix Wimbauer et al.
CrossKD: Cross-Head Knowledge Distillation for Object Detection
JiaBao Wang, Yuming Chen, Zhaohui Zheng et al.
TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing
Sherry X. Chen, Yaron Vaxman, Elad Ben Baruch et al.
Leveraging Camera Triplets for Efficient and Accurate Structure-from-Motion
Lalit Manam, Venu Madhav Govindu
CG-HOI: Contact-Guided 3D Human-Object Interaction Generation
Christian Diller, Angela Dai
Resurrecting Old Classes with New Data for Exemplar-Free Continual Learning
Dipam Goswami, Albin Soutif, Yuyang Liu et al.
HOI-M^3: Capture Multiple Humans and Objects Interaction within Contextual Environment
Juze Zhang, Jingyan Zhang, Zining Song et al.
PDF: A Probability-Driven Framework for Open World 3D Point Cloud Semantic Segmentation
Jinfeng Xu, Siyuan Yang, Xianzhi Li et al.
NeISF: Neural Incident Stokes Field for Geometry and Material Estimation
Chenhao Li, Taishi Ono, Takeshi Uemori et al.
TransLoc4D: Transformer-based 4D Radar Place Recognition
Guohao Peng, Heshan Li, Yangyang Zhao et al.
Higher-order Relational Reasoning for Pedestrian Trajectory Prediction
Sungjune Kim, Hyung-gun Chi, Hyerin Lim et al.
DSGG: Dense Relation Transformer for an End-to-end Scene Graph Generation
Zeeshan Hayder, Xuming He
Design2Cloth: 3D Cloth Generation from 2D Masks
Jiali Zheng, Rolandos Alexandros Potamias, Stefanos Zafeiriou
S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes
Xingyi Li, Zhiguo Cao, Yizheng Wu et al.
SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation
Aysim Toker, Marvin Eisenberger, Daniel Cremers et al.
Dual-Consistency Model Inversion for Non-Exemplar Class Incremental Learning
Zihuan Qiu, Yi Xu, Fanman Meng et al.
DS-NeRV: Implicit Neural Video Representation with Decomposed Static and Dynamic Codes
Hao Yan, Zhihui Ke, Xiaobo Zhou et al.
Rolling Shutter Correction with Intermediate Distortion Flow Estimation
Mingdeng Cao, Sidi Yang, Yujiu Yang et al.
Towards Transferable Targeted 3D Adversarial Attack in the Physical World
Yao Huang, Yinpeng Dong, Shouwei Ruan et al.
Hybrid Functional Maps for Crease-Aware Non-Isometric Shape Matching
Lennart Bastian, Yizheng Xie, Nassir Navab et al.
AnyDoor: Zero-shot Object-level Image Customization
Xi Chen, Lianghua Huang, Yu Liu et al.
Improving Training Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architecture
Huijie Zhang, Yifu Lu, Ismail Alkhouri et al.
Learnable Earth Parser: Discovering 3D Prototypes in Aerial Scans
Romain Loiseau, Elliot Vincent, Mathieu Aubry et al.
PIGEON: Predicting Image Geolocations
Lukas Haas, Michal Skreta, Silas Alberti et al.
GPLD3D: Latent Diffusion of 3D Shape Generative Models by Enforcing Geometric and Physical Priors
Yuan Dong, Qi Zuo, Xiaodong Gu et al.
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding
Wujian Peng, Sicheng Xie, Zuyao You et al.
FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models
Lin Zhao, Tianchen Zhao, Zinan Lin et al.
COLMAP-Free 3D Gaussian Splatting
Yang Fu, Sifei Liu, Amey Kulkarni et al.
Personalized Residuals for Concept-Driven Text-to-Image Generation
Cusuh Ham, Matthew Fisher, James Hays et al.
Forecasting of 3D Whole-body Human Poses with Grasping Objects
Yan Haitao, Qiongjie Cui, Jiexin Xie et al.
Edge-Aware 3D Instance Segmentation Network with Intelligent Semantic Prior
Wonseok Roh, Hwanhee Jung, Giljoo Nam et al.
RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models
Ozgur Kara, Bariscan Kurtkaya, Hidir Yesiltepe et al.
Generalizable Novel-View Synthesis using a Stereo Camera
Haechan Lee, Wonjoon Jin, Seung-Hwan Baek et al.
Don’t Drop Your Samples! Coherence-Aware Training Benefits Conditional Diffusion
Nicolas Dufour, Victor Besnier, Vicky Kalogeiton et al.
Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes
Chi-Hsi Kung, Shu-Wei Lu, Yi-Hsuan Tsai et al.
Diff-BGM: A Diffusion Model for Video Background Music Generation
Sizhe Li, Yiming Qin, Minghang Zheng et al.
Shadow-Enlightened Image Outpainting
Hang Yu, Ruilin Li, Shaorong Xie et al.
Specularity Factorization for Low-Light Enhancement
Saurabh Saini, P. J. Narayanan
Shallow-Deep Collaborative Learning for Unsupervised Visible-Infrared Person Re-Identification
Bin Yang, Jun Chen, Mang Ye
Attack To Defend: Exploiting Adversarial Attacks for Detecting Poisoned Models
Samar Fares, Karthik Nandakumar
LiDAR-Net: A Real-scanned 3D Point Cloud Dataset for Indoor Scenes
Yanwen Guo, Yuanqi Li, Dayong Ren et al.
Non-autoregressive Sequence-to-Sequence Vision-Language Models
Kunyu Shi, Qi Dong, Luis Goncalves et al.
No More Ambiguity in 360° Room Layout via Bi-Layout Estimation
Yu-Ju Tsai, Jin-Cheng Jhang, Jingjing Zheng et al.
HUNTER: Unsupervised Human-centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes
Yichen Yao, Zimo Jiang, Yujing Sun et al.
3D LiDAR Mapping in Dynamic Environments using a 4D Implicit Neural Representation
Xingguang Zhong, Yue Pan, Cyrill Stachniss et al.
See, Say, and Segment: Teaching LMMs to Overcome False Premises
Tsung-Han Wu, Giscard Biamby, David Chan et al.
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
Li Hu
Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation
Xiao Lin, Wenfei Yang, Yuan Gao et al.
PostureHMR: Posture Transformation for 3D Human Mesh Recovery
Yu-Pei Song, Xiao Wu, Zhaoquan Yuan et al.
WANDR: Intention-guided Human Motion Generation
Markos Diomataris, Nikos Athanasiou, Omid Taheri et al.
WWW: A Unified Framework for Explaining What, Where and Why of Neural Networks by Interpretation of Neuron Concepts
Yong Hyun Ahn, Hyeon Bae Kim, Seong Tae Kim
SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
Yuxuan Zhang, Yiren Song, Jiaming Liu et al.
Denoising Point Clouds in Latent Space via Graph Convolution and Invertible Neural Network
Aihua Mao, Biao Yan, Zijing Ma et al.
SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation
Thuan Nguyen, Anh Tran
Looking 3D: Anomaly Detection with 2D-3D Alignment
Ankan Kumar Bhunia, Changjian Li, Hakan Bilen
EventPS: Real-Time Photometric Stereo Using an Event Camera
Bohan Yu, Jieji Ren, Jin Han et al.
Circuit Design and Efficient Simulation of Quantum Inner Product and Empirical Studies of Its Effect on Near-Term Hybrid Quantum-Classic Machine Learning
Hao Xiong, Yehui Tang, Xinyu Ye et al.
PSDPM: Prototype-based Secondary Discriminative Pixels Mining for Weakly Supervised Semantic Segmentation
Xinqiao Zhao, Ziqian Yang, Tianhong Dai et al.
Towards 3D Vision with Low-Cost Single-Photon Cameras
Fangzhou Mu, Carter Sifferman, Sacha Jungerman et al.
On Train-Test Class Overlap and Detection for Image Retrieval
Chull Hwan Song, Jooyoung Yoon, Taebaek Hwang et al.
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jieneng Chen, Qihang Yu, Xiaohui Shen et al.
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
Yiming Zhang, Zhening Xing, Yanhong Zeng et al.
DemoCaricature: Democratising Caricature Generation with a Rough Sketch
Dar-Yen Chen, Ayan Kumar Bhunia, Subhadeep Koley et al.
Language-driven Object Fusion into Neural Radiance Fields with Pose-Conditioned Dataset Updates
Ka Chun Shum, Jaeyeon Kim, Binh-Son Hua et al.
Unlocking the Potential of Pre-trained Vision Transformers for Few-Shot Semantic Segmentation through Relationship Descriptors
Ziqin Zhou, Hai-Ming Xu, Yangyang Shu et al.
Relightful Harmonization: Lighting-aware Portrait Background Replacement
Mengwei Ren, Wei Xiong, Jae Shin Yoon et al.
Taming the Tail in Class-Conditional GANs: Knowledge Sharing via Unconditional Training at Lower Resolutions
Saeed Khorram, Mingqi Jiang, Mohamad Shahbazi et al.
MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
Yanhui Wang, Jianmin Bao, Wenming Weng et al.
LightOctree: Lightweight 3D Spatially-Coherent Indoor Lighting Estimation
Xuecan Wang, Shibang Xiao, Xiaohui Liang
Bi-level Learning of Task-Specific Decoders for Joint Registration and One-Shot Medical Image Segmentation
Xin Fan, Xiaolin Wang, Jiaxin Gao et al.
FairCLIP: Harnessing Fairness in Vision-Language Learning
Yan Luo, Min Shi, Muhammad Osama Khan et al.
ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models
Xinyu Tian, Shu Zou, Zhaoyuan Yang et al.
MS-MANO: Enabling Hand Pose Tracking with Biomechanical Constraints
Pengfei Xie, Wenqiang Xu, Tutian Tang et al.
CGI-DM: Digital Copyright Authentication for Diffusion Models via Contrasting Gradient Inversion
Xiaoyu Wu, Yang Hua, Chumeng Liang et al.
Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation
Zhiwei Yang, Kexue Fu, Minghong Duan et al.
Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence
Junyi Zhang, Charles Herrmann, Junhwa Hur et al.
MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding
Chun-Peng Chang, Shaoxiang Wang, Alain Pagani et al.
RTracker: Recoverable Tracking via PN Tree Structured Memory
Yuqing Huang, Xin Li, Zikun Zhou et al.
Efficient Solution of Point-Line Absolute Pose
Petr Hruby, Timothy Duff, Marc Pollefeys
SPIN: Simultaneous Perception, Interaction and Navigation
Shagun Uppal, Ananye Agarwal, Haoyu Xiong et al.
CAMixerSR: Only Details Need More "Attention"
Yan Wang, Yi Liu, Shijie Zhao et al.
MarkovGen: Structured Prediction for Efficient Text-to-Image Generation
Sadeep Jayasumana, Daniel Glasner, Srikumar Ramalingam et al.
Constrained Layout Generation with Factor Graphs
Mohammed Haroon Dupty, Yanfei Dong, Sicong Leng et al.
Neural Implicit Morphing of Face Images
Guilherme Schardong, Tiago Novello, Hallison Paz et al.
Kernel Adaptive Convolution for Scene Text Detection via Distance Map Prediction
Jinzhi Zheng, Heng Fan, Libo Zhang
PlatoNeRF: 3D Reconstruction in Plato's Cave via Single-View Two-Bounce Lidar
Tzofi Klinghoffer, Xiaoyu Xiang, Siddharth Somasundaram et al.
Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models
Pengze Zhang, Hubery Yin, Chen Li et al.
Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning
Siteng Huang, Biao Gong, Yutong Feng et al.
LoCoNet: Long-Short Context Network for Active Speaker Detection
Xizi Wang, Feng Cheng, Gedas Bertasius
WinSyn: A High Resolution Testbed for Synthetic Data
Tom Kelly, John Femiani, Peter Wonka
SpatialTracker: Tracking Any 2D Pixels in 3D Space
Yuxi Xiao, Qianqian Wang, Shangzhan Zhang et al.
Multimodal Representation Learning by Alternating Unimodal Adaptation
Xiaohui Zhang, Jaehong Yoon, Mohit Bansal et al.
Attention Calibration for Disentangled Text-to-Image Personalization
Yanbing Zhang, Mengping Yang, Qin Zhou et al.
Segment Every Out-of-Distribution Object
Wenjie Zhao, Jia Li, Xin Dong et al.
Open-Vocabulary Object 6D Pose Estimation
Jaime Corsetti, Davide Boscaini, Changjae Oh et al.
Quantifying Uncertainty in Motion Prediction with Variational Bayesian Mixture
Juanwu Lu, Can Cui, Yunsheng Ma et al.
SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks
Yaxu Xie, Alain Pagani, Didier Stricker
Fast ODE-based Sampling for Diffusion Models in Around 5 Steps
Zhenyu Zhou, Defang Chen, Can Wang et al.
From-Ground-To-Objects: Coarse-to-Fine Self-supervised Monocular Depth Estimation of Dynamic Objects with Ground Contact Prior
Jaeho Moon, Juan Luis Gonzalez Bello, Byeongjun Kwon et al.
MMCert: Provable Defense against Adversarial Attacks to Multi-modal Models
Yanting Wang, Hongye Fu, Wei Zou et al.
Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization
Guopeng Li, Ming Qian, Gui-Song Xia
Hearing Anything Anywhere
Mason Wang, Ryosuke Sawata, Samuel Clarke et al.
Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
Chen Zhao, Shuming Liu, Karttikeya Mangalam et al.
Video-Based Human Pose Regression via Decoupled Space-Time Aggregation
Jijie He, Wenwu Yang
Boosting Image Restoration via Priors from Pre-trained Models
Xiaogang Xu, Shu Kong, Tao Hu et al.
RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features
Geonho Bang, Kwangjin Choi, Jisong Kim et al.
EasyDrag: Efficient Point-based Manipulation on Diffusion Models
Xingzhong Hou, Boxiao Liu, Yi Zhang et al.
Learned Lossless Image Compression based on Bit Plane Slicing
Zhe Zhang, Huairui Wang, Zhenzhong Chen et al.
BEM: Balanced and Entropy-based Mix for Long-Tailed Semi-Supervised Learning
Hongwei Zheng, Linyuan Zhou, Han Li et al.
TexTile: A Differentiable Metric for Texture Tileability
Carlos Rodriguez-Pardo, Dan Casas, Elena Garces et al.
MatSynth: A Modern PBR Materials Dataset
Giuseppe Vecchio, Valentin Deschaintre
LED: A Large-scale Real-world Paired Dataset for Event Camera Denoising
Yuxing Duan
Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models
Daniel Geng, Inbum Park, Andrew Owens
Learn from View Correlation: An Anchor Enhancement Strategy for Multi-view Clustering
Suyuan Liu, Ke Liang, Zhibin Dong et al.
Passive Snapshot Coded Aperture Dual-Pixel RGB-D Imaging
Bhargav Ghanekar, Salman Siddique Khan, Pranav Sharma et al.
ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations
Maitreya Patel, Changhoon Kim, Sheng Cheng et al.
Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning
Haoyu Chen, Wenbo Li, Jinjin Gu et al.
Dual DETRs for Multi-Label Temporal Action Detection
Yuhan Zhu, Guozhen Zhang, Jing Tan et al.
GigaTraj: Predicting Long-term Trajectories of Hundreds of Pedestrians in Gigapixel Complex Scenes
Haozhe Lin, Chunyu Wei, Li He et al.
Image Sculpting: Precise Object Editing with 3D Geometry Control
Jiraphon Yenphraphai, Xichen Pan, Sainan Liu et al.
ID-Blau: Image Deblurring by Implicit Diffusion-based reBLurring AUgmentation
Jia-Hao Wu, Fu-Jen Tsai, Yan-Tsung Peng et al.
HumanNeRF-SE: A Simple yet Effective Approach to Animate HumanNeRF with Diverse Poses
Caoyuan Ma, Yu-Lun Liu, Zhixiang Wang et al.
LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model
Chenjie Cao, Yunuo Cai, Qiaole Dong et al.
Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation
Wenxuan Wang, Tongtian Yue, Yisi Zhang et al.
Ungeneralizable Examples
Jingwen Ye, Xinchao Wang
Language-only Training of Zero-shot Composed Image Retrieval
Geonmo Gu, Sanghyuk Chun, Wonjae Kim et al.
Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models
Shitian Zhao, Zhuowan Li, Yadong Lu et al.
Instruct-Imagen: Image Generation with Multi-modal Instruction
Hexiang Hu, Kelvin C.K. Chan, Yu-Chuan Su et al.
Diffeomorphic Template Registration for Atmospheric Turbulence Mitigation
Dong Lao, Congli Wang, Alex Wong et al.
Holodeck: Language Guided Generation of 3D Embodied AI Environments
Yue Yang, Fan-Yun Sun, Luca Weihs et al.
SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks
Xinyu Shi, Zecheng Hao, Zhaofei Yu
Tactile-Augmented Radiance Fields
Yiming Dou, Fengyu Yang, Yi Liu et al.
KVQ: Kwai Video Quality Assessment for Short-form Videos
Yiting Lu, Xin Li, Yajing Pei et al.
Purified and Unified Steganographic Network
GuoBiao Li, Sheng Li, Zicong Luo et al.
Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model
Dian Zheng, Xiao-Ming Wu, Shuzhou Yang et al.
L4D-Track: Language-to-4D Modeling Towards 6-DoF Tracking and Shape Reconstruction in 3D Point Cloud Stream
Jingtao Sun, Yaonan Wang, Mingtao Feng et al.
MAPSeg: Unified Unsupervised Domain Adaptation for Heterogeneous Medical Image Segmentation Based on 3D Masked Autoencoding and Pseudo-Labeling
Xuzhe Zhang, Yuhao Wu, Elsa Angelini et al.
Traffic Scene Parsing through the TSP6K Dataset
Peng-Tao Jiang, Yuqi Yang, Yang Cao et al.
DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Muyang Li, Tianle Cai, Jiaxin Cao et al.
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
Juhong Min, Shyamal Buch, Arsha Nagrani et al.
DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models
Nastaran Saadati, Minh Pham, Nasla Saleem et al.
Text-Driven Image Editing via Learnable Regions
Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai et al.
Self-Supervised Representation Learning from Arbitrary Scenarios
Zhaowen Li, Yousong Zhu, Zhiyang Chen et al.
The Neglected Tails in Vision-Language Models
Shubham Parashar, Tian Liu, Zhiqiu Lin et al.
SODA: Bottleneck Diffusion Models for Representation Learning
Drew Hudson, Daniel Zoran, Mateusz Malinowski et al.
Enhancing the Power of OOD Detection via Sample-Aware Model Selection
Feng Xue, Zi He, Yuan Zhang et al.
Towards More Unified In-context Visual Understanding
Dianmo Sheng, Dongdong Chen, Zhentao Tan et al.
Fourier-basis Functions to Bridge Augmentation Gap: Rethinking Frequency Augmentation in Image Classification
Mei Vaish, Shunxin Wang, Nicola Strisciuglio
Molecular Data Programming: Towards Molecule Pseudo-labeling with Systematic Weak Supervision
Xin Juan, Kaixiong Zhou, Ninghao Liu et al.
Hyper-MD: Mesh Denoising with Customized Parameters Aware of Noise Intensity and Geometric Characteristics
Xingtao Wang, Hongliang Wei, Xiaopeng Fan et al.
Strong Transferable Adversarial Attacks via Ensembled Asymptotically Normal Distribution Learning
Zhengwei Fang, Rui Wang, Tao Huang et al.
An Empirical Study of Scaling Law for Scene Text Recognition
Miao Rang, Zhenni Bi, Chuanjian Liu et al.
Differentiable Neural Surface Refinement for Modeling Transparent Objects
Weijian Deng, Dylan Campbell, Chunyi Sun et al.
Tune-An-Ellipse: CLIP Has Potential to Find What You Want
Jinheng Xie, Songhe Deng, Bing Li et al.
OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos
Dongyoung Choi, Hyeonjoong Jang, Min H. Kim
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
Kai Yang, Jian Tao, Jiafei Lyu et al.
XFeat: Accelerated Features for Lightweight Image Matching
Guilherme Potje, Felipe Cadar, André Araujo et al.
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh, Chih-Wei Wu, Iroro Orife et al.
Taming Self-Training for Open-Vocabulary Object Detection
Shiyu Zhao, Samuel Schulter, Long Zhao et al.
FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Action Segmentation
Zijia Lu, Ehsan Elhamifar
GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
Liangxiao Hu, Hongwen Zhang, Yuxiang Zhang et al.
FlowerFormer: Empowering Neural Architecture Encoding using a Flow-aware Graph Transformer
Dongyeong Hwang, Hyunju Kim, Sunwoo Kim et al.
Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation
Guangyang Wu, Xiaohong Liu, Jun Jia et al.
ProMark: Proactive Diffusion Watermarking for Causal Attribution
Vishal Asnani, John Collomosse, Tu Bui et al.
DiffForensics: Leveraging Diffusion Prior to Image Forgery Detection and Localization
Zeqin Yu, Jiangqun Ni, Yuzhen Lin et al.
Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling
Liwen Wu, Sai Bi, Zexiang Xu et al.
SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection
Gang Zhang, Chen Junnan, Guohuan Gao et al.
Sheared Backpropagation for Fine-tuning Foundation Models
Zhiyuan Yu, Li Shen, Liang Ding et al.
On the Content Bias in Fréchet Video Distance
Songwei Ge, Aniruddha Mahapatra, Gaurav Parmar et al.
Multiview Aerial Visual RECognition (MAVREC): Can Multi-view Improve Aerial Visual Perception?
Aritra Dutta, Srijan Das, Jacob Nielsen et al.
CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement
Qiang Zhu, Jinhua Hao, Yukang Ding et al.
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video
Hongchi Xia, Chih-Hao Lin, Wei-Chiu Ma et al.
Identifying Important Group of Pixels using Interactions
Kosuke Sumiyasu, Kazuhiko Kawamoto, Hiroshi Kera
Are Conventional SNNs Really Efficient? A Perspective from Network Quantization
Guobin Shen, Dongcheng Zhao, Tenglong Li et al.
CapHuman: Capture Your Moments in Parallel Universes
Chao Liang, Fan Ma, Linchao Zhu et al.
Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation
Hang Li, Chengzhi Shen, Philip H.S. Torr et al.
Towards Automatic Power Battery Detection: New Challenge, Benchmark Dataset and Baseline
Xiaoqi Zhao, Youwei Pang, Zhenyu Chen et al.
Infrared Small Target Detection with Scale and Location Sensitivity
Qiankun Liu, Rui Liu, Bolun Zheng et al.