Most Cited ECCV "llm web agents" Papers
2,387 papers found • Page 5 of 12
Motion and Structure from Event-based Normal Flow
Zhongyang Ren, Bangyan Liao, Delei Kong et al.
AdaDistill: Adaptive Knowledge Distillation for Deep Face Recognition
Fadi Boutros, Vitomir Struc, Naser Damer
Generating Physically Realistic and Directable Human Motions from Multi-Modal Inputs
Aayam Shrestha, Pan Liu, German Ros et al.
KDProR: A Knowledge-Decoupling Probabilistic Framework for Video-Text Retrieval
Xianwei Zhuang, Hongxiang Li, Xuxin Cheng et al.
ByteEdit: Boost, Comply and Accelerate Generative Image Editing
Yuxi Ren, Jie Wu, Yanzuo Lu et al.
CloudFixer: Test-Time Adaptation for 3D Point Clouds via Diffusion-Guided Geometric Transformation
Hajin Shim, Changhun Kim, Eunho Yang
Graph Neural Network Causal Explanation via Neural Causal Models
Arman Behnam, Binghui Wang
Open-Set Recognition in the Age of Vision-Language Models
Dimity Miller, Niko Suenderhauf, Alex Kenna et al.
Temporal-Mapping Photography for Event Cameras
Yuhan Bao, Lei Sun, Yuqin Ma et al.
Placing Objects in Context via Inpainting for Out-of-distribution Segmentation
Pau de Jorge Aranda, Riccardo Volpi, Puneet Dokania et al.
RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception
Jianbing Shen, Chunliang Li, Wencheng Han et al.
Hiding Imperceptible Noise in Curvature-Aware Patches for 3D Point Cloud Attack
Mingyu Yang, Daizong Liu, Keke Tang et al.
Exploiting Semantic Reconstruction to Mitigate Hallucinations in Vision-Language Models
Minchan Kim, Minyeong Kim, Junik Bae et al.
Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos
Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang et al.
Distributionally Robust Loss for Long-Tailed Multi-Label Image Classification
Dekun Lin, Zhe Cui, Rui Chen et al.
BEAF: Observing BEfore-AFter Changes to Evaluate Hallucination in Vision-language Models
Ye-Bin Moon, Nam Hyeon-Woo, Wonseok Choi et al.
Multiscale Sliced Wasserstein Distances as Perceptual Color Difference Measures
Jiaqi He, Zhihua Wang, Leon Wang et al.
Idempotent Unsupervised Representation Learning for Skeleton-Based Action Recognition
Lilang Lin, Lehong Wu, Jiahang Zhang et al.
Few-shot NeRF by Adaptive Rendering Loss Regularization
Qingshan Xu, Xuanyu Yi, Jianyao Xu et al.
Emerging Property of Masked Token for Effective Pre-training
Hyesong Choi, Hunsang Lee, Seyoung Joung et al.
VQ-HPS: Human Pose and Shape Estimation in a Vector-Quantized Latent Space
Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda et al.
Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data
Tuo Feng, Wenguan Wang, Ruijie Quan et al.
Learning Semantic Latent Directions for Accurate and Controllable Human Motion Prediction
Guowei Xu, Jiale Tao, Wen Li et al.
Event-Aided Time-To-Collision Estimation for Autonomous Driving
Jinghang Li, Bangyan Liao, Xiuyuan Lu et al.
3D Weakly Supervised Semantic Segmentation with 2D Vision-Language Guidance
Xiaoxu Xu, Yitian Yuan, Jinlong Li et al.
Enhancing Perceptual Quality in Video Super-Resolution through Temporally-Consistent Detail Synthesis using Diffusion Models
Claudio Rota, Marco Buzzelli, Joost Van de Weijer
Dual-level Adaptive Self-Labeling for Novel Class Discovery in Point Cloud Segmentation
Ruijie Xu, Chuyu Zhang, Hui Ren et al.
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models
Ian Huang, Guandao Yang, Leonidas Guibas
Exploring Vulnerabilities in Spiking Neural Networks: Direct Adversarial Attacks on Raw Event Data
Yanmeng Yao, Xiaohan Zhao, Bin Gu
ActionVOS: Actions as Prompts for Video Object Segmentation
Liangyang Ouyang, Ruicong Liu, Yifei Huang et al.
Take A Step Back: Rethinking the Two Stages in Visual Reasoning
Mingyu Zhang, Jiting Cai, Mingyu Liu et al.
Certifiably Robust Image Watermark
Zhengyuan Jiang, Moyang Guo, Yuepeng Hu et al.
CountFormer: Multi-View Crowd Counting Transformer
Hong Mo, Xiong Zhang, Jianchao Tan et al.
Rethinking Video Deblurring with Wavelet-Aware Dynamic Transformer and Diffusion Model
Chen Rao, Guangyuan Li, Zehua Lan et al.
T-CorresNet: Template Guided 3D Point Cloud Completion with Correspondence Pooling Query Generation Strategy
Fan Duan, Jiahao Yu, Li Chen
DNI: Dilutional Noise Initialization for Diffusion Video Editing
Sunjae Yoon, Gwanhyeong Koo, Ji Woo Hong et al.
EraseDraw: Learning to Insert Objects by Erasing Them from Images
Alper Canberk, Maksym Bondarenko, Ege Ozguroglu et al.
Learning Diffusion Models for Multi-View Anomaly Detection
Chieh Liu, Yu-Min Chu, Ting-I Hsieh et al.
Self-supervised visual learning from interactions with objects
Arthur Aubret, Céline Teulière, Jochen Triesch
CanonicalFusion: Generating Drivable 3D Human Avatars from Multiple Images
Jisu Shin, Junmyeong Lee, Seongmin Lee et al.
SMILe: Leveraging Submodular Mutual Information For Robust Few-Shot Object Detection
Anay Majee, Ryan X Sharp, Rishabh Iyer
Enhancing Cross-Subject fMRI-to-Video Decoding with Global-Local Functional Alignment
Chong Li, Xuelin Qian, Yun Wang et al.
NVS-Adapter: Plug-and-Play Novel View Synthesis from a Single Image
Yoonwoo Jeong, Jinwoo Lee, Chiheon Kim et al.
LG-Gaze: Learning Geometry-aware Continuous Prompts for Language-Guided Gaze Estimation
Pengwei Yin, Jingjing Wang, Guanzhong Zeng et al.
Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers’ Opinion Scores
Lucas Goncalves, Prashant Mathur, Chandrashekhar Lavania et al.
Brain Netflix: Scaling Data to Reconstruct Videos from Brain Signals
Camilo Fosco, Benjamin Lahner, Bowen Pan et al.
Learning to Make Keypoints Sub-Pixel Accurate
Shinjeong Kim, Marc Pollefeys, Daniel Barath
Self-Adapting Large Visual-Language Models to Edge Devices across Visual Modalities
Kaiwen Cai, ZheKai Duan, Gaowen Liu et al.
3D Small Object Detection with Dynamic Spatial Pruning
Xiuwei Xu, Zhihao Sun, Ziwei Wang et al.
GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning
Xiaojie Li, Yibo Yang, Xiangtai Li et al.
Make a Strong Teacher with Label Assistance: A Novel Knowledge Distillation Approach for Semantic Segmentation
Shoumeng Qiu, Jie Chen, Xinrun Li et al.
HSR: Holistic 3D Human-Scene Reconstruction from Monocular Videos
Lixin Xue, Chen Guo, Chengwei Zheng et al.
Exploring Phrase-Level Grounding with Text-to-Image Diffusion Model
Danni Yang, Ruohan Dong, Jiayi Ji et al.
GS-Pose: Category-Level Object Pose Estimation via Geometric and Semantic Correspondence
Pengyuan Wang, Takuya Ikeda, Robert Lee et al.
Quantized Prompt for Efficient Generalization of Vision-Language Models
Tianxiang Hao, Xiaohan Ding, Juexiao Feng et al.
PQ-SAM: Post-training Quantization for Segment Anything Model
Xiaoyu Liu, Xin Ding, Lei Yu et al.
RoScenes: A Large-scale Multi-view 3D Dataset for Roadside Perception
Xiaosu Zhu, Hualian Sheng, Sijia Cai et al.
PiTe: Pixel-Temporal Alignment for Large Video-Language Model
Yang Liu, Pengxiang Ding, Siteng Huang et al.
Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing
Wonjun Kang, Kevin Galim, Hyung Il Koo
RoomTex: Texturing Compositional Indoor Scenes via Iterative Inpainting
Qi Wang, Ruijie Lu, Xudong Xu et al.
Statewide Visual Geolocalization in the Wild
Florian Fervers, Sebastian Bullinger, Christoph Bodensteiner et al.
PredBench: Benchmarking Spatio-Temporal Prediction across Diverse Disciplines
Zidong Wang, Zeyu Lu, Di Huang et al.
Safeguard Text-to-Image Diffusion Models with Human Feedback Inversion
Sanghyun Kim, Seohyeon Jung, Balhae Kim et al.
On Pretraining Data Diversity for Self-Supervised Learning
Hasan Abed El Kader Hammoud, Tuhin Das, Fabio Pizzati et al.
GenRC: Generative 3D Room Completion from Sparse Image Collections
Ming-Feng Li, Yueh-Feng Ku, Hong-Xuan Yen et al.
X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
Swetha Sirnam, Jinyu Yang, Tal Neiman et al.
BK-SDM: A Lightweight, Fast, and Cheap Version of Stable Diffusion
Bo-Kyeong Kim, Hyoung-Kyu Song, Thibault Castells et al.
Layer-Wise Relevance Propagation with Conservation Property for ResNet
Seitaro Otsuki, Tsumugi Iida, Félix Doublet et al.
Subspace Prototype Guidance for Mitigating Class Imbalance in Point Cloud Semantic Segmentation
Jiawei Han, Kaiqi Liu, Wei Li et al.
Cs2K: Class-specific and Class-shared Knowledge Guidance for Incremental Semantic Segmentation
Wei Cong, Yang Cong, Yuyang Liu et al.
O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation
Muer Tie, Julong Wei, Zhengjun Wang et al.
Compress3D: a Compressed Latent Space for 3D Generation from a Single Image
Bowen Zhang, Tianyu Yang, Yu Li et al.
DynoSurf: Neural Deformation-based Temporally Consistent Dynamic Surface Reconstruction
Yuxin Yao, Siyu Ren, Junhui Hou et al.
AFF-ttention! Affordances and Attention models for Short-Term Object Interaction Anticipation
Lorenzo Mur Labadia, Ruben Martinez-Cantin, Jose J Guerrero et al.
AddBiomechanics Dataset: Capturing the Physics of Human Motion at Scale
Keenon Werling, Janelle M Kaneda, Tian Tan et al.
CityGuessr: City-Level Video Geo-Localization on a Global Scale
Parth Parag Kulkarni, Gaurav Kumar Nayak, Mubarak Shah
Taming Lookup Tables for Efficient Image Retouching
Sidi Yang, Binxiao Huang, Mingdeng Cao et al.
ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model
Fu-Yun Wang, Zhaoyang Huang, Qiang Ma et al.
PCF-Lift: Panoptic Lifting by Probabilistic Contrastive Fusion
Runsong Zhu, Shi Qiu, Qianyi Wu et al.
Flying with Photons: Rendering Novel Views of Propagating Light
Anagh Malik, Noah Juravsky, Ryan Po et al.
TriNeRFLet: A Wavelet Based Triplane NeRF Representation
Rajaei Khatib, Raja Giryes
Bi-TTA: Bidirectional Test-Time Adapter for Remote Physiological Measurement
Haodong Li, Hao Lu, Yingcong Chen
MinD-3D: Reconstruct High-quality 3D objects in Human Brain
Jianxiong Gao, Yuqian Fu, Yun Wang et al.
MetaCap: Meta-learning Priors from Multi-View Imagery for Sparse-view Human Performance Capture and Rendering
Guoxing Sun, Rishabh Dabral, Pascal Fua et al.
Zero-Shot Adaptation for Approximate Posterior Sampling of Diffusion Models in Inverse Problems
Yasar Utku Alcalar, Mehmet Akcakaya
Power Variable Projection for Initialization-Free Large-Scale Bundle Adjustment
Simon Weber, Je Hyeong Hong, Daniel Cremers
Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection
Tim Salzmann, Markus Ryll, Alex Bewley et al.
Self-Supervised Representation Learning for Adversarial Attack Detection
Yi Li, Plamen Angelov, Neeraj Suri
Zero-Shot Image Feature Consensus with Deep Functional Maps
Xinle Cheng, Congyue Deng, Adam Harley et al.
Concept Arithmetics for Circumventing Concept Inhibition in Diffusion Models
Vitali Petsiuk, Kate Saenko
Accelerating Online Mapping and Behavior Prediction via Direct BEV Feature Attention
Xunjiang Gu, Guanyu Song, Igor Gilitschenski et al.
Auto-DAS: Automated Proxy Discovery for Training-free Distillation-aware Architecture Search
Haosen Sun, Lujun Li, Peijie Dong et al.
Parameterized Quasi-Physical Simulators for Dexterous Manipulations Transfer
Xueyi Liu, Kangbo Lyu, Jieqiong Zhang et al.
PoseCrafter: One-Shot Personalized Video Synthesis Following Flexible Pose Control
Yong Zhong, Min Zhao, Zebin You et al.
Shape from Heat Conduction
Sriram Narayanan, Mani Ramanagopal, Mark Sheinin et al.
FSD-BEV: Foreground Self-Distillation for Multi-view 3D Object Detection
Zheng Jiang, Jinqing Zhang, Yanan Zhang et al.
Enriching Information and Preserving Semantic Consistency in Expanding Curvilinear Object Segmentation Datasets
Qin Lei, Jiang Zhong, Qizhu Dai
Towards Scene Graph Anticipation
Rohith Peddi, Saksham Singh, Saurabh . et al.
UniCal: Unified Neural Sensor Calibration
Ze Yang, George G Chen, Haowei Zhang et al.
Made to Order: Discovering monotonic temporal changes via self-supervised video ordering
Charig Yang, Weidi Xie, Andrew Zisserman
Preventing Catastrophic Overfitting in Fast Adversarial Training: A Bi-level Optimization Perspective
Zhaoxin Wang, Handing Wang, Cong Tian et al.
SLIM: Spuriousness Mitigation with Minimal Human Annotations
Xiwei Xuan, Ziquan Deng, Hsuan-Tien Lin et al.
D4-VTON: Dynamic Semantics Disentangling for Differential Diffusion based Virtual Try-On
Zhaotong Yang, Zicheng Jiang, Xinzhe Li et al.
RAVE: Residual Vector Embedding for CLIP-Guided Backlit Image Enhancement
Tatiana Gaintseva, Martin Benning, Greg Slabaugh
Toward Tiny and High-quality Facial Makeup with Data Amplify Learning
Qiaoqiao Jin, Xuanhong Chen, Meiguang Jin et al.
You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception
Sheng Jin, Shuhuai Li, Tong Li et al.
Plan, Posture and Go: Towards Open-vocabulary Text-to-Motion Generation
Jinpeng Liu, Wenxun Dai, Chunyu Wang et al.
Flash Cache: Reducing Bias in Radiance Cache Based Inverse Rendering
Benjamin Attal, Dor Verbin, Ben Mildenhall et al.
RePOSE: 3D Human Pose Estimation via Spatio-Temporal Depth Relational Consistency
Ziming Sun, Yuan Liang, Zejun Ma et al.
Bottom-Up Domain Prompt Tuning for Generalized Face Anti-Spoofing
Si-Qi Liu, Qirui Wang, Pong Chi Yuen
PolyOculus: Simultaneous Multi-view Image-based Novel View Synthesis
Jason Yu, Tristan Aumentado-Armstrong, Fereshteh Forghani et al.
Bidirectional Progressive Transformer for Interaction Intention Anticipation
Zichen Zhang, Hongchen Luo, Wei Zhai et al.
Memory-Efficient Fine-Tuning for Quantized Diffusion Model
Hyogon Ryu, Seohyun Lim, Hyunjung Shim
TTT-MIM: Test-Time Training with Masked Image Modeling for Denoising Distribution Shifts
Youssef Mansour, Xuyang Zhong, Serdar Caglar et al.
Layered Rendering Diffusion Model for Controllable Zero-Shot Image Synthesis
Zipeng Qi, Guoxi Huang, Chenyang Liu et al.
Bidirectional Stereo Image Compression with Cross-Dimensional Entropy Model
Zhening Liu, Xinjie Zhang, Jiawei Shao et al.
DIAL: Dense Image-text ALignment for Weakly Supervised Semantic Segmentation
Soojin Jang, JungMin Yun, JuneHyoung Kwon et al.
Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable Repainting
Junwu Zhang, Zhenyu Tang, Yatian Pang et al.
Relightable Neural Actor with Intrinsic Decomposition and Pose Control
Diogo Carbonera Luvizon, Vladislav Golyanik, Adam Kortylewski et al.
PEA-Diffusion: Parameter-Efficient Adapter with Knowledge Distillation in non-English Text-to-Image Generation
Jian Ma, Chen Chen, Qingsong Xie et al.
DC-Solver: Improving Predictor-Corrector Diffusion Sampler via Dynamic Compensation
Wenliang Zhao, Haolin Wang, Jie Zhou et al.
An Explainable Vision Question Answer Model via Diffusion Chain-of-Thought
Chunhao Lu, Qiang Lu, Jake Luo
Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos
Ekta Prashnani, Koki Nagano, Shalini De Mello et al.
High-Precision Self-Supervised Monocular Depth Estimation with Rich-Resource Prior
Jianbing Shen, Wencheng Han
GazeXplain: Learning to Predict Natural Language Explanations of Visual Scanpaths
Xianyu Chen, Ming Jiang, Qi Zhao
Un-EVIMO: Unsupervised Event-based Independent Motion Segmentation
Ziyun Wang, Jinyuan Guo, Kostas Daniilidis
WorldPose: A World Cup Dataset for Global 3D Human Pose Estimation
Tianjian Jiang, Johsan Billingham, Sebastian Müksch et al.
Prompting Future Driven Diffusion Model for Hand Motion Prediction
Bowen Tang, Kaihao Zhang, Wenhan Luo et al.
ARoFace: Alignment Robustness to Improve Low-quality Face Recognition
Mohammad Saeed Ebrahimi Saadabadi, Sahar Rahimi Malakshan, Ali Dabouei et al.
GeometrySticker: Enabling Ownership Claim of Recolorized Neural Radiance Fields
Xiufeng Huang, Ka Chun Cheung, Simon See et al.
COMO: Compact Mapping and Odometry
Eric Dexheimer, Andrew Davison
Within the Dynamic Context: Inertia-aware 3D Human Modeling with Pose Sequence
Yutong Chen, Yifan Zhan, Zhihang Zhong et al.
ProCreate, Don't Reproduce! Propulsive Energy Diffusion for Creative Generation
Jack Lu, Ryan Teehan, Mengye Ren
Learning Neural Volumetric Pose Features for Camera Localization
Jingyu Lin, Jiaqi Gu, Bojian Wu et al.
Anytime Continual Learning for Open Vocabulary Classification
Zhen Zhu, Yiming Gong, Derek Hoiem
Markov Knowledge Distillation: Make Nasty Teachers trained by Self-undermining Knowledge Distillation Fully Distillable
En-Hui Yang, Linfeng Ye
ConDense: Consistent 2D-3D Pre-training for Dense and Sparse Features from Multi-View Images
Xiaoshuai Zhang, Zhicheng Wang, Howard Zhou et al.
Distill Gold from Massive Ores: Bi-level Data Pruning towards Efficient Dataset Distillation
Yue Xu, Yong-Lu Li, Kaitong Cui et al.
Improving Knowledge Distillation via Regularizing Feature Direction and Norm
Yuzhu Wang, Lechao Cheng, Manni Duan et al.
Self-supervised co-salient object detection via feature correspondences at multiple scales
Souradeep Chakraborty, Dimitris Samaras
DiffuX2CT: Diffusion Learning to Reconstruct CT Images from Biplanar X-Rays
Baochang Zhang, Zhi Qiao, Runkun Liu et al.
AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval
Pavel Suma, Giorgos Kordopatis-Zilos, Ahmet Iscen et al.
Camera Calibration using a Collimator System
Shunkun Liang, Banglei Guan, Zhenbao Yu et al.
Quality Assured: Rethinking Annotation Strategies in Imaging AI
Tim Rädsch, Annika Reinke, Vivienn Weru et al.
Snuffy: Efficient Whole Slide Image Classifier
Hossein Jafarinia, Alireza Alipanah, Saeed Razavi et al.
FARSE-CNN: Fully Asynchronous, Recurrent and Sparse Event-Based CNN
Riccardo Santambrogio, Marco Cannici, Matteo Matteucci
Hyperion – A fast, versatile symbolic Gaussian Belief Propagation framework for Continuous-Time SLAM
David Hug, Ignacio Alzugaray Lopez, Margarita Chli
AWOL: Analysis WithOut synthesis using Language
Silvia Zuffi, Michael J. Black
SLAck: Semantic, Location, and Appearance Aware Open-Vocabulary Tracking
Siyuan Li, Lei Ke, Yung-Hsu Yang et al.
Trainable Highly-expressive Activation Functions
Irit Chelly, Shahaf Finder, Shira Ifergane et al.
RecurrentBEV: A Long-term Temporal Fusion Framework for Multi-view 3D Detection
Ming Chang, Xishan Zhang, Rui Zhang et al.
UniINR: Event-guided Unified Rolling Shutter Correction, Deblurring, and Interpolation
Yunfan Lu, Guoqiang Liang, Yusheng Wang et al.
PartImageNet++ Dataset: Scaling up Part-based Models for Robust Recognition
Xiao Li, Yining Liu, Na Dong et al.
IFTR: An Instance-Level Fusion Transformer for Visual Collaborative Perception
Shaohong Wang, Lu Bin, Xinyu Xiao et al.
VCD-Texture: Variance Alignment based 3D-2D Co-Denoising for Text-Guided Texturing
Shang Liu, Chaohui Yu, Chenjie Cao et al.
Camera-LiDAR Cross-modality Gait Recognition
Wenxuan Guo, Yingping Liang, Zhiyu Pan et al.
PairingNet: A Learning-based Pair-searching and -matching Network for Image Fragments
Rixin Zhou, Ding Xia, Yi Zhang et al.
SemReg: Semantics Constrained Point Cloud Registration
Sheldon Fung, Xuequan Lu, Dasith de Silva Edirimuni et al.
Personalized Video Relighting With an At-Home Light Stage
Jun Myeong Choi, Max Christman, Roni Sengupta
Diffusion Prior-Based Amortized Variational Inference for Noisy Inverse Problems
Sojin Lee, Dogyun Park, Inho Kong et al.
Temporal Residual Jacobians for Rig-free Motion Transfer
Sanjeev Muralikrishnan, Niladri Shekhar Dutt, Siddhartha Chaudhuri et al.
From Fake to Real: Pretraining on Balanced Synthetic Images to Prevent Spurious Correlations in Image Recognition
Maan Qraitem, Kate Saenko, Bryan Plummer
Category-level Object Detection, Pose Estimation and Reconstruction from Stereo Images
Chuanrui Zhang, Yonggen Ling, Minglei Lu et al.
Data Augmentation via Latent Diffusion for Saliency Prediction
Bahar Aydemir, Deblina Bhattacharjee, Tong Zhang et al.
Understanding Physical Dynamics with Counterfactual World Modeling
Rahul Mysore Venkatesh, Honglin Chen, Kevin Feigelis et al.
Insect Identification in the Wild: The AMI Dataset
Aditya Jain, Fagner Cunha, Michael J Bunsen et al.
HGL: Hierarchical Geometry Learning for Test-time Adaptation in 3D Point Cloud Segmentation
Tianpei Zou, Sanqing Qu, Zhijun Li et al.
Rawformer: Unpaired Raw-to-Raw Translation for Learnable Camera ISPs
Georgy Perevozchikov, Nancy Mehta, Mahmoud Afifi et al.
Self-Supervised Audio-Visual Soundscape Stylization
Tingle Li, Renhao Wang, Po-Yao Huang et al.
Improving Robustness to Model Inversion Attacks via Sparse Coding Architectures
Sayanton Vhaduri Dibbo, Adam Breuer, Juston Moore et al.
Zero-Shot Multi-Object Scene Completion
Shun Iwase, Katherine Liu, Vitor Guizilini et al.
Agglomerative Token Clustering
Joakim Bruslund Haurum, Sergio Escalera, Graham W. Taylor et al.
Weight Conditioning for Smooth Optimization of Neural Networks
Hemanth Saratchandran, Thomas X Wang, Simon Lucey
Layout-Corrector: Alleviating Layout Sticking Phenomenon in Discrete Diffusion Model
Shoma Iwai, Atsuki Osanai, Shunsuke Kitada et al.
External Knowledge Enhanced 3D Scene Generation from Sketch
Zijie Wu, Mingtao Feng, Yaonan Wang et al.
Click Prompt Learning with Optimal Transport for Interactive Segmentation
Jie Liu, Haochen Wang, Wenzhe Yin et al.
PARIS3D: Reasoning-based 3D Part Segmentation Using Large Multimodal Model
Amrin Kareem, Jean Lahoud, Hisham Cholakkal
A high-quality robust diffusion framework for corrupted dataset
Quan Dao, Binh Ta, Tung Pham et al.
Unsupervised Multi-modal Medical Image Registration via Invertible Translation
Mengjie Guo
Shedding More Light on Robust Classifiers under the lens of Energy-based Models
Mujtaba Hussain Mirza, Maria Rosaria Briglia, Senad Beadini et al.
Reshaping the Online Data Buffering and Organizing Mechanism for Continual Test-Time Adaptation
Zhilin Zhu, Xiaopeng Hong, Zhiheng Ma et al.
DiffCD: A Symmetric Differentiable Chamfer Distance for Neural Implicit Surface Fitting
Linus Härenstam-Nielsen, Lu Sang, Abhishek Saroha et al.
Any Target Can be Offense: Adversarial Example Generation via Generalized Latent Infection
Youheng Sun, Shengming Yuan, Xuanhan Wang et al.
Improving Domain Generalization in Self-Supervised Monocular Depth Estimation via Stabilized Adversarial Training
Yuanqi Yao, Gang Wu, Kui Jiang et al.
Improving Zero-Shot Generalization for CLIP with Variational Adapter
Ziqian Lu, Fengli Shen, Mushui Liu et al.
Learn to Memorize and to Forget: A Continual Learning Perspective of Dynamic SLAM
Baicheng Li, Zike Yan, Dong Wu et al.
Diffusion-Based Image-to-Image Translation by Noise Correction via Prompt Interpolation
Junsung Lee, Minsoo Kang, Bohyung Han
EA-VTR: Event-Aware Video-Text Retrieval
Zongyang Ma, Ziqi Zhang, Yuxin Chen et al.
Audio-visual Generalized Zero-shot Learning the Easy Way
Shentong Mo, Pedro Morgado
Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations
Kilichbek Haydarov, Xiaoqian Shen, Avinash Madasu et al.
Learning Cross-hand Policies of High-DOF Reaching and Grasping
Qijin She, Shishun Zhang, Yunfan Ye et al.
DySeT: a Dynamic Masked Self-distillation Approach for Robust Trajectory Prediction
Mozhgan Pourkeshavarz, Arielle Zhang, Amir Rasouli
MonoWAD: Weather-Adaptive Diffusion Model for Robust Monocular 3D Object Detection
Youngmin Oh, Hyung-Il Kim, Seong Tae Kim et al.
FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance
Jiedong Zhuang, Jiaqi Hu, Lianrui Mu et al.
VISAGE: Video Instance Segmentation with Appearance-Guided Enhancement
Hanjung Kim, Jaehyun Kang, Miran Heo et al.
Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics
Shishira R Maiya, Anubhav Anubhav, Matthew Gwilliam et al.
SkyScenes: A Synthetic Dataset for Aerial Scene Understanding
Sahil Santosh Khose, Anisha Pal, Aayushi Agarwal et al.
Concise Plane Arrangements for Low-Poly Surface and Volume Modelling
Raphael Sulzer, Florent Lafarge
FrePolad: Frequency-Rectified Point Latent Diffusion for Point Cloud Generation
Chenliang Zhou, Fangcheng Zhong, Param Hanji et al.
DreamDissector: Learning Disentangled Text-to-3D Generation from 2D Diffusion Priors
Zizheng Yan, Jiapeng Zhou, Fanpeng Meng et al.