Most Cited ECCV "unsafe image generation" Papers
2,387 papers found • Page 2 of 12
Conference
Local All-Pair Correspondence for Point Tracking
Seokju Cho, Jiahui Huang, Jisu Nam et al.
QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
Pengxiang Ding, Han Zhao, Wenjie Zhang et al.
Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models
Yuchen Yang, Kwonjoon Lee, Behzad Dariush et al.
TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting
Jiahe Li, Jiawei Zhang, Xiao Bai et al.
VeCLIP: Improving CLIP Training via Visual-enriched Captions
Zhengfeng Lai, Haotian Zhang, Bowen Zhang et al.
DQ-DETR: DETR with Dynamic Query for Tiny Object Detection
Yi-Xin Huang, Hou-I Liu, Hong-Han Shuai et al.
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control
Yue Han, Junwei Zhu, Keke He et al.
Diffusion Models for Open-Vocabulary Segmentation
Laurynas Karazija, Iro Laina, Andrea Vedaldi et al.
AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models
Xuelong Dai, Kaisheng Liang, Bin Xiao
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
Daniel Winter, Matan Cohen, Shlomi Fruchter et al.
SILC: Improving Vision Language Pretraining with Self-Distillation
Muhammad Ferjad Naeem, Yongqin Xian, Xiaohua Zhai et al.
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation
Yi-Chia Chen, WeiHua Li, Cheng Sun et al.
GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection
hang yao, Ming LIU, Zhicun Yin et al.
BRAVE: Broadening the visual encoding of vision-language models
Oguzhan Fatih Kar, Alessio Tonioni, Petra Poklukar et al.
VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation
Zhen Qu, Xian Tao, Mukesh Prasad et al.
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation
Omer Dahary, Or Patashnik, Kfir Aberman et al.
IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination
Xi Chen, Sida Peng, Dongchen Yang et al.
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao et al.
Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction
Bencheng Liao, Shaoyu Chen, Bo Jiang et al.
Improving 2D Feature Representations by 3D-Aware Fine-Tuning
Yuanwen Yue, Anurag Das, Francis Engelmann et al.
View-Consistent 3D Editing with Gaussian Splatting
Yuxuan Wang, Xuanyu Yi, Zike Wu et al.
OmniSat: Self-Supervised Modality Fusion for Earth Observation
Guillaume Astruc, Nicolas Gonthier, Clement Mallet et al.
Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration
Zhihao Liang, Qi Zhang, WENBO HU et al.
Latent Guard: a Safety Framework for Text-to-image Generation
Runtao Liu, Ashkan Khakzar, Jindong Gu et al.
HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution
Xiang Zhang, Yulun Zhang, Fisher Yu
SWAG: Splatting in the Wild images with Appearance-conditioned Gaussians
Hiba Dahmani, Moussab Bennehar, Nathan Piasco et al.
Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis
Yuanhao Cai, Yixun Liang, Jiahao Wang et al.
HeadGaS: Real-Time Animatable Head Avatars via 3D Gaussian Splatting
Helisa Dhamo, Yinyu Nie, Arthur Moreau et al.
LaRa: Efficient Large-Baseline Radiance Fields
Anpei Chen, Haofei Xu, Stefano Esposito et al.
GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering
Yanyan Li, Chenyu Lyu, Yan Di et al.
FlashTex: Fast Relightable Mesh Texturing with LightControlNet
Kangle Deng, Timothy Omernick, Alexander B Weiss et al.
See and Think: Embodied Agent in Virtual Environment
Zhonghan Zhao, Xuan Wang, Wenhao Chai et al.
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
Zeyu Liu, Weicong Liang, Zhanhao Liang et al.
MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping
Jiacheng Chen, Yuefan Wu, Tan Jiaqi et al.
VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models
Junlin Han, Filippos Kokkinos, Philip Torr
VideoStudio: Generating Consistent-Content and Multi-Scene Videos
Fuchen Long, Zhaofan Qiu, Ting Yao et al.
Embodied Understanding of Driving Scenarios
Yunsong Zhou, Linyan Huang, Qingwen Bu et al.
Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector
Yuqian Fu, Yu Wang, Yixuan Pan et al.
GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes
Ibrahim Ethem Hamamci, Sezgin Er, Anjany Sekuboyina et al.
EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion
Guangyao Zhai, Evin Pınar Örnek, Dave Zhenyu Chen et al.
A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment
Tianhe Wu, Kede Ma, Jie Liang et al.
A Comparative Study of Image Restoration Networks for General Backbone Network Design
Xiangyu Chen, Zheyuan Li, Yuandong Pu et al.
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training
David Wan, Jaemin Cho, Elias Stengel-Eskin et al.
ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities
CHENMING ZHU, Tai Wang, Wenwei Zhang et al.
Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention
Jie Ren, Yaxin Li, Shenglai Zeng et al.
GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection
Ziying Song, Lei Yang, Shaoqing Xu et al.
LCM-Lookahead for Encoder-based Text-to-Image Personalization
Rinon Gal, Or Lichter, Elad Richardson et al.
Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
Lanqing Guo, Yingqing He, Haoxin Chen et al.
Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning
Chongyu Fan, Jiancheng Liu, Alfred Hero et al.
Deep Patch Visual SLAM
Lahav Lipson, Zachary Teed, Jia Deng
ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions
Anindita Ghosh, Rishabh Dabral, Vladislav Golyanik et al.
UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation
Zexiang Liu, Yangguang Li, Youtian Lin et al.
GVGEN: Text-to-3D Generation with Volumetric Representation
Xianglong He, Junyi Chen, Sida Peng et al.
FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
Linjiang Huang, Rongyao Fang, Aiping Zhang et al.
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
Qilang Ye, Zitong Yu, Rui Shao et al.
Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction
Zihao Liu, Xiaoyu Zhang, Guangwei Liu et al.
Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection
Christos Koutlis, Symeon Papadopoulos
CoMo: Controllable Motion Generation through Language Guided Pose Code Editing
Yiming Huang, WEILIN WAN, Yue Yang et al.
SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer
Zijie Wu, Chaohui Yu, Yanqin Jiang et al.
ReMamber: Referring Image Segmentation with Mamba Twister
Yuhuan Yang, Chaofan Ma, Jiangchao Yao et al.
LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis
Kevin Xie, Tianshi Cao, Jonathan P Lorraine et al.
VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models
Shicheng Li, Lei Li, Yi Liu et al.
DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control
Yuru Jia, Lukas Hoyer, Shengyu Huang et al.
Fully Sparse 3D Occupancy Prediction
Haisong Liu, Yang Chen, Haiguang Wang et al.
TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models
Aditya Aravind Chinchure, Pushkar Shukla, Gaurav Bhatt et al.
Watch Your Steps: Local Image and Scene Editing by Text Instructions
Ashkan Mirzaei, Tristan T Aumentado-Armstrong, Marcus A Brubaker et al.
Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer
Yu Deng, Duomin Wang, Baoyuan Wang
DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
Yixuan Wu, Yizhou Wang, Shixiang Tang et al.
Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models
Yixuan Ren, Yang Zhou, Jimei Yang et al.
FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally
Qiuhong Shen, Xingyi Yang, Xinchao Wang
UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction
Lan Feng, Mohammadhossein Bahari, Kaouther Messaoud et al.
When Fast Fourier Transform Meets Transformer for Image Restoration
xingyu jiang, Xiuhui Zhang, Ning Gao et al.
UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models
Yiming Zhao, Zhouhui Lian
FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models
Andrea Caraffa, Davide Boscaini, Amir Hamza et al.
Online Vectorized HD Map Construction using Geometry
Zhixin Zhang, Yiyuan Zhang, Xiaohan Ding et al.
Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation
Tong Shao, Zhuotao Tian, Hang Zhao et al.
RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios
Wenhao Ding, Yulong Cao, DING ZHAO et al.
GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval
Han Zhou, Wei Dong, Xiaohong Liu et al.
CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs
Yassine Ouali, Adrian Bulat, Brais Martinez et al.
RangeLDM: Fast Realistic LiDAR Point Cloud Generation
Qianjiang Hu, Zhimin Zhang, Wei Hu
R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model
Changhoon Kim, Kyle Min, Yezhou Yang
Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models
Xiaoshi Wu, Yiming Hao, Manyuan Zhang et al.
Click-Gaussian: Interactive Segmentation to Any 3D Gaussians
Seokhun Choi, Hyeonseop Song, Jaechul Kim et al.
MyVLM: Personalizing VLMs for User-Specific Queries
Yuval Alaluf, Elad Richardson, Sergey Tulyakov et al.
LivePhoto: Real Image Animation with Text-guided Motion Control
Xi Chen, Zhiheng Liu, Mengting Chen et al.
MagMax: Leveraging Model Merging for Seamless Continual Learning
Daniel Marczak, Bartlomiej Twardowski, Tomasz Trzcinski et al.
PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations
Yang Zheng, Qingqing Zhao, Guandao Yang et al.
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
Rizhao Cai, Zirui Song, DAYAN GUAN et al.
BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling
Cheng Peng, Yutao Tang, Yifan Zhou et al.
DrivingDiffusion: Layout-Guided Multi-View Driving Scenarios Video Generation with Latent Diffusion Model
Li Xiaofan, Zhang Yifu, Xiaoqing Ye
NeuroNCAP: Photorealistic Closed-loop Safety Testing for Autonomous Driving
William Ljungbergh, Adam Tonderski, Joakim Johnander et al.
On the Error Analysis of 3D Gaussian Splatting and an Optimal Projection Strategy
Letian Huang, Jiayang Bai, Jie Guo et al.
ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance
Yongwei Chen, Tengfei Wang, Tong Wu et al.
LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model
Yulin Luo, Ruichuan An, Bocheng Zou et al.
BAMM: Bidirectional Autoregressive Motion Model
Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Pu Wang et al.
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs
Donghyun Kim, Byeongho Heo, Dongyoon Han
Diffusion Reward: Learning Rewards via Conditional Video Diffusion
Tao Huang, Guangqi Jiang, Yanjie Ze et al.
EgoLifter: Open-world 3D Segmentation for Egocentric Perception
Qiao Gu, Zhaoyang Lv, Duncan Frost et al.
ParCo: Part-Coordinating Text-to-Motion Synthesis
Qiran Zou, Shangyuan Yuan, Shian Du et al.
ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image
Hallee E. Wong, Marianne Rakic, John Guttag et al.
Zero-Shot Detection of AI-Generated Images
Davide Cozzolino, GIovanni Poggi, Matthias Niessner et al.
Think2Drive: Efficient Reinforcement Learning by Thinking with Latent World Model for Autonomous Driving (in CARLA-v2)
Qifeng Li, Xiaosong Jia, Shaobo Wang et al.
Multi-Memory Matching for Unsupervised Visible-Infrared Person Re-Identification
Jiangming Shi, Xiangbo Yin, Yeyun Chen et al.
TAPTR: Tracking Any Point with Transformers as Detection
Hongyang Li, Hao Zhang, Shilong Liu et al.
Stream Query Denoising for Vectorized HD-Map Construction
Shuo Wang, Fan Jia, Weixin Mao et al.
A Watermark-Conditioned Diffusion Model for IP Protection
Rui Min, Sen Li, Hongyang Chen et al.
A Compact Dynamic 3D Gaussian Representation for Real-Time Dynamic View Synthesis
Kai Katsumata, Duc Minh Vo, Hideki Nakayama
Class-Incremental Learning with CLIP: Adaptive Representation Adjustment and Parameter Fusion
Linlan Huang, Xusheng Cao, Haori Lu et al.
Motion-Guided Latent Diffusion for Temporally Consistent Real-world Video Super-resolution
Xi Yang, Chenhang He, Jianqi Ma et al.
MesonGS: Post-training Compression of 3D Gaussians via Efficient Attribute Transformation
Shuzhao Xie, Weixiang Zhang, Chen Tang et al.
MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation
KUNPENG SONG, Yizhe Zhu, Bingchen Liu et al.
MegaScenes: Scene-Level View Synthesis at Scale
Joseph Tung, Gene Chou, Ruojin Cai et al.
Making Large Language Models Better Planners with Reasoning-Decision Alignment
Zhijian Huang, Tao Tang, Shaoxiang Chen et al.
Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models
Longxiang Tang, Zhuotao Tian, Kai Li et al.
Texture-GS: Disentangle the Geometry and Texture for 3D Gaussian Splatting Editing
Tian-Xing Xu, WENBO HU, Yu-Kun Lai et al.
TransFusion -- A Transparency-Based Diffusion Model for Anomaly Detection
Matic Fučka, Vitjan Zavrtanik, Danijel Skocaj
Better Call SAL: Towards Learning to Segment Anything in Lidar
Aljoša Ošep, Tim Meinhardt, Francesco Ferroni et al.
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
Bu Jin, Yupeng Zheng, Pengfei Li et al.
Scissorhands: Scrub Data Influence via Connection Sensitivity in Networks
Jing Wu, Mehrtash Harandi
Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval
Young Kyun Jang, Dat B Huynh, Ashish Shah et al.
SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance
Lukas Hoyer, David Tan, Muhammad Ferjad Naeem et al.
AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering
Xiuyuan Chen, Yuan Lin, Yuchen Zhang et al.
GalLop: Learning global and local prompts for vision-language models
Marc Lafon, Elias Ramzi, Clément Rambour et al.
6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model
Matteo Bortolon, Theodoros Tsesmelis, Stuart James et al.
Decoupling Common and Unique Representations for Multimodal Self-supervised Learning
Yi Wang, Conrad M Albrecht, Nassim Ait Ali Braham et al.
SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views
Chao Xu, Ang Li, Linghao Chen et al.
SMooDi: Stylized Motion Diffusion Model
Lei Zhong, Yiming Xie, Varun Jampani et al.
DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion
Junjie Guo, Chenqiang Gao, Fangcen liu et al.
R3D-AD: Reconstruction via Diffusion for 3D Anomaly Detection
Zheyuan Zhou, Wang Le, Naiyu Fang et al.
Diagnosing and Re-learning for Balanced Multimodal Learning
Yake Wei, Siwei Li, Ruoxuan Feng et al.
Weighted Ensemble Models Are Strong Continual Learners
Imad Eddine Marouf, Subhankar Roy, Enzo Tartaglione et al.
Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment
Mengting Chen, Xi Chen, Zhonghua Zhai et al.
Solving Motion Planning Tasks with a Scalable Generative Model
Yihan Hu, Siqi Chai, Zhening Yang et al.
FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis
Ke Fan, Junshu Tang, Weijian Cao et al.
HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting
Zhenglin Zhou, Fan Ma, Hehe Fan et al.
MEVG : Multi-event Video Generation with Text-to-Video Models
Gyeongrok Oh, Jaehwan Jeong, Sieun Kim et al.
MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
Zhao Tianchen, Xuefei Ning, Tongcheng Fang et al.
Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking
Jiyao Zhang, Weiyao Huang, Bo Peng et al.
Alternate Diverse Teaching for Semi-supervised Medical Image Segmentation
Zhen Zhao, Zicheng Wang, Dian Yu et al.
SegPoint: Segment Any Point Cloud via Large Language Model
Shuting He, Henghui Ding, Xudong Jiang et al.
Video Question Answering with Procedural Programs
Rohan Choudhury, Koichiro Niinuma, Kris Kitani et al.
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos
Kirolos Ataallah, Xiaoqian Shen, Eslam mohamed abdelrahman et al.
MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices
Yang Zhao, Zhisheng Xiao, Yanwu Xu et al.
DragVideo: Interactive Drag-style Video Editing
Yufan Deng, Ruida Wang, Yuhao ZHANG et al.
Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation
Siyu Jiao, hongguang Zhu, Yunchao Wei et al.
Vamos: Versatile Action Models for Video Understanding
Shijie Wang, Qi Zhao, Minh Quan et al.
Tokenize Anything via Prompting
Ting Pan, Lulu Tang, Xinlong Wang et al.
Fake It till You Make It: Curricular Dynamic Forgery Augmentations towards General Deepfake Detection
Yuzhen Lin, Wentang Song, Bin Li et al.
PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
YUXUAN SUN, Hao Wu, Chenglu Zhu et al.
Pyramid Diffusion for Fine 3D Large Scene Generation
Yuheng Liu, Xinke Li, Xueting Li et al.
Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation
Seung Hyun Lee, Yinxiao Li, Junjie Ke et al.
SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction
Marko Mihajlovic, Sergey Prokudin, Siyu Tang et al.
m&m’s: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
Zixian Ma, Weikai Huang, Jieyu Zhang et al.
V-IRL: Grounding Virtual Intelligence in Real Life
Jihan YANG, Runyu Ding, Ellis L Brown et al.
DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment
Jiuming Liu, Dong Zhuo, Zhiheng Feng et al.
FunQA: Towards Surprising Video Comprehension
Binzhu Xie, Sicheng Zhang, Zitang Zhou et al.
GPSFormer: A Global Perception and Local Structure Fitting-based Transformer for Point Cloud Understanding
Changshuo Wang, Meiqing Wu, Siew-Kei Lam et al.
SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving
Qingwen Zhang, Yi Yang, Peizheng Li et al.
Prompting Language-Informed Distribution for Compositional Zero-Shot Learning
Wentao Bao, Lichang Chen, Heng Huang et al.
HiFi-123: Towards High-fidelity One Image to 3D Content Generation
Wangbo Yu, Li Yuan, Yanpei Cao et al.
LaWa: Using Latent Space for In-Generation Image Watermarking
Ahmad Rezaei, Mohammad Akbari, Saeed Ranjbar Alvar et al.
Towards Multimodal Sentiment Analysis Debiasing via Bias Purification
Dingkang Yang, Mingcheng Li, Dongling Xiao et al.
SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher
Trung Dao, Thuan Nguyen, Thanh Van Le et al.
UGG: Unified Generative Grasping
Jiaxin Lu, Hao Kang, Haoxiang Li et al.
Asynchronous Large Language Model Enhanced Planner for Autonomous Driving
Yuan Chen, Zi-han Ding, Ziqin Wang et al.
SemGrasp: Semantic Grasp Generation via Language Aligned Discretization
Kailin Li, Jingbo Wang, Lixin Yang et al.
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory
Sensen Gao, Xiaojun Jia, Xuhong Ren et al.
Generalizable Human Gaussians for Sparse View Synthesis
Youngjoong Kwon, Baole Fang, Yixing Lu et al.
Adversarial Prompt Tuning for Vision-Language Models
Jiaming Zhang, Xingjun Ma, Xin Wang et al.
Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection
Feng Liu, Tengteng Huang, Qianjing Zhang et al.
NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation
Jingyang Huo, Yikai Wang, Yanwei Fu et al.
MoAI: Mixture of All Intelligence for Large Language and Vision Models
Byung-Kwan Lee, Beomchan Park, Chae Won Kim et al.
LingoQA: Video Question Answering for Autonomous Driving
Ana-Maria Marcu, Long Chen, Jan Hünermann et al.
Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm
Yi Wu, Ziqiang Li, Heliang Zheng et al.
AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors
Kaishen Yuan, Zitong Yu, Xin Liu et al.
Disentangled Clothed Avatar Generation from Text Descriptions
Jionghao Wang, Yuan Liu, Zhiyang Dou et al.
Audio-Synchronized Visual Animation
Lin Zhang, Shentong Mo, Yijing Zhang et al.
XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution
Yunpeng Qu, Kun Yuan, Kai Zhao et al.
Improving Point-based Crowd Counting and Localization Based on Auxiliary Point Guidance
I-HSIANG CHEN, Wei-Ting Chen, Yu-Wei Liu et al.
Detecting As Labeling: Rethinking LiDAR-camera Fusion in 3D Object Detection
Junjie Huang, Yun Ye, Zhujin Liang et al.
WeConvene: Learned Image Compression with Wavelet-Domain Convolution and Entropy Model
Haisheng Fu, Jie Liang, Zhenman Fang et al.
Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion
Bohan Li, Jiajun Deng, Wenyao Zhang et al.
GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation
Chenxin Li, Xinyu Liu, Cheng Wang et al.
Label-anticipated Event Disentanglement for Audio-Visual Video Parsing
Jinxing Zhou, Dan Guo, Yuxin Mao et al.
Skeleton Recall Loss for Connectivity Conserving and Resource Efficient Segmentation of Thin Tubular Structures
Yannick Kirchhoff, Maximilian Rokuss, Saikat Roy et al.
Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects
Zicong Fan, Takehiko Ohkawa, Linlin Yang et al.
AnimatableDreamer: Text-Guided Non-rigid 3D Model Generation and Reconstruction with Canonical Score Distillation
Xinzhou Wang, Yikai Wang, junliang ye et al.
Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal
YUXIN WANG, Qianyi Wu, Guofeng Zhang et al.
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale
Nina Shvetsova, Anna Kukleva, Xudong Hong et al.
RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models
Bowen Zhang, Yiji Cheng, Chunyu Wang et al.
R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
Ye Liu, Jixuan He, Wanhua Li et al.
Exact Diffusion Inversion via Bidirectional Integration Approximation
Guoqiang Zhang, j.p. lewis, W. Bastiaan Kleijn
Lossy Image Compression with Foundation Diffusion Models
Lucas Relic, Roberto Azevedo, Markus Gross et al.
RegionDrag: Fast Region-Based Image Editing with Diffusion Models
Jingyi Lu, Xinghui Li, Kai Han
SAM-guided Graph Cut for 3D Instance Segmentation
Haoyu Guo, He Zhu, Sida Peng et al.
Contrasting Deepfakes Diffusion via Contrastive Learning and Global-Local Similarities
Lorenzo Baraldi, Federico Cocchi, Marcella Cornia et al.
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
Siming Yan, Min Bai, Weifeng Chen et al.
Beyond Prompt Learning: Continual Adapter for Efficient Rehearsal-Free Continual Learning
XINYUAN GAO, Songlin Dong, Yuhang He et al.
Cross-view image geo-localization with Panorama-BEV Co-Retrieval Network
ye junyan, Zhutao Lv, Li Weijia et al.
Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention
Zuyao Chen, Jinlin Wu, Zhen Lei et al.