Most Cited ICCV "ucb-driven elimination" Papers
Straighten Viscous Rectified Flow via Noise Optimization
Jimin Dai, Jiexi Yan, Jian Yang et al.
CRAM: Large Scale Video Continual Learning with Bootstrapped Compression
Shivani Mall, Joao F. Henriques
VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation
Shoubin Yu, Difan Liu, Ziqiao Ma et al.
CycleVAR: Repurposing Autoregressive Model for Unsupervised One-Step Image Translation
Yi Liu, Shengqian Li, Zuzeng Lin et al.
Edicho: Consistent Image Editing in the Wild
Qingyan Bai, Hao Ouyang, Yinghao Xu et al.
Translation of Text Embedding via Delta Vector to Suppress Strongly Entangled Content in Text-to-Image Diffusion Models
Eunseo Koh, SeungHoo Hong, Tae-Young Kim et al.
IQA-Adapter: Exploring Knowledge Transfer from Image Quality Assessment to Diffusion-based Generative Models
Khaled Abud, Sergey Lavrushkin, Alexey Kirillov et al.
Dual Recursive Feedback on Generation and Appearance Latents for Pose-Robust Text-to-Image Diffusion
Jiwon Kim, Pureum Kim, SeonHwa Kim et al.
Anti-Tamper Protection for Unauthorized Individual Image Generation
Zelin Li, Ruohan Zong, Yifan Liu et al.
Continual Personalization for Diffusion Models
Yu-Chien Liao, Jr-Jen Chen, Chi-Pin Huang et al.
Global and Local Entailment Learning for Natural World Imagery
Srikumar Sastry, Aayush Dhakal, Eric Xing et al.
TRKT: Weakly Supervised Dynamic Scene Graph Generation with Temporal-enhanced Relation-aware Knowledge Transferring
Zhu Xu, Ting Lei, Zhimin Li et al.
Magic Insert: Style-Aware Drag-and-Drop
Nataniel Ruiz, Yuanzhen Li, Neal Wadhwa et al.
CharaConsist: Fine-Grained Consistent Character Generation
Mengyu Wang, Henghui Ding, Jianing Peng et al.
Beyond Perspective: Neural 360-Degree Video Compression
Andy Regensky, Marc Windsheimer, Fabian Brand et al.
One-Step Specular Highlight Removal with Adapted Diffusion Models
Mahir Atmis, LEVENT KARACAN, Mehmet SARIGÜL
DiGA3D: Coarse-to-Fine Diffusional Propagation of Geometry and Appearance for Versatile 3D Inpainting
Jingyi Pan, Dan Xu, Qiong Luo
From Linearity to Non-Linearity: How Masked Autoencoders Capture Spatial Correlations
Anthony Bisulco, Rahul Ramesh, Randall Balestriero et al.
FiVE-Bench: A Fine-grained Video Editing Benchmark for Evaluating Emerging Diffusion and Rectified Flow Models
Minghan LI, Chenxi Xie, Yichen Wu et al.
Co-Painter: Fine-Grained Controllable Image Stylization via Implicit Decoupling and Adaptive Injection
Bowen Fu, Wei Wei, Jiaqi Tang et al.
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
Haonan Qiu, Shiwei Zhang, Yujie Wei et al.
SafetyDPO: Scalable Safety Alignment for Text-to-Image Generation
Runtao Liu, I Chen, Jindong Gu et al.
FairHuman: Boosting Hand and Face Quality in Human Image Generation with Minimum Potential Delay Fairness in Diffusion Models
Yuxuan Wang, Tianwei Cao, Huayu Zhang et al.
Calibrating MLLM-as-a-judge via Multimodal Bayesian Prompt Ensembles
Eric Slyman, Mehrab Tanjim, Kushal Kafle et al.
LOTA: Bit-Planes Guided AI-Generated Image Detection
Renxi Cheng, Hongsong Wang, Yang Zhang et al.
Streamlining Image Editing with Layered Diffusion Brushes
Peyman Gholami, Robert Xiao
ArtEditor: Learning Customized Instructional Image Editor from Few-Shot Examples
Shijie Huang, Yiren Song, Yuxuan Zhang et al.
Disrupting Model Merging: A Parameter-Level Defense Without Sacrificing Accuracy
JUNHAO WEI, YU ZHE, Jun Sakuma
A3GS: Arbitrary Artistic Style into Arbitrary 3D Gaussian Splatting
Zhiyuan Fang, Rengan Xie, Xuancheng Jin et al.
HouseTour: A Virtual Real Estate A(I)gent
Ata Çelen, Iro Armeni, Daniel Barath et al.
Free2Guide: Training-Free Text-to-Video Alignment using Image LVLM
Jaemin Kim, Bryan Sangwoo Kim, Jong Ye
VideoVAE+: Large Motion Video Autoencoding with Cross-modal Video VAE
Yazhou Xing, Yang Fei, Yingqing He et al.
DIA: The Adversarial Exposure of Deterministic Inversion in Diffusion Models
SeungHoo Hong, GeonHo Son, Juhun Lee et al.
GFPack++: Attention-Driven Gradient Fields for Optimizing 2D Irregular Packing
Tianyang Xue, Lin Lu, Yang Liu et al.
Preserve Anything: Controllable Image Synthesis with Object Preservation
Prasen Kumar Sharma, Neeraj Matiyali, Siddharth Srivastava et al.
Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing
Taihang Hu, Linxuan Li, Kai Wang et al.
EEGMirror: Leveraging EEG data in the wild via Montage-Agnostic Self-Supervision for EEG to Video Decoding
Xuan-Hao Liu, Bao-liang Lu, Wei-Long Zheng
Accelerating Diffusion Sampling via Exploiting Local Transition Coherence
shangwen zhu, Han Zhang, Zhantao Yang et al.
SA-LUT: Spatial Adaptive 4D Look-Up Table for Photorealistic Style Transfer
Zerui Gong, Zhonghua Wu, Qingyi Tao et al.
UniGlyph: Unified Segmentation-Conditioned Diffusion for Precise Visual Text Synthesis
Yuanrui Wang, Cong Han, Yafei Li et al.
Semantic Discrepancy-aware Detector for Image Forgery Identification
Wang Ziye, Minghang Yu, Chunyan Xu et al.
REGEN: Learning Compact Video Embedding with (Re-)Generative Decoder
Yitian Zhang, Long Mai, Aniruddha Mahapatra et al.
Gain-MLP: Improving HDR Gain Map Encoding via a Lightweight MLP
Trevor Canham, SaiKiran Tedla, Michael Murdoch et al.
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation
shaojin wu, Mengqi Huang, wenxu wu et al.
FlexGen: Flexible Multi-View Generation from Text and Image Inputs
Xinli Xu, Wenhang Ge, Jiantao Lin et al.
Teleportraits: Training-Free People Insertion into Any Scene
Jialu Gao, Joseph K J, Fernando De la Torre
DiT4SR: Taming Diffusion Transformer for Real-World Image Super-Resolution
Zheng-Peng Duan, jiawei zhang, Xin Jin et al.
QK-Edit: Revisiting Attention-based Injection in MM-DiT for Image and Video Editing
Tiancheng SHEN, Jun Hao Liew, Zilong Huang et al.
Pretrained Reversible Generation as Unsupervised Visual Representation Learning
Rongkun Xue, Jinouwen Zhang, Yazhe Niu et al.
Beyond Brain Decoding: Visual-Semantic Reconstructions to Mental Creation Extension Based on fMRI
Haodong Jing, Dongyao Jiang, Yongqiang Ma et al.
ADCD-Net: Robust Document Image Forgery Localization via Adaptive DCT Feature and Hierarchical Content Disentanglement
KA WONG, Jicheng Zhou, Haiwei Wu et al.
Generative Video Bi-flow
Chen Liu, Tobias Ritschel
AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation
Moayed Haji-Ali, Willi Menapace, Aliaksandr Siarohin et al.
JPEG Processing Neural Operator for Backward-Compatible Coding
Woo Kyoung Han, Yongjun Lee, Byeonghun Lee et al.
All Parts Matter: A Unified Mask-Free Virtual Try-On Framework
Chenghu Du, Shengwu Xiong, Yi Rong
DC-AE 1.5: Accelerating Diffusion Model Convergence with Structured Latent Space
Junyu Chen, Dongyun Zou, Wenkun He et al.
MH-LVC: Multi-Hypothesis Temporal Prediction for Learned Conditional Residual Video Coding
Gao Zong lin, Huu-Tai Phung, Yi-Chen Yao et al.
An Efficient Hybrid Vision Transformer for TinyML Applications
Fanhong Zeng, Huanan LI, Juntao Guan et al.
Graph Domain Adaptation with Dual-branch Encoder and Two-level Alignment for Whole Slide Image-based Survival Prediction
Yuntao Shou, Xiangyong Cao, PeiqiangYan PeiqiangYan et al.
Multi-Schema Proximity Network for Composed Image Retrieval
Jiangming Shi, Xiangbo Yin, yeyunchen yeyunchen et al.
Moment Quantization for Video Temporal Grounding
Xiaolong Sun, Le Wang, Sanping Zhou et al.
SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
Yongkun Du, Zhineng Chen, Hongtao Xie et al.
ROVI: A VLM-LLM Re-Captioned Dataset for Open-Vocabulary Instance-Grounded Text-to-Image Generation
Cihang Peng, Qiming HOU, Zhong Ren et al.
Feature Purification Matters: Suppressing Outlier Propagation for Training-Free Open-Vocabulary Semantic Segmentation
Shuo Jin, Siyue Yu, Bingfeng Zhang et al.
DiffPS: Leveraging Prior Knowledge of Diffusion Model for Person Search
Giyeol Kim, Sooyoung Yang, Jihyong Oh et al.
LaCoOT: Layer Collapse through Optimal Transport
Victor Quétu, Zhu LIAO, Nour Hezbri et al.
Semantic versus Identity: A Divide-and-Conquer Approach towards Adjustable Medical Image De-Identification
Yuan Tian, Shuo Wang, Rongzhao Zhang et al.
Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding
Yuanhan Zhang, Yunice Chew, Yuhao Dong et al.
Cross-View Isolated Sign Language Recognition via View Synthesis and Feature Disentanglement
Xin Shen, Xinyu Wang, Lei Shen et al.
Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction
Zeren Jiang, Chuanxia Zheng, Iro Laina et al.
On the Recovery of Cameras from Fundamental Matrices
Rakshith Madhavan, Federica Arrigoni
The Devil is in the Spurious Correlations: Boosting Moment Retrieval with Dynamic Learning
Xinyang Zhou, Fanyue Wei, Lixin Duan et al.
SegAnyPET: Universal Promptable Segmentation from Positron Emission Tomography Images
Yichi Zhang, Le Xue, Wenbo zhang et al.
Multi-View Slot Attention Using Paraphrased Texts for Face Anti-Spoofing
Jeongmin Yu, Susang Kim, Kisu Lee et al.
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding
Wenxuan Zhu, Bing Li, Cheng Zheng et al.
SAMora: Enhancing SAM through Hierarchical Self-Supervised Pre-Training for Medical Images
Shuhang Chen, Hangjie Yuan, Pengwei Liu et al.
Text-guided Visual Prompt DINO for Generic Segmentation
Yuchen Guan, Chong Sun, Canmiao Fu et al.
STDDNet: Harnessing Mamba for Video Polyp Segmentation via Spatial-aligned Temporal Modeling and Discriminative Dynamic Representation Learning
Guilian Chen, Huisi Wu, Jing Qin
MedSegFactory: Text-Guided Generation of Medical Image-Mask Pairs
Jiawei Mao, Yuhan Wang, Yucheng Tang et al.
Few-Shot Pattern Detection via Template Matching and Regression
Eunchan Jo, Dahyun Kang, Sanghyun Kim et al.
Exploring Probabilistic Modeling Beyond Domain Generalization for Semantic Segmentation
I-Hsiang Chen, Hua-En Chang, Wei-Ting Chen et al.
Bridging the Gap between Brain and Machine in Interpreting Visual Semantics: Towards Self-adaptive Brain-to-Text Decoding
Jiaxuan Chen, Yu Qi, Yueming Wang et al.
DisTime: Distribution-based Time Representation for Video Large Language Models
yingsen zeng, Zepeng Huang, Yujie Zhong et al.
WeaveSeg: Iterative Contrast-weaving and Spectral Feature-refining for Nuclei Instance Segmentation
Jiajia Li, Huisi Wu, Jing Qin
CARIM: Caption-Based Autonomous Driving Scene Retrieval via Inclusive Text Matching
Minjoo Ki, Dae Jung Kim, Kisung Kim et al.
Modeling Saliency Dataset Bias
Matthias Kümmerer, Harneet Singh Khanuja, Matthias Bethge
Advancing Visual Large Language Model for Multi-granular Versatile Perception
Wentao Xiang, Haoxian Tan, Cong Wei et al.
Prompt-driven Transferable Adversarial Attack on Person Re-Identification with Attribute-aware Textual Inversion
Yuan Bian, Min Liu, Yunqi Yi et al.
DIH-CLIP: Unleashing the Diversity of Multi-Head Self-Attention for Training-Free Open-Vocabulary Semantic Segmentation
Songsong Duan, Xi Yang, Nannan Wang
Feather the Throttle: Revisiting Visual Token Pruning for Vision-Language Model Acceleration
Mark Endo, Xiaohan Wang, Serena Yeung-Levy
LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models
Yuzhang Shang, Mu Cai, Bingxin Xu et al.
HRScene: How Far Are VLMs from Effective High-Resolution Image Understanding?
Yusen Zhang, Wenliang Zheng, Aashrith Madasu et al.
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring
Yufei Zhan, Shurong Zheng, Yousong Zhu et al.
ReferEverything: Towards Segmenting Everything We Can Speak of in Videos
Anurag Bagchi, Zhipeng Bao, Yu-Xiong Wang et al.
From Trial to Triumph: Advancing Long Video Understanding via Visual Context Sample Scaling and Self-reward Alignment
Yucheng Suo, Fan Ma, Linchao Zhu et al.
VIPerson: Flexibly Generating Virtual Identity for Person Re-Identification
Xiao-Wen Zhang, Delong Zhang, Yi-Xing Peng et al.
HarmonySeg: Tubular Structure Segmentation with Deep-Shallow Feature Fusion and Growth-Suppression Balanced Loss
Ke Zhang, Yi Huang, Wei Liu et al.
SSVQ: Unleashing the Potential of Vector Quantization with Sign-Splitting
Shuaiting Li, Juncan Deng, Chengxuan Wang et al.
VideoMiner: Iteratively Grounding Key Frames of Hour-Long Videos via Tree-based Group Relative Policy Optimization
Xinye Cao, Hongcan Guo, Jiawen Qian et al.
Scaling Tumor Segmentation: Best Lessons from Real and Synthetic Data
Qi Chen, Xinze Zhou, Chen Liu et al.
Region-aware Anchoring Mechanism for Efficient Referring Visual Grounding
Shuyi Ouyang, Ziwei Niu, Hongyi Wang et al.
MaskSAM: Auto-prompt SAM with Mask Classification for Volumetric Medical Image Segmentation
Bin Xie, Hao Tang, Bin Duan et al.
MEH: A Multi-Style Dataset and Toolkit for Advancing Egyptian Hieroglyph Recognition
Maksim Golyadkin, Rubanova Alexandrovna, Aleksandr Utkov et al.
B-VLLM: A Vision Large Language Model with Balanced Spatio-Temporal Tokens
Zhuqiang Lu, Zhenfei Yin, Mengwei He et al.
DiffTell: A High-Quality Dataset for Describing Image Manipulation Changes
Zonglin Di, Jing Shi, Yifei Fan et al.
HyperGCT: A Dynamic Hyper-GNN-Learned Geometric Constraint for 3D Registration
Xiyu Zhang, Jiayi Ma, Jianwei Guo et al.
AD-GS: Object-Aware B-Spline Gaussian Splatting for Self-Supervised Autonomous Driving
Jiawei Xu, Kai Deng, Zexin Fan et al.
PossLoss: A Reliable and Sensitive Facial Landmark Detection Loss Function
Qikui Zhu
RESCUE: Crowd Evacuation Simulation via Controlling SDM-United Characters
Xiaolin Liu, Tianyi zhou, Hongbo Kang et al.
SG-LDM: Semantic-Guided LiDAR Generation via Latent-Aligned Diffusion
Zhengkang Xiang, Zizhao Li, Amir Khodabandeh et al.
PointGAC: Geometric-Aware Codebook for Masked Point Modeling
Abiao Li, Chenlei Lv, Guofeng Mei et al.
PRM: Photometric Stereo based Large Reconstruction Model
Wenhang Ge, Jiantao Lin, Guibao SHEN et al.
RobuSTereo: Robust Zero-Shot Stereo Matching under Adverse Weather
Yuran Wang, Yingping Liang, Yutao Hu et al.
Gaussian-based World Model: Gaussian Priors for Voxel-Based Occupancy Prediction and Future Motion Prediction
Tuo Feng, Wenguan Wang, Yi Yang
Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction
JIXUAN FAN, Wanhua Li, Yifei Han et al.
REPARO: Compositional 3D Assets Generation with Differentiable 3D Layout Alignment
Haonan Han, Rui Yang, Huan Liao et al.
Towards Safer and Understandable Driver Intention Prediction
Mukilan Karuppasamy, Shankar Gangisetty, Shyam Nandan Rai et al.
High-Precision 3D Measurement of Complex Textured Surfaces Using Multiple Filtering Approach
Yuchong Chen, Jian Yu, Shaoyan Gai et al.
Resonance: Learning to Predict Social-Aware Pedestrian Trajectories as Co-Vibrations
Conghao Wong, Ziqian Zou, Beihao Xia
InsideOut: Integrated RGB-Radiative Gaussian Splatting for Comprehensive 3D Object Representation
Jungmin Lee, Seonghyuk Hong, Juyong Lee et al.
CoLMDriver: LLM-based Negotiation Benefits Cooperative Autonomous Driving
Changxing Liu, Genjia Liu, Zijun Wang et al.
Mitigating Geometric Degradation in Fast DownSampling via FastAdapter for Point Cloud Segmentation
Shuofeng Sun, Haibin Yan
DoppDrive: Doppler-Driven Temporal Aggregation for Improved Radar Object Detection
Yuval Haitman, Oded Bialer
MDP-Omni: Parameter-free Multimodal Depth Prior-based Sampling for Omnidirectional Stereo Matching
Eunjin Son, HyungGi Jo, Wookyong Kwon et al.
EDM: Efficient Deep Feature Matching
Xi Li, Tong Rao, Cihui Pan
Occupancy Learning with Spatiotemporal Memory
Ziyang Leng, Jiawei Yang, Wenlong Yi et al.
ACE-G: Improving Generalization of Scene Coordinate Regression Through Query Pre-Training
Leonard Bruns, Axel Barroso-Laguna, Tommaso Cavallari et al.
Towards Visual Localization Interoperability: Cross-Feature for Collaborative Visual Localization and Mapping
Alberto Jaenal, Paula Carbó Cubero, Jose Araujo et al.
Explaining Human Preferences via Metrics for Structured 3D Reconstruction
Jack Langerman, Denis Rozumny, Yuzhong Huang et al.
Inverse 3D Microscopy Rendering for Cell Shape Inference with Active Mesh
Sacha Ichbiah, Anshuman Sinha, Fabrice Delbary et al.
Bridging 3D Anomaly Localization and Repair via High-Quality Continuous Geometric Representation
Bozhong Zheng, Jinye Gan, Xiaohao Xu et al.
SGAD: Semantic and Geometric-aware Descriptor for Local Feature Matching
Xiangzeng Liu, CHI WANG, Guanglu Shi et al.
Generative Gaussian Splatting: Generating 3D Scenes with Video Diffusion Priors
Katja Schwarz, Norman Müller, Peter Kontschieder
Curve-Aware Gaussian Splatting for 3D Parametric Curve Reconstruction
Zhirui Gao, Renjiao Yi, YaQiao Dai et al.
Tree Skeletonization from 3D Point Clouds by Denoising Diffusion
Elias Marks, Lucas Nunes, Federico Magistri et al.
Neural Inverse Rendering for High-Accuracy 3D Measurement of Moving Objects with Fewer Phase-Shifting Patterns
Yuki Urakawa, Yoshihiro Watanabe
When Anchors Meet Cold Diffusion: A Multi-Stage Approach to Lane Detection
Bo-Lun Huang, Tzu-Hsiang Ni, Feng-Kai Huang et al.
Sat2City: 3D City Generation from A Single Satellite Image with Cascaded Latent Diffusion
Tongyan Hua, Lutao Jiang, Ying-Cong Chen et al.
NeuFrameQ: Neural Frame Fields for Scalable and Generalizable Anisotropic Quadrangulation
Ying-Tian Liu, Jiajun Li, Yu-Tao Liu et al.
Controllable 3D Outdoor Scene Generation via Scene Graphs
Yuheng Liu, Xinke Li, Yuning Zhang et al.
PolGS: Polarimetric Gaussian Splatting for Fast Reflective Surface Reconstruction
Yufei Han, Bowen Tie, Heng Guo et al.
Driving View Synthesis on Free-form Trajectories with Generative Prior
Zeyu Yang, Zijie Pan, Yuankun Yang et al.
CVFusion: Cross-View Fusion of 4D Radar and Camera for 3D Object Detection
Hanzhi Zhong, Zhiyu Xiang, Ruoyu Xu et al.
MAESTRO: Task-Relevant Optimization via Adaptive Feature Enhancement and Suppression for Multi-task 3D Perception
ChangWon Kang, Jisong Kim, Hongjae Shin et al.
Joint Semantic and Rendering Enhancements in 3D Gaussian Modeling with Anisotropic Local Encoding
Jingming He, Chongyi Li, Shiqi Wang et al.
DCHM: Depth-Consistent Human Modeling for Multiview Detection
Jiahao Ma, Tianyu Wang, Miaomiao Liu et al.
V2XScenes: A Multiple Challenging Traffic Conditions Dataset for Large-Range Vehicle-Infrastructure Collaborative Perception
Bowen Wang, Yafei Wang, Wei Gong et al.
Leveraging BEV Paradigm for Ground-to-Aerial Image Synthesis
Junyan Ye, Jun He, Weijia Li et al.
EMD: Explicit Motion Modeling for High-Quality Street Gaussian Splatting
Xiaobao Wei, Qingpo Wuwu, Zhongyu Zhao et al.
Interaction-Merged Motion Planning: Effectively Leveraging Diverse Motion Datasets for Robust Planning
Giwon Lee, Wooseong Jeong, Daehee Park et al.
Communication-Efficient Multi-Vehicle Collaborative Semantic Segmentation via Sparse 3D Gaussian Sharing
Tianyu Hong, Xiaobo Zhou, Wenkai Hu et al.
DATA: Domain-And-Time Alignment for High-Quality Feature Fusion in Collaborative Perception
Chengchang Tian, Jianwei Ma, Yan Huang et al.
Heatmap Regression without Soft-Argmax for Facial Landmark Detection
Chiao-An Yang, Raymond A. Yeh
Mixed Signals: A Diverse Point Cloud Dataset for Heterogeneous LiDAR V2X Collaboration
Katie Luo, Minh-Quan Dao, Zhenzhen Liu et al.
Puzzle Similarity: A Perceptually-guided Cross-Reference Metric for Artifact Detection in 3D Scene Reconstructions
Nicolai Hermann, Jorge Condor, Piotr Didyk
Authentic 4D Driving Simulation with a Video Generation Model
Lening Wang, Wenzhao Zheng, Dalong Du et al.
Spherical Epipolar Rectification for Deep Two-View Absolute Depth Estimation
Pierre-André Brousseau, Sébastien Roy
Leveraging 2D Priors and SDF Guidance for Urban Scene Rendering
Siddharth Tourani, Jayaram Reddy, Akash Kumbar et al.
Super Resolved Imaging with Adaptive Optics
Robin Swanson, Esther Y. H. Lin, Masen Lamb et al.
Knowledge Distillation for Learned Image Compression
Yunuo Chen, Zezheng Lyu, Bing He et al.
ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
Jianhong Bai, Menghan Xia, Xiao Fu et al.
Diving into the Fusion of Monocular Priors for Generalized Stereo Matching
Chengtang Yao, Lidong Yu, Zhidan Liu et al.
ROAR: Reducing Inversion Error in Generative Image Watermarking
Hanyi Wang, Han Fang, Shi-Lin Wang et al.
Automated Model Evaluation for Object Detection via Prediction Consistency and Reliability
Seungju Yoo, Hyuk Kwon, Joong-Won Hwang et al.
LOTS of Fashion! Multi-Conditioning for Image Generation via Sketch-Text Pairing
Federico Girella, Davide Talon, Ziyue Liu et al.
FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models
Vladimir Kulikov, Matan Kleiner, Inbar Huberman-Spiegelglas et al.
Event-based Visual Vibrometry
Xinyu Zhou, Peiqi Duan, Yeliduosi Xiaokaiti et al.
ObjectRelator: Enabling Cross-View Object Relation Understanding Across Ego-Centric and Exo-Centric Perspectives
Yuqian Fu, Runze Wang, Bin Ren et al.
Scaling Transformer-Based Novel View Synthesis Models with Token Disentanglement and Synthetic Data
Nithin Gopalakrishnan Nair, Srinivas Kaza, Xuan Luo et al.
DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving
Chen Shi, Shaoshuai Shi, Kehua Sheng et al.
MamV2XCalib: V2X-based Target-less Infrastructure Camera Calibration with State Space Model
Yaoye Zhu, Zhe Wang, Yan Wang
PARTE: Part-Guided Texturing for 3D Human Reconstruction from a Single Image
Hyeongjin Nam, Donghwan Kim, Gyeongsik Moon et al.
Boosting MLLM Reasoning with Text-Debiased Hint-GRPO
Qihan Huang, Weilong Dai, Jinlong Liu et al.
AirCache: Activating Inter-modal Relevancy KV Cache Compression for Efficient Large Vision-Language Model Inference
Kai Huang, hao zou, Bochen Wang et al.
FlowStyler: Artistic Video Stylization via Transformation Fields Transports
YuNing Gong, Jiaming Chen, Xiaohua Ren et al.
ShadowHack: Hacking Shadows via Luminance-Color Divide and Conquer
Jin Hu, Mingjia Li, Xiaojie Guo
Toward Fair and Accurate Cross-Domain Medical Image Segmentation: A VLM-Driven Active Domain Adaptation Paradigm
Hongqiu Wang, Wu Chen, Xiangde Luo et al.
Decouple to Reconstruct: High Quality UHD Restoration via Active Feature Disentanglement and Reversible Fusion
Yidi Liu, Dong Li, Yuxin Ma et al.
BlueNeg: A 35mm Negative Film Dataset for Restoring Channel-Heterogeneous Deterioration
Hanyuan Liu, Chengze Li, Minshan Xie et al.
Rethinking Key-frame-based Micro-expression Recognition: A Robust and Accurate Framework Against Key-frame Errors
Zheyuan Zhang, Weihao Tang, Hong Chen
What we need is explicit controllability: Training 3D gaze estimator using only facial images
Tingwei Li, Jun Bao, Zhenzhong Kuang et al.
SemiVisBooster: Boosting Semi-Supervised Learning for Fine-Grained Classification through Pseudo-Label Semantic Guidance
Wenjin Zhang, Xinyu Li, Chenyang Gao et al.
Enhancing Prompt Generation with Adaptive Refinement for Camouflaged Object Detection
Xuehan Chen, Guangyu Ren, Tianhong Dai et al.
DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness
Ruining Li, Chuanxia Zheng, Christian Rupprecht et al.
EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds
Lu Chen, Yizhou Wang, SHIXIANG TANG et al.
SIC: Similarity-Based Interpretable Image Classification with Neural Networks
Tom Nuno Wolf, Emre Kavak, Fabian Bongratz et al.
MambaML: Exploring State Space Models for Multi-Label Image Classification
Xuelin Zhu, Jian liu, Jiuxin Cao et al.
SEAL: Semantic Aware Image Watermarking
Kasra Arabi, R. Teal Witter, Chinmay Hegde et al.
Unsupervised Identification of Protein Compositions and Conformations via Implicit Content-Transformation Disentanglement
Mostofa Rafid Uddin, Jana Armouti, Min Xu
AR-1-to-3: Single Image to Consistent 3D Object via Next-View Prediction
Xuying Zhang, Yupeng Zhou, Kai Wang et al.
Memory-Efficient Generative Models via Product Quantization
Jie Shao, Hanxiao Zhang, Hao Yu et al.
Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis
Peng Zheng, Junke Wang, Yi Chang et al.
CogCM: Cognition-Inspired Contextual Modeling for Audio-Visual Speech Enhancement
Feixiang Wang, Shuang Yang, Shiguang Shan et al.
EDFFDNet: Towards Accurate and Efficient Unsupervised Multi-Grid Image Registration
Haokai Zhu, Bo Qu, Si-Yuan Cao et al.
Leveraging Debiased Cross-modal Attention Maps and Code-based Reasoning for Zero-shot Referring Expression Comprehension
Juntao Chen, Wen Shen, Zhihua Wei et al.
UST-SSM: Unified Spatio-Temporal State Space Models for Point Cloud Video Modeling
Peiming Li, Ziyi Wang, Yulin Yuan et al.
Automated Red Teaming for Text-to-Image Models through Feedback-Guided Prompt Iteration with Vision-Language Models
Wei Xu, Kangjie Chen, Jiawei Qiu et al.
Enhancing Spatial Reasoning in Multimodal Large Language Models through Reasoning-based Segmentation
Zhenhua Ning, Zhuotao Tian, Shaoshuai Shi et al.
BézierGS: Dynamic Urban Scene Reconstruction with Bézier Curve Gaussian Splatting
Zipei Ma, Junzhe Jiang, Yurui Chen et al.
CLIPSym: Delving into Symmetry Detection with CLIP
Tinghan Yang, Md Ashiqur Rahman, Raymond A. Yeh