Most Cited CVPR "sensitive optimality" Papers
SkySense-O: Towards Open-World Remote Sensing Interpretation with Vision-Centric Visual-Language Modeling
Qi Zhu, Jiangwei Lao, Deyi Ji et al.
IndoorGS: Geometric Cues Guided Gaussian Splatting for Indoor Scene Reconstruction
Cong Ruan, Yuesong Wang, Bin Zhang et al.
Pose Priors from Language Models
Sanjay Subramanian, Evonne Ng, Lea Müller et al.
Generative Densification: Learning to Densify Gaussians for High-Fidelity Generalizable 3D Reconstruction
Seungtae Nam, Xiangyu Sun, Gyeongjin Kang et al.
Towards Optimizing Large-Scale Multi-Graph Matching in Bioimaging
Max Kahl, Sebastian Stricker, Lisa Hutschenreiter et al.
Test-Time Backdoor Detection for Object Detection Models
Hangtao Zhang, Yichen Wang, Shihui Yan et al.
Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis
Bichen Wu, Ching-Yao Chuang, Xiaoyan Wang et al.
Classifier-Free Guidance Inside the Attraction Basin May Cause Memorization
Anubhav Jain, Yuya Kobayashi, Takashi Shibuya et al.
Image Quality Assessment: From Human to Machine Preference
Chunyi Li, Yuan Tian, Xiaoyue Ling et al.
MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision
Ruicheng Wang, Sicheng Xu, Cassie Lee Dai et al.
Knowledge Bridger: Towards Training-Free Missing Modality Completion
Guanzhou Ke, Shengfeng He, Xiao-Li Wang et al.
Reconciling Stochastic and Deterministic Strategies for Zero-shot Image Restoration using Diffusion Model in Dual
Chong Wang, Lanqing Guo, Zixuan Fu et al.
Harnessing Frequency Spectrum Insights for Image Copyright Protection Against Diffusion Models
Zhenguang Liu, Chao Shuai, Shaojing Fan et al.
Gain from Neighbors: Boosting Model Robustness in the Wild via Adversarial Perturbations Toward Neighboring Classes
Zhou Yang, Mingtao Feng, Tao Huang et al.
M^3-VOS: Multi-Phase, Multi-Transition, and Multi-Scenery Video Object Segmentation
Zixuan Chen, Jiaxin Li, Junxuan Liang et al.
Enhancing Creative Generation on Stable Diffusion-based Models
Jiyeon Han, Dahee Kwon, Gayoung Lee et al.
EquiPose: Exploiting Permutation Equivariance for Relative Camera Pose Estimation
Yuzhen Liu, Qiulei Dong
Visual Consensus Prompting for Co-Salient Object Detection
Jie Wang, Nana Yu, Zihao Zhang et al.
BimArt: A Unified Approach for the Synthesis of 3D Bimanual Interaction with Articulated Objects
Wanyue Zhang, Rishabh Dabral, Vladislav Golyanik et al.
DiffusionDrive: Truncated Diffusion Model for End-to-End Autonomous Driving
Bencheng Liao, Shaoyu Chen, Haoran Yin et al.
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method
Xinshuai Song, Weixing Chen, Yang Liu et al.
Classifier-guided CLIP Distillation for Unsupervised Multi-label Classification
Dongseob Kim, Hyunjung Shim
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning
Yang Yue, Yulin Wang, Chenxin Tao et al.
Higher-Order Ratio Cycles for Fast and Globally Optimal Shape Matching
Paul Roetzer, Viktoria Ehm, Daniel Cremers et al.
Joint Vision-Language Social Bias Removal for CLIP
Haoyu Zhang, Yangyang Guo, Mohan Kankanhalli
ReDiffDet: Rotation-equivariant Diffusion Model for Oriented Object Detection
Jiaqi Zhao, Zeyu Ding, Yong Zhou et al.
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
Hang Yin, Xiuwei Xu, Linqing Zhao et al.
IDEA: Inverted Text with Cooperative Deformable Aggregation for Multi-modal Object Re-Identification
Yuhao Wang, Yongfeng Lv, Pingping Zhang et al.
FIFA: Fine-grained Inter-frame Attention for Driver's Video Gaze Estimation
Daosong Hu, Mingyue Cui, Kai Huang
RICCARDO: Radar Hit Prediction and Convolution for Camera-Radar 3D Object Detection
Yunfei Long, Abhinav Kumar, Xiaoming Liu et al.
Floating No More: Object-Ground Reconstruction from a Single Image
Yunze Man, Yichen Sheng, Jianming Zhang et al.
SKE-Layout: Spatial Knowledge Enhanced Layout Generation with LLMs
Junsheng Wang, Nieqing Cao, Yan Ding et al.
DiffLO: Semantic-Aware LiDAR Odometry with Diffusion-Based Refinement
Huang Yongshu, Chen Liu, Minghang Zhu et al.
pFedMxF: Personalized Federated Class-Incremental Learning with Mixture of Frequency Aggregation
Yifei Zhang, Hao Zhu, Alysa Ziying Tan et al.
HyperNet Fields: Efficiently Training Hypernetworks without Ground Truth by Learning Weight Trajectories
Eric Hedlin, Munawar Hayat, Fatih Porikli et al.
The Art of Deception: Color Visual Illusions and Diffusion Models
Alexandra Gomez-Villa, Kai Wang, C. Alejandro Parraga et al.
iG-6DoF: Model-free 6DoF Pose Estimation for Unseen Object via Iterative 3D Gaussian Splatting
Tuo Cao, Fei LUO, Jiongming Qin et al.
Continuous Adverse Weather Removal via Degradation-Aware Distillation
Xin Lu, Jie Xiao, Yurui Zhu et al.
Condensing Action Segmentation Datasets via Generative Network Inversion
Guodong Ding, Rongyu Chen, Angela Yao
High-fidelity 3D Object Generation from Single Image with RGBN-Volume Gaussian Reconstruction Model
Yiyang Shen, Kun Zhou, He Wang et al.
ACAttack: Adaptive Cross Attacking RGB-T Tracker via Multi-Modal Response Decoupling
Xinyu Xiang, Qinglong Yan, HAO ZHANG et al.
Subspace Constraint and Contribution Estimation for Heterogeneous Federated Learning
Xiangtao Zhang, Sheng Li, Ao Li et al.
Emphasizing Discriminative Features for Dataset Distillation in Complex Scenarios
Kai Wang, Zekai Li, Zhi-Qi Cheng et al.
Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation
Kunpeng Qiu, Zhiqiang Gao, Zhiying Zhou et al.
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
Bingda Tang, Sayak Paul, Boyang Zheng et al.
POPEN: Preference-Based Optimization and Ensemble for LVLM-Based Reasoning Segmentation
Lanyun Zhu, Tianrun Chen, Qianxiong Xu et al.
SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting
Dongliang Luo, Hanshen Zhu, Ziyang Zhang et al.
Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens
Zhangqi Jiang, Junkai Chen, Beier Zhu et al.
Distilling Long-tailed Datasets
Zhenghao Zhao, Haoxuan Wang, Yuzhang Shang et al.
UNIALIGN: Scaling Multimodal Alignment within One Unified Model
Bo Zhou, Liulei Li, Yujia Wang et al.
Zero-Shot 4D Lidar Panoptic Segmentation
Yushan Zhang, Aljoša Ošep, Laura Leal-Taixe et al.
DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering
Jingzhou Luo, Yang Liu, Weixing Chen et al.
Knowledge Memorization and Rumination for Pre-trained Model-based Class-Incremental Learning
Zijian Gao, Wangwang Jia, Xingxing Zhang et al.
Unveil Inversion and Invariance in Flow Transformer for Versatile Image Editing
Pengcheng Xu, Boyuan Jiang, Xiaobin Hu et al.
SeqMvRL: A Sequential Fusion Framework for Multi-view Representation Learning
Ren Wang, Haoliang Sun, Yuxiu Lin et al.
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
Yicheng Chen, Xiangtai Li, Yining Li et al.
Efficient Motion-Aware Video MLLM
Zijia Zhao, Yuqi Huo, Tongtian Yue et al.
RNb-NeuS: Reflectance and Normal-based Multi-View 3D Reconstruction
Baptiste Brument, Robin Bruneau, Yvain Queau et al.
Hyperspectral Pansharpening via Diffusion Models with Iteratively Zero-Shot Guidance
Jin-Liang Xiao, Ting-Zhu Huang, Liang-Jian Deng et al.
Structure-from-Motion with a Non-Parametric Camera Model
Yihan Wang, Linfei Pan, Marc Pollefeys et al.
EventPSR: Surface Normal and Reflectance Estimation from Photometric Stereo Using an Event Camera
Bohan Yu, Jin Han, Boxin Shi et al.
LAL: Enhancing 3D Human Motion Prediction with Latency-aware Auxiliary Learning
Xiaoning Sun, Dong Wei, Huaijiang Sun et al.
Sea-ing in Low-light
Nisha Varghese, A. N. Rajagopalan
Analyzing the Synthetic-to-Real Domain Gap in 3D Hand Pose Estimation
Zhuoran ZHAO, Linlin Yang, Pengzhan Sun et al.
Accurate Scene Text Recognition with Efficient Model Scaling and Cloze Self-Distillation
Andrea Maracani, Savas Ozkan, Sijun Cho et al.
DiskVPS: Vanishing Point Detector via Hough Transform in a Disk Region
Jianping Wu
Towards Smart Point-and-Shoot Photography
Jiawan Li, Fei Zhou, Zhipeng Zhong et al.
Towards Autonomous Micromobility through Scalable Urban Simulation
Wayne Wu, Honglin He, Chaoyuan Zhang et al.
Image Reconstruction from Readout-Multiplexed Single-Photon Detector Arrays
Shashwath Bharadwaj, Ruangrawee Kitichotkul, Akshay Agarwal et al.
Learning on Model Weights using Tree Experts
Eliahu Horwitz, Bar Cavia, Jonathan Kahana et al.
PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction
Eduard Poesina, Adriana Valentina Costache, Adrian-Gabriel Chifu et al.
CheXwhatsApp: A Dataset for Exploring Challenges in the Diagnosis of Chest X-rays through Mobile Devices
Mariamma Antony, Rajiv Porana, Sahil M. Lathiya et al.
Large-scale Multi-view Tensor Clustering with Implicit Linear Kernels
Jiyuan Liu, Xinwang Liu, Chuankun Li et al.
Learning-enabled Polynomial Lyapunov Function Synthesis via High-Accuracy Counterexample-Guided Framework
Hanrui Zhao, Niuniu Qi, Mengxin Ren et al.
The Impact Label Noise and Choice of Threshold has on Cross-Entropy and Soft-Dice in Image Segmentation
Marcus Nordström, Atsuto Maki, Henrik Hult
EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models
Yinan Liang, Ziwei Wang, Xiuwei Xu et al.
Advancing Adversarial Robustness in GNeRFs: The IL2-NeRF Attack
Nicole Meng, Caleb Manicke, Ronak Sahu et al.
From Prototypes to General Distributions: An Efficient Curriculum for Masked Image Modeling
Jinhong Lin, Cheng-En Wu, Huanran Li et al.
Motions as Queries: One-Stage Multi-Person Holistic Human Motion Capture
Kenkun Liu, Yurong Fu, Weihao Yuan et al.
SINR: Sparsity Driven Compressed Implicit Neural Representations
Dhananjaya Jayasundara, Sudarshan Rajagopalan, Yasiru Ranasinghe et al.
ArticulatedGS: Self-supervised Digital Twin Modeling of Articulated Objects using 3D Gaussian Splatting
Guo Junfu, Yu Xin, Gaoyi Liu et al.
Spiking Transformer: Introducing Accurate Addition-Only Spiking Self-Attention for Transformer
Yufei Guo, Xiaode Liu, Yuanpei Chen et al.
Rethinking Noisy Video-Text Retrieval via Relation-aware Alignment
Huakai Lai, Guoxin Xiong, Huayu Mai et al.
NoiseCtrl: A Sampling-Algorithm-Agnostic Conditional Generation Method for Diffusion Models
Longquan Dai, He Wang, Jinhui Tang
Reconstructing Close Human Interaction with Appearance and Proxemics Reasoning
Buzhen Huang, Chen Li, Chongyang Xu et al.
OSMamba: Omnidirectional Spectral Mamba with Dual-Domain Prior Generator for Exposure Correction
Gehui Li, Bin Chen, Chen Zhao et al.
Erase Diffusion: Empowering Object Removal Through Calibrating Diffusion Pathways
Yi Liu, Hao Zhou, Benlei Cui et al.
M3GYM: A Large-Scale Multimodal Multi-view Multi-person Pose Dataset for Fitness Activity Understanding in Real-world Settings
Qingzheng Xu, Ru Cao, Xin Shen et al.
Star with Bilinear Mapping
Zelin Peng, Yu Huang, Zhengqin Xu et al.
PCM: Picard Consistency Model for Fast Parallel Sampling of Diffusion Models
Junhyuk So, Jiwoong Shin, Chaeyeon Jang et al.
Towards Precise Scaling Laws for Video Diffusion Transformers
Yuanyang Yin, Yaqi Zhao, Mingwu Zheng et al.
RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges
Thibaut Loiseau, Guillaume Bourmaud
RobSense: A Robust Multi-modal Foundation Model for Remote Sensing with Static, Temporal, and Incomplete Data Adaptability
Minh Kha Do, Kang Han, Phu Lai et al.
Sonic: Shifting Focus to Global Audio Perception in Portrait Animation
Xiaozhong Ji, Xiaobin Hu, Zhihong Xu et al.
MESC-3D: Mining Effective Semantic Cues for 3D Reconstruction from a Single Image
Shaoming Li, Qing Cai, Songqi KONG et al.
Probing the Mid-level Vision Capabilities of Self-Supervised Learning
Xuweiyi Chen, Markus Marks, Zezhou Cheng
T2SG: Traffic Topology Scene Graph for Topology Reasoning in Autonomous Driving
Changsheng Lv, Mengshi Qi, Liang Liu et al.
ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation
Zirun Guo, Tao Jin
Diffusion Bridge: Leveraging Diffusion Model to Reduce the Modality Gap Between Text and Vision for Zero-Shot Image Captioning
Jeongryong Lee, Yejee Shin, Geonhui Son et al.
MODfinity: Unsupervised Domain Adaptation with Multimodal Information Flow Intertwining
Shanglin Liu, Jianming Lv, Jingdan Kang et al.
Volume Tells: Dual Cycle-Consistent Diffusion for 3D Fluorescence Microscopy De-noising and Super-Resolution
ZELIN LI, Chenwei Wang, Zhaoke Huang et al.
Rethinking Temporal Fusion with a Unified Gradient Descent View for 3D Semantic Occupancy Prediction
Dubing Chen, Huan Zheng, Jin Fang et al.
The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation
Bingjie Gao, Xinyu Gao, Xiaoxue Wu et al.
CLIP is Almost All You Need: Towards Parameter-Efficient Scene Text Retrieval without OCR
Xugong Qin, Peng Zhang, Jun Jie Ou Yang et al.
DreamTrack: Dreaming the Future for Multimodal Visual Object Tracking
Mingzhe Guo, Weiping Tan, Wenyu Ran et al.
Modeling Multiple Normal Action Representations for Error Detection in Procedural Tasks
Wei-Jin Huang, Yuan-Ming Li, Zhi-Wei Xia et al.
Learned Image Compression with Dictionary-based Entropy Model
Jingbo Lu, Leheng Zhang, Xingyu Zhou et al.
Training Data Provenance Verification: Did Your Model Use Synthetic Data from My Generative Model for Training?
Yuechen Xie, Jie Song, Huiqiong Wang et al.
VLMs-Guided Representation Distillation for Efficient Vision-Based Reinforcement Learning
Haoran Xu, Peixi Peng, Guang Tan et al.
LeanGaussian: Breaking Pixel or Point Cloud Correspondence in Modeling 3D Gaussians
Jiamin WU, Kenkun Liu, Han Gao et al.
Shadow Generation Using Diffusion Model with Geometry Prior
Haonan Zhao, Qingyang Liu, Xinhao Tao et al.
STEP: Enhancing Video-LLMs’ Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training
Haiyi Qiu, Minghe Gao, Long Qian et al.
Stop Learning it all to Mitigate Visual Hallucination, Focus on the Hallucination Target.
Dokyoon Yoon, Youngsook Song, Woomyoung Park
AniGrad: Anisotropic Gradient-Adaptive Sampling for 3D Reconstruction From Monocular Video
Noah Stier, Alex Rich, Pradeep Sen et al.
HOT: Hadamard-based Optimized Training
Seonggon Kim, Juncheol Shin, Seung-taek Woo et al.
BOOTPLACE: Bootstrapped Object Placement with Detection Transformers
Hang Zhou, Xinxin Zuo, Rui Ma et al.
Q-PART: Quasi-Periodic Adaptive Regression with Test-time Training for Pediatric Left Ventricular Ejection Fraction Regression
Jie Liu, Tiexin Qin, Hui Liu et al.
Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Large Model Enhancement
Qianhan Feng, Wenshuo Li, Tong Lin et al.
Think Small, Act Big: Primitive Prompt Learning for Lifelong Robot Manipulation
Yuanqi Yao, Siao Liu, Haoming Song et al.
BEVDiffuser: Plug-and-Play Diffusion Model for BEV Denoising with Ground-Truth Guidance
Xin Ye, Burhan Yaman, Sheng Cheng et al.
How to Merge Your Multimodal Models Over Time?
Sebastian Dziadzio, Vishaal Udandarao, Karsten Roth et al.
Discovering Hidden Visual Concepts Beyond Linguistic Input in Infant Learning
Xueyi Ke, Satoshi Tsutsui, Yayun Zhang et al.
Learning Textual Prompts for Open-World Semi-Supervised Learning
Yuxin Fan, Junbiao Cui, Jiye Liang
Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation
Xiao Ma, Sumit Patidar, Iain Haughton et al.
Easy-editable Image Vectorization with Multi-layer Multi-scale Distributed Visual Feature Embedding
Ye Chen, Zhangli Hu, Zhongyin Zhao et al.
DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding
Yudong Han, Qingpei Guo, Liyuan Pan et al.
Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization
Zhanhao Liang, Yuhui Yuan, Shuyang Gu et al.
VinaBench: Benchmark for Faithful and Consistent Visual Narratives
Silin Gao, Sheryl Mathew, Li Mi et al.
Automated Proof of Polynomial Inequalities via Reinforcement Learning
Banglong Liu, Niuniu Qi, Xia Zeng et al.
Active Hyperspectral Imaging Using an Event Camera
Bohan Yu, Jinxiu Liang, Zhuofeng Wang et al.
Bridging the Gap between Gaussian Diffusion Models and Universal Quantization for Image Compression
Lucas Relic, Roberto Azevedo, Yang Zhang et al.
InstanceGaussian: Appearance-Semantic Joint Gaussian Representation for 3D Instance-Level Perception
Haijie Li, Yanmin Wu, Jiarui Meng et al.
BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence
Xuewu Lin, Tianwei Lin, Alan Huang et al.
MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors
Riku Murai, Eric Dexheimer, Andrew J. Davison
VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction
Ziyue Zhu, Shenlong Wang, Jin Xie et al.
SOGS: Second-Order Anchor for Advanced 3D Gaussian Splatting
Jiahui Zhang, Fangneng Zhan, Ling Shao et al.
Online Task-Free Continual Learning via Dynamic Expansionable Memory Distribution
Fei Ye, Adrian Bors
OffsetOPT: Explicit Surface Reconstruction without Normals
Huan Lei
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs
Lucas Ventura, Antoine Yang, Cordelia Schmid et al.
Enhancing Few-Shot Class-Incremental Learning via Training-Free Bi-Level Modality Calibration
Yiyang Chen, Tianyu Ding, Lei Wang et al.
Animate and Sound an Image
Xihua Wang, Ruihua Song, Chongxuan Li et al.
Plug-and-Play Interpretable Responsible Text-to-Image Generation via Dual-Space Multi-facet Concept Control
Basim Azam, Naveed Akhtar
Generalized Gaussian Entropy Model for Point Cloud Attribute Compression with Dynamic Likelihood Intervals
Changhao Peng
Advancing Generalizable Tumor Segmentation with Anomaly-Aware Open-Vocabulary Attention Maps and Frozen Foundation Diffusion Models
Yankai Jiang, Peng Zhang, Donglin Yang et al.
Towards Explainable and Unprecedented Accuracy in Matching Challenging Finger Crease Patterns
Zhenyu Zhou, Chengdong Dong, Ajay Kumar
Toward Robust Neural Reconstruction from Sparse Point Sets
Amine Ouasfi, Shubhendu Jena, Eric Marchand et al.
Just Dance with pi! A Poly-modal Inductor for Weakly-supervised Video Anomaly Detection
Snehashis Majhi, Giacomo D'Amicantonio, Antitza Dantcheva et al.
PERSE: Personalized 3D Generative Avatars from A Single Portrait
Hyunsoo Cha, Inhee Lee, Hanbyul Joo
Sketchy Bounding-box Supervision for 3D Instance Segmentation
Qian Deng, Le Hui, Jin Xie et al.
Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
Wenbin An, Feng Tian, Sicong Leng et al.
HSI-GPT: A General-Purpose Large Scene-Motion-Language Model for Human Scene Interaction
Yuan Wang, Yali Li, Lixiang Li et al.
Diffusion Model is Effectively Its Own Teacher
Xinyin Ma, Runpeng Yu, Songhua Liu et al.
SoftShadow: Leveraging Soft Masks for Penumbra-Aware Shadow Removal
Xinrui Wang, Lanqing Guo, Xiyu Wang et al.
Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection
Wenqiao Li, Yao Gu, Xintao Chen et al.
Flow-NeRF: Joint Learning of Geometry, Poses, and Dense Flow within Unified Neural Representations
Xunzhi Zheng, Dan Xu
MotionBench: Benchmarking and Improving Fine-grained Video Motion Understanding for Vision Language Models
Wenyi Hong, Yean Cheng, Zhuoyi Yang et al.
RASP: Revisiting 3D Anamorphic Art for Shadow-Guided Packing of Irregular Objects
Soumyaratna Debnath, Ashish Tiwari, Kaustubh Sadekar et al.
Exploring Scene Affinity for Semi-Supervised LiDAR Semantic Segmentation
Chuandong Liu, Xingxing Weng, Shuguo Jiang et al.
Towards Satellite Image Road Graph Extraction: A Global-Scale Dataset and A Novel Method
Pan Yin, Kaiyu Li, Xiangyong Cao et al.
Adaptive Parameter Selection for Tuning Vision-Language Models
Yi Zhang, Yi-Xuan Deng, Meng-Hao Guo et al.
ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning
Kailin Li, Puhao Li, Tengyu Liu et al.
COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting
Jiaxin Zhang, Junjun Jiang, Youyu Chen et al.
GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill
Jieming Cui, Tengyu Liu, Ziyu Meng et al.
Revisiting Audio-Visual Segmentation with Vision-Centric Transformer
Shaofei Huang, Rui Ling, Tianrui Hui et al.
Less is More: Efficient Image Vectorization with Adaptive Parameterization
Kaibo Zhao, Liang Bao, Yufei Li et al.
Steering Away from Harm: An Adaptive Approach to Defending Vision Language Model Against Jailbreaks
Han Wang, Gang Wang, Huan Zhang
PEER Pressure: Model-to-Model Regularization for Single Source Domain Generalization
Dongkyu Cho, Inwoo Hwang, Sanghack Lee
Gaussian Splashing: Unified Particles for Versatile Motion Synthesis and Rendering
Yutao Feng, Xiang Feng, Yintong Shang et al.
Improving Accuracy and Calibration via Differentiated Deep Mutual Learning
Han Liu, Peng Cui, Bingning Wang et al.
Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking
Chaocan Xue, Bineng Zhong, Qihua Liang et al.
Unified Dense Prediction of Video Diffusion
Lehan Yang, Lu Qi, Xiangtai Li et al.
Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer
Jiahao Cui, Hui Li, Qingkun Su et al.
Improving the Transferability of Adversarial Attacks on Face Recognition with Diverse Parameters Augmentation
Fengfan Zhou, Bangjie Yin, Hefei Ling et al.
Joint Scheduling of Causal Prompts and Tasks for Multi-Task Learning
Chaoyang Li, Jianyang Qin, Jinhao Cui et al.
OPTICAL: Leveraging Optimal Transport for Contribution Allocation in Dataset Distillation
Xiao Cui, Yulei Qin, Wengang Zhou et al.
Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation
Yiping Wang, Xuehai He, Kuan Wang et al.
DynScene: Scalable Generation of Dynamic Robotic Manipulation Scenes for Embodied AI
Sangmin Lee, Sungyong Park, Heewon Kim
Reward Fine-Tuning Two-Step Diffusion Models via Learning Differentiable Latent-Space Surrogate Reward
Zhiwei Jia, Yuesong Nan, Huixi Zhao et al.
CaricatureBooth: Data-Free Interactive Caricature Generation in a Photo Booth
Zhiyu Qu, Yunqi Miao, Zhensong Zhang et al.
Incremental Object Keypoint Learning
Mingfu Liang, Jiahuan Zhou, Xu Zou et al.
DefMamba: Deformable Visual State Space Model
Leiye Liu, Miao Zhang, Jihao Yin et al.
SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation
Aleksei Bokhovkin, Quan Meng, Shubham Tulsiani et al.
VideoGEM: Training-free Action Grounding in Videos
Felix Vogel, Walid Bousselham, Anna Kukleva et al.
LEMON: Learning 3D Human-Object Interaction Relation from 2D Images
Yuhang Yang, Wei Zhai, Hongchen Luo et al.
SceneDiffuser++: City-Scale Traffic Simulation via a Generative World Model
Shuhan Tan, John Wheatley Lambert, Hong Jeon et al.
Learning Extremely High Density Crowds as Active Matters
Feixiang He, Jiangbei Yue, Jialin Zhu et al.
Mosaic3D: Foundation Dataset and Model for Open-Vocabulary 3D Segmentation
Junha Lee, Chunghyun Park, Jaesung Choe et al.
Channel-wise Noise Scheduled Diffusion for Inverse Rendering in Indoor Scenes
JunYong Choi, Min-Cheol Sagong, SeokYeong Lee et al.
MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data
Zifan Wang, Ziqing Chen, Junyu Chen et al.
Is this Generated Person Existed in Real-world? Fine-grained Detecting and Calibrating Abnormal Human-body
Zeqing Wang, Qingyang Ma, Wentao Wan et al.
ProReflow: Progressive Reflow with Decomposed Velocity
Lei Ke, Haohang Xu, Xuefei Ning et al.
Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models
Yan Xie, Zequn Zeng, Hao Zhang et al.
Event-Equalized Dense Video Captioning
Kangyi Wu, Pengna Li, Jingwen Fu et al.
GazeGene: Large-scale Synthetic Gaze Dataset with 3D Eyeball Annotations
Yiwei Bao, Zhiming Wang, Feng Lu
Shape and Texture: What Influences Reliable Optical Flow Estimation?
Libo Long, Xiao Hu, Jochen Lang
Attribute-formed Class-specific Concept Space: Endowing Language Bottleneck Model with Better Interpretability and Scalability
Jianyang Zhang, Qianli Luo, Guowu Yang et al.
Feature Information Driven Position Gaussian Distribution Estimation for Tiny Object Detection
Jinghao Bian, Mingtao Feng, Weisheng Dong et al.
PRaDA: Projective Radial Distortion Averaging
Daniil Sinitsyn, Linus Härenstam-Nielsen, Daniel Cremers
Embodied Scene Understanding for Vision Language Models via MetaVQA
Weizhen Wang, Chenda Duan, Zhenghao Peng et al.
Rate-In: Information-Driven Adaptive Dropout Rates for Improved Inference-Time Uncertainty Estimation
Tal Zeevi, Ravid Shwartz-Ziv, Yann LeCun et al.