Most Cited CVPR "junction trees" Papers
5,589 papers found • Page 20 of 28
Spectral Informed Mamba for Robust Point Cloud Processing
Ali Bahri, Moslem Yazdanpanah, Mehrdad Noori et al.
DiN: Diffusion Model for Robust Medical VQA with Semantic Noisy Labels
Erjian Guo, Zhen Zhao, Zicheng Wang et al.
VideoDirector: Precise Video Editing via Text-to-Video Models
Yukun Wang, Longguang Wang, Zhiyuan Ma et al.
MEGA: Masked Generative Autoencoder for Human Mesh Recovery
Guénolé Fiche, Simon Leglaive, Xavier Alameda-Pineda et al.
TacoDepth: Towards Efficient Radar-Camera Depth Estimation with One-stage Fusion
Yiran Wang, Jiaqi Li, Chaoyi Hong et al.
Omni-Scene: Omni-Gaussian Representation for Ego-Centric Sparse-View Scene Reconstruction
Dongxu Wei, Zhiqi Li, Peidong Liu
AvatarArtist: Open-Domain 4D Avatarization
Hongyu Liu, Xuan Wang, Ziyu Wan et al.
DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation
Wang Zhao, Yan-Pei Cao, Jiale Xu et al.
Dragin3D: Image Editing by Dragging in 3D Space
Weiran Guang, Xiaoguang Gu, Mengqi Huang et al.
T2V-CompBench: A Comprehensive Benchmark for Compositional Text-to-video Generation
Kaiyue Sun, Kaiyi Huang, Xian Liu et al.
ORIDa: Object-centric Real-world Image Composition Dataset
Jinwoo Kim, Sangmin Han, Jinho Jeong et al.
MeGA: Hybrid Mesh-Gaussian Head Avatar for High-Fidelity Rendering and Head Editing
Cong Wang, Di Kang, Heyi Sun et al.
Comprehensive Information Bottleneck for Unveiling Universal Attribution to Interpret Vision Transformers
Jung-Ho Hong, Ho-Joong Kim, Kyu-Sung Jeon et al.
Towards Universal Dataset Distillation via Task-Driven Diffusion
Ding Qi, Jian Li, Junyao Gao et al.
VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models
Dahun Kim, AJ Piergiovanni, Ganesh Satish Mallya et al.
Do We Always Need the Simplicity Bias? Looking for Optimal Inductive Biases in the Wild
Damien Teney, Liangze Jiang, Florin Gogianu et al.
TriTex: Learning Texture from a Single Mesh via Triplane Semantic Features
Dana Cohen-Bar, Daniel Cohen-Or, Gal Chechik et al.
Cross-View Completion Models are Zero-shot Correspondence Estimators
Honggyu An, Jin Hyeon Kim, Seonghoon Park et al.
Segment Anything, Even Occluded
Wei-En Tai, Yu-Lin Shih, Cheng Sun et al.
Advancing Multiple Instance Learning with Continual Learning for Whole Slide Imaging
Xianrui Li, Yufei Cui, Jun Li et al.
Decoupled Distillation to Erase: A General Unlearning Method for Any Class-centric Tasks
Yu Zhou, Dian Zheng, Qijie Mo et al.
ChainHOI: Joint-based Kinematic Chain Modeling for Human-Object Interaction Generation
Ling-An Zeng, Guohong Huang, Yi-Lin Wei et al.
WAVE: Weight Templates for Adaptive Initialization of Variable-sized Models
Fu Feng, Yucheng Xie, Jing Wang et al.
CraftsMan3D: High-fidelity Mesh Generation with 3D Native Diffusion and Interactive Geometry Refiner
Weiyu Li, Jiarui Liu, Hongyu Yan et al.
Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events
Aditya Chinchure, Sahithya Ravi, Raymond Ng et al.
Filter Images First, Generate Instructions Later: Pre-Instruction Data Selection for Visual Instruction Tuning
Bardia Safaei, Faizan Siddiqui, Jiacong Xu et al.
VL2Lite: Task-Specific Knowledge Distillation from Large Vision-Language Models to Lightweight Networks
Jinseong Jang, Chunfei Ma, Byeongwon Lee
CARE Transformer: Mobile-Friendly Linear Visual Transformer via Decoupled Dual Interaction
Yuan Zhou, Qingshan Xu, Jiequan Cui et al.
Video-Bench: Human-Aligned Video Generation Benchmark
Hui Han, Siyuan Li, Jiaqi Chen et al.
PMNI: Pose-free Multi-view Normal Integration for Reflective and Textureless Surface Reconstruction
Mingzhi Pei, Xu Cao, Xiangyi Wang et al.
Joint Optimization of Neural Radiance Fields and Continuous Camera Motion from a Monocular Video
Hoang Chuong Nguyen, Wei Mao, Jose M. Alvarez et al.
Dynamic Pseudo Labeling via Gradient Cutting for High-Low Entropy Exploration
Jae Hyeon Park, Joo Hyeon Jeon, Jae Yun Lee et al.
MegaSaM: Accurate, Fast and Robust Structure and Motion from Casual Dynamic Videos
Zhengqi Li, Richard Tucker, Forrester Cole et al.
Shift the Lens: Environment-Aware Unsupervised Camouflaged Object Detection
Ji Du, Fangwei Hao, Mingyang Yu et al.
Robust Audio-Visual Segmentation via Audio-Guided Visual Convergent Alignment
Chen Liu, Peike Li, Liying Yang et al.
Distribution Prototype Diffusion Learning for Open-set Supervised Anomaly Detection
Fuyun Wang, Tong Zhang, Yuanzhi Wang et al.
Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference
Hao Yin, Guangzong Si, Zilei Wang
Minority-Focused Text-to-Image Generation via Prompt Optimization
Soobin Um, Jong Chul Ye
CADRef: Robust Out-of-Distribution Detection via Class-Aware Decoupled Relative Feature Leveraging
Zhiwei Ling, Yachen Chang, Hailiang Zhao et al.
A Selective Re-learning Mechanism for Hyperspectral Fusion Imaging
Yuanye Liu, Jinyang Liu, Renwei Dian et al.
Text-Driven Fashion Image Editing with Compositional Concept Learning and Counterfactual Abduction
Shanshan Huang, Haoxuan Li, Chunyuan Zheng et al.
HotSpot: Signed Distance Function Optimization with an Asymptotically Sufficient Condition
Zimo Wang, Cheng Wang, Taiki Yoshino et al.
3D Student Splatting and Scooping
Jialin Zhu, Jiangbei Yue, Feixiang He et al.
LOGICZSL: Exploring Logic-induced Representation for Compositional Zero-shot Learning
Peng Wu, Xiankai Lu, Hao Hu et al.
Learning Partonomic 3D Reconstruction from Image Collections
Xiaoqian Ruan, Pei Yu, Dian Jia et al.
No Thing, Nothing: Highlighting Safety-Critical Classes for Robust LiDAR Semantic Segmentation in Adverse Weather
Junsung Park, HwiJeong Lee, Inha Kang et al.
Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images
Jie Mei, Chenyu Lin, Yu Qiu et al.
UniHOPE: A Unified Approach for Hand-Only and Hand-Object Pose Estimation
Yinqiao Wang, Hao Xu, Pheng-Ann Heng et al.
Chain of Semantics Programming in 3D Gaussian Splatting Representation for 3D Vision Grounding
Jiaxin Shi, Mingyue Xiang, Hao Sun et al.
MEAT: Multiview Diffusion Model for Human Generation on Megapixels with Mesh Attention
Yuhan Wang, Fangzhou Hong, Shuai Yang et al.
Efficient Data Driven Mixture-of-Expert Extraction from Trained Networks
Uranik Berisha, Jens Mehnert, Alexandru Paul Condurache
Towards Precise Embodied Dialogue Localization via Causality Guided Diffusion
Haoyu Wang, Le Wang, Sanping Zhou et al.
Neural Inverse Rendering from Propagating Light
Anagh Malik, Benjamin Attal, Andrew Xie et al.
A Universal Scale-Adaptive Deformable Transformer for Image Restoration across Diverse Artifacts
Xuyi He, Yuhui Quan, Ruotao Xu et al.
Gromov–Wasserstein Problem with Cyclic Symmetry
Shoichiro Takeda, Yasunori Akagi
RoGSplat: Learning Robust Generalizable Human Gaussian Splatting from Sparse Multi-View Images
Junjin Xiao, Qing Zhang, Yongwei Nie et al.
SDBF: Steep-Decision-Boundary Fingerprinting for Hard-Label Tampering Detection of DNN Models
Xiaofan Bai, Shixin Li, Xiaojing Ma et al.
AnyCam: Learning to Recover Camera Poses and Intrinsics from Casual Videos
Felix Wimbauer, Weirong Chen, Dominik Muhle et al.
Effortless Active Labeling for Long-Term Test-Time Adaptation
Guowei Wang, Changxing Ding
LATTE-MV: Learning to Anticipate Table Tennis Hits from Monocular Videos
Daniel Etaat, Dvij Rajesh Kalaria, Nima Rahmanian et al.
Learning Person-Specific Animatable Face Models from In-the-Wild Images via a Shared Base Model
Yuxiang Mao, Zhenfeng Fan, Zhijie Zhang et al.
Gradient Inversion Attacks on Parameter-Efficient Fine-Tuning
Hasin Us Sami, Swapneel Sen, Amit K. Roy-Chowdhury et al.
RDD: Robust Feature Detector and Descriptor using Deformable Transformer
Gonglin Chen, Tianwen Fu, Haiwei Chen et al.
Stacking Brick by Brick: Aligned Feature Isolation for Incremental Face Forgery Detection
Jikang Cheng, Zhiyuan Yan, Ying Zhang et al.
Closest Neighbors are Harmful for Lightweight Masked Auto-encoders
Jian Meng, Ahmed Hasssan, Li Yang et al.
SKDream: Controllable Multi-view and 3D Generation with Arbitrary Skeletons
Yuanyou Xu, Zongxin Yang, Yi Yang
Dual Exposure Stereo for Extended Dynamic Range 3D Imaging
Juhyung Choi, Jinneyong Kim, Seokjun Choi et al.
Samba: A Unified Mamba-based Framework for General Salient Object Detection
Jiahao He, Keren Fu, Xiaohong Liu et al.
FlexUOD: The Answer to Real-world Unsupervised Image Outlier Detection
Zhonghang Liu, Kun Zhou, Changshuo Wang et al.
GraphI2P: Image-to-Point Cloud Registration with Exploring Pattern of Correspondence via Graph Learning
Lin Bie, Shouan Pan, Siqi Li et al.
FedCS: Coreset Selection for Federated Learning
Chenhe Hao, Weiying Xie, Daixun Li et al.
Cross-Rejective Open-Set SAR Image Registration
Shasha Mao, Shiming Lu, Zhaolong Du et al.
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
Feng Liu, Shiwei Zhang, Xiaofeng Wang et al.
Bridging Gait Recognition and Large Language Models Sequence Modeling
Shaopeng Yang, Jilong Wang, Saihui Hou et al.
MANTA: Diffusion Mamba for Efficient and Effective Stochastic Long-Term Dense Action Anticipation
Olga Zatsarynna, Emad Bahrami, Yazan Abu Farha et al.
FIFA: Fine-grained Inter-frame Attention for Driver's Video Gaze Estimation
Daosong Hu, Mingyue Cui, Kai Huang
Learning from Streaming Video with Orthogonal Gradients
Tengda Han, Dilara Gokay, Joseph Heyward et al.
SuperLightNet: Lightweight Parameter Aggregation Network for Multimodal Brain Tumor Segmentation
Feng Yu, Jiacheng Cao, Li Liu et al.
VidSeg: Training-free Video Semantic Segmentation based on Diffusion Models
Qian Wang, Abdelrahman Eldesokey, Mohit Mendiratta et al.
Efficient Diffusion as Low Light Enhancer
Guanzhou Lan, Qianli Ma, Yuqi Yang et al.
VI^3NR: Variance Informed Initialization for Implicit Neural Representations
Chamin Hewa Koneputugodage, Yizhak Ben-Shabat, Sameera Ramasinghe et al.
MPDrive: Improving Spatial Understanding with Marker-Based Prompt Learning for Autonomous Driving
Zhi-Yuan Zhang, Xiaofan Li, Zhihao Xu et al.
Rotation-Equivariant Self-Supervised Method in Image Denoising
Hanze Liu, Jiahong Fu, Qi Xie et al.
Foundations of the Theory of Performance-Based Ranking
Sébastien Piérard, Anaïs Halin, Anthony Cioppa et al.
APT: Adaptive Personalized Training for Diffusion Models with Limited Data
JungWoo Chae, Jiyoon Kim, Jaewoong Choi et al.
Theoretical Insights in Model Inversion Robustness and Conditional Entropy Maximization for Collaborative Inference Systems
Song Xia, Yi Yu, Wenhan Yang et al.
VolFormer: Explore More Comprehensive Cube Interaction for Hyperspectral Image Restoration and Beyond
Dabing Yu, Zheng Gao
UniMamba: Unified Spatial-Channel Representation Learning with Group-Efficient Mamba for LiDAR-based 3D Object Detection
Xin Jin, Haisheng Su, Kai Liu et al.
CamFreeDiff: Camera-free Image to Panorama Generation with Diffusion Model
Xiaoding Yuan, Shitao Tang, Kejie Li et al.
Illumination Spectrum Estimation for Multispectral Images via Surface Reflectance Modeling and Spatial-Spectral Feature Generation
Hyejin Oh, Woo-Shik Kim, Sangyoon Lee et al.
SET: Spectral Enhancement for Tiny Object Detection
Huixin Sun, Runqi Wang, Yanjing Li et al.
Temporally Consistent Object-Centric Learning by Contrasting Slots
Anna Manasyan, Maximilian Seitzer, Filip Radovic et al.
FactCheXcker: Mitigating Measurement Hallucinations in Chest X-ray Report Generation Models
Alice Heiman, Xiaoman Zhang, Emma Chen et al.
R-SCoRe: Revisiting Scene Coordinate Regression for Robust Large-Scale Visual Localization
Xudong Jiang, Fangjinhua Wang, Silvano Galliani et al.
Repurposing Stable Diffusion Attention for Training-Free Unsupervised Interactive Segmentation
Markus Karmann, Onay Urfalioglu
Retaining Knowledge and Enhancing Long-Text Representations in CLIP through Dual-Teacher Distillation
Yuheng Feng, Changsong Wen, Zelin Peng et al.
Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection
Enshen Zhou, Qi Su, Cheng Chi et al.
Rethinking Reconstruction and Denoising in the Dark: New Perspective, General Architecture and Beyond
Long Ma, Tengyu Ma, Ziye Li et al.
Deterministic Certification of Graph Neural Networks against Graph Poisoning Attacks with Arbitrary Perturbations
Jiate Li, Meng Pang, Yun Dong et al.
Layered Motion Fusion: Lifting Motion Segmentation to 3D in Egocentric Videos
Vadim Tschernezki, Diane Larlus, Andrea Vedaldi et al.
A Focused Human Body Model for Accurate Anthropometric Measurements Extraction
Shuhang Chen, Xianliang Huang, Zhizhou Zhong et al.
All-Optical Nonlinear Diffractive Deep Network for Ultrafast Image Denoising
Xiaoling Zhou, Zhemg Lee, Wei Ye et al.
SIR-DIFF: Sparse Image Sets Restoration with Multi-View Diffusion Model
Yucheng Mao, Boyang Wang, Nilesh Kulkarni et al.
SASep: Saliency-Aware Structured Separation of Geometry and Feature for Open Set Learning on Point Clouds
Jinfeng Xu, Xianzhi Li, Yuan Tang et al.
Beyond Sight: Towards Cognitive Alignment in LVLM via Enriched Visual Knowledge
Yaqi Zhao, Yuanyang Yin, Lin Li et al.
BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
Md Mohaiminul Islam, Tushar Nagarajan, Huiyu Wang et al.
Task-aware Cross-modal Feature Refinement Transformer with Large Language Models for Visual Grounding
Wenbo Chen, Zhen Xu, Ruotao Xu et al.
Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents
Jun Chen, Dannong Xu, Junjie Fei et al.
ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning
David Junhao Zhang, Roni Paiss, Shiran Zada et al.
GS-DiT: Advancing Video Generation with Dynamic 3D Gaussian Fields through Efficient Dense 3D Point Tracking
Weikang Bian, Zhaoyang Huang, Xiaoyu Shi et al.
CoCoGaussian: Leveraging Circle of Confusion for Gaussian Splatting from Defocused Images
Jungho Lee, Suhwan Cho, Taeoh Kim et al.
MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration
Boyun Li, Haiyu Zhao, Wenxin Wang et al.
Few-shot Implicit Function Generation via Equivariance
Suizhi Huang, Xingyi Yang, Hongtao Lu et al.
SerialGen: Personalized Image Generation by First Standardization Then Personalization
Cong Xie, Han Zou, Ruiqi Yu et al.
SimAvatar: Simulation-Ready Avatars with Layered Hair and Clothing
Xueting Li, Ye Yuan, Shalini De Mello et al.
Camouflage Anything: Learning to Hide using Controlled Out-painting and Representation Engineering
Biplab Das, Viswanath Gopalakrishnan
AniDoc: Animation Creation Made Easier
Yihao Meng, Hao Ouyang, Hanlin Wang et al.
PointSR: Self-Regularized Point Supervision for Drone-View Object Detection
Weizhuo Li, Yue Xi, Wenjing Jia et al.
Self-Expansion of Pre-trained Models with Mixture of Adapters for Continual Learning
Huiyi Wang, Haodong Lu, Lina Yao et al.
Brain-Inspired Spiking Neural Networks for Energy-Efficient Object Detection
Ziqi Li, Tao Gao, Yisheng An et al.
DnLUT: Ultra-Efficient Color Image Denoising via Channel-Aware Lookup Tables
Sidi Yang, Binxiao Huang, Yulun Zhang et al.
Incomplete Multi-View Multi-label Learning via Disentangled Representation and Label Semantic Embedding
Xu Yan, Jun Yin, Jie Wen
Building Vision Models upon Heat Conduction
Zhaozhi Wang, Yue Liu, Yunjie Tian et al.
PDFactor: Learning Tri-Perspective View Policy Diffusion Field for Multi-Task Robotic Manipulation
Jingyi Tian, Le Wang, Sanping Zhou et al.
Open Set Label Shift with Test Time Out-of-Distribution Reference
Changkun Ye, Russell Tsuchida, Lars Petersson et al.
Effective SAM Combination for Open-Vocabulary Semantic Segmentation
Minhyeok Lee, Suhwan Cho, Jungho Lee et al.
Do We Really Need Curated Malicious Data for Safety Alignment in Multi-modal Large Language Models?
Yanbo Wang, Jiyang Guan, Jian Liang et al.
Generating 6DoF Object Manipulation Trajectories from Action Description in Egocentric Vision
Tomoya Yoshida, Shuhei Kurita, Taichi Nishimura et al.
Pattern Analogies: Learning to Perform Programmatic Image Edits by Analogy
Aditya Ganeshan, Thibault Groueix, Paul Guerrero et al.
Efficient Transfer Learning for Video-language Foundation Models
Haoxing Chen, Zizheng Huang, Yan Hong et al.
SVDC: Consistent Direct Time-of-Flight Video Depth Completion with Frequency Selective Fusion
Xuan Zhu, Jijun Xiang, Xianqi Wang et al.
FLAIR: VLM with Fine-grained Language-informed Image Representations
Rui Xiao, Sanghwan Kim, Iuliana Georgescu et al.
CryptoFace: End-to-End Encrypted Face Recognition
Wei Ao, Vishnu Naresh Boddeti
SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces
Sumit Chaturvedi, Mengwei Ren, Yannick Hold-Geoffroy et al.
Plug-and-Play Versatile Compressed Video Enhancement
Huimin Zeng, Jiacheng Li, Zhiwei Xiong
Geometry Field Splatting with Gaussian Surfels
Kaiwen Jiang, Venkataram Sivaram, Cheng Peng et al.
PS-EIP: Robust Photometric Stereo Based on Event Interval Profile
Kazuma Kitazawa, Takahito Aoto, Satoshi Ikehata et al.
Model Diagnosis and Correction via Linguistic and Implicit Attribute Editing
Xuanbai Chen, Xiang Xu, Zhihua Li et al.
BFANet: Revisiting 3D Semantic Segmentation with Boundary Feature Analysis
Weiguang Zhao, Rui Zhang, Qiufeng Wang et al.
DUNE: Distilling a Universal Encoder from Heterogeneous 2D and 3D Teachers
Mert Bülent Sarıyıldız, Philippe Weinzaepfel, Thomas Lucas et al.
NeISF++: Neural Incident Stokes Field for Polarized Inverse Rendering of Conductors and Dielectrics
Chenhao Li, Taishi Ono, Takeshi Uemori et al.
Gazing at Rewards: Eye Movements as a Lens into Human and AI Decision-Making in Hybrid Visual Foraging
Bo Wang, Dingwei Tan, Yen-Ling Kuo et al.
Layer- and Timestep-Adaptive Differentiable Token Compression Ratios for Efficient Diffusion Transformers
Haoran You, Connelly Barnes, Yuqian Zhou et al.
FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Principal Component Analysis
Jiangtong Tan, Hu Yu, Jie Huang et al.
Improving Semi-Supervised Semantic Segmentation with Sliced-Wasserstein Feature Alignment and Uniformity
Chen Yi Lu, Kasra Derakhshandeh, Somali Chaterji
Theory-Inspired Deep Multi-View Multi-Label Learning with Incomplete Views and Noisy Labels
Quanjiang Li, Tingjin Luo, Jiahui Liao
Asynchronous Collaborative Graph Representation for Frames and Events
Dianze Li, Jianing Li, Xu Liu et al.
SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model
Chunlin Yu, Hanqing Wang, Ye Shi et al.
Bridging Viewpoint Gaps: Geometric Reasoning Boosts Semantic Correspondence
Qiyang Qian, Hansheng Chen, Masayoshi Tomizuka et al.
ROLL: Robust Noisy Pseudo-label Learning for Multi-View Clustering with Noisy Correspondence
Yuan Sun, Yongxiang Li, Zhenwen Ren et al.
Label Shift Meets Online Learning: Ensuring Consistent Adaptation with Universal Dynamic Regret
Yucong Dai, Shilin Gu, Ruidong Fan et al.
PolarNeXt: Rethink Instance Segmentation with Polar Representation
Jiacheng Sun, Xinghong Zhou, Yiqiang Wu et al.
ShiftwiseConv: Small Convolutional Kernel with Large Kernel Effect
Dachong Li, Li Li, Zhuangzhuang Chen et al.
G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation
Tianxing Chen, Yao Mu, Zhixuan Liang et al.
Evolving High-Quality Rendering and Reconstruction in a Unified Framework with Contribution-Adaptive Regularization
You Shen, Zhipeng Zhang, Xinyang Li et al.
Language-Guided Salient Object Ranking
Fang Liu, Yuhao Liu, Ke Xu et al.
Structure from Collision
Takuhiro Kaneko
PRaDA: Projective Radial Distortion Averaging
Daniil Sinitsyn, Linus Härenstam-Nielsen, Daniel Cremers
Feature Information Driven Position Gaussian Distribution Estimation for Tiny Object Detection
Jinghao Bian, Mingtao Feng, Weisheng Dong et al.
Event-Equalized Dense Video Captioning
Kangyi Wu, Pengna Li, Jingwen Fu et al.
ProReflow: Progressive Reflow with Decomposed Velocity
Lei Ke, Haohang Xu, Xuefei Ning et al.
Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation
Yiping Wang, Xuehai He, Kuan Wang et al.
Joint Scheduling of Causal Prompts and Tasks for Multi-Task Learning
Chaoyang Li, Jianyang Qin, Jinhao Cui et al.
Unified Dense Prediction of Video Diffusion
Lehan Yang, Lu Qi, Xiangtai Li et al.
Similarity-Guided Layer-Adaptive Vision Transformer for UAV Tracking
Chaocan Xue, Bineng Zhong, Qihua Liang et al.
Revisiting Audio-Visual Segmentation with Vision-Centric Transformer
Shaofei Huang, Rui Ling, Tianrui Hui et al.
Exploring Scene Affinity for Semi-Supervised LiDAR Semantic Segmentation
Chuandong Liu, Xingxing Weng, Shuguo Jiang et al.
RASP: Revisiting 3D Anamorphic Art for Shadow-Guided Packing of Irregular Objects
Soumyaratna Debnath, Ashish Tiwari, Kaustubh Sadekar et al.
Sketchy Bounding-box Supervision for 3D Instance Segmentation
Qian Deng, Le Hui, Jin Xie et al.
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs
Lucas Ventura, Antoine Yang, Cordelia Schmid et al.
SOGS: Second-Order Anchor for Advanced 3D Gaussian Splatting
Jiahui Zhang, Fangneng Zhan, Ling Shao et al.
VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction
Ziyue Zhu, Shenlong Wang, Jin Xie et al.
VinaBench: Benchmark for Faithful and Consistent Visual Narratives
Silin Gao, Sheryl Mathew, Li Mi et al.
Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization
Zhanhao Liang, Yuhui Yuan, Shuyang Gu et al.
Learning Textual Prompts for Open-World Semi-Supervised Learning
Yuxin Fan, Junbiao Cui, Jiye Liang
Think Small, Act Big: Primitive Prompt Learning for Lifelong Robot Manipulation
Yuanqi Yao, Siao Liu, Haoming Song et al.
Align-KD: Distilling Cross-Modal Alignment Knowledge for Mobile Vision-Language Large Model Enhancement
Qianhan Feng, Wenshuo Li, Tong Lin et al.
HOT: Hadamard-based Optimized Training
Seonggon Kim, Juncheol Shin, Seung-taek Woo et al.
JTD-UAV: MLLM-Enhanced Joint Tracking and Description Framework for Anti-UAV Systems
Yifan Wang, Jian Zhao, Zhaoxin Fan et al.
STEP: Enhancing Video-LLMs’ Compositional Reasoning by Spatio-Temporal Graph-guided Self-Training
Haiyi Qiu, Minghe Gao, Long Qian et al.
Modeling Multiple Normal Action Representations for Error Detection in Procedural Tasks
Wei-Jin Huang, Yuan-Ming Li, Zhi-Wei Xia et al.
RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges
Thibaut Loiseau, Guillaume Bourmaud
Reconstructing Close Human Interaction with Appearance and Proxemics Reasoning
Buzhen Huang, Chen Li, Chongyang Xu et al.
Rethinking Noisy Video-Text Retrieval via Relation-aware Alignment
Huakai Lai, Guoxin Xiong, Huayu Mai et al.
EfficientLLaVA: Generalizable Auto-Pruning for Large Vision-language Models
Yinan Liang, Ziwei Wang, Xiuwei Xu et al.
The Impact Label Noise and Choice of Threshold has on Cross-Entropy and Soft-Dice in Image Segmentation
Marcus Nordström, Atsuto Maki, Henrik Hult
Learning on Model Weights using Tree Experts
Eliahu Horwitz, Bar Cavia, Jonathan Kahana et al.
Image Reconstruction from Readout-Multiplexed Single-Photon Detector Arrays
Shashwath Bharadwaj, Ruangrawee Kitichotkul, Akshay Agarwal et al.
Towards Smart Point-and-Shoot Photography
Jiawan Li, Fei Zhou, Zhipeng Zhong et al.
Hyperspectral Pansharpening via Diffusion Models with Iteratively Zero-Shot Guidance
Jin-Liang Xiao, Ting-Zhu Huang, Liang-Jian Deng et al.
Efficient Motion-Aware Video MLLM
Zijia Zhao, Yuqi Huo, Tongtian Yue et al.
DSPNet: Dual-vision Scene Perception for Robust 3D Question Answering
Jingzhou Luo, Yang Liu, Weixing Chen et al.
Exploring CLIP's Dense Knowledge for Weakly Supervised Semantic Segmentation
Zhiwei Yang, Yucong Meng, Kexue Fu et al.
UNIALIGN: Scaling Multimodal Alignment within One Unified Model
Bo Zhou, Liulei Li, Yujia Wang et al.
Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens
Zhangqi Jiang, Junkai Chen, Beier Zhu et al.
iG-6DoF: Model-free 6DoF Pose Estimation for Unseen Object via Iterative 3D Gaussian Splatting
Tuo Cao, Fei Luo, Jiongming Qin et al.
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method
Xinshuai Song, Weixing Chen, Yang Liu et al.
BimArt: A Unified Approach for the Synthesis of 3D Bimanual Interaction with Articulated Objects
Wanyue Zhang, Rishabh Dabral, Vladislav Golyanik et al.
Improving Autoregressive Visual Generation with Cluster-Oriented Token Prediction
Teng Hu, Jiangning Zhang, Ran Yi et al.
Learned Binocular-Encoding Optics for RGBD Imaging Using Joint Stereo and Focus Cues
Yuhui Liu, Liangxun Ou, Qiang Fu et al.