Most Cited 2024 "3d scene decomposition" Papers
OmniViD: A Generative Framework for Universal Video Understanding
Junke Wang, Dongdong Chen, Chong Luo et al.
Understanding In-Context Learning from Repetitions
Jianhao (Elliott) Yan, Jin Xu, Chiyu Song et al.
Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding
Hoang-Quan Nguyen, Thanh-Dat Truong, Xuan-Bac Nguyen et al.
Vision Transformer Off-the-Shelf: A Surprising Baseline for Few-Shot Class-Agnostic Counting
Zhicheng Wang, Liwen Xiao, Zhiguo Cao et al.
Biased Temporal Convolution Graph Network for Time Series Forecasting with Missing Values
Xiaodan Chen, Xiucheng Li, Bo Liu et al.
Entropic Open-Set Active Learning
Bardia Safaei, Vibashan VS, Celso de Melo et al.
UniGarmentManip: A Unified Framework for Category-Level Garment Manipulation via Dense Visual Correspondence
Ruihai Wu, Haoran Lu, Yiyan Wang et al.
Hunting Attributes: Context Prototype-Aware Learning for Weakly Supervised Semantic Segmentation
Feilong Tang, Zhongxing Xu, Zhaojun Qu et al.
Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training
Cheng Tan, Jingxuan Wei, Zhangyang Gao et al.
PolyVoice: Language Models for Speech to Speech Translation
Qianqian Dong, Zhiying Huang, Qiao Tian et al.
Copula Conformal prediction for multi-step time series prediction
Sophia Sun, Rose Yu
Personalized Federated Domain-Incremental Learning based on Adaptive Knowledge Matching
Yichen Li, Wenchao Xu, Haozhao Wang et al.
Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions
Fabio Tosi, Pierluigi Zama Ramirez, Matteo Poggi
Prompt-Enhanced Multiple Instance Learning for Weakly Supervised Video Anomaly Detection
Junxi Chen, Liang Li, Li Su et al.
Ghost on the Shell: An Expressive Representation of General 3D Shapes
Zhen Liu, Yao Feng, Yuliang Xiu et al.
Zero-1-to-3: Domain-Level Zero-Shot Cognitive Diagnosis via One Batch of Early-Bird Students towards Three Diagnostic Objectives
Weibo Gao, Qi Liu, Hao Wang et al.
MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders
Baijiong Lin, Weisen Jiang, Pengguang Chen et al.
Self-Supervised Facial Representation Learning with Facial Region Awareness
Zheng Gao, Ioannis Patras
A Simple Baseline for Efficient Hand Mesh Reconstruction
Zhishan Zhou, Shihao Zhou, Zhi Lv et al.
I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions
Chengfeng Zhao, Juze Zhang, Jiashen Du et al.
Revisiting the Domain Shift and Sample Uncertainty in Multi-source Active Domain Transfer
Wenqiao Zhang, Zheqi Lv
Nuvo: Neural UV Mapping for Unruly 3D Representations
Pratul Srinivasan, Stephan J Garbin, Dor Verbin et al.
UMBRAE: Unified Multimodal Brain Decoding
Weihao Xia, Raoul de Charette, Cengiz Oztireli et al.
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling
Rui Liu, Yifan Hu, Yi Ren et al.
Chinese Spelling Correction as Rephrasing Language Model
Linfeng Liu, Hongqiu Wu, Hai Zhao
Single Domain Generalization for Crowd Counting
Zhuoxuan Peng, S.-H. Gary Chan
DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks
Jiaxin Zhang, Dezhi Peng, Chongyu Liu et al.
Unified Language-driven Zero-shot Domain Adaptation
Senqiao Yang, Zhuotao Tian, Li Jiang et al.
UMIE: Unified Multimodal Information Extraction with Instruction Tuning
Lin Sun, Kai Zhang, Qingyuan Li et al.
Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding
Ozan Unal, Christos Sakaridis, Suman Saha et al.
Logical Languages Accepted by Transformer Encoders with Hard Attention
Pablo Barcelo, Alexander Kozachinskiy, Anthony W. Lin et al.
VOODOO 3D: Volumetric Portrait Disentanglement For One-Shot 3D Head Reenactment
Phong Tran, Egor Zakharov, Long Nhat Ho et al.
Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework
Ziyao Huang, Fan Tang, Yong Zhang et al.
View Selection for 3D Captioning via Diffusion Ranking
Tiange Luo, Justin Johnson, Honglak Lee
Dataset Distillation by Automatic Training Trajectories
Dai Liu, Jindong Gu, Hu Cao et al.
Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis
Zanlin Ni, Yulin Wang, Renping Zhou et al.
VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors
Sungwon Hwang, Min-Jung Kim, Taewoong Kang et al.
DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly
Gianluca Scarpellini, Stefano Fiorini, Francesco Giuliari et al.
Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
Hongjie Wang, Difan Liu, Yan Kang et al.
DP-AdamBC: Your DP-Adam Is Actually DP-SGD (Unless You Apply Bias Correction)
Qiaoyue Tang, Frederick Shpilevskiy, Mathias Lécuyer
Text-to-3D Generation with Bidirectional Diffusion using both 2D and 3D priors
Lihe Ding, Shaocong Dong, Zhanpeng Huang et al.
Cooper: Coordinating Specialized Agents towards a Complex Dialogue Goal
Yi Cheng, Wenge Liu, Jian Wang et al.
Offline and Online Optical Flow Enhancement for Deep Video Compression
Chuanbo Tang, Xihua Sheng, Zhuoyuan Li et al.
Attention Prompting on Image for Large Vision-Language Models
Runpeng Yu, Weihao Yu, Xinchao Wang
Unified Generative Modeling of 3D Molecules with Bayesian Flow Networks
Yuxuan Song, Jingjing Gong, Hao Zhou et al.
Think Twice Before Selection: Federated Evidential Active Learning for Medical Image Analysis with Domain Shifts
Jiayi Chen, Benteng Ma, Hengfei Cui et al.
DreamFlow: High-quality text-to-3D generation by Approximating Probability Flow
Kyungmin Lee, Kihyuk Sohn, Jinwoo Shin
Retrieval-Augmented Embodied Agents
Yichen Zhu, Zhicai Ou, Xiaofeng Mou et al.
AccDiffusion: An Accurate Method for Higher-Resolution Image Generation
Zhihang Lin, Mingbao Lin, Meng Zhao et al.
TextCraftor: Your Text Encoder Can be Image Quality Controller
Yanyu Li, Xian Liu, Anil Kag et al.
MeshAvatar: Learning High-quality Triangular Human Avatars from Multi-view Videos
Yushuo Chen, Zerong Zheng, Zhe Li et al.
Analyzing Feed-Forward Blocks in Transformers through the Lens of Attention Maps
Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi et al.
EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding
Jiazhou Zhou, Xu Zheng, Yuanhuiyi Lyu et al.
Cascade-Zero123: One Image to Highly Consistent 3D with Self-Prompted Nearby Views
Yabo Chen, Jiemin Fang, Yuyang Huang et al.
Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection
Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker et al.
DC-NAS: Divide-and-Conquer Neural Architecture Search for Multi-Modal Classification
Xinyan Liang, Pinhan Fu, Qian Guo et al.
DREAM: Dual Structured Exploration with Mixup for Open-set Graph Domain Adaption
Nan Yin, Mengzhu Wang et al.
Auto-Prox: Training-Free Vision Transformer Architecture Search via Automatic Proxy Discovery
Zimian Wei, Peijie Dong, Zheng Hui et al.
It's All About Your Sketch: Democratising Sketch Control in Diffusion Models
Subhadeep Koley, Ayan Kumar Bhunia, Deeptanshu Sekhri et al.
CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model
Aoran Xiao, Weihao Xuan, Heli Qi et al.
Learning Unified Reference Representation for Unsupervised Multi-class Anomaly Detection
Liren He, Zhengkai Jiang, Jinlong Peng et al.
Language-Driven Physics-Based Scene Synthesis and Editing via Feature Splatting
Ri-Zhao Qiu, Ge Yang, Weijia Zeng et al.
T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models
Zhongqi Wang, Jie Zhang, Shiguang Shan et al.
CyberDemo: Augmenting Simulated Human Demonstration for Real-World Dexterous Manipulation
Jun Wang, Yuzhe Qin, Kaiming Kuang et al.
Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model
Xu He, Qiaochu Huang, Zhensong Zhang et al.
Graph Disentangled Contrastive Learning with Personalized Transfer for Cross-Domain Recommendation
Jing Liu, Lele Sun, Wei-zhi Nie et al.
Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder
Jinseok Kim, Tae-Kyun Kim
Parallelizing non-linear sequential models over the sequence length
Yi Heng Lim, Qi Zhu, Joshua Selfridge et al.
Learning Modality-agnostic Representation for Semantic Segmentation from Any Modalities
Xu Zheng, Yuanhuiyi Lyu, Lin Wang
GP-NeRF: Generalized Perception NeRF for Context-Aware 3D Scene Understanding
Hao Li, Dingwen Zhang, Yalun Dai et al.
Unifying Correspondence Pose and NeRF for Generalized Pose-Free Novel View Synthesis
Sunghwan Hong, Jaewoo Jung, Heeseong Shin et al.
PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving
Zhili Chen, Maosheng Ye, Shuangjie Xu et al.
WHAC: World-grounded Humans and Cameras
Wanqi Yin, Zhongang Cai, Chen Wei et al.
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
Zhecan Wang, Garrett Bingham, Adams Wei Yu et al.
LAMM: Label Alignment for Multi-Modal Prompt Learning
Jingsheng Gao, Jiacheng Ruan, Suncheng Xiang et al.
Video Editing via Factorized Diffusion Distillation
Uriel Singer, Amit Zohar, Yuval Kirstain et al.
VideoCon: Robust Video-Language Alignment via Contrast Captions
Hritik Bansal, Yonatan Bitton, Idan Szpektor et al.
Early Neuron Alignment in Two-layer ReLU Networks with Small Initialization
Hancheng Min, Enrique Mallada, Rene Vidal
LaneCPP: Continuous 3D Lane Detection using Physical Priors
Maximilian Pittner, Joel Janai, Alexandru Paul Condurache
Open-Vocabulary Semantic Segmentation with Image Embedding Balancing
Xiangheng Shan, Dongyue Wu, Guilin Zhu et al.
I-MedSAM: Implicit Medical Image Segmentation with Segment Anything
Xiaobao Wei, Jiajun Cao, Yizhu Jin et al.
NARUTO: Neural Active Reconstruction from Uncertain Target Observations
Ziyue Feng, Huangying Zhan, Zheng Chen et al.
3D Paintbrush: Local Stylization of 3D Shapes with Cascaded Score Distillation
Dale Decatur, Itai Lang, Kfir Aberman et al.
Masked Structural Growth for 2x Faster Language Model Pre-training
Yiqun Yao, Zheng Zhang, Jing Li et al.
Dual Self-Paced Cross-Modal Hashing
Yuan Sun, Jian Dai, Zhenwen Ren et al.
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory
Haiwen Diao, Bo Wan, Ying Zhang et al.
Image Compression for Machine and Human Vision With Spatial-Frequency Adaptation
Han Li, Shaohui Li, Shuangrui Ding et al.
FMA-Net: Flow-Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring
Geunhyuk Youk, Jihyong Oh, Munchurl Kim
Dispel Darkness for Better Fusion: A Controllable Visual Enhancer based on Cross-modal Conditional Adversarial Learning
Hao Zhang, Linfeng Tang, Xinyu Xiang et al.
ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers
Jinke Li, Xiao He, Chonghua Zhou et al.
Zero-shot Object Counting with Good Exemplars
Huilin Zhu, Jingling Yuan, Zhengwei Yang et al.
Energy-guided Entropic Neural Optimal Transport
Petr Mokrov, Alexander Korotin, Alexander Kolesov et al.
Backdoor Federated Learning by Poisoning Backdoor-Critical Layers
Haomin Zhuang, Mingxian Yu, Hao Wang et al.
Text-Conditioned Generative Model of 3D Strand-based Human Hairstyles
Vanessa Sklyarova, Egor Zakharov, Otmar Hilliges et al.
ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning
Beomyoung Kim, Joonsang Yu, Sung Ju Hwang
Audio-Visual Segmentation via Unlabeled Frame Exploitation
Jinxiang Liu, Yikun Liu, Ferenas et al.
R&B: Region and Boundary Aware Zero-shot Grounded Text-to-image Generation
Jiayu Xiao, Henglei Lv et al.
Multi-modal Learning for Geospatial Vegetation Forecasting
Vitus Benson, Claire Robin, Christian Requena-Mesa et al.
Multistain Pretraining for Slide Representation Learning in Pathology
Guillaume Jaume, Anurag J Vaidya, Andrew Zhang et al.
Fast Context-Based Low-Light Image Enhancement via Neural Implicit Representations
Tomáš Chobola, Yu Liu, Hanyi Zhang et al.
A Generalized Neural Diffusion Framework on Graphs
Yibo Li, Xiao Wang, Hongrui Liu et al.
Distribution-aware Knowledge Prototyping for Non-exemplar Lifelong Person Re-identification
Kunlun Xu, Xu Zou, Yuxin Peng et al.
GeneralAD: Anomaly Detection Across Domains by Attending to Distorted Features
Luc Sträter, Mohammadreza Salehi, Efstratios Gavves et al.
FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification
Yu Tian, Congcong Wen, Min Shi et al.
HIG: Hierarchical Interlacement Graph Approach to Scene Graph Generation in Video Understanding
Trong-Thuan Nguyen, Pha Nguyen, Khoa Luu
OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding
Ming Hu, Peng Xia, Lin Wang et al.
Blind Image Quality Assessment Based on Geometric Order Learning
Nyeong-Ho Shin, Seon-Ho Lee, Chang-Su Kim
A Multi-Modal Contrastive Diffusion Model for Therapeutic Peptide Generation
Yongkang Wang, Xuan Liu, Feng Huang et al.
No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation
Xiangyang Zhu, Renrui Zhang, Bowei He et al.
Higher-Order Graph Convolutional Network with Flower-Petals Laplacians on Simplicial Complexes
Yiming Huang, Yujie Zeng, Qiang Wu et al.
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding
Le Zhang, Rabiul Awal, Aishwarya Agrawal
The Nerfect Match: Exploring NeRF Features for Visual Localization
Qunjie Zhou, Maxim Maximov, Or Litany et al.
Sparse Global Matching for Video Frame Interpolation with Large Motion
Chunxu Liu, Guozhen Zhang, Rui Zhao et al.
PartDistill: 3D Shape Part Segmentation by Vision-Language Model Distillation
Ardian Umam, Cheng-Kun Yang, Min-Hung Chen et al.
Progressive Pretext Task Learning for Human Trajectory Prediction
Xiaotong Lin, Tianming Liang, Jian-Huang Lai et al.
Multimodal Patient Representation Learning with Missing Modalities and Labels
Zhenbang Wu, Anant Dadu, Nicholas Tustison et al.
Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion
Zuoyue Li, Zhenqiang Li, Zhaopeng Cui et al.
HiKER-SGG: Hierarchical Knowledge Enhanced Robust Scene Graph Generation
Ce Zhang, Simon Stepputtis, Joseph Campbell et al.
The Devil is in the Fine-Grained Details: Evaluating Open-Vocabulary Object Detectors for Fine-Grained Understanding
Lorenzo Bianchi, Fabio Carrara, Nicola Messina et al.
Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning
Xinshun Wang, Zhongbin Fang, Xia Li et al.
Zero Bubble (Almost) Pipeline Parallelism
Penghui Qi, Xinyi Wan, Guangxing Huang et al.
Dolfin: Diffusion Layout Transformers without Autoencoder
Yilin Wang, Zeyuan Chen, Liangjun Zhong et al.
CPR: Retrieval Augmented Generation for Copyright Protection
Aditya Golatkar, Alessandro Achille, Luca Zancato et al.
Efficient and Scalable Graph Generation through Iterative Local Expansion
Andreas Bergmeister, Karolis Martinkus, Nathanaël Perraudin et al.
Improved baselines for vision-language pre-training
Jakob Verbeek, Enrico Fini, Michal Drozdzal et al.
Small Model Can Self-Correct
Haixia Han, Jiaqing Liang, Jie Shi et al.
SSMG: Spatial-Semantic Map Guided Diffusion Model for Free-Form Layout-to-Image Generation
Chengyou Jia, Minnan Luo, Zhuohang Dang et al.
Predicting Emergent Abilities with Infinite Resolution Evaluation
Shengding Hu, Xin Liu, Xu Han et al.
TEA: Test-time Energy Adaptation
Yige Yuan, Bingbing Xu, Liang Hou et al.
HyperFast: Instant Classification for Tabular Data
David Bonet, Daniel Mas Montserrat, Xavier Giró-i-Nieto et al.
Getting it Right: Improving Spatial Consistency in Text-to-Image Models
Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Guez Aflalo et al.
Unmasking and Improving Data Credibility: A Study with Datasets for Training Harmless Language Models
Zhaowei Zhu, Jialu Wang, Hao Cheng et al.
Transformer-VQ: Linear-Time Transformers via Vector Quantization
Lucas D. Lingle
Safe-Sim: Safety-Critical Closed-Loop Traffic Simulation with Diffusion-Controllable Adversaries
Wei-Jer Chang, Francesco Pittaluga, Masayoshi Tomizuka et al.
Generalization Analysis of Machine Learning Algorithms via the Worst-Case Data-Generating Probability Measure
Xinying Zou, Samir Perlaza, Inaki Esnaola et al.
MVIP-NeRF: Multi-view 3D Inpainting on NeRF Scenes via Diffusion Prior
Honghua Chen, Chen Change Loy, Xingang Pan
Generative Multi-modal Models are Good Class Incremental Learners
Xusheng Cao, Haori Lu, Linlan Huang et al.
Bridging the Gap between 2D and 3D Visual Question Answering: A Fusion Approach for 3D VQA
Wentao Mo, Yang Liu
Working Memory Capacity of ChatGPT: An Empirical Study
Dongyu Gong, Xingchen Wan, Dingmin Wang
Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models
Hritik Bansal, John Dang, Aditya Grover
Motif-Aware Riemannian Graph Neural Network with Generative-Contrastive Learning
Li Sun, Zhenhao Huang, Zixi Wang et al.
DiffPortrait3D: Controllable Diffusion for Zero-Shot Portrait View Synthesis
Yuming Gu, Hongyi Xu, You Xie et al.
BadRL: Sparse Targeted Backdoor Attack against Reinforcement Learning
Jing Cui, Yufei Han, Yuzhe Ma et al.
DTL: Disentangled Transfer Learning for Visual Recognition
Minghao Fu, Ke Zhu, Jianxin Wu
Efficient Inference of Vision Instruction-Following Models with Elastic Cache
Zuyan Liu, Benlin Liu, Jiahui Wang et al.
PYRA: Parallel Yielding Re-Activation for Training-Inference Efficient Task Adaptation
Yizhe Xiong, Hui Chen, Tianxiang Hao et al.
Enhancing Diffusion Models with Text-Encoder Reinforcement Learning
Chaofeng Chen, Annan Wang, Haoning Wu et al.
ZeroRF: Fast Sparse View 360° Reconstruction with Zero Pretraining
Ruoxi Shi, Xinyue Wei, Cheng Wang et al.
Automatic Radiology Reports Generation via Memory Alignment Network
Hongyu Shen, Mingtao Pei, Juncai Liu et al.
Towards Natural Language-Guided Drones: GeoText-1652 Benchmark with Spatial Relation Matching
Meng Chu, Zhedong Zheng, Wei Ji et al.
Navigating Open Set Scenarios for Skeleton-Based Action Recognition
Kunyu Peng, Cheng Yin, Junwei Zheng et al.
eTag: Class-Incremental Learning via Embedding Distillation and Task-Oriented Generation
Libo Huang, Yan Zeng, Chuanguang Yang et al.
M&M VTO: Multi-Garment Virtual Try-On and Editing
Luyang Zhu, Yingwei Li, Nan Liu et al.
Do text-free diffusion models learn discriminative visual representations?
Soumik Mukhopadhyay, Matthew Gwilliam, Yosuke Yamaguchi et al.
ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation
Zhiyuan Ma, Yuxiang Wei, Yabin Zhang et al.
Trackastra: Transformer-based cell tracking for live-cell microscopy
Benjamin Gallusser, Martin Weigert
Beyond OOD State Actions: Supported Cross-Domain Offline Reinforcement Learning
Jinxin Liu, Ziqi Zhang, Zhenyu Wei et al.
Boosting Spike Camera Image Reconstruction from a Perspective of Dealing with Spike Fluctuations
Rui Zhao, Ruiqin Xiong, Jing Zhao et al.
SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation
Heyuan Li, Ce Chen, Tianhao Shi et al.
Generative Rendering: Controllable 4D-Guided Video Generation with 2D Diffusion Models
Shengqu Cai, Duygu Ceylan, Matheus Gadelha et al.
Text-Based Occluded Person Re-identification via Multi-Granularity Contrastive Consistency Learning
Xinyi Wu, Wentao Ma, Dan Guo et al.
GroundVLP: Harnessing Zero-Shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection
Haozhan Shen, Tiancheng Zhao, Mingwei Zhu et al.
FRED: Towards a Full Rotation-Equivariance in Aerial Image Object Detection
Chanho Lee, Jinsu Son, Hyounguk Shon et al.
SWinGS: Sliding Windows for Dynamic 3D Gaussian Splatting
Richard Shaw, Michal Nazarczuk, Jifei Song et al.
SasWOT: Real-Time Semantic Segmentation Architecture Search WithOut Training
Chendi Zhu, Lujun Li, Yuli Wu et al.
Learning to design protein-protein interactions with enhanced generalization
Anton Bushuiev, Roman Bushuiev, Petr Kouba et al.
MoDE: CLIP Data Experts via Clustering
Jiawei Ma, Po-Yao Huang, Saining Xie et al.
Curriculum Point Prompting for Weakly-Supervised Referring Image Segmentation
Qiyuan Dai, Sibei Yang
Efficient Deweather Mixture-of-Experts with Uncertainty-Aware Feature-Wise Linear Modulation
Rongyu Zhang, Yulin Luo, Jiaming Liu et al.
Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities
AJ Piergiovanni, Isaac Noble, Dahun Kim et al.
Empirical Analysis of Model Selection for Heterogeneous Causal Effect Estimation
Divyat Mahajan, Ioannis Mitliagkas, Brady Neal et al.
SAM-COD: SAM-guided Unified Framework for Weakly-Supervised Camouflaged Object Detection
Huafeng Chen, Pengxu Wei, Guangqian Guo et al.
CLIM: Contrastive Language-Image Mosaic for Region Representation
Size Wu, Wenwei Zhang, Lumin Xu et al.
Doubly Abductive Counterfactual Inference for Text-based Image Editing
Xue Song, Jiequan Cui, Hanwang Zhang et al.
Beyond the Contact: Discovering Comprehensive Affordance for 3D Objects from Pre-trained 2D Diffusion Models
Hyeonwoo Kim, Sookwan Han, Patrick Kwon et al.
Revisiting Plasticity in Visual Reinforcement Learning: Data, Modules and Training Stages
Guozheng Ma, Lu Li, Sen Zhang et al.
SignAvatars: A Large-scale 3D Sign Language Holistic Motion Dataset and Benchmark
Zhengdi Yu, Shaoli Huang, Yongkang Cheng et al.
Federated Generalized Category Discovery
Nan Pu, Wenjing Li, Xinyuan Ji et al.
Out-of-Distribution Detection in Long-Tailed Recognition with Calibrated Outlier Class Learning
Wenjun Miao, Guansong Pang, Xiao Bai et al.
Multi-Object Tracking in the Dark
Xinzhe Wang, Kang Ma, Qiankun Liu et al.
GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding
Chengyao Wang, Li Jiang, Xiaoyang Wu et al.
Learning Correlation Structures for Vision Transformers
Manjin Kim, Paul Hongsuck Seo, Cordelia Schmid et al.
Template Free Reconstruction of Human-object Interaction with Procedural Interaction Generation
Xianghui Xie, Bharat Lal Bhatnagar, Jan Lenssen et al.
Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA
Zhuowan Li, Bhavan Jasani, Peng Tang et al.
Mask2Map: Vectorized HD Map Construction Using Bird's Eye View Segmentation Masks
Sehwan Choi, Jun Won Choi, Jungho Kim et al.
RLIF: Interactive Imitation Learning as Reinforcement Learning
Jianlan Luo, Perry Dong, Yuexiang Zhai et al.
WateRF: Robust Watermarks in Radiance Fields for Protection of Copyrights
Youngdong Jang, Dong In Lee, MinHyuk Jang et al.
Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment
Muhammad Sohail Danish, Muhammad Haris Khan, Muhammad Akhtar Munir et al.
HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
Fucai Ke, Zhixi Cai, Simindokht Jahangard et al.
Contrastive Pre-Training with Multi-View Fusion for No-Reference Point Cloud Quality Assessment
Ziyu Shan, Yujie Zhang, Qi Yang et al.
Adapt Before Comparison: A New Perspective on Cross-Domain Few-Shot Segmentation
Jonas Herzog
Multi-Class Support Vector Machine with Maximizing Minimum Margin
Feiping Nie, Zhezheng Hao, Rong Wang
Generate Subgoal Images before Act: Unlocking the Chain-of-Thought Reasoning in Diffusion Model for Robot Manipulation with Multimodal Prompts
Fei Ni, Jianye Hao, Shiguang Wu et al.
ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning
Chen-Xiao Gao, Chenyang Wu, Mingjun Cao et al.
Motion2VecSets: 4D Latent Vector Set Diffusion for Non-rigid Shape Reconstruction and Tracking
Wei Cao, Chang Luo, Biao Zhang et al.
LLMGA: Multimodal Large Language Model based Generation Assistant
Bin Xia, Shiyin Wang, Yingfan Tao et al.
Entity-Centric Reinforcement Learning for Object Manipulation from Pixels
Dan Haramati, Tal Daniel, Aviv Tamar
Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection
Ting Lei, Shaofeng Yin, Yuxin Peng et al.
SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation
Junyan Ye, Qiyan Luo, Jinhua Yu et al.
TrojVLM: Backdoor Attack Against Vision Language Models
Weimin Lyu, Lu Pang, Tengfei Ma et al.