Most Cited 2024 "hierarchical interaction model" Papers
Conference
Improving Graph Contrastive Learning via Adaptive Positive Sampling
Jiaming Zhuo, Feiyang Qin, Can Cui et al.
Hearing Anything Anywhere
Mason Wang, Ryosuke Sawata, Samuel Clarke et al.
Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
Chen Zhao, Shuming Liu, Karttikeya Mangalam et al.
Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation
Hyunwoo Ryu, Jiwoo Kim, Hyunseok An et al.
BOTH2Hands: Inferring 3D Hands from Both Text Prompts and Body Dynamics
Wenqian Zhang, Molin Huang, Yuxuan Zhou et al.
Bayesian Exploration of Pre-trained Models for Low-shot Image Classification
Yibo Miao, Yu Lei, Feng Zhou et al.
Benchmarking the Robustness of Temporal Action Detection Models Against Temporal Corruptions
Runhao Zeng, Xiaoyong Chen, Jiaming Liang et al.
RepKPU: Point Cloud Upsampling with Kernel Point Representation and Deformation
Yi Rong, Haoran Zhou, Kang Xia et al.
4K4D: Real-Time 4D View Synthesis at 4K Resolution
Zhen Xu, Sida Peng, Haotong Lin et al.
Context-Guided Spatio-Temporal Video Grounding
Xin Gu, Heng Fan, Yan Huang et al.
TokenHMR: Advancing Human Mesh Recovery with a Tokenized Pose Representation
Sai Kumar Dwivedi, Yu Sun, Priyanka Patel et al.
Re-thinking Data Availability Attacks Against Deep Neural Networks
Bin Fang, Bo Li, Shuang Wu et al.
Logit Standardization in Knowledge Distillation
Shangquan Sun, Wenqi Ren, Jingzhi Li et al.
A Unified Approach for Text- and Image-guided 4D Scene Generation
Yufeng Zheng, Xueting Li, Koki Nagano et al.
CONFORM: Contrast is All You Need for High-Fidelity Text-to-Image Diffusion Models
Tuna Han Salih Meral, Enis Simsar, Federico Tombari et al.
SPECAT: SPatial-spEctral Cumulative-Attention Transformer for High-Resolution Hyperspectral Image Reconstruction
Zhiyang Yao, Shuyang Liu, Xiaoyun Yuan et al.
Video-Based Human Pose Regression via Decoupled Space-Time Aggregation
Jijie He, Wenwu Yang
Neural Refinement for Absolute Pose Regression with Feature Synthesis
Shuai Chen, Yash Bhalgat, Xinghui Li et al.
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
Jianyuan Wang, Nikita Karaev, Christian Rupprecht et al.
Boosting Image Restoration via Priors from Pre-trained Models
Xiaogang Xu, Shu Kong, Tao Hu et al.
CPP-Net: Embracing Multi-Scale Feature Fusion into Deep Unfolding CP-PPA Network for Compressive Sensing
Zhen Guo, Hongping Gan
GenFlow: Generalizable Recurrent Flow for 6D Pose Refinement of Novel Objects
Sungphill Moon, Hyeontae Son, Dongcheol Hur et al.
PKU-DyMVHumans: A Multi-View Video Benchmark for High-Fidelity Dynamic Human Modeling
Xiaoyun Zheng, Liwei Liao, Xufeng Li et al.
MeshGPT: Generating Triangle Meshes with Decoder-Only Transformers
Yawar Siddiqui, Antonio Alliegro, Alexey Artemov et al.
RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features
Geonho Bang, Kwangjin Choi, Jisong Kim et al.
Task-Conditioned Adaptation of Visual Features in Multi-Task Policy Learning
Pierre Marza, Laetitia Matignon, Olivier Simonin et al.
EasyDrag: Efficient Point-based Manipulation on Diffusion Models
Xingzhong Hou, Boxiao Liu, Yi Zhang et al.
Learned Lossless Image Compression based on Bit Plane Slicing
Zhe Zhang, Huairui Wang, Zhenzhong Chen et al.
BEM: Balanced and Entropy-based Mix for Long-Tailed Semi-Supervised Learning
Hongwei Zheng, Linyuan Zhou, Han Li et al.
Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement
Ziyu Wang, Yue Xu, Cewu Lu et al.
SC-Tune: Unleashing Self-Consistent Referential Comprehension in Large Vision Language Models
Tongtian Yue, Jie Cheng, Longteng Guo et al.
Frequency-Adaptive Dilated Convolution for Semantic Segmentation
Linwei Chen, Lin Gu, Dezhi Zheng et al.
TexTile: A Differentiable Metric for Texture Tileability
Carlos Rodriguez-Pardo, Dan Casas, Elena Garces et al.
MatSynth: A Modern PBR Materials Dataset
Giuseppe Vecchio, Valentin Deschaintre
Image Processing GNN: Breaking Rigidity in Super-Resolution
Yuchuan Tian, Hanting Chen, Chao Xu et al.
ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation
Suraj Patni, Aradhye Agarwal, Chetan Arora
Riemannian Multinomial Logistics Regression for SPD Neural Networks
Ziheng Chen, Yue Song, Gaowen Liu et al.
LED: A Large-scale Real-world Paired Dataset for Event Camera Denoising
Yuxing Duan
NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging
Takahiro Shirakawa, Seiichi Uchida
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM
Yutao Hu, Tianbin, Quanfeng Lu et al.
Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models
Daniel Geng, Inbum Park, Andrew Owens
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
Zhihao Yuan, Jinke Ren, Chun-Mei Feng et al.
Towards HDR and HFR Video from Rolling-Mixed-Bit Spikings
Yakun Chang, Yeliduosi Xiaokaiti, Yujia Liu et al.
Learn from View Correlation: An Anchor Enhancement Strategy for Multi-view Clustering
Suyuan Liu, Ke Liang, Zhibin Dong et al.
Passive Snapshot Coded Aperture Dual-Pixel RGB-D Imaging
Bhargav Ghanekar, Salman Siddique Khan, Pranav Sharma et al.
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
Honghui Yang, Sha Zhang, Di Huang et al.
ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations
Maitreya Patel, Changhoon Kim, Sheng Cheng et al.
Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning
Haoyu Chen, Wenbo Li, Jinjin Gu et al.
Neural Video Compression with Feature Modulation
Jiahao Li, Bin Li, Yan Lu
Nearest is Not Dearest: Towards Practical Defense against Quantization-conditioned Backdoor Attacks
Boheng Li, Yishuo Cai, Haowei Li et al.
Dual DETRs for Multi-Label Temporal Action Detection
Yuhan Zhu, Guozhen Zhang, Jing Tan et al.
Discriminative Probing and Tuning for Text-to-Image Generation
Leigang Qu, Wenjie Wang, Yongqi Li et al.
GigaTraj: Predicting Long-term Trajectories of Hundreds of Pedestrians in Gigapixel Complex Scenes
Haozhe Lin, Chunyu Wei, Li He et al.
Outdoor Scene Extrapolation with Hierarchical Generative Cellular Automata
Dongsu Zhang, Francis Williams, Žan Gojčič et al.
Comparing the Decision-Making Mechanisms by Transformers and CNNs via Explanation Methods
Mingqi Jiang, Saeed Khorram, Li Fuxin
Continual Segmentation with Disentangled Objectness Learning and Class Recognition
Yizheng Gong, Siyue Yu, Xiaoyang Wang et al.
Image Sculpting: Precise Object Editing with 3D Geometry Control
Jiraphon Yenphraphai, Xichen Pan, Sainan Liu et al.
Attribute-Guided Pedestrian Retrieval: Bridging Person Re-ID with Internal Attribute Variability
Yan Huang, Zhang Zhang, Qiang Wu et al.
Weakly Misalignment-free Adaptive Feature Alignment for UAVs-based Multimodal Object Detection
Chen Chen, Jiahao Qi, Xingyue Liu et al.
Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization
Deng Li, Aming Wu, Yaowei Wang et al.
EscherNet: A Generative Model for Scalable View Synthesis
Xin Kong, Shikun Liu, Xiaoyang Lyu et al.
MVCPS-NeuS: Multi-view Constrained Photometric Stereo for Neural Surface Reconstruction
Hiroaki Santo, Fumio Okura, Yasuyuki Matsushita
OHTA: One-shot Hand Avatar via Data-driven Implicit Priors
Xiaozheng Zheng, Chao Wen, Zhuo Su et al.
E-GPS: Explainable Geometry Problem Solving via Top-Down Solver and Bottom-Up Generator
Wenjun Wu, Lingling Zhang, Jun Liu et al.
MultiPhys: Multi-Person Physics-aware 3D Motion Estimation
Nicolás Ugrinovic, Boxiao Pan, Georgios Pavlakos et al.
LMDrive: Closed-Loop End-to-End Driving with Large Language Models
Hao Shao, Yuxuan Hu, Letian Wang et al.
ID-Blau: Image Deblurring by Implicit Diffusion-based reBLurring AUgmentation
Jia-Hao Wu, Fu-Jen Tsai, Yan-Tsung Peng et al.
GauHuman: Articulated Gaussian Splatting from Monocular Human Videos
Shoukang Hu, Tao Hu, Ziwei Liu
BEVNeXt: Reviving Dense BEV Frameworks for 3D Object Detection
Zhenxin Li, Shiyi Lan, Jose M. Alvarez et al.
AnySkill: Learning Open-Vocabulary Physical Skill for Interactive Agents
Jieming Cui, Tengyu Liu, Nian Liu et al.
HumanNeRF-SE: A Simple yet Effective Approach to Animate HumanNeRF with Diverse Poses
Caoyuan Ma, Yu-Lun Liu, Zhixiang Wang et al.
SurMo: Surface-based 4D Motion Modeling for Dynamic Human Rendering
Tao Hu, Fangzhou Hong, Ziwei Liu
LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model
Chenjie Cao, Yunuo Cai, Qiaole Dong et al.
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation
Xiaoqi Li, Mingxu Zhang, Yiran Geng et al.
Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation
Wenxuan Wang, Tongtian Yue, Yisi Zhang et al.
PanoPose: Self-supervised Relative Pose Estimation for Panoramic Images
Diantao Tu, Hainan Cui, Xianwei Zheng et al.
Mask4Align: Aligned Entity Prompting with Color Masks for Multi-Entity Localization Problems
Haoquan Zhang, Ronggang Huang, Yi Xie et al.
Global and Local Prompts Cooperation via Optimal Transport for Federated Learning
Hongxia Li, Wei Huang, Jingya Wang et al.
VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning
Ziyang Luo, Nian Liu, Wangbo Zhao et al.
Dense Optical Tracking: Connecting the Dots
Guillaume Le Moing, Jean Ponce, Cordelia Schmid
Multi-agent Collaborative Perception via Motion-aware Robust Communication Network
Shixin Hong, Yu Liu, Zhi Li et al.
Ungeneralizable Examples
Jingwen Ye, Xinchao Wang
Language-only Training of Zero-shot Composed Image Retrieval
Geonmo Gu, Sanghyuk Chun, Wonjae Kim et al.
Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models
Shitian Zhao, Zhuowan Li, Yadong Lu et al.
Rapid Motor Adaptation for Robotic Manipulator Arms
Yichao Liang, Kevin Ellis, João F. Henriques
Instruct-Imagen: Image Generation with Multi-modal Instruction
Hexiang Hu, Kelvin C.K. Chan, Yu-Chuan Su et al.
Diffeomorphic Template Registration for Atmospheric Turbulence Mitigation
Dong Lao, Congli Wang, Alex Wong et al.
Adapting to Length Shift: FlexiLength Network for Trajectory Prediction
Yi Xu, Yun Fu
CausalPC: Improving the Robustness of Point Cloud Classification by Causal Effect Identification
Yuanmin Huang, Mi Zhang, Daizong Ding et al.
LiSA: LiDAR Localization with Semantic Awareness
Bochun Yang, Zijun Li, Wen Li et al.
Rethinking Prior Information Generation with CLIP for Few-Shot Segmentation
Jin Wang, Bingfeng Zhang, Jian Pang et al.
Adaptive Slot Attention: Object Discovery with Dynamic Slot Number
Ke Fan, Zechen Bai, Tianjun Xiao et al.
Learning Coupled Dictionaries from Unpaired Data for Image Super-Resolution
Longguang Wang, Juncheng Li, Yingqian Wang et al.
C3: High-Performance and Low-Complexity Neural Compression from a Single Image or Video
Hyunjik Kim, Matthias Bauer, Lucas Theis et al.
AttriHuman-3D: Editable 3D Human Avatar Generation with Attribute Decomposition and Indexing
Fan Yang, Tianyi Chen, Xiaosheng He et al.
iToF-flow-based High Frame Rate Depth Imaging
Yu Meng, Zhou Xue, Xu Chang et al.
Rethinking Human Motion Prediction with Symplectic Integral
Haipeng Chen, Kedi Lyu, Zhenguang Liu et al.
Detector-Free Structure from Motion
Xingyi He, Jiaming Sun, Yifan Wang et al.
Holodeck: Language Guided Generation of 3D Embodied AI Environments
Yue Yang, Fan-Yun Sun, Luca Weihs et al.
DiVAS: Video and Audio Synchronization with Dynamic Frame Rates
Clara Maria Fernandez Labrador, Mertcan Akcay, Eitan Abecassis et al.
Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos
Chen Liu, Peike Li, Qingtao Yu et al.
Inter-X: Towards Versatile Human-Human Interaction Analysis
Liang Xu, Xintao Lv, Yichao Yan et al.
Embodied Multi-Modal Agent trained by an LLM from a Parallel TextWorld
Yijun Yang, Tianyi Zhou, Kanxue Li et al.
One-Shot Open Affordance Learning with Foundation Models
Gen Li, Deqing Sun, Laura Sevilla-Lara et al.
SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks
Xinyu Shi, Zecheng Hao, Zhaofei Yu
Tactile-Augmented Radiance Fields
Yiming Dou, Fengyu Yang, Yi Liu et al.
Mean-Shift Feature Transformer
Takumi Kobayashi
Consistent Prompting for Rehearsal-Free Continual Learning
Zhanxin Gao, Jun Cen, Xiaobin Chang
KVQ: Kwai Video Quality Assessment for Short-form Videos
Yiting Lu, Xin Li, Yajing Pei et al.
Purified and Unified Steganographic Network
GuoBiao Li, Sheng Li, Zicong Luo et al.
Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model
Dian Zheng, Xiao-Ming Wu, Shuzhou Yang et al.
Fast Adaptation for Human Pose Estimation via Meta-Optimization
Shengxiang Hu, Huaijiang Sun, Bin Li et al.
Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection
Jiawen Zhu, Choubo Ding, Yu Tian et al.
L4D-Track: Language-to-4D Modeling Towards 6-DoF Tracking and Shape Reconstruction in 3D Point Cloud Stream
Jingtao Sun, Yaonan Wang, Mingtao Feng et al.
MAPSeg: Unified Unsupervised Domain Adaptation for Heterogeneous Medical Image Segmentation Based on 3D Masked Autoencoding and Pseudo-Labeling
Xuzhe Zhang, Yuhao Wu, Elsa Angelini et al.
IBD-SLAM: Learning Image-Based Depth Fusion for Generalizable SLAM
Minghao Yin, Shangzhe Wu, Kai Han
Segment Any Event Streams via Weighted Adaptation of Pivotal Tokens
Zhiwen Chen, Zhiyu Zhu, Yifan Zhang et al.
Boosting Image Quality Assessment through Efficient Transformer Adaptation with Local Feature Enhancement
Kangmin Xu, Liang Liao, Jing Xiao et al.
Learning without Exact Guidance: Updating Large-scale High-resolution Land Cover Maps from Low-resolution Historical Labels
Zhuohong Li, Wei He, Jiepan Li et al.
Multi-Modal Hallucination Control by Visual Information Grounding
Alessandro Favero, Luca Zancato, Matthew Trager et al.
Exploring Orthogonality in Open World Object Detection
Zhicheng Sun, Jinghan Li, Yadong Mu
DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing
Jia-Wei Liu, Yan-Pei Cao, Jay Zhangjie Wu et al.
SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer
Rui Zhu, Yingwei Pan, Yehao Li et al.
Traffic Scene Parsing through the TSP6K Dataset
Peng-Tao Jiang, Yuqi Yang, Yang Cao et al.
KPConvX: Modernizing Kernel Point Convolution with Kernel Attention
Hugues Thomas, Yao-Hung Hubert Tsai, Timothy Barfoot et al.
Latency Correction for Event-guided Deblurring and Frame Interpolation
Yixin Yang, Jinxiu Liang, Bohan Yu et al.
DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Muyang Li, Tianle Cai, Jiaxin Cao et al.
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
Juhong Min, Shyamal Buch, Arsha Nagrani et al.
DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models
Nastaran Saadati, Minh Pham, Nasla Saleem et al.
NTO3D: Neural Target Object 3D Reconstruction with Segment Anything
Xiaobao Wei, Renrui Zhang, Jiarui Wu et al.
Text-Driven Image Editing via Learnable Regions
Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai et al.
ZERO-IG: Zero-Shot Illumination-Guided Joint Denoising and Adaptive Enhancement for Low-Light Images
Yiqi Shi, Duo Liu, Liguo Zhang et al.
Self-Supervised Representation Learning from Arbitrary Scenarios
Zhaowen Li, Yousong Zhu, Zhiyang Chen et al.
Rethinking Multi-domain Generalization with A General Learning Objective
Zhaorui Tan, Xi Yang, Kaizhu Huang
Adversarial Distillation Based on Slack Matching and Attribution Region Alignment
Shenglin Yin, Zhen Xiao, Mingxuan Song et al.
The Neglected Tails in Vision-Language Models
Shubham Parashar, Tian Liu, Zhiqiu Lin et al.
Multi-View Attentive Contextualization for Multi-View 3D Object Detection
Xianpeng Liu, Ce Zheng, Ming Qian et al.
SODA: Bottleneck Diffusion Models for Representation Learning
Drew Hudson, Daniel Zoran, Mateusz Malinowski et al.
AHIVE: Anatomy-aware Hierarchical Vision Encoding for Interactive Radiology Report Retrieval
Sixing Yan, William K. Cheung, Ivor Tsang et al.
SPU-PMD: Self-Supervised Point Cloud Upsampling via Progressive Mesh Deformation
Yanzhe Liu, Rong Chen, Yushi Li et al.
Enhancing the Power of OOD Detection via Sample-Aware Model Selection
Feng Xue, Zi He, Yuan Zhang et al.
Self-Supervised Class-Agnostic Motion Prediction with Spatial and Temporal Consistency Regularizations
Kewei Wang, Yizheng Wu, Jun Cen et al.
MMA: Multi-Modal Adapter for Vision-Language Models
Lingxiao Yang, Ru-Yuan Zhang, Yanchen Wang et al.
Grounding and Enhancing Grid-based Models for Neural Fields
Zelin Zhao, Fenglei Fan, Wenlong Liao et al.
A Category Agnostic Model for Visual Rearrangement
Yuyi Liu, Xinhang Song, Weijie Li et al.
Towards More Unified In-context Visual Understanding
Dianmo Sheng, Dongdong Chen, Zhentao Tan et al.
Towards Progressive Multi-Frequency Representation for Image Warping
Jun Xiao, Zihang Lyu, Cong Zhang et al.
Fourier-basis Functions to Bridge Augmentation Gap: Rethinking Frequency Augmentation in Image Classification
Mei Vaish, Shunxin Wang, Nicola Strisciuglio
VastGaussian: Vast 3D Gaussians for Large Scene Reconstruction
Jiaqi Lin, Zhihao Li, Xiao Tang et al.
What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation
Yihua Cheng, Yaning Zhu, Zongji Wang et al.
Molecular Data Programming: Towards Molecule Pseudo-labeling with Systematic Weak Supervision
Xin Juan, Kaixiong Zhou, Ninghao Liu et al.
OTE: Exploring Accurate Scene Text Recognition Using One Token
Jianjun Xu, Yuxin Wang, Hongtao Xie et al.
TTA-EVF: Test-Time Adaptation for Event-based Video Frame Interpolation via Reliable Pixel and Sample Estimation
Hoonhee Cho, Taewoo Kim, Yuhwan Jeong et al.
HUGS: Human Gaussian Splats
Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel et al.
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
Yushi Hu, Otilia Stretcu, Chun-Ta Lu et al.
Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera
Jiye Lee, Hanbyul Joo
DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses
Chen Zhao, Tong Zhang, Zheng Dang et al.
AVID: Any-Length Video Inpainting with Diffusion Model
Zhixing Zhang, Bichen Wu, Xiaoyan Wang et al.
Hyper-MD: Mesh Denoising with Customized Parameters Aware of Noise Intensity and Geometric Characteristics
Xingtao Wang, Hongliang Wei, Xiaopeng Fan et al.
KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation
Jihua Peng, Yanghong Zhou, Tracy P Y Mok
MFP: Making Full Use of Probability Maps for Interactive Image Segmentation
Chaewon Lee, Seon-Ho Lee, Chang-Su Kim
Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining
Jiahao Nie, Yun Xing, Gongjie Zhang et al.
Strong Transferable Adversarial Attacks via Ensembled Asymptotically Normal Distribution Learning
Zhengwei Fang, Rui Wang, Tao Huang et al.
An Empirical Study of Scaling Law for Scene Text Recognition
Miao Rang, Zhenni Bi, Chuanjian Liu et al.
Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching
Xianqi Wang, Gangwei Xu, Hao Jia et al.
When StyleGAN Meets Stable Diffusion: a W+ Adapter for Personalized Image Generation
Xiaoming Li, Xinyu Hou, Chen Change Loy
Differentiable Neural Surface Refinement for Modeling Transparent Objects
Weijian Deng, Dylan Campbell, Chunyi Sun et al.
Low-power Continuous Remote Behavioral Localization with Event Cameras
Friedhelm Hamann, Suman Ghosh, Ignacio Juarez Martinez et al.
Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting
Taeho Kang, Youngki Lee
AM-RADIO: Agglomerative Vision Foundation Model Reduce All Domains Into One
Mike Ranzinger, Greg Heinrich, Jan Kautz et al.
Towards Co-Evaluation of Cameras, HDR, and Algorithms for Industrial-Grade 6DoF Pose Estimation
Agastya Kalra, Guy Stoppi, Dmitrii Marin et al.
Tune-An-Ellipse: CLIP Has Potential to Find What You Want
Jinheng Xie, Songhe Deng, Bing Li et al.
BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model
Yiran Song, Qianyu Zhou, Xiangtai Li et al.
EarthLoc: Astronaut Photography Localization by Indexing Earth from Space
Gabriele Berton, Alex Stoken, Barbara Caputo et al.
PairDETR: Joint Detection and Association of Human Bodies and Faces
Ammar Ali, Georgii Gaikov, Denis Rybalchenko et al.
Close Imitation of Expert Retouching for Black-and-White Photography
Seunghyun Shin, Jisu Shin, Jihwan Bae et al.
OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos
Dongyoung Choi, Hyeonjoong Jang, Min H. Kim
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
Kai Yang, Jian Tao, Jiafei Lyu et al.
Reconstructing Hands in 3D with Transformers
Georgios Pavlakos, Dandan Shan, Ilija Radosavovic et al.
XFeat: Accelerated Features for Lightweight Image Matching
Guilherme Potje, Felipe Cadar, André Araujo et al.
Systematic Comparison of Semi-supervised and Self-supervised Learning for Medical Image Classification
Zhe Huang, Ruijie Jiang, Shuchin Aeron et al.
GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation
Weiming Zhang, Yexin Liu, Xu Zheng et al.
VRP-SAM: SAM with Visual Reference Prompt
Yanpeng Sun, Jiahui Chen, Shan Zhang et al.
DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis
Jiapeng Tang, Yinyu Nie, Lev Markhasin et al.
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh, Chih-Wei Wu, Iroro Orife et al.
YOLO-World: Real-Time Open-Vocabulary Object Detection
Tianheng Cheng, Lin Song, Yixiao Ge et al.
Bézier Everywhere All at Once: Learning Drivable Lanes as Bézier Graphs
Hugh Blayney, Hanlin Tian, Hamish Scott et al.
Taming Self-Training for Open-Vocabulary Object Detection
Shiyu Zhao, Samuel Schulter, Long Zhao et al.
Ink Dot-Oriented Differentiable Optimization for Neural Image Halftoning
Hao Jiang, Bingfeng Zhou, Yadong Mu
GeoChat: Grounded Large Vision-Language Model for Remote Sensing
Kartik Kuckreja, Muhammad Sohail Danish, Muzammal Naseer et al.
FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Action Segmentation
Zijia Lu, Ehsan Elhamifar
GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
Liangxiao Hu, Hongwen Zhang, Yuxiang Zhang et al.
ShapeMatcher: Self-Supervised Joint Shape Canonicalization, Segmentation, Retrieval and Deformation
Yan Di, Chenyangguang Zhang, Chaowei Wang et al.
Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning
Rui Li, Tobias Fischer, Mattia Segu et al.
SVDTree: Semantic Voxel Diffusion for Single Image Tree Reconstruction
Yuan Li, Zhihao Liu, Bedrich Benes et al.
Patch2Self2: Self-supervised Denoising on Coresets via Matrix Sketching
Shreyas Fadnavis, Agniva Chowdhury, Joshua Batson et al.
FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
Ganggui Ding, Canyu Zhao, Wen Wang et al.
Generative Unlearning for Any Identity
Juwon Seo, Sung-Hoon Lee, Tae-Young Lee et al.
Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange
Yanhao Wu, Tong Zhang, Wei Ke et al.
Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition
Zihan Wang, Siyang Song, Cheng Luo et al.