Most Cited 2024 "interpretable prediction networks" Papers
12,324 papers found • Page 55 of 62
Conference
A Stealthy Wrongdoer: Feature-Oriented Reconstruction Attack against Split Learning
Xiaoyang Xu, Mengda Yang, Wenzhe Yi et al.
Bootstrapping SparseFormers from Vision Foundation Models
Ziteng Gao, Zhan Tong, Kevin Qinghong Lin et al.
Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity
Ruijie Quan, Wenguan Wang, Zhibo Tian et al.
G-NeRF: Geometry-enhanced Novel View Synthesis from Single-View Images
Zixiong Huang, Qi Chen, Libo Sun et al.
Active Prompt Learning in Vision Language Models
Jihwan Bang, Sumyeong Ahn, Jae-Gil Lee
Generating Handwritten Mathematical Expressions From Symbol Graphs: An End-to-End Pipeline
Yu chen, Fei Gao, YanguangZhang et al.
On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation
Agneet Chatterjee, Tejas Gokhale, Chitta Baral et al.
SingularTrajectory: Universal Trajectory Predictor Using Diffusion Model
Inhwan Bae, Young-Jae Park, Hae-Gon Jeon
Domain Separation Graph Neural Networks for Saliency Object Ranking
Zijian Wu, Jun Lu, Jing Han et al.
Solving the Catastrophic Forgetting Problem in Generalized Category Discovery
Xinzi Cao, Xiawu Zheng, Guanhong Wang et al.
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving
Yuqi Wang, Jiawei He, Lue Fan et al.
WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
Changhoon Kim, Kyle Min, Maitreya Patel et al.
MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images
Junwen Huang, Hao Yu, Kuan-Ting Yu et al.
Resource-Efficient Transformer Pruning for Finetuning of Large Models
Fatih Ilhan, Gong Su, Selim Tekin et al.
Link-Context Learning for Multimodal LLMs
Yan Tai, Weichen Fan, Zhao Zhang et al.
The Manga Whisperer: Automatically Generating Transcriptions for Comics
Ragav Sachdeva, Andrew Zisserman
Deep-TROJ: An Inference Stage Trojan Insertion Algorithm through Efficient Weight Replacement Attack
Sabbir Ahmed, RANYANG ZHOU, Shaahin Angizi et al.
Dynamic LiDAR Re-simulation using Compositional Neural Fields
Hanfeng Wu, Xingxing Zuo, Stefan Leutenegger et al.
Language-aware Visual Semantic Distillation for Video Question Answering
Bo Zou, Chao Yang, Yu Qiao et al.
3DInAction: Understanding Human Actions in 3D Point Clouds
Yizhak Ben-Shabat, Oren Shrout, Stephen Gould
DiLiGenRT: A Photometric Stereo Dataset with Quantified Roughness and Translucency
Heng Guo, Jieji Ren, Feishi Wang et al.
StyLitGAN: Image-Based Relighting via Latent Control
Anand Bhattad, James Soole, David Forsyth
Label-Efficient Group Robustness via Out-of-Distribution Concept Curation
Yiwei Yang, Anthony Liu, Robert Wolfe et al.
Unsupervised Universal Image Segmentation
XuDong Wang, Dantong Niu, Xinyang Han et al.
Batch Normalization Alleviates the Spectral Bias in Coordinate Networks
Zhicheng Cai, Hao Zhu, Qiu Shen et al.
Not All Classes Stand on Same Embeddings: Calibrating a Semantic Distance with Metric Tensor
Jae Hyeon Park, Gyoomin Lee, Seunggi Park et al.
CodedEvents: Optimal Point-Spread-Function Engineering for 3D-Tracking with Event Cameras
Sachin Shah, Matthew Chan, Haoming Cai et al.
Animatable Gaussians: Learning Pose-dependent Gaussian Maps for High-fidelity Human Avatar Modeling
Zhe Li, Zerong Zheng, Lizhen Wang et al.
Dynamic Cues-Assisted Transformer for Robust Point Cloud Registration
Hong Chen, Pei Yan, sihe xiang et al.
Retrieval-Augmented Open-Vocabulary Object Detection
Jooyeon Kim, Eulrang Cho, Sehyung Kim et al.
NB-GTR: Narrow-Band Guided Turbulence Removal
Yifei Xia, Chu Zhou, Chengxuan Zhu et al.
LangSplat: 3D Language Gaussian Splatting
Minghan Qin, Wanhua Li, Jiawei ZHOU et al.
Positive-Unlabeled Learning by Latent Group-Aware Meta Disambiguation
Lin Long, Haobo Wang, Zhijie Jiang et al.
Text-conditional Attribute Alignment across Latent Spaces for 3D Controllable Face Image Synthesis
FeiFan Xu, Rui Li, Si Wu et al.
SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering
Antoine Guédon, Vincent Lepetit
DiffusionPoser: Real-time Human Motion Reconstruction From Arbitrary Sparse Sensors Using Autoregressive Diffusion
Tom Van Wouwe, Seunghwan Lee, Antoine Falisse et al.
HumanRef: Single Image to 3D Human Generation via Reference-Guided Diffusion
Jingbo Zhang, Xiaoyu Li, Qi Zhang et al.
CurveCloudNet: Processing Point Clouds with 1D Structure
Colton Stearns, Alex Fu, Jiateng Liu et al.
Harnessing Meta-Learning for Improving Full-Frame Video Stabilization
Muhammad Kashif Ali, Eun Woo Im, Dongjin Kim et al.
Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving
Junhao Zheng, Chenhao Lin, Jiahao Sun et al.
SeaBird: Segmentation in Bird’s View with Dice Loss Improves Monocular 3D Detection of Large Objects
Abhinav Kumar, Yuliang Guo, Xinyu Huang et al.
MoML: Online Meta Adaptation for 3D Human Motion Prediction
Xiaoning Sun, Huaijiang Sun, Bin Li et al.
Learning with Structural Labels for Learning with Noisy Labels
Noo-ri Kim, Jin-Seop Lee, Jee-Hyong Lee
What If the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-modal Language Models
Letian Zhang, Xiaotong Zhai, Zhongkai Zhao et al.
Incremental Nuclei Segmentation from Histopathological Images via Future-class Awareness and Compatibility-inspired Distillation
Huyong Wang, Huisi Wu, Jing Qin
Model Inversion Robustness: Can Transfer Learning Help?
Sy-Tuyen Ho, Koh Jun Hao, Keshigeyan Chandrasegaran et al.
Scene-adaptive and Region-aware Multi-modal Prompt for Open Vocabulary Object Detection
Xiaowei Zhao, Xianglong Liu, Duorui Wang et al.
InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models
Jiun Tian Hoe, Xudong Jiang, Chee Seng Chan et al.
MAP: MAsk-Pruning for Source-Free Model Intellectual Property Protection
Boyang Peng, Sanqing Qu, Yong Wu et al.
Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis
Xin Zhou, Dingkang Liang, Wei Xu et al.
EMCAD: Efficient Multi-scale Convolutional Attention Decoding for Medical Image Segmentation
Md Mostafijur Rahman, Mustafa Munir, Radu Marculescu
On Exact Inversion of DPM-Solvers
Seongmin Hong, Kyeonghyun Lee, Suh Yoon Jeon et al.
Generate Like Experts: Multi-Stage Font Generation by Incorporating Font Transfer Process into Diffusion Models
Bin Fu, Fanghua Yu, Anran Liu et al.
A Unified Diffusion Framework for Scene-aware Human Motion Estimation from Sparse Signals
Jiangnan Tang, Jingya Wang, Kaiyang Ji et al.
MaskCLR: Attention-Guided Contrastive Learning for Robust Action Representation Learning
Mohamed Abdelfattah, Mariam Hassan, Alex Alahi
D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection
Dinh Phat Do, Taehoon Kim, JAEMIN NA et al.
MAGICK: A Large-scale Captioned Dataset from Matting Generated Images using Chroma Keying
Ryan Burgert, Brian Price, Jason Kuen et al.
Intrinsic Image Diffusion for Indoor Single-view Material Estimation
Peter Kocsis, Vincent Sitzmann, Matthias Nießner
Prompt Highlighter: Interactive Control for Multi-Modal LLMs
Yuechen Zhang, Shengju Qian, Bohao Peng et al.
Can Protective Perturbation Safeguard Personal Data from Being Exploited by Stable Diffusion?
Zhengyue Zhao, Jinhao Duan, Kaidi Xu et al.
NetTrack: Tracking Highly Dynamic Objects with a Net
Guangze Zheng, Shijie Lin, Haobo Zuo et al.
Scaling Up Video Summarization Pretraining with Large Language Models
Dawit Argaw Argaw, Seunghyun Yoon, Fabian Caba Heilbron et al.
Video Recognition in Portrait Mode
Mingfei Han, Linjie Yang, Xiaojie Jin et al.
Online Task-Free Continual Generative and Discriminative Learning via Dynamic Cluster Memory
飞 叶, Adrian Bors
FADES: Fair Disentanglement with Sensitive Relevance
Taeuk Jang, Xiaoqian Wang
Versatile Navigation Under Partial Observability via Value-guided Diffusion Policy
Gengyu Zhang, Hao Tang, Yan Yan
Improving Depth Completion via Depth Feature Upsampling
Yufei Wang, Ge Zhang, Shaoqian Wang et al.
Deep Single Image Camera Calibration by Heatmap Regression to Recover Fisheye Images Under Manhattan World Assumption
Nobuhiko Wakai, Satoshi Sato, Yasunori Ishii et al.
MRFS: Mutually Reinforcing Image Fusion and Segmentation
HAO ZHANG, Xuhui Zuo, Jie Jiang et al.
Multi-agent Long-term 3D Human Pose Forecasting via Interaction-aware Trajectory Conditioning
Jaewoo Jeong, Daehee Park, Kuk-Jin Yoon
OrCo: Towards Better Generalization via Orthogonality and Contrast for Few-Shot Class-Incremental Learning
Noor Ahmed, Anna Kukleva, Bernt Schiele
3D-LFM: Lifting Foundation Model
Mosam Dabhi, László A. Jeni, Simon Lucey
LASIL: Learner-Aware Supervised Imitation Learning For Long-term Microscopic Traffic Simulation
Ke Guo, Zhenwei Miao, Wei Jing et al.
HybridNeRF: Efficient Neural Rendering via Adaptive Volumetric Surfaces
Haithem Turki, Vasu Agrawal, Samuel Rota Bulò et al.
IIRP-Net: Iterative Inference Residual Pyramid Network for Enhanced Image Registration
Tai Ma, zhangsuwei, Jiafeng Li et al.
SEED-Bench: Benchmarking Multimodal Large Language Models
Bohao Li, Yuying Ge, Yixiao Ge et al.
Style Aligned Image Generation via Shared Attention
Amir Hertz, Andrey Voynov, Shlomi Fruchter et al.
NeRFDeformer: NeRF Transformation from a Single View via 3D Scene Flows
Zhenggang Tang, Jason Ren, Xiaoming Zhao et al.
BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models
Fengyuan Shi, Jiaxi Gu, Hang Xu et al.
Active Domain Adaptation with False Negative Prediction for Object Detection
Yuzuru Nakamura, Yasunori Ishii, Takayoshi Yamashita
LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content
Qihao Zhao, Yalun Dai, Hao Li et al.
How to Train Neural Field Representations: A Comprehensive Study and Benchmark
Samuele Papa, Riccardo Valperga, David Knigge et al.
Semantic-Aware Multi-Label Adversarial Attacks
Hassan Mahmood, Ehsan Elhamifar
Motion-adaptive Separable Collaborative Filters for Blind Motion Deblurring
Chengxu Liu, Xuan Wang, Xiangyu Xu et al.
SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation
Zhixuan Liu, Peter Schaldenbrand, Beverley-Claire Okogwu et al.
CoralSCOP: Segment any COral Image on this Planet
Zheng Ziqiang, Liang Haixin, Binh-Son Hua et al.
Reg-PTQ: Regression-specialized Post-training Quantization for Fully Quantized Object Detector
Yifu Ding, Weilun Feng, Chuyan Chen et al.
FREE: Faster and Better Data-Free Meta-Learning
Yongxian Wei, Zixuan Hu, Zhenyi Wang et al.
Open Vocabulary Semantic Scene Sketch Understanding
Ahmed Bourouis, Judith Fan, Yulia Gryaditskaya
You Only Need Less Attention at Each Stage in Vision Transformers
Shuoxi Zhang, Hanpeng Liu, Stephen Lin et al.
Hierarchical Patch Diffusion Models for High-Resolution Video Generation
Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin et al.
Rethinking the Up-Sampling Operations in CNN-based Generative Network for Generalizable Deepfake Detection
Chuangchuang Tan, Huan Liu, Yao Zhao et al.
Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception
Haoming Chen, Zhizhong Zhang, Yanyun Qu et al.
BoQ: A Place is Worth a Bag of Learnable Queries
Amar Ali-bey, Brahim Chaib-draa, Philippe Giguère
UFC-Net: Unrolling Fixed-point Continuous Network for Deep Compressive Sensing
Xiaoyang Wang, Hongping Gan
Symphonize 3D Semantic Scene Completion with Contextual Instance Queries
Haoyi Jiang, Tianheng Cheng, Naiyu Gao et al.
CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment
Sajid Javed, Arif Mahmood, IYYAKUTTI IYAPPAN GANAPATHI et al.
MaskPLAN: Masked Generative Layout Planning from Partial Input
Hang Zhang, Anton Savov, Benjamin Dillenburger
Solving Masked Jigsaw Puzzles with Diffusion Vision Transformers
Jinyang Liu, Wondmgezahu Teshome, Sandesh Ghimire et al.
Towards Memorization-Free Diffusion Models
Chen Chen, Daochang Liu, Chang Xu
AV-RIR: Audio-Visual Room Impulse Response Estimation
Anton Ratnarajah, Sreyan Ghosh, Sonal Kumar et al.
Entangled View-Epipolar Information Aggregation for Generalizable Neural Radiance Fields
Zhiyuan Min, Yawei Luo, Wei Yang et al.
A-Teacher: Asymmetric Network for 3D Semi-Supervised Object Detection
Hanshi Wang, Zhipeng Zhang, Jin Gao et al.
HanDiffuser: Text-to-Image Generation With Realistic Hand Appearances
Supreeth Narasimhaswamy, Uttaran Bhattacharya, Xiang Chen et al.
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Bin Xiao, Haiping Wu, Weijian Xu et al.
DMR: Decomposed Multi-Modality Representations for Frames and Events Fusion in Visual Reinforcement Learning
Haoran Xu, Peixi Peng, Guang Tan et al.
3D Feature Tracking via Event Camera
Siqi Li, Zhou Zhikuan, Zhou Xue et al.
Frequency-aware Event-based Video Deblurring for Real-World Motion Blur
Taewoo Kim, Hoonhee Cho, Kuk-Jin Yoon
FedHCA2: Towards Hetero-Client Federated Multi-Task Learning
Yuxiang Lu, Suizhi Huang, Yuwen Yang et al.
Improving Unsupervised Hierarchical Representation with Reinforcement Learning
Ruyi An, Yewen Li, Xu He et al.
Global Latent Neural Rendering
Thomas Tanay, Matteo Maggioni
Data Poisoning based Backdoor Attacks to Contrastive Learning
Jinghuai Zhang, Hongbin Liu, Jinyuan Jia et al.
RoHM: Robust Human Motion Reconstruction via Diffusion
Siwei Zhang, Bharat Lal Bhatnagar, Yuanlu Xu et al.
SportsHHI: A Dataset for Human-Human Interaction Detection in Sports Videos
Tao Wu, Runyu He, Gangshan Wu et al.
Classes Are Not Equal: An Empirical Study on Image Recognition Fairness
Jiequan Cui, Beier Zhu, Xin Wen et al.
ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object
Chenshuang Zhang, Fei Pan, Junmo Kim et al.
BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition
Yuxuan Zhou, Xudong Yan, Zhi-Qi Cheng et al.
Dynamic Inertial Poser (DynaIP): Part-Based Motion Dynamics Learning for Enhanced Human Pose Estimation with Sparse Inertial Sensors
Yu Zhang, Songpengcheng Xia, Lei Chu et al.
Person-in-WiFi 3D: End-to-End Multi-Person 3D Pose Estimation with Wi-Fi
Kangwei Yan, Fei Wang, Bo Qian et al.
ERMVP: Communication-Efficient and Collaboration-Robust Multi-Vehicle Perception in Challenging Environments
Jingyu Zhang, Kun Yang, Yilei Wang et al.
GRAM: Global Reasoning for Multi-Page VQA
Itshak Blau, Sharon Fogel, Roi Ronen et al.
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data
Qifan Yu, Juncheng Li, Longhui Wei et al.
Tri-Perspective View Decomposition for Geometry-Aware Depth Completion
Zhiqiang Yan, Yuankai Lin, Kun Wang et al.
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
Haokun Lin, Haoli Bai, Zhili Liu et al.
DiffusionRegPose: Enhancing Multi-Person Pose Estimation using a Diffusion-Based End-to-End Regression Approach
Dayi Tan, Hansheng Chen, Wei Tian et al.
Tumor Micro-environment Interactions Guided Graph Learning for Survival Analysis of Human Cancers from Whole-slide Pathological Images
WEI SHAO, YangYang Shi, Daoqiang Zhang et al.
Perception-Oriented Video Frame Interpolation via Asymmetric Blending
Guangyang Wu, Xin Tao, Changlin Li et al.
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
Qi Yang, Xing Nie, Tong Li et al.
Exact Fusion via Feature Distribution Matching for Few-shot Image Generation
Yingbo Zhou, Yutong Ye, Pengyu Zhang et al.
Fooling Polarization-Based Vision using Locally Controllable Polarizing Projection
Zhuoxiao Li, Zhihang Zhong, Shohei Nobuhara et al.
Affine Equivariant Networks Based on Differential Invariants
Yikang Li, Yeqing Qiu, Yuxuan Chen et al.
Diffusion-based Blind Text Image Super-Resolution
Yuzhe Zhang, jiawei zhang, Hao Li et al.
Improving Generalized Zero-Shot Learning by Exploring the Diverse Semantics from External Class Names
Yapeng Li, Yong Luo, Zengmao Wang et al.
Continual Learning for Motion Prediction Model via Meta-Representation Learning and Optimal Memory Buffer Retention Strategy
Dae Jun Kang, Dongsuk Kum, Sanmin Kim
FlowDiffuser: Advancing Optical Flow Estimation with Diffusion Models
Ao Luo, XIN LI, Fan Yang et al.
3DGS-Avatar: Animatable Avatars via Deformable 3D Gaussian Splatting
Zhiyin Qian, Shaofei Wang, Marko Mihajlovic et al.
Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation
Haofeng Liu, Chenshu Xu, Yifei Yang et al.
AdaRevD: Adaptive Patch Exiting Reversible Decoder Pushes the Limit of Image Deblurring
Xintian Mao, Xiwen Gao, Yan Wang
Puff-Net: Efficient Style Transfer with Pure Content and Style Feature Fusion Network
Sizhe Zheng, Pan Gao, Peng Zhou et al.
SynSP: Synergy of Smoothness and Precision in Pose Sequences Refinement
Tao Wang, Lei Jin, Zheng Wang et al.
Building Vision-Language Models on Solid Foundations with Masked Distillation
Sepehr Sameni, Kushal Kafle, Hao Tan et al.
MS-DETR: Efficient DETR Training with Mixed Supervision
Chuyang Zhao, Yifan Sun, Wenhao Wang et al.
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
Pengchong Qiao, Lei Shang, Chang Liu et al.
Towards High-fidelity Artistic Image Vectorization via Texture-Encapsulated Shape Parameterization
Ye Chen, Bingbing Ni, Jinfan Liu et al.
OmniSDF: Scene Reconstruction using Omnidirectional Signed Distance Functions and Adaptive Binoctrees
Hakyeong Kim, Andreas Meuleman, Hyeonjoong Jang et al.
Deformable One-shot Face Stylization via DINO Semantic Guidance
Yang Zhou, Zichong Chen, Hui Huang
Density-Guided Semi-Supervised 3D Semantic Segmentation with Dual-Space Hardness Sampling
Jianan Li, Qiulei Dong
Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework
Vu Minh Hieu Phan, Yutong Xie, Yuankai Qi et al.
LP++: A Surprisingly Strong Linear Probe for Few-Shot CLIP
Yunshi HUANG, Fereshteh Shakeri, Jose Dolz et al.
1-Lipschitz Layers Compared: Memory Speed and Certifiable Robustness
Bernd Prach, Fabio Brau, Giorgio Buttazzo et al.
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
Hyeonho Jeong, Geon Yeong Park, Jong Chul Ye
PoNQ: a Neural QEM-based Mesh Representation
Nissim Maruani, Maks Ovsjanikov, Pierre Alliez et al.
M3-UDA: A New Benchmark for Unsupervised Domain Adaptive Fetal Cardiac Structure Detection
Bin Pu, Liwen Wang, Jiewen Yang et al.
Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning
Da-Wei Zhou, Hai-Long Sun, Han-Jia Ye et al.
Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss
Jaeha Kim, Junghun Oh, Kyoung Mu Lee
Point-VOS: Pointing Up Video Object Segmentation
Sabarinath Mahadevan, Idil Esen Zulfikar, Paul Voigtlaender et al.
3D Face Tracking from 2D Video through Iterative Dense UV to Image Flow
Felix Taubner, Prashant Raina, Mathieu Tuli et al.
HIT: Estimating Internal Human Implicit Tissues from the Body Surface
Marilyn Keller, Vaibhav ARORA, Abdelmouttaleb Dakri et al.
Authentic Hand Avatar from a Phone Scan via Universal Hand Model
Gyeongsik Moon, Weipeng Xu, Rohan Joshi et al.
Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection
Jin Yang, Ping Wei, Huan Li et al.
Multiway Point Cloud Mosaicking with Diffusion and Global Optimization
Shengze Jin, Iro Armeni, Marc Pollefeys et al.
NeRSP: Neural 3D Reconstruction for Reflective Objects with Sparse Polarized Images
Yufei Han, Heng Guo, Koki Fukai et al.
HDRFlow: Real-Time HDR Video Reconstruction with Large Motions
Gangwei Xu, Yujin Wang, Jinwei Gu et al.
Beyond Average: Individualized Visual Scanpath Prediction
Xianyu Chen, Ming Jiang, Qi Zhao
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension
Lei Zhu, Fangyun Wei, Yanye Lu
LEDITS++: Limitless Image Editing using Text-to-Image Models
Manuel Brack, Felix Friedrich, Katharina Kornmeier et al.
CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization Perspective
Shunsuke Yasuki, Masato Taki
Regularized Parameter Uncertainty for Improving Generalization in Reinforcement Learning
Pehuen Moure, Longbiao Cheng, Joachim Ott et al.
Robust Noisy Correspondence Learning with Equivariant Similarity Consistency
Yuchen Yang, Erkun Yang, Likai Wang et al.
Situational Awareness Matters in 3D Vision Language Reasoning
Yunze Man, Liang-Yan Gui, Yu-Xiong Wang
Decentralized Directed Collaboration for Personalized Federated Learning
Yingqi Liu, Yifan Shi, Qinglun Li et al.
Task-Driven Wavelets using Constrained Empirical Risk Minimization
Eric Marcus, Ray Sheombarsing, Jan-Jakob Sonke et al.
SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction
Zechuan Zhang, Zongxin Yang, Yi Yang
OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning
Siddharth Srivastava, Gaurav Sharma
Probing Synergistic High-Order Interaction in Infrared and Visible Image Fusion
Naishan Zheng, Man Zhou, Jie Huang et al.
Scaling Up Dynamic Human-Scene Interaction Modeling
Nan Jiang, Zhiyuan Zhang, Hongjie Li et al.
Utility-Fairness Trade-Offs and How to Find Them
Sepehr Dehdashtian, Bashir Sadeghi, Vishnu Naresh Boddeti
Data-Free Quantization via Pseudo-label Filtering
Chunxiao Fan, Ziqi Wang, Dan Guo et al.
Fitting Flats to Flats
Gabriel Dogadov, Ugo Finnendahl, Marc Alexa
HOIST-Former: Hand-held Objects Identification Segmentation and Tracking in the Wild
Supreeth Narasimhaswamy, Huy Anh Nguyen, Lihan Huang et al.
Towards Robust Learning to Optimize with Theoretical Guarantees
Qingyu Song, Wei Lin, Juncheng Wang et al.
Animating General Image with Large Visual Motion Model
Dengsheng Chen, Xiaoming Wei, Xiaolin Wei
MetaCloak: Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning
Yixin Liu, Chenrui Fan, Yutong Dai et al.
EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams
Christen Millerdurai, Hiroyasu Akada, Jian Wang et al.
ModaVerse: Efficiently Transforming Modalities with LLMs
Xinyu Wang, Bohan Zhuang, Qi Wu
Improving Generalization via Meta-Learning on Hard Samples
Nishant Jain, Arun Suggala, Pradeep Shenoy
WaveFace: Authentic Face Restoration with Efficient Frequency Recovery
Yunqi Miao, Jiankang Deng, Jungong Han
Hierarchical Histogram Threshold Segmentation – Auto-terminating High-detail Oversegmentation
Thomas Chang, Simon Seibt, Bartosz von Rymon Lipinski
CogAgent: A Visual Language Model for GUI Agents
Wenyi Hong, Weihan Wang, Qingsong Lv et al.
Learning Adaptive Spatial Coherent Correlations for Speech-Preserving Facial Expression Manipulation
Tianshui Chen, Jianman Lin, Zhijing Yang et al.
UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and Unfavorable Sets
Youngju Na, Woo Jae Kim, Kyu Han et al.
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Lihe Yang, Bingyi Kang, Zilong Huang et al.
EventDance: Unsupervised Source-free Cross-modal Adaptation for Event-based Object Recognition
Xu Zheng, Addison, Lin Wang
Real-World Efficient Blind Motion Deblurring via Blur Pixel Discretization
Insoo Kim, Jae Seok Choi, Geonseok Seo et al.
Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers
Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim
BANF: Band-Limited Neural Fields for Levels of Detail Reconstruction
Ahan Shabanov, Shrisudhan Govindarajan, Cody Reading et al.
HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting
Hongyu Zhou, Jiahao Shao, Lu Xu et al.
Human Motion Prediction Under Unexpected Perturbation
Jiangbei Yue, Baiyi Li, Julien Pettré et al.
LLMs are Good Action Recognizers
Haoxuan Qu, Yujun Cai, Jun Liu
SynFog: A Photo-realistic Synthetic Fog Dataset based on End-to-end Imaging Simulation for Advancing Real-World Defogging in Autonomous Driving
Yiming Xie, Henglu Wei, Zhenyi Liu et al.