Most Cited 2025 "causal perspective" Papers
22,274 papers found • Page 56 of 112
Conference
Bridge Frame and Event: Common Spatiotemporal Fusion for High-Dynamic Scene Optical Flow
Hanyu Zhou, Haonan Wang, Haoyue Liu et al.
On Large Multimodal Models as Open-World Image Classifiers
Alessandro Conti, Massimiliano Mancini, Enrico Fini et al.
Sample-efficient Learning of Concepts with Theoretical Guarantees: from Data to Concepts without Interventions
Hidde Fokkema, Tim van Erven, Sara Magliacane
Detecting Adversarial Data Using Perturbation Forgery
Qian Wang, Chen Li, Yuchen Luo et al.
C-NAV: Towards Self-Evolving Continual Object Navigation in Open World
MingMing Yu, Fei Zhu, Wenzhuo Liu et al.
GO-N3RDet: Geometry Optimized NeRF-enhanced 3D Object Detector
Zechuan Li, Hongshan Yu, Yihao Ding et al.
Replicable Online Learning
Saba Ahmadi, Siddharth Bhandari, Avrim Blum
Just One Layer Norm Guarantees Stable Extrapolation
Juliusz Ziomek, George Whittle, Michael A Osborne
JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model
Qihao Duan, Bingding Huang, Zhenqiao Song et al.
Balanced Rate-Distortion Optimization in Learned Image Compression
Yichi Zhang, Zhihao Duan, Yuning Huang et al.
PBCAT: Patch-Based Composite Adversarial Training against Physically Realizable Attacks on Object Detection
Xiao Li, Yiming Zhu, Yifan Huang et al.
Evaluating multiple models using labeled and unlabeled data
Divya Shanmugam, Shuvom Sadhuka, Manish Raghavan et al.
TREND: Unsupervised 3D Representation Learning via Temporal Forecasting for LiDAR Perception
Runjian Chen, Hyoungseob Park, Bo Zhang et al.
Martingale Score: An Unsupervised Metric for Bayesian Rationality in LLM Reasoning
Zhonghao He, Tianyi (Alex) Qiu, Hirokazu Shirado et al.
Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval
Dohwan Ko, Ji Soo Lee, Minhyuk Choi et al.
Bridge the Gap: From Weak to Full Supervision for Temporal Action Localization with PseudoFormer
Ziyi Liu, Yangcen Liu
Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image
Jerred Chen, Ronald Clark
ArcPro: Architectural Programs for Structured 3D Abstraction of Sparse Points
Qirui Huang, Runze Zhang, Kangjun Liu et al.
Over-squashing in Spatiotemporal Graph Neural Networks
Ivan Marisca, Jacob Bamberger, Cesare Alippi et al.
MEgoHand: Multimodal Egocentric Hand-Object Interaction Motion Generation
Bohan Zhou, Yi Zhan, Zhongbin Zhang et al.
Recurrent Feature Mining and Keypoint Mixup Padding for Category-Agnostic Pose Estimation
Junjie Chen, Weilong Chen, Yifan Zuo et al.
FIction: 4D Future Interaction Prediction from Video
Kumar Ashutosh, Georgios Pavlakos, Kristen Grauman
Social Debiasing for Fair Multi-modal LLMs
Harry Cheng, Yangyang Guo, Qingpei Guo et al.
ConText-CIR: Learning from Concepts in Text for Composed Image Retrieval
Eric Xing, Pranavi Kolouju, Robert Pless et al.
Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation
Xiuyu Yang, Shuhan Tan, Philipp Kraehenbuehl
Parallelizing MCMC Across the Sequence Length
David Zoltowski, Skyler Wu, Xavier Gonzalez et al.
Mechanism and Emergence of Stacked Attention Heads in Multi-Layer Transformers
Tiberiu Mușat
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy
Yiting Yang, Hao Luo, Yuan Sun et al.
Self-Refining Language Model Anonymizers via Adversarial Distillation
Kyuyoung Kim, Hyunjun Jeon, Jinwoo Shin
Reading Recognition in the Wild
Charig Yang, Samiul Alam, Shakhrul Iman Siam et al.
UniPhy: Learning a Unified Constitutive Model for Inverse Physics Simulation
Himangi Mittal, Peiye Zhuang, Hsin-Ying Lee et al.
BlockScan: Detecting Anomalies in Blockchain Transactions
Jiahao Yu, Xian Wu, Hao Liu et al.
Gyro-based Neural Single Image Deblurring
Heemin Yang, Jaesung Rim, Seungyong Lee et al.
Bridging Domain Generalization to Multimodal Domain Generalization via Unified Representations
Hai Huang, Yan Xia, Sashuai Zhou et al.
Diving into the Fusion of Monocular Priors for Generalized Stereo Matching
Chengtang Yao, Lidong Yu, Zhidan Liu et al.
Enhancing Image Restoration Transformer via Adaptive Translation Equivariance
JiaKui Hu, Zhengjian Yao, Lujia Jin et al.
Progressive Test Time Energy Adaptation for Medical Image Segmentation
Xiaoran Zhang, Byung-Woo Hong, Hyoungseob Park et al.
MonarchAttention: Zero-Shot Conversion to Fast, Hardware-Aware Structured Attention
Can Yaras, Alec Xu, Pierre Abillama et al.
MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning
Yuxuan Luo, Ryan Yuan, Junwen Chen et al.
DiSciPLE: Learning Interpretable Programs for Scientific Visual Discovery
Utkarsh Mall, Cheng Perng Phoo, Mia Chiquier et al.
Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction
Haonan Wang, Qixiang ZHANG, Lehan Wang et al.
STEER-ME: Assessing the Microeconomic Reasoning of Large Language Models
Narun Raman, Taylor Lundy, Thiago Amin et al.
EDCFlow: Exploring Temporally Dense Difference Maps for Event-based Optical Flow Estimation
Daikun Liu, Lei Cheng, Teng Wang et al.
Z-Magic: Zero-shot Multiple Attributes Guided Image Creator
Yingying Deng, Xiangyu He, Fan Tang et al.
Seeing in the Dark: Benchmarking Egocentric 3D Vision with the Oxford Day-and-Night Dataset
Zirui Wang, Wenjing Bian, Xinghui Li et al.
Rethinking Invariance Regularization in Adversarial Training to Improve Robustness-Accuracy Trade-off
Futa Waseda, Ching-Chun Chang, Isao Echizen
From Imitation to Innovation: The Emergence of AI's Unique Artistic Styles and the Challenge of Copyright Protection
Zexi Jia, Chuanwei Huang, Hongyan Fei et al.
Parameterized Blur Kernel Prior Learning for Local Motion Deblurring
Zhenxuan Fang, Fangfang Wu, Tao Huang et al.
ATCTrack: Aligning Target-Context Cues with Dynamic Target States for Robust Vision-Language Tracking
Xiaokun Feng, Shiyu Hu, Xuchen Li et al.
CineTechBench: A Benchmark for Cinematographic Technique Understanding and Generation
Xinran Wang, Songyu Xu, Shan Xiangxuan et al.
MQuAKE-Remastered: Multi-Hop Knowledge Editing Can Only Be Advanced with Reliable Evaluations
Shaochen Zhong, Yifan (Louie) Lu, Lize Shao et al.
Wide-Horizon Thinking and Simulation-Based Evaluation for Real-World LLM Planning with Multifaceted Constraints
Dongjie Yang, Chengqiang Lu, Qimeng Wang et al.
How To Make Your Cell Tracker Say "I dunno!"
Richard D Paul, Johannes Seiffarth, David Rügamer et al.
HoGS: Unified Near and Far Object Reconstruction via Homogeneous Gaussian Splatting
Xinpeng Liu, Zeyi Huang, Fumio Okura et al.
GPO: Learning from Critical Steps to Improve LLM Reasoning
Jiahao Yu, Zelei Cheng, Xian Wu et al.
Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity
Susav Shrestha, Bradley Settlemyer, Nikoli Dryden et al.
Differential-informed Sample Selection Accelerates Multimodal Contrastive Learning
Zihua Zhao, Feng Hong, Mengxi Chen et al.
Open-ended Hierarchical Streaming Video Understanding with Vision Language Models
Hyolim Kang, Yunsu Park, Youngbeom Yoo et al.
BATCLIP: Bimodal Online Test-Time Adaptation for CLIP
Sarthak Kumar Maharana, Baoming Zhang, Leonid Karlinsky et al.
Image Referenced Sketch Colorization Based on Animation Creation Workflow
Dingkun Yan, Xinrui Wang, Zhuoru Li et al.
Diverse Policies Recovering via Pointwise Mutual Information Weighted Imitation Learning
Hanlin Yang, Jian Yao, Weiming Liu et al.
TimeTracker: Event-based Continuous Point Tracking for Video Frame Interpolation with Non-linear Motion
Haoyue Liu, Jinghan Xu, Yi Chang et al.
RLZero: Direct Policy Inference from Language Without In-Domain Supervision
Harshit Sushil Sikchi, Siddhant Agarwal, Pranaya Jajoo et al.
Discrete Diffusion Models: Novel Analysis and New Sampler Guarantees
Yuchen Liang, Yingbin Liang, Lifeng LAI et al.
Monocular Semantic Scene Completion via Masked Recurrent Networks
Xuzhi Wang, Xinran Wu, Song Wang et al.
Fast and Accurate Gigapixel Pathological Image Classification with Hierarchical Distillation Multi-Instance Learning
Jiuyang Dong, Junjun Jiang, Kui Jiang et al.
Option-aware Temporally Abstracted Value for Offline Goal-Conditioned Reinforcement Learning
Hongjoon Ahn, Heewoong Choi, Jisu Han et al.
UNICL-SAM: Uncertainty-Driven In-Context Segmentation with Part Prototype Discovery
Dianmo Sheng, Dongdong Chen, Zhentao Tan et al.
BEEM: Boosting Performance of Early Exit DNNs using Multi-Exit Classifiers as Experts
Divya Jyoti Bajpai, Manjesh Kumar Hanawal
Grouped Speculative Decoding for Autoregressive Image Generation
Junhyuk So, Juncheol Shin, Hyunho Kook et al.
CrossAD: Time Series Anomaly Detection with Cross-scale Associations and Cross-window Modeling
Beibu Li, Qichao Shentu, Yang Shu et al.
Regret Analysis of Average-Reward Unichain MDPs via an Actor-Critic Approach
Swetha Ganesh, Vaneet Aggarwal
SPAZER: Spatial-Semantic Progressive Reasoning Agent for Zero-shot 3D Visual Grounding
Zhao Jin, Rong-Cheng Tu, Jingyi Liao et al.
Inpaint4Drag: Repurposing Inpainting Models for Drag-Based Image Editing via Bidirectional Warping
Jingyi Lu, Kai Han
FACE: Faithful Automatic Concept Extraction
Dipkamal Bhusal, Michael Clifford, Sara Rampazzi et al.
Spectral Analysis of Representational Similarity with Limited Neurons
Hyunmo Kang, Abdulkadir Canatar, SueYeon Chung
SplitFlow: Flow Decomposition for Inversion-Free Text-to-Image Editing
Sung-Hoon Yoon, Minghan Li, Gaspard Beaudouin et al.
RoomPainter: View-Integrated Diffusion for Consistent Indoor Scene Texturing
Zhipeng Huang, Wangbo Yu, Xinhua Cheng et al.
PropVG: End-to-End Proposal-Driven Visual Grounding with Multi-Granularity Discrimination
Ming Dai, Wenxuan Cheng, Jiedong Zhuang et al.
HairCUP: Hair Compositional Universal Prior for 3D Gaussian Avatars
Byungjun Kim, Shunsuke Saito, Giljoo Nam et al.
Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens
Qihang Fan, Huaibo Huang, Mingrui Chen et al.
SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders
Jiahui Geng, Qing Li
InfiGFusion: Graph-on-Logits Distillation via Efficient Gromov-Wasserstein for Model Fusion
Yuanyi Wang, Zhaoyi Yan, Yiming Zhang et al.
CALICO: Part-Focused Semantic Co-Segmentation with Large Vision-Language Models
Kiet A. Nguyen, Adheesh Juvekar, Tianjiao Yu et al.
PriOr-Flow: Enhancing Primitive Panoramic Optical Flow with Orthogonal View
Longliang Liu, Miaojie Feng, Junda Cheng et al.
Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis
Chen Zhao, Xuan Wang, Tong Zhang et al.
SP3D: Boosting Sparsely-Supervised 3D Object Detection via Accurate Cross-Modal Semantic Prompts
Shijia Zhao, Qiming Xia, Xusheng Guo et al.
Inference-Time Reward Hacking in Large Language Models
Hadi Khalaf, Claudio Mayrink Verdun, Alex Oesterling et al.
Scale Your Instructions: Enhance the Instruction-Following Fidelity of Unified Image Generation Model by Self-Adaptive Attention Scaling
Chao Zhou, Tianyi Wei, Nenghai Yu
Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation
Hao Li, Ju Dai, Xin Zhao et al.
Memory Decoder: A Pretrained, Plug-and-Play Memory for Large Language Models
Jiaqi Cao, Jiarui Wang, Rubin Wei et al.
Simultaneous Modeling of Protein Conformation and Dynamics via Autoregression
Yuning Shen, Lihao Wang, Huizhuo Yuan et al.
A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning
Yuzheng Hu, Fan Wu, Haotian Ye et al.
LLM Safety Alignment is Divergence Estimation in Disguise
Rajdeep Haldar, Ziyi Wang, Guang Lin et al.
Escaping the SpuriVerse: Can Large Vision-Language Models Generalize Beyond Seen Spurious Correlations?
Yiwei Yang, Chung Peng Lee, Shangbin Feng et al.
ODG: Occupancy Prediction Using Dual Gaussians
Yunxiao Shi, Yinhao Zhu, Herbert Cai et al.
Learning Class Prototypes for Unified Sparse-Supervised 3D Object Detection
Yun Zhu, Le Hui, Hang Yang et al.
SUM Parts: Benchmarking Part-Level Semantic Segmentation of Urban Meshes
Weixiao Gao, Liangliang Nan, Hugo Ledoux
LibraGrad: Balancing Gradient Flow for Universally Better Vision Transformer Attributions
Faridoun Mehri, Mahdieh Baghshah, Mohammad Taher Pilehvar
Bringing SAM to new heights: leveraging elevation data for tree crown segmentation from drone imagery
Mélisande Teng, Arthur Ouaknine, Etienne Laliberté et al.
Your Scale Factors are My Weapon: Targeted Bit-Flip Attacks on Vision Transformers via Scale Factor Manipulation
Jialai Wang, Yuxiao Wu, Weiye Xu et al.
Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update
Yu-Jie Zhang, Sheng-An Xu, Peng Zhao et al.
Anti-Aliased 2D Gaussian Splatting
Mae Younes, Adnane Boukhayma
COALA: Numerically Stable and Efficient Framework for Context-Aware Low-Rank Approximation
Uliana Parkina, Maxim Rakhuba
Dynamic Reconstruction of Hand-Object Interaction with Distributed Force-aware Contact Representation
Zhenjun Yu, Wenqiang Xu, Pengfei Xie et al.
Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning
Yurun Yuan, Fan Chen, Zeyu Jia et al.
Meta-Learning Objectives for Preference Optimization
Carlo Alfano, Silvia Sapora, Jakob Foerster et al.
OmniSplat: Taming Feed-Forward 3D Gaussian Splatting for Omnidirectional Images with Editable Capabilities
Suyoung Lee, JAEYOUNG CHUNG, Kihoon Kim et al.
EDM: Equirectangular Projection-Oriented Dense Kernelized Feature Matching
Dongki Jung, Jaehoon Choi, Yonghan Lee et al.
FSHNet: Fully Sparse Hybrid Network for 3D Object Detection
Shuai Liu, Mingyue Cui, Boyang Li et al.
XIFBench: Evaluating Large Language Models on Multilingual Instruction Following
Zhenyu Li, Kehai Chen, Yunfei Long et al.
Learning to Normalize on the SPD Manifold under Bures-Wasserstein Geometry
Rui Wang, Shaocheng Jin, Ziheng Chen et al.
FOCUS: Internal MLLM Representations for Efficient Fine-Grained Visual Question Answering
Liangyu Zhong, Fabio Philipp Rosenthal, Joachim Sicking et al.
3D Equivariant Visuomotor Policy Learning via Spherical Projection
Boce Hu, Dian Wang, David Klee et al.
DERD-Net: Learning Depth from Event-based Ray Densities
Diego de Oliveira Hitzges, Suman Ghosh, Guillermo Gallego
OmniCast: A Masked Latent Diffusion Model for Weather Forecasting Across Time Scales
Tung Nguyen, Tuan Pham, Troy Arcomano et al.
Gradient Multi-Normalization for Efficient LLM Training
Meyer Scetbon, Chao Ma, Wenbo Gong et al.
Put CASH on Bandits: A Max K-Armed Problem for Automated Machine Learning
Amir Rezaei Balef, Claire Vernade, Katharina Eggensperger
Hierarchical Features Matter: A Deep Exploration of Progressive Parameterization Method for Dataset Distillation
Xinhao Zhong, Hao Fang, Bin Chen et al.
RvLLM: LLM Runtime Verification with Domain Knowledge
Yedi Zhang, Sun Emma, Annabelle En et al.
PS-Diffusion: Photorealistic Subject-Driven Image Editing with Disentangled Control and Attention
Weicheng Wang, Guoli Jia, Zhongqi Zhang et al.
Ravan: Multi-Head Low-Rank Adaptation for Federated Fine-Tuning
Arian Raje, Baris Askin, Divyansh Jhunjhunwala et al.
Conformal Information Pursuit for Interactively Guiding Large Language Models
Kwan Ho Ryan Chan, Yuyan Ge, Edgar Dobriban et al.
A machine learning approach that beats Rubik's cubes
Alexander Chervov, Kirill Khoruzhii, Nikita Bukhal et al.
MoPFormer: Motion-Primitive Transformer for Wearable-Sensor Activity Recognition
Hao Zhang, Zhan Zhuang, Xuehao Wang et al.
Whose View of Safety? A Deep DIVE Dataset for Pluralistic Alignment of Text-to-Image Models
Charvi Rastogi, Tian Huey Teh, Pushkar Mishra et al.
Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study
Zhengyu Hu, Jianxun Lian, Zheyuan Xiao et al.
HumanMM: Global Human Motion Recovery from Multi-shot Videos
Yuhong Zhang, Guanlin Wu, Ling-Hao Chen et al.
Online Language Splatting
Saimouli Katragadda, Cho-Ying Wu, Yuliang Guo et al.
Stable Part Diffusion 4D: Multi-View RGB and Kinematic Parts Video Generation
Hao Zhang, Chun-Han Yao, Simon Donné et al.
Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM
Zinuo Li, Xian Zhang, Yongxin Guo et al.
FeedEdit: Text-Based Image Editing with Dynamic Feedback Regulation
Fengyi Fu, Lei Zhang, Mengqi Huang et al.
Adaptive Batch-Wise Sample Scheduling for Direct Preference Optimization
Zixuan Huang, Yikun Ban, Lean Fu et al.
THUNDER: Tile-level Histopathology image UNDERstanding benchmark
Pierre Marza, Leo Fillioux, Sofiène Boutaj et al.
Floating No More: Object-Ground Reconstruction from a Single Image
Yunze Man, Yichen Sheng, Jianming Zhang et al.
SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios
Lingwei Dang, Ruizhi Shao, Hongwen Zhang et al.
Does Object Binding Naturally Emerge in Large Pretrained Vision Transformers?
Yihao Li, Saeed Salehi, Lyle Ungar et al.
GG-SSMs: Graph-Generating State Space Models
Nikola Zubic, Davide Scaramuzza
Strassen Attention, Split VC Dimension and Compositionality in Transformers
Alexander Kozachinskiy, Felipe Urrutia, Hector Orellana et al.
DV-Matcher: Deformation-based Non-rigid Point Cloud Matching Guided by Pre-trained Visual Features
Zhangquan Chen, Puhua Jiang, Ruqi Huang
Distilled Decoding 2: One-step Sampling of Image Auto-regressive Models with Conditional Score Distillation
Enshu Liu, Qian Chen, Xuefei Ning et al.
Understand Before You Generate: Self-Guided Training for Autoregressive Image Generation
Xiaoyu Yue, ZiDong Wang, Yuqing Wang et al.
LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion
Fangfu Liu, Hao Li, Jiawei Chi et al.
DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos
Zijia Lu, ASM Iftekhar, Gaurav Mittal et al.
SafePTR: Token-Level Jailbreak Defense in Multimodal LLMs via Prune-then-Restore Mechanism
Beitao Chen, Xinyu Lyu, shengming yuan et al.
On Fairness of Unified Multimodal Large Language Model for Image Generation
Ming Liu, Hao Chen, Jindong Wang et al.
KL Penalty Control via Perturbation for Direct Preference Optimization
Sangkyu Lee, Janghoon Han, Hosung Song et al.
Uncertainty Weighted Gradients for Model Calibration
Jinxu Lin, Linwei Tao, Minjing Dong et al.
Understanding challenges to the interpretation of disaggregated evaluations of algorithmic fairness
Stephen Pfohl, Natalie Harris, Chirag Nagpal et al.
Orientation-anchored Hyper-Gaussian for 4D Reconstruction from Casual Videos
Junyi Wu, Jiachen Tao, Haoxuan Wang et al.
RobotSmith: Generative Robotic Tool Design for Acquisition of Complex Manipulation Skills
Chunru Lin, Haotian Yuan, Yian Wang et al.
PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction
Eduard Poesina, Adriana Valentina Costache, Adrian-Gabriel Chifu et al.
Zero-Shot Trajectory Planning for Signal Temporal Logic Tasks
Ruijia Liu, Ancheng Hou, Xiao Yu et al.
Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis
Boming Miao, Chunxiao Li, Xiaoxiao Wang et al.
Adaptive Distraction: Probing LLM Contextual Robustness with Automated Tree Search
Yanbo Wang, Zixiang Xu, Yue Huang et al.
GradMetaNet: An Equivariant Architecture for Learning on Gradients
Yoav Gelberg, Yam Eitan, Aviv Navon et al.
A Lightweight UDF Learning Framework for 3D Reconstruction Based on Local Shape Functions
Jiangbei Hu, Yanggeng Li, Fei Hou et al.
Detection, Pose Estimation and Segmentation for Multiple Bodies: Closing the Virtuous Circle
Miroslav Purkrabek, Jiri Matas
Exact and Linear Convergence for Federated Learning under Arbitrary Client Participation is Attainable
Bicheng Ying, Zhe Li, Haibo Yang
Generalizable Object Re-Identification via Visual In-Context Prompting
Zhizhong Huang, Xiaoming Liu
Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture
Xuanchen Li, Jianyu Wang, Yuhao Cheng et al.
AdaLRS: Loss-Guided Adaptive Learning Rate Search for Efficient Foundation Model Pretraining
Hongyuan Dong, Dingkang Yang, Xiao Liang et al.
Partial Gromov-Wasserstein Metric
Yikun Bai, Rocio Diaz Martin, Abihith Kothapalli et al.
PoseTraj: Pose-Aware Trajectory Control in Video Diffusion
longbin ji, Lei Zhong, Pengfei Wei et al.
How Different from the Past? Spatio-Temporal Time Series Forecasting with Self-Supervised Deviation Learning
Haotian Gao, Zheng Dong, Jiawei Yong et al.
Volume Tells: Dual Cycle-Consistent Diffusion for 3D Fluorescence Microscopy De-noising and Super-Resolution
ZELIN LI, Chenwei Wang, Zhaoke Huang et al.
Learning Phase Distortion with Selective State Space Models for Video Turbulence Mitigation
Xingguang Zhang, Nicholas M Chimitt, Xijun Wang et al.
Continual Release Moment Estimation with Differential Privacy
Nikita Kalinin, Jalaj Upadhyay, Christoph Lampert
SemCoT: Accelerating Chain-of-Thought Reasoning through Semantically-Aligned Implicit Tokens
Yinhan He, Wendy Zheng, Yaochen Zhu et al.
BRACE: A Benchmark for Robust Audio Caption Quality Evaluation
Tianyu Guo, Hongyu Chen, Hao Liang et al.
Deep learning for continuous-time stochastic control with jumps
Patrick Cheridito, Jean-Loup Dupret, Donatien Hainaut
Reparameterized LLM Training via Orthogonal Equivalence Transformation
Zeju Qiu, Simon Buchholz, Tim Xiao et al.
Fast Direct: Query-Efficient Online Black-box Guidance for Diffusion-model Target Generation
Kim Yong Tan, YUEMING LYU, Ivor Tsang et al.
Task Descriptors Help Transformers Learn Linear Models In-Context
Ruomin Huang, Rong Ge
SocialMOIF: Multi-Order Intention Fusion for Pedestrian Trajectory Prediction
Kai Chen, Xiaodong Zhao, Yujie Huang et al.
Seeing What Matters: Generalizable AI-generated Video Detection with Forensic-Oriented Augmentation
Riccardo Corvi, Davide Cozzolino, Ekta Prashnani et al.
Stochastic Gradients under Nuisances
Facheng Yu, Ronak Mehta, Alex Luedtke et al.
Triad: Empowering LMM-based Anomaly Detection with Expert-guided Region-of-Interest Tokenizer and Manufacturing Process
Yuanze Li, Shihao Yuan, Haolin Wang et al.
SiM3D: Single-instance Multiview Multimodal and Multisetup 3D Anomaly Detection Benchmark
Alex Costanzino, Pierluigi Zama Ramirez, Luigi Lella et al.
SUB: Benchmarking CBM Generalization via Synthetic Attribute Substitutions
Jessica Bader, Leander Girrbach, Stephan Alaniz et al.
Refer to Any Segmentation Mask Group With Vision-Language Prompts
Shengcao Cao, Zijun Wei, Jason Kuen et al.
Decouple to Reconstruct: High Quality UHD Restoration via Active Feature Disentanglement and Reversible Fusion
Yidi Liu, Dong Li, Yuxin Ma et al.
Consensus-Driven Active Model Selection
Justin Kay, Grant Horn, Subhransu Maji et al.
OpenAnimals: Revisiting Person Re-Identification for Animals Towards Better Generalization
Saihui Hou, Panjian Huang, Zengbin Wang et al.
Backdoor Attacks on Neural Networks via One-Bit Flip
Xiang Li, Lannan Luo, Qiang Zeng
FB-Diff: Fourier Basis-guided Diffusion for Temporal Interpolation of 4D Medical Imaging
Xin You, Runze Yang, Chuyan Zhang et al.
Describe, Don’t Dictate: Semantic Image Editing with Natural Language Intent
En Ci, Shanyan Guan, Yanhao Ge et al.
Enhance Multi-View Classification Through Multi-Scale Alignment and Expanded Boundary
Yuena Lin, Yiyuan Wang, Gengyu Lyu et al.
SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting
Shengjie Lin, Jiading Fang, Muhammad Zubair Irshad et al.
Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models
Xiao Liang, Di Wang, Zhicheng Jiao et al.
Multimodal Prompt Alignment for Facial Expression Recognition
Fuyan Ma, Yiran He, Bin Sun et al.
CABLD: Contrast-Agnostic Brain Landmark Detection with Consistency-Based Regularization
Soorena Salari, Arash Harirpoush, Hassan Rivaz et al.
Stable Score Distillation
Haiming Zhu, Yangyang Xu, Chenshu Xu et al.
BézierGS: Dynamic Urban Scene Reconstruction with Bézier Curve Gaussian Splatting
Zipei Ma, Junzhe Jiang, Yurui Chen et al.
Diffusion Image Prior
Hamadi Chihaoui, Paolo Favaro
Text2VDM: Text to Vector Displacement Maps for Expressive and Interactive 3D Sculpting
Hengyu Meng, Duotun Wang, Zhijing Shao et al.
Federated Domain Generalization with Domain-specific Soft Prompts Generation
Jianhan Wu, Xiaoyang Qu, Zhangcheng Huang et al.
Selective Contrastive Learning for Weakly Supervised Affordance Grounding
WonJun Moon, Hyun Seok Seong, Jae-Pil Heo
CoST: Efficient Collaborative Perception From Unified Spatiotemporal Perspective
Zongheng Tang, Yi Liu, Yifan Sun et al.
UniConvNet: Expanding Effective Receptive Field while Maintaining Asymptotically Gaussian Distribution for ConvNets of Any Scale
Yuhao Wang, Wei Xi