Most Cited 2025 "microtransactions" Papers
22,274 papers found • Page 105 of 112
Conference
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning
Yang Yue, Yulin Wang, Chenxin Tao et al.
Knowledge Bridger: Towards Training-Free Missing Modality Completion
Guanzhou Ke, Shengfeng He, Xiao-Li Wang et al.
Semantic and Visual Crop-Guided Diffusion Models for Heterogeneous Tissue Synthesis in Histopathology
Saghir Alfasly, Wataru Uegami, MD ENAMUL HOQ et al.
Gaussian Herding across Pens: An Optimal Transport Perspective on Global Gaussian Reduction for 3DGS
Tao Wang, Mengyu Li, Geduo Zeng et al.
SkySense-O: Towards Open-World Remote Sensing Interpretation with Vision-Centric Visual-Language Modeling
Qi Zhu, Jiangwei Lao, Deyi Ji et al.
AdaDARE-gamma: Balancing Stability and Plasticity in Multi-modal LLMs through Efficient Adaptation
Jingyi Xie, Jintao Yang, Zhunchen Luo et al.
Fingerprinting Denoising Diffusion Probabilistic Models
Huan Teng, Yuhui Quan, Chengyu Wang et al.
Query Efficient Black-Box Visual Prompting with Subspace Learning
Haozhen Zhang, Zhaogeng Liu, Hualin Zhang et al.
Learning Urban Climate Dynamics via Physics-Guided Urban Surface–Atmosphere Interactions
Jiyang Xia, Fenghua Ling, Zhenhui Jessie Li et al.
Aligning and Prompting Anything for Zero-Shot Generalized Anomaly Detection
Jitao Ma, Weiying Xie, Hangyu Ye et al.
Efficient Depth Estimation for Unstable Stereo Camera Systems on AR Glasses
Yongfan Liu, Hyoukjun Kwon
Seeing is Not Believing: Adversarial Natural Object Optimization for Hard-Label 3D Scene Attacks
Daizong Liu, Wei Hu
Once-Tuning-Multiple-Variants: Tuning Once and Expanded as Multiple Vision-Language Model Variants
Chong Yu, Tao Chen, Zhongxue Gan
Hypergraph-Enhanced Contrastive Learning for Multi-View Clustering with Hyper-Laplacian Regularization
Zhibin Gu, weili wang
Identifying multi-compartment Hodgkin-Huxley models with high-density extracellular voltage recordings
Ian Christopher Tanoh, Michael Deistler, Jakob H Macke et al.
Heterogeneous Skeleton-Based Action Representation Learning
Xiaoyan Ma, jidong kuang, Hongsong Wang et al.
Recurrent Feature Mining and Keypoint Mixup Padding for Category-Agnostic Pose Estimation
Junjie Chen, Weilong Chen, Yifan Zuo et al.
Seeing Speech and Sound: Distinguishing and Locating Audio Sources in Visual Scenes
Hyeonggon Ryu, Seongyu Kim, Joon Chung et al.
FASTer: Focal token Acquiring-and-Scaling Transformer for Long-term 3D Objection Detection
Chenxu Dang, Pei An, Xinmin Zhang et al.
$\textit{HiMaCon:}$ Discovering Hierarchical Manipulation Concepts from Unlabeled Multi-Modal Data
Ruizhe Liu, Pei Zhou, Qian Luo et al.
Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes
Aodi Li, Liansheng Zhuang, Xiao Long et al.
Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization
Feifei Li, Mi Zhang, Yiming Sun et al.
Beyond Local Sharpness: Communication-Efficient Global Sharpness-aware Minimization for Federated Learning
Debora Caldarola, Pietro Cagnasso, Barbara Caputo et al.
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Qiying Yu, Zheng Zhang, Ruofei Zhu et al.
LoKi: Low-dimensional KAN for Efficient Fine-tuning Image Models
Xuan Cai, Renjie Pan, Hua Yang
DIV-FF: Dynamic Image-Video Feature Fields For Environment Understanding in Egocentric Videos
Lorenzo Mur-Labadia, Jose J. Guerrero, Ruben Martinez-Cantin
AdMiT: Adaptive Multi-Source Tuning in Dynamic Environments
Xiangyu Chang, Fahim Faisal Niloy, Sk Miraj Ahmed et al.
Spectral Learning for Infinite-Horizon Average-Reward POMDPs
Alessio Russo, Alberto Maria Metelli, Marcello Restelli
Viewpoint Rosetta Stone: Unlocking Unpaired Ego-Exo Videos for View-invariant Representation Learning
Mi Luo, Zihui Xue, Alex Dimakis et al.
GIF: Generative Inspiration for Face Recognition at Scale
Mohammad Saadabadi Saadabadi, Sahar Rahimi Malakshan, Ali Dabouei et al.
Unveiling Transformer Perception by Exploring Input Manifolds
Alessandro Benfenati, Alfio Ferrara, Alessio Marta et al.
Robustifying Learning-Augmented Caching Efficiently without Compromising 1-Consistency
Peng Chen, Hailiang Zhao, Jiaji Zhang et al.
CrossSDF: 3D Reconstruction of Thin Structures From Cross-Sections
Thomas Walker, Salvatore Esposito, Daniel Rebain et al.
Looking Beyond the Known: Towards a Data Discovery Guided Open-World Object Detection
Anay Majee, Amitesh Gangrade, Rishabh Iyer
Investigating the Role of Weight Decay in Enhancing Nonconvex SGD
Tao Sun, Yuhao Huang, Li Shen et al.
DAWP: A framework for global observation forecasting via Data Assimilation and Weather Prediction in satellite observation space
Junchao Gong, Jingyi Xu, Ben Fei et al.
MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction
Jingcheng Ni, Yuxin Guo, Yichen Liu et al.
Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought
Chao Huang, Benfeng Wang, Wei Wang et al.
Autoregressive Motion Generation with Gaussian Mixture-Guided Latent Sampling
Linnan Tu, Lingwei Meng, Zongyi Li et al.
AVQACL: A Novel Benchmark for Audio-Visual Question Answering Continual Learning
Kaixuan Wu, Xinde Li, Xinglin Li et al.
Tackling Continual Offline RL through Selective Weights Activation on Aligned Spaces
Jifeng Hu, Sili Huang, Li Shen et al.
Value-Guided Decision Transformer: A Unified Reinforcement Learning Framework for Online and Offline Settings
Hongling Zheng, Li Shen, Yong Luo et al.
Prompt-guided Disentangled Representation for Action Recognition
tianci wu, Guangming Zhu, Lu jiang et al.
IPSI: Enhancing Structural Inference with Automatically Learned Structural Priors
Zhongben Gong, Xiaoqun Wu, Mingyang Zhou
Divide and Conquer: Heterogeneous Noise Integration for Diffusion-based Adversarial Purification
Gaozheng Pei, Shaojie Lyu, Gong Chen et al.
Foley-Flow: Coordinated Video-to-Audio Generation with Masked Audio-Visual Alignment and Dynamic Conditional Flows
Shentong Mo, Yibing Song
DL2G: Degradation-guided Local-to-Global Restoration for Eyeglass Reflection Removal
Yizhilv, Xiao Lu, Hong Ding et al.
Efficient Decoupled Feature 3D Gaussian Splatting via Hierarchical Compression
Zhenqi Dai, Ting Liu, Yanning Zhang
PANGEA: Projection-Based Augmentation with Non-Relevant General Data for Enhanced Domain Adaptation in LLMs
Seungyoo Lee, Giung Nam, Moonseok Choi et al.
Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes
Yiming Dou, Wonseok Oh, Yuqing Luo et al.
OmniZoom: A Universal Plug-and-Play Paradigm for Cross-Device Smooth Zoom Interpolation
Xiaoan Zhu, Yue Zhao, Tianyang Hu et al.
Domain Generalization in CLIP via Learning with Diverse Text Prompts
Changsong Wen, Zelin Peng, Yu Huang et al.
Alias-Free Latent Diffusion Models: Improving Fractional Shift Equivariance of Diffusion Latent Space
Yifan Zhou, Zeqi Xiao, Shuai Yang et al.
Weakly Supervised Contrastive Adversarial Training for Learning Robust Features from Semi-supervised Data
Lilin Zhang, Chengpei Wu, Ning Yang
DualCnst: Enhancing Zero-Shot Out-of-Distribution Detection via Text-Image Consistency in Vision-Language Models
Fayi Le, Wenwu He, Chentao Cao et al.
InstructSAM: A Training-free Framework for Instruction-Oriented Remote Sensing Object Recognition
Yijie Zheng, Weijie Wu, Qingyun Li et al.
CarPlanner: Consistent Auto-regressive Trajectory Planning for Large-Scale Reinforcement Learning in Autonomous Driving
Dongkun Zhang, Jiaming Liang, Ke Guo et al.
UCM-VeID V2: A Richer Dataset and A Pre-training Method for UAV Cross-Modality Vehicle Re-Identification
Xingyue Liu, Jiahao Qi, Chen Chen et al.
Unboxed: Geometrically and Temporally Consistent Video Outpainting
Zhongrui Yu, Martina Megaro-Boldini, Robert Sumner et al.
Better NTK Conditioning: A Free Lunch from (ReLU) Nonlinear Activation in Wide Neural Networks
Chaoyue Liu, Han Bi, Like Hui et al.
Pattern-Guided Adaptive Prior for Structure Learning
Lyuzhou Chen, Yijia Sun, Yanze Gao et al.
Visual Lexicon: Rich Image Features in Language Space
XuDong Wang, Xingyi Zhou, Alireza Fathi et al.
Continual SFT Matches Multimodal RLHF with Negative Supervision
Ke Zhu, Yu Wang, Yanpeng Sun et al.
BigCharts-R1: Enhanced Chart Reasoning with Visual Reinforcement Finetuning
Ahmed Masry, Abhay Puri, Masoud Hashemi et al.
From Pretraining to Pathology: How Noise Leads to Catastrophic Inheritance in Medical Models
HAO SUN, Zhongyi Han, Hao Chen et al.
CovMatch: Cross-Covariance Guided Multimodal Dataset Distillation with Trainable Text Encoder
Yongmin Lee, Hye Won Chung
Decoupled Motion Expression Video Segmentation
Hao Fang, Runmin Cong, Xiankai Lu et al.
Mixture of Submodules for Domain Adaptive Person Search
Minsu Kim, Seungryong Kim, Kwanghoon Sohn
Unsupervised Discovery of Facial Landmarks and Head Pose
Satyajit Tourani, Siddharth Tourani, Arif Mahmood et al.
Dynamic Integration of Task-Specific Adapters for Class Incremental Learning
Jiashuo Li, Shaokun Wang, Bo Qian et al.
Depth-Width Tradeoffs for Transformers on Graph Tasks
Gilad Yehudai, Clayton Sanford, Maya Bechler-Speicher et al.
Transition Matching: Scalable and Flexible Generative Modeling
Neta Shaul, Uriel Singer, Itai Gat et al.
Test-time Augmentation Improves Efficiency in Conformal Prediction
Divya M Shanmugam, Helen Lu, Swami Sankaranarayanan et al.
GREAT: Geometry-Intention Collaborative Inference for Open-Vocabulary 3D Object Affordance Grounding
Yawen Shao, Wei Zhai, Yuhang Yang et al.
Dual Diffusion for Unified Image Generation and Understanding
Zijie Li, Henry Li, Yichun Shi et al.
Commonsense Video Question Answering through Video-Grounded Entailment Tree Reasoning
Huabin Liu, Filip Ilievski, Cees G. M. Snoek
Enduring, Efficient and Robust Trajectory Prediction Attack in Autonomous Driving via Optimization-Driven Multi-Frame Perturbation Framework
Yi Yu, Weizhen Han, Libing Wu et al.
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
Jingyuan Qi, Zhiyang Xu, Qifan Wang et al.
UNEM: UNrolled Generalized EM for Transductive Few-Shot Learning
Long Zhou, Fereshteh Shakeri, Aymen Sadraoui et al.
Flash-Split: 2D Reflection Removal with Flash Cues and Latent Diffusion Separation
Tianfu Wang, Mingyang Xie, Haoming Cai et al.
Periodic Skill Discovery
Jonghae Park, Daesol Cho, Jusuk Lee et al.
Enhancing Training Data Attribution with Representational Optimization
Weiwei Sun, Haokun Liu, Nikhil Kandpal et al.
Test-Time Backdoor Detection for Object Detection Models
Hangtao Zhang, Yichen Wang, Shihui Yan et al.
Classifier-Free Guidance Inside the Attraction Basin May Cause Memorization
Anubhav Jain, Yuya Kobayashi, Takashi Shibuya et al.
PoseCrafter: Extreme Pose Estimation with Hybrid Video Synthesis
Qing Mao, Tianxin Huang, Yu Zhu et al.
AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning
Yang Chen, Zhuolin Yang, Zihan Liu et al.
Multiclass Loss Geometry Matters for Generalization of Gradient Descent in Separable Classification
Matan Schliserman, Tomer Koren
Reconciling Stochastic and Deterministic Strategies for Zero-shot Image Restoration using Diffusion Model in Dual
Chong Wang, Lanqing Guo, Zixuan Fu et al.
Harnessing Frequency Spectrum Insights for Image Copyright Protection Against Diffusion Models
Zhenguang Liu, Chao Shuai, Shaojing Fan et al.
Gain from Neighbors: Boosting Model Robustness in the Wild via Adversarial Perturbations Toward Neighboring Classes
Zhou Yang, Mingtao Feng, Tao Huang et al.
Denoising Trajectory Biases for Zero-Shot AI-Generated Image Detection
Yachao Liang, Min Yu, Gang Li et al.
Enhancing Creative Generation on Stable Diffusion-based Models
Jiyeon Han, Dahee Kwon, Gayoung Lee et al.
EquiPose: Exploiting Permutation Equivariance for Relative Camera Pose Estimation
Yuzhen Liu, Qiulei Dong
EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues
Sagar Soni, Akshay Dudhane, Hiyam Debary et al.
SPFL: Sequential updates with Parallel aggregation for Enhanced Federated Learning under Category and Domain Shifts
Haoyuan Liang, Shilei Cao, Li et al.
Bridging Scales: Spectral Theory Reveals How Local Connectivity Rules Sculpt Global Neural Dynamics in Spatially Extended Networks
Yuhan Huang, Keren Gao, Dongping Yang et al.
Visual Consensus Prompting for Co-Salient Object Detection
Jie Wang, Nana Yu, Zihao Zhang et al.
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding
Ahmed Masry, Juan Rodriguez, Tianyu Zhang et al.
Classifier-guided CLIP Distillation for Unsupervised Multi-label Classification
Dongseob Kim, Hyunjung Shim
GeoVideo: Introducing Geometric Regularization into Video Generation Model
Yunpeng Bai, Shaoheng Fang, Chaohui Yu et al.
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
Hang Yin, Xiuwei Xu, Linqing Zhao et al.
Floating No More: Object-Ground Reconstruction from a Single Image
Yunze Man, Yichen Sheng, Jianming Zhang et al.
pFedMxF: Personalized Federated Class-Incremental Learning with Mixture of Frequency Aggregation
Yifei Zhang, Hao Zhu, Alysa Ziying Tan et al.
The Art of Deception: Color Visual Illusions and Diffusion Models
Alexandra Gomez-Villa, Kai Wang, C.Alejandro Parraga et al.
RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis
Haolin Li, Tianjie Dai, Zhe Chen et al.
ACAttack: Adaptive Cross Attacking RGB-T Tracker via Multi-Modal Response Decoupling
Xinyu Xiang, Qinglong Yan, HAO ZHANG et al.
RoomEditor: High-Fidelity Furniture Synthesis with Parameter-Sharing U-Net
Zhenyi Lin, Xiaofan Ming, Qilong Wang et al.
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
Bingda Tang, Sayak Paul, Boyang Zheng et al.
SemiETS: Integrating Spatial and Content Consistencies for Semi-Supervised End-to-end Text Spotting
Dongliang Luo, Hanshen Zhu, Ziyang Zhang et al.
Quantifying Elicitation of Latent Capabilities in Language Models
Elizabeth Donoway, Hailey Joren, Arushi Somani et al.
Distilling Long-tailed Datasets
Zhenghao Zhao, Haoxuan Wang, Yuzhang Shang et al.
Knowledge Memorization and Rumination for Pre-trained Model-based Class-Incremental Learning
Zijian Gao, Wangwang Jia, Xingxing Zhang et al.
Adversarial Robustness of Nonparametric Regression
Parsa Moradi, Hanzaleh Nodehi, Mohammad Maddah-Ali
Pose Splatter: A 3D Gaussian Splatting Model for Quantifying Animal Pose and Appearance
Jack Goffinet, Youngjo Min, Carlo Tomasi et al.
Treasure Hunt: Real-time Targeting of the Long Tail using Training-Time Markers
Daniel Dsouza, Julia Kreutzer, Adrien Morisot et al.
SDTagNet: Leveraging Text-Annotated Navigation Maps for Online HD Map Construction
Fabian Immel, Jan-Hendrik Pauls, Richard Schwarzkopf et al.
PQPP: A Joint Benchmark for Text-to-Image Prompt and Query Performance Prediction
Eduard Poesina, Adriana Valentina Costache, Adrian-Gabriel Chifu et al.
CheXwhatsApp: A Dataset for Exploring Challenges in the Diagnosis of Chest X-rays through Mobile Devices
Mariamma Antony, Rajiv Porana, Sahil M. Lathiya et al.
Robust Estimation Under Heterogeneous Corruption Rates
Syomantak Chaudhuri, Jerry Li, Thomas Courtade
Single-Teacher View Augmentation: Boosting Knowledge Distillation via Angular Diversity
Seonghoon Yu, Dongjun Nam, Dina Katabi et al.
Learning to Steer: Input-dependent Steering for Multimodal LLMs
Jayneel Parekh, Pegah KHAYATAN, Mustafa Shukor et al.
From Prototypes to General Distributions: An Efficient Curriculum for Masked Image Modeling
Jinhong Lin, Cheng-En Wu, Huanran Li et al.
pLSTM: parallelizable Linear Source Transition Mark networks
Korbinian Pöppel, Richard Freinschlag, Thomas Schmied et al.
SINR: Sparsity Driven Compressed Implicit Neural Representations
Dhananjaya Jayasundara, Sudarshan Rajagopalan, Yasiru Ranasinghe et al.
Trajectory Graph Learning: Aligning with Long Trajectories in Reinforcement Learning Without Reward Design
Yunfan Li, Eric Liu, Lin Yang
Practical and Effective Code Watermarking for Large Language Models
Zhimeng Guo, Minhao Cheng
Evaluating Robustness of Monocular Depth Estimation with Procedural Scene Perturbations
Jack Nugent, Siyang Wu, Zeyu Ma et al.
Erase Diffusion: Empowering Object Removal Through Calibrating Diffusion Pathways
Yi Liu, Hao Zhou, Benlei Cui et al.
DVHGNN: Multi-Scale Dilated Vision HGNN for Efficient Vision Recognition
Caoshuo Li, Tanzhe Li, Xiaobin Hu et al.
C-NAV: Towards Self-Evolving Continual Object Navigation in Open World
MingMing Yu, Fei Zhu, Wenzhuo Liu et al.
RobSense: A Robust Multi-modal Foundation Model for Remote Sensing with Static, Temporal, and Incomplete Data Adaptability
Minh Kha Do, Kang Han, Phu Lai et al.
MESC-3D:Mining Effective Semantic Cues for 3D Reconstruction from a Single Image
Shaoming Li, Qing Cai, Songqi KONG et al.
Probing the Mid-level Vision Capabilities of Self-Supervised Learning
Xuweiyi Chen, Markus Marks, Zezhou Cheng
ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation
Zirun Guo, Tao Jin
Diffusion Bridge: Leveraging Diffusion Model to Reduce the Modality Gap Between Text and Vision for Zero-Shot Image Captioning
Jeongryong Lee, Yejee Shin, Geonhui Son et al.
ScaleDiff: Higher-Resolution Image Synthesis via Efficient and Model-Agnostic Diffusion
Sungho Koh, SeungJu Cha, Hyunwoo Oh et al.
The Devil is in the Prompts: Retrieval-Augmented Prompt Optimization for Text-to-Video Generation
Bingjie Gao, Xinyu Gao, Xiaoxue Wu et al.
Progress Reward Model for Reinforcement Learning via Large Language Models
Xiuhui Zhang, Ning Gao, Xingyu Jiang et al.
VLMs-Guided Representation Distillation for Efficient Vision-Based Reinforcement Learning
Haoran Xu, Peixi Peng, Guang Tan et al.
Human Texts Are Outliers: Detecting LLM-generated Texts via Out-of-distribution Detection
Cong Zeng, Shengkun Tang, Yuanzhou Chen et al.
Efficient $k$-Sparse Band–Limited Interpolation with Improved Approximation Ratio
Yang Cao, Xiaoyu Li, Zhao Song et al.
Beyond Image Classification: A Video Benchmark and Dual-Branch Hybrid Discrimination Framework for Compositional Zero-Shot Learning
Dongyao Jiang, Haodong Jing, Yongqiang Ma et al.
Block-Diagonal LoRA for Eliminating Communication Overhead in Tensor Parallel LoRA Serving
Xinyu Wang, Jonas M. Kübler, Kailash Budhathoki et al.
AniGrad: Anisotropic Gradient-Adaptive Sampling for 3D Reconstruction From Monocular Video
Noah Stier, Alex Rich, Pradeep Sen et al.
RCCDA: Adaptive Model Updates in the Presence of Concept Drift under a Constrained Resource Budget
Adam Piaseczny, Md Kamran Chowdhury Shisher, Shiqiang Wang et al.
ChatVLA-2: Vision-Language-Action Model with Open-World Reasoning
Zhongyi Zhou, Yichen Zhu, Xiaoyu Liu et al.
Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning
Hossein Rajoli Nowdeh, Jie Ji, Xiaolong Ma et al.
Unsupervised Trajectory Optimization for 3D Registration in Serial Section Electron Microscopy using Neural ODEs
Zhenbang Zhang, Jingtong Feng, Hongjia Li et al.
From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D
Jiahui Zhang, Yurui Chen, Yueming Xu et al.
Easy-editable Image Vectorization with Multi-layer Multi-scale Distributed Visual Feature Embedding
Ye Chen, Zhangli Hu, Zhongyin Zhao et al.
DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding
Yudong Han, Qingpei Guo, Liyuan Pan et al.
Automated Proof of Polynomial Inequalities via Reinforcement Learning
Banglong Liu, Niuniu Qi, Xia Zeng et al.
BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence
Xuewu Lin, Tianwei Lin, Alan Huang et al.
MASt3R-SLAM: Real-Time Dense SLAM with 3D Reconstruction Priors
Riku Murai, Eric Dexheimer, Andrew J. Davison
Online Inverse Linear Optimization: Efficient Logarithmic-Regret Algorithm, Robustness to Suboptimality, and Lower Bound
Shinsaku Sakaue, Taira Tsuchiya, Han Bao et al.
NEP: Autoregressive Image Editing via Next Editing Token Prediction
Huimin Wu, Xiaojian (Shawn) Ma, Haozhe Zhao et al.
Toward Robust Neural Reconstruction from Sparse Point Sets
Amine Ouasfi, Shubhendu Jena, Eric Marchand et al.
Lorentz Local Canonicalization: How to make any Network Lorentz-Equivariant
Jonas Spinner, Luigi Favaro, Peter Lippmann et al.
Just Dance with pi! A Poly-modal Inductor for Weakly-supervised Video Anomaly Detection
Snehashis Majhi, Giacomo D'Amicantonio, Antitza Dantcheva et al.
Counterfactual Evolution of Multimodal Datasets via Visual Programming
Minghe Gao, Zhongqi Yue, Wenjie Yan et al.
EnCompass: Enhancing Agent Programming with Search Over Program Execution Paths
Zhening Li, Armando Solar-Lezama, Yisong Yue et al.
Gaussian Splashing: Unified Particles for Versatile Motion Synthesis and Rendering
Yutao Feng, Xiang Feng, Yintong Shang et al.
Improving Accuracy and Calibration via Differentiated Deep Mutual Learning
Han Liu, Peng Cui, Bingning Wang et al.
Hallo3: Highly Dynamic and Realistic Portrait Image Animation with Video Diffusion Transformer
Jiahao Cui, Hui Li, Qingkun Su et al.
Improving the Transferability of Adversarial Attacks on Face Recognition with Diverse Parameters Augmentation
Fengfan Zhou, Bangjie Yin, Hefei Ling et al.
AVerImaTeC: A Dataset for Automatic Verification of Image-Text Claims with Evidence from the Web
RUI CAO, Zifeng Ding, Zhijiang Guo et al.
Reward Fine-Tuning Two-Step Diffusion Models via Learning Differentiable Latent-Space Surrogate Reward
Zhiwei Jia, Yuesong Nan, Huixi Zhao et al.
CaricatureBooth: Data-Free Interactive Caricature Generation in a Photo Booth
Zhiyu Qu, Yunqi Miao, Zhensong Zhang et al.
Revisiting 1-peer exponential graph for enhancing decentralized learning efficiency
Kenta Niwa, Yuki Takezawa, Guoqiang Zhang et al.
Iterative Foundation Model Fine-Tuning on Multiple Rewards
Pouya M. Ghari, simone sciabola, Ye Wang
ENMA: Tokenwise Autoregression for Continuous Neural PDE Operators
Armand Kassaï Koupaï, Lise Le Boudec, Louis Serrano et al.
Channel-wise Noise Scheduled Diffusion for Inverse Rendering in Indoor Scenes
JunYong Choi, Min-Cheol Sagong, SeokYeong Lee et al.
Inexact Column Generation for Bayesian Network Structure Learning via Difference-of-Submodular Optimization
Yiran Yang, Rui Chen
Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models
Yan Xie, Zequn Zeng, Hao Zhang et al.
Visual Anagrams Reveal Hidden Differences in Holistic Shape Processing Across Vision Models
Fenil Doshi, Thomas Fel, Talia Konkle et al.
TraffiDent: A Dataset for Understanding the Interplay Between Traffic Dynamics and Incidents
Xiaochuan Gou, Ziyue Li, Tian Lan et al.
Natural Gradient VI: Guarantees for Non-Conjugate Models
Fangyuan Sun, Ilyas Fatkhullin, Niao He
CADGrasp: Learning Contact and Collision Aware General Dexterous Grasping in Cluttered Scenes
Jiyao Zhang, Zhiyuan Ma, Tianhao Wu et al.
Embodied Scene Understanding for Vision Language Models via MetaVQA
Weizhen Wang, Chenda Duan, Zhenghao Peng et al.
Learning Flow Fields in Attention for Controllable Person Image Generation
Zijian Zhou, Shikun Liu, Xiao Han et al.
VeriLoC: Line-of-Code Level Prediction of Hardware Design Quality from Verilog Code
Raghu Vamshi Hemadri, Jitendra Bhandari, Andre Nakkab et al.
Forming Auxiliary High-confident Instance-level Loss to Promote Learning from Label Proportions
Tianhao Ma, Han Chen, Juncheng Hu et al.
Towards Open-Vocabulary Audio-Visual Event Localization
Jinxing Zhou, Dan Guo, Ruohao Guo et al.
ADU: Adaptive Detection of Unknown Categories in Black-Box Domain Adaptation
Yushan Lai, Guowen Li, Haoyuan Liang et al.
Coeff-Tuning: A Graph Filter Subspace View for Tuning Attention-Based Large Models
Zichen Miao, WEI CHEN, Qiang Qiu
MVBoost: Boost 3D Reconstruction with Multi-View Refinement
Xiangyu Liu, Xiaomei Zhang, Zhiyuan Ma et al.
Semi-supervised Vertex Hunting, with Applications in Network and Text Analysis
Yicong Jiang, Zheng Tracy Ke
Adaptive Discretization for Consistency Models
Jiayu Bai, Zhanbo Feng, Zhijie Deng et al.
TGA: True-to-Geometry Avatar Dynamic Reconstruction
Bo Guo, Sijia Wen, Ziwei Wang et al.
CARL: A Framework for Equivariant Image Registration
Hastings Greer, Lin Tian, François-Xavier Vialard et al.
SONAR: Long-Range Graph Propagation Through Information Waves
Alessandro Trenta, Alessio Gravina, Davide Bacciu
Online Locally Differentially Private Conformal Prediction via Binary Inquiries
Qiangqiang Zhang, Chenfei Gu, Xinwei Feng et al.
AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation
Datao Tang, Xiangyong Cao, Xuan Wu et al.
UniSTD: Towards Unified Spatio-Temporal Learning across Diverse Disciplines
Chen Tang, Xinzhu Ma, Encheng Su et al.
FedFree: Breaking Knowledge-sharing Barriers through Layer-wise Alignment in Heterogeneous Federated Learning
Haizhou Du, Yiran Xiang, Yiwen Cai et al.
RENO: Real-Time Neural Compression for 3D LiDAR Point Clouds
Kang You, Tong Chen, Dandan Ding et al.
Self-Supervised Discovery of Neural Circuits in Spatially Patterned Neural Responses with Graph Neural Networks
Kijung Yoon
Structured Reinforcement Learning for Combinatorial Decision-Making
Heiko Hoppe, Léo Baty, Louis Bouvier et al.
MaskGaussian: Adaptive 3D Gaussian Representation from Probabilistic Masks
Yifei Liu, Zhihang Zhong, Yifan Zhan et al.
Forensics Adapter: Adapting CLIP for Generalizable Face Forgery Detection
Xinjie Cui, Yuezun Li, Ao Luo et al.