Most Cited 2025 "degree-corrected block model" Papers
22,274 papers found • Page 16 of 112
Conference
Provable Convergence and Limitations of Geometric Tempering for Langevin Dynamics
Omar Chehab, Anna Korba, Austin Stromme et al.
On Temperature Scaling and Conformal Prediction of Deep Classifiers
Lahav Dabah, Tom Tirer
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent
Bingrui Li, Wei Huang, Andi Han et al.
SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments
Yue Cao, Yun Xing, Jie Zhang et al.
FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation
Dong Zhao, Jinlong Li, Shuang Wang et al.
WeatherGFM: Learning a Weather Generalist Foundation Model via In-context Learning
Xiangyu Zhao, Zhiwang Zhou, Wenlong Zhang et al.
Pre-Training Graph Neural Networks on Molecules by Using Subgraph-Conditioned Graph Information Bottleneck
Van Thuy Hoang, O-Joun Lee
Self-attention-based Diffusion Model for Time-series Imputation in Partial Blackout Scenarios
Mohammad Rafid Ul Islam, Prasad Tadepalli, Alan Fern
Scaling Laws for Differentially Private Language Models
Ryan McKenna, Yangsibo Huang, Amer Sinha et al.
Monet: Mixture of Monosemantic Experts for Transformers
Jungwoo Park, Young Jin Ahn, Kee-Eung Kim et al.
(Mis)Fitting Scaling Laws: A Survey of Scaling Law Fitting Techniques in Deep Learning
Margaret Li, Sneha Kudugunta, Luke Zettlemoyer
$\gamma-$MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
Yaxin Luo, Gen Luo, Jiayi Ji et al.
I Can Hear You: Selective Robust Training for Deepfake Audio Detection
Zirui Zhang, Wei Hao, Aroon Sankoh et al.
Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
Arun Reddy, Alexander Martin, Eugene Yang et al.
Repo2Run: Automated Building Executable Environment for Code Repository at Scale
Ruida Hu, Chao Peng, XinchenWang et al.
OS-ATLAS: Foundation Action Model for Generalist GUI Agents
Zhiyong Wu, Zhenyu Wu, Fangzhi Xu et al.
Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents
Haoyu Wang, Sunhao Dai, Haiyuan Zhao et al.
EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge
Ruskin Raj Manku, Yuzhi Tang, Xingjian Shi et al.
Post-pre-training for Modality Alignment in Vision-Language Foundation Models
Shin'ya Yamaguchi, Dewei Feng, Sekitoshi Kanai et al.
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Miran Heo, Min-Hung Chen, De-An Huang et al.
Flow-Based Policy for Online Reinforcement Learning
Lei Lv, Yunfei Li, Yu Luo et al.
Breaking Free from MMI: A New Frontier in Rationalization by Probing Input Utilization
Wei Liu, Zhiying Deng, Zhongyu Niu et al.
Random-Set Neural Networks
Shireen Kudukkil Manchingal, Muhammad Mubashar, Kaizheng Wang et al.
Group Distributionally Robust Dataset Distillation with Risk Minimization
Saeed Vahidian, Mingyu Wang, Jianyang Gu et al.
Constrained Fair and Efficient Allocations
Benjamin Cookson, Soroush Ebadian, Nisarg Shah
Synthesizing Privacy-Preserving Text Data via Finetuning *without* Finetuning Billion-Scale LLMs
Bowen Tan, Zheng Xu, Eric Xing et al.
SuperPC: A Single Diffusion Model for Point Cloud Completion, Upsampling, Denoising, and Colorization
Yi Du, Zhipeng Zhao, Shaoshu Su et al.
From Poses to Identity: Training-Free Person Re-Identification via Feature Centralization
Chao Yuan, Guiwei Zhang, Changxiao Ma et al.
Accurate and Regret-Aware Numerical Problem Solver for Tabular Question Answering
Yuxiang Wang, Jianzhong Qi, Junhao Gan
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation
Gao Peng, Le Zhuo, Dongyang Liu et al.
K-Sort Arena: Efficient and Reliable Benchmarking for Generative Models via K-wise Human Preferences
Zhikai Li, Xuewen Liu, Dongrong Joe Fu et al.
ScribbleLight: Single Image Indoor Relighting with Scribbles
Jun Myeong Choi, Annie N. Wang, Pieter Peers et al.
MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking
Haolin Qin, Tingfa Xu, Tianhao Li et al.
BrainUICL: An Unsupervised Individual Continual Learning Framework for EEG Applications
Yangxuan Zhou, Sha Zhao, Jiquan Wang et al.
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?
Simon Park, Abhishek Panigrahi, Yun Cheng et al.
PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes
Bin Tan, Rui Yu, Yujun Shen et al.
When Do LLMs Help With Node Classification? A Comprehensive Analysis
Xixi Wu, Yifei Shen, Fangzhou Ge et al.
Perspective-Invariant 3D Object Detection
Alan Liang, Lingdong Kong, Dongyue Lu et al.
PreciseCam: Precise Camera Control for Text-to-Image Generation
Edurne Bernal-Berdun, Ana Serrano, Belen Masia et al.
Measuring what Matters: Construct Validity in Large Language Model Benchmarks
Andrew M. Bean, Ryan Othniel Kearns, Angelika Romanou et al.
Online Video Understanding: OVBench and VideoChat-Online
Zhenpeng Huang, Xinhao Li, Jiaqi Li et al.
Realistic Evaluation of Deep Partial-Label Learning Algorithms
Wei Wang, Dong-Dong Wu, Jindong Wang et al.
Can Classic GNNs Be Strong Baselines for Graph-level Tasks? Simple Architectures Meet Excellence
Yuankai Luo, Lei Shi, Xiao-Ming Wu
All-in-One: Transferring Vision Foundation Models into Stereo Matching
Jingyi Zhou, Haoyu Zhang, Jiakang Yuan et al.
HyperGLM: HyperGraph for Video Scene Graph Generation and Anticipation
Trong-Thuan Nguyen, Pha Nguyen, Jackson Cothren et al.
Objective drives the consistency of representational similarity across datasets
Laure Ciernik, Lorenz Linhardt, Marco Morik et al.
TraF-Align: Trajectory-aware Feature Alignment for Asynchronous Multi-agent Perception
Zhiying Song, Lei Yang, Fuxi Wen et al.
Fast Training of Sinusoidal Neural Fields via Scaling Initialization
Taesun Yeom, Sangyoon Lee, Jaeho Lee
HyperGS: Hyperspectral 3D Gaussian Splatting
Christopher Thirgood, Oscar Mendez, Erin Chao Ling et al.
HybridGS: High-Efficiency Gaussian Splatting Data Compression using Dual-Channel Sparse Representation and Point Cloud Encoder
Qi Yang, Le Yang, Geert Van der Auwera et al.
KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation
Antoni Bigata Casademunt, Michał Stypułkowski, Rodrigo Mira et al.
You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning
Ayan Sengupta, Siddhant Chaudhary, Tanmoy Chakraborty
GIFT: Unlocking Full Potential of Labels in Distilled Dataset at Near-zero Cost
Xinyi Shang, Peng Sun, Tao Lin
Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control
Hejia Chen, Haoxian Zhang, Shoulong Zhang et al.
PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations
Benjamin Holzschuh, Qiang Liu, Georg Kohl et al.
AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws
Oren Neumann, Claudius Gros
LeFusion: Controllable Pathology Synthesis via Lesion-Focused Diffusion Models
Hantao Zhang, Yuhe Liu, Jiancheng Yang et al.
IDEATOR: Jailbreaking and Benchmarking Large Vision-Language Models Using Themselves
Ruofan Wang, Juncheng Li, Yixu Wang et al.
HiMoR: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation
Yiming Liang, Tianhan Xu, Yuta Kikuchi
Learning Robust Spectral Dynamics for Temporal Domain Generalization
En Yu, Jie Lu, Xiaoyu Yang et al.
ND-SDF: Learning Normal Deflection Fields for High-Fidelity Indoor Reconstruction
Ziyu Tang, Weicai Ye, Yifan Wang et al.
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Generation
Zheng Anlin, Xin Wen, Xuanyang Zhang et al.
Zero-Shot Styled Text Image Generation, but Make It Autoregressive
Vittorio Pippi, Fabio Quattrini, Silvia Cascianelli et al.
Multi-modal brain encoding models for multi-modal stimuli
SUBBA REDDY OOTA, Khushbu Pahwa, mounika marreddy et al.
Multimodal Latent Diffusion Model for Complex Sewing Pattern Generation
Shengqi Liu, Yuhao Cheng, Zhuo Chen et al.
MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models
Chejian Xu, Jiawei Zhang, Zhaorun Chen et al.
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
Denis Sutter, Julian Minder, Thomas Hofmann et al.
MegActor-Sigma: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer
Shurong Yang, Huadong Li, Juhao Wu et al.
Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks
Shikai Qiu, Lechao Xiao, Andrew Wilson et al.
FixTalk: Taming Identity Leakage for High-Quality Talking Head Generation in Extreme Cases
Shuai Tan, Bill Gong, Bin Ji et al.
UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents
Han Xiao, Guozhi Wang, Yuxiang Chai et al.
LITA-GS: Illumination-Agnostic Novel View Synthesis via Reference-Free 3D Gaussian Splatting and Physical Priors
Han Zhou, Wei Dong, Jun Chen
Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences
Nikos Dimitriadis, Pascal Frossard, François Fleuret
Attention-Driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models Without Fine-Tuning
Hai-Ming Xu, Qi Chen, Lei Wang et al.
Do Large Language Models Truly Understand Geometric Structures?
Xiaofeng Wang, Yiming Wang, Wenhong Zhu et al.
Knowledge-Aligned Counterfactual-Enhancement Diffusion Perception for Unsupervised Cross-Domain Visual Emotion Recognition
Wen Yin, Yong Wang, Guiduo Duan et al.
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
Xin Ding, Hao Wu, Yifan Yang et al.
DreamDistribution: Learning Prompt Distribution for Diverse In-distribution Generation
Brian Nlong Zhao, Yuhang Xiao, Jiashu Xu et al.
Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization
Yamato Arai, Yuma Ichikawa
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
Zun Wang, Jialu Li, Yicong Hong et al.
LIBA: Language Instructed Multi-granularity Bridge Assistant for 3D Visual Grounding
Yuan Wang, Ya-Li Li, W U Eastman Z Y et al.
QuaDiM: A Conditional Diffusion Model For Quantum State Property Estimation
Yehui Tang, Mabiao Long, Junchi Yan
ADIFF: Explaining audio difference using natural language
Soham Deshmukh, Shuo Han, Rita Singh et al.
ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation
Mengyang Wu, Yuzhi Zhao, Jialun Cao et al.
SILO: Solving Inverse Problems with Latent Operators
Ron Raphaeli, Sean Man, Michael Elad
Counterfactual Generative Modeling with Variational Causal Inference
Yulun Wu, Louis McConnell, Claudia Iriondo
Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition
Zhong Zheng, Haochen Zhang, Lingzhou Xue
SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement
Yuqi Lin, Hengjia Li, Wenqi Shao et al.
Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration
Kang Liao, Zongsheng Yue, Zhouxia Wang et al.
UnCommon Objects in 3D
Xingchen Liu, Piyush Tayal, Jianyuan Wang et al.
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)
Leander Girrbach, Stephan Alaniz, Yiran Huang et al.
SMARTIES: Spectrum-Aware Multi-Sensor Auto-Encoder for Remote Sensing Images
Gencer Sumbul, Chang Xu, Emanuele Dalsasso et al.
Physical Plausibility-aware Trajectory Prediction via Locomotion Embodiment
Hiromu Taketsugu, Takeru Oba, Takahiro Maeda et al.
Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse
Arthur Jacot, Peter Súkeník, Zihan Wang et al.
OLinear: A Linear Model for Time Series Forecasting in Orthogonally Transformed Domain
Wenzhen Yue, Yong Liu, Hao Wang et al.
Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization
Luca Masserano, Abdul Fatir Ansari, Boran Han et al.
Diff3DS: Generating View-Consistent 3D Sketch via Differentiable Curve Rendering
Yibo Zhang, Lihong Wang, Changqing Zou et al.
Anyattack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models
Jiaming Zhang, Junhong Ye, Xingjun Ma et al.
GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling
Jialong Zhou, Lichao Wang, Xiao Yang
Relieving Universal Label Noise for Unsupervised Visible-Infrared Person Re-Identification by Inferring from Neighbors
Xiao Teng, Long Lan, Dingyao Chen et al.
Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment
Yifan Zhang, Ge Zhang, Yue Wu et al.
RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance
Chengrui Wang, Pengfei Liu, Min Zhou et al.
Dual Prompting Image Restoration with Diffusion Transformers
Dehong Kong, Fan Li, Zhixin Wang et al.
Transition Path Sampling with Improved Off-Policy Training of Diffusion Path Samplers
Kiyoung Seong, Seonghyun Park, Seonghwan Kim et al.
RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards
jingnan zheng, Xiangtian Ji, Yijun Lu et al.
Deep Nonlinear Sufficient Dimension Reduction
Yinfeng Chen, Yuling Jiao, Rui Qiu et al.
AutoOcc: Automatic Open-Ended Semantic Occupancy Annotation via Vision-Language Guided Gaussian Splatting
Xiaoyu Zhou, Jingqi Wang, Yongtao Wang et al.
How Transformers Learn Structured Data: Insights From Hierarchical Filtering
Jerome Garnier-Brun, Marc Mezard, Emanuele Moscato et al.
SAM2Object: Consolidating View Consistency via SAM2 for Zero-Shot 3D Instance Segmentation
Jihuai Zhao, Junbao Zhuo, Jiansheng Chen et al.
Fast Summation of Radial Kernels via QMC Slicing
Johannes Hertrich, Tim Jahn, Michael Quellmalz
Diffusion models for Gaussian distributions: Exact solutions and Wasserstein errors
Emile Pierret, Bruno Galerne
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
Jiashuo Yu, Yue Wu, Meng Chu et al.
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
Tao Wang, Changxu Cheng, Lingfeng Wang et al.
SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation
Koichi Saito, Dongjun Kim, Takashi Shibuya et al.
Test-time Adaptation for Cross-modal Retrieval with Query Shift
Haobin Li, Peng Hu, Qianjun Zhang et al.
MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation
Jiaxin Huang, Runnan Chen, Ziwen Li et al.
ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models
Liyan Tang, Grace Kim, Xinyu Zhao et al.
Vanish into Thin Air: Cross-prompt Universal Adversarial Attacks for SAM2
Ziqi Zhou, Yifan Hu, Yufei Song et al.
DepthCues: Evaluating Monocular Depth Perception in Large Vision Models
Duolikun Danier, Mehmet Aygun, Changjian Li et al.
UAVScenes: A Multi-Modal Dataset for UAVs
Sijie Wang, Siqi Li, Yawei Zhang et al.
Make Me Happier: Evoking Emotions Through Image Diffusion Models
Qing Lin, Jingfeng Zhang, YEW-SOON ONG et al.
Multi-Focus Image Fusion via Explicit Defocus Blur Modelling
Yuhui Quan, Xi Wan, Zitao Tang et al.
Beyond the convexity assumption: Realistic tabular data generation under quantifier-free real linear constraints
Mihaela Stoian, Eleonora Giunchiglia
HELM: Hierarchical Encoding for mRNA Language Modeling
Mehdi Yazdani-Jahromi, Mangal Prakash, Tommaso Mansi et al.
Markov Persuasion Processes: Learning to Persuade From Scratch
Francesco Bacchiocchi, Francesco Emanuele Stradi, Matteo Castiglioni et al.
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
Da Xiao, Qingye Meng, Shengping Li et al.
Mimic In-Context Learning for Multimodal Tasks
Yuchu Jiang, Jiale Fu, chenduo hao et al.
Rapidly Adapting Policies to the Real-World via Simulation-Guided Fine-Tuning
Patrick Yin, Tyler Westenbroek, Ching-An Cheng et al.
MagCache: Fast Video Generation with Magnitude-Aware Cache
Zehong Ma, Longhui Wei, Feng Wang et al.
DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image
Qingxuan Wu, Zhiyang Dou, Sirui Xu et al.
Quadratic Gaussian Splatting: High Quality Surface Reconstruction with Second-order Geometric Primitives
ziyu zhang, Binbin Huang, Hanqing Jiang et al.
Quality-Driven Curation of Remote Sensing Vision-Language Data via Learned Scoring Models
Dilxat Muhtar, Enzhuo Zhang, Zhenshi Li et al.
VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models
Muchao Ye, Weiyang Liu, Pan He
PMQ-VE: Progressive Multi-Frame Quantization for Video Enhancement
ZhanFeng Feng, Long Peng, Xin Di et al.
Seurat: From Moving Points to Depth
Seokju Cho, Gabriel Huang, Seungryong Kim et al.
MIB: A Mechanistic Interpretability Benchmark
Aaron Mueller, Atticus Geiger, Sarah Wiegreffe et al.
ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation
Angxiao Yue, Zichong Wang, Hongteng Xu
ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos
Tanveer Hannan, Md Mohaiminul Islam, Jindong Gu et al.
MP-SfM: Monocular Surface Priors for Robust Structure-from-Motion
Zador Pataki, Paul-Edouard Sarlin, Johannes Schönberger et al.
PENCIL: Long Thoughts with Short Memory
Chenxiao Yang, Nati Srebro, David McAllester et al.
Genesis: Multimodal Driving Scene Generation with Spatio-Temporal and Cross-Modal Consistency
Xiangyu Guo, Zhanqian Wu, Kaixin Xiong et al.
Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI
Julien Pourcel, Cédric Colas, Pierre-Yves Oudeyer
BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems
Andy Zhang, Joey Ji, Celeste Menders et al.
PAL: Sample-Efficient Personalized Reward Modeling for Pluralistic Alignment
Daiwei Chen, Yi Chen, Aniket Rege et al.
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training
Haicheng Wang, Chen Ju, Weixiong Lin et al.
Controllable Generation via Locally Constrained Resampling
Kareem Ahmed, Kai-Wei Chang, Guy Van den Broeck
Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies
Nadav Timor, Jonathan Mamou, Daniel Korat et al.
SurFhead: Affine Rig Blending for Geometrically Accurate 2D Gaussian Surfel Head Avatars
Jaeseong Lee, Taewoong Kang, Marcel Buehler et al.
Bayesian Test-Time Adaptation for Vision-Language Models
Lihua Zhou, Mao Ye, Shuaifeng Li et al.
Adversarially Robust Out-of-Distribution Detection Using Lyapunov-Stabilized Embeddings
Hossein Mirzaei Sadeghlou, Mackenzie Mathis
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model
Benlin Liu, Yuhao Dong, Yiqin Wang et al.
On the Training Convergence of Transformers for In-Context Classification of Gaussian Mixtures
Wei Shen, Ruida Zhou, Jing Yang et al.
GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers
Shijie Ma, Yuying Ge, Teng Wang et al.
Fourier Sliced-Wasserstein Embedding for Multisets and Measures
Tal Amir, Nadav Dym
SuperMat: Physically Consistent PBR Material Estimation at Interactive Rates
Yijia Hong, Yuan-Chen Guo, Ran Yi et al.
Solving Inequality Proofs with Large Language Models
Jiayi Sheng, Luna Lyu, Jikai Jin et al.
Model Provenance Testing for Large Language Models
Ivica Nikolic, Teodora Baluta, Prateek Saxena
DriveEditor: A Unified 3D Information-Guided Framework for Controllable Object Editing in Driving Scenes
Yiyuan Liang, Zhiying Yan, Liqun Chen et al.
Learning World Models for Interactive Video Generation
Taiye Chen, Xun Hu, Zihan Ding et al.
MultiADS: Defect-aware Supervision for Multi-type Anomaly Detection and Segmentation in Zero-Shot Learning
Ylli Sadikaj, Hongkuan Zhou, Lavdim Halilaj et al.
MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes
Ruijie Lu, Yixin Chen, Junfeng Ni et al.
ZoRI: Towards Discriminative Zero-Shot Remote Sensing Instance Segmentation
Shiqi Huang, Shuting He, Bihan Wen
Universal Video Temporal Grounding with Generative Multi-modal Large Language Models
Zeqian Li, Shangzhe Di, Zhonghua Zhai et al.
Discrete Codebook World Models for Continuous Control
Aidan Scannell, Mohammadreza Nakhaeinezhadfard, Kalle Kujanpää et al.
DiffCalib: Reformulating Monocular Camera Calibration as Diffusion-Based Dense Incident Map Generation
Xiankang He, Guangkai Xu, Bo Zhang et al.
WildFake: A Large-Scale and Hierarchical Dataset for AI-Generated Images Detection
Yan Hong, Jianming Feng, Haoxing Chen et al.
Aligned Better, Listen Better for Audio-Visual Large Language Models
Yuxin Guo, Shuailei Ma, Shijie Ma et al.
Theoretically Grounded Framework for LLM Watermarking: A Distribution-Adaptive Approach
Haiyun He, Yepeng Liu, Ziqiao Wang et al.
Energy-based Backdoor Defense Against Federated Graph Learning
Guancheng Wan, Zitong Shi, Wenke Huang et al.
Multi-Pair Temporal Sentence Grounding via Multi-Thread Knowledge Transfer Network
Xiang Fang, Wanlong Fang, Changshuo Wang et al.
Evaluating Large Language Models through Role-Guide and Self-Reflection: A Comparative Study
Lili Zhao, Yang Wang, Qi Liu et al.
TASAR: Transfer-based Attack on Skeletal Action Recognition
Yunfeng Diao, Baiqi Wu, Ruixuan Zhang et al.
AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks
Fali Wang, Hui Liu, Zhenwei Dai et al.
Learning Pattern-Specific Experts for Time Series Forecasting Under Patch-level Distribution Shift
Yanru Sun, Zongxia Xie, Emadeldeen Eldele et al.
MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem
Fan LIU, Zherui Yang, Cancheng Liu et al.
Solving Video Inverse Problems Using Image Diffusion Models
Taesung Kwon, Jong Chul YE
Efficient Alternating Minimization with Applications to Weighted Low Rank Approximation
Zhao Song, Mingquan Ye, Junze Yin et al.
LATINO-PRO: LAtent consisTency INverse sOlver with PRompt Optimization
Alessio Spagnoletti, Jean Prost, Andres Almansa et al.
Queryable Prototype Multiple Instance Learning with Vision-Language Models for Incremental Whole Slide Image Classification
Jiaxiang Gou, Luping Ji, Pei Liu et al.
Rethinking Evaluation of Sparse Autoencoders through the Representation of Polysemous Words
Gouki Gouki, Hiroki Furuta, Yusuke Iwasawa et al.
MP-GUI: Modality Perception with MLLMs for GUI Understanding
Ziwei Wang, Weizhi Chen, Leyang Yang et al.
Graph Generative Pre-trained Transformer
Xiaohui Chen, Yinkai Wang, JIAXING HE et al.
GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization
Yirui Chen, Xudong Huang, Quan Zhang et al.
Correcting Deviations from Normality: A Reformulated Diffusion Model for Multi-Class Unsupervised Anomaly Detection
Farzad Beizaee, Gregory A. Lodygensky, Christian Desrosiers et al.
BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models
Zenghui Yuan, Jiawen Shi, Pan Zhou et al.
STAA-SNN: Spatial-Temporal Attention Aggregator for Spiking Neural Networks
Tianqing Zhang, Kairong Yu, Xian Zhong et al.
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
Zhengfeng Lai, Vasileios Saveris, Chen Chen et al.
Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation
Chanyoung Kim, Dayun Ju, Woojung Han et al.
PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution
Zhu Li Bo, Jianze Li, Haotong Qin et al.
Deformable Radial Kernel Splatting
Yihua Huang, Mingxian Lin, Yangtian Sun et al.
RMP-SAM: Towards Real-Time Multi-Purpose Segment Anything
Shilin Xu, Haobo Yuan, Qingyu Shi et al.
Guided Diffusion Sampling on Function Spaces with Applications to PDEs
Jiachen Yao, Abbas Mammadov, Julius Berner et al.
Fine-Tuning Visual Autogressive Models for Subject-Driven Generation
Jiwoo Chung, Sangeek Hyun, Hyunjun Kim et al.
DiffGAD: A Diffusion-based Unsupervised Graph Anomaly Detector
Jinghan Li, Yuan Gao, Jinda Lu et al.
InverseCoder: Self-improving Instruction-Tuned Code LLMs with Inverse-Instruct
Yutong Wu, Di Huang, Wenxuan Shi et al.
InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment
Yunhong Lu, Qichao Wang, Hengyuan Cao et al.
Diffusion Tree Sampling: Scalable inference‑time alignment of diffusion models
Vineet Jain, Kusha Sareen, Mohammad Pedramfar et al.
Advancing Expert Specialization for Better MoE
Hongcan Guo, Haolang Lu, Guoshun Nan et al.
Distilling Monocular Foundation Model for Fine-grained Depth Completion
Yingping Liang, Yutao Hu, Wenqi Shao et al.
Multimodal Quantitative Language for Generative Recommendation
Jianyang Zhai, Zi-Feng Mai, Chang-Dong Wang et al.