Most Cited 2025 "nonnegative matrices" Papers
PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering
Yifan Gao, Zihang Lin, Chuanbin Liu et al.
PromptHMR: Promptable Human Mesh Recovery
Yufu Wang, Yu Sun, Priyanka Patel et al.
3DMambaIPF: A State Space Model for Iterative Point Cloud Filtering via Differentiable Rendering
Qingyuan Zhou, Weidong Yang, Ben Fei et al.
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining
Yue Li, Qi Ma, Runyi Yang et al.
Learning Distributions of Complex Fluid Simulations with Diffusion Graph Networks
Mario Lino, Tobias Pfaff, Nils Thuerey
Variational Diffusion Posterior Sampling with Midpoint Guidance
Badr Moufad, Yazid Janati El Idrissi, Lisa Bedin et al.
Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems
Christian Walder, Deep Tejas Karkhanis
Self-Correcting Decoding with Generative Feedback for Mitigating Hallucinations in Large Vision-Language Models
Ce Zhang, Zifu Wan, Zhehan Kan et al.
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
Jingjing Chang, Yixiao Fang, Peng Xing et al.
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
Zihan Liu, Shuangrui Ding, Zhixiong Zhang et al.
Flow: Modularized Agentic Workflow Automation
Boye Niu, Yiliao Song, Kai Lian et al.
SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE
Yongwei Chen, Yushi Lan, Shangchen Zhou et al.
VisualCloze: A Universal Image Generation Framework via Visual In-Context Learning
Zhong-Yu Li, Ruoyi Du, Juncheng Yan et al.
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Jiaqi Chen, Bang Zhang, Ruotian Ma et al.
Monitoring Latent World States in Language Models with Propositional Probes
Jiahai Feng, Stuart Russell, Jacob Steinhardt
Agent-Oriented Planning in Multi-Agent Systems
Ao Li, Yuexiang Xie, Songze Li et al.
FonTS: Text Rendering With Typography and Style Controls
Wenda Shi, Yiren Song, Dengming Zhang et al.
Towards Fast, Specialized Machine Learning Force Fields: Distilling Foundation Models via Energy Hessians
Ishan Amin, Sanjeev Raja, Aditi Krishnapriyan
Scalable Ranked Preference Optimization for Text-to-Image Generation
Shyamgopal Karthik, Huseyin Coskun, Zeynep Akata et al.
Causal Concept Graph Models: Beyond Causal Opacity in Deep Learning
Gabriele Dominici, Pietro Barbiero, Mateo Espinosa Zarlenga et al.
ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance
Shuwei Shi, Wenbo Li, Yuechen Zhang et al.
Enhancing Decision-Making for LLM Agents via Step-Level Q-Value Models
Yuanzhao Zhai, Tingkai Yang, Kele Xu et al.
GI-GS: Global Illumination Decomposition on Gaussian Splatting for Inverse Rendering
Hongze Chen, Zehong Lin, Jun Zhang
TabPFN Unleashed: A Scalable and Effective Solution to Tabular Classification Problems
Si-Yang Liu, Han-Jia Ye
GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion
Jiapeng Tang, Davide Davoli, Tobias Kirschstein et al.
ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning
Jiaqi Liao, Zhengyuan Yang, Linjie Li et al.
Is Sarcasm Detection a Step-by-Step Reasoning Process in Large Language Models?
Ben Yao, Yazhou Zhang, Qiuchi Li et al.
Understanding Optimization in Deep Learning with Central Flows
Jeremy Cohen, Alex Damian, Ameet Talwalkar et al.
The Loss Landscape of Deep Linear Neural Networks: a Second-order Analysis
El Mehdi Achour, Francois Malgouyres, Sebastien Gerchinovitz
Heuristic-Induced Multimodal Risk Distribution Jailbreak Attack for Multimodal Large Language Models
Ma Teng, Xiaojun Jia, Ranjie Duan et al.
Diverse Preference Learning for Capabilities and Alignment
Stewart Slocum, Asher Parker-Sartori, Dylan Hadfield-Menell
A Transfer Attack to Image Watermarks
Yuepeng Hu, Zhengyuan Jiang, Moyang Guo et al.
Matrix3D: Large Photogrammetry Model All-in-One
Yuanxun Lu, Jingyang Zhang, Tian Fang et al.
VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory
Runjia Li, Philip Torr, Andrea Vedaldi et al.
Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation
Jiaqi Huang, Zunnan Xu, Ting Liu et al.
Improving Semantic Understanding in Speech Language Models via Brain-tuning
Omer Moussa, Dietrich Klakow, Mariya Toneva
Training on the Benchmark Is Not All You Need
Shiwen Ni, Xiangtao Kong, Chengming Li et al.
Reinforced Lifelong Editing for Language Models
Zherui Li, Houcheng Jiang, Hao Chen et al.
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning
Yiwu Zhong, Zhuoming Liu, Yin Li et al.
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
Mohamed El Amine Boudjoghra, Angela Dai, Jean Lahoud et al.
How do language models learn facts? Dynamics, curricula and hallucinations
Nicolas Zucchet, Jorg Bornschein, Stephanie C.Y. Chan et al.
SCANS: Mitigating the Exaggerated Safety for LLMs via Safety-Conscious Activation Steering
Zouying Cao, Yifei Yang, Hai Zhao
Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation
Yuanbo Yang, Jiahao Shao, Xinyang Li et al.
SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
Xingrun Xing, Boyan Gao, Zheng Liu et al.
Is In-Context Learning Sufficient for Instruction Following in LLMs?
Hao Zhao, Maksym Andriushchenko, Francesco Croce et al.
Towards a General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders
Qichao Shentu, Beibu Li, Kai Zhao et al.
HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery
Jingtao Li, Yingyi Liu, Xinyu Wang et al.
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
Gleb Rodionov, Roman Garipov, Alina Shutova et al.
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
Boyu Gou, Zanming Huang, Yuting Ning et al.
Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing
Jaihoon Kim, Taehoon Yoon, Jisung Hwang et al.
Establishing Task Scaling Laws via Compute-Efficient Model Ladders
Akshita Bhagia, Jiacheng Liu, Alexander Wettig et al.
Multi-Sensor Object Anomaly Detection: Unifying Appearance, Geometry, and Internal Properties
Wenqiao Li, BoZhong Zheng, Xiaohao Xu et al.
Hierarchical World Models as Visual Whole-Body Humanoid Controllers
Nick Hansen, Jyothir S V, Vlad Sobal et al.
MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models
Mohammad Shahab Sepehri, Zalan Fabian, Maryam Soltanolkotabi et al.
SynCity: Training-Free Generation of 3D Cities
Paul Engstler, Aleksandar Shtedritski, Iro Laina et al.
Trusted Unified Feature-Neighborhood Dynamics for Multi-View Classification
Haojian Huang, Chuanyu Qin, Zhe Liu et al.
Parallel Scaling Law for Language Models
Mouxiang Chen, Binyuan Hui, Zeyu Cui et al.
Rope to Nope and Back Again: A New Hybrid Attention Strategy
Bowen Yang, Bharat Venkitesh, Dwaraknath Gnaneshwar Talupuru et al.
Video Depth without Video Models
Bingxin Ke, Dominik Narnhofer, Shengyu Huang et al.
Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression
Jingcun Wang, Yu-Guang Chen, Ing-Chao Lin et al.
Streaming DiLoCo with overlapping communication
Arthur Douillard, Yani Donchev, J Keith Rush et al.
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
Songhua Liu, Zhenxiong Tan, Xinchao Wang
SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device
Yushu Wu, Zhixing Zhang, Yanyu Li et al.
V2XPnP: Vehicle-to-Everything Spatio-Temporal Fusion for Multi-Agent Perception and Prediction
Zewei Zhou, Hao Xiang, Zhaoliang Zheng et al.
SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video
Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello et al.
TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model
Cheng Yang, Yang Sui, Jinqi Xiao et al.
Instant Gaussian Stream: Fast and Generalizable Streaming of Dynamic Scene Reconstruction via Gaussian Splatting
Jinbo Yan, Rui Peng, Zhiyan Wang et al.
How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension
Xinnan Dai, Haohao Qu, Yifei Shen et al.
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
Yifan Pu, Yiming Zhao, Zhicong Tang et al.
DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness
Yiming Zhong, Qi Jiang, Jingyi Yu et al.
FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities
Jin Wang, Yao Lai, Aoxue Li et al.
OccMamba: Semantic Occupancy Prediction with State Space Models
Heng Li, Yuenan Hou, Xiaohan Xing et al.
WonderTurbo: Generating Interactive 3D World in 0.72 Seconds
Chaojun Ni, Xiaofeng Wang, Zheng Zhu et al.
τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains
Shunyu Yao, Noah Shinn, Pedram Razavi et al.
Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models
Andreas Müller, Denis Lukovnikov, Jonas Thietke et al.
Text2PDE: Latent Diffusion Models for Accessible Physics Simulation
Anthony Zhou, Zijie Li, Michael Schneier et al.
Enhanced Contrastive Learning with Multi-view Longitudinal Data for Chest X-ray Report Generation
Kang Liu, Zhuoqi Ma, Xiaolu Kang et al.
Flow Matching with Gaussian Process Priors for Probabilistic Time Series Forecasting
Marcel Kollovieh, Marten Lienen, David Lüdke et al.
OmniSAM: Omnidirectional Segment Anything Model for UDA in Panoramic Semantic Segmentation
Ding Zhong, Xu Zheng, Chenfei Liao et al.
AdaDiff: Adaptive Step Selection for Fast Diffusion Models
Hui Zhang, Zuxuan Wu, Zhen Xing et al.
Modeling Complex System Dynamics with Flow Matching Across Time and Conditions
Martin Rohbeck, Edward De Brouwer, Charlotte Bunne et al.
P(all-atom) Is Unlocking New Path For Protein Design
Wei Qu, Jiawei Guan, Rui Ma et al.
Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs
Barrett Tang, Zile Huang, Chengzhi Liu et al.
Generative Video Propagation
Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang et al.
CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation
Nikolai Kalischek, Michael Oechsle, Fabian Manhardt et al.
Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning
Minheng Ni, YuTao Fan, Lei Zhang et al.
Generative Image Layer Decomposition with Visual Effects
Jinrui Yang, Qing Liu, Yijun Li et al.
Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples
Chengqian Gao, Haonan Li, Liu Liu et al.
D^3: Scaling Up Deepfake Detection by Learning from Discrepancy
Yongqi Yang, Zhihao Qian, Ye Zhu et al.
Reflective Gaussian Splatting
Yuxuan Yao, Zixuan Zeng, Chun Gu et al.
Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks
Michael Matthews, Michael Beukman, Chris Lu et al.
BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning
Jianyang Gu, Sam Stevens, Elizabeth Campolongo et al.
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
Weixian Lei, Jiacong Wang, Haochen Wang et al.
Occlusion-Embedded Hybrid Transformer for Light Field Super-Resolution
Zeyu Xiao, Zhuoyuan Li, Wei Jia
First-Person Fairness in Chatbots
Tyna Eloundou, Alex Beutel, David Robinson et al.
STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning
Marius Memmel, Jacob Berg, Bingqing Chen et al.
MV-VTON: Multi-View Virtual Try-On with Diffusion Models
Haoyu Wang, Zhilu Zhang, Donglin Di et al.
ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning
Hongyin Zhang, Zifeng Zhuang, Han Zhao et al.
Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search
Yuta Oshima, Masahiro Suzuki, Yutaka Matsuo et al.
LSNet: See Large, Focus Small
Ao Wang, Hui Chen, Zijia Lin et al.
Adv-CPG: A Customized Portrait Generation Framework with Facial Adversarial Attacks
Junying Wang, Hongyuan Zhang, Yuan Yuan
Is Your Multimodal Language Model Oversensitive to Safe Queries?
Xirui Li, Hengguang Zhou, Ruochen Wang et al.
Temporal Reasoning Transfer from Text to Video
Lei Li, Yuanxin Liu, Linli Yao et al.
STIV: Scalable Text and Image Conditioned Video Generation
Zongyu Lin, Wei Liu, Chen Chen et al.
Efficient Reinforcement Learning with Large Language Model Priors
Xue Yan, Yan Song, Xidong Feng et al.
2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification
Jingwei Zhang, Anh Tien Nguyen, Xi Han et al.
Self-Challenging Language Model Agents
Yifei Zhou, Sergey Levine, Jason Weston et al.
Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion
Massimiliano Viola, Kevin Qu, Nando Metzger et al.
Controlling Large Language Models Through Concept Activation Vectors
Hanyu Zhang, Xiting Wang, Chengao Li et al.
ToolDial: Multi-turn Dialogue Generation Method for Tool-Augmented Language Models
Jeonghoon Shim, Gyuhyeon Seo, Cheongsu Lim et al.
Online Preference Alignment for Language Models via Count-based Exploration
Chenjia Bai, Yang Zhang, Shuang Qiu et al.
Enhancing Chain of Thought Prompting in Large Language Models via Reasoning Patterns
Yufeng Zhang, Xuepeng Wang, Lingxiang Wu et al.
PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology
Fatemeh Ghezloo, Saygin Seyfioglu, Rustin Soraki et al.
CogVLA: Cognition-Aligned Vision-Language-Action Models via Instruction-Driven Routing & Sparsification
Wei Li, Renshan Zhang, Rui Shao et al.
StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching
Jixun Yao, Yang Yuguang, Yu Pan et al.
BSAFusion: A Bidirectional Stepwise Feature Alignment Network for Unaligned Medical Image Fusion
Huafeng Li, Dayong Su, Qing Cai et al.
UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions
Xue Zhucun, Jiangning Zhang, Teng Hu et al.
DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory
Yutong Wang, Jiali Zeng, Xuebo Liu et al.
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
Zigeng Chen, Xinyin Ma, Gongfan Fang et al.
Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs
Zhaowei Zhang, Fengshuo Bai, Qizhi Chen et al.
Nemotron-CLIMB: Clustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
Shizhe Diao, Yu Yang, Yonggan Fu et al.
Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models
Uladzislau Sobal, Wancong Zhang, Kyunghyun Cho et al.
Framer: Interactive Frame Interpolation
Wen Wang, Qiuyu Wang, Kecheng Zheng et al.
Pruning Large Language Models with Semi-Structural Adaptive Sparse Training
Weiyu Huang, Yuezhou Hu, Guohao Jian et al.
Taming Teacher Forcing for Masked Autoregressive Video Generation
Deyu Zhou, Quan Sun, Yuang Peng et al.
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
Xudong Lu, Yinghao Chen, Chencheng Chen et al.
V2X-Radar: A Multi-modal Dataset with 4D Radar for Cooperative Perception
Lei Yang, Xinyu Zhang, Jun Li et al.
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?
Letitia Parcalabescu, Anette Frank
Any-Resolution AI-Generated Image Detection by Spectral Learning
Dimitrios Karageorgiou, Symeon Papadopoulos, Ioannis Kompatsiaris et al.
LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction
Er Jin, Qihui Feng, Yongli Mou et al.
SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis
Hyojun Go, Byeongjun Park, Jiho Jang et al.
Learning to Reason for Long-Form Story Generation
Alexander Gurung, Mirella Lapata
Towards a Unified Copernicus Foundation Model for Earth Vision
Yi Wang, Zhitong Xiong, Chenying Liu et al.
Emergence of meta-stable clustering in mean-field transformer models
Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi
E(n) Equivariant Topological Neural Networks
Claudio Battiloro, Ege Karaismailoglu, Mauricio Tec et al.
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation
Sitong Gong, Yunzhi Zhuge, Lu Zhang et al.
MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls
Yuxuan Bian, Ailing Zeng, Xuan Ju et al.
Reading Your Heart: Learning ECG Words and Sentences via Pre-training ECG Language Model
Jiarui Jin, Haoyu Wang, Hongyan Li et al.
Improved Video VAE for Latent Video Diffusion Model
Pingyu Wu, Kai Zhu, Yu Liu et al.
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
Junmo Kang, Leonid Karlinsky, Hongyin Luo et al.
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
Yang Liu, Ming Ma, Xiaomin Yu et al.
Mitigating Object Hallucination in MLLMs via Data-augmented Phrase-level Alignment
Pritam Sarkar, Sayna Ebrahimi, Ali Etemad et al.
Feat2GS: Probing Visual Foundation Models with Gaussian Splatting
Yue Chen, Xingyu Chen, Anpei Chen et al.
CAPTURE: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting
Atin Pothiraj, Jaemin Cho, Elias Stengel-Eskin et al.
Design Principles and Challenges for Gaze + Pinch Interaction in XR
Ken Pfeuffer, Hans Gellersen, Mar Gonzalez-Franco
Perturbation-Restrained Sequential Model Editing
Jun-Yu Ma, Hong Wang, Hao-Xiang Xu et al.
Design Principle Transfer in Neural Architecture Search via Large Language Models
Xun Zhou, Xingyu Wu, Liang Feng et al.
LaVin-DiT: Large Vision Diffusion Transformer
Zhaoqing Wang, Xiaobo Xia, Runnan Chen et al.
Mechanism Design for LLM Fine-tuning with Multiple Reward Models
Haoran Sun, Yurong Chen, Siwei Wang et al.
Zero-shot forecasting of chaotic systems
Yuanzhao Zhang, William Gilpin
SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Model
Yue Zhang, Zhiyang Xu, Ying Shen et al.
GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors
Tian-Xing Xu, Xiangjun Gao, Wenbo Hu et al.
Efficiently Scaling LLM Reasoning Programs with Certaindex
Yichao Fu, Junda Chen, Siqi Zhu et al.
Universal Length Generalization with Turing Programs
Kaiying Hou, David Brandfonbrener, Sham Kakade et al.
FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction
Siyu Jiao, Gengwei Zhang, Yinlong Qian et al.
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Yicheng Xiao, Lin Song, Yukang Chen et al.
Investigating Non-Transitivity in LLM-as-a-Judge
Yi Xu, Laura Ruis, Tim Rocktäschel et al.
Does Thinking More Always Help? Mirage of Test-Time Scaling in Reasoning Models
Soumya Suvra Ghosal, Souradip Chakraborty, Avinash Reddy et al.
SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data
Wenkai Fang, Shunyu Liu, Yang Zhou et al.
COAT: Compressing Optimizer states and Activations for Memory-Efficient FP8 Training
Haocheng Xi, Han Cai, Ligeng Zhu et al.
PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages
Priyanshu Kumar, Devansh Jain, Akhila Yerukola et al.
Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards
Zijing Hu, Fengda Zhang, Long Chen et al.
CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs
Jinlan Fu, Shenzhen Huangfu, Hao Fei et al.
Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics
Siddhant Arora, Zhiyun Lu, Chung-Cheng Chiu et al.
CRANE: Reasoning with constrained LLM generation
Debangshu Banerjee, Tarun Suresh, Shubham Ugare et al.
GameArena: Evaluating LLM Reasoning through Live Computer Games
Lanxiang Hu, Qiyu Li, Anze Xie et al.
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
Ziang Yan, Zhilin Li, Yinan He et al.
KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models
Eunice Yiu, Maan Qraitem, Anisa Majhi et al.
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
Pengfei Zhou, Xiaopeng Peng, Jiajun Song et al.
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models
Haiwen Diao, Xiaotong Li, Yufeng Cui et al.
Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking
You Wu, Xucheng Wang, Xiangyang Yang et al.
KGARevion: An AI Agent for Knowledge-Intensive Biomedical QA
Xiaorui Su, Yibo Wang, Shanghua Gao et al.
A Rainbow in Deep Network Black Boxes
Florentin Guth, Brice Ménard, Gaspar Rochette et al.
Influence-Guided Diffusion for Dataset Distillation
Mingyang Chen, Jiawei Du, Bo Huang et al.
InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation
Yuchi Wang, Junliang Guo, Jianhong Bai et al.
Pre-training Auto-regressive Robotic Models with 4D Representations
Dantong Niu, Yuvan Sharma, Haoru Xue et al.
Adaptive Message Passing: A General Framework to Mitigate Oversmoothing, Oversquashing, and Underreaching
Federico Errica, Henrik Christiansen, Viktor Zaverkin et al.
Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly
Yexin Liu, Zhengyang Liang, Yueze Wang et al.
Spectral Motion Alignment for Video Motion Transfer Using Diffusion Models
Geon Yeong Park, Hyeonho Jeong, Sang Wan Lee et al.
Self-Evolved Reward Learning for LLMs
Chenghua Huang, Zhizhen Fan, Lu Wang et al.
From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data
Zheyang Xiong, Vasilis Papageorgiou, Kangwook Lee et al.
Is Noise Conditioning Necessary for Denoising Generative Models?
Qiao Sun, Zhicheng Jiang, Hanhong Zhao et al.
Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning
Gang Liu, Michael Sun, Wojciech Matusik et al.
Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs
Qizhe Zhang, Mengzhen Liu, Lichen Li et al.
Look Inside for More: Internal Spatial Modality Perception for 3D Anomaly Detection
Hanzhe Liang, Guoyang Xie, Chengbin Hou et al.
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D
Wei Cheng, Juncheng Mu, Xianfang Zeng et al.
Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection
Kaiqing Lin, Yuzhen Lin, Weixiang Li et al.
Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains
Wenhui Tan, Jiaze Li, Jianzhong Ju et al.
Segmenting Maxillofacial Structures in CBCT Volumes
Federico Bolelli, Kevin Marchesini, Niels van Nistelrooij et al.
Reducing Tool Hallucination via Reliability Alignment
Hongshen Xu, Zichen Zhu, Lei Pan et al.
Language Models Need Inductive Biases to Count Inductively
Yingshan Chang, Yonatan Bisk
Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets
Zhen Liu, Tim Xiao, Weiyang Liu et al.
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
Ziyu Liu, Yuhang Zang, Xiaoyi Dong et al.
PersonalVideo: High ID-Fidelity Video Customization without Dynamic and Semantic Degradation
Hengjia Li, Haonan Qiu, Shiwei Zhang et al.
EmoEdit: Evoking Emotions through Image Manipulation
Jingyuan Yang, Jiawei Feng, Weibin Luo et al.
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
Xinyu Yang, Yuwei An, Hongyi Liu et al.
Rethinking Transformer-Based Blind-Spot Network for Self-Supervised Image Denoising
Junyi Li, Zhilu Zhang, Wangmeng Zuo
Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation
Zilyu Ye, Zhiyang Chen, Tiancheng Li et al.
Progress or Regress? Self-Improvement Reversal in Post-training
Ting Wu, Xuefeng Li, Pengfei Liu
Task-driven Image Fusion with Learnable Fusion Loss
Haowen Bai, Jiangshe Zhang, Zixiang Zhao et al.