Most Cited 2025 "modular reasoning pipeline" Papers
22,274 papers found • Page 13 of 112
Conference
Rope to Nope and Back Again: A New Hybrid Attention Strategy
Bowen Yang, Bharat Venkitesh, Dwaraknath Gnaneshwar Talupuru et al.
Do as We Do, Not as You Think: the Conformity of Large Language Models
Zhiyuan Weng, Guikun Chen, Wenguan Wang
Revelio: Interpreting and leveraging semantic information in diffusion models
Dahye Kim, Xavier Thomas, Deepti Ghadiyaram
Breaking the Low-Rank Dilemma of Linear Attention
Qihang Fan, Huaibo Huang, Ran He
Occlusion-Embedded Hybrid Transformer for Light Field Super-Resolution
Zeyu Xiao, Zhuoyuan Li, Wei Jia
Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards
Zijing Hu, Fengda Zhang, Long Chen et al.
HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos
Jinglei Zhang, Jiankang Deng, Chao Ma et al.
Forking Paths in Neural Text Generation
Eric Bigelow, Ari Holtzman, Hidenori Tanaka et al.
Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs
Barrett Tang, Zile Huang, Chengzhi Liu et al.
Taming Teacher Forcing for Masked Autoregressive Video Generation
Deyu Zhou, Quan Sun, Yuang Peng et al.
Nova: Generative Language Models for Assembly Code with Hierarchical Attention and Contrastive Learning
Nan Jiang, Chengxiao Wang, Kevin Liu et al.
Text2PDE: Latent Diffusion Models for Accessible Physics Simulation
Anthony Zhou, Zijie Li, Michael Schneier et al.
Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models
Mateusz Pach, Shyamgopal Karthik, Quentin Bouniot et al.
Inference-Time Alignment of Diffusion Models with Direct Noise Optimization
Zhiwei Tang, Jiangweizhi Peng, Jiasheng Tang et al.
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?
Letitia Parcalabescu, Anette Frank
KAN-AD: Time Series Anomaly Detection with Kolmogorov–Arnold Networks
Quan Zhou, Changhua Pei, Fei Sun et al.
Fantastic Targets for Concept Erasure in Diffusion Models and Where To Find Them
Anh Bui, Thuy-Trang Vu, Long Vuong et al.
Sort-free Gaussian Splatting via Weighted Sum Rendering
Qiqi Hou, Randall Rauwendaal, Zifeng Li et al.
OmniGuard: Hybrid Manipulation Localization via Augmented Versatile Deep Image Watermarking
Xuanyu Zhang, Zecheng Tang, Zhipei Xu et al.
Learning Continually by Spectral Regularization
Alex Lewandowski, Michał Bortkiewicz, Saurabh Kumar et al.
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
ziang yan, Zhilin Li, Yinan He et al.
SynCity: Training-Free Generation of 3D Cities
Paul Engstler, Aleksandar Shtedritski, Iro Laina et al.
WonderTurbo: Generating Interactive 3D World in 0.72 Seconds
Chaojun Ni, Xiaofeng Wang, Zheng Zhu et al.
InsViE-1M: Effective Instruction-based Video Editing with Elaborate Dataset Construction
Yuhui WU, Liyi Chen, Ruibin Li et al.
You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs
Yihong Luo, Xiaolong Chen, Xinghua Qu et al.
UniK3D: Universal Camera Monocular 3D Estimation
Luigi Piccinelli, Christos Sakaridis, Mattia Segu et al.
SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints
Miruna Cretu, Charles Harris, Ilia Igashov et al.
InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation
Yuchi Wang, Junliang Guo, Jianhong Bai et al.
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Yicheng Xiao, Lin Song, Yukang Chen et al.
From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications
Ajay Jaiswal, Yifan Wang, Lu Yin et al.
{$\tau$}-bench: A Benchmark for \underline{T}ool-\underline{A}gent-\underline{U}ser Interaction in Real-World Domains
Shunyu Yao, Noah Shinn, Pedram Razavi et al.
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
Songhua Liu, Zhenxiong Tan, Xinchao Wang
Rethinking Light Decoder-based Solvers for Vehicle Routing Problems
Ziwei Huang, Jianan Zhou, Zhiguang Cao et al.
ShiftwiseConv: Small Convolutional Kernel with Large Kernel Effect
Dachong Li, li li, zhuangzhuang chen et al.
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
Wanhua Li, Renping Zhou, Jiawei Zhou et al.
FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction
Jiale Xu, Shenghua Gao, Ying Shan
Spectral Motion Alignment for Video Motion Transfer Using Diffusion Models
Geon Yeong Park, Hyeonho Jeong, Sang Wan Lee et al.
Capturing the Temporal Dependence of Training Data Influence
Jiachen (Tianhao) Wang, Dawn Song, James Y Zou et al.
Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation
Nicolas Dufour, Vicky Kalogeiton, David Picard et al.
CAD-Recode: Reverse Engineering CAD Code from Point Clouds
Danila Rukhovich, Elona Dupont, Dimitrios Mallis et al.
MoonCast: High-Quality Zero-Shot Podcast Generation
Zeqian Ju, Dongchao Yang, Shen Kai et al.
Generative Omnimatte: Learning to Decompose Video into Layers
Yao-Chih Lee, Erika Lu, Sarah Rumbley et al.
TabDPT: Scaling Tabular Foundation Models on Real Data
Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh et al.
Commit0: Library Generation from Scratch
Wenting Zhao, Nan Jiang, Celine Lee et al.
SELF-EVOLVED REWARD LEARNING FOR LLMS
Chenghua Huang, Zhizhen Fan, Lu Wang et al.
DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture
Qianlong Xiang, Miao Zhang, Yuzhang Shang et al.
Block Verification Accelerates Speculative Decoding
Ziteng Sun, Uri Mendlovic, Yaniv Leviathan et al.
Multi-Domain Graph Foundation Models: Robust Knowledge Transfer via Topology Alignment
Shuo Wang, Bokui Wang, Zhixiang Shen et al.
TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting
Bojun Xiong, Jialun Liu, JiaKui Hu et al.
Universal Length Generalization with Turing Programs
Kaiying Hou, David Brandfonbrener, Sham Kakade et al.
TAPNext: Tracking Any Point (TAP) as Next Token Prediction
Artem Zholus, Carl Doersch, Yi Yang et al.
Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models
Hulingxiao He, Geng Li, Zijun Geng et al.
Look Inside for More: Internal Spatial Modality Perception for 3D Anomaly Detection
Hanzhe Liang, Guoyang Xie, Chengbin Hou et al.
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
Yue Liu, Shengfang Zhai, Mingzhe Du et al.
Progress or Regress? Self-Improvement Reversal in Post-training
Ting Wu, Xuefeng Li, Pengfei Liu
Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection
Kaiqing Lin, Yuzhen Lin, Weixiang Li et al.
Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence
Shangbin Feng, Zifeng Wang, Yike Wang et al.
From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data
Zheyang Xiong, Vasilis Papageorgiou, Kangwook Lee et al.
Segmenting Maxillofacial Structures in CBCT Volumes
Federico Bolelli, Kevin Marchesini, Niels van Nistelrooij et al.
AGLLDiff: Guiding Diffusion Models Towards Unsupervised Training-free Real-world Low-light Image Enhancement
Yunlong Lin, Tian Ye, Sixiang Chen et al.
Feat2GS: Probing Visual Foundation Models with Gaussian Splatting
Yue Chen, Xingyu Chen, Anpei Chen et al.
Score Forgetting Distillation: A Swift, Data-Free Method for Machine Unlearning in Diffusion Models
Tianqi Chen, Shujian Zhang, Mingyuan Zhou
Streamlining Redundant Layers to Compress Large Language Models
Xiaodong Chen, Yuxuan Hu, Jing Zhang et al.
RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization
Hanyang Zhao, Genta Winata, Anirban Das et al.
Effective Interplay between Sparsity and Quantization: From Theory to Practice
Simla Harma, Ayan Chakraborty, Elizaveta Kostenok et al.
PersonalVideo: High ID-Fidelity Video Customization without Dynamic and Semantic Degradation
Hengjia Li, Haonan Qiu, Shiwei Zhang et al.
Temporal Query Network for Efficient Multivariate Time Series Forecasting
Shengsheng Lin, Haojun Chen, Haijie Wu et al.
3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting
Qihang Zhang, Yinghao Xu, Chaoyang Wang et al.
The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training
Fabian Schaipp, Alexander Hägele, Adrien Taylor et al.
Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Models
Simon Schrodi, David T. Hoffmann, Max Argus et al.
QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead
Amir Zandieh, Majid Daliri, Insu Han
Rethinking Transformer-Based Blind-Spot Network for Self-Supervised Image Denoising
Junyi Li, Zhilu Zhang, Wangmeng Zuo
Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations
Li Hao, He CAO, Bin Feng et al.
Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video Object Segmentation
Shaofei Huang, Rui Ling, Hongyu Li et al.
The Crucial Role of Samplers in Online Direct Preference Optimization
Ruizhe Shi, Runlong Zhou, Simon Du
BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics
Lukas Rauch, Raphael Schwinger, Moritz Wirth et al.
KBQA-o1: Agentic Knowledge Base Question Answering with Monte Carlo Tree Search
Haoran Luo, Haihong E, Yikai Guo et al.
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models
Haiwen Diao, Xiaotong Li, Yufeng Cui et al.
NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation
Zhiyuan Liu, Yanchen Luo, Han Huang et al.
ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations
Tianming Liang, Kun-Yu Lin, Chaolei Tan et al.
Accelerating neural network training: An analysis of the AlgoPerf competition
Priya Kasimbeg, Frank Schneider, Runa Eschenhagen et al.
Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning
Junming Liu, Siyuan Meng, Yanting Gao et al.
Mellow: a small audio language model for reasoning
Soham Deshmukh, Satvik Dixit, Rita Singh et al.
CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG
Boyi Deng, Wenjie Wang, Fengbin Zhu et al.
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation
Yining Hong, Beide Liu, Maxine Wu et al.
Adversaries Can Misuse Combinations of Safe Models
Erik Jones, Anca Dragan, Jacob Steinhardt
Learning to Reason for Long-Form Story Generation
Alexander Gurung, Mirella Lapata
ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models
Chao Zeng, Songwei Liu, Yusheng Xie et al.
MaskGaussian: Adaptive 3D Gaussian Representation from Probabilistic Masks
Yifei Liu, Zhihang Zhong, Yifan Zhan et al.
A Controlled Study on Long Context Extension and Generalization in LLMs
Yi Lu, Jing Nathan Yan, Songlin Yang et al.
Hydra-NeXt: Robust Closed-Loop Driving with Open-Loop Training
Zhenxin Li, Shihao Wang, Shiyi Lan et al.
Design Principle Transfer in Neural Architecture Search via Large Language Models
Xun Zhou, Xingyu Wu, Liang Feng et al.
Learn from Downstream and Be Yourself in Multimodal Large Language Models Fine-Tuning
Wenke Huang, Jian Liang, Zekun Shi et al.
COAT: Compressing Optimizer states and Activations for Memory-Efficient FP8 Training
Haocheng Xi, Han Cai, Ligeng Zhu et al.
Benchmarking Predictive Coding Networks -- Made Simple
Luca Pinchetti, Chang Qi, Oleh Lokshyn et al.
Bilinear MLPs enable weight-based mechanistic interpretability
Michael Pearce, Thomas Dooms, Alice Rigg et al.
AE-NeRF: Augmenting Event-Based Neural Radiance Fields for Non-ideal Conditions and Larger Scenes
Chaoran Feng, Wangbo Yu, Xinhua Cheng et al.
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
Kim Sung-Bin, Oh Hyun-Bin, Lee Jung-Mok et al.
Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding
Xiaoyi Zhang, Zhaoyang Jia, Zongyu Guo et al.
Neighboring Autoregressive Modeling for Efficient Visual Generation
Yefei He, Yuanyu He, Shaoxuan He et al.
Scaling Inference-Time Search with Vision Value Model for Improved Visual Comprehension
Xiyao Wang, Zhengyuan Yang, Linjie Li et al.
MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls
Yuxuan Bian, Ailing Zeng, Xuan Ju et al.
Factored-NeuS: Reconstructing Surfaces, Illumination, and Materials of Possibly Glossy Objects
Yue Fan, Ningjing Fan, Ivan Skorokhodov et al.
Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning
Feng Chen, Allan Raventós, Nan Cheng et al.
Optimizing Anytime Reasoning via Budget Relative Policy Optimization
Penghui Qi, Zichen Liu, Tianyu Pang et al.
RLVR-World: Training World Models with Reinforcement Learning
Jialong Wu, Shaofeng Yin, Ningya Feng et al.
Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks
Lehan Wang, Haonan Wang, Honglong Yang et al.
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
Junmo Kang, Leonid Karlinsky, Hongyin Luo et al.
Interpreting CLIP with Hierarchical Sparse Autoencoders
Vladimir Zaigrajew, Hubert Baniecki, Przemysław Biecek
Sparse Autoencoders for Hypothesis Generation
Rajiv Movva, Kenny Peng, Nikhil Garg et al.
Provable Benefit of Annealed Langevin Monte Carlo for Non-log-concave Sampling
Wei Guo, Molei Tao, Yongxin Chen
PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages
Priyanshu Kumar, Devansh Jain, Akhila Yerukola et al.
Automated Benchmark Generation for Repository-Level Coding Tasks
Konstantinos Vergopoulos, Mark Müller, Martin Vechev
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training
Jierun Chen, Dongting Hu, Xijie Huang et al.
GenFusion: Closing the Loop between Reconstruction and Generation via Videos
Sibo Wu, Congrong Xu, Binbin Huang et al.
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
Orr Zohar, Xiaohan Wang, Yonatan Bitton et al.
Softmax is not Enough (for Sharp Size Generalisation)
Petar Veličković, Christos Perivolaropoulos, Federico Barbero et al.
MatExpert: Decomposing Materials Discovery By Mimicking Human Experts
Qianggang Ding, Santiago Miret, Bang Liu
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
Haonan Qiu, Shiwei Zhang, Yujie Wei et al.
KGARevion: An AI Agent for Knowledge-Intensive Biomedical QA
Xiaorui Su, Yibo Wang, Shanghua Gao et al.
WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch
Zimu Lu, Yunqiao Yang, Houxing Ren et al.
Zero-shot forecasting of chaotic systems
Yuanzhao Zhang, William Gilpin
Sensitivity-Aware Amortized Bayesian Inference
Lasse Elsemüller, Hans Olischläger, Marvin Schmitt et al.
NerfBaselines: Consistent and Reproducible Evaluation of Novel View Synthesis Methods
Jonas Kulhanek, Torsten Sattler
Tool-Planner: Task Planning with Clusters across Multiple Tools
Yanming Liu, Xinyue Peng, Jiannan Cao et al.
Scalable Image Tokenization with Index Backpropagation Quantization
Fengyuan Shi, Zhuoyan Luo, Yixiao Ge et al.
MarS: a Financial Market Simulation Engine Powered by Generative Foundation Model
Junjie Li, Yang Liu, Weiqing Liu et al.
Reinforce LLM Reasoning through Multi-Agent Reflection
Yurun Yuan, Tengyang Xie
POI-Enhancer: An LLM-based Semantic Enhancement Framework for POI Representation Learning
Jiawei Cheng, Jingyuan Wang, Yichuan Zhang et al.
AlphaForge: A Framework to Mine and Dynamically Combine Formulaic Alpha Factors
Hao Shi, Weili Song, Xinting Zhang et al.
Efficiently Scaling LLM Reasoning Programs with Certaindex
Yichao Fu, Junda Chen, Siqi Zhu et al.
MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation
Donggon Jang, Yucheol Cho, Suin Lee et al.
Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Zhisheng Zhong, Chengyao Wang, Yuqi Liu et al.
TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba
Xiaowen Ma, Zhen-Liang Ni, Xinghao Chen
Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models
Yukang Yang, Declan Campbell, Kaixuan Huang et al.
DashGaussian: Optimizing 3D Gaussian Splatting in 200 Seconds
Youyu Chen, Junjun Jiang, Kui Jiang et al.
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
Haiyan Zhao, Heng Zhao, Bo Shen et al.
Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation
Zilyu Ye, Zhiyang Chen, Tiancheng Li et al.
CarPlanner: Consistent Auto-regressive Trajectory Planning for Large-Scale Reinforcement Learning in Autonomous Driving
Dongkun Zhang, Jiaming Liang, Ke Guo et al.
DynaPrompt: Dynamic Test-Time Prompt Tuning
Zehao Xiao, Shilin Yan, Jack Hong et al.
Learning to Discretize Denoising Diffusion ODEs
Vinh Tong, Trung-Dung Hoang, Anji Liu et al.
Subobject-level Image Tokenization
Delong Chen, Samuel Cahyawijaya, Jianfeng Liu et al.
NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments
Xuan Yao, Junyu Gao, Changsheng Xu
Pre-training Auto-regressive Robotic Models with 4D Representations
Dantong Niu, Yuvan Sharma, Haoru Xue et al.
Instant3dit: Multiview Inpainting for Fast Editing of 3D Objects
Amir Barda, Matheus Gadelha, Vladimir G. Kim et al.
Robotouille: An Asynchronous Planning Benchmark for LLM Agents
Gonzalo Gonzalez-Pumariega, Leong Yean, Neha Sunkara et al.
LumiNet: Latent Intrinsics Meets Diffusion Models for Indoor Scene Relighting
Xiaoyan Xing, Konrad Groh, Sezer Karaoglu et al.
Cross-View Completion Models are Zero-shot Correspondence Estimators
Honggyu An, Jin Hyeon Kim, Seonghoon Park et al.
Memory Layers at Scale
Vincent-Pierre Berges, Barlas Oğuz, Daniel HAZIZA et al.
Mitigating Overthinking in Large Reasoning Models via Manifold Steering
Yao Huang, Huanran Chen, Shouwei Ruan et al.
Two-stream Beats One-stream: Asymmetric Siamese Network for Efficient Visual Tracking
Jiawen Zhu, Huayi Tang, Xin Chen et al.
Design Principles and Challenges for Gaze + Pinch Interaction in XR
Ken Pfeuffer, Hans Gellersen, Mar Gonzalez-Franco
AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation
Yukang Cao, Liang Pan, Kai Han et al.
SAFE: Multitask Failure Detection for Vision-Language-Action Models
Qiao Gu, Yuanliang Ju, Shengxiang Sun et al.
Target Concrete Score Matching: A Holistic Framework for Discrete Diffusion
Ruixiang Zhang, Shuangfei Zhai, Yizhe Zhang et al.
Combating Multimodal LLM Hallucination via Bottom-Up Holistic Reasoning
Shengqiong Wu, Hao Fei, Liangming Pan et al.
Mechanism Design for LLM Fine-tuning with Multiple Reward Models
Haoran Sun, Yurong Chen, Siwei Wang et al.
CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset
Xiao Wang, Fuling Wang, Yuehang Li et al.
Model Equality Testing: Which Model is this API Serving?
Irena Gao, Percy Liang, Carlos Guestrin
Hash3D: Training-free Acceleration for 3D Generation
Xingyi Yang, Songhua Liu, Xinchao Wang
DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving
Zhenhua Xu, Yan Bai, Yujia Zhang et al.
Auditing $f$-differential privacy in one run
Saeed Mahloujifar, Luca Melis, Kamalika Chaudhuri
A Rainbow in Deep Network Black Boxes
Florentin Guth, Brice Ménard, Gaspar Rochette et al.
Adaptive Message Passing: A General Framework to Mitigate Oversmoothing, Oversquashing, and Underreaching
Federico Errica, Henrik Christiansen, Viktor Zaverkin et al.
Magnetic Preference Optimization: Achieving Last-iterate Convergence for Language Model Alignment
Mingzhi Wang, Chengdong Ma, Qizhi Chen et al.
SensorLM: Learning the Language of Wearable Sensors
Yuwei Zhang, Kumar Ayush, Siyuan Qiao et al.
Systematic Outliers in Large Language Models
Yongqi An, Xu Zhao, Tao Yu et al.
Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation
Jiaming Zhou, Teli Ma, Kun-Yu Lin et al.
Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning
Simran Kaur, Simon Park, Anirudh Goyal et al.
Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree?
xueru wen, Jie Lou, Yaojie Lu et al.
ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models
Hao Yin, Guangzong Si, Zilei Wang
Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly
Yexin Liu, Zhengyang Liang, Yueze Wang et al.
Flexible and Efficient Grammar-Constrained Decoding
Kanghee Park, Timothy Zhou, Loris D'Antoni
Emoji Attack: Enhancing Jailbreak Attacks Against Judge LLM Detection
Zhipeng Wei, Yuqi Liu, N. Benjamin Erichson
ITFormer: Bridging Time Series and Natural Language for Multi-Modal QA with Large-Scale Multitask Dataset
Yilin Wang, Peixuan Lei, Jie Song et al.
Influence-Guided Diffusion for Dataset Distillation
Mingyang Chen, Jiawei Du, Bo Huang et al.
Wavelet Diffusion Neural Operator
Peiyan Hu, Rui Wang, Xiang Zheng et al.
T2SG: Traffic Topology Scene Graph for Topology Reasoning in Autonomous Driving
Changsheng Lv, Mengshi Qi, Liang Liu et al.
Online Reward-Weighted Fine-Tuning of Flow Matching with Wasserstein Regularization
Jiajun Fan, Shuaike Shen, Chaoran Cheng et al.
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
Xinchen Zhang, Ling Yang, Guohao Li et al.
MMAD: A Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection
Xi Jiang, Jian Li, Hanqiu Deng et al.
Perm: A Parametric Representation for Multi-Style 3D Hair Modeling
Chengan He, Xin Sun, Zhixin Shu et al.
Score as Action: Fine Tuning Diffusion Generative Models by Continuous-time Reinforcement Learning
Hanyang Zhao, Haoxian Chen, Ji Zhang et al.
LangTime: A Language-Guided Unified Model for Time Series Forecasting with Proximal Policy Optimization
Wenzhe Niu, Zongxia Xie, Yanru Sun et al.
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
Hang Hua, Qing Liu, Lingzhi Zhang et al.
Aioli: A Unified Optimization Framework for Language Model Data Mixing
Mayee Chen, Michael Hu, Nicholas Lourie et al.
Unveiling Visual Perception in Language Models: An Attention Head Analysis Approach
Jing Bi, Lianggong Bruce Wen, Zhang Liu et al.
4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration
Jiahui Zhang, Yurui Chen, Yueming Xu et al.
Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation and Reconstruction
Yuanhao Cai, He Zhang, Kai Zhang et al.
RoboScape: Physics-informed Embodied World Model
Yu Shang, Xin Zhang, Yinzhou Tang et al.
S^3cMath: Spontaneous Step-Level Self-Correction Makes Large Language Models Better Mathematical Reasoners
Yuchen Yan, Jin Jiang, Yang Liu et al.
Highly Efficient Self-Adaptive Reward Shaping for Reinforcement Learning
Haozhe Ma, Zhengding Luo, Thanh Vinh Vo et al.
Decompositional Neural Scene Reconstruction with Generative Diffusion Prior
Junfeng Ni, Yu Liu, Ruijie Lu et al.
Digital Twin Catalog: A Large-Scale Photorealistic 3D Object Digital Twin Dataset
Zhao Dong, Ka chen, Zhaoyang Lv et al.
Is Artificial Intelligence Generated Image Detection a Solved Problem?
Ziqiang Li, Jiazhen Yan, Ziwen He et al.
Incentivizing Dual Process Thinking for Efficient Large Language Model Reasoning
Xiaoxue Cheng, Junyi Li, Zhenduo Zhang et al.
VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking
Runyi Hu, Jie Zhang, Yiming Li et al.
Exploring the Limits of Vision-Language-Action Manipulation in Cross-task Generalization
Jiaming Zhou, Ke Ye, Jiayi Liu et al.
Adversarial Reasoning at Jailbreaking Time
Mahdi Sabbaghi, Paul Kassianik, George Pappas et al.
Navigating Neural Space: Revisiting Concept Activation Vectors to Overcome Directional Divergence
Frederik Pahde, Maximilian Dreyer, Moritz Weckbecker et al.