Most Cited 2025 "multi-prompt generation" Papers
22,274 papers found • Page 8 of 112
Conference
Matrix3D: Large Photogrammetry Model All-in-One
Yuanxun Lu, Jingyang Zhang, Tian Fang et al.
VMem: Consistent Interactive Video Scene Generation with Surfel-Indexed View Memory
Runjia Li, Philip Torr, Andrea Vedaldi et al.
Training on the Benchmark Is Not All You Need
Shiwen Ni, Xiangtao Kong, Chengming Li et al.
PosterMaker: Towards High-Quality Product Poster Generation with Accurate Text Rendering
Yifan Gao, Zihang Lin, Chuanbin Liu et al.
Reinforced Lifelong Editing for Language Models
Zherui Li, Houcheng Jiang, Hao Chen et al.
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning
Yiwu Zhong, Zhuoming Liu, Yin Li et al.
Densely Connected Parameter-Efficient Tuning for Referring Image Segmentation
Jiaqi Huang, Zunnan Xu, Ting Liu et al.
Improving Semantic Understanding in Speech Language Models via Brain-tuning
Omer Moussa, Dietrich Klakow, Mariya Toneva
Open-YOLO 3D: Towards Fast and Accurate Open-Vocabulary 3D Instance Segmentation
Mohamed el amine Boudjoghra, Angela Dai, Jean Lahoud et al.
Is In-Context Learning Sufficient for Instruction Following in LLMs?
Hao Zhao, Maksym Andriushchenko, francesco croce et al.
Prometheus: 3D-Aware Latent Diffusion Models for Feed-Forward Text-to-3D Scene Generation
Yuanbo Yang, Jiahao Shao, Xinyang Li et al.
SpikeLLM: Scaling up Spiking Neural Network to Large Language Models via Saliency-based Spiking
Xingrun Xing, Boyan Gao, Zheng Liu et al.
Towards a General Time Series Anomaly Detector with Adaptive Bottlenecks and Dual Adversarial Decoders
Qichao Shentu, Beibu Li, Kai Zhao et al.
A Transfer Attack to Image Watermarks
Yuepeng Hu, Zhengyuan Jiang, Moyang Guo et al.
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
Gleb Rodionov, Roman Garipov, Alina Shutova et al.
HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery
Jingtao Li, Yingyi Liu, XINYU WANG et al.
Inference-Time Scaling for Flow Models via Stochastic Generation and Rollover Budget Forcing
Jaihoon Kim, Taehoon Yoon, Jisung Hwang et al.
Hierarchical World Models as Visual Whole-Body Humanoid Controllers
Nick Hansen, Jyothir S V, Vlad Sobal et al.
Mind2Web 2: Evaluating Agentic Search with Agent-as-a-Judge
Boyu Gou, Zanming Huang, Yuting Ning et al.
Multi-Sensor Object Anomaly Detection: Unifying Appearance, Geometry, and Internal Properties
wenqiao Li, BoZhong Zheng, Xiaohao Xu et al.
How Do Large Language Models Understand Graph Patterns? A Benchmark for Graph Pattern Comprehension
Xinnan Dai, Haohao QU, Yifei Shen et al.
MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models
Mohammad Shahab Sepehri, Zalan Fabian, Maryam Soltanolkotabi et al.
{$\tau$}-bench: A Benchmark for \underline{T}ool-\underline{A}gent-\underline{U}ser Interaction in Real-World Domains
Shunyu Yao, Noah Shinn, Pedram Razavi et al.
Parallel Scaling Law for Language Models
Mouxiang Chen, Binyuan Hui, Zeyu Cui et al.
Establishing Task Scaling Laws via Compute-Efficient Model Ladders
Akshita Bhagia, Jiacheng Liu, Alexander Wettig et al.
Video Depth without Video Models
Bingxin Ke, Dominik Narnhofer, Shengyu Huang et al.
Rope to Nope and Back Again: A New Hybrid Attention Strategy
Bowen Yang, Bharat Venkitesh, Dwaraknath Gnaneshwar Talupuru et al.
WonderTurbo: Generating Interactive 3D World in 0.72 Seconds
Chaojun Ni, Xiaofeng Wang, Zheng Zhu et al.
Basis Sharing: Cross-Layer Parameter Sharing for Large Language Model Compression
Jingcun Wang, Yu-Guang Chen, Ing-Chao Lin et al.
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
Songhua Liu, Zhenxiong Tan, Xinchao Wang
Trusted Unified Feature-Neighborhood Dynamics for Multi-View Classification
Haojian Huang, Chuanyu Qin, Zhe Liu et al.
SynCity: Training-Free Generation of 3D Cities
Paul Engstler, Aleksandar Shtedritski, Iro Laina et al.
Streaming DiLoCo with overlapping communication
Arthur Douillard, Yani Donchev, J Keith Rush et al.
SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device
Yushu Wu, Zhixing Zhang, Yanyu Li et al.
FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities
Jin Wang, Yao Lai, Aoxue Li et al.
SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video
Jongmin Park, Minh-Quan Viet Bui, Juan Luis Gonzalez Bello et al.
TopV: Compatible Token Pruning with Inference Time Optimization for Fast and Low-Memory Multimodal Vision Language Model
Cheng Yang, Yang Sui, Jinqi Xiao et al.
Instant Gaussian Stream: Fast and Generalizable Streaming of Dynamic Scene Reconstruction via Gaussian Splatting
Jinbo Yan, Rui Peng, Zhiyan Wang et al.
V2XPnP: Vehicle-to-Everything Spatio-Temporal Fusion for Multi-Agent Perception and Prediction
Zewei Zhou, Hao Xiang, Zhaoliang Zheng et al.
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
Yifan Pu, Yiming Zhao, Zhicong Tang et al.
DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness
Yiming Zhong, Qi Jiang, Jingyi Yu et al.
OccMamba: Semantic Occupancy Prediction with State Space Models
Heng Li, Yuenan Hou, Xiaohan Xing et al.
Text2PDE: Latent Diffusion Models for Accessible Physics Simulation
Anthony Zhou, Zijie Li, Michael Schneier et al.
Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models
Andreas Müller, Denis Lukovnikov, Jonas Thietke et al.
Enhanced Contrastive Learning with Multi-view Longitudinal Data for Chest X-ray Report Generation
Kang Liu, Zhuoqi Ma, Xiaolu Kang et al.
AdaDiff: Adaptive Step Selection for Fast Diffusion Models
Hui Zhang, Zuxuan Wu, Zhen Xing et al.
OmniSAM: Omnidirectional Segment Anything Model for UDA in Panoramic Semantic Segmentation
Ding Zhong, Xu Zheng, Chenfei Liao et al.
Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples
chengqian gao, Haonan Li, Liu Liu et al.
Flow Matching with Gaussian Process Priors for Probabilistic Time Series Forecasting
Marcel Kollovieh, Marten Lienen, David Lüdke et al.
Intervening Anchor Token: Decoding Strategy in Alleviating Hallucinations for MLLMs
Barrett Tang, Zile Huang, Chengzhi Liu et al.
Modeling Complex System Dynamics with Flow Matching Across Time and Conditions
Martin Rohbeck, Edward De Brouwer, Charlotte Bunne et al.
Visual-O1: Understanding Ambiguous Instructions via Multi-modal Multi-turn Chain-of-thoughts Reasoning
Minheng Ni, YuTao Fan, Lei Zhang et al.
CubeDiff: Repurposing Diffusion-Based Image Models for Panorama Generation
Nikolai Kalischek, Michael Oechsle, Fabian Manhardt et al.
Generative Video Propagation
Shaoteng Liu, Tianyu Wang, Jui-Hsien Wang et al.
Generative Image Layer Decomposition with Visual Effects
Jinrui Yang, Qing Liu, Yijun Li et al.
D^3: Scaling Up Deepfake Detection by Learning from Discrepancy
Yongqi Yang, Zhihao Qian, Ye Zhu et al.
BioCLIP 2: Emergent Properties from Scaling Hierarchical Contrastive Learning
Jianyang Gu, Sam Stevens, Elizabeth Campolongo et al.
Reflective Gaussian Splatting
Yuxuan Yao, Zixuan Zeng, Chun Gu et al.
Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks
Michael Matthews, Michael Beukman, Chris Lu et al.
P(all-atom) Is Unlocking New Path For Protein Design
Wei Qu, Jiawei Guan, Rui Ma et al.
Occlusion-Embedded Hybrid Transformer for Light Field Super-Resolution
Zeyu Xiao, Zhuoyuan Li, Wei Jia
STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning
Marius Memmel, Jacob Berg, Bingqing Chen et al.
ReinboT: Amplifying Robot Visual-Language Manipulation with Reinforcement Learning
Hongyin Zhang, Zifeng Zhuang, Han Zhao et al.
LSNet: See Large, Focus Small
Ao Wang, Hui Chen, Zijia Lin et al.
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
Weixian Lei, Jiacong Wang, Haochen Wang et al.
Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search
Yuta Oshima, Masahiro Suzuki, Yutaka Matsuo et al.
First-Person Fairness in Chatbots
Tyna Eloundou, Alex Beutel, David Robinson et al.
Is Your Multimodal Language Model Oversensitive to Safe Queries?
Xirui Li, Hengguang Zhou, Ruochen Wang et al.
Adv-CPG: A Customized Portrait Generation Framework with Facial Adversarial Attacks
Junying Wang, Hongyuan Zhang, Yuan Yuan
MV-VTON: Multi-View Virtual Try-On with Diffusion Models
Haoyu Wang, Zhilu Zhang, Donglin Di et al.
2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification
Jingwei Zhang, Anh Tien Nguyen, Xi Han et al.
Self-Challenging Language Model Agents
Yifei Zhou, Sergey Levine, Jason Weston et al.
Controlling Large Language Models Through Concept Activation Vectors
Hanyu Zhang, Xiting Wang, Chengao Li et al.
STIV: Scalable Text and Image Conditioned Video Generation
Zongyu Lin, Wei Liu, Chen Chen et al.
Efficient Reinforcement Learning with Large Language Model Priors
Xue Yan, Yan Song, Xidong Feng et al.
Enhancing Chain of Thought Prompting in Large Language Models via Reasoning Patterns
Yufeng Zhang, Xuepeng Wang, Lingxiang Wu et al.
ToolDial: Multi-turn Dialogue Generation Method for Tool-Augmented Language Models
Jeonghoon Shim, Gyuhyeon Seo, Cheongsu Lim et al.
Online Preference Alignment for Language Models via Count-based Exploration
Chenjia Bai, Yang Zhang, Shuang Qiu et al.
Marigold-DC: Zero-Shot Monocular Depth Completion with Guided Diffusion
Massimiliano Viola, Kevin Qu, Nando Metzger et al.
PathFinder: A Multi-Modal Multi-Agent System for Medical Diagnostic Decision-Making Applied to Histopathology
Fatemeh Ghezloo, Saygin Seyfioglu, Rustin Soraki et al.
Temporal Reasoning Transfer from Text to Video
Lei Li, Yuanxin Liu, Linli Yao et al.
BSAFusion: A Bidirectional Stepwise Feature Alignment Network for Unaligned Medical Image Fusion
Huafeng Li, Dayong Su, Qing Cai et al.
StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching
Jixun Yao, Yang Yuguang, Yu Pan et al.
UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions
Xue zhucun, Jiangning Zhang, Teng Hu et al.
CogVLA: Cognition-Aligned Vision-Language-Action Models via Instruction-Driven Routing & Sparsification
Wei Li, Renshan Zhang, Rui Shao et al.
LogicAD: Explainable Anomaly Detection via VLM-based Text Feature Extraction
Er Jin, Qihui Feng, Yongli Mou et al.
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient
Zigeng Chen, Xinyin Ma, Gongfan Fang et al.
Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs
Zhaowei Zhang, Fengshuo Bai, Qizhi Chen et al.
DelTA: An Online Document-Level Translation Agent Based on Multi-Level Memory
Yutong Wang, Jiali Zeng, Xuebo Liu et al.
Nemotron-CLIMB: Clustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
Shizhe Diao, Yu Yang, Yonggan Fu et al.
Pruning Large Language Models with Semi-Structural Adaptive Sparse Training
Weiyu Huang, Yuezhou Hu, Guohao Jian et al.
Taming Teacher Forcing for Masked Autoregressive Video Generation
Deyu Zhou, Quan Sun, Yuang Peng et al.
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices
Xudong LU, Yinghao Chen, chencheng Chen et al.
Learning from Reward-Free Offline Data: A Case for Planning with Latent Dynamics Models
Uladzislau Sobal, Wancong Zhang, Kyunghyun Cho et al.
Do Vision & Language Decoders use Images and Text equally? How Self-consistent are their Explanations?
Letitia Parcalabescu, Anette Frank
V2X-Radar: A Multi-modal Dataset with 4D Radar for Cooperative Perception
Lei Yang, Xinyu Zhang, Jun Li et al.
Any-Resolution AI-Generated Image Detection by Spectral Learning
Dimitrios Karageorgiou, Symeon Papadopoulos, Ioannis Kompatsiaris et al.
Framer: Interactive Frame Interpolation
Wen Wang, Qiuyu Wang, Kecheng Zheng et al.
SplatFlow: Multi-View Rectified Flow Model for 3D Gaussian Splatting Synthesis
Hyojun Go, byeongjun park, Jiho Jang et al.
Emergence of meta-stable clustering in mean-field transformer models
Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi
E(n) Equivariant Topological Neural Networks
Claudio Battiloro, Ege Karaismailoglu, Mauricio Tec et al.
Towards a Unified Copernicus Foundation Model for Earth Vision
Yi Wang, Zhitong Xiong, Chenying Liu et al.
MotionCraft: Crafting Whole-Body Motion with Plug-and-Play Multimodal Controls
Yuxuan Bian, Ailing Zeng, Xuan Ju et al.
Learning to Reason for Long-Form Story Generation
Alexander Gurung, Mirella Lapata
The Devil is in Temporal Token: High Quality Video Reasoning Segmentation
Sitong Gong, Yunzhi Zhuge, Lu Zhang et al.
Improved Video VAE for Latent Video Diffusion Model
Pingyu Wu, Kai Zhu, Yu Liu et al.
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
Yang Liu, Ming Ma, Xiaomin Yu et al.
Design Principles and Challenges for Gaze + Pinch Interaction in XR
Ken Pfeuffer, Hans Gellersen, Mar Gonzalez-Franco
Feat2GS: Probing Visual Foundation Models with Gaussian Splatting
Yue Chen, Xingyu Chen, Anpei Chen et al.
CAPTURE: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting
Atin Pothiraj, Jaemin Cho, Elias Stengel-Eskin et al.
Perturbation-Restrained Sequential Model Editing
Jun-Yu Ma, Hong Wang, Hao-Xiang Xu et al.
Mitigating Object Hallucination in MLLMs via Data-augmented Phrase-level Alignment
Pritam Sarkar, Sayna Ebrahimi, Ali Etemad et al.
LaVin-DiT: Large Vision Diffusion Transformer
Zhaoqing Wang, Xiaobo Xia, Runnan Chen et al.
Reading Your Heart: Learning ECG Words and Sentences via Pre-training ECG Language Model
Jiarui Jin, Haoyu Wang, Hongyan Li et al.
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts
Junmo Kang, Leonid Karlinsky, Hongyin Luo et al.
SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Model
Yue Zhang, Zhiyang Xu, Ying Shen et al.
Mechanism Design for LLM Fine-tuning with Multiple Reward Models
Haoran Sun, Yurong Chen, Siwei Wang et al.
Does Thinking More Always Help? Mirage of Test-Time Scaling in Reasoning Models
Soumya Suvra Ghosal, Souradip Chakraborty, Avinash Reddy et al.
GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors
Tian-Xing Xu, Xiangjun Gao, Wenbo Hu et al.
Efficiently Scaling LLM Reasoning Programs with Certaindex
Yichao Fu, Junda Chen, Siqi Zhu et al.
Universal Length Generalization with Turing Programs
Kaiying Hou, David Brandfonbrener, Sham Kakade et al.
FlexVAR: Flexible Visual Autoregressive Modeling without Residual Prediction
Siyu Jiao, Gengwei Zhang, Yinlong Qian et al.
Zero-shot forecasting of chaotic systems
Yuanzhao Zhang, William Gilpin
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Yicheng Xiao, Lin Song, Yukang Chen et al.
SeRL: Self-play Reinforcement Learning for Large Language Models with Limited Data
Wenkai Fang, Shunyu Liu, Yang Zhou et al.
Investigating Non-Transitivity in LLM-as-a-Judge
Yi Xu, Laura Ruis, Tim Rocktäschel et al.
COAT: Compressing Optimizer states and Activations for Memory-Efficient FP8 Training
Haocheng Xi, Han Cai, Ligeng Zhu et al.
Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards
Zijing Hu, Fengda Zhang, Long Chen et al.
Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics
Siddhant Arora, Zhiyun Lu, Chung-Cheng Chiu et al.
PolyGuard: A Multilingual Safety Moderation Tool for 17 Languages
Priyanshu Kumar, Devansh Jain, Akhila Yerukola et al.
CRANE: Reasoning with constrained LLM generation
Debangshu Banerjee, Tarun Suresh, Shubham Ugare et al.
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
ziang yan, Zhilin Li, Yinan He et al.
GameArena: Evaluating LLM Reasoning through Live Computer Games
Lanxiang Hu, Qiyu Li, Anze Xie et al.
KiVA: Kid-inspired Visual Analogies for Testing Large Multimodal Models
Eunice Yiu, Maan Qraitem, Anisa Majhi et al.
OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
Pengfei Zhou, Xiaopeng Peng, Jiajun Song et al.
KGARevion: An AI Agent for Knowledge-Intensive Biomedical QA
Xiaorui Su, Yibo Wang, Shanghua Gao et al.
Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking
You Wu, Xucheng Wang, Xiangyang Yang et al.
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models
Haiwen Diao, Xiaotong Li, Yufeng Cui et al.
Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning
Gang Liu, Michael Sun, Wojciech Matusik et al.
Influence-Guided Diffusion for Dataset Distillation
Mingyang Chen, Jiawei Du, Bo Huang et al.
A Rainbow in Deep Network Black Boxes
Florentin Guth, Brice Ménard, Gaspar Rochette et al.
InstructAvatar: Text-Guided Emotion and Motion Control for Avatar Generation
Yuchi Wang, Junliang Guo, Jianhong Bai et al.
Pre-training Auto-regressive Robotic Models with 4D Representations
Dantong Niu, Yuvan Sharma, Haoru Xue et al.
Is Noise Conditioning Necessary for Denoising Generative Models?
Qiao Sun, Zhicheng Jiang, Hanhong Zhao et al.
Unveiling the Ignorance of MLLMs: Seeing Clearly, Answering Incorrectly
Yexin Liu, Zhengyang Liang, Yueze Wang et al.
Adaptive Message Passing: A General Framework to Mitigate Oversmoothing, Oversquashing, and Underreaching
Federico Errica, Henrik Christiansen, Viktor Zaverkin et al.
From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data
Zheyang Xiong, Vasilis Papageorgiou, Kangwook Lee et al.
Beyond Attention or Similarity: Maximizing Conditional Diversity for Token Pruning in MLLMs
Qizhe Zhang, Mengzhen Liu, Lichen Li et al.
CHiP: Cross-modal Hierarchical Direct Preference Optimization for Multimodal LLMs
Jinlan Fu, Shenzhen Huangfu, Hao Fei et al.
Look Inside for More: Internal Spatial Modality Perception for 3D Anomaly Detection
Hanzhe Liang, Guoyang Xie, Chengbin Hou et al.
MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D
Wei Cheng, Juncheng Mu, Xianfang Zeng et al.
Language Models Need Inductive Biases to Count Inductively
Yingshan Chang, Yonatan Bisk
Standing on the Shoulders of Giants: Reprogramming Visual-Language Model for General Deepfake Detection
Kaiqing Lin, Yuzhen Lin, Weixiang Li et al.
Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains
Wenhui Tan, Jiaze Li, Jianzhong Ju et al.
Segmenting Maxillofacial Structures in CBCT Volumes
Federico Bolelli, Kevin Marchesini, Niels van Nistelrooij et al.
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
Xinyu Yang, Yuwei An, Hongyi Liu et al.
Rethinking Transformer-Based Blind-Spot Network for Self-Supervised Image Denoising
Junyi Li, Zhilu Zhang, Wangmeng Zuo
PersonalVideo: High ID-Fidelity Video Customization without Dynamic and Semantic Degradation
Hengjia Li, Haonan Qiu, Shiwei Zhang et al.
EmoEdit: Evoking Emotions through Image Manipulation
Jingyuan Yang, Jiawei Feng, Weibin Luo et al.
Reducing Tool Hallucination via Reliability Alignment
Hongshen Xu, Zichen Zhu, Lei Pan et al.
Schedule On the Fly: Diffusion Time Prediction for Faster and Better Image Generation
Zilyu Ye, Zhiyang Chen, Tiancheng Li et al.
Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets
Zhen Liu, Tim Xiao, Weiyang Liu et al.
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
Ziyu Liu, Yuhang Zang, Xiaoyi Dong et al.
Benchmarking Agentic Workflow Generation
Shuofei Qiao, Runnan Fang, Zhisong Qiu et al.
Task-driven Image Fusion with Learnable Fusion Loss
Haowen Bai, Jiangshe Zhang, Zixiang Zhao et al.
TASTE-Rob: Advancing Video Generation of Task-Oriented Hand-Object Interaction for Generalizable Robotic Manipulation
Hongxiang Zhao, Xingchen Liu, Mutian Xu et al.
On the Closed-Form of Flow Matching: Generalization Does Not Arise from Target Stochasticity
Quentin Bertrand, Anne Gagneux, Mathurin Massias et al.
Textured Gaussians for Enhanced 3D Scene Appearance Modeling
Brian Chao, Hung-Yu Tseng, Lorenzo Porzi et al.
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?
Michael-Andrei Panaitescu-Liess, Zora Che, Bang An et al.
Gaze-LLE: Gaze Target Estimation via Large-Scale Learned Encoders
Fiona Ryan, Ajay Bati, Sangmin Lee et al.
QMambaBSR: Burst Image Super-Resolution with Query State Space Model
Xin Di, Long Peng, Peizhe Xia et al.
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought
Yunze Man, De-An Huang, Guilin Liu et al.
Automated Proof Generation for Rust Code via Self-Evolution
Tianyu Chen, Shuai Lu, Shan Lu et al.
Robotouille: An Asynchronous Planning Benchmark for LLM Agents
Gonzalo Gonzalez-Pumariega, Leong Yean, Neha Sunkara et al.
Progress or Regress? Self-Improvement Reversal in Post-training
Ting Wu, Xuefeng Li, Pengfei Liu
REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites
Div Garg, Diego Caples, Andis Draguns et al.
DriveGPT4-V2: Harnessing Large Language Model Capabilities for Enhanced Closed-Loop Autonomous Driving
Zhenhua Xu, Yan Bai, Yujia Zhang et al.
CXPMRG-Bench: Pre-training and Benchmarking for X-ray Medical Report Generation on CheXpert Plus Dataset
Xiao Wang, Fuling Wang, Yuehang Li et al.
A Unified Approach to Routing and Cascading for LLMs
Jasper Dekoninck, Maximilian Baader, Martin Vechev
MagicMotion: Controllable Video Generation with Dense-to-Sparse Trajectory Guidance
Quanhao Li, Zhen Xing, Rui Wang et al.
Cut Your Losses in Large-Vocabulary Language Models
Erik Wijmans, Brody Huval, Alexander Hertzberg et al.
TwinMarket: A Scalable Behavioral and Social Simulation for Financial Markets
Yuzhe YANG, Yifei Zhang, Minghao Wu et al.
DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models
Zhendong Wang, Jianmin Bao, Shuyang Gu et al.
CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG
Boyi Deng, Wenjie Wang, Fengbin Zhu et al.
Distraction is All You Need for Multimodal Large Language Model Jailbreaking
Zuopeng Yang, Jiluan Fan, Anli Yan et al.
NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments
Xuan Yao, Junyu Gao, Changsheng Xu
DeepRTL: Bridging Verilog Understanding and Generation with a Unified Representation Model
Yi Liu, Changran Xu, Yunhao Zhou et al.
Effective Interplay between Sparsity and Quantization: From Theory to Practice
Simla Harma, Ayan Chakraborty, Elizaveta Kostenok et al.
UniK3D: Universal Camera Monocular 3D Estimation
Luigi Piccinelli, Christos Sakaridis, Mattia Segu et al.
Learning Long Range Dependencies on Graphs via Random Walks
Dexiong Chen, Till Schulz, Karsten Borgwardt
Spectral Motion Alignment for Video Motion Transfer Using Diffusion Models
Geon Yeong Park, Hyeonho Jeong, Sang Wan Lee et al.
SELF-EVOLVED REWARD LEARNING FOR LLMS
Chenghua Huang, Zhizhen Fan, Lu Wang et al.
TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba
Xiaowen Ma, Zhen-Liang Ni, Xinghao Chen
CAD-Recode: Reverse Engineering CAD Code from Point Clouds
Danila Rukhovich, Elona Dupont, Dimitrios Mallis et al.
Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding
Weiyu Guo, Ziyang Chen, Shaoguang WANG et al.
You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs
Yihong Luo, Xiaolong Chen, Xinghua Qu et al.
SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models
Shuaijie Shen, Chao Wang, Renzhuo Huang et al.
Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing
Kaifeng Gao, Jiaxin Shi, Hanwang Zhang et al.
TexGaussian: Generating High-quality PBR Material via Octree-based 3D Gaussian Splatting
Bojun Xiong, Jialun Liu, JiaKui Hu et al.
Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory
Nikola Zubic, Federico Soldà, Aurelio Sulser et al.