Most Cited 2025 "neural-symbolic implementations" Papers
22,274 papers found • Page 6 of 112
Conference
Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement
Xueyao Zhang, Xiaohui Zhang, Kainan Peng et al.
FastVLM: Efficient Vision Encoding for Vision Language Models
Pavan Kumar Anasosalu Vasu, Fartash Faghri, Chun-Liang Li et al.
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities
Zheyuan Zhang, Fengyuan Hu, Jayjun Lee et al.
3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation
Xiao Fu, Xian Liu, Xintao WANG et al.
Towards Long-Horizon Vision-Language Navigation: Platform, Benchmark and Method
Xinshuai Song, weixing chen, Yang Liu et al.
Scaling Speech-Text Pre-training with Synthetic Interleaved Data
Aohan Zeng, Zhengxiao Du, Mingdao Liu et al.
PaPaGei: Open Foundation Models for Optical Physiological Signals
Arvind Pillai, Dimitris Spathis, Fahim Kawsar et al.
Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance
Yaxi Lu, Shenzhi Yang, Cheng Qian et al.
Diffusion Self-Distillation for Zero-Shot Customized Image Generation
Shengqu Cai, Eric Ryan Chan, Yunzhi Zhang et al.
Sparse Autoencoders Do Not Find Canonical Units of Analysis
Patrick Leask, Bart Bussmann, Michael Pearce et al.
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
Hang Yin, Xiuwei Xu, Linqing Zhao et al.
Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning
Zhiyuan Yan, Yandan Zhao, Shen Chen et al.
Towards LLM Unlearning Resilient to Relearning Attacks: A Sharpness-Aware Minimization Perspective and Beyond
Chongyu Fan, jinghan jia, Yihua Zhang et al.
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Heming Xia, Yongqi Li, Jun Zhang et al.
SUTrack: Towards Simple and Unified Single Object Tracking
Xin Chen, Ben Kang, Wanting Geng et al.
PartField: Learning 3D Feature Fields for Part Segmentation and Beyond
Minghua Liu, Mikaela Uy, Donglai Xiang et al.
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy
Zhi Hou, Tianyi Zhang, Yuwen Xiong et al.
HD-EPIC: A Highly-Detailed Egocentric Video Dataset
Toby Perrett, Ahmad Darkhalil, Saptarshi Sinha et al.
SANA: Efficient High-Resolution Text-to-Image Synthesis with Linear Diffusion Transformers
Enze Xie, Junsong Chen, Junyu Chen et al.
Combining Induction and Transduction for Abstract Reasoning
Wen-Ding Li, Keya Hu, Carter Larsen et al.
EG4D: Explicit Generation of 4D Object without Score Distillation
Qi Sun, Zhiyang Guo, Ziyu Wan et al.
Agentic RL Scaling Law: Spontaneous Code Execution for Mathematical Problem Solving
Xinji Mai, Haotian Xu, Xing W et al.
Trajectory attention for fine-grained video motion control
Zeqi Xiao, Wenqi Ouyang, Yifan Zhou et al.
Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation
Shuo Yang, Haocheng Xi, Yilong Zhao et al.
Feynman-Kac Correctors in Diffusion: Annealing, Guidance, and Product of Experts
Marta Skreta, Tara Akhound-Sadegh, Viktor Ohanesian et al.
MoH: Multi-Head Attention as Mixture-of-Head Attention
Peng Jin, Bo Zhu, Li Yuan et al.
On Evaluating the Durability of Safeguards for Open-Weight LLMs
Xiangyu Qi, Boyi Wei, Nicholas Carlini et al.
Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance
Li Hu, wang yuan, Zhen Shen et al.
Position: Graph Learning Will Lose Relevance Due To Poor Benchmarks
Maya Bechler-Speicher, Ben Finkelshtein, Fabrizio Frasca et al.
CVE-Bench: A Benchmark for AI Agents’ Ability to Exploit Real-World Web Application Vulnerabilities
Yuxuan Zhu, Antony Kellermann, Dylan Bowman et al.
Rethinking Diffusion for Text-Driven Human Motion Generation: Redundant Representations, Evaluation, and Masked Autoregression
Zichong Meng, Yiming Xie, Xiaogang Peng et al.
Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation
Clément Chadebec, Onur Tasar, Eyal Benaroche et al.
A-Bench: Are LMMs Masters at Evaluating AI-generated Images?
Zicheng Zhang, Haoning Wu, Chunyi Li et al.
From Flatland to Space: Teaching Vision-Language Models to Perceive and Reason in 3D
Jiahui Zhang, Yurui Chen, Yueming Xu et al.
The Diffusion Duality
Subham Sekhar Sahoo, Justin Deschenaux, Aaron Gokaslan et al.
Multi-subject Open-set Personalization in Video Generation
Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace et al.
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents
Kexun Zhang, Weiran Yao, Zuxin Liu et al.
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search
Maohao Shen, Guangtao Zeng, Zhenting Qi et al.
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective
Jorge (Zhoujun) Cheng, Shibo Hao, Tianyang Liu et al.
DEFOM-Stereo: Depth Foundation Model Based Stereo Matching
Hualie Jiang, Zhiqiang Lou, Laiyan Ding et al.
DartControl: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control
Kaifeng Zhao, Gen Li, Siyu Tang
PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting
Alex Hanson, Allen Tu, Vasu Singla et al.
Unlearning or Obfuscating? Jogging the Memory of Unlearned LLMs via Benign Relearning
Shengyuan Hu, Yiwei Fu, Steven Wu et al.
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models
Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal et al.
Holmes-VAU: Towards Long-term Video Anomaly Understanding at Any Granularity
Huaxin Zhang, Xiaohao Xu, Xiang Wang et al.
3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer
Jiajun Deng, Tianyu He, Li Jiang et al.
Think Only When You Need with Large Hybrid-Reasoning Models
Lingjie Jiang, Xun Wu, Shaohan Huang et al.
How Feature Learning Can Improve Neural Scaling Laws
Blake Bordelon, Alexander Atanasov, Cengiz Pehlevan
GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction
Yuanhui Huang, Amonnut Thammatadatrakoon, Wenzhao Zheng et al.
Video-Guided Foley Sound Generation with Multimodal Controls
Ziyang Chen, Prem Seetharaman, Bryan Russell et al.
Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction
Zeren Jiang, Chuanxia Zheng, Iro Laina et al.
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
Junbo Niu, Yifei Li, Ziyang Miao et al.
ReMA: Learning to Meta-Think for LLMs with Multi-agent Reinforcement Learning
Ziyu Wan, Yunxiang Li, Xiaoyu Wen et al.
DrVideo: Document Retrieval Based Long Video Understanding
Ziyu Ma, Chenhui Gou, Hengcan Shi et al.
FlatQuant: Flatness Matters for LLM Quantization
Yuxuan Sun, Ruikang Liu, Haoli Bai et al.
Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
Wenbin An, Feng Tian, Sicong Leng et al.
Think while You Generate: Discrete Diffusion with Planned Denoising
Sulin Liu, Juno Nam, Andrew Campbell et al.
Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions
Bhuvanashree Murugadoss, Christian Poelitz, Ian Drosos et al.
DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention
Lianghui Zhu, Zilong Huang, Bencheng Liao et al.
Synthetic continued pretraining
Zitong Yang, Neil Band, Shuangping Li et al.
What is Wrong with Perplexity for Long-context Language Modeling?
Lizhe Fang, Yifei Wang, Zhaoyang Liu et al.
Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation
Yiming Wang, Pei Zhang, Baosong Yang et al.
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation
Siyuan Huang, Liliang Chen, Pengfei Zhou et al.
Watermark Anything With Localized Messages
Tom Sander, Pierre Fernandez, Alain Oliviero Durmus et al.
On Scaling Up 3D Gaussian Splatting Training
Hexu Zhao, Haoyang Weng, Daohan Lu et al.
SafeArena: Evaluating the Safety of Autonomous Web Agents
Ada Tur, Nicholas Meade, Xing Han Lù et al.
Multi-Objective Evolution of Heuristic Using Large Language Model
Shunyu Yao, Fei Liu, Xi Lin et al.
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction
Huang Huang, Fangchen Liu, Letian Fu et al.
CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility
Bojia Zi, Shihao Zhao, Xianbiao Qi et al.
Number it: Temporal Grounding Videos like Flipping Manga
Yongliang Wu, Xinting Hu, Yuyang Sun et al.
Attention with Markov: A Curious Case of Single-layer Transformers
Ashok Makkuva, Marco Bondaschi, Adway Girish et al.
Improving Pretraining Data Using Perplexity Correlations
Tristan Thrush, Christopher Potts, Tatsunori Hashimoto
Uni-Sign: Toward Unified Sign Language Understanding at Scale
Zecheng Li, Wengang Zhou, Weichao Zhao et al.
RATT: A Thought Structure for Coherent and Correct LLM Reasoning
Jinghan Zhang, Xiting Wang, Weijieying Ren et al.
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge
Angelika Romanou, Negar Foroutan, Anna Sotnikova et al.
Strong Model Collapse
Elvis Dohmatob, Yunzhen Feng, Arjun Subramonian et al.
Learning When to Think: Shaping Adaptive Reasoning in R1-Style Models via Multi-Stage RL
Songjun Tu, Jiahao Lin, Qichao Zhang et al.
SliM-LLM: Salience-Driven Mixed-Precision Quantization for Large Language Models
Wei Huang, Haotong Qin, Yangdong Liu et al.
Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting
Yu Liu, Baoxiong Jia, Ruijie Lu et al.
Improving the Diffusability of Autoencoders
Ivan Skorokhodov, Sharath Girish, Benran Hu et al.
Theory, Analysis, and Best Practices for Sigmoid Self-Attention
Jason Ramapuram, Federico Danieli, Eeshan Gunesh Dhekane et al.
Training-Free Activation Sparsity in Large Language Models
James Liu, Pragaash Ponnusamy, Tianle Cai et al.
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
Keda Tao, Can Qin, Haoxuan You et al.
Human-Object Interaction from Human-Level Instructions
Zhen Wu, Jiaman Li, Pei Xu et al.
MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration
Boyun Li, Haiyu Zhao, Wenxin Wang et al.
LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers
Xuan Shen, Zhao Song, Yufa Zhou et al.
Align Your Flow: Scaling Continuous-Time Flow Map Distillation
Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis
Restructuring Vector Quantization with the Rotation Trick
Christopher Fifty, Ronald Junkins, Dennis Duan et al.
Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting
Yuqi Li, Chuanguang Yang, Hansheng Zeng et al.
No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models
Seyedmorteza Sadat, Manuel Kansy, Otmar Hilliges et al.
WMAdapter: Adding WaterMark Control to Latent Diffusion Models
Hai Ci, Yiren Song, Pei Yang et al.
Persistent Pre-training Poisoning of LLMs
Yiming Zhang, Javier Rando, Ivan Evtimov et al.
Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives
Alex Hanson, Allen Tu, Geng Lin et al.
Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective
Sifan Wang, Ananyae bhartari, Bowen Li et al.
Vision-Language Models Do Not Understand Negation
Kumail Alhamoud, Shaden Alshammari, Yonglong Tian et al.
FG-CLIP: Fine-Grained Visual and Textual Alignment
Chunyu Xie, Bin Wang, Fanjing Kong et al.
UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens
Ruichuan An, Sihan Yang, Renrui Zhang et al.
DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning
Pengcheng Jiang, Jiacheng Lin, Lang Cao et al.
Sequential Controlled Langevin Diffusions
Junhua Chen, Lorenz Richter, Julius Berner et al.
Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key
Zhihe Yang, Xufang Luo, Dongqi Han et al.
STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?
Yun Li, Yiming Zhang, Tao Lin et al.
Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle
Zhenyu Tang, Junwu Zhang, Xinhua Cheng et al.
Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time
Van Yang, Xiang Yue, Vipin Chaudhary et al.
GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting
Wanshui Gan, Fang Liu, Hongbin Xu et al.
Variational Best-of-N Alignment
Afra Amini, Tim Vieira, Elliott Ash et al.
LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation
Fangxun Shu, Yue Liao, Lei Zhang et al.
SCBench: A KV Cache-Centric Analysis of Long-Context Methods
Yucheng Li, Huiqiang Jiang, Qianhui Wu et al.
Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage
Zhi Gao, Bofei Zhang, Pengxiang Li et al.
HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages
Zhilin Wang, Jiaqi Zeng, Olivier Delalleau et al.
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
Jianyi Wang, Zhijie Lin, Meng Wei et al.
Generalizing Verifiable Instruction Following
Valentina Pyatkin, Saumya Malik, Victoria Graf et al.
ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models
Xubing Ye, Yukang Gan, Yixiao Ge et al.
GFlow: Recovering 4D World from Monocular Video
Shizun Wang, Xingyi Yang, Qiuhong Shen et al.
SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving
Georg Hess, Carl Lindström, Maryam Fatemi et al.
Graph-constrained Reasoning: Faithful Reasoning on Knowledge Graphs with Large Language Models
Linhao Luo, Zicheng Zhao, Reza Haffari et al.
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
Chenrui Fan, Ming Li, Lichao Sun et al.
Dynamic Diffusion Transformer
Wangbo Zhao, Yizeng Han, Jiasheng Tang et al.
SAeUron: Interpretable Concept Unlearning in Diffusion Models with Sparse Autoencoders
Bartosz Cywiński, Kamil Deja
TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models
Ziyao Shangguan, Chuhan Li, Yuxuan Ding et al.
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Size Wu, Wenwei Zhang, Lumin Xu et al.
Informed Correctors for Discrete Diffusion Models
Yixiu Zhao, Jiaxin Shi, Feng Chen et al.
Mixture of Attention Spans: Optimizing LLM Inference Efficiency with Heterogeneous Sliding-Window Lengths
Tianyu Fu, Haofeng Huang, Xuefei Ning et al.
Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community
Jiancheng Pan, Yanxing Liu, Yuqian Fu et al.
Compositional Entailment Learning for Hyperbolic Vision-Language Models
Avik Pal, Max van Spengler, Guido D'Amely di Melendugno et al.
FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs
Zhiting Fan, Ruizhe Chen, Tianxiang Hu et al.
Preserving Diversity in Supervised Fine-Tuning of Large Language Models
Ziniu Li, Congliang Chen, Tian Xu et al.
MALT: Improving Reasoning with Multi-Agent LLM Training
Sumeet Ramesh Motwani, Chandler Smith, Rocktim Jyoti Das et al.
Flow Matching with General Discrete Paths: A Kinetic-Optimal Perspective
Neta Shaul, Itai Gat, Marton Havasi et al.
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities
Peng Xu, Wei Ping, Xianchao Wu et al.
SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning
Rui Pan, Yinwei Dai, Zhihao Zhang et al.
Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in Multimodal Large Language Models
Xin Zou, Yizhou WANG, Yibo Yan et al.
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
Assaf Ben-Kish, Itamar Zimerman, Shady Abu-Hussein et al.
From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers
Jiacheng Liu, Chang Zou, Yuanhuiyi Lyu et al.
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator
Chaehun Shin, Jooyoung Choi, Heeseung Kim et al.
Deconstructing What Makes a Good Optimizer for Autoregressive Language Models
Rosie Zhao, Depen Morwani, David Brandfonbrener et al.
OpenCUA: Open Foundations for Computer-Use Agents
Xinyuan Wang, Bowen Wang, Dunjie Lu et al.
Which Attention Heads Matter for In-Context Learning?
Kayo Yin, Jacob Steinhardt
Transformers Provably Solve Parity Efficiently with Chain of Thought
Juno Kim, Taiji Suzuki
ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning
Xiao Yu, Baolin Peng, Vineeth Vajipey et al.
ControlAR: Controllable Image Generation with Autoregressive Models
Zongming Li, Tianheng Cheng, Shoufa Chen et al.
Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention
Shuang Wu, Youtian Lin, Feihu Zhang et al.
Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions
Michael Zhang, W. Bradley Knox, Eunsol Choi
Efficient Evolutionary Search Over Chemical Space with Large Language Models
Haorui Wang, Marta Skreta, Cher-Tian Ser et al.
COMBO: Compositional World Models for Embodied Multi-Agent Cooperation
Hongxin Zhang, Zeyuan Wang, Qiushi Lyu et al.
WorldModelBench: Judging Video Generation Models As World Models
Dacheng Li, Yunhao Fang, Yukang Chen et al.
FaceXFormer: A Unified Transformer for Facial Analysis
Kartik Narayan, Vibashan VS, Rama Chellappa et al.
Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy
Tong Wu, Shujian Zhang, Kaiqiang Song et al.
Towards General Visual-Linguistic Face Forgery Detection
Ke Sun, Shen Chen, Taiping Yao et al.
SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation
Mingjie Li, Wai Man Si, Michael Backes et al.
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-Time Alignment
Yuancheng Xu, Udari Sehwag, Alec Koppel et al.
LaMP: Language-Motion Pretraining for Motion Generation, Retrieval, and Captioning
Zhe Li, Weihao Yuan, Yisheng He et al.
ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning
Ruchika Chavhan, Da Li, Timothy Hospedales
Collapse or Thrive: Perils and Promises of Synthetic Data in a Self-Generating World
Joshua Kazdan, Rylan Schaeffer, Apratim Dey et al.
EarthDial: Turning Multi-sensory Earth Observations to Interactive Dialogues
Sagar Soni, Akshay Dudhane, Hiyam Debary et al.
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models
Zemin Huang, Zhiyang Chen, Zijun Wang et al.
Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning
Yong Liu, Zirui Zhu, Chaoyu Gong et al.
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
seil kang, Jinyeong Kim, Junhyeok Kim et al.
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs
Erik Daxberger, Nina Wenzel, David Griffiths et al.
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Xuehai He, Weixi Feng, Kaizhi Zheng et al.
Rethinking LLM Unlearning Objectives: A Gradient Perspective and Go Beyond
Qizhou Wang, Jin Zhou, (Andrew) Zhanke Zhou et al.
xPatch: Dual-Stream Time Series Forecasting with Exponential Seasonal-Trend Decomposition
Artyom Stitsyuk, Jaesik Choi
VideoRoPE: What Makes for Good Video Rotary Position Embedding?
Xilin Wei, Xiaoran Liu, Yuhang Zang et al.
AgentStudio: A Toolkit for Building General Virtual Agents
Longtao Zheng, Zhiyuan Huang, Zhenghai Xue et al.
CreatiLayout: Siamese Multimodal Diffusion Transformer for Creative Layout-to-Image Generation
Hui Zhang, Dexiang Hong, Yitong Wang et al.
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation
Zekun Qi, Wenyao Zhang, Yufei Ding et al.
Attentive Eraser: Unleashing Diffusion Model’s Object Removal Potential via Self-Attention Redirection Guidance
Wenhao Sun, Xue-Mei Dong, Benlei Cui et al.
Process vs. Outcome Reward: Which is Better for Agentic RAG Reinforcement Learning
Wenlin Zhang, Xiangyang Li, Kuicai Dong et al.
Time-VLM: Exploring Multimodal Vision-Language Models for Augmented Time Series Forecasting
Siru Zhong, Weilin Ruan, Ming Jin et al.
VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge
Vishwesh Nath, Wenqi Li, Dong Yang et al.
IDOL: Instant Photorealistic 3D Human Creation from a Single Image
Yiyu Zhuang, Jiaxi Lv, Hao Wen et al.
Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs
Mantas Mazeika, Xuwang Yin, Rishub Tamirisa et al.
ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning
Kailin Li, Puhao Li, Tengyu Liu et al.
PAD: Personalized Alignment of LLMs at Decoding-time
Ruizhe Chen, Xiaotian Zhang, Meng Luo et al.
ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions
Jeonghwan Kim, Jisoo Kim, Jeonghyeon Na et al.
Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion
Chaodong Xiao, Minghan Li, zhengqiang ZHANG et al.
Read, Watch and Scream! Sound Generation from Text and Video
Yujin Jeong, Yunji Kim, Sanghyuk Chun et al.
StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models
Yunzhi Yan, Zhen Xu, Haotong Lin et al.
SparseFlex: High-Resolution and Arbitrary-Topology 3D Shape Modeling
Xianglong He, Zi-Xin Zou, Chia Hao Chen et al.
Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
Yongxin Guo, Zhenglin Cheng, Xiaoying Tang et al.
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
Rui Qian, Shuangrui Ding, Xiaoyi Dong et al.
Causal Prompting: Debiasing Large Language Model Prompting Based on Front-Door Adjustment
Congzhi Zhang, Linhai Zhang, Jialong Wu et al.
CogCoM: A Visual Language Model with Chain-of-Manipulations Reasoning
Ji Qi, Ming Ding, Weihan Wang et al.
Transolver++: An Accurate Neural Solver for PDEs on Million-Scale Geometries
HUAKUN LUO, Haixu Wu, Hang Zhou et al.
Diffusion Bridge Implicit Models
Kaiwen Zheng, Guande He, Jianfei Chen et al.
Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws
Yiding Jiang, Allan Zhou, Zhili Feng et al.
MuMA-ToM: Multi-modal Multi-Agent Theory of Mind
Haojun Shi, Suyu Ye, Xinyu Fang et al.
Selective Forgetting: Advancing Machine Unlearning Techniques and Evaluation in Language Models
Lingzhi Wang, Xingshan Zeng, Jinsong Guo et al.
FreeVS: Generative View Synthesis on Free Driving Trajectory
Qitai Wang, Lue Fan, Yuqi Wang et al.
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
Hao Chen, Ze Wang, Xiang Li et al.
AutoEval Done Right: Using Synthetic Data for Model Evaluation
Pierre Boyeau, Anastasios Angelopoulos, Tianle Li et al.
Do LLM Agents Have Regret? A Case Study in Online Learning and Games
Chanwoo Park, Xiangyu Liu, Asuman Ozdaglar et al.
Learning to Route LLMs with Confidence Tokens
Yu-Neng Chuang, Prathusha Sarma, Parikshit Gopalan et al.
Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale
Bowen Jiang, Zhuoqun Hao, Young Min Cho et al.
AeroGen: Enhancing Remote Sensing Object Detection with Diffusion-Driven Data Generation
Datao Tang, Xiangyong Cao, Xuan Wu et al.
TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning
Andreas Auer, Patrick Podest, Daniel Klotz et al.
MAGIC: Generating Self-Correction Guideline for In-Context Text-to-SQL
Arian Askari, Christian Poelitz, Xinye Tang
SINGAPO: Single Image Controlled Generation of Articulated Parts in Objects
Jiayi Liu, Denys Iliash, Angel Chang et al.
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts
Peng Jin, Bo Zhu, Yuan Li et al.
Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering
Ziyu Zhao, tao shen, Didi Zhu et al.
LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models
Shenghao Fu, Qize Yang, Qijie Mo et al.