Most Cited 2025 "temporal prior" Papers
22,274 papers found • Page 4 of 112
Conference
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL
Mohammadreza Pourreza, Shayan Talaei, Ruoxi Sun et al.
Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization
Audrey Huang, Wenhao Zhan, Tengyang Xie et al.
Towards Realistic Data Generation for Real-World Super-Resolution
Long Peng, Wenbo Li, Renjing Pei et al.
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate
Yubo Wang, Xiang Yue, Wenhu Chen
Frame-Voyager: Learning to Query Frames for Video Large Language Models
Sicheng Yu, CHENGKAI JIN, Huanyu Wang et al.
DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
Dahyun Kang, Piotr Bojanowski, Huy V. Vo et al.
CC-OCR: A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy
Zhibo Yang, Jun Tang, Zhaohai Li et al.
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning
Yibin Wang, li zhimin, Yuhang Zang et al.
ETA: Evaluating Then Aligning Safety of Vision Language Models at Inference Time
Yi Ding, Bolian Li, Ruqi Zhang
Agents' Room: Narrative Generation through Multi-step Collaboration
Fantine Huot, Reinald Kim Amplayo, Jennimaria Palomaki et al.
WritingBench: A Comprehensive Benchmark for Generative Writing
Yuning Wu, Jiahao Mei, Ming Yan et al.
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Michael Noukhovitch, Shengyi Huang, Sophie Xhonneux et al.
Learning to Prompt with Text Only Supervision for Vision-Language Models
Muhammad Uzair Khattak, Muhammad Ferjad Naeem, Muzammal Naseer et al.
Prototypical Calibrating Ambiguous Samples for Micro-Action Recognition
Kun Li, Dan Guo, Guoliang Chen et al.
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
Yuqian Yuan, Hang Zhang, Wentong Li et al.
CG-Bench: Clue-grounded Question Answering Benchmark for Long Video Understanding
Guo Chen, Yicheng Liu, Yifei Huang et al.
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
Yusu Qian, Hanrong Ye, Jean-Philippe Fauconnier et al.
On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
Marcus Williams, Micah Carroll, Adhyyan Narang et al.
On the expressiveness and spectral bias of KANs
Yixuan Wang, Jonathan Siegel, Ziming Liu et al.
Point-SAM: Promptable 3D Segmentation Model for Point Clouds
Yuchen Zhou, Jiayuan Gu, Tung Chiang et al.
3DTrajMaster: Mastering 3D Trajectory for Multi-Entity Motion in Video Generation
Xiao Fu, Xian Liu, Xintao WANG et al.
Scaling RL to Long Videos
Yukang Chen, Wei Huang, Baifeng Shi et al.
Video World Models with Long-term Spatial Memory
Tong Wu, Shuai Yang, Ryan Po et al.
A Distractor-Aware Memory for Visual Object Tracking with SAM2
Alan Lukezic, Jovana Videnović, Matej Kristan
HD-EPIC: A Highly-Detailed Egocentric Video Dataset
Toby Perrett, Ahmad Darkhalil, Saptarshi Sinha et al.
Don't Shake the Wheel: Momentum-Aware Planning in End-to-End Autonomous Driving
Ziying Song, Caiyan Jia, Lin Liu et al.
Theory on Mixture-of-Experts in Continual Learning
Hongbo Li, Sen Lin, Lingjie Duan et al.
AA-CLIP: Enhancing Zero-Shot Anomaly Detection via Anomaly-Aware CLIP
wenxin ma, Xu Zhang, Qingsong Yao et al.
Self-Evolving Multi-Agent Collaboration Networks for Software Development
Yue Hu, Yuzhu Cai, Yaxin Du et al.
SANA: Efficient High-Resolution Text-to-Image Synthesis with Linear Diffusion Transformers
Enze Xie, Junsong Chen, Junyu Chen et al.
Agentic RL Scaling Law: Spontaneous Code Execution for Mathematical Problem Solving
Xinji Mai, Haotian Xu, Xing W et al.
Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
Shangzhe Di, Zhelun Yu, Guanghao Zhang et al.
The Surprising Effectiveness of Test-Time Training for Few-Shot Learning
Ekin Akyürek, Mehul Damani, Adam Zweiger et al.
PhysTwin: Physics-Informed Reconstruction and Simulation of Deformable Objects from Videos
Hanxiao Jiang, Hao-Yu Hsu, Kaifeng Zhang et al.
Multi-subject Open-set Personalization in Video Generation
Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace et al.
Human-inspired Episodic Memory for Infinite Context LLMs
Zafeirios Fountas, Martin A Benfeghoul, Adnan Oomerjee et al.
DEFOM-Stereo: Depth Foundation Model Based Stereo Matching
Hualie Jiang, Zhiqiang Lou, Laiyan Ding et al.
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space
Zhen Zhang, Xuehai He, Weixiang Yan et al.
A-Bench: Are LMMs Masters at Evaluating AI-generated Images?
Zicheng Zhang, Haoning Wu, Chunyi Li et al.
Trajectory attention for fine-grained video motion control
Zeqi Xiao, Wenqi Ouyang, Yifan Zhou et al.
Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning
Zhiyuan Yan, Yandan Zhao, Shen Chen et al.
Looking Inward: Language Models Can Learn About Themselves by Introspection
Felix Jedidja Binder, James Chua, Tomek Korbak et al.
PartField: Learning 3D Feature Fields for Part Segmentation and Beyond
Minghua Liu, Mikaela Uy, Donglai Xiang et al.
TestGenEval: A Real World Unit Test Generation and Test Completion Benchmark
Kush Jain, Gabriel Synnaeve, Baptiste Roziere
On the Role of Attention Heads in Large Language Model Safety
Zhenhong Zhou, Haiyang Yu, Xinghua Zhang et al.
Scaling Speech-Text Pre-training with Synthetic Interleaved Data
Aohan Zeng, Zhengxiao Du, Mingdao Liu et al.
OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation
Hui Li, Mingwang Xu, Qingkun Su et al.
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference under Ambiguities
Zheyuan Zhang, Fengyuan Hu, Jayjun Lee et al.
Diffusion Feedback Helps CLIP See Better
Wenxuan Wang, Quan Sun, Fan Zhang et al.
3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer
Jiajun Deng, Tianyu He, Li Jiang et al.
Test-time Alignment of Diffusion Models without Reward Over-optimization
Sunwoo Kim, Minkyu Kim, Dongmin Park
An Architecture Search Framework for Inference-Time Techniques
Jon Saad-Falcon, Adrian Lafuente, Shlok Natarajan et al.
Scaling Language-Free Visual Representation Learning
David Fan, Shengbang Tong, Jiachen Zhu et al.
TabArena: A Living Benchmark for Machine Learning on Tabular Data
Nick Erickson, Lennart Purucker, Andrej Tschalzev et al.
Competition Dynamics Shape Algorithmic Phases of In-Context Learning
Core Francisco Park, Ekdeep Singh Lubana, Hidenori Tanaka
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities
Zhaofeng Wu, Xinyan Yu, Dani Yogatama et al.
Theory, Analysis, and Best Practices for Sigmoid Self-Attention
Jason Ramapuram, Federico Danieli, Eeshan Gunesh Dhekane et al.
SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation
Junsong Chen, Shuchen Xue, Yuyang Zhao et al.
DrVideo: Document Retrieval Based Long Video Understanding
Ziyu Ma, Chenhui Gou, Hengcan Shi et al.
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
Junbo Niu, Yifei Li, Ziyang Miao et al.
MR-GSM8K: A Meta-Reasoning Benchmark for Large Language Model Evaluation
Zhongshen Zeng, Pengguang Chen, Shu Liu et al.
HSEvo: Elevating Automatic Heuristic Design with Diversity-Driven Harmony Search and Genetic Algorithm Using LLMs
Pham Vu Tuan Dat, Long Doan, Huynh Thi Thanh Binh
Sonata: Self-Supervised Learning of Reliable Point Representations
Xiaoyang Wu, Daniel DeTone, Duncan Frost et al.
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents
Kexun Zhang, Weiran Yao, Zuxin Liu et al.
RATT: A Thought Structure for Coherent and Correct LLM Reasoning
Jinghan Zhang, Xiting Wang, Weijieying Ren et al.
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Heming Xia, Yongqi Li, Jun Zhang et al.
EG4D: Explicit Generation of 4D Object without Score Distillation
Qi Sun, Zhiyang Guo, Ziyu Wan et al.
Attention with Markov: A Curious Case of Single-layer Transformers
Ashok Makkuva, Marco Bondaschi, Adway Girish et al.
Reasoning Gym: Reasoning Environments for Reinforcement Learning with Verifiable Rewards
Zafir Stojanovski, Oliver Stanley, Joe Sharratt et al.
Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting
Yu Liu, Baoxiong Jia, Ruijie Lu et al.
Combining Induction and Transduction for Abstract Reasoning
Wen-Ding Li, Keya Hu, Carter Larsen et al.
DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention
Lianghui Zhu, Zilong Huang, Bencheng Liao et al.
Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification
Wenxuan Huang, Zijie Zhai, Yunhang Shen et al.
Sampling-Efficient Test-Time Scaling: Self-Estimating the Best-of-N Sampling in Early Decoding
Yiming Wang, Pei Zhang, Siyuan Huang et al.
Animate Anyone 2: High-Fidelity Character Image Animation with Environment Affordance
Li Hu, wang yuan, Zhen Shen et al.
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
Chenrui Fan, Ming Li, Lichao Sun et al.
5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks
Dongshuo Yin, Leiyi Hu, Bin Li et al.
SG-I2V: Self-Guided Trajectory Control in Image-to-Video Generation
Koichi Namekata, Sherwin Bahmani, Ziyi Wu et al.
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation
Zehuan Huang, Yuanchen Guo, Xingqiao An et al.
LazyDiT: Lazy Learning for the Acceleration of Diffusion Transformers
Xuan Shen, Zhao Song, Yufa Zhou et al.
Uni-Sign: Toward Unified Sign Language Understanding at Scale
Zecheng Li, Wengang Zhou, Weichao Zhao et al.
Watermark Anything With Localized Messages
Tom Sander, Pierre Fernandez, Alain Oliviero Durmus et al.
Strong Model Collapse
Elvis Dohmatob, Yunzhen Feng, Arjun Subramonian et al.
Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation
Clément Chadebec, Onur Tasar, Eyal Benaroche et al.
GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting
Wanshui Gan, Fang Liu, Hongbin Xu et al.
FastVLM: Efficient Vision Encoding for Vision Language Models
Pavan Kumar Anasosalu Vasu, Fartash Faghri, Chun-Liang Li et al.
Exploiting Multimodal Spatial-temporal Patterns for Video Object Tracking
Xiantao Hu, Ying Tai, Xu Zhao et al.
CoCoCo: Improving Text-Guided Video Inpainting for Better Consistency, Controllability and Compatibility
Bojia Zi, Shihao Zhao, Xianbiao Qi et al.
EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis
Alexander Mai, Peter Hedman, George Kopanas et al.
GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction
Yuanhui Huang, Amonnut Thammatadatrakoon, Wenzhao Zheng et al.
VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging
Yufan He, Pengfei Guo, Yucheng Tang et al.
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective
Jorge (Zhoujun) Cheng, Shibo Hao, Tianyang Liu et al.
PaPaGei: Open Foundation Models for Optical Physiological Signals
Arvind Pillai, Dimitris Spathis, Fahim Kawsar et al.
Video-Guided Foley Sound Generation with Multimodal Controls
Ziyang Chen, Prem Seetharaman, Bryan Russell et al.
DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splat Generation
Chenguo Lin, Panwang Pan, Bangbang Yang et al.
Multi-modal Agent Tuning: Building a VLM-Driven Agent for Efficient Tool Usage
Zhi Gao, Bofei Zhang, Pengxiang Li et al.
Training-Free Activation Sparsity in Large Language Models
James Liu, Pragaash Ponnusamy, Tianle Cai et al.
Synthetic continued pretraining
Zitong Yang, Neil Band, Shuangping Li et al.
Cycle3D: High-quality and Consistent Image-to-3D Generation via Generation-Reconstruction Cycle
Zhenyu Tang, Junwu Zhang, Xinhua Cheng et al.
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Size Wu, Wenwei Zhang, Lumin Xu et al.
Rethinking Diffusion for Text-Driven Human Motion Generation: Redundant Representations, Evaluation, and Masked Autoregression
Zichong Meng, Yiming Xie, Xiaogang Peng et al.
No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models
Seyedmorteza Sadat, Manuel Kansy, Otmar Hilliges et al.
MoH: Multi-Head Attention as Mixture-of-Head Attention
Peng Jin, Bo Zhu, Li Yuan et al.
Looped Transformers for Length Generalization
Ying Fan, Yilun Du, Kannan Ramchandran et al.
ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning
Zhaorun Chen, Mintong Kang, Bo Li
Autoregressive Adversarial Post-Training for Real-Time Interactive Video Generation
Shanchuan Lin, Ceyuan Yang, Hao He et al.
Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation
Shuo Yang, Haocheng Xi, Yilong Zhao et al.
TinySAM: Pushing the Envelope for Efficient Segment Anything Model
Han Shu, Wenshuo Li, Yehui Tang et al.
TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution
linwei dong, Qingnan Fan, Yihong Guo et al.
Evaluating the Evaluator: Measuring LLMs’ Adherence to Task Evaluation Instructions
Bhuvanashree Murugadoss, Christian Poelitz, Ian Drosos et al.
SUTrack: Towards Simple and Unified Single Object Tracking
Xin Chen, Ben Kang, Wanting Geng et al.
Large Language Models Assume People are More Rational than We Really are
Ryan Liu, Jiayi Geng, Joshua Peterson et al.
SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning
Rui Pan, Yinwei Dai, Zhihao Zhang et al.
On Scaling Up 3D Gaussian Splatting Training
Hexu Zhao, Haoyang Weng, Daohan Lu et al.
SymmCD: Symmetry-Preserving Crystal Generation with Diffusion Models
Daniel Levy, Siba Smarak Panigrahi, Sékou-Oumar Kaba et al.
Variational Best-of-N Alignment
Afra Amini, Tim Vieira, Elliott Ash et al.
Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions
Michael Zhang, W. Bradley Knox, Eunsol Choi
SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving
Georg Hess, Carl Lindström, Maryam Fatemi et al.
Sparse Meets Dense: Unified Generative Recommendations with Cascaded Sparse-Dense Representations
Yuhao Yang, ZhI JI, Zhaopeng Li et al.
Real-Time Execution of Action Chunking Flow Policies
Kevin Black, Manuel Galliker, Sergey Levine
Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time
Van Yang, Xiang Yue, Vipin Chaudhary et al.
Sparse Autoencoders Do Not Find Canonical Units of Analysis
Patrick Leask, Bart Bussmann, Michael Pearce et al.
Sparse MeZO: Less Parameters for Better Performance in Zeroth-Order LLM Fine-Tuning
Yong Liu, Zirui Zhu, Chaoyu Gong et al.
VLAS: Vision-Language-Action Model with Speech Instructions for Customized Robot Manipulation
Wei Zhao, Pengxiang Ding, Zhang Min et al.
How Does Critical Batch Size Scale in Pre-training?
Hanlin Zhang, Depen Morwani, Nikhil Vyas et al.
Deconstructing What Makes a Good Optimizer for Autoregressive Language Models
Rosie Zhao, Depen Morwani, David Brandfonbrener et al.
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation
Siyuan Huang, Liliang Chen, Pengfei Zhou et al.
TOMATO: Assessing Visual Temporal Reasoning Capabilities in Multimodal Foundation Models
Ziyao Shangguan, Chuhan Li, Yuxuan Ding et al.
Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives
Alex Hanson, Allen Tu, Geng Lin et al.
Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale
Bowen Jiang, Zhuoqun Hao, Young Min Cho et al.
IDOL: Instant Photorealistic 3D Human Creation from a Single Image
Yiyu Zhuang, Jiaxi Lv, Hao Wen et al.
ReMA: Learning to Meta-Think for LLMs with Multi-agent Reinforcement Learning
Ziyu Wan, Yunxiang Li, Xiaoyu Wen et al.
Gradient Alignment in Physics-informed Neural Networks: A Second-Order Optimization Perspective
Sifan Wang, Ananyae bhartari, Bowen Li et al.
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator
Chaehun Shin, Jooyoung Choi, Heeseung Kim et al.
Re-thinking Temporal Search for Long-Form Video Understanding
Jinhui Ye, Zihan Wang, Haosen Sun et al.
FairMT-Bench: Benchmarking Fairness for Multi-turn Dialogue in Conversational LLMs
Zhiting Fan, Ruizhe Chen, Tianxiang Hu et al.
PolaFormer: Polarity-aware Linear Attention for Vision Transformers
Weikang Meng, Yadan Luo, Xin Li et al.
OpenCUA: Open Foundations for Computer-Use Agents
Xinyuan Wang, Bowen Wang, Dunjie Lu et al.
Human-Object Interaction from Human-Level Instructions
Zhen Wu, Jiaman Li, Pei Xu et al.
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents
Qianhui Wu, Kanzhi Cheng, Rui Yang et al.
STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?
Yun Li, Yiming Zhang, Tao Lin et al.
Vision-Language Models Do Not Understand Negation
Kumail Alhamoud, Shaden Alshammari, Yonglong Tian et al.
ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning
Ruchika Chavhan, Da Li, Timothy Hospedales
DartControl: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control
Kaifeng Zhao, Gen Li, Siyu Tang
FreeVS: Generative View Synthesis on Free Driving Trajectory
Qitai Wang, Lue Fan, Yuqi Wang et al.
FaceXFormer: A Unified Transformer for Facial Analysis
Kartik Narayan, Vibashan VS, Rama Chellappa et al.
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy
Zhi Hou, Tianyi Zhang, Yuwen Xiong et al.
Causal Prompting: Debiasing Large Language Model Prompting Based on Front-Door Adjustment
Congzhi Zhang, Linhai Zhang, Jialong Wu et al.
Frequency-Aligned Knowledge Distillation for Lightweight Spatiotemporal Forecasting
Yuqi Li, Chuanguang Yang, Hansheng Zeng et al.
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models
Bodhisattwa Prasad Majumder, Harshit Surana, Dhruv Agarwal et al.
DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation
Dongya Jia, Zhuo Chen, Jiawei Chen et al.
StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models
Yunzhi Yan, Zhen Xu, Haotong Lin et al.
Direct3D-S2: Gigascale 3D Generation Made Easy with Spatial Sparse Attention
Shuang Wu, Youtian Lin, Feihu Zhang et al.
Towards General Visual-Linguistic Face Forgery Detection
Ke Sun, Shen Chen, Taiping Yao et al.
Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws
Yiding Jiang, Allan Zhou, Zhili Feng et al.
EmbodiedSAM: Online Segment Any 3D Thing in Real Time
Xiuwei Xu, Huangxing Chen, Linqing Zhao et al.
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models
Ruikang Liu, Yuxuan Sun, Manyi Zhang et al.
Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI
Robert Hönig, Javier Rando, Nicholas Carlini et al.
Think Only When You Need with Large Hybrid-Reasoning Models
Lingjie Jiang, Xun Wu, Shaohan Huang et al.
Generalizing Verifiable Instruction Following
Valentina Pyatkin, Saumya Malik, Victoria Graf et al.
TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos via Diffusion Models
Mark YU, Wenbo Hu, Jinbo Xing et al.
Preserving Diversity in Supervised Fine-Tuning of Large Language Models
Ziniu Li, Congliang Chen, Tian Xu et al.
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
Hao Li, Changyao TIAN, Jie Shao et al.
Reconstructive Visual Instruction Tuning
Haochen Wang, Anlin Zheng, Yucheng Zhao et al.
Flow Matching with General Discrete Paths: A Kinetic-Optimal Perspective
Neta Shaul, Itai Gat, Marton Havasi et al.
From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers
Jiacheng Liu, Chang Zou, Yuanhuiyi Lyu et al.
Align Your Flow: Scaling Continuous-Time Flow Map Distillation
Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis
PAD: Personalized Alignment of LLMs at Decoding-time
Ruizhe Chen, Xiaotian Zhang, Meng Luo et al.
AgentStudio: A Toolkit for Building General Virtual Agents
Longtao Zheng, Zhiyuan Huang, Zhenghai Xue et al.
Sequential Controlled Langevin Diffusions
Junhua Chen, Lorenz Richter, Julius Berner et al.
UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens
Ruichuan An, Sihan Yang, Renrui Zhang et al.
Attentive Eraser: Unleashing Diffusion Model’s Object Removal Potential via Self-Attention Redirection Guidance
Wenhao Sun, Xue-Mei Dong, Benlei Cui et al.
Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors
Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy et al.
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning
Ruowen Zhao, James Jun Liang Chen Ye, Zhengyi Wang et al.
MNE-SLAM: Multi-Agent Neural SLAM for Mobile Robots
Tianchen Deng, Guole Shen, Chen Xun et al.
ExACT: Teaching AI Agents to Explore with Reflective-MCTS and Exploratory Learning
Xiao Yu, Baolin Peng, Vineeth Vajipey et al.
KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills
Weiji Xie, Jinrui Han, Jiakun Zheng et al.
LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models
Shenghao Fu, Qize Yang, Qijie Mo et al.
Preference Optimization for Reasoning with Pseudo Feedback
Fangkai Jiao, Geyang Guo, Xingxing Zhang et al.
Computational Limits of Low-Rank Adaptation (LoRA) Fine-Tuning for Transformer Models
Jerry Yao-Chieh Hu, Maojiang Su, En-Jui Kuo et al.
Compositional Entailment Learning for Hyperbolic Vision-Language Models
Avik Pal, Max van Spengler, Guido D'Amely di Melendugno et al.
Dynamic Diffusion Transformer
Wangbo Zhao, Yizeng Han, Jiasheng Tang et al.
DrivingForward: Feed-forward 3D Gaussian Splatting for Driving Scene Reconstruction from Flexible Surround-view Input
Qijian Tian, Xin Tan, Yuan Xie et al.
Think while You Generate: Discrete Diffusion with Planned Denoising
Sulin Liu, Juno Nam, Andrew Campbell et al.
Persistent Pre-training Poisoning of LLMs
Yiming Zhang, Javier Rando, Ivan Evtimov et al.
ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions
Jeonghwan Kim, Jisoo Kim, Jeonghyeon Na et al.
Add-it: Training-Free Object Insertion in Images With Pretrained Diffusion Models
Yoad Tewel, Rinon Gal, Dvir Samuel et al.
Improving the Diffusability of Autoencoders
Ivan Skorokhodov, Sharath Girish, Benran Hu et al.
ForecastBench: A Dynamic Benchmark of AI Forecasting Capabilities
Ezra Karger, Houtan Bastani, Chen Yueh-Han et al.
One Diffusion to Generate Them All
Duong H. Le, Tuan Pham, Sangho Lee et al.
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Xuehai He, Weixi Feng, Kaizhi Zheng et al.
Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering
Ziyu Zhao, tao shen, Didi Zhu et al.
Robust Autonomy Emerges from Self-Play
Marco Cusumano-Towner, David Hafner, Alexander Hertzberg et al.
WISA: World simulator assistant for physics-aware text-to-video generation
Jing Wang, Ao Ma, Ke Cao et al.
xPatch: Dual-Stream Time Series Forecasting with Exponential Seasonal-Trend Decomposition
Artyom Stitsyuk, Jaesik Choi
Accelerating Diffusion LLMs via Adaptive Parallel Decoding
Daniel Israel, Guy Van den Broeck, Aditya Grover
LongGenBench: Benchmarking Long-Form Generation in Long Context LLMs
Yuhao Wu, Ming Shan Hee, Zhiqiang Hu et al.
Efficient Evolutionary Search Over Chemical Space with Large Language Models
Haorui Wang, Marta Skreta, Cher-Tian Ser et al.
A Closer Look at Machine Unlearning for Large Language Models
Xiaojian Yuan, Tianyu Pang, Chao Du et al.
Sparc3D: Sparse Representation and Construction for High-Resolution 3D Shapes Modeling
Zhihao Li, Yufei Wang, Heliang Zheng et al.