Most Cited 2025 "frequency bias analysis" Papers
22,274 papers found • Page 8 of 112
Conference
The Curse of Depth in Large Language Models
Wenfang Sun, Xinyuan Song, Pengxiang Li et al.
Sparse autoencoders reveal selective remapping of visual concepts during adaptation
Hyesu Lim, Jinho Choi, Jaegul Choo et al.
Spike No More: Stabilizing the Pre-training of Large Language Models
Sho Takase, Shun Kiyono, Sosuke Kobayashi et al.
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
Peng Xia, Siwei Han, Shi Qiu et al.
OptiBench Meets ReSocratic: Measure and Improve LLMs for Optimization Modeling
Zhicheng YANG, Yiwei Wang, Yinya Huang et al.
Overtrained Language Models Are Harder to Fine-Tune
Jacob Mitchell Springer, Sachin Goyal, Kaiyue Wen et al.
Feed-Forward Bullet-Time Reconstruction of Dynamic Scenes from Monocular Videos
hanxue liang, Jiawei Ren, Ashkan Mirzaei et al.
IRGS: Inter-Reflective Gaussian Splatting with 2D Gaussian Ray Tracing
Chun Gu, Xiaofei Wei, Zixuan Zeng et al.
Steer LLM Latents for Hallucination Detection
Seongheon Park, Xuefeng Du, Min-Hsuan Yeh et al.
World4Drive: End-to-End Autonomous Driving via Intention-aware Physical Latent World Model
Yupeng Zheng, Pengxuan Yang, Zebin Xing et al.
Values in the Wild: Discovering and Mapping Values in Real-World Language Model Interactions
Saffron Huang, Esin DURMUS, Kunal Handa et al.
Forgetting Transformer: Softmax Attention with a Forget Gate
Zhixuan Lin, Evgenii Nikishin, Xu He et al.
VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation
Hanzhi Chen, Boyang Sun, Anran Zhang et al.
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
Shalev Lifshitz, Sheila A. McIlraith, Yilun Du
VideoGLaMM : A Large Multimodal Model for Pixel-Level Visual Grounding in Videos
Shehan Munasinghe, Hanan Gani, Wenqi Zhu et al.
G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems
Guibin Zhang, Muxin Fu, Kun Wang et al.
CBQ: Cross-Block Quantization for Large Language Models
Xin Ding, Xiaoyu Liu, Zhijun Tu et al.
VRAG-RL: Empower Vision-Perception-Based RAG for Visually Rich Information Understanding via Iterative Reasoning with Reinforcement Learning
Qiuchen Wang, Ruixue Ding, Yu Zeng et al.
PivotMesh: Generic 3D Mesh Generation via Pivot Vertices Guidance
Haohan Weng, Yikai Wang, Tong Zhang et al.
Exploring Intrinsic Normal Prototypes within a Single Image for Universal Anomaly Detection
Wei Luo, Yunkang Cao, Haiming Yao et al.
VCA: Video Curious Agent for Long Video Understanding
Zeyuan Yang, Delin Chen, Xueyang Yu et al.
From Tokens to Words: On the Inner Lexicon of LLMs
Guy Kaplan, Matanel Oren, Yuval Reif et al.
Language Model Alignment in Multilingual Trolley Problems
Zhijing Jin, Max Kleiman-Weiner, Giorgio Piatti et al.
McEval: Massively Multilingual Code Evaluation
Linzheng Chai, Shukai Liu, Jian Yang et al.
PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers
Yuchen Lin, Chenguo Lin, Panwang Pan et al.
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
Leqi Shen, Tianxiang Hao, Tao He et al.
STAMP: Scalable Task- And Model-agnostic Collaborative Perception
Xiangbo Gao, Runsheng Xu, Jiachen Li et al.
CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification
Yuchen Tian, Weixiang Yan, Qian Yang et al.
Unlocking Multimodal Mathematical Reasoning via Process Reward Model
Ruilin Luo, Zhuofan Zheng, Lei Wang et al.
CViT: Continuous Vision Transformer for Operator Learning
Sifan Wang, Jacob Seidman, Shyam Sankaran et al.
Best-of-N Jailbreaking
John Hughes, Sara Price, Aengus Lynch et al.
PhysGen3D: Crafting a Miniature Interactive World from a Single Image
Boyuan Chen, Hanxiao Jiang, Shaowei Liu et al.
How to build a consistency model: Learning flow maps via self-distillation
Nicholas Boffi, Michael Albergo, Eric Vanden-Eijnden
GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS
Saman Kazemkhani, Aarav Pandya, Daphne Cornelisse et al.
Visual Agentic AI for Spatial Reasoning with a Dynamic API
Damiano Marsili, Rohun Agrawal, Yisong Yue et al.
From Isolated Conversations to Hierarchical Schemas: Dynamic Tree Memory Representation for LLMs
Alireza Rezazadeh, Zichao Li, Wei Wei et al.
GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing
Akashah Shabbir, Ilmuz Zaman Mohammed Zumri, Mohammed Bennamoun et al.
ASGO: Adaptive Structured Gradient Optimization
Kang An, Yuxing Liu, Rui Pan et al.
LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior
Hanyu Wang, Saksham Suri, Yixuan Ren et al.
FourierMamba: Fourier Learning Integration with State Space Models for Image Deraining
Dong Li, Yidi Liu, Xueyang Fu et al.
SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training
Jintao Zhang, Jia wei, Haoxu Wang et al.
SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator
Guoxuan Chen, Han Shi, jiawei li et al.
Understanding Chain-of-Thought in LLMs through Information Theory
Jean-Francois Ton, Muhammad Faaiz Taufiq, Yang Liu
Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification
Eric Zhao, Pranjal Awasthi, Sreenivas Gollapudi
REEF: Representation Encoding Fingerprints for Large Language Models
Jie Zhang, Dongrui Liu, Chen Qian et al.
Exploring Prosocial Irrationality for LLM Agents: A Social Cognition View
Xuan Liu, Jie ZHANG, HaoYang Shang et al.
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing
Ziteng Wang, Jun Zhu, Jianfei Chen
AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions
Polina Kirichenko, Mark Ibrahim, Kamalika Chaudhuri et al.
Longhorn: State Space Models are Amortized Online Learners
Bo Liu, Rui Wang, Lemeng Wu et al.
Efficient Online Reinforcement Learning for Diffusion Policy
Haitong Ma, Tianyi Chen, Kai Wang et al.
Herald: A Natural Language Annotated Lean 4 Dataset
Guoxiong Gao, Yutong Wang, Jiedong Jiang et al.
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
Haoyi Jiang, Liu Liu, Tianheng Cheng et al.
AI-Face: A Million-Scale Demographically Annotated AI-Generated Face Dataset and Fairness Benchmark
Li Lin, Santosh Santosh, Mingyang Wu et al.
Methods for Convex $(L_0,L_1)$-Smooth Optimization: Clipping, Acceleration, and Adaptivity
Eduard Gorbunov, Nazarii Tupitsa, Sayantan Choudhury et al.
UGPhysics: A Comprehensive Benchmark for Undergraduate Physics Reasoning with Large Language Models
Xin Xu, Qiyun Xu, Tong Xiao et al.
ConDSeg: A General Medical Image Segmentation Framework via Contrast-Driven Feature Enhancement
Mengqi Lei, Haochen Wu, Xinhua Lv et al.
Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge
Haomiao Xiong, Zongxin Yang, Jiazuo Yu et al.
DeSiRe-GS: 4D Street Gaussians for Static-Dynamic Decomposition and Surface Reconstruction for Urban Driving Scenes
Chensheng Peng, Chengwei Zhang, Yixiao Wang et al.
Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens
Dongwon Kim, Ju He, Qihang Yu et al.
LEARN: Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application
Jian Jia, Yipei Wang, Yan Li et al.
On Large Language Model Continual Unlearning
Chongyang Gao, Lixu Wang, Kaize Ding et al.
XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation
Bowen Chen, Brynn zhao, Haomiao Sun et al.
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling
Hongzhi Huang, Defa Zhu, Banggu Wu et al.
3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes
Jan Held, Renaud Vandeghen, Abdullah J Hamdi et al.
EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
Sheng Zhou, Junbin Xiao, Qingyun Li et al.
PHYBench: Holistic Evaluation of Physical Perception and Reasoning in Large Language Models
Shi Qiu, Shaoyang Guo, Zhuo-Yang Song et al.
Distillation Scaling Laws
Dan Busbridge, Amitis Shidani, Floris Weers et al.
Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark
Tsung-Han Wu, Giscard Biamby, Jerome Quenum et al.
Epona: Autoregressive Diffusion World Model for Autonomous Driving
Kaiwen Zhang, Zhenyu Tang, Xiaotao Hu et al.
Image Watermarks are Removable using Controllable Regeneration from Clean Noise
Yepeng Liu, Yiren Song, Hai Ci et al.
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks
Jiacheng Chen, Tianhao Liang, Sherman Siu et al.
Unleashing the Potential of Large Language Models as Prompt Optimizers: Analogical Analysis with Gradient-based Model Optimizers
Xinyu Tang, Xiaolei Wang, Wayne Xin Zhao et al.
GenSE: Generative Speech Enhancement via Language Models using Hierarchical Modeling
Jixun Yao, Hexin Liu, CHEN CHEN et al.
Inst3D-LMM: Instance-Aware 3D Scene Understanding with Multi-modal Instruction Tuning
Hanxun Yu, Wentong Li, Song Wang et al.
BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models
Peiyan Li, Yixiang Chen, Hongtao Wu et al.
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
Wenbo Hu, Jia-Chen Gu, Zi-Yi Dou et al.
Faster Algorithms for Structured John Ellipsoid Computation
Yang Cao, Xiaoyu Li, Zhao Song et al.
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
Junyan Ye, Baichuan Zhou, Zilong Huang et al.
One Perturbation is Enough: On Generating Universal Adversarial Perturbations against Vision-Language Pre-training Models
Hao Fang, Jiawei Kong, Wenbo Yu et al.
Motion-Agent: A Conversational Framework for Human Motion Generation with LLMs
Qi Wu, Yubo Zhao, Yifan Wang et al.
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces
zehan wang, Ziang Zhang, Minjie Hong et al.
KABB: Knowledge-Aware Bayesian Bandits for Dynamic Expert Coordination in Multi-Agent Systems
Jusheng Zhang, Zimeng Huang, Yijia Fan et al.
CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models
Zihui Cheng, Qiguang Chen, Jin Zhang et al.
The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
Sicong Leng, Yun Xing, Zesen Cheng et al.
InstantSwap: Fast Customized Concept Swapping across Sharp Shape Differences
Chenyang Zhu, Kai Li, Yue Ma et al.
SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding
Sihang Li, Jin Huang, Jiaxi Zhuang et al.
Locate-then-edit for Multi-hop Factual Recall under Knowledge Editing
Zhuoran Zhang, Yongxiang Li, Zijian Kan et al.
K-LoRA: Unlocking Training-Free Fusion of Any Subject and Style LoRAs
Ziheng Ouyang, Zhen Li, Qibin Hou
RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control
Teng Li, Guangcong Zheng, Rui Jiang et al.
Exploring Enhanced Contextual Information for Video-Level Object Tracking
Ben Kang, Xin Chen, Simiao Lai et al.
Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency
Shiji Zhao, Ranjie Duan, Fengxiang Wang et al.
Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions
Stefan Andreas Baumann, Felix Krause, Michael Neumayr et al.
Spurious Forgetting in Continual Learning of Language Models
Junhao Zheng, Xidi Cai, Shengjie Qiu et al.
VSSD: Vision Mamba with Non-Causal State Space Duality
Yuheng Shi, Mingjia Li, Minjing Dong et al.
Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective
Can Jin, Tianjin Huang, Yihua Zhang et al.
Q-Bench-Video: Benchmark the Video Quality Understanding of LMMs
Zicheng Zhang, Ziheng Jia, Haoning Wu et al.
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination
Jianing "Jed" Yang, Xuweiyi Chen, Nikhil Madaan et al.
LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding
Doohyuk Jang, Sihwan Park, June Yong Yang et al.
Cascade Reward Sampling for Efficient Decoding-Time Alignment
Bolian Li, Yifan Wang, Anamika Lochab et al.
KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models
Yongliang Wu, Zonghui Li, Xinting Hu et al.
How to set AdamW's weight decay as you scale model and dataset size
Xi Wang, Laurence Aitchison
Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning
Moritz Reuss, Jyothish Pari, Pulkit Agrawal et al.
Efficient Dictionary Learning with Switch Sparse Autoencoders
Anish Mudide, Josh Engels, Eric Michaud et al.
Second-Order Fine-Tuning without Pain for LLMs: A Hessian Informed Zeroth-Order Optimizer
Yanjun Zhao, Sizhe Dang, Haishan Ye et al.
Generative Inbetweening: Adapting Image-to-Video Models for Keyframe Interpolation
Xiaojuan Wang, Boyang Zhou, Brian Curless et al.
GOFA: A Generative One-For-All Model for Joint Graph Language Modeling
Lecheng Kong, Jiarui Feng, Hao Liu et al.
Adversarial Diffusion Compression for Real-World Image Super-Resolution
Bin Chen, Gehui Li, Rongyuan Wu et al.
Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data
Zhiyuan Zhou, Andy Peng, Qiyang Li et al.
MINIMA: Modality Invariant Image Matching
Jiangwei Ren, Xingyu Jiang, Zizhuo Li et al.
Beyond Autoregression: Fast LLMs via Self-Distillation Through Time
Justin Deschenaux, Caglar Gulcehre
LMAct: A Benchmark for In-Context Imitation Learning with Long Multimodal Demonstrations
Anian Ruoss, Fabio Pardo, Harris Chan et al.
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse
Maojia Song, Shang Hong Sim, Rishabh Bhardwaj et al.
Enriching Multimodal Sentiment Analysis Through Textual Emotional Descriptions of Visual-Audio Content
Sheng Wu, Dongxiao He, Xiaobao Wang et al.
Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning
Yiqun Chen, Lingyong Yan, Weiwei Sun et al.
Reducing Hallucinations in Large Vision-Language Models via Latent Space Steering
Sheng Liu, Haotian Ye, James Y Zou
Zigzag Diffusion Sampling: Diffusion Models Can Self-Improve via Self-Reflection
Lichen Bai, Shitong Shao, zikai zhou et al.
Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
Fan Zhou, Zengzhi Wang, Qian Liu et al.
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences
Hongyan Zhi, Peihao Chen, Junyan Li et al.
Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models
Zhenyu Pan, Haozheng Luo, Manling Li et al.
GraphArena: Evaluating and Exploring Large Language Models on Graph Computation
Jianheng Tang, Qifan Zhang, Yuhan Li et al.
How Discrete and Continuous Diffusion Meet: Comprehensive Analysis of Discrete Diffusion Models via a Stochastic Integral Framework
Yinuo Ren, Haoxuan Chen, Grant Rotskoff et al.
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors
Duo Zheng, shijia Huang, Yanyang Li et al.
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Navve Wasserman, Noam Rotstein, Roy Ganz et al.
Your Mixture-of-Experts LLM Is Secretly an Embedding Model for Free
Ziyue Li, Tianyi Zhou
HERMES: A Unified Self-Driving World Model for Simultaneous 3D Scene Understanding and Generation
Xin Zhou, DINGKANG LIANG, Sifan Tu et al.
Reinforcement Learning with Action Chunking
Qiyang Li, Zhiyuan (Paul) Zhou, Sergey Levine
AgentDAM: Privacy Leakage Evaluation for Autonomous Web Agents
Arman Zharmagambetov, Chuan Guo, Ivan Evtimov et al.
Training Dynamics of In-Context Learning in Linear Attention
Yedi Zhang, Aaditya Singh, Peter Latham et al.
SlideChat: A Large Vision-Language Assistant for Whole-Slide Pathology Image Understanding
Ying Chen, Guoan Wang, Yuanfeng Ji et al.
Diving into Self-Evolving Training for Multimodal Reasoning
Wei Liu, Junlong Li, Xiwen Zhang et al.
MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space
Lixing Xiao, Shunlin Lu, Huaijin Pi et al.
Can Large Language Models Understand Symbolic Graphics Programs?
Zeju Qiu, Weiyang Liu, Haiwen Feng et al.
Energy-Weighted Flow Matching for Offline Reinforcement Learning
Shiyuan Zhang, Weitong Zhang, Quanquan Gu
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles
Jiangjie Chen, Qianyu He, Siyu Yuan et al.
Theoretical Benefit and Limitation of Diffusion Language Model
Guhao Feng, Yihan Geng, Jian Guan et al.
The OMG dataset: An Open MetaGenomic corpus for mixed-modality genomic language modeling
Andre Cornman, Jacob West-Roberts, Antonio Camargo et al.
DailyDilemmas: Revealing Value Preferences of LLMs with Quandaries of Daily Life
Yu Ying Chiu, Liwei Jiang, Yejin Choi
Large-scale and Fine-grained Vision-language Pre-training for Enhanced CT Image Understanding
Zhongyi Shui, Jianpeng Zhang, Weiwei Cao et al.
CURIE: Evaluating LLMs on Multitask Scientific Long-Context Understanding and Reasoning
Hao Cui, Zahra Shamsi, Gowoon Cheon et al.
Bolt3D: Generating 3D Scenes in Seconds
Stanislaw Szymanowicz, Jason Y. Zhang, Pratul Srinivasan et al.
Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models
Zehan Wang, Ziang Zhang, Tianyu Pang et al.
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
Hongjie Wang, Chih-Yao Ma, Yen-Cheng Liu et al.
Distilling Multi-modal Large Language Models for Autonomous Driving
Deepti Hegde, Rajeev Yasarla, Hong Cai et al.
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
Liming Jiang, Qing Yan, Yumin Jia et al.
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models
Samira Abnar, Harshay Shah, Dan Busbridge et al.
Agent4Edu: Generating Learner Response Data by Generative Agents for Intelligent Education Systems
Weibo Gao, Qi Liu, Linan Yue et al.
KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse
Jingbo Yang, Bairu Hou, Wei Wei et al.
Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors
Weixuan Wang, JINGYUAN YANG, Wei Peng
RUN: Reversible Unfolding Network for Concealed Object Segmentation
Chunming He, Rihan Zhang, Fengyang Xiao et al.
DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding
Geng Li, Jinglin Xu, Yunzhen Zhao et al.
A Formal Framework for Understanding Length Generalization in Transformers
Xinting Huang, Andy Yang, Satwik Bhattamishra et al.
ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding
Yiyang Zhou, Yangfan He, Yaofeng Su et al.
CAT-3DGS: A Context-Adaptive Triplane Approach to Rate-Distortion-Optimized 3DGS Compression
Yu-Ting Zhan, Cheng-Yuan Ho, He-Bi Yang et al.
AdvAgent: Controllable Blackbox Red-teaming on Web Agents
Chejian Xu, Mintong Kang, Jiawei Zhang et al.
Machine Unlearning Fails to Remove Data Poisoning Attacks
Martin Pawelczyk, Jimmy Di, Yiwei Lu et al.
A Bias-Free Training Paradigm for More General AI-generated Image Detection
Fabrizio Guillaro, Giada Zingarini, Ben Usman et al.
PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training
Cong Chen, Mingyu Liu, Chenchen Jing et al.
ChartMoE: Mixture of Diversely Aligned Expert Connector for Chart Understanding
Zhengzhuo Xu, Bowen Qu, Yiyan Qi et al.
LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?
Zihan Zheng, Zerui Cheng, Zeyu Shen et al.
Can Knowledge Editing Really Correct Hallucinations?
Baixiang Huang, Canyu Chen, Xiongxiao Xu et al.
Holistically Evaluating the Environmental Impact of Creating Language Models
Jacob Morrison, Clara Na, Jared Fernandez et al.
Understanding and Enhancing Safety Mechanisms of LLMs via Safety-Specific Neuron
Yiran Zhao, Wenxuan Zhang, Yuxi Xie et al.
MARS: Unleashing the Power of Variance Reduction for Training Large Models
Huizhuo Yuan, Yifeng Liu, Shuang Wu et al.
Multimodal Situational Safety
Kaiwen Zhou, Chengzhi Liu, Xuandong Zhao et al.
Graphic Design with Large Multimodal Model
Yutao Cheng, Zhao Zhang, Maoke Yang et al.
Interleaved-Modal Chain-of-Thought
Jun Gao, Yongqi Li, Ziqiang Cao et al.
MegaMath: Pushing the Limits of Open Math Corpora
Fan Zhou, Zengzhi Wang, Nikhil Ranjan et al.
ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
Zongyi Li, Shujie HU, Shujie LIU et al.
HyperGraphRAG: Retrieval-Augmented Generation via Hypergraph-Structured Knowledge Representation
Haoran Luo, Haihong E, Guanting Chen et al.
Enhancing Zeroth-order Fine-tuning for Language Models with Low-rank Structures
Yiming Chen, Yuan Zhang, Liyuan Cao et al.
Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation
ZIYU ZHU, Xilin Wang, Yixuan Li et al.
AI for Global Climate Cooperation: Modeling Global Climate Negotiations, Agreements, and Long-Term Cooperation in RICE-N
Tianyu Zhang, Andrew Williams, Phillip Wozny et al.
Grounded Reinforcement Learning for Visual Reasoning
Gabriel Sarch, Snigdha Saha, Naitik Khandelwal et al.
Simple ReFlow: Improved Techniques for Fast Flow Models
Beomsu Kim, Yu-Guan Hsieh, Michal Klein et al.
TrackGo: A Flexible and Efficient Method for Controllable Video Generation
Haitao Zhou, Chuang Wang, Rui Nie et al.
LayerTracer: Cognitive-Aligned Layered SVG Synthesis via Diffusion Transformer
Yiren Song, Danze Chen, Mike Zheng Shou
From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions
Changle Qu, Sunhao Dai, Xiaochi Wei et al.
SVBench: A Benchmark with Temporal Multi-Turn Dialogues for Streaming Video Understanding
Zhenyu Yang, Yuhang Hu, Zemin Du et al.
Understanding Factual Recall in Transformers via Associative Memories
Eshaan Nichani, Jason Lee, Alberto Bietti
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code
Zimu Lu, Aojun Zhou, Ke Wang et al.
SV4D 2.0: Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation
Chun-Han Yao, Yiming Xie, Vikram Voleti et al.
ParetoQ: Improving Scaling Laws in Extremely Low-bit LLM Quantization
Zechun Liu, Changsheng Zhao, Hanxian Huang et al.
LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models
Parshin Shojaee, Ngoc Hieu Nguyen, Kazem Meidani et al.
Fast-Slow Thinking GRPO for Large Vision-Language Model Reasoning
Wenyi Xiao, Leilei Gan
Star Attention: Efficient LLM Inference over Long Sequences
Shantanu Acharya, Fei Jia, Boris Ginsburg
Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems
Christian Walder, Deep Tejas Karkhanis
The Superposition of Diffusion Models Using the Itô Density Estimator
Marta Skreta, Lazar Atanackovic, Joey Bose et al.
Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads
Siqi Kou, Jiachun Jin, Zhihong Liu et al.
Subspace Optimization for Large Language Models with Convergence Guarantees
Yutong He, Pengrui Li, Yipeng Hu et al.
ShortcutsBench: A Large-Scale Real-world Benchmark for API-based Agents
Haiyang SHEN, Yue Li, Desong Meng et al.
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
Jingjing Chang, Yixiao Fang, Peng Xing et al.
Style Outweighs Substance: Failure Modes of LLM Judges in Alignment Benchmarking
Benjamin Feuer, Micah Goldblum, Teresa Datta et al.
DropGaussian: Structural Regularization for Sparse-view Gaussian Splatting
Hyunwoo Park, Gun Ryu, Wonjun Kim
Learning stochastic dynamics from snapshots through regularized unbalanced optimal transport
Zhenyi Zhang, Tiejun Li, Peijie Zhou
Chain-of-Retrieval Augmented Generation
Liang Wang, Haonan Chen, Nan Yang et al.
NLSR: Neuron-Level Safety Realignment of Large Language Models Against Harmful Fine-Tuning
Xin Yi, Shunfan Zheng, Linlin Wang et al.
When Thinking Fails: The Pitfalls of Reasoning for Instruction-Following in LLMs
Xiaomin Li, Zhou Yu, Zhiwei Zhang et al.
VistaDream: Sampling multiview consistent images for single-view scene reconstruction
Haiping Wang, Yuan Liu, Ziwei Liu et al.
Nonconvex Stochastic Optimization under Heavy-Tailed Noises: Optimal Convergence without Gradient Clipping
Zijian Liu, Zhengyuan Zhou
Diffusion-based Neural Network Weights Generation
Bedionita Soro, Bruno Andreis, Hayeon Lee et al.