Most Cited 2025 "neural language models" Papers
22,274 papers found • Page 13 of 112
Conference
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties
Gouki Minegishi, Hiroki Furuta, Takeshi Kojima et al.
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs
Mohammad Mozaffari, Amir Yazdanbakhsh, Zhao Zhang et al.
ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning
Zhenyang Liu, Yikai Wang, Sixiao Zheng et al.
Solver-Informed RL: Grounding Large Language Models for Authentic Optimization Modeling
Yitian Chen, Jingfan Xia, Siyu Shao et al.
Patient-Level Anatomy Meets Scanning-Level Physics: Personalized Federated Low-Dose CT Denoising Empowered by Large Language Model
Ziyuan Yang, Yingyu Chen, Zhiwen Wang et al.
ACL: Activating Capability of Linear Attention for Image Restoration
Yubin Gu, Yuan Meng, Jiayi Ji et al.
GeoBEV: Learning Geometric BEV Representation for Multi-view 3D Object Detection
Jinqing Zhang, Yanan Zhang, Yunlong Qi et al.
Unlocking Efficient, Scalable, and Continual Knowledge Editing with Basis-Level Representation Fine-Tuning
Tianci Liu, Ruirui Li, Yunzhe Qi et al.
Consistent Flow Distillation for Text-to-3D Generation
runjie yan, Yinbo Chen, Xiaolong Wang
Teaching Language Models to Evolve with Users: Dynamic Profile Modeling for Personalized Alignment
Weixiang Zhao, Xingyu Sui, Yulin Hu et al.
Can Watermarked LLMs be Identified by Users via Crafted Prompts?
Aiwei Liu, Sheng Guan, Yiming Liu et al.
Hypothesis-Driven Theory-of-Mind Reasoning for Large Language Models
Hyunwoo Kim, Melanie Sclar, Tan Zhi-Xuan et al.
Backdoor Attacks Against No-Reference Image Quality Assessment Models via a Scalable Trigger
Yi Yu, Song Xia, Xun Lin et al.
Yuan: Yielding Unblemished Aesthetics Through a Unified Network for Visual Imperfections Removal in Generated Images
Zhenyu Yu, Chee Seng Chan
Generalized Principal-Agent Problem with a Learning Agent
Tao Lin, Yiling Chen
GRPose: Learning Graph Relations for Human Image Generation with Pose Priors
Xiangchen Yin, Donglin Di, Lei Fan et al.
CLIMB-ReID: A Hybrid CLIP-Mamba Framework for Person Re-Identification
Chenyang Yu, Xuehu Liu, Jiawen Zhu et al.
AdaManip: Adaptive Articulated Object Manipulation Environments and Policy Learning
Yuanfei Wang, Xiaojie Zhang, Ruihai Wu et al.
LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation
Juzheng Zhang, Jiacheng You, Ashwinee Panda et al.
Fully-inductive Node Classification on Arbitrary Graphs
Jianan Zhao, Zhaocheng Zhu, Mikhail Galkin et al.
SymmCompletion: High-Fidelity and High-Consistency Point Cloud Completion with Symmetry Guidance
Hongyu Yan, Zijun Li, Kunming Luo et al.
ReSi: A Comprehensive Benchmark for Representational Similarity Measures
Max Klabunde, Tassilo Wald, Tobias Schumacher et al.
FreeFlux: Understanding and Exploiting Layer-Specific Roles in RoPE-Based MMDiT for Versatile Image Editing
Tianyi Wei, Yifan Zhou, Dongdong Chen et al.
Lie Algebra Canonicalization: Equivariant Neural Operators under arbitrary Lie Groups
Zakhar Shumaylov, Peter Zaika, James Rowbottom et al.
OLiDM: Object-aware LiDAR Diffusion Models for Autonomous Driving
Tianyi Yan, Junbo Yin, Xianpeng Lang et al.
ReSim: Reliable World Simulation for Autonomous Driving
Jiazhi Yang, Kashyap Chitta, Shenyuan Gao et al.
Linguini: A benchmark for language-agnostic linguistic reasoning
Eduardo Sánchez, Belen Alastruey, Christophe Ropers et al.
Multi-Teacher Knowledge Distillation with Reinforcement Learning for Visual Recognition
Chuanguang Yang, XinQiang Yu, Han Yang et al.
SmartEraser: Remove Anything from Images using Masked-Region Guidance
Longtao Jiang, Zhendong Wang, Jianmin Bao et al.
Vision Transformers Don't Need Trained Registers
Nicholas Jiang, Amil Dravid, Alexei Efros et al.
Understanding the Uncertainty of LLM Explanations: A Perspective Based on Reasoning Topology
Longchao Da, Xiaoou Liu, Jiaxin Dai et al.
MDP3: A Training-free Approach for List-wise Frame Selection in Video-LLMs
Hui Sun, Shiyin Lu, Huanyu Wang et al.
Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs
Jeongseok Hyun, Sukjun Hwang, Su Ho Han et al.
Cost-efficient Collaboration between On-device and Cloud Language Models
Avanika Narayan, Dan Biderman, Sabri Eyuboglu et al.
An OpenMind for 3D Medical Vision Self-supervised Learning
Tassilo Wald, Constantin Ulrich, Jonathan Suprijadi et al.
How to Synthesize Text Data without Model Collapse?
Xuekai Zhu, Daixuan Cheng, Hengli Li et al.
Debiased All-in-one Image Restoration with Task Uncertainty Regularization
Gang Wu, Junjun Jiang, Yijun Wang et al.
InstructSeg: Unifying Instructed Visual Segmentation with Multi-modal Large Language Models
Cong Wei, Yujie Zhong, yingsen zeng et al.
STEREO: A Two-Stage Framework for Adversarially Robust Concept Erasing from Text-to-Image Diffusion Models
Koushik Srivatsan, Fahad Shamshad, Muzammal Naseer et al.
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models
Luca Eyring, Shyamgopal Karthik, Alexey Dosovitskiy et al.
UniVG: A Generalist Diffusion Model for Unified Image Generation and Editing
Tsu-Jui Fu, Yusu Qian, Chen Chen et al.
Jasmine: Harnessing Diffusion Prior for Self-supervised Depth Estimation
Jiyuan Wang, Chunyu Lin, cheng guan et al.
FalseReject: A Resource for Improving Contextual Safety and Mitigating Over-Refusals in LLMs via Structured Reasoning
Zhehao Zhang, Weijie Xu, Fanyou Wu et al.
Unsupervised Foundation Model-Agnostic Slide-Level Representation Learning
Tim Lenz, Peter Neidlinger, Marta Ligero et al.
The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise
Yuanhao Ban, Ruochen Wang, Tianyi Zhou et al.
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models
Amir Mohammad Karimi Mamaghan, Samuele Papa, Karl H. Johansson et al.
HiP-AD: Hierarchical and Multi-Granularity Planning with Deformable Attention for Autonomous Driving in a Single Decoder
Yingqi Tang, Zhuoran Xu, Zhaotie Meng et al.
Unbounded: A Generative Infinite Game of Character Life Simulation
Jialu Li, Yuanzhen Li, Neal Wadhwa et al.
Fine-grained Abnormality Prompt Learning for Zero-shot Anomaly Detection
Jiawen Zhu, YEW-SOON ONG, Chunhua Shen et al.
Focus on Local: Finding Reliable Discriminative Regions for Visual Place Recognition
Changwei Wang, Shunpeng Chen, Yukun Song et al.
Understanding Layer Significance in LLM Alignment
Guangyuan SHI, ZEXIN LU, Xiaoyu DONG et al.
Language Models Fail to Introspect About Their Knowledge of Language
Siyuan Song, Jennifer Hu, Kyle Mahowald
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
Wei Chen, Lin Li, Yongqi Yang et al.
Rethinking Diffusion Posterior Sampling: From Conditional Score Estimator to Maximizing a Posterior
Tongda Xu, Xiyan Cai, Xinjie Zhang et al.
RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation
Mingfei Han, Liang Ma, Kamila Zhumakhanova et al.
MMEgo: Towards Building Egocentric Multimodal LLMs for Video QA
Hanrong Ye, Haotian Zhang, Erik Daxberger et al.
Broadening Target Distributions for Accelerated Diffusion Models via a Novel Analysis Approach
Yuchen Liang, Peizhong Ju, Yingbin Liang et al.
Image Generation Diversity Issues and How to Tame Them
Mischa Dombrowski, Weitong Zhang, Hadrien Reynaud et al.
Language Model Personalization via Reward Factorization
Idan Shenfeld, Felix Faltings, Pulkit Agrawal et al.
Mobile Video Diffusion
Haitam Ben Yahia, Denis Korzhenkov, Ioannis Lelekas et al.
Twilight: Adaptive Attention Sparsity with Hierarchical Top-$p$ Pruning
Chaofan Lin, Jiaming Tang, Shuo Yang et al.
EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild
Yumeng Liu, Xiaoxiao Long, Zemin Yang et al.
Adaptive Layer-skipping in Pre-trained LLMs
Xuan Luo, Weizhi Wang, Xifeng Yan
Accelerating RL for LLM Reasoning with Optimal Advantage Regression
Kianté Brantley, Mingyu Chen, Zhaolin Gao et al.
UCF-Crime-DVS: A Novel Event-Based Dataset for Video Anomaly Detection with Spiking Neural Networks
Yuanbin Qian, Shuhan Ye, Chong Wang et al.
LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization
Guanzheng Chen, Xin Li, Michael Qizhe Shieh et al.
Detail Matters: Mamba-Inspired Joint Unfolding Network for Snapshot Spectral Compressive Imaging
Mengjie Qin, Yuchao Feng, Zongliang Wu et al.
Galileo: Learning Global & Local Features of Many Remote Sensing Modalities
Gabriel Tseng, Anthony Fuller, Marlena Reil et al.
LeanAgent: Lifelong Learning for Formal Theorem Proving
Adarsh Kumarappan, Mohit Tiwari, Peiyang Song et al.
Uncertainty-guided Perturbation for Image Super-Resolution Diffusion Model
Leheng Zhang, Weiyi You, Kexuan Shi et al.
CROW: Eliminating Backdoors from Large Language Models via Internal Consistency Regularization
Nay Myat Min, Long H. Pham, Yige Li et al.
Improving Equivariant Networks with Probabilistic Symmetry Breaking
Hannah Lawrence, Vasco Portilheiro, Yan Zhang et al.
Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaptation
Anqi Li, Feng Li, Yuxi Liu et al.
MG-MotionLLM: A Unified Framework for Motion Comprehension and Generation across Multiple Granularities
Bizhu Wu, Jinheng Xie, Keming Shen et al.
Decision Information Meets Large Language Models: The Future of Explainable Operations Research
Yansen Zhang, Qingcan Kang, Wing Yin YU et al.
Zero-Shot Monocular Scene Flow Estimation in the Wild
Yiqing Liang, Abhishek Badki, Hang Su et al.
LLMs Are In-Context Bandit Reinforcement Learners
Giovanni Monea, Antoine Bosselut, Kianté Brantley et al.
Imagine360: Immersive 360 Video Generation from Perspective Anchor
Jing Tan, Shuai Yang, Tong Wu et al.
RealRAG: Retrieval-augmented Realistic Image Generation via Self-reflective Contrastive Learning
Yuanhuiyi Lyu, Xu Zheng, Lutao Jiang et al.
RelCon: Relative Contrastive Learning for a Motion Foundation Model for Wearable Data
Maxwell Xu, Jaya Narain, Gregory Darnell et al.
HandDiffuse: Generative Controllers for Two-Hand Interactions via Diffusion Models
Pei Lin
Adversarial Distribution Matching for Diffusion Distillation Towards Efficient Image and Video Synthesis
Yanzuo Lu, Yuxi Ren, Xin Xia et al.
MotionPro: A Precise Motion Controller for Image-to-Video Generation
Zhongwei Zhang, Fuchen Long, Zhaofan Qiu et al.
ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing
Yulin Pan, Xiangteng He, Chaojie Mao et al.
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
Jiale Cheng, Xiao Liu, Cunxiang Wang et al.
On Reasoning Strength Planning in Large Reasoning Models
Leheng Sheng, An Zhang, Zijian Wu et al.
Generative Planning with 3D-Vision Language Pre-training for End-to-End Autonomous Driving
Tengpeng Li, Hanli Wang, Xianfei Li et al.
MAP: Multi-Human-Value Alignment Palette
Xinran Wang, Qi Le, Ammar Ahmed et al.
ConfigX: Modular Configuration for Evolutionary Algorithms via Multitask Reinforcement Learning
Hongshu Guo, Zeyuan Ma, Jiacheng Chen et al.
TREAD: Token Routing for Efficient Architecture-agnostic Diffusion Training
Felix Krause, Timy Phan, Ming Gui et al.
VLA-OS: Structuring and Dissecting Planning Representations and Paradigms in Vision-Language-Action Models
Chongkai Gao, Zixuan Liu, Zhenghao Chi et al.
Trusted Multi-View Classification via Evolutionary Multi-View Fusion
Xinyan Liang, Pinhan Fu, Yuhua Qian et al.
Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning
Minheng Ni, Zhengyuan Yang, Linjie Li et al.
Instant Adversarial Purification with Adversarial Consistency Distillation
Chun Tong Lei, Hon Ming Yam, Zhongliang Guo et al.
FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors
Yabo Zhang, xinpeng zhou, Yihan Zeng et al.
EgoLM: Multi-Modal Language Model of Egocentric Motions
Fangzhou Hong, Vladimir Guzov, Hyo Jin Kim et al.
Dissecting Submission Limit in Desk-Rejections: A Mathematical Analysis of Fairness in AI Conference Policies
Yuefan Cao, Xiaoyu Li, Yingyu Liang et al.
Rethinking Spiking Neural Networks from an Ensemble Learning Perspective
Yongqi Ding, Lin Zuo, Mengmeng Jing et al.
SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix
Peng Dai, Feitong Tan, Qiangeng Xu et al.
One-for-More: Continual Diffusion Model for Anomaly Detection
Xiaofan Li, Xin Tan, Zhuo Chen et al.
PaTH Attention: Position Encoding via Accumulating Householder Transformations
Songlin Yang, Yikang Shen, Kaiyue Wen et al.
METASCENES: Towards Automated Replica Creation for Real-world 3D Scans
Huangyue Yu, Baoxiong Jia, Yixin Chen et al.
Inference Optimal VLMs Need Fewer Visual Tokens and More Parameters
Kevin Li, Sachin Goyal, João D Semedo et al.
Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization
Yue Zhang, Liqiang Jing, Vibhav Gogate
Brain Mapping with Dense Features: Grounding Cortical Semantic Selectivity in Natural Images With Vision Transformers
Andrew Luo, Jacob Yeung, Rushikesh Zawar et al.
Towards a Universal Synthetic Video Detector: From Face or Background Manipulations to Fully AI-Generated Content
Rohit Kundu, Hao Xiong, Vishal Mohanty et al.
Speech Robust Bench: A Robustness Benchmark For Speech Recognition
Muhammad Shah, David Solans Noguero, Mikko Heikkilä et al.
UniMuMo: Unified Text, Music, and Motion Generation
Han Yang, Kun Su, Yutong Zhang et al.
CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation
Anirudh Khatry, Robert Zhang, Jia Pan et al.
LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid
Tianyi Zhang, Anshumali Shrivastava
Bag of Tricks for Inference-time Computation of LLM Reasoning
Fan LIU, Wen-Shuo Chao, Naiqiang Tan et al.
Exploring the limits of strong membership inference attacks on large language models
Jamie Hayes, I Shumailov, Christopher A. Choquette-Choo et al.
MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm
Ziyan Guo, Zeyu HU, Na Zhao et al.
Generative Classifiers Avoid Shortcut Solutions
Alexander Li, Ananya Kumar, Deepak Pathak
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
Ming Li, Xin Gu, Fan Chen et al.
The Illusion of Unlearning: The Unstable Nature of Machine Unlearning in Text-to-Image Diffusion Models
Naveen George, Karthik Nandan Dasaraju, Rutheesh Reddy Chittepu et al.
MVPortrait: Text-Guided Motion and Emotion Control for Multi-view Vivid Portrait Animation
Yukang Lin, Hokit Fung, Jianjin Xu et al.
Diffusion Models for Attribution
Xiongren Chen, Jiuyong Li, Jixue Liu et al.
Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration
Ziyang Ma, Guanrou Yang, Yifan Yang et al.
MVGenMaster: Scaling Multi-View Generation from Any Image via 3D Priors Enhanced Diffusion Model
Chenjie Cao, Chaohui Yu, Shang Liu et al.
DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses
Yatian Pang, Bin Zhu, Bin Lin et al.
SecureGS: Boosting the Security and Fidelity of 3D Gaussian Splatting Steganography
Xuanyu Zhang, Jiarui Meng, Zhipei Xu et al.
Imputation for prediction: beware of diminishing returns.
Marine Le Morvan, Gael Varoquaux
Stable Segment Anything Model
Qi Fan, Xin Tao, Lei Ke et al.
$\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs
Vlad Sobal, Mark Ibrahim, Randall Balestriero et al.
CyberPal.AI: Empowering LLMs with Expert-Driven Cybersecurity Instructions
Matan Levi, Yair Allouche, Daniel Ohayon et al.
Advancing Spiking Neural Networks Towards Multiscale Spatiotemporal Interaction Learning
Yimeng Shan, Malu Zhang, Rui-jie Zhu et al.
The Power of Context: How Multimodality Improves Image Super-Resolution
Kangfu Mei, Vishal M. Patel, Mojtaba Sahraee-Ardakan et al.
DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation
Zhixuan Liang, Yao Mu, Yixiao Wang et al.
NAVIX: Scaling MiniGrid Environments with JAX
Eduardo Pignatelli, Jarek Liesen, Robert Lange et al.
NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models
Sung-Yeon Park, Can Cui, Yunsheng Ma et al.
Equivariance Everywhere All At Once: A Recipe for Graph Foundation Models
Ben Finkelshtein, Ismail Ilkan Ceylan, Michael Bronstein et al.
LUDVIG: Learning-Free Uplifting of 2D Visual Features to Gaussian Splatting Scenes
Juliette Marrie, Romain Menegaux, Michael Arbel et al.
CL-Attack: Textual Backdoor Attacks via Cross-Lingual Triggers
Jingyi Zheng, Tianyi Hu, Tianshuo Cong et al.
Heterogeneous Swarms: Jointly Optimizing Model Roles and Weights for Multi-LLM Systems
Shangbin Feng, Zifeng Wang, Palash Goyal et al.
DualTalk: Dual-Speaker Interaction for 3D Talking Head Conversations
Ziqiao Peng, Yanbo Fan, Haoyu Wu et al.
Identifying Query-Relevant Neurons in Large Language Models for Long-Form Texts
Lihu Chen, Adam Dejl, Francesca Toni
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
Shehreen Azad, Vibhav Vineet, Yogesh S. Rawat
MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting
Sangwoon Kwak, Joonsoo Kim, Jun Young Jeong et al.
CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring
Benjamin Arnav, Pablo Bernabeu-Perez, Nathan Helm-Burger et al.
Learning Molecular Representation in a Cell
Gang Liu, Srijit Seal, John Arevalo et al.
Prototype-Based Image Prompting for Weakly Supervised Histopathological Image Segmentation
Qingchen Tang, Lei Fan, Maurice Pagnucco et al.
Mr. DETR: Instructive Multi-Route Training for Detection Transformers
Chang-Bin Zhang, Yujie Zhong, Kai Han
In Search of Adam’s Secret Sauce
Antonio Orvieto, Robert Gower
Accelerating Large Language Model Reasoning via Speculative Search
Zhihai Wang, Jie Wang, Jilai Pan et al.
Ambient Diffusion Omni: Training Good Models with Bad Data
Giannis Daras, Adrian Rodriguez-Munoz, Adam Klivans et al.
MDNS: Masked Diffusion Neural Sampler via Stochastic Optimal Control
Yuchen Zhu, Wei Guo, Jaemoo Choi et al.
BVINet: Unlocking Blind Video Inpainting with Zero Annotations
zhiliang wu, Kerui Chen, Kun Li et al.
Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace
Jinluan Yang, Anke Tang, Didi Zhu et al.
Panacea: Mitigating Harmful Fine-tuning for Large Language Models via Post-fine-tuning Perturbation
Yibo Wang, Tiansheng Huang, Li Shen et al.
RealGeneral: Unifying Visual Generation via Temporal In-Context Learning with Video Models
Yijing Lin, Mengqi Huang, Shuhan Zhuang et al.
DropoutGS: Dropping Out Gaussians for Better Sparse-view Rendering
Yexing Xu, Longguang Wang, Minglin Chen et al.
SuperCorrect: Advancing Small LLM Reasoning with Thought Template Distillation and Self-Correction
Ling Yang, Zhaochen Yu, Tianjun Zhang et al.
Patch-wise Structural Loss for Time Series Forecasting
Dilfira Kudrat, Zongxia Xie, Yanru Sun et al.
BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model
Adibvafa Fallahpour, Andrew Magnuson, Purav Gupta et al.
Learning Hazing to Dehazing: Towards Realistic Haze Generation for Real-World Image Dehazing
Ruiyi Wang, Yushuo Zheng, Zicheng Zhang et al.
InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
Minsoo Kim, Kyuhong Shim, Jungwook Choi et al.
Efficient 3D Recognition with Event-driven Spike Sparse Convolution
Xuerui Qiu, Man Yao, Jieyuan Zhang et al.
Reward-Guided Iterative Refinement in Diffusion Models at Test-Time with Applications to Protein and DNA Design
Masatoshi Uehara, su, Yulai Zhao et al.
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLMs
Xinyu Fang, Zhijian Chen, Kai Lan et al.
GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs
Advik Basani, Xiao Zhang
KeyDiff: Key Similarity-Based KV Cache Eviction for Long-Context LLM Inference in Resource-Constrained Environments
Junyoung Park, Dalton Jones, Matthew Morse et al.
VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding
Kangsan Kim, Geon Park, Youngwan Lee et al.
Solving New Tasks by Adapting Internet Video Knowledge
Calvin Luo, Zilai Zeng, Yilun Du et al.
Enhancing Personalized Multi-Turn Dialogue with Curiosity Reward
Yanming Wan, Jiaxing Wu, Marwa Abdulhai et al.
Vec2Face: Scaling Face Dataset Generation with Loosely Constrained Vectors
Haiyu Wu, Jaskirat Singh, Sicong Tian et al.
Post-hoc Reward Calibration: A Case Study on Length Bias
Zeyu Huang, Zihan Qiu, zili wang et al.
Fine-Grained Evaluation of Large Vision-Language Models in Autonomous Driving
Yue Li, Meng Tian, Zhenyu Lin et al.
Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model
Chaochen Gao, Xing W, Qi Fu et al.
DRiVE: Diffusion-based Rigging Empowers Generation of Versatile and Expressive Characters
Mingze Sun, Junting Dong, Junhao Chen et al.
Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator
Kaiwen Zheng, Yongxin Chen, Huayu Chen et al.
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
Xize Cheng, Siqi Zheng, zehan wang et al.
VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation
Saksham Singh Kushwaha, Yapeng Tian
Flow matching achieves almost minimax optimal convergence
Kenji Fukumizu, Taiji Suzuki, Noboru Isobe et al.
LoRACLR: Contrastive Adaptation for Customization of Diffusion Models
Enis Simsar, Thomas Hofmann, Federico Tombari et al.
MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation
Mingcheng Li, Xiaolu Hou, Ziyang Liu et al.
Completion as Enhancement: A Degradation-Aware Selective Image Guided Network for Depth Completion
Zhiqiang Yan, Zhengxue Wang, Kun Wang et al.
Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient
Jan Ludziejewski, Maciej Pióro, Jakub Krajewski et al.
StarGen: A Spatiotemporal Autoregression Framework with Video Diffusion Model for Scalable and Controllable Scene Generation
Shangjin Zhai, Zhichao Ye, Jialin Liu et al.
P-SPIKESSM: HARNESSING PROBABILISTIC SPIKING STATE SPACE MODELS FOR LONG-RANGE DEPENDENCY TASKS
Malyaban Bal, Abhronil Sengupta
Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation
Qi Lv, Hao Li, Xiang Deng et al.
Backdoor Cleaning without External Guidance in MLLM Fine-tuning
Xuankun Rong, Wenke Huang, Jian Liang et al.
Exploring Vacant Classes in Label-Skewed Federated Learning
Kuangpu Guo, Yuhe Ding, Jian Liang et al.
Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering
Xingrui Wang, Wufei Ma, Angtian Wang et al.
Active Evaluation Acquisition for Efficient LLM Benchmarking
Yang Li, Jie Ma, Miguel Ballesteros et al.
Enhancing Multi-Robot Semantic Navigation Through Multimodal Chain-of-Thought Score Collaboration
Zhixuan Shen, Haonan Luo, Kexun Chen et al.
Coreset Selection via Reducible Loss in Continual Learning
Ruilin Tong, Yuhang Liu, Javen Qinfeng Shi et al.
RAT: Adversarial Attacks on Deep Reinforcement Agents for Targeted Behaviors
Fengshuo Bai, Runze Liu, Yali Du et al.
CoRA: Collaborative Information Perception by Large Language Model’s Weights for Recommendation
Yuting Liu, Jinghao Zhang, Yizhou Dang et al.
NexusGS: Sparse View Synthesis with Epipolar Depth Priors in 3D Gaussian Splatting
Yulong Zheng, Zicheng Jiang, Shengfeng He et al.
Back on Track: Bundle Adjustment for Dynamic Scene Reconstruction
Weirong Chen, Ganlin Zhang, Felix Wimbauer et al.
AKiRa: Augmentation Kit on Rays for Optical Video Generation
Xi Wang, Robin Courant, Marc Christie et al.
RhythmMamba: Fast, Lightweight, and Accurate Remote Physiological Measurement
Bochao Zou, Zizheng Guo, Xiaocheng Hu et al.
CLIMB: Data Foundations for Large Scale Multimodal Clinical Foundation Models
David Dai, Peilin Chen, Malinda Lu et al.
MaskControl: Spatio-Temporal Control for Masked Motion Synthesis
Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Korrawe Karunratanakul et al.
Revisiting Zeroth-Order Optimization: Minimum-Variance Two-Point Estimators and Directionally Aligned Perturbations
Shaocong Ma, Heng Huang
Repulsive Latent Score Distillation for Solving Inverse Problems
Nicolas Zilberstein, Morteza Mardani, Santiago Segarra
Formation of Representations in Neural Networks
Liu Ziyin, Isaac Chuang, Tomer Galanti et al.
PSHuman: Photorealistic Single-image 3D Human Reconstruction using Cross-Scale Multiview Diffusion and Explicit Remeshing
Peng Li, Wangguandong Zheng, Yuan Liu et al.
Searching Latent Program Spaces
Matthew Macfarlane, Clem Bonnet