Most Cited 2025 "non-parametric retrieval" Papers
Beyond Single Concept Vector: Modeling Concept Subspace in LLMs with Gaussian Distribution
Haiyan Zhao, Heng Zhao, Bo Shen et al.
RefactorBench: Evaluating Stateful Reasoning in Language Agents Through Code
Dhruv Gautam, Spandan Garg, Jinu Jang et al.
MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization
Yougang Lyu, Lingyong Yan, Zihan Wang et al.
Signature Kernel Conditional Independence Tests in Causal Discovery for Stochastic Processes
Georg Manten, Cecilia Casolo, Emilio Ferrucci et al.
Tracing Representation Progression: Analyzing and Enhancing Layer-Wise Similarity
Jiachen Jiang, Jinxin Zhou, Zhihui Zhu
MMDisCo: Multi-Modal Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation
Akio Hayakawa, Masato Ishii, Takashi Shibuya et al.
ContextGNN: Beyond Two-Tower Recommendation Systems
Yiwen Yuan, Zecheng Zhang, Xinwei He et al.
UniGraspTransformer: Simplified Policy Distillation for Scalable Dexterous Robotic Grasping
Wenbo Wang, Fangyun Wei, Lei Zhou et al.
DINO-Foresight: Looking into the Future with DINO
Efstathios Karypidis, Ioannis Kakogeorgiou, Spyridon Gidaris et al.
CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
Shoubin Yu, Jaehong Yoon, Mohit Bansal
Motion Prior Knowledge Learning with Homogeneous Language Descriptions for Moving Infrared Small Target Detection
Shengjia Chen, Luping Ji, Weiwei Duan et al.
MOS: Model Surgery for Pre-Trained Model-Based Class-Incremental Learning
Hai-Long Sun, Da-Wei Zhou, Hanbin Zhao et al.
Quantization without Tears
Minghao Fu, Hao Yu, Jie Shao et al.
MagicArticulate: Make Your 3D Models Articulation-Ready
Chaoyue Song, Jianfeng Zhang, Xiu Li et al.
BEST-Route: Adaptive LLM Routing with Test-Time Optimal Compute
Dujian Ding, Ankur Mallick, Shaokun Zhang et al.
Efficient Part-level 3D Object Generation via Dual Volume Packing
Jiaxiang Tang, Ruijie Lu, Max Li et al.
PRAGA: Prototype-aware Graph Adaptive Aggregation for Spatial Multi-modal Omics Analysis
Xinlei Huang, Zhiqi Ma, Dian Meng et al.
Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence
Shangbin Feng, Zifeng Wang, Yike Wang et al.
ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object
Zhe Shan, Yang Liu, Lei Zhou et al.
Reinforce LLM Reasoning through Multi-Agent Reflection
Yurun Yuan, Tengyang Xie
Aioli: A Unified Optimization Framework for Language Model Data Mixing
Mayee Chen, Michael Hu, Nicholas Lourie et al.
WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch
Zimu Lu, Yunqiao Yang, Houxing Ren et al.
Instant3dit: Multiview Inpainting for Fast Editing of 3D Objects
Amir Barda, Matheus Gadelha, Vladimir G. Kim et al.
Federated Unlearning with Gradient Descent and Conflict Mitigation
Zibin Pan, Zhichao Wang, Chi Li et al.
Mimir: Improving Video Diffusion Models for Precise Text Understanding
Shuai Tan, Biao Gong, Yutong Feng et al.
Adaptive Length Image Tokenization via Recurrent Allocation
Shivam Duggal, Phillip Isola, Antonio Torralba et al.
MMAD: A Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection
Xi Jiang, Jian Li, Hanqiu Deng et al.
Concept Bottleneck Language Models For Protein Design
Aya Ismail, Tuomas Oikarinen, Amy Wang et al.
VQ-VLA: Improving Vision-Language-Action Models via Scaling Vector-Quantized Action Tokenizers
Yating Wang, Haoyi Zhu, Mingyu Liu et al.
EgoExo-Gen: Ego-centric Video Prediction by Watching Exo-centric Videos
Jilan Xu, Yifei Huang, Baoqi Pei et al.
Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding
Wei Suo, Lijun Zhang, Mengyang Sun et al.
Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models
Ronghuan Wu, Wanchao Su, Jing Liao
MambaIC: State Space Models for High-Performance Learned Image Compression
Fanhu Zeng, Hao Tang, Yihua Shao et al.
Degradation-Aware Feature Perturbation for All-in-One Image Restoration
Xiangpeng Tian, Xiangyu Liao, Xiao Liu et al.
Toward Generalized Image Quality Assessment: Relaxing the Perfect Reference Quality Assumption
Du Chen, Tianhe Wu, Kede Ma et al.
Closed-Form Merging of Parameter-Efficient Modules for Federated Continual Learning
Riccardo Salami, Pietro Buzzega, Matteo Mosconi et al.
Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition
Hongda Liu, Yunfan Liu, Min Ren et al.
Quamba: A Post-Training Quantization Recipe for Selective State Space Models
Hung-Yueh Chiang, Chi-Chih Chang, Natalia Frumkin et al.
Endless Jailbreaks with Bijection Learning
Brian R.Y. Huang, Max Li, Leonard Tang
Dual Data Alignment Makes AI-Generated Image Detector Easier Generalizable
Ruoxin Chen, Junwei Xi, Zhiyuan Yan et al.
GraphMoRE: Mitigating Topological Heterogeneity via Mixture of Riemannian Experts
Zihao Guo, Qingyun Sun, Haonan Yuan et al.
Understanding and Enhancing the Transferability of Jailbreaking Attacks
Runqi Lin, Bo Han, Fengwang Li et al.
EnvGS: Modeling View-Dependent Appearance with Environment Gaussian
Tao Xie, Xi Chen, Zhen Xu et al.
Physics-Constrained Flow Matching: Sampling Generative Models with Hard Constraints
Utkarsh Utkarsh, Pengfei Cai, Alan Edelman et al.
Revisiting MAE Pre-training for 3D Medical Image Segmentation
Tassilo Wald, Constantin Ulrich, Stanislav Lukyanenko et al.
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding
Xinyu Yang, Tianqi Chen, Beidi Chen
Stationary Kernels and Gaussian Processes on Lie Groups and their Homogeneous Spaces II: non-compact symmetric spaces
Iskander Azangulov, Andrei Smolensky, Alexander Terenin et al.
IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations
Zhibing Li, Tong Wu, Jing Tan et al.
MiniPLM: Knowledge Distillation for Pre-training Language Models
Yuxian Gu, Hao Zhou, Fandong Meng et al.
MMedPO: Aligning Medical Vision-Language Models with Clinical-Aware Multimodal Preference Optimization
Kangyu Zhu, Peng Xia, Yun Li et al.
Global-Local Tree Search in VLMs for 3D Indoor Scene Generation
Wei Deng, Mengshi Qi, Huadong Ma
VidComposition: Can MLLMs Analyze Compositions in Compiled Videos?
Yunlong Tang, JunJia Guo, Hang Hua et al.
PrEditor3D: Fast and Precise 3D Shape Editing
Ziya Erkoc, Can Gümeli, Chaoyang Wang et al.
OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones?
Zijian Chen, Tingzhu Chen, Wenjun Zhang et al.
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References
Qiyuan Zhang, Yufei Wang, Tiezheng Yu et al.
UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer
Haoxuan Wang, Jinlong Peng, Qingdong He et al.
FastVID: Dynamic Density Pruning for Fast Video Large Language Models
Leqi Shen, Guoqiang Gong, Tao He et al.
MetaOOD: Automatic Selection of OOD Detection Models
Yuehan Qin, Yichi Zhang, Yi Nian et al.
VeriThinker: Learning to Verify Makes Reasoning Model Efficient
Zigeng Chen, Xinyin Ma, Gongfan Fang et al.
DRoC: Elevating Large Language Models for Complex Vehicle Routing via Decomposed Retrieval of Constraints
Xia Jiang, Yaoxin Wu, Chenhao Zhang et al.
ECBench: Can Multi-modal Foundation Models Understand the Egocentric World? A Holistic Embodied Cognition Benchmark
Ronghao Dang, Yuqian Yuan, Wenqi Zhang et al.
Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation and Reconstruction
Yuanhao Cai, He Zhang, Kai Zhang et al.
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao, Chenlu Ye, Quanquan Gu et al.
Training Neural Networks as Recognizers of Formal Languages
Alexandra Butoi, Ghazal Khalighinejad, Anej Svete et al.
SensorLM: Learning the Language of Wearable Sensors
Yuwei Zhang, Kumar Ayush, Siyuan Qiao et al.
Scalable Influence and Fact Tracing for Large Language Model Pretraining
Tyler Chang, Dheeraj Rajagopal, Tolga Bolukbasi et al.
xLSTM-Mixer: Multivariate Time Series Forecasting by Mixing via Scalar Memories
Maurice Kraus, Felix Divo, Devendra Singh Dhami et al.
EmoDubber: Towards High Quality and Emotion Controllable Movie Dubbing
Gaoxiang Cong, Jiadong Pan, Liang Li et al.
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
Xiaoyuan Liu, Tian Liang, Zhiwei He et al.
Law of Vision Representation in MLLMs
Shijia Yang, Bohan Zhai, Quanzeng You et al.
SciReplicate-Bench: Benchmarking LLMs in Agent-driven Algorithmic Reproduction from Research Papers
Yanzheng Xiang, Hanqi Yan, Shuyin Ouyang et al.
Simulating Human-like Daily Activities with Desire-driven Autonomy
Yiding Wang, Yuxuan Chen, Fangwei Zhong et al.
Brains vs. Bytes: Evaluating LLM Proficiency in Olympiad Mathematics
Hamed Mahdavi, Alireza Hashemi, Majid Daliri et al.
Learnable Expansion of Graph Operators for Multi-Modal Feature Fusion
Dexuan Ding, Lei Wang, Liyun Zhu et al.
Prior-guided Hierarchical Harmonization Network for Efficient Image Dehazing
Xiongfei Su, Siyuan Li, Yuning Cui et al.
A Many-Objective Problem Where Crossover Is Provably Indispensable
Andre Opris
Does SGD really happen in tiny subspaces?
Minhak Song, Kwangjun Ahn, Chulhee Yun
Where am I? Cross-View Geo-localization with Natural Language Descriptions
Junyan Ye, Honglin Lin, Leyan Ou et al.
DreamOmni: Unified Image Generation and Editing
Bin Xia, Yuechen Zhang, Jingyao Li et al.
LLMs Can Plan Only If We Tell Them
Bilgehan Sel, Ruoxi Jia, Ming Jin
Memory Injection Attacks on LLM Agents via Query-Only Interaction
Shen Dong, Shaochen Xu, Pengfei He et al.
Debate or Vote: Which Yields Better Decisions in Multi-Agent Large Language Models?
Hyeong Kyu Choi, Jerry Zhu, Sharon Li
OATS: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition
Stephen Zhang, Vardan Papyan
Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation
Nicolas Dufour, Vicky Kalogeiton, David Picard et al.
Track-On: Transformer-based Online Point Tracking with Memory
Görkay Aydemir, Xiongyi Cai, Weidi Xie et al.
Task Vectors in In-Context Learning: Emergence, Formation, and Benefits
Liu Yang, Ziqian Lin, Kangwook Lee et al.
RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers
Yan Gong, Yiren Song, Yicheng Li et al.
PuzzleFusion++: Auto-agglomerative 3D Fracture Assembly by Denoise and Verify
Zhengqing Wang, Jiacheng Chen, Yasutaka Furukawa
AllTracker: Efficient Dense Point Tracking at High Resolution
Adam Harley, Yang You, Yang Zheng et al.
FD2-Net: Frequency-Driven Feature Decomposition Network for Infrared-Visible Object Detection
Ke Li, Di Wang, Zhangyuan Hu et al.
GOPlan: Goal-conditioned Offline Reinforcement Learning by Planning with Learned Models
Mianchu Wang, Rui Yang, Xi Chen et al.
Equivariant Neural Functional Networks for Transformers
Viet-Hoang Tran, Thieu Vo, An Nguyen et al.
Discrete to Continuous: Generating Smooth Transition Poses from Sign Language Observations
Shengeng Tang, Jiayi He, Lechao Cheng et al.
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He, Qihang Yu, Qihao Liu et al.
TabDPT: Scaling Tabular Foundation Models on Real Data
Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh et al.
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models
Dewei Zhou, Mingwei Li, Zongxin Yang et al.
Few-Shot Recognition via Stage-Wise Retrieval-Augmented Finetuning
Tian Liu, Huixin Zhang, Shubham Parashar et al.
Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs
Zijia Zhao, Haoyu Lu, Yuqi Huo et al.
AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration
Andy Zhou, Kevin Wu, Francesco Pinto et al.
Systematic Outliers in Large Language Models
Yongqi An, Xu Zhao, Tao Yu et al.
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization
Jui-Nan Yen, Si Si, Zhao Meng et al.
Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs
Yuzhe Gu, Wenwei Zhang, Chengqi Lyu et al.
Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators
Dingkang Yang, Dongling Xiao, Jinjie Wei et al.
Universal Cross-Tokenizer Distillation via Approximate Likelihood Matching
Benjamin Minixhofer, Ivan Vulić, Edoardo Maria Ponti
RoboScape: Physics-informed Embodied World Model
Yu Shang, Xin Zhang, Yinzhou Tang et al.
OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting
Yongsheng Yu, Ziyun Zeng, Haitian Zheng et al.
FrugalNeRF: Fast Convergence for Extreme Few-shot Novel View Synthesis without Learned Priors
Chin-Yang Lin, Chung-Ho Wu, Changhan Yeh et al.
Scaling Properties of Diffusion Models For Perceptual Tasks
Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran et al.
Is Artificial Intelligence Generated Image Detection a Solved Problem?
Ziqiang Li, Jiazhen Yan, Ziwen He et al.
GaussianFlowOcc: Sparse and Weakly Supervised Occupancy Estimation using Gaussian Splatting and Temporal Flow
Simon Boeder, Fabian Gigengack, Benjamin Risse
Efficient Automated Circuit Discovery in Transformers using Contextual Decomposition
Aliyah Hsu, Georgia Zhou, Yeshwanth Cherapanamjeri et al.
ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning
Yarden As, Bhavya, Lenart Treven et al.
GGS: Generalizable Gaussian Splatting for Lane Switching in Autonomous Driving
Huasong Han, Kaixuan Zhou, Xiaoxiao Long et al.
Falcon: Faster and Parallel Inference of Large Language Models Through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree
Xiangxiang Gao, Weisheng Xie, Yiwei Xiang et al.
Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
Yiran Guo, Lijie Xu, Jie Liu et al.
DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra
Montgomery Bohde, Mrunali Manjrekar, Runzhong Wang et al.
Cross the Gap: Exposing the Intra-modal Misalignment in CLIP via Modality Inversion
Marco Mistretta, Alberto Baldrati, Lorenzo Agnolucci et al.
UNSURE: self-supervised learning with Unknown Noise level and Stein's Unbiased Risk Estimate
Julián Tachella, Mike Davies, Laurent Jacques
Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs
Lei Zhang, Yunshui Li, Jiaming Li et al.
Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval
Yuanmin Tang, Jing Yu, Keke Gai et al.
Security Attacks on LLM-based Code Completion Tools
Wen Cheng, Ke Sun, Xinyu Zhang et al.
Retrieval Augmented Time Series Forecasting
Sungwon Han, Seungeon Lee, Meeyoung Cha et al.
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
Jiuhai Chen, Jianwei Yang, Haiping Wu et al.
FreeTimeGS: Free Gaussian Primitives at Anytime Anywhere for Dynamic Scene Reconstruction
Yifan Wang, Peishan Yang, Zhen Xu et al.
VMBench: A Benchmark for Perception-Aligned Video Motion Generation
Xinran Ling, Chen Zhu, Meiqi Wu et al.
Multi-Domain Graph Foundation Models: Robust Knowledge Transfer via Topology Alignment
Shuo Wang, Bokui Wang, Zhixiang Shen et al.
Adaptive teachers for amortized samplers
Minsu Kim, Sanghyeok Choi, Taeyoung Yun et al.
LightPROF: A Lightweight Reasoning Framework for Large Language Model on Knowledge Graph
Tu Ao, Yanhua Yu, Yuling Wang et al.
F-Fidelity: A Robust Framework for Faithfulness Evaluation of Explainable AI
Xu Zheng, Farhad Shirani, Zhuomin Chen et al.
Transformers Can Learn Temporal Difference Methods for In-Context Reinforcement Learning
Jiuqi Wang, Ethan Blaser, Hadi Daneshmand et al.
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
Sagnik Mukherjee, Lifan Yuan, Dilek Hakkani-Tur et al.
ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems
Xiangyuan Xue, Zeyu Lu, Di Huang et al.
BillBoard Splatting (BBSplat): Learnable Textured Primitives for Novel View Synthesis
David Svitov, Pietro Morerio, Lourdes Agapito et al.
Language Guided Skill Discovery
Seungeun Rho, Laura Smith, Tianyu Li et al.
From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications
Ajay Jaiswal, Yifan Wang, Lu Yin et al.
Re-Thinking Inverse Graphics With Large Language Models
Haiwen Feng, Michael J Black, Weiyang Liu et al.
AutoToM: Scaling Model-based Mental Inference via Automated Agent Modeling
Zhining Zhang, Chuanyang Jin, Mung Yao Jia et al.
Stochastic Deep Restoration Priors for Imaging Inverse Problems
Yuyang Hu, Albert Peng, Weijie Gan et al.
Thermalizer: Stable autoregressive neural emulation of spatiotemporal chaos
Chris Pedersen, Laure Zanna, Joan Bruna
RocketEval: Efficient automated LLM evaluation via grading checklist
Tianjun Wei, Wei Wen, Ruizhi Qiao et al.
Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach
Jiancong Xiao, Bojian Hou, Zhanliang Wang et al.
Can We Talk Models Into Seeing the World Differently?
Paul Gavrikov, Jovita Lukasik, Steffen Jung et al.
Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices
Junyan Lin, Haoran Chen, Yue Fan et al.
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
Haifeng Huang, Xinyi Chen, Yilun Chen et al.
Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
Sheryl Hsu, Omar Khattab, Chelsea Finn et al.
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Cheol Jun Cho, Nicholas Lee, Akshat Gupta et al.
FineLIP: Extending CLIP’s Reach via Fine-Grained Alignment with Longer Text Inputs
Mothilal Asokan, Kebin Wu, Fatima Albreiki
Scalable Discrete Diffusion Samplers: Combinatorial Optimization and Statistical Physics
Sebastian Sanokowski, Wilhelm Berghammer, Haoyu Wang et al.
Generating Multi-Image Synthetic Data for Text-to-Image Customization
Nupur Kumari, Xi Yin, Jun-Yan Zhu et al.
Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models
Jean Park, Kuk Jin Jang, Basam Alasaly et al.
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning
Qinghao Ye, Xianhan Zeng, Fu Li et al.
PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs
Oskar van der Wal, Pietro Lesci, Max Müller-Eberstein et al.
MallowsPO: Fine-Tune Your LLM with Preference Dispersions
Haoxian Chen, Hanyang Zhao, Henry Lam et al.
TimeDP: Learning to Generate Multi-Domain Time Series with Domain Prompts
Yu-Hao Huang, Chang Xu, Yueying Wu et al.
Identifiable Exchangeable Mechanisms for Causal Structure and Representation Learning
Patrik Reizinger, Siyuan Guo, Ferenc Huszar et al.
Recoverable Compression: A Multimodal Vision Token Recovery Mechanism Guided by Text Information
Yi Chen, Jian Xu, Xu-Yao Zhang et al.
Wasserstein Flow Matching: Generative Modeling Over Families of Distributions
Doron Haviv, Aram-Alexandre Pooladian, Dana Pe'er et al.
FaithDiff: Unleashing Diffusion Priors for Faithful Image Super-resolution
Junyang Chen, Jinshan Pan, Jiangxin Dong
Image-level Memorization Detection via Inversion-based Inference Perturbation
Yue Jiang, Haokun Lin, Yang Bai et al.
To Trust or Not to Trust? Enhancing Large Language Models' Situated Faithfulness to External Contexts
Yukun Huang, Sanxing Chen, Hongyi Cai et al.
The VLLM Safety Paradox: Dual Ease in Jailbreak Attack and Defense
Yangyang Guo, Fangkai Jiao, Liqiang Nie et al.
Dynamic Camera Poses and Where to Find Them
Chris Rockwell, Joseph Tung, Tsung-Yi Lin et al.
Horizon Reduction Makes RL Scalable
Seohong Park, Kevin Frans, Deepinder Mann et al.
OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycles
Yihe Deng, Hritik Bansal, Fan Yin et al.
S2Gaussian: Sparse-View Super-Resolution 3D Gaussian Splatting
Yecong Wan, Mingwen Shao, Yuanshuo Cheng et al.
Personalized Federated Learning for Spatio-Temporal Forecasting: A Dual Semantic Alignment-Based Contrastive Approach
Qingxiang Liu, Sheng Sun, Yuxuan Liang et al.
Bridging the Data Provenance Gap Across Text, Speech, and Video
Shayne Longpre, Nikhil Singh, Manuel Cherep et al.
Mitigate the Gap: Improving Cross-Modal Alignment in CLIP
Sedigheh Eslami, Gerard de Melo
xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation
Qingchen Yu, Zifan Zheng, Shichao Song et al.
AutoPartGen: Autoregressive 3D Part Generation and Discovery
Minghao Chen, Jianyuan Wang, Roman Shapovalov et al.
Apollo-MILP: An Alternating Prediction-Correction Neural Solving Framework for Mixed-Integer Linear Programming
Haoyang Liu, Jie Wang, Zijie Geng et al.
DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation
Changdae Oh, Yixuan Li, Kyungwoo Song et al.
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
Tianjin Huang, Ziquan Zhu, Gaojie Jin et al.
Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis
Jiapeng Zhu, Ceyuan Yang, Kecheng Zheng et al.
Any6D: Model-free 6D Pose Estimation of Novel Object
Taeyeop Lee, Bowen Wen, Minjun Kang et al.
GotenNet: Rethinking Efficient 3D Equivariant Graph Neural Networks
Sarp Aykent, Tian Xia
Streamlining Redundant Layers to Compress Large Language Models
Xiaodong Chen, Yuxuan Hu, Jing Zhang et al.
VLM4D: Towards Spatiotemporal Awareness in Vision Language Models
Shijie Zhou, Alexander Vilesov, Xuehai He et al.
Power Lines: Scaling laws for weight decay and batch size in LLM pre-training
Shane Bergsma, Nolan Dey, Gurpreet Gosal et al.
Emergence and scaling laws in SGD learning of shallow neural networks
Yunwei Ren, Eshaan Nichani, Denny Wu et al.
Transformers Struggle to Learn to Search
Abulhair Saparov, Srushti Ajay Pawar, Shreyas Pimpalgaonkar et al.
AgentAuditor: Human-level Safety and Security Evaluation for LLM Agents
Hanjun Luo, Shenyu Dai, Chiming Ni et al.
Scaling Vision Pre-Training to 4K Resolution
Baifeng Shi, Boyi Li, Han Cai et al.
V2M4: 4D Mesh Animation Reconstruction from a Single Monocular Video
Jianqi Chen, Biao Zhang, Xiangjun Tang et al.
SAFE: Multitask Failure Detection for Vision-Language-Action Models
Qiao Gu, Yuanliang Ju, Shengxiang Sun et al.
VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models
Jiacheng Ruan, Wenzhen Yuan, Xian Gao et al.
FOLDER: Accelerating Multi-Modal Large Language Models with Enhanced Performance
Haicheng Wang, Zhemeng Yu, Gabriele Spadaro et al.
Spiking Vision Transformer with Saccadic Attention
Shuai Wang, Malu Zhang, Dehao Zhang et al.
Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization
Zichen Miao, Zhengyuan Yang, Kevin Lin et al.
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning
Yuhao Zhou, Yiheng Wang, Xuming He et al.
Two-stream Beats One-stream: Asymmetric Siamese Network for Efficient Visual Tracking
Jiawen Zhu, Huayi Tang, Xin Chen et al.
ManiSkill-HAB: A Benchmark for Low-Level Manipulation in Home Rearrangement Tasks
Arth Shukla, Stone Tao, Hao Su
The Pitfalls of Memorization: When Memorization Hurts Generalization
Reza Bayat, Mohammad Pezeshki, Elvis Dohmatob et al.
LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs
Jiarui Wang, Huiyu Duan, Yu Zhao et al.
Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
Yuxian Gu, Qinghao Hu, Haocheng Xi et al.
Position: Editing Large Language Models Poses Serious Safety Risks
Paul Youssef, Zhixue Zhao, Daniel Braun et al.
Continuous Ensemble Weather Forecasting with Diffusion models
Martin Andrae, Tomas Landelius, Joel Oskarsson et al.
Learning Efficient Positional Encodings with Graph Neural Networks
Charilaos Kanatsoulis, Evelyn Choi, Stefanie Jegelka et al.
Boosting Generative Image Modeling via Joint Image-Feature Synthesis
Theodoros Kouzelis, Efstathios Karypidis, Ioannis Kakogeorgiou et al.
One-for-All Few-Shot Anomaly Detection via Instance-Induced Prompt Learning
Wenxi Lv, Qinliang Su, Wenchao Xu