Most Cited 2025 "state-object pairs" Papers
Diffusion on Language Model Encodings for Protein Sequence Generation
Viacheslav Meshchaninov, Pavel Strashnov, Andrey Shevtsov et al.
SAIST: Segment Any Infrared Small Target Model Guided by Contrastive Language-Image Pretraining
Mingjin Zhang, Xiaolong Li, Fei Gao et al.
Mitigate the Gap: Improving Cross-Modal Alignment in CLIP
Sedigheh Eslami, Gerard de Melo
Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting
Runsong Zhu, Shi Qiu, Zhengzhe Liu et al.
ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement
Xiangyu Peng, Congying Xia, Xinyi Yang et al.
Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models
Shicheng Xu, Liang Pang, Yunchang Zhu et al.
Towards a Theoretical Understanding of Synthetic Data in LLM Post-Training: A Reverse-Bottleneck Perspective
Zeyu Gan, Yong Liu
Quantization without Tears
Minghao Fu, Hao Yu, Jie Shao et al.
Pippo: High-Resolution Multi-View Humans from a Single Image
Yash Kant, Ethan Weber, Jin Kyu Kim et al.
Provably Accurate Shapley Value Estimation via Leverage Score Sampling
Christopher Musco, R. Teal Witter
Inference-Time Hyper-Scaling with KV Cache Compression
Adrian Łańcucki, Konrad Staniszewski, Piotr Nawrot et al.
Video Diffusion Models Are Strong Video Inpainter
Minhyeok Lee, Suhwan Cho, Chajin Shin et al.
Can Transformers Learn Full Bayesian Inference in Context?
Arik Reuter, Tim G. J. Rudner, Vincent Fortuin et al.
Refine Knowledge of Large Language Models via Adaptive Contrastive Learning
Yinghui Li, Haojing Huang, Jiayi Kuang et al.
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
Wenhong Zhu, Zhiwei He, Xiaofeng Wang et al.
NEST: A Neuromodulated Small-world Hypergraph Trajectory Prediction Model for Autonomous Driving
Chengyue Wang, Haicheng Liao, Bonan Wang et al.
ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning
Shulin Huang, Linyi Yang, Yan Song et al.
IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation
Yiren Song, Pei Yang, Hai Ci et al.
Optimization with Access to Auxiliary Information
El Mahdi Chayti, Sai Karimireddy
Weak-to-Strong Generalization Through the Data-Centric Lens
Changho Shin, John Cooper, Frederic Sala
Synthetic-to-Real Self-supervised Robust Depth Estimation via Learning with Motion and Structure Priors
Weilong Yan, Ming Li, Li Haipeng et al.
UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing
Yiheng Li, RuiBing Hou, Hong Chang et al.
Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation
Zhi Cen, Huaijin Pi, Sida Peng et al.
Let LRMs Break Free from Overthinking via Self-Braking Tuning
Haoran Zhao, Yuchen Yan, Yongliang Shen et al.
Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations
Ji-An Li, Huadong Xiong, Robert Wilson et al.
AG-VPReID: A Challenging Large-Scale Benchmark for Aerial-Ground Video-based Person Re-Identification
Huy Nguyen, Kien Nguyen Thanh, Akila Pemasiri et al.
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs
Mohammad Mozaffari, Amir Yazdanbakhsh, Zhao Zhang et al.
Conformal Prediction for Causal Effects of Continuous Treatments
Maresa Schröder, Dennis Frauen, Jonas Schweisthal et al.
MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation
Shuwei Shi, Biao Gong, Xi Chen et al.
Benchmarking LLMs' Judgments with No Gold Standard
Shengwei Xu, Yuxuan Lu, Grant Schoenebeck et al.
Concept Bottleneck Language Models For Protein Design
Aya Ismail, Tuomas Oikarinen, Amy Wang et al.
Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling
Guiyu Zhang, Huan-ang Gao, Zijian Jiang et al.
An Engorgio Prompt Makes Large Language Model Babble on
Jianshuo Dong, Ziyuan Zhang, Qingjie Zhang et al.
A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules
Kairong Luo, Haodong Wen, Shengding Hu et al.
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties
Gouki Minegishi, Hiroki Furuta, Takeshi Kojima et al.
ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models
Hao Yin, Guangzong Si, Zilei Wang
The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise
Shuze Daniel Liu, Shuhang Chen, Shangtong Zhang
VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception
Ziang Yan, Yinan He, Xinhao Li et al.
ASIGN: An Anatomy-aware Spatial Imputation Graphic Network for 3D Spatial Transcriptomics
Junchao Zhu, Ruining Deng, Tianyuan Yao et al.
MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models
Jingwei Xu, Junyu Lai, Yunpeng Huang
Ward: Provable RAG Dataset Inference via LLM Watermarks
Nikola Jovanović, Robin Staab, Maximilian Baader et al.
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation is Wasteful
Martin Marek, Sanae Lotfi, Aditya Somasundaram et al.
InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation
Gaurav Sahu, Abhay Puri, Juan A. Rodriguez et al.
Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging
Jinluan Yang, Dingnan Jin, Anke Tang et al.
LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs
Jiarui Wang, Huiyu Duan, Yu Zhao et al.
Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning
Jiyuan Shi, Xinzhe Liu, Dewei Wang et al.
Emergence and scaling laws in SGD learning of shallow neural networks
Yunwei Ren, Eshaan Nichani, Denny Wu et al.
Overcoming Lower-Level Constraints in Bilevel Optimization: A Novel Approach with Regularized Gap Functions
Wei Yao, Haian Yin, Shangzhi Zeng et al.
Sum of Squares Circuits
Lorenzo Loconte, Stefan Mengel, Antonio Vergari
Revisiting Nearest Neighbor for Tabular Data: A Deep Tabular Baseline Two Decades Later
Han-Jia Ye, Huai-Hong Yin, De-Chuan Zhan et al.
EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code
Yuhao Qing, Boyu Zhu, Mingzhe Du et al.
TANGO: Training-free Embodied AI Agents for Open-world Tasks
Filippo Ziliotto, Tommaso Campari, Luciano Serafini et al.
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
Jian Yang, Dacheng Yin, Yizhou Zhou et al.
MapExpert: Online HD Map Construction with Simple and Efficient Sparse Map Element Expert
Dapeng Zhang, Dayu Chen, Peng Zhi et al.
BotSim: LLM-Powered Malicious Social Botnet Simulation
Boyu Qiao, Kun Li, Wei Zhou et al.
MIRAGE: Evaluating and Explaining Inductive Reasoning Process in Language Models
Jiachun Li, Pengfei Cao, Zhuoran Jin et al.
KITS: Inductive Spatio-Temporal Kriging with Increment Training Strategy
Qianxiong Xu, Cheng Long, Ziyue Li et al.
Surprising Effectiveness of pretraining Ternary Language Model at Scale
Ayush Kaushal, Tejas Vaidhya, Arnab Mondal et al.
ATLAS: Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data
Xiaoyang Liu, Kangjie Bao, Jiashuo Zhang et al.
Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding
Yixiong Fang, Ziran Yang, Zhaorun Chen et al.
MemSim: A Bayesian Simulator for Evaluating Memory of LLM-based Personal Assistants
Zeyu Zhang, Quanyu Dai, Luyu Chen et al.
Asymmetric REINFORCE for off-Policy Reinforcement Learning: Balancing positive and negative rewards
Charles Arnal, Gaëtan Narozniak, Vivien Cabannes et al.
Classical Planning with LLM-Generated Heuristics: Challenging the State of the Art with Python Code
Augusto B. Corrêa, André G. Pereira, Jendrik Seipp
Measuring Non-Adversarial Reproduction of Training Data in Large Language Models
Michael Aerni, Javier Rando, Edoardo Debenedetti et al.
Robust Self-Paced Hashing for Cross-Modal Retrieval with Noisy Labels
Ruitao Pu, Yuan Sun, Yang Qin et al.
MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks
Yinghao Zhu, Ziyi He, Haoran Hu et al.
Efficient Inference for Large Language Model-based Generative Recommendation
Xinyu Lin, Chaoqun Yang, Wenjie Wang et al.
Personalized Federated Learning for Spatio-Temporal Forecasting: A Dual Semantic Alignment-Based Contrastive Approach
Qingxiang Liu, Sheng Sun, Yuxuan Liang et al.
Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation
Abdelrahman Eldesokey, Peter Wonka
Perception-as-Control: Fine-grained Controllable Image Animation with 3D-aware Motion Representation
Yingjie Chen, Yifang Men, Yuan Yao et al.
Detecting High-Stakes Interactions with Activation Probes
Alex McKenzie, Urja Pawar, Phil Blandfort et al.
Scaling Inference Time Compute for Diffusion Models
Nanye Ma, Shangyuan Tong, Haolin Jia et al.
Local Conditional Controlling for Text-to-Image Diffusion Models
Yibo Zhao, Liang Peng, Yang Yang et al.
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
Lucas Bandarkar, Benjamin Muller, Pritish Yuvraj et al.
DVP-MVS: Synergize Depth-Edge and Visibility Prior for Multi-View Stereo
Zhenlong Yuan, Jinguo Luo, Fei Shen et al.
Referring to Any Person
Qing Jiang, Lin Wu, Zhaoyang Zeng et al.
UGMathBench: A Diverse and Dynamic Benchmark for Undergraduate-Level Mathematical Reasoning with Large Language Models
Xin Xu, Jiaxin Zhang, Tianhao Chen et al.
Compression of 3D Gaussian Splatting with Optimized Feature Planes and Standard Video Codecs
Soonbin Lee, Fangwen Shu, Yago Sanchez de la Fuente et al.
VPO: Aligning Text-to-Video Generation Models with Prompt Optimization
Jiale Cheng, Ruiliang Lyu, Xiaotao Gu et al.
PipeFusion: Patch-level Pipeline Parallelism for Diffusion Transformers Inference
Jiarui Fang, Jinzhe Pan, Aoyu Li et al.
FlowDec: A flow-based full-band general audio codec with high perceptual quality
Simon Welker, Matthew Le, Ricky T. Q. Chen et al.
RoboTron-Mani: All-in-One Multimodal Large Model for Robotic Manipulation
Feng Yan, Fanfan Liu, Yiyang Huang et al.
Trivialized Momentum Facilitates Diffusion Generative Modeling on Lie Groups
Yuchen Zhu, Tianrong Chen, Lingkai Kong et al.
HRAvatar: High-Quality and Relightable Gaussian Head Avatar
Dongbin Zhang, Yunfei Liu, Lijian Lin et al.
Eve: Efficient Multimodal Vision Language Models with Elastic Visual Experts
Miao Rang, Zhenni Bi, Chuanjian Liu et al.
AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models
Ziyin Zhou, Yunpeng Luo, Yuanchen Wu et al.
Neuroplastic Expansion in Deep Reinforcement Learning
Jiashun Liu, Johan S Obando Ceron, Aaron Courville et al.
RI-MAE: Rotation-Invariant Masked AutoEncoders for Self-Supervised Point Cloud Representation Learning
Kunming Su, Qiuxia Wu, Panpan Cai et al.
ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning
Tonghe Zhang, Chao Yu, Sichang Su et al.
C-CLIP: Multimodal Continual Learning for Vision-Language Model
Wenzhuo Liu, Fei Zhu, Longhui Wei et al.
AdaWM: Adaptive World Model based Planning for Autonomous Driving
Hang Wang, Xin Ye, Feng Tao et al.
Detect Anything 3D in the Wild
Hanxue Zhang, Haoran Jiang, Qingsong Yao et al.
ECVC: Exploiting Non-Local Correlations in Multiple Frames for Contextual Video Compression
Wei Jiang, Junru Li, Kai Zhang et al.
Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models
Fu-Yun Wang, Yunhao Shui, Jingtan Piao et al.
What Makes a Maze Look Like a Maze?
Joy Hsu, Jiayuan Mao, Joshua B Tenenbaum et al.
OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models
William Chen, Jinchuan Tian, Yifan Peng et al.
Are Large Vision Language Models Good Game Players?
Xinyu Wang, Bohan Zhuang, Qi Wu
MVSAnywhere: Zero-Shot Multi-View Stereo
Sergio Izquierdo, Mohamed Sayed, Michael Firman et al.
UniNet: A Contrastive Learning-guided Unified Framework with Feature Selection for Anomaly Detection
Shun Wei, Jielin Jiang, Xiaolong Xu
UFM: A Simple Path towards Unified Dense Correspondence with Flow
Yuchen Zhang, Nikhil Keetha, Chenwei Lyu et al.
Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
Vaishnavh Nagarajan, Chen Wu, Charles Ding et al.
Faster Algorithms for Structured Linear and Kernel Support Vector Machines
Yuzhou Gu, Zhao Song, Lichen Zhang
Spike2Former: Efficient Spiking Transformer for High-performance Image Segmentation
Zhenxin Lei, Man Yao, Jiakui Hu et al.
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
Seanie Lee, Haebin Seong, Dong Bok Lee et al.
ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models
Ozgur Kara, Krishna Kumar Singh, Feng Liu et al.
On the Feature Learning in Diffusion Models
Andi Han, Wei Huang, Yuan Cao et al.
Scaling Autonomous Agents via Automatic Reward Modeling And Planning
Zhenfang Chen, Delin Chen, Rui Sun et al.
PRAGA: Prototype-aware Graph Adaptive Aggregation for Spatial Multi-modal Omics Analysis
Xinlei Huang, Zhiqi Ma, Dian Meng et al.
R-TPT: Improving Adversarial Robustness of Vision-Language Models through Test-Time Prompt Tuning
Lijun Sheng, Jian Liang, Zilei Wang et al.
FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models
Haokun Chen, Hang Li, Yao Zhang et al.
Temporal Query Network for Efficient Multivariate Time Series Forecasting
Shengsheng Lin, Haojun Chen, Haijie Wu et al.
Efficient Track Anything
Yunyang Xiong, Chong Zhou, Xiaoyu Xiang et al.
Fréchet Wavelet Distance: A Domain-Agnostic Metric for Image Generation
Lokesh Veeramacheneni, Moritz Wolter, Hilde Kuehne et al.
CoA-VLA: Improving Vision-Language-Action Models via Visual-Text Chain-of-Affordance
Jinming Li, Yichen Zhu, Zhibin Tang et al.
Dissecting and Mitigating Diffusion Bias via Mechanistic Interpretability
Yingdong Shi, Changming Li, Yifan Wang et al.
Truncated Consistency Models
Sangyun Lee, Yilun Xu, Tomas Geffner et al.
Event-based Video Super-Resolution via State Space Models
Zeyu Xiao, Xinchao Wang
On a Connection Between Imitation Learning and RLHF
Teng Xiao, Yige Yuan, Mingxiao Li et al.
Multimodal Class-aware Semantic Enhancement Network for Audio-Visual Video Parsing
Pengcheng Zhao, Jinxing Zhou, Yang Zhao et al.
xFinder: Large Language Models as Automated Evaluators for Reliable Evaluation
Qingchen Yu, Zifan Zheng, Shichao Song et al.
CirT: Global Subseasonal-to-Seasonal Forecasting with Geometry-inspired Transformer
Yang Liu, Zinan Zheng, Jiashun Cheng et al.
Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining
Daouda Sow, Herbert Woisetschläger, Saikiran Bulusu et al.
UniOcc: A Unified Benchmark for Occupancy Forecasting and Prediction in Autonomous Driving
Yuping Wang, Xiangyu Huang, Xiaokang Sun et al.
Personalized Preference Fine-tuning of Diffusion Models
Meihua Dang, Anikait Singh, Linqi Zhou et al.
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset
Yingzi Ma, Jiongxiao Wang, Fei Wang et al.
CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models
Song Wang, Peng Wang, Tong Zhou et al.
Unlearning through Knowledge Overwriting: Reversible Federated Unlearning via Selective Sparse Adapter
Zhengyi Zhong, Weidong Bao, Ji Wang et al.
Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization
Tao Zhang, Cheng Da, Kun Ding et al.
Synthetic Data is an Elegant GIFT for Continual Vision-Language Models
Bin Wu, Wuxuan Shi, Jinqiao Wang et al.
PILAF: Optimal Human Preference Sampling for Reward Modeling
Yunzhen Feng, Ariel Kwiatkowski, Kunhao Zheng et al.
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
Runhui Huang, Xinpeng Ding, Chunwei Wang et al.
Implicit Search via Discrete Diffusion: A Study on Chess
Jiacheng Ye, Zhenyu Wu, Jiahui Gao et al.
Sign-IDD: Iconicity Disentangled Diffusion for Sign Language Production
Shengeng Tang, Jiayi He, Dan Guo et al.
CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion
Yunlong Tang, Gen Zhan, Li Yang et al.
Force Prompting: Video Generation Models Can Learn And Generalize Physics-based Control Signals
Nate Gillman, Charles Herrmann, Michael Freeman et al.
Grounding Continuous Representations in Geometry: Equivariant Neural Fields
David Wessels, David Knigge, Riccardo Valperga et al.
Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping
Ziye Huang, Haoqi Yuan, Yuhui Fu et al.
ACC-Collab: An Actor-Critic Approach to Multi-Agent LLM Collaboration
Andrew Estornell, Jean-Francois Ton, Yuanshun Yao et al.
High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws
Muhammed Ildiz, Halil Gozeten, Ege Taga et al.
Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments
Luke Rowe, Roger Girgis, Anthony Gosselin et al.
How Contaminated Is Your Benchmark? Measuring Dataset Leakage in Large Language Models with Kernel Divergence
Hyeong Kyu Choi, Maxim Khanov, Hongxin Wei et al.
Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis
Yu Yuan, Xijun Wang, Yichen Sheng et al.
Identifiable Exchangeable Mechanisms for Causal Structure and Representation Learning
Patrik Reizinger, Siyuan Guo, Ferenc Huszar et al.
Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching
Bin Wang, Fan Wu, Linke Ouyang et al.
Exploring More from Multiple Gait Modalities for Human Identification
Dongyang Jin, Chao Fan, Weihua Chen et al.
Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis
Zikun Zhang, Zixiang Chen, Quanquan Gu
Do LLMs estimate uncertainty well in instruction-following?
Juyeon Heo, Miao Xiong, Christina Heinze-Deml et al.
MVTokenFlow: High-quality 4D Content Generation using Multiview Token Flow
Hanzhuo Huang, Yuan Liu, Ge Zheng et al.
Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
Ming Dai, Jian Li, Jiedong Zhuang et al.
HEROS-GAN: Honed-Energy Regularized and Optimal Supervised GAN for Enhancing Accuracy and Range of Low-Cost Accelerometers
Yifeng Wang, Yi Zhao
MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning
Yaming Yang, Dilxat Muhtar, Yelong Shen et al.
VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning
Xueqing Wu, Yuheng Ding, Bingxuan Li et al.
SkillMimic: Learning Basketball Interaction Skills from Demonstrations
Yinhuai Wang, Qihan Zhao, Runyi Yu et al.
Adding Conditional Control to Diffusion Models with Reinforcement Learning
Yulai Zhao, Masatoshi Uehara, Gabriele Scalia et al.
DRAWER: Digital Reconstruction and Articulation With Environment Realism
Hongchi Xia, Entong Su, Marius Memmel et al.
On the Relationship Between Monotone and Squared Probabilistic Circuits
Benjie Wang, Guy Van den Broeck
A Periodic Bayesian Flow for Material Generation
Hanlin Wu, Yuxuan Song, Jingjing Gong et al.
ActSafe: Active Exploration with Safety Constraints for Reinforcement Learning
Yarden As, Bhavya, Lenart Treven et al.
MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
Yuncong Yang, Jiageng Liu, Zheyuan Zhang et al.
VideoHallu: Evaluating and Mitigating Multi-modal Hallucinations on Synthetic Video Understanding
Zongxia Li, Xiyang Wu, Guangyao Shi et al.
Contextual Bandits for Unbounded Context Distributions
Puning Zhao, Rongfei Fan, Shaowei Wang et al.
Standardizing Structural Causal Models
Weronika Ormaniec, Scott Sussex, Lars Lorch et al.
SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models
Haotian Xia, Zhengbang Yang, Junbo Zou et al.
Decision Information Meets Large Language Models: The Future of Explainable Operations Research
Yansen Zhang, Qingcan Kang, Wing Yin Yu et al.
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
Jiale Cheng, Xiao Liu, Cunxiang Wang et al.
MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm
Ziyan Guo, Zeyu Hu, Na Zhao et al.
Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization
Yue Zhang, Liqiang Jing, Vibhav Gogate
SecureGS: Boosting the Security and Fidelity of 3D Gaussian Splatting Steganography
Xuanyu Zhang, Jiarui Meng, Zhipei Xu et al.
UniMuMo: Unified Text, Music, and Motion Generation
Han Yang, Kun Su, Yutong Zhang et al.
NexusGS: Sparse View Synthesis with Epipolar Depth Priors in 3D Gaussian Splatting
Yulong Zheng, Zicheng Jiang, Shengfeng He et al.
Establishing Best Practices in Building Rigorous Agentic Benchmarks
Yuxuan Zhu, Tengjun Jin, Yada Pruksachatkun et al.
Mr. DETR: Instructive Multi-Route Training for Detection Transformers
Chang-Bin Zhang, Yujie Zhong, Kai Han
Citations and Trust in LLM Generated Responses
Yifan Ding, Matthew Facciani, Ellen Joyce et al.
Prototype-Based Image Prompting for Weakly Supervised Histopathological Image Segmentation
Qingchen Tang, Lei Fan, Maurice Pagnucco et al.
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
Shehreen Azad, Vibhav Vineet, Yogesh S. Rawat
Identifying Query-Relevant Neurons in Large Language Models for Long-Form Texts
Lihu Chen, Adam Dejl, Francesca Toni
Rethinking Diffusion Posterior Sampling: From Conditional Score Estimator to Maximizing a Posterior
Tongda Xu, Xiyan Cai, Xinjie Zhang et al.
MoDec-GS: Global-to-Local Motion Decomposition and Temporal Interval Adjustment for Compact Dynamic 3D Gaussian Splatting
Sangwoon Kwak, Joonsoo Kim, Jun Young Jeong et al.
Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo
Zachary Charles, Gabriel Teston, Lucio Dery et al.
RealGeneral: Unifying Visual Generation via Temporal In-Context Learning with Video Models
Yijing Lin, Mengqi Huang, Shuhan Zhuang et al.
Fully-inductive Node Classification on Arbitrary Graphs
Jianan Zhao, Zhaocheng Zhu, Mikhail Galkin et al.
HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation
Kun Liu, Qi Liu, Xinchen Liu et al.
Revisiting Prefix-tuning: Statistical Benefits of Reparameterization among Prompts
Minh Le, Chau Nguyen, Huy Nguyen et al.
CyberPal.AI: Empowering LLMs with Expert-Driven Cybersecurity Instructions
Matan Levi, Yair Allouche, Daniel Ohayon et al.
Back on Track: Bundle Adjustment for Dynamic Scene Reconstruction
Weirong Chen, Ganlin Zhang, Felix Wimbauer et al.
ReSim: Reliable World Simulation for Autonomous Driving
Jiazhi Yang, Kashyap Chitta, Shenyuan Gao et al.
VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation
Saksham Singh Kushwaha, Yapeng Tian
LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos
Chin-Yang Lin, Cheng Sun, Fu-En Yang et al.
Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs
Jeongseok Hyun, Sukjun Hwang, Su Ho Han et al.
Improving Equivariant Networks with Probabilistic Symmetry Breaking
Hannah Lawrence, Vasco Portilheiro, Yan Zhang et al.
Jasmine: Harnessing Diffusion Prior for Self-supervised Depth Estimation
Jiyuan Wang, Chunyu Lin, Cheng Guan et al.
ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments
Hojae Han, seung-won hwang, Rajhans Samdani et al.
VisionArena: 230k Real World User-VLM Conversations with Preference Labels
Christopher Chou, Lisa Dunlap, Wei-Lin Chiang et al.
Ambient Diffusion Omni: Training Good Models with Bad Data
Giannis Daras, Adrian Rodriguez-Munoz, Adam Klivans et al.
Imputation for prediction: beware of diminishing returns.
Marine Le Morvan, Gael Varoquaux
$\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs
Vlad Sobal, Mark Ibrahim, Randall Balestriero et al.
GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs
Advik Basani, Xiao Zhang
Stable Segment Anything Model
Qi Fan, Xin Tao, Lei Ke et al.
Enhancing Multi-Robot Semantic Navigation Through Multimodal Chain-of-Thought Score Collaboration
Zhixuan Shen, Haonan Luo, Kexun Chen et al.
Mitigating the Backdoor Effect for Multi-Task Model Merging via Safety-Aware Subspace
Jinluan Yang, Anke Tang, Didi Zhu et al.