Most Cited 2025 "linear rewards" Papers
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization
Hongrui Jia, Chaoya Jiang, Haiyang Xu et al.
MI-DETR: An Object Detection Model with Multi-time Inquiries Mechanism
Zhixiong Nan, Xianghong Li, Tao Xiang et al.
Robust Federated Finetuning of LLMs via Alternating Optimization of LoRA
Shuangyi Chen, Yuanxin Guo, Yue Ju et al.
APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers
Zhuguanyu Wu, Jiayi Zhang, Jiaxin Chen et al.
Building a Mind Palace: Structuring Environment-Grounded Semantic Graphs for Effective Long Video Analysis with LLMs
Zeyi Huang, Yuyang Ji, Xiaofang Wang et al.
Spatial Understanding from Videos: Structured Prompts Meet Simulation Data
Haoyu Zhang, Meng Liu, Zaijing Li et al.
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
Hyojin Bahng, Caroline Chan, Fredo Durand et al.
Integral Imprecise Probability Metrics
Siu Lun (Alan) Chau, Michele Caprio, Krikamol Muandet
Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization
Mingzhe Du, Anh Tuan Luu, Yue Liu et al.
SubTrack++ : Gradient Subspace Tracking for Scalable LLM Training
Sahar Rajabi, Nayeema Nonta, Sirisha Rambhatla
Co-op: Correspondence-based Novel Object Pose Estimation
Sungphill Moon, Hyeontae Son, Dongcheol Hur et al.
Language Driven Occupancy Prediction
Zhu Yu, Bowen Pang, Lizhe Liu et al.
TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models
Ruidong Chen, honglin guo, Lanjun Wang et al.
LazyMAR: Accelerating Masked Autoregressive Models via Feature Caching
Feihong Yan, qingyan wei, Jiayi Tang et al.
MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent
Xinyao Liao, Xianfang Zeng, Liao Wang et al.
OpenRSD: Towards Open-prompts for Object Detection in Remote Sensing Images
Ziyue Huang, Yongchao Feng, Ziqi Liu et al.
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Byung-Kwan Lee, Ryo Hachiuma, Yu-Chiang Frank Wang et al.
AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations
Junli Liu, Qizhi Chen, Zhigang Wang et al.
CWNet: Causal Wavelet Network for Low-Light Image Enhancement
Tongshun Zhang, Pingping Liu, Yubing Lu et al.
Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities
Liuyi Wang, Xinyuan Xia, Hui Zhao et al.
GARF: Learning Generalizable 3D Reassembly for Real-World Fractures
Sihang Li, Zeyu Jiang, Grace Chen et al.
ODDR: Outlier Detection & Dimension Reduction Based Defense Against Adversarial Patches
Nandish Chattopadhyay, Amira Guesmi, Muhammad Abdullah Hanif et al.
StreamGS: Online Generalizable Gaussian Splatting Reconstruction for Unposed Image Streams
Yang LI, Jinglu Wang, Lei Chu et al.
ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting
Ruijie Zhu, Mulin Yu, Linning Xu et al.
TruthPrInt: Mitigating Large Vision-Language Models Object Hallucination Via Latent Truthful-Guided Pre-Intervention
Jinhao Duan, Fei Kong, Hao Cheng et al.
RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation
Kaidong Zhang, Rongtao Xu, Ren Pengzhen et al.
Can Generative Video Models Help Pose Estimation?
Ruojin Cai, Jason Y. Zhang, Philipp Henzler et al.
RealEdit: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations
Peter Sushko, Ayana Bharadwaj, Zhi Yang Lim et al.
Enhancing 3D Reconstruction for Dynamic Scenes
Jisang Han, Honggyu An, Jaewoo Jung et al.
Object-centric binding in Contrastive Language-Image Pretraining
Rim Assouel, Pietro Astolfi, Florian Bordes et al.
Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
Shijie Zhou, Ruiyi Zhang, Huaisheng Zhu et al.
AgroBench: Vision-Language Model Benchmark in Agriculture
Risa Shinoda, Nakamasa Inoue, Hirokatsu Kataoka et al.
DiffLocks: Generating 3D Hair from a Single Image using Diffusion Models
Radu Alexandru Rosu, Keyu Wu, Yao Feng et al.
FOCUS: Knowledge-enhanced Adaptive Visual Compression for Few-shot Whole Slide Image Classification
Zhengrui Guo, Conghao Xiong, Jiabo MA et al.
Salvaging the Overlooked: Leveraging Class-Aware Contrastive Learning for Multi-Class Anomaly Detection
Lei Fan, Junjie Huang, Donglin Di et al.
GS-LIVM: Real-Time Photo-Realistic LiDAR-Inertial-Visual Mapping with Gaussian Splatting
Yusen XIE, Zhenmin Huang, Jin Wu et al.
What Makes for Text to 360-degree Panorama Generation with Stable Diffusion?
Jinhong Ni, Chang-Bin Zhang, Qiang Zhang et al.
The emergence of sparse attention: impact of data distribution and benefits of repetition
Nicolas Zucchet, Francesco D'Angelo, Andrew Lampinen et al.
MobileIE: An Extremely Lightweight and Effective ConvNet for Real-Time Image Enhancement on Mobile Devices
HAILONG YAN, Ao Li, Xiangtao Zhang et al.
Video2BEV: Transforming Drone Videos to BEVs for Video-based Geo-localization
Hao Ju, Shaofei Huang, Si Liu et al.
ARIG: Autoregressive Interactive Head Generation for Real-time Conversations
Ying Guo, Xi Liu, Cheng Zhen et al.
Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing
Yudong Liu, Jingwei Sun, Yueqian Lin et al.
Vision-Language Models Can't See the Obvious
YASSER ABDELAZIZ DAHOU DJILALI, Ngoc Huynh, Phúc Lê Khắc et al.
DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding
Jungbin Cho, Junwan Kim, Jisoo Kim et al.
Semantic Watermarking Reinvented: Enhancing Robustness and Generation Quality with Fourier Integrity
Sung Ju Lee, Nam Ik Cho
Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos
Rundong Luo, Matthew Wallingford, Ali Farhadi et al.
REDUCIO! Generating 1K Video within 16 Seconds using Extremely Compressed Motion Latents
Rui Tian, Qi Dai, Jianmin Bao et al.
Parameter-Efficient Adaptation of Geospatial Foundation Models through Embedding Deflection
Romain Thoreau, Valerio Marsocci, Dawa Derksen
Riemannian-Geometric Fingerprints of Generative Models
Hae Jin Song, Laurent Itti
SITE: towards Spatial Intelligence Thorough Evaluation
Wenqi Wang, Reuben Tan, Pengyue Zhu et al.
TR-PTS: Task-Relevant Parameter and Token Selection for Efficient Tuning
Siqi Luo, Haoran Yang, Yi Xin et al.
SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing
Yingying Zhang, Lixiang Ru, Kang Wu et al.
Federated Continual Instruction Tuning
Haiyang Guo, Fanhu Zeng, Fei Zhu et al.
Event-based Tiny Object Detection: A Benchmark Dataset and Baselines
Nuo Chen, Chao Xiao, Yimian Dai et al.
Accelerating Diffusion Transformer via Gradient-Optimized Cache
Junxiang Qiu, Lin Liu, Shuo Wang et al.
Object-centric Video Question Answering with Visual Grounding and Referring
Haochen Wang, Qirui Chen, Cilin Yan et al.
StableCodec: Taming One-Step Diffusion for Extreme Image Compression
Tianyu Zhang, Xin Luo, Li Li et al.
PlanGen: Towards Unified Layout Planning and Image Generation in Auto-Regressive Vision Language Models
Runze He, bo cheng, Yuhang Ma et al.
GenDoP: Auto-regressive Camera Trajectory Generation as a Director of Photography
Mengchen Zhang, Tong Wu, Jing Tan et al.
SynFER: Towards Boosting Facial Expression Recognition with Synthetic Data
Xilin He, Cheng Luo, Xiaole Xian et al.
Dynamic Typography: Bringing Text to Life via Video Diffusion Prior
Zichen Liu, Yihao Meng, Hao Ouyang et al.
FlexGen: Flexible Multi-View Generation from Text and Image Inputs
Xinli Xu, Wenhang Ge, Jiantao Lin et al.
Importance-Based Token Merging for Efficient Image and Video Generation
Haoyu Wu, Jingyi Xu, Hieu Le et al.
Towards Real Unsupervised Anomaly Detection Via Confident Meta-Learning
Muhammad Aqeel, Shakiba Sharifi, Marco Cristani et al.
CCMNet: Leveraging Calibrated Color Correction Matrices for Cross-Camera Color Constancy
Dongyoung Kim, Mahmoud Afifi, Dongyun Kim et al.
FlowR: Flowing from Sparse to Dense 3D Reconstructions
Tobias Fischer, Samuel Rota Bulò, Yung-Hsu Yang et al.
Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning
Jingjing Jiang, Chao Ma, Xurui Song et al.
RigGS: Rigging of 3D Gaussians for Modeling Articulated Objects in Videos
Yuxin Yao, Zhi Deng, Junhui Hou
MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges?
Yunxiang Zhang, Muhammad Khalifa, Shitanshu Bhushan et al.
Object-Shot Enhanced Grounding Network for Egocentric Video
Yisen Feng, Haoyu Zhang, Meng Liu et al.
T-SHIRT: Token-Selective Hierarchical Data Selection for Instruction Tuning
Yanjun Fu, Faisal Hamman, Sanghamitra Dutta
Robust Multimodal Survival Prediction with Conditional Latent Differentiation Variational AutoEncoder
Junjie Zhou, Jiao Tang, Yingli Zuo et al.
Sim-to-Real Causal Transfer: A Metric Learning Approach to Causally-Aware Interaction Representations
Ahmad Rahimi, Po-Chien Luan, Yuejiang Liu et al.
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Sanghwan Kim, Rui Xiao, Iuliana Georgescu et al.
PICD: Versatile Perceptual Image Compression with Diffusion Rendering
Tongda Xu, Jiahao Li, Bin Li et al.
LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration
Yuyao Zhang, Jinghao Li, Yu-Wing Tai
Compiler-R1: Towards Agentic Compiler Auto-tuning with Reinforcement Learning
Haolin Pan, Hongyu Lin, Haoran Luo et al.
Synergizing Motion and Appearance: Multi-Scale Compensatory Codebooks for Talking Head Video Generation
Shuling Zhao, Fa-Ting Hong, Xiaoshui Huang et al.
On the Relation between Rectified Flows and Optimal Transport
Johannes Hertrich, Antonin Chambolle, Julie Delon
Straight-Line Diffusion Model for Efficient 3D Molecular Generation
Yuyan Ni, Shikun Feng, Haohan Chi et al.
Temporal Alignment-Free Video Matching for Few-shot Action Recognition
SuBeen Lee, WonJun Moon, Hyun Seok Seong et al.
Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models
Cameron Tice, Philipp Kreer, Nathan Helm-Burger et al.
FreqDebias: Towards Generalizable Deepfake Detection via Consistency-Driven Frequency Debiasing
Hossein Kashiani, Niloufar Alipour Talemi, Fatemeh Afghah
MMTL-UniAD: A Unified Framework for Multimodal and Multi-Task Learning in Assistive Driving Perception
Wenzhuo Liu, Wenshuo Wang, Yicheng Qiao et al.
M3amba: Memory Mamba is All You Need for Whole Slide Image Classification
Tingting Zheng, Kui Jiang, Yi Xiao et al.
Multitwine: Multi-Object Compositing with Text and Layout Control
Gemma Canet Tarrés, Zhe Lin, Zhifei Zhang et al.
PanDA: Towards Panoramic Depth Anything with Unlabeled Panoramas and Mobius Spatial Augmentation
Zidong Cao, Jinjing Zhu, Weiming Zhang et al.
TCFG: Tangential Damping Classifier-free Guidance
Mingi Kwon, Shin seong Kim, Jaeseok Jeong et al.
Navigating Image Restoration with VAR’s Distribution Alignment Prior
Siyang Wang, Naishan Zheng, Jie Huang et al.
Real-IAD D³: A Real-World 2D/Pseudo-3D/3D Dataset for Industrial Anomaly Detection
wenbing zhu, Lidong Wang, Ziqing Zhou et al.
DVHGNN: Multi-Scale Dilated Vision HGNN for Efficient Vision Recognition
Caoshuo Li, Tanzhe Li, Xiaobin Hu et al.
ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
Tomas Soucek, Prajwal Gatti, Michael Wray et al.
Shape it Up! Restoring LLM Safety during Finetuning
ShengYun Peng, Pin-Yu Chen, Jianfeng Chi et al.
FlashDepth: Real-time Streaming Video Depth Estimation at 2K Resolution
Gene Chou, Wenqi Xian, Guandao Yang et al.
Scaffolding Dexterous Manipulation with Vision-Language Models
Vincent de Bakker, Joey Hejna, Tyler Lum et al.
Beyond Local Sharpness: Communication-Efficient Global Sharpness-aware Minimization for Federated Learning
Debora Caldarola, Pietro Cagnasso, Barbara Caputo et al.
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation
Reza Abbasi, Ali Nazari, Aminreza Sefid et al.
Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization
Feifei Li, Mi Zhang, Yiming Sun et al.
Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions
Chan Hur, Jeong-hun Hong, Dong-hun Lee et al.
Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes
Aodi Li, Liansheng Zhuang, Xiao Long et al.
Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning
Ali Taghibakhshi, Sharath Turuvekere Sreenivas, Saurav Muralidharan et al.
FlexOLMo: Open Language Models for Flexible Data Use
Weijia Shi, Akshita Bhagia, Kevin Farhat et al.
Robust 3D Shape Reconstruction in Zero-Shot from a Single Image in the Wild
Junhyeong Cho, Kim Youwang, Hunmin Yang et al.
Generalizable, real-time neural decoding with hybrid state-space models
Avery Hee-Woon Ryoo, Nanda H Krishna, Ximeng Mao et al.
SDGOCC: Semantic and Depth-Guided Bird's-Eye View Transformation for 3D Multimodal Occupancy Prediction
ZaiPeng Duan, Xuzhong Hu, Pei An et al.
Walking the Tightrope: Autonomous Disentangling Beneficial and Detrimental Drifts in Non-Stationary Custom-Tuning
Xiaoyu Yang, Jie Lu, En Yu
Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation
Reza Qorbani, Gianluca Villani, Theodoros Panagiotakopoulos et al.
Audio-Sync Video Generation with Multi-Stream Temporal Control
Shuchen Weng, Haojie Zheng, zheng chang et al.
DLF: Extreme Image Compression with Dual-generative Latent Fusion
Naifu Xue, Zhaoyang Jia, Jiahao Li et al.
Pareto-Optimal Energy Alignment for Designing Nature-Like Antibodies
Yibo Wen, Chenwei Xu, Jerry Yao-Chieh Hu et al.
A Stable Whitening Optimizer for Efficient Neural Network Training
Kevin Frans, Sergey Levine, Pieter Abbeel
Understanding LLM Behaviors via Compression: Data Generation, Knowledge Acquisition and Scaling Laws
Zhixuan Pan, Shaowen Wang, Liao Pengfei et al.
GeoSVR: Taming Sparse Voxels for Geometrically Accurate Surface Reconstruction
Jiahe Li, Jiawei Zhang, Youmin Zhang et al.
Estimating Model Performance Under Covariate Shift Without Labels
Jakub Białek, Juhani Kivimäki, Wojciech Kuberski et al.
Hearing Anywhere in Any Environment
Xiulong Liu, Anurag Kumar, Paul Calamia et al.
One-Step Offline Distillation of Diffusion-based Models via Koopman Modeling
Nimrod Berman, Ilan Naiman, Moshe Eliasof et al.
Time-IMM: A Dataset and Benchmark for Irregular Multimodal Multivariate Time Series
Ching Chang, Jeehyun Hwang, Yidan Shi et al.
A Tale of Two Symmetries: Exploring the Loss Landscape of Equivariant Models
YuQing Xie, Tess Smidt
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
Zhengyao Lyu, Tianlin Pan, Chenyang Si et al.
DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding
Yudong Han, Qingpei Guo, Liyuan Pan et al.
GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting
Zixuan Chen, Guangcong Wang, Jiahao Zhu et al.
Normalized Attention Guidance: Universal Negative Guidance for Diffusion Models
Dar-Yen Chen, Hmrishav Bandyopadhyay, Kai Zou et al.
Seeing the Arrow of Time in Large Multimodal Models
Zihui (Sherry) Xue, Romy Luo, Kristen Grauman
Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
Davide Caffagni, Sara Sarto, Marcella Cornia et al.
DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction
Ben Kaye, Tomas Jakab, Shangzhe Wu et al.
Keyframe-Guided Creative Video Inpainting
Yuwei Guo, Ceyuan Yang, Anyi Rao et al.
Absorb and Converge: Provable Convergence Guarantee for Absorbing Discrete Diffusion Models
Yuchen Liang, Renxiang Huang, Lifeng LAI et al.
CountLLM: Towards Generalizable Repetitive Action Counting via Large Language Model
Ziyu Yao, Xuxin Cheng, Zhiqi Huang et al.
Parametric Point Cloud Completion for Polygonal Surface Reconstruction
Zhaiyu Chen, Yuqing Wang, Liangliang Nan et al.
AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization
Yiyang Du, Xiaochen Wang, Chi Chen et al.
POT: Prototypical Optimal Transport for Weakly Supervised Semantic Segmentation
Jian Wang, Tianhong Dai, Bingfeng Zhang et al.
3D-MVP: 3D Multiview Pretraining for Manipulation
Shengyi Qian, Kaichun Mo, Valts Blukis et al.
Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking
Liangliang Zhang, Zhuorui Jiang, Hongliang Chi et al.
3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View Transformation
Gyeongrok Oh, Sung June Kim, Heeju Ko et al.
Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving of Inequalities
Haoyu Zhao, Yihan Geng, Shange Tang et al.
LaTexBlend: Scaling Multi-concept Customized Generation with Latent Textual Blending
Jian Jin, Zhenbo Yu, Yang Shen et al.
AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs
Sanjoy Chowdhury, Hanan Gani, Nishit Anand et al.
KAC: Kolmogorov-Arnold Classifier for Continual Learning
Yusong Hu, Zichen Liang, Fei Yang et al.
Hyperbolic Safety-Aware Vision-Language Models
Tobia Poppi, Tejaswi Kasarla, Pascal Mettes et al.
Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding
Yan Wang, Baoxiong Jia, Ziyu Zhu et al.
Point Clouds Meets Physics: Dynamic Acoustic Field Fitting Network for Point Cloud Understanding
Changshuo Wang, Shuting He, Xiang Fang et al.
High-Dimensional Calibration from Swap Regret
Maxwell Fishelson, Noah Golowich, Mehryar Mohri et al.
Learning with Calibration: Exploring Test-Time Computing of Spatio-Temporal Forecasting
Wei Chen, Yuxuan Liang
Visual Persona: Foundation Model for Full-Body Human Customization
Jisu Nam, Soowon Son, Zhan Xu et al.
COME: Adding Scene-Centric Forecasting Control to Occupancy World Model
Yining Shi, Kun Jiang, Qiang Meng et al.
Improved Representation Steering for Language Models
Zhengxuan Wu, Qinan Yu, Aryaman Arora et al.
Improving the Transferability of Adversarial Attacks on Face Recognition with Diverse Parameters Augmentation
Fengfan Zhou, Bangjie Yin, Hefei Ling et al.
A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation
Andrew Z Wang, Songwei Ge, Tero Karras et al.
HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models
ZHIXIANG WEI, Guangting Wang, Xiaoxiao Ma et al.
Fast Inference for Augmented Large Language Models
Rana Shahout, Cong Liang, Shiji Xin et al.
Týr-the-Pruner: Structural Pruning LLMs via Global Sparsity Distribution Optimization
Guanchen Li, Yixing Xu, Zeping Li et al.
R-LiViT: A LiDAR-Visual-Thermal Dataset Enabling Vulnerable Road User Focused Roadside Perception
Jonas Mirlach, Lei Wan, Andreas Wiedholz et al.
LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers
Yusuf Dalva, Hidir Yesiltepe, Pinar Yanardag
MIRE: Matched Implicit Neural Representations
Dhananjaya Jayasundara, Heng Zhao, Demetrio Labate et al.
U-Know-DiffPAN: An Uncertainty-aware Knowledge Distillation Diffusion Framework with Details Enhancement for PAN-Sharpening
Sungpyo Kim, Jeonghyeok Do, Jaehyup Lee et al.
Seg4Diff: Unveiling Open-Vocabulary Semantic Segmentation in Text-to-Image Diffusion Transformers
Chaehyun Kim, Heeseong Shin, Eunbeen Hong et al.
Realistic Test-Time Adaptation of Vision-Language Models
Maxime Zanella, Clément Fuchs, Christophe De Vleeschouwer et al.
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks
Vishnu Sarukkai, Zhiqiang Xie, Kayvon Fatahalian
MATCHA: Towards Matching Anything
Fei Xue, Sven Elflein, Laura Leal-Taixe et al.
Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning
Julian Minder, Clément Dumas, Caden Juang et al.
Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training
Will Merrill, Shane Arora, Dirk Groeneveld et al.
On scalable and efficient training of diffusion samplers
Minkyu Kim, Kiyoung Seong, Dongyeop Woo et al.
T2V-OptJail: Discrete Prompt Optimization for Text-to-Video Jailbreak Attacks
Jiayang Liu, Siyuan Liang, Shiqian Zhao et al.
Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond)
Tomer Garber, Tom Tirer
CATSplat: Context-Aware Transformer with Spatial Guidance for Generalizable 3D Gaussian Splatting from A Single-View Image
Wonseok Roh, Hwanhee Jung, JongWook Kim et al.
HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning
Zhi Jing, Siyuan Yang, Jicong Ao et al.
GaussianVideo: Efficient Video Representation via Hierarchical Gaussian Splatting
Andrew Bond, Jui-Hsien Wang, Long Mai et al.
MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems
Xuanming Zhang, Yuxuan Chen, Samuel (Min-Hsuan) Yeh et al.
PEER Pressure: Model-to-Model Regularization for Single Source Domain Generalization
Dongkyu Cho, Inwoo Hwang, Sanghack Lee
ZeroStereo: Zero-shot Stereo Matching from Single Images
Xianqi Wang, Hao Yang, Gangwei Xu et al.
Unified Dense Prediction of Video Diffusion
Lehan Yang, Lu Qi, Xiangtai Li et al.
ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding
Guangda Ji, Silvan Weder, Francis Engelmann et al.
PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model
Mingju Gao, Yike Pan, Huan-ang Gao et al.
Satellite Observations Guided Diffusion Model for Accurate Meteorological States at Arbitrary Resolution
Siwei Tu, Ben Fei, Weidong Yang et al.
Optimal Spectral Transitions in High-Dimensional Multi-Index Models
Leonardo Defilippis, Yatin Dandi, Pierre Mergny et al.
Driving View Synthesis on Free-form Trajectories with Generative Prior
Zeyu Yang, Zijie Pan, Yuankun Yang et al.
Uniform Generalization Bounds on Data-Dependent Hypothesis Sets via PAC-Bayesian Theory on Random Sets
Benjamin Dupuis, Paul Viallard, George Deligiannidis et al.
Nonisotropic Gaussian Diffusion for Realistic 3D Human Motion Prediction
Cecilia Curreli, Dominik Muhle, Abhishek Saroha et al.
Chat-based Person Retrieval via Dialogue-Refined Cross-Modal Alignment
Yang Bai, Yucheng Ji, Min Cao et al.
AlphaPre: Amplitude-Phase Disentanglement Model for Precipitation Nowcasting
Kenghong Lin, Baoquan Zhang, Demin Yu et al.
Distilling Parallel Gradients for Fast ODE Solvers of Diffusion Models
Beier Zhu, Ruoyu Wang, Tong Zhao et al.
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation
Weinan Jia, Mengqi Huang, Nan Chen et al.
Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
Marwa Abdulhai, Ryan Cheng, Donovan Clay et al.
Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs
Jingyao Wang, Wenwen Qiang, Zeen Song et al.
Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs
Hao Kang, Qingru Zhang, Han Cai et al.
MIEB: Massive Image Embedding Benchmark
Chenghao Xiao, Isaac Chung, Imene Kerboua et al.
AnyCalib: On-Manifold Learning for Model-Agnostic Single-View Camera Calibration
Javier Tirado-Garín, Javier Civera
Time Series Generation Under Data Scarcity: A Unified Generative Modeling Approach
Tal Gonen, Itai Pemper, Ilan Naiman et al.
Time Travel is Cheating: Going Live with DeepFund for Real-Time Fund Investment Benchmarking
Changlun Li, Yao SHI, Chen Wang et al.
Token Perturbation Guidance for Diffusion Models
Javad Rajabi, Soroush Mehraban, Seyedmorteza Sadat et al.
Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation
Liliang Ren, Congcong Chen, Haoran Xu et al.
MambaVLT: Time-Evolving Multimodal State Space Model for Vision-Language Tracking
Xinqi Liu, Li Zhou, Zikun Zhou et al.
The Hawthorne Effect in Reasoning Models: Evaluating and Steering Test Awareness
Sahar Abdelnabi, Ahmed Salem
Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation
Shivam Duggal, Yushi Hu, Oscar Michel et al.
Manipulating Feature Visualizations with Gradient Slingshots
Dilyara Bareeva, Marina Höhne, Alexander Warnecke et al.
Effective Cloud Removal for Remote Sensing Images by an Improved Mean-Reverting Denoising Model with Elucidated Design Space
Yi Liu, Wengen Li, Jihong Guan et al.
Improving Gaussian Splatting with Localized Points Management
Haosen Yang, Chenhao Zhang, Wenqing Wang et al.
DefectFill: Realistic Defect Generation with Inpainting Diffusion Model for Visual Inspection
Jaewoo Song, Daemin Park, Kanghyun Baek et al.
InteractionMap: Improving Online Vectorized HDMap Construction with Interaction
Kuang Wu, Chuan Yang, Zhanbin Li
RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments
Haisheng Su, Feixiang Song, CONG MA et al.