Most Cited 2025 "temporal question-answering" Papers
DistinctAD: Distinctive Audio Description Generation in Contexts
Bo Fang, Wenhao Wu, Qiangqiang Wu et al.
The Atlas of In-Context Learning: How Attention Heads Shape In-Context Retrieval Augmentation
Patrick Kahardipraja, Reduan Achtibat, Thomas Wiegand et al.
BiLoRA: Almost-Orthogonal Parameter Spaces for Continual Learning
Hao Zhu, Yifei Zhang, Junhao Dong et al.
Towards Generalizable Trajectory Prediction using Dual-Level Representation Learning and Adaptive Prompting
Kaouther Messaoud, Matthieu Cord, Alex Alahi
SnowMaster: Comprehensive Real-world Image Desnowing via MLLM with Multi-Model Feedback Optimization
Jianyu Lai, Sixiang Chen, Yunlong Lin et al.
Understanding Contrastive Learning via Gaussian Mixture Models
Parikshit Bansal, Ali Kavis, Sujay Sanghavi
AgentBreeder: Mitigating the AI Safety Risks of Multi-Agent Scaffolds via Self-Improvement
J Rosser, Jakob Foerster
TrustMark: Robust Watermarking and Watermark Removal for Arbitrary Resolution Images
Tu Bui, Shruti Agarwal, John Collomosse
Optimal Spectral Transitions in High-Dimensional Multi-Index Models
Leonardo Defilippis, Yatin Dandi, Pierre Mergny et al.
DiscoVLA: Discrepancy Reduction in Vision, Language, and Alignment for Parameter-Efficient Video-Text Retrieval
Leqi Shen, Guoqiang Gong, Tianxiang Hao et al.
QuCOOP: A Versatile Framework for Solving Composite and Binary-Parametrised Problems on Quantum Annealers
Natacha Kuete Meli, Vladislav Golyanik, Marcel Seelbach Benkner et al.
Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation
Tiange Xiang, Kai Li, Chengjiang Long et al.
Doubly Robust Alignment for Large Language Models
Erhan Xu, Kai Ye, Hongyi Zhou et al.
LatentHOI: On the Generalizable Hand Object Motion Generation with Latent Hand Diffusion
Muchen Li, Sammy Christen, Chengde Wan et al.
Towards foundational LiDAR world models with efficient latent flow matching
Tianran Liu, Shengwen Zhao, Nicholas Rhinehart
ZeroVO: Visual Odometry with Minimal Assumptions
Lei Lai, Zekai Yin, Eshed Ohn-Bar
Do different prompting methods yield a common task representation in language models?
Guy Davidson, Todd Gureckis, Brenden Lake et al.
CoDA: Coordinated Diffusion Noise Optimization for Whole-Body Manipulation of Articulated Objects
Huaijin Pi, Zhi Cen, Zhiyang Dou et al.
Dual-Agent Optimization framework for Cross-Domain Few-Shot Segmentation
Zhaoyang Li, Yuan Wang, Wangkai Li et al.
Self-Calibrated Variance-Stabilizing Transformations for Real-World Image Denoising
Sébastien Herbreteau, Michael Unser
GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning
Kelin Yu, Sheng Zhang, Harshit Soora et al.
Enhanced then Progressive Fusion with View Graph for Multi-View Clustering
Zhibin Dong, Meng Liu, Siwei Wang et al.
Unity in Diversity: Video Editing via Gradient-Latent Purification
Junyu Gao, Kunlin Yang, Xuan Yao et al.
Lie Detector: Unified Backdoor Detection via Cross-Examination Framework
Xuan Wang, Siyuan Liang, Dongping Liao et al.
Deliberation on Priors: Trustworthy Reasoning of Large Language Models on Knowledge Graphs
Jie Ma, Ning Qu, Zhitao Gao et al.
Capturing Individual Human Preferences with Reward Features
Andre Barreto, Vincent Dumoulin, Yiran Mao et al.
BG-Triangle: Bézier Gaussian Triangle for 3D Vectorization and Rendering
Minye Wu, Haizhao Dai, Kaixin Yao et al.
Feedback Guidance of Diffusion Models
Felix Koulischer, Florian Handke, Johannes Deleu et al.
Robust-MVTON: Learning Cross-Pose Feature Alignment and Fusion for Robust Multi-View Virtual Try-On
Nannan Zhang, Yijiang Li, Dong Du et al.
PSBD: Prediction Shift Uncertainty Unlocks Backdoor Detection
Wei Li, Pin-Yu Chen, Sijia Liu et al.
TopoPoint: Enhance Topology Reasoning via Endpoint Detection in Autonomous Driving
Yanping Fu, Xinyuan Liu, Tianyu Li et al.
Decomposing Interventional Causality into Synergistic, Redundant, and Unique Components
Abel Jansma
UniEgoMotion: A Unified Model for Egocentric Motion Reconstruction, Forecasting, and Generation
Chaitanya Patel, Hiroki Nakamura, Yuta Kyuragi et al.
CanonSwap: High-Fidelity and Consistent Video Face Swapping via Canonical Space Modulation
Xiangyang Luo, Ye Zhu, Yunfei Liu et al.
Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs
Hao Kang, Qingru Zhang, Han Cai et al.
Knowledge Distillation with Refined Logits
Wujie Sun, Defang Chen, Siwei Lyu et al.
GECKO: Gigapixel Vision-Concept Contrastive Pretraining in Histopathology
Saarthak Kapse, Pushpak Pati, Srikar Yellapragada et al.
STRCMP: Integrating Graph Structural Priors with Language Models for Combinatorial Optimization
Xijun Li, Jiexiang Yang, Jinghao Wang et al.
PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs
Teng Zhou, Xiaoyu Zhang, Yongchuan Tang
MIRA: Medical Time Series Foundation Model for Real-World Health Data
Hao Li, Bowen Deng, Chang Xu et al.
Joint Relational Database Generation via Graph-Conditional Diffusion Models
Mohamed Amine Ketata, David Lüdke, Leo Schwinn et al.
3D Dental Model Segmentation with Geometrical Boundary Preserving
Shufan Xi, Zexian Liu, Junlin Chang et al.
Statistical inference for Linear Stochastic Approximation with Markovian Noise
Sergey Samsonov, Marina Sheshukova, Eric Moulines et al.
Beyond Single-Task: Robust Multi-Task Length Generalization for LLMs
Yi Hu, Shijia Kang, Haotong Yang et al.
CutS3D: Cutting Semantics in 3D for 2D Unsupervised Instance Segmentation
Leon Sick, Dominik Engel, Sebastian Hartwig et al.
Small Singular Values Matter: A Random Matrix Analysis of Transformer Models
Max Staats, Matthias Thamm, Bernd Rosenow
Neural Hierarchical Decomposition for Single Image Plant Modeling
Zhihao Liu, Zhanglin Cheng, Naoto Yokoya
BiM-VFI: Bidirectional Motion Field-Guided Frame Interpolation for Video with Non-uniform Motions
Wonyong Seo, Jihyong Oh, Munchurl Kim
GRAVER: Generative Graph Vocabularies for Robust Graph Foundation Models Fine-tuning
Haonan Yuan, Qingyun Sun, Junhua Shi et al.
ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction
Yuejiao Su, Yi Wang, Qiongyang Hu et al.
Tight Lower Bounds and Improved Convergence in Performative Prediction
Pedram Khorsandi, Rushil Gupta, Mehrnaz Mofakhami et al.
MARBLE: Material Recomposition and Blending in CLIP-Space
Ta-Ying Cheng, Prafull Sharma, Mark Boss et al.
LiveCC: Learning Video LLM with Streaming Speech Transcription at Scale
Joya Chen, Yiqi Lin, Ziyun Zeng et al.
DSV-LFS: Unifying LLM-Driven Semantic Cues with Visual Features for Robust Few-Shot Segmentation
Amin Karimi, Charalambos Poullis
Memories of Forgotten Concepts
Matan Rusanovsky, Shimon Malnick, Amir Jevnisek et al.
Doppelgangers++: Improved Visual Disambiguation with Geometric 3D Features
Yuanbo Xiangli, Ruojin Cai, Hanyu Chen et al.
MixerMDM: Learnable Composition of Human Motion Diffusion Models
Pablo Ruiz-Ponce, German Barquero, Cristina Palmero et al.
PolarFree: Polarization-based Reflection-Free Imaging
Mingde Yao, Menglu Wang, King Man Tam et al.
OmniStereo: Real-time Omnidirectional Depth Estimation with Multiview Fisheye Cameras
Jiaxi Deng, Yushen Wang, Haitao Meng et al.
H-MoRe: Learning Human-centric Motion Representation for Action Analysis
Zhanbo Huang, Xiaoming Liu, Yu Kong
Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene
Shengqiong Wu, Hao Fei, Jingkang Yang et al.
Scalable and Cost-Efficient de Novo Template-Based Molecular Generation
Piotr Gaiński, Oussama Boussif, Andrei Rekesh et al.
SmartCLIP: Modular Vision-language Alignment with Identification Guarantees
Shaoan Xie, Lingjing Kong, Yujia Zheng et al.
When Thinking Drifts: Evidential Grounding for Robust Video Reasoning
Romy Luo, Zihui (Sherry) Xue, Alex Dimakis et al.
SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive Tokens
Chi Su, Xiaoxuan Ma, Jiajun Su et al.
Vision Transformers with Self-Distilled Registers
Zipeng Yan, Yinjie Chen, Chong Zhou et al.
Dual-view X-ray Detection: Can AI Detect Prohibited Items from Dual-view X-ray Images like Humans?
Renshuai Tao, Haoyu Wang, Yuzhe Guo et al.
Why Knowledge Distillation Works in Generative Models: A Minimal Working Explanation
Sungmin Cha, Kyunghyun Cho
Protein Design with Dynamic Protein Vocabulary
Nuowei Liu, Jiahao Kuang, Yanting Liu et al.
Exploring Multimodal Diffusion Transformers for Enhanced Prompt-based Image Editing
Joonghyuk Shin, Alchan Hwang, Yujin Kim et al.
DINGO: Constrained Inference for Diffusion LLMs
Tarun Suresh, Debangshu Banerjee, Shubham Ugare et al.
UniPixel: Unified Object Referring and Segmentation for Pixel-Level Visual Reasoning
Ye Liu, Zongyang Ma, Junfu Pu et al.
MonoFusion: Sparse-View 4D Reconstruction via Monocular Fusion
Zihan Wang, Jeff Tan, Tarasha Khurana et al.
Ges3ViG: Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding
Atharv Mahesh Mane, Dulanga Weerakoon, Vigneshwaran Subbaraju et al.
Spatiotemporal Decoupling for Efficient Vision-Based Occupancy Forecasting
Jingyi Xu, Xieyuanli Chen, Junyi Ma et al.
TRACE: Grounding Time Series in Context for Multimodal Embedding and Retrieval
Jialin Chen, Ziyu Zhao, Gaukhar Nurbek et al.
Preference Learning with Lie Detectors can Induce Honesty or Evasion
Chris Cundy, Adam Gleave
Scaling Laws For Scalable Oversight
Joshua Engels, David Baek, Subhash Kantamneni et al.
Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia
Chandler Smith, Marwa Abdulhai, Manfred Díaz et al.
DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer
Yecheng Wu, Han Cai, Junyu Chen et al.
OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering
Shiyong Liu, Xiao Tang, Zhihao Li et al.
STiL: Semi-supervised Tabular-Image Learning for Comprehensive Task-Relevant Information Exploration in Multimodal Classification
Siyi Du, Xinzhe Luo, Declan O'Regan et al.
LidarGait++: Learning Local Features and Size Awareness from LiDAR Point Clouds for 3D Gait Recognition
Chuanfu Shen, Rui Wang, Lixin Duan et al.
Robust Hallucination Detection in LLMs via Adaptive Token Selection
Mengjia Niu, Hamed Haddadi, Guansong Pang
DuCos: Duality Constrained Depth Super-Resolution via Foundation Model
Zhiqiang Yan, Zhengxue Wang, Haoye Dong et al.
ZigzagPointMamba: Spatial-Semantic Mamba for Point Cloud Understanding
Linshuang Diao, Sensen Song, Yurong Qian et al.
CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems
Aniket Rege, Zinnia Nie, Unmesh Raskar et al.
Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations
Jeonghyeon Kim, Sangheum Hwang
TKG-DM: Training-free Chroma Key Content Generation Diffusion Model
Ryugo Morita, Stanislav Frolov, Brian Bernhard Moser et al.
GraphMimic: Graph-to-Graphs Generative Modeling from Videos for Policy Learning
Guangyan Chen, Te Cui, Meiling Wang et al.
FRAMES-VQA: Benchmarking Fine-Tuning Robustness across Multi-Modal Shifts in Visual Question Answering
Chengyue Huang, Brisa Maneechotesuwan, Shivang Chopra et al.
HumanOLAT: A Large-Scale Dataset for Full-Body Human Relighting and Novel-View Synthesis
Timo Teufel, Xilong Zhou, Umar Iqbal et al.
GASP: Gaussian Avatars with Synthetic Priors
Jack Saunders, Charlie Hewitt, Yanan Jian et al.
Anomize: Better Open Vocabulary Video Anomaly Detection
Fei Li, Wenxuan Liu, Jingjing Chen et al.
VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance
Mohammad Reza Taesiri, Abhijay Ghildyal, Saman Zadtootaghaj et al.
Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
Davide Caffagni, Sara Sarto, Marcella Cornia et al.
RELOCATE: A Simple Training-Free Baseline for Visual Query Localization Using Region-Based Representations
Savya Khosla, Sethuraman T V, Alexander G. Schwing et al.
PyTorchGeoNodes: Enabling Differentiable Shape Programs for 3D Shape Reconstruction
Sinisa Stekovic, Arslan Artykov, Stefan Ainetter et al.
Synergistic Prompting for Robust Visual Recognition with Missing Modalities
Zhihui Zhang, Luanyuan Dai, Qika Lin et al.
Activation Control for Efficiently Eliciting Long Chain-of-thought Ability of Language Models
Zekai Zhao, Qi Liu, Kun Zhou et al.
GENIUS: A Generative Framework for Universal Multimodal Search
Sungyeon Kim, Xinliang Zhu, Xiaofan Lin et al.
Common3D: Self-Supervised Learning of 3D Morphable Models for Common Objects in Neural Feature Space
Leonhard Sommer, Olaf Dünkel, Christian Theobalt et al.
CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models
Gaoyang Zhang, Bingtao Fu, Qingnan Fan et al.
Brain-like Variational Inference
Hadi Vafaii, Dekel Galor, Jacob Yates
Towards A Generalist Code Embedding Model Based On Massive Data Synthesis
Chaofan Li, Jianlyu Chen, Yingxia Shao et al.
Multi-Token Prediction Needs Registers
Anastasios Gerontopoulos, Spyridon Gidaris, Nikos Komodakis
Head Pursuit: Probing Attention Specialization in Multimodal Transformers
Lorenzo Basile, Valentino Maiorca, Diego Doimo et al.
Refusal Direction is Universal Across Safety-Aligned Languages
Xinpeng Wang, Mingyang Wang, Yihong Liu et al.
🎧MOSPA: Human Motion Generation Driven by Spatial Audio
Shuyang Xu, Zhiyang Dou, Mingyi Shi et al.
Let Me Think! A Long Chain of Thought Can Be Worth Exponentially Many Short Ones
Parsa Mirtaheri, Ezra Edelman, Samy Jelassi et al.
Test-Time Visual In-Context Tuning
Jiahao Xie, Alessio Tonioni, Nathalie Rauschmayr et al.
LineArt: A Knowledge-guided Training-free High-quality Appearance Transfer for Design Drawing with Diffusion Model
Xi Wang, Hongzhen Li, Heng Fang et al.
C4D: 4D Made from 3D through Dual Correspondences
Shizun Wang, Zhenxiang Jiang, Xingyi Yang et al.
FruitNinja: 3D Object Interior Texture Generation with Gaussian Splatting
Fangyu Wu, Yuhao Chen
GAP: Gaussianize Any Point Clouds with Text Guidance
Weiqi Zhang, Junsheng Zhou, Haotian Geng et al.
Visual-Oriented Fine-Grained Knowledge Editing for MultiModal Large Language Models
Zhen Zeng, Leijiang Gu, Xun Yang et al.
Not All Frame Features Are Equal: Video-to-4D Generation via Decoupling Dynamic-Static Features
Liying Yang, Chen Liu, Zhenwei Zhu et al.
Synthetic-powered predictive inference
Meshi Bashari, Roy Maor Lotan, Yonghoon Lee et al.
Beyond [cls]: Exploring the True Potential of Masked Image Modeling Representations
Marcin Przewięźlikowski, Randall Balestriero, Wojciech Jasiński et al.
AGC-Drive: A Large-Scale Dataset for Real-World Aerial-Ground Collaboration in Driving Scenarios
Yunhao Hou, Bochao Zou, Min Zhang et al.
Unveiling Concept Attribution in Diffusion Models
Nguyen Hung-Quang, Hoang Phan, Khoa D Doan
VOVTrack: Exploring the Potentiality in Raw Videos for Open-Vocabulary Multi-Object Tracking
Zekun Qian, Ruize Han, Junhui Hou et al.
Position: Bridge the Gaps between Machine Unlearning and AI Regulation
Bill Marino, Meghdad Kurmanji, Nicholas Lane
Generating 3D-Consistent Videos from Unposed Internet Photos
Gene Chou, Kai Zhang, Sai Bi et al.
Event Fields: Capturing Light Fields at High Speed, Resolution, and Dynamic Range
Ziyuan Qu, Zihao Zou, Vivek Boominathan et al.
Rethinking Neural Combinatorial Optimization for Vehicle Routing Problems with Different Constraint Tightness Degrees
Fu Luo, Yaoxin Wu, Zhi Zheng et al.
OpenSDI: Spotting Diffusion-Generated Images in the Open World
Yabin Wang, Zhiwu Huang, Xiaopeng Hong
Can LLMs Outshine Conventional Recommenders? A Comparative Evaluation
Qijiong Liu, Jieming Zhu, Lu Fan et al.
JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation
Kai Liu, Jungang Li, Yuchong Sun et al.
STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving
Christian Fruhwirth-Reisinger, Dušan Malić, Wei Lin et al.
Absorb and Converge: Provable Convergence Guarantee for Absorbing Discrete Diffusion Models
Yuchen Liang, Renxiang Huang, Lifeng Lai et al.
Learning to Integrate Diffusion ODEs by Averaging the Derivatives
Wenze Liu, Xiangyu Yue
On the Out-Of-Distribution Generalization of Large Multimodal Models
Xingxuan Zhang, Jiansheng Li, Wenjing Chu et al.
MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness
Yunlong Tang, Pinxin Liu, Mingqian Feng et al.
Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attack on Breast Ultrasound Images
Yasamin Medghalchi, Moein Heidari, Clayton Allard et al.
VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization
Sihan Yang, Runsen Xu, Chenhang Cui et al.
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks
Vishnu Sarukkai, Zhiqiang Xie, Kayvon Fatahalian
VerbDiff: Text-Only Diffusion Models with Enhanced Interaction Awareness
SeungJu Cha, Kwanyoung Lee, Ye-Chan Kim et al.
Humans as a Calibration Pattern: Dynamic 3D Scene Reconstruction from Unsynchronized and Uncalibrated Videos
Changwoon Choi, Jeongjun Kim, Geonho Cha et al.
SURGEON: Memory-Adaptive Fully Test-Time Adaptation via Dynamic Activation Sparsity
Ke Ma, Jiaqi Tang, Bin Guo et al.
FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models
Yan Gao, Massimo R. Scamarcia, Javier Fernandez-Marques et al.
IDEA-Bench: How Far are Generative Models from Professional Designing?
Chen Liang, Lianghua Huang, Jingwu Fang et al.
Flatten Graphs as Sequences: Transformers are Scalable Graph Generators
Dexiong Chen, Markus Krimmel, Karsten Borgwardt
Enhancing Dataset Distillation via Non-Critical Region Refinement
Minh-Tuan Tran, Trung Le, Xuan-May Le et al.
PLEIADES: Building Temporal Kernels with Orthogonal Polynomials
Yan Ru Pei, Olivier Coenen
Rethinking Multi-modal Object Detection from the Perspective of Mono-Modality Feature Learning
Tianyi Zhao, Boyang Liu, Yanglei Gao et al.
EconGym: A Scalable AI Testbed with Diverse Economic Tasks
Qirui Mi, Qipeng Yang, Zijun Fan et al.
Attend to Not Attended: Structure-then-Detail Token Merging for Post-training DiT Acceleration
Haipeng Fang, Sheng Tang, Juan Cao et al.
DeGauss: Dynamic-Static Decomposition with Gaussian Splatting for Distractor-free 3D Reconstruction
Rui Wang, Quentin Lohmeyer, Mirko Meboldt et al.
Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs
Yifan Shen, Yuanzhe Liu, Jingyuan Zhu et al.
Breaking the Discretization Barrier of Continuous Physics Simulation Learning
Fan Xu, Hao Wu, Nan Wang et al.
Rethinking Tokenized Graph Transformers for Node Classification
Jinsong Chen, Chenyang Li, Gaichao Li et al.
Conformal Prediction for Ensembles: Improving Efficiency via Score-Based Aggregation
Yash Patel, Eduardo Ochoa Rivera, Ambuj Tewari
Relation3D: Enhancing Relation Modeling for Point Cloud Instance Segmentation
Edward Loo, Jiacheng Deng
Auto-Vocabulary Semantic Segmentation
Osman Ülger, Maksymilian Kulicki, Yuki Asano et al.
Channel Consistency Prior and Self-Reconstruction Strategy Based Unsupervised Image Deraining
Guanglu Dong, Tianheng Zheng, Yuanzhouhan Cao et al.
Language Models can Self-Improve at State-Value Estimation for Better Search
Ethan Mendes, Alan Ritter
Video Motion Graphs
Haiyang Liu, Zhan Xu, Fating Hong et al.
Dynamic Risk Assessments for Offensive Cybersecurity Agents
Boyi Wei, Benedikt Stroebl, Jiacen Xu et al.
Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving
Hao Zhou, Zhanning Gao, Zhili Chen et al.
Reconstructing In-the-Wild Open-Vocabulary Human-Object Interactions
Boran Wen, Dingbang Huang, Zichen Zhang et al.
TF-TI2I: Training-Free Text-and-Image-to-Image Generation via Multi-Modal Implicit-Context Learning In Text-to-Image Models
Teng-Fang Hsiao, Bo-Kai Ruan, Yi-Lun Wu et al.
MarkushGrapher: Joint Visual and Textual Recognition of Markush Structures
Lucas Morin, Valery Weber, Ahmed Nassar et al.
Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime
Amit Attia, Matan Schliserman, Uri Sherman et al.
MagicHOI: Leveraging 3D Priors for Accurate Hand-object Reconstruction from Short Monocular Video Clips
Shibo Wang, Haonan He, Maria Parelli et al.
ReRAW: RGB-to-RAW Image Reconstruction via Stratified Sampling for Efficient Object Detection on the Edge
Radu Berdan, Beril Besbinar, Christoph Reinders et al.
Multi-modal Medical Diagnosis via Large-small Model Collaboration
Wanyi Chen, Zihua Zhao, Jiangchao Yao et al.
MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh
Shuangkang Fang, I-Chao Shen, Yufeng Wang et al.
Always Tell Me The Odds: Fine-grained Conditional Probability Estimation
Liaoyaqi Wang, Zhengping Jiang, Anqi Liu et al.
Quantifying Fairness in LLMs Beyond Tokens: A Semantic and Statistical Perspective
Weijie Xu, Yiwen Wang, Chi Xue et al.
Focal-SAM: Focal Sharpness-Aware Minimization for Long-Tailed Classification
Sicong Li, Qianqian Xu, Zhiyong Yang et al.
Can RLHF be More Efficient with Imperfect Reward Models? A Policy Coverage Perspective
Jiawei Huang, Bingcong Li, Christoph Dann et al.
Can Performant LLMs Be Ethical? Quantifying the Impact of Web Crawling Opt-Outs
Dongyang Fan, Vinko Sabolčec, Matin Ansaripour et al.
Transformative or Conservative? Conservation laws for ResNets and Transformers
Sibylle Marcotte, Rémi Gribonval, Gabriel Peyré
Towards Universal Offline Black-Box Optimization via Learning Language Model Embeddings
Rong-Xi Tan, Ming Chen, Ke Xue et al.
Overcoming Vocabulary Constraints with Pixel-level Fallback
Jonas F. Lotz, Hendra Setiawan, Stephan Peitz et al.
Efficient Parallel Training Methods for Spiking Neural Networks with Constant Time Complexity
Wanjin Feng, Xingyu Gao, Wenqian Du et al.
X-Hacking: The Threat of Misguided AutoML
Rahul Sharma, Sumantrak Mukherjee, Andrea Šipka et al.
Differential Privacy Under Class Imbalance: Methods and Empirical Insights
Lucas Rosenblatt, Yuliia Lut, Ethan Turok et al.
ADHMR: Aligning Diffusion-based Human Mesh Recovery via Direct Preference Optimization
Wenhao Shen, Wanqi Yin, Xiaofeng Yang et al.
LLaVA-ReID: Selective Multi-image Questioner for Interactive Person Re-Identification
Yiding Lu, Mouxing Yang, Dezhong Peng et al.
M3-JEPA: Multimodal Alignment via Multi-gate MoE based on the Joint-Embedding Predictive Architecture
Hongyang Lei, Xiaolong Cheng, Qi Qin et al.
Understanding Model Ensemble in Transferable Adversarial Attack
Wei Yao, Zeliang Zhang, Huayi Tang et al.
Prune 'n Predict: Optimizing LLM Decision-making with Conformal Prediction
Harit Vishwakarma, Alan Mishler, Thomas Cook et al.
Boosting Virtual Agent Learning and Reasoning: A Step-Wise, Multi-Dimensional, and Generalist Reward Model with Benchmark
Bingchen Miao, Yang Wu, Minghe Gao et al.
Fast and Low-Cost Genomic Foundation Models via Outlier Removal
Haozheng Luo, Chenghao Qiu, Maojiang Su et al.
True Multimodal In-Context Learning Needs Attention to the Visual Context
Shuo Chen, Jianzhe Liu, Zhen Han et al.
Emotional Face-to-Speech
Jiaxin Ye, Boyuan Cao, Hongming Shan
Policy Design for Two-sided Platforms with Participation Dynamics
Haruka Kiyohara, Fan Yao, Sarah Dean
EvalAgents: Discovering Implicit Evaluation Criteria from the Web
Manya Wadhwa, Zayne Rea Sprague, Chaitanya Malaviya et al.
CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation
Minghao Fu, Guo-Hua Wang, Liangfu Cao et al.
UncertainSAM: Fast and Efficient Uncertainty Quantification of the Segment Anything Model
Timo Kaiser, Thomas Norrenbrock, Bodo Rosenhahn
PANDAS: Improving Many-shot Jailbreaking via Positive Affirmation, Negative Demonstration, and Adaptive Sampling
Avery Ma, Yangchen Pan, Amir-massoud Farahmand
Data-Centric Human Preference with Rationales for Direct Preference Alignment
Hoang Anh Just, Ming Jin, Anit Kumar Sahu et al.
From Next-Token to Mathematics: The Learning Dynamics of Mathematical Reasoning in Language Models
Shubhra Mishra, Gabriel Poesia, Noah Goodman
QUDsim: Quantifying Discourse Similarities in LLM-Generated Text
Ramya Namuduri, Yating Wu, Anshun Asher Zheng et al.
SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching
Yuxuan Zhu, Ali Falahati, David H. Yang et al.
Control the Temperature: Selective Sampling for Diverse and High-Quality LLM Outputs
Sergey Troshin, Wafaa Mohammed, Yan Meng et al.
Generalized Venn and Venn-Abers Calibration with Applications in Conformal Prediction
Lars van der Laan, Ahmed Alaa
Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs
Itay Itzhak, Yonatan Belinkov, Gabriel Stanovsky