Most Cited 2025 "temporal question-answering" Papers
22,274 papers found • Page 21 of 112
Conference
UniCoTT: A Unified Framework for Structural Chain-of-Thought Distillation
Xianwei Zhuang, Zhihong Zhu, Zhichang Wang et al.
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
Jinlong Li, Cristiano Saltori, Fabio Poiesi et al.
Predicting the Original Appearance of Damaged Historical Documents
Zhenhua Yang, Dezhi Peng, Yongxin Shi et al.
KinMo: Kinematic-aware Human Motion Understanding and Generation
Pengfei Zhang, Pinxin Liu, Pablo Garrido et al.
Instruction-guided Multi-Granularity Segmentation and Captioning with Large Multimodal Model
Xu Yuan, Li Zhou, Zenghui Sun et al.
Selective Prompt Anchoring for Code Generation
Yuan Tian, Tianyi Zhang
SMITE: Segment Me In TimE
Amirhossein Alimohammadi, Sauradip Nag, Saeid Asgari et al.
DanceFix: An Exploration in Group Dance Neatness Assessment Through Fixing Abnormal Challenges of Human Pose
Huangbiao Xu, Xiao Ke, Huanqi Wu et al.
Effective and Efficient Time-Varying Counterfactual Prediction with State-Space Models
Haotian Wang, Haoxuan Li, Hao Zou et al.
Glauber Generative Model: Discrete Diffusion Models via Binary Classification
Harshit Varma, Dheeraj Nagaraj, Karthikeyan Shanmugam
Details Enhancement in Unsigned Distance Field Learning for High-fidelity 3D Surface Reconstruction
Cheng Xu, Fei Hou, Wencheng Wang et al.
Scene Map-based Prompt Tuning for Navigation Instruction Generation
Sheng Fan, Rui Liu, Wenguan Wang et al.
Understanding Fairness Surrogate Functions in Algorithmic Fairness
Yong Liu, (Andrew) Zhanke Zhou, Zhicong Li et al.
Privacy amplification by random allocation
Moshe Shenfeld, Vitaly Feldman
What Do Latent Action Models Actually Learn?
Chuheng Zhang, Tim Pearce, Pushi Zhang et al.
Towards Robustness and Explainability of Automatic Algorithm Selection
Xingyu Wu, Jibin Wu, Yu Zhou et al.
BRAID: Input-driven Nonlinear Dynamical Modeling of Neural-Behavioral Data
Parsa Vahidi, Omid G. Sani, Maryam Shanechi
CVLUE: A New Benchmark Dataset for Chinese Vision-Language Understanding Evaluation
Yuxuan Wang, Yijun Liu, Fei Yu et al.
Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation
adil kaan akan, Yucel Yemez
Alignment-Free RGB-T Salient Object Detection: A Large-Scale Dataset and Progressive Correlation Network
Kunpeng Wang, Keke Chen, Chenglong Li et al.
GS-LIVM: Real-Time Photo-Realistic LiDAR-Inertial-Visual Mapping with Gaussian Splatting
Yusen XIE, Zhenmin Huang, Jin Wu et al.
Privacy Attacks on Image AutoRegressive Models
Antoni Kowalczuk, Jan Dubiński, Franziska Boenisch et al.
PALMBENCH: A COMPREHENSIVE BENCHMARK OF COMPRESSED LARGE LANGUAGE MODELS ON MOBILE PLATFORMS
Yilong Li, Jingyu Liu, Hao Zhang et al.
Doubly Robust Conformalized Survival Analysis with Right-Censored Data
Matteo Sesia, vladimir svetnik
CL-LoRA: Continual Low-Rank Adaptation for Rehearsal-Free Class-Incremental Learning
Jiangpeng He, Zhihao Duan, Fengqing Zhu
Anywhere: A Multi-Agent Framework for User-Guided, Reliable, and Diverse Foreground-Conditioned Image Generation
Xie Tianyidan, Rui Ma, Qian Wang et al.
LIRM: Large Inverse Rendering Model for Progressive Reconstruction of Shape, Materials and View-dependent Radiance Fields
Zhengqin Li, Dilin Wang, Ka chen et al.
EEE-Bench: A Comprehensive Multimodal Electrical And Electronics Engineering Benchmark
Ming Li, Jike Zhong, Tianle Chen et al.
Neural LightRig: Unlocking Accurate Object Normal and Material Estimation with Multi-Light Diffusion
Zexin He, Tengfei Wang, Xin Huang et al.
Adaptive Part Learning for Fine-Grained Generalized Category Discovery: A Plug-and-Play Enhancement
Qiyuan Dai, Hanzhuo Huang, Yu Wu et al.
Learning Safety Constraints for Large Language Models
Xin Chen, Yarden As, Andreas Krause
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis
Ulyana Piterbarg, Lerrel Pinto, Rob Fergus
ARIG: Autoregressive Interactive Head Generation for Real-time Conversations
Ying Guo, Xi Liu, Cheng Zhen et al.
Contextual AD Narration with Interleaved Multimodal Sequence
Hanlin Wang, Zhan Tong, Kecheng Zheng et al.
Beyond Human Data: Aligning Multimodal Large Language Models by Iterative Self-Evolution
Wentao Tan, Qiong Cao, Yibing Zhan et al.
Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing
Yudong Liu, Jingwei Sun, Yueqian Lin et al.
StdGEN: Semantic-Decomposed 3D Character Generation from Single Images
Yuze He, Yanning Zhou, Wang Zhao et al.
Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment
Yaling Shen, Zhixiong Zhuang, Kun Yuan et al.
Compression via Pre-trained Transformers: A Study on Byte-Level Multimodal Data
David Heurtel-Depeiges, Anian Ruoss, Joel Veness et al.
CrossOver: 3D Scene Cross-Modal Alignment
Sayan Deb Sarkar, Ondrej Miksik, Marc Pollefeys et al.
A Simple yet Effective Layout Token in Large Language Models for Document Understanding
Zhaoqing Zhu, Chuwei Luo, Zirui Shao et al.
Symbolic regression via MDLformer-guided search: from minimizing prediction error to minimizing description length
Zihan Yu, Jingtao Ding, Yong Li et al.
GFlowVLM: Enhancing Multi-step Reasoning in Vision-Language Models with Generative Flow Networks
Haoqiang Kang, Enna Sachdeva, Piyush Gupta et al.
Impossible Videos
Zechen Bai, Hai Ci, Mike Zheng Shou
Inversion Circle Interpolation: Diffusion-based Image Augmentation for Data-scarce Classification
Yanghao Wang, Long Chen
Second Order Bounds for Contextual Bandits with Function Approximation
Aldo Pacchiano
AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models
Xinghui Li, Qichao Sun, Pengze Zhang et al.
PIG: Physics-Informed Gaussians as Adaptive Parametric Mesh Representations
Namgyu Kang, Jaemin Oh, Youngjoon Hong et al.
Segment Any 3D Object with Language
Seungjun Lee, Yuyang Zhao, Gim H Lee
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation
Liang Chen, Sinan Tan, Zefan Cai et al.
Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos
Rundong Luo, Matthew Wallingford, Ali Farhadi et al.
A multiscale analysis of mean-field transformers in the moderate interaction regime
Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi
SwiftTry: Fast and Consistent Video Virtual Try-On with Diffusion Models
Hung Nguyen, Quang Qui-Vinh Nguyen, Khoi Nguyen et al.
Loss Functions and Operators Generated by f-Divergences
Vincent Roulet, Tianlin Liu, Nino Vieillard et al.
ESE: Espresso Sentence Embeddings
Xianming Li, Zongxi Li, Jing Li et al.
Motion-aware Contrastive Learning for Temporal Panoptic Scene Graph Generation
Thong Thanh Nguyen, Xiaobao Wu, Yi Bin et al.
Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model
Shengjun Zhang, Jinzhao Li, Xin Fei et al.
Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics
Lee Chae-Yeon, Oh Hyun-Bin, Han EunGi et al.
Uncertainty Modeling in Graph Neural Networks via Stochastic Differential Equations
Richard Bergna, Sergio Calvo Ordoñez, Felix Opolka et al.
Fine-structure Preserved Real-world Image Super-resolution via Transfer VAE Training
Qiaosi Yi, Shuai Li, Rongyuan Wu et al.
GenesisTex2: Stable, Consistent and High-Quality Text-to-Texture Generation
Jiawei Lu, YingPeng Zhang, Zengjun Zhao et al.
Emergence and Evolution of Interpretable Concepts in Diffusion Models
Berk Tinaz, Zalan Fabian, Mahdi Soltanolkotabi
ChatHuman: Chatting about 3D Humans with Tools
Jing Lin, Yao Feng, Weiyang Liu et al.
ChatGen: Automatic Text-to-Image Generation From FreeStyle Chatting
Chengyou Jia, Changliang Xia, Zhuohang Dang et al.
FreSh: Frequency Shifting for Accelerated Neural Representation Learning
Adam Kania, Marko Mihajlovic, Sergey Prokudin et al.
Scaling Laws for Task-Optimized Models of the Primate Visual Ventral Stream
Abdulkadir Gokce, Martin Schrimpf
Overcoming the Curse of Dimensionality in Reinforcement Learning Through Approximate Factorization
Chenbei Lu, Laixi Shi, Zaiwei Chen et al.
HQGS: High-Quality Novel View Synthesis with Gaussian Splatting in Degraded Scenes
Xin Lin, Shi Luo, Xiaojun Shan et al.
Perception in Reflection
Yana Wei, Liang Zhao, Kangheng Lin et al.
Arbitrary Reading Order Scene Text Spotter with Local Semantics Guidance
Jiahao Lyu, Wei Wang, Dongbao Yang et al.
Robust and Conjugate Spatio-Temporal Gaussian Processes
William Laplante, Matias Altamirano, Andrew Duncan et al.
Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map
Xinyuan Chang, Maixuan Xue, Xinran Liu et al.
SAM2-LOVE: Segment Anything Model 2 in Language-aided Audio-Visual Scenes
Yuji Wang, Haoran Xu, Yong Liu et al.
Keeping an Eye on LLM Unlearning: The Hidden Risk and Remedy
Jie Ren, Zhenwei Dai, Xianfeng Tang et al.
Zero-Shot Novel View and Depth Synthesis with Multi-View Geometric Diffusion
Vitor Guizilini, Muhammad Zubair Irshad, Dian Chen et al.
LLAVIDAL: A Large LAnguage VIsion Model for Daily Activities of Living
Dominick Reilly, Rajatsubhra Chakraborty, Arkaprava Sinha et al.
RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness
Fanhu Zeng, Haiyang Guo, Fei Zhu et al.
Pursuing Feature Separation based on Neural Collapse for Out-of-Distribution Detection
Yingwen Wu, Ruiji Yu, Xinwen Cheng et al.
Ultra-Resolution Adaptation with Ease
Ruonan Yu, Songhua Liu, Zhenxiong Tan et al.
TR-PTS: Task-Relevant Parameter and Token Selection for Efficient Tuning
Siqi Luo, Haoran Yang, Yi Xin et al.
SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing
Yingying Zhang, Lixiang Ru, Kang Wu et al.
Noise Modeling in One Hour: Minimizing Preparation Efforts for Self-supervised Low-Light RAW Image Denoising
Feiran Li, Haiyang Jiang, Daisuke Iso
SEAL: Semantic Attention Learning for Long Video Representation
Lan Wang, Yujia Chen, Wen-Sheng Chu et al.
POp-GS: Next Best View in 3D-Gaussian Splatting with P-Optimality
Joey Wilson, Marcelino M. de Almeida, Sachit Mahajan et al.
COLUMBUS: Evaluating COgnitive Lateral Understanding Through Multiple-Choice reBUSes
Koen Kraaijveld, Yifan Jiang, Kaixin Ma et al.
EvEnhancer: Empowering Effectiveness, Efficiency and Generalizability for Continuous Space-Time Video Super-Resolution with Events
Shuoyan Wei, Feng Li, Shengeng Tang et al.
PQMass: Probabilistic Assessment of the Quality of Generative Models using Probability Mass Estimation
Pablo Lemos, Sammy Sharief, Nikolay Malkin et al.
SEMU: Singular Value Decomposition for Efficient Machine Unlearning
Marcin Sendera, Łukasz Struski, Kamil Książek et al.
Value-Guided Search for Efficient Chain-of-Thought Reasoning
Kaiwen Wang, Jin Zhou, Jonathan Chang et al.
CausalRivers - Scaling up benchmarking of causal discovery for real-world time-series
Gideon Stein, Maha Shadaydeh, Jan Blunk et al.
Generating Freeform Endoskeletal Robots
Muhan Li, Lingji Kong, Sam Kriegman
NOVA: A Benchmark for Rare Anomaly Localization and Clinical Reasoning in Brain MRI
Cosmin Bercea, Jun Li, Philipp Raffler et al.
DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
Junjie Wang, BIN CHEN, Yulin Li et al.
CAM: A Constructivist View of Agentic Memory for LLM-Based Reading Comprehension
Rui Li, Zeyu Zhang, Xiaohe Bo et al.
Doubly Contrastive Learning for Source-Free Domain Adaptive Person Search
Yizhen Jia, Rong Quan, Yue Feng et al.
DA-VPT: Semantic-Guided Visual Prompt Tuning for Vision Transformers
Li Ren, Chen Chen, Liqiang Wang et al.
Dynamic Typography: Bringing Text to Life via Video Diffusion Prior
Zichen Liu, Yihao Meng, Hao Ouyang et al.
3D Gaussian Head Avatars with Expressive Dynamic Appearances by Compact Tensorial Representations
yating wang, Xuan Wang, Ran Yi et al.
VideoMage: Multi-Subject and Motion Customization of Text-to-Video Diffusion Models
Chi-Pin Huang, Yen-Siang Wu, Hung-Kai Chung et al.
PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer
Pierre-David Letourneau, Manish Singh, Hsin-Pai Cheng et al.
HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene
Jianing Chen, Zehao Li, Yujun Cai et al.
DuMo: Dual Encoder Modulation Network for Precise Concept Erasure
Feng Han, Kai Chen, Chao Gong et al.
Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners
Michal Nauman, Marek Cygan, Carmelo Sferrazza et al.
NeSyC: A Neuro-symbolic Continual Learner For Complex Embodied Tasks in Open Domains
Wonje Choi, Jinwoo Park, Sanghyun Ahn et al.
Extrapolated Urban View Synthesis Benchmark
Xiangyu Han, Zhen Jia, Boyi Li et al.
GauSTAR: Gaussian Surface Tracking and Reconstruction
Chengwei Zheng, Lixin Xue, Juan Jose Zarate et al.
Multi-Perspective Data Augmentation for Few-shot Object Detection
Anh-Khoa Nguyen Vu, Quoc Truong Truong, Vinh-Tiep Nguyen et al.
Directional Gradient Projection for Robust Fine-Tuning of Foundation Models
Chengyue Huang, Junjiao Tian, Brisa Maneechotesuwan et al.
Robustness Auditing for Linear Regression: To Singularity and Beyond
Ittai Rubinstein, Samuel Hopkins
CTSyn: A Foundation Model for Cross Tabular Data Generation
Xiaofeng Lin, Chenheng Xu, Matthew Yang et al.
MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation
Jinnan Chen, Lingting Zhu, Zeyu HU et al.
Exploring Historical Information for RGBE Visual Tracking with Mamba
Chuanyu Sun, Jiqing Zhang, Yang Wang et al.
Triples as the Key: Structuring Makes Decomposition and Verification Easier in LLM-based TableQA
Zhen Yang, Ziwei Du, Minghan Zhang et al.
FlowR: Flowing from Sparse to Dense 3D Reconstructions
Tobias Fischer, Samuel Rota Bulò, Yung-Hsu Yang et al.
FOCUS: Knowledge-enhanced Adaptive Visual Compression for Few-shot Whole Slide Image Classification
Zhengrui Guo, Conghao Xiong, Jiabo MA et al.
GaussRender: Learning 3D Occupancy with Gaussian Rendering
Loick Chambon, Eloi Zablocki, Alexandre Boulch et al.
AutoElicit: Using Large Language Models for Expert Prior Elicitation in Predictive Modelling
Alexander Capstick, Rahul G. Krishnan, Payam Barnaghi
Open CaptchaWorld: A Comprehensive Web-based Platform for Testing and Benchmarking Multimodal LLM Agents
Yaxin Luo, Zhaoyi Li, Jiacheng Liu et al.
VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling
Hyojun Go, Byeongjun Park, Hyelin Nam et al.
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
Hyojin Bahng, Caroline Chan, Fredo Durand et al.
ActionReasoningBench: Reasoning about Actions with and without Ramification Constraints
Divij Handa, Pavel Dolin, Shrinidhi Kumbhar et al.
Position: We Need An Algorithmic Understanding of Generative AI
Oliver Eberle, Thomas McGee, Hamza Giaffar et al.
Lifelong Safety Alignment for Language Models
Haoyu Wang, Yifei Zhao, Zeyu Qin et al.
Latent Policy Barrier: Learning Robust Visuomotor Policies by Staying In-Distribution
Zhanyi Sun, Shuran Song
Out of Length Text Recognition with Sub-String Matching
Yongkun Du, Zhineng Chen, Caiyan Jia et al.
Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations
Xiang Xu, Lingdong Kong, Song Wang et al.
Beyond Verifiable Rewards: Scaling Reinforcement Learning in Language Models to Unverifiable Data
Yunhao Tang, Sid Wang, Lovish Madaan et al.
Data-Juicer Sandbox: A Feedback-Driven Suite for Multimodal Data-Model Co-development
Daoyuan Chen, Haibin Wang, Yilun Huang et al.
RI3D: Few-Shot Gaussian Splatting With Repair and Inpainting Diffusion Priors
Avinash Paliwal, xilong zhou, Wei Ye et al.
Generative RLHF-V: Learning Principles from Multi-modal Human Preference
Jiayi Zhou, Jiaming Ji, Boyuan Chen et al.
Straight-Line Diffusion Model for Efficient 3D Molecular Generation
Yuyan Ni, Shikun Feng, Haohan Chi et al.
UNIC-Adapter: Unified Image-instruction Adapter with Multi-modal Transformer for Image Generation
Lunhao Duan, Shanshan Zhao, Wenjun Yan et al.
Evaluating Neuron Explanations: A Unified Framework with Sanity Checks
Tuomas Oikarinen, Ge Yan, Lily Weng
Vision-Language Models Can't See the Obvious
YASSER ABDELAZIZ DAHOU DJILALI, Ngoc Huynh, Phúc Lê Khắc et al.
MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines
Dongzhi Jiang, Renrui Zhang, Ziyu Guo et al.
Training-Free Constrained Generation With Stable Diffusion Models
Stefano Zampini, Jacob K Christopher, Luca Oneto et al.
DISCO: learning to DISCover an evolution Operator for multi-physics-agnostic prediction
Rudy Morel, Jiequn Han, Edouard Oyallon
Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration
Max Wilcoxson, Qiyang Li, Kevin Frans et al.
H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving
Siran Chen, Yuxiao Luo, Yue Ma et al.
Flowing Datasets with Wasserstein over Wasserstein Gradient Flows
Clément Bonet, Christophe Vauthier, Anna Korba
BrainACTIV: Identifying visuo-semantic properties driving cortical selectivity using diffusion-based image manipulation
Diego García Cerdas, Christina Sartzetaki, Magnus Petersen et al.
Task Generalization with Autoregressive Compositional Structure: Can Learning from $D$ Tasks Generalize to $D^T$ Tasks?
Amirhesam Abedsoltan, Huaqing Zhang, Kaiyue Wen et al.
MATCHA: Towards Matching Anything
Fei Xue, Sven Elflein, Laura Leal-Taixe et al.
On the Relation between Rectified Flows and Optimal Transport
Johannes Hertrich, Antonin Chambolle, Julie Delon
Text2Relight: Creative Portrait Relighting with Text Guidance
Junuk Cha, Mengwei Ren, Krishna Kumar Singh et al.
MambaVLT: Time-Evolving Multimodal State Space Model for Vision-Language Tracking
Xinqi Liu, Li Zhou, Zikun Zhou et al.
WaterDiffusion: Learning a Prior-involved Unrolling Diffusion for Joint Underwater Saliency Detection and Visual Restoration
Laibin Chang, Yunke Wang, Longxiang Deng et al.
DMWM: Dual-Mind World Model with Long-Term Imagination
Lingyi Wang, Rashed Shelim, Walid Saad et al.
PROXSPARSE: REGULARIZED LEARNING OF SEMI-STRUCTURED SPARSITY MASKS FOR PRETRAINED LLMS
Hongyi Liu, Rajarshi Saha, Zhen Jia et al.
ArtiScene: Language-Driven Artistic 3D Scene Generation Through Image Intermediary
Zeqi Gu, Yin Cui, Max Li et al.
Intrinsic Dimension Correlation: uncovering nonlinear connections in multimodal representations
Lorenzo Basile, Santiago Acevedo, Luca Bortolussi et al.
Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models
Cameron Tice, Philipp Kreer, Nathan Helm-Burger et al.
T-SHIRT: Token-Selective Hierarchical Data Selection for Instruction Tuning
Yanjun Fu, Faisal Hamman, Sanghamitra Dutta
ProtoArgNet: Interpretable Image Classification with Super-Prototypes and Argumentation
Hamed Ayoobi, Nico Potyka, Francesca Toni
Understanding and Mitigating Memorization in Diffusion Models for Tabular Data
Zhengyu Fang, Zhimeng Jiang, Huiyuan Chen et al.
Dynamical Diffusion: Learning Temporal Dynamics with Diffusion Models
Xingzhuo Guo, Yu Zhang, Baixu Chen et al.
DEALing with Image Reconstruction: Deep Attentive Least Squares
Mehrsa Pourya, Erich Kobler, Michael Unser et al.
DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy
Yuran Wang, Ruihai Wu, Yue Chen et al.
Exploit Your Latents: Coarse-Grained Protein Backmapping with Latent Diffusion Models
Rongchao Zhang, Yu Huang, Yiwei Lou et al.
ELICIT: LLM Augmentation Via External In-context Capability
Futing Wang, Jianhao (Elliott) Yan, Yue Zhang et al.
UniPre3D: Unified Pre-training of 3D Point Cloud Models with Cross-Modal Gaussian Splatting
Ziyi Wang, Yanran Zhang, Jie Zhou et al.
Understanding the Limits of Deep Tabular Methods with Temporal Shift
Haorun Cai, Han-Jia Ye
Always Skip Attention
Yiping Ji, Hemanth Saratchandran, Peyman Moghadam et al.
Tracing the Representation Geometry of Language Models from Pretraining to Post-training
Melody Li, Kumar Krishna Agrawal, Arna Ghosh et al.
The Persistence of Neural Collapse Despite Low-Rank Bias
Connall Garrod, Jonathan Keating
Mask in the Mirror: Implicit Sparsification
Tom Jacobs, Rebekka Burkholz
PhysAug: A Physical-guided and Frequency-based Data Augmentation for Single-Domain Generalized Object Detection
Xiaoran Xu, Jiangang Yang, Wenhui Shi et al.
Linear combinations of latents in generative models: subspaces and beyond
Erik Bodin, Alexandru Stere, Dragos Margineantu et al.
DualCP: Rehearsal-Free Domain-Incremental Learning via Dual-Level Concept Prototype
Qiang Wang, Yuhang He, Songlin Dong et al.
Point Clouds Meets Physics: Dynamic Acoustic Field Fitting Network for Point Cloud Understanding
Changshuo Wang, Shuting He, Xiang Fang et al.
Are Expressive Models Truly Necessary for Offline RL?
Guan Wang, Haoyi Niu, Jianxiong Li et al.
SUMI-IFL: An Information-Theoretic Framework for Image Forgery Localization with Sufficiency and Minimality Constraints
Ziqi Sheng, Wei Lu, Xiangyang Luo et al.
HumanoidGen: Data Generation for Bimanual Dexterous Manipulation via LLM Reasoning
Zhi Jing, Siyuan Yang, Jicong Ao et al.
MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems
Xuanming Zhang, Yuxuan Chen, Samuel (Min-Hsuan) Yeh et al.
CSformer: Combining Channel Independence and Mixing for Robust Multivariate Time Series Forecasting
Haoxin Wang, Yipeng Mo, Kunlan Xiang et al.
Runtime Analysis for Multi-Objective Evolutionary Algorithms in Unbounded Integer Spaces
Benjamin Doerr, Martin S. Krejca, Günter Rudolph
Bi-level Contrastive Learning for Knowledge-Enhanced Molecule Representations
Pengcheng Jiang, Cao Xiao, Tianfan Fu et al.
Týr-the-Pruner: Structural Pruning LLMs via Global Sparsity Distribution Optimization
Guanchen Li, Yixing Xu, Zeping Li et al.
Locally Convex Global Loss Network for Decision-Focused Learning
Haeun Jeon, Hyunglip Bae, Minsu Park et al.
LaTexBlend: Scaling Multi-concept Customized Generation with Latent Textual Blending
Jian Jin, Zhenbo Yu, Yang Shen et al.
MOL-Mamba: Enhancing Molecular Representation with Structural & Electronic Insights
Jingjing Hu, Dan Guo, Zhan Si et al.
A Unified Model for Compressed Sensing MRI Across Undersampling Patterns
Armeet Singh Jatyani, Jiayun Wang, Aditi Chandrashekar et al.
TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs
Yunheng Li, Jing Cheng, Shaoyong Jia et al.
GIViC: Generative Implicit Video Compression
Ge Gao, Siyue Teng, Tianhao Peng et al.
Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents
Han Lin, Jaemin Cho, Amir Zadeh et al.
Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning
Yang You, Yixin Li, Congyue Deng et al.
Utilize the Flow Before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning
Runchuan Zhu, Zhipeng Ma, Jiang Wu et al.
Dense SAE Latents Are Features, Not Bugs
Xiaoqing Sun, Alessandro Stolfo, Joshua Engels et al.
Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models
Xuran Ma, Yexin Liu, Yaofu LIU et al.
TaskGalaxy: Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types
Jiankang Chen, Tianke Zhang, Changyi Liu et al.
Reconstructing Humans with a Biomechanically Accurate Skeleton
Yan Xia, Xiaowei Zhou, Etienne Vouga et al.
Hypergraph Attacks via Injecting Homogeneous Nodes into Elite Hyperedges
Meixia He, Peican Zhu, Keke Tang et al.
YOLO-Count: Differentiable Object Counting for Text-to-Image Generation
Guanning Zeng, Xiang Zhang, Zirui Wang et al.
Aligning Language Models Using Follow-up Likelihood as Reward Signal
Chen Zhang, Dading Chong, Feng Jiang et al.
OmniCache: A Trajectory-Oriented Global Perspective on Training-Free Cache Reuse for Diffusion Transformer Models
Huanpeng Chu, Wei Wu, Guanyu Feng et al.
Parameter Efficient Fine-tuning via Explained Variance Adaptation
Fabian Paischer, Lukas Hauzenberger, Thomas Schmied et al.
Jailbreak-AudioBench: In-Depth Evaluation and Analysis of Jailbreak Threats for Large Audio Language Models
Hao Cheng, Erjia Xiao, Jing Shao et al.
PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask
Jeongho Kim, Hoiyeong Jin, Sunghyun Park et al.
Autocorrelation Matters: Understanding the Role of Initialization Schemes for State Space Models
Fusheng Liu, Qianxiao Li
IMDPrompter: Adapting SAM to Image Manipulation Detection by Cross-View Automated Prompt Learning
Quan Zhang, Yuxin Qi, Xi Tang et al.