Most Cited 2025 "temporal frame prediction" Papers
LDAdam: Adaptive Optimization from Low-Dimensional Gradient Statistics
Thomas Robert, Mher Safaryan, Ionut-Vlad Modoranu et al.
Tracking objects that change in appearance with phase synchrony
Sabine Muzellec, Drew Linsley, Alekh Ashok et al.
Descent with Misaligned Gradients and Applications to Hidden Convexity
Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar et al.
Diffusion State-Guided Projected Gradient for Inverse Problems
Rayhan Zirvi, Bahareh Tolooshams, Anima Anandkumar
Learning from weak labelers as constraints
Vishwajeet Agrawal, Rattana Pukdee, Nina Balcan et al.
A Distributional Approach to Uncertainty-Aware Preference Alignment Using Offline Demonstrations
Sheng Xu, Bo Yue, Hongyuan Zha et al.
Estimating the Probabilities of Rare Outputs in Language Models
Gabriel Wu, Jacob Hilton
Self-Normalized Resets for Plasticity in Continual Learning
Vivek Farias, Adam Jozefiak
Training on the Test Task Confounds Evaluation and Emergence
Ricardo Dominguez-Olmedo, Florian Eddie Dorner, Moritz Hardt
COME: Test-time Adaption by Conservatively Minimizing Entropy
Qingyang Zhang, Yatao Bian, Xinke Kong et al.
Oracle efficient truncated statistics
Konstantinos Karatapanis, Vasilis Kontonis, Christos Tzamos
Training Free Guided Flow-Matching with Optimal Control
Luran Wang, Chaoran Cheng, Yizhen Liao et al.
SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation
Mingjie Li, Wai Man Si, Michael Backes et al.
BTBS-LNS: Binarized-Tightening, Branch and Search on Learning LNS Policies for MIP
Hao Yuan, Wenli Ouyang, Changwen Zhang et al.
Pre-training of Foundation Adapters for LLM Fine-tuning
Linh The Nguyen, Dat Quoc Nguyen
NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation
Zhiyuan Liu, Yanchen Luo, Han Huang et al.
NeuroLM: A Universal Multi-task Foundation Model for Bridging the Gap between Language and EEG Signals
Wei-Bang Jiang, Yansen Wang, Bao-liang Lu et al.
Conformalized Interactive Imitation Learning: Handling Expert Shift and Intermittent Feedback
Michelle Zhao, Henny Admoni, Reid Simmons et al.
A Computational Framework for Modeling Emergence of Color Vision in the Human Brain
Atsunobu Kotani, Yi-Ren Ng
From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency
Kaiyue Wen, Huaqing Zhang, Hongzhou Lin et al.
Unsupervised Multiple Kernel Learning for Graphs via Ordinality Preservation
Yan Sun, Stanley Kok
Collaborative Discrete-Continuous Black-Box Prompt Learning for Language Models
Hualin Zhang, Haozhen Zhang, Zhekai Liu et al.
Generalizable Human Gaussians from Single-View Image
Jinnan Chen, Chen Li, Jianfeng Zhang et al.
Leveraging Flatness to Improve Information-Theoretic Generalization Bounds for SGD
Ze Peng, Jian Zhang, Yisen Wang et al.
Efficient Interpolation between Extragradient and Proximal Methods for Weak MVIs
Thomas Pethick, Ioannis Mavrothalassitis, Volkan Cevher
Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Models
Simon Schrodi, David T. Hoffmann, Max Argus et al.
VideoShield: Regulating Diffusion-based Video Generation Models via Watermarking
Runyi Hu, Jie Zhang, Yiming Li et al.
Flaws of ImageNet, Computer Vision's Favourite Dataset
Nikita Kisel, Illia Volkov, Kateřina Hanzelková et al.
Influence Functions for Scalable Data Attribution in Diffusion Models
Bruno Mlodozeniec, Runa Eschenhagen, Juhan Bae et al.
Lossy Compression with Pretrained Diffusion Models
Jeremy Vonderfecht, Feng Liu
Highly Efficient Self-Adaptive Reward Shaping for Reinforcement Learning
Haozhe Ma, Zhengding Luo, Thanh Vinh Vo et al.
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Jintao Zhang, Jia Wei, Pengle Zhang et al.
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training
Renqiu Xia, Mingsheng Li, Hancheng Ye et al.
Safety Layers in Aligned Large Language Models: The Key to LLM Security
Shen Li, Liuyi Yao, Lan Zhang et al.
PIN: Prolate Spheroidal Wave Function-based Implicit Neural Representations
Viraj Dhananjaya Bandara Jayasundara Jayasundara Mudiyanselage, Heng Zhao, Demetrio Labate et al.
Learning Harmonized Representations for Speculative Sampling
Lefan Zhang, Xiaodan Wang, Yanhua Huang et al.
Extendable and Iterative Structure Learning Strategy for Bayesian Networks
Hamid Kalantari, Russell Greiner, Pouria Ramazi
KinFormer: Generalizable Dynamical Symbolic Regression for Catalytic Organic Reaction Kinetics
Jindou Chen, Jidong Tian, Liang Wu et al.
Transformers Provably Solve Parity Efficiently with Chain of Thought
Juno Kim, Taiji Suzuki
CLIPure: Purification in Latent Space via CLIP for Adversarially Robust Zero-Shot Classification
Mingkun Zhang, Keping Bi, Wei Chen et al.
Models trained with unnormalized density functions: A need for a course correction
Rishal Aggarwal, Daniel Penaherrera, Justin Shao et al.
REMEDY: Recipe Merging Dynamics in Large Vision-Language Models
Didi Zhu, Yibing Song, Tao Shen et al.
Mitigating Modality Prior-Induced Hallucinations in Multimodal Large Language Models via Deciphering Attention Causality
Guanyu Zhou, Yibo Yan, Xin Zou et al.
Noise Separation guided Candidate Label Reconstruction for Noisy Partial Label Learning
Xiaorui Peng, Yuheng Jia, Fuchao Yang et al.
ILLUSION: Unveiling Truth with a Comprehensive Multi-Modal, Multi-Lingual Deepfake Dataset
Kartik Thakral, Rishabh Ranjan, Akanksha Singh et al.
MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
Chenxi Wang, Xiang Chen, Ningyu Zhang et al.
Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning
Anh Tong, Thanh Nguyen-Tang, Dongeun Lee et al.
Projection Head is Secretly an Information Bottleneck
Zhuo Ouyang, Kaiwen Hu, Qi Zhang et al.
Boltzmann Semantic Score: A Semantic Metric for Evaluating Large Vision Models Using Large Language Models
Ali Khajegili Mirabadi, Katherine Rich, Hossein Farahani et al.
Unsupervised Zero-Shot Reinforcement Learning via Dual-Value Forward-Backward Representation
Jingbo Sun, Songjun Tu, Qichao Zhang et al.
RuAG: Learned-rule-augmented Generation for Large Language Models
Yudi Zhang, Pei Xiao, Lu Wang et al.
SOAP: Improving and Stabilizing Shampoo using Adam for Language Modeling
Nikhil Vyas, Depen Morwani, Rosie Zhao et al.
When is Task Vector Provably Effective for Model Editing? A Generalization Analysis of Nonlinear Transformers
Hongkang Li, Yihua Zhang, Shuai Zhang et al.
Improving Deep Regression with Tightness
Shihao Zhang, Yuguang Yan, Angela Yao
GSE: Group-wise Sparse and Explainable Adversarial Attacks
Shpresim Sadiku, Moritz Wagner, Sebastian Pokutta
Understanding Methods for Scalable MCTS
Will Knipe
The impact of allocation strategies in subset learning on the expressive power of neural networks
Ofir Schlisselberg, Ran Darshan
Wavelet Diffusion Neural Operator
Peiyan Hu, Rui Wang, Xiang Zheng et al.
VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text
Tianyu Zhang, Suyuchen Wang, Lu Li et al.
MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation
Lu Li, Tianyu Zhang, Zhiqi Bu et al.
OccProphet: Pushing the Efficiency Frontier of Camera-Only 4D Occupancy Forecasting with an Observer-Forecaster-Refiner Framework
Junliang Chen, Huaiyuan Xu, Yi Wang et al.
Nonlinear Sequence Embedding by Monotone Variational Inequality
Jonathan Y. Zhou, Yao Xie
Agree to Disagree: Demystifying Homogeneous Deep Ensembles through Distributional Equivalence
Yipei Wang, Xiaoqian Wang
Quantum (Inspired) $D^2$-sampling with Applications
Poojan Shah, Ragesh Jaiswal
Inspection and Control of Self-Generated-Text Recognition Ability in Llama3-8b-Instruct
Christopher Ackerman, Nina Panickssery
Discovering Clone Negatives via Adaptive Contrastive Learning for Image-Text Matching
Renjie Pan, Jihao Dong, Hua Yang
Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization
Juntao Dai, Taiye Chen, Yaodong Yang et al.
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
Xinchen Zhang, Ling Yang, Guohao Li et al.
Resolution Attack: Exploiting Image Compression to Deceive Deep Neural Networks
Wangjia Yu, Xiaomeng Fu, Qiao Li et al.
Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree?
Xueru Wen, Jie Lou, Yaojie Lu et al.
Walk the Talk? Measuring the Faithfulness of Large Language Model Explanations
Katie Matton, Robert Ness, John Guttag et al.
ProtPainter: Draw or Drag Protein via Topology-guided Diffusion
Zhengxi Lu, Shizhuo Cheng, Yuru Jiang et al.
Redefining the task of Bioactivity Prediction
Yanwen Huang, Bowen Gao, Yinjun Jia et al.
CtD: Composition through Decomposition in Emergent Communication
Boaz Carmeli, Ron Meir, Yonatan Belinkov
Reframing Structure-Based Drug Design Model Evaluation via Metrics Correlated to Practical Needs
Bowen Gao, Haichuan Tan, Yanwen Huang et al.
Score-based Self-supervised MRI Denoising
Jiachen Tu, Yaokun Shi, Fan Lam
Revisit Micro-batch Clipping: Adaptive Data Pruning via Gradient Manipulation
Lun Wang
Local Patterns Generalize Better for Novel Anomalies
Yalong Jiang
Model Risk-sensitive Offline Reinforcement Learning
Gwangpyo Yoo, Honguk Woo
Beyond Squared Error: Exploring Loss Design for Enhanced Training of Generative Flow Networks
Rui Hu, Yifan Zhang, Zhuoran Li et al.
Offline RL in Regular Decision Processes: Sample Efficiency via Language Metrics
Ahana Deb, Roberto Cipollone, Anders Jonsson et al.
TempMe: Video Temporal Token Merging for Efficient Text-Video Retrieval
Leqi Shen, Tianxiang Hao, Tao He et al.
FIRING-Net: A filtered feature recycling network for speech enhancement
Xinmeng Xu, Yiqun Zhang, Jizhen Li et al.
ZooProbe: A Data Engine for Evaluating, Exploring, and Evolving Large-scale Training Data for Multimodal LLMs
Yi-Kai Zhang, Shiyin Lu, Qing-Guo Chen et al.
Simple yet Effective Incomplete Multi-view Clustering: Similarity-level Imputation and Intra-view Hybrid-group Prototype Construction
Shengju Yu, Zhibin Dong, Siwei Wang et al.
UniCBE: An Uniformity-driven Comparing Based Evaluation Framework with Unified Multi-Objective Optimization
Peiwen Yuan, Shaoxiong Feng, Yiwei Li et al.
Personality Alignment of Large Language Models
Minjun Zhu, Yixuan Weng, Linyi Yang et al.
SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding
Sihang Li, Jin Huang, Jiaxi Zhuang et al.
UniRestore3D: A Scalable Framework For General Shape Restoration
Yuang Wang, Yujian Zhang, Sida Peng et al.
Offline Hierarchical Reinforcement Learning via Inverse Optimization
Carolin Schmidt, Daniele Gammelli, James Harrison et al.
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis
Hongkang Li, Songtao Lu, Pin-Yu Chen et al.
Adversarially Robust Anomaly Detection through Spurious Negative Pair Mitigation
Hossein Mirzaei Sadeghlou, Mojtaba Nafez, Jafar Habibi et al.
T2V2: A Unified Non-Autoregressive Model for Speech Recognition and Synthesis via Multitask Learning
Nabarun Goswami, Hanqin Wang, Tatsuya Harada
One Hundred Neural Networks and Brains Watching Videos: Lessons from Alignment
Christina Sartzetaki, Gemma Roig, Cees G Snoek et al.
TD-Paint: Faster Diffusion Inpainting Through Time-Aware Pixel Conditioning
Tsiry Mayet, Pourya Shamsolmoali, Simon Bernard et al.
One for all and all for one: Efficient computation of partial Wasserstein distances on the line
Laetitia Chapel, Romain Tavenard
Learning Dynamics of Deep Matrix Factorization Beyond the Edge of Stability
Avrajit Ghosh, Soo Min Kwon, Rongrong Wang et al.
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models
Peng Xia, Kangyu Zhu, Haoran Li et al.
Hotspot-Driven Peptide Design via Multi-Fragment Autoregressive Extension
Jiahan Li, Tong Chen, Shitong Luo et al.
Any-step Dynamics Model Improves Future Predictions for Online and Offline Reinforcement Learning
Haoxin Lin, Yu-Yan Xu, Yihao Sun et al.
Semantic Temporal Abstraction via Vision-Language Model Guidance for Efficient Reinforcement Learning
Tian-Shuo Liu, Xu-Hui Liu, Ruifeng Chen et al.
Scaling FP8 training to trillion-token LLMs
Maxim Fishman, Brian Chmiel, Ron Banner et al.
Enhancing Pre-trained Representation Classifiability can Boost its Interpretability
Reassessing How to Compare and Improve the Calibration of Machine Learning Models
Muthu Chidambaram, Rong Ge
On Stochastic Contextual Bandits with Knapsacks in Small Budget Regime
Hengquan Guo, Xin Liu
For Better or For Worse? Learning Minimum Variance Features With Label Augmentation
Muthu Chidambaram, Rong Ge
Residual-MPPI: Online Policy Customization for Continuous Control
Pengcheng Wang, Chenran Li, Catherine Weaver et al.
ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination
Xinxin Zhao, Wenzhe Cai, Likun Tang et al.
Restructuring Vector Quantization with the Rotation Trick
Christopher Fifty, Ronald Junkins, Dennis Duan et al.
SAGEPhos: Sage Bio-Coupled and Augmented Fusion for Phosphorylation Site Detection
Jingjie Zhang, Hanqun Cao, Zijun Gao et al.
EmbedLLM: Learning Compact Representations of Large Language Models
Richard Zhuang, Tianhao Wu, Zhaojin Wen et al.
TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees
Weibin Liao, Xu Chu, Yasha Wang
Bridging the Gap Between f-divergences and Bayes Hilbert Spaces
Linus Lach, Alexander Fottner, Yarema Okhrin
DeepTAGE: Deep Temporal-Aligned Gradient Enhancement for Optimizing Spiking Neural Networks
Wei Liu, Li Yang, Mingxuan Zhao et al.
Round and Round We Go! What makes Rotary Positional Encodings useful?
Federico Barbero, Alex Vitvitskyi, Christos Perivolaropoulos et al.
SysCaps: Language Interfaces for Simulation Surrogates of Complex Systems
Patrick Emami, Zhaonan Li, Saumya Sinha et al.
Revisit the Open Nature of Open Vocabulary Semantic Segmentation
Qiming Huang, Han Hu, Jianbo Jiao
Multi-Scale Fusion for Object Representation
Rongzhen Zhao, Vivienne Huiling Wang, Juho Kannala et al.
Recognize Any Surgical Object: Unleashing the Power of Weakly-Supervised Data
Jiajie Li, Brian Quaranto, Chenhui Xu et al.
GOLD: Graph Out-of-Distribution Detection via Implicit Adversarial Latent Generation
Danny Wang, Ruihong Qiu, Guangdong Bai et al.
SafeWatch: An Efficient Safety-Policy Following Video Guardrail Model with Transparent Explanations
Zhaorun Chen, Francesco Pinto, Minzhou Pan et al.
Instructional Segment Embedding: Improving LLM Safety with Instruction Hierarchy
Tong Wu, Shujian Zhang, Kaiqiang Song et al.
Data Pruning by Information Maximization
Haoru Tan, Sitong Wu, Wei Huang et al.
An Evolved Universal Transformer Memory
Edoardo Cetin, Qi Sun, Tianyu Zhao et al.
Memory Mosaics
Jianyu Zhang, Niklas Nolte, Ranajoy Sadhukhan et al.
Improving Long-Text Alignment for Text-to-Image Diffusion Models
Luping Liu, Chao Du, Tianyu Pang et al.
Enhancing End-to-End Autonomous Driving with Latent World Model
Yingyan Li, Lue Fan, Jiawei He et al.
On the Computation of the Fisher Information in Continual Learning
Gido van de Ven
CREAM: Consistency Regularized Self-Rewarding Language Models
Zhaoyang Wang, Weilei He, Zhiyuan Liang et al.
3DMolFormer: A Dual-channel Framework for Structure-based Drug Discovery
Xiuyuan Hu, Guoqing Liu, Can Chen et al.
A Geometric Framework for Understanding Memorization in Generative Models
Brendan Ross, Hamidreza Kamkari, Tongzi Wu et al.
TLDR: Token-Level Detective Reward Model for Large Vision Language Models
Deqing Fu, Tong Xiao, Rui Wang et al.
Glimpse: Enabling White-Box Methods to Use Proprietary Models for Zero-Shot LLM-Generated Text Detection
Guangsheng Bao, Yanbin Zhao, Juncai He et al.
SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget
Zihao Wang, Bin Cui, Shaoduo Gan
Towards Domain Adaptive Neural Contextual Bandits
Ziyan Wang, Xiaoming Huo, Hao Wang
Investigating Pattern Neurons in Urban Time Series Forecasting
Chengxin Wang, Yiran Zhao, Shaofeng Cai et al.
Wayward Concepts In Multimodal Models
Brandon Trabucco, Max Gurinas, Kyle Doherty et al.
Can Watermarks be Used to Detect LLM IP Infringement For Free?
Zhengyue Zhao, Xiaogeng Liu, Somesh Jha et al.
Learning Diagrams: A Graphical Language for Compositional Training Regimes
Mason Lary, Richard Samuelson, Alexander Wilentz et al.
Neural Approximate Mirror Maps for Constrained Diffusion Models
Berthy Feng, Ricardo Baptista, Katherine Bouman
GANDALF: Generative AttentioN based Data Augmentation and predictive modeLing Framework for personalized cancer treatment
Aishwarya Jayagopal, Yanrong Zhang, Robert Walsh et al.
Provably Safeguarding a Classifier from OOD and Adversarial Samples
Nicolas Atienza, Johanne Cohen, Christophe Labreuche et al.
On the Fourier analysis in the SO(3) space: the EquiLoPO Network
Dmitrii Zhemchuzhnikov, Sergei Grudinin
Adapting Multi-modal Large Language Model to Concept Drift From Pre-training Onwards
Xiaoyu Yang, Jie Lu, En Yu
Bridging the Gap between Variational Inference and Stochastic Gradient MCMC in Function Space
Mengjing Wu, Junyu Xuan, Jie Lu
Grammar Reinforcement Learning: path and cycle counting in graphs with a Context-Free Grammar and Transformer approach
Jason Piquenot, Maxime Berar, Romain Raveaux et al.
HyperFace: Generating Synthetic Face Recognition Datasets by Exploring Face Embedding Hypersphere
Hatef Otroshi Shahreza, Sébastien Marcel
Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance
Dongmin Park, Sebin Kim, Taehong Moon et al.
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
Boyu Gou, Demi Ruohan Wang, Boyuan Zheng et al.
Scrutinize What We Ignore: Reining In Task Representation Shift Of Context-Based Offline Meta Reinforcement Learning
Hai Zhang, Boyuan Zheng, Tianying Ji et al.
BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models
Yu Feng, Ben Zhou, Weidong Lin et al.
Decentralized Optimization with Coupled Constraints
Demyan Yarmoshik, Alexander Rogozin, Nikita Kiselev et al.
A Visual Dive into Conditional Flow Matching
Anne Gagneux, Ségolène Martin, Rémi Emonet et al.
Youku Dense Caption: A Large-scale Chinese Video Dense Caption Dataset and Benchmarks
Zixuan Xiong, Guangwei Xu, Wenkai Zhang et al.
Large Language Models Often Say One Thing and Do Another
Ruoxi Xu, Hongyu Lin, Xianpei Han et al.
Enhancing Vision-Language Model with Unmasked Token Alignment
Hongsheng Li, Jihao Liu, Boxiao Liu et al.
A3D: Does Diffusion Dream about 3D Alignment?
Savva Ignatyev, Nina Konovalova, Daniil Selikhanovych et al.
Making Transformer Decoders Better Differentiable Indexers
Wuchao Li, Kai Zheng, Defu Lian et al.
The KoLMogorov Test: Compression by Code Generation
Ori Yoran, Kunhao Zheng, Fabian Gloeckle et al.
Long Context Compression with Activation Beacon
Peitian Zhang, Zheng Liu, Shitao Xiao et al.
K-HALU: Multiple Answer Korean Hallucination Benchmark for Large Language Models
Jaehyung Seo, Heuiseok Lim
AutoUAD: Hyper-parameter Optimization for Unsupervised Anomaly Detection
Wei Dai, Jicong Fan
FlashMask: Efficient and Rich Mask Extension of FlashAttention
Guoxia Wang, Jinle Zeng, Xiyuan Xiao et al.
CipherPrune: Efficient and Scalable Private Transformer Inference
Yancheng Zhang, Jiaqi Xue, Mengxin Zheng et al.
FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling
Zhengqiang Zhang, Ruihuang Li, Lei Zhang
Spatial-Mamba: Effective Visual State Space Models via Structure-Aware State Fusion
Chaodong Xiao, Minghan Li, Zhengqiang Zhang et al.
Data Selection via Optimal Control for Language Models
Yuxian Gu, Li Dong, Hongning Wang et al.
TeaserGen: Generating Teasers for Long Documentaries
Weihan Xu, Paul Pu Liang, Haven Kim et al.
VVC-Gym: A Fixed-Wing UAV Reinforcement Learning Environment for Multi-Goal Long-Horizon Problems
Xudong Gong, Feng Dawei, Kele Xu et al.
Scaling Laws for Downstream Task Performance in Machine Translation
Berivan Isik, Natalia Ponomareva, Hussein Hazimeh et al.
Ranking-aware adapter for text-driven image ordering with CLIP
Wei-Hsiang Yu, Yen-Yu Lin, Ming-Hsuan Yang et al.
CURIE: Evaluating LLMs on Multitask Scientific Long-Context Understanding and Reasoning
Hao Cui, Zahra Shamsi, Gowoon Cheon et al.
SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression
Xin Wang, Yu Zheng, Zhongwei Wan et al.
LASER: A Neuro-Symbolic Framework for Learning Spatio-Temporal Scene Graphs with Weak Supervision
Jiani Huang, Ziyang Li, Mayur Naik et al.
Federated Continual Learning Goes Online: Uncertainty-Aware Memory Management for Vision Tasks and Beyond
Giuseppe Serra, Florian Buettner
Diversity-Rewarded CFG Distillation
Geoffrey Cideron, Andrea Agostinelli, Johan Ferret et al.
Backdooring Vision-Language Models with Out-Of-Distribution Data
Weimin Lyu, Michael Yao, Saumya Gupta et al.
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
Chankyu Lee, Rajarshi Roy, Mengyao Xu et al.
GenXD: Generating Any 3D and 4D Scenes
Yuyang Zhao, Chung-Ching Lin, Kevin Lin et al.
Meta-Continual Learning of Neural Fields
Seungyoon Woo, Junhyeog Yun, Gunhee Kim
Semi-Supervised Vision-Centric 3D Occupancy World Model for Autonomous Driving
Xiang Li, Pengfei Li, Yupeng Zheng et al.
Adversarial Attacks on Data Attribution
Xinhe Wang, Pingbang Hu, Junwei Deng et al.
DPLM-2: A Multimodal Diffusion Protein Language Model
Xinyou Wang, Zaixiang Zheng, Fei Ye et al.
Continuous Autoregressive Modeling with Stochastic Monotonic Alignment for Speech Synthesis
Weiwei Lin, Chenhang He
Trained Transformer Classifiers Generalize and Exhibit Benign Overfitting In-Context
Spencer Frei, Gal Vardi
Grounding Multimodal Large Language Model in GUI World
Weixian Lei, Difei Gao, Mike Zheng Shou
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
Haotian Zhang, Mingfei Gao, Zhe Gan et al.
NarrativeBridge: Enhancing Video Captioning with Causal-Temporal Narrative
Asmar Nadeem, Faegheh Sardari, Robert Dawes et al.
Learning View-invariant World Models for Visual Robotic Manipulation
Jing-Cheng Pang, Nan Tang, Kaiyuan Li et al.
Exploring Local Memorization in Diffusion Models via Bright Ending Attention
Chen Chen, Daochang Liu, Mubarak Shah et al.
Exploiting Structure in Offline Multi-Agent RL: The Benefits of Low Interaction Rank
Wenhao Zhan, Scott Fujimoto, Zheqing Zhu et al.
Towards Generalization Bounds of GCNs for Adversarially Robust Node Classification
Wen Wen, Han Li, Tieliang Gong et al.
Restating the Proof of Linear Convergence for Linear GNNs
Huayi Tang, Yuhe Guo, Yong Liu et al.
TGB-Seq Benchmark: Challenging Temporal GNNs with Complex Sequential Dynamics
Lu Yi, Jie Peng, Yanping Zheng et al.
Process Reward Model with Q-value Rankings
Wendi Li, Yixuan Li
UniCon: Unidirectional Information Flow for Effective Control of Large-Scale Diffusion Models
Fanghua Yu, Jinjin Gu, Jinfan Hu et al.
Efficient Cross-Episode Meta-RL
Gresa Shala, André Biedenkapp, Pierre Krack et al.
Diffusion Actor-Critic: Formulating Constrained Policy Iteration as Diffusion Noise Regression for Offline Reinforcement Learning
Linjiajie Fang, Ruoxue Liu, Jing Zhang et al.
TIGeR: Unifying Text-to-Image Generation and Retrieval with Large Multimodal Models
Leigang Qu, Haochuan Li, Tan Wang et al.
Rethinking Neural Multi-Objective Combinatorial Optimization via Neat Weight Embedding
Jinbiao Chen, Zhiguang Cao, Jiahai Wang et al.