Most Cited 2025 "temporal frame prediction" Papers
DreamDistribution: Learning Prompt Distribution for Diverse In-distribution Generation
Brian Nlong Zhao, Yuhang Xiao, Jiashu Xu et al.
Realistic Evaluation of Deep Partial-Label Learning Algorithms
Wei Wang, Dong-Dong Wu, Jindong Wang et al.
AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws
Oren Neumann, Claudius Gros
SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation
Leigang Qu, Haochuan Li, Wenjie Wang et al.
X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing
Xinyan Chen, Jianfei Yang
Adversarially Robust Out-of-Distribution Detection Using Lyapunov-Stabilized Embeddings
Hossein Mirzaei Sadeghlou, Mackenzie Mathis
On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent
Bingrui Li, Wei Huang, Andi Han et al.
Provable Convergence and Limitations of Geometric Tempering for Langevin Dynamics
Omar Chehab, Anna Korba, Austin Stromme et al.
EventGPT: Event Stream Understanding with Multimodal Large Language Models
Shaoyu Liu, Jianing Li, Guanghui Zhao et al.
SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement
Yuqi Lin, Hengjia Li, Wenqi Shao et al.
(Mis)Fitting Scaling Laws: A Survey of Scaling Law Fitting Techniques in Deep Learning
Margaret Li, Sneha Kudugunta, Luke Zettlemoyer
PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution
Zhu Li Bo, Jianze Li, Haotong Qin et al.
Edge Prompt Tuning for Graph Neural Networks
Xingbo Fu, Yinhan He, Jundong Li
SMARTIES: Spectrum-Aware Multi-Sensor Auto-Encoder for Remote Sensing Images
Gencer Sumbul, Chang Xu, Emanuele Dalsasso et al.
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters
Jianping Jiang, Weiye Xiao, Zhengyu Lin et al.
GLASS: Guided Latent Slot Diffusion for Object-Centric Learning
Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth
On Reasoning Strength Planning in Large Reasoning Models
Leheng Sheng, An Zhang, Zijian Wu et al.
Fast Summation of Radial Kernels via QMC Slicing
Johannes Hertrich, Tim Jahn, Michael Quellmalz
Distilling Monocular Foundation Model for Fine-grained Depth Completion
Yingping Liang, Yutao Hu, Wenqi Shao et al.
MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining
Yunze Liu, Li Yi
Monet: Mixture of Monosemantic Experts for Transformers
Jungwoo Park, Young Jin Ahn, Kee-Eung Kim et al.
I Can Hear You: Selective Robust Training for Deepfake Audio Detection
Zirui Zhang, Wei Hao, Aroon Sankoh et al.
CoA: Towards Real Image Dehazing via Compression-and-Adaptation
Long Ma, Yuxin Feng, Yan Zhang et al.
Random-Set Neural Networks
Shireen Kudukkil Manchingal, Muhammad Mubashar, Kaizheng Wang et al.
Visual-Instructed Degradation Diffusion for All-in-One Image Restoration
Haina Qin, Wenyang Luo, Zewen Chen et al.
Accurate and Regret-Aware Numerical Problem Solver for Tabular Question Answering
Yuxiang Wang, Jianzhong Qi, Junhao Gan
OLinear: A Linear Model for Time Series Forecasting in Orthogonally Transformed Domain
Wenzhen Yue, Yong Liu, Hao Wang et al.
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model
Benlin Liu, Yuhao Dong, Yiqin Wang et al.
Do Large Language Models Truly Understand Geometric Structures?
Xiaofeng Wang, Yiming Wang, Wenhong Zhu et al.
An End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models
Wentao Qu, Jing Wang, Yongshun Gong et al.
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
Zun Wang, Jialu Li, Yicong Hong et al.
DreamText: High Fidelity Scene Text Synthesis
Yibin Wang, Weizhong Zhang, Honghui Xu et al.
Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration
Kang Liao, Zongsheng Yue, Zhouxia Wang et al.
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Miran Heo, Min-Hung Chen, De-An Huang et al.
Rethinking Query-based Transformer for Continual Image Segmentation
Yuchen Zhu, Cheng Shi, Dingyou Wang et al.
GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement
Peiye Zhuang, Songfang Han, Chaoyang Wang et al.
Beyond the convexity assumption: Realistic tabular data generation under quantifier-free real linear constraints
Mihaela Stoian, Eleonora Giunchiglia
BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems
Andy Zhang, Joey Ji, Celeste Menders et al.
Fourier Sliced-Wasserstein Embedding for Multisets and Measures
Tal Amir, Nadav Dym
Radiology Report Generation via Multi-objective Preference Optimization
Ting Xiao, Lei Shi, Peng Liu et al.
MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem
Fan Liu, Zherui Yang, Cancheng Liu et al.
Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance
Sachin Goyal, Christina Baek, Zico Kolter et al.
Shortcuts and Identifiability in Concept-based Models from a Neuro-Symbolic Lens
Samuele Bortolotti, Emanuele Marconato, Paolo Morettin et al.
Attention as a Hypernetwork
Simon Schug, Seijin Kobayashi, Yassir Akram et al.
ADBA: Approximation Decision Boundary Approach for Black-Box Adversarial Attacks
Feiyang Wang, Xingquan Zuo, Hai Huang et al.
A Watermark for Order-Agnostic Language Models
Ruibo Chen, Yihan Wu, Yanshuo Chen et al.
SuperMat: Physically Consistent PBR Material Estimation at Interactive Rates
Yijia Hong, Yuan-Chen Guo, Ran Yi et al.
FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation
Dong Zhao, Jinlong Li, Shuang Wang et al.
Chain of Attack: On the Robustness of Vision-Language Models Against Transfer-Based Adversarial Attacks
Peng Xie, Yequan Bie, Jianda Mao et al.
GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers
Shijie Ma, Yuying Ge, Teng Wang et al.
Zero-Shot Styled Text Image Generation, but Make It Autoregressive
Vittorio Pippi, Fabio Quattrini, Silvia Cascianelli et al.
SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments
Yue Cao, Yun Xing, Jie Zhang et al.
RepLDM: Reprogramming Pretrained Latent Diffusion Models for High-Quality, High-Efficiency, High-Resolution Image Generation
Boyuan Cao, Jiaxin Ye, Yujie Wei et al.
ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models
Yeji Park, Deokyeong Lee, Junsuk Choe et al.
Enhancing Time Series Forecasting through Selective Representation Spaces: A Patch Perspective
Xingjian Wu, Xiangfei Qiu, Hanyin Cheng et al.
UnCommon Objects in 3D
Xingchen Liu, Piyush Tayal, Jianyuan Wang et al.
Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis
Weikai Li, Ding Wang, Zijian Ding et al.
JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
Yunlong Lin, Zixu Lin, Kunjie Lin et al.
Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model
Dongki Kim, Wonbin Lee, Sung Ju Hwang
From Poses to Identity: Training-Free Person Re-Identification via Feature Centralization
Chao Yuan, Guiwei Zhang, Changxiao Ma et al.
NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields
Amandine Brunetto, Sascha Hornauer, Fabien Moutarde
MultiGO: Towards Multi-level Geometry Learning for Monocular 3D Textured Human Reconstruction
Gangjian Zhang, Nanjie Yao, Shunsi Zhang et al.
Probability Density Geodesics in Image Diffusion Latent Space
Qingtao Yu, Jaskirat Singh, Zhaoyuan Yang et al.
DriveEditor: A Unified 3D Information-Guided Framework for Controllable Object Editing in Driving Scenes
Yiyuan Liang, Zhiying Yan, Liqun Chen et al.
Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
Arun Reddy, Alexander Martin, Eugene Yang et al.
EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge
Ruskin Raj Manku, Yuzhi Tang, Xingjian Shi et al.
MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking
Haolin Qin, Tingfa Xu, Tianhao Li et al.
DIFFER: Disentangling Identity Features via Semantic Cues for Clothes-Changing Person Re-ID
Xin Liang, Yogesh S. Rawat
PreciseCam: Precise Camera Control for Text-to-Image Generation
Edurne Bernal-Berdun, Ana Serrano, Belen Masia et al.
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
Weiming Ren, Huan Yang, Jie Min et al.
Counterfactual Generative Modeling with Variational Causal Inference
Yulun Wu, Louis McConnell, Claudia Iriondo
LoRA Subtraction for Drift-Resistant Space in Exemplar-Free Continual Learning
Xuan Liu, Xiaobin Chang
ZoRI: Towards Discriminative Zero-Shot Remote Sensing Instance Segmentation
Shiqi Huang, Shuting He, Bihan Wen
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
Tianyu Fu, Yi Ge, Yichen You et al.
LiDAR-RT: Gaussian-based Ray Tracing for Dynamic LiDAR Re-simulation
Chenxu Zhou, Lvchang Fu, Sida Peng et al.
GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling
Jialong Zhou, Lichao Wang, Xiao Yang
GIM: A Million-scale Benchmark for Generative Image Manipulation Detection and Localization
Yirui Chen, Xudong Huang, Quan Zhang et al.
DepthCues: Evaluating Monocular Depth Perception in Large Vision Models
Duolikun Danier, Mehmet Aygun, Changjian Li et al.
PMQ-VE: Progressive Multi-Frame Quantization for Video Enhancement
ZhanFeng Feng, Long Peng, Xin Di et al.
Markov Persuasion Processes: Learning to Persuade From Scratch
Francesco Bacchiocchi, Francesco Emanuele Stradi, Matteo Castiglioni et al.
Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy
You Li, Fan Ma, Yi Yang
Quality-Driven Curation of Remote Sensing Vision-Language Data via Learned Scoring Models
Dilxat Muhtar, Enzhuo Zhang, Zhenshi Li et al.
MP-SfM: Monocular Surface Priors for Robust Structure-from-Motion
Zador Pataki, Paul-Edouard Sarlin, Johannes Schönberger et al.
GIFT: Unlocking Full Potential of Labels in Distilled Dataset at Near-zero Cost
Xinyi Shang, Peng Sun, Tao Lin
Bayesian Test-Time Adaptation for Vision-Language Models
Lihua Zhou, Mao Ye, Shuaifeng Li et al.
HyperGS: Hyperspectral 3D Gaussian Splatting
Christopher Thirgood, Oscar Mendez, Erin Chao Ling et al.
Empowering Users in Digital Privacy Management through Interactive LLM-Based Agents
Bolun Sun, Yifan Zhou, Haiyun Jiang
Theoretically Grounded Framework for LLM Watermarking: A Distribution-Adaptive Approach
Haiyun He, Yepeng Liu, Ziqiao Wang et al.
Universal Video Temporal Grounding with Generative Multi-modal Large Language Models
Zeqian Li, Shangzhe Di, Zhonghua Zhai et al.
Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation
Shuo Wang, Yongcai Wang, Wanting Li et al.
ScribbleLight: Single Image Indoor Relighting with Scribbles
Jun Myeong Choi, Annie N. Wang, Pieter Peers et al.
KeyFace: Expressive Audio-Driven Facial Animation for Long Sequences via KeyFrame Interpolation
Antoni Bigata Casademunt, Michał Stypułkowski, Rodrigo Mira et al.
Constrained Fair and Efficient Allocations
Benjamin Cookson, Soroush Ebadian, Nisarg Shah
ViBiDSampler: Enhancing Video Interpolation Using Bidirectional Diffusion Sampler
Serin Yang, Taesung Kwon, Jong Chul Ye
HiMoR: Monocular Deformable Gaussian Reconstruction with Hierarchical Motion Representation
Yiming Liang, Tianhan Xu, Yuta Kikuchi
TraF-Align: Trajectory-aware Feature Alignment for Asynchronous Multi-agent Perception
Zhiying Song, Lei Yang, Fuxi Wen et al.
Strategy Coopetition Explains the Emergence and Transience of In-Context Learning
Aaditya Singh, Ted Moskovitz, Sara Dragutinović et al.
Deep Nonlinear Sufficient Dimension Reduction
Yinfeng Chen, Yuling Jiao, Rui Qiu et al.
AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks
Fali Wang, Hui Liu, Zhenwei Dai et al.
Update Your Transformer to the Latest Release: Re-Basin of Task Vectors
Filippo Rinaldi, Giacomo Capitani, Lorenzo Bonicelli et al.
$\gamma$-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
Yaxin Luo, Gen Luo, Jiayi Ji et al.
Controllable Generation via Locally Constrained Resampling
Kareem Ahmed, Kai-Wei Chang, Guy Van den Broeck
On Temperature Scaling and Conformal Prediction of Deep Classifiers
Lahav Dabah, Tom Tirer
WeatherGFM: Learning a Weather Generalist Foundation Model via In-context Learning
Xiangyu Zhao, Zhiwang Zhou, Wenlong Zhang et al.
RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards
Jingnan Zheng, Xiangtian Ji, Yijun Lu et al.
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics
Dongyoung Kim, Huiwon Jang, Sumin Park et al.
PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes
Bin Tan, Rui Yu, Yujun Shen et al.
Circumventing Shortcuts in Audio-visual Deepfake Detection Datasets with Unsupervised Learning
Stefan Smeu, Dragos-Alexandru Boldisor, Dan Oneata et al.
MagCache: Fast Video Generation with Magnitude-Aware Cache
Zehong Ma, Longhui Wei, Feng Wang et al.
ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models
Liyan Tang, Grace Kim, Xinyu Zhao et al.
Efficient Alternating Minimization with Applications to Weighted Low Rank Approximation
Zhao Song, Mingquan Ye, Junze Yin et al.
BASKET: A Large-Scale Video Dataset for Fine-Grained Skill Estimation
Yulu Pan, Ce Zhang, Gedas Bertasius
DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution
Zheng Chen, Zichen Zou, Kewei Zhang et al.
Online Video Understanding: OVBench and VideoChat-Online
Zhenpeng Huang, Xinhao Li, Jiaqi Li et al.
Solving Video Inverse Problems Using Image Diffusion Models
Taesung Kwon, Jong Chul Ye
LITA-GS: Illumination-Agnostic Novel View Synthesis via Reference-Free 3D Gaussian Splatting and Physical Priors
Han Zhou, Wei Dong, Jun Chen
Language Guided Concept Bottleneck Models for Interpretable Continual Learning
Lu Yu, HaoYu Han, Zhe Tao et al.
All-in-One: Transferring Vision Foundation Models into Stereo Matching
Jingyi Zhou, Haoyu Zhang, Jiakang Yuan et al.
Scaling Laws for Differentially Private Language Models
Ryan McKenna, Yangsibo Huang, Amer Sinha et al.
LesionLocator: Zero-Shot Universal Tumor Segmentation and Tracking in 3D Whole-Body Imaging
Maximilian Rokuss, Yannick Kirchhoff, Seval Akbal et al.
QuaDiM: A Conditional Diffusion Model For Quantum State Property Estimation
Yehui Tang, Mabiao Long, Junchi Yan
Distributive Fairness in Large Language Models: Evaluating Alignment with Human Values
Hadi Hosseini, Samarth Khanna
ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos
Tanveer Hannan, Md Mohaiminul Islam, Jindong Gu et al.
MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments
Matthieu Cord, Antonin Vobecky, Oriane Siméoni et al.
SuperPC: A Single Diffusion Model for Point Cloud Completion, Upsampling, Denoising, and Colorization
Yi Du, Zhipeng Zhao, Shaoshu Su et al.
X-Dancer: Expressive Music to Human Dance Video Generation
Zeyuan Chen, Hongyi Xu, Guoxian Song et al.
HOPE for a Robust Parameterization of Long-memory State Space Models
Annan Yu, Michael W Mahoney, N. Benjamin Erichson
Synthesizing Privacy-Preserving Text Data via Finetuning *without* Finetuning Billion-Scale LLMs
Bowen Tan, Zheng Xu, Eric Xing et al.
REGENT: A Retrieval-Augmented Generalist Agent That Can Act In-Context in New Environments
Kaustubh Sridhar, Souradeep Dutta, Dinesh Jayaraman et al.
HO-Cap: A Capture System and Dataset for 3D Reconstruction and Pose Tracking of Hand-Object Interaction
Jikai Wang, Qifan Zhang, Yu-Wei Chao et al.
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation
Gao Peng, Le Zhuo, Dongyang Liu et al.
DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image
Qingxuan Wu, Zhiyang Dou, Sirui Xu et al.
Can Classic GNNs Be Strong Baselines for Graph-level Tasks? Simple Architectures Meet Excellence
Yuankai Luo, Lei Shi, Xiao-Ming Wu
Rapidly Adapting Policies to the Real-World via Simulation-Guided Fine-Tuning
Patrick Yin, Tyler Westenbroek, Ching-An Cheng et al.
Knowledge-Aligned Counterfactual-Enhancement Diffusion Perception for Unsupervised Cross-Domain Visual Emotion Recognition
Wen Yin, Yong Wang, Guiduo Duan et al.
Dual Prompting Image Restoration with Diffusion Transformers
Dehong Kong, Fan Li, Zhixin Wang et al.
MegActor-Sigma: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer
Shurong Yang, Huadong Li, Juhao Wu et al.
When Do LLMs Help With Node Classification? A Comprehensive Analysis
Xixi Wu, Yifei Shen, Fangzhou Ge et al.
Solving Inequality Proofs with Large Language Models
Jiayi Sheng, Luna Lyu, Jikai Jin et al.
Attention-Driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models Without Fine-Tuning
Hai-Ming Xu, Qi Chen, Lei Wang et al.
Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization
Luca Masserano, Abdul Fatir Ansari, Boran Han et al.
Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control
Hejia Chen, Haoxian Zhang, Shoulong Zhang et al.
ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation
Mengyang Wu, Yuzhi Zhao, Jialun Cao et al.
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?
Simon Park, Abhishek Panigrahi, Yun Cheng et al.
Breaking Free from MMI: A New Frontier in Rationalization by Probing Input Utilization
Wei Liu, Zhiying Deng, Zhongyu Niu et al.
Decomposition Polyhedra of Piecewise Linear Functions
Marie-Charlotte Brandenburg, Moritz Grillo, Christoph Hertrich
Physical Plausibility-aware Trajectory Prediction via Locomotion Embodiment
Hiromu Taketsugu, Takeru Oba, Takahiro Maeda et al.
PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations
Benjamin Holzschuh, Qiang Liu, Georg Kohl et al.
HybridGS: High-Efficiency Gaussian Splatting Data Compression using Dual-Channel Sparse Representation and Point Cloud Encoder
Qi Yang, Le Yang, Geert Van der Auwera et al.
Multimodal Latent Diffusion Model for Complex Sewing Pattern Generation
Shengqi Liu, Yuhao Cheng, Zhuo Chen et al.
Diffusion Tree Sampling: Scalable inference-time alignment of diffusion models
Vineet Jain, Kusha Sareen, Mohammad Pedramfar et al.
Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks
Shikai Qiu, Lechao Xiao, Andrew Wilson et al.
Objective drives the consistency of representational similarity across datasets
Laure Ciernik, Lorenz Linhardt, Marco Morik et al.
ROADWork: A Dataset and Benchmark for Learning to Recognize, Observe, Analyze and Drive Through Work Zones
Anurag Ghosh, Shen Zheng, Robert Tamburo et al.
Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment
Yifan Zhang, Ge Zhang, Yue Wu et al.
SILO: Solving Inverse Problems with Latent Operators
Ron Raphaeli, Sean Man, Michael Elad
StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition
Xin Ding, Hao Wu, Yifan Yang et al.
DM-Adapter: Domain-Aware Mixture-of-Adapters for Text-Based Person Retrieval
Yating Liu, Zimo Liu, Xiangyuan Lan et al.
Diffusion Transformers as Open-World Spatiotemporal Foundation Models
Yuan Yuan, Chonghua Han, Jingtao Ding et al.
OS-ATLAS: Foundation Action Model for Generalist GUI Agents
Zhiyong Wu, Zhenyu Wu, Fangzhi Xu et al.
Perspective-Invariant 3D Object Detection
Alan Liang, Lingdong Kong, Dongyue Lu et al.
MP-GUI: Modality Perception with MLLMs for GUI Understanding
Ziwei Wang, Weizhi Chen, Leyang Yang et al.
UAVScenes: A Multi-Modal Dataset for UAVs
Sijie Wang, Siqi Li, Yawei Zhang et al.
How Transformers Learn Structured Data: Insights From Hierarchical Filtering
Jerome Garnier-Brun, Marc Mezard, Emanuele Moscato et al.
Make Me Happier: Evoking Emotions Through Image Diffusion Models
Qing Lin, Jingfeng Zhang, Yew-Soon Ong et al.
SurFhead: Affine Rig Blending for Geometrically Accurate 2D Gaussian Surfel Head Avatars
Jaeseong Lee, Taewoong Kang, Marcel Buehler et al.
Can Transformers Do Enumerative Geometry?
Baran Hashemi, Roderic Corominas, Alessandro Giacchetto
Diffusion models for Gaussian distributions: Exact solutions and Wasserstein errors
Emile Pierret, Bruno Galerne
BrainUICL: An Unsupervised Individual Continual Learning Framework for EEG Applications
Yangxuan Zhou, Sha Zhao, Jiquan Wang et al.
Generalized and Efficient 2D Gaussian Splatting for Arbitrary-scale Super-Resolution
Du Chen, Liyi Chen, Zhengqiang Zhang et al.
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
Da Xiao, Qingye Meng, Shengping Li et al.
Discrete Codebook World Models for Continuous Control
Aidan Scannell, Mohammadreza Nakhaeinezhadfard, Kalle Kujanpää et al.
MINERVA: Evaluating Complex Video Reasoning
Arsha Nagrani, Sachit Menon, Ahmet Iscen et al.
DiffCalib: Reformulating Monocular Camera Calibration as Diffusion-Based Dense Incident Map Generation
Xiankang He, Guangkai Xu, Bo Zhang et al.
Towards Federated RLHF with Aggregated Client Preference for LLMs
Feijie Wu, Xiaoze Liu, Haoyu Wang et al.
MultiADS: Defect-aware Supervision for Multi-type Anomaly Detection and Segmentation in Zero-Shot Learning
Ylli Sadikaj, Hongkuan Zhou, Lavdim Halilaj et al.
ReQFlow: Rectified Quaternion Flow for Efficient and High-Quality Protein Backbone Generation
Angxiao Yue, Zichong Wang, Hongteng Xu
MIB: A Mechanistic Interpretability Benchmark
Aaron Mueller, Atticus Geiger, Sarah Wiegreffe et al.
Learning to Adapt Frozen CLIP for Few-Shot Test-Time Domain Adaptation
Zhixiang Chi, Li Gu, Huan Liu et al.
Graph Generative Pre-trained Transformer
Xiaohui Chen, Yinkai Wang, Jiaxing He et al.
Diff3DS: Generating View-Consistent 3D Sketch via Differentiable Curve Rendering
Yibo Zhang, Lihong Wang, Changqing Zou et al.
Advancing Expert Specialization for Better MoE
Hongcan Guo, Haolang Lu, Guoshun Nan et al.
Multi-Pair Temporal Sentence Grounding via Multi-Thread Knowledge Transfer Network
Xiang Fang, Wanlong Fang, Changshuo Wang et al.
Jailbreaking as a Reward Misspecification Problem
Zhihui Xie, Jiahui Gao, Lei Li et al.
LeFusion: Controllable Pathology Synthesis via Lesion-Focused Diffusion Models
Hantao Zhang, Yuhe Liu, Jiancheng Yang et al.
Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies
Nadav Timor, Jonathan Mamou, Daniel Korat et al.
Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition
Zhong Zheng, Haochen Zhang, Lingzhou Xue
Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse
Arthur Jacot, Peter Súkeník, Zihan Wang et al.
PENCIL: Long Thoughts with Short Memory
Chenxiao Yang, Nati Srebro, David McAllester et al.
Repo2Run: Automated Building Executable Environment for Code Repository at Scale
Ruida Hu, Chao Peng, Xinchen Wang et al.
Reasoning as an Adaptive Defense for Safety
Taeyoun Kim, Fahim Tajwar, Aditi Raghunathan et al.
HELM: Hierarchical Encoding for mRNA Language Modeling
Mehdi Yazdani-Jahromi, Mangal Prakash, Tommaso Mansi et al.
BodyGen: Advancing Towards Efficient Embodiment Co-Design
Haofei Lu, Zhe Wu, Junliang Xing et al.
InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment
Yunhong Lu, Qichao Wang, Hengyuan Cao et al.
EAP-GP: Mitigating Saturation Effect in Gradient-based Automated Circuit Identification
Lin Zhang, Wenshuo Dong, Zhuoran Zhang et al.
Rethinking Evaluation of Sparse Autoencoders through the Representation of Polysemous Words
Gouki Minegishi, Hiroki Furuta, Yusuke Iwasawa et al.
CAD-Coder: Text-to-CAD Generation with Chain-of-Thought and Geometric Reward
Yandong Guan, Xilin Wang, XiMing Xing et al.
On the Training Convergence of Transformers for In-Context Classification of Gaussian Mixtures
Wei Shen, Ruida Zhou, Jing Yang et al.
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Qining Zhang, Lei Ying
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
Denis Sutter, Julian Minder, Thomas Hofmann et al.