Most Cited 2025 "cascading phenomena" Papers
22,274 papers found • Page 22 of 112
Conference
Flat-LoRA: Low-Rank Adaptation over a Flat Loss Landscape
Tao Li, Zhengbao He, Yujun Li et al.
FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers
Renshan Zhang, Rui Shao, Gongwei Chen et al.
Data Synthesis with Diverse Styles for Face Recognition via 3DMM-Guided Diffusion
Yuxi Mi, Zhizhou Zhong, Yuge Huang et al.
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
Zongyu Lin, Yao Tang, Xingcheng Yao et al.
ResCLIP: Residual Attention for Training-free Dense Vision-language Inference
Jinhong Deng, Yuhang Yang, Wen Li et al.
Mosaic of Modalities: A Comprehensive Benchmark for Multimodal Graph Learning
Jing Zhu, Yuhang Zhou, Shengyi Qian et al.
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design
Haojie Duanmu, Xiuhong Li, Zhihang Yuan et al.
FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning
Gaojian Wang, Feng Lin, Tong Wu et al.
metabench - A Sparse Benchmark of Reasoning and Knowledge in Large Language Models
Alex Kipnis, Konstantinos Voudouris, Luca Schulze Buschoff et al.
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence
Jie Feng, Shengyuan Wang, Tianhui Liu et al.
BackdoorLLM: A Comprehensive Benchmark for Backdoor Attacks and Defenses on Large Language Models
Yige Li, Hanxun Huang, Yunhan Zhao et al.
ROADWork: A Dataset and Benchmark for Learning to Recognize, Observe, Analyze and Drive Through Work Zones
Anurag Ghosh, Shen Zheng, Robert Tamburo et al.
Detecting Backdoor Samples in Contrastive Language Image Pretraining
Hanxun Huang, Sarah Erfani, Yige Li et al.
Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting
Suraj Anand, Michael Lepori, Jack Merullo et al.
Temporal Fair Division
Benjamin Cookson, Soroush Ebadian, Nisarg Shah
The Utility and Complexity of In- and Out-of-Distribution Machine Unlearning
Youssef Allouah, Joshua Kazdan, Rachid Guerraoui et al.
Differentiable Optimization of Similarity Scores Between Models and Brains
Nathan Cloos, Moufan Li, Markus Siegel et al.
RelitLRM: Generative Relightable Radiance for Large Reconstruction Models
Tianyuan Zhang, Zhengfei Kuang, Haian Jin et al.
Understanding Virtual Nodes: Oversquashing and Node Heterogeneity
Joshua Southern, Francesco Di Giovanni, Michael Bronstein et al.
HyperFace: Generating Synthetic Face Recognition Datasets by Exploring Face Embedding Hypersphere
Hatef Otroshi Shahreza, Sébastien Marcel
Multi-view Reconstruction via SfM-guided Monocular Depth Estimation
Haoyu Guo, He Zhu, Sida Peng et al.
LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation models
Ziqi Lu, Heng Yang, Danfei Xu et al.
Accurate and Regret-Aware Numerical Problem Solver for Tabular Question Answering
Yuxiang Wang, Jianzhong Qi, Junhao Gan
DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution
Zheng Chen, Zichen Zou, Kewei Zhang et al.
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
Hanyang Wang, Fangfu Liu, Jiawei Chi et al.
Why and How LLMs Hallucinate: Connecting the Dots with Subsequence Associations
Yiyou Sun, Yu Gai, Lijie Chen et al.
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
Federico Cocchi, Nicholas Moratelli, Marcella Cornia et al.
Adversarial Robustness in Two-Stage Learning-to-Defer: Algorithms and Guarantees
Yannis Montreuil, Axel Carlier, Lai Xing Ng et al.
Understanding Model Calibration - A gentle introduction and visual exploration of calibration and the expected calibration error (ECE)
Maja Pavlovic
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
Da Xiao, Qingye Meng, Shengping Li et al.
Reviving DSP for Advanced Theorem Proving in the Era of Reasoning Models
Chenrui Cao, Liangcheng Song, Zenan Li et al.
R-Sparse: Rank-Aware Activation Sparsity for Efficient LLM Inference
Zhenyu Zhang, Zechun Liu, Yuandong Tian et al.
Have the VLMs Lost Confidence? A Study of Sycophancy in VLMs
Shuo Li, Tao Ji, Xiaoran Fan et al.
Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts
Yun Wang, Longguang Wang, Chenghao Zhang et al.
RGBAvatar: Reduced Gaussian Blendshapes for Online Modeling of Head Avatars
Linzhou Li, Yumeng Li, Yanlin Weng et al.
Bridging the User-side Knowledge Gap in Knowledge-aware Recommendations with Large Language Models
Zheng Hu, Zhe Li, Ziyun Jiao et al.
Flash-VStream: Efficient Real-Time Understanding for Long Video Streams
Haoji Zhang, Yiqin Wang, Yansong Tang et al.
Strategist: Self-improvement of LLM Decision Making via Bi-Level Tree Search
Jonathan Light, Min Cai, Weiqin Chen et al.
MPDrive: Improving Spatial Understanding with Marker-Based Prompt Learning for Autonomous Driving
Zhi-Yuan Zhang, Xiaofan Li, Zhihao Xu et al.
Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation
Yongkang Li, Tianheng Cheng, Bin Feng et al.
Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis
Weikai Li, Ding Wang, Zijian Ding et al.
BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations
Weixi Feng, Chao Liu, Sifei Liu et al.
Memory Mosaics
Jianyu Zhang, Niklas Nolte, Ranajoy Sadhukhan et al.
Generation from Noisy Examples
Ananth Raman, Vinod Raman
How Expressive are Knowledge Graph Foundation Models?
Xingyue Huang, Pablo Barcelo, Michael Bronstein et al.
Reasoning as an Adaptive Defense for Safety
Taeyoun Kim, Fahim Tajwar, Aditi Raghunathan et al.
SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction
Lu Dai, Yijie Xu, Jinhui Ye et al.
3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model
Wenbo Hu, Yining Hong, Yanjun Wang et al.
How do Transformers Learn Implicit Reasoning?
Jiaran Ye, Zijun Yao, Zhidian Huang et al.
X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing
Xinyan Chen, Jianfei Yang
Lifting Motion to the 3D World via 2D Diffusion
Jiaman Li, Karen Liu, Jiajun Wu
Efficiently Parameterized Neural Metriplectic Systems
Anthony Gruber, Kookjin Lee, Haksoo Lim et al.
On Conformal Isometry of Grid Cells: Learning Distance-Preserving Position Embedding
Dehong Xu, Ruiqi Gao, Wenhao Zhang et al.
Jailbreaking as a Reward Misspecification Problem
Zhihui Xie, Jiahui Gao, Lei Li et al.
Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation
Shuo Wang, Yongcai Wang, Wanting Li et al.
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
Mohan Xu, Kai Li, Guo Chen et al.
Probing the Latent Hierarchical Structure of Data via Diffusion Models
Antonio Sclocchi, Alessandro Favero, Noam Levi et al.
Video Summarization with Large Language Models
Min Jung Lee, Dayoung Gong, Minsu Cho
TEASER: Token Enhanced Spatial Modeling for Expressions Reconstruction
Yunfei Liu, Lei Zhu, Lijian Lin et al.
The Lighthouse of Language: Enhancing LLM Agents via Critique-Guided Improvement
Ruihan Yang, Fanghua Ye, Jian Li et al.
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters
Jianping Jiang, Weiye Xiao, Zhengyu Lin et al.
LiDAR-RT: Gaussian-based Ray Tracing for Dynamic LiDAR Re-simulation
Chenxu Zhou, Lvchang Fu, Sida Peng et al.
SILO: Solving Inverse Problems with Latent Operators
Ron Raphaeli, Sean Man, Michael Elad
Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs
Yingji Zhong, Zhihao Li, Dave Zhenyu Chen et al.
Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis
Hongkang Li, Songtao Lu, Pin-Yu Chen et al.
Optimized Multi-Token Joint Decoding With Auxiliary Model for LLM Inference
Zongyue Qin, Ziniu Hu, Zifan He et al.
Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization
Peiyan Zhang, Haibo Jin, Leyang Hu et al.
CASE-Bench: Context-Aware SafEty Benchmark for Large Language Models
Guangzhi Sun, Xiao Zhan, Shutong Feng et al.
Hierarchical Fine-grained Preference Optimization for Physically Plausible Video Generation
Harold Haodong Chen, Haojian Huang, Qifeng Chen et al.
Open-World Amodal Appearance Completion
Jiayang Ao, Yanbei Jiang, Qiuhong Ke et al.
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
Zhenheng Tang, Xiang Liu, Qian Wang et al.
PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation
Chen Wang, Chuhao Chen, Yiming Huang et al.
Dense Policy: Bidirectional Autoregressive Learning of Actions
Yue Su, Xinyu Zhan, Hongjie Fang et al.
Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance
Dimitris Oikonomou, Nicolas Loizou
How Much Can We Forget about Data Contamination?
Sebastian Bordt, Suraj Srinivas, Valentyn Boreiko et al.
PhyMPGN: Physics-encoded Message Passing Graph Network for spatiotemporal PDE systems
Bocheng Zeng, Qi Wang, Mengtao Yan et al.
SMamba: Sparse Mamba for Event-based Object Detection
Nan Yang, Yang Wang, Zhanwen Liu et al.
Anyprefer: An Agentic Framework for Preference Data Synthesis
Yiyang Zhou, Zhaoyang Wang, Tianle Wang et al.
Learning Flow Fields in Attention for Controllable Person Image Generation
Zijian Zhou, Shikun Liu, Xiao Han et al.
Glad: A Streaming Scene Generator for Autonomous Driving
Bin Xie, Yingfei Liu, Tiancai Wang et al.
Analysis of Linear Mode Connectivity via Permutation-Based Weight Matching: With Insights into Other Permutation Search Methods
Akira Ito, Masanori Yamada, Atsutoshi Kumagai
Is this Generated Person Existed in Real-world? Fine-grained Detecting and Calibrating Abnormal Human-body
Zeqing Wang, Qingyang Ma, Wentao Wan et al.
Beyond Graphs: Can Large Language Models Comprehend Hypergraphs?
Yifan Feng, Chengwu Yang, Xingliang Hou et al.
Bayesian Test-Time Adaptation for Vision-Language Models
Lihua Zhou, Mao Ye, Shuaifeng Li et al.
Rethinking Invariance in In-context Learning
Lizhe Fang, Yifei Wang, Khashayar Gatmiry et al.
Improving Your Model Ranking on Chatbot Arena by Vote Rigging
Rui Min, Tianyu Pang, Chao Du et al.
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late In Training
Zhanpeng Zhou, Mingze Wang, Yuchen Mao et al.
nnWNet: Rethinking the Use of Transformers in Biomedical Image Segmentation and Calling for a Unified Evaluation Benchmark
Yanfeng Zhou, Lingrui Li, Le Lu et al.
Contrastive Flow Matching
George Stoica, Vivek Ramanujan, Xiang Fan et al.
The Computational Complexity of Circuit Discovery for Inner Interpretability
Federico Adolfi, Martina G. Vilas, Todd Wareham
Attention-Driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models Without Fine-Tuning
Hai-Ming Xu, Qi Chen, Lei Wang et al.
COB-GS: Clear Object Boundaries in 3DGS Segmentation Based on Boundary-Adaptive Gaussian Splitting
Jiaxin Zhang, Junjun Jiang, Youyu Chen et al.
VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text
Tianyu Zhang, Suyuchen Wang, Lu Li et al.
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
Zhenxing Mi, Kuan-Chieh Wang, Guocheng Qian et al.
CellFlux: Simulating Cellular Morphology Changes via Flow Matching
Yuhui Zhang, Yuchang Su, Chenyu Wang et al.
6DGS: Enhanced Direction-Aware Gaussian Splatting for Volumetric Rendering
Zhongpai Gao, Benjamin Planche, Meng Zheng et al.
Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning
Jiwon Song, Dongwon Jo, Yulhwa Kim et al.
ViSAGe: Video-to-Spatial Audio Generation
Jaeyeon Kim, Heeseung Yun, Gunhee Kim
Revisiting Energy Based Models as Policies: Ranking Noise Contrastive Estimation and Interpolating Energy Models
Sumeet Singh, Vikas Sindhwani, Stephen Tu
CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization
Feize Wu, Yun Pang, Junyi Zhang et al.
Layered Image Vectorization via Semantic Simplification
Zhenyu Wang, Jianxi Huang, Zhida Sun et al.
Task-Agnostic Guided Feature Expansion for Class-Incremental Learning
Bowen Zheng, Da-Wei Zhou, Han-Jia Ye et al.
Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
Xinyu Tian, Shu Zou, Zhaoyuan Yang et al.
Neural Sampling from Boltzmann Densities: Fisher-Rao Curves in the Wasserstein Geometry
Jannis Chemseddine, Christian Wald, Richard Duong et al.
GraphGPT: Generative Pre-trained Graph Eulerian Transformer
Qifang Zhao, Weidong Ren, Tianyu Li et al.
Improving the Scaling Laws of Synthetic Data with Deliberate Practice
Reyhane Askari Hemmat, Mohammad Pezeshki, Elvis Dohmatob et al.
DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation
Jing He, Haodong Li, huyongzhe et al.
Rectifying Magnitude Neglect in Linear Attention
Qihang Fan, Huaibo Huang, Yuang Ai et al.
Rectified Diffusion Guidance for Conditional Generation
Mengfei Xia, Nan Xue, Yujun Shen et al.
Scaling Trends in Language Model Robustness
Nikolaus Howe, Ian McKenzie, Oskar Hollinsworth et al.
Can Transformers Reason Logically? A Study in SAT Solving
Leyan Pan, Vijay Ganesh, Jacob Abernethy et al.
UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning
Xiangyu Wang, Donglin Yang, Yue Liao et al.
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
Alexander Nikulin, Ilya Zisman, Alexey Zemtsov et al.
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
Peng Liu, Dongyang Dai, Zhiyong Wu
TopoTune: A Framework for Generalized Combinatorial Complex Neural Networks
Mathilde Papillon, Guillermo Bernardez, Claudio Battiloro et al.
GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior
Penghao Wu, Shengnan Ma, Bo Wang et al.
RiOSWorld: Benchmarking the Risk of Multimodal Computer-Use Agents
Jingyi Yang, Shuai Shao, Dongrui Liu et al.
Towards Precise Scaling Laws for Video Diffusion Transformers
Yuanyang Yin, Yaqi Zhao, Mingwu Zheng et al.
TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation
Gihyun Kwon, Jong Chul YE
Reward-Instruct: A Reward-Centric Approach to Fast Photo-Realistic Image Generation
Yihong Luo, Tianyang Hu, Weijian Luo et al.
SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent
Yandan Yang, Baoxiong Jia, Shujie Zhang et al.
Point2RBox-v2: Rethinking Point-supervised Oriented Object Detection with Spatial Layout Among Instances
Yi Yu, Botao Ren, Peiyuan Zhang et al.
Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment
Cheryl Li, Tianyuan Xu, Yiwen Guo
VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation
Wenhao Wang, Yi Yang
SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding
Yangliu Hu, Zikai Song, Na Feng et al.
MP-GUI: Modality Perception with MLLMs for GUI Understanding
Ziwei Wang, Weizhi Chen, Leyang Yang et al.
Hierarchical Equivariant Policy via Frame Transfer
Haibo Zhao, Dian Wang, Yizhe Zhu et al.
Random-Set Neural Networks
Shireen Kudukkil Manchingal, Muhammad Mubashar, Kaizheng Wang et al.
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
Bingda Tang, Sayak Paul, Boyang Zheng et al.
Adaptive Self-improvement LLM Agentic System for ML Library Development
Genghan Zhang, Weixin Liang, Olivia Hsu et al.
QERA: an Analytical Framework for Quantization Error Reconstruction
Cheng Zhang, Jeffrey T. H. Wong, Can Xiao et al.
Lossy Compression with Pretrained Diffusion Models
jeremy vonderfecht, Feng Liu
DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers
Xuanlei Zhao, Shenggan Cheng, Chang Chen et al.
Don’t Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models
Sohyun An, Ruochen Wang, Tianyi Zhou et al.
ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding
Zhenxing Zhang, Yaxiong Wang, Lechao Cheng et al.
Using Diffusion Priors for Video Amodal Segmentation
Kaihua Chen, Deva Ramanan, Tarasha Khurana
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs
Akhiad Bercovich, Tomer Ronen, Talor Abramovich et al.
Retrieval-Augmented Perception: High-resolution Image Perception Meets Visual RAG
Wenbin Wang, Yongcheng Jing, Liang Ding et al.
CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance
Yongchao Chen, Yilun Hao, Yueying Liu et al.
MAPLE: Many-Shot Adaptive Pseudo-Labeling for In-Context Learning
Zihan Chen, Song Wang, Zhen Tan et al.
Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles
Buu Phan, Brandon Amos, Itai Gat et al.
Dual Consolidation for Pre-Trained Model-Based Domain-Incremental Learning
Da-Wei Zhou, Zi-Wen Cai, Han-Jia Ye et al.
DOTA: Distributional Test-time Adaptation of Vision-Language Models
Zongbo Han, Jialong Yang, Guangyu Wang et al.
TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning
Ge Li, Dong Tian, Hongyi Zhou et al.
ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models
Yeji Park, Deokyeong Lee, Junsuk Choe et al.
HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator
Fan Yang, Ru Zhen, Jianing Wang et al.
MM-CamObj: A Comprehensive Multimodal Dataset for Camouflaged Object Scenarios
Jiacheng Ruan, Wenzhen Yuan, Zehao Lin et al.
EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge
Ruskin Raj Manku, Yuzhi Tang, Xingjian Shi et al.
xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference
Maximilian Beck, Korbinian Pöppel, Phillip Lippe et al.
TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting
Jianchuan Chen, Jingchuan Hu, Gaige Wang et al.
BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing
Yunqi Gu, Ian Huang, Jihyeon Je et al.
FEAT: Free energy Estimators with Adaptive Transport
Yuanqi Du, Jiajun He, Francisco Vargas et al.
3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding
Tatiana Zemskova, Dmitry Yudin
Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models
Zeman Li, Xinwei Zhang, Peilin Zhong et al.
Science-T2I: Addressing Scientific Illusions in Image Synthesis
Jialuo Li, Wenhao Chai, XINGYU FU et al.
Privacy-Preserving Low-Rank Adaptation Against Membership Inference Attacks for Latent Diffusion Models
Zihao Luo, Xilie Xu, Feng Liu et al.
MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models
Chejian Xu, Jiawei Zhang, Zhaorun Chen et al.
Sports-Traj: A Unified Trajectory Generation Model for Multi-Agent Movement in Sports
Yi Xu, Yun Fu
Open-World Reinforcement Learning over Long Short-Term Imagination
Jiajian Li, Qi Wang, Yunbo Wang et al.
Diff-Shadow: Global-guided Diffusion Model for Shadow Removal
Jinting Luo, Ru Li, Chengzhi Jiang et al.
Topological Blindspots: Understanding and Extending Topological Deep Learning Through the Lens of Expressivity
Yam Eitan, Yoav Gelberg, Guy Bar-Shalom et al.
Fast training and sampling of Restricted Boltzmann Machines
Nicolas BEREUX, Aurélien Decelle, Cyril Furtlehner et al.
ExpertAF: Expert Actionable Feedback from Video
Kumar Ashutosh, Tushar Nagarajan, Georgios Pavlakos et al.
A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
Thomas Schmied, Thomas Adler, Vihang Patil et al.
One Node One Model: Featuring the Missing-Half for Graph Clustering
Xuanting Xie, Bingheng Li, Erlin Pan et al.
VladVA: Discriminative Fine-tuning of LVLMs
Yassine Ouali, Adrian Bulat, ALEXANDROS XENOS et al.
Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences
Nikos Dimitriadis, Pascal Frossard, François Fleuret
PLeaS - Merging Models with Permutations and Least Squares
Anshul Nasery, Jonathan Hayase, Pang Wei Koh et al.
ConTextTab: A Semantics-Aware Tabular In-Context Learner
Marco Spinaci, Marek Polewczyk, Maximilian Schambach et al.
SRSA: Skill Retrieval and Adaptation for Robotic Assembly Tasks
Yijie Guo, Bingjie Tang, Iretiayo Akinola et al.
ArtFormer: Controllable Generation of Diverse 3D Articulated Objects
Jiayi Su, Youhe Feng, Zheng Li et al.
Whole-Body Conditioned Egocentric Video Prediction
Yutong Bai, Danny Tran, Amir Bar et al.
IgGM: A Generative Model for Functional Antibody and Nanobody Design
Rubo Wang, Fandi Wu, Xingyu Gao et al.
Locality in Image Diffusion Models Emerges from Data Statistics
Artem Lukoianov, Chenyang Yuan, Justin Solomon et al.
Scaling Embedding Layers in Language Models
Da Yu, Edith Cohen, Badih Ghazi et al.
MambaPEFT: Exploring Parameter-Efficient Fine-Tuning for Mamba
Masakazu Yoshimura, Teruaki Hayashi, Yota Maeda
TopoCellGen: Generating Histopathology Cell Topology with a Diffusion Model
Meilong Xu, Saumya Gupta, Xiaoling Hu et al.
Does Generation Require Memorization? Creative Diffusion Models using Ambient Diffusion
Kulin Shah, Alkis Kalavasis, Adam Klivans et al.
Visual-Instructed Degradation Diffusion for All-in-One Image Restoration
Haina Qin, Wenyang Luo, Zewen Chen et al.
Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization
Luca Masserano, Abdul Fatir Ansari, Boran Han et al.
Activation Gradient based Poisoned Sample Detection Against Backdoor Attacks
Danni Yuan, Mingda Zhang, Shaokui Wei et al.
On the Crucial Role of Initialization for Matrix Factorization
Bingcong Li, Liang Zhang, Aryan Mokhtari et al.
MLPs Learn In-Context on Regression and Classification Tasks
William Tong, Cengiz Pehlevan
Harnessing Massive Satellite Imagery with Efficient Masked Image Modeling
Fengxiang Wang, Hongzhen Wang, Di Wang et al.
Attention layers provably solve single-location regression
Pierre Marion, Raphaël Berthier, Gérard Biau et al.
SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language
zehan wang, Sashuai zhou, Shaoxuan He et al.
Statistical Advantages of Perturbing Cosine Router in Mixture of Experts
Huy Nguyen, Pedram Akbarian Saravi, Trang Pham et al.
Locality Alignment Improves Vision-Language Models
Ian Covert, Tony Sun, James Y Zou et al.
Semantic and Sequential Alignment for Referring Video Object Segmentation
Feiyu Pan, Hao Fang, Fangkai Li et al.
StyO: Stylize Your Face in Only One-Shot
Bonan Li, Zicheng Zhang, Xuecheng Nie et al.
Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering
Youngsun Lim, Hojun Choi, Hyunjung Shim
EventGPT: Event Stream Understanding with Multimodal Large Language Models
shaoyu liu, Jianing Li, guanghui zhao et al.
Tri-Ergon: Fine-Grained Video-to-Audio Generation with Multi-Modal Conditions and LUFS Control
Bingliang Li, Fengyu Yang, Yuxin Mao et al.
DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation
Zhiqiang Shen, Ammar Sherif, Zeyuan Yin et al.
CSBrain: A Cross-scale Spatiotemporal Brain Foundation Model for EEG Decoding
Yuchen Zhou, Jiamin Wu, Zichen Ren et al.
SuperDec: 3D Scene Decomposition with Superquadrics Primitives
Elisabetta Fedele, Boyang Sun, Francis Engelmann et al.
Learning Robust Spectral Dynamics for Temporal Domain Generalization
En Yu, Jie Lu, Xiaoyu Yang et al.
h4rm3l: A Language for Composable Jailbreak Attack Synthesis
Moussa Koulako Bala Doumbouya, Ananjan Nandi, Gabriel Poesia et al.
KnowPO: Knowledge-Aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models
Ruizhe Zhang, Yongxin Xu, Yuzhen Xiao et al.
MoDiTalker: Motion-Disentangled Diffusion Model for High-Fidelity Talking Head Generation
Seyeon Kim, Siyoon Jin, Jihye Park et al.