Most Cited 2025 "low dimensional mappings" Papers
22,274 papers found • Page 17 of 112
Conference
Dual Prompting Image Restoration with Diffusion Transformers
Dehong Kong, Fan Li, Zhixin Wang et al.
OS-ATLAS: Foundation Action Model for Generalist GUI Agents
Zhiyong Wu, Zhenyu Wu, Fangzhi Xu et al.
SILO: Solving Inverse Problems with Latent Operators
Ron Raphaeli, Sean Man, Michael Elad
Layers at Similar Depths Generate Similar Activations Across LLM Architectures
Christopher Wolfram, Aaron Schein
Repo2Run: Automated Building Executable Environment for Code Repository at Scale
Ruida Hu, Chao Peng, XinchenWang et al.
AutoOcc: Automatic Open-Ended Semantic Occupancy Annotation via Vision-Language Guided Gaussian Splatting
Xiaoyu Zhou, Jingqi Wang, Yongtao Wang et al.
Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models
Lynn Chua, Badih Ghazi, Yangsibo Huang et al.
SAM2Object: Consolidating View Consistency via SAM2 for Zero-Shot 3D Instance Segmentation
Jihuai Zhao, Junbao Zhuo, Jiansheng Chen et al.
SMARTIES: Spectrum-Aware Multi-Sensor Auto-Encoder for Remote Sensing Images
Gencer Sumbul, Chang Xu, Emanuele Dalsasso et al.
Breaking Free from MMI: A New Frontier in Rationalization by Probing Input Utilization
Wei Liu, Zhiying Deng, Zhongyu Niu et al.
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
Jiashuo Yu, Yue Wu, Meng Chu et al.
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
Tao Wang, Changxu Cheng, Lingfeng Wang et al.
Lumina-T2X: Scalable Flow-based Large Diffusion Transformer for Flexible Resolution Generation
Gao Peng, Le Zhuo, Dongyang Liu et al.
Learning to Adapt Frozen CLIP for Few-Shot Test-Time Domain Adaptation
Zhixiang Chi, Li Gu, Huan Liu et al.
Random-Set Neural Networks
Shireen Kudukkil Manchingal, Muhammad Mubashar, Kaizheng Wang et al.
Measuring what Matters: Construct Validity in Large Language Model Benchmarks
Andrew M. Bean, Ryan Othniel Kearns, Angelika Romanou et al.
CP-Guard: Malicious Agent Detection and Defense in Collaborative Bird’s Eye View Perception
Senkang Hu, Yihang Tao, Guowen Xu et al.
DepthCues: Evaluating Monocular Depth Perception in Large Vision Models
Duolikun Danier, Mehmet Aygun, Changjian Li et al.
BrainUICL: An Unsupervised Individual Continual Learning Framework for EEG Applications
Yangxuan Zhou, Sha Zhao, Jiquan Wang et al.
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
Denis Sutter, Julian Minder, Thomas Hofmann et al.
Learning Robust Spectral Dynamics for Temporal Domain Generalization
En Yu, Jie Lu, Xiaoyu Yang et al.
AlphaZero Neural Scaling and Zipf's Law: a Tale of Board Games and Power Laws
Oren Neumann, Claudius Gros
Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Generation
Zheng Anlin, Xin Wen, Xuanyang Zhang et al.
Mimic In-Context Learning for Multimodal Tasks
Yuchu Jiang, Jiale Fu, chenduo hao et al.
KPL: Training-Free Medical Knowledge Mining of Vision-Language Models
Jiaxiang Liu, Tianxiang Hu, Jiawei Du et al.
Pairwise or Pointwise? Evaluating Feedback Protocols for Bias in LLM-Based Evaluation
Tuhina Tripathi, Manya Wadhwa, Greg Durrett et al.
Strategy Coopetition Explains the Emergence and Transience of In-Context Learning
Aaditya Singh, Ted Moskovitz, Sara Dragutinović et al.
VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models
Muchao Ye, Weiyang Liu, Pan He
Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents
Haoyu Wang, Sunhao Dai, Haiyuan Zhao et al.
UAVScenes: A Multi-Modal Dataset for UAVs
Sijie Wang, Siqi Li, Yawei Zhang et al.
Seurat: From Moving Points to Depth
Seokju Cho, Gabriel Huang, Seungryong Kim et al.
OLinear: A Linear Model for Time Series Forecasting in Orthogonally Transformed Domain
Wenzhen Yue, Yong Liu, Hao Wang et al.
Semi-Supervised Multi-View Multi-Label Learning with View-Specific Transformer and Enhanced Pseudo-Label
Quanjiang Li, Tingjin Luo, Mingdie Jiang et al.
MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models
Chejian Xu, Jiawei Zhang, Zhaorun Chen et al.
UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents
Han Xiao, Guozhi Wang, Yuxiang Chai et al.
Cafe-Talk: Generating 3D Talking Face Animation with Multimodal Coarse- and Fine-grained Control
Hejia Chen, Haoxian Zhang, Shoulong Zhang et al.
Make Me Happier: Evoking Emotions Through Image Diffusion Models
Qing Lin, Jingfeng Zhang, YEW-SOON ONG et al.
GIFT: Unlocking Full Potential of Labels in Distilled Dataset at Near-zero Cost
Xinyi Shang, Peng Sun, Tao Lin
ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos
Tanveer Hannan, Md Mohaiminul Islam, Jindong Gu et al.
MP-SfM: Monocular Surface Priors for Robust Structure-from-Motion
Zador Pataki, Paul-Edouard Sarlin, Johannes Schönberger et al.
Self-attention-based Diffusion Model for Time-series Imputation in Partial Blackout Scenarios
Mohammad Rafid Ul Islam, Prasad Tadepalli, Alan Fern
Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences
Nikos Dimitriadis, Pascal Frossard, François Fleuret
Update Your Transformer to the Latest Release: Re-Basin of Task Vectors
Filippo Rinaldi, Giacomo Capitani, Lorenzo Bonicelli et al.
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training
Haicheng Wang, Chen Ju, Weixiong Lin et al.
Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization
Yamato Arai, Yuma Ichikawa
Fast Training of Sinusoidal Neural Fields via Scaling Initialization
Taesun Yeom, Sangyoon Lee, Jaeho Lee
You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning
Ayan Sengupta, Siddhant Chaudhary, Tanmoy Chakraborty
On Temperature Scaling and Conformal Prediction of Deep Classifiers
Lahav Dabah, Tom Tirer
Bayesian Test-Time Adaptation for Vision-Language Models
Lihua Zhou, Mao Ye, Shuaifeng Li et al.
Pre-Training Graph Neural Networks on Molecules by Using Subgraph-Conditioned Graph Information Bottleneck
Van Thuy Hoang, O-Joun Lee
Quadratic Gaussian Splatting: High Quality Surface Reconstruction with Second-order Geometric Primitives
ziyu zhang, Binbin Huang, Hanqing Jiang et al.
Coarse Correspondences Boost Spatial-Temporal Reasoning in Multimodal Language Model
Benlin Liu, Yuhao Dong, Yiqin Wang et al.
Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel
Zun Wang, Jialu Li, Yicong Hong et al.
WeatherGFM: Learning a Weather Generalist Foundation Model via In-context Learning
Xiangyu Zhao, Zhiwang Zhou, Wenlong Zhang et al.
Wide Neural Networks Trained with Weight Decay Provably Exhibit Neural Collapse
Arthur Jacot, Peter Súkeník, Zihan Wang et al.
Gap-Dependent Bounds for Q-Learning using Reference-Advantage Decomposition
Zhong Zheng, Haochen Zhang, Lingzhou Xue
Deep Nonlinear Sufficient Dimension Reduction
Yinfeng Chen, Yuling Jiao, Rui Qiu et al.
Multi-modal brain encoding models for multi-modal stimuli
SUBBA REDDY OOTA, Khushbu Pahwa, mounika marreddy et al.
$\gamma-$MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models
Yaxin Luo, Gen Luo, Jiayi Ji et al.
Do Large Language Models Truly Understand Geometric Structures?
Xiaofeng Wang, Yiming Wang, Wenhong Zhu et al.
GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling
Jialong Zhou, Lichao Wang, Xiao Yang
MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes
Ruijie Lu, Yixin Chen, Junfeng Ni et al.
Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration
Kang Liao, Zongsheng Yue, Zhouxia Wang et al.
Rethinking Evaluation of Sparse Autoencoders through the Representation of Polysemous Words
Gouki Gouki, Hiroki Furuta, Yusuke Iwasawa et al.
Realistic Evaluation of Deep Partial-Label Learning Algorithms
Wei Wang, Dong-Dong Wu, Jindong Wang et al.
Scaling Laws for Differentially Private Language Models
Ryan McKenna, Yangsibo Huang, Amer Sinha et al.
DreamDistribution: Learning Prompt Distribution for Diverse In-distribution Generation
Brian Nlong Zhao, Yuhang Xiao, Jiashu Xu et al.
SuperMat: Physically Consistent PBR Material Estimation at Interactive Rates
Yijia Hong, Yuan-Chen Guo, Ran Yi et al.
Constrained Fair and Efficient Allocations
Benjamin Cookson, Soroush Ebadian, Nisarg Shah
GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers
Shijie Ma, Yuying Ge, Teng Wang et al.
Can Classic GNNs Be Strong Baselines for Graph-level Tasks? Simple Architectures Meet Excellence
Yuankai Luo, Lei Shi, Xiao-Ming Wu
LeFusion: Controllable Pathology Synthesis via Lesion-Focused Diffusion Models
Hantao Zhang, Yuhe Liu, Jiancheng Yang et al.
Group Distributionally Robust Dataset Distillation with Risk Minimization
Saeed Vahidian, Mingyu Wang, Jianyang Gu et al.
Promptable Anomaly Segmentation with SAM Through Self-Perception Tuning
Hui-Yue Yang, Hui Chen, Ao Wang et al.
Transition Path Sampling with Improved Off-Policy Training of Diffusion Path Samplers
Kiyoung Seong, Seonghyun Park, Seonghwan Kim et al.
RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards
jingnan zheng, Xiangtian Ji, Yijun Lu et al.
When Do LLMs Help With Node Classification? A Comprehensive Analysis
Xixi Wu, Yifei Shen, Fangzhou Ge et al.
ND-SDF: Learning Normal Deflection Fields for High-Fidelity Indoor Reconstruction
Ziyu Tang, Weicai Ye, Yifan Wang et al.
MP-GUI: Modality Perception with MLLMs for GUI Understanding
Ziwei Wang, Weizhi Chen, Leyang Yang et al.
Synthesizing Privacy-Preserving Text Data via Finetuning *without* Finetuning Billion-Scale LLMs
Bowen Tan, Zheng Xu, Eric Xing et al.
Vanish into Thin Air: Cross-prompt Universal Adversarial Attacks for SAM2
Ziqi Zhou, Yifan Hu, Yufei Song et al.
Correcting Deviations from Normality: A Reformulated Diffusion Model for Multi-Class Unsupervised Anomaly Detection
Farzad Beizaee, Gregory A. Lodygensky, Christian Desrosiers et al.
MultiADS: Defect-aware Supervision for Multi-type Anomaly Detection and Segmentation in Zero-Shot Learning
Ylli Sadikaj, Hongkuan Zhou, Lavdim Halilaj et al.
BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models
Zenghui Yuan, Jiawen Shi, Pan Zhou et al.
STAA-SNN: Spatial-Temporal Attention Aggregator for Spiking Neural Networks
Tianqing Zhang, Kairong Yu, Xian Zhong et al.
Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation
Chanyoung Kim, Dayun Ju, Woojung Han et al.
Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization
Luca Masserano, Abdul Fatir Ansari, Boran Han et al.
PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution
Zhu Li Bo, Jianze Li, Haotong Qin et al.
Deformable Radial Kernel Splatting
Yihua Huang, Mingxian Lin, Yangtian Sun et al.
ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models
Liyan Tang, Grace Kim, Xinyu Zhao et al.
Objective drives the consistency of representational similarity across datasets
Laure Ciernik, Lorenz Linhardt, Marco Morik et al.
Markov Persuasion Processes: Learning to Persuade From Scratch
Francesco Bacchiocchi, Francesco Emanuele Stradi, Matteo Castiglioni et al.
InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment
Yunhong Lu, Qichao Wang, Hengyuan Cao et al.
Beyond the convexity assumption: Realistic tabular data generation under quantifier-free real linear constraints
Mihaela Stoian, Eleonora Giunchiglia
HybridGS: High-Efficiency Gaussian Splatting Data Compression using Dual-Channel Sparse Representation and Point Cloud Encoder
Qi Yang, Le Yang, Geert Van der Auwera et al.
Distilling Monocular Foundation Model for Fine-grained Depth Completion
Yingping Liang, Yutao Hu, Wenqi Shao et al.
Probability Density Geodesics in Image Diffusion Latent Space
Qingtao Yu, Jaskirat Singh, Zhaoyuan Yang et al.
Counterfactual Generative Modeling with Variational Causal Inference
Yulun Wu, Louis McConnell, Claudia Iriondo
LATINO-PRO: LAtent consisTency INverse sOlver with PRompt Optimization
Alessio Spagnoletti, Jean Prost, Andres Almansa et al.
GLASS: Guided Latent Slot Diffusion for Object-Centric Learning
Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth
Language Guided Concept Bottleneck Models for Interpretable Continual Learning
Lu Yu, HaoYu Han, Zhe Tao et al.
O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models
Ashshak Sharifdeen, Muhammad Akhtar Munir, Sanoojan Baliah et al.
Fine-Tuning Visual Autogressive Models for Subject-Driven Generation
Jiwoo Chung, Sangeek Hyun, Hyunjun Kim et al.
REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning
Jihyun Lee, Weipeng Xu, Alexander Richard et al.
PMQ-VE: Progressive Multi-Frame Quantization for Video Enhancement
ZhanFeng Feng, Long Peng, Xin Di et al.
MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation
Jiaxin Huang, Runnan Chen, Ziwen Li et al.
PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations
Benjamin Holzschuh, Qiang Liu, Georg Kohl et al.
MagCache: Fast Video Generation with Magnitude-Aware Cache
Zehong Ma, Longhui Wei, Feng Wang et al.
Neural Video Compression with Context Modulation
Chuanbo Tang, Zhuoyuan Li, Yifan Bian et al.
Genesis: Multimodal Driving Scene Generation with Spatio-Temporal and Cross-Modal Consistency
Xiangyu Guo, Zhanqian Wu, Kaixin Xiong et al.
Quality-Driven Curation of Remote Sensing Vision-Language Data via Learned Scoring Models
Dilxat Muhtar, Enzhuo Zhang, Zhenshi Li et al.
BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems
Andy Zhang, Joey Ji, Celeste Menders et al.
MultiGO: Towards Multi-level Geometry Learning for Monocular 3D Textured Human Reconstruction
Gangjian Zhang, Nanjie Yao, Shunsi Zhang et al.
An End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models
Wentao Qu, Jing Wang, Yongshun Gong et al.
RNG: Relightable Neural Gaussians
Jiahui Fan, Fujun Luan, Jian Yang et al.
Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks
Shikai Qiu, Lechao Xiao, Andrew Wilson et al.
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)
Leander Girrbach, Stephan Alaniz, Yiran Huang et al.
MegActor-Sigma: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer
Shurong Yang, Huadong Li, Juhao Wu et al.
SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation
Leigang Qu, Haochuan Li, Wenjie Wang et al.
Model Provenance Testing for Large Language Models
Ivica Nikolic, Teodora Baluta, Prateek Saxena
SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement
Yuqi Lin, Hengjia Li, Wenqi Shao et al.
All-in-One: Transferring Vision Foundation Models into Stereo Matching
Jingyi Zhou, Haoyu Zhang, Jiakang Yuan et al.
QuaDiM: A Conditional Diffusion Model For Quantum State Property Estimation
Yehui Tang, Mabiao Long, Junchi Yan
ADIFF: Explaining audio difference using natural language
Soham Deshmukh, Shuo Han, Rita Singh et al.
SurFhead: Affine Rig Blending for Geometrically Accurate 2D Gaussian Surfel Head Avatars
Jaeseong Lee, Taewoong Kang, Marcel Buehler et al.
Attention-Driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models Without Fine-Tuning
Hai-Ming Xu, Qi Chen, Lei Wang et al.
Fast Summation of Radial Kernels via QMC Slicing
Johannes Hertrich, Tim Jahn, Michael Quellmalz
MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining
Yunze Liu, Li Yi
ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation
Mengyang Wu, Yuzhi Zhao, Jialun Cao et al.
Learning World Models for Interactive Video Generation
Taiye Chen, Xun Hu, Zihan Ding et al.
DIFFER: Disentangling Identity Features via Semantic Cues for Clothes-Changing Person Re-ID
Xin Liang, Yogesh S. Rawat
Controllable Generation via Locally Constrained Resampling
Kareem Ahmed, Kai-Wei Chang, Guy Van den Broeck
Test-time Adaptation for Cross-modal Retrieval with Query Shift
Haobin Li, Peng Hu, Qianjun Zhang et al.
CoA: Towards Real Image Dehazing via Compression-and-Adaptation
Long Ma, Yuxin Feng, Yan Zhang et al.
HumanRig: Learning Automatic Rigging for Humanoid Character in a Large Scale Dataset
Zedong Chu, Feng Xiong, Meiduo Liu et al.
Universal Video Temporal Grounding with Generative Multi-modal Large Language Models
Zeqian Li, Shangzhe Di, Zhonghua Zhai et al.
LIBA: Language Instructed Multi-granularity Bridge Assistant for 3D Visual Grounding
Yuan Wang, Ya-Li Li, W U Eastman Z Y et al.
GAS: Generative Avatar Synthesis from a Single Image
Yixing Lu, Junting Dong, YoungJoong Kwon et al.
RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance
Chengrui Wang, Pengfei Liu, Min Zhou et al.
Discrete Codebook World Models for Continuous Control
Aidan Scannell, Mohammadreza Nakhaeinezhadfard, Kalle Kujanpää et al.
DreamText: High Fidelity Scene Text Synthesis
Yibin Wang, Weizhong Zhang, honghui xu et al.
Adversarially Robust Out-of-Distribution Detection Using Lyapunov-Stabilized Embeddings
Hossein Mirzaei Sadeghlou, Mackenzie Mathis
Aligned Better, Listen Better for Audio-Visual Large Language Models
Yuxin Guo, Shuailei Ma, Shijie Ma et al.
Towards Transformer-Based Aligned Generation with Self-Coherence Guidance
Shulei Wang, Wang Lin, Hai Huang et al.
Rethinking Query-based Transformer for Continual Image Segmentation
Yuchen Zhu, Cheng Shi, Dingyou Wang et al.
BodyGen: Advancing Towards Efficient Embodiment Co-Design
Haofei Lu, Zhe Wu, Junliang Xing et al.
Diff3DS: Generating View-Consistent 3D Sketch via Differentiable Curve Rendering
Yibo Zhang, Lihong Wang, Changqing Zou et al.
Circumventing Shortcuts in Audio-visual Deepfake Detection Datasets with Unsupervised Learning
Stefan Smeu, Dragos-Alexandru Boldisor, Dan Oneata et al.
EventGPT: Event Stream Understanding with Multimodal Large Language Models
shaoyu liu, Jianing Li, guanghui zhao et al.
GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency
Dongyue Lu, Lingdong Kong, Tianxin Huang et al.
Relieving Universal Label Noise for Unsupervised Visible-Infrared Person Re-Identification by Inferring from Neighbors
Xiao Teng, Long Lan, Dingyao Chen et al.
How Transformers Learn Structured Data: Insights From Hierarchical Filtering
Jerome Garnier-Brun, Marc Mezard, Emanuele Moscato et al.
Learning Pattern-Specific Experts for Time Series Forecasting Under Patch-level Distribution Shift
Yanru Sun, Zongxia Xie, Emadeldeen Eldele et al.
Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment
Yifan Zhang, Ge Zhang, Yue Wu et al.
MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem
Fan LIU, Zherui Yang, Cancheng Liu et al.
Theoretically Grounded Framework for LLM Watermarking: A Distribution-Adaptive Approach
Haiyun He, Yepeng Liu, Ziqiao Wang et al.
Diffusion models for Gaussian distributions: Exact solutions and Wasserstein errors
Emile Pierret, Bruno Galerne
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
Weiming Ren, Huan Yang, Jie Min et al.
Diffusion Tree Sampling: Scalable inference‑time alignment of diffusion models
Vineet Jain, Kusha Sareen, Mohammad Pedramfar et al.
AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks
Fali Wang, Hui Liu, Zhenwei Dai et al.
SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation
Koichi Saito, Dongjun Kim, Takashi Shibuya et al.
Multi-Focus Image Fusion via Explicit Defocus Blur Modelling
Yuhui Quan, Xi Wan, Zitao Tang et al.
LesionLocator: Zero-Shot Universal Tumor Segmentation and Tracking in 3D Whole-Body Imaging
Maximilian Rokuss, Yannick Kirchhoff, Seval Akbal et al.
HELM: Hierarchical Encoding for mRNA Language Modeling
Mehdi Yazdani-Jahromi, Mangal Prakash, Tommaso Mansi et al.
SketchVideo: Sketch-based Video Generation and Editing
Feng-Lin Liu, Hongbo Fu, Xintao Wang et al.
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections
Da Xiao, Qingye Meng, Shengping Li et al.
Mamba4D: Efficient 4D Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models
Jiuming Liu, Jinru Han, Lihao Liu et al.
Solving Inequality Proofs with Large Language Models
Jiayi Sheng, Luna Lyu, Jikai Jin et al.
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?
Simon Park, Abhishek Panigrahi, Yun Cheng et al.
HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation
Tengfei Liu, Jiapu Wang, Yongli Hu et al.
Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing
Hanhui Wang, Yihua Zhang, Ruizheng Bai et al.
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs
Xiaoqin Wang, Xusen Ma, Xianxu Hou et al.
Modality-Specialized Synergizers for Interleaved Vision-Language Generalists
Zhiyang Xu, Minqian Liu, Ying Shen et al.
Image Quality Assessment: Investigating Causal Perceptual Effects with Abductive Counterfactual Inference
Wenhao Shen, Mingliang Zhou, Yu Chen et al.
Incomplete Modality Disentangled Representation for Ophthalmic Disease Grading and Diagnosis
Chengzhi Liu, Zile Huang, Zhe Chen et al.
Fast and Slow Streams for Online Time Series Forecasting Without Information Leakage
Ying-yee Ava Lau, Zhiwen Shao, Dit-Yan Yeung
ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention
Bencheng Liao, Xinggang Wang, Lianghui Zhu et al.
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers
Zeyuan Allen-Zhu
RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models
Haoran Hao, Jiaming Han, Changsheng Li et al.
Data Taggants: Dataset Ownership Verification Via Harmless Targeted Data Poisoning
Wassim Bouaziz, Nicolas Usunier, El-Mahdi El-Mhamdi
Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion
Alan Amin, Nate Gruver, Andrew Wilson
ACE: Anti-Editing Concept Erasure in Text-to-Image Models
Zihao Wang, Yuxiang Wei, Fan Li et al.
AMO Sampler: Enhancing Text Rendering with Overshooting
Xixi Hu, Keyang Xu, Bo Liu et al.
Feature Denoising Diffusion Model for Blind Image Quality Assessment
Xudong Li, Yan Zhang, Yunhang Shen et al.
Micro-macro Wavelet-based Gaussian Splatting for 3D Reconstruction from Unconstrained Images
Yihui Li, Chengxin Lv, Hongyu Yang et al.
DELIFT: Data Efficient Language model Instruction Fine-Tuning
Ishika Agarwal, Krishnateja Killamsetty, Lucian Popa et al.
FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding
Thanh-Dat Truong, Utsav Prabhu, Bhiksha Raj et al.
INST-IT: Boosting Instance Understanding via Explicit Visual Prompt Instruction Tuning
Wujian Peng, Lingchen Meng, Yitong Chen et al.
MergeNet: Knowledge Migration Across Heterogeneous Models, Tasks, and Modalities
Kunxi Li, Tianyu Zhan, Kairui Fu et al.
Edge-SD-SR: Low Latency and Parameter Efficient On-device Super-Resolution with Stable Diffusion via Bidirectional Conditioning
Isma Hadji, Mehdi Noroozi, Victor Escorcia et al.
BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting
Yiren Lu, Yunlai Zhou, Disheng Liu et al.
3D-GSW: 3D Gaussian Splatting for Robust Watermarking
Youngdong Jang, Hyunje Park, Feng Yang et al.
Geometry-aware RL for Manipulation of Varying Shapes and Deformable Objects
Tai Hoang, Huy Le, Philipp Becker et al.
Soft Reasoning: Navigating Solution Spaces in Large Language Models through Controlled Embedding Exploration
Qinglin Zhu, Runcong Zhao, Hanqi Yan et al.
SapiensID: Foundation for Human Recognition
Minchul Kim, Dingqiang Ye, Yiyang Su et al.
Sail into the Headwind: Alignment via Robust Rewards and Dynamic Labels against Reward Hacking
Paria Rashidinejad, Yuandong Tian
Learning Chaos In A Linear Way
Xiaoyuan Cheng, Yi He, Yiming Yang et al.
Can Textual Gradient Work in Federated Learning?
Minghui Chen, Ruinan Jin, Wenlong Deng et al.
Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration
Zilong Huang, Jun He, Junyan Ye et al.
Adversarial Generative Flow Network for Solving Vehicle Routing Problems
Ni Zhang, Jingfeng Yang, Zhiguang Cao et al.