Most Cited 2025 "masked autoencoder paradigm" Papers
22,274 papers found
CHARM3R: Towards Unseen Camera Height Robust Monocular 3D Detector
Abhinav Kumar, Yuliang Guo, Zhihao Zhang et al.
PID-controlled Langevin Dynamics for Faster Sampling on Generative Models
Hongyi Chen, Jianhai Shu, Jingtao Ding et al.
Learning on the Go: A Meta-learning Object Navigation Model
Xiaorong Qin, Xinhang Song, Sixian Zhang et al.
WonderPlay: Dynamic 3D Scene Generation from a Single Image and Actions
Zizhang Li, Hong-Xing Yu, Wei Liu et al.
Beyond the Destination: A Novel Benchmark for Exploration-Aware Embodied Question Answering
Kaixuan Jiang, Yang Liu, Weixing Chen et al.
Not all Views are Created Equal: Analyzing Viewpoint Instabilities in Vision Foundation Models
Mateusz Michalkiewicz, Xinyue Bai, Mahsa Baktashmotlagh et al.
CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image
Arindam Dutta, Meng Zheng, Zhongpai Gao et al.
ReCoT: Reflective Self-Correction Training for Mitigating Confirmation Bias in Large Vision-Language Models
Mengxue Qu, Yibo Hu, Kunyang Han et al.
STree: Speculative Tree Decoding for Hybrid State Space Models
Yangchao Wu, Zongyue Qin, Alex Wong et al.
Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base
Linxin Song, Xuwei Ding, Jieyu Zhang et al.
OMNI-DC: Highly Robust Depth Completion with Multiresolution Depth Integration
Yiming Zuo, Willow Yang, Zeyu Ma et al.
Multi-Objective Reinforcement Learning with Max-Min Criterion: A Game-Theoretic Approach
Woohyeon Byeon, Giseung Park, Jongseong Chae et al.
SNEAKDOOR: Stealthy Backdoor Attacks against Distribution Matching-based Dataset Condensation
He Yang, Dongyi Lv, Song Ma et al.
CogNav: Cognitive Process Modeling for Object Goal Navigation with LLMs
Yihan Cao, Jiazhao Zhang, Zhinan Yu et al.
Disentangling Hyperedges through the Lens of Category Theory
Yoonho Lee, Junseok Lee, Sangwoo Seo et al.
Bridging the Sky and Ground: Towards View-Invariant Feature Learning for Aerial-Ground Person Re-Identification
Wajahat Khalid, Bin Liu, Xulin Li et al.
WalkVLM: Aid Visually Impaired People Walking by Vision Language Model
Zhiqiang Yuan, Ting Zhang, Yeshuang Zhu et al.
VIGFace: Virtual Identity Generation for Privacy-Free Face Recognition Dataset
Minsoo Kim, Min-Cheol Sagong, Gi Pyo Nam et al.
MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space
Lixing Xiao, Shunlin Lu, Huaijin Pi et al.
Benchmarking Retrieval-Augmented Multimodal Generation for Document Question Answering
Kuicai Dong, Chang Yujing, Shijie Huang et al.
MUniverse: A Simulation and Benchmarking Suite for Motor Unit Decomposition
Pranav Mamidanna, Thomas Klotz, Dimitrios Chalatsis et al.
CineTechBench: A Benchmark for Cinematographic Technique Understanding and Generation
Xinran Wang, Songyu Xu, Shan Xiangxuan et al.
Mixture of Experts Guided by Gaussian Splatters Matters: A New Approach to Weakly-Supervised Video Anomaly Detection
Giacomo D'Amicantonio, Snehashis Majhi, Quan Kong et al.
What If: Understanding Motion Through Sparse Interactions
Stefan A. Baumann, Nick Stracke, Timy Phan et al.
Sekai: A Video Dataset towards World Exploration
Zhen Li, Chuanhao Li, Xiaofeng Mao et al.
Beyond Label Semantics: Language-Guided Action Anatomy for Few-shot Action Recognition
Zefeng Qian, Xincheng Yao, Yifei Huang et al.
Homogeneous Algorithms Can Reduce Competition in Personalized Pricing
Nathanael Jo, Ashia Wilson, Kathleen Creel et al.
TIDMAD: Time Series Dataset for Discovering Dark Matter with AI Denoising
Jessica Fry, Xinyi Fu, Zhenghao Fu et al.
Revisiting Generative Infrared and Visible Image Fusion Based on Human Cognitive Laws
Lin Guo, Xiaoqing Luo, Wei Xie et al.
MamTiff-CAD: Multi-Scale Latent Diffusion with Mamba+ for Complex Parametric Sequence
Liyuan Deng, Yunpeng Bai, Yongkang Dai et al.
Local Scale Equivariance with Latent Deep Equilibrium Canonicalizer
Md Ashiqur Rahman, Chiao-An Yang, Michael N Cheng et al.
RigAnyFace: Scaling Neural Facial Mesh Auto-Rigging with Unlabeled Data
Wenchao Ma, Dario Kneubuehler, Maurice Chu et al.
EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models
Yufei Cai, Hu Han, Yuxiang Wei et al.
Moment- and Power-Spectrum-Based Gaussianity Regularization for Text-to-Image Models
Jisung Hwang, Jaihoon Kim, Minhyuk Sung
Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration
Ran Xu, Wenqi Shi, Yuchen Zhuang et al.
Deep Adaptive Unfolded Network via Spatial Morphology Stripping and Spectral Filtration for Pan-sharpening
Hebaixu Wang, Jiayi Ma
Reference-based Super-Resolution via Image-based Retrieval-Augmented Generation Diffusion
Byeonghun Lee, Hyunmin Cho, Honggyu Choi et al.
Vulnerability-Aware Spatio-Temporal Learning for Generalizable Deepfake Video Detection
Dat Nguyen, Marcella Astrid, Anis Kacem et al.
Multi-modal Identity Extraction
Ryan Webster, Teddy Furon
Understanding Differential Transformer Unchains Pretrained Self-Attentions
Chaerin Kong, Jiho Jang, Nojun Kwak
CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games
Peng Chen, Pi Bu, Yingyao Wang et al.
Blind Noisy Image Deblurring Using Residual Guidance Strategy
Heyan Liu, Jianing Sun, Jun Liu et al.
Drawing Developmental Trajectory from Cortical Surface Reconstruction
Wenxuan Wu, Ruowen Qu, Zhongliang Liu et al.
ActiveVOO: Value of Observation Guided Active Knowledge Acquisition for Open-World Embodied Lifted Regression Planning
Xiaotian Liu, Ali Pesaranghader, Jaehong Kim et al.
Less is More: Improving Motion Diffusion Models with Sparse Keyframes
Jinseok Bae, Inwoo Hwang, Young-Yoon Lee et al.
DGTalker: Disentangled Generative Latent Space Learning for Audio-Driven Gaussian Talking Heads
Xiaoxi Liang, Yanbo Fan, Qiya Yang et al.
Sample Efficient Preference Alignment in LLMs via Active Exploration
Viraj Mehta, Syrine Belakaria, Vikramjeet Das et al.
Causal-Entity Reflected Egocentric Traffic Accident Video Synthesis
Lei-lei Li, Jianwu Fang, Junbin Xiao et al.
Riemannian-Geometric Fingerprints of Generative Models
Hae Jin Song, Laurent Itti
G-DexGrasp: Generalizable Dexterous Grasping Synthesis Via Part-Aware Prior Retrieval and Prior-Assisted Generation
Juntao Jian, Xiuping Liu, Zixuan Chen et al.
Fast Projection-Free Approach (without Optimization Oracle) for Optimization over Compact Convex Set
Chenghao Liu, Enming Liang, Minghua Chen
ISP2HRNet: Learning to Reconstruct High Resolution Image from Irregularly Sampled Pixels via Hierarchical Gradient Learning
Yuanlin Wang, Ruiqin Xiong, Rui Zhao et al.
Learning to Factorize Spatio-Temporal Foundation Models
Siru Zhong, Junjie Qiu, Yangyu Wu et al.
Event-Driven Storytelling with Multiple Lifelike Humans in a 3D Scene
Donggeun Lim, Jinseok Bae, Inwoo Hwang et al.
Robust and Scalable Autonomous Reinforcement Learning in Irreversible Environments
Sang-Hyun Lee
Disentangling misreporting from genuine adaptation in strategic settings: a causal approach
Dylan Zapzalka, Trenton Chang, Lindsay Warrenburg et al.
Fast Image Super-Resolution via Consistency Rectified Flow
Jiaqi Xu, Wenbo Li, Haoze Sun et al.
Event-guided HDR Reconstruction with Diffusion Priors
Yixin Yang, Jiawei Zhang, Yang Zhang et al.
AffordDexGrasp: Open-set Language-guided Dexterous Grasp with Generalizable-Instructive Affordance
Yilin Wei, Mu Lin, Yuhao Lin et al.
Robust Adverse Weather Removal via Spectral-based Spatial Grouping
Yuhwan Jeong, Yunseo Yang, Youngho Yoon et al.
Hipandas: Hyperspectral Image Joint Denoising and Super-Resolution by Image Fusion with the Panchromatic Image
Shuang Xu, Zixiang Zhao, Haowen Bai et al.
Revolutionizing Graph Aggregation: From Suppression to Amplification via BoostGCN
Jiaxin Wu, Chenglong Pang, Guangxiong Chen et al.
VA-GS: Enhancing the Geometric Representation of Gaussian Splatting via View Alignment
Qing Li, Huifang Feng, Xun Gong et al.
Scaling Law with Learning Rate Annealing
Howe Tissue, Venus Wang, Lu Wang
RODS: Robust Optimization Inspired Diffusion Sampling for Detecting and Reducing Hallucination in Generative Models
Yiqi Tian, Pengfei Jin, Mingze Yuan et al.
A*-Thought: Efficient Reasoning via Bidirectional Compression for Low-Resource Settings
Xiaoang Xu, Shuo Wang, Xu Han et al.
VideoSetDiff: Identifying and Reasoning Similarities and Differences in Similar Videos
Yue Qiu, Yanjun Sun, Takuma Yagi et al.
HADES: Human Avatar with Dynamic Explicit Hair Strands
Zhanfeng Liao, Hanzhang Tu, Cheng Peng et al.
DreamRelation: Relation-Centric Video Customization
Yujie Wei, Shiwei Zhang, Hangjie Yuan et al.
A Learning-Augmented Dynamic Programming Approach for Orienteering Problem with Time Windows
Guansheng Peng, Lining Xing, Fuyan Ma et al.
FoundIR: Unleashing Million-scale Training Data to Advance Foundation Models for Image Restoration
Hao Li, Xiang Chen, Jiangxin Dong et al.
Highlight What You Want: Weakly-Supervised Instance-Level Controllable Infrared-Visible Image Fusion
Zeyu Wang, Jizheng Zhang, Haiyu Song et al.
FaceLift: Learning Generalizable Single Image 3D Face Reconstruction from Synthetic Heads
Weijie Lyu, Yi Zhou, Ming-Hsuan Yang et al.
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images
Boyang Deng, Kyle Genova, Songyou Peng et al.
Blind2Sound: Self-Supervised Image Denoising without Residual Noise
Jiazheng Liu, Zejin Wang, Bohao Chen et al.
IMoRe: Implicit Program-Guided Reasoning for Human Motion Q&A
Chen Li, Chinthani Sugandhika, Ee Yeo Keat et al.
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources
Apoorv Khandelwal, Tian Yun, Nihal V. Nayak et al.
MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation
Sungwoo Cho, Jeongsoo Choi, Sungnyun Kim et al.
Privacy-centric Deep Motion Retargeting for Anonymization of Skeleton-Based Motion Visualization
Thomas Carr, Depeng Xu, Shuhan Yuan et al.
UniPhys: Unified Planner and Controller with Diffusion for Flexible Physics-Based Character Control
Yan Wu, Korrawe Karunratanakul, Zhengyi Luo et al.
UniRes: Universal Image Restoration for Complex Degradations
Mo Zhou, Keren Ye, Mauricio Delbracio et al.
SV4D 2.0: Enhancing Spatio-Temporal Consistency in Multi-View Video Diffusion for High-Quality 4D Generation
Chun-Han Yao, Yiming Xie, Vikram Voleti et al.
Light-A-Video: Training-free Video Relighting via Progressive Light Fusion
Yujie Zhou, Jiazi Bu, Pengyang Ling et al.
Go to Zero: Towards Zero-shot Motion Generation with Million-scale Data
Ke Fan, Shunlin Lu, Minyue Dai et al.
Graph Few-Shot Learning via Adaptive Spectrum Experts and Cross-Set Distribution Calibration
Yonghao Liu, Yajun Wang, Chunli Guo et al.
Re-coding for Uncertainties: Edge-awareness Semantic Concordance for Resilient Event-RGB Segmentation
Nan Bao, Yifan Zhao, Lin Zhu et al.
Group-wise Scaling and Orthogonal Decomposition for Domain-Invariant Feature Extraction in Face Anti-Spoofing
Seungjin Jung, Kanghee Lee, Yonghyun Jeong et al.
LLM Unlearning Without an Expert Curated Dataset
Xiaoyuan Zhu, Muru Zhang, Ollie Liu et al.
RRO: LLM Agent Optimization Through Rising Reward Trajectories
Zilong Wang, Jingfeng Yang, Sreyashi Nag et al.
DynamicFace: High-Quality and Consistent Face Swapping for Image and Video using Composable 3D Facial Priors
Runqi Wang, Yang Chen, Sijie Xu et al.
DisenQ: Disentangling Q-Former for Activity-Biometrics
Shehreen Azad, Yogesh Rawat
Finding Low-Rank Matrix Weights in DNNs via Riemannian Optimization: RAdaGrad and RAdamW
Fengmiao Bian, Jinyang Zheng, Ziyun Liu et al.
Online Bilateral Trade With Minimal Feedback: Don’t Waste Seller’s Time
Francesco Bacchiocchi, Matteo Castiglioni, Roberto Colomboni et al.
ProtoPairNet: Interpretable Regression through Prototypical Pair Reasoning
Rose Gurung, Ronilo Ragodos, Chiyu Ma et al.
Revisiting Frank-Wolfe for Structured Nonconvex Optimization
Hoomaan Maskan, Yikun Hou, Suvrit Sra et al.
T2Bs: Text-to-Character Blendshapes via Video Generation
Jiahao Luo, Chaoyang Wang, Michael Vasilkovsky et al.
LOMM: Latest Object Memory Management for Temporally Consistent Video Instance Segmentation
Seunghun Lee, Jiwan Seo, Minwoo Choi et al.
Chirality in Action: Time-Aware Video Representation Learning by Latent Straightening
Piyush Nitin Bagad, Andrew Zisserman
Quasi-Self-Concordant Optimization with $\ell_{\infty}$ Lewis Weights
Alina Ene, Ta Duy Nguyen, Adrian Vladu
MeshAnything V2: Artist-Created Mesh Generation with Adjacent Mesh Tokenization
Yiwen Chen, Yikai Wang, Yihao Luo et al.
π-AVAS: Can Physics-Integrated Audio-Visual Modeling Boost Neural Acoustic Synthesis?
Susan Liang, Chao Huang, Yolo Yunlong Tang et al.
SemGes: Semantics-aware Co-Speech Gesture Generation using Semantic Coherence and Relevance Learning
Lanmiao Liu, Esam Ghaleb, Asli Ozyurek et al.
I2VControl: Disentangled and Unified Video Motion Synthesis Control
Wanquan Feng, Tianhao Qi, Jiawei Liu et al.
MeshLLM: Empowering Large Language Models to Progressively Understand and Generate 3D Mesh
Shuangkang Fang, I-Chao Shen, Yufeng Wang et al.
LUT-Fuse: Towards Extremely Fast Infrared and Visible Image Fusion via Distillation to Learnable Look-Up Tables
Xunpeng Yi, Yibing Zhang, Xinyu Xiang et al.
MixANT: Observation-dependent Memory Propagation for Stochastic Dense Action Anticipation
Syed Talal Wasim, Hamid Suleman, Olga Zatsarynna et al.
Imagine All The Relevance: Scenario-Profiled Indexing with Knowledge Expansion for Dense Retrieval
Sangam Lee, Ryang Heo, SeongKu Kang et al.
FLOAT: Generative Motion Latent Flow Matching for Audio-driven Talking Portrait
Taekyung Ki, Dongchan Min, Gyeongsu Chae
M²IV: Towards Efficient and Fine-grained Multimodal In-Context Learning via Representation Engineering
Yanshu Li, Yi Cao, Hongyang He et al.
2HandedAfforder: Learning Precise Actionable Bimanual Affordances from Human Videos
Marvin Heidinger, Snehal Jauhri, Vignesh Prasad et al.
Language models align with brain regions that represent concepts across modalities
Maria Ryskina, Greta Tuckute, Alexander Fung et al.
Mixed-Sample SGD: an End-to-end Analysis of Supervised Transfer Learning
Yuyang Deng, Samory Kpotufe
RayZer: A Self-supervised Large View Synthesis Model
Hanwen Jiang, Hao Tan, Peng Wang et al.
MatchDiffusion: Training-free Generation of Match-Cuts
Alejandro Pardo, Fabio Pizzati, Tong Zhang et al.
Scalable Dual Fingerprinting for Hierarchical Attribution of Text-to-Image Models
Jianwei Fei, Yunshu Dai, Peipeng Yu et al.
QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation
Junyi Wu, Zhiteng Li, Zheng Hui et al.
Tree-NeRV: Efficient Non-Uniform Sampling for Neural Video Representation via Tree-Structured Feature Grids
Jiancheng Zhao, Yifan Zhan, Qingtian Zhu et al.
MaTe: Images Are All You Need for Material Transfer via Diffusion Transformer
Nisha Huang, Henglin Liu, Yizhou Lin et al.
Adaptive Caching for Faster Video Generation with Diffusion Transformers
Kumara Kahatapitiya, Haozhe Liu, Sen He et al.
FlowChef: Steering of Rectified Flow Models for Controlled Generations
Maitreya Patel, Song Wen, Dimitris Metaxas et al.
SynTag: Enhancing the Geometric Robustness of Inversion-based Generative Image Watermarking
Han Fang, Kejiang Chen, Zehua Ma et al.
Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning Code LLMs
Zichao Hu, Junyi Jessy Li, Arjun Guha et al.
WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation
Zhongyu Yang, Jun Chen, Dannong Xu et al.
QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning
Haoxuan Wang, Yuzhang Shang, Zhihang Yuan et al.
ALOPE: Adaptive Layer Optimization for Translation Quality Estimation using Large Language Models
Archchana Sindhujan, Shenbin Qian, Chan Chi Chun Matthew et al.
Split-and-Combine: Enhancing Style Augmentation for Single Domain Generalization
Zhen Zhang, Zhen Zhang, Qianlong Dang et al.
Fast Non-Log-Concave Sampling under Nonconvex Equality and Inequality Constraints with Landing
Kijung Jeon, Michael Muehlebach, Molei Tao
Fractional Langevin Dynamics for Combinatorial Optimization via Polynomial-Time Escape
Shiyue Wang, Ziao Guo, Changhong Lu et al.
Zero-Shot Depth Aware Image Editing with Diffusion Models
Rishubh Parihar, Sachidanand VS, Venkatesh Babu Radhakrishnan
Pose-Star: Anatomy-Aware Editing for Open-World Fashion Images
Yuran Dong, Mang Ye
Who Controls the Authorization? Invertible Networks for Copyright Protection in Text-to-Image Synthesis
Baoyue Hu, Yang Wei, Junhao Xiao et al.
Retrieval-Augmented Generation with Conflicting Evidence
Han Wang, Archiki Prasad, Elias Stengel-Eskin et al.
FontAnimate: High Quality Few-shot Font Generation via Animating Font Transfer Process
Bin Fu, Zixuan Wang, Kainan Yan et al.
CASCADE Your Datasets for Cross-Mode Knowledge Retrieval of Language Models
Runlong Zhou, Yi Zhang
LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation
Jiahao Wang, Ning Kang, Lewei Yao et al.
Sherkala-Chat: Building a State-of-the-Art LLM for Kazakh in a Moderately Resourced Setting
Fajri Koto, Rituraj Joshi, Nurdaulet Mukhituly et al.
Pre-Trained Policy Discriminators are General Reward Models
Shihan Dou, Shichun Liu, Yuming Yang et al.
TextMaster: A Unified Framework for Realistic Text Editing via Glyph-Style Dual-Control
Zhenyu Yan, Jian Wang, Aoqiang Wang et al.
MCID: Multi-aspect Copyright Infringement Detection for Generated Images
Chuanwei Huang, Zexi Jia, Hongyan Fei et al.
Text2Outfit: Controllable Outfit Generation with Multimodal Language Models
Yuanhao Zhai, Yen-Liang Lin, Minxu Peng et al.
Advancing Language Multi-Agent Learning with Credit Re-Assignment for Interactive Environment Generalization
Zhitao He, Zijun Liu, Peng Li et al.
Self-Steering Language Models
Gabriel Grand, Joshua B. Tenenbaum, Vikash Mansinghka et al.
Universal Few-shot Spatial Control for Diffusion Models
Kiet Nguyen, Chanhyuk Lee, Donggyun Kim et al.
DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models
Revant Teotia, Candace Ross, Karen Ullrich et al.
Cross-Granularity Online Optimization with Masked Compensated Information for Learned Image Compression
Haowei Kuang, Wenhan Yang, Zongming Guo et al.
Layer as Puzzle Pieces: Compressing Large Language Models through Layer Concatenation
Fei Wang, Li Shen, Liang Ding et al.
MOF-BFN: Metal-Organic Frameworks Structure Prediction via Bayesian Flow Networks
Rui Jiao, Hanlin Wu, Wenbing Huang et al.
TeEFusion: Blending Text Embeddings to Distill Classifier-Free Guidance
Minghao Fu, Guo-Hua Wang, Xiaohao Chen et al.
CompSlider: Compositional Slider for Disentangled Multiple-Attribute Image Generation
Zixin Zhu, Kevin Duarte, Mamshad Nayeem Rizve et al.
Predicting Functional Brain Connectivity with Context-Aware Deep Neural Networks
Alexander Ratzan, Sidharth Goel, Junhao Wen et al.
PLA: Prompt Learning Attack against Text-to-Image Generative Models
Xinqi Lyu, Yihao Liu, Yanjie Li et al.
Holistic Tokenizer for Autoregressive Image Generation
Anlin Zheng, Haochen Wang, Yucheng Zhao et al.
DanceEditor: Towards Iterative Editable Music-driven Dance Generation with Open-Vocabulary Descriptions
Hengyuan Zhang, Zhe Li, Xingqun Qi et al.
Toward Better Out-painting: Improving the Image Composition with Initialization Policy Model
Xuan Han, Yihao Zhao, Yanhao Ge et al.
Versatile Transition Generation with Image-to-Video Diffusion
Zuhao Yang, Jiahui Zhang, Yingchen Yu et al.
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning
Shengbang Tong, David Fan, Jiachen Zhu et al.
DiffIP: Representation Fingerprints for Robust IP Protection of Diffusion Models
Zhuoling Li, Haoxuan Qu, Jason Kuen et al.
Processing and acquisition traces in visual encoders: What does CLIP know about your camera?
Ryan Ramos, Vladan Stojnić, Giorgos Kordopatis-Zilos et al.
AM-Adapter: Appearance Matching Adapter for Exemplar-based Semantic Image Synthesis in-the-Wild
Siyoon Jin, Jisu Nam, Jiyoung Kim et al.
Diffusion Epistemic Uncertainty with Asymmetric Learning for Diffusion-Generated Image Detection
Yingsong Huang, Hui Guo, Jing Huang et al.
Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models
Hyungjin Kim, Seokho Ahn, Young-Duk Seo
KINDLE: Knowledge-Guided Distillation for Prior-Free Gene Regulatory Network Inference
Rui Peng, Yuchen Lu, Qichen Sun et al.
V.I.P.: Iterative Online Preference Distillation for Efficient Video Diffusion Models
Jisoo Kim, Wooseok Seo, Junwan Kim et al.
X-Prompt: Generalizable Auto-Regressive Visual Learning with In-Context Prompting
Zeyi Sun, Ziyang Chu, Pan Zhang et al.
AnyI2V: Animating Any Conditional Image with Motion Control
Ziye Li, Xincheng Shuai, Hao Luo et al.
Diffusion Models Meet Contextual Bandits
Imad Aouali
Transfer Learning on Edge Connecting Probability Estimation Under Graphon Model
Yuyao Wang, Yu-Hung Cheng, Debarghya Mukherjee et al.
EEdit: Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing
Zexuan Yan, Yue Ma, Chang Zou et al.
RAGDiffusion: Faithful Cloth Generation via External Knowledge Assimilation
Yuhan Li, Xianfeng Tan, Wenxiang Shang et al.
Instruction-based Image Editing with Planning, Reasoning, and Generation
Liya Ji, Chenyang Qi, Qifeng Chen
HDR Image Generation via Gain Map Decomposed Diffusion
Yuanshen Guan, Ruikang Xu, Yinuo Liao et al.
ESSENTIAL: Episodic and Semantic Memory Integration for Video Class-Incremental Learning
Jongseo Lee, Kyungho Bae, Kyle Min et al.
Accelerating Diffusion Transformer via Gradient-Optimized Cache
Junxiang Qiu, Lin Liu, Shuo Wang et al.
The Silent Assistant: NoiseQuery as Implicit Guidance for Goal-Driven Image Generation
Ruoyu Wang, Huayang Huang, Ye Zhu et al.
Progressive Growing of Video Tokenizers for Temporally Compact Latent Spaces
Aniruddha Mahapatra, Long Mai, David Bourgin et al.
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs
Yunqiu Xu, Linchao Zhu, Yi Yang
Planning and Learning in Average Risk-aware MDPs
Weikai Wang, Erick Delage
HyTIP: Hybrid Temporal Information Propagation for Masked Conditional Residual Video Coding
Yi-Hsin Chen, Yi-Chen Yao, Kuan-Wei Ho et al.
DACoN: DINO for Anime Paint Bucket Colorization with Any Number of Reference Images
Kazuma Nagata, Naoshi Kaneko
Vertical Federated Feature Screening
Huajun Yin, Liyuan Wang, Yingqiu Zhu et al.
Parametric Shadow Control for Portrait Generation in Text-to-Image Diffusion Models
Haoming Cai, Tsung-Wei Huang, Shiv Gehlot et al.
UniversalBooth: Model-Agnostic Personalized Text-to-Image Generation
Songhua Liu, Ruonan Yu, Xinchao Wang
Uni-RL: Unifying Online and Offline RL via Implicit Value Regularization
Haoran Xu, Liyuan Mao, Hui Jin et al.
Tight Bounds for Maximum Weight Matroid Independent Set and Matching in the Zero Communication Model
Ilan Doron-Arad
CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching
Zizhuo Li, Yifan Lu, Linfeng Tang et al.
Hyper-Modality Enhancement for Multimodal Sentiment Analysis with Missing Modalities
Yan Zhuang, Minhao Liu, Wei Bai et al.
LoMix: Learnable Weighted Multi-Scale Logits Mixing for Medical Image Segmentation
Md Mostafijur Rahman, Radu Marculescu
LOCATEdit: Graph Laplacian Optimized Cross Attention for Localized Text-Guided Image Editing
Achint Soni, Meet Soni, Sirisha Rambhatla
Style over Substance: Distilled Language Models Reason Via Stylistic Replication
Philip Lippmann, Jie Yang
FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning
Hang Guo, Yawei Li, Taolin Zhang et al.
Beyond Isolated Words: Diffusion Brush for Handwritten Text-Line Generation
Gang Dai, Yifan Zhang, Yutao Qin et al.
BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation
Ruotong Wang, Mingli Zhu, Jiarong Ou et al.
Spectral Analysis of Representational Similarity with Limited Neurons
Hyunmo Kang, Abdulkadir Canatar, SueYeon Chung
SmolVLM: Redefining small and efficient multimodal models
Andrés Marafioti, Orr Zohar, Miquel Farré et al.
Hybrid Autoencoders for Tabular Data: Leveraging Model-Based Augmentation in Low-Label Settings
Erel Naor, Ofir Lindenbaum
Tracing Copied Pixels and Regularizing Patch Affinity in Copy Detection
Yichen Lu, Siwei Nie, Minlong Lu et al.
PixTalk: Controlling Photorealistic Image Processing and Editing with Language
Marcos Conde, Zihao Lu, Radu Timofte
A Unified Framework for Industrial Cel-Animation Colorization with Temporal-Structural Awareness
Xiaoyi Feng, Tao Huang, Peng Wang et al.
T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation
Chieh-Yun Chen, Min Shi, Gong Zhang et al.
LayerLock: Non-collapsing Representation Learning with Progressive Freezing
Goker Erdogan, Nikhil Parthasarathy, Catalin Ionescu et al.