Most Cited 2025 "molecular large language models" Papers
22,274 papers found • Page 40 of 112
Conference
Towards Realistic Semi-supervised Medical Image Classification
Wenxue Li, Lie Ju, Feilong Tang et al.
ORSO: Accelerating Reward Design via Online Reward Selection and Policy Optimization
Chen Bo Calvin Zhang, Zhang-Wei Hong, Aldo Pacchiano et al.
Text and Image Are Mutually Beneficial: Enhancing Training-Free Few-Shot Classification with CLIP
Yayuan Li, Jintao Guo, Lei Qi et al.
RoDA: Robust Domain Alignment for Cross-Domain Retrieval Against Label Noise
Ziniu Yin, Yanglin Feng, Ming Yan et al.
Self-Supervised Diffusion MRI Denoising via Iterative and Stable Refinement
Chenxu Wu, Qingpeng Kong, Zihang Jiang et al.
Decoupling Appearance Variations with 3D Consistent Features in Gaussian Splatting
Jiaqi Lin, Zhihao Li, Binxiao Huang et al.
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language
Jeong Hun Yeo, Chae Won Kim, Hyunjun Kim et al.
Optimized Gradient Clipping for Noisy Label Learning
Xichen Ye, Yifan Wu, Weizhong Zhang et al.
DPLUT: Unsupervised Low-light Image Enhancement with Lookup Tables and Diffusion Priors
Yunlong Lin, Zhenqi Fu, Kairun Wen et al.
Lightweight Predictive 3D Gaussian Splats
Junli Cao, Vidit Goel, Chaoyang Wang et al.
LICORICE: Label-Efficient Concept-Based Interpretable Reinforcement Learning
Zhuorui Ye, Stephanie Milani, Geoff Gordon et al.
CLIP-PCQA: Exploring Subjective-Aligned Vision-Language Modeling for Point Cloud Quality Assessment
Yating Liu, Yujie Zhang, Ziyu Shan et al.
SCOPE: Sign Language Contextual Processing with Embedding from LLMs
Yuqi Liu, Wenqian Zhang, Sihan Ren et al.
Anchor Learning with Potential Cluster Constraints for Multi-view Clustering
Yawei Chen, Huibing Wang, Jinjia Peng et al.
Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection
Yuhang Ma, Wenting Xu, Chaoyi Zhao et al.
Minimalist Concept Erasure in Generative Models
Yang Zhang, Er Jin, Yanfei Dong et al.
Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization
Jiaxin Deng, Junbiao Pang, Baochang Zhang et al.
EvdCLIP: Improving Vision-Language Retrieval with Entity Visual Descriptions from Large Language Models
GuangHao Meng, Sunan He, Jinpeng Wang et al.
MotionDreamer: One-to-Many Motion Synthesis with Localized Generative Masked Transformer
Yilin Wang, chuan guo, Yuxuan Mu et al.
CLIP-driven View-aware Prompt Learning for Unsupervised Vehicle Re-identification
Jiyang Xu, Qi Wang, Xin Xiong et al.
Modular-Cam: Modular Dynamic Camera-view Video Generation with LLM
Zirui Pan, Xin Wang, Yipeng Zhang et al.
Foreground-Covering Prototype Generation and Matching for SAM-Aided Few-Shot Segmentation
Suho Park, SuBeen Lee, Hyun Seok Seong et al.
Causal Information Prioritization for Efficient Reinforcement Learning
Hongye Cao, Fan Feng, Tianpei Yang et al.
TGBFormer: Transformer-GraphFormer Blender Network for Video Object Detection
Qiang Qi, Xiao Wang
Accelerating Training with Neuron Interaction and Nowcasting Networks
Boris Knyazev, Abhinav Moudgil, Guillaume Lajoie et al.
PointTalk: Audio-Driven Dynamic Lip Point Cloud for 3D Gaussian-based Talking Head Synthesis
Yifan Xie, Tao Feng, Xin Zhang et al.
NoVo: Norm Voting off Hallucinations with Attention Heads in Large Language Models
Zhengyi Ho, Siyuan Liang, Sen Zhang et al.
TIV-Diffusion: Towards Object-Centric Movement for Text-driven Image to Video Generation
Xingrui Wang, Xin Li, Yaosi Hu et al.
Topology-Aware 3D Gaussian Splatting: Leveraging Persistent Homology for Optimized Structural Integrity
Tianqi Shen, Shaohua Liu, Jiaqi Feng et al.
SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation
Pengfei Chen, Lingxi Xie, xinyue huo et al.
A Comprehensive Framework for Analyzing the Convergence of Adam: Bridging the Gap with SGD
Ruinan Jin, Xiao Li, Yaoliang Yu et al.
Flavors of Margin: Implicit Bias of Steepest Descent in Homogeneous Neural Networks
Nikolaos Tsilivis, Gal Vardi, Julia Kempe
Multi-Label Test-Time Adaptation with Bound Entropy Minimization
Xiangyu Wu, Feng Yu, Yang Yang et al.
Hierarchical Refinement: Optimal Transport to Infinity and Beyond
Peter Halmos, Julian Gold, Xinhao Liu et al.
Exploring Activation Patterns of Parameters in Language Models
Yudong Wang, Damai Dai, Zhe Yang et al.
Progressive Compression with Universally Quantized Diffusion Models
Yibo Yang, Justus Will, Stephan Mandt
Noise-Resilient Symbolic Regression with Dynamic Gating Reinforcement Learning
Chenglu Sun, Shuo Shen, Wenzhi Tao et al.
FRUGAL: Memory-Efficient Optimization by Reducing State Overhead for Scalable Training
Filipp Zmushko, Aleksandr Beznosikov, Martin Takac et al.
Sample Efficient Demonstration Selection for In-Context Learning
Kiran Purohit, Venktesh V, Sourangshu Bhattacharya et al.
Model Immunization from a Condition Number Perspective
Amber Yijia Zheng, Cedar Site Bai, Brian Bullins et al.
DreamCatalyst: Fast and High-Quality 3D Editing via Controlling Editability and Identity Preservation
Jiwook Kim, Seonho Lee, Jaeyo Shin et al.
Harmonizing Visual and Textual Embeddings for Zero-Shot Text-to-Image Customization
Yeji Song, Jimyeong Kim, Wonhark Park et al.
OpenHOI: Open-World Hand-Object Interaction Synthesis with Multimodal Large Language Model
Zhenhao Zhang, Ye Shi, Lingxiao Yang et al.
Score-based Pullback Riemannian Geometry: Extracting the Data Manifold Geometry using Anisotropic Flows
Willem Diepeveen, Georgios Batzolis, Zakhar Shumaylov et al.
GRADEO: Towards Human-Like Evaluation for Text-to-Video Generation via Multi-Step Reasoning
Zhun Mou, Bin Xia, Zhengchao Huang et al.
SteerX: Creating Any Camera-Free 3D and 4D Scenes with Geometric Steering
Byeongjun Park, Hyojun Go, Hyelin Nam et al.
VideoComp: Advancing Fine-Grained Compositional and Temporal Alignment in Video-Text Models
Dahun Kim, AJ Piergiovanni, Ganesh Satish Mallya et al.
EdgeDiff: Edge-aware Diffusion Network for Building Reconstruction from Point Clouds
Yujun Liu, Ruisheng Wang, Shangfeng Huang et al.
Integrating Task-Specific and Universal Adapters for Pre-Trained Model-based Class-Incremental Learning
yan wang, Da-Wei Zhou, Han-Jia Ye
ChartSketcher: Reasoning with Multimodal Feedback and Reflection for Chart Understanding
Muye Huang, Lingling Zhang, Jie Ma et al.
Physics-Driven Spatiotemporal Modeling for AI-Generated Video Detection
Shuhai Zhang, ZiHao Lian, Jiahao Yang et al.
Fractal Calibration for Long-tailed Object Detection
Konstantinos Alexandridis, Ismail Elezi, Jiankang Deng et al.
SEGS-SLAM: Structure-enhanced 3D Gaussian Splatting SLAM with Appearance Embedding
Tianci Wen, Zhiang Liu, Yongchun Fang
PTDiffusion: Free Lunch for Generating Optical Illusion Hidden Pictures with Phase-Transferred Diffusion Model
Xiang Gao, Shuai Yang, Jiaying Liu
LuxDiT: Lighting Estimation with Video Diffusion Transformer
Ruofan Liang, Kai He, Zan Gojcic et al.
DualReal: Adaptive Joint Training for Lossless Identity-Motion Fusion in Video Customization
Wenchuan Wang, Mengqi Huang, Yijing Tu et al.
LongAnimation: Long Animation Generation with Dynamic Global-Local Memory
Nan Chen, Mengqi Huang, Yihao Meng et al.
Fine-grained Spatiotemporal Grounding on Egocentric Videos
Shuo LIANG, Yiwu Zhong, Zi-Yuan Hu et al.
StochasticSplats: Stochastic Rasterization for Sorting-Free 3D Gaussian Splatting
Shakiba Kheradmand, Delio Vicini, George Kopanas et al.
Towards Understanding and Quantifying Uncertainty for Text-to-Image Generation
Gianni Franchi, Nacim Belkhir, Dat NGUYEN et al.
Do different prompting methods yield a common task representation in language models?
Guy Davidson, Todd Gureckis, Brenden Lake et al.
Treatment Effect Estimation for Optimal Decision-Making
Dennis Frauen, Valentyn Melnychuk, Jonas Schweisthal et al.
Dense2MoE: Restructuring Diffusion Transformer to MoE for Efficient Text-to-Image Generation
Youwei Zheng, Yuxi Ren, Xin Xia et al.
Boosting Adversarial Transferability through Augmentation in Hypothesis Space
Yu Guo, Weiquan Liu, Qingshan Xu et al.
Triplets Better Than Pairs: Towards Stable and Effective Self-Play Fine-Tuning for LLMs
Yibo Wang, Hai-Long Sun, Guangda Huzhang et al.
VideoVLA: Video Generators Can Be Generalizable Robot Manipulators
Yichao Shen, Fangyun Wei, Zhiying Du et al.
Frequency-Aware Autoregressive Modeling for Efficient High-Resolution Image Synthesis
Zhuokun Chen, Jugang Fan, Zhuowei Yu et al.
UHD-processer: Unified UHD Image Restoration with Progressive Frequency Learning and Degradation-aware Prompts
Yidi Liu, Dong Li, Xueyang Fu et al.
SignRep: Enhancing Self-Supervised Sign Representations
Ryan Wong, Necati Cihan Camgoz, Richard Bowden
CaO2: Rectifying Inconsistencies in Diffusion-Based Dataset Distillation
Haoxuan Wang, Zhenghao Zhao, Junyi Wu et al.
InterGSEdit: Interactive 3D Gaussian Splatting Editing with 3D Geometry-Consistent Attention Prior
Minghao Wen, Shengjie Wu, Kangkan Wang et al.
Boost 3D Reconstruction using Diffusion-based Monocular Camera Calibration
Junyuan Deng, Wei Yin, Xiaoyang Guo et al.
MikuDance: Animating Character Art with Mixed Motion Dynamics
Jiaxu Zhang, Xianfang Zeng, Xin Chen et al.
From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport
Quentin Bouniot, Ievgen Redko, Anton Mallasto et al.
MetaBox-v2: A Unified Benchmark Platform for Meta-Black-Box Optimization
Zeyuan Ma, Yue-Jiao Gong, Hongshu Guo et al.
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Chenxin Tao, Shiqian Su, Xizhou Zhu et al.
MultiverSeg: Scalable Interactive Segmentation of Biomedical Imaging Datasets with In-Context Guidance
Hallee Wong, Jose Javier Gonzalez Ortiz, John Guttag et al.
HeMoRa: Unsupervised Heuristic Consensus Sampling for Robust Point Cloud Registration
Shaocheng Yan, Yiming Wang, Kaiyan Zhao et al.
Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection
Gensheng Pei, Tao Chen, Yujia Wang et al.
MARBLE: Material Recomposition and Blending in CLIP-Space
Ta-Ying Cheng, Prafull Sharma, Mark Boss et al.
Temporal Unlearnable Examples: Preventing Personal Video Data from Unauthorized Exploitation by Object Tracking
Qiangqiang Wu, Yi Yu, Chenqi Kong et al.
WeGen: A Unified Model for Interactive Multimodal Generation as We Chat
Zhipeng Huang, Shaobin Zhuang, Canmiao Fu et al.
External Knowledge Injection for CLIP-Based Class-Incremental Learning
Da-Wei Zhou, Kai-Wen Li, Jingyi Ning et al.
Improving Multimodal Learning via Imbalanced Learning
Shicai Wei, Chunbo Luo, Yang Luo
DreamLayer: Simultaneous Multi-Layer Generation via Diffusion Model
Junjia Huang, Pengxiang Yan, Jinhang Cai et al.
HERO: Human Reaction Generation from Videos
Chengjun Yu, Wei Zhai, Yuhang Yang et al.
Rethinking Tokenized Graph Transformers for Node Classification
Jinsong Chen, Chenyang Li, Gaichao Li et al.
Multi-party Collaborative Attention Control for Image Customization
Han Yang, Chuanguang Yang, Qiuli Wang et al.
FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation
Tianyun Zhong, Chao Liang, Jianwen Jiang et al.
PerLDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Model
Jinhua Zhang, Hualian Sheng, Sijia Cai et al.
KVQ: Boosting Video Quality Assessment via Saliency-guided Local Perception
Yunpeng Qu, Kun Yuan, Qizhi Xie et al.
Synergistic Prompting for Robust Visual Recognition with Missing Modalities
Zhihui Zhang, Luanyuan Dai, Qika Lin et al.
Bisecle: Binding and Separation in Continual Learning for Video Language Understanding
Yue Tan, Xiaoqian Hu, Hao Xue et al.
Adding Additional Control to One-Step Diffusion with Joint Distribution Matching
Yihong Luo, Tianyang Hu, Yifan Song et al.
Spiking Meets Attention: Efficient Remote Sensing Image Super-Resolution with Attention Spiking Neural Networks
Yi Xiao, Qiangqiang Yuan, Kui Jiang et al.
Streaming VideoLLMs for Real-Time Procedural Video Understanding
Dibyadip Chatterjee, Edoardo Remelli, Yale Song et al.
AutoData: A Multi-Agent System for Open Web Data Collection
Tianyi Ma, Yiyue Qian, Zheyuan Zhang et al.
CoMBO: Conflict Mitigation via Branched Optimization for Class Incremental Segmentation
Kai Fang, Anqi Zhang, Guangyu Gao et al.
Distribution-Aligned Decoding for Efficient LLM Task Adaptation
Senkang Hu, Xudong Han, Jinqi Jiang et al.
DisTime: Distribution-based Time Representation for Video Large Language Models
yingsen zeng, Zepeng Huang, Yujie Zhong et al.
GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts
Minwen Liao, Hao Dong, Xinyi Wang et al.
Learning Visual Generative Priors without Text
Shuailei Ma, Kecheng Zheng, Ying Wei et al.
DC-ControlNet: Decoupling Inter- and Intra-Element Conditions in Image Generation with Diffusion Models
hongji yang, Wencheng Han, Yucheng Zhou et al.
SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts
Gengze Zhou, Yicong Hong, Zun Wang et al.
SocialGesture: Delving into Multi-person Gesture Understanding
Xu Cao, Pranav Virupaksha, Wenqi Jia et al.
Hardware-Rasterized Ray-Based Gaussian Splatting
Samuel Rota Bulò, Lorenzo Porzi, Nemanja Bartolovic et al.
Versatile Transition Generation with Image-to-Video Diffusion
Zuhao Yang, Jiahui Zhang, Yingchen Yu et al.
WaveMamba: Wavelet-Driven Mamba Fusion for RGB-Infrared Object Detection
Haodong Zhu, Wenhao Dong, Linlin Yang et al.
X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation
jian ma, Qirong Peng, Xu Guo et al.
MambaVO: Deep Visual Odometry Based on Sequential Matching Refinement and Training Smoothing
Shuo Wang, Wanting Li, Yongcai Wang et al.
Constrained Diffusers for Safe Planning and Control
Jichen Zhang, Liqun Zhao, Antonis Papachristodoulou et al.
Efficient ANN-Guided Distillation: Aligning Rate-based Features of Spiking Neural Networks through Hybrid Block-wise Replacement
Shu Yang, Chengting Yu, Lei Liu et al.
DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding
Yue Jiang, Jichu Li, Yang Liu et al.
DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer
Yecheng Wu, Han Cai, Junyu Chen et al.
EchoONE: Segmenting Multiple Echocardiography Planes in One Model
Jiongtong Hu, Wei Zhuo, Jun Cheng et al.
Adventurer: Optimizing Vision Mamba Architecture Designs for Efficiency
Feng Wang, Timing Yang, Yaodong Yu et al.
Document Haystacks: Vision-Language Reasoning Over Piles of 1000+ Documents
Jun Chen, Dannong Xu, Junjie Fei et al.
Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization
Xiyue Peng, Hengquan Guo, Jiawei Zhang et al.
Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views
Chong Bao, Xiyu Zhang, Zehao Yu et al.
TriDi: Trilateral Diffusion of 3D Humans, Objects, and Interactions
Ilya A. Petrov, Riccardo Marin, Julian Chibane et al.
OCRT: Boosting Foundation Models in the Open World with Object-Concept-Relation Triad
Luyao Tang, Chaoqi Chen, Yuxuan Yuan et al.
Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks
Debargha Ganguly, Vikash Singh, Sreehari Sankar et al.
E2Former: An Efficient and Equivariant Transformer with Linear-Scaling Tensor Products
Yunyang Li, Lin Huang, Zhihao Ding et al.
Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective
Yang Zhang, Xinran Li, Jianing Ye et al.
Learning single index models via harmonic decomposition
Nirmit Joshi, Hugo Koubbi, Theodor Misiakiewicz et al.
FlexEvent: Towards Flexible Event-Frame Object Detection at Varying Operational Frequencies
Dongyue Lu, Lingdong Kong, Gim Hee Lee et al.
In-Context Learning Strategies Emerge Rationally
Daniel Wurgaft, Ekdeep S Lubana, Core Francisco Park et al.
GUAVA: Generalizable Upper Body 3D Gaussian Avatar
Dongbin Zhang, Yunfei Liu, Lijian Lin et al.
Finding Local Diffusion Schrödinger Bridge using Kolmogorov-Arnold Network
Xingyu Qiu, Mengying Yang, Xinghua Ma et al.
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
Rui Hu, Yuxuan Zhang, Lianghui Zhu et al.
GaSLight: Gaussian Splats for Spatially-Varying Lighting in HDR
Christophe Bolduc, Yannick Hold-Geoffroy, Jean-Francois Lalonde
MotionPRO: Exploring the Role of Pressure in Human MoCap and Beyond
Shenghao Ren, Yi Lu, Jiayi Huang et al.
Adaptive Non-Uniform Timestep Sampling for Accelerating Diffusion Model Training
Myunsoo Kim, Donghyeon Ki, Seong-Woong Shim et al.
Object-aware Sound Source Localization via Audio-Visual Scene Understanding
Sung Jin Um, Dongjin Kim, Sangmin Lee et al.
MoEdit: On Learning Quantity Perception for Multi-object Image Editing
Yanfeng Li, Ka-Hou Chan, Yue Sun et al.
On the Consistency of Video Large Language Models in Temporal Comprehension
Minjoon Jung, Junbin Xiao, Byoung-Tak Zhang et al.
GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains
Chun Wang, Xiaojun Ye, Xiaoran Pan et al.
Spatial Transport Optimization by Repositioning Attention Map for Training-Free Text-to-Image Synthesis
Woojung Han, Yeonkyung Lee, Chanyoung Kim et al.
Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms
Baran Hashemi, Kurt Pasque, Chris Teska et al.
Learning to Integrate Diffusion ODEs by Averaging the Derivatives
Wenze Liu, Xiangyu Yue
Not All Frame Features Are Equal: Video-to-4D Generation via Decoupling Dynamic-Static Features
Liying Yang, Chen Liu, Zhenwei Zhu et al.
SVDC: Consistent Direct Time-of-Flight Video Depth Completion with Frequency Selective Fusion
Xuan Zhu, Jijun Xiang, Xianqi Wang et al.
Native-Resolution Image Synthesis
ZiDong Wang, LEI BAI, Xiangyu Yue et al.
Alligat0R: Pre-Training through Covisibility Segmentation for Relative Camera Pose Regression
Thibaut Loiseau, Guillaume Bourmaud, Vincent Lepetit
Neural Thermodynamics: Entropic Forces in Deep and Universal Representation Learning
Liu Ziyin, Yizhou Xu, Isaac Chuang
ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation
Zirun Guo, Tao Jin
Can Generative Geospatial Diffusion Models Excel as Discriminative Geospatial Foundation Models?
Yuru Jia, Valerio Marsocci, Ziyang Gong et al.
Language Models Can Predict Their Own Behavior
Dhananjay Ashok, Jonathan May
Return of ChebNet: Understanding and Improving an Overlooked GNN on Long Range Tasks
Ali Hariri, Alvaro Arroyo, Alessio Gravina et al.
FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency
Yifei Su, Ning Liu, Dong Chen et al.
Learning Physics From Video: Unsupervised Physical Parameter Estimation for Continuous Dynamical Systems
Alejandro Castañeda Garcia, Jan Warchocki, Jan van Gemert et al.
Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation
Xiaoqi Li, Lingyun Xu, Mingxu Zhang et al.
SDFit: 3D Object Pose and Shape by Fitting a Morphable SDF to a Single Image
Dimitrije Antić, Georgios Paschalidis, Shashank Tripathi et al.
AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward
Haonan Han, Xiangzuo Wu, Huan Liao et al.
EgoM2P: Egocentric Multimodal Multitask Pretraining
Gen Li, Yutong Chen, Yiqian Wu et al.
SimMLM: A Simple Framework for Multi-modal Learning with Missing Modality
Sijie Li, Chen Chen, Jungong Han
AniMo: Species-Aware Model for Text-Driven Animal Motion Generation
Xuan Wang, Kai Ruan, Xing Zhang et al.
Backward Conformal Prediction
Etienne Gauthier, Francis Bach, Michael Jordan
Φ-GAN:Physics-Inspired GAN for Generating SAR Images Under Limited Data
Xidan Zhang, Yihan Zhuang, Qian Guo et al.
FlowDAS: A Stochastic Interpolant-based Framework for Data Assimilation
Siyi Chen, Yixuan Jia, Qing Qu et al.
Mitigating Forgetting in LLM Fine-Tuning via Low-Perplexity Token Learning
Chao-Chung Wu, Zhi Rui Tam, Chieh-Yen Lin et al.
Anomize: Better Open Vocabulary Video Anomaly Detection
Fei Li, Wenxuan Liu, Jingjing Chen et al.
NeurOp-Diff: Continuous Remote Sensing Image Super-Resolution via Neural Operator Diffusion
Zihao Xu, Yuzhi Tang, Bowen Xu et al.
Rethinking Neural Combinatorial Optimization for Vehicle Routing Problems with Different Constraint Tightness Degrees
Fu Luo, Yaoxin Wu, Zhi Zheng et al.
Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction
Yifei Wang, Weimin Bai, colin zhang et al.
NuiScene: Exploring Efficient Generation of Unbounded Outdoor Scenes
Han-Hung Lee, Qinghong Han, Angel Chang
OpenSDI: Spotting Diffusion-Generated Images in the Open World
Yabin Wang, Zhiwu Huang, Xiaopeng Hong
Block-Biased Mamba for Long-Range Sequence Processing
Annan Yu, N. Benjamin Erichson
Do It Yourself: Learning Semantic Correspondence from Pseudo-Labels
Olaf Dünkel, Thomas Wimmer, Christian Theobalt et al.
System Prompt Optimization with Meta-Learning
Yumin Choi, Jinheon Baek, Sung Ju Hwang
Frequency-Guided Posterior Sampling for Diffusion-Based Image Restoration
Darshan Thaker, Abhishek Goyal, Rene Vidal
Scaling Tumor Segmentation: Best Lessons from Real and Synthetic Data
Qi Chen, Xinze Zhou, Chen Liu et al.
Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding
Hanyin Wang, Zhenbang Wu, Gururaj Kolar et al.
Privacy Reasoning in Ambiguous Contexts
Ren Yi, Octavian Suciu, Adrian Gascon et al.
Online Learning of Neural Networks
Amit Daniely, Idan Mehalel, Elchanan Mossel
ANNEXE: Unified Analyzing, Answering, and Pixel Grounding for Egocentric Interaction
YUEJIAO SU, Yi Wang, Qiongyang Hu et al.
4Deform: Neural Surface Deformation for Robust Shape Interpolation
Lu Sang, Zehranaz Canfes, Dongliang Cao et al.
Complementary Advantages: Exploiting Cross-Field Frequency Correlation for NIR-Assisted Image Denoising
Yuchen Wang, Hongyuan Wang, Lizhi Wang et al.
OnlineAnySeg: Online Zero-Shot 3D Segmentation by Visual Foundation Model Guided 2D Mask Merging
Yijie Tang, Jiazhao Zhang, Yuqing Lan et al.
iSegMan: Interactive Segment-and-Manipulate 3D Gaussians
Yian Zhao, Wanshi Xu, Ruochong Zheng et al.
Multi-head Transformers Provably Learn Symbolic Multi-step Reasoning via Gradient Descent
Tong Yang, Yu Huang, Yingbin Liang et al.
Latent-Reframe: Enabling Camera Control for Video Diffusion Models without Training
Zhenghong Zhou, Jie An, Jiebo Luo
Improving Generalization of Neural Combinatorial Optimization for Vehicle Routing Problems via Test-Time Projection Learning
Yuanyao Chen, Rongsheng Chen, Fu Luo et al.
Heterogeneous Skeleton-Based Action Representation Learning
Xiaoyan Ma, jidong kuang, Hongsong Wang et al.
InterDyn: Controllable Interactive Dynamics with Video Diffusion Models
Rick Akkerman, Haiwen Feng, Michael J. Black et al.
Fine-grained List-wise Alignment for Generative Medication Recommendation
Chenxiao Fan, Chongming Gao, Wentao Shi et al.
Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields
Shijie Zhou, Hui Ren, Yijia Weng et al.
DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering
Yihao Wang, Marcus Klasson, Matias Turkulainen et al.
4D Visual Pre-training for Robot Learning
Chengkai Hou, Yanjie Ze, Yankai Fu et al.
AnyI2V: Animating Any Conditional Image with Motion Control
Ziye Li, Xincheng Shuai, Hao Luo et al.
High-Fidelity Relightable Monocular Portrait Animation with Lighting-Controllable Video Diffusion Model
Mingtao Guo, Guanyu Xing, Yanli Liu
InfinityStar: Unified Spacetime AutoRegressive Modeling for Visual Generation
Jinlai Liu, Jian Han, Bin Yan et al.
Learning Heterogeneous Tissues with Mixture of Experts for Gigapixel Whole Slide Images
Junxian Wu, Minheng Chen, Xinyi Ke et al.
Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation
Xin Yan, Yuxuan Cai, Qiuyue Wang et al.
Brain-like Variational Inference
Hadi Vafaii, Dekel Galor, Jacob Yates
VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving
Ruifei Zhang, Wei Zhang, Xiao Tan et al.
A Quality-Guided Mixture of Score-Fusion Experts Framework for Human Recognition
Jie Zhu, Yiyang Su, Minchul Kim et al.
Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement
Yuchen Ren, Zhengyu Zhao, Chenhao Lin et al.
Functional Scaling Laws in Kernel Regression: Loss Dynamics and Learning Rate Schedules
Binghui Li, Fengling Chen, Zixun Huang et al.
Improving Autoregressive Visual Generation with Cluster-Oriented Token Prediction
Teng Hu, Jiangning Zhang, Ran Yi et al.