Most Cited 2025 "diagram analysis" Papers
22,274 papers found • Page 15 of 112
Conference
STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction
Zhimin Liao, Ping Wei, Shuaijia Chen et al.
Diff2Flow: Training Flow Matching Models via Diffusion Model Alignment
Johannes Schusterbauer, Ming Gui, Frank Fundel et al.
Breaking the Data Barrier -- Building GUI Agents Through Task Generalization
Junlei Zhang, Zichen Ding, Chang Ma et al.
Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought'' Control
Hannah Cyberey, David Evans
More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness
Aaron J. Li, Satyapriya Krishna, Hima Lakkaraju
Advancing Language Multi-Agent Learning with Credit Re-Assignment for Interactive Environment Generalization
Zhitao He, Zijun Liu, Peng Li et al.
Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise
Brayan Monroy, Jorge Bacca, Julián Tachella
First SFT, Second RL, Third UPT: Continual Improving Multi-Modal LLM Reasoning via Unsupervised Post-Training
Lai Wei, Yuting Li, Chen Wang et al.
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text
Nikhil Kandpal, Brian Lester, Colin Raffel et al.
Aligning Human Motion Generation with Human Perceptions
Haoru Wang, Wentao Zhu, Luyi Miao et al.
Towards Multiple Character Image Animation Through Enhancing Implicit Decoupling
Jingyun Xue, WANG HongFa, Qi Tian et al.
EqNIO: Subequivariant Neural Inertial Odometry
Royina Karegoudra Jayanth, Yinshuang Xu, Ziyun Wang et al.
ResCLIP: Residual Attention for Training-free Dense Vision-language Inference
Jinhong Deng, Yuhang Yang, Wen Li et al.
A Single Goal is All You Need: Skills and Exploration Emerge from Contrastive RL without Rewards, Demonstrations, or Subgoals
Grace Liu, Michael Tang, Benjamin Eysenbach
Deep MMD Gradient Flow without adversarial training
Alexandre Galashov, Valentin De Bortoli, Arthur Gretton
Fluid Language Model Benchmarking
Valentin Hofmann, David Heineman, Ian Magnusson et al.
MagicColor: Multi-instance Sketch Colorization
yinhan Zhang, Yue Ma, Bingyuan Wang et al.
Gaussian Eigen Models for Human Heads
Wojciech Zielonka, Timo Bolkart, Thabo Beeler et al.
Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance
Sachin Goyal, Christina Baek, Zico Kolter et al.
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program
Minghe Gao, Xuqi Liu, Zhongqi Yue et al.
Breach By A Thousand Leaks: Unsafe Information Leakage in 'Safe' AI Responses
David Glukhov, Ziwen Han, I Shumailov et al.
Rethinking Visual Counterfactual Explanations Through Region Constraint
Bartlomiej Sobieski, Jakub Grzywaczewski, Bartłomiej Sadlej et al.
Multiple Heads are Better than One: Mixture of Modality Knowledge Experts for Entity Representation Learning
Yichi Zhang, Zhuo Chen, Lingbing Guo et al.
The 3D-PC: a benchmark for visual perspective taking in humans and machines
Drew Linsley, Peisen Zhou, Alekh Ashok et al.
Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning of Vision Language Models
Huajie Tan, Yuheng Ji, Xiaoshuai Hao et al.
EDEN: Enhanced Diffusion for High-quality Large-motion Video Frame Interpolation
Zihao Zhang, Haoran Chen, Haoyu Zhao et al.
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
Federico Cocchi, Nicholas Moratelli, Marcella Cornia et al.
MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL
Claas Voelcker, Marcel Hussing, ERIC EATON et al.
Causal Inference over Visual-Semantic-Aligned Graph for Image Classification
Lei Meng, Xiangxian Li, Xiaoshuo Yan et al.
Temporal Separation with Entropy Regularization for Knowledge Distillation in Spiking Neural Networks
Kairong Yu, Chengting Yu, Tianqing Zhang et al.
DocVLM: Make Your VLM an Efficient Reader
Mor Shpigel Nacson, Aviad Aberdam, Roy Ganz et al.
RGBAvatar: Reduced Gaussian Blendshapes for Online Modeling of Head Avatars
Linzhou Li, Yumeng Li, Yanlin Weng et al.
V2X-R: Cooperative LiDAR-4D Radar Fusion with Denoising Diffusion for 3D Object Detection
Xun Huang, Jinlong Wang, Qiming Xia et al.
Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting
Suraj Anand, Michael Lepori, Jack Merullo et al.
Label-Free Backdoor Attacks in Vertical Federated Learning
Wei Shen, Wenke Huang, Guancheng Wan et al.
Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs
Yikang Zhou, Tao Zhang, Shilin Xu et al.
PCDreamer: Point Cloud Completion Through Multi-view Diffusion Priors
Guangshun Wei, Yuan Feng, Long Ma et al.
Harnessing Massive Satellite Imagery with Efficient Masked Image Modeling
Fengxiang Wang, Hongzhen Wang, Di Wang et al.
Pareto Set Learning for Multi-Objective Reinforcement Learning
Erlong Liu, Yu-Chang Wu, Xiaobin Huang et al.
Noise Stability Optimization for Finding Flat Minima: A Hessian-based Regularization Approach
Haotian Ju, Hongyang Zhang, Dongyue Li
WildSAT: Learning Satellite Image Representations from Wildlife Observations
Rangel Daroya, Elijah Cole, Oisin Mac Aodha et al.
VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction
Zijian He, Yuwei Ning, Yipeng Qin et al.
SuperDec: 3D Scene Decomposition with Superquadrics Primitives
Elisabetta Fedele, Boyang Sun, Francis Engelmann et al.
LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models
Haiwen Huang, Anpei Chen, Volodymyr Havrylov et al.
$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training
Jin Zhou, Kaiwen Wang, Jonathan Chang et al.
Generative Monoculture in Large Language Models
Fan Wu, Emily Black, Varun Chandrasekaran
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
Gang Li, Ming Lin, Tomer Galanti et al.
ParZC: Parametric Zero-Cost Proxies for Efficient NAS
Peijie Dong, Lujun Li, Zhenheng Tang et al.
Video Summarization with Large Language Models
Min Jung Lee, Dayoung Gong, Minsu Cho
Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model
Zhaochong An, Guolei Sun, Yun Liu et al.
DOVE: Efficient One-Step Diffusion Model for Real-World Video Super-Resolution
Zheng Chen, Zichen Zou, Kewei Zhang et al.
Generalized and Efficient 2D Gaussian Splatting for Arbitrary-scale Super-Resolution
Du Chen, Liyi Chen, Zhengqiang ZHANG et al.
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11)
Tianyi Zhang, Mohsen Hariri, Shaochen (Henry) Zhong et al.
Knowledge Graph Completion with Relation-Aware Anchor Enhancement
Duanyang Yuan, Sihang Zhou, Xiaoshu Chen et al.
ReCap: Better Gaussian Relighting with Cross-Environment Captures
Jingzhi Li, Zongwei Wu, Eduard Zamfir et al.
Graph Neural Preconditioners for Iterative Solutions of Sparse Linear Systems
Jie Chen
Revisiting Multimodal Fusion for 3D Anomaly Detection from an Architectural Perspective
Kaifang Long, Guoyang Xie, Lianbo Ma et al.
MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-text Decoding
Weikang Qiu, Zheng Huang, Haoyu Hu et al.
Open-World Amodal Appearance Completion
Jiayang Ao, Yanbei Jiang, Qiuhong Ke et al.
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data
Yiren Song, Cheng Liu, Mike Zheng Shou
Anyprefer: An Agentic Framework for Preference Data Synthesis
Yiyang Zhou, Zhaoyang Wang, Tianle Wang et al.
Budgeted Online Continual Learning by Adaptive Layer Freezing and Frequency-based Sampling
Minhyuk Seo, Hyunseo Koh, Jonghyun Choi
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Haiyang Wang, Yue Fan, Muhammad Ferjad Naeem et al.
MINERVA: Evaluating Complex Video Reasoning
Arsha Nagrani, Sachit Menon, Ahmet Iscen et al.
Efficiently Parameterized Neural Metriplectic Systems
Anthony Gruber, Kookjin Lee, Haksoo Lim et al.
DSP: Dynamic Sequence Parallelism for Multi-Dimensional Transformers
Xuanlei Zhao, Shenggan Cheng, Chang Chen et al.
The Double-Ellipsoid Geometry of CLIP
Meir Yossef Levi, Guy Gilboa
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling
Tsung-Han (Patrick) Wu, Heekyung Lee, Jiaxin Ge et al.
HIIF: Hierarchical Encoding based Implicit Image Function for Continuous Super-resolution
Yuxuan Jiang, Ho Man Kwan, jasmine peng et al.
A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
Thomas Schmied, Thomas Adler, Vihang Patil et al.
MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards
Sheng Wang, Liheng Chen, Pengan CHEN et al.
DOTA: Distributional Test-time Adaptation of Vision-Language Models
Zongbo Han, Jialong Yang, Guangyu Wang et al.
Probing the Latent Hierarchical Structure of Data via Diffusion Models
Antonio Sclocchi, Alessandro Favero, Noam Levi et al.
SlerpFace: Face Template Protection via Spherical Linear Interpolation
Zhizhou Zhong, Yuxi Mi, Yuge Huang et al.
SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input
Zhen Lv, Yangqi Long, Congzhentao Huang et al.
DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation
Jing He, Haodong Li, huyongzhe et al.
Training-Free Guidance Beyond Differentiability: Scalable Path Steering with Tree Search in Diffusion and Flow Models
Yingqing Guo, Yukang Yang, Hui Yuan et al.
CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering
Tianyu Huai, Jie Zhou, Xingjiao Wu et al.
GRPose: Learning Graph Relations for Human Image Generation with Pose Priors
Xiangchen Yin, Donglin Di, Lei Fan et al.
Light-T2M: A Lightweight and Fast Model for Text-to-motion Generation
Ling-An Zeng, Guohong Huang, Gaojie Wu et al.
X-Fi: A Modality-Invariant Foundation Model for Multimodal Human Sensing
Xinyan Chen, Jianfei Yang
Trajectory Mamba: Efficient Attention-Mamba Forecasting Model Based on Selective SSM
Yizhou Huang, Yihua Cheng, Kezhi Wang
Measuring memorization in RLHF for code completion
Jamie Hayes, I Shumailov, Billy Porter et al.
Visual Test-time Scaling for GUI Agent Grounding
Tiange Luo, Lajanugen Logeswaran, Justin Johnson et al.
PhyMPGN: Physics-encoded Message Passing Graph Network for spatiotemporal PDE systems
Bocheng Zeng, Qi Wang, Mengtao Yan et al.
Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation
Ting Liu, Siyuan Li
Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning
Qitao Tan, Jun Liu, Zheng Zhan et al.
HOIGPT: Learning Long-Sequence Hand-Object Interaction with Language Models
Mingzhen Huang, Fu-Jen Chu, Bugra Tekin et al.
Layered Image Vectorization via Semantic Simplification
Zhenyu Wang, Jianxi Huang, Zhida Sun et al.
Radiology Report Generation via Multi-objective Preference Optimization
Ting Xiao, Lei Shi, Peng Liu et al.
GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration
Yuchen Sun, Shanhui Zhao, Tao Yu et al.
Task-Agnostic Guided Feature Expansion for Class-Incremental Learning
Bowen Zheng, Da-Wei Zhou, Han-Jia Ye et al.
VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding
Yujie Liang, Xiaobin Hu, Boyuan Jiang et al.
Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM
Yatai Ji, Jiacheng Zhang, Jie Wu et al.
CCIN: Compositional Conflict Identification and Neutralization for Composed Image Retrieval
Likai Tian, Jian Zhao, Zechao Hu et al.
Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think
Zhenyi Lu, Xiaoye Qu, Zhenyi Lu et al.
Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions
Siqiao Mu, Diego Klabjan
Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models
Ruiyu Wang, Yu Yuan, Shizhao Sun et al.
Node-Time Conditional Prompt Learning in Dynamic Graphs
Xingtong Yu, Zhenghao Liu, Xinming Zhang et al.
Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model
Dongki Kim, Wonbin Lee, Sung Ju Hwang
Spectral Image Tokenizer
Carlos Esteves, Mohammed Suhail, Ameesh Makadia
VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation
Wenhao Wang, Yi Yang
Reliable and Efficient Amortized Model-based Evaluation
Sang Truong, Yuheng Tu, Percy Liang et al.
RomanTex: Decoupling 3D-aware Rotary Positional Embedded Multi-Attention Network for Texture Synthesis
yifei feng, Mx Yang, Shuhui Yang et al.
Point2RBox-v2: Rethinking Point-supervised Oriented Object Detection with Spatial Layout Among Instances
Yi Yu, Botao Ren, Peiyuan Zhang et al.
DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents
Hao Li, Xiaogeng Liu, CHIU Chun et al.
Attention as a Hypernetwork
Simon Schug, Seijin Kobayashi, Yassir Akram et al.
Beyond Graphs: Can Large Language Models Comprehend Hypergraphs?
Yifan Feng, Chengwu Yang, Xingliang Hou et al.
Visual Generation Without Guidance
Huayu Chen, Kai Jiang, Kaiwen Zheng et al.
CLEVER: A Curated Benchmark for Formally Verified Code Generation
Amitayush Thakur, Jasper Lee, George Tsoukalas et al.
Tuning the Frequencies: Robust Training for Sinusoidal Neural Networks
Tiago Novello, Diana Aldana Moreno, André Araujo et al.
The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization
Jae-Won Chung, Jeff J. Ma, Ruofan Wu et al.
ADBA: Approximation Decision Boundary Approach for Black-Box Adversarial Attacks
Feiyang Wang, Xingquan Zuo, Hai Huang et al.
Needle Threading: Can LLMs Follow Threads Through Near-Million-Scale Haystacks?
Jonathan Roberts, Kai Han, Samuel Albanie
AlphaPO: Reward Shape Matters for LLM Alignment
Aman Gupta, Shao Tang, Qingquan Song et al.
Taylor Series-Inspired Local Structure Fitting Network for Few-shot Point Cloud Semantic Segmentation
Changshuo Wang, Shuting He, Xiang Fang et al.
Unifying 2D and 3D Vision-Language Understanding
Ayush Jain, Alexander Swerdlow, Yuzhou Wang et al.
A Closer Look at TabPFN v2: Understanding Its Strengths and Extending Its Capabilities
Han-Jia Ye, Si-Yang Liu, Wei-Lun (Harry) Chao
Unlearn and Burn: Adversarial Machine Unlearning Requests Destroy Model Accuracy
Yangsibo Huang, Daogao Liu, Lynn Chua et al.
QERA: an Analytical Framework for Quantization Error Reconstruction
Cheng Zhang, Jeffrey T. H. Wong, Can Xiao et al.
Transformer Copilot: Learning from The Mistake Log in LLM Fine-tuning
Jiaru Zou, Yikun Ban, Zihao Li et al.
CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models
Hao He, Ceyuan Yang, Shanchuan Lin et al.
DreamRelation: Bridging Customization and Relation Generation
Qingyu Shi, Lu Qi, Jianzong Wu et al.
Periodic Materials Generation using Text-Guided Joint Diffusion Model
KISHALAY DAS, Subhojyoti Khastagir, Pawan Goyal et al.
FormalAlign: Automated Alignment Evaluation for Autoformalization
Jianqiao Lu, Yingjia Wan, Yinya Huang et al.
PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation
Qihan Huang, Weilong Dai, Jinlong Liu et al.
Scaling Laws for Optimal Data Mixtures
Mustafa Shukor, Louis Bethune, Dan Busbridge et al.
Boosting Latent Diffusion with Perceptual Objectives
Tariq Berrada, Pietro Astolfi, Melissa Hall et al.
AdaRankGrad: Adaptive Gradient Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning
Yehonathan Refael, Jonathan Svirsky, Boris Shustin et al.
UCF-Crime-DVS: A Novel Event-Based Dataset for Video Anomaly Detection with Spiking Neural Networks
Yuanbin Qian, Shuhan Ye, Chong Wang et al.
Attention layers provably solve single-location regression
Pierre Marion, Raphaël Berthier, Gérard Biau et al.
Efficient Attention-Sharing Information Distillation Transformer for Lightweight Single Image Super-Resolution
Karam Park, Jae Woong Soh, Nam Ik Cho
Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation
Yuxuan Wang, Xuanyu Yi, Haohan Weng et al.
X-Dancer: Expressive Music to Human Dance Video Generation
Zeyuan Chen, Hongyi Xu, Guoxian Song et al.
Topological Blindspots: Understanding and Extending Topological Deep Learning Through the Lens of Expressivity
Yam Eitan, Yoav Gelberg, Guy Bar-Shalom et al.
DM-Adapter: Domain-Aware Mixture-of-Adapters for Text-Based Person Retrieval
Yating Liu, Zimo Liu, Xiangyuan Lan et al.
RayFlow: Instance-Aware Diffusion Acceleration via Adaptive Flow Trajectories
Huiyang Shao, Xin Xia, Yuhong Yang et al.
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
Yumeng Li, William H Beluch, Margret Keuper et al.
Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds
Zhiyong Wang, Dongruo Zhou, John C.S. Lui et al.
Hydra-SGG: Hybrid Relation Assignment for One-stage Scene Graph Generation
Minghan Chen, Guikun Chen, Wenguan Wang et al.
Open-World Reinforcement Learning over Long Short-Term Imagination
Jiajian Li, Qi Wang, Yunbo Wang et al.
PLeaS - Merging Models with Permutations and Least Squares
Anshul Nasery, Jonathan Hayase, Pang Wei Koh et al.
Efficient Gaussian Splatting for Monocular Dynamic Scene Rendering via Sparse Time-Variant Attribute Modeling
Hanyang Kong, Xingyi Yang, Xinchao Wang
Fast training and sampling of Restricted Boltzmann Machines
Nicolas BEREUX, Aurélien Decelle, Cyril Furtlehner et al.
Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model
Keda TAO, Jinjin Gu, Yulun Zhang et al.
On the Crucial Role of Initialization for Matrix Factorization
Bingcong Li, Liang Zhang, Aryan Mokhtari et al.
NeSyCoCo: A Neuro-Symbolic Concept Composer for Compositional Generalization
Danial Kamali, Elham J. Barezi, Parisa Kordjamshidi
Can Transformers Reason Logically? A Study in SAT Solving
Leyan Pan, Vijay Ganesh, Jacob Abernethy et al.
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence
Jie Feng, Shengyuan Wang, Tianhui Liu et al.
Towards Federated RLHF with Aggregated Client Preference for LLMs
Feijie Wu, Xiaoze Liu, Haoyu Wang et al.
NTPP: Generative Speech Language Modeling for Dual-Channel Spoken Dialogue via Next-Token-Pair Prediction
Qichao Wang, Ziqiao Meng, Wenqian Cui et al.
ROADWork: A Dataset and Benchmark for Learning to Recognize, Observe, Analyze and Drive Through Work Zones
Anurag Ghosh, Shen Zheng, Robert Tamburo et al.
RoMo: Robust Motion Segmentation Improves Structure from Motion
Lily Goli, Sara Sabour, Mark Matthews et al.
Constrain Alignment with Sparse Autoencoders
Qingyu Yin, Chak Tou Leong, Hongbo Zhang et al.
Adaptive Self-improvement LLM Agentic System for ML Library Development
Genghan Zhang, Weixin Liang, Olivia Hsu et al.
Scaling Inference-Efficient Language Models
Song Bian, Minghao Yan, Shivaram Venkataraman
AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation
zijie wu, Chaohui Yu, Fan Wang et al.
AVTrustBench: Assessing and Enhancing Reliability and Robustness in Audio-Visual LLMs
Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta et al.
FedMIA: An Effective Membership Inference Attack Exploiting "All for One" Principle in Federated Learning
Gongxi Zhu, Donghao Li, Hanlin Gu et al.
Flowing from Words to Pixels: A Noise-Free Framework for Cross-Modality Evolution
Qihao Liu, Xi Yin, Alan L. Yuille et al.
Plastic Learning with Deep Fourier Features
Alex Lewandowski, Dale Schuurmans, Marlos C. Machado
MatryoshkaKV: Adaptive KV Compression via Trainable Orthogonal Projection
Bokai Lin, Zihao Zeng, Zipeng Xiao et al.
DELT: A Simple Diversity-driven EarlyLate Training for Dataset Distillation
Zhiqiang Shen, Ammar Sherif, Zeyuan Yin et al.
CADDreamer: CAD Object Generation from Single-view Images
Yuan Li, Cheng Lin, Yuan Liu et al.
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead
Marlon Becker, Frederick Altrock, Benjamin Risse
Fragment and Geometry Aware Tokenization of Molecules for Structure-Based Drug Design Using Language Models
Cong Fu, Xiner Li, Blake Olson et al.
ObjectMover: Generative Object Movement with Video Prior
Xin Yu, Tianyu Wang, Soo Ye Kim et al.
STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding
Zichen Liu, Kunlun Xu, Bing Su et al.
TACO: Taming Diffusion for in-the-wild Video Amodal Completion
Ruijie Lu, Yixin Chen, Yu Liu et al.
FilterTS: Comprehensive Frequency Filtering for Multivariate Time Series Forecasting
Yulong Wang, Yushuo Liu, Xiaoyi Duan et al.
ThermalGaussian: Thermal 3D Gaussian Splatting
Rongfeng Lu, Hangyu Chen, Zunjie Zhu et al.
Gemstones: A Model Suite for Multi-Faceted Scaling Laws
Sean McLeish, John Kirchenbauer, David Miller et al.
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec, Felix Dangel, Sidak Pal Singh
Towards a Comprehensive, Efficient and Promptable Anatomic Structure Segmentation Model Using 3D Whole-Body CT Scans
Heng Guo, Jianfeng Zhang, Jiaxing Huang et al.
RANKCLIP: Ranking-Consistent Language-Image Pretraining
Yiming Zhang, Zhuokai Zhao, Zhaorun Chen et al.
Intrinsic User-Centric Interpretability through Global Mixture of Experts
Vinitra Swamy, Syrielle Montariol, Julian Blackwell et al.
Reconstructing People, Places, and Cameras
Lea Müller, Hongsuk Choi, Anthony Zhang et al.
EgoExoBench: A Benchmark for First- and Third-person View Video Understanding in MLLMs
Yuping He, Yifei Huang, Guo Chen et al.
GraphAvatar: Compact Head Avatars with GNN-Generated 3D Gaussians
Xiaobao Wei, Peng Chen, Ming Lu et al.
Enhancing Multilingual LLM Pretraining with Model-Based Data Selection
Bettina Messmer, Vinko Sabolčec, Martin Jaggi
MIP against Agent: Malicious Image Patches Hijacking Multimodal OS Agents
Lukas Aichberger, Alasdair Paren, Guohao Li et al.
Hierarchical Mixture of Experts: Generalizable Learning for High-Level Synthesis
Weikai Li, Ding Wang, Zijian Ding et al.
A Recipe for Generating 3D Worlds from a Single Image
Katja Schwarz, Denis Rozumny, Samuel Rota Bulò et al.
BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation
Yuyang Peng, Shishi Xiao, Keming Wu et al.
Bayesian Concept Bottleneck Models with LLM Priors
Jean Feng, Avni Kothari, Lucas Zier et al.
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Qining Zhang, Lei Ying
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding
Wenhui Liao, Jiapeng Wang, Hongliang Li et al.
DiTFastAttnV2: Head-wise Attention Compression for Multi-Modality Diffusion Transformers
Hanling Zhang, Rundong Su, Zhihang Yuan et al.
Atlas Gaussians Diffusion for 3D Generation
Haitao Yang, Yuan Dong, Hanwen Jiang et al.
Splatter-360: Generalizable 360 Gaussian Splatting for Wide-baseline Panoramic Images
Zheng Chen, Chenming Wu, Zhelun Shen et al.
InsightEdit: Towards Better Instruction Following for Image Editing
Yingjing Xu, Jie Kong, Jiazhi Wang et al.
Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy
You Li, Fan Ma, Yi Yang
CoreMatching: A Co-adaptive Sparse Inference Framework with Token and Neuron Pruning for Comprehensive Acceleration of Vision-Language Models
Qinsi Wang, Hancheng Ye, Ming-Yu Chung et al.
CholecTrack20: A Multi-Perspective Tracking Dataset for Surgical Tools
Chinedu Innocent Nwoye, Kareem elgohary, Anvita A. Srinivas et al.
LoRA Subtraction for Drift-Resistant Space in Exemplar-Free Continual Learning
Xuan Liu, Xiaobin Chang
GenDeg: Diffusion-based Degradation Synthesis for Generalizable All-In-One Image Restoration
Sudarshan Rajagopalan, Nithin Gopalakrishnan Nair, Jay Paranjape et al.
RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards
jingnan zheng, Xiangtian Ji, Yijun Lu et al.
Energy-based Backdoor Defense Against Federated Graph Learning
Guancheng Wan, Zitong Shi, Wenke Huang et al.
SceneTAP: Scene-Coherent Typographic Adversarial Planner against Vision-Language Models in Real-World Environments
Yue Cao, Yun Xing, Jie Zhang et al.
FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation
Dong Zhao, Jinlong Li, Shuang Wang et al.