Most Cited 2025 "joint segmentation decoders" Papers
22,274 papers found • Page 9 of 112
Conference
Ca2-VDM: Efficient Autoregressive Video Diffusion Model with Causal Generation and Cache Sharing
Kaifeng Gao, Jiaxin Shi, Hanwang Zhang et al.
EffoVPR: Effective Foundation Model Utilization for Visual Place Recognition
Issar Tzachor, Boaz Lerner, Matan Levy et al.
Decompositional Neural Scene Reconstruction with Generative Diffusion Prior
Junfeng Ni, Yu Liu, Ruijie Lu et al.
SynFlowNet: Design of Diverse and Novel Molecules with Synthesis Constraints
Miruna Cretu, Charles Harris, Ilia Igashov et al.
AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation
Yukang Cao, Liang Pan, Kai Han et al.
Cubify Anything: Scaling Indoor 3D Object Detection
Justin Lazarow, David Griffiths, Gefen Kohavi et al.
Emerging Safety Attack and Defense in Federated Instruction Tuning of Large Language Models
Rui Ye, Jingyi Chai, Xiangrui Liu et al.
Raw2Drive: Reinforcement Learning with Aligned World Models for End-to-End Autonomous Driving (in CARLA v2)
Zhenjie Yang, Xiaosong Jia, Qifeng Li et al.
Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning
Simran Kaur, Simon Park, Anirudh Goyal et al.
SLMRec: Distilling Large Language Models into Small for Sequential Recommendation
Wujiang Xu, Qitian Wu, Zujie Liang et al.
Adaptive Rectangular Convolution for Remote Sensing Pansharpening
Xueyang Wang, Zhixin Zheng, Jiandong Shao et al.
Video Motion Transfer with Diffusion Transformers
Alexander Pondaven, Aliaksandr Siarohin, Sergey Tulyakov et al.
VSP: Diagnosing the Dual Challenges of Perception and Reasoning in Spatial Planning Tasks for MLLMs
Qiucheng Wu, Handong Zhao, Michael Saxon et al.
MSP-MVS: Multi-Granularity Segmentation Prior Guided Multi-View Stereo
Zhenlong Yuan, Cong Liu, Fei Shen et al.
Generalization through variance: how noise shapes inductive biases in diffusion models
John Vastola
Towards Adversarially Robust Dataset Distillation by Curvature Regularization
Eric Xue, Yijiang Li, Haoyang Liu et al.
You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs
Yihong Luo, Xiaolong Chen, Xinghua Qu et al.
Diversity-Aware Policy Optimization for Large Language Model Reasoning
Jian Yao, Ran Cheng, Xingyu Wu et al.
Analyzing and Boosting the Power of Fine-Grained Visual Recognition for Multi-modal Large Language Models
Hulingxiao He, Geng Li, Zijun Geng et al.
Hydra-NeXt: Robust Closed-Loop Driving with Open-Loop Training
Zhenxin Li, Shihao Wang, Shiyi Lan et al.
A Controlled Study on Long Context Extension and Generalization in LLMs
Yi Lu, Jing Nathan Yan, Songlin Yang et al.
Limits of Deep Learning: Sequence Modeling through the Lens of Complexity Theory
Nikola Zubic, Federico Soldà, Aurelio Sulser et al.
Commit0: Library Generation from Scratch
Wenting Zhao, Nan Jiang, Celine Lee et al.
Ref-GS: Directional Factorization for 2D Gaussian Splatting
Youjia Zhang, Anpei Chen, Yumin Wan et al.
HexGen-2: Disaggregated Generative Inference of LLMs in Heterogeneous Environment
YOUHE JIANG, Ran Yan, Binhang Yuan
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis
Jiangyong Huang, Baoxiong Jia, Yan Wang et al.
Perm: A Parametric Representation for Multi-Style 3D Hair Modeling
Chengan He, Xin Sun, Zhixin Shu et al.
UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing
Yiheng Li, RuiBing Hou, Hong Chang et al.
Vid2Avatar-Pro: Authentic Avatar from Videos in the Wild via Universal Prior
Chen Guo, Junxuan Li, Yash Kant et al.
MoonCast: High-Quality Zero-Shot Podcast Generation
Zeqian Ju, Dongchao Yang, Shen Kai et al.
DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs
Jongwoo Ko, Tianyi Chen, Sungnyun Kim et al.
Block Verification Accelerates Speculative Decoding
Ziteng Sun, Uri Mendlovic, Yaniv Leviathan et al.
SketchAgent: Language-Driven Sequential Sketch Generation
Yael Vinker, Tamar Rott Shaham, Kristine Zheng et al.
Delta Decompression for MoE-based LLMs Compression
Hao Gu, Wei Li, Lujun Li et al.
Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning
Fan Lu, Wei Wu, Kecheng Zheng et al.
Merging on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging
Anke Tang, Enneng Yang, Li Shen et al.
Beyond Chemical QA: Evaluating LLM's Chemical Reasoning with Modular Chemical Operations
Li Hao, He CAO, Bin Feng et al.
Generative Multimodal Pretraining with Discrete Diffusion Timestep Tokens
Kaihang Pan, Wang Lin, Zhongqi Yue et al.
Palu: KV-Cache Compression with Low-Rank Projection
Chi-Chih Chang, Wei-Cheng Lin, Chien-Yu Lin et al.
What Kind of Visual Tokens Do We Need? Training-Free Visual Token Pruning for Multi-Modal Large Language Models from the Perspective of Graph
Yutao Jiang, Qiong Wu, Wenhao Lin et al.
BirdSet: A Large-Scale Dataset for Audio Classification in Avian Bioacoustics
Lukas Rauch, Raphael Schwinger, Moritz Wirth et al.
Bridging Distributional and Risk-sensitive Reinforcement Learning with Provable Regret Bounds
Hao Liang, Zhiquan Luo
Scaling Optimal LR Across Token Horizons
Johan Bjorck, Alon Benhaim, Vishrav Chaudhary et al.
Cross-Embodiment Dexterous Grasping with Reinforcement Learning
Haoqi Yuan, Bohan Zhou, Yuhui Fu et al.
CrossMPT: Cross-attention Message-passing Transformer for Error Correcting Codes
Seong-Joon Park, Hee-Youl Kwak, Sang-Hyo Kim et al.
Controlling Language and Diffusion Models by Transporting Activations
Pau Rodriguez, Arno Blaas, Michal Klein et al.
FatesGS: Fast and Accurate Sparse-View Surface Reconstruction Using Gaussian Splatting with Depth-Feature Consistency
Han Huang, Yulun Wu, Chao Deng et al.
CLIBD: Bridging Vision and Genomics for Biodiversity Monitoring at Scale
ZeMing Gong, Austin Wang, Xiaoliang Huo et al.
HumanDreamer: Generating Controllable Human-Motion Videos via Decoupled Generation
Boyuan Wang, Xiaofeng Wang, Chaojun Ni et al.
Benchmarking Predictive Coding Networks -- Made Simple
Luca Pinchetti, Chang Qi, Oleh Lokshyn et al.
ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning
Tonghe Zhang, Chao Yu, Sichang Su et al.
VLM-R³: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought
Chaoya Jiang, Yongrui Heng, Wei Ye et al.
TreeMeshGPT: Artistic Mesh Generation with Autoregressive Tree Sequencing
Stefan Lionar, Jiabin Liang, Gim Hee Lee
4Real-Video: Learning Generalizable Photo-Realistic 4D Video Diffusion
Chaoyang Wang, Peiye Zhuang, Tuan Duc Ngo et al.
HoliTom: Holistic Token Merging for Fast Video Large Language Models
Kele Shao, Keda TAO, Can Qin et al.
Encryption-Friendly LLM Architecture
Donghwan Rho, Taeseong Kim, Minje Park et al.
GenFusion: Closing the Loop between Reconstruction and Generation via Videos
Sibo Wu, Congrong Xu, Binbin Huang et al.
Moto: Latent Motion Token as the Bridging Language for Learning Robot Manipulation from Videos
Yi Chen, Yuying Ge, Weiliang Tang et al.
Discretization-invariance? On the Discretization Mismatch Errors in Neural Operators
Wenhan Gao, Ruichen Xu, Yuefan Deng et al.
Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Video and Multi-view Diffusion Models
Zeyu Yang, Zijie Pan, Chun Gu et al.
SpikingSSMs: Learning Long Sequences with Sparse and Parallel Spiking State Space Models
Shuaijie Shen, Chao Wang, Renzhuo Huang et al.
Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search
Yuichi Inoue, Kou Misaki, Yuki Imajuku et al.
Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset
Xiao Wang, Yu Jin, Wentao Wu et al.
Boosting Neural Combinatorial Optimization for Large-Scale Vehicle Routing Problems
Fu Luo, Xi Lin, Yaoxin Wu et al.
MarS: a Financial Market Simulation Engine Powered by Generative Foundation Model
Junjie Li, Yang Liu, Weiqing Liu et al.
X-NeMo: Expressive Neural Motion Reenactment via Disentangled Latent Attention
XiaoChen Zhao, Hongyi Xu, Guoxian Song et al.
Gradient-Free Generation for Hard-Constrained Systems
Chaoran Cheng, Boran Han, Danielle Maddix et al.
Structure-Adaptive Multi-View Graph Clustering for Remote Sensing Data
Renxiang Guan, Wenxuan Tu, Siwei Wang et al.
TIME-FS: Joint Learning of Tensorial Incomplete Multi-View Unsupervised Feature Selection and Missing-View Imputation
Yanyong Huang, Minghui Lu, Wei Huang et al.
Hash3D: Training-free Acceleration for 3D Generation
Xingyi Yang, Songhua Liu, Xinchao Wang
CAD-Recode: Reverse Engineering CAD Code from Point Clouds
Danila Rukhovich, Elona Dupont, Dimitrios Mallis et al.
Air Quality Prediction with Physics-Guided Dual Neural ODEs in Open Systems
jindong tian, Yuxuan Liang, Ronghui Xu et al.
Logic-in-Frames: Dynamic Keyframe Search via Visual Semantic-Logical Verification for Long Video Understanding
Weiyu Guo, Ziyang Chen, Shaoguang WANG et al.
The Illusion of Empathy: How AI Chatbots Shape Conversation Perception
Tingting Liu, Salvatore Giorgi, Ankit Aich et al.
Learning to Discretize Denoising Diffusion ODEs
Vinh Tong, Trung-Dung Hoang, Anji Liu et al.
OmniGuard: Hybrid Manipulation Localization via Augmented Versatile Deep Image Watermarking
Xuanyu Zhang, Zecheng Tang, Zhipei Xu et al.
Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book?
Seth Aycock, David Stap, Di Wu et al.
Playing the Fool: Jailbreaking LLMs and Multimodal LLMs with Out-of-Distribution Strategy
Joonhyun Jeong, Seyun Bae, Yeonsung Jung et al.
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization
Bhavya, Stelian Coros, Andreas Krause et al.
FreeSplatter: Pose-free Gaussian Splatting for Sparse-view 3D Reconstruction
Jiale Xu, Shenghua Gao, Ying Shan
A Label-free Heterophily-guided Approach for Unsupervised Graph Fraud Detection
Junjun Pan, Yixin Liu, Xin Zheng et al.
Non-myopic Generation of Language Models for Reasoning and Planning
Chang Ma, Haiteng Zhao, Junlei Zhang et al.
Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency
Jerry Yao-Chieh Hu, Wei-Po Wang, Ammar Gilani et al.
A Quantum Circuit-Based Compression Perspective for Parameter-Efficient Learning
Chen-Yu Liu, Chao-Han Huck Yang, Hsi-Sheng Goan et al.
Graph Sparsification via Mixture of Graphs
Guibin Zhang, Xiangguo SUN, Yanwei Yue et al.
MetaMetrics: Calibrating Metrics for Generation Tasks Using Human Preferences
Genta Winata, David Anugraha, Lucky Susanto et al.
Learning 3D Persistent Embodied World Models
Siyuan Zhou, Yilun Du, Yuncong Yang et al.
Large Images Are Gaussians: High-Quality Large Image Representation with Levels of 2D Gaussian Splatting
Lingting Zhu, Guying Lin, Jinnan Chen et al.
ViLLa: Video Reasoning Segmentation with Large Language Model
rongkun Zheng, Lu Qi, Xi Chen et al.
Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment
Xiaojun Jia, Sensen Gao, Simeng Qin et al.
Optimal Transport for Time Series Imputation
Hao Wang, zhengnan li, Haoxuan Li et al.
No Preference Left Behind: Group Distributional Preference Optimization
Binwei Yao, Zefan Cai, Yun-Shiuan Chuang et al.
DashGaussian: Optimizing 3D Gaussian Splatting in 200 Seconds
Youyu Chen, Junjun Jiang, Kui Jiang et al.
DarkBench: Benchmarking Dark Patterns in Large Language Models
Esben Kran, Hieu Minh Nguyen, Akash Kundu et al.
Mamba as a Bridge: Where Vision Foundation Models Meet Vision Language Models for Domain-Generalized Semantic Segmentation
Xin Zhang, Robby T. Tan
VITA-Audio: Fast Interleaved Audio-Text Token Generation for Efficient Large Speech-Language Model
Zuwei Long, Yunhang Shen, Chaoyou Fu et al.
USP: Unified Self-Supervised Pretraining for Image Generation and Understanding
Xiangxiang Chu, Renda Li, Yong Wang
Adaptive Methods through the Lens of SDEs: Theoretical Insights on the Role of Noise
Enea Monzio Compagnoni, Tianlin Liu, Rustem Islamov et al.
Depth Any Camera: Zero-Shot Metric Depth Estimation from Any Camera
Yuliang Guo, Sparsh Garg, S. Mahdi H. Miangoleh et al.
Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation
Jiaming Zhou, Teli Ma, Kun-Yu Lin et al.
DoomArena: A framework for Testing AI Agents Against Evolving Security Threats
Léo Boisvert, Abhay Puri, Gabriel Huang et al.
V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
Junqi Ge, Ziyi Chen, Jintao Lin et al.
Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling
Junha Hyung, Kinam Kim, Susung Hong et al.
QiMeng-CodeV-R1: Reasoning-Enhanced Verilog Generation
Yaoyu Zhu, Di Huang, Hanqi Lyu et al.
From Risk to Uncertainty: Generating Predictive Uncertainty Measures via Bayesian Estimation
Nikita Kotelevskii, Vladimir Kondratyev, Martin Takáč et al.
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
Yue Yang, Shuibo Zhang, Kaipeng Zhang et al.
INFP: Audio-Driven Interactive Head Generation in Dyadic Conversations
Yongming Zhu, Longhao Zhang, Zhengkun Rong et al.
Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures
Junxuan Wang, Xuyang Ge, Wentao Shu et al.
Spiking Transformer with Spatial-Temporal Attention
Donghyun Lee, Yuhang Li, Youngeun Kim et al.
Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval
Yuanmin Tang, Jue Zhang, Xiaoting Qin et al.
Unifying Unsupervised Graph-Level Anomaly Detection and Out-of-Distribution Detection: A Benchmark
Yili Wang, Yixin Liu, Xu Shen et al.
Learning Clustering-based Prototypes for Compositional Zero-Shot Learning
Hongyu Qu, Jianan Wei, Xiangbo Shu et al.
PO3AD: Predicting Point Offsets toward Better 3D Point Cloud Anomaly Detection
Jianan Ye, Weiguang Zhao, Xi Yang et al.
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
Yuqing Wang, Zhijie Lin, Yao Teng et al.
Can LVLMs Obtain a Driver’s License? A Benchmark Towards Reliable AGI for Autonomous Driving
Yuhang Lu, Yichen Yao, Jiadong Tu et al.
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
Yue Liu, Shengfang Zhai, Mingzhe Du et al.
Neighboring Autoregressive Modeling for Efficient Visual Generation
Yefei He, Yuanyu He, Shaoxuan He et al.
Learning Interpretable Hierarchical Dynamical Systems Models from Time Series Data
Manuel Brenner, Elias Weber, Georgia Koppe et al.
CyberHost: A One-stage Diffusion Framework for Audio-driven Talking Body Generation
Gaojie Lin, Jianwen Jiang, Chao Liang et al.
OmniKV: Dynamic Context Selection for Efficient Long-Context LLMs
Jitai Hao, Yuke Zhu, Tian Wang et al.
Generative Flows on Synthetic Pathway for Drug Design
Seonghwan Seo, Minsu Kim, Tony Shen et al.
Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models
Matvei Popov, Peter Robicheaux, Anish Madan et al.
RainbowPO: A Unified Framework for Combining Improvements in Preference Optimization
Hanyang Zhao, Genta Winata, Anirban Das et al.
Reasoning of Large Language Models over Knowledge Graphs with Super-Relations
Song Wang, Junhong Lin, Xiaojie Guo et al.
LoRA-FAIR: Federated LoRA Fine-Tuning with Aggregation and Initialization Refinement
Jieming Bian, Lei Wang, Letian Zhang et al.
TULIP: Token-length Upgraded CLIP
Ivona Najdenkoska, Mohammad Mahdi Derakhshani, Yuki Asano et al.
A Probabilistic Perspective on Unlearning and Alignment for Large Language Models
Yan Scholten, Stephan Günnemann, Leo Schwinn
VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning
Yongshuo Zong, Ondrej Bohdal, Timothy Hospedales
Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration
Ran Xu, Wenqi Shi, Yuchen Zhuang et al.
Exploring the Limits of Vision-Language-Action Manipulation in Cross-task Generalization
Jiaming Zhou, Ke Ye, Jiayi Liu et al.
Mellow: a small audio language model for reasoning
Soham Deshmukh, Satvik Dixit, Rita Singh et al.
GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation
Hongyin Zhang, Pengxiang Ding, Shangke Lyu et al.
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation
Yining Hong, Beide Liu, Maxine Wu et al.
Accelerating neural network training: An analysis of the AlgoPerf competition
Priya Kasimbeg, Frank Schneider, Runa Eschenhagen et al.
Spatial457: A Diagnostic Benchmark for 6D Spatial Reasoning of Large Mutimodal Models
Xingrui Wang, Wufei Ma, Tiezheng Zhang et al.
SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding
Mingze Xu, Mingfei Gao, Shiyu Li et al.
BlueSuffix: Reinforced Blue Teaming for Vision-Language Models Against Jailbreak Attacks
Yunhan Zhao, Xiang Zheng, Lin Luo et al.
M²IV: Towards Efficient and Fine-grained Multimodal In-Context Learning via Representation Engineering
Yanshu Li, Yi Cao, Hongyang He et al.
DeLLMa: Decision Making Under Uncertainty with Large Language Models
Ollie Liu, Deqing Fu, Dani Yogatama et al.
Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families
Felipe Maia Polo, Seamus Somerstep, Leshem Choshen et al.
Video-T1: Test-time Scaling for Video Generation
Fangfu Liu, Hanyang Wang, Yimo Cai et al.
UniRestore: Unified Perceptual and Task-Oriented Image Restoration Model Using Diffusion Prior
I-Hsiang Chen, Wei-Ting Chen, Yu-Wei Liu et al.
Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding
Xiaoyi Zhang, Zhaoyang Jia, Zongyu Guo et al.
Detecting Out-of-Distribution Through the Lens of Neural Collapse
Litian Liu, Yao Qin
Meta-Unlearning on Diffusion Models: Preventing Relearning Unlearned Concepts
Hongcheng Gao, Tianyu Pang, Chao Du et al.
AVHBench: A Cross-Modal Hallucination Benchmark for Audio-Visual Large Language Models
Kim Sung-Bin, Oh Hyun-Bin, Lee Jung-Mok et al.
Improving Reasoning Performance in Large Language Models via Representation Engineering
Bertram Højer, Oliver Jarvis, Stefan Heinrich
3DRealCar: An In-the-wild RGB-D Car Dataset with 360-degree Views
Xiaobiao Du, Yida Wang, Haiyang Sun et al.
Fine-Tuning Discrete Diffusion Models with Policy Gradient Methods
Oussama Zekri, Nicolas Boulle
VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior
Xindi Yang, Baolu Li, Yiming Zhang et al.
Swift4D: Adaptive divide-and-conquer Gaussian Splatting for compact and efficient reconstruction of dynamic scene
Jiahao Wu, Rui Peng, Zhiyan Wang et al.
Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks
Lehan Wang, Haonan Wang, Honglong Yang et al.
Provable Benefit of Annealed Langevin Monte Carlo for Non-log-concave Sampling
Wei Guo, Molei Tao, Yongxin Chen
Controllable Context Sensitivity and the Knob Behind It
Julian Minder, Kevin Du, Niklas Stoehr et al.
Design Principle Transfer in Neural Architecture Search via Large Language Models
Xun Zhou, Xingyu Wu, Liang Feng et al.
u-$\mu$P: The Unit-Scaled Maximal Update Parametrization
Charles Blake, Constantin Eichenberg, Josef Dean et al.
PointOBB-v2: Towards Simpler, Faster, and Stronger Single Point Supervised Oriented Object Detection
Botao Ren, Xue Yang, Yi Yu et al.
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision
Orr Zohar, Xiaohan Wang, Yonatan Bitton et al.
GEM: Empowering MLLM for Grounded ECG Understanding with Time Series and Images
Xiang Lan, Feng Wu, Kai He et al.
Visual Description Grounding Reduces Hallucinations and Boosts Reasoning in LVLMs
Sreyan Ghosh, Chandra Kiran Evuru, Sonal Kumar et al.
GraphMaster: Automated Graph Synthesis via LLM Agents in Data-Limited Environments
Enjun Du, Xunkai Li, Tian Jin et al.
FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity
Hang Hua, Qing Liu, Lingzhi Zhang et al.
Learning Precise Affordances from Egocentric Videos for Robotic Manipulation
Li, Nikolaos Tsagkas, Jifei Song et al.
AlignMamba: Enhancing Multimodal Mamba with Local and Global Cross-modal Alignment
Yan Li, Yifei Xing, Xiangyuan Lan et al.
MMR: A Large-scale Benchmark Dataset for Multi-target and Multi-granularity Reasoning Segmentation
Donggon Jang, Yucheol Cho, Suin Lee et al.
Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters
Yuan Wang, Ouxiang Li, Tingting Mu et al.
Idiosyncrasies in Large Language Models
Mingjie Sun, Yida Yin, Zhiqiu (Oscar) Xu et al.
Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset
Yiqun Mei, Mingming He, Li Ma et al.
On the Guidance of Flow Matching
Ruiqi Feng, Chenglei Yu, Wenhao Deng et al.
Unleashing the Potential of Multi-modal Foundation Models and Video Diffusion for 4D Dynamic Physical Scene Simulation
Zhuoman Liu, Weicai Ye, Yan Luximon et al.
HaWoR: World-Space Hand Motion Reconstruction from Egocentric Videos
Jinglei Zhang, Jiankang Deng, Chao Ma et al.
MLLMs Need 3D-Aware Representation Supervision for Scene Understanding
Xiaohu Huang, Jingjing Wu, Qunyi Xie et al.
CBGBench: Fill in the Blank of Protein-Molecule Complex Binding Graph
Haitao Lin, Guojiang Zhao, Odin Zhang et al.
ThinkBot: Embodied Instruction Following with Thought Chain Reasoning
Guanxing Lu, Ziwei Wang, Changliu Liu et al.
EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing
Haotian Sun, Tao Lei, Bowen Zhang et al.
Lightning-Fast Image Inversion and Editing for Text-to-Image Diffusion Models
Dvir Samuel, Barak Meiri, Haggai Maron et al.
Free4D: Tuning-free 4D Scene Generation with Spatial-Temporal Consistency
Tianqi Liu, Zihao Huang, Zhaoxi Chen et al.
Monte Carlo Tree Diffusion for System 2 Planning
Jaesik Yoon, Hyeonseo Cho, Doojin Baek et al.
Grokking at the Edge of Numerical Stability
Lucas Prieto, Melih Barsbey, Pedro Mediano et al.
Personalized Federated Collaborative Filtering: A Variational AutoEncoder Approach
Zhiwei Li, Guodong Long, Tianyi Zhou et al.
VeriThinker: Learning to Verify Makes Reasoning Model Efficient
Zigeng Chen, Xinyin Ma, Gongfan Fang et al.
Model Equality Testing: Which Model is this API Serving?
Irena Gao, Percy Liang, Carlos Guestrin
OBI-Bench: Can LMMs Aid in Study of Ancient Script on Oracle Bones?
Zijian Chen, tingzhu chen, Wenjun Zhang et al.
xLSTM-Mixer: Multivariate Time Series Forecasting by Mixing via Scalar Memories
Maurice Kraus, Felix Divo, Devendra Singh Dhami et al.
Reinforce LLM Reasoning through Multi-Agent Reflection
Yurun Yuan, Tengyang Xie
The Scene Language: Representing Scenes with Programs, Words, and Embeddings
Yunzhi Zhang, Zizhang Li, Matt Zhou et al.
MLLM-as-a-Judge for Image Safety without Human Labeling
Zhenting Wang, Shuming Hu, Shiyu Zhao et al.
Trust, But Verify: A Self-Verification Approach to Reinforcement Learning with Verifiable Rewards
Xiaoyuan Liu, Tian Liang, Zhiwei He et al.
ROS-SAM: High-Quality Interactive Segmentation for Remote Sensing Moving Object
Zhe Shan, Yang Liu, Lei Zhou et al.
Accelerating Multimodal Large Language Models by Searching Optimal Vision Token Reduction
Shiyu Zhao, Zhenting Wang, Felix Juefei-Xu et al.
Mimir: Improving Video Diffusion Models for Precise Text Understanding
Shuai Tan, Biao Gong, Yutong Feng et al.
Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation and Reconstruction
Yuanhao Cai, He Zhang, Kai Zhang et al.
Instant3dit: Multiview Inpainting for Fast Editing of 3D Objects
Amir Barda, Matheus Gadelha, Vladimir G. Kim et al.
Does SGD really happen in tiny subspaces?
Minhak Song, Kwangjun Ahn, Chulhee Yun
RefactorBench: Evaluating Stateful Reasoning in Language Agents Through Code
Dhruv Gautam, Spandan Garg, Jinu Jang et al.
MMDisCo: Multi-Modal Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation
Akio Hayakawa, Masato Ishii, Takashi Shibuya et al.
SensorLM: Learning the Language of Wearable Sensors
Yuwei Zhang, Kumar Ayush, Siyuan Qiao et al.
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao, Chenlu Ye, Quanquan Gu et al.
Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model
Tudor Cebere, Aurélien Bellet, Nicolas Papernot
ContextGNN: Beyond Two-Tower Recommendation Systems
Yiwen Yuan, Zecheng Zhang, Xinwei He et al.