Most Cited 2025 "hierarchical joint embedding" Papers
22,274 papers found • Page 24 of 112
Conference
FormalAlign: Automated Alignment Evaluation for Autoformalization
Jianqiao Lu, Yingjia Wan, Yinya Huang et al.
Self-Improving Embodied Foundation Models
Seyed Kamyar Seyed Ghasemipour, Ayzaan Wahid, Jonathan Tompson et al.
ReCap: Better Gaussian Relighting with Cross-Environment Captures
Jingzhi Li, Zongwei Wu, Eduard Zamfir et al.
SliderSpace: Decomposing the Visual Capabilities of Diffusion Models
Rohit Gandikota, Zongze Wu, Richard Zhang et al.
AniMer: Animal Pose and Shape Estimation Using Family Aware Transformer
Jin Lyu, Tianyi Zhu, Yi Gu et al.
What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec, Felix Dangel, Sidak Pal Singh
VERA: Explainable Video Anomaly Detection via Verbalized Learning of Vision-Language Models
Muchao Ye, Weiyang Liu, Pan He
SfM-Free 3D Gaussian Splatting via Hierarchical Training
Bo Ji, Angela Yao
Continuous Diffusion for Mixed-Type Tabular Data
Markus Mueller, Kathrin Gruber, Dennis Fok
Volumetrically Consistent 3D Gaussian Rasterization
Chinmay Talegaonkar, Yash Belhe, Ravi Ramamoorthi et al.
Step-by-Step Reasoning for Math Problems via Twisted Sequential Monte Carlo
Shengyu Feng, Xiang Kong, shuang ma et al.
Radiology Report Generation via Multi-objective Preference Optimization
Ting Xiao, Lei Shi, Peng Liu et al.
IRIS: Inverse Rendering of Indoor Scenes from Low Dynamic Range Images
Chih-Hao Lin, Jia-Bin Huang, Zhengqin Li et al.
Embodied Scene Understanding for Vision Language Models via MetaVQA
Weizhen Wang, Chenda Duan, Zhenghao Peng et al.
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics
Dongyoung Kim, Huiwon Jang, Sumin Park et al.
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training
Haicheng Wang, Chen Ju, Weixiong Lin et al.
Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding
Wenxuan Guo, Xiuwei Xu, Ziwei Wang et al.
ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation
Mengyang Wu, Yuzhi Zhao, Jialun Cao et al.
SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input
Zhen Lv, Yangqi Long, Congzhentao Huang et al.
Gaussian Mixture Flow Matching Models
Hansheng Chen, Kai Zhang, Hao Tan et al.
GraphAvatar: Compact Head Avatars with GNN-Generated 3D Gaussians
Xiaobao Wei, Peng Chen, Ming Lu et al.
CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering
Tianyu Huai, Jie Zhou, Xingjiao Wu et al.
Rashomon Sets for Prototypical-Part Networks: Editing Interpretable Models in Real-Time
Jon Donnelly, Zhicheng Guo, Alina Jade Barnett et al.
SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction
Yang Zhou, Hao Shao, Letian Wang et al.
Reward Fine-Tuning Two-Step Diffusion Models via Learning Differentiable Latent-Space Surrogate Reward
Zhiwei Jia, Yuesong Nan, Huixi Zhao et al.
MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation
Jiaxin Huang, Runnan Chen, Ziwen Li et al.
Deep Linear Probe Generators for Weight Space Learning
Jonathan Kahana, Eliahu Horwitz, Imri Shuval et al.
Safety Reasoning with Guidelines
Haoyu Wang, Zeyu Qin, Li Shen et al.
Towards Satellite Image Road Graph Extraction: A Global-Scale Dataset and A Novel Method
Pan Yin, Kaiyu Li, Xiangyong Cao et al.
Trajectory Mamba: Efficient Attention-Mamba Forecasting Model Based on Selective SSM
Yizhou Huang, Yihua Cheng, Kezhi Wang
Scaling Stick-Breaking Attention: An Efficient Implementation and In-depth Study
Shawn Tan, Songlin Yang, Aaron Courville et al.
Self-Improving Language Models for Evolutionary Program Synthesis: A Case Study on ARC-AGI
Julien Pourcel, Cédric Colas, Pierre-Yves Oudeyer
PENCIL: Long Thoughts with Short Memory
Chenxiao Yang, Nati Srebro, David McAllester et al.
AnimateAnyMesh: A Feed-Forward 4D Foundation Model for Text-Driven Universal Mesh Animation
zijie wu, Chaohui Yu, Fan Wang et al.
Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation
Ting Liu, Siyuan Li
GTR: Improving Large 3D Reconstruction Models through Geometry and Texture Refinement
Peiye Zhuang, Songfang Han, Chaoyang Wang et al.
The Non-Linear Representation Dilemma: Is Causal Abstraction Enough for Mechanistic Interpretability?
Denis Sutter, Julian Minder, Thomas Hofmann et al.
Beyond the convexity assumption: Realistic tabular data generation under quantifier-free real linear constraints
Mihaela Stoian, Eleonora Giunchiglia
UltraFusion: Ultra High Dynamic Imaging using Exposure Fusion
Zixuan Chen, Yujin Wang, Xin Cai et al.
Data-Driven Performance Guarantees for Classical and Learned Optimizers
Rajiv Sambharya, Bartolomeo Stellato
VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding
Yujie Liang, Xiaobin Hu, Boyuan Jiang et al.
CCIN: Compositional Conflict Identification and Neutralization for Composed Image Retrieval
Likai Tian, Jian Zhao, Zechao Hu et al.
ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models
Liyan Tang, Grace Kim, Xinyu Zhao et al.
Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think
Zhenyi Lu, Xiaoye Qu, Zhenyi Lu et al.
Hydra-SGG: Hybrid Relation Assignment for One-stage Scene Graph Generation
Minghan Chen, Guikun Chen, Wenguan Wang et al.
SEFE: Superficial and Essential Forgetting Eliminator for Multimodal Continual Instruction Tuning
Jinpeng Chen, Runmin Cong, Yuzhi Zhao et al.
M2OST: Many-to-one Regression for Predicting Spatial Transcriptomics from Digital Pathology Images
Hongyi Wang, Xiuju Du, Jing Liu et al.
ADBA: Approximation Decision Boundary Approach for Black-Box Adversarial Attacks
Feiyang Wang, Xingquan Zuo, Hai Huang et al.
TACO: Taming Diffusion for in-the-wild Video Amodal Completion
Ruijie Lu, Yixin Chen, Yu Liu et al.
Large Convolutional Model Tuning via Filter Subspace
Wei Chen, Zichen Miao, Qiang Qiu
RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance
Chengrui Wang, Pengfei Liu, Min Zhou et al.
Unisolver: PDE-Conditional Transformers Towards Universal Neural PDE Solvers
Hang Zhou, Yuezhou Ma, Haixu Wu et al.
A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts
Suyu Ge, Xihui Lin, Yunan Zhang et al.
Constrain Alignment with Sparse Autoencoders
Qingyu Yin, Chak Tou Leong, Hongbo Zhang et al.
FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion
Akide Liu, Zeyu Zhang, Zhexin Li et al.
Think Then React: Towards Unconstrained Action-to-Reaction Motion Generation
Wenhui Tan, Boyuan Li, Chuhao Jin et al.
Tuning the Frequencies: Robust Training for Sinusoidal Neural Networks
Tiago Novello, Diana Aldana Moreno, André Araujo et al.
BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models
Zenghui Yuan, Jiawen Shi, Pan Zhou et al.
SoMA: Singular Value Decomposed Minor Components Adaptation for Domain Generalizable Representation Learning
Seokju Yun, Seunghye Chae, Dongheon Lee et al.
STAA-SNN: Spatial-Temporal Attention Aggregator for Spiking Neural Networks
Tianqing Zhang, Kairong Yu, Xian Zhong et al.
Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation
Chanyoung Kim, Dayun Ju, Woojung Han et al.
PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution
Zhu Li Bo, Jianze Li, Haotong Qin et al.
Deformable Radial Kernel Splatting
Yihua Huang, Mingxian Lin, Yangtian Sun et al.
Efficient Model Editing with Task-Localized Sparse Fine-tuning
Leonardo Iurada, Marco Ciccone, Tatiana Tommasi
Lessons and Insights from a Unifying Study of Parameter-Efficient Fine-Tuning (PEFT) in Visual Recognition
Zheda Mai, Ping Zhang, Cheng-Hao Tu et al.
IterGen: Iterative Semantic-aware Structured LLM Generation with Backtracking
Shubham Dipak Ugare, Rohan Gumaste, Tarun Suresh et al.
Quartet: Native FP4 Training Can Be Optimal for Large Language Models
Roberto Castro, Andrei Panferov, Rush Tabesh et al.
Adversarial Training Can Provably Improve Robustness: Theoretical Analysis of Feature Learning Process Under Structured Data
Binghui Li, Yuanzhi Li
GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance
Jinuk Kim, Marwa El Halabi, Wonpyo Park et al.
On the Training Convergence of Transformers for In-Context Classification of Gaussian Mixtures
Wei Shen, Ruida Zhou, Jing Yang et al.
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
Yuxuan Wang, Yueqian Wang, Bo Chen et al.
Lifting the Veil on Visual Information Flow in MLLMs: Unlocking Pathways to Faster Inference
Hao Yin, Guangzong Si, Zilei Wang
R.I.P.: Better Models by Survival of the Fittest Prompts
Ping Yu, Weizhe Yuan, Olga Golovneva et al.
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference
Qining Zhang, Lei Ying
Atlas Gaussians Diffusion for 3D Generation
Haitao Yang, Yuan Dong, Hanwen Jiang et al.
Intrinsic User-Centric Interpretability through Global Mixture of Experts
Vinitra Swamy, Syrielle Montariol, Julian Blackwell et al.
MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge
yuntao du, Kailin Jiang, Zhi Gao et al.
Node-Time Conditional Prompt Learning in Dynamic Graphs
Xingtong Yu, Zhenghao Liu, Xinming Zhang et al.
DreamRelation: Bridging Customization and Relation Generation
Qingyu Shi, Lu Qi, Jianzong Wu et al.
Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice
Jian-Qiao Zhu, Haijiang Yan, Thomas L. Griffiths
MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data
Hanwen Jiang, Zexiang Xu, Desai Xie et al.
GLASS: Guided Latent Slot Diffusion for Object-Centric Learning
Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth
Language Guided Concept Bottleneck Models for Interpretable Continual Learning
Lu Yu, HaoYu Han, Zhe Tao et al.
$\infty$-Video: A Training-Free Approach to Long Video Understanding via Continuous-Time Memory Consolidation
Saúl Santos, António Farinhas, Daniel McNamee et al.
O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models
Ashshak Sharifdeen, Muhammad Akhtar Munir, Sanoojan Baliah et al.
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead
Marlon Becker, Frederick Altrock, Benjamin Risse
PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation
Qihan Huang, Weilong Dai, Jinlong Liu et al.
Neural Video Compression with Context Modulation
Chuanbo Tang, Zhuoyuan Li, Yifan Bian et al.
SurFhead: Affine Rig Blending for Geometrically Accurate 2D Gaussian Surfel Head Avatars
Jaeseong Lee, Taewoong Kang, Marcel Buehler et al.
Antidistillation Sampling
Yash Savani, Asher Trockman, Zhili Feng et al.
A Closer Look at Multimodal Representation Collapse
Abhra Chaudhuri, Anjan Dutta, Tu Bui et al.
ConfTuner: Training Large Language Models to Express Their Confidence Verbally
Yibo Li, Miao Xiong, Jiaying Wu et al.
Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation
Dingcheng Zhen, Shunshun Yin, Shiyang Qin et al.
Consistency Checks for Language Model Forecasters
Daniel Paleka, Abhimanyu Pallavi Sudhir, Alejandro Alvarez et al.
MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models
Pei Wang, Yanan Wu, Zekun Wang et al.
Interpreting the Repeated Token Phenomenon in Large Language Models
Itay Yona, Ilia Shumailov, Jamie Hayes et al.
MONA: Myopic Optimization with Non-myopic Approval Can Mitigate Multi-step Reward Hacking
Sebastian Farquhar, Vikrant Varma, David Lindner et al.
MagicColor: Multi-instance Sketch Colorization
yinhan Zhang, Yue Ma, Bingyuan Wang et al.
Beyond Walking: A Large-Scale Image-Text Benchmark for Text-based Person Anomaly Search
Shuyu Yang, Yaxiong Wang, Li Zhu et al.
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program
Minghe Gao, Xuqi Liu, Zhongqi Yue et al.
MIP against Agent: Malicious Image Patches Hijacking Multimodal OS Agents
Lukas Aichberger, Alasdair Paren, Guohao Li et al.
Neural Exploratory Landscape Analysis for Meta-Black-Box-Optimization
Zeyuan Ma, Jiacheng Chen, Hongshu Guo et al.
SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes
Tony Alex, Sara Atito, Armin Mustafa et al.
Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance
Sachin Goyal, Christina Baek, Zico Kolter et al.
Vector-ICL: In-context Learning with Continuous Vector Representations
Yufan Zhuang, Chandan Singh, Liyuan Liu et al.
WAVE: Weight Templates for Adaptive Initialization of Variable-sized Models
Fu Feng, Yucheng Xie, Jing Wang et al.
BodyGen: Advancing Towards Efficient Embodiment Co-Design
Haofei Lu, Zhe Wu, Junliang Xing et al.
Synthesizing Software Engineering Data in a Test-Driven Manner
Lei Zhang, Jiaxi Yang, Min Yang et al.
RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression
Payman Behnam, Yaosheng Fu, Ritchie Zhao et al.
How Much Can Transfer? BRIDGE: Bounded Multi-Domain Graph Foundation Model with Generalization Guarantees
Haonan Yuan, Qingyun Sun, Junhua Shi et al.
The 3D-PC: a benchmark for visual perspective taking in humans and machines
Drew Linsley, Peisen Zhou, Alekh Ashok et al.
FilterTS: Comprehensive Frequency Filtering for Multivariate Time Series Forecasting
Yulong Wang, Yushuo Liu, Xiaoyi Duan et al.
When Do LLMs Help With Node Classification? A Comprehensive Analysis
Xixi Wu, Yifei Shen, Fangzhou Ge et al.
RayFlow: Instance-Aware Diffusion Acceleration via Adaptive Flow Trajectories
Huiyang Shao, Xin Xia, Yuhong Yang et al.
SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation
Leigang Qu, Haochuan Li, Wenjie Wang et al.
Advancing Textual Prompt Learning with Anchored Attributes
Zheng Li, Yibing Song, Ming-Ming Cheng et al.
DM-Adapter: Domain-Aware Mixture-of-Adapters for Text-Based Person Retrieval
Yating Liu, Zimo Liu, Xiangyuan Lan et al.
PhyloLM: Inferring the Phylogeny of Large Language Models and Predicting their Performances in Benchmarks
Nicolas Yax, Pierre-Yves Oudeyer, Stefano Palminteri
Unleashing Hour-Scale Video Training for Long Video-Language Understanding
Jingyang Lin, Jialian Wu, Ximeng Sun et al.
Taylor Series-Inspired Local Structure Fitting Network for Few-shot Point Cloud Semantic Segmentation
Changshuo Wang, Shuting He, Xiang Fang et al.
Efficient Attention-Sharing Information Distillation Transformer for Lightweight Single Image Super-Resolution
Karam Park, Jae Woong Soh, Nam Ik Cho
Fundamental limits of learning in sequence multi-index models and deep attention networks: high-dimensional asymptotics and sharp thresholds
Emanuele Troiani, Hugo Cui, Yatin Dandi et al.
Selective Prompt Anchoring for Code Generation
Yuan Tian, Tianyi Zhang
Sensor-Invariant Tactile Representation
Harsh Gupta, Yuchen Mo, Shengmiao Jin et al.
Boosting Latent Diffusion with Perceptual Objectives
Tariq Berrada, Pietro Astolfi, Melissa Hall et al.
InsTaG: Learning Personalized 3D Talking Head from Few-Second Video
Jiahe Li, Jiawei Zhang, Xiao Bai et al.
Make Me Happier: Evoking Emotions Through Image Diffusion Models
Qing Lin, Jingfeng Zhang, YEW-SOON ONG et al.
Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs
Yikang Zhou, Tao Zhang, Shilin Xu et al.
RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing
Fengxiang Wang, Yulin Wang, Mingshuo Chen et al.
InfoChartQA: A Benchmark for Multimodal Question Answering on Infographic Charts
Tianchi Xie, Minzhi Lin, Mengchen Liu et al.
Identifying Macro Conditional Independencies and Macro Total Effects in Summary Causal Graphs with Latent Confounding
Simon Ferreira, Charles K. Assaad
Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models
Alireza Ganjdanesh, Reza Shirkavand, Shangqian Gao et al.
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
Yumeng Li, William H Beluch, Margret Keuper et al.
Stochastic Control for Fine-tuning Diffusion Models: Optimality, Regularity, and Convergence
Yinbin Han, Meisam Razaviyayn, Renyuan Xu
ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention
Bencheng Liao, Xinggang Wang, Lianghui Zhu et al.
Towards Hierarchical Rectified Flow
Yichi Zhang, Yici Yan, Alex Schwing et al.
Beyond Verifiable Rewards: Scaling Reinforcement Learning in Language Models to Unverifiable Data
Yunhao Tang, Sid Wang, Lovish Madaan et al.
SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation
Koichi Saito, Dongjun Kim, Takashi Shibuya et al.
WildSAT: Learning Satellite Image Representations from Wildlife Observations
Rangel Daroya, Elijah Cole, Oisin Mac Aodha et al.
More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness
Aaron J. Li, Satyapriya Krishna, Hima Lakkaraju
Solving Video Inverse Problems Using Image Diffusion Models
Taesung Kwon, Jong Chul YE
Multi-Session Budget Optimization for Forward Auction-based Federated Learning
Xiaoli Tang, Han Yu, Zengxiang Li et al.
WSI-LLaVA: A Multimodal Large Language Model for Whole Slide Image
Yuci Liang, Xinheng Lyu, Meidan Ding et al.
DIFFER: Disentangling Identity Features via Semantic Cues for Clothes-Changing Person Re-ID
Xin Liang, Yogesh S. Rawat
Tool Unlearning for Tool-Augmented LLMs
Jiali Cheng, Hadi Amiri
Di[M]O: Distilling Masked Diffusion Models into One-step Generator
Yuanzhi Zhu, Xi WANG, Stéphane Lathuilière et al.
Robotic Visual Instruction
Yanbang Li, ZiYang Gong, Haoyang Li et al.
CoA: Towards Real Image Dehazing via Compression-and-Adaptation
Long Ma, Yuxin Feng, Yan Zhang et al.
Direct Alignment with Heterogeneous Preferences
Ali Shirali, Arash Nasr-Esfahany, Abdullah Alomar et al.
HumanRig: Learning Automatic Rigging for Humanoid Character in a Large Scale Dataset
Zedong Chu, Feng Xiong, Meiduo Liu et al.
On Measuring Long-Range Interactions in Graph Neural Networks
Jacob Bamberger, Benjamin Gutteridge, Scott le Roux et al.
Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds
Zhiyong Wang, Dongruo Zhou, John C.S. Lui et al.
Efficient Gaussian Splatting for Monocular Dynamic Scene Rendering via Sparse Time-Variant Attribute Modeling
Hanyang Kong, Xingyi Yang, Xinchao Wang
Quadratic Gaussian Splatting: High Quality Surface Reconstruction with Second-order Geometric Primitives
ziyu zhang, Binbin Huang, Hanqing Jiang et al.
Objective drives the consistency of representational similarity across datasets
Laure Ciernik, Lorenz Linhardt, Marco Morik et al.
Improving Parallel Program Performance with LLM Optimizers via Agent-System Interfaces
Anjiang Wei, Allen Nie, Thiago Teixeira et al.
Improved Off-policy Reinforcement Learning in Biological Sequence Design
Hyeonah Kim, Minsu Kim, Taeyoung Yun et al.
Joint Vision-Language Social Bias Removal for CLIP
Haoyu Zhang, Yangyang Guo, Mohan Kankanhalli
Decoupled Diffusion Sparks Adaptive Scene Generation
Yunsong Zhou, Naisheng Ye, William Ljungbergh et al.
ProtoOcc: Accurate, Efficient 3D Occupancy Prediction Using Dual Branch Encoder-Prototype Query Decoder
Jungho Kim, Changwon Kang, Dongyoung Lee et al.
Disentangled Motion Modeling for Video Frame Interpolation
Jaihyun Lew, Jooyoung Choi, Chaehun Shin et al.
Towards Understanding Fine-Tuning Mechanisms of LLMs via Circuit Analysis
Xu Wang, Yan Hu, Wenyu Du et al.
SnapMoGen: Human Motion Generation from Expressive Texts
chuan guo, Inwoo Hwang, Jian Wang et al.
Advancing Expert Specialization for Better MoE
Hongcan Guo, Haolang Lu, Guoshun Nan et al.
Sail into the Headwind: Alignment via Robust Rewards and Dynamic Labels against Reward Hacking
Paria Rashidinejad, Yuandong Tian
FlexiTex: Enhancing Texture Generation via Visual Guidance
Dadong Jiang, Xianghui Yang, Zibo Zhao et al.
Optimal Protocols for Continual Learning via Statistical Physics and Control Theory
Francesco Mori, Stefano Sarao Mannelli, Francesca Mignacco
Towards Transformer-Based Aligned Generation with Self-Coherence Guidance
Shulei Wang, Wang Lin, Hai Huang et al.
Rethinking Query-based Transformer for Continual Image Segmentation
Yuchen Zhu, Cheng Shi, Dingyou Wang et al.
Generalized and Efficient 2D Gaussian Splatting for Arbitrary-scale Super-Resolution
Du Chen, Liyi Chen, Zhengqiang ZHANG et al.
Improving Transformer World Models for Data-Efficient RL
Antoine Dedieu, Joseph Ortiz, Xinghua Lou et al.
LLMs Encode Harmfulness and Refusal Separately
Jiachen Zhao, Jing Huang, Zhengxuan Wu et al.
Diffusion On Syntax Trees For Program Synthesis
Shreyas Kapur, Erik Jenner, Stuart Russell
FedMIA: An Effective Membership Inference Attack Exploiting "All for One" Principle in Federated Learning
Gongxi Zhu, Donghao Li, Hanlin Gu et al.
Seeing Your Speech Style: A Novel Zero-Shot Identity-Disentanglement Face-based Voice Conversion
Yan Rong, Li Liu
MA-LoT: Model-Collaboration Lean-based Long Chain-of-Thought Reasoning enhances Formal Theorem Proving
Ruida Wang, Rui Pan, Yuxin Li et al.
MagCache: Fast Video Generation with Magnitude-Aware Cache
Zehong Ma, Longhui Wei, Feng Wang et al.
ZoRI: Towards Discriminative Zero-Shot Remote Sensing Instance Segmentation
Shiqi Huang, Shuting He, Bihan Wen
Towards Multiple Character Image Animation Through Enhancing Implicit Decoupling
Jingyun Xue, WANG HongFa, Qi Tian et al.
AvatarArtist: Open-Domain 4D Avatarization
Hongyu Liu, Xuan Wang, Ziyu Wan et al.
Confidence Estimation for Error Detection in Text-to-SQL Systems
Oleg Somov, Elena Tutubalina
CADDreamer: CAD Object Generation from Single-view Images
Yuan Li, Cheng Lin, Yuan Liu et al.
Quality-Driven Curation of Remote Sensing Vision-Language Data via Learned Scoring Models
Dilxat Muhtar, Enzhuo Zhang, Zhenshi Li et al.
Rethinking Artistic Copyright Infringements In the Era Of Text-to-Image Generative Models
Mazda Moayeri, Sriram Balasubramanian, Samyadeep Basu et al.
Scaffold-BPE: Enhancing Byte Pair Encoding for Large Language Models with Simple and Effective Scaffold Token Removal
Haoran Lian, Yizhe Xiong, Jianwei Niu et al.
BEVDiffuser: Plug-and-Play Diffusion Model for BEV Denoising with Ground-Truth Guidance
Xin Ye, Burhan Yaman, Sheng Cheng et al.
STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding
Zichen Liu, Kunlun Xu, Bing Su et al.
GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers
Shijie Ma, Yuying Ge, Teng Wang et al.
From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots
Yuxuan Wang, Ming Yang, Gang Ding et al.
Graph Neural Preconditioners for Iterative Solutions of Sparse Linear Systems
Jie Chen
V2C-CBM: Building Concept Bottlenecks with Vision-to-Concept Tokenizer
Hangzhou He, Lei Zhu, Xinliang Zhang et al.
QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache
Rishabh Tiwari, Haocheng Xi, Aditya Tomar et al.
MINERVA: Evaluating Complex Video Reasoning
Arsha Nagrani, Sachit Menon, Ahmet Iscen et al.
Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training
Mengru Wang, Xingyu Chen, Yue Wang et al.
First SFT, Second RL, Third UPT: Continual Improving Multi-Modal LLM Reasoning via Unsupervised Post-Training
Lai Wei, Yuting Li, Chen Wang et al.
LLaVA Needs More Knowledge: Retrieval Augmented Natural Language Generation with Knowledge Graph for Explaining Thoracic Pathologies
Ameer Hamza, Abdullah, Yong Hyun Ahn et al.
Scalable and Certifiable Graph Unlearning: Overcoming the Approximation Error Barrier
Lu Yi, Zhewei Wei
Towards a Comprehensive, Efficient and Promptable Anatomic Structure Segmentation Model Using 3D Whole-Body CT Scans
Heng Guo, Jianfeng Zhang, Jiaxing Huang et al.
CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation
Han He, Qianchu Liu, Lei Xu et al.
Surgical Workflow Recognition and Blocking Effectiveness Detection in Laparoscopic Liver Resection with Pringle Maneuver
Diandian Guo, Weixin Si, Zhixi Li et al.