Most Cited 2025 "hierarchical joint embedding" Papers
22,274 papers found • Page 51 of 112
Conference
ReferEverything: Towards Segmenting Everything We Can Speak of in Videos
Anurag Bagchi, Zhipeng Bao, Yu-Xiong Wang et al.
From Sequence to Structure: Uncovering Substructure Reasoning in Transformers
Xinnan Dai, Kai Yang, Jay Revolinsky et al.
Set Smoothness Unlocks Clarke Hyper-stationarity in Bilevel Optimization
He Chen, Jiajin Li, Anthony Man-Cho So
SemAlign3D: Semantic Correspondence between RGB-Images through Aligning 3D Object-Class Representations
Krispin Wandel, Hesheng Wang
Towards Efficient Foundation Model for Zero-shot Amodal Segmentation
Zhaochen Liu, Limeng Qiao, Xiangxiang Chu et al.
Uncertainty-Aware Gradient Stabilization for Small Object Detection
Huixin Sun, Yanjing Li, Linlin Yang et al.
VoteSplat: Hough Voting Gaussian Splatting for 3D Scene Understanding
Minchao Jiang, Shunyu Jia, Jiaming Gu et al.
MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE
Zongle Huang, Lei Zhu, ZongYuan Zhan et al.
FlySearch: Exploring how vision-language models explore
Adam Pardyl, Dominik Matuszek, Mateusz Przebieracz et al.
Parallelizing MCMC Across the Sequence Length
David Zoltowski, Skyler Wu, Xavier Gonzalez et al.
Turbocharging Gaussian Process Inference with Approximate Sketch-and-Project
Pratik Rathore, Zachary Frangella, Sachin Garg et al.
Diffusion Classifiers Understand Compositionality, but Conditions Apply
Yujin Jeong, Arnas Uselis, Seong Joon Oh et al.
ScenePainter: Semantically Consistent Perpetual 3D Scene Generation with Concept Relation Alignment
Chong Xia, Shengjun Zhang, Fangfu Liu et al.
CoLMDriver: LLM-based Negotiation Benefits Cooperative Autonomous Driving
Changxing Liu, Genjia Liu, Zijun Wang et al.
Neurons: Emulating the Human Visual Cortex Improves Fidelity and Interpretability in fMRI-to-Video Reconstruction
Haonan Wang, Qixiang ZHANG, Lehan Wang et al.
MMMG: A Massive, Multidisciplinary, Multi-Tier Generation Benchmark for Text-to-Image Reasoning
Yuxuan Luo, Ryan Yuan, Junwen Chen et al.
I Am Big, You Are Little; I Am Right, You Are Wrong
David A Kelly, Akchunya Chanchal, Nathan Blake
Learning (Approximately) Equivariant Networks via Constrained Optimization
Andrei Manolache, Luiz Chamon, Mathias Niepert
Learning to Normalize on the SPD Manifold under Bures-Wasserstein Geometry
Rui Wang, Shaocheng Jin, Ziheng Chen et al.
Sensitivity-Aware Efficient Fine-Tuning via Compact Dynamic-Rank Adaptation
Tianran Chen, Jiarui Chen, Baoquan Zhang et al.
Graph-KV: Breaking Sequence via Injecting Structural Biases into Large Language Models
Haoyu Wang, Peihao Wang, Mufei Li et al.
World-aware Planning Narratives Enhance Large Vision-Language Model Planner
Junhao Shi, Zhaoye Fei, Siyin Wang et al.
DiffDoctor: Diagnosing Image Diffusion Models Before Treating
Yiyang Wang, Xi Chen, Xiaogang Xu et al.
Understanding challenges to the interpretation of disaggregated evaluations of algorithmic fairness
Stephen Pfohl, Natalie Harris, Chirag Nagpal et al.
Less Attention is More: Prompt Transformer for Generalized Category Discovery
Wei Zhang, Baopeng Zhang, Zhu Teng et al.
Fine-Grained Image-Text Correspondence with Cost Aggregation for Open-Vocabulary Part Segmentation
Jiho Choi, Seonho Lee, Minhyun Lee et al.
SoftShadow: Leveraging Soft Masks for Penumbra-Aware Shadow Removal
Xinrui Wang, Lanqing Guo, Xiyu Wang et al.
DexFlyWheel: A Scalable and Self-improving Data Generation Framework for Dexterous Manipulation
Kefei Zhu, Fengshuo Bai, YuanHao Xiang et al.
Towards General Modality Translation with Contrastive and Predictive Latent Diffusion Bridge
Nimrod Berman, Omkar Joglekar, Eitan Kosman et al.
Discrete Diffusion Models: Novel Analysis and New Sampler Guarantees
Yuchen Liang, Yingbin Liang, Lifeng LAI et al.
Taming generative video models for zero-shot optical flow extraction
Seungwoo Kim, Khai Loong Aw, Klemen Kotar et al.
OmniCast: A Masked Latent Diffusion Model for Weather Forecasting Across Time Scales
Tung Nguyen, Tuan Pham, Troy Arcomano et al.
FROSS: Faster-Than-Real-Time Online 3D Semantic Scene Graph Generation from RGB-D Images
Hao-Yu Hou, Chun-Yi Lee, Motoharu Sonogashira et al.
Adaptive Batch-Wise Sample Scheduling for Direct Preference Optimization
Zixuan Huang, Yikun Ban, Lean Fu et al.
Polar Sparsity: High Throughput Batched LLM Inferencing with Scalable Contextual Sparsity
Susav Shrestha, Bradley Settlemyer, Nikoli Dryden et al.
Synthesizing Near-Boundary OOD Samples for Out-of-Distribution Detection
Jinglun Li, Kaixun Jiang, Zhaoyu Chen et al.
Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos
Sagnik Majumder, Tushar Nagarajan, Ziad Al-Halah et al.
Reproducible Vision-Language Models Meet Concepts Out of Pre-Training
Ziliang Chen, Xin Huang, Xiaoxuan Fan et al.
3D Equivariant Visuomotor Policy Learning via Spherical Projection
Boce Hu, Dian Wang, David Klee et al.
Visual Prompting for One-shot Controllable Video Editing without Inversion
Zhengbo Zhang, Yuxi Zhou, DUO PENG et al.
Position: AI Should Sense Better, Not Just Scale Bigger: Adaptive Sensing as a Paradigm Shift
Eunsu Baek, Keondo Park, Jeonggil Ko et al.
E-BATS: Efficient Backpropagation-Free Test-Time Adaptation for Speech Foundation Models
Jiaheng Dong, Hong Jia, Soumyajit Chatterjee et al.
LogoSP: Local-global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds
Zihui Zhang, Weisheng Dai, Hongtao Wen et al.
Floating No More: Object-Ground Reconstruction from a Single Image
Yunze Man, Yichen Sheng, Jianming Zhang et al.
Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation
Xiuyu Yang, Shuhan Tan, Philipp Kraehenbuehl
BrepGiff: Lightweight Generation of Complex B-rep with 3D GAT Diffusion
Hao Guo, Xiaoshui Huang, Hao jiacheng et al.
Improving the Euclidean Diffusion Generation of Manifold Data by Mitigating Score Function Singularity
Zichen Liu, Wei Zhang, Tiejun Li
SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World
Chen Chen, Zhirui Wang, Taowei Sheng et al.
Learning Phase Distortion with Selective State Space Models for Video Turbulence Mitigation
Xingguang Zhang, Nicholas M Chimitt, Xijun Wang et al.
AdaDrive: Self-Adaptive Slow-Fast System for Language-Grounded Autonomous Driving
Ruifei Zhang, Junlin Xie, Wei Zhang et al.
Align Your Rhythm: Generating Highly Aligned Dance Poses with Gating-Enhanced Rhythm-Aware Feature Representation
Congyi Fan, Jian Guan, Xuanjia Zhao et al.
FOCUS: Internal MLLM Representations for Efficient Fine-Grained Visual Question Answering
Liangyu Zhong, Fabio Philipp Rosenthal, Joachim Sicking et al.
Bridge the Gap: From Weak to Full Supervision for Temporal Action Localization with PseudoFormer
Ziyi Liu, Yangcen Liu
Cascaded Language Models for Cost-Effective Human–AI Decision-Making
Claudio Fanconi, Mihaela van der Schaar
Low Rank Gradients and Where to Find Them
Rishi Sonthalia, Michael Murray, Guido Montufar
CReFT-CAD: Boosting Orthographic Projection Reasoning for CAD via Reinforcement Fine-Tuning
Ke Niu, Zhuofan Chen, Haiyang Yu et al.
Aligning Transformers with Continuous Feedback via Energy Rank Alignment
Shriram Chennakesavalu, Frank Hu, Sebastian Ibarraran et al.
Positive2Negative: Breaking the Information-Lossy Barrier in Self-Supervised Single Image Denoising
Tong Li, Lizhi Wang, Zhiyuan Xu et al.
Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval
Mankeerat Sidhu, Hetarth Chopra, Ansel Blume et al.
ArcPro: Architectural Programs for Structured 3D Abstraction of Sparse Points
Qirui Huang, Runze Zhang, Kangjun Liu et al.
On the Generalization of Handwritten Text Recognition Models
Carlos Garrido-Munoz, Jorge Calvo-Zaragoza
ShapeEmbed: a self-supervised learning framework for 2D contour quantification
Anna Foix-Romero, Craig Russell, Alexander Krull et al.
FSHNet: Fully Sparse Hybrid Network for 3D Object Detection
Shuai Liu, Mingyue Cui, Boyang Li et al.
FilmComposer: LLM-Driven Music Production for Silent Film Clips
Zhifeng Xie, Qile He, Youjia Zhu et al.
OmniSplat: Taming Feed-Forward 3D Gaussian Splatting for Omnidirectional Images with Editable Capabilities
Suyoung Lee, JAEYOUNG CHUNG, Kihoon Kim et al.
VQToken: Neural Discrete Token Representation Learning for Extreme Token Reduction in Video Large Language Models
Haichao Zhang, Yun Fu
Exploiting Diffusion Prior for Task-driven Image Restoration
Jaeha Kim, Junghun Oh, Kyoung Mu Lee
Integrating Visual Interpretation and Linguistic Reasoning for Geometric Problem Solving
Zixian Guo, Ming Liu, Qilong Wang et al.
Traversing Distortion-Perception Tradeoff using a Single Score-Based Generative Model
Yuhan Wang, Suzhi Bi, Ying-Jun Angela Zhang et al.
Beyond Sight: Towards Cognitive Alignment in LVLM via Enriched Visual Knowledge
Yaqi Zhao, Yuanyang Yin, Lin Li et al.
Seeing in the Dark: Benchmarking Egocentric 3D Vision with the Oxford Day-and-Night Dataset
Zirui Wang, Wenjing Bian, Xinghui Li et al.
LLaFEA: Frame-Event Complementary Fusion for Fine-Grained Spatiotemporal Understanding in LMMs
Hanyu Zhou, Gim Hee Lee
Unlocking hidden biomolecular conformational landscapes in diffusion models at inference time
Daniel D. Richman, Jessica Karaguesian, Carl-Mikael Suomivuori et al.
Tiled Diffusion
Or Madar, Ohad Fried
seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models
Hafez Ghaemi, Eilif B. Muller, Shahab Bakhtiari
Differentiable Inverse Rendering with Interpretable Basis BRDFs
Hoon-Gyu Chung, Seokjun Choi, Seung-Hwan Baek
R-KV: Redundancy-aware KV Cache Compression for Reasoning Models
Zefan Cai, Wen Xiao, Hanshi Sun et al.
Discretized Gaussian Representation for Tomographic Reconstruction
Shaokai Wu, Yuxiang Lu, Yapan Guo et al.
Volume Tells: Dual Cycle-Consistent Diffusion for 3D Fluorescence Microscopy De-noising and Super-Resolution
ZELIN LI, Chenwei Wang, Zhaoke Huang et al.
Dark-ISP: Enhancing RAW Image Processing for Low-Light Object Detection
Jiasheng Guo, Xin Gao, Yuxiang Yan et al.
DriveScape: High-Resolution Driving Video Generation by Multi-View Feature Fusion
Wei Wu, Xi Guo, Weixuan TANG et al.
XIFBench: Evaluating Large Language Models on Multilingual Instruction Following
Zhenyu Li, Kehai Chen, Yunfei Long et al.
Asymptotic Theory of Geometric and Adaptive $k$-Means Clustering
Adam Quinn Jaffe
Memory-Efficient 4-bit Preconditioned Stochastic Optimization
Jingyang Li, Kuangyu Ding, Kim-chuan Toh et al.
BATCLIP: Bimodal Online Test-Time Adaptation for CLIP
Sarthak Kumar Maharana, Baoming Zhang, Leonid Karlinsky et al.
HumanSAM: Classifying Human-centric Forgery Videos in Human Spatial, Appearance, and Motion Anomaly
Chang Liu, Yunfan Ye, Fan Zhang et al.
PoseTraj: Pose-Aware Trajectory Control in Video Diffusion
longbin ji, Lei Zhong, Pengfei Wei et al.
ICPC-Eval: Probing the Frontiers of LLM Reasoning with Competitive Programming Contests
Shiyi Xu, Hu Yiwen, Yingqian Min et al.
Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting
Jiaxin Huang, Sheng Miao, Bangbang Yang et al.
VideoAds for Fast-Paced Video Understanding
Zheyuan Zhang, Wanying Dou, Linkai Peng et al.
RLZero: Direct Policy Inference from Language Without In-Domain Supervision
Harshit Sushil Sikchi, Siddhant Agarwal, Pranaya Jajoo et al.
Strassen Attention, Split VC Dimension and Compositionality in Transformers
Alexander Kozachinskiy, Felipe Urrutia, Hector Orellana et al.
Towards a Golden Classifier-Free Guidance Path via Foresight Fixed Point Iterations
Kaibo Wang, Jianda Mao, Tong Wu et al.
Balanced Rate-Distortion Optimization in Learned Image Compression
Yichi Zhang, Zhihao Duan, Yuning Huang et al.
More of the Same: Persistent Representational Harms Under Increased Representation
Jennifer Mickel, Maria De-Arteaga, Liu Leqi et al.
Improve Representation for Imbalanced Regression through Geometric Constraints
Zijian Dong, Yilei Wu, Chongyao Chen et al.
JailBound: Jailbreaking Internal Safety Boundaries of Vision-Language Models
Jiaxin Song, Yixu Wang, Jie Li et al.
Leveraging SD Map to Augment HD Map-based Trajectory Prediction
Zhiwei Dong, Ran Ding, Wei Li et al.
VODiff: Controlling Object Visibility Order in Text-to-Image Generation
Dong Liang, Jinyuan Jia, Yuhao Liu et al.
DiMPLe - Disentangled Multi-Modal Prompt Learning: Enhancing Out-Of-Distribution Alignment with Invariant and Spurious Feature Separation
Umaima Rahman, Mohammad Yaqub, Dwarikanath Mahapatra
TCM-Ladder: A Benchmark for Multimodal Question Answering on Traditional Chinese Medicine
Jiacheng Xie, Yang Yu, Ziyang Zhang et al.
Ask a Strong LLM Judge when Your Reward Model is Uncertain
Zhenghao Xu, Qin Lu, Qingru Zhang et al.
SAMA: Towards Multi-Turn Referential Grounded Video Chat with Large Language Models
Ye Sun, Hao Zhang, Henghui Ding et al.
FG-OrIU: Towards Better Forgetting via Feature-Gradient Orthogonality for Incremental Unlearning
qian feng, Jiahang Tu, Mintong Kang et al.
ICP: Immediate Compensation Pruning for Mid-to-high Sparsity
Xin Luo, Fu Xueming, Zihang Jiang et al.
Sufficient Invariant Learning for Distribution Shift
Taero Kim, Subeen Park, Sungjun Lim et al.
Inpaint4Drag: Repurposing Inpainting Models for Drag-Based Image Editing via Bidirectional Warping
Jingyi Lu, Kai Han
Player-Centric Multimodal Prompt Generation for Large Language Model Based Identity-Aware Basketball Video Captioning
Zeyu Xi, Haoying Sun, Yaofei Wu et al.
ExCap3D: Expressive 3D Scene Understanding via Object Captioning with Varying Detail
Chandan Yeshwanth, David Rozenberszki, Angela Dai
AeSPa : Attention-guided Self-supervised Parallel Imaging for MRI Reconstruction
Jinho Joo, Hyeseong Kim, Hyeyeon Won et al.
Alignment, Mining and Fusion: Representation Alignment with Hard Negative Mining and Selective Knowledge Fusion for Medical Visual Question Answering
Yuanhao Zou, Zhaozheng Yin
4DGCPro: Efficient Hierarchical 4D Gaussian Compression for Progressive Volumetric Video Streaming
Zihan Zheng, Zhenlong Wu, Houqiang Zhong et al.
Multi-modal contrastive learning adapts to intrinsic dimensions of shared latent variables
Yu Gui, Cong Ma, Zongming Ma
Cognitive Mirrors: Exploring the Diverse Functional Roles of Attention Heads in LLM Reasoning
Xueqi Ma, Jun Wang, Yanbei Jiang et al.
Your Scale Factors are My Weapon: Targeted Bit-Flip Attacks on Vision Transformers via Scale Factor Manipulation
Jialai Wang, Yuxiao Wu, Weiye Xu et al.
Acquire and then Adapt: Squeezing out Text-to-Image Model for Image Restoration
Junyuan Deng, Xinyi Wu, Yongxing Yang et al.
From Panels to Prose: Generating Literary Narratives from Comics
Ragav Sachdeva, Andrew Zisserman
HairCUP: Hair Compositional Universal Prior for 3D Gaussian Avatars
Byungjun Kim, Shunsuke Saito, Giljoo Nam et al.
Flow-NeRF: Joint Learning of Geometry, Poses, and Dense Flow within Unified Neural Representations
Xunzhi Zheng, Dan Xu
Personalized Safety in LLMs: A Benchmark and A Planning-Based Agent Approach
Yuchen Wu, Edward Sun, Kaijie Zhu et al.
Wavelet and Prototype Augmented Query-based Transformer for Pixel-level Surface Defect Detection
Feng Yan, Xiaoheng Jiang, Yang Lu et al.
Mesh Mamba: A Unified State Space Model for Saliency Prediction in Non-Textured and Textured Meshes
Kaiwei Zhang, Dandan Zhu, Xiongkuo Min et al.
Effortless Active Labeling for Long-Term Test-Time Adaptation
Guowei Wang, Changxing Ding
FREE-Merging: Fourier Transform for Efficient Model Merging
Shenghe Zheng, Hongzhi Wang
Amodal Depth Anything: Amodal Depth Estimation in the Wild
Zhenyu Li, Mykola Lavreniuk, Jian Shi et al.
Martian World Model: Controllable Video Synthesis with Physically Accurate 3D Reconstructions
Longfei Li, Zhiwen Fan, Wenyan Cong et al.
Visual Modality Prompt for Adapting Vision-Language Object Detectors
Heitor Rapela Medeiros, Atif Belal, Srikanth Muralidharan et al.
CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image
Jingshun Huang, Haitao Lin, Tianyu Wang et al.
Colors See Colors Ignore: Clothes Changing ReID with Color Disentanglement
Priyank Pathak, Yogesh Rawat
OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography
Li Caoshuo, Zengmao Ding, Xiaobin Hu et al.
Transformed Low-rank Adaptation via Tensor Decomposition and Its Applications to Text-to-image Models
Zerui Tao, Yuhta Takida, Naoki Murata et al.
Common Task Framework For a Critical Evaluation of Scientific Machine Learning Algorithms
Philippe Wyder, Judah Goldfeder, Alexey Yermakov et al.
COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation
Siqi Zhang, Yanyuan Qiao, Qunbo Wang et al.
Tracing Back the Malicious Clients in Poisoning Attacks to Federated Learning
Yuqi Jia, Minghong Fang, Hongbin Liu et al.
Splat-LOAM: Gaussian Splatting LiDAR Odometry and Mapping
Emanuele Giacomini, Luca Di Giammarino, Lorenzo De Rebotti et al.
Decoder Gradient Shield: Provable and High-Fidelity Prevention of Gradient-Based Box-Free Watermark Removal
Haonan An, Guang Hua, Zhengru Fang et al.
Associative Transformer
Yuwei Sun, Hideya Ochiai, Zhirong Wu et al.
From Head to Tail: Efficient Black-box Model Inversion Attack via Long-tailed Learning
Ziang Li, Hongguang Zhang, Juan Wang et al.
Multi-Scale Neighborhood Occupancy Masked Autoencoder for Self-Supervised Learning in LiDAR Point Clouds
Mohamed Abdelsamad, Michael Ulrich, Claudius Glaeser et al.
SparseDiT: Token Sparsification for Efficient Diffusion Transformer
Shuning Chang, Pichao WANG, Jiasheng Tang et al.
Unified Algorithms for RL with Decision-Estimation Coefficients: PAC, Reward-Free, Preference-Based Learning, and Beyond
Fan Chen, Song Mei, Yu Bai
AION-1: Omnimodal Foundation Model for Astronomical Sciences
Liam Parker, Francois Lanusse, Jeff Shen et al.
AutoJudge: Judge Decoding Without Manual Annotation
Roman Garipov, Fedor Velikonivtsev, Ivan Ermakov et al.
Can't Slow Me Down: Learning Robust and Hardware-Adaptive Object Detectors against Latency Attacks for Edge Devices
Tianyi Wang, Zichen Wang, Cong Wang et al.
Joint Diffusion Models in Continual Learning
Paweł Skierś, Kamil Deja
Semantic Causality-Aware Vision-Based 3D Occupancy Prediction
Dubing Chen, Huan Zheng, Yucheng Zhou et al.
BoltzNCE: Learning likelihoods for Boltzmann Generation with Stochastic Interpolants and Noise Contrastive Estimation
Rishal Aggarwal, Jacky Chen, Nicholas Boffi et al.
Image Referenced Sketch Colorization Based on Animation Creation Workflow
Dingkun Yan, Xinrui Wang, Zhuoru Li et al.
TailedCore: Few-Shot Sampling for Unsupervised Long-Tail Noisy Anomaly Detection
Yoon Gyo Jung, Jaewoo Park, Jaeho Yoon et al.
GPAvatar: High-fidelity Head Avatars by Learning Efficient Gaussian Projections
Weiqi Feng, Dong Han, Zekang Zhou et al.
Meta-Learning Objectives for Preference Optimization
Carlo Alfano, Silvia Sapora, Jakob Foerster et al.
STRAP: Spatio-Temporal Pattern Retrieval for Out-of-Distribution Generalization
Haoyu Zhang, WentaoZhang, Hao Miao et al.
Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning
Yurun Yuan, Fan Chen, Zeyu Jia et al.
Gyro-based Neural Single Image Deblurring
Heemin Yang, Jaesung Rim, Seungyong Lee et al.
Composing Global Solutions to Reasoning Tasks via Algebraic Objects in Neural Nets
Yuandong Tian
TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation
Jiaben Chen, Zixin Wang, AILING ZENG et al.
RoboTron-Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction
Yufeng Zhong, Chengjian Feng, Feng yan et al.
Learning Class Prototypes for Unified Sparse-Supervised 3D Object Detection
Yun Zhu, Le Hui, Hang Yang et al.
SkyLadder: Better and Faster Pretraining via Context Window Scheduling
Tongyao Zhu, Qian Liu, Haonan Wang et al.
CuMPerLay: Learning Cubical Multiparameter Persistence Vectorizations
Caner Korkmaz, Brighton Nuwagira, Baris Coskunuzer et al.
Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval
Dohwan Ko, Ji Soo Lee, Minhyuk Choi et al.
Wav2Sem: Plug-and-Play Audio Semantic Decoupling for 3D Speech-Driven Facial Animation
Hao Li, Ju Dai, Xin Zhao et al.
UniPhy: Learning a Unified Constitutive Model for Inverse Physics Simulation
Himangi Mittal, Peiye Zhuang, Hsin-Ying Lee et al.
Efficient Adaptation of Pre-trained Vision Transformer underpinned by Approximately Orthogonal Fine-Tuning Strategy
Yiting Yang, Hao Luo, Yuan Sun et al.
Value Gradient Guidance for Flow Matching Alignment
Zhen Liu, Tim Xiao, Carles Domingo i Enrich et al.
Vision-Language Embodiment for Monocular Depth Estimation
Jinchang Zhang, Guoyu Lu
Reinforced Context Order Recovery for Adaptive Reasoning and Planning
Long Ma, Fangwei Zhong, Yizhou Wang
ImViD: Immersive Volumetric Videos for Enhanced VR Engagement
Zhengxian Yang, Shi Pan, Shengqi Wang et al.
Seeing What Matters: Generalizable AI-generated Video Detection with Forensic-Oriented Augmentation
Riccardo Corvi, Davide Cozzolino, Ekta Prashnani et al.
Image as an IMU: Estimating Camera Motion from a Single Motion-Blurred Image
Jerred Chen, Ronald Clark
Task Vector Quantization for Memory-Efficient Model Merging
Youngeun Kim, Seunghwan Lee, Aecheon Jung et al.
Eyes Wide Open: Ego Proactive Video-LLM for Streaming Video
Xueyang Yu, Cheng Shi, Yang Wang et al.
Simpler Diffusion: 1.5 FID on ImageNet512 with Pixel-space Diffusion
Emiel Hoogeboom, Thomas Mensink, Jonathan Heek et al.
OmniVTON: Training-Free Universal Virtual Try-On
Zhaotong Yang, Yuhui Li, Shengfeng He et al.
Anti-Aliased 2D Gaussian Splatting
Mae Younes, Adnane Boukhayma
EVOREFUSE: Evolutionary Prompt Optimization for Evaluation and Mitigation of LLM Over-Refusal to Pseudo-Malicious Instructions
Xiaorui Wu, Fei Li, Xiaofeng Mao et al.
Reanimating Images using Neural Representations of Dynamic Stimuli
Jacob Yeung, Andrew Luo, Gabriel Sarch et al.
Probably Approximately Precision and Recall Learning
Lee Cohen, Yishay Mansour, Shay Moran et al.
Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion
Aleksandar Jevtić, Christoph Reich, Felix Wimbauer et al.
Auto-Connect: Connectivity-Preserving RigFormer with Direct Preference Optimization
jingfeng Guo, Jian Liu, Jinnan Chen et al.
HumanMM: Global Human Motion Recovery from Multi-shot Videos
Yuhong Zhang, Guanlin Wu, Ling-Hao Chen et al.
PI-HMR: Towards Robust In-bed Temporal Human Shape Reconstruction with Contact Pressure Sensing
Ziyu Wu, Yufan Xiong, Mengting Niu et al.
KL Penalty Control via Perturbation for Direct Preference Optimization
Sangkyu Lee, Janghoon Han, Hosung Song et al.
BMMR: A Large-Scale Bilingual Multimodal Multi-Discipline Reasoning Dataset
Zhiheng Xi, Guanyu Li, Yutao Fan et al.
Generalized Linear Bandits: Almost Optimal Regret with One-Pass Update
Yu-Jie Zhang, Sheng-An Xu, Peng Zhao et al.
TokensGen: Harnessing Condensed Tokens for Long Video Generation
Wenqi Ouyang, Zeqi Xiao, Danni Yang et al.
PGC: Physics-Based Gaussian Cloth from a Single Pose
Michelle Guo, Matt Jen-Yuan Chiang, Igor Santesteban et al.
Flat Channels to Infinity in Neural Loss Landscapes
Flavio Martinelli, Alexander van Meegen, Berfin Simsek et al.
BEAST: Efficient Tokenization of B-Splines Encoded Action Sequences for Imitation Learning
Hongyi Zhou, Weiran Liao, Xi Huang et al.
A Unified Framework for the Transportability of Population-Level Causal Measures
Ahmed Boughdiri, Clément Berenfeld, Julie Josse et al.
Z-Magic: Zero-shot Multiple Attributes Guided Image Creator
Yingying Deng, Xiangyu He, Fan Tang et al.
Joint Self-Supervised Video Alignment and Action Segmentation
Ali Shah Ali, Syed Ahmed Mahmood, Mubin Saeed et al.
Learning to Better Search with Language Models via Guided Reinforced Self-Training
Seungyong Moon, Bumsoo Park, Hyun Oh Song
NeRF Is a Valuable Assistant for 3D Gaussian Splatting
Shuangkang Fang, I-Chao Shen, Takeo Igarashi et al.
Precise Information Control in Long-Form Text Generation
Jacqueline He, Howard Yen, Margaret Li et al.
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding
Han Xiao, yina xie, Guanxin tan et al.
Mitigating Ambiguities in 3D Classification with Gaussian Splatting
Ruiqi Zhang, Hao Zhu, Jingyi Zhao et al.
PBCAT: Patch-Based Composite Adversarial Training against Physically Realizable Attacks on Object Detection
Xiao Li, Yiming Zhu, Yifan Huang et al.
StickMotion: Generating 3D Human Motions by Drawing a Stickman
Tao Wang, Zhihua Wu, Qiaozhi He et al.
Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning
Wang Yang, Zirui Liu, Hongye Jin et al.