Most Cited 2025 "hierarchical joint embedding" Papers
22,274 papers found • Page 50 of 112
Conference
SC-Captioner: Improving Image Captioning with Self-Correction by Reinforcement Learning
Lin Zhang, Xianfang Zeng, Kangcong Li et al.
PlugMark: A Plug-in Zero-Watermarking Framework for Diffusion Models
Pengzhen Chen, Yanwei Liu, Xiaoyan Gu et al.
Implicit Bias Injection Attacks against Text-to-Image Diffusion Models
Huayang Huang, Xiangye Jin, Jiaxu Miao et al.
Zero-Shot Blind-spot Image Denoising via Implicit Neural Sampling
Yuhui Quan, Tianxiang Zheng, Zhiyuan Ma et al.
Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation
Yuan Gan, Jiaxu Miao, Yunze Wang et al.
A Tale of Two Classes: Adapting Supervised Contrastive Learning to Binary Imbalanced Datasets
David Mildenberger, Paul Hager, Daniel Rueckert et al.
CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models
Quang-Binh Nguyen, Minh Luu, Quang Nguyen et al.
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
Kevin Qinghong Lin, Mike Zheng Shou
Global Convergence for Average Reward Constrained MDPs with Primal-Dual Actor Critic Algorithm
Yang Xu, Swetha Ganesh, Washim Mondal et al.
Efficient Preference-Based Reinforcement Learning: Randomized Exploration meets Experimental Design
Andreas Schlaginhaufen, Reda Ouhamma, Maryam Kamgarpour
UNICL-SAM: Uncertainty-Driven In-Context Segmentation with Part Prototype Discovery
Dianmo Sheng, Dongdong Chen, Zhentao Tan et al.
PSA-SSL: Pose and Size-aware Self-Supervised Learning on LiDAR Point Clouds
Barza Nisar, Steven L. Waslander
ARMesh: Autoregressive Mesh Generation via Next-Level-of-Detail Prediction
Jiabao Lei, Kewei Shi, Zhihao Liang et al.
Learning from positive and unlabeled examples -Finite size sample bounds
Farnam Mansouri, Shai Ben-David
Accurate and Efficient Low-Rank Model Merging in Core Space
Aniello Panariello, Daniel Marczak, Simone Magistri et al.
Multi-focal Conditioned Latent Diffusion for Person Image Synthesis
Jiaqi Liu, Jichao Zhang, Paolo Rota et al.
Towards Fully FP8 GEMM LLM Training at Scale
Alejandro Hernández Cano, Dhia Garbaya, Imanol Schlag et al.
Decoupling Training-Free Guided Diffusion by ADMM
Youyuan Zhang, Zehua Liu, Zenan Li et al.
Guiding LLM Decision-Making with Fairness Reward Models
Zara Hall, Melanie Subbiah, Thomas Zollo et al.
HuMoCon: Concept Discovery for Human Motion Understanding
Qihang Fang, Chengcheng Tang, Bugra Tekin et al.
Enhancing Image Restoration Transformer via Adaptive Translation Equivariance
JiaKui Hu, Zhengjian Yao, Lujia Jin et al.
Bridging the Gap between Gaussian Diffusion Models and Universal Quantization for Image Compression
Lucas Relic, Roberto Azevedo, Yang Zhang et al.
GradMetaNet: An Equivariant Architecture for Learning on Gradients
Yoav Gelberg, Yam Eitan, Aviv Navon et al.
UrbanCAD: Towards Highly Controllable and Photorealistic 3D Vehicles for Urban Scene Simulation
Yichong Lu, Yichi Cai, Shangzhan Zhang et al.
DERD-Net: Learning Depth from Event-based Ray Densities
Diego de Oliveira Hitzges, Suman Ghosh, Guillermo Gallego
ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling
Jinhyung Park, Javier Romero, Shunsuke Saito et al.
Second-order Optimization under Heavy-Tailed Noise: Hessian Clipping and Sample Complexity Limits
Abdurakhmon Sadiev, Peter Richtarik, Ilyas Fatkhullin
MMAIF: Multi-task and Multi-degradation All-in-One for Image Fusion with Language Guidance
Zihan Cao, Yu Zhong, Ziqi Wang et al.
Quantifying Cross-Modality Memorization in Vision-Language Models
Yuxin Wen, Yangsibo Huang, Tom Goldstein et al.
FADE: Frequency-Aware Diffusion Model Factorization for Video Editing
Yixuan Zhu, Haolin Wang, Shilin Ma et al.
CineTechBench: A Benchmark for Cinematographic Technique Understanding and Generation
Xinran Wang, Songyu Xu, Shan Xiangxuan et al.
Supervising Sound Localization by In-the-wild Egomotion
Anna Min, Ziyang Chen, Hang Zhao et al.
Beyond Isolated Words: Diffusion Brush for Handwritten Text-Line Generation
Gang Dai, Yifan Zhang, Yutao Qin et al.
Towards Consistent Multi-Task Learning: Unlocking the Potential of Task-Specific Parameters
Xiaohan Qin, Xiaoxing Wang, Junchi Yan
Recognition-Synergistic Scene Text Editing
Zhengyao Fang, Pengyuan Lyu, Jingjing Wu et al.
Teaching VLMs to Localize Specific Objects from In-context Examples
Sivan Doveh, Nimrod Shabtay, Eli Schwartz et al.
Articulated Kinematics Distillation from Video Diffusion Models
Xuan Li, Qianli Ma, Tsung-Yi Lin et al.
A Differentiable Wave Optics Model for End-to-End Computational Imaging System Optimization
Chi-Jui Ho, Yash Belhe, Steve Rotenberg et al.
HARDMath2: A Benchmark for Applied Mathematics Built by Students as Part of a Graduate Class
James Roggeveen, Erik Wang, David Ettel et al.
SMTPD: A New Benchmark for Temporal Prediction of Social Media Popularity
Yijie Xu, Bolun Zheng, Wei Zhu et al.
Next Semantic Scale Prediction via Hierarchical Diffusion Language Models
Cai Zhou, Chenyu Wang, Dinghuai Zhang et al.
RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression
Uri Gadot, Shie Mannor, Assaf Shocher et al.
On Large Multimodal Models as Open-World Image Classifiers
Alessandro Conti, Massimiliano Mancini, Enrico Fini et al.
Disentangled Clothed Avatar Generation with Layered Representation
Weitian Zhang, Yichao Yan, Sijing Wu et al.
Bring Your Rear Cameras for Egocentric 3D Human Pose Estimation
HIroyasu Akada, Jian Wang, Vladislav Golyanik et al.
Foresight in Motion: Reinforcing Trajectory Prediction with Reward Heuristics
Muleilan Pei, Shaoshuai Shi, Xuesong Chen et al.
Stackelberg Self-Annotation: A Robust Approach to Data-Efficient LLM Alignment
Chu Xu, Zhixin Zhang, Tianyu Jia et al.
SemanticDraw: Towards Real-Time Interactive Content Creation from Image Diffusion Models
Jaerin Lee, Daniel Jung, Kanggeon Lee et al.
Efficient Parametric SVD of Koopman Operator for Stochastic Dynamical Systems
Minchan Jeong, Jongha (Jon) Ryu, Se-Young Yun et al.
Bridge Frame and Event: Common Spatiotemporal Fusion for High-Dynamic Scene Optical Flow
Hanyu Zhou, Haonan Wang, Haoyue Liu et al.
Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection
wenqiao Li, Yao Gu, Xintao Chen et al.
Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study
Zhengyu Hu, Jianxun Lian, Zheyuan Xiao et al.
Hierarchical-aware Orthogonal Disentanglement Framework for Fine-grained Skeleton-based Action Recognition
Haochen Chang, Pengfei Ren, Haoyang Zhang et al.
MonarchAttention: Zero-Shot Conversion to Fast, Hardware-Aware Structured Attention
Can Yaras, Alec Xu, Pierre Abillama et al.
TimeExpert: An Expert-Guided Video LLM for Video Temporal Grounding
Zuhao Yang, Yingchen Yu, Yunqing Zhao et al.
Beyond Spatial Frequency: Pixel-wise Temporal Frequency-based Deepfake Video Detection
Taehoon Kim, Jongwook Choi, Yonghyun Jeong et al.
Scene-agnostic Pose Regression for Visual Localization
Junwei Zheng, Ruiping Liu, Yufan Chen et al.
Hierarchical Features Matter: A Deep Exploration of Progressive Parameterization Method for Dataset Distillation
Xinhao Zhong, Hao Fang, Bin Chen et al.
Kestrel: 3D Multimodal LLM for Part-Aware Grounded Description
Mahmoud Ahmed, Junjie Fei, Jian Ding et al.
Adaptive Frontier Exploration on Graphs with Applications to Network-Based Disease Testing
XianJun, Davin Choo, Yuqi Pan, Tonghan Wang et al.
4D-Fly: Fast 4D Reconstruction from a Single Monocular Video
Diankun Wu, Fangfu Liu, Yi-Hsin Hung et al.
Mixture-of-Experts Meets In-Context Reinforcement Learning
Wenhao Wu, Fuhong Liu, Haoru Li et al.
Whose View of Safety? A Deep DIVE Dataset for Pluralistic Alignment of Text-to-Image Models
Charvi Rastogi, Tian Huey Teh, Pushkar Mishra et al.
Trans-EnV: A Framework for Evaluating the Linguistic Robustness of LLMs Against English Varieties
Jiyoung Lee, Seungho Kim, Jieun Han et al.
ControlFace: Harnessing Facial Parametric Control for Face Rigging
Wooseok Jang, Youngjun Hong, Geonho Cha et al.
A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision
Chensheng Peng, Ido Sobol, Masayoshi Tomizuka et al.
LOD-GS: Achieving Levels of Detail using Scalable Gaussian Soup
Jianxiong Shen, Yue Qian, Xiaohang Zhan
Towards Better & Faster Autoregressive Image Generation: From the Perspective of Entropy
Xiaoxiao Ma, Feng Zhao, Pengyang Ling et al.
Enhancing Diversity for Data-free Quantization
Kai Zhao, zhihao zhuang, Miao Zhang et al.
CALICO: Part-Focused Semantic Co-Segmentation with Large Vision-Language Models
Kiet A. Nguyen, Adheesh Juvekar, Tianjiao Yu et al.
$\boldsymbol{\lambda}$-Orthogonality Regularization for Compatible Representation Learning
Simone Ricci, Niccolò Biondi, Federico Pernici et al.
Recurrent Feature Mining and Keypoint Mixup Padding for Category-Agnostic Pose Estimation
Junjie Chen, Weilong Chen, Yifan Zuo et al.
GPO: Learning from Critical Steps to Improve LLM Reasoning
Jiahao Yu, Zelei Cheng, Xian Wu et al.
Unifying Text Semantics and Graph Structures for Temporal Text-attributed Graphs with Large Language Models
Siwei Zhang, Yun Xiong, Yateng Tang et al.
Differentiation Through Black-Box Quadratic Programming Solvers
Connor Magoon, Fengyu Yang, Noam Aigerman et al.
PRVQL: Progressive Knowledge-guided Refinement for Robust Egocentric Visual Query Localization
Bing Fan, Yunhe Feng, Yapeng Tian et al.
IllumiCraft: Unified Geometry and Illumination Diffusion for Controllable Video Generation
Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai et al.
DV-Matcher: Deformation-based Non-rigid Point Cloud Matching Guided by Pre-trained Visual Features
Zhangquan Chen, Puhua Jiang, Ruqi Huang
Anatomical Consistency and Adaptive Prior-informed Transformation for Multi-contrast MR Image Synthesis via Diffusion Model
Yejee Shin, Yeeun Lee, Hanbyol Jang et al.
F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration
Lu Liu, Huiyu Duan, Qiang Hu et al.
Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval
WonJun Moon, Cheol-Ho Cho, Woojin Jun et al.
LocalDyGS: Multi-view Global Dynamic Scene Modeling via Adaptive Local Implicit Feature Decoupling
Jiahao Wu, Rui Peng, Jianbo Jiao et al.
Towards a Universal 3D Medical Multi-modality Generalization via Learning Personalized Invariant Representation
Zhaorui Tan, Xi Yang, Tan Pan et al.
Seeing the Abstract: Translating the Abstract Language for Vision Language Models
Davide Talon, Federico Girella, Ziyue Liu et al.
Adaptive Distraction: Probing LLM Contextual Robustness with Automated Tree Search
Yanbo Wang, Zixiang Xu, Yue Huang et al.
A Lightweight UDF Learning Framework for 3D Reconstruction Based on Local Shape Functions
Jiangbei Hu, Yanggeng Li, Fei Hou et al.
Born a Transformer -- Always a Transformer? On the Effect of Pretraining on Architectural Abilities
Mayank Jobanputra, Yana Veitsman, Yash Sarrof et al.
MoPFormer: Motion-Primitive Transformer for Wearable-Sensor Activity Recognition
Hao Zhang, Zhan Zhuang, Xuehao Wang et al.
Heavy Labels Out! Dataset Distillation with Label Space Lightening
Ruonan Yu, Songhua Liu, Zigeng Chen et al.
AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers
Jiazhi Guan, Kaisiyuan Wang, Zhiliang Xu et al.
ResQ: A Novel Framework to Implement Residual Neural Networks on Analog Rydberg Atom Quantum Computers
Nicholas DiBrita, Jason Han, Tirthak Patel
Diffusion Models and the Manifold Hypothesis: Log-Domain Smoothing is Geometry Adaptive
Tyler Farghly, Peter Potaptchik, Samuel Howard et al.
Human-assisted Robotic Policy Refinement via Action Preference Optimization
Wenke Xia, Yichu Yang, Hongtao Wu et al.
LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion
Fangfu Liu, Hao Li, Jiawei Chi et al.
RivuletMLP: An MLP-based Architecture for Efficient Compressed Video Quality Enhancement
Gang He, Weiran Wang, Guancheng Quan et al.
Taming Flow Matching with Unbalanced Optimal Transport into Fast Pansharpening
Zihan Cao, Yu Zhong, Liang-Jian Deng
Conformal Information Pursuit for Interactively Guiding Large Language Models
Kwan Ho Ryan Chan, Yuyan Ge, Edgar Dobriban et al.
Offline RL by Reward-Weighted Fine-Tuning for Conversation Optimization
Subhojyoti Mukherjee, Viet Lai, Raghavendra Addanki et al.
Revisiting Pool-based Prompt Learning for Few-shot Class-incremental Learning
Yongwei Jiang, Yixiong Zou, Yuhua Li et al.
INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling
Xin Dong, Shichao Dong, Jin Wang et al.
Fair Generation without Unfair Distortions: Debiasing Text-to-Image Generation with Entanglement-Free Attention
Jeonghoon Park, Juyoung Lee, Chaeyeon Chung et al.
Extending Foundational Monocular Depth Estimators to Fisheye Cameras with Calibration Tokens
Suchisrit Gangopadhyay, Jung Hee Kim, Xien Chen et al.
MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data
Zifan Wang, Ziqing Chen, Junyu Chen et al.
Atomic Diffusion Models for Small Molecule Structure Elucidation from NMR Spectra
Ziyu Xiong, Yichi Zhang, Foyez Alauddin et al.
Hierarchical Material Recognition from Local Appearance
Matthew Beveridge, Shree Nayar
Ravan: Multi-Head Low-Rank Adaptation for Federated Fine-Tuning
Arian Raje, Baris Askin, Divyansh Jhunjhunwala et al.
A machine learning approach that beats Rubik's cubes
Alexander Chervov, Kirill Khoruzhii, Nikita Bukhal et al.
Predictability Enables Parallelization of Nonlinear State Space Models
Xavier Gonzalez, Leo Kozachkov, David Zoltowski et al.
Efficient Rectified Flow for Image Fusion
Zirui Wang, Jiayi Zhang, Tianwei Guan et al.
Second-Order Convergence in Private Stochastic Non-Convex Optimization
Youming Tao, Zuyuan Zhang, Dongxiao Yu et al.
RvLLM: LLM Runtime Verification with Domain Knowledge
Yedi Zhang, Sun Emma, Annabelle En et al.
A3: Few-shot Prompt Learning of Unlearnable Examples with Cross-Modal Adversarial Feature Alignment
Xuan Wang, Xitong Gao, Dongping Liao et al.
Decouple Distortion from Perception: Region Adaptive Diffusion for Extreme-low Bitrate Perception Image Compression
Jinchang Xu, Shaokang Wang, Jintao Chen et al.
From Laboratory to Real World: A New Benchmark Towards Privacy-Preserved Visible-Infrared Person Re-Identification
Yan Jiang, Hao Yu, Xu Cheng et al.
D2SP: Dynamic Dual-Stage Purification Framework for Dual Noise Mitigation in Vision-based Affective Recognition.
Haoran Wang, Xinji Mai, Zeng Tao et al.
Towards High-fidelity 3D Talking Avatar with Personalized Dynamic Texture
Xuanchen Li, Jianyu Wang, Yuhao Cheng et al.
HyTIP: Hybrid Temporal Information Propagation for Masked Conditional Residual Video Coding
Yi-Hsin Chen, Yi-Chen Yao, Kuan-Wei Ho et al.
RoPECraft: Training-Free Motion Transfer with Trajectory-Guided RoPE Optimization on Diffusion Transformers
Ahmet Berke Gökmen, Yiğit Ekin, Bahri Batuhan Bilecen et al.
Scaling Laws for Robust Comparison of Open Foundation Language-Vision Models and Datasets
Marianna Nezhurina, Tomer Porian, Giovanni Puccetti et al.
SHAP values via sparse Fourier representation
Ali Gorji, Andisheh Amrollahi, Andreas Krause
Language Modeling by Language Models
Junyan Cheng, Peter Clark, Kyle Richardson
Revisiting Semi-Supervised Learning in the Era of Foundation Models
Ping Zhang, Zheda Mai, Quang-Huy (Percy) Nguyen et al.
Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels
Yongshuo Zong, Qin ZHANG, DONGSHENG An et al.
Distilling Spatially-Heterogeneous Distortion Perception for Blind Image Quality Assessment
Xudong Li, Wenjie Nie, Yan Zhang et al.
T1: A Tool-Oriented Conversational Dataset for Multi-Turn Agentic Planning
Open-set Cross Modal Generalization via Multimodal Unified Representation
Hai Huang, Yan Xia, Shulei Wang et al.
GaRe: Relightable 3D Gaussian Splatting for Outdoor Scenes from Unconstrained Photo Collections
Haiyang Bai, Jiaqi Zhu, Songru Jiang et al.
Brain network science modelling of sparse neural networks enables Transformers and LLMs to perform as fully connected
Yingtao Zhang, Diego Cerretti, Jialin Zhao et al.
Zero-Shot Trajectory Planning for Signal Temporal Logic Tasks
Ruijia Liu, Ancheng Hou, Xiao Yu et al.
STEER-ME: Assessing the Microeconomic Reasoning of Large Language Models
Narun Raman, Taylor Lundy, Thiago Amin et al.
TimeTracker: Event-based Continuous Point Tracking for Video Frame Interpolation with Non-linear Motion
Haoyue Liu, Jinghan Xu, Yi Chang et al.
Unleashing High-Quality Image Generation in Diffusion Sampling Using Second-Order Levenberg-Marquardt-Langevin
Fangyikang Wang, Hubery Yin, Lei Qian et al.
Scale Efficient Training for Large Datasets
Qing Zhou, Junyu Gao, Qi Wang
Beyond Scores: Proximal Diffusion Models
Zhenghan Fang, Mateo Diaz, Sam Buchanan et al.
Demystifying Spectral Feature Learning for Instrumental Variable Regression
Dimitri Meunier, Antoine Moulin, Jakub Wornbard et al.
BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs
Zhantao Yang, Ruili Feng, Keyu Yan et al.
ReCAP: Recursive Context-Aware Reasoning and Planning for Large Language Model Agents
Zhenyu Zhang, Tianyi Chen, Weiran Xu et al.
TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation
Zonglin Lyu, Chen Chen
Decouple-Then-Merge: Finetune Diffusion Models as Multi-Task Learning
Qianli Ma, Xuefei Ning, Dongrui Liu et al.
FGBench: A Dataset and Benchmark for Molecular Property Reasoning at Functional Group-Level in Large Language Models
Xuan Liu, Siru Ouyang, Xianrui Zhong et al.
Register and [CLS] tokens induce a decoupling of local and global features in large ViTs
Alexander Lappe, Martin Giese
SA-LUT: Spatial Adaptive 4D Look-Up Table for Photorealistic Style Transfer
Zerui Gong, Zhonghua Wu, Qingyi Tao et al.
Parallel Sequence Modeling via Generalized Spatial Propagation Network
Hongjun Wang, Wonmin Byeon, Jiarui Xu et al.
Monocular Semantic Scene Completion via Masked Recurrent Networks
Xuzhi Wang, Xinran Wu, Song Wang et al.
ADIEE: Automatic Dataset Creation and Scorer for Instruction-Guided Image Editing Evaluation
Sherry Chen, Yi Wei, Luowei Zhou et al.
All You Need is One: Capsule Prompt Tuning with a Single Vector
Yiyang Liu, James Liang, Heng Fan et al.
SeaLion: Semantic Part-Aware Latent Point Diffusion Models for 3D Generation
Dekai Zhu, Yan Di, Stefan Gavranovic et al.
FairGen: Enhancing Fairness in Text-to-Image Diffusion Models via Self-Discovering Latent Directions
Yilei Jiang, Wei-Hong Li, Yiyuan Zhang et al.
Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA
Zhixuan Li, Hyunse Yoon, Sanghoon Lee et al.
Conformal Inference under High-Dimensional Covariate Shifts via Likelihood-Ratio Regularization
Sunay Joshi, Shayan Kiyani, George J. Pappas et al.
Information-Theoretic Reward Decomposition for Generalizable RLHF
Liyuan Mao, Haoran Xu, Amy Zhang et al.
LIFT: Latent Implicit Functions for Task- and Data-Agnostic Encoding
Amirhossein Kazerouni, Soroush Mehraban, Michael Brudno et al.
RobotSmith: Generative Robotic Tool Design for Acquisition of Complex Manipulation Skills
Chunru Lin, Haotian Yuan, Yian Wang et al.
Orientation-anchored Hyper-Gaussian for 4D Reconstruction from Casual Videos
Junyi Wu, Jiachen Tao, Haoxuan Wang et al.
Generalizable Domain Adaptation for Sim-and-Real Policy Co-Training
Shuo Cheng, Liqian Ma, Zhenyang Chen et al.
A Hidden Stumbling Block in Generalized Category Discovery: Distracted Attention
Qiyu Xu, Zhanxuan Hu, Yu Duan et al.
GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models
Jonathan Roberts, Kai Han, Samuel Albanie
Free-viewpoint Human Animation with Pose-correlated Reference Selection
Fa-Ting Hong, Zhan Xu, Haiyang Liu et al.
RapVerse: Coherent Vocals and Whole-Body Motion Generation from Text
Jiaben Chen, Xin Yan, Yihang Chen et al.
Sparse Fine-Tuning of Transformers for Generative Tasks
Wei Chen, Jingxi Yu, Zichen Miao et al.
VIGFace: Virtual Identity Generation for Privacy-Free Face Recognition Dataset
Minsoo Kim, Min-Cheol Sagong, Gi Pyo Nam et al.
ZeroKey: Point-Level Reasoning and Zero-Shot 3D Keypoint Detection from Large Language Models
Bingchen Gong, Diego Gomez, Abdullah Hamdi et al.
CoSpace: Benchmarking Continuous Space Perception Ability for Vision-Language Models
Yiqi Zhu, Ziyue Wang, Can Zhang et al.
Efficient Unsupervised Shortcut Learning Detection and Mitigation in Transformers
Lukas Kuhn, sari sadiya, Jörg Schlötterer et al.
TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models
Ziyang Luo, Nian Liu, Xuguang Yang et al.
Forgetting Through Transforming: Enabling Federated Unlearning via Class-Aware Representation Transformation
Qi Guo, Zhen Tian, Minghao Yao et al.
ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering
Duong T. Tran, Trung-Kien Tran, Manfred Hauswirth et al.
Dataset Distillation via Vision-Language Category Prototype
YAWEN ZOU, Guang Li, Duo Su et al.
On the Surprising Effectiveness of Large Learning Rates under Standard Width Scaling
Moritz Haas, Sebastian Bordt, Ulrike Luxburg et al.
GroomLight: Hybrid Inverse Rendering for Relightable Human Hair Appearance Modeling
Yang Zheng, Menglei Chai, Delio Vicini et al.
Jigsaw++: Imagining Complete Shape Priors for Object Reassembly
Jiaxin Lu, Gang Hua, Qixing Huang
Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation
Yiheng Li, Yang Yang, Zichang Tan et al.
Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis
Chen Zhao, Xuan Wang, Tong Zhang et al.
Spatial-Temporal Aware Visuomotor Diffusion Policy Learning
Zhenyang Liu, Yikai Wang, Kuanning Wang et al.
HCRMP: An LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving
Zhiwen Chen, Hanming Deng, Zhuoren Li et al.
4D Gaussian Splatting SLAM
Yanyan Li, Youxu Fang, Zunjie Zhu et al.
Incentivizing Truthful Language Models via Peer Elicitation Games
Baiting Chen, Tong Zhu, Jiale Han et al.
FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images
Rong Wang, Fabian Prada, Ziyan Wang et al.
GeoComplete: Geometry-Aware Diffusion for Reference-Driven Image Completion
Beibei Lin, Tingting Chen, Robby Tan
Sharp-It: A Multi-view to Multi-view Diffusion Model for 3D Synthesis and Manipulation
Yiftach Edelstein, Or Patashnik, Dana Cohen-Bar et al.
Identity-Clothing Similarity Modeling for Unsupervised Clothing Change Person Re-Identification
Zhiqi Pang, Junjie Wang, Lingling Zhao et al.
MaRI: Material Retrieval Integration across Domains
Jianhui Wang, Zhifei Yang, Yangfan He et al.
Non-stationary Bandit Convex Optimization: A Comprehensive Study
Xiaoqi Liu, Dorian Baudry, Julian Zimmert et al.
SCOUT: Teaching Pre-trained Language Models to Enhance Reasoning via Flow Chain-of-Thought
Guanghao Li, Wenhao Jiang, Mingfeng Chen et al.
Uncertainty Weighted Gradients for Model Calibration
Jinxu Lin, Linwei Tao, Minjing Dong et al.
AHCPTQ: Accurate and Hardware-Compatible Post-Training Quantization for Segment Anything Model
Wenlun Zhang, Yunshan Zhong, Shimpei Ando et al.
Shot-by-Shot: Film-Grammar-Aware Training-Free Audio Description Generation
Junyu Xie, Tengda Han, Max Bain et al.
MPMAvatar: Learning 3D Gaussian Avatars with Accurate and Robust Physics-Based Dynamics
Changmin Lee, Jihyun Lee, Tae-Kyun Kim
LocDiff: Identifying Locations on Earth by Diffusing in the Hilbert Space
Zhangyu Wang, Zeping Liu, Jielu Zhang et al.
Making Old Film Great Again: Degradation-aware State Space Model for Old Film Restoration
Yudong Mao, Hao Luo, Zhiwei Zhong et al.
Put CASH on Bandits: A Max K-Armed Problem for Automated Machine Learning
Amir Rezaei Balef, Claire Vernade, Katharina Eggensperger
Credal Prediction based on Relative Likelihood
Timo Löhr, Paul Hofman, Felix Mohr et al.
ShapeX: Shapelet-Driven Post Hoc Explanations for Time Series Classification Models
Bosong Huang, Ming Jin, Yuxuan Liang et al.
Gradient Multi-Normalization for Efficient LLM Training
Meyer Scetbon, Chao Ma, Wenbo Gong et al.
GS-Occ3D: Scaling Vision-only Occupancy Reconstruction with Gaussian Splatting
Baijun Ye, Minghui Qin, Saining Zhang et al.
Normalization in Attention Dynamics
Nikita Karagodin, Shu Ge, Yury Polyanskiy et al.
KL-Regularized RLHF with Multiple Reference Models: Exact Solutions and Sample Complexity
Gholamali Aminian, Amir R. Asadi, Idan Shenfeld et al.
Fourier Analysis Network
Yihong Dong, Ge Li, Yongding Tao et al.
SATURN: SAT-based Reinforcement Learning to Unleash LLMs Reasoning
Huanyu Liu, Jia Li, Hao Zhu et al.
Adaptive Stochastic Coefficients for Accelerating Diffusion Sampling
Ruoyu Wang, Beier Zhu, Junzhi Li et al.