Most Cited 2025 "shape-aware representations" Papers
FlyLoRA: Boosting Task Decoupling and Parameter Efficiency via Implicit Rank-Wise Mixture-of-Experts
Heming Zou, Yunliang Zang, Wutong Xu et al.
Scaling Offline RL via Efficient and Expressive Shortcut Models
Nicolas Espinosa-Dice, Yiyi Zhang, Yiding Chen et al.
Behavior Injection: Preparing Language Models for Reinforcement Learning
Zhepeng Cen, Yihang Yao, William Han et al.
SceneSplat++: A Large Dataset and Comprehensive Benchmark for Language Gaussian Splatting
Mengjiao Ma, Qi Ma, Yue Li et al.
Towards Unified and Lossless Latent Space for 3D Molecular Latent Diffusion Modeling
Yanchen Luo, Zhiyuan Liu, Yi Zhao et al.
Parameter Efficient Fine-tuning via Explained Variance Adaptation
Fabian Paischer, Lukas Hauzenberger, Thomas Schmied et al.
Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy
Bogdan Kulynych, Juan Gomez, Georgios Kaissis et al.
MIEB: Massive Image Embedding Benchmark
Chenghao Xiao, Isaac Chung, Imene Kerboua et al.
Distilled Prompt Learning for Incomplete Multimodal Survival Prediction
Yingxue Xu, Fengtao Zhou, Chenyu Zhao et al.
$\Psi$-Sampler: Initial Particle Sampling for SMC-Based Inference-Time Reward Alignment in Score Models
Taehoon Yoon, Yunhong Min, Kyeongmin Yeo et al.
Distributionally Robust Learning for Multi-source Unsupervised Domain Adaptation
Zhenyu Wang, Peter Bühlmann, Zijian Guo
AgentBreeder: Mitigating the AI Safety Risks of Multi-Agent Scaffolds via Self-Improvement
J Rosser, Jakob Foerster
Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation
Shivam Duggal, Yushi Hu, Oscar Michel et al.
RIGNO: A Graph-based Framework For Robust And Accurate Operator Learning For PDEs On Arbitrary Domains
Sepehr Mousavi, Shizheng Wen, Levi Lingsch et al.
MIND: Math Informed syNthetic Dialogues for Pretraining LLMs
Syeda Nahida Akter, Shrimai Prabhumoye, John Kamalu et al.
What Matters in Data for DPO?
Yu Pan, Zhongze Cai, Huaiyang Zhong et al.
Future-Aware End-to-End Driving: Bidirectional Modeling of Trajectory Planning and Scene Evolution
Bozhou Zhang, Nan Song, Jingyu Li et al.
Walking the Tightrope: Autonomous Disentangling Beneficial and Detrimental Drifts in Non-Stationary Custom-Tuning
Xiaoyu Yang, Jie Lu, En Yu
Audio-Sync Video Generation with Multi-Stream Temporal Control
Shuchen Weng, Haojie Zheng, Zheng Chang et al.
Angular Steering: Behavior Control via Rotation in Activation Space
Minh Hieu Vu, Tan Nguyen
Make Your Training Flexible: Towards Deployment-Efficient Video Models
Chenting Wang, Kunchang Li, Tianxiang Jiang et al.
Breaking the Frozen Subspace: Importance Sampling for Low-Rank Optimization in LLM Pretraining
Haochen Zhang, Junze Yin, Guanchu Wang et al.
Towards Robust Multimodal Open-set Test-time Adaptation via Adaptive Entropy-aware Optimization
Hao Dong, Eleni Chatzi, Olga Fink
Chat-based Person Retrieval via Dialogue-Refined Cross-Modal Alignment
Yang Bai, Yucheng Ji, Min Cao et al.
Multi-turn Consistent Image Editing
Zijun Zhou, Yingying Deng, Xiangyu He et al.
LayerIF: Estimating Layer Quality for Large Language Models using Influence Functions
Hadi Askari, Shivanshu Gupta, Fei Wang et al.
Distillation Robustifies Unlearning
Bruce W. Lee, Addie Foote, Alex Infanger et al.
Generative Graph Pattern Machine
Zehong Wang, Zheyuan Zhang, Tianyi Ma et al.
Scene-Centric Unsupervised Panoptic Segmentation
Oliver Hahn, Christoph Reich, Nikita Araslanov et al.
Scaling Physical Reasoning with the PHYSICS Dataset
Shenghe Zheng, Qianjia Cheng, Junchi Yao et al.
Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning
Xingjian Ran, Yixuan Li, Linning Xu et al.
DMWM: Dual-Mind World Model with Long-Term Imagination
Lingyi Wang, Rashed Shelim, Walid Saad et al.
Blending Complementary Memory Systems in Hybrid Quadratic-Linear Transformers
Kazuki Irie, Morris Yau, Samuel J Gershman
LayerAnimate: Layer-level Control for Animation
Yuxue Yang, Lue Fan, Zuzeng Lin et al.
Encoder-Decoder Diffusion Language Models for Efficient Training and Inference
Marianne Arriola, Yair Schiff, Hao Phung et al.
A Stable Whitening Optimizer for Efficient Neural Network Training
Kevin Frans, Sergey Levine, Pieter Abbeel
Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs
Jingyao Wang, Wenwen Qiang, Zeen Song et al.
Understanding LLM Behaviors via Compression: Data Generation, Knowledge Acquisition and Scaling Laws
Zhixuan Pan, Shaowen Wang, Liao Pengfei et al.
Outdoor Monocular SLAM with Global Scale-Consistent 3D Gaussian Pointmaps
Chong Cheng, Sicheng Yu, Zijian Wang et al.
U-Know-DiffPAN: An Uncertainty-aware Knowledge Distillation Diffusion Framework with Details Enhancement for PAN-Sharpening
Sungpyo Kim, Jeonghyeok Do, Jaehyup Lee et al.
MIRE: Matched Implicit Neural Representations
Dhananjaya Jayasundara, Heng Zhao, Demetrio Labate et al.
Improving the Transferability of Adversarial Attacks on Face Recognition with Diverse Parameters Augmentation
Fengfan Zhou, Bangjie Yin, Hefei Ling et al.
Visual Persona: Foundation Model for Full-Body Human Customization
Jisu Nam, Soowon Son, Zhan Xu et al.
Realistic Test-Time Adaptation of Vision-Language Models
Maxime Zanella, Clément Fuchs, Christophe De Vleeschouwer et al.
Fast Last-Iterate Convergence of SGD in the Smooth Interpolation Regime
Amit Attia, Matan Schliserman, Uri Sherman et al.
Investigating the Pre-Training Dynamics of In-Context Learning: Task Recognition vs. Task Learning
Xiaolei Wang, Xinyu Tang, Junyi Li et al.
Feature Coding in the Era of Large Models: Dataset, Test Conditions, and Benchmark
Changsheng Gao, Yifan Ma, Qiaoxi Chen et al.
Multi-Modal View Enhanced Large Vision Models for Long-Term Time Series Forecasting
ChengAo Shen, Wenchao Yu, Ziming Zhao et al.
3D-MVP: 3D Multiview Pretraining for Manipulation
Shengyi Qian, Kaichun Mo, Valts Blukis et al.
R-LiViT: A LiDAR-Visual-Thermal Dataset Enabling Vulnerable Road User Focused Roadside Perception
Jonas Mirlach, Lei Wan, Andreas Wiedholz et al.
Time-IMM: A Dataset and Benchmark for Irregular Multimodal Multivariate Time Series
Ching Chang, Jeehyun Hwang, Yidan Shi et al.
OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions
Yuanhao Cai, He Zhang, Xi Chen et al.
POT: Prototypical Optimal Transport for Weakly Supervised Semantic Segmentation
Jian Wang, Tianhong Dai, Bingfeng Zhang et al.
Fast Inference for Augmented Large Language Models
Rana Shahout, Cong Liang, Shiji Xin et al.
Narrating the Video: Boosting Text-Video Retrieval via Comprehensive Utilization of Frame-Level Captions
Chan Hur, Jeong-hun Hong, Dong-hun Lee et al.
Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond)
Tomer Garber, Tom Tirer
Learning Interpretable Queries for Explainable Image Classification with Information Pursuit
Stefan Kolek, Aditya Chattopadhyay, Kwan Ho Ryan Chan et al.
Low-Light Image Enhancement using Event-Based Illumination Estimation
Lei Sun, Yuhan Bao, Jiajun Zhai et al.
Differentiable Generalized Sliced Wasserstein Plans
Laetitia Chapel, Romain Tavenard, Samuel Vaiter
WildSeg3D: Segment Any 3D Objects in the Wild from 2D Images
Yansong Guo, Jie Hu, Yansong Qu et al.
Scaling Speculative Decoding with Lookahead Reasoning
Yichao Fu, Rui Ge, Zelei Shao et al.
Keyframe-Guided Creative Video Inpainting
Yuwei Guo, Ceyuan Yang, Anyi Rao et al.
ReWind: Understanding Long Videos with Instructed Learnable Memory
Anxhelo Diko, Tinghuai Wang, Wassim Swaileh et al.
DynFocus: Dynamic Cooperative Network Empowers LLMs with Video Understanding
Yudong Han, Qingpei Guo, Liyuan Pan et al.
Good, Cheap, and Fast: Overfitted Image Compression with Wasserstein Distortion
Jona Ballé, Luca Versari, Emilien Dupont et al.
DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning
Leander Diaz-Bone, Marco Bagatella, Jonas Hübotter et al.
Seeing the Arrow of Time in Large Multimodal Models
Zihui (Sherry) Xue, Romy Luo, Kristen Grauman
How to Probe: Simple Yet Effective Techniques for Improving Post-hoc Explanations
Siddhartha Gairola, Moritz Böhle, Francesco Locatello et al.
Probing Equivariance and Symmetry Breaking in Convolutional Networks
Sharvaree Vadgama, Mohammad Islam, Domas Buracas et al.
Semantic Library Adaptation: LoRA Retrieval and Fusion for Open-Vocabulary Semantic Segmentation
Reza Qorbani, Gianluca Villani, Theodoros Panagiotakopoulos et al.
Probabilistic Stability Guarantees for Feature Attributions
Helen Jin, Anton Xue, Weiqiu You et al.
Fused View-Time Attention and Feedforward Reconstruction for 4D Scene Generation
Chaoyang Wang, Ashkan Mirzaei, Vidit Goel et al.
PromptDresser: Improving the Quality and Controllability of Virtual Try-On via Generative Textual Prompt and Prompt-aware Mask
Jeongho Kim, Hoiyeong Jin, Sunghyun Park et al.
Chiron-o1: Igniting Multimodal Large Language Models towards Generalizable Medical Reasoning via Mentor-Intern Collaborative Search
Haoran Sun, Yankai Jiang, Wenjie Lou et al.
RainyGS: Efficient Rain Synthesis with Physically-Based Gaussian Splatting
Qiyu Dai, Xingyu Ni, Qianfan Shen et al.
CoDa-4DGS: Dynamic Gaussian Splatting with Context and Deformation Awareness for Autonomous Driving
Rui Song, Chenwei Liang, Yan Xia et al.
Towards Better Dental AI: A Multimodal Benchmark and Instruction Dataset for Panoramic X-ray Analysis
Jing Hao, Yuxuan Fan, Yanpeng Sun et al.
FAIR Universe HiggsML Uncertainty Dataset and Competition
Wahid Bhimji, Ragansu Chakkappai, Po-Wen Chang et al.
Time-Aware Auto White Balance in Mobile Photography
Mahmoud Afifi, Luxi Zhao, Abhijith Punnappurath et al.
DVHGNN: Multi-Scale Dilated Vision HGNN for Efficient Vision Recognition
Caoshuo Li, Tanzhe Li, Xiaobin Hu et al.
Real-IAD D³: A Real-World 2D/Pseudo-3D/3D Dataset for Industrial Anomaly Detection
Wenbing Zhu, Lidong Wang, Ziqing Zhou et al.
TCFG: Tangential Damping Classifier-free Guidance
Mingi Kwon, Shin seong Kim, Jaeseok Jeong et al.
PanDA: Towards Panoramic Depth Anything with Unlabeled Panoramas and Mobius Spatial Augmentation
Zidong Cao, Jinjing Zhu, Weiming Zhang et al.
LiT: Delving into a Simple Linear Diffusion Transformer for Image Generation
Jiahao Wang, Ning Kang, Lewei Yao et al.
Attractive Metadata Attack: Inducing LLM Agents to Invoke Malicious Tools
Kanghua Mo, Li Hu, Yucheng Long et al.
DriveX: Omni Scene Modeling for Learning Generalizable World Knowledge in Autonomous Driving
Chen Shi, Shaoshuai Shi, Kehua Sheng et al.
Test3R: Learning to Reconstruct 3D at Test Time
Yuheng Yuan, Qiuhong Shen, Shizun Wang et al.
Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input
Jian Wang, Rishabh Dabral, Diogo Luvizon et al.
TRACE: Grounding Time Series in Context for Multimodal Embedding and Retrieval
Jialin Chen, Ziyu Zhao, Gaukhar Nurbek et al.
1000+ FPS 4D Gaussian Splatting for Dynamic Scene Rendering
Yuheng Yuan, Qiuhong Shen, Xingyi Yang et al.
Hierarchical Cross-modal Prompt Learning for Vision-Language Models
Hao Zheng, Shunzhi Yang, Zhuoxin He et al.
Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts
Qizhou Chen, Chengyu Wang, Dakan Wang et al.
From Replication to Redesign: Exploring Pairwise Comparisons for LLM-Based Peer Review
Yaohui Zhang, Haijing Zhang, Wenlong Ji et al.
Rethinking Spiking Self-Attention Mechanism: Implementing α-XNOR Similarity Calculation in Spiking Transformers
Yichen Xiao, Shuai Wang, Dehao Zhang et al.
ARKit LabelMaker: A New Scale for Indoor 3D Scene Understanding
Guangda Ji, Silvan Weder, Francis Engelmann et al.
Pareto-Optimal Energy Alignment for Designing Nature-Like Antibodies
Yibo Wen, Chenwei Xu, Jerry Yao-Chieh Hu et al.
HandOS: 3D Hand Reconstruction in One Stage
Xingyu Chen, Zhuheng Song, Xiaoke Jiang et al.
Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking
Liangliang Zhang, Zhuorui Jiang, Hongliang Chi et al.
One Model Transfer to All: On Robust Jailbreak Prompts Generation against LLMs
Linbao Li, Yannan Liu, Daojing He et al.
SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents
Wanxin Tian, Shijie Zhang, Kevin Zhang et al.
Advantage Alignment Algorithms
Juan Duque, Milad Aghajohari, Timotheus Cooijmans et al.
GoRA: Gradient-driven Adaptive Low Rank Adaptation
Haonan He, Peng Ye, Yuchen Ren et al.
Rethinking Bimanual Robotic Manipulation: Learning with Decoupled Interaction Framework
Jian-Jian Jiang, Xiao-Ming Wu, Yi-Xiang He et al.
MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds
Bingquan Dai, Luo Li, Qihong Tang et al.
CAT: Content-Adaptive Image Tokenization
Junhong Shen, Kushal Tirumala, Michihiro Yasunaga et al.
Selective Induction Heads: How Transformers Select Causal Structures in Context
Francesco D'Angelo, Francesco Croce, Nicolas Flammarion
Matryoshka Pilot: Learning to Drive Black-Box LLMs with LLMs
ChangHao Li, Yuchen Zhuang, Rushi Qiang et al.
Dynamic Low-Rank Sparse Adaptation for Large Language Models
Weizhong Huang, Yuxin Zhang, Xiawu Zheng et al.
Learning with Calibration: Exploring Test-Time Computing of Spatio-Temporal Forecasting
Wei Chen, Yuxuan Liang
COME: Adding Scene-Centric Forecasting Control to Occupancy World Model
Yining Shi, Kun Jiang, Qiang Meng et al.
CoMatcher: Multi-View Collaborative Feature Matching
Jintao Zhang, Zimin Xia, Mingyue Dong et al.
ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback
Litao Guo, Xinli Xu, Luozhou Wang et al.
Nonisotropic Gaussian Diffusion for Realistic 3D Human Motion Prediction
Cecilia Curreli, Dominik Muhle, Abhishek Saroha et al.
Towards Doctor-Like Reasoning: Medical RAG Fusing Knowledge with Patient Analogy through Textual Gradients
Yuxing Lu, Gecheng Fu, Wei Wu et al.
From Elements to Design: A Layered Approach for Automatic Graphic Design Composition
Jiawei Lin, Shizhao Sun, Danqing Huang et al.
RANGE: Retrieval Augmented Neural Fields for Multi-Resolution Geo-Embeddings
Aayush Dhakal, Srikumar Sastry, Subash Khanal et al.
DefectFill: Realistic Defect Generation with Inpainting Diffusion Model for Visual Inspection
Jaewoo Song, Daemin Park, Kanghyun Baek et al.
Neural Dueling Bandits: Preference-Based Optimization with Human Feedback
Arun Verma, Zhongxiang Dai, Xiaoqiang Lin et al.
Frequency-Dynamic Attention Modulation For Dense Prediction
Linwei Chen, Lin Gu, Ying Fu
Unleashing the Potential of Vision-Language Pre-Training for 3D Zero-Shot Lesion Segmentation via Mask-Attribute Alignment
Yankai Jiang, Wenhui Lei, Xiaofan Zhang et al.
Dissecting Generalized Category Discovery: Multiplex Consensus under Self-Deconstruction
Luyao Tang, Kunze Huang, Yuxuan Yuan et al.
Consistently Simulating Human Personas with Multi-Turn Reinforcement Learning
Marwa Abdulhai, Ryan Cheng, Donovan Clay et al.
AlphaPre: Amplitude-Phase Disentanglement Model for Precipitation Nowcasting
Kenghong Lin, Baoquan Zhang, Demin Yu et al.
Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs
Hao Kang, Qingru Zhang, Han Cai et al.
PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model
Mingju Gao, Yike Pan, Huan-ang Gao et al.
PRIMAL: Physically Reactive and Interactive Motor Model for Avatar Learning
Yan Zhang, Yao Feng, Alpár Cseke et al.
Decoder-Hybrid-Decoder Architecture for Efficient Reasoning with Long Generation
Liliang Ren, Congcong Chen, Haoran Xu et al.
LeanVAE: An Ultra-Efficient Reconstruction VAE for Video Diffusion Models
Yu Cheng, Fajie Yuan
Centralized Reward Agent for Knowledge Sharing and Transfer in Multi-Task Reinforcement Learning
Haozhe Ma, Zhengding Luo, Thanh Vinh Vo et al.
SweetTok: Semantic-Aware Spatial-Temporal Tokenizer for Compact Video Discretization
Zhentao Tan, Ben Xue, Jian Jia et al.
ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models
Zifu Wan, Ce Zhang, Silong Yong et al.
EgoBridge: Domain Adaptation for Generalizable Imitation from Egocentric Human Data
Ryan Punamiya, Dhruv Patel, Patcharapong Aphiwetsa et al.
ReAL-AD: Towards Human-Like Reasoning in End-to-End Autonomous Driving
Yuhang Lu, Jiadong Tu, Yuexin Ma et al.
Robust Machine Unlearning for Quantized Neural Networks via Adaptive Gradient Reweighting with Similar Labels
Yujia Tong, Yuze Wang, Jingling Yuan et al.
Stable-SCore: A Stable Registration-based Framework for 3D Shape Correspondence
Haolin Liu, Xiaohang Zhan, Zizheng Yan et al.
AOR: Anatomical Ontology-Guided Reasoning for Medical Large Multimodal Model in Chest X-Ray Interpretation
Qingqiu Li, Zihang Cui, Seongsu Bae et al.
Effective Cloud Removal for Remote Sensing Images by an Improved Mean-Reverting Denoising Model with Elucidated Design Space
Yi Liu, Wengen Li, Jihong Guan et al.
MATCHA: Towards Matching Anything
Fei Xue, Sven Elflein, Laura Leal-Taixe et al.
SymRTLO: Enhancing RTL Code Optimization with LLMs and Neuron-Inspired Symbolic Reasoning
Yiting Wang, Wanghao Ye, Ping Guo et al.
TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling
Yuancheng Wang, Dekun Chen, Xueyao Zhang et al.
InteractionMap: Improving Online Vectorized HDMap Construction with Interaction
Kuang Wu, Chuan Yang, Zhanbin Li
Signal and Noise: A Framework for Reducing Uncertainty in Language Model Evaluation
David Heineman, Valentin Hofmann, Ian Magnusson et al.
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
Zhengyao Lyu, Tianlin Pan, Chenyang Si et al.
RoboSense: Large-scale Dataset and Benchmark for Egocentric Robot Perception and Navigation in Crowded and Unstructured Environments
Haisheng Su, Feixiang Song, Cong Ma et al.
Thinking in Character: Advancing Role-Playing Agents with Role-Aware Reasoning
Yihong Tang, Kehai Chen, Muyun Yang et al.
Snakes and Ladders: Two Steps Up for VideoMamba
Hui Lu, Albert Ali Salah, Ronald Poppe
EgoAgent: A Joint Predictive Agent Model in Egocentric Worlds
Lu Chen, Yizhou Wang, Shixiang Tang et al.
Truth over Tricks: Measuring and Mitigating Shortcut Learning in Misinformation Detection
Herun Wan, Jiaying Wu, Minnan Luo et al.
Beyond Scalars: Concept-Based Alignment Analysis in Vision Transformers
Johanna Vielhaben, Dilyara Bareeva, Jim Berend et al.
Optimal Spectral Transitions in High-Dimensional Multi-Index Models
Leonardo Defilippis, Yatin Dandi, Pierre Mergny et al.
Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition
Zhiyuan Chen, Keyi Li, Yifan Jia et al.
Large Language Models Think Too Fast To Explore Effectively
Lan Pan, Hanbo Xie, Robert Wilson
A Tale of Two Symmetries: Exploring the Loss Landscape of Equivariant Models
YuQing Xie, Tess Smidt
Demystifying Language Model Forgetting with Low-rank Example Associations
Xisen Jin, Xiang Ren
Generative Sparse-View Gaussian Splatting
Hanyang Kong, Xingyi Yang, Xinchao Wang
OmniCache: A Trajectory-Oriented Global Perspective on Training-Free Cache Reuse for Diffusion Transformer Models
Huanpeng Chu, Wei Wu, Guanyu Feng et al.
Logits DeConfusion with CLIP for Few-Shot Learning
Shuo Li, Fang Liu, Zehua Hao et al.
LVFace: Progressive Cluster Optimization for Large Vision Models in Face Recognition
Jinghan You, Shanglin Li, Yuanrui Sun et al.
ROSE: Remove Objects with Side Effects in Videos
Chenxuan Miao, Yutong Feng, Jianshu Zeng et al.
Stabilized Neural Prediction of Potential Outcomes in Continuous Time
Konstantin Hess, Stefan Feuerriegel
Textured 3D Regenerative Morphing with 3D Diffusion Prior
Songlin Yang, Yushi Lan, Honghua Chen et al.
StyleSSP: Sampling StartPoint Enhancement for Training-free Diffusion-based Method for Style Transfer
Ruojun Xu, Weijie Xi, Xiaodi Wang et al.
EOC-Bench: Can MLLMs Identify, Recall, and Forecast Objects in an Egocentric World?
Yuqian Yuan, Ronghao Dang, Long Li et al.
UVGS: Reimagining Unstructured 3D Gaussian Splatting using UV Mapping
Aashish Rai, Dilin Wang, Mihir Jain et al.
MedSegFactory: Text-Guided Generation of Medical Image-Mask Pairs
Jiawei Mao, Yuhan Wang, Yucheng Tang et al.
HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets and CLIP Models
Zhixiang Wei, Guangting Wang, Xiaoxiao Ma et al.
Point Clouds Meets Physics: Dynamic Acoustic Field Fitting Network for Point Cloud Understanding
Changshuo Wang, Shuting He, Xiang Fang et al.
Maximizing the Potential of Synthetic Data: Insights from Random Matrix Theory
Aymane El Firdoussi, Mohamed El Amine Seddik, Soufiane Hayou et al.
Decompile-Bench: Million-Scale Binary-Source Function Pairs for Real-World Binary Decompilation
Hanzhuo Tan, Xiaolong Tian, Hanrui Qi et al.
Absorb and Converge: Provable Convergence Guarantee for Absorbing Discrete Diffusion Models
Yuchen Liang, Renxiang Huang, Lifeng Lai et al.
Navigating Image Restoration with VAR’s Distribution Alignment Prior
Siyang Wang, Naishan Zheng, Jie Huang et al.
Beyond Local Sharpness: Communication-Efficient Global Sharpness-aware Minimization for Federated Learning
Debora Caldarola, Pietro Cagnasso, Barbara Caputo et al.
Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks
Daniel Kunin, Giovanni Luca Marchetti, Feng Chen et al.
Detect-and-Guide: Self-regulation of Diffusion Models for Safe Text-to-Image Generation via Guideline Token Optimization
Feifei Li, Mi Zhang, Yiming Sun et al.
Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes
Aodi Li, Liansheng Zhuang, Xiao Long et al.
T2V-OptJail: Discrete Prompt Optimization for Text-to-Video Jailbreak Attacks
Jiayang Liu, Siyuan Liang, Shiqian Zhao et al.
On scalable and efficient training of diffusion samplers
Minkyu Kim, Kiyoung Seong, Dongyeop Woo et al.
LaTexBlend: Scaling Multi-concept Customized Generation with Latent Textual Blending
Jian Jin, Zhenbo Yu, Yang Shen et al.
Golden Cudgel Network for Real-Time Semantic Segmentation
Guoyu Yang, Yuan Wang, Daming Shi et al.
GazeGaussian: High-Fidelity Gaze Redirection with 3D Gaussian Splatting
Xiaobao Wei, Peng Chen, Guangyu Li et al.
Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training
Will Merrill, Shane Arora, Dirk Groeneveld et al.
CountLLM: Towards Generalizable Repetitive Action Counting via Large Language Model
Ziyu Yao, Xuxin Cheng, Zhiqi Huang et al.
SDGOCC: Semantic and Depth-Guided Bird's-Eye View Transformation for 3D Multimodal Occupancy Prediction
ZaiPeng Duan, Xuzhong Hu, Pei An et al.
GLEAM: Learning Generalizable Exploration Policy for Active Mapping in Complex 3D Indoor Scene
Xiao Chen, Tai Wang, Quanyi Li et al.
Relation3D: Enhancing Relation Modeling for Point Cloud Instance Segmentation
Edward Loo, Jiacheng Deng
CODA: Repurposing Continuous VAEs for Discrete Tokenization
Zeyu Liu, Zanlin Ni, Yeguo Hua et al.
Temporal Representation Alignment: Successor Features Enable Emergent Compositionality in Robot Instruction Following
Vivek Myers, Bill Zheng, Anca Dragan et al.
Structured Reinforcement Learning for Combinatorial Decision-Making
Heiko Hoppe, Léo Baty, Louis Bouvier et al.
Video Perception Models for 3D Scene Synthesis
Rui Huang, Guangyao Zhai, Zuria Bauer et al.
GuardSplat: Efficient and Robust Watermarking for 3D Gaussian Splatting
Zixuan Chen, Guangcong Wang, Jiahao Zhu et al.
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks
Vishnu Sarukkai, Zhiqiang Xie, Kayvon Fatahalian
Short-length Adversarial Training Helps LLMs Defend Long-length Jailbreak Attacks: Theoretical and Empirical Evidence
Shaopeng Fu, Liang Ding, Jingfeng Zhang et al.
Linear combinations of latents in generative models: subspaces and beyond
Erik Bodin, Alexandru Stere, Dragos Margineantu et al.
DLF: Extreme Image Compression with Dual-generative Latent Fusion
Naifu Xue, Zhaoyang Jia, Jiahao Li et al.
A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone
Jitai Hao, Qiang Huang, Hao Liu et al.
Cached Multi-Lora Composition for Multi-Concept Image Generation
Xiandong Zou, Mingzhu Shen, Christos-Savvas Bouganis et al.
Hearing Anywhere in Any Environment
Xiulong Liu, Anurag Kumar, Paul Calamia et al.
AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization
Yiyang Du, Xiaochen Wang, Chi Chen et al.
LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers
Yusuf Dalva, Hidir Yesiltepe, Pinar Yanardag
RePerformer: Immersive Human-centric Volumetric Videos from Playback to Photoreal Reperformance
Yuheng Jiang, Zhehao Shen, Chengcheng Guo et al.