Most Cited 2025 "hardware robotic control" Papers
22,274 papers found
Visual Consensus Prompting for Co-Salient Object Detection
Jie Wang, Nana Yu, Zihao Zhang et al.
Flexible Group Count Enables Hassle-Free Structured Pruning
Jiamu Zhang, Shaochen Zhong, Andrew Ye et al.
Object-level Correlation for Few-Shot Segmentation
chunlin wen, Yu Zhang, Jie Fan et al.
GLEAM: Enhanced Transferable Adversarial Attacks for Vision-Language Pre-training Models via Global-Local Transformations
Yunqi Liu, Xiaohui Cui, Ouyang Xue
EasyCraft: A Robust and Efficient Framework for Automatic Avatar Crafting
Suzhen Wang, Weijie Chen, Wei Zhang et al.
RefPose: Leveraging Reference Geometric Correspondences for Accurate 6D Pose Estimation of Unseen Objects
Jaeguk Kim, Jaewoo Park, Keuntek Lee et al.
Neural Collapse in Cumulative Link Models for Ordinal Regression: An Analysis with Unconstrained Feature Model
Chuang Ma, Tomoyuki Obuchi, Toshiyuki Tanaka
ChartPoint: Guiding MLLMs with Grounding Reflection for Chart Reasoning
Zhengzhuo Xu, Sinan Du, Yiyan Qi et al.
Stepping Out of Similar Semantic Space for Open-Vocabulary Segmentation
Yong Liu, Song-Li Wu, Sule Bai et al.
PhyS-EdiT: Physics-aware Semantic Image Editing with Text Description
Ziqi Cai, Shuchen Weng, Yifei Xia et al.
MaGS: Reconstructing and Simulating Dynamic 3D Objects with Mesh-adsorbed Gaussian Splatting
Shaojie Ma, Yawei Luo, Wei Yang et al.
A Unifying View of Linear Function Approximation in Off-Policy RL Through Matrix Splitting and Preconditioning
Zechen Wu, Amy Greenwald, Ronald Parr
Constrained Posterior Sampling: Time Series Generation with Hard Constraints
Sai Shankar Narasimhan, Shubhankar Agarwal, Litu Rout et al.
Fancy123: One Image to High-Quality 3D Mesh Generation via Plug-and-Play Deformation
Qiao Yu, Xianzhi Li, Yuan Tang et al.
Hand-held Object Reconstruction from RGB Video with Dynamic Interaction
Shijian Jiang, Qi Ye, Rengan Xie et al.
Exploiting the Asymmetric Uncertainty Structure of Pre-trained VLMs on the Unit Hypersphere
Li Ju, Max Andersson, Stina Fredriksson et al.
A Practical Guide for Incorporating Symmetry in Diffusion Policy
Dian Wang, Boce Hu, Shuran Song et al.
Bootstrap Your Own Views: Masked Ego-Exo Modeling for Fine-grained View-invariant Video Representations
Jungin Park, Jiyoung Lee, Kwanghoon Sohn
Multivariate Latent Recalibration for Conditional Normalizing Flows
Victor Dheur, Souhaib Ben Taieb
Reinforcement Learning Meets Masked Generative Models: Mask-GRPO for Text-to-Image Generation
Yifu Luo, Xinhao Hu, Keyu Fan et al.
Few-Shot Learning from Gigapixel Images via Hierarchical Vision-Language Alignment and Modeling
Bryan Wong, Jongwoo Kim, Huazhu Fu et al.
GausSim: Foreseeing Reality by Gaussian Simulator for Elastic Objects
Yidi Shao, Mu Huang, Chen Change Loy et al.
HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding
JIAHE ZHAO, RuiBing Hou, zejie tian et al.
ProReflow: Progressive Reflow with Decomposed Velocity
Lei Ke, Haohang Xu, Xuefei Ning et al.
LP-Diff: Towards Improved Restoration of Real-World Degraded License Plate
Haoyan Gong, Zhenrong Zhang, Yuzheng Feng et al.
Evaluating Model Perception of Color Illusions in Photorealistic Scenes
Lingjun Mao, Zineng Tang, Alane Suhr
Generative Active Learning for Long-tail Trajectory Prediction via Controllable Diffusion Model
Daehee Park, Monu Surana, Pranav Desai et al.
The Promise of RL for Autoregressive Image Editing
Saba Ahmadi, Rabiul Awal, Ankur Sikarwar et al.
GLSim: Detecting Object Hallucinations in LVLMs via Global-Local Similarity
Seongheon Park, Sharon Li
PointMapPolicy: Structured Point Cloud Processing for Multi-Modal Imitation Learning
Xiaogang Jia, Qian Wang, Anrui Wang et al.
Teaching Language Models to Reason with Tools
Chengpeng Li, Zhengyang Tang, Ziniu Li et al.
CLIMB: Class-imbalanced Learning Benchmark on Tabular Data
Zhining Liu, Zihao Li, Ze Yang et al.
Demeter: A Parametric Model of Crop Plant Morphology from the Real World
Tianhang Cheng, Albert Zhai, Evan Chen et al.
OpenWorldSAM: Extending SAM2 for Universal Image Segmentation with Language Prompts
Shiting (Ginny) Xiao, Rishabh Kabra, Yuhang Li et al.
CMT: A Cascade MAR with Topology Predictor for Multimodal Conditional CAD Generation
Jianyu Wu, Yizhou Wang, Xiangyu Yue et al.
Scalable In-context Ranking with Generative Models
Nilesh Gupta, Chong You, Srinadh Bhojanapalli et al.
Perspective-Aware Teaching: Adapting Knowledge for Heterogeneous Distillation
Jhe-Hao Lin, Yi Yao, Chan-Feng Hsu et al.
Self-Supervised Spatial Correspondence Across Modalities
Ayush Shrivastava, Andrew Owens
Unsupervised Continual Domain Shift Learning with Multi-Prototype Modeling
Haopeng Sun, Yingwei Zhang, Lumin Xu et al.
High-order Equivariant Flow Matching for Density Functional Theory Hamiltonian Prediction
Seongsu Kim, Nayoung Kim, Dongwoo Kim et al.
SVRPBench: A Realistic Benchmark for Stochastic Vehicle Routing Problem
Ahmed Heakl, Yahia Salaheldin Shaaban, Salem Lahlou et al.
ACCO: Accumulate While You Communicate for Communication-Overlapped Sharded LLM Training
Adel Nabli, Louis Fournier, Pierre ERBACHER et al.
Edit Less, Achieve More: Dynamic Sparse Neuron Masking for Lifelong Knowledge Editing in LLMs
Jinzhe Liu, Junshu Sun, Shufan Shen et al.
HMARL-CBF – Hierarchical Multi-Agent Reinforcement Learning with Control Barrier Functions for Safety-Critical Autonomous Systems
H M Sabbir Ahmad, Ehsan Sabouni, Alexander Wasilkoff et al.
FLOWING: Implicit Neural Flows for Structure-Preserving Morphing
Arthur Bizzi, Matias Grynberg Portnoy, Vitor Pereira Matias et al.
A Structure-aware and Motion-adaptive Framework for 3D Human Pose Estimation with Mamba
Ye Lu, Jie Wang, Jianjun Gao et al.
Proximalized Preference Optimization for Diverse Feedback Types: A Decomposed Perspective on DPO
Kaiyang Guo, Yinchuan Li, Zhitang Chen
PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning
Zongqian Li, Yixuan Su, Nigel Collier
Continuous Simplicial Neural Networks
Aref Einizade, Dorina Thanou, Fragkiskos Malliaros et al.
Neural Inverse Rendering from Propagating Light
Anagh Malik, Benjamin Attal, Andrew Xie et al.
SplArt: Articulation Estimation and Part-Level Reconstruction with 3D Gaussian Splatting
Shengjie Lin, Jiading Fang, Muhammad Zubair Irshad et al.
RAST: Reasoning Activation in LLMs via Small-model Transfer
Siru Ouyang, Xinyu Zhu, Zilin Xiao et al.
Reasoning Mamba: Hypergraph-Guided Region Relation Calculating for Weakly Supervised Affordance Grounding
Yuxuan Wang, Aming Wu, Muli Yang et al.
Derivative-Free Diffusion Manifold-Constrained Gradient for Unified XAI
Won Jun Kim, Hyungjin Chung, Jaemin Kim et al.
Large Language Bayes
Justin Domke
Hadamax Encoding: Elevating Performance in Model-Free Atari
Jacob Eeuwe Kooi, Zhao Yang, Vincent Francois-Lavet
2D Gaussian Splatting-based Sparse-view Transparent Object Depth Reconstruction via Physics Simulation for Scene Update
Jeongyun Kim, Seunghoon Jeong, Giseop Kim et al.
LookWhere? Efficient Visual Recognition by Learning Where to Look and What to See from Self-Supervision
Anthony Fuller, Yousef Yassin, Junfeng Wen et al.
DTOS: Dynamic Time Object Sensing with Large Multimodal Model
Jirui Tian, Jinrong Zhang, Shenglan Liu et al.
KMD: Koopman Multi-modality Decomposition for Generalized Brain Tumor Segmentation under Incomplete Modalities
Tianyi Liu, Haochuan Jiang, Kaizhu Huang
Modeling the Economic Impacts of AI Openness Regulation
Tori Qiu, Benjamin Laufer, Jon Kleinberg et al.
Blind2Sound: Self-Supervised Image Denoising without Residual Noise
Jiazheng Liu, Zejin Wang, Bohao Chen et al.
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images
Boyang Deng, Kyle Genova, Songyou Peng et al.
IMoRe: Implicit Program-Guided Reasoning for Human Motion Q&A
Chen Li, Chinthani Sugandhika, Ee Yeo Keat et al.
VSC: Visual Search Compositional Text-to-Image Diffusion Model
Do Dat, Nam Hyeon-Woo, Po-Yuan Mao et al.
Bridging the Skeleton-Text Modality Gap: Diffusion-Powered Modality Alignment for Zero-shot Skeleton-based Action Recognition
Jeonghyeok Do, Munchurl Kim
AIM-Fair: Advancing Algorithmic Fairness via Selectively Fine-Tuning Biased Models with Contextual Synthetic Data
Zengqun Zhao, Ziquan Liu, Yu Cao et al.
ChartCap: Mitigating Hallucination of Dense Chart Captioning
Junyoung Lim, Jaewoo Ahn, Gunhee Kim
PERSONA: Personalized Whole-Body 3D Avatar with Pose-Driven Deformations from a Single Image
Geonhee Sim, Gyeongsik Moon
Seeing is Believing? Mitigating OCR Hallucinations in Multimodal Large Language Models
zhentao he, Can Zhang, Ziheng Wu et al.
A Unified, Resilient, and Explainable Adversarial Patch Detector
Vishesh Kumar, Akshay Agarwal
SVG-Head: Hybrid Surface-Volumetric Gaussians for High-Fidelity Head Reconstruction and Real-Time Editing
Heyi Sun, Cong Wang, Tian-Xing Xu et al.
Synchronization of Multiple Videos
Avihai Naaman, Ron Shapira Weber, Oren Freifeld
Demystifying Network Foundation Models
Roman Beltiukov, Satyandra Guthula, Wenbo Guo et al.
Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data
Zhiyuan Ma, Xinyue Liang, Rongyuan Wu et al.
PseudoMapTrainer: Learning Online Mapping without HD Maps
Christian Löwens, Thorben Funke, Jingchao Xie et al.
MaNGO — Adaptable Graph Network Simulators via Meta-Learning
Philipp Dahlinger, Tai Hoang, Denis Blessing et al.
Hierarchical Flow Diffusion for Efficient Frame Interpolation
Yang Hai, Guo Wang, Tan Su et al.
From Average-Iterate to Last-Iterate Convergence in Games: A Reduction and Its Applications
Yang Cai, Haipeng Luo, Chen-Yu Wei et al.
Adapting In-Domain Few-Shot Segmentation to New Domains without Source Domain Retraining
Qi Fan, Kaiqi Liu, Nian Liu et al.
DisenQ: Disentangling Q-Former for Activity-Biometrics
Shehreen Azad, Yogesh Rawat
Trokens: Semantic-Aware Relational Trajectory Tokens for Few-Shot Action Recognition
Pulkit Kumar, Shuaiyi Huang, Matthew Walmer et al.
Datasets, Documents, and Repetitions: The Practicalities of Unequal Data Quality
Alex Fang, Hadi Pouransari, Matt Jordan et al.
Improving Noise Efficiency in Privacy-preserving Dataset Distillation
Runkai Zheng, Vishnu Dasu, Yinong Wang et al.
What Changed and What Could Have Changed? State-Change Counterfactuals for Procedure-Aware Video Representation Learning
Chi-Hsi Kung, Frangil Ramirez, Juhyung Ha et al.
QUT-DV25: A Dataset for Dynamic Analysis of Next-Gen Software Supply Chain Attacks
Sk Tanzir Mehedi, Raja Jurdak, Chadni Islam et al.
Reconciling Stochastic and Deterministic Strategies for Zero-shot Image Restoration using Diffusion Model in Dual
Chong Wang, Lanqing Guo, Zixuan Fu et al.
Slow Transition to Low-Dimensional Chaos in Heavy-Tailed Recurrent Neural Networks
Eva Xie, Stefan Mihalas, Łukasz Kuśmierz
VoluMe – Authentic 3D Video Calls from Live Gaussian Splat Prediction
Martin de La Gorce, Charlie Hewitt, Tibor Takács et al.
A Theoretical Framework for Grokking: Interpolation followed by Riemannian Norm Minimisation
Etienne Boursier, Scott Pesme, Radu-Alexandru Dragomir
Metric Convolutions: A Unifying Theory to Adaptive Image Convolutions
Thomas Dagès, Michael Lindenbaum, Alfred Bruckstein
PatchGuard: Adversarially Robust Anomaly Detection and Localization through Vision Transformers and Pseudo Anomalies
Mojtaba Nafez, Amirhossein Koochakian, Arad Maleki et al.
Mixed Signals: A Diverse Point Cloud Dataset for Heterogeneous LiDAR V2X Collaboration
Katie Luo, Minh-Quan Dao, Zhenzhen Liu et al.
On-Device Diffusion Transformer Policy for Efficient Robot Manipulation
Yiming Wu, Huan Wang, Zhenghao Chen et al.
A Partition Cover Approach to Tokenization
Jia Peng Lim, Shawn Tan, XianJun, Davin Choo et al.
UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation
Qihui Zhang, Munan Ning, Zheyuan Liu et al.
Towards Explicit Exoskeleton for the Reconstruction of Complicated 3D Human Avatars
Yifan Zhan, Qingtian Zhu, Muyao Niu et al.
Sound Bridge: Associating Egocentric and Exocentric Videos via Audio Cues
Sihong Huang, Jiaxin Wu, Xiaoyong Wei et al.
PhySense: Sensor Placement Optimization for Accurate Physics Sensing
Yuezhou Ma, Haixu Wu, Hang Zhou et al.
Music-Aligned Holistic 3D Dance Generation via Hierarchical Motion Modeling
LI XIAOJIE, Ronghui Li, Shukai Fang et al.
Rethinking Optimal Verification Granularity for Compute-Efficient Test-Time Scaling
Hao Chen, Guanxi Lu, Yasuyuki Okoshi et al.
Transfer Your Perspective: Controllable 3D Generation from Any Viewpoint in a Driving Scene
Tai-Yu Daniel Pan, Sooyoung Jeon, Mengdi Fan et al.
DIP: Unsupervised Dense In-Context Post-training of Visual Representations
Sophia Sirko-Galouchenko, Spyros Gidaris, Antonin Vobecky et al.
On the Global Optimality of Policy Gradient Methods in General Utility Reinforcement Learning
Anas Barakat, Souradip Chakraborty, Peihong Yu et al.
LUT-Fuse: Towards Extremely Fast Infrared and Visible Image Fusion via Distillation to Learnable Look-Up Tables
Xunpeng Yi, yibing zhang, Xinyu Xiang et al.
Bridging Symmetry and Robustness: On the Role of Equivariance in Enhancing Adversarial Robustness
Longwei Wang, Ifrat Ikhtear Uddin, Prof. KC Santosh (PhD) et al.
Hierarchical Frequency Tagging Probe (HFTP): A Unified Approach to Investigate Syntactic Structure Representations in Large Language Models and the Human Brain
Jingmin An, Yilong Song, Ruolin Yang et al.
ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search
Zeyu Shen, Basileal Imana, Tong Wu et al.
Not All Parameters Matter: Masking Diffusion Models for Enhancing Generation Ability
Lei Wang, Senmao Li, Fei Yang et al.
Multi-Modal Synergistic Implicit Image Enhancement for Efficient Optical Flow Estimation
Weichen Dai, wu hexing, xiaoyang weng et al.
Identity Preserving 3D Head Stylization with Multiview Score Distillation
Bahri Batuhan Bilecen, Ahmet Berke Gokmen, Furkan Güzelant et al.
Text-Aware Real-World Image Super-Resolution via Diffusion Model with Joint Segmentation Decoders
Qiming Hu, Linlong Fan, Yiyan Luo et al.
Boosting the Dual-Stream Architecture in Ultra-High Resolution Segmentation with Resolution-Biased Uncertainty Estimation
Rong Qin, Xingyu Liu, Jinglei Shi et al.
SG-LDM: Semantic-Guided LiDAR Generation via Latent-Aligned Diffusion
Zhengkang Xiang, Zizhao Li, Amir Khodabandeh et al.
IntroStyle: Training-Free Introspective Style Attribution using Diffusion Features
Anand Kumar, Jiteng Mu, Nuno Vasconcelos
Towards Million-Scale Adversarial Robustness Evaluation With Stronger Individual Attacks
Yong Xie, Weijie Zheng, Hanxun Huang et al.
Ponimator: Unfolding Interactive Pose for Versatile Human-human Interaction Animation
Shaowei Liu, chuan guo, Bing Zhou et al.
SmallKV: Small Model Assisted Compensation of KV Cache Compression for Efficient LLM Inference
Yi Zhao, Yajuan Peng, Nguyen Cam-Tu et al.
MatchDiffusion: Training-free Generation of Match-Cuts
Alejandro Pardo, Fabio Pizzati, Tong Zhang et al.
GRAPE: Optimize Data Mixture for Group Robust Multi-target Adaptive Pretraining
Simin Fan, Maria Ios Glarou, Martin Jaggi
Reversing Flow for Image Restoration
Haina Qin, Wenyang Luo, Bing Li et al.
DOGR: Towards Versatile Visual Document Grounding and Referring
Yinan Zhou, Yuxin Chen, Haokun Lin et al.
DualOptim: Enhancing Efficacy and Stability in Machine Unlearning with Dual Optimizers
Xuyang Zhong, Haochen Luo, Chen Liu
GlyphMastero: A Glyph Encoder for High-Fidelity Scene Text Editing
Tong Wang, Ting Liu, Xiaochao Qu et al.
MEAT: Multiview Diffusion Model for Human Generation on Megapixels with Mesh Attention
Yuhan Wang, Fangzhou Hong, Shuai Yang et al.
SDMatte: Grafting Diffusion Models for Interactive Matting
Longfei Huang, Yu Liang, Hao Zhang et al.
Certified Human Trajectory Prediction
Mohammadhossein Bahari, Saeed Saadatnejad, Amirhossein Askari Farsangi et al.
SMGDiff: Soccer Motion Generation using Diffusion Probabilistic Models
Hongdi Yang, Chengyang Li, Zhenxuan Wu et al.
Activation-Guided Consensus Merging for Large Language Models
Yuxuan Yao, Shuqi LIU, Zehua Liu et al.
Nested Diffusion Models Using Hierarchical Latent Priors
Xiao Zhang, Ruoxi Jiang, Rebecca Willett et al.
Quantum Doubly Stochastic Transformers
Jannis Born, Filip Skogh, Kahn Rhrissorrakrai et al.
Fair Deepfake Detectors Can Generalize
Harry Cheng, Ming-Hui Liu, Yangyang Guo et al.
Panoptic Captioning: An Equivalence Bridge for Image and Text
Kun-Yu Lin, Hongjun Wang, Weining Ren et al.
Pairwise Calibrated Rewards for Pluralistic Alignment
Daniel Halpern, Evi Micha, Ariel Procaccia et al.
Gradient Variance Reveals Failure Modes in Flow-Based Generative Models
Teodora Reu, Sixtine Dromigny, Michael Bronstein et al.
DiffCAM: Data-Driven Saliency Maps by Capturing Feature Differences
Xingjian Li, Qiming Zhao, Neelesh Bisht et al.
Efficient Autoregressive Shape Generation via Octree-Based Adaptive Tokenization
Kangle Deng, Hsueh-Ti Derek Liu, Yiheng Zhu et al.
SPADE: Spatial-Aware Denoising Network for Open-vocabulary Panoptic Scene Graph Generation with Long- and Local-range Context Reasoning
XIN Hu, Ke Qin, Guiduo Duan et al.
EvolvingGrasp: Evolutionary Grasp Generation via Efficient Preference Alignment
Yufei Zhu, Yiming Zhong, Zemin Yang et al.
RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions
Bimsara Pathiraja, Maitreya Patel, Shivam Singh et al.
A Theory for Worst-Case vs. Average-Case Guarantees for LLMs
Noga Amit, Shafi Goldwasser, Orr Paradise et al.
Bootstrap3D: Improving Multi-view Diffusion Model with Synthetic Data
Zeyi Sun, Tong Wu, Pan Zhang et al.
Enhancing Dance-to-Music Generation via Negative Conditioning Latent Diffusion Model
Changchang Sun, Gaowen Liu, Charles Fleming et al.
StyleKeeper: Prevent Content Leakage using Negative Visual Query Guidance
Jaeseok Jeong, Junho Kim, Youngjung Uh et al.
FaceCraft4D: Animated 3D Facial Avatar Generation from a Single Image
Fei Yin, Mallikarjun Reddy, Chun-Han Yao et al.
Towards Human-Understandable Multi-Dimensional Concept Discovery
Arne Grobrügge, Niklas Kühl, Gerhard Satzger et al.
MUSE: Multi-Subject Unified Synthesis via Explicit Layout Semantic Expansion
Fei Peng, Junqiang Wu, Yan Li et al.
Sequential keypoint density estimator: an overlooked baseline of skeleton-based video anomaly detection
Anja Delić, Matej Grcic, Siniša Šegvić
Nabla-R2D3: Effective and Efficient 3D Diffusion Alignment with 2D Rewards
Qingming LIU, Zhen Liu, Dinghuai Zhang et al.
ProbRes: Probabilistic Jump Diffusion for Open-World Egocentric Activity Recognition
Sanjoy Kundu, Shanmukha Vellamcheti, Sathyanarayanan Aakur
Towards Identifiability of Hierarchical Temporal Causal Representation Learning
Zijian Li, Minghao Fu, Junxian Huang et al.
Multi-Object Sketch Animation by Scene Decomposition and Motion Planning
Jingyu Liu, Zijie Xin, Yuhan Fu et al.
SAGI: Semantically Aligned and Uncertainty Guided AI Image Inpainting
Paschalis Giakoumoglou, Dimitrios Karageorgiou, Symeon Papadopoulos et al.
Toward Efficient Inference Attacks: Shadow Model Sharing via Mixture-of-Experts
Li Bai, Qingqing Ye, Xinwei Zhang et al.
What Changed? Detecting and Evaluating Instruction-Guided Image Edits with Multimodal Large Language Models
Lorenzo Baraldi, Davide Bucciarelli, Federico Betti et al.
Joint Asymmetric Loss for Learning with Noisy Labels
Jialiang Wang, Xianming Liu, Xiong Zhou et al.
Imagined Autocurricula
Ahmet Hamdi Güzel, Matthew T Jackson, Jarek Liesen et al.
From Black-box to Causal-box: Towards Building More Interpretable Models
Inwoo Hwang, Yushu Pan, Elias Bareinboim
DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models
Revant Teotia, Candace Ross, Karen Ullrich et al.
Boosting Domain Incremental Learning: Selecting the Optimal Parameters is All You Need
Qiang Wang, Xiang Song, Yuhang He et al.
Attack by Yourself: Effective and Unnoticeable Multi-Category Graph Backdoor Attacks with Subgraph Triggers Pool
Jiangtong Li, Dongyi Liu, Kun Zhu et al.
UniHOPE: A Unified Approach for Hand-Only and Hand-Object Pose Estimation
Yinqiao Wang, Hao Xu, Pheng-Ann Heng et al.
Unlocking Constraints: Source-Free Occlusion-Aware Seamless Segmentation
Yihong Cao, Jiaming Zhang, Xu Zheng et al.
Exploiting Vocabulary Frequency Imbalance in Language Model Pre-training
Woojin Chung, Jeonghoon Kim
Path Gradients after Flow Matching
Lorenz Vaitl, Leon Klein
ALINE: Joint Amortization for Bayesian Inference and Active Data Acquisition
Daolang Huang, Xinyi Wen, Ayush Bharti et al.
MAVias: Mitigate any Visual Bias
Ioannis Sarridis, Christos Koutlis, Symeon Papadopoulos et al.
LookOut: Real-World Humanoid Egocentric Navigation
Boxiao Pan, Adam Harley, Francis Engelmann et al.
AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?
Shouwei Ruan, Hanqing Liu, Yao Huang et al.
From Objects to Events: Unlocking Complex Visual Understanding in Object Detectors via LLM-guided Symbolic Reasoning
Yuhui Zeng, Haoxiang Wu, Wenjie Nie et al.
PLA: Prompt Learning Attack against Text-to-Image Generative Models
XINQI LYU, Yihao LIU, Yanjie Li et al.
msf-CNN: Patch-based Multi-Stage Fusion with Convolutional Neural Networks for TinyML
Zhaolan Huang, Emmanuel Baccelli
No Thing, Nothing: Highlighting Safety-Critical Classes for Robust LiDAR Semantic Segmentation in Adverse Weather
Junsung Park, HwiJeong Lee, Inha Kang et al.
Holistic Tokenizer for Autoregressive Image Generation
Anlin Zheng, Haochen Wang, Yucheng Zhao et al.
Rao-Blackwell Gradient Estimators for Equivariant Denoising Diffusion
Vinh Tong, Trung-Dung Hoang, Anji Liu et al.
Rethinking Lanes and Points in Complex Scenarios for Monocular 3D Lane Detection
Yifan Chang, Junjie Huang, Xiaofeng Wang et al.
Restricted Spectral Gap Decomposition for Simulated Tempering Targeting Mixture Distributions
Jhanvi Garg, Krishnakumar Balasubramanian, Quan Zhou
Improving Sound Source Localization with Joint Slot Attention on Image and Audio
Inho Kim, YOUNGKIL SONG, Jicheol Park et al.
Constructing an Optimal Behavior Basis for the Option Keyboard
Lucas N. Alegre, Ana Bazzan, Andre Barreto et al.
Volumetric Surfaces: Representing Fuzzy Geometries with Layered Meshes
Stefano Esposito, Anpei Chen, Christian Reiser et al.
Draw Your Mind: Personalized Generation via Condition-Level Modeling in Text-to-Image Diffusion Models
Hyungjin Kim, Seokho Ahn, Young-Duk Seo
Diffusion Adaptive Text Embedding for Text-to-Image Diffusion Models
Byeonghu Na, Minsang Park, Gyuwon Sim et al.
Triad: Empowering LMM-based Anomaly Detection with Expert-guided Region-of-Interest Tokenizer and Manufacturing Process
Yuanze Li, Shihao Yuan, Haolin Wang et al.
LLM Meets Diffusion: A Hybrid Framework for Crystal Material Generation
Subhojyoti Khastagir, KISHALAY DAS, Pawan Goyal et al.
Auto-Encoded Supervision for Perceptual Image Super-Resolution
MinKyu Lee, Sangeek Hyun, Woojin Jun et al.
LOTA: Bit-Planes Guided AI-Generated Image Detection
Renxi Cheng, Hongsong Wang, Yang Zhang et al.
Trade-offs in Image Generation: How Do Different Dimensions Interact?
Sicheng Zhang, Binzhu Xie, Zhonghao Yan et al.
RaSS: Improving Denoising Diffusion Samplers with Reinforced Active Sampling Scheduler
Xin Ding, Lei Yu, Xin Li et al.
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception
Sanjoy Chowdhury, Subrata Biswas, Sayan Nag et al.
Layer as Puzzle Pieces: Compressing Large Language Models through Layer Concatenation
Fei Wang, Li Shen, Liang Ding et al.
TSP-Mamba: The Travelling Salesman Problem Meets Mamba for Image Super-resolution and Beyond
Kun Zhou, Xinyu Lin, Jiangbo Lu
One Filters All: A Generalist Filter For State Estimation
Shiqi Liu, Wenhan Cao, Chang Liu et al.
A Unified Framework for Motion Reasoning and Generation in Human Interaction
Jeongeun Park, Sungjoon Choi, Sangdoo Yun
VTimeCoT: Thinking by Drawing for Video Temporal Grounding and Reasoning
Jinglei Zhang, Yuanfan Guo, Rolandos Alexandros Potamias et al.
Progressive Growing of Video Tokenizers for Temporally Compact Latent Spaces
Aniruddha Mahapatra, Long Mai, David Bourgin et al.
CorrBEV: Multi-View 3D Object Detection by Correlation Learning with Multi-modal Prototypes
ziteng xue, Mingzhe Guo, Heng Fan et al.
Adaptive Kernel Design for Bayesian Optimization Is a Piece of CAKE with LLMs
Richard Suwandi, Feng Yin, Juntao Wang et al.
EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models
Yufei Cai, Hu Han, Yuxiang Wei et al.
SPARKE: Scalable Prompt-Aware Diversity and Novelty Guidance in Diffusion Models via RKE Score
Mohammad Jalali, Haoyu Lei, Amin Gohari et al.
HouseTour: A Virtual Real Estate A(I)gent
Ata Çelen, Iro Armeni, Daniel Barath et al.