Most Cited 2025 "subword tokenization" Papers
IntersectionZoo: Eco-driving for Benchmarking Multi-Agent Contextual Reinforcement Learning
Vindula Jayawardana, Baptiste Freydt, Ao Qu et al.
MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation
Aviral Chharia, Wenbo Gou, Haoye Dong
FlashMoE: Fast Distributed MoE in a Single Kernel
Osayamen Aimuyo, Byungsoo Oh, Rachee Singh
STAFF: Speculative Coreset Selection for Task-Specific Fine-tuning
Xiaoyu Zhang, Juan Zhai, Shiqing Ma et al.
UniWav: Towards Unified Pre-training for Speech Representation Learning and Generation
Alexander Liu, Sang-gil Lee, Chao-Han Huck Yang et al.
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders
Yuezhou Hu, Jiaxin Guo, Xinyu Feng et al.
IndustryEQA: Pushing the Frontiers of Embodied Question Answering in Industrial Scenarios
Yifan Li, Yuhang Chen, Anh Dao et al.
I2V3D: Controllable Image-to-video Generation with 3D Guidance
Zhiyuan Zhang, Dongdong Chen, Jing Liao
RUBIK: A Structured Benchmark for Image Matching across Geometric Challenges
Thibaut Loiseau, Guillaume Bourmaud
From Image to Video: An Empirical Study of Diffusion Representations
Pedro Vélez, Luisa Polania Cabrera, Yi Yang et al.
VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance
Mohammad Reza Taesiri, Abhijay Ghildyal, Saman Zadtootaghaj et al.
Higher-Order Graphon Neural Networks: Approximation and Cut Distance
Daniel Herbst, Stefanie Jegelka
The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation
Fredrik Carlsson, Fangyu Liu, Daniel Ward et al.
X-Scene: Large-Scale Driving Scene Generation with High Fidelity and Flexible Controllability
Yu Yang, Alan Liang, Jianbiao Mei et al.
Styl3R: Instant 3D Stylized Reconstruction for Arbitrary Scenes and Styles
Peng Wang, Xiang Liu, Peidong Liu
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
Fida Mohammad Thoker, Letian Jiang, Chen Zhao et al.
OSMamba: Omnidirectional Spectral Mamba with Dual-Domain Prior Generator for Exposure Correction
Gehui Li, Bin Chen, Chen Zhao et al.
Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation
Jitesh Jain, Zhengyuan Yang, Humphrey Shi et al.
RAGRouter: Learning to Route Queries to Multiple Retrieval-Augmented Language Models
Jiarui Zhang, Xiangyu Liu, Yong Hu et al.
UniEgoMotion: A Unified Model for Egocentric Motion Reconstruction, Forecasting, and Generation
Chaitanya Patel, Hiroki Nakamura, Yuta Kyuragi et al.
Gradient Inversion Attacks on Parameter-Efficient Fine-Tuning
Hasin Us Sami, Swapneel Sen, Amit K. Roy-Chowdhury et al.
Towards A Generalist Code Embedding Model Based On Massive Data Synthesis
Chaofan Li, Jianlyu Chen, Yingxia Shao et al.
Decomposing Interventional Causality into Synergistic, Redundant, and Unique Components
Abel Jansma
Going Beyond Feature Similarity: Effective Dataset distillation based on Class-aware Conditional Mutual Information
Xinhao Zhong, Bin Chen, Hao Fang et al.
Lie Detector: Unified Backdoor Detection via Cross-Examination Framework
Xuan Wang, Siyuan Liang, Dongping Liao et al.
Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering
shanlin sun, Yifan Wang, Hanwen Zhang et al.
Discovering Influential Neuron Path in Vision Transformers
Yifan Wang, Yifei Liu, Yingdong Shi et al.
SnowMaster: Comprehensive Real-world Image Desnowing via MLLM with Multi-Model Feedback Optimization
Jianyu LAI, Sixiang Chen, yunlong lin et al.
NeedleInATable: Exploring Long-Context Capability of Large Language Models towards Long-Structured Tables
Lanrui Wang, Mingyu Zheng, Hongyin Tang et al.
Single-pass Adaptive Image Tokenization for Minimum Program Search
Shivam Duggal, Sanghyun Byun, Bill Freeman et al.
Real-time Free-view Human Rendering from Sparse-view RGB Videos using Double Unprojected Textures
Guoxing Sun, Rishabh Dabral, Heming Zhu et al.
Copyright-Protected Language Generation via Adaptive Model Fusion
Javier Abad, Konstantin Donhauser, Francesco Pinto et al.
Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models
Xudong Li, Zihao Huang, Yan Zhang et al.
ESCAPE: Equivariant Shape Completion via Anchor Point Encoding
Burak Bekci, Nassir Navab, Federico Tombari et al.
AGC-Drive: A Large-Scale Dataset for Real-World Aerial-Ground Collaboration in Driving Scenarios
Yunhao Hou, Bochao Zou, Min Zhang et al.
Interpretable Global Minima of Deep ReLU Neural Networks on Sequentially Separable Data
Thomas Chen, Patricia Muñoz Ewald
QCircuitBench: A Large-Scale Dataset for Benchmarking Quantum Algorithm Design
Rui Yang, Ziruo Wang, Yuntian Gu et al.
Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency
Yukun Jiang, Mingjie Li, Michael Backes et al.
Common3D: Self-Supervised Learning of 3D Morphable Models for Common Objects in Neural Feature Space
Leonhard Sommer, Olaf Dünkel, Christian Theobalt et al.
Learnable Infinite Taylor Gaussian for Dynamic View Rendering
Bingbing Hu, Yanyan Li, rui xie et al.
Toward a Vision-Language Foundation Model for Medical Data: Multimodal Dataset and Benchmarks for Vietnamese PET/CT Report Generation
Tien Nguyen, Dac Nguyen, Duc Nguyen The Minh et al.
Multi-View 3D Point Tracking
Frano Rajič, Haofei Xu, Marko Mihajlovic et al.
LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling
Li Huaqiu, Yong Wang, Tongwen Huang et al.
Morpheus: Text-Driven 3D Gaussian Splat Shape and Color Stylization
Jamie Wynn, Zawar Qureshi, Jakub Powierza et al.
Self-Calibrated Variance-Stabilizing Transformations for Real-World Image Denoising
Sébastien Herbreteau, Michael Unser
From stability of Langevin diffusion to convergence of proximal MCMC for non-log-concave sampling
Marien Renaud, Valentin De Bortoli, Arthur Leclaire et al.
Rethinking Epistemic and Aleatoric Uncertainty for Active Open-Set Annotation: An Energy-Based Approach
Chen-Chen Zong, Sheng-Jun Huang
STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving
Christian Fruhwirth-Reisinger, Dušan Malić, Wei Lin et al.
Learning Streaming Video Representation via Multitask Training
Yibin Yan, Jilan Xu, Shangzhe Di et al.
Learning to Plan Before Answering: Self-Teaching LLMs to Learn Abstract Plans for Problem Solving
Jin Zhang, Flood Sung, Zhilin Yang et al.
Learning Affine Correspondences by Integrating Geometric Constraints
Pengju Sun, Banglei Guan, Zhenbao Yu et al.
OptiScene: LLM-driven Indoor Scene Layout Generation via Scaled Human-aligned Data Synthesis and Multi-Stage Preference Optimization
Yixuan Yang, Zhen Luo, Tongsheng Ding et al.
FlashSloth: Lightning Multimodal Large Language Models via Embedded Visual Compression
Bo Tong, Bokai Lai, Yiyi Zhou et al.
SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive Tokens
Chi Su, Xiaoxuan Ma, Jiajun Su et al.
Operator Deep Smoothing for Implied Volatility
Ruben Wiedemann, Antoine (Jack) Jacquier, Lukas Gonon
Solving Partial Differential Equations via Radon Neural Operator
Wenbin Lu, Yihan Chen, Junnan Xu et al.
Unveiling the Compositional Ability Gap in Vision-Language Reasoning Model
Tianle Li, Jihai Zhang, Yongming Rao et al.
Scalable Autoregressive Monocular Depth Estimation
Jinhong Wang, Jintai Chen, Jian liu et al.
Learning Linear Attention in Polynomial Time
Morris Yau, Ekin Akyürek, Jiayuan Mao et al.
Learning Mask Invariant Mutual Information for Masked Image Modeling
Tao Huang, Yanxiang Ma, Shan You et al.
Efficient Motion-Aware Video MLLM
Zijia Zhao, Yuqi Huo, Tongtian Yue et al.
OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering
Shiyong Liu, Xiao Tang, Zhihao Li et al.
Gatekeeper: Improving Model Cascades Through Confidence Tuning
Stephan Rabanser, Nathalie Rauschmayr, Achin Kulshrestha et al.
QuCOOP: A Versatile Framework for Solving Composite and Binary-Parametrised Problems on Quantum Annealers
Natacha Kuete Meli, Vladislav Golyanik, Marcel Seelbach Benkner et al.
CoT-lized Diffusion: Let's Reinforce T2I Generation Step-by-step
Zheyuan Liu, Munan Ning, Qihui Zhang et al.
Ges3ViG: Incorporating Pointing Gestures into Language-Based 3D Visual Grounding for Embodied Reference Understanding
Atharv Mahesh Mane, Dulanga Weerakoon, Vigneshwaran Subbaraju et al.
MixerMDM: Learnable Composition of Human Motion Diffusion Models
Pablo Ruiz-Ponce, German Barquero, Cristina Palmero et al.
HyPoGen: Optimization-Biased Hypernetworks for Generalizable Policy Generation
Hanxiang Ren, Li Sun, Xulong Wang et al.
Controllable 3D Outdoor Scene Generation via Scene Graphs
Yuheng Liu, Xinke Li, Yuning Zhang et al.
Synthetic-powered predictive inference
Meshi Bashari, Roy Maor Lotan, Yonghoon Lee et al.
TF-TI2I: Training-Free Text-and-Image-to-Image Generation via Multi-Modal Implicit-Context Learning In Text-to-Image Models
Teng-Fang Hsiao, Bo-Kai Ruan, Yi-Lun Wu et al.
Tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images
Jiuchen Chen, Xinyu Yan, Qizhi Xu et al.
ForgeLens: Data-Efficient Forgery Focus for Generalizable Forgery Image Detection
Yingjian Chen, Lei Zhang, Yakun Niu
MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness
Yunlong Tang, Pinxin Liu, Mingqian Feng et al.
CSI-Bench: A Large-Scale In-the-Wild Dataset for Multi-task WiFi Sensing
Guozhen Zhu, Yuqian Hu, Weihang Gao et al.
TopoPoint: Enhance Topology Reasoning via Endpoint Detection in Autonomous Driving
Yanping Fu, Xinyuan Liu, Tianyu Li et al.
PriorMotion: Generative Class-Agnostic Motion Prediction with Raster-Vector Motion Field Priors
Kangan Qian, Jinyu Miao, Xinyu Jiao et al.
Learning to Inference Adaptively for Multimodal Large Language Models
Zhuoyan Xu, Khoi Nguyen, Preeti Mukherjee et al.
Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation
Tiange Xiang, Kai Li, Chengjiang Long et al.
Brain Harmony: A Multimodal Foundation Model Unifying Morphology and Function into 1D Tokens
Zijian Dong, Ruilin Li, Joanna Chong et al.
STRATUS: A Multi-agent System for Autonomous Reliability Engineering of Modern Clouds
Yinfang Chen, Jiaqi Pan, Jackson Clark et al.
Extreme Risk Mitigation in Reinforcement Learning using Extreme Value Theory
Jan Drgona, Mahantesh Halappanavar, Frank Liu et al.
In the Eye of MLLM: Benchmarking Egocentric Video Intent Understanding with Gaze-Guided Prompting
Taiying Peng, Jiacheng Hua, Miao Liu et al.
Visually Consistent Hierarchical Image Classification
Seulki Park, Youren Zhang, Stella Yu et al.
Reconstructing Animals and the Wild
Peter Kulits, Michael J. Black, Silvia Zuffi
Joint Graph Rewiring and Feature Denoising via Spectral Resonance
Jonas Linkerhägner, Cheng Shi, Ivan Dokmanić
Rethinking Multi-modal Object Detection from the Perspective of Mono-Modality Feature Learning
Tianyi Zhao, Boyang Liu, Yanglei Gao et al.
RLGF: Reinforcement Learning with Geometric Feedback for Autonomous Driving Video Generation
Tianyi Yan, Wencheng Han, xia zhou et al.
RICCARDO: Radar Hit Prediction and Convolution for Camera-Radar 3D Object Detection
Yunfei Long, Abhinav Kumar, Xiaoming Liu et al.
zip2zip: Inference-Time Adaptive Tokenization via Online Compression
Saibo Geng, Nathan Ranchin, Yunzhen Yao et al.
Preserving Clusters in Prompt Learning for Unsupervised Domain Adaptation
Long Tung Vuong, Hoang Phan, Vy Vo et al.
Vision Transformers with Self-Distilled Registers
Zipeng Yan, Yinjie Chen, Chong Zhou et al.
Self-Supervised Selective-Guided Diffusion Model for Old-Photo Face Restoration
Wenjie Li, Xiangyi Wang, Heng Guo et al.
GCC: Generative Color Constancy via Diffusing a Color Checker
Chen-Wei Chang, Cheng-De Fan, Chia-Che Chang et al.
Let's Revise Step-by-Step: A Unified Local Search Framework for Code Generation with LLMs
Zhiyi Lyu, Jianguo Huang, Yanchen Deng et al.
One-Way Ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models
Senmao Li, Lei Wang, Kai Wang et al.
Outcome-Based Online Reinforcement Learning: Algorithms and Fundamental Limits
Fan Chen, Zeyu Jia, Alexander Rakhlin et al.
OvercookedV2: Rethinking Overcooked for Zero-Shot Coordination
Tobias Gessler, Tin Dizdarevic, Ani Calinescu et al.
Conformal Prediction for Ensembles: Improving Efficiency via Score-Based Aggregation
Yash Patel, Eduardo Ochoa Rivera, Ambuj Tewari
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning
Yang Yue, Yulin Wang, Chenxin Tao et al.
Learning Dynamic Collaborative Network for Semi-supervised 3D Vessel Segmentation
Jiao Xu, Xin Chen, Lihe Zhang
Non-Equilibrium Dynamics of Hybrid Continuous-Discrete Ground-State Sampling
Timothee Leleu, Sam Reifenstein
Learning to Unlearn while Retaining: Combating Gradient Conflicts in Machine Unlearning
Gaurav Patel, Qiang Qiu
Knowledge Bridger: Towards Training-Free Missing Modality Completion
Guanzhou Ke, Shengfeng He, Xiao-Li Wang et al.
Multi-modal Medical Diagnosis via Large-small Model Collaboration
Wanyi Chen, Zihua Zhao, Jiangchao Yao et al.
Identifiability of Deep Polynomial Neural Networks
Konstantin Usevich, Ricardo Borsoi, Clara Dérand et al.
MTL-KD: Multi-Task Learning Via Knowledge Distillation for Generalizable Neural Vehicle Routing Solver
yuepeng zheng, Fu Luo, Zhenkun Wang et al.
Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues
Francesco Taioli, Edoardo Zorzi, Gianni Franchi et al.
VertexRegen: Mesh Generation with Continuous Level of Detail
Xiang Zhang, Yawar Siddiqui, Armen Avetisyan et al.
Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation
Jie Xu, Na Zhao, Gang Niu et al.
Precise Parameter Localization for Textual Generation in Diffusion Models
Łukasz Staniszewski, Bartosz Cywiński, Franziska Boenisch et al.
EvaLearn: Quantifying the Learning Capability and Efficiency of LLMs via Sequential Problem Solving
Shihan Dou, Ming Zhang, Chenhao Huang et al.
All that structure matches does not glitter
Maya Martirossyan, Thomas Egg, Philipp Höllmer et al.
Learning to Steer: Input-dependent Steering for Multimodal LLMs
Jayneel Parekh, Pegah KHAYATAN, Mustafa Shukor et al.
ReGen: Generative Robot Simulation via Inverse Design
Peter (Phat) Nguyen, Johnson (Tsun-Hsuan) Wang, Zhang-Wei Hong et al.
T-FAKE: Synthesizing Thermal Images for Facial Landmarking
Philipp Flotho, Moritz Piening, Anna Kukleva et al.
Taxonomy-Aware Evaluation of Vision-Language Models
Vésteinn Snæbjarnarson, Kevin Du, Niklas Stoehr et al.
FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models
Yan Gao, Massimo R. Scamarcia, Javier Fernandez-Marques et al.
EVODiff: Entropy-aware Variance Optimized Diffusion Inference
Shigui Li, Wei Chen, Delu Zeng
Model Selection for Off-policy Evaluation: New Algorithms and Experimental Protocol
Pai Liu, Lingfeng Zhao, Shivangi Agarwal et al.
TurboTrain: Towards Efficient and Balanced Multi-Task Learning for Multi-Agent Perception and Prediction
Zewei Zhou, Zhihao Zhao, Tianhui Cai et al.
ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools
Shaofeng Yin, Ting Lei, Yang Liu
Protein Design with Dynamic Protein Vocabulary
Nuowei Liu, Jiahao Kuang, Yanting Liu et al.
AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction
Yuanbin Man, Ying Huang, Chengming Zhang et al.
Free-Form Motion Control: Controlling the 6D Poses of Camera and Objects in Video Generation
Xincheng Shuai, Henghui Ding, Zhenyuan Qin et al.
C-SEO Bench: Does Conversational SEO Work?
Haritz Puerto, Martin Gubri, Tommaso Green et al.
Continuous Locomotive Crowd Behavior Generation
Inhwan Bae, Junoh Lee, Hae-Gon Jeon
LineArt: A Knowledge-guided Training-free High-quality Appearance Transfer for Design Drawing with Diffusion Model
Xi Wang, Hongzhen Li, Heng Fang et al.
CAVIS: Context-Aware Video Instance Segmentation
Seunghun Lee, Jiwan Seo, Kiljoon Han et al.
CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting
Siyu Jiao, Haoye Dong, Yuyang Yin et al.
3EED: Ground Everything Everywhere in 3D
Rong Li, Yuhao Dong, Tianshuai Hu et al.
Rollout Roulette: A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods
Isha Puri, Shivchander Sudalairaj, Guangxuan Xu et al.
SD-VLM: Spatial Measuring and Understanding with Depth-Encoded Vision-Language Models
Pingyi Chen, Yujing Lou, Shen Cao et al.
Event Fields: Capturing Light Fields at High Speed, Resolution, and Dynamic Range
Ziyuan Qu, Zihao Zou, Vivek Boominathan et al.
BiM-VFI: Bidirectional Motion Field-Guided Frame Interpolation for Video with Non-uniform Motions
Wonyong Seo, Jihyong Oh, Munchurl Kim
PolyVivid: Vivid Multi-Subject Video Generation with Cross-Modal Interaction and Enhancement
Teng Hu, Zhentao Yu, Zhengguang Zhou et al.
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data
Haoxin Li, Boyang Li
EconGym: A Scalable AI Testbed with Diverse Economic Tasks
Qirui Mi, Qipeng Yang, Zijun Fan et al.
Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations
Jeonghyeon Kim, Sangheum Hwang
HyperSeg: Hybrid Segmentation Assistant with Fine-grained Visual Perceiver
Cong Wei, Haoxian Tan, Yujie Zhong et al.
Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness
Qifan Yu, Zhebei Shen, Zhongqi Yue et al.
PhyloVAE: Unsupervised Learning of Phylogenetic Trees via Variational Autoencoders
Tianyu Xie, David Harry Tyensoung Richman, Jiansi Gao et al.
SMoLoRA: Exploring and Defying Dual Catastrophic Forgetting in Continual Visual Instruction Tuning
Ziqi Wang, Chang Che, Qi Wang et al.
Functional Homotopy: Smoothing Discrete Optimization via Continuous Parameters for LLM Jailbreak Attacks
Zi Wang, Divyam Anshumaan, Ashish Hooda et al.
SIMS: Simulating Stylized Human-Scene Interactions with Retrieval-Augmented Script Generation
Wenjia Wang, Liang Pan, Zhiyang Dou et al.
NEP: Autoregressive Image Editing via Next Editing Token Prediction
Huimin Wu, Xiaojian (Shawn) Ma, Haozhe Zhao et al.
LiteReality: Graphic-Ready 3D Scene Reconstruction from RGB-D Scans
Zhening Huang, Xiaoyang Wu, Fangcheng Zhong et al.
FSNet: Feasibility-Seeking Neural Network for Constrained Optimization with Guarantees
Hoang Nguyen, Priya Donti
SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors
chen yang, Hui Wang, Shiyao Wang et al.
GPLQ: A General, Practical, and Lightning QAT Method for Vision Transformers
Guang Liang, Xinyao Liu, Jianxin Wu
DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking head Video Generation
Hanbo Cheng, Limin Lin, Chenyu Liu et al.
Satellite to GroundScape - Large-scale Consistent Ground View Generation from Satellite Views
Ningli Xu, Rongjun Qin
ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers
Qianhao Yuan, Qingyu Zhang, yanjiang liu et al.
Dynamic Risk Assessments for Offensive Cybersecurity Agents
Boyi Wei, Benedikt Stroebl, Jiacen Xu et al.
PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning
Song Wang, Xiaolu Liu, Lingdong Kong et al.
Complexity Lower Bounds of Adaptive Gradient Algorithms for Non-convex Stochastic Optimization under Relaxed Smoothness
Michael Crawshaw, Mingrui Liu
Subnet-Aware Dynamic Supernet Training for Neural Architecture Search
Jeimin Jeon, Youngmin Oh, Junghyup Lee et al.
Range, not Independence, Drives Modularity in Biologically Inspired Representations
Will Dorrell, Kyle Hsu, Luke Hollingsworth et al.
G1: Teaching LLMs to Reason on Graphs with Reinforcement Learning
Xiaojun Guo, Ang Li, Yifei Wang et al.
Metropolis Adjusted Microcanonical Hamiltonian Monte Carlo
Jakob Robnik, Reuben Cohn-Gordon, Uros Seljak
D3: Training-Free AI-Generated Video Detection Using Second-Order Features
Chende Zheng, Ruiqi suo, Chenhao Lin et al.
Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better
Zihang Lai, Andrea Vedaldi
Resilient Sensor Fusion Under Adverse Sensor Failures via Multi-Modal Expert Fusion
Konyul Park, Yecheol Kim, Daehun Kim et al.
Geometry in Style: 3D Stylization via Surface Normal Deformation
Nam Anh Dinh, Itai Lang, Hyunwoo Kim et al.
Adaptive Prediction-Powered AutoEval with Reliability and Efficiency Guarantees
Sangwoo Park, Matteo Zecchin, Osvaldo Simeone
Sparta Alignment: Collectively Aligning Multiple Language Models through Combat
Yuru Jiang, Wenxuan Ding, Shangbin Feng et al.
SAM4D: Segment Anything in Camera and LiDAR Streams
Jianyun Xu, Song Wang, Ziqian Ni et al.
BokehDiff: Neural Lens Blur with One-Step Diffusion
Chengxuan Zhu, Qingnan Fan, Qi Zhang et al.
What are you sinking? A geometric approach on attention sink
Valeria Ruscio, Umberto Nanni, Fabrizio Silvestri
Understanding Contrastive Learning via Gaussian Mixture Models
Parikshit Bansal, Ali Kavis, Sujay Sanghavi
CMMLoc: Advancing Text-to-PointCloud Localization with Cauchy-Mixture-Model Based Framework
Yanlong Xu, Haoxuan Qu, Jun Liu et al.
VeriLoC: Line-of-Code Level Prediction of Hardware Design Quality from Verilog Code
Raghu Vamshi Hemadri, Jitendra Bhandari, Andre Nakkab et al.
The Devil is in Low-Level Features for Cross-Domain Few-Shot Segmentation
Yuhan Liu, Yixiong Zou, Yuhua Li et al.
FedRTS: Federated Robust Pruning via Combinatorial Thompson Sampling
Hong Huang, Jinhai Yang, Yuan Chen et al.
Let Humanoids Hike! Integrative Skill Development on Complex Trails
Kwan-Yee Lin, Stella X. Yu
CAML: Collaborative Auxiliary Modality Learning for Multi-Agent Systems
Rui Liu, Yu Shen, Peng Gao et al.
Buffer Anytime: Zero-Shot Video Depth and Normal from Image Priors
Zhengfei Kuang, Tianyuan Zhang, Kai Zhang et al.
SparseRecon: Neural Implicit Surface Reconstruction from Sparse Views with Feature and Depth Consistencies
Liang Han, Xu Zhang, Haichuan Song et al.
Ev-3DOD: Pushing the Temporal Boundaries of 3D Object Detection with Event Cameras
Hoonhee Cho, Jae-Young Kang, Youngho Kim et al.
Event Ellipsometer: Event-based Mueller-Matrix Video Imaging
Ryota Maeda, Yunseong Moon, Seung-Hwan Baek
Learning Physics-Based Full-Body Human Reaching and Grasping from Brief Walking References
Yitang Li, Mingxian Lin, Zhuo Lin et al.
Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation
Jungeun Kim, Hyeongwoo Jeon, Jongseong Bae et al.
Mitigating Sexual Content Generation via Embedding Distortion in Text-conditioned Diffusion Models
Jaesin Ahn, Heechul Jung
IDF: Iterative Dynamic Filtering Networks for Generalizable Image Denoising
Dongjin Kim, Jaekyun Ko, Muhammad Kashif Ali et al.
Arti-PG: A Toolbox for Procedurally Synthesizing Large-Scale and Diverse Articulated Objects with Rich Annotations
Jianhua Sun, Yuxuan Li, Jiude Wei et al.
LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities
Florian Sestak, Artur Toshev, Andreas Fürst et al.
Regression-adjusted Monte Carlo Estimators for Shapley Values and Probabilistic Values
R. Teal Witter, Yurong Liu, Christopher Musco
Taste More, Taste Better: Diverse Data and Strong Model Boost Semi-Supervised Crowd Counting
Maochen Yang, Zekun Li, Jian Zhang et al.
AdmTree: Compressing Lengthy Context with Adaptive Semantic Trees
Yangning Li, Shaoshen Chen, Yinghui Li et al.
Preserve or Modify? Context-Aware Evaluation for Balancing Preservation and Modification in Text-Guided Image Editing
Yoonjeon Kim, Soohyun Ryu, Yeonsung Jung et al.
Balanced Direction from Multifarious Choices: Arithmetic Meta-Learning for Domain Generalization
Xiran Wang, Jian Zhang, Lei Qi et al.
Enhancing Robust Fairness via Confusional Spectral Regularization
Gaojie Jin, Sihao Wu, Jiaxu Liu et al.
CORE: Reducing UI Exposure in Mobile Agents via Collaboration Between Cloud and Local LLMs
Gucongcong Fan, Chaoyue Niu, Chengfei Lyu et al.
CLIPGaussian: Universal and Multimodal Style Transfer Based on Gaussian Splatting
Kornel Howil, Joanna Waczynska, Piotr Borycki et al.
ARGUS: Hallucination and Omission Evaluation in Video-LLMs
Ruchit Rawal, Reza Shirkavand, Heng Huang et al.
MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models
Young-Jun Lee, Byung-Kwan Lee, Jianshu Zhang et al.
LidarGait++: Learning Local Features and Size Awareness from LiDAR Point Clouds for 3D Gait Recognition
Chuanfu Shen, Rui Wang, Lixin Duan et al.
DualEqui: A Dual-Space Hierarchical Equivariant Network for Large Biomolecules
Junjie Xu, Jiahao Zhang, Mangal Prakash et al.
Impact of Dataset Properties on Membership Inference Vulnerability of Deep Transfer Learning
Marlon Tobaben, Hibiki Ito, Joonas Jälkö et al.
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations
Yujia Zhang, Xiaoyang Wu, Yixing Lao et al.