Most Cited 2025 "space-to-object regression" Papers
DynamicVL: Benchmarking Multimodal Large Language Models for Dynamic City Understanding
Weihao Xuan, Junjue Wang, Heli Qi et al.
Near, far: Patch-ordering enhances vision foundation models' scene understanding
Valentinos Pariza, Mohammadreza Salehi, Gertjan J Burghouts et al.
g3D-LF: Generalizable 3D-Language Feature Fields for Embodied Tasks
Zihan Wang, Gim Hee Lee
Data Taggants: Dataset Ownership Verification Via Harmless Targeted Data Poisoning
Wassim Bouaziz, Nicolas Usunier, El-Mahdi El-Mhamdi
Exploring CLIP's Dense Knowledge for Weakly Supervised Semantic Segmentation
Zhiwei Yang, Yucong Meng, Kexue Fu et al.
Injecting Universal Jailbreak Backdoors into LLMs in Minutes
Zhuowei Chen, Qiannan Zhang, Shichao Pei
GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis
Bo Liu, Ke Zou, Li-Ming Zhan et al.
As large as it gets – Studying Infinitely Large Convolutions via Neural Implicit Frequency Filters
Margret Keuper, Julia Grabinski, Janis Keuper
Fast and Slow Streams for Online Time Series Forecasting Without Information Leakage
Ying-yee Ava Lau, Zhiwen Shao, Dit-Yan Yeung
Foundations of Top-$k$ Decoding for Language Models
Georgy Noarov, Soham Mallick, Tao Wang et al.
Multi-clue Consistency Learning to Bridge Gaps Between General and Oriented Object in Semi-supervised Detection
Chenxu Wang, Chunyan Xu, Xiang Li et al.
LoTUS: Large-Scale Machine Unlearning with a Taste of Uncertainty
Christoforos N. Spartalis, Theodoros Semertzidis, Efstratios Gavves et al.
REPA Works Until It Doesn’t: Early-Stopped, Holistic Alignment Supercharges Diffusion Training
Ziqiao Wang, Wangbo Zhao, Yuhao Zhou et al.
On the Expressiveness of Rational ReLU Neural Networks With Bounded Depth
Gennadiy Averkov, Christopher Hojny, Maximilian Merkert
SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning
Jiaqi Huang, Zunnan Xu, Jun Zhou et al.
BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning
Han Zhong, Yutong Yin, Shenao Zhang et al.
ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL
Yang Qin, Chao Chen, Zhihang Fu et al.
Whole-Body Conditioned Egocentric Video Prediction
Yutong Bai, Danny Tran, Amir Bar et al.
Scaling Embedding Layers in Language Models
Da Yu, Edith Cohen, Badih Ghazi et al.
Differentially Private Steering for Large Language Model Alignment
Anmol Goel, Yaxi Hu, Iryna Gurevych et al.
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback
Zaid Khan, Elias Stengel-Eskin, Jaemin Cho et al.
Dehaze-RetinexGAN: Real-World Image Dehazing via Retinex-based Generative Adversarial Network
Xinran Wang, Guang Yang, Tian Ye et al.
Adversarial Generative Flow Network for Solving Vehicle Routing Problems
Ni Zhang, Jingfeng Yang, Zhiguang Cao et al.
Decoding Game: On Minimax Optimality of Heuristic Text Generation Strategies
Sijin Chen, Omar Hagrass, Jason Klusowski
DataMan: Data Manager for Pre-training Large Language Models
Ru Peng, Kexin Yang, Yawen Zeng et al.
Efficient stagewise pretraining via progressive subnetworks
Abhishek Panigrahi, Nikunj Saunshi, Kaifeng Lyu et al.
Deeply Supervised Flow-Based Generative Models
Inkyu Shin, Chenglin Yang, Liang-Chieh Chen
Skip Tuning: Pre-trained Vision-Language Models are Effective and Efficient Adapters Themselves
Shihan Wu, Ji Zhang, Pengpeng Zeng et al.
Multirate Neural Image Compression with Adaptive Lattice Vector Quantization
Hao Xu, Xiaolin Wu, Xi Zhang
RealisHuman: A Two-Stage Approach for Refining Malformed Human Parts in Generated Images
Benzhi Wang, Jingkai Zhou, Jingqi Bai et al.
GaussMark: A Practical Approach for Structural Watermarking of Language Models
Adam Block, Alexander Rakhlin, Ayush Sekhari
LoCoDL: Communication-Efficient Distributed Learning with Local Training and Compression
Laurent Condat, Artavazd Maranjyan, Peter Richtarik
DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding
Henry Zheng, Hao Shi, Qihang Peng et al.
Can Textual Gradient Work in Federated Learning?
Minghui Chen, Ruinan Jin, Wenlong Deng et al.
NeuralSVG: An Implicit Representation for Text-to-Vector Generation
Sagi Polaczek, Yuval Alaluf, Elad Richardson et al.
GPS: A Probabilistic Distributional Similarity with Gumbel Priors for Set-to-Set Matching
Ziming Zhang, Fangzhou Lin, Haotian Liu et al.
Balancing Multimodal Training Through Game-Theoretic Regularization
Konstantinos Kontras, Thomas Strypsteen, Christos Chatzichristos et al.
Medical Multimodal Model Stealing Attacks via Adversarial Domain Alignment
Yaling Shen, Zhixiong Zhuang, Kun Yuan et al.
ROICtrl: Boosting Instance Control for Visual Generation
Yuchao Gu, Yipin Zhou, Yunfan Ye et al.
Sherlock: Self-Correcting Reasoning in Vision-Language Models
Yi Ding, Ruqi Zhang
ModeSeq: Taming Sparse Multimodal Motion Prediction with Sequential Mode Modeling
Zikang Zhou, Hengjian Zhou, Haibo Hu et al.
Ultra-Resolution Adaptation with Ease
Ruonan Yu, Songhua Liu, Zhenxiong Tan et al.
Gaussian Splatting for Efficient Satellite Image Photogrammetry
Luca Savant Aira, Gabriele Facciolo, Thibaud Ehret
DeNVeR: Deformable Neural Vessel Representations for Unsupervised Video Vessel Segmentation
Chun-Hung Wu, Shih-Hong Chen, Chih Yao Hu et al.
Real-time High-fidelity Gaussian Human Avatars with Position-based Interpolation of Spatially Distributed MLPs
Youyi Zhan, Tianjia Shao, Yin Yang et al.
Linear Attention Modeling for Learned Image Compression
Donghui Feng, Zhengxue Cheng, Shen Wang et al.
ProbPose: A Probabilistic Approach to 2D Human Pose Estimation
Miroslav Purkrábek, Jiri Matas
MAGE: Model-Level Graph Neural Networks Explanations via Motif-based Graph Generation
Zhaoning Yu, Hongyang Gao
MMSearch: Unveiling the Potential of Large Models as Multi-modal Search Engines
Dongzhi Jiang, Renrui Zhang, Ziyu Guo et al.
FlashMD: long-stride, universal prediction of molecular dynamics
Filippo Bigi, Sanggyu Chong, Agustinus Kristiadi et al.
Debiasing Multimodal Large Language Models via Noise-Aware Preference Optimization
Zefeng Zhang, Hengzhu Tang, Jiawei Sheng et al.
Learning Bijective Surface Parameterization for Inferring Signed Distance Functions from Sparse Point Clouds with Grid Deformation
Takeshi Noda, Chao Chen, Junsheng Zhou et al.
Vision-centric Token Compression in Large Language Model
Ling Xing, Alex Jinpeng Wang, Rui Yan et al.
SMT: Fine-Tuning Large Language Models with Sparse Matrices
Haoze He, Juncheng Li, Xuan Jiang et al.
Motion-aware Contrastive Learning for Temporal Panoptic Scene Graph Generation
Thong Thanh Nguyen, Xiaobao Wu, Yi Bin et al.
Modeling Cell Dynamics and Interactions with Unbalanced Mean Field Schrödinger Bridge
Zhenyi Zhang, Zihan Wang, Yuhao Sun et al.
ARM: Appearance Reconstruction Model for Relightable 3D Generation
Xiang Feng, Chang Yu, Zoubin Bi et al.
SwiftTry: Fast and Consistent Video Virtual Try-On with Diffusion Models
Hung Nguyen, Quang Qui-Vinh Nguyen, Khoi Nguyen et al.
SEMU: Singular Value Decomposition for Efficient Machine Unlearning
Marcin Sendera, Łukasz Struski, Kamil Książek et al.
Arbitrary Reading Order Scene Text Spotter with Local Semantics Guidance
Jiahao Lyu, Wei Wang, Dongbao Yang et al.
SMITE: Segment Me In TimE
Amirhossein Alimohammadi, Sauradip Nag, Saeid Asgari et al.
LLM Strategic Reasoning: Agentic Study through Behavioral Game Theory
Jingru Jia, Zehua Yuan, Junhao Pan et al.
ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering
Yuki Imajuku, Kohki Horie, Yoichi Iwata et al.
Glauber Generative Model: Discrete Diffusion Models via Binary Classification
Harshit Varma, Dheeraj Nagaraj, Karthikeyan Shanmugam
Unified Uncertainty-Aware Diffusion for Multi-Agent Trajectory Modeling
Guillem Capellera, Antonio Rubio, Luis Ferraz et al.
Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment
Ziteng Cui, Xuangeng Chu, Tatsuya Harada
Among Us: A Sandbox for Measuring and Detecting Agentic Deception
Satvik Golechha, Adrià Garriga-Alonso
VORTA: Efficient Video Diffusion via Routing Sparse Attention
Wenhao Sun, Rong-Cheng Tu, Yifu Ding et al.
Fine-Grained Erasure in Text-to-Image Diffusion-based Foundation Models
Kartik Thakral, Tamar Glaser, Tal Hassner et al.
Valid Conformal Prediction for Dynamic GNNs
Ed Davis, Ian Gallagher, Daniel Lawson et al.
PurpCode: Reasoning for Safer Code Generation
Jiawei Liu, Nirav Diwan, Zhe Wang et al.
Monocular and Generalizable Gaussian Talking Head Animation
Shengjie Gong, Haojie Li, Jiapeng Tang et al.
Do Computer Vision Foundation Models Learn the Low-level Characteristics of the Human Visual System?
Yancheng Cai, Fei Yin, Dounia Hammou et al.
Activation-Informed Merging of Large Language Models
Amin Heyrani Nobari, Kaveh Alimohammadi, Ali ArjomandBigdeli et al.
DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding
Jungbin Cho, Junwan Kim, Jisoo Kim et al.
Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration
Max Wilcoxson, Qiyang Li, Kevin Frans et al.
Uncertainty Modeling in Graph Neural Networks via Stochastic Differential Equations
Richard Bergna, Sergio Calvo Ordoñez, Felix Opolka et al.
Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation
Adil Kaan Akan, Yucel Yemez
Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models
Zhaoyi Liu, Huan Zhang
AutoElicit: Using Large Language Models for Expert Prior Elicitation in Predictive Modelling
Alexander Capstick, Rahul G. Krishnan, Payam Barnaghi
FreSh: Frequency Shifting for Accelerated Neural Representation Learning
Adam Kania, Marko Mihajlovic, Sergey Prokudin et al.
COLUMBUS: Evaluating COgnitive Lateral Understanding Through Multiple-Choice reBUSes
Koen Kraaijveld, Yifan Jiang, Kaixin Ma et al.
Position: We Need An Algorithmic Understanding of Generative AI
Oliver Eberle, Thomas McGee, Hamza Giaffar et al.
Turbo3D: Ultra-fast Text-to-3D Generation
Hanzhe Hu, Tianwei Yin, Fujun Luan et al.
Unified Multimodal Understanding via Byte-Pair Visual Encoding
Wanpeng Zhang, Yicheng Feng, Hao Luo et al.
Data-Juicer Sandbox: A Feedback-Driven Suite for Multimodal Data-Model Co-development
Daoyuan Chen, Haibin Wang, Yilun Huang et al.
Enhancing SAM with Efficient Prompting and Preference Optimization for Semi-supervised Medical Image Segmentation
Aishik Konwer, Zhijian Yang, Erhan Bas et al.
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis
Ulyana Piterbarg, Lerrel Pinto, Rob Fergus
MimeQA: Towards Socially-Intelligent Nonverbal Foundation Models
Hengzhi Li, Megan Tjandrasuwita, Yi R. (May) Fung et al.
Evaluating Neuron Explanations: A Unified Framework with Sanity Checks
Tuomas Oikarinen, Ge Yan, Lily Weng
DISCO: learning to DISCover an evolution Operator for multi-physics-agnostic prediction
Rudy Morel, Jiequn Han, Edouard Oyallon
UltraHR-100K: Enhancing UHR Image Synthesis with A Large-Scale High-Quality Dataset
Chen Zhao, En Ci, Yunzhe Xu et al.
Doubly Contrastive Learning for Source-Free Domain Adaptive Person Search
Yizhen Jia, Rong Quan, Yue Feng et al.
Janus-Pro-R1: Advancing Collaborative Visual Comprehension and Generation via Reinforcement Learning
Kaihang Pan, Yang Wu, Wendong Bu et al.
HAMoBE: Hierarchical and Adaptive Mixture of Biometric Experts for Video-based Person ReID
Yiyang Su, Yunping Shi, Feng Liu et al.
Cross-modal Causal Relation Alignment for Video Question Grounding
Weixing Chen, Yang Liu, Binglin Chen et al.
GenesisTex2: Stable, Consistent and High-Quality Text-to-Texture Generation
Jiawei Lu, YingPeng Zhang, Zengjun Zhao et al.
Hyperbolic Category Discovery
Yuanpei Liu, Zhenqi He, Kai Han
BRAID: Input-driven Nonlinear Dynamical Modeling of Neural-Behavioral Data
Parsa Vahidi, Omid G. Sani, Maryam Shanechi
DGQ: Distribution-Aware Group Quantization for Text-to-Image Diffusion Models
Hyogon Ryu, NaHyeon Park, Hyunjung Shim
JAFAR: Jack up Any Feature at Any Resolution
Paul Couairon, Loïck Chambon, Louis Serrano et al.
HUMOTO: A 4D Dataset of Mocap Human Object Interactions
Jiaxin Lu, Chun-Hao Huang, Uttaran Bhattacharya et al.
Geometry Aware Operator Transformer as an efficient and accurate neural surrogate for PDEs on arbitrary domains
Shizheng Wen, Arsh Kumbhat, Levi Lingsch et al.
Dynamic Updates for Language Adaptation in Visual-Language Tracking
Xiaohai Li, Bineng Zhong, Qihua Liang et al.
LookCloser: Frequency-aware Radiance Field for Tiny-Detail Scene
Xiaoyu Zhang, Weihong Pan, Chong Bao et al.
Towards Stable and Storage-efficient Dataset Distillation: Matching Convexified Trajectory
Wenliang Zhong, Haoyu Tang, Qinghai Zheng et al.
Rethinking Verification for LLM Code Generation: From Generation to Testing
Zihan Ma, Taolin Zhang, Maosong Cao et al.
Panorama Generation From NFoV Image Done Right
Dian Zheng, Cheng Zhang, Xiao-Ming Wu et al.
Enhancing 3D Gaze Estimation in the Wild using Weak Supervision with Gaze Following Labels
Pierre Vuillecard, Jean-Marc Odobez
Symbolic regression via MDLformer-guided search: from minimizing prediction error to minimizing description length
Zihan Yu, Jingtao Ding, Yong Li et al.
DuMo: Dual Encoder Modulation Network for Precise Concept Erasure
Feng Han, Kai Chen, Chao Gong et al.
SynFER: Towards Boosting Facial Expression Recognition with Synthetic Data
Xilin He, Cheng Luo, Xiaole Xian et al.
What Do Latent Action Models Actually Learn?
Chuheng Zhang, Tim Pearce, Pushi Zhang et al.
Finite-Sample Analysis of Policy Evaluation for Robust Average Reward Reinforcement Learning
Yang Xu, Washim Mondal, Vaneet Aggarwal
Segment Any 3D Object with Language
Seungjun Lee, Yuyang Zhao, Gim H Lee
Noisy Label Calibration for Multi-View Classification
Shilin Xu, Yuan Sun, Xingfeng Li et al.
GSRF: Complex-Valued 3D Gaussian Splatting for Efficient Radio-Frequency Data Synthesis
Kang Yang, Gaofeng Dong, Sijie Ji et al.
Out of Length Text Recognition with Sub-String Matching
Yongkun Du, Zhineng Chen, Caiyan Jia et al.
PIG: Physics-Informed Gaussians as Adaptive Parametric Mesh Representations
Namgyu Kang, Jaemin Oh, Youngjoon Hong et al.
PalmBench: A Comprehensive Benchmark of Compressed Large Language Models on Mobile Platforms
Yilong Li, Jingyu Liu, Hao Zhang et al.
Privacy amplification by random allocation
Moshe Shenfeld, Vitaly Feldman
Triples as the Key: Structuring Makes Decomposition and Verification Easier in LLM-based TableQA
Zhen Yang, Ziwei Du, Minghan Zhang et al.
EchoShot: Multi-Shot Portrait Video Generation
Jiahao Wang, Hualian Sheng, Sijia Cai et al.
Expensive Multi-Objective Bayesian Optimization Based on Diffusion Models
Bingdong Li, Zixiang Di, Yongfan Lu et al.
Detail-Preserving Latent Diffusion for Stable Shadow Removal
Jiamin Xu, Yuxin Zheng, Zelong Li et al.
SVGBuilder: Component-Based Colored SVG Generation with Text-Guided Autoregressive Transformers
Zehao Chen, Rong Pan
Ringmaster ASGD: The First Asynchronous SGD with Optimal Time Complexity
Artavazd Maranjyan, Alexander Tyurin, Peter Richtarik
Towards Generalizable Scene Change Detection
Jae-Woo Kim, Ue-Hwan Kim
DiffLocks: Generating 3D Hair from a Single Image using Diffusion Models
Radu Alexandru Rosu, Keyu Wu, Yao Feng et al.
Solving Robust Markov Decision Processes: Generic, Reliable, Efficient
Tobias Meggendorfer, Maximilian Weininger, Patrick Wienhöft
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization
Zitang Zhou, Ke Mei, Yu Lu et al.
AgroBench: Vision-Language Model Benchmark in Agriculture
Risa Shinoda, Nakamasa Inoue, Hirokatsu Kataoka et al.
Progress-Aware Video Frame Captioning
Zihui Xue, Joungbin An, Xitong Yang et al.
Language Driven Occupancy Prediction
Zhu Yu, Bowen Pang, Lizhe Liu et al.
CoreGuard: Safeguarding Foundational Capabilities of LLMs Against Model Stealing in Edge Deployment
Qinfeng Li, Tianyue Luo, Xuhong Zhang et al.
GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs
Xinli Xu, Wenhang Ge, Dicong Qiu et al.
Adaptive Calibration: A Unified Conversion Framework of Spiking Neural Networks
Ziqing Wang, Yuetong Fang, Jiahang Cao et al.
Second Order Bounds for Contextual Bandits with Function Approximation
Aldo Pacchiano
DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting
Seungjun Lee, Gim Hee Lee
A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image Generation
Liang Chen, Sinan Tan, Zefan Cai et al.
InteractAnything: Zero-shot Human Object Interaction Synthesis via LLM Feedback and Object Affordance Parsing
Jinlu Zhang, Yixin Chen, Zan Wang et al.
Gradient-Guided Annealing for Domain Generalization
Aristotelis Ballas, Christos Diou
Learning Complex Heterogeneous Multimodal Fake News via Social Latent Network Inference
Mingxin Li, Yuchen Zhang, Haowei Xu et al.
FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement
Ian Huang, Yanan Bao, Karen Truong et al.
ESE: Espresso Sentence Embeddings
Xianming Li, Zongxi Li, Jing Li et al.
Corvid: Improving Multimodal Large Language Models Towards Chain-of-Thought Reasoning
Jingjing Jiang, Chao Ma, Xurui Song et al.
Towards Real Unsupervised Anomaly Detection Via Confident Meta-Learning
Muhammad Aqeel, Shakiba Sharifi, Marco Cristani et al.
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization
Hongrui Jia, Chaoya Jiang, Haiyang Xu et al.
Generalizable Sensor-Based Activity Recognition via Categorical Concept Invariant Learning
Di Xiong, Shuoyuan Wang, Lei Zhang et al.
Cross-Domain Graph Data Scaling: A Showcase with Diffusion Models
Wenzhuo Tang, Haitao Mao, Danial Dervovic et al.
DAMO: Decoding by Accumulating Activations Momentum for Mitigating Hallucinations in Vision-Language Models
Kaishen Wang, Hengrui Gu, Meijun Gao et al.
Sculpting Features from Noise: Reward-Guided Hierarchical Diffusion for Task-Optimal Feature Transformation
Nanxu Gong, Zijun Li, Sixun Dong et al.
Learning Fine-Grained Representations through Textual Token Disentanglement in Composed Video Retrieval
Yue Wu, Zhaobo Qi, Yiling Wu et al.
Detecting Visual Information Manipulation Attacks in Augmented Reality: A Multimodal Semantic Reasoning Approach
Yanming Xiu, Maria Gorlatova
Generating Freeform Endoskeletal Robots
Muhan Li, Lingji Kong, Sam Kriegman
SemGeoMo: Dynamic Contextual Human Motion Generation with Semantic and Geometric Guidance
Peishan Cong, Ziyi Wang, Yuexin Ma et al.
FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion
Haosen Yang, Adrian Bulat, Isma Hadji et al.
HQGS: High-Quality Novel View Synthesis with Gaussian Splatting in Degraded Scenes
Xin Lin, Shi Luo, Xiaojun Shan et al.
PQMass: Probabilistic Assessment of the Quality of Generative Models using Probability Mass Estimation
Pablo Lemos, Sammy Sharief, Nikolay Malkin et al.
Pursuing Feature Separation based on Neural Collapse for Out-of-Distribution Detection
Yingwen Wu, Ruiji Yu, Xinwen Cheng et al.
Federated Continual Instruction Tuning
Haiyang Guo, Fanhu Zeng, Fei Zhu et al.
CausalRivers - Scaling up benchmarking of causal discovery for real-world time-series
Gideon Stein, Maha Shadaydeh, Jan Blunk et al.
Federated Class-Incremental Learning: A Hybrid Approach Using Latent Exemplars and Data-Free Techniques to Address Local and Global Forgetting
Milad Khademi Nori, Il-Min Kim, Guanghui Wang
A multiscale analysis of mean-field transformers in the moderate interaction regime
Giuseppe Bruno, Federico Pasqualotto, Andrea Agazzi
A Comprehensive Evaluation on Event Reasoning of Large Language Models
Zhengwei Tao, Zhi Jin, Yifan Zhang et al.
GraphLand: Evaluating Graph Machine Learning Models on Diverse Industrial Data
Gleb Bazhenov, Oleg Platonov, Liudmila Prokhorenkova
Robustness Auditing for Linear Regression: To Singularity and Beyond
Ittai Rubinstein, Samuel Hopkins
Emergence and Evolution of Interpretable Concepts in Diffusion Models
Berk Tinaz, Zalan Fabian, Mahdi Soltanolkotabi
Circuit Transformer: A Transformer That Preserves Logical Equivalence
Xihan Li, Xing Li, Lei Chen et al.
PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer
Pierre-David Letourneau, Manish Singh, Hsin-Pai Cheng et al.
Distilling Structured Rationale from Large Language Models to Small Language Models for Abstractive Summarization
Linyong Wang, Lianwei Wu, Shaoqi Song et al.
RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness
Fanhu Zeng, Haiyang Guo, Fei Zhu et al.
Neighborhood Self-Dissimilarity Attention for Medical Image Segmentation
Junren Chen, Rui Chen, Wei Wang et al.
Multi-Perspective Data Augmentation for Few-shot Object Detection
Anh-Khoa Nguyen Vu, Quoc Truong Truong, Vinh-Tiep Nguyen et al.
System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts
Xiaoqiang Wang, Suyuchen Wang, Yun Zhu et al.
Conformal Linguistic Calibration: Trading-off between Factuality and Specificity
Zhengping Jiang, Anqi Liu, Ben Van Durme
TIMotion: Temporal and Interactive Framework for Efficient Human-Human Motion Generation
Yabiao Wang, Shuo Wang, Jiangning Zhang et al.
Training-Free Constrained Generation With Stable Diffusion Models
Stefano Zampini, Jacob K Christopher, Luca Oneto et al.
Validating LLM-as-a-Judge Systems under Rating Indeterminacy
Luke Guerdan, Solon Barocas, Kenneth Holstein et al.
Keeping an Eye on LLM Unlearning: The Hidden Risk and Remedy
Jie Ren, Zhenwei Dai, Xianfeng Tang et al.
InterAct: Advancing Large-Scale Versatile 3D Human-Object Interaction Generation
Sirui Xu, Dongting Li, Yucheng Zhang et al.
PostCast: Generalizable Postprocessing for Precipitation Nowcasting via Unsupervised Blurriness Modeling
Junchao Gong, Siwei Tu, Weidong Yang et al.
AnoLLM: Large Language Models for Tabular Anomaly Detection
Che-Ping Tsai, Ganyu Teng, Phillip Wallis et al.
Adaptive Draft-Verification for Efficient Large Language Model Decoding
Xukun Liu, Bowen Lei, Ruqi Zhang et al.
MMReason: An Open-Ended Multi-Modal Multi-Step Reasoning Benchmark for MLLMs Toward AGI
Huanjin Yao, Jiaxing Huang, Yawen Qiu et al.
BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models
Xingyu Zheng, Xianglong Liu, Haotong Qin et al.
Spatial Understanding from Videos: Structured Prompts Meet Simulation Data
Haoyu Zhang, Meng Liu, Zaijing Li et al.
STPro: Spatial and Temporal Progressive Learning for Weakly Supervised Spatio-Temporal Grounding
Aaryan Garg, Akash Kumar, Yogesh S. Rawat
Dynamic-Width Speculative Beam Decoding for LLM Inference
Zongyue Qin, Zifan He, Neha Prakriya et al.
Value-Guided Search for Efficient Chain-of-Thought Reasoning
Kaiwen Wang, Jin Zhou, Jonathan Chang et al.
CAM: A Constructivist View of Agentic Memory for LLM-Based Reading Comprehension
Rui Li, Zeyu Zhang, Xiaohe Bo et al.
NOVA: A Benchmark for Rare Anomaly Localization and Clinical Reasoning in Brain MRI
Cosmin Bercea, Jun Li, Philipp Raffler et al.
SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding Altering
Xiaopeng Li, Shasha Li, Shezheng Song et al.
ActionReasoningBench: Reasoning about Actions with and without Ramification Constraints
Divij Handa, Pavel Dolin, Shrinidhi Kumbhar et al.
PBR-NeRF: Inverse Rendering with Physics-Based Neural Fields
Sean Wu, Shamik Basu, Tim Broedermann et al.
MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent
Xinyao Liao, Xianfang Zeng, Liao Wang et al.
Fast Think-on-Graph: Wider, Deeper and Faster Reasoning of Large Language Model on Knowledge Graph
Xujian Liang, Zhaoquan Gu
ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer
Jiayi Gao, Zijin Yin, Changcheng Hua et al.
HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene
Jianing Chen, Zehao Li, Yujun Cai et al.
Monte Carlo Planning with Large Language Model for Text-Based Game Agents
Zijing Shi, Meng Fang, Ling Chen