Most Cited 2025 "object-proposal association" Papers
ConText: Driving In-context Learning for Text Removal and Segmentation
Fei Zhang, Pei Zhang, Baosong Yang et al.
Reaction Graph: Towards Reaction-Level Modeling for Chemical Reactions with 3D Structures
Yingzhao Jian, Yue Zhang, Ying Wei et al.
Advancing Personalized Learning with Neural Collapse for Long-Tail Challenge
Hanglei Hu, Yingying Guo, Zhikang Chen et al.
Learning the Electronic Hamiltonian of Large Atomic Structures
Chen Hao Xia, Manasa Kaniselvan, Alexandros Nikolaos Ziogas et al.
Diffusion Counterfactual Generation with Semantic Abduction
Rajat Rasal, Avinash Kori, Fabio De Sousa Ribeiro et al.
When Dynamic Data Selection Meets Data Augmentation: Achieving Enhanced Training Acceleration
Suorong Yang, Peng Ye, Furao Shen et al.
Non-Stationary Predictions May Be More Informative: Exploring Pseudo-Labels with a Two-Phase Pattern of Training Dynamics
Hongbin Pei, Jingxin Hai, Yu Li et al.
Beyond Induction Heads: In-Context Meta Learning Induces Multi-Phase Circuit Emergence
Gouki Minegishi, Hiroki Furuta, Shohei Taniguchi et al.
Weakly-Supervised Contrastive Learning for Imprecise Class Labels
Zi-Hao Zhou, Jun-Jie Wang, Tong Wei et al.
Maintaining Proportional Committees with Dynamic Candidate Sets
Chris Dong, Jannik Peters
Solving Satisfiability Modulo Counting Exactly with Probabilistic Circuits
Jinzhao Li, Nan Jiang, Yexiang Xue
Exact Upper and Lower Bounds for the Output Distribution of Neural Networks with Random Inputs
Andrey Kofnov, Daniel Kapla, Ezio Bartocci et al.
Reward Translation via Reward Machine in Semi-Alignable MDPs
Yun Hua, Haosheng Chen, Wenhao Li et al.
TUMTraf VideoQA: Dataset and Benchmark for Unified Spatio-Temporal Video Understanding in Traffic Scenes
Xingcheng Zhou, Konstantinos Larintzakis, Hao Guo et al.
Lexico: Extreme KV Cache Compression via Sparse Coding over Universal Dictionaries
Junhyuck Kim, Jongho Park, Jaewoong Cho et al.
Catching Two Birds with One Stone: Reward Shaping with Dual Random Networks for Balancing Exploration and Exploitation
Haozhe Ma, Fangling Li, Jing Lim et al.
Refined generalization analysis of the Deep Ritz Method and Physics-Informed Neural Networks
Xianliang Xu, Ye Li, Zhongyi Huang
On the Out-of-Distribution Generalization of Self-Supervised Learning
Wenwen Qiang, Jingyao Wang, Zeen Song et al.
Leveraging Diffusion Model as Pseudo-Anomalous Graph Generator for Graph-Level Anomaly Detection
Jinyu Cai, Yunhe Zhang, Fusheng Liu et al.
Neural Collapse Beyond the Unconstrained Features Model: Landscape, Dynamics, and Generalization in the Mean-Field Regime
Diyuan Wu, Marco Mondelli
Stealing That Free Lunch: Exposing the Limits of Dyna-Style Reinforcement Learning
Brett Barkley, David Fridovich-Keil
AtlasD: Automatic Local Symmetry Discovery
Manu Bhat, Jonghyun Park, Jianke Yang et al.
Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale
Fan Zhou, Zengzhi Wang, Qian Liu et al.
The Devil Is in the Details: Tackling Unimodal Spurious Correlations for Generalizable Multimodal Reward Models
Zichao Li, Xueru Wen, Jie Lou et al.
Automated Red Teaming with GOAT: the Generative Offensive Agent Tester
Maya Pavlova, Erik Brinkman, Krithika Iyer et al.
Language Models May Verbatim Complete Text They Were Not Explicitly Trained On
Ken Ziyu Liu, Christopher A. Choquette Choo, Matthew Jagielski et al.
InfoSEM: A Deep Generative Model with Informative Priors for Gene Regulatory Network Inference
Tianyu Cui, Song-Jun Xu, Artem Moskalev et al.
Polynomial-Time Approximability of Constrained Reinforcement Learning
Jeremy McMahan
Benchmarking Abstract and Reasoning Abilities Through A Theoretical Perspective
Qingchuan Ma, Yuhang Wu, Xiawu Zheng et al.
No Soundness in the Real World: On the Challenges of the Verification of Deployed Neural Networks
Attila Szász, Balázs Bánhelyi, Mark Jelasity
ReVISE: Learning to Refine at Test-Time via Intrinsic Self-Verification
Hyunseok Lee, Seunghyuk Oh, Jaehyung Kim et al.
Large Language Models are Demonstration Pre-Selectors for Themselves
Jiarui Jin, Yuwei Wu, Haoxuan Li et al.
WAVE: Weighted Autoregressive Varying Gate for Time Series Forecasting
Jiecheng Lu, Xu Han, Yan Sun et al.
Large Language-Geometry Model: When LLM meets Equivariance
Zongzhao Li, Jiacheng Cen, Bing Su et al.
Closed-Loop Long-Horizon Robotic Planning via Equilibrium Sequence Modeling
Jinghan Li, Zhicheng Sun, Yadong Mu
Open Materials Generation with Stochastic Interpolants
Philipp Höllmer, Thomas Egg, Maya Martirossyan et al.
AGAV-Rater: Adapting Large Multimodal Model for AI-Generated Audio-Visual Quality Assessment
Yuqin Cao, Xiongkuo Min, Yixuan Gao et al.
GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance
Jinuk Kim, Marwa El Halabi, Wonpyo Park et al.
Attributes Shape the Embedding Space of Face Recognition Models
Pierrick Leroy, Antonio Mastropietro, Marco Nurisso et al.
Collapse or Thrive: Perils and Promises of Synthetic Data in a Self-Generating World
Joshua Kazdan, Rylan Schaeffer, Apratim Dey et al.
Linear Transformers as VAR Models: Aligning Autoregressive Attention Mechanisms with Autoregressive Forecasting
Jiecheng Lu, Shihao Yang
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs
Yinghui Li, Jiayi Kuang, Haojing Huang et al.
Probing Visual Language Priors in VLMs
Tiange Luo, Ang Cao, Gunhee Lee et al.
Control and Realism: Best of Both Worlds in Layout-to-Image without Training
Bonan Li, Yinhan Hu, Songhua Liu et al.
Sketch to Adapt: Fine-Tunable Sketches for Efficient LLM Adaptation
Tianyi Zhang, Junda Su, Aditya Desai et al.
Tracking Most Significant Shifts in Infinite-Armed Bandits
Joe Suk, Jung-hun Kim
When to Forget? Complexity Trade-offs in Machine Unlearning
Martin Van Waerebeke, Marco Lorenzi, Giovanni Neglia et al.
Direct Density Ratio Optimization: A Statistically Consistent Approach to Aligning Large Language Models
Rei Higuchi, Taiji Suzuki
M2PDE: Compositional Generative Multiphysics and Multi-component PDE Simulation
Tao Zhang, Zhenhai Liu, Feipeng Qi et al.
Spherical Rotation Dimension Reduction with Geometric Loss Functions
Hengrui Luo, Jeremy E. Purvis, Didong Li
Improving the Scaling Laws of Synthetic Data with Deliberate Practice
Reyhane Askari Hemmat, Mohammad Pezeshki, Elvis Dohmatob et al.
Widening the Network Mitigates the Impact of Data Heterogeneity on FedAvg
Like Jian, Dong Liu
Federated Disentangled Tuning with Textual Prior Decoupling and Visual Dynamic Adaptation
Yihao Yang, Wenke Huang, Guancheng Wan et al.
Understanding High-Dimensional Bayesian Optimization
Leonard Papenmeier, Matthias Poloczek, Luigi Nardi
Learning Configurations for Data-Driven Multi-Objective Optimization
Zhiyang Chen, Hailong Yao, Xia Yin
End-to-End Learning Framework for Solving Non-Markovian Optimal Control
Xiaole Zhang, Peiyu Zhang, Xiongye Xiao et al.
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer
Enze Xie, Junsong Chen, Yuyang Zhao et al.
Going Deeper into Locally Differentially Private Graph Neural Networks
Longzhu He, Chaozhuo Li, Peng Tang et al.
Federated Node-Level Clustering Network with Cross-Subgraph Link Mending
Jingxin Liu, Renda Han, Wenxuan Tu et al.
Causal Effect Identification in lvLiNGAM from Higher-Order Cumulants
Daniele Tramontano, Yaroslav Kivva, Saber Salehkaleybar et al.
HuMoCon: Concept Discovery for Human Motion Understanding
Qihang Fang, Chengcheng Tang, Bugra Tekin et al.
Reloc3r: Large-Scale Training of Relative Camera Pose Regression for Generalizable, Fast, and Accurate Visual Localization
Siyan Dong, Shuzhe Wang, Shaohui Liu et al.
Bridge Frame and Event: Common Spatiotemporal Fusion for High-Dynamic Scene Optical Flow
Hanyu Zhou, Haonan Wang, Haoyue Liu et al.
StoryGPT-V: Large Language Models as Consistent Story Visualizers
Xiaoqian Shen, Mohamed Elhoseiny
Invisible Backdoor Attack against Self-supervised Learning
Hanrong Zhang, Zhenting Wang, Boheng Li et al.
S^3-Face: SSS-Compliant Facial Reflectance Estimation via Diffusion Priors
Xingyu Ren, Jiankang Deng, Yuhao Cheng et al.
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
Jianyi Wang, Zhijie Lin, Meng Wei et al.
RSAR: Restricted State Angle Resolver and Rotated SAR Benchmark
Xin Zhang, Xue Yang, Yuxuan Li et al.
Diffusion Model is Effectively Its Own Teacher
Xinyin Ma, Runpeng Yu, Songhua Liu et al.
Towards Visual Discrimination and Reasoning of Real-World Physical Dynamics: Physics-Grounded Anomaly Detection
Wenqiao Li, Yao Gu, Xintao Chen et al.
Flow-NeRF: Joint Learning of Geometry, Poses, and Dense Flow within Unified Neural Representations
Xunzhi Zheng, Dan Xu
LiVOS: Light Video Object Segmentation with Gated Linear Matching
Qin Liu, Jianfeng Wang, Zhengyuan Yang et al.
Dynamic Content Prediction with Motion-aware Priors for Blind Face Video Restoration
Lianxin Xie, Bingbing Zheng, Si Wu et al.
BADGR: Bundle Adjustment Diffusion Conditioned by Gradients for Wide-Baseline Floor Plan Reconstruction
Yuguang Li, Ivaylo Boyadzhiev, Zixuan Liu et al.
ReinAD: Towards Real-world Industrial Anomaly Detection with a Comprehensive Contrastive Dataset
Xu Wang, Jingyuan Zhuo, Zhiyuan You et al.
Object-Centric Pretraining via Target Encoder Bootstrapping
Nikola Đukić, Tim Lebailly, Tinne Tuytelaars
Towards More General Video-based Deepfake Detection through Facial Component Guided Adaptation for Foundation Model
Yue-Hua Han, Tai-Ming Huang, Kailung Hua et al.
WF-VAE: Enhancing Video VAE by Wavelet-Driven Energy Flow for Latent Video Diffusion Model
Zongjian Li, Bin Lin, Yang Ye et al.
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling
Yifang Men, Yuan Yao, Miaomiao Cui et al.
Provable unlearning in topic modeling and downstream tasks
Stanley Wei, Sadhika Malladi, Sanjeev Arora et al.
Leveraging Perturbation Robustness to Enhance Out-of-Distribution Detection
Wenxi Chen, Raymond A. Yeh, Shaoshuai Mou et al.
TVNet: A Novel Time Series Analysis Method Based on Dynamic Convolution and 3D-Variation
Chenghan Li, Mingchen Li, Ruisheng Diao
MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation
Chenhui Zhu, Yilu Wu, Shuai Wang et al.
Noise-Consistent Siamese-Diffusion for Medical Image Synthesis and Segmentation
Kunpeng Qiu, Zhiqiang Gao, Zhiying Zhou et al.
Parametric Point Cloud Completion for Polygonal Surface Reconstruction
Zhaiyu Chen, Yuqing Wang, Liangliang Nan et al.
DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios
Yao Huang, Yitong Sun, Yichi Zhang et al.
MathArena: Evaluating LLMs on Uncontaminated Math Competitions
Mislav Balunovic, Jasper Dekoninck, Ivo Petrov et al.
Spik-NeRF: Spiking Neural Networks for Neural Radiance Fields
Gang Wan, Qinlong Lan, Zihan Li et al.
RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives
Chirag Parikh, Deepti Rawat, Rakshitha R. T. et al.
AIM-Fair: Advancing Algorithmic Fairness via Selectively Fine-Tuning Biased Models with Contextual Synthetic Data
Zengqun Zhao, Ziquan Liu, Yu Cao et al.
TAET: Two-Stage Adversarial Equalization Training on Long-Tailed Distributions
Wang Yu-Hang, Junkang Guo, Aolei Liu et al.
When does compositional structure yield compositional generalization? A kernel theory
Samuel Lippl, Kimberly Stachenfeld
LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes
Xiang Xu, Lingdong Kong, Hui Shuai et al.
Edge-aware Image Smoothing with Relative Wavelet Domain Representation
Huiqing Qi, Xiaoliu Luo, Tingting Li et al.
ToVE: Efficient Vision-Language Learning via Knowledge Transfer from Vision Experts
Yuanchen Wu, Junlong Du, Ke Yan et al.
Interpreting Object-level Foundation Models via Visual Precision Search
Ruoyu Chen, Siyuan Liang, Jingzhi Li et al.
Descriptor-In-Pixel: Point-Feature Tracking For Pixel Processor Arrays
Laurie Bose, Piotr Dudek, Jianing Chen
CONGO: Compressive Online Gradient Optimization
Jeremy Carleton, Prathik Vijaykumar, Divyanshu Saxena et al.
Consistent Normal Orientation for 3D Point Clouds via Least Squares on Delaunay Graph
Rao Fu, Jianmin Zheng, Liang Yu
Uncover Governing Law of Pathology Propagation Mechanism Through A Mean-Field Game
Tingting Dan, Zhihao Fan, Guorong Wu
Unveiling the Knowledge of CLIP for Training-Free Open-Vocabulary Semantic Segmentation
Yajie Liu, Guodong Wang, Jinjin Zhang et al.
Optimal Protocols for Continual Learning via Statistical Physics and Control Theory
Francesco Mori, Stefano Sarao Mannelli, Francesca Mignacco
AdaCM^2: On Understanding Extremely Long-Term Video with Adaptive Cross-Modality Memory Reduction
Yuanbin Man, Ying Huang, Chengming Zhang et al.
Decoupled Graph Energy-based Model for Node Out-of-Distribution Detection on Heterophilic Graphs
Yuhan Chen, Yihong Luo, Yifan Song et al.
Movie Weaver: Tuning-Free Multi-Concept Video Personalization with Anchored Prompts
Feng Liang, Haoyu Ma, Zecheng He et al.
RandLoRA: Full rank parameter-efficient fine-tuning of large models
Paul Albert, Frederic Zhang, Hemanth Saratchandran et al.
Efficient Off-Policy Learning for High-Dimensional Action Spaces
Fabian Otto, Philipp Becker, Vien A Ngo et al.
Exploring Timeline Control for Facial Motion Generation
Yifeng Ma, Jinwei Qi, Chaonan Ji et al.
IRGS: Inter-Reflective Gaussian Splatting with 2D Gaussian Ray Tracing
Chun Gu, Xiaofei Wei, Zixuan Zeng et al.
OSLoPrompt: Bridging Low-Supervision Challenges and Open-Set Domain Generalization in CLIP
Mohamad Hassan N C, Divyam Gupta, Mainak Singha et al.
EchoTraffic: Enhancing Traffic Anomaly Understanding with Audio-Visual Insights
Zhenghao Xing, Hao Chen, Binzhu Xie et al.
Semantix: An Energy-guided Sampler for Semantic Style Transfer
Huiang He, Minghui Hu, Chuanxia Zheng et al.
Learning Temporally Consistent Video Depth from Video Diffusion Priors
Jiahao Shao, Yuanbo Yang, Hongyu Zhou et al.
Yo’Chameleon: Personalized Vision and Language Generation
Thao Nguyen, Krishna Kumar Singh, Jing Shi et al.
X-ALMA: Plug & Play Modules and Adaptive Rejection for Quality Translation at Scale
Haoran Xu, Kenton Murray, Philipp Koehn et al.
Rethinking Reward Modeling in Preference-based Large Language Model Alignment
Hao Sun, Yunyi Shen, Jean-Francois Ton
Protecting against simultaneous data poisoning attacks
Neel Alex, Muhammad Shoaib Ahmed Siddiqui, Amartya Sanyal et al.
Understanding Long Videos with Multimodal Language Models
Kanchana Ranasinghe, Xiang Li, Kumara Kahatapitiya et al.
PersonaBooth: Personalized Text-to-Motion Generation
Boeun Kim, Hea In Jeong, JungHoon Sung et al.
Benchmarking Object Detectors under Real-World Distribution Shifts in Satellite Imagery
Sara Al-Emadi, Yin Yang, Ferda Ofli
Electromyography-Informed Facial Expression Reconstruction for Physiological-Based Synthesis and Analysis
Tim Büchner, Christoph Anders, Orlando Guntinas-Lichius et al.
InsTaG: Learning Personalized 3D Talking Head from Few-Second Video
Jiahe Li, Jiawei Zhang, Xiao Bai et al.
Unseen Visual Anomaly Generation
Han Sun, Yunkang Cao, Hao Dong et al.
Neural Wave Equation for Irregularly Sampled Sequence Data
Arkaprava Majumdar, M Anand Krishna, P. K. Srijith
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
Mutian He, Philip N. Garner
Rethinking Graph Prompts: Unraveling the Power of Data Manipulation in Graph Neural Networks
Chenyi Zi, Bowen Liu, Xiangguo Sun et al.
A General Framework for Off-Policy Learning with Partially-Observed Reward
Rikiya Takehi, Masahiro Asami, Kosuke Kawakami et al.
Towards Zero-Shot Anomaly Detection and Reasoning with Multimodal Large Language Models
Jiacong Xu, Shao-Yuan Lo, Bardia Safaei et al.
The Foundations of Tokenization: Statistical and Computational Concerns
Juan Luis Gastaldi, John Terilla, Luca Malagutti et al.
SLAM3R: Real-Time Dense Scene Reconstruction from Monocular RGB Videos
Yuzheng Liu, Siyan Dong, Shuzhe Wang et al.
Generating Likely Counterfactuals Using Sum-Product Networks
Jiří Němeček, Tomáš Pevný, Jakub Marecek
DiffPuter: Empowering Diffusion Models for Missing Data Imputation
Hengrui Zhang, Liancheng Fang, Qitian Wu et al.
EchoMatch: Partial-to-Partial Shape Matching via Correspondence Reflection
Yizheng Xie, Viktoria Ehm, Paul Roetzer et al.
Memory Efficient Transformer Adapter for Dense Predictions
Dong Zhang, Rui Yan, Pingcheng Dong et al.
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
Yiyang Ma, Xingchao Liu, Xiaokang Chen et al.
Revolutionizing EMCCD Denoising through a Novel Physics-Based Learning Framework for Noise Modeling
Haiyang Jiang, Tetsuichi Wazawa, Imari Sato et al.
PatchDEMUX: A Certifiably Robust Framework for Multi-label Classifiers Against Adversarial Patches
Dennis Jacob, Chong Xiang, Prateek Mittal
metabench - A Sparse Benchmark of Reasoning and Knowledge in Large Language Models
Alex Kipnis, Konstantinos Voudouris, Luca Schulze Buschoff et al.
SmallKV: Small Model Assisted Compensation of KV Cache Compression for Efficient LLM Inference
Yi Zhao, Yajuan Peng, Nguyen Cam-Tu et al.
The Value of Sensory Information to a Robot
Arjun Krishna, Edward Hu, Dinesh Jayaraman
Detecting Backdoor Samples in Contrastive Language Image Pretraining
Hanxun Huang, Sarah Erfani, Yige Li et al.
CALICO: Part-Focused Semantic Co-Segmentation with Large Vision-Language Models
Kiet A. Nguyen, Adheesh Juvekar, Tianjiao Yu et al.
Semantic Aware Representation Learning for Lifelong Learning
Fahad Sarfraz, Elahe Arani, Bahram Zonooz
Repurposing Pre-trained Video Diffusion Models for Event-based Video Interpolation
Jingxi Chen, Brandon Y. Feng, Haoming Cai et al.
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
Ziyang Wang, Shoubin Yu, Elias Stengel-Eskin et al.
MM-OR: A Large Multimodal Operating Room Dataset for Semantic Understanding of High-Intensity Surgical Environments
Ege Özsoy, Chantal Pellegrini, Tobias Czempiel et al.
VideoSPatS: Video SPatiotemporal Splines for Disentangled Occlusion, Appearance and Motion Modeling and Editing
Juan Luis Gonzalez Bello, Xu Yao, Alex Whelan et al.
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding
Pedro Hermosilla, Christian Stippel, Leon Sick
Prior Does Matter: Visual Navigation via Denoising Diffusion Bridge Models
Hao Ren, Yiming Zeng, Zetong Bi et al.
LoRA Recycle: Unlocking Tuning-Free Few-Shot Adaptability in Visual Foundation Models by Recycling Pre-Tuned LoRAs
Zixuan Hu, Yongxian Wei, Li Shen et al.
TensoFlow: Tensorial Flow-based Sampler for Inverse Rendering
Chun Gu, Xiaofei Wei, Li Zhang et al.
STAR-Edge: Structure-aware Local Spherical Curve Representation for Thin-walled Edge Extraction from Unstructured Point Clouds
Zikuan Li, Honghua Chen, Yuecheng Wang et al.
ZoomLDM: Latent Diffusion Model for Multi-scale Image Generation
Srikar Yellapragada, Alexandros Graikos, Kostas Triaridis et al.
A Large-Scale 3D Face Mesh Video Dataset via Neural Re-parameterized Optimization
Kim Youwang, Lee Hyun, Kim Sung-Bin et al.
RainyGS: Efficient Rain Synthesis with Physically-Based Gaussian Splatting
Qiyu Dai, Xingyu Ni, Qianfan Shen et al.
Sufficient Context: A New Lens on Retrieval Augmented Generation Systems
Hailey Joren, Jianyi Zhang, Chun-Sung Ferng et al.
Zero-shot Model-based Reinforcement Learning using Large Language Models
Abdelhakim Benechehab, Youssef Attia El Hili, Ambroise Odonnat et al.
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning
Amrith Setlur, Chirag Nagpal, Adam Fisch et al.
Design2GarmentCode: Turning Design Concepts to Tangible Garments Through Program Synthesis
Feng Zhou, Ruiyang Liu, Chen Liu et al.
Efficient Dynamic Scene Editing via 4D Gaussian-based Static-Dynamic Separation
Joohyun Kwon, Hanbyel Cho, Junmo Kim
Model-agnostic meta-learners for estimating heterogeneous treatment effects over time
Dennis Frauen, Konstantin Hess, Stefan Feuerriegel
EventFly: Event Camera Perception from Ground to the Sky
Lingdong Kong, Dongyue Lu, Xiang Xu et al.
Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations
Nick Jiang, Anish Kachinthaya, Suzanne Petryk et al.
Exploiting Deblurring Networks for Radiance Fields
Haeyun Choi, Heemin Yang, Janghyeok Han et al.
DeltaFormer: Unlock the state space of Transformer
Mingyu Xu, Tenglong Ao, Jiaao He et al.
Interpreting the Second-Order Effects of Neurons in CLIP
Yossi Gandelsman, Alexei Efros, Jacob Steinhardt
ICE: Intrinsic Concept Extraction from a Single Image via Diffusion Models
Fernando Julio Cendra, Kai Han
Tuning Frequency Bias of State Space Models
Annan Yu, Dongwei Lyu, Soon Hoe Lim et al.
Latent Bayesian Optimization via Autoregressive Normalizing Flows
Seunghun Lee, Jinyoung Park, Jaewon Chu et al.
Can Generative Video Models Help Pose Estimation?
Ruojin Cai, Jason Y. Zhang, Philipp Henzler et al.
MMRL: Multi-Modal Representation Learning for Vision-Language Models
Yuncheng Guo, Xiaodong Gu
VidTwin: Video VAE with Decoupled Structure and Dynamics
Yuchi Wang, Junliang Guo, Xinyi Xie et al.
CraftRTL: High-quality Synthetic Data Generation for Verilog Code Models with Correct-by-Construction Non-Textual Representations and Targeted Code Repair
Mingjie Liu, Yun-Da Tsai, Wenfei Zhou et al.
Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions
Stefan Andreas Baumann, Felix Krause, Michael Neumayr et al.
Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces
Andy (DiJia) Su, Sainbayar Sukhbaatar, Michael Rabbat et al.
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion
Zhaoxi Chen, Jiaxiang Tang, Yuhao Dong et al.
Unraveling Normal Anatomy via Fluid-Driven Anomaly Randomization
Peirong Liu, Ana Lawry Aguila, Juan Iglesias
VoCo-LLaMA: Towards Vision Compression with Large Language Models
Xubing Ye, Yukang Gan, Xiaoke Huang et al.
Period-LLM: Extending the Periodic Capability of Multimodal Large Language Model
Yuting Zhang, Hao Lu, Qingyong Hu et al.
Param$\Delta$ for Direct Mixing: Post-Train Large Language Model At Zero Cost
Sheng Cao, Mingrui Wu, Karthik Prasad et al.
Continuous Locomotive Crowd Behavior Generation
Inhwan Bae, Junoh Lee, Hae-Gon Jeon
A Unified Latent Schrödinger Bridge Diffusion Model for Unsupervised Anomaly Detection and Localization
Shilhora Akshay, Niveditha Lakshmi Narasimhan, Jacob George et al.
Token Cropr: Faster ViTs for Quite a Few Tasks
Benjamin Bergner, Christoph Lippert, Aravindh Mahendran
CacheQuant: Comprehensively Accelerated Diffusion Models
Xuewen Liu, Zhikai Li, Qingyi Gu
Text-guided Sparse Voxel Pruning for Efficient 3D Visual Grounding
Wenxuan Guo, Xiuwei Xu, Ziwei Wang et al.
Is Your Video Language Model a Reliable Judge?
Ming Liu, Wensheng Zhang
DABL: Detecting Semantic Anomalies in Business Processes Using Large Language Models
Wei Guan, Jian Cao, Jianqi Gao et al.
Behavior Importance-Aware Graph Neural Architecture Search for Cross-Domain Recommendation
Chendi Ge, Xin Wang, Ziwei Zhang et al.
Uncertainty-Informed Meta Pseudo Labeling for Surrogate Modeling with Limited Labeled Data
Xingyu Ren, Pengwei Liu, Pengkai Wang et al.
SimLTD: Simple Supervised and Semi-Supervised Long-Tailed Object Detection
Phi Vu Tran
Identifiability for Gaussian Processes with Holomorphic Kernels
Ameer Qaqish, Didong Li
What’s in the Image? A Deep-Dive into the Vision of Vision Language Models
Omri Kaduri, Shai Bagon, Tali Dekel
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts
Peijie Wang, Zhong-Zhi Li, Fei Yin et al.
Dynamic Low-Rank Sparse Adaptation for Large Language Models
Weizhong Huang, Yuxin Zhang, Xiawu Zheng et al.
MonoTAKD: Teaching Assistant Knowledge Distillation for Monocular 3D Object Detection
Hou-I Liu, Christine Wu, Jen-Hao Cheng et al.
Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos
Dayal Singh Kalra, Tianyu He, Maissam Barkeshli
Solving hidden monotone variational inequalities with surrogate losses
Ryan D'Orazio, Danilo Vucetic, Zichu Liu et al.
LLMs' Potential Influences on Our Democracy: Challenges and Opportunities
Yujin Potter, David Rand, Yejin Choi et al.
APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers
Zhuguanyu Wu, Jiayi Zhang, Jiaxin Chen et al.
InstaSHAP: Interpretable Additive Models Explain Shapley Values Instantly
James Enouen, Yan Liu