Most Cited 2025 "analysis" Papers
22,274 papers found • Page 33 of 112
Conference
GCE-Pose: Global Context Enhancement for Category-level Object Pose Estimation
Weihang Li, Hongli XU, Junwen Huang et al.
Question-Aware Gaussian Experts for Audio-Visual Question Answering
Hongyeob Kim, Inyoung Jung, Dayoon Suh et al.
RelationField: Relate Anything in Radiance Fields
Sebastian Koch, Johanna Wald, Mirco Colosi et al.
TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models
Ruidong Chen, honglin guo, Lanjun Wang et al.
Boosting the visual interpretability of CLIP via adversarial fine-tuning
Shizhan Gong, Haoyu LEI, Qi Dou et al.
LIRM: Large Inverse Rendering Model for Progressive Reconstruction of Shape, Materials and View-dependent Radiance Fields
Zhengqin Li, Dilin Wang, Ka chen et al.
LazyMAR: Accelerating Masked Autoregressive Models via Feature Caching
Feihong Yan, qingyan wei, Jiayi Tang et al.
Do Computer Vision Foundation Models Learn the Low-level Characteristics of the Human Visual System?
Yancheng Cai, Fei Yin, Dounia Hammou et al.
Active Task Disambiguation with LLMs
Katarzyna Kobalczyk, Nicolás Astorga, Tennison Liu et al.
MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent
Xinyao Liao, Xianfang Zeng, Liao Wang et al.
ChatHuman: Chatting about 3D Humans with Tools
Jing Lin, Yao Feng, Weiyang Liu et al.
OpenRSD: Towards Open-prompts for Object Detection in Remote Sensing Images
Ziyue Huang, Yongchao Feng, Ziqi Liu et al.
TurboFill: Adapting Few-step Text-to-image Model for Fast Image Inpainting
Liangbin Xie, Daniil Pakhomov, Zhonghao Wang et al.
S4M: S4 for multivariate time series forecasting with Missing values
Jing Peng, Meiqi Yang, Qiong Zhang et al.
MotiF: Making Text Count in Image Animation with Motion Focal Loss
Shijie Wang, Samaneh Azadi, Rohit Girdhar et al.
Keyframe-oriented Vision Token Pruning: Enhancing Efficiency of Large Vision Language Models on Long-Form Video Processing
Yudong Liu, Jingwei Sun, Yueqian Lin et al.
Towards Understanding the Robustness of Diffusion-Based Purification: A Stochastic Perspective
Yiming Liu, Kezhao Liu, Yao Xiao et al.
AerialVG: A Challenging Benchmark for Aerial Visual Grounding by Exploring Positional Relations
Junli Liu, Qizhi Chen, Zhigang Wang et al.
EvEnhancer: Empowering Effectiveness, Efficiency and Generalizability for Continuous Space-Time Video Super-Resolution with Events
Shuoyan Wei, Feng Li, Shengeng Tang et al.
CWNet: Causal Wavelet Network for Low-Light Image Enhancement
Tongshun Zhang, Pingping Liu, Yubing Lu et al.
Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities
Liuyi Wang, Xinyuan Xia, Hui Zhao et al.
Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing
Peihao Wang, Ruisi Cai, Yuehao Wang et al.
GOAL: Global-local Object Alignment Learning
Hyungyu Choi, Young Kyun Jang, Chanho Eom
GeoLoRA: Geometric integration for parameter efficient fine-tuning
Steffen Schotthöfer, Emanuele Zangrando, Gianluca Ceruti et al.
PADRe: A Unifying Polynomial Attention Drop-in Replacement for Efficient Vision Transformer
Pierre-David Letourneau, Manish Singh, Hsin-Pai Cheng et al.
Reconstructing Humans with a Biomechanically Accurate Skeleton
Yan Xia, Xiaowei Zhou, Etienne Vouga et al.
UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image
Xingyu Liu, Gu Wang, Ruida Zhang et al.
Reference-Based 3D-Aware Image Editing with Triplanes
Bahri Batuhan Bilecen, Yiğit Yalın, Ning Yu et al.
LookCloser: Frequency-aware Radiance Field for Tiny-Detail Scene
Xiaoyu Zhang, Weihong Pan, Chong Bao et al.
FlexGen: Flexible Multi-View Generation from Text and Image Inputs
Xinli Xu, Wenhang Ge, Jiantao Lin et al.
Distributed Speculative Inference (DSI): Speculation Parallelism for Provably Faster Lossless Language Model Inference
Nadav Timor, Jonathan Mamou, Daniel Korat et al.
HarmonySet: A Comprehensive Dataset for Understanding Video-Music Semantic Alignment and Temporal Synchronization
Zitang Zhou, Ke Mei, Yu Lu et al.
Gradient-Guided Annealing for Domain Generalization
Aristotelis Ballas, Christos Diou
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization
Hongrui Jia, Chaoya Jiang, Haiyang Xu et al.
APHQ-ViT: Post-Training Quantization with Average Perturbation Hessian Based Reconstruction for Vision Transformers
Zhuguanyu Wu, Jiayi Zhang, Jiaxin Chen et al.
Co-op: Correspondence-based Novel Object Pose Estimation
Sungphill Moon, Hyeontae Son, Dongcheol Hur et al.
TruthPrInt: Mitigating Large Vision-Language Models Object Hallucination Via Latent Truthful-Guided Pre-Intervention
Jinhao Duan, Fei Kong, Hao Cheng et al.
RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation
Kaidong Zhang, Rongtao Xu, Ren Pengzhen et al.
RigGS: Rigging of 3D Gaussians for Modeling Articulated Objects in Videos
Yuxin Yao, Zhi Deng, Junhui Hou
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Byung-Kwan Lee, Ryo Hachiuma, Yu-Chiang Frank Wang et al.
Can Generative Video Models Help Pose Estimation?
Ruojin Cai, Jason Y. Zhang, Philipp Henzler et al.
RealEdit: Reddit Edits As a Large-scale Empirical Dataset for Image Transformations
Peter Sushko, Ayana Bharadwaj, Zhi Yang Lim et al.
Causal Representation Learning from Multimodal Biomedical Observations
Yuewen Sun, Lingjing Kong, Guangyi Chen et al.
Robustness Auditing for Linear Regression: To Singularity and Beyond
Ittai Rubinstein, Samuel Hopkins
Integrative Decoding: Improving Factuality via Implicit Self-consistency
Yi Cheng, Xiao Liang, Yeyun Gong et al.
FOCUS: Knowledge-enhanced Adaptive Visual Compression for Few-shot Whole Slide Image Classification
Zhengrui Guo, Conghao Xiong, Jiabo MA et al.
Vision-Language Models Can't See the Obvious
YASSER ABDELAZIZ DAHOU DJILALI, Ngoc Huynh, Phúc Lê Khắc et al.
RI3D: Few-Shot Gaussian Splatting With Repair and Inpainting Diffusion Priors
Avinash Paliwal, xilong zhou, Wei Ye et al.
Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations
Xiang Xu, Lingdong Kong, Song Wang et al.
VideoRFSplat: Direct Scene-Level Text-to-3D Gaussian Splatting Generation with Flexible Pose and Multi-View Joint Modeling
Hyojun Go, Byeongjun Park, Hyelin Nam et al.
FlowR: Flowing from Sparse to Dense 3D Reconstructions
Tobias Fischer, Samuel Rota Bulò, Yung-Hsu Yang et al.
ODDR: Outlier Detection & Dimension Reduction Based Defense Against Adversarial Patches
Nandish Chattopadhyay, Amira Guesmi, Muhammad Abdullah Hanif et al.
Cycle Consistency as Reward: Learning Image-Text Alignment without Human Preferences
Hyojin Bahng, Caroline Chan, Fredo Durand et al.
HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics
Gueter Josmy Faure, Jia-Fong Yeh, Min-Hung Chen et al.
DASH: Detection and Assessment of Systematic Hallucinations of VLMs
Maximilian Augustin, Yannic Neuhaus, Matthias Hein
Mesh-RFT: Enhancing Mesh Generation via Fine-grained Reinforcement Fine-Tuning
Jian Liu, Jing Xu, Song Guo et al.
DataRater: Meta-Learned Dataset Curation
Dan Andrei Calian, Greg Farquhar, Iurii Kemaev et al.
Hyperbolic Dataset Distillation
Wenyuan Li, Guang Li, Keisuke Maeda et al.
Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models
Cameron Tice, Philipp Kreer, Nathan Helm-Burger et al.
On the Relation between Rectified Flows and Optimal Transport
Johannes Hertrich, Antonin Chambolle, Julie Delon
MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges?
Yunxiang Zhang, Muhammad Khalifa, Shitanshu Bhushan et al.
T-SHIRT: Token-Selective Hierarchical Data Selection for Instruction Tuning
Yanjun Fu, Faisal Hamman, Sanghamitra Dutta
Dynam3D: Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation
Zihan Wang, Seungjun Lee, Gim Hee Lee
LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration
Yuyao Zhang, Jinghao Li, Yu-Wing Tai
Compiler-R1: Towards Agentic Compiler Auto-tuning with Reinforcement Learning
Haolin Pan, Hongyu Lin, Haoran Luo et al.
On Extending Direct Preference Optimization to Accommodate Ties
Jinghong Chen, Guangyu Yang, Weizhe Lin et al.
Generalized Gradient Norm Clipping & Non-Euclidean $(L_0,L_1)$-Smoothness
Thomas Pethick, Wanyun Xie, Mete Erdogan et al.
Straight-Line Diffusion Model for Efficient 3D Molecular Generation
Yuyan Ni, Shikun Feng, Haohan Chi et al.
Multimodal LLMs as Customized Reward Models for Text-to-Image Generation
Shijie Zhou, Ruiyi Zhang, Huaisheng Zhu et al.
Space Group Equivariant Crystal Diffusion
Rees Chang, Angela Pak, Alex Guerra et al.
Latent Policy Barrier: Learning Robust Visuomotor Policies by Staying In-Distribution
Zhanyi Sun, Shuran Song
Accelerating Diffusion Transformer via Gradient-Optimized Cache
Junxiang Qiu, Lin Liu, Shuo Wang et al.
Importance-Based Token Merging for Efficient Image and Video Generation
Haoyu Wu, Jingyi Xu, Hieu Le et al.
Universal generalization guarantees for Wasserstein distributionally robust models
Tam Le, Jerome Malick
TempSamp-R1: Effective Temporal Sampling with Reinforcement Fine-Tuning for Video LLMs
Yunheng Li, Jing Cheng, Shaoyong Jia et al.
LittleBit: Ultra Low-Bit Quantization via Latent Factorization
Banseok Lee, Dongkyu Kim, Youngcheon You et al.
ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting
Ruijie Zhu, Mulin Yu, Linning Xu et al.
MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering
Rushi Qiang, Yuchen Zhuang, Yinghao Li et al.
Dense SAE Latents Are Features, Not Bugs
Xiaoqing Sun, Alessandro Stolfo, Joshua Engels et al.
HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene
Jianing Chen, Zehao Li, Yujun Cai et al.
Domain-RAG: Retrieval-Guided Compositional Image Generation for Cross-Domain Few-Shot Object Detection
Yu Li, Xingyu Qiu, Yuqian Fu et al.
NOVA: A Benchmark for Rare Anomaly Localization and Clinical Reasoning in Brain MRI
Cosmin Bercea, Jun Li, Philipp Raffler et al.
Is PRM Necessary? Problem-Solving RL Implicitly Induces PRM Capability in LLMs
Zhangyin Feng, Qianglong Chen, Ning Lu et al.
Dynamic Typography: Bringing Text to Life via Video Diffusion Prior
Zichen Liu, Yihao Meng, Hao Ouyang et al.
BOOM: Benchmarking Out-Of-distribution Molecular Property Predictions of Machine Learning Models
Evan Antoniuk, Shehtab Zaman, Tal Ben-Nun et al.
Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks
Hung Quang Nguyen, Hieu Nguyen, Anh Ta et al.
CLIP Under the Microscope: A Fine-Grained Analysis of Multi-Object Representation
Reza Abbasi, Ali Nazari, Aminreza Sefid et al.
Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models
Xuran Ma, Yexin Liu, Yaofu LIU et al.
RGB-Event ISP: The Dataset and Benchmark
Yunfan LU, Yanlin Qian, Ziyang Rao et al.
Cross-modal Ship Re-Identification via Optical and SAR Imagery: A Novel Dataset and Method
Han Wang, Shengyang Li, Jian Yang et al.
mmCooper: A Multi-agent Multi-stage Communication-efficient and Collaboration-robust Cooperative Perception Framework
Bingyi Liu, Jian Teng, Hongfei Xue et al.
CompCap: Improving Multimodal Large Language Models with Composite Captions
Xiaohui Chen, Satya Narayan Shukla, Mahmoud Azab et al.
TurboReg: TurboClique for Robust and Efficient Point Cloud Registration
Shaocheng Yan, Pengcheng Shi, Zhenjun Zhao et al.
SpatialSplat: Efficient Semantic 3D from Sparse Unposed Images
Yu Sheng, Jiajun Deng, Xinran Zhang et al.
Seg4Diff: Unveiling Open-Vocabulary Semantic Segmentation in Text-to-Image Diffusion Transformers
Chaehyun Kim, Heeseong Shin, Eunbeen Hong et al.
PhysDrive: A Multimodal Remote Physiological Measurement Dataset for In-vehicle Driver Monitoring
Wang, Xiao Yang, Qingyong Hu et al.
A Statistical Framework for Ranking LLM-based Chatbots
Siavash Ameli, Siyuan Zhuang, Ion Stoica et al.
IGL-Bench: Establishing the Comprehensive Benchmark for Imbalanced Graph Learning
Jiawen Qin, Haonan Yuan, Qingyun Sun et al.
RadarSplat: Radar Gaussian Splatting for High-Fidelity Data Synthesis and 3D Reconstruction of Autonomous Driving Scenes
Pou-Chun Kung, Skanda Harisha, Ram Vasudevan et al.
DyMU: Dynamic Merging and Virtual Unmerging for Efficient Variable-Length VLMs
Zhenhailong Wang, Senthil Purushwalkam, Caiming Xiong et al.
Uncertain Multimodal Intention and Emotion Understanding in the Wild
Qu Yang, QingHongYa Shi, Tongxin Wang et al.
ConStellaration: A dataset of QI-like stellarator plasma boundaries and optimization benchmarks
Santiago Cadena, Andrea Merlo, Emanuel Laude et al.
Not-So-Optimal Transport Flows for 3D Point Cloud Generation
Ka-Hei Hui, Chao Liu, xiaohui zeng et al.
EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance
Yang Yue, Yulin Wang, Haojun Jiang et al.
CamFreeDiff: Camera-free Image to Panorama Generation with Diffusion Model
Xiaoding Yuan, Shitao Tang, Kejie Li et al.
Multitwine: Multi-Object Compositing with Text and Layout Control
Gemma Canet Tarrés, Zhe Lin, Zhifei Zhang et al.
VELOCITI: Benchmarking Video-Language Compositional Reasoning with Strict Entailment
Darshana Saravanan, Varun Gupta, Darshan Singh S et al.
Reinforcement learning with combinatorial actions for coupled restless bandits
Lily Xu, Bryan Wilder, Elias Khalil et al.
Diffusion Bridge AutoEncoders for Unsupervised Representation Learning
Yeongmin Kim, Kwanghyeon Lee, Minsang Park et al.
Reconciling Model Multiplicity for Downstream Decision Making
Ally Du, Dung Daniel Ngo, Steven Wu
Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning
Julian Minder, Clément Dumas, Caden Juang et al.
Lay2Story: Extending Diffusion Transformers for Layout-Togglable Story Generation
Ao Ma, Jiasong Feng, Ke Cao et al.
Student-Informed Teacher Training
Nico Messikommer, Jiaxu Xing, Elie Aljalbout et al.
FaceShield: Defending Facial Image against Deepfake Threats
Jaehwan Jeong, Sumin In, Sieun Kim et al.
Improving Gaussian Splatting with Localized Points Management
Haosen Yang, Chenhao Zhang, Wenqing Wang et al.
ReNeg: Learning Negative Embedding with Reward Guidance
Xiaomin Li, yixuan liu, Takashi Isobe et al.
DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning
Fucai Ke, Vijay Kumar b g, Xingjian Leng et al.
Time Travel is Cheating: Going Live with DeepFund for Real-Time Fund Investment Benchmarking
Changlun Li, Yao SHI, Chen Wang et al.
GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning
Shutong Ding, Ke Hu, Shan Zhong et al.
Rotation-Equivariant Self-Supervised Method in Image Denoising
Hanze Liu, Jiahong Fu, Qi Xie et al.
On Speeding Up Language Model Evaluation
Jin Zhou, Christian Belardi, Ruihan Wu et al.
Understanding Representation Dynamics of Diffusion Models via Low-Dimensional Modeling
Xiao Li, Zekai Zhang, Xiang Li et al.
Motion Modes: What Could Happen Next?
Karran Pandey, Yannick Hold-Geoffroy, Matheus Gadelha et al.
Learning from Streaming Video with Orthogonal Gradients
Tengda Han, Dilara Gokay, Joseph Heyward et al.
``Principal Components" Enable A New Language of Images
Xin Wen, Bingchen Zhao, Ismail Elezi et al.
CanonSwap: High-Fidelity and Consistent Video Face Swapping via Canonical Space Modulation
Xiangyang Luo, Ye Zhu, Yunfei Liu et al.
DreamFuse: Adaptive Image Fusion with Diffusion Transformer
Junjia Huang, Pengxiang Yan, Jiyang Liu et al.
AURELIA: Test-time Reasoning Distillation in Audio-Visual LLMs
Sanjoy Chowdhury, Hanan Gani, Nishit Anand et al.
Divergence-enhanced Knowledge-guided Context Optimization for Visual-Language Prompt Tuning
Yilun Li, Miaomiao Cheng, Xu Han et al.
Robust 3D Shape Reconstruction in Zero-Shot from a Single Image in the Wild
Junhyeong Cho, Kim Youwang, Hunmin Yang et al.
Driving View Synthesis on Free-form Trajectories with Generative Prior
Zeyu Yang, Zijie Pan, Yuankun Yang et al.
Bayesian Experimental Design Via Contrastive Diffusions
Jacopo Iollo, Christophe Heinkelé, Pierre Alliez et al.
Adapt-$\infty$: Scalable Continual Multimodal Instruction Tuning via Dynamic Data Selection
Adyasha Maharana, Jaehong Yoon, Tianlong Chen et al.
Secure and Confidential Certificates of Online Fairness
Olive Franzese, Ali Shahin Shamsabadi, Carter Luck et al.
CDI: Copyrighted Data Identification in Diffusion Models
Jan Dubiński, Antoni Kowalczuk, Franziska Boenisch et al.
Learning a Neural Solver for Parametric PDEs to Enhance Physics-Informed Methods
Lise Le Boudec, Emmanuel de Bézenac, Louis Serrano et al.
GIViC: Generative Implicit Video Compression
Ge Gao, Siyue Teng, Tianhao Peng et al.
ZeroStereo: Zero-shot Stereo Matching from Single Images
Xianqi Wang, Hao Yang, Gangwei Xu et al.
CATSplat: Context-Aware Transformer with Spatial Guidance for Generalizable 3D Gaussian Splatting from A Single-View Image
Wonseok Roh, Hwanhee Jung, JongWook Kim et al.
Recurrence-Enhanced Vision-and-Language Transformers for Robust Multimodal Document Retrieval
Davide Caffagni, Sara Sarto, Marcella Cornia et al.
Large Self-Supervised Models Bridge the Gap in Domain Adaptive Object Detection
Marc-Antoine Lavoie, Anas Mahmoud, Steven L. Waslander
RAGDiffusion: Faithful Cloth Generation via External Knowledge Assimilation
Yuhan Li, Xianfeng Tan, Wenxiang Shang et al.
Decoupling Layout from Glyph in Online Chinese Handwriting Generation
Minsi Ren, Yan-Ming Zhang, yi chen
DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction
Ben Kaye, Tomas Jakab, Shangzhe Wu et al.
Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis
Arpita Chowdhury, Dipanjyoti Paul, Zheda Mai et al.
Efficient Prompt Compression with Evaluator Heads for Long-Context Transformer Inference
Weizhi Fei, Xueyan Niu, XIE GUOQING et al.
3D-MolT5: Leveraging Discrete Structural Information for Molecule-Text Modeling
Qizhi Pei, Rui Yan, Kaiyuan Gao et al.
Sparfels: Fast Reconstruction from Sparse Unposed Imagery
Shubhendu Jena, Amine Ouasfi, Mae Younes et al.
Enhancing Reward Models for High-quality Image Generation: Beyond Text-Image Alignment
ying ba, Tianyu Zhang, Yalong Bai et al.
Curly Flow Matching for Learning Non-gradient Field Dynamics
Katarina Petrović, Lazar Atanackovic, Viggo Moro et al.
LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits
Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin et al.
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models
Yudong Jin, Sida Peng, Xuan Wang et al.
Dense-SfM: Structure from Motion with Dense Consistent Matching
JongMin Lee, Sungjoo Yoo
Efficient Source-Free Time-Series Adaptation via Parameter Subspace Disentanglement
Gaurav Patel, Christopher M. Sandino, Behrooz Mahasseni et al.
Dual-Process Image Generation
Grace Luo, Jonathan Granskog, Aleksander Holynski et al.
Multilevel Generative Samplers for Investigating Critical Phenomena
Ankur Singha, Elia Cellini, Kim A. Nicoli et al.
E(3)-equivariant models cannot learn chirality: Field-based molecular generation
Alexandru Dumitrescu, Dani Korpela, Markus Heinonen et al.
Execution Guided Line-by-Line Code Generation
Boaz Lavon, Shahar Katz, Lior Wolf
On the Loss of Context Awareness in General Instruction Fine-tuning
Yihan Wang, Andrew Bai, Nanyun Peng et al.
KAC: Kolmogorov-Arnold Classifier for Continual Learning
Yusong Hu, Zichen Liang, Fei Yang et al.
Dynamical Low-Rank Compression of Neural Networks with Robustness under Adversarial Attacks
Steffen Schotthöfer, Lexie Yang, Stefan Schnake
Deep Continuous-Time State-Space Models for Marked Event Sequences
Yuxin Chang, Alex Boyd, Cao (Danica) Xiao et al.
The Curse of Conditions: Analyzing and Improving Optimal Transport for Conditional Flow-Based Generation
Ho Kei Cheng, Alex Schwing
SparseAlign: a Fully Sparse Framework for Cooperative Object Detection
Yunshuang Yuan, Yan Xia, Daniel Cremers et al.
EVOS: Efficient Implicit Neural Training via EVOlutionary Selector
Weixiang Zhang, Shuzhao Xie, Chengwei Ren et al.
Vgent: Graph-based Retrieval-Reasoning-Augmented Generation For Long Video Understanding
Xiaoqian Shen, Wenxuan Zhang, Jun Chen et al.
Towards In-the-wild 3D Plane Reconstruction from a Single Image
Jiachen Liu, Rui Yu, Sili Chen et al.
Augmented Mass-Spring Model for Real-Time Dense Hair Simulation
Jorge Herrera, Yi Zhou, Xin Sun et al.
On the Expressive Power of Sparse Geometric MPNNs
Yonatan Sverdlov, Nadav Dym
Cross-Modal Interactive Perception Network with Mamba for Lung Tumor Segmentation in PET-CT Images
Jie Mei, Chenyu Lin, Yu Qiu et al.
IGD: Token Decisiveness Modeling via Information Gain in LLMs for Personalized Recommendation
Zijie Lin, Yang Zhang, Xiaoyan Zhao et al.
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning
Xianhang Li, Yanqing Liu, Haoqin Tu et al.
Tight Time Complexities in Parallel Stochastic Optimization with Arbitrary Computation Dynamics
Alexander Tyurin
Probabilistic Learning to Defer: Handling Missing Expert Annotations and Controlling Workload Distribution
Cuong Nguyen, Thanh-Toan Do, Gustavo Carneiro
Control-oriented Clustering of Visual Latent Representation
Han Qi, Haocheng Yin, Heng Yang
Open-Vocabulary Octree-Graph for 3D Scene Understanding
Zhigang Wang, Yifei Su, Chenhui Li et al.
Towards Realistic Example-based Modeling via 3D Gaussian Stitching
Xinyu Gao, Ziyi Yang, Bingchen Gong et al.
Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation
Kaining Ying, Henghui Ding, Guangquan Jie et al.
VasTSD: Learning 3D Vascular Tree-state Space Diffusion Model for Angiography Synthesis
Zhifeng Wang, Renjiao Yi, Xin Wen et al.
Ensembles of Low-Rank Expert Adapters
Yinghao Li, Vianne Gao, Chao Zhang et al.
DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation
Runze Zhang, Guoguang Du, Xiaochuan Li et al.
Mitigating Hallucination Through Theory-Consistent Symmetric Multimodal Preference Optimization
Wenqi Liu, Xuemeng Song, Jiaxi Li et al.
Weakly Supervised Temporal Action Localization via Dual-Prior Collaborative Learning Guided by Multimodal Large Language Models
Quan Zhang, Jinwei Fang, Rui Yuan et al.
On Denoising Walking Videos for Gait Recognition
Dongyang Jin, Chao Fan, Jingzhe Ma et al.
Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models
Ilgee Hong, Changlong Yu, Liang Qiu et al.
Thought Communication in Multiagent Collaboration
Yujia Zheng, Zhuokai Zhao, Zijian Li et al.
TSAM: Temporal SAM Augmented with Multimodal Prompts for Referring Audio-Visual Segmentation
Abduljalil Radman, Jorma Laaksonen
3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View Transformation
Gyeongrok Oh, Sung June Kim, Heeju Ko et al.
Where, What, Why: Towards Explainable Driver Attention Prediction
Yuchen Zhou, Jiayu Tang, Xiaoyan Xiao et al.
Imputation-free and Alignment-free: Incomplete Multi-view Clustering Driven by Consensus Semantic Learning
yuzhuo dai, Jiaqi Jin, Zhibin Dong et al.
Ineq-Comp: Benchmarking Human-Intuitive Compositional Reasoning in Automated Theorem Proving of Inequalities
Haoyu Zhao, Yihan Geng, Shange Tang et al.
Hyperbolic Safety-Aware Vision-Language Models
Tobia Poppi, Tejaswi Kasarla, Pascal Mettes et al.
SQLens: An End-to-End Framework for Error Detection and Correction in Text-to-SQL
Yue Gong, Chuan Lei, Xiao Qin et al.
KLASS: KL-Guided Fast Inference in Masked Diffusion Models
Seo Hyun Kim, Sunwoo Hong, Hojung Jung et al.
Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding
Yan Wang, Baoxiong Jia, Ziyu Zhu et al.
FlexOLMo: Open Language Models for Flexible Data Use
Weijia Shi, Akshita Bhagia, Kevin Farhat et al.
SimXRD-4M: Big Simulated X-ray Diffraction Data and Crystal Symmetry Classification Benchmark
Bin Cao, Yang Liu, Zinan Zheng et al.
Generalizable, real-time neural decoding with hybrid state-space models
Avery Hee-Woon Ryoo, Nanda H Krishna, Ximeng Mao et al.
Bridging Information Asymmetry in Text-video Retrieval: A Data-centric Approach
Zechen Bai, Tianjun Xiao, Tong He et al.
VALLR: Visual ASR Language Model for Lip Reading
Marshall Thomas, Edward Fish, Richard Bowden