Most Cited 2025 "gated memory unit" Papers
Dataset Distillation via Vision-Language Category Prototype
YAWEN ZOU, Guang Li, Duo Su et al.
Predictive Uncertainty Quantification for Bird's Eye View Segmentation: A Benchmark and Novel Loss Function
Linlin Yu, Bowen Yang, Tianhao Wang et al.
Zero-Shot Blind-spot Image Denoising via Implicit Neural Sampling
Yuhui Quan, Tianxiang Zheng, Zhiyuan Ma et al.
Diffusion-based Realistic Listening Head Generation via Hybrid Motion Modeling
Yinuo Wang, Yanbo Fan, Xuan Wang et al.
Classifier-to-Bias: Toward Unsupervised Automatic Bias Detection for Visual Classifiers
Quentin Guimard, Moreno D'Incà, Massimiliano Mancini et al.
Accurate and Scalable Graph Neural Networks via Message Invariance
Zhihao Shi, Jie Wang, Zhiwei Zhuang et al.
Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval
Mankeerat Sidhu, Hetarth Chopra, Ansel Blume et al.
PaCA: Partial Connection Adaptation for Efficient Fine-Tuning
Sunghyeon Woo, Sol Namkung, SunWoo Lee et al.
Fine-Grained Image-Text Correspondence with Cost Aggregation for Open-Vocabulary Part Segmentation
Jiho Choi, Seonho Lee, Minhyun Lee et al.
Stereo Any Video: Temporally Consistent Stereo Matching
Junpeng Jing, Weixun Luo, Ye Mao et al.
Sparse Fine-Tuning of Transformers for Generative Tasks
Wei Chen, Jingxi Yu, Zichen Miao et al.
Decoupling Training-Free Guided Diffusion by ADMM
Youyuan Zhang, Zehua Liu, Zenan Li et al.
Learning from Synchronization: Self-Supervised Uncalibrated Multi-View Person Association in Challenging Scenes
Keqi Chen, vinkle srivastav, Didier MUTTER et al.
RC-AutoCalib: An End-to-End Radar-Camera Automatic Calibration Network
Van-Tin Luu, Yong-Lin Cai, Vu-Hoang Tran et al.
Computationally Efficient RL under Linear Bellman Completeness for Deterministic Dynamics
Runzhe Wu, Ayush Sekhari, Akshay Krishnamurthy et al.
Decoder Gradient Shield: Provable and High-Fidelity Prevention of Gradient-Based Box-Free Watermark Removal
Haonan An, Guang Hua, Zhengru Fang et al.
Robust and Efficient 3D Gaussian Splatting for Urban Scene Reconstruction
Zhensheng Yuan, Haozhi Huang, Zhen Xiong et al.
Efficient Imitation under Misspecification
Nicolas Espinosa Dice, Sanjiban Choudhury, Wen Sun et al.
Towards Stabilized and Efficient Diffusion Transformers through Long-Skip-Connections with Spectral Constraints
Guanjie Chen, Xinyu Zhao, Yucheng Zhou et al.
Watermarking One for All: A Robust Watermarking Scheme Against Partial Image Theft
Gaozhi Liu, Silu Cao, Zhenxing Qian et al.
Heavy Labels Out! Dataset Distillation with Label Space Lightening
Ruonan Yu, Songhua Liu, Zigeng Chen et al.
Taming Flow Matching with Unbalanced Optimal Transport into Fast Pansharpening
Zihan Cao, Yu Zhong, Liang-Jian Deng
Revisiting Pool-based Prompt Learning for Few-shot Class-incremental Learning
Yongwei Jiang, Yixiong Zou, Yuhua Li et al.
Vivid4D: Improving 4D Reconstruction from Monocular Video by Video Inpainting
Jiaxin Huang, Sheng Miao, Bangbang Yang et al.
ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains
Yein Park, Chanwoong Yoon, Jungwoo Park et al.
ZeroKey: Point-Level Reasoning and Zero-Shot 3D Keypoint Detection from Large Language Models
Bingchen Gong, Diego Gomez, Abdullah Hamdi et al.
CuMPerLay: Learning Cubical Multiparameter Persistence Vectorizations
Caner Korkmaz, Brighton Nuwagira, Baris Coskunuzer et al.
IM-Zero: Instance-level Motion Controllable Video Generation in a Zero-shot Manner
Yuyang Huang, Yabo Chen, Li Ding et al.
EigenGS Representation: From Eigenspace to Gaussian Image Space
LO-WEI TAI, Ching-En Li et al.
SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World
Chen Chen, Zhirui Wang, Taowei Sheng et al.
An Effective Theory of Bias Amplification
Arjun Subramonian, Samuel Bell, Levent Sagun et al.
CL-MFAP: A Contrastive Learning-Based Multimodal Foundation Model for Molecular Property Prediction and Antibiotic Screening
Gen Zhou, Sugitha Janarthanan, Yutong Lu et al.
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Masked Image Modeling Representations
Benedikt Alkin, Lukas Miklautz, Sepp Hochreiter et al.
FairGen: Enhancing Fairness in Text-to-Image Diffusion Models via Self-Discovering Latent Directions
Yilei Jiang, Wei-Hong Li, Yiyuan Zhang et al.
MaRI: Material Retrieval Integration across Domains
Jianhui Wang, Zhifei Yang, Yangfan He et al.
Exploiting Diffusion Prior for Task-driven Image Restoration
Jaeha Kim, Junghun Oh, Kyoung Mu Lee
ADIEE: Automatic Dataset Creation and Scorer for Instruction-Guided Image Editing Evaluation
Sherry Chen, Yi Wei, Luowei Zhou et al.
SEAL: Semantic Aware Image Watermarking
Kasra Arabi, R. Teal Witter, Chinmay Hegde et al.
Discretized Gaussian Representation for Tomographic Reconstruction
Shaokai Wu, Yuxiang Lu, Yapan Guo et al.
Infilling Score: A Pretraining Data Detection Algorithm for Large Language Models
Negin Raoof, Litu Rout, Giannis Daras et al.
A3: Few-shot Prompt Learning of Unlearnable Examples with Cross-Modal Adversarial Feature Alignment
Xuan Wang, Xitong Gao, Dongping Liao et al.
FADE: Frequency-Aware Diffusion Model Factorization for Video Editing
Yixuan Zhu, Haolin Wang, Shilin Ma et al.
Finsler Multi-Dimensional Scaling: Manifold Learning for Asymmetric Dimensionality Reduction and Embedding
Thomas Dagès, Simon Weber, Ya-Wei Eileen Lin et al.
GSBA$^K$: $top$-$K$ Geometric Score-based Black-box Attack
Md Farhamdur Reza, Richeng Jin, Tianfu Wu et al.
SFDM: Robust Decomposition of Geometry and Reflectance for Realistic Face Rendering from Sparse-view Images
Daisheng Jin, Jiangbei Hu, Baixin Xu et al.
MPG-SAM 2: Adapting SAM 2 with Mask Priors and Global Context for Referring Video Object Segmentation
Fu Rong, Meng Lan, Qian Zhang et al.
SA-LUT: Spatial Adaptive 4D Look-Up Table for Photorealistic Style Transfer
Zerui Gong, Zhonghua Wu, Qingyi Tao et al.
VAFlow: Video-to-Audio Generation with Cross-Modality Flow Matching
Xihua Wang, Xin Cheng, Yuyue Wang et al.
How Can Objects Help Video-Language Understanding?
Zitian Tang, Shijie Wang, Junho Cho et al.
StickMotion: Generating 3D Human Motions by Drawing a Stickman
Tao Wang, Zhihua Wu, Qiaozhi He et al.
GaRe: Relightable 3D Gaussian Splatting for Outdoor Scenes from Unconstrained Photo Collections
Haiyang Bai, Jiaqi Zhu, Songru Jiang et al.
LoCA: Location-Aware Cosine Adaptation for Parameter-Efficient Fine-Tuning
Zhekai Du, Yinjie Min, Jingjing Li et al.
NeRF Is a Valuable Assistant for 3D Gaussian Splatting
Shuangkang Fang, I-Chao Shen, Takeo Igarashi et al.
PVChat: Personalized Video Chat with One-Shot Learning
YUFEI SHI, Weilong Yan, Gang Xu et al.
Conditional Testing based on Localized Conformal $p$-values
Xiaoyang Wu, Lin Lu, Zhaojun Wang et al.
From Panels to Prose: Generating Literary Narratives from Comics
Ragav Sachdeva, Andrew Zisserman
GGTalker: Talking Head Systhesis with Generalizable Gaussian Priors and Identity-Specific Adaptation
Wentao Hu, Shunkai Li, Ziqiao Peng et al.
Beyond Sight: Towards Cognitive Alignment in LVLM via Enriched Visual Knowledge
Yaqi Zhao, Yuanyang Yin, Lin Li et al.
MVGBench: a Comprehensive Benchmark for Multi-view Generation Models
Xianghui Xie, Jan Lenssen, Gerard Pons-Moll
On the Generalization of Handwritten Text Recognition Models
Carlos Garrido-Munoz, Jorge Calvo-Zaragoza
Visual Prompting for One-shot Controllable Video Editing without Inversion
Zhengbo Zhang, Yuxi Zhou, DUO PENG et al.
NL-Eye: Abductive NLI For Images
Mor Ventura, Michael Toker, Nitay Calderon et al.
Robustness Inspired Graph Backdoor Defense
Zhiwei Zhang, Minhua Lin, Junjie Xu et al.
DMesh++: An Efficient Differentiable Mesh for Complex Shapes
Sanghyun Son, Matheus Gadelha, Yang Zhou et al.
Expected Return Symmetries
Darius Muglich, Johannes Forkel, Elise van der Pol et al.
From Head to Tail: Efficient Black-box Model Inversion Attack via Long-tailed Learning
Ziang Li, Hongguang Zhang, Juan Wang et al.
LogoSP: Local-global Grouping of Superpoints for Unsupervised Semantic Segmentation of 3D Point Clouds
Zihui Zhang, Weisheng Dai, Hongtao Wen et al.
Efficient Distribution Matching of Representations via Noise-Injected Deep InfoMax
Ivan Butakov, Alexander Semenenko, Alexander Tolmachev et al.
UA-Pose: Uncertainty-Aware 6D Object Pose Estimation and Online Object Completion with Partial References
Ming-Feng Li, Xin Yang, Fu-En Wang et al.
Amodal Depth Anything: Amodal Depth Estimation in the Wild
Zhenyu Li, Mykola Lavreniuk, Jian Shi et al.
CoLMDriver: LLM-based Negotiation Benefits Cooperative Autonomous Driving
Changxing Liu, Genjia Liu, Zijun Wang et al.
DFM: Differentiable Feature Matching for Anomaly Detection
Wu Sheng, Yimi Wang, Xudong Liu et al.
Free-viewpoint Human Animation with Pose-correlated Reference Selection
Fa-Ting Hong, Zhan Xu, Haiyang Liu et al.
GS-Occ3D: Scaling Vision-only Occupancy Reconstruction with Gaussian Splatting
Baijun Ye, Minghui Qin, Saining Zhang et al.
Resonance: Learning to Predict Social-Aware Pedestrian Trajectories as Co-Vibrations
Conghao Wong, Ziqian Zou, Beihao Xia
Action Detail Matters: Refining Video Recognition with Local Action Queries
Mengmeng Wang, Zeyi Huang, Xiangjie Kong et al.
A Policy-Gradient Approach to Solving Imperfect-Information Games with Best-Iterate Convergence
Mingyang Liu, Gabriele Farina, Asuman Ozdaglar
FFR: Frequency Feature Rectification for Weakly Supervised Semantic Segmentation
Ziqian Yang, Xinqiao Zhao, Xiaolei Wang et al.
Simpler Diffusion: 1.5 FID on ImageNet512 with Pixel-space Diffusion
Emiel Hoogeboom, Thomas Mensink, Jonathan Heek et al.
Multi-Scale Neighborhood Occupancy Masked Autoencoder for Self-Supervised Learning in LiDAR Point Clouds
Mohamed Abdelsamad, Michael Ulrich, Claudius Glaeser et al.
Signs as Tokens: A Retrieval-Enhanced Multilingual Sign Language Generator
Ronglai Zuo, Rolandos Alexandros Potamias, Evangelos Ververas et al.
CAP-Net: A Unified Network for 6D Pose and Size Estimation of Categorical Articulated Parts from a Single RGB-D Image
Jingshun Huang, Haitao Lin, Tianyu Wang et al.
Diffusion Transformers for Tabular Data Time Series Generation
Fabrizio Garuti, Enver Sangineto, Simone Luetto et al.
4D-Fly: Fast 4D Reconstruction from a Single Monocular Video
Diankun Wu, Fangfu Liu, Yi-Hsin Hung et al.
ThunderKittens: Simple, Fast, and $\textit{Adorable}$ Kernels
Benjamin Spector, Simran Arora, Aaryan Singhal et al.
Large (Vision) Language Models are Unsupervised In-Context Learners
Artyom Gadetsky, Andrei Atanov, Yulun Jiang et al.
Boosting Adversarial Transferability via Residual Perturbation Attack
Jinjia Peng, Zeze Tao, Huibing Wang et al.
Geometry of Neural Reinforcement Learning in Continuous State and Action Spaces
Saket Tiwari, Omer Gottesman, George D Konidaris
HalLoc: Token-level Localization of Hallucinations for Vision Language Models
Eunkyu Park, Minyeong Kim, Gunhee Kim
Fast Uncovering of Protein Sequence Diversity from Structure
Luca Alessandro Silva, Barthelemy Meynard-Piganeau, Carlo Lucibello et al.
Plug-and-Play Versatile Compressed Video Enhancement
Huimin Zeng, Jiacheng Li, Zhiwei Xiong
Unleashing the Potential of Consistency Learning for Detecting and Grounding Multi-Modal Media Manipulation
Yiheng Li, Yang Yang, Zichang Tan et al.
HyTIP: Hybrid Temporal Information Propagation for Masked Conditional Residual Video Coding
Yi-Hsin Chen, Yi-Chen Yao, Kuan-Wei Ho et al.
Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels
Yongshuo Zong, Qin ZHANG, DONGSHENG An et al.
EmbodiedSplat: Personalized Real-to-Sim-to-Real Navigation with Gaussian Splats from a Mobile Device
Gunjan Chhablani, Xiaomeng Ye, Muhammad Zubair Irshad et al.
PHATNet: A Physics-guided Haze Transfer Network for Domain-adaptive Real-world Image Dehazing
Fu-Jen Tsai, Yan-Tsung Peng, Yen-Yu Lin et al.
UVE: Are MLLMs Unified Evaluators for AI-Generated Videos?
Yuanxin Liu, Rui Zhu, Shuhuai Ren et al.
Breaking the Encoder Barrier for Seamless Video-Language Understanding
Handong Li, Yiyuan Zhang, Longteng Guo et al.
GroomLight: Hybrid Inverse Rendering for Relightable Human Hair Appearance Modeling
Yang Zheng, Menglei Chai, Delio Vicini et al.
Counterfactual Realizability
Arvind Raghavan, Elias Bareinboim
LOD-GS: Achieving Levels of Detail using Scalable Gaussian Soup
Jianxiong Shen, Yue Qian, Xiaohang Zhan
Flow4Agent: Long-form Video Understanding via Motion Prior from Optical Flow
Ruyang Liu, Shangkun Sun, Haoran Tang et al.
MomentSeeker: A Task-Oriented Benchmark For Long-Video Moment Retrieval
Huaying Yuan, Jian Ni, Zheng Liu et al.
Preventing Shortcuts in Adapter Training via Providing the Shortcuts
Anujraaj Goyal, Guocheng Qian, Huseyin Coskun et al.
FastLongSpeech: Enhancing Large Speech-Language Models for Efficient Long-Speech Processing
Shoutao Guo, Shaolei Zhang, Qingkai Fang et al.
Parallel Sequence Modeling via Generalized Spatial Propagation Network
Hongjun Wang, Wonmin Byeon, Jiarui Xu et al.
TAViS: Text-bridged Audio-Visual Segmentation with Foundation Models
Ziyang Luo, Nian Liu, Xuguang Yang et al.
Mitigating Hallucination in VideoLLMs via Temporal-Aware Activation Engineering
JIANFENG CAI, Jiale Hong, Zongmeng Zhang et al.
Align Your Rhythm: Generating Highly Aligned Dance Poses with Gating-Enhanced Rhythm-Aware Feature Representation
Congyi Fan, Jian Guan, Xuanjia Zhao et al.
Improving Bilinear RNN with Closed-loop Control
Jiaxi Hu, Yongqi Pan, Jusen Du et al.
Universal Visuo-Tactile Video Understanding for Embodied Interaction
Yifan Xie, Mingyang Li, Shoujie Li et al.
Hardware-aligned Hierarchical Sparse Attention for Efficient Long-term Memory Access
Xiang Hu, Jiaqi Leng, Jun Zhao et al.
4D Gaussian Splatting SLAM
Yanyan Li, Youxu Fang, Zunjie Zhu et al.
TensorRL-QAS: Reinforcement learning with tensor networks for improved quantum architecture search
Akash Kundu, Stefano Mangini
Task Vector Quantization for Memory-Efficient Model Merging
Youngeun Kim, Seunghwan Lee, Aecheon Jung et al.
Enhancing Diversity for Data-free Quantization
Kai Zhao, zhihao zhuang, Miao Zhang et al.
PSA-SSL: Pose and Size-aware Self-Supervised Learning on LiDAR Point Clouds
Barza Nisar, Steven L. Waslander
Weakly Supervised Visible-Infrared Person Re-Identification via Heterogeneous Expert Collaborative Consistency Learning
Yafei Zhang, Lingqi Kong, Huafeng Li et al.
Semantic Causality-Aware Vision-Based 3D Occupancy Prediction
Dubing Chen, Huan Zheng, Yucheng Zhou et al.
Implicit Bias Injection Attacks against Text-to-Image Diffusion Models
Huayang Huang, Xiangye Jin, Jiaxu Miao et al.
Fair Generation without Unfair Distortions: Debiasing Text-to-Image Generation with Entanglement-Free Attention
Jeonghoon Park, Juyoung Lee, Chaeyeon Chung et al.
X2-Gaussian: 4D Radiative Gaussian Splatting for Continuous-time Tomographic Reconstruction
Weihao Yu, Yuanhao Cai, Ruyi Zha et al.
SAM-REF: Introducing Image-Prompt Synergy during Interaction for Detail Enhancement in the Segment Anything Model
Chongkai Yu, Ting Liu, Li Anqi et al.
RePIC: Reinforced Post-Training for Personalizing Multi-Modal Language Models
Yeongtak Oh, Dohyun Chung, Juhyeon Shin et al.
DAGSM: Disentangled Avatar Generation with GS-enhanced Mesh
Jingyu Zhuang, Di Kang, Linchao Bao et al.
Revisiting LRP: Positional Attribution as the Missing Ingredient for Transformer Explainability
Yarden Bakish, Itamar Zimerman, Hila Chefer et al.
Towards Principled Unsupervised Multi-Agent Reinforcement Learning
Riccardo Zamboni, Mirco Mutti, Marcello Restelli
Beyond Modality Collapse: Representation Blending for Multimodal Dataset Distillation
xin zhang, Ziruo Zhang, JIAWEI DU et al.
Entropy Rectifying Guidance for Diffusion and Flow Models
Tariq Berrada Ifriqi, Adriana Romero-Soriano, Michal Drozdzal et al.
FiRe: Fixed-points of Restoration Priors for Solving Inverse Problems
Matthieu Terris, Ulugbek Kamilov, Thomas Moreau
DeltaPhi: Physical States Residual Learning for Neural Operators in Data-Limited PDE Solving
Xihang Yue, Yi Yang, Linchao Zhu
EgoDTM: Towards 3D-Aware Egocentric Video-Language Pretraining
Boshen Xu, Yuting Mei, liu xinbi et al.
Sharp-It: A Multi-view to Multi-view Diffusion Model for 3D Synthesis and Manipulation
Yiftach Edelstein, Or Patashnik, Dana Cohen-Bar et al.
SceneDecorator: Towards Scene-Oriented Story Generation with Scene Planning and Scene Consistency
Quanjian Song, Donghao Zhou, Jingyu Lin et al.
OWMM-Agent: Open World Mobile Manipulation With Multi-modal Agentic Data Synthesis
Junting Chen, Haotian Liang, Lingxiao Du et al.
SCAN: Bootstrapping Contrastive Pre-training for Data Efficiency
Yangyang Guo, Mohan Kankanhalli
Continuous Concepts Removal in Text-to-image Diffusion Models
Tingxu Han, Weisong Sun, Yanrong Hu et al.
Electromyography-Informed Facial Expression Reconstruction for Physiological-Based Synthesis and Analysis
Tim Büchner, Christoph Anders, Orlando Guntinas-Lichius et al.
One Prompt Fits All: Universal Graph Adaptation for Pretrained Models
Yongqi Huang, Jitao Zhao, Dongxiao He et al.
Scene-agnostic Pose Regression for Visual Localization
Junwei Zheng, Ruiping Liu, Yufan Chen et al.
Towards Source-Free Machine Unlearning
Sk Miraj Ahmed, Umit Basaran, Dripta S. Raychaudhuri et al.
Can MLLMs Absorb Math Reasoning Abilities from LLMs as Free Lunch?
Yijie Hu, Zihao Zhou, Kaizhu Huang et al.
FedSPA: Generalizable Federated Graph Learning under Homophily Heterogeneity
Zihan Tan, Guancheng Wan, Wenke Huang et al.
Large-scale Pre-training for Grounded Video Caption Generation
Evangelos Kazakos, Cordelia Schmid, Josef Sivic
Bridging Sign and Spoken Languages: Pseudo Gloss Generation for Sign Language Translation
Jianyuan Guo, Peike Li, Trevor Cohn
SPACE: Noise Contrastive Estimation Stabilizes Self-Play Fine-Tuning for Large Language Models
Yibo Wang, Guangda Huzhang, Qingguo Chen et al.
Training-Free Bayesianization for Low-Rank Adapters of Large Language Models
Haizhou Shi, Yibin Wang, Ligong Han et al.
Player-Centric Multimodal Prompt Generation for Large Language Model Based Identity-Aware Basketball Video Captioning
Zeyu Xi, Haoying Sun, Yaofei Wu et al.
SOMBRL: Scalable and Optimistic Model-Based RL
Bhavya, Lenart Treven, Carmelo Sferrazza et al.
Learning Differential Pyramid Representation for Tone Mapping
Qirui Yang, Yinbo Li, Yihao Liu et al.
FADRM: Fast and Accurate Data Residual Matching for Dataset Distillation
Jiacheng Cui, Xinyue Bi, Yaxin Luo et al.
Minimum Width for Deep, Narrow MLP: A Diffeomorphism Approach
Geonho Hwang
Valid Selection among Conformal Sets
Mahmoud Hegazy, Liviu Aolaritei, Michael Jordan et al.
GRIFFIN: Effective Token Alignment for Faster Speculative Decoding
Shijing Hu, Jingyang Li, Xingyu Xie et al.
Shapley-Coop: Credit Assignment for Emergent Cooperation in Self-Interested LLM Agents
Yun Hua, Haosheng Chen, Shiqin Wang et al.
Efficient Video Super-Resolution for Real-time Rendering with Decoupled G-buffer Guidance
Mingjun Zheng, Long Sun, Jiangxin Dong et al.
Text-to-Decision Agent: Offline Meta-Reinforcement Learning from Natural Language Supervision
Shilin Zhang, Zican Hu, Wenhao Wu et al.
SOGS: Second-Order Anchor for Advanced 3D Gaussian Splatting
Jiahui Zhang, Fangneng Zhan, Ling Shao et al.
Towards Minimizing Feature Drift in Model Merging: Layer-wise Task Vector Fusion for Adaptive Knowledge Integration
Wenju Sun, Qingyong Li, Wen Wang et al.
Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking
Zihan Su, Xuerui Qiu, Hongbin Xu et al.
LLaFEA: Frame-Event Complementary Fusion for Fine-Grained Spatiotemporal Understanding in LMMs
Hanyu Zhou, Gim Hee Lee
FreeInv: Free Lunch for Improving DDIM Inversion
Yuxiang Bao, Huijie Liu, xun gao et al.
Improve Representation for Imbalanced Regression through Geometric Constraints
Zijian Dong, Yilei Wu, Chongyao Chen et al.
SHAP zero Explains Biological Sequence Models with Near-zero Marginal Cost for Future Queries
Darin Tsui, Aryan Musharaf, Yigit Efe Erginbas et al.
DiMPLe - Disentangled Multi-Modal Prompt Learning: Enhancing Out-Of-Distribution Alignment with Invariant and Spurious Feature Separation
Umaima Rahman, Mohammad Yaqub, Dwarikanath Mahapatra
URDF-Anything: Constructing Articulated Objects with 3D Multimodal Language Model
Zhe Li, Xiang Bai, Jieyu Zhang et al.
SemAlign3D: Semantic Correspondence between RGB-Images through Aligning 3D Object-Class Representations
Krispin Wandel, Hesheng Wang
TimeEmb: A Lightweight Static-Dynamic Disentanglement Framework for Time Series Forecasting
Mingyuan Xia, Chunxu Zhang, Zijian Zhang et al.
Dark-ISP: Enhancing RAW Image Processing for Low-Light Object Detection
Jiasheng Guo, Xin Gao, Yuxiang Yan et al.
M2SFormer: Multi-Spectral and Multi-Scale Attention with Edge-Aware Difficulty Guidance for Image Forgery Localization
Ju-Hyeon Nam, Dong-Hyun Moon, Sang-Chul Lee
Distilling Spatially-Heterogeneous Distortion Perception for Blind Image Quality Assessment
Xudong Li, Wenjie Nie, Yan Zhang et al.
OracleFusion: Assisting the Decipherment of Oracle Bone Script with Structurally Constrained Semantic Typography
Li Caoshuo, Zengmao Ding, Xiaobin Hu et al.
COSMO: Combination of Selective Memorization for Low-cost Vision-and-Language Navigation
Siqi Zhang, Yanyuan Qiao, Qunbo Wang et al.
D2SP: Dynamic Dual-Stage Purification Framework for Dual Noise Mitigation in Vision-based Affective Recognition
Haoran Wang, Xinji Mai, Zeng Tao et al.
Rethinking Circuit Completeness in Language Models: AND, OR, and ADDER Gates
Hang Chen, Jiaying Zhu, Xinyu Yang et al.
Making Classic GNNs Strong Baselines Across Varying Homophily: A Smoothness–Generalization Perspective
Ming Gu, Zhuonan Zheng, Sheng Zhou et al.
Semantic and Visual Crop-Guided Diffusion Models for Heterogeneous Tissue Synthesis in Histopathology
Saghir Alfasly, Wataru Uegami, MD ENAMUL HOQ et al.
VideoAds for Fast-Paced Video Understanding
Zheyuan Zhang, Wanying Dou, Linkai Peng et al.
Articulated Kinematics Distillation from Video Diffusion Models
Xuan Li, Qianli Ma, Tsung-Yi Lin et al.
Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval
WonJun Moon, Cheol-Ho Cho, Woojin Jun et al.
TokensGen: Harnessing Condensed Tokens for Long Video Generation
Wenqi Ouyang, Zeqi Xiao, Danni Yang et al.
Face Forgery Video Detection via Temporal Forgery Cue Unraveling
Zonghui Guo, YingJie Liu, Jie Zhang et al.
Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings
Yehya Farhat, Hamza ElMokhtar Shili, Fangshuo Liao et al.
Mono2Stereo: A Benchmark and Empirical Study for Stereo Conversion
Songsong Yu, Yuxin Chen, Zhongang Qi et al.
Adaptive Defense against Harmful Fine-Tuning for Large Language Models via Bayesian Data Scheduler
Zixuan Hu, Li Shen, Zhenyi Wang et al.
Multipole Attention for Efficient Long Context Reasoning
Coleman Hooper, Sebastian Zhao, Luca Manolache et al.
Attention Sinks: A 'Catch, Tag, Release' Mechanism for Embeddings
Stephen Zhang, Mustafa Khan, Vardan Papyan
Compositional Caching for Training-free Open-vocabulary Attribute Detection
Marco Garosi, Alessandro Conti, Gaowen Liu et al.
TemCoCo: Temporally Consistent Multi-modal Video Fusion with Visual-Semantic Collaboration
Gong Meiqi, Hao Zhang, Xunpeng Yi et al.
HuMoCon: Concept Discovery for Human Motion Understanding
Qihang Fang, Chengcheng Tang, Bugra Tekin et al.
Revisiting Mode Connectivity in Neural Networks with Bezier Surface
Jie Ren, Pin-Yu Chen, Ren Wang
AR-RAG: Autoregressive Retrieval Augmentation for Image Generation
Jingyuan Qi, Zhiyang Xu, Qifan Wang et al.
Can We Infer Confidential Properties of Training Data from LLMs?
Pengrun Huang, Chhavi Yadav, Kamalika Chaudhuri et al.
Beyond the Surface: Enhancing LLM-as-a-Judge Alignment with Human via Internal Representations
Peng Lai, Jianjie Zheng, Sijie Cheng et al.
ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation
Yunhong Min, Daehyeon Choi, Kyeongmin Yeo et al.
High Temporal Consistency through Semantic Similarity Propagation in Semi-Supervised Video Semantic Segmentation for Autonomous Flight
Cédric Vincent, Taehyoung Kim, Henri Meeß
GeoVideo: Introducing Geometric Regularization into Video Generation Model
Yunpeng Bai, Shaoheng Fang, Chaohui Yu et al.
LabelAny3D: Label Any Object 3D in the Wild
Jin Yao, Radowan Mahmud Redoy, Sebastian Elbaum et al.
Generative Photomontage
Sean J. Liu, Nupur Kumari, Ariel Shamir et al.
DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution
Yuzhong Zhao, Feng Liu, Yue Liu et al.