Most Cited 2025 "general ability preservation" Papers
22,274 papers found • Page 14 of 112
Multi-step Visual Reasoning with Visual Tokens Scaling and Verification
Tianyi Bai, Zengjie Hu, Fupeng Sun et al.
MM-CamObj: A Comprehensive Multimodal Dataset for Camouflaged Object Scenarios
Jiacheng Ruan, Wenzhen Yuan, Zehao Lin et al.
RecFlow: An Industrial Full Flow Recommendation Dataset
Qi Liu, Kai Zheng, Rui Huang et al.
Think or Not? Exploring Thinking Efficiency in Large Reasoning Models via an Information-Theoretic Lens
Xixian Yong, Xiao Zhou, Yingying Zhang et al.
The Optimization Landscape of SGD Across the Feature Learning Strength
Alexander Atanasov, Alexandru Meterez, James Simon et al.
Latent Thought Models with Variational Bayes Inference-Time Computation
Deqian Kong, Minglu Zhao, Dehong Xu et al.
Training-free LLM-generated Text Detection by Mining Token Probability Sequences
Yihuai Xu, Yongwei Wang, Yifei Bi et al.
UniMLVG: Unified Framework for Multi-view Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving
Rui Chen, Zehuan Wu, Yichen Liu et al.
ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models
Yeji Park, Deokyeong Lee, Junsuk Choe et al.
EMOE: Modality-Specific Enhanced Dynamic Emotion Experts
Yiyang Fang, Wenke Huang, Guancheng Wan et al.
Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance
Dimitris Oikonomou, Nicolas Loizou
Gazing Into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities
Michele Mazzamuto, Antonino Furnari, Yoichi Sato et al.
Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation
Eliot Xing, Vernon Luk, Jean Oh
MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild
Xi Fang, Jiankun Wang, Xiaochen Cai et al.
Do-PFN: In-Context Learning for Causal Effect Estimation
Jake Robertson, Arik Reuter, Siyuan Guo et al.
Adapter Merging with Centroid Prototype Mapping for Scalable Class-Incremental Learning
Takuma Fukuda, Hiroshi Kera, Kazuhiko Kawamoto
GenPC: Zero-shot Point Cloud Completion via 3D Generative Priors
An Li, Zhe Zhu, Mingqiang Wei
Diff-Shadow: Global-guided Diffusion Model for Shadow Removal
Jinting Luo, Ru Li, Chengzhi Jiang et al.
Optimized Multi-Token Joint Decoding With Auxiliary Model for LLM Inference
Zongyue Qin, Ziniu Hu, Zifan He et al.
NeRFPrior: Learning Neural Radiance Field as a Prior for Indoor Scene Reconstruction
Wenyuan Zhang, Emily Yue-ting Jia, Junsheng Zhou et al.
The Computational Complexity of Circuit Discovery for Inner Interpretability
Federico Adolfi, Martina G. Vilas, Todd Wareham
3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding
Tatiana Zemskova, Dmitry Yudin
Tri-Ergon: Fine-Grained Video-to-Audio Generation with Multi-Modal Conditions and LUFS Control
Bingliang Li, Fengyu Yang, Yuxin Mao et al.
GaussianUDF: Inferring Unsigned Distance Functions through 3D Gaussian Splatting
Shujuan Li, Yu-Shen Liu, Zhizhong Han
EmotiCrafter: Text-to-Emotional-Image Generation based on Valence-Arousal Model
Shengqi Dang, Yi He, Long Ling et al.
Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models
Zeman Li, Xinwei Zhang, Peilin Zhong et al.
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models
Junzhe Chen, Tianshu Zhang, Shiyu Huang et al.
Breaking Latent Prior Bias in Detectors for Generalizable AIGC Image Detection
Yue Zhou, Xinan He, Kaiqing Lin et al.
TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation
Gihyun Kwon, Jong Chul Ye
When the Future Becomes the Past: Taming Temporal Correspondence for Self-supervised Video Representation Learning
Yang Liu, Qianqian Xu, Peisong Wen et al.
Interaction Asymmetry: A General Principle for Learning Composable Abstractions
Jack Brady, Julius von Kügelgen, Sebastien Lachapelle et al.
CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance
Yongchao Chen, Yilun Hao, Yueying Liu et al.
Learning the RoPEs: Better 2D and 3D Position Encodings with STRING
Connor Schenck, Isaac Reid, Mithun Jacob et al.
MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility
Wayne Wu, Honglin He, Jack He et al.
TopoNets: High performing vision and language models with brain-like topography
Mayukh Deb, Mainak Deb, Apurva Murty
DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation
Hongbin Lin, Zilu Guo, Yifan Zhang et al.
EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis
Sheng Miao, Jiaxin Huang, Dongfeng Bai et al.
Federated Learning with Sample-level Client Drift Mitigation
Haoran Xu, Jiaze Li, Wanyi Wu et al.
XTrack: Multimodal Training Boosts RGB-X Video Object Trackers
Yuedong Tan, Zongwei Wu, Yuqian Fu et al.
This Time is Different: An Observability Perspective on Time Series Foundation Models
Ben Cohen, Emaad Khwaja, Youssef Doubli et al.
BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers
Hui Zhang, Tingwei Gao, Jie Shao et al.
LBM: Latent Bridge Matching for Fast Image-to-Image Translation
Clément Chadebec, Onur Tasar, Sanjeev Sreetharan et al.
NoT: Federated Unlearning via Weight Negation
Yasser Khalil, Leo Maxime Brunswic, Soufiane Lamghari et al.
KTAE: A Model-Free Algorithm to Key-Tokens Advantage Estimation in Mathematical Reasoning
Wei Sun, Wen Yang, Pu Jian et al.
DELTA: Pre-Train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment
Haitao Li, Qingyao Ai, Xinyan Han et al.
Planning in the Dark: LLM-Symbolic Planning Pipeline Without Experts
Sukai Huang, Nir Lipovetzky, Trevor Cohn
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
Peng Liu, Dongyang Dai, Zhiyong Wu
Horizon-GS: Unified 3D Gaussian Splatting for Large-Scale Aerial-to-Ground Scenes
Lihan Jiang, Kerui Ren, Mulin Yu et al.
Locality Alignment Improves Vision-Language Models
Ian Covert, Tony Sun, James Y Zou et al.
SLIP: Spoof-Aware One-Class Face Anti-Spoofing with Language Image Pretraining
Pei-Kai Huang, Jun-Xiong Chong, Cheng-Hsuan Chiang et al.
Enhancing Trustworthiness of Graph Neural Networks with Rank-Based Conformal Training
Ting Wang, Zhixin Zhou, Rui Luo
KnowPO: Knowledge-Aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models
Ruizhe Zhang, Yongxin Xu, Yuzhen Xiao et al.
Don’t Think Longer, Think Wisely: Optimizing Thinking Dynamics for Large Reasoning Models
Sohyun An, Ruochen Wang, Tianyi Zhou et al.
h4rm3l: A Language for Composable Jailbreak Attack Synthesis
Moussa Koulako Bala Doumbouya, Ananjan Nandi, Gabriel Poesia et al.
CoRe: Benchmarking LLMs’ Code Reasoning Capabilities through Static Analysis Tasks
Danning Xie, Mingwei Zheng, Xuwei Liu et al.
Long-Sequence Recommendation Models Need Decoupled Embeddings
Ningya Feng, Junwei Pan, Jialong Wu et al.
Data Synthesis with Diverse Styles for Face Recognition via 3DMM-Guided Diffusion
Yuxi Mi, Zhizhou Zhong, Yuge Huang et al.
FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning
Gaojian Wang, Feng Lin, Tong Wu et al.
RealMath: A Continuous Benchmark for Evaluating Language Models on Research-Level Mathematics
Jie Zhang, Cezara Petrui, Kristina Nikolić et al.
Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors
Fan Nie, Lan Feng, Haotian Ye et al.
Multi-view Reconstruction via SfM-guided Monocular Depth Estimation
Haoyu Guo, He Zhu, Sida Peng et al.
TOP-ERL: Transformer-based Off-Policy Episodic Reinforcement Learning
Ge Li, Dong Tian, Hongyi Zhou et al.
FAMNet: Frequency-aware Matching Network for Cross-domain Few-shot Medical Image Segmentation
Yuntian Bo, Yazhou Zhu, Lunbo Li et al.
LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models
Jian Liang, Wenke Huang, Guancheng Wan et al.
Jigsaw Puzzles: Splitting Harmful Questions to Jailbreak Large Language Models in Multi-turn Interactions
Hao Yang, Lizhen Qu, Ehsan Shareghi et al.
Neither Valid nor Reliable? Investigating the Use of LLMs as Judges
Khaoula Chehbouni, Mohammed Haddou, Jackie CK Cheung et al.
Adaptive Prompting for Continual Relation Extraction: A Within-Task Variance Perspective
Minh Le, Tien Ngoc Luu, An Nguyen The et al.
What's the Move? Hybrid Imitation Learning via Salient Points
Priya Sundaresan, Hengyuan Hu, Quan Vuong et al.
FocalCodec: Low-Bitrate Speech Coding via Focal Modulation Networks
Luca Della Libera, Francesco Paissan, Cem Subakan et al.
Sports-Traj: A Unified Trajectory Generation Model for Multi-Agent Movement in Sports
Yi Xu, Yun Fu
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
Hanyang Wang, Fangfu Liu, Jiawei Chi et al.
Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation
Yongkang Li, Tianheng Cheng, Bin Feng et al.
Learning 2D Invariant Affordance Knowledge for 3D Affordance Grounding
Xianqiang Gao, Pingrui Zhang, Delin Qu et al.
BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations
Weixi Feng, Chao Liu, Sifei Liu et al.
Lightweight Neural App Control
Filippos Christianos, Georgios Papoudakis, Thomas Coste et al.
Zero-Shot Low-Light Image Enhancement via Latent Diffusion Models
Yan Huang, Xiaoshan Liao, Jinxiu Liang et al.
VTDexManip: A Dataset and Benchmark for Visual-tactile Pretraining and Dexterous Manipulation with Reinforcement Learning
Qingtao Liu, Yu Cui, Zhengnan Sun et al.
From Words to Worth: Newborn Article Impact Prediction with LLM
Penghai Zhao, Qinghua Xing, Kairan Dou et al.
AI-Researcher: Autonomous Scientific Innovation
Jiabin Tang, Lianghao Xia, Zhonghang Li et al.
Hierarchical Autoregressive Transformers: Combining Byte- and Word-Level Processing for Robust, Adaptable Language Models
Pit Neitemeier, Björn Deiseroth, Constantin Eichenberg et al.
Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding
Yiming Zhang, Zhuokai Zhao, Zhaorun Chen et al.
Revisiting Tampered Scene Text Detection in the Era of Generative AI
Chenfan Qu, Yiwu Zhong, Fengjun Guo et al.
Lifting Motion to the 3D World via 2D Diffusion
Jiaman Li, Karen Liu, Jiajun Wu
Understanding Emotional Body Expressions via Large Language Models
Haifeng Lu, Jiuyi Chen, Feng Liang et al.
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters
Jianping Jiang, Weiye Xiao, Zhengyu Lin et al.
LiDAR-RT: Gaussian-based Ray Tracing for Dynamic LiDAR Re-simulation
Chenxu Zhou, Lvchang Fu, Sida Peng et al.
DifIISR: A Diffusion Model with Gradient Guidance for Infrared Image Super-Resolution
Xingyuan Li, Zirui Wang, Yang Zou et al.
RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion
Xiaomeng Chu, Jiajun Deng, Guoliang You et al.
Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs
Yingji Zhong, Zhihao Li, Dave Zhenyu Chen et al.
IgGM: A Generative Model for Functional Antibody and Nanobody Design
Rubo Wang, Fandi Wu, Xingyu Gao et al.
Toward Real-world BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting
Shu-Wei Lu, Yi-Hsuan Tsai, Yi-Ting Chen
Deep Kernel Relative Test for Machine-generated Text Detection
Yiliao Song, Zhenqiao Yuan, Shuhai Zhang et al.
DisasterM3: A Remote Sensing Vision-Language Dataset for Disaster Damage Assessment and Response
Junjue Wang, Weihao Xuan, Heli Qi et al.
Audio-Visual Instance Segmentation
Ruohao Guo, Xianghua Ying, Yaru Chen et al.
Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models
Can Demircan, Tankred Saanum, Akshay Jagadish et al.
GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution
Fengxiang Wang, Mingshuo Chen, Yueying Li et al.
Integrated Augmented and Virtual Reality Technologies for Realistic Fire Drill Training
Hosan Kang, Jinseong Yang, Beom-Seok Ko et al.
MIDI-GPT: A Controllable Generative Model for Computer-Assisted Multitrack Music Composition
Philippe Pasquier, Jeff Ens, Nathan Fradet et al.
FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video
Yue Gao, Hong-Xing Yu, Bo Zhu et al.
Transformer-Squared: Self-adaptive LLMs
Qi Sun, Edoardo Cetin, Yujin Tang
Skill Expansion and Composition in Parameter Space
Tenglong Liu, Jianxiong Li, Yinan Zheng et al.
Selective Visual Prompting in Vision Mamba
Yifeng Yao, Zichen Liu, Zhenyu Cui et al.
Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning
Baoqi Pei, Yifei Huang, Jilan Xu et al.
Epistemic Alignment: A Mediating Framework for User-LLM Knowledge Delivery
Nicholas Clark, Hua Shen, Bill Howe et al.
BearLLM: A Prior Knowledge-Enhanced Bearing Health Management Framework with Unified Vibration Signal Representation
Haotian Peng, Jiawei Liu, Jinsong Du et al.
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
Muzhi Zhu, Yuzhuo Tian, Hao Chen et al.
nnWNet: Rethinking the Use of Transformers in Biomedical Image Segmentation and Calling for a Unified Evaluation Benchmark
Yanfeng Zhou, Lingrui Li, Le Lu et al.
LaGeM: A Large Geometry Model for 3D Representation Learning and Diffusion
Biao Zhang, Peter Wonka
OmniStyle: Filtering High Quality Style Transfer Data at Scale
Ye Wang, Ruiqi Liu, Jiang Lin et al.
Towards Optimal Multi-draft Speculative Decoding
Zhengmian Hu, Tong Zheng, Vignesh Viswanathan et al.
Latent Chain-of-Thought for Visual Reasoning
Guohao Sun, Hang Hua, Jian Wang et al.
Almost Optimal Batch-Regret Tradeoff for Batch Linear Contextual Bandits
Zihan Zhang, Xiangyang Ji, Yuan Zhou
Geometry of Lightning Self-Attention: Identifiability and Dimension
Nathan Henry, Giovanni Luca Marchetti, Kathlén Kohn
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
Jiange Yang, Haoyi Zhu, Yating Wang et al.
On Linear Representations and Pretraining Data Frequency in Language Models
Jack Merullo, Noah Smith, Sarah Wiegreffe et al.
Detecting Backdoor Attacks in Federated Learning via Direction Alignment Inspection
Jiahao Xu, Zikai Zhang, Rui Hu
OmniCount: Multi-label Object Counting with Semantic-Geometric Priors
Anindya Mondal, Sauradip Nag, Xiatian Zhu et al.
MPTSNet: Integrating Multiscale Periodic Local Patterns and Global Dependencies for Multivariate Time Series Classification
Yang Mu, Muhammad Shahzad, Xiao Xiang Zhu
NetMoE: Accelerating MoE Training through Dynamic Sample Placement
Xinyi Liu, Yujie Wang, Fangcheng Fu et al.
Conformalized Interval Arithmetic with Symmetric Calibration
Rui Luo, Zhixin Zhou
Adversarial Machine Unlearning
Zonglin Di, Sixie Yu, Yevgeniy Vorobeychik et al.
Conformal Thresholded Intervals for Efficient Regression
Rui Luo, Zhixin Zhou
Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
Xinyu Tian, Shu Zou, Zhaoyuan Yang et al.
Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
Antonia Wüst, Tim Woydt, Lukas Helff et al.
ViSpeak: Visual Instruction Feedback in Streaming Videos
Shenghao Fu, Qize Yang, Yuan-Ming Li et al.
GaussianSpa: An “Optimizing-Sparsifying” Simplification Framework for Compact and High-Quality 3D Gaussian Splatting
Yangming Zhang, Wenqi Jia, Wei Niu et al.
Rectified Diffusion Guidance for Conditional Generation
Mengfei Xia, Nan Xue, Yujun Shen et al.
Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages
Zui Chen, Tianqiao Liu, Tongqing et al.
From Commands to Prompts: LLM-based Semantic File System for AIOS
Zeru Shi, Kai Mei, Mingyu Jin et al.
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
Tianyu Fu, Yi Ge, Yichen You et al.
RelitLRM: Generative Relightable Radiance for Large Reconstruction Models
Tianyuan Zhang, Zhengfei Kuang, Haian Jin et al.
LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation models
Ziqi Lu, Heng Yang, Danfei Xu et al.
Envisioning Class Entity Reasoning by Large Language Models for Few-shot Learning
Mushui Liu, Fangtai Wu, Bozheng Li et al.
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation
Fating Hong, Zunnan Xu, Zixiang Zhou et al.
LiveXiv - A Multi-Modal live benchmark based on Arxiv papers content
Nimrod Shabtay, Felipe Maia Polo, Sivan Doveh et al.
DG-Mamba: Robust and Efficient Dynamic Graph Structure Learning with Selective State Space Models
Haonan Yuan, Qingyun Sun, Zhaonan Wang et al.
Proxy Denoising for Source-Free Domain Adaptation
Song Tang, Wenxin Su, Yan Gan et al.
Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper
Xinyue Zhu, Binghao Huang, Yunzhu Li
Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts
Guorui Zheng, Xidong Wang, Juhao Liang et al.
Learning Few-Step Diffusion Models by Trajectory Distribution Matching
Yihong Luo, Tianyang Hu, Jiacheng Sun et al.
SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding
Yangliu Hu, Zikai Song, Na Feng et al.
Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models
Jinho Jeong, Sangmin Han, Jinwoo Kim et al.
Efficient Rectification of Neuro-Symbolic Reasoning Inconsistencies by Abductive Reflection
Wen-Chao Hu, Wang-Zhou Dai, Yuan Jiang et al.
Bridging the Gap for Test-Time Multimodal Sentiment Analysis
Zirun Guo, Tao Jin, Wenlong Xu et al.
Preference Optimization on Pareto Sets: On a Theory of Multi-Objective Optimization
Abhishek Roy, Geelon So, Yian Ma
Reangle-A-Video: 4D Video Generation as Video-to-Video Translation
Hyeonho Jeong, Suhyeon Lee, Jong Ye
SeaS: Few-shot Industrial Anomaly Image Generation with Separation and Sharing Fine-tuning
Zhewei Dai, Shilei Zeng, Haotian Liu et al.
Synthetic Video Enhances Physical Fidelity in Video Synthesis
Qi Zhao, Xingyu Ni, Ziyu Wang et al.
Battling the Non-stationarity in Time Series Forecasting via Test-time Adaptation
HyunGi Kim, Siwon Kim, Jisoo Mok et al.
RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints
Yiran Qin, Li Kang, Xiufeng Song et al.
HashAttention: Semantic Sparsity for Faster Inference
Aditya Desai, Shuo Yang, Alejandro Cuadron et al.
VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers
Juncan Deng, Shuaiting Li, Zeyu Wang et al.
Benign Samples Matter! Fine-tuning On Outlier Benign Samples Severely Breaks Safety
Zihan Guan, Mengxuan Hu, Ronghang Zhu et al.
Knowledge Localization: Mission Not Accomplished? Enter Query Localization!
Yuheng Chen, Pengfei Cao, Yubo Chen et al.
Liger: Linearizing Large Language Models to Gated Recurrent Structures
Disen Lan, Weigao Sun, Jiaxi Hu et al.
DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models
Ziyi Wu, Anil Kag, Ivan Skorokhodov et al.
ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding
Zhenxing Zhang, Yaxiong Wang, Lechao Cheng et al.
4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video
Qiang Hu, Zihan Zheng, Houqiang Zhong et al.
FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers
Renshan Zhang, Rui Shao, Gongwei Chen et al.
Jailbreaking as a Reward Misspecification Problem
Zhihui Xie, Jiahui Gao, Lei Li et al.
ProSec: Fortifying Code LLMs with Proactive Security Alignment
Xiangzhe Xu, Zian Su, Jinyao Guo et al.
Towards Trustworthy Knowledge Graph Reasoning: An Uncertainty Aware Perspective
Bo Ni, Yu Wang, Lu Cheng et al.
RoboTron-Drive: All-in-One Large Multimodal Model for Autonomous Driving
Zhijian Huang, Chengjian Feng, Baihui Xiao et al.
ROCKET-1: Mastering Open-World Interaction with Visual-Temporal Context Prompting
Shaofei Cai, Zihao Wang, Kewei Lian et al.
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve?
Zhenheng Tang, Xiang Liu, Qian Wang et al.
Ultra-Sparse Memory Network
Zihao Huang, Qiyang Min, Hongzhi Huang et al.
Bridging the User-side Knowledge Gap in Knowledge-aware Recommendations with Large Language Models
Zheng Hu, Zhe Li, Ziyun Jiao et al.
Hidden in the Noise: Two-Stage Robust Watermarking for Images
Kasra Arabi, Benjamin Feuer, R. Teal Witter et al.
HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator
Fan Yang, Ru Zhen, Jianing Wang et al.
Precise Localization of Memories: A Fine-grained Neuron-level Knowledge Editing Technique for LLMs
Haowen Pan, Xiaozhi Wang, Yixin Cao et al.
Unifying Appearance Codes and Bilateral Grids for Driving Scene Gaussian Splatting
Nan Wang, Lixing Xiao, Yuantao Chen et al.
Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding
Yue Fan, Xiaojian Ma, Rongpeng Su et al.
STD-PLM: Understanding Both Spatial and Temporal Properties of Spatial-Temporal Data with PLM
Yiheng Huang, Xiaowei Mao, Shengnan Guo et al.
Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment in Large Language Models
Guobin Shen, Dongcheng Zhao, Yiting Dong et al.
TaoAvatar: Real-Time Lifelike Full-Body Talking Avatars for Augmented Reality via 3D Gaussian Splatting
Jianchuan Chen, Jingchuan Hu, Gaige Wang et al.
From Words to Structured Visuals: A Benchmark and Framework for Text-to-Diagram Generation and Editing
Jingxuan Wei, Cheng Tan, Qi Chen et al.
Node Identifiers: Compact, Discrete Representations for Efficient Graph Learning
Yuankai Luo, Hongkang Li, Qijiong Liu et al.
Differentiable Optimization of Similarity Scores Between Models and Brains
Nathan Cloos, Moufan Li, Markus Siegel et al.
Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition
Bozheng Li, Mushui Liu, Gaoang Wang et al.
ExpertAF: Expert Actionable Feedback from Video
Kumar Ashutosh, Tushar Nagarajan, Georgios Pavlakos et al.
VladVA: Discriminative Fine-tuning of LVLMs
Yassine Ouali, Adrian Bulat, Alexandros Xenos et al.
Manifold Learning by Mixture Models of VAEs for Inverse Problems
Giovanni S. Alberti, Johannes Hertrich, Matteo Santacesaria et al.
Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts
Yun Wang, Longguang Wang, Chenghao Zhang et al.
LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws
Prasanna Mayilvahanan, Thaddäus Wiedemer, Sayak Mallick et al.
Flash-VStream: Efficient Real-Time Understanding for Long Video Streams
Haoji Zhang, Yiqin Wang, Yansong Tang et al.
Multi-domain Distribution Learning for De Novo Drug Design
Arne Schneuing, Ilia Igashov, Adrian Dobbelstein et al.
TopoCellGen: Generating Histopathology Cell Topology with a Diffusion Model
Meilong Xu, Saumya Gupta, Xiaoling Hu et al.
Residual Stream Analysis with Multi-Layer SAEs
Tim Lawson, Lucy Farnik, Conor Houghton et al.
SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation
Duc-Hai Pham, Tung Do, Phong Nguyen et al.
Visual-Instructed Degradation Diffusion for All-in-One Image Restoration
Haina Qin, Wenyang Luo, Zewen Chen et al.
Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?
Tianyuan Qu, Longxiang Tang, Bohao Peng et al.
Revisiting In-context Learning Inference Circuit in Large Language Models
Hakaze Cho, Mariko Kato, Yoshihiro Sakai et al.
MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation
Huaize Liu, WenZhang Sun, Donglin Di et al.
Few for Many: Tchebycheff Set Scalarization for Many-Objective Optimization
Xi Lin, Yilu Liu, Xiaoyuan Zhang et al.
SpatialCLIP: Learning 3D-aware Image Representations from Spatially Discriminative Language
Zehan Wang, Sashuai Zhou, Shaoxuan He et al.
Deep Learning Alternatives Of The Kolmogorov Superposition Theorem
Leonardo Ferreira Guilhoto, Paris Perdikaris
Semantic and Sequential Alignment for Referring Video Object Segmentation
Feiyu Pan, Hao Fang, Fangkai Li et al.
Rethinking Invariance in In-context Learning
Lizhe Fang, Yifei Wang, Khashayar Gatmiry et al.
Context Steering: Controllable Personalization at Inference Time
Zhiyang He, Sashrika Pandey, Mariah Schrum et al.
CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization
Feize Wu, Yun Pang, Junyi Zhang et al.