Most Cited 2025 "plug-and-play control" Papers
22,274 papers found • Page 13 of 112
Conference
From Commands to Prompts: LLM-based Semantic File System for AIOS
Zeru Shi, Kai Mei, Mingyu Jin et al.
Adapter Merging with Centroid Prototype Mapping for Scalable Class-Incremental Learning
Takuma Fukuda, Hiroshi Kera, Kazuhiko Kawamoto
What's the Move? Hybrid Imitation Learning via Salient Points
Priya Sundaresan, Hengyuan Hu, Quan Vuong et al.
Instant Adversarial Purification with Adversarial Consistency Distillation
Chun Tong Lei, Hon Ming Yam, Zhongliang Guo et al.
Understanding Virtual Nodes: Oversquashing and Node Heterogeneity
Joshua Southern, Francesco Di Giovanni, Michael Bronstein et al.
ExpertAF: Expert Actionable Feedback from Video
Kumar Ashutosh, Tushar Nagarajan, Georgios Pavlakos et al.
DifIISR: A Diffusion Model with Gradient Guidance for Infrared Image Super-Resolution
Xingyuan Li, Zirui Wang, Yang Zou et al.
RFWave: Multi-band Rectified Flow for Audio Waveform Reconstruction
Peng Liu, Dongyang Dai, Zhiyong Wu
Advancing Spiking Neural Networks Towards Multiscale Spatiotemporal Interaction Learning
Yimeng Shan, Malu Zhang, Rui-jie Zhu et al.
Benign Samples Matter! Fine-tuning On Outlier Benign Samples Severely Breaks Safety
Zihan Guan, Mengxuan Hu, Ronghang Zhu et al.
From Words to Worth: Newborn Article Impact Prediction with LLM
Penghai Zhao, Qinghua Xing, Kairan Dou et al.
Hyperbolic Fine-Tuning for Large Language Models
Menglin Yang, Ram Samarth B B, Aosong Feng et al.
Understanding Emotional Body Expressions via Large Language Models
Haifeng Lu, Jiuyi Chen, Feng Liang et al.
PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation
Chen Wang, Chuhao Chen, Yiming Huang et al.
MoEE: Mixture of Emotion Experts for Audio-Driven Portrait Animation
Huaize Liu, WenZhang Sun, Donglin Di et al.
Liger: Linearizing Large Language Models to Gated Recurrent Structures
Disen Lan, Weigao Sun, Jiaxi Hu et al.
Preference Optimization on Pareto Sets: On a Theory of Multi-Objective Optimization
Abhishek Roy, Geelon So, Yian Ma
Integrated Augmented and Virtual Reality Technologies for Realistic Fire Drill Training
Hosan Kang, Jinseong Yang, Beom-Seok Ko et al.
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
Muzhi Zhu, Yuzhuo Tian, Hao Chen et al.
LaGeM: A Large Geometry Model for 3D Representation Learning and Diffusion
Biao Zhang, Peter Wonka
Hidden in the Noise: Two-Stage Robust Watermarking for Images
Kasra Arabi, Benjamin Feuer, R. Teal Witter et al.
Adversarial Machine Unlearning
Zonglin Di, Sixie Yu, Yevgeniy Vorobeychik et al.
4DGC: Rate-Aware 4D Gaussian Compression for Efficient Streamable Free-Viewpoint Video
Qiang Hu, Zihan Zheng, Houqiang Zhong et al.
Bongard in Wonderland: Visual Puzzles that Still Make AI Go Mad?
Antonia Wüst, Tim Woydt, Lukas Helff et al.
Unifying Appearance Codes and Bilateral Grids for Driving Scene Gaussian Splatting
Nan Wang, Lixing Xiao, Yuantao Chen et al.
nnWNet: Rethinking the Use of Transformers in Biomedical Image Segmentation and Calling for a Unified Evaluation Benchmark
Yanfeng Zhou, Lingrui Li, Le Lu et al.
Ultra-Sparse Memory Network
Zihao Huang, Qiyang Min, Hongzhi Huang et al.
Semantic and Sequential Alignment for Referring Video Object Segmentation
Feiyu Pan, Hao Fang, Fangkai Li et al.
LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws
Prasanna Mayilvahanan, Thaddäus Wiedemer, Sayak Mallick et al.
HashAttention: Semantic Sparsity for Faster Inference
Aditya Desai, Shuo Yang, Alejandro Cuadron et al.
SF2T: Self-supervised Fragment Finetuning of Video-LLMs for Fine-Grained Understanding
Yangliu Hu, Zikai Song, Na Feng et al.
Rethinking Invariance in In-context Learning
Lizhe Fang, Yifei Wang, Khashayar Gatmiry et al.
DenseDPO: Fine-Grained Temporal Preference Optimization for Video Diffusion Models
Ziyi Wu, Anil Kag, Ivan Skorokhodov et al.
SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks
Hwiwon Lee, Ziqi Zhang, Hanxiao Lu et al.
SANER: Annotation-free Societal Attribute Neutralizer for Debiasing CLIP
Yusuke Hirota, Min-Hung Chen, Chien-Yi Wang et al.
3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding
Tatiana Zemskova, Dmitry Yudin
RecFlow: An Industrial Full Flow Recommendation Dataset
Qi Liu, Kai Zheng, Rui Huang et al.
HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator
Fan Yang, Ru Zhen, Jianing Wang et al.
Consistent and Controllable Image Animation with Motion Diffusion Models
Xin Ma, Yaohui Wang, Gengyun Jia et al.
TopoCellGen: Generating Histopathology Cell Topology with a Diffusion Model
Meilong Xu, Saumya Gupta, Xiaoling Hu et al.
Interaction Asymmetry: A General Principle for Learning Composable Abstractions
Jack Brady, Julius von Kügelgen, Sebastien Lachapelle et al.
MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility
Wayne Wu, Honglin He, Jack He et al.
On Conformal Isometry of Grid Cells: Learning Distance-Preserving Position Embedding
Dehong Xu, Ruiqi Gao, Wenhao Zhang et al.
Multi-step Visual Reasoning with Visual Tokens Scaling and Verification
Tianyi Bai, Zengjie Hu, Fupeng Sun et al.
Distilling LLM Agent into Small Models with Retrieval and Code Tools
Minki Kang, Jongwon Jeong, Seanie Lee et al.
Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper
Xinyue Zhu, Binghao Huang, Yunzhu Li
The Computational Complexity of Circuit Discovery for Inner Interpretability
Federico Adolfi, Martina G. Vilas, Todd Wareham
Breaking Latent Prior Bias in Detectors for Generalizable AIGC Image Detection
Yue Zhou, Xinan He, Kaiqing Lin et al.
RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion
Xiaomeng Chu, Jiajun Deng, Guoliang You et al.
BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization
Xueyang Zhou, Guiyao Tie, Guowen Zhang et al.
Twilight: Adaptive Attention Sparsity with Hierarchical Top-$p$ Pruning
Chaofan Lin, Jiaming Tang, Shuo Yang et al.
HiP-AD: Hierarchical and Multi-Granularity Planning with Deformable Attention for Autonomous Driving in a Single Decoder
Yingqi Tang, Zhuoran Xu, Zhaotie Meng et al.
Latent Thought Models with Variational Bayes Inference-Time Computation
Deqian Kong, Minglu Zhao, Dehong Xu et al.
CoRe: Benchmarking LLMs’ Code Reasoning Capabilities through Static Analysis Tasks
Danning Xie, Mingwei Zheng, Xuwei Liu et al.
EVolSplat: Efficient Volume-based Gaussian Splatting for Urban View Synthesis
Sheng Miao, Jiaxin Huang, Dongfeng Bai et al.
LoRACLR: Contrastive Adaptation for Customization of Diffusion Models
Enis Simsar, Thomas Hofmann, Federico Tombari et al.
RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints
Yiran Qin, Li Kang, Xiufeng Song et al.
Synthetic Video Enhances Physical Fidelity in Video Synthesis
Qi Zhao, Xingyu Ni, Ziyu Wang et al.
NoT: Federated Unlearning via Weight Negation
Yasser Khalil, Leo Maxime Brunswic, Soufiane Lamghari et al.
Precise Localization of Memories: A Fine-grained Neuron-level Knowledge Editing Technique for LLMs
Haowen Pan, Xiaozhi Wang, Yixin Cao et al.
Artificial Hivemind: The Open-Ended Homogeneity of Language Models (and Beyond)
Liwei Jiang, Yuanjun Chai, Margaret Li et al.
Learning the RoPEs: Better 2D and 3D Position Encodings with STRING
Connor Schenck, Isaac Reid, Mithun Jacob et al.
Large language models can learn and generalize steganographic chain-of-thought under process supervision
ROBERT MC CARTHY, Joey SKAF, Luis Ibanez-Lissen et al.
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
Alexander Nikulin, Ilya Zisman, Alexey Zemtsov et al.
Federated Learning with Sample-level Client Drift Mitigation
Haoran Xu, Jiaze Li, Wanyi Wu et al.
Enhancing Trustworthiness of Graph Neural Networks with Rank-Based Conformal Training
Ting Wang, Zhixin Zhou, Rui Luo
KnowPO: Knowledge-Aware Preference Optimization for Controllable Knowledge Selection in Retrieval-Augmented Language Models
Ruizhe Zhang, Yongxin Xu, Yuzhen Xiao et al.
Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models
Zeman Li, Xinwei Zhang, Peilin Zhong et al.
Multi-view Reconstruction via SfM-guided Monocular Depth Estimation
Haoyu Guo, He Zhu, Sida Peng et al.
Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts
Guorui Zheng, Xidong Wang, Juhao Liang et al.
Embodied VideoAgent: Persistent Memory from Egocentric Videos and Embodied Sensors Enables Dynamic Scene Understanding
Yue Fan, Xiaojian Ma, Rongpeng Su et al.
FluidNexus: 3D Fluid Reconstruction and Prediction from a Single Video
Yue Gao, Hong-Xing Yu, Bo Zhu et al.
MM-CamObj: A Comprehensive Multimodal Dataset for Camouflaged Object Scenarios
Jiacheng Ruan, Wenzhen Yuan, Zehao Lin et al.
Rectified Diffusion Guidance for Conditional Generation
Mengfei Xia, Nan Xue, Yujun Shen et al.
DELTA: Pre-Train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment
Haitao Li, Qingyao Ai, Xinyan Han et al.
Effective Training Data Synthesis for Improving MLLM Chart Understanding
Yuwei Yang, Zeyu Zhang, Yunzhong Hou et al.
Planning in the Dark: LLM-Symbolic Planning Pipeline Without Experts
Sukai Huang, Nir Lipovetzky, Trevor Cohn
SeaS: Few-shot Industrial Anomaly Image Generation with Separation and Sharing Fine-tuning
Zhewei Dai, Shilei Zeng, Haotian Liu et al.
Proxy Denoising for Source-Free Domain Adaptation
Song Tang, Wenxin Su, Yan Gan et al.
Detecting Backdoor Attacks in Federated Learning via Direction Alignment Inspection
Jiahao Xu, Zikai Zhang, Rui Hu
BlobGEN-Vid: Compositional Text-to-Video Generation with Blob Video Representations
Weixi Feng, Chao Liu, Sifei Liu et al.
Node Identifiers: Compact, Discrete Representations for Efficient Graph Learning
Yuankai Luo, Hongkang Li, Qijiong Liu et al.
FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers
Renshan Zhang, Rui Shao, Gongwei Chen et al.
VladVA: Discriminative Fine-tuning of LVLMs
Yassine Ouali, Adrian Bulat, ALEXANDROS XENOS et al.
Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts
Yun Wang, Longguang Wang, Chenghao Zhang et al.
Residual Stream Analysis with Multi-Layer SAEs
Tim Lawson, Lucy Farnik, Conor Houghton et al.
Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?
Tianyuan Qu, Longxiang Tang, Bohao PENG et al.
Toward Real-world BEV Perception: Depth Uncertainty Estimation via Gaussian Splatting
Shu-Wei Lu, Yi-Hsuan Tsai, Yi-Ting Chen
TopoNets: High performing vision and language models with brain-like topography
Mayukh Deb, Mainak Deb, Apurva Murty
KTAE: A Model-Free Algorithm to Key-Tokens Advantage Estimation in Mathematical Reasoning
Wei Sun, Wen Yang, Pu Jian et al.
UniMLVG: Unified Framework for Multi-view Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving
Rui Chen, Zehuan Wu, Yichen Liu et al.
Adaptive Prompting for Continual Relation Extraction: A Within-Task Variance Perspective
Minh Le, Tien Ngoc Luu, An Nguyen The et al.
LoRASculpt: Sculpting LoRA for Harmonizing General and Specialized Knowledge in Multimodal Large Language Models
Jian Liang, Wenke Huang, Guancheng Wan et al.
SLIP: Spoof-Aware One-Class Face Anti-Spoofing with Language Image Pretraining
Pei-Kai Huang, Jun-Xiong Chong, Cheng-Hsuan Chiang et al.
Advancing Mathematical Reasoning in Language Models: The Impact of Problem-Solving Data, Data Synthesis Methods, and Training Stages
Zui Chen, Tianqiao Liu, Tongqing et al.
LBM: Latent Bridge Matching for Fast Image-to-Image Translation
Clément Chadebec, Onur Tasar, Sanjeev Sreetharan et al.
LoRA3D: Low-Rank Self-Calibration of 3D Geometric Foundation models
Ziqi Lu, Heng Yang, Danfei Xu et al.
SpiritSight Agent: Advanced GUI Agent with One Look
Zhiyuan Huang, Ziming Cheng, Junting Pan et al.
Jailbreak Antidote: Runtime Safety-Utility Balance via Sparse Representation Adjustment in Large Language Models
Guobin Shen, Dongcheng Zhao, Yiting Dong et al.
One-for-More: Continual Diffusion Model for Anomaly Detection
Xiaofan Li, Xin Tan, Zhuo Chen et al.
VTDexManip: A Dataset and Benchmark for Visual-tactile Pretraining and Dexterous Manipulation with Reinforcement Learning
Qingtao Liu, Yu Cui, Zhengnan Sun et al.
MPTSNet: Integrating Multiscale Periodic Local Patterns and Global Dependencies for Multivariate Time Series Classification
Yang Mu, Muhammad Shahzad, Xiao Xiang Zhu
This Time is Different: An Observability Perspective on Time Series Foundation Models
Ben Cohen, Emaad Khwaja, Youssef Doubli et al.
MIDI-GPT: A Controllable Generative Model for Computer-Assisted Multitrack Music Composition
Philippe Pasquier, Jeff Ens, Nathan Fradet et al.
GaussianSpa: An “Optimizing-Sparsifying” Simplification Framework for Compact and High-Quality 3D Gaussian Splatting
Yangming Zhang, Wenqi Jia, Wei Niu et al.
NetMoE: Accelerating MoE Training through Dynamic Sample Placement
Xinyi Liu, Yujie Wang, Fangcheng Fu et al.
OmniCount: Multi-label Object Counting with Semantic-Geometric Priors
Anindya Mondal, Sauradip Nag, Xiatian Zhu et al.
DropoutGS: Dropping Out Gaussians for Better Sparse-view Rendering
Yexing Xu, Longguang Wang, Minglin Chen et al.
BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers
Hui Zhang, Tingwei Gao, Jie Shao et al.
DisasterM3: A Remote Sensing Vision-Language Dataset for Disaster Damage Assessment and Response
Junjue Wang, Weihao Xuan, Heli Qi et al.
Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition
Bozheng Li, Mushui Liu, Gaoang Wang et al.
Efficient Rectification of Neuro-Symbolic Reasoning Inconsistencies by Abductive Reflection
Wen-Chao Hu, Wang-Zhou Dai, Yuan Jiang et al.
On Linear Representations and Pretraining Data Frequency in Language Models
Jack Merullo, Noah Smith, Sarah Wiegreffe et al.
Bridging the Gap for Test-Time Multimodal Sentiment Analysis
Zirun Guo, Tao Jin, Wenlong Xu et al.
Solver-Informed RL: Grounding Large Language Models for Authentic Optimization Modeling
Yitian Chen, Jingfan Xia, Siyu Shao et al.
Envisioning Class Entity Reasoning by Large Language Models for Few-shot Learning
Mushui Liu, Fangtai Wu, Bozheng Li et al.
Data Synthesis with Diverse Styles for Face Recognition via 3DMM-Guided Diffusion
Yuxi Mi, Zhizhou Zhong, Yuge Huang et al.
Causal Composition Diffusion Model for Closed-loop Traffic Generation
Haohong Lin, Xin Huang, Tung Phan-Minh et al.
Battling the Non-stationarity in Time Series Forecasting via Test-time Adaptation
HyunGi Kim, Siwon Kim, Jisoo Mok et al.
VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers
Juncan Deng, Shuaiting Li, Zeyu Wang et al.
Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance
Dimitris Oikonomou, Nicolas Loizou
GeoLLaVA-8K: Scaling Remote-Sensing Multimodal Large Language Models to 8K Resolution
Fengxiang Wang, Mingshuo Chen, Yueying Li et al.
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step
Hanyang Wang, Fangfu Liu, Jiawei Chi et al.
Conformal Thresholded Intervals for Efficient Regression
Rui Luo, Zhixin Zhou
DG-Mamba: Robust and Efficient Dynamic Graph Structure Learning with Selective State Space Models
Haonan Yuan, Qingyun Sun, Zhaonan Wang et al.
Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models
Jinho Jeong, Sangmin Han, Jinwoo Kim et al.
Broadening Target Distributions for Accelerated Diffusion Models via a Novel Analysis Approach
Yuchen Liang, Peizhong Ju, Yingbin Liang et al.
TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation
Gihyun Kwon, Jong Chul YE
Taming Video Diffusion Prior with Scene-Grounding Guidance for 3D Gaussian Splatting from Sparse Inputs
Yingji Zhong, Zhihao Li, Dave Zhenyu Chen et al.
STD-PLM: Understanding Both Spatial and Temporal Properties of Spatial-Temporal Data with PLM
Yiheng Huang, Xiaowei Mao, Shengnan Guo et al.
FSFM: A Generalizable Face Security Foundation Model via Self-Supervised Facial Representation Learning
Gaojian Wang, Feng Lin, Tong Wu et al.
Audio-Visual Instance Segmentation
Ruohao Guo, Xianghua Ying, Yaru Chen et al.
Once-for-All: Controllable Generative Image Compression with Dynamic Granularity Adaptation
Anqi Li, Feng Li, Yuxi Liu et al.
Transformer-Squared: Self-adaptive LLMs
Qi Sun, Edoardo Cetin, Yujin Tang
Lifting Motion to the 3D World via 2D Diffusion
Jiaman Li, Karen Liu, Jiajun Wu
OmniStyle: Filtering High Quality Style Transfer Data at Scale
Ye Wang, Ruiqi Liu, Jiang Lin et al.
Context Steering: Controllable Personalization at Inference Time
Zhiyang He, Sashrika Pandey, Mariah Schrum et al.
Differentiable Optimization of Similarity Scores Between Models and Brains
Nathan Cloos, Moufan Li, Markus Siegel et al.
CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance
Yongchao Chen, Yilun Hao, Yueying Liu et al.
Bridging the User-side Knowledge Gap in Knowledge-aware Recommendations with Large Language Models
Zheng Hu, Zhe Li, Ziyun Jiao et al.
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
Jiange Yang, Haoyi Zhu, Yating Wang et al.
TIGER: Time-frequency Interleaved Gain Extraction and Reconstruction for Efficient Speech Separation
Mohan Xu, Kai Li, Guo Chen et al.
RoboTron-Drive: All-in-One Large Multimodal Model for Autonomous Driving
Zhijian Huang, Chengjian Feng, Baihui Xiao et al.
WildSAT: Learning Satellite Image Representations from Wildlife Observations
Rangel Daroya, Elijah Cole, Oisin Mac Aodha et al.
SuperDec: 3D Scene Decomposition with Superquadrics Primitives
Elisabetta Fedele, Boyang Sun, Francis Engelmann et al.
General Scene Adaptation for Vision-and-Language Navigation
Haodong Hong, Yanyuan Qiao, Sen Wang et al.
CADDreamer: CAD Object Generation from Single-view Images
Yuan Li, Cheng Lin, Yuan Liu et al.
Light-T2M: A Lightweight and Fast Model for Text-to-motion Generation
Ling-An Zeng, Guohong Huang, Gaojie Wu et al.
Model-based RL as a Minimalist Approach to Horizon-Free and Second-Order Bounds
Zhiyong Wang, Dongruo Zhou, John C.S. Lui et al.
Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data
Seiji Maekawa, Hayate Iso, Nikita Bhutani
Flexible Frame Selection for Efficient Video Reasoning
Shyamal Buch, Arsha Nagrani, Anurag Arnab et al.
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding
Wenhui Liao, Jiapeng Wang, Hongliang Li et al.
FedMIA: An Effective Membership Inference Attack Exploiting "All for One" Principle in Federated Learning
Gongxi Zhu, Donghao Li, Hanlin Gu et al.
Sports-Traj: A Unified Trajectory Generation Model for Multi-Agent Movement in Sports
Yi Xu, Yun Fu
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling
Tsung-Han (Patrick) Wu, Heekyung Lee, Jiaxin Ge et al.
Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model
Keda TAO, Jinjin Gu, Yulun Zhang et al.
Unlearn and Burn: Adversarial Machine Unlearning Requests Destroy Model Accuracy
Yangsibo Huang, Daogao Liu, Lynn Chua et al.
Backdoor Attacks Against No-Reference Image Quality Assessment Models via a Scalable Trigger
Yi Yu, Song Xia, Xun Lin et al.
Training-free LLM-generated Text Detection by Mining Token Probability Sequences
Yihuai Xu, Yongwei Wang, YIFEI BI et al.
Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise
Brayan Monroy, Jorge Bacca, Julián Tachella
SSLAM: Enhancing Self-Supervised Models with Audio Mixtures for Polyphonic Soundscapes
Tony Alex, Sara Atito, Armin Mustafa et al.
Reconstructing People, Places, and Cameras
Lea Müller, Hongsuk Choi, Anthony Zhang et al.
Taylor Series-Inspired Local Structure Fitting Network for Few-shot Point Cloud Semantic Segmentation
Changshuo Wang, Shuting He, Xiang Fang et al.
Efficient Attention-Sharing Information Distillation Transformer for Lightweight Single Image Super-Resolution
Karam Park, Jae Woong Soh, Nam Ik Cho
Noise Stability Optimization for Finding Flat Minima: A Hessian-based Regularization Approach
Haotian Ju, Hongyang Zhang, Dongyue Li
InsightEdit: Towards Better Instruction Following for Image Editing
Yingjing Xu, Jie Kong, Jiazhi Wang et al.
ReCap: Better Gaussian Relighting with Cross-Environment Captures
Jingzhi Li, Zongwei Wu, Eduard Zamfir et al.
On the Crucial Role of Initialization for Matrix Factorization
Bingcong Li, Liang Zhang, Aryan Mokhtari et al.
Gazing Into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities
Michele Mazzamuto, Antonino Furnari, Yoichi Sato et al.
Enhancing Multilingual LLM Pretraining with Model-Based Data Selection
Bettina Messmer, Vinko Sabolčec, Martin Jaggi
CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models
Hao He, Ceyuan Yang, Shanchuan Lin et al.
Beyond Walking: A Large-Scale Image-Text Benchmark for Text-based Person Anomaly Search
Shuyu Yang, Yaxiong Wang, Li Zhu et al.
EmotiCrafter: Text-to-Emotional-Image Generation based on Valence-Arousal Model
Shengqi Dang, Yi He, Long Ling et al.
ASAP: Advancing Semantic Alignment Promotes Multi-Modal Manipulation Detecting and Grounding
Zhenxing Zhang, Yaxiong Wang, Lechao Cheng et al.
MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-text Decoding
Weikang Qiu, Zheng Huang, Haoyu Hu et al.
Fast training and sampling of Restricted Boltzmann Machines
Nicolas BEREUX, Aurélien Decelle, Cyril Furtlehner et al.
Open-World Reinforcement Learning over Long Short-Term Imagination
Jiajian Li, Qi Wang, Yunbo Wang et al.
Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model
Zhaochong An, Guolei Sun, Yun Liu et al.
Attention layers provably solve single-location regression
Pierre Marion, Raphaël Berthier, Gérard Biau et al.
Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation
Yuxuan Wang, Xuanyu Yi, Haohan Weng et al.
Efficient Gaussian Splatting for Monocular Dynamic Scene Rendering via Sparse Time-Variant Attribute Modeling
Hanyang Kong, Xingyi Yang, Xinchao Wang
More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness
Aaron J. Li, Satyapriya Krishna, Hima Lakkaraju
Diff2Flow: Training Flow Matching Models via Diffusion Model Alignment
Johannes Schusterbauer, Ming Gui, Frank Fundel et al.
Graph Neural Preconditioners for Iterative Solutions of Sparse Linear Systems
Jie Chen
From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit
Valérie Costa, Thomas Fel, Ekdeep S Lubana et al.
MolParser: End-to-end Visual Recognition of Molecule Structures in the Wild
Xi Fang, Jiankun Wang, Xiaochen Cai et al.
Lightweight Neural App Control
Filippos Christianos, Georgios Papoudakis, Thomas Coste et al.
The 3D-PC: a benchmark for visual perspective taking in humans and machines
Drew Linsley, Peisen Zhou, Alekh Ashok et al.
MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL
Claas Voelcker, Marcel Hussing, ERIC EATON et al.
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data
Yiren Song, Cheng Liu, Mike Zheng Shou
AlphaPO: Reward Shape Matters for LLM Alignment
Aman Gupta, Shao Tang, Qingquan Song et al.
AdaRankGrad: Adaptive Gradient Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning
Yehonathan Refael, Jonathan Svirsky, Boris Shustin et al.
Constrain Alignment with Sparse Autoencoders
Qingyu Yin, Chak Tou Leong, Hongbo Zhang et al.
STOP: Integrated Spatial-Temporal Dynamic Prompting for Video Understanding
Zichen Liu, Kunlun Xu, Bing Su et al.
Atlas Gaussians Diffusion for 3D Generation
Haitao Yang, Yuan Dong, Hanwen Jiang et al.
A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
Thomas Schmied, Thomas Adler, Vihang Patil et al.
A Recipe for Generating 3D Worlds from a Single Image
Katja Schwarz, Denis Rozumny, Samuel Rota Bulò et al.
Are They the Same? Exploring Visual Correspondence Shortcomings of Multimodal LLMs
Yikang Zhou, Tao Zhang, Shilin Xu et al.
VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation
Wenhao Wang, Yi Yang
Consistency Checks for Language Model Forecasters
Daniel Paleka, Abhimanyu Pallavi Sudhir, Alejandro Alvarez et al.