Most Cited 2025 "squared error loss" Papers
22,274 papers found • Page 10 of 112
Conference
Kernel-Aware Graph Prompt Learning for Few-Shot Anomaly Detection
Fenfang Tao, Guo-Sen Xie, Fang Zhao et al.
Retrieval Augmented Time Series Forecasting
Sungwon Han, Seungeon Lee, MEEYOUNG CHA et al.
DiffMS: Diffusion Generation of Molecules Conditioned on Mass Spectra
Montgomery Bohde, Mrunali Manjrekar, Runzhong Wang et al.
Discrete to Continuous: Generating Smooth Transition Poses from Sign Language Observations
Shengeng Tang, Jiayi He, Lechao Cheng et al.
PACT: Pruning and Clustering-Based Token Reduction for Faster Visual Language Models
Dhouib Mohamed, Davide Buscaldi, Vanier Sonia et al.
OmniPaint: Mastering Object-Oriented Editing via Disentangled Insertion-Removal Inpainting
Yongsheng Yu, Ziyun Zeng, Haitian Zheng et al.
ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems
Xiangyuan Xue, Zeyu Lu, Di Huang et al.
Improved Bounds for Online Facility Location with Predictions
Dimitris Fotakis, Evangelia Gergatsouli, Themistoklis Gouleakis et al.
An Empirical Study of Autoregressive Pre-training from Videos
Jathushan Rajasegaran, Ilija Radosavovic, Rahul Ravishankar et al.
PuzzleFusion++: Auto-agglomerative 3D Fracture Assembly by Denoise and Verify
Zhengqing Wang, Jiacheng Chen, Yasutaka Furukawa
ILIAS: Instance-Level Image retrieval At Scale
Giorgos Kordopatis-Zilos, Vladan Stojnić, Anna Manko et al.
LoRA Done RITE: Robust Invariant Transformation Equilibration for LoRA Optimization
Jui-Nan Yen, Si Si, Zhao Meng et al.
From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications
Ajay Jaiswal, Yifan Wang, Lu Yin et al.
FD2-Net: Frequency-Driven Feature Decomposition Network for Infrared-Visible Object Detection
Ke Li, Di Wang, Zhangyuan Hu et al.
Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach
Jiancong Xiao, Bojian Hou, Zhanliang Wang et al.
Image-level Memorization Detection via Inversion-based Inference Perturbation
Yue Jiang, Haokun Lin, Yang Bai et al.
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
Tianjin Huang, Ziquan Zhu, Gaojie Jin et al.
S2Gaussian: Sparse-View Super-Resolution 3D Gaussian Splatting
Yecong Wan, Mingwen Shao, Yuanshuo Cheng et al.
BindGPT: A Scalable Framework for 3D Molecular Design via Language Modeling and Reinforcement Learning
Artem Zholus, Maksim Kuznetsov, Roman Schutski et al.
DRoC: Elevating Large Language Models for Complex Vehicle Routing via Decomposed Retrieval of Constraints
Xia Jiang, Yaoxin Wu, Chenhao Zhang et al.
Spiking Vision Transformer with Saccadic Attention
Shuai Wang, Malu Zhang, Dehao Zhang et al.
UNSURE: self-supervised learning with Unknown Noise level and Stein's Unbiased Risk Estimate
Julián Tachella, Mike Davies, Laurent Jacques
The Pitfalls of Memorization: When Memorization Hurts Generalization
Reza Bayat, Mohammad Pezeshki, Elvis Dohmatob et al.
Wasserstein Flow Matching: Generative Modeling Over Families of Distributions
Doron Haviv, Aram-Alexandre Pooladian, Dana Pe'er et al.
RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers
Yan Gong, Yiren Song, Yicheng Li et al.
GaussianFlowOcc: Sparse and Weakly Supervised Occupancy Estimation using Gaussian Splatting and Temporal Flow
Simon Boeder, Fabian Gigengack, Benjamin Risse
VMBench: A Benchmark for Perception-Aligned Video Motion Generation
Xinran Ling, Chen Zhu, Meiqi Wu et al.
BillBoard Splatting (BBSplat): Learnable Textured Primitives for Novel View Synthesis
David Svitov, Pietro Morerio, Lourdes Agapito et al.
Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs
Lei Zhang, Yunshui Li, Jiaming Li et al.
Logically Consistent Language Models via Neuro-Symbolic Integration
Diego Calanzone, Stefano Teso, Antonio Vergari
Scaling Properties of Diffusion Models For Perceptual Tasks
Rahul Ravishankar, Zeeshan Patel, Jathushan Rajasegaran et al.
Black-Box Detection of Language Model Watermarks
Thibaud Gloaguen, Nikola Jovanović, Robin Staab et al.
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
Haifeng Huang, Xinyi Chen, Yilun Chen et al.
Security Attacks on LLM-based Code Completion Tools
Wen Cheng, Ke Sun, Xinyu Zhang et al.
VLM4D: Towards Spatiotemporal Awareness in Vision Language Models
Shijie Zhou, Alexander Vilesov, Xuehai He et al.
FrugalNeRF: Fast Convergence for Extreme Few-shot Novel View Synthesis without Learned Priors
Chin-Yang Lin, Chung-Ho Wu, Changhan Yeh et al.
Breaking the Low-Rank Dilemma of Linear Attention
Qihang Fan, Huaibo Huang, Ran He
Thermalizer: Stable autoregressive neural emulation of spatiotemporal chaos
Chris Pedersen, Laure Zanna, Joan Bruna
IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations
Zhibing Li, Tong Wu, Jing Tan et al.
Any6D: Model-free 6D Pose Estimation of Novel Object
Taeyeop Lee, Bowen Wen, Minjun Kang et al.
F-Fidelity: A Robust Framework for Faithfulness Evaluation of Explainable AI
Xu Zheng, Farhad Shirani, Zhuomin Chen et al.
Efficient Learning with Sine-Activated Low-Rank Matrices
Yiping Ji, Hemanth Saratchandran, Cameron Gordon et al.
Logical Consistency of Large Language Models in Fact-Checking
Bishwamittra Ghosh, Sarah Hasan, Naheed Anjum Arafat et al.
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models
Dewei Zhou, Mingwei Li, Zongxin Yang et al.
Dual Data Alignment Makes AI-Generated Image Detector Easier Generalizable
Ruoxin Chen, Junwei Xi, Zhiyuan Yan et al.
Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
Yuxian Gu, Qinghao Hu, Haocheng Xi et al.
AgentAuditor: Human-level Safety and Security Evaluation for LLM Agents
Hanjun Luo, Shenyu Dai, Chiming Ni et al.
Multi-Domain Graph Foundation Models: Robust Knowledge Transfer via Topology Alignment
Shuo Wang, Bokui Wang, Zhixiang Shen et al.
AutoToM: Scaling Model-based Mental Inference via Automated Agent Modeling
Zhining Zhang, Chuanyang Jin, Mung Yao Jia et al.
AllTracker: Efficient Dense Point Tracking at High Resolution
Adam Harley, Yang You, Yang Zheng et al.
Video-T1: Test-time Scaling for Video Generation
Fangfu Liu, Hanyang Wang, Yimo Cai et al.
The VLLM Safety Paradox: Dual Ease in Jailbreak Attack and Defense
Yangyang Guo, Fangkai Jiao, Liqiang Nie et al.
FreeTimeGS: Free Gaussian Primitives at Anytime Anywhere for Dynamic Scene Reconstruction
Yifan Wang, Peishan Yang, Zhen Xu et al.
Speeding Up the NSGA-II with a Simple Tie-Breaking Rule
Benjamin Doerr, Tudor Ivan, Martin S. Krejca
Diffusion Models are Evolutionary Algorithms
Yanbo Zhang, Benedikt Hartl, Hananel Hazan et al.
AGENTIF: Benchmarking Large Language Models Instruction Following Ability in Agentic Scenarios
Yunjia Qi, Hao Peng, Xiaozhi Wang et al.
LSNet: See Large, Focus Small
Ao Wang, Hui Chen, Zijia Lin et al.
Systematic Outliers in Large Language Models
Yongqi An, Xu Zhao, Tao Yu et al.
Is Artificial Intelligence Generated Image Detection a Solved Problem?
Ziqiang Li, Jiazhen Yan, Ziwen He et al.
Stochastic Deep Restoration Priors for Imaging Inverse Problems
Yuyang Hu, Albert Peng, Weijie Gan et al.
Can In-context Learning Really Generalize to Out-of-distribution Tasks?
Qixun Wang, Yifei Wang, Xianghua Ying et al.
RocketEval: Efficient automated LLM evaluation via grading checklist
Tianjun Wei, Wei Wen, Ruizhi Qiao et al.
Transformers Struggle to Learn to Search
Abulhair Saparov, Srushti Ajay Pawar, Shreyas Pimpalgaonkar et al.
VLRMBench: A Comprehensive and Challenging Benchmark for Vision-Language Reward Models
JIACHENG RUAN, Wenzhen Yuan, Xian Gao et al.
Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding
Wei Suo, Lijun Zhang, Mengyang Sun et al.
3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds
Hengshuo Chu, Xiang Deng, Qi Lv et al.
Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval
Sheryl Hsu, Omar Khattab, Chelsea Finn et al.
Sylber: Syllabic Embedding Representation of Speech from Raw Audio
Cheol Jun Cho, Nicholas Lee, Akshat Gupta et al.
DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation
Changdae Oh, Yixuan Li, Kyungwoo Song et al.
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion
Jiuhai Chen, Jianwei Yang, Haiping Wu et al.
RoboScape: Physics-informed Embodied World Model
Yu Shang, Xin Zhang, Yinzhou Tang et al.
SpectroMotion: Dynamic 3D Reconstruction of Specular Scenes
Cheng-De Fan, Chen-Wei Chang, Yi-Ruei Liu et al.
AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration
Andy Zhou, Kevin Wu, Francesco Pinto et al.
Bridging the Data Provenance Gap Across Text, Speech, and Video
Shayne Longpre, Nikhil Singh, Manuel Cherep et al.
Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
Yiran Guo, Lijie Xu, Jie Liu et al.
TimeKAN: KAN-based Frequency Decomposition Learning Architecture for Long-term Time Series Forecasting
Songtao Huang, Zhen Zhao, Can Li et al.
V2M4: 4D Mesh Animation Reconstruction from a Single Monocular Video
Jianqi Chen, Biao Zhang, Xiangjun Tang et al.
Debate or Vote: Which Yields Better Decisions in Multi-Agent Large Language Models?
Hyeong Kyu Choi, Jerry Zhu, Sharon Li
X-Dyna: Expressive Dynamic Human Image Animation
Di Chang, Hongyi Xu, You Xie et al.
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References
Qiyuan Zhang, Yufei Wang, Tiezheng YU et al.
Active Data Curation Effectively Distills Large-Scale Multimodal Models
Vishaal Udandarao, Nikhil Parthasarathy, Muhammad Ferjad Naeem et al.
Needle In A Video Haystack: A Scalable Synthetic Evaluator for Video MLLMs
Zijia Zhao, Haoyu Lu, Yuqi Huo et al.
REArtGS: Reconstructing and Generating Articulated Objects via 3D Gaussian Splatting with Geometric and Motion Constraints
Di Wu, Liu Liu, Zhou Linli et al.
DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling
Xin Xie, Dong Gong
PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs
Oskar van der Wal, Pietro Lesci, Max Müller-Eberstein et al.
JamMa: Ultra-lightweight Local Feature Matching with Joint Mamba
Xiaoyong Lu, Songlin Du
Regressing the Relative Future: Efficient Policy Optimization for Multi-turn RLHF
Zhaolin Gao, Wenhao Zhan, Jonathan Chang et al.
Optimal transport-based conformal prediction
Gauthier Thurin, Kimia Nadjahi, Claire Boyer
STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis
Jiatao Gu, Tianrong Chen, David Berthelot et al.
CircuitFusion: Multimodal Circuit Representation Learning for Agile Chip Design
Wenji Fang, Shang Liu, Jing Wang et al.
Physics-Constrained Flow Matching: Sampling Generative Models with Hard Constraints
Utkarsh Utkarsh, Pengfei Cai, Alan Edelman et al.
AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks
Zekang Yang, Wang Zeng, Sheng Jin et al.
Mechanistic Permutability: Match Features Across Layers
Nikita Balagansky, Ian Maksimov, Daniil Gavrilov
Language Guided Skill Discovery
Seungeun Rho, Laura Smith, Tianyu Li et al.
LightPROF: A Lightweight Reasoning Framework for Large Language Model on Knowledge Graph
Tu Ao, Yanhua Yu, Yuling Wang et al.
FunBO: Discovering Acquisition Functions for Bayesian Optimization with FunSearch
Virginia Aglietti, Ira Ktena, Jessica Schrouff et al.
Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval
Yuanmin Tang, Jing Yu, Keke Gai et al.
CofCA: A STEP-WISE Counterfactual Multi-hop QA benchmark
Jian Wu, Linyi Yang, Zhen Wang et al.
MovieBench: A Hierarchical Movie Level Dataset for Long Video Generation
Weijia Wu, Mingyu Liu, Zeyu Zhu et al.
UniPose: A Unified Multimodal Framework for Human Pose Comprehension, Generation and Editing
Yiheng Li, RuiBing Hou, Hong Chang et al.
Unleashing Vecset Diffusion Model for Fast Shape Generation
Zeqiang Lai, Zhao Yunfei, Zibo Zhao et al.
IDProtector: An Adversarial Noise Encoder to Protect Against ID-Preserving Image Generation
Yiren Song, Pei Yang, Hai Ci et al.
4DGT: Learning a 4D Gaussian Transformer Using Real-World Monocular Videos
Zhen Xu, Zhengqin Li, Zhao Dong et al.
Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM
Han Wang, Yuxiang Nie, Yongjie Ye et al.
FineLIP: Extending CLIP’s Reach via Fine-Grained Alignment with Longer Text Inputs
Mothilal Asokan, Kebin wu, Fatima Albreiki
Federated Unlearning with Gradient Descent and Conflict Mitigation
Zibin Pan, Zhichao Wang, Chi Li et al.
MMRole: A Comprehensive Framework for Developing and Evaluating Multimodal Role-Playing Agents
Yanqi Dai, Huanran Hu, Lei Wang et al.
Knowledge Editing with Dynamic Knowledge Graphs for Multi-Hop Question Answering
Yifan Lu, Yigeng Zhou, Jing Li et al.
Ready-to-React: Online Reaction Policy for Two-Character Interaction Generation
Zhi Cen, Huaijin Pi, Sida Peng et al.
Refine Knowledge of Large Language Models via Adaptive Contrastive Learning
Yinghui Li, Haojing Huang, Jiayi Kuang et al.
HiLo: A Learning Framework for Generalized Category Discovery Robust to Domain Shifts
Hongjun Wang, Sagar Vaze, Kai Han
MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research
James Burgess, Jeffrey J Nirschl, Laura Bravo-Sánchez et al.
Know "No" Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP
Junsung Park, Jungbeom Lee, Jongyoon Song et al.
Mitigate the Gap: Improving Cross-Modal Alignment in CLIP
Sedigheh Eslami, Gerard de Melo
Robust Function-Calling for On-Device Language Model via Function Masking
Qiqiang Lin, Muning Wen, Qiuying Peng et al.
A Second-Order Perspective on Model Compositionality and Incremental Learning
Angelo Porrello, Lorenzo Bonicelli, Pietro Buzzega et al.
The Power of LLM-Generated Synthetic Data for Stance Detection in Online Political Discussions
Stefan Sylvius Wagner, Maike Behrendt, Marc Ziegele et al.
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao, Chenlu Ye, Quanquan Gu et al.
Omnia de EgoTempo: Benchmarking Temporal Understanding of Multi-Modal LLMs in Egocentric Videos
Chiara Plizzari, Alessio Tonioni, Yongqin Xian et al.
Presto! Distilling Steps and Layers for Accelerating Music Generation
Zachary Novack, Ge Zhu, Jonah Casebeer et al.
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
Tiehan Fan, Kepan Nan, Rui Xie et al.
Docopilot: Improving Multimodal Models for Document-Level Understanding
Yuchen Duan, Zhe Chen, Yusong Hu et al.
Open Vision Reasoner: Transferring Linguistic Cognitive Behavior for Visual Reasoning
Yana Wei, Liang Zhao, Jianjian Sun et al.
Unified Parameter-Efficient Unlearning for LLMs
Chenlu Ding, Jiancan Wu, Yancheng Yuan et al.
Geolocation Representation from Large Language Models Are Generic Enhancers for Spatio-Temporal Learning
Junlin He, Tong Nie, Wei Ma
Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning
Zenan Li, Zhaoyu Li, Wen Tang et al.
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Julie Kallini, Shikhar Murty, Christopher Manning et al.
Beyond Canonicalization: How Tensorial Messages Improve Equivariant Message Passing
Peter Lippmann, Gerrit Gerhartz, Roman Remme et al.
Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models
Shicheng Xu, Liang Pang, Yunchang Zhu et al.
Transformers Can Learn Temporal Difference Methods for In-Context Reinforcement Learning
Jiuqi Wang, Ethan Blaser, Hadi Daneshmand et al.
Lost in Translation, Found in Context: Sign Language Translation with Contextual Cues
Youngjoon Jang, Haran Raajesh, Liliane Momeni et al.
Towards a Theoretical Understanding of Synthetic Data in LLM Post-Training: A Reverse-Bottleneck Perspective
Zeyu Gan, Yong Liu
SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models
Zilan Wang, Junfeng Guo, Jiacheng Zhu et al.
Inference-Time Hyper-Scaling with KV Cache Compression
Adrian Łańcucki, Konrad Staniszewski, Piotr Nawrot et al.
Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks
Hongyuan Tao, Ying Zhang, Zhenhao Tang et al.
Joint Velocity-Growth Flow Matching for Single-Cell Dynamics Modeling
Dongyi Wang, Yuanwei Jiang, Zhenyi Zhang et al.
Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization
Zichen Miao, Zhengyuan Yang, Kevin Lin et al.
Quantized Spike-driven Transformer
Xuerui Qiu, Malu Zhang, Jieyuan Zhang et al.
Semantic Convergence: Harmonizing Recommender Systems via Two-Stage Alignment and Behavioral Semantic Tokenization
Guanghan Li, Xun Zhang, Yufei Zhang et al.
Vision-Language Gradient Descent-driven All-in-One Deep Unfolding Networks
Haijin Zeng, Xiangming Wang, Yongyong Chen et al.
Efficient Automated Circuit Discovery in Transformers using Contextual Decomposition
Aliyah Hsu, Georgia Zhou, Yeshwanth Cherapanamjeri et al.
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation
Ziyang Luo, Haoning Wu, Dongxu Li et al.
Weak-to-Strong Preference Optimization: Stealing Reward from Weak Aligned Model
Wenhong Zhu, Zhiwei He, Xiaofeng Wang et al.
BingoGuard: LLM Content Moderation Tools with Risk Levels
Fan Yin, Philippe Laban, XIANGYU PENG et al.
Scalable Discrete Diffusion Samplers: Combinatorial Optimization and Statistical Physics
Sebastian Sanokowski, Wilhelm Berghammer, Haoyu Wang et al.
Video Anomaly Detection with Motion and Appearance Guided Patch Diffusion Model
Hang Zhou, Jiale Cai, Yuteng Ye et al.
Low-Light Image Enhancement via Generative Perceptual Priors
Han Zhou, Wei Dong, Xiaohong Liu et al.
ThinkBench: Dynamic Out-of-Distribution Evaluation for Robust LLM Reasoning
Shulin Huang, Linyi Yang, Yan Song et al.
Optimizing Temperature for Language Models with Multi-Sample Inference
Weihua Du, Yiming Yang, Sean Welleck
Optimization with Access to Auxiliary Information
EL MAHDI CHAYTI, Sai Karimireddy
KVTuner: Sensitivity-Aware Layer-Wise Mixed-Precision KV Cache Quantization for Efficient and Nearly Lossless LLM Inference
Xing Li, Zeyu Xing, Yiming Li et al.
Universal Cross-Tokenizer Distillation via Approximate Likelihood Matching
Benjamin Minixhofer, Ivan Vulić, Edoardo Maria Ponti
Block-Attention for Efficient Prefilling
Dongyang Ma, Yan Wang, Tian Lan
Quantization without Tears
Minghao Fu, Hao Yu, Jie Shao et al.
Pippo: High-Resolution Multi-View Humans from a Single Image
Yash Kant, Ethan Weber, Jin Kyu Kim et al.
CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images
Chen Cheng, Jiacheng Wei, Tianrun Chen et al.
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference
Xiang Liu, Zhenheng Tang, Peijie Dong et al.
QiMeng-CodeV-R1: Reasoning-Enhanced Verilog Generation
Yaoyu Zhu, Di Huang, Hanqi Lyu et al.
Explore In-Context Segmentation via Latent Diffusion Models
Chaoyang Wang, Xiangtai Li, Henghui Ding et al.
OpenUnlearning: Accelerating LLM Unlearning via Unified Benchmarking of Methods and Metrics
Vineeth Dorna, Anmol Mekala, Wenlong Zhao et al.
LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models
Lukas Helff, Felix Friedrich, Manuel Brack et al.
Rethinking End-to-End 2D to 3D Scene Segmentation in Gaussian Splatting
Runsong Zhu, Shi Qiu, ZHENGZHE LIU et al.
Can Transformers Learn Full Bayesian Inference in Context?
Arik Reuter, Tim G. J. Rudner, Vincent Fortuin et al.
Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation
Slava Elizarov, Ciara Rowles, Simon Donné
SAIST: Segment Any Infrared Small Target Model Guided by Contrastive Language-Image Pretraining
Mingjin Zhang, Xiaolong Li, Fei Gao et al.
Reversible Decoupling Network for Single Image Reflection Removal
Hao Zhao, Mingjia Li, Qiming Hu et al.
GigaHands: A Massive Annotated Dataset of Bimanual Hand Activities
Rao Fu, Dingxi Zhang, Alex Jiang et al.
RelGNN: Composite Message Passing for Relational Deep Learning
Tianlang Chen, Charilaos Kanatsoulis, Jure Leskovec
Weak-to-Strong Generalization Through the Data-Centric Lens
Changho Shin, John Cooper, Frederic Sala
UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface
Hao Tang, Chen-Wei Xie, Haiyang Wang et al.
Lux Post Facto: Learning Portrait Performance Relighting with Conditional Video Diffusion and a Hybrid Dataset
Yiqun Mei, Mingming He, Li Ma et al.
FaceShot: Bring Any Character into Life
Junyao Gao, Yanan Sun, Fei Shen et al.
Video Diffusion Models Are Strong Video Inpainter
Minhyeok Lee, Suhwan Cho, Chajin Shin et al.
SwS: Self-aware Weakness-driven Problem Synthesis in Reinforcement Learning for LLM Reasoning
Xiao Liang, Zhong-Zhi Li, Yeyun Gong et al.
Mixture of Parrots: Experts improve memorization more than reasoning
Samy Jelassi, Clara Mohri, David Brandfonbrener et al.
AerialMegaDepth: Learning Aerial-Ground Reconstruction and View Synthesis
Khiem Vuong, Anurag Ghosh, Deva Ramanan et al.
E-Valuating Classifier Two-Sample Tests
Tim Bakker, Christian A. Naesseth, Patrick Forré et al.
HGSFusion: Radar-Camera Fusion with Hybrid Generation and Synchronization for 3D Object Detection
Zijian Gu, Jianwei Ma, Yan Huang et al.
Generating Multi-Image Synthetic Data for Text-to-Image Customization
Nupur Kumari, Xi Yin, Jun-Yan Zhu et al.
Bridging Modalities: Improving Universal Multimodal Retrieval by Multimodal Large Language Models
Xin Zhang, Yanzhao Zhang, Wen Xie et al.
Deep Distributed Optimization for Large-Scale Quadratic Programming
Augustinos Saravanos, Hunter Kuperman, Alex Oshin et al.
DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing
William June Suk Choi, Kyungmin Lee, Jongheon Jeong et al.
Online Reasoning Video Segmentation with Just-in-Time Digital Twins
Yiqing Shen, Bohan Liu, Chenjia Li et al.
Learning Efficient Positional Encodings with Graph Neural Networks
Charilaos Kanatsoulis, Evelyn Choi, Stefanie Jegelka et al.
ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement
XIANGYU PENG, Congying Xia, Xinyi Yang et al.
Multi-Turn Jailbreaking Large Language Models via Attention Shifting
Xiaohu Du, Fan Mo, Ming Wen et al.
Position: Don't Use the CLT in LLM Evals With Fewer Than a Few Hundred Datapoints
Sam Bowyer, Laurence Aitchison, Desi Ivanova
Endless Jailbreaks with Bijection Learning
Brian R.Y. Huang, Max Li, Leonard Tang
Layout-your-3D: Controllable and Precise 3D Generation with 2D Blueprint
Junwei Zhou, Xueting Li, Lu Qi et al.
NEST: A Neuromodulated Small-world Hypergraph Trajectory Prediction Model for Autonomous Driving
Chengyue Wang, Haicheng Liao, Bonan Wang et al.
MambaIC: State Space Models for High-Performance Learned Image Compression
Fanhu Zeng, Hao Tang, Yihua Shao et al.
Probabilistic Language-Image Pre-Training
Sanghyuk Chun, Wonjae Kim, Song Park et al.
Improved Regret Bounds for Linear Adversarial MDPs via Linear Optimization
XiangCheng Zhang, Fang Kong, Baoxiang Wang et al.
Provably Accurate Shapley Value Estimation via Leverage Score Sampling
Christopher Musco, R. Teal Witter
Retrieval-Augmented Dynamic Prompt Tuning for Incomplete Multimodal Learning
Jian Lang, Zhangtao Cheng, Ting Zhong et al.
NFIG: Multi-Scale Autoregressive Image Generation via Frequency Ordering
Zhihao Huang, Xi Qiu, Yukuo Ma et al.
Retrieving Semantics from the Deep: an RAG Solution for Gesture Synthesis
M. Hamza Mughal, Rishabh Dabral, Merel CJ Scholman et al.
Diffusion on Language Model Encodings for Protein Sequence Generation
Viacheslav Meshchaninov, Pavel Strashnov, Andrey Shevtsov et al.
Uni4D: Unifying Visual Foundation Models for 4D Modeling from a Single Video
David Yifan Yao, Albert J. Zhai, Shenlong Wang
Assessing and Learning Alignment of Unimodal Vision and Language Models
Le Zhang, Qian Yang, Aishwarya Agrawal