Most Cited 2025 "3d all-atom models" Papers
Unleashing Vecset Diffusion Model for Fast Shape Generation
Zeqiang Lai, Zhao Yunfei, Zibo Zhao et al.
Presto! Distilling Steps and Layers for Accelerating Music Generation
Zachary Novack, Ge Zhu, Jonah Casebeer et al.
The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise
Shuze Daniel Liu, Shuhang Chen, Shangtong Zhang
KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction
Jang-Hyun Kim, Jinuk Kim, Sangwoo Kwon et al.
Amplifier: Bringing Attention to Neglected Low-Energy Components in Time Series Forecasting
Jingru Fei, Kun Yi, Wei Fan et al.
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation
Guy Yariv, Yuval Kirstain, Amit Zohar et al.
HRAvatar: High-Quality and Relightable Gaussian Head Avatar
Dongbin Zhang, Yunfei Liu, Lijian Lin et al.
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
Jiarui Yao, Yifan Hao, Hanning Zhang et al.
Scenario Dreamer: Vectorized Latent Diffusion for Generating Driving Simulation Environments
Luke Rowe, Roger Girgis, Anthony Gosselin et al.
Change3D: Revisiting Change Detection and Captioning from A Video Modeling Perspective
Duowang Zhu, Xiaohu Huang, Haiyan Huang et al.
HarmAug: Effective Data Augmentation for Knowledge Distillation of Safety Guard Models
Seanie Lee, Haebin Seong, Dong Bok Lee et al.
CirT: Global Subseasonal-to-Seasonal Forecasting with Geometry-inspired Transformer
Yang Liu, Zinan Zheng, Jiashun Cheng et al.
Local Conditional Controlling for Text-to-Image Diffusion Models
Yibo Zhao, Liang Peng, Yang Yang et al.
AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models
Ziyin Zhou, Yunpeng Luo, Yuanchen Wu et al.
Multimodal Class-aware Semantic Enhancement Network for Audio-Visual Video Parsing
Pengcheng Zhao, Jinxing Zhou, Yang Zhao et al.
ATLAS: Autoformalizing Theorems through Lifting, Augmentation, and Synthesis of Data
Xiaoyang Liu, Kangjie Bao, Jiashuo Zhang et al.
PILAF: Optimal Human Preference Sampling for Reward Modeling
Yunzhen Feng, Ariel Kwiatkowski, Kunhao Zheng et al.
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
Runhui Huang, Xinpeng Ding, Chunwei Wang et al.
VisionArena: 230k Real World User-VLM Conversations with Preference Labels
Christopher Chou, Lisa Dunlap, Wei-Lin Chiang et al.
DVP-MVS: Synergize Depth-Edge and Visibility Prior for Multi-View Stereo
Zhenlong Yuan, Jinguo Luo, Fei Shen et al.
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs
Mohammad Mozaffari, Amir Yazdanbakhsh, Zhao Zhang et al.
Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation
Abdelrahman Eldesokey, Peter Wonka
Rethinking Training for De-biasing Text-to-Image Generation: Unlocking the Potential of Stable Diffusion
Eunji Kim, Siwon Kim, Minjun Park et al.
Measuring Non-Adversarial Reproduction of Training Data in Large Language Models
Michael Aerni, Javier Rando, Edoardo Debenedetti et al.
Adding Conditional Control to Diffusion Models with Reinforcement Learning
Yulai Zhao, Masatoshi Uehara, Gabriele Scalia et al.
Efficient Inference for Large Language Model-based Generative Recommendation
Xinyu Lin, Chaoqun Yang, Wenjie Wang et al.
Temporal Query Network for Efficient Multivariate Time Series Forecasting
Shengsheng Lin, Haojun Chen, Haijie Wu et al.
What Makes a Maze Look Like a Maze?
Joy Hsu, Jiayuan Mao, Joshua B Tenenbaum et al.
Contextual Bandits for Unbounded Context Distributions
Puning Zhao, Rongfei Fan, Shaowei Wang et al.
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
Makoto Shing, Kou Misaki, Han Bao et al.
Consistent and Controllable Image Animation with Motion Diffusion Models
Xin Ma, Yaohui Wang, Gengyun Jia et al.
Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning
Jiyuan Shi, Xinzhe Liu, Dewei Wang et al.
Event-based Video Super-Resolution via State Space Models
Zeyu Xiao, Xinchao Wang
Detecting High-Stakes Interactions with Activation Probes
Alex McKenzie, Urja Pawar, Phil Blandfort et al.
Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration
Hao Zhong, Muzhi Zhu, Zongze Du et al.
Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling
Guiyu Zhang, Huan-ang Gao, Zijian Jiang et al.
Bayesian scaling laws for in-context learning
Aryaman Arora, Dan Jurafsky, Christopher Potts et al.
Generative Photography: Scene-Consistent Camera Control for Realistic Text-to-Image Synthesis
Yu Yuan, Xijun Wang, Yichen Sheng et al.
RI-MAE: Rotation-Invariant Masked AutoEncoders for Self-Supervised Point Cloud Representation Learning
Kunming Su, Qiuxia Wu, Panpan Cai et al.
GS-LiDAR: Generating Realistic LiDAR Point Clouds with Panoramic Gaussian Splatting
Junzhe Jiang, Chun Gu, Yurui Chen et al.
Perception-as-Control: Fine-grained Controllable Image Animation with 3D-aware Motion Representation
Yingjie Chen, Yifang Men, Yuan Yao et al.
Sign-IDD: Iconicity Disentangled Diffusion for Sign Language Production
Shengeng Tang, Jiayi He, Dan Guo et al.
MemSim: A Bayesian Simulator for Evaluating Memory of LLM-based Personal Assistants
Zeyu Zhang, Quanyu Dai, Luyu Chen et al.
Referring to Any Person
Qing Jiang, Lin Wu, Zhaoyang Zeng et al.
Truncated Consistency Models
Sangyun Lee, Yilun Xu, Tomas Geffner et al.
MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models
Jingwei Xu, Junyu Lai, Yunpeng Huang
Eve: Efficient Multimodal Vision Language Models with Elastic Visual Experts
Miao Rang, Zhenni Bi, Chuanjian Liu et al.
SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks
Hwiwon Lee, Ziqi Zhang, Hanxiao Lu et al.
Classical Planning with LLM-Generated Heuristics: Challenging the State of the Art with Python Code
Augusto B. Corrêa, André G. Pereira, Jendrik Seipp
Dynamic Loss-Based Sample Reweighting for Improved Large Language Model Pretraining
Daouda Sow, Herbert Woisetschläger, Saikiran Bulusu et al.
MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
Yuncong Yang, Jiageng Liu, Zheyuan Zhang et al.
Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models
Fu-Yun Wang, Yunhao Shui, Jingtan Piao et al.
CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion
Yunlong Tang, Gen Zhan, Li Yang et al.
VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning
Xueqing Wu, Yuheng Ding, Bingxuan Li et al.
CoA-VLA: Improving Vision-Language-Action Models via Visual-Text Chain-of-Affordance
Jinming Li, Yichen Zhu, Zhibin Tang et al.
ShotAdapter: Text-to-Multi-Shot Video Generation with Diffusion Models
Ozgur Kara, Krishna Kumar Singh, Feng Liu et al.
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset
Yingzi Ma, Jiongxiao Wang, Fei Wang et al.
Ward: Provable RAG Dataset Inference via LLM Watermarks
Nikola Jovanović, Robin Staab, Maximilian Baader et al.
Causal Composition Diffusion Model for Closed-loop Traffic Generation
Haohong Lin, Xin Huang, Tung Phan-Minh et al.
UFM: A Simple Path towards Unified Dense Correspondence with Flow
Yuchen Zhang, Nikhil Keetha, Chenwei Lyu et al.
SALAD: Skeleton-aware Latent Diffusion for Text-driven Motion Generation and Editing
Seokhyeon Hong, Chaelin Kim, Serin Yoon et al.
An Engorgio Prompt Makes Large Language Model Babble on
Jianshuo Dong, Ziyuan Zhang, Qingjie Zhang et al.
Compression of 3D Gaussian Splatting with Optimized Feature Planes and Standard Video Codecs
Soonbin Lee, Fangwen Shu, Yago Sanchez de la Fuente et al.
UniNet: A Contrastive Learning-guided Unified Framework with Feature Selection for Anomaly Detection
Shun Wei, Jielin Jiang, Xiaolong Xu
Consistent Paths Lead to Truth: Self-Rewarding Reinforcement Learning for LLM Reasoning
Kongcheng Zhang, Qi Yao, Shunyu Liu et al.
How Contaminated Is Your Benchmark? Measuring Dataset Leakage in Large Language Models with Kernel Divergence
Hyeong Kyu Choi, Maxim Khanov, Hongxin Wei et al.
AWRaCLe: All-Weather Image Restoration Using Visual In-Context Learning
Sudarshan Rajagopalan, Vishal M. Patel
Prompt-Reverse Inconsistency: LLM Self-Inconsistency Beyond Generative Randomness and Prompt Paraphrasing
Jihyun Janice Ahn, Wenpeng Yin
BadVLA: Towards Backdoor Attacks on Vision-Language-Action Models via Objective-Decoupled Optimization
Xueyang Zhou, Guiyao Tie, Guowen Zhang et al.
MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning
Yaming Yang, Dilxat Muhtar, Yelong Shen et al.
On the Relationship Between Monotone and Squared Probabilistic Circuits
Benjie Wang, Guy Van den Broeck
CEB: Compositional Evaluation Benchmark for Fairness in Large Language Models
Song Wang, Peng Wang, Tong Zhou et al.
Establishing Best Practices in Building Rigorous Agentic Benchmarks
Yuxuan Zhu, Tengjun Jin, Yada Pruksachatkun et al.
Exploring More from Multiple Gait Modalities for Human Identification
Dongyang Jin, Chao Fan, Weihua Chen et al.
HEROS-GAN: Honed-Energy Regularized and Optimal Supervised GAN for Enhancing Accuracy and Range of Low-Cost Accelerometers
Yifeng Wang, Yi Zhao
Trivialized Momentum Facilitates Diffusion Generative Modeling on Lie Groups
Yuchen Zhu, Tianrong Chen, Lingkai Kong et al.
Standardizing Structural Causal Models
Weronika Ormaniec, Scott Sussex, Lars Lorch et al.
Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo
Zachary Charles, Gabriel Teston, Lucio Dery et al.
MIRAGE: Evaluating and Explaining Inductive Reasoning Process in Language Models
Jiachun Li, Pengfei Cao, Zhuoran Jin et al.
Image Over Text: Transforming Formula Recognition Evaluation with Character Detection Matching
Bin Wang, Fan Wu, Linke Ouyang et al.
On a Connection Between Imitation Learning and RLHF
Teng Xiao, Yige Yuan, Mingxiao Li et al.
Mixture of Attentions For Speculative Decoding
Matthieu Zimmer, Milan Gritta, Gerasimos Lampouras et al.
EffiBench-X: A Multi-Language Benchmark for Measuring Efficiency of LLM-Generated Code
Yuhao Qing, Boyu Zhu, Mingzhe Du et al.
InsightBench: Evaluating Business Analytics Agents Through Multi-Step Insight Generation
Gaurav Sahu, Abhay Puri, Juan A. Rodriguez et al.
Detect Anything 3D in the Wild
Hanxue Zhang, Haoran Jiang, Qingsong Yao et al.
Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties
Gouki Minegishi, Hiroki Furuta, Takeshi Kojima et al.
Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
Ming Dai, Jian Li, Jiedong Zhuang et al.
Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models
Thao Nguyen, Yang Li, Olga Golovneva et al.
Benchmarking LLMs' Judgments with No Gold Standard
Shengwei Xu, Yuxuan Lu, Grant Schoenebeck et al.
VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception
Ziang Yan, Yinan He, Xinhao Li et al.
Scaling Autonomous Agents via Automatic Reward Modeling And Planning
Zhenfang Chen, Delin Chen, Rui Sun et al.
LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos
Chin-Yang Lin, Cheng Sun, Fu-En Yang et al.
Let LRMs Break Free from Overthinking via Self-Braking Tuning
Haoran Zhao, Yuchen Yan, Yongliang Shen et al.
Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations
Ji-An Li, Huadong Xiong, Robert Wilson et al.
Small Batch Size Training for Language Models: When Vanilla SGD Works, and Why Gradient Accumulation is Wasteful
Martin Marek, Sanae Lotfi, Aditya Somasundaram et al.
Parameter and Memory Efficient Pretraining via Low-rank Riemannian Optimization
Zhanfeng Mo, Long-Kai Huang, Sinno Jialin Pan
Force Prompting: Video Generation Models Can Learn And Generalize Physics-based Control Signals
Nate Gillman, Charles Herrmann, Michael Freeman et al.
InstantSplamp: Fast and Generalizable Stenography Framework for Generative Gaussian Splatting
Chenxin Li, Hengyu Liu, Zhiwen Fan et al.
Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging
Jinluan Yang, Dingnan Jin, Anke Tang et al.
Efficient Residual Learning with Mixture-of-Experts for Universal Dexterous Grasping
Ziye Huang, Haoqi Yuan, Yuhui Fu et al.
Unlearning through Knowledge Overwriting: Reversible Federated Unlearning via Selective Sparse Adapter
Zhengyi Zhong, Weidong Bao, Ji Wang et al.
Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation
Yuheng Shi, Minjing Dong, Chang Xu
AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution
Fengyuan Liu, Nikhil Kandpal, Colin Raffel
ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments
Hojae Han, Seung-won Hwang, Rajhans Samdani et al.
ECVC: Exploiting Non-Local Correlations in Multiple Frames for Contextual Video Compression
Wei Jiang, Junru Li, Kai Zhang et al.
TANGO: Training-free Embodied AI Agents for Open-world Tasks
Filippo Ziliotto, Tommaso Campari, Luciano Serafini et al.
High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws
Muhammed Ildiz, Halil Gozeten, Ege Taga et al.
ASIGN: An Anatomy-aware Spatial Imputation Graphic Network for 3D Spatial Transcriptomics
Junchao Zhu, Ruining Deng, Tianyuan Yao et al.
Diffusion Model as a Noise-Aware Latent Reward Model for Step-Level Preference Optimization
Tao Zhang, Cheng Da, Kun Ding et al.
Fréchet Wavelet Distance: A Domain-Agnostic Metric for Image Generation
Lokesh Veeramacheneni, Moritz Wolter, Hilde Kuehne et al.
Neuroplastic Expansion in Deep Reinforcement Learning
Jiashun Liu, Johan S Obando Ceron, Aaron Courville et al.
Grounding Continuous Representations in Geometry: Equivariant Neural Fields
David Wessels, David Knigge, Riccardo Valperga et al.
MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation
Shuwei Shi, Biao Gong, Xi Chen et al.
Human Motion Instruction Tuning
Lei Li, Sen Jia, Jianhao Wang et al.
Robust Self-Paced Hashing for Cross-Modal Retrieval with Noisy Labels
Ruitao Pu, Yuan Sun, Yang Qin et al.
Puppeteer: Rig and Animate Your 3D Models
Chaoyue Song, Xiu Li, Fan Yang et al.
MBQ: Modality-Balanced Quantization for Large Vision-Language Models
Shiyao Li, Yingchun Hu, Xuefei Ning et al.
HOIGen-1M: A Large-scale Dataset for Human-Object Interaction Video Generation
Kun Liu, Qi Liu, Xinchen Liu et al.
C-CLIP: Multimodal Continual Learning for Vision-Language Model
Wenzhuo Liu, Fei Zhu, Longhui Wei et al.
Learning Transformer-based World Models with Contrastive Predictive Coding
Maxime Burchi, Radu Timofte
A Periodic Bayesian Flow for Material Generation
Hanlin Wu, Yuxuan Song, Jingjing Gong et al.
Do LLMs estimate uncertainty well in instruction-following?
Juyeon Heo, Miao Xiong, Christina Heinze-Deml et al.
Sum of Squares Circuits
Lorenzo Loconte, Stefan Mengel, Antonio Vergari
VPO: Aligning Text-to-Video Generation Models with Prompt Optimization
Jiale Cheng, Ruiliang Lyu, Xiaotao Gu et al.
Convergence of Score-Based Discrete Diffusion Models: A Discrete-Time Analysis
Zikun Zhang, Zixiang Chen, Quanquan Gu
Conformal Prediction for Causal Effects of Continuous Treatments
Maresa Schröder, Dennis Frauen, Jonas Schweisthal et al.
Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning
Junming Liu, Siyuan Meng, Yanting Gao et al.
Surprising Effectiveness of pretraining Ternary Language Model at Scale
Ayush Kaushal, Tejas Vaidhya, Arnab Mondal et al.
AdaWM: Adaptive World Model based Planning for Autonomous Driving
Hang Wang, Xin Ye, Feng Tao et al.
A Unifying Framework for Representation Learning
Shaden Alshammari, John Hershey, Axel Feldmann et al.
Are Large Vision Language Models Good Game Players?
Xinyu Wang, Bohan Zhuang, Qi Wu
MapExpert: Online HD Map Construction with Simple and Efficient Sparse Map Element Expert
Dapeng Zhang, Dayu Chen, Peng Zhi et al.
TS-RAG: Retrieval-Augmented Generation based Time Series Foundation Models are Stronger Zero-Shot Forecaster
Kanghui Ning, Zijie Pan, Yu Liu et al.
Synthetic Data is an Elegant GIFT for Continual Vision-Language Models
Bin Wu, Wuxuan Shi, Jinqiao Wang et al.
KITS: Inductive Spatio-Temporal Kriging with Increment Training Strategy
Qianxiong Xu, Cheng Long, Ziyue Li et al.
ClearSight: Visual Signal Enhancement for Object Hallucination Mitigation in Multimodal Large Language Models
Hao Yin, Guangzong Si, Zilei Wang
Faster Algorithms for Structured Linear and Kernel Support Vector Machines
Yuzhou Gu, Zhao Song, Lichen Zhang
RoboTron-Mani: All-in-One Multimodal Large Model for Robotic Manipulation
Feng Yan, Fanfan Liu, Yiyang Huang et al.
PipeFusion: Patch-level Pipeline Parallelism for Diffusion Transformers Inference
Jiarui Fang, Jinzhe Pan, Aoyu Li et al.
AG-VPReID: A Challenging Large-Scale Benchmark for Aerial-Ground Video-based Person Re-Identification
Huy Nguyen, Kien Nguyen Thanh, Akila Pemasiri et al.
ACC-Collab: An Actor-Critic Approach to Multi-Agent LLM Collaboration
Andrew Estornell, Jean-Francois Ton, Yuanshun Yao et al.
From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit
Valérie Costa, Thomas Fel, Ekdeep S Lubana et al.
Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
Vaishnavh Nagarajan, Chen Wu, Charles Ding et al.
Enhancing Vision-Language Model Reliability with Uncertainty-Guided Dropout Decoding
Yixiong Fang, Ziran Yang, Zhaorun Chen et al.
On the Feature Learning in Diffusion Models
Andi Han, Wei Huang, Yuan Cao et al.
FedBiP: Heterogeneous One-Shot Federated Learning with Personalized Latent Diffusion Models
Haokun Chen, Hang Li, Yao Zhang et al.
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
Jian Yang, Dacheng Yin, Yizhou Zhou et al.
MVTokenFlow: High-quality 4D Content Generation using Multiview Token Flow
Hanzhuo Huang, Yuan Liu, Ge Zheng et al.
OWLS: Scaling Laws for Multilingual Speech Recognition and Translation Models
William Chen, Jinchuan Tian, Yifan Peng et al.
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
Ming Li, Xin Gu, Fan Chen et al.
RhythmMamba: Fast, Lightweight, and Accurate Remote Physiological Measurement
Bochao Zou, Zizheng Guo, Xiaocheng Hu et al.
ReasonGrounder: LVLM-Guided Hierarchical Feature Splatting for Open-Vocabulary 3D Visual Grounding and Reasoning
Zhenyang Liu, Yikai Wang, Sixiao Zheng et al.
P-SpikeSSM: Harnessing Probabilistic Spiking State Space Models for Long-Range Dependency Tasks
Malyaban Bal, Abhronil Sengupta
Patient-Level Anatomy Meets Scanning-Level Physics: Personalized Federated Low-Dose CT Denoising Empowered by Large Language Model
Ziyuan Yang, Yingyu Chen, Zhiwen Wang et al.
ACL: Activating Capability of Linear Attention for Image Restoration
Yubin Gu, Yuan Meng, Jiayi Ji et al.
Mobile Video Diffusion
Haitam Ben Yahia, Denis Korzhenkov, Ioannis Lelekas et al.
DreamDance: Animating Human Images by Enriching 3D Geometry Cues from 2D Poses
Yatian Pang, Bin Zhu, Bin Lin et al.
Repulsive Latent Score Distillation for Solving Inverse Problems
Nicolas Zilberstein, Morteza Mardani, Santiago Segarra
FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors
Yabo Zhang, Xinpeng Zhou, Yihan Zeng et al.
CLIMB: Data Foundations for Large Scale Multimodal Clinical Foundation Models
David Dai, Peilin Chen, Malinda Lu et al.
GeoBEV: Learning Geometric BEV Representation for Multi-view 3D Object Detection
Jinqing Zhang, Yanan Zhang, Yunlong Qi et al.
The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise
Yuanhao Ban, Ruochen Wang, Tianyi Zhou et al.
Adversarial Distribution Matching for Diffusion Distillation Towards Efficient Image and Video Synthesis
Yanzuo Lu, Yuxi Ren, Xin Xia et al.
ICE-Bench: A Unified and Comprehensive Benchmark for Image Creating and Editing
Yulin Pan, Xiangteng He, Chaojie Mao et al.
SmartEraser: Remove Anything from Images using Masked-Region Guidance
Longtao Jiang, Zhendong Wang, Jianmin Bao et al.
Multi-Teacher Knowledge Distillation with Reinforcement Learning for Visual Recognition
Chuanguang Yang, XinQiang Yu, Han Yang et al.
CLIMB-ReID: A Hybrid CLIP-Mamba Framework for Person Re-Identification
Chenyang Yu, Xuehu Liu, Jiawen Zhu et al.
SymmCompletion: High-Fidelity and High-Consistency Point Cloud Completion with Symmetry Guidance
Hongyu Yan, Zijun Li, Kunming Luo et al.
Flow matching achieves almost minimax optimal convergence
Kenji Fukumizu, Taiji Suzuki, Noboru Isobe et al.
OLiDM: Object-aware LiDAR Diffusion Models for Autonomous Driving
Tianyi Yan, Junbo Yin, Xianpeng Lang et al.
Quest: Query-centric Data Synthesis Approach for Long-context Scaling of Large Language Model
Chaochen Gao, Xing W, Qi Fu et al.
Searching Latent Program Spaces
Matthew Macfarlane, Clem Bonnet
CoT Red-Handed: Stress Testing Chain-of-Thought Monitoring
Benjamin Arnav, Pablo Bernabeu-Perez, Nathan Helm-Burger et al.
Vec2Face: Scaling Face Dataset Generation with Loosely Constrained Vectors
Haiyu Wu, Jaskirat Singh, Sicong Tian et al.
In Search of Adam’s Secret Sauce
Antonio Orvieto, Robert Gower
Equivariance Everywhere All At Once: A Recipe for Graph Foundation Models
Ben Finkelshtein, Ismail Ilkan Ceylan, Michael Bronstein et al.
TREAD: Token Routing for Efficient Architecture-agnostic Diffusion Training
Felix Krause, Timy Phan, Ming Gui et al.
Backdoor Attacks Against No-Reference Image Quality Assessment Models via a Scalable Trigger
Yi Yu, Song Xia, Xun Lin et al.
Heterogeneous Swarms: Jointly Optimizing Model Roles and Weights for Multi-LLM Systems
Shangbin Feng, Zifeng Wang, Palash Goyal et al.
Consistent Flow Distillation for Text-to-3D Generation
Runjie Yan, Yinbo Chen, Xiaolong Wang
Can Watermarked LLMs be Identified by Users via Crafted Prompts?
Aiwei Liu, Sheng Guan, Yiming Liu et al.
STEREO: A Two-Stage Framework for Adversarially Robust Concept Erasing from Text-to-Image Diffusion Models
Koushik Srivatsan, Fahad Shamshad, Muzammal Naseer et al.
Ambient Diffusion Omni: Training Good Models with Bad Data
Giannis Daras, Adrian Rodriguez-Munoz, Adam Klivans et al.
Revisiting Zeroth-Order Optimization: Minimum-Variance Two-Point Estimators and Directionally Aligned Perturbations
Shaocong Ma, Heng Huang
Debiased All-in-one Image Restoration with Task Uncertainty Regularization
Gang Wu, Junjun Jiang, Yijun Wang et al.
Unsupervised Foundation Model-Agnostic Slide-Level Representation Learning
Tim Lenz, Peter Neidlinger, Marta Ligero et al.
Yuan: Yielding Unblemished Aesthetics Through a Unified Network for Visual Imperfections Removal in Generated Images
Zhenyu Yu, Chee Seng Chan
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
Xize Cheng, Siqi Zheng, Zehan Wang et al.
Exploring the limits of strong membership inference attacks on large language models
Jamie Hayes, I Shumailov, Christopher A. Choquette-Choo et al.
NAVIX: Scaling MiniGrid Environments with JAX
Eduardo Pignatelli, Jarek Liesen, Robert Lange et al.
Panacea: Mitigating Harmful Fine-tuning for Large Language Models via Post-fine-tuning Perturbation
Yibo Wang, Tiansheng Huang, Li Shen et al.
How to Synthesize Text Data without Model Collapse?
Xuekai Zhu, Daixuan Cheng, Hengli Li et al.
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
Wei Chen, Lin Li, Yongqi Yang et al.
Generalized Principal-Agent Problem with a Learning Agent
Tao Lin, Yiling Chen
RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation
Mingfei Han, Liang Ma, Kamila Zhumakhanova et al.
Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering
Xingrui Wang, Wufei Ma, Angtian Wang et al.
AdaManip: Adaptive Articulated Object Manipulation Environments and Policy Learning
Yuanfei Wang, Xiaojie Zhang, Ruihai Wu et al.
GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs
Advik Basani, Xiao Zhang
Image Generation Diversity Issues and How to Tame Them
Mischa Dombrowski, Weitong Zhang, Hadrien Reynaud et al.
Coreset Selection via Reducible Loss in Continual Learning
Ruilin Tong, Yuhang Liu, Javen Qinfeng Shi et al.