Most Cited ICLR "human-centric forgery detection" Papers
6,124 papers found • Page 3 of 31
Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control
Longtao Zheng, Rundong Wang, Xinrun Wang et al.
Autoregressive Video Generation without Vector Quantization
Haoge Deng, Ting Pan, Haiwen Diao et al.
Unpaired Image-to-Image Translation via Neural Schrödinger Bridge
Beomsu Kim, Gihyun Kwon, Kwanyoung Kim et al.
On the self-verification limitations of large language models on reasoning and planning tasks
Kaya Stechly, Karthik Valmeekam, Subbarao Kambhampati
#InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models
Keming Lu, Hongyi Yuan, Zheng Yuan et al.
SV4D: Dynamic 3D Content Generation with Multi-Frame and Multi-View Consistency
Yiming Xie, Chun-Han Yao, Vikram Voleti et al.
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
Weiran Yao, Shelby Heinecke, Juan Carlos Niebles et al.
A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models
Haoran Xu, Young Jin Kim, Amr Mohamed Nabil Aly Aly Sharaf et al.
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
John Yang, Carlos E Jimenez, Alex Zhang et al.
OmniRe: Omni Urban Scene Reconstruction
Ziyu Chen, Jiawei Yang, Jiahui Huang et al.
Agent S: An Open Agentic Framework that Uses Computers Like a Human
Saaket Agashe, Jiuzhou Han, Shuyu Gan et al.
Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs
Angelica Chen, Ravid Shwartz-Ziv, Kyunghyun Cho et al.
Fine-Tuned Language Models Generate Stable Inorganic Materials as Text
Nate Gruver, Anuroop Sriram, Andrea Madotto et al.
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
Sam Toyer, Olivia Watkins, Ethan Mendes et al.
Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations
Litu Rout, Yujia Chen, Nataniel Ruiz et al.
Consistent4D: Consistent 360° Dynamic Object Generation from Monocular Video
Yanqin Jiang, Li Zhang, Jin Gao et al.
MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance
Xierui Wang, Siming Fu, Qihan Huang et al.
VideoPhy: Evaluating Physical Commonsense for Video Generation
Hritik Bansal, Zongyu Lin, Tianyi Xie et al.
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai, Enxin Song, Yilun Du et al.
ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search
Yuchen Zhuang, Xiang Chen, Tong Yu et al.
CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets
Lifan Yuan, Yangyi Chen, Xingyao Wang et al.
RegMix: Data Mixture as Regression for Language Model Pre-training
Qian Liu, Xiaosen Zheng, Niklas Muennighoff et al.
HIFA: High-fidelity Text-to-3D Generation with Advanced Diffusion Guidance
Junzhe Zhu, Peiye Zhuang, Sanmi Koyejo
Large Language Models as Generalizable Policies for Embodied Tasks
Andrew Szot, Max Schwarzer, Harsh Agrawal et al.
Conformal Language Modeling
Victor Quach, Adam Fisch, Tal Schuster et al.
Knowledge Fusion of Large Language Models
Fanqi Wan, Xinting Huang, Deng Cai et al.
Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG
Bowen Jin, Jinsung Yoon, Jiawei Han et al.
ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models
Iman Mirzadeh, Keivan Alizadeh-Vahid, Sachin Mehta et al.
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
Dawei Zhu, Nan Yang, Liang Wang et al.
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs
Yushi Bai, Jiajie Zhang, Xin Lv et al.
BadEdit: Backdooring Large Language Models by Model Editing
Yanzhou Li, Tianlin Li, Kangjie Chen et al.
Vision-by-Language for Training-Free Compositional Image Retrieval
Shyamgopal Karthik, Karsten Roth, Massimiliano Mancini et al.
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models
Gen Luo, Yiyi Zhou, Yuxin Zhang et al.
VDT: General-purpose Video Diffusion Transformers via Mask Modeling
Haoyu Lu, Guoxing Yang, Nanyi Fei et al.
UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition
Wenxuan Zhou, Sheng Zhang, Yu Gu et al.
Universal Humanoid Motion Representations for Physics-Based Control
Zhengyi Luo, Jinkun Cao, Josh Merel et al.
MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers
Yiwen Chen, Tong He, Di Huang et al.
Not All Language Model Features Are One-Dimensionally Linear
Josh Engels, Eric Michaud, Isaac Liao et al.
Unified Human-Scene Interaction via Prompted Chain-of-Contacts
Zeqi Xiao, Tai Wang, Jingbo Wang et al.
Tag2Text: Guiding Vision-Language Model via Image Tagging
Xinyu Huang, Youcai Zhang, Jinyu Ma et al.
The Expressive Power of Low-Rank Adaptation
Yuchen Zeng, Kangwook Lee
Language Model Cascades: Token-Level Uncertainty And Beyond
Neha Gupta, Harikrishna Narasimhan, Wittawat Jitkrittum et al.
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
Tian Ye, Zicheng Xu, Yuanzhi Li et al.
HART: Efficient Visual Generation with Hybrid Autoregressive Transformer
Haotian Tang, Yecheng Wu, Shang Yang et al.
Circuit Component Reuse Across Tasks in Transformer Language Models
Jack Merullo, Carsten Eickhoff, Ellie Pavlick
Consistency Models Made Easy
Zhengyang Geng, Ashwini Pokle, Weijian Luo et al.
DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models
Yongchan Kwon, Eric Wu, Kevin Wu et al.
Towards image compression with perfect realism at ultra-low bitrates
Marlene Careil, Matthew J Muckley, Jakob Verbeek et al.
Decoding Natural Images from EEG for Object Recognition
Yonghao Song, Bingchuan Liu, Xiang Li et al.
Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF
Anand Siththaranjan, Cassidy Laidlaw, Dylan Hadfield-Menell
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
Nikhil Prakash, Tamar Shaham, Tal Haklay et al.
Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
Weiyang Liu, Zeju Qiu, Yao Feng et al.
SVDQuant: Absorbing Outliers by Low-Rank Component for 4-Bit Diffusion Models
Muyang Li, Yujun Lin, Zhekai Zhang et al.
Rethinking Model Ensemble in Transfer-based Adversarial Attacks
Huanran Chen, Yichi Zhang, Yinpeng Dong et al.
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
Xichen Pan, Li Dong, Shaohan Huang et al.
DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation
Bowen Yin, Xuying Zhang, Zhong-Yu Li et al.
HyperAttention: Long-context Attention in Near-Linear Time
Insu Han, Rajesh Jayaram, Amin Karbasi et al.
Adam-mini: Use Fewer Learning Rates To Gain More
Yushun Zhang, Congliang Chen, Ziniu Li et al.
When Attention Sink Emerges in Language Models: An Empirical View
Xiangming Gu, Tianyu Pang, Chao Du et al.
Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks
Samyak Jain, Robert Kirk, Ekdeep Singh Lubana et al.
A Semantic Invariant Robust Watermark for Large Language Models
Aiwei Liu, Leyi Pan, Xuming Hu et al.
ARGS: Alignment as Reward-Guided Search
Maxim Khanov, Jirayu Burapacheep, Yixuan Li
CADS: Unleashing the Diversity of Diffusion Models through Condition-Annealed Sampling
Seyedmorteza Sadat, Jakob Buhmann, Derek Bradley et al.
Noise-free Score Distillation
Oren Katzir, Or Patashnik, Daniel Cohen-Or et al.
LVSM: A Large View Synthesis Model with Minimal 3D Inductive Bias
Haian Jin, Hanwen Jiang, Hao Tan et al.
CBraMod: A Criss-Cross Brain Foundation Model for EEG Decoding
Jiquan Wang, Sha Zhao, Zhiling Luo et al.
At Which Training Stage Does Code Data Help LLMs Reasoning?
Yingwei Ma, Yue Liu, Yue Yu et al.
Deconstructing Denoising Diffusion Models for Self-Supervised Learning
Xinlei Chen, Zhuang Liu, Saining Xie et al.
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
Jintao Zhang, Jia Wei, Pengle Zhang et al.
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency
Jianwen Jiang, Chao Liang, Jiaqi Yang et al.
Label-free Node Classification on Graphs with Large Language Models (LLMs)
Zhikai Chen, Haitao Mao, Hongzhi Wen et al.
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
Yuang Peng, Yuxin Cui, Haomiao Tang et al.
MiniLLM: Knowledge Distillation of Large Language Models
Yuxian Gu, Li Dong, Furu Wei et al.
Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation
Xinyu Tang, Richard Shin, Huseyin Inan et al.
Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs
Minh Nguyen, Andrew Baker, Clement Neo et al.
ColPali: Efficient Document Retrieval with Vision Language Models
Manuel Faysse, Hugues Sibille, Tony Wu et al.
AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation
Yuning Cui, Syed Waqas Zamir, Salman Khan et al.
Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion
Lunjun Zhang, Yuwen Xiong, Ze Yang et al.
FeatUp: A Model-Agnostic Framework for Features at Any Resolution
Stephanie Fu, Mark Hamilton, Laura E. Brandt et al.
Improved sampling via learned diffusions
Lorenz Richter, Julius Berner
TimeMixer++: A General Time Series Pattern Machine for Universal Predictive Analysis
Shiyu Wang, Jiawei Li, Xiaoming Shi et al.
Unbiased Watermark for Large Language Models
Zhengmian Hu, Lichang Chen, Xidong Wu et al.
Consistency-guided Prompt Learning for Vision-Language Models
Shuvendu Roy, Ali Etemad
Brain decoding: toward real-time reconstruction of visual perception
Yohann Benchetrit, Hubert Banville, Jean-Remi King
Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation
Yang Tian, Sizhe Yang, Jia Zeng et al.
An Extensible Framework for Open Heterogeneous Collaborative Perception
Yifan Lu, Yue Hu, Yiqi Zhong et al.
Bayesian Low-rank Adaptation for Large Language Models
Adam Yang, Maxime Robeyns, Xi Wang et al.
Kolmogorov-Arnold Transformer
Xingyi Yang, Xinchao Wang
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision
Cong Wei, Zheyang Xiong, Weiming Ren et al.
DynaMath: A Dynamic Visual Benchmark for Evaluating Mathematical Reasoning Robustness of Vision Language Models
Chengke Zou, Xingang Guo, Rui Yang et al.
Learning Delays in Spiking Neural Networks using Dilated Convolutions with Learnable Spacings
Ilyass Hammouamri, Ismail Khalfaoui Hassani, Timothée Masquelier
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
Michael Zhang, Kush Bhatia, Hermann Kumbong et al.
Motif: Intrinsic Motivation from Artificial Intelligence Feedback
Martin Klissarov, Pierluca D'Oro, Shagun Sodhani et al.
GeoLLM: Extracting Geospatial Knowledge from Large Language Models
Rohin Manvi, Samar Khanna, Gengchen Mai et al.
CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity
Aditya Bhatt, Daniel Palenicek, Boris Belousov et al.
Training Socially Aligned Language Models on Simulated Social Interactions
Ruibo Liu, Ruixin Yang, Chenyan Jia et al.
Turning large language models into cognitive models
Marcel Binz, Eric Schulz
Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning
Yiwei Li, Peiwen Yuan, Shaoxiong Feng et al.
OGBench: Benchmarking Offline Goal-Conditioned RL
Seohong Park, Kevin Frans, Benjamin Eysenbach et al.
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis
Zhenhui Ye, Tianyun Zhong, Yi Ren et al.
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions
Juncheng Li, Kaihang Pan, Zhiqi Ge et al.
Lemur: Harmonizing Natural Language and Code for Language Agents
Yiheng Xu, Hongjin Su, Chen Xing et al.
Making Text Embedders Few-Shot Learners
Chaofan Li, Minghao Qin, Shitao Xiao et al.
How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
Jingfeng Wu, Difan Zou, Zixiang Chen et al.
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
Sewon Min, Suchin Gururangan, Eric Wallace et al.
Unlocking Guidance for Discrete State-Space Diffusion and Flow Models
Hunter Nisonoff, Junhao Xiong, Stephan Allenspach et al.
LiveBench: A Challenging, Contamination-Limited LLM Benchmark
Colin White, Samuel Dooley, Manley Roberts et al.
Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM
Eliya Nachmani, Alon Levkovitch, Roy Hirsch et al.
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
Jiarui Zhang, Mahyar Khayatkhoei, Prateek Chhikara et al.
KoLA: Carefully Benchmarking World Knowledge of Large Language Models
Jifan Yu, Xiaozhi Wang, Shangqing Tu et al.
GraphRouter: A Graph-based Router for LLM Selections
Tao Feng, Yanzhen Shen, Jiaxuan You
Entropy is not Enough for Test-Time Adaptation: From the Perspective of Disentangled Factors
Jonghyun Lee, Dahuin Jung, Saehyung Lee et al.
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Weifeng Lin, Xinyu Wei, Ruichuan An et al.
Finetuning Text-to-Image Diffusion Models for Fairness
Xudong Shen, Chao Du, Tianyu Pang et al.
Detecting, Explaining, and Mitigating Memorization in Diffusion Models
Yuxin Wen, Yuchen Liu, Chen Chen et al.
Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering
Han Zhou, Xingchen Wan, Lev Proleev et al.
Vision-LSTM: xLSTM as Generic Vision Backbone
Benedikt Alkin, Maximilian Beck, Korbinian Pöppel et al.
Amortizing intractable inference in large language models
Edward Hu, Moksh Jain, Eric Elmoznino et al.
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs
Sheng-Chieh Lin, Chankyu Lee, Mohammad Shoeybi et al.
Safety Layers in Aligned Large Language Models: The Key to LLM Security
Shen Li, Liuyi Yao, Lan Zhang et al.
Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances
Shilin Lu, Zihan Zhou, Jiayou Lu et al.
MotionClone: Training-Free Motion Cloning for Controllable Video Generation
Pengyang Ling, Jiazi Bu, Pan Zhang et al.
Neural Common Neighbor with Completion for Link Prediction
Xiyuan Wang, Haotong Yang, Muhan Zhang
AHA: A Vision-Language-Model for Detecting and Reasoning Over Failures in Robotic Manipulation
Jiafei Duan, Wilbert Pumacay, Nishanth Kumar et al.
Programming Refusal with Conditional Activation Steering
Bruce W. Lee, Inkit Padhi, Karthikeyan Natesan Ramamurthy et al.
Large-scale Training of Foundation Models for Wearable Biosignals
Salar Abbaspourazad, Oussama Elachqar, Andrew Miller et al.
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models
Javier Ferrando, Oscar Obeso, Senthooran Rajamanoharan et al.
Improved Techniques for Optimization-Based Jailbreaking on Large Language Models
Xiaojun Jia, Tianyu Pang, Chao Du et al.
LQ-LoRA: Low-rank plus Quantized Matrix Decomposition for Efficient Language Model Finetuning
Han Guo, Philip Greengard, Eric Xing et al.
Human Feedback is not Gold Standard
Tom Hosking, Phil Blunsom, Max Bartolo
BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
Zhen Xiang, Fengqing Jiang, Zidi Xiong et al.
Controlling Vision-Language Models for Multi-Task Image Restoration
Ziwei Luo, Fredrik K. Gustafsson, Zheng Zhao et al.
SuRe: Summarizing Retrievals using Answer Candidates for Open-domain QA of LLMs
Jaehyung Kim, Jaehyun Nam, Sangwoo Mo et al.
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?
Qi Zhao, Shijie Wang, Ce Zhang et al.
Beyond Autoregression: Discrete Diffusion for Complex Reasoning and Planning
Jiacheng Ye, Jiahui Gao, Shansan Gong et al.
Training-free Camera Control for Video Generation
Chen Hou, Zhibo Chen
MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models
Peng Xia, Kangyu Zhu, Haoran Li et al.
Skeleton-of-Thought: Prompting LLMs for Efficient Parallel Generation
Xuefei Ning, Zinan Lin, Zixuan Zhou et al.
CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models
Hyungjin Chung, Jeongsol Kim, Geon Yeong Park et al.
CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting
Xue Wang, Tian Zhou, Qingsong Wen et al.
ClimODE: Climate and Weather Forecasting with Physics-informed Neural ODEs
Yogesh Verma, Markus Heinonen, Vikas Garg
Soft Merging of Experts with Adaptive Routing
Haokun Liu, Muqeeth Mohammed, Colin Raffel
Diffusion-Based Planning for Autonomous Driving with Flexible Guidance
Yinan Zheng, Ruiming Liang, Kexin Zheng et al.
PB-LLM: Partially Binarized Large Language Models
Zhihang Yuan, Yuzhang Shang, Zhen Dong
Towards Foundation Models for Knowledge Graph Reasoning
Mikhail Galkin, Xinyu Yuan, Hesham Mostafa et al.
In-Context Pretraining: Language Modeling Beyond Document Boundaries
Weijia Shi, Sewon Min, Maria Lomeli et al.
Real-Time Video Generation with Pyramid Attention Broadcast
Xuanlei Zhao, Xiaolong Jin, Kai Wang et al.
Convolution Meets LoRA: Parameter Efficient Finetuning for Segment Anything Model
Zihan Zhong, Zhiqiang Tang, Tong He et al.
Language Model Self-improvement by Reinforcement Learning Contemplation
Jing-Cheng Pang, Pengyuan Wang, Kaiyuan Li et al.
Dissecting Adversarial Robustness of Multimodal LM Agents
Chen Wu, Rishi Shah, Jing Yu Koh et al.
A Benchmark for Learning to Translate a New Language from One Grammar Book
Garrett Tanzer, Mirac Suzgun, Eline Visser et al.
D-FINE: Redefine Regression Task of DETRs as Fine-grained Distribution Refinement
Yansong Peng, Hebei Li, Peixi Wu et al.
DyVal: Dynamic Evaluation of Large Language Models for Reasoning Tasks
Kaijie Zhu, Jiaao Chen, Jindong Wang et al.
Hallo2: Long-Duration and High-Resolution Audio-Driven Portrait Image Animation
Jiahao Cui, Hui Li, Yao Yao et al.
InfoBatch: Lossless Training Speed Up by Unbiased Dynamic Data Pruning
Ziheng Qin, Kai Wang, Zangwei Zheng et al.
Improving Instruction-Following in Language Models through Activation Steering
Alessandro Stolfo, Vidhisha Balachandran, Safoora Yousefi et al.
MMTEB: Massive Multilingual Text Embedding Benchmark
Kenneth Enevoldsen, Isaac Chung, Imene Kerboua et al.
DeepZero: Scaling Up Zeroth-Order Optimization for Deep Model Training
Aochuan Chen, Yimeng Zhang, Jinghan Jia et al.
Generalization v.s. Memorization: Tracing Language Models’ Capabilities Back to Pretraining Data
Xinyi Wang, Antonis Antoniades, Yanai Elazar et al.
Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF
Tengyang Xie, Dylan Foster, Akshay Krishnamurthy et al.
Linear attention is (maybe) all you need (to understand Transformer optimization)
Kwangjun Ahn, Xiang Cheng, Minhak Song et al.
Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization
Yang Jin, Kun Xu, Kun Xu et al.
On the Stability of Iterative Retraining of Generative Models on their own Data
Quentin Bertrand, Joey Bose, Alexandre Duplessis et al.
How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions
Lorenzo Pacchiardi, Alex Chan, Sören Mindermann et al.
MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine
Renrui Zhang, Xinyu Wei, Dongzhi Jiang et al.
Towards 3D Molecule-Text Interpretation in Language Models
Sihang Li, Zhiyuan Liu, Yanchen Luo et al.
Curiosity-driven Red-teaming for Large Language Models
Zhang-Wei Hong, Idan Shenfeld, Johnson (Tsun-Hsuan) Wang et al.
Eliciting Human Preferences with Language Models
Belinda Li, Alex Tamkin, Noah Goodman et al.
Language models scale reliably with over-training and on downstream tasks
Samir Yitzhak Gadre, Georgios Smyrnis, Vaishaal Shankar et al.
Reasoning with Latent Thoughts: On the Power of Looped Transformers
Nikunj Saunshi, Nishanth Dikkala, Zhiyuan Li et al.
Learning to Act without Actions
Dominik Schmidt, Minqi Jiang
Language Models Learn to Mislead Humans via RLHF
Jiaxin Wen, Ruiqi Zhong, Akbir Khan et al.
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
Zilong (Ryan) Wang, Zifeng Wang, Long Le et al.
Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment
Utkarsh Kumar Mall, Cheng Perng Phoo, Meilin Liu et al.
RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation
Sergio Gómez Colmenarejo, Jost Springenberg, Jose Enrique Chen et al.
Multiscale Positive-Unlabeled Detection of AI-Generated Texts
Yuchuan Tian, Hanting Chen, Xutao Wang et al.
DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation
Yukun Huang, Jianan Wang, Yukai Shi et al.
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
Jinyi Hu, Yuan Yao, Chongyi Wang et al.
Elucidating the Exposure Bias in Diffusion Models
Mang Ning, Mingxiao Li, Jianlin Su et al.
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy
Pingzhi Li, Zhenyu Zhang, Prateek Yadav et al.
Robustness of AI-Image Detectors: Fundamental Limits and Practical Attacks
Mehrdad Saberi, Vinu Sankar Sadasivan, Keivan Rezaei et al.
Single Motion Diffusion
Sigal Raab, Inbal Leibovitch, Guy Tevet et al.
FakeShield: Explainable Image Forgery Detection and Localization via Multi-modal Large Language Models
Zhipei Xu, Xuanyu Zhang, Runyi Li et al.
How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations
Tianyu Guo, Wei Hu, Song Mei et al.
Model Merging by Uncertainty-Based Gradient Matching
Nico Daheim, Thomas Möllenhoff, Edoardo M. Ponti et al.
LLM-grounded Video Diffusion Models
Long Lian, Baifeng Shi, Adam Yala et al.
Enhancing End-to-End Autonomous Driving with Latent World Model
Yingyan Li, Lue Fan, Jiawei He et al.
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning
Bahare Fatemi, Seyed Mehran Kazemi, Anton Tsitsulin et al.
InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales
Zhepei Wei, Wei-Lin Chen, Yu Meng
Simple Guidance Mechanisms for Discrete Diffusion Models
Yair Schiff, Subham Sahoo, Hao Phung et al.
Round and Round We Go! What makes Rotary Positional Encodings useful?
Federico Barbero, Alex Vitvitskyi, Christos Perivolaropoulos et al.
Fine-tuning can cripple your foundation model; preserving features may be the solution
Philip Torr, Puneet Dokania, Jishnu Mukhoti et al.
Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks
Murtaza Dalal, Tarun Chiruvolu, Devendra Chaplot et al.
MaskBit: Embedding-free Image Generation via Bit Tokens
Mark Weber, Lijun Yu, Qihang Yu et al.
Confronting Reward Model Overoptimization with Constrained RLHF
Ted Moskovitz, Aaditya Singh, DJ Strouse et al.
HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion
Xian Liu, Jian Ren, Aliaksandr Siarohin et al.
METRA: Scalable Unsupervised RL with Metric-Aware Abstraction
Seohong Park, Oleh Rybkin, Sergey Levine
HAMSTER: Hierarchical Action Models for Open-World Robot Manipulation
Yi Li, Yuquan Deng, Jesse Zhang et al.
LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation
Suhyeon Lee, Won Jun Kim, Jinho Chang et al.
Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions
Satwik Bhattamishra, Arkil Patel, Phil Blunsom et al.