Most Cited 2024 "stochastic submodular rewards" Papers
SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing
Zhecheng Wang, Rajanie Prabha, Tianyuan Huang et al.
HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting
Hongyu Zhou, Jiahao Shao, Lu Xu et al.
A Minimaximalist Approach to Reinforcement Learning from Human Feedback
Gokul Swamy, Christoph Dann, Rahul Kidambi et al.
The Pitfalls of Next-Token Prediction
Gregor Bachmann, Vaishnavh Nagarajan
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
Cong Wei, Yang Chen, Haonan Chen et al.
DisCo: Disentangled Control for Realistic Human Dance Generation
Tan Wang, Linjie Li, Kevin Lin et al.
Probing the 3D Awareness of Visual Foundation Models
Mohamed El Banani, Amit Raj, Kevis-Kokitsi Maninis et al.
Zipformer: A faster and better encoder for automatic speech recognition
Zengwei Yao, Liyong Guo, Xiaoyu Yang et al.
PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation
Haibo Jin, Haoxuan Che, Yi Lin et al.
Large Language Models as Analogical Reasoners
Michihiro Yasunaga, Xinyun Chen, Yujia Li et al.
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
Juan Rocamonde, Victoriano Montesinos, Elvis Nava et al.
Mixture of LoRA Experts
Xun Wu, Shaohan Huang, Furu Wei
DiffusionSat: A Generative Foundation Model for Satellite Imagery
Samar Khanna, Patrick Liu, Linqi Zhou et al.
Language Agents with Reinforcement Learning for Strategic Play in the Werewolf Game
Zelai Xu, Chao Yu, Fei Fang et al.
SparseTSF: Modeling Long-term Time Series Forecasting with 1k Parameters
Shengsheng Lin, Weiwei Lin, Wentai Wu et al.
SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM
Mingrui Li, Shuhong Liu, Heng Zhou et al.
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
Chengshu Li, Jacky Liang, Andy Zeng et al.
Denoising Diffusion Bridge Models
Linqi Zhou, Aaron Lou, Samar Khanna et al.
ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
Yifei Zhou, Andrea Zanette, Jiayi Pan et al.
Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation
Mukul Khanna, Yongsen Mao, Hanxiao Jiang et al.
Rich Human Feedback for Text-to-Image Generation
Youwei Liang, Junfeng He, Gang Li et al.
Paying More Attention to Images: A Training-Free Method for Alleviating Hallucination in LVLMs
Shi Liu, Kecheng Zheng, Wei Chen
LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model
Dilxat Muhtar, Zhenshi Li, Feng Gu et al.
Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning
Xiaoxin He, Xavier Bresson, Thomas Laurent et al.
ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions
Chunlong Xia, Xinliang Wang, Feng Lv et al.
VLP: Vision Language Planning for Autonomous Driving
Chenbin Pan, Burhan Yaman, Tommaso Nesti et al.
Prompting4Debugging: Red-Teaming Text-to-Image Diffusion Models by Finding Problematic Prompts
Zhi-Yi Chin, Chieh Ming Jiang, Ching-Chun Huang et al.
Task Contamination: Language Models May Not Be Few-Shot Anymore
Changmao Li, Jeffrey Flanigan
XCube: Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies
Xuanchi Ren, Jiahui Huang, Xiaohui Zeng et al.
SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research
Liangtai Sun, Yang Han, Zihan Zhao et al.
In-context Autoencoder for Context Compression in a Large Language Model
Tao Ge, Hu Jing, Lei Wang et al.
Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching
Yang Liu, Muzhi Zhu, Hengtao Li et al.
An LLM can Fool Itself: A Prompt-Based Adversarial Attack
Xilie Xu, Keyi Kong, Ning Liu et al.
LEDITS++: Limitless Image Editing using Text-to-Image Models
Manuel Brack, Felix Friedrich, Katharina Kornmeier et al.
MogaNet: Multi-order Gated Aggregation Network
Siyuan Li, Zedong Wang, Zicheng Liu et al.
GART: Gaussian Articulated Template Models
Jiahui Lei, Yufu Wang, Georgios Pavlakos et al.
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
Baoxiong Jia, Yixin Chen, Huangyue Yu et al.
LongVLM: Efficient Long Video Understanding via Large Language Models
Yuetian Weng, Mingfei Han, Haoyu He et al.
OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text
Keiran Paster, Marco Dos Santos, Zhangir Azerbayev et al.
SCTNet: Single Branch CNN with Transformer Semantic Information for Real-Time Segmentation
Zhengze Xu, Dongyue Wu, Changqian Yu et al.
Spike-driven Transformer V2: Meta Spiking Neural Network Architecture Inspiring the Design of Next-generation Neuromorphic Chips
Man Yao, Jiakui Hu, Tianxiang Hu et al.
Learning Performance-Improving Code Edits
Alexander Shypula, Aman Madaan, Yimeng Zeng et al.
DistillSpec: Improving Speculative Decoding via Knowledge Distillation
Yongchao Zhou, Kaifeng Lyu, Ankit Singh Rawat et al.
Text-IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image Fusion
Xunpeng Yi, Han Xu, Hao Zhang et al.
EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
Tai Wang, Xiaohan Mao, Chenming Zhu et al.
NeuRAD: Neural Rendering for Autonomous Driving
Adam Tonderski, Carl Lindström, Georg Hess et al.
Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation
Jaemin Cho, Yushi Hu, Jason Baldridge et al.
GSVA: Generalized Segmentation via Multimodal Large Language Models
Zhuofan Xia, Dongchen Han, Yizeng Han et al.
Relightable Gaussian Codec Avatars
Shunsuke Saito, Gabriel Schwartz, Tomas Simon et al.
ST-LLM: Large Language Models Are Effective Temporal Learners
Ruyang Liu, Chen Li, Haoran Tang et al.
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data
Qifan Yu, Juncheng Li, Longhui Wei et al.
Manifold Preserving Guided Diffusion
Yutong He, Naoki Murata, Chieh-Hsin Lai et al.
SAM-6D: Segment Anything Model Meets Zero-Shot 6D Object Pose Estimation
Jiehong Lin, Lihua Liu, Dekun Lu et al.
Behavior Generation with Latent Actions
Seungjae Lee, Yibin Wang, Haritheja Etukuru et al.
Large Language Model Cascades with Mixture of Thought Representations for Cost-Efficient Reasoning
Murong Yue, Jie Zhao, Min Zhang et al.
Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing
Bingyan Liu, Chengyu Wang, Tingfeng Cao et al.
Generalized Predictive Model for Autonomous Driving
Jiazhi Yang, Shenyuan Gao, Yihang Qiu et al.
QuRating: Selecting High-Quality Data for Training Language Models
Alexander Wettig, Aatmik Gupta, Saumya Malik et al.
The Illusion of State in State-Space Models
William Merrill, Jackson Petty, Ashish Sabharwal
AM-RADIO: Agglomerative Vision Foundation Model Reduce All Domains Into One
Mike Ranzinger, Greg Heinrich, Jan Kautz et al.
Dolphins: Multimodal Language Model for Driving
Yingzi Ma, Yulong Cao, Jiachen Sun et al.
Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection
Chengjie Wang, Wenbing Zhu, Bin-Bin Gao et al.
TSLANet: Rethinking Transformers for Time Series Representation Learning
Emadeldeen Eldele, Mohamed Ragab, Zhenghua Chen et al.
SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
Feng Wang, Jieru Mei, Alan Yuille
Cache Me if You Can: Accelerating Diffusion Models through Block Caching
Felix Wimbauer, Bichen Wu, Edgar Schoenfeld et al.
What's In My Big Data?
Yanai Elazar, Akshita Bhagia, Ian Magnusson et al.
Adapting Large Language Models via Reading Comprehension
Daixuan Cheng, Shaohan Huang, Furu Wei
InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning
Yan-Shuo Liang, Wu-Jun Li
Rewards-in-Context: Multi-objective Alignment of Foundation Models with Dynamic Preference Adjustment
Rui Yang, Xiaoman Pan, Feng Luo et al.
FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering
Zhenyu Li, Sunqi Fan, Yu Gu et al.
SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
Yuxuan Zhang, Yiren Song, Jiaming Liu et al.
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
Rohit Gandikota, Joanna Materzynska, Tingrui Zhou et al.
AnyText: Multilingual Visual Text Generation and Editing
Yuxiang Tuo, Wangmeng Xiang, Jun-Yan He et al.
Large Language Models to Enhance Bayesian Optimization
Tennison Liu, Nicolás Astorga, Nabeel Seedat et al.
WonderJourney: Going from Anywhere to Everywhere
Hong-Xing Yu, Haoyi Duan, Junhwa Hur et al.
An LLM Compiler for Parallel Function Calling
Sehoon Kim, Suhong Moon, Ryan Tabrizi et al.
VideoBooth: Diffusion-based Video Generation with Image Prompts
Yuming Jiang, Tianxing Wu, Shuai Yang et al.
Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models
Yongshuo Zong, Ondrej Bohdal, Tingyang Yu et al.
SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction
Yuanhui Huang, Wenzhao Zheng, Borui Zhang et al.
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
Nikhil Sardana, Jacob Portes, Alexandre (Sasha) Doubov et al.
OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning
Lingyi Hong, Shilin Yan, Renrui Zhang et al.
Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models
Yin Fang, Xiaozhuan Liang, Ningyu Zhang et al.
Zoology: Measuring and Improving Recall in Efficient Language Models
Simran Arora, Sabri Eyuboglu, Aman Timalsina et al.
DiffiT: Diffusion Vision Transformers for Image Generation
Ali Hatamizadeh, Jiaming Song, Guilin Liu et al.
Decomposed Diffusion Sampler for Accelerating Large-Scale Inverse Problems
Hyungjin Chung, Suhyeon Lee, Jong Chul Ye
Scaling Laws of RoPE-based Extrapolation
Xiaoran Liu, Hang Yan, Chenxin An et al.
GenSim: Generating Robotic Simulation Tasks via Large Language Models
Lirui Wang, Yiyang Ling, Zhecheng Yuan et al.
MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model
Wenxun Dai, Ling-Hao Chen, Jingbo Wang et al.
Drag Anything: Motion Control for Anything using Entity Representation
Weijia Wu, Zhuang Li, Yuchao Gu et al.
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue
Xing Han Lù, Zdeněk Kasner, Siva Reddy
Text-to-3D with Classifier Score Distillation
Xin Yu, Yuan-Chen Guo, Yangguang Li et al.
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model
Chaoya Jiang, Haiyang Xu, Mengfan Dong et al.
Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents
Yuxi Wei, Zi Wang, Yifan Lu et al.
Teaching Arithmetic to Small Transformers
Nayoung Lee, Kartik Sreenivasan, Jason Lee et al.
Token-level Direct Preference Optimization
Yongcheng Zeng, Guoqing Liu, Weiyu Ma et al.
Navigating Text-To-Image Customization: From LyCORIS Fine-Tuning to Model Evaluation
Shih-Ying Yeh, Yu-Guan Hsieh, Zhidong Gao et al.
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
Zekun Qi, Runpei Dong, Shaochen Zhang et al.
Scaling Laws for Fine-Grained Mixture of Experts
Jan Ludziejewski, Jakub Krajewski, Kamil Adamczewski et al.
LGMRec: Local and Global Graph Learning for Multimodal Recommendation
Zhiqiang Guo, Jianjun Li, Guohui Li et al.
ProAgent: Building Proactive Cooperative Agents with Large Language Models
Ceyao Zhang, Kaijie Yang, Siyi Hu et al.
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing
Hao Ouyang, Qiuyu Wang, Yuxi Xiao et al.
LUT-GEMM: Quantized Matrix Multiplication based on LUTs for Efficient Inference in Large-Scale Generative Language Models
Gunho Park, Baeseong Park, Minsub Kim et al.
RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback
Yufei Wang, Zhanyi Sun, Jesse Zhang et al.
EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection
Xuanyu Zhang, Runyi Li, Jiwen Yu et al.
One-dimensional Adapter to Rule Them All: Concepts, Diffusion Models and Erasing Applications
Mengyao Lyu, Yuhong Yang, Haiwen Hong et al.
RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory Sketches
Jiayuan Gu, Sean Kirmani, Paul Wohlhart et al.
Modeling Dense Multimodal Interactions Between Biological Pathways and Histology for Survival Prediction
Guillaume Jaume, Anurag Vaidya, Richard J. Chen et al.
Controlled Decoding from Language Models
Sidharth Mudgal, Jong Lee, Harish Ganapathy et al.
Advancing Pose-Guided Image Synthesis with Progressive Conditional Diffusion Models
Fei Shen, Hu Ye, Jun Zhang et al.
Towards Learning a Generalist Model for Embodied Navigation
Duo Zheng, Shijia Huang, Lin Zhao et al.
Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers
Jinxia Xie, Bineng Zhong, Zhiyi Mo et al.
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
Weiyun Wang, Min Shi, Qingyun Li et al.
The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
Pratyusha Sharma, Jordan Ash, Dipendra Kumar Misra
InstructIR: High-Quality Image Restoration Following Human Instructions
Marcos Conde, Gregor Geigle, Radu Timofte
PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation
Yuqi Wang, Yuntao Chen, Xingyu Liao et al.
SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow
Yihan Wang, Lahav Lipson, Jia Deng
IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection
Mingjin Zhang, Yuchun Wang, Jie Guo et al.
Implicit Style-Content Separation using B-LoRA
Yarden Frenkel, Yael Vinker, Ariel Shamir et al.
Cameras as Rays: Pose Estimation via Ray Diffusion
Jason Zhang, Amy Lin, Moneish Kumar et al.
Efficient Test-Time Adaptation of Vision-Language Models
Adilbek Karmanov, Dayan Guan, Shijian Lu et al.
VideoLLM-online: Online Video Large Language Model for Streaming Video
Joya Chen, Zhaoyang Lv, Shiwei Wu et al.
Human Gaussian Splatting: Real-time Rendering of Animatable Avatars
Arthur Moreau, Jifei Song, Helisa Dhamo et al.
Expandable Subspace Ensemble for Pre-Trained Model-Based Class-Incremental Learning
Da-Wei Zhou, Hai-Long Sun, Han-Jia Ye et al.
Challenges in Training PINNs: A Loss Landscape Perspective
Pratik Rathore, Weimu Lei, Zachary Frangella et al.
Retrieval meets Long Context Large Language Models
Peng Xu, Wei Ping, Xianchao Wu et al.
CompGS: Smaller and Faster Gaussian Splatting with Vector Quantization
K L Navaneet, Kossar Pourahmadi, Soroush Abbasi Koohpayegani et al.
Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving
Ming Nie, Renyuan Peng, Chunwei Wang et al.
Gaussian Shading: Provable Performance-Lossless Image Watermarking for Diffusion Models
Zijin Yang, Kai Zeng, Kejiang Chen et al.
MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
Dewei Zhou, You Li, Fan Ma et al.
LLMCarbon: Modeling the End-to-End Carbon Footprint of Large Language Models
Ahmad Faiz, Sotaro Kaneda, Ruhan Wang et al.
Universal Jailbreak Backdoors from Poisoned Human Feedback
Javier Rando, Florian Tramer
DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting
Angelos Kratimenos, Jiahui Lei, Kostas Daniilidis
Understanding Catastrophic Forgetting in Language Models via Implicit Inference
Suhas Kotha, Jacob Springer, Aditi Raghunathan
Incomplete Contrastive Multi-View Clustering with High-Confidence Guiding
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Hao Zhang, Hongyang Li, Feng Li et al.
Motion Mamba: Efficient and Long Sequence Motion Generation
Zeyu Zhang, Akide Liu, Ian Reid et al.
SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation
Wenxi Yue, Jing Zhang, Kun Hu et al.
Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching
Xianqi Wang, Gangwei Xu, Hao Jia et al.
PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection
Xiaofan Li, Zhizhong Zhang, Xin Tan et al.
CG-HOI: Contact-Guided 3D Human-Object Interaction Generation
Christian Diller, Angela Dai
Towards Lossless Dataset Distillation via Difficulty-Aligned Trajectory Matching
Ziyao Guo, Kai Wang, George Cazenavette et al.
Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models
Xianfang Zeng, Xin Chen, Zhongqi Qi et al.
FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization
Jiahui Zhang, Fangneng Zhan, Muyu Xu et al.
OneFormer3D: One Transformer for Unified Point Cloud Segmentation
Maksim Kolodiazhnyi, Anna Vorontsova, Anton Konushin et al.
Can I Trust Your Answer? Visually Grounded Video Question Answering
Junbin Xiao, Angela Yao, Yicong Li et al.
NEFTune: Noisy Embeddings Improve Instruction Finetuning
Neel Jain, Ping-yeh Chiang, Yuxin Wen et al.
Prodigy: An Expeditiously Adaptive Parameter-Free Learner
Konstantin Mishchenko, Aaron Defazio
RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection
Zhiwei Lin, Zhe Liu, Zhongyu Xia et al.
MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion
Di Chang, Yichun Shi, Quankai Gao et al.
Relightable 3D Gaussians: Realistic Point Cloud Relighting with BRDF Decomposition and Ray Tracing
Jian Gao, Chun Gu, Youtian Lin et al.
FasterViT: Fast Vision Transformers with Hierarchical Attention
Ali Hatamizadeh, Greg Heinrich, Hongxu Yin et al.
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations
Fengyu Yang, Chao Feng, Ziyang Chen et al.
OmniACT: A Dataset and Benchmark for Enabling Multimodal Generalist Autonomous Agents for Desktop and Web
Raghav Kapoor, Yash Parag Butala, Melisa A Russak et al.
Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-Based Retrofitting
Xinyan Guan, Yanjiang Liu, Hongyu Lin et al.
ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
Guanxing Lu, Shiyi Zhang, Ziwei Wang et al.
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Yazhou Xing, Yingqing He, Zeyue Tian et al.
LEGO-Prover: Neural Theorem Proving with Growing Libraries
Haiming Wang, Huajian Xin, Chuanyang Zheng et al.
ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models
Yingqing He, Shaoshu Yang, Haoxin Chen et al.
Upscale-A-Video: Temporal-Consistent Diffusion Model for Real-World Video Super-Resolution
Shangchen Zhou, Peiqing Yang, Jianyi Wang et al.
HumanTOMATO: Text-aligned Whole-body Motion Generation
Shunlin Lu, Ling-Hao Chen, Ailing Zeng et al.
SmartPlay: A Benchmark for LLMs as Intelligent Agents
Yue Wu, Xuan Tang, Tom Mitchell et al.
Text2Reward: Reward Shaping with Language Models for Reinforcement Learning
Tianbao Xie, Siheng Zhao, Chen Henry Wu et al.
Gaussian in the wild: 3D Gaussian Splatting for Unconstrained Image Collections
Dongbin Zhang, Chuming Wang, Weitao Wang et al.
Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement
Linlu Qiu, Liwei Jiang, Ximing Lu et al.
Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control
Longtao Zheng, Rundong Wang, Xinrun Wang et al.
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
Size Wu, Wenwei Zhang, Lumin Xu et al.
Strategic Preys Make Acute Predators: Enhancing Camouflaged Object Detectors by Generating Camouflaged Objects
Chunming He, Kai Li, Yachao Zhang et al.
ODIN: Disentangled Reward Mitigates Hacking in RLHF
Lichang Chen, Chen Zhu, Jiuhai Chen et al.
A Tale of Tails: Model Collapse as a Change of Scaling Laws
Elvis Dohmatob, Yunzhen Feng, Pu Yang et al.
Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation
Xiao Ma, Sumit Patidar, Iain Haughton et al.
AI Control: Improving Safety Despite Intentional Subversion
Ryan Greenblatt, Buck Shlegeris, Kshitij Sachan et al.
Unpaired Image-to-Image Translation via Neural Schrödinger Bridge
Beomsu Kim, Gihyun Kwon, Kwanyoung Kim et al.
Zero-Reference Low-Light Enhancement via Physical Quadruple Priors
Wenjing Wang, Huan Yang, Jianlong Fu et al.
#InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models
Keming Lu, Hongyi Yuan, Zheng Yuan et al.
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
Chuwei Luo, Yufan Shen, Zhaoqing Zhu et al.
LITA: Language Instructed Temporal-Localization Assistant
De-An Huang, Shijia Liao, Subhashree Radhakrishnan et al.
A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models
Haoran Xu, Young Jin Kim, Amr Mohamed Nabil Aly Aly Sharaf et al.
Eyes Closed, Safety On: Protecting Multimodal LLMs via Image-to-Text Transformation
Yunhao Gou, Kai Chen, Zhili Liu et al.
OMG-Seg: Is One Model Good Enough For All Segmentation?
Xiangtai Li, Haobo Yuan, Wei Li et al.
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
Weiran Yao, Shelby Heinecke, Juan Carlos Niebles et al.
ReNoise: Real Image Inversion Through Iterative Noising
Daniel Garibi, Or Patashnik, Andrey Voynov et al.
VELMA: Verbalization Embodiment of LLM Agents for Vision and Language Navigation in Street View
Raphael Schumann, Wanrong Zhu, Weixi Feng et al.
Exploring Large Language Model for Graph Data Understanding in Online Job Recommendations
Likang Wu, Zhaopeng Qiu, Zhi Zheng et al.
Scaling Laws of Synthetic Images for Model Training ... for Now
Lijie Fan, Kaifeng Chen, Dilip Krishnan et al.
GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence
Van Nguyen Nguyen, Thibault Groueix, Mathieu Salzmann et al.
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation
Shaowei Liu, Zhongzheng Ren, Saurabh Gupta et al.
Revisiting Zeroth-Order Optimization for Memory-Efficient LLM Fine-Tuning: A Benchmark
Yihua Zhang, Pingzhi Li, Junyuan Hong et al.
Scaling Up Dynamic Human-Scene Interaction Modeling
Nan Jiang, Zhiyuan Zhang, Hongjie Li et al.
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
Chuofan Ma, Yi Jiang, Jiannan Wu et al.
Fine-Tuned Language Models Generate Stable Inorganic Materials as Text
Nate Gruver, Anuroop Sriram, Andrea Madotto et al.
VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning
Ziyang Luo, Nian Liu, Wangbo Zhao et al.
Can Mamba Learn How To Learn? A Comparative Study on In-Context Learning Tasks
Jong Ho Park, Jaden Park, Zheyang Xiong et al.
Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs
Angelica Chen, Ravid Shwartz-Ziv, Kyunghyun Cho et al.
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
Jesse Farebrother, Jordi Orbay, Quan Vuong et al.
RoboDreamer: Learning Compositional World Models for Robot Imagination
Siyuan Zhou, Yilun Du, Jiaben Chen et al.
SimDA: Simple Diffusion Adapter for Efficient Video Generation
Zhen Xing, Qi Dai, Han Hu et al.
Tensor Trust: Interpretable Prompt Injection Attacks from an Online Game
Sam Toyer, Olivia Watkins, Ethan Mendes et al.
Consistent4D: Consistent 360° Dynamic Object Generation from Monocular Video
Yanqin Jiang, Li Zhang, Jin Gao et al.
latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction
Christopher Wewer, Kevin Raj, Eddy Ilg et al.
On Prompt-Driven Safeguarding for Large Language Models
Chujie Zheng, Fan Yin, Hao Zhou et al.