Most Cited 2024 Poster "feature discriminability" Papers
12,324 papers found • Page 1 of 62
Conference
Improved Baselines with Visual Instruction Tuning
Haotian Liu, Chunyuan Li, Yuheng Li et al.
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
Dustin Podell, Zion English, Kyle Lacey et al.
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Shilong Liu, Zhaoyang Zeng, Tianhe Ren et al.
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Chien-Yao Wang, I-Hau Yeh, Hong-Yuan Mark Liao
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Patrick Esser, Sumith Kulal, Andreas Blattmann et al.
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
Deyao Zhu, jun chen, Xiaoqian Shen et al.
DETRs Beat YOLOs on Real-time Object Detection
Yian Zhao, Wenyu Lv, Shangliang Xu et al.
Let's Verify Step by Step
Hunter Lightman, Vineet Kosaraju, Yuri Burda et al.
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen, Jiannan Wu, Wenhai Wang et al.
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Tri Dao
MMBENCH: Is Your Multi-Modal Model an All-around Player?
Yuan Liu, Haodong Duan, Yuanhan Zhang et al.
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Xiang Yue, Yuansheng Ni, Kai Zhang et al.
SWE-bench: Can Language Models Resolve Real-world Github Issues?
Carlos E Jimenez, John Yang, Alexander Wettig et al.
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Lihe Yang, Bingyi Kang, Zilong Huang et al.
T2I-Adapter: Learning Adapters to Dig Out More Controllable Ability for Text-to-Image Diffusion
Chong Mou, Xintao Wang, Liangbin Xie et al.
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Lianghui Zhu, Bencheng Liao, Qian Zhang et al.
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Akari Asai, Zeqiu Wu, Yizhong Wang et al.
Efficient Streaming Language Models with Attention Sinks
Guangxuan Xiao, Yuandong Tian, Beidi Chen et al.
MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework
Sirui Hong, Mingchen Zhuge, Jonathan Chen et al.
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Xin Li, Jing Yu Koh, Alexander Ku et al.
iTransformer: Inverted Transformers Are Effective for Time Series Forecasting
Yong Liu, Tengge Hu, Haoran Zhang et al.
AnimateDiff: Animate Your Personalized Text-to-Image Diffusion Models without Specific Tuning
Yuwei GUO, Ceyuan Yang, Anyi Rao et al.
Improving Factuality and Reasoning in Language Models through Multiagent Debate
Yilun Du, Shuang Li, Antonio Torralba et al.
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
Pan Lu, Hritik Bansal, Tony Xia et al.
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
Yujia Qin, Shihao Liang, Yining Ye et al.
WizardLM: Empowering Large Pre-Trained Language Models to Follow Complex Instructions
Can Xu, Qingfeng Sun, Kai Zheng et al.
Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
Tri Dao, Albert Gu
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
Maciej Besta, Nils Blach, Ales Kubicek et al.
4D Gaussian Splatting for Real-Time Dynamic Scene Rendering
Guanjun Wu, Taoran Yi, Jiemin Fang et al.
VBench: Comprehensive Benchmark Suite for Video Generative Models
Ziqi Huang, Yinan He, Jiashuo Yu et al.
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
Weihao Yu, Zhengyuan Yang, Linjie Li et al.
Grounding Multimodal Large Language Models to the World
Zhiliang Peng, Wenhui Wang, Li Dong et al.
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference
Wei-Lin Chiang, Lianmin Zheng, Ying Sheng et al.
DUSt3R: Geometric 3D Vision Made Easy
Shuzhe Wang, Vincent Leroy, Yohann Cabon et al.
A Generalist Agent
Jackie Kay, Sergio Gómez Colmenarejo, Mahyar Bordbar et al.
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions
Lin Chen, Jinsong Li, Xiaoyi Dong et al.
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!
Xiangyu Qi, Yi Zeng, Tinghao Xie et al.
Teaching Large Language Models to Self-Debug
Xinyun Chen, Maxwell Lin, Nathanael Schaerli et al.
WebArena: A Realistic Web Environment for Building Autonomous Agents
Shuyan Zhou, Frank F Xu, Hao Zhu et al.
MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
Kunchang Li, Yali Wang, Yinan He et al.
DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation
Jiaxiang Tang, Jiawei Ren, Hang Zhou et al.
WizardCoder: Empowering Code Large Language Models with Evol-Instruct
Ziyang Luo, Can Xu, Pu Zhao et al.
MVDream: Multi-view Diffusion for 3D Generation
Yichun Shi, Peng Wang, Jianglong Ye et al.
Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff et al.
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
Mantas Mazeika, Long Phan, Xuwang Yin et al.
ChatEval: Towards Better LLM-based Evaluators through Multi-Agent Debate
Chi-Min Chan, Weize Chen, Yusheng Su et al.
Time-LLM: Time Series Forecasting by Reprogramming Large Language Models
Ming Jin, Shiyu Wang, Lintao Ma et al.
LISA: Reasoning Segmentation via Large Language Model
Xin Lai, Zhuotao Tian, Yukang Chen et al.
Large Language Models Cannot Self-Correct Reasoning Yet
Jie Huang, Xinyun Chen, Swaroop Mishra et al.
NExT-GPT: Any-to-Any Multimodal LLM
Shengqiong Wu, Hao Fei, Leigang Qu et al.
Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs
Miao Xiong, Zhiyuan Hu, Xinyang Lu et al.
LRM: Large Reconstruction Model for Single Image to 3D
Yicong Hong, Kai Zhang, Jiuxiang Gu et al.
Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction
Ziyi Yang, Xinyu Gao, Wen Zhou et al.
DoRA: Weight-Decomposed Low-Rank Adaptation
Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin et al.
VILA: On Pre-training for Visual Language Models
Ji Lin, Danny Yin, Wei Ping et al.
A Simple and Effective Pruning Approach for Large Language Models
Mingjie Sun, Zhuang Liu, Anna Bair et al.
Training Diffusion Models with Reinforcement Learning
Kevin Black, Michael Janner, Yilun Du et al.
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
Li Hu
Large Language Models as Optimizers
Chengrun Yang, Xuezhi Wang, Yifeng Lu et al.
YOLO-World: Real-Time Open-Vocabulary Object Detection
Tianheng Cheng, Lin Song, Yixiao Ge et al.
Wonder3D: Single Image to 3D using Cross-Domain Diffusion
Xiaoxiao Long, Yuan-Chen Guo, Cheng Lin et al.
SuGaR: Surface-Aligned Gaussian Splatting for Efficient 3D Mesh Reconstruction and High-Quality Mesh Rendering
Antoine Guédon, Vincent Lepetit
Vision Transformers Need Registers
Timothée Darcet, Maxime Oquab, Julien Mairal et al.
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen et al.
CogAgent: A Visual Language Model for GUI Agents
Wenyi Hong, Weihan Wang, Qingsong Lv et al.
SyncDreamer: Generating Multiview-consistent Images from a Single-view Image
Yuan Liu, Cheng Lin, Zijiao Zeng et al.
Adversarial Diffusion Distillation
Axel Sauer, Dominik Lorenz, Andreas Blattmann et al.
Mip-Splatting: Alias-free 3D Gaussian Splatting
Zehao Yu, Anpei Chen, Binbin Huang et al.
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
Zhibin Gou, Zhihong Shao, Yeyun Gong et al.
Scaffold-GS: Structured 3D Gaussians for View-Adaptive Rendering
Tao Lu, Mulin Yu, Linning Xu et al.
MusicRL: Aligning Music Generation to Human Preferences
Geoffrey Cideron, Sertan Girgin, Mauro Verzetti et al.
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye, Haiyang Xu, Jiabo Ye et al.
AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models
Xiaogeng Liu, Nan Xu, Muhao Chen et al.
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
Shengbang Tong, Zhuang Liu, Yuexiang Zhai et al.
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting
Melanie Sclar, Yejin Choi, Yulia Tsvetkov et al.
One-step Diffusion with Distribution Matching Distillation
Tianwei Yin, Michaël Gharbi, Richard Zhang et al.
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
Boyuan Chen, Zhuo Xu, Sean Kirmani et al.
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Longhui Yu, Weisen JIANG, Han Shi et al.
Safe RLHF: Safe Reinforcement Learning from Human Feedback
Juntao Dai, Xuehai Pan, Ruiyang Sun et al.
Diffusion Model Alignment Using Direct Preference Optimization
Bram Wallace, Meihua Dang, Rafael Rafailov et al.
MambaIR: A Simple Baseline for Image Restoration with State-Space Model
Hang Guo, Jinmin Li, Tao Dai et al.
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Tianle Cai, Yuhong Li, Zhengyang Geng et al.
Language Model Beats Diffusion - Tokenizer is key to visual generation
Lijun Yu, José Lezama, Nitesh Bharadwaj Gundavarapu et al.
AgentBench: Evaluating LLMs as Agents
Xiao Liu, Hao Yu, Hanchen Zhang et al.
Grounding Image Matching in 3D with MASt3R
Vincent Leroy, Yohann Cabon, Jerome Revaud
GAIA: a benchmark for General AI Assistants
Grégoire Mialon, Clémentine Fourrier, Thomas Wolf et al.
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch
Le Yu, Bowen Yu, Haiyang Yu et al.
RLAIF vs. RLHF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
Harrison Lee, Samrat Phatale, Hassan Mansoor et al.
Towards Understanding Sycophancy in Language Models
Mrinank Sharma, Meg Tong, Tomek Korbak et al.
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning
Xiang Yue, Xingwei Qu, Ge Zhang et al.
pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction
David Charatan, Sizhe Lester Li, Andrea Tagliasacchi et al.
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
Haoxin Chen, Yong Zhang, Xiaodong Cun et al.
AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
Weize Chen, Yusheng Su, Jingwei Zuo et al.
LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models
Yanwei Li, Chengyao Wang, Jiaya Jia
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Renrui Zhang, Dongzhi Jiang, Yichi Zhang et al.
SplaTAM: Splat Track & Map 3D Gaussians for Dense RGB-D SLAM
Nikhil Keetha, Jay Karhade, Krishna Murthy Jatavallabhula et al.
Self-Rewarding Language Models
Weizhe Yuan, Richard Yuanzhe Pang, Kyunghyun Cho et al.
A decoder-only foundation model for time-series forecasting
Abhimanyu Das, Weihao Kong, Rajat Sen et al.
Patches Are All You Need?
Asher Trockman, J Kolter
Eureka: Human-Level Reward Design via Coding Large Language Models
Yecheng Jason Ma, William Liang, Guanzhi Wang et al.
Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding
Sicong Leng, Hang Zhang, Guanzheng Chen et al.
RepViT: Revisiting Mobile CNN From ViT Perspective
Ao Wang, Hui Chen, Zijia Lin et al.
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Zixiang Chen, Yihe Deng, Huizhuo Yuan et al.
Benchmarking Large Language Models in Retrieval-Augmented Generation
Jiawei Chen, Hongyu Lin, Xianpei Han et al.
MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
Enxin Song, Wenhao Chai, Guanhong Wang et al.
CoTracker: It is Better to Track Together
Nikita Karaev, Ignacio Rocco, Ben Graham et al.
Gaussian Splatting SLAM
Hidenobu Matsuki, Riku Murai, Paul Kelly et al.
SALMONN: Towards Generic Hearing Abilities for Large Language Models
Changli Tang, Wenyi Yu, Guangzhi Sun et al.
Real-time Photorealistic Dynamic Scene Representation and Rendering with 4D Gaussian Splatting
Zeyu Yang, Hongye Yang, Zijie Pan et al.
Ferret: Refer and Ground Anything Anywhere at Any Granularity
Haoxuan You, Haotian Zhang, Zhe Gan et al.
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
Nanye Ma, Mark Goldstein, Michael Albergo et al.
YaRN: Efficient Context Window Extension of Large Language Models
Bowen Peng, Jeffrey Quesnelle, Honglu Fan et al.
Generative Multimodal Models are In-Context Learners
Quan Sun, Yufeng Cui, Xiaosong Zhang et al.
TimeMixer: Decomposable Multiscale Mixing for Time Series Forecasting
Shiyu Wang, Haixu Wu, Xiaoming Shi et al.
FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects
Bowen Wen, Wei Yang, Jan Kautz et al.
MobileNetV4: Universal Models for the Mobile Ecosystem
Danfeng Qin, Chas Leichner, Manolis Delakis et al.
Catastrophic Jailbreak of Open-source LLMs via Exploiting Generation
Yangsibo Huang, Samyak Gupta, Mengzhou Xia et al.
Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors
Guocheng Qian, Jinjie Mai, Abdullah Hamdi et al.
Unified Training of Universal Time Series Forecasting Transformers
Gerald Woo, Chenghao Liu, Akshat Kumar et al.
Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Mengzhou Xia, Tianyu Gao, Zhiyuan Zeng et al.
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
Jinbo Xing, Menghan Xia, Yong Zhang et al.
GPT-4V(ision) is a Generalist Web Agent, if Grounded
Boyuan Zheng, Boyu Gou, Jihyung Kil et al.
Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
Fuxiao Liu, Kevin Lin, Linjie Li et al.
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Dan Kondratyuk, Lijun Yu, Xiuye Gu et al.
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
Yi Wang, Yinan He, Yizhuo Li et al.
Reasoning on Graphs: Faithful and Interpretable Large Language Model Reasoning
Linhao Luo, Yuan-Fang Li, Reza Haffari et al.
AnyDoor: Zero-shot Object-level Image Customization
Xi Chen, Lianghua Huang, Yu Liu et al.
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation
Haoran Xu, Amr Sharaf, Yunmo Chen et al.
WildChat: 1M ChatGPT Interaction Logs in the Wild
Wenting Zhao, Xiang Ren, Jack Hessel et al.
GLaMM: Pixel Grounding Large Multimodal Model
Hanoona Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly et al.
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Bin Xiao, Haiping Wu, Weijian Xu et al.
VideoMamba: State Space Model for Efficient Video Understanding
Kunchang Li, Xinhao Li, Yi Wang et al.
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
Collin Burns, Pavel Izmailov, Jan Kirchner et al.
GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher
Youliang Yuan, Wenxiang Jiao, Wenxuan Wang et al.
Llemma: An Open Language Model for Mathematics
Zhangir Azerbayev, Hailey Schoelkopf, Keiran Paster et al.
LESS: Selecting Influential Data for Targeted Instruction Tuning
Mengzhou Xia, Sadhika Malladi, Suchin Gururangan et al.
Universal Guidance for Diffusion Models
Arpit Bansal, Hong-Min Chu, Avi Schwarzschild et al.
Genie: Generative Interactive Environments
Jake Bruce, Michael Dennis, Ashley Edwards et al.
Prometheus: Inducing Fine-Grained Evaluation Capability in Language Models
Seungone Kim, Jamin Shin, yejin cho et al.
Q-Align: Teaching LMMs for Visual Scoring via Discrete Text-Defined Levels
Haoning Wu, Zicheng Zhang, Weixia Zhang et al.
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
Tianrui Guan, Fuxiao Liu, Xiyang Wu et al.
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models
Zhang Li, Biao Yang, Qiang Liu et al.
Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
Suyu Ge, Yunan Zhang, Liyuan Liu et al.
TokenFlow: Consistent Diffusion Features for Consistent Video Editing
Michal Geyer, Omer Bar Tal, Shai Bagon et al.
OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation
Qidong Huang, Xiaoyi Dong, Pan Zhang et al.
Large Language Models Are Not Robust Multiple Choice Selectors
Chujie Zheng, Hao Zhou, Fandong Meng et al.
Instant3D: Fast Text-to-3D with Sparse-view Generation and Large Reconstruction Model
Jiahao Li, Hao Tan, Kai Zhang et al.
Finite Scalar Quantization: VQ-VAE Made Simple
Fabian Mentzer, David Minnen, Eirikur Agustsson et al.
How Language Model Hallucinations Can Snowball
Muru Zhang, Ofir Press, William Merrill et al.
InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning
Jing Shi, Wei Xiong, Zhe Lin et al.
ExpeL: LLM Agents Are Experiential Learners
Andrew Zhao, Daniel Huang, Quentin Xu et al.
GS-SLAM: Dense Visual SLAM with 3D Gaussian Splatting
Chi Yan, Delin Qu, Dong Wang et al.
DriveLM: Driving with Graph Visual Question Answering
Chonghao Sima, Katrin Renz, Kashyap Chitta et al.
MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
Yuedong Chen, Haofei Xu, Chuanxia Zheng et al.
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
Shuhuai Ren, Linli Yao, Shicheng Li et al.
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Jiawei Zhao, Zhenyu Zhang, Beidi Chen et al.
LangSplat: 3D Language Gaussian Splatting
Minghan Qin, Wanhua Li, Jiawei ZHOU et al.
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
Zirui Liu, Jiayi Yuan, Hongye Jin et al.
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Liang Chen, Haozhe Zhao, Tianyu Liu et al.
Compact 3D Gaussian Representation for Radiance Field
Joo Chan Lee, Daniel Rho, Xiangyu Sun et al.
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Peng Jin, Ryuichi Takanobu, Cai Zhang et al.
Promptbreeder: Self-Referential Self-Improvement via Prompt Evolution
Chrisantha Fernando, Dylan Banarse, Henryk Michalewski et al.
The Linear Representation Hypothesis and the Geometry of Large Language Models
Kiho Park, Yo Joong Choe, Victor Veitch
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback
Tianyu Yu, Yuan Yao, Haoye Zhang et al.
Evaluating Text-to-Visual Generation with Image-to-Text Generation
Zhiqiu Lin, Deepak Pathak, Baiqi Li et al.
LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
Bin Zhu, Bin Lin, Munan Ning et al.
Effective Data Augmentation With Diffusion Models
Brandon Trabucco, Kyle Doherty, Max Gurinas et al.
DrivingGaussian: Composite Gaussian Splatting for Surrounding Dynamic Autonomous Driving Scenes
Xiaoyu Zhou, Zhiwei Lin, Xiaojun Shan et al.
Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution
Aaron Lou, Chenlin Meng, Stefano Ermon
MOMENT: A Family of Open Time-series Foundation Models
Mononito Goswami, Konrad Szafer, Arjun Choudhry et al.
Analyzing and Improving the Training Dynamics of Diffusion Models
Tero Karras, Miika Aittala, Jaakko Lehtinen et al.
Rewrite the Stars
Xu Ma, Xiyang Dai, Yue Bai et al.
LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng et al.
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace et al.
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression
Tim Dettmers, Ruslan Svirschevski, Vage Egiazarian et al.
Learning Interactive Real-World Simulators
Sherry Yang, Yilun Du, Seyed Ghasemipour et al.
Wavelet Convolutions for Large Receptive Fields
Shahaf Finder, Roy Amoyal, Eran Treister et al.
V?: Guided Visual Search as a Core Mechanism in Multimodal LLMs
Penghao Wu, Saining Xie
Executable Code Actions Elicit Better LLM Agents
Xingyao Wang, Yangyi Chen, Lifan Yuan et al.
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
Kai Shen, Zeqian Ju, Xu Tan et al.
Gaussian Grouping: Segment and Edit Anything in 3D Scenes
Mingqiao Ye, Martin Danelljan, Fisher Yu et al.
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
Kristen Grauman, Andrew Westbury, Lorenzo Torresani et al.
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
Wenqi Shao, Mengzhao Chen, Zhaoyang Zhang et al.
LoRA+: Efficient Low Rank Adaptation of Large Models
Soufiane Hayou, Nikhil Ghosh, Bin Yu
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Yuhui Li, Fangyun Wei, Chao Zhang et al.
Safety-Tuned LLaMAs: Lessons From Improving the Safety of Large Language Models that Follow Instructions
Federico Bianchi, Mirac Suzgun, Giuseppe Attanasio et al.
DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genomes
Zhihan Zhou, Yanrong Ji, Weijian Li et al.
Preference Ranking Optimization for Human Alignment
Feifan Song, Bowen Yu, Minghao Li et al.
Poly Kernel Inception Network for Remote Sensing Detection
Xinhao Cai, Qiuxia Lai, Yuwei Wang et al.
What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning
Wei Liu, Weihao Zeng, Keqing He et al.
PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization
Yidong Wang, Zhuohao Yu, Wenjin Yao et al.
ControlVideo: Training-free Controllable Text-to-video Generation
Yabo Zhang, Yuxiang Wei, Dongsheng jiang et al.
Feature 3DGS: Supercharging 3D Gaussian Splatting to Enable Distilled Feature Fields
Shijie Zhou, Haoran Chang, Sicheng Jiang et al.
BLINK: Multimodal Large Language Models Can See but Not Perceive
Xingyu Fu, Yushi Hu, Bangzheng Li et al.
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
Parth Sarthi, Salman Abdullah, Aditi Tuli et al.
Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion
Dongjun Kim, Chieh-Hsin Lai, WeiHsiang Liao et al.
Text-to-3D using Gaussian Splatting
Zilong Chen, Feng Wang, Yikai Wang et al.
The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning
Nathaniel Li, Alexander Pan, Anjali Gopal et al.
GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting
Yiwen Chen, Zilong Chen, Chi Zhang et al.
Improved Techniques for Training Consistency Models
Yang Song, Prafulla Dhariwal