Most Cited 2024 "equal opportunity constraints" Papers
Evolution of Heuristics: Towards Efficient Automatic Algorithm Design Using Large Language Model
Fei Liu, Tong Xialiang, Mingxuan Yuan et al.
AdaMerging: Adaptive Model Merging for Multi-Task Learning
Enneng Yang, Zhenyi Wang, Li Shen et al.
Nash Learning from Human Feedback
Remi Munos, Michal Valko, Daniele Calandriello et al.
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases
Zechun Liu, Changsheng Zhao, Forrest Iandola et al.
AlphaFold Meets Flow Matching for Generating Protein Ensembles
Bowen Jing, Bonnie Berger, Tommi Jaakkola
Nougat: Neural Optical Understanding for Academic Documents
Lukas Blecher, Guillem Cucurull Preixens, Thomas Scialom et al.
OmniControl: Control Any Joint at Any Time for Human Motion Generation
Yiming Xie, Varun Jampani, Lei Zhong et al.
OctoPack: Instruction Tuning Code Large Language Models
Niklas Muennighoff, Qian Liu, Armel Zebaze et al.
CAT-Seg: Cost Aggregation for Open-Vocabulary Semantic Segmentation
Seokju Cho, Heeseong Shin, Sunghwan Hong et al.
Solving Inverse Problems with Latent Diffusion Models via Hard Data Consistency
Bowen Song, Soo Min Kwon, Zecheng Zhang et al.
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Hubert Siuzdak
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
Wenbo Hu, Yifan Xu, Yi Li et al.
Holodeck: Language Guided Generation of 3D Embodied AI Environments
Yue Yang, Fan-Yun Sun, Luca Weihs et al.
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning
Haozhe Zhao, Zefan Cai, Shuzheng Si et al.
GS-IR: 3D Gaussian Splatting for Inverse Rendering
Zhihao Liang, Qi Zhang, Ying Feng et al.
Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models
Huaixiu Steven Zheng, Swaroop Mishra, Xinyun Chen et al.
Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters
Jiazuo Yu, Yunzhi Zhuge, Lu Zhang et al.
Gaussian-Flow: 4D Reconstruction with Dynamic 3D Gaussian Particle
Youtian Lin, Zuozhuo Dai, Siyu Zhu et al.
Making LLaMA SEE and Draw with SEED Tokenizer
Yuying Ge, Sijie Zhao, Ziyun Zeng et al.
HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression
Yihang Chen, Qianyi Wu, Weiyao Lin et al.
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Bo He, Hengduo Li, Young Kyun Jang et al.
Reward Model Ensembles Help Mitigate Overoptimization
Thomas Coste, Usman Anwar, Robert Kirk et al.
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Soroush Nasiriany, Fei Xia, Wenhao Yu et al.
Nearly $d$-Linear Convergence Bounds for Diffusion Models via Stochastic Localization
Joe Benton, Valentin De Bortoli, Arnaud Doucet et al.
RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation
Yufei Wang, Zhou Xian, Feng Chen et al.
ZigMa: A DiT-style Zigzag Mamba Diffusion Model
Tao Hu, Stefan Andreas Baumann, Ming Gui et al.
ODTrack: Online Dense Temporal Token Learning for Visual Tracking
Yaozong Zheng, Bineng Zhong, Qihua Liang et al.
PMET: Precise Model Editing in a Transformer
Xiaopeng Li, Shasha Li, Shezheng Song et al.
ReLoRA: High-Rank Training Through Low-Rank Updates
Vladislav Lialin, Sherin Muckatira, Namrata Shivagunde et al.
Data Engineering for Scaling Language Models to 128K Context
Yao Fu, Rameswar Panda, Xinyao Niu et al.
CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians
Yang Liu, Chuanchen Luo, Lue Fan et al.
Putting the Object Back into Video Object Segmentation
Ho Kei Cheng, Seoung Wug Oh, Brian Price et al.
On the Reliability of Watermarks for Large Language Models
John Kirchenbauer, Jonas Geiping, Yuxin Wen et al.
MSGNet: Learning Multi-Scale Inter-series Correlations for Multivariate Time Series Forecasting
Wanlin Cai, Yuxuan Liang, Xianggen Liu et al.
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
Fred Zhang, Neel Nanda
Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications
Boyi Wei, Kaixuan Huang, Yangsibo Huang et al.
RMT: Retentive Networks Meet Vision Transformers
Qihang Fan, Huaibo Huang, Mingrui Chen et al.
EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS
Sharath Girish, Kamal Gupta, Abhinav Shrivastava
Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs
Shashank Gupta, Vaishnavi Shrivastava, Ameet Deshpande et al.
Mini-Splatting: Representing Scenes with a Constrained Number of Gaussians
Guangchi Fang, Bing Wang
Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews
Weixin Liang, Zachary Izzo, Yaohui Zhang et al.
ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation
Xiaoqi Li, Mingxu Zhang, Yiran Geng et al.
SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models
Xiaoxuan Wang, Ziniu Hu, Pan Lu et al.
Transolver: A Fast Transformer Solver for PDEs on General Geometries
Haixu Wu, Huakun Luo, Haowen Wang et al.
InstanceDiffusion: Instance-level Control for Image Generation
XuDong Wang, Trevor Darrell, Sai Saketh Rambhatla et al.
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Fahim Tajwar, Anikait Singh, Archit Sharma et al.
Sapiens: Foundation for Human Vision Models
Rawal Khirodkar, Timur Bagautdinov, Julieta Martinez et al.
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning
Ning Miao, Yee Whye Teh, Tom Rainforth
StableVITON: Learning Semantic Correspondence with Latent Diffusion Model for Virtual Try-On
Jeongho Kim, Gyojung Gu, Minho Park et al.
Fundamental Limitations of Alignment in Large Language Models
Yotam Wolf, Noam Wies, Oshri Avnery et al.
tinyBenchmarks: evaluating LLMs with fewer examples
Felipe Maia Polo, Lucas Weber, Leshem Choshen et al.
Generalized Planning in PDDL Domains with Pretrained Large Language Models
Tom Silver, Soham Dan, Kavitha Srinivas et al.
Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding
Jin-Chuan Shi, Miao Wang, Haobin Duan et al.
To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now
Yimeng Zhang, Jinghan Jia, Xin Chen et al.
Fast Machine Unlearning without Retraining through Selective Synaptic Dampening
Jack Foster, Stefan Schoepf, Alexandra Brintrup
BioCLIP: A Vision Foundation Model for the Tree of Life
Samuel Stevens, Jiaman Wu, Matthew Thompson et al.
RealNet: A Feature Selection Network with Realistic Synthetic Anomaly for Anomaly Detection
Ximiao Zhang, Min Xu, Xiuzhuang Zhou
Uni3D: Exploring Unified 3D Representation at Scale
Junsheng Zhou, Jinsheng Wang, Baorui Ma et al.
Beyond Memorization: Violating Privacy via Inference with Large Language Models
Robin Staab, Mark Vero, Mislav Balunovic et al.
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Viraj Shah, Nataniel Ruiz, Forrester Cole et al.
LLaVA-UHD: an LMM Perceiving any Aspect Ratio and High-Resolution Images
Zonghao Guo, Ruyi Xu, Yuan Yao et al.
SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting
Zhijing Shao, Wang Zhaolong, Zhuang Li et al.
Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models
Huan Ling, Seung Wook Kim, Antonio Torralba et al.
Talk like a Graph: Encoding Graphs for Large Language Models
Bahare Fatemi, Jonathan Halcrow, Bryan Perozzi
DreamAvatar: Text-and-Shape Guided 3D Human Avatar Generation via Diffusion Models
Yukang Cao, Yan-Pei Cao, Kai Han et al.
4D-fy: Text-to-4D Generation Using Hybrid Score Distillation Sampling
Sherwin Bahmani, Ivan Skorokhodov, Victor Rong et al.
Ring-A-Bell! How Reliable are Concept Removal Methods For Diffusion Models?
Yu-Lin Tsai, Chia-Yi Hsu, Chulin Xie et al.
Photo-SLAM: Real-time Simultaneous Localization and Photorealistic Mapping for Monocular, Stereo, and RGB-D Cameras
Huajian Huang, Longwei Li, Hui Cheng et al.
RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D
Lingteng Qiu, Guanying Chen, Xiaodong Gu et al.
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
Asma Ghandeharioun, Avi Caciularu, Adam Pearce et al.
SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models
Yuwei Guo, Ceyuan Yang, Anyi Rao et al.
Logit Standardization in Knowledge Distillation
Shangquan Sun, Wenqi Ren, Jingzhi Li et al.
Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?
Zhiqi Li, Zhiding Yu, Shiyi Lan et al.
OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving
Wenzhao Zheng, Weiliang Chen, Yuanhui Huang et al.
DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior
Jingxiang Sun, Bo Zhang, Ruizhi Shao et al.
Compositional Chain-of-Thought Prompting for Large Multimodal Models
Chancharik Mitra, Brandon Huang, Trevor Darrell et al.
Can Large Language Models Infer Causation from Correlation?
Zhijing Jin, Jiarui Liu, Zhiheng Lyu et al.
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun, Ye Fang, Tong Wu et al.
What Algorithms can Transformers Learn? A Study in Length Generalization
Hattie Zhou, Arwen Bradley, Etai Littwin et al.
UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs
Yanwu Xu, Yang Zhao, Zhisheng Xiao et al.
Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling
Yair Schiff, Chia Hsiang Kao, Aaron Gokaslan et al.
GauHuman: Articulated Gaussian Splatting from Monocular Human Videos
Shoukang Hu, Tao Hu, Ziwei Liu
Inversion by Direct Iteration: An Alternative to Denoising Diffusion for Image Restoration
Peyman Milanfar, Mauricio Delbracio
WHAM: Reconstructing World-grounded Humans with Accurate 3D Motion
Soyong Shin, Juyong Kim, Eni Halilaj et al.
MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
Ke Wang, Houxing Ren, Aojun Zhou et al.
MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation
Qian Huang, Jian Vora, Percy Liang et al.
Is Self-Repair a Silver Bullet for Code Generation?
Theo X. Olausson, Jeevana Priya Inala, Chenglong Wang et al.
GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions
Junjie Wang, Jiemin Fang, Xiaopeng Zhang et al.
HIVE: Harnessing Human Feedback for Instructional Visual Editing
Shu Zhang, Xinyi Yang, Yihao Feng et al.
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
Zhifeng Kong, Arushi Goel, Rohan Badlani et al.
ZipIt! Merging Models from Different Tasks without Training
George Stoica, Daniel Bolya, Jakob Bjorner et al.
Self-Alignment with Instruction Backtranslation
Xian Li, Ping Yu, Chunting Zhou et al.
MotionDirector: Motion Customization of Text-to-Video Diffusion Models
Rui Zhao, Yuchao Gu, Jay Zhangjie Wu et al.
Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory
Niloofar Mireshghallah, Hyunwoo Kim, Xuhui Zhou et al.
GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis
Shunyuan Zheng, Boyao Zhou, Ruizhi Shao et al.
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Andrew Lee, Xiaoyan Bai, Itamar Pres et al.
BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion
Xuan Ju, Xian Liu, Xintao Wang et al.
BooookScore: A systematic exploration of book-length summarization in the era of LLMs
Yapei Chang, Kyle Lo, Tanya Goyal et al.
Equivariant Multi-Modality Image Fusion
Zixiang Zhao, Haowen Bai, Jiangshe Zhang et al.
Diffusion-TS: Interpretable Diffusion for General Time Series Generation
Xinyu Yuan, Yan Qiao
VadCLIP: Adapting Vision-Language Models for Weakly Supervised Video Anomaly Detection
Peng Wu, Xuerong Zhou, Guansong Pang et al.
MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI
Kaining Ying, Fanqing Meng, Jin Wang et al.
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
Zigang Geng, Binxin Yang, Tiankai Hang et al.
Generative End-to-End Autonomous Driving
Wenzhao Zheng, Ruiqi Song, Xianda Guo et al.
Grounded Text-to-Image Synthesis with Attention Refocusing
Quynh Phung, Songwei Ge, Jia-Bin Huang
Repeat After Me: Transformers are Better than State Space Models at Copying
Samy Jelassi, David Brandfonbrener, Sham Kakade et al.
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
Yue Fan, Xiaojian Ma, Rujie Wu et al.
RAIN: Your Language Models Can Align Themselves without Finetuning
Yuhui Li, Fangyun Wei, Jinjing Zhao et al.
Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation
Niels Mündler, Jingxuan He, Slobodan Jenko et al.
Extreme Compression of Large Language Models via Additive Quantization
Vage Egiazarian, Andrei Panferov, Denis Kuznedelev et al.
Learning to Act from Actionless Videos through Dense Correspondences
Po-Chen Ko, Jiayuan Mao, Yilun Du et al.
Multi-Modal Hallucination Control by Visual Information Grounding
Alessandro Favero, Luca Zancato, Matthew Trager et al.
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models
Fuzhao Xue, Zian Zheng, Yao Fu et al.
Flow Matching on General Geometries
Ricky T. Q. Chen, Yaron Lipman
Multi-Scale 3D Gaussian Splatting for Anti-Aliased Rendering
Zhiwen Yan, Weng Fei Low, Yu Chen et al.
Chain of Hindsight aligns Language Models with Feedback
Hao Liu, Carmelo Sferrazza, Pieter Abbeel
Emu: Generative Pretraining in Multimodality
Quan Sun, Qiying Yu, Yufeng Cui et al.
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Haoning Wu, Zicheng Zhang, Erli Zhang et al.
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion
Yujie Wei, Shiwei Zhang, Zhiwu Qing et al.
Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting
Yunzhi Yan, Haotong Lin, Chenxu Zhou et al.
MetaTool Benchmark for Large Language Models: Deciding Whether to Use Tools and Which to Use
Yue Huang, Jiawen Shi, Yuan Li et al.
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Keen You, Haotian Zhang, Eldon Schoop et al.
Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints
Chaoqi Wang, Yibo Jiang, Chenghao Yang et al.
COLD-Attack: Jailbreaking LLMs with Stealthiness and Controllability
Xingang Guo, Fangxu Yu, Huan Zhang et al.
QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
Yuhui Xu, Lingxi Xie, Xiaotao Gu et al.
Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D Generation
Junyoung Seo, Wooseok Jang, Min-Seop Kwak et al.
Pathformer: Multi-scale Transformers with Adaptive Pathways for Time Series Forecasting
Peng Chen, Yingying Zhang, Yunyao Cheng et al.
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Mu Cai, Haotian Liu, Siva Mustikovela et al.
PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction
Peng Wang, Hao Tan, Sai Bi et al.
Global Structure-from-Motion Revisited
Linfei Pan, Daniel Barath, Marc Pollefeys et al.
Generative Judge for Evaluating Alignment
Junlong Li, Shichao Sun, Weizhe Yuan et al.
Score identity Distillation: Exponentially Fast Distillation of Pretrained Diffusion Models for One-Step Generation
Mingyuan Zhou, Huangjie Zheng, Zhendong Wang et al.
Interpreting CLIP's Image Representation via Text-Based Decomposition
Yossi Gandelsman, Alexei Efros, Jacob Steinhardt
Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks
Vaidehi Ramesh Patil, Peter Hase, Mohit Bansal
Hypothesis Search: Inductive Reasoning with Language Models
Ruocheng Wang, Eric Zelikman, Gabriel Poesia et al.
A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting
Junhao Zhuang, Yanhong Zeng, Wenran Liu et al.
Rotary Position Embedding for Vision Transformer
Byeongho Heo, Song Park, Dongyoon Han et al.
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
Ming Li, Taojiannan Yang, Huafeng Kuang et al.
SweetDreamer: Aligning Geometric Priors in 2D diffusion for Consistent Text-to-3D
Weiyu Li, Rui Chen, Xuelin Chen et al.
HUGS: Human Gaussian Splats
Muhammed Kocabas, Jen-Hao Rick Chang, James Gabriel et al.
FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling
Haonan Qiu, Menghan Xia, Yong Zhang et al.
Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
Lu Yin, You Wu, Zhenyu Zhang et al.
Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources
Xingxuan Li, Ruochen Zhao, Yew Ken Chia et al.
Time Travel in LLMs: Tracing Data Contamination in Large Language Models
Shahriar Golchin, Mihai Surdeanu
Generalized Preference Optimization: A Unified Approach to Offline Alignment
Yunhao Tang, Zhaohan Guo, Zeyu Zheng et al.
MMA-Diffusion: MultiModal Attack on Diffusion Models
Yijun Yang, Ruiyuan Gao, Xiaosen Wang et al.
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
Seonghyeon Ye, Doyoung Kim, Sungdong Kim et al.
One Step of Gradient Descent is Provably the Optimal In-Context Learner with One Layer of Linear Self-Attention
Arvind Mahankali, Tatsunori Hashimoto, Tengyu Ma
Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations
Jiaqi Zhai, Yunxing Liao, Xing Liu et al.
Infrared Small Target Detection with Scale and Location Sensitivity
Qiankun Liu, Rui Liu, Bolun Zheng et al.
Guiding Instruction-based Image Editing via Multimodal Large Language Models
Tsu-Jui Fu, Wenze Hu, Xianzhi Du et al.
Osprey: Pixel Understanding with Visual Instruction Tuning
Yuqian Yuan, Wentong Li, Jian Liu et al.
DNA-GPT: Divergent N-Gram Analysis for Training-Free Detection of GPT-Generated Text
Xianjun Yang, Wei Cheng, Yue Wu et al.
Make Pixels Dance: High-Dynamic Video Generation
Yan Zeng, Guoqiang Wei, Jiani Zheng et al.
Timer: Generative Pre-trained Transformers Are Large Time Series Models
Yong Liu, Haoran Zhang, Chenyu Li et al.
LLaGA: Large Language and Graph Assistant
Runjin Chen, Tong Zhao, Ajay Jaiswal et al.
AdaCLIP: Adapting CLIP with Hybrid Learnable Prompts for Zero-Shot Anomaly Detection
Yunkang Cao, Jiangning Zhang, Luca Frittoli et al.
Efficient Deformable ConvNets: Rethinking Dynamic and Sparse Operator for Vision Applications
Yuwen Xiong, Zhiqi Li, Yuntao Chen et al.
Generalization in diffusion models arises from geometry-adaptive harmonic representations
Zahra Kadkhodaie, Florentin Guth, Eero Simoncelli et al.
Video Language Planning
Yilun Du, Sherry Yang, Pete Florence et al.
Wavelet-based Fourier Information Interaction with Frequency Diffusion Adjustment for Underwater Image Restoration
Chen Zhao, Weiling Cai, Chenyu Dong et al.
SE(3)-Stochastic Flow Matching for Protein Backbone Generation
Joey Bose, Tara Akhound-Sadegh, Guillaume Huguet et al.
Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians
Yuelang Xu, Benwang Chen, Zhe Li et al.
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Ziniu Li, Tian Xu, Yushun Zhang et al.
GRiT: A Generative Region-to-text Transformer for Object Understanding
Jialian Wu, Jianfeng Wang, Zhengyuan Yang et al.
FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing
Yuren Cong, Mengmeng Xu, Christian Simon et al.
Optimal Transport Aggregation for Visual Place Recognition
Sergio Izquierdo, Javier Civera
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
Tong Wu, Guandao Yang, Zhibing Li et al.
MathScale: Scaling Instruction Tuning for Mathematical Reasoning
Zhengyang Tang, Xingxing Zhang, Benyou Wang et al.
Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed
Yifan Wang, Xingyi He, Sida Peng et al.
Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition
Hao Fei, Shengqiong Wu, Wei Ji et al.
Stealing part of a production language model
Nicholas Carlini, Daniel Paleka, Krishnamurthy Dvijotham et al.
XFeat: Accelerated Features for Lightweight Image Matching
Guilherme Potje, Felipe Cadar, André Araujo et al.
GoLLIE: Annotation Guidelines improve Zero-Shot Information-Extraction
Oscar Sainz, Iker García-Ferrero, Rodrigo Agerri et al.
OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM
Yutao Hu, Tianbin, Quanfeng Lu et al.
Small-scale proxies for large-scale Transformer training instabilities
Mitchell Wortsman, Peter Liu, Lechao Xiao et al.
Multimodal Web Navigation with Instruction-Finetuned Foundation Models
Hiroki Furuta, Kuang-Huei Lee, Ofir Nachum et al.
MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning
Zayne Sprague, Xi Ye, Kaj Bostrom et al.
AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model
Teng Hu, Jiangning Zhang, Ran Yi et al.
Compact 3D Scene Representation via Self-Organizing Gaussian Grids
Wieland Morgenstern, Florian Barthel, Anna Hilsmann et al.
Hybrid-SORT: Weak Cues Matter for Online Multi-Object Tracking
Mingzhan Yang, Guangxin Han, Bin Yan et al.
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models
Yuzhou Huang, Liangbin Xie, Xintao Wang et al.
3DGStream: On-the-Fly Training of 3D Gaussians for Efficient Streaming of Photo-Realistic Free-Viewpoint Videos
Jiakai Sun, Han Jiao, Guangyuan Li et al.
Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning
Ted Zadouri, Ahmet Üstün, Arash Ahmadian et al.
SpatialTracker: Tracking Any 2D Pixels in 3D Space
Yuxi Xiao, Qianqian Wang, Shangzhan Zhang et al.
Linearity of Relation Decoding in Transformer Language Models
Evan Hernandez, Arnab Sen Sharma, Tal Haklay et al.
Neural Video Compression with Feature Modulation
Jiahao Li, Bin Li, Yan Lu
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
Wei Huang, Yangdong Liu, Haotong Qin et al.
Physics-Based Interaction with 3D Objects via Video Generation
Tianyuan Zhang, Hong-Xing Yu, Rundi Wu et al.
Image Hijacks: Adversarial Images can Control Generative Models at Runtime
Luke Bailey, Euan Ong, Stuart Russell et al.
HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding
Zhaorun Chen, Zhuokai Zhao, Hongyin Luo et al.
Does Writing with Language Models Reduce Content Diversity?
Vishakh Padmakumar, He He
Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection
Huan Liu, Zichang Tan, Chuangchuang Tan et al.
Improving LoRA in Privacy-preserving Federated Learning
Youbang Sun, Zitao Li, Yaliang Li et al.
WorkArena: How Capable are Web Agents at Solving Common Knowledge Work Tasks?
Alexandre Drouin, Maxime Gasse, Massimo Caccia et al.
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Dongyang Liu, Renrui Zhang, Longtian Qiu et al.
BetterV: Controlled Verilog Generation with Discriminative Guidance
Zehua Pei, Huiling Zhen, Mingxuan Yuan et al.
Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection
Zhiyuan Yan, Yuhao Luo, Siwei Lyu et al.
Simple linear attention language models balance the recall-throughput tradeoff
Simran Arora, Sabri Eyuboglu, Michael Zhang et al.
SkyScript: A Large and Semantically Diverse Vision-Language Dataset for Remote Sensing
Zhecheng Wang, Rajanie Prabha, Tianyuan Huang et al.