Most Cited 2024 "generative model fine-tuning" Papers
12,324 papers found • Page 2 of 62
Conference
PF-LRM: Pose-Free Large Reconstruction Model for Joint Pose and Shape Prediction
Peng Wang, Hao Tan, Sai Bi et al.
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
Mu Cai, Haotian Liu, Siva Mustikovela et al.
A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting
Junhao Zhuang, Yanhong Zeng, WENRAN LIU et al.
Hypothesis Search: Inductive Reasoning with Language Models
Ruocheng Wang, Eric Zelikman, Gabriel Poesia et al.
SweetDreamer: Aligning Geometric Priors in 2D diffusion for Consistent Text-to-3D
Weiyu LI, Rui Chen, Xuelin Chen et al.
Generative End-to-End Autonomous Driving
Wenzhao Zheng, Ruiqi Song, Xianda Guo et al.
Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting
Yunzhi Yan, Haotong Lin, Chenxu Zhou et al.
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
Ming Li, Taojiannan Yang, Huafeng Kuang et al.
Gaussian Head Avatar: Ultra High-fidelity Head Avatar via Dynamic Gaussians
Yuelang Xu, Benwang Chen, Zhe Li et al.
Osprey: Pixel Understanding with Visual Instruction Tuning
Yuqian Yuan, Wentong Li, Jian liu et al.
Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources
Xingxuan Li, Ruochen Zhao, Yew Ken Chia et al.
Video Language Planning
Yilun Du, Sherry Yang, Pete Florence et al.
Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed
Yifan Wang, Xingyi He, Sida Peng et al.
Multimodal Web Navigation with Instruction-Finetuned Foundation Models
Hiroki Furuta, Kuang-Huei Lee, Ofir Nachum et al.
MMA-Diffusion: MultiModal Attack on Diffusion Models
Yijun Yang, Ruiyuan Gao, Xiaosen Wang et al.
Linearity of Relation Decoding in Transformer Language Models
Evan Hernandez, Arnab Sen Sharma, Tal Haklay et al.
ResDiff: Combining CNN and Diffusion Model for Image Super-resolution
Shuyao Shang, Zhengyang Shan, Guangxing Liu et al.
SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models
Yuzhou Huang, Liangbin Xie, Xintao Wang et al.
Optimal Transport Aggregation for Visual Place Recognition
Sergio Izquierdo, Javier Civera
Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning
Ted Zadouri, Ahmet Üstün, Arash Ahmadian et al.
Physics-Based Interaction with 3D Objects via Video Generation
Tianyuan Zhang, Hong-Xing Yu, Rundi Wu et al.
Rotary Position Embedding for Vision Transformer
Byeongho Heo, Song Park, Dongyoon Han et al.
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning
Juan Rocamonde, Victoriano Montesinos, Elvis Nava et al.
Transcending Forgery Specificity with Latent Space Augmentation for Generalizable Deepfake Detection
Zhiyuan Yan, Yuhao Luo, Siwei Lyu et al.
Large Language Models as Analogical Reasoners
Michihiro Yasunaga, Xinyun Chen, Yujia Li et al.
MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning
Zayne Sprague, Xi Ye, Kaj Bostrom et al.
SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM
Mingrui Li, Shuhong Liu, Heng Zhou et al.
ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions
Chunlong Xia, Xinliang Wang, Feng Lv et al.
Probing the 3D Awareness of Visual Foundation Models
Mohamed El Banani, Amit Raj, Kevis-kokitsi Maninis et al.
Task Contamination: Language Models May Not Be Few-Shot Anymore
Changmao Li, Jeffrey Flanigan
GART: Gaussian Articulated Template Models
Jiahui Lei, Yufu Wang, Georgios Pavlakos et al.
LongVLM: Efficient Long Video Understanding via Large Language Models
Yuetian Weng, Mingfei Han, Haoyu He et al.
VLP: Vision Language Planning for Autonomous Driving
Chenbin Pan, Burhan Yaman, Tommaso Nesti et al.
GSVA: Generalized Segmentation via Multimodal Large Language Models
Zhuofan Xia, Dongchen Han, Yizeng Han et al.
LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model
Dilxat Muhtar, Zhenshi Li, Feng Gu et al.
XCube: Large-Scale 3D Generative Modeling using Sparse Voxel Hierarchies
Xuanchi Ren, Jiahui Huang, Xiaohui Zeng et al.
Relightable Gaussian Codec Avatars
Shunsuke Saito, Gabriel Schwartz, Tomas Simon et al.
SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research
Liangtai Sun, Yang Han, Zihan Zhao et al.
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
Cong Wei, Yang Chen, Haonan Chen et al.
Dolphins: Multimodal Language Model for Driving
Yingzi Ma, Yulong Cao, Jiachen Sun et al.
Adapting Large Language Models via Reading Comprehension
Daixuan Cheng, Shaohan Huang, Furu Wei
SCTNet: Single Branch CNN with Transformer Semantic Information for Real-Time Segmentation
Authors: Zhengze Xu, Dongyue Wu, Changqian Yu et al.
NeuRAD: Neural Rendering for Autonomous Driving
Adam Tonderski, Carl Lindström, Georg Hess et al.
MogaNet: Multi-order Gated Aggregation Network
Siyuan Li, Zedong Wang, Zicheng Liu et al.
ST-LLM: Large Language Models Are Effective Temporal Learners
Ruyang Liu, Chen Li, Haoran Tang et al.
Text-IF: Leveraging Semantic Text Guidance for Degradation-Aware and Interactive Image Fusion
Xunpeng Yi, Han Xu, HAO ZHANG et al.
FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering
Zhenyu Li, Sunqi Fan, Yu Gu et al.
Generalized Predictive Model for Autonomous Driving
Jiazhi Yang, Shenyuan Gao, Yihang Qiu et al.
Paying More Attention to Images: A Training-Free Method for Alleviating Hallucination in LVLMs
Shi Liu, Kecheng Zheng, Wei Chen
AnyText: Multilingual Visual Text Generation and Editing
Yuxiang Tuo, Wangmeng Xiang, Jun-Yan He et al.
SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
Feng Wang, Jieru Mei, Alan Yuille
Real-IAD: A Real-World Multi-View Dataset for Benchmarking Versatile Industrial Anomaly Detection
Chengjie Wang, wenbing zhu, Bin-Bin Gao et al.
GenSim: Generating Robotic Simulation Tasks via Large Language Models
Lirui Wang, Yiyang Ling, Zhecheng Yuan et al.
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
Rohit Gandikota, Joanna Materzynska, Tingrui Zhou et al.
Drag Anything: Motion Control for Anything using Entity Representation
Weijia Wu, Zhuang Li, Yuchao Gu et al.
DiffiT: Diffusion Vision Transformers for Image Generation
Ali Hatamizadeh, Jiaming Song, Guilin Liu et al.
OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning
Lingyi Hong, Shilin Yan, Renrui Zhang et al.
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
Weiyun Wang, Min Shi, Qingyun Li et al.
Autoregressive Queries for Adaptive Tracking with Spatio-Temporal Transformers
Jinxia Xie, Bineng Zhong, Zhiyi Mo et al.
VideoBooth: Diffusion-based Video Generation with Image Prompts
Yuming Jiang, Tianxing Wu, Shuai Yang et al.
EditGuard: Versatile Image Watermarking for Tamper Localization and Copyright Protection
Xuanyu Zhang, Runyi Li, Jiwen Yu et al.
MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model
Wenxun Dai, Ling-Hao Chen, Jingbo Wang et al.
Towards Learning a Generalist Model for Embodied Navigation
Duo Zheng, Shijia Huang, Lin Zhao et al.
Teaching Arithmetic to Small Transformers
Nayoung Lee, Kartik Sreenivasan, Jason Lee et al.
InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning
Yan-Shuo Liang, Wu-Jun Li
Hallucination Augmented Contrastive Learning for Multimodal Large Language Model
Chaoya Jiang, Haiyang Xu, Mengfan Dong et al.
Decomposed Diffusion Sampler for Accelerating Large-Scale Inverse Problems
Hyungjin Chung, Suhyeon Lee, Jong Chul YE
InstructIR: High-Quality Image Restoration Following Human Instructions
Marcos Conde, Gregor Geigle, Radu Timofte
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Hao Zhang, Hongyang Li, Feng Li et al.
SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow
Yihan Wang, Lahav Lipson, Jia Deng
Human Gaussian Splatting: Real-time Rendering of Animatable Avatars
Arthur Moreau, Jifei Song, Helisa Dhamo et al.
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
Zekun Qi, Runpei Dong, Shaochen Zhang et al.
Implicit Style-Content Separation using B-LoRA
Yarden Frenkel, Yael Vinker, Ariel Shamir et al.
DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting
Angelos Kratimenos, Jiahui Lei, Kostas Daniilidis
Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving
Ming Nie, Renyuan Peng, Chunwei Wang et al.
One-dimensional Adapter to Rule Them All: Concepts Diffusion Models and Erasing Applications
Mengyao Lyu, Yuhong Yang, Haiwen Hong et al.
ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models
Yingqing He, Shaoshu Yang, Haoxin Chen et al.
SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation
Wenxi Yue, Jing Zhang, Kun Hu et al.
LLMCarbon: Modeling the End-to-End Carbon Footprint of Large Language Models
Ahmad Faiz, Sotaro Kaneda, Ruhan Wang et al.
IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection
Mingjin Zhang, Yuchun Wang, Jie Guo et al.
MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
Dewei Zhou, You Li, Fan Ma et al.
VideoLLM-online: Online Video Large Language Model for Streaming Video
Joya Chen, Zhaoyang Lv, Shiwei Wu et al.
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Yazhou Xing, Yingqing He, Zeyue Tian et al.
Gaussian in the wild: 3D Gaussian Splatting for Unconstrained Image Collections
Dongbin Zhang, Chuming Wang, Weitao Wang et al.
Efficient Test-Time Adaptation of Vision-Language Models
Adilbek Karmanov, Dayan Guan, Shijian Lu et al.
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations
Fengyu Yang, Chao Feng, Ziyang Chen et al.
Can I Trust Your Answer? Visually Grounded Video Question Answering
Junbin Xiao, Angela Yao, Yicong Li et al.
Motion Mamba: Efficient and Long Sequence Motion Generation
Zeyu Zhang, Akide Liu, Ian Reid et al.
Universal Jailbreak Backdoors from Poisoned Human Feedback
Javier Rando, Florian Tramer
Mitigating Large Language Model Hallucinations via Autonomous Knowledge Graph-Based Retrofitting
Xinyan Guan, Yanjiang Liu, Hongyu Lin et al.
Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models
Xianfang Zeng, Xin Chen, Zhongqi Qi et al.
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
Chuofan Ma, Yi Jiang, Jiannan Wu et al.
Unpaired Image-to-Image Translation via Neural Schrödinger Bridge
Beomsu Kim, Gihyun Kwon, Kwanyoung Kim et al.
FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization
Jiahui Zhang, Fangneng Zhan, MUYU XU et al.
OMG-Seg: Is One Model Good Enough For All Segmentation?
Xiangtai Li, Haobo Yuan, Wei Li et al.
CompGS: Smaller and Faster Gaussian Splatting with Vector Quantization
K L Navaneet, Kossar Pourahmadi, Soroush Abbasi Koohpayegani et al.
SimDA: Simple Diffusion Adapter for Efficient Video Generation
Zhen Xing, Qi Dai, Han Hu et al.
ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
Guanxing Lu, Shiyi Zhang, Ziwei Wang et al.
ReNoise: Real Image Inversion Through Iterative Noising
Daniel Garibi, Or Patashnik, Andrey Voynov et al.
GigaPose: Fast and Robust Novel Object Pose Estimation via One Correspondence
Van Nguyen Nguyen, Thibault Groueix, Mathieu Salzmann et al.
A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models
Haoran Xu, Young Jin Kim, Amr Mohamed Nabil Aly Aly Sharaf et al.
Exploring Large Language Model for Graph Data Understanding in Online Job Recommendations
Likang Wu, Zhaopeng Qiu, Zhi Zheng et al.
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
Weiran Yao, Shelby Heinecke, Juan Carlos Niebles et al.
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation
Shaowei Liu, Zhongzheng Ren, Saurabh Gupta et al.
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
Size Wu, Wenwei Zhang, Lumin Xu et al.
PromptAD: Learning Prompts with only Normal Samples for Few-Shot Anomaly Detection
Xiaofan Li, Zhizhong Zhang, Xin Tan et al.
TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering
Jingye Chen, Yupan Huang, Tengchao Lv et al.
Understanding Catastrophic Forgetting in Language Models via Implicit Inference
Suhas Kotha, Jacob Springer, Aditi Raghunathan
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs
Haoqin Tu, Chenhang Cui, Zijun Wang et al.
Synapse: Trajectory-as-Exemplar Prompting with Memory for Computer Control
Longtao Zheng, Rundong Wang, Xinrun Wang et al.
RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding
Jihan Yang, Runyu Ding, Weipeng DENG et al.
Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
Shufan Li, Aditya Grover, Harkanwar Singh
TimesURL: Self-Supervised Contrastive Learning for Universal Time Series Representation Learning
jiexi Liu, Songcan Chen
latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction
Christopher Wewer, Kevin Raj, Eddy Ilg et al.
Zero-Reference Low-Light Enhancement via Physical Quadruple Priors
Wenjing Wang, Huan Yang, Jianlong Fu et al.
PLGSLAM: Progressive Neural Scene Represenation with Local to Global Bundle Adjustment
Tianchen Deng, Guole Shen, Tong Qin et al.
MVDiffHD: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction
Shitao Tang, Jiacheng Chen, Dilin Wang et al.
Unified Human-Scene Interaction via Prompted Chain-of-Contacts
Zeqi Xiao, Tai Wang, Jingbo Wang et al.
Fully-Connected Spatial-Temporal Graph for Multivariate Time-Series Data
Yucheng Wang, Yuecong Xu, Jianfei Yang et al.
OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models
Changhun Lee, Jungyu Jin, Taesu Kim et al.
Stronger Fewer & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation
ZHIXIANG WEI, Lin Chen, Xiaoxiao Ma et al.
3D Geometry-Aware Deformable Gaussian Splatting for Dynamic View Synthesis
Zhicheng Lu, xiang guo, Le Hui et al.
An Attentive Inductive Bias for Sequential Recommendation beyond the Self-Attention
Yehjin Shin, Jeongwhan Choi, Hyowon Wi et al.
Rolling-Unet: Revitalizing MLP’s Ability to Efficiently Extract Long-Distance Dependencies for Medical Image Segmentation
Yutong Liu, Haijiang Zhu, Mengting Liu et al.
DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation
Yiqun Duan, Xianda Guo, Zheng Zhu
LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models
Hai Jiang, Ao Luo, Xiaohong Liu et al.
DOCCI: Descriptions of Connected and Contrasting Images
Yasumasa Onoe, Sunayana Rane, Zachary E Berger et al.
Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks
MohammadReza Davari, Eugene Belilovsky
UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition
Wenxuan Zhou, Sheng Zhang, Yu Gu et al.
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
Chuwei Luo, Yufan Shen, Zhaoqing Zhu et al.
SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
Seokju Yun, Youngmin Ro
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
Nikhil Prakash, Tamar Shaham, Tal Haklay et al.
UCMCTrack: Multi-Object Tracking with Uniform Camera Motion Compensation
Kefu Yi, Kai Luo, Xiaolei Luo et al.
DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation
Bowen Yin, Xuying Zhang, Zhong-Yu Li et al.
Single-Model and Any-Modality for Video Object Tracking
Zongwei Wu, Jilai Zheng, Xiangxuan Ren et al.
Pixel-GS Density Control with Pixel-aware Gradient for 3D Gaussian Splatting
Zheng Zhang, WENBO HU, Yixing Lao et al.
Rethinking Model Ensemble in Transfer-based Adversarial Attacks
Huanran Chen, Yichi Zhang, Yinpeng Dong et al.
Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer
Rafail Fridman, Danah Yatim, Omer Bar-Tal et al.
Fluctuation-Based Adaptive Structured Pruning for Large Language Models
Yongqi An, Xu Zhao, Tao Yu et al.
GARField: Group Anything with Radiance Fields
Chung Min Kim, Mingxuan Wu, Justin Kerr et al.
HIPTrack: Visual Tracking with Historical Prompts
Wenrui Cai, Qingjie Liu, Yunhong Wang
GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction
Yuanhui Huang, Wenzhao Zheng, Yunpeng Zhang et al.
Self-correcting LLM-controlled Diffusion Models
Tsung-Han Wu, Long Lian, Joseph Gonzalez et al.
Revising Densification in Gaussian Splatting
Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder
Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
Yifan Li, hangyu guo, Kun Zhou et al.
VISA: Reasoning Video Object Segmentation via Large Language Model
Cilin Yan, haochen wang, Shilin Yan et al.
An Empirical Study of CLIP for Text-Based Person Search
Cao Min, Yang Bai, ziyin Zeng et al.
HyperAttention: Long-context Attention in Near-Linear Time
Insu Han, Rajesh Jayaram, Amin Karbasi et al.
Noise-free Score Distillation
Oren Katzir, Or Patashnik, Daniel Cohen-Or et al.
Towards Open-ended Visual Quality Comparison
Haoning Wu, Hanwei Zhu, Zicheng Zhang et al.
Generative Image Dynamics
Zhengqi Li, Richard Tucker, Noah Snavely et al.
Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models
Ruichen Wang, Zekang Chen, Chen Chen et al.
Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion
Lunjun Zhang, Yuwen Xiong, Ze Yang et al.
Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models
Zhiyuan You, Zheyuan Li, Jinjin Gu et al.
Decoding Natural Images from EEG for Object Recognition
Yonghao Song, Bingchuan Liu, Xiang Li et al.
STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians
Yifei Zeng, Yanqin Jiang, Siyu Zhu et al.
Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance
Liting Lin, Heng Fan, Zhipeng Zhang et al.
8976 PointAttN: You Only Need Attention for Point Cloud Completion
Jun Wang, Ying Cui, Dongyan Guo et al.
Consistency-guided Prompt Learning for Vision-Language Models
Shuvendu Roy, Ali Etemad
Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties
Taylor Sorensen, Liwei Jiang, Jena Hwang et al.
SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation
Bin Xie, Jiale Cao, Jin Xie et al.
At Which Training Stage Does Code Data Help LLMs Reasoning?
ma yingwei, Yue Liu, Yue Yu et al.
Brain decoding: toward real-time reconstruction of visual perception
Yohann Benchetrit, Hubert Banville, Jean-Remi King
Decoupled Contrastive Multi-View Clustering with High-Order Random Walks
Yiding Lu, Yijie Lin, Mouxing Yang et al.
CoR-GS: Sparse-View 3D Gaussian Splatting via Co-Regularization
Jiawei Zhang, Jiahe Li, Xiaohan Yu et al.
Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks
Samyak Jain, Robert Kirk, Ekdeep Singh Lubana et al.
Bayes' Rays: Uncertainty Quantification for Neural Radiance Fields
Leili Goli, Cody Reading, Silvia Sellán et al.
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
Sihan liu, Yiwei Ma, Xiaoqing Zhang et al.
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models
Haoran Wei, Lingyu Kong, Jinyue Chen et al.
DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing
Chong Mou, Xintao Wang, Jiechong Song et al.
VidToMe: Video Token Merging for Zero-Shot Video Editing
Xirui Li, Chao Ma, Xiaokang Yang et al.
GES : Generalized Exponential Splatting for Efficient Radiance Field Rendering
Abdullah J Hamdi, Luke Melas-Kyriazi, Jinjie Mai et al.
Reliable Conflictive Multi-View Learning
Cai Xu, Jiajun Si, Ziyu Guan et al.
Training Socially Aligned Language Models on Simulated Social Interactions
Ruibo Liu, Ruixin Yang, Chenyan Jia et al.
Boosting Adversarial Transferability by Block Shuffle and Rotation
Kunyu Wang, he xuanran, Wenxuan Wang et al.
Improved sampling via learned diffusions
Lorenz Richter, Julius Berner
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
Sewon Min, Suchin Gururangan, Eric Wallace et al.
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
Weiyun Wang Weiyun, yiming ren, Haowen Luo et al.
FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning
Haokun Chen, Yao Zhang, Denis Krompass et al.
ZeroNVS: Zero-Shot 360-Degree View Synthesis from a Single Image
Kyle Sargent, Zizhang Li, Tanmay Shah et al.
KoLA: Carefully Benchmarking World Knowledge of Large Language Models
Jifan Yu, Xiaozhi Wang, Shangqing Tu et al.
Finetuning Text-to-Image Diffusion Models for Fairness
Xudong Shen, Chao Du, Tianyu Pang et al.
DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models
Khawar Islam, Muhammad Zaigham Zaheer, Arif Mahmood et al.
AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error
Jonas Ricker, Denis Lukovnikov, Asja Fischer
How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
Jingfeng Wu, Difan Zou, Zixiang Chen et al.
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
Yifei Huang, Guo Chen, Jilan Xu et al.
VIGC: Visual Instruction Generation and Correction
Théo Delemazure, Jérôme Lang, Grzegorz Pierczyński
Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation
Alexander Raistrick, Lingjie Mei, Karhan Kayan et al.
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
Michael Zhang, Kush Bhatia, Hermann Kumbong et al.
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri et al.
AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion
yitong jiang, Zhaoyang Zhang, Tianfan Xue et al.
Detecting, Explaining, and Mitigating Memorization in Diffusion Models
Yuxin Wen, Yuchen Liu, Chen Chen et al.
Bidirectional Multi-Scale Implicit Neural Representations for Image Deraining
Xiang Chen, Jinshan Pan, Jiangxin Dong
TUMTraf V2X Cooperative Perception Dataset
Walter Zimmer, Gerhard Arya Wardana, Suren Sritharan et al.
Human Feedback is not Gold Standard
Tom Hosking, Phil Blunsom, Max Bartolo
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
Qing Jiang, Feng Li, Zhaoyang Zeng et al.
PSALM: Pixelwise Segmentation with Large Multi-modal Model
Zheng Zhang, YeYao Ma, Enming Zhang et al.
Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation
Yuanchen Ju, Kaizhe Hu, Guowei Zhang et al.
Event Stream-based Visual Object Tracking: A High-Resolution Benchmark Dataset and A Novel Baseline
Xiao Wang, Shiao Wang, Chuanming Tang et al.
Video ReCap: Recursive Captioning of Hour-Long Videos
Md Mohaiminul Islam, Vu Bao Ngan Ho, Xitong Yang et al.