Most Cited 2024 "generalizable denoising" Papers
12,324 papers found • Page 5 of 62
Conference
Open3DIS: Open-Vocabulary 3D Instance Segmentation with 2D Mask Guidance
Phuc Nguyen, Tuan Duc Ngo, Evangelos Kalogerakis et al.
latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction
Christopher Wewer, Kevin Raj, Eddy Ilg et al.
Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks
MohammadReza Davari, Eugene Belilovsky
On Prompt-Driven Safeguarding for Large Language Models
Chujie Zheng, Fan Yin, Hao Zhou et al.
HIPTrack: Visual Tracking with Historical Prompts
Wenrui Cai, Qingjie Liu, Yunhong Wang
Consistent4D: Consistent 360° Dynamic Object Generation from Monocular Video
Yanqin Jiang, Li Zhang, Jin Gao et al.
Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
Shufan Li, Aditya Grover, Harkanwar Singh
SNI-SLAM: Semantic Neural Implicit SLAM
Siting Zhu, Guangming Wang, Hermann Blum et al.
HIFA: High-fidelity Text-to-3D Generation with Advanced Diffusion Guidance
Junzhe Zhu, Peiye Zhuang, Sanmi Koyejo
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs
Haoqin Tu, Chenhang Cui, Zijun Wang et al.
ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search
Yuchen Zhuang, Xiang Chen, Tong Yu et al.
CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets
Lifan Yuan, Yangyi Chen, Xingyao Wang et al.
TimesURL: Self-Supervised Contrastive Learning for Universal Time Series Representation Learning
jiexi Liu, Songcan Chen
HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting
Xian Liu, Xiaohang Zhan, Jiaxiang Tang et al.
AdaShield: Safeguarding Multimodal Large Language Models from Structure-based Attack via Adaptive Shield Prompting
Yu Wang, Xiaogeng Liu, Yu Li et al.
Conformal Language Modeling
Victor Quach, Adam Fisch, Tal Schuster et al.
PLGSLAM: Progressive Neural Scene Represenation with Local to Global Bundle Adjustment
Tianchen Deng, Guole Shen, Tong Qin et al.
Large Language Models as Generalizable Policies for Embodied Tasks
Andrew Szot, Max Schwarzer, Harsh Agrawal et al.
OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models
Changhun Lee, Jungyu Jin, Taesu Kim et al.
DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis
Jiapeng Tang, Yinyu Nie, Lev Markhasin et al.
ASH: Animatable Gaussian Splats for Efficient and Photoreal Human Rendering
Haokai Pang, Heming Zhu, Adam Kortylewski et al.
ReLU Strikes Back: Exploiting Activation Sparsity in Large Language Models
Iman Mirzadeh, Keivan Alizadeh-Vahid, Sachin Mehta et al.
Can Large Language Models Serve as Rational Players in Game Theory? A Systematic Analysis
Caoyun Fan, Jindou Chen, Yaohui Jin et al.
Fully-Connected Spatial-Temporal Graph for Multivariate Time-Series Data
Yucheng Wang, Yuecong Xu, Jianfei Yang et al.
An Attentive Inductive Bias for Sequential Recommendation beyond the Self-Attention
Yehjin Shin, Jeongwhan Choi, Hyowon Wi et al.
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning
Long Qian, Juncheng Li, Yu Wu et al.
RegionPLC: Regional Point-Language Contrastive Learning for Open-World 3D Scene Understanding
Jihan Yang, Runyu Ding, Weipeng DENG et al.
Knowledge Fusion of Large Language Models
Fanqi Wan, Xinting Huang, Deng Cai et al.
DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Muyang Li, Tianle Cai, Jiaxin Cao et al.
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov et al.
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
Dawei Zhu, Nan Yang, Liang Wang et al.
MMM: Generative Masked Motion Model
Ekkasit Pinyoanuntapong, Pu Wang, Minwoo Lee et al.
Provably Robust DPO: Aligning Language Models with Noisy Feedback
Sayak Ray Chowdhury, Anush Kini, Nagarajan Natarajan
Stronger Fewer & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation
ZHIXIANG WEI, Lin Chen, Xiaoxiao Ma et al.
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast
Xiangming Gu, Xiaosen Zheng, Tianyu Pang et al.
DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback
Yangyi Chen, Karan Sikka, Michael Cogswell et al.
Restoring Images in Adverse Weather Conditions via Histogram Transformer
Shangquan Sun, Wenqi Ren, Xinwei Gao et al.
Rethinking Inductive Biases for Surface Normal Estimation
Gwangbin Bae, Andrew J. Davison
Universal Humanoid Motion Representations for Physics-Based Control
Zhengyi Luo, Jinkun Cao, Josh Merel et al.
GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements
Alexander Havrilla, Sharath Chandra Raparthy, Christoforos Nalmpantis et al.
Noisy-Correspondence Learning for Text-to-Image Person Re-identification
Yang Qin, Yingke Chen, Dezhong Peng et al.
Vision-by-Language for Training-Free Compositional Image Retrieval
Shyamgopal Karthik, Karsten Roth, Massimiliano Mancini et al.
SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
Seokju Yun, Youngmin Ro
UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition
Wenxuan Zhou, Sheng Zhang, Yu Gu et al.
VDT: General-purpose Video Diffusion Transformers via Mask Modeling
Haoyu Lu, Guoxing Yang, Nanyi Fei et al.
How Transformers Learn Causal Structure with Gradient Descent
Eshaan Nichani, Alex Damian, Jason Lee
BadEdit: Backdooring Large Language Models by Model Editing
Yanzhou Li, Tianlin Li, Kangjie Chen et al.
3D Geometry-Aware Deformable Gaussian Splatting for Dynamic View Synthesis
Zhicheng Lu, xiang guo, Le Hui et al.
FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition
Sicheng Mo, Fangzhou Mu, Kuan Heng Lin et al.
LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models
Hai Jiang, Ao Luo, Xiaohong Liu et al.
The Expressive Power of Low-Rank Adaptation
Yuchen Zeng, Kangwook Lee
Tag2Text: Guiding Vision-Language Model via Image Tagging
Xinyu Huang, Youcai Zhang, Jinyu Ma et al.
HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation
Xin Huang, Ruizhi Shao, Qi Zhang et al.
Self-correcting LLM-controlled Diffusion Models
Tsung-Han Wu, Long Lian, Joseph Gonzalez et al.
Unified Human-Scene Interaction via Prompted Chain-of-Contacts
Zeqi Xiao, Tai Wang, Jingbo Wang et al.
Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling
Bairu Hou, Yujian Liu, Kaizhi Qian et al.
Language Model Cascades: Token-Level Uncertainty And Beyond
Neha Gupta, Harikrishna Narasimhan, Wittawat Jitkrittum et al.
UCMCTrack: Multi-Object Tracking with Uniform Camera Motion Compensation
Kefu Yi, Kai Luo, Xiaolei Luo et al.
STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians
Yifei Zeng, Yanqin Jiang, Siyu Zhu et al.
Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
Yifan Li, hangyu guo, Kun Zhou et al.
Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer
Rafail Fridman, Danah Yatim, Omer Bar-Tal et al.
Proactive Detection of Voice Cloning with Localized Watermarking
Robin San Roman, Pierre Fernandez, Hady Elsahar et al.
Visual Point Cloud Forecasting enables Scalable Autonomous Driving
Zetong Yang, Li Chen, Yanan Sun et al.
Single-Model and Any-Modality for Video Object Tracking
Zongwei Wu, Jilai Zheng, Xiangxuan Ren et al.
MVDiffHD: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction
Shitao Tang, Jiacheng Chen, Dilin Wang et al.
ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe
Yifan Bai, Zeyang Zhao, Yihong Gong et al.
SyncTalk: The Devil is in the Synchronization for Talking Head Synthesis
Ziqiao Peng, Wentao Hu, Yue Shi et al.
4K4D: Real-Time 4D View Synthesis at 4K Resolution
Zhen Xu, Sida Peng, Haotong Lin et al.
DOCCI: Descriptions of Connected and Contrasting Images
Yasumasa Onoe, Sunayana Rane, Zachary E Berger et al.
Circuit Component Reuse Across Tasks in Transformer Language Models
Jack Merullo, Carsten Eickhoff, Ellie Pavlick
Towards image compression with perfect realism at ultra-low bitrates
Marlene Careil, Matthew J Muckley, Jakob Verbeek et al.
VISA: Reasoning Video Object Segmentation via Large Language Model
Cilin Yan, haochen wang, Shilin Yan et al.
Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models
Bilgehan Sel, Ahmad Al-Tawaha, Vanshaj Khattar et al.
DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation
Yiqun Duan, Xianda Guo, Zheng Zhu
GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction
Yuanhui Huang, Wenzhao Zheng, Yunpeng Zhang et al.
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
Nikhil Prakash, Tamar Shaham, Tal Haklay et al.
Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF
Anand Siththaranjan, Cassidy Laidlaw, Dylan Hadfield-Menell
Symphonize 3D Semantic Scene Completion with Contextual Instance Queries
Haoyi Jiang, Tianheng Cheng, Naiyu Gao et al.
Pixel-GS Density Control with Pixel-aware Gradient for 3D Gaussian Splatting
Zheng Zhang, WENBO HU, Yixing Lao et al.
Decoding Natural Images from EEG for Object Recognition
Yonghao Song, Bingchuan Liu, Xiang Li et al.
DataInf: Efficiently Estimating Data Influence in LoRA-tuned LLMs and Diffusion Models
Yongchan Kwon, Eric Wu, Kevin Wu et al.
Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization
Weiyang Liu, Zeju Qiu, Yao Feng et al.
ChatScene: Knowledge-Enabled Safety-Critical Scenario Generation for Autonomous Vehicles
Jiawei Zhang, Chejian Xu, Bo Li
DFormer: Rethinking RGBD Representation Learning for Semantic Segmentation
Bowen Yin, Xuying Zhang, Zhong-Yu Li et al.
BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning
Siyuan Liang, Mingli Zhu, Aishan Liu et al.
InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks
Xueyu Hu, Ziyu Zhao, Shuang Wei et al.
GARField: Group Anything with Radiance Fields
Chung Min Kim, Mingxuan Wu, Justin Kerr et al.
Rethinking Model Ensemble in Transfer-based Adversarial Attacks
Huanran Chen, Yichi Zhang, Yinpeng Dong et al.
FoundPose: Unseen Object Pose Estimation with Foundation Features
Evin Pınar Örnek, Yann Labbé, Bugra Tekin et al.
An Empirical Study of CLIP for Text-Based Person Search
Cao Min, Yang Bai, ziyin Zeng et al.
Iterated Denoising Energy Matching for Sampling from Boltzmann Densities
Tara Akhound-Sadegh, Jarrid Rector-Brooks, Joey Bose et al.
HyperAttention: Long-context Attention in Near-Linear Time
Insu Han, Rajesh Jayaram, Amin Karbasi et al.
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
Xichen Pan, Li Dong, Shaohan Huang et al.
Revising Densification in Gaussian Splatting
Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder
Rolling-Unet: Revitalizing MLP’s Ability to Efficiently Extract Long-Distance Dependencies for Medical Image Segmentation
Yutong Liu, Haijiang Zhu, Mengting Liu et al.
Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance
Liting Lin, Heng Fan, Zhipeng Zhang et al.
LDMVFI: Video Frame Interpolation with Latent Diffusion Models
Duolikun Danier, Fan Zhang, David Bull
MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data
Paul Scotti, Mihir Tripathy, Cesar Kadir Torrico Villanueva et al.
Boosting Adversarial Transferability by Block Shuffle and Rotation
Kunyu Wang, he xuanran, Wenxuan Wang et al.
Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks
Samyak Jain, Robert Kirk, Ekdeep Singh Lubana et al.
SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation
Bin Xie, Jiale Cao, Jin Xie et al.
CADS: Unleashing the Diversity of Diffusion Models through Condition-Annealed Sampling
Seyedmorteza Sadat, Jakob Buhmann, Derek Bradley et al.
Reliable Conflictive Multi-View Learning
Cai Xu, Jiajun Si, Ziyu Guan et al.
Generative Image Dynamics
Zhengqi Li, Richard Tucker, Noah Snavely et al.
A Semantic Invariant Robust Watermark for Large Language Models
Aiwei Liu, Leyi Pan, Xuming Hu et al.
Explicit Visual Prompts for Visual Object Tracking
Liangtao Shi, Bineng Zhong, Qihua Liang et al.
On the Diversity and Realism of Distilled Dataset: An Efficient Dataset Distillation Paradigm
Peng Sun, Bei Shi, Daiwei Yu et al.
Flora: Low-Rank Adapters Are Secretly Gradient Compressors
Yongchang Hao, Yanshuai Cao, Lili Mou
Receler: Reliable Concept Erasing of Text-to-Image Diffusion Models via Lightweight Erasers
Chi-Pin Huang, Kai-Po Chang, Chung-Ting Tsai et al.
ARGS: Alignment as Reward-Guided Search
Maxim Khanov, Jirayu Burapacheep, Yixuan Li
GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting
Xiaoyu Zhou, Xingjian Ran, Yajiao Xiong et al.
Preserving Fairness Generalization in Deepfake Detection
Li Lin, Xinan He, Yan Ju et al.
Noise-free Score Distillation
Oren Katzir, Or Patashnik, Daniel Cohen-Or et al.
Label-free Node Classification on Graphs with Large Language Models (LLMs)
Zhikai Chen, Haitao Mao, Hongzhi Wen et al.
GaussCtrl: Multi-View Consistent Text-Driven 3D Gaussian Splatting Editing
Jing Wu, Jiawang Bian, Xinghui Li et al.
Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models
Zhiyuan You, Zheyuan Li, Jinjin Gu et al.
Diffusion Language Models Are Versatile Protein Learners
Xinyou Wang, Zaixiang Zheng, Fei YE et al.
MiniLLM: Knowledge Distillation of Large Language Models
Yuxian Gu, Li Dong, Furu Wei et al.
Towards Open-ended Visual Quality Comparison
Haoning Wu, Hanwei Zhu, Zicheng Zhang et al.
At Which Training Stage Does Code Data Help LLMs Reasoning?
ma yingwei, Yue Liu, Yue Yu et al.
SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking
Xiaojun Hou, Jiazheng Xing, Yijie Qian et al.
Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference
Piotr Nawrot, Adrian Łańcucki, Marcin Chochowski et al.
Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion
Lunjun Zhang, Yuwen Xiong, Ze Yang et al.
Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation
Xinyu Tang, Richard Shin, Huseyin Inan et al.
DsDm: Model-Aware Dataset Selection with Datamodels
Logan Engstrom
Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties
Taylor Sorensen, Liwei Jiang, Jena Hwang et al.
SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation
Thuan Nguyen, Anh Tran
Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models
Ruichen Wang, Zekang Chen, Chen Chen et al.
Brain decoding: toward real-time reconstruction of visual perception
Yohann Benchetrit, Hubert Banville, Jean-Remi King
FeatUp: A Model-Agnostic Framework for Features at Any Resolution
Stephanie Fu, Mark Hamilton, Laura E. Brandt et al.
VidToMe: Video Token Merging for Zero-Shot Video Editing
Xirui Li, Chao Ma, Xiaokang Yang et al.
Improved sampling via learned diffusions
Lorenz Richter, Julius Berner
Consistency-guided Prompt Learning for Vision-Language Models
Shuvendu Roy, Ali Etemad
Residual Denoising Diffusion Models
Jiawei Liu, Qiang Wang, Huijie Fan et al.
PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI
Yandan Yang, Baoxiong Jia, Peiyuan Zhi et al.
Large Language Models are Geographically Biased
Rohin Manvi, Samar Khanna, Marshall Burke et al.
Position: Levels of AGI for Operationalizing Progress on the Path to AGI
Meredith Morris, Jascha Sohl-Dickstein, Noah Fiedel et al.
Unbiased Watermark for Large Language Models
Zhengmian Hu, Lichang Chen, Xidong Wu et al.
Diffusion Models Without Attention
Jing Nathan Yan, Jiatao Gu, Alexander Rush
An Extensible Framework for Open Heterogeneous Collaborative Perception
Yifan Lu, Yue Hu, Yiqi Zhong et al.
DPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training
Zhongkai Hao, Chang Su, LIU SONGMING et al.
Bayes' Rays: Uncertainty Quantification for Neural Radiance Fields
Leili Goli, Cody Reading, Silvia Sellán et al.
GES : Generalized Exponential Splatting for Efficient Radiance Field Rendering
Abdullah J Hamdi, Luke Melas-Kyriazi, Jinjie Mai et al.
Decoupled Contrastive Multi-View Clustering with High-Order Random Walks
Yiding Lu, Yijie Lin, Mouxing Yang et al.
Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts
Jiawen Zhu, Guansong Pang
8976 PointAttN: You Only Need Attention for Point Cloud Completion
Jun Wang, Ying Cui, Dongyan Guo et al.
AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers
Reduan Achtibat, Sayed Mohammad Vakilzadeh Hatefi, Maximilian Dreyer et al.
Localizing Task Information for Improved Model Merging and Compression
Ke Wang, Nikolaos Dimitriadis, Guillermo Ortiz-Jimenez et al.
Bayesian Low-rank Adaptation for Large Language Models
Adam Yang, Maxime Robeyns, Xi Wang et al.
Online Speculative Decoding
Xiaoxuan Liu, Lanxiang Hu, Peter Bailis et al.
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
Sihan liu, Yiwei Ma, Xiaoqing Zhang et al.
SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection
Peng Qi, Zehong Yan, Wynne Hsu et al.
RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models
Ozgur Kara, Bariscan Kurtkaya, Hidir Yesiltepe et al.
LAA-Net: Localized Artifact Attention Network for Quality-Agnostic and Generalizable Deepfake Detection
Dat NGUYEN, Nesryne Mejri, Inder Pal Singh et al.
CapsFusion: Rethinking Image-Text Data at Scale
Qiying Yu, Quan Sun, Xiaosong Zhang et al.
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
Michael Zhang, Kush Bhatia, Hermann Kumbong et al.
CrossKD: Cross-Head Knowledge Distillation for Object Detection
JiaBao Wang, yuming chen, Zhaohui Zheng et al.
Frequency-Adaptive Dilated Convolution for Semantic Segmentation
Linwei Chen, Lin Gu, Dezhi Zheng et al.
CrossQ: Batch Normalization in Deep Reinforcement Learning for Greater Sample Efficiency and Simplicity
Aditya Bhatt, Daniel Palenicek, Boris Belousov et al.
DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing
Chong Mou, Xintao Wang, Jiechong Song et al.
GeoLLM: Extracting Geospatial Knowledge from Large Language Models
Rohin Manvi, Samar Khanna, Gengchen Mai et al.
Learning Delays in Spiking Neural Networks using Dilated Convolutions with Learnable Spacings
Ilyass Hammouamri, Ismail Khalfaoui Hassani, Timothée Masquelier
Motif: Intrinsic Motivation from Artificial Intelligence Feedback
Martin Klissarov, Pierluca D'Oro, Shagun Sodhani et al.
Training Socially Aligned Language Models on Simulated Social Interactions
Ruibo Liu, Ruixin Yang, Chenyan Jia et al.
CoR-GS: Sparse-View 3D Gaussian Splatting via Co-Regularization
Jiawei Zhang, Jiahe Li, Xiaohan Yu et al.
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions
Jack Urbanek, Florian Bordes, Pietro Astolfi et al.
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
Haobo Yuan, Xiangtai Li, Chong Zhou et al.
Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning
Yiwei Li, Peiwen Yuan, Shaoxiong Feng et al.
Turning large language models into cognitive models
Marcel Binz, Eric Schulz
SparQ Attention: Bandwidth-Efficient LLM Inference
Luka Ribar, Ivan Chelombiev, Luke Hudlass-Galley et al.
Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis
Zhenhui Ye, Tianyun Zhong, Yi Ren et al.
DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
Tianhao Qi, Shancheng Fang, Yanze Wu et al.
DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models
Khawar Islam, Muhammad Zaigham Zaheer, Arif Mahmood et al.
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions
Juncheng Li, Kaihang Pan, Zhiqi Ge et al.
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Pavan Kumar Anasosalu Vasu, Hadi Pouransari, Fartash Faghri et al.
Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos
Mehmet Saygin Seyfioglu, Wisdom Ikezogwo, Fatemeh Ghezloo et al.
Controllable Human-Object Interaction Synthesis
Jiaman Li, Alexander Clegg, Roozbeh Mottaghi et al.
VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models
Hyeonho Jeong, Geon Yeong Park, Jong Chul Ye
DemoFusion: Democratising High-Resolution Image Generation With No $$$
Ruoyi DU, Dongliang Chang, Timothy Hospedales et al.
AEROBLADE: Training-Free Detection of Latent Diffusion Images Using Autoencoder Reconstruction Error
Jonas Ricker, Denis Lukovnikov, Asja Fischer
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models
Haoran Wei, Lingyu Kong, Jinyue Chen et al.
TUMTraf V2X Cooperative Perception Dataset
Walter Zimmer, Gerhard Arya Wardana, Suren Sritharan et al.
On the Robustness of Large Multimodal Models Against Image Adversarial Attacks
Xuanming Cui, Alejandro Aparcedo, Young Kyun Jang et al.
Lemur: Harmonizing Natural Language and Code for Language Agents
Yiheng Xu, Hongjin SU, Chen Xing et al.
How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
Jingfeng Wu, Difan Zou, Zixiang Chen et al.
Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images
Chaoqin Huang, Aofan Jiang, Jinghao Feng et al.
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
Weiyun Wang Weiyun, yiming ren, Haowen Luo et al.
Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation
Alexander Raistrick, Lingjie Mei, Karhan Kayan et al.
VIGC: Visual Instruction Generation and Correction
Théo Delemazure, Jérôme Lang, Grzegorz Pierczyński
Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM
Eliya Nachmani, Alon Levkovitch, Roy Hirsch et al.
Robust CLIP: Unsupervised Adversarial Fine-Tuning of Vision Embeddings for Robust Large Vision-Language Models
Christian Schlarmann, Naman Singh, Francesco Croce et al.
AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion
yitong jiang, Zhaoyang Zhang, Tianfan Xue et al.
KoLA: Carefully Benchmarking World Knowledge of Large Language Models
Jifan Yu, Xiaozhi Wang, Shangqing Tu et al.
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
Kai Zhang, Yi Luan, Hexiang Hu et al.
Human Alignment of Large Language Models through Online Preference Optimisation
Daniele Calandriello, Zhaohan Guo, REMI MUNOS et al.
MmAP: Multi-Modal Alignment Prompt for Cross-Domain Multi-Task Learning
Yi Xin, Junlong Du, Qiang Wang et al.
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning
Hao Zhao, Maksym Andriushchenko, Francesco Croce et al.
MaxMin-RLHF: Alignment with Diverse Human Preferences
Souradip Chakraborty, Jiahao Qiu, Hui Yuan et al.
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
Sewon Min, Suchin Gururangan, Eric Wallace et al.
VRP-SAM: SAM with Visual Reference Prompt
Yanpeng Sun, Jiahui Chen, Shan Zhang et al.