Most Cited 2024 "gradient similarity" Papers
SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion
Hsuan-I Ho, Jie Song, Otmar Hilliges
ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation
Mengcheng Lan, Chaofeng Chen, Yiping Ke et al.
Polynormer: Polynomial-Expressive Graph Transformer in Linear Time
Chenhui Deng, Zichao Yue, Zhiru Zhang
GenTron: Diffusion Transformers for Image and Video Generation
Shoufa Chen, Mengmeng Xu, Jiawei Ren et al.
Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs
Yuxin Zhang, Lirui Zhao, Mingbao Lin et al.
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
Ziyue Jiang, Jinglin Liu, Yi Ren et al.
Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance
Donghoon Ahn, Hyoungwon Cho, Jaewon Min et al.
DITTO: Diffusion Inference-Time T-Optimization for Music Generation
Zachary Novack, Julian McAuley, Taylor Berg-Kirkpatrick et al.
Integer-Valued Training and Spike-driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection
Xinhao Luo, Man Yao, Yuhong Chou et al.
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
Muyao Niu, Xiaodong Cun, Xintao Wang et al.
RegionGPT: Towards Region Understanding Vision Language Model
Qiushan Guo, Shalini De Mello, Danny Yin et al.
Stay on Topic with Classifier-Free Guidance
Guillaume Sanchez, Alexander Spangher, Honglu Fan et al.
EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models
Yefei He, Jing Liu, Weijia Wu et al.
Depth Information Assisted Collaborative Mutual Promotion Network for Single Image Dehazing
Yafei Zhang, Shen Zhou, Huafeng Li
LaRE^2: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection
Yunpeng Luo, Junlong Du, Ke Yan et al.
DE-COP: Detecting Copyrighted Content in Language Models Training Data
André Duarte, Xuandong Zhao, Arlindo Oliveira et al.
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks
Jiwon Song, Kyungseok Oh, Taesu Kim et al.
Generating Human Interaction Motions in Scenes with Text Control
Hongwei Yi, Justus Thies, Michael J. Black et al.
DistiLLM: Towards Streamlined Distillation for Large Language Models
Jongwoo Ko, Sungnyun Kim, Tianyi Chen et al.
ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion
Jiayu Yang, Ziang Cheng, Yunfei Duan et al.
Relay Diffusion: Unifying diffusion process across resolutions for image synthesis
Jiayan Teng, Wendi Zheng, Ming Ding et al.
Guidance with Spherical Gaussian Constraint for Conditional Diffusion
Lingxiao Yang, Shutong Ding, Yifan Cai et al.
SpikingBERT: Distilling BERT to Train Spiking Language Models Using Implicit Differentiation
Malyaban Bal, Abhronil Sengupta
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao, Nitesh Bharadwaj Gundavarapu, Liangzhe Yuan et al.
VISTA-LLAMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens
Fan Ma, Xiaojie Jin, Heng Wang et al.
PromptTTS 2: Describing and Generating Voices with Text Prompt
Yichong Leng, Zhifang Guo, Kai Shen et al.
Teaching Large Language Models to Translate with Comparison
Jiali Zeng, Fandong Meng, Yongjing Yin et al.
Overthinking the Truth: Understanding how Language Models Process False Demonstrations
Danny Halawi, Jean-Stanislas Denain, Jacob Steinhardt
DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling
Haoran Li, Haolin Shi, Wenli Zhang et al.
MoCha-Stereo: Motif Channel Attention Network for Stereo Matching
Ziyang Chen, Wei Long, He Yao et al.
LLMs are Good Sign Language Translators
Jia Gong, Lin Geng Foo, Yixuan He et al.
EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion
Zehuan Huang, Hao Wen, Junting Dong et al.
SkeletonGait: Gait Recognition Using Skeleton Maps
Chao Fan, Jingzhe Ma, Dongyang Jin et al.
Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation
Guy Yariv, Itai Gat, Sagie Benaim et al.
Confucius: Iterative Tool Learning from Introspection Feedback by Easy-to-Difficult Curriculum
Zhengliang Shi, Shen Gao, Minghang Zhu et al.
Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology
Wenhao Tang, Fengtao Zhou, Sheng Huang et al.
SMFANet: A Lightweight Self-Modulation Feature Aggregation Network for Efficient Image Super-Resolution
Mingjun Zheng, Long Sun, Jiangxin Dong et al.
Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation
Zhenliang Ni, Xinghao Chen, Yingjie Zhai et al.
GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation
Mukul Khanna, Ram Ramrakhya, Gunjan Chhablani et al.
ChatPose: Chatting about 3D Human Pose
Yao Feng, Jing Lin, Sai Kumar Dwivedi et al.
WAVES: Benchmarking the Robustness of Image Watermarks
Bang An, Mucong Ding, Tahseen Rabbani et al.
Vanilla Bayesian Optimization Performs Great in High Dimensions
Carl Hvarfner, Erik Hellsten, Luigi Nardi
OpenBias: Open-set Bias Detection in Text-to-Image Generative Models
Moreno D'Incà, Elia Peruzzo et al.
UniPAD: A Universal Pre-training Paradigm for Autonomous Driving
Honghui Yang, Sha Zhang, Di Huang et al.
Inducing High Energy-Latency of Large Vision-Language Models with Verbose Images
Kuofeng Gao, Yang Bai, Jindong Gu et al.
Fast Adversarial Attacks on Language Models In One GPU Minute
Vinu Sankar Sadasivan, Shoumik Saha, Gaurang Sriramanan et al.
HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors
Xiao Wang, Zongzhen Wu, Bo Jiang et al.
GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting
Xinjie Zhang, Xingtong Ge, Tongda Xu et al.
OmniGlue: Generalizable Feature Matching with Foundation Model Guidance
Hanwen Jiang, Arjun Karpur, Bingyi Cao et al.
Enhancing Job Recommendation through LLM-Based Generative Adversarial Networks
Yingpeng Du, Di Luo, Rui Yan et al.
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
Evonne Ng, Javier Romero, Timur Bagautdinov et al.
In-Context Learning through the Bayesian Prism
Madhur Panwar, Kabir Ahuja, Navin Goyal
Learning to Model the World With Language
Jessy Lin, Yuqing Du, Olivia Watkins et al.
Exploiting Style Latent Flows for Generalizing Deepfake Video Detection
Jongwook Choi, Taehoon Kim, Yonghyun Jeong et al.
Frequency-Aware Transformer for Learned Image Compression
Han Li, Shaohui Li, Wenrui Dai et al.
TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection
Hao Sun, Mingyao Zhou, Wenjing Chen et al.
Plug-In Diffusion Model for Sequential Recommendation
Haokai Ma, Ruobing Xie, Lei Meng et al.
TorchRL: A data-driven decision-making library for PyTorch
Albert Bou, Matteo Bettini, Sebastian Dittert et al.
HDMixer: Hierarchical Dependency with Extendable Patch for Multivariate Time Series Forecasting
Qihe Huang, Lei Shen, Ruixin Zhang et al.
Towards Generalizable Tumor Synthesis
Qi Chen, Xiaoxi Chen, Haorui Song et al.
Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
Zhewei Yao, Xiaoxia Wu, Cheng Li et al.
Frequency-Spatial Entanglement Learning for Camouflaged Object Detection
Yanguang Sun, Chunyan Xu, Jian Yang et al.
Scalable Diffusion for Materials Generation
Sherry Yang, Kwanghwan Cho, Amil Merchant et al.
End-to-End Rate-Distortion Optimized 3D Gaussian Representation
Henan Wang, Hanxin Zhu, Tianyu He et al.
OneRestore: A Universal Restoration Framework for Composite Degradation
Yu Guo, Yuan Gao, Yuxu Lu et al.
DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving
Chen Min, Dawei Zhao, Liang Xiao et al.
DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing
Minghao Chen, Iro Laina, Andrea Vedaldi
When Do We Not Need Larger Vision Models?
Baifeng Shi, Ziyang Wu, Maolin Mao et al.
Beyond First-Order Tweedie: Solving Inverse Problems using Latent Diffusion
Litu Rout, Yujia Chen, Abhishek Kumar et al.
GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning
Ye Yuan, Xueting Li, Yangyi Huang et al.
Simple Hierarchical Planning with Diffusion
Chang Chen, Fei Deng, Kenji Kawaguchi et al.
InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
Boxin Wang, Wei Ping, Lawrence McAfee et al.
MindBridge: A Cross-Subject Brain Decoding Framework
Shizun Wang, Songhua Liu, Zhenxiong Tan et al.
Accelerating Convergence of Score-Based Diffusion Models, Provably
Gen Li, Yu Huang, Timofey Efimov et al.
Learning a Diffusion Model Policy from Rewards via Q-Score Matching
Michael Psenka, Alejandro Escontrela, Pieter Abbeel et al.
Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization
Yiyang Chen, Zhedong Zheng, Wei Ji et al.
Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining
Licong Lin, Yu Bai, Song Mei
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
Zhihao Yuan, Jinke Ren, Chun-Mei Feng et al.
InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior
Chenguo Lin, Yadong Mu
Harnessing Large Language Models for Training-free Video Anomaly Detection
Luca Zanella, Willi Menapace, Massimiliano Mancini et al.
DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting
Shijie Zhou, Zhiwen Fan, Dejia Xu et al.
DiffDA: a Diffusion model for weather-scale Data Assimilation
Langwen Huang, Lukas Gianinazzi, Yuejiang Yu et al.
On the Learnability of Watermarks for Language Models
Chenchen Gu, Xiang Li, Percy Liang et al.
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
Wendi Zheng, Jiayan Teng, Zhuoyi Yang et al.
Circumventing Concept Erasure Methods For Text-To-Image Generative Models
Minh Pham, Kelly Marshall, Niv Cohen et al.
Detector-Free Structure from Motion
Xingyi He, Jiaming Sun, Yifan Wang et al.
ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models
Lukas Höllein, Aljaž Božič, Norman Müller et al.
NeRF-LiDAR: Generating Realistic LiDAR Point Clouds with Neural Radiance Fields
Junge Zhang, Feihu Zhang, Shaochen Kuang et al.
DOGE: Domain Reweighting with Generalization Estimation
Simin Fan, Matteo Pagliardini, Martin Jaggi
FINER: Flexible Spectral-bias Tuning in Implicit NEural Representation by Variable-periodic Activation Functions
Zhen Liu, Hao Zhu, Qi Zhang et al.
Learning to Rank in Generative Retrieval
Yongqi Li, Nan Yang, Liang Wang et al.
How do Language Models Bind Entities in Context?
Jiahai Feng, Jacob Steinhardt
Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis
Basile Van Hoorick, Rundi Wu, Ege Ozguroglu et al.
LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs
Yunsheng Ma, Can Cui, Xu Cao et al.
Deep Confident Steps to New Pockets: Strategies for Docking Generalization
Gabriele Corso, Arthur Deng, Nicholas Polizzi et al.
Learning Content-Enhanced Mask Transformer for Domain Generalized Urban-Scene Segmentation
Qi Bi, Shaodi You, Theo Gevers
VoroNav: Voronoi-based Zero-shot Object Navigation with Large Language Model
Pengying Wu, Yao Mu, Bingxian Wu et al.
Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval
Jiamian Wang, Guohao Sun, Pichao Wang et al.
DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing
Kaiwen Zhang, Yifan Zhou, Xudong Xu et al.
Generative-Based Fusion Mechanism for Multi-Modal Tracking
Zhangyong Tang, Tianyang Xu, Xiaojun Wu et al.
Free3D: Consistent Novel View Synthesis without 3D Representation
Chuanxia Zheng, Andrea Vedaldi
Generating Images of Rare Concepts Using Pre-trained Diffusion Models
Dvir Samuel, Rami Ben-Ari, Simon Raviv et al.
AVID: Any-Length Video Inpainting with Diffusion Model
Zhixing Zhang, Bichen Wu, Xiaoyan Wang et al.
Guiding Masked Representation Learning to Capture Spatio-Temporal Relationship of Electrocardiogram
Yeongyeon Na, Minje Park, Yunwon Tae et al.
SolidGen: An Autoregressive Model for Direct B-rep Synthesis
Karl Willis, Joseph Lambourne, Nigel Morris et al.
Task-Customized Mixture of Adapters for General Image Fusion
Pengfei Zhu, Yang Sun, Bing Cao et al.
Open-Vocabulary Video Anomaly Detection
Peng Wu, Xuerong Zhou, Guansong Pang et al.
Aligning and Prompting Everything All at Once for Universal Visual Perception
Yunhang Shen, Chaoyou Fu, Peixian Chen et al.
EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis
Shuai Tan, Bin Ji, Mengxiao Bi et al.
OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models
Kong Zhe, Yong Zhang, Tianyu Yang et al.
Learning to Unlearn: Instance-Wise Unlearning for Pre-trained Classifiers
Sungmin Cha, Sungjun Cho, Dasol Hwang et al.
Asymmetry in Low-Rank Adapters of Foundation Models
Jiacheng Zhu, Kristjan Greenewald, Kimia Nadjahi et al.
Tensor Programs VI: Feature Learning in Infinite Depth Neural Networks
Greg Yang, Dingli Yu, Chen Zhu et al.
Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic
Sachin Goyal, Pratyush Maini, Zachary Lipton et al.
Optimizing Diffusion Noise Can Serve As Universal Motion Priors
Korrawe Karunratanakul, Konpat Preechakul, Emre Aksan et al.
Recursive Generalization Transformer for Image Super-Resolution
Zheng Chen, Yulun Zhang, Jinjin Gu et al.
Stochastic Interpolants with Data-Dependent Couplings
Michael Albergo, Mark Goldstein, Nicholas Boffi et al.
Towards Realistic Scene Generation with LiDAR Diffusion Models
Haoxi Ran, Vitor Guizilini, Yue Wang
OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning
Haiyang Ying, Yixuan Yin, Jinzhi Zhang et al.
Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
Mert Yuksekgonul, Varun Chandrasekaran, Erik Jones et al.
QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models
Jing Liu, Ruihao Gong, Xiuying Wei et al.
Grokking as the transition from lazy to rich training dynamics
Tanishq Kumar, Blake Bordelon, Samuel Gershman et al.
Multi-Source Diffusion Models for Simultaneous Music Generation and Separation
Giorgio Mariani, Irene Tallini, Emilian Postolache et al.
BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP
Jiawang Bai, Kuofeng Gao, Shaobo Min et al.
Scaling for Training Time and Post-hoc Out-of-distribution Detection Enhancement
Kai Xu, Rongyu Chen, Gianni Franchi et al.
MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning
Vishal Nedungadi, Ankit Kariryaa, Stefan Oehmcke et al.
PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution
Honghao Chen, Xiangxiang Chu, Ren Yongjian et al.
BAT: Behavior-Aware Human-Like Trajectory Prediction for Autonomous Driving
Haicheng Liao, Zhenning Li, Huanming Shen et al.
TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos
Yufu Wang, Ziyun Wang, Lingjie Liu et al.
Deep Temporal Graph Clustering
Meng Liu, Yue Liu, Ke Liang et al.
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
Juhong Min, Shyamal Buch, Arsha Nagrani et al.
Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models
Chang Liu, Haoning Wu, Yujie Zhong et al.
Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery
Sukrut Rao, Sweta Mahajan, Moritz Böhle et al.
Gaussian Shell Maps for Efficient 3D Human Generation
Rameen Abdal, Wang Yifan, Zifan Shi et al.
An Aggregation-Free Federated Learning for Tackling Data Heterogeneity
Yuan Wang, Huazhu Fu, Renuga Kanagavelu et al.
CorrMatch: Label Propagation via Correlation Matching for Semi-Supervised Semantic Segmentation
Bo-Yuan Sun, Yuqi Yang, Le Zhang et al.
LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery
Pingchuan Ma, Johnson Tsun-Hsuan Wang, Minghao Guo et al.
Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition
Feng Lu, Lijun Zhang, Xiangyuan Lan et al.
SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks
Xinyu Shi, Zecheng Hao, Zhaofei Yu
Matryoshka Diffusion Models
Jiatao Gu, Shuangfei Zhai, Yizhe Zhang et al.
Image Fusion via Vision-Language Model
Zixiang Zhao, Lilun Deng, Haowen Bai et al.
SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution
Zhixuan Liang, Yao Mu, Hengbo Ma et al.
Successor Heads: Recurring, Interpretable Attention Heads In The Wild
Rhys Gould, Euan Ong, George Ogden et al.
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
Feng Liang, Bichen Wu, Jialiang Wang et al.
RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content
Zhuowen Yuan, Zidi Xiong, Yi Zeng et al.
Looped Transformers are Better at Learning Learning Algorithms
Liu Yang, Kangwook Lee, Robert Nowak et al.
GIVT: Generative Infinite-Vocabulary Transformers
Michael Tschannen, Cian Eastwood, Fabian Mentzer
Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation
Shanshan Zhong, Zhongzhan Huang, Shanghua Gao et al.
Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer
Eric Brachmann, Jamie Wynn, Shuai Chen et al.
Space Group Constrained Crystal Generation
Rui Jiao, Wenbing Huang, Yu Liu et al.
Attention-Challenging Multiple Instance Learning for Whole Slide Image Classification
Yunlong Zhang, Honglin Li, Yuxuan Sun et al.
Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting
Haipeng Liu, Yang Wang, Biao Qian et al.
Long-Tail Learning with Foundation Model: Heavy Fine-Tuning Hurts
Jiang-Xin Shi, Tong Wei, Zhi Zhou et al.
NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild
Weining Ren, Zihan Zhu, Boyang Sun et al.
DiffusionTrack: Diffusion Model for Multi-Object Tracking
Run Luo, Zikai Song, Lintao Ma et al.
An Emulator for Fine-tuning Large Language Models using Small Language Models
Eric Mitchell, Rafael Rafailov, Archit Sharma et al.
Alleviating Exposure Bias in Diffusion Models through Sampling with Shifted Time Steps
Mingxiao Li, Tingyu Qu, Ruicong Yao et al.
Vlogger: Make Your Dream A Vlog
Shaobin Zhuang, Kunchang Li, Xinyuan Chen et al.
TC4D: Trajectory-Conditioned Text-to-4D Generation
Sherwin Bahmani, Xian Liu, Wang Yifan et al.
Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot
Fabien Baradel, Thomas Lucas, Matthieu Armando et al.
TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models
Yushi Huang, Ruihao Gong, Jing Liu et al.
Light the Night: A Multi-Condition Diffusion Framework for Unpaired Low-Light Enhancement in Autonomous Driving
Jinlong Li, Baolu Li, Zhengzhong Tu et al.
Evaluating the Zero-shot Robustness of Instruction-tuned Language Models
Jiuding Sun, Chantal Shaib, Byron Wallace
MonoCD: Monocular 3D Object Detection with Complementary Depths
Longfei Yan, Pei Yan, Shengzhou Xiong et al.
Large Language Models Can Automatically Engineer Features for Few-Shot Tabular Learning
Sungwon Han, Jinsung Yoon, Sercan Arik et al.
Make RepVGG Greater Again: A Quantization-Aware Approach
Xuesong Nie, Yunfeng Yan, Siyuan Li et al.
Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation
Hang Li, Chengzhi Shen, Philip H.S. Torr et al.
Video Interpolation with Diffusion Models
Siddhant Jain, Daniel Watson, Aleksander Holynski et al.
C-TPT: Calibrated Test-Time Prompt Tuning for Vision-Language Models via Text Feature Dispersion
Hee Suk Yoon, Eunseop Yoon, Joshua Tian Jin Tee et al.
GIM: Learning Generalizable Image Matcher From Internet Videos
Xuelun Shen, Zhipeng Cai, Wei Yin et al.
On the Foundations of Shortcut Learning
Katherine Hermann, Hossein Mobahi, Thomas Fel et al.
EMOPortraits: Emotion-enhanced Multimodal One-shot Head Avatars
Nikita Drobyshev, Antoni Bigata Casademunt, Konstantinos Vougioukas et al.
TabR: Tabular Deep Learning Meets Nearest Neighbors
Yury Gorishniy, Ivan Rubachev, Nikolay Kartashev et al.
GraphCare: Enhancing Healthcare Predictions with Personalized Knowledge Graphs
Pengcheng Jiang, Cao Xiao, Adam Cross et al.
Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation
Danny Halawi, Alexander Wei, Eric Wallace et al.
NExT: Teaching Large Language Models to Reason about Code Execution
Ansong Ni, Miltiadis Allamanis, Arman Cohan et al.
Repoformer: Selective Retrieval for Repository-Level Code Completion
Di Wu, Wasi Ahmad, Dejiao Zhang et al.
Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships
Sebastian Koch, Narunas Vaskevicius, Mirco Colosi et al.
GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding
Cunxiao Du, Jing Jiang, Xu Yuanchen et al.
A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models
Julio Silva-Rodríguez, Sina Hajimiri, Ismail Ben Ayed et al.
Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions
Oindrila Saha, Grant Horn, Subhransu Maji
Don't Trust: Verify -- Grounding LLM Quantitative Reasoning with Autoformalization
Jin Zhou, Charles Staats, Wenda Li et al.
Dense Reward for Free in Reinforcement Learning from Human Feedback
Alexander Chan, Hao Sun, Samuel Holt et al.
Source-Free Domain Adaptation with Frozen Multimodal Foundation Model
Song Tang, Wenxin Su, Mao Ye et al.
Language-Image Pre-training with Long Captions
Kecheng Zheng, Yifei Zhang, Wei Wu et al.
Robust agents learn causal world models
Jonathan Richens, Tom Everitt
Variational Bayesian Last Layers
James Harrison, John Willes, Jasper Snoek
MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance
Yake Wei, Di Hu
Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons
Yuheng Chen, Pengfei Cao, Yubo Chen et al.
Koala: Key Frame-Conditioned Long Video-LLM
Reuben Tan, Ximeng Sun, Ping Hu et al.
Relation DETR: Exploring Explicit Position Relation Prior for Object Detection
Xiuquan Hou, Meiqin Liu, Senlin Zhang et al.
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
Zhangyang Qi, Ye Fang, Zeyi Sun et al.
Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding
Zhiheng Cheng, Qingyue Wei, Hongru Zhu et al.
Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation
guo, Tianwei Lin
Masked Audio Generation using a Single Non-Autoregressive Transformer
Alon Ziv, Itai Gat, Gael Le Lan et al.
ImagenHub: Standardizing the evaluation of conditional image generation models
Max Ku, Tianle Li, Kai Zhang et al.
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech
Jaehyeon Kim, Keon Lee, Seungjun Chung et al.
Prompt-tuning Latent Diffusion Models for Inverse Problems
Hyungjin Chung, Jong Chul Ye, Peyman Milanfar et al.
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation
Shuting He, Henghui Ding
What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
Aaditya Singh, Ted Moskovitz, Felix Hill et al.