Most Cited 2024 "referring expression editing" Papers
DAP: A Dynamic Adversarial Patch for Evading Person Detectors
Amira Guesmi, Ruitian Ding, Muhammad Abdullah Hanif et al.
Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction
Zihao Liu, Xiaoyu Zhang, Guangwei Liu et al.
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
Qilang Ye, Zitong Yu, Rui Shao et al.
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Chaoyi Zhang, Kevin Lin, Zhengyuan Yang et al.
Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval
Yucheng Suo, Fan Ma, Linchao Zhu et al.
VCoder: Versatile Vision Encoders for Multimodal Large Language Models
Jitesh Jain, Jianwei Yang, Humphrey Shi
A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators
Chen Zhang, L. F. D’Haro, Yiming Chen et al.
How to Capture Higher-order Correlations? Generalizing Matrix Softmax Attention to Kronecker Computation
Josh Alman, Zhao Song
Feature Fusion from Head to Tail for Long-Tailed Visual Recognition
Mengke Li, Zhikai Hu, Yang Lu et al.
ArtBank: Artistic Style Transfer with Pre-trained Diffusion Model and Implicit Style Prompt Bank
Zhanjie Zhang, Quanwei Zhang, Wei Xing et al.
What does the Knowledge Neuron Thesis Have to do with Knowledge?
Jingcheng Niu, Andrew Liu, Zining Zhu et al.
Generative Latent Coding for Ultra-Low Bitrate Image Compression
Zhaoyang Jia, Jiahao Li, Bin Li et al.
Fully Sparse 3D Occupancy Prediction
Haisong Liu, Yang Chen, Haiguang Wang et al.
Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
Pingping Zhang, Yuhao Wang, Yang Liu et al.
FROSTER: Frozen CLIP is A Strong Teacher for Open-Vocabulary Action Recognition
Xiaohu Huang, Hao Zhou, Kun Yao et al.
Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit
Blake Bordelon, Lorenzo Noci, Mufan Li et al.
TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models
Aditya Aravind Chinchure, Pushkar Shukla, Gaurav Bhatt et al.
Gramformer: Learning Crowd Counting via Graph-Modulated Transformer
Hui Lin, Zhiheng Ma, Xiaopeng Hong et al.
Accurate Spatial Gene Expression Prediction by Integrating Multi-Resolution Features
Youngmin Chung, Ji Hun Ha, Kyeong Chan Im et al.
LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis
Kevin Xie, Tianshi Cao, Jonathan P Lorraine et al.
Communication-Efficient Collaborative Perception via Information Filling with Codebook
Yue Hu, Juntong Peng, Sifei Liu et al.
Simplifying Transformer Blocks
Bobby He, Thomas Hofmann
Why Larger Language Models Do In-context Learning Differently?
Zhenmei Shi, Junyi Wei, Zhuoyan Xu et al.
VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models
Shicheng Li, Lei Li, Yi Liu et al.
Group Preference Optimization: Few-Shot Alignment of Large Language Models
Siyan Zhao, John Dang, Aditya Grover
Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis
Xin Zhou, Dingkang Liang, Wei Xu et al.
GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation
Kai Chen, Enze Xie, Zhe Chen et al.
DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control
Yuru Jia, Lukas Hoyer, Shengyu Huang et al.
MS-DETR: Efficient DETR Training with Mixed Supervision
Chuyang Zhao, Yifan Sun, Wenhao Wang et al.
Watch Your Steps: Local Image and Scene Editing by Text Instructions
Ashkan Mirzaei, Tristan T Aumentado-Armstrong, Marcus A Brubaker et al.
Steve-Eye: Equipping LLM-based Embodied Agents with Visual Perception in Open Worlds
Sipeng Zheng, Jiazheng Liu, Yicheng Feng et al.
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Junyuan Hong, Jinhao Duan, Chenhui Zhang et al.
Causal Representation Learning from Multiple Distributions: A General Setting
Kun Zhang, Shaoan Xie, Ignavier Ng et al.
Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing
Hyelin Nam, Gihyun Kwon, Geon Yeong Park et al.
Improving Audio-Visual Segmentation with Bidirectional Generation
Dawei Hao, Yuxin Mao, Bowen He et al.
Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer
Yu Deng, Duomin Wang, Baoyuan Wang
Jumanji: a Diverse Suite of Scalable Reinforcement Learning Environments in JAX
Clément Bonnet, Daniel Luo, Donal Byrne et al.
Trajeglish: Traffic Modeling as Next-Token Prediction
Jonah Philion, Xue Bin Peng, Sanja Fidler
Self-Alignment of Large Language Models via Monopolylogue-based Social Scene Simulation
Xianghe Pang, Shuo Tang, Rui Ye et al.
OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers
Han Liang, Jiacheng Bao, Ruichi Zhang et al.
DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
Yixuan Wu, Yizhou Wang, Shixiang Tang et al.
Pre-trained Vision and Language Transformers Are Few-Shot Incremental Learners
Keon Hee Park, Kyungwoo Song, Gyeong-Moon Park
Breathing Life Into Sketches Using Text-to-Video Priors
Rinon Gal, Yael Vinker, Yuval Alaluf et al.
Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models
Didi Zhu, Zhongyi Sun, Zexi Li et al.
Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals
Yair Gat, Nitay Calderon, Amir Feder et al.
Grounded Question-Answering in Long Egocentric Videos
Shangzhe Di, Weidi Xie
When Fast Fourier Transform Meets Transformer for Image Restoration
Xingyu Jiang, Xiuhui Zhang, Ning Gao et al.
Linguistic Calibration of Long-Form Generations
Neil Band, Xuechen Li, Tengyu Ma et al.
RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation
Mahdi Nikdan, Soroush Tabesh, Elvir Crnčević et al.
Learning with Mixture of Prototypes for Out-of-Distribution Detection
Haodong Lu, Dong Gong, Shuo Wang et al.
Generating Human Motion in 3D Scenes from Text Descriptions
Zhi Cen, Huaijin Pi, Sida Peng et al.
Str2Str: A Score-based Framework for Zero-shot Protein Conformation Sampling
Jiarui Lu, Bozitao Zhong, Zuobai Zhang et al.
SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples
Phillip Howard, Avinash Madasu, Tiep Le et al.
LightIt: Illumination Modeling and Control for Diffusion Models
Peter Kocsis, Kalyan Sunkavalli, Julien Philip et al.
DeS3: Adaptive Attention-Driven Self and Soft Shadow Removal Using ViT Similarity
Yeying Jin, Wenhan Yang, W. Ye et al.
MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng, Yan Xie, Hao Zhang et al.
FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally
Qiuhong Shen, Xingyi Yang, Xinchao Wang
Improving Automatic VQA Evaluation Using Large Language Models
Oscar Mañas, Benno Krojer, Aishwarya Agrawal
Not All Voxels Are Equal: Hardness-Aware Semantic Scene Completion with Self-Distillation
Song Wang, Jiawei Yu, Wentong Li et al.
Mosaic-SDF for 3D Generative Models
Lior Yariv, Omri Puny, Oran Gafni et al.
Navigating the Design Space of Equivariant Diffusion-Based Generative Models for De Novo 3D Molecule Generation
Tuan Le, Julian Cremer, Frank Noe et al.
Reinforced Adaptive Knowledge Learning for Multimodal Fake News Detection
Litian Zhang, Xiaoming Zhang, Chaozhuo Li et al.
DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection
Lewei Yao, Renjie Pi, Jianhua Han et al.
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu, Michael Jordan, Jiantao Jiao
UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models
Yiming Zhao, Zhouhui Lian
JoMA: Demystifying Multilayer Transformers via Joint Dynamics of MLP and Attention
Yuandong Tian, Yiping Wang, Zhenyu Zhang et al.
Leveraging Vision-Language Models for Improving Domain Generalization in Image Classification
Sravanti Addepalli, Ashish Asokan, Lakshay Sharma et al.
UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction
Lan Feng, Mohammadhossein Bahari, Kaouther Messaoud et al.
Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models
Yixuan Ren, Yang Zhou, Jimei Yang et al.
Deep Networks Always Grok and Here is Why
Ahmed Imtiaz Humayun, Randall Balestriero, Richard Baraniuk
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
Yuiga Wada, Kanta Kaneda, Daichi Saito et al.
PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation
Zhenyu Li, Shariq Bhat, Peter Wonka
S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data
Xuyang Li, Danfeng Hong, Jocelyn Chanussot
Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling
Weijia Xu, Andrzej Banburski-Fahey, Nebojsa Jojic
Unifying Visual and Vision-Language Tracking via Contrastive Learning
HIMap: HybrId Representation Learning for End-to-end Vectorized HD Map Construction
Yi Zhou, Hui Zhang, Jiaqian Yu et al.
One-Prompt to Segment All Medical Images
Wu, Min Xu
Variational Learning is Effective for Large Deep Networks
Yuesong Shen, Nico Daheim, Bai Cong et al.
Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation
Tong Shao, Zhuotao Tian, Hang Zhao et al.
Digital Life Project: Autonomous 3D Characters with Social Intelligence
Zhongang Cai, Jianping Jiang, Zhongfei Qing et al.
Online Vectorized HD Map Construction using Geometry
Zhixin Zhang, Yiyuan Zhang, Xiaohan Ding et al.
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer
Jianjian Cao, Peng Ye, Shengze Li et al.
GLARE: Low Light Image Enhancement via Generative Latent Feature based Codebook Retrieval
Han Zhou, Wei Dong, Xiaohong Liu et al.
Boosting Diffusion Models with Moving Average Sampling in Frequency Domain
Yurui Qian, Qi Cai, Yingwei Pan et al.
MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data
Yinya Huang, Xiaohan Lin, Zhengying Liu et al.
SEPT: Towards Efficient Scene Representation Learning for Motion Prediction
Zhiqian Lan, Yuxuan Jiang, Yao Mu et al.
An Analysis of Linear Time Series Forecasting Models
William Toner, Luke Darlow
CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs
Yassine Ouali, Adrian Bulat, Brais Martinez et al.
FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models
Andrea Caraffa, Davide Boscaini, Amir Hamza et al.
RangeLDM: Fast Realistic LiDAR Point Cloud Generation
Qianjiang Hu, Zhimin Zhang, Wei Hu
Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration
Zhongzhi Yu, Zheng Wang, Yonggan Fu et al.
Towards Codable Watermarking for Injecting Multi-Bits Information to LLMs
Lean Wang, Wenkai Yang, Deli Chen et al.
On the Embedding Collapse when Scaling up Recommendation Models
Xingzhuo Guo, Junwei Pan, Ximei Wang et al.
CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update
Zhi Gao, Yuntao Du, Xintong Zhang et al.
A Dense Reward View on Aligning Text-to-Image Diffusion with Preference
Shentao Yang, Tianqi Chen, Mingyuan Zhou
Xformer: Hybrid X-Shaped Transformer for Image Denoising
Jiale Zhang, Yulun Zhang, Jinjin Gu et al.
RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios
Wenhao Ding, Yulong Cao, Ding Zhao et al.
SPTNet: An Efficient Alternative Framework for Generalized Category Discovery with Spatial Prompt Tuning
Hongjun Wang, Sagar Vaze, Kai Han
Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions
Weizhen He, Yiheng Deng, Shixiang Tang et al.
AltDiffusion: A Multilingual Text-to-Image Diffusion Model
Fulong Ye, Guang Liu, Xinya Wu et al.
Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness
Sibo Wang, Jie Zhang, Zheng Yuan et al.
PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations
Yang Zheng, Qingqing Zhao, Guandao Yang et al.
Smooth ECE: Principled Reliability Diagrams via Kernel Smoothing
Jaroslaw Blasiok, Preetum Nakkiran
LivePhoto: Real Image Animation with Text-guided Motion Control
Xi Chen, Zhiheng Liu, Mengting Chen et al.
Investigating the Effectiveness of Task-Agnostic Prefix Prompt for Instruction Following
Seonghyeon Ye, Hyeonbin Hwang, Sohee Yang et al.
Multimodal Industrial Anomaly Detection by Crossmodal Feature Mapping
Alex Costanzino, Pierluigi Zama Ramirez, Giuseppe Lisanti et al.
Localizing and Editing Knowledge In Text-to-Image Generative Models
Samyadeep Basu, Nanxuan Zhao, Vlad Morariu et al.
CasCast: Skillful High-resolution Precipitation Nowcasting via Cascaded Modelling
Junchao Gong, Lei Bai, Peng Ye et al.
Divide and not forget: Ensemble of selectively trained experts in Continual Learning
Grzegorz Rypeść, Sebastian Cygert, Valeriya Khan et al.
EasyTPP: Towards Open Benchmarking Temporal Point Processes
Siqiao Xue, Xiaoming Shi, Zhixuan Chu et al.
Global and Local Prompts Cooperation via Optimal Transport for Federated Learning
Hongxia Li, Wei Huang, Jingya Wang et al.
Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation
Zhipeng Du, Miaojing Shi, Jiankang Deng
Click-Gaussian: Interactive Segmentation to Any 3D Gaussians
Seokhun Choi, Hyeonseop Song, Jaechul Kim et al.
MagMax: Leveraging Model Merging for Seamless Continual Learning
Daniel Marczak, Bartlomiej Twardowski, Tomasz Trzcinski et al.
Dysen-VDM: Empowering Dynamics-aware Text-to-Video Diffusion with LLMs
Hao Fei, Shengqiong Wu, Wei Ji et al.
R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model
Changhoon Kim, Kyle Min, Yezhou Yang
Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models
Xiaoshi Wu, Yiming Hao, Manyuan Zhang et al.
Improving Image Restoration through Removing Degradations in Textual Representations
Jingbo Lin, Zhilu Zhang, Yuxiang Wei et al.
MyVLM: Personalizing VLMs for User-Specific Queries
Yuval Alaluf, Elad Richardson, Sergey Tulyakov et al.
On Computational Limits of Modern Hopfield Models: A Fine-Grained Complexity Analysis
Jerry Yao-Chieh Hu, Thomas Lin, Zhao Song et al.
RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis
Yao Mu, Junting Chen, Qing-Long Zhang et al.
One-shot Empirical Privacy Estimation for Federated Learning
Galen Andrew, Peter Kairouz, Sewoong Oh et al.
Scaling Laws for Sparsely-Connected Foundation Models
Elias Frantar, Carlos Riquelme Ruiz, Neil Houlsby et al.
LEAP-VO: Long-term Effective Any Point Tracking for Visual Odometry
Weirong Chen, Le Chen, Rui Wang et al.
DS-AL: A Dual-Stream Analytic Learning for Exemplar-Free Class-Incremental Learning
Huiping Zhuang, Run He, Kai Tong et al.
GAIA: Zero-shot Talking Avatar Generation
Tianyu He, Junliang Guo, Runyi Yu et al.
Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis
Kai Chen, Chunwei Wang, Kuo Yang et al.
SocialCircle: Learning the Angle-based Social Interaction Representation for Pedestrian Trajectory Prediction
Conghao Wong, Beihao Xia, Ziqian Zou et al.
MGMap: Mask-Guided Learning for Online Vectorized HD Map Construction
Xiaolu Liu, Song Wang, Wentong Li et al.
Point Segment and Count: A Generalized Framework for Object Counting
Zhizhong Huang, Mingliang Dai, Yi Zhang et al.
Get What You Want, Not What You Don't: Image Content Suppression for Text-to-Image Diffusion Models
Senmao Li, Joost van de Weijer, Taihang Hu et al.
eCeLLM: Generalizing Large Language Models for E-commerce from Large-scale, High-quality Instruction Data
Peng, Xinyi Ling, Ziru Chen et al.
Dual Operating Modes of In-Context Learning
Ziqian Lin, Kangwook Lee
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
Sreyan Ghosh, Ashish Seth, Sonal Kumar et al.
Understanding and Mitigating the Label Noise in Pre-training on Downstream Tasks
Hao Chen, Jindong Wang, Ankit Parag Shah et al.
DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models
Namhyuk Ahn, Junsoo Lee, Chunggi Lee et al.
Active Preference Learning for Large Language Models
William Muldrew, Peter Hayes, Mingtian Zhang et al.
DAVE - A Detect-and-Verify Paradigm for Low-Shot Counting
Jer Pelhan, Alan Lukezic, Vitjan Zavrtanik et al.
Improving fine-grained understanding in image-text pre-training
Ioana Bica, Anastasija Ilic, Matthias Bauer et al.
DrM: Mastering Visual Reinforcement Learning through Dormant Ratio Minimization
Guowei Xu, Ruijie Zheng, Yongyuan Liang et al.
LEAD: Learning Decomposition for Source-free Universal Domain Adaptation
Sanqing Qu, Tianpei Zou, Lianghua He et al.
Abductive Ego-View Accident Video Understanding for Safe Driving Perception
Jianwu Fang, Lei-lei Li, Junfei Zhou et al.
Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation
Haojie Zhang, Yongyi Su, Xun Xu et al.
RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features
Geonho Bang, Kwangjin Choi, Jisong Kim et al.
NeuroNCAP: Photorealistic Closed-loop Safety Testing for Autonomous Driving
William Ljungbergh, Adam Tonderski, Joakim Johnander et al.
SingularTrajectory: Universal Trajectory Predictor Using Diffusion Model
Inhwan Bae, Young-Jae Park, Hae-Gon Jeon
Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment
Siyao Li, Tianpei Gu, Zhitao Yang et al.
DrivingDiffusion: Layout-Guided Multi-View Driving Scenarios Video Generation with Latent Diffusion Model
Li Xiaofan, Zhang Yifu, Xiaoqing Ye
Learning Transferable Negative Prompts for Out-of-Distribution Detection
Tianqi Li, Guansong Pang, Wenjun Miao et al.
ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance
Yongwei Chen, Tengfei Wang, Tong Wu et al.
Vision-and-Language Navigation via Causal Learning
Liuyi Wang, Zongtao He, Ronghao Dang et al.
A Resilient and Accessible Distribution-Preserving Watermark for Large Language Models
Yihan Wu, Zhengmian Hu, Junfeng Guo et al.
CAD-SIGNet: CAD Language Inference from Point Clouds using Layer-wise Sketch Instance Guided Attention
Mohammad Sadil Khan, Elona Dupont, Sk Aziz Ali et al.
Multimodal Prompt Perceiver: Empower Adaptiveness, Generalizability and Fidelity for All-in-One Image Restoration
Yuang Ai, Huaibo Huang, Xiaoqiang Zhou et al.
Sign2GPT: Leveraging Large Language Models for Gloss-Free Sign Language Translation
Ryan Wong, Necati Cihan Camgoz, Richard Bowden
AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation
Haonan Wang, Qixiang Zhang, Yi Li et al.
Towards Diverse Behaviors: A Benchmark for Imitation Learning with Human Demonstrations
Xiaogang Jia, Denis Blessing, Xinkai Jiang et al.
Exploiting Diffusion Prior for Generalizable Dense Prediction
Hsin-Ying Lee, Hung-Yu Tseng, Hsin-Ying Lee et al.
Gradient-based Parameter Selection for Efficient Fine-Tuning
Zhi Zhang, Qizhe Zhang, Zijun Gao et al.
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
Rizhao Cai, Zirui Song, Dayan Guan et al.
Amodal Ground Truth and Completion in the Wild
Guanqi Zhan, Chuanxia Zheng, Weidi Xie et al.
Neural Sign Actors: A Diffusion Model for 3D Sign Language Production from Text
Vasileios Baltatzis, Rolandos Alexandros Potamias, Evangelos Ververas et al.
BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling
Cheng Peng, Yutao Tang, Yifan Zhou et al.
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
Rongjie Li, Songyang Zhang, Dahua Lin et al.
EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
Jiaxuan Li, Duc Minh Vo, Akihiro Sugimoto et al.
Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles
Rui Song, Chenwei Liang, Hu Cao et al.
Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL
Hao Sun, Alihan Hüyük, Mihaela van der Schaar
A Multimodal Automated Interpretability Agent
Tamar Rott Shaham, Sarah Schwettmann, Franklin Wang et al.
LoTa-Bench: Benchmarking Language-oriented Task Planners for Embodied Agents
Jae-Woo Choi, Youngwoo Yoon et al.
Bridging Remote Sensors with Multisensor Geospatial Foundation Models
Boran Han, Shuai Zhang, Xingjian Shi et al.
One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models
Lin Li, Haoyan Guan, Jianing Qiu et al.
Graph Metanetworks for Processing Diverse Neural Architectures
Derek Lim, Haggai Maron, Marc T Law et al.
Cross-Layer and Cross-Sample Feature Optimization Network for Few-Shot Fine-Grained Image Classification
Zhen-Xiang Ma, Zhen-Duo Chen, Li-Jun Zhao et al.
Efficient Multi-scale Network with Learnable Discrete Wavelet Transform for Blind Motion Deblurring
Xin Gao, Tianheng Qiu, Xinyu Zhang et al.
LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model
Yulin Luo, Ruichuan An, Bocheng Zou et al.
Learn to Follow: Decentralized Lifelong Multi-Agent Pathfinding via Planning and Learning
Alexey Skrynnik, Anton Andreychuk, Maria Nesterova et al.
ReGenNet: Towards Human Action-Reaction Synthesis
Liang Xu, Yizhou Zhou, Yichao Yan et al.
Towards Surveillance Video-and-Language Understanding: New Dataset, Baselines, and Challenges
Tongtong Yuan, Xuange Zhang, Kun Liu et al.
Real-Fake: Effective Training Data Synthesis Through Distribution Matching
Jianhao Yuan, Jie Zhang, Shuyang Sun et al.
TOP-ReID: Multi-Spectral Object Re-identification with Token Permutation
Yuhao Wang, Xuehu Liu, Pingping Zhang et al.
Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision
Yi Yu, Xue Yang, Qingyun Li et al.
On the Error Analysis of 3D Gaussian Splatting and an Optimal Projection Strategy
Letian Huang, Jiayang Bai, Jie Guo et al.
DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval
Xiangpeng Yang, Linchao Zhu, Xiaohan Wang et al.
Posterior Distillation Sampling
Juil Koo, Chanho Park, Minhyuk Sung
ParCo: Part-Coordinating Text-to-Motion Synthesis
Qiran Zou, Shangyuan Yuan, Shian Du et al.
Bridging State and History Representations: Understanding Self-Predictive RL
Tianwei Ni, Benjamin Eysenbach, Erfan Seyedsalehi et al.
Improved Probabilistic Image-Text Representations
Sanghyuk Chun
Equivariant Graph Neural Operator for Modeling 3D Dynamics
Minkai Xu, Jiaqi Han, Aaron Lou et al.
MG-TSD: Multi-Granularity Time Series Diffusion Models with Guided Learning Process
Xinyao Fan, Yueying Wu, Chang Xu et al.
KVQ: Kwai Video Quality Assessment for Short-form Videos
Yiting Lu, Xin Li, Yajing Pei et al.
Generative Modeling of Regular and Irregular Time Series Data via Koopman VAEs
Ilan Naiman, N. Benjamin Erichson, Pu Ren et al.
UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All
Yuanhuiyi Lyu, Xu Zheng, Jiazhou Zhou et al.
Unsupervised Continual Anomaly Detection with Contrastively-Learned Prompt
Jiaqi Liu, Kai Wu, Qiang Nie et al.
Diffusion Reward: Learning Rewards via Conditional Video Diffusion
Tao Huang, Guangqi Jiang, Yanjie Ze et al.
A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation
Qucheng Peng, Ce Zheng, Chen Chen
LLMRG: Improving Recommendations through Large Language Model Reasoning Graphs
Yan Wang, Zhixuan Chu, Xin Ouyang et al.
Zero-Shot Detection of AI-Generated Images
Davide Cozzolino, Giovanni Poggi, Matthias Niessner et al.
LLMs are Good Action Recognizers
Haoxuan Qu, Yujun Cai, Jun Liu
Unveiling the Pitfalls of Knowledge Editing for Large Language Models
Zhoubo Li, Ningyu Zhang, Yunzhi Yao et al.
LLM-Assisted Code Cleaning For Training Accurate Code Generators
Naman Jain, Tianjun Zhang, Wei-Lin Chiang et al.