Most Cited 2024 "action space smoothness" Papers
12,324 papers found • Page 3 of 62
Conference
CLIP-KD: An Empirical Study of CLIP Model Distillation
Chuanguang Yang, Zhulin An, Libo Huang et al.
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
Qing Jiang, Feng Li, Zhaoyang Zeng et al.
FedDAT: An Approach for Foundation Model Finetuning in Multi-Modal Heterogeneous Federated Learning
Haokun Chen, Yao Zhang, Denis Krompass et al.
KoLA: Carefully Benchmarking World Knowledge of Large Language Models
Jifan Yu, Xiaozhi Wang, Shangqing Tu et al.
Prompt-Based Distribution Alignment for Unsupervised Domain Adaptation
Shuanghao Bai, Min Zhang, Wanqi Zhou et al.
AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion
yitong jiang, Zhaoyang Zhang, Tianfan Xue et al.
Finetuning Text-to-Image Diffusion Models for Fairness
Xudong Shen, Chao Du, Tianyu Pang et al.
VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding
Yi Xin, Junlong Du, Qiang Wang et al.
How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
Jingfeng Wu, Difan Zou, Zixiang Chen et al.
The Hedgehog & the Porcupine: Expressive Linear Attentions with Softmax Mimicry
Michael Zhang, Kush Bhatia, Hermann Kumbong et al.
Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation
Alexander Raistrick, Lingjie Mei, Karhan Kayan et al.
EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World
Yifei Huang, Guo Chen, Jilan Xu et al.
Human Feedback is not Gold Standard
Tom Hosking, Phil Blunsom, Max Bartolo
Detecting, Explaining, and Mitigating Memorization in Diffusion Models
Yuxin Wen, Yuchen Liu, Chen Chen et al.
Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery
Mubashir Noman, Muzammal Naseer, Hisham Cholakkal et al.
Bidirectional Multi-Scale Implicit Neural Representations for Image Deraining
Xiang Chen, Jinshan Pan, Jiangxin Dong
TUMTraf V2X Cooperative Perception Dataset
Walter Zimmer, Gerhard Arya Wardana, Suren Sritharan et al.
Video ReCap: Recursive Captioning of Hour-Long Videos
Md Mohaiminul Islam, Vu Bao Ngan Ho, Xitong Yang et al.
PSALM: Pixelwise Segmentation with Large Multi-modal Model
Zheng Zhang, YeYao Ma, Enming Zhang et al.
Prompt-Free Diffusion: Taking “Text” out of Text-to-Image Diffusion Models
Xingqian Xu, Jiayi Guo, Zhangyang Wang et al.
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
Zhening Huang, Xiaoyang Wu, Xi Chen et al.
Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation
Yuanchen Ju, Kaizhe Hu, Guowei Zhang et al.
In-Context Pretraining: Language Modeling Beyond Document Boundaries
Weijia Shi, Sewon Min, Maria Lomeli et al.
Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering
Han Zhou, Xingchen Wan, Lev Proleev et al.
General Object Foundation Model for Images and Videos at Scale
Junfeng Wu, Yi Jiang, Qihao Liu et al.
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?
Qi Zhao, Shijie Wang, Ce Zhang et al.
HiFi4G: High-Fidelity Human Performance Rendering via Compact Gaussian Splatting
Yuheng Jiang, Zhehao Shen, Penghao Wang et al.
DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection
Yunfan Ye, Yuhang Huang, Renjiao Yi et al.
InitNO: Boosting Text-to-Image Diffusion Models via Initial Noise Optimization
Xiefan Guo, Jinlin Liu, Miaomiao Cui et al.
IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection
Junbo Yin, Wenguan Wang, Runnan Chen et al.
Octopus: Embodied Vision-Language Programmer from Environmental Feedback
Jingkang Yang, Yuhao Dong, Shuai Liu et al.
NIFTY: Neural Object Interaction Fields for Guided Human Motion Synthesis
Nilesh Kulkarni, Davis Rempe, Kyle Genova et al.
InstructVideo: Instructing Video Diffusion Models with Human Feedback
Hangjie Yuan, Shiwei Zhang, Xiang Wang et al.
Learning Multi-Dimensional Human Preference for Text-to-Image Generation
Sixian Zhang, Bohan Wang, Junqiang Wu et al.
KAM-CoT: Knowledge Augmented Multimodal Chain-of-Thoughts Reasoning
Debjyoti Mondal, Suraj Modi, Subhadarshi Panda et al.
BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
Zhen Xiang, Fengqing Jiang, Zidi Xiong et al.
PB-LLM: Partially Binarized Large Language Models
Zhihang Yuan, Yuzhang Shang, Zhen Dong
Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement
Xiuquan Hou, Meiqin Liu, Senlin Zhang et al.
Morphological Prototyping for Unsupervised Slide Representation Learning in Computational Pathology
Andrew Song, Richard J. Chen, Tong Ding et al.
FreeInit: Bridging Initialization Gap in Video Diffusion Models
Tianxing Wu, Chenyang Si, Yuming Jiang et al.
CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition
Feng Lu, Xiangyuan Lan, Lijun Zhang et al.
CCEdit: Creative and Controllable Video Editing via Diffusion Models
Ruoyu Feng, Wenming Weng, Yanhui Wang et al.
CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field
Jiarui Hu, Xianhao Chen, Boyin Feng et al.
A Benchmark for Learning to Translate a New Language from One Grammar Book
Garrett Tanzer, Mirac Suzgun, Eline Visser et al.
Arc2Face: A Foundation Model for ID-Consistent Human Faces
Foivos Paraperas Papantoniou, Alexandros Lattas, Stylianos Moschoglou et al.
Escape Sky-high Cost: Early-stopping Self-Consistency for Multi-step Reasoning
Yiwei Li, Peiwen Yuan, Shaoxiong Feng et al.
Move as You Say Interact as You Can: Language-guided Human Motion Generation with Scene Affordance
Zan Wang, Yixin Chen, Baoxiong Jia et al.
Deblurring 3D Gaussian Splatting
Byeonghyeon Lee, Howoong Lee, Xiangyu Sun et al.
Towards Foundation Models for Knowledge Graph Reasoning
Mikhail Galkin, Xinyu Yuan, Hesham Mostafa et al.
Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following
Yutong Feng, Biao Gong, Di Chen et al.
FlashAvatar: High-fidelity Head Avatar with Efficient Gaussian Embedding
Jun Xiang, Xuan Gao, Yudong Guo et al.
Amortizing intractable inference in large language models
Edward Hu, Moksh Jain, Eric Elmoznino et al.
DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation
Yukun Huang, Jianan Wang, Yukai Shi et al.
EMAGE: Towards Unified Holistic Co-Speech Gesture Generation via Expressive Masked Audio Gesture Modeling
Haiyang Liu, Zihao Zhu, Giorgio Becherini et al.
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
Jinyi Hu, Yuan Yao, Chongyi Wang et al.
TLControl: Trajectory and Language Control for Human Motion Synthesis
WEILIN WAN, Zhiyang Dou, Taku Komura et al.
Towards Large-scale 3D Representation Learning with Multi-dataset Point Prompt Training
Xiaoyang Wu, Zhuotao Tian, Xin Wen et al.
Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering
Kim Youwang, Tae-Hyun Oh, Gerard Pons-Moll
NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
Gengze Zhou, Yicong Hong, Zun Wang et al.
EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Human Motion Generation
Wenyang Zhou, Zhiyang Dou, Zeyu Cao et al.
Curiosity-driven Red-teaming for Large Language Models
Zhang-Wei Hong, Idan Shenfeld, Johnson (Tsun-Hsuan) Wang et al.
Graph Neural Prompting with Large Language Models
Yijun Tian, Huan Song, Zichen Wang et al.
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation
Yichi Zhang, Ziqiao Ma, Xiaofeng Gao et al.
SAI3D: Segment Any Instance in 3D Scenes
Yingda Yin, Yuzheng Liu, Yang Xiao et al.
GLOP: Learning Global Partition and Local Construction for Solving Large-Scale Routing Problems in Real-Time
Haoran Ye, Jiarui Wang, Helan Liang et al.
Transcending the Limit of Local Window: Advanced Super-Resolution Transformer with Adaptive Token Dictionary
Leheng Zhang, Yawei Li, Xingyu Zhou et al.
MetaDiff: Meta-Learning with Conditional Diffusion for Few-Shot Learning
Baoquan Zhang, Chuyao Luo, Demin Yu et al.
LLM-grounded Video Diffusion Models
Long Lian, Baifeng Shi, Adam Yala et al.
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception
Yiran Qin, Enshen Zhou, Qichang Liu et al.
VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis
Linshan Wu, Jia-Xin Zhuang, Hao Chen
RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting
Lei Shu, Liangchen Luo, Jayakumar Hoskere et al.
LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation
Yushi Lan, Fangzhou Hong, Shuai Yang et al.
Structure-Aware Sparse-View X-ray 3D Reconstruction
Yuanhao Cai, Jiahao Wang, Alan L. Yuille et al.
Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
Renjie Pi, Tianyang Han, Wei Xiong et al.
A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization
Qiyu Chen, Huiyuan Luo, Chengkan Lv et al.
Distilling Diffusion Models into Conditional GANs
Minguk Kang, Richard Zhang, Connelly Barnes et al.
V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
Heng Wang, Jianbo Ma, Santiago Pascual et al.
Dynamic Graph Representation with Knowledge-aware Attention for Histopathology Whole Slide Image Analysis
Jiawen Li, Yuxuan Chen, Hongbo Chu et al.
Robustness of AI-Image Detectors: Fundamental Limits and Practical Attacks
Mehrdad Saberi, Vinu Sankar Sadasivan, Keivan Rezaei et al.
BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting
Lingzhe Zhao, Peng Wang, Peidong Liu
Multiscale Positive-Unlabeled Detection of AI-Generated Texts
Yuchuan Tian, Hanting Chen, Xutao Wang et al.
CoSeR: Bridging Image and Language for Cognitive Super-Resolution
Haoze Sun, Wenbo Li, Jianzhuang Liu et al.
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
Ziyue Jiang, Jinglin Liu, Yi Ren et al.
SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion
Hsuan-I Ho, Jie Song, Otmar Hilliges
Grounding Everything: Emerging Localization Properties in Vision-Language Transformers
Walid Bousselham, Felix Petersen, Vittorio Ferrari et al.
FontDiffuser: One-Shot Font Generation via Denoising Diffusion with Multi-Scale Content Aggregation and Style Contrastive Learning
Zhenhua Yang, Dezhi Peng, Yuxin Kong et al.
Depth Information Assisted Collaborative Mutual Promotion Network for Single Image Dehazing
Yafei Zhang, Shen Zhou, Huafeng Li
Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation
Homanga Bharadhwaj, Roozbeh Mottaghi, Abhinav Gupta et al.
Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting
Jeongmin Bae, Seoha Kim, Youngsik Yun et al.
Model Stock: All we need is just a few fine-tuned models
Dong-Hwan Jang, Sangdoo Yun, Dongyoon Han
Confronting Reward Model Overoptimization with Constrained RLHF
Ted Moskovitz, Aaditya Singh, DJ Strouse et al.
MoCha-Stereo: Motif Channel Attention Network for Stereo Matching
Ziyang Chen, Wei Long, He Yao et al.
COTR: Compact Occupancy TRansformer for Vision-based 3D Occupancy Prediction
Qihang Ma, Xin Tan, Yanyun Qu et al.
Towards 3D Molecule-Text Interpretation in Language Models
Sihang Li, Zhiyuan Liu, Yanchen Luo et al.
LLaFS: When Large Language Models Meet Few-Shot Segmentation
Lanyun Zhu, Tianrun Chen, Deyi Ji et al.
V2X-Real: a Largs-Scale Dataset for Vehicle-to-Everything Cooperative Perception
Hao Xiang, Xin Xia, Zhaoliang Zheng et al.
ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion
Jiayu Yang, Ziang Cheng, Yunfei Duan et al.
Enhancing Job Recommendation through LLM-Based Generative Adversarial Networks
Yingpeng Du, Di Luo, Rui Yan et al.
SMFANet: A Lightweight Self-Modulation Feature Aggregation Network for Efficient Image Super-Resolution
mingjun zheng, Long Sun, Jiangxin Dong et al.
Improving Diffusion Models for Authentic Virtual Try-on in the Wild
Choi Yisol, Sangkyung Kwak, Kyungmin Lee et al.
Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation
Zhenliang Ni, Xinghao Chen, Yingjie Zhai et al.
OmniGlue: Generalizable Feature Matching with Foundation Model Guidance
Hanwen Jiang, Arjun Karpur, Bingyi Cao et al.
Elucidating the Exposure Bias in Diffusion Models
Mang Ning, Mingxiao Li, Jianlin Su et al.
FedASMU: Efficient Asynchronous Federated Learning with Dynamic Staleness-Aware Model Update
Ji Liu, Juncheng Jia, Tianshi Che et al.
From Audio to Photoreal Embodiment: Synthesizing Humans in Conversations
Evonne Ng, Javier Romero, Timur Bagautdinov et al.
Large-scale Reinforcement Learning for Diffusion Models
Yinan Zhang, Eric Tzeng, Yilun Du et al.
Temporal Adaptive RGBT Tracking with Modality Prompt
Hongyu Wang, Xiaotao Liu, Yifan Li et al.
Exploring Post-training Quantization in LLMs from Comprehensive Study to Low Rank Compensation
Zhewei Yao, Xiaoxia Wu, Cheng Li et al.
When Do We Not Need Larger Vision Models?
Baifeng Shi, Ziyang Wu, Maolin Mao et al.
SkeletonGait: Gait Recognition Using Skeleton Maps
Chao Fan, Jingzhe Ma, Dongyang Jin et al.
MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception
Thien-Minh Nguyen, Shenghai Yuan, Thien Nguyen et al.
OneRestore: A Universal Restoration Framework for Composite Degradation
Yu Guo, Yuan Gao, Yuxu Lu et al.
PromptTTS 2: Describing and Generating Voices with Text Prompt
Yichong Leng, ZHifang Guo, Kai Shen et al.
SpikingBERT: Distilling BERT to Train Spiking Language Models Using Implicit Differentiation
Malyaban Bal, Abhronil Sengupta
GAvatar: Animatable 3D Gaussian Avatars with Implicit Mesh Learning
Ye Yuan, Xueting Li, Yangyi Huang et al.
RGBD GS-ICP SLAM
Seongbo Ha, Jiung Yeon, Hyeonwoo Yu
Text Prompt with Normality Guidance for Weakly Supervised Video Anomaly Detection
Zhiwei Yang, Jing Liu, Peng Wu
Learning to Act without Actions
Dominik Schmidt, Minqi Jiang
Plug-In Diffusion Model for Sequential Recommendation
Haokai Ma, Ruobing Xie, Lei Meng et al.
SolidGen: An Autoregressive Model for Direct B-rep Synthesis
Karl Willis, Joseph Lambourne, Nigel Morris et al.
EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models
YEFEI HE, Jing Liu, Weijia Wu et al.
Frequency-Spatial Entanglement Learning for Camouflaged Object Detection
Yanguang Sun, Chunyan Xu, Jian Yang et al.
Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval
Jiamian Wang, Guohao Sun, Pichao Wang et al.
HARDVS: Revisiting Human Activity Recognition with Dynamic Vision Sensors
Xiao Wang, Zongzhen Wu, Bo Jiang et al.
OpenBias: Open-set Bias Detection in Text-to-Image Generative Models
Moreno D', Incà, Elia Peruzzo et al.
BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP
Jiawang Bai, Kuofeng Gao, Shaobo Min et al.
On the Learnability of Watermarks for Language Models
Chenchen Gu, XIANG LI, Percy Liang et al.
METRA: Scalable Unsupervised RL with Metric-Aware Abstraction
Seohong Park, Oleh Rybkin, Sergey Levine
Learning to Rank in Generative Retrieval
Yongqi Li, Nan Yang, Liang Wang et al.
Towards Realistic Scene Generation with LiDAR Diffusion Models
Haoxi Ran, Vitor Guizilini, Yue Wang
DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting
Shijie Zhou, Zhiwen Fan, Dejia Xu et al.
ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
Mengcheng Lan, Chaofeng Chen, Yiping Ke et al.
Free3D: Consistent Novel View Synthesis without 3D Representation
Chuanxia Zheng, Andrea Vedaldi
Optimizing Diffusion Noise Can Serve As Universal Motion Priors
Korrawe Karunratanakul, Konpat Preechakul, Emre Aksan et al.
GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting
XINJIE ZHANG, Xingtong Ge, Tongda Xu et al.
Scaling Laws for Data Filtering— Data Curation cannot be Compute Agnostic
Sachin Goyal, Pratyush Maini, Zachary Lipton et al.
Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer
Eric Brachmann, Jamie Wynn, Shuai Chen et al.
SkillDiffuser: Interpretable Hierarchical Planning via Skill Abstractions in Diffusion-Based Task Execution
Zhixuan Liang, Yao Mu, Hengbo Ma et al.
End-to-End Rate-Distortion Optimized 3D Gaussian Representation
Henan Wang, Hanxin Zhu, Tianyu He et al.
TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos
Yufu Wang, Ziyun Wang, Lingjie Liu et al.
NeRF-LiDAR: Generating Realistic LiDAR Point Clouds with Neural Radiance Fields
Junge Zhang, Feihu Zhang, Shaochen Kuang et al.
Deep Temporal Graph Clustering
Meng Liu, Yue Liu, KE LIANG et al.
DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing
Kaiwen Zhang, Yifan Zhou, Xudong XU et al.
DiffusionTrack: Diffusion Model for Multi-Object Tracking
Run Luo, Zikai Song, Lintao Ma et al.
FINER: Flexible Spectral-bias Tuning in Implicit NEural Representation by Variable-periodic Activation Functions
Zhen Liu, Hao Zhu, Qi Zhang et al.
NeRF On-the-go: Exploiting Uncertainty for Distractor-free NeRFs in the Wild
Weining Ren, Zihan Zhu, Boyang Sun et al.
Attention-Challenging Multiple Instance Learning for Whole Slide Image Classification
Yunlong Zhang, Honglin Li, YUXUAN SUN et al.
OmniSeg3D: Omniversal 3D Segmentation via Hierarchical Contrastive Learning
Haiyang Ying, Yixuan Yin, Jinzhi Zhang et al.
Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions
Oindrila Saha, Grant Horn, Subhransu Maji
A Closer Look at the Few-Shot Adaptation of Large Vision-Language Models
Julio Silva-Rodríguez, Sina Hajimiri, Ismail Ben Ayed et al.
Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot
Fabien Baradel, Thomas Lucas, Matthieu Armando et al.
Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation
Shanshan Zhong, Zhongzhan Huang, Shanghua Gao et al.
Multi-Source Diffusion Models for Simultaneous Music Generation and Separation
Giorgio Mariani, Irene Tallini, Emilian Postolache et al.
Make RepVGG Greater Again: A Quantization-Aware Approach
Xuesong Nie, Yunfeng Yan, Siyuan Li et al.
Open3DSG: Open-Vocabulary 3D Scene Graphs from Point Clouds with Queryable Objects and Open-Set Relationships
Sebastian Koch, Narunas Vaskevicius, Mirco Colosi et al.
MonoCD: Monocular 3D Object Detection with Complementary Depths
Longfei Yan, Pei Yan, Shengzhou Xiong et al.
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation
Shuting He, Henghui Ding
TC4D: Trajectory-Conditioned Text-to-4D Generation
Sherwin Bahmani, Xian Liu, Wang Yifan et al.
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
Zhangyang Qi, Ye Fang, Zeyi Sun et al.
Unifying 3D Vision-Language Understanding via Promptable Queries
ziyu zhu, Zhuofan Zhang, Xiaojian Ma et al.
Open-Vocabulary Video Anomaly Detection
Peng Wu, Xuerong Zhou, Guansong Pang et al.
IMPRINT: Generative Object Compositing by Learning Identity-Preserving Representation
Yizhi Song, Zhifei Zhang, Zhe Lin et al.
Video Interpolation with Diffusion Models
Siddhant Jain, Daniel Watson, Aleksander Holynski et al.
GIVT: Generative Infinite-Vocabulary Transformers
Michael Tschannen, Cian Eastwood, Fabian Mentzer
HPNet: Dynamic Trajectory Forecasting with Historical Prediction Attention
Xiaolong Tang, Meina Kan, Shiguang Shan et al.
Monte Carlo guided Denoising Diffusion models for Bayesian linear inverse problems.
Gabriel Cardoso, Yazid Janati el idrissi, Sylvain Le Corff et al.
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
Yuchao Gu, Yipin Zhou, Bichen Wu et al.
Evaluating the Zero-shot Robustness of Instruction-tuned Language Models
Jiuding Sun, Chantal Shaib, Byron Wallace
Grokking as the transition from lazy to rich training dynamics
Tanishq Kumar, Blake Bordelon, Samuel Gershman et al.
Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation
guo, Tianwei Lin
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization
Jisu Nam, Heesu Kim, DongJae Lee et al.
Language-Image Pre-training with Long Captions
Kecheng Zheng, Yifei Zhang, Wei Wu et al.
GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views
Yaniv Wolf, Amit Bracha, Ron Kimmel
DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation
Junming Chen, Yunfei Liu, Jianan Wang et al.
Koala: Key Frame-Conditioned Long Video-LLM
Reuben Tan, Ximeng Sun, Ping Hu et al.
Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery
Sukrut Rao, Sweta Mahajan, Moritz Böhle et al.
SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition
Jeonghyeok Do, Munchurl Kim
Source-Free Domain Adaptation with Frozen Multimodal Foundation Model
Song Tang, Wenxin Su, Mao Ye et al.
Scaling for Training Time and Post-hoc Out-of-distribution Detection Enhancement
Kai Xu, Rongyu Chen, Gianni Franchi et al.
Relation DETR: Exploring Explicit Position Relation Prior for Object Detection
Xiuquan Hou, Meiqin Liu, Senlin Zhang et al.
Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology
Oren Kraus, Kian Kenyon-Dean, Saber Saberian et al.
DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video
Narek Tumanyan, Assaf Singer, Shai Bagon et al.
Large Motion Model for Unified Multi-Modal Motion Generation
Mingyuan Zhang, Daisheng Jin, Chenyang Gu et al.
Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers
Hongjie Wang, Bhishma Dedhia, Niraj Jha
DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction
Weiyi Lv, Yuhang Huang, NING Zhang et al.
Towards Compact 3D Representations via Point Feature Enhancement Masked Autoencoders
Yaohua Zha, Huizhen Ji, Jinmin Li et al.
pix2gestalt: Amodal Segmentation by Synthesizing Wholes
Ege Ozguroglu, Ruoshi Liu, Dídac Surís et al.
Gated Attention Coding for Training High-Performance and Efficient Spiking Neural Networks
Xuerui Qiu, Rui-Jie Zhu, Yuhong Chou et al.
Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction
Junuk Cha, Jihyeon Kim, Jae Shin Yoon et al.
PEEKABOO: Interactive Video Generation via Masked-Diffusion
Yash Jain, Anshul Nasery, Vibhav Vineet et al.
ControlRoom3D: Room Generation using Semantic Proxy Rooms
Jonas Schult, Sam Tsai, Lukas Höllein et al.
DePT: Decoupled Prompt Tuning
Ji Zhang, Shihan Wu, Lianli Gao et al.
Unleashing the Potential of SAM for Medical Adaptation via Hierarchical Decoding
Zhiheng Cheng, Qingyue Wei, Hongru Zhu et al.
DiffCast: A Unified Framework via Residual Diffusion for Precipitation Nowcasting
Demin Yu, Xutao Li, Yunming Ye et al.
Toward effective protection against diffusion-based mimicry through score distillation
Haotian Xue, Chumeng Liang, Xiaoyu Wu et al.
Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models
Hyeonho Jeong, Jong Chul YE
Seamless Human Motion Composition with Blended Positional Encodings
German Barquero, Sergio Escalera, Cristina Palmero
LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving
Tianyu Li, Peijin Jia, Bangjun Wang et al.
GraspXL: Generating Grasping Motions for Diverse Objects at Scale
Hui Zhang, Sammy Christen, Zicong Fan et al.
Ternary Spike: Learning Ternary Spikes for Spiking Neural Networks
Yufei Guo, Yuanpei Chen, Xiaode Liu et al.