Most Cited 2024 "attribute disentanglement" Papers
Decoupling Static and Hierarchical Motion Perception for Referring Video Segmentation
Shuting He, Henghui Ding
What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
Aaditya Singh, Ted Moskovitz, Felix Hill et al.
MMPareto: Boosting Multimodal Learning with Innocent Unimodal Assistance
Yake Wei, Di Hu
Should we be going MAD? A Look at Multi-Agent Debate Strategies for LLMs
Andries Smit, Nathan Grinsztajn, Paul Duckworth et al.
Koala: Key Frame-Conditioned Long Video-LLM
Reuben Tan, Ximeng Sun, Ping Hu et al.
Efficient Dataset Distillation via Minimax Diffusion
Jianyang Gu, Saeed Vahidian, Vyacheslav Kungurtsev et al.
Focus on Your Instruction: Fine-grained and Multi-instruction Image Editing by Attention Modulation
guo, Tianwei Lin
Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels
Rui Huang, Songyou Peng, Ayca Takmaz et al.
GPT4Point: A Unified Framework for Point-Language Understanding and Generation
Zhangyang Qi, Ye Fang, Zeyi Sun et al.
Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs
Kanchana Ranasinghe, Satya Narayan Shukla, Omid Poursaeed et al.
Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization
Dinghuai Zhang, Ricky T. Q. Chen, Chenghao Liu et al.
Sentence-level Prompts Benefit Composed Image Retrieval
Yang Bai, Xinxing Xu, Yong Liu et al.
RGBD Objects in the Wild: Scaling Real-World 3D Object Learning from RGB-D Videos
Hongchi Xia, Yang Fu, Sifei Liu et al.
Position: Topological Deep Learning is the New Frontier for Relational Learning
Theodore Papamarkou, Tolga Birdal, Michael Bronstein et al.
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization
Jisu Nam, Heesu Kim, DongJae Lee et al.
ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models
Jeong-gi Kwak, Erqun Dong, Yuhe Jin et al.
Toward effective protection against diffusion-based mimicry through score distillation
Haotian Xue, Chumeng Liang, Xiaoyu Wu et al.
HPNet: Dynamic Trajectory Forecasting with Historical Prediction Attention
Xiaolong Tang, Meina Kan, Shiguang Shan et al.
ViDA: Homeostatic Visual Domain Adapter for Continual Test Time Adaptation
Jiaming Liu, Senqiao Yang, Peidong Jia et al.
Consistency Models as a Rich and Efficient Policy Class for Reinforcement Learning
Zihan Ding, Chi Jin
Large Motion Model for Unified Multi-Modal Motion Generation
Mingyuan Zhang, Daisheng Jin, Chenyang Gu et al.
Differentially Private Synthetic Data via Foundation Model APIs 2: Text
Chulin Xie, Zinan Lin, Arturs Backurs et al.
Generative Pre-training for Speech with Flow Matching
Alexander Liu, Matthew Le, Apoorv Vyas et al.
Unifying 3D Vision-Language Understanding via Promptable Queries
Ziyu Zhu, Zhuofan Zhang, Xiaojian Ma et al.
SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition
Jeonghyeok Do, Munchurl Kim
GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views
Yaniv Wolf, Amit Bracha, Ron Kimmel
C3: High-Performance and Low-Complexity Neural Compression from a Single Image or Video
Hyunjik Kim, Matthias Bauer, Lucas Theis et al.
Monte Carlo guided Denoising Diffusion models for Bayesian linear inverse problems
Gabriel Cardoso, Yazid Janati el idrissi, Sylvain Le Corff et al.
3D Facial Expressions through Analysis-by-Neural-Synthesis
George Retsinas, Panagiotis Filntisis, Radek Danecek et al.
SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection
Gang Zhang, Chen Junnan, Guohuan Gao et al.
GraspXL: Generating Grasping Motions for Diverse Objects at Scale
Hui Zhang, Sammy Christen, Zicong Fan et al.
Transformers Implement Functional Gradient Descent to Learn Non-Linear Functions In Context
Xiang Cheng, Yuxin Chen, Suvrit Sra
CoherentGS: Sparse Novel View Synthesis with Coherent 3D Gaussians
Avinash Paliwal, Wei Ye, Jinhui Xiong et al.
BoQ: A Place is Worth a Bag of Learnable Queries
Amar Ali-bey, Brahim Chaib-draa, Philippe Giguère
OptiMUS: Scalable Optimization Modeling with (MI)LP Solvers and Large Language Models
Ali AhmadiTeshnizi, Wenzhi Gao, Madeleine Udell
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
Yuchao Gu, Yipin Zhou, Bichen Wu et al.
DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video
Narek Tumanyan, Assaf Singer, Shai Bagon et al.
Large Language Models Are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales
Taeyoon Kwon, Kai Ong, Dongjin Kang et al.
SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting
Hoon Kim, Minje Jang, Wonjun Yoon et al.
Getting the most out of your tokenizer for pre-training and domain adaptation
Gautier Dagan, Gabriel Synnaeve, Baptiste Roziere
PerceptionGPT: Effectively Fusing Visual Perception into LLM
Renjie Pi, Lewei Yao, Jiahui Gao et al.
All-in-one simulation-based inference
Manuel Gloeckler, Michael Deistler, Christian Weilbach et al.
SVGDreamer: Text Guided SVG Generation with Diffusion Model
XiMing Xing, Chuang Wang, Haitao Zhou et al.
Revitalizing Multivariate Time Series Forecasting: Learnable Decomposition with Inter-Series Dependencies and Intra-Series Variations Modeling
Guoqi Yu, Jing Zou, Xiaowei Hu et al.
Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning
Michael Matthews, Michael Beukman, Benjamin Ellis et al.
Volumetric Environment Representation for Vision-Language Navigation
Liu, Wenguan Wang, Yi Yang
pix2gestalt: Amodal Segmentation by Synthesizing Wholes
Ege Ozguroglu, Ruoshi Liu, Dídac Surís et al.
DePT: Decoupled Prompt Tuning
Ji Zhang, Shihan Wu, Lianli Gao et al.
Local All-Pair Correspondence for Point Tracking
Seokju Cho, Jiahui Huang, Jisu Nam et al.
DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation
Junming Chen, Yunfei Liu, Jianan Wang et al.
Negative Label Guided OOD Detection with Pretrained Vision-Language Models
Xue Jiang, Feng Liu, Zhen Fang et al.
SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model
Armen Avetisyan, Christopher Xie, Henry Howard-Jenkins et al.
DP-OPT: Make Large Language Model Your Privacy-Preserving Prompt Engineer
Junyuan Hong, Jiachen (Tianhao) Wang, Chenhui Zhang et al.
Text2HOI: Text-guided 3D Motion Generation for Hand-Object Interaction
Junuk Cha, Jihyeon Kim, Jae Shin Yoon et al.
Gated Attention Coding for Training High-Performance and Efficient Spiking Neural Networks
Xuerui Qiu, Rui-Jie Zhu, Yuhong Chou et al.
PINNsFormer: A Transformer-Based Framework For Physics-Informed Neural Networks
Zhiyuan Zhao, Xueying Ding, B. Aditya Prakash
BEND: Benchmarking DNA Language Models on Biologically Meaningful Tasks
Frederikke Marin, Felix Teufel, Marc Horlacher et al.
QUAR-VLA: Vision-Language-Action Model for Quadruped Robots
Pengxiang Ding, Han Zhao, Wenjie Zhang et al.
MuRF: Multi-Baseline Radiance Fields
Haofei Xu, Anpei Chen, Yuedong Chen et al.
DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction
Weiyi Lv, Yuhang Huang, Ning Zhang et al.
Align Your Steps: Optimizing Sampling Schedules in Diffusion Models
Amirmojtaba Sabour, Sanja Fidler, Karsten Kreis
Towards Compact 3D Representations via Point Feature Enhancement Masked Autoencoders
Yaohua Zha, Huizhen Ji, Jinmin Li et al.
PEEKABOO: Interactive Video Generation via Masked-Diffusion
Yash Jain, Anshul Nasery, Vibhav Vineet et al.
OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation
Bohao Peng, Xiaoyang Wu, Li Jiang et al.
Compressing LLMs: The Truth is Rarely Pure and Never Simple
Ajay Jaiswal, Zhe Gan, Xianzhi Du et al.
HGPrompt: Bridging Homogeneous and Heterogeneous Graphs for Few-Shot Prompt Learning
Xingtong Yu, Yuan Fang, Zemin Liu et al.
Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models
Yuchen Yang, Kwonjoon Lee, Behzad Dariush et al.
ControlRoom3D: Room Generation using Semantic Proxy Rooms
Jonas Schult, Sam Tsai, Lukas Höllein et al.
The Neglected Tails in Vision-Language Models
Shubham Parashar, Tian Liu, Zhiqiu Lin et al.
SKILL-MIX: a Flexible and Expandable Family of Evaluations for AI Models
Dingli Yu, Simran Kaur, Arushi Gupta et al.
TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting
Jiahe Li, Jiawei Zhang, Xiao Bai et al.
The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing
Shen Nie, Hanzhong Guo, Cheng Lu et al.
Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology
Oren Kraus, Kian Kenyon-Dean, Saber Saberian et al.
Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation
Hyunwoo Ryu, Jiwoo Kim, Hyunseok An et al.
Zero-Shot ECG Classification with Multimodal Learning and Test-time Clinical Knowledge Enhancement
Che Liu, Zhongwei Wan, Cheng Ouyang et al.
Geographic Location Encoding with Spherical Harmonics and Sinusoidal Representation Networks
Marc Rußwurm, Konstantin Klemmer, Esther Rolf et al.
Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes
Zhen Qin, Daoyuan Chen, Bingchen Qian et al.
LaneSegNet: Map Learning with Lane Segment Perception for Autonomous Driving
Tianyu Li, Peijin Jia, Bangjun Wang et al.
Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers
Hongjie Wang, Bhishma Dedhia, Niraj Jha
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules
Hung Le, Hailin Chen, Amrita Saha et al.
VeCLIP: Improving CLIP Training via Visual-enriched Captions
Zhengfeng Lai, Haotian Zhang, Bowen Zhang et al.
LEAP: Liberate Sparse-View 3D Modeling from Camera Poses
Hanwen Jiang, Zhenyu Jiang, Yue Zhao et al.
Position: Bayesian Deep Learning is Needed in the Age of Large-Scale AI
Theodore Papamarkou, Maria Skoularidou, Konstantina Palla et al.
MotionEditor: Editing Video Motion via Content-Aware Diffusion
Shuyuan Tu, Qi Dai, Zhi-Qi Cheng et al.
Feedback Loops With Language Models Drive In-Context Reward Hacking
Alexander Pan, Erik Jones, Meena Jagadeesan et al.
Loopy-SLAM: Dense Neural SLAM with Loop Closures
Lorenzo Liso, Erik Sandström, Vladimir Yugay et al.
KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation
Jihua Peng, Yanghong Zhou, Tracy P Y Mok
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control
Yue Han, Junwei Zhu, Keke He et al.
Test-Time Model Adaptation with Only Forward Passes
Shuaicheng Niu, Chunyan Miao, Guohao Chen et al.
Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models
Hyeonho Jeong, Jong Chul Ye
Differentially Private Synthetic Data via Foundation Model APIs 1: Images
Zinan Lin, Sivakanth Gopi, Janardhan Kulkarni et al.
DiffCast: A Unified Framework via Residual Diffusion for Precipitation Nowcasting
Demin Yu, Xutao Li, Yunming Ye et al.
Multi-Task Dense Prediction via Mixture of Low-Rank Experts
Yuqi Yang, Peng-Tao Jiang, Qibin Hou et al.
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism
Yanxi Chen, Xuchen Pan, Yaliang Li et al.
DQ-DETR: DETR with Dynamic Query for Tiny Object Detection
Yi-Xin Huang, Hou-I Liu, Hong-Han Shuai et al.
Seamless Human Motion Composition with Blended Positional Encodings
German Barquero, Sergio Escalera, Cristina Palmero
Lemur: Integrating Large Language Models in Automated Program Verification
Haoze Wu, Clark Barrett, Nina Narodytska
Language Model Inversion
John X. Morris, Wenting Zhao, Justin Chiu et al.
Transcriptomics-guided Slide Representation Learning in Computational Pathology
Guillaume Jaume, Lukas Oldenburg, Anurag Vaidya et al.
Training-Free Long-Context Scaling of Large Language Models
Chenxin An, Fei Huang, Jun Zhang et al.
Context-I2W: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval
Yuanmin Tang, Jing Yu, Keke Gai et al.
Parameter-Efficient Fine-Tuning with Discrete Fourier Transform
Ziqi Gao, Qichao Wang, Aochuan Chen et al.
Ternary Spike: Learning Ternary Spikes for Spiking Neural Networks
Yufei Guo, Yuanpei Chen, Xiaode Liu et al.
Diffusion Models for Open-Vocabulary Segmentation
Laurynas Karazija, Iro Laina, Andrea Vedaldi et al.
Transformers, parallel computation, and logarithmic depth
Clayton Sanford, Daniel Hsu, Matus Telgarsky
Position: The No Free Lunch Theorem, Kolmogorov Complexity, and the Role of Inductive Biases in Machine Learning
Micah Goldblum, Marc Finzi, Keefer Rowan et al.
OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion
Xinyu Zhan, Lixin Yang, Yifei Zhao et al.
Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction
Devikalyan Das, Christopher Wewer, Raza Yunus et al.
Modality-agnostic Domain Generalizable Medical Image Segmentation by Multi-Frequency in Multi-Scale Attention
Ju-Hyeon Nam, Nur Suriza Syazwany, Su Jung Kim et al.
SILC: Improving Vision Language Pretraining with Self-Distillation
Muhammad Ferjad Naeem, Yongqin Xian, Xiaohua Zhai et al.
Foundation Policies with Hilbert Representations
Seohong Park, Tobias Kreiman, Sergey Levine
MASTER: Market-Guided Stock Transformer for Stock Price Forecasting
Tong Li, Zhaoyang Liu, Yanyan Shen et al.
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
Chenpeng Du, Yiwei Guo, Feiyu Shen et al.
Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors
Nicolae Ristea, Florinel Croitoru, Radu Tudor Ionescu et al.
CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor
Shuyang Sun, Runjia Li, Philip H.S. Torr et al.
Correlation Matching Transformation Transformers for UHD Image Restoration
Cong Wang, Jinshan Pan, Wei Wang et al.
AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models
Xuelong Dai, Kaisheng Liang, Bin Xiao
Point Cloud Pre-training with Diffusion Models
Xiao Zheng, Xiaoshui Huang, Guofeng Mei et al.
VLCounter: Text-Aware Visual Representation for Zero-Shot Object Counting
Seunggu Kang, WonJun Moon, Euiyeon Kim et al.
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
Daniel Winter, Matan Cohen, Shlomi Fruchter et al.
Driving Everywhere with Large Language Model Policy Adaptation
Boyi Li, Yue Wang, Jiageng Mao et al.
Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction
Inhwan Bae, Junoh Lee, Hae-Gon Jeon
NeRFiller: Completing Scenes via Generative 3D Inpainting
Ethan Weber, Aleksander Holynski, Varun Jampani et al.
On Diffusion Modeling for Anomaly Detection
Victor Livernoche, Vineet Jain, Yashar Hezaveh et al.
InstructZero: Efficient Instruction Optimization for Black-Box Large Language Models
Lichang Chen, Jiuhai Chen, Tom Goldstein et al.
FFT-Based Dynamic Token Mixer for Vision
Yuki Tatsunami, Masato Taki
Delving into Multimodal Prompting for Fine-Grained Visual Classification
Xin Jiang, Hao Tang, Junyao Gao et al.
Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion
Hila Manor, Tomer Michaeli
Rethinking Diffusion Model for Multi-Contrast MRI Super-Resolution
Guangyuan Li, Chen Rao, Juncheng Mo et al.
Decoding-time Realignment of Language Models
Tianlin Liu, Shangmin Guo, Leonardo Martins Bianco et al.
Exploring Target Representations for Masked Autoencoders
Xingbin Liu, Jinghao Zhou, Tao Kong et al.
Orthogonal Adaptation for Modular Customization of Diffusion Models
Ryan Po, Guandao Yang, Kfir Aberman et al.
COLLIE: Systematic Construction of Constrained Text Generation Tasks
Shunyu Yao, Howard Chen, Austin Hanjie et al.
MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts
Jianan Zhou, Zhiguang Cao, Yaoxin Wu et al.
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation
Yi-Chia Chen, WeiHua Li, Cheng Sun et al.
Massive Editing for Large Language Models via Meta Learning
Chenmien Tan, Ge Zhang, Jie Fu
DocFormerv2: Local Features for Document Understanding
Srikar Appalaraju, Peng Tang, Qi Dong et al.
MuSc: Zero-Shot Industrial Anomaly Classification and Segmentation with Mutual Scoring of the Unlabeled Images
Xurui Li, Ziming Huang, Feng Xue et al.
Magnushammer: A Transformer-Based Approach to Premise Selection
Maciej Mikuła, Szymon Tworkowski, Szymon Antoniak et al.
On the Origins of Linear Representations in Large Language Models
Yibo Jiang, Goutham Rajendran, Pradeep Ravikumar et al.
Auto MC-Reward: Automated Dense Reward Design with Large Language Models for Minecraft
Hao Li, Xue Yang, Zhaokai Wang et al.
Image Restoration by Denoising Diffusion Models with Iteratively Preconditioned Guidance
Tomer Garber, Tom Tirer
Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis
Yanzuo Lu, Manlin Zhang, Jinhua Ma et al.
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
Zhiheng Xi, Wenxiang Chen, Boyang Hong et al.
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
Shilin Yan, Renrui Zhang, Ziyu Guo et al.
Hot or Cold? Adaptive Temperature Sampling for Code Generation with Large Language Models
Yuqi Zhu, Jia Li, Ge Li et al.
Revisiting Graph-Based Fraud Detection in Sight of Heterophily and Spectrum
Fan Xu, Nan Wang, Hao Wu et al.
Multi-View Causal Representation Learning with Partial Observability
Dingling Yao, Danru Xu, Sébastien Lachapelle et al.
Adaptive Bidirectional Displacement for Semi-Supervised Medical Image Segmentation
Hanyang Chi, Jian Pang, Bingfeng Zhang et al.
OWL: A Large Language Model for IT Operations
Hongcheng Guo, Jian Yang, Jiaheng Liu et al.
Towards Modular LLMs by Building and Reusing a Library of LoRAs
Oleksiy Ostapenko, Zhan Su, Edoardo Ponti et al.
GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection
Hang Yao, Ming Liu, Zhicun Yin et al.
VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation
Zhen Qu, Xian Tao, Mukesh Prasad et al.
Harnessing the Power of MLLMs for Transferable Text-to-Image Person ReID
Wentao Tan, Changxing Ding, Jiayu Jiang et al.
BRAVE: Broadening the visual encoding of vision-language models
Oguzhan Fatih Kar, Alessio Tonioni, Petra Poklukar et al.
Bilateral Propagation Network for Depth Completion
Jie Tang, Fei-Peng Tian, Boshi An et al.
SECap: Speech Emotion Captioning with Large Language Model
Yaoxun Xu, Hangting Chen, Jianwei Yu et al.
RCooper: A Real-world Large-scale Dataset for Roadside Cooperative Perception
Ruiyang Hao, Siqi Fan, Yingru Dai et al.
CLLMs: Consistency Large Language Models
Siqi Kou, Lanxiang Hu, Zhezhi He et al.
PC-Conv: Unifying Homophily and Heterophily with Two-Fold Filtering
Bingheng Li, Erlin Pan, Zhao Kang
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation
Omer Dahary, Or Patashnik, Kfir Aberman et al.
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
Kaifeng Lyu, Jikai Jin, Zhiyuan Li et al.
IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination
Xi Chen, Sida Peng, Dongchen Yang et al.
An Unforgeable Publicly Verifiable Watermark for Large Language Models
Aiwei Liu, Leyi Pan, Xuming Hu et al.
MemFlow: Optical Flow Estimation and Prediction with Memory
Qiaole Dong, Yanwei Fu
SelfPromer: Self-Prompt Dehazing Transformers with Depth-Consistency
Feiyu Zhu, Reid Simmons
CogBench: a large language model walks into a psychology lab
Julian Coda-Forno, Marcel Binz, Jane Wang et al.
Learning to Route Among Specialized Experts for Zero-Shot Generalization
Mohammed Muqeeth, Haokun Liu, Yufan Liu et al.
Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching
Shitong Shao, Zeyuan Yin, Muxin Zhou et al.
Editing Language Model-Based Knowledge Graph Embeddings
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation
Zhiwu Qing, Shiwei Zhang, Jiayu Wang et al.
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao et al.
FedAS: Bridging Inconsistency in Personalized Federated Learning
Xiyuan Yang, Wenke Huang, Mang Ye
Prototypical Information Bottlenecking and Disentangling for Multimodal Cancer Survival Prediction
Yilan Zhang, Yingxue Xu, Jianqi Chen et al.
LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts
Hanan Gani, Shariq Bhat, Muzammal Naseer et al.
SAMformer: Unlocking the Potential of Transformers in Time Series Forecasting with Sharpness-Aware Minimization and Channel-Wise Attention
Romain Ilbert, Ambroise Odonnat, Vasilii Feofanov et al.
OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views
Francis Engelmann, Fabian Manhardt, Michael Niemeyer et al.
Latent Guard: a Safety Framework for Text-to-image Generation
Runtao Liu, Ashkan Khakzar, Jindong Gu et al.
Instruction Tuning for Secure Code Generation
Jingxuan He, Mark Vero, Gabriela Krasnopolska et al.
Diffusion Handles Enabling 3D Edits for Diffusion Models by Lifting Activations to 3D
Karran Pandey, Paul Guerrero, Matheus Gadelha et al.
Text2Loc: 3D Point Cloud Localization from Natural Language
Yan Xia, Letian Shi, Zifeng Ding et al.
Intrinsic Image Diffusion for Indoor Single-view Material Estimation
Peter Kocsis, Vincent Sitzmann, Matthias Nießner
OmniSat: Self-Supervised Modality Fusion for Earth Observation
Guillaume Astruc, Nicolas Gonthier, Clement Mallet et al.
TACO: Benchmarking Generalizable Bimanual Tool-ACtion-Object Understanding
Yun Liu, Haolin Yang, Xu Si et al.
Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts
Xinhua Cheng, Tianyu Yang, Jianan Wang et al.
Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives
Ronghui Li, Yuxiang Zhang, Yachao Zhang et al.
On Penalty Methods for Nonconvex Bilevel Optimization and First-Order Stochastic Approximation
Jeongyeol Kwon, Dohyun Kwon, Stephen Wright et al.
Position: What Can Large Language Models Tell Us about Time Series Analysis
Ming Jin, Yi-Fan Zhang, Wei Chen et al.
Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction
Bencheng Liao, Shaoyu Chen, Bo Jiang et al.
In-Context Learning Learns Label Relationships but Is Not Conventional Learning
Jannik Kossen, Yarin Gal, Tom Rainforth
Implicit Discriminative Knowledge Learning for Visible-Infrared Person Re-Identification
Kaijie Ren, Lei Zhang
Improving 2D Feature Representations by 3D-Aware Fine-Tuning
Yuanwen Yue, Anurag Das, Francis Engelmann et al.
From Molecules to Materials: Pre-training Large Generalizable Models for Atomic Property Prediction
Nima Shoghi, Adeesh Kolluru, John Kitchin et al.
HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video
Zicong Fan, Maria Parelli, Maria Kadoglou et al.
HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution
Xiang Zhang, Yulun Zhang, Fisher Yu
Controlled Text Generation via Language Model Arithmetic
Jasper Dekoninck, Marc Fischer, Luca Beurer-Kellner et al.
Neural Operators with Localized Integral and Differential Kernels
Miguel Liu-Schiaffini, Julius Berner, Boris Bonev et al.
View-Consistent 3D Editing with Gaussian Splatting
Yuxuan Wang, Xuanyu Yi, Zike Wu et al.
Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model
Yinan Zheng, Jianxiong Li, Dongjie Yu et al.
CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model
Jianhao Zeng, Dan Song, Weizhi Nie et al.