Most Cited 2024 "portrait editing" Papers
12,324 papers found • Page 4 of 62
Conference
Point Cloud Pre-training with Diffusion Models
xiao zheng, Xiaoshui Huang, Guofeng Mei et al.
MASTER: Market-Guided Stock Transformer for Stock Price Forecasting
Tong Li, Zhaoyang Liu, Yanyan Shen et al.
Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons
Yuheng Chen, Pengfei Cao, Yubo Chen et al.
SILC: Improving Vision Language Pretraining with Self-Distillation
Muhammad Ferjad Naeem, Yongqin Xian, Xiaohua Zhai et al.
VeCLIP: Improving CLIP Training via Visual-enriched Captions
Zhengfeng Lai, Haotian Zhang, Bowen Zhang et al.
Can Language Beat Numerical Regression? Language-Based Multimodal Trajectory Prediction
Inhwan Bae, Junoh Lee, Hae-Gon Jeon
TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting
Jiahe Li, Jiawei Zhang, Xiao Bai et al.
VLCounter: Text-Aware Visual Representation for Zero-Shot Object Counting
Seunggu Kang, WonJun Moon, Euiyeon Kim et al.
PerceptionGPT: Effectively Fusing Visual Perception into LLM
Renjie Pi, Lewei Yao, Jiahui Gao et al.
Geographic Location Encoding with Spherical Harmonics and Sinusoidal Representation Networks
Marc Rußwurm, Konstantin Klemmer, Esther Rolf et al.
Driving Everywhere with Large Language Model Policy Adaptation
Boyi Li, Yue Wang, Jiageng Mao et al.
Diffusion Models for Open-Vocabulary Segmentation
Laurynas Karazija, Iro Laina, Andrea Vedaldi et al.
The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing
Shen Nie, Hanzhong Guo, Cheng Lu et al.
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
Chenpeng Du, Yiwei Guo, Feiyu Shen et al.
Correlation Matching Transformation Transformers for UHD Image Restoration
Cong Wang, Jinshan Pan, Wei Wang et al.
DocFormerv2: Local Features for Document Understanding
Srikar Appalaraju, Peng Tang, Qi Dong et al.
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
Shilin Yan, Renrui Zhang, Ziyu Guo et al.
Image Restoration by Denoising Diffusion Models with Iteratively Preconditioned Guidance
Tomer Garber, Tom Tirer
SECap: Speech Emotion Captioning with Large Language Model
Yaoxun Xu, Hangting Chen, Jianwei Yu et al.
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation
Yi-Chia Chen, WeiHua Li, Cheng Sun et al.
Large Language Models Are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales
Taeyoon Kwon, Kai Ong, Dongjin Kang et al.
Magnushammer: A Transformer-Based Approach to Premise Selection
Maciej Mikuła, Szymon Tworkowski, Szymon Antoniak et al.
GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection
hang yao, Ming LIU, Zhicun Yin et al.
Language Model Inversion
John X. Morris, Wenting Zhao, Justin Chiu et al.
PC-Conv: Unifying Homophily and Heterophily with Two-Fold Filtering
Bingheng Li, Erlin Pan, Zhao Kang
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
Daniel Winter, Matan Cohen, Shlomi Fruchter et al.
FedAS: Bridging Inconsistency in Personalized Federated Learning
Xiyuan Yang, Wenke Huang, Mang Ye
Editing Language Model
Based Knowledge Graph Embeddings
Context-I2W: Mapping Images to Context-Dependent Words for Accurate Zero-Shot Composed Image Retrieval
Yuanmin Tang, Jing Yu, Keke Gai et al.
Neural Parametric Gaussians for Monocular Non-Rigid Object Reconstruction
Devikalyan Das, Christopher Wewer, Raza Yunus et al.
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao et al.
Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis
Yanzuo Lu, Manlin Zhang, Jinhua Ma et al.
BEND: Benchmarking DNA Language Models on Biologically Meaningful Tasks
Frederikke Marin, Felix Teufel, Marc Horlacher et al.
DrFuse: Learning Disentangled Representation for Clinical Multi-Modal Fusion with Missing Modality and Modal Inconsistency
Wenfang Yao, Kejing Yin, William Cheung et al.
Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction
Bencheng Liao, Shaoyu Chen, Bo Jiang et al.
Hot or Cold? Adaptive Temperature Sampling for Code Generation with Large Language Models
Yuqi Zhu, Jia Li, Ge Li et al.
VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation
Zhen Qu, Xian Tao, Mukesh Prasad et al.
DQ-DETR: DETR with Dynamic Query for Tiny Object Detection
Yi-Xin Huang, Hou-I Liu, Hong-Han Shuai et al.
Text2Loc: 3D Point Cloud Localization from Natural Language
Yan Xia, Letian Shi, Zifeng Ding et al.
SelfPromer: Self-Prompt Dehazing Transformers with Depth-Consistency
8137 Feiyu Zhu, Reid Simmons
Hierarchical Spatio-temporal Decoupling for Text-to-Video Generation
Zhiwu Qing, Shiwei Zhang, Jiayu Wang et al.
Watermarking Conditional Text Generation for AI Detection: Unveiling Challenges and a Semantic-Aware Watermark Remedy
Yu Fu, Deyi Xiong, Yue Dong
TEILP: Time Prediction over Knowledge Graphs via Logical Reasoning
Siheng Xiong, Yuan Yang, Ali Payani et al.
Self-Distilled Masked Auto-Encoders are Efficient Video Anomaly Detectors
Nicolae Ristea, Florinel Croitoru, Radu Tudor Ionescu et al.
Delving into Multimodal Prompting for Fine-Grained Visual Classification
Xin Jiang, Hao Tang, Junyao Gao et al.
Improving 2D Feature Representations by 3D-Aware Fine-Tuning
Yuanwen Yue, Anurag Das, Francis Engelmann et al.
Lodge: A Coarse to Fine Diffusion Network for Long Dance Generation Guided by the Characteristic Dance Primitives
Ronghui Li, Yuxiang Zhang, Yachao Zhang et al.
MuSc: Zero-Shot Industrial Anomaly Classification and Segmentation with Mutual Scoring of the Unlabeled Images
Xurui Li, Ziming Huang, Feng Xue et al.
OmniSat: Self-Supervised Modality Fusion for Earth Observation
Guillaume Astruc, Nicolas Gonthier, Clement Mallet et al.
Harnessing the Power of MLLMs for Transferable Text-to-Image Person ReID
Wentao Tan, Changxing Ding, Jiayu Jiang et al.
Rethinking Diffusion Model for Multi-Contrast MRI Super-Resolution
Guangyuan Li, Chen Rao, Juncheng Mo et al.
FlashTex: Fast Relightable Mesh Texturing with LightControlNet
Kangle Deng, Timothy Omernick, Alexander B Weiss et al.
SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation
Yamei Chen, Yan Di, Guangyao Zhai et al.
MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping
Jiacheng Chen, Yuefan Wu, Tan Jiaqi et al.
UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling
Haoyu Lu, Yuqi Huo, Guoxing Yang et al.
A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint
Xiaofeng Cong, Jie Gui, Jing Zhang et al.
MemFlow: Optical Flow Estimation and Prediction with Memory
Qiaole Dong, Yanwei Fu
Generalized Large-Scale Data Condensation via Various Backbone and Statistical Matching
Shitong Shao, Zeyuan Yin, Muxin Zhou et al.
Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts
Xinhua Cheng, Tianyu Yang, Jianan Wang et al.
SWAG: Splatting in the Wild images with Appearance-conditioned Gaussians
Hiba Dahmani, Moussab Bennehar, Nathan Piasco et al.
RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation
Peng Lu, Tao Jiang, Yining Li et al.
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control
Yue Han, Junwei Zhu, Keke He et al.
IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination
Xi Chen, Sida Peng, Dongchen Yang et al.
Text-Image Alignment for Diffusion-Based Perception
Neehar Kondapaneni, Markus Marks, Manuel Knott et al.
GAMC: An Unsupervised Method for Fake News Detection Using Graph Autoencoder with Masking
Shu Yin, Peican Zhu, Lianwei Wu et al.
FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models
Shivangi Aneja, Justus Thies, Angela Dai et al.
Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning
Zichen Miao, Jiang Wang, Ze Wang et al.
A Comparative Study of Image Restoration Networks for General Backbone Network Design
Xiangyu Chen, Zheyuan Li, Yuandong Pu et al.
GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh
Jing Wen, Xiaoming Zhao, Jason Ren et al.
HeadGaS: Real-Time Animatable Head Avatars via 3D Gaussian Splatting
Helisa Dhamo, Yinyu Nie, Arthur Moreau et al.
In-Context Learning Learns Label Relationships but Is Not Conventional Learning
Jannik Kossen, Yarin Gal, Tom Rainforth
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
Zeyu Liu, Weicong Liang, Zhanhao Liang et al.
Latent Guard: a Safety Framework for Text-to-image Generation
Runtao Liu, Ashkan Khakzar, Jindong Gu et al.
Towards Scalable 3D Anomaly Detection and Localization: A Benchmark via 3D Anomaly Synthesis and A Self-Supervised Learning Network
wenqiao Li, Xiaohao Xu, Yao Gu et al.
FakeInversion: Learning to Detect Images from Unseen Text-to-Image Models by Inverting Stable Diffusion
George Cazenavette, Avneesh Sud, Thomas Leung et al.
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos
Xiang Wang, Shiwei Zhang, Hangjie Yuan et al.
LatestEval: Addressing Data Contamination in Language Model Evaluation through Dynamic and Time
Sensitive Test Construction - Yucheng Li, Frank Guerin, Chenghua Lin
CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing
Ajian Liu, Shuai Xue, Gan Jianwen et al.
Enhancing Multimodal Cooperation via Sample-level Modality Valuation
Yake Wei, Ruoxuan Feng, Zihe Wang et al.
PointOBB: Learning Oriented Object Detection via Single Point Supervision
Junwei Luo, Xue Yang, Yi Yu et al.
VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models
Junlin Han, Filippos Kokkinos, Philip Torr
Visual In-Context Prompting
Feng Li, Qing Jiang, Hao Zhang et al.
SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation
Dong Wu, Mingmin Chi, Xuan Zang et al.
SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
Kiana Ehsani, Tanmay Gupta, Rose Hendrix et al.
GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering
Yanyan Li, Chenyu Lyu, Yan Di et al.
HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution
Xiang Zhang, Yulun Zhang, Fisher Yu
LaRa: Efficient Large-Baseline Radiance Fields
Anpei Chen, Haofei Xu, Stefano Esposito et al.
AvatarGPT: All-in-One Framework for Motion Understanding Planning Generation and Beyond
Zixiang Zhou, Yu Wan, Baoyuan Wang
GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes
Ibrahim Ethem Hamamci, Sezgin Er, Anjany Sekuboyina et al.
LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection
hongcheng Guo, Jian Yang, Jiaheng Liu et al.
JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation
Yu Zeng, Vishal M. Patel, Haochen Wang et al.
Describing Differences in Image Sets with Natural Language
Lisa Dunlap, Yuhui Zhang, Xiaohan Wang et al.
Accelerating Diffusion Sampling with Optimized Time Steps
Shuchen Xue, Zhaoqiang Liu, Fei Chen et al.
LCM-Lookahead for Encoder-based Text-to-Image Personalization
Rinon Gal, Or Lichter, Elad Richardson et al.
MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures
Zhangyang Xiong, Chenghong Li, Kenkun Liu et al.
Bilateral Propagation Network for Depth Completion
Jie Tang, Fei-Peng Tian, Boshi An et al.
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
Shuming Liu, Chenlin Zhang, Chen Zhao et al.
DiffusionLight: Light Probes for Free by Painting a Chrome Ball
Pakkapon Phongthawee, Worameth Chinchuthakun, Nontaphat Sinsunthithet et al.
MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
Yining Hong, Zishuo Zheng, Peihao Chen et al.
Graph Neural Networks for Learning Equivariant Representations of Neural Networks
Miltiadis (Miltos) Kofinas, Boris Knyazev, Yan Zhang et al.
ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions
Anindita Ghosh, Rishabh Dabral, Vladislav Golyanik et al.
FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
Linjiang Huang, Rongyao Fang, Aiping Zhang et al.
Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects
Yijia Weng, Bowen Wen, Jonathan Tremblay et al.
Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior
Zike Wu, Pan Zhou, YI Xuanyu et al.
Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark
Fangjun Li, David C. Hogg, Anthony G. Cohn
GVGEN: Text-to-3D Generation with Volumetric Representation
Xianglong He, Junyi Chen, Sida Peng et al.
Intriguing Properties of Generative Classifiers
Priyank Jaini, Kevin Clark, Robert Geirhos
Implicit Discriminative Knowledge Learning for Visible-Infrared Person Re-Identification
kaijie ren, Lei Zhang
EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering
Junjue Wang, Zhuo Zheng, Zihang Chen et al.
Dynamic Semantic-Based Spatial Graph Convolution Network for Skeleton-Based Human Action Recognition
Jianyang Xie, Yanda Meng, Yitian Zhao et al.
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
Qilang Ye, Zitong Yu, Rui Shao et al.
Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction
Zihao Liu, Xiaoyu Zhang, Guangwei Liu et al.
Few-Shot Object Detection with Foundation Models
Guangxing Han, Ser-Nam Lim
SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection
JUNSU KIM, Hoseong Cho, Jihyeon Kim et al.
Jack of All Tasks Master of Many: Designing General-Purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick, Guangxing Han, Rui Hou et al.
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training
David Wan, Jaemin Cho, Elias Stengel-Eskin et al.
Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning
Chongyu Fan, Jiancheng Liu, Alfred Hero et al.
Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
Lanqing Guo, Yingqing He, Haoxin Chen et al.
Discovering and Mitigating Visual Biases through Keyword Explanation
Younghyun Kim, Sangwoo Mo, Minkyu Kim et al.
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Chaoyi Zhang, Kevin Lin, Zhengyuan Yang et al.
Matching Anything by Segmenting Anything
Siyuan Li, Lei Ke, Martin Danelljan et al.
SubT-MRS Dataset: Pushing SLAM Towards All-weather Environments
Shibo Zhao, Yuanjun Gao, Tianhao Wu et al.
Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval
Yucheng Suo, Fan Ma, Linchao Zhu et al.
From Zero to Turbulence: Generative Modeling for 3D Flow Simulation
Marten Lienen, David Lüdke, Jan Hansen-Palmus et al.
VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models
Shicheng Li, Lei Li, Yi Liu et al.
Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention
Jie Ren, Yaxin Li, Shenglai Zeng et al.
Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector
Yuqian Fu, Yu Wang, Yixuan Pan et al.
UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation
Zexiang Liu, Yangguang Li, Youtian Lin et al.
Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-Modal Structured Representations
Yufeng Huang, Jiji Tang, Zhuo Chen et al.
On the Test-Time Zero-Shot Generalization of Vision-Language Models: Do We Really Need Prompt Learning?
Maxime Zanella, Ismail Ben Ayed
AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with TikZ
Jonas Belouadi, Anne Lauscher, Steffen Eger
A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators
Chen Zhang, L. F. D’Haro, Yiming Chen et al.
TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models
Aditya Aravind Chinchure, Pushkar Shukla, Gaurav Bhatt et al.
ReMamber: Referring Image Segmentation with Mamba Twister
Yuhuan Yang, Chaofan Ma, Jiangchao Yao et al.
FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
Shuai Yang, Yifan Zhou, Ziwei Liu et al.
Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification
Pingping Zhang, Yuhao Wang, Yang Liu et al.
OMG: Towards Open-vocabulary Motion Generation via Mixture of Controllers
Han Liang, Jiacheng Bao, Ruichi Zhang et al.
Language-driven All-in-one Adverse Weather Removal
Hao Yang, Liyuan Pan, Yan Yang et al.
Reinforced Adaptive Knowledge Learning for Multimodal Fake News Detection
Litian Zhang, Xiaoming Zhang, Chaozhuo Li et al.
Frozen Transformers in Language Models Are Effective Visual Encoder Layers
Ziqi Pang, Ziyang Xie, Yunze Man et al.
SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples
Phillip Howard, Avinash Madasu, Tiep Le et al.
Generating Human Motion in 3D Scenes from Text Descriptions
Zhi Cen, Huaijin Pi, Sida Peng et al.
Local Search GFlowNets
Minsu Kim, Yun Taeyoung, Emmanuel Bengio et al.
Leveraging Vision-Language Models for Improving Domain Generalization in Image Classification
Sravanti Addepalli, Ashish Asokan, Lakshay Sharma et al.
Improving Audio-Visual Segmentation with Bidirectional Generation
Dawei Hao, Yuxin Mao, Bowen He et al.
DeS3: Adaptive Attention-Driven Self and Soft Shadow Removal Using ViT Similarity
Yeying Jin, Wenhan Yang, W. Ye et al.
Neural Markov Random Field for Stereo Matching
Tongfan Guan, Chen Wang, Yun-Hui Liu
When Fast Fourier Transform Meets Transformer for Image Restoration
xingyu jiang, Xiuhui Zhang, Ning Gao et al.
Soft Contrastive Learning for Time Series
Seunghan Lee, Taeyoung Park, Kibok Lee
CoMo: Controllable Motion Generation through Language Guided Pose Code Editing
Yiming Huang, WEILIN WAN, Yue Yang et al.
DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
Yixuan Wu, Yizhou Wang, Shixiang Tang et al.
VCoder: Versatile Vision Encoders for Multimodal Large Language Models
Jitesh Jain, Jianwei Yang, Humphrey Shi
DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control
Yuru Jia, Lukas Hoyer, Shengyu Huang et al.
MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation
Mi Yan, Jiazhao Zhang, Yan Zhu et al.
DAP: A Dynamic Adversarial Patch for Evading Person Detectors
Amira Guesmi, Ruitian Ding, Muhammad Abdullah Hanif et al.
GPAvatar: Generalizable and Precise Head Avatar from Image(s)
Xuangeng Chu, Yu Li, Ailing Zeng et al.
Feature Fusion from Head to Tail for Long-Tailed Visual Recognition
Mengke Li, Zhikai HU, Yang Lu et al.
UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models
Yiming Zhao, Zhouhui Lian
LightIt: Illumination Modeling and Control for Diffusion Models
Peter Kocsis, Kalyan Sunkavalli, Julien Philip et al.
PatchFusion: An End-to-End Tile-Based Framework for High-Resolution Monocular Metric Depth Estimation
Zhenyu Li, Shariq Bhat, Peter Wonka
Jumanji: a Diverse Suite of Scalable Reinforcement Learning Environments in JAX
Clément Bonnet, Daniel Luo, Donal Byrne et al.
UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction
Lan Feng, Mohammadhossein Bahari, Kaouther Messaoud et al.
Simplifying Transformer Blocks
Bobby He, Thomas Hofmann
RangeLDM: Fast Realistic LiDAR Point Cloud Generation
Qianjiang Hu, Zhimin Zhang, Wei Hu
S2MAE: A Spatial-Spectral Pretraining Foundation Model for Spectral Remote Sensing Data
Xuyang Li, Danfeng Hong, Jocelyn Chanussot
MatFuse: Controllable Material Generation with Diffusion Models
Giuseppe Vecchio, Renato Sortino, Simone Palazzo et al.
One-Prompt to Segment All Medical Images
Wu, Min Xu
Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks
Yuhao Liu, Zhanghan Ke, Fang Liu et al.
What does the Knowledge Neuron Thesis Have to do with Knowledge?
Jingcheng Niu, Andrew Liu, Zining Zhu et al.
Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer
Yu Deng, Duomin Wang, Baoyuan Wang
Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models
Yixuan Ren, Yang Zhou, Jimei Yang et al.
AltDiffusion: A Multilingual Text-to-Image Diffusion Model
Fulong Ye, Guang Liu, Xinya Wu et al.
Mosaic-SDF for 3D Generative Models
Lior Yariv, Omri Puny, Oran Gafni et al.
Improving Image Restoration through Removing Degradations in Textual Representations
Jingbo Lin, Zhilu Zhang, Yuxiang Wei et al.
JoMA: Demystifying Multilayer Transformers via Joint Dynamics of MLP and Attention
Yuandong Tian, Yiping Wang, Zhenyu Zhang et al.
GAIA: Zero-shot Talking Avatar Generation
Tianyu He, Junliang Guo, Runyi Yu et al.
Grounded Question-Answering in Long Egocentric Videos
Shangzhe Di, Weidi Xie
FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally
Qiuhong Shen, Xingyi Yang, Xinchao Wang
Str2Str: A Score-based Framework for Zero-shot Protein Conformation Sampling
Jiarui Lu, Bozitao Zhong, Zuobai Zhang et al.
S2WAT: Image Style Transfer via Hierarchical Vision Transformer Using Strips Window Attention
Chiyu Zhang, Xiaogang Xu, Lei Wang et al.
PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations
Yang Zheng, Qingqing Zhao, Guandao Yang et al.
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation
Peng Xu, Wenqi Shao, Mengzhao Chen et al.
Point Segment and Count: A Generalized Framework for Object Counting
Zhizhong Huang, Mingliang Dai, Yi Zhang et al.
Group Preference Optimization: Few-Shot Alignment of Large Language Models
Siyan Zhao, John Dang, Aditya Grover
Smooth ECE: Principled Reliability Diagrams via Kernel Smoothing
Jaroslaw Blasiok, Preetum Nakkiran
ODEFormer: Symbolic Regression of Dynamical Systems with Transformers
Stéphane d'Ascoli, Sören Becker, Philippe Schwaller et al.
Digital Life Project: Autonomous 3D Characters with Social Intelligence
Zhongang Cai, Jianping Jiang, Zhongfei Qing et al.
Duolando: Follower GPT with Off-Policy Reinforcement Learning for Dance Accompaniment
Siyao Li, Tianpei Gu, Zhitao Yang et al.
Real-Fake: Effective Training Data Synthesis Through Distribution Matching
Jianhao Yuan, Jie Zhang, Shuyang Sun et al.
Bridging Remote Sensors with Multisensor Geospatial Foundation Models
Boran Han, Shuai Zhang, Xingjian Shi et al.
Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit
Blake Bordelon, Lorenzo Noci, Mufan Li et al.
Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles
Rui Song, Chenwei Liang, Hu Cao et al.
Pre-trained Vision and Language Transformers Are Few-Shot Incremental Learners
Keon Hee Park, Kyungwoo Song, Gyeong-Moon Park
Neural Sign Actors: A Diffusion Model for 3D Sign Language Production from Text
Vasileios Baltatzis, Rolandos Alexandros Potamias, Evangelos Ververas et al.
DrivingDiffusion: Layout-Guided Multi-View Driving Scenarios Video Generation with Latent Diffusion Model
Li Xiaofan, Zhang Yifu, Xiaoqing Ye
CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update
Zhi Gao, Yuntao Du., Xintong Zhang et al.
ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance
Yongwei Chen, Tengfei Wang, Tong Wu et al.
SEPT: Towards Efficient Scene Representation Learning for Motion Prediction
Zhiqian Lan, Yuxuan Jiang, Yao Mu et al.
DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection
Lewei Yao, Renjie Pi, Jianhua Han et al.
Cross-Layer and Cross-Sample Feature Optimization Network for Few-Shot Fine-Grained Image Classification
Zhen-Xiang Ma, Zhen-Duo Chen, Li-Jun Zhao et al.