Most Cited 2024 "conditional score networks" Papers
12,324 papers found
Elite360D: Towards Efficient 360 Depth Estimation via Semantic- and Distance-Aware Bi-Projection Fusion
Hao Ai, Addison Lin Wang
Learning Triangular Distribution in Visual World
Ping Chen, Xingpeng Zhang, Chengtao Zhou et al.
Your Student is Better Than Expected: Adaptive Teacher-Student Collaboration for Text-Conditional Diffusion Models
Nikita Starodubcev, Dmitry Baranchuk, Artem Fedorov et al.
GLiDR: Topologically Regularized Graph Generative Network for Sparse LiDAR Point Clouds
Prashant Kumar, Kshitij Madhav Bhat, Vedang Bhupesh Shenvi Nadkarni et al.
HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation
Xin Huang, Ruizhi Shao, Qi Zhang et al.
Unbiased Estimator for Distorted Conics in Camera Calibration
Chaehyeon Song, Jaeho Shin, Myung-Hwan Jeon et al.
Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding
Chaolei Tan, Jianhuang Lai, Wei-Shi Zheng et al.
Enhancing Quality of Compressed Images by Mitigating Enhancement Bias Towards Compression Domain
Qunliang Xing, Mai Xu, Shengxi Li et al.
TIGER: Time-Varying Denoising Model for 3D Point Cloud Generation with Diffusion Process
Zhiyuan Ren, Minchul Kim, Feng Liu et al.
HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video
Zicong Fan, Maria Parelli, Maria Kadoglou et al.
Learning Continual Compatible Representation for Re-indexing Free Lifelong Person Re-identification
Zhenyu Cui, Jiahuan Zhou, Xun Wang et al.
LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation
Linfeng Yuan, Miaojing Shi, Zijie Yue et al.
Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos
Mehmet Saygin Seyfioglu, Wisdom Ikezogwo, Fatemeh Ghezloo et al.
X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
Lingmin Ran, Xiaodong Cun, Jia-Wei Liu et al.
AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving
Mingfu Liang, Jong-Chyi Su, Samuel Schulter et al.
Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models
Pablo Marcos-Manchón, Roberto Alcover-Couso, Juan SanMiguel et al.
Attentive Illumination Decomposition Model for Multi-Illuminant White Balancing
Dongyoung Kim, Jinwoo Kim, Junsang Yu et al.
Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning
Wenjin Hou, Shiming Chen, Shuhuang Chen et al.
A&B BNN: Add&Bit-Operation-Only Hardware-Friendly Binary Neural Network
Ruichen Ma, Guanchao Qiao, Yian Liu et al.
OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies
Lingdong Kong, Youquan Liu, Lai Xing Ng et al.
Z*: Zero-shot Style Transfer via Attention Reweighting
Yingying Deng, Xiangyu He, Fan Tang et al.
G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis
Yufei Ye, Abhinav Gupta, Kris Kitani et al.
3D Face Reconstruction with the Geometric Guidance of Facial Part Segmentation
Zidu Wang, Xiangyu Zhu, Tianshuo Zhang et al.
Spike-guided Motion Deblurring with Unknown Modal Spatiotemporal Alignment
Jiyuan Zhang, Shiyan Chen, Yajing Zheng et al.
Multi-Task Dense Prediction via Mixture of Low-Rank Experts
Yuqi Yang, Peng-Tao Jiang, Qibin Hou et al.
A Bayesian Approach to OOD Robustness in Image Classification
Prakhar Kaushik, Adam Kortylewski, Alan L. Yuille
ConCon-Chi: Concept-Context Chimera Benchmark for Personalized Vision-Language Tasks
Andrea Rosasco, Stefano Berti, Giulia Pasquale et al.
Instance-aware Contrastive Learning for Occluded Human Mesh Reconstruction
Mi-Gyeong Gwon, Gi-Mun Um, Won-Sik Cheong et al.
DITTO: Dual and Integrated Latent Topologies for Implicit 3D Reconstruction
Jaehyeok Shim, Kyungdon Joo
UniMODE: Unified Monocular 3D Object Detection
Zhuoling Li, Xiaogang Xu, Ser-Nam Lim et al.
Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation
Mukul Khanna, Yongsen Mao, Hanxiao Jiang et al.
Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training
Yipeng Gao, Zeyu Wang, Wei-Shi Zheng et al.
KeyPoint Relative Position Encoding for Face Recognition
Minchul Kim, Feng Liu, Yiyang Su et al.
QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition
Xiang Li, Jinglu Wang, Xiaohao Xu et al.
From a Bird's Eye View to See: Joint Camera and Subject Registration without the Camera Calibration
Zekun Qian, Ruize Han, Wei Feng et al.
Joint2Human: High-Quality 3D Human Generation via Compact Spherical Embedding of 3D Joints
Muxin Zhang, Qiao Feng, Zhuo Su et al.
Investigating Compositional Challenges in Vision-Language Models for Visual Grounding
Yunan Zeng, Yan Huang, Jinjin Zhang et al.
SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation
Sichen Chen, Yingyi Zhang, Siming Huang et al.
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
Feng Liang, Bichen Wu, Jialiang Wang et al.
DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving
Chen Min, Dawei Zhao, Liang Xiao et al.
Accept the Modality Gap: An Exploration in the Hyperbolic Space
Sameera Ramasinghe, Violetta Shevchenko, Gil Avraham et al.
MirageRoom: 3D Scene Segmentation with 2D Pre-trained Models by Mirage Projection
Haowen Sun, Yueqi Duan, Juncheng Yan et al.
CAD: Photorealistic 3D Generation via Adversarial Distillation
Ziyu Wan, Despoina Paschalidou, Ian Huang et al.
CityDreamer: Compositional Generative Model of Unbounded 3D Cities
Haozhe Xie, Zhaoxi Chen, Fangzhou Hong et al.
Noisy-Correspondence Learning for Text-to-Image Person Re-identification
Yang Qin, Yingke Chen, Dezhong Peng et al.
Random Entangled Tokens for Adversarially Robust Vision Transformer
Huihui Gong, Minjing Dong, Siqi Ma et al.
PNeRV: Enhancing Spatial Consistency via Pyramidal Neural Representation for Videos
Qi Zhao, M. Salman Asif, Zhan Ma
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
Tongjia Chen, Hongshan Yu, Zhengeng Yang et al.
DYSON: Dynamic Feature Space Self-Organization for Online Task-Free Class Incremental Learning
Yuhang He, Yingjie Chen, Yuhan Jin et al.
Harnessing Large Language Models for Training-free Video Anomaly Detection
Luca Zanella, Willi Menapace, Massimiliano Mancini et al.
Continuous Pose for Monocular Cameras in Neural Implicit Representation
Qi Ma, Danda Paudel, Ajad Chhatkuli et al.
Learned Trajectory Embedding for Subspace Clustering
Yaroslava Lochman, Christopher Zach, Carl Olsson
BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning
Siyuan Liang, Mingli Zhu, Aishan Liu et al.
Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis
Yuchao Gu, Xintao Wang, Yixiao Ge et al.
Weakly Supervised Video Individual Counting
Xinyan Liu, Guorong Li, Yuankai Qi et al.
FairRAG: Fair Human Generation via Fair Retrieval Augmentation
Robik Shrestha, Yang Zou, Qiuyu Chen et al.
MicroDiffusion: Implicit Representation-Guided Diffusion for 3D Reconstruction from Limited 2D Microscopy Projections
Mude Hui, Zihao Wei, Hongru Zhu et al.
Learning Inclusion Matching for Animation Paint Bucket Colorization
Yuekun Dai, Shangchen Zhou, Blake Li et al.
LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching
Yixun Liang, Xin Yang, Jiantao Lin et al.
Preserving Fairness Generalization in Deepfake Detection
Li Lin, Xinan He, Yan Ju et al.
RepViT: Revisiting Mobile CNN From ViT Perspective
Ao Wang, Hui Chen, Zijia Lin et al.
Improved Implicit Neural Representation with Fourier Reparameterized Training
Kexuan Shi, Xingyu Zhou, Shuhang Gu
Gradient Alignment for Cross-Domain Face Anti-Spoofing
Minh Binh Le, Simon Woo
U-VAP: User-specified Visual Appearance Personalization via Decoupled Self Augmentation
You Wu, Kean Liu, Xiaoyue Mi et al.
Prompt-Free Diffusion: Taking “Text” out of Text-to-Image Diffusion Models
Xingqian Xu, Jiayi Guo, Zhangyang Wang et al.
Insights from the Use of Previously Unseen Neural Architecture Search Datasets
Rob Geada, David Towers, Matthew Forshaw et al.
A Pedestrian is Worth One Prompt: Towards Language Guidance Person Re-Identification
Zexian Yang, Dayan Wu, Chenming Wu et al.
Layout-Agnostic Scene Text Image Synthesis with Diffusion Models
Qilong Zhangli, Jindong Jiang, Di Liu et al.
CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation
Kangfu Mei, Mauricio Delbracio, Hossein Talebi et al.
From SAM to CAMs: Exploring Segment Anything Model for Weakly Supervised Semantic Segmentation
Hyeokjun Kweon, Kuk-Jin Yoon
Referring Expression Counting
Siyang Dai, Jun Liu, Ngai-Man Cheung
Vlogger: Make Your Dream A Vlog
Shaobin Zhuang, Kunchang Li, Xinyuan Chen et al.
EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion
Zehuan Huang, Hao Wen, Junting Dong et al.
IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images
Yushuang Wu, Luyue Shi, Junhao Cai et al.
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Bo He, Hengduo Li, Young Kyun Jang et al.
SVGDreamer: Text Guided SVG Generation with Diffusion Model
Ximing Xing, Chuang Wang, Haitao Zhou et al.
Dual Prototype Attention for Unsupervised Video Object Segmentation
Suhwan Cho, Minhyeok Lee, Seunghoon Lee et al.
R-Cyclic Diffuser: Reductive and Cyclic Latent Diffusion for 3D Clothed Human Digitalization
Kennard Chan, Fayao Liu, Guosheng Lin et al.
Contrastive Mean-Shift Learning for Generalized Category Discovery
Sua Choi, Dahyun Kang, Minsu Cho
Panacea: Panoramic and Controllable Video Generation for Autonomous Driving
Yuqing Wen, Yucheng Zhao, Yingfei Liu et al.
Flexible Biometrics Recognition: Bridging the Multimodality Gap through Attention Alignment and Prompt Tuning
Leslie Ching Ow Tiong, Dick Sigmund, Chen-Hui Chan et al.
Towards Variable and Coordinated Holistic Co-Speech Motion Generation
Yifei Liu, Qiong Cao, Yandong Wen et al.
VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection
Zihua Liu, Hiroki Sakuma, Masatoshi Okutomi
Class Incremental Learning with Multi-Teacher Distillation
Haitao Wen, Lili Pan, Yu Dai et al.
Parameter Efficient Self-Supervised Geospatial Domain Adaptation
Linus Scheibenreif, Michael Mommert, Damian Borth
ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers
Narges Norouzi, Svetlana Orlova, Daan de Geus et al.
Beyond Seen Primitive Concepts and Attribute-Object Compositional Learning
Nirat Saini, Khoi Pham, Abhinav Shrivastava
Scaling Laws of Synthetic Images for Model Training ... for Now
Lijie Fan, Kaifeng Chen, Dilip Krishnan et al.
UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion
Junsheng Zhou, Weiqi Zhang, Baorui Ma et al.
Learning Group Activity Features Through Person Attribute Prediction
Chihiro Nakatani, Hiroaki Kawashima, Norimichi Ukita
MICap: A Unified Model for Identity-Aware Movie Descriptions
Haran Raajesh, Naveen Reddy Desanur, Zeeshan Khan et al.
UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity
Jialong Zuo, Hanyu Zhou, Ying Nie et al.
Test-Time Zero-Shot Temporal Action Localization
Benedetta Liberatori, Alessandro Conti, Paolo Rota et al.
FreeU: Free Lunch in Diffusion U-Net
Chenyang Si, Ziqi Huang, Yuming Jiang et al.
Towards Text-guided 3D Scene Composition
Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin et al.
Instance-aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation
Xiaohan Lei, Min Wang, Wengang Zhou et al.
AnyScene: Customized Image Synthesis with Composited Foreground
Ruidong Chen, Lanjun Wang, Weizhi Nie et al.
Learning SO(3)-Invariant Semantic Correspondence via Local Shape Transform
Chunghyun Park, Seungwook Kim, Jaesik Park et al.
Color Shift Estimation-and-Correction for Image Enhancement
Yiyu Li, Ke Xu, Gerhard Hancke et al.
TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression
Ho-Joong Kim, Jung-Ho Hong, Heejo Kong et al.
Endow SAM with Keen Eyes: Temporal-spatial Prompt Learning for Video Camouflaged Object Detection
Wenjun Hui, Zhenfeng Zhu, Shuai Zheng et al.
NICE: Neurogenesis Inspired Contextual Encoding for Replay-free Class Incremental Learning
Mustafa B Gurbuz, Jean Moorman, Constantine Dovrolis
Taming Mode Collapse in Score Distillation for Text-to-3D Generation
Peihao Wang, Dejia Xu, Zhiwen Fan et al.
FocusMAE: Gallbladder Cancer Detection from Ultrasound Videos with Focused Masked Autoencoders
Soumen Basu, Mayuna Gupta, Chetan Madan et al.
Noisy One-point Homographies are Surprisingly Good
Yaqing Ding, Jonathan Astermark, Magnus Oskarsson et al.
CSTA: CNN-based Spatiotemporal Attention for Video Summarization
Jaewon Son, Jaehun Park, Kwangsu Kim
SUGAR: Pre-training 3D Visual Representations for Robotics
Shizhe Chen, Ricardo Garcia Pinel, Ivan Laptev et al.
SnAG: Scalable and Accurate Video Grounding
Fangzhou Mu, Sicheng Mo, Yin Li
GLaMM: Pixel Grounding Large Multimodal Model
Hanoona Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly et al.
ManiFPT: Defining and Analyzing Fingerprints of Generative Models
Hae Jin Song, Mahyar Khayatkhoei, Wael AbdAlmageed
Self-Calibrating Vicinal Risk Minimisation for Model Calibration
Jiawei Liu, Changkun Ye, Ruikai Cui et al.
Cinematic Behavior Transfer via NeRF-based Differentiable Filming
Xuekun Jiang, Anyi Rao, Jingbo Wang et al.
Finding Lottery Tickets in Vision Models via Data-driven Spectral Foresight Pruning
Leonardo Iurada, Marco Ciccone, Tatiana Tommasi
CORE-MPI: Consistency Object Removal with Embedding MultiPlane Image
Donggeun Yoon, Donghyeon Cho
ScoreHypo: Probabilistic Human Mesh Estimation with Hypothesis Scoring
Yuan Xu, Xiaoxuan Ma, Jiajun Su et al.
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
Nataniel Ruiz, Yuanzhen Li, Varun Jampani et al.
Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models
Chang Liu, Haoning Wu, Yujie Zhong et al.
UniMix: Towards Domain Adaptive and Generalizable LiDAR Semantic Segmentation in Adverse Weather
Haimei Zhao, Jing Zhang, Zhuo Chen et al.
LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs
Yunsheng Ma, Can Cui, Xu Cao et al.
Dynamic Policy-Driven Adaptive Multi-Instance Learning for Whole Slide Image Classification
Tingting Zheng, Kui Jiang, Hongxun Yao
BilevelPruning: Unified Dynamic and Static Channel Pruning for Convolutional Neural Networks
Shangqian Gao, Yanfu Zhang, Feihu Huang et al.
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
Jiayi Guo, Xingqian Xu, Yifan Pu et al.
Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents
Yuxi Wei, Zi Wang, Yifan Lu et al.
Learning Continuous 3D Words for Text-to-Image Generation
Ta-Ying Cheng, Matheus Gadelha, Thibault Groueix et al.
Content-Adaptive Non-Local Convolution for Remote Sensing Pansharpening
Yule Duan, Xiao Wu, Haoyu Deng et al.
A Conditional Denoising Diffusion Probabilistic Model for Point Cloud Upsampling
Wentao Qu, Yuantian Shao, Lingwu Meng et al.
APISR: Anime Production Inspired Real-World Anime Super-Resolution
Boyang Wang, Fengyu Yang, Xihang Yu et al.
Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions
Weizhen He, Yiheng Deng, Shixiang Tang et al.
Device-Wise Federated Network Pruning
Shangqian Gao, Junyi Li, Zeyu Zhang et al.
SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers
Ioannis Kakogeorgiou, Spyros Gidaris, Konstantinos Karantzalos et al.
MRC-Net: 6-DoF Pose Estimation with MultiScale Residual Correlation
Yuelong Li, Yafei Mao, Raja Bala et al.
Rethinking FID: Towards a Better Evaluation Metric for Image Generation
Sadeep Jayasumana, Srikumar Ramalingam, Andreas Veit et al.
Progress-Aware Online Action Segmentation for Egocentric Procedural Task Videos
Yuhan Shen, Ehsan Elhamifar
MOHO: Learning Single-view Hand-held Object Reconstruction with Multi-view Occlusion-Aware Supervision
Chenyangguang Zhang, Guanlong Jiao, Yan Di et al.
Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
Youngjoon Jang, Jihoon Kim, Junseok Ahn et al.
Learning to Segment Referred Objects from Narrated Egocentric Videos
Yuhan Shen, Huiyu Wang, Xitong Yang et al.
EGTR: Extracting Graph from Transformer for Scene Graph Generation
Jinbae Im, JeongYeon Nam, Nokyung Park et al.
Distributionally Generative Augmentation for Fair Facial Attribute Classification
Fengda Zhang, Qianpei He, Kun Kuang et al.
PikeLPN: Mitigating Overlooked Inefficiencies of Low-Precision Neural Networks
Marina Neseem, Conor McCullough, Randy Hsin et al.
THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models
Prannay Kaul, Zhizhong Li, Hao Yang et al.
UniVS: Unified and Universal Video Segmentation with Prompts as Queries
Minghan Li, Shuai Li, Xindong Zhang et al.
Inlier Confidence Calibration for Point Cloud Registration
Yongzhe Yuan, Yue Wu, Xiaolong Fan et al.
CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow
Chenbin Pan, Burhan Yaman, Senem Velipasalar et al.
ADFactory: An Effective Framework for Generalizing Optical Flow with NeRF
Han Ling, Quansen Sun, Yinghui Sun et al.
3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions
Weijia Li, Haote Yang, Zhenghao Hu et al.
In Search of a Data Transformation That Accelerates Neural Field Training
Junwon Seo, Sangyoon Lee, Kwang In Kim et al.
FastMAC: Stochastic Spectral Sampling of Correspondence Graph
Yifei Zhang, Hao Zhao, Hongyang Li et al.
PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution
Honghao Chen, Xiangxiang Chu, Yongjian Ren et al.
Towards Generalizing to Unseen Domains with Few Labels
Chamuditha Jayanga Galappaththige, Sanoojan Baliah, Malitha Gunawardhana et al.
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
Jialin Wu, Xia Hu, Yaqing Wang et al.
Improving Semantic Correspondence with Viewpoint-Guided Spherical Maps
Octave Mariotti, Oisin Mac Aodha, Hakan Bilen
Learning Degradation-Independent Representations for Camera ISP Pipelines
Yanhui Guo, Fangzhou Luo, Xiaolin Wu
A Subspace-Constrained Tyler's Estimator and its Applications to Structure from Motion
Feng Yu, Teng Zhang, Gilad Lerman
Low-Resource Vision Challenges for Foundation Models
Yunhua Zhang, Hazel Doughty, Cees G. M. Snoek
Low-Latency Neural Stereo Streaming
Qiqi Hou, Farzad Farhadzadeh, Amir Said et al.
Your Transferability Barrier is Fragile: Free-Lunch for Transferring the Non-Transferable Learning
Ziming Hong, Li Shen, Tongliang Liu
ARTrackV2: Prompting Autoregressive Tracker Where to Look and How to Describe
Yifan Bai, Zeyang Zhao, Yihong Gong et al.
DPHMs: Diffusion Parametric Head Models for Depth-based Tracking
Jiapeng Tang, Angela Dai, Yinyu Nie et al.
MaxQ: Multi-Axis Query for N:M Sparsity Network
Jingyang Xiang, Siqi Li, Junhao Chen et al.
The Mirrored Influence Hypothesis: Efficient Data Influence Estimation by Harnessing Forward Passes
Myeongseob Ko, Feiyang Kang, Weiyan Shi et al.
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models
Haoning Wu, Zicheng Zhang, Erli Zhang et al.
Efficient Scene Recovery Using Luminous Flux Prior
Zhongyu Li, Lei Zhang
Structured Gradient-based Interpretations via Norm-Regularized Adversarial Training
Shizhan Gong, Qi Dou, Farzan Farnia
Revisiting Global Translation Estimation with Feature Tracks
Peilin Tao, Hainan Cui, Mengqi Rong et al.
Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection
Huan Liu, Zichang Tan, Chuangchuang Tan et al.
MeaCap: Memory-Augmented Zero-shot Image Captioning
Zequn Zeng, Yan Xie, Hao Zhang et al.
MuseChat: A Conversational Music Recommendation System for Videos
Zhikang Dong, Bin Chen, Xiulong Liu et al.
Novel View Synthesis with View-Dependent Effects from a Single Image
Juan Luis Gonzalez Bello, Munchurl Kim
Orchestrate Latent Expertise: Advancing Online Continual Learning with Multi-Level Supervision and Reverse Self-Distillation
Hongwei Yan, Liyuan Wang, Kaisheng Ma et al.
DisCo: Disentangled Control for Realistic Human Dance Generation
Tan Wang, Linjie Li, Kevin Lin et al.
Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing
Xun Lin, Shuai Wang, Rizhao Cai et al.
Constructing and Exploring Intermediate Domains in Mixed Domain Semi-supervised Medical Image Segmentation
Qinghe Ma, Jian Zhang, Lei Qi et al.
LAMP: Learn A Motion Pattern for Few-Shot Video Generation
Rui-Qi Wu, Liangyu Chen, Tong Yang et al.
PixelLM: Pixel Reasoning with Large Multimodal Model
Zhongwei Ren, Zhicheng Huang, Yunchao Wei et al.
Towards CLIP-driven Language-free 3D Visual Grounding via 2D-3D Relational Enhancement and Consistency
Yuqi Zhang, Han Luo, Yinjie Lei
iKUN: Speak to Trackers without Retraining
Yunhao Du, Cheng Lei, Zhicheng Zhao et al.
Neural Fields as Distributions: Signal Processing Beyond Euclidean Space
Daniel Rebain, Soroosh Yazdani, Kwang Moo Yi et al.
Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation
Xingqun Qi, Jiahao Pan, Peng Li et al.
LaRE²: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection
Yunpeng Luo, Junlong Du, Ke Yan et al.
Stratified Avatar Generation from Sparse Observations
Han Feng, Wenchao Ma, Quankai Gao et al.
Few-shot Learner Parameterization by Diffusion Time-steps
Zhongqi Yue, Pan Zhou, Richang Hong et al.
Global and Hierarchical Geometry Consistency Priors for Few-shot NeRFs in Indoor Scenes
Xiaotian Sun, Qingshan Xu, Xinjie Yang et al.
Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation
Zihan Wang, Xiangyang Li, Jiahao Yang et al.
Compressed 3D Gaussian Splatting for Accelerated Novel View Synthesis
Simon Niedermayr, Josef Stumpfegger, Rüdiger Westermann
The STVchrono Dataset: Towards Continuous Change Recognition in Time
Yanjun Sun, Yue Qiu, Mariia Khan et al.
Unleashing Channel Potential: Space-Frequency Selection Convolution for SAR Object Detection
Ke Li, Di Wang, Zhangyuan Hu et al.
Motion Blur Decomposition with Cross-shutter Guidance
Xiang Ji, Haiyang Jiang, Yinqiang Zheng
Mind Marginal Non-Crack Regions: Clustering-Inspired Representation Learning for Crack Segmentation
Zhuangzhuang Chen, Zhuonan Lai, Jie Chen et al.
LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
Gongwei Chen, Leyang Shen, Rui Shao et al.
Pixel-Aligned Language Model
Jiarui Xu, Xingyi Zhou, Shen Yan et al.
Eclipse: Disambiguating Illumination and Materials using Unintended Shadows
Dor Verbin, Ben Mildenhall, Peter Hedman et al.
ConvoFusion: Multi-Modal Conversational Diffusion for Co-Speech Gesture Synthesis
Muhammad Hamza Mughal, Rishabh Dabral, Ikhsanul Habibie et al.
2S-UDF: A Novel Two-stage UDF Learning Method for Robust Non-watertight Model Reconstruction from Multi-view Images
Junkai Deng, Fei Hou, Xuhui Chen et al.
ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models
Lukas Höllein, Aljaž Božič, Norman Müller et al.
Taming Stable Diffusion for Text to 360 Panorama Image Generation
Cheng Zhang, Qianyi Wu, Camilo Cruz Gambardella et al.
CAMEL: CAusal Motion Enhancement Tailored for Lifting Text-driven Video Editing
Guiwei Zhang, Tianyu Zhang, Guanglin Niu et al.
DuPL: Dual Student with Trustworthy Progressive Learning for Robust Weakly Supervised Semantic Segmentation
Yuanchen Wu, Xichen Ye, Kequan Yang et al.
A Physics-informed Low-rank Deep Neural Network for Blind and Universal Lens Aberration Correction
Jin Gong, Runzhao Yang, Weihang Zhang et al.
NAPGuard: Towards Detecting Naturalistic Adversarial Patches
Siyang Wu, Jiakai Wang, Jiejie Zhao et al.
Descriptor and Word Soups: Overcoming the Parameter Efficiency Accuracy Tradeoff for Out-of-Distribution Few-shot Learning
Christopher Liao, Theodoros Tsiligkaridis, Brian Kulis