Most Cited 2024 Poster Papers
12,324 papers found • Page 47 of 62
RANRAC: Robust Neural Scene Representations via Random Ray Consensus
Benno Buschmann, Andreea Dogaru, Elmar Eisemann et al.
LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model
Runhui Huang, Kaixin Cai, Jianhua Han et al.
CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs
Yassine Ouali, Adrian Bulat, Brais Martinez et al.
Get Your Embedding Space in Order: Domain-Adaptive Regression for Forest Monitoring
Sizhuo Li, Dimitri Gominski, Martin Brandt et al.
Curved Diffusion: A Generative Model With Optical Geometry Control
Andrey Voynov, Amir Hertz, Moab Arar et al.
LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis
Kevin Xie, Tianshi Cao, Jonathan P Lorraine et al.
SeA: Semantic Adversarial Augmentation for Last Layer Features from Unsupervised Representation Learning
Qi Qian, Yuanhong Xu, Juhua Hu
3D Reconstruction of Objects in Hands without Real World 3D Supervision
Aditya Prakash, Matthew Chang, Matthew Jin et al.
To Supervise or Not to Supervise: Understanding and Addressing the Key Challenges of Point Cloud Transfer Learning
Souhail Hadgi, Lei Li, Maks Ovsjanikov
A Diffusion Model for Simulation Ready Coronary Anatomy with Morpho-skeletal Control
Karim Kadry, Shreya Gupta, Jonas Sogbadji et al.
Evaluating the Adversarial Robustness of Semantic Segmentation: Trying Harder Pays Off
Levente Ferenc Halmosi, Bálint Mohos, Márk Jelasity
AdaDiff: Accelerating Diffusion Models through Step-Wise Adaptive Computation
Shengkun Tang, Yaqing Wang, Caiwen Ding et al.
Ex2Eg-MAE: A Framework for Adaptation of Exocentric Video Masked Autoencoders for Egocentric Social Role Understanding
Minh Tran, Yelin Kim, Che-Chun Su et al.
Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning
Thanh Thong Nguyen, Yi Bin, Xiaobao Wu et al.
Learning Representation for Multitask Learning through Self-Supervised Auxiliary Learning
Seokwon Shin, Hyungrok Do, Youngdoo Son
An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation
Zhiyu Tan, Mengping Yang, Luozheng Qin et al.
Generalizable Symbolic Optimizer Learning
Xiaotian Song, Peng Zeng, Yanan Sun et al.
On the Vulnerability of Skip Connections to Model Inversion Attacks
Jun Hao Koh, Sy-Tuyen Ho, Ngoc-Bao Nguyen et al.
Motion Keyframe Interpolation for Any Human Skeleton using Point Cloud-based Human Motion Data Homogenisation
Clinton Mo, Kun Hu, Chengjiang Long et al.
Dense Hand-Object(HO) GraspNet with Full Grasping Taxonomy and Dynamics
Woojin Cho, Jihyun Lee, Minjae Yi et al.
PreciseControl: Enhancing Text-To-Image Diffusion Models with Fine-Grained Attribute Control
Rishubh Parihar, Sachidanand VS, Sabariswaran Mani et al.
SRPose: Two-view Relative Pose Estimation with Sparse Keypoints
Rui Yin, Yulun Zhang, Zherong Pan et al.
Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models
Xiaoshi Wu, Yiming Hao, Manyuan Zhang et al.
Efficient Vision Transformers with Partial Attention
Xuan-Thuy Vo, Duy-Linh Nguyen, Adri Priadana et al.
Generalized Coverage for More Robust Low-Budget Active Learning
Wonho Bae, Junhyug Noh, Danica J. Sutherland
R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model
Changhoon Kim, Kyle Min, Yezhou Yang
Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection
Hu Cao, Zehua Zhang, Yan Xia et al.
TimeLens-XL: Real-time Event-based Video Frame Interpolation with Large Motion
Shi Guo, Yutian Chen, Tianfan Xue et al.
Identity-Consistent Diffusion Network for Grading Knee Osteoarthritis Progression in Radiographic Imaging
Wenhua Wu, Kun Hu, Wenxi Yue et al.
Teach CLIP to Develop a Number Sense for Ordinal Regression
Yao Du, Qiang Zhai, Weihang Dai et al.
Compact 3D Scene Representation via Self-Organizing Gaussian Grids
Wieland Morgenstern, Florian Barthel, Anna Hilsmann et al.
Instant Uncertainty Calibration of NeRFs Using a Meta-Calibrator
Niki Amini-Naieni, Tomas Jakab, Andrea Vedaldi et al.
SHIC: Shape-Image Correspondences with no Keypoint Supervision
Aleksandar Shtedritski, Christian Rupprecht, Andrea Vedaldi
Debiasing surgeon: fantastic weights and how to find them
Remi Nahon, Ivan Luiz De Moura Matos, Van-Tam Nguyen et al.
A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis
Xiang Liu, Zhaoxiang Liu, Huan Hu et al.
EMIE-MAP: Large-Scale Road Surface Reconstruction Based on Explicit Mesh and Implicit Encoding
Wenhua Wu, Qi Wang, Guangming Wang et al.
HyperSpaceX: Radial and Angular Exploration of HyperSpherical Dimensions
Chiranjeev Chiranjeev, Muskan Dosi, Kartik Thakral et al.
Common Sense Reasoning for Deep Fake Detection
Yue Zhang, Ben Colman, Xiao Guo et al.
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
Siming Yan, Min Bai, Weifeng Chen et al.
Deep Companion Learning: Enhancing Generalization Through Historical Consistency
Ruizhao Zhu, Venkatesh Saligrama
ABC Easy as 123: A Blind Counter for Exemplar-Free Multi-Class Class-agnostic Counting
Michael A Hobley, Victor Adrian Prisacariu
CrossScore: A Multi-View Approach to Image Evaluation and Scoring
Zirui Wang, Wenjing Bian, Victor Adrian Prisacariu
CPM: Class-conditional Prompting Machine for Audio-visual Segmentation
Yuanhong Chen, Chong Wang, Yuyuan Liu et al.
DiffClass: Diffusion-Based Class Incremental Learning
Zichong Meng, Jie Zhang, Changdi Yang et al.
DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing
Minghao Chen, Iro Laina, Andrea Vedaldi
Dynamic Neural Radiance Field From Defocused Monocular Video
Xianrui Luo, Huiqiang Sun, Juewen Peng et al.
4Diff: 3D-Aware Diffusion Model for Third-to-First Viewpoint Translation
Feng Cheng, Mi Luo, Huiyu Wang et al.
Realistic Human Motion Generation with Cross-Diffusion Models
Zeping Ren, Shaoli Huang, Xiu Li
MERLiN: Single-Shot Material Estimation and Relighting for Photometric Stereo
Ashish Tiwari, Satoshi Ikehata, Shanmuganathan Raman
BI-MDRG: Bridging Image History in Multimodal Dialogue Response Generation
Hee Suk Yoon, Eunseop Yoon, Joshua Tian Jin Tee et al.
Rethinking Few-shot Class-incremental Learning: Learning from Yourself
Yu-Ming Tang, Yi-Xing Peng, Jing-Ke Meng et al.
RoGUENeRF: A Robust Geometry-Consistent Universal Enhancer for NeRF
Sibi Catley-Chandar, Richard Shaw, Greg Slabaugh et al.
FuseTeacher: Modality-fused Encoders are Strong Vision Supervisors
Chen-Wei Xie, Siyang Sun, Liming Zhao et al.
MVDD: Multi-View Depth Diffusion Models
Zhen Wang, Qiangeng Xu, Feitong Tan et al.
Wavelet Convolutions for Large Receptive Fields
Shahaf Finder, Roy Amoyal, Eran Treister et al.
Gradient-based Out-of-Distribution Detection
Taha Entesari, Sina Sharifi, Bardia Safaei et al.
Veil Privacy on Visual Data: Concealing Privacy for Humans, Unveiling for DNNs
Shuchao Pang, Ruhao Ma, Bing Li et al.
Simple Unsupervised Knowledge Distillation With Space Similarity
Aditya Singh, Haohan Wang
Learning Natural Consistency Representation for Face Forgery Video Detection
Daichi Zhang, Zihao Xiao, Shikun Li et al.
View-Consistent 3D Editing with Gaussian Splatting
Yuxuan Wang, Xuanyu Yi, Zike Wu et al.
HO-Gaussian: Hybrid Optimization of 3D Gaussian Splatting for Urban Scenes
Zhuopeng Li, Yilin Zhang, Chenming Wu et al.
Generating Human Interaction Motions in Scenes with Text Control
Hongwei Yi, Justus Thies, Michael J. Black et al.
Instruction Tuning-free Visual Token Complement for Multimodal LLMs
Dongsheng Wang, Jiequan Cui, Miaoge Li et al.
Improving Point-based Crowd Counting and Localization Based on Auxiliary Point Guidance
I-Hsiang Chen, Wei-Ting Chen, Yu-Wei Liu et al.
SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer
Zijie Wu, Chaohui Yu, Yanqin Jiang et al.
Revisit Self-supervision with Local Structure-from-Motion
Shengjie Zhu, Xiaoming Liu
GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection
Ziying Song, Lei Yang, Shaoqing Xu et al.
EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis
Shuai Tan, Bin Ji, Mengxiao Bi et al.
Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos
Mi Luo, Zihui Xue, Alex Dimakis et al.
LivePhoto: Real Image Animation with Text-guided Motion Control
Xi Chen, Zhiheng Liu, Mengting Chen et al.
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
Wendi Zheng, Jiayan Teng, Zhuoyi Yang et al.
OAPT: Offset-Aware Partition Transformer for Double JPEG Artifacts Removal
Qiao Mo, Yukang Ding, Jinhua Hao et al.
Context-Aware Action Recognition: Introducing a Comprehensive Dataset for Behavior Contrast
Tatsuya Sasaki, Yoshiki Ito, Satoshi Kondo
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
Jinbo Xing, Menghan Xia, Yong Zhang et al.
Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors
Tongkun Guan, Wei Shen, Xue Yang et al.
Image-to-Lidar Relational Distillation for Autonomous Driving Data
Anas Mahmoud, Ali Harakeh, Steven Waslander
WPS-SAM: Towards Weakly-Supervised Part Segmentation with Foundation Models
Xinjian Wu, Ruisong Zhang, Jie Qin et al.
Analysis-by-Synthesis Transformer for Single-View 3D Reconstruction
Dian Jia, Xiaoqian Ruan, Kun Xia et al.
DMiT: Deformable Mipmapped Tri-Plane Representation for Dynamic Scenes
Jing-Wen Yang, Jia-Mu Sun, Yong-Liang Yang et al.
Motion-Oriented Compositional Neural Radiance Fields for Monocular Dynamic Human Modeling
Jaehyeok Kim, Dongyoon Wee, Dan Xu
KMTalk: Speech-Driven 3D Facial Animation with Key Motion Embedding
Zhihao Xu, Shengjie Gong, Jiapeng Tang et al.
Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models
Siao Tang, Xin Wang, Hong Chen et al.
DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control
Xinyu Xu, Shengcheng Luo, Yanchao Yang et al.
Textual Query-Driven Mask Transformer for Domain Generalized Segmentation
Byeonghyun Pak, Byeongju Woo, Sunghwan Kim et al.
Arbitrary-Scale Video Super-Resolution with Structural and Textural Priors
Wei Shang, Dongwei Ren, Wanying Zhang et al.
Combining Generative and Geometry Priors for Wide-Angle Portrait Correction
Lan Yao, Chaofeng Chen, Xiaoming Li et al.
To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now
Yimeng Zhang, Jinghan Jia, Xin Chen et al.
DualDn: Dual-domain Denoising via Differentiable ISP
Ruikang Li, Yujin Wang, Shiqi Chen et al.
AdaIFL: Adaptive Image Forgery Localization via a Dynamic and Importance-aware Transformer Network
Yuxi Li, Fuyuan Cheng, Wangbo Yu et al.
Event-based Head Pose Estimation: Benchmark and Method
Jiahui Yuan, Hebei Li, Yansong Peng et al.
Learning Unsigned Distance Functions from Multi-view Images with Volume Rendering Priors
Wen Yuan Zhang, Kanle Shi, Yushen Liu et al.
Assessing Sample Quality via the Latent Space of Generative Models
Jingyi Xu, Hieu Le, Dimitris Samaras
Responsible Visual Editing
Minheng Ni, Yeli Shen, Yabin Zhang et al.
Consistent 3D Line Mapping
Xulong Bai, Hainan Cui, Shuhan Shen
Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation
Zhihang Zhong, Gurunandan Krishnan, Xiao Sun et al.
MotionDirector: Motion Customization of Text-to-Video Diffusion Models
Rui Zhao, Yuchao Gu, Jay Zhangjie Wu et al.
OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving
Guoqing Wang, Zhongdao Wang, Pin Tang et al.
Probabilistic Image-Driven Traffic Modeling via Remote Sensing
Scott Workman, Armin Hadzic
UAV First-Person Viewers Are Radiance Field Learners
Liqi Yan, Qifan Wang, Junhan Zhao et al.
Knowledge-enhanced Visual-Language Pretraining for Computational Pathology
Xiao Zhou, Xiaoman Zhang, Chaoyi Wu et al.
Pick-a-back: Selective Device-to-Device Knowledge Transfer in Federated Continual Learning
JinYi Yoon, HyungJune Lee
Situated Instruction Following
So Yeon Min, Xavier Puig, Devendra Singh Chaplot et al.
Holodepth: Programmable Depth-Varying Projection via Computer-Generated Holography
Dorian Chan, Matthew O'Toole, Sizhuo Ma et al.
Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
Yifan Pu, Xia Zhuofan, Jiayi Guo et al.
Two-Stage Video Shadow Detection via Temporal-Spatial Adaption
Xin Duan, Yu Cao, Lei Zhu et al.
CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary semantic segmentation
Monika Wysoczanska, Oriane Siméoni, Michaël Ramamonjisoa et al.
M^2Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation
Yingshuang Zou, Yikang Ding, Xi Qiu et al.
Improving Adversarial Transferability via Model Alignment
Avery Ma, Amir-massoud Farahmand, Yangchen Pan et al.
RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios
Wenhao Ding, Yulong Cao, Ding Zhao et al.
Factorizing Text-to-Video Generation by Explicit Image Conditioning
Rohit Girdhar, Mannat Singh, Andrew Brown et al.
Cut out the Middleman: Revisiting Pose-based Gait Recognition
Yang Fu, Saihui Hou, Shibei Meng et al.
Fast Registration of Photorealistic Avatars for VR Facial Animation
Chaitanya Patel, Shaojie Bai, Te-Li Wang et al.
Caltech Aerial RGB-Thermal Dataset in the Wild
Connor Lee, Matthew Anderson, Nikhil Ranganathan et al.
Diagnosing and Re-learning for Balanced Multimodal Learning
Yake Wei, Siwei Li, Ruoxuan Feng et al.
MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning
Vishal Nedungadi, Ankit Kariryaa, Stefan Oehmcke et al.
Loc3Diff: Local Diffusion for 3D Human Head Synthesis and Editing
Yushi Lan, Feitong Tan, Qiangeng Xu et al.
Learning to Distinguish Samples for Generalized Category Discovery
Fengxiang Yang, Pu Nan, Wenjing Li et al.
WBP: Training-time Backdoor Attacks through Hardware-based Weight Bit Poisoning
Kunbei Cai, Zhenkai Zhang, Qian Lou et al.
HVCLIP: High-dimensional Vector in CLIP for Unsupervised Domain Adaptation
Noranart Vesdapunt, Kah Kuen Fu, Yue Wu et al.
Improving 3D Semi-supervised Learning by Effectively Utilizing All Unlabelled Data
Sneha Paul, Zachary Patterson, Nizar Bouguila
Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling
Zixiao Wang, Hongtao Xie, YuXin Wang et al.
CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs
Akshat Ramachandran, Souvik Kundu, Tushar Krishna
A Riemannian Approach for Spatiotemporal Analysis and Generation of 4D Tree-shaped Structures
Tahmina Khanam, Mohammed Bennamoun, Guan Wang et al.
ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation
Mengcheng Lan, Chaofeng Chen, Yiping Ke et al.
Unsupervised Variational Translator for Bridging Image Restoration and High-Level Vision Tasks
Jiawei Wu, Zhi Jin
Fast Point Cloud Geometry Compression with Context-based Residual Coding and INR-based Refinement
Hao Xu, Xi Zhang, Xiaolin Wu
Scene-Conditional 3D Object Stylization and Composition
Jinghao Zhou, Tomas Jakab, Philip Torr et al.
Contextual Correspondence Matters: Bidirectional Graph Matching for Video Summarization
Yunzuo Zhang, Yameng Liu
Easing 3D Pattern Reasoning with Side-view Features for Semantic Scene Completion
Linxi Huan, Mingyue Dong, Linwei Yue et al.
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression
Animesh Sinha, Bo Sun, Anmol Kalia et al.
High-Quality Mesh Blendshape Generation from Face Videos via Neural Inverse Rendering
Xin Ming, Jiawei Li, Jingwang Ling et al.
InfoNorm: Mutual Information Shaping of Normals for Sparse-View Reconstruction
Xulong Wang, Siyan Dong, Youyi Zheng et al.
DreamReward: Aligning Human Preference in Text-to-3D Generation
Junliang Ye, Fangfu Liu, Qixiu Li et al.
Towards Image Ambient Lighting Normalization
Florin-Alexandru Vasluianu, Tim Seizinger, Zongwei Wu et al.
FedHide: Federated Learning by Hiding in the Neighbors
Hyunsin Park, Sungrack Yun
Towards Open-World Object-based Anomaly Detection via Self-Supervised Outlier Synthesis
Brian Isaac Medina, Yona Falinie Abdul Gaus, Neelanjan Bhowmik et al.
LPViT: Low-Power Semi-structured Pruning for Vision Transformers
Kaixin Xu, Zhe Wang, Chunyun Chen et al.
Weighted Ensemble Models Are Strong Continual Learners
Imad Eddine Marouf, Subhankar Roy, Enzo Tartaglione et al.
GGRt: Towards Generalizable 3D Gaussians without Pose Priors in Real-Time
Hao Li, Yuanyuan Gao, Dingwen Zhang et al.
Chains of Diffusion Models
Yanheng Wei, Lianghua Huang, Zhi-Fan Wu et al.
Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance
Donghoon Ahn, Hyoungwon Cho, Jaewon Min et al.
Close, But Not There: Boosting Geographic Distance Sensitivity in Visual Place Recognition
Sergio Izquierdo, Javier Civera
TAG: Text Prompt Augmentation for Zero-Shot Out-of-Distribution Detection
Xixi Liu, Christopher Zach
Continual Learning and Unknown Object Discovery in 3D Scenes via Self-Distillation
Mohamed El Amine Boudjoghra, Jean Lahoud, Salman Khan et al.
Lost and Found: Overcoming Detector Failures in Online Multi-Object Tracking
Lorenzo Vaquero, Yihong Xu, Xavier Alameda-Pineda et al.
How Far Can a 1-Pixel Camera Go? Solving Vision Tasks using Photoreceptors and Computationally Designed Visual Morphology
Andrei Atanov, Rishubh Singh, Jiawei Fu et al.
Beyond Pixels: Semi-Supervised Semantic Segmentation with a Multi-scale Patch-based Multi-Label Classifier
Prantik Howlader, Srijan Das, Hieu Le et al.
Spiking Wavelet Transformer
Yuetong Fang, Ziqing Wang, Lingfeng Zhang et al.
WAVE: Warping DDIM Inversion Features for Zero-shot Text-to-Video Editing
Yutang Feng, Sicheng Gao, Yuxiang Bao et al.
Few-shot Defect Image Generation based on Consistency Modeling
Qingfeng Shi, Jing Wei, Fei Shen et al.
AnimateMe: 4D Facial Expressions via Diffusion Models
Dimitrios Gerogiannis, Foivos Paraperas Papantoniou, Rolandos Alexandros Potamias et al.
iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning
Tom Fischer, Yaoyao Liu, Artur Jesslen et al.
Pose Guided Fine-Grained Sign Language Video Generation
Tongkai Shi, Lianyu Hu, Fanhua Shang et al.
Optimization-based Uncertainty Attribution Via Learning Informative Perturbations
Hanjing Wang, Bashirul Azam Biswas, Qiang Ji
Object-Aware Query Perturbation for Cross-Modal Image-Text Retrieval
Naoya Sogi, Takashi Shibata, Makoto Terao
GRiT: A Generative Region-to-text Transformer for Object Understanding
Jialian Wu, Jianfeng Wang, Zhengyuan Yang et al.
LRSLAM: Low-rank Representation of Signed Distance Fields in Dense Visual SLAM System
Hongbeen Park, Minjeong Park, Giljoo Nam et al.
BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling
Cheng Peng, Yutao Tang, Yifan Zhou et al.
DPA-Net: Structured 3D Abstraction from Sparse Views via Differentiable Primitive Assembly
Fenggen Yu, Yiming Qian, Xu Zhang et al.
Reinforcement Learning via Auxiliary Task Distillation
Abhinav Narayan Harish, Larry Heck, Josiah P Hanna et al.
Plug and Play: A Representation Enhanced Domain Adapter for Collaborative Perception
Tianyou Luo, Quan Yuan, Yuchen Xia et al.
Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models
Yuchen Yang, Kwonjoon Lee, Behzad Dariush et al.
Improving Hyperbolic Representations via Gromov-Wasserstein Regularization
Yifei Yang, Wonjun Lee, Dongmian Zou et al.
Depth-Aware Blind Image Decomposition for Real-World Adverse Weather Recovery
Chao Wang, Zhedong Zheng, Ruijie Quan et al.
DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation
Jeongsol Kim, Geon Yeong Park, Jong Chul Ye
Kinetic Typography Diffusion Model
Seonmi Park, Inhwan Bae, Seunghyun Shin et al.
Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning
Amandeep Kumar, Muhammad Awais, Sanath Narayan et al.
Unsupervised Representation Learning by Balanced Self Attention Matching
Daniel Shalam, Simon Korman
Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation
Fangfu Liu, Hanyang Wang, Weiliang Chen et al.
SceneTeller: Language-to-3D Scene Generation
Basak Melis Ocal, Maxim Tatarchenko, Sezer Karaoglu et al.
MagMax: Leveraging Model Merging for Seamless Continual Learning
Daniel Marczak, Bartlomiej Twardowski, Tomasz Trzcinski et al.
Spline-based Transformers
Prashanth Chandran, Agon Serifi, Markus Gross et al.
Efficient NeRF Optimization - Not All Samples Remain Equally Hard
Juuso Korhonen, Goutham Rangu, Hamed Rezazadegan Tavakoli et al.
Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models
Taesup Kim, Donggeun Kim
Fast Diffusion-Based Counterfactuals for Shortcut Removal and Generation
Nina Weng, Paraskevas Pegios, Eike Petersen et al.
GroCo: Ground Constraint for Metric Self-Supervised Monocular Depth
Aurélien Cecille, Stefan Duffner, Franck Davoine et al.
Tight and Efficient Upper Bound on Spectral Norm of Convolutional Layers
Ekaterina Grishina, Mikhail Gorbunov, Maxim Rakhuba
Deciphering the Role of Representation Disentanglement: Investigating Compositional Generalization in CLIP Models
Reza Abbasi, Mohammad Rohban, Mahdieh Soleymani Baghshah
Towards compact reversible image representations for neural style transfer
Xiyao Liu, Siyu Yang, Jian Zhang et al.
Learning Scalable Model Soup on a Single GPU: An Efficient Subspace Training Strategy
Tao Li, Weisen Jiang, Fanghui Liu et al.
Straightforward Layer-wise Pruning for More Efficient Visual Adaptation
Ruizi Han, Jinglei Tang
Dual-Decoupling Learning and Metric-Adaptive Thresholding for Semi-Supervised Multi-Label Learning
Jiahao Xiao, Ming-Kun Xie, Heng-Bo Fan et al.
SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation
Lingchen Meng, Shiyi Lan, Hengduo Li et al.
Handling The Non-Smooth Challenge in Tensor SVD: A Multi-Objective Tensor Recovery Framework
Jingjing Zheng, Wanglong Lu, Wenzhe Wang et al.
Sur^2f: A Hybrid Representation for High-Quality and Efficient Surface Reconstruction from Multi-view Images
Zhangjin Huang, Zhihao Liang, Kui Jia
UniTalker: Scaling up Audio-Driven 3D Facial Animation through A Unified Model
Xiangyu Fan, Jiaqi Li, Zhiqian Lin et al.
PartCraft: Crafting Creative Objects by Parts
Kam Woh Ng, Xiatian Zhu, Yi-Zhe Song et al.
AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation
Sun Yanan, Yanchen Liu, Yinhao Tang et al.
Data Overfitting for On-Device Super-Resolution with Dynamic Algorithm and Compiler Co-Design
Li, Zhihao Shu, Jie Ji et al.
PRET: Planning with Directed Fidelity Trajectory for Vision and Language Navigation
Renjie Lu, Jing-Ke Meng, Weishi Zheng
Long-CLIP: Unlocking the Long-Text Capability of CLIP
Beichen Zhang, Pan Zhang, Xiaoyi Dong et al.
Learning with Counterfactual Explanations for Radiology Report Generation
Mingjie Li, Haokun Lin, Liang Qiu et al.
Pseudo-Embedding for Generalized Few-Shot Point Cloud Segmentation
Chih-Jung Tsai, Hwann-Tzong Chen, Tyng-Luh Liu
AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer
Zhuguanyu Wu, Jiaxin Chen, Hanwen Zhong et al.
Optimizing Illuminant Estimation in Dual-Exposure HDR Imaging
Mahmoud Afifi, Zhenhua Hu, Liang Liang
Chat-Edit-3D: Interactive 3D Scene Editing via Text Prompts
Shuangkang Fang, Yufeng Wang, Yi-Hsuan Tsai et al.
HybridBooth: Hybrid Prompt Inversion for Efficient Subject-Driven Generation
Shanyan Guan, Yanhao Ge, Ying Tai et al.
On the Viability of Monocular Depth Pre-training for Semantic Segmentation
Dong Lao, Fengyu Yang, Daniel Wang et al.
Weakly-supervised Camera Localization by Ground-to-satellite Image Registration
Yujiao Shi, Hongdong Li, Akhil Perincherry et al.
ProtoComp: Diverse Point Cloud Completion with Controllable Prototype
Xumin Yu, Yanbo Wang, Jie Zhou et al.