Most Cited 2024 Poster Papers
FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models
Lin Zhao, Tianchen Zhao, Zinan Lin et al.
COLMAP-Free 3D Gaussian Splatting
Yang Fu, Sifei Liu, Amey Kulkarni et al.
Personalized Residuals for Concept-Driven Text-to-Image Generation
Cusuh Ham, Matthew Fisher, James Hays et al.
Forecasting of 3D Whole-body Human Poses with Grasping Objects
Yan Haitao, Qiongjie Cui, Jiexin Xie et al.
Edge-Aware 3D Instance Segmentation Network with Intelligent Semantic Prior
Wonseok Roh, Hwanhee Jung, Giljoo Nam et al.
RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models
Ozgur Kara, Bariscan Kurtkaya, Hidir Yesiltepe et al.
Generalizable Novel-View Synthesis using a Stereo Camera
Haechan Lee, Wonjoon Jin, Seung-Hwan Baek et al.
Don’t Drop Your Samples! Coherence-Aware Training Benefits Conditional Diffusion
Nicolas Dufour, Victor Besnier, Vicky Kalogeiton et al.
Action-slot: Visual Action-centric Representations for Multi-label Atomic Activity Recognition in Traffic Scenes
Chi-Hsi Kung, 書緯 呂, Yi-Hsuan Tsai et al.
Diff-BGM: A Diffusion Model for Video Background Music Generation
Sizhe Li, Yiming Qin, Minghang Zheng et al.
Shadow-Enlightened Image Outpainting
Hang Yu, Ruilin Li, Shaorong Xie et al.
Specularity Factorization for Low-Light Enhancement
Saurabh Saini, P. J. Narayanan
Shallow-Deep Collaborative Learning for Unsupervised Visible-Infrared Person Re-Identification
Bin Yang, Jun Chen, Mang Ye
Attack To Defend: Exploiting Adversarial Attacks for Detecting Poisoned Models
Samar Fares, Karthik Nandakumar
LiDAR-Net: A Real-scanned 3D Point Cloud Dataset for Indoor Scenes
Yanwen Guo, Yuanqi Li, Dayong Ren et al.
Non-autoregressive Sequence-to-Sequence Vision-Language Models
Kunyu Shi, Qi Dong, Luis Goncalves et al.
No More Ambiguity in 360° Room Layout via Bi-Layout Estimation
Yu-Ju Tsai, Jin-Cheng Jhang, Jingjing Zheng et al.
HUNTER: Unsupervised Human-centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes
Yichen Yao, Zimo Jiang, Yujing Sun et al.
3D LiDAR Mapping in Dynamic Environments using a 4D Implicit Neural Representation
Xingguang Zhong, Yue Pan, Cyrill Stachniss et al.
See, Say, and Segment: Teaching LMMs to Overcome False Premises
Tsung-Han Wu, Giscard Biamby, David Chan et al.
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
Li Hu
Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation
Xiao Lin, Wenfei Yang, Yuan Gao et al.
PostureHMR: Posture Transformation for 3D Human Mesh Recovery
Yu-Pei Song, Xiao Wu, Zhaoquan Yuan et al.
WANDR: Intention-guided Human Motion Generation
Markos Diomataris, Nikos Athanasiou, Omid Taheri et al.
WWW: A Unified Framework for Explaining What, Where and Why of Neural Networks by Interpretation of Neuron Concepts
Yong Hyun Ahn, Hyeon Bae Kim, Seong Tae Kim
SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation
Yuxuan Zhang, Yiren Song, Jiaming Liu et al.
Denoising Point Clouds in Latent Space via Graph Convolution and Invertible Neural Network
Aihua Mao, Biao Yan, Zijing Ma et al.
SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation
Thuan Nguyen, Anh Tran
Looking 3D: Anomaly Detection with 2D-3D Alignment
Ankan Kumar Bhunia, Changjian Li, Hakan Bilen
EventPS: Real-Time Photometric Stereo Using an Event Camera
Bohan Yu, Jieji Ren, Jin Han et al.
Circuit Design and Efficient Simulation of Quantum Inner Product and Empirical Studies of Its Effect on Near-Term Hybrid Quantum-Classic Machine Learning
Hao Xiong, Yehui Tang, Xinyu Ye et al.
PSDPM: Prototype-based Secondary Discriminative Pixels Mining for Weakly Supervised Semantic Segmentation
Xinqiao Zhao, Ziqian Yang, Tianhong Dai et al.
Towards 3D Vision with Low-Cost Single-Photon Cameras
Fangzhou Mu, Carter Sifferman, Sacha Jungerman et al.
On Train-Test Class Overlap and Detection for Image Retrieval
Chull Hwan Song, Jooyoung Yoon, Taebaek Hwang et al.
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jieneng Chen, Qihang Yu, Xiaohui Shen et al.
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
Yiming Zhang, Zhening Xing, Yanhong Zeng et al.
DemoCaricature: Democratising Caricature Generation with a Rough Sketch
Dar-Yen Chen, Ayan Kumar Bhunia, Subhadeep Koley et al.
Language-driven Object Fusion into Neural Radiance Fields with Pose-Conditioned Dataset Updates
Ka Chun Shum, Jaeyeon Kim, Binh-Son Hua et al.
Unlocking the Potential of Pre-trained Vision Transformers for Few-Shot Semantic Segmentation through Relationship Descriptors
Ziqin Zhou, Hai-Ming Xu, Yangyang Shu et al.
Relightful Harmonization: Lighting-aware Portrait Background Replacement
Mengwei Ren, Wei Xiong, Jae Shin Yoon et al.
Taming the Tail in Class-Conditional GANs: Knowledge Sharing via Unconditional Training at Lower Resolutions
Saeed Khorram, Mingqi Jiang, Mohamad Shahbazi et al.
MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation
Yanhui Wang, Jianmin Bao, Wenming Weng et al.
LightOctree: Lightweight 3D Spatially-Coherent Indoor Lighting Estimation
Xuecan Wang, Shibang Xiao, Xiaohui Liang
Bi-level Learning of Task-Specific Decoders for Joint Registration and One-Shot Medical Image Segmentation
Xin Fan, Xiaolin Wang, Jiaxin Gao et al.
FairCLIP: Harnessing Fairness in Vision-Language Learning
Yan Luo, Min Shi, Muhammad Osama Khan et al.
ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models
Xinyu Tian, Shu Zou, Zhaoyuan Yang et al.
MS-MANO: Enabling Hand Pose Tracking with Biomechanical Constraints
Pengfei Xie, Wenqiang Xu, Tutian Tang et al.
CGI-DM: Digital Copyright Authentication for Diffusion Models via Contrasting Gradient Inversion
Xiaoyu Wu, Yang Hua, Chumeng Liang et al.
Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation
Zhiwei Yang, Kexue Fu, Minghong Duan et al.
Telling Left from Right: Identifying Geometry-Aware Semantic Correspondence
Junyi Zhang, Charles Herrmann, Junhwa Hur et al.
MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding
Chun-Peng Chang, Shaoxiang Wang, Alain Pagani et al.
RTracker: Recoverable Tracking via PN Tree Structured Memory
Yuqing Huang, Xin Li, Zikun Zhou et al.
Efficient Solution of Point-Line Absolute Pose
Petr Hruby, Timothy Duff, Marc Pollefeys
SPIN: Simultaneous Perception, Interaction and Navigation
Shagun Uppal, Ananye Agarwal, Haoyu Xiong et al.
CAMixerSR: Only Details Need More "Attention"
Yan Wang, Yi Liu, Shijie Zhao et al.
MarkovGen: Structured Prediction for Efficient Text-to-Image Generation
Sadeep Jayasumana, Daniel Glasner, Srikumar Ramalingam et al.
Constrained Layout Generation with Factor Graphs
Mohammed Haroon Dupty, Yanfei Dong, Sicong Leng et al.
Neural Implicit Morphing of Face Images
Guilherme Schardong, Tiago Novello, Hallison Paz et al.
Kernel Adaptive Convolution for Scene Text Detection via Distance Map Prediction
Jinzhi Zheng, Heng Fan, Libo Zhang
PlatoNeRF: 3D Reconstruction in Plato's Cave via Single-View Two-Bounce Lidar
Tzofi Klinghoffer, Xiaoyu Xiang, Siddharth Somasundaram et al.
Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models
Pengze Zhang, Hubery Yin, Chen Li et al.
Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning
Siteng Huang, Biao Gong, Yutong Feng et al.
LoCoNet: Long-Short Context Network for Active Speaker Detection
Xizi Wang, Feng Cheng, Gedas Bertasius
WinSyn: A High Resolution Testbed for Synthetic Data
Tom Kelly, John Femiani, Peter Wonka
SpatialTracker: Tracking Any 2D Pixels in 3D Space
Yuxi Xiao, Qianqian Wang, Shangzhan Zhang et al.
Multimodal Representation Learning by Alternating Unimodal Adaptation
Xiaohui Zhang, Jaehong Yoon, Mohit Bansal et al.
Attention Calibration for Disentangled Text-to-Image Personalization
Yanbing Zhang, Mengping Yang, Qin Zhou et al.
Segment Every Out-of-Distribution Object
Wenjie Zhao, Jia Li, Xin Dong et al.
Open-Vocabulary Object 6D Pose Estimation
Jaime Corsetti, Davide Boscaini, Changjae Oh et al.
Quantifying Uncertainty in Motion Prediction with Variational Bayesian Mixture
Juanwu Lu, Can Cui, Yunsheng Ma et al.
SG-PGM: Partial Graph Matching Network with Semantic Geometric Fusion for 3D Scene Graph Alignment and Its Downstream Tasks
Yaxu Xie, Alain Pagani, Didier Stricker
Fast ODE-based Sampling for Diffusion Models in Around 5 Steps
Zhenyu Zhou, Defang Chen, Can Wang et al.
From-Ground-To-Objects: Coarse-to-Fine Self-supervised Monocular Depth Estimation of Dynamic Objects with Ground Contact Prior
Jaeho Moon, Juan Luis Gonzalez Bello, Byeongjun Kwon et al.
MMCert: Provable Defense against Adversarial Attacks to Multi-modal Models
Yanting Wang, Hongye Fu, Wei Zou et al.
Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization
Guopeng Li, Ming Qian, Gui-Song Xia
Hearing Anything Anywhere
Mason Wang, Ryosuke Sawata, Samuel Clarke et al.
Dr2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
Chen Zhao, Shuming Liu, Karttikeya Mangalam et al.
Video-Based Human Pose Regression via Decoupled Space-Time Aggregation
Jijie He, Wenwu Yang
Boosting Image Restoration via Priors from Pre-trained Models
Xiaogang Xu, Shu Kong, Tao Hu et al.
RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features
Geonho Bang, Kwangjin Choi, Jisong Kim et al.
EasyDrag: Efficient Point-based Manipulation on Diffusion Models
Xingzhong Hou, Boxiao Liu, Yi Zhang et al.
Learned Lossless Image Compression based on Bit Plane Slicing
Zhe Zhang, Huairui Wang, Zhenzhong Chen et al.
BEM: Balanced and Entropy-based Mix for Long-Tailed Semi-Supervised Learning
Hongwei Zheng, Linyuan Zhou, Han Li et al.
TexTile: A Differentiable Metric for Texture Tileability
Carlos Rodriguez-Pardo, Dan Casas, Elena Garces et al.
MatSynth: A Modern PBR Materials Dataset
Giuseppe Vecchio, Valentin Deschaintre
LED: A Large-scale Real-world Paired Dataset for Event Camera Denoising
Yuxing Duan
Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models
Daniel Geng, Inbum Park, Andrew Owens
Learn from View Correlation: An Anchor Enhancement Strategy for Multi-view Clustering
Suyuan Liu, Ke Liang, Zhibin Dong et al.
Passive Snapshot Coded Aperture Dual-Pixel RGB-D Imaging
Bhargav Ghanekar, Salman Siddique Khan, Pranav Sharma et al.
ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations
Maitreya Patel, Changhoon Kim, Sheng Cheng et al.
Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning
Haoyu Chen, Wenbo Li, Jinjin Gu et al.
Dual DETRs for Multi-Label Temporal Action Detection
Yuhan Zhu, Guozhen Zhang, Jing Tan et al.
GigaTraj: Predicting Long-term Trajectories of Hundreds of Pedestrians in Gigapixel Complex Scenes
Haozhe Lin, Chunyu Wei, Li He et al.
Image Sculpting: Precise Object Editing with 3D Geometry Control
Jiraphon Yenphraphai, Xichen Pan, Sainan Liu et al.
ID-Blau: Image Deblurring by Implicit Diffusion-based reBLurring AUgmentation
Jia-Hao Wu, Fu-Jen Tsai, Yan-Tsung Peng et al.
HumanNeRF-SE: A Simple yet Effective Approach to Animate HumanNeRF with Diverse Poses
Caoyuan Ma, Yu-Lun Liu, Zhixiang Wang et al.
LeftRefill: Filling Right Canvas based on Left Reference through Generalized Text-to-Image Diffusion Model
Chenjie Cao, Yunuo Cai, Qiaole Dong et al.
Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentation
Wenxuan Wang, Tongtian Yue, Yisi Zhang et al.
Ungeneralizable Examples
Jingwen Ye, Xinchao Wang
Language-only Training of Zero-shot Composed Image Retrieval
Geonmo Gu, Sanghyuk Chun, Wonjae Kim et al.
Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models
Shitian Zhao, Zhuowan Li, Yadong Lu et al.
Instruct-Imagen: Image Generation with Multi-modal Instruction
Hexiang Hu, Kelvin C.K. Chan, Yu-Chuan Su et al.
Diffeomorphic Template Registration for Atmospheric Turbulence Mitigation
Dong Lao, Congli Wang, Alex Wong et al.
Holodeck: Language Guided Generation of 3D Embodied AI Environments
Yue Yang, Fan-Yun Sun, Luca Weihs et al.
SpikingResformer: Bridging ResNet and Vision Transformer in Spiking Neural Networks
Xinyu Shi, Zecheng Hao, Zhaofei Yu
Tactile-Augmented Radiance Fields
Yiming Dou, Fengyu Yang, Yi Liu et al.
KVQ: Kwai Video Quality Assessment for Short-form Videos
Yiting Lu, Xin Li, Yajing Pei et al.
Purified and Unified Steganographic Network
GuoBiao Li, Sheng Li, Zicong Luo et al.
Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model
Dian Zheng, Xiao-Ming Wu, Shuzhou Yang et al.
L4D-Track: Language-to-4D Modeling Towards 6-DoF Tracking and Shape Reconstruction in 3D Point Cloud Stream
Jingtao Sun, Yaonan Wang, Mingtao Feng et al.
MAPSeg: Unified Unsupervised Domain Adaptation for Heterogeneous Medical Image Segmentation Based on 3D Masked Autoencoding and Pseudo-Labeling
Xuzhe Zhang, Yuhao Wu, Elsa Angelini et al.
Traffic Scene Parsing through the TSP6K Dataset
Peng-Tao Jiang, Yuqi Yang, Yang Cao et al.
DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Muyang Li, Tianle Cai, Jiaxin Cao et al.
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
Juhong Min, Shyamal Buch, Arsha Nagrani et al.
DIMAT: Decentralized Iterative Merging-And-Training for Deep Learning Models
Nastaran Saadati, Minh Pham, Nasla Saleem et al.
Text-Driven Image Editing via Learnable Regions
Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai et al.
Self-Supervised Representation Learning from Arbitrary Scenarios
Zhaowen Li, Yousong Zhu, Zhiyang Chen et al.
The Neglected Tails in Vision-Language Models
Shubham Parashar, Tian Liu, Zhiqiu Lin et al.
SODA: Bottleneck Diffusion Models for Representation Learning
Drew Hudson, Daniel Zoran, Mateusz Malinowski et al.
Enhancing the Power of OOD Detection via Sample-Aware Model Selection
Feng Xue, Zi He, Yuan Zhang et al.
Towards More Unified In-context Visual Understanding
Dianmo Sheng, Dongdong Chen, Zhentao Tan et al.
Fourier-basis Functions to Bridge Augmentation Gap: Rethinking Frequency Augmentation in Image Classification
Mei Vaish, Shunxin Wang, Nicola Strisciuglio
Molecular Data Programming: Towards Molecule Pseudo-labeling with Systematic Weak Supervision
Xin Juan, Kaixiong Zhou, Ninghao Liu et al.
Hyper-MD: Mesh Denoising with Customized Parameters Aware of Noise Intensity and Geometric Characteristics
Xingtao Wang, Hongliang Wei, Xiaopeng Fan et al.
Strong Transferable Adversarial Attacks via Ensembled Asymptotically Normal Distribution Learning
Zhengwei Fang, Rui Wang, Tao Huang et al.
An Empirical Study of Scaling Law for Scene Text Recognition
Miao Rang, Zhenni Bi, Chuanjian Liu et al.
Differentiable Neural Surface Refinement for Modeling Transparent Objects
Weijian Deng, Dylan Campbell, Chunyi Sun et al.
Tune-An-Ellipse: CLIP Has Potential to Find What You Want
Jinheng Xie, Songhe Deng, Bing Li et al.
OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos
Dongyoung Choi, Hyeonjoong Jang, Min H. Kim
Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
Kai Yang, Jian Tao, Jiafei Lyu et al.
XFeat: Accelerated Features for Lightweight Image Matching
Guilherme Potje, Felipe Cadar, André Araujo et al.
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh, Chih-Wei Wu, Iroro Orife et al.
Taming Self-Training for Open-Vocabulary Object Detection
Shiyu Zhao, Samuel Schulter, Long Zhao et al.
FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Action Segmentation
Zijia Lu, Ehsan Elhamifar
GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians
Liangxiao Hu, Hongwen Zhang, Yuxiang Zhang et al.
FlowerFormer: Empowering Neural Architecture Encoding using a Flow-aware Graph Transformer
Dongyeong Hwang, Hyunju Kim, Sunwoo Kim et al.
Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation
Guangyang Wu, Xiaohong Liu, Jun Jia et al.
ProMark: Proactive Diffusion Watermarking for Causal Attribution
Vishal Asnani, John Collomosse, Tu Bui et al.
DiffForensics: Leveraging Diffusion Prior to Image Forgery Detection and Localization
Zeqin Yu, Jiangqun Ni, Yuzhen Lin et al.
Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling
Liwen Wu, Sai Bi, Zexiang Xu et al.
SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection
Gang Zhang, Chen Junnan, Guohuan Gao et al.
Sheared Backpropagation for Fine-tuning Foundation Models
Zhiyuan Yu, Li Shen, Liang Ding et al.
On the Content Bias in Fréchet Video Distance
Songwei Ge, Aniruddha Mahapatra, Gaurav Parmar et al.
Multiview Aerial Visual RECognition (MAVREC): Can Multi-view Improve Aerial Visual Perception?
Aritra Dutta, Srijan Das, Jacob Nielsen et al.
CPGA: Coding Priors-Guided Aggregation Network for Compressed Video Quality Enhancement
Qiang Zhu, Jinhua Hao, Yukang Ding et al.
Video2Game: Real-time, Interactive, Realistic and Browser-Compatible Environment from a Single Video
Hongchi Xia, Chih-Hao Lin, Wei-Chiu Ma et al.
Identifying Important Group of Pixels using Interactions
Kosuke Sumiyasu, Kazuhiko Kawamoto, Hiroshi Kera
Are Conventional SNNs Really Efficient? A Perspective from Network Quantization
Guobin Shen, Dongcheng Zhao, Tenglong Li et al.
CapHuman: Capture Your Moments in Parallel Universes
Chao Liang, Fan Ma, Linchao Zhu et al.
Self-Discovering Interpretable Diffusion Latent Directions for Responsible Text-to-Image Generation
Hang Li, Chengzhi Shen, Philip H.S. Torr et al.
Towards Automatic Power Battery Detection: New Challenge, Benchmark Dataset and Baseline
Xiaoqi Zhao, Youwei Pang, Zhenyu Chen et al.
Infrared Small Target Detection with Scale and Location Sensitivity
Qiankun Liu, Rui Liu, Bolun Zheng et al.
SleepVST: Sleep Staging from Near-Infrared Video Signals using Pre-Trained Transformers
Jonathan F. Carter, Joao Jorge, Oliver Gibson et al.
Empowering Resampling Operation for Ultra-High-Definition Image Enhancement with Model-Aware Guidance
Yu, Jie Huang, Li et al.
From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models
Rongjie Li, Songyang Zhang, Dahua Lin et al.
OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion
Xinyu Zhan, Lixin Yang, Yifei Zhao et al.
Visual Point Cloud Forecasting enables Scalable Autonomous Driving
Zetong Yang, Li Chen, Yanan Sun et al.
HumanNorm: Learning Normal Diffusion Model for High-quality and Realistic 3D Human Generation
Xin Huang, Ruizhi Shao, Qi Zhang et al.
HOLD: Category-agnostic 3D Reconstruction of Interacting Hands and Objects from Video
Zicong Fan, Maria Parelli, Maria Kadoglou et al.
AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving
Mingfu Liang, Jong-Chyi Su, Samuel Schulter et al.
Attentive Illumination Decomposition Model for Multi-Illuminant White Balancing
Dongyoung Kim, Jinwoo Kim, Junsang Yu et al.
OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies
Lingdong Kong, Youquan Liu, Lai Xing Ng et al.
Z*: Zero-shot Style Transfer via Attention Reweighting
Yingying Deng, Xiangyu He, Fan Tang et al.
G-HOP: Generative Hand-Object Prior for Interaction Reconstruction and Grasp Synthesis
Yufei Ye, Abhinav Gupta, Kris Kitani et al.
ConCon-Chi: Concept-Context Chimera Benchmark for Personalized Vision-Language Tasks
Andrea Rosasco, Stefano Berti, Giulia Pasquale et al.
Habitat Synthetic Scenes Dataset (HSSD-200): An Analysis of 3D Scene Scale and Realism Tradeoffs for ObjectGoal Navigation
Mukul Khanna, Yongsen Mao, Hanxiao Jiang et al.
KeyPoint Relative Position Encoding for Face Recognition
Minchul Kim, Feng Liu, Yiyang Su et al.
QDFormer: Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition
Xiang Li, Jinglu Wang, Xiaohao Xu et al.
From a Bird's Eye View to See: Joint Camera and Subject Registration without the Camera Calibration
Zekun Qian, Ruize Han, Wei Feng et al.
Joint2Human: High-Quality 3D Human Generation via Compact Spherical Embedding of 3D Joints
Muxin Zhang, Qiao Feng, Zhuo Su et al.
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
Feng Liang, Bichen Wu, Jialiang Wang et al.
Accept the Modality Gap: An Exploration in the Hyperbolic Space
Sameera Ramasinghe, Violetta Shevchenko, Gil Avraham et al.
Random Entangled Tokens for Adversarially Robust Vision Transformer
Huihui Gong, Minjing Dong, Siqi Ma et al.
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
Tongjia Chen, Hongshan Yu, Zhengeng Yang et al.
Continuous Pose for Monocular Cameras in Neural Implicit Representation
Qi Ma, Danda Paudel, Ajad Chhatkuli et al.
BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning
Siyuan Liang, Mingli Zhu, Aishan Liu et al.
A Pedestrian is Worth One Prompt: Towards Language Guidance Person Re-Identification
Zexian Yang, Dayan Wu, Chenming Wu et al.
From SAM to CAMs: Exploring Segment Anything Model for Weakly Supervised Semantic Segmentation
Hyeokjun Kweon, Kuk-Jin Yoon
Vlogger: Make Your Dream A Vlog
Shaobin Zhuang, Kunchang Li, Xinyuan Chen et al.
EpiDiff: Enhancing Multi-View Synthesis via Localized Epipolar-Constrained Diffusion
Zehuan Huang, Hao Wen, Junting Dong et al.
IPoD: Implicit Field Learning with Point Diffusion for Generalizable 3D Object Reconstruction from Single RGB-D Images
Yushuang Wu, Luyue Shi, Junhao Cai et al.
SVGDreamer: Text Guided SVG Generation with Diffusion Model
XiMing Xing, Chuang Wang, Haitao Zhou et al.
Dual Prototype Attention for Unsupervised Video Object Segmentation
Suhwan Cho, Minhyeok Lee, Seunghoon Lee et al.
Contrastive Mean-Shift Learning for Generalized Category Discovery
Sua Choi, Dahyun Kang, Minsu Cho
Panacea: Panoramic and Controllable Video Generation for Autonomous Driving
Yuqing Wen, Yucheng Zhao, Yingfei Liu et al.
Flexible Biometrics Recognition: Bridging the Multimodality Gap through Attention Alignment and Prompt Tuning
Leslie Ching Ow Tiong, Dick Sigmund, Chen-Hui Chan et al.
ALGM: Adaptive Local-then-Global Token Merging for Efficient Semantic Segmentation with Plain Vision Transformers
Narges Norouzi, Svetlana Orlova, Daan de Geus et al.
UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion
Junsheng Zhou, Weiqi Zhang, Baorui Ma et al.
UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity
Jialong Zuo, Hanyu Zhou, Ying Nie et al.
Test-Time Zero-Shot Temporal Action Localization
Benedetta Liberatori, Alessandro Conti, Paolo Rota et al.
Towards Text-guided 3D Scene Composition
Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin et al.
Instance-aware Exploration-Verification-Exploitation for Instance ImageGoal Navigation
Xiaohan Lei, Min Wang, Wengang Zhou et al.
Learning SO(3)-Invariant Semantic Correspondence via Local Shape Transform
Chunghyun Park, Seungwook Kim, Jaesik Park et al.
Taming Mode Collapse in Score Distillation for Text-to-3D Generation
Peihao Wang, Dejia Xu, Zhiwen Fan et al.
SnAG: Scalable and Accurate Video Grounding
Fangzhou Mu, Sicheng Mo, Yin Li
GLaMM: Pixel Grounding Large Multimodal Model
Hanoona Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly et al.
ManiFPT: Defining and Analyzing Fingerprints of Generative Models
Hae Jin Song, Mahyar Khayatkhoei, Wael AbdAlmageed
CORE-MPI: Consistency Object Removal with Embedding MultiPlane Image
Donggeun Yoon, Donghyeon Cho
ScoreHypo: Probabilistic Human Mesh Estimation with Hypothesis Scoring
Yuan Xu, Xiaoxuan Ma, Jiajun Su et al.
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models
Nataniel Ruiz, Yuanzhen Li, Varun Jampani et al.