Most Cited CVPR "manufacturing applications" Papers
One Prompt Word is Enough to Boost Adversarial Robustness for Pre-trained Vision-Language Models
Lin Li, Haoyan Guan, Jianing Qiu et al.
Accurate Spatial Gene Expression Prediction by Integrating Multi-Resolution Features
Youngmin Chung, Ji Hun Ha, Kyeong Chan Im et al.
Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation
Zhipeng Du, Miaojing Shi, Jiankang Deng
Posterior Distillation Sampling
Juil Koo, Chanho Park, Minhyuk Sung
4D-DRESS: A 4D Dataset of Real-World Human Clothing With Semantic Annotations
Wenbo Wang, Hsuan-I Ho, Chen Guo et al.
MET3R: Measuring Multi-View Consistency in Generated Images
Mohammad Asim, Christopher Wewer, Thomas Wimmer et al.
Fair Federated Learning under Domain Skew with Local Consistency and Domain Diversity
Yuhang Chen, Wenke Huang, Mang Ye
Efficient Multi-scale Network with Learnable Discrete Wavelet Transform for Blind Motion Deblurring
Xin Gao, Tianheng Qiu, Xinyu Zhang et al.
Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision
Yi Yu, Xue Yang, Qingyun Li et al.
Neural Sign Actors: A Diffusion Model for 3D Sign Language Production from Text
Vasileios Baltatzis, Rolandos Alexandros Potamias, Evangelos Ververas et al.
SeeGround: See and Ground for Zero-Shot Open-Vocabulary 3D Visual Grounding
Rong Li, Shijie Li, Lingdong Kong et al.
DAVE - A Detect-and-Verify Paradigm for Low-Shot Counting
Jer Pelhan, Alan Lukezic, Vitjan Zavrtanik et al.
Error Detection in Egocentric Procedural Task Videos
Shih-Po Lee, Zijia Lu, Zekun Zhang et al.
LEAD: Learning Decomposition for Source-free Universal Domain Adaptation
Sanqing Qu, Tianpei Zou, Lianghua He et al.
DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
Dahyun Kang, Piotr Bojanowski, Huy V. Vo et al.
GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control
Mariam Hassan, Sebastian Stapf, Ahmad Rahimi et al.
Learning the 3D Fauna of the Web
Zizhang Li, Dor Litvak, Ruining Li et al.
HouseCat6D - A Large-Scale Multi-Modal Category Level 6D Object Perception Dataset with Household Objects in Realistic Scenarios
HyunJun Jung, Shun-Cheng Wu, Patrick Ruhkamp et al.
Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion
Kiran Chhatre, Radek Danecek, Nikos Athanasiou et al.
UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All
Yuanhuiyi Lyu, Xu Zheng, Jiazhou Zhou et al.
CAGE: Controllable Articulation GEneration
Jiayi Liu, Hou In Ivan Tam, Ali Mahdavi Amiri et al.
Universal Actions for Enhanced Embodied Foundation Models
Jinliang Zheng, Jianxiong Li, Dongxiu Liu et al.
SemCity: Semantic Scene Generation with Triplane Diffusion
Jumin Lee, Sebin Lee, Changho Jo et al.
AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation
Haonan Wang, Qixiang Zhang, Yi Li et al.
Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning
Yiwen Ye, Yutong Xie, Jianpeng Zhang et al.
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages
Ashmal Vayani, Dinura Dissanayake, Hasindri Watawana et al.
Exploiting Diffusion Prior for Generalizable Dense Prediction
Hsin-Ying Lee, Hung-Yu Tseng, Hsin-Ying Lee et al.
Towards Efficient Replay in Federated Incremental Learning
Yichen Li, Qunwei Li, Haozhao Wang et al.
Generative Proxemics: A Prior for 3D Social Interaction from Images
Lea Müller, Vickie Ye, Georgios Pavlakos et al.
Vision-and-Language Navigation via Causal Learning
Liuyi Wang, Zongtao He, Ronghao Dang et al.
FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization
Shuai Tan, Bin Ji, Ye Pan
DiffAvatar: Simulation-Ready Garment Optimization with Differentiable Simulation
Yifei Li, Hsiaoyu Chen, Egor Larionov et al.
Test-Time Domain Generalization for Face Anti-Spoofing
Qianyu Zhou, Ke-Yue Zhang, Taiping Yao et al.
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
Yuqian Yuan, Hang Zhang, Wentong Li et al.
Linguistic-Aware Patch Slimming Framework for Fine-grained Cross-Modal Alignment
Zheren Fu, Lei Zhang, Hou Xia et al.
Investigating and Mitigating the Side Effects of Noisy Views for Self-Supervised Clustering Algorithms in Practical Multi-View Scenarios
Jie Xu, Yazhou Ren, Xiaolong Wang et al.
Towards Robust Event-guided Low-Light Image Enhancement: A Large-Scale Real-World Event-Image Dataset and Novel Approach
Guoqiang Liang, Kanghao Chen, Hangyu Li et al.
ID-like Prompt Learning for Few-Shot Out-of-Distribution Detection
Yichen Bai, Zongbo Han, Bing Cao et al.
A Distractor-Aware Memory for Visual Object Tracking with SAM2
Alan Lukezic, Jovana Videnović, Matej Kristan
HD-EPIC: A Highly-Detailed Egocentric Video Dataset
Toby Perrett, Ahmad Darkhalil, Saptarshi Sinha et al.
Don't Shake the Wheel: Momentum-Aware Planning in End-to-End Autonomous Driving
Ziying Song, Caiyan Jia, Lin Liu et al.
Generalizing Deepfake Video Detection with Plug-and-Play: Video-Level Blending and Spatiotemporal Adapter Tuning
Zhiyuan Yan, Yandan Zhao, Shen Chen et al.
Prompt Learning via Meta-Regularization
Jinyoung Park, Juyeon Ko, Hyunwoo J. Kim
SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors
Dave Zhenyu Chen, Haoxuan Li, Hsin-Ying Lee et al.
Multi-subject Open-set Personalization in Video Generation
Tsai-Shien Chen, Aliaksandr Siarohin, Willi Menapace et al.
AA-CLIP: Enhancing Zero-Shot Anomaly Detection via Anomaly-Aware CLIP
Wenxin Ma, Xu Zhang, Qingsong Yao et al.
Scene Adaptive Sparse Transformer for Event-based Object Detection
Yansong Peng, Li Hebei, Yueyi Zhang et al.
DEFOM-Stereo: Depth Foundation Model Based Stereo Matching
Hualie Jiang, Zhiqiang Lou, Laiyan Ding et al.
A Vision Check-up for Language Models
Pratyusha Sharma, Tamar Rott Shaham, Manel Baradad et al.
OpenHumanVid: A Large-Scale High-Quality Dataset for Enhancing Human-Centric Video Generation
Hui Li, Mingwang Xu, Qingkun Su et al.
Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On
Xu Yang, Changxing Ding, Zhibin Hong et al.
NOPE: Novel Object Pose Estimation from a Single Image
Van Nguyen Nguyen, Thibault Groueix, Georgy Ponimatkin et al.
UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes
David Rozenberszki, Or Litany, Angela Dai
PAD: Patch-Agnostic Defense against Adversarial Patch Attacks
Lihua Jing, Rui Wang, Wenqi Ren et al.
DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception
Yibo Wang, Ruiyuan Gao, Kai Chen et al.
Portrait4D: Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data
Yu Deng, Duomin Wang, Xiaohang Ren et al.
Learning Diffusion Texture Priors for Image Restoration
Tian Ye, Sixiang Chen, Wenhao Chai et al.
NoiseCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions in Diffusion Models
Yusuf Dalva, Pinar Yanardag
Gradient Reweighting: Towards Imbalanced Class-Incremental Learning
Jiangpeng He
Sonata: Self-Supervised Learning of Reliable Point Representations
Xiaoyang Wu, Daniel DeTone, Duncan Frost et al.
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi, Skanda Koppula, Shreya Pathak et al.
AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation
Qingping Sun, Yanjun Wang, Ailing Zeng et al.
3D-LLaVA: Towards Generalist 3D LMMs with Omni Superpoint Transformer
Jiajun Deng, Tianyu He, Li Jiang et al.
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
Junbo Niu, Yifei Li, Ziyang Miao et al.
ZeroShape: Regression-based Zero-shot Shape Reconstruction
Zixuan Huang, Stefan Stojanov, Anh Thai et al.
GLACE: Global Local Accelerated Coordinate Encoding
Fangjinhua Wang, Xudong Jiang, Silvano Galliani et al.
DrVideo: Document Retrieval Based Long Video Understanding
Ziyu Ma, Chenhui Gou, Hengcan Shi et al.
SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction
Yang Zhou, Hao Shao, Letian Wang et al.
DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention
Lianghui Zhu, Zilong Huang, Bencheng Liao et al.
Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior
Fangfu Liu, Diankun Wu, Yi Wei et al.
5%>100%: Breaking Performance Shackles of Full Fine-Tuning on Visual Recognition Tasks
Dongshuo Yin, Leiyi Hu, Bin Li et al.
Multi-view Aggregation Network for Dichotomous Image Segmentation
Qian Yu, Xiaoqi Zhao, Youwei Pang et al.
EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning
Hongxia Xie, Chu-Jun Peng, Yu-Wen Tseng et al.
Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving
JunDa Cheng, Wei Yin, Kaixuan Wang et al.
MIDI: Multi-Instance Diffusion for Single Image to 3D Scene Generation
Zehuan Huang, Yuanchen Guo, Xingqiao An et al.
FastVLM: Efficient Vision Encoding for Vision Language Models
Pavan Kumar Anasosalu Vasu, Fartash Faghri, Chun-Liang Li et al.
RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization
Mengqi Huang, Zhendong Mao, Mingcong Liu et al.
Video-Guided Foley Sound Generation with Multimodal Controls
Ziyang Chen, Prem Seetharaman, Bryan Russell et al.
VISTA3D: A Unified Segmentation Foundation Model For 3D Medical Imaging
Yufan He, Pengfei Guo, Yucheng Tang et al.
GaussianFormer-2: Probabilistic Gaussian Superposition for Efficient 3D Occupancy Prediction
Yuanhui Huang, Amonnut Thammatadatrakoon, Wenzhao Zheng et al.
PEM: Prototype-based Efficient MaskFormer for Image Segmentation
Niccolò Cavagnero, Gabriele Rosi, Claudia Cuttano et al.
Unbiased Faster R-CNN for Single-source Domain Generalized Object Detection
Yajing Liu, Shijun Zhou, Xiyao Liu et al.
FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models
Jinglin Xu, Yijie Guo, Yuxin Peng
Rethinking Diffusion for Text-Driven Human Motion Generation: Redundant Representations, Evaluation, and Masked Autoregression
Zichong Meng, Yiming Xie, Xiaogang Peng et al.
TSD-SR: One-Step Diffusion with Target Score Distillation for Real-World Image Super-Resolution
Linwei Dong, Qingnan Fan, Yihong Guo et al.
ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation
Moayed Haji Ali, Guha Balakrishnan, Vicente Ordonez
SimAC: A Simple Anti-Customization Method for Protecting Face Privacy against Text-to-Image Synthesis of Diffusion Models
Feifei Wang, Zhentao Tan, Tianyi Wei et al.
MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation
Hanzhe Hu, Zhizhuo Zhou, Varun Jampani et al.
Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion
Lucas Nunes, Rodrigo Marcuzzi, Benedikt Mersch et al.
Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models
Xin Li, Yunfei Wu, Xinghua Jiang et al.
HIR-Diff: Unsupervised Hyperspectral Image Restoration Via Improved Diffusion Models
Li Pang, Xiangyu Rui, Long Cui et al.
Disentangled Prompt Representation for Domain Generalization
De Cheng, Zhipeng Xu, Xinyang Jiang et al.
DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Iterative Diffusion-Based Refinement
Jiuming Liu, Guangming Wang, Weicai Ye et al.
Revisiting Single Image Reflection Removal In the Wild
Yurui Zhu, Bo Li, Xueyang Fu et al.
Question Aware Vision Transformer for Multimodal Reasoning
Roy Ganz, Yair Kittenplon, Aviad Aberdam et al.
SplatAD: Real-Time Lidar and Camera Rendering with 3D Gaussian Splatting for Autonomous Driving
Georg Hess, Carl Lindström, Maryam Fatemi et al.
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation
XuDong Wang, Ishan Misra, Ziyun Zeng et al.
Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model
Zhicai Wang, Longhui Wei, Tan Wang et al.
Vision-Language Models Do Not Understand Negation
Kumail Alhamoud, Shaden Alshammari, Yonglong Tian et al.
Speedy-Splat: Fast 3D Gaussian Splatting with Sparse Pixels and Sparse Primitives
Alex Hanson, Allen Tu, Geng Lin et al.
Distilling Semantic Priors from SAM to Efficient Image Restoration Models
Quan Zhang, Xiaoyu Liu, Wei Li et al.
IDOL: Instant Photorealistic 3D Human Creation from a Single Image
Yiyu Zhuang, Jiaxi Lv, Hao Wen et al.
Mitigating Motion Blur in Neural Radiance Fields with Events and Frames
Marco Cannici, Davide Scaramuzza
View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network
Quan Zhang, Lei Wang, Vishal M. Patel et al.
Control4D: Efficient 4D Portrait Editing with Text
Ruizhi Shao, Jingxiang Sun, Cheng Peng et al.
Amodal Completion via Progressive Mixed Context Diffusion
Katherine Xu, Lingzhi Zhang, Jianbo Shi
FedSelect: Personalized Federated Learning with Customized Selection of Parameters for Fine-Tuning
Rishub Tamirisa, Chulin Xie, Wenxuan Bao et al.
DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection
Yuhao Sun, Lingyun Yu, Hongtao Xie et al.
Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator
Chaehun Shin, Jooyoung Choi, Heeseung Kim et al.
Re-thinking Temporal Search for Long-Form Video Understanding
Jinhui Ye, Zihan Wang, Haosen Sun et al.
How to Configure Good In-Context Sequence for Visual Question Answering
Li Li, Jiawei Peng, Huiyi Chen et al.
Communication-Efficient Federated Learning with Accelerated Client Gradient
Geeho Kim, Jinkyu Kim, Bohyung Han
Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark
Ziyang Chen, Israel D. Gebru, Christian Richardt et al.
Producing and Leveraging Online Map Uncertainty in Trajectory Prediction
Xunjiang Gu, Guanyu Song, Igor Gilitschenski et al.
GenZI: Zero-Shot 3D Human-Scene Interaction Generation
Lei Li, Angela Dai
VicTR: Video-conditioned Text Representations for Activity Recognition
Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani et al.
Score-Guided Diffusion for 3D Human Recovery
Anastasis Stathopoulos, Ligong Han, Dimitris N. Metaxas
Interactive Continual Learning: Fast and Slow Thinking
Biqing Qi, Xinquan Chen, Junqi Gao et al.
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
Hao Li, Changyao Tian, Jie Shao et al.
ICP-Flow: LiDAR Scene Flow Estimation with ICP
Yancong Lin, Holger Caesar
Alchemist: Parametric Control of Material Properties with Diffusion Models
Prafull Sharma, Varun Jampani, Yuanzhen Li et al.
Collaborating Foundation Models for Domain Generalized Semantic Segmentation
Yasser Benigmim, Subhankar Roy, Slim Essid et al.
Towards General Visual-Linguistic Face Forgery Detection
Ke Sun, Shen Chen, Taiping Yao et al.
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
Yunhao Ge, Xiaohui Zeng, Jacob Huffman et al.
RobustSAM: Segment Anything Robustly on Degraded Images
Wei-Ting Chen, Yu Jiet Vong, Sy-Yen Kuo et al.
Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance
Dazhong Shen, Guanglu Song, Zeyue Xue et al.
Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture
Fei Wang, Dan Guo, Kun Li et al.
UnO: Unsupervised Occupancy Fields for Perception and Forecasting
Ben Agro, Quinlan Sykora, Sergio Casas et al.
Pow3R: Empowering Unconstrained 3D Reconstruction with Camera and Scene Priors
Wonbong Jang, Philippe Weinzaepfel, Vincent Leroy et al.
MNE-SLAM: Multi-Agent Neural SLAM for Mobile Robots
Tianchen Deng, Guole Shen, Chen Xun et al.
Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models
Shweta Mahajan, Tanzila Rahman, Kwang Moo Yi et al.
StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models
Yunzhi Yan, Zhen Xu, Haotong Lin et al.
LLMDet: Learning Strong Open-Vocabulary Object Detectors under the Supervision of Large Language Models
Shenghao Fu, Qize Yang, Qijie Mo et al.
High-fidelity Person-centric Subject-to-Image Synthesis
Yibin Wang, Weizhong Zhang, Jianwei Zheng et al.
MonoNPHM: Dynamic Head Reconstruction from Monocular Videos
Simon Giebenhain, Tobias Kirschstein, Markos Georgopoulos et al.
Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning
Shiming Chen, Wenjin Hou, Salman Khan et al.
ParaHome: Parameterizing Everyday Home Activities Towards 3D Generative Modeling of Human-Object Interactions
Jeonghwan Kim, Jisoo Kim, Jeonghyeon Na et al.
AnySat: One Earth Observation Model for Many Resolutions, Scales, and Modalities
Guillaume Astruc, Nicolas Gonthier, Clement Mallet et al.
One Diffusion to Generate Them All
Duong H. Le, Tuan Pham, Sangho Lee et al.
ADA-Track: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association
Shuxiao Ding, Lukas Schneider, Marius Cordts et al.
CoralSCOP: Segment any COral Image on this Planet
Zheng Ziqiang, Liang Haixin, Binh-Son Hua et al.
Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation
Yuanhong Chen, Yuyuan Liu, Hu Wang et al.
GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction
Xiao Chen, Quanyi Li, Tai Wang et al.
ExtDM: Distribution Extrapolation Diffusion Model for Video Prediction
Zhicheng Zhang, Junyao Hu, Wentao Cheng et al.
Neural Redshift: Random Networks are not Random Functions
Damien Teney, Armand Nicolicioiu, Valentin Hartmann et al.
Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models
Peifei Zhu, Tsubasa Takahashi, Hirokatsu Kataoka
Active Generalized Category Discovery
Shijie Ma, Fei Zhu, Zhun Zhong et al.
Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention
Xingyu Zhou, Leheng Zhang, Xiaorui Zhao et al.
LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant
Wei Li, Bing Hu, Rui Shao et al.
CoGS: Controllable Gaussian Splatting
Heng Yu, Joel Julin, Zoltán Á. Milacski et al.
SPAR3D: Stable Point-Aware Reconstruction of 3D Objects from Single Images
Zixuan Huang, Mark Boss, Aaryaman Vasishta et al.
Simple Semantic-Aided Few-Shot Learning
Hai Zhang, Junzhe Xu, Shanlin Jiang et al.
Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
Ailin Deng, Tri Cao, Zhirui Chen et al.
Learning Object State Changes in Videos: An Open-World Perspective
Zihui Xue, Kumar Ashutosh, Kristen Grauman
LiDAR4D: Dynamic Neural Fields for Novel Space-time View LiDAR Synthesis
Zehan Zheng, Fan Lu, Weiyi Xue et al.
PartGen: Part-level 3D Generation and Reconstruction with Multi-view Diffusion Models
Minghao Chen, Roman Shapovalov, Iro Laina et al.
AutoAD III: The Prequel – Back to the Pixels
Tengda Han, Max Bain, Arsha Nagrani et al.
Rethinking Generalizable Face Anti-spoofing via Hierarchical Prototype-guided Distribution Refinement in Hyperbolic Space
Chengyang Hu, Ke-Yue Zhang, Taiping Yao et al.
MAS: Multi-view Ancestral Sampling for 3D Motion Generation Using 2D Diffusion
Roy Kapon, Guy Tevet, Daniel Cohen-Or et al.
MonSter: Marry Monodepth to Stereo Unleashes Power
JunDa Cheng, Longliang Liu, Gangwei Xu et al.
Towards Generalizable Multi-Object Tracking
Zheng Qin, Le Wang, Sanping Zhou et al.
Towards a Simultaneous and Granular Identity-Expression Control in Personalized Face Generation
Renshuai Liu, Bowen Ma, Wei Zhang et al.
Diffusion Renderer: Neural Inverse and Forward Rendering with Video Diffusion Models
Ruofan Liang, Žan Gojčič, Huan Ling et al.
GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs
Mustafa Munir, William Avery, Md Mostafijur Rahman et al.
REACTO: Reconstructing Articulated Objects from a Single Video
Chaoyue Song, Jiacheng Wei, Chuan-Sheng Foo et al.
Physical Property Understanding from Language-Embedded Feature Fields
Albert J. Zhai, Yuan Shen, Emily Y. Chen et al.
Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction
Hao Li, Ying Chen, Yifei Chen et al.
3D-HGS: 3D Half-Gaussian Splatting
Haolin Li, Jinyang Liu, Mario Sznaier et al.
Stereo Anywhere: Robust Zero-Shot Deep Stereo Matching Even Where Either Stereo or Mono Fail
Luca Bartolomei, Fabio Tosi, Matteo Poggi et al.
HarmonyView: Harmonizing Consistency and Diversity in One-Image-to-3D
Sangmin Woo, Byeongjun Park, Hyojun Go et al.
Generative Gaussian Splatting for Unbounded 3D City Generation
Haozhe Xie, Zhaoxi Chen, Fangzhou Hong et al.
Three Pillars Improving Vision Foundation Model Distillation for Lidar
Gilles Puy, Spyros Gidaris, Alexandre Boulch et al.
OpenStreetView-5M: The Many Roads to Global Visual Geolocation
Guillaume Astruc, Nicolas Dufour, Ioannis Siglidis et al.
Transductive Zero-Shot and Few-Shot CLIP
Ségolène Martin, Yunshi Huang, Fereshteh Shakeri et al.
AZ-NAS: Assembling Zero-Cost Proxies for Network Architecture Search
Junghyup Lee, Bumsub Ham
Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM
Pingping Zhang, Tianyu Yan, Yang Liu et al.
LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos
Tiantian Geng, Jinrui Zhang, Qingni Wang et al.
A Dynamic Kernel Prior Model for Unsupervised Blind Image Super-Resolution
Zhixiong Yang, Jingyuan Xia, Shengxi Li et al.
Inversion-Free Image Editing with Language-Guided Diffusion Models
Sihan Xu, Yidong Huang, Jiayi Pan et al.
Make It Count: Text-to-Image Generation with an Accurate Number of Objects
Lital Binyamin, Yoad Tewel, Hilit Segev et al.
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
Hao Chen, Ze Wang, Xiang Li et al.
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian, Zhaoyang Liu, Ruibin Yuan et al.
How Far Can We Compress Instant-NGP-Based NeRF?
Yihang Chen, Qianyi Wu, Mehrtash Harandi et al.
Material Palette: Extraction of Materials from a Single Image
Ivan Lopes, Fabio Pizzati, Raoul de Charette
Open-World Human-Object Interaction Detection via Multi-modal Prompts
Jie Yang, Bingliang Li, Ailing Zeng et al.
ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More
Jiazhou Zhou, Xu Zheng, Yuanhuiyi Lyu et al.
PREGO: Online Mistake Detection in PRocedural EGOcentric Videos
Alessandro Flaborea, Guido M. D'Amely di Melendugno et al.
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling
Shentong Mo, Pedro Morgado
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
Seil Kang, Jinyeong Kim, Junhyeok Kim et al.
Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation
Hyeonho Jeong, Chun-Hao P. Huang, Jong Chul Ye et al.
Contextrast: Contextual Contrastive Learning for Semantic Segmentation
Changki Sung, Wanhee Kim, Jungho An et al.
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
Rui Qian, Shuangrui Ding, Xiaoyi Dong et al.
Complexity Experts are Task-Discriminative Learners for Any Image Restoration
Eduard Zamfir, Zongwei Wu, Nancy Mehta et al.
Atlantis: Enabling Underwater Depth Estimation with Stable Diffusion
Fan Zhang, Shaodi You, Yu Li et al.
SpecNeRF: Gaussian Directional Encoding for Specular Reflections
Li Ma, Vasu Agrawal, Haithem Turki et al.
ATP-LLaVA: Adaptive Token Pruning for Large Vision Language Models
Xubing Ye, Yukang Gan, Yixiao Ge et al.
FlowIE: Efficient Image Enhancement via Rectified Flow
Yixuan Zhu, Wenliang Zhao, Ao Li et al.
CAT: Exploiting Inter-Class Dynamics for Domain Adaptive Object Detection
Mikhail Kennerley, Jian-Gang Wang, Bharadwaj Veeravalli et al.
InterHandGen: Two-Hand Interaction Generation via Cascaded Reverse Diffusion
Jihyun Lee, Shunsuke Saito, Giljoo Nam et al.
Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning
Xin Zhang, Jiawei Du, Weiying Xie et al.