Most Cited ECCV Spotlight "pathology severity control" Papers
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Shilong Liu, Zhaoyang Zeng, Tianhe Ren et al.
YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information
Chien-Yao Wang, I-Hau Yeh, Hong-Yuan Mark Liao
Adversarial Diffusion Distillation
Axel Sauer, Dominik Lorenz, Andreas Blattmann et al.
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
Jiaxiang Tang, Zhaoxi Chen, Xiaokang Chen et al.
Grounding Image Matching in 3D with MASt3R
Vincent Leroy, Yohann Cabon, Jerome Revaud
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems?
Renrui Zhang, Dongzhi Jiang, Yichi Zhang et al.
CoTracker: It is Better to Track Together
Nikita Karaev, Ignacio Rocco, Ben Graham et al.
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
Nanye Ma, Mark Goldstein, Michael Albergo et al.
MobileNetV4: Universal Models for the Mobile Ecosystem
Danfeng Qin, Chas Leichner, Manolis Delakis et al.
VideoMamba: State Space Model for Efficient Video Understanding
Kunchang Li, Xinhao Li, Yi Wang et al.
MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
Yuedong Chen, Haofei Xu, Chuanxia Zheng et al.
Evaluating Text-to-Visual Generation with Image-to-Text Generation
Zhiqiu Lin, Deepak Pathak, Baiqi Li et al.
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Liang Chen, Haozhe Zhao, Tianyu Liu et al.
SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion
Vikram Voleti, Chun-Han Yao, Mark Boss et al.
BLINK: Multimodal Large Language Models Can See but Not Perceive
Xingyu Fu, Yushi Hu, Bangzheng Li et al.
FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting
Zehao Zhu, Zhiwen Fan, Yifan Jiang et al.
PointLLM: Empowering Large Language Models to Understand Point Clouds
Runsen Xu, Xiaolong Wang, Tai Wang et al.
DiffBIR: Toward Blind Image Restoration with Generative Diffusion Prior
Xinqi Lin, Jingwen He, Ziyan Chen et al.
Photorealistic Video Generation with Diffusion Models
Agrim Gupta, Lijun Yu, Kihyuk Sohn et al.
GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation
Yinghao Xu, Zifan Shi, Wang Yifan et al.
GS-LRM: Large Reconstruction Model for 3D Gaussian Splatting
Kai Zhang, Sai Bi, Hao Tan et al.
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier et al.
Pixel-Aware Stable Diffusion for Realistic Image Super-Resolution and Personalized Stylization
Tao Yang, Rongyuan Wu, Peiran Ren et al.
DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving
Xiaofeng Wang, Zheng Zhu, Guan Huang et al.
Champ: Controllable and Consistent Human Image Animation with 3D Parametric Guidance
Shenhao Zhu, Junming Chen, Zuozhuo Dai et al.
Segment and Recognize Anything at Any Granularity
Feng Li, Hao Zhang, Peize Sun et al.
PixArt-Sigma: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
Junsong Chen, Chongjian Ge, Enze Xie et al.
GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image
Xiao Fu, Wei Yin, Mu Hu et al.
EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
Linrui Tian, Qi Wang, Bang Zhang et al.
InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
Yi Wang, Kunchang Li, Xinhao Li et al.
CRM: Single Image to 3D Textured Mesh with Convolutional Reconstruction Model
Zhengyi Wang, Yikai Wang, Yifei Chen et al.
Agent Attention: On the Integration of Softmax and Linear Attention
Dongchen Han, Tianzhu Ye, Yizeng Han et al.
ZigMa: A DiT-style Zigzag Mamba Diffusion Model
Tao Hu, Stefan Andreas Baumann, Ming Gui et al.
MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
Xin Liu, Yichen Zhu, Jindong Gu et al.
CityGaussian: Real-time High-quality Large-Scale Scene Rendering with Gaussians
Yang Liu, Chuanchen Luo, Lue Fan et al.
HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression
Yihang Chen, Qianyi Wu, Weiyao Lin et al.
Mini-Splatting: Representing Scenes with a Constrained Number of Gaussians
Guangchi Fang, Bing Wang
SparseCtrl: Adding Sparse Controls to Text-to-Video Diffusion Models
Yuwei Guo, Ceyuan Yang, Anyi Rao et al.
Sapiens: Foundation for Human Vision Models
Rawal Khirodkar, Timur Bagautdinov, Julieta Martinez et al.
LLaVA-UHD: an LMM Perceiving any Aspect Ratio and High-Resolution Images
Zonghao Guo, Ruyi Xu, Yuan Yao et al.
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs
Viraj Shah, Nataniel Ruiz, Forrester Cole et al.
OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving
Wenzhao Zheng, Weiliang Chen, Yuanhui Huang et al.
BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion
Xuan Ju, Xian Liu, Xintao Wang et al.
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs
Keen You, Haotian Zhang, Eldon Schoop et al.
A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting
Junhao Zhuang, Yanhong Zeng, Wenran Liu et al.
Generative End-to-End Autonomous Driving
Wenzhao Zheng, Ruiqi Song, Xianda Guo et al.
Street Gaussians: Modeling Dynamic Urban Scenes with Gaussian Splatting
Yunzhi Yan, Haotong Lin, Chenxu Zhou et al.
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback
Ming Li, Taojiannan Yang, Huafeng Kuang et al.
Physics-Based Interaction with 3D Objects via Video Generation
Tianyuan Zhang, Hong-Xing Yu, Rundi Wu et al.
Rotary Position Embedding for Vision Transformer
Byeongho Heo, Song Park, Dongyoon Han et al.
SGS-SLAM: Semantic Gaussian Splatting For Neural Dense SLAM
Mingrui Li, Shuhong Liu, Heng Zhou et al.
LongVLM: Efficient Long Video Understanding via Large Language Models
Yuetian Weng, Mingfei Han, Haoyu He et al.
LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model
Dilxat Muhtar, Zhenshi Li, Feng Gu et al.
UniIR: Training and Benchmarking Universal Multimodal Information Retrievers
Cong Wei, Yang Chen, Haonan Chen et al.
Dolphins: Multimodal Language Model for Driving
Yingzi Ma, Yulong Cao, Jiachen Sun et al.
ST-LLM: Large Language Models Are Effective Temporal Learners
Ruyang Liu, Chen Li, Haoran Tang et al.
Paying More Attention to Images: A Training-Free Method for Alleviating Hallucination in LVLMs
Shi Liu, Kecheng Zheng, Wei Chen
Drag Anything: Motion Control for Anything using Entity Representation
Weijia Wu, Zhuang Li, Yuchao Gu et al.
SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
Feng Wang, Jieru Mei, Alan Yuille
Concept Sliders: LoRA Adaptors for Precise Control in Diffusion Models
Rohit Gandikota, Joanna Materzynska, Tingrui Zhou et al.
DiffiT: Diffusion Vision Transformers for Image Generation
Ali Hatamizadeh, Jiaming Song, Guilin Liu et al.
MotionLCM: Real-time Controllable Motion Generation via Latent Consistency Model
Wenxun Dai, Ling-Hao Chen, Jingbo Wang et al.
InstructIR: High-Quality Image Restoration Following Human Instructions
Marcos Conde, Gregor Geigle, Radu Timofte
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Hao Zhang, Hongyang Li, Feng Li et al.
SEA-RAFT: Simple, Efficient, Accurate RAFT for Optical Flow
Yihan Wang, Lahav Lipson, Jia Deng
Implicit Style-Content Separation using B-LoRA
Yarden Frenkel, Yael Vinker, Ariel Shamir et al.
DynMF: Neural Motion Factorization for Real-time Dynamic View Synthesis with 3D Gaussian Splatting
Angelos Kratimenos, Jiahui Lei, Kostas Daniilidis
ShapeLLM: Universal 3D Object Understanding for Embodied Interaction
Zekun Qi, Runpei Dong, Shaochen Zhang et al.
Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving
Ming Nie, Renyuan Peng, Chunwei Wang et al.
IRSAM: Advancing Segment Anything Model for Infrared Small Target Detection
Mingjin Zhang, Yuchun Wang, Jie Guo et al.
Gaussian in the Wild: 3D Gaussian Splatting for Unconstrained Image Collections
Dongbin Zhang, Chuming Wang, Weitao Wang et al.
Motion Mamba: Efficient and Long Sequence Motion Generation
Zeyu Zhang, Akide Liu, Ian Reid et al.
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
Chuofan Ma, Yi Jiang, Jiannan Wu et al.
CompGS: Smaller and Faster Gaussian Splatting with Vector Quantization
K L Navaneet, Kossar Pourahmadi, Soroush Abbasi Koohpayegani et al.
ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
Guanxing Lu, Shiyi Zhang, Ziwei Wang et al.
ReNoise: Real Image Inversion Through Iterative Noising
Daniel Garibi, Or Patashnik, Andrey Voynov et al.
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation
Shaowei Liu, Zhongzheng Ren, Saurabh Gupta et al.
TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering
Jingye Chen, Yupan Huang, Tengchao Lv et al.
How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs
Haoqin Tu, Chenhang Cui, Zijun Wang et al.
Mamba-ND: Selective State Space Modeling for Multi-Dimensional Data
Shufan Li, Aditya Grover, Harkanwar Singh
latentSplat: Autoencoding Variational Gaussians for Fast Generalizable 3D Reconstruction
Christopher Wewer, Kevin Raj, Eddy Ilg et al.
MVDiffHD: A Dense High-resolution Multi-view Diffusion Model for Single or Sparse-view 3D Object Reconstruction
Shitao Tang, Jiacheng Chen, Dilin Wang et al.
DiffusionDepth: Diffusion Denoising Approach for Monocular Depth Estimation
Yiqun Duan, Xianda Guo, Zheng Zhu
Model Breadcrumbs: Scaling Multi-Task Model Merging with Sparse Masks
MohammadReza Davari, Eugene Belilovsky
LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models
Hai Jiang, Ao Luo, Xiaohong Liu et al.
DOCCI: Descriptions of Connected and Contrasting Images
Yasumasa Onoe, Sunayana Rane, Zachary E Berger et al.
Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting
Zheng Zhang, Wenbo Hu, Yixing Lao et al.
Revising Densification in Gaussian Splatting
Samuel Rota Bulò, Lorenzo Porzi, Peter Kontschieder
GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction
Yuanhui Huang, Wenzhao Zheng, Yunpeng Zhang et al.
Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
Yifan Li, Hangyu Guo, Kun Zhou et al.
VISA: Reasoning Video Object Segmentation via Large Language Model
Cilin Yan, Haochen Wang, Shilin Yan et al.
Towards Open-ended Visual Quality Comparison
Haoning Wu, Hanwei Zhu, Zicheng Zhang et al.
STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians
Yifei Zeng, Yanqin Jiang, Siyu Zhu et al.
Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance
Liting Lin, Heng Fan, Zhipeng Zhang et al.
Depicting Beyond Scores: Advancing Image Quality Assessment through Multi-modal Language Models
Zhiyuan You, Zheyuan Li, Jinjin Gu et al.
CoR-GS: Sparse-View 3D Gaussian Splatting via Co-Regularization
Jiawei Zhang, Jiahe Li, Xiaohan Yu et al.
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models
Haoran Wei, Lingyu Kong, Jinyue Chen et al.
The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
Weiyun Wang, Yiming Ren, Haowen Luo et al.
AutoDIR: Automatic All-in-One Image Restoration with Latent Diffusion
Yitong Jiang, Zhaoyang Zhang, Tianfan Xue et al.
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
Qing Jiang, Feng Li, Zhaoyang Zeng et al.
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
Zhening Huang, Xiaoyang Wu, Xi Chen et al.
Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation
Yuanchen Ju, Kaizhe Hu, Guowei Zhang et al.
PSALM: Pixelwise Segmentation with Large Multi-modal Model
Zheng Zhang, YeYao Ma, Enming Zhang et al.
Octopus: Embodied Vision-Language Programmer from Environmental Feedback
Jingkang Yang, Yuhao Dong, Shuai Liu et al.
Arc2Face: A Foundation Model for ID-Consistent Human Faces
Foivos Paraperas Papantoniou, Alexandros Lattas, Stylianos Moschoglou et al.
CG-SLAM: Efficient Dense RGB-D SLAM in a Consistent Uncertainty-aware 3D Gaussian Field
Jiarui Hu, Xianhao Chen, Boyin Feng et al.
NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
Gengze Zhou, Yicong Hong, Zun Wang et al.
Deblurring 3D Gaussian Splatting
Byeonghyeon Lee, Howoong Lee, Xiangyu Sun et al.
TLControl: Trajectory and Language Control for Human Motion Synthesis
Weilin Wan, Zhiyang Dou, Taku Komura et al.
EMDM: Efficient Motion Diffusion Model for Fast, High-Quality Human Motion Generation
Wenyang Zhou, Zhiyang Dou, Zeyu Cao et al.
Distilling Diffusion Models into Conditional GANs
Minguk Kang, Richard Zhang, Connelly Barnes et al.
LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation
Yushi Lan, Fangzhou Hong, Shuai Yang et al.
A Unified Anomaly Synthesis Strategy with Gradient Ascent for Industrial Anomaly Detection and Localization
Qiyu Chen, Huiyuan Luo, Chengkan Lv et al.
Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization
Renjie Pi, Tianyang Han, Wei Xiong et al.
BAD-Gaussians: Bundle Adjusted Deblur Gaussian Splatting
Lingzhe Zhao, Peng Wang, Peidong Liu
V2X-Real: a Large-Scale Dataset for Vehicle-to-Everything Cooperative Perception
Hao Xiang, Xin Xia, Zhaoliang Zheng et al.
Per-Gaussian Embedding-Based Deformation for Deformable 3D Gaussian Splatting
Jeongmin Bae, Seoha Kim, Youngsik Yun et al.
Improving Diffusion Models for Authentic Virtual Try-on in the Wild
Yisol Choi, Sangkyung Kwak, Kyungmin Lee et al.
Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation
Homanga Bharadhwaj, Roozbeh Mottaghi, Abhinav Gupta et al.
SMFANet: A Lightweight Self-Modulation Feature Aggregation Network for Efficient Image Super-Resolution
Mingjun Zheng, Long Sun, Jiangxin Dong et al.
Model Stock: All we need is just a few fine-tuned models
Dong-Hwan Jang, Sangdoo Yun, Dongyoon Han
Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation
Zhenliang Ni, Xinghao Chen, Yingjie Zhai et al.
When Do We Not Need Larger Vision Models?
Baifeng Shi, Ziyang Wu, Maolin Mao et al.
RGBD GS-ICP SLAM
Seongbo Ha, Jiung Yeon, Hyeonwoo Yu
Large-scale Reinforcement Learning for Diffusion Models
Yinan Zhang, Eric Tzeng, Yilun Du et al.
Frequency-Spatial Entanglement Learning for Camouflaged Object Detection
Yanguang Sun, Chunyan Xu, Jian Yang et al.
GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting
Xinjie Zhang, Xingtong Ge, Tongda Xu et al.
ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
Mengcheng Lan, Chaofeng Chen, Yiping Ke et al.
DreamScene360: Unconstrained Text-to-3D Scene Generation with Panoramic Gaussian Splatting
Shijie Zhou, Zhiwen Fan, Dejia Xu et al.
Scene Coordinate Reconstruction: Posing of Image Collections via Incremental Learning of a Relocalizer
Eric Brachmann, Jamie Wynn, Shuai Chen et al.
End-to-End Rate-Distortion Optimized 3D Gaussian Representation
Henan Wang, Hanxin Zhu, Tianyu He et al.
TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos
Yufu Wang, Ziyun Wang, Lingjie Liu et al.
OneRestore: A Universal Restoration Framework for Composite Degradation
Yu Guo, Yuan Gao, Yuxu Lu et al.
Attention-Challenging Multiple Instance Learning for Whole Slide Image Classification
Yunlong Zhang, Honglin Li, Yuxuan Sun et al.
TC4D: Trajectory-Conditioned Text-to-4D Generation
Sherwin Bahmani, Xian Liu, Wang Yifan et al.
Unifying 3D Vision-Language Understanding via Promptable Queries
Ziyu Zhu, Zhuofan Zhang, Xiaojian Ma et al.
Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot
Fabien Baradel, Thomas Lucas, Matthieu Armando et al.
GIVT: Generative Infinite-Vocabulary Transformers
Michael Tschannen, Cian Eastwood, Fabian Mentzer
Language-Image Pre-training with Long Captions
Kecheng Zheng, Yifei Zhang, Wei Wu et al.
GS2Mesh: Surface Reconstruction from Gaussian Splatting via Novel Stereo Views
Yaniv Wolf, Amit Bracha, Ron Kimmel
Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery
Sukrut Rao, Sweta Mahajan, Moritz Böhle et al.
SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition
Jeonghyeok Do, Munchurl Kim
Relation DETR: Exploring Explicit Position Relation Prior for Object Detection
Xiuquan Hou, Meiqin Liu, Senlin Zhang et al.
DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video
Narek Tumanyan, Assaf Singer, Shai Bagon et al.
Large Motion Model for Unified Multi-Modal Motion Generation
Mingyuan Zhang, Daisheng Jin, Chenyang Gu et al.
GraspXL: Generating Grasping Motions for Diverse Objects at Scale
Hui Zhang, Sammy Christen, Zicong Fan et al.
TalkingGaussian: Structure-Persistent 3D Talking Head Synthesis via Gaussian Splatting
Jiahe Li, Jiawei Zhang, Xiao Bai et al.
VeCLIP: Improving CLIP Training via Visual-enriched Captions
Zhengfeng Lai, Haotian Zhang, Bowen Zhang et al.
Diffusion Models for Open-Vocabulary Segmentation
Laurynas Karazija, Iro Laina, Andrea Vedaldi et al.
SILC: Improving Vision Language Pretraining with Self-Distillation
Muhammad Ferjad Naeem, Yongqin Xian, Xiaohua Zhai et al.
ControlLLM: Augment Language Models with Tools by Searching on Graphs
Zhaoyang Liu, Zeqiang Lai, Zhangwei Gao et al.
SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation
Yi-Chia Chen, WeiHua Li, Cheng Sun et al.
GLAD: Towards Better Reconstruction with Global and Local Adaptive Diffusion Models for Unsupervised Anomaly Detection
Hang Yao, Ming Liu, Zhicun Yin et al.
ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion
Daniel Winter, Matan Cohen, Shlomi Fruchter et al.
DQ-DETR: DETR with Dynamic Query for Tiny Object Detection
Yi-Xin Huang, Hou-I Liu, Hong-Han Shuai et al.
Lane Graph as Path: Continuity-preserving Path-wise Modeling for Online Lane Graph Construction
Bencheng Liao, Shaoyu Chen, Bo Jiang et al.
Improving 2D Feature Representations by 3D-Aware Fine-Tuning
Yuanwen Yue, Anurag Das, Francis Engelmann et al.
OmniSat: Self-Supervised Modality Fusion for Earth Observation
Guillaume Astruc, Nicolas Gonthier, Clement Mallet et al.
VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation
Zhen Qu, Xian Tao, Mukesh Prasad et al.
FlashTex: Fast Relightable Mesh Texturing with LightControlNet
Kangle Deng, Timothy Omernick, Alexander B Weiss et al.
IntrinsicAnything: Learning Diffusion Priors for Inverse Rendering Under Unknown Illumination
Xi Chen, Sida Peng, Dongchen Yang et al.
MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping
Jiacheng Chen, Yuefan Wu, Jiaqi Tan et al.
Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control
Yue Han, Junwei Zhu, Keke He et al.
A Comparative Study of Image Restoration Networks for General Backbone Network Design
Xiangyu Chen, Zheyuan Li, Yuandong Pu et al.
Latent Guard: a Safety Framework for Text-to-image Generation
Runtao Liu, Ashkan Khakzar, Jindong Gu et al.
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
Zeyu Liu, Weicong Liang, Zhanhao Liang et al.
VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models
Junlin Han, Filippos Kokkinos, Philip Torr
GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes
Ibrahim Ethem Hamamci, Sezgin Er, Anjany Sekuboyina et al.
LaRa: Efficient Large-Baseline Radiance Fields
Anpei Chen, Haofei Xu, Stefano Esposito et al.
HiT-SR: Hierarchical Transformer for Efficient Image Super-Resolution
Xiang Zhang, Yulun Zhang, Fisher Yu
GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering
Yanyan Li, Chenyu Lyu, Yan Di et al.
ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions
Anindita Ghosh, Rishabh Dabral, Vladislav Golyanik et al.
GVGEN: Text-to-3D Generation with Volumetric Representation
Xianglong He, Junyi Chen, Sida Peng et al.
FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
Linjiang Huang, Rongyao Fang, Aiping Zhang et al.
HeadGaS: Real-Time Animatable Head Avatars via 3D Gaussian Splatting
Helisa Dhamo, Yinyu Nie, Arthur Moreau et al.
CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenarios
Qilang Ye, Zitong Yu, Rui Shao et al.
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training
David Wan, Jaemin Cho, Elias Stengel-Eskin et al.
Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning
Chongyu Fan, Jiancheng Liu, Alfred Hero et al.
Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
Lanqing Guo, Yingqing He, Haoxin Chen et al.
Leveraging Enhanced Queries of Point Sets for Vectorized Map Construction
Zihao Liu, Xiaoyu Zhang, Guangwei Liu et al.
TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models
Aditya Aravind Chinchure, Pushkar Shukla, Gaurav Bhatt et al.
ReMamber: Referring Image Segmentation with Mamba Twister
Yuhuan Yang, Chaofan Ma, Jiangchao Yao et al.
VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models
Shicheng Li, Lei Li, Yi Liu et al.
LCM-Lookahead for Encoder-based Text-to-Image Personalization
Rinon Gal, Or Lichter, Elad Richardson et al.
UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation
Zexiang Liu, Yangguang Li, Youtian Lin et al.
CoMo: Controllable Motion Generation through Language Guided Pose Code Editing
Yiming Huang, Weilin Wan, Yue Yang et al.
Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector
Yuqian Fu, Yu Wang, Yixuan Pan et al.
DGInStyle: Domain-Generalizable Semantic Segmentation with Image Diffusion Models and Stylized Semantic Control
Yuru Jia, Lukas Hoyer, Shengyu Huang et al.
UniTraj: A Unified Framework for Scalable Vehicle Trajectory Prediction
Lan Feng, Mohammadhossein Bahari, Kaouther Messaoud et al.
UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models
Yiming Zhao, Zhouhui Lian
Customize-A-Video: One-Shot Motion Customization of Text-to-Video Diffusion Models
Yixuan Ren, Yang Zhou, Jimei Yang et al.
Portrait4D-v2: Pseudo Multi-View Data Creates Better 4D Head Synthesizer
Yu Deng, Duomin Wang, Baoyuan Wang
DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM
Yixuan Wu, Yizhou Wang, Shixiang Tang et al.
Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention
Jie Ren, Yaxin Li, Shenglai Zeng et al.
When Fast Fourier Transform Meets Transformer for Image Restoration
Xingyu Jiang, Xiuhui Zhang, Ning Gao et al.
PhysAvatar: Learning the Physics of Dressed 3D Avatars from Visual Observations
Yang Zheng, Qingqing Zhao, Guandao Yang et al.
FlashSplat: 2D to 3D Gaussian Splatting Segmentation Solved Optimally
Qiuhong Shen, Xingyi Yang, Xinchao Wang
ComboVerse: Compositional 3D Assets Creation Using Spatially-Aware Diffusion Guidance
Yongwei Chen, Tengfei Wang, Tong Wu et al.
Explore the Potential of CLIP for Training-Free Open Vocabulary Semantic Segmentation
Tong Shao, Zhuotao Tian, Hang Zhao et al.
BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
Rizhao Cai, Zirui Song, Dayan Guan et al.