Most Cited 2025 Poster Papers
22,274 papers found • Page 35 of 112
Decouple Distortion from Perception: Region Adaptive Diffusion for Extreme-low Bitrate Perception Image Compression
Jinchang Xu, Shaokang Wang, Jintao Chen et al.
CReFT-CAD: Boosting Orthographic Projection Reasoning for CAD via Reinforcement Fine-Tuning
Ke Niu, Zhuofan Chen, Haiyang Yu et al.
CrossAD: Time Series Anomaly Detection with Cross-scale Associations and Cross-window Modeling
Beibu Li, Qichao Shentu, Yang Shu et al.
Hallucinatory Image Tokens: A Training-free EAZY Approach to Detecting and Mitigating Object Hallucinations in LVLMs
Liwei Che, Qingze T Liu, Jing Jia et al.
SC-Captioner: Improving Image Captioning with Self-Correction by Reinforcement Learning
Lin Zhang, Xianfang Zeng, Kangcong Li et al.
SPAZER: Spatial-Semantic Progressive Reasoning Agent for Zero-shot 3D Visual Grounding
Zhao Jin, Rong-Cheng Tu, Jingyi Liao et al.
CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models
Quang-Binh Nguyen, Minh Luu, Quang Nguyen et al.
Sampling from multi-modal distributions with polynomial query complexity in fixed dimension via reverse diffusion
Adrien Vacher, Omar Chehab, Anna Korba
FACE: Faithful Automatic Concept Extraction
Dipkamal Bhusal, Michael Clifford, Sara Rampazzi et al.
Entropic Time Schedulers for Generative Diffusion Models
Dejan Stancevic, Florian Handke, Luca Ambrogioni
Sufficient Invariant Learning for Distribution Shift
Taero Kim, Subeen Park, Sungjun Lim et al.
Differentiation Through Black-Box Quadratic Programming Solvers
Connor Magoon, Fengyu Yang, Noam Aigerman et al.
Integrating Visual Interpretation and Linguistic Reasoning for Geometric Problem Solving
Zixian Guo, Ming Liu, Qilong Wang et al.
$\boldsymbol{\lambda}$-Orthogonality Regularization for Compatible Representation Learning
Simone Ricci, Niccolò Biondi, Federico Pernici et al.
Next Semantic Scale Prediction via Hierarchical Diffusion Language Models
Cai Zhou, Chenyu Wang, Dinghuai Zhang et al.
Tracktention: Leveraging Point Tracking to Attend Videos Faster and Better
Zihang Lai, Andrea Vedaldi
Styl3R: Instant 3D Stylized Reconstruction for Arbitrary Scenes and Styles
Peng Wang, Xiang Liu, Peidong Liu
Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
Ruihang Chu, Yefei He, Zhekai Chen et al.
Joint Diffusion Models in Continual Learning
Paweł Skierś, Kamil Deja
GT-Loc: Unifying When and Where in Images through a Joint Embedding Space
David G. Shatwell, Ishan Rajendrakumar Dave, Swetha Sirnam et al.
MV-CoLight: Efficient Object Compositing with Consistent Lighting and Shadow Generation
Kerui Ren, Jiayang Bai, Linning Xu et al.
GenPO: Generative Diffusion Models Meet On-Policy Reinforcement Learning
Shutong Ding, Ke Hu, Shan Zhong et al.
VITRIX-UniViTAR: Unified Vision Transformer with Native Resolution
Limeng Qiao, Yiyang Gan, Bairui Wang et al.
ATA: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting
Yizhe Tang, Zhimin Sun, Yuzhen Du et al.
GG-SSMs: Graph-Generating State Space Models
Nikola Zubic, Davide Scaramuzza
Dynamic View Synthesis as an Inverse Problem
Hidir Yesiltepe, Pinar Yanardag
PMA: Towards Parameter-Efficient Point Cloud Understanding via Point Mamba Adapter
Yaohua Zha, Yanzi Wang, Hang Guo et al.
Convergent Functions, Divergent Forms
Hyeonseong Jeon, Ainaz Eftekhar, Aaron Walsman et al.
Learning to Better Search with Language Models via Guided Reinforced Self-Training
Seungyong Moon, Bumsoo Park, Hyun Oh Song
HumanMM: Global Human Motion Recovery from Multi-shot Videos
Yuhong Zhang, Guanlin Wu, Ling-Hao Chen et al.
Bootstrapping Grounded Chain-of-Thought in Multimodal LLMs for Data-Efficient Model Adaptation
Jiaer Xia, Bingkui Tong, Yuhang Zang et al.
Reading Recognition in the Wild
Charig Yang, Samiul Alam, Shakhrul Iman Siam et al.
Heavy Labels Out! Dataset Distillation with Label Space Lightening
Ruonan Yu, Songhua Liu, Zigeng Chen et al.
VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions
Marko Mihajlovic, Siwei Zhang, Gen Li et al.
Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis
Boming Miao, Chunxiao Li, Xiaoxiao Wang et al.
Conditional Balance: Improving Multi-Conditioning Trade-Offs in Image Generation
Nadav Z. Cohen, Oron Nir, Ariel Shamir
EA-KD: Entropy-based Adaptive Knowledge Distillation
Chi-Ping Su, Ching-Hsun Tseng, Bin Pu et al.
Seeing What Matters: Generalizable AI-generated Video Detection with Forensic-Oriented Augmentation
Riccardo Corvi, Davide Cozzolino, Ekta Prashnani et al.
I Am Big, You Are Little; I Am Right, You Are Wrong
David A Kelly, Akchunya Chanchal, Nathan Blake
TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation
Zonglin Lyu, Chen Chen
Generate, Refine, and Encode: Leveraging Synthesized Novel Samples for On-the-Fly Fine-Grained Category Discovery
Xiao Liu, Nan Pu, Haiyang Zheng et al.
Deep learning for continuous-time stochastic control with jumps
Patrick Cheridito, Jean-Loup Dupret, Donatien Hainaut
Surprise3D: A Dataset for Spatial Understanding and Reasoning in Complex 3D Scenes
Jiaxin Huang, Ziwen Li, Hanlue Zhang et al.
BRACE: A Benchmark for Robust Audio Caption Quality Evaluation
Tianyu Guo, Hongyu Chen, Hao Liang et al.
HybridMQA: Exploring Geometry-Texture Interactions for Colored Mesh Quality Assessment
Armin Shafiee Sarvestani, Sheyang Tang, Zhou Wang
SpecEdge: Scalable Edge-Assisted Serving Framework for Interactive LLMs
Jinwoo Park, Seunggeun Cho, Dongsu Han
Self-Refining Language Model Anonymizers via Adversarial Distillation
Kyuyoung Kim, Hyunjun Jeon, Jinwoo Shin
ReDi: Rectified Discrete Flow
Jaehoon Yoo, Wonjung Kim, Seunghoon Hong
Attention! Your Vision Language Model Could Be Maliciously Manipulated
Xiaosen Wang, Shaokang Wang, Zhijin Ge et al.
AdaDetectGPT: Adaptive Detection of LLM-Generated Text with Statistical Guarantees
Hongyi Zhou, Jin Zhu, Pingfan Su et al.
DanmakuTPPBench: A Multi-modal Benchmark for Temporal Point Process Modeling and Understanding
Yue Jiang, Jichu Li, Yang Liu et al.
GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models
Jonathan Roberts, Kai Han, Samuel Albanie
Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees
Sourav Ganguly, Kishan Panaganti, Arnob Ghosh et al.
SGFormer: Satellite-Ground Fusion for 3D Semantic Scene Completion
Xiyue Guo, Jiarui Hu, Junjie Hu et al.
Visual Modality Prompt for Adapting Vision-Language Object Detectors
Heitor Rapela Medeiros, Atif Belal, Srikanth Muralidharan et al.
Is 'Right' Right? Enhancing Object Orientation Understanding in Multimodal Large Language Models through Egocentric Instruction Tuning
JiHyeok Jung, EunTae Kim, SeoYeon Kim et al.
Efficient Unsupervised Shortcut Learning Detection and Mitigation in Transformers
Lukas Kuhn, Sari Sadiya, Jörg Schlötterer et al.
FreqPolicy: Efficient Flow-based Visuomotor Policy via Frequency Consistency
Yifei Su, Ning Liu, Dong Chen et al.
From Sequence to Structure: Uncovering Substructure Reasoning in Transformers
Xinnan Dai, Kai Yang, Jay Revolinsky et al.
GeoComplete: Geometry-Aware Diffusion for Reference-Driven Image Completion
Beibei Lin, Tingting Chen, Robby Tan
HCRMP: An LLM-Hinted Contextual Reinforcement Learning Framework for Autonomous Driving
Zhiwen Chen, Hanming Deng, Zhuoren Li et al.
Multi-focal Conditioned Latent Diffusion for Person Image Synthesis
Jiaqi Liu, Jichao Zhang, Paolo Rota et al.
TriDi: Trilateral Diffusion of 3D Humans, Objects, and Interactions
Ilya A. Petrov, Riccardo Marin, Julian Chibane et al.
Reinforcement Learning for Out-of-Distribution Reasoning in LLMs: An Empirical Study on Diagnosis-Related Group Coding
Hanyin Wang, Zhenbang Wu, Gururaj Kolar et al.
A Stable Whitening Optimizer for Efficient Neural Network Training
Kevin Frans, Sergey Levine, Pieter Abbeel
HumanSAM: Classifying Human-centric Forgery Videos in Human Spatial, Appearance, and Motion Anomaly
Chang Liu, Yunfan Ye, Fan Zhang et al.
EvoLM: In Search of Lost Language Model Training Dynamics
Zhenting Qi, Fan Nie, Alexandre Alahi et al.
Who You Are Matters: Bridging Interests and Social Roles via LLM-Enhanced Logic Recommendation
Qing Yu, Xiaobei Wang, Shuchang Liu et al.
Fairshare Data Pricing via Data Valuation for Large Language Models
Luyang Zhang, Cathy Jiao, Beibei Li et al.
Scale Efficient Training for Large Datasets
Qing Zhou, Junyu Gao, Qi Wang
Adaptive Distraction: Probing LLM Contextual Robustness with Automated Tree Search
Yanbo Wang, Zixiang Xu, Yue Huang et al.
SceneMI: Motion In-betweening for Modeling Human-Scene Interaction
Inwoo Hwang, Bing Zhou, Young Min Kim et al.
CAPability: A Comprehensive Visual Caption Benchmark for Evaluating Both Correctness and Thoroughness
Zhihang Liu, Chen-Wei Xie, Bin Wen et al.
Zero-Shot Trajectory Planning for Signal Temporal Logic Tasks
Ruijia Liu, Ancheng Hou, Xiao Yu et al.
Comprehensive Relighting: Generalizable and Consistent Monocular Human Relighting and Harmonization
Junying Wang, Jingyuan Liu, Xin Sun et al.
MOOSE-Chem2: Exploring LLM Limits in Fine-Grained Scientific Hypothesis Discovery via Hierarchical Search
Zonglin Yang, Wanhao Liu, Ben Gao et al.
Trans-EnV: A Framework for Evaluating the Linguistic Robustness of LLMs Against English Varieties
Jiyoung Lee, Seungho Kim, Jieun Han et al.
Dataset Distillation via Vision-Language Category Prototype
Yawen Zou, Guang Li, Duo Su et al.
Parallelizing MCMC Across the Sequence Length
David Zoltowski, Skyler Wu, Xavier Gonzalez et al.
RoboTron-Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction
Yufeng Zhong, Chengjian Feng, Feng Yan et al.
Zero-shot RGB-D Point Cloud Registration with Pre-trained Large Vision Model
Haobo Jiang, Jin Xie, Jian Yang et al.
FREE-Merging: Fourier Transform for Efficient Model Merging
Shenghe Zheng, Hongzhi Wang
UniPhy: Learning a Unified Constitutive Model for Inverse Physics Simulation
Himangi Mittal, Peiye Zhuang, Hsin-Ying Lee et al.
ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling
Jinhyung Park, Javier Romero, Shunsuke Saito et al.
Identifiability of Deep Polynomial Neural Networks
Konstantin Usevich, Ricardo Borsoi, Clara Dérand et al.
On the Generalization of Representation Uncertainty in Earth Observation
Spyros Kondylatos, Nikolaos Ioannis Bountos, Dimitrios Michail et al.
Predict-Optimize-Distill: A Self-Improving Cycle for 4D Object Understanding
Mingxuan Wu, Huang Huang, Justin Kerr et al.
ImViD: Immersive Volumetric Videos for Enhanced VR Engagement
Zhengxian Yang, Shi Pan, Shengqi Wang et al.
Jigsaw++: Imagining Complete Shape Priors for Object Reassembly
Jiaxin Lu, Gang Hua, Qixing Huang
SMMILE: An expert-driven benchmark for multimodal medical in-context learning
Melanie Rieff, Maya Varma, Ossian Rabow et al.
Towards Unbiased and Robust Spatio-Temporal Scene Graph Generation and Anticipation
Rohith Peddi, Saurabh ., Ayush Abhay Shrivastava et al.
ARGUS: Hallucination and Omission Evaluation in Video-LLMs
Ruchit Rawal, Reza Shirkavand, Heng Huang et al.
In the Eye of MLLM: Benchmarking Egocentric Video Intent Understanding with Gaze-Guided Prompting
Taiying Peng, Jiacheng Hua, Miao Liu et al.
NADER: Neural Architecture Design via Multi-Agent Collaboration
Zekang Yang, Wang Zeng, Sheng Jin et al.
CSI-Bench: A Large-Scale In-the-Wild Dataset for Multi-task WiFi Sensing
Guozhen Zhu, Yuqian Hu, Weihang Gao et al.
BoltzNCE: Learning likelihoods for Boltzmann Generation with Stochastic Interpolants and Noise Contrastive Estimation
Rishal Aggarwal, Jacky Chen, Nicholas Boffi et al.
A Flag Decomposition for Hierarchical Datasets
Nathan Mankovich, Ignacio Santamaria, Gustau Camps-Valls et al.
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration
Mingyang Song, Xiaoye Qu, Jiawei Zhou et al.
On Fairness of Unified Multimodal Large Language Model for Image Generation
Ming Liu, Hao Chen, Jindong Wang et al.
DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos
Zijia Lu, ASM Iftekhar, Gaurav Mittal et al.
STEP: A Unified Spiking Transformer Evaluation Platform for Fair and Reproducible Benchmarking
Sicheng Shen, Dongcheng Zhao, Linghao Feng et al.
3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection
Yung-Hsu Yang, Luigi Piccinelli, Mattia Segu et al.
SUM Parts: Benchmarking Part-Level Semantic Segmentation of Urban Meshes
Weixiao Gao, Liangliang Nan, Hugo Ledoux
Olympus: A Universal Task Router for Computer Vision Tasks
Yuanze Lin, Yunsheng Li, Dongdong Chen et al.
SViMo: Synchronized Diffusion for Video and Motion Generation in Hand-object Interaction Scenarios
Lingwei Dang, Ruizhi Shao, Hongwen Zhang et al.
FlySearch: Exploring how vision-language models explore
Adam Pardyl, Dominik Matuszek, Mateusz Przebieracz et al.
VA-MoE: Variables-Adaptive Mixture of Experts for Incremental Weather Forecasting
Hao Chen, Tao Han, Song Guo et al.
THUNDER: Tile-level Histopathology image UNDERstanding benchmark
Pierre Marza, Leo Fillioux, Sofiène Boutaj et al.
Believing is Seeing: Unobserved Object Detection using Generative Models
Subhransu S. Bhattacharjee, Dylan Campbell, Rahul Shome
Glocal Information Bottleneck for Time Series Imputation
Jie Yang, Kexin Zhang, Guibin Zhang et al.
OODD: Test-time Out-of-Distribution Detection with Dynamic Dictionary
Yifeng Yang, Lin Zhu, Zewen Sun et al.
Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM
Zinuo Li, Xian Zhang, Yongxin Guo et al.
GO-N3RDet: Geometry Optimized NeRF-enhanced 3D Object Detector
Zechuan Li, Hongshan Yu, Yihao Ding et al.
DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution
Yuzhong Zhao, Feng Liu, Yue Liu et al.
EmoNet-Face: An Expert-Annotated Benchmark for Synthetic Emotion Recognition
Christoph Schuhmann, Robert Kaczmarczyk, Gollam Rabby et al.
Stable Part Diffusion 4D: Multi-View RGB and Kinematic Parts Video Generation
Hao Zhang, Chun-Han Yao, Simon Donné et al.
Open-Insect: Benchmarking Open-Set Recognition of Novel Species in Biodiversity Monitoring
Yuyan Chen, Nico Lang, B. Schmidt et al.
Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA
Zhixuan Li, Hyunse Yoon, Sanghoon Lee et al.
4D Visual Pre-training for Robot Learning
Chengkai Hou, Yanjie Ze, Yankai Fu et al.
Escaping the SpuriVerse: Can Large Vision-Language Models Generalize Beyond Seen Spurious Correlations?
Yiwei Yang, Chung Peng Lee, Shangbin Feng et al.
Compositional Caching for Training-free Open-vocabulary Attribute Detection
Marco Garosi, Alessandro Conti, Gaowen Liu et al.
Extremely Simple Multimodal Outlier Synthesis for Out-of-Distribution Detection and Segmentation
Moru Liu, Hao Dong, Jessica Kelly et al.
Asymptotic Theory of Geometric and Adaptive $k$-Means Clustering
Adam Quinn Jaffe
TaxaDiffusion: Progressively Trained Diffusion Model for Fine-Grained Species Generation
Amin Karimi Monsefi, Mridul Khurana, Rajiv Ramnath et al.
ADIEE: Automatic Dataset Creation and Scorer for Instruction-Guided Image Editing Evaluation
Sherry Chen, Yi Wei, Luowei Zhou et al.
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
Fida Mohammad Thoker, Letian Jiang, Chen Zhao et al.
Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning
Wang Yang, Zirui Liu, Hongye Jin et al.
ResQ: A Novel Framework to Implement Residual Neural Networks on Analog Rydberg Atom Quantum Computers
Nicholas DiBrita, Jason Han, Tirthak Patel
FairGen: Enhancing Fairness in Text-to-Image Diffusion Models via Self-Discovering Latent Directions
Yilei Jiang, Wei-Hong Li, Yiyuan Zhang et al.
Return of ChebNet: Understanding and Improving an Overlooked GNN on Long Range Tasks
Ali Hariri, Alvaro Arroyo, Alessio Gravina et al.
Unveiling the Learning Mind of Language Models: A Cognitive Framework and Empirical Study
Zhengyu Hu, Jianxun Lian, Zheyuan Xiao et al.
Whose View of Safety? A Deep DIVE Dataset for Pluralistic Alignment of Text-to-Image Models
Charvi Rastogi, Tian Huey Teh, Pushkar Mishra et al.
SemanticDraw: Towards Real-Time Interactive Content Creation from Image Diffusion Models
Jaerin Lee, Daniel Jung, Kanggeon Lee et al.
MoPFormer: Motion-Primitive Transformer for Wearable-Sensor Activity Recognition
Hao Zhang, Zhan Zhuang, Xuehao Wang et al.
GEOPARD: Geometric Pretraining for Articulation Prediction in 3D Shapes
Pradyumn Goyal, Dmitrii Petrov, Sheldon Andrews et al.
Memory-Enhanced Neural Solvers for Routing Problems
Felix Chalumeau, Refiloe Shabe, Noah De Nicola et al.
Details Matter for Indoor Open-vocabulary 3D Instance Segmentation
Sanghun Jung, Jingjing Zheng, Ke Zhang et al.
Self-supervised Learning of Hybrid Part-aware 3D Representations of 2D Gaussians and Superquadrics
Zhirui Gao, Renjiao Yi, Yuhang Huang et al.
From Laboratory to Real World: A New Benchmark Towards Privacy-Preserved Visible-Infrared Person Re-Identification
Yan Jiang, Hao Yu, Xu Cheng et al.
Gradient Multi-Normalization for Efficient LLM Training
Meyer Scetbon, Chao Ma, Wenbo Gong et al.
One Sample is Enough to Make Conformal Prediction Robust
Soroush H. Zargarbashi, Mohammad Sadegh Akhondzadeh, Aleksandar Bojchevski
End-to-End Multi-Modal Diffusion Mamba
Chunhao Lu, Qiang Lu, Meichen Dong et al.
VIGFace: Virtual Identity Generation for Privacy-Free Face Recognition Dataset
Minsoo Kim, Min-Cheol Sagong, Gi Pyo Nam et al.
RapVerse: Coherent Vocals and Whole-Body Motion Generation from Text
Jiaben Chen, Xin Yan, Yihang Chen et al.
Gradient-Variation Online Adaptivity for Accelerated Optimization with Hölder Smoothness
Yuheng Zhao, Yu-Hu Yan, Kfir Y. Levy et al.
How Different from the Past? Spatio-Temporal Time Series Forecasting with Self-Supervised Deviation Learning
Haotian Gao, Zheng Dong, Jiawei Yong et al.
Compressed and Smooth Latent Space for Text Diffusion Modeling
Viacheslav Meshchaninov, Egor Chimbulatov, Alexander Shabalin et al.
AdaLRS: Loss-Guided Adaptive Learning Rate Search for Efficient Foundation Model Pretraining
Hongyuan Dong, Dingkang Yang, Xiao Liang et al.
Unleashing High-Quality Image Generation in Diffusion Sampling Using Second-Order Levenberg-Marquardt-Langevin
Fangyikang Wang, Hubery Yin, Lei Qian et al.
PoseTraj: Pose-Aware Trajectory Control in Video Diffusion
Longbin Ji, Lei Zhong, Pengfei Wei et al.
A machine learning approach that beats Rubik's cubes
Alexander Chervov, Kirill Khoruzhii, Nikita Bukhal et al.
CuMPerLay: Learning Cubical Multiparameter Persistence Vectorizations
Caner Korkmaz, Brighton Nuwagira, Baris Coskunuzer et al.
DERD-Net: Learning Depth from Event-based Ray Densities
Diego de Oliveira Hitzges, Suman Ghosh, Guillermo Gallego
SA-Occ: Satellite-Assisted 3D Occupancy Prediction in Real World
Chen Chen, Zhirui Wang, Taowei Sheng et al.
RoboTron-Sim: Improving Real-World Driving via Simulated Hard-Case
Baihui Xiao, Chengjian Feng, Zhijian Huang et al.
COALA: Numerically Stable and Efficient Framework for Context-Aware Low-Rank Approximation
Uliana Parkina, Maxim Rakhuba
PriOr-Flow: Enhancing Primitive Panoramic Optical Flow with Orthogonal View
Longliang Liu, Miaojie Feng, Junda Cheng et al.
Hierarchical-aware Orthogonal Disentanglement Framework for Fine-grained Skeleton-based Action Recognition
Haochen Chang, Pengfei Ren, Haoyang Zhang et al.
Disentangled Clothed Avatar Generation with Layered Representation
Weitian Zhang, Yichao Yan, Sijing Wu et al.
Anti-Aliased 2D Gaussian Splatting
Mae Younes, Adnane Boukhayma
SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders
Jiahui Geng, Qing Li
Rethinking Layered Graphic Design Generation with a Top-Down Approach
Jingye Chen, Zhaowen Wang, Nanxuan Zhao et al.
Semantic Equitable Clustering: A Simple and Effective Strategy for Clustering Vision Tokens
Qihang Fan, Huaibo Huang, Mingrui Chen et al.
GaRe: Relightable 3D Gaussian Splatting for Outdoor Scenes from Unconstrained Photo Collections
Haiyang Bai, Jiaqi Zhu, Songru Jiang et al.
PropVG: End-to-End Proposal-Driven Visual Grounding with Multi-Granularity Discrimination
Ming Dai, Wenxuan Cheng, Jiedong Zhuang et al.
MP-HSIR: A Multi-Prompt Framework for Universal Hyperspectral Image Restoration
Zhehui Wu, Yong Chen, Naoto Yokoya et al.
Faster and Better 3D Splatting via Group Training
Chengbo Wang, Guozheng Ma, Yizhen Lao et al.
Inpaint4Drag: Repurposing Inpainting Models for Drag-Based Image Editing via Bidirectional Warping
Jingyi Lu, Kai Han
RTMap: Real-Time Recursive Mapping with Change Detection and Localization
Yuheng Du, Sheng Yang, Lingxuan Wang et al.
Joint Self-Supervised Video Alignment and Action Segmentation
Ali Shah Ali, Syed Ahmed Mahmood, Mubin Saeed et al.
You Think, You ACT: The New Task of Arbitrary Text to Motion Generation
Runqi Wang, Caoyuan Ma, Guopeng Li et al.
Constraint-Aware Feature Learning for Parametric Point Cloud
Xi Cheng, Ruiqi Lei, Di Huang et al.
NeRF Is a Valuable Assistant for 3D Gaussian Splatting
Shuangkang Fang, I-Chao Shen, Takeo Igarashi et al.
TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition
Xingsong Ye, Yongkun Du, Yunbo Tao et al.
Monocular Semantic Scene Completion via Masked Recurrent Networks
Xuzhi Wang, Xinran Wu, Song Wang et al.
DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion
Maksim Siniukov, Di Chang, Minh Tran et al.
Foresight in Motion: Reinforcing Trajectory Prediction with Reward Heuristics
Muleilan Pei, Shaoshuai Shi, Xuesong Chen et al.
Charm: The Missing Piece in ViT Fine-Tuning for Image Aesthetic Assessment
Fatemeh Behrad, Tinne Tuytelaars, Johan Wagemans
Video Individual Counting for Moving Drones
Yaowu Fan, Jia Wan, Tao Han et al.
Open-ended Hierarchical Streaming Video Understanding with Vision Language Models
Hyolim Kang, Yunsu Park, Youngbeom Yoo et al.
A Lesson in Splats: Teacher-Guided Diffusion for 3D Gaussian Splats Generation with 2D Supervision
Chensheng Peng, Ido Sobol, Masayoshi Tomizuka et al.
How To Make Your Cell Tracker Say "I dunno!"
Richard D Paul, Johannes Seiffarth, David Rügamer et al.
GS-Occ3D: Scaling Vision-only Occupancy Reconstruction with Gaussian Splatting
Baijun Ye, Minghui Qin, Saining Zhang et al.
Sparse Fine-Tuning of Transformers for Generative Tasks
Wei Chen, Jingxi Yu, Zichen Miao et al.
From Imitation to Innovation: The Emergence of AI's Unique Artistic Styles and the Challenge of Copyright Protection
Zexi Jia, Chuanwei Huang, Hongyan Fei et al.
FROSS: Faster-Than-Real-Time Online 3D Semantic Scene Graph Generation from RGB-D Images
Hao-Yu Hou, Chun-Yi Lee, Motoharu Sonogashira et al.
Constrained Diffusers for Safe Planning and Control
Jichen Zhang, Liqun Zhao, Antonis Papachristodoulou et al.
LangScene-X: Reconstruct Generalizable 3D Language-Embedded Scenes with TriMap Video Diffusion
Fangfu Liu, Hao Li, Jiawei Chi et al.
Self-Ensembling Gaussian Splatting for Few-Shot Novel View Synthesis
Chen Zhao, Xuan Wang, Tong Zhang et al.
HairCUP: Hair Compositional Universal Prior for 3D Gaussian Avatars
Byungjun Kim, Shunsuke Saito, Giljoo Nam et al.
GenM3: Generative Pretrained Multi-path Motion Model for Text Conditional Human Motion Generation
Junyu Shi, Lijiang Liu, Yong Sun et al.
Long-term Traffic Simulation with Interleaved Autoregressive Motion and Scenario Generation
Xiuyu Yang, Shuhan Tan, Philipp Kraehenbuehl
Kestrel: 3D Multimodal LLM for Part-Aware Grounded Description
Mahmoud Ahmed, Junjie Fei, Jian Ding et al.
StyleMotif: Multi-Modal Motion Stylization using Style-Content Cross Fusion
Ziyu Guo, Young-Yoon Lee, Joseph Liu et al.
Resilient Sensor Fusion Under Adverse Sensor Failures via Multi-Modal Expert Fusion
Konyul Park, Yecheol Kim, Daehun Kim et al.
What You Have is What You Track: Adaptive and Robust Multimodal Tracking
Yuedong Tan, Jiawei Shao, Eduard Zamfir et al.
INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling
Xin Dong, Shichao Dong, Jin Wang et al.
4D Gaussian Splatting SLAM
Yanyan Li, Youxu Fang, Zunjie Zhu et al.
PBCAT: Patch-Based Composite Adversarial Training against Physically Realizable Attacks on Object Detection
Xiao Li, Yiming Zhu, Yifan Huang et al.
AdaHuman: Animatable Detailed 3D Human Generation with Compositional Multiview Diffusion
Yangyi Huang, Ye Yuan, Xueting Li et al.