Most Cited 2025 "weight negation" Papers
22,274 papers found • Page 47 of 112
Conference
SnowMaster: Comprehensive Real-world Image Desnowing via MLLM with Multi-Model Feedback Optimization
Jianyu LAI, Sixiang Chen, yunlong lin et al.
Morpheus: Text-Driven 3D Gaussian Splat Shape and Color Stylization
Jamie Wynn, Zawar Qureshi, Jakub Powierza et al.
Efficient Motion-Aware Video MLLM
Zijia Zhao, Yuqi Huo, Tongtian Yue et al.
RICCARDO: Radar Hit Prediction and Convolution for Camera-Radar 3D Object Detection
Yunfei Long, Abhinav Kumar, Xiaoming Liu et al.
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning
Yang Yue, Yulin Wang, Chenxin Tao et al.
Knowledge Bridger: Towards Training-Free Missing Modality Completion
Guanzhou Ke, Shengfeng He, Xiao-Li Wang et al.
Multi-modal Medical Diagnosis via Large-small Model Collaboration
Wanyi Chen, Zihua Zhao, Jiangchao Yao et al.
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data
Haoxin Li, Boyang Li
HyperSeg: Hybrid Segmentation Assistant with Fine-grained Visual Perceiver
Cong Wei, Haoxian Tan, Yujie Zhong et al.
Ev-3DOD: Pushing the Temporal Boundaries of 3D Object Detection with Event Cameras
Hoonhee Cho, Jae-Young Kang, Youngho Kim et al.
Efficient Long Video Tokenization via Coordinate-based Patch Reconstruction
Huiwon Jang, Sihyun Yu, Jinwoo Shin et al.
Neuron: Learning Context-Aware Evolving Representations for Zero-Shot Skeleton Action Recognition
Yang Chen, Jingcai Guo, Song Guo et al.
Noise-Resistant Video Anomaly Detection via RGB Error-Guided Multiscale Predictive Coding and Dynamic Memory
Han Hu, Wenli Du, Peng Liao et al.
FreeScene: Mixed Graph Diffusion for 3D Scene Synthesis from Free Prompts
Tongyuan Bai, Wangyuanfan Bai, Dong Chen et al.
Dual-Agent Optimization framework for Cross-Domain Few-Shot Segmentation
Zhaoyang Li, Yuan Wang, Wangkai Li et al.
Spatiotemporal Decoupling for Efficient Vision-Based Occupancy Forecasting
Jingyi Xu, Xieyuanli Chen, Junyi Ma et al.
IDEA-Bench: How Far are Generative Models from Professional Designing?
Chen Liang, Lianghua Huang, Jingwu Fang et al.
Enhancing Dataset Distillation via Non-Critical Region Refinement
Minh-Tuan Tran, Trung Le, Xuan-May Le et al.
AMR-Transformer: Enabling Efficient Long-range Interaction for Complex Neural Fluid Simulation
Zeyi Xu, Jinfan Liu, Kuangxu Chen et al.
Dynamic Derivation and Elimination: Audio Visual Segmentation with Enhanced Audio Semantics
Chen Liu, Liying Yang, Peike Li et al.
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
Yuanchen Wu, Lu Zhang, Hang Yao et al.
ComRoPE: Scalable and Robust Rotary Position Embedding Parameterized by Trainable Commuting Angle Matrices
Hao Yu, Tangyu Jiang, Shuning Jia et al.
HiFi-Portrait: Zero-shot Identity-preserved Portrait Generation with High-fidelity Multi-face Fusion
Yifang Xu, BenXiang Zhai, Yunzhuo Sun et al.
ZeroVO: Visual Odometry with Minimal Assumptions
Lei Lai, Zekai Yin, Eshed Ohn-Bar
GraphMimic: Graph-to-Graphs Generative Modeling from Videos for Policy Learning
Guangyan Chen, Te Cui, Meiling Wang et al.
Continual SFT Matches Multimodal RLHF with Negative Supervision
Ke Zhu, Yu Wang, Yanpeng Sun et al.
Exploring Contextual Attribute Density in Referring Expression Counting
Zhicheng Wang, Zhiyu Pan, Zhan Peng et al.
UCOD-DPL: Unsupervised Camouflaged Object Detection via Dynamic Pseudo-label Learning
Weiqi Yan, Lvhai Chen, Huaijia Kou et al.
Instruct-CLIP: Improving Instruction-Guided Image Editing with Automated Data Refinement Using Contrastive Learning
Sherry X. Chen, Misha Sra, Pradeep Sen
Learning Audio-guided Video Representation with Gated Attention for Video-Text Retrieval
Boseung Jeong, Jicheol Park, Sungyeon Kim et al.
UniAP: Unifying Inter- and Intra-Layer Automatic Parallelism by Mixed Integer Quadratic Programming
Hao Lin, Ke Wu, Jie Li et al.
SSHNet: Unsupervised Cross-modal Homography Estimation via Problem Reformulation and Split Optimization
Junchen Yu, Siyuan Cao, Runmin Zhang et al.
BiLoRA: Almost-Orthogonal Parameter Spaces for Continual Learning
Hao Zhu, Yifei Zhang, Junhao Dong et al.
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
Yicheng Chen, Xiangtai Li, Yining Li et al.
Enhanced then Progressive Fusion with View Graph for Multi-View Clustering
Zhibin Dong, Meng Liu, Siwei Wang et al.
Gen3DEval: Using vLLMs for Automatic Evaluation of Generated 3D Objects
Shalini Maiti, Lourdes Agapito, Filippos Kokkinos
Probing the Mid-level Vision Capabilities of Self-Supervised Learning
Xuweiyi Chen, Markus Marks, Zezhou Cheng
On the Out-Of-Distribution Generalization of Large Multimodal Models
Xingxuan Zhang, Jiansheng Li, Wenjing Chu et al.
Channel Consistency Prior and Self-Reconstruction Strategy Based Unsupervised Image Deraining
Guanglu Dong, Tianheng Zheng, Yuanzhouhan Cao et al.
ReRAW: RGB-to-RAW Image Reconstruction via Stratified Sampling for Efficient Object Detection on the Edge
Radu Berdan, Beril Besbinar, Christoph Reinders et al.
NSD-Imagery: A Benchmark Dataset for Extending fMRI Vision Decoding Methods to Mental Imagery
Reese Kneeland, Paul Scotti, Ghislain St-Yves et al.
Diffusion Model is Effectively Its Own Teacher
Xinyin Ma, Runpeng Yu, Songhua Liu et al.
Unbiased Video Scene Graph Generation via Visual and Semantic Dual Debiasing
Yanjun Li, Zhaoyang Li, Honghui Chen et al.
Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation
Xiaoying Xing, Avinab Saha, Junfeng He et al.
DeClotH: Decomposable 3D Cloth and Human Body Reconstruction from a Single Image
Hyeongjin Nam, Donghwan Kim, Jeongtaek Oh et al.
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
Rui Zhao, Weijia Mao, Mike Zheng Shou
Multi-Label Prototype Visual Spatial Search for Weakly Supervised Semantic Segmentation
Songsong Duan, Xi Yang, Nannan Wang
USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting
Kang Chen, Jiyuan Zhang, Zecheng Hao et al.
VideoLLaMB: Long Streaming Video Understanding with Recurrent Memory Bridges
Yuxuan Wang, Yiqi Song, Cihang Xie et al.
Generative Zoo
Tomasz Niewiadomski, Anastasios Yiannakidis, Hanz Cuevas Velasquez et al.
Dynamic Multimodal Prototype Learning in Vision-Language Models
Xingyu Zhu, Shuo Wang, Beier Zhu et al.
DuoLoRA : Cycle-consistent and Rank-disentangled Content-Style Personalization
Aniket Roy, Shubhankar Borse, Shreya Kadambi et al.
Multi-View 3D Point Tracking
Frano Rajič, Haofei Xu, Marko Mihajlovic et al.
Learning to Inference Adaptively for Multimodal Large Language Models
Zhuoyan Xu, Khoi Nguyen, Preeti Mukherjee et al.
Learning to Unlearn while Retaining: Combating Gradient Conflicts in Machine Unlearning
Gaurav Patel, Qiang Qiu
Robust Multi-View Learning via Representation Fusion of Sample-Level Attention and Alignment of Simulated Perturbation
Jie Xu, Na Zhao, Gang Niu et al.
TurboTrain: Towards Efficient and Balanced Multi-Task Learning for Multi-Agent Perception and Prediction
Zewei Zhou, Zhihao Zhao, Tianhui Cai et al.
ToolVQA: A Dataset for Multi-step Reasoning VQA with External Tools
Shaofeng Yin, Ting Lei, Yang Liu
CAVIS: Context-Aware Video Instance Segmentation
Seunghun Lee, Jiwan Seo, Kiljoon Han et al.
CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting
Siyu Jiao, Haoye Dong, Yuyang Yin et al.
Mastering Collaborative Multi-modal Data Selection: A Focus on Informativeness, Uniqueness, and Representativeness
Qifan Yu, Zhebei Shen, Zhongqi Yue et al.
SMoLoRA: Exploring and Defying Dual Catastrophic Forgetting in Continual Visual Instruction Tuning
Ziqi Wang, Chang Che, Qi Wang et al.
ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers
Qianhao Yuan, Qingyu Zhang, yanjiang liu et al.
MultiVerse: A Multi-Turn Conversation Benchmark for Evaluating Large Vision and Language Models
Young-Jun Lee, Byung-Kwan Lee, Jianshu Zhang et al.
DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding
Wenwen Yu, Zhibo Yang, Yuliang Liu et al.
VGGSounder: Audio-Visual Evaluations for Foundation Models
Daniil Zverev, Thaddäus Wiedemer, Ameya Prabhu et al.
Towards Higher Effective Rank in Parameter-Efficient Fine-tuning using Khatri-Rao Product
Paul Albert, Frederic Zhang, Hemanth Saratchandran et al.
SceneMI: Motion In-betweening for Modeling Human-Scene Interaction
Inwoo Hwang, Bing Zhou, Young Min Kim et al.
Hints of Prompt: Enhancing Visual Representation for Multimodal LLMs in Autonomous Driving
Hao Zhou, Zhanning Gao, Zhili Chen et al.
Rethinking Multi-modal Object Detection from the Perspective of Mono-Modality Feature Learning
Tianyi Zhao, Boyang Liu, Yanglei Gao et al.
Arti-PG: A Toolbox for Procedurally Synthesizing Large-Scale and Diverse Articulated Objects with Rich Annotations
Jianhua Sun, Yuxuan Li, Jiude Wei et al.
X-Capture: An Open-Source Portable Device for Multi-Sensory Learning
Samuel Clarke, Suzannah Wistreich, Yanjie Ze et al.
Humans as a Calibration Pattern: Dynamic 3D Scene Reconstruction from Unsynchronized and Uncalibrated Videos
Changwoon Choi, Jeongjun Kim, Geonho Cha et al.
IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation
Wenxuan Guo, Xiuwei Xu, Hang Yin et al.
3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection
Yung-Hsu Yang, Luigi Piccinelli, Mattia Segu et al.
VOVTrack: Exploring the Potentiality in Raw Videos for Open-Vocabulary Multi-Object Tracking
Zekun Qian, Ruize Han, Junhui Hou et al.
Not all Views are Created Equal: Analyzing Viewpoint Instabilities in Vision Foundation Models
Mateusz Michalkiewicz, Xinyue Bai, Mahsa Baktashmotlagh et al.
CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image
Arindam Dutta, Meng Zheng, Zhongpai Gao et al.
Self-supervised Learning of Hybrid Part-aware 3D Representations of 2D Gaussians and Superquadrics
Zhirui Gao, Renjiao Yi, Yuhang Huang et al.
UniEgoMotion: A Unified Model for Egocentric Motion Reconstruction, Forecasting, and Generation
Chaitanya Patel, Hiroki Nakamura, Yuta Kyuragi et al.
Ouroboros: Single-step Diffusion Models for Cycle-consistent Forward and Inverse Rendering
shanlin sun, Yifan Wang, Hanwen Zhang et al.
Few-Shot Image Quality Assessment via Adaptation of Vision-Language Models
Xudong Li, Zihao Huang, Yan Zhang et al.
Self-Calibrated Variance-Stabilizing Transformations for Real-World Image Denoising
Sébastien Herbreteau, Michael Unser
EAMamba: Efficient All-Around Vision State Space Model for Image Restoration
Yu-Cheng Lin, Yu-Syuan Xu, Hao-Wei Chen et al.
IDF: Iterative Dynamic Filtering Networks for Generalizable Image Denoising
Dongjin Kim, Jaekyun Ko, Muhammad Kashif Ali et al.
Free-Form Motion Control: Controlling the 6D Poses of Camera and Objects in Video Generation
Xincheng Shuai, Henghui Ding, Zhenyuan Qin et al.
VertexRegen: Mesh Generation with Continuous Level of Detail
Xiang Zhang, Yawar Siddiqui, Armen Avetisyan et al.
I2V3D: Controllable Image-to-video Generation with 3D Guidance
Zhiyuan Zhang, Dongdong Chen, Jing Liao
Controllable Weather Synthesis and Removal with Video Diffusion Models
Chih-Hao Lin, Zian Wang, Ruofan Liang et al.
Sequential Gaussian Avatars with Hierarchical Motion Context
Wangze Xu, Yifan Zhan, Zhihang Zhong et al.
iManip: Skill-Incremental Learning for Robotic Manipulation
Zexin Zheng, Jia-Feng Cai, Xiao-Ming Wu et al.
Morph: A Motion-free Physics Optimization Framework for Human Motion Generation
Zhuo Li, Mingshuang Luo, RuiBing Hou et al.
CoMPaSS: Enhancing Spatial Understanding in Text-to-Image Diffusion Models
Gaoyang Zhang, Bingtao Fu, Qingnan Fan et al.
WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation
Zhongyu Yang, Jun Chen, Dannong Xu et al.
CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems
Aniket Rege, Zinnia Nie, Unmesh Raskar et al.
Repurposing 2D Diffusion Models with Gaussian Atlas for 3D Generation
Tiange Xiang, Kai Li, Chengjiang Long et al.
From Image to Video: An Empirical Study of Diffusion Representations
Pedro Vélez, Luisa Polania Cabrera, Yi Yang et al.
Balanced Image Stylization with Style Matching Score
Yuxin Jiang, Liming Jiang, Shuai Yang et al.
LayerD: Decomposing Raster Graphic Designs into Layers
Tomoyuki Suzuki, Kang-Jun Liu, Naoto Inoue et al.
Anchor Token Matching: Implicit Structure Locking for Training-free AR Image Editing
Taihang Hu, Linxuan Li, Kai Wang et al.
UniGlyph: Unified Segmentation-Conditioned Diffusion for Precise Visual Text Synthesis
Yuanrui Wang, Cong Han, Yafei Li et al.
CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching
Zizhuo Li, Yifan Lu, Linfeng Tang et al.
Collaborative Instance Object Navigation: Leveraging Uncertainty-Awareness to Minimize Human-Agent Dialogues
Francesco Taioli, Edoardo Zorzi, Gianni Franchi et al.
BadVideo: Stealthy Backdoor Attack against Text-to-Video Generation
Ruotong Wang, Mingli Zhu, Jiarong Ou et al.
Sculpting Memory: Multi-Concept Forgetting in Diffusion Models via Dynamic Mask and Concept-Aware Optimization
Li, Yang Xiao, Jie Ji et al.
GECKO: Gigapixel Vision-Concept Contrastive Pretraining in Histopathology
Saarthak Kapse, Pushpak Pati, Srikar Yellapragada et al.
Test-time Adaptation for Foundation Medical Segmentation Model Without Parametric Updates
Kecheng Chen, Xinyu Luo, Tiexin Qin et al.
ARGUS: Hallucination and Omission Evaluation in Video-LLMs
Ruchit Rawal, Reza Shirkavand, Heng Huang et al.
OuroMamba: A Data-Free Quantization Framework for Vision Mamba
Akshat Ramachandran, Mingyu Lee, Huan Xu et al.
CutS3D: Cutting Semantics in 3D for 2D Unsupervised Instance Segmentation
Leon Sick, Dominik Engel, Sebastian Hartwig et al.
Beyond [cls]: Exploring the True Potential of Masked Image Modeling Representations
Marcin Przewięźlikowski, Randall Balestriero, Wojciech Jasiński et al.
Stable Diffusion Models are Secretly Good at Visual In-Context Learning
Trevine Oorloff, Vishwanath Sindagi, Wele Gedara Chaminda Bandara et al.
Auto-Vocabulary Semantic Segmentation
Osman Ülger, Maksymilian Kulicki, Yuki Asano et al.
PRM: Photometric Stereo based Large Reconstruction Model
Wenhang Ge, Jiantao Lin, Guibao SHEN et al.
Neural Shell Texture Splatting: More Details and Fewer Primitives
Xin Zhang, Anpei Chen, Jincheng Xiong et al.
JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers
Kwon Byung-Ki, Qi Dai, Lee Hyoseok et al.
DistillDrive: End-to-End Multi-Mode Autonomous Driving Distillation by Isomorphic Hetero-Source Planning Model
Rui Yu, Xianghang Zhang, Runkai Zhao et al.
GS-ID: Illumination Decomposition on Gaussian Splatting via Adaptive Light Aggregation and Diffusion-Guided Material Priors
Kang DU, Zhihao Liang, Yulin Shen et al.
ETA: Efficiency through Thinking Ahead, A Dual Approach to Self-Driving with Large Models
Shadi Hamdan, Chonghao Sima, Zetong Yang et al.
Occupancy Learning with Spatiotemporal Memory
Ziyang Leng, Jiawei Yang, Wenlong Yi et al.
Inverse 3D Microscopy Rendering for Cell Shape Inference with Active Mesh
Sacha Ichbiah, Anshuman Sinha, Fabrice Delbary et al.
PriorMotion: Generative Class-Agnostic Motion Prediction with Raster-Vector Motion Field Priors
Kangan Qian, Jinyu Miao, Xinyu Jiao et al.
BridgeDepth: Bridging Monocular and Stereo Reasoning with Latent Alignment
Tongfan Guan, Jiaxin Guo, Chen Wang et al.
LightSwitch: Multi-view Relighting with Material-guided Diffusion
Yehonathan Litman, Fernando De la Torre, Shubham Tulsiani
QuickSplat: Fast 3D Surface Reconstruction via Learned Gaussian Initialization
Yueh-Cheng Liu, Lukas Höllein, Matthias Nießner et al.
SP2T: Sparse Proxy Attention for Dual-stream Point Transformer
Jiaxu Wan, Hong Zhang, Ziqi He et al.
Controllable 3D Outdoor Scene Generation via Scene Graphs
Yuheng Liu, Xinke Li, Yuning Zhang et al.
SViM3D: Stable Video Material Diffusion for Single Image 3D Generation
Andreas Engelhardt, Mark Boss, Vikram Voleti et al.
SparseRecon: Neural Implicit Surface Reconstruction from Sparse Views with Feature and Depth Consistencies
Liang Han, Xu Zhang, Haichuan Song et al.
SAM4D: Segment Anything in Camera and LiDAR Streams
Jianyun Xu, Song Wang, Ziqian Ni et al.
Articulate3D: Holistic Understanding of 3D Scenes as Universal Scene Description
Anna-Maria Halacheva, Yang Miao, Jan-Nico Zaech et al.
What You Have is What You Track: Adaptive and Robust Multimodal Tracking
Yuedong Tan, Jiawei Shao, Eduard Zamfir et al.
Learning Streaming Video Representation via Multitask Training
Yibin Yan, Jilan Xu, Shangzhe Di et al.
UKBOB: One Billion MRI Labeled Masks for Generalizable 3D Medical Image Segmentation
Emmanuelle Bourigault, Amir Jamaludin, Abdullah Hamdi
TF-TI2I: Training-Free Text-and-Image-to-Image Generation via Multi-Modal Implicit-Context Learning In Text-to-Image Models
Teng-Fang Hsiao, Bo-Kai Ruan, Yi-Lun Wu et al.
Video Motion Graphs
Haiyang Liu, Zhan Xu, Fating Hong et al.
LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling
Li Huaqiu, Yong Wang, Tongwen Huang et al.
Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis
Peng Zheng, Junke Wang, Yi Chang et al.
ForgeLens: Data-Efficient Forgery Focus for Generalizable Forgery Image Detection
Yingjian Chen, Lei Zhang, Yakun Niu
FineMotion: A Dataset and Benchmark with both Spatial and Temporal Annotation for Fine-grained Motion Generation and Editing
Bizhu Wu, Jinheng Xie, Meidan Ding et al.
Leveraging the Power of MLLMs for Gloss-Free Sign Language Translation
Jungeun Kim, Hyeongwoo Jeon, Jongseong Bae et al.
ResGS: Residual Densification of 3D Gaussian for Efficient Detail Recovery
Yanzhe Lyu, Kai Cheng, Kang Xin et al.
PanoLlama: Generating Endless and Coherent Panoramas with Next-Token-Prediction LLMs
Teng Zhou, Xiaoyu Zhang, Yongchuan Tang
MonoMobility: Zero-Shot 3D Mobility Analysis from Monocular Videos
Hongyi Zhou, Xiaogang Wang, Yulan Guo et al.
TrustMark: Robust Watermarking and Watermark Removal for Arbitrary Resolution Images
Tu Bui, Shruti Agarwal, John Collomosse
TimeFormer: Capturing Temporal Relationships of Deformable 3D Gaussians for Robust Reconstruction
Dadong Jiang, Zhi Hou, Zhihui Ke et al.
Region-based Cluster Discrimination for Visual Representation Learning
Yin Xie, Kaicheng Yang, Xiang An et al.
Acknowledging Focus Ambiguity in Visual Questions
Chongyan Chen, Yu-Yun Tseng, Zhuoheng Li et al.
OccluGaussian: Occlusion-Aware Gaussian Splatting for Large Scene Reconstruction and Rendering
Shiyong Liu, Xiao Tang, Zhihao Li et al.
D3: Training-Free AI-Generated Video Detection Using Second-Order Features
Chende Zheng, Ruiqi suo, Chenhao Lin et al.
MonoFusion: Sparse-View 4D Reconstruction via Monocular Fusion
Zihan Wang, Jeff Tan, Tarasha Khurana et al.
Towards Scalable Spatial Intelligence via 2D-to-3D Data Lifting
Xingyu Miao, Haoran Duan, Quanhao Qian et al.
Enhancing Partially Relevant Video Retrieval with Hyperbolic Learning
Jun Li, Jinpeng Wang, Chaolei Tan et al.
Pi-GPS: Enhancing Geometry Problem Solving by Unleashing the Power of Diagrammatic Information
Junbo Zhao, Ting Zhang, Jiayu Sun et al.
GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning
Kelin Yu, Sheng Zhang, Harshit Soora et al.
ViewSRD: 3D Visual Grounding via Structured Multi-View Decomposition
Ronggang Huang, Haoxin Yang, Yan Cai et al.
Boosting Multimodal Learning via Disentangled Gradient Learning
Shicai Wei, Chunbo Luo, Yang Luo
HORT: Monocular Hand-held Objects Reconstruction with Transformers
Zerui Chen, Rolandos Alexandros Potamias, Shizhe Chen et al.
HumanOLAT: A Large-Scale Dataset for Full-Body Human Relighting and Novel-View Synthesis
Timo Teufel, xilong zhou, Umar Iqbal et al.
GAP: Gaussianize Any Point Clouds with Text Guidance
Weiqi Zhang, Junsheng Zhou, Haotian Geng et al.
SIMS: Simulating Stylized Human-Scene Interactions with Retrieval-Augmented Script Generation
Wenjia Wang, Liang Pan, Zhiyang Dou et al.
BokehDiff: Neural Lens Blur with One-Step Diffusion
Chengxuan Zhu, Qingnan Fan, Qi Zhang et al.
Bilateral Collaboration with Large Vision-Language Models for Open Vocabulary Human-Object Interaction Detection
Yupeng Hu, Changxing Ding, Chang Sun et al.
A Token-level Text Image Foundation Model for Document Understanding
Tongkun Guan, Zining Wang, Pei Fu et al.
Learning 3D Object Spatial Relationships from Pre-trained 2D Diffusion Models
Sangwon Baik, Hyeonwoo Kim, Hanbyul Joo
MP-HSIR: A Multi-Prompt Framework for Universal Hyperspectral Image Restoration
Zhehui Wu, Yong Chen, Naoto Yokoya et al.
Learning to Generalize without Bias for Open-Vocabulary Action Recognition
Yating Yu, Congqi Cao, Yifan Zhang et al.
INS-MMBench: A Comprehensive Benchmark for Evaluating LVLMs' Performance in Insurance
Chenwei Lin, Hanjia Lyu, Xian Xu et al.
StelLA: Subspace Learning in Low-rank Adaptation using Stiefel Manifold
Zhizhong Li, Sina Sajadmanesh, Jingtao Li et al.
Test-Time Adaptation of Vision-Language Models for Open-Vocabulary Semantic Segmentation
Mehrdad Noori, David OSOWIECHI, Gustavo Vargas Hakim et al.
The Atlas of In-Context Learning: How Attention Heads Shape In-Context Retrieval Augmentation
Patrick Kahardipraja, Reduan Achtibat, Thomas Wiegand et al.
Compressed and Smooth Latent Space for Text Diffusion Modeling
Viacheslav Meshchaninov, Egor Chimbulatov, Alexander Shabalin et al.
$\texttt{STRCMP}$: Integrating Graph Structural Priors with Language Models for Combinatorial Optimization
Xijun Li, Jiexiang Yang, Jinghao Wang et al.
Light-Weight Diffusion Multiplier and Uncertainty Quantification for Fourier Neural Operators
Albert Matveev, Sanmitra Ghosh, Aamal Hussain et al.
Statistical inference for Linear Stochastic Approximation with Markovian Noise
Sergey Samsonov, Marina Sheshukova, Eric Moulines et al.
GRAVER: Generative Graph Vocabularies for Robust Graph Foundation Models Fine-tuning
Haonan Yuan, Qingyun Sun, Junhua Shi et al.
Incentivizing LLMs to Self-Verify Their Answers
Fuxiang Zhang, Jiacheng Xu, Chaojie Wang et al.
Stop the Nonconsensual Use of Nude Images in Research
Princessa Cintaqia, Arshia Arya, Elissa Redmiles et al.
Lost in Transmission: When and Why LLMs Fail to Reason Globally
Tobias Schnabel, Kiran Tomlinson, Adith Swaminathan et al.
A Simple Linear Patch Revives Layer-Pruned Large Language Models
Xinrui Chen, Haoli Bai, Tao Yuan et al.
DREAM: Drafting with Refined Target Features and Entropy-Adaptive Cross-Attention Fusion for Multimodal Speculative Decoding
Yunhai Hu, Tianhua Xia, Zining Liu et al.
Position: Towards Bidirectional Human-AI Alignment
Hua Shen, Tiffany Knearem, Reshmi Ghosh et al.
InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts
Weipeng Zhong, Peizhou Cao, Yichen Jin et al.
EmoNet-Face: An Expert-Annotated Benchmark for Synthetic Emotion Recognition
Christoph Schuhmann, Robert Kaczmarczyk, Gollam Rabby et al.
VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance
Mohammad Reza Taesiri, Abhijay Ghildyal, Saman Zadtootaghaj et al.
Towards A Generalist Code Embedding Model Based On Massive Data Synthesis
Chaofan Li, Jianlyu Chen, Yingxia Shao et al.
AGC-Drive: A Large-Scale Dataset for Real-World Aerial-Ground Collaboration in Driving Scenarios
Yunhao Hou, Bochao Zou, Min Zhang et al.
QCircuitBench: A Large-Scale Dataset for Benchmarking Quantum Algorithm Design
Rui Yang, Ziruo Wang, Yuntian Gu et al.
Toward a Vision-Language Foundation Model for Medical Data: Multimodal Dataset and Benchmarks for Vietnamese PET/CT Report Generation
Tien Nguyen, Dac Nguyen, Duc Nguyen The Minh et al.
STSBench: A Spatio-temporal Scenario Benchmark for Multi-modal Large Language Models in Autonomous Driving
Christian Fruhwirth-Reisinger, Dušan Malić, Wei Lin et al.
MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness
Yunlong Tang, Pinxin Liu, Mingqian Feng et al.
CSI-Bench: A Large-Scale In-the-Wild Dataset for Multi-task WiFi Sensing
Guozhen Zhu, Yuqian Hu, Weihang Gao et al.
In the Eye of MLLM: Benchmarking Egocentric Video Intent Understanding with Gaze-Guided Prompting
Taiying Peng, Jiacheng Hua, Miao Liu et al.
Identifiability of Deep Polynomial Neural Networks
Konstantin Usevich, Ricardo Borsoi, Clara Dérand et al.
All that structure matches does not glitter
Maya Martirossyan, Thomas Egg, Philipp Höllmer et al.
FlowerTune: A Cross-Domain Benchmark for Federated Fine-Tuning of Large Language Models
Yan Gao, Massimo R. Scamarcia, Javier Fernandez-Marques et al.
C-SEO Bench: Does Conversational SEO Work?
Haritz Puerto, Martin Gubri, Tommaso Green et al.
3EED: Ground Everything Everywhere in 3D
Rong Li, Yuhao Dong, Tianshuai Hu et al.
EconGym: A Scalable AI Testbed with Diverse Economic Tasks
Qirui Mi, Qipeng Yang, Zijun Fan et al.
Dynamic Risk Assessments for Offensive Cybersecurity Agents
Boyi Wei, Benedikt Stroebl, Jiacen Xu et al.