Most Cited 2025 "ventral stream selectivity" Papers
The Complexity of Symmetric Equilibria in Min-Max Optimization and Team Zero-Sum Games
Ioannis Anagnostides, Ioannis Panageas, Tuomas Sandholm et al.
Set Smoothness Unlocks Clarke Hyper-stationarity in Bilevel Optimization
He Chen, Jiajin Li, Anthony Man-Cho So
Beyond Spatial Frequency: Pixel-wise Temporal Frequency-based Deepfake Video Detection
Taehoon Kim, Jongwook Choi, Yonghyun Jeong et al.
MPMAvatar: Learning 3D Gaussian Avatars with Accurate and Robust Physics-Based Dynamics
Changmin Lee, Jihyun Lee, Tae-Kyun Kim
Beyond Scores: Proximal Diffusion Models
Zhenghan Fang, Mateo Diaz, Sam Buchanan et al.
Born a Transformer -- Always a Transformer? On the Effect of Pretraining on Architectural Abilities
Mayank Jobanputra, Yana Veitsman, Yash Sarrof et al.
VA-MoE: Variables-Adaptive Mixture of Experts for Incremental Weather Forecasting
Hao Chen, Tao Han, Song Guo et al.
Deep RL Needs Deep Behavior Analysis: Exploring Implicit Planning by Model-Free Agents in Open-Ended Environments
Riley Simmons-Edler, Ryan Badman, Felix Berg et al.
ImageSentinel: Protecting Visual Datasets from Unauthorized Retrieval-Augmented Image Generation
Ziyuan Luo, Yangyi Zhao, Ka Chun Cheung et al.
Foresight in Motion: Reinforcing Trajectory Prediction with Reward Heuristics
Muleilan Pei, Shaoshuai Shi, Xuesong Chen et al.
MF-LLM: Simulating Population Decision Dynamics via a Mean-Field Large Language Model Framework
Qirui Mi, Mengyue Yang, Xiangning Yu et al.
Hierarchical Material Recognition from Local Appearance
Matthew Beveridge, Shree Nayar
Information-Theoretic Reward Decomposition for Generalizable RLHF
Liyuan Mao, Haoran Xu, Amy Zhang et al.
Brain network science modelling of sparse neural networks enables Transformers and LLMs to perform as fully connected
Yingtao Zhang, Diego Cerretti, Jialin Zhao et al.
Revisiting Image Fusion for Multi-Illuminant White-Balance Correction
David Serrano, Aditya Arora, Luis Herranz et al.
MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion
Onkar Susladkar, Jishu Sen Gupta, Chirag Sehgal et al.
Uncertainty-Aware Gradient Stabilization for Small Object Detection
Huixin Sun, Yanjing Li, Linlin Yang et al.
PDEfuncta: Spectrally-Aware Neural Representation for PDE Solution Modeling
Minju Jo, Woojin Cho, Uvini Balasuriya Mudiyanselage et al.
Learning to Instruct for Visual Instruction Tuning
Zhihan Zhou, Feng Hong, Jiaan Luo et al.
GeneMAN: Generalizable Single-Image 3D Human Reconstruction from Multi-Source Human Data
Wentao Wang, Hang Ye, Fangzhou Hong et al.
One-shot 3D Object Canonicalization based on Geometric and Semantic Consistency
Li Jin, Yujie Wang, Wenzheng Chen et al.
F-Bench: Rethinking Human Preference Evaluation Metrics for Benchmarking Face Generation, Customization, and Restoration
Lu Liu, Huiyu Duan, Qiang Hu et al.
TailedCore: Few-Shot Sampling for Unsupervised Long-Tail Noisy Anomaly Detection
Yoon Gyo Jung, Jaewoo Park, Jaeho Yoon et al.
Can Agent Fix Agent Issues?
Alfin Wijaya Rahardja, Junwei Liu, Weitong Chen et al.
Neural Spacetimes for DAG Representation Learning
Haitz Sáez de Ocáriz Borde, Anastasis Kratsios, Marc T Law et al.
Representational Similarity via Interpretable Visual Concepts
Neehar Kondapaneni, Oisin Mac Aodha, Pietro Perona
Towards Better & Faster Autoregressive Image Generation: From the Perspective of Entropy
Xiaoxiao Ma, Feng Zhao, Pengyang Ling et al.
TaxaDiffusion: Progressively Trained Diffusion Model for Fine-Grained Species Generation
Amin Karimi Monsefi, Mridul Khurana, Rajiv Ramnath et al.
Topology-Aware Conformal Prediction for Stream Networks
Jifan Zhang, Fangxin Wang, Zihe Song et al.
Delta Attention: Fast and Accurate Sparse Attention Inference by Delta Correction
Jeffrey Willette, Heejun Lee, Sung Ju Hwang
Bias in Gender Bias Benchmarks: How Spurious Features Distort Evaluation
Yusuke Hirota, Ryo Hachiuma, Boyi Li et al.
AGO: Adaptive Grounding for Open World 3D Occupancy Prediction
Peizheng Li, Shuxiao Ding, You Zhou et al.
CoralVQA: A Large-Scale Visual Question Answering Dataset for Coral Reef Image Understanding
Hongyong Han, Wei Wang, Gaowei Zhang et al.
Local-Global Associative Frames for Symmetry-Preserving Crystal Structure Modeling
Haowei Hua, Wanyu Lin
Conformal Prediction Beyond the Seen: A Missing Mass Perspective for Uncertainty Quantification in Generative Models
Sima Noorani, Shayan Kiyani, George J. Pappas et al.
Register and [CLS] tokens induce a decoupling of local and global features in large ViTs
Alexander Lappe, Martin Giese
Learning to Solve Differential Equation Constrained Optimization Problems
Vincenzo Di Vito Francesco, Mostafa Mohammadian, Kyri Baker et al.
DAMamba: Vision State Space Model with Dynamic Adaptive Scan
Tanzhe Li, Caoshuo Li, Jiayi Lyu et al.
Vision-and-Language Training Helps Deploy Taxonomic Knowledge but Does Not Fundamentally Alter It
Yulu Qin, Dheeraj Varghese, Adam Dahlgren Lindström et al.
Traversal Verification for Speculative Tree Decoding
Yepeng Weng, Qiao Hu, Xujie Chen et al.
Flat Channels to Infinity in Neural Loss Landscapes
Flavio Martinelli, Alexander van Meegen, Berfin Simsek et al.
PGC: Physics-Based Gaussian Cloth from a Single Pose
Michelle Guo, Matt Jen-Yuan Chiang, Igor Santesteban et al.
Optimal Control for Transformer Architectures: Enhancing Generalization, Robustness and Efficiency
Kelvin Kan, Xingjian Li, Benjamin Zhang et al.
CATransformers: Carbon Aware Transformers Through Joint Model-Hardware Optimization
Irene Wang, Mostafa Elhoushi, H Ekin Sumbul et al.
Anytime-valid, Bayes-assisted, Prediction-Powered Inference
Valentin Kilian, Stefano Cortinovis, Francois Caron
PARCO: Parallel AutoRegressive Models for Multi-Agent Combinatorial Optimization
Federico Berto, Chuanbo Hua, Laurin Luttmann et al.
Your Pre-trained LLM is Secretly an Unsupervised Confidence Calibrator
Beier Luo, Shuoyuan Wang, Sharon Li et al.
PMA: Towards Parameter-Efficient Point Cloud Understanding via Point Mamba Adapter
Yaohua Zha, Yanzi Wang, Hang Guo et al.
Global Convergence for Average Reward Constrained MDPs with Primal-Dual Actor Critic Algorithm
Yang Xu, Swetha Ganesh, Washim Mondal et al.
GEOPARD: Geometric Pretraining for Articulation Prediction in 3D Shapes
Pradyumn Goyal, Dmitrii Petrov, Sheldon Andrews et al.
CVFusion: Cross-View Fusion of 4D Radar and Camera for 3D Object Detection
Hanzhi Zhong, Zhiyu Xiang, Ruoyu Xu et al.
NeuraLeaf: Neural Parametric Leaf Models with Shape and Deformation Disentanglement
Yang Yang, Dongni Mao, Hiroaki Santo et al.
Personalized Safety in LLMs: A Benchmark and A Planning-Based Agent Approach
Yuchen Wu, Edward Sun, Kaijie Zhu et al.
LocalDyGS: Multi-view Global Dynamic Scene Modeling via Adaptive Local Implicit Feature Decoupling
Jiahao Wu, Rui Peng, Jianbo Jiao et al.
Unifying Text Semantics and Graph Structures for Temporal Text-attributed Graphs with Large Language Models
Siwei Zhang, Yun Xiong, Yateng Tang et al.
TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation
Zonglin Lyu, Chen Chen
seq-JEPA: Autoregressive Predictive Learning of Invariant-Equivariant World Models
Hafez Ghaemi, Eilif B. Muller, Shahab Bakhtiari
Teaching VLMs to Localize Specific Objects from In-context Examples
Sivan Doveh, Nimrod Shabtay, Eli Schwartz et al.
Details Matter for Indoor Open-vocabulary 3D Instance Segmentation
Sanghun Jung, Jingjing Zheng, Ke Zhang et al.
CReFT-CAD: Boosting Orthographic Projection Reasoning for CAD via Reinforcement Fine-Tuning
Ke Niu, Zhuofan Chen, Haiyang Yu et al.
Reanimating Images using Neural Representations of Dynamic Stimuli
Jacob Yeung, Andrew Luo, Gabriel Sarch et al.
ATA: Adaptive Transformation Agent for Text-Guided Subject-Position Variable Background Inpainting
Yizhe Tang, Zhimin Sun, Yuzhen Du et al.
MoESD: Unveil Speculative Decoding's Potential for Accelerating Sparse MoE
Zongle Huang, Lei Zhu, ZongYuan Zhan et al.
Adaptive Stochastic Coefficients for Accelerating Diffusion Sampling
Ruoyu Wang, Beier Zhu, Junzhi Li et al.
Constraint-Aware Feature Learning for Parametric Point Cloud
Xi Cheng, Ruiqi Lei, Di Huang et al.
Differentiation Through Black-Box Quadratic Programming Solvers
Connor Magoon, Fengyu Yang, Noam Aigerman et al.
$\boldsymbol{\lambda}$-Orthogonality Regularization for Compatible Representation Learning
Simone Ricci, Niccolò Biondi, Federico Pernici et al.
GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models
Jonathan Roberts, Kai Han, Samuel Albanie
Next Semantic Scale Prediction via Hierarchical Diffusion Language Models
Cai Zhou, Chenyu Wang, Dinghuai Zhang et al.
Guiding LLM Decision-Making with Fairness Reward Models
Zara Hall, Melanie Subbiah, Thomas Zollo et al.
VIGFace: Virtual Identity Generation for Privacy-Free Face Recognition Dataset
Minsoo Kim, Min-Cheol Sagong, Gi Pyo Nam et al.
RapVerse: Coherent Vocals and Whole-Body Motion Generation from Text
Jiaben Chen, Xin Yan, Yihang Chen et al.
MetaSlot: Break Through the Fixed Number of Slots in Object-Centric Learning
Hongjia Liu, Rongzhen Zhao, Haohan Chen et al.
Unleashing High-Quality Image Generation in Diffusion Sampling Using Second-Order Levenberg-Marquardt-Langevin
Fangyikang Wang, Hubery Yin, Lei Qian et al.
Dynamic View Synthesis as an Inverse Problem
Hidir Yesiltepe, Pinar Yanardag
Convergent Functions, Divergent Forms
Hyeonseong Jeon, Ainaz Eftekhar, Aaron Walsman et al.
Learning to Better Search with Language Models via Guided Reinforced Self-Training
Seungyong Moon, Bumsoo Park, Hyun Oh Song
SIM: Surface-based fMRI Analysis for Inter-Subject Multimodal Decoding from Movie-Watching Experiments
Simon Dahan, Gabriel Bénédict, Logan Williams et al.
HumanSAM: Classifying Human-centric Forgery Videos in Human Spatial, Appearance, and Motion Anomaly
Chang Liu, Yunfan Ye, Fan Zhang et al.
Approximation algorithms for combinatorial optimization with predictions
Antonios Antoniadis, Marek Elias, Adam Polak et al.
Charting the Design Space of Neural Graph Representations for Subgraph Matching
Vaibhav Raj, Indradyumna Roy, Ashwin Ramachandran et al.
FREE-Merging: Fourier Transform for Efficient Model Merging
Shenghe Zheng, Hongzhi Wang
Robust Conformal Prediction with a Single Binary Certificate
Soroush H. Zargarbashi, Aleksandar Bojchevski
Probabilistic Geometric Principal Component Analysis with application to neural data
Han-Lin Hsieh, Maryam Shanechi
OmniVTON: Training-Free Universal Virtual Try-On
Zhaotong Yang, Yuhui Li, Shengfeng He et al.
Sufficient Invariant Learning for Distribution Shift
Taero Kim, Subeen Park, Sungjun Lim et al.
Action abstractions for amortized sampling
Oussama Boussif, Léna Ezzine, Joseph Viviano et al.
Decouple Distortion from Perception: Region Adaptive Diffusion for Extreme-low Bitrate Perception Image Compression
Jinchang Xu, Shaokang Wang, Jintao Chen et al.
Anatomical Consistency and Adaptive Prior-informed Transformation for Multi-contrast MR Image Synthesis via Diffusion Model
Yejee Shin, Yeeun Lee, Hanbyol Jang et al.
Cross-Subject Mind Decoding from Inaccurate Representations
Yangyang Xu, Bangzhen Liu, Wenqi Shao et al.
Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction
Baiting Luo, Ava Pettet, Aron Laszka et al.
Automated Filtering of Human Feedback Data for Aligning Text-to-Image Diffusion Models
Yongjin Yang, Sihyeon Kim, Hojung Jung et al.
RTMap: Real-Time Recursive Mapping with Change Detection and Localization
Yuheng Du, Sheng Yang, Lingxuan Wang et al.
Towards Understanding How Knowledge Evolves in Large Vision-Language Models
Sudong Wang, Yunjian Zhang, Yao Zhu et al.
$\phi$-Update: A Class of Policy Update Methods with Policy Convergence Guarantee
Wenye Li, Jiacai Liu, Ke Wei
Local Dense Logit Relations for Enhanced Knowledge Distillation
Liuchi Xu, Kang Liu, Jinshuai Liu et al.
FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization
Hao Chen, Shell Xu Hu, Wayne Luk et al.
Shading Meets Motion: Self-supervised Indoor 3D Reconstruction Via Simultaneous Shape-from-Shading and Structure-from-Motion
Guoyu Lu
OW-OVD: Unified Open World and Open Vocabulary Object Detection
Xing Xi, Yangyang Huang, Ronghua Luo et al.
ODE: Open-Set Evaluation of Hallucinations in Multimodal Large Language Models
Yahan Tu, Rui Hu, Jitao Sang
Masking meets Supervision: A Strong Learning Alliance
Byeongho Heo, Taekyung Kim, Sangdoo Yun et al.
Are Spatial-Temporal Graph Convolution Networks for Human Action Recognition Over-Parameterized?
Jianyang Xie, Yitian Zhao, Yanda Meng et al.
Conflict-Averse Gradient Aggregation for Constrained Multi-Objective Reinforcement Learning
Dohyeong Kim, Mineui Hong, Jeongho Park et al.
Selective Unlearning via Representation Erasure Using Domain Adversarial Training
Nazanin Sepahvand, Eleni Triantafillou, Hugo Larochelle et al.
BWFormer: Building Wireframe Reconstruction from Airborne LiDAR Point Cloud with Transformer
Yuzhou Liu, Lingjie Zhu, Hanqiao Ye et al.
ORIDa: Object-centric Real-world Image Composition Dataset
Jinwoo Kim, Sangmin Han, Jinho Jeong et al.
Annotation Ambiguity Aware Semi-Supervised Medical Image Segmentation
Suruchi Kumari, Pravendra Singh
End-to-End Multi-Modal Diffusion Mamba
Chunhao Lu, Qiang Lu, Meichen Dong et al.
Faster and Better 3D Splatting via Group Training
Chengbo Wang, Guozheng Ma, Yizhen Lao et al.
FedCALM: Conflict-aware Layer-wise Mitigation for Selective Aggregation in Deeper Personalized Federated Learning
Hao Zheng, Zhigang Hu, Boyu Wang et al.
Rethinking Layered Graphic Design Generation with a Top-Down Approach
Jingye Chen, Zhaowen Wang, Nanxuan Zhao et al.
You Think, You ACT: The New Task of Arbitrary Text to Motion Generation
Runqi Wang, Caoyuan Ma, Guopeng Li et al.
TriTex: Learning Texture from a Single Mesh via Triplane Semantic Features
Dana Cohen-Bar, Daniel Cohen-Or, Gal Chechik et al.
Instant GaussianImage: A Generalizable and Self-Adaptive Image Representation via 2D Gaussian Splatting
Zhaojie Zeng, Yuesong Wang, Chao Yang et al.
PI-HMR: Towards Robust In-bed Temporal Human Shape Reconstruction with Contact Pressure Sensing
Ziyu Wu, Yufan Xiong, Mengting Niu et al.
The Complexity of Two-Team Polymatrix Games with Independent Adversaries
Alexandros Hollender, Gilbert Maystre, Sai Ganesh Nagarajan
Towards Self-Supervised Covariance Estimation in Deep Heteroscedastic Regression
Megh Shukla, Aziz Shameem, Mathieu Salzmann et al.
MMAIF: Multi-task and Multi-degradation All-in-One for Image Fusion with Language Guidance
Zihan Cao, Yu Zhong, Ziqi Wang et al.
Improved Sampling Of Diffusion Models In Fluid Dynamics With Tweedie's Formula
Youssef Shehata, Benjamin Holzschuh, Nils Thuerey
GPAvatar: High-fidelity Head Avatars by Learning Efficient Gaussian Projections
Weiqi Feng, Dong Han, Zekang Zhou et al.
Machine Unlearning via Simulated Oracle Matching
Kristian G Georgiev, Roy Rinberg, Sam Park et al.
TimeExpert: An Expert-Guided Video LLM for Video Temporal Grounding
Zuhao Yang, Yingchen Yu, Yunqing Zhao et al.
Kestrel: 3D Multimodal LLM for Part-Aware Grounded Description
Mahmoud Ahmed, Junjie Fei, Jian Ding et al.
From Search to Sampling: Generative Models for Robust Algorithmic Recourse
Prateek Garg, Lokesh Nagalapatti, Sunita Sarawagi
Mesh Mamba: A Unified State Space Model for Saliency Prediction in Non-Textured and Textured Meshes
Kaiwei Zhang, Dandan Zhu, Xiongkuo Min et al.
Wavelet and Prototype Augmented Query-based Transformer for Pixel-level Surface Defect Detection
Feng Yan, Xiaoheng Jiang, Yang Lu et al.
Alignment, Mining and Fusion: Representation Alignment with Hard Negative Mining and Selective Knowledge Fusion for Medical Visual Question Answering
Yuanhao Zou, Zhaozheng Yin
INTER: Mitigating Hallucination in Large Vision-Language Models by Interaction Guidance Sampling
Xin Dong, Shichao Dong, Jin Wang et al.
ICP: Immediate Compensation Pruning for Mid-to-high Sparsity
Xin Luo, Fu Xueming, Zihang Jiang et al.
ImDy: Human Inverse Dynamics from Imitated Observations
Xinpeng Liu, Junxuan Liang, Zili Lin et al.
VODiff: Controlling Object Visibility Order in Text-to-Image Generation
Dong Liang, Jinyuan Jia, Yuhao Liu et al.
Towards a Universal 3D Medical Multi-modality Generalization via Learning Personalized Invariant Representation
Zhaorui Tan, Xi Yang, Tan Pan et al.
Leveraging SD Map to Augment HD Map-based Trajectory Prediction
Zhiwei Dong, Ran Ding, Wei Li et al.
A Truncated Newton Method for Optimal Transport
Mete Kemertas, Amir-massoud Farahmand, Allan Jepson
DriveScape: High-Resolution Driving Video Generation by Multi-View Feature Fusion
Wei Wu, Xi Guo, Weixuan Tang et al.
BrepGiff: Lightweight Generation of Complex B-rep with 3D GAT Diffusion
Hao Guo, Xiaoshui Huang, Hao Jiacheng et al.
Reproducible Vision-Language Models Meet Concepts Out of Pre-Training
Ziliang Chen, Xin Huang, Xiaoxuan Fan et al.
Spatial-Temporal Aware Visuomotor Diffusion Policy Learning
Zhenyang Liu, Yikai Wang, Kuanning Wang et al.
Less Attention is More: Prompt Transformer for Generalized Category Discovery
Wei Zhang, Baopeng Zhang, Zhu Teng et al.
Sensitivity-Aware Efficient Fine-Tuning via Compact Dynamic-Rank Adaptation
Tianran Chen, Jiarui Chen, Baoquan Zhang et al.
FlowSeek: Optical Flow Made Easier with Depth Foundation Models and Motion Bases
Matteo Poggi, Fabio Tosi
Silence is Golden: Leveraging Adversarial Examples to Nullify Audio Control in LDM-based Talking-Head Generation
Yuan Gan, Jiaxu Miao, Yunze Wang et al.
Null Counterfactual Factor Interactions for Goal-Conditioned Reinforcement Learning
Caleb Chuck, Fan Feng, Carl Qi et al.
An Optimal Discriminator Weighted Imitation Perspective for Reinforcement Learning
Haoran Xu, Shuozhe Li, Harshit Sikchi et al.
Identity-Clothing Similarity Modeling for Unsupervised Clothing Change Person Re-Identification
Zhiqi Pang, Junjie Wang, Lingling Zhao et al.
General Compression Framework for Efficient Transformer Object Tracking
Lingyi Hong, Jinglun Li, Xinyu Zhou et al.
BACON: Improving Clarity of Image Captions via Bag-of-Concept Graphs
Zhantao Yang, Ruili Feng, Keyu Yan et al.
PortLLM: Personalizing Evolving Large Language Models with Training-Free and Portable Model Patches
Rana Muhammad Shahroz Khan, Pingzhi Li, Sukwon Yun et al.
AudCast: Audio-Driven Human Video Generation by Cascaded Diffusion Transformers
Jiazhi Guan, Kaisiyuan Wang, Zhiliang Xu et al.
DUALFormer: Dual Graph Transformer
Zhuo Jiaming, Yuwei Liu, Yintong Lu et al.
RL-RC-DoT: A Block-level RL agent for Task-Aware Video Compression
Uri Gadot, Shie Mannor, Assaf Shocher et al.
Geometric Inductive Biases of Deep Networks: The Role of Data and Architecture
Sajad Movahedi, Antonio Orvieto, Seyed-Mohsen Moosavi-Dezfooli
Recognition-Synergistic Scene Text Editing
Zhengyao Fang, Pengyuan Lyu, Jingjing Wu et al.
Towards Consistent Multi-Task Learning: Unlocking the Potential of Task-Specific Parameters
Xiaohan Qin, Xiaoxing Wang, Junchi Yan
Supervising Sound Localization by In-the-wild Egomotion
Anna Min, Ziyang Chen, Hang Zhao et al.
UrbanCAD: Towards Highly Controllable and Photorealistic 3D Vehicles for Urban Scene Simulation
Yichong Lu, Yichi Cai, Shangzhan Zhang et al.
EC-Flow: Enabling Versatile Robotic Manipulation from Action-Unlabeled Videos via Embodiment-Centric Flow
Yixiang Chen, Peiyan Li, Yan Huang et al.
Adaptive backtracking for faster optimization
Joao V. Cavalcanti, Laurent Lessard, Ashia Wilson
DoF-Gaussian: Controllable Depth-of-Field for 3D Gaussian Splatting
Liao Shen, Tianqi Liu, Huiqiang Sun et al.
SoftMatcha: A Soft and Fast Pattern Matcher for Billion-Scale Corpus Searches
Hiroyuki Deguchi, Go Kamoda, Yusuke Matsushita et al.
Sharper Guarantees for Learning Neural Network Classifiers with Gradient Methods
Hossein Taheri, Christos Thrampoulidis, Arya Mazumdar
DAViD: Data-efficient and Accurate Vision Models from Synthetic Data
Fatemeh Saleh, Sadegh Aliakbarian, Charlie Hewitt et al.
SceneCrafter: Controllable Multi-View Driving Scene Editing
Zehao Zhu, Yuliang Zou, Chiyu “Max” Jiang et al.
A Visual Leap in CLIP Compositionality Reasoning through Generation of Counterfactual Sets
Zexi Jia, Chuanwei Huang, Yeshuang Zhu et al.
I2VGuard: Safeguarding Images against Misuse in Diffusion-based Image-to-Video Models
Dongnan Gui, Xun Guo, Wengang Zhou et al.
GPS as a Control Signal for Image Generation
Chao Feng, Ziyang Chen, Aleksander Holynski et al.
Forgetting Through Transforming: Enabling Federated Unlearning via Class-Aware Representation Transformation
Qi Guo, Zhen Tian, Minghao Yao et al.
Enhanced Visual-Semantic Interaction with Tailored Prompts for Pedestrian Attribute Recognition
Junyi Wu, Yan Huang, Min Gao et al.
Splat-LOAM: Gaussian Splatting LiDAR Odometry and Mapping
Emanuele Giacomini, Luca Di Giammarino, Lorenzo De Rebotti et al.
From Laboratory to Real World: A New Benchmark Towards Privacy-Preserved Visible-Infrared Person Re-Identification
Yan Jiang, Hao Yu, Xu Cheng et al.
Beyond Isolated Words: Diffusion Brush for Handwritten Text-Line Generation
Gang Dai, Yifan Zhang, Yutao Qin et al.
TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition
Yilong Wang, Zilin Gao, Qilong Wang et al.
Learning Regularized Graphon Mean-Field Games with Unknown Graphons
Fengzhuo Zhang, Vincent Tan, Zhaoran Wang et al.
Pixel-aligned RGB-NIR Stereo Imaging and Dataset for Robot Vision
Jinneyong Kim, Seung-Hwan Baek
Effortless Active Labeling for Long-Term Test-Time Adaptation
Guowei Wang, Changxing Ding
FilmComposer: LLM-Driven Music Production for Silent Film Clips
Zhifeng Xie, Qile He, Youjia Zhu et al.
Adversarial Policy Optimization for Offline Preference-based Reinforcement Learning
Hyungkyu Kang, Min-hwan Oh
Joint Self-Supervised Video Alignment and Action Segmentation
Ali Shah Ali, Syed Ahmed Mahmood, Mubin Saeed et al.
Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos
Sagnik Majumder, Tushar Nagarajan, Ziad Al-Halah et al.
Efficient Model-Based Reinforcement Learning Through Optimistic Thompson Sampling
Jasmine Bayrooti, Carl Ek, Amanda Prorok
MOVE: Motion-Guided Few-Shot Video Object Segmentation
Kaining Ying, Hengrui Hu, Henghui Ding
ETVA: Evaluation of Text-to-Video Alignment via Fine-grained Question Generation and Answering
Kaisi Guan, Zhengfeng Lai, Yuchong Sun et al.
Making Old Film Great Again: Degradation-aware State Space Model for Old Film Restoration
Yudong Mao, Hao Luo, Zhiwei Zhong et al.
Decouple-Then-Merge: Finetune Diffusion Models as Multi-Task Learning
Qianli Ma, Xuefei Ning, Dongrui Liu et al.
VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
Kevin Qinghong Lin, Mike Zheng Shou
Understanding the Stability-based Generalization of Personalized Federated Learning
Yingqi Liu, Qinglun Li, Jie Tan et al.
ReferEverything: Towards Segmenting Everything We Can Speak of in Videos
Anurag Bagchi, Zhipeng Bao, Yu-Xiong Wang et al.
RoboTron-Sim: Improving Real-World Driving via Simulated Hard-Case
Baihui Xiao, Chengjian Feng, Zhijian Huang et al.
Empowering Large Language Models with 3D Situation Awareness
Zhihao Yuan, Yibo Peng, Jinke Ren et al.
Pioneering 4-Bit FP Quantization for Diffusion Models: Mixup-Sign Quantization and Timestep-Aware Fine-Tuning
Maosen Zhao, Pengtao Chen, Chong Yu et al.
Unveiling the Invisible: Reasoning Complex Occlusions Amodally with AURA
Zhixuan Li, Hyunse Yoon, Sanghoon Lee et al.
Causal Graph Transformer for Treatment Effect Estimation Under Unknown Interference
Anpeng Wu, Haiyi Qiu, Zhengming Chen et al.
No Need to Talk: Asynchronous Mixture of Language Models
Anastasiia Filippova, Angelos Katharopoulos, David Grangier et al.
VAGUE: Visual Contexts Clarify Ambiguous Expressions
Heejeong Nam, Jinwoo Ahn, Keummin Ka et al.
3D-AVS: LiDAR-based 3D Auto-Vocabulary Segmentation
Weijie Wei, Osman Ülger, Fatemeh Karimi Nejadasl et al.
DOF-GS: Adjustable Depth-of-Field 3D Gaussian Splatting for Post-Capture Refocusing, Defocus Rendering and Blur Removal
Yujie Wang, Praneeth Chakravarthula, Baoquan Chen
ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering
Duong T. Tran, Trung-Kien Tran, Manfred Hauswirth et al.
Grid Cell-Inspired Fragmentation and Recall for Efficient Map Building
Jaedong Hwang, Zhang-Wei Hong, Eric Chen et al.
Can't Slow Me Down: Learning Robust and Hardware-Adaptive Object Detectors against Latency Attacks for Edge Devices
Tianyi Wang, Zichen Wang, Cong Wang et al.
econSG: Efficient and Multi-view Consistent Open-Vocabulary 3D Semantic Gaussians
Can Zhang, Gim H Lee