Most Cited 2025 Poster Papers
22,274 papers found • Page 24 of 112
Conference
Incorporating Visual Correspondence into Diffusion Model for Virtual Try-On
Siqi Wan, Jingwen Chen, Yingwei Pan et al.
Hierarchically-Structured Open-Vocabulary Indoor Scene Synthesis with Pre-trained Large Language Model
Weilin Sun, Xinran Li, Manyi Li et al.
BUFFER-X: Towards Zero-Shot Point Cloud Registration in Diverse Scenes
Minkyun Seo, Hyungtae Lim, Kanghee Lee et al.
SAVVY: Spatial Awareness via Audio-Visual LLMs through Seeing and Hearing
Mingfei Chen, Zijun Cui, Xiulong Liu et al.
From Bytes to Ideas: Language Modeling with Autoregressive U-Nets
Mathurin VIDEAU, Badr Youbi Idrissi, Alessandro Leite et al.
TGBFormer: Transformer-GraphFormer Blender Network for Video Object Detection
Qiang Qi, Xiao Wang
Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign Language Translation
Edward Fish, Richard Bowden
FlexEvent: Towards Flexible Event-Frame Object Detection at Varying Operational Frequencies
Dongyue Lu, Lingdong Kong, Gim Hee Lee et al.
Thinker: Learning to Think Fast and Slow
Stephen Chung, Wenyu Du, Jie Fu
VeriThoughts: Enabling Automated Verilog Code Generation using Reasoning and Formal Verification
Patrick Yubeaton, Andre Nakkab, Weihua Xiao et al.
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
Zhengyao Lyu, Tianlin Pan, Chenyang Si et al.
EvdCLIP: Improving Vision-Language Retrieval with Entity Visual Descriptions from Large Language Models
GuangHao Meng, Sunan He, Jinpeng Wang et al.
Towards Homogeneous Lexical Tone Decoding from Heterogeneous Intracranial Recordings
Di Wu, Siyuan Li, Chen Feng et al.
WorldWeaver: Generating Long-Horizon Video Worlds via Rich Perception
Zhiheng Liu, Xueqing Deng, Shoufa Chen et al.
Learning-Guided Rolling Horizon Optimization for Long-Horizon Flexible Job-Shop Scheduling
Sirui Li, Wenbin Ouyang, Yining Ma et al.
CLIP-PCQA: Exploring Subjective-Aligned Vision-Language Modeling for Point Cloud Quality Assessment
Yating Liu, Yujie Zhang, Ziyu Shan et al.
FaceMe: Robust Blind Face Restoration with Personal Identification
Siyu Liu, Zheng-Peng Duan, Jia OuYang et al.
Purifying Shampoo: Investigating Shampoo's Heuristics by Decomposing its Preconditioner
Runa Eschenhagen, Aaron Defazio, Tsung-Hsien Lee et al.
Poison as Cure: Visual Noise for Mitigating Object Hallucinations in LVMs
Kejia Zhang, Keda TAO, Jiasheng Tang et al.
Multi-head Transformers Provably Learn Symbolic Multi-step Reasoning via Gradient Descent
Tong Yang, Yu Huang, Yingbin Liang et al.
Chiron-o1: Igniting Multimodal Large Language Models towards Generalizable Medical Reasoning via Mentor-Intern Collaborative Search
Haoran Sun, Yankai Jiang, Wenjie Lou et al.
OnlineAnySeg: Online Zero-Shot 3D Segmentation by Visual Foundation Model Guided 2D Mask Merging
Yijie Tang, Jiazhao Zhang, Yuqing Lan et al.
MeshCoder: LLM-Powered Structured Mesh Code Generation from Point Clouds
Bingquan Dai, Luo Li, Qihong Tang et al.
Learning Orthogonal Multi-Index Models: A Fine-Grained Information Exponent Analysis
Yunwei Ren, Jason Lee
Revisiting End-to-End Learning with Slide-level Supervision in Computational Pathology
Wenhao Tang, Rong Qin, Heng Fang et al.
Multi-Modal Grounded Planning and Efficient Replanning for Learning Embodied Agents with a Few Examples
Taewoong Kim, Byeonghwi Kim, Jonghyun Choi
Interpretable Generative Models through Post-hoc Concept Bottlenecks
Akshay R. Kulkarni, Ge Yan, Chung-En Sun et al.
Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input
Jian Wang, Rishabh Dabral, Diogo Luvizon et al.
PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation
Liyao Jiang, Negar Hassanpour, Mohammad Salameh et al.
LASeR: Learning to Adaptively Select Reward Models with Multi-Arm Bandits
Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin et al.
Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection
Gensheng Pei, Tao Chen, Yujia Wang et al.
Predicting Empirical AI Research Outcomes with Language Models
Jiaxin Wen, Chenglei Si, Yueh-Han Chen et al.
EgoBridge: Domain Adaptation for Generalizable Imitation from Egocentric Human Data
Ryan Punamiya, Dhruv Patel, Patcharapong Aphiwetsa et al.
Dynam3D: Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation
Zihan Wang, Seungjun Lee, Gim Hee Lee
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Chenxin Tao, Shiqian Su, Xizhou Zhu et al.
VProChart: Answering Chart Question Through Visual Perception Alignment Agent and Programmatic Solution Reasoning
Muye Huang, Lingling Zhang, Han Lai et al.
Learning from Neighbors: Category Extrapolation for Long-Tail Learning
Shizhen Zhao, Xin Wen, Jiahui Liu et al.
Bringing CLIP to the Clinic: Dynamic Soft Labels and Negation-Aware Learning for Medical Analysis
Hanbin Ko, Chang Min Park
On scalable and efficient training of diffusion samplers
Minkyu Kim, Kiyoung Seong, Dongyeop Woo et al.
LuxDiT: Lighting Estimation with Video Diffusion Transformer
Ruofan Liang, Kai He, Zan Gojcic et al.
Critical Batch Size Revisited: A Simple Empirical Approach to Large-Batch Language Model Training
Will Merrill, Shane Arora, Dirk Groeneveld et al.
Compliant Residual DAgger: Improving Real-World Contact-Rich Manipulation with Human Corrections
Xiaomeng Xu, Yifan Hou, Zeyi Liu et al.
A Unified Solution to Video Fusion: From Multi-Frame Learning to Benchmarking
Zixiang Zhao, Haowen Bai, Bingxin Ke et al.
Multi-Modal View Enhanced Large Vision Models for Long-Term Time Series Forecasting
ChengAo Shen, Wenchao Yu, Ziming Zhao et al.
GarmentPile: Point-Level Visual Affordance Guided Retrieval and Adaptation for Cluttered Garments Manipulation
Ruihai Wu, Ziyu Zhu, Yuran Wang et al.
EchoONE: Segmenting Multiple Echocardiography Planes in One Model
Jiongtong Hu, Wei Zhuo, Jun Cheng et al.
When Are Concepts Erased From Diffusion Models?
Kevin Lu, Nicky Kriplani, Rohit Gandikota et al.
Improved Balanced Classification with Theoretically Grounded Loss Functions
Corinna Cortes, Mehryar Mohri, Yutao Zhong
Multi-Resolution Pathology-Language Pre-training Model with Text-Guided Visual Representation
Shahad Albastaki, Anabia Sohail, IYYAKUTTI IYAPPAN GANAPATHI et al.
Learning Graph Invariance by Harnessing Spuriosity
Tianjun Yao, Yongqiang Chen, Kai Hu et al.
FADA: Fast Diffusion Avatar Synthesis with Mixed-Supervised Multi-CFG Distillation
Tianyun Zhong, Chao Liang, Jianwen Jiang et al.
PFDiff: Training-Free Acceleration of Diffusion Models Combining Past and Future Scores
Guangyi Wang, Yuren Cai, lijiang Li et al.
Language Models Can Predict Their Own Behavior
Dhananjay Ashok, Jonathan May
One2Any: One-Reference 6D Pose Estimation for Any Object
Mengya Liu, Siyuan Li, Ajad Chhatkuli et al.
D^3-Human: Dynamic Disentangled Digital Human from Monocular Video
Honghu Chen, Bo Peng, Yunfan Tao et al.
Estimating Model Performance Under Covariate Shift Without Labels
Jakub Białek, Juhani Kivimäki, Wojciech Kuberski et al.
Towards Understanding Text Hallucination of Diffusion Models via Local Generation Bias
Rui Lu, Runzhe Wang, Kaifeng Lyu et al.
ViPOcc: Leveraging Visual Priors from Vision Foundation Models for Single-View 3D Occupancy Prediction
Yi Feng, Yu Han, Xijing Zhang et al.
Beyond FVD: An Enhanced Evaluation Metrics for Video Generation Distribution Quality
Ge Ya Luo, Gian M Favero, Zhi Hao Luo et al.
Detect Any Mirrors: Boosting Learning Reliability on Large-Scale Unlabeled Data with an Iterative Data Engine
Zhaohu Xing, Lihao Liu, Yijun Yang et al.
QMP: Q-switch Mixture of Policies for Multi-Task Behavior Sharing
Grace Zhang, Ayush Jain, Injune Hwang et al.
PoseLLaVA: Pose Centric Multimodal LLM for Fine-Grained 3D Pose Manipulation
Dong Feng, Ping Guo, Encheng Peng et al.
Learning Interpretable Queries for Explainable Image Classification with Information Pursuit
Stefan Kolek, Aditya Chattopadhyay, Kwan Ho Ryan Chan et al.
Zero-shot 3D Question Answering via Voxel-based Dynamic Token Compression
Hsiang-Wei Huang, Fu-Chen Chen, Wenhao Chai et al.
BiggerGait: Unlocking Gait Recognition with Layer-wise Representations from Large Vision Models
Dingqiang Ye, Chao Fan, Zhanbo Huang et al.
Towards RAW Object Detection in Diverse Conditions
Zhong-Yu Li, Xin Jin, Bo-Yuan Sun et al.
Why LVLMs Are More Prone to Hallucinations in Longer Responses: The Role of Context
Ge Zheng, Jiaye Qian, Jiajin Tang et al.
Training-free Dense-Aligned Diffusion Guidance for Modular Conditional Image Synthesis
Zixuan Wang, DUO PENG, Feng Chen et al.
Learning Spatial-Semantic Features for Robust Video Object Segmentation
Xin Li, Deshui Miao, Zhenyu He et al.
Unlocking Point Processes through Point Set Diffusion
David Lüdke, Enric Rabasseda Raventós, Marcel Kollovieh et al.
Denoising with a Joint-Embedding Predictive Architecture
Chen Dengsheng, Jie Hu, Xiaoming Wei et al.
FreeGave: 3D Physics Learning from Dynamic Videos by Gaussian Velocity
Jinxi Li, Ziyang Song, Siyuan Zhou et al.
A Polarization-Aided Transformer for Image Deblurring via Motion Vector Decomposition
Duosheng Chen, Shihao Zhou, Jinshan Pan et al.
EvHDR-NeRF: Building High Dynamic Range Radiance Fields with Single Exposure Images and Events
Zehao Chen, Zhanfeng Liao, De Ma et al.
Learning Diffusion Models with Flexible Representation Guidance
Chenyu Wang, Cai Zhou, Sharut Gupta et al.
AoP-SAM: Automation of Prompts for Efficient Segmentation
Yi Chen, Muyoung Son, Chuanbo Hua et al.
The Fluorescent Veil: A Stealthy and Effective Physical Adversarial Patch Against Traffic Sign Recognition
Shuai Yuan, Xingshuo Han, Hongwei Li et al.
Uncertainty Quantification with the Empirical Neural Tangent Kernel
Joseph Wilson, Chris van der Heide, Liam Hodgkinson et al.
Attribution Analysis Meets Model Editing: Advancing Knowledge Correction in Vision Language Models with VisEdit
Qizhou Chen, Taolin Zhang, Chengyu Wang et al.
Efficient Quadratic Corrections for Frank-Wolfe Algorithms
Jannis Halbey, Seta Rakotomandimby, Mathieu Besançon et al.
DrVD-Bench: Do Vision-Language Models Reason Like Human Doctors in Medical Image Diagnosis?
Tianhong Zhou, xu yin, Yingtao Zhu et al.
LightsOut: Diffusion-based Outpainting for Enhanced Lens Flare Removal
Shr-Ruei Tsai, Wei-Cheng Chang, Jie-Ying Lee et al.
Progressive Compression with Universally Quantized Diffusion Models
Yibo Yang, Justus Will, Stephan Mandt
BigDocs: An Open Dataset for Training Multimodal Models on Document and Code Tasks
Juan A. Rodriguez, Xiangru Jian, Siba Smarak Panigrahi et al.
OCRT: Boosting Foundation Models in the Open World with Object-Concept-Relation Triad
Luyao Tang, Chaoqi Chen, Yuxuan Yuan et al.
MINGLE: Mixture of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging
Zihuan Qiu, Yi Xu, Chiyuan He et al.
InfinityStar: Unified Spacetime AutoRegressive Modeling for Visual Generation
Jinlai Liu, Jian Han, Bin Yan et al.
Context Clues: Evaluating Long Context Models for Clinical Prediction Tasks on EHR Data
Michael Wornow, Suhana Bedi, Miguel Angel Fuentes Hernandez et al.
Constructing Confidence Intervals for Average Treatment Effects from Multiple Datasets
Yuxin Wang, Maresa Schröder, Dennis Frauen et al.
GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding
Rui Hu, Yuxuan Zhang, Lianghui Zhu et al.
ZAPBench: A Benchmark for Whole-Brain Activity Prediction in Zebrafish
Jan-Matthis Lueckmann, Alexander Immer, Alex Chen et al.
Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing
Zhedong Zhang, Liang Li, Chenggang Yan et al.
Ranked Entropy Minimization for Continual Test-Time Adaptation
Jisu Han, Jaemin Na, Wonjun Hwang
Lawma: The Power of Specialization for Legal Annotation
Ricardo Dominguez-Olmedo, Vedant Nanda, Rediet Abebe et al.
In-Context Learning and Occam's Razor
Eric Elmoznino, Tom Marty, Tejas Kasetty et al.
Curly Flow Matching for Learning Non-gradient Field Dynamics
Katarina Petrović, Lazar Atanackovic, Viggo Moro et al.
Contextual Online Decision Making with Infinite-Dimensional Functional Regression
Haichen Hu, Rui Ai, Stephen Bates et al.
ReNeg: Learning Negative Embedding with Reward Guidance
Xiaomin Li, yixuan liu, Takashi Isobe et al.
DipLLM: Fine-Tuning LLM for Strategic Decision-making in Diplomacy
Kaixuan Xu, Jiajun Chai, Sicheng Li et al.
Automatically Identify and Rectify: Robust Deep Contrastive Multi-view Clustering in Noisy Scenarios
xihong yang, Siwei Wang, Fangdi Wang et al.
Free360: Layered Gaussian Splatting for Unbounded 360-Degree View Synthesis from Extremely Sparse and Unposed Views
Chong Bao, Xiyu Zhang, Zehao Yu et al.
GTG: Generalizable Trajectory Generation Model for Urban Mobility
Jingyuan Wang, Yujing Lin, Yudong Li
Understanding LLM Behaviors via Compression: Data Generation, Knowledge Acquisition and Scaling Laws
Zhixuan Pan, Shaowen Wang, Liao Pengfei et al.
Constrained Belief Updates Explain Geometric Structures in Transformer Representations
Mateusz Piotrowski, Paul Riechers, Daniel Filan et al.
Time-IMM: A Dataset and Benchmark for Irregular Multimodal Multivariate Time Series
Ching Chang, Jeehyun Hwang, Yidan Shi et al.
Autonomous Goal Detection and Cessation in Reinforcement Learning: A Case Study on Source Term Estimation
Yiwei Shi, Muning Wen, Qi Zhang et al.
Efficient Multi-agent Offline Coordination via Diffusion-based Trajectory Stitching
Lei Yuan, Yuqi Bian, Lihe Li et al.
Semantic and Expressive Variations in Image Captions Across Languages
Andre Ye, Sebastin Santy, Jena D. Hwang et al.
TFCustom: Customized Image Generation with Time-Aware Frequency Feature Guidance
Mushui Liu, Dong She, Qihan Huang et al.
EgoExOR: An Ego-Exo-Centric Operating Room Dataset for Surgical Activity Understanding
Ege Özsoy, Arda Mamur, Felix Tristram et al.
Graph-Based Cross-Domain Knowledge Distillation for Cross-Dataset Text-to-Image Person Retrieval
Bingjun Luo, Jinpeng Wang, Zewen Wang et al.
Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation
Xin Yan, Yuxuan Cai, Qiuyue Wang et al.
GARLIC: GPT-Augmented Reinforcement Learning with Intelligent Control for Vehicle Dispatching
Xiao Han, Zijian Zhang, Xiangyu Zhao et al.
CoPRA: Bridging Cross-domain Pretrained Sequence Models with Complex Structures for Protein-RNA Binding Affinity Prediction
Rong Han, Xiaohong Liu, Tong Pan et al.
Risk-Controlling Model Selection via Guided Bayesian Optimization
Adam Fisch, Regina Barzilay, Bracha Laufer-Goldshtein et al.
Robust Federated Finetuning of LLMs via Alternating Optimization of LoRA
Shuangyi Chen, Yuanxin Guo, Yue Ju et al.
IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments
Can Zhang, Gim Hee Lee
mmFAS: Multimodal Face Anti-Spoofing Using Multi-Level Alignment and Switch-Attention Fusion
Geng Chen, Wuyuan Xie, Di Lin et al.
KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models
Fan Wang, Juyong Jiang, Chansung Park et al.
SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts
Gengze Zhou, Yicong Hong, Zun Wang et al.
LUCAS: Layered Universal Codec Avatars
Di Liu, Teng Deng, Giljoo Nam et al.
Faster Inference of Flow-Based Generative Models via Improved Data-Noise Coupling
Aram Davtyan, Leello Dadi, Volkan Cevher et al.
KAC: Kolmogorov-Arnold Classifier for Continual Learning
Yusong Hu, Zichen Liang, Fei Yang et al.
DiET-GS: Diffusion Prior and Event Stream-Assisted Motion Deblurring 3D Gaussian Splatting
Seungjun Lee, Gim Hee Lee
High-Precision Dichotomous Image Segmentation via Probing Diffusion Capacity
Qian Yu, Peng-Tao Jiang, Hao Zhang et al.
Anti-Exposure Bias in Diffusion Models
Junyu Zhang, Daochang Liu, Eunbyung Park et al.
Φ-GAN:Physics-Inspired GAN for Generating SAR Images Under Limited Data
Xidan Zhang, Yihan Zhuang, Qian Guo et al.
Modeling Thousands of Human Annotators for Generalizable Text-to-Image Person Re-identification
Jiayu Jiang, Changxing Ding, Wentao Tan et al.
Spotting the Unexpected (STU): A 3D LiDAR Dataset for Anomaly Segmentation in Autonomous Driving
Alexey Nekrasov, Malcolm Burdorf, Stewart Worrall et al.
Spiral: Semantic-Aware Progressive LiDAR Scene Generation and Understanding
Dekai Zhu, Yixuan Hu, Youquan Liu et al.
Interpretable Image Classification via Non-parametric Part Prototype Learning
Zhijie Zhu, Lei Fan, Maurice Pagnucco et al.
Learning Normal Flow Directly From Events
Dehao Yuan, Levi Burner, Jiayi Wu et al.
Language Agents Meet Causality -- Bridging LLMs and Causal World Models
John Gkountouras, Matthias Lindemann, Phillip Lippe et al.
V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents
Zhengrong Yue, Shaobin Zhuang, Kunchang Li et al.
AdaFisher: Adaptive Second Order Optimization via Fisher Information
Damien GOMES, Yanlei Zhang, Eugene Belilovsky et al.
Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation
Kaining Ying, Henghui Ding, Guangquan Jie et al.
Variational Search Distributions
Dan Steinberg, Rafael Oliveira, Cheng Soon Ong et al.
When Selection Meets Intervention: Additional Complexities in Causal Discovery
Haoyue Dai, Ignavier Ng, Jianle Sun et al.
Towards Improving Exploration through Sibling Augmented GFlowNets
Kanika Madan, Alex Lamb, Emmanuel Bengio et al.
Lifelong Knowledge Editing for Vision Language Models with Low-Rank Mixture-of-Experts
Qizhou Chen, Chengyu Wang, Dakan Wang et al.
Topological Schrödinger Bridge Matching
Maosheng Yang
Unleashing the Potential of Vision-Language Pre-Training for 3D Zero-Shot Lesion Segmentation via Mask-Attribute Alignment
Yankai Jiang, Wenhui Lei, Xiaofan Zhang et al.
DualPM: Dual Posed-Canonical Point Maps for 3D Shape and Pose Reconstruction
Ben Kaye, Tomas Jakab, Shangzhe Wu et al.
AtomSurf: Surface Representation for Learning on Protein Structures
Vincent Mallet, Yangyang Miao, Souhaib Attaiki et al.
Contrastive Test-Time Composition of Multiple LoRA Models for Image Generation
Tuna Meral, Enis Simsar, Federico Tombari et al.
FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model
Yukang Cao, Chenyang Si, Jinghao Wang et al.
A Solvable Attention for Neural Scaling Laws
Bochen Lyu, Di Wang, Zhanxing Zhu
Multi-turn Consistent Image Editing
Zijun Zhou, Yingying Deng, Xiangyu He et al.
Shape it Up! Restoring LLM Safety during Finetuning
ShengYun Peng, Pin-Yu Chen, Jianfeng Chi et al.
PRIMAL: Physically Reactive and Interactive Motor Model for Avatar Learning
Yan Zhang, Yao Feng, Alpár Cseke et al.
Importance Corrected Neural JKO Sampling
Johannes Hertrich, Robert Gruhlke
A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement
Hui Yuan, Yifan Zeng, Yue Wu et al.
GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation
Ning Gao, Yilun Chen, Shuai Yang et al.
ELITE: Enhanced Language-Image Toxicity Evaluation for Safety
Wonjun Lee, Doehyeon Lee, Eugene Choi et al.
What should a neuron aim for? Designing local objective functions based on information theory
Andreas C. Schneider, Valentin Neuhaus, David Ehrlich et al.
Scaling Laws for Floating–Point Quantization Training
Xingwu Sun, Shuaipeng Li, Ruobing Xie et al.
DiTASK: Multi-Task Fine-Tuning with Diffeomorphic Transformations
Krishna Sri Ipsit Mantri, Carola-Bibiane Schönlieb, Bruno Ribeiro et al.
Can Generative Geospatial Diffusion Models Excel as Discriminative Geospatial Foundation Models?
Yuru Jia, Valerio Marsocci, Ziyang Gong et al.
CTRL-D: Controllable Dynamic 3D Scene Editing with Personalized 2D Diffusion
Kai He, Chin-Hsuan Wu, Igor Gilitschenski
IMFine: 3D Inpainting via Geometry-guided Multi-view Refinement
Zhihao Shi, Dong Huo, Yuhongze Zhou et al.
High-Fidelity Relightable Monocular Portrait Animation with Lighting-Controllable Video Diffusion Model
Mingtao Guo, Guanyu Xing, Yanli Liu
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
Wenqi Zhang, Hang Zhang, Xin Li et al.
egoPPG: Heart Rate Estimation from Eye-Tracking Cameras in Egocentric Systems to Benefit Downstream Vision Tasks
Björn Braun, Rayan Armani, Manuel Meier et al.
Pos3R: 6D Pose Estimation for Unseen Objects Made Easy
Weijian Deng, Dylan Campbell, Chunyi Sun et al.
Fine-grained Spatiotemporal Grounding on Egocentric Videos
Shuo LIANG, Yiwu Zhong, Zi-Yuan Hu et al.
KV Shifting Attention Enhances Language Modeling
Mingyu Xu, Bingning Wang, Weipeng Chen
PARQ: Piecewise-Affine Regularized Quantization
Lisa Jin, Jianhao Ma, Zechun Liu et al.
ProbeSDF: Light Field Probes For Neural Surface Reconstruction
Briac Toussaint, Diego Thomas, Jean-Sébastien Franco
HandOS: 3D Hand Reconstruction in One Stage
Xingyu Chen, Zhuheng Song, Xiaoke Jiang et al.
DeSplat: Decomposed Gaussian Splatting for Distractor-Free Rendering
Yihao Wang, Marcus Klasson, Matias Turkulainen et al.
From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport
Quentin Bouniot, Ievgen Redko, Anton Mallasto et al.
Context-Enhanced Memory-Refined Transformer for Online Action Detection
Zhanzhong Pang, Fadime Sener, Angela Yao
How to Verify Any (Reasonable) Distribution Property: Computationally Sound Argument Systems for Distributions
Tal Herman, Guy Rothblum
Zeroth-Order Fine-Tuning of LLMs with Transferable Static Sparsity
Wentao Guo, Jikai Long, Yimeng Zeng et al.
Cropper: Vision-Language Model for Image Cropping through In-Context Learning
Seung Hyun Lee, Jijun jiang, Yiran Xu et al.
Severing Spurious Correlations with Data Pruning
Varun Mulchandani, Jung-Eun Kim
HUSH: Holistic Panoramic 3D Scene Understanding using Spherical Harmonics
Jongsung Lee, HARIN PARK, Byeong-Uk Lee et al.
On Generalization Across Environments In Multi-Objective Reinforcement Learning
Jayden Teoh, Pradeep Varakantham, Peter Vamplew
EffiCoder: Enhancing Code Generation in Large Language Models through Efficiency-Aware Fine-tuning
Dong HUANG, Guangtao Zeng, Jianbo Dai et al.
ConStellaration: A dataset of QI-like stellarator plasma boundaries and optimization benchmarks
Santiago Cadena, Andrea Merlo, Emanuel Laude et al.
IterIS: Iterative Inference-Solving Alignment for LoRA Merging
Hongxu chen, Zhen Wang, Runshi Li et al.
(How) Can Transformers Predict Pseudo-Random Numbers?
Tao Tao, Darshil Doshi, Dayal Singh Kalra et al.
Multi-modal Vision Pre-training for Medical Image Analysis
Shaohao Rui, Lingzhi Chen, Zhenyu Tang et al.
Bayesian WeakS-to-Strong from Text Classification to Generation
Ziyun Cui, Ziyang Zhang, Guangzhi Sun et al.
CLOC: Contrastive Learning for Ordinal Classification with Multi-Margin N-pair Loss
Dileepa Pitawela, Gustavo Carneiro, Hsiang-Ting Chen
MAGNET: A Multi-agent Framework for Finding Audio-Visual Needles by Reasoning over Multi-Video Haystacks
Sanjoy Chowdhury, Mohamed Elmoghany, Yohan Abeysinghe et al.
Learning to Communicate Through Implicit Communication Channels
Han Wang, Binbin Chen, zhang et al.
TTFSFormer: A TTFS-based Lossless Conversion of Spiking Transformer
Lusen Zhao, Zihan Huang, Ding Jianhao et al.
Understanding Generalization in Quantum Machine Learning with Margins
TAK HUR, Daniel Kyungdeock Park
Rethinking Fair Representation Learning for Performance-Sensitive Tasks
Charles Jones, Fabio De Sousa Ribeiro, Mélanie Roschewitz et al.
ABC-Former: Auxiliary Bimodal Cross-domain Transformer with Interactive Channel Attention for White Balance
Yu-Cheng Chiu, GUAN-RONG CHEN, Zihao Chen et al.
SDGOCC: Semantic and Depth-Guided Bird's-Eye View Transformation for 3D Multimodal Occupancy Prediction
ZaiPeng Duan, Xuzhong Hu, Pei An et al.
Improving Adversarial Transferability on Vision Transformers via Forward Propagation Refinement
Yuchen Ren, Zhengyu Zhao, Chenhao Lin et al.
Reward-Augmented Data Enhances Direct Preference Alignment of LLMs
Shenao Zhang, Zhihan Liu, Boyi Liu et al.
Question-Aware Gaussian Experts for Audio-Visual Question Answering
Hongyeob Kim, Inyoung Jung, Dayoon Suh et al.
DiffVsgg: Diffusion-Driven Online Video Scene Graph Generation
Mu Chen, Liulei Li, Wenguan Wang et al.
BrainOmni: A Brain Foundation Model for Unified EEG and MEG Signals
Qinfan Xiao, Ziyun Cui, Chi Zhang et al.
Bridging the Semantic Gap Between Text and Table: A Case Study on NL2SQL
Lin Long, Xijun Gu, Xinjie Sun et al.
TKG-DM: Training-free Chroma Key Content Generation Diffusion Model
Ryugo Morita, Stanislav Frolov, Brian Bernhard Moser et al.
OpenSDI: Spotting Diffusion-Generated Images in the Open World
Yabin Wang, Zhiwu Huang, Xiaopeng Hong