Most Cited 2024 "reasoning grounding" Papers
GPSFormer: A Global Perception and Local Structure Fitting-based Transformer for Point Cloud Understanding
Changshuo Wang, Meiqing Wu, Siew-Kei Lam et al.
When Model Meets New Normals: Test-Time Adaptation for Unsupervised Time-Series Anomaly Detection
Equivariant Plug-and-Play Image Reconstruction
Matthieu Terris, Thomas Moreau, Nelly Pustelnik et al.
Equivariant Frames and the Impossibility of Continuous Canonicalization
Nadav Dym, Hannah Lawrence, Jonathan Siegel
Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds
Tianrui Lou, Xiaojun Jia, Jindong Gu et al.
Robust Emotion Recognition in Context Debiasing
Dingkang Yang, Kun Yang, Mingcheng Li et al.
Privacy-Preserving Instructions for Aligning Large Language Models
Da Yu, Peter Kairouz, Sewoong Oh et al.
Fake It till You Make It: Curricular Dynamic Forgery Augmentations towards General Deepfake Detection
Yuzhen Lin, Wentang Song, Bin Li et al.
Reasoning with Latent Diffusion in Offline Reinforcement Learning
Siddarth Venkatraman, Shivesh Khaitan, Ravi Tej Akella et al.
OpenTab: Advancing Large Language Models as Open-domain Table Reasoners
Kezhi Kong, Jiani Zhang, Zhengyuan Shen et al.
Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark
Ziyang Chen, Israel D. Gebru, Christian Richardt et al.
Large Language Models are Efficient Learners of Noise-Robust Speech Recognition
Yuchen Hu, Chen Chen, Chao-Han Huck Yang et al.
Situational Awareness Matters in 3D Vision Language Reasoning
Yunze Man, Liang-Yan Gui, Yu-Xiong Wang
VLM2Scene: Self-Supervised Image-Text-LiDAR Learning with Foundation Models for Autonomous Driving Scene Understanding
Guibiao Liao, Jiankun Li, Xiaoqing Ye
DiffStitch: Boosting Offline Reinforcement Learning with Diffusion-based Trajectory Stitching
Guanghe Li, Yixiang Shan, Zhengbang Zhu et al.
NeuSurf: On-Surface Priors for Neural Surface Reconstruction from Sparse Input Views
Han Huang, Yulun Wu, Junsheng Zhou et al.
Producing and Leveraging Online Map Uncertainty in Trajectory Prediction
Xunjiang Gu, Guanyu Song, Igor Gilitschenski et al.
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction
Jiatong Shi, Hirofumi Inaguma, Xutai Ma et al.
Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation
Seung Hyun Lee, Yinxiao Li, Junjie Ke et al.
Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts
Cansu Korkmaz, Ahmet Murat Tekalp, Zafer Dogan
Mono3DVG: 3D Visual Grounding in Monocular Images
Yangfan Zhan, Yuan Yuan, Zhitong Xiong
SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction
Marko Mihajlovic, Sergey Prokudin, Siyu Tang et al.
FedBPT: Efficient Federated Black-box Prompt Tuning for Large Language Models
Jingwei Sun, Ziyue Xu, Hongxu Yin et al.
When to Show a Suggestion? Integrating Human Feedback in AI-Assisted Programming
Hussein Mozannar, Gagan Bansal, Adam Fourney et al.
Tactile-Augmented Radiance Fields
Yiming Dou, Fengyu Yang, Yi Liu et al.
Provable Compositional Generalization for Object-Centric Learning
Thaddäus Wiedemer, Jack Brady, Alexander Panfilov et al.
How to Fine-Tune Vision Models with SGD
Ananya Kumar, Ruoqi Shen, Sebastien Bubeck et al.
Neural Redshift: Random Networks are not Random Functions
Damien Teney, Armand Nicolicioiu, Valentin Hartmann et al.
View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network
Quan Zhang, Lei Wang, Vishal M. Patel et al.
MobileDiffusion: Instant Text-to-Image Generation on Mobile Devices
Yang Zhao, Zhisheng Xiao, Yanwu Xu et al.
PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
Yuxuan Sun, Hao Wu, Chenglu Zhu et al.
From Self-Attention to Markov Models: Unveiling the Dynamics of Generative Transformers
Muhammed Emrullah Ildiz, Yixiao HUANG, Yingcong Li et al.
On the Generalization of Stochastic Gradient Descent with Momentum
Ali Ramezani-Kebrya, Kimon Antonakopoulos, Volkan Cevher et al.
m&m’s: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
Zixian Ma, Weikai Huang, Jieyu Zhang et al.
Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis
Bichen Wu, Ching-Yao Chuang, Xiaoyan Wang et al.
V-IRL: Grounding Virtual Intelligence in Real Life
Jihan YANG, Runyu Ding, Ellis L Brown et al.
CosmicMan: A Text-to-Image Foundation Model for Humans
Shikai Li, Jianglin Fu, Kaiyuan Liu et al.
Interactive Continual Learning: Fast and Slow Thinking
Biqing Qi, Xinquan Chen, Junqi Gao et al.
Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices
Nathaniel Cohen, Vladimir Kulikov, Matan Kleiner et al.
Distilling Semantic Priors from SAM to Efficient Image Restoration Models
Quan Zhang, Xiaoyu Liu, Wei Li et al.
Momentum Benefits Non-iid Federated Learning Simply and Provably
Ziheng Cheng, Xinmeng Huang, Pengfei Wu et al.
Synergistic Multiscale Detail Refinement via Intrinsic Supervision for Underwater Image Enhancement
Dehuan Zhang, Jingchun Zhou, Chunle Guo et al.
LION: Implicit Vision Prompt Tuning
Haixin Wang, Jianlong Chang, Yihang Zhai et al.
LitCab: Lightweight Language Model Calibration over Short- and Long-form Responses
Xin Liu, Muhammad Khalifa, Lu Wang
Gradient Alignment for Cross-Domain Face Anti-Spoofing
Minh Binh Le, Simon Woo
AMSP-UOD: When Vortex Convolution and Stochastic Perturbation Meet Underwater Object Detection
Jingchun Zhou, Zongxin He, Kin-Man Lam et al.
Progressive Semantic-Guided Vision Transformer for Zero-Shot Learning
Shiming Chen, Wenjin Hou, Salman Khan et al.
Control4D: Efficient 4D Portrait Editing with Text
Ruizhi Shao, Jingxiang Sun, Cheng Peng et al.
Federated Recommendation with Additive Personalization
Zhiwei Li, Guodong Long, Tianyi Zhou
Vamos: Versatile Action Models for Video Understanding
Shijie Wang, Qi Zhao, Minh Quan et al.
Knowledge Distillation Based on Transformed Teacher Matching
Kaixiang Zheng, En-Hui Yang
Efficient Spiking Neural Networks with Sparse Selective Activation for Continual Learning
Jiangrong Shen, Wenyao Ni, Qi Xu et al.
Dynamic Sparse Training with Structured Sparsity
Mike Lasby, Anna Golubeva, Utku Evci et al.
Riemannian Preconditioned LoRA for Fine-Tuning Foundation Models
Fangzhao Zhang, Mert Pilanci
Token-Level Contrastive Learning with Modality-Aware Prompting for Multimodal Intent Recognition
Qianrui Zhou, Hua Xu, Hao Li et al.
SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers
Ioannis Kakogeorgiou, Spyros Gidaris, Konstantinos Karantzalos et al.
Prompting Language-Informed Distribution for Compositional Zero-Shot Learning
Wentao Bao, Lichang Chen, Heng Huang et al.
Asynchronous Large Language Model Enhanced Planner for Autonomous Driving
Yuan Chen, Zi-han Ding, Ziqin Wang et al.
GenNBV: Generalizable Next-Best-View Policy for Active 3D Reconstruction
Xiao Chen, Quanyi Li, Tai Wang et al.
Revisiting Link Prediction: a data perspective
Haitao Mao, Juanhui Li, Harry Shomer et al.
Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues
Antonio Orvieto, Soham De, Caglar Gulcehre et al.
Bespoke Solvers for Generative Flow Models
Neta Shaul, Juan Perez, Ricky T. Q. Chen et al.
Towards Energy Efficient Spiking Neural Networks: An Unstructured Pruning Framework
Xinyu Shi, Jianhao Ding, Zecheng Hao et al.
Towards Few-Shot Adaptation of Foundation Models via Multitask Finetuning
Zhuoyan Xu, Zhenmei Shi, Junyi Wei et al.
Collaborating Foundation Models for Domain Generalized Semantic Segmentation
Yasser Benigmim, Subhankar Roy, Slim Essid et al.
LiDAR4D: Dynamic Neural Fields for Novel Space-time View LiDAR Synthesis
Zehan Zheng, Fan Lu, Weiyi Xue et al.
Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences
Axel Barroso-Laguna, Sowmya Munukutla, Victor Adrian Prisacariu et al.
SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher
Trung Dao, Thuan Nguyen, Thanh Van Le et al.
Navigating Complexity: Toward Lossless Graph Condensation via Expanding Window Matching
Yuchen Zhang, Tianle Zhang, Kai Wang et al.
HRVDA: High-Resolution Visual Document Assistant
Chaohu Liu, Kun Yin, Haoyu Cao et al.
A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks
Behrad Moniri, Donghwan Lee, Hamed Hassani et al.
Larimar: Large Language Models with Episodic Memory Control
Payel Das, Subhajit Chaudhury, Elliot Nelson et al.
FedMut: Generalized Federated Learning via Stochastic Mutation
Ming Hu, Cao Yue, Anran Li et al.
Full-Atom Peptide Design based on Multi-modal Flow Matching
Jiahan Li, Chaoran Cheng, Zuofan Wu et al.
HiFi-123: Towards High-fidelity One Image to 3D Content Generation
Wangbo Yu, Li Yuan, Yanpei Cao et al.
PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models
Yiming Zhang, Zhening Xing, Yanhong Zeng et al.
RIME: Robust Preference-based Reinforcement Learning with Noisy Preferences
Jie Cheng, Gang Xiong, Xingyuan Dai et al.
LaWa: Using Latent Space for In-Generation Image Watermarking
Ahmad Rezaei, Mohammad Akbari, Saeed Ranjbar Alvar et al.
How I Warped Your Noise: a Temporally-Correlated Noise Prior for Diffusion Models
Pascal Chang, Jingwei Tang, Markus Gross et al.
Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture
Fei Wang, Dan Guo, Kun Li et al.
UGG: Unified Generative Grasping
Jiaxin Lu, Hao Kang, Haoxiang Li et al.
Social-Transmotion: Promptable Human Trajectory Prediction
Saeed Saadatnejad, Yang Gao, Kaouther Messaoud et al.
Do Generated Data Always Help Contrastive Learning?
Yifei Wang, Jizhe Zhang, Yisen Wang
Open-World Human-Object Interaction Detection via Multi-modal Prompts
Jie Yang, Bingliang Li, Ailing Zeng et al.
Training-Free Quantum Architecture Search
Zhimin He, Maijie Deng, Shenggen Zheng et al.
ReMasker: Imputing Tabular Data with Masked Autoencoding
Tianyu Du, Luca Melis, Ting Wang
Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval
Hailang Huang, Zhijie Nie, Ziqiao Wang et al.
The Consensus Game: Language Model Generation via Equilibrium Search
Athul Jacob, Yikang Shen, Gabriele Farina et al.
Offline Training of Language Model Agents with Functions as Learnable Weights
Shaokun Zhang, Jieyu Zhang, Jiale Liu et al.
SD-MVS: Segmentation-Driven Deformation Multi-View Stereo with Spherical Refinement and EM Optimization
Zhenlong Yuan, Jiakai Cao, Zhaoxin Li et al.
Propagation Tree Is Not Deep: Adaptive Graph Contrastive Learning Approach for Rumor Detection
LoRA Training in the NTK Regime has No Spurious Local Minima
Uijeong Jang, Jason Lee, Ernest Ryu
Generative Multi-Modal Knowledge Retrieval with Large Language Models
Xinwei Long, Jiali Zeng, Fandong Meng et al.
Offline Actor-Critic Reinforcement Learning Scales to Large Models
Jost Tobias Springenberg, Abbas Abdolmaleki, Jingwei Zhang et al.
Subtractive Mixture Models via Squaring: Representation and Learning
Lorenzo Loconte, Aleksanteri Sladek, Stefan Mengel et al.
SeFlow: A Self-Supervised Scene Flow Method in Autonomous Driving
Qingwen Zhang, Yi Yang, Peizheng Li et al.
Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction
Hao Li, Ying Chen, Yifei Chen et al.
ICP-Flow: LiDAR Scene Flow Estimation with ICP
Yancong Lin, Holger Caesar
Alchemist: Parametric Control of Material Properties with Diffusion Models
Prafull Sharma, Varun Jampani, Yuanzhen Li et al.
Learning Continuous Implicit Field with Local Distance Indicator for Arbitrary-Scale Point Cloud Upsampling
Shujuan Li, Junsheng Zhou, Baorui Ma et al.
Do Vision and Language Encoders Represent the World Similarly?
Mayug Maniparambil, Raiymbek Akshulakov, Yasser Abdelaziz Dahou Djilali et al.
Towards Multimodal Sentiment Analysis Debiasing via Bias Purification
Dingkang Yang, Mingcheng Li, Dongling Xiao et al.
PTaRL: Prototype-based Tabular Representation Learning via Space Calibration
Hangting Ye, Wei Fan, Xiaozhuang Song et al.
Fair Resource Allocation in Multi-Task Learning
Hao Ban, Kaiyi Ji
LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content
Qihao Zhao, Yalun Dai, Hao Li et al.
MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant
Chenlu Zhan, Gaoang Wang, Yu LIN et al.
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
Yunhao Ge, Xiaohui Zeng, Jacob Huffman et al.
CLIF: Complementary Leaky Integrate-and-Fire Neuron for Spiking Neural Networks
Yulong Huang, Xiaopeng LIN, Hongwei Ren et al.
CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation
Lingjun Zhao, Jingyu Song, Katherine Skinner
Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts
Jialin Wu, Xia Hu, Yaqing Wang et al.
Minimax Optimality of Score-based Diffusion Models: Beyond the Density Lower Bound Assumptions
Kaihong Zhang, Heqi Yin, Feng Liang et al.
OpenStreetView-5M: The Many Roads to Global Visual Geolocation
Guillaume Astruc, Nicolas Dufour, Ioannis Siglidis et al.
DGCLUSTER: A Neural Framework for Attributed Graph Clustering via Modularity Maximization
Aritra Bhowmick, Mert Kosan, Zexi Huang et al.
How Does Unlabeled Data Provably Help Out-of-Distribution Detection?
Xuefeng Du, Zhen Fang, Ilias Diakonikolas et al.
SemGrasp: Semantic Grasp Generation via Language Aligned Discretization
Kailin Li, Jingbo Wang, Lixin Yang et al.
Towards Effective and General Graph Unlearning via Mutual Evolution
Xunkai Li, Yulin Zhao, Zhengyu Wu et al.
AutoAD III: The Prequel – Back to the Pixels
Tengda Han, Max Bain, Arsha Nagrani et al.
CoGS: Controllable Gaussian Splatting
Heng Yu, Joel Julin, Zoltán Á. Milacski et al.
Improving Point-based Crowd Counting and Localization Based on Auxiliary Point Guidance
I-Hsiang Chen, Wei-Ting Chen, Yu-Wei Liu et al.
Early Stopping Against Label Noise Without Validation Data
Suqin Yuan, Lei Feng, Tongliang Liu
Disentangled Clothed Avatar Generation from Text Descriptions
Jionghao Wang, Yuan Liu, Zhiyang Dou et al.
Three Pillars Improving Vision Foundation Model Distillation for Lidar
Gilles Puy, Spyros Gidaris, Alexandre Boulch et al.
AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving
Mingfu Liang, Jong-Chyi Su, Samuel Schulter et al.
Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm
Yi Wu, Ziqiang Li, Heliang Zheng et al.
AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors
Kaishen Yuan, Zitong Yu, Xin Liu et al.
HexGen: Generative Inference of Large Language Model over Heterogeneous Environment
Youhe Jiang, Ran Yan, Xiaozhe Yao et al.
How Far Can We Compress Instant-NGP-Based NeRF?
Yihang Chen, Qianyi Wu, Mehrtash Harandi et al.
WeConvene: Learned Image Compression with Wavelet-Domain Convolution and Entropy Model
Haisheng Fu, Jie Liang, Zhenman Fang et al.
MoAI: Mixture of All Intelligence for Large Language and Vision Models
Byung-Kwan Lee, Beomchan Park, Chae Won Kim et al.
UniMix: Towards Domain Adaptive and Generalizable LiDAR Semantic Segmentation in Adverse Weather
Haimei Zhao, Jing Zhang, Zhuo Chen et al.
GIN-SD: Source Detection in Graphs with Incomplete Nodes via Positional Encoding and Attentive Fusion
Le Cheng, Peican Zhu, Keke Tang et al.
Concept-Guided Prompt Learning for Generalization in Vision-Language Models
Yi Zhang, Ce Zhang, Ke Yu et al.
Active Generalized Category Discovery
Shijie Ma, Fei Zhu, Zhun Zhong et al.
PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition
Ziyang Zhang, Qizhen Zhang, Jakob Foerster
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset
Chengjian Feng, Yujie Zhong, Zequn Jie et al.
NeWRF: A Deep Learning Framework for Wireless Radiation Field Reconstruction and Channel Prediction
Haofan Lu, Christopher Vattheuer, Baharan Mirzasoleiman et al.
Probabilities of Causation with Nonbinary Treatment and Effect
Ang Li, Judea Pearl
Generalizable Human Gaussians for Sparse View Synthesis
Youngjoong Kwon, Baole Fang, Yixing Lu et al.
LingoQA: Video Question Answering for Autonomous Driving
Ana-Maria Marcu, Long Chen, Jan Hünermann et al.
Adversarial Prompt Tuning for Vision-Language Models
Jiaming Zhang, Xingjun Ma, Xin Wang et al.
EquiPocket: an E(3)-Equivariant Geometric Graph Neural Network for Ligand Binding Site Prediction
Yang Zhang, Zhewei Wei, Ye Yuan et al.
Revisiting Adversarial Training at Scale
Zeyu Wang, Xianhang Li, Hongru Zhu et al.
Simple Semantic-Aided Few-Shot Learning
Hai Zhang, Junzhe Xu, Shanlin Jiang et al.
An Autoregressive Text-to-Graph Framework for Joint Entity and Relation Extraction
Urchade Zaratiana, Nadi Tomeh, Pierre Holat et al.
Lyapunov-stable Neural Control for State and Output Feedback: A Novel Formulation
Lujie Yang, Hongkai Dai, Zhouxing Shi et al.
Assessing Large Language Models on Climate Information
Jannis Bulian, Mike Schäfer, Afra Amini et al.
Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models
Yiwen Tang, Ray Zhang, Zoey Guo et al.
Jointly Training Large Autoregressive Multimodal Models
Emanuele Aiello, Lili Yu, Yixin Nie et al.
Algorithm and Hardness for Dynamic Attention Maintenance in Large Language Models
Jan van den Brand, Zhao Song, Tianyi Zhou
Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations
Chenyu You, Yifei Min, Weicheng Dai et al.
Hierarchical Multi-Marginal Optimal Transport for Network Alignment
Zhichen Zeng, Boxin Du, Si Zhang et al.
Audio-Synchronized Visual Animation
Lin Zhang, Shentong Mo, Yijing Zhang et al.
XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution
Yunpeng Qu, Kun Yuan, Kai Zhao et al.
Low Rank Matrix Completion via Robust Alternating Minimization in Nearly Linear Time
Yuzhou Gu, Zhao Song, Junze Yin et al.
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Ruizhe Shi, Yuyao Liu, Yanjie Ze et al.
Scalable Language Model with Generalized Continual Learning
Bohao PENG, Zhuotao Tian, Shu Liu et al.
MedBench: A Large-Scale Chinese Benchmark for Evaluating Medical Large Language Models
Yan Cai, Linlin Wang, Ye Wang et al.
Let Models Speak Ciphers: Multiagent Debate through Embeddings
Chau Pham, Boyi Liu, Yingxiang Yang et al.
Video Super-Resolution Transformer with Masked Inter&Intra-Frame Attention
Xingyu Zhou, Leheng Zhang, Xiaorui Zhao et al.
NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation
Jingyang Huo, Yikai Wang, Yanwei Fu et al.
AGILE3D: Attention Guided Interactive Multi-object 3D Segmentation
Yuanwen Yue, Sabarinath Mahadevan, Jonas Schult et al.
SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization
Jialong Guo, Xinghao Chen, Yehui Tang et al.
MonoNPHM: Dynamic Head Reconstruction from Monocular Videos
Simon Giebenhain, Tobias Kirschstein, Markos Georgopoulos et al.
MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning
Zhe Li, Laurence Yang, Bocheng Ren et al.
Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models
Peifei Zhu, Tsubasa Takahashi, Hirokatsu Kataoka
How Do Nonlinear Transformers Learn and Generalize in In-Context Learning?
Hongkang Li, Meng Wang, Songtao Lu et al.
Online Boosting Adaptive Learning under Concept Drift for Multistream Classification
En Yu, Jie Lu, Bin Zhang et al.
Flextron: Many-in-One Flexible Large Language Model
Ruisi Cai, Saurav Muralidharan, Greg Heinrich et al.
Exploring Sparse Visual Prompt for Domain Adaptive Dense Prediction
Senqiao Yang, Jiarui Wu, Jiaming Liu et al.
Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain
Marcus J. Min, Yangruibo Ding, Luca Buratti et al.
Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs
Woomin Song, Seunghyuk Oh, Sangwoo Mo et al.
PRDP: Proximal Reward Difference Prediction for Large-Scale Reward Finetuning of Diffusion Models
Fei Deng, Qifei Wang, Wei Wei et al.
SleepFM: Multi-modal Representation Learning for Sleep Across Brain Activity, ECG and Respiratory Signals
Rahul Thapa, Bryan He, Magnus Ruud Kjaer et al.
Relightable and Animatable Neural Avatar from Sparse-View Video
Zhen Xu, Sida Peng, Chen Geng et al.
High-fidelity Person-centric Subject-to-Image Synthesis
Yibin Wang, Weizhong Zhang, Jianwei Zheng et al.
On the Scalability of Diffusion-based Text-to-Image Generation
Hao Li, Yang Zou, Ying Wang et al.
SpikePoint: An Efficient Point-based Spiking Neural Network for Event Cameras Action Recognition
Hongwei Ren, Yue ZHOU, Xiaopeng LIN et al.
Exploration and Anti-Exploration with Distributional Random Network Distillation
Kai Yang, Jian Tao, Jiafei Lyu et al.
On the Content Bias in Fréchet Video Distance
Songwei Ge, Aniruddha Mahapatra, Gaurav Parmar et al.
Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning
Siteng Huang, Biao Gong, Yutong Feng et al.
Learning Object State Changes in Videos: An Open-World Perspective
Zihui Xue, Kumar Ashutosh, Kristen Grauman
Improved Implicit Neural Representation with Fourier Reparameterized Training
Kexuan Shi, Xingyu Zhou, Shuhang Gu
Detecting As Labeling: Rethinking LiDAR-camera Fusion in 3D Object Detection
Junjie Huang, Yun Ye, Zhujin Liang et al.
Spurious Feature Diversification Improves Out-of-distribution Generalization
LIN Yong, Lu Tan, Yifan HAO et al.
Unraveling Instance Associations: A Closer Look for Audio-Visual Segmentation
Yuanhong Chen, Yuyuan Liu, Hu Wang et al.
Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection
Feng Liu, Tengteng Huang, Qianjing Zhang et al.
FineParser: A Fine-grained Spatio-temporal Action Parser for Human-centric Action Quality Assessment
Jinglin Xu, Sibo Yin, Guohao Zhao et al.
ExtDM: Distribution Extrapolation Diffusion Model for Video Prediction
Zhicheng Zhang, Junyao Hu, Wentao Cheng et al.
Boosting Transferability in Vision-Language Attacks via Diversification along the Intersection Region of Adversarial Trajectory
Sensen Gao, Xiaojun Jia, Xuhong Ren et al.
REACTO: Reconstructing Articulated Objects from a Single Video
Chaoyue Song, Jiacheng Wei, Chuan-Sheng Foo et al.
CoralSCOP: Segment any COral Image on this Planet
Zheng Ziqiang, Liang Haixin, Binh-Son Hua et al.
Hierarchical Temporal Context Learning for Camera-based Semantic Scene Completion
Bohan Li, Jiajun Deng, Wenyao Zhang et al.
Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal
Yuxin Wang, Qianyi Wu, Guofeng Zhang et al.
HowToCaption: Prompting LLMs to Transform Video Annotations at Scale
Nina Shvetsova, Anna Kukleva, Xudong Hong et al.
Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks
Atli Kosson, Bettina Messmer, Martin Jaggi
Don't Play Favorites: Minority Guidance for Diffusion Models
Soobin Um, Suhyeon Lee, Jong Chul YE
It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition
Chen Chen, Ruizhe Li, Yuchen Hu et al.
G2P-DDM: Generating Sign Pose Sequence from Gloss Sequence with Discrete Diffusion Model
Pan Xie, Qipeng Zhang, Peng Taiying et al.
AST-T5: Structure-Aware Pretraining for Code Generation and Understanding
Linyuan Gong, Mostafa Elhoushi, Alvin Cheung
Beyond TreeSHAP: Efficient Computation of Any-Order Shapley Interactions for Tree Ensembles
Maximilian Muschalik, Fabian Fumagalli, Barbara Hammer et al.