Most Cited 2024 "remote plethysmography" Papers
12,324 papers found • Page 5 of 62
Conference
Accurate Spatial Gene Expression Prediction by Integrating Multi-Resolution Features
Youngmin Chung, Ji Hun Ha, Kyeong Chan Im et al.
FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models
Andrea Caraffa, Davide Boscaini, Amir Hamza et al.
Fair Federated Learning under Domain Skew with Local Consistency and Domain Diversity
Yuhang Chen, Wenke Huang, Mang Ye
Point2RBox: Combine Knowledge from Synthetic Visual Patterns for End-to-end Oriented Object Detection with Single Point Supervision
Yi Yu, Xue Yang, Qingyun Li et al.
Diffusion Reward: Learning Rewards via Conditional Video Diffusion
Tao Huang, Guangqi Jiang, Yanjie Ze et al.
Debiasing Multimodal Sarcasm Detection with Contrastive Learning
Mengzhao Jia, Can Xie, Liqiang Jing
Think2Drive: Efficient Reinforcement Learning by Thinking with Latent World Model for Autonomous Driving (in CARLA-v2)
Qifeng Li, Xiaosong Jia, Shaobo Wang et al.
LLM-Assisted Code Cleaning For Training Accurate Code Generators
Naman Jain, Tianjun Zhang, Wei-Lin Chiang et al.
PREFER: Prompt Ensemble Learning via Feedback-Reflect-Refine
Chenrui Zhang, Lin Liu, Chuyuan Wang et al.
Learning Transferable Negative Prompts for Out-of-Distribution Detection
Tianqi Li, Guansong Pang, wenjun miao et al.
Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries
Xinyi He, Mengyu Zhou, Xinrun Xu et al.
EgoLifter: Open-world 3D Segmentation for Egocentric Perception
Qiao Gu, Zhaoyang Lv, Duncan Frost et al.
Efficient Multi-scale Network with Learnable Discrete Wavelet Transform for Blind Motion Deblurring
Xin Gao, Tianheng Qiu, Xinyu Zhang et al.
AffineQuant: Affine Transformation Quantization for Large Language Models
Yuexiao Ma, Huixia Li, Xiawu Zheng et al.
ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image
Hallee E. Wong, Marianne Rakic, John Guttag et al.
Improved Probabilistic Image-Text Representations
Sanghyuk Chun
4D-DRESS: A 4D Dataset of Real-World Human Clothing With Semantic Annotations
Wenbo Wang, Hsuan-I Ho, Chen Guo et al.
LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model
Yulin Luo, Ruichuan An, Bocheng Zou et al.
DAVE - A Detect-and-Verify Paradigm for Low-Shot Counting
Jer Pelhan, Alan Lukezic, Vitjan Zavrtanik et al.
LEAD: Learning Decomposition for Source-free Universal Domain Adaptation
Sanqing Qu, Tianpei Zou, Lianghua He et al.
Neural Sign Actors: A Diffusion Model for 3D Sign Language Production from Text
Vasileios Baltatzis, Rolandos Alexandros Potamias, Evangelos Ververas et al.
Boosting Object Detection with Zero-Shot Day-Night Domain Adaptation
Zhipeng Du, Miaojing Shi, Jiankang Deng
Posterior Distillation Sampling
Juil Koo, Chanho Park, Minhyuk Sung
ParCo: Part-Coordinating Text-to-Motion Synthesis
Qiran Zou, Shangyuan Yuan, Shian Du et al.
Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style
Shuai Tan, Bin Ji, Ye Pan
Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion
Kiran Chhatre, Radek Danecek, Nikos Athanasiou et al.
Towards Diverse Behaviors: A Benchmark for Imitation Learning with Human Demonstrations
Xiaogang Jia, Denis Blessing, Xinkai Jiang et al.
AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation
Haonan Wang, Qixiang ZHANG, Yi Li et al.
A Watermark-Conditioned Diffusion Model for IP Protection
Rui Min, Sen Li, Hongyang Chen et al.
UniBind: LLM-Augmented Unified and Balanced Representation Space to Bind Them All
Yuanhuiyi Lyu, Xu Zheng, Jiazhou Zhou et al.
Learning the 3D Fauna of the Web
Zizhang Li, Dor Litvak, Ruining Li et al.
Multi-Memory Matching for Unsupervised Visible-Infrared Person Re-Identification
Jiangming Shi, Xiangbo Yin, Yeyun Chen et al.
BAMM: Bidirectional Autoregressive Motion Model
Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Pu Wang et al.
SemCity: Semantic Scene Generation with Triplane Diffusion
Jumin Lee, Sebin Lee, Changho Jo et al.
HouseCat6D - A Large-Scale Multi-Modal Category Level 6D Object Perception Dataset with Household Objects in Realistic Scenarios
HyunJun Jung, Shun-Cheng Wu, Patrick Ruhkamp et al.
Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning
Yiwen Ye, Yutong Xie, Jianpeng Zhang et al.
Curriculum reinforcement learning for quantum architecture search under hardware errors
Yash J. Patel, Akash Kundu, Mateusz Ostaszewski et al.
CAGE: Controllable Articulation GEneration
Jiayi Liu, Hou In Ivan Tam, Ali Mahdavi Amiri et al.
Two-stage LLM Fine-tuning with Less Specialization and More Generalization
Yihan Wang, Si Si, Daliang Li et al.
Exploiting Diffusion Prior for Generalizable Dense Prediction
Hsin-Ying Lee, Hung-Yu Tseng, Hsin-Ying Lee et al.
DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs
Donghyun Kim, Byeongho Heo, Dongyoon Han
Benchmarking and Improving Generator-Validator Consistency of Language Models
XIANG LI, Vaishnavi Shrivastava, Siyan Li et al.
Vision-and-Language Navigation via Causal Learning
Liuyi Wang, Zongtao He, Ronghao Dang et al.
Generative Proxemics: A Prior for 3D Social Interaction from Images
Vickie Ye, Vickie Ye, Georgios Pavlakos et al.
Class-Incremental Learning with CLIP: Adaptive Representation Adjustment and Parameter Fusion
Linlan Huang, Xusheng Cao, Haori Lu et al.
Investigating and Mitigating the Side Effects of Noisy Views for Self-Supervised Clustering Algorithms in Practical Multi-View Scenarios
Jie Xu, Yazhou Ren, Xiaolong Wang et al.
TC-LIF: A Two-Compartment Spiking Neuron Model for Long-Term Sequential Modelling
Shimin Zhang, Qu Yang, Chenxiang Ma et al.
Large Language Models Are Neurosymbolic Reasoners
Meng Fang, Shilong Deng, Yudi Zhang et al.
DiffAvatar: Simulation-Ready Garment Optimization with Differentiable Simulation
Yifei Li, Hsiaoyu Chen, Egor Larionov et al.
Frequency Spectrum Is More Effective for Multimodal Representation and Fusion: A Multimodal Spectrum Rumor Detector
An Lao, Qi Zhang, Chongyang Shi et al.
On the Error Analysis of 3D Gaussian Splatting and an Optimal Projection Strategy
Letian Huang, Jiayang Bai, Jie Guo et al.
DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning
Zhengxiang Shi, Aldo Lipani
A Compact Dynamic 3D Gaussian Representation for Real-Time Dynamic View Synthesis
Kai Katsumata, Duc Minh Vo, Hideki Nakayama
T-MARS: Improving Visual Representations by Circumventing Text Feature Learning
Pratyush Maini, Sachin Goyal, Zachary Lipton et al.
Elijah: Eliminating Backdoors Injected in Diffusion Models via Distribution Shift
Shengwei An, Sheng-Yen Chou, Kaiyuan Zhang et al.
Towards Robust Event-guided Low-Light Image Enhancement: A Large-Scale Real-World Event-Image Dataset and Novel Approach
Guoqiang Liang, Kanghao Chen, Hangyu Li et al.
MAPE-PPI: Towards Effective and Efficient Protein-Protein Interaction Prediction via Microenvironment-Aware Protein Embedding
Lirong Wu, Yijun Tian, Yufei Huang et al.
Attribute-Missing Graph Clustering Network
Wenxuan Tu, Renxiang Guan, Sihang Zhou et al.
FlowVQTalker: High-Quality Emotional Talking Face Generation through Normalizing Flow and Quantization
Shuai Tan, Bin Ji, Ye Pan
Few-Shot Detection of Machine-Generated Text using Style Representations
Rafael Rivera Soto, Kailin Koch, Aleem Khan et al.
XKD: Cross-Modal Knowledge Distillation with Domain Alignment for Video Representation Learning
Pritam Sarkar, Ali Etemad
Devignet: High-Resolution Vignetting Removal via a Dual Aggregated Fusion Transformer with Adaptive Channel Expansion
Shenghong Luo, Xuhang Chen, Weiwen Chen et al.
TOD3Cap: Towards 3D Dense Captioning in Outdoor Scenes
Bu Jin, Yupeng Zheng, Pengfei Li et al.
Texture-GS: Disentangle the Geometry and Texture for 3D Gaussian Splatting Editing
Tian-Xing Xu, WENBO HU, Yu-Kun Lai et al.
A Diffusion-Based Framework for Multi-Class Anomaly Detection
Haoyang He, Jiangning Zhang, Hongxu Chen et al.
ID-like Prompt Learning for Few-Shot Out-of-Distribution Detection
Yichen Bai, Zongbo Han, Bing Cao et al.
Prompt Learning via Meta-Regularization
Jinyoung Park, Juyeon Ko, Hyunwoo J. Kim
SceneTex: High-Quality Texture Synthesis for Indoor Scenes via Diffusion Priors
Dave Zhenyu Chen, Haoxuan Li, Hsin-Ying Lee et al.
Scene Adaptive Sparse Transformer for Event-based Object Detection
Yansong Peng, Li Hebei, Yueyi Zhang et al.
Does CLIP’s generalization performance mainly stem from high train-test similarity?
Prasanna Mayilvahanan, Thaddäus Wiedemer, Evgenia Rusak et al.
WOODS: Benchmarks for Out-of-Distribution Generalization in Time Series
Irina Rish, Kartik Ahuja, Mohammad Javad Darvishi Bayazi et al.
NOPE: Novel Object Pose Estimation from a Single Image
Van Nguyen Nguyen, Thibault Groueix, Georgy Ponimatkin et al.
Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval
Zhihang Liu, Jun Li, Hongtao Xie et al.
MesonGS: Post-training Compression of 3D Gaussians via Efficient Attribute Transformation
Shuzhao Xie, Weixiang Zhang, Chen Tang et al.
A Vision Check-up for Language Models
Pratyusha Sharma, Tamar Rott Shaham, Manel Baradad et al.
Stream Query Denoising for Vectorized HD-Map Construction
Shuo Wang, Fan Jia, Weixin Mao et al.
UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes
David Rozenberszki, Or Litany, Angela Dai
Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On
Xu Yang, Changxing Ding, Zhibin Hong et al.
TransFusion -- A Transparency-Based Diffusion Model for Anomaly Detection
Matic Fučka, Vitjan Zavrtanik, Danijel Skocaj
DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception
Yibo Wang, Ruiyuan Gao, Kai Chen et al.
StyleSinger: Style Transfer for Out
of-Domain Singing Voice Synthesis
ZeroShape: Regression-based Zero-shot Shape Reconstruction
Zixuan Huang, Stefan Stojanov, Anh Thai et al.
Gradient Reweighting: Towards Imbalanced Class-Incremental Learning
Jiangpeng He
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi, Skanda Koppula, Shreya Pathak et al.
NoiseCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions in Diffusion Models
Yusuf Dalva, Pinar Yanardag
Text-Guided Molecule Generation with Diffusion Language Model
Haisong Gong, Qiang Liu, Shu Wu et al.
Dual RL: Unification and New Methods for Reinforcement and Imitation Learning
Harshit Sikchi, Qinqing Zheng, Amy Zhang et al.
Towards Continual Knowledge Graph Embedding via Incremental Distillation
Jiajun Liu, Ke Wenjun, Peng Wang et al.
AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation
Qingping SUN, Yanjun Wang, Ailing Zeng et al.
Learning Diffusion Texture Priors for Image Restoration
Tian Ye, Sixiang Chen, Wenhao Chai et al.
SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views
Chao Xu, Ang Li, Linghao Chen et al.
PAD: Patch-Agnostic Defense against Adversarial Patch Attacks
Lihua Jing, Rui Wang, Wenqi Ren et al.
Controllable Mind Visual Diffusion Model
Bohan Zeng, Shanglin Li, Xuhui Liu et al.
GalLop: Learning global and local prompts for vision-language models
Marc Lafon, Elias Ramzi, Clément Rambour et al.
GLACE: Global Local Accelerated Coordinate Encoding
Fangjinhua Wang, Xudong Jiang, Silvano Galliani et al.
Multi-Architecture Multi-Expert Diffusion Models
Yunsung Lee, Jin-Young Kim, Hyojun Go et al.
Portrait4D: Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data
Yu Deng, Duomin Wang, Xiaohang Ren et al.
Provable Offline Preference-Based Reinforcement Learning
Wenhao Zhan, Masatoshi Uehara, Nathan Kallus et al.
PolyGCL: GRAPH CONTRASTIVE LEARNING via Learnable Spectral Polynomial Filters
Jingyu Chen, Runlin Lei, Zhewei Wei
STanHop: Sparse Tandem Hopfield Model for Memory-Enhanced Time Series Prediction
Yu-Hsuan Wu, Jerry Hu, Weijian Li et al.
No Prejudice! Fair Federated Graph Neural Networks for Personalized Recommendation
Nimesh Agrawal, Anuj Sirohi, Sandeep Kumar et al.
Rethinking Reverse Distillation for Multi-Modal Anomaly Detection
Zhihao Gu, Jiangning Zhang, Liang Liu et al.
AirPhyNet: Harnessing Physics-Guided Neural Networks for Air Quality Prediction
Kethmi Hirushini Hettige, Jiahao Ji, Shili Xiang et al.
Quality-Diversity through AI Feedback
Herbie Bradley, Andrew Dai, Hannah Teufel et al.
Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer
Yaoting Wang, Liu Weisong, Guangyao Li et al.
TESTAM: A Time-Enhanced Spatio-Temporal Attention Model with Mixture of Experts
Hyunwook Lee, Sungahn Ko
Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment
Mengting Chen, Xi Chen, Zhonghua Zhai et al.
Multi-view Aggregation Network for Dichotomous Image Segmentation
Qian Yu, Xiaoqi Zhao, Youwei Pang et al.
Few Shot Part Segmentation Reveals Compositional Logic for Industrial Anomaly Detection
Soopil Kim, Sion An, Philip Chikontwe et al.
Test-Time Domain Generalization for Face Anti-Spoofing
Qianyu Zhou, Ke-Yue Zhang, Taiping Yao et al.
PINNACLE: PINN Adaptive ColLocation and Experimental points selection
Gregory Kang Ruey Lau, Apivich Hemachandra, See-Kiong Ng et al.
RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization
Mengqi Huang, Zhendong Mao, Mingcong Liu et al.
Stable Neural Stochastic Differential Equations in Analyzing Irregular Time Series Data
YongKyung Oh, Dongyoung Lim, Sungil Kim
R3D-AD: Reconstruction via Diffusion for 3D Anomaly Detection
Zheyuan Zhou, Wang Le, Naiyu Fang et al.
SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction
Yang Zhou, Hao Shao, Letian Wang et al.
Better Call SAL: Towards Learning to Segment Anything in Lidar
Aljoša Ošep, Tim Meinhardt, Francesco Ferroni et al.
When Do Prompting and Prefix-Tuning Work? A Theory of Capabilities and Limitations
Aleksandar Petrov, Philip Torr, Adel Bibi
Latent Space Editing in Transformer-Based Flow Matching
Vincent Tao Hu, Wei Zhang, Meng Tang et al.
EmoVIT: Revolutionizing Emotion Insights with Visual Instruction Tuning
Hongxia Xie, Chu-Jun Peng, Yu-Wen Tseng et al.
Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models
Xin Li, Yunfei Wu, Xinghua Jiang et al.
Unbiased Faster R-CNN for Single-source Domain Generalized Object Detection
Yajing Liu, Shijun Zhou, Xiyao Liu et al.
HIR-Diff: Unsupervised Hyperspectral Image Restoration Via Improved Diffusion Models
Li Pang, Xiangyu Rui, Long Cui et al.
MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
Zhao Tianchen, Xuefei Ning, Tongcheng Fang et al.
6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model
Matteo Bortolon, Theodoros Tsesmelis, Stuart James et al.
MathAttack: Attacking Large Language Models towards Math Solving Ability
Zihao Zhou, Qiufeng Wang, Mingyu Jin et al.
A Unified and General Framework for Continual Learning
Zhenyi Wang, Yan Li, Li Shen et al.
SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance
Lukas Hoyer, David Tan, Muhammad Ferjad Naeem et al.
SMooDi: Stylized Motion Diffusion Model
Lei Zhong, Yiming Xie, Varun Jampani et al.
FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models
Jinglin Xu, Yijie Guo, Yuxin Peng
SimAC: A Simple Anti-Customization Method for Protecting Face Privacy against Text-to-Image Synthesis of Diffusion Models
Feifei Wang, Zhentao Tan, Tianyi Wei et al.
Deep Variational Incomplete Multi-View Clustering: Exploring Shared Clustering Structures
Gehui Xu, Jie Wen, Chengliang Liu et al.
Disentangled Prompt Representation for Domain Generalization
De Cheng, Zhipeng Xu, XINYANG JIANG et al.
SegPoint: Segment Any Point Cloud via Large Language Model
Shuting He, Henghui Ding, Xudong Jiang et al.
Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion
Lucas Nunes, Rodrigo Marcuzzi, Benedikt Mersch et al.
SlowTrack: Increasing the Latency of Camera-Based Perception in Autonomous Driving Using Adversarial Examples
Chen Ma, Ningfei Wang, Qi Alfred Chen et al.
Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model
Zhicai Wang, Longhui Wei, Tan Wang et al.
Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving
JunDa Cheng, Wei Yin, Kaixuan Wang et al.
ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation
Moayed Haji Ali, Guha Balakrishnan, Vicente Ordonez
PEM: Prototype-based Efficient MaskFormer for Image Segmentation
Niccolò Cavagnero, Gabriele Rosi, Claudia Cuttano et al.
Question Aware Vision Transformer for Multimodal Reasoning
Roy Ganz, Yair Kittenplon, Aviad Aberdam et al.
Making RL with Preference-based Feedback Efficient via Randomization
Runzhe Wu, Wen Sun
Federated Adaptive Prompt Tuning for Multi-Domain Collaborative Learning
Shangchao Su, Mingzhao Yang, Bin Li et al.
Video Question Answering with Procedural Programs
Rohan Choudhury, Koichiro Niinuma, Kris Kitani et al.
Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking
Jiyao Zhang, Weiyao Huang, Bo Peng et al.
MEVG : Multi-event Video Generation with Text-to-Video Models
Gyeongrok Oh, Jaehwan Jeong, Sieun Kim et al.
DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Iterative Diffusion-Based Refinement
Jiuming Liu, Guangming Wang, Weicai Ye et al.
Revisiting Single Image Reflection Removal In the Wild
Yurui Zhu, Bo Li, Xueyang Fu et al.
Making Large Language Models Better Planners with Reasoning-Decision Alignment
Zhijian Huang, Tao Tang, Shaoxiang Chen et al.
Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior
Fangfu Liu, Diankun Wu, Yi Wei et al.
STEM: Unleashing the Power of Embeddings for Multi-Task Recommendation
Liangcai Su, Junwei Pan, Ximei Wang et al.
DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection
Yuhao Sun, Lingyun Yu, Hongtao Xie et al.
Towards Efficient Replay in Federated Incremental Learning
Yichen Li, Qunwei Li, Haozhao Wang et al.
GenZI: Zero-Shot 3D Human-Scene Interaction Generation
Lei Li, Angela Dai
Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation
Siyu Jiao, hongguang Zhu, Yunchao Wei et al.
Score-Guided Diffusion for 3D Human Recovery
Anastasis Stathopoulos, Ligong Han, Dimitris N. Metaxas
DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion
Junjie Guo, Chenqiang Gao, Fangcen liu et al.
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation
XuDong Wang, Ishan Misra, Ziyun Zeng et al.
Multi-Modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models
Liqi He, Zuchao Li, Xiantao Cai et al.
Addressing Loss of Plasticity and Catastrophic Forgetting in Continual Learning
Mohamed Elsayed, A. Rupam Mahmood
Amodal Completion via Progressive Mixed Context Diffusion
Katherine Xu, Lingzhi Zhang, Jianbo Shi
VLM2Scene: Self-Supervised Image-Text-LiDAR Learning with Foundation Models for Autonomous Driving Scene Understanding
Guibiao Liao, Jiankun Li, Xiaoqing Ye
DiffBEV: Conditional Diffusion Model for Bird’s Eye View Perception
Jiayu Zou, Kun Tian, Zheng Zhu et al.
Diagnosing and Re-learning for Balanced Multimodal Learning
Yake Wei, Siwei Li, Ruoxuan Feng et al.
FedSelect: Personalized Federated Learning with Customized Selection of Parameters for Fine-Tuning
Rishub Tamirisa, Chulin Xie, Wenxuan Bao et al.
Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark
Ziyang Chen, Israel D. Gebru, Christian Richardt et al.
How to Configure Good In-Context Sequence for Visual Question Answering
Li Li, Jiawei Peng, huiyi chen et al.
PathMMU: A Massive Multimodal Expert-Level Benchmark for Understanding and Reasoning in Pathology
YUXUAN SUN, Hao Wu, Chenglu Zhu et al.
OpenTab: Advancing Large Language Models as Open-domain Table Reasoners
Kezhi Kong, Jiani Zhang, Zhengyuan Shen et al.
Producing and Leveraging Online Map Uncertainty in Trajectory Prediction
Xunjiang Gu, Guanyu Song, Igor Gilitschenski et al.
DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment
Jiuming Liu, Dong Zhuo, Zhiheng Feng et al.
Pyramid Diffusion for Fine 3D Large Scene Generation
Yuheng Liu, Xinke Li, Xueting Li et al.
Mitigating Motion Blur in Neural Radiance Fields with Events and Frames
Marco Cannici, Davide Scaramuzza
Communication-Efficient Federated Learning with Accelerated Client Gradient
Geeho Kim, Jinkyu Kim, Bohyung Han
Large Language Models are Efficient Learners of Noise-Robust Speech Recognition
Yuchen Hu, CHEN CHEN, Chao-Han Huck Yang et al.
DragVideo: Interactive Drag-style Video Editing
Yufan Deng, Ruida Wang, Yuhao ZHANG et al.
Distilling Semantic Priors from SAM to Efficient Image Restoration Models
Quan Zhang, Xiaoyu Liu, Wei Li et al.
Multi-resolution HuBERT: Multi-resolution Speech Self-Supervised Learning with Masked Unit Prediction
Jiatong Shi, Hirofumi Inaguma, Xutai Ma et al.
Control4D: Efficient 4D Portrait Editing with Text
Ruizhi Shao, Jingxiang Sun, Cheng Peng et al.
U-mixer: An Unet-Mixer Architecture with Stationarity Correction for Time Series Forecasting
Xiang Ma, Xuemei Li, Lexin Fang et al.
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos
Kirolos Ataallah, Xiaoqian Shen, Eslam mohamed abdelrahman et al.
VicTR: Video-conditioned Text Representations for Activity Recognition
Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani et al.
MBR and QE Finetuning: Training-time Distillation of the Best and Most Expensive Decoding Methods
Mara Finkelstein, Markus Freitag
Vamos: Versatile Action Models for Video Understanding
Shijie Wang, Qi Zhao, Minh Quan et al.
Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance
Dazhong Shen, Guanglu Song, Zeyue Xue et al.
Alchemist: Parametric Control of Material Properties with Diffusion Models
Prafull Sharma, Varun Jampani, Yuanzhen Li et al.
Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models
Shweta Mahajan, Tanzila Rahman, Kwang Moo Yi et al.
Translate Meanings, Not Just Words: IdiomKB’s Role in Optimizing Idiomatic Translation with Language Models
Shuang Li, Jiangjie Chen, Siyu Yuan et al.
FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis
Ke Fan, Junshu Tang, Weijian Cao et al.
LION: Implicit Vision Prompt Tuning
Haixin Wang, Jianlong Chang, Yihang Zhai et al.
6385 Efficient Spiking Neural Networks with Sparse Selective Activation for Continual Learning
Jiangrong Shen, Wenyao Ni, Qi Xu et al.
Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture
Fei Wang, Dan Guo, Kun Li et al.
Mono3DVG: 3D Visual Grounding in Monocular Images
Yangfan Zhan, Yuan Yuan, Zhitong Xiong
GPSFormer: A Global Perception and Local Structure Fitting-based Transformer for Point Cloud Understanding
Changshuo Wang, Meiqing Wu, Siew-Kei Lam et al.
Tokenize Anything via Prompting
Ting Pan, Lulu Tang, Xinlong Wang et al.
Exploiting Label Skews in Federated Learning with Model Concatenation
Yiqun Diao, Qinbin Li, Bingsheng He
MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation
Hanzhe Hu, Zhizhuo Zhou, Varun Jampani et al.
UGG: Unified Generative Grasping
Jiaxin Lu, Hao Kang, Haoxiang Li et al.
RobustSAM: Segment Anything Robustly on Degraded Images
Wei-Ting Chen, Yu Jiet Vong, Sy-Yen Kuo et al.
How to Fine-Tune Vision Models with SGD
Ananya Kumar, Ruoqi Shen, Sebastien Bubeck et al.
ICP-Flow: LiDAR Scene Flow Estimation with ICP
Yancong Lin, Holger Caesar