Most Cited 2024 "reasoning supervision" Papers
12,324 papers found • Page 15 of 62
Conference
Towards Variable and Coordinated Holistic Co-Speech Motion Generation
Yifei Liu, Qiong Cao, Yandong Wen et al.
Hunting Attributes: Context Prototype-Aware Learning for Weakly Supervised Semantic Segmentation
feilong tang, Zhongxing Xu, Zhaojun QU et al.
Language Models as Semantic Indexers
Bowen Jin, Hansi Zeng, Guoyin Wang et al.
Multi-Space Alignments Towards Universal LiDAR Segmentation
Youquan Liu, Lingdong Kong, Xiaoyang Wu et al.
Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation
Bingfeng Zhang, Siyue Yu, Yunchao Wei et al.
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension
Lei Zhu, Fangyun Wei, Yanye Lu
Make-Your-Anchor: A Diffusion-based 2D Avatar Generation Framework
Ziyao Huang, Fan Tang, Yong Zhang et al.
Early Neuron Alignment in Two-layer ReLU Networks with Small Initialization
Hancheng Min, Enrique Mallada, Rene Vidal
Turning Dust into Gold: Distilling Complex Reasoning Capabilities from LLMs by Leveraging Negative Data
Yiwei Li, Peiwen Yuan, Shaoxiong Feng et al.
Revisiting the Power of Prompt for Visual Tuning
Yuzhu Wang, Lechao Cheng, Chaowei Fang et al.
CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation
Kangfu Mei, Mauricio Delbracio, Hossein Talebi et al.
VideoCon: Robust Video-Language Alignment via Contrast Captions
Hritik Bansal, Yonatan Bitton, Idan Szpektor et al.
Learning to Reject with a Fixed Predictor: Application to Decontextualization
Christopher Mohri, Daniel Andor, Eunsol Choi et al.
Diff-BGM: A Diffusion Model for Video Background Music Generation
Sizhe Li, Yiming Qin, Minghang Zheng et al.
EAGLE: Eigen Aggregation Learning for Object-Centric Unsupervised Semantic Segmentation
Chanyoung Kim, Woojung Han, Dayun Ju et al.
Understanding In-Context Learning from Repetitions
Jianhao (Elliott) Yan, Jin Xu, Chiyu Song et al.
Feature emergence via margin maximization: case studies in algebraic tasks
Depen Morwani, Benjamin Edelman, Costin-Andrei Oncescu et al.
Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models
Hongjie Wang, Difan Liu, Yan Kang et al.
CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks
Shashank Agnihotri, Steffen Jung, Margret Keuper
MapDistill: Boosting Efficient Camera-based HD Map Construction via Camera-LiDAR Fusion Model Distillation
Xiaoshuai Hao, Ruikai Li, Hui Zhang et al.
PosterLlama: Bridging Design Ability of Langauge Model to Content-Aware Layout Generation
Jaejung Seol, Seojun Kim, Jaejun Yoo
Decision ConvFormer: Local Filtering in MetaFormer is Sufficient for Decision Making
Jeonghye Kim, Su Young Lee, Woojun Kim et al.
Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction
Xinhang Liu, Jiaben Chen, Shiu-Hong Kao et al.
Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning
Woo-Jin Ahn, Geun-Yeong Yang, Hyunduck Choi et al.
TimeX++: Learning Time-Series Explanations with Information Bottleneck
Zichuan Liu, Tianchun Wang, Jimeng Shi et al.
iKUN: Speak to Trackers without Retraining
Yunhao Du, Cheng Lei, Zhicheng Zhao et al.
CRA-PCN: Point Cloud Completion with Intra- and Inter-level Cross-Resolution Transformers
Yi Rong, Haoran Zhou, Lixin Yuan et al.
Point2CAD: Reverse Engineering CAD Models from 3D Point Clouds
Yujia Liu, Anton Obukhov, Jan D. Wegner et al.
Understanding Addition in Transformers
Philip Quirke, Fazl Barez
Be-Your-Outpainter: Mastering Video Outpainting through Input-Specific Adaptation
Fu-Yun Wang, Xiaoshi Wu, Zhaoyang Huang et al.
Data Poisoning based Backdoor Attacks to Contrastive Learning
Jinghuai Zhang, Hongbin Liu, Jinyuan Jia et al.
Multi-Scale Representations by Varying Window Attention for Semantic Segmentation
Haotian Yan, Ming Wu, Chuang Zhang
Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs
Muhammad Jehanzeb Mirza, Leonid Karlinsky, Wei Lin et al.
Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF
Han Shen, Zhuoran Yang, Tianyi Chen
WiMANS: A Benchmark Dataset for WiFi-based Multi-user Activity Sensing
Shuokang Huang, Kaihan Li, Di You et al.
Achieving Human Parity in Content-Grounded Datasets Generation
Asaf Yehudai, Boaz Carmeli, Yosi Mass et al.
EventBind: Learning a Unified Representation to Bind Them All for Event-based Open-world Understanding
jiazhou zhou, Xu Zheng, Yuanhuiyi Lyu et al.
EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models
Jingyuan Yang, Jiawei Feng, Hui Huang
Long-Short-Range Message-Passing: A Physics-Informed Framework to Capture Non-Local Interaction for Scalable Molecular Dynamics Simulation
Yunyang Li, Yusong Wang, Lin Huang et al.
Holo-Relighting: Controllable Volumetric Portrait Relighting from a Single Image
Yiqun Mei, Yu Zeng, He Zhang et al.
LLark: A Multimodal Instruction-Following Language Model for Music
Josh Gardner, Simon Durand, Daniel Stoller et al.
Privacy Backdoors: Stealing Data with Corrupted Pretrained Models
Shanglun Feng, Florian Tramer
DP-AdamBC: Your DP-Adam Is Actually DP-SGD (Unless You Apply Bias Correction)
Qiaoyue Tang, Frederick Shpilevskiy, Mathias Lécuyer
Mocap Everyone Everywhere: Lightweight Motion Capture With Smartwatches and a Head-Mounted Camera
Jiye Lee, Hanbyul Joo
CAT-SAM: Conditional Tuning for Few-Shot Adaptation of Segment Anything Model
Aoran Xiao, Weihao Xuan, Heli Qi et al.
Equivariant Deep Weight Space Alignment
Aviv Navon, Aviv Shamsian, Ethan Fetaya et al.
Seeing Motion at Nighttime with an Event Camera
Haoyue Liu, Shihan Peng, Lin Zhu et al.
THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models
Prannay Kaul, Zhizhong Li, Hao Yang et al.
DIFFTACTILE: A Physics-based Differentiable Tactile Simulator for Contact-rich Robotic Manipulation
Zilin Si, Gu Zhang, Qingwei Ben et al.
FreeDrag: Feature Dragging for Reliable Point-based Image Editing
Pengyang Ling, Lin Chen, Pan Zhang et al.
Rethinking Multi-domain Generalization with A General Learning Objective
Zhaorui Tan, Xi Yang, Kaizhu Huang
UMBRAE: Unified Multimodal Brain Decoding
Weihao Xia, Raoul de Charette, Cengiz Oztireli et al.
Boosting Adversarial Transferability across Model Genus by Deformation-Constrained Warping
Qinliang Lin, Cheng Luo, Zenghao Niu et al.
SHAP-EDITOR: Instruction-Guided Latent 3D Editing in Seconds
Minghao Chen, Junyu Xie, Iro Laina et al.
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
Xiangxiang Chu, Jianlin Su, Bo Zhang et al.
Automated Statistical Model Discovery with Language Models
Michael Li, Emily Fox, Noah Goodman
Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models
Mingjia Huo, Sai Ashish Somayajula, Youwei Liang et al.
Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling
Rui Liu, Yifan Hu, Yi Ren et al.
Backdoor Federated Learning by Poisoning Backdoor-Critical Layers
Haomin Zhuang, Mingxian Yu, Hao Wang et al.
A Conditional Denoising Diffusion Probabilistic Model for Point Cloud Upsampling
Wentao Qu, Yuantian Shao, Lingwu Meng et al.
Disentangled Diffusion-Based 3D Human Pose Estimation with Hierarchical Spatial and Temporal Denoiser
Qingyuan Cai, Xuecai Hu, Saihui Hou et al.
VP3D: Unleashing 2D Visual Prompt for Text-to-3D Generation
Yang Chen, Yingwei Pan, haibo yang et al.
Universal Segmentation at Arbitrary Granularity with Language Instruction
Yong Liu, Cairong Zhang, Yitong Wang et al.
LEOD: Label-Efficient Object Detection for Event Cameras
Ziyi Wu, Mathias Gehrig, Qing Lyu et al.
Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation
Wenhao Li, Mengyuan Liu, Hong Liu et al.
N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields
Yash Bhalgat, Iro Laina, Joao F Henriques et al.
TopoGCL: Topological Graph Contrastive Learning
Yuzhou Chen, Jose Frias, Yulia Gel
Graph Positional and Structural Encoder
Semih Cantürk, Renming Liu, Olivier Lapointe-Gagné et al.
Unifying Multi-Modal Uncertainty Modeling and Semantic Alignment for Text-to-Image Person Re-identification
Zhiwei Zhao, Bin Liu, Yan Lu et al.
CapHuman: Capture Your Moments in Parallel Universes
Chao Liang, Fan Ma, Linchao Zhu et al.
LAMM: Label Alignment for Multi-Modal Prompt Learning
Jingsheng Gao, Jiacheng Ruan, Suncheng Xiang et al.
See More Details: Efficient Image Super-Resolution by Experts Mining
Eduard Zamfir, Zongwei Wu, Nancy Mehta et al.
Transformer-VQ: Linear-Time Transformers via Vector Quantization
Lucas D. Lingle
RetroBridge: Modeling Retrosynthesis with Markov Bridges
Ilia Igashov, Arne Schneuing, Marwin Segler et al.
GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model
Ling Li, Yu Ye, Bingchuan Jiang et al.
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos
Pilhyeon Lee, Hyeran Byun
Closing the Curious Case of Neural Text Degeneration
Matthew Finlayson, John Hewitt, Alexander Koller et al.
EventDance: Unsupervised Source-free Cross-modal Adaptation for Event-based Object Recognition
Xu Zheng, Addison, Lin Wang
Autoregressive Omni-Aware Outpainting for Open-Vocabulary 360-Degree Image Generation
Zhuqiang Lu, Kun Hu, Chaoyue Wang et al.
Soft Prompt Generation for Domain Generalization
Shuanghao Bai, Yuedi Zhang, Wanqi Zhou et al.
Motion-adaptive Separable Collaborative Filters for Blind Motion Deblurring
Chengxu Liu, Xuan Wang, Xiangyu Xu et al.
DragAPart: Learning a Part-Level Motion Prior for Articulated Objects
Ruining Li, Chuanxia Zheng, Christian Rupprecht et al.
Prompt-Enhanced Multiple Instance Learning for Weakly Supervised Video Anomaly Detection
Junxi Chen, Liang Li, Li Su et al.
Learning Energy Decompositions for Partial Inference in GFlowNets
Hyosoon Jang, Minsu Kim, Sungsoo Ahn
ContraNovo: A Contrastive Learning Approach to Enhance De Novo Peptide Sequencing
Zhi Jin, Sheng Xu, Xiang Zhang et al.
DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks
Jiaxin Zhang, Dezhi Peng, Chongyu Liu et al.
ASID: Active Exploration for System Identification in Robotic Manipulation
Marius Memmel, Andrew Wagenmaker, Chuning Zhu et al.
I-MedSAM: Implicit Medical Image Segmentation with Segment Anything
Xiaobao Wei, Jiajun Cao, Yizhu Jin et al.
Task-Driven Exploration: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection
Jin Yang, Ping Wei, Huan Li et al.
Learning to Scale Logits for Temperature-Conditional GFlowNets
Minsu Kim, Joohwan Ko, Taeyoung Yun et al.
Learning Natural Consistency Representation for Face Forgery Video Detection
Daichi Zhang, Zihao Xiao, Shikun Li et al.
Conditional Variational Autoencoder for Sign Language Translation with Cross-Modal Alignment
Rui Zhao, Liang Zhang, Biao Fu et al.
Domain-Rectifying Adapter for Cross-Domain Few-Shot Segmentation
Jiapeng Su, Qi Fan, Wenjie Pei et al.
Slow and Steady Wins the Race: Maintaining Plasticity with Hare and Tortoise Networks
Hojoon Lee, Hyeonseo Cho, Hyunseung Kim et al.
DiffClass: Diffusion-Based Class Incremental Learning
Zichong Meng, Jie Zhang, Changdi Yang et al.
NARUTO: Neural Active Reconstruction from Uncertain Target Observations
Ziyue Feng, Huangying Zhan, Zheng Chen et al.
Vision Transformer Off-the-Shelf: A Surprising Baseline for Few-Shot Class-Agnostic Counting
Zhicheng Wang, Liwen Xiao, Zhiguo Cao et al.
Boosting Neural Representations for Videos with a Conditional Decoder
XINJIE ZHANG, Ren Yang, Dailan He et al.
Boosting the Power of Small Multimodal Reasoning Models to Match Larger Models with Self-Consistency Training
Cheng Tan, Jingxuan Wei, Zhangyang Gao et al.
Batched Low-Rank Adaptation of Foundation Models
Yeming Wen, Swarat Chaudhuri
Higher-Order Graph Convolutional Network with Flower-Petals Laplacians on Simplicial Complexes
Yiming Huang, Yujie Zeng, Qiang Wu et al.
Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework
Vu Minh Hieu Phan, Yutong Xie, Yuankai Qi et al.
Multimodal Patient Representation Learning with Missing Modalities and Labels
Zhenbang Wu, Anant Dadu, Nicholas Tustison et al.
Posterior Sampling Based on Gradient Flows of the MMD with Negative Distance Kernel
Paul Hagemann, Johannes Hertrich, Fabian Altekrüger et al.
Neural Refinement for Absolute Pose Regression with Feature Synthesis
Shuai Chen, Yash Bhalgat, Xinghui Li et al.
Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection
Hu Cao, Zehua Zhang, Yan Xia et al.
Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Compositional Understanding
Le Zhang, Rabiul Awal, Aishwarya Agrawal
Scaling physics-informed hard constraints with mixture-of-experts
Nithin Chalapathi, Yiheng Du, Aditi Krishnapriyan
Personalized Federated Domain-Incremental Learning based on Adaptive Knowledge Matching
Yichen Li, Wenchao Xu, Haozhao Wang et al.
Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding
Hoang-Quan Nguyen, Thanh-Dat Truong, Xuan-Bac Nguyen et al.
Diffusion Models for Monocular Depth Estimation: Overcoming Challenging Conditions
Fabio Tosi, Pierluigi Zama Ramirez, Matteo Poggi
A Simple Baseline for Efficient Hand Mesh Reconstruction
zhishan zhou, shihao zhou, Zhi Lv et al.
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
Zhecan Wang, Garrett Bingham, Adams Wei Yu et al.
Video Editing via Factorized Diffusion Distillation
Uriel Singer, Amit Zohar, Yuval Kirstain et al.
Four Ways to Improve Verbo-visual Fusion for Dense 3D Visual Grounding
Ozan Unal, Christos Sakaridis, Suman Saha et al.
Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder
Jinseok Kim, Tae-Kyun Kim
Towards Scalable and Versatile Weight Space Learning
Konstantin Schürholt, Michael Mahoney, Damian Borth
Dataset Distillation by Automatic Training Trajectories
Dai Liu, Jindong Gu, Hu Cao et al.
Learning the greatest common divisor: explaining transformer predictions
François Charton
Biased Temporal Convolution Graph Network for Time Series Forecasting with Missing Values
Xiaodan Chen, Xiucheng Li, Bo Liu et al.
Self-Supervised Facial Representation Learning with Facial Region Awareness
Zheng Gao, Ioannis Patras
Selecting Large Language Model to Fine-tune via Rectified Scaling Law
Haowei Lin, Baizhou Huang, Haotian Ye et al.
MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders
Baijiong Lin, Weisen Jiang, Pengguang Chen et al.
Visual-Augmented Dynamic Semantic Prototype for Generative Zero-Shot Learning
Wenjin Hou, Shiming Chen, Shuhuang Chen et al.
Object-Aware Inversion and Reassembly for Image Editing
Zhen Yang, Ganggui Ding, Wen Wang et al.
Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models
Pablo Marcos-Manchón, Roberto Alcover-Couso, Juan SanMiguel et al.
Hybrid Inverse Reinforcement Learning
Juntao Ren, Gokul Swamy, Steven Wu et al.
Chinese Spelling Correction as Rephrasing Language Model
Linfeng Liu, Hongqiu Wu, Hai Zhao
Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch
Xidong Wu, Shangqian Gao, Zeyu Zhang et al.
PolyVoice: Language Models for Speech to Speech Translation
Qianqian Dong, Zhiying Huang, Qiao Tian et al.
Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection
Tahira Shehzadi, Khurram Azeem Hashmi, Didier Stricker et al.
Parallelizing non-linear sequential models over the sequence length
Yi Heng Lim, Qi Zhu, Joshua Selfridge et al.
A Generalized Neural Diffusion Framework on Graphs
10011 Yibo Li, Xiao Wang, Hongrui Liu et al.
On the Stability of Expressive Positional Encodings for Graphs
Yinan Huang, William Lu, Joshua Robinson et al.
Energy-Based Concept Bottleneck Models: Unifying Prediction, Concept Intervention, and Probabilistic Interpretations
Xinyue Xu, Yi Qin, Lu Mi et al.
Nuvo: Neural UV Mapping for Unruly 3D Representations
Pratul Srinivasan, Stephan J Garbin, Dor Verbin et al.
BAGEL: Bootstrapping Agents by Guiding Exploration with Language
Shikhar Murty, Christopher Manning, Peter Shaw et al.
Elite360D: Towards Efficient 360 Depth Estimation via Semantic- and Distance-Aware Bi-Projection Fusion
Hao Ai, Addison, Lin Wang
UMIE: Unified Multimodal Information Extraction with Instruction Tuning
Lin Sun, Kai Zhang, Qingyuan Li et al.
Do Efficient Transformers Really Save Computation?
Kai Yang, Jan Ackermann, Zhenyu He et al.
Copula Conformal prediction for multi-step time series prediction
Sophia Sun, Rose Yu
UniTabE: A Universal Pretraining Protocol for Tabular Foundation Model in Data Science
Yazheng Yang, Yuqi Wang, Guang Liu et al.
Entropic Open-Set Active Learning
Bardia Safaei, Vibashan VS, Celso de Melo et al.
Cross-Covariate Gait Recognition: A Benchmark
Shinan Zou, Chao Fan, Jianbo Xiong et al.
I'M HOI: Inertia-aware Monocular Capture of 3D Human-Object Interactions
Chengfeng Zhao, Juze Zhang, Jiashen Du et al.
PhyloGFN: Phylogenetic inference with generative flow networks
MING YANG ZHOU, Zichao Yan, Elliot Layne et al.
All in One Framework for Multimodal Re-identification in the Wild
He Li, Mang Ye, Ming Zhang et al.
PPAD: Iterative Interactions of Prediction and Planning for End-to-end Autonomous Driving
Zhili Chen, Maosheng Ye, Shuangjie Xu et al.
WHAC: World-grounded Humans and Cameras
Wanqi Yin, Zhongang Cai, Chen Wei et al.
Ghost on the Shell: An Expressive Representation of General 3D Shapes
Zhen Liu, Yao Feng, Yuliang Xiu et al.
Adapting to Length Shift: FlexiLength Network for Trajectory Prediction
Yi Xu, Yun Fu
The Devil is in the Fine-Grained Details: Evaluating Open-Vocabulary Object Detectors for Fine-Grained Understanding
Lorenzo Bianchi, Fabio Carrara, Nicola Messina et al.
Zero-1-to-3: Domain-Level Zero-Shot Cognitive Diagnosis via One Batch of Early-Bird Students towards Three Diagnostic Objectives
Weibo Gao, Qi Liu, Hao Wang et al.
How Smooth Is Attention?
Valérie Castin, Pierre Ablin, Gabriel Peyré
Contrastive Learning is Spectral Clustering on Similarity Graph
Zhiquan Tan, Yifan Zhang, Jingqin Yang et al.
WANDR: Intention-guided Human Motion Generation
Markos Diomataris, Nikos Athanasiou, Omid Taheri et al.
UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion
Junsheng Zhou, Weiqi Zhang, Baorui Ma et al.
HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects
Xintao Lv, Liang Xu, Yichao Yan et al.
Adapt Before Comparison: A New Perspective on Cross-Domain Few-Shot Segmentation
Jonas Herzog
TERD: A Unified Framework for Safeguarding Diffusion Models Against Backdoors
Yichuan Mo, Hui Huang, Mingjie Li et al.
Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model
Xu He, Qiaochu Huang, Zhensong Zhang et al.
Human Hair Reconstruction with Strand-Aligned 3D Gaussians
Egor Zakharov, Vanessa Sklyarova, Michael J. Black et al.
LaneCPP: Continuous 3D Lane Detection using Physical Priors
Maximilian Pittner, Joel Janai, Alexandru Paul Condurache
Offline and Online Optical Flow Enhancement for Deep Video Compression
Chuanbo Tang, Xihua Sheng, Zhuoyuan Li et al.
SeqGPT: An Out-of-the-Box Large Language Model for Open Domain Sequence Understanding
Tianyu Yu, Chengyue Jiang, Chao Lou et al.
VEGS: View Extrapolation of Urban Scenes in 3D Gaussian Splatting using Learned Priors
Sungwon Hwang, Min-Jung Kim, Taewoong Kang et al.
Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion
Zuoyue Li, Zhenqiang Li, Zhaopeng Cui et al.
Inherently Interpretable Time Series Classification via Multiple Instance Learning
Joseph Early, Gavin Cheung, Kurt Cutajar et al.
Nearest is Not Dearest: Towards Practical Defense against Quantization-conditioned Backdoor Attacks
Boheng Li, Yishuo Cai, Haowei Li et al.
Exploring the Potential of Large Foundation Models for Open-Vocabulary HOI Detection
Ting Lei, Shaofeng Yin, Yang Liu
MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images
Junwen Huang, Hao Yu, Kuan-Ting Yu et al.
Make-A-Shape: a Ten-Million-scale 3D Shape Model
Ka-Hei Hui, Aditya Sanghi, Arianna Rampini et al.
Label Propagation for Zero-shot Classification with Vision-Language Models
Vladan Stojnić, Yannis Kalantidis, Giorgos Tolias
The LLM Surgeon
Tycho van der Ouderaa, Markus Nagel, Mart van Baalen et al.
4D Contrastive Superflows are Dense 3D Representation Learners
Xiang Xu, Lingdong Kong, Hui Shuai et al.
How do Large Language Models Navigate Conflicts between Honesty and Helpfulness?
Ryan Liu, Theodore R Sumers, Ishita Dasgupta et al.
Are Transformers with One Layer Self-Attention Using Low-Rank Weight Matrices Universal Approximators?
Tokio Kajitsuka, Issei Sato
Image Compression for Machine and Human Vision With Spatial-Frequency Adaptation
han li, Shaohui Li, Shuangrui Ding et al.
Learning to Intervene on Concept Bottlenecks
David Steinmann, Wolfgang Stammer, Felix Friedrich et al.
Generalized Schrödinger Bridge Matching
Guan-Horng Liu, Yaron Lipman, Maximilian Nickel et al.
Whispering Experts: Neural Interventions for Toxicity Mitigation in Language Models
Xavi Suau, Pieter Delobelle, Katherine Metcalf et al.
Graph Disentangled Contrastive Learning with Personalized Transfer for Cross-Domain Recommendation
Jing Liu, Lele Sun, Wei-zhi Nie et al.
NILUT: Conditional Neural Implicit 3D Lookup Tables for Image Enhancement
Marcos Conde, Javier Vazquez-Corral, Michael Brown et al.
DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly
Gianluca Scarpellini, Stefano Fiorini, Francesco Giuliari et al.
3D Paintbrush: Local Stylization of 3D Shapes with Cascaded Score Distillation
Dale Decatur, Itai Lang, Kfir Aberman et al.
T2IShield: Defending Against Backdoors on Text-to-Image Diffusion Models
Zhongqi Wang, Jie Zhang, Shiguang Shan et al.
SPIN: Simultaneous Perception Interaction and Navigation
Shagun Uppal, Ananye Agarwal, Haoyu Xiong et al.
Exploring Conditional Multi-Modal Prompts for Zero-shot HOI Detection
Ting Lei, Shaofeng Yin, Yuxin Peng et al.
IDEAL: Influence-Driven Selective Annotations Empower In-Context Learners in Large Language Models
Shaokun Zhang, Xiaobo Xia, Zhaoqing Wang et al.
MVIP-NeRF: Multi-view 3D Inpainting on NeRF Scenes via Diffusion Prior
Honghua Chen, Chen Change Loy, Xingang Pan
Towards Aligned Layout Generation via Diffusion Model with Aesthetic Constraints
Jian Chen, Ruiyi Zhang, Yufan Zhou et al.
Energy-guided Entropic Neural Optimal Transport
Petr Mokrov, Alexander Korotin, Alexander Kolesov et al.
Masked Structural Growth for 2x Faster Language Model Pre-training
Yiqun Yao, Zheng Zhang, Jing Li et al.
Analyzing Feed-Forward Blocks in Transformers through the Lens of Attention Maps
Goro Kobayashi, Tatsuki Kuribayashi, Sho Yokoi et al.
NOLA: Compressing LoRA using Linear Combination of Random Basis
Soroush Abbasi Koohpayegani, K L Navaneet, Parsa Nooralinejad et al.
Trackastra: Transformer-based cell tracking for live-cell microscopy
Benjamin Gallusser, Weigert Martin
FMA-Net: Flow-Guided Dynamic Filtering and Iterative Feature Refinement with Multi-Attention for Joint Video Super-Resolution and Deblurring
Geunhyuk Youk, Jihyong Oh, Munchurl Kim
Critic-Guided Decision Transformer for Offline Reinforcement Learning
Yuanfu Wang, Chao Yang, Ying Wen et al.
SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation
Heyuan Li, Ce Chen, Tianhao Shi et al.
APT: Adaptive Pruning and Tuning Pretrained Language Models for Efficient Training and Inference
Bowen Zhao, Hannaneh Hajishirzi, Qingqing Cao