Most Cited 2024 "critical token identification" Papers
On Penalty Methods for Nonconvex Bilevel Optimization and First-Order Stochastic Approximation
Jeongyeol Kwon, Dohyun Kwon, Stephen Wright et al.
Neural Operators with Localized Integral and Differential Kernels
Miguel Liu-Schiaffini, Julius Berner, Boris Bonev et al.
OpenNeRF: Open Set 3D Neural Scene Segmentation with Pixel-Wise Features and Rendered Novel Views
Francis Engelmann, Fabian Manhardt, Michael Niemeyer et al.
Analytic-Splatting: Anti-Aliased 3D Gaussian Splatting via Analytic Integration
Zhihao Liang, Qi Zhang, Wenbo Hu et al.
Implicit Discriminative Knowledge Learning for Visible-Infrared Person Re-Identification
Kaijie Ren, Lei Zhang
Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model
Yinan Zheng, Jianxiong Li, Dongjie Yu et al.
In-Context Learning Learns Label Relationships but Is Not Conventional Learning
Jannik Kossen, Yarin Gal, Tom Rainforth
Adaptive Text Watermark for Large Language Models
Yepeng Liu, Yuheng Bu
Accelerating Diffusion Sampling with Optimized Time Steps
Shuchen Xue, Zhaoqiang Liu, Fei Chen et al.
FlashTex: Fast Relightable Mesh Texturing with LightControlNet
Kangle Deng, Timothy Omernick, Alexander B Weiss et al.
GeoGaussian: Geometry-aware Gaussian Splatting for Scene Rendering
Yanyan Li, Chenyu Lyu, Yan Di et al.
Test-Time Training on Nearest Neighbors for Large Language Models
Moritz Hardt, Yu Sun
Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning
Bingchen Zhao, Haoqin Tu, Chen Wei et al.
Radiative Gaussian Splatting for Efficient X-ray Novel View Synthesis
Yuanhao Cai, Yixun Liang, Jiahao Wang et al.
CamoDiffusion: Camouflaged Object Detection via Conditional Diffusion Models
Zhongxi Chen, Ke Sun, Xianming Lin
FakeInversion: Learning to Detect Images from Unseen Text-to-Image Models by Inverting Stable Diffusion
George Cazenavette, Avneesh Sud, Thomas Leung et al.
How Universal Polynomial Bases Enhance Spectral Graph Neural Networks: Heterophily, Over-smoothing, and Over-squashing
Keke Huang, Yu Guang Wang, Ming Li et al.
HeadGaS: Real-Time Animatable Head Avatars via 3D Gaussian Splatting
Helisa Dhamo, Yinyu Nie, Arthur Moreau et al.
Data Roaming and Quality Assessment for Composed Image Retrieval
Matan Levy, Rami Ben-Ari, Nir Darshan et al.
Differentially Private Bias-Term Fine-tuning of Foundation Models
Zhiqi Bu, Yu-Xiang Wang, Sheng Zha et al.
SWAG: Splatting in the Wild images with Appearance-conditioned Gaussians
Hiba Dahmani, Moussab Bennehar, Nathan Piasco et al.
RTMO: Towards High-Performance One-Stage Real-Time Multi-Person Pose Estimation
Peng Lu, Tao Jiang, Yining Li et al.
EGTR: Extracting Graph from Transformer for Scene Graph Generation
Jinbae Im, JeongYeon Nam, Nokyung Park et al.
Beyond Weisfeiler-Lehman: A Quantitative Framework for GNN Expressiveness
Bohang Zhang, Jingchu Gai, Yiheng Du et al.
LaRa: Efficient Large-Baseline Radiance Fields
Anpei Chen, Haofei Xu, Stefano Esposito et al.
UniAdapter: Unified Parameter-Efficient Transfer Learning for Cross-modal Modeling
Haoyu Lu, Yuqi Huo, Guoxing Yang et al.
Auto-Regressive Next-Token Predictors are Universal Learners
Eran Malach
Watermarking Conditional Text Generation for AI Detection: Unveiling Challenges and a Semantic-Aware Watermark Remedy
Yu Fu, Deyi Xiong, Yue Dong
A Recipe for Scaling up Text-to-Video Generation with Text-free Videos
Xiang Wang, Shiwei Zhang, Hangjie Yuan et al.
SALMON: Self-Alignment with Instructable Reward Models
Zhiqing Sun, Yikang Shen, Hongxin Zhang et al.
Dynamic Evaluation of Large Language Models by Meta Probing Agents
Kaijie Zhu, Jindong Wang, Qinlin Zhao et al.
Seer: Language Instructed Video Prediction with Latent Diffusion Models
Xianfan Gu, Chuan Wen, Weirui Ye et al.
See and Think: Embodied Agent in Virtual Environment
Zhonghan Zhao, Xuan Wang, Wenhao Chai et al.
Shadows Don't Lie and Lines Can't Bend! Generative Models don't know Projective Geometry...for now
Ayush Sarkar, Hanlin Mai, Amitabh Mahapatra et al.
SecondPose: SE(3)-Consistent Dual-Stream Feature Fusion for Category-Level Pose Estimation
Yamei Chen, Yan Di, Guangyao Zhai et al.
Pruner-Zero: Evolving Symbolic Pruning Metric From Scratch for Large Language Models
Peijie Dong, Lujun Li, Zhenheng Tang et al.
Towards Text-guided 3D Scene Composition
Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin et al.
Local Search GFlowNets
Minsu Kim, Yun Taeyoung, Emmanuel Bengio et al.
Lookahead Exploration with Neural Radiance Representation for Continuous Vision-Language Navigation
Zihan Wang, Xiangyang Li, Jiahao Yang et al.
Open-Vocabulary Segmentation with Semantic-Assisted Calibration
Yong Liu, Sule Bai, Guanbin Li et al.
Dense Optical Tracking: Connecting the Dots
Guillaume Le Moing, Jean Ponce, Cordelia Schmid
Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains
Junhong Shen, Neil Tenenholtz, James Hall et al.
Visual In-Context Prompting
Feng Li, Qing Jiang, Hao Zhang et al.
VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models
Junlin Han, Filippos Kokkinos, Philip Torr
SPOC: Imitating Shortest Paths in Simulation Enables Effective Navigation and Manipulation in the Real World
Kiana Ehsani, Tanmay Gupta, Rose Hendrix et al.
LatestEval: Addressing Data Contamination in Language Model Evaluation through Dynamic and Time-Sensitive Test Construction
Yucheng Li, Frank Guerin, Chenghua Lin
A Semi-supervised Nighttime Dehazing Baseline with Spatial-Frequency Aware and Realistic Brightness Constraint
Xiaofeng Cong, Jie Gui, Jing Zhang et al.
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
Shuming Liu, Chenlin Zhang, Chen Zhao et al.
A Study of Bayesian Neural Network Surrogates for Bayesian Optimization
Yucen Li, Tim G. J. Rudner, Andrew Gordon Wilson
Swallowing the Bitter Pill: Simplified Scalable Conformer Generation
Yuyang Wang, Ahmed Elhag, Navdeep Jaitly et al.
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
Zeyu Liu, Weicong Liang, Zhanhao Liang et al.
Relightful Harmonization: Lighting-aware Portrait Background Replacement
Mengwei Ren, Wei Xiong, Jae Shin Yoon et al.
MapTracker: Tracking with Strided Memory Fusion for Consistent Vector HD Mapping
Jiacheng Chen, Yuefan Wu, Tan Jiaqi et al.
Embodied Understanding of Driving Scenarios
Yunsong Zhou, Linyan Huang, Qingwen Bu et al.
MVSFormer++: Revealing the Devil in Transformer's Details for Multi-View Stereo
Chenjie Cao, Xinlin Ren, Yanwei Fu
VideoStudio: Generating Consistent-Content and Multi-Scene Videos
Fuchen Long, Zhaofan Qiu, Ting Yao et al.
OMNI: Open-endedness via Models of human Notions of Interestingness
Jenny Zhang, Joel Lehman, Kenneth Stanley et al.
Panoptic Scene Graph Generation with Semantics-Prototype Learning
Li Li, Wei Ji, Yiming Wu et al.
Relax Image-Specific Prompt Requirement in SAM: A Single Generic Prompt for Segmenting Camouflaged Objects
Jian Hu, Jiayi Lin, Shaogang Gong et al.
GoMAvatar: Efficient Animatable Human Modeling from Monocular Video Using Gaussians-on-Mesh
Jing Wen, Xiaoming Zhao, Jason Ren et al.
MultiDiff: Consistent Novel View Synthesis from a Single Image
Norman Müller, Katja Schwarz, Barbara Roessle et al.
Consistent Prompting for Rehearsal-Free Continual Learning
Zhanxin Gao, Jun Cen, Xiaobin Chang
Mechanistic Design and Scaling of Hybrid Architectures
Michael Poli, Armin Thomas, Eric Nguyen et al.
Visual Instruction Tuning with Polite Flamingo
Delong Chen, Jianfeng Liu, Wenliang Dai et al.
GAMC: An Unsupervised Method for Fake News Detection Using Graph Autoencoder with Masking
Shu Yin, Peican Zhu, Lianwei Wu et al.
Protein Discovery with Discrete Walk-Jump Sampling
Nathan Frey, Dan Berenberg, Karina Zadorozhny et al.
EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion
Guangyao Zhai, Evin Pınar Örnek, Dave Zhenyu Chen et al.
Training Diffusion Models Towards Diverse Image Generation with Reinforcement Learning
Zichen Miao, Jiang Wang, Ze Wang et al.
GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes
Ibrahim Ethem Hamamci, Sezgin Er, Anjany Sekuboyina et al.
MOTOR: A Time-to-Event Foundation Model For Structured Medical Records
Ethan Steinberg, Jason Fries, Yizhe Xu et al.
Domain-Agnostic Mutual Prompting for Unsupervised Domain Adaptation
Zhekai Du, Xinyao Li, Fengling Li et al.
Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge
Xuan Shen, Peiyan Dong, Lei Lu et al.
Learning Multi-Agent Communication from Graph Modeling Perspective
Shengchao Hu, Li Shen, Ya Zhang et al.
Towards Scalable 3D Anomaly Detection and Localization: A Benchmark via 3D Anomaly Synthesis and A Self-Supervised Learning Network
Wenqiao Li, Xiaohao Xu, Yao Gu et al.
EarthVQA: Towards Queryable Earth via Relational Reasoning-Based Remote Sensing Visual Question Answering
Junjue Wang, Zhuo Zheng, Zihang Chen et al.
SQLdepth: Generalizable Self-Supervised Fine-Structured Monocular Depth Estimation
Dong Wu, Mingmin Chi, Xuan Zang et al.
Few-Shot Object Detection with Foundation Models
Guangxing Han, Ser-Nam Lim
A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment
Tianhe Wu, Kede Ma, Jie Liang et al.
Particle Denoising Diffusion Sampler
Angus Phillips, Hai-Dang Dau, Michael Hutchinson et al.
A Comparative Study of Image Restoration Networks for General Backbone Network Design
Xiangyu Chen, Zheyuan Li, Yuandong Pu et al.
Prot2Text: Multimodal Protein’s Function Generation with GNNs and Transformers
Hadi Abdine, Michail Chatzianastasis, Costas Bouyioukos et al.
Graph Neural Networks for Learning Equivariant Representations of Neural Networks
Miltiadis (Miltos) Kofinas, Boris Knyazev, Yan Zhang et al.
WOUAF: Weight Modulation for User Attribution and Fingerprinting in Text-to-Image Diffusion Models
Changhoon Kim, Kyle Min, Maitreya Patel et al.
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training
David Wan, Jaemin Cho, Elias Stengel-Eskin et al.
DéjàVu: KV-cache Streaming for Fast, Fault-tolerant Generative LLM Serving
Foteini Strati, Sara McAllister, Amar Phanishayee et al.
Text-Image Alignment for Diffusion-Based Perception
Neehar Kondapaneni, Markus Marks, Manuel Knott et al.
AquaLoRA: Toward White-box Protection for Customized Stable Diffusion Models via Watermark LoRA
Weitao Feng, Wenbo Zhou, Jiyan He et al.
Knowledge Card: Filling LLMs' Knowledge Gaps with Plug-in Specialized Language Models
Shangbin Feng, Weijia Shi, Yuyang Bai et al.
ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation
Suraj Patni, Aradhye Agarwal, Chetan Arora
Direct2.5: Diverse Text-to-3D Generation via Multi-view 2.5D Diffusion
Yuanxun Lu, Jingyang Zhang, Shiwei Li et al.
Learning to Embed Time Series Patches Independently
Seunghan Lee, Taeyoung Park, Kibok Lee
SGNet: Structure Guided Network via Gradient-Frequency Awareness for Depth Map Super-resolution
Zhengxue Wang, Zhiqiang Yan, Jian Yang
A Diffusion Model Framework for Unsupervised Neural Combinatorial Optimization
Sebastian Sanokowski, Sepp Hochreiter, Sebastian Lehner
Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark
Fangjun Li, David C. Hogg, Anthony G. Cohn
Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector
Yuqian Fu, Yu Wang, Yixuan Pan et al.
PointOBB: Learning Oriented Object Detection via Single Point Supervision
Junwei Luo, Xue Yang, Yi Yu et al.
FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
Ganggui Ding, Canyu Zhao, Wen Wang et al.
Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-Modal Structured Representations
Yufeng Huang, Jiji Tang, Zhuo Chen et al.
FaceTalk: Audio-Driven Motion Diffusion for Neural Parametric Head Models
Shivangi Aneja, Justus Thies, Angela Dai et al.
ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities
Chenming Zhu, Tai Wang, Wenwei Zhang et al.
AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with TikZ
Jonas Belouadi, Anne Lauscher, Steffen Eger
Sequential Neural Score Estimation: Likelihood-Free Inference with Conditional Score Based Diffusion Models
Louis Sharrock, Jack Simons, Song Liu et al.
Enhancing Multimodal Cooperation via Sample-level Modality Valuation
Yake Wei, Ruoxuan Feng, Zihe Wang et al.
MolCRAFT: Structure-Based Drug Design in Continuous Parameter Space
Yanru Qu, Keyue Qiu, Yuxuan Song et al.
S2WAT: Image Style Transfer via Hierarchical Vision Transformer Using Strips Window Attention
Chiyu Zhang, Xiaogang Xu, Lei Wang et al.
SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation
Aysim Toker, Marvin Eisenberger, Daniel Cremers et al.
Aleth-NeRF: Illumination Adaptive NeRF with Concealing Field Assumption
Ziteng Cui, Lin Gu, Xiao Sun et al.
SFC: Shared Feature Calibration in Weakly Supervised Semantic Segmentation
Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation
Haofeng Liu, Chenshu Xu, Yifei Yang et al.
Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving
Junhao Zheng, Chenhao Lin, Jiahao Sun et al.
Cooperative Graph Neural Networks
Ben Finkelshtein, Xingyue Huang, Michael Bronstein et al.
M3D: Dataset Condensation by Minimizing Maximum Mean Discrepancy
Hansong Zhang, Shikun Li, Pengju Wang et al.
Intriguing Properties of Generative Classifiers
Priyank Jaini, Kevin Clark, Robert Geirhos
Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention
Jie Ren, Yaxin Li, Shenglai Zeng et al.
A Framework and Benchmark for Deep Batch Active Learning for Regression
David Holzmüller, Viktor Zaverkin, Johannes Kästner et al.
Premise Order Matters in Reasoning with Large Language Models
Xinyun Chen, Ryan Chi, Xuezhi Wang et al.
Prompt to Transfer: Sim-to-Real Transfer for Traffic Signal Control with Prompt Learning
Longchao Da, Minquan Gao, Hua Wei et al.
CFPL-FAS: Class Free Prompt Learning for Generalizable Face Anti-spoofing
Ajian Liu, Shuai Xue, Gan Jianwen et al.
ArtAdapter: Text-to-Image Style Transfer using Multi-Level Style Encoder and Explicit Adaptation
Dar-Yen Chen, Hamish Tennent, Ching-Wen Hsu
GPAvatar: Generalizable and Precise Head Avatar from Image(s)
Xuangeng Chu, Yu Li, Ailing Zeng et al.
Spatial Transform Decoupling for Oriented Object Detection
Hongtian Yu, Yunjie Tian, Qixiang Ye et al.
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models
Mengkang Hu, Yao Mu, Xinmiao Yu et al.
Understanding the Role of the Projector in Knowledge Distillation
Describing Differences in Image Sets with Natural Language
Lisa Dunlap, Yuhui Zhang, Xiaohan Wang et al.
MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
Yining Hong, Zishuo Zheng, Peihao Chen et al.
BigGait: Learning Gait Representation You Want by Large Vision Models
Dingqiang Ye, Chao Fan, Jingzhe Ma et al.
Consistent3D: Towards Consistent High-Fidelity Text-to-3D Generation with Deterministic Sampling Prior
Zike Wu, Pan Zhou, Yi Xuanyu et al.
SubT-MRS Dataset: Pushing SLAM Towards All-weather Environments
Shibo Zhao, Yuanjun Gao, Tianhao Wu et al.
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
Shiyu Xuan, Qingpei Guo, Ming Yang et al.
Soft Contrastive Learning for Time Series
Seunghan Lee, Taeyoung Park, Kibok Lee
DiffusionLight: Light Probes for Free by Painting a Chrome Ball
Pakkapon Phongthawee, Worameth Chinchuthakun, Nontaphat Sinsunthithet et al.
AvatarGPT: All-in-One Framework for Motion Understanding, Planning, Generation and Beyond
Zixiang Zhou, Yu Wan, Baoyuan Wang
UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation
Zexiang Liu, Yangguang Li, Youtian Lin et al.
Privacy-Preserving In-Context Learning for Large Language Models
Tong Wu, Ashwinee Panda, Jiachen (Tianhao) Wang et al.
JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation
Yu Zeng, Vishal M. Patel, Haochen Wang et al.
High-Order Structure Based Middle-Feature Learning for Visible-Infrared Person Re-identification
Liuxiang Qiu, Si Chen, Yan Yan et al.
CONFORM: Contrast is All You Need for High-Fidelity Text-to-Image Diffusion Models
Tuna Han Salih Meral, Enis Simsar, Federico Tombari et al.
FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation
Shuai Yang, Yifan Zhou, Ziwei Liu et al.
Challenging Forgets: Unveiling the Worst-Case Forget Sets in Machine Unlearning
Chongyu Fan, Jiancheng Liu, Alfred Hero et al.
DI-V2X: Learning Domain-Invariant Representation for Vehicle-Infrastructure Collaborative 3D Object Detection
Li Xiang, Junbo Yin, Wei Li et al.
UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity
Jialong Zuo, Hanyu Zhou, Ying Nie et al.
From Zero to Turbulence: Generative Modeling for 3D Flow Simulation
Marten Lienen, David Lüdke, Jan Hansen-Palmus et al.
On the Test-Time Zero-Shot Generalization of Vision-Language Models: Do We Really Need Prompt Learning?
Maxime Zanella, Ismail Ben Ayed
GeminiFusion: Efficient Pixel-wise Multimodal Fusion for Vision Transformer
Ding Jia, Jianyuan Guo, Kai Han et al.
SelfIE: Self-Interpretation of Large Language Model Embeddings
Haozhe Chen, Carl Vondrick, Chengzhi Mao
Jack of All Tasks, Master of Many: Designing General-Purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick, Guangxing Han, Rui Hou et al.
Score Regularized Policy Optimization through Diffusion Behavior
Huayu Chen, Cheng Lu, Zhengyi Wang et al.
Tri-Perspective View Decomposition for Geometry-Aware Depth Completion
Zhiqiang Yan, Yuankai Lin, Kun Wang et al.
PIGEON: Predicting Image Geolocations
Lukas Haas, Michal Skreta, Silas Alberti et al.
LogFormer: A Pre-train and Tuning Pipeline for Log Anomaly Detection
Hongcheng Guo, Jian Yang, Jiaheng Liu et al.
Neural Implicit Representation for Building Digital Twins of Unknown Articulated Objects
Yijia Weng, Bowen Wen, Jonathan Tremblay et al.
Dual Memory Networks: A Versatile Adaptation Approach for Vision-Language Models
Yabin Zhang, Wenjie Zhu, Hui Tang et al.
Scaling Exponents Across Parameterizations and Optimizers
Katie Everett, Lechao Xiao, Mitchell Wortsman et al.
GraphBEV: Towards Robust BEV Feature Alignment for Multi-Modal 3D Object Detection
Ziying Song, Lei Yang, Shaoqing Xu et al.
ArGue: Attribute-Guided Prompt Tuning for Vision-Language Models
Xinyu Tian, Shu Zou, Zhaoyuan Yang et al.
PIE-NeRF: Physics-based Interactive Elastodynamics with NeRF
Yutao Feng, Yintong Shang, Xuan Li et al.
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models
Jiayi Guo, Xingqian Xu, Yifan Pu et al.
Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in low-data regimes
Nabeel Seedat, Nicolas Huynh, Boris van Breugel et al.
ReMoS: 3D Motion-Conditioned Reaction Synthesis for Two-Person Interactions
Anindita Ghosh, Rishabh Dabral, Vladislav Golyanik et al.
FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis
Linjiang Huang, Rongyao Fang, Aiping Zhang et al.
Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation
Lanqing Guo, Yingqing He, Haoxin Chen et al.
LCM-Lookahead for Encoder-based Text-to-Image Personalization
Rinon Gal, Or Lichter, Elad Richardson et al.
Deep Patch Visual SLAM
Lahav Lipson, Zachary Teed, Jia Deng
Language-driven All-in-one Adverse Weather Removal
Hao Yang, Liyuan Pan, Yan Yang et al.
MVHumanNet: A Large-scale Dataset of Multi-view Daily Dressing Human Captures
Zhangyang Xiong, Chenghong Li, Kenkun Liu et al.
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jieneng Chen, Qihang Yu, Xiaohui Shen et al.
GVGEN: Text-to-3D Generation with Volumetric Representation
Xianglong He, Junyi Chen, Sida Peng et al.
Retrieval-Augmented Egocentric Video Captioning
Jilan Xu, Yifei Huang, Junlin Hou et al.
One-Shot Open Affordance Learning with Foundation Models
Gen Li, Deqing Sun, Laura Sevilla-Lara et al.
Strong Baselines for Parameter-Efficient Few-Shot Fine-Tuning
Samyadeep Basu, Shell Hu, Daniela Massiceti et al.
BESA: Pruning Large Language Models with Blockwise Parameter-Efficient Sparsity Allocation
Peng Xu, Wenqi Shao, Mengzhao Chen et al.
CUTS+: High-Dimensional Causal Discovery from Irregular Time-Series
Yuxiao Cheng, Lianglong Li, Tingxiong Xiao et al.
An Efficient Membership Inference Attack for the Diffusion Model by Proximal Initialization
Fei Kong, Jinhao Duan, Ruipeng Ma et al.
Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications
Junyi Ma, Xieyuanli Chen, Jiawei Huang et al.
Discovering and Mitigating Visual Biases through Keyword Explanation
Younghyun Kim, Sangwoo Mo, Minkyu Kim et al.
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
Sijie Cheng, Zhicheng Guo, Jingwen Wu et al.
Neural Markov Random Field for Stereo Matching
Tongfan Guan, Chen Wang, Yun-Hui Liu
SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection
Junsu Kim, Hoseong Cho, Jihyeon Kim et al.
DAP: A Dynamic Adversarial Patch for Evading Person Detectors
Amira Guesmi, Ruitian Ding, Muhammad Abdullah Hanif et al.
Hybrid Internal Model: Learning Agile Legged Locomotion with Simulated Robot Response
Junfeng Long, ZiRui Wang, Quanyi Li et al.
Towards Memorization-Free Diffusion Models
Chen Chen, Daochang Liu, Chang Xu
Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection
Christos Koutlis, Symeon Papadopoulos
Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model
Lingjun Zhang, Xinyuan Chen, Yaohui Wang et al.
MM-Narrator: Narrating Long-form Videos with Multimodal In-Context Learning
Chaoyi Zhang, Kevin Lin, Zhengyuan Yang et al.
Matching Anything by Segmenting Anything
Siyuan Li, Lei Ke, Martin Danelljan et al.
CoMo: Controllable Motion Generation through Language Guided Pose Code Editing
Yiming Huang, Weilin Wan, Yue Yang et al.
Large-Vocabulary 3D Diffusion Model with Transformer
Ziang Cao, Fangzhou Hong, Tong Wu et al.
Dynamic Semantic-Based Spatial Graph Convolution Network for Skeleton-Based Human Action Recognition
Jianyang Xie, Yanda Meng, Yitian Zhao et al.
SC4D: Sparse-Controlled Video-to-4D Generation and Motion Transfer
Zijie Wu, Chaohui Yu, Yanqin Jiang et al.
Patched Denoising Diffusion Models For High-Resolution Image Synthesis
Zheng Ding, Mengqi Zhang, Jiajun Wu et al.
Correlation-aware Coarse-to-fine MLPs for Deformable Medical Image Registration
Mingyuan Meng, Dagan Feng, Lei Bi et al.
Machine Unlearning for Image-to-Image Generative Models
Guihong Li, Hsiang Hsu, Chun-Fu Chen et al.
Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks
Yuhao Liu, Zhanghan Ke, Fang Liu et al.
LLM Augmented LLMs: Expanding Capabilities through Composition
Rachit Bansal, Bidisha Samanta, Siddharth Dalmia et al.
MatSynth: A Modern PBR Materials Dataset
Giuseppe Vecchio, Valentin Deschaintre
MaskClustering: View Consensus based Mask Graph Clustering for Open-Vocabulary 3D Instance Segmentation
Mi Yan, Jiazhao Zhang, Yan Zhu et al.
ReMamber: Referring Image Segmentation with Mamba Twister
Yuhuan Yang, Chaofan Ma, Jiangchao Yao et al.
MatFuse: Controllable Material Generation with Diffusion Models
Giuseppe Vecchio, Renato Sortino, Simone Palazzo et al.
Protein Conformation Generation via Force-Guided SE(3) Diffusion Models
Yan Wang, Lihao Wang, Yuning Shen et al.
Frozen Transformers in Language Models Are Effective Visual Encoder Layers
Ziqi Pang, Ziyang Xie, Yunze Man et al.