Most Cited 2024 "conceptual generalization" Papers
Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models
Siao Tang, Xin Wang, Hong Chen et al.
DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control
Xinyu Xu, Shengcheng Luo, Yanchao Yang et al.
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Shilong Liu, Hao Cheng, Haotian Liu et al.
Align before Collaborate: Mitigating Feature Misalignment for Robust Multi-Agent Perception
Dingkang Yang, Ke Li, Dongling Xiao et al.
Textual Query-Driven Mask Transformer for Domain Generalized Segmentation
Byeonghyun Pak, Byeongju Woo, Sunghwan Kim et al.
Arbitrary-Scale Video Super-Resolution with Structural and Textural Priors
Wei Shang, Dongwei Ren, Wanying Zhang et al.
Regulating Model Reliance on Non-Robust Features by Smoothing Input Marginal Density
Peiyu Yang, Naveed Akhtar, Shah Mubarak et al.
Combining Generative and Geometry Priors for Wide-Angle Portrait Correction
Lan Yao, Chaofeng Chen, Xiaoming Li et al.
To Generate or Not? Safety-Driven Unlearned Diffusion Models Are Still Easy To Generate Unsafe Images ... For Now
Yimeng Zhang, Jinghan Jia, Xin Chen et al.
StereoGlue: Joint Feature Matching and Robust Estimation
Daniel Barath, Dmytro Mishkin, Luca Cavalli et al.
MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model
Muyao Niu, Xiaodong Cun, Xintao Wang et al.
Object-Aware NIR-to-Visible Translation
Yunyi Gao, Lin Gu, Qiankun Liu et al.
DualDn: Dual-domain Denoising via Differentiable ISP
Ruikang Li, Yujin Wang, Shiqi Chen et al.
Syn-to-Real Domain Adaptation for Point Cloud Completion via Part-based Approach
Yunseo Yang, Jihun Kim, Kuk-Jin Yoon
Finding Meaning in Points: Weakly Supervised Semantic Segmentation for Event Cameras
Hoonhee Cho, Sung-Hoon Yoon, Hyeokjun Kweon et al.
StableDrag: Stable Dragging for Point-based Image Editing
Yutao Cui, Xiaotong Zhao, Guozhen Zhang et al.
Phase Concentration and Shortcut Suppression for Weakly Supervised Semantic Segmentation
Hoyong Kwon, Jaeseok Jeong, Sung-Hoon Yoon et al.
Embedding-Free Transformer with Inference Spatial Reduction for Efficient Semantic Segmentation
Hyunwoo Yu, Yubin Cho, Beoungwoo Kang et al.
AdaIFL: Adaptive Image Forgery Localization via a Dynamic and Importance-aware Transformer Network
Yuxi Li, Fuyuan Cheng, Wangbo Yu et al.
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
Haobo Yuan, Xiangtai Li, Chong Zhou et al.
Event-based Head Pose Estimation: Benchmark and Method
Jiahui Yuan, Hebei Li, Yansong Peng et al.
Robustness Tokens: Towards Adversarial Robustness of Transformers
Brian Pulfer, Yury Belousov, Slava Voloshynovskiy
EINet: Point Cloud Completion via Extrapolation and Interpolation
Pingping Cai, Canyu Zhang, Lingjia Shi et al.
Bridging the Gap Between Human Motion and Action Semantics via Kinematics Phrases
Xinpeng Liu, Yong-Lu Li, Ailing Zeng et al.
ReCON: Training-Free Acceleration for Text-to-Image Synthesis with Retrieval of Concept Prompt Trajectories
Chen-yi Lu, Shubham Agarwal, Mehrab Tanjim et al.
Free-ATM: Harnessing Free Attention Masks for Representation Learning on Diffusion-Generated Images
Junhao Zhang, Mutian Xu, Jay Zhangjie Wu et al.
Learning Unsigned Distance Functions from Multi-view Images with Volume Rendering Priors
Wen Yuan Zhang, Kanle Shi, Yushen Liu et al.
Assessing Sample Quality via the Latent Space of Generative Models
Jingyi Xu, Hieu Le, Dimitris Samaras
Responsible Visual Editing
Minheng Ni, Yeli Shen, Yabin Zhang et al.
Distributed Active Client Selection With Noisy Clients Using Model Association Scores
Kwang In Kim
SCPNet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning
Runmin Zhang, Jun Ma, Lun Luo et al.
Clearer Frames, Anytime: Resolving Velocity Ambiguity in Video Frame Interpolation
Zhihang Zhong, Gurunandan Krishnan, Xiao Sun et al.
MotionDirector: Motion Customization of Text-to-Video Diffusion Models
Rui Zhao, Yuchao Gu, Jay Zhangjie Wu et al.
ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction
Shaozhe Hao, Kai Han, Zhengyao Lv et al.
AttentionHand: Text-driven Controllable Hand Image Generation for 3D Hand Reconstruction in the Wild
Junho Park, Kyeongbo Kong, Suk-Ju Kang
OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving
Guoqing Wang, Zhongdao Wang, Pin Tang et al.
Probabilistic Image-Driven Traffic Modeling via Remote Sensing
Scott Workman, Armin Hadzic
VideoStudio: Generating Consistent-Content and Multi-Scene Videos
Fuchen Long, Zhaofan Qiu, Ting Yao et al.
Occupancy as Set of Points
Yiang Shi, Tianheng Cheng, Qian Zhang et al.
UAV First-Person Viewers Are Radiance Field Learners
Liqi Yan, Qifan Wang, Junhan Zhao et al.
Knowledge-enhanced Visual-Language Pretraining for Computational Pathology
Xiao Zhou, Xiaoman Zhang, Chaoyi Wu et al.
Pick-a-back: Selective Device-to-Device Knowledge Transfer in Federated Continual Learning
JinYi Yoon, HyungJune Lee
Situated Instruction Following
So Yeon Min, Xavier Puig, Devendra Singh Chaplot et al.
Holodepth: Programmable Depth-Varying Projection via Computer-Generated Holography
Dorian Chan, Matthew O'Toole, Sizhuo Ma et al.
Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators
Yifan Pu, Xia Zhuofan, Jiayi Guo et al.
Two-Stage Video Shadow Detection via Temporal-Spatial Adaption
Xin Duan, Yu Cao, Lei Zhu et al.
Semi-Supervised Video Desnowing Network via Temporal Decoupling Experts and Distribution-Driven Contrastive Regularization
Hongtao Wu, Yijun Yang, Angelica I Aviles-Rivero et al.
CLIP-DINOiser: Teaching CLIP a few DINO tricks for open-vocabulary semantic segmentation
Monika Wysoczanska, Oriane Siméoni, Michaël Ramamonjisoa et al.
FMBoost: Boosting Latent Diffusion with Flow Matching
Johannes Schusterbauer-Fischer, Ming Gui, Pingchuan Ma et al.
M^2Depth: Self-supervised Two-Frame Multi-camera Metric Depth Estimation
Yingshuang Zou, Yikang Ding, Xi Qiu et al.
FoundPose: Unseen Object Pose Estimation with Foundation Features
Evin Pınar Örnek, Yann Labbé, Bugra Tekin et al.
Diffusion Models as Data Mining Tools
Ioannis Siglidis, Aleksander Holynski, Alexei Efros et al.
SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models
Ziyi Lin, Dongyang Liu, Renrui Zhang et al.
Improving Adversarial Transferability via Model Alignment
Avery Ma, Amir-massoud Farahmand, Yangchen Pan et al.
RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios
Wenhao Ding, Yulong Cao, Ding Zhao et al.
Embodied Understanding of Driving Scenarios
Yunsong Zhou, Linyan Huang, Qingwen Bu et al.
Factorizing Text-to-Video Generation by Explicit Image Conditioning
Rohit Girdhar, Mannat Singh, Andrew Brown et al.
Computing the Lipschitz constant needed for fast scene recovery from CASSI measurements
Niels Chr. Overgaard, Anders Holst
DatasetNeRF: Efficient 3D-aware Data Factory with Generative Radiance Fields
Yu Chi, Fangneng Zhan, Sibo Wu et al.
Cut out the Middleman: Revisiting Pose-based Gait Recognition
Yang Fu, Saihui Hou, Shibei Meng et al.
FedHARM: Harmonizing Model Architectural Diversity in Federated Learning
Anestis Kastellos, Athanasios Psaltis, Charalampos Z Patrikakis et al.
Thinking Outside the BBox: Unconstrained Generative Object Compositing
Gemma Canet Tarrés, Zhe Lin, Zhifei Zhang et al.
EAGLES: Efficient Accelerated 3D Gaussians with Lightweight EncodingS
Sharath Girish, Kamal Gupta, Abhinav Shrivastava
Caltech Aerial RGB-Thermal Dataset in the Wild
Connor Lee, Matthew Anderson, Nikhil Ranganathan et al.
UL-VIO: Ultra-lightweight Visual-Inertial Odometry with Noise Robust Test-time Adaptation
Jinho Park, Se Young Chun, Mingoo Seok
DEPICT: Diffusion-Enabled Permutation Importance for Image Classification Tasks
Sarah Jabbour, Gregory Kondas, Ella Kazerooni et al.
Meerkat: Audio-Visual Large Language Model for Grounding in Space and Time
Sanjoy Chowdhury, Sayan Nag, Subhrajyoti Dasgupta et al.
BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion
Gwanghyun Kim, Hayeon Kim, Hoigi Seo et al.
MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning
Vishal Nedungadi, Ankit Kariryaa, Stefan Oehmcke et al.
All You Need is Your Voice: Emotional Face Representation with Audio Perspective for Emotional Talking Face Generation
Seongho Kim, Byung Cheol Song
POET: Prompt Offset Tuning for Continual Human Action Adaptation
Prachi Garg, Joseph K J, Vineeth N Balasubramanian et al.
TrafficNight: An Aerial Multimodal Benchmark For Nighttime Vehicle Surveillance
Guoxing Zhang, Yiming Liu, Xiaoyu Yang et al.
Loc3Diff: Local Diffusion for 3D Human Head Synthesis and Editing
Yushi Lan, Feitong Tan, Qiangeng Xu et al.
Learning to Distinguish Samples for Generalized Category Discovery
Fengxiang Yang, Pu Nan, Wenjing Li et al.
COM Kitchens: An Unedited Overhead-view Procedural Videos Dataset as a Vision-Language Benchmark
Atsushi Hashimoto, Koki Maeda, Tosho Hirasawa et al.
WBP: Training-time Backdoor Attacks through Hardware-based Weight Bit Poisoning
Kunbei Cai, Zhenkai Zhang, Qian Lou et al.
Towards Dual Transparent Liquid Level Estimation in Biomedical Lab: Dataset, Methods and Practice
Xiayu Wang, Ke Ma, Ruiyun Zhong et al.
Delving into Adversarial Robustness on Document Tampering Localization
Huiru Shao, Zhuang Qian, Kaizhu Huang et al.
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
Xiangxiang Chu, Jianlin Su, Bo Zhang et al.
HVCLIP: High-dimensional Vector in CLIP for Unsupervised Domain Adaptation
Noranart Vesdapunt, Kah Kuen Fu, Yue Wu et al.
Improving 3D Semi-supervised Learning by Effectively Utilizing All Unlabelled Data
Sneha Paul, Zachary Patterson, Nizar Bouguila
Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Visual-Concept Alignment and Retention
Zuyao Chen, Jinlin Wu, Zhen Lei et al.
MART: MultiscAle Relational Transformer Networks for Multi-agent Trajectory Prediction
Seongju Lee, Junseok Lee, Yeonguk Yu et al.
SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision
Ankit Vani, Bac Nguyen, Samuel Lavoie et al.
Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling
Zixiao Wang, Hongtao Xie, Yuxin Wang et al.
Towards Robust Event-based Networks for Nighttime via Unpaired Day-to-Night Event Translation
Yuhwan Jeong, Hoonhee Cho, Kuk-Jin Yoon
CLAMP-ViT: Contrastive Data-Free Learning for Adaptive Post-Training Quantization of ViTs
Akshat Ramachandran, Souvik Kundu, Tushar Krishna
A Riemannian Approach for Spatiotemporal Analysis and Generation of 4D Tree-shaped Structures
Tahmina Khanam, Mohammed Bennamoun, Guan Wang et al.
Robustness Preserving Fine-tuning using Neuron Importance
Guangrui Li, Rahul Duggal, Aaditya Singh et al.
ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation
Mengcheng Lan, Chaofeng Chen, Yiping Ke et al.
Similarity of Neural Architectures using Adversarial Attack Transferability
Jaehui Hwang, Dongyoon Han, Byeongho Heo et al.
Dual-Rain: Video Rain Removal using Assertive and Gentle Teachers
Tingting Chen, Beibei Lin, Yeying Jin et al.
Unsupervised Variational Translator for Bridging Image Restoration and High-Level Vision Tasks
Jiawei Wu, Zhi Jin
Fast Point Cloud Geometry Compression with Context-based Residual Coding and INR-based Refinement
Hao Xu, Xi Zhang, Xiaolin Wu
Scene-Conditional 3D Object Stylization and Composition
Jinghao Zhou, Tomas Jakab, Philip Torr et al.
Forbes: Face Obfuscation Rendering via Backpropagation Refinement Scheme
Jintae Kim, Seungwon Yang, Seong-Gyun Jeong et al.
Information Bottleneck Based Data Correction in Continual Learning
Shuai Chen, Mingyi Zhang, Junge Zhang et al.
SAFT: Towards Out-of-Distribution Generalization in Fine-Tuning
Bac Nguyen, Stefan Uhlich, Fabien Cardinaux et al.
Generalizing to Unseen Domains via Text-guided Augmentation
Daiqing Qi, Handong Zhao, Aidong Zhang et al.
Contextual Correspondence Matters: Bidirectional Graph Matching for Video Summarization
Yunzuo Zhang, Yameng Liu
Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models
Juntu Zhao, Junyu Deng, Yixin Ye et al.
Adaptive Multi-head Contrastive Learning
Lei Wang, Piotr Koniusz, Tom Gedeon et al.
Easing 3D Pattern Reasoning with Side-view Features for Semantic Scene Completion
Linxi Huan, Mingyue Dong, Linwei Yue et al.
Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression
Animesh Sinha, Bo Sun, Anmol Kalia et al.
High-Quality Mesh Blendshape Generation from Face Videos via Neural Inverse Rendering
Xin Ming, Jiawei Li, Jingwang Ling et al.
Early Anticipation of Driving Maneuvers
Abdul Wasi Lone, Shankar Gangisetty, Shyam Nandan et al.
SG-NeRF: Neural Surface Reconstruction with Scene Graph Optimization
Yiyang Chen, Siyan Dong, Xulong Wang et al.
InfoNorm: Mutual Information Shaping of Normals for Sparse-View Reconstruction
Xulong Wang, Siyan Dong, Youyi Zheng et al.
DreamReward: Aligning Human Preference in Text-to-3D Generation
Junliang Ye, Fangfu Liu, Qixiu Li et al.
CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches
Sifan Wu, Amir Hosein Khasahmadi, Mor Katz et al.
Towards Image Ambient Lighting Normalization
Florin-Alexandru Vasluianu, Tim Seizinger, Zongwei Wu et al.
FedHide: Federated Learning by Hiding in the Neighbors
Hyunsin Park, Sungrack Yun
HoloADMM: High-Quality Holographic Complex Field Recovery
Mazen Mel, Paul Springer, Pietro Zanuttigh et al.
Towards Open-World Object-based Anomaly Detection via Self-Supervised Outlier Synthesis
Brian Isaac Medina, Yona Falinie Abdul Gaus, Neelanjan Bhowmik et al.
LPViT: Low-Power Semi-structured Pruning for Vision Transformers
Kaixin Xu, Zhe Wang, Chunyun Chen et al.
Weighted Ensemble Models Are Strong Continual Learners
Imad Eddine Marouf, Subhankar Roy, Enzo Tartaglione et al.
GGRt: Towards Generalizable 3D Gaussians without Pose Priors in Real-Time
Hao Li, Yuanyuan Gao, Dingwen Zhang et al.
Learning Equilibrium Transformation for Gamut Expansion and Color Restoration
Jun Xiao, Changjian Shui, Zhi-Song Liu et al.
Physics-informed Knowledge Transfer for Underwater Monocular Depth Estimation
Jinghe Yang, Mingming Gong, Ye Pu
Robust Nearest Neighbors for Source-Free Domain Adaptation under Class Distribution Shift
Antonio Tejero-de-Pablos, Riku Togashi, Mayu Otani et al.
Chains of Diffusion Models
Yanheng Wei, Lianghua Huang, Zhi-Fan Wu et al.
Learning Neural Deformation Representation for 4D Dynamic Shape Generation
Gyojin Han, Jiwan Hur, Jaehyun Choi et al.
LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models
Yabin Zhang, Wenjie Zhu, Chenhang He et al.
Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection
Christos Koutlis, Symeon Papadopoulos
Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance
Donghoon Ahn, Hyoungwon Cho, Jaewon Min et al.
Oulu Remote-photoplethysmography Physical Domain Attacks Database (ORPDAD)
Marko Savic, Guoying Zhao
DoubleTake: Geometry Guided Depth Estimation
Mohamed Sayed, Filippo Aleotti, Jamie Watson et al.
Close, But Not There: Boosting Geographic Distance Sensitivity in Visual Place Recognition
Sergio Izquierdo, Javier Civera
TAG: Text Prompt Augmentation for Zero-Shot Out-of-Distribution Detection
Xixi Liu, Christopher Zach
Continual Learning and Unknown Object Discovery in 3D Scenes via Self-Distillation
Mohamed El Amine Boudjoghra, Jean Lahoud, Salman Khan et al.
Lost and Found: Overcoming Detector Failures in Online Multi-Object Tracking
Lorenzo Vaquero, Yihong Xu, Xavier Alameda-Pineda et al.
A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment
Tianhe Wu, Kede Ma, Jie Liang et al.
DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling
Haoran Li, Haolin Shi, Wenli Zhang et al.
Can Textual Semantics Mitigate Sounding Object Segmentation Preference?
Yaoting Wang, Peiwen Sun, Yuanchao Li et al.
MLPHand: Real Time Multi-View 3D Hand Reconstruction via MLP Modeling
Jian Yang, Jiakun Li, Guoming Li et al.
How Far Can a 1-Pixel Camera Go? Solving Vision Tasks using Photoreceptors and Computationally Designed Visual Morphology
Andrei Atanov, Rishubh Singh, Jiawei Fu et al.
MONTRAGE: Monitoring Training for Attribution of Generative Diffusion Models
Jonathan Brokman, Omer Hofman, Roman Vainshtein et al.
AdversariaLeak: External Information Leakage Attack Using Adversarial Samples on Face Recognition Systems
Roye Katzav, Amit Giloni, Edita Grolman et al.
Beyond Pixels: Semi-Supervised Semantic Segmentation with a Multi-scale Patch-based Multi-Label Classifier
Prantik Howlader, Srijan Das, Hieu Le et al.
Spiking Wavelet Transformer
Yuetong Fang, Ziqing Wang, Lingfeng Zhang et al.
WAVE: Warping DDIM Inversion Features for Zero-shot Text-to-Video Editing
Yutang Feng, Sicheng Gao, Yuxiang Bao et al.
COD: Learning Conditional Invariant Representation for Domain Adaptation Regression
Hao-Ran Yang, Chuan-Xian Ren, You-Wei Luo
RANRAC: Robust Neural Scene Representations via Random Ray Consensus
Benno Buschmann, Andreea Dogaru, Elmar Eisemann et al.
LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model
Runhui Huang, Kaixin Cai, Jianhua Han et al.
Few-shot Defect Image Generation based on Consistency Modeling
Qingfeng Shi, Jing Wei, Fei Shen et al.
CLIP-DPO: Vision-Language Models as a Source of Preference for Fixing Hallucinations in LVLMs
Yassine Ouali, Adrian Bulat, Brais Martinez et al.
Get Your Embedding Space in Order: Domain-Adaptive Regression for Forest Monitoring
Sizhuo Li, Dimitri Gominski, Martin Brandt et al.
Curved Diffusion: A Generative Model With Optical Geometry Control
Andrey Voynov, Amir Hertz, Moab Arar et al.
AnimateMe: 4D Facial Expressions via Diffusion Models
Dimitrios Gerogiannis, Foivos Paraperas Papantoniou, Rolandos Alexandros Potamias et al.
LATTE3D: Large-scale Amortized Text-To-Enhanced3D Synthesis
Kevin Xie, Tianshi Cao, Jonathan P Lorraine et al.
iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning
Tom Fischer, Yaoyao Liu, Artur Jesslen et al.
Pose Guided Fine-Grained Sign Language Video Generation
Tongkai Shi, Lianyu Hu, Fanhua Shang et al.
SeA: Semantic Adversarial Augmentation for Last Layer Features from Unsupervised Representation Learning
Qi Qian, Yuanhong Xu, Juhua Hu
3D Reconstruction of Objects in Hands without Real World 3D Supervision
Aditya Prakash, Matthew Chang, Matthew Jin et al.
To Supervise or Not to Supervise: Understanding and Addressing the Key Challenges of Point Cloud Transfer Learning
Souhail Hadgi, Lei Li, Maks Ovsjanikov
Optimization-based Uncertainty Attribution Via Learning Informative Perturbations
Hanjing Wang, Bashirul Azam Biswas, Qiang Ji
A Diffusion Model for Simulation Ready Coronary Anatomy with Morpho-skeletal Control
Karim Kadry, Shreya Gupta, Jonas Sogbadji et al.
Evaluating the Adversarial Robustness of Semantic Segmentation: Trying Harder Pays Off
Levente Ferenc Halmosi, Bálint Mohos, Márk Jelasity
AdaDiff: Accelerating Diffusion Models through Step-Wise Adaptive Computation
Shengkun Tang, Yaqing Wang, Caiwen Ding et al.
Object-Aware Query Perturbation for Cross-Modal Image-Text Retrieval
Naoya Sogi, Takashi Shibata, Makoto Terao
Ex2Eg-MAE: A Framework for Adaptation of Exocentric Video Masked Autoencoders for Egocentric Social Role Understanding
Minh Tran, Yelin Kim, Che-Chun Su et al.
Meta-optimized Angular Margin Contrastive Framework for Video-Language Representation Learning
Thanh Thong Nguyen, Yi Bin, Xiaobao Wu et al.
GRiT: A Generative Region-to-text Transformer for Object Understanding
Jialian Wu, Jianfeng Wang, Zhengyuan Yang et al.
LRSLAM: Low-rank Representation of Signed Distance Fields in Dense Visual SLAM System
Hongbeen Park, Minjeong Park, Giljoo Nam et al.
Learning Representation for Multitask Learning through Self-Supervised Auxiliary Learning
Seokwon Shin, Hyungrok Do, Youngdoo Son
BAGS: Blur Agnostic Gaussian Splatting through Multi-Scale Kernel Modeling
Cheng Peng, Yutao Tang, Yifan Zhou et al.
DPA-Net: Structured 3D Abstraction from Sparse Views via Differentiable Primitive Assembly
Fenggen Yu, Yiming Qian, Xu Zhang et al.
An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation
Zhiyu Tan, Mengping Yang, Luozheng Qin et al.
Generalizable Symbolic Optimizer Learning
Xiaotian Song, Peng Zeng, Yanan Sun et al.
On the Vulnerability of Skip Connections to Model Inversion Attacks
Jun Hao Koh, Sy-Tuyen Ho, Ngoc-Bao Nguyen et al.
Reinforcement Learning via Auxiliary Task Distillation
Abhinav Narayan Harish, Larry Heck, Josiah P Hanna et al.
Plug and Play: A Representation Enhanced Domain Adapter for Collaborative Perception
Tianyou Luo, Quan Yuan, Yuchen Xia et al.
Follow the Rules: Reasoning for Video Anomaly Detection with Large Language Models
Yuchen Yang, Kwonjoon Lee, Behzad Dariush et al.
Motion Keyframe Interpolation for Any Human Skeleton using Point Cloud-based Human Motion Data Homogenisation
Clinton Mo, Kun Hu, Chengjiang Long et al.
Improving Hyperbolic Representations via Gromov-Wasserstein Regularization
Yifei Yang, Wonjun Lee, Dongmian Zou et al.
Dense Hand-Object (HO) GraspNet with Full Grasping Taxonomy and Dynamics
Woojin Cho, Jihyun Lee, Minjae Yi et al.
Depth-Aware Blind Image Decomposition for Real-World Adverse Weather Recovery
Chao Wang, Zhedong Zheng, Ruijie Quan et al.
DreamSampler: Unifying Diffusion Sampling and Score Distillation for Image Manipulation
Jeongsol Kim, Geon Yeong Park, Jong Chul Ye
PreciseControl: Enhancing Text-To-Image Diffusion Models with Fine-Grained Attribute Control
Rishubh Parihar, Sachidanand VS, Sabariswaran Mani et al.
SRPose: Two-view Relative Pose Estimation with Sparse Keypoints
Rui Yin, Yulun Zhang, Zherong Pan et al.
Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models
Xiaoshi Wu, Yiming Hao, Manyuan Zhang et al.
Efficient Vision Transformers with Partial Attention
Xuan-Thuy Vo, Duy-Linh Nguyen, Adri Priadana et al.
Generalized Coverage for More Robust Low-Budget Active Learning
Wonho Bae, Junhyug Noh, Danica J. Sutherland
Kinetic Typography Diffusion Model
Seonmi Park, Inhwan Bae, Seunghyun Shin et al.
R.A.C.E.: Robust Adversarial Concept Erasure for Secure Text-to-Image Diffusion Model
Changhoon Kim, Kyle Min, Yezhou Yang
Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning
Amandeep Kumar, Muhammad Awais, Sanath Narayan et al.
Embracing Events and Frames with Hierarchical Feature Refinement Network for Object Detection
Hu Cao, Zehua Zhang, Yan Xia et al.
Unsupervised Representation Learning by Balanced Self Attention Matching
Daniel Shalam, Simon Korman
Identity-Consistent Diffusion Network for Grading Knee Osteoarthritis Progression in Radiographic Imaging
Wenhua Wu, Kun Hu, Wenxi Yue et al.
Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation
Fangfu Liu, Hanyang Wang, Weiliang Chen et al.
Teach CLIP to Develop a Number Sense for Ordinal Regression
Yao Du, Qiang Zhai, Weihang Dai et al.
Compact 3D Scene Representation via Self-Organizing Gaussian Grids
Wieland Morgenstern, Florian Barthel, Anna Hilsmann et al.
Linking in Style: Understanding learned features in deep learning models
Maren Wehrheim, Pamela Osuna Vargas, Matthias Kaschube
Instant Uncertainty Calibration of NeRFs Using a Meta-Calibrator
Niki Amini-Naieni, Tomas Jakab, Andrea Vedaldi et al.
SHIC: Shape-Image Correspondences with no Keypoint Supervision
Aleksandar Shtedritski, Christian Rupprecht, Andrea Vedaldi
SceneTeller: Language-to-3D Scene Generation
Basak Melis Ocal, Maxim Tatarchenko, Sezer Karaoglu et al.
MagMax: Leveraging Model Merging for Seamless Continual Learning
Daniel Marczak, Bartlomiej Twardowski, Tomasz Trzcinski et al.
Debiasing surgeon: fantastic weights and how to find them
Remi Nahon, Ivan Luiz De Moura Matos, Van-Tam Nguyen et al.
Spline-based Transformers
Prashanth Chandran, Agon Serifi, Markus Gross et al.
Efficient NeRF Optimization - Not All Samples Remain Equally Hard
Juuso Korhonen, Goutham Rangu, Hamed Rezazadegan Tavakoli et al.