Most Cited 2024 "view-invariant motion" Papers
12,324 papers found • Page 12 of 62
FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models
Jinglin Xu, Yijie Guo, Yuxin Peng
Multimodal Prototyping for cancer survival prediction
Andrew Song, Richard Chen, Guillaume Jaume et al.
Devignet: High-Resolution Vignetting Removal via a Dual Aggregated Fusion Transformer with Adaptive Channel Expansion
Shenghong Luo, Xuhang Chen, Weiwen Chen et al.
TransFusion -- A Transparency-Based Diffusion Model for Anomaly Detection
Matic Fučka, Vitjan Zavrtanik, Danijel Skocaj
ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification
Jiangbo Shi, Chen Li, Tieliang Gong et al.
FlashFFTConv: Efficient Convolutions for Long Sequences with Tensor Cores
Dan Fu, Hermann Kumbong, Eric Nguyen et al.
Revisiting Single Image Reflection Removal In the Wild
Yurui Zhu, Bo Li, Xueyang Fu et al.
Translate Meanings, Not Just Words: IdiomKB’s Role in Optimizing Idiomatic Translation with Language Models
Shuang Li, Jiangjie Chen, Siyu Yuan et al.
Generalized Neural Collapse for a Large Number of Classes
Jiachen Jiang, Jinxin Zhou, Peng Wang et al.
Taming Mode Collapse in Score Distillation for Text-to-3D Generation
Peihao Wang, Dejia Xu, Zhiwen Fan et al.
NoiseCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions in Diffusion Models
Yusuf Dalva, Pinar Yanardag
Fourier Transporter: Bi-Equivariant Robotic Manipulation in 3D
Haojie Huang, Owen Howell, Dian Wang et al.
XKD: Cross-Modal Knowledge Distillation with Domain Alignment for Video Representation Learning
Pritam Sarkar, Ali Etemad
AiOS: All-in-One-Stage Expressive Human Pose and Shape Estimation
Qingping Sun, Yanjun Wang, Ailing Zeng et al.
Gradient Reweighting: Towards Imbalanced Class-Incremental Learning
Jiangpeng He
Learning Occupancy for Monocular 3D Object Detection
Liang Peng, Junkai Xu, Haoran Cheng et al.
Controllable Mind Visual Diffusion Model
Bohan Zeng, Shanglin Li, Xuhui Liu et al.
Decoupling Common and Unique Representations for Multimodal Self-supervised Learning
Yi Wang, Conrad M Albrecht, Nassim Ait Ali Braham et al.
TAIL: Task-specific Adapters for Imitation Learning with Large Pretrained Models
Zuxin Liu, Jesse Zhang, Kavosh Asadi et al.
DuPL: Dual Student with Trustworthy Progressive Learning for Robust Weakly Supervised Semantic Segmentation
Yuanchen Wu, Xichen Ye, Kequan Yang et al.
AirPhyNet: Harnessing Physics-Guided Neural Networks for Air Quality Prediction
Kethmi Hirushini Hettige, Jiahao Ji, Shili Xiang et al.
Enhancing Video Super-Resolution via Implicit Resampling-based Alignment
Kai Xu, Ziwei Yu, Xin Wang et al.
Incremental Residual Concept Bottleneck Models
Chenming Shang, Shiji Zhou, Hengyuan Zhang et al.
Text-Driven Image Editing via Learnable Regions
Yuanze Lin, Yi-Wen Chen, Yi-Hsuan Tsai et al.
HIR-Diff: Unsupervised Hyperspectral Image Restoration Via Improved Diffusion Models
Li Pang, Xiangyu Rui, Long Cui et al.
Unbiased Faster R-CNN for Single-source Domain Generalized Object Detection
Yajing Liu, Shijun Zhou, Xiyao Liu et al.
Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization
Guopeng Li, Ming Qian, Gui-Song Xia
6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model
Matteo Bortolon, Theodoros Tsesmelis, Stuart James et al.
Towards Continual Knowledge Graph Embedding via Incremental Distillation
Jiajun Liu, Ke Wenjun, Peng Wang et al.
Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models
Tanmay Gautam, Youngsuk Park, Hao Zhou et al.
RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization
Mengqi Huang, Zhendong Mao, Mingcong Liu et al.
Multi-modal Instruction Tuned LLMs with Fine-grained Visual Perception
Junwen He, Yifan Wang, Lijun Wang et al.
SpaRP: Fast 3D Object Reconstruction and Pose Estimation from Sparse Views
Chao Xu, Ang Li, Linghao Chen et al.
SemiVL: Semi-Supervised Semantic Segmentation with Vision-Language Guidance
Lukas Hoyer, David Tan, Muhammad Ferjad Naeem et al.
Depth-aware Test-Time Training for Zero-shot Video Object Segmentation
Weihuang Liu, Xi Shen, Haolun Li et al.
Elastic Feature Consolidation For Cold Start Exemplar-Free Incremental Learning
Simone Magistri, Tomaso Trinci, Albin Soutif-Cormerais et al.
Rethinking Channel Dependence for Multivariate Time Series Forecasting: Learning from Leading Indicators
Lifan Zhao, Yanyan Shen
Complete and Efficient Graph Transformers for Crystal Material Property Prediction
Keqiang Yan, Cong Fu, Xiaofeng Qian et al.
STanHop: Sparse Tandem Hopfield Model for Memory-Enhanced Time Series Prediction
Yu-Hsuan Wu, Jerry Hu, Weijian Li et al.
V-DETR: DETR with Vertex Relative Position Encoding for 3D Object Detection
Yichao Shen, Zigang Geng, Yuhui Yuan et al.
Scalable AI Safety via Doubly-Efficient Debate
Jonah Brown-Cohen, Geoffrey Irving, Georgios Piliouras
HomoFormer: Homogenized Transformer for Image Shadow Removal
Jie Xiao, Xueyang Fu, Yurui Zhu et al.
When Do Prompting and Prefix-Tuning Work? A Theory of Capabilities and Limitations
Aleksandar Petrov, Philip Torr, Adel Bibi
OmniVec2 - A Novel Transformer based Network for Large Scale Multimodal and Multitask Learning
Siddharth Srivastava, Gaurav Sharma
Copyright Traps for Large Language Models
Matthieu Meeus, Igor Shilov, Manuel Faysse et al.
Revisiting the Role of Language Priors in Vision-Language Models
Zhiqiu Lin, Xinyue Chen, Deepak Pathak et al.
Modeling Caption Diversity in Contrastive Vision-Language Pretraining
Samuel Lavoie, Polina Kirichenko, Mark Ibrahim et al.
GLACE: Global Local Accelerated Coordinate Encoding
Fangjinhua Wang, Xudong Jiang, Silvano Galliani et al.
Communication-Efficient Federated Learning with Accelerated Client Gradient
Geeho Kim, Jinkyu Kim, Bohyung Han
MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving
Jiangfei Duan, Runyu Lu, Haojie Duanmu et al.
ZeroShape: Regression-based Zero-shot Shape Reconstruction
Zixuan Huang, Stefan Stojanov, Anh Thai et al.
STEM: Unleashing the Power of Embeddings for Multi-Task Recommendation
Liangcai Su, Junwei Pan, Ximei Wang et al.
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
Ruyang Liu, Chen Li, Yixiao Ge et al.
Graph Generation with Diffusion Mixture
Jaehyeong Jo, Dongki Kim, Sung Ju Hwang
AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering
Xiuyuan Chen, Yuan Lin, Yuchen Zhang et al.
Portrait4D: Learning One-Shot 4D Head Avatar Synthesis using Synthetic Data
Yu Deng, Duomin Wang, Xiaohang Ren et al.
No Prejudice! Fair Federated Graph Neural Networks for Personalized Recommendation
Nimesh Agrawal, Anuj Sirohi, Sandeep Kumar et al.
GalLop: Learning global and local prompts for vision-language models
Marc Lafon, Elias Ramzi, Clément Rambour et al.
Conformal prediction for multi-dimensional time series by ellipsoidal sets
Chen Xu, Hanyang Jiang, Yao Xie
DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception
Yibo Wang, Ruiyuan Gao, Kai Chen et al.
Scissorhands: Scrub Data Influence via Connection Sensitivity in Networks
Jing Wu, Mehrtash Harandi
Learning Diffusion Texture Priors for Image Restoration
Tian Ye, Sixiang Chen, Wenhao Chai et al.
PolyGCL: GRAPH CONTRASTIVE LEARNING via Learnable Spectral Polynomial Filters
Jingyu Chen, Runlin Lei, Zhewei Wei
The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents
Yatin Dandi, Emanuele Troiani, Luca Arnaboldi et al.
Approximating the Shapley Value without Marginal Contributions
Patrick Kolpaczki, Viktor Bengs, Maximilian Muschalik et al.
EmerDiff: Emerging Pixel-level Semantic Knowledge in Diffusion Models
Koichi Namekata, Amirmojtaba Sabour, Sanja Fidler et al.
Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models
Daniel Geng, Inbum Park, Andrew Owens
SmartRefine: A Scenario-Adaptive Refinement Framework for Efficient Motion Prediction
Yang Zhou, Hao Shao, Letian Wang et al.
Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data
Zhiwei Xu, Yutong Wang, Spencer Frei et al.
Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval
Young Kyun Jang, Dat B Huynh, Ashish Shah et al.
MgNO: Efficient Parameterization of Linear Operators via Multigrid
Juncai He, Xinliang Liu, Jinchao Xu
Score-Guided Diffusion for 3D Human Recovery
Anastasis Stathopoulos, Ligong Han, Dimitris N. Metaxas
PanoDiffusion: 360-degree Panorama Outpainting via Diffusion
Tianhao Wu, Chuanxia Zheng, Tat-Jen Cham
Can Protective Perturbation Safeguard Personal Data from Being Exploited by Stable Diffusion?
Zhengyue Zhao, Jinhao Duan, Kaidi Xu et al.
PINNACLE: PINN Adaptive ColLocation and Experimental points selection
Gregory Kang Ruey Lau, Apivich Hemachandra, See-Kiong Ng et al.
Is This the Subspace You Are Looking for? An Interpretability Illusion for Subspace Activation Patching
Aleksandar Makelov, Georg Lange, Atticus Geiger et al.
Enabling Efficient Equivariant Operations in the Fourier Basis via Gaunt Tensor Products
Shengjie Luo, Tianlang Chen, Aditi Krishnapriyan
Mitigating Motion Blur in Neural Radiance Fields with Events and Frames
Marco Cannici, Davide Scaramuzza
Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning?
Khashayar Gatmiry, Nikunj Saunshi, Sashank J. Reddi et al.
Enhancing Visual Document Understanding with Contrastive Learning in Large Visual-Language Models
Xin Li, Yunfei Wu, Xinghua Jiang et al.
DiffAM: Diffusion-based Adversarial Makeup Transfer for Facial Privacy Protection
Yuhao Sun, Lingyun Yu, Hongtao Xie et al.
TopoMLP: A Simple yet Strong Pipeline for Driving Topology Reasoning
Dongming Wu, Jiahao Chang, Fan Jia et al.
Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment
Mengting Chen, Xi Chen, Zhonghua Zhai et al.
DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior
Tianyu Huang, Yihan Zeng, Zhilu Zhang et al.
DAMSDet: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion
Junjie Guo, Chenqiang Gao, Fangcen Liu et al.
GROOT: Learning to Follow Instructions by Watching Gameplay Videos
Shaofei Cai, Bowei Zhang, Zihao Wang et al.
Anomaly Heterogeneity Learning for Open-set Supervised Anomaly Detection
Jiawen Zhu, Choubo Ding, Yu Tian et al.
Multi-Architecture Multi-Expert Diffusion Models
Yunsung Lee, Jin-Young Kim, Hyojun Go et al.
SMooDi: Stylized Motion Diffusion Model
Lei Zhong, Yiming Xie, Varun Jampani et al.
VicTR: Video-conditioned Text Representations for Activity Recognition
Kumara Kahatapitiya, Anurag Arnab, Arsha Nagrani et al.
SlowTrack: Increasing the Latency of Camera-Based Perception in Autonomous Driving Using Adversarial Examples
Chen Ma, Ningfei Wang, Qi Alfred Chen et al.
Subgoal-based Demonstration Learning for Formal Theorem Proving
Xueliang Zhao, Wenda Li, Lingpeng Kong
Transformer-Based No-Reference Image Quality Assessment via Supervised Contrastive Learning
Jinsong Shi, Pan Gao, Jie Qin
TESTAM: A Time-Enhanced Spatio-Temporal Attention Model with Mixture of Experts
Hyunwook Lee, Sungahn Ko
Multi-granularity Correspondence Learning from Long-term Noisy Videos
Yijie Lin, Jie Zhang, Zhenyu Huang et al.
Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance
Dazhong Shen, Guanglu Song, Zeyue Xue et al.
U-mixer: An Unet-Mixer Architecture with Stationarity Correction for Time Series Forecasting
Xiang Ma, Xuemei Li, Lexin Fang et al.
Solving Motion Planning Tasks with a Scalable Generative Model
Yihan Hu, Siqi Chai, Zhening Yang et al.
Weighted Ensemble Models Are Strong Continual Learners
Imad Eddine Marouf, Subhankar Roy, Enzo Tartaglione et al.
How to Configure Good In-Context Sequence for Visual Question Answering
Li Li, Jiawei Peng, Huiyi Chen et al.
TRIP: Temporal Residual Learning with Image Noise Prior for Image-to-Video Diffusion Models
Zhongwei Zhang, Fuchen Long, Yingwei Pan et al.
Transformers Learn Nonlinear Features In Context: Nonconvex Mean-field Dynamics on the Attention Landscape
Juno Kim, Taiji Suzuki
Prompt Highlighter: Interactive Control for Multi-Modal LLMs
Yuechen Zhang, Shengju Qian, Bohao Peng et al.
Diagnosing and Re-learning for Balanced Multimodal Learning
Yake Wei, Siwei Li, Ruoxuan Feng et al.
Teach LLMs to Phish: Stealing Private Information from Language Models
Ashwinee Panda, Christopher Choquette-Choo, Zhengming Zhang et al.
CLEX: Continuous Length Extrapolation for Large Language Models
Guanzheng Chen, Xin Li, Zaiqiao Meng et al.
Scaling Diffusion Models to Real-World 3D LiDAR Scene Completion
Lucas Nunes, Rodrigo Marcuzzi, Benedikt Mersch et al.
UnO: Unsupervised Occupancy Fields for Perception and Forecasting
Ben Agro, Quinlan Sykora, Sergio Casas et al.
Rethinking Reverse Distillation for Multi-Modal Anomaly Detection
Zhihao Gu, Jiangning Zhang, Liang Liu et al.
PARL: A Unified Framework for Policy Alignment in Reinforcement Learning from Human Feedback
Souradip Chakraborty, Amrit Bedi, Alec Koppel et al.
Grokking as a First Order Phase Transition in Two Layer Networks
Noa Rubin, Inbar Seroussi, Zohar Ringel
R3D-AD: Reconstruction via Diffusion for 3D Anomaly Detection
Zheyuan Zhou, Wang Le, Naiyu Fang et al.
Content-Adaptive Non-Local Convolution for Remote Sensing Pansharpening
Yule Duan, Xiao Wu, Haoyu Deng et al.
CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification
Haoran Lai, Qingsong Yao, Zihang Jiang et al.
Federated Adaptive Prompt Tuning for Multi-Domain Collaborative Learning
Shangchao Su, Mingzhao Yang, Bin Li et al.
Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving
JunDa Cheng, Wei Yin, Kaixuan Wang et al.
MCL-NER: Cross-Lingual Named Entity Recognition via Multi-View Contrastive Learning
Ying Mo, Jian Yang, Jiahao Liu et al.
Multi-view Aggregation Network for Dichotomous Image Segmentation
Qian Yu, Xiaoqi Zhao, Youwei Pang et al.
Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior
Fangfu Liu, Diankun Wu, Yi Wei et al.
Latent Space Editing in Transformer-Based Flow Matching
Vincent Tao Hu, Wei Zhang, Meng Tang et al.
Hypergraph-enhanced Dual Semi-supervised Graph Classification
Wei Ju, Zhengyang Mao, Siyu Yi et al.
Smooth Tchebycheff Scalarization for Multi-Objective Optimization
Xi Lin, Xiaoyuan Zhang, Zhiyuan Yang et al.
Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer
Yaoting Wang, Liu Weisong, Guangyao Li et al.
AdaRevD: Adaptive Patch Exiting Reversible Decoder Pushes the Limit of Image Deblurring
Xintian Mao, Xiwen Gao, Yan Wang
Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model
Zhicai Wang, Longhui Wei, Tan Wang et al.
Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets
Dominique Beaini, Shenyang (Andy) Huang, Joao Cunha et al.
Unsupervised Universal Image Segmentation
XuDong Wang, Dantong Niu, Xinyang Han et al.
Dynamic Prompt Optimizing for Text-to-Image Generation
Wenyi Mo, Tianyu Zhang, Yalong Bai et al.
PaSCo: Urban 3D Panoptic Scene Completion with Uncertainty Awareness
Anh-Quan Cao, Angela Dai, Raoul de Charette
GenZI: Zero-Shot 3D Human-Scene Interaction Generation
Lei Li, Angela Dai
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
Haokun Lin, Haoli Bai, Zhili Liu et al.
Multi-Modal Latent Space Learning for Chain-of-Thought Reasoning in Language Models
Liqi He, Zuchao Li, Xiantao Cai et al.
MBR and QE Finetuning: Training-time Distillation of the Best and Most Expensive Decoding Methods
Mara Finkelstein, Markus Freitag
Skip-Attention: Improving Vision Transformers by Paying Less Attention
Shashank Venkataramanan, Amir Ghodrati, Yuki Asano et al.
MathAttack: Attacking Large Language Models towards Math Solving Ability
Zihao Zhou, Qiufeng Wang, Mingyu Jin et al.
HeadStudio: Text to Animatable Head Avatars with 3D Gaussian Splatting
Zhenglin Zhou, Fan Ma, Hehe Fan et al.
Partitioning Message Passing for Graph Fraud Detection
Wei Zhuo, Zemin Liu, Bryan Hooi et al.
FreeMotion: A Unified Framework for Number-free Text-to-Motion Synthesis
Ke Fan, Junshu Tang, Weijian Cao et al.
Contrastive Mean-Shift Learning for Generalized Category Discovery
Sua Choi, Dahyun Kang, Minsu Cho
SafeDreamer: Safe Reinforcement Learning with World Models
Weidong Huang, Jiaming Ji, Chunhe Xia et al.
Deep Variational Incomplete Multi-View Clustering: Exploring Shared Clustering Structures
Gehui Xu, Jie Wen, Chengliang Liu et al.
Potential Based Diffusion Motion Planning
Yunhao Luo, Chen Sun, Josh Tenenbaum et al.
MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation
Hanzhe Hu, Zhizhuo Zhou, Varun Jampani et al.
VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation
XuDong Wang, Ishan Misra, Ziyun Zeng et al.
MixDQ: Memory-Efficient Few-Step Text-to-Image Diffusion Models with Metric-Decoupled Mixed Precision Quantization
Zhao Tianchen, Xuefei Ning, Tongcheng Fang et al.
GoMVS: Geometrically Consistent Cost Aggregation for Multi-View Stereo
Jiang Wu, Rui Li, Haofei Xu et al.
On the Trajectory Regularity of ODE-based Diffusion Sampling
Defang Chen, Zhenyu Zhou, Can Wang et al.
Don't trust your eyes: on the (un)reliability of feature visualizations
Robert Geirhos, Roland S. Zimmermann, Blair Bilodeau et al.
Generative Sliced MMD Flows with Riesz Kernels
Johannes Hertrich, Christian Wald, Fabian Altekrüger et al.
FedSelect: Personalized Federated Learning with Customized Selection of Parameters for Fine-Tuning
Rishub Tamirisa, Chulin Xie, Wenxuan Bao et al.
Time Weaver: A Conditional Time Series Generation Model
Sai Shankar Narasimhan, Shubhankar Agarwal, Oguzhan Akcin et al.
MetaCloak: Preventing Unauthorized Subject-driven Text-to-image Diffusion-based Synthesis via Meta-learning
Yixin Liu, Chenrui Fan, Yutong Dai et al.
ADA-Track: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association
Shuxiao Ding, Lukas Schneider, Marius Cordts et al.
A Computational Framework for Solving Wasserstein Lagrangian Flows
Kirill Neklyudov, Rob Brekelmans, Alexander Tong et al.
Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset
Shijie Lian, Ziyi Zhang, Hua Li et al.
Exploiting Label Skews in Federated Learning with Model Concatenation
Yiqun Diao, Qinbin Li, Bingsheng He
Robust Node Classification on Graph Data with Graph and Label Noise
Yonghua Zhu, Lei Feng, Zhenyun Deng et al.
RobustSAM: Segment Anything Robustly on Degraded Images
Wei-Ting Chen, Yu Jiet Vong, Sy-Yen Kuo et al.
Alternate Diverse Teaching for Semi-supervised Medical Image Segmentation
Zhen Zhao, Zicheng Wang, Dian Yu et al.
SegPoint: Segment Any Point Cloud via Large Language Model
Shuting He, Henghui Ding, Xudong Jiang et al.
Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking
Jiyao Zhang, Weiyao Huang, Bo Peng et al.
ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation
Moayed Haji Ali, Guha Balakrishnan, Vicente Ordonez
DifFlow3D: Toward Robust Uncertainty-Aware Scene Flow Estimation with Iterative Diffusion-Based Refinement
Jiuming Liu, Guangming Wang, Weicai Ye et al.
Fully Hyperbolic Convolutional Neural Networks for Computer Vision
Ahmad Bdeir, Kristian Schwethelm, Niels Landwehr
Making RL with Preference-based Feedback Efficient via Randomization
Runzhe Wu, Wen Sun
Multicalibration for Confidence Scoring in LLMs
Gianluca Detommaso, Martin A Bertran, Riccardo Fogliato et al.
FairRAG: Fair Human Generation via Fair Retrieval Augmentation
Robik Shrestha, Yang Zou, Qiuyu Chen et al.
Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video
Shashank Venkataramanan, Mamshad Nayeem Rizve, Joao Carreira et al.
Convolutional Prompting meets Language Models for Continual Learning
Anurag Roy, Riddhiman Moulick, Vinay Verma et al.
Video Question Answering with Procedural Programs
Rohan Choudhury, Koichiro Niinuma, Kris Kitani et al.
Generalization to New Sequential Decision Making Tasks with In-Context Learning
Sharath Chandra Raparthy, Eric Hambro, Robert Kirk et al.
OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies
Lingdong Kong, Youquan Liu, Lai Xing Ng et al.
Action Scene Graphs for Long-Form Understanding of Egocentric Videos
Ivan Rodin, Antonino Furnari, Kyle Min et al.
An Information-Theoretic Analysis of In-Context Learning
Hong Jun Jeon, Jason Lee, Qi Lei et al.
Efficient Exploration for LLMs
Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao et al.
Disentangled Prompt Representation for Domain Generalization
De Cheng, Zhipeng Xu, Xinyang Jiang et al.
Compositional Text-to-Image Generation with Dense Blob Representations
Weili Nie, Sifei Liu, Morteza Mardani et al.
Towards Accurate Post-training Quantization for Diffusion Models
Changyuan Wang, Ziwei Wang, Xiuwei Xu et al.
Parallel Vertex Diffusion for Unified Visual Grounding
Zesen Cheng, Kehan Li, Peng Jin et al.
MEVG: Multi-event Video Generation with Text-to-Video Models
Gyeongrok Oh, Jaehwan Jeong, Sieun Kim et al.
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos
Kirolos Ataallah, Xiaoqian Shen, Eslam Mohamed Abdelrahman et al.
Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies
Brian Bartoldson, James Diffenderfer, Konstantinos Parasyris et al.
Question Aware Vision Transformer for Multimodal Reasoning
Roy Ganz, Yair Kittenplon, Aviad Aberdam et al.
SUGAR: Pre-training 3D Visual Representations for Robotics
Shizhe Chen, Ricardo Garcia Pinel, Ivan Laptev et al.
Adv-Diffusion: Imperceptible Adversarial Face Identity Attack via Latent Diffusion Model
Decheng Liu, Xijun Wang, Chunlei Peng et al.
SAM-PARSER: Fine-Tuning SAM Efficiently by Parameter Space Reconstruction
Zelin Peng, Zhengqin Xu, Zhilin Zeng et al.
SimAC: A Simple Anti-Customization Method for Protecting Face Privacy against Text-to-Image Synthesis of Diffusion Models
Feifei Wang, Zhentao Tan, Tianyi Wei et al.
When Model Meets New Normals: Test-Time Adaptation for Unsupervised Time-Series Anomaly Detection
DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment
Jiuming Liu, Dong Zhuo, Zhiheng Feng et al.
Amodal Completion via Progressive Mixed Context Diffusion
Katherine Xu, Lingzhi Zhang, Jianbo Shi
On the Generalization of Stochastic Gradient Descent with Momentum
Ali Ramezani-Kebrya, Kimon Antonakopoulos, Volkan Cevher et al.
Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis
Bichen Wu, Ching-Yao Chuang, Xiaoyan Wang et al.
Privacy-Preserving Instructions for Aligning Large Language Models
Da Yu, Peter Kairouz, Sewoong Oh et al.
Collaborative Vision-Text Representation Optimizing for Open-Vocabulary Segmentation
Siyu Jiao, Hongguang Zhu, Yunchao Wei et al.
Tokenize Anything via Prompting
Ting Pan, Lulu Tang, Xinlong Wang et al.
Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds
Tianrui Lou, Xiaojun Jia, Jindong Gu et al.
Pyramid Diffusion for Fine 3D Large Scene Generation
Yuheng Liu, Xinke Li, Xueting Li et al.
FunQA: Towards Surprising Video Comprehension
Binzhu Xie, Sicheng Zhang, Zitang Zhou et al.
DragVideo: Interactive Drag-style Video Editing
Yufan Deng, Ruida Wang, Yuhao Zhang et al.
Robust Emotion Recognition in Context Debiasing
Dingkang Yang, Kun Yang, Mingcheng Li et al.