Most Cited 2025 "experiment design" Papers
Scaling Inference-Efficient Language Models
Song Bian, Minghao Yan, Shivaram Venkataraman
Topological Blindspots: Understanding and Extending Topological Deep Learning Through the Lens of Expressivity
Yam Eitan, Yoav Gelberg, Guy Bar-Shalom et al.
ObjectMover: Generative Object Movement with Video Prior
Xin Yu, Tianyu Wang, Soo Ye Kim et al.
The ML.ENERGY Benchmark: Toward Automated Inference Energy Measurement and Optimization
Jae-Won Chung, Jeff J. Ma, Ruofan Wu et al.
Harnessing Massive Satellite Imagery with Efficient Masked Image Modeling
Fengxiang Wang, Hongzhen Wang, Di Wang et al.
Can Transformers Reason Logically? A Study in SAT Solving
Leyan Pan, Vijay Ganesh, Jacob Abernethy et al.
When the Future Becomes the Past: Taming Temporal Correspondence for Self-supervised Video Representation Learning
Yang Liu, Qianqian Xu, Peisong Wen et al.
ADAM: An Embodied Causal Agent in Open-World Environments
Shu Yu, Chaochao Lu
Training-Free Guidance Beyond Differentiability: Scalable Path Steering with Tree Search in Diffusion and Flow Models
Yingqing Guo, Yukang Yang, Hui Yuan et al.
ID-Patch: Robust ID Association for Group Photo Personalization
Yimeng Zhang, Tiancheng Zhi, Jing Liu et al.
PCDreamer: Point Cloud Completion Through Multi-view Diffusion Priors
Guangshun Wei, Yuan Feng, Long Ma et al.
RomanTex: Decoupling 3D-aware Rotary Positional Embedded Multi-Attention Network for Texture Synthesis
Yifei Feng, Mx Yang, Shuhui Yang et al.
EMOE: Modality-Specific Enhanced Dynamic Emotion Experts
Yiyang Fang, Wenke Huang, Guancheng Wan et al.
CLIP is Strong Enough to Fight Back: Test-time Counterattacks towards Zero-shot Adversarial Robustness of CLIP
Songlong Xing, Zhengyu Zhao, Nicu Sebe
TREAD: Token Routing for Efficient Architecture-agnostic Diffusion Training
Felix Krause, Timy Phan, Ming Gui et al.
Harmony in Divergence: Towards Fast, Accurate, and Memory-efficient Zeroth-order LLM Fine-tuning
Qitao Tan, Jun Liu, Zheng Zhan et al.
Fragment and Geometry Aware Tokenization of Molecules for Structure-Based Drug Design Using Language Models
Cong Fu, Xiner Li, Blake Olson et al.
DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation
Jiangran Lyu, Ziming Li, Xuesong Shi et al.
Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting
Suraj Anand, Michael Lepori, Jack Merullo et al.
DiTFastAttnV2: Head-wise Attention Compression for Multi-Modality Diffusion Transformers
Hanling Zhang, Rundong Su, Zhihang Yuan et al.
Prompt-A-Video: Prompt Your Video Diffusion Model via Preference-Aligned LLM
Yatai Ji, Jiacheng Zhang, Jie Wu et al.
3D Mesh Editing using Masked LRMs
William Gao, Dilin Wang, Yuchen Fan et al.
Aligning Human Motion Generation with Human Perceptions
Haoru Wang, Wentao Zhu, Luyi Miao et al.
RoMo: Robust Motion Segmentation Improves Structure from Motion
Lily Goli, Sara Sabour, Mark Matthews et al.
Visual Test-time Scaling for GUI Agent Grounding
Tiange Luo, Lajanugen Logeswaran, Justin Johnson et al.
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program
Minghe Gao, Xuqi Liu, Zhongqi Yue et al.
MagicColor: Multi-instance Sketch Colorization
Yinhan Zhang, Yue Ma, Bingyuan Wang et al.
UrbanLLaVA: A Multi-modal Large Language Model for Urban Intelligence
Jie Feng, Shengyuan Wang, Tianhui Liu et al.
Probing the Latent Hierarchical Structure of Data via Diffusion Models
Antonio Sclocchi, Alessandro Favero, Noam Levi et al.
EnergyMoGen: Compositional Human Motion Generation with Energy-Based Diffusion Model in Latent Space
Jianrong Zhang, Hehe Fan, Yi Yang
RANKCLIP: Ranking-Consistent Language-Image Pretraining
Yiming Zhang, Zhuokai Zhao, Zhaorun Chen et al.
SuperDec: 3D Scene Decomposition with Superquadrics Primitives
Elisabetta Fedele, Boyang Sun, Francis Engelmann et al.
WildSAT: Learning Satellite Image Representations from Wildlife Observations
Rangel Daroya, Elijah Cole, Oisin Mac Aodha et al.
AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference
Zhuomin He, Yizhen Yao, Pengfei Zuo et al.
Spectral Image Tokenizer
Carlos Esteves, Mohammed Suhail, Ameesh Makadia
MindLLM: A Subject-Agnostic and Versatile Model for fMRI-to-text Decoding
Weikang Qiu, Zheng Huang, Haoyu Hu et al.
DriveGEN: Generalized and Robust 3D Detection in Driving via Controllable Text-to-Image Diffusion Generation
Hongbin Lin, Zilu Guo, Yifan Zhang et al.
General Scene Adaptation for Vision-and-Language Navigation
Haodong Hong, Yanyuan Qiao, Sen Wang et al.
Holistic Reasoning with Long-Context LMs: A Benchmark for Database Operations on Massive Textual Data
Seiji Maekawa, Hayate Iso, Nikita Bhutani
FormalAlign: Automated Alignment Evaluation for Autoformalization
Jianqiao Lu, Yingjia Wan, Yinya Huang et al.
Periodic Materials Generation using Text-Guided Joint Diffusion Model
Kishalay Das, Subhojyoti Khastagir, Pawan Goyal et al.
MBQ: Modality-Balanced Quantization for Large Vision-Language Models
Shiyao Li, Yingchun Hu, Xuefei Ning et al.
CCIN: Compositional Conflict Identification and Neutralization for Composed Image Retrieval
Likai Tian, Jian Zhao, Zechao Hu et al.
Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think
Zhenyi Lu, Xiaoye Qu et al.
CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering
Tianyu Huai, Jie Zhou, Xingjiao Wu et al.
Amplifier: Bringing Attention to Neglected Low-Energy Components in Time Series Forecasting
Jingru Fei, Kun Yi, Wei Fan et al.
A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
Thomas Schmied, Thomas Adler, Vihang Patil et al.
The Optimization Landscape of SGD Across the Feature Learning Strength
Alexander Atanasov, Alexandru Meterez, James Simon et al.
EmotiCrafter: Text-to-Emotional-Image Generation based on Valence-Arousal Model
Shengqi Dang, Yi He, Long Ling et al.
Seeing Your Speech Style: A Novel Zero-Shot Identity-Disentanglement Face-based Voice Conversion
Yan Rong, Li Liu
Plastic Learning with Deep Fourier Features
Alex Lewandowski, Dale Schuurmans, Marlos C. Machado
SlerpFace: Face Template Protection via Spherical Linear Interpolation
Zhizhou Zhong, Yuxi Mi, Yuge Huang et al.
CameraCtrl II: Dynamic Scene Exploration via Camera-controlled Video Diffusion Models
Hao He, Ceyuan Yang, Shanchuan Lin et al.
RayFlow: Instance-Aware Diffusion Acceleration via Adaptive Flow Trajectories
Huiyang Shao, Xin Xia, Yuhong Yang et al.
PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation
Qihan Huang, Weilong Dai, Jinlong Liu et al.
Enhancing Multilingual LLM Pretraining with Model-Based Data Selection
Bettina Messmer, Vinko Sabolčec, Martin Jaggi
Pareto Set Learning for Multi-Objective Reinforcement Learning
Erlong Liu, Yu-Chang Wu, Xiaobin Huang et al.
Reliable and Efficient Amortized Model-based Evaluation
Sang Truong, Yuheng Tu, Percy Liang et al.
Nautilus: Locality-aware Autoencoder for Scalable Mesh Generation
Yuxuan Wang, Xuanyu Yi, Haohan Weng et al.
CoreMatching: A Co-adaptive Sparse Inference Framework with Token and Neuron Pruning for Comprehensive Acceleration of Vision-Language Models
Qinsi Wang, Hancheng Ye, Ming-Yu Chung et al.
Efficient Model Editing with Task-Localized Sparse Fine-tuning
Leonardo Iurada, Marco Ciccone, Tatiana Tommasi
GRPose: Learning Graph Relations for Human Image Generation with Pose Priors
Xiangchen Yin, Donglin Di, Lei Fan et al.
Splatter-360: Generalizable 360 Gaussian Splatting for Wide-baseline Panoramic Images
Zheng Chen, Chenming Wu, Zhelun Shen et al.
VisionPAD: A Vision-Centric Pre-training Paradigm for Autonomous Driving
Haiming Zhang, Wending Zhou et al.
BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation
Yuyang Peng, Shishi Xiao, Keming Wu et al.
InsightEdit: Towards Better Instruction Following for Image Editing
Yingjing Xu, Jie Kong, Jiazhi Wang et al.
Neural Exploratory Landscape Analysis for Meta-Black-Box-Optimization
Zeyuan Ma, Jiacheng Chen, Hongshu Guo et al.
GraphAvatar: Compact Head Avatars with GNN-Generated 3D Gaussians
Xiaobao Wei, Peng Chen, Ming Lu et al.
ParZC: Parametric Zero-Cost Proxies for Efficient NAS
Peijie Dong, Lujun Li, Zhenheng Tang et al.
Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning of Vision Language Models
Huajie Tan, Yuheng Ji, Xiaoshuai Hao et al.
Reconstructing People, Places, and Cameras
Lea Müller, Hongsuk Choi, Anthony Zhang et al.
Hydra-SGG: Hybrid Relation Assignment for One-stage Scene Graph Generation
Minghan Chen, Guikun Chen, Wenguan Wang et al.
Gemstones: A Model Suite for Multi-Faceted Scaling Laws
Sean McLeish, John Kirchenbauer, David Miller et al.
$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training
Jin Zhou, Kaiwen Wang, Jonathan Chang et al.
Law of the Weakest Link: Cross Capabilities of Large Language Models
Ming Zhong, Aston Zhang, Xuewei Wang et al.
Latent-EnSF: A Latent Ensemble Score Filter for High-Dimensional Data Assimilation with Sparse Observation Data
Phillip Si, Peng Chen
Revisiting Multimodal Fusion for 3D Anomaly Detection from an Architectural Perspective
Kaifang Long, Guoyang Xie, Lianbo Ma et al.
Towards Trustworthy Knowledge Graph Reasoning: An Uncertainty Aware Perspective
Bo Ni, Yu Wang, Lu Cheng et al.
Temporal Separation with Entropy Regularization for Knowledge Distillation in Spiking Neural Networks
Kairong Yu, Chengting Yu, Tianqing Zhang et al.
Noise Hypernetworks: Amortizing Test-Time Compute in Diffusion Models
Luca Eyring, Shyamgopal Karthik, Alexey Dosovitskiy et al.
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11)
Tianyi Zhang, Mohsen Hariri, Shaochen (Henry) Zhong et al.
MDNS: Masked Diffusion Neural Sampler via Stochastic Optimal Control
Yuchen Zhu, Wei Guo, Jaemoo Choi et al.
A Recipe for Generating 3D Worlds from a Single Image
Katja Schwarz, Denis Rozumny, Samuel Rota Bulò et al.
UCF-Crime-DVS: A Novel Event-Based Dataset for Video Anomaly Detection with Spiking Neural Networks
Yuanbin Qian, Shuhan Ye, Chong Wang et al.
Diff2Flow: Training Flow Matching Models via Diffusion Model Alignment
Johannes Schusterbauer, Ming Gui, Frank Fundel et al.
Data-Driven Performance Guarantees for Classical and Learned Optimizers
Rajiv Sambharya, Bartolomeo Stellato
HIIF: Hierarchical Encoding based Implicit Image Function for Continuous Super-resolution
Yuxuan Jiang, Ho Man Kwan, Jasmine Peng et al.
Light-T2M: A Lightweight and Fast Model for Text-to-motion Generation
Ling-An Zeng, Guohong Huang, Gaojie Wu et al.
On the Crucial Role of Initialization for Matrix Factorization
Bingcong Li, Liang Zhang, Aryan Mokhtari et al.
Backdoor Attacks Against No-Reference Image Quality Assessment Models via a Scalable Trigger
Yi Yu, Song Xia, Xun Lin et al.
Attention layers provably solve single-location regression
Pierre Marion, Raphaël Berthier, Gérard Biau et al.
Unleashing Hour-Scale Video Training for Long Video-Language Understanding
Jingyang Lin, Jialian Wu, Ximeng Sun et al.
Adaptive Self-improvement LLM Agentic System for ML Library Development
Genghan Zhang, Weixin Liang, Olivia Hsu et al.
Lightweight Neural App Control
Filippos Christianos, Georgios Papoudakis, Thomas Coste et al.
Exploring the limits of strong membership inference attacks on large language models
Jamie Hayes, I Shumailov, Christopher A. Choquette-Choo et al.
Hybrid Global-Local Representation with Augmented Spatial Guidance for Zero-Shot Referring Image Segmentation
Ting Liu, Siyuan Li
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text
Nikhil Kandpal, Brian Lester, Colin Raffel et al.
Taylor Series-Inspired Local Structure Fitting Network for Few-shot Point Cloud Semantic Segmentation
Changshuo Wang, Shuting He, Xiang Fang et al.
Task-Agnostic Guided Feature Expansion for Class-Incremental Learning
Bowen Zheng, Da-Wei Zhou, Han-Jia Ye et al.
Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
Xinyu Tian, Shu Zou, Zhaoyuan Yang et al.
Generative Monoculture in Large Language Models
Fan Wu, Emily Black, Varun Chandrasekaran
Atlas Gaussians Diffusion for 3D Generation
Haitao Yang, Yuan Dong, Hanwen Jiang et al.
NeSyCoCo: A Neuro-Symbolic Concept Composer for Compositional Generalization
Danial Kamali, Elham J. Barezi, Parisa Kordjamshidi
VTON-HandFit: Virtual Try-on for Arbitrary Hand Pose Guided by Hand Priors Embedding
Yujie Liang, Xiaobin Hu, Boyuan Jiang et al.
Consistency Checks for Language Model Forecasters
Daniel Paleka, Abhimanyu Pallavi Sudhir, Alejandro Alvarez et al.
A Closer Look at TabPFN v2: Understanding Its Strengths and Extending Its Capabilities
Han-Jia Ye, Si-Yang Liu, Wei-Lun (Harry) Chao
FilterTS: Comprehensive Frequency Filtering for Multivariate Time Series Forecasting
Yulong Wang, Yushuo Liu, Xiaoyi Duan et al.
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Haiyang Wang, Yue Fan, Muhammad Ferjad Naeem et al.
Efficient Attention-Sharing Information Distillation Transformer for Lightweight Single Image Super-Resolution
Karam Park, Jae Woong Soh, Nam Ik Cho
ConfTuner: Training Large Language Models to Express Their Confidence Verbally
Yibo Li, Miao Xiong, Jiaying Wu et al.
V2X-R: Cooperative LiDAR-4D Radar Fusion with Denoising Diffusion for 3D Object Detection
Xun Huang, Jinlong Wang, Qiming Xia et al.
Generate, but Verify: Reducing Hallucination in Vision-Language Models with Retrospective Resampling
Tsung-Han (Patrick) Wu, Heekyung Lee, Jiaxin Ge et al.
Efficiently Parameterized Neural Metriplectic Systems
Anthony Gruber, Kookjin Lee, Haksoo Lim et al.
Towards a Comprehensive, Efficient and Promptable Anatomic Structure Segmentation Model Using 3D Whole-Body CT Scans
Heng Guo, Jianfeng Zhang, Jiaxing Huang et al.
Vector-ICL: In-context Learning with Continuous Vector Representations
Yufan Zhuang, Chandan Singh, Liyuan Liu et al.
Training-free LLM-generated Text Detection by Mining Token Probability Sequences
Yihuai Xu, Yongwei Wang, Yifei Bi et al.
Measuring Human and AI Values Based on Generative Psychometrics with Large Language Models
Haoran Ye, Yuhang Xie, Yuanyi Ren et al.
Equivariance Everywhere All At Once: A Recipe for Graph Foundation Models
Ben Finkelshtein, Ismail Ilkan Ceylan, Michael Bronstein et al.
GUI-Xplore: Empowering Generalizable GUI Agents with One Exploration
Yuchen Sun, Shanhui Zhao, Tao Yu et al.
MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL
Claas Voelcker, Marcel Hussing, Eric Eaton et al.
Gazing Into Missteps: Leveraging Eye-Gaze for Unsupervised Mistake Detection in Egocentric Videos of Skilled Human Activities
Michele Mazzamuto, Antonino Furnari, Yoichi Sato et al.
DreamRelation: Bridging Customization and Relation Generation
Qingyu Shi, Lu Qi, Jianzong Wu et al.
Ouroboros-Diffusion: Exploring Consistent Content Generation in Tuning-free Long Video Diffusion
Jingyuan Chen, Fuchen Long, Jie An et al.
Confidence Estimation for Error Detection in Text-to-SQL Systems
Oleg Somov, Elena Tutubalina
HOIGPT: Learning Long-Sequence Hand-Object Interaction with Language Models
Mingzhen Huang, Fu-Jen Chu, Bugra Tekin et al.
Towards Multiple Character Image Animation Through Enhancing Implicit Decoupling
Jingyun Xue, WANG HongFa, Qi Tian et al.
Efficient Gaussian Splatting for Monocular Dynamic Scene Rendering via Sparse Time-Variant Attribute Modeling
Hanyang Kong, Xingyi Yang, Xinchao Wang
Speech Recognition Meets Large Language Model: Benchmarking, Models, and Exploration
Ziyang Ma, Guanrou Yang, Yifan Yang et al.
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data
Yiren Song, Cheng Liu, Mike Zheng Shou
Tuning the Frequencies: Robust Training for Sinusoidal Neural Networks
Tiago Novello, Diana Aldana Moreno, André Araujo et al.
Sports-Traj: A Unified Trajectory Generation Model for Multi-Agent Movement in Sports
Yi Xu, Yun Fu
PLeaS - Merging Models with Permutations and Least Squares
Anshul Nasery, Jonathan Hayase, Pang Wei Koh et al.
Overcoming False Illusions in Real-World Face Restoration with Multi-Modal Guided Diffusion Model
Keda Tao, Jinjin Gu, Yulun Zhang et al.
MIP against Agent: Malicious Image Patches Hijacking Multimodal OS Agents
Lukas Aichberger, Alasdair Paren, Guohao Li et al.
Rewind-to-Delete: Certified Machine Unlearning for Nonconvex Functions
Siqiao Mu, Diego Klabjan
FedMIA: An Effective Membership Inference Attack Exploiting "All for One" Principle in Federated Learning
Gongxi Zhu, Donghao Li, Hanlin Gu et al.
Revisiting Tampered Scene Text Detection in the Era of Generative AI
Chenfan Qu, Yiwu Zhong, Fengjun Guo et al.
The 3D-PC: a benchmark for visual perspective taking in humans and machines
Drew Linsley, Peisen Zhou, Alekh Ashok et al.
CADDreamer: CAD Object Generation from Single-view Images
Yuan Li, Cheng Lin, Yuan Liu et al.
Bayesian Concept Bottleneck Models with LLM Priors
Jean Feng, Avni Kothari, Lucas Zier et al.
DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents
Hao Li, Xiaogeng Liu, CHIU Chun et al.
Scaling Laws for Optimal Data Mixtures
Mustafa Shukor, Louis Bethune, Dan Busbridge et al.
Generalized Recorrupted-to-Recorrupted: Self-Supervised Learning Beyond Gaussian Noise
Brayan Monroy, Jorge Bacca, Julián Tachella
Flexible Frame Selection for Efficient Video Reasoning
Shyamal Buch, Arsha Nagrani, Anurag Arnab et al.
From Flat to Hierarchical: Extracting Sparse Representations with Matching Pursuit
Valérie Costa, Thomas Fel, Ekdeep S Lubana et al.
Emergent Temporal Correspondences from Video Diffusion Transformers
Jisu Nam, Soowon Son, Dahyun Chung et al.
Measuring memorization in RLHF for code completion
Jamie Hayes, I Shumailov, Billy Porter et al.
S4-Driver: Scalable Self-Supervised Driving Multimodal Large Language Model with Spatio-Temporal Visual Representation
Yichen Xie, Runsheng Xu, Tong He et al.
VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation
Wenhao Wang, Yi Yang
ReCap: Better Gaussian Relighting with Cross-Environment Captures
Jingzhi Li, Zongwei Wu, Eduard Zamfir et al.
AdvPrefix: An Objective for Nuanced LLM Jailbreaks
Sicheng Zhu, Brandon Amos, Yuandong Tian et al.
DocLayLLM: An Efficient Multi-modal Extension of Large Language Models for Text-rich Document Understanding
Wenhui Liao, Jiapeng Wang, Hongliang Li et al.
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
Jin Wang, Chenghui Lv, Xian Li et al.
EgoExoBench: A Benchmark for First- and Third-person View Video Understanding in MLLMs
Yuping He, Yifei Huang, Guo Chen et al.
Needle Threading: Can LLMs Follow Threads Through Near-Million-Scale Haystacks?
Jonathan Roberts, Kai Han, Samuel Albanie
STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction
Zhimin Liao, Ping Wei, Shuaijia Chen et al.
Visual Generation Without Guidance
Huayu Chen, Kai Jiang, Kaiwen Zheng et al.
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering
Federico Cocchi, Nicholas Moratelli, Marcella Cornia et al.
SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving
Su Sun, Cheng Zhao, Zhuoyang Sun et al.
Deep MMD Gradient Flow without adversarial training
Alexandre Galashov, Valentin De Bortoli, Arthur Gretton
EqNIO: Subequivariant Neural Inertial Odometry
Royina Karegoudra Jayanth, Yinshuang Xu, Ziyun Wang et al.
Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model
Zhaochong An, Guolei Sun, Yun Liu et al.
Multiple Heads are Better than One: Mixture of Modality Knowledge Experts for Entity Representation Learning
Yichi Zhang, Zhuo Chen, Lingbing Guo et al.
DocVLM: Make Your VLM an Efficient Reader
Mor Shpigel Nacson, Aviad Aberdam, Roy Ganz et al.
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
Xize Cheng, Siqi Zheng, Zehan Wang et al.
Scalable and Certifiable Graph Unlearning: Overcoming the Approximation Error Barrier
Lu Yi, Zhewei Wei
SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input
Zhen Lv, Yangqi Long, Congzhentao Huang et al.
More RLHF, More Trust? On The Impact of Preference Alignment On Trustworthiness
Aaron J. Li, Satyapriya Krishna, Hima Lakkaraju
Synthesizing Privacy-Preserving Text Data via Finetuning *without* Finetuning Billion-Scale LLMs
Bowen Tan, Zheng Xu, Eric Xing et al.
Energy-based Backdoor Defense Against Federated Graph Learning
Guancheng Wan, Zitong Shi, Wenke Huang et al.
MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments
Matthieu Cord, Antonin Vobecky, Oriane Siméoni et al.
QuaDiM: A Conditional Diffusion Model For Quantum State Property Estimation
Yehui Tang, Mabiao Long, Junchi Yan
Prioritized Generative Replay
Ren Wang, Kevin Frans, Pieter Abbeel et al.
Probability Density Geodesics in Image Diffusion Latent Space
Qingtao Yu, Jaskirat Singh, Zhaoyuan Yang et al.
MolSpectra: Pre-training 3D Molecular Representation with Multi-modal Energy Spectra
Liang Wang, Shaozhen Liu, Yu Rong et al.
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?
Simon Park, Abhishek Panigrahi, Yun Cheng et al.
TraF-Align: Trajectory-aware Feature Alignment for Asynchronous Multi-agent Perception
Zhiying Song, Lei Yang, Fuxi Wen et al.
Decomposition Polyhedra of Piecewise Linear Functions
Marie-Charlotte Brandenburg, Moritz Grillo, Christoph Hertrich
MultiGO: Towards Multi-level Geometry Learning for Monocular 3D Textured Human Reconstruction
Gangjian Zhang, Nanjie Yao, Shunsi Zhang et al.
Can Transformers Do Enumerative Geometry?
Baran Hashemi, Roderic Corominas, Alessandro Giacchetto
When Do LLMs Help With Node Classification? A Comprehensive Analysis
Xixi Wu, Yifei Shen, Fangzhou Ge et al.
REGENT: A Retrieval-Augmented Generalist Agent That Can Act In-Context in New Environments
Kaustubh Sridhar, Souradeep Dutta, Dinesh Jayaraman et al.
Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization
Luca Masserano, Abdul Fatir Ansari, Boran Han et al.
DIFFER: Disentangling Identity Features via Semantic Cues for Clothes-Changing Person Re-ID
Xin Liang, Yogesh S. Rawat
DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image
Qingxuan Wu, Zhiyang Dou, Sirui Xu et al.
Aux-Think: Exploring Reasoning Strategies for Data-Efficient Vision-Language Navigation
Shuo Wang, Yongcai Wang, Wanting Li et al.
Rapidly Adapting Policies to the Real-World via Simulation-Guided Fine-Tuning
Patrick Yin, Tyler Westenbroek, Ching-An Cheng et al.
MagCache: Fast Video Generation with Magnitude-Aware Cache
Zehong Ma, Longhui Wei, Feng Wang et al.
PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations
Benjamin Holzschuh, Qiang Liu, Georg Kohl et al.
Circumventing Shortcuts in Audio-visual Deepfake Detection Datasets with Unsupervised Learning
Stefan Smeu, Dragos-Alexandru Boldisor, Dan Oneata et al.
GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency
Dongyue Lu, Lingdong Kong, Tianxin Huang et al.
Learning to Adapt Frozen CLIP for Few-Shot Test-Time Domain Adaptation
Zhixiang Chi, Li Gu, Huan Liu et al.
TASAR: Transfer-based Attack on Skeletal Action Recognition
Yunfeng Diao, Baiqi Wu, Ruixuan Zhang et al.
Evaluating Large Language Models through Role-Guide and Self-Reflection: A Comparative Study
Lili Zhao, Yang Wang, Qi Liu et al.
LoRA Subtraction for Drift-Resistant Space in Exemplar-Free Continual Learning
Xuan Liu, Xiaobin Chang
Progressive Compositionality in Text-to-Image Generative Models
Xu Han, Linghao Jin, Xiaofeng Liu et al.
Language Guided Concept Bottleneck Models for Interpretable Continual Learning
Lu Yu, Haoyu Han, Zhe Tao et al.
Jailbreaking as a Reward Misspecification Problem
Zhihui Xie, Jiahui Gao, Lei Li et al.
Objective drives the consistency of representational similarity across datasets
Laure Ciernik, Lorenz Linhardt, Marco Morik et al.
TACO: Taming Diffusion for in-the-wild Video Amodal Completion
Ruijie Lu, Yixin Chen, Yu Liu et al.