Most Cited 2025 "training speed" Papers
Can Classic GNNs Be Strong Baselines for Graph-level Tasks? Simple Architectures Meet Excellence
Yuankai Luo, Lei Shi, Xiao-Ming Wu
Correcting Deviations from Normality: A Reformulated Diffusion Model for Multi-Class Unsupervised Anomaly Detection
Farzad Beizaee, Gregory A. Lodygensky, Christian Desrosiers et al.
MultiADS: Defect-aware Supervision for Multi-type Anomaly Detection and Segmentation in Zero-Shot Learning
Ylli Sadikaj, Hongkuan Zhou, Lavdim Halilaj et al.
BadToken: Token-level Backdoor Attacks to Multi-modal Large Language Models
Zenghui Yuan, Jiawen Shi, Pan Zhou et al.
Vanish into Thin Air: Cross-prompt Universal Adversarial Attacks for SAM2
Ziqi Zhou, Yifan Hu, Yufei Song et al.
Accurate and Regret-Aware Numerical Problem Solver for Tabular Question Answering
Yuxiang Wang, Jianzhong Qi, Junhao Gan
STAA-SNN: Spatial-Temporal Attention Aggregator for Spiking Neural Networks
Tianqing Zhang, Kairong Yu, Xian Zhong et al.
ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models
Liyan Tang, Grace Kim, Xinyu Zhao et al.
Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation
Chanyoung Kim, Dayun Ju, Woojung Han et al.
PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution
Zhu Li Bo, Jianze Li, Haotong Qin et al.
Deformable Radial Kernel Splatting
Yihua Huang, Mingxian Lin, Yangtian Sun et al.
When Do LLMs Help With Node Classification? A Comprehensive Analysis
Xixi Wu, Yifei Shen, Fangzhou Ge et al.
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?
Simon Park, Abhishek Panigrahi, Yun Cheng et al.
MLLM-For3D: Adapting Multimodal Large Language Model for 3D Reasoning Segmentation
Jiaxin Huang, Runnan Chen, Ziwen Li et al.
Constrained Fair and Efficient Allocations
Benjamin Cookson, Soroush Ebadian, Nisarg Shah
Markov Persuasion Processes: Learning to Persuade From Scratch
Francesco Bacchiocchi, Francesco Emanuele Stradi, Matteo Castiglioni et al.
InPO: Inversion Preference Optimization with Reparametrized DDIM for Efficient Diffusion Model Alignment
Yunhong Lu, Qichao Wang, Hengyuan Cao et al.
Beyond the convexity assumption: Realistic tabular data generation under quantifier-free real linear constraints
Mihaela Stoian, Eleonora Giunchiglia
Distilling Monocular Foundation Model for Fine-grained Depth Completion
Yingping Liang, Yutao Hu, Wenqi Shao et al.
Probability Density Geodesics in Image Diffusion Latent Space
Qingtao Yu, Jaskirat Singh, Zhaoyuan Yang et al.
LATINO-PRO: LAtent consisTency INverse sOlver with PRompt Optimization
Alessio Spagnoletti, Jean Prost, Andres Almansa et al.
GLASS: Guided Latent Slot Diffusion for Object-Centric Learning
Krishnakant Singh, Simone Schaub-Meyer, Stefan Roth
Language Guided Concept Bottleneck Models for Interpretable Continual Learning
Lu Yu, HaoYu Han, Zhe Tao et al.
Enhancing Foundation Models for Time Series Forecasting via Wavelet-based Tokenization
Luca Masserano, Abdul Fatir Ansari, Boran Han et al.
O-TPT: Orthogonality Constraints for Calibrating Test-time Prompt Tuning in Vision-Language Models
Ashshak Sharifdeen, Muhammad Akhtar Munir, Sanoojan Baliah et al.
Fine-Tuning Visual Autoregressive Models for Subject-Driven Generation
Jiwoo Chung, Sangeek Hyun, Hyunjun Kim et al.
REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning
Jihyun Lee, Weipeng Xu, Alexander Richard et al.
Objective drives the consistency of representational similarity across datasets
Laure Ciernik, Lorenz Linhardt, Marco Morik et al.
MagCache: Fast Video Generation with Magnitude-Aware Cache
Zehong Ma, Longhui Wei, Feng Wang et al.
Neural Video Compression with Context Modulation
Chuanbo Tang, Zhuoyuan Li, Yifan Bian et al.
Genesis: Multimodal Driving Scene Generation with Spatio-Temporal and Cross-Modal Consistency
Xiangyu Guo, Zhanqian Wu, Kaixin Xiong et al.
Quality-Driven Curation of Remote Sensing Vision-Language Data via Learned Scoring Models
Dilxat Muhtar, Enzhuo Zhang, Zhenshi Li et al.
MultiGO: Towards Multi-level Geometry Learning for Monocular 3D Textured Human Reconstruction
Gangjian Zhang, Nanjie Yao, Shunsi Zhang et al.
BountyBench: Dollar Impact of AI Agent Attackers and Defenders on Real-World Cybersecurity Systems
Andy Zhang, Joey Ji, Celeste Menders et al.
PMQ-VE: Progressive Multi-Frame Quantization for Video Enhancement
ZhanFeng Feng, Long Peng, Xin Di et al.
All-in-One: Transferring Vision Foundation Models into Stereo Matching
Jingyi Zhou, Haoyu Zhang, Jiakang Yuan et al.
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)
Leander Girrbach, Stephan Alaniz, Yiran Huang et al.
An End-to-End Robust Point Cloud Semantic Segmentation Network with Single-Step Conditional Diffusion Models
Wentao Qu, Jing Wang, Yongshun Gong et al.
SAMRefiner: Taming Segment Anything Model for Universal Mask Refinement
Yuqi Lin, Hengjia Li, Wenqi Shao et al.
RNG: Relightable Neural Gaussians
Jiahui Fan, Fujun Luan, Jian Yang et al.
HybridGS: High-Efficiency Gaussian Splatting Data Compression using Dual-Channel Sparse Representation and Point Cloud Encoder
Qi Yang, Le Yang, Geert Van der Auwera et al.
SILMM: Self-Improving Large Multimodal Models for Compositional Text-to-Image Generation
Leigang Qu, Haochuan Li, Wenjie Wang et al.
Model Provenance Testing for Large Language Models
Ivica Nikolic, Teodora Baluta, Prateek Saxena
Counterfactual Generative Modeling with Variational Causal Inference
Yulun Wu, Louis McConnell, Claudia Iriondo
PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations
Benjamin Holzschuh, Qiang Liu, Georg Kohl et al.
SurFhead: Affine Rig Blending for Geometrically Accurate 2D Gaussian Surfel Head Avatars
Jaeseong Lee, Taewoong Kang, Marcel Buehler et al.
QuaDiM: A Conditional Diffusion Model For Quantum State Property Estimation
Yehui Tang, Mabiao Long, Junchi Yan
ADIFF: Explaining audio difference using natural language
Soham Deshmukh, Shuo Han, Rita Singh et al.
Controllable Generation via Locally Constrained Resampling
Kareem Ahmed, Kai-Wei Chang, Guy Van den Broeck
Attention-Driven GUI Grounding: Leveraging Pretrained Multimodal Large Language Models Without Fine-Tuning
Hai-Ming Xu, Qi Chen, Lei Wang et al.
MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining
Yunze Liu, Li Yi
Learning World Models for Interactive Video Generation
Taiye Chen, Xun Hu, Zihan Ding et al.
Scaling Collapse Reveals Universal Dynamics in Compute-Optimally Trained Neural Networks
Shikai Qiu, Lechao Xiao, Andrew Wilson et al.
DIFFER: Disentangling Identity Features via Semantic Cues for Clothes-Changing Person Re-ID
Xin Liang, Yogesh S. Rawat
Fast Summation of Radial Kernels via QMC Slicing
Johannes Hertrich, Tim Jahn, Michael Quellmalz
MegActor-Sigma: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer
Shurong Yang, Huadong Li, Juhao Wu et al.
CoA: Towards Real Image Dehazing via Compression-and-Adaptation
Long Ma, Yuxin Feng, Yan Zhang et al.
Test-time Adaptation for Cross-modal Retrieval with Query Shift
Haobin Li, Peng Hu, Qianjun Zhang et al.
HumanRig: Learning Automatic Rigging for Humanoid Character in a Large Scale Dataset
Zedong Chu, Feng Xiong, Meiduo Liu et al.
ICM-Assistant: Instruction-tuning Multimodal Large Language Models for Rule-based Explainable Image Content Moderation
Mengyang Wu, Yuzhi Zhao, Jialun Cao et al.
GAS: Generative Avatar Synthesis from a Single Image
Yixing Lu, Junting Dong, YoungJoong Kwon et al.
Universal Video Temporal Grounding with Generative Multi-modal Large Language Models
Zeqian Li, Shangzhe Di, Zhonghua Zhai et al.
DreamText: High Fidelity Scene Text Synthesis
Yibin Wang, Weizhong Zhang, honghui xu et al.
LIBA: Language Instructed Multi-granularity Bridge Assistant for 3D Visual Grounding
Yuan Wang, Ya-Li Li, W U Eastman Z Y et al.
Adversarially Robust Out-of-Distribution Detection Using Lyapunov-Stabilized Embeddings
Hossein Mirzaei Sadeghlou, Mackenzie Mathis
Aligned Better, Listen Better for Audio-Visual Large Language Models
Yuxin Guo, Shuailei Ma, Shijie Ma et al.
Towards Transformer-Based Aligned Generation with Self-Coherence Guidance
Shulei Wang, Wang Lin, Hai Huang et al.
Discrete Codebook World Models for Continuous Control
Aidan Scannell, Mohammadreza Nakhaeinezhadfard, Kalle Kujanpää et al.
Rethinking Query-based Transformer for Continual Image Segmentation
Yuchen Zhu, Cheng Shi, Dingyou Wang et al.
Solving Inequality Proofs with Large Language Models
Jiayi Sheng, Luna Lyu, Jikai Jin et al.
Circumventing Shortcuts in Audio-visual Deepfake Detection Datasets with Unsupervised Learning
Stefan Smeu, Dragos-Alexandru Boldisor, Dan Oneata et al.
EventGPT: Event Stream Understanding with Multimodal Large Language Models
shaoyu liu, Jianing Li, guanghui zhao et al.
RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance
Chengrui Wang, Pengfei Liu, Min Zhou et al.
GEAL: Generalizable 3D Affordance Learning with Cross-Modal Consistency
Dongyue Lu, Lingdong Kong, Tianxin Huang et al.
Diff3DS: Generating View-Consistent 3D Sketch via Differentiable Curve Rendering
Yibo Zhang, Lihong Wang, Changqing Zou et al.
Learning Pattern-Specific Experts for Time Series Forecasting Under Patch-level Distribution Shift
Yanru Sun, Zongxia Xie, Emadeldeen Eldele et al.
MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem
Fan LIU, Zherui Yang, Cancheng Liu et al.
Theoretically Grounded Framework for LLM Watermarking: A Distribution-Adaptive Approach
Haiyun He, Yepeng Liu, Ziqiao Wang et al.
BodyGen: Advancing Towards Efficient Embodiment Co-Design
Haofei Lu, Zhe Wu, Junliang Xing et al.
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation
Weiming Ren, Huan Yang, Jie Min et al.
Diffusion Tree Sampling: Scalable inference‑time alignment of diffusion models
Vineet Jain, Kusha Sareen, Mohammad Pedramfar et al.
Relieving Universal Label Noise for Unsupervised Visible-Infrared Person Re-Identification by Inferring from Neighbors
Xiao Teng, Long Lan, Dingyao Chen et al.
SoundCTM: Unifying Score-based and Consistency Models for Full-band Text-to-Sound Generation
Koichi Saito, Dongjun Kim, Takashi Shibuya et al.
How Transformers Learn Structured Data: Insights From Hierarchical Filtering
Jerome Garnier-Brun, Marc Mezard, Emanuele Moscato et al.
Beyond Bradley-Terry Models: A General Preference Model for Language Model Alignment
Yifan Zhang, Ge Zhang, Yue Wu et al.
AgentTTS: Large Language Model Agent for Test-time Compute-optimal Scaling Strategy in Complex Tasks
Fali Wang, Hui Liu, Zhenwei Dai et al.
LesionLocator: Zero-Shot Universal Tumor Segmentation and Tracking in 3D Whole-Body Imaging
Maximilian Rokuss, Yannick Kirchhoff, Seval Akbal et al.
SketchVideo: Sketch-based Video Generation and Editing
Feng-Lin Liu, Hongbo Fu, Xintao Wang et al.
Multi-Focus Image Fusion via Explicit Defocus Blur Modelling
Yuhui Quan, Xi Wan, Zitao Tang et al.
CriSPO: Multi-Aspect Critique-Suggestion-guided Automatic Prompt Optimization for Text Generation
Han He, Qianchu Liu, Lei Xu et al.
Diffusion models for Gaussian distributions: Exact solutions and Wasserstein errors
Emile Pierret, Bruno Galerne
Mamba4D: Efficient 4D Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models
Jiuming Liu, Jinru Han, Lihao Liu et al.
Edit Away and My Face Will not Stay: Personal Biometric Defense against Malicious Generative Editing
Hanhui Wang, Yihua Zhang, Ruizheng Bai et al.
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs
Xiaoqin Wang, Xusen Ma, Xianxu Hou et al.
Autonomous LLM-Enhanced Adversarial Attack for Text-to-Motion
Honglei Miao, Fan Ma, Ruijie Quan et al.
MVREC: A General Few-shot Defect Classification Model Using Multi-View Region-Context
Shuai Lyu, Rongchen Zhang, Zeqi Ma et al.
Image Quality Assessment: Investigating Causal Perceptual Effects with Abductive Counterfactual Inference
Wenhao Shen, Mingliang Zhou, Yu Chen et al.
Modality-Specialized Synergizers for Interleaved Vision-Language Generalists
Zhiyang Xu, Minqian Liu, Ying Shen et al.
Secant Line Search for Frank-Wolfe Algorithms
Deborah Hendrych, Sebastian Pokutta, Mathieu Besançon et al.
Fast and Slow Streams for Online Time Series Forecasting Without Information Leakage
Ying-yee Ava Lau, Zhiwen Shao, Dit-Yan Yeung
Effective and Efficient Masked Image Generation Models
Zebin You, Jingyang Ou, Xiaolu Zhang et al.
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers
Zeyuan Allen-Zhu
Data Taggants: Dataset Ownership Verification Via Harmless Targeted Data Poisoning
Wassim Bouaziz, Nicolas Usunier, El-Mahdi El-Mhamdi
HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation
Tengfei Liu, Jiapu Wang, Yongli Hu et al.
RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models
Haoran Hao, Jiaming Han, Changsheng Li et al.
Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion
Alan Amin, Nate Gruver, Andrew Wilson
ACE: Anti-Editing Concept Erasure in Text-to-Image Models
Zihao Wang, Yuxiang Wei, Fan Li et al.
AMO Sampler: Enhancing Text Rendering with Overshooting
Xixi Hu, Keyang Xu, Bo Liu et al.
DELIFT: Data Efficient Language model Instruction Fine-Tuning
Ishika Agarwal, Krishnateja Killamsetty, Lucian Popa et al.
ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention
Bencheng Liao, Xinggang Wang, Lianghui Zhu et al.
INST-IT: Boosting Instance Understanding via Explicit Visual Prompt Instruction Tuning
Wujian Peng, Lingchen Meng, Yitong Chen et al.
FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding
Thanh-Dat Truong, Utsav Prabhu, Bhiksha Raj et al.
Micro-macro Wavelet-based Gaussian Splatting for 3D Reconstruction from Unconstrained Images
Yihui Li, Chengxin Lv, Hongyu Yang et al.
Geometry-aware RL for Manipulation of Varying Shapes and Deformable Objects
Tai Hoang, Huy Le, Philipp Becker et al.
Edge-SD-SR: Low Latency and Parameter Efficient On-device Super-Resolution with Stable Diffusion via Bidirectional Conditioning
Isma Hadji, Mehdi Noroozi, Victor Escorcia et al.
BARD-GS: Blur-Aware Reconstruction of Dynamic Scenes via Gaussian Splatting
Yiren Lu, Yunlai Zhou, Disheng Liu et al.
3D-GSW: 3D Gaussian Splatting for Robust Watermarking
Youngdong Jang, Hyunje Park, Feng Yang et al.
Feature Denoising Diffusion Model for Blind Image Quality Assessment
Xudong Li, Yan Zhang, Yunhang Shen et al.
Incomplete Modality Disentangled Representation for Ophthalmic Disease Grading and Diagnosis
Chengzhi Liu, Zile Huang, Zhe Chen et al.
Can Textual Gradient Work in Federated Learning?
Minghui Chen, Ruinan Jin, Wenlong Deng et al.
SapiensID: Foundation for Human Recognition
Minchul Kim, Dingqiang Ye, Yiyang Su et al.
Learning Chaos In A Linear Way
Xiaoyuan Cheng, Yi He, Yiming Yang et al.
Scene4U: Hierarchical Layered 3D Scene Reconstruction from Single Panoramic Image for Your Immerse Exploration
Zilong Huang, Jun He, Junyan Ye et al.
DataMan: Data Manager for Pre-training Large Language Models
Ru Peng, Kexin Yang, Yawen Zeng et al.
Differentially Private Steering for Large Language Model Alignment
Anmol Goel, Yaxi Hu, Iryna Gurevych et al.
VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning
Ji Soo Lee, Jongha Kim, Jeehye Na et al.
Sail into the Headwind: Alignment via Robust Rewards and Dynamic Labels against Reward Hacking
Paria Rashidinejad, Yuandong Tian
Soft Reasoning: Navigating Solution Spaces in Large Language Models through Controlled Embedding Exploration
Qinglin Zhu, Runcong Zhao, Hanqi Yan et al.
From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots
Yuxuan Wang, Ming Yang, Gang Ding et al.
HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts
Neil He, Rishabh Anand, Hiren Madhu et al.
Adversarial Generative Flow Network for Solving Vehicle Routing Problems
Ni Zhang, Jingfeng Yang, Zhiguang Cao et al.
VIKI‑R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning
Li Kang, Xiufeng Song, Heng Zhou et al.
DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding
Henry Zheng, Hao Shi, Qihang Peng et al.
T2ICount: Enhancing Cross-modal Understanding for Zero-Shot Counting
Yifei Qian, Zhongliang Guo, Bowen Deng et al.
Local-Prompt: Extensible Local Prompts for Few-Shot Out-of-Distribution Detection
Fanhu Zeng, Zhen Cheng, Fei Zhu et al.
How do Transformers Learn Implicit Reasoning?
Jiaran Ye, Zijun Yao, Zhidian Huang et al.
Focusing on Tracks for Online Multi-Object Tracking
Kyujin Shim, Kangwook Ko, YuJin Yang et al.
Synthetic Prior for Few-Shot Drivable Head Avatar Inversion
Wojciech Zielonka, Stephan J. Garbin, Alexandros Lattas et al.
DiffFNO: Diffusion Fourier Neural Operator
Xiaoyi Liu, Hao Tang
Hypergraph Vision Transformers: Images are More than Nodes, More than Edges
Joshua Fixelle
From Specificity to Generality: Revisiting Generalizable Artifacts in Detecting Face Deepfakes
Long Ma, Zhiyuan Yan, Jin Xu et al.
MergeNet: Knowledge Migration Across Heterogeneous Models, Tasks, and Modalities
Kunxi Li, Tianyu Zhan, Kairui Fu et al.
DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data
Ruiqi Wu, Xinjie wang, Liu.Liu et al.
Noise Calibration and Spatial-Frequency Interactive Network for STEM Image Enhancement
Hesong Li, Ziqi Wu, Ruiwen Shao et al.
COCONut-PanCap: Joint Panoptic Segmentation and Grounded Captions for Fine-Grained Understanding and Generation
Xueqing Deng, Linjie Yang, Qihang Yu et al.
Not all solutions are created equal: An analytical dissociation of functional and representational similarity in deep linear neural networks
Lukas Braun, Erin Grant, Andrew Saxe
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback
Zaid Khan, Elias Stengel-Eskin, Jaemin Cho et al.
Unveiling Differences in Generative Models: A Scalable Differential Clustering Approach
Jingwei Zhang, Mohammad Jalali, Cheuk Ting Li et al.
FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding
Chongjun Tu, Lin Zhang, pengtao chen et al.
Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift
Seongho Son, William Bankes, Sayak Ray Chowdhury et al.
ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL
Yang Qin, Chao Chen, Zhihang Fu et al.
V2C-CBM: Building Concept Bottlenecks with Vision-to-Concept Tokenizer
Hangzhou He, Lei Zhu, Xinliang Zhang et al.
Temporal Heterogeneous Graph Generation with Privacy, Utility, and Efficiency
Xinyu He, Dongqi Fu, Hanghang Tong et al.
Combining Cost Constrained Runtime Monitors for AI Safety
Tim Hua, James Baskerville, Henri Lemoine et al.
Sample complexity of data-driven tuning of model hyperparameters in neural networks with structured parameter-dependent dual function
Maria-Florina Balcan, Anh Nguyen, Dravyansh Sharma
Depth-Centric Dehazing and Depth-Estimation from Real-World Hazy Driving Video
Junkai Fan, Kun Wang, Zhiqiang Yan et al.
Decoding Game: On Minimax Optimality of Heuristic Text Generation Strategies
Sijin Chen, Omar Hagrass, Jason Klusowski
LOMA: Language-assisted Semantic Occupancy Network via Triplane Mamba
Yubo Cui, Zhiheng Li, Jiaqiang Wang et al.
Efficient stagewise pretraining via progressive subnetworks
Abhishek Panigrahi, Nikunj Saunshi, Kaifeng Lyu et al.
Bundle Neural Network for message diffusion on graphs
Jacob Bamberger, Federico Barbero, Xiaowen Dong et al.
Alleviate and Mining: Rethinking Unsupervised Domain Adaptation for Mitochondria Segmentation from Pseudo-Label Perspective
Yujia Chen, Rui Sun, Wangkai Li et al.
Filter or Compensate: Towards Invariant Representation from Distribution Shift for Anomaly Detection
Zining Chen, Xingshuang Luo, Weiqiu Wang et al.
CONTRA: Conformal Prediction Region via Normalizing Flow Transformation
Zhenhan FANG, Aixin Tan, Jian Huang
PokerBench: Training Large Language Models to Become Professional Poker Players
Richard Zhuang, Akshat Gupta, Richard Yang et al.
MANTRA: The Manifold Triangulations Assemblage
Rubén Ballester, Ernst Roell, Daniel Bin Schmid et al.
Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models
Ángela López-Cardona, Carlos Segura, Alexandros Karatzoglou et al.
LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation
Can Jin, Ying Li, Mingyu Zhao et al.
CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation
Matan Rusanovsky, Or Hirschorn, Shai Avidan
Multi-Granular Multimodal Clue Fusion for Meme Understanding
Li Zheng, Hao Fei, Ting Dai et al.
Assessing the Creativity of LLMs in Proposing Novel Solutions to Mathematical Problems
Junyi Ye, Jingyi Gu, Xinyun Zhao et al.
RoboPEPP: Vision-Based Robot Pose and Joint Angle Estimation through Embedding Predictive Pre-Training
Raktim Gautam Goswami, Prashanth Krishnamurthy, Yann LeCun et al.
Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning
Joey Hong, Anca Dragan, Sergey Levine
Quantum-PEFT: Ultra parameter-efficient fine-tuning
Toshiaki Koike-Akino, Francesco Tonin, Yongtao Wu et al.
LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS
Wanhua Li, Yujie Zhao, Minghan Qin et al.
TopoDiffusionNet: A Topology-aware Diffusion Model
Saumya Gupta, Dimitris Samaras, Chao Chen
Re-HOLD: Video Hand Object Interaction Reenactment via adaptive Layout-instructed Diffusion Model
Yingying Fan, Quanwei Yang, Kaisiyuan Wang et al.
Mitigating Social Bias in Large Language Models: A Multi-Objective Approach Within a Multi-Agent Framework
Zhenjie Xu, Wenqing Chen, Yi Tang et al.
Generating Physically Stable and Buildable Brick Structures from Text
Ava Pun, Kangle Deng, Ruixuan Liu et al.
DTGBrepGen: A Novel B-rep Generative Model through Decoupling Topology and Geometry
Jing Li, Yihang Fu, Falai Chen
Adaptive Few-shot Prompting for Machine Translation with Pre-trained Language Models
Lei Tang, Jinghui Qin, Wenxuan Ye et al.
MaFeRw: Query Rewriting with Multi-Aspect Feedbacks for Retrieval-Augmented Large Language Models
Yujing Wang, Hainan Zhang, Liang Pang et al.
SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent
Yandan Yang, Baoxiong Jia, Shujie Zhang et al.
MalCL: Leveraging GAN-Based Generative Replay to Combat Catastrophic Forgetting in Malware Classification
Jimin Park, AHyun Ji, Minji Park et al.
SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets
Yuhang Yang, Fengqi Liu, Yixing Lu et al.
EBBS: An Ensemble with Bi-Level Beam Search for Zero-Shot Machine Translation
Yuqiao Wen, Behzad Shayegh, Chenyang Huang et al.
Mask^2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation
Tianhao Qi, Jianlong Yuan, Wanquan Feng et al.
SmartRAG: Jointly Learn RAG-Related Tasks From the Environment Feedback
Jingsheng Gao, Linxu Li, Ke Ji et al.
HMoRA: Making LLMs More Effective with Hierarchical Mixture of LoRA Experts
Mengqi Liao, Wei Chen, Junfeng Shen et al.
DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery
Jiadong Tang, Yu Gao, Dianyi Yang et al.
Revisiting Random Walks for Learning on Graphs
Jinwoo Kim, Olga Zaghen, Ayhan Suleymanzade et al.
ROD-MLLM: Towards More Reliable Object Detection in Multimodal Large Language Models
Heng Yin, Yuqiang Ren, Ke Yan et al.
Beyond Sequence: Impact of Geometric Context for RNA Property Prediction
Junjie Xu, Artem Moskalev, Tommaso Mansi et al.
VideoOrion: Tokenizing Object Dynamics in Videos
Yicheng Feng, Yijiang Li, Wanpeng Zhang et al.
EWMoE: An Effective Model for Global Weather Forecasting with Mixture-of-Experts
Lihao Gan, Xin Man, Chenghong Zhang et al.
Near, far: Patch-ordering enhances vision foundation models' scene understanding
Valentinos Pariza, Mohammadreza Salehi, Gertjan J Burghouts et al.
GNS: Solving Plane Geometry Problems by Neural-Symbolic Reasoning with Multi-Modal LLMs
Maizhen Ning, Zihao Zhou, Qiufeng Wang et al.
Scaling Laws for Gradient Descent and Sign Descent for Linear Bigram Models under Zipf’s Law
Frederik Kunstner, Francis Bach
ProtComposer: Compositional Protein Structure Generation with 3D Ellipsoids
Hannes Stärk, Bowen Jing, Tomas Geffner et al.
As large as it gets – Studying Infinitely Large Convolutions via Neural Implicit Frequency Filters
Margret Keuper, Julia Grabinski, Janis Keuper
On the Expressiveness of Rational ReLU Neural Networks With Bounded Depth
Gennadiy Averkov, Christopher Hojny, Maximilian Merkert