Most Cited 2025 "optimization-guided neural iterations" Papers
22,274 papers found • Page 18 of 112
Conference
SIGMAN: Scaling 3D Human Gaussian Generation with Millions of Assets
Yuhang Yang, Fengqi Liu, Yixing Lu et al.
Mask^2DiT: Dual Mask-based Diffusion Transformer for Multi-Scene Long Video Generation
Tianhao Qi, Jianlong Yuan, Wanquan Feng et al.
INST-IT: Boosting Instance Understanding via Explicit Visual Prompt Instruction Tuning
Wujian Peng, Lingchen Meng, Yitong Chen et al.
DroneSplat: 3D Gaussian Splatting for Robust 3D Reconstruction from In-the-Wild Drone Imagery
Jiadong Tang, Yu Gao, Dianyi Yang et al.
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model
Longrong Yang, Dong Shen, Chaoxiang Cai et al.
SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios
Kai Li, Wendi Sang, Chang Zeng et al.
ROD-MLLM: Towards More Reliable Object Detection in Multimodal Large Language Models
Heng Yin, Yuqiang Ren, Ke Yan et al.
Why Masking Diffusion Works: Condition on the Jump Schedule for Improved Discrete Diffusion
Alan Amin, Nate Gruver, Andrew Wilson
Bayesian Optimization of Antibodies Informed by a Generative Model of Evolving Sequences
Alan Amin, Nate Gruver, Yilun Kuang et al.
CaPo: Cooperative Plan Optimization for Efficient Embodied Multi-Agent Cooperation
Jie Liu, Pan Zhou, Yingjun Du et al.
X-Fusion: Introducing New Modality to Frozen Large Language Models
Sicheng Mo, Thao Nguyen, Xun Huang et al.
MUNBa: Machine Unlearning via Nash Bargaining
Jing Wu, Mehrtash Harandi
Adaptive Learn-then-Test: Statistically Valid and Efficient Hyperparameter Selection
Matteo Zecchin, Sangwoo Park, Osvaldo Simeone
Motion-adaptive Transformer for Event-based Image Deblurring
Senyan Xu, Zhijing Sun, Mingchen Zhong et al.
The Effectiveness of Curvature-Based Rewiring and the Role of Hyperparameters in GNNs Revisited
Floriano Tori, Vincent Holst, Vincent Ginis
From Experts to a Generalist: Toward General Whole-Body Control for Humanoid Robots
Yuxuan Wang, Ming Yang, Gang Ding et al.
TG-LLaVA: Text Guided LLaVA via Learnable Latent Embeddings
Dawei Yan, Pengcheng Li, Yang Li et al.
HELM: Hyperbolic Large Language Models via Mixture-of-Curvature Experts
Neil He, Rishabh Anand, Hiren Madhu et al.
PICO: Reconstructing 3D People In Contact with Objects
Alpár Cseke, Shashank Tripathi, Sai Kumar Dwivedi et al.
Mask-Adapter: The Devil is in the Masks for Open-Vocabulary Segmentation
Yongkang Li, Tianheng Cheng, Bin Feng et al.
VideoOrion: Tokenizing Object Dynamics in Videos
Yicheng Feng, Yijiang Li, Wanpeng Zhang et al.
Tartan IMU: A Light Foundation Model for Inertial Positioning in Robotics
Shibo Zhao, Sifan Zhou, Raphael Blanchard et al.
FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding
Chongjun Tu, Lin Zhang, pengtao chen et al.
How new data permeates LLM knowledge and how to dilute it
Chen Sun, Renat Aksitov, Andrey Zhmoginov et al.
Injecting Universal Jailbreak Backdoors into LLMs in Minutes
Zhuowei Chen, qiannan zhang, Shichao Pei
From One to More: Contextual Part Latents for 3D Generation
Shaocong Dong, Lihe Ding, Xiao Chen et al.
Generating Physically Stable and Buildable Brick Structures from Text
Ava Pun, Kangle Deng, Ruixuan Liu et al.
VIKI‑R: Coordinating Embodied Multi-Agent Cooperation via Reinforcement Learning
Li Kang, Xiufeng Song, Heng Zhou et al.
How do Transformers Learn Implicit Reasoning?
Jiaran Ye, Zijun Yao, Zhidian Huang et al.
Finer-CAM: Spotting the Difference Reveals Finer Details for Visual Explanation
Ziheng Zhang, Jianyang Gu, Arpita Chowdhury et al.
Sail into the Headwind: Alignment via Robust Rewards and Dynamic Labels against Reward Hacking
Paria Rashidinejad, Yuandong Tian
DELIFT: Data Efficient Language model Instruction Fine-Tuning
Ishika Agarwal, Krishnateja Killamsetty, Lucian Popa et al.
LIFe-GoM: Generalizable Human Rendering with Learned Iterative Feedback Over Multi-Resolution Gaussians-on-Mesh
Jing Wen, Alex Schwing, Shenlong Wang
Geometry-aware RL for Manipulation of Varying Shapes and Deformable Objects
Tai Hoang, Huy Le, Philipp Becker et al.
Trace3D: Consistent Segmentation Lifting via Gaussian Instance Tracing
Hongyu Shen, Junfeng Ni, Weishuo Li et al.
NullSwap: Proactive Identity Cloaking Against Deepfake Face Swapping
Tianyi Wang, Shuaicheng Niu, Harry Cheng et al.
MammAlps: A Multi-view Video Behavior Monitoring Dataset of Wild Mammals in the Swiss Alps
Valentin Gabeff, Haozhe Qi, Brendan Flaherty et al.
VISION-XL: High Definition Video Inverse Problem Solver using Latent Image Diffusion Models
Taesung Kwon, Jong Ye
From Specificity to Generality: Revisiting Generalizable Artifacts in Detecting Face Deepfakes
Long Ma, Zhiyuan Yan, Jin Xu et al.
Multi-clue Consistency Learning to Bridge Gaps Between General and Oriented Object in Semi-supervised Detection
Chenxu Wang, Chunyan Xu, Xiang Li et al.
GaussMark: A Practical Approach for Structural Watermarking of Language Models
Adam Block, Alexander Rakhlin, Ayush Sekhari
Dehaze-RetinexGAN: Real-World Image Dehazing via Retinex-based Generative Adversarial Network
Xinran Wang, Guang Yang, Tian Ye et al.
Fast and Robust: Task Sampling with Posterior and Diversity Synergies for Adaptive Decision-Makers in Randomized Environments
Yun Qu, Cheems Wang, Yixiu Mao et al.
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
Tong Wei, Yijun Yang, Junliang Xing et al.
DIPO: Dual-State Images Controlled Articulated Object Generation Powered by Diverse Data
Ruiqi Wu, Xinjie wang, Liu.Liu et al.
RealisHuman: A Two-Stage Approach for Refining Malformed Human Parts in Generated Images
Benzhi Wang, Jingkai Zhou, Jingqi Bai et al.
Combining Cost Constrained Runtime Monitors for AI Safety
Tim Hua, James Baskerville, Henri Lemoine et al.
Modality-Specialized Synergizers for Interleaved Vision-Language Generalists
Zhiyang Xu, Minqian Liu, Ying Shen et al.
Fast and Slow Streams for Online Time Series Forecasting Without Information Leakage
Ying-yee Ava Lau, Zhiwen Shao, Dit-Yan Yeung
Learning Chaos In A Linear Way
Xiaoyuan Cheng, Yi He, Yiming Yang et al.
Do Visual Imaginations Improve Vision-and-Language Navigation Agents?
Akhil Perincherry, Jacob Krantz, Stefan Lee
PhysSplat: Efficient Physics Simulation for 3D Scenes via MLLM-Guided Gaussian Splatting
Haoyu Zhao, Hao Wang, Xingyue Zhao et al.
Data Taggants: Dataset Ownership Verification Via Harmless Targeted Data Poisoning
Wassim Bouaziz, Nicolas Usunier, El-Mahdi El-Mhamdi
ZIM: Zero-Shot Image Matting for Anything
Beomyoung Kim, Chanyong Shin, Joonhyun Jeong et al.
Temporal Heterogeneous Graph Generation with Privacy, Utility, and Efficiency
Xinyu He, Dongqi Fu, Hanghang Tong et al.
Effective and Efficient Masked Image Generation Models
Zebin You, Jingyang Ou, Xiaolu Zhang et al.
Differentially Private Steering for Large Language Model Alignment
Anmol Goel, Yaxi Hu, Iryna Gurevych et al.
MVREC: A General Few-shot Defect Classification Model Using Multi-View Region-Context
Shuai Lyu, Rongchen Zhang, Zeqi Ma et al.
AniMer: Animal Pose and Shape Estimation Using Family Aware Transformer
Jin Lyu, Tianyi Zhu, Yi Gu et al.
Autonomous LLM-Enhanced Adversarial Attack for Text-to-Motion
Honglei Miao, Fan Ma, Ruijie Quan et al.
Can Textual Gradient Work in Federated Learning?
Minghui Chen, Ruinan Jin, Wenlong Deng et al.
CapeX: Category-Agnostic Pose Estimation from Textual Point Explanation
Matan Rusanovsky, Or Hirschorn, Shai Avidan
DiC: Rethinking Conv3x3 Designs in Diffusion Models
Yuchuan Tian, Jing Han, Chengcheng Wang et al.
DataMan: Data Manager for Pre-training Large Language Models
Ru Peng, Kexin Yang, Yawen Zeng et al.
LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS
Wanhua Li, Yujie Zhao, Minghan Qin et al.
Incomplete Modality Disentangled Representation for Ophthalmic Disease Grading and Diagnosis
Chengzhi Liu, Zile Huang, Zhe Chen et al.
Bundle Neural Network for message diffusion on graphs
Jacob Bamberger, Federico Barbero, Xiaowen Dong et al.
Adversarial Generative Flow Network for Solving Vehicle Routing Problems
Ni Zhang, Jingfeng Yang, Zhiguang Cao et al.
SfM-Free 3D Gaussian Splatting via Hierarchical Training
Bo Ji, Angela Yao
HC-LLM: Historical-Constrained Large Language Models for Radiology Report Generation
Tengfei Liu, Jiapu Wang, Yongli Hu et al.
Soft Reasoning: Navigating Solution Spaces in Large Language Models through Controlled Embedding Exploration
Qinglin Zhu, Runcong Zhao, Hanqi Yan et al.
Secant Line Search for Frank-Wolfe Algorithms
Deborah Hendrych, Sebastian Pokutta, Mathieu Besançon et al.
BRiTE: Bootstrapping Reinforced Thinking Process to Enhance Language Model Reasoning
Han Zhong, Yutong Yin, Shenao Zhang et al.
Efficient Fine-Tuning and Concept Suppression for Pruned Diffusion Models
Reza Shirkavand, Peiran Yu, Shangqian Gao et al.
Local-Prompt: Extensible Local Prompts for Few-Shot Out-of-Distribution Detection
Fanhu Zeng, Zhen Cheng, Fei Zhu et al.
SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent
Yandan Yang, Baoxiong Jia, Shujie Zhang et al.
ROUTE: Robust Multitask Tuning and Collaboration for Text-to-SQL
Yang Qin, Chao Chen, Zhihang Fu et al.
Instruction-based Image Manipulation by Watching How Things Move
Mingdeng Cao, Xuaner Zhang, Yinqiang Zheng et al.
Feature Denoising Diffusion Model for Blind Image Quality Assessment
Xudong Li, Yan Zhang, Yunhang Shen et al.
DORNet: A Degradation Oriented and Regularized Network for Blind Depth Super-Resolution
Zhengxue Wang, Zhiqiang Yan, Jinshan Pan et al.
Micro-macro Wavelet-based Gaussian Splatting for 3D Reconstruction from Unconstrained Images
Yihui Li, Chengxin Lv, Hongyu Yang et al.
Efficient stagewise pretraining via progressive subnetworks
Abhishek Panigrahi, Nikunj Saunshi, Kaifeng Lyu et al.
MergeNet: Knowledge Migration Across Heterogeneous Models, Tasks, and Modalities
Kunxi Li, Tianyu Zhan, Kairui Fu et al.
Scaling Laws for Gradient Descent and Sign Descent for Linear Bigram Models under Zipf’s Law
Frederik Kunstner, Francis Bach
ForestFormer3D: A Unified Framework for End-to-End Segmentation of Forest LiDAR 3D Point Clouds
Binbin Xiang, Maciej Wielgosz, Stefano Puliti et al.
vesselFM: A Foundation Model for Universal 3D Blood Vessel Segmentation
Bastian Wittmann, Yannick Wattenberg, Tamaz Amiranashvili et al.
LoR-VP: Low-Rank Visual Prompting for Efficient Vision Model Adaptation
Can Jin, Ying Li, Mingyu Zhao et al.
Beyond Sequence: Impact of Geometric Context for RNA Property Prediction
Junjie Xu, Artem Moskalev, Tommaso Mansi et al.
Rashomon Sets for Prototypical-Part Networks: Editing Interpretable Models in Real-Time
Jon Donnelly, Zhicheng Guo, Alina Jade Barnett et al.
Self-Cross Diffusion Guidance for Text-to-Image Synthesis of Similar Subjects
Weimin Qiu, Jieke Wang, Meng Tang
AFL: A Single-Round Analytic Approach for Federated Learning with Pre-trained Models
Run He, Kai Tong, Di Fang et al.
PhysX-3D: Physical-Grounded 3D Asset Generation
Ziang Cao, Zhaoxi Chen, Liang Pan et al.
Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling
Michal Balcerak, Tamaz Amiranashvili, Antonio Terpin et al.
EdgeTAM: On-Device Track Anything Model
Chong Zhou, Chenchen Zhu, Yunyang Xiong et al.
ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention
Bencheng Liao, Xinggang Wang, Lianghui Zhu et al.
DenseGrounding: Improving Dense Language-Vision Semantics for Ego-centric 3D Visual Grounding
Henry Zheng, Hao Shi, Qihang Peng et al.
Decoding Game: On Minimax Optimality of Heuristic Text Generation Strategies
Sijin Chen, Omar Hagrass, Jason Klusowski
DataEnvGym: Data Generation Agents in Teacher Environments with Student Feedback
Zaid Khan, Elias Stengel-Eskin, Jaemin Cho et al.
VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning
Ji Soo Lee, Jongha Kim, Jeehye Na et al.
On the Expressiveness of Rational ReLU Neural Networks With Bounded Depth
Gennadiy Averkov, Christopher Hojny, Maximilian Merkert
CONTRA: Conformal Prediction Region via Normalizing Flow Transformation
Zhenhan FANG, Aixin Tan, Jian Huang
UltraFusion: Ultra High Dynamic Imaging using Exposure Fusion
Zixuan Chen, Yujin Wang, Xin Cai et al.
StateSpaceDiffuser: Bringing Long Context to Diffusion World Models
Nedko Savov, Naser Kazemi, Deheng Zhang et al.
As large as it gets – Studying Infinitely Large Convolutions via Neural Implicit Frequency Filters
Margret Keuper, Julia Grabinski, Janis Keuper
Information-Driven Design of Imaging Systems
Henry Pinkard, Leyla Kabuli, Eric Markley et al.
RAD: Region-Aware Diffusion Models for Image Inpainting
Sora Kim, Sungho Suh, Minsik Lee
Chain-of-region: Visual Language Models Need Details for Diagram Analysis
Xue Li, Yiyou Sun, Wei Cheng et al.
Show and Segment: Universal Medical Image Segmentation via In-Context Learning
Yunhe Gao, Di Liu, Zhuowei Li et al.
Efficient Multi-modal Large Language Models via Progressive Consistency Distillation
Zichen Wen, Shaobo Wang, Yufa Zhou et al.
Right Now, Wrong Then: Non-Stationary Direct Preference Optimization under Preference Drift
Seongho Son, William Bankes, Sayak Ray Chowdhury et al.
Seeing Eye to AI: Human Alignment via Gaze-Based Response Rewards for Large Language Models
Ángela López-Cardona, Carlos Segura, Alexandros Karatzoglou et al.
GeoSplatting: Towards Geometry Guided Gaussian Splatting for Physically-based Inverse Rendering
Kai Ye, Chong Gao, Guanbin Li et al.
BANet: Bilateral Aggregation Network for Mobile Stereo Matching
Gangwei Xu, Jiaxin Liu, Xianqi Wang et al.
Not all solutions are created equal: An analytical dissociation of functional and representational similarity in deep linear neural networks
Lukas Braun, Erin Grant, Andrew Saxe
Quantum-PEFT: Ultra parameter-efficient fine-tuning
Toshiaki Koike-Akino, Francesco Tonin, Yongtao Wu et al.
Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning
Joey Hong, Anca Dragan, Sergey Levine
V2C-CBM: Building Concept Bottlenecks with Vision-to-Concept Tokenizer
Hangzhou He, Lei Zhu, Xinliang Zhang et al.
Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition
Chengxiang Huang, Yake Wei, Zequn Yang et al.
DynamicVL: Benchmarking Multimodal Large Language Models for Dynamic City Understanding
Weihao Xuan, Junjue Wang, Heli Qi et al.
Orchid: Image Latent Diffusion for Joint Appearance and Geometry Generation
Akshay Krishnan, Xinchen Yan, Vincent Casser et al.
TopoDiffusionNet: A Topology-aware Diffusion Model
Saumya Gupta, Dimitris Samaras, Chao Chen
Sparse2DGS: Geometry-Prioritized Gaussian Splatting for Surface Reconstruction from Sparse Views
Jiang Wu, Rui Li, Yu Zhu et al.
SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning
Jiaqi Huang, Zunnan Xu, Jun Zhou et al.
SmartRAG: Jointly Learn RAG-Related Tasks From the Environment Feedback
Jingsheng Gao, Linxu Li, Ke Ji et al.
MANTRA: The Manifold Triangulations Assemblage
Rubén Ballester, Ernst Roell, Daniel Bin Schmid et al.
Joint Out-of-Distribution Filtering and Data Discovery Active Learning
Sebastian Schmidt, Leonard Schenk, Leo Schwinn et al.
Generative Zero-Shot Composed Image Retrieval
Lan Wang, Wei Ao, Vishnu Naresh Boddeti et al.
Near, far: Patch-ordering enhances vision foundation models' scene understanding
Valentinos Pariza, Mohammadreza Salehi, Gertjan J Burghouts et al.
Interactive Speculative Planning: Enhance Agent Efficiency through Co-design of System and User Interface
Wenyue Hua, Mengting Wan, JAGANNATH VADREVU et al.
LOMA: Language-assisted Semantic Occupancy Network via Triplane Mamba
Yubo Cui, Zhiheng Li, Jiaqiang Wang et al.
Scaling Embedding Layers in Language Models
Da Yu, Edith Cohen, Badih Ghazi et al.
SeqGrowGraph: Learning Lane Topology as a Chain of Graph Expansions
Mengwei Xie, Shuang Zeng, Xinyuan Chang et al.
SoMA: Singular Value Decomposed Minor Components Adaptation for Domain Generalizable Representation Learning
Seokju Yun, Seunghye Chae, Dongheon Lee et al.
Whole-Body Conditioned Egocentric Video Prediction
Yutong Bai, Danny Tran, Amir Bar et al.
CoopTrack: Exploring End-to-End Learning for Efficient Cooperative Sequential Perception
Jiaru Zhong, Jiahao Wang, Jiahui Xu et al.
Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification
Yang Qin, Chao Chen, Zhihang Fu et al.
Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images
Kazi Sajeed Mehrab, M. Maruf, Arka Daw et al.
REPA Works Until It Doesn’t: Early-Stopped, Holistic Alignment Supercharges Diffusion Training
Ziqiao Wang, Wangbo Zhao, Yuhao Zhou et al.
Federated Residual Low-Rank Adaption of Large Language Models
Yunlu Yan, Chun-Mei Feng, Wangmeng Zuo et al.
Depth-Centric Dehazing and Depth-Estimation from Real-World Hazy Driving Video
Junkai Fan, Kun Wang, Zhiqiang Yan et al.
Lessons and Insights from a Unifying Study of Parameter-Efficient Fine-Tuning (PEFT) in Visual Recognition
Zheda Mai, Ping Zhang, Cheng-Hao Tu et al.
Alleviate and Mining: Rethinking Unsupervised Domain Adaptation for Mitochondria Segmentation from Pseudo-Label Perspective
Yujia Chen, Rui Sun, Wangkai Li et al.
ProtComposer: Compositional Protein Structure Generation with 3D Ellipsoids
Hannes Stärk, Bowen Jing, Tomas Geffner et al.
Revisiting Random Walks for Learning on Graphs
Jinwoo Kim, Olga Zaghen, Ayhan Suleymanzade et al.
Filter or Compensate: Towards Invariant Representation from Distribution Shift for Anomaly Detection
Zining Chen, Xingshuang Luo, Weiqiu Wang et al.
Foundations of Top-$k$ Decoding for Language Models
Georgy Noarov, Soham Mallick, Tao Wang et al.
PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation
HsiaoYuan Hsu, Yuxin Peng
FlowRAM: Grounding Flow Matching Policy with Region-Aware Mamba Framework for Robotic Manipulation
Sen Wang, Le Wang, Sanping Zhou et al.
HMoRA: Making LLMs More Effective with Hierarchical Mixture of LoRA Experts
Mengqi Liao, Wei Chen, Junfeng Shen et al.
Multi-Granular Multimodal Clue Fusion for Meme Understanding
Li Zheng, Hao Fei, Ting Dai et al.
Enhancing Adversarial Transferability with Adversarial Weight Tuning
Jiahao Chen, Zhou Feng, Rui Zeng et al.
Implicit Bias of Spectral Descent and Muon on Multiclass Separable Data
Chen Fan, Mark Schmidt, Christos Thrampoulidis
PokerBench: Training Large Language Models to Become Professional Poker Players
Richard Zhuang, Akshat Gupta, Richard Yang et al.
ChatReID: Open-ended Interactive Person Retrieval via Hierarchical Progressive Tuning for Vision Language Models
Ke Niu, Haiyang Yu, Mengyang Zhao et al.
MonoSplat: Generalizable 3D Gaussian Splatting from Monocular Depth Foundation Models
Yifan Liu, Keyu Fan, Weihao Yu et al.
Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
Maximilian Beck, Korbinian Pöppel, Phillip Lippe et al.
NAVER: A Neuro-Symbolic Compositional Automaton for Visual Grounding with Explicit Logic Reasoning
Zhixi Cai, Fucai Ke, Simindokht Jahangard et al.
A Training-Free Sub-quadratic Cost Transformer Model Serving Framework with Hierarchically Pruned Attention
Heejun Lee, Geon Park, Youngwan Lee et al.
H3D-DGS: Exploring Heterogeneous 3D Motion Representation for Deformable 3D Gaussian Splatting
Bing He, Yunuo Chen, Guo Lu et al.
Trust Region Constrained Measure Transport in Path Space for Stochastic Optimal Control and Inference
Denis Blessing, Julius Berner, Lorenz Richter et al.
Mitigating Social Bias in Large Language Models: A Multi-Objective Approach Within a Multi-Agent Framework
Zhenjie Xu, Wenqing Chen, Yi Tang et al.
SynQ: Accurate Zero-shot Quantization by Synthesis-aware Fine-tuning
Minjun Kim, Jongjin Kim, U Kang
Multi-Granularity Class Prototype Topology Distillation for Class-Incremental Source-Free Unsupervised Domain Adaptation
Peihua Deng, Jiehua Zhang, Xichun Sheng et al.
MalCL: Leveraging GAN-Based Generative Replay to Combat Catastrophic Forgetting in Malware Classification
Jimin Park, AHyun Ji, Minji Park et al.
BHViT: Binarized Hybrid Vision Transformer
Tian Gao, Yu Zhang, Zhiyuan Zhang et al.
Topograph: An Efficient Graph-Based Framework for Strictly Topology Preserving Image Segmentation
Laurin Lux, Alexander H Berger, Alexander Weers et al.
EBBS: An Ensemble with Bi-Level Beam Search for Zero-Shot Machine Translation
Yuqiao Wen, Behzad Shayegh, Chenyang Huang et al.
MaFeRw: Query Rewriting with Multi-Aspect Feedbacks for Retrieval-Augmented Large Language Models
Yujing Wang, Hainan Zhang, Liang Pang et al.
Assessing the Creativity of LLMs in Proposing Novel Solutions to Mathematical Problems
Junyi Ye, Jingyi Gu, Xinyun Zhao et al.
EWMoE: An Effective Model for Global Weather Forecasting with Mixture-of-Experts
Lihao Gan, Xin Man, Chenghong Zhang et al.
SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications
Jinyang Li, Xiaolong Li, Ge Qu et al.
DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models
Wenlong Deng, Yize Zhao, Vala Vakilian et al.
GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis
Bo Liu, Ke Zou, Li-Ming Zhan et al.
SplatFormer: Point Transformer for Robust 3D Gaussian Splatting
Yutong Chen, Marko Mihajlovic, Xiyi Chen et al.
Beware of Calibration Data for Pruning Large Language Models
Yixin Ji, Yang Xiang, Juntao Li et al.
ProKeR: A Kernel Perspective on Few-Shot Adaptation of Large Vision-Language Models
Yassir Bendou, Amine Ouasfi, Vincent Gripon et al.
The Change You Want To Detect: Semantic Change Detection In Earth Observation With Hybrid Data Generationf
Yanis Benidir, Nicolas Gonthier, Clement Mallet
Temporal Chain of Thought: Long-Video Understanding by Thinking in Frames
Anurag Arnab, Ahmet Iscen, Mathilde Caron et al.
DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
Jinwei Yao, Kaiqi Chen, Kexun Zhang et al.
GNS: Solving Plane Geometry Problems by Neural-Symbolic Reasoning with Multi-Modal LLMs
Maizhen Ning, Zihao Zhou, Qiufeng Wang et al.
Can DPO Learn Diverse Human Values? A Theoretical Scaling Law
Shawn Im, Sharon Li
SimulPL: Aligning Human Preferences in Simultaneous Machine Translation
Donglei Yu, Yang Zhao, Jie Zhu et al.
Just What You Desire: Constrained Timeline Summarization with Self-Reflection for Enhanced Relevance
Muhammad Reza Qorib, Qisheng Hu, Hwee Tou Ng
Adaptive Few-shot Prompting for Machine Translation with Pre-trained Language Models
Lei Tang, Jinghui Qin, Wenxuan Ye et al.
Progressive Mixed-Precision Decoding for Efficient LLM Inference
Hao (Mark) Chen, Fuwen Tan, Alexandros Kouris et al.
DRESSing Up LLM: Efficient Stylized Question-Answering via Style Subspace Editing
Xinyu Ma, Yifeng Xu, Yang Lin et al.
Scaffold-BPE: Enhancing Byte Pair Encoding for Large Language Models with Simple and Effective Scaffold Token Removal
Haoran Lian, Yizhe Xiong, Jianwei Niu et al.
Exploring Model Editing for LLM-based Aspect-Based Sentiment Classification
Shichen Li, Zhongqing Wang, Zheyu Zhao et al.
Sparse Learning for State Space Models on Mobile
Xuan Shen, Hangyu Zheng, Yifan Gong et al.
RobuRCDet: Enhancing Robustness of Radar-Camera Fusion in Bird's Eye View for 3D Object Detection
Jingtong Yue, Zhiwei Lin, Xin Lin et al.
Boost Your Human Image Generation Model via Direct Preference Optimization
Sanghyeon Na, Yonggyu Kim, Hyunjoon Lee
Sharpness-Aware Minimization: General Analysis and Improved Rates
Dimitris Oikonomou, Nicolas Loizou
Towards Physical Understanding in Video Generation: A 3D Point Regularization Approach
Yunuo Chen, Junli Cao, Vidit Goel et al.
Activation Gradient based Poisoned Sample Detection Against Backdoor Attacks
Danni Yuan, Mingda Zhang, Shaokui Wei et al.
Interpretable Vision-Language Survival Analysis with Ordinal Inductive Bias for Computational Pathology
Pei Liu, Luping Ji, Jiaxiang Gou et al.
X-Drive: Cross-modality Consistent Multi-Sensor Data Synthesis for Driving Scenarios
Yichen Xie, Chenfeng Xu, Chensheng Peng et al.
Tree-Wasserstein Distance for High Dimensional Data with a Latent Feature Hierarchy
Ya-Wei Eileen Lin, Ronald Coifman, Gal Mishne et al.
Deeply Supervised Flow-Based Generative Models
Inkyu Shin, Chenglin Yang, Liang-Chieh Chen
g3D-LF: Generalizable 3D-Language Feature Fields for Embodied Tasks
Zihan Wang, Gim Hee Lee
Scalable Fingerprinting of Large Language Models
Anshul Nasery, Jonathan Hayase, Creston Brooks et al.