
HybriDLA: Hybrid Generation for Document Layout Analysis

Chen, Yufan 1; Moured, Omar 1; Liu, Ruiping 1; Zheng, Junwei 1; Peng, Kunyu 1; Zhang, Jiaming 1; Stiefelhagen, Rainer 1
1 Institut für Anthropomatik und Robotik (IAR), Karlsruher Institut für Technologie (KIT)

Abstract:

Conventional document layout analysis (DLA) depends on empirical priors or a fixed set of learnable queries executed in a single forward pass. While sufficient for early-generation documents with a small, predetermined number of regions, this paradigm struggles with contemporary documents, which exhibit diverse element counts and increasingly complex layouts. To address these challenges, we present HybriDLA, a novel generative framework that unifies diffusion and autoregressive decoding within a single layer. The diffusion component iteratively refines bounding-box hypotheses, whereas the autoregressive component injects semantic and contextual awareness, enabling precise region prediction even in highly varied layouts. To further enhance detection quality, we design a multi-scale feature-fusion encoder that captures both fine-grained and high-level visual cues. This architecture raises performance to 83.5% mean Average Precision (mAP). Extensive experiments on the DocLayNet and M6Doc benchmarks demonstrate that HybriDLA achieves state-of-the-art performance, outperforming previous approaches.


Original publication
DOI: 10.1609/aaai.v40i4.37308
Affiliated institution(s) at KIT: Institut für Anthropomatik und Robotik (IAR)
Publication type: Proceedings paper
Publication year: 2026
Language: English
Identifier: ISSN: 2374-3468, 2159-5399
KITopen-ID: 1000192448
Published in: Proceedings of the AAAI Conference on Artificial Intelligence
Event: 40th AAAI Conference on Artificial Intelligence (2026), Singapore, Singapore, 20–27 January 2026
Edition: 4
Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
Pages: 3147–3155
Series: 40
Published online in advance: 14 March 2026
Indexed in: OpenAlex, Scopus
Sustainable Development Goals: Goal 4 – Quality Education