HybriDLA: Hybrid Generation for Document Layout Analysis

Chen, Yufan 1; Moured, Omar 1; Liu, Ruiping 1; Zheng, Junwei 1; Peng, Kunyu 1; Zhang, Jiaming 1; Stiefelhagen, Rainer 1
1 Institut für Anthropomatik und Robotik (IAR), Karlsruher Institut für Technologie (KIT)

Abstract:

Conventional document layout analysis (DLA) depends on empirical priors or a fixed set of learnable queries executed in a single forward pass. While sufficient for early-generation documents with a small, predetermined number of regions, this paradigm struggles with contemporary documents, which exhibit diverse element counts and increasingly complex layouts. To address the challenges posed by modern documents, we present HybriDLA, a novel generative framework that unifies diffusion and autoregressive decoding within a single layer. The diffusion component iteratively refines bounding-box hypotheses, whereas the autoregressive component injects semantic and contextual awareness, enabling precise region prediction even in highly varied layouts. To further enhance detection quality, we design a multi-scale feature-fusion encoder that captures both fine-grained and high-level visual cues. This architecture elevates performance to 83.5% mean Average Precision (mAP). Extensive experiments on the DocLayNet and M6Doc benchmarks demonstrate that HybriDLA achieves state-of-the-art performance, outperforming previous approaches.
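The abstract reports detection quality as mAP. As background, detection mAP is built on the intersection-over-union (IoU) between a predicted region and a ground-truth region. The following is a minimal sketch of that building block only, not the authors' evaluation code; the function name `box_iou` and the `(x0, y0, x1, y1)` box format are assumptions made for illustration, and the IoU thresholds used for the reported figure are not stated in this record.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x0, y0, x1, y1).

    Hypothetical helper for illustration; the box format is an assumption.
    """
    # Corners of the overlapping rectangle (may be empty).
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])

    # Clamp to zero so disjoint boxes yield no intersection area.
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs are treated as non-overlapping.
    return inter / union if union > 0 else 0.0
```

A predicted region typically counts as a correct detection when its IoU with a ground-truth region of the same class exceeds a chosen threshold; average precision is then computed per class from the ranked detections and averaged to give mAP.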


Original publication
DOI: 10.1609/aaai.v40i4.37308
Affiliated institution(s) at KIT: Institut für Anthropomatik und Robotik (IAR)
Publication type: Proceedings paper
Publication year: 2026
Language: English
Identifier: ISSN 2374-3468, 2159-5399
KITopen-ID: 1000192448
Published in: Proceedings of the AAAI Conference on Artificial Intelligence
Event: 40th AAAI Conference on Artificial Intelligence (2026), Singapore, Singapore, 20.01.2026 – 27.01.2026
Edition: 4
Publisher: Association for the Advancement of Artificial Intelligence (AAAI)
Pages: 3147–3155
Series: 40
Published online in advance: 14.03.2026
Indexed in: Scopus, OpenAlex