
Accelerating neural network training with distributed asynchronous and selective optimization (DASO)

Coquelin, Daniel 1; Debus, Charlotte 1; Götz, Markus 1; Lehr, Fabrice von der; Kahn, James 1; Siggel, Martin; Streit, Achim 1
1 Karlsruher Institut für Technologie (KIT)

Abstract:

With increasing data and model complexities, the time required to train neural networks has become prohibitively large. To address the exponential rise in training time, users are turning to data parallel neural networks (DPNN) and large-scale distributed resources on computer clusters. Current DPNN approaches implement the network parameter updates by synchronizing and averaging gradients across all processes with blocking communication operations after each forward-backward pass. This synchronization is the central algorithmic bottleneck. We introduce the distributed asynchronous and selective optimization (DASO) method, which leverages multi-GPU compute node architectures to accelerate network training while maintaining accuracy. DASO uses a hierarchical and asynchronous communication scheme composed of node-local and global networks while adjusting the global synchronization rate during the learning process. We show that DASO yields a reduction in training time of up to 34% on classical and state-of-the-art networks compared with current optimized data parallel training methods.
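The two-level idea described in the abstract (cheap synchronization inside a node after every step, more expensive global synchronization only every few steps, with that interval changing over training) can be illustrated with a toy simulation. This is a minimal sketch under loudly stated assumptions, not the authors' DASO implementation: each "GPU" holds a single scalar parameter, synchronization is plain synchronous parameter averaging in the style of local SGD (DASO's global step is asynchronous and selective, which is not modeled here), and the names `node_local_sync`, `global_sync`, `train` and the `sync_interval` schedule are hypothetical.

```python
from typing import Callable, List

# Hypothetical toy layout: params[node][gpu] is one scalar "model" per GPU.
Params = List[List[float]]


def average(values: List[float]) -> float:
    return sum(values) / len(values)


def node_local_sync(params: Params) -> Params:
    # Average only within each node (stands in for the fast intra-node link).
    return [[average(node)] * len(node) for node in params]


def global_sync(params: Params) -> Params:
    # Average across every GPU on every node (stands in for the inter-node network).
    mean = average([p for node in params for p in node])
    return [[mean] * len(node) for node in params]


def train(
    params: Params,
    grad: Callable[[int, int, float], float],
    steps: int,
    lr: float,
    sync_interval: Callable[[int], int],
) -> Params:
    """Toy hierarchical SGD: local sync every step, global sync every
    sync_interval(step) steps. The schedule is an assumption, not DASO's rule."""
    since_global = 0
    for step in range(steps):
        # Each GPU takes an SGD step on its own gradient grad(node, gpu, value).
        params = [
            [p - lr * grad(n, g, p) for g, p in enumerate(node)]
            for n, node in enumerate(params)
        ]
        # Cheap node-local synchronization after every forward-backward pass.
        params = node_local_sync(params)
        # Expensive global synchronization only when the current interval elapses.
        since_global += 1
        if since_global >= sync_interval(step):
            params = global_sync(params)
            since_global = 0
    return params


if __name__ == "__main__":
    # Two nodes with two GPUs each; node n sees the loss 0.5 * (x - targets[n])**2.
    targets = [0.0, 4.0]
    start = [[0.0, 0.0], [0.0, 0.0]]
    grad = lambda n, g, p: p - targets[n]
    # Hypothetical schedule: sync globally every step early on, then every 4 steps.
    print(train(start, grad, 50, 0.1, lambda s: 1 if s < 10 else 4))
```

With `sync_interval` returning 1, every GPU converges toward 2.0, the minimizer of the average loss; with an interval too large to ever trigger, each node drifts to its own target (0.0 and 4.0). That contrast is the trade-off a variable global synchronization rate has to balance.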


Publisher's version
DOI: 10.5445/IR/1000143221
Published on 22.02.2022
Original publication
DOI: 10.1186/s40537-021-00556-1
Citations: Scopus 3; Web of Science 4; Dimensions 4
Associated institution(s) at KIT: Scientific Computing Center (SCC)
Universität Karlsruhe (TH) – Central Facilities
Publication type: Journal article
Publication month/year: 02.2022
Language: English
Identifier: ISSN 2196-1115
KITopen-ID: 1000143221
HGF program: 46.21.04 (POF IV, LK 01) HAICU
Further HGF programs: 46.21.02 (POF IV, LK 01) Cross-Domain ATMLs and Research Groups
Published in: Journal of Big Data
Publisher: SpringerOpen
Volume: 9
Issue: 1
Pages: 14
Note on publication: Funded by the KIT publication fund
Indexed in: Scopus, Web of Science, Dimensions