A Maximum-Entropy Method to Estimate Discrete Distributions from Samples Ensuring Nonzero Probabilities

Darscheid, Paul; Guthke, Anneli; Ehret, Uwe

doi:10.3390/e20080601

Darscheid, Paul ¹; Guthke, Anneli; Ehret, Uwe ¹
¹ Institut für Wasser und Gewässerentwicklung (IWG), Karlsruher Institut für Technologie (KIT)

Abstract:

When constructing discrete (binned) distributions from samples of a data set, applications exist where it is desirable to ensure that all bins of the sample distribution have nonzero probability. Examples are when the sample distribution is part of a predictive model that is required to return a response for the entire codomain, or when Kullback–Leibler divergence is used to measure the (dis-)agreement between the sample distribution and the original distribution of the variable, which is inconveniently infinite if a bin has zero probability. Several sample-based distribution estimators exist that ensure nonzero bin probability, such as adding one counter to each zero-probability bin of the sample histogram, adding a small probability to the sample pdf, smoothing methods such as kernel-density smoothing, or Bayesian approaches based on the Dirichlet and multinomial distributions. Here, we suggest and test an approach based on the Clopper–Pearson method, which makes use of the binomial distribution. Based on the sample distribution, confidence intervals for the bin-occupation probability are calculated. The mean of each confidence interval is a strictly positive estimator of the true bin-occupation probability and is convergent with increasing sample size. ...
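
The estimator outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the default 95% confidence level, and the final renormalization so that the bin probabilities sum to one are assumptions made for illustration only. The Clopper–Pearson bounds are found by bisection on the binomial tail probabilities, using the standard library only.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, lo=0.0, hi=1.0, iters=100):
    """Root of a decreasing function f on [lo, hi] with f(lo) > 0 > f(hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a binomial proportion,
    given k occurrences in n draws."""
    half = alpha / 2.0
    # Lower bound solves P(X >= k | p) = alpha/2; it is 0 for an empty bin.
    lower = 0.0 if k == 0 else _bisect(
        lambda p: half - (1.0 - binom_cdf(k - 1, n, p)))
    # Upper bound solves P(X <= k | p) = alpha/2; it is 1 if k == n.
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p) - half)
    return lower, upper


def nonzero_bin_probabilities(counts, alpha=0.05):
    """Hypothetical helper: mean of each bin's Clopper-Pearson interval,
    renormalized to sum to one (the renormalization is an assumption)."""
    n = sum(counts)
    means = [sum(clopper_pearson(k, n, alpha)) / 2.0 for k in counts]
    total = sum(means)
    return [m / total for m in means]


if __name__ == "__main__":
    # Histogram with one empty bin; every estimate is strictly positive.
    print(nonzero_bin_probabilities([7, 3, 0]))
```

Because the upper bound of the interval is strictly positive even for an empty bin, the interval mean is strictly positive as well, which is the property the abstract emphasizes. In practice the bounds could equally be taken from beta-distribution quantiles in a statistics library; bisection is used here only to keep the sketch dependency-free.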

Publisher's version (KITopen download)

DOI: 10.5445/IR/1000085954

Published on 24.09.2018

External links

Original publication
DOI: 10.3390/e20080601

Scopus
Citations: 11

Web of Science
Citations: 10

Dimensions
Citations: 15

Affiliated institution(s) at KIT	Institut für Wasser und Gewässerentwicklung (IWG); KIT-Zentrum Klima und Umwelt (ZKU)
Publication type	Journal article
Publication year	2018
Language	English
Identifier	ISSN: 1099-4300; urn:nbn:de:swb:90-859547; KITopen-ID: 1000085954
Published in	Entropy
Publisher	MDPI
Volume	20
Issue	8
Pages	Article: 601
Publication note	Funded by the KIT Publication Fund
Published online in advance on	13.08.2018
Keywords	histogram; sample; discrete distribution; empty bin; zero probability; Clopper–Pearson; maximum entropy approach
Indexed in	Scopus; Dimensions; OpenAlex; Web of Science
