
experiment_evaluation

Seiberlich, Mervin 1; Hernandez Sosa, Gerardo 2
1 Lichttechnisches Institut (LTI), Karlsruher Institut für Technologie (KIT)
2 Institut für Mikrostrukturtechnik (IMT), Karlsruher Institut für Technologie (KIT)

Abstract (English):

Python module for the evaluation of lab experiments.

The module implements functions to import the meta-data of measurements, filters to search for subsets of them, and routines to import and plot data based on this meta-data. It works well in its original context but is currently in open alpha, since it will be restructured to be compatible with new lab environments.


Affiliated institution(s) at KIT: Institut für Mikrostrukturtechnik (IMT); Lichttechnisches Institut (LTI)
Publication type: Research data
Publication date: 23.02.2023
Creation date: 01.01.2022 - 19.02.2023
Identifier: DOI: 10.5445/IR/1000156174
KITopen ID: 1000156174
License: GPLv3 - GNU General Public License or (at your option) any later version
External relations: Research data/software
Keywords: RSE, metadata, data evaluation, python
Readme

🔬️ Experiment Evaluation

Python module for the evaluation of lab experiments.

The module implements functions to import the meta-data of measurements, filters to search for subsets of them, and routines to import and plot data based on this meta-data. It works well in its original context but is currently in open alpha, since it will be restructured to be compatible with new lab environments.

Examples of its usage in scientific works will soon be published by the author and can then be used to reference it. Feel free to use it for your own projects and to ask questions. For now, you can cite this repository as the source.

💻️ Installation

You need a working Python 3 installation on your OS. The module was written on Debian GNU/Linux, was tested on Windows and should also run on other operating systems.
It is recommended to work in a virtual environment (see the official Python documentation) or a conda installation. From bash:

python3 -m venv exp_env
source exp_env/bin/activate

Dependencies

Dependencies are the usual scientific modules like numpy, matplotlib and pandas, but also astropy. See the requirements.txt, with which you should be able to install all of them:

pip install pip -U # Update pip itself
pip install -r /path/to/requirements.txt

Alternatively, you can also install the required modules one by one from the shell. The author recommends to also install jupyter, which includes the interactive ipython:

# Example via pip
pip install jupyter

pip install numpy
pip install matplotlib
pip install scipy
pip install pandas
pip install astropy
pip install mplcursors
pip install pynufft
# pip install python-slugify  # make sure to install 'python-slugify' and not the unrelated 'slugify' package

The module itself

Inside your virtual environment there is a folder exp_env/lib/python3.../site-packages. Place the file experiment_evaluation.py inside this folder (or inside a new sub-folder holding all your personal scientific code) to make it importable.
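
If you are unsure where exactly this folder lives on your system, Python can print it for you (a small helper snippet, not part of the module):

import sysconfig
print(sysconfig.get_paths()["purelib"])  # site-packages folder of the currently active (virtual) environment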

From within your code (try it from an interactive ipython session) you should now be able to import it via:

import experiment_evaluation as ee
# or from subfolder: import my_scientific_modules.experiment_evaluation as ee

Matplotlib style

In order to use the fancy custom styles (for example for consistent-looking graphs throughout your publication) it is advised to use matplotlib styles. For the provided styles, copy the custom style sheets ("thesis_default.mplstyle" etc.) from the stylelib folder of this repository into your matplotlib library folder:
lib/python3.9/site-packages/matplotlib/mpl-data/stylelib/*.mplstyle
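
If you prefer not to touch the installed matplotlib package, matplotlib also picks up user style sheets from the stylelib sub-folder of its configuration directory; you can query that location like this (a hedged alternative to the copy step above):

import matplotlib
print(matplotlib.get_configdir())  # *.mplstyle files placed in <configdir>/stylelib become available by name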

🧑‍💻 Usage

A good way to learn its usage is to have a look at the example file. But since the module is a work in progress, we first explain some concepts.

✨ Why meta-data?

The module automates several steps of experiment evaluations. But the highlight is its capability to handle experimental meta-data. This enables the user to automatically choose and plot data with a question in mind (example: plot all EQE curves at -2 V and 173 Hz) instead of repeatedly choosing files manually. This becomes extremely useful for calculations that need more than one measurement, but also for implementing statistics.
Meta-data includes things like experimental settings (applied voltage on a diode, time of the measurement, temperature, etc.), the experimentalist and technical information (file format, manufacturer of the measurement device, etc.).
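
As an illustration (the key names are hypothetical, the actual keys depend on your import functions), the question "plot all EQE curves at -2 V and 173 Hz" then simply becomes a filter over such meta-data records:

# one measurement described by its meta-data (hypothetical keys)
meta = {"nr": 783, "measurement_type": "EQE", "v": -2, "f": 173, "experimentalist": "Mervin Seiberlich"}

# the question "all EQE curves at -2 V and 173 Hz" as a filter dictionary
mask = {"measurement_type": "EQE", "v": -2, "f": 173}
print(all(meta[key] == value for key, value in mask.items()))  # True -> this measurement is selected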

The module includes some generic functions, but to use it in your specific lab environment you might need to add experiment- and plot-specific functions.

💾️ How to save your experiment files?

In general, lab measurement files stem from different devices and export routines. So, frankly speaking, lab data is often a mess! But to use automatic evaluation tools, some sort of system to recognize the measurement type and store the meta-data is needed. In an ideal world a lab would decide on one file format for all measurements and label them systematically. To include different data types and their meta-data within one file type there exists *.asdf (Advanced Scientific Data Format, see its documentation for further insight). So if you are just starting your PhD, try to use this file format everywhere ;).
Also, to make experiments distinguishable, every experiment needs a unique identifier. So you should label every new experiment with an increasing number and the type of the experiment.

Example of useful file naming for EQE measurements: Nr783_EQE.asdf
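
As a minimal sketch of what writing and reading such a file could look like with the asdf package (the meta-data keys and values are only illustrative assumptions, not the module's conventions):

import asdf
import numpy as np

# store data together with its meta-data in a single file
tree = {
    "meta": {"nr": 783, "measurement_type": "EQE", "v": -2},  # hypothetical keys
    "wavelength": np.linspace(400, 800, 201),
    "eqe": np.zeros(201),
}
asdf.AsdfFile(tree).write_to("Nr783_EQE.asdf")

# read the meta-data back later
with asdf.open("Nr783_EQE.asdf") as af:
    print(af.tree["meta"])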

In the case of my PhD I decided to use what I found: keep the different file formats, store them in folders named after the experiment and include meta-data in the file names (bad example: EQE/Nr783_3volt_pix1.csv). This was not the best idea (so learn from what I learned :P).
To handle that mess, this module therefore also implements some regular expressions to extract meta-data from file names (ee.meta_from_filename()), but in general it is advised to store all meta-data in the file header (with the exception of the unique identifier and experiment type). This way you can store your files in whatever folder structure you like and still find them from within the script. The module then imports the meta-data from the files into a database and you can do fancy data science with your data!
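
The idea behind such a file-name parser is roughly the following (a simplified sketch for the bad-example name above, not the actual implementation of ee.meta_from_filename()):

import re

# extract sample number, voltage and pixel from a name like "Nr783_3volt_pix1.csv"
pattern = re.compile(r"Nr(?P<nr>\d+)_(?P<v>\d+)volt_pix(?P<pix>\d+)")
match = pattern.search("Nr783_3volt_pix1.csv")
if match:
    meta = {key: int(value) for key, value in match.groupdict().items()}
    print(meta)  # {'nr': 783, 'v': 3, 'pix': 1}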

📑️ Database

For calculations and filtering of datasets, the meta-data and data need to be accessible in a machine-readable form. For the time being, the module imports all meta-data into a pandas DataFrame that represents our database (for very large datasets this would possibly need to be changed). For this we have to name the root folder that includes all experiment files/folders.

Hint: If you did not follow the unique labeling/numbering for all your experiments you can still use this module by choosing a root folder that only includes the current experiment.

from pathlib import Path
measurement_root_folder = Path("/home/PhD/Data/")

We can specify some pre-filtering for the specific experiment we want to evaluate:

# make use of the '/' operator to build OS independant paths
measurement_folder = measurement_root_folder / "LaserLab" / "proximity-sensor" / "OPD-Lens" / "OPD-Lens_v2" 

# Define some pre-filter
devices = [nr for nr in range(1035, 1043)]  # Unique sample numbers of the experiment listed by list-comprehension 
explst = "Mervin Seiberlich"

Then we import the meta-data into the pandas DataFrame database via ee.list_measurements() and call it meta_table:

meta_table = ee.list_measurements(measurement_root_folder, devices, experimentalist=explst, sort_by=["measurement_type", "nr", "pix", "v"])

💡️ Advanced note:

Internally, ee.list_measurements() uses custom functions to import the experiment-specific meta-data. Have a look into the source code and search for read_meta for an example of how this works in detail. With the *.asdf file format, only the generalized import function would be needed.
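
For *.asdf files such a generalized import could look roughly like this (a sketch that assumes each file carries a "meta" dictionary as in the example above; the real read_meta functions differ per instrument):

from pathlib import Path

import asdf
import pandas as pd

def collect_meta(root_folder):
    """Collect the meta-data of all *.asdf files below root_folder into one DataFrame."""
    rows = []
    for path in Path(root_folder).rglob("*.asdf"):
        with asdf.open(path) as af:
            meta = dict(af.tree.get("meta", {}))
        meta["file"] = path  # remember where the data lives
        rows.append(meta)
    return pd.DataFrame(rows)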

Import data and meta-data

To import some measurement data for plotting, we now use the information inside meta_table together with custom import routines and Python dictionaries implementing our filters:

# Distinguish between reference and other measurements
lens = {"nr":devices[:5]}
ref = {"nr":devices[5:]}

# Select by bias and compare reference samples with lens (**dict unpacks the values to combine two or more dictionaries)
eqe_lens_0V = ee.import_eqe(meta_table, mask_dict={**lens, **{"v":0}})
eqe_ref_0V = ee.import_eqe(meta_table, mask_dict={**ref, **{"v":0}})

This yields Python lists eqe_lens_0V = [table1, table2, ... tableN] with the selected data ready for plotting (lists are maybe not smart for huge datasets and some N-dimensional object may replace them in the future). Note: The tables inside the list are astropy.QTable() objects including the data and meta-data, as well as units!

So with these few lines of code you already did some advanced data filtering and import!
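
To get a feeling for what is inside such a table, you can inspect the first entry (the column name in the comment is only an assumption; the actual columns depend on the import routine):

table = eqe_lens_0V[0]
print(table.colnames)       # column names of the QTable
print(table.meta)           # the attached meta-data dictionary
# print(table["EQE"].unit)  # hypothetical column: each column carries its physical unit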

🌡️ Physical units

The module astropy includes a submodule astropy.units. Since we deal with real world data, it is a good idea to also include units in calculations.

import astropy.units as u

# Radius of one microlens:
r = 98 * u.um
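
Units then propagate through calculations and can be converted explicitly, for example for the area of that microlens (a small sketch building on the radius above):

import numpy as np
import astropy.units as u

r = 98 * u.um       # radius of one microlens (as above)
A = np.pi * r**2    # the unit is carried along automatically
print(A.to(u.mm**2))  # convert to a more convenient unit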

📝️ Calculations

If you have to repeatedly do some advanced calculations or fits for some plots, include them as functions in the source code. An example would be ee.pink_noise().
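
Such a helper can then be reused everywhere, for example together with scipy.optimize.curve_fit (a sketch only; the model below is the generic 1/f form of pink noise and not necessarily the signature of ee.pink_noise()):

import numpy as np
from scipy.optimize import curve_fit

def pink_noise_model(f, a):
    """Generic 1/f noise spectrum with amplitude a (illustrative model)."""
    return a / f

# hypothetical measured noise spectrum
f = np.linspace(1, 1e3, 500)
s_measured = 2.0 / f + np.random.normal(0, 0.01, f.size)

popt, pcov = curve_fit(pink_noise_model, f, s_measured)
print(popt)  # fitted amplitude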

📊️ Plots

For plotting there exist many modules in Python. Due to its great power we use matplotlib. This comes at the cost of some complexity (definitely have a look at its documentation!). But it enables us, for example, to have a consistent color style, figure size and text size in large projects like a PhD thesis:

import matplotlib as mpl
import matplotlib.pyplot as plt

# We use style sheets to set things like figure size and text size,
# see https://matplotlib.org/stable/tutorials/introductory/customizing.html#composing-styles
mpl.style.use(["thesis_default", "thesis_talk"])

w, h = plt.rcParams['figure.figsize']  # get the default figure size to scale plots accordingly

In order to not reinvent the wheel over and over again, it makes sense to wrap the plotting routines for each experiment inside custom functions. For further detail, see the documentation and the recommended function signature for matplotlib specialized functions. This enables easy experiment-type-specific plotting (even with statistics) once all functions are set up:
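
The matplotlib documentation recommends helper functions that take the Axes as their first argument; a minimal sketch of such a wrapper could look like this (function and column names are hypothetical, ee.plot_eqe() below is the real thing):

def plot_spectra(ax, tables, **plot_kwargs):
    """Plot a list of QTable-like spectra onto an existing Axes (illustrative sketch)."""
    for table in tables:
        ax.plot(table["wavelength"], table["EQE"], **plot_kwargs)  # hypothetical column names
    ax.set_xlabel("Wavelength (nm)")
    ax.set_ylabel("EQE (%)")
    return ax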

# %% plot eqe statistics
fig, ax = plt.subplots(1,1, figsize=(w, h), layout="constrained")

ee.plot_eqe(ax, eqe_lens_0V, statistics=True, color="tab:green", plot_type="EQE", marker=True, ncol=2)
ee.plot_eqe(ax, eqe_ref_0V, statistics=True, color="tab:blue", plot_type="EQE", marker=True, ncol=2)
ax.set_ylim(0,60)
ax.legend(["Lens", r"$1\sigma - Lens$", "Reference", r"$1\sigma - reference$"], loc="lower left", ncol=2)

Only 5 lines of code! For further details see the example file!

📖 FAQ

  1. How do I get further information about the functions of the module?
    Use the doc-string! In jupyter-lab or ipython type for example ee.plot_eqe? to read more about the custom plot function for EQE measurements.

  2. How do I exclude some specific measurements (outliers, false measurements)?
    Import functions now implement an exclude argument, a list of dictionaries, to exclude specific data.

    data_dark_pcbm_ref = ee.import_iV(meta_table, A=0.01*u.cm**2, mask_dict={"nr":PCBM, **pix}, exclude=[{"c":[25,150]}, {"solventOnly":True}])

    Sometimes it is helpful to only append single measurements by the += operator though. Here is an example:

data_ldr_0V = ee.import_ldr(meta_table, mask_dict={"nr":[1035, 1036, 1037, 1038, 1039, 1040, 1041], "v":0, "custompower":[0, 9.5]})
data_ldr_0V += ee.import_ldr(meta_table, mask_dict={"nr":1042, "pix":1, "v":0, "custompower":[0, 9.5]})

👥️ Contributors and acknowledgment

Many questions were answered by searching through forums, and the author is very thankful for the culture of openness in the FOSS community.

A warm thank you goes to Jyh-Miin Lin who provided a helpful example and explanations on how to use the excellent pynufft package for non-uniform FFT.

💌️ License


Copyright (C) 2022-2023 Mervin Seiberlich

Author - Mervin Seiberlich

This file is part of experiment_evaluation.

experiment_evaluation is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

experiment_evaluation is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with experiment_evaluation. If not, see <https://www.gnu.org/licenses/>.

Type of research data: Dataset