KIT | KIT-Bibliothek | Impressum | Datenschutz

Experimental Data for the Paper "A Comprehensive Study of k-Portfolios of Recent SAT Solvers"

Bach, Jakob ORCID iD icon 1; Iser, Markus ORCID iD icon 2
1 Institut für Programmstrukturen und Datenorganisation (IPD), Karlsruher Institut für Technologie (KIT)
2 Institut für Informationssicherheit und Verlässlichkeit (KASTEL), Karlsruher Institut für Technologie (KIT)

Abstract:

These are the experimental data for the paper

> Bach, Jakob, Markus Iser, and Klemens Böhm. "A Comprehensive Study of k-Portfolios of Recent SAT Solvers"

published at the conference [*SAT 2022*](http://satisfiability.org/SAT22/).
You can find the paper [here](https://www.doi.org/10.4230/LIPIcs.SAT.2022.2) and the code [here](https://github.com/Jakob-Bach/Small-Portfolios).
See the `README` for details.


Zugehörige Institution(en) am KIT Institut für Informationssicherheit und Verlässlichkeit (KASTEL)
Institut für Programmstrukturen und Datenorganisation (IPD)
Publikationstyp Forschungsdaten
Publikationsdatum 30.05.2022
Erstellungsdatum 19.05.2022 - 21.05.2022
Identifikator DOI: 10.5445/IR/1000146629
KITopen-ID: 1000146629
Lizenz Creative Commons Namensnennung 4.0 International
Externe Relationen Forschungsdaten/Software
Schlagwörter propositional satisfiability, solver portfolios, runtime prediction, machine learning, integer programming
Liesmich

Experimental Data for the Paper "A Comprehensive Study of k-Portfolios of Recent SAT Solvers"

These are the experimental data for the paper

> Bach, Jakob, Markus Iser, and Klemens Böhm. "A Comprehensive Study of k-Portfolios of Recent SAT Solvers"

accepted at the conference SAT 2022.
Check our GitHub repository for the code and instructions to reproduce the experiments.
The data were obtained on a server with an AMD EPYC 7551 CPU (32 physical cores, base clock of 2.0 GHz) and 128 GB RAM.
The Python version was 3.8.10.
With this configuration, the experimental pipeline (run_experiments.py) took about 25 h.

The commit hash for the last run of the experimental pipeline (run_experiments.py) is d402353e7f.
The commit hash for the last run of the evaluation pipeline (run_evaluation.py) is 5ba0468088.
Both commits are also tagged.

In the following, we describe:

  • how to use the experimental data
  • the structure/content of each data file

Using the Data

Most of the experimental pipeline's input and output files are plain CSVs.
You can easily read in any of the CSVs with pandas if you are using Python:

import pandas as pd

results = pd.read_csv('<path>/search_results.csv')

All CSVs were written with <DataFrame>.to_csv(<path>, index=False), i.e., they follow pandas' defaults for CSVs,
so it is unnecessary to pass further parameters when reading them in with pandas.
The CSVs mostly contain plain numbers and strings;
the latter are only quoted if necessary, e.g., if they contain commas.
However, the column solvers in search_results.csv contains lists of solver names.
You can convert this column from string to proper Python lists as follows:

import ast

search_results['solvers'] = search_results['solvers'].apply(ast.literal_eval)

*.db files and corresponding *.csv files

Raw instance-feature databases and solver-runtime databases from GBD,
as well as CSV exports of them.
Outputs of the script prepare_dataset.py.

  • meta:
    Meta-data of SAT instances, e.g., in which competition(s) the instances were used.
    We only use this information to filter instances for the experimental datasets but not as features for predictions.
  • satzilla:
    Instance features obtained with the feature extractor of SATzilla 2012.
    Numeric matrix, apart from the hash column that identifies instances.
    Each row represents a SAT instance; each column represents a feature (column names are feature names).
    The value timeout represents missing values (feature extractor exceeded time or memory limits).
  • sc2020:
    Solver runtimes from the SAT Competition 2020.
    Numeric matrix, apart from the hash column that identifies instances.
    Each row represents a SAT instance (not all of them are actually from this SAT competition, so we filter instances later);
    each column represents a solver (column names are solver names).
    The values failed, unknown, and timeout represent missing values (solver did not solve instance within cut-off time).
  • sc2021:
    Solver runtimes from the SAT Competition 2021.
    Numeric matrix, apart from the hash column that identifies instances.
    Each row represents a SAT instance (not all of them are actually from this SAT competition, so we filter instances later);
    each column represents a solver (column names are solver names).
    The value timeout represents missing values (solver did not solve instance within cut-off time).

sc(2020|2021)_features.csv

Pre-processed instance-feature data for the Main Track of the SAT Competitions 2020 and 2021.
Output of the script prepare_dataset.py; input to the script run_experiments.py.
Numeric matrix, apart from the hash column that identifies instances
(we do not have any categorical features; they would need to be encoded beforehand).
Each row represents a SAT instance; each column represents a feature (column names are feature names).
Has the same number of rows as the corresponding runtime file.
The empty string represents missing values caused by the feature extractor exceeding time or memory limits;
these missing values are handled (imputed) in the prediction pipeline.

sc(2020|2021)_runtimes.csv

Pre-processed solver-runtime data from the Main Track of the SAT Competitions 2020 and 2021.
Output of the script prepare_dataset.py; input to the script run_experiments.py.
Numeric matrix, apart from the hash column that identifies instances.
Each row represents a SAT instance; each column represents a solver (column names are solver names).
Has the same number of rows as the corresponding feature file.
Missing values were replaced with the double cut-off time according to PAR-2 scoring (= 10000).

search_results.csv

Results of portfolio search, e.g., portfolios, train/test objective values, and search times.
Output of the script run_experiments.py; input to the script run_evaluation.py.

  • solvers (string, but actually a list of strings):
    List of the names of the solvers forming the portfolio.
    The solver names are column names in sc(2020|2021)_runtimes.csv.
  • train_objective (float):
    Objective value of a solution (= portfolio) to the K-Portfolio-Problem,
    using the SAT instances from the training data.
    The objective is defined as the PAR-2 score of the portfolio's virtual best solver (VBS).
  • test_objective (float):
    Objective value of the K-Portfolio-Problem for the SAT instances from the test data,
    i.e., take the solvers of the portfolio determined on the training data,
    and compute their VBS on the test instances (without running portfolio search again).
  • (train|test)_portfolio_vws (float):
    PAR-2 score of the virtual worst solver (VWS) formed from the portfolio,
    i.e., select the worst (in terms of PAR-2 score) solver from the portfolio for each instance.
    Might be used for comparison purposes; we do not use it in our evaluation.
    Bounds the objective value for portfolios with instance-specific solver selection.
  • (train|test)_portfolio_sbs (float):
    PAR-2 score of the single best solver (SBS) from the portfolio,
    i.e., the best individual solver contained in the portfolio.
    Might be used for comparison purposes; we use it in Figures 4 and 5 as a baseline.
  • (train|test)_portfolio_sws (float):
    PAR-2 score of the single worst solver (SWS) from the portfolio,
    i.e., the worst individual solver contained in the portfolio.
    Might be used for comparison purposes; we do not use it in our evaluation.
  • (train|test)_global_sws (float):
    PAR-2 score of the single worst solver (SWS) from all solvers (independent from current portfolio),
    i.e., the globally worst individual solver.
    Might be used for comparison purposes; we use it in Figures 1 and 2 for the submodularity-based upper bound.
  • search_time (float):
    Runtime (in seconds) of the portfolio search (on the particular dataset and cross-validation fold,
    with the particular portfolio-search approach).
  • search_id (int):
    Identifier denoting combinations of dataset, cross-validation fold, and portfolio-search approach.
    The experimental pipeline parallelizes these tasks.
    Each search_id might be associated with multiple portfolios;
    combining search_id and solution_id allows joining search_results and prediction_results.
  • solution_id (int):
    Identifier to distinguish between multiple portfolios found with
    a particular portfolio-search approach on a particular dataset and cross-validation fold.
    Apart from mip_search (the optimal solution), all search approaches yield multiple portfolios.
  • fold_id (int in {0, 1, 2, 3, 4}):
    Index of the cross-validation fold.
  • problem (string, 2 different values):
    Dataset name (in our experiments: SC2020 or SC2021).
  • algorithm (string, 4 different values):
    Search approach used to determine portfolios
    (in our experiments: beam_search, kbest_search, mip_search, or random_search).
  • k (int in [1, 48]):
    Desired number of solvers in the portfolio, an input parameter to portfolio search.
    The actual number of solvers in column solvers might differ,
    as beam search and k-best are only run with the maximal k for each dataset,
    but also yield all smaller portfolios (intermediate results).
  • w (int in [1, 100], but stored as float):
    Beam width if beam search was used, or number of random samples if random search was used.
    Missing value (represented as an empty string) for the other two search approaches.
    Input parameter to portfolio search.

prediction_results.csv

Results of predictions with portfolios, e.g., train/test prediction performance, train/test objective values, and feature importance.
Output of the script run_experiments.py; input to the script run_evaluation.py.

  • model (string, 2 different values):
    Name of the prediction model (in our experiments: Random Forest and XGBoost).
    Each prediction model is trained for each portfolio from the search
    (thus, prediction_results has twice the number of rows as search_results).
  • pred_time (float):
    Runtime (in seconds) for training the prediction model (for one portfolio) on the training data,
    and making predictions on training data as well as test data.
  • (train|test)_pred_mcc (float in [-1, 1]):
    Prediction performance in terms of Matthews Correlation Coefficient for predicting
    the best solver from the portfolio for each instance.
  • (train|test)_pred_objective (float):
    PAR-2 score of the (instance-specific) solver recommendations made by the prediction model.
  • imp.<feature_name> (float in [0, 1]):
    Feature importances extracted from the prediction model after training.
    Importances are normalized and should sum up to one for each row.
    Missing values (represented as empty strings) occur if no prediction model was trained
    since the prediction target only had one class, e.g.,
    portfolio had size one or always the same solver (out of multiple solvers) was best.
  • search_id (int), solution_id (int):
    Same as in search_results.csv, can be used for joining the results.

Evaluation_console_output.txt

Output of the script run_evaluation.py, manually copied from the console to a file.

Art der Forschungsdaten Dataset
Relationen in KITopen
KIT – Die Forschungsuniversität in der Helmholtz-Gemeinschaft
KITopen Landing Page