GitHub - ISI-MIP/isimip-qc: A command line tool for the quality control of climate impact data of the ISIMIP project.

A command line tool for the quality control of climate impact data of the ISIMIP project. It mainly covers tests of:

Setup

The application is written in Python (> 3.11) and uses only dependencies that can be installed without administrator privileges. The installation of Python (and its development packages), however, differs from operating system to operating system. Optionally, Git is needed if the application is installed directly from GitHub. The installation of Python 3 and Git for different platforms is documented here.

The tool itself can be installed via pip. Usually you want to create a virtual environment first, but this is optional. The tool also works with pipx.

setup venv on Linux/macOS/Windows WSL

python3 -m venv env
source env/bin/activate

setup venv on Windows cmd

python -m venv env
call env\Scripts\activate.bat

install from the Python Package Index (PyPI), recommended

pip install isimip-qc

update from PyPI

pip install --upgrade isimip-qc

install directly from GitHub

pip install git+https://github.com/ISI-MIP/isimip-qc

update directly from GitHub

pip install --upgrade git+https://github.com/ISI-MIP/isimip-qc

Usage

The tool has several options which can be inspected using the help option -h, --help:

usage: isimip-qc [-h] [-c] [-m] [-O] [--unchecked-path UNCHECKED_PATH]
                 [--checked-path CHECKED_PATH] [--protocol-location PROTOCOL_LOCATIONS]
                 [--log-level LOG_LEVEL] [--show-time] [--show-path] [--log-path LOG_PATH]
                 [--log-path-level LOG_PATH_LEVEL] [--include INCLUDE] [--exclude EXCLUDE] [-f]
                 [-w] [-e] [--ignore-critical] [--skip-exp] [--match-only] [-r [MINMAX]] [-nt]
                 [--summary] [--fix] [--fix-datamodel [FIX_DATAMODEL]] [--check CHECK]
                 [--force-copy-move] [-V]
                 schema_path

Check ISIMIP files for matching protocol definitions

positional arguments:
  schema_path           ISIMIP schema_path, e.g. ISIMIP3a/OutputData/water_global

options:
  -h, --help            show this help message and exit
  -c, --copy            copy checked files to CHECKED_PATH if no warnings or errors were found
  -m, --move            move checked files to CHECKED_PATH if no warnings or errors were found
  -O, --overwrite       overwrite files in CHECKED_PATH if present. Default is False.
  --unchecked-path UNCHECKED_PATH
                        base path of the unchecked files
  --checked-path CHECKED_PATH
                        base path for the checked files
  --protocol-location PROTOCOL_LOCATIONS
                        URL or file path to the protocol when different from official repository
  --log-level LOG_LEVEL
                        log level (CRITICAL, ERROR, WARN, CHECKING, INFO, or DEBUG) [default:
                        CHECKING]
  --show-time           show time in console logs
  --show-path           show path in console logs
  --log-path LOG_PATH   base path for the individual log files
  --log-path-level LOG_PATH_LEVEL
                        log level for the individual log files [default: WARN]
  --include INCLUDE     patterns of files to include. Exclude those that don't match any.
  --exclude EXCLUDE     patterns of files to exclude. Include only those that don't match any.
  -f, --first-file      only process first file found in UNCHECKED_PATH
  -w, --stop-on-warnings
                        stop execution on warnings
  -e, --stop-on-errors  stop execution on errors
  --ignore-critical     allow fixing and copy/move files with critical issues found
  --skip-exp            skip test for valid experiment combination
  --match-only          only match the file name and skip all other checks
  -r [MINMAX], --minmax [MINMAX]
                        test values for valid range (slow). MINMAX denotes the length of the
                        ordered top list of outliers
  -nt, --skip-time-span-check
                        skip check for simulated time period
  --summary             append a summary with statistics about experiments and specifiers to the
                        output
  --fix                 try to fix warnings detected on the original files
  --fix-datamodel [FIX_DATAMODEL]
                        also fix warnings on data model found using NCCOPY or CDO (slow). Choose
                        preferred tool per lower case argument.
  --check CHECK         perform only one particular check
  --force-copy-move     copy or move files despite errors
  -V, --version         show program's version number and exit

The only mandatory argument is the schema_path, which specifies the pattern and schema to use. The schema_path consists of the simulation_round, the product, and the sector, separated by slashes, e.g. ISIMIP3a/OutputData/water_global. If schema_path is the only argument given, the current working directory when calling the tool should be the same as the directory containing the files to be checked.
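
As an illustration of how the schema_path is composed, the following minimal Python sketch joins the three components and assembles an argument list using only flags that appear in the help text above. The helper names build_schema_path and build_command and the example directories are hypothetical and not part of isimip-qc; the sketch only prints the command and does not run the tool.

```python
import shlex


def build_schema_path(simulation_round: str, product: str, sector: str) -> str:
    """Join simulation_round, product and sector with slashes."""
    parts = (simulation_round, product, sector)
    for part in parts:
        # Each component must be non-empty and must not contain a slash itself.
        if not part or "/" in part:
            raise ValueError(f"invalid schema_path component: {part!r}")
    return "/".join(parts)


def build_command(schema_path, unchecked_path=None, checked_path=None, copy=False):
    """Assemble an isimip-qc argument list (hypothetical helper)."""
    args = ["isimip-qc"]
    if unchecked_path:
        args += ["--unchecked-path", unchecked_path]
    if checked_path:
        args += ["--checked-path", checked_path]
    if copy:
        args.append("--copy")
    # The positional schema_path goes last, as in the usage line.
    args.append(schema_path)
    return args


schema_path = build_schema_path("ISIMIP3a", "OutputData", "water_global")
command = build_command(schema_path, "data/unchecked", "data/checked", copy=True)
print(shlex.join(command))
```

The printed line can be pasted into a shell, or the list could be passed to subprocess.run once the tool is installed.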

The options in detail

[isimip-qc]  
unchecked_path = "data/unchecked"  
checked_path = "data/checked"  
log_level = "INFO"  
log_path = "data/log"  
log_path_level = "INFO"  
minmax = true  # for the default value  
minmax = 5     # for a custom value