The Inter-Sectoral Impact Model Intercomparison Project

Preparing simulation files

Submission of simulations

Model output versioning

Note for ISIMIP3 round:

Note for water regional models:

ISIMIP Quality Checking Tool

We developed a quality checking tool that will allow us to test your newly generated files against the definitions, patterns and schemas from our machine-readable protocols for ISIMIP2 and ISIMIP3.

The tool runs on Windows, macOS and Linux machines with Python>=3.12 installed. You can find installation and usage instructions on its GitHub page: https://github.com/ISI-MIP/isimip-qc

Applying the tool to your own files before submitting them to DKRZ could lower the chance of time-consuming email conversations about inconsistencies in the submitted NetCDF files. However, if you are not able to use this tool, our data management team will of course proceed as before.

The tool primarily checks that the NetCDF headers are correct, including the dimension variables, data variables, requested global attributes, and number of time steps, so that they match the specifications given in the file name. The tool cannot correct incorrect variable data types or manipulate the overall data structure or data values.

For the ISIMIP3 round, this tool is capable of testing whether all values of a variable are within the valid-value ranges defined in the machine-readable protocol on GitHub.
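
To illustrate what such a range check amounts to, here is a minimal Python sketch. It is not the implementation used by the tool; the function name and the temperature range are hypothetical, and the fill value of 1e20 is taken from the example headers further down this page.

```python
import math

FILL_VALUE = 1e20  # fill value used in the example headers on this page


def values_out_of_range(values, valid_min, valid_max, fill_value=FILL_VALUE):
    """Return the values outside [valid_min, valid_max], skipping fill values."""
    bad = []
    for value in values:
        # Fill values mark missing data and are not range-checked.
        if math.isclose(value, fill_value, rel_tol=1e-6):
            continue
        if not (valid_min <= value <= valid_max):
            bad.append(value)
    return bad


# Hypothetical valid range for a temperature variable in K.
print(values_out_of_range([250.0, 1e20, 400.0], valid_min=180.0, valid_max=340.0))
```

The actual valid ranges are those defined per variable in the machine-readable protocol.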

The tool has already been tested on a substantial part of ISIMIP output data, but if you experience crashes or unexpected behavior, please let us know by filing an issue on the GitHub page or writing an email to isimip-data@pik-potsdam.de.

A demo script for using the tool is here.

Quality checks on your simulation data

Our data management team runs a series of checks that identify severe and fixable errors, and corrects files with fixable (non-severe) errors. This is the summary of quality checks performed:

The QC process produces a report of the checks and fixes performed. Modelers can consult this report (and the log files) in the DKRZ folder /work/bb0820/ISIMIP/[SIMULATION-ROUND]/UploadArea/[SECTOR]/[MODEL]/_qc-logs/. Non-severe issues will be fixed by the data managers, but all others will need your assistance. Please check this folder on a regular basis and get in touch with our data management team at isimip-data@pik-potsdam.de to discuss fixes to the files.

Files that successfully pass the Quality Checks will appear in the Output folder: /work/bb0820/ISIMIP/[SIMULATION-ROUND]/OutputData. Such files will be available to all ISIMIP participants.

Working with NetCDF files

Files should be provided in compressed NetCDF format, a self-describing, machine-independent data format that supports the creation, access, and sharing of array-oriented scientific data. It can be read, written and processed, for example, by:

General formatting requirements

In this section you will find the general formatting specifications that your submitted files should follow, in terms of: File naming, Format, Grid, Simulation periods, Global attributes, Variables and dimensions, Time axis, Temporal resolution

Some formatting issues, such as wrong chunk sizes, a wrong NetCDF format, an inverted grid, or wrong variable and dimension names, have easy solutions; please check the easy fixes within each subsection. For further instructions, please take a look at the NetCDF utilities mentioned above.

Standard NetCDF header

A proper NetCDF header for daily global data, shown with ncdump -h FILE, should look like this:

dimensions:
    lon = 720 ;
    lat = 360 ;
    time = UNLIMITED ;

variables:
double lon(lon) ;
    lon:standard_name = "longitude" ;
    lon:long_name = "Longitude" ;
    lon:units = "degrees_east" ;
    lon:axis = "X" ;

double lat(lat) ;
    lat:standard_name = "latitude" ;
    lat:long_name = "Latitude" ;
    lat:units = "degrees_north" ;
    lat:axis = "Y" ;

double time(time) ;
    time:standard_name = "time" ;
    time:long_name = "Time" ;
    time:units = "days since 1661-01-01 00:00:00" ;
    time:calendar = "proleptic_gregorian" ;
    time:axis = "T" ;

float tas(time, lat, lon) ;
    tas:_FillValue = 1.e+20f ;
    tas:missing_value = 1.e+20f ;
    tas:units = "K" ;
    tas:standard_name = "air_temperature" ;
    tas:long_name = "Near-Surface Air Temperature" ;

// global attributes:
    :contact = "ISIMIP Coordination Team info@isimip.org" ;
    :institution = "Potsdam-Institute for Climate Impact Research (PIK)" ;
    :comment = "Data prepared for ISIMIP2b" ;

File naming

Within every protocol there is a section dedicated to the conventions on file naming, which applies to all sectors.

File names consist of a series of identifiers, separated by underscores. Identifiers depend on the simulation round, temporal resolution and may be dependent on the sector (see details below).

In general, file names should follow this convention:

_<gcm/observations>__­_______.nc4

Here you'll find example file names for the biomes and global water sector.
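
As a rough illustration of the underscore-separated structure described above, the following Python sketch joins a list of identifiers into a file name. The function name `build_file_name` and all identifier values are hypothetical; the authoritative list and order of identifiers is the one given in the protocol of each simulation round.

```python
def build_file_name(identifiers, extension=".nc4"):
    """Join identifiers with underscores and append the file extension.

    Hypothetical helper: which identifiers are required, and in which
    order, is defined by the protocol of the simulation round.
    """
    cleaned = []
    for identifier in identifiers:
        text = str(identifier).strip()
        if not text:
            raise ValueError("empty identifier")
        # Underscores separate identifiers, so none may contain one.
        if "_" in text:
            raise ValueError(f"identifier {text!r} contains an underscore")
        cleaned.append(text)
    return "_".join(cleaned) + extension


# Made-up identifier values, for illustration only.
print(build_file_name(["mymodel", "mygcm", "tas", "global", "daily", 1991, 2000]))
```

Rejecting identifiers that contain an underscore keeps the file name unambiguous when it is later split back into its parts.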

Format

You can check a file's format with the command "cdo showformat FILE".

Easy fix of NetCDF format

nccopy -k4 -d5 IFILE OFILE

Grid

Easy fix of an inverted grid, i.e. reversing the latitude index

cdo -s invertlat IFILE OFILE

Variables and dimensions

Please note that regional data, multidimensional data, variables with fixed levels (depth layers) and variables with varying levels have additional specifications.

| Dimension/Coordinate variable name | standard_name | long_name | unit | axis |
| --- | --- | --- | --- | --- |
| lon | longitude | Longitude | degrees_east | X |
| lat | latitude | Latitude | degrees_north | Y |
| time | time | Time | days since [reference date] | T |
| depth | depth_below_surface | Depth of Vertical Layer Center Below Surface | m | Z |
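
The attributes in the table above can be checked programmatically once they have been read from a file. The sketch below assumes the attributes of each coordinate variable are already available as a plain dictionary (for example, read with a NetCDF library of your choice); the function name `check_coordinate` is hypothetical, and this is not how the ISIMIP quality checking tool is implemented.

```python
# Expected attributes, copied from the table above. The time units depend
# on the reference date of the simulation round, so only the prefix is checked.
EXPECTED = {
    "lon": {"standard_name": "longitude", "long_name": "Longitude",
            "units": "degrees_east", "axis": "X"},
    "lat": {"standard_name": "latitude", "long_name": "Latitude",
            "units": "degrees_north", "axis": "Y"},
    "time": {"standard_name": "time", "long_name": "Time", "axis": "T"},
    "depth": {"standard_name": "depth_below_surface",
              "long_name": "Depth of Vertical Layer Center Below Surface",
              "units": "m", "axis": "Z"},
}


def check_coordinate(name, attrs):
    """Return a list of problems found in the attributes of one coordinate."""
    problems = []
    for key, expected in EXPECTED.get(name, {}).items():
        found = attrs.get(key)
        if found != expected:
            problems.append(f"{name}:{key} is {found!r}, expected {expected!r}")
    if name == "time" and not str(attrs.get("units", "")).startswith("days since "):
        problems.append("time:units should start with 'days since '")
    return problems


# Example: standard_name and long_name accidentally swapped.
for problem in check_coordinate("lat", {"standard_name": "Latitude",
                                        "long_name": "latitude",
                                        "units": "degrees_north",
                                        "axis": "Y"}):
    print(problem)
```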

Easy fix of output variable name:

This implies changing the name of the output variable from VAROLD to VARNEW:

ncrename -O -h -v VAROLD,VARNEW IFILE

Easy fix of dimension name:

This implies changing both the name of the dimension (from DIMOLD to DIMNEW) and coordinate variable (from DIMNAMEOLD to DIMNAMENEW):

ncrename -O -h -d DIMOLD,DIMNEW -v DIMNAMEOLD,DIMNAMENEW IFILE

Time axis

| Simulation round | Reference date |
| --- | --- |
| ISIMIP2a | 1901-01-01 00:00:00 |
| ISIMIP2b | 1661-01-01 00:00:00 |
| ISIMIP3a | 1901-01-01 00:00:00 |
| ISIMIP3b | 1601-01-01 00:00:00 |
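
The following Python sketch shows how values on a daily time axis relate to these reference dates. It relies on the fact that Python's `datetime.date` uses the proleptic Gregorian calendar, the same calendar that appears in the example headers on this page; the function names are hypothetical.

```python
from datetime import date

# Reference dates from the table above.
REFERENCE_DATES = {
    "ISIMIP2a": date(1901, 1, 1),
    "ISIMIP2b": date(1661, 1, 1),
    "ISIMIP3a": date(1901, 1, 1),
    "ISIMIP3b": date(1601, 1, 1),
}


def time_units(simulation_round):
    """Build the 'units' attribute of the time variable for a round."""
    reference = REFERENCE_DATES[simulation_round]
    return f"days since {reference.isoformat()} 00:00:00"


def days_since_reference(simulation_round, day):
    """Number of days between the reference date of a round and a given day."""
    return (day - REFERENCE_DATES[simulation_round]).days


print(time_units("ISIMIP2b"))
print(days_since_reference("ISIMIP2b", date(1861, 1, 1)))
```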

Temporal resolution

| File name specifier | Description |
| --- | --- |
| "daily" | Daily time resolution. Files holding global daily data should cover 10 years, starting in the second year of a decade and ending in the first year of the next decade (e.g. 1991-2000). If the simulation period starts after the second year of the decade or ends before the first year of the new decade, the start or end year of the simulation period should be used as the start or end year of the file, respectively. Non-global daily data should be submitted for the entire simulation period in single files per variable. |
| "monthly", "annual" or "decadal" | Monthly, annual, or decadal time resolution. Output should be reported in one single file per simulation period. Simulations should be reported on the first day of the month/year/decade (e.g. 01-01-1861). |
| "30year-mean" | Variables reported with a time resolution of 30-year averages (e.g. biodiversity sector). Output should be reported in one single file per simulation period. |
| "5-year" | Variables with a time resolution of 5-year periods (e.g. health). Output should be reported in one single file per simulation period. |
| "growing-season" | Variables reported per growing season (e.g. agriculture). Output should be reported in one single file per simulation period. The time dimension is replaced by a unitless coordinate variable with integer values, or counter, named ‘growing-season’, indicating the number of growing seasons since the starting year of the period. |

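The rule for splitting global daily data into files, described in the table above, can be sketched in Python as follows. The function name `daily_file_periods` is hypothetical; it only computes the start and end years of each file for a given simulation period.

```python
def daily_file_periods(start_year, end_year):
    """Split a simulation period into the year ranges of global daily files.

    Each file ends in a year divisible by 10 (e.g. 1991-2000), except
    that the first and last file are clipped to the simulation period.
    """
    periods = []
    year = start_year
    while year <= end_year:
        # Last year of the 10-year block that contains `year`.
        block_end = ((year + 9) // 10) * 10
        periods.append((year, min(block_end, end_year)))
        year = block_end + 1
    return periods


for first, last in daily_file_periods(2006, 2099):
    print(f"{first}-{last}")
```

For a period of 2006-2099, the first file covers 2006-2010 and the last one 2091-2099, with full 10-year files in between.
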
Simulation periods

Sim. round / Period subfolder pre-industrial spin-up (not reported) historical future future_extended
ISIMIP2a 1861-2005
ISIMIP2a extended 1861-1900 1901-2016/2018
ISIMIP2b 1661-1860 1861-2005 2006-2099 2100-2299
ISIMIP3a 1860-1900 1901-2018
ISIMIP3b 1601-1849 1850-2014 2015-2100

Note on simulation periods:

Some input data for ISIMIP3a has been taken from ISIMIP2a and is in the process of being extended. Please contact us if you have any doubts.

Chunk sizes

NetCDF4 internally chunks the data into subsets. Usually one chunk is defined by a record, i.e. a combination of one horizontal field at one time step and one vertical layer. Chunking the data differently makes operations on the data considerably more time-consuming. More info here.

You can check the chunk sizes with the command "ncdump -hs FILE".

Easy fix of chunk sizes

If your file does not have correct chunk sizes, try rewriting the data with:

2d: nccopy -k4 -d5 -c "time/1,lat/360,lon/720" IFILE OFILE

3d: nccopy -k4 -d5 -c "time/1,depth/10,lat/360,lon/720" IFILE OFILE

or

3d: nccopy -k4 -d5 -c "time/1,depth/1,lat/360,lon/720" IFILE OFILE

In some cases, when all dimensions are set as contiguous, the commands above might not work. In those cases, try:

cdo -f nc4c -z zip -copy IFILE OFILE
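
If you rewrite many files, the nccopy calls shown above can be assembled from a script. The Python sketch below only builds the command; the function name `nccopy_command` and the file names are hypothetical, and running the result (e.g. with `subprocess.run`) requires nccopy to be installed.

```python
import shlex


def nccopy_command(chunks, ifile, ofile):
    """Build an nccopy call with the same options as the commands above.

    `chunks` is an ordered list of (dimension name, chunk length) pairs.
    """
    spec = ",".join(f"{name}/{length}" for name, length in chunks)
    return ["nccopy", "-k4", "-d5", "-c", spec, ifile, ofile]


# 2d example: one chunk per time step covering the full 720 x 360 grid.
command = nccopy_command([("time", 1), ("lat", 360), ("lon", 720)],
                         "input.nc", "output.nc4")
print(shlex.join(command))
```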

Global attributes

Requirements for regional data

Requirements for multidimensional data

In some sectors, the same simulation must be reported for several locations or disaggregated into different cohorts. In such cases, please explain this in the online model documentation and contact the ISIMIP coordination team in case of questions.

Requirements for variables with fixed levels

For variables with fixed levels (e.g. layers whose depths do not change over time or space), we require the following:

Specific uses:

Example NetCDF header for variables with globally and temporally fixed levels (except lakes sector)

The following is a proper NetCDF header, printed with "ncdump -h FILE", for a variable with fixed depth layers.

Lines with a trailing triple asterisk indicate the optional usage of layer boundaries.

dimensions:
    lon = 720 ;
    lat = 360 ;
    time = UNLIMITED ; // (2400 currently)
    depth = 5 ;
    bnds = 2 ;

variables:
double lon(lon) ;
    lon:long_name = "Longitude" ;
    lon:units = "degrees_east" ;
    lon:standard_name = "longitude" ;
    lon:axis = "X" ;

double lat(lat) ;
    lat:long_name = "Latitude" ;
    lat:units = "degrees_north" ;
    lat:standard_name = "latitude" ;
    lat:axis = "Y" ;

double time(time) ;
    time:units = "days since 1661-01-01" ;
    time:calendar = "proleptic_gregorian" ;
    time:standard_name = "time" ;
    time:long_name = "Time" ;
    time:axis = "T" ;

float depth(depth) ;
    depth:units = "m" ;
    depth:bounds = "depth_bnds" ;
    depth:standard_name = "depth_below_surface" ;
    depth:long_name = "Depth of Vertical Layer Center Below Surface" ;
    depth:positive = "down" ;
    depth:axis = "Z" ;

float depth_bnds(depth, bnds) ;
    depth_bnds:units = "m" ;

float soilmoist(time, depth, lat, lon) ;
    soilmoist:standard_name = "soilmoist" ;
    soilmoist:long_name = "Total Soil Moisture Content" ;
    soilmoist:units = "kg m-2" ;
    soilmoist:_FillValue = 1.e+20f ;
    soilmoist:missing_value = 1.e+20f ;

// global attributes:
    :contact = "Name email@place.com" ;
    :institution = "Institution of Affiliation (ACRONYM)" ;
    :comment = "Data prepared for ISIMIP2b" ;

Example NetCDF header for variables with globally and temporally fixed levels (lakes sector only)

dimensions:
    lon = 720 ;
    lat = 360 ;
    time = UNLIMITED ;
    levlak = 20 ;
    bnds = 2 ;

variables:
double lon(lon) ;
    lon:long_name = "Longitude" ;
    lon:units = "degrees_east" ;
    lon:standard_name = "longitude" ;
    lon:axis = "X" ;

double lat(lat) ;
    lat:long_name = "Latitude" ;
    lat:units = "degrees_north" ;
    lat:standard_name = "latitude" ;
    lat:axis = "Y" ;

double time(time) ;
    time:units = "days since 1661-01-01" ;
    time:calendar = "proleptic_gregorian" ;
    time:standard_name = "time" ;
    time:long_name = "Time" ;
    time:axis = "T" ;

float levlak(levlak) ;
    levlak:long_name = "Vertical Water Layer Index" ;
    levlak:standard_name = "water_layer" ;
    levlak:units = "-" ;
    levlak:positive = "down" ;
    levlak:axis = "Z" ;

float bnds(bnds) ;
    bnds:positive = "down" ;

float depth_bnds(time, bnds, levlak, lat, lon) ;
    depth_bnds:standard_name = "depth_bounds" ;
    depth_bnds:long_name = "Depth of Layer's Top and Bottom Below Surface" ;
    depth_bnds:units = "1" ;
    depth_bnds:positive = "down" ;
    depth_bnds:comment = "bnds=0 for the top of the layer, and bnds=1 for the bottom of the layer" ;

float depth(levlak) ;
    depth:standard_name = "depth_below_surface" ;
    depth:long_name = "Depth of Vertical Layer Center Below Surface" ;
    depth:units = "m" ;
    depth:positive = "down" ;
    depth:axis = "Z" ;
    depth:bounds = "depth_bnds" ;

float watertemp(time, levlak, lat, lon) ;
    watertemp:standard_name = "watertemp" ;
    watertemp:long_name = "Temperature of Lake Water" ;
    watertemp:units = "K" ;
    watertemp:_FillValue = 1.e+20f ;
    watertemp:missing_value = 1.e+20f ;

// global attributes:
    :contact = "Name email@place.com" ;
    :institution = "Institution of Affiliation (ACRONYM)" ;
    :comment = "Data prepared for ISIMIP3b" ;

Requirements for variables with varying vertical layers

For variables with a fixed number of vertical layers that vary over time and/or over space (e.g. layers that can get deeper or shallower over time or have different depths at different locations), we request the following additional attributes:

If you want to introduce lower and upper boundaries to every level, you should also introduce a boundaries dimension (depth_bnds) and an index (bnds). In this case the following applies:

Specific uses: For variables where depth of layers varies over time, add global attribute time_varying_layer_depth and use label "true" or "false" depending on the case. For variables where depth of layers varies per grid cell, add global attribute location_varying_layer_depth and use label "true" or "false" depending on the case.
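
As an illustration of how the two global attributes could be derived from the layer depths, here is a minimal Python sketch. It assumes the depths are available as nested lists indexed as depth[time][level][lat][lon] and contain no fill values; the function name `layer_depth_flags` is hypothetical.

```python
def layer_depth_flags(depth):
    """Derive the two global attributes from depth[time][level][lat][lon]."""
    first_step = depth[0]
    # Depths vary over time if any time step differs from the first one.
    time_varying = any(step != first_step for step in depth)
    # Depths vary over space if, within any level, a cell differs from
    # the first cell of that level.
    location_varying = any(
        cell != level[0][0]
        for step in depth
        for level in step
        for row in level
        for cell in row
    )
    return {
        "time_varying_layer_depth": "true" if time_varying else "false",
        "location_varying_layer_depth": "true" if location_varying else "false",
    }


# Two time steps, two levels, a 1 x 2 grid; depths change with time only.
example = [
    [[[1.0, 1.0]], [[3.0, 3.0]]],
    [[[1.5, 1.5]], [[3.5, 3.5]]],
]
print(layer_depth_flags(example))
```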

Example NetCDF header for variables with levels varying over time and space

Note: The level index "levlak" is used in the lakes sector only. "depth" can depend either on space only or on both time and space, depending on the model's capabilities:

Lines with a trailing triple asterisk indicate the optional usage of layer boundaries.

dimensions:
    time = UNLIMITED ;
    bnds = 2 ;
    levlak = 13 ;
    lat = 360 ;
    lon = 720 ;

variables:
double time(time) ;
    time:standard_name = "time" ;
    time:long_name = "Time" ;
    time:units = "days since 1661-01-01 00:00:00" ;
    time:calendar = "proleptic_gregorian" ;
    time:axis = "T" ;

double lon(lon) ;
    lon:standard_name = "longitude" ;
    lon:long_name = "Longitude" ;
    lon:units = "degrees_east" ;
    lon:axis = "X" ;

double lat(lat) ;
    lat:standard_name = "latitude" ;
    lat:long_name = "Latitude" ;
    lat:units = "degrees_north" ;
    lat:axis = "Y" ;

float levlak(levlak) ;
    levlak:standard_name = "water_layer" ;
    levlak:long_name = "Vertical Water Layer Index" ;
    levlak:units = "-" ;
    levlak:axis = "Z" ;
    levlak:positive = "down" ;

float bnds(bnds) ;
    bnds:positive = "down" ;

float depth_bnds(time, bnds, levlak, lat, lon) ;
    depth_bnds:standard_name = "depth_bounds" ;
    depth_bnds:long_name = "Depth of Layer's Top and Bottom Below Surface" ;
    depth_bnds:units = "1" ;
    depth_bnds:positive = "down" ;
    depth_bnds:comment = "bnds=0 for the top of the layer, and bnds=1 for the bottom of the layer" ;

float depth(time, levlak, lat, lon) ;
    depth:standard_name = "depth_below_surface" ;
    depth:long_name = "Depth of Vertical Layer Center Below Surface" ;
    depth:units = "m" ;
    depth:axis = "Z" ;
    depth:missing_value = 1.e+20f ;
    depth:_FillValue = 1.e+20f ;
    depth:positive = "down" ;
    depth:bounds = "depth_bnds" ;

float soilmoist(time, levlak, lat, lon) ;
    soilmoist:standard_name = "soilmoist" ;
    soilmoist:long_name = "Total Soil Moisture Content" ;
    soilmoist:units = "kg m-2" ;
    soilmoist:missing_value = 1.e+20f ;
    soilmoist:_FillValue = 1.e+20f ;

// global attributes:
    :contact = "Name email@place.com" ;
    :institution = "Institution of Affiliation (ACRONYM)" ;
    :comment = "Data prepared for ISIMIP2b" ;
    :time_varying_layer_depth = "true" ;
    :location_varying_layer_depth = "true" ;

Reading ASCII data time series into NetCDF

Here are instructions for converting your global daily ASCII data into a NetCDF file with the required metadata. The tools cdo and ncatted (from NCO) are needed (see links above).

cdo --history -f nc4c -z zip -setmissval,1e+20 -setunit,"UNIT" -setname,VARIABLE -setreftime,1661-01-01,00:00:00,1days -settaxis,STARTYEAR-01-01,00:00:00,1days -input,grid.txt data.nc4 < data.txt

ncatted -O -h -a contact,global,o,c,"NAME " -a institution,global,o,c,"INSTITUTION (SHORT)" -a long_name,VARIABLE,o,c,"VARIABLE LONG NAME" data.nc4
