Bulk Processing with GPT

This short tutorial introduces bulk processing with GPT from the command shell on Windows and Unix systems. The provided scripts are kept generic so that they can serve a range of processing requirements. They cannot cover every edge case; the intention is to cover at least the main use cases. The scripts can probably be improved in several places, but they give you a starting point for writing your own. If you know of improvements to the scripts or have questions about using them, you are kindly invited to the SNAP Forum.
A general introduction to GPT and graphs can be found at Creating a GPF Graph.

The four files mentioned below are attached for download.

The Windows Script

The Unix Script

Description of the Scripts

  1. Unix: The first line tells the shell what interpreter to use to run the script. Here it is bash.
    Windows: The first thing done in the script is to enable delayed expansion. This allows the evaluation of variables within a loop and is needed when iterating over the source products later on.

  2. Next the path to the gpt batch file is specified. This script is later called to process the data products. The path to the script has to be adapted by the user.

  3. The five parameters which must be given to the script are stored in variables for easier readability.

    1. The path to the XML graph file which defines the processing graph performed on the source data product.

    2. The path to a parameter file. The parameters specified in this file are used to configure the graph file. They can be given in the plain properties format.

    3. The path pointing to the directory which contains the source products.

    4. The path pointing to the directory where the processed data shall be placed.

    5. A file prefix in order to alter the name of the source product and indicate the type of processing.

  4. Unix: Some helper functions are defined. They are later used in the main processing section.

  5. An output directory is created to keep hold of the processed data.

  6. Now the iteration over all source products starts. Here only products with the extension 'SAFE' and the prefix 'S2' are considered.

    • The absolute path of the current source file is retrieved.

    • The path to the target file is compiled using the target directory, the file prefix and the name of the source product without extension. The file extension for the BEAM-DIMAP format is always appended.

    • The command line is assembled using the path to the gpt.exe file, the XML graph file, the parameter file and the source file and target file. The -e option is added in order to get longer messages in case of an error.

    • As the last step, the command line is executed.
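
The steps above can be sketched in bash roughly as follows. This is a minimal illustration and not the attached script itself: the function names, the `gptPath` default and the `_` separator between prefix and product name are assumptions of this sketch. The gpt options used are `-e` (longer error messages), `-p` (parameter file) and `-t` (target file).

```shell
#!/bin/bash
# Sketch of the processing steps described above; not the attached script.

# Path to the gpt launcher. Adapt it to your installation (assumed default).
gptPath="${gptPath:-gpt}"

# Strip the last extension from a name, e.g. "product.SAFE" -> "product".
remove_extension() {
    printf '%s\n' "${1%.*}"
}

# Compose "<targetDirectory>/<prefix>_<sourceName>.dim".
# The "_" separator is an assumption of this sketch.
build_target_path() {
    local targetDirectory="$1" targetFilePrefix="$2" sourceFile="$3"
    local name
    name="$(remove_extension "$(basename "$sourceFile")")"
    printf '%s\n' "${targetDirectory}/${targetFilePrefix}_${name}.dim"
}

# Run the graph on every S2*.SAFE product found in the source directory.
process_directory() {
    local graphXmlPath="$1" parameterFilePath="$2" sourceDirectory="$3"
    local targetDirectory="$4" targetFilePrefix="$5"
    local product sourceFile targetFile

    # Create the output directory for the processed data.
    mkdir -p "$targetDirectory"

    for product in "$sourceDirectory"/S2*.SAFE; do
        # Skip the literal pattern if nothing matched.
        [ -e "$product" ] || continue
        # Absolute path of the current source product.
        sourceFile="$(cd "$(dirname "$product")" && pwd)/$(basename "$product")"
        targetFile="$(build_target_path "$targetDirectory" "$targetFilePrefix" "$product")"
        # Assemble and execute the command line.
        "$gptPath" "$graphXmlPath" -e -p "$parameterFilePath" -t "$targetFile" "$sourceFile"
    done
}
```

With these functions loaded, a call such as `process_directory resample_s2.xml resample_20m.properties /Eodata/toProcess /Eodata/toProcess/output resampled20m` would process each matching product in turn. Setting `gptPath=echo` beforehand prints the assembled command lines instead of running them, which can be handy for a dry run.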

The loop used in the Windows batch script is specific to looping recursively over folders in a directory. If the loop should only consider specific files in a source directory, the syntax can be simpler.

for %%F in (%sourceDirectory%\*.nc) do (

)

This will loop over all *.nc files within the specified source directory.
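
For the Unix script, a comparable loop over files only might look like the following sketch. The function name `list_nc_files` is hypothetical and only serves the illustration; the loop body would normally hold the processing call instead of printing the file name.

```shell
#!/bin/bash
# Print every *.nc file directly inside the given directory, one per line.
list_nc_files() {
    local sourceDirectory="$1" file
    for file in "$sourceDirectory"/*.nc; do
        # Skips directories and the literal pattern if nothing matched.
        [ -f "$file" ] || continue
        printf '%s\n' "$file"
    done
}
```

Quoting `"$sourceDirectory"` while leaving `*.nc` unquoted keeps paths with spaces intact and still lets the shell expand the pattern.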

Known Limitations of the Scripts

  • Naming of the target product is limited. For example, you might want to keep the original name or apply a more complex naming pattern.

  • Which products of the source directory are used for processing is currently hard coded (only with '.SAFE' extension and 'S2' prefix). This should be configurable.

  • The format of the target product is not configurable.

Example Usage

A set of input Sentinel-2 products shall be processed with the Resample processor.
For this, an XML graph is defined. Variables are used for the resampling parameters and are set in a properties file, which makes them easier to change. It is also possible to write values directly into the XML graph file, as done for resampleOnPyramidLevels.

XML Graph File for S2 Resampling (resample_s2.xml)
<graph id="Resample_Sentinel-2">
  <version>1.0</version>
  <node id="resample-s2">
    <operator>Resample</operator>
    <sources>
      <sourceProduct>${sourceProduct}</sourceProduct>
    </sources>
    <parameters>
      <targetResolution>${resolution}</targetResolution>
      <upsampling>${up}</upsampling>
      <downsampling>${down}</downsampling>
      <flagDownsampling>${flag}</flagDownsampling>
      <resampleOnPyramidLevels>false</resampleOnPyramidLevels>
    </parameters>
  </node>
</graph>

When expressions are entered directly in the XML file, they often need to be escaped. Some characters have special meanings in XML. If they are used for other purposes, for example in an expression, they need to be escaped.

These characters are: " ' < > &

An expression like this: B8 > 0 && B8 < 0.1

Needs to be written like: B8 &gt; 0 &amp;&amp; B8 &lt; 0.1

More details are on Stackoverflow.
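
If expressions are generated by a script, the escaping can be automated. The following is a small sketch using sed; the function name `xml_escape` is hypothetical. It replaces the five special characters with the predefined XML entities and handles the ampersand first, so that the entities it produces are not escaped a second time.

```shell
#!/bin/bash
# Escape the five characters that have a special meaning in XML.
xml_escape() {
    printf '%s' "$1" | sed \
        -e 's/&/\&amp;/g' \
        -e 's/</\&lt;/g' \
        -e 's/>/\&gt;/g' \
        -e 's/"/\&quot;/g' \
        -e "s/'/\&apos;/g"
}
```

For example, `xml_escape 'B8 > 0 && B8 < 0.1'` yields the escaped form of the expression used above.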

 

The parameter file defines the four parameters used in the graph.

Parameters File (resample_20m.properties)
resolution=20
down=Min
up=Bicubic
flag=First
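
Because the graph refers to its parameters as ${name}, a missing entry in the properties file is easy to overlook. The sketch below lists placeholders that appear in a graph file but have no matching `name=` line in the properties file. The function name `missing_parameters` is hypothetical, and skipping `sourceProduct` rests on the assumption that gpt fills this placeholder from the source file given on the command line.

```shell
#!/bin/bash
# List ${name} placeholders of a graph file that are not defined
# in the given properties file (one name per line).
missing_parameters() {
    local graph_file="$1" properties_file="$2" name
    grep -o '\${[A-Za-z_][A-Za-z0-9_]*}' "$graph_file" | tr -d '${}' | sort -u |
    while IFS= read -r name; do
        # Assumption: gpt binds sourceProduct to the source file argument.
        [ "$name" = "sourceProduct" ] && continue
        grep -q "^[[:space:]]*${name}[[:space:]]*=" "$properties_file" || echo "$name"
    done
}
```

Calling `missing_parameters resample_s2.xml resample_20m.properties` should print nothing when every placeholder has a value.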

Now that we have all the information, we can call the scripts from the command line.

On Windows it is assumed that the source products are located in the directory 'C:\Eodata\toProcess' and that the XML graph file and the parameter file are in the same directory as the batch file. The processed files will go to 'C:\Eodata\toProcess\output' and have the prefix resampled20m.

>processDataset.bat resample_s2.xml resample_20m.properties C:\Eodata\toProcess C:\Eodata\toProcess\output resampled20m


On Unix the directory of source products is assumed to be '/Eodata/toProcess', while the XML graph file and the parameter file are in the same directory as the script file. The processed files will go to '/Eodata/toProcess/output' and have the prefix resampled20m.

>processDataset.bash resample_s2.xml resample_20m.properties "/Eodata/toProcess" "/Eodata/toProcess/output" resampled20m