Standalone Tool Adapter GPF/GPT Use Case

Purpose

Graphs created using the BEAM GPF must consist of nodes that invoke an implementation of an org.esa.beam.framework.gpf.Operator. Currently, operators can only be implemented using the dedicated Java API. There is a Python adapter, but it too ultimately uses the Java API by bridging from the Python VM into the JVM via jpy.

We are often faced with the requirement that a certain EO data processing workflow comprises not only BEAM GPF operators but also external executables that read, convert, process or write EO data. We end up writing glue code - usually Python scripts - that first calls a GPF graph, then calls the executable, then invokes another GPF graph or operator, and so forth.

So the use case here is to treat any executable in the same way as a GPF-native operator node. We would like to be able to

  • invoke them using GPT, thereby using the same command-line parameterisation scheme;
  • use them in any graph XML, thereby using the same node configuration scheme;
  • use them as available nodes in the Graph Builder tool, thereby using their generated or provided Java Swing GUIs for parameterisation.
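For illustration, invoking an adapted executable via GPT could then look like any other operator call. The alias and values below are hypothetical, borrowed from the ARC SST descriptor example in the Design Ideas section:

```shell
# Hypothetical GPT invocation of an adapted executable (paths are examples only)
gpt ARC_SST -PlowerFactor=1.02 -PlowerName=radiance_7 \
    -Ssource=/data/input.nc -t /data/output.dim
```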

Requirements

Adapter Requirements

In order to provide these capabilities to SNAP, such an operator adapter for stand-alone executables should reuse or support the "standard" GPF operator descriptor schema, which describes the executable's editable parameters, its sources and, optionally, its target.

Consider also the following:

  • An executable capable of reading a data format not supported by SNAP can take the role of a "read" node and can therefore be used as the first/source node in a graph.
  • An executable capable of writing a data format not supported by SNAP can take the role of a "write" node and can therefore be used as the last/target node in a graph.
  • An executable capable of reading and writing data formats supported by SNAP can be used anywhere in a graph, because it can read its predecessor's output file(s) and write its successor's input file(s).

  • An executable may be configured to simply pass through its passed-in source products. This may be useful if the executable can neither read nor write any of SNAP's recognised data formats but still produces output of interest.

At a minimum, we would like to configure any executable in the following way to make it usable as a GPF graph node:

  1. an operator descriptor XML file (similar to what we use for the GPF Python integration).
  2. a configuration file which
    1. contains or points to a Velocity template file used by the operator adapter to generate the command line for the invocation of the executable.
    2. contains or points to zero, one or more Velocity templates used by the operator adapter to generate text input files to be read by the executable.
    3. tells the operator adapter how to parse progress from the executable's output on stdout, a log file or any other file written by the executable, e.g. by a pattern applied to lines of stdout
    4. tells the operator adapter how to distinguish success and failure of processing, e.g. by the return code of the executable
    5. tells the operator adapter what the name(s) of its generated output file(s) will be, or how the target product(s) can be detected after processing, e.g. by a pattern that may use elements of the input name
    6. tells the operator adapter about temporary/intermediate outputs, e.g. by a uniquely named working directory that can be deleted by the adapter after processing
  3. an environment for the executable in which it finds its (software) installation and, optionally, auxiliary data, e.g. relative to the path of the executable
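A command-line template as described in point 2.1 could look as follows. This is a hypothetical sketch only; the file name, variable names and executable flags are assumptions, reusing the parameter names of the ARC SST descriptor example below:

```
## Hypothetical Velocity template (templates/cmdline.vm) rendering the
## executable's command line from the operator's parameters and products.
$ARC_HOME/bin/arc_processor.py ${sourceProduct} \
    --lower-factor ${lowerFactor} --lower-name ${lowerName} \
    --upper-factor ${upperFactor} --upper-name ${upperName} \
    -o ${targetProduct}
```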

Nothing shall prevent the executable from being executed several times in parallel on different inputs, even by the same GPF adapter. This requires that the executable is implemented in a reentrant way, i.e. it must not modify its own software tree during processing, and it shall only write to the working directory named above.
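To keep parallel invocations isolated, the adapter could give each run its own uniquely named working directory and remove it after processing. A minimal sketch using only the JDK; the class and method names are hypothetical, not part of any SNAP API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;

public class WorkDirManager {

    // Creates a uniquely named working directory for one invocation,
    // so that parallel runs of the same adapter never collide.
    static Path createWorkDir(Path baseDir, String operatorAlias) throws IOException {
        Files.createDirectories(baseDir);
        return Files.createTempDirectory(baseDir, operatorAlias + "-");
    }

    // Recursively deletes the working directory after processing,
    // cleaning up temporary/intermediate outputs.
    static void deleteWorkDir(Path workDir) throws IOException {
        try (var paths = Files.walk(workDir)) {
            paths.sorted(Comparator.reverseOrder())
                 .forEach(p -> p.toFile().delete());
        }
    }
}
```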

Although executables may be available for several platforms, their configuration for the stand-alone tools adapter should be the same on all supported platforms. The adapter shall take care of the differences.

Derived GPF Requirements

  • A new org.esa.snap.framework.gpf.ExecOperator class.
  • Operators should be made executable with a progress monitor: void run(ProgressMonitor pm)
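The run(ProgressMonitor pm) method could drive the external tool and translate its stdout into progress events, using the configured progress pattern. The sketch below is self-contained for illustration: ProgressMonitor is a stripped-down stand-in for com.bc.ceres.core.ProgressMonitor, and the class reads from a BufferedReader rather than a real process:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in for com.bc.ceres.core.ProgressMonitor (the real interface has more methods).
interface ProgressMonitor {
    void beginTask(String name, int totalWork);
    void worked(int work);
    void done();
}

// Sketch of how an ExecOperator-style run(ProgressMonitor) could scan the
// tool's stdout with a configured <progress-percent> pattern and forward
// the matched percentage to the monitor as incremental work units.
class ExecOperatorSketch {
    private final Pattern progressPercent;

    ExecOperatorSketch(String progressPercentRegexp) {
        this.progressPercent = Pattern.compile(progressPercentRegexp);
    }

    void run(BufferedReader stdout, ProgressMonitor pm) throws IOException {
        pm.beginTask("Executing tool", 100);
        int reported = 0;
        String line;
        while ((line = stdout.readLine()) != null) {
            Matcher m = progressPercent.matcher(line);
            if (m.find()) {
                int percent = Integer.parseInt(m.group(1));
                pm.worked(percent - reported);  // report only the delta
                reported = percent;
            }
        }
        pm.done();
    }
}
```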

Design Ideas

The actual operator metadata configuration would be:

Operator Metadata
<!-- This XML file describes the interface of the executable operator. It defines the required source product(s) and the parameters
     for the processing. From the information in this file, the graphical user interface is generated automatically and the
     command-line help is derived.
-->
<operator>
    <!-- The name uniquely identifies the operator within SNAP -->
    <name>org.esa.snap.s3tbx.ArcSstOp</name>
    <!-- The alias is a more user-friendly name, e.g. to be used on the command line -->
    <alias>ARC_SST</alias>
    <operatorClass>org.esa.snap.framework.gpf.ExecOperator</operatorClass>
    <version>1.0</version>
    <authors>University of Reading, UK</authors>
    <copyright>(C) 2014 University of Reading</copyright>
    <description>
        The ARC SST Processor
    </description>
    <namedSourceProducts>
        <!-- One or more source products can be specified.
             In the GUI only one is currently supported. On the command line multiple source products
             can be specified by referencing them with the here defined names.
         -->
        <sourceProduct>
            <name>source</name>
        </sourceProduct>
    </namedSourceProducts>
    <parameters>
        <parameter>
            <!-- The name of the parameter; by this name the specified value can be retrieved in the python implementation -->
            <name>lowerFactor</name>
            <!-- The description is shown in the help on the command line and also as tooltip in the GUI -->
            <description>The value of the lower band is multiplied by this value.</description>
            <!-- The type of the parameter; can be boolean, byte, short, int, long, float, double, java.lang.String -->
            <dataType>double</dataType>
            <!-- The default value of the parameter; this is used if no value is specified by the user -->
            <defaultValue>1.0</defaultValue>
        </parameter>
        <parameter>
            <name>lowerName</name>
            <description>The name of the spectral band with the lower (red) wavelength.</description>
            <!-- The label is used in the graphical user interface; if not given, the name of the parameter is
                 converted into a human-readable label
            -->
            <label>Lower band name</label>
            <dataType>java.lang.String</dataType>
            <defaultValue>radiance_7</defaultValue>
        </parameter>
        <parameter>
            <name>upperFactor</name>
            <description>The value of the upper band is multiplied by this value.</description>
            <dataType>double</dataType>
            <defaultValue>1.0</defaultValue>
        </parameter>
        <parameter>
            <name>upperName</name>
            <description>The name of the spectral band with the higher (NIR) wavelength.</description>
            <label>Upper band name</label>
            <dataType>java.lang.String</dataType>
            <defaultValue>radiance_10</defaultValue>
        </parameter>
    </parameters>
</operator>

See also https://github.com/bcdev/beam/tree/master/beam-python/src/main/resources/beampy-examples/beampy-ndvi-operator

The following XML code configures the executable adapter. It could be an extension to the operator XML above or be put into a separate file:

<executable-adapter>
    <environment-variables>
        <environment-variable name="ARC_HOME" value="/home/proc/arc" />
    </environment-variables>
    <templates>
        <template>
            <input>templates/auxprep.vm</input>
            <output>work/aux-get.py</output>
            <role>exec-before</role>
        </template>
        <template>
            <input>templates/params.vm</input>
            <output>work/params.txt</output>
            <role>source</role>
        </template>
        <template>
            <input>templates/auxdel.vm</input>
            <output>work/aux-del.py</output>
            <role>exec-after</role>
        </template>
    </templates>
    <command-line-executable>
        $ARC_HOME/bin/arc_processor.py ${inputFile} -o ${outputFile} -params work/params.txt
    </command-line-executable>
    <command-line-parsers>
       <progress-total>regexp-pattern</progress-total>
       <progress-percent>regexp-pattern</progress-percent>
       <progress-count>regexp-pattern</progress-count>
       <status-ok>regexp-pattern</status-ok>
       <status-failed>regexp-pattern</status-failed>
       <error-message>regexp-pattern</error-message>
       <!-- sometimes the actual output filename is not known, but the executable might output it -->
       <output-file>regexp-pattern</output-file>
    </command-line-parsers>  
</executable-adapter>
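To make the parser elements concrete, the adapter could apply each regexp-pattern line by line to the executable's output, capturing the error message or the produced output file via a regex group. A minimal sketch; the class name and the concrete patterns are hypothetical:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrates how <status-failed>/<error-message> and <output-file> patterns
// might be applied to lines of the executable's stdout or log file.
class OutputParsers {
    // Hypothetical patterns; in the adapter they would come from the XML config.
    static final Pattern ERROR_MESSAGE = Pattern.compile("^ERROR: (.*)$");
    static final Pattern OUTPUT_FILE   = Pattern.compile("^Writing result to (\\S+)$");

    // Returns the captured error message if the line signals a failure.
    static Optional<String> errorMessage(String line) {
        Matcher m = ERROR_MESSAGE.matcher(line);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Returns the output file name if the executable announced it on this line.
    static Optional<String> outputFile(String line) {
        Matcher m = OUTPUT_FILE.matcher(line);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```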