SNAP Engine/CEP Specification
Goals
An essential SNAP requirement is allowing users to develop their own applications based on the SNAP Java libraries. Very often, such applications run in headless environments without a GUI, as is the case for batch-mode processing or web processing services. In BEAM and in the Sentinel Toolboxes 1.x it was relatively simple to collect a classpath from all the libraries because all binaries were included in the lib and modules sub-directories. Since SNAP 2, libraries are distributed in the installation directory according to NetBeans clusters. It is no longer straightforward for users to collect the required libraries from a NetBeans installation structure.
Also, the NetBeans platform allows only a single instantiation of the module runtime, which means that only a single NetBeans application can be launched from an installation directory (unless it is invoked with a different user directory - TBC). In a headless environment, however, users neither need nor want the NetBeans module runtime at all. The ultimate goal is therefore to allow users to use the SNAP Engine modules and their extension modules independently of NetBeans and of other unwanted (GUI) modules. Along with that, it should be possible to run any number of non-NetBeans applications on a single SNAP 2 installation directory.
Background and strategic fit
The STEP Cloud Exploitation Platform, CEP, is a component that requires headless use of the SNAP Java libraries. For example, it should be possible to use the SNAP Java libraries directly for the implementation of Hadoop MapReduce jobs.
The SNAP Engine itself has at least two applications:
- The Graph Processing Tool, GPT, a command-line tool
- The Product Converter, PConvert, another command-line tool
Assumptions
Requirements
# | Title | User Story | Importance |
---|---|---|---|
1 | Multiple apps | Users shall be able to write their own applications using Engine libraries (JARs). For Java implementations, provide a Launcher which invokes the client's main() method in the context of a classpath created dynamically from the configured Engine JARs. For Python implementations, provide an Engine API which can be called via the Python-Java bridge in order to run the SNAP API in the context. | |
2 | Dynamic classpath | The Engine shall be able to dynamically create a client class path to be used by client code. The Engine API shall allow for accessing it. | |
3 | Collect native libraries | The Engine shall be able to dynamically detect any directories containing supplemental native libraries (based on JNI) and modify "java.library.path" accordingly. | |
4 | Module lifecycle | (Engine) modules shall offer services that are informed on Engine start/stop so that they can perform any module lifecycle actions (similar to OSGi Activator class, NetBeans Installer class). | |
5 | Multiple tool instances | It shall be possible to have multiple instances of Engine tools running at the same time, e.g. 5 GPT invocations running in parallel. | |
6 | OSGi | Any Engine API design shall be compatible with a later migration of the Engine modules to OSGi | Desired |
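The Launcher idea from requirement 1 could look roughly like the following. This is a minimal sketch under stated assumptions, not the SNAP Engine API: the names `LauncherSketch`, `launch` and the nested stand-in client `Echo` are hypothetical and introduced only for illustration. It builds a `URLClassLoader` from a list of JAR paths, installs it as the thread context class loader, and invokes the client's `main()` reflectively.

```java
import java.lang.reflect.Method;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical launcher sketch; not part of the SNAP Engine API. */
final class LauncherSketch {

    /** Stand-in for a client application: records its arguments in a system property. */
    static final class Echo {
        public static void main(String[] args) {
            System.setProperty("launcher.sketch.args", String.join(",", args));
        }
    }

    /** Invokes mainClass.main(args) with a class loader built from the given JARs. */
    static void launch(List<Path> jars, String mainClass, String... args) {
        URL[] urls = jars.stream().map(LauncherSketch::toUrl).toArray(URL[]::new);
        Thread thread = Thread.currentThread();
        ClassLoader previous = thread.getContextClassLoader();
        // The loader is deliberately left open: the client may load classes lazily.
        URLClassLoader loader = new URLClassLoader(urls, LauncherSketch.class.getClassLoader());
        try {
            thread.setContextClassLoader(loader);
            Method main = loader.loadClass(mainClass).getMethod("main", String[].class);
            // Cast to Object so the array is passed as one argument, not as varargs.
            main.invoke(null, (Object) args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Failed to launch " + mainClass, e);
        } finally {
            thread.setContextClassLoader(previous);
        }
    }

    private static URL toUrl(Path jar) {
        try {
            return jar.toUri().toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid JAR location: " + jar, e);
        }
    }

    public static void main(String[] args) {
        // No extra JARs here; Echo is resolved through the parent class loader.
        launch(List.of(), Echo.class.getName(), "a", "b");
        System.out.println(System.getProperty("launcher.sketch.args"));
    }
}
```

Because every launched application gets its own class loader and no shared module runtime is involved, several such applications could run against the same installation directory at the same time.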
User Interaction and Design
Loading of Properties
- Properties shall be loaded from the following places by default:
  - <SNAP_HOME>/etc/<cluster>.properties
  - <SNAP_USER_DIR>/etc/<cluster>.properties
- Properties file given on the command line
  On the command line it shall be possible to specify a properties file. Properties given here overwrite properties loaded from SNAP_USER_DIR or SNAP_HOME.
  The command line option could be
  -c /path/to/special.properties
- Properties given as VM parameters
  These properties shall have the highest priority and overwrite any other property loaded.
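The precedence described above can be sketched as a simple layered merge. The class `PropertyLayers` and its methods are hypothetical names, not an existing SNAP API. One assumption goes beyond the text: properties from SNAP_USER_DIR are taken to overwrite those from SNAP_HOME, which the list order suggests but does not state.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Properties;

/** Hypothetical sketch of layered property loading; not an existing SNAP API. */
final class PropertyLayers {

    /** Merges property sources; entries later in the list overwrite earlier ones. */
    static Properties merge(List<Properties> lowestToHighest) {
        Properties merged = new Properties();
        for (Properties layer : lowestToHighest) {
            merged.putAll(layer);
        }
        return merged;
    }

    /** Loads a properties file; returns an empty set if the file is null or absent. */
    static Properties loadIfPresent(Path file) {
        Properties props = new Properties();
        if (file != null && Files.isRegularFile(file)) {
            try (Reader reader = Files.newBufferedReader(file)) {
                props.load(reader);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return props;
    }

    /** Applies the order: SNAP_HOME, SNAP_USER_DIR (assumed), -c file, VM parameters. */
    static Properties loadAll(Path snapHome, Path snapUserDir, String cluster, Path commandLineFile) {
        String fileName = cluster + ".properties";
        return merge(List.of(
                loadIfPresent(snapHome.resolve("etc").resolve(fileName)),
                loadIfPresent(snapUserDir.resolve("etc").resolve(fileName)),
                loadIfPresent(commandLineFile),   // file given with -c, may be null
                System.getProperties()));         // -Dkey=value has the highest priority
    }
}
```

Keeping the merge separate from the file access makes the precedence rule easy to check in isolation.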
Run GPT from the Installation Directory
The bin folder of the installation directory shall contain an executable script named gpt.bat/.sh. The script will start the runtime as before in BEAM. The runtime needs to look up the JAR files in the modules directories of the multiple clusters. We have two options here:
- BruteForceClasspath
  If we use the BruteForceClasspathFactory, which simply puts all JARs on the classpath, we lose the flexibility of two modules depending on the same 3rd-party library in different versions. This will not be possible.
  Also, there would be too many modules on the classpath. Only engine modules need to be on the classpath. How to identify these modules?
- BootstrapClasspath (Module Runtime)
  By using this classpath we could have a separate classpath for each module, which would allow 3rd-party libraries in different versions, if implemented. But the parsing of dependencies, which needs to be done on manifest.mf or module.xml, needs to be reactivated.
The following two issues need to be implemented in both cases:
- Native libraries need to be properly put on the java.library.path. They are located in <cluster>/modules/lib. The same mechanism as NetBeans to resolve native libs should be used.
- A simple module lifecycle (start/stop). This needs to be compatible with NetBeans activation mechanisms.
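The lookup of module JARs and native library directories could be sketched as follows. `ClusterScanner` is a hypothetical name, and the code assumes that every immediate sub-directory of the installation directory that contains a modules directory is a cluster. It does not reproduce the NetBeans resolution mechanism. Note that changing java.library.path after JVM start is generally not picked up by the running JVM, so the resulting value is meant to be passed as `-Djava.library.path=...` by the start script.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical scanner for a cluster-based installation layout. */
final class ClusterScanner {

    /** Returns <cluster>/modules for every cluster found directly below installDir. */
    static List<Path> moduleDirs(Path installDir) {
        try (Stream<Path> children = Files.list(installDir)) {
            return children.map(child -> child.resolve("modules"))
                    .filter(Files::isDirectory)
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Collects the JAR files located directly in the clusters' modules directories. */
    static List<Path> moduleJars(Path installDir) {
        List<Path> jars = new ArrayList<>();
        for (Path modules : moduleDirs(installDir)) {
            try (Stream<Path> files = Files.list(modules)) {
                files.filter(Files::isRegularFile)
                        .filter(f -> f.getFileName().toString().endsWith(".jar"))
                        .sorted()
                        .forEach(jars::add);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return jars;
    }

    /** Joins every existing <cluster>/modules/lib directory into a java.library.path value. */
    static String nativeLibraryPath(Path installDir) {
        return moduleDirs(installDir).stream()
                .map(modules -> modules.resolve("lib"))
                .filter(Files::isDirectory)
                .map(Path::toString)
                .collect(Collectors.joining(File.pathSeparator));
    }
}
```

This corresponds to the brute-force option: all JARs found end up on one classpath, so it inherits the limitation that only one version of a 3rd-party library can be used.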
Run GPF on Calvalus
On Calvalus, gpt is not used. Instead, GPF is called directly via its API. The executable code (the modules) is provided on Calvalus as a bundle. When a process is invoked on Calvalus, all JAR files in this bundle are put on the classpath via the Java VM -cp option. Native libraries are also provided with this bundle, and the system operator/bundle provider ensures that they are properly made available to the invoked process. Properties are set as given in the bundle descriptor or are provided with the processing request. What needs to be done on Calvalus is that the start methods of the modules are invoked before the SNAP Engine is used, and the stop methods afterwards.
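The start/stop bracket around a direct API call could be sketched as shown below. The interface `EngineActivator` and the class `EngineLifecycle` are hypothetical names, and discovery through `java.util.ServiceLoader` is an assumption chosen for illustration. The actual Engine lifecycle mechanism is not defined in this document.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.ServiceLoader;

/** Hypothetical lifecycle hook offered by a module; the real interface may differ. */
interface EngineActivator {
    void start();
    void stop();
}

/** Hypothetical helper that brackets a processing task with module start/stop calls. */
final class EngineLifecycle {

    /** Discovers activators registered under META-INF/services (assumed mechanism). */
    static List<EngineActivator> discover() {
        List<EngineActivator> found = new ArrayList<>();
        ServiceLoader.load(EngineActivator.class).forEach(found::add);
        return found;
    }

    /** Starts all activators, runs the task, then stops the started ones in reverse order. */
    static void runWith(List<? extends EngineActivator> activators, Runnable task) {
        Deque<EngineActivator> started = new ArrayDeque<>();
        try {
            for (EngineActivator activator : activators) {
                activator.start();
                started.push(activator);   // remember only what was actually started
            }
            task.run();
        } finally {
            while (!started.isEmpty()) {
                started.pop().stop();      // last started, first stopped
            }
        }
    }
}
```

A Calvalus process would then wrap its GPF invocation in something like `EngineLifecycle.runWith(EngineLifecycle.discover(), task)`, so that the stop methods run even if processing fails.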
Questions
Below is a list of questions to be addressed as a result of this requirements document:
Question | Outcome |
---|---|