Discrepancy in interferograms

Description

Run the graph (see attached) with one master ("Read") and four slaves ("ProductSet-Reader"). Then replace "ProductSet-Reader" with "Read(2)" and run again with the same master and one of the four slaves from before. There is a slight discrepancy in the output interferograms from the two runs.

Environment

None

Attachments

5
  • 06 Dec 2017, 05:38 pm
  • 16 Nov 2017, 07:31 pm
  • 16 Nov 2017, 07:31 pm
  • 16 Nov 2017, 07:31 pm
  • 16 Nov 2017, 07:11 pm

Activity

Tim Moorhouse, 18 December 2017 at 20:06

The outputs are not bit-for-bit identical on each run, but the small numerical differences have been explained: they are on the order of a relative difference of 1e-7, close to the precision of 32-bit floats. Removing these differences would mean making the Enhanced Spectral Diversity operator run in a deterministic order (instead of the multithreaded implementation that exists now), which would hinder performance.

Marcus Engdahl, 7 December 2017 at 10:41

If I understand this correctly, the FFT/inverse FFT used causes small fluctuations in the outputs depending on the order in which threads are computed. Which FFT library has been used? Is this a known issue with multithreaded FFTs, or is the one we are using not implemented well? Are other FFT implementations already in use within SNAP, and do they share this same issue?

In any case the differences are tiny, but still I don't think this should be happening.

Tim Moorhouse, 6 December 2017 at 18:16

I've attached the exact graph (a copy of Cecilia's with the input and output paths changed) I've been using to investigate this issue. The master and slave products were downloaded from scihub, and the following pre-processing steps were performed manually:

  • Radar/Sentinel-1 TOPS/S-1 TOPS Split (using subswath of IW3 instead of the default IW1)

  • Radar/Apply Orbit File

Running just the Debug_Chain1 graph can produce numerical differences from one run to the next. In particular, the maximum difference of 0.002 noted previously can occur at the pixel at (x,y)-coordinates (14656,501) in the output image, which corresponds to (21632,4471) in the input products (i.e., prior to the subset operator). For this pixel, one of two outputs can occur on any given run: 30022.55298919551 or 30022.550978616448 (an absolute difference of around 0.002). Note that the output uses 32-bit floats, which have about 7.2 digits of precision in the significand.

The source of the discrepancy has been identified as the Enhanced Spectral Diversity operator. Two steps in this operator are of note: the estimation of the azimuth offset and the subsequent inverse FFT. Both of these steps have multi-threaded implementations. The order in which the threads execute affects the order in which intermediate results are combined, and small numerical differences can result. Both of these steps are performed on 64-bit floats, which have about 15.9 digits of precision, and the spectral diversity operator for this pixel produces a value of either 112.04737472534177 or 112.0473747253418.
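To illustrate why the combination order matters, here is a minimal sketch in Python (not SNAP code). It assumes three hypothetical per-thread partial results and a hypothetical helper `combine` that adds them in whatever order the threads happen to finish. Floating-point addition is not associative, so the two completion orders give results that differ in the last bit.

```python
import math

# Hypothetical partial results, one per worker thread (illustrative values only).
partials = [0.1, 0.2, 0.3]

def combine(values, completion_order):
    """Add the partial results in the order the threads finished."""
    total = 0.0
    for index in completion_order:
        total += values[index]
    return total

run_a = combine(partials, [0, 1, 2])  # (0.1 + 0.2) + 0.3
run_b = combine(partials, [1, 2, 0])  # (0.2 + 0.3) + 0.1

print(run_a, run_b, run_a == run_b)

# math.fsum tracks exact partial sums, so its result does not depend on order.
exact_a = math.fsum(partials[i] for i in [0, 1, 2])
exact_b = math.fsum(partials[i] for i in [1, 2, 0])
print(exact_a == exact_b)
```

The two runs differ by roughly one unit in the last place of a 64-bit float, which is the same kind of difference seen in the two spectral diversity values quoted above.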

Most pixels of the output have identical values on all runs. A pixel differs in the output product only when the different 64-bit float values produced by the operator round to different 32-bit floats. In order to produce bit-for-bit identical outputs from the Enhanced Spectral Diversity operator on each run and make the output completely deterministic, the multi-threaded implementation would need to be abandoned, which would impact performance.
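As a worked illustration of that rounding effect, the sketch below takes the two 64-bit values quoted for this pixel and rounds each to a 32-bit float. It uses Python's `struct` module to emulate the narrowing and assumes IEEE 754 round-to-nearest; the helper name `to_f32` is hypothetical, and this is not how SNAP itself performs the conversion.

```python
import struct

def to_f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float and widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# The two 64-bit values reported for the spectral diversity operator at this pixel.
value_run_1 = 112.04737472534177
value_run_2 = 112.0473747253418

f32_run_1 = to_f32(value_run_1)
f32_run_2 = to_f32(value_run_2)

print(value_run_2 - value_run_1)  # difference as 64-bit floats
print(f32_run_1, f32_run_2)       # the two values after narrowing
print(f32_run_2 - f32_run_1)      # difference after narrowing
```

Under these assumptions the two values land on opposite sides of a 32-bit rounding boundary, so a difference of about 3e-14 in 64-bit precision becomes a full 32-bit step of 2**-17 (about 7.6e-6) after narrowing.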

Although the absolute difference in the outputs in this example was as large as 0.002, the relative difference was always much lower, near the limit of precision of 32-bit floats.
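For reference, the arithmetic behind that statement can be checked with a short Python sketch. It computes the absolute and relative difference of the two reported output values and compares them with the spacing between adjacent 32-bit floats at that magnitude. The helper `f32_spacing` is hypothetical and assumes IEEE 754 single precision for a positive, finite input.

```python
import struct

def f32_spacing(x):
    """Gap between the 32-bit float nearest to x and the next one above it."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    current = struct.unpack("<f", struct.pack("<I", bits))[0]
    following = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return following - current

# The two output values reported for the pixel at (14656,501).
out_run_1 = 30022.55298919551
out_run_2 = 30022.550978616448

abs_diff = abs(out_run_1 - out_run_2)
rel_diff = abs_diff / abs(out_run_1)
spacing = f32_spacing(out_run_1)

print(abs_diff)  # about 0.002
print(rel_diff)  # about 6.7e-8
print(spacing)   # 2**-9, about 0.00195
```

At a magnitude of about 30000, adjacent 32-bit floats are roughly 0.00195 apart, so an absolute difference of 0.002 corresponds to about one representable step.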

Cecilia Wong, 17 November 2017 at 01:01

When running the simplest Full_graph, Chain1_only_graph and Chain2_only_graph using SNAP v5, we see the same discrepancy we see using SNAP v6 preview 5. That is, there is a discrepancy between the output from Full_graph and Chain1_only_graph, but no discrepancy between the output of Full_graph and Chain2_only_graph.

Also, in the simplest graphs, it does not even take 4 slaves (using Product Set Reader) for the problem to appear. We can see the discrepancy with just one slave (using Read).

Cecilia Wong, 16 November 2017 at 19:33

Since the graph is quite complicated, to isolate where the problem could be, we have tried to reduce the graph to the simplest form that still shows a discrepancy.

After much trial and error, the simplest case where we can see a discrepancy is one where we use only one slave and we have removed Deburst and Topo phase removal from the top chain of operators.

We will call the top chain of operators "chain 1" and the bottom chain "chain 2".

We then compare the output from Full_graph with two other graphs where only one of the two chains is present.

We found that the slave output images from "chain 2" using Full_graph and Chain2_only_graph agree to within an error of < 1E-5, which is expected given the accuracy of float32. The master output images are completely identical.

The output images from "chain 1" using Full_graph and Chain1_only_graph were compared and found to have a largest discrepancy of 0.002.
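A comparison of this kind can be sketched as follows. This is a minimal Python example, not the tool actually used for the comparison above: the function `compare` is hypothetical, the images are stood in for by flat lists, and the first two sample values are made up. Only the last pair of values comes from this ticket.

```python
def compare(image_a, image_b):
    """Return (max absolute diff, max relative diff, index of max absolute diff)."""
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same number of pixels")
    max_abs, max_rel, worst_index = 0.0, 0.0, None
    for index, (a, b) in enumerate(zip(image_a, image_b)):
        diff = abs(a - b)
        scale = max(abs(a), abs(b))
        rel = diff / scale if scale > 0.0 else 0.0
        if diff > max_abs:
            max_abs, worst_index = diff, index
        max_rel = max(max_rel, rel)
    return max_abs, max_rel, worst_index

# Stand-in pixel values: the first two are illustrative, the last pair is from this ticket.
full_graph_output = [1.5, 250.25, 30022.55298919551]
chain1_only_output = [1.5, 250.25, 30022.550978616448]

max_abs, max_rel, worst_index = compare(full_graph_output, chain1_only_output)
print(max_abs, max_rel, worst_index)
```

Reporting the relative difference alongside the absolute one helps when pixel values are large, since a fixed absolute tolerance such as 1E-5 can be smaller than the spacing of float32 values at that magnitude.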

It looks like the presence of "chain 2" in Full_graph somehow affects the output of "chain 1".

Done

Created 16 November 2017 at 19:11
Updated 10 February 2023 at 14:56
Resolved 10 February 2023 at 14:47