
I am trying to access some satellite data on a read-only archive. I am only interested in files whose coordinates, listed in an XML file inside each zip, match my research area.

There are multiple files per day of the year. Currently I am focusing on the folder 2015/07, which has a separate folder for each day of the month. Each day folder contains lots of .zip files and other file types.

The naming convention/structure of the zip files is always the same: the .zip file name is reused in all of its contained files, with only the suffixes/file extensions changing, as below:

$ unzip -l S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.zip
Archive:  S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  07-08-2015 15:05   S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/
    16099  07-08-2015 15:04   S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/manifest.safe
        0  07-08-2015 15:05   S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/measurement/
861899961  07-08-2015 15:05   S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/measurement/s1a-iw-grd-vv-20150701t135110-20150701t135135-006618-008d39-001.tiff
        0  07-08-2015 15:04   S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/annotation/
  1685172  07-08-2015 15:04   S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/annotation/s1a-iw-grd-vv-20150701t135110-20150701t135135-006618-008d39-001.xml
        0  07-08-2015 15:04   S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/annotation/calibration/
  1013267  07-08-2015 15:04   S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/annotation/calibration/calibration-s1a-iw-grd-vv-20150701t135110-20150701t135135-006618-008d39-001.xml
   317418  07-08-2015 15:05   S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/annotation/calibration/noise-s1a-iw-grd-vv-20150701t135110-20150701t135135-006618-008d39-001.xml
        0  07-08-2015 15:05   S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/preview/
     2437  07-08-2015 15:04   S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/preview/product-preview.html
   124584  07-08-2015 15:05   S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/preview/quick-look.png
        0  07-08-2015 15:05   S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/preview/icons/
    95280  07-08-2015 15:05   S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/preview/icons/logo.png
     1026  07-08-2015 15:04   S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/preview/map-overlay.kml
    20088  07-08-2015 15:04   S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE-report-20150701T155156.pdf
        0  07-08-2015 15:05   S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/support/
      440  07-08-2015 15:04   S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/support/s1-product-preview.xsd
      450  07-08-2015 15:04   S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/support/s1-map-overlay.xsd
      471  07-08-2015 15:04   S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/support/s1-level-1-measurement.xsd
    62654  07-08-2015 15:04   S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/support/s1-object-types.xsd
      469  07-08-2015 15:04   S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/support/s1-level-1-quicklook.xsd
     6427  07-08-2015 15:04   S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/support/s1-level-1-calibration.xsd
   147222  07-08-2015 15:04   S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/support/s1-level-1-product.xsd
     3956  07-08-2015 15:05   S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/support/s1-level-1-noise.xsd

So if I choose one day of the month, I can check the coordinates in each .kml file using:

unzip -p S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.zip S1A_IW_GRDH_1SSV_20150701T135110_20150701T135135_006618_008D39_BE79.SAFE/preview/map-overlay.kml

This prints the contents of the whole .kml file:

<?xml version="1.0" encoding="UTF-8"?>0_20150701T135135_006618_008D39_BE79.SAFE<kml xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:gml="http://wwsa.int/safe/sentinel-1.0/sentinel-1" xmlns:s1sar="http://www.esa.int/safe/sentia.int/safe/sentinel-1.0/sentinel-1/sar/level-2" xmlns:gx="http://www.google.com
  <Document>
    <name>Sentinel-1 Map Overlay</name>
    <Folder>
      <name>Sentinel-1 Scene Overlay</name>
      <GroundOverlay>
        <name>Sentinel-1 Image Overlay</name>
        <Icon>
          <href>quick-look.png</href>
        </Icon>
        <gx:LatLonQuad>
          <coordinates>-115.928909,35.970608 -118.750404,36.374107 -118.459686,
        </gx:LatLonQuad>
      </GroundOverlay>
    </Folder>
  </Document>
</kml>

However, I need to do this for every day of 2015 and 2016, so what I would like to do is loop through the zip files and, for each one, print the name of the .zip file and the line from the contained .kml file which has the coordinates in it:

<coordinates>-115.928909,35.970608 -118.750404,36.374107 -118.459686,
    </gx:LatLonQuad>

I don’t expect anyone to completely write this for me but a bit of help starting off would be helpful.

Eric Renouf
  • 18,431
squar_o
  • 125

1 Answer


Start with something like this:

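for zf in *.zip ; do
  # Strip the .zip suffix; the archive's top-level directory is "$base.SAFE"
  base=${zf%.zip}

  echo "$zf"

  # -p writes the named member to stdout instead of extracting it to disk
  unzip -p "$zf" "$base.SAFE/preview/map-overlay.kml" |
    sed -ne '/<gx:/,/<\/gx:/p'

done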

This pipes the .../map-overlay.kml file from each .zip file into sed, which prints only the lines between <gx: and </gx:.

Alternatively, if you only want the <coordinates> line, change the sed script to:

sed -ne '/<coordinates>/p'

Note, however, that while these sed scripts work with your sample data, even a simple extraction of a few lines from an XML file is prone to failure if you use regular expressions to do the extraction. It would be remiss of me not to say:

Don't parse XML or HTML with regular expressions. Here's why it doesn't work.

Using xmlstarlet would be better. A Perl or Python script using one of their XML-parsing libraries would be better still. BTW, both Perl and Python also have library modules for working with .zip files, so the whole job could be done in either of those languages.
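As a rough illustration of the Python route, here is a minimal sketch using only the standard library (zipfile and xml.etree.ElementTree). It assumes every archive holds exactly one member ending in .SAFE/preview/map-overlay.kml, as in your listing, and that the real .kml files are well-formed XML. The names read_coordinates, scan and KML_SUFFIX are made up for this example; treat it as a starting point, not a finished tool.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumption: the overlay always sits at <name>.SAFE/preview/map-overlay.kml
KML_SUFFIX = ".SAFE/preview/map-overlay.kml"


def _local(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def read_coordinates(zip_path):
    """Return the text of the first <coordinates> element, or None."""
    with zipfile.ZipFile(zip_path) as zf:
        # Match on the suffix instead of rebuilding the full member name.
        members = [n for n in zf.namelist() if n.endswith(KML_SUFFIX)]
        if not members:
            return None
        # The member is read straight from the archive; nothing is
        # extracted to disk, so a read-only archive is fine.
        with zf.open(members[0]) as fh:
            root = ET.parse(fh).getroot()
    for elem in root.iter():
        if _local(elem.tag) == "coordinates" and elem.text:
            # Collapse line breaks and runs of spaces inside the element.
            return " ".join(elem.text.split())
    return None


def scan(top):
    """Yield (zip file name, coordinates) for every .zip found below top."""
    for zip_path in sorted(Path(top).rglob("*.zip")):
        try:
            coords = read_coordinates(zip_path)
        except (zipfile.BadZipFile, ET.ParseError) as err:
            # Report unreadable archives or malformed XML and carry on.
            print(f"skipping {zip_path}: {err}", file=sys.stderr)
            continue
        if coords is not None:
            yield zip_path.name, coords
```

Looping over scan("2015/07") and printing each pair would cover every day folder of that month in one pass; pointing it at the parent of the year folders would cover 2015 and 2016, provided the day folders sit somewhere below it.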

cas
  • 78,579
  • Thank you @cas. I was wondering whether python might be a better option so thanks for the confirmation. – squar_o Jun 08 '16 at 08:37