Using convolutional neural networks, part 1

Strategies for exporting TFRecords from Google Earth Engine, with an example for mapping clouds, cloud shadow, water, ice and land.

The more conventional machine learning algorithms can be found under ee.Classifier and require point data as input.

Example of point data collection used for a random forest classifier in the Google Earth Engine. See the example in this post.

Convolutional neural networks, in contrast, require image patches. Each patch should contain both the image bands and the corresponding labels.

The Landsat image and the labels for clouds, shadow and water.

We used the SPARCS dataset as an example here. The data can be found on the website below.

Use the code below to import the data into Google Earth Engine, or click here.

/*Copyright (c) 2021 SERVIR-Mekong
  
Permission is hereby granted, free of charge, to any person obtaining a copy
of the data and associated documentation files, to deal in the data
without restriction, including without limitation the rights to use, copy, modify,
merge, publish, distribute, sublicense, and/or sell copies, and to permit persons
to whom the data is furnished to do so, subject to the following conditions:
  
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
  
THE DATA IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.*/

var sparcs = ee.ImageCollection('projects/gmap/datasets/manual_qaMasks/sparcs_masks');

var sparcIm = ee.Image(sparcs.toList(100).get(1));

var l8 = ee.ImageCollection("LANDSAT/LC08/C01/T1_RT_TOA").filterBounds(sparcIm.geometry());
var l8Image = l8.filterDate(sparcIm.date().advance(-6,'second'),sparcIm.date().advance(6,'second'));

var img = ee.Image(l8Image.first()).select(['B1','B2', 'B3', 'B4', 'B5', 'B6', 'B7','B8','B9']);

Map.centerObject(img);
Map.addLayer(ee.Image(img).clip(sparcIm.geometry()),{min:0,max:0.3000,bands:"B4,B3,B2"},"landsat");

Map.addLayer(sparcs.select("b1"),{min:0,max:1},'sparc clouds');
Map.addLayer(sparcs.select("b2"),{min:0,max:1},'sparc shadow');
Map.addLayer(sparcs.select("b3"),{min:0,max:1},'sparc ice');
Map.addLayer(sparcs.select("b4"),{min:0,max:1},'sparc water');

You will need a large number of training samples to train a model successfully; we recommend using at least around 100,000. With a limited number of patches you can apply different strategies to increase the number of samples.

It is recommended to create multiple training records from every patch around the feature of interest. This will (1) increase the number of samples and (2) train the model to recognize the feature from different angles.
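A complementary way to multiply a limited set of patches is geometric augmentation. The sketch below is a hypothetical addition (it is not part of the export scripts in this post, and `augment` is an illustrative helper name): it generates the eight symmetries of an (image, label) patch with NumPy, so each exported patch yields eight training samples.

```python
import numpy as np

def augment(patch):
    """Return the 8 symmetries (4 rotations, each optionally mirrored) of an (H, W, C) patch."""
    variants = []
    for k in range(4):
        rotated = np.rot90(patch, k)               # rotate by k * 90 degrees
        variants.append(rotated)
        variants.append(np.flip(rotated, axis=1))  # mirrored version
    return variants

# e.g. a 256 x 256 patch with 6 image bands + 5 label bands stacked together
patch = np.zeros((256, 256, 11))
variants = augment(patch)  # 8 patches from one
```

Because the labels are stacked with the image bands in the same array, image and mask stay aligned under every rotation and flip.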

By placing random points on the image patch we increase the number of training samples and also get sufficient overlap between the training patches. For the SPARCS dataset with 80 patches we could sample 8,000 training patches using 100 samples per patch. See the example here:

var sparcs = ee.ImageCollection('projects/gmap/datasets/manual_qaMasks/sparcs_masks');

var sparcIm = ee.Image(sparcs.toList(100).get(1));

var l8 = ee.ImageCollection("LANDSAT/LC08/C01/T1_RT_TOA").filterBounds(sparcIm.geometry());
var l8Image = l8.filterDate(sparcIm.date().advance(-6,'second'),sparcIm.date().advance(6,'second'));

var img = ee.Image(l8Image.first()).select(['B1','B2', 'B3', 'B4', 'B5', 'B6', 'B7','B8','B9']);

// create a negative buffer for the geometry
var geom = sparcIm.geometry().buffer(-4000);
// create 50 random samples within the buffered geometry
var rand = ee.FeatureCollection.randomPoints(geom, 50,1);

// create square patches for visualization
// a buffer radius of 128 (pixels) x 30 (meters) gives a square training patch of 256 x 256 pixels.
var squares = rand.map(function(feat){
  return feat.geometry().buffer(128*30).bounds();
  });


Map.centerObject(img);
Map.addLayer(ee.Image(img).clip(sparcIm.geometry()),{min:0,max:0.3000,bands:"B4,B3,B2"},"landsat");
Map.addLayer(squares,{},"samples");
Map.addLayer(sparcs.select("b1"),{min:0,max:1},'sparc clouds',false);
Map.addLayer(sparcs.select("b2"),{min:0,max:1},'sparc shadow',false);
Map.addLayer(sparcs.select("b3"),{min:0,max:1},'sparc ice',false);
Map.addLayer(sparcs.select("b4"),{min:0,max:1},'sparc water',false);

Typically we want 70% of the data to be used for training, 20% for testing and 10% for validation. Below you can find an example in Python to batch export the samples. We prefer the Python API for this purpose: only a limited number of points can be exported per task, and Python lets us automate the process.
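In the export script below, the 70/20/10 split is implemented by drawing 7 training, 2 testing and 1 validation point per patch. A quick sketch of that bookkeeping, with the numbers taken from the script:

```python
# 80 SPARCS patches, 10 random points per patch, split 7/2/1
N_PATCHES = 80
POINTS_PER_PATCH = {'train': 7, 'test': 2, 'val': 1}

# total exported patches per split
totals = {split: n * N_PATCHES for split, n in POINTS_PER_PATCH.items()}

# fraction of all samples per split (70% / 20% / 10%)
shares = {split: n / 10 for split, n in POINTS_PER_PATCH.items()}
```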


import ee
from time import sleep
import math
import numpy as np
import random
from numpy.random import seed
from numpy.random import rand

ee.Initialize()

# Produces a kernel of a given size for sampling in GEE
def get_kernel(kernel_size):
    eelist = ee.List.repeat(1, kernel_size)
    lists = ee.List.repeat(eelist, kernel_size)
    kernel = ee.Kernel.fixed(kernel_size, kernel_size, lists)
    return kernel

# import the label image collection
sparcs = ee.ImageCollection('projects/gmap/datasets/manual_qaMasks/sparcs_masks')

# Define kernel size 
kernel_size = 256
image_kernel = get_kernel(kernel_size)

# Specify inputs (Landsat bands) to the model and the response variable.
opticalBands = ['B2', 'B3', 'B4', 'B5', 'B6', 'B7']

BANDS = opticalBands
RESPONSE = ['cloud','shadow','snow','water','land']
FEATURES = BANDS + RESPONSE

outputBucket = "myoutputbucket"
folder = "myoutputfolder"

for i in range(80):
    
    # get the SPARCS label image and rename its bands
    sparcIm = ee.Image(sparcs.toList(100).get(i)).select(['b1','b2','b3','b4','b5'],RESPONSE)
    
    # apply a negative buffer (at least 128 pixels x 30 meters) so sampled patches fall inside the image
    geom = sparcIm.geometry().buffer(-4000)

    # create training, testing and validation points; the multipliers 31 and 17 vary the random seed per split
    pointsTrain = ee.FeatureCollection.randomPoints(geom, 7,i)
    pointsTest = ee.FeatureCollection.randomPoints(geom, 2,i*31)
    pointsVal = ee.FeatureCollection.randomPoints(geom, 1,i*17)
    
    # get the Landsat 8 scene matching the SPARCS image
    l8 = ee.ImageCollection("LANDSAT/LC08/C01/T1_RT_TOA").filterBounds(sparcIm.geometry())
    l8Image = ee.Image(l8.filterDate(sparcIm.date().advance(-3,'second'),sparcIm.date().advance(3,'second')).first())
    
    # select the relevant bands
    img = l8Image.select(opticalBands)
    
    # combine the image with the sparc image
    image = img.addBands(sparcIm).unmask(0,False)
    
    # turn each pixel into an array of its 256 x 256 neighborhood
    neighborhood = image.neighborhoodToArray(image_kernel)

    # sample the training, testing and validation patches
    trainingDataTrain = neighborhood.sample(region=pointsTrain, scale=30, tileScale=16, geometries=True)
    trainingDataTest = neighborhood.sample(region=pointsTest, scale=30, tileScale=16, geometries=True)
    trainingDataVal = neighborhood.sample(region=pointsVal, scale=30, tileScale=16, geometries=True)

    # set the output file names
    trainFilePrefix = "/training/train_" + str(i).zfill(4)
    valFilePrefix = "/validation/valid" + str(i).zfill(4)
    testFilePrefix = "/testing/test" + str(i).zfill(4)
    
    # set up the export task for training
    trainingTaskTrain = ee.batch.Export.table.toCloudStorage(collection=ee.FeatureCollection(trainingDataTrain),
                                                             description="trainpatch" + str(i),
                                                             fileNamePrefix=folder + trainFilePrefix,
                                                             bucket=outputBucket,
                                                             fileFormat='TFRecord',
                                                             selectors=FEATURES)

    # set up the export task for testing
    trainingTaskTest = ee.batch.Export.table.toCloudStorage(collection=ee.FeatureCollection(trainingDataTest),
                                                            description="testpatch" + str(i),
                                                            fileNamePrefix=folder + testFilePrefix,
                                                            bucket=outputBucket,
                                                            fileFormat='TFRecord',
                                                            selectors=FEATURES)

    # set up the export task for validation
    trainingTaskVal = ee.batch.Export.table.toCloudStorage(collection=ee.FeatureCollection(trainingDataVal),
                                                           description="valpatch" + str(i),
                                                           fileNamePrefix=folder + valFilePrefix,
                                                           bucket=outputBucket,
                                                           fileFormat='TFRecord',
                                                           selectors=FEATURES)

    # execute the tasks
    trainingTaskTrain.start()
    trainingTaskTest.start()
    trainingTaskVal.start()
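Once the exports finish, the TFRecords can be read back for training, for example with TensorFlow's tf.data API. The sketch below assumes TensorFlow 2.x; the file path and compression in the final comment are illustrative, and `FEATURE_SPEC` and `parse_example` are names introduced here. It mirrors the band names and the 256-pixel kernel from the export script above: every band was exported as a fixed-size 256 x 256 float array.

```python
import tensorflow as tf

KERNEL_SIZE = 256
BANDS = ['B2', 'B3', 'B4', 'B5', 'B6', 'B7']
RESPONSE = ['cloud', 'shadow', 'snow', 'water', 'land']
FEATURES = BANDS + RESPONSE

# each exported feature is a fixed-length KERNEL_SIZE x KERNEL_SIZE float array
FEATURE_SPEC = {
    name: tf.io.FixedLenFeature([KERNEL_SIZE, KERNEL_SIZE], tf.float32)
    for name in FEATURES
}

def parse_example(serialized):
    """Parse one serialized tf.train.Example into (inputs, labels) tensors."""
    parsed = tf.io.parse_single_example(serialized, FEATURE_SPEC)
    # stack bands along the last axis: inputs (256, 256, 6), labels (256, 256, 5)
    inputs = tf.stack([parsed[b] for b in BANDS], axis=-1)
    labels = tf.stack([parsed[r] for r in RESPONSE], axis=-1)
    return inputs, labels

# a dataset could then be built along these lines (path is illustrative):
# ds = tf.data.TFRecordDataset('gs://myoutputbucket/myoutputfolder/training/train_0000.tfrecord.gz',
#                              compression_type='GZIP').map(parse_example)
```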
