Spatial Interpolation – IDW
In order to perform IDW, many parameters need to be set. Most importantly, these are the “Neighbor” and “Power” settings. The neighbor parameters defines how many neighboring points will taken into account for each sector while the power setting defines the strength of the influence that known values have on unknown values based on their distance. For the power parameter, this means that weights will be assigned proportionally inverse to the distance taken to the specific power (*).
Since these parameters have to be set on a case by case basis, the suitability and thus the validity of the parameters need to be established. For this purpose, ArcGIS provides the cross-validation method in the Geospatial Wizard in the Inverse Distance Weighting section.
This method works by removing one of the points from the dataset and consequently interpolating the value for this point as if it was unknown. Repeating this for every single datapoint results in the mean error of the predictions. Graphing the measured value for each point against the predicted values results in a graph where under perfect conditions, the points should be on a perfect line. Since some errors are certainly made, the actual points are not along that line but a bit off. Creating a regression line, the error can be made visible by comparing the deviation of the regression line with the perfect 1:1 line.
Practical application of IDW
In this exercise, a point layer with temperature measurements spanning most of Europe as well as some neighboring regions is given. Based on a choice of AOI, the temperatures is interpolated for the whole AOI using the IDW method in ArcGIS Pro.
As the area of interest, the Fenoscandian nations of Sweden, Norway and Finland were chosen. Mainly because of their large north-south extent as well as the climate differences near the Atlantic cost compared to the near-continental climate of eastern Finland, quite large differences in temperature are to be expected. Additionally, in some areas the measurements are quite sparse, which might lead to the clearer manifestation of differences in IDW parameter settings.
Additionally, a 25 km buffer was drawn around the borders of the nations, since many measurement points are just outside of the vector boundary on small islands. Of course the IDW implementation in ArcGIS Pro creates the interpolation values and then clips the interpolated area to the boundary, but it is certainly of interest to enlarge the AOI a few km into the sea in order to visualize how the algorithm deals with the linear placement of the measurement points.
Based on the points and the boundary layer, the IDW was performed multiple times with changing parameters. The interpolation was performed 4 times with varying Neighbor and 4 times with varying Power parameters:
For the visualization, a single color ramp made out of shades of green was chosen. Multiple colors make it harder to visualize the continuety, since putting all colors on a scale of warmer to colder does not arrive naturally to every user. These shades of green seemed to be easily distinguishable as well as easing the comprehension that darker equals wormer. The classes were defined in 3° steps starting from 0 up to 15, which is just above the maximum value.
import numpy as np #import pandas as pd import matplotlib.pyplot as plt from PIL import Image Image.MAX_IMAGE_PIXELS = None paths=[ "tiffs/IDW_Neighbors_12_Power_0,5.tif", "tiffs/IDW_Neighbors_12_Power_2.tif", "tiffs/IDW_Neighbors_12_Power_3.tif", "tiffs/IDW_Neighbors_14_Power_2.tif", "tiffs/IDW-Neighbors_12_Power_1.tif", "tiffs/IDW-Neighbors_16_Power_2.tif", "tiffs/IDW-Neighbors_18_Power_2.tif"] def extract_title(path): name = path[6:-4] name = name.replace("_"," ") return(name) def to_np(path): im = Image.open(path) im_array = np.array(im) im_array = im_array[~np.isnan(im_array)] im_array[im_array == 0] = np.nan return im_array def graph(im_array,name): """ set bins to either same bins as in Arc or per degree """ bins = [0.0001,2,3,4,5,6,7,8,9,10,11,12,13,14,15] #bins = [0.0001,3,6,9,12,15] bins = np.arange(0,16,0.2) # draw histogram n, bins, patches = plt.hist(im_array[~np.isnan(im_array)], bins=bins,color="red", alpha=0.75, rwidth=0.85) #set y grid plt.grid(axis='y', alpha=0.75) # calculate mean im_mean = im_array[~np.isnan(im_array)].mean() # add mean line plt.axvline(im_mean, color='k', linestyle='dashed', linewidth=1) # Set ticks, labels and title plt.xlabel('Temperature (deg. C)') #plt.ylabel('Frequency') plt.ylim(top=2700000) plt.title('Histogram of '+name+"\nMean: "+str(round(im_mean,4))) plt.xticks([0,3,6,9,12,15]) #set output name to sth computer readable and save output_name = name.replace(" ","_")+"_1degree_bins.png" plt.savefig(output_name,dpi=200) print("saved: ",output_name) # clear figure after saving plt.clf() plt.close("all") for path in paths: title = extract_title(path) graph(to_np(path),title)
Regarding the Neighbors parameter, the differences between 12, 14, 16 and 18 are not very big. Apparently, extending the “radius” to more neighboring points does not change the resulting interpolated map on this scale. This is also confirmed by inspecting the histograms, which show very similar mean values and distributions.
In this area of interest, changing the Power parameter on the other hand has a high influence the interpolation result. With a lower value, the areas exhibit a larger clustering and a larger spatial auto-correlation than with higher values (to test that claim, the Moran’s I value will also be calculated at the end of the exercise). Since raising the power value strengthens influence of spatially nearer known points, the map becomes much more fragmented. To visualize this phenomenon better, the Geospatial Wizard was used to quickly visualize the interpolation areas with extremely high and extremely low power.
IDW quality assessment of AOI
Now it is time to look at the quality assessment for our Area of Interest. Using the Geospatial Wizard and the above-described method, we look at the mean averages for our most extreme cases for the power parameter, since that shows the biggest difference.
On closer inspection of the quality assesment methods, it is questionable how valid this method is. The regression line is closer to the expected line for the power 1 setting, but the mean error worse. This might be because both “upwards” and “downwards” errors are taken to calculate the mean value. Therefore, a higher standard deviation equally distributed both upwards and downwards might be evened out and lower error rate shown than is actually warranted.
Moran’s I applied to IDW results
Trying to verify the previous assumption that a higher power value more sharply fragments the result, the Moran’s I spatial auto-correlation values were calculated for the power parameters 1 and 20. Since I previously worked with this method and already created the code a while ago, it seems interesting to try it out on this dataset. Unfortunately, I was not able to draw any conclusions from this method. It returns a value very close to +1 for every tiff I throw at it, even ones where I would say the spatial autocorrelation should be lower, more towards 0 and thus “random” distribution. I would be happy to hear some feedback on the method, which is described in detail below.
Firstly, the raster images were reclassified.
The Moran’s I value is calculated with the following code, which also creates the charts:
import pysal from skimage.io import imread from libpysal.weights import lat2W import numpy as np from esda.moran import Moran from skimage.color import rgb2gray from splot.esda import moran_scatterplot import matplotlib.pyplot as plt """Loading images as numpy arays, removin nan values""" im_array1 = imread(r'data/Reclass_P0_5.tif') im_array1 = np.nan_to_num(im_array1) im_array2 = imread(r'data/Reclass_P20.tif') im_array2 = np.nan_to_num(im_array2) print("image/s loaded") def Morans_I(data,out_name): data_gray = data col,row = data_gray.shape[:2] WeightMatrix = lat2W(row,col) #WeightMatrix = np.nan_to_num(WeightMatrix) WeightMatrix = lat2W(data_gray.shape,data_gray.shape) MoranM= Moran(data_gray,WeightMatrix) fig, ax = moran_scatterplot(MoranM, aspect_equal=True) print(MoranM) print("Raster Dimensions:\t" + str(data_gray.shape)) print("Moran's I Value:\t" +str(round(MoranM.I,4))) plt.savefig(out_name,dpi=200) plt.show() Morans_I(im_array1,"MI_P0_5.png") Morans_I(im_array2,"MI_P20.png")
The graphs and values returned from the method are as follows:
I am quite sure that the implementation of the Moran’s I algorithm works properly, as seen in my GitHub repository which runs the caculation on some sample images:
What needs to be changed in order for this to work? Or do I have a misunderstanding about how the Moran’s I works?
My ideas are:
- playing around with the weight matrix
- If big areas of confirmity with all their pixels within are counted as high positive spatial autocorreclation, then the next steps would be:
- increase the number of classes during reclassification in order change how many neighboring pixels are on average of a different class
- changing the algorithm to vector-based analysis to eliminate the pixel-based problem all together
I’m looking forward to your feedback!