We have covered the theoretical foundations of computational intensity in a spatially explicit way, and we have introduced the spatial computational domain. Let's use a couple of applications to make sense of the theoretical foundations we've just learned. The first analytic is typically referred to as inverse distance weighted (IDW) interpolation. The equation is relatively straightforward. The basic idea, without going into the details of the notation, is that for spatial interpolation in this formulation, we involve a number of neighbors to estimate an unknown value at a particular location. The further away a location that already has a measurement is, the less that location contributes to the unknown location for which we need to estimate the value. For instance, suppose we want to estimate the temperature at a particular location, but we don't have a measurement there; we need to look at surrounding locations that already have temperature measurements. The closer a location with a known temperature is to my location, the larger the contribution its known temperature value makes to my estimate. This is intuitive because locations closer to mine tend to be more similar in the values we care about, in this example temperature. There are many such correlations across space; for instance, you may want to estimate crime rates or other social indicators beyond physical properties. Going back to the same point pattern dataset we used to cover the theoretical foundations, shown on the left side of this slide, we want to create the spatial computational domain in the middle, which has a coarse resolution compared to the original spatial domain that holds the point pattern data.
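The idea of distance-decayed contributions can be sketched in a few lines of Python. This is a minimal illustration rather than the exact formulation on the slide; the `power` exponent and the station coordinates and values are assumptions for the example.

```python
import math

def idw(known, x, y, power=2):
    """Inverse distance weighted estimate at (x, y).

    known: list of (xi, yi, value) tuples with measured values.
    power: distance-decay exponent; 2 is a common choice.
    """
    num, den = 0.0, 0.0
    for xi, yi, v in known:
        d = math.hypot(x - xi, y - yi)
        if d == 0:
            return v  # we are exactly on a sample point
        w = 1.0 / d ** power  # closer points contribute more
        num += w * v
        den += w
    return num / den

# Hypothetical temperatures measured at three stations
stations = [(0, 0, 10.0), (4, 0, 20.0), (0, 4, 30.0)]
print(idw(stations, 1, 1))  # dominated by the nearby 10-degree station
```

Note that the estimate always stays within the range of the known values, since it is a weighted average of them.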
Now, obviously, the goal of spatial interpolation is to create a smooth surface that has values everywhere across your spatial domain. On the right side of this slide, the visualization is the end result of the spatial interpolation. In the middle, the spatial computational domain is a hidden domain; it normally does not become evident in the process of your geospatial analytics. It is a virtual domain created to facilitate and guide the divide-and-conquer process. Let's see how we establish this spatial computational domain and how we use it to estimate computational intensity. Here's an example of deriving a spatial computational domain from the same simple point pattern dataset on the left of the slide. In the middle, we have a 4 by 4 spatial computational domain at a fairly coarse resolution, imposed on the original spatial domain on the left. Remember, this is a virtual domain: it does not necessarily exist in the natural world; it exists in our computational world. For the spatial interpolation we want to conduct, a particular cell of the spatial computational domain, the blue cell, covers a 4 by 4 patch of the spatial domain, meaning the spatial domain the blue box includes has a 4 by 4 resolution. If you look at that blue box, four points are already included in it, meaning those four points have measurement values. For spatial interpolation to work, we want to estimate the remaining 12 points in that blue box. The blue box is one cell of the spatial computational domain. Why are we doing this? To remind you of the need for divide and conquer: each of these boxes, including the blue box, currently has a 3 by 3 kernel highlighted in green.
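Constructing the computational domain amounts to binning the point pattern into coarse cells and counting the known points per cell. Here is a minimal sketch, assuming a hypothetical 16 by 16 spatial domain with 40 random points; the numbers are illustrative, not the dataset on the slide.

```python
import numpy as np

# Hypothetical point pattern in a 16 x 16 spatial domain (unit cells).
rng = np.random.default_rng(0)
points = rng.uniform(0, 16, size=(40, 2))

# Impose a 4 x 4 spatial computational domain on top of it:
# each computational cell covers a 4 x 4 patch of the spatial domain.
counts = np.zeros((4, 4), dtype=int)
for x, y in points:
    counts[int(y // 4), int(x // 4)] += 1

print(counts)        # known points per computational-domain cell
print(counts.sum())  # every point falls in exactly one cell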
These individual boxes are the basic units of work: each box can be allocated to a potentially different computing resource (or to the same one), and they serve as the basic building blocks of computing tasks. That is the primary purpose of estimating computational intensity based on this spatial computational domain. We have some simple math at the bottom of this slide. The first equation estimates the density of points, which is pretty straightforward. For this 3 by 3 kernel, the covered area contains 16 times 9 individual spatial domain cells, since each computational-domain cell covers 4 by 4 = 16 spatial cells. Remember, we are switching back and forth between the spatial domain and the spatial computational domain, which is used for estimating computational intensity. The point density is the number of points that already have values divided by the total number of spatial domain cells in the kernel. That is the point density we have derived. The second equation calculates the computational intensity. The idea is that the number of spatial domain cells that still need to be estimated through spatial interpolation represents the cost of computing, because each of those cells needs a value to be estimated. This count is divided by the square root of the number of sampling points plus one, multiplied by the point density plus a threshold. The simple idea is that computational intensity is inversely related to the number of known values: the more points that already have measurements, and the higher the point density, the lower the computational intensity.
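Based on this verbal description, the per-kernel calculation can be sketched roughly as follows. The exact placement of the square root and the value of the threshold `tau` are assumptions on my part; treat this as an illustration of the inverse relationship rather than the slide's exact formula.

```python
import math

def kernel_intensity(n_known, cells_total, tau=0.01):
    """Rough computational-intensity estimate for one 3x3 kernel.

    n_known: sampling points with measured values inside the kernel
    cells_total: spatial-domain cells covered by the kernel (16 * 9 here)
    tau: small threshold keeping the denominator positive (assumed value)
    """
    density = n_known / cells_total     # point density within the kernel
    n_estimate = cells_total - n_known  # cells still to be interpolated
    return n_estimate / math.sqrt((n_known + 1) * (density + tau))

# More known points -> lower estimated computational intensity
sparse = kernel_intensity(n_known=4, cells_total=144)
dense = kernel_intensity(n_known=40, cells_total=144)
print(sparse, dense)
assert dense < sparse
```

The assertion at the end captures the key qualitative property the lecture emphasizes: higher point density implies lower computational intensity.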
This makes sense because if your point density is high, you already have a lot of existing measurement values, and the remaining points can be calculated from those existing points at relatively low cost. That's why there is a negative relationship between computational intensity and both the point density and the number of values already available: the more values are available, the fewer remaining values we need to estimate through interpolation. That's the relationship we are establishing here for estimating computational intensity. Again, the detailed notation you see on the slide is not complicated, but the basic principle we learn is that the spatial distribution plays a role in the estimate of computational intensity. Here, the exact relationship is that computational intensity at the level of an individual computational-domain cell is inversely proportional to the point density. Now, if you go back to the previous slide and apply this 3 by 3 kernel across your entire spatial computational domain, you can derive values for each of the cells in the spatial computational domain. Through this process, you know which parts of the entire spatial computational domain are more computationally intensive and which parts are less so. Then, in an aggregated fashion, you also know the overall cost of your analytics, termed computational intensity, and how you would break your entire problem into smaller parts based on this distribution in your spatial computational domain. That serves as your guidance for divide and conquer. I also want to share another example that serves another purpose for learning about the spatial computational domain. You might be familiar with viewshed analysis, which is a popular GIS analytic.
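Sliding the kernel across the whole computational domain produces a per-cell intensity map. The sketch below reuses the intensity calculation described above (with my assumed placement of the square root and threshold) and truncates the kernel at the edges, which is also an assumption; the counts array is hypothetical.

```python
import numpy as np

def intensity_map(counts, cells_per_cc=16, tau=0.01):
    """Slide a 3x3 kernel over the spatial computational domain.

    counts: 2-D array of known-point counts per computational cell.
    cells_per_cc: spatial-domain cells inside each computational cell.
    Returns a same-shaped array of rough intensity estimates
    (edge cells use a truncated kernel).
    """
    rows, cols = counts.shape
    ci = np.zeros_like(counts, dtype=float)
    for i in range(rows):
        for j in range(cols):
            window = counts[max(i - 1, 0):i + 2, max(j - 1, 0):j + 2]
            n_known = int(window.sum())
            total = window.size * cells_per_cc
            density = n_known / total
            ci[i, j] = (total - n_known) / np.sqrt((n_known + 1) * (density + tau))
    return ci

counts = np.array([[4, 0, 0, 1],
                   [0, 2, 1, 0],
                   [6, 3, 0, 0],
                   [0, 0, 0, 2]])
ci = intensity_map(counts)
print(ci.round(1))  # higher intensity where fewer points are known
```

Cells surrounded by many known points come out with low intensity; sparsely sampled corners come out high, which is exactly the distribution used to guide decomposition.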
The idea is that if you are situated on top of a terrain, you want to look around and figure out which parts you are able to see. If your view is blocked because some part of the topography stands too high, you cannot see what lies behind that part of the terrain. This analytic is widely used in a number of applications, for instance for planning purposes and even in some military applications: a missile flying through complex terrain always needs to know what is visible as it moves through the topographic environment, and that calculation needs to be done very quickly to guide the missile to where it needs to go. In this case, I want to highlight the importance of the granularity of the spatial computational domain. As you learned from the previous application, we used a 4 by 4 resolution for the spatial computational domain in the inverse distance weighted interpolation example. Here, there are two competing guidelines for determining the granularity of the spatial computational domain. Let me read out the first one because it's important: the granularity of the spatial computational domain needs to be sufficiently coarse to ensure that deriving and decomposing the spatial computational domain is not itself computationally expensive. Say you have large geospatial data to analyze. If you break your dataset into very many small parts, you can imagine that the overhead of coordinating that many parts could be significant. We need to be mindful of the overhead of dividing and conquering large geospatial analytics problems: as the parts become too numerous, the work involved in managing all of them grows.
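The core of viewshed analysis is a line-of-sight test: a cell is visible if the sight line to it clears everything in between. Here is a minimal sketch on a 1-D elevation profile (real viewsheds trace many such rays across a 2-D terrain); the observer height and the profile values are assumptions for the example.

```python
def visible_along_profile(elev, observer_height=1.8):
    """Line-of-sight visibility along a 1-D elevation profile.

    elev: elevations at unit spacing; the observer stands at index 0.
    Returns a list of booleans: can the observer see each cell?
    A cell is visible if its sight-line slope from the eye is at least
    the maximum slope encountered so far (nothing in between blocks it).
    """
    eye = elev[0] + observer_height
    visible = [True]              # the observer's own cell
    max_slope = float("-inf")
    for i in range(1, len(elev)):
        slope = (elev[i] - eye) / i  # slope from the eye to cell i
        visible.append(slope >= max_slope)
        max_slope = max(max_slope, slope)
    return visible

# The peak at index 3 hides the lower ground behind it, and even the
# distant high point at index 6 falls below the peak's sight line.
profile = [10, 10, 11, 20, 12, 11, 25]
print(visible_along_profile(profile))
```

Running this over one ray is cheap; the computational intensity of a full viewshed comes from repeating it for every cell of a large terrain, which is why spatial characteristics such as peaks and pits matter for performance.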
At the same time, you also want to work with parts small enough that each one fits into an individual computing resource to achieve optimal performance at the level of individual parts. So the second guideline is: the granularity of the spatial computational domain needs to be sufficiently fine to allow domain decomposition to produce a large number of subdomains that can be executed concurrently, improving computational performance and efficiency. The experiment conducted here looks at different sizes of datasets, but more importantly at the characteristics of the spatial domain that bear on how we divide and conquer and how we estimate computational intensity based on the spatial computational domain. Different dataset sizes matter, but the granularity considerations we just covered are also important for conducting viewshed analysis. On the left side is a coarse spatial computational domain, and on the right side is a finer-scale spatial computational domain. In fact, for viewshed analysis you can configure multiple scales of the spatial computational domain: it does not have to have a uniform granularity. In this case, we used the GPU (graphics processing unit) as a technology to support high-performance viewshed analysis. The GPU works together with the CPU: tasks done at the CPU level can work at the coarse spatial computational domain level, while tasks done at the GPU level, because the GPU excels at exploiting massive data parallelism in geospatial data, can benefit from a much larger number of units in a finer-grained spatial computational domain. To illustrate this, some comparisons were done in the past between coarse-resolution GPU-based and sequential-computing-based viewshed analysis.
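One concrete way intensity estimates can guide the allocation of subdomains to computing resources is load balancing. This is not the lecture's own scheduling method; it is a sketch using the well-known longest-processing-time-first heuristic, with hypothetical cell identifiers and intensity values.

```python
import heapq

def balance(intensities, n_workers):
    """Greedily assign cells to workers, balancing estimated intensity.

    intensities: dict mapping cell id -> computational-intensity estimate.
    Largest cells are placed first, each onto the currently least-loaded
    worker (the classic LPT heuristic).
    """
    heap = [(0.0, w, []) for w in range(n_workers)]
    heapq.heapify(heap)
    for cell, ci in sorted(intensities.items(), key=lambda kv: -kv[1]):
        load, w, cells = heapq.heappop(heap)  # least-loaded worker
        cells.append(cell)
        heapq.heappush(heap, (load + ci, w, cells))
    return sorted(heap, key=lambda t: t[1])  # order by worker id

# Hypothetical per-cell intensity estimates
ci = {"A": 322.0, "B": 30.0, "C": 176.0, "D": 68.0, "E": 150.0}
for load, worker, cells in balance(ci, 2):
    print(worker, round(load, 1), cells)
```

The point of the sketch is that the schedule follows the estimated intensity distribution rather than the raw cell count, which is precisely why computational intensity, not just domain size, matters for divide and conquer.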
Also, remember we are considering different characteristics of the terrain, in this case the topographic aspects of the landscape. There are three distinct landscape features we are evaluating. Pit means there are lower parts, holes in your terrain. Flat means the terrain is relatively flat; here in the Champaign-Urbana area, in the middle of the US, we have a lot of flat land, which is normally easy to deal with from a viewshed point of view. Peak means high-elevation terrain. You can tell from this evaluation that, depending on the type of terrain feature, you get different computational performance. This is important to note because we see clear evidence that spatial characteristics can have major impacts on computational performance. Now, going back to the theoretical foundations, I made the comparison between computational intensity and computational complexity. In evaluating the computational complexity of an algorithm, we tend to ignore spatial characteristics such as the terrain features we are looking at here. For estimating computational intensity, in contrast, we very much focus on those spatial characteristics. In that sense, computational intensity provides the theoretical foundation for taking spatial characteristics into account. Remember, spatial characteristics are special, and we need to evaluate their impact on computational performance. Computational complexity does not take such spatial characteristics into account, and it is therefore limited in serving the purpose of guiding this divide-and-conquer process. These evaluations help us better understand the difference between a coarse, single-scale spatial computational domain and a multi-scale, including fine-scale, spatial computational domain.
You see the benefit of the fine-scale spatial computational domain across the board here, regardless of the spatial characteristics. Spatial characteristics continue to play a role in computational performance in all of the cases, though their impact is not significant across the three cases. But the fine-scale spatial computational domain overall performs better than the coarse-scale one, so you see the benefit in this viewshed analysis example. Keep in mind that when you look at different geospatial analytics cases, you need to determine the granularity of your spatial computational domain. At the same time, you have the option of working with a multi-scale spatial computational domain, for instance if you have computing resources such as GPUs that accommodate multi-scale representations for exploiting parallelism, both data parallelism and operation-specific parallelism, to achieve optimal computational performance. And of course, a major motivation for divide and conquer is to solve bigger and more complex problems that need to be divided and conquered. This concludes the applications part of this topic on theoretical foundations and future trends. The next part will cover the future trends of CyberGIS and geospatial data science.