Recently I have been learning about geospatial analysis, in which we often use software to compute geometric properties of polygons (e.g. their centroids).
For example, using the R programming language, I downloaded a geospatial file of Canada that is divided into many "polygons". I then attempted to calculate the centroid of each polygon and visualize the results.
Here is the code:
# Only these packages are actually used below
library(dplyr)    # provides the %>% pipe
library(sf)       # reading, transforming and measuring geometries
library(ggplot2)  # plotting with geom_sf()
# Download zip files
url_1 <- "https://www12.statcan.gc.ca/census-recensement/alternative_alternatif.cfm?l=eng&dispext=zip&teng=lada000b21a_e.zip&k=%20%20%20151162&loc=//www12.statcan.gc.ca/census-recensement/2021/geo/sip-pis/boundary-limites/files-fichiers/lada000b21a_e.zip"
download.file(url_1, destfile = "lada000b21a_e.zip")
# Extract zip files
unzip("lada000b21a_e.zip")
# Read shapefiles
ada <- st_read("lada000b21a_e.shp")
# Reproject to WGS 84 / UTM zone 17N (EPSG:32617) so geometry is computed in planar coordinates
shapefile_1 <- ada %>% st_transform(32617)
# Geometric centroids; these are the red points in the plot
sf_cent <- st_centroid(shapefile_1)
# Alternative that returns a point guaranteed to lie on each polygon:
# sf_cent <- st_point_on_surface(shapefile_1)
# Transform the centroids to the WGS84 CRS
sf_cent_geo <- st_transform(sf_cent, crs = 4326)
# Extract the longitude and latitude coordinates of the centroids
coords <- st_coordinates(sf_cent_geo)
lon <- coords[, 1]
lat <- coords[, 2]
ADAUID <- sf_cent_geo$ADAUID
# Plot the polygons in white and their centroids in red
ggplot() +
  geom_sf(data = shapefile_1, fill = 'white') +
  geom_sf(data = sf_cent, color = 'red')
The results look something like this:
As we can see, some of these polygons appear to contain more than one centroid (red point). This means that the centroids of some polygons must lie outside the polygons they belong to!
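Based on these two references (https://en.wikipedia.org/wiki/Shoelace_formula, https://en.wikipedia.org/wiki/Centroid), the centroid of a polygon with vertices $(x_0, y_0), \dots, (x_{n-1}, y_{n-1})$ listed in order around the boundary, with $(x_n, y_n) = (x_0, y_0)$, can be written in terms of the signed area $A$ of the polygon ($A$ is positive for counterclockwise vertex order, and its absolute value is the ordinary area):
$$A = \frac{1}{2} \sum_{i=0}^{n-1} (x_i y_{i+1} - x_{i+1} y_i)$$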
$$C_x = \frac{1}{6A} \sum_{i=0}^{n-1} ((x_i + x_{i+1}) (x_i y_{i+1} - x_{i+1} y_i))$$
$$C_y = \frac{1}{6A} \sum_{i=0}^{n-1} ((y_i + y_{i+1}) (x_i y_{i+1} - x_{i+1} y_i))$$
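To check my understanding of these formulas, here is a minimal base-R sketch that applies them to a small U-shaped polygon and then tests whether the resulting point lies inside it. The polygon and the helper names `polygon_centroid` and `point_in_polygon` are made up for illustration (they are not part of sf), the area is kept signed rather than wrapped in an absolute value, and the inside test is a simple ray-casting check that is only meant for points not lying exactly on the boundary.

```r
# Centroid of a simple polygon from its vertices (listed in order, not repeated at the end).
# Uses the signed shoelace area so either vertex orientation gives the same centroid.
polygon_centroid <- function(x, y) {
  x_next <- c(x[-1], x[1])          # x_{i+1}, wrapping back to the first vertex
  y_next <- c(y[-1], y[1])          # y_{i+1}, wrapping back to the first vertex
  cross  <- x * y_next - x_next * y # x_i * y_{i+1} - x_{i+1} * y_i
  area   <- sum(cross) / 2          # signed area A
  c(x    = sum((x + x_next) * cross) / (6 * area),
    y    = sum((y + y_next) * cross) / (6 * area),
    area = area)
}

# Ray-casting test: count how many edges a horizontal ray from (px, py) crosses.
# An odd count means the point is inside. Assumes the point is not on the boundary.
point_in_polygon <- function(px, py, x, y) {
  n <- length(x)
  inside <- FALSE
  j <- n
  for (i in seq_len(n)) {
    if ((y[i] > py) != (y[j] > py)) {
      x_hit <- x[i] + (py - y[i]) * (x[j] - x[i]) / (y[j] - y[i])
      if (px < x_hit) inside <- !inside
    }
    j <- i
  }
  inside
}

# Hypothetical U-shaped polygon: a 3 x 3 square with a notch cut out of the top
# (the notch covers 1 < x < 2 and 1 < y < 3). Vertices are counterclockwise.
ux <- c(0, 3, 3, 2, 2, 1, 1, 0)
uy <- c(0, 0, 3, 3, 1, 1, 3, 3)

cen <- polygon_centroid(ux, uy)
print(cen)
print(point_in_polygon(cen[["x"]], cen[["y"]], ux, uy))
```

For this shape the formulas give $A = 7$ and $(C_x, C_y) = (1.5, 9.5/7) \approx (1.5, 1.357)$, which falls inside the notch and therefore outside the polygon, so at least for a non-convex shape like this one the centroid does not have to be inside.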
My Question: Is it possible to prove mathematically that the point $(C_x, C_y)$ can sometimes lie outside the polygon it belongs to? If not, how else can the centroid of a polygon end up outside the polygon itself?
Thanks!



