
I am working on a fictional dataset with 25 features. Two of the features are the latitude and longitude of a place, and the others are pH values, elevation, windSpeed, etc., with varying ranges. I can normalize the other features, but how should I approach the latitude/longitude features?

Edit: This is a problem of predicting agricultural yield. I would think lat/long is very important, since location can be vital for the prediction; hence the dilemma.

AllThingsScience

1 Answer


Lat/long coordinates have a problem: they are 2 features that describe a position on a sphere, which really lives in three-dimensional space. Longitude wraps all the way around, so its two most extreme values are actually very close together (-180° and +180° are the same meridian). I've dealt with this problem a few times, and what I do in this case is map them to x, y and z coordinates. That way, points that are close in these 3 dimensions are also close in reality. Depending on the use case, you can disregard changes in height and map the points onto a perfect sphere. These features can then be standardized properly.
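The wrap-around issue described above can be seen numerically. Below is a minimal Python sketch (the function names are hypothetical, chosen for illustration) comparing the raw longitude difference a model would see with the true angular separation, for two points just either side of the 180° meridian:

```python
def raw_lon_gap(lon_a, lon_b):
    """Absolute difference of raw longitude values in degrees, as a model sees them."""
    return abs(lon_a - lon_b)


def wrapped_lon_gap(lon_a, lon_b):
    """Smallest angular difference between two longitudes, in degrees (0..180)."""
    d = abs(lon_a - lon_b) % 360.0
    return min(d, 360.0 - d)


# Two points on either side of the antimeridian (the 180° meridian).
print(raw_lon_gap(179.9, -179.9))      # about 359.8: looks maximally far apart
print(wrapped_lon_gap(179.9, -179.9))  # about 0.2: actually neighbours
```

Any distance-based or scale-sensitive model working on raw longitude inherits the first, misleading number, which is what the x, y, z mapping avoids.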

To clarify (summarised from the comments):

x = cos(lat) * cos(lon)
y = cos(lat) * sin(lon)
z = sin(lat)

(convert lat and lon from degrees to radians first if your trig functions expect radians)
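A minimal Python/NumPy sketch of the conversion above, followed by a plain z-score standardization of all features. It assumes lat/lon arrive in degrees and ignores height (unit sphere); the helper names `latlon_to_xyz` and `standardize`, and the sample values, are hypothetical and not from the original dataset:

```python
import numpy as np


def latlon_to_xyz(lat_deg, lon_deg):
    """Map latitude/longitude in degrees to x, y, z on a unit sphere (height ignored)."""
    lat = np.radians(lat_deg)  # NumPy trig functions expect radians
    lon = np.radians(lon_deg)
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)
    return np.column_stack([x, y, z])


def standardize(features):
    """Z-score each column: subtract the column mean, divide by the column std."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns at 0 instead of dividing by zero
    return (features - mean) / std


# Hypothetical sample: lat, lon, plus two other features (pH, elevation in m).
lat = np.array([10.0, -35.5, 60.2, 0.0])
lon = np.array([179.9, -179.9, 25.0, 0.0])
other = np.array([[6.5, 120.0], [7.1, 45.0], [5.8, 300.0], [6.9, 10.0]])

xyz = latlon_to_xyz(lat, lon)
X = standardize(np.hstack([xyz, other]))  # 3 position columns + 2 other columns
print(X.shape)  # (4, 5)
```

Longitudes of -180 and +180 now produce the same (x, y, z) point, so the wrap-around disappears. In a real pipeline the mean and standard deviation should be computed on the training split only and reused for validation/test data; scikit-learn's `StandardScaler` does the same z-scoring if you prefer a library.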
rll
Jan van der Vegt