I'm looking for the fastest way to compute the distance between two latitude/longitude pairs. One pair comes from the user and the other comes from a marker. Below is my code:
import geopy.distance
import pandas as pd
marker = pd.read_csv(file_path)
coords_2 = (4.620881605,101.119911)
marker['Distance'] = round(geopy.distance.geodesic((marker['Latitude'].values,marker['Longitude'].values), (coords_2)).m,2)
Previously, I used apply, which is extremely slow:
marker['Distance2'] = marker.apply(lambda x: round(geopy.distance.geodesic((x.Latitude,x.Longitude), (coords_2)).m,2), axis = 1)
Then I tried to vectorize the call by passing the Pandas Series values directly:
marker['Distance'] = round(geopy.distance.geodesic((marker['Latitude'].values,marker['Longitude'].values), (coords_2)).m,2)
I'm receiving this error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I added all() and any() to test (e.g. marker['Latitude'].values.all(), marker['Longitude'].values.all(), and vice versa). However, both any() and all() produced entirely wrong results.
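A possible reason for the identical wrong values, sketched with NumPy only: all() and any() each collapse a whole column into a single boolean, so every row would be compared from the same point. The sample coordinates below are copied from my result table; the assumption that geopy then treats True as 1.0 is my guess, not something I have confirmed in its documentation.

```python
import numpy as np

# Sample coordinates taken from the result table in this question.
lat = np.array([4.620882, 4.620125, 4.619368])
lon = np.array([101.119911, 101.120399, 101.120885])

# all() reduces each array to one boolean: True when every element is non-zero.
collapsed = (lat.all(), lon.all())

# Interpreted as numbers, the booleans become the single point (1.0, 1.0).
as_floats = tuple(float(v) for v in collapsed)
print(collapsed, as_floats)
```

If that guess is right, the point (1.0, 1.0) is about 100 degrees of longitude away from coords_2, which would be consistent with the constant distance of roughly 11,132 km in every row.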
This is my result:
Latitude Longitude Distance Distance2
0 4.620882 101.119911 11132307.42 0.00
1 4.620125 101.120399 11132307.42 99.72
2 4.619368 101.120885 11132307.42 199.26
Here, Distance is the result from the vectorized attempt and is INCORRECT, whereas Distance2 is the result from apply and is CORRECT. In short, Distance2 is my expected output.
How can I get the correct output faster, WITHOUT using apply?
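For reference, here is a minimal sketch of one direction that avoids apply: a haversine formula written with NumPy array operations. Note that this swaps in a different technique. It computes a great-circle distance on a sphere, not the ellipsoidal geodesic that geopy.distance.geodesic computes, so small differences from Distance2 are expected. The function name haversine_m and the Earth radius constant are assumptions made for this illustration.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres (assumption of this sketch)


def haversine_m(lat, lon, ref_lat, ref_lon):
    """Great-circle distance in metres from each (lat, lon) to one reference point.

    lat and lon may be scalars or arrays in degrees; the reference point is scalar.
    """
    lat = np.radians(np.asarray(lat, dtype=float))
    lon = np.radians(np.asarray(lon, dtype=float))
    ref_lat = np.radians(ref_lat)
    ref_lon = np.radians(ref_lon)

    dlat = lat - ref_lat
    dlon = lon - ref_lon

    # Haversine formula, evaluated element-wise over the whole array at once.
    a = np.sin(dlat / 2) ** 2 + np.cos(lat) * np.cos(ref_lat) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))


# Sample coordinates taken from the result table in this question.
lats = np.array([4.620882, 4.620125, 4.619368])
lons = np.array([101.119911, 101.120399, 101.120885])
coords_2 = (4.620881605, 101.119911)

distances = np.round(haversine_m(lats, lons, *coords_2), 2)
print(distances)
```

With the DataFrame from the question, the same call would be marker['Distance'] = np.round(haversine_m(marker['Latitude'].values, marker['Longitude'].values, *coords_2), 2). For the three sample rows the spherical result should land within roughly a metre of the geodesic values in Distance2; whether that accuracy is acceptable depends on the use case.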