21

If I have a retail store and have a way to measure how many people enter my store every minute, and timestamp that data, how can I predict future foot traffic?

I have looked into machine learning algorithms, but I'm not sure which one to use. On my test data, a simple year-over-year trend is more accurate than the other things I've tried, such as KNN (with what I think are sensible parameters and a sensible distance function).

It almost seems like this could be similar to financial modeling, where you deal with time series data. Any ideas?

Rubens
user1132959

7 Answers

17

The problem with models like KNN is that they do not take seasonality into account (regular, time-dependent variations such as daily, weekly, or yearly cycles). To account for it, you should use time series analysis.

For count data such as yours, you can use generalized linear autoregressive moving average (GLARMA) models. Fortunately, there is an R package that implements them (glarma).

The vignette is a good resource for the theory behind the tool.

Christopher Louden
10

I think Christopher's answer above is entirely sensible. As an alternate approach (or perhaps just in addition to the advice he's given), I might start by just visualizing the data a bit to try to get a rough sense of what's going on.

If you haven't already done this, you might try adding a date's month and day of week as features -- if you end up sticking with KNN, this will help the model pick up seasonality.
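As a rough illustration of the calendar features suggested above, here is a minimal sketch using only the Python standard library. The function name `calendar_features` is made up for this example, and the `hour` and `is_weekend` fields are extra assumptions beyond the month and day of week mentioned in the answer.

```python
from datetime import datetime


def calendar_features(timestamp: str) -> dict:
    """Turn an ISO-8601 timestamp into simple calendar features."""
    dt = datetime.fromisoformat(timestamp)
    return {
        "month": dt.month,            # 1 = January ... 12 = December
        "day_of_week": dt.weekday(),  # 0 = Monday ... 6 = Sunday
        "hour": dt.hour,              # assumption: useful for per-minute data
        "is_weekend": dt.weekday() >= 5,
    }


# Hypothetical timestamp, not taken from the asker's data.
features = calendar_features("2014-06-14T15:30:00")
```

Each dictionary can then be appended to whatever other features a model such as KNN already uses.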

As a different way of taking this on, you might consider starting with a really, really basic model (like OLS). These often go a long way in generating reasonable predictions.
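To give a sense of how little is needed for such a baseline, here is a sketch of closed-form OLS with a single predictor (the day index). The data are invented for illustration, and `fit_ols` is a hypothetical helper, not a library function.

```python
def fit_ols(xs, ys):
    """Closed-form ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented example: daily visitor totals for days 0..5.
days = [0, 1, 2, 3, 4, 5]
visitors = [100, 104, 108, 112, 116, 120]

intercept, slope = fit_ols(days, visitors)
forecast_day_6 = intercept + slope * 6  # extrapolate one day ahead
```

A real baseline would probably add calendar features as extra predictors, which needs a multi-variable fit from a statistics library rather than this one-variable formula.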

Finally, the more we know about your data, the easier it will be for us to help generate suggestions -- What time frame are you observing? What are the features you're currently using? etc.

Hope this helps --

3

As @Christopher Louden mentioned above, time-series analysis is most appropriate for this sort of thing. If, however, you wish to take a more traditional "machine learning approach", something that I have done in the past is to block up your data into overlapping windows of time as features, then use them to predict the next day's (or week's) traffic.

Your feature matrix would be something like:

t1 | t2   | ... | tN
t2 | t3   | ... | tN+1
t3 | t4   | ... | tN+2
...
tW | tW+1 | ... | tN+W-1

where tI is the traffic on day I. The value you'll be predicting (the target) is the traffic on the day after the last column of each row. In essence, you use a window of past traffic to predict the next day's traffic.

Any sort of ML regression model would work for this.
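A minimal sketch of how such a windowed feature matrix could be built from a plain list of daily counts. The function name `make_windows` and its `horizon` parameter are assumptions for this example; the sample numbers are invented.

```python
def make_windows(series, window, horizon=1):
    """Build (features, targets) from a list of counts.

    Each feature row holds `window` consecutive values; its target is the
    value `horizon` steps after the last value in that row.
    """
    rows, targets = [], []
    last_start = len(series) - window - horizon
    for start in range(last_start + 1):
        rows.append(series[start:start + window])
        targets.append(series[start + window + horizon - 1])
    return rows, targets


# Invented daily totals.
daily = [10, 12, 15, 11, 14, 18, 20]

# Three days of history predict the following day.
X, y = make_windows(daily, window=3)

# Same windows, but the label is two days after the window ends.
X2, y2 = make_windows(daily, window=3, horizon=2)
```

`X` and `y` can be handed to any regression model. When evaluating, split by time (train on earlier rows, test on later ones) rather than shuffling, since overlapping windows share values.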

Edit

In response to the question, "can you elaborate on how you use this feature matrix":

The feature matrix has values indicating past traffic over a period of time (for instance, hourly traffic over 1 week), and we use this to predict traffic for some specified time period in the future. We take our historic data and build a feature matrix of historic traffic and label this with the traffic at some period in the future (e.g. 2 days after the window in the feature). Using some sort of regression machine learning model, we can take historic traffic data, and try and build a model that can predict how traffic moved in our historic data set. The presumption is that future traffic will resemble past traffic.

gallamine
3

You could try a neural network. You can find two great explanations of how to apply neural networks to time series here and here.

Note that it is best practice to:

  • Deseasonalize/detrend the input data (so that the NN does not have to learn the seasonality).
  • Rescale/Normalize the input data.
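The two preprocessing steps above could look roughly like this, using only the standard library. This is one simple way to do it (subtracting per-position seasonal means, then standardizing), not the only one; the function names and the toy series with a period of 2 are assumptions for illustration. It removes a repeating cycle but not a trend, which would need a separate step such as differencing.

```python
from statistics import mean, pstdev


def deseasonalize(values, period):
    """Subtract the mean of each position in the cycle (e.g. each weekday)."""
    seasonal = [mean(values[i::period]) for i in range(period)]
    residuals = [v - seasonal[i % period] for i, v in enumerate(values)]
    return residuals, seasonal


def standardize(values):
    """Rescale to zero mean and unit variance (assumes a non-constant series)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values], mu, sigma


# Toy series that alternates low/high with period 2 (invented numbers).
counts = [10, 20, 12, 22, 14, 24]

residuals, seasonal = deseasonalize(counts, period=2)
scaled, mu, sigma = standardize(residuals)
```

Keep `seasonal`, `mu`, and `sigma`: the network's predictions have to be multiplied by `sigma`, shifted by `mu`, and have the seasonal component added back to get a forecast in the original units.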

Because this is a regression problem, the output-layer activation should be linear rather than sigmoid or tanh (the hidden layers can remain nonlinear), and you aim to minimize the sum-of-squares error (as opposed to minimizing the negative log-likelihood, i.e. the cross-entropy, in a classification problem).

Orelus
2

Well, firstly, I would not even use things like machine learning without having in-depth knowledge. The simple things I would do first if I had this time series are:

  1. Write SQL queries to find out at which times you have the busiest, average, and lowest foot traffic.
  2. Then try to visualize the whole time series; you could also use basic pattern-matching algorithms to pick up patterns.

These two things will help you understand what your data set is telling you. Then, with that in hand, you will probably be in a better position to use machine learning algorithms.
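As a sketch of the first step, here is the kind of query that shows which hours are busiest, run against an in-memory SQLite database from Python. The table name `traffic`, its columns, and the sample rows are all invented for this example; the real schema will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traffic (ts TEXT, visitors INTEGER)")
conn.executemany(
    "INSERT INTO traffic VALUES (?, ?)",
    [
        ("2014-06-14 09:00:00", 3),
        ("2014-06-14 09:01:00", 5),
        ("2014-06-14 13:00:00", 12),
        ("2014-06-14 13:01:00", 10),
        ("2014-06-15 09:00:00", 4),
        ("2014-06-15 13:00:00", 14),
    ],
)

# Average visitors per minute for each hour of the day, busiest first.
hourly = conn.execute(
    """
    SELECT strftime('%H', ts) AS hour_of_day,
           AVG(visitors)      AS avg_visitors
    FROM traffic
    GROUP BY hour_of_day
    ORDER BY avg_visitors DESC
    """
).fetchall()
```

Swapping `'%H'` for `'%w'` (day of week) or `'%m'` (month) gives the same breakdown at other granularities.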

Also, I'm currently working on building something with time series, and time series analysis will help you much more than machine learning. For example, there are pattern recognition algorithms that use daily data to show patterns, and others that use as much as 3 to 6 months of data to catch a pattern.

Rubens
Nischal Hp
0

I would advise against using a neural network or equivalent because, I assume, you have a good prior based on your experience with the store (i.e. that there are probably day-to-day/seasonal trends and some level of smoothness) and, I imagine, a relatively small amount of data. A better option, IMO, would be a kernel method such as a Gaussian process or an SVM.

j__
0

Bringing this thread back to life, as this could be useful to others landing here with similar questions.

Facebook recently released and open-sourced one of its internal forecasting tools, called Prophet.

It is available as both R and Python packages, and is an interesting solution for someone with little machine learning background. However, some additional ML knowledge allows you to tune and optimize the produced models.

I recommend giving Prophet a try as a first step. The quick win with this solution is the ease and speed of model building and testing: you can literally get a decent projection in a matter of minutes. It behaves very well on time series, catching the seasonality of the data at hand "naturally".

Under the hood, it's similar to a generalized additive model (GAM). More details are in the dedicated paper: https://facebookincubator.github.io/prophet/static/prophet_paper_20170113.pdf
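For a sense of what "a matter of minutes" looks like in Python, here is a minimal sketch. Prophet expects a data frame with a `ds` (date) column and a `y` (value) column; the helper names `daily_totals` and `forecast_with_prophet`, the choice to aggregate per-minute counts into daily totals, and the sample rows are assumptions for this example. It requires the `pandas` and `prophet` packages to actually produce a forecast.

```python
from collections import defaultdict


def daily_totals(minute_counts):
    """Sum per-minute counts into one row per day, in Prophet's ds/y layout."""
    totals = defaultdict(int)
    for timestamp, count in minute_counts:
        totals[timestamp[:10]] += count  # "YYYY-MM-DD" prefix of an ISO timestamp
    return [{"ds": day, "y": total} for day, total in sorted(totals.items())]


def forecast_with_prophet(rows, periods=30):
    """Fit Prophet on daily rows and forecast `periods` days ahead."""
    # Imported lazily so the helper above works without these packages.
    import pandas as pd
    from prophet import Prophet  # older releases used the name `fbprophet`

    df = pd.DataFrame(rows)
    df["ds"] = pd.to_datetime(df["ds"])
    model = Prophet()
    model.fit(df)
    future = model.make_future_dataframe(periods=periods)
    forecast = model.predict(future)
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]


# Invented sample data; a real fit needs far more history than this.
rows = daily_totals([
    ("2017-03-01T09:00:00", 3),
    ("2017-03-01T09:01:00", 5),
    ("2017-03-02T09:00:00", 4),
])
```

With several months of history in `rows`, `forecast_with_prophet(rows, periods=14)` would return the predicted daily totals (`yhat`) and their uncertainty bounds for the next two weeks.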

Ethan