1

If I have a dataset with events occurring at certain times of day (an Hour column), how would I go about using this as a feature for, say, a classifier? Example:

Hour     | Event | Item
08:45:22 | Buy   | Apple
09:03:10 | Buy   | Orange
10:00:00 | Sell  | Apple

Would I convert the Hour timestamp into a numeric value, such as a Unix timestamp, and then normalize it like I would any other numeric value?

Khaine775

2 Answers

3

My answer is that I would simply normalize the time to the range [0, 1): count the seconds since midnight and divide by the number of seconds in a day (86,400), so that 00:00:00 maps to 0 and 23:59:59 plus one second would map to 1.
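A minimal sketch of that normalization, assuming the Hour values arrive as `HH:MM:SS` strings like in the question. The helper name `day_fraction` is made up for illustration:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def day_fraction(hhmmss: str) -> float:
    """Map an 'HH:MM:SS' string to the fraction of the day elapsed, in [0, 1)."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    seconds_since_midnight = hours * 3600 + minutes * 60 + seconds
    return seconds_since_midnight / SECONDS_PER_DAY

# The three rows from the question's example table.
fractions = [day_fraction(t) for t in ["08:45:22", "09:03:10", "10:00:00"]]
```

The result is already scaled to [0, 1), so no further min-max normalization should be needed for this column.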

I disagree with the sine/cosine transformation: it might treat 11:30 and 12:30 the same if you pick the wrong transformation (for example, a single cosine with a 24-hour period is symmetric around noon).
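To illustrate one way this can happen: the sketch below assumes a 24-hour period with the angle starting at midnight, and uses a hypothetical helper `hour_angle`. With the cosine component alone, 11:30 and 12:30 collapse to the same value; the sine component is what tells them apart.

```python
import math

def hour_angle(hours: float) -> float:
    """Angle in radians for a time of day, with a full turn every 24 hours."""
    return 2 * math.pi * hours / 24

# Cosine alone is symmetric around noon, so these two coincide.
cos_1130 = math.cos(hour_angle(11.5))
cos_1230 = math.cos(hour_angle(12.5))

# Sine has opposite signs just before and just after noon.
sin_1130 = math.sin(hour_angle(11.5))
sin_1230 = math.sin(hour_angle(12.5))
```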

Also, there might be patterns like 'last minute' or 'early rise' that do not correspond to the cyclic nature of the day.

It should be the responsibility of the machine learning algorithm to detect which transformation should be made. For example, if your algorithm is a neural network, it could have nodes with activation functions that look like sine/cosine, among others. Those would detect and respond to such cyclic behavior.

The normalized time data is also easy for tree-based or SVM algorithms to pick up, maybe even easier than 'Hour' as a category. For example, if you have a shop that is open from 9:30 to 17:30, the hour category "9" would be ambiguous, since it covers both closed and open time.

Pieter21
1

Since you are dealing with cyclic events (the Hour column goes from 00:00:00 to 23:59:59 and then wraps around), you could convert the column to seconds since midnight and then apply a sine/cosine transformation. See this similar question.
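A rough sketch of what that could look like, assuming `HH:MM:SS` strings and a 24-hour period. The names `cyclic_encode` and `distance` are hypothetical, not from any library. Keeping both the sine and the cosine puts each time on the unit circle, so times on either side of midnight end up close together:

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60

def cyclic_encode(hhmmss: str) -> tuple[float, float]:
    """Return (sin, cos) of the time of day mapped onto a 24-hour circle."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    seconds_since_midnight = hours * 3600 + minutes * 60 + seconds
    angle = 2 * math.pi * seconds_since_midnight / SECONDS_PER_DAY
    return math.sin(angle), math.cos(angle)

def distance(a: str, b: str) -> float:
    """Euclidean distance between two times in the (sin, cos) feature space."""
    sin_a, cos_a = cyclic_encode(a)
    sin_b, cos_b = cyclic_encode(b)
    return math.hypot(sin_a - sin_b, cos_a - cos_b)

near_midnight = distance("23:59:59", "00:00:00")  # tiny: neighbours on the circle
half_day = distance("00:00:00", "12:00:00")       # largest possible: opposite points
```

Both values are then fed to the model as two separate features in place of the raw time.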

tomar__