
I work in renewable energy. My company gathers a lot of data from equipment. This typically includes process data (such as transformer temperature, line voltages, currents, etc.) and discrete alarms (e.g. breaker trip, inverter alarm values, transformer over temperature alarm). This is a rough example of what our data looks like (to be read as lines of csv):

  • timestamp, tag, value
  • 5/25/2016 14:30:01, INVERTER_1.VOLTAGE_DC, 249.5
  • 5/25/2016 14:30:06, INVERTER_1.VOLTAGE_DC, 250.1
  • 5/25/2016 14:45:02, TRANSFORMER_1.TEMP_ALARM, 0
  • 5/25/2016 14:45:15, TRANSFORMER_1.TEMP_ALARM, 1

I'd like to start performing some pattern analysis on this data at rest, not real-time (at least for now). I believe what I'd like to attempt is unsupervised feature learning, but I'm not entirely sure. It would be nice (I think) to apply machine learning to 1) identify any patterns that aren't obvious and 2) allow an algorithm to identify signatures of patterns in the data (e.g. all inverters on a single feeder lose communications when a breaker is open).

My initial question: is this considered time-series data? From my research so far, time-series data seems to refer to data that is a function of time. As a domain expert, I don't believe that defining functions for most of my data would be useful for this analysis. It also seems that time-series data refers to real-valued measurements rather than discrete ones.

Any comments or relevant references would be helpful.

rphv
theoneandonly2

1 Answer


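Yes, your data is "time-series data": each tag (e.g. INVERTER_1.VOLTAGE_DC) is a sequence of measurements of the same variable collected over time, so your log contains one time series per tag. Time-series data can be collected continuously or at discrete intervals, and the intervals don't have to be regular.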

Your sample data can be expressed as a function of time. It may help to think of the "function" as the process that produces the measured output: the input to the function is the date/time stamp, and the output is the value of that parameter at that time:

$\text{INVERTER_1.VOLTAGE_DC}( \text{5/25/2016 14:30:01} ) = 249.5$

You don't necessarily need to define the (general) function(s) that produce your data to perform time-series analysis - it's enough to know the value of the function at your measurement times. The range of time-series data can be continuous & real-valued, discrete, or even non-numeric.
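To make the "function of time" view concrete, here is a minimal sketch that splits the log into one series per tag and looks up a tag's value at a given time. It assumes the timestamps are month/day/year with a 24-hour clock, and that a value holds until the next reading (reasonable for alarms, more debatable for voltages). The names `load_series` and `value_at` are made up for this illustration.

```python
import csv
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime
from io import StringIO

# Sample rows copied from the question; a real export would be read from a file.
RAW = """timestamp,tag,value
5/25/2016 14:30:01,INVERTER_1.VOLTAGE_DC,249.5
5/25/2016 14:30:06,INVERTER_1.VOLTAGE_DC,250.1
5/25/2016 14:45:02,TRANSFORMER_1.TEMP_ALARM,0
5/25/2016 14:45:15,TRANSFORMER_1.TEMP_ALARM,1
"""

TIME_FORMAT = "%m/%d/%Y %H:%M:%S"  # assumption: month/day/year, 24-hour clock


def load_series(text):
    """Split the long-format log into one time-ordered series per tag."""
    series = defaultdict(list)
    for row in csv.DictReader(StringIO(text)):
        stamp = datetime.strptime(row["timestamp"].strip(), TIME_FORMAT)
        series[row["tag"].strip()].append((stamp, float(row["value"])))
    for points in series.values():
        points.sort(key=lambda p: p[0])
    return dict(series)


def value_at(points, when):
    """Return the latest recorded value at or before `when`, or None if there is none."""
    times = [t for t, _ in points]
    i = bisect_right(times, when)
    return points[i - 1][1] if i else None


series = load_series(RAW)
voltage = series["INVERTER_1.VOLTAGE_DC"]
alarm = series["TRANSFORMER_1.TEMP_ALARM"]

print(value_at(voltage, datetime(2016, 5, 25, 14, 30, 1)))  # 249.5
print(value_at(alarm, datetime(2016, 5, 25, 14, 45, 10)))   # 0.0 (alarm not yet raised)
```

Note that the discrete alarm tag is handled exactly like the real-valued voltage tag; only the set of values it can take differs.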

It's certainly possible to use machine learning techniques on time-series data, e.g. for forecasting, anomaly detection, or pattern identification.
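As one simple illustration of anomaly detection (a basic statistical baseline, not a recommendation for your specific equipment), the sketch below flags readings that lie far from the mean of the preceding few readings. The readings, the window size, and the threshold are all made up for the example, and `rolling_zscore_anomalies` is a hypothetical helper name.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value is more than `threshold` standard
    deviations away from the mean of the preceding `window` values."""
    flagged = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu, sigma = mean(history), stdev(history)
        # Skip perfectly flat windows to avoid dividing by zero.
        if sigma > 0 and abs(values[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Made-up DC voltage readings for illustration only; index 7 is an injected dip.
readings = [249.5, 250.1, 249.8, 250.3, 249.9, 250.0, 250.2, 231.4, 250.1, 249.7]
print(rolling_zscore_anomalies(readings))  # [7]
```

A real analysis would need to choose the window and threshold per tag, since a normal swing for one signal may be a fault for another.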

Neural nets might be a good choice if you're interested in predictive modeling. One possible setup is to use the current parameter measurements as the input to the neural net and the predicted future value or "state of the system" (e.g., whether a breaker is open or not) as the output.
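The data preparation for that kind of setup could look roughly like the sketch below: each training example pairs the measurements at one time step with the system state a fixed number of steps later. It assumes the tags have already been resampled onto a common, evenly spaced time grid. The sample values and the name `make_training_pairs` are invented for illustration.

```python
def make_training_pairs(features, state, horizon=1):
    """Pair the measurements at step t with the system state at step t + horizon."""
    if len(features) != len(state):
        raise ValueError("features and state must be sampled on the same time grid")
    return [(features[t], state[t + horizon]) for t in range(len(features) - horizon)]


# Made-up, evenly spaced samples: (transformer temperature in C, line current in A).
features = [(61.0, 410.0), (64.5, 432.0), (70.2, 455.0), (78.9, 470.0), (85.3, 474.0)]
# Hypothetical breaker state at the same steps: 0 = closed, 1 = open.
breaker_open = [0, 0, 0, 1, 1]

pairs = make_training_pairs(features, breaker_open, horizon=1)
for x, y in pairs:
    print(x, "->", y)
```

The resulting (input, target) pairs can then be handed to whichever classifier you choose, a neural net being one option among several.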

WEKA is a good open-source machine-learning toolkit that contains implementations of many different ML algorithms.

rphv