12

The general question, as the title suggests, is:

  • What is the difference between Data Science (DS) and Operations Research (OR)/optimization?

On a conceptual level, I understand that DS tries to extract knowledge from the available data, mostly using statistical and machine learning techniques. OR, on the other hand, uses the data to make decisions, for example by optimizing some objective function (criterion) over the data (input).

I wonder how these two paradigms compare.

  • Is one subset of the other?
  • Are they considered complementary fields?
  • Are there examples where one field complements the other, or where they are used in conjunction?

In particular, I am interested in the following:

Is there any example where OR techniques are used to solve a Data Science question/problem?

PsySp

5 Answers

10

While Operations Research and Data Science both cover a large number of topics and areas, I'll try to give my perspective on what I see as the most representative and mainstream parts of each.

As others have pointed out, the bulk of Operations Research is primarily concerned with making decisions. While there are many different ways to determine how to make decisions, the most mainstream parts of OR (in my opinion) are focused on modelling decision problems in a mathematical programming framework. In these kinds of frameworks, you typically have a set of decision variables, constraints over these variables, and an objective function dependent on your decision variables that you are trying to minimize or maximize. When the decision variables can take values in $\mathbb{R}$, the constraints are linear inequalities over your decision variables, and the objective function is a linear function of the decision variables, then you have a linear program, the main workhorse of OR for the past sixty years. If you have other kinds of objective functions or constraints, you find yourself in the realm of integer programming, quadratic programming, semi-definite programming, etc.
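To make the linear-programming description concrete, here is a minimal sketch in Python. The toy problem (the objective coefficients and the constraints) is made up for illustration and is not taken from any real application. It relies on the fact that a bounded, feasible linear program attains its optimum at a vertex of the feasible region, so for two variables one can simply enumerate the intersections of constraint pairs. A real problem would be handed to a dedicated LP solver rather than solved this way.

```python
from itertools import combinations

# Hypothetical toy problem (an assumption for illustration only):
#   maximize 3x + 2y
#   subject to x + y <= 4, x + 3y <= 9, x <= 3, x >= 0, y >= 0
objective = (3.0, 2.0)

# Each constraint (a, b, c) encodes the linear inequality a*x + b*y <= c.
constraints = [
    (1.0, 1.0, 4.0),
    (1.0, 3.0, 9.0),
    (1.0, 0.0, 3.0),
    (-1.0, 0.0, 0.0),  # x >= 0
    (0.0, -1.0, 0.0),  # y >= 0
]


def solve_lp_2d(objective, constraints, tol=1e-9):
    """Maximize a linear objective over a bounded 2D polygon by vertex enumeration."""
    best_point, best_value = None, float("-inf")
    for (a1, b1, c1), (a2, b2, c2) in combinations(constraints, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < tol:
            continue  # parallel boundary lines never intersect in a single point
        # Cramer's rule for the intersection of the two boundary lines.
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        # Keep the candidate only if it satisfies every constraint.
        if all(a * x + b * y <= c + tol for a, b, c in constraints):
            value = objective[0] * x + objective[1] * y
            if value > best_value:
                best_point, best_value = (x, y), value
    return best_point, best_value


point, value = solve_lp_2d(objective, constraints)
print("decision variables:", point, "objective value:", value)
```

The three ingredients named above map directly onto the code: `point` holds the decision variables, `constraints` the linear inequalities, and `objective` the linear function being maximized.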

Data Science, on the other hand, is mostly concerned with making inferences. Here, you're typically starting with a big pile of data and you'd like to infer something about data you haven't seen yet in your big pile. The typical sorts of things you see here are: 1) the big pile of data represents the past results of two different options and you'd like to know which option will yield the best results, 2) the big pile of data represents a time series and you'd like to know how that time series will extend into the future, 3) the big pile of data represents a labelled set of observations and you'd like to infer labels for new, unlabelled observations. The first two examples fall squarely into classical statistical areas (hypothesis testing and time-series forecasting, respectively), while the third example is, I think, more closely associated with modern machine learning topics (classification). Fun trivia: I believe that for a long time, the job title at Google for people doing what we now call Data Science was (is?) "Statistician".
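As a rough illustration of the third case (inferring labels for unseen observations), here is a minimal nearest-neighbour classifier in Python. The data points and the labels "A" and "B" are invented for the example, and nearest-neighbour is just one simple classification technique chosen for brevity, not a recommendation.

```python
import math

# Hypothetical labelled observations: (features, label).
labelled = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((4.0, 4.2), "B"),
    ((3.8, 4.0), "B"),
]


def nearest_neighbour_label(train, query):
    """Return the label of the training observation closest to `query`."""
    _, label = min(train, key=lambda item: math.dist(item[0], query))
    return label


# Infer labels for two new, unlabelled observations.
print(nearest_neighbour_label(labelled, (1.1, 0.9)))
print(nearest_neighbour_label(labelled, (4.1, 3.9)))
```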

So, in my opinion, Operations Research and Data Science are mostly orthogonal disciplines, although there is some overlap. In particular, I think that time-series forecasting appears in a non-trivial amount in OR; it's one of the more significant, non-math programming-based parts of OR. Operations Research is where you turn if you have a known relationship between inputs and outputs; Data Science is where you turn if you're trying to determine that relationship (for some definition of input and output).

skovorodkin
mhum
6

This isn't a full answer, since mhum's is quite good in contrasting the differing aims of OR vs DS.

Rather, I want to address this comment of yours:

I was wondering if, for example, one could use any OR techniques to solve DS problems.

The answer is yes. The clearest example that comes to mind is Support Vector Machines (SVMs).

To "fit" an SVM model to some data (which must be done before you can use it to infer predictions), the following optimisation problem must be solved:

Maximize the dual,

$$ g(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j x_i^T x_j,$$

subject to the constraints

$$ 0 \leq \alpha_i \leq C, \qquad \sum_{i=1}^{m} y_i \alpha_i = 0.$$

This is a constrained optimisation problem, just like many in the field of OR, and it is solved using quadratic programming methods or interior point methods. These are generally associated with the field of OR rather than DS but this is an example of their wider applicability.
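For a feel of what solving this dual looks like in practice, here is a minimal sketch in plain Python. It is a stripped-down pairwise coordinate ascent in the spirit of SMO (sequential minimal optimisation), without the working-set heuristics a real implementation would use. The three-point dataset, the value of `C`, and the number of sweeps are assumptions made for illustration. Production SVM libraries use far more refined solvers.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dual_objective(alpha, X, y):
    """Evaluate g(alpha) for a linear kernel."""
    m = len(X)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
               for i in range(m) for j in range(m))
    return sum(alpha) - 0.5 * quad


def fit_svm_dual(X, y, C=10.0, sweeps=200):
    """Maximize the dual by optimizing one pair (alpha_i, alpha_j) at a time.

    Updating a pair jointly keeps sum_i y_i * alpha_i = 0 satisfied, and
    clipping to [low, high] keeps both multipliers inside [0, C].
    """
    m = len(X)
    K = [[dot(X[i], X[j]) for j in range(m)] for i in range(m)]
    alpha = [0.0] * m
    for _ in range(sweeps):
        for i in range(m):
            for j in range(i + 1, m):
                eta = K[i][i] + K[j][j] - 2.0 * K[i][j]
                if eta <= 1e-12:
                    continue  # no curvature along this direction
                # Prediction errors; the bias term cancels in e_i - e_j.
                e_i = sum(alpha[k] * y[k] * K[k][i] for k in range(m)) - y[i]
                e_j = sum(alpha[k] * y[k] * K[k][j] for k in range(m)) - y[j]
                if y[i] != y[j]:
                    low = max(0.0, alpha[j] - alpha[i])
                    high = min(C, C + alpha[j] - alpha[i])
                else:
                    low = max(0.0, alpha[i] + alpha[j] - C)
                    high = min(C, alpha[i] + alpha[j])
                new_j = min(high, max(low, alpha[j] + y[j] * (e_i - e_j) / eta))
                alpha[i] += y[i] * y[j] * (alpha[j] - new_j)
                alpha[j] = new_j
    return alpha


# Hypothetical 1D data: two points on the margin and one far from it.
X = [[1.0], [-1.0], [3.0]]
y = [1, -1, 1]

alpha = fit_svm_dual(X, y)
print("alpha:", alpha, "g(alpha):", dual_objective(alpha, X, y))
```

Only the points that end up with a nonzero multiplier (the support vectors) contribute to the fitted model; in this toy dataset the point at 3.0 should not be one of them.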

More generally, optimisation is key to many of the statistical and machine learning models employed in the field of DS, since the process of training these models can typically be formulated as a minimisation problem involving a loss/regret function - from the humble centuries-old linear regression model to the very latest deep learning neural network.
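To illustrate the point with the simplest case, here is a sketch of linear regression fitted by gradient descent on the mean squared error. The data, learning rate, and step count are assumptions chosen for the example. In practice linear regression is usually solved in closed form or with a library routine, so this is only meant to show training as a minimisation problem.

```python
# Hypothetical data generated from y = 2x + 1 with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def fit_line(xs, ys, learning_rate=0.05, steps=5000):
    """Minimize the mean squared error of w*x + b by plain gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradient of (1/n) * sum(residual^2) with respect to w and b.
        grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


slope, intercept = fit_line(xs, ys)
print("slope:", slope, "intercept:", intercept)
```

The same pattern (define a loss, follow its gradient downhill) is what scales up to the training of neural networks, with the loss and the model swapped out.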

A good reference on SVMs is Bishop.

Schonfinkel
A. G.
3

As a strategist, I've had the opportunity to work with both sides of the discipline. In trying to explain what OR and DS are to a qualitative MBA executive, my (overly) simplistic one-line introduction for each is:

OR: economists that know how to code
DS: statisticians that know how to code.

In practical terms, how the two groups typically come together: the OR side develops the decision model, and the DS side figures out the appropriate data implementation to feed the model.

Each, on its own, will rely on the theoretical traditions of its discipline. Together, they conduct experimentation to structure the data and refine the model in order to get to the true insights needed for optimal decisions. As each gets to know the other, their thinking and their language will typically converge.

padawan
user88056
2

I obtained my Operations Research degree from a military institution and have been using it for military applications for about 15 years. In that time, Data Science has grown into a mainstream term referring to people with varying degrees of expertise in the data and information space. In the Department of Defense (DoD), Data Science is not very well defined, yet there is a desire for more of it, and an expectation that DS is crucial for data architecture, data normalization, ML, AI, and facilitating analytics. Unfortunately, some decision makers perceive visualizations as the end goal of analytics, but that is better left as a tangential discussion for the future.

Before DS became a mainstream term, Ops Researchers were used by the DoD to perform the data tasks that I mentioned above, but also to do mathematical programming, optimization, network optimization (as in SCM, not computer networks), network interdiction and vulnerability analysis, and simulations for prediction and prescription, plus process testing and wargaming.

Advances in computer science, computer hardware, and computing power, plus the falling cost in those areas, have now created the demand for a dedicated discipline to exploit data in ways that our theories have only dreamed of since the era of vacuum tubes. I think that, for the most part, ORs have been filling a void out of necessity, but not necessarily with the standard discipline and expertise that the work really merits. I don't mean to say that ORs can't do data science, or that we can't be experts in it. However, in my experience we (ORs) have been performing data science tasks in order to support analytics (modeling, simulation, decision aids, etc.).

In general, and surprisingly given the ages of the two disciplines, there is a better understanding in the Department of Defense of what Data Scientists do than of what ORs do, probably because DS is mainstream and because of the data age we live in. It has reached the sad and frustrating point where OR now has to defend its utility, since the wrong assumption is that DS can do everything that ORs do.

The confusion also arises because Ops Research does not always produce a tool or data product. Instead, it delivers analysis whose techniques are not always advertised or highlighted, because the focus is on the results of the analysis rather than the techniques. Also, Ops Researchers tend to grow up in a functional group and its culture first, and then become ORs. This has a huge positive impact on the studies and analysis, but it does not give the ORs an obvious distinction within the team. Data Scientists, however, because of their "newness," average age, and the specific niche they operate in, plus the fact that they can come in from outside the functional area and culture, are automatically identified as enablers and SMEs.

I hope that my experiences help answer the original questions or at least give a different perspective to better inform the group. I did not want to go technical as I think that there is a lot of overlap between the two disciplines, and we are all still trying to get better definitions as we settle in this space.

Cheers to the group,

p.s. I find this graphic useful in illustrating OR:

[Image: representative illustration of the disciplines and problems related to operations research]

-2

If you count ML, and the AI driven by it, as part of Data Science (some people do and some don't, in my experience: for instance, the Microsoft Professional Program in AI covers key aspects of Data Science plus Machine Learning, including both DL and RL, while the Higher School of Economics presents practically the same advanced parts of the Microsoft curriculum as Advanced Machine Learning), then there are many similarities in the mathematics used in both fields. For instance:

  • Nonlinear programming (Lagrange multipliers, KKT conditions, ...) is used in the derivation of Support Vector Machines.
  • Econometrics is mostly based on regression, and regression is a key part of both Data Science in general and supervised learning in particular.
  • Statistics (normally found in the OR curriculum) is key for Data Science and Machine Learning as well.
  • Stochastic processes are very important in Reinforcement Learning.
  • Dynamic programming is again found in Reinforcement Learning.

So I would say there are some similarities with Data Science in general and quite a lot of similarities with ML. Of course, the goals of these disciplines are different, but a lot of the mathematics they use is shared.