5

It is not clear to me what advantage data visualization provides in EDA. By advantage I mean: what decision would I make differently depending on what a visualization shows?

Could someone give me an example where a data visualization would make me choose one algorithm over another?

For example, from the book "Introduction to Machine Learning with Python": Visualising datasets before fitting any models can be extremely useful. It allows us to see obvious patterns and relationships, and may suggest a sensible form of analysis. With multivariate data, finding the right kind of plot is not always simple, and many different approaches have been proposed.

[image]

How does having seen this visualization (or not) change how I proceed?

liakoyras

2 Answers

5

First, visualization is just an easy and intuitive way to understand underlying patterns in your data. Everything that you can achieve through it can also be achieved by painstakingly printing different values and statistics.

I will just mention two simple examples of algorithms chosen because of patterns in the data. They are very simple, but they can be generalized.

  1. Regression

    If you find out that the relationship in the data is linear, Linear Regression can be a good choice of algorithm.

    [image: linear data]

  2. Classification

    If the data are linearly separable, a linear SVM is suitable.

    [image: linearly separable data]
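To make the regression case concrete, here is a minimal sketch of the "printing statistics" route to the same decision a scatter plot would give you. The data are synthetic and the helper name `fit_line` is my own, not from any library. It fits a straight line to a linear and to a curved dataset and compares R².

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot

random.seed(0)  # synthetic data, made up for illustration only
xs = [i / 10 for i in range(-50, 51)]
linear_ys = [2 * x + 1 + random.gauss(0, 0.5) for x in xs]
curved_ys = [x ** 2 + random.gauss(0, 0.5) for x in xs]

_, _, r2_linear = fit_line(xs, linear_ys)
_, _, r2_curved = fit_line(xs, curved_ys)

print(f"R^2 on linear data: {r2_linear:.3f}")
print(f"R^2 on curved data: {r2_curved:.3f}")
```

A straight line explains the first dataset well and the second one hardly at all, which is what a scatter plot would tell you at a glance. For the curved data you would move on to polynomial features or a non-linear model.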

These are visualizations of the datapoints themselves, but other visualizations like histograms can help find underlying distributions too.

In addition, visualization can be useful in other parts of the process. For example, if a histogram shows a roughly normal (symmetric) distribution, you can impute missing data using the mean value, while for a skewed distribution the median is more suitable because it is less affected by the long tail.
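A small sketch of the imputation point, using made-up numbers and a hypothetical helper `impute` (not a library function). With one large value in the tail, the mean is pulled far away from the typical observation, while the median is not.

```python
import statistics

def impute(values, strategy):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    else:
        fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

# Right-skewed toy sample (invented for illustration) with one missing entry.
skewed = [20, 22, 25, None, 27, 30, 250]

mean_filled = impute(skewed, "mean")
median_filled = impute(skewed, "median")

print("mean fill:  ", round(mean_filled[3], 2))
print("median fill:", median_filled[3])
```

Here the mean fill lands above every observed value except the outlier, so it would be an odd guess for a missing entry. A histogram makes that skew obvious before you pick a strategy.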

liakoyras
5

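The simplest example is Anscombe's Quartet.

[image: Anscombe's quartet] By Anscombe.svg: Schutz (label using subscripts): Avenue - Anscombe.svg, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=9838454

These four data sets look quite different when plotted, yet they share nearly identical summary statistics (means, variances, correlation, and fitted regression line), so the difference doesn't appear just by looking at the summary stats.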
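If you want to check this yourself, the sketch below computes the summary statistics for the four data sets using the standard published Anscombe values. The helper name `summarize` is my own.

```python
# Anscombe's quartet: sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I":   (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

def summarize(xs, ys):
    """Means, Pearson correlation, and least-squares line for one data set."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "mean_y": mean_y,
        "corr": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }

stats = {name: summarize(xs, ys) for name, (xs, ys) in ANSCOMBE.items()}
for name, s in stats.items():
    print(name, {k: round(v, 2) for k, v in s.items()})
```

The four rows print essentially the same numbers, even though one set is roughly linear, one is a smooth curve, one is a line with a single outlier, and one is a vertical cluster with one high-leverage point. Only a plot tells you which of them a linear model is actually appropriate for.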

Richard Careaga