13

I spent the past year learning Python. As a person who thought coding was impossible to learn for those outside of the CS/IT sphere, I was obviously gobsmacked by the power of a few lines of Python code!

Having reached an intermediate level overall, I was pretty proud of myself, since Python greatly expands what I can do in data analysis and visualization compared to Excel (aside from the millions of other uses there are for Python).

Purely in terms of data analysis and visualization:

What does approaching the same data set with pandas/matplotlib/seaborn/numpy bring to the table, as opposed to using Tableau?

(sidenote: I was greatly disappointed to see all my hard-earned Python data wrangling skills were available in such a user-friendly GUI... :'( )

Uralan

3 Answers

13

Don't worry - your hard-earned Python skills are still important ;)

Tableau is not a replacement - it is essentially a means of sharing your insights/findings. It is a wrapper around your normal toolkit (Pandas, Scikit-Learn, Keras, etc.). It can do some basic analysis (just using basic models from sklearn), but its real strength is that it can deploy your models so that people can run inference on stored or new data, and then play around with the results in an interactive dashboard.

Watch this video for a good overview of everything it can do, and how it connects to Python (and R/MATLAB). There is just a bit of boilerplate code around your normal Python code.

Tableau also offers TabPy to set up a server, allowing nice deployments of your work, but in the end you need their desktop application to view the results (i.e. your customers need it to look at the results). This is not free: https://www.tableau.com/pricing/individual
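To give a rough idea of what that boilerplate wraps, here is a minimal sketch of a function shaped the way TabPy calls it: Tableau sends each argument as a list of values and expects a list of the same length back. The function name `zscore` and the sample numbers are made up for illustration, and the deployment lines are left as comments because they assume a TabPy server running locally on its default port.

```python
from statistics import mean, pstdev


def zscore(values):
    """Return the z-score of each value; Tableau would pass `values` as a list."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values identical: treat every z-score as 0 in this sketch.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Local check with made-up numbers, no Tableau or TabPy needed.
scores = zscore([10, 20, 30])
print(scores)

# Deployment (assumes `pip install tabpy` and a server on localhost:9004):
# from tabpy.tabpy_tools.client import Client
# client = Client("http://localhost:9004/")
# client.deploy("zscore", zscore, "Z-score of a measure", override=True)
```

The point is that the analysis itself stays ordinary Python; only the last few lines are specific to the deployment.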

In summary, I'd say Tableau is more of a business intelligence tool, allowing e.g. your non-data-scientist boss or other stakeholders to interactively explore the data and the results of your modelling. It is similar to Microsoft's Power BI.

n1k31t4
10

There is the official answer and the realistic answer (from a business perspective):

Official

Officially, the greatest benefit your Python skills will bring you is flexibility. If you need to run an economic model where you want to show gradient uncertainty or something else complex, doing that manually in any Data Visualization/Business Intelligence software is going to be a pain. Even simpler tasks, like semi-complex aggregations, will often be easier to accomplish in a few lines of Python compared to the mess they can quickly become in BI software.
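As an illustration of the "semi-complex aggregation" case, here is a small pandas sketch that computes each product's share of its region's revenue and then picks the top product per region. The data and column names are invented for the example; it is only meant to show how little code this kind of two-level aggregation takes.

```python
import pandas as pd

# Made-up sales records, purely for illustration.
sales = pd.DataFrame({
    "region":  ["North", "North", "North", "South", "South"],
    "product": ["A", "B", "A", "A", "B"],
    "revenue": [100.0, 50.0, 50.0, 80.0, 120.0],
})

# Revenue per region and product.
by_product = (
    sales.groupby(["region", "product"], as_index=False)["revenue"].sum()
)

# Each product's share of its own region's total revenue.
by_product["share"] = (
    by_product["revenue"]
    / by_product.groupby("region")["revenue"].transform("sum")
)

# The best-selling product in each region.
top = by_product.loc[by_product.groupby("region")["revenue"].idxmax()]
print(top)
```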

Practical

Business Intelligence software (in which I include Tableau for this answer) can handle a significant portion of real-life data analysis and data visualization steps. While these tools are not particularly flexible compared to code, they are good enough for day-to-day use. In general, given a typical business setting, I would readily recommend them for most users. The greatest limiting factor with all of them is that the biggest job of a business data scientist is collecting and, most importantly, cleaning data, and that boils down to either manual labor... or coding. All BI software attempts to help with automatically pulling in data and, to a lesser extent, assisting with cleaning it up. However, the real job often boils down to: "connect to these databases, clean the data, combine the data, and put them somewhere so you or someone else can visualize the data in BI software."
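That "connect, clean, combine, store" job can be sketched with nothing but the standard library. Everything here is an assumption made for illustration: the two in-memory lists stand in for real source databases, the table and column names are invented, and the in-memory SQLite database stands in for whatever store your BI tool reads from.

```python
import sqlite3

# Extract: two made-up "sources" standing in for real databases or exports.
raw_orders = [
    ("1001", " alice ", "120.50"),
    ("1002", "BOB", "n/a"),         # unparseable amount
    ("1003", "Alice", "79.50"),
    ("1001", " alice ", "120.50"),  # duplicate row
]
raw_customers = [("alice", "EU"), ("bob", "US")]


def clean_orders(rows):
    """Transform: normalize names, parse amounts, drop bad and duplicate rows."""
    seen, cleaned = set(), []
    for order_id, customer, amount in rows:
        try:
            value = float(amount)
        except ValueError:
            continue  # skip rows whose amount is not a number
        if order_id in seen:
            continue  # skip duplicate order ids
        seen.add(order_id)
        cleaned.append((int(order_id), customer.strip().lower(), value))
    return cleaned


# Load: write the cleaned data to tables a BI tool could read.
conn = sqlite3.connect(":memory:")  # a file path or warehouse in practice
conn.execute("CREATE TABLE customers (name TEXT PRIMARY KEY, region TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", raw_customers)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", clean_orders(raw_orders))

# Combine: one tidy result per region, ready for visualization.
report = conn.execute(
    """
    SELECT c.region, COUNT(*) AS n_orders, SUM(o.amount) AS revenue
    FROM orders AS o JOIN customers AS c ON c.name = o.customer
    GROUP BY c.region ORDER BY c.region
    """
).fetchall()
print(report)
```

Most of the lines deal with messy input rather than analysis, which matches the point above about where the time actually goes.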

And that's the thing: Google Data Studio is easily the least capable of all the popular BI solutions, yet it has become my go-to solution. This is because once I prepare the data correctly, I can give it to anyone to explore, and it has the easiest/best UX. And yes, any complex statistics will happen long before the data gets into any BI software, but those occur less often than one might expect. (In both Tableau and Microsoft Power BI, you can also run Python directly inside the product. Personally, I wouldn't recommend it, as it 1) just becomes messy and 2) pulls the code out of source control.)

Conclusion

If you are in the business of business intelligence, then I would wholeheartedly recommend leaning on business intelligence software as much as possible. My experience is that you have:

  • What your job really is: the Data Warehouse side of things (extract your data, transform it (clean it), and load it (store it somewhere you can access from both your BI software and Jupyter))
  • What your end users will see: the BI software for standard visualizations
  • What you want it to be: the occasional Jupyter notebooks for specialized analyses

Of course, your experience might be completely different, but this has been my personal experience after working for a couple of years for a company that helped companies with their data-driven business management (and thus, I got to see how it worked in a whole bunch of companies). And, yes, often enough, all a company will be using is Excel + Power Query.

PS. Tableau tries to be an all-in-one solution. Personally, my experiences with it have not been positive, but for what it's worth, it is the most established player on the market.

David Mulder
6

As someone who worked on a competitor to Tableau, I'd say data science skills have largely superseded the need for BI software for data munging, complex analysis and ad hoc reports.

But BI software can still be beneficial if you need to deploy your results to lots of people, often with varying rights to view something (e.g. you can only see your own performance stats, but not the stats of Alice). For this, the graphical capabilities of Tableau and the underlying security model are quite a lot to recreate in Python.
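To make the "varying rights" point concrete, here is a deliberately tiny sketch of per-viewer row filtering in plain Python. The user names, the stats and the `CAN_VIEW` permission table are all hypothetical, and this is not how Tableau implements its security model.

```python
# Made-up performance stats; in practice these would come from a database.
STATS = [
    {"owner": "alice", "deals_closed": 12},
    {"owner": "bob", "deals_closed": 7},
]

# Hypothetical permission table: which owners' rows each viewer may see.
CAN_VIEW = {
    "alice": {"alice"},
    "bob": {"bob"},
    "manager": {"alice", "bob"},
}


def visible_rows(viewer):
    """Return only the rows this viewer is allowed to see (none if unknown)."""
    allowed = CAN_VIEW.get(viewer, set())
    return [row for row in STATS if row["owner"] in allowed]


print(visible_rows("bob"))      # bob's own row only
print(visible_rows("manager"))  # both rows
```

The filter itself is short. What takes the work is everything around it: authenticating viewers, maintaining the permission table, and applying the same rule consistently to every chart and export.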

This also holds true for the many use cases where reports need to be updated regularly and the audience expects polished reports.

On the other hand, data munging, ETL and, most importantly, complex analysis pipelines are not the strong suit of BI software; they are much better done in Python. Also, if you are providing an API that is intended for programmatic consumption, BI software is often stark naked.

Christian Sauer