I understand that you're looking for a shortcut: you can read about an architecture and produce an implementation that scores well on performance benchmarks. A good benchmark score feels rewarding, but on its own it is not an indicator that you are doing well or delivering a high-quality model. So what's the use of all the extra struggle?
The algorithms and architectures that are in vogue change. Anyone can memorize a list of an architecture's strengths and limitations. The struggle to understand the underlying technologies teaches you how to evaluate an architecture - and its use in a specific context - critically and semi-independently*. From that perspective, the architectures you're currently looking at are just a teaching tool.
Knowing that an architecture has a limitation does not mean that you can:
- Recognize whether that limitation applies to your use case.
- Understand how big the impact of the limitation is.
- Test for the limitation, or measure its extent.
- Mitigate or compensate for it.
All of the above requires a solid understanding of the architecture's components. An excellent example of a situation where the developers may well have known about their algorithm's limitations is ProPublica's coverage of algorithmic bias (via Wayback Machine). If you train a pattern-matching algorithm on biased data, you get a model with the same biases. That's well known, so how did this happen?
Did the developers of these models simply not know about that risk? Did they not care? Or did they not realize what this limitation meant for their use case and how to mitigate it?
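To make the failure mode concrete, here is a minimal, entirely synthetic sketch: the training labels are generated with a built-in bias against one group, a plain logistic regression is fit on them, and the resulting model reproduces the bias as a higher false positive rate for that group. The data, feature names, and numbers are all invented for illustration; this is not ProPublica's data or analysis.

```python
# Synthetic illustration only: biased labels in, biased model out.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical features: one legitimate risk score and one group attribute (0 = A, 1 = B).
risk = rng.normal(size=n)
group = rng.integers(0, 2, size=n)

# The "true" outcome depends only on the legitimate risk score...
true_outcome = (risk + rng.normal(scale=0.5, size=n)) > 0.5

# ...but the historical labels we train on were recorded with a bias:
# members of group B were flagged more often regardless of actual risk.
biased_labels = true_outcome | ((group == 1) & (rng.random(n) < 0.15))

X = np.column_stack([risk, group])
model = LogisticRegression().fit(X, biased_labels)
pred = model.predict(X)

# False positive rate per group, measured against the *true* outcome:
for g, name in [(0, "group A"), (1, "group B")]:
    mask = (group == g) & ~true_outcome
    print(f"{name}: false positive rate = {pred[mask].mean():.2f}")
# The gap between the two printed rates is the bias the model inherited
# from its training labels.
```

Note that the measurement only works because it compares predictions against the true outcomes rather than the labels the model was trained on - exactly the kind of test you only think to build if you understand where the bias enters the pipeline.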
*I do implementations, not original research. But I need to be able to read publications and to understand how their contents apply to my use case - even if the technology has never been applied in a similar context.