
Many examples of neural networks for language translation:

"the cat sat on the mat" -> [model] -> "le chat etait assis sur le tapis"

use RNNs, in particular LSTMs. See for example Sentences language translation with neural network, with a simple layer structure (if possible sequential) and A ten-minute introduction to sequence-to-sequence learning in Keras.

Are there any (not too complicated) attempts at doing language translation with CNNs only, without any RNN/LSTM?

Would you have an example in Keras?

Basj

3 Answers


I just googled:

  • A Convolutional Encoder Model for Neural Machine Translation, by Gehring et al., link
  • Convolutional Sequence to Sequence Learning, by Gehring et al., link
  • Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction, by Elbayad et al., link

All the implementations I found on GitHub are in PyTorch. I'm not surprised I didn't find much: applying CNNs to NLP was interesting, but they never beat RNNs at this task. After the advent of attention models, and especially Transformers, these kinds of models have not been developed further. In fact, the latest paper I linked is from 2018 - i.e. an Ice Age ago at the pace of NLP.

If you really want to go deeper into convolutional NMT, I suggest you check the available PyTorch code and try to replicate it in TensorFlow/Keras. It's hard work, but a fancy model nonetheless. Good luck!
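If it helps to have a starting point, here is a minimal Keras sketch of a convolutional encoder-decoder for translation. It is my own simplification, not a replica of the papers above; the vocabulary sizes, sequence lengths, layer widths and the dot-product attention layer are placeholder assumptions.

    # Minimal sketch of a CNN-only encoder-decoder for translation in Keras.
    # Not the papers' exact architectures; all sizes are made-up placeholders.
    from tensorflow.keras import layers, Model

    src_vocab, tgt_vocab = 5000, 5000   # assumed vocabulary sizes
    src_len, tgt_len = 20, 20           # assumed (padded) sequence lengths
    emb_dim, hid_dim = 128, 256

    # Encoder: embeddings + stacked 1D convolutions over the source sentence.
    src_in = layers.Input(shape=(src_len,), dtype="int32", name="source_tokens")
    x = layers.Embedding(src_vocab, emb_dim)(src_in)
    for _ in range(3):
        x = layers.Conv1D(hid_dim, kernel_size=3, padding="same", activation="relu")(x)
    enc_out = x                         # (batch, src_len, hid_dim)

    # Decoder: embeddings of the shifted target + causal convolutions, so each
    # position only sees earlier target tokens (teacher forcing at training time).
    tgt_in = layers.Input(shape=(tgt_len,), dtype="int32", name="shifted_target")
    y = layers.Embedding(tgt_vocab, emb_dim)(tgt_in)
    for _ in range(3):
        y = layers.Conv1D(hid_dim, kernel_size=3, padding="causal", activation="relu")(y)

    # Simple dot-product attention of decoder states over encoder states,
    # standing in for the more elaborate attention used in the papers.
    context = layers.Attention()([y, enc_out])
    y = layers.Concatenate()([y, context])
    probs = layers.Dense(tgt_vocab, activation="softmax")(y)

    model = Model([src_in, tgt_in], probs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    model.summary()

At inference time you would feed the decoder its own previous predictions token by token, just as with an RNN-based seq2seq decoder.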

Leevo

I found an overview paper from 2020 that summarizes this nicely:

The encoder and decoder are key components of NMT architectures. There are many methods to build powerful encoders and decoders, which can be roughly divided into three categories: recurrent neural network (RNN) based methods, convolution neural network (CNN) based methods and self-attention network (SAN) based methods.

https://www.sciencedirect.com/science/article/pii/S2666651020300024


CNNs and RNNs have different architectures and are designed to solve different problems.

Images have a lot of pixels and thereby a lot of features. Discarding some of those features doesn't change much of what the image conveys, and CNNs are designed to reduce the feature space in exactly that way.

NLP is context driven. The farther away a word is in a sentence, the less significant it is to the context/meaning of the current word. RNNs/LSTMs/Transformers are designed to sustain that memory as a function of the distance to the current word, so these architectures are better suited to NLP-style tasks. Attention helps achieve the same objective by putting the focus on specific words (and attention can of course be used with CNNs too).

Now for the original question: can a CNN be used instead of an RNN? Yes, but in that case you have to control how much memory the model has (via the CNN kernel/window size, stride, etc.) yourself, based on the sentences you want to understand/translate.
Put simply, RNNs are just better suited than CNNs to this problem, so the community has put more effort into them.
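As a rough illustration of that point, here is a small Keras sketch (all sizes made up) where the amount of context each position can see is fixed explicitly by the kernel size and dilation of the convolutions, instead of being carried along in a recurrent state:

    # Sketch: with 1D convolutions the "memory" per position is set by the
    # architecture (kernel size, dilation, depth), not by recurrent gates.
    from tensorflow.keras import layers, Model

    vocab, seq_len = 5000, 40          # made-up sizes
    tokens = layers.Input(shape=(seq_len,), dtype="int32")
    x = layers.Embedding(vocab, 64)(tokens)
    # Each causal Conv1D looks at kernel_size positions; dilation widens the window.
    x = layers.Conv1D(128, kernel_size=3, dilation_rate=1, padding="causal", activation="relu")(x)
    x = layers.Conv1D(128, kernel_size=3, dilation_rate=2, padding="causal", activation="relu")(x)
    x = layers.Conv1D(128, kernel_size=3, dilation_rate=4, padding="causal", activation="relu")(x)
    # Receptive field: 1 + 2*(1 + 2 + 4) = 15 tokens (the current one plus the 14 before it).
    probs = layers.Dense(vocab, activation="softmax")(x)
    Model(tokens, probs).summary()

Widening the kernels, increasing the dilation, or stacking more layers is how you would trade off "memory" against cost in such a model.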

Sandeep Bhutani