
I have to start off by saying I am 100% a beginner here.

I trained an RNN model on a 30-class dataset with over 90,000 samples, and it achieved less than 2% accuracy. Training the same model on a small subset of the same data (with only 3 classes), the accuracy shoots up to 97%. I'm not sure why it performs so badly on the large dataset.

I suspect the model might be too small to learn enough generalizable features, but the poor performance is putting me off committing the resources to train a larger model. Currently I have two layers with 256 hidden units each. Here is the paper detailing the model architecture (FastGRNN): http://manikvarma.org/pubs/kusupati18.pdf

Please share any input that could help me here. I'm mostly just perplexed by how it manages to keep performing worse than random (which would be about 3.3% for 30 classes).

adithom

1 Answer


Your observation of high accuracy (97%) on a 3-class subset but much lower accuracy (under 2%) on the full 30-class dataset is key to understanding this apparently strange behaviour, and it suggests several potential issues:

  1. Model Capacity:

    • The current architecture, with two layers of 256 hidden units, may lack the capacity to capture the complexity of a 30-class problem. Increasing the number of hidden units or layers could enhance the model's ability to learn more intricate patterns.
  2. Data Imbalance:

    • If certain classes are underrepresented, the model might struggle to learn distinguishing features for those classes. Ensure that the dataset is balanced, or apply techniques like oversampling, undersampling, or class weighting to address imbalance (a minimal class-weighting sketch follows this list).
  3. Regularisation:

    • Overfitting can occur if the model learns noise instead of meaningful patterns. Regularisation methods such as dropout, L2 regularisation, or early stopping can help mitigate this issue (see the dropout/weight-decay sketch after this list).
  4. Learning Rate and Optimisation:

    • An inappropriate learning rate can hinder convergence. Experiment with different learning rates and optimisation algorithms to find a suitable configuration (a coarse learning-rate sweep is sketched after this list).
  5. Feature Engineering:

    • The quality of input features significantly impacts model performance. Ensure that features are appropriately scaled, and consider incorporating domain-specific knowledge to enrich the feature representation (a feature-scaling sketch follows this list).
  6. Hyperparameter Tuning:

    • Systematically tuning hyperparameters, including batch size, sequence length, and activation functions, can lead to performance improvements.
  7. Model Architecture:

    • While FastGRNN is optimised for efficiency, its compactness might limit performance on more complex tasks. Exploring other architectures, such as standard gated RNNs (e.g., GRUs) or Long Short-Term Memory (LSTM) networks, may yield better results for your dataset (a minimal LSTM baseline is sketched after this list).
  8. Training Data Quality:

    • Ensure that the training data is clean and accurately labeled. Noisy or mislabeled data can adversely affect model performance (a quick label sanity check is sketched after this list).
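
Regarding point 2, here is a minimal sketch (in PyTorch, assuming your labels are integer class indices 0–29 in a NumPy array; the file name is a placeholder) of inspecting the class distribution and weighting the loss accordingly:

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical label array with integer class indices 0..29 for the
# training split; replace the path with wherever your labels live.
y_train = np.load("y_train.npy")

# Inspect the class distribution; a heavily skewed histogram points
# to an imbalance problem.
counts = np.bincount(y_train, minlength=30)
print(dict(enumerate(counts)))

# Inverse-frequency class weights passed to the loss, so that rare
# classes contribute more to the gradient.
weights = counts.sum() / (len(counts) * np.maximum(counts, 1))
criterion = nn.CrossEntropyLoss(weight=torch.tensor(weights, dtype=torch.float32))
```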
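
For point 3, here is a hedged sketch of adding dropout between recurrent layers plus L2 regularisation via weight decay. The GRU stand-in and the input dimension are assumptions, not the exact FastGRNN setup from the paper:

```python
import torch
import torch.nn as nn

class RNNClassifier(nn.Module):
    """Two-layer recurrent net sized like the 2 x 256 model in the
    question, with dropout between the recurrent layers (the GRU cell
    and other details here are assumptions)."""
    def __init__(self, input_dim, hidden=256, n_classes=30):
        super().__init__()
        self.rnn = nn.GRU(input_dim, hidden, num_layers=2,
                          batch_first=True, dropout=0.3)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                 # x: (batch, seq_len, input_dim)
        _, h = self.rnn(x)                # h: (num_layers, batch, hidden)
        return self.head(h[-1])           # logits from the top layer's state

model = RNNClassifier(input_dim=32)       # input_dim=32 is a placeholder
# Weight decay gives L2 regularisation without touching the model code.
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
```

Early stopping on a held-out validation set pairs well with both of these.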
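
For point 4, one cheap experiment is a coarse learning-rate sweep, judging each rate after a single epoch. This sketch reuses the `RNNClassifier` class defined above and synthetic stand-in tensors; swap in your real data loaders:

```python
import torch
import torch.nn as nn

# Synthetic stand-in data: 512 sequences of length 50 with 32 features
# and 30 classes; replace with the real training/validation tensors.
X = torch.randn(512, 50, 32)
y = torch.randint(0, 30, (512,))

def accuracy_after_one_epoch(lr):
    model = RNNClassifier(input_dim=32)   # class from the sketch above
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for i in range(0, len(X), 64):        # one pass, batch size 64
        opt.zero_grad()
        loss_fn(model(X[i:i + 64]), y[i:i + 64]).backward()
        opt.step()
    with torch.no_grad():
        return (model(X).argmax(dim=1) == y).float().mean().item()

for lr in (1e-2, 1e-3, 1e-4, 1e-5):
    print(f"lr={lr:g}  accuracy after one epoch: {accuracy_after_one_epoch(lr):.3f}")
```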
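
For point 5, the usual first step is standardising features with statistics computed on the training set only. A sketch with scikit-learn, assuming a 3-D feature array (the array name and file path are placeholders):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# X_train with shape (n_samples, seq_len, n_features); the file name
# is a placeholder for however your features are stored.
X_train = np.load("X_train.npy")
n, t, f = X_train.shape

# Fit per-feature mean/std over all timesteps of the training set only,
# then reshape back; apply scaler.transform (not fit) to val/test data.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train.reshape(-1, f)).reshape(n, t, f)
```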
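
For point 7, a plain LSTM baseline is quick to set up as a comparison; this is a generic sketch, not the architecture from the paper:

```python
import torch.nn as nn

class LSTMBaseline(nn.Module):
    """Plain two-layer LSTM with the same 256 hidden units; if this
    also fails on 30 classes, the problem is more likely in the data
    or training setup than in FastGRNN's compactness."""
    def __init__(self, input_dim, hidden=256, n_classes=30):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):
        _, (h, _) = self.lstm(x)          # h: (num_layers, batch, hidden)
        return self.head(h[-1])
```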
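
Finally, for point 8, a quick sanity check is worth running before anything else, because sub-random accuracy (below the ~3.3% chance level for 30 classes) usually means inputs and labels are systematically misaligned rather than merely noisy. Array and file names below are placeholders:

```python
import numpy as np

y_train = np.load("y_train.npy")   # hypothetical label file, as above
y_pred = np.load("y_pred.npy")     # model predictions on the same split

n_classes = 30
assert y_train.min() >= 0 and y_train.max() < n_classes, \
    "label indices outside 0..29 will silently corrupt the loss"

# Chance level and a majority-class baseline: any trained model should
# beat both. Sub-2% accuracy on 30 roughly balanced classes suggests
# inputs and labels are misaligned (e.g., shuffled independently)
# rather than merely hard to learn.
majority = np.bincount(y_train).max() / len(y_train)
print(f"chance: {1 / n_classes:.3f}  majority baseline: {majority:.3f}")
print(f"model:  {(y_pred == y_train).mean():.3f}")
```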

By systematically addressing these factors, you should be able to improve the model's performance on the full dataset. I would suggest starting with simpler models and gradually increasing complexity, monitoring performance at each step to identify and address specific issues. Alternatively, you could take an ablation-study approach, changing one component at a time to isolate its effect on performance.

Robert Long