I'd like to find out whether machine learning or network analysis methods can be used to detect automatically generated (spam) webpages. I'm particularly interested in webpages that structurally resemble a non-spam website but, on closer inspection, are total rubbish.
To test a method, I need some way of accessing (or generating) these webpages.
Question: Where can I access automatically generated (spam) webpages?
Here's a sample of the kind of webpage I'm thinking about: