If I had to paraphrase current NER methodology, it generally finds patterns in strings and builds its own "vocabulary", so to speak.
Naturally, it performs like a charm with a mammoth dataset that has been carefully curated and hand-labeled with entities.
But what if the system were first given a dictionary of named entities for each category, and then shown sample text for that category (literature, or even simple tweets, for example), from which it "learns" how those named entities appear in context?
The difference from a regex-based approach is subtle: a regex system merely matches strings, so its usefulness scales only with the size of the dictionary and the number of rules.
But this system would actually learn, from a small training set, how "eating an apple" and "eating at Apple" can both be classified accurately: the subtle grammar that distinguishes a mention of the fruit from a mention of the company.
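To make the contrast concrete, here is a minimal sketch of the purely dictionary-driven approach in Python. The gazetteer entries are toy assumptions for illustration; the point is that a plain string matcher tags both "apple" tokens identically, with no way to use context.

```python
import re

# Toy gazetteer -- hypothetical entries for illustration only.
GAZETTEER = {"apple": "ORG", "google": "ORG", "paris": "LOC"}

def dictionary_tag(text):
    """Tag every dictionary match, blind to surrounding context."""
    return [(m.group(), GAZETTEER.get(m.group().lower(), "O"))
            for m in re.finditer(r"\w+", text)]

# Both occurrences of "apple" get the same tag:
print(dictionary_tag("eating an apple"))  # apple -> ORG
print(dictionary_tag("eating at Apple"))  # Apple -> ORG
```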
Could someone offer some intuition on how this might be implemented in CRF++, CRFsuite, Stanford NER, or any other toolkit?
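For what it's worth, my rough intuition is that in CRF-based taggers the usual trick is to turn the dictionary into a feature rather than a rule, alongside context features, so the model learns when a dictionary hit actually signals an entity. Below is a minimal sketch using sklearn-crfsuite (a Python wrapper around CRFsuite); the gazetteer and the two training sentences are toy assumptions, not real data.

```python
import sklearn_crfsuite

# Toy gazetteer -- hypothetical entries for illustration only.
GAZETTEER = {"apple", "google", "paris"}

def token_features(sent, i):
    """Features for token i: the word itself, a dictionary-lookup
    flag, and context words so the model can learn that "at Apple"
    behaves differently from "an apple"."""
    word = sent[i]
    return {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "in_gazetteer": word.lower() in GAZETTEER,
        "prev.word": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next.word": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

# Tiny toy training set in BIO tags (illustration only; a real set
# would be larger, though far smaller than a full rule system).
train = [
    (["I", "am", "eating", "an", "apple"], ["O", "O", "O", "O", "O"]),
    (["We", "are", "eating", "at", "Apple"], ["O", "O", "O", "O", "B-ORG"]),
]

X = [[token_features(s, i) for i in range(len(s))] for s, _ in train]
y = [tags for _, tags in train]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X))
```

The key is the `in_gazetteer` feature: the CRF weighs it against the context features, so a dictionary hit in the wrong grammatical context need not be tagged as an entity. As far as I know, Stanford NER supports this directly through gazette files, and in CRF++ you can add a dictionary-flag column to the training data and reference it in the feature template.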
General disclaimer
I'm not a data scientist; this is just a passing thought from an ML enthusiast.