I need a set of news headlines and articles to help me in a project on automatic summarization. Is there such a dataset or something similar?
1 Answer
The most widely used datasets in text summarization research are the DUC (Document Understanding Conference) datasets. If you see a paper using a dataset named "DUC" followed by a year, such as "DUC 2004", it's from here.
I have also personally used the Reuters archive. You just need to download each article with wget or something similar. See also here.
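If you would rather script the download than call wget by hand, something like the following should work. This is only a minimal sketch, assuming you already have a list of article URLs; the `example.com` addresses, `local_name`, and `download_all` are hypothetical names made up for illustration, not part of any Reuters tooling.

```python
import time
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def local_name(url: str) -> str:
    """Derive a file name from the last path segment of the URL."""
    segment = urlparse(url).path.rstrip("/").split("/")[-1]
    name = segment or "index"
    return name if name.endswith(".html") else name + ".html"


def download_all(urls, out_dir="articles", delay=1.0):
    """Fetch each URL and save the raw HTML, skipping files already on disk."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        target = out / local_name(url)
        if not target.exists():
            with urllib.request.urlopen(url, timeout=30) as response:
                target.write_bytes(response.read())
            time.sleep(delay)  # pause between requests to be polite to the server
        saved.append(target)
    return saved


if __name__ == "__main__":
    # Hypothetical URLs; replace them with the article links you collected.
    article_urls = [
        "https://example.com/news/2016/some-story/",
        "https://example.com/news/2016/another-story/",
    ]
    for path in download_all(article_urls):
        print(path)
```

The saved files are raw HTML, so you would still need to extract the headline and body text before using them for summarization. Check the site's terms of use before downloading in bulk.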
The CNN / DailyMail dataset is also widely used in summarization, especially in recent years, although it labels itself as a Q&A dataset.
user12075