9

I'm trying to find multi-label classification datasets that are freely available online.

By "multi-label" I mean that each instance can be labeled with anywhere from a single to $k$ labels, where $k$ is the total number of different labels in the dataset. Typically all information about the labels would be represented in a binary matrix $\mathbf{M}$, where $\mathbf{M}_{ij}=1$ if instance $i$ has label $j$, and $0$ otherwise.
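To make the representation above concrete, here is a minimal sketch of building such a binary matrix from per-instance label sets. The function name `build_label_matrix` and the toy labels are hypothetical and chosen only for illustration. Sorting the labels to fix the column order is an assumption, not something the question prescribes.

```python
def build_label_matrix(instance_labels):
    """Return (labels, M) where M[i][j] == 1 iff instance i has label j."""
    # Collect the k distinct labels; sorting fixes an (arbitrary) column order.
    labels = sorted({label for labels_i in instance_labels for label in labels_i})
    column = {label: j for j, label in enumerate(labels)}

    # Start from an all-zero matrix with one row per instance.
    matrix = [[0] * len(labels) for _ in instance_labels]
    for i, labels_i in enumerate(instance_labels):
        for label in labels_i:
            matrix[i][column[label]] = 1
    return labels, matrix


# Hypothetical toy data: three instances carrying between 1 and k = 3 labels.
toy = [{"cat"}, {"cat", "dog"}, {"bird", "cat", "dog"}]
labels, M = build_label_matrix(toy)

print(labels)
for row in M:
    print(row)
```

Here `len(labels)` gives $k$, which is the quantity to check against a target range such as 20-200 when screening candidate datasets.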

I've found the following two datasets so far:

  1. iMaterialist Challenge (Fashion) at FGVC5 from Kaggle.com
  2. DeliciousMIL: A Data Set for Multi-Label Multi-Instance Learning with Instance Labels Data Set

I've also looked at the Mulan multi-label datasets page, but they are pretty opaquely (and sometimes erroneously) described.

Where can I find more multi-label datasets (preferably with 20-200 different labels in total)?

Bobson Dugnutt

3 Answers

4

You can find a complete repository of around 80 multi-label datasets here:

Adept
Eva Gibaja
2

Try the Kaggle Toxic Comments Challenge. You have to assign each comment to multiple classes at the same time, which makes it a multi-label classification problem.

tenshi
2

19 free datasets:

  1. United States Census Data: The U.S. Census Bureau publishes reams of demographic data at the state, city, and even zip code level. The data set is fantastic for creating geographic data visualizations and can be accessed on the Census Bureau website. Alternatively, the data can be accessed via an API. One convenient way to use that API is through the choroplethr R package. In general, this data is very clean and very comprehensive.

  2. FBI Crime Data: The FBI crime data set is fascinating. If you’re interested in analyzing time series data, you can use it to chart changes in crime rates at the national level over a 20-year period. Alternatively, you can look at the data geographically.

And many more here: https://www.springboard.com/blog/free-public-data-sets-data-science-project/

Stephen Rauch