4

I have to tag a dataset for NER, and I came across conll2002/esp. As I understand it so far, in IOB2 format the sentence 'Alex Larson is going to Los Angeles for a job interview with Candace Patrick' would be tagged like this:

Alex B-PER
Larson I-PER
is O
going O
to O
Los B-LOC
Angeles I-LOC
for O
a O
job O
interview O
with O
Candace B-PER
Patrick I-PER

Am I right? What about IOB format?

2 Answers

5

The difference is not related to the length of the named entities. Rather, it deals with how two adjacent named entities of the same type are labeled.

In IOB1 (IOB), B- is only used to separate two adjacent entities of the same type:

Today    O
Alice    I-PER
Bob      B-PER
and      O
I        O  # or I-PER if pronominals are being tagged
ate      O
lasagna  O

In IOB2, all entities begin with B-:

Today    O
Alice    B-PER
Bob      B-PER
and      O
I        O  # or B-PER if pronominals are being tagged
ate      O
lasagna  O

See Wikipedia
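If it helps, the rule can be sketched as a small conversion from IOB1 to IOB2. This is only an illustration: the function name `iob1_to_iob2` is made up here, and it assumes the input is a well-formed list of tags of the form `O`, `I-TYPE` or `B-TYPE`.

```python
def iob1_to_iob2(tags):
    """Convert IOB1 tags to IOB2 (hypothetical helper, not from any library)."""
    converted = []
    prev_type = None  # entity type of the previous token; None if it was O
    for tag in tags:
        if tag == "O":
            converted.append(tag)
            prev_type = None
            continue
        prefix, ent_type = tag.split("-", 1)
        # In IOB1 an I- tag opens a new chunk when the previous token was O
        # or had a different entity type; IOB2 marks that position with B-.
        if prefix == "I" and prev_type != ent_type:
            prefix = "B"
        converted.append(f"{prefix}-{ent_type}")
        prev_type = ent_type
    return converted


# Tags for: Today Alice Bob and I ate lasagna
iob1 = ["O", "I-PER", "B-PER", "O", "O", "O", "O"]
print(iob1_to_iob2(iob1))
# ['O', 'B-PER', 'B-PER', 'O', 'O', 'O', 'O']
```

The B-PER on Bob is already present in IOB1 because Bob directly follows another PER entity, so only Alice's tag changes.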

3

IOB: Here, I- is used for a token inside a chunk, O is used for a token outside any chunk, and B- is used only for the first token of a chunk that immediately follows another chunk of the same type. Because no entity in this example directly follows another entity of the same type, no B- tag appears:

Alex I-PER
is O
going O
to O
Los I-LOC
Angeles I-LOC

IOB2: It is the same as IOB, except that the B- tag is used at the beginning of every chunk (i.e. all chunks start with the B- tag).

Alex B-PER
is O
going O
to O
Los B-LOC
Angeles I-LOC
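To answer the "what about IOB" part for the full sentence in the question, here is a rough sketch that derives IOB (IOB1) tags from IOB2 tags. It assumes the usual IOB1 rule that B- is kept only when a chunk directly follows another chunk of the same type. The name `iob2_to_iob1` is invented for this example, and well-formed IOB2 input is assumed.

```python
def iob2_to_iob1(tags):
    """Convert IOB2 tags to IOB1 (hypothetical helper, not from any library)."""
    converted = []
    prev_type = None  # entity type of the previous token; None if it was O
    for tag in tags:
        if tag == "O":
            converted.append(tag)
            prev_type = None
            continue
        prefix, ent_type = tag.split("-", 1)
        # Keep B- only when the previous token belongs to a chunk of the
        # same type; otherwise IOB1 writes the chunk-initial token as I-.
        if prefix == "B" and prev_type != ent_type:
            prefix = "I"
        converted.append(f"{prefix}-{ent_type}")
        prev_type = ent_type
    return converted


tokens = ("Alex Larson is going to Los Angeles "
          "for a job interview with Candace Patrick").split()
iob2 = ["B-PER", "I-PER", "O", "O", "O", "B-LOC", "I-LOC",
        "O", "O", "O", "O", "O", "B-PER", "I-PER"]

for token, tag in zip(tokens, iob2_to_iob1(iob2)):
    print(token, tag)
```

For this sentence every B- becomes I-, since no entity is immediately followed by another entity of the same type.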
dnivog