how to generate/read a glyph in ocr?

Question

I have been practicing with some Machine Learning algorithms for OCR, but I have a doubt about the glyph concept. For example in the UCI repository:

https://archive.ics.uci.edu/ml/machine-learning-databases/letter-recognition/letter-recognition.names

where it says:

4. Relevant Information:

   The objective is to identify each of a large number of black-and-white
   rectangular pixel displays as one of the 26 capital letters in the English
   alphabet.  The character images were based on 20 different fonts and each
   letter within these 20 fonts was randomly distorted to produce a file of
   20,000 unique stimuli.  Each stimulus was converted into 16 primitive
   numerical attributes (statistical moments and edge counts) which were then
   scaled to fit into a range of integer values from 0 through 15.  We
   typically train on the first 16000 items and then use the resulting model
   to predict the letter category for the remaining 4000.  See the article
   cited above for more details.

and when I see the file letterdata.csv I find something like this:

"letter","xbox","ybox","width","height","onpix","xbar","ybar","x2bar","y2bar","xybar","x2ybar","xy2bar","xedge","xedgey","yedge","yedgex"
"T",2,8,3,5,1,8,13,0,6,6,10,8,0,8,0,8

So for what I have read this first line refers to the character T that has certain values according to the attributes:

7. Attribute Information:
     1. lettr   capital letter  (26 values from A to Z)
     2. x-box   horizontal position of box  (integer)
     3. y-box   vertical position of box    (integer)
     4. width   width of box            (integer)
     5. high    height of box           (integer)
     6. onpix   total # on pixels       (integer)
     7. x-bar   mean x of on pixels in box  (integer)
     8. y-bar   mean y of on pixels in box  (integer)
     9. x2bar   mean x variance         (integer)
    10. y2bar   mean y variance         (integer)
    11. xybar   mean x y correlation        (integer)
    12. x2ybr   mean of x * x * y       (integer)
    13. xy2br   mean of x * y * y       (integer)
    14. x-ege   mean edge count left to right   (integer)
    15. xegvy   correlation of x-ege with y (integer)
    16. y-ege   mean edge count bottom to top   (integer)
    17. yegvx   correlation of y-ege with x (integer)

while some attributes are easy to understand, but I have a problem to visualice this in my head; I mean a glyph it is not like a grid in which a handwritten character is put on? something like this:

]

but where is the individual pixel data on the dataset letterdata.csv that I mentioned at the beginning of the question? Also and for not opening another question thread, how these glyphs are generated and if there is a glyph reader in which I can put the data like: "T",2,8,3,5,1,8,13,0,6,6,10,8,0,8,0,8 and I can visualize the corresponding letter. Any help?

score 2 · Answer 1 · answered Feb 21 '16 at 17:40

This is addressed in the relevant information you quoted, specifically in the following sentence:

Each stimulus was converted into 16 primitive numerical attributes (statistical moments and edge counts) which were then scaled to fit into a range of integer values from 0 through 15.

Many machine learning algorithms won't be successful with the raw raster data, instead requiring features such as the 16 "primitive numerical attributes" mentioned above. (Deep learning algorithms should work correctly with the raster data, though.)

Computing these numerical attributed is a one-way process: you cannot recover the glyph from its statistics. We compute these statistics since we think they are the salient features which distinguish one letter from another.

how to generate/read a glyph in ocr?

1 Answers1