Optical Character Recognition Unicode subset