
Document OCR SDK

2015/07/30

The Yunmai Document OCR SDK provides Optical Character Recognition (OCR) technology for developers. By integrating the SDK, your software can extract text from images and create editable documents. This feature is typically used to manage large volumes of paper documents. The SDK can be integrated into mobile applications or PC software.

What the OCR SDK can do

Extract text from images: Identifies the characters in an image and retrieves the text.
Automatic image cropping: Automatically detects the document's position and trims the edges before running text recognition.
Character suggestion: After recognition is complete, the engine automatically detects potential errors in the results and offers character and word suggestions. For English, the engine suggests both characters and words; for Chinese, it suggests Chinese characters.
Export to PDF: Exports a double-layer PDF file (the page image plus a searchable text layer), so users can search the file's content by keyword.
Automatic image categorization: After scanning, the engine automatically detects the picture type and categorizes the images before running OCR. Possible categories include business card, ID card, passport, license, text document, landscape, etc.
Blur detection: Before text recognition, the engine analyzes image quality and determines whether the picture is blurry.
Supported image formats: JPG, PNG, BMP. The engine also supports manual adjustment of image properties such as brightness, contrast, and color mode.
Multi-language support: Simplified Chinese, Traditional Chinese, English, Spanish, French, German, Italian, Portuguese, Swedish, Danish, Norwegian, Finnish and Dutch.
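The page does not say how the engine decides that a picture is blurry. One widely used technique for this kind of check is the variance of the Laplacian: sharp images have strong edges, so the Laplacian response varies a lot, while blurry images produce a low variance. The sketch below is a generic illustration of that technique, not the Yunmai SDK's API; the function names and the BLUR_THRESHOLD value are hypothetical, and the image is assumed to be already decoded into a 2-D list of grayscale values.

```python
def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over the interior pixels.

    `gray` is a 2-D list of grayscale intensities (equal-length rows,
    at least 3x3). Low variance means few sharp edges, i.e. likely blur.
    """
    h, w = len(gray), len(gray[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3")
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Discrete Laplacian: sum of the 4 neighbours minus 4x the centre.
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


# Hypothetical threshold: it would need tuning on real captures
# for a given camera and resolution.
BLUR_THRESHOLD = 100.0


def is_blurry(gray, threshold=BLUR_THRESHOLD):
    """Flag the image as blurry when its Laplacian variance is too low."""
    return laplacian_variance(gray) < threshold


if __name__ == "__main__":
    # A checkerboard is maximally sharp; a uniform grey image has no edges.
    sharp = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]
    flat = [[128] * 8 for _ in range(8)]
    print(is_blurry(sharp), is_blurry(flat))  # prints "False True"
```

A check like this is cheap enough to run before OCR, so a blurry capture can be rejected and retaken instead of producing poor recognition results.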

How the text extractor works

Step 1: Capture a paper document or import an existing image;
Step 2: Crop the image and enhance its appearance;
Step 3: Run OCR to extract text from the image;
Step 4: Use character suggestions to correct errors, or edit the results manually.
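Step 4 relies on word suggestions for likely misrecognitions. The page does not describe how the engine generates them, so the following is only a minimal sketch of one common approach for alphabetic languages such as English: rank dictionary words by Levenshtein (edit) distance to the recognized word. The dictionary, the max_distance parameter, and the function names are hypothetical; Chinese character suggestions would need a different method.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def suggest_words(word, dictionary, max_distance=1):
    """Return dictionary words within max_distance edits, closest first."""
    scored = [(edit_distance(word.lower(), w), w) for w in dictionary]
    return [w for d, w in sorted(scored) if d <= max_distance]


if __name__ == "__main__":
    # "l" read instead of "i" is a typical OCR confusion.
    print(suggest_words("recognitlon", ["recognition", "recognize", "region"]))
    # prints ['recognition']
```

In practice the dictionary would be large and suggestions would be offered to the user for confirmation rather than applied automatically, matching the manual-edit option in Step 4.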

Character recognition results

The average recognition accuracy for each language group is as follows:
English characters: 97%
European characters: 99%
Chinese characters: 92%
These figures were obtained by recognizing an 8-megapixel image of a document containing 800 characters in 12-point font.
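To put these percentages in concrete terms: assuming they are per-character accuracy rates (the page does not state this explicitly), the expected number of misrecognized characters E in the N = 800 character test document is N times the error rate (1 - p):

```latex
\begin{aligned}
E &= N\,(1 - p), \qquad N = 800 \\
E_{\text{English}}  &= 800 \times (1 - 0.97) = 24 \\
E_{\text{European}} &= 800 \times (1 - 0.99) = 8 \\
E_{\text{Chinese}}  &= 800 \times (1 - 0.92) = 64
\end{aligned}
```

These residual errors are what the character-suggestion and manual-editing step is meant to catch.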

Character recognition speed

Recognition speed depends on the hardware and on the captured document. For example:
Hardware: smartphone with a 1.7 GHz CPU and 1 GB RAM, or better
Captured object: a paper document with about 800 characters
Recognition speed: the OCR process takes about 7 seconds.
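From this example, the implied throughput on the reference hardware works out as follows. This is a rough figure derived only from the numbers above; actual speed varies with hardware and document content.

```latex
\begin{aligned}
\text{throughput} &\approx \frac{800\ \text{characters}}{7\ \text{s}} \approx 114\ \text{characters/s} \\
\text{time per character} &\approx \frac{7000\ \text{ms}}{800} = 8.75\ \text{ms}
\end{aligned}
```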

Supported programming languages

The SDK is available for the following programming languages: Java, C++, C, Object Pascal, and Objective-C.

Other recognition SDKs we have developed

Business card recognition SDK
Bank card recognition SDK
Chinese citizen ID card recognition SDK

Contact information

Tel: +86 592 596 9372
Email: sales@yunmai.com
Website: http://www.yunmai.com/en/home.html