mbdick

OCR on a UNICODE PDF document

Question

I occasionally have to OCR a scanned PDF document whose text is in Unicode but not in one of the supported OCR languages. It is in 2,500-year-old Akkadian.
ša i-na bala-e (17) Iiri-ba damar.utu ṭūḫḫeṣi

(BTW, the last word does not exist; I made it up to show the type of letters I use.) How do I tweak PDFelement 6 Professional to just slavishly render the text as Unicode so I can search it?


2 answers to this question


Hi Mbdick,

I am afraid that 2,500-year-old Akkadian is not available in our OCR library.

Therefore, text in this language cannot be recognized well when performing OCR.

Please kindly understand. 

 


I think I worded my inquiry poorly. Of course I do not want OCR for a language that has been dead for 2,000 years! My problem, I guess, is getting PDFelement to recognize Unicode. Take my example, ša i-na bala-e Iiri-ba damar.utu ṭūḫḫeṣi: when PDFelement encounters a line like that in an article and creates a PDF, it seems to garble the line, even though it should be easily recognizable as Unicode (Adobe PDF did). It should also be able to recognize superscripts and subscripts. I don't expect it to know what the line says (OCR), but it should be able to reproduce it in a PDF. Is there a tweak that takes PDFelement beyond ASCII with accents?
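
To make the "beyond ASCII with accents" point concrete, here is a minimal Python sketch that lists which characters in the sample line fall outside Latin-1, the range that covers ordinary Western European accented letters. It has nothing to do with PDFelement itself; the helper name `non_latin1_chars` is made up for illustration, and the sample is normalized to NFC on the assumption that the letters are meant as single precomposed characters.

```python
import unicodedata

# Sample line from the question, normalized to NFC so that each accented
# letter is a single precomposed code point (an assumption about the input).
sample = unicodedata.normalize("NFC", "ša i-na bala-e Iiri-ba damar.utu ṭūḫḫeṣi")


def non_latin1_chars(text):
    """Return the unique characters above U+00FF, sorted by code point."""
    return sorted({ch for ch in text if ord(ch) > 0xFF})


# Print each character that a Latin-1-only pipeline could not represent.
for ch in non_latin1_chars(sample):
    print(f"U+{ord(ch):04X}  {ch}  {unicodedata.name(ch)}")
```

For this line, the characters reported are š (U+0161), ū (U+016B), ḫ (U+1E2B), ṣ (U+1E63) and ṭ (U+1E6D). None of them exists in Latin-1, so any tool that handles only "ASCII with accents" would be expected to garble them.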
