
Readiris User's Manual


Contents

1. Readiris for Mac OS: USER'S GUIDE

I.R.I.S. Document to Knowledge. © 2003 I.R.I.S. All rights reserved. OCR, Connectionist, Linguistic and AutoFormat technology by I.R.I.S. © 2003 I.R.I.S. All rights reserved.

SAVE TIME: NO MORE RETYPING

Congratulations on acquiring Readiris! This software package will undoubtedly be of great help in recapturing your texts, tables, graphics and business cards.

As efficient as computers are, you have to key in your information first. If you have ever retyped a 15-page report or a large table of figures, you know how tedious and time-consuming it can be. Use this state-of-the-art OCR package to enter text in your applications automatically, and you'll acquire an unprecedented level of efficiency and comfort.

Scan a printed or typed document, indicate the zones of interest (or have the system detect them for you), execute the character recognition and export the document to your wordprocessor. Documents composed of many pages are processed from start to finish in a single effort. A few mouse clicks beat long hours of work: as Readiris converts your paper and PDF documents into editable computer files, it is up to 40 times faster than manual retyping.

With the automatic mode of operation, the user's effort is reduced to a single click: he initiates the scanning and saves t
2. (Screenshot: the Preferences dialog with the options Scanner, Invert Image and Digital Camera.)

By doing this, you enhance the image before it gets recognized. There are specific challenges to be met when it comes to digital cameras: they produce low-resolution images, even when you hold the camera very close over your document, and the image resolution is in any case unknown.

There are some finer points to be aware of when it comes to successfully recognizing images captured with a digital camera:
- First of all, select the highest possible image resolution. Create, for instance, 2,048 x 1,536 images rather than 1,024 x 768 or 640 x 480 images, even though these sizes are also supported.
- Secondly, enable the macro mode of your camera to take close-ups, which is always the case when you photograph documents. This mode was designed to capture flowers, insects, etc. Otherwise the images are blurred and illegible.
- Limit yourself to no or little compression: heavy compression reduces the sharpness of the captured text.
- Zoom manually to crop your document. Some cameras are bundled with photo-stitching software, but don't bother using it for document capture.
- Hold the camera directly above the document to avoid capturing the document at an angle. However, avoid shadows cast on the document by the camera or your hand.
- Produce stable images. Consider mounting your camera on a tripod when necessary.
- Disable the flash when you're photographing glossy paper oth
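Why the highest resolution matters can be shown with a rough calculation: the effective resolution of a photographed page is the image width in pixels divided by the page width in inches. This sketch assumes, purely for illustration, an A4 page (210 mm wide) whose width exactly fills the image width; the function name `effective_dpi` is hypothetical and nothing here reflects how Readiris itself processes camera images.

```python
# Rough effective-resolution estimate for a document photographed with a
# digital camera. Assumption (illustration only): the page width fills the
# whole image width, with no margin around the document.

MM_PER_INCH = 25.4

def effective_dpi(image_width_px: int, page_width_mm: float = 210.0) -> float:
    """Pixels per inch across the page; 210 mm is an assumed A4 width."""
    page_width_in = page_width_mm / MM_PER_INCH
    return image_width_px / page_width_in

for width, height in [(2048, 1536), (1024, 768), (640, 480)]:
    print(f"{width} x {height}: about {effective_dpi(width):.0f} dpi")
```

Under these assumptions the three sizes give about 248, 124 and 77 dpi, all below the 300 dpi of the scanned sample images used elsewhere in this manual, which is why every other tip in the list above counts.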
3. REGISTER TO VOTE

We invite you to register your Readiris licence by submitting a registration form on the I.R.I.S. web site; this method obviously requires an Internet connection. You can access the registration form with the command Register Readiris under the Help menu.

You can register in many ways, not just via the web: by faxing or sending in your registration card, and by calling I.R.I.S. during working hours.

CONTACTING I.R.I.S.

To get product support, you can contact I.R.I.S. by e-mail at the address support@irislink.com. Please describe the phenomenon you experience clearly and include all relevant data concerning Readiris and your computer system.

The About Readiris dialog under the Readiris menu and the command I.R.I.S. on the Internet under the Help menu give direct access to the I.R.I.S. home page, www.irislink.com.

I.R.I.S.
Image Recognition Integrated Systems
Rue du Bosquet 10
1348 Louvain-la-Neuve
Belgium
Tel. +32 10 45 13 64
Fax +32 10 45 34 43

I.R.I.S. Inc.
Image Recognition Integrated Systems
Delray Office Plaza
4731 West Atlantic Avenue, Suite B1-B2
Delray Beach, FL 33445
USA
Tel. +1 561 921 0847, 800 447 4744
Fax +1 561 921 0854

E-mail info: info@irislink.com
E-mail sales: sales@irislink.com
E-mail support: support@irislink.com
I.R.I.S. home page: http://www.irislink.com
Readiris web site: http
4. …fault under the Settings menu to save the current settings, including your scanner model, as default settings for future use.

When you quit the Readiris software and the settings were modified, you are invited to save the current settings as default settings.

Settings files contain more than the scanner model: they also determine whether you are going to use interactive learning, which language and font type the documents have (for instance, a normal proportional font), which output mode is used (for instance, send HTML texts to Internet Explorer), etc. In short, all operational settings of Readiris are stored in the settings files.

SAVING SPECIFIC SETTINGS

The default settings are used at each program startup. To restore the default settings without having to quit the Readiris software, use the command Open Default under the Settings menu.

You can also save specific settings to avoid having to redefine the operational parameters. The commands Save As and Open under the Settings menu take care of this.

Let's give an example: if you regularly have to OCR German documents, you are recommended to create a settings file for this type of document. You would then select German as the document language, disable learning because the same t
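A settings file, as described above, bundles the operational parameters (scanner model, interactive learning, document language, font type, output mode) under one name. The sketch below models that idea with a Python dataclass serialized to JSON. The field names, the JSON encoding and the helper functions are all hypothetical; they do not describe the actual Readiris settings file format.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical model of a Readiris-style settings file. Field names and the
# JSON encoding are illustrative assumptions, not the real file format.
@dataclass
class OcrSettings:
    scanner: str = ""
    language: str = "English"
    interactive_learning: bool = True
    font_type: str = "Automatic"        # "Automatic" or "Dot Matrix"
    character_pitch: str = "Automatic"  # "Automatic", "Fixed" or "Proportional"
    output_format: str = "RTF"          # RTF is the default output format
    send_to: str = ""                   # target application, e.g. a browser

def save_settings(settings: OcrSettings, path: str) -> None:
    """Write all parameters to a named file (the role of Save As)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(asdict(settings), f, indent=2)

def open_settings(path: str) -> OcrSettings:
    """Restore a saved set of parameters (the role of Open)."""
    with open(path, encoding="utf-8") as f:
        return OcrSettings(**json.load(f))

# The German example: German as document language, learning disabled.
german = OcrSettings(language="German", interactive_learning=False)
```

In this sketch, `save_settings(german, "German.json")` followed later by `open_settings("German.json")` mirrors Save As and Open, and constructing `OcrSettings()` with no arguments plays the part of Open Default.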
5. Readiris is a state-of-the-art OCR package equipped with numerous advanced features. We will discuss all major features in this chapter and add many tips and hints concerning the use of Readiris.

STARTING THE SOFTWARE UP

Double-click on the Readiris application in the Readiris folder under Applications, or click the application icon on the dock. On a computer running Mac OS 9.x, you can double-click the alias for the Readiris application on your desktop.

The Readiris startup screen and the menu bar of the Readiris software are displayed. The startup screen displays the version and copyrights of the Readiris software. It also gives direct access to the I.R.I.S. home page: simply click on the URL www.irislink.com to visit the I.R.I.S. web site.

DISCOVERING THE READIRIS INTERFACE

The Readiris application contains not only a menu bar but also an image window and several toolbars that give quick access to the most frequent commands.

The vertical main toolbar gives quick acce
6. …the copyrights to the Readiris software, the OCR technology, the BCR technology, this manual and the on-line help.

AutoFormat, Connectionist, Linguistic technology, the IBCR II, the I.R.I.S. logo and Readiris are trademarks of I.R.I.S. Acrobat and Reader are registered trademarks of Adobe. Apple, AppleWorks, Mac OS and Safari are registered trademarks of Apple. Entourage, Excel, Internet Explorer and Word are registered trademarks of Microsoft.

Chapter 1: INSTALLATION

This chapter discusses the system requirements and installation of the Readiris software.

SYSTEM REQUIREMENTS

This is the minimal system configuration required to use Readiris on a computer equipped with the operating system Mac OS X:
- a Mac OS computer with a G3 processor;
- the operating system Mac OS X version 10.1 (version 10.2.x is recommended);
- 110 MB of free hard disk space.

This is the minimal system configuration required to use Readiris on a computer equipped with the operating system Mac OS 9.x:
- a Mac OS computer with a PowerPC processor (Readiris does not run on 680x0 processor-based computers);
- the operating system Mac OS 9.x; the system libraries QuickTime 4.0 and CarbonLib 1.4 or later are required, and if necessary CarbonLib 1.5 will be installed by the Readiris installer;
- 32 MB of free RAM;
- 110 MB of free hard disk space.

INSTALLING THE READIRIS SOFTWARE
7. …as is the case by default, Readiris creates a PDF file that contains the text result. Graphics may occur, but only when graphic zones occur on the page (photographs, artwork, etc.). In other words, the page image is not contained in the single-layered PDF file.

(Screenshot: the resulting PDF file opened in Adobe Reader, showing the following sample text.)

Autoformatting
The aim of autoformatting is to recreate a facsimile copy of the original document. The OCR process does more than just recognize your text: it can format it for you too. In a way, text recognition is becoming more and more page recognition or document recognition.
Whether your OCR software reformats the recognized text or not is up to the user. You can perform OCR because you just need the text, in which case you will edit and format it yourself, or you can recreate the source document, including its formatting.
The various levels of formatting are: creating body text, retaining the word and paragraph formatting, and creating a facsimile copy.
Creating body text means no formatting is applied: you get a continuous running text. All formatting, if any, is done afterwards by the user.
If you retain the word and paragraph formatting, the font type, size and typestyle are maintained across the recognition. The justification of the paragraphs is also detected. However, no graphics are captured and the columns aren't recreated: the paragraphs just follow eac
8. www.readiris.com
On-line shop: http://shop.irislink.com

Registering your Readiris licence allows us to keep you informed of future product developments and related I.R.I.S. products. The registration benefits, including free product support and special offers, are strictly limited to registered users.

COMFORT ISN'T LAZINESS

Some additional steps can be completed for maximal ease of use of Readiris.

On a Mac OS X system, drag the Readiris application to the dock to make it available at all times. You can drag the application away from the dock to remove it again. Also note that the dock is personal: each user who logs on to a machine may have his own set of applications on the dock.

Under Mac OS 9.x, it may be useful to create an alias. Use the command Make Alias of the Finder's File menu to do so. As a result, you'll be able to start the Readiris software directly from your desktop. You can also add Readiris to the folder Apple Menu Items. The software documentation that came with your Macintosh can tell you more about aliases and the Apple menu.

INSTALLING YOUR SCANNER UNDER READIRIS

Readiris exploits the Photoshop plug-in or Twain driver of each scanner to support it. In other words, as soon as there's a Photoshop plug-in or Twain driver available for your scanner model, Readiris supports it effortlessly. Under Mac
9. (Language menu: French, German, Greek, Italian, Russian, Spanish, Other.) Click the option Other to display the long list of languages that were not selected recently.

Readiris is far from limited to English: up to 104 languages are supported. All American and European languages are supported, including the Central European languages, Greek, Turkish, the Cyrillic languages such as Russian, and the Baltic languages. Optionally, you can read Asian documents: the extra module Asian OCR add-on offers recognition of Japanese, Simplified Chinese, Traditional Chinese and Korean. Simplified Chinese is used on China's mainland and in Singapore, whereas Traditional Chinese is used by Hong Kong, Taiwan, Macau and the overseas Chinese communities. Also note that the British and American (or should we say international) variants of the English language are distinguished.

Selecting the proper document language is imperative. Based on the selected language, the software knows which symbol set to recognize. Multilingual support ensures that exotic characters such as ß are recognized correctly.

Secondly, the software extensively uses linguistic databases to validate its results. Suppose that you have to read the word "president", where an ink stain makes the "r" look like an "f". Look
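The idea behind the "president" example can be illustrated with a toy dictionary check: when one character is ambiguous, try each candidate and keep the reading that forms a known word. This is a deliberately simplified sketch with a hypothetical three-word lexicon and a hypothetical function name; it is not the Readiris linguistic engine, whose internals this manual does not describe.

```python
# Toy illustration of dictionary-based validation of OCR candidates.
# The lexicon and the candidate lists are hypothetical examples.

LEXICON = {"president", "resident", "present"}

def resolve(prefix: str, candidates: list[str], suffix: str) -> str:
    """Return the first candidate that turns prefix + candidate + suffix into
    a known word; otherwise fall back to the first (best-ranked) candidate."""
    for ch in candidates:
        word = prefix + ch + suffix
        if word in LEXICON:
            return word
    return prefix + candidates[0] + suffix

# The ink stain makes the shape recognizer rank "f" above "r".
print(resolve("p", ["f", "r"], "esident"))  # prints "president"
```

Shape recognition alone would pick "pfesident"; the dictionary lookup overrides it because only "president" is a valid word.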
10. ONE: DECOMPOSING A SCANNED IMAGE

Now that the image is scanned, you have to indicate which parts you want to convert into editable text by drawing frames (so-called windows) around the zones of interest. Actually, Readiris will do this for you automatically when the option Page Analysis, under the Options button or under the Layout menu, is enabled. The page analysis is enabled by default.

To force Readiris to decompose the current page (because you disabled page analysis by accident, because you erroneously erased some windows and want to redo the page analysis, etc.), simply click the button Analyze Page on the image toolbar or click the command Analyze Page under the Process menu.

Select the document language before executing the page analysis when you are dealing with Asian documents. Specific routines are used for these languages: the interline spacing of Asian documents is in most cases bigger than in Western documents, the text is made up of small icons (ideograms) that could easily be seen as graphic zones in Western documents, and the text may run from top to bottom and from right to left. If you forgot to select the proper language, select it afterwards: Readiris re-executes the page analysis automatically.

Automatic page decomposition is particularly usef
11. The Readiris software is delivered compressed. To install it, you must run the installation program.

1. When booting your computer, select the appropriate Startup Disk.
If you are running the operating system Mac OS X on your computer, launch the Readiris installer under Mac OS X: doing so installs the necessary files to run Readiris as native software under both Mac OS X and Mac OS 9.x. The reverse does not hold: when the installer is run under Mac OS 9.x, you install the software under Mac OS 9.x but not under Mac OS X, even if this system is present on your hard disk.
2. Insert the Readiris CD-ROM.
3. Double-click on the Readiris installer and follow the on-screen instructions. You are recommended to use the easy installation: it places all the necessary files on your hard disk, including the sample images used in the tutorial of this manual.

The Readiris folder is created automatically by the installation program under the Applications folder.
12. …ate separate dictionaries for specific applications, for instance per type of document. For clarity, you are recommended to give meaningful names to the font dictionaries, for instance "Report", "Palatino", etc.

Training no longer has any effect when the dictionary is full: the results of the learning are no longer held in memory or written to a dictionary.

SAVING THE RESULTS IN A TEXT FILE

The interactive training concludes the character recognition. You will be prompted to save the OCR result to a text file; just click Save for the time being.

Click the Format button on the main toolbar or select the command Output Format under the Settings menu to discover the versatile output capabilities of Readiris.

(Screenshot: the Format dialog, with the Layout options Create Body Text, Retain Word and Paragraph Formatting and Recreate Source Document; the options Use Columns instead of Frames, Merge Lines into Paragraphs and Include Graphics; the PDF options Include Page Image and Create Bookmarks; and the Output options Ask File Name and Location and Send To.)

Readiris supports the file formats Text (ASCII), RTF (Rich Text Format), HTML and Adobe Acrobat PDF. The RTF format is used by default. Note that the file extension of the selected format is added automatically to the file name.

The option Ask F
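The automatic file extension can be pictured as a lookup from the selected output format to an extension. The .txt extension matches the English.txt file of the tutorial, and .rtf and .pdf are the usual extensions for those formats; .htm for HTML is an assumption, as is the rule that an extension already present is not doubled. The function name is hypothetical.

```python
from pathlib import PurePath

# Output format -> file extension. ".htm" for HTML is an assumption.
EXTENSIONS = {
    "Text (ASCII)": ".txt",
    "RTF": ".rtf",
    "HTML": ".htm",
    "PDF": ".pdf",
}

def output_file_name(name: str, fmt: str = "RTF") -> str:
    """Append the extension of the selected format unless it is already there."""
    ext = EXTENSIONS[fmt]
    if PurePath(name).suffix.lower() == ext:
        return name
    return name + ext

print(output_file_name("English", "Text (ASCII)"))  # English.txt
```

Defaulting `fmt` to "RTF" mirrors the manual's statement that RTF is the default output format.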
13. (Screenshot, continued: …Francisco, a Soldier; Reynaldo, Servant to Polonius; Players; Two Clowns, Grave-diggers; Fortinbras, Prince of Norway; A Captain; English Ambassadors; Ghost of Hamlet's Father; Gertrude, Queen of Denmark and Mother of Hamlet; Ophelia, Daughter to Polonius; Lords, Ladies, Officers, Soldiers, Sailors, Messengers and other Attendants. SCENE: Elsinore.)

SAVING GRAPHICS SEPARATELY

In our PDF example, the graphic was included in the recognized text; whether this is the case depends on the formatting option Include Graphics. Saving graphics inside the text is only possible with full autoformatting, not with a poor text format such as Text (ASCII).

Still, with Readiris you can save graphics without performing text recognition. As Readiris supports black-and-white, greyscale and color images, you can capture line-art graphics and photographs. How? Draw a graphic zone around the illustrations, cartoons, etc. you need. Creating graphic windows manually is done in the same way as drawing text and table windows: simply select the graphic window tool this time, on the image toolbar or under the Layout menu (command Draw Graphic Zones).

Similar to the other window types, the status bar of the image window tells you how many graphic zones there are.

Next, choose the command Save Page As under the File menu and enable the option Graphics Only. You are prompted to specify a fi
14. (Contents, continued)
…Gets More Intelligent Each Time
Learn
Don't Learn
Delete
Undo
PTS
ADO
The Role of Font Dictionaries
Saving the Results in a Text File
Sending the Result Directly to Your Application
Seeing the Text Result
Recognizing Multiple Pages
Organizing the Text Output
Setting Up Your Scanner
Scanning Documents
Bring Color to Your Text Scans
Different Devices, Different Resolution
15. You can also open multipage TIFF files. When you do so, a page number is added to the root of the image file name. Open the sample file Multipage.tif to give it a try: the various pages are displayed one after the other.

(Screenshot: the image window "Multipage 1, Page 1 of 5", with the status bar "11 text zone(s), 0 graphic zone(s), 0 table zone(s)" and "2000x2388x1, 597K, 300dpi", and the pages Multipage 1, Multipage 2 and Multipage 3 in the page toolbar.)

All images you scan or load into memory are added to the current document until you click the command Close Document or New Document under the File menu. Closing a document or creating a new one wipes the slate clean: any document loaded into memory, whether it contains a single page or multiple pages, is erased.

The page toolbar gives direct access to the various pages of the document. To go to a page, click it in the page toolbar. The selected page is highlig
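The page-naming rule (the root of the image file name plus a page number, so Multipage.tif yields Multipage 1, Multipage 2, and so on) can be written as a small helper. The function name is hypothetical; the page count of 5 comes from the "Page 1 of 5" shown in the example.

```python
from pathlib import PurePath

def page_names(image_file: str, page_count: int) -> list[str]:
    """Name each page of a multipage image after the file's root name."""
    root = PurePath(image_file).stem  # "Multipage.tif" -> "Multipage"
    return [f"{root} {n}" for n in range(1, page_count + 1)]

print(page_names("Multipage.tif", 5))
# ['Multipage 1', 'Multipage 2', 'Multipage 3', 'Multipage 4', 'Multipage 5']
```

The same rule would explain a page labelled "Sample 2" when a multipage PDF such as Sample.pdf is opened.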
16. …Microsoft Excel. Select HTML as text format and Excel as target application with the Format button.

(Screenshot: the Format dialog with HTML selected as format and Microsoft Excel selected under Send To.)

The spreadsheet is started up and the typical table structure, with rows and columns, gets recreated: you are immediately ready to process the data.

You may come across ungridded tables that the page analysis does not detect as table zones because the columns are too widely spaced; Readiris tries to avoid confusion with columnized text blocks. To create a table window manually, click on the table window tool in the image toolbar and proceed as usual.

GETTING ON-LINE HELP

This concludes our overview of Readiris. Some last-minute information may not be included in this manual, so we recommend that you consult the on-line help system for additional information on Readiris. Go to
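Sending a table to Excel through the HTML format works because a spreadsheet can read an HTML table as rows and columns. The sketch below builds such a table from recognized cell text using only Python's standard library. It illustrates the idea; it is not the markup Readiris actually generates, and the function name and sample cells are hypothetical.

```python
from html import escape

def rows_to_html_table(rows: list[list[str]]) -> str:
    """Wrap recognized cell text in <table>/<tr>/<td> elements, row by row.
    Cell text is HTML-escaped so characters such as & or < stay literal."""
    lines = ["<table>"]
    for row in rows:
        cells = "".join(f"<td>{escape(cell)}</td>" for cell in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)

print(rows_to_html_table([["Item", "Q1"], ["A&B", "123,985"]]))
```

Each recognized row becomes one `<tr>` and each cell one `<td>`, which is the row-and-column structure the spreadsheet recreates.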
17. …Word and Paragraph Formatting or Recreate Source Document enabled, and the tables get recreated. Open your wordprocessor to have a look at the result.

(Screenshot: the result in the wordprocessor, showing the following sample text.)

Reading Tables
Readiris recognizes tabular data and recreates them cell by cell in worksheets or as table objects inside wordprocessor files. To treat tables as table objects, you must retain the word and paragraph formatting or recreate the source document (see the Format button on the main toolbar).
The page analysis detects gridded and ungridded tables. Gridded or framed tables have borders around the cells, as does the sample below. The borders of the table cells get recreated.
(Gridded sample table: a performance test of CD-ROM drives.)
Ungridded tables don't have any borders around the cells. When the columns of ungridded tables are too widely spaced, the page analysis may not detect a table window, to avoid confusion with columnized text blocks. When your tables exclusively contain numeric characters, enable the numeric reading mode with the Language button on the main toolbar for increased accuracy.
(Ungridded sample table of figures.)
18. You then execute the recognition on image-only PDF files and save the OCR results as text-based PDF documents. Text-based PDF files are searchable and editable; image-only PDF files are not.

Finally, converting PDF files is a way of unlocking PDF content. You can recognize read-only PDF documents, where the text is normally inaccessible. With unprotected PDF files, the content can be retrieved, copied and saved to an RTF file; with read-only files, the content cannot be extracted, and these documents can only be viewed and printed.

An important nuance: Readiris does not open password-protected PDF documents, even though Readiris breaks down all other PDF security barriers.

Proceed as usual: load PDF files into memory as you open prescanned images, faxes, snapshots made with your digital camera, etc. You can give it a try with the file Sample.pdf in the Readiris image folder, if you care to.

(Screenshot: the image window "Sample 2, Page 2 of 5", 2479x3508x32, 33983K, 300dpi, showing the list of persons represented in Hamlet: Claudius, King of Denmark; Hamlet, Son to the former and Nephew to the present King; Polonius, Lord Chamberlain; Horatio, Friend to Hamlet; Laertes, Son to Polonius; Voltimand, Courtier; Cornelius, Courtier; Rosencrantz, Courtier; Guildenstern, Courtier; Osric, Courtier; A Gentleman, Courtier; A Priest; Marcellus, Officer; Bernardo, Officer.)
19. …can also set the image source with the Preferences command under the Readiris menu, and you can acquire images with the commands Open Document and Acquire Document under the File menu.

Color, greyscale and black-and-white images are supported on an equal basis. Readiris allows you to open FlashPix images, GIF images, JPEG images, MacPaint images, Photoshop images, PICT images, PNG images, QuickDraw GX images, QuickTime images, Silicon Graphics images, Targa images, uncompressed, PackBits- and Group 3-compressed TIFF images, multipage TIFF images and Windows bitmaps (BMP). Readiris also opens Adobe Acrobat PDF documents.

Loading prescanned images is particularly useful to convert your faxes into editable text files.

Select your scanner as image source, click the Open button and go to the folder Images under the Readiris folder.

(Screenshot: the Open dialog listing the sample images Alphabets.tif, Autoform.jpg, Brazilian.jpg, Czech.jpg, Deskew.jpg, Digital.jpg, Dutch.jpg, English.jpg, French.jpg, etc.)

Double-click the image English.jpg in the image folder, or click the image once and click the button Open. The image is read from disk and displayed in the image zone.

(Screenshot: the image window "English, Page 1 of 1", 1872x1985x32, 14546K, 300dpi, showing the sample text "A word about OCR".)
20. …enabled. Click on the windows you want to include. Windows you do not click on are simply ignored (excluded from recognition). It's easy to see which zones are selected and which aren't: the selected windows are numbered, the non-selected windows aren't.

TWO: WINDOWING A SCANNED IMAGE MANUALLY

Page analysis is the automatic way of zoning a scanned page. Alternatively, you can zone an image manually with the windowing tools of Readiris. These are available on the image toolbar and under the Layout menu (Draw Text Zones, Draw Graphic Zones, Draw Table Zones).

To draw a rectangle around a zone of interest, select the corresponding tool in the image toolbar or under the Layout menu, click the cursor in the upper-left corner of the window, stretch the window by moving the mouse to the lower-right corner and click again. Sides smaller than 1 mm are not allowed: they wouldn't even contain a single character anyway.

The windows are automatically sorted in the order of creation; numbers indicate the sort order. The status bar of the image window tells you how many zones of each type were created, for instance "11 text zone(s), 1 graphic zone(s), 1 table zone(s)".

You can also frame irregular text blocks by drawing polygonal wind
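At the 300 dpi shown on the status bars of the sample images, the 1 mm minimum corresponds to roughly 12 pixels. The conversion and a check of a window's sides against that minimum can be sketched as follows; treating the limit as a pixel threshold computed this way is an assumption, and both function names are hypothetical.

```python
MM_PER_INCH = 25.4

def mm_to_pixels(mm: float, dpi: int = 300) -> float:
    """Convert a length in millimetres to pixels at the given resolution."""
    return mm * dpi / MM_PER_INCH

def window_allowed(width_px: int, height_px: int, dpi: int = 300) -> bool:
    """Reject windows with a side shorter than 1 mm at the given resolution."""
    min_side = mm_to_pixels(1.0, dpi)
    return width_px >= min_side and height_px >= min_side

print(round(mm_to_pixels(1.0), 1))  # 11.8
```

At lower resolutions the same 1 mm shrinks to fewer pixels, so a fixed physical limit, rather than a fixed pixel count, keeps the rule independent of the scan resolution.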
21. INSTALLING SOFTWARE OPTIONS

There's a single software option available for the Readiris software: the Asian OCR add-on. It allows you to read Japanese, Traditional Chinese, Simplified Chinese and Korean.

(Screenshot: a Simplified Chinese sample page in the Readiris image window, 727x939x1, 88K, 300dpi.)

When you install this option, specific documentation becomes available that discusses how you can recognize Asian documents.

(Screenshot: the Readiris folder containing Images, Read Me, User's Manual and Reading Asian Languages.)

UNINSTALLING THE READIRIS SOFTWARE

Uninstalling the Readiris software is very easy: run the installer again, select the installation option Uninstall and click the Uninstall button. The same goes for the software options: run the uninstaller of these specific software options to erase them.
22. …of columns, text blocks and graphics follows your original document. In other words, Readiris allows you to archive a true copy of your documents, in the form of an editable and compact text file instead of a scanned image.

All this implies that the sorting of windows only partially applies when autoformatting is used: you can include and exclude zones, but any re-ordering of zones is simply ignored.

Here's an example of how it works. To get acquainted with this feature, open the image Autoform.jpg, which is found in the image folder.

(Screenshot: the image window "Autoform, Page 1 of 1", 11 text zone(s), 1 graphic zone(s), 1 table zone(s), 2110x2615x8, 5434K, 300dpi.)

Click the Format button, select the text format RTF (Rich Text Format) and the layout option Recreate Source Document. The option Merge Lines into Paragraphs is enabled by default.

Enable the option Ask File Name and Location to send the reading result to an RTF file or, if Microsoft Word is installed on your computer, send the OCR result to Microsoft Word.

Note that layout reconstruction is limited to the RTF format and, indirectly, to target applications that support the RTF format adequately. A poor format generating plain text, such as Text (ASCII), does not support advanced formatting codes and therefore cannot offer autoformatting. On the plus side, the RTF format is a widely used text format that can be opened by a
23. …the font type and character pitch. These commands do not apply to Asian documents. Let's clarify what this means.

Let's start with the command Font Type under the Settings menu. The font modes separate normal documents from dot-matrix printed documents. Draft or 9-pin dot-matrix symbols are made up of isolated, separate dots, and highly specialized routines are used to recognize them. Letter-quality dot-matrix printing, also called 24-pin or NLQ dot matrix, requires the normal setting, as do the printing qualities typeset, typewritten, laser-printed and inkjet-printed.

(Font Type submenu: Automatic, Dot Matrix.)

The setting Automatic means that Readiris detects the font mode automatically. Let Readiris auto-detect the font mode in all cases, unless you are sure dot-matrix documents are being read. Naturally, Automatic is the default value.

The tooltip of the Recognize button indicates the selected font type, automatic detection or dot matrix: "Recognize the document (font detection)" or "Recognize the document (dot matrix font)".

The character pitch can be set with the command Character Pitch under the Settings menu (Automatic, Fixed, Proportional).

With fixed or monospaced fonts, all symbols of the font have the same width: an "i" takes up as much horizontal space on a line as a "w
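The difference between the two pitch settings can be made concrete: in a fixed-pitch font every glyph takes the same width, so a set of measured widths collapses to a single value. The widths below are made-up numbers, the tolerance is an arbitrary assumption, and the function only illustrates the definition; it is not how Readiris detects the pitch.

```python
# Illustrative only: classify a font as fixed or proportional pitch from
# (hypothetical) glyph widths, allowing a small measurement tolerance.
def character_pitch(widths: dict[str, float], tolerance: float = 0.5) -> str:
    values = list(widths.values())
    if max(values) - min(values) <= tolerance:
        return "Fixed"
    return "Proportional"

courier_like = {"i": 10.0, "w": 10.0, "m": 10.0}   # made-up widths
palatino_like = {"i": 4.0, "w": 12.5, "m": 13.0}   # made-up widths
print(character_pitch(courier_like))   # Fixed
print(character_pitch(palatino_like))  # Proportional
```

The "i" versus "w" comparison in the text is exactly the spread this check measures.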
24. …tion. You have indeed converted a paper document into an editable computer file, up to 40 times faster than manual retyping. Go ahead and compare it with the image you have inside your Readiris window.

(Screenshot: the file English.txt opened in TextEdit, showing the recognized text.)

A word about OCR
The aim of OCR is to automatically enter printed text documents in a very effective and low-cost way. Although the first research and development on Optical Character Recognition (OCR) began more than 30 years ago, this technology is still unknown to most of the people who could use it for their document entry applications. Now you can use this effective tool in your office and unburden yourself of the fastidious task of retyping printed text. OCR is the most efficient and fastest tool to enter texts into your computer automatically.
The document is read by your scanner. This device acts as the eye of your computer and sends it the image. At this step, the document image is only a meaningless cloud of black points (pixels) on a white background. The OCR software has to extract text information from these pixels: it has to recognize shapes by assigning characters. The system extensively uses linguistic databases when analyzing the context, in this way finding correct solutions for difficult cases. The user trains the software on new characters and typestyles, which are recognized automatically later on. This learning module allows you to re
25. READIRIS TAKES YOU AROUND THE WORLD
Assuming that the windows are correctly defined, you are now almost ready to execute the character recognition. We say "almost" because we haven't verified the language and document settings yet. The language setting can be found on the main toolbar.
26. Finally, you can send your tables of figures directly to Microsoft Excel by selecting the spreadsheet as target application (refer to the Format button on the main toolbar).
Have a closer look at the gridded or framed table, the scanned table that had borders around the cells: the cells and the borders were recreated by Readiris one by one.
Let's concentrate on the ungridded table for a moment; it has no borders around the cells. Note that the page analysis has nevertheless detected it. There's another interesting aspect to this table: its content is purely numeric. For optimal OCR accuracy on such tables, we can limit the recognition to numeric symbols with the Language button. The numeric mode is not strictly numeric: it includes the digits 0 to 9, the comma, the dot and other symbols.
As you can only do this when the table doesn't contain any alphabetic symbols (otherwise the text portions won't be recognized correctly), we activate the numeric mode only when we recognize this table, not the rest of the document. When we do so, by selecting this table with the Sort button, we can send the OCR result directly to the spreadsheet.
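The rule of thumb above (use the numeric mode only for a table that contains no alphabetic symbols) can be expressed as a small check. The Python sketch below is purely illustrative: suits_numeric_mode is a hypothetical helper, not part of Readiris, and it assumes you already have the table's cell contents as strings, for instance from a trial recognition.

```python
def suits_numeric_mode(cells: list[str]) -> bool:
    """Return True when no cell contains an alphabetic character.

    Hypothetical helper mirroring the manual's rule: the numeric mode is
    only appropriate for tables without letters, otherwise the text
    portions won't be recognized correctly.
    """
    return not any(ch.isalpha() for cell in cells for ch in cell)


# Example cell contents (made up for illustration).
numeric_table = ["1.250,00", "3.400,50", "12", "7,5"]
mixed_table = ["Q1", "1.250,00", "Total", "4.650,50"]

print(suits_numeric_mode(numeric_table))  # True
print(suits_numeric_mode(mixed_table))    # False
```

A table with headings such as "Q1" or "Total" fails the check, so it is better recognized with a regular language setting.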
27. OS X, use the carbonized Photoshop plug-in, the Twain driver or the native Photoshop plug-in. Under Mac OS 9.x, the normal or carbonized Photoshop plug-in or the Twain driver must be installed. Here's how you install your scanner under Readiris.
Using the Photoshop plug-in
1. Install the scanner drivers using the CD-ROM that comes with your scanner. Doing so will install the Photoshop plug-in on your computer. If necessary, study the installation instructions that accompany your scanner carefully to ensure that these drivers are installed properly.
2. Verify that the scanner operates correctly with a scanning application other than Readiris.
3. Locate the Photoshop plug-in on your hard disk and copy it to your system's Application Support folder.
4. Start up the Readiris software.
5. Select your plug-in under Readiris with the option Scanner in the Preferences command under the Readiris menu. That shouldn't be too hard: your Photoshop plug-in will be the only scanner driver available under the Scanner option.
Using the Twain driver
1. Install the scanner drivers using the CD-ROM that comes with your scanner. Doing so will install the Twain driver on your computer. If necessary, study the installation instructions that accompany your s
28. THREE: SAVING WINDOWING TEMPLATES
The resulting windowing layouts can be saved as zoning templates for future use with the command Save As under the Layout menu, and loaded into memory with the command Open under the Layout menu. There's also a specific command that allows you to quickly save the current layout again.
If you have to recognize documents with a similar layout, for instance a 50-page report where the header and footer should be excluded, a single template can be applied to zone all 50 pages.
When you load a template into memory, the page analysis is disabled automatically. The zoning template remains active until you re-enable the page analysis.
Actually, there's a nice alternative to zoning templates: the preview tool Ignore Exterior Area limits the page decomposition to the cropped portion of the image. Select this tool and frame the portion of the image you want to process. When you're dealing with a multipage document, you can exclude the same outer zone from page analysis on every page. Re-execute the page analysis to cancel the image cropping, or change the zones manually.
29. Clicking the Send button exports the scans of the current document (Image_1.tif, Image_2.tif and so on).
Obviously, you can load the image files into memory with the Open button on the main toolbar or with the corresponding command under the File menu. Or double-click the icon of a Readiris image to load it into Readiris. You can even select several Readiris image files and double-click them to load them into memory simultaneously. Color, greyscale and black-and-white images are supported on an equal basis.
Readiris allows you to open FlashPix images, GIF images, JPEG images, MacPaint images, Photoshop images, PICT images, PNG images, QuickDraw GX images, QuickTime images, Silicon Graphics images, Targa images, uncompressed, PackBits and Group 3 compressed TIFF images, multipage TIFF images and Windows bitmaps (BMP). Readiris also opens Adobe Acrobat PDF documents.
This capability is particularly useful to convert your faxes into editable text files. If you have any influence over your correspondents, ask them to send faxes in fine quality: those faxes have the higher resolution of 200 dpi and will yield better OCR results.
RECOGNIZING TABLES
So far we've recognized texts and faxes, and we've saved graphics. Let's process a table now. Take a table of figures and scan it, or open the sample image Tables.jpg in the image folder. Actually, the image Table
31. ad virtually any font. In other words, the software gets more intelligent each time you use it. Copyright Image Recognition Integrated Systems. Web site: http://www.irislink.com
RECOGNIZING MULTIPLE PAGES
But how do you save the text of additional pages? Or, in other words, how do you process documents consisting of multiple pages? It's actually very simple: go on recognizing pages, but enable the option Append to File when you are saving to the same file.
But there's a more efficient way of recognizing several pages than scanning and OCRing them one after the other: processing multipage documents directly. To scan a document composed of several pages in one operation, enable the document feeder of your scanner. Study the Photoshop plug-in or Twain driver of your scanner to see how this works. Place the pages of your document in the automatic document feeder and start the scanning.
You can also open multiple prescanned images. To load several images, select the first image and hold down the Command key as you select additional images. To load a continuous range of images, select the first image and hold down the Shift key as you select the last image.
32. age width.
You can also Command-click over a region of the scanned image to zoom in at real size immediately; Command-click a second time to zoom out again. As soon as you press the Command key over the image preview, the mouse cursor changes accordingly.
Finally, the magnifying glass allows you to zoom in on specific details of the acquired images: click the Magnifying Glass button on the image toolbar, or Shift-click and drag the mouse across the image.
33. ality documents, faxes and dot matrix printouts. It copes beautifully with badly scanned and copied documents containing font shapes that are too light or too dark. Joined characters (ligatures) are resolved, and fragmented forms such as dot matrix symbols are recomposed.
User verification in pop-up style not only flags doubtful characters but also increases the system's precision. All solutions confirmed by the user are memorized, increasing speed and confidence as you go along. Using Readiris means rendering it more intelligent each time. This powerful learning tool allows you to train Readiris on special characters such as mathematical symbols and dingbats, but also to handle distorted fonts as you will find them in real documents.
To increase your productivity further, Readiris not only recognizes your texts but can format them for you as well. Make use of autoformatting and Readiris recreates a facsimile copy of the scanned document: the word, paragraph and page formatting of the original document are retained. Similar typefaces are used, and the point sizes and typestyles of the source document are maintained across the recognition. The placement of columns, text blocks and graphics follows your original documents. And as Readiris supports greyscale and color scanning effortlessly, you can recapture any graphics, be they line art, black-and-white photos or color illustrations. When a document contains tables, Readiris reorganizes them in real c
34. ance Byelorussian-English. In other words, don't just select Greek or Byelorussian as document language and hope that the Western symbols will come out fine.
Here's an example where a Russian text contains some English words; open the image file Alphabets.tif if you want to try it for yourself.
To mix other languages, simply select the language with the most extended character set. If you have a document where, say, a French translation is placed alongside an English text, you have to select French as language to ensure that the accented characters get recognized correctly.
DEFINING THE DOCUMENT CHARACTERISTICS
Now that the language is set, we'll turn to the other document characteristics. You can fine-tune the recognition by specifying some document features
35. are by default represented by a tilde (the ~ symbol). The reject character can be modified with the option Symbol for rejected characters in the Preferences command under the Readiris menu.
If necessary, enter a character or character string for the incorrect or unknown shape and click one of the following buttons:
Learn: You agree with the proposed solution or correct it. The program saves this doubtful character in the font dictionary as "sure" (final). Future recognition will no longer require your intervention: the shape is considered learnt once and for all. In the example above, the system stops on a damaged character and we click Learn to accept a shape which cannot be confused with other characters.
Don't Learn: You agree with the proposed solution or correct it. The difference with the Learn button is that the learnt symbol gets the status "unsure" in the dictionary. For future recognition, the system will propose the learnt solution but still require a confirmation. This button is used for symbols which might be confused with others: a defaced e which might be mistaken for a c, a damaged t which closely resembles an r, etc.
36. as is the case in this sentence. Think of documents produced using a typewriter, where the carriage moves a fixed distance for each typed symbol. A proportional pitch means that the width of a character depends on its shape: symbols like m and w are wider (take more horizontal space on a line) than thin characters such as i. Virtually all books, magazines and newspapers are printed in proportional pitch.
The simplest solution is to leave this option at all times on the default value Automatic, which means that Readiris will detect the character pitch automatically.
READIRIS GETS MORE INTELLIGENT EACH TIME
When the document language is selected and the document characteristics are set, you can click the Recognize button on the main toolbar or choose the command Recognize Document under the Process menu. The OCR progress is indicated on screen. You can press the Escape key to abort the text recognition.
Readiris enters the interactive learning phase at the end of the recognition when the learning is enabled. Interactive learning is enabled by default.
Font training can substantially enhance the accuracy of the recognition system. When the user tries to read distorted, defaced forms as are found in real documents, or stylized font shapes which Readiris does not recognize optimally, training can overcome this temporary failure. User lea
37. canner carefully to ensure that these drivers are installed properly.
2. Verify that the scanner operates correctly with a scanning application other than Readiris.
3. Start up the Readiris software.
4. Select your scanner model under Readiris with the option Scanner in the Preferences command under the Readiris menu.
More about scanner support can be found in the Read Me file that comes with the Readiris software. Don't hesitate to contact your scanner manufacturer or its representative should there be problems with scanner drivers. Most manufacturers allow you to download the latest versions of their scanner drivers from their web site.
GETTING PRODUCT SUPPORT
The Readiris Read Me file details how you can get technical support. Among other things, you can contact I.R.I.S. by e-mail at support@irislink.com. Please describe the phenomenon you experience clearly and include all relevant data concerning Readiris, your scanner and your computer system.
GETTING IN TOUCH WITH I.R.I.S.
You can also contact I.R.I.S. to learn more about its range of software solutions. The Readiris startup screen and the command I.R.I.S. on the Internet under the Help menu of Readiris bring you directly to the I.R.I.S. home page, www.irislink.com.
Chapter 2
GUIDED TOUR
38. come available as output targets.
To make things easier for you, you're prompted to assign target applications to the supported text formats the first time you run Readiris. Assigning an application to a text format allows Readiris to launch that application automatically when the recognition is done. Any choices you make here can be modified later on with the Format button on the toolbar.
Note that the Send to option also allows you to copy the recognized text to the clipboard, so there is no strict need to export the result to an application or save it to a text file.
SEEING THE TEXT RESULT
In conclusion, Readiris offers several methods when it comes to saving the OCR result: copying the result to the clipboard, saving the result in a text file, exporting the recognized document promptly to a target application, and even saving the result in a text file while sending the recognized document directly to an application.
After the OCR, the scanned image is redisplayed with the zoning as created, to be available for further processing; it stays there until you scan another page. You can now open the recognized text with your wordprocessor or text editor, or import it into your desktop publishing software or any other text-based applica
39. As was already indicated, powerful intelligent routines automatically convert color and greyscale images into black and white. Thanks to these routines, even tough cases get solved; here's how our difficult image gets binarized by Readiris.
40. DIFFERENT DEVICES, DIFFERENT RESOLUTION
Whatever your scanning mode may be, maintain a scanning resolution of 300 dpi. In all probability, this is not the default setting of your Photoshop plug-in or Twain driver. Select a resolution of 300 dpi for normal applications; use a higher resolution of 400 dpi for small print (below 10 point) and when the document is very degraded.
Readiris reads point sizes of 6 to 72 point (0.08 to 1 inch, or 0.21 to 2.54 cm). Readiris also recognizes drop letters (also called drop caps), large capitals that cover several lines, and assigns them to their starting line. These can of course be no bigger than 72 point.
Faxes have a resolution of 100 or 200 dpi; when you're creating images with a digital camera, the resolution is unknown; when you're opening images, the file header may contain an incorrect resolution. To process such images hassle-free, enable the option Process as 300 dpi under the Preferences command of the Readiris menu. This setting applies to both direct scanning and the opening of prescanned images.
When your images are acquired by a digital camera instead of a scanner, it is mandatory that you enable another special option: Digital Camera in the Preferences command. This parameter again applies to direct scanning and prescanned images.
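The point-size range quoted above follows from the typographic definition 1 point = 1/72 inch (and 1 inch = 2.54 cm). As a rough, illustrative check, the Python sketch below reproduces the conversion and adds the nominal (em) height of a point size in pixels at 300 and 400 dpi. The helper names are hypothetical and not part of Readiris; actual glyph heights are smaller than the nominal em height.

```python
POINTS_PER_INCH = 72  # typographic (PostScript) point
CM_PER_INCH = 2.54


def points_to_inches(points: float) -> float:
    """Convert a point size to inches."""
    return points / POINTS_PER_INCH


def points_to_cm(points: float) -> float:
    """Convert a point size to centimeters."""
    return points_to_inches(points) * CM_PER_INCH


def points_to_pixels(points: float, dpi: int) -> float:
    """Nominal height of a point size in image pixels at a given scan resolution."""
    return points_to_inches(points) * dpi


for size in (6, 10, 72):
    print(
        f"{size:>2} pt = {points_to_inches(size):.2f} in = {points_to_cm(size):.2f} cm, "
        f"{points_to_pixels(size, 300):.0f} px at 300 dpi, "
        f"{points_to_pixels(size, 400):.0f} px at 400 dpi"
    )
```

At 300 dpi a 6-point font spans only about 25 pixels of nominal height, against about 33 pixels at 400 dpi, which is consistent with the advice to scan small print at the higher resolution.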
41. ells and recreates the cell borders of the original tables. In other words, Readiris allows you to archive a true copy of your documents, as editable and compact text files instead of scanned images. Various levels of formatting are available; the choice is up to the user.
Readiris supports virtually all scanners through their Photoshop plug-in or Twain drivers: all models that have a Photoshop plug-in or Twain module are seamlessly supported.
TABLE OF CONTENTS
Save Time, No More Retyping
Table of Contents
Credits and Copyrights
Chapter 1 Installation
System Requirements
Installing the Readiris Software
Installing Software Options
Uninstalling the Readiris Software
Installing Your Scanner under Readiris
Using the Photoshop plug-in
42. erwise the image may be too light. Generally speaking, adapt the brightness and contrast to the environment (daylight, lamp light, neon light, etc.). Some cameras can be calibrated by filming a white document.
To give it a try, open the image Digital.jpg in the Readiris image folder and execute the recognition.
ADJUSTING THE SCANNED IMAGES
Scanning in greyscale and color isn't just useful to save the graphics with sufficient quality; in some instances it's also useful or necessary to obtain good OCR results. When text is printed on a color background, scanning in color may create the tone differences that are lacking in black-and-white images.
When there is only limited contrast between the text and the background, the background can create noise that renders the recognition difficult or impossible. Think for instance of black text printed on a dark background: when you scan such a document in black and white, you may not be able to drop the background color without losing the text information as well, however much you try to adjust the scanner brightness.
43. Adjusting the Scanned Image
Saving Default Settings
Saving Specific Settings
Recognizing Pages Automatically
Readiris Recreates Your Document Layout
Columns Please, Not Frames
Text Formatting, Part 2
Creating Portable Documents
Saving Graphics Separately
Reading Faxes and Deferred Recognition
Recognizing Tables
Getting On-line Help
CREDITS AND COPYRIGHTS
The Readiris software is designed and developed by I.R.I.S. OCR, Connectionist, Linguistic and AutoFormat technology by I.R.I.S. I.R.I.S. detains
44. ey don't contain any text, and re-sort the remaining zones, you can click the command Delete Small Zones under the Layout menu.
ONE AND A HALF: SORTING WINDOWS
Readiris not only detects the various blocks but also sorts them: by default the zones are sorted top-down, left to right, to cope with columnized documents. Numbers indicate the sort order.
Evidently, you can modify the sort order. To do so, click the Sort button or use the command Sort Zones under the Layout menu. The mouse cursor changes as soon as the sort mode is
45. The option Create Bookmarks sees to it that a bookmark is created for each document element: the graphics as well as the text blocks and tables. For the text zones, Readiris applies an intelligent algorithm to come up with a title (a summary) per zone; the tables and graphics are simply numbered. Another navigational element of PDF documents, page thumbnails, can be created dynamically by your Adobe Reader or Adobe Acrobat software.
OR READING THEM
Let's look the other way for a moment. As Readiris offers full support of the Adobe Acrobat PDF format, you won't just generate PDF files: you can also read them. Repurposing PDF documents may be a major application of Readiris, for several reasons.
First of all, it's a way of converting images into text: open image-based PDF documents, execute the recognition and save the OCR result to a text document in any supported text format. Text files are editable; image files are not.
Second, you can convert image-based PDF files to text-based PDF documents.
46. h other, etc. Autoformatting recreates a facsimile copy of the original document: the text blocks, graphics and tables are recreated in the same place, and the word and paragraph formatting are maintained across the recognition. As a result, you get a true copy of your source document, as a compact and editable text file rather than a scanned image of your document.
When this option is enabled, you get different results: Readiris creates a searchable PDF file that contains the recognized text and the page image. The page image is placed above the text in a two-layered PDF file. Use the Search tool of Adobe Reader or Adobe Acrobat and this quickly becomes obvious.
47. he text result; all intermediate steps are taken care of by Readiris. After the recognition, you can send the reading results directly to your favorite applications, be that a wordprocessor, spreadsheet or web browser. Readiris recognizes tabular data and recreates them as worksheets or as table objects inside your wordprocessor: your numeric data are immediately ready for further processing.
Based on the Connectionist technology from I.R.I.S., Readiris represents the best OCR has to offer. Font-independent feature extraction is complemented by self-learning techniques derived from a proprietary neural network. The system can learn new characters through context analysis; linguistic knowledge about syllables and words improves the OCR performance.
Readiris supports up to 104 languages: all American and European languages are supported, including the Central European languages, the Baltic languages, Greek and the Cyrillic (Russian) languages. Optionally, you can read four Asian languages: Japanese, Simplified and Traditional Chinese, and Korean. Readiris even copes with mixed alphabets: the software detects Western words that pop up in Greek, Cyrillic and Asian documents, since many untranscribable proper names, brand names, etc. are written using Western symbols.
Readiris uses linguistics during the recognition phase, not after it. As a direct result, Readiris recognizes documents of all kinds with top accuracy, including low qu
48. hen you enable the option All. Any windows you might have detected or drawn on the page are ignored.
The color mode of the original image (color, greyscale or black and white) is always maintained. Select an appropriate graphic format; various graphic formats are available, such as BMP, JPEG, Photoshop, PICT and PNG.
When you save a document as a JPEG file for deferred OCR, ensure that you maintain sufficient image quality: JPEG files with high compression rates degrade the image quality, and the performance of your OCR software can suffer as a consequence.
As we just indicated, the command Save Page As exclusively saves the current page. There's a much more efficient way of saving your scans in graphic files for later OCR: enable the image scanning mode. To do so, select the document type Image on the main toolbar or under the Settings menu. Note that the Recognize button is now replaced by the Send button.
Click the Format button to discover what this means. You have the same flexibility that you have when you're recognizing documents: you can save your scans in files and send them directly to a target application (Photoshop, the Preview application, etc.). Note how the Format button indicates the selected graphic format.
49. hted. You can also edit multipage documents, mainly to correct scanning errors: you can drag pages to the trashcan below to delete them, and you can drag and drop them to other locations in the document to reorder them. Start the recognition on the sample image Multipage.tif. If the interactive learning is enabled, you go through the recognition and learning phases page by page. When you click the Finish button, all decisions by the system thereafter are accepted without user validation. In other words, the interactive learning is aborted for all pages, and the OCR for this document continues in automatic mode. The recognition result of multipage documents is saved in a single output file: you are prompted to specify the filename after the first page, and the following pages get appended. When the recognition result is sent to a target application, multiple pages get created inside a single document.
ORGANIZING THE TEXT OUTPUT
Saving or exporting the text means more than selecting an output method (saving a file, sending the output to a target application or the clipboard, or doing both) or defining a filename for the output file. You also select a file format and determine the appearance of the recognized text. In short, you have to decide where you want to take the text before you launch the execution. Some options of the Format button allow you to influence the look of the text output. The text flow of the output d
50. ile Name and Location determines whether you are prompted to save the recognized text at the end of the recognition phase.
SENDING THE RESULT DIRECTLY TO YOUR APPLICATION
But we can also send the recognized text directly to our text application, as an alternative to saving a text file or simultaneously with it. For instance, if Microsoft Word functions as your target application, your wordprocessor will be started up automatically at the end of the recognition (if necessary), and the recognized text will be inserted inside a new document. The Send To feature offers a direct OCR link between your scanner and your Mac OS applications. Readiris exports recognized documents directly to any text-based Mac OS application: wordprocessors such as Microsoft Word, spreadsheets such as Microsoft Excel, web browsers such as Apple Safari, application suites such as AppleWorks, and standard Mac OS text applications such as TextEdit. Use the option Add Application to declare an application as a possible output target; all declared applications remain so until they are removed again with the option Remove Application. Select None to disable the use of a target application momentarily. You are recommended to assign different applications to the various formats, so that several applications be
51. image of your
To see the effect correctly, you need to enable the WYSIWYG mode of your wordprocessor (mostly called page layout mode). However, if you send the recognized document directly to Microsoft Word, the page (or print) layout view is activated automatically. In short, Readiris not only recognizes your texts but can format them for you as well. OCR isn't just text recognition anymore; it is becoming more and more page or document recognition as well.
COLUMNS, PLEASE, NOT FRAMES
The formatting option Use Columns instead of Frames determines how the autoformatting gets done: the text blocks, tables and graphics can either be stored in frames or in editable columns. Frames are separate containers for text, used to position several blocks of text, graphics and tables on a page. With columns, the text flows naturally from one column to the next, and columnized texts are much easier to edit. We assume here that real columns do occur on the scanned document: when the system is unable to detect columns in the source document, this formatting mode uses frames anyway as a fallback position. You can make good use of the image Columns.tif in the image folder if you want to try it.
52. The first two options concern color and greyscale images; the last one, Despeckle, exclusively concerns black and white images. Despeckling means that the parasite pixels (also called salt-and-pepper noise) are removed from black and white images. Be sure that you don't erase spots that are too big; otherwise, you might start erasing the dots on i's, portions of dot matrix letters, etc. For example, Despeckle can be set to remove 5-pixel dots, and the dialog warns that removing too large dots may erase useful information from the image. By enabling the option Page Despeckling under the Options button or under the Settings menu, the despeckling is executed automatically on every page loaded into memory. The best way of optimizing the images for the OCR process is this: place the adjustment window where it doesn't prevent you from judging the image adjustments you execute, then adapt the parameters, clicking Apply each time, until the image is crisp and clear.
SAVING DEFAULT SETTINGS
Set the program parameters correctly and click the command Save As De
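The despeckling idea, removing isolated groups of black pixels up to a given size, can be illustrated with a small sketch. It assumes a black and white page stored as rows of 1 (black) and 0 (white) and treats a speck as an 8-connected group of black pixels; the function name `despeckle` and the `max_dot_size` parameter are hypothetical and do not describe how Readiris implements the feature.

```python
from collections import deque


def despeckle(image, max_dot_size=5):
    """Return a copy of a black-and-white image with small specks removed.

    Assumption (not Readiris's implementation): `image` is a list of rows,
    1 = black pixel, 0 = white pixel, and a speck is an 8-connected group of
    black pixels containing at most `max_dot_size` pixels.
    """
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search collects one connected group of black pixels.
            group = []
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                group.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            # Groups no larger than the limit are treated as noise and whitened.
            if len(group) <= max_dot_size:
                for gy, gx in group:
                    result[gy][gx] = 0
    return result


page = [
    [1, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1],
]
print(despeckle(page))  # the lone dot disappears, the 6-pixel stroke stays
```

Raising `max_dot_size` removes larger noise, but in this sketch it would also remove the dots on i's or the dots of dot matrix letters, which is the risk the Despeckle warning describes.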
53. ing things up in the English lexicon, Readiris will detect autonomously that the word "president" is being read and that it doesn't make any sense to recognize the symbol "f". This self-learning technique is of course highly dependent on the linguistic context. Linguistics offer useful help to solve ambiguous cases, such as an "O" which might be mistaken for a "0". Another typical example is the letter "l" and the number "1", which have an identical form in many fonts (think of texts produced on old typewriters). The linguistic context helps to determine whether you are dealing with an "l" or a "1". The illustration below shows various shapes of "l" and "1": the shapes on the first line are unambiguous; the shapes on the second line are ambiguous, but linguistics can solve them. When the context does not suffice, the user inter
READIRIS CHANGES LANGUAGES AS NEEDED
But the buck doesn't stop here: Readiris can switch languages in the middle of a sentence without any help from the user. When Western words pop up in Greek, Cyrillic or Asian documents (many untranscribable proper names, brand names, etc. are written using the familiar Western symbols), Readiris can switch to the correct alphabet automatically. In other words, you can activate a mixed alphabet of Greek, Cyrillic or Asian and Western characters. Be sure to select Greek-English or the appropriate Cyrillic language setting, for inst
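The way linguistic context settles ambiguous shapes can be sketched as a lexicon lookup. This is a simplified illustration, not the Readiris method: the confusion sets (l/1, O/0 and, only to mirror the "president" example, t/f) and the function `resolve` are assumptions chosen for the example.

```python
from itertools import product

# Assumption for illustration: shapes that many fonts draw alike.
# The t/f pair only mirrors the "president" example in the text.
CONFUSABLE = {
    "l": "l1", "1": "l1",
    "O": "O0", "0": "O0",
    "t": "tf", "f": "tf",
}


def resolve(raw_word, lexicon):
    """Return, in sorted order, every reading of `raw_word` found in `lexicon`.

    Each confusable character is expanded into all of its alternatives, and
    only the combinations that form a known word are kept.
    """
    options = [CONFUSABLE.get(ch, ch) for ch in raw_word]
    candidates = {"".join(chars) for chars in product(*options)}
    return sorted(word for word in candidates if word in lexicon)


lexicon = {"president", "local", "precedent"}
print(resolve("presidenf", lexicon))  # ['president']
print(resolve("1oca1", lexicon))      # ['local']
```

Expanding every combination grows exponentially with the number of ambiguous characters, which is acceptable for a single word; a real recognizer would also weigh how confident it is about each individual shape.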
54. Furthermore, the button Fonts offers you control over the typefaces that get used to autoformat the document, but we recommend that you not change the default values. Select up to four fonts to be used in the documents created by Readiris: Font 1 (sans serif), Font 2 (serif), Font 3 (fixed; Courier New in the example) and Font 4 (narrow). As the dialog warns, it is not recommended to change the default fonts for Latin languages.
TEXT FORMATTING, PART 2
The other layout options are Create Body Text and Retain Word and Paragraph Formatting. Creating body text means you create a non-formatted running text: the text will be captured, but its formatting is entirely ig
55. lename
Determine which graphic file format you will use. Select a format that's supported by your paint or photo retouching software. A multitude of popular graphic formats is available: JPEG, Photoshop, PNG, PICT, TIFF and Windows bitmaps (BMP). The graphics are saved in a single file. You don't have to limit yourself to a single graphic, but if you draw several graphic windows, they will be collected (stacked) in a single file. You can use the crop command of your paint or photo retouching program to separate them. Sides smaller than mm are not allowed: bitmaps of that size hardly contain any information. Irregular (non-rectangular) windows are allowed, and so are several graphics. The surface not covered by your complex graphic zones remains white. In the example below, two graphic zones (one in the lower left corner and the other in the upper right corner) lead to lots of white space around the actual graphics.
READING FAXES AND DEFERRED RECOGNITION
Saving images as image files opens another possibility: you can save the full page and perform deferred OCR on it later on. That's what we did with the prescanned images of our tutorials. Simply scan a document and select the command Save Page As under the File menu. This command only saves single pages. You'll be prompted to save the entire page as a graphic file w
56. lligence they contain; in this way, Readiris takes into account the intelligence stored in these font libraries. You could say that Readiris gets more intelligent each time you use it. Initially, all input from the user is simply held in the computer's memory. No font shapes are actually saved until the user chooses the command Save As under the Learning menu. When he does so, all learnt shapes contained in RAM are stored in files called font dictionaries for future use. The command Open Dictionary allows you to load font dictionaries back into memory. The active dictionary is mentioned at all times in the title bar of the interactive learning window; when no dictionary has been saved yet, the name Untitled Training is used. Click the Abort button of the interactive learning in case you have loaded the wrong font dictionary. Use the command New Dictionary to unload whichever dictionary is loaded into memory. You can also complete existing dictionaries by loading them, performing extra learning and saving them again. There's a specific command that allows you to quickly save the current dictionary under its current name (Save "Sample Training" in the example), next to Save Dictionary As. Font dictionaries are limited to 500 shapes, and you are recommended to cre
57. lor under the View menu.
There's another way to import image files into Readiris: drop them on the Readiris icon. Readiris starts up and the image file is opened automatically. The image toolbar contains all the commands you need during the image preview: tools to analyze the page, to indicate the zones of interest, to rotate the image, etc.
ZOOMING IN ON IMAGES
Readiris has several commands that allow you to zoom in on the scanned image, for instance to verify the scanning quality. Click the Zoom Level button on the image toolbar or go to the View menu to discover the zoom levels: you can zoom in at real size, display the image at 50% and 200% of its actual size, fit the image to the page width, and fit the entire image in the preview window. At actual size, a screen pixel corresponds to an image pixel. Shortcuts are available for all zoom levels. Note that the current zoom level is indicated in the window title; there's no zoom level mentioned when the image fits the window or the p
58. mean the black and white threshold. The setting Automatic determines the bilevel threshold automatically. Apply a different threshold when necessary by darkening or lightening the black and white image: when you darken the image, more pixels become black in the black and white version; when you lighten the image, fewer pixels become black in the black and white version. Note above all that no image adjustment is executed until you click the Apply button. By clicking OK, you execute the adjustment and close the window. Here's an example where we lightened the black and white image dramatically, though admittedly not with OCR accuracy in mind.
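The brightness setting is a bilevel threshold, and its darken/lighten behaviour can be shown in a few lines. The sketch assumes greyscale pixels from 0 (black) to 255 (white); `binarize` and `automatic_threshold` are hypothetical names, and using the mean grey level for the Automatic case is an assumption, not the rule Readiris applies.

```python
def binarize(gray, threshold=128):
    """Turn a greyscale image into a black-and-white one.

    Assumption: `gray` is a list of rows of 0-255 values (0 = black,
    255 = white). A pixel becomes black (1) when its value is below
    `threshold`, otherwise white (0). Raising the threshold darkens the
    result (more black pixels); lowering it lightens the result.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray]


def automatic_threshold(gray):
    """Hypothetical stand-in for the Automatic setting: the mean grey level.

    This is an assumption for illustration, not Readiris's actual rule.
    """
    values = [value for row in gray for value in row]
    return sum(values) / len(values)


row = [[30, 100, 140, 220]]
print(binarize(row, automatic_threshold(row)))  # [[1, 1, 0, 0]]
print(binarize(row, 160))                        # darker: [[1, 1, 1, 0]]
print(binarize(row, 64))                         # lighter: [[1, 0, 0, 0]]
```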
59. nored. Use this option when you just need to recapture a text but not its layout. The option Retain Word and Paragraph Formatting represents the middle road: the word formatting (font type: serif, sans serif, proportional, fixed, normal, condensed; point size; and typestyle: bold, italic and underlined) is retained across the recognition, and so is the paragraph formatting (the tabs and the alignment: left, centered, right and justified). Don't confuse this formatting option with full autoformatting: this option just puts one paragraph after the other; it does not recreate columns or copy the relative position of the various zones.
CREATING PORTABLE DOCUMENTS
We still need to go deeper into one format: Adobe Acrobat PDF. Readiris allows you to create PDF documents and offers lots of options concerning PDF files: Include Page Image, Create Bookmarks, Merge Lines into Paragraphs and Include Graphics. As soon as the PDF format is selected, autoformatting applies and cannot be disabled. Enabling and disabling the option Include Page Image allows you to create PDF files of two types: when this option is disabled
60. ny popular wordprocessor. When the recognized text is opened using a wordprocessor, the text looks like this, without any intervention by the user.
61. ocument is directly influenced by the option Merge Lines into Paragraphs. Keep this option enabled to have Readiris detect the paragraphs: Readiris will then apply the normal wordwrap typical of wordprocessors. Otherwise, a carriage return is added after each line and hyphenated words remain so. Paragraph detection is enabled by default. Let's give an example to clear things up. When the first three lines of a column are "The new presi-", "dent waved from the balcony." and "His wife had joined him.", the paragraph detection gives you the following result: "The new president waved from the balcony. His wife had joined him." The hyphenated parts of the word "president" were reglued, and a space was added at the end of the first sentence, thus creating naturally flowing text. Had paragraph detection not been enabled, the original layout would have been retained, with a carriage return added at the end of each line.
SETTING UP YOUR SCANNER
Let's set your scanner up now. It is assumed that the scanner hardware and necessary software are installed correctly on your computer system. Actually, it's all very easy: Readiris exploits the Photoshop plug-in or Twain driver of each scanner to support it. In other words, as soon as there's a Photoshop plug-in or Twain module available for your scanner model, Readiris supports it effortlessly. In short, locate your scanner's Photosho
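The effect of Merge Lines into Paragraphs on the example column can be reproduced with a short sketch. It assumes the lines of one paragraph are already known and that a trailing "-" always marks a split word; `merge_lines` is a hypothetical helper, not part of Readiris.

```python
def merge_lines(lines):
    """Join the lines of one paragraph into flowing text.

    Assumption: a line ending in "-" always ends with a split word, so the
    hyphen is dropped and the next line is glued on directly; every other
    line break becomes a single space. Blank lines are ignored.
    """
    text = ""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if text.endswith("-"):
            text = text[:-1] + line   # reglue the hyphenated word
        elif text:
            text += " " + line        # the carriage return becomes a space
        else:
            text = line
    return text


column = ["The new presi-", "dent waved from the balcony.", "His wife had joined him."]
print(merge_lines(column))
# The new president waved from the balcony. His wife had joined him.
```

The hyphen rule is deliberately naive: a compound such as "well-known" split after its hyphen would come out as "wellknown". Checking the reglued word against a lexicon is one common way to tell the two cases apart.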
62. ode that best suits your needs. To include lineart graphics in the recognized documents, scan in black and white; to include black and white photos, scan in greyscales; to include color pictures, scan in color. Readiris processes true color images (16 million colors) by default, but you can process smaller images to limit the system requirements: use the option Reduce Colors To of the Preferences command under the Readiris menu to process 16-bit palette images (65,536 colors), 8-bit images (256 colors or greyscales) or 1-bit images (black and white). It goes without saying that greyscale and color images are slower to acquire and require more RAM than bilevel images. When you increase the color mode to true color, the required free RAM increases from 22 MB to 32 MB on Mac OS 9.x systems. This does not apply to computers that run Mac OS X: that operating system handles memory management entirely autonomously. Note that the image size and bit depth are mentioned on the status bar of the image window (for instance, 1872x1939x32 14209K 300dpi). Readiris creates a black and white version for every greyscale and color image. To view a scanned image in black and white, disable the option Image in Color under the View menu.
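The color counts quoted for each mode follow directly from the bit depth, and the same arithmetic shows why color pages need more memory than bilevel ones. The sketch assumes an uncompressed image without row padding or file headers and a US Letter page (8.5 x 11 inches) scanned at 300 dpi, i.e. 2550 x 3300 pixels; the function names are hypothetical.

```python
def colors_for_bit_depth(bits):
    """Number of distinct values a pixel of `bits` bits can take."""
    return 2 ** bits


def raw_image_bytes(width, height, bits_per_pixel):
    """Uncompressed image size in bytes (assumption: no row padding, no headers)."""
    return width * height * bits_per_pixel // 8


# Assumed page: US Letter (8.5 x 11 in) at 300 dpi.
width, height = int(8.5 * 300), 11 * 300  # 2550 x 3300 pixels
for bits in (1, 8, 16, 24):
    print(bits, colors_for_bit_depth(bits), raw_image_bytes(width, height, bits))
```

For that page, the 1-bit image takes 1,051,875 bytes, the 8-bit image 8,415,000 bytes and the 24-bit (16,777,216-color) image 25,245,000 bytes, 24 times the black and white version.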
63. of OCR is to automatically enter printed text documents in a very effective and low-cost way. Although the first research and development on Optical Character Recognition (OCR) began more than 30 years ago, this technology is still unknown to most of the people who could use it for their document entry applications. Now you can use this effective tool in your office and unburden yourself of the tedious task of retyping printed text. OCR is the most efficient and fastest tool to enter texts into your computer automatically. The document is read by your scanner. This device acts as the eye of your computer and sends it the image. At this step, the document image is only a meaningless cloud of black points (pixels) on a white background. The OCR software has to extract text information from these pixels: it has to recognize shapes by assigning characters. The system extensively uses linguistic databases when analyzing the context, in this way finding correct solutions for difficult cases. The user trains the software on new characters and typestyles, which are recognized automatically later on. This learning module allows you to read virtually any font. In other words, the software gets more intelligent each time you use it. For every greyscale and color image, a black and white version is generated for the OCR process. To display a greyscale or color image as black and white, disable the option Image in Co
64. ows around them. Non-rectangular windows are created by merging rectangular zones: as soon as two rectangles of the same type intersect, they become a single window automatically. In a way, you're building a house by adding one room after the other. Creating polygonal table windows doesn't make any sense. Furthermore, manual zoning can be combined with window sorting: you can draw new windows even when the sort mode is enabled. You then use sorting to include a number of detected windows, and manually create some other windows where the page analysis didn't yield the appropriate results. As soon as you start creating windows in the sort mode, all windows you didn't select are promptly erased. To modify, move and delete windows, you need to select them first. To do so, choose the window selection tool (Select Zones) in the image toolbar and click inside a window. Rectangular markers now appear at each corner and in the middle of the window sides. To unselect windows, click the mouse button elsewhere. To select additional windows, hold down the Shift key while clicking on these extra windows. So m
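The rule that intersecting rectangles of the same type become a single window can be sketched with a union-find grouping. Zones are assumed to be `(kind, (left, top, right, bottom))` pairs, only strictly overlapping rectangles count as intersecting, and `merge_zones` is a hypothetical helper used for illustration, not Readiris's own zoning code.

```python
def intersects(a, b):
    """True if rectangles a and b, given as (left, top, right, bottom), overlap."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def merge_zones(zones):
    """Group (kind, rect) zones into windows.

    Zones of the same kind whose rectangles intersect, directly or through a
    chain of other zones, end up in one window, returned as (kind, [rects]).
    Assumption: rectangles that merely share an edge do not intersect.
    """
    parent = list(range(len(zones)))

    def find(i):
        # Follow parent links to the representative zone of i's window.
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps chains short
            i = parent[i]
        return i

    for i in range(len(zones)):
        for j in range(i + 1, len(zones)):
            (kind_i, rect_i), (kind_j, rect_j) = zones[i], zones[j]
            if kind_i == kind_j and intersects(rect_i, rect_j):
                parent[find(i)] = find(j)

    windows = {}
    for i, (kind, rect) in enumerate(zones):
        windows.setdefault(find(i), (kind, []))[1].append(rect)
    return list(windows.values())


zones = [
    ("text", (0, 0, 10, 10)),
    ("text", (5, 5, 20, 20)),      # overlaps the first text zone
    ("graphic", (8, 8, 12, 12)),   # overlaps too, but is a different type
    ("text", (30, 30, 40, 40)),
]
for kind, rects in merge_zones(zones):
    print(kind, rects)
```

Here the two overlapping text rectangles form one non-rectangular window, while the graphic zone and the distant text zone stay separate windows.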
65. p plug-in on your hard disk and copy it to your system's Application Support folder. Next, select your plug-in in Readiris with the option Scanner of the Preferences command under the Readiris menu. To use a Twain driver, simply select it in the Preferences command. The option Invert Image allows you to generate inverted images; this option is useful to process full pages with white text on a dark background. These options do not apply to scanners using the Photoshop plug-in. The selected scanner is mentioned in the main toolbar; the title bar of the image window and the filename in the page toolbar indicate which scanner was used to acquire the image. In our example, page 1 was scanned with Agfa's ScanWise plug-in, and that plug-in is still the active scanner. Go to the Readiris Read Me file or to chapter 1 of this manual should you need further information.
SCANNING DOCUMENTS
Now that our scanner is set up, we want to get started scanning documents. The scanner's Photoshop plug-in or Twain driver is used to set the scanning resolution, the page format and orientation, brightness and contra
66. rning is also used to train the system on special symbols which Readiris is unable to recognize, such as mathematical and scientific symbols and dingbats. Some examples: Readiris can be trained to recognize the π symbol as "pi" or a dingbat as "Tel". However, the list of recognized symbols cannot be extended with these symbols themselves. The interactive learning is enabled with the Learn button on the main toolbar or with the option Interactive Learning under the Learning menu. Interactive learning does not apply to Asian documents: learning does not make sense for these languages, which use thousands of different symbols, and you'd have to be able to enter the ideograms, not an easy task when using a Western keyboard. At the end of the recognition, Readiris displays the recognized text progressively, and the system stops on doubtful characters or, if you are dealing with touching characters (ligatures), on doubtful character strings. They are always presented in their context, with the doubtful characters highlighted.
67. s.jpg contains two tables, and that's no coincidence. The page analysis zones them as table windows, and Readiris will reconstruct them for you by recreating the tables cell by cell in your spreadsheet or by inserting a table object inside your wordprocessor files (see the Format button on the main toolbar). The page analysis detects gridded and ungridded tables. Gridded (or framed) tables have borders around the cells, as does the example below; the borders of the table cells get recreated. Ungridded tables don't have any borders around the cells. When the columns of ungridded tables are too widely spaced, the page analysis may not detect a table window, to avoid confusion with columnized text blocks. When your tables exclusively contain numeric characters, enable the numeric reading mode with the Language button on the main toolbar for increased accuracy. Finally, you can send your tables of figures directly to Microsoft Excel by selecting the spreadsheet as target application (refer to the Format button on the main toolbar). Run the recognition with the layout option Retain
68. sing the Twain Driver 1-10
Getting Product Support 1-11
Getting in Touch with I.R.I.S. 1-11
Chapter 2: Guided Tour
Starting the Software Up 2-1
Discovering the Readiris Interface 2-2
Getting Started with a First Tutorial 2-4
Zooming In on Images 2-8
One: Decomposing a Scanned Image 2-11
One and a Half: Sorting Windows 2-13
Two: Windowing a Scanned Image Manually 2-16
Three: Saving Windowing Templates 2-19
Readiris Takes You around the World 2-21
Readiris Changes Languages As Needed 2-24
Defining the Document Characteristics 2-27
69. ss to all frequent general commands; the horizontal image toolbar contains all common commands you need during the image preview. To learn which command corresponds to a certain button, hold your mouse pointer over it for a while: the status bar of the image window will tell you what the button does (for instance, "Acquire a document with your scanner" or "Open a document from a file"). The window pane (or image zone) is where the scanned images are displayed. The status bar also displays all system information and gives information on the current image: the image size in image pixels and in KB, and the image resolution (for instance, 2110x2615x24 5434K 300dpi). When the image window is too small, some information may not be visible.
GETTING STARTED WITH A FIRST TUTORIAL
The best way to become familiar with the operation of Readiris is undoubtedly by using it. A number of prescanned images are provided with the software; they allow you to get started even when there is no scanner connected to your computer. Let's turn to them now. Readiris allows you to scan images using your scanner and to open prescanned images: select File as image source and use the Open button to open prescanned images; select your scanner as image source and use the Acquire button to acquire images with your scanner. You
70. st. The contrast setting is only available on some scanners. Which scanning options you have depends on your scanner model; refer to the software documentation that accompanies your scanner. There are some elements you should be aware of. First of all, pay some attention to line skew. Although the page analysis and recognition are skew-tolerant, it may become difficult to zone and OCR a page correctly when the skew is too significant. Limited line skew (less than 0.5°) can be ignored, because the OCR accuracy does not suffer. The option Page Deskewing under the Options button or under the Settings menu determines whether pages which were scanned at an angle will be deskewed (straightened) automatically; limited line skew gets ignored. This option is disabled by default. If you forgot to enable this option, use the command Deskew Page on the image toolbar or under the Process menu to straighten pages that
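The deskewing decision can be sketched as a tolerance check. The sketch assumes the skew is measured from two points on a text baseline and reads the manual's limit of 0.5 as degrees; `skew_angle` and `needs_deskew` are hypothetical helpers for illustration, not how Readiris measures skew.

```python
import math

# From the manual: line skew below 0.5 can be ignored (read here as degrees).
SKEW_TOLERANCE_DEG = 0.5


def skew_angle(x1, y1, x2, y2):
    """Angle in degrees of a baseline through (x1, y1) and (x2, y2).

    Assumption for illustration: the skew is measured from two points
    on one text baseline.
    """
    return math.degrees(math.atan2(y2 - y1, x2 - x1))


def needs_deskew(angle_deg, tolerance=SKEW_TOLERANCE_DEG):
    """True when the skew is large enough to be worth straightening."""
    return abs(angle_deg) >= tolerance


# A baseline that drops 5 pixels over 1000 is skewed by about 0.29 degrees.
print(needs_deskew(skew_angle(0, 0, 1000, 5)))   # False
print(needs_deskew(skew_angle(0, 0, 1000, 20)))  # True (about 1.15 degrees)
```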
71. ternet and mobile communications.
Should this still be necessary, the user can optimize the image further for the subsequent OCR process: select the Adjust Image button on the image toolbar or the command Adjust Image under the Process menu. When you access this command, the black and white version is displayed automatically, as if you had disabled the option Image in Color. There are some complicated concepts here, and we need to discuss them in detail. The option Smoothen Greyscale and Color Images renders greyscale and color images more homogeneous by flattening (smoothing out) relative differences in intensity. As a result, a sharper contrast is created between the foreground (the text) and the background (a color artwork, etc.). The image smoothening is also available as an option in the Preferences command under the Readiris menu. We suggest that you leave this option enabled at all times. Now for the brightness. By brightness, we actually
72. the Help menu to do so. The command Readiris Help allows you to navigate through the many help topics. You can also find more information on Readiris on the I.R.I.S. web site, www.irislink.com: the command I.R.I.S. on the Internet takes you directly to the I.R.I.S. home page.
73. …ther words, the software gets more intelligent each time you use it.

[Screenshot: interactive learning dialog with the buttons Undo, Delete, Finish, Abort and Don't Learn.]

The e in the example above is seriously damaged; in fact, it is close to the letter c, and you should click Don't Learn so that the system does not confuse it with the symbol c.

Delete: The displayed form is eliminated from the output. This button is used to ignore noise on the document (spots, coffee stains, etc.) that might get recognized as periods, commas and what have you, and to erase any other unwanted symbol.
Undo: You go back to correct mistakes. You can undo the last nine decisions.
Finish: The learning process is stopped, but the OCR continues in automatic mode. All decisions by the system thereafter are accepted without user validation. Click this button when you see that the recognition is highly accurate and does not require detailed proofreading.
Abort: Don't confuse Finish with the Abort button. With Abort, no output is generated and you start all over; with Finish, the text is created, it just isn't proofread in detail.

THE ROLE OF FONT DICTIONARIES

The results of each training session are temporarily held in the computer's memory, but can and should be stored in files called dictionaries for future use. Font dictionaries should be loaded into memory when you want to recognize similar documents, in order to make use of the extra inte…
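The limit of nine undoable learning decisions behaves like a bounded history: once it is full, each new decision pushes the oldest one out of reach. The sketch below only illustrates that behavior; the class `DecisionHistory` and its methods are hypothetical names, not part of Readiris. It relies on `collections.deque` with `maxlen`, which discards the oldest entry automatically when the limit is reached.

```python
from collections import deque


class DecisionHistory:
    """Hypothetical bounded history: only the last `limit` decisions can be undone."""

    def __init__(self, limit=9):
        # A deque with maxlen silently drops its oldest entry when it is full.
        self._decisions = deque(maxlen=limit)

    def record(self, shape_id, decision):
        """Store one learning decision, e.g. ("shape-17", "Don't Learn")."""
        self._decisions.append((shape_id, decision))

    def undo(self):
        """Remove and return the most recent decision, or None if none is left."""
        return self._decisions.pop() if self._decisions else None

    def __len__(self):
        return len(self._decisions)


history = DecisionHistory()
for i in range(12):
    history.record(f"shape-{i}", "Learn")
print(len(history))     # only the last nine decisions are kept
print(history.undo())   # the most recent decision comes back first
```

After twelve decisions, the first three can no longer be undone; the ninth consecutive undo empties the history and any further undo returns None.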
74. …uch for selecting zones.

To modify a window, select it, put your mouse cursor over a marker and drag the side to change the window size. To move a window, simply select it and drag it to another location.

To delete windows, select the window(s) and choose the Cut or Clear command from the Edit menu. The Cut command cuts the window(s) to an internal buffer; Clear erases the window(s) irretrievably. When you paste zones, they are inserted in their original position and you have to drag them to their new location.

In fact, all the familiar commands from the Edit menu apply to the windows: you can delete, cut, copy and paste them. The Undo command also applies: if you have accidentally deleted, moved or resized some zones, Undo will cancel the last operation.

[Screenshot: Edit menu with Undo, Cut, Copy, Paste, Clear and Select All.]

Also note that shortcuts are available for all commands. For example, to erase all existing windows, you can choose the command Select All (or its shortcut Command-A) and then the command Clear (or its shortcut Backspace). Alternatively, you can use the command Delete All Zones under the Layout menu to erase all windows simultaneously.

You are now ready to recreate the necessary layout. To restore the previous layout, choose Undo or press the shortcut Command-Z. Click Undo once more to erase the windows a second time.
75. …ul when columnized texts and documents with a complex page layout, possibly including graphics and tables, are recognized. Page decomposition uses three window types: text, graphic and table windows. Readiris discriminates text blocks, tables and graphic zones (containing photos, illustrations, etc.) on the page. Saving graphics and recognizing tables will be discussed at length below. A specific icon marks each zone type. Also note that you can Ctrl-click a zone to change its type or to delete it.

[Screenshot: a page with text, graphic and table zones; the Ctrl-click menu offers Graphic Zone, Table Zone and Delete Zone.]

Page analysis is fast, skew-tolerant and highly accurate: it traces complex, irregular shapes.

[Screenshot: page analysis tracing irregular zones on a magazine page.]

The page analysis will even detect zones where you get white text on a black background. Recognizing such inserts is no problem: while the preview displays the scanned document correctly on screen, Readiris inverts the image when it needs to recognize such text blocks.

Some documents have many stray dots on the page, may generate a black page border around the actual image, etc. To erase all small windows, it's assumed th…
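How might a program decide that a zone holds white text on a black background? One simple heuristic, assumed here for illustration and not documented as Readiris's method, is that ordinary printed text leaves most of a zone white, so a zone that is mostly black is probably reversed. The functions `black_ratio` and `normalize_polarity` and the 0.5 threshold are hypothetical. The sketch returns an inverted copy for recognition and leaves the original zone untouched, much as the preview keeps showing the document as scanned.

```python
def black_ratio(zone):
    """Fraction of black pixels (1s) in a binary zone given as a list of rows."""
    total = sum(len(row) for row in zone)
    return sum(sum(row) for row in zone) / total if total else 0.0


def normalize_polarity(zone, threshold=0.5):
    """Return a copy of the zone as dark text on a light background.

    Assumption: normal text covers well under half of a zone, so a zone
    whose black ratio exceeds `threshold` is treated as reversed
    (white text on black) and inverted.
    """
    if black_ratio(zone) > threshold:
        return [[1 - pixel for pixel in row] for row in zone]
    return [row[:] for row in zone]


reversed_zone = [
    [1, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
]
print(normalize_polarity(reversed_zone))  # a single black pixel on white
```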
76. …were scanned at an angle.

[Command: Deskew Page]

Deskewing takes a few seconds: the image is analyzed to detect the skew angle (if any), the color or greyscale image and its black-and-white version are deskewed, and the page analysis is re-executed.

You may also need to adjust the page orientation. Use the rotation tools on the image toolbar; the corresponding commands are found under the Process menu. Three rotation directions are available: to the right, to the left and upside down. Rotation also takes a few seconds, as the image itself is updated, not just the display on screen.

[Commands: Rotate Right, Rotate Left, Rotate 180°]

However, Readiris can correct badly oriented pages for you. Enable the option Page Orientation Detection under the Options button or under the Settings menu, and Readiris will correct the page orientation where needed.

You can make good use of the image Deskew.jpg in the image folder if you want to try this. Enable the options Page Deskewing and Page Orientation Detection before you open the image, and let Readiris restore the Tower of Pisa the way we like it.

[Screenshot: Deskew.jpg opened from a file, page 1 of 1, 300 dpi.]

BRING COLOR TO YOUR TEXT SCANS

Readiris supports black-and-white, greyscale and color images on an equal basis, so you are free to choose the color m…
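The manual does not say how the skew angle is detected. A common textbook technique is the projection profile: for each candidate angle, count the black pixels that fall on each slanted row and keep the angle that packs them into the fewest, fullest rows, since text lines then line up. The sketch below assumes that technique, with a hypothetical search range of ±5° in 0.5° steps; `estimate_skew` is an illustrative name, not Readiris's implementation.

```python
import math
from collections import Counter


def estimate_skew(black_pixels, max_angle=5.0, step=0.5):
    """Estimate the skew angle in degrees from (x, y) black pixel positions.

    y grows downward, so a positive angle means text lines descend to the
    right. For each candidate angle the pixels are binned into rows slanted
    by that angle; the angle with the most sharply peaked row histogram
    (largest sum of squared counts) is returned.
    """
    pixels = list(black_pixels)
    if not pixels:
        return 0.0
    best_angle, best_score = 0.0, -1
    steps = int(round(max_angle / step))
    for i in range(-steps, steps + 1):
        angle = i * step
        slope = math.tan(math.radians(angle))
        # Row index of each pixel once a skew of `angle` degrees is undone.
        rows = Counter(round(y - x * slope) for x, y in pixels)
        score = sum(count * count for count in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


# A synthetic text line drawn at 2 degrees.
line = [(x, round(10 + x * math.tan(math.radians(2.0)))) for x in range(200)]
print(estimate_skew(line))
```

For this synthetic line the estimate is 2.0; a deskewing step would then rotate the image by the opposite angle before page analysis runs again.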
77. …ypefaces are used systematically, etc.

RECOGNIZING PAGES AUTOMATICALLY

Now that our scanner is set up, we want to get started capturing documents. Instead of going through all the parameters, we'll execute automatic OCR, a very convenient way of recognizing pages. Click the Auto button or select the command Automatic OCR under the Process menu.

We will now perform fully automatic OCR; that is, we will recognize a page immediately, without any interruption. Automatic OCR means that a page is successively scanned, windowed by page analysis or a zoning template, and recognized without interactive learning. All you have to do is initiate the scanning and save the recognized text; the intermediate steps are handled by Readiris.

READIRIS RECREATES YOUR DOCUMENT LAYOUT

Automatic OCR, which automates the recognition process, should not be confused with autoformatting. Autoformatting means that Readiris recreates a facsimile copy of the scanned document: the word, paragraph and page formatting of your original document is applied. Similar typefaces (serif and sans serif, proportional and fixed, normal and condensed) are used as in the source document, and the point sizes and typestyles (bold, italic and underlined) are maintained across the recognition. The tabs and the alignment (left, centered, right and justified) of each text block are recreated. The placement…
