
BACHELOR THESIS Michal Kebrt Word-to


Contents

1. [Figures: the same sample document (colors and font sizes, source code, indenting, EQ field and page reference, Table 1: Sample table) converted with Word-to-LaTeX and converted with Word2TeX.]
    Chapter 4: Conclusion
    The Word-to-LaTeX convertor has almost all the features from the specification that was given in April 2005. A short list of features th
2. E lt font gt FONT_SANS_SERIF sans serif font e g Arial Verdana S textsf E S lt font type sans serif gt E lt font gt FONT_SIZE1 font size group 1 S tiny E S lt font size value 1 gt E lt font size gt FONT_SIZE2 font size group 2 S scriptsize E S lt font size value 2 gt E lt font size gt FONT_SIZE3 font size group 3 S footnotesize E S lt font size value 3 gt E lt font size gt FONT_SIZE4 font size group 4 S small Ee 9 lt font size value 4 gt E lt font size gt FONT SIZE5 font size group 5 S normalsize E S lt font size value 5 gt E lt font size gt Table B 2 Conversion mappings 58 FONT_SIZE6 font size group 6 S large E S lt font size value 6 gt E lt font size gt FONT_SIZE7 font size group 7 8 Large E S lt font size value 7 gt E lt font size gt FONT_SIZE8 font size group 8 S LARGE E S lt font size value 8 gt E lt font size gt FONT_SIZE9 font size group 9 S huge E S lt font size value 9 gt E lt font size gt FONT_SIZE10 font size group 10 S Huge E S lt font size value 10 gt E lt font size gt HEADING1 heading level 1 headings have to
3. [Figure 5.3: Preamble tab (screenshot of the preamble editor and the list of output format special characters, with Move up and Move down buttons).]
    The document preamble, inserted at the top of output files, can be easily edited in this dialog. Table 5.4 shows the list of macros that can be used in the preamble. The translations of Output format special characters (e.g. \ in LaTeX or < in XML) are defined in the right part of this dialog. Take care to enter these characters in the right order, because some special characters are used in the translation of other special characters (e.g. \ must be at the top for LaTeX output). New characters can be added by double-clicking the pink row.
    5.6.4 Special characters
    Special characters are divided into groups according to their Unicode [11] positions. Each character can have a translation used in regular text context and a math translation used in math context. Currently, when a character has both translations defined, the text translation is always used. If it has only a math translation, the character is inserted as a simple inline equation. If no translation is defined, the character is inserted as is, in UTF-8 encoding. The math tran
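The ordering rule for output format special characters can be illustrated with a small sketch. It is written in Python rather than the convertor's own language, and it assumes the translations are applied one after another over the whole text. The table contents are illustrative: the \textbackslash, \textasciitilde and \textasciicircum translations appear in the excerpts, while the \$ entry and the function name are assumptions.

```python
# Hypothetical translation table; order matters because the replacements
# themselves contain a backslash.
LATEX_CHARS = [
    ("\\", "\\textbackslash "),   # must come first
    ("$", "\\$"),                 # assumed entry, for illustration only
    ("~", "\\textasciitilde "),
    ("^", "\\textasciicircum "),
]


def translate_special(text, table=LATEX_CHARS):
    """Apply each (character, translation) pair in table order."""
    for char, replacement in table:
        text = text.replace(char, replacement)
    return text


correct = translate_special("a\\b$")
# With the backslash rule last, the backslash produced by "\$" is translated again.
wrong = translate_special("a\\b$", list(reversed(LATEX_CHARS)))
```

With the backslash rule at the top, the backslashes introduced by later translations are left alone; with it at the bottom, they are translated a second time and the output is corrupted.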
4. DOCUMENT_BODY document body S begin document WL NL E end document S lt body gt E lt body gt lt document gt Table B 2 Conversion mappings 64 LIST ENUMERATE enumerated list S begin fenumerate WL NL E end enumerate WL NLOWL NL S WL NL lt list type enumerate gt E lt list gt WL NL LIST_ITEMIZE itemized list S begin fitemize WL NL E end itemize WL NLOWL NL S WL NL lt list type itemize gt E lt list gt WL NL LIST_ITEM list item S WL TAB item E Ji S lt list item gt E lt list item gt WL NL PARAGRAPH common paragraph S E GWL NLOWL NL S WL NL lt para gt E lt para gt WL NL TABLE PARAGRAPH paragraph in a table S WL NL E WL NL S WL NL lt table para gt E lt table para gt WL NL LIST PARAGRAPH paragraph in a list ET E WL NL S lt list para gt E lt list para gt LINE BREAK line break S WL NL WL NL S lt linebreak gt TAB tabulator S hspace 15pt S lt tab gt Table B 2 Conversion mappings 65 TABLE CELL table cell e HWIDTH cell width S amp EZ S lt table cell width WIDTH gt E lt table cell gt TABLE ROW table row S E WL NL S lt table row gt E lt table row gt TABLE table e TITLE title of the table 8 WL
5. [Figure 2.11: Running the convertor from a VBA macro.]
    Chapter 3: Related projects
    3.1 Summary
    There are a couple of programs that convert Word documents to LaTeX. Only one of them, Word2TeX, is good enough to be described in detail and compared with Word-to-LaTeX, in section 3.2. The other convertors are listed only briefly; more details can be found in [12].
    Word-independent convertors:
    - wsW2LTX [13]: based on the cross-platform wv library, which gives access to Word binary files. The convertor has no customization options and doesn't convert font sizes, user styles, headings, paragraph alignment, etc.
    - Antiword [14]: widely portable; converts only to plain text or PostScript. Font styles and sizes, footnotes, lists, tables, etc. are converted. Problems with figures sometimes occur.
    Convertors that need Word installed:
    - GrindEQ [15]: works as a Word add-in. It cannot be customized and doesn't handle lists, headings, font sizes, paragraph indentation, special characters, graphics, etc.
    - A couple of very old convertors (e.g. Word TEX) can be found at CTAN sites [16].
    RTF-to-LaTeX convertors (RTF is a document file format that can be read and exported by most text processors, including Microsoft Word):
    - rtf2latex2e [17]: produces quite nice LaTeX output. It converts font styles, footnotes, tables, paragraph styles, Equation Editor equations and some figures.
    - Other RTF-to-LaTeX convertors can be found at CTAN
6. The WL PAGE_SIZE macro will be replaced with a value depending on the Page size processing option, as shown in table 5.3.
    Table 5.3: Page size processing options
    Option name | WL PAGE_SIZE will be replaced with
    complete | the complete definition of the page size, matching the page size of the input document
    symbolic | the convertor will try to translate the symbolic page size of the input document (e.g. A4) to an appropriate LaTeX size (e.g. letterpaper)
    use Page size | the value of the Page size option
    Translations
    The translation mappings between input document elements and LaTeX commands are defined here. They cover headings, font styles, footnotes, tables, alignments, colors and so on. Each element has a Start command, which is inserted before the element itself, and an End command, inserted after the element. For example, let Some text appear in italics in the document, and let the FONT_ITALIC mapping be \textit{ for the start command and } for the end command. Then \textit{Some text} will be written to the output file. The complete overview of translated elements, with the default mappings for LaTeX and XML output, can be found in section B.2.
    5.6.3 Document preamble
    [Screenshot: the Word-to-LaTeX configuration dialog, showing the Output format special characters list and the start of the document preamble.]
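The start/end command idea can be sketched in a few lines. This is an illustrative Python sketch, not the convertor's code: the function name and the dictionary layout are hypothetical, and only the FONT_ITALIC pair is taken from the text (the FONT_BOLD pair is an assumed example).

```python
# Hypothetical mapping table: element name -> (start command, end command).
MAPPINGS = {
    "FONT_ITALIC": ("\\textit{", "}"),   # from the example in the text
    "FONT_BOLD": ("\\textbf{", "}"),     # assumed, for illustration
}


def apply_mapping(element, text, mappings=MAPPINGS):
    """Wrap text in the start and end command configured for the element."""
    start, end = mappings[element]
    return f"{start}{text}{end}"


italic = apply_mapping("FONT_ITALIC", "Some text")
nested = apply_mapping("FONT_BOLD", italic)
```

Because every element is just a pair of strings, switching the output format only requires a different mapping table, which matches the way the excerpts describe LaTeX and XML output sharing one mechanism.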
7. right gt E FOOTNOTE footnote S footnote E S lt footnote gt E lt footnote gt PAGE_BREAK page break S pagebreak WL NL WL NL S lt pagebreak gt EQUATION_INLINE inline equation S begin math E end math S lt equation type inline gt E lt equation gt EQUATION_NUMBERED e FORIG_LABEL numbered equation original equation label retrieved from the input document S begin equation E GOWL NL ORIG LABELOWL NLVendfequation S equation type numbered origlabel ORIG LABEL D E lt equation gt EQUATION LABEL equation label inserted into the EQUATION_NUMEBERED element e NAME auto generated label auto incrementing counter is used S label NAME S label name NAME gt Table B 2 Conversion mappings 61 EQUATION OUTLINE equation displayed on a separate line 8 begin displaymath E end displaymath 9 lt equation type outline gt E lt equation gt INDEX_ENTRY index entry Word XE field S index E k S lt index entry gt E lt index entry gt INDEX index Word INDEX field KTEX generates the whole index automatically S printindex S lt printindex gt IMAGE_COMMAND image e WIDTH image width in points e FILENAME auto generated image filename e g img1 eps e TITLE image title if present S includegraphics width WIDTHpt
8. type is a standard RGB representation stored in one long number. The Word WdColorIndex type contains a couple of internal codes. A mapping between Word WdColorIndex and Word WdColor had to be made to be able to convert all the colors to RGB. Colors are written to output files in HTML notation, e.g. FF0000 is red.
    There is no fast way of searching for the portions of text whose color differs from black. The document has to be traversed character by character, and the convertor checks whether the color of the current character differs from the color of the previous one. This technique is extremely slow, so the conversion of colored text can be disabled in the configuration. Colored backgrounds of text are converted if they are applied to a style or inserted using the Highlight tool. Such highlighted text can be easily found with the Word Find object.
    Titles
    The Word object model doesn't provide any information about links between tables or figures and their titles. Therefore the convertor must check whether the paragraph before the title contains a table or some kind of figure.
    Tables
    The conversion of tables is surely the most complicated part of the convertor. Its source code has been rewritten a few times, so it works quite well now, but the code is a little messy. The problem lies in both Word and LaTeX. The Word object model has a very limited interface for tables with merged cells. There are no functions li
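The two color steps described above, turning a color stored in one long number into HTML notation and detecting where the color changes while walking the text, might look roughly like this. The sketch is in Python with hypothetical function names; it assumes the usual Office layout in which red occupies the lowest byte of the number, and it models the document as a plain list of (character, color) pairs instead of the Word object model.

```python
from itertools import groupby


def wdcolor_to_html(value):
    """Convert a color stored as one long number to HTML notation (RRGGBB).

    Assumes red is the lowest byte, green the middle one and blue the highest.
    """
    red = value & 0xFF
    green = (value >> 8) & 0xFF
    blue = (value >> 16) & 0xFF
    return f"{red:02X}{green:02X}{blue:02X}"


def color_runs(chars):
    """Group consecutive characters that share a color.

    chars is an iterable of (character, color) pairs; the result is a list of
    (text, color) runs, so a color command is needed only once per run.
    """
    runs = []
    for color, group in groupby(chars, key=lambda pair: pair[1]):
        runs.append(("".join(ch for ch, _ in group), color))
    return runs


runs = color_runs([("a", 0), ("b", 0), ("c", 255), ("d", 0)])
```

The grouping itself is cheap; what makes the real traversal slow is that each character's color has to be requested from Word separately.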
9. [Sample document page: sections 1.1 Styles 1, 1.2 Styles 2, 2 Special characters in list, 3 Paragraph indentation, 4 Simple table and 5 Complex table, filled with Lorem ipsum text.]
    XML output transformed to HTML and rendered in Mozilla: [the same sample document, starting with Font styles, Styles 1 and Styles 2.]
10. 2.1: List of subprojects
    [Figure 2.5: Projects dependencies (Word Object Model, word-to-latex-configuration-class, word-to-latex, word-to-latex-gui, word-to-latex-bin, word-to-latex-gui-bin, word-to-latex-glue).]
    2.1.3 Libraries
    The following libraries are used:
    - Microsoft Word 10.0 Object Model Library
    - .NET System.Xml, for processing XML configuration files and validating them against XML Schema
    - .NET System.Windows.Forms, for creating the graphic user interface
    - .NET System.Drawing, for saving images in PNG format
    2.2 Design and algorithms
    The projects that deserve a deeper description are word-to-latex, which performs all the conversion, and word-to-latex-glue, which allows the convertor to be run through a VBA macro.
    The WLConvertor class, shown in figure 2.6, is the main entry point of the word-to-latex library. This class receives an input document, an output file name and a configuration file, and initializes the Word application and the MathType library. Two important tasks are then carried out when converting a document. First, the positions of all special non-text elements, such as footnotes or styles, must be retrieved and stored as so-called marks in the inner structures of the convertor. Once this is done, the conversion of text content can begin with the document preamble and continue with
11. 4 Improving performance using COM
    3 Related projects
    3.1 Summary
    3.2 Word2TeX versus Word-to-LaTeX
    4 Conclusion
    II
    5 User's manual
    5.1 Requirements and installation
    5.2 Uninstallation
    5.3 Configuration
    5.4 Command line convertor
    5.5 EPS to TIF image conversion
    5.6 Graphic user interface
    5.6.1 Running the conversion
    5.6.2 Figures, Equations and Translations
    5.6.3 Document preamble
    5.6.4 Special characters
    5.6.5 Styles and Font sizes
    5.6.6 Miscellaneous options
    5.7 Running Word-to-LaTeX from Word
    5.8 Conversion to XML (XHTML, MathML)
    A Sample documents
    B Structure of configuration files
    B.1 Conversion options
    B.2 Conversion mappings
    B.3 Special characters
    Title: Word-to-LaTeX convertor
    Author: Michal Kebrt
    Department: Department of Software Engineering
    Supervisor: RNDr. Tomas Skopal, Ph.D.
    Supervisor's e-mail: tomas skopal net
    Abstract: In the presented thesis
12. Ghostscript executable must be specified at the top of the eps2tif.bat file. To export all images from a Word document to a bitmap format (PNG, JPEG and so on), run Word-to-LaTeX to obtain an EPS version of each image and then execute the eps2tif.bat file with the options described in table 5.2. Finally, you can convert the output TIF files to the format you prefer; Irfanview, for example, does this very effectively.
    Table 5.2: eps2tif.bat options
    eps2tif.bat inDir outDir
    inDir: directory from which the files with the eps extension are taken
    outDir: directory where the tif files will be saved
    5.6 Graphic user interface
    For most users, the graphic interface will be the most frequent way of using the Word-to-LaTeX convertor. To run it, click the icon on your Desktop or in the Start menu, or execute the word-to-latex-gui.exe file in your Word-to-LaTeX directory. After the program starts, the configuration dialog appears. All six tabs are described below.
    5.6.1 Running the conversion
    Only the Input document is required. When the Output file is omitted, the Input document file name with the tex extension appended is used instead. Two configuration files can be found in your Word-to-LaTeX directory: config.xml for conversion to LaTeX and XMLConfig.xml for conversion to XML. When the Configuration file is omitted, config.xml is used. But be careful: it's recommended to cus
13. S lt font type bold gt E lt font gt FONT_ITALIC italic font S textit E S lt font type italic gt E lt font gt FONT_SMALLCAPS small caps font S textsc E F S lt font type smallcaps gt E lt font gt FONT_HIDDEN hidden font S OWL NL E WL NL S lt font type hidden gt E lt font gt Table B 2 Conversion mappings 56 FONT_SUBSCRIPT subscript font S _ E S lt font type subscript gt E lt font gt FONT_SUPERSCRIPT superscript font S E S lt font type superscript gt E lt font gt FONT_COURIER courier font e g Courier Courier New S texttt E S lt font type courier gt E lt font gt FONT_UPPERCASE uppercase font S uppercase E S lt font type uppercase gt E lt font gt FONT_UNDERLINE underlined font S uline E S lt font type wave underline gt E lt font gt FONT_DOUBLE_UNDERLINE double underlined font S uuline E S lt font type double underline gt E lt font gt FONT_WAVE_UNDERLINE wavy underlined font S uwave E S lt font type wave underline gt E lt font gt Table B 2 Conversion mappings 57 FONT STRIKE strikethrough font S sout E S lt font type strike gt
14. Word uses Unicode [11], which is also used in configuration files, so there is no problem with the conversion of most characters. Moreover, when no translation is defined for a character, it can be kept as is, because the encoding of output files is UTF-8.
    [Figure 2.8: WLMarks class. WLDocumentMark: long StartPosition, long EndPosition, String GetStartCommand(), String GetEndCommand(). WLMarks: WLImage ProcessImages(), WLPageBreak GetPageBreaks(), Queue StartMarks, Queue EndMarks, WLFootnote GetFootnotes(), WLStyle GetStyles(), WLFonts GetFontStyles().]
    Figure 2.9: WLDocumentBody class
        foreach (Word.Paragraph par in inputDocument)
        {
            // new table
            if (isFirstInTable(par))
            {
                Word.Table tab = par.Range.Tables.Item(1);
                WLTable table = new WLTable(tab);
                table.Convert();
            }
            // new list
            else if (isFirstInList(par))
            {
                WLListParagraph list = new WLListParagraph(par);
                list.Convert();
            }
            // common paragraph
            else
            {
                WLParagraph wlPar = new WLParagraph(par);
                wlPar.Convert();
            }
        }
    Nevertheless, the situation is not so clear because of fonts like Symbol or Wingdings, which cover only the 0-255 ASCII range. Characters from these fonts are internally stored in the part of the Unicode table which is reserv
15. between the Range and Selection objects are:
    - the Range object always represents a contiguous area; it has a start and an end position in a document
    - prefer Range to Selection, because it is a little faster
    2.1.2 Components
    The whole Word-to-LaTeX application is split into 7 projects, which allows easy reuse of the source code. Table 2.1 and figure 2.5 show the list of projects.
    Project name (output): Short description
    word-to-latex (word-to-latex-lib.dll): Library containing all the conversion code.
    word-to-latex-bin (word-to-latex.exe): Command line convertor; uses the word-to-latex library.
    word-to-latex-configuration-class (word-to-latex-configuration-class.dll): Library that reads configuration files. It is used in both the command line and GUI programs. Details about configuration files can be found in appendix B.
    word-to-latex-glue (word-to-latex-glue.dll): Simple library containing a class that links the Word application with the word-to-latex and word-to-latex-gui libraries. The class can be used as a COM object directly from a Word VBA macro.
    word-to-latex-gui (word-to-latex-gui-lib.dll): Library containing dialogs that enable easy customization of the convertor.
    word-to-latex-gui-bin (word-to-latex-gui.exe): Program that uses the previous GUI library and runs the command line convertor.
    word-to-latex-setup (word-to-latex-setup.msi): Deployment project.
    Table
16. for some users. But there are programs like LyX which allow visual editing of LaTeX documents.
    1.2 What to expect
    It's not possible to perform a 1:1 conversion, because Word and LaTeX are very different document preparation systems. The most important task is surely to convert all the text content. In particular, this means converting special and national characters correctly.
    The better the input Word documents are structured and formatted, the better the results conversion programs produce. This is why people should use paragraph styles and the appropriate Word functions for inserting footnotes, sections, index, etc. Once users follow these rules, conversion programs can properly convert almost every part of a document.
    1.3 Internal and external conversion
    There are two possible ways to convert Word documents to LaTeX format. A lot of the information, and also the terms internal and external conversion, come from the article [4].
    Internal conversion is carried out within the Word application using its object model. It does not matter whether you use the object model in a VBA macro or in some external program. What matters is that all parts of documents and all information about documents, including formatting, Word application settings, etc., are available. Examples of programs that perform internal conversion are Word2TeX [18] and Word-to-LaTeX.
    External conversion, on the other hand, is
17. some of the Word regular expressions that are currently used.
    Regular expression | Sample matching string
    a zA Z i 0 9 120 a zA Z a zA Z 0 9 sin x 2
    (Footnote: http://www.dessci.com)
    2.2.6 Some nice features
    While Word-to-LaTeX is converting a document, the user is informed about the progress of the conversion. This is done through an object that implements the simple ILog interface. Therefore both the console in the command line convertor and a text box in the graphic interface can be used for printing the log information.
        public interface ILog
        {
            // writes a line to the log
            void WriteLine(string line);
        }
    It is very easy to add new font styles that will be recognized by the convertor. The function FindFontStyle(ProcessFunction f) searches for the specified font style in the input document and calls the given handler, which has the same arguments as the following delegate:
        // range: range in the document that has the given font style
        // style: style (character or paragraph) of this range
        private delegate void ProcessFunction(Word.Range range, Word.Style style);

        // example of usage: set the font style and pass the ProcessFontStyleBold handler
        WLConvertor.WordApp.Selection.Find.Font.Bold = 1; // true
        FindFontStyle(new ProcessFunction(ProcessFontStyleBold));
    Although the convertor is highly customizable and has built-in support for two different output format families, LaTeX and XML, there are onl
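The logging arrangement described above, one small interface with a console sink and a GUI sink behind it, can be sketched as follows. The sketch is in Python rather than C#, and every name in it (Log, ConsoleLog, BufferLog, convert, the message text) is hypothetical; BufferLog merely stands in for the text box of the graphic interface.

```python
import io
import sys
from typing import Protocol, TextIO


class Log(Protocol):
    """Anything that can receive one line of progress information."""

    def write_line(self, line: str) -> None: ...


class ConsoleLog:
    """Writes progress lines to a text stream, standard output by default."""

    def __init__(self, stream: TextIO = sys.stdout) -> None:
        self.stream = stream

    def write_line(self, line: str) -> None:
        self.stream.write(line + "\n")


class BufferLog:
    """Collects progress lines in memory, as a GUI text box would."""

    def __init__(self) -> None:
        self.lines: list = []

    def write_line(self, line: str) -> None:
        self.lines.append(line)


def convert(paragraphs, log: Log) -> None:
    """Pretend conversion loop that only reports its progress."""
    for index, _ in enumerate(paragraphs, start=1):
        log.write_line(f"Converted paragraph {index} of {len(paragraphs)}")


buffer_log = BufferLog()
convert(["first", "second"], buffer_log)

stream = io.StringIO()
convert(["only"], ConsoleLog(stream))
```

The conversion loop never needs to know which front end is running; it only sees the one-method interface.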
18. special character, which is illustrated in the following example:
        <latexChar char convertTo textbackslash>
        <latexChar char convertTo>
    All the other special and national characters are defined in <char> elements. The code attribute contains the Unicode [11] number of each character. Details about the common context translation (the convertTo attribute) and the math context translation (the mathConvertTo attribute) can be found in section 5.6.4. A short example follows:
        <char code="010C" convertTo="\v{C}" mathConvertTo="\check{C}" />
        <char code="010D" convertTo="\v{c}" mathConvertTo="\check{c}" />
    Bibliography
    [1] Allin Cottrell: Word Processors: Stupid and Inefficient. http://www.ecn.wfu.edu/~cottrell/wp.html
    [2] Donald E. Knuth: The TeXbook, Volume A of Computers and Typesetting. Addison-Wesley Publishing Company, 1984. ISBN 0-201-13448-9.
    [3] Tobias Oetiker: The Not So Short Introduction to LaTeX 2e. http://people.ee.ethz.ch/~oetiker
    [4] Marion Neubauer: Conversion from WORD/WordPerfect to LaTeX. MAPS 14 (1995), 120-124. http://www.ntg.nl/maps/maps14.html
    [5] Jesse Liberty: Programming C#, Second Edition. O'Reilly, 2002. ISBN 0-596-00309-9.
    [6] Ben Albahari, Peter Drayton, Brad Merrill: C# Essentials, Second Edition. O'Reilly, 2001. ISBN 0-596-00315-3.
    [7] MSDN Library: Word Object Model Overview. http://msdn2.microsoft
19. the document body. Special and national characters are translated to appropriate commands, and the marks are inserted at the correct positions. These two tasks are described in more detail in the next sections.
    [Figure 2.6: WLConvertor class. Inputs: input document, output filename, configuration. Steps: WordApp.Init(), MathType.Init(), WLMarks.GetAllMarks(), documentPreamble.Convert(), documentBody.Convert().]
    2.2.1 Retrieving and inserting marks
    The concept of marks, briefly mentioned in the previous section, is essentially the same as XML markup. The convertor retrieves information about many non-text elements contained in the document. Each element, such as a page break, footnote or text highlight, has a start and an end position, which can be obtained from the Start and End properties of the corresponding Word Range object. As in XML, some marks don't need to have end positions. Although this markup concept is very simple, the example in figure 2.7 makes it completely clear.
    Figure 2.7: Markup concept
        Lorem ipsum dolor        <bold>Lorem ipsum</bold> dolor<linebreak>
        sit amet consectuer      sit amet <font size="3">consectuer</font>
    Each element from Word documents has its corresponding class in the word-to-latex library. All of them are derived from the WLDocumentMark class (figure 2.8). Their instances must have start and end positions and return com
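The markup concept of figure 2.7 can be reproduced with a short sketch: marks carry positions and commands, and the output is produced by walking the text once and emitting the commands at those positions. This is a Python illustration under stated assumptions, not the convertor's implementation; the Mark class and insert_marks function are hypothetical, and positions are plain character offsets rather than Word Range values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mark:
    start: int
    end: Optional[int]          # None for marks without an end position
    start_command: str
    end_command: str = ""


def insert_marks(text, marks):
    """Return text with every mark's commands inserted at its positions."""
    starts = {}
    ends = {}
    for mark in marks:
        starts.setdefault(mark.start, []).append(mark.start_command)
        if mark.end is not None:
            # A mark listed later closes first, which keeps nesting well formed.
            ends.setdefault(mark.end, []).insert(0, mark.end_command)

    out = []
    for pos in range(len(text) + 1):
        out.extend(ends.get(pos, []))     # close marks that end here
        out.extend(starts.get(pos, []))   # then open marks that start here
        if pos < len(text):
            out.append(text[pos])
    return "".join(out)


sample = "Lorem ipsum dolor sit amet consectuer"
marked = insert_marks(sample, [
    Mark(0, 11, "<bold>", "</bold>"),
    Mark(17, None, "<linebreak>"),
    Mark(27, 37, '<font size="3">', "</font>"),
])
```

Keeping start and end commands in separate position-indexed collections loosely mirrors the StartMarks and EndMarks queues named in figure 2.8, although the real data structures may differ.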
20. to transform the file into the format we need. The html.xsl style located in the Word-to-LaTeX directory transforms the input file to XHTML format [21] combined with CSS [22]. This style was tested with the Saxon XSLT processor.
    Appendix A: Sample documents
    The following pages show two documents converted with Word-to-LaTeX.
    Original Word document: [sample document with the sections 1 Font styles (1.1 Styles 1, 1.2 Styles 2), 2 Special characters in list, 3 Paragraph indentation, 4 Simple table and 5 Complex table, filled with Lorem ipsum text.]
    LaTeX output compiled to PostScript: 1 Font styles, 1.1 Styles
21. tomization. Additional XSL stylesheets can be created to have Word documents in your own format. A sample XSL stylesheet generating XHTML, MathML and CSS documents has been tested, and the output looks very nice.
    - Equations inserted through Equation Editor, MathType and Word EQ fields are converted. There are a couple of predefined equation output formats (e.g. LaTeX, MathML). Numbered equations are also converted. Optionally, references to numbered equations can be automatically recognized in input documents.
    - Both raster and vector images, and even embedded objects like Excel graphs, are converted to Encapsulated PostScript (EPS) format or to bitmaps (PNG format).
    1.4.2 Support for structured documents
    - Paragraphs marked as headings using the Word built-in styles are properly converted to LaTeX sections (the default mappings can be changed).
    - Ordered and unordered lists (even nested) and complex tables with merged rows and columns are converted.
    - Footnotes and endnotes are properly converted. Bibliography items can optionally be created from endnotes. They are in fact the only way users can insert bibliography and citations into Word documents.
    - The program converts table and figure titles, index, table of contents, multicolumn sections and hyperlinks. Bookmarks, references and page references to bookmarks are also converted.
    1.4.3 Documents formatting
    - Mappings between user styles (both paragraph and character) and LaTeX command
22. 3 S ref NAME OWL NL E WL NL S lt math reference name NAME gt E lt math reference gt NOTE_REFERENCE note reference currently only endnotes are supported e HNAME name of the note typically number that is being referenced S cite ref NAME S lt note reference name NAME gt Table B 2 Conversion mappings 63 BIBLIO REFERENCE reference to a bibliography item cita tion the Word hard coded citation e g Ka75 will be the content of this ele ment e NAME name of the bibitem e g Ka75 S NcitefreffNAMEJOWL NL E WL NL S lt biblio reference name NAME gt lt biblio reference gt PAGE_REFERENCE e NAME page reference name of the bookmark that is being refer enced S pageref NAME BOOKMARK_LABEL bookmark e NAME name of the bookmark S label NAME S lt bookmark name NAME gt STYLE paragraph or character user style e NAME name of the style all numbers in the name are replaced with words e g 1 One S NAME E S lt style name NAME gt lt style gt STYLE_DEFINITION container for a single user style definition commands describing the style will be in serted into E lt style definition gt e HNAME name of the user style S newcommand NAME 114 E S lt style definition name NAME gt
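The STYLE entry above notes that all numbers in a user style name are replaced with words (e.g. 1 becomes One), which is needed because LaTeX command names may contain letters only. A minimal Python sketch of such a renaming follows; the function name is hypothetical, and two details are assumptions rather than documented behavior: digits are translated one by one, and characters that are neither ASCII letters nor digits (such as spaces) are dropped.

```python
DIGIT_WORDS = ["Zero", "One", "Two", "Three", "Four",
               "Five", "Six", "Seven", "Eight", "Nine"]


def latex_style_name(name):
    """Turn a Word style name into a string usable as a LaTeX command name."""
    parts = []
    for ch in name:
        if ch in "0123456789":
            parts.append(DIGIT_WORDS[int(ch)])   # 1 -> One, 2 -> Two, ...
        elif ch.isascii() and ch.isalpha():
            parts.append(ch)
        # anything else (spaces, punctuation) is dropped: an assumption
    return "".join(parts)


heading = latex_style_name("Heading 1")
```

The resulting name could then be substituted for the NAME macro in the STYLE and STYLE_DEFINITION mappings.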
23. But be very careful when checking this option, because finding and converting the colored text takes a lot of time. The same package is used when you check Convert highlighted text (text marked with the Word Highlight tool) and Convert colored table cells. When any option is unchecked, it only means that the commands defining colors won't be inserted into the output file. The whole text content will of course be converted.
    Misc
    Check Convert multicolumns to convert multicolumn sections inserted through Format / Columns. Sans serif fonts like Arial or Verdana are converted to appropriate commands only when Convert sans serif fonts is checked.
    Check the option Automatically recognize math in italicized text, and simple math expressions like k < 30 will be inserted as math text instead of text in italics.
    The convertor can Recognize references to numbered equations if they match the pattern 1 9 or 1 9 1 9 (e.g. 3.15). A numbered equation must be inserted on a separate line, and its label must be written at the right end of the same line. Any number of white space characters between the equation and its label is allowed.
    Paragraphs not containing any text won't be converted when Ignore empty paragraphs is checked.
    Word-to-LaTeX can Convert endnotes into bibliography items and Recognize bibliography references (citations) if they match the pattern A Za z0 9 (e.g. 4 or Ka76). But if you don't use endnotes for b
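How such pattern-based recognition could work is sketched below in Python. The exact patterns did not survive in the text above, so the two regular expressions here are assumptions: equation references are taken to be digits 1-9, optionally in two dot-separated groups, inside round brackets, and citations are taken to be letters and digits inside square brackets. The constant and function names are hypothetical.

```python
import re

# Assumed patterns, reconstructed only loosely from the examples in the text.
EQUATION_REF = re.compile(r"\(([1-9]+(?:\.[1-9]+)?)\)")
CITATION = re.compile(r"\[([A-Za-z0-9]+)\]")


def find_equation_refs(text):
    """Return the labels of all equation references found in text."""
    return EQUATION_REF.findall(text)


def find_citations(text):
    """Return the keys of all bibliography references found in text."""
    return CITATION.findall(text)


refs = find_equation_refs("as shown in (3.15) and again in (7)")
cites = find_citations("see [4] or [Ka76] for details")
```

Under these assumed patterns a label containing the digit 0, such as (10), would not be recognized, because the character class covers 1-9 only.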
24. Charles University, Prague, Czech Republic
    Faculty of Mathematics and Physics
    BACHELOR THESIS
    Michal Kebrt
    Word-to-LaTeX convertor
    Department of Software Engineering
    Advisor: RNDr. Tomas Skopal, Ph.D.
    Program in Computer Science
    2006
    I hereby certify that I wrote the thesis myself, using only the referenced sources. I agree with lending the thesis.
    Prague, May 20, 2006, Michal Kebrt
    Contents
    1 Word to LaTeX conversion
    1.1 Word versus LaTeX
    1.2 What to expect
    1.3 Internal and external conversion
    1.4 Word-to-LaTeX convertor
    1.4.1 Most important features
    1.4.2 Support for structured documents
    1.4.3 Documents formatting
    1.4.4 Miscellaneous options and features
    2 Implementation
    2.1 Basic overview
    2.1.1 Word object model
    2.1.2 Components
    2.1.3 Libraries
    2.2 Design and algorithms
    2.2.1 Retrieving and inserting marks
    2.2.2 Text content conversion
    2.2.3 Special characters conversion
    2.2.4 Images conversion
    2.2.5 Equations conversion
    2.2.6 Some nice features
    2.3 Problems
    2
25. FILENAMEJOWL NL S lt image width WIDTH src FILENAME title TITLE gt IMAGE_CONTAINER image container used when the image has a title S begin figure h WL NL E end figure 5 E IMAGE TITLE image title inserted into the IMAGE CONTATNER element e TITLE title S caption TITLE o gt TOC table of contents Word TOC field KTEX generates the table of contents automati cally as well as Word S tableofcontents S lt table of contents gt Table B 2 Conversion mappings 62 HVPERLINK hyperlink e HREF hyperlink target the macro can be used also in the end command S href HREF E S lt link href HREF gt lt link gt SPECIAL_COMMAND BTEX command s inserted into the doc ument through the Word PRIVATE field whose content must begin with the case insensitive string latex such a field may look like this PRIVATE LaTeX indent indent will be inserted between the start and end command S E oc S4 Bi REFERENCE bookmark reference e NAME name of the bookmark that is being refer enced S ref NAME S lt reference name NAME 2 MATH REFERENCE e NAME eguation reference the Word hard coded reference e g 3 will be the content of this element name of the eguation that is being refer enced it is generated for each numbered equation in the document e g eg
26. NL vspace 3pt noindent WL NL begin tabular E end tabular WL NL vspace 2pt WL NL S WL NL lt table title TITLE gt E lt table gt WL NL TABLE_CONTAINER table container used when the table has a title QWL NL begin table h S E end table WL NL S B TABLE_TITLE table title inserted into the TABLE_ CONTAINER element e TITLE title S caption TITLE Sura TABLE MULTIROW table cell with merged rows e ROWS number of merged rows in the cell S multirow ROWS E S lt table multirow cell multi ROWS gt Be Table B 2 Conversion mappings 66 TABLE CELL COLOR command for the colored background of ta ble cells the COLOR macro in the next el ement TABLE MULTI COLUMN will be re placed with this command e COLOR background color in HTML notation e g FF0000 S gt columncolor HTML COLOR S color COLOR TABLE_MULTICOLUMN table cell with merged columns e COLS number of merged columns e LEFT_BORDER if the cell has a left border e RIGHT_BORDER if the cell has a right border e COLOR see the previous element e ALIGN cell content alignment 1 left r right c center S multicolumn COLS LEFT_BORDER COLOR ALIGN RIGHT_BORDER E S lt table cell multi COLS left border LEFT_BORDER right border RIGHT_BORDER align ALIGN width WIDTH COLOR gt E lt table cel
27. WL_TAB\bibitem[@NUMBER]{ref@NUMBER}, E: @WL_NL. XML S: @WL_TAB<bib-item name="@NUMBER">, E: </bib-item>.
ENDNOTE_REFERENCE: endnote; this translation is used at the endnote's insertion point. • @NUMBER: number of the endnote. • @CONTENT: the endnote's text content (can be used when translating endnotes to footnotes). LaTeX S: \cite{ref@NAME}. XML S: <endnote-reference name="@NUMBER">.
(Table B.2: Conversion mappings)
COLOR_BG_AND_BORDER: text with colored border and background. • @BORDER_COLOR: border color in HTML notation, e.g. FF0000. • @COLOR: text color (same notation). LaTeX S: \fcolorbox[HTML]{@BORDER_COLOR}[HTML]{@COLOR}{, E: }. XML S: <box border-color="@BORDER_COLOR" background-color="@COLOR">, E: </box>.
COLOR_BORDER: colored border around text. • @BORDER_COLOR: border color in HTML notation, e.g. FF0000. LaTeX S: \fcolorbox[HTML]{@BORDER_COLOR}[HTML]{FFFFFF}{, E: }. XML S: <box border-color="@BORDER_COLOR">, E: </box>.
BORDER: black border around text. LaTeX S: \fbox{, E: }. XML S: <box>, E: </box>.

B.3 Special characters

The configuration of special characters is enclosed in the <specialChars> element. <latexChar> elements are used for defining characters that have a special meaning in the output format. They must be written in the correct order, because one special character can be used for translating another
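The ordering requirement for <latexChar> elements can be illustrated with a short sketch. It is written in Python for brevity (the convertor itself is a C# program), and the rule list and the name `escape_special` are hypothetical; only the principle comes from the text above: a character whose translation introduces another special character (here the backslash) has to be handled before the characters that are translated into backslash commands.

```python
# Hypothetical rule list; in the convertor the rules live in <latexChar> elements.
RULES = [
    ("\\", "\\textbackslash "),  # must come first: later rules emit backslashes
    ("%", "\\%"),
    ("_", "\\_"),
    ("&", "\\&"),
]


def escape_special(text, rules=RULES):
    """Apply the translations one after another, in list order."""
    for char, replacement in rules:
        text = text.replace(char, replacement)
    return text


# Correct order: the percent sign is escaped once.
print(escape_special("50%"))
# Reversed order: the backslash produced for "%" is translated again.
print(escape_special("50%", rules=list(reversed(RULES))))
```

With the rules reversed, the backslash that was just emitted for the percent sign is itself rewritten, which is exactly the kind of damage a fixed order prevents.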
28. ample, \texttt. Values: yes, no.
DOC_CLASS: the @WL_DOC_CLASS macro used in the preamble will be replaced with the value of this option, e.g. article.
(Table B.1: Conversion options)
Option name: description and possible values
OUTPUT_FORMAT: the format of output files. Please remember that all translations (the mappings described in B.2) should be set to match this output format. The convertor performs a few special actions depending on the two possible values: latex, xml.
PAGE_SIZE: the @WL_PAGE_SIZE macro used in the document preamble will be replaced with the value of this option, but only if the PAGE_SIZE_PROCESSING option is set to "my"; e.g. a4paper.
PAGE_SIZE_PROCESSING: specifies how the page size will be processed. Possible values:
• complete: the @WL_PAGE_SIZE macro used in the document preamble will be replaced with the complete page size definition matching the page size of the input document
• symbolic: the convertor will try to translate the symbolic page size of the input document (e.g. A4) to an appropriate LaTeX size (e.g. letterpaper)
• my: see the previous option
DEFAULT_FONT_SIZE: defines the default font size of the input document. Portions of text having this size won't be marked with any font size command in the output file. Only integer numbers are allowed, e.g. 12.
PARAGRAPH_ALIGNMENTS: convert paragraph alignments. Values: yes, no.
PARAGRAPH Con
29. aphy items. Values: yes, no.
RECOGNIZE_BIBLIO_REF: recognize in-text citations (references to bibliography items), e.g. [4]. Values: yes, no.
FONT_SIZE1 to FONT_SIZE10: these options define the ranges for each converted font size group. The range for the i-th group is from FONT_SIZE(i-1) + 1 to FONT_SIZE(i) inclusive. The first group (FONT_SIZE1) starts with the size 1. Only integer numbers are allowed, e.g. 11 for the FONT_SIZE4 option and 12 for the FONT_SIZE5 option when the default font size is 12.
(Table B.1: Conversion options)

B.2 Conversion mappings

Table B.2 shows the complete list of conversion mappings between input document elements (sections, paragraphs, lists and so on) and LaTeX commands. Each mapping has a start command (S), which is inserted before the element, and most of them also have an end command (E), inserted after the element. Some elements, like tabulators, don't have any content; others hold some kind of content (text, an equation, another element), which is inserted between the start and end command. Names of macros that are specific to each element begin with @; macros common to all elements begin with @WL_:
• @WL_NL: new line
• @WL_TAB: tabulator
Table B.2 also contains the default mappings for LaTeX and XML output. When E is omitted, the end command is always ignored by the convertor. A separate symbol stands for the empty translation command.
FONT_BOLD: bold font. LaTeX S: \textbf{, E:
30. at were not implemented, or could not be implemented due to Word limitations, follows:
• Only cross-references to bookmarks are properly converted. The other cross-references use Word internal codes and therefore cannot be converted.
• Word has no tool for inserting citations, so there is nothing to convert. Nevertheless, the convertor may recognize hard-coded citations.
• The character set of output files cannot be changed, although this was promised in the specification. UTF-8 encoding has been chosen because it covers all national and special characters.
• Images are not exported in their original format. PNG and EPS are the only output formats.
A lot of additional improvements have been made; the most important are:
• The convertor is not tied to LaTeX, so a configuration for XML output could be easily created. Such output files may be transformed to other formats, e.g. XHTML + CSS.
• The convertor can be executed through the COM object. The total performance increased more than 10 times when this COM object was used in a VBA macro.
• Paragraph and character user styles are translated to appropriate commands.
• Colored and highlighted portions of text can optionally be converted.
• Parts of text written in basic sans-serif and typewriter fonts are properly marked in output files.
(Footnote: specification, http www ms mff cuni cz kebrm3am word to latex spec pdf)
Although the convertor has a lot of features, a few other improvements could be done in f
31. ators (CRLF, CR, LF).
• Lines in output files can be wrapped after every x characters (x is defined in the configuration).

Chapter 2: Implementation

2.1 Basic overview

Word-to-LaTeX performs a so-called internal conversion, since it uses the Word object model [7, 8] to retrieve all parts of documents. Basic information about this model is given in section 2.1.1. Microsoft Visual Studio 2003 and the C# language [5, 6] were chosen as the development environment. The whole project is divided into a couple of subprojects, described in section 2.1.2. The program design, interesting algorithms and limitations of the Word object model are described in section 2.2.

2.1.1 Word object model

The object model enables you to control the whole Word application and manipulate documents. Each document can be traversed in several ways, and a lot of information can be retrieved through dozens of object properties. You have to add a reference to the Microsoft Word Object Model Library to be able to use the Word object model in your program. Such a program should work correctly with all later Word versions, but not with older versions that lack functionality you may rely on when developing against a newer object model library. As you can see in figure 2.1, Application and Document are the essential objects that every program automating Word needs. The entire Word application is represented by the Application o
32. be marked with the Word built-in styles; they can be defined up to level 9. LaTeX S: \section{, E: }. XML S: <heading level="1">, E: </heading>.
HEADING2: heading level 2. LaTeX S: \subsection{, E: }. XML S: <heading level="2">, E: </heading>.
(Table B.2: Conversion mappings)
HEADING3: heading level 3. LaTeX S: \subsubsection{, E: }. XML S: <heading level="3">, E: </heading>.
ALIGN_CENTER: paragraph alignment, centered. LaTeX S: \begin{center}@WL_NL, E: @WL_NL\end{center}. XML S: <align type="center">, E: </align>.
ALIGN_LEFT: paragraph alignment, left. LaTeX S: \raggedright@WL_NL, E: @WL_NL. XML S: <align type="left">, E: </align>.
ALIGN_RIGHT: paragraph alignment, right. LaTeX S: \raggedleft@WL_NL, E: @WL_NL. XML S: <align type="right">, E: </align>.
TABLE_ALIGN_CENTER: table paragraph alignment, centered. • @WIDTH: table cell width in points. LaTeX S: \parbox{@WIDTHpt}{\centering, E: }. XML S: <align type="center">, E: </align>.
TABLE_ALIGN_LEFT: table paragraph alignment, left. • @WIDTH: table cell width in points. LaTeX S: \parbox{@WIDTHpt}{\raggedright, E: }. XML S: <align type="left">, E: </align>.
TABLE_ALIGN_RIGHT: table paragraph alignment, right. • @WIDTH: table cell width in points. LaTeX S: \parbox{@WIDTHpt}{\raggedleft, E: }. XML S: <align type
33. bject. Although the Application object makes a lot of other objects available, only a few of them are so important that you will find them in almost every application that uses the Word object model. Figure 2.2 shows three of these essential objects. Only one document can be active within the Word application (ActiveDocument). All opened documents are grouped in the Documents collection.

Each Document object (figure 2.3) represents a single Word document. It comprises a number of collections containing footnotes, endnotes, fields, paragraphs, styles, shapes and so on.

The Selection object represents the currently selected area. This object offers almost the same properties as the Document object, plus a couple of additional properties, which are illustrated in figure 2.4. The Find property is used very often throughout the whole Word-to-LaTeX program. It provides the same

    class WordSketch {
        static void Main(string[] args) {
            Word.ApplicationClass wordAppClass;
            Word.Application wordApp;
            Word.Document document;
            object fileName = "d:\\file.doc";
            object readOnly = false;
            object isVisible = false;
            object saveChanges = false;
            object missing = System.Reflection.Missing.Value;
            wordAppClass = new Word.ApplicationClass();
            wordApp = wordAppClass.Application;
            document = wordApp.Documents.Open(ref fileName, ref missing, ref readOnly,
                ref missing, ref missing, ref missing, ref missing, ref missing,
                ref missing, ref missing, ref missi
34. com/en-US/library/kw65a0we(VS.80).aspx
[8] Julianne Sharer, Arthur Einhorn: Word Object Model: The Definitive Reference, O'Reilly, 2001, ISBN 1-56592-430-4
[9] MathType Software Development Kit, http://www.dessci.com/en/reference/sdk
[10] Dale Rogerson: Inside COM, Microsoft Press, 1997, ISBN 1572313498
[11] Unicode Home Page, http://www.unicode.org
[12] Wilfried Hennings: Convertors from PC Textprocessors to LaTeX, http://www.tug.org/utilities/texconv/pctotex.html
[13] wsW2LTX convertor, http://www.winshell.de
[14] Antiword, http://www.winfield.demon.nl
[15] GrindEQ, http://www.grindeq.com
[16] The Comprehensive TeX Archive Network, http://www.ctan.org
[17] rtf2latex2e, http://sourceforge.net/projects/rtf2latex2e
[18] Word2TeX, http://www.chikrii.com
[19] Extensible Markup Language (XML), http://www.w3.org/XML
[20] XSL Transformations (XSLT), http://www.w3.org/TR/xslt
[21] XHTML 1.0: The Extensible HyperText Markup Language, http://www.w3.org/TR/xhtml1
[22] Cascading Style Sheets, http://www.w3.org/Style/CSS
35. d

Keywords: LaTeX, Word, XML, conversion

Preface

Word-to-LaTeX is a program that converts Microsoft Word documents into LaTeX format, which is suitable for typesetting books, manuscripts and other kinds of documents, or for submitting papers to many conferences. Although the conversion to LaTeX was the only goal of the project, I tried to make the program as customizable as possible, which resulted in a convertor that supports two output format families: LaTeX and XML. Other markup formats can easily be added through the configuration. The program is divided into a couple of components, which made it possible to create a separate command-line convertor, a graphical user interface that runs the command-line convertor, and also a COM object that allows the convertor to be used directly from the Word application.

The work has two main parts. The first one contains three important chapters. Chapter 1 compares text processors and LaTeX as two different approaches to making documents. It also summarizes the Word-to-LaTeX features and the possibilities of conversion between Word documents and LaTeX format. Chapter 2 describes the implementation. It covers the concept of the convertor and its components, the most important algorithms, and the way of communication between the convertor and the Word application. A short overview of the Word object model, its problems and limitations is also included. The Word-to-LaTeX program is compared with all existing Word
36. [Figure: sample document containing special characters in a list, paragraph indentation, a simple table, a complex table, a bitmap image, a Microsoft Excel graph, Equation Editor expressions and an EQ field expression. Original Word document at the top, LaTeX output compiled to PostScript at the bottom.]
37. ed for the application use (0xF020 to 0xF0FF). Currently the Word-to-LaTeX program has built-in support for the Symbol font. The program defines mappings between most of the characters from this font and Unicode. However, it's very difficult to find these characters in documents, because Word masks the real Font property with the surrounding font, like Arial or Times. The Find and Replace and Insert Symbol dialogs have to be invoked to find these symbols and detect their real codes (0 to 255). Afterwards they can be converted to Unicode following the predefined mappings.

2.2.4 Images conversion

Word-to-LaTeX exports images, including embedded objects, in two different formats: as bitmaps in PNG format, or as vector images in Encapsulated PostScript (EPS) format.

The conversion to EPS format is performed by an external PostScript printer driver (e.g. Generic Color PS), which can be easily installed in Windows. The conversion procedure is rather complicated: first the image is copied into the clipboard, then it is pasted into a temporary Word document, which is printed to an EPS file using the PostScript printer. Once this is done, the BoundingBox property specifying the EPS image size must be edited to match the original image size in the Word document. This property is edited without any external program, which is quite an easy task: it means changing four numbers in the head of each EPS file (a plain ASCII text file). Example: %%BoundingBox: 110 687 219 714

The W
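The BoundingBox edit described above can be sketched in a few lines. This is an illustration in Python, not the convertor's own C# code; the function name `set_bounding_box` is hypothetical, and the sketch assumes the EPS header carries an explicit `%%BoundingBox:` comment with four numbers (it does not handle the `(atend)` form).

```python
import re


def set_bounding_box(eps_text, llx, lly, urx, ury):
    """Replace the first %%BoundingBox comment with the given corner values."""
    new_line = f"%%BoundingBox: {llx} {lly} {urx} {ury}"
    # Match the whole comment line but leave the line terminator untouched.
    result, count = re.subn(
        r"^%%BoundingBox:[^\r\n]*",
        lambda match: new_line,
        eps_text,
        count=1,
        flags=re.MULTILINE,
    )
    if count == 0:
        raise ValueError("no %%BoundingBox comment found")
    return result


sample = "%!PS-Adobe-3.0 EPSF-3.0\n%%BoundingBox: 0 0 612 792\n%%EndComments\n"
print(set_bounding_box(sample, 110, 687, 219, 714))
```

The four numbers are the lower-left and upper-right corners in PostScript points, so only the header comment changes and the drawing commands stay as the printer driver produced them.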
38. ent. Each of them is inserted into the <option> element with two attributes: name and value.
Option name: description and possible values
ONLY_IMAGES: convert only images and ignore text content. Values: yes, no.
PRINTER_NAME: the name of a PostScript printer which is used for exporting images in EPS format. The printer driver has to be installed on your system; e.g. Generic Color PS.
IMAGE_FORMAT: the output format of images.
• eps: EPS vector format (requires a PostScript printer)
• png: PNG bitmap format (not all images can be exported as bitmaps)
TDL_FILENAME: the translation file used for the conversion of equations. See the Translators subdirectory of your MathType directory for possible values (remember that MathType must be installed on your system to be able to convert equations). You can edit files in this directory or add new ones if you want to customize the conversion of equations; e.g. LaTeX.tdl.
EQUATIONS: the conversion of equations covers Equation Editor, MathType and EQ field equations.
• ignore: do not convert
• convert: convert using the translation file specified in the TDL_FILENAME option
• toimages: convert to images
CREATE_COMMANDS_FOR_STYLES: determines whether the convertor creates new commands for paragraph and character user styles in the preamble. Output text files are more maintainable if commands like \code are used instead of, for ex
39. Appendix B: Structure of configuration files

    <?xml version="1.0" encoding="utf-8"?>
    <configuration xmlns="http://kebrt.cz/word-to-latex"
                   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
      <variousOptions>
        <option name="OUTPUT_FORMAT" value="latex" />
        <option name="EQUATIONS" value="toimages" />
      </variousOptions>
      <translationTable>
        <docElement name="FONT_BOLD" start="\textbf{" end="}" />
        <docElement name="HEADING1" start="\part{" end="}" />
      </translationTable>
      <specialChars>
        <latexChar char="\" convertTo="\textbackslash" />
      </specialChars>
    </configuration>

Figure B.1: Fragment of the config.xml configuration file

All the configuration is stored in an XML file with the <configuration> root element, which contains three subelements:
• <variousOptions>: various options applied during the conversion (output format, PostScript printer name, ...)
• <translationTable>: table containing mappings between input document elements (sections, paragraphs, footnotes and so on) and LaTeX commands
• <specialChars>: translation mappings between special and national characters and LaTeX commands

B.1 Conversion options

All the options listed in table B.1 belong to the <variousOptions> parent elem
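To make the structure of the configuration file more concrete, here is a minimal sketch of reading such a file. It is written in Python rather than the convertor's C#, the function name `load_config` is hypothetical, and the embedded fragment is a simplified version of Figure B.1 with the XML namespace left out so that plain element names can be used.

```python
import xml.etree.ElementTree as ET

# Simplified fragment modelled on Figure B.1 (namespace omitted on purpose).
CONFIG = r"""<configuration>
  <variousOptions>
    <option name="OUTPUT_FORMAT" value="latex" />
    <option name="EQUATIONS" value="toimages" />
  </variousOptions>
  <translationTable>
    <docElement name="FONT_BOLD" start="\textbf{" end="}" />
  </translationTable>
</configuration>"""


def load_config(xml_text):
    """Return (options, mappings) read from a configuration document."""
    root = ET.fromstring(xml_text)
    # Each <option> carries a name and a value attribute.
    options = {o.get("name"): o.get("value") for o in root.iter("option")}
    # Each <docElement> maps an element name to its start and end command.
    mappings = {
        e.get("name"): (e.get("start"), e.get("end"))
        for e in root.iter("docElement")
    }
    return options, mappings


options, mappings = load_config(CONFIG)
start, end = mappings["FONT_BOLD"]
print(options["OUTPUT_FORMAT"], start + "bold" + end)
```

Keeping options and mappings as plain dictionaries mirrors the name/value and start/end attribute pairs of the file; a real reader would also have to honor the namespace and validate against the schema.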
40. houldn't be a reason for the typesetting to be an important job.

These two tasks have been put together in the widely used WYSIWYG (what you see is what you get) text processors. Microsoft Word, WordPerfect, OpenOffice.org Writer and many more are examples of these programs. They allow users to create documents, including their design and layout, interactively by selecting from a great variety of commands in the program menu. A user always sees a document in its final form because all the document formatting is displayed on the screen; for example, a heading appears in a bold and bigger font. At first sight this feature looks nice, but the on-the-fly typesetting brings a couple of problems, which are summarized below.

LaTeX [3], on the other hand, is a document preparation system used for typesetting scientific and mathematical documents with high typographic quality. The system is suitable for creating many different kinds of documents, from plain letters to large books. LaTeX is also a standard for submitting manuscripts to many scientific conferences. LaTeX uses the TeX [2] typesetting system for creating beautiful books, which was developed by Professor Donald E. Knuth. LaTeX is actually only a package of macros that make working with TeX easier. Other sets of macros which can be used instead of LaTeX are AMS-TeX and AMS-LaTeX.

What is the main difference between TeX and Word? When you make a document in LaTeX, you write al
41. ibliography items; you will still have to edit the bibliography section manually.

5.7 Running Word-to-LaTeX from Word

The conversion will be at least 10 times faster if you press the button on the Word-to-LaTeX toolbar installed directly into your Word application. The convertor interface is exactly the same as the one described in the previous section. If you have problems running the convertor from Word, please verify that you have the Medium or Low option checked in the Word Tools > Macro > Security menu.

Figure 5.7: Word-to-LaTeX toolbar in Word

5.8 Conversion to XML, XHTML, MathML

The output of the convertor depends completely on the configuration. There is no need to convert documents only to LaTeX. The XMLConfig.xml configuration file stored in the Word-to-LaTeX directory is used for conversion to XML [19], which is a convenient intermediate format that can be easily transformed to whatever format you need. You should be familiar with XML and related technologies to understand this short overview.

The best way to insert mathematical equations into XML documents is the MathML language. Word-to-LaTeX uses the built-in MathType capability to export equations to MathML format.

The XML format is very strict: XML files must be so-called well-formed. Sometimes the convertor produces a file that is not well-formed, but it's never difficult to correct such a file manually. Once we have a well-formed XML file, an XSLT style sheet [20] can be used
42. is no need to define translations for Latin extended characters or Cyrillic ones. Just make sure that you have appropriate commands in the document preamble, for example:

    \usepackage[T2A]{fontenc}
    \usepackage[utf8]{inputenc}

5.6.5 Styles and Font sizes

Figure 5.5: Styles, Fonts tab

The translations of paragraph and character user styles can be defined in this dialog. Press Add new and fill in the name of a style, the start command inserted before the text co
43. is the only way of converting them to LaTeX. Although this format is public [9], I find this approach hard to imagine in practice.

MathType is a professional, commercial version of Equation Editor with a couple of great improvements: support for numbered equations, automatic recognition of variables, functions and constants, and the capability to export equations in GIF, EPS, MathML, LaTeX and other formats. MathType has an API for basic work with expressions, and as it can handle Equation Editor and EQ expressions too, it is a solution for converting all the expressions within Word documents to LaTeX.

Finally, we decided to use the MathType API for the conversion of equations, although it has one big disadvantage: Word-to-LaTeX users must have a legal version of this product if they want to convert equations to LaTeX. Parsing the binary format of the expressions was eliminated from our consideration because it would have been a very troublesome task and, moreover, the formats of Equation Editor and MathType equations even differ a bit.

The WordToLatex.MathType namespace contains a few functions wrapping the MathType API. MathType uses so-called translator files, written in the Translator Definition Language (TDL) [9], to export expressions in other formats. It has a couple of predefined translators enabling conversion to MathML, LaTeX and a few other formats.

Word-to-LaTeX tries to recognize simple math expressions written in italics. The following table shows
44. (Abstract, translated from Czech:) ...the Word-to-LaTeX program, a convertor that transforms documents in Microsoft Word format into LaTeX format, which is suitable for typesetting books, lecture notes, scientific articles etc. The program is, however, configurable to such an extent that it can also convert documents into completely different formats, e.g. XML. The work includes a comparison of text processors and the LaTeX format, highlighting their advantages and disadvantages. The basics of the Microsoft Word object model are briefly described, together with the possibilities of its use, several of its problems and limitations, and a way to speed up applications that use it.
Keywords: LaTeX, Word, XML, conversion

Title: Word-to-LaTeX convertor
Author: Michal Kebrt
Department: Department of Software Engineering
Supervisor: RNDr. Tomas Skopal, Ph.D.
Supervisor's e-mail address: tomas skopal net
Abstract: This work is devoted to the Word-to-LaTeX program, which converts documents written in Microsoft Word into LaTeX format, which is suitable for typesetting books, manuscripts, scientific articles etc. The program can be customized so much that it can produce completely different output formats, e.g. XML. In this work I also tried to compare text processors and the LaTeX format and to emphasise their pros and cons. The Microsoft Word object model is briefly described; its problems and limitations are also covered. Finally, a way of improving the performance of applications that automate Word is suggeste
45. ke GetMergedColumnsCount or GetMergedRowsCount. This important information has to be computed from the widths of the cells. Moreover, Word sometimes gives meaningless information about various properties of tables, so it's really difficult to convert them properly. LaTeX capabilities for making complex tables with a lot of merged cells are not very good, which brings further complications into the source code. Working with tables becomes much more complicated when there are nested tables in the document, so they are currently ignored by Word-to-LaTeX.

2.4 Improving performance using COM

Word-to-LaTeX is not very fast when it's converting a large document. Since users usually run the conversion only a few times before they find the best configuration for the particular document, speed is not the most important feature.

Before the performance can be improved, we must figure out what causes the conversion procedure to be so slow. It is the so-called interprocess communication (IPC) between the convertor process (word-to-latex.exe) and the Word application (winword.exe), which is extremely expensive due to the intensive use of the Word object model. It's good to follow the rules for writing fast Word automation programs (e.g. prefer Range objects to the Selection object), but the speed improvement is not very high.

It would be perfect to have only one process when automating Word. We could use VBA, but it's not a suitable language for such a big p
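The remark in this excerpt that the number of merged columns has to be derived from cell widths can be illustrated with a small sketch. It is not the convertor's actual algorithm, which the text does not spell out: the function name `merged_columns_count`, the idea of comparing against an unmerged column grid, and the half-point tolerance are all assumptions made for illustration, and the sketch is in Python rather than C#.

```python
def merged_columns_count(column_widths, cell_left, cell_width, tolerance=0.5):
    """Count the grid columns covered by one (possibly merged) cell.

    column_widths: widths of the unmerged grid columns, in points.
    cell_left, cell_width: left offset and width of the cell, in points.
    tolerance: slack in points, since reported widths may be slightly off.
    """
    cell_right = cell_left + cell_width
    count = 0
    left = 0.0
    for width in column_widths:
        right = left + width
        # A grid column counts when it lies inside the cell's horizontal span.
        if left >= cell_left - tolerance and right <= cell_right + tolerance:
            count += 1
        left = right
    return count


grid = [100.0, 50.0, 50.0]
print(merged_columns_count(grid, 0.0, 150.0))    # cell spanning the first two columns
print(merged_columns_count(grid, 150.0, 50.0))   # ordinary single-column cell
```

The resulting count is what a LaTeX \multicolumn command needs as its first argument; the tolerance is there because, as noted above, the widths Word reports are not always reliable.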
46. l>
PAR_INDENT: paragraph indentation. • @LEFT_INDENT: left indentation in points. • @RIGHT_INDENT: right indentation in points. • @FIRST_LINE_INDENT: first line indentation in points. LaTeX S: \begin{indentation}{@LEFT_INDENTpt}{@RIGHT_INDENTpt}{@FIRST_LINE_INDENTpt}@WL_NL, E: @WL_NL\end{indentation}. XML S: @WL_NL<par-indent left="@LEFT_INDENT" right="@RIGHT_INDENT" first-line="@FIRST_LINE_INDENT">@WL_NL, E: </par-indent>.
MULTICOLUMN: multicolumn section. • @COLS: number of columns in the section. LaTeX S: \begin{multicols}{@COLS}, E: \end{multicols}. XML S: <multicol count="@COLS">, E: </multicol>.
(Table B.2: Conversion mappings)
COLOR_TEXT: colored text. • @COLOR: color in HTML notation, e.g. FF0000. LaTeX S: \textcolor[HTML]{@COLOR}{, E: }. XML S: <font-color color="@COLOR">, E: </font-color>.
COLOR_BG: text with colored background. • @COLOR: color in HTML notation, e.g. FF0000. LaTeX S: \colorbox[HTML]{@COLOR}{, E: }. XML S: <font-background color="@COLOR">, E: </font-background>.
ENDNOTES_SECTION: container for endnotes; can be used for inserting the bibliography. LaTeX S: \begin{thebibliography}{99}@WL_NL, E: \end{thebibliography}@WL_NL. XML S: <bibliography>, E: </bibliography>.
ENDNOTE: endnote; this translation is used in the ENDNOTES_SECTION context and is suitable for inserting a single bibliography item. • @NUMBER: number of the endnote. S
47. l the text and commands directly into a plain text file, and you cannot see the final document until you run a program which generates a PostScript or PDF file. Documents can be structured using a lot of special commands; for example, the \section{My example section} command makes a section. Most modern text editors highlight LaTeX commands, so it's never difficult to write and maintain LaTeX documents.

    \documentclass[11pt]{article}
    \begin{document}
    \title{Simple LaTeX Document}
    \author{Michal Kebrt}
    \date{3rd Apr 2006}
    \maketitle
    \section{Introduction}
    This is a simple document created using LaTeX.
    \end{document}

Figure 1.1: Simple LaTeX document

Disadvantages and limitations of text processors
• When writing a large document like a book, text processors often become very slow and documents hard to maintain, due to the real-time typesetting, which requires a great amount of memory.
• Authors usually tend to use various kinds of fonts, emphasis, indentation, alignment of paragraphs and so on. Of course they do it with the intention of making documents nicer, but this inconsistency always makes documents less readable.
• Authors often forget to concentrate on the content and logical structuring of documents.
• Documents are usually stored in binary files, which sometimes cannot be opened without the text processor installed on your machine. There may also be problems with exchanging files between different ver
48. mands that will be inserted into these positions. Table 2.2 shows the list of mappings between the document elements, Word objects and convertor classes.

Figure 2.8 shows how the marks are collected and stored. The WLMarks class contains two queues for the start and end marks (instances of classes from table 2.2). The queue with the start marks is sorted by start positions in ascending order. The end marks in the second queue are sorted in ascending order by end positions. Each queue has special rules applied in the situation when the start and/or end positions of two marks are equal. This prevents the so-called crossover of marks, e.g. <b><i>foo</b></i>.

Each class, like WLFootnote, has a static member function that loads the marks into these queues. When all the marks are loaded, the convertor can sequen

Document element | Word object or property | Convertor class
footnote | Footnote | WLFootnote
endnote | Endnote | WLEndnote
image | Shape, InlineShape | WLImage
bookmark | Bookmark | WLBookmark
TOC | Field (type TOC) | WLTOC
index | Field (type Index) | WLIndex
index entry | Field (type IndexEntry) | WLIndexEntry
hyperlink | Field (type Hyperlink) | WLHyperlink
cross-reference | Field (type Ref, PageRef) | WLCrossReference
equation | Field (type Formula, Embed) | WLEquation
colored text | Font.Color | WLColorText
colored bg | Font.Shading | WLColoredBackground
style instance | Range.Style | WLStyleInstance
fon
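The two-queue idea described in this excerpt can be sketched as follows. This is an illustrative Python reconstruction, not the convertor's C# code: the `Mark` class, the function `apply_marks` and, in particular, the tie-breaking rules are assumptions (the text only says that special rules exist). The sketch also assumes that marks are non-empty and either nested or disjoint.

```python
from dataclasses import dataclass


@dataclass
class Mark:
    start: int       # position where the start command is inserted
    end: int         # position where the end command is inserted
    start_cmd: str
    end_cmd: str


def apply_marks(text, marks):
    """Insert start/end commands into text without crossing nested marks."""
    indexed = list(enumerate(marks))
    # Start queue: ascending start; on ties the mark that ends later opens first.
    starts = sorted(indexed, key=lambda im: (im[1].start, -im[1].end, im[0]))
    # End queue: ascending end; on ties the mark that opened later closes first.
    ends = sorted(indexed, key=lambda im: (im[1].end, -im[1].start, -im[0]))
    out, si, ei = [], 0, 0
    for pos in range(len(text) + 1):
        while ei < len(ends) and ends[ei][1].end == pos:
            out.append(ends[ei][1].end_cmd)
            ei += 1
        while si < len(starts) and starts[si][1].start == pos:
            out.append(starts[si][1].start_cmd)
            si += 1
        if pos < len(text):
            out.append(text[pos])
    return "".join(out)


same_range = [Mark(0, 3, "<b>", "</b>"), Mark(0, 3, "<i>", "</i>")]
print(apply_marks("foo", same_range))
nested = [Mark(0, 3, "\\textbf{", "}"), Mark(0, 7, "\\textit{", "}")]
print(apply_marks("foo bar", nested))
```

Without the tie-breaking keys, two marks covering the same range would be closed in the order they were opened, which is the crossover the text warns about.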
49. ng, ref isVisible, ref missing, ref missing, ref missing);
            // print the content of the first paragraph
            Console.WriteLine(document.Paragraphs.Item(1).Range.Text);
            wordApp.Quit(ref saveChanges, ref missing, ref missing);
        }
    }

Figure 2.1: Sketch of a program that uses the Word object model

Figure 2.2: Essential properties of the Word Application object: ActiveDocument, Documents, Selection

Figure 2.3: Essential properties of the Word Document object: Bookmarks, Characters, DocumentProperties, Fields, Footnotes, InlineShapes, ListParagraphs, PageSetup, Paragraphs, Sections, Shapes, Styles, Tables

functionality as the Word Find and Replace dialog and may help you to find portions of text written in a specified font, color or style, as well as page breaks, tabs and so on. Even regular expressions can be used when searching for a particular text.

Figure 2.4: Essential properties of the Word Selection object: Cells, Columns, Find, Font, Rows

The Range object is the last one that will be mentioned, because it's also widely used. This object has nearly the same properties as the Selection object. The main differences
50. ntent of the style and the end command inserted after the text content. When you omit the definition of some style, appropriate commands will be created automatically on the basis of the style properties. Word built-in styles are skipped. You can edit the list of styles by double-clicking any of the fields. Write Y in the "leave as is" field if you don't want any changes (character translations, wrapping) to be made in the text content of the style, and N otherwise. This is suitable for styles that are translated to the verbatim environment.

Check Create commands in the preamble to make a special command for each style in the document preamble. It's recommended to enable this option because it makes output files much more maintainable. For example, if you have a style named code, a \stylecode command will be created, and when you decide to change the definition of the style, you will do it in only one place.

Font sizes are split into 10 groups, which are converted to the commands defined in Translations (see 5.6.2 for details). Each group has a range of point sizes that it covers, from the start size (exclusive) to the end size (inclusive). You can edit the default settings by double-clicking the end size field of the group you want to change. Start sizes are computed automatically.

Portions of text that have the Default font size won't be marked with any command defining the font size. Therefore it's very important to have a correct value in this field
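The grouping rule above (each group covers the sizes after the previous group's end size, up to and including its own end size) amounts to a lookup in a sorted list of end sizes. The following Python sketch shows one way to express it; the function name `font_size_group` and the concrete end sizes are made up for illustration and are not the convertor's defaults.

```python
from bisect import bisect_left

# Hypothetical end sizes (in points) for groups 1..10; not the real defaults.
END_SIZES = [6, 8, 10, 11, 12, 14, 17, 20, 25, 72]


def font_size_group(size, end_sizes=END_SIZES):
    """Return the 1-based group whose range contains the integer point size."""
    if size < 1 or size > end_sizes[-1]:
        raise ValueError("size outside the configured ranges")
    # bisect_left finds the first end size that is >= size, i.e. the group
    # whose range (previous end size, own end size] contains the size.
    return bisect_left(end_sizes, size) + 1


print(font_size_group(12))  # falls into the group ending at 12
print(font_size_group(13))  # first size of the next group
```

With these example values, 12 pt text lands in group 5 and 13 pt text in group 6, so changing one end size automatically shifts the start of the following group, as the dialog description says.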
51. onvertor

When the command-line convertor word-to-latex.exe is executed without any parameters, the list of all possible options from table 5.1 is printed.

    word-to-latex.exe i inputFile o outputFile opt confFile

    i    input file name
    o    output file name
    opt  configuration file name

Table 5.1: word-to-latex.exe options

The only required option is i. When the output file is omitted, the input file name with the tex extension appended is used instead. If the configuration file is not specified, the default configuration stored in the config.xml file is used for the conversion.

After you run the program with correct options, it prints all the file names (input, output, configuration) and also your Microsoft Word version, which can be useful when an error occurs. Then the conversion routine is started and you are informed about the progress. Please be patient when converting a large document; it can take a long time. A much faster way of running the conversion is described in section 5.7.

5.5 EPS to TIF image conversion

As not all images included in Word documents can be converted to bitmaps, I wrote a simple batch file, eps2tif.bat (in the eps2tif directory), which converts EPS files to TIF format. It benefits from the fact that Word-to-LaTeX can export all images to EPS format. This batch file requires the Ghostscript program, which is free for non-commercial use. The path to the
52. Word object model has no capability to export images. That is why the .NET System.Drawing library is used for saving images in PNG format. However, this procedure has one limitation: not all images can be saved as PNG bitmaps. The eps2tif program described in section 5.5 solves this problem.

There is one more way of exporting images as bitmaps. When a Word document is saved as a web page, all the images (including embedded objects etc.) are exported as JPEG, PNG and GIF files. As this technique is very laborious, Word-to-LaTeX doesn't use it now.

2.2.5 Equations conversion

There are three ways to insert mathematical expressions into a Word document. The first is EQ fields (Insert > Field), which can be used even for quite complicated expressions containing sums, brackets, matrices, fractions etc. EQ expressions are written in a source code similar to LaTeX, e.g. \f(5,3) makes a fraction. But they have a couple of limitations; for example, you cannot create a triple integral. As there is no API for EQ fields, their source code must be parsed to convert them into another format.

Equation Editor (mostly in version 3) is a part of the Microsoft Office package. It's a visual editor without any mode for writing expressions in a source code similar to EQ fields. In spite of this fact, Equation Editor can convert EQ expressions into its own format, but not back. The parsing of Equation Editor expressions (binary format
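Since EQ field source has to be parsed by hand, a small Python sketch may help to show what such parsing involves for the fraction switch mentioned in excerpt 52. It handles only \f(numerator,denominator), including nested fractions, and assumes the arguments contain no other parentheses or commas; the function name is hypothetical and this is not the thesis's parser.

```python
import re

# Matches an innermost \f(a,b): neither argument contains '(' , ')' or ','.
_FRACTION = re.compile(r"\\f\(([^(),]*),([^(),]*)\)")


def eq_fractions_to_latex(expr):
    """Rewrite EQ-field fractions \\f(a,b) as LaTeX \\frac{a}{b}."""
    previous = None
    # Repeat until nothing changes, so nested fractions are resolved
    # from the inside out.
    while previous != expr:
        previous = expr
        expr = _FRACTION.sub(
            lambda m: "\\frac{" + m.group(1) + "}{" + m.group(2) + "}",
            expr,
        )
    return expr
```

A full converter would need a real tokenizer for the other EQ switches (sums, brackets, matrices); the regular expression above is deliberately limited to one construct.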
53. Word-to-LaTeX graphic user interface
config.xml, XMLconfig.xml: convertor configuration for LaTeX and XML output
html.xsl: XSL file which transforms XML output to HTML
manual.pdf: user's manual
eps2tif: directory containing a batch file for converting EPS images to TIF format

http://www.dessci.com/en/products/mathtype
http://www.princeton.edu/~cavalab/tutorials/computers/postscriptPrinter.html

5.2 Uninstallation

If you want to uninstall Word-to-LaTeX from your system, go to Control Panel > Add or Remove Programs and select Word-to-LaTeX. Please close Word if it is running before uninstalling.

5.3 Configuration

All the program configuration is stored in an XML file with a public format, which is defined using XML Schema in the config.xsd file. Before the conversion procedure starts, the configuration is validated against the schema, so you must be very careful when editing the file manually. There are two predefined configuration files in your Word-to-LaTeX directory: config.xml for conversion to LaTeX and XMLConfig.xml for conversion to XML format. Don't worry if XML is an unknown abbreviation for you. There is no need to know anything about XML technologies, because you can also customize the convertor through the graphic interface, which is described in section 5.6. Appendix B describes the XML structure of configuration files and the possible values of each element and attribute.

5.4 Command line convertor
54. performed without the help of the Word application and its object model. Then we can use at least two methods to convert a Word document into LaTeX: either directly access the Word document as a binary file, or save the document in a more accessible format (often RTF) and then convert it into LaTeX. External conversion has one big disadvantage in comparison with internal conversion: it's usually impossible to retrieve all information about documents, especially about their logical structure.

The first method is completely independent of the Word application, so it can be performed outside the Windows environment. Although the idea of parsing a Word binary file is rather unimaginable, there are a few programs that use this method: Antiword [14] and wsW2LTX [13]. rtf2latex2e [17] is an example of a program that converts RTF documents into LaTeX.

1.4 Word-to-LaTeX convertor

Word-to-LaTeX performs so-called internal conversion, since it uses the Word object model to retrieve all parts of documents. The lists of implemented features follow.

1.4.1 Most important features

• The conversion can be run from the command line, through the graphic interface, or directly from Word. The last way of running is much faster than the other two.
• The convertor is not limited only to the LaTeX format. The program can be easily customized by changing a configuration file or through the graphic interface. The configuration for XML output is an example of such a cus
55. project like this convertor. The idea of only one process can be easily implemented using the Component Object Model (COM) described in [10]. It's possible to create a simple class connecting the Word application with the convertor library, register it as a COM object and then use it in a Word VBA macro. There's only one process then (winword.exe) and the conversion is much faster.

The WordGlue class (figure 2.10) works as such a connector. It receives a Word application instance and a document to convert. Afterwards the customization dialog (MainForm) is created and the Word Application object is passed to the WLConvertor class described in section 2.2. The user starts the conversion procedure by pushing the button in the dialog.

    public class WordGlue
    {
        private Word.Application _app;
        private Word.Document _origDoc;

        public void Startup(object app, object doc)
        {
            _app = (Word.Application)app;
            _origDoc = (Word.Document)doc;
        }

        public void Shutdown()
        {
            _app = null;
            _origDoc = null;
        }

        public void Convert()
        {
            MainForm form = new MainForm(true);
            WLConvertor.WordApp = _app;
            form.FromWord = true;
            form.ShowDialog();
        }
    }

Figure 2.10: WordGlue class

Once we have the connector class WordGlue registered as a COM object, it can simply be used in a VBA macro, as figure 2.11 illustrates.

    Sub WordToLatex()
        Dim app As New WordGlue
        app.Startup Application, ActiveDocument
        app.Convert
        app.Shutdown
        Set app = Nothing
    End Sub
56. s can be defined (e.g. a style named preformated to the verbatim environment). A special command for each style can optionally be created to make later changes in documents easier.
• Converts various font styles: bold, italic, small caps, subscript, superscript, uppercased, underlined, strikethrough and hidden. Text written in basic fonts from the sans serif and courier families is also marked in output documents.
• LaTeX font size cannot easily be set exactly the same as in Word, so there is a point range that each LaTeX command covers (e.g. 8-10 pt for \small). The default ranges and commands can of course be changed.
• Colored text, highlighted text and colored backgrounds of table cells can be converted. Borders (even colored) applied to portions of text are also taken into account.
• Paragraphs are converted even with alignments and indentations. Line breaks and page breaks are correctly converted.
• Page size and page margins can be converted.

1.4.4 Miscellaneous options and features

• Special and national characters (e.g. Greek, Russian or Hebrew) are converted, even those from the Symbol font.
• Editable document preamble: macros like WL DOC_AUTHOR used in the preamble are replaced with the respective information from Word documents.
• TeX commands can be inserted into Word documents through PRIVATE fields. Word ignores them but they are correctly converted.
• Newline separator can be selected from the following separ
57. used for batch processing of more documents.
• The configuration is stored in plain text XML files, which can be easily edited.
• Output files have UTF-8 encoding, which is suitable for easy insertion of national characters.
• Paragraph indentation may be converted.
• Page size and page margins are properly converted.
• It converts equations inserted through EQ fields.
• Equations can be exported as images, which is suitable for users who don't have MathType installed.
• The commands defining user styles can be created in the document preamble.
• Each user style may be converted as is, with no translation applied to the text content of the style. So it's possible to convert the style to the verbatim environment.
• LaTeX commands can be inserted into Word documents through PRIVATE fields.
• Colored backgrounds of text and text borders are optionally converted when they occur in a user style. Colored backgrounds of table cells may be converted too.
• Bookmarks and page references are also converted.
• The convertor can automatically recognize citations in input documents. Still, the bibliography items have to be added manually.
• The default font size of documents can be set.

The most significant difference between Word2TeX and Word-to-LaTeX is the overall conversion speed. Figure 3.1 shows the times achieved when converting two documents using Word2TeX, the Word-to-LaTeX command line program and the Word-to-LaTeX COM object in a VBA scrip
58. sions of the same text processor.

Advantages of text processors
• They are easy to learn and use for most people. When you want a portion of text to be in a bold font, you just select it, click on the icon and see the change.
• Users always see documents in their final form. But sometimes this can be a disadvantage, as was mentioned above.
• Most text processors can structure documents using predefined or user styles. It's a pity that the WYSIWYG model leads users not to use this effective feature.
• Easy insertion of images and a variety of external objects (graphs, drawings and so on).

LaTeX advantages
• Users can use predefined document templates (e.g. for articles and books) with a professional look and typographic quality.
• Great facilities for writing mathematical expressions and for inserting an index and citations.
• It's not necessary to specify the formatting and look of documents because they depend on the selected document style. Authors write only the commands defining the logical structure of documents (e.g. sections and footnotes).
• Many add-ons, e.g. for inserting graphics or hyperlinks.
• Documents are stored in plain text files, which can be opened and edited in any text editor.
• Wide portability of the TeX and LaTeX systems.
• LaTeX is free.

LaTeX disadvantages and limitations
• It's very difficult to make complex tables with a lot of merged rows and columns.
• Not WYSIWYG model may be a problem
59. sites [16], but they usually cannot handle the new version of the RTF format.

http://wvware.sourceforge.net

3.2 Word2TeX versus Word-to-LaTeX

Word2TeX [18] is the only Word to LaTeX convertor that will be compared with Word-to-LaTeX. All the other convertors are either very old or don't produce good results. Word2TeX is a commercial convertor which requires Microsoft Word to be installed. Here are the lists of useful features and advantages of both programs.

Word2TeX unique features and advantages
• A couple of built-in output formats (LaTeX 2.09, LaTeX 2e, AMS-LaTeX). Word-to-LaTeX can produce even more output formats (e.g. XML) because its configuration is not so tied up with LaTeX.
• It can put figures and tables at the end of output files.
• Word2TeX users can easily define their own mappings between math equations and LaTeX commands through the graphic interface. Word-to-LaTeX users must edit MathType TDL files to customize the conversion of equations.
• Commands for PDFTeX and for \maketitle can be inserted in special dialogs. Word-to-LaTeX can do the same in the preamble configuration.
• Word2TeX is independent of MathType when converting equations.
• It can extract figures in original format (e.g. WMF or BMP).
• The conversion is very fast.

Word-to-LaTeX unique features and advantages
• Users can run the conversion from the command line, through the graphic interface or directly from Word.
• The command line convertor can be used
60. translation does not influence the conversion of equations, which is completely defined in a TDL file (see section 5.6.2 for details).

Macro                      Replaced with
WL DOC_CLASS               the Document class option from the previous dialog
WL DOC_AUTHOR              the input document's author retrieved from the document's properties
WL DOC_TITLE               the input document's title retrieved from the document's properties
WL PAGE_SIZE               see the Document settings in the previous section
WL DEFAULT_FONT_SIZE       the default font size (details in section 5.6.5)
WL STYLE_COMMANDS          the commands created from paragraph and character user styles (see the Styles, Fonts tab in section 5.6.5 for details)

Table 5.4: Document preamble macros

[Figure 5.4: Characters tab]

Default translations can be changed by double-clicking the field you want to edit. The encoding of output files is UTF-8, which covers all national characters, so there
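The preamble macros of table 5.4 amount to a plain text substitution, which the following Python sketch illustrates. The exact macro delimiters are not recoverable from this excerpt, so the spelling WL:NAME used here, the function name and the sample values are all hypothetical.

```python
def fill_preamble(template, values):
    """Replace every macro token in the preamble template with its value."""
    for macro, value in values.items():
        # Plain substring replacement; macro tokens are assumed not to
        # be prefixes of one another.
        template = template.replace(macro, value)
    return template


# Hypothetical template and values, for illustration only.
TEMPLATE = "\\documentclass{WL:DOC_CLASS}\n\\author{WL:DOC_AUTHOR}"
VALUES = {"WL:DOC_CLASS": "article", "WL:DOC_AUTHOR": "Jane Doe"}
PREAMBLE = fill_preamble(TEMPLATE, VALUES)
```

Keeping the preamble as an editable template with a handful of macros is what lets the same convertor emit either a LaTeX preamble or an XML header.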
61. PostScript (EPS) or bitmap (PNG). If you want to export images to EPS format, you must specify the PostScript printer. This topic was mentioned in section 5.1. EPS format is recommended because EPS images can be easily integrated into LaTeX documents and, moreover, some images included in Word documents (e.g. Word drawings) cannot be exported as bitmaps. If this occurs, the convertor notifies you, and after it finishes you can export all images to EPS format and use the eps2tif program (described in section 5.5) to get a bitmap version of each image.

Equations

If you have MathType installed on your system, you can check convert, and all equations inserted through Equation Editor, MathType and Word EQ fields will be converted. Otherwise you have to select ignore to ignore all equations, or to images to export equations as images.

When the convert option is selected, the output format of converted equations depends on the translation file defined in the TDL filename box. See the Translators subdirectory of your MathType directory for possible values. You can edit the files in this directory or add new ones if you want to customize the conversion of equations.

Document settings

As the convertor performs a few special actions depending on the Output format, you must select LaTeX or XML. But remember that this doesn't change any Translations. The WL DOC_CLASS macro used in the document preamble will be replaced with the value of the Document class option
62. t. The machine used for testing was an Athlon 2200 (1.8 GHz, 512 MB RAM, Word XP).

[Figure 3.1: Word-to-LaTeX vs. Word2TeX speed. Bar chart of time in seconds (log scale) for a book (700 pages) and an article (12 pages); series: Word-to-LaTeX COM, Word-to-LaTeX command line, Word2TeX]

A couple of problems occurred when I was testing Word2TeX:
• Table of contents wasn't converted correctly.
• Some images had a wrong width specified.
• Predefined translations for a few Czech national characters were missing.
• A couple of tables and lists were converted badly.

The following pages contain a Word document converted with both Word-to-LaTeX and Word2TeX. Although it's a very short document, a couple of Word2TeX limitations are illustrated: badly converted font sizes, courier and sans serif fonts, no indentations, no background colors, wrong table, and a page reference that is hard-coded (not inserted through \pageref).

Word original

Some colors and font sizes
led id risus Donec enenatis viverra velit nisl mattis urna non luctus sapien ante et leo Integer pharetra congue tempus metus sem eu lorem dio vitae nibh Donec porta

Source code
    int main(int argc, char *argv[]) {
        if (0 < 1) printf("Hello World");
        return 0;
    }

Indenting, EQ, page reference
Aliquam egestas quam in imperdiet imperdiet nulla nulla lacinia nunc congue tempus
EQ field
PAGEREF: colors are on page 1

Table
63. t style       Font.Bold, Font.Italic    WLFontStyle
paragraph     Paragraph                 WLParagraph
table         Table                     WLTable
table cell    Cell                      WLTableCell

Table 2.2: Mappings between Word objects and Word-to-LaTeX classes

tially pick them up from the sorted queues and insert the commands returned by the GetStartCommand and GetEndCommand member functions into the output file.

2.2.2 Text content conversion

The conversion of text content follows the marks retrieval task described in the previous section. The WLDocumentBody class (figure 2.9) works as a manager: it traverses the document paragraph by paragraph and calls functions for the conversion of tables, list paragraphs and common paragraphs. The WLParagraph class takes the paragraph text, translates special characters and inserts marks (if any) at the appropriate positions in the paragraph. Finally, the converted paragraph can be written to the output file.

2.2.3 Special characters conversion

We must distinguish between two groups of special characters. The first one contains the characters that have a special meaning in the output format (e.g. \ in LaTeX). The second group comprises all national characters and special symbols (e.g. T). The characters from the first group must always be converted earlier, because they are often used to translate the ones from the second group. How the characters are converted depends completely on the configuration described in section B.3. Since
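The ordering rule for the two groups of special characters in excerpt 63 can be demonstrated with a short Python sketch. The two translation tables below are hypothetical stand-ins for what the configuration would supply; the point is only that characters with a special meaning in LaTeX are escaped first, in a single pass, and national characters are translated afterwards.

```python
import re

# Group 1: characters with a special meaning in LaTeX (illustrative subset).
SPECIAL = {
    "\\": r"\textbackslash{}",
    "{": r"\{", "}": r"\}",
    "%": r"\%", "&": r"\&", "#": r"\#", "$": r"\$", "_": r"\_",
}

# Group 2: national characters and symbols (hypothetical sample entries).
NATIONAL = {
    "\u00e9": r"\'e",        # e with acute accent
    "\u010d": r"\v{c}",      # c with caron
    "\u03b1": r"$\alpha$",   # Greek small alpha
}

_SPECIAL_RE = re.compile("|".join(re.escape(ch) for ch in SPECIAL))


def escape_special(text):
    # One pass over the text, so replacement output is never escaped again.
    return _SPECIAL_RE.sub(lambda m: SPECIAL[m.group()], text)


def translate_national(text):
    return "".join(NATIONAL.get(ch, ch) for ch in text)


def convert(text):
    # Group 1 first: the group 2 translations themselves contain
    # backslashes and braces that must not be escaped afterwards.
    return translate_national(escape_special(text))
```

Reversing the two steps would escape the backslash that the accent translation has just produced, which is exactly the failure the ordering rule prevents.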
64. to LaTeX convertors in Chapter 3. The second part is user-related; it comprises the user's manual and appendices that show sample documents converted with Word-to-LaTeX and describe the structure of configuration files.

Part I

Chapter 1 Word to LaTeX conversion

1.1 Word versus LaTeX

At the beginning, it wouldn't make sense to compare two particular software products as examples of two very different approaches to making documents. First, it's important to realize how documents are usually created and how they should be created correctly. Allin Cottrell in his fiery article [1] very strictly separates two tasks in creating documents:

• The composition of the text itself. By this I mean the actual choice of words to express one's ideas, and the logical structuring of the text. It includes matters such as the division of the text into paragraphs, sections or chapters, adding of special emphasis to certain portions of the text, and so on.
• The typesetting of the document. This refers to matters such as the choice of the font family in which the text is to be printed, and the way in which structural elements will be visually represented. Should section headings be in bold face or small capitals? Should the text be justified or not? And so on.

Apart from the fact that these days the author and the typesetter are often the same person, the author should always concentrate mainly on the first of these tasks. At the beginning there's
65. to avoid a lot of unnecessary font size commands in the output file. Check Auto detect default font size to retrieve the default size from the Word built-in Normal style.

5.6.6 Miscellaneous options

[Figure 5.6: Misc tab. Screenshot with the groups Output (Wrap paragraphs after N characters; line separator CRLF, LF or CR), Paragraphs (Process paragraph alignments, Process paragraph indentations), Colors (Convert colored text, Convert highlighted text, Convert colored table cells) and Misc (Convert multicolumns, Convert sans serif fonts, Automatically recognize math in italicized text, Ignore empty paragraphs, Recognize references to numbered equations, Recognize bibliography references, Convert endnotes to bibliography)]

Output

Check Wrap paragraphs and insert an integer number to wrap the paragraphs in the output text file. The following line separators can be used in output files: CRLF (Windows), LF (Unix), CR (Macintosh).

Paragraphs

Check Process paragraph alignments and Process paragraph indentations to take them into account. Sometimes it's better to ignore Word alignments and indentations because TeX can make them automatically and better.

Colors

Check Convert colored text to convert colored portions of text using xcolor package
66. customize the settings for each document you convert. The Save as, Save and Load commands in the Configuration menu can be used to save and load convertor configurations. Remember that the current configuration must be saved before it is applied during the conversion. You can check the option Save configuration before conversion to save the configuration automatically after pressing the Convert button.

http://www.cs.wisc.edu/~ghost
http://www.irfanview.com

[Figure 5.1: Running tab]

When you press the Convert button, all the file names (input, output, configuration) and also your Microsoft Word version are written to the text box below. This can be useful when an error occurs. Then the conversion routine starts and you are informed about the progress in the text box. Please be patient when converting a large document; it can take a long time. A much faster way of running the conversion is described in section 5.7.

5.6.2 Figures, Equations and Translations

[Figure 5.2: Figures, Eq., Document tab]

Figures

Check Only figures to convert only figures and ignore the text content of the input document. Word-to-LaTeX exports images (including embedded objects like Excel graphs) in two formats: vector (Encapsulated PostScript
67. future
• The conversion of equations without MathType installed. This means parsing the binary format of the equations.
• Convert the spaces between paragraphs.
• XSL stylesheets for other output formats (e.g. tBook or Simplified DocBook) could be created.
• The conversion of nested tables.

Part II

Chapter 5 User's manual

5.1 Requirements and installation

• Microsoft Windows 2000 or XP is required.
• Microsoft .NET Framework version 1.1 or higher is required. The installation file can be found in the setup directory.
• Microsoft Word XP (2002) or higher must be installed on your system.
• If you want to export mathematical equations not only as images but also to LaTeX or MathML formats, you will have to install Design Science MathType (it's a commercial product).
• You must have a PostScript printer driver installed on your system to be able to export images to EPS format. You can follow the instructions here to add the very good Generic Color PS Printer.

After you have installed all the required software, close Word if it's running, execute setup.exe in the setup\Word-to-LaTeX directory and follow the instructions. You must have administrator privileges to install the whole application properly.

Once the installation is finished, you will find a couple of files in your Word-to-LaTeX directory. Some of them are listed here:

word-to-latex.exe: Word-to-LaTeX command line convertor
word-to-latex-gui.exe: W
68. Option name and description, with possible values:

INDENTATION
    Convert paragraph indentations. Values: yes, no.
COLOR_TEXT
    Use special commands for colored text. Values: yes, no.
COLOR_BG
    Use special commands for text with colored background. Values: yes, no.
COLOR_TABLE
    Use special commands for table cells with colored background. Values: yes, no.

Table B.1: Conversion options

AUTO_DETECT_DEFAULT_FONT_SIZE
    Detect the default font size of the input document automatically or not. The font size of the Word built-in Normal style is taken as the default one if this option is set to yes. Values: yes, no.
MULTICOLUMN
    Convert multicolumn sections. Values: yes, no.
WRAP_PARAGRAPHS
    A positive value x causes paragraphs to be wrapped into lines after each x characters. Any other value forces the convertor not to wrap paragraphs. Value: e.g. 80.
NEW_LINE
    Defines the line separator. Values: crlf (Windows line separator), cr (Macintosh line separator), lf (Unix line separator).
SANS_SERIF
    Use special commands for sans serif fonts. Values: yes, no.
AUTO_RECOGNIZE_MATH
    Recognize math expressions written in italics (e.g. i). Values: yes, no.
IGNORE_EMPTY_PAR
    Ignore paragraphs not containing any text. Values: yes, no.
RECOGNIZE_NUMBERED_EQ_REF
    Recognize references to numbered equations marked with labels like (5) or (5.2). Values: yes, no.
ENDNOTES_TO_BIBLIO
    Convert endnotes to bibliogr
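How the WRAP_PARAGRAPHS value and the line-separator option from excerpt 68 interact can be shown with a small Python sketch. It relies on the standard textwrap module rather than the convertor's own wrapping routine, and the function name is hypothetical, so the exact break positions of the real program may differ.

```python
import textwrap

# Line separators named as in the configuration description.
SEPARATORS = {"crlf": "\r\n", "cr": "\r", "lf": "\n"}


def format_paragraph(text, wrap_paragraphs, new_line):
    """Wrap a paragraph and join its lines with the chosen separator."""
    separator = SEPARATORS[new_line]
    if wrap_paragraphs > 0:
        # A positive value wraps the paragraph into lines of at most
        # wrap_paragraphs characters, breaking at whitespace.
        lines = textwrap.wrap(text, width=wrap_paragraphs)
    else:
        # Any other value leaves the paragraph on a single line.
        lines = [text]
    return separator.join(lines)
```

For example, wrapping "aaa bbb ccc" at 7 characters with the crlf separator produces two lines joined by a carriage return and a line feed.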
69. y a few places in the source code where the convertor handles these output formats differently. Appendix B describes the configuration in detail.

2.3 Problems

The problems and limitations of the Word object model and Word itself are described in this section.

Sometimes funny

The Word object model sometimes behaves strangely in a couple of respects. Exceptions are thrown again and again although there is no reason for Word to do so. Word sometimes gives you completely wrong information about the dimensions of tables and pages. The funniest thing is to get different output from a VBA macro and identical C# code.

Citations

Word has no tool for inserting citations (e.g. [1], [Ka78]) into documents. Some people use endnotes (Insert > Reference > Footnote > Endnote) to insert citations, and therefore Word-to-LaTeX can properly convert them to the bibliography environment. The program may also convert the portions of text that match the citation pattern \[[A-Za-z0-9_]+\] to the commands specified in the configuration (e.g. [1] to \cite{bib1}).

Cross references

Word-to-LaTeX converts cross references (inserted through Insert > Reference > Cross-reference) only to bookmarks (inserted through Insert > Bookmark). Other cross references (to sections, tables etc.) use Word internal codes, e.g. PAGEREF _Ref133683482, and cannot be converted.

Colors

The Word object model uses two data types for representing colors: Word.WdColor
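The citation recognition described under Citations in excerpt 69 can be sketched in Python with a regular expression. The sketch assumes the pattern means square brackets around one or more letters, digits or underscores, and that the replacement command is \cite with a bib prefix, as in the example given; the function name and the prefix parameter are hypothetical.

```python
import re

# Assumed citation pattern: [label], where label consists of
# letters, digits and underscores.
_CITATION = re.compile(r"\[([A-Za-z0-9_]+)\]")


def convert_citations(text, prefix="bib"):
    """Replace bracketed labels such as [1] with \\cite{bib1}."""
    return _CITATION.sub(
        lambda m: "\\cite{" + prefix + m.group(1) + "}",
        text,
    )
```

A purely textual pattern like this also matches bracketed text that is not a citation (for instance an array index), which is presumably why the recognition is an option that can be switched off.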
