Getting Started with XML and ConTeXt using TeXML

Revision as of 07:33, 4 March 2005 by Paul (talk | contribs) (First posted page)

Goal

This document is for XML authors who want to use open source software to produce high-quality PDF documents, and who want to do so right now. The most official way to convert XML to PDF has been to use the FO language, but the only open source project that converts FO is FOP, and it doesn't come close to implementing the whole standard. It cannot center tables, for example, and it has no way to control orphan text. The FOP developers have not made any changes in the last year and a half, which makes me believe it is a dead-end project.

ConTeXt, a variation of TeX, has almost none of the limitations one runs into when using FOP. If we know how to use ConTeXt in place of FO, we can produce the documents we want right now. Don't worry if you have never seen FOP. Just ignore those parts and you should still get a good idea of how to use ConTeXt.

Since an XML author will use XSLT for conversion, we can dispense with many of the macros written in ConTeXt that produce such things as titles and tables of contents. We'll let XSLT do a lot of the work and use the most stripped-down version of ConTeXt we can.

This document assumes that the reader already has ConTeXt installed and knows how to use it. It also assumes a passing knowledge of XML and XSLT.

Converting From XML

Being text-based, ConTeXt does not lend itself well to a conversion from an XML tree. If an innocent blank line from an XML document finds its way into a ConTeXt document, we end up with an extra paragraph division. One way around this problem is to use ConTeXt's native XML mapping, which you can read about on the ConTeXt home page. I find this mapping scheme too complicated, which is why I advocate using TeXML (http://getfo.sourceforge.net/texml/index.html). TeXML is a Python utility that converts its own special form of XML into ConTeXt. That means you can use XSLT to convert from one XML tree to another and then let the Python utility do the dirty work of handling white space.
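
To make the white space problem concrete, here is a minimal Python sketch. The sample input and both function names are made up for illustration, and the second function is only one crude way to sidestep the problem; it is not a description of what TeXML does internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: a paragraph whose text happens to contain a blank line.
SOURCE = "<p>first line\n\nsecond line</p>"

def naive_context(xml_text):
    """Copy the element text verbatim between \\starttext and \\stoptext."""
    text = ET.fromstring(xml_text).text
    return "\\starttext\n" + text + "\n\\stoptext\n"

def normalized_context(xml_text):
    """Collapse every run of white space to a single space before copying."""
    text = " ".join(ET.fromstring(xml_text).text.split())
    return "\\starttext\n" + text + "\n\\stoptext\n"

# The naive output keeps the blank line, which TeX reads as a paragraph break.
print(naive_context(SOURCE))
# The normalized output keeps both lines in a single paragraph.
print(normalized_context(SOURCE))
```

The point is only that a text-level conversion has to think about every blank line, which is the chore the paragraph above proposes handing over to TeXML.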

TeXML uses a very simple XML language. Basically, it represents ConTeXt commands in XML and does little more. One could look at a TeXML document and immediately know what the author meant to express in ConTeXt. In converting an XML document such as TEI to TeXML, one is coming as close as possible to actually converting to ConTeXt itself, without having to worry about white space, and with the comfort of still working with an XML tree. If you use TeXML to convert, you really won't have to learn a new XML language, since TeXML consists of very few elements. Instead, you will still think in terms of ConTeXt.

Simple Document in ConTeXt and in TeXML

Here is the simplest ConTeXt document:


\starttext
hello world
\stoptext

In TeXML, this looks like:


<?xml version="1.0"?>
<TeXML>
  <env name="text">
    <TeXML>hello world</TeXML>
  </env>
</TeXML>

Follow the instructions to convert the TeXML to ConTeXt. At the time of writing, I had to set an environment variable to tell TeXML it was converting to ConTeXt (as opposed to LaTeX) and then type


texml.py -e utf  

Our simple document consists of one environment, the text environment. Like all environments in ConTeXt, this one starts with a backslash followed by the word "start" and then the name of the environment, without a space. We end the environment in the same way, replacing "start" with "stop."

In TeXML, we enclose environments with the env element. The mandatory "name" attribute defines the environment's name.
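
As a rough illustration of how little structure is involved, the following Python sketch builds the same hello-world TeXML tree with the standard library's ElementTree module. In practice XSLT would produce this tree; Python is used here only to show the shape of the result, and the function name hello_texml is hypothetical.

```python
import xml.etree.ElementTree as ET

def hello_texml(name, content):
    """Build <TeXML><env name=...><TeXML>content</TeXML></env></TeXML>."""
    root = ET.Element("TeXML")
    # The env element carries the environment's name in its "name" attribute.
    env = ET.SubElement(root, "env", {"name": name})
    # Mirror the example above: the text sits in a nested TeXML element.
    inner = ET.SubElement(env, "TeXML")
    inner.text = content
    return ET.tostring(root, encoding="unicode")

print(hello_texml("text", "hello world"))
```

The output is the hello-world document on a single line, without the XML declaration or the indentation.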

Commands

Aside from environments, we also have commands in ConTeXt. Through commands we control the text formatting in ConTeXt. Commands start with a backslash and can be followed by setups, which are placed in brackets, and by the "scope or range of the command," which is placed in curly brackets.

For example, if we wanted to create a simple document with just one box containing the words "that's it," we would write:


\starttext
\framed[width=2cm,height=1cm]{that's it} 
\stoptext

In TeXML, this looks like:


<?xml version="1.0"?>
<TeXML>
  <env name="text">
    <TeXML>
      <cmd name="framed">
        <opt>width=2cm,height=1cm</opt>
        <parm>that's it</parm>
      </cmd>
    </TeXML>
  </env>
</TeXML>
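
The correspondence between the two notations can be sketched in a few lines of Python. This is a toy serializer written for this page, not the real texml.py: the function name to_context is hypothetical, it knows only the env, cmd, opt and parm elements used so far, and it crudely strips white space around every piece of text, whereas the real utility does considerably more.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<TeXML>
  <env name="text">
    <TeXML>
      <cmd name="framed">
        <opt>width=2cm,height=1cm</opt>
        <parm>that's it</parm>
      </cmd>
    </TeXML>
  </env>
</TeXML>"""

def to_context(node):
    """Toy mapping of env, cmd, opt and parm elements to ConTeXt text."""
    # Gather the node's own text and its children, dropping indentation.
    body = (node.text or "").strip()
    for child in node:
        body += to_context(child) + (child.tail or "").strip()
    if node.tag == "env":
        name = node.get("name")
        return "\\start" + name + "\n" + body + "\n\\stop" + name + "\n"
    if node.tag == "cmd":
        return "\\" + node.get("name") + body
    if node.tag == "opt":
        return "[" + body + "]"   # setups go in brackets
    if node.tag == "parm":
        return "{" + body + "}"   # scope goes in curly brackets
    return body                   # TeXML wrapper: pass the content through

print(to_context(ET.fromstring(SAMPLE)))
```

Run on the framed example, the sketch prints the three-line ConTeXt document shown earlier: opt becomes the bracketed setup and parm becomes the braced argument.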

Other Preliminaries

To make sure that our Unicode XML documents get converted properly, we want to put the following line at the top of all our documents:


\enableregime[utf]

Apparently, this allows ConTeXt to handle both UTF-8 and UTF-16.

In addition, we want to disable as many of ConTeXt's automatic modes as possible, since we will generate things like titles and sections ourselves. ConTeXt automatically places a number on each page. To turn this feature off, place this line somewhere at the top of your document:



\setuppagenumbering[state=stop]

We might alter this command in some ways later.
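
On the TeXML side, both preliminary commands should follow the same cmd and opt pattern used for \framed. The Python sketch below assembles such a document with ElementTree; the helper names command and preamble_document are hypothetical, and placing the two commands before the text environment is an assumption based on "at the top of your document."

```python
import xml.etree.ElementTree as ET

def command(name, option):
    """Build <cmd name=...><opt>option</opt></cmd>."""
    cmd = ET.Element("cmd", {"name": name})
    ET.SubElement(cmd, "opt").text = option
    return cmd

def preamble_document(content):
    """Assemble a TeXML tree: two setup commands, then the text environment."""
    root = ET.Element("TeXML")
    root.append(command("enableregime", "utf"))
    root.append(command("setuppagenumbering", "state=stop"))
    env = ET.SubElement(root, "env", {"name": "text"})
    ET.SubElement(env, "TeXML").text = content
    return ET.tostring(root, encoding="unicode")

print(preamble_document("hello world"))
```

An XSLT stylesheet would emit the same elements directly; the sketch only shows which elements are needed.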

Example Documents

Here are two very simple documents, one in plain old ConTeXt, and one in TeXML.