Export

Revision as of 11:59, 19 June 2019


TODO: This page is a work in progress. (See: To-Do List)


ConTeXt not only produces beautiful PDFs but can also export to XML/HTML. This is especially useful for creating eBooks in ePub format.

< XML | ePub >

Minimal example

% mode=mkiv
\setupbackend[export=yes] % this is all that is needed to activate export

\starttext
\input tufte
\stoptext

Exported structure

If you compile the example above as minimal.tex, you get a directory structure like this:

minimal.tex
minimal.log
minimal.pdf
minimal.tuc
minimal-export
├── cover.xhtml
├── images
├── minimal-div.xhtml
├── minimal-pub.lua
├── minimal-raw.xml
├── minimal-tag.xhtml
└── styles
    ├── minimal-defaults.css
    ├── minimal-images.css
    ├── minimal-styles.css
    └── minimal-templates.css

From here on we refer to these files without the prefix ("minimal-"). The listings have been reformatted slightly to make them shorter and more readable.

div.xhtml

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!--
    input filename   : minimal
    processing date  : Sat Jan 17 17:43:58 2015
    context version  : 2014.12.29 10:01
    exporter version : 0.33
-->
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:math="http://www.w3.org/1998/Math/MathML">
    <head>
        <meta charset="utf-8"/>
        <title></title>
<link type="text/css" rel="stylesheet" href="styles/minimal-defaults.css" />
<link type="text/css" rel="stylesheet" href="styles/minimal-images.css" />
<link type="text/css" rel="stylesheet" href="styles/minimal-styles.css" />
    </head>
    <body>
        <div class="warning">Rendering can be suboptimal because there is no default/fallback css loaded.</div>
<div>
We thrive in information--thick worlds because of our marvelous and everyday capacity to select, edit, single out, structure, highlight, group, pair, merge, harmonize, synthesize, focus, organize, condense, reduce, boil down, choose, categorize, catalog, classify, list, abstract, scan, look into, idealize, isolate, discriminate, distinguish, screen, pigeonhole, pick over, sort, integrate, blend, inspect, filter, lump, skip, smooth, chunk, average, approximate, cluster, aggregate, outline, summarize, itemize, review, dip into, flip through, browse, glance into, leaf through, skim, refine, enumerate, glean, synopsize, winnow the wheat from the chaff and separate the sheep from the goats.
</div>
    </body>
</html>

tag.xhtml

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!--
    input filename   : minimal
    processing date  : Sat Jan 17 17:43:58 2015
    context version  : 2014.12.29 10:01
    exporter version : 0.33
-->
<?xml-stylesheet type="text/css" href="styles/minimal-defaults.css" ?>
<?xml-stylesheet type="text/css" href="styles/minimal-images.css" ?>
<?xml-stylesheet type="text/css" href="styles/minimal-styles.css" ?>
<document href="minimal" language="en" date="Sat Jan 17 17:43:58 2015" context="2014.12.29 10:01" xmlns:m="http://www.w3.org/1998/Math/MathML" file="minimal" version="0.33">
We thrive in information--thick worlds because of our marvelous and everyday capacity to select, edit, single out, structure, highlight, group, pair, merge, harmonize, synthesize, focus, organize, condense, reduce, boil down, choose, categorize, catalog, classify, list, abstract, scan, look into, idealize, isolate, discriminate, distinguish, screen, pigeonhole, pick over, sort, integrate, blend, inspect, filter, lump, skip, smooth, chunk, average, approximate, cluster, aggregate, outline, summarize, itemize, review, dip into, flip through, browse, glance into, leaf through, skim, refine, enumerate, glean, synopsize, winnow the wheat from the chaff and separate the sheep from the goats.
</document>

raw.xml

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!--
    input filename   : minimal
    processing date  : Sat Jan 17 17:43:58 2015
    context version  : 2014.12.29 10:01
    exporter version : 0.33
-->
<?xml-stylesheet type="text/css" href="styles/minimal-defaults.css" ?>
<?xml-stylesheet type="text/css" href="styles/minimal-images.css" ?>
<?xml-stylesheet type="text/css" href="styles/minimal-styles.css" ?>
<document language="en" date="Sat Jan 17 17:43:58 2015" context="2014.12.29 10:01" xmlns:m="http://www.w3.org/1998/Math/MathML" file="minimal" version="0.33">
We thrive in information--thick worlds because of our marvelous and everyday capacity to select, edit, single out, structure, highlight, group, pair, merge, harmonize, synthesize, focus, organize, condense, reduce, boil down, choose, categorize, catalog, classify, list, abstract, scan, look into, idealize, isolate, discriminate, distinguish, screen, pigeonhole, pick over, sort, integrate, blend, inspect, filter, lump, skip, smooth, chunk, average, approximate, cluster, aggregate, outline, summarize, itemize, review, dip into, flip through, browse, glance into, leaf through, skim, refine, enumerate, glean, synopsize, winnow the wheat from the chaff and separate the sheep from the goats.
</document>

pub.lua

return {
 ["htmlfiles"]={ "minimal-div.xhtml" },
 ["htmlroot"]="minimal-div.xhtml",
 ["identifier"]="3ce74458-4cdd-829d-ace4-cf535fb00519",
 ["imagefile"]="styles/minimal-images.css",
 ["imagepath"]="images",
 ["images"]={},
 ["language"]="en",
 ["name"]="minimal",
 ["stylepath"]="styles",
 ["styles"]={ "minimal-defaults.css", "minimal-images.css", "minimal-styles.css" },
 ["xhtmlfiles"]={ "minimal-tag.xhtml" },
 ["xmlfiles"]={ "minimal-raw.xml" },
}

Required structuring of your ConTeXt code

The export is usable only for content that is "well structured" in an XML sense. In the example above, all text ended up in the root element document.

That means you need to mark up everything, from inline spans through paragraphs and list items to chapters and parts, with \start... … \stop... pairs.

Also note that font switches like \em do not translate into output structure; instead, define a highlight with \definehighlight[emph][style=italic] and use it as \emph{emphasized}.
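
The contrast between a font switch and a highlight can be sketched as follows. This is only an illustration: the highlight name important is a hypothetical addition, and the remarks about the resulting export are inferred from the examples on this page rather than checked against every ConTeXt version.

```tex
\setupbackend[export=yes]

\definehighlight[emph][style=italic]
\definehighlight[important][style=bold] % hypothetical second highlight

\starttext

% Not structured: a bare paragraph with font switches.
% The text is exported, but presumably without paragraph or highlight elements.
This is {\em emphasized} and {\bf important}.

% Structured: the paragraph and both highlights should show up as elements,
% e.g. <highlight detail="emph"> as in the listings on this page.
\startparagraph
This is \emph{emphasized} and \important{important}.
\stopparagraph

\stoptext
```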

More useful example

% mode=mkiv
\mainlanguage[en]
\setupbackend[export=yes]

\setupinteraction[state=start,
	color=,contrastcolor=,
	% This metadata is used for the PDF
	title={My first eBook 1},
	subtitle={},
	keywords={},
	author={Hans 1}
]
\setupexport[
	hyphen=yes,
	% This metadata is used by ConTeXt’s ePub script
	% title, subtitle and author are taken from \setupinteraction, if not set
	title={My first eBook 2},
	subtitle={},
	author={Hans 2}
]
\settaggedmetadata[
	% here you can set as many metadata entries as you like, but you need to process them yourself
	title={My first eBook 3},
	author={Hans 3},
	subtitle={},
	version={\date} % TODO: doesn’t expand
]

\definehighlight[emph][style=italic] % use \emph{something} instead of {\em something}

\starttext

\startchapter[title=Example]

\startparagraph
\input tufte
\stopparagraph

\startsection[title={A section}]

\startparagraph
\input tufte

\startitemize[packed,joinup]
  \startitem First \stopitem
  \startitem Second \stopitem
  \startitem Third \stopitem
  \startitem Fourth\stopitem
\stopitemize

\stopparagraph

\startparagraph
\input knuth
\stopparagraph

\startparagraph
\input zapf
\stopparagraph

\stopsection
\stopchapter

\startchapter[title=Quoth\footnote{by Edgar Allan Poe}]
\startlines
\quotation{Prophet!} said I, \quotation{thing of evil!—prophet still, if bird or devil!
By that Heaven that bends above us—by that God we both adore—
Tell this soul with sorrow laden if, within the distant Aidenn,
It shall clasp a sainted maiden whom the angels name Lenore—
Clasp a rare and radiant maiden whom the angels name Lenore.}
\emph{Quoth the Raven \quotation{Nevermore.}}
\stoplines
\stopchapter

\stoptext

There’s also an example of an export-friendly ConTeXt file in the sources: export-example.tex.

Choice of output files

Only after such tagging do we find significant differences between the three content output files:

TODO: explain differences between export variants

raw.xml

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!--
    input filename   : minimal
    processing date  : Sat Jan 17 19:42:37 2015
    context version  : 2014.12.29 10:01
    exporter version : 0.33
-->
<?xml-stylesheet type="text/css" href="styles/minimal-defaults.css" ?>
<?xml-stylesheet type="text/css" href="styles/minimal-images.css" ?>
<?xml-stylesheet type="text/css" href="styles/minimal-styles.css" ?>
<document xmlns:m="http://www.w3.org/1998/Math/MathML" title="My first eBook 1" version="0.33" author="{Hans 1} " context="2014.12.29 10:01" date="Sat Jan 17 19:42:37 2015" language="en" file="minimal">
 <section detail="chapter" chain="chapter" level="2">
  <metadata>
   <metavariable name="author">Hans 3</metavariable>
   <metavariable name="subtitle"></metavariable>
   <metavariable name="title">My first eBook 3</metavariable>
   <metavariable name="version">\date </metavariable>
  </metadata>
  <sectionnumber>1</sectionnumber>
   <sectiontitle>Ex­am­ple</sectiontitle>
  <sectioncontent>
   <paragraph>We thrive in in­for­ma­tion--thick worlds be­cause of our mar­velous and every­day ca­pac­ity to se­lect, edit, sin­gle out, struc­ture, high­light, group, pair, merge, har­mo­nize, syn­the­size, fo­cus, or­ga­nize, con­dense, re­duce, boil down, choose, cat­e­go­rize, cat­a­log, clas­sify, list, ab­stract, scan, look into, ide­al­ize, iso­late, dis­crim­i­nate, dis­tin­guish, screen, pi­geon­hole, pick over, sort, in­te­grate, blend, in­spect, fil­ter, lump, skip, smooth, chunk, av­er­age, ap­prox­i­mate, clus­ter, ag­gre­gate, out­line, sum­ma­rize, item­ize, re­view, dip into, flip through, browse, glance into, leaf through, skim, re­fine, enu­mer­ate, glean, syn­op­size, win­now the wheat from the chaff and sep­a­rate the sheep from the goats.</paragraph>
   <section detail="section" chain="section" level="3">
    <sectionnumber>1.1</sectionnumber>
     <sectiontitle>A sec­tion</sectiontitle>
    <sectioncontent>
     <paragraph>We thrive in in­for­ma­tion--thick worlds be­cause of our mar­velous and every­day ca­pac­ity to se­lect, edit, sin­gle out, struc­ture, high­light, group, pair, merge, har­mo­nize, syn­the­size, fo­cus, or­ga­nize, con­dense, re­duce, boil down, choose, cat­e­go­rize, cat­a­log, clas­sify, list, ab­stract, scan, look into, ide­al­ize, iso­late, dis­crim­i­nate, dis­tin­guish, screen, pi­geon­hole, pick over, sort, in­te­grate, blend, in­spect, fil­ter, lump, skip, smooth, chunk, av­er­age, ap­prox­i­mate, clus­ter, ag­gre­gate, out­line, sum­ma­rize, item­ize, re­view, dip into, flip through, browse, glance into, leaf through, skim, re­fine, enu­mer­ate, glean, syn­op­size, win­now the wheat from the chaff and sep­a­rate the sheep from the goats. <itemgroup detail="itemize" chain="itemize" packed="yes" symbol="1" level="1"><item><itemtag><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" display="inline"><m:mo></m:mo></m:math></itemtag><itemcontent>First</itemcontent></item><item><itemtag><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" display="inline"><m:mo></m:mo></m:math></itemtag><itemcontent>Sec­ond</itemcontent></item><item><itemtag><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" display="inline"><m:mo></m:mo></m:math></itemtag><itemcontent>Third</itemcontent></item><item><itemtag><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" display="inline"><m:mo></m:mo></m:math></itemtag><itemcontent>Fourth</itemcontent></item></itemgroup> </paragraph>
     <paragraph>Thus, I came to the con­clu­sion that the de­signer of a new sys­tem must not only be the im­ple­menter and first large--scale user; the de­signer should also write the first user man­ual.      <break/>
The sep­a­ra­tion of any of these four com­po­nents would have hurt TEX sig­nif­i­cantly. If I had not par­tic­i­pated fully in all these ac­tiv­i­ties, lit­er­ally hun­dreds of im­prove­ments would never have been made, be­cause I would never have thought of them or per­ceived why they were im­por­tant.      <break/>
But a sys­tem can­not be suc­cess­ful if it is too strongly in­flu­enced by a sin­gle per­son. Once the ini­tial de­sign is com­plete and fairly ro­bust, the real test be­gins as peo­ple with many dif­fer­ent view­points un­der­take their own ex­per­i­ments.</paragraph>
     <paragraph>Com­ing back to the use of type­faces in elec­tronic pub­lish­ing: many of the new ty­pog­ra­phers re­ceive their knowl­edge and in­for­ma­tion about the rules of ty­pog­ra­phy from books, from com­puter mag­a­zines or the in­struc­tion man­u­als which they get with the pur­chase of a PC or soft­ware. There is not so much ba­sic in­struc­tion, as of now, as there was in the old days, show­ing the dif­fer­ences be­tween good and bad ty­po­graphic de­sign. Many peo­ple are just fas­ci­nated by their PC’s tricks, and think that a widely--praised pro­gram, called up on the screen, will make every­thing au­to­matic from now on.</paragraph>
    </sectioncontent>
   </section>
  </sectioncontent>
 </section>
 <section detail="chapter" chain="chapter" level="2">
  <sectionnumber>2</sectionnumber>
   <sectiontitle>Quoth<descriptionsymbol detail="footnote"><sup>1</sup></descriptionsymbol></sectiontitle>
  <sectioncontent>
   <lines detail="lines" chain="lines">
    <line><delimited detail="quotation-1">“Prophet!”</delimited> said I, <delimited detail="quotation-1">“thing of evil!—prophet still, if bird or devil!”</delimited><line>By that Heaven that bends above us—by that God we both adore—</line><line>Tell this soul with sor­row laden if, within the dis­tant Aidenn,</line><line>It shall clasp a sainted maiden whom the an­gels name Lenore—</line><line>Clasp a rare and ra­di­ant maiden whom the an­gels name Lenore.</line></line>
    <line><highlight detail="emph">Quoth the Raven <delimited detail="quotation-1">“Nev­er­more.”</delimited></highlight></line>
   </lines>
   </sectioncontent>
  <description detail="footnote" chain="footnote">
   <descriptiontag><sup>1</sup> </descriptiontag>
   <descriptioncontent>by Edgar Al­lan Poe</descriptioncontent>
  </description>
 </section>
</document>
<break/>

tag.xhtml

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!--
    input filename   : minimal
    processing date  : Sat Jan 17 19:42:37 2015
    context version  : 2014.12.29 10:01
    exporter version : 0.33
-->
<?xml-stylesheet type="text/css" href="styles/minimal-defaults.css" ?>
<?xml-stylesheet type="text/css" href="styles/minimal-images.css" ?>
<?xml-stylesheet type="text/css" href="styles/minimal-styles.css" ?>

<document title="My first eBook 1" version="0.33" context="2014.12.29 10:01" href="minimal" author="{Hans 1} " xmlns:m="http://www.w3.org/1998/Math/MathML" file="minimal" language="en" date="Sat Jan 17 19:42:37 2015">
 <section chain="chapter" detail="chapter" level="2">
  <metadata>
   <metavariable name="author">Hans 3</metavariable>
   <metavariable name="subtitle"/>
   <metavariable name="title">My first eBook 3</metavariable>
   <metavariable name="version">\date </metavariable>
  </metadata>
  <sectionnumber>1</sectionnumber>
   <sectiontitle>Ex­am­ple</sectiontitle>
  <sectioncontent>
   <paragraph>We thrive in in­for­ma­tion--thick worlds be­cause of our mar­velous and every­day ca­pac­ity to se­lect, edit, sin­gle out, struc­ture, high­light, group, pair, merge, har­mo­nize, syn­the­size, fo­cus, or­ga­nize, con­dense, re­duce, boil down, choose, cat­e­go­rize, cat­a­log, clas­sify, list, ab­stract, scan, look into, ide­al­ize, iso­late, dis­crim­i­nate, dis­tin­guish, screen, pi­geon­hole, pick over, sort, in­te­grate, blend, in­spect, fil­ter, lump, skip, smooth, chunk, av­er­age, ap­prox­i­mate, clus­ter, ag­gre­gate, out­line, sum­ma­rize, item­ize, re­view, dip into, flip through, browse, glance into, leaf through, skim, re­fine, enu­mer­ate, glean, syn­op­size, win­now the wheat from the chaff and sep­a­rate the sheep from the goats.</paragraph>
   <section chain="section" detail="section" level="3">
    <sectionnumber>1.1</sectionnumber>
     <sectiontitle>A sec­tion</sectiontitle>
    <sectioncontent>
     <paragraph>We thrive in in­for­ma­tion--thick worlds be­cause of our mar­velous and every­day ca­pac­ity to se­lect, edit, sin­gle out, struc­ture, high­light, group, pair, merge, har­mo­nize, syn­the­size, fo­cus, or­ga­nize, con­dense, re­duce, boil down, choose, cat­e­go­rize, cat­a­log, clas­sify, list, ab­stract, scan, look into, ide­al­ize, iso­late, dis­crim­i­nate, dis­tin­guish, screen, pi­geon­hole, pick over, sort, in­te­grate, blend, in­spect, fil­ter, lump, skip, smooth, chunk, av­er­age, ap­prox­i­mate, clus­ter, ag­gre­gate, out­line, sum­ma­rize, item­ize, re­view, dip into, flip through, browse, glance into, leaf through, skim, re­fine, enu­mer­ate, glean, syn­op­size, win­now the wheat from the chaff and sep­a­rate the sheep from the goats. <itemgroup detail="itemize" symbol="1" chain="itemize" packed="yes" level="1"><item><itemtag><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" display="inline"><m:mo></m:mo></m:math></itemtag><itemcontent>First</itemcontent></item><item><itemtag><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" display="inline"><m:mo></m:mo></m:math></itemtag><itemcontent>Sec­ond</itemcontent></item><item><itemtag><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" display="inline"><m:mo></m:mo></m:math></itemtag><itemcontent>Third</itemcontent></item><item><itemtag><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" display="inline"><m:mo></m:mo></m:math></itemtag><itemcontent>Fourth</itemcontent></item></itemgroup> </paragraph>
     <paragraph>Thus, I came to the con­clu­sion that the de­signer of a new sys­tem must not only be the im­ple­menter and first large--scale user; the de­signer should also write the first user man­ual.      <break/>
The sep­a­ra­tion of any of these four com­po­nents would have hurt TEX sig­nif­i­cantly. If I had not par­tic­i­pated fully in all these ac­tiv­i­ties, lit­er­ally hun­dreds of im­prove­ments would never have been made, be­cause I would never have thought of them or per­ceived why they were im­por­tant.      <break/>
But a sys­tem can­not be suc­cess­ful if it is too strongly in­flu­enced by a sin­gle per­son. Once the ini­tial de­sign is com­plete and fairly ro­bust, the real test be­gins as peo­ple with many dif­fer­ent view­points un­der­take their own ex­per­i­ments.</paragraph>
     <paragraph>Com­ing back to the use of type­faces in elec­tronic pub­lish­ing: many of the new ty­pog­ra­phers re­ceive their knowl­edge and in­for­ma­tion about the rules of ty­pog­ra­phy from books, from com­puter mag­a­zines or the in­struc­tion man­u­als which they get with the pur­chase of a PC or soft­ware. There is not so much ba­sic in­struc­tion, as of now, as there was in the old days, show­ing the dif­fer­ences be­tween good and bad ty­po­graphic de­sign. Many peo­ple are just fas­ci­nated by their PC’s tricks, and think that a widely--praised pro­gram, called up on the screen, will make every­thing au­to­matic from now on.</paragraph>
    </sectioncontent>
   </section>
  </sectioncontent>
 </section>
 <section chain="chapter" detail="chapter" level="2">
  <sectionnumber>2</sectionnumber>
   <sectiontitle>Quoth<descriptionsymbol detail="footnote"><sup>1</sup></descriptionsymbol></sectiontitle>
  <sectioncontent>
   <lines chain="lines" detail="lines">
    <line><delimited detail="quotation-1">“Prophet!”</delimited> said I, <delimited detail="quotation-1">“thing of evil!—prophet still, if bird or devil!”</delimited><line>By that Heaven that bends above us—by that God we both adore—</line><line>Tell this soul with sor­row laden if, within the dis­tant Aidenn,</line><line>It shall clasp a sainted maiden whom the an­gels name Lenore—</line><line>Clasp a rare and ra­di­ant maiden whom the an­gels name Lenore.</line></line>
    <line><highlight detail="emph">Quoth the Raven <delimited detail="quotation-1">“Nev­er­more.”</delimited></highlight></line>
   </lines>
   </sectioncontent>
  <description chain="footnote" detail="footnote">
   <descriptiontag><sup>1</sup> </descriptiontag>
   <descriptioncontent>by Edgar Al­lan Poe</descriptioncontent>
  </description>
 </section>
</document>

div.xhtml

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<!--
    input filename   : minimal
    processing date  : Sat Jan 17 19:42:37 2015
    context version  : 2014.12.29 10:01
    exporter version : 0.33
-->
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:math="http://www.w3.org/1998/Math/MathML">
    <head>
        <meta charset="utf-8"/>
        <title>My first eBook 1</title>
<link type="text/css" rel="stylesheet" href="styles/minimal-defaults.css" />
<link type="text/css" rel="stylesheet" href="styles/minimal-images.css" />
<link type="text/css" rel="stylesheet" href="styles/minimal-styles.css" />
    </head>
    <body>
        <div class="warning">Rendering can be suboptimal because there is no default/fallback css loaded.</div>
<div>
 <div class="section chapter level-2">
  <div class="metadata">
   <div class="metavariable">Hans 3</div>
   <div class="metavariable"><!--empty-->
</div>
   <div class="metavariable">My first eBook 3</div>
   <div class="metavariable">\date </div>
  </div>
  <div class="sectionnumber">1</div>
   <div class="sectiontitle">Ex­am­ple</div>
  <div class="sectioncontent">
   <div class="paragraph">We thrive in in­for­ma­tion--thick worlds be­cause of our mar­velous and every­day ca­pac­ity to se­lect, edit, sin­gle out, struc­ture, high­light, group, pair, merge, har­mo­nize, syn­the­size, fo­cus, or­ga­nize, con­dense, re­duce, boil down, choose, cat­e­go­rize, cat­a­log, clas­sify, list, ab­stract, scan, look into, ide­al­ize, iso­late, dis­crim­i­nate, dis­tin­guish, screen, pi­geon­hole, pick over, sort, in­te­grate, blend, in­spect, fil­ter, lump, skip, smooth, chunk, av­er­age, ap­prox­i­mate, clus­ter, ag­gre­gate, out­line, sum­ma­rize, item­ize, re­view, dip into, flip through, browse, glance into, leaf through, skim, re­fine, enu­mer­ate, glean, syn­op­size, win­now the wheat from the chaff and sep­a­rate the sheep from the goats.</div>
   <div class="section level-3">
    <div class="sectionnumber">1.1</div>
     <div class="sectiontitle">A sec­tion</div>
    <div class="sectioncontent">
     <div class="paragraph">We thrive in in­for­ma­tion--thick worlds be­cause of our mar­velous and every­day ca­pac­ity to se­lect, edit, sin­gle out, struc­ture, high­light, group, pair, merge, har­mo­nize, syn­the­size, fo­cus, or­ga­nize, con­dense, re­duce, boil down, choose, cat­e­go­rize, cat­a­log, clas­sify, list, ab­stract, scan, look into, ide­al­ize, iso­late, dis­crim­i­nate, dis­tin­guish, screen, pi­geon­hole, pick over, sort, in­te­grate, blend, in­spect, fil­ter, lump, skip, smooth, chunk, av­er­age, ap­prox­i­mate, clus­ter, ag­gre­gate, out­line, sum­ma­rize, item­ize, re­view, dip into, flip through, browse, glance into, leaf through, skim, re­fine, enu­mer­ate, glean, syn­op­size, win­now the wheat from the chaff and sep­a­rate the sheep from the goats. <div class="itemgroup itemize symbol-1 packed-yes level-1"><div class="item"><div class="itemtag"><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" display="inline"><m:mo></m:mo></m:math></div><div class="itemcontent">First</div></div><div class="item"><div class="itemtag"><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" display="inline"><m:mo></m:mo></m:math></div><div class="itemcontent">Sec­ond</div></div><div class="item"><div class="itemtag"><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" display="inline"><m:mo></m:mo></m:math></div><div class="itemcontent">Third</div></div><div class="item"><div class="itemtag"><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" display="inline"><m:mo></m:mo></m:math></div><div class="itemcontent">Fourth</div></div></div> </div>
     <div class="paragraph">Thus, I came to the con­clu­sion that the de­signer of a new sys­tem must not only be the im­ple­menter and first large--scale user; the de­signer should also write the first user man­ual.      <div class="break"><!--empty-->
</div>
The sep­a­ra­tion of any of these four com­po­nents would have hurt TEX sig­nif­i­cantly. If I had not par­tic­i­pated fully in all these ac­tiv­i­ties, lit­er­ally hun­dreds of im­prove­ments would never have been made, be­cause I would never have thought of them or per­ceived why they were im­por­tant.      <div class="break"><!--empty-->
</div>
But a sys­tem can­not be suc­cess­ful if it is too strongly in­flu­enced by a sin­gle per­son. Once the ini­tial de­sign is com­plete and fairly ro­bust, the real test be­gins as peo­ple with many dif­fer­ent view­points un­der­take their own ex­per­i­ments.</div>
     <div class="paragraph">Com­ing back to the use of type­faces in elec­tronic pub­lish­ing: many of the new ty­pog­ra­phers re­ceive their knowl­edge and in­for­ma­tion about the rules of ty­pog­ra­phy from books, from com­puter mag­a­zines or the in­struc­tion man­u­als which they get with the pur­chase of a PC or soft­ware. There is not so much ba­sic in­struc­tion, as of now, as there was in the old days, show­ing the dif­fer­ences be­tween good and bad ty­po­graphic de­sign. Many peo­ple are just fas­ci­nated by their PC’s tricks, and think that a widely--praised pro­gram, called up on the screen, will make every­thing au­to­matic from now on.</div>
    </div>
   </div>
  </div>
 </div>
 <div class="section chapter level-2">
  <div class="sectionnumber">2</div>
   <div class="sectiontitle">Quoth<div class="descriptionsymbol footnote"><div class="sup">1</div></div></div>
  <div class="sectioncontent">
   <div class="lines">
    <div class="line"><div class="delimited quotation-1">“Prophet!”</div> said I, <div class="delimited quotation-1">“thing of evil!—prophet still, if bird or devil!”</div><div class="line">By that Heaven that bends above us—by that God we both adore—</div><div class="line">Tell this soul with sor­row laden if, within the dis­tant Aidenn,</div><div class="line">It shall clasp a sainted maiden whom the an­gels name Lenore—</div><div class="line">Clasp a rare and ra­di­ant maiden whom the an­gels name Lenore.</div></div>
    <div class="line"><div class="highlight emph">Quoth the Raven <div class="delimited quotation-1">“Nev­er­more.”</div></div></div>
   </div>
   </div>
  <div class="description footnote">
   <div class="descriptiontag"><div class="sup">1</div> </div>
   <div class="descriptioncontent">by Edgar Al­lan Poe</div>
  </div>
 </div>
</div>
    </body>
</html>

(WORK IN PROGRESS)

Export options

From back-exp.mkiv:

\setupexport[
   align=\raggedstatus,
   bodyfont=\bodyfontsize,
   width=\textwidth,
   title={}, % from interaction
   subtitle={}, % from interaction
   author={}, % from interaction
   firstpage=, % imagename
   lastpage=,  % imagename
   alternative=, % html or div
   properties=no,
   hyphen=no,
   svgstyle=,
   cssfile=,
]
  • align, bodyfont and width end up in the exported CSS.
  • title, subtitle and author default to the values from \setupinteraction.
  • firstpage, lastpage: cover images (these end up only in pub.lua and are handled by the ePub script).
  • hyphen: yes/no; whether to include invisible hyphenation characters (soft hyphen, &shy;) at every possible break point.
  • alternative: html or div (influence on HTML export style? where?)
  • properties: no: ignore; yes: as attribute; otherwise: use as prefix (used where?)
  • svgstyle: maybe compression?
  • cssfile: file name of an additional CSS file.
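
As a rough sketch of how some of these options might be combined: the file names mybook.css and cover.jpg below are hypothetical placeholders, and the title and author values are only examples. The options whose effect is still unclear (alternative, properties, svgstyle) are left out on purpose.

```tex
\setupbackend[export=yes]

\setupexport[
   title={My first eBook},
   author={Hans},
   hyphen=no,           % no soft hyphens in the exported text
   cssfile=mybook.css,  % hypothetical additional stylesheet
   firstpage=cover.jpg, % hypothetical cover image, recorded in pub.lua
]
```

If title and author are left empty, the values from \setupinteraction are used instead, as noted in the list above.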

Suppressing Presentation Forms in Arabic Script Fonts

As is well known, Arabic script requires contextual analysis: a character takes a different form depending on its position within a word or other continuous string of characters. Certain pre-Unicode conventions encoded each of these forms separately for use in legacy typesetting applications. Those encodings are preserved in Unicode as the Arabic Presentation Forms-B block (U+FE70–U+FEFF). But this is a legacy encoding: these codepoints should never be used to prepare fresh Unicode documents. Rather, Arabic script characters should be encoded using codepoints from the standard Arabic block (U+0600–U+06FF). Contextual forms of these characters are selected during OpenType processing, but they do not have separate codepoints.

Unfortunately, certain Arabic fonts such as Linotype Lotus, a staple of the Middle Eastern publishing industry, give the contextual forms of standard Arabic-script characters Unicode names that correspond to codepoints from Arabic Presentation Forms-B. This saves space in the font, and for some purposes it is innocuous. But in ConTeXt processing, for example, many of the original, standard codepoints in the input are replaced by presentation-form codepoints in the output. For example:

مِّنَ السَّمَاءِ وَالْأَرْضِ

becomes

ﻣﱢﻦَ اﻟﺴﱠﻤَﺎءِ وَاﻷَْرْضِ

If you look carefully, you will see that there are errors in the output. (If you view both strings with the font almfixed in a text editor that supports Unicode and Arabic script, the differences are easy to see.)

When exporting to XML, this issue becomes a serious problem, because the exported text will not use the same codepoints as the input. To get around it, use something along the lines of the following in your preamble:

% private typescript that combines TeX-Gyre Termes for Latin with Linotype Lotus for Arabic
\usetypescriptfile[type-times-lotus]
\usetypescript[times-lotus]

% choose your desired features for pdf output
\definefontfeature
   [lotus-default]
   [mode=node,language=dflt,script=arab,
    init=yes,medi=yes,fina=yes,isol=yes,
    liga=no,rlig=yes,trep=yes,tlig=yes, 
    mark=yes,ccmp=yes]

% use this in export mode

\definefontfeature
   [lotus-default]
   [mode=none]

% setup the bodyfont last; the order is important!
   
\setupbodyfont[times-lotus,12pt]

So mode=none will suppress the contextual analysis, bypass the presentation-forms codepoints, and give the original input in the exported XML.
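
Because the second \definefontfeature overrides the first, the preamble above has to be edited by hand when switching between PDF and export runs. One possible way around that, sketched here under assumptions, is to wrap the two definitions in a user-defined mode. The mode name xmlexport is an arbitrary choice, not a built-in; it would be enabled from the command line with context --mode=xmlexport myfile.tex.

```tex
% private typescript, as in the example above
\usetypescriptfile[type-times-lotus]
\usetypescript[times-lotus]

\startnotmode[xmlexport]
  % normal PDF run: full contextual shaping
  \definefontfeature
     [lotus-default]
     [mode=node,language=dflt,script=arab,
      init=yes,medi=yes,fina=yes,isol=yes,
      liga=no,rlig=yes,trep=yes,tlig=yes,
      mark=yes,ccmp=yes]
\stopnotmode

\startmode[xmlexport]
  % export run: no shaping, so the original codepoints are kept
  \setupbackend[export=yes]
  \definefontfeature
     [lotus-default]
     [mode=none]
\stopmode

% set up the bodyfont last; the order is important
\setupbodyfont[times-lotus,12pt]
```

The PDF produced by the export run will presumably show unshaped Arabic, so in this setup it should be treated as a by-product and only the exported files used.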

More TODO

  • handling of images
  • which files get overwritten, which stay

Open Issues

as of 2014-01-20, updated 2015-08-08/-15 and 2017-11-11

  • FIXED Structure bug: Metadata ends up within the first section instead of in front of everything
  • FIXED Names of metavariables are missing in div.xhtml (could be better, but not that important)
  • Notes (footnotes): Only visual formatting, no semantic markup and no reference/ID
  • FIXED Delimited: Quotations have tagging and quotation marks, even in raw.xml (now marks have their own tags)
  • Firstpage/Lastpage (cover setup) is ignored in project structure
  • Minimal example doesn’t create a cover at all
  • \color leaves no trace in export.
  • Spacing characters like \, get lost.
  • Additional commas in export of register ranges (1,–,5 instead of 1–5).
  • There’s no marking of page breaks (ePub should have page break markers of the PDF for scientific quotability).

Workarounds

Problem: If you set up export, hyphens go missing.

Solution: Your font is missing a soft hyphen glyph at U+00AD. Workaround:

   \enabledirectives[otf.checksofthyphen]