Difference between revisions of "Aleph Guide"

(Idris wrote this to the ML; first save)
 
m (→‎Installing: link to existing mirror)
(21 intermediate revisions by 8 users not shown)
Line 1: Line 1:
 +
< [[Aleph]] | [[Arabic and Hebrew]] >
 +
 
=Aleph in ConTeXt: A Guide to the Perplexed=
 
=Aleph in ConTeXt: A Guide to the Perplexed=
  
 
(with apologies to Maimonides)
 
(with apologies to Maimonides)
  
by Prof. Idris Samawi Hamid
+
by Prof. [[User:Ishamid|Idris Samawi Hamid]]
 +
 
 +
__TOC__
  
 
==Introduction==
 
==Introduction==
  
Aleph is a typesetting engine derived from Omega and eTeX. Reasons for Aleph:
+
[[Aleph]] is a typesetting engine derived from Omega and eTeX. Reasons for Aleph:
  
 
# ConTeXt depends on the eTeX extensions, and even LaTeX now defaults to pdfeTeX.
 
# ConTeXt depends on the eTeX extensions, and even LaTeX now defaults to pdfeTeX.
Line 20: Line 24:
  
 
''Aleph'', inheriting from ''Omega'', provides many ready-to-go filters, using a Times Roman like font for Latin, Greek, and Cyrillic scripts.
 
''Aleph'', inheriting from ''Omega'', provides many ready-to-go filters, using a Times Roman like font for Latin, Greek, and Cyrillic scripts.
The ConTeXt module for this setup is called ''Gamma'' ([[source:m-gamma.tex]]); this is a port of the ''Lambda'' (i.e., LaTeX) style files to ConTeXt. The font typescript is called type-omg.
+
The ConTeXt module for this setup is called ''Gamma'' ([[source:m-gamma.tex|m-gamma.tex]]); this is a port of the ''Lambda'' (i.e., LaTeX) style files to ConTeXt. The font typescript is called type-omg.
  
 
==Installing==
 
==Installing==
Line 28: Line 32:
 
Users of MiKTeX and other OS's will need to adjust the following instructions to their own setups.
 
Users of MiKTeX and other OS's will need to adjust the following instructions to their own setups.
  
* Make sure you have a very recent version of ConTeXt that supports the engine path mechanism. This mechanism allows texexec to manage two, e.g., cont-en.fmt files at once, one in <code>texmf/web2c/aleph</code> and one in <code>texmf/web2c/pdfetex</code>
+
* Make sure you have a very recent version of ConTeXt that supports the engine path mechanism. This mechanism allows <tt>texexec</tt> to manage two, e.g., <tt>cont-en.fmt</tt> files at once, one in <code>texmf/web2c/aleph</code> and one in <code>texmf/web2c/pdfetex</code>
  
How recent, you ask? Just be safe and get the latest <tt>:-)</tt>
+
How recent, you ask? Just be safe and get the latest :-)
  
 
* Some configuration points:
 
* Some configuration points:
 +
** Make sure you have the following line in <tt>texexec.ini</tt> set to "true":<texcode>set  UseEnginePath      to  true</texcode>
 +
** In <tt>texmf-local\web2c\texmf.cnf</tt>, <tt>texmf-local\web2c\context.cnf</tt>, and <tt>texmf\web2c\texmf.cnf</tt> comment out the following line
 +
<texcode> extra_mem_bot.context    = 2000000</texcode>
 +
otherwise Aleph will crash under some conditions, like overfull boxes and the like...<br>The [[XeTeX]] developer found the source to this bug, and a fix; hopefully [[User:Oblomov|Giuseppe]] will get to it :-))
 +
* Get the omega support files [http://ftp.cvut.cz/tex-archive/obsolete/systems/win32/fptex/0.7/package/omega.zip omega.zip] and [http://ftp.cvut.cz/tex-archive/obsolete/systems/win32/fptex/0.7/package/omegafonts.zip omegafonts.zip].
 +
* Get rid of two directories from omega.zip (not really necessary but if you want to be efficient): <tt>texmf/eomega</tt> and <tt>texmf/omega/encodings</tt>
 +
* Put support files in <tt>texmf-local</tt>.
 +
* Replace the default uni2cuni.ocp and uni2cuni.otp with the ones in [[Media:Uni2cuni.zip]]. The archive contains two ocp's: uni2cuni.ocp and uni2cuni-math.ocp. uni2cuni-math is the equivalent of the old Omega uni2cuni.
 +
This makes r-l numeral labels with separators easier to handle. Always use something like $<numeral>$ or ${\tf <numeral>}$ for mathematics and decimal points. $<numeral>$ uses the default math font; ${\tf <numeral>}$ uses the digits from the main text font.
 +
* Compile the Aleph format: <texcode>mktexlsr<br>texexec --make en -tex=aleph</texcode>
 +
* Here is a test file. Note the preamble
 +
<texcode> %tex=aleph output=dvipdfmx</texcode>
  
a. Make sure you have the following line in
+
at the beginning of every Aleph file.
 
 
ConTeXt\tex\texmf-local\context\config\texexec.ini
 
 
 
set to "true", viz.,
 
 
 
set  UseEnginePath      to  true
 
 
 
b. In
 
 
 
texmf-local\web2c\texmf.cnf,
 
texmf-local\web2c\context.cnf, and
 
texmf\web2c\texmf.cnf,
 
 
 
comment this line as follows
 
 
 
%extra_mem_bot.context    = 2000000
 
 
 
otherwise aleph will crash under some conditions, like overfull boxes and the like... The XeTeX developer found the source to this bug, and a fix; hopefully Giuseppe will get to it-))
 
 
 
3. Get the omega support files:
 
 
 
http://www.ctan.org/get?fn=/systems/win32/fptex/0.7/package/omega.zip
 
http://www.ctan.org/get?fn=/systems/win32/fptex/0.7/package/omegafonts.zip
 
 
 
4. Get rid of the following directories from omega.zip (not really necessary but if u want to be
 
efficient):
 
 
 
texmf/eomega
 
texmf/omega/encodings
 
 
 
5. Put support files in texmf-local;
 
 
 
6. Compile the Aleph format:
 
 
 
mktexlsr
 
 
 
texexec --make en -tex=aleph
 
 
 
7. Here is a test file. Note the preamble
 
 
 
% tex=aleph output=dvipdfmx
 
 
 
at the beginning of every aleph file.
 
  
'''omarb.tex'''
+
===omarb.tex===
 
<texcode>
 
<texcode>
 
% tex=aleph output=dvipdfmx
 
% tex=aleph output=dvipdfmx
\input m-gamma.tex
+
\usemodule[gamma]
\input type-omg.tex
+
\input type-omg.tex % perhaps \usetypescriptfile[type-omg] ?
  
 
\setupbodyfont[omlgc,12pt]
 
\setupbodyfont[omlgc,12pt]
Line 156: Line 129:
 
</texcode>
 
</texcode>
  
8. For Arabic script you will probably want to use an encoding that supports direct Arabic-script editing. There are three: utf-8, iso-8859-6 (apple-unix), and cp1256 (micro$oft). We can define the following, using ConTeXt macros for managing filter sequences. Maybe I will add these to m-gamma and ask Hans to distribute. In the meantime, here are some definitions, samples of all three encodings, and an example of mixed lr-rl text:
+
* For Arabic script you will probably want to use an [[Encodings_and_Regimes|encoding]] that supports direct Arabic-script editing. There are three: utf-8, iso-8859-6 (Apple/Unix), and cp1256 (Microsoft). We can define the following, using ConTeXt macros for managing filter sequences. Maybe I will add these to [[source:m-gamma-tex|m-gamma]] and ask [[User:Hagen|Hans]] to distribute. In the meantime, here are some definitions, samples of all three encodings, and an example of mixed lr-rl text:
  
'''m-arabic-enc.tex'''
+
===m-arabic-enc.tex===
 
<texcode>
 
<texcode>
 
% tex=aleph output=dvipdfmx
 
% tex=aleph output=dvipdfmx
%\input m-gamma.tex
+
%\usemodule[gamma]
\input type-omg.tex
 
 
\usetypescriptfile[type-omg]
 
\usetypescriptfile[type-omg]
 
\usetypescript[OmegaArab]
 
\usetypescript[OmegaArab]
Line 200: Line 172:
 
% For inner paragraph control within an LR paragraph
 
% For inner paragraph control within an LR paragraph
  
\def\ArabicTextUTF#1{{\textdir TRT\usefiltersequence[UTFArabic]%
+
\definestartstop
                    \switchtobodyfont[omarb]#1\textdir TLT
+
  [arabictextutf]
                    \clearocplists}}
+
  [commands=%
 +
    {\textdir TRT%
 +
    \switchtobodyfont[omarb]%
 +
    \usefiltersequence[UTFArabic]}]
  
\def\ArabicTextISO#1{{\textdir TRT\usefiltersequence[ISOArabic]%
+
\definestartstop
                    \switchtobodyfont[omarb]#1\textdir TLT
+
  [arabictextiso]
                    \clearocplists}}
+
  [commands=%
 +
    {\textdir TRT%
 +
    \switchtobodyfont[omarb]%
 +
    \usefiltersequence[ISOArabic]}]
  
\def\ArabicTextWIN#1{{\textdir TRT\usefiltersequence[WINFArabic]%
+
\definestartstop
                    \switchtobodyfont[omarb]#1\textdir TLT
+
  [arabictextwin]
                    \clearocplists}}
+
  [commands=%
 +
    {\textdir TRT%
 +
    \switchtobodyfont[omarb]%
 +
    \usefiltersequence[WINArabic]}]
 +
 
 +
\def\ArabicTextUTF#1{\startarabictextutf#1\stoparabictextutf}
 +
 
 +
\def\ArabicTextISO#1{\startarabictextiso#1\stoparabictextiso}
 +
 
 +
\def\ArabicTextWIN#1{\startarabictextwin#1\stoparabictextwin}
  
 
% For global Arabic script
 
% For global Arabic script
Line 258: Line 245:
 
\startarabutf
 
\startarabutf
  
اللَّهُمَّ صَلِّ عَلَى مُحَمَّدٍ وَ
+
اللَّهُمَّ صَلِّ عَلَى مُحَمَّدٍ وَ
آلِ مُحَمَّدٍ وَ ارْزُقْنِي
+
آلِ مُحَمَّدٍ وَ ارْزُقْنِي
الْيَقِينَ وَ حُسْنَ الظَّنِّ بِكَ
+
الْيَقِينَ وَ حُسْنَ الظَّنِّ بِكَ
وَ أَثْبِتْ رَجَاءَكَ فِي قَلْبِي
+
وَ أَثْبِتْ رَجَاءَكَ فِي قَلْبِي
وَ اقْطَعْ رَجَائِي عَمَّنْ سِوَاكَ
+
وَ اقْطَعْ رَجَائِي عَمَّنْ سِوَاكَ
حَتَّى لَا أَرْجُوَ غَيْرَكَ وَ لَا
+
حَتَّى لَا أَرْجُوَ غَيْرَكَ وَ لَا
أَثِقَ إِلَّا بِك‏
+
أَثِقَ إِلَّا بِك‏
  
 
\stoparabutf
 
\stoparabutf
Line 292: Line 279:
 
\blank
 
\blank
  
Here is some mixed {\em Arabic-} (\ArabicTextUTF{عربي}) and
+
Here is some mixed {\em Arabic-} (\ArabicTextUTF{عربي}) and
 
Latin-script. As you can see, Aleph does a very good job mixing
 
Latin-script. As you can see, Aleph does a very good job mixing
{\em LR} (\ArabicTextUTF{يسار-يمين}) and {\em RL}
+
{\em LR} (\ArabicTextUTF{يسار-يمين}) and {\em RL}
(\ArabicTextUTF{يمين-يسار}) texts. \ArabicTextUTF{و
+
(\ArabicTextUTF{يمين-يسار}) texts. \ArabicTextUTF{و
هنا جملة منقطعة في وسط قرينة
+
هنا جملة منقطعة في وسط قرينة
لاتينية}. Aleph even does a great job breaking Arabic
+
لاتينية}. Aleph even does a great job breaking Arabic
 
phrases across lines.
 
phrases across lines.
  
Line 305: Line 292:
 
==Going beyond==
 
==Going beyond==
  
The last example shows how to make and apply your own filter sequences beyond the basic Gamma module. To go further u need to learn some low-level business. You will also need some working utilities. I have put together a windows package that you can unzip to C:\ConTeXt. These utilities do work, but they are cobbled together from old fpTeX and MiKTeX versions. just place the tree in C:\ConTeXt\
+
* [[File:Utilities-aleph.zip|utilities-aleph.zip]]
  
1. Example: If you want to get the final Persian kaaf instead of the default Arabic one:
+
The last example shows how to make and apply your own filter sequences beyond the basic Gamma module. To go further you need to learn some low-level business. You will also need some [[File:Utilities-aleph.zip|working utilities]]. I have put together a windows package that you can unzip to <tt>C:\ConTeXt</tt>. These utilities do work, but they are cobbled together from old fpTeX and MiKTeX versions.
 +
 
 +
===Example: If you want to get the final Persian kaaf instead of the default Arabic one===
  
 
Check to see if your glyph is in the Arabic font. The Arabic font is made of 6 raw fonts: 3 regular and three bold:
 
Check to see if your glyph is in the Arabic font. The Arabic font is made of 6 raw fonts: 3 regular and three bold:
 
+
  texmf-local/fonts/type1/public/omega
C:\ConTeXt\tex\texmf-local\fonts\type1\public\omega
+
  omsea1, omsea1b, ... omsea3b
 
 
omsea1, omsea1b,...omsea3b
 
  
 
Using a font viewer or editor you will find the Persian final kaaf in omsea2, named kafswashfin.
 
Using a font viewer or editor you will find the Persian final kaaf in omsea2, named kafswashfin.
  
 
Now go to
 
Now go to
 
+
<tt>texmf-local/omega/lambda/misc</tt>
C:\ConTeXt\tex\texmf-local\omega\lambda\misc
 
 
 
 
and open
 
and open
 
+
<tt>omarab.cfg</tt>
omarab.cfg
 
 
 
 
you will find a line
 
you will find a line
  
04AA N kafswashfin
+
  04AA N kafswashfin
  
 
This means that 04AA is the virtual font position for kafswashfin. Open cuni2oar.otp and add the following at line 263:
 
This means that 04AA is the virtual font position for kafswashfin. Open cuni2oar.otp and add the following at line 263:
  
%@"E343 => @"04AA;
+
  %@"E343 => @"04AA;
  
 
Following this line you should see
 
Following this line you should see
  
% remaining Arabic glyphs
+
  % remaining Arabic glyphs
@"E000-@"E3FF => #(\1 - @"DF00);
+
  @"E000-@"E3FF => #(\1 - @"DF00);
  
Basically, in uni2cuni.otp final-kaaf gets mapped to E343. In the font, we want it mapped to kafswashfin, so we did that. Now recompile the otp:
+
Basically, in <tt>uni2cuni.otp</tt> final-kaaf gets mapped to E343. In the font, we want it mapped to kafswashfin, so we did that. Now recompile the otp:
  
otp2ocp cuni2oar
+
  otp2ocp cuni2oar
  
 
Now you will get kafswashfin for the final kaaf.
 
Now you will get kafswashfin for the final kaaf.
  
2. Want new fonts (Arabic or Latin). Here are the instructions:
+
===Want new fonts (Arabic or Latin)? Here are the instructions===
  
1. Read the following two papers carefully again and again; they are
+
* Read the following two papers carefully again and again; they are your friends :-)
  your friends:-)
+
** [http://omega.enstb.org/papers/tsukuba-methods97.pdf tsukuba-methods]
 +
** [http://omega.enstb.org/papers/ridt-omega98.pdf ridt-omega]
 +
* Make a pfb file containing the glyphs you need, or use an existing font
 +
* Make a cfg file a la <tt>texmf\omega\lambda\misc\omlgc.cfg</tt>. Make sure you list your glyph positions in hexadecimal notation.
 +
* Get <tt>makeovp.pl</tt> from the utilities. I made some small changes, <tt>makeovp2.pl</tt>. Try them both and use what works for you. There is a SH file with a sample of its use using omlgc.
 +
* Following are instructions for cooking omarab.ovf.
 +
** You want your own ovf, say, omlgcch.ovf.
 +
** Generate an afm file for your private glyph pfb/pfa plus the afm files that are listed in the SH file (base files for omlgc found in <tt>\texmf\fonts\afm\public\omega</tt>)
 +
** Using the instructions below and the SH file ('''ignore''' the kernings.afm file!) you can figure out how to make your own ovp and ovf.
 +
** Before making the ovf file, examine the ovp file created, especially the first few lines, to see how the font-metric info from the afm's are concatenated. Very instructive.
 +
* Don't forget the rest of the accounting:
 +
** adding lines to a map file and pointing dvips/dvipdfm to it;
 +
** create a typescript file;
 +
** edit your otp's. If you get stuck be sure to read [http://omega.enstb.org/papers/tsukuba-arabic97.pdf tsukuba-arabic]
  
http://omega.enstb.org/papers/tsukuba-methods97.pdf
+
====How to cook omarab.ovf====
http://omega.enstb.org/papers/ridt-omega98.pdf
+
Ingredients: omarab.cfg, omseco.afm, omsea1.afm, omsea2.afm, omsea3.afm
  
2. Make a pfb file containing the glyphs you need, or use an existing font
+
  > perl makeovp.pl omarab.cfg omseco.afm omsea1.afm omsea2.afm omsea3.afm omarab.ovp
 +
  > pltotf omseco.pl omseco.tfm
 +
  > pltotf omsea1.pl omsea1.tfm
 +
  > pltotf omsea2.pl omsea2.tfm
 +
  > pltotf omsea3.pl omsea3.tfm
 +
  > ovp2ovf omarab.ovp omarab.ovf omarab.ofm
  
3.Make a cfg file a la texmf\omega\lambda\misc\omlgc.cfg Make sure u list your glyph
+
If the last line does not work, try
  positions in hexadecimal notation.
 
  
5. Get the following from an old TeXLive distro: \support\makeovp.zip,
+
  > ovp2ovf omarab.ovp omarab.ovf omarab.tfm
  containing makeovp.pl.  There is a SH file with a sample of its use
 
  using omlgc.
 
  
4. Following are instructions for cooking omarab.ovf. You want your
+
rename omarab.tfm to omarab.ofm --> ofm directory
  own ovf, say, omlgcch.ovf (<ch> for <cherokee>). Generate an afm
 
  file for your private glyph pfb/pfa plus the afm files that are
 
  listed in the SH file (base files for omlgc found in
 
  \texmf\fonts\afm\public\omega )
 
  
Using the instructions below and the SH file (IGNORE the kernings.afm
+
====How to distill omarab.ovp from omarab.ovf:====
file!) you can figure out how to make your own ovp and ovf. Before
+
Use a different directory or a different name for the output ovp so that omarab.ovp created above is not overwritten.
making the ovf file, examine the ovp file created, especially the
 
first few lines, to see how the font-metric info from the afm's are
 
concatenated. Very instructive.
 
  
6. Don't forget the rest of the accounting:
+
get omarab.ofm & rename to omarab.tfm
  
a) adding lines to a map file and pointing dvips/dvipdfm to it;
+
  > ovf2ovp omarab.ovf omarab.tfm omarab.ovp
b) create a typescript file;
 
c) edit your otp's. If u get stuck be sure to read
 
  
http://omega.enstb.org/papers/tsukuba-arabic97.pdf
+
====How to cook omarabb.ovf====
 +
Ingredients: omarab.cfg, omsecob.afm, omsea1b.afm, omsea2b.afm, omsea3b.afm
  
<code>
+
  > perl makeovp.pl omarab.cfg omsecob.afm omsea1b.afm omsea2b.afm omsea3b.afm omarabb.ovp
[How to cook omarab.ovf:]
+
  > pltotf omsecob.pl omsecob.tfm
[Ingredients: omarab.cfg, omseco.afm, omsea1.afm, omsea2.afm, omsea3.afm]
+
  > pltotf omsea1b.pl omsea1b.tfm
 +
  > pltotf omsea2b.pl omsea2b.tfm
 +
  > pltotf omsea3b.pl omsea3b.tfm
 +
  > ovp2ovf omarabb.ovp omarabb.ovf omarabb.ofm
  
#perl makeovp.pl omarab.cfg omseco.afm omsea1.afm omsea2.afm omsea3.afm omarab.ovp
 
#pltotf omseco.pl omseco.tfm
 
#pltotf omsea1.pl omsea1.tfm
 
#pltotf omsea2.pl omsea2.tfm
 
#pltotf omsea3.pl omsea3.tfm
 
#ovp2ovf omarab.ovp omarab.ovf omarab.ofm
 
  
[If the last line does not work, try
+
If the last line does not work, try
#ovp2ovf omarab.ovp omarab.ovf omarab.tfm
 
rename omarab.tfm to omarab.ofm ===> ofm directory]
 
  
-----------------------------
+
  > ovp2ovf omarabb.ovp omarabb.ovf omarabb.tfm
[How to distill omarab.ovp from omarab.ovf:]
 
  
[Use a different directory or a different name for
+
rename omarab.tfm to omarab.ofm --> ofm directory
the output ovp so that omarab.ovp created above is not overwritten]
 
  
[get omarab.ofm & rename to omarab.tfm]
+
====How to distill omarabb.ovp from omarabb.ovf====
 +
Use a different directory or a different name for the output ovp so that omarabb.ovp created above is not overwritten
  
#ovf2ovp omarab.ovf omarab.tfm omarab.ovp
+
get omarab.ofm & rename to omarab.tfm
=========================================================
 
[How to cook omarabb.ovf:]
 
[Ingredients: omarab.cfg, omsecob.afm, omsea1b.afm, omsea2b.afm, omsea3b.afm]
 
  
#perl makeovp.pl omarab.cfg omsecob.afm omsea1b.afm omsea2b.afm omsea3b.afm omarabb.ovp
+
  > ovf2ovp omarabb.ovf omarabb.tfm omarabb.ovp
#pltotf omsecob.pl omsecob.tfm
 
#pltotf omsea1b.pl omsea1b.tfm
 
#pltotf omsea2b.pl omsea2b.tfm
 
#pltotf omsea3b.pl omsea3b.tfm
 
#ovp2ovf omarabb.ovp omarabb.ovf omarabb.ofm
 
  
[If the last line does not work, try
+
* For more info, there is also the (mostly cryptic) Omega manual [http://omega.enstb.org/roadmap/doc-1.12.ps in PS-Format]. Don't ask me why it's not in PDF. <tt>:-(</tt>
#ovp2ovf omarabb.ovp omarabb.ovf omarabb.tfm
+
See also [http://omega.enstb.org/papers/tsukuba-arabic97.pdf tsukuba-arabic]
rename omarab.tfm to omarab.ofm ===> ofm directory]
 
-----------------------------
 
[How to distill omarabb.ovp from omarabb.ovf:]
 
 
 
[Use a different directory or a different name for
 
the output ovp so that omarabb.ovp created above is not overwritten]
 
 
 
[get omarab.ofm & rename to omarab.tfm]
 
 
 
#ovf2ovp omarabb.ovf omarabb.tfm omarabb.ovp
 
</code>
 
 
 
3. For more info, there is also the (mostly cryptic) Omega manual [http://omega.enstb.org/roadmap/doc-1.12.ps in PS-Format]. Don't ask me why it's not in PDF. <tt>:-(</tt>
 
 
 
See also http://omega.enstb.org/papers/tsukuba-arabic97.pdf
 
  
 
==Miscellaneous==
 
==Miscellaneous==
  
# Some people have gotten large OpenType fonts to work in Aleph/Omega. Probably they used FontForge to convert to CFF-enriched type1. FF can produce ofm files (large tfms) so that's a help too.
+
* Some people have gotten large OpenType fonts to work in Aleph/Omega. Probably they used [http://fontforge.sourceforge.net FontForge] to convert to CFF-enriched type1. FF can produce ofm files (large tfms) so that's a help too.
# Me, I'm working on an advanced Arabic-script typesetting system that really pushes Aleph to the max. At present I don't actually use m-gamma, etc, but my own macros. I really hope to release something this year...
+
* Me, I'm working on an advanced Arabic-script typesetting system that really pushes Aleph to the max. At present I don't actually use [[source:m-gamma.tex|m-gamma]], etc, but my own macros. I really hope to release something this year...
#See also http://www.dtek.chalmers.se/~d97ost/omega-example.html
+
* See also [http://www.dtek.chalmers.se/~d97ost/omega-example.html Omega example]
  
 
==To the future==
 
==To the future==
  
# The otp mechanism does not seem well suited to support, e.g., opentype GPOS tables, important for really advanced Arabic (though GDEF and GSUB should work fine with the present mechanism for most purposes). We need a better model for horizontal and vertical glyph substitutions.
+
* The otp mechanism does not seem well suited to support, e.g., OpenType GPOS tables, important for really advanced Arabic (though GDEF and GSUB should work fine with the present mechanism for most purposes). We need a better model for horizontal and vertical glyph substitutions.
# The low-level filtersequence mechanism needs to abstract language processing from font mapping. Right now both are hardwired into a single sequence, so setting up more than one font for a single language is more of a pain than it should be.
+
* The low-level filtersequence mechanism needs to abstract language processing from font mapping. Right now both are hardwired into a single sequence, so setting up more than one font for a single language is more of a pain than it should be.
# The otp language is a bit cryptic. Hans has suggested switching otp's to a new language (like lua or io) but I don't know how hard that will be...
+
* The otp language is a bit cryptic. Hans has suggested switching otp's to a new language (like lua or io) but I don't know how hard that will be...
# One very important feature which may work better at the primitive/engine level by extending the pdfetex engine:
+
* One very important feature which may work better at the primitive/engine level by extending the pdfeTeX engine: glyph substitution that depends on the paragraph. For example: In traditional Arabic typography, one way to compensate for "underfull" paragraphs is to substitute a "swash" version of a letter. Another way is by stretching the cursive tie between joining characters (which is already implemented in my own Arabic system). Combined with HZ we can get some pretty interesting high-level options, effects, etc. that the user can choose etc.
glyph substitution that depends on the paragraph. For example: In traditional Arabic typography, one way to
+
 
compensate for "underfull" paragraphs is to substitute a "swash" version of a letter. Another way is by
+
{{Installation navbox}}
stretching the cursive tie between joining characters (which is already implemented in my own Arabic
 
system). Combined with HZ we can get some pretty interesting high-level options, effects, etc. that the user can choose etc.
 

Revision as of 00:12, 15 April 2013

Aleph in ConTeXt: A Guide to the Perplexed

(with apologies to Maimonides)

by Prof. Idris Samawi Hamid

Introduction

Aleph is a typesetting engine derived from Omega and eTeX. Reasons for Aleph:

  1. ConTeXt depends on the eTeX extensions, and even LaTeX now defaults to pdfeTeX.
  2. Omega provides a nice foundation for multilingual typesetting with large (>256) character sets, including large virtual fonts, but a stable, dependable version has not been a priority with its developers.
    1. In particular, the RL-LR code works excellently for the most part (minor bugs, easy to work around).
    2. Omega 1.15 was the last relatively stable bugfix version, as far as usability is concerned.
  3. Some users need a dependable LR-RL TeX engine now.

Aleph weds Omega 1.15 and eTeX 201, removes some extraneous stuff, and fixes a few bugs. I use it for production purposes. It uses dvipdfmx for pdf production, and can take advantage of most of ConTeXt's capabilities. Giuseppe Bilotta has done virtually all of the development work.

In addition to large character sets, Aleph inherits the filter sequence mechanism for script processing (extension ocp, compiled from text-editable otp). So you can map whatever input encoding you like to whatever output font encoding you like. The mechanism is powerful enough to do contextual analysis of Arabic script, for example, but not powerful enough for things like vertical glyph positioning for cursive scripts.

Aleph, inheriting from Omega, provides many ready-to-go filters, using a Times Roman-like font for Latin, Greek, and Cyrillic scripts. The ConTeXt module for this setup is called Gamma (m-gamma.tex); this is a port of the Lambda (i.e., LaTeX) style files to ConTeXt. The font typescript is called type-omg.

Installing

This install is based on the stand-alone ConTeXt for Win32 package.

Users of MiKTeX and other OS's will need to adjust the following instructions to their own setups.

  • Make sure you have a very recent version of ConTeXt that supports the engine path mechanism. This mechanism allows texexec to manage two, e.g., cont-en.fmt files at once, one in texmf/web2c/aleph and one in texmf/web2c/pdfetex

How recent, you ask? Just be safe and get the latest :-)

  • Some configuration points:
    • Make sure you have the following line in texexec.ini set to "true":
      set  UseEnginePath      to  true
    • In texmf-local\web2c\texmf.cnf, texmf-local\web2c\context.cnf, and texmf\web2c\texmf.cnf comment out the following line
 extra_mem_bot.context    = 2000000

otherwise Aleph will crash under some conditions, like overfull boxes and the like...
The XeTeX developer found the source to this bug, and a fix; hopefully Giuseppe will get to it :-))
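
The configuration change above can also be scripted. The following is only a sketch, assuming a POSIX shell with sed is available (for example under Cygwin or on a Unix-like system). The function name comment_out_extra_mem_bot is hypothetical and not part of any distribution, and the forward-slash paths in the usage comment simply mirror the three files named above.

```shell
#!/bin/sh
# comment_out_extra_mem_bot FILE   (hypothetical helper, not a ConTeXt tool)
# Prefix the extra_mem_bot.context line in FILE with '%', the comment
# character used in texmf.cnf. A line that is already commented out does
# not match the pattern, so running the function twice is harmless.
comment_out_extra_mem_bot() {
    cnf=$1
    sed 's/^\([[:space:]]*\)\(extra_mem_bot\.context[[:space:]]*=\)/\1%\2/' "$cnf" > "$cnf.tmp" &&
        mv "$cnf.tmp" "$cnf"
}

# Usage sketch (adjust the paths to your own tree):
# for f in texmf-local/web2c/texmf.cnf \
#          texmf-local/web2c/context.cnf \
#          texmf/web2c/texmf.cnf; do
#     comment_out_extra_mem_bot "$f"
# done
```

Writing to a temporary file and moving it into place avoids relying on sed's in-place option, which is not available in every sed implementation. Keep a backup of each file before editing it.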

  • Get the omega support files omega.zip and omegafonts.zip.
  • Get rid of two directories from omega.zip (not really necessary but if you want to be efficient): texmf/eomega and texmf/omega/encodings
  • Put support files in texmf-local.
  • Replace the default uni2cuni.ocp and uni2cuni.otp with the ones in Media:Uni2cuni.zip. The archive contains two ocp's: uni2cuni.ocp and uni2cuni-math.ocp. uni2cuni-math is the equivalent of the old Omega uni2cuni.

This makes r-l numeral labels with separators easier to handle. Always use something like $<numeral>$ or ${\tf <numeral>}$ for mathematics and decimal points. $<numeral>$ uses the default math font; ${\tf <numeral>}$ uses the digits from the main text font.

  • Compile the Aleph format:
    mktexlsr
    texexec --make en -tex=aleph
  • Here is a test file. Note the preamble
 %tex=aleph output=dvipdfmx

at the beginning of every Aleph file.

omarb.tex

% tex=aleph output=dvipdfmx
\usemodule[gamma]
\input type-omg.tex % perhaps \usetypescriptfile[type-omg] ?

\setupbodyfont[omlgc,12pt]

\starttext

\startlatin

This is a test

\bf This is a test

\stoplatin

\startgreek

A B G D a b g d

{\bf A B G D a b g d}

\stopgreek

\startarab

`rby:

A b t th j H kh

{\bf \ A b t th j H kh}

fArsy:

A b p t th j ch H kh

{\bf A b p t th j ch H kh}

\starturdu

ArdU:

A b p t 't th j ch H kh

{\bf A b p t 't th j ch H kh}

\stopurdu

\blank

\tfc

`rby:

bsm ALLah Al-rrHmn Al-rrHym

fArsy:

bh nAm khdAwnd b-kh-sh-nde mhrbAn

\starturdu

\tfc

ArdU:

ALLah kE nAm sE jw rHmAn w rHym hE

\stopurdu

\stoparab

\stoptext
  • For Arabic script you will probably want to use an encoding that supports direct Arabic-script editing. There are three: utf-8, iso-8859-6 (Apple/Unix), and cp1256 (Microsoft). We can define the following, using ConTeXt macros for managing filter sequences. Maybe I will add these to m-gamma and ask Hans to distribute. In the meantime, here are some definitions, samples of all three encodings, and an example of mixed lr-rl text:

m-arabic-enc.tex

% tex=aleph output=dvipdfmx
%\usemodule[gamma]
\usetypescriptfile[type-omg]
\usetypescript[OmegaArab]

\hoffset=0pt

%% Individual Filters

% Input filters (from what you type)

\definefiltersynonym [UTF8]      [inutf8]
\definefiltersynonym [ISO8859-6] [in88596]
\definefiltersynonym [CP1256]    [incp1256]

% Contextual filter

\definefiltersynonym [UniCUni]            [uni2cuni]

% Output filters (font mapping)

\definefiltersynonym [CUniArab]           [cuni2oar]

%% Filter Sequences

\definefiltersequence
  [UTFArabic]
  [UTF8,UniCUni,CUniArab]

\definefiltersequence
  [ISOArabic]
  [ISO8859-6,UniCUni,CUniArab]

\definefiltersequence
  [WINArabic]
  [CP1256,UniCUni,CUniArab]

% For inner paragraph control within an LR paragraph

\definestartstop
  [arabictextutf]
  [commands=%
    {\textdir TRT%
    \switchtobodyfont[omarb]%
    \usefiltersequence[UTFArabic]}]

\definestartstop
  [arabictextiso]
  [commands=%
    {\textdir TRT%
    \switchtobodyfont[omarb]%
    \usefiltersequence[ISOArabic]}]

\definestartstop
  [arabictextwin]
  [commands=%
    {\textdir TRT%
    \switchtobodyfont[omarb]%
    \usefiltersequence[WINArabic]}]

\def\ArabicTextUTF#1{\startarabictextutf#1\stoparabictextutf}

\def\ArabicTextISO#1{\startarabictextiso#1\stoparabictextiso}

\def\ArabicTextWIN#1{\startarabictextwin#1\stoparabictextwin}

% For global Arabic script

\def\ArabicDirGlobal{%
\pagedir TRT\bodydir TRT\textdir TRT\pardir TRT %
\hoffset=-8.88cm} % compensate for a bug in \bodydir TRT

\def\ArabicUTF{\ArabicDirGlobal\usefiltersequence[UTFArabic]
                \switchtobodyfont[omarb]}

\def\ArabicISO{\ArabicDirGlobal\usefiltersequence[ISOArabic]
                \switchtobodyfont[omarb]}

\def\ArabicWIN{\ArabicDirGlobal\usefiltersequence[WINArabic]
                \switchtobodyfont[omarb]}

% For separate Arabic-script paragraphs

\def\ArabicDirPar{\textdir TRT\pardir TRT}

\definestartstop
  [arabutf]
  [commands=%
    {\usefiltersequence[UTFArabic]
     \switchtobodyfont[omarb]%
     \ArabicDirPar}]

\definestartstop
  [arabiso]
  [commands=%
    {\usefiltersequence[ISOArabic]
     \switchtobodyfont[omarb]%
     \ArabicDirPar}]

\definestartstop
  [arabwin]
  [commands=%
    {\usefiltersequence[WINArabic]
     \switchtobodyfont[omarb]%
     \ArabicDirPar}]

\showframe[text]

\starttext

\startarabutf

اللَّهُمَّ صَلِّ عَلَى مُحَمَّدٍ وَ
آلِ مُحَمَّدٍ وَ ارْزُقْنِي
الْيَقِينَ وَ حُسْنَ الظَّنِّ بِكَ
وَ أَثْبِتْ رَجَاءَكَ فِي قَلْبِي
وَ اقْطَعْ رَجَائِي عَمَّنْ سِوَاكَ
حَتَّى لَا أَرْجُوَ غَيْرَكَ وَ لَا
أَثِقَ إِلَّا بِك‏

\stoparabutf

\blank

\startarabiso

Çääñîçïåñî Õîäñð Ùîäîé åïÍîåñîÏí èî Âäð åïÍîåñîÏí èî ÇÑòÒïâòæðê
Çäòêîâðêæî èî ÍïÓòæî ÇäØñîæñð Èðãî èî ÃîËòÈðÊò ÑîÌîÇÁîãî áðê
âîäòÈðê èî Çâò×îÙò ÑîÌîÇÆðê Ùîåñîæò ÓðèîÇãî ÍîÊñîé äîÇ ÃîÑòÌïèî
ÚîêòÑîãî èî äîÇ ÃîËðâî ÅðäñîÇ Èðã

\stoparabiso

\blank

\startarabwin

Çááøóåõãøó Õóáøö Úóáóì ãõÍóãøóÏò æó Âáö ãõÍóãøóÏò æó ÇÑúÒõÞúäöí
ÇáúíóÞöíäó æó ÍõÓúäó ÇáÙøóäøö Èößó æó ÃóËúÈöÊú ÑóÌóÇÁóßó Ýöí
ÞóáúÈöí æó ÇÞúØóÚú ÑóÌóÇÆöí Úóãøóäú ÓöæóÇßó ÍóÊøóì áóÇ ÃóÑúÌõæó
ÛóíúÑóßó æó áóÇ ÃóËöÞó ÅöáøóÇ Èößþ

\stoparabwin

\blank

Here is some mixed {\em Arabic-} (\ArabicTextUTF{عربي}) and
Latin-script. As you can see, Aleph does a very good job mixing
{\em LR} (\ArabicTextUTF{يسار-يمين}) and {\em RL}
(\ArabicTextUTF{يمين-يسار}) texts. \ArabicTextUTF{و
هنا جملة منقطعة في وسط قرينة
لاتينية}. Aleph even does a great job breaking Arabic
phrases across lines.

\stoptext

Going beyond

The last example shows how to make and apply your own filter sequences beyond the basic Gamma module. To go further you need to learn some low-level business. You will also need some working utilities (File:Utilities-aleph.zip). I have put together a Windows package that you can unzip to C:\ConTeXt. These utilities do work, but they are cobbled together from old fpTeX and MiKTeX versions.

Example: If you want to get the final Persian kaaf instead of the default Arabic one

Check to see if your glyph is in the Arabic font. The Arabic font is made of six raw fonts, three regular and three bold:

 texmf-local/fonts/type1/public/omega
 omsea1, omsea1b, ... omsea3b

Using a font viewer or editor you will find the Persian final kaaf in omsea2, named kafswashfin.
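
If no font viewer is at hand, the same check can be sketched on the command line. This assumes afm files for the raw fonts are available and follow the usual AFM layout, where each character line carries an "N glyphname" field; find_glyph is a hypothetical helper name, not part of the utilities package.

```shell
#!/bin/sh
# find_glyph GLYPHNAME FILE.afm...
# Prints the names of the afm files whose character metrics
# contain an "N GLYPHNAME ;" field (assumed standard AFM layout).
find_glyph() {
  glyph=$1
  shift
  grep -lE ";[[:space:]]*N[[:space:]]+${glyph}[[:space:]]*;" "$@"
}
```

Called as find_glyph kafswashfin omsea1.afm omsea2.afm omsea3.afm, it lists every file that defines the glyph; the pattern requires the full name, so a search for kaf does not also match kafswashfin.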

Now go to texmf-local/omega/lambda/misc and open omarab.cfg. You will find the line

 04AA N kafswashfin

This means that 04AA is the virtual font position for kafswashfin. Open cuni2oar.otp and add the following at line 263:

 %@"E343 => @"04AA;

Following this line you should see

 % remaining Arabic glyphs
 @"E000-@"E3FF => #(\1 - @"DF00);

Basically, in uni2cuni.otp final-kaaf gets mapped to E343. In the font, we want it mapped to kafswashfin, so we did that. Now recompile the otp:

 otp2ocp cuni2oar

Now you will get kafswashfin for the final kaaf.
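
To see why the extra rule is needed, it helps to work out what the catch-all rule alone does with E343. The sketch below just repeats its arithmetic (input code minus DF00); default_slot is a hypothetical helper name, and the reading that an earlier rule takes precedence over the catch-all is an assumption based on where the new line is placed.

```shell
#!/bin/sh
# default_slot HEX
# Reproduces the arithmetic of the catch-all rule
#   @"E000-@"E3FF => #(\1 - @"DF00);
# for one input code given in hex without a prefix.
default_slot() {
  printf '%04X\n' $(( 0x$1 - 0xDF00 ))
}

default_slot E343   # prints 0443
```

So on its own the catch-all rule sends final kaaf to 0443, not to 04AA where kafswashfin lives.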

Want new fonts (Arabic or Latin)? Here are the instructions:

  • Read the following two papers carefully again and again; they are your friends :-)
  • Make a pfb file containing the glyphs you need, or use an existing font
  • Make a cfg file a la texmf\omega\lambda\misc\omlgc.cfg. Make sure you list your glyph positions in hexadecimal notation.
  • Get makeovp.pl from the utilities. I made some small changes in makeovp2.pl. Try them both and use what works for you. There is an SH file showing a sample invocation for omlgc.
  • Following are instructions for cooking omarab.ovf.
    • You want your own ovf, say, omlgcch.ovf.
    • Generate an afm file for your private glyph pfb/pfa plus the afm files that are listed in the SH file (base files for omlgc found in \texmf\fonts\afm\public\omega)
    • Using the instructions below and the SH file (ignore the kernings.afm file!) you can figure out how to make your own ovp and ovf.
    • Before making the ovf file, examine the ovp file created, especially the first few lines, to see how the font-metric info from the afm's is concatenated. Very instructive.
  • Don't forget the rest of the accounting:
    • adding lines to a map file and pointing dvips/dvipdfm to it;
    • create a typescript file;
    • edit your otp's. If you get stuck, be sure to read tsukuba-arabic.
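
For the cfg step, font editors often report glyph slots in decimal, while the cfg file wants hexadecimal. A minimal sketch of the conversion follows; cfg_line is a hypothetical helper, and the output shape is modeled only on the single omarab.cfg line quoted earlier (04AA N kafswashfin), so any other fields the real files carry are not covered.

```shell
#!/bin/sh
# cfg_line DECIMAL GLYPHNAME
# Prints a cfg-style line with the position as four hex digits,
# in the shape of the "04AA N kafswashfin" line from omarab.cfg.
cfg_line() {
  printf '%04X N %s\n' "$1" "$2"
}

cfg_line 1194 kafswashfin   # prints 04AA N kafswashfin
```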

How to cook omarab.ovf

Ingredients: omarab.cfg, omseco.afm, omsea1.afm, omsea2.afm, omsea3.afm

 > perl makeovp.pl omarab.cfg omseco.afm omsea1.afm omsea2.afm omsea3.afm omarab.ovp
 > pltotf omseco.pl omseco.tfm
 > pltotf omsea1.pl omsea1.tfm
 > pltotf omsea2.pl omsea2.tfm
 > pltotf omsea3.pl omsea3.tfm
 > ovp2ovf omarab.ovp omarab.ovf omarab.ofm

If the last line does not work, try

 > ovp2ovf omarab.ovp omarab.ovf omarab.tfm

Then rename omarab.tfm to omarab.ofm and move it to the ofm directory.

How to distill omarab.ovp from omarab.ovf:

Use a different directory or a different name for the output ovp so that omarab.ovp created above is not overwritten.

Get omarab.ofm and rename it to omarab.tfm:

 > ovf2ovp omarab.ovf omarab.tfm omarab.ovp

How to cook omarabb.ovf

Ingredients: omarab.cfg, omsecob.afm, omsea1b.afm, omsea2b.afm, omsea3b.afm

 > perl makeovp.pl omarab.cfg omsecob.afm omsea1b.afm omsea2b.afm omsea3b.afm omarabb.ovp
 > pltotf omsecob.pl omsecob.tfm
 > pltotf omsea1b.pl omsea1b.tfm
 > pltotf omsea2b.pl omsea2b.tfm
 > pltotf omsea3b.pl omsea3b.tfm
 > ovp2ovf omarabb.ovp omarabb.ovf omarabb.ofm


If the last line does not work, try

 > ovp2ovf omarabb.ovp omarabb.ovf omarabb.tfm

Then rename omarabb.tfm to omarabb.ofm and move it to the ofm directory.
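
Since the regular and bold recipes differ only by a "b" suffix, they can be wrapped in one small script. This is only a sketch built from the commands listed above: cook, run and DRY_RUN are names made up here, the fallback mirrors the "if the last line does not work" step, and moving the result into the ofm directory is left to you.

```shell
#!/bin/sh
# run CMD...: execute the command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# cook OUT SUFFIX
# OUT is omarab or omarabb; SUFFIX is "" for regular, "b" for bold.
cook() {
  out=$1
  b=$2
  run perl makeovp.pl omarab.cfg "omseco$b.afm" "omsea1$b.afm" \
      "omsea2$b.afm" "omsea3$b.afm" "$out.ovp" || return 1
  for f in omseco omsea1 omsea2 omsea3; do
    run pltotf "$f$b.pl" "$f$b.tfm" || return 1
  done
  # First try writing the ofm directly; if that fails, write a tfm
  # and rename it, as in the manual fallback.
  run ovp2ovf "$out.ovp" "$out.ovf" "$out.ofm" || {
    run ovp2ovf "$out.ovp" "$out.ovf" "$out.tfm" &&
    run mv "$out.tfm" "$out.ofm"
  }
}
```

With DRY_RUN=1 set, cook omarabb b prints the six commands of the bold recipe without running them, which is a cheap way to check the file names before the real run.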

How to distill omarabb.ovp from omarabb.ovf

Use a different directory or a different name for the output ovp so that the omarabb.ovp created above is not overwritten.

Get omarabb.ofm and rename it to omarabb.tfm:

 > ovf2ovp omarabb.ovf omarabb.tfm omarabb.ovp
  • For more info, there is also the (mostly cryptic) Omega manual in PS-Format. Don't ask me why it's not in PDF. :-(

See also tsukuba-arabic

Miscellaneous

  • Some people have gotten large OpenType fonts to work in Aleph/Omega. Probably they used FontForge to convert to CFF-enriched type1. FF can produce ofm files (large tfms) so that's a help too.
  • Me, I'm working on an advanced Arabic-script typesetting system that really pushes Aleph to the max. At present I don't actually use m-gamma, etc, but my own macros. I really hope to release something this year...
  • See also Omega example

To the future

  • The otp mechanism does not seem well suited to support, e.g., OpenType GPOS tables, important for really advanced Arabic (though GDEF and GSUB should work fine with the present mechanism for most purposes). We need a better model for horizontal and vertical glyph substitutions.
  • The low-level filtersequence mechanism needs to abstract language processing from font mapping. Right now both are hardwired into a single sequence, so setting up more than one font for a single language is more of a pain than it should be.
  • The otp language is a bit cryptic. Hans has suggested switching otp's to a new language (like Lua or Io) but I don't know how hard that will be...
  • One very important feature, which may work better at the primitive/engine level by extending the pdfeTeX engine, is glyph substitution that depends on the paragraph. For example, in traditional Arabic typography, one way to compensate for "underfull" paragraphs is to substitute a "swash" version of a letter. Another way is to stretch the cursive tie between joining characters (which is already implemented in my own Arabic system). Combined with HZ we can get some pretty interesting high-level options and effects for the user to choose from.