Unicode blocks in ConTeXt

From Wiki

Revision as of 13:13, 25 October 2017

A Unicode block is an interval of code points which represent characters that are semantically related to each other. For example, there is a Unicode block for characters from the Devanagari script, which is used by several Indian languages. Another Unicode block contains characters that denote mathematical operators, such as those for the union and the intersection of sets.

ConTeXt has special names for all Unicode blocks. These names can be used to specify ranges of code points in the setups of several commands.

Unicode blocks

A Unicode block is an organisational unit of the Unicode code space. The Unicode code space is the set of all code points, that is, the set of all integers from 0 to the integer whose hexadecimal representation is 10FFFF. The official list of the blocks is available at the Unicode Web site.

Every block is an interval of code points, and different blocks are disjoint from each other, so no code point belongs to more than one block. Not every code point belongs to a block, however: code points outside all blocks have the value No_Block for the Unicode Block property. The number of code points in a block varies: some blocks have just 16 code points, and others have thousands.

A Unicode block starts at a code point that is a multiple of 16, and the number of code points in each block is also a multiple of 16. Thus, the hexadecimal representation of the first code point in a block ends in the digit 0, and that of the last code point in it ends in the digit F.
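As a worked check of these two rules, take two blocks whose ranges are fixed by the Unicode standard: Cyrillic (U+0400 to U+04FF) and Mathematical Alphanumeric Symbols (U+1D400 to U+1D7FF). In both cases the first code point is a multiple of 16, and so is the size of the block:

```latex
\begin{aligned}
\mathtt{0400}_{16} &= 1024 = 64 \cdot 16
  && \text{(first code point of Cyrillic)}\\
\mathtt{04FF}_{16} - \mathtt{0400}_{16} + 1 &= \mathtt{100}_{16} = 256 = 16 \cdot 16
  && \text{(size of Cyrillic)}\\
\mathtt{1D400}_{16} &= 119808 = 7488 \cdot 16
  && \text{(first code point of Mathematical Alphanumeric Symbols)}\\
\mathtt{1D7FF}_{16} - \mathtt{1D400}_{16} + 1 &= \mathtt{400}_{16} = 1024 = 64 \cdot 16
  && \text{(size of Mathematical Alphanumeric Symbols)}
\end{aligned}
```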

The Unicode standard gives every block a unique name that describes the common semantic nature of its code points. When block names are compared, case is ignored, and so are hyphens, spaces, and underscores. For example, one can refer to the block whose Unicode name is Myanmar Extended-A as myanmarextendeda, MyanmarExtendedA, or myanmar_extended_a. ConTeXt chooses the first of these alternative styles for the names of blocks, as described below.
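The matching rule can be illustrated with a few lines of Lua run inside ConTeXt. The function normalizeblockname below is a hypothetical helper written only for this illustration; it is not part of ConTeXt. It lower-cases a name and drops spaces, hyphens, and underscores, so each of the three spellings above should be printed as myanmarextendeda.

```tex
% Minimal sketch: normalizeblockname is a hypothetical helper,
% not a ConTeXt command or a function from char-ini.lua.
\starttext
\startluacode
  -- Lower-case the name and remove whitespace, hyphens, and underscores
  local function normalizeblockname(name)
    return (string.gsub(string.lower(name), "[%s%-_]", ""))
  end

  local spellings = { "Myanmar Extended-A", "MyanmarExtendedA", "myanmar_extended_a" }
  for _, spelling in ipairs(spellings) do
    -- Only the normalized form is typeset, so no TeX special
    -- characters (such as the underscore) reach the output
    context(normalizeblockname(spelling))
    context.par()
  end
\stopluacode
\stoptext
```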

ConTeXt names of Unicode blocks

ConTeXt has its own names for all the Unicode blocks. These names are defined in the source file char-ini.lua. Most of them are obtained by converting the Unicode name of the block to lower case and removing the hyphens and spaces in it.
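To see which interval a ConTeXt block name stands for, one can query the block table from Lua. This is a sketch under the assumption that char-ini.lua exposes its table as characters.blocks, indexed by the ConTeXt names, with numeric fields first and last holding the first and last code points of each block; check this against the char-ini.lua of your own installation.

```tex
% Sketch, assuming characters.blocks[name] is a table with numeric
% fields first and last, as defined in char-ini.lua.
\starttext
\startluacode
  local names = { "cyrillic", "mathematicalalphanumericsymbols" }
  for _, name in ipairs(names) do
    local block = characters.blocks[name]
    if block then
      -- Print the interval in the usual U+XXXX notation
      context(string.format("%s: U+%04X to U+%04X", name, block.first, block.last))
    else
      context(string.format("%s: no such block", name))
    end
    context.par()
  end
\stopluacode
\stoptext
```

If the assumption holds, the first line of the output should read "cyrillic: U+0400 to U+04FF".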

The list of blocks

See the article List of Unicode blocks for a table of Unicode blocks, their ConTeXt names, and links to more information about them.

An example usage of Unicode blocks in ConTeXt

A typical use of Unicode blocks is in the definition of fallback fonts to provide glyphs for certain characters. Sometimes, when writing a document in ConTeXt, one needs to typeset special symbols that are not available in the base font of the document. In such a situation, one can specify a fallback font to provide these missing symbols.

For example, in the following document, the base font TeX Gyre Pagella does not have the glyphs for Cyrillic characters, whose code points are in the Unicode block Cyrillic. The document uses the \definefallbackfamily command to get the glyphs for this block from the DejaVu Serif font. The ConTeXt name of the block is supplied as the value of the key range in the last setup of the command.

\definefallbackfamily [mainface] [rm] [DejaVu Serif] [range=cyrillic]

\definefontfamily     [mainface] [rm] [TeX Gyre Pagella]

\setupbodyfont        [mainface]

\starttext

\startlines
’Twas brillig, and the slithy toves
Did gyre and gimble in the wabe;
All mimsy were the borogoves,
And the mome raths outgrabe.
\stoplines

\rightaligned{— Lewis Carroll, Jabberwocky}

\startlines
Варкалось. Хливкие шорьки
Пырялись по наве,
И хрюкотали зелюки,
Как мюмзики в мове. 
\stoplines

\rightaligned {— Дина Григорьевна Орловская, Бармаглот}

\stoptext
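If a document also needs characters from a neighbouring block, the range can presumably name more than one block. The sketch below assumes that range accepts a comma-separated list of ConTeXt block names, and that the ConTeXt name of the Unicode block Cyrillic Supplement is cyrillicsupplement, following the naming rule described above. Neither point is confirmed by this article, so verify both against your ConTeXt version.

```tex
% Sketch, assuming range accepts a comma-separated list of block
% names; cyrillicsupplement is assumed to be the ConTeXt name of
% the Unicode block Cyrillic Supplement.
\definefallbackfamily [mainface] [rm] [DejaVu Serif] [range={cyrillic,cyrillicsupplement}]

\definefontfamily     [mainface] [rm] [TeX Gyre Pagella]

\setupbodyfont        [mainface]

\starttext
Варкалось. Хливкие шорьки
\stoptext
```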

Another example

A different application of fallback fonts arises when one wants to replace the existing glyphs for some characters in the base font with glyphs for those characters from another font. This situation differs from the one in the previous example. There, the base font did not contain glyphs for the characters of interest, and the fallback font provided the missing glyphs. Here, the base font does contain glyphs for the characters in question, but, perhaps for aesthetic reasons, the author of the document wants to replace them with glyphs from another font. In such a case, the latter font can be specified as a fallback font.

For example, the following document uses the pagella typescript to provide the base font, and uses the STIX General Regular font for mathematical script letters, which lie in the Unicode block Mathematical Alphanumeric Symbols. Instead of \definefallbackfamily, which was used in the previous example, this document uses the command \definefontfallback. The ConTeXt name of the block is supplied as the third setup of this command. The last setup, force=yes, ensures that the glyphs for the characters in this block are taken from the fallback font, overriding any glyphs that the base font may have for them.

\usetypescript      [pagella]

\definefontfallback [mathscript] [STIXGeneralRegular] [mathematicalalphanumericsymbols] [force=yes]

\definefontsynonym  [MathRoman]  [pagella]            [fallbacks=mathscript]

\setupbodyfont      [pagella]

\starttext

Here is a bestiary of mathematical script letters:

\startformula
𝒜, 𝒞, 𝒟, 𝒢, 𝒥, 𝒦, 𝒪, 𝒫, 𝒬, 𝒮, 𝒯, 𝒰, 𝒱, 𝒲, 𝒳, 𝒴, 𝒵
\stopformula

\stoptext
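With force=yes, every character of the whole Mathematical Alphanumeric Symbols block (U+1D400 to U+1D7FF), including the bold, italic, and fraktur letters, takes its glyph from the fallback font. If only the script letters should be replaced, a narrower interval can be given instead of the block name. The sketch below assumes that the third setup of \definefontfallback also accepts an explicit hexadecimal interval written as first-last; U+1D49C to U+1D4CF are the mathematical script capitals and small letters.

```tex
% Sketch, assuming the range setup accepts a hexadecimal interval
% first-last in place of a block name. 0x1D49C-0x1D4CF covers only
% the mathematical script capitals and small letters.
\usetypescript      [pagella]

\definefontfallback [mathscript] [STIXGeneralRegular] [0x1D49C-0x1D4CF] [force=yes]

\definefontsynonym  [MathRoman]  [pagella]            [fallbacks=mathscript]

\setupbodyfont      [pagella]

\starttext

\startformula
𝒜, 𝒞, 𝒟, 𝒢, 𝒥
\stopformula

\stoptext
```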

Here is the relevant part of the PDF output of the command context file.tex, where file.tex is a file with the above code.

[Image: Unicode blocks in ConTeXt Example.png]

The log file resulting from the above command says:

system > 13: filename=/usr/share/fonts/opentype/stix/STIXGeneral-Regular.otf ...

so context takes the glyphs for these characters from the STIX fonts installed on the local operating system.