Table manipulation

Preliminary information

Overview

In ConTeXt, the vanilla Lua table library is extended with a number of convenience functions for common tasks. To get a first impression of the added functionality, you can list every entry of the table namespace:

local cnt = 1
for key,val in next,table do
    print(string.format("[%2i]  %19s: '%s'", cnt, key, type(val)))
    cnt = cnt + 1
end

Compare this to Lua’s and LuaTeX’s standard table implementation.

The main file of ConTeXt’s table library is l-table.lua. Consult it for the exact function definitions and Hans’s original annotations. If one or more examples do not work as expected, it is very likely that the behaviour has changed in a recent release. Feel free to update the wiki yourself or ask for an update on the mailing list.

Examples (read this first!)

For the examples to work, your test scripts should define the following dummies:

--- Numerically indexed table
testarray = { } 
for cnt=1,tonumber(arg[1]) or 100, 1 do
    testarray[cnt] = cnt 
end

--- One-dimensional array of strings
alpha = {
    "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
    "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"
}

--- Nested array of strings and arrays of strings
alpha_nested = {
    {"a", "b", "c"}, "d", "e", {"f", {"g", "h", {"i"}, "j"}}, "k", "l", {"m"},
    {"n", "o"}, "p", {{{"q", "r"}, {"s", {"t"}}, "u", {"v", "w"}}, "x"}, "y", "z"
}

--- Hash table
anagrams = {
    ["Taht si crreoct"] = "That is correct",
    ["I stom certainly od"] = "I most certainly do",
    ["Revy chum so"] = "Very much so",
    ["Hamrag"] = "Graham",
    ["Bumcreland"] = "Cumberland",
    ["Staht sit sepreicly"] = "That’s it precisely",
    ["Sey, sey"] = "Yes, yes",
    ["Ta the mnemot"] = "At the moment",
    ["The Mating of the Wersh"] = "The Taming of the Shrew",
    ["Malliwi Rapesheake"] = "William Shakespeare",
    ["Two Netlemeg of Verona"] = "Two Gentlemen of Verona",
    ["The Chamrent of Venice"] = "The Merchant of Venice",
    ["My dingkom for a shroe!"] = "My kingdom for a horse",
    ["Ring Kichard the Thrid"] = "King Richard the Third", -- That's not an anagram, that's a spoonerism.
}

--- Conflation of hash and numerically indexed table
mixed = {
      [1]    = "one",
     ["2"]   = "two",
      [3]    = "three",
     ["4"]   = "four",
      [5]    = "five",
     ["6"]   = "six",
      [7]    = "seven",
     ["8"]   = "eight",
      [9]    = "nine",
    ["10"]   = "ten",
     [10]    = "ten",
    ["11"]   = "eleven",
     [11]    = "eleven",
}

Those will be used throughout this tutorial.

In order to use ConTeXt’s Lua extensions with TeXlua without the comparatively huge TeX overhead you will want to run your scripts with the mtxrun loader:

$ mtxrun --script my_script.lua

Of course, some of the following demonstrations do need TeX processing. The most convenient way to do this in ConTeXt is to use .cld files (“ConTeXt Lua Documents”), which are preferable to \[start|stop]luacode or \ctxlua when dealing predominantly with Lua code.

Extensions to the table library

table.strip(table t)

Strips every string in an array t of its leading and trailing whitespace and returns a new array. If this operation should result in an empty string (i.e. the input string was all-whitespace), then it is dropped. This method fails with tables that contain non-string elements.

irregular = {
    -- lots of spacy strings
    "     Johann ", " Gambolputty ", "   de         ", "          von", "Ausfern     ",
    "schplenden  ", "schlitter    ", "crasscrenbon  ", "fried        ", "     digger ",
    "   dingle   ", "  dangle     ", "   dongle     ", "dungle       ", "   burstein ",
    "        von ", "     knacker ", "  thrasher    ", " apple       ", "  banger    ",
    "horowitz    ", "ticolensic   ", "      grander ", "   knotty    ", "spelltinkle ",
    "grandlich   ", "grumblemeyer ", "spelterwasser ", "kurstlich    ", "himbleeisen ",
    "  bahnwagen ", "  gutenabend ", "     bitte    ", "         ein ", "nürnburger  ",
    " bratwustle ", " gerspurten  ", "       mitz   ", "weimache     ", "      luber ",
    "  hundsfut  ", "gumberaber   ", "  shönedanker ", "kalbsfleisch ", "mittler     ",
    "aucher      ", "       von   ", " Hautkopft    ", "of           ", "    Ulm     ",
}

-- iterate over the stripped version
for n,name in next, table.strip(irregular) do
    io.write(">>"..name.."<< "..(n % 5 == 0 and "\n" or ""))
end
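
For readers without a ConTeXt installation at hand, the behaviour described above can be approximated in plain Lua. This is only a minimal sketch and not the code from l-table.lua; the name strip_all is hypothetical.

```lua
-- Hypothetical stand-in for table.strip, written in plain Lua.
local function strip_all(t)
    local result = { }
    for i = 1, #t do
        -- Remove leading and trailing whitespace.
        local s = string.match(t[i], "^%s*(.-)%s*$")
        -- Drop strings that consisted of whitespace only.
        if s ~= "" then
            result[#result + 1] = s
        end
    end
    return result
end

local stripped = strip_all({ "  Johann ", "   ", "von  ", "Ulm" })
print(table.concat(stripped, "|"))
```

The all-whitespace second element is dropped, so the result has three entries.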

table.keys(table t)

Returns an array of all the keys in dictionary t. (For arrays, this amounts to the list of indices.)

for n, key in next,table.keys(anagrams) do
    io.write((n%3==0 and "\n" or n~=1 and " | " or "")..key)
end

table.sortedkeys(table t)

Returns a sorted array of all the keys in table t. The comparer used for sorting treats everything as a string if and only if the compared values are of mixed type (cf. example [1]). In general this means that you will be fine as long as you avoid mixing numeric indices with numeric-looking string keys in t; otherwise the resulting order depends on which type an element happens to be compared with, and in what order (cf. example [2]).

--- Example [1]
for n,key in next,table.sortedkeys(anagrams) do
    io.write(string.format("[%2i] “%24s” -> “%s”\n", n, key, anagrams[key]))
end

--- Example [2]
for n,key in next,table.sortedkeys(mixed) do
    io.write(string.format("[%2i] [%2s] (t:%6s) -> “%s”\n", n, key, type(key), mixed[key]))
end
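
To see why mixed key types lead to a surprising order, consider the following plain-Lua sketch. It uses a deliberately simplified rule (compare everything as strings as soon as number and string keys are mixed), which is an assumption for illustration and not the comparer from l-table.lua; the helper name sorted_keys is hypothetical.

```lua
-- Hypothetical helper: collect the keys of t and sort them. If number and
-- string keys are mixed, every comparison is done on the string form.
local function sorted_keys(t)
    local keys, seen = { }, { }
    for k in pairs(t) do
        keys[#keys + 1] = k
        seen[type(k)] = true
    end
    if seen.number and seen.string then
        table.sort(keys, function(a, b) return tostring(a) < tostring(b) end)
    else
        table.sort(keys)
    end
    return keys
end

local keys = sorted_keys({ [1] = "one", ["2"] = "two", [3] = "three", [10] = "ten" })
for i = 1, #keys do
    print(i, keys[i], type(keys[i]))
end
```

Under this rule the number 10 ends up before the string "2", because "10" precedes "2" in string order.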

table.sortedhashkeys(table t)

Returns a sorted array of all the keys in table t. The difference from table.sortedkeys is that it relies on the standard Lua comparer, which means that it will fail on mixed-type table indices. As long as you stick with homogeneous tables, table.sortedhashkeys will be the faster choice.

for n,key in next,table.sortedhashkeys(anagrams) do
    io.write(string.format("[%2i] “%24s” -> “%s”\n", n, key, anagrams[key]))
end

for n,key in next,table.sortedhashkeys(alpha) do
    io.write(string.format("[%2i] “%2s”\n", n, key ))
end

table.sortedhash(table t) | table.sortedpairs(table t)

Returns an iterator over the key-value pairs of t, in the order of the array returned by table.sortedkeys. This is a shortcut for the primary use case of table.sortedkeys. If t is nil or empty, the loop body is simply never executed.

local n = 1
for key,value in table.sortedhash(mixed) do
    io.write(string.format("[%2i] [%2s] -> “%s”\n", n, key, value))
    n = n + 1
end
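
The idea of a sorted iterator can be sketched in a few lines of plain Lua. The function sorted_pairs below is a hypothetical stand-in that assumes keys of a single type; it is not the ConTeXt implementation.

```lua
-- Hypothetical sorted iterator: collects the keys, sorts them once and
-- then hands out key-value pairs in that order.
local function sorted_pairs(t)
    local keys = { }
    for k in pairs(t or { }) do
        keys[#keys + 1] = k
    end
    table.sort(keys)
    local i = 0
    return function()
        i = i + 1
        local k = keys[i]
        if k ~= nil then
            return k, t[k]
        end
    end
end

local order = { }
for k, v in sorted_pairs({ c = 3, a = 1, b = 2 }) do
    order[#order + 1] = k .. "=" .. v
end
print(table.concat(order, " "))

-- A nil argument is tolerated: the loop body never runs.
for _ in sorted_pairs(nil) do
    print("never reached")
end
```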

table.append(table t, table list)

Appends all numerically indexed elements of list to t; this will ignore discontinuously numbered elements as well as hashes. (Comparable to Python’s list.extend.)

for i,elm in next,table.append(alpha, alpha_nested) do
    local current = type(elm) == "string" and elm or type(elm)
    io.write(i.."->"..current..(i%5==0 and ",\n" or ", "))
end

table.prepend(table t, table list)

Complement to table.append; returns an array with all numerically indexed elements of list prepended to t.

for i,elm in next,table.prepend(alpha, alpha_nested) do
    local current = type(elm) == "string" and elm or type(elm)
    io.write(i.."->"..current..(i%5==0 and ",\n" or ", "))
end

NB: this might distort the initial order of elements in the resulting array. Thus, you should not rely on the next function to iterate over this array, as it might yield the elements in an unexpected order. (Cf. the warning in the Lua manual.) Use a numeric for loop instead.

local t1 = { "a", "b", "c", "d", "e" }
local t2 = { "f", "g", "h", "i", "j" }

local t12 = table.prepend(t1,t2)
for i,elm in next,t12 do
    print(i,elm)
end
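
For comparison, here is a plain-Lua sketch that prepends one array to another and then walks the result with a numeric for loop, which always visits the indices in ascending order. The function prepend is a hypothetical stand-in; it assumes that the first table is modified in place, which may differ from what l-table.lua does.

```lua
-- Hypothetical stand-in for table.prepend (assumed to work in place).
local function prepend(t, list)
    local shift = #list
    -- Move the existing elements up, starting from the end.
    for i = #t, 1, -1 do
        t[i + shift] = t[i]
    end
    -- Fill the freed slots with the elements of list.
    for i = 1, shift do
        t[i] = list[i]
    end
    return t
end

local t12 = prepend({ "a", "b", "c" }, { "x", "y" })
for i = 1, #t12 do
    io.write(i, ":", t12[i], " ")
end
io.write("\n")
```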

table.merge(table t, args)

Takes a target table t and a number of tables and returns their union. The order of the tables in args is significant as previously existing entries will be overwritten by those from later tables. NB: table.merge is an in-place operation on the first argument t. If you want to keep t intact, an empty (new) table or nil has to be supplied as first argument. (Or consider using table.merged instead.)

for n, elm in next, table.merge({ }, alpha, mixed, alpha_nested) do
    io.write(string.format("[%2i] “%s”\n", n, tostring(elm)))
end


table.merged(args)

Returns the union of all tables in args. In contrast to table.merge this creates a new table and leaves the arguments as they were.

for n, elm in next, table.merged(alpha, mixed, alpha_nested) do
    io.write(string.format("[%2i] “%s”\n", n, tostring(elm)))
end
print(table.identical(table.merge({ }, alpha,mixed), table.merged(alpha,mixed)))
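
The difference between the in-place and the copying variant can be illustrated in plain Lua. The functions merge and merged below are hypothetical stand-ins that only mirror the behaviour described in this section.

```lua
-- Hypothetical stand-in for table.merge: copies all pairs into target.
local function merge(target, ...)
    target = target or { }
    for i = 1, select("#", ...) do
        local source = select(i, ...)
        if type(source) == "table" then
            for k, v in pairs(source) do
                target[k] = v   -- later tables overwrite earlier entries
            end
        end
    end
    return target
end

-- Hypothetical stand-in for table.merged: always starts from a new table.
local function merged(...)
    return merge({ }, ...)
end

local a = { x = 1, y = 2 }
local b = { y = 20, z = 30 }

local m = merged(a, b)
print(m.x, m.y, m.z, a.y)   -- a is left untouched here

local r = merge(a, b)
print(r == a, a.y, a.z)     -- a itself has been modified
```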

table.imerge(table t, args)

As table.merge, but processes only numerically indexed elements.

for n, elm in next, table.imerge({ }, mixed, alpha) do
    io.write(string.format("[%s] “%s”", n, tostring(elm)))
    io.write((n%5==0 and "\n" or ", "))
end

table.imerged(args)

As table.merged, but processes only numerically indexed elements.

print(table.identical(table.merged(alpha, mixed), table.imerged(alpha, mixed)))
for n, elm in next, table.imerged(alpha, mixed) do
    io.write(string.format("[%s] “%s”", n, tostring(elm)))
    io.write((n%5==0 and "\n" or ", "))
end

table.fastcopy(table t, [boolean meta])

Returns a deep copy of t. meta toggles copying of metatables (default: no). This method does not recurse into tables on the key side.

local t = table.fastcopy(alpha_nested)
print(table.identical(t, alpha_nested))
t[1] = "Albatross!"
for i=1,#t do
    io.write(string.format("[%2i]  %16s : %s\n", i, tostring(alpha_nested[i]), tostring(t[i])))
end
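
What “deep copy” means on the value side can be sketched in plain Lua as follows. The function deep_copy is hypothetical; it ignores metatables, does not recurse into keys and does not guard against cyclic references.

```lua
-- Hypothetical deep copy that recurses on the value side only.
local function deep_copy(t)
    local copy = { }
    for k, v in pairs(t) do
        if type(v) == "table" then
            copy[k] = deep_copy(v)
        else
            copy[k] = v
        end
    end
    return copy
end

local original = { "a", { "b", { "c" } } }
local copy = deep_copy(original)
copy[2][2][1] = "changed"
print(original[2][2][1], copy[2][2][1])
```

Changing a nested element of the copy leaves the original alone, which would not be the case for a shallow copy.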

table.copy(table t, [table tables])

Returns a deep copy of t. Recurses into keys of type table as well. The optional second argument tables works as kind of a cache for table references; this way it can be reused when several tables with similar content have to be copied. The metatable of t is always copied.

local t1, t2 = { "foo" }, { "bar" }
local f1, f2 = function () return "Spam" end, function () return "Baked Beans" end

local tf1 = { [t1] = t2, [f1] = f2 }
local cache = { }

local tf2 = table.copy(tf1, cache)

io.write(string.format("equal: %s, same: %s\n",
                       tostring(table.are_equal(tf1,tf2)),
                       tostring(table.identical(tf1,tf2)))
        )

for k,v in next,tf2   do print(k   , v   ) end
for k,v in next,cache do print(k[1], v[1]) end

table.tohash(table t, [any value])

Returns a hashtable with all the values of t as indices, their values set to value. The default for value is true but it can be anything; nil resolves as true.

local a_hash = table.tohash(alpha, "Notlob")

local n = 1
for k,v in next,a_hash do
    io.write(string.format("“%s”: %s", k, tostring(v))..(n%5==0 and ",\n" or ", "))
    n = n + 1
end

table.fromhash(table t)

Returns an array of all the keys in dictionary t, unless their corresponding value is false or nil.

local woody = {
    antelope    = false, bound       = true, caribou     = true,
    gorn        = true,  intercourse = true, leap        = false,
    litterbin   = false, ocelot      = true, sausage     = true,
    tin         = false, vole        = true, wasp        = true,
    yowling     = true,
}

local inv_woody = table.fromhash(woody)
for i=1,#inv_woody do
    local elm = inv_woody[i]
    io.write(string.format("[%i:%11s]", i, elm)..(i%3==0 and ",\n" or ",   "))
end
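
The relation between the two conversions can be sketched in plain Lua. Both functions below are hypothetical stand-ins that follow the descriptions of table.tohash and table.fromhash given above.

```lua
-- Hypothetical stand-in for table.tohash.
local function to_hash(t, value)
    if value == nil then value = true end
    local h = { }
    for _, v in pairs(t) do
        h[v] = value
    end
    return h
end

-- Hypothetical stand-in for table.fromhash: keeps keys with a true-ish value.
local function from_hash(t)
    local list = { }
    for k, v in pairs(t) do
        if v then
            list[#list + 1] = k
        end
    end
    return list
end

local h = to_hash({ "vole", "wasp", "gorn" })
h.wasp = false                 -- will be filtered out again
local list = from_hash(h)
table.sort(list)               -- pairs() order is unspecified
print(table.concat(list, ", "))
```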

table.serialize(table t, [string|int|boolean name,] [boolean reduce,] [boolean noquotes,] [boolean hexify])

Serializes an arbitrary Lua table t into a string. The return value is 100% valid Lua code and as such can then be directly processed by the interpreter by evaluating it with loadstring. It is also possible to save it to a file and load it into the running session on the fly, as ConTeXt does it with respect to .tuc files. Note that even functions are serialized correctly, albeit in a not quite human readable manner.

local separator = function ()
    io.write("\n\n~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~\n\n")
end

local tf = {
    ["Tree"]     = function () return "The larch" end,
    ["Nice Guy"] = function () return "Nudge nudge" end,
}

print(table.serialize(alpha))
separator()
print(table.serialize(alpha_nested))
separator()
print(table.serialize(mixed))
separator()
print(table.serialize(tf))
separator()
print(table.serialize(anagrams))

name

The parameter name can be a string, an integer or a boolean expression. With false or nil the serialized string has no prefix, so it cannot be evaluated directly as-is. If name is true or the string "return", the serialized string is prefixed with return, which allows it to be evaluated directly.

--- How to access a serialized tab from within the same session.
local teststr = table.serialize(anagrams, true) -- 'name == true' prepends *return*
local test = assert(loadstring(teststr))()
print(type(test),table.serialize(test))

Any other string as value will be prepended as variable name, making the output conform to standard Lua initialization syntax.

--- This serializes a table and then creates another table from
--- the resulting string.
assert(loadstring(table.serialize(mixed, "tabula_mixta")))()
print(table.serialize(tabula_mixta))

An integer will cause the result to be numerically indexed instead. If the parameter hexify is non-[nil|false], then the index number is converted to hex first, as are all the other index values.

print(table.serialize(mixed, 42))
print(table.serialize(mixed, 42, false, false, true))
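
How the prefix decides whether the output can be evaluated directly may be easier to see with a miniature serializer in plain Lua. The function mini_serialize is hypothetical and far simpler than table.serialize: it handles only string, number and boolean values, nested tables, and the true/string forms of name.

```lua
-- Hypothetical miniature serializer, for illustration only.
local function mini_serialize(t, name)
    local function dump(value)
        if type(value) == "table" then
            local items = { }
            for k, v in pairs(value) do
                local key = type(k) == "string" and string.format("[%q]", k)
                                                 or "[" .. tostring(k) .. "]"
                items[#items + 1] = key .. "=" .. dump(v)
            end
            return "{" .. table.concat(items, ",") .. "}"
        elseif type(value) == "string" then
            return string.format("%q", value)
        else
            return tostring(value)
        end
    end
    local prefix = ""
    if name == true or name == "return" then
        prefix = "return "          -- directly loadable chunk
    elseif type(name) == "string" then
        prefix = name .. "="        -- assignment to a (global) variable
    end
    return prefix .. dump(t)
end

local load_chunk = loadstring or load   -- loadstring in Lua 5.1, load in 5.2+
local str  = mini_serialize({ a = "1", list = { 10, 20 } }, true)
local copy = assert(load_chunk(str))()
print(copy.a, copy.list[2])
```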

reduce

If non-[nil|false], string values that are valid number expressions are automatically typecast to number.

local tab = {
    ["a"] = "1",
    ["b"] = "2",
    ["c"] = "5",
}
print(table.serialize(tab, true, false, false, false))
print(table.serialize(tab, true, true,  false, false))

noquotes

Determines whether or not to resort to Lua’s syntactic sugar regarding the hash table notation. This will leave keys of type string unquoted whenever this is possible (i.e. unless they are reserved words or contain spaces.)

print(table.serialize(anagrams, false, false, false, false))
print(table.serialize(anagrams, false, false, true,  false))

hexify

If non-[nil|false], keys, indices and values of type number will be converted to hexadecimal representation. NB: This has no effect on numbers which are reduced from strings (see example).

local series = { 0, 1, 1, 2, 3, 5, 8, 13, "21" }
print(table.serialize(series, false, false, false, true))
print(table.serialize(series, false, true,  false, true)) -- no effect
print(table.serialize(mixed,  false, false, false, true))

Customization

There are three switches that allow for controlling what table content is to be serialized: table.serialize_functions, table.serialize_compact, and table.serialize_inline.

table.serialize_functions, if either nil or false, will prevent functions from being serialized. The string "function" will be emitted in their place. (Default: true.)

local f = function () return "Throatwobbler Mangrove" end
local g = function () return "Luxury Yacht"           end

local t = {[1] = f, [2] = g}

print(table.serialize(t))
table.serialize_functions = false
print(table.serialize(t))

table.serialize_compact, if unset, will cause arrays to be serialized in verbose [n] = elm notation instead of plain comma-separated element listing. (Default: true.)

print(table.serialize(alpha))
table.serialize_compact = false
print(table.serialize(alpha))

table.serialize_inline, if unset, prevents the output of the deepest nesting level of arrays from being compressed onto a single line. This results in more verbose output. (Default: true.)

print(table.serialize(alpha_nested))
table.serialize_inline = false
print(table.serialize(alpha_nested))

Limitations

table.serialize throws an error if t contains functions or tables as keys.

--- The following code creates an error.
local f1, f2 = function () return "Conquistador" end, function () return "Instant Coffee" end
local t1, t2 = { "Conquistador" }, { "Instant Leprosy" }

local tf12 = { [f1] = f2 }
local tt12 = { [t1] = t2 }

print(table.serialize(tf12)) -- has function-type key
print(table.serialize(tt12)) -- has table as key

table.tohandle(function handle, table t, [string|int|boolean name,] [boolean reduce,] [boolean noquotes,] [boolean hexify])

Same as table.serialize(…), except that it redirects its output to the handle specified as first argument. (Default handle is the print function.)

--- Setup a stream.
local fname = "./testfile.lua"
local f = io.open(fname, "w")
--- Create a handle and serialize some array into it.
local h = function(str) f:write(str, "\n") end -- one chunk per line, as print would do
table.tohandle(h, alpha, true)
f:close()

--- Load the array from the file again and verify its content.
local array = dofile(fname)
print(table.serialize(array))

table.tofile(string filename, table root, [string|int|boolean name,] [boolean reduce,] [boolean noquotes,] [boolean hexify])

Serializes the table root and writes it to file filename. Use with care, as this will overwrite any content of the file.

local fname = "./test_alpha.lua"
table.tofile(fname, alpha, true)

alpha = nil
print(alpha)
alpha = dofile(fname)
print(table.serialize(alpha))

table.flattened(table t, [table f,] [int depth])

Removes any nesting from array t. Optionally, a target table f may be specified that the flattened elements are appended to. The third argument allows for restricting the maximum recursion level. Elements beyond this level will remain untouched by the flattener.

local fl = table.flattened(alpha_nested)
--- Try this instead:
-- local fl = table.flattened(alpha_nested, alpha)

for i=1,#fl do
    io.write(string.format("[%2d] = «%s»,", i, fl[i])..(i%5==0 and "\n" or "  "))
end

print("\n" ..
      "Is it the same as the non-nested variant? -- " ..
      tostring(table.identical(alpha, fl)))
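
The three arguments can be illustrated with a recursive plain-Lua sketch. The function flatten is hypothetical, and its reading of depth (number of nesting levels that are dissolved) is an assumption that may not match l-table.lua exactly.

```lua
-- Hypothetical flattener: target and depth are optional.
local function flatten(t, target, depth)
    target = target or { }
    for i = 1, #t do
        local v = t[i]
        if type(v) == "table" and (depth == nil or depth > 0) then
            -- Descend one level; an unlimited depth stays unlimited.
            flatten(v, target, depth and depth - 1)
        else
            target[#target + 1] = v
        end
    end
    return target
end

local nested = { { "a", { "b" } }, "c" }
local all = flatten(nested)            -- no depth limit
local one = flatten(nested, nil, 1)    -- dissolve only the first level
print(table.concat(all, " "), #one, type(one[2]))
```

With the depth limited to 1, the inner table { "b" } survives as the second element of the result.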


table.are_equal(table t1, table t2, [int start,] [int stop])

For arrays, checks if t1 and t2 have the same elements and order. Recurses into subtables. The optional arguments define a slice of both arrays to be checked, defaulting to the first and last element respectively. NB: As the check for identical length always precedes the other tests, tables of unequal length will be reported as unequal even though the requested subsets might match.

local a1, a2 = { "a", "b" }, { "a", "c" }
print(table.are_equal(a1, a2))
a1[2] = "c"
print(table.are_equal(a1, a2))

--- Subslice handling: a3 == a4 but a3 ~= a5.
local a3 = { "e", "f", "g", "h" }
local a4 = { "h", "f", "g", "e" }
local a5 = { "h", "f", "g", "e", "i" }
print(table.are_equal(a3, a4, 2, 3))
print(table.are_equal(a3, a5, 2, 3))

table.identical(table t1, table t2)

For hashtables, checks if t2 contains the same key-value pairs as t1. Recurses into subtables.

NB: table.identical is asymmetric, therefore the order in which both tables are given is significant! t2 is treated as identical to t1 whenever it contains the same value as t1 for each key in t1. Conversely this means that there is no check at all for whether t2 might contain additional elements that are not in t1. Thus, if you need to match the contents of two dictionaries, check both directions: table.identical(t1, t2) as well as table.identical(t2, t1).

local t1 = { a = "a", b = "b" }
local t2 = { c = "c", d = "d" }
local t3 = { a = "a", b = "b", c = "c" }

--- Asymmetry: identical(t1, t3) holds, but identical(t3, t1) does not.
print(table.identical(t1, t2))
print(table.identical(t1, t3))
print(table.identical(t3, t1))
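
A symmetric comparison can be built from two one-way checks. The plain-Lua sketch below uses the hypothetical names subset_identical (mimicking the one-way check described above) and same_content (the two-way variant).

```lua
-- Hypothetical one-way check: every pair of t1 must also be in t2.
local function subset_identical(t1, t2)
    for k, v in pairs(t1) do
        local w = t2[k]
        if type(v) == "table" and type(w) == "table" then
            if not subset_identical(v, w) then
                return false
            end
        elseif v ~= w then
            return false
        end
    end
    return true
end

-- Hypothetical two-way check: both tables must cover each other.
local function same_content(t1, t2)
    return subset_identical(t1, t2) and subset_identical(t2, t1)
end

local t1 = { a = "a", b = "b" }
local t3 = { a = "a", b = "b", c = "c" }
print(subset_identical(t1, t3), subset_identical(t3, t1), same_content(t1, t3))
```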

table.compact(table t)

In-place operation removing empty tables from the array t.

local t = { { "a" }, { b = "b" }, { c = nil }, { }, { "e" } }
print(table.serialize(t))
table.compact(t)
print(table.serialize(t))

table.contains(table t, any value)

For a given element value, returns the index of value in array t; if value is not in t, returns false. Because of the iteration method used internally, table.contains does not work with hash tables (always returning false).

NB: How nil’ed values are handled depends on the table initialization. In hash-style numerically indexed tables any values after the lowest-indexed nil will be treated as nonexistent, whereas array-style initialization will prevent them from being ignored (see example). This means that tables which are supposedly equal or identical can behave differently …

print(table.contains(alpha, "h"))

--- Ineffective with hashes.
local t = { a = "a", b = "b" }
print(table.contains(t, "b"))

--- Loose ends to watch:
local u = { [1] = "a", [2] = "b", [3] = "c" }
local v = { "a", "b", "c" }
print(table.identical(u,v),   table.are_equal(u,v))

print(table.contains(u, "c"), table.contains(v, "c"))
u[2], v[2] = nil, nil
print(table.contains(u, "c"), table.contains(v, "c"))

table.count(table t)

Non-recursively counts the elements of table t, irrespective of their being hashes or indices.

print(table.count(alpha))
print(table.count(alpha_nested))
print(table.count(anagrams))
print(table.count(mixed))

table.swapped(table t1, [table t2])

Returns a table with all key-value pairs in t1 transposed. The optional second argument is a table t2 for those pairs to be merged into. The pairs of t2 themselves will not be transposed. If t2 is given and some of t1’s values occur as keys in t2, the ones from t1 take precedence.

for k,v in next, table.swapped (anagrams) do
    io.write(string.format("%25s => %s\n", k,v))
end

for k,v in next, table.swapped (alpha) do
    io.write(string.format("%s => %2d\n", k,v))
end

local n = 1
local beta = { a = "A", b = "B", c = "C" }
for k,v in next, table.swapped (alpha, beta) do
--for k,v in next, table.swapped (beta, alpha) do
    io.write(string.format("%2s => %2s", k,v)..(n%5==0 and "\n" or ",   "))
    n = n+1
end
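
The precedence rule for the optional second table can be sketched in plain Lua. The function swapped below is a hypothetical stand-in that follows the description above.

```lua
-- Hypothetical stand-in for table.swapped.
local function swapped(t1, t2)
    local result = { }
    -- The pairs of t2 are taken over unchanged ...
    if t2 then
        for k, v in pairs(t2) do
            result[k] = v
        end
    end
    -- ... and the transposed pairs of t1 overwrite them on collision.
    for k, v in pairs(t1) do
        result[v] = k
    end
    return result
end

local s = swapped({ "a", "b" }, { a = "A", c = "C" })
print(s.a, s.b, s.c)
```

The key a exists in both inputs; the transposed entry from the first table wins, so s.a is the index 1 rather than "A".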


table.reversed(table t)

Returns a table with all consecutively indexed entries of t in reverse order. This ignores any non-numerically indexed content of t.

local rev = table.reversed(alpha)
for i=1,#rev do
    io.write(rev[i])
end

table.sequenced(table t, [string separator,] [boolean simple])

Returns a formatted string containing all the pairs in hash table t, sorted alphanumerically by keys and concatenated using separator (default: “ | ”). The third argument, if non-[nil|false], enables two exceptions:

  • for pairs with true as value, the key will be inserted as-is;
  • pairs with the value false, nil, or the empty string will be skipped.

(Cf. serialization.)

local t = {
    e = false, g = true,  h = true,  j = false,
    a = true,  b = false, c = false, d = true,
}

print("[booleans 1]> "..table.sequenced(t    , " * "       ).."\n") -- custom delimiter
print("[booleans 2]> "..table.sequenced(t    , " * " , true).."\n") -- “simple” mode
print("[alpha]     > "..table.sequenced(alpha              ).."\n") -- using defaults
print("[mixed]     > "..table.sequenced(mixed, " >< ", true).."\n") -- simple mode ineffective
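
The effect of the simple flag can be reproduced with a plain-Lua sketch. The function sequenced below is hypothetical; in particular, the key=value output format and the restriction to keys of a single type are assumptions made for illustration.

```lua
-- Hypothetical stand-in for table.sequenced.
local function sequenced(t, separator, simple)
    local keys = { }
    for k in pairs(t) do
        keys[#keys + 1] = k
    end
    table.sort(keys)   -- assumes keys of a single type
    local parts = { }
    for i = 1, #keys do
        local k = keys[i]
        local v = t[k]
        if simple then
            if v == true then
                parts[#parts + 1] = tostring(k)             -- key as-is
            elseif v and v ~= "" then
                parts[#parts + 1] = k .. "=" .. tostring(v)
            end                                              -- else: skipped
        else
            parts[#parts + 1] = k .. "=" .. tostring(v)
        end
    end
    return table.concat(parts, separator or " | ")
end

local t = { a = true, b = false, c = "", d = "x" }
print(sequenced(t))
print(sequenced(t, " * ", true))
```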

table.print(table t, [...])

Equivalent to table.tohandle with print as first argument; see there for a list of all optional arguments.

table.print(mixed)
table.serialize_functions = false
table.print(xml)
table.print(table)

table.sub(table t, [int start,] [int stop])

Returns a slice of array t beginning from start and ending with stop. NB: Flagged as obsolete in the source.

print(table.serialize(table.sub(alpha,        4, 10)))
print(table.serialize(table.sub(alpha_nested, 4,  7)))

table.is_empty(table t)

Returns true if t is [nil|false] or an empty table.

print(table.is_empty(alpha))
print(table.is_empty({ }))
print(table.is_empty(false))
print(table.is_empty(nil))

table.has_one_entry(table t)

Returns true if t is a table with exactly one element (hash or indexed).

print(table.has_one_entry(alpha))
print(table.has_one_entry({"Fear"}))
print(table.has_one_entry({[2] = "Surprise"}))
print(table.has_one_entry({["three"] = "Ruthless Efficiency"}))

Legacy Lua<=5.1

Starting with version 5.2, some changes were introduced in the standard Lua libraries. ConTeXt accommodates these changes in order to ensure backwards compatibility after LuaTeX switches to the next version.

Fallback iterators

Lua up to version 5.1 provided two standard iterators for tables: ipairs() to traverse arrays and pairs() for hashes. ([1], [2]) At the time of this writing, ipairs was expected to disappear with the next minor release, 5.2 (it was ultimately retained in Lua 5.2 and later). So that legacy code keeps working with future versions of LuaTeX, ConTeXt has fallback definitions for both iterators. Their code is equivalent to the definitions given by Roberto in the Lua handbook.

Thus, you may have to rewrite your code to make it work properly with upcoming vanilla Lua interpreters. But your LuaTeX code written for ConTeXt will be guaranteed to continue to work until Hans decides to drop backward compatibility at some point far into the future.

for n, elm in ipairs(testarray) do
    if n % 10 == 0 then io.write(elm .. " -- ") end 
end

This example will continue to work, but keep in mind that ipairs() by definition involves many function calls and will consequently lose out to other iteration methods in terms of performance. Apart from the next() iterator, the fastest way to traverse enumerable (nil-terminated) arrays is the numeric for … do … end loop with the table length cached locally; it should be preferred over ipairs() under most circumstances.

do
    local max = #testarray
    for n=1, max do
        local elm = testarray[n]
        if n % 10 == 0 then io.write(elm .. " -- ") end
    end
end

unpack()

As of Lua 5.2, the handy function unpack() is part of the table library (table.unpack). However, ConTeXt conveniently makes it accessible in the global environment as well (and, until the library update has taken place, it creates table.unpack() too, so new-style code can already be written prior to the actual transition).

That said, the following snippet will work out of the box with context/mtxrun but not with Lua v.5.1:

local array = {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m"}
print (unpack(array))
print (table.unpack(array))
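
Code that also has to run under a plain Lua interpreter, whatever its version, can pick whichever variant exists. This is a common idiom and independent of ConTeXt:

```lua
-- Use table.unpack where it exists (Lua 5.2+), else the global unpack (5.1).
local unpack = table.unpack or unpack

local array = { "a", "b", "c" }
print(unpack(array))

-- select("#", ...) counts the values that were actually returned.
local count = select("#", unpack(array))
print(count)
```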

Further information