README
Version 1.1, March 2008

This document describes the reconstructed Chinese etymology of Lojban, which is available in the original plain text format and in an HTML format generated from it by a conversion script. In some cases, the reconstructed Chinese source word is dubious or missing; these cases are marked by a comment containing the string “FIXIT”.

  1. Copying conditions
  2. Format of the HTML version
  3. Format of the plain text version
  4. Reconstruction procedure
  5. Etymological sources

Copying conditions

The “Chinese etymology of Lojban” in plain text and in HTML format as well as the conversion script were prepared by mublin in March 2008. The content of these three files is hereby placed irrevocably in the public domain.

Unless specified otherwise, English translations of Chinese source words are from the CEDICT dictionary, which is in the public domain. The official gismu list, prepared by The Logical Language Group, Inc., is in the public domain.


Format of the HTML version

In the generated HTML version, each gismu is given in bold, followed by the English keyword and the Lojbanised source word on one line.

On the next line, the reconstructed Chinese source word is given in traditional script, simplified script, pinyin transcription, and English translation.

If present, a comment starts on a new line in smaller font.
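
The layout described above can be sketched in Python as follows. This is only an illustration, not the actual conversion script: the function name render_entry and the choice of the p, b, br and small elements are assumptions, since the script's real output markup is not documented here.

```python
from html import escape
from typing import Optional


def render_entry(gismu: str, keyword: str, lojbanised: str,
                 traditional: str, simplified: Optional[str],
                 pinyin: str, translation: str,
                 comment: Optional[str] = None) -> str:
    """Render one entry as an HTML fragment (hypothetical markup)."""
    # First line: gismu in bold, then English keyword and Lojbanised source word.
    first = f"<b>{escape(gismu)}</b> {escape(keyword)} {escape(lojbanised)}"
    # Second line: the Chinese source word; empty parts are skipped.
    parts = (traditional, simplified, pinyin, translation)
    second = " ".join(escape(p) for p in parts if p)
    html = f"<p>{first}<br>\n{second}"
    # Optional comment on a new line, in a smaller font.
    if comment:
        html += f"<br>\n<small>{escape(comment)}</small>"
    return html + "</p>"
```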


Format of the plain text version

The plain text file is encoded in UTF-8 with UNIX style line breaks. Each gismu has one line with TAB-separated fields, in the following format:

  1. gismu
  2. English keyword
  3. Chinese source word in Lojbanised form
  4. Chinese source word in traditional script
  5. Chinese source word in simplified script (optional)
  6. pinyin transcription
  7. English translation
  8. comment (optional)
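
A minimal Python sketch of reading one such line is given below. It assumes that an optional field in the middle of a line is left empty rather than dropped, so that the remaining fields keep their positions, and that a missing comment may simply be absent at the end of the line; neither assumption has been checked against the actual file. The names Entry and parse_line are hypothetical.

```python
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    gismu: str
    keyword: str
    lojbanised: str
    traditional: str
    simplified: Optional[str]   # optional field 5
    pinyin: str
    translation: str
    comment: Optional[str]      # optional field 8


def parse_line(line: str) -> Entry:
    """Split one TAB-separated line into its eight fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) > 8:
        raise ValueError(f"expected at most 8 fields, got {len(fields)}")
    # Assumption: trailing optional fields may be missing; pad them.
    fields += [""] * (8 - len(fields))
    g, kw, loj, trad, simp, pin, trans, comment = fields
    # Empty optional fields are reported as None.
    return Entry(g, kw, loj, trad, simp or None, pin, trans, comment or None)
```

The file itself should be opened with encoding="utf-8", for example open(path, encoding="utf-8", newline="\n"), to match the encoding and line breaks stated above.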

A small number of gismu have two Lojbanised forms; each of these forms gets its own line. These gismu are “torni”, “mamta”, “lenjo”, “dansu”, “detri”, “datka”, “xalni”, “minra”, and “kantu”.

The etymology does not include the cultural gismu, the gismu “broda”, “brode”, “brodi”, “brodo”, and “brodu”, which have been constructed from “bridi”, or other gismu which have not been generated from the six source languages.

The following conventions are used inside the comment field:

FIXIT ...
  needs review for the given reason
FIXIT missing
  the source word could not be reconstructed
FIXIT dubious
  the source word could be reconstructed, but may be wrong
FIXIT correct transcription “...”
  the source word does not exactly match the Lojbanised form; the correct Lojbanisation for the source word is specified (tagged as FIXIT because this may also be due to wrong reconstruction)
cf. ... (...) [...] “...”
  Chinese word with simplified form, pinyin, and translation
cf. ... (...) [...] “...” (compound)
  Chinese compound word including the source word
cf. ... (...) [...] “...” (component)
  Chinese word included in the compound source word
cf. ... (...) [...] “...” (variant)
  another source word candidate, less frequently used or with a meaning more distant from the gismu definition
source: ...
  main translation from an authority other than CEDICT
“...” (source: ...)
  additional translation from an authority other than CEDICT
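
As an illustration of the FIXIT conventions, the following Python sketch classifies the FIXIT part of a comment. It assumes that the marker may appear anywhere in the comment field and that everything after it belongs to the FIXIT note; how several notes are separated within one comment is not specified here, so the sketch does not try to split them. The function name classify_fixit and the returned labels are hypothetical.

```python
import re
from typing import Optional, Tuple

# Accepts curly or straight double quotes around the corrected transcription.
_TRANSCRIPTION = re.compile(r'correct transcription\s+[“"]([^”"]*)[”"]')


def classify_fixit(comment: str) -> Optional[Tuple[str, str]]:
    """Return (kind, detail) for a FIXIT note, or None if there is none."""
    pos = comment.find("FIXIT")
    if pos < 0:
        return None
    rest = comment[pos + len("FIXIT"):].strip()
    if rest.startswith("missing"):
        return ("missing", "")
    if rest.startswith("dubious"):
        return ("dubious", "")
    match = _TRANSCRIPTION.match(rest)
    if match:
        # detail is the corrected Lojbanisation given in quotes
        return ("correct transcription", match.group(1))
    # Generic form "FIXIT ...": detail is the stated reason.
    return ("other", rest)
```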

Translations from chinaboard.de are given in the original German; an English translation is given in parentheses: “GERMAN-1; GERMAN-2; ... (= ENGLISH-1; ENGLISH-2; ...)”
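
A translation in this form can be split into its German and English parts as sketched below. The function name split_gloss is hypothetical, and the sketch assumes that the English part is always the final parenthesis introduced by “(=”.

```python
import re
from typing import List, Tuple

# German glosses, then "(= English glosses)" at the end of the string.
_GLOSS = re.compile(r"^(?P<german>.*?)\s*\(=\s*(?P<english>.*)\)\s*$")


def split_gloss(text: str) -> Tuple[List[str], List[str]]:
    """Split 'G1; G2 (= E1; E2)' into ([G1, G2], [E1, E2])."""
    match = _GLOSS.match(text)
    if match is None:
        raise ValueError(f"not in 'GERMAN (= ENGLISH)' form: {text!r}")
    german = [g.strip() for g in match.group("german").split(";") if g.strip()]
    english = [e.strip() for e in match.group("english").split(";") if e.strip()]
    return german, english
```

For a made-up input such as "Haus; Gebäude (= house; building)", which is not taken from the etymology file, the function returns the two lists ["Haus", "Gebäude"] and ["house", "building"].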

Compound words are indicated where the source word, taken on its own, has too broad or too distant a meaning, but clearly matches the gismu as a component of the compound word.

The “component” category is also used where both the source word and the additional Chinese word have a common component.

The “variant” category includes mere character variants having identical pronunciation and meaning, distinct words matching the same Lojbanised source word possibly with minor differences in pronunciation and meaning, as well as variant pronunciations of the same character matching the same Lojbanised source word.


Reconstruction procedure

The list of gismu with English keyword and the Chinese source words in Lojbanised form was obtained from the gismu etymology file. This file lists source language words in a Lojbanised form: in ASCII, without inflectional endings, and with affricates reduced to simple spirants. A few other rules apply as well, some of them source-language specific.

Each English keyword was first looked up in the CEDICT dictionary at mandarintools.com. This produced between zero and about twenty Chinese candidate words for each Lojbanised source word. Each Chinese word was given in traditional script, simplified script, pinyin transcription, and English translation.

Each of the candidates was checked manually, matching the Lojbanised source word against the pinyin transcription, and the English translation of the Chinese candidate against the English keyword for the gismu as well as the official gismu definition.

Where no matches could be found, or the match was dubious, the additional dictionaries at zhongwen.com, chinaboard.de and en.wiktionary.org were consulted. Where multiple matches remained, one of these was chosen both by semantic closeness to the gismu definition and by frequency of use as indicated by a web search engine; any eliminated match was reported in the comment field.


Etymological sources

The most important etymological source for Lojban is the list of gismu with Lojbanised source words and scores. The format of this file is described in detail in this message to the Lojban mailing list and the file etysample.txt on the Lojban server. Additional information can be found at the Lojban Etymology wiki page, on the Lojban file server and in this directory on the Lojban server.

The gismu generation process is described in more detail in “What is Lojban?”, ch. 4, sec. 17, and in the “Reference Grammar”, ch. 4, sec. 14.

The gismu “mleca” (less) is listed as “ckamu” in the original etymology file; it was changed in 1990 according to the etymology file itself. Similarly, the gismu “donri” (daytime) is listed as “dinri”; it was changed in 1993 as reported by the minutes of the LLG. Both gismu are listed in the newer form here.

The following gismu are missing from the gismu etymology file: “gocti” (yocto), “gotro” (yotta), “zepti” (zepto), “zetro” (zetta), “slovo” (Slavic), and “vukro” (Ukrainian); the latter two were added in 1993 as reported by the minutes of the LLG. The gismu “mexco” (Mexican) was changed to “mexno” (see this message to the Lojban list). None of these gismu were generated from the six source languages, so this does not affect the Chinese etymology.

The correspondence between Lojban gismu and TLI Loglan, which is also of etymological interest, is described in detail in the file oldlog.txt on the Lojban server.