ŋaren crîþa 9 vlefto: Ŋarâþ Crîþ v9

Orthography and phonology (Elaine)

The phonology and orthography of Ŋarâþ Crîþ can be divided into eight layers in two modes (writing and speaking):

The conversions from 0 to 1, 1 to 2w, and 2s to 3s are functional: each valid input corresponds to exactly one output. The conversion from 1 to 2s is almost so, except when a ⟨&⟩ is present. In the opposite direction, the conversions from 4w to 3w, from 3w to 2w*, and from 2w* to 2w are functional. Furthermore, for any conversion, it can be determined whether a given input can be converted into a given output without external information.

In addition, the conversion between layers 1 and 2w is bijective: valid layer-1 and layer-2w representations correspond one-to-one.

Layers 1 and 2w: Cenvos and its romanization

Rather than starting at layer 0, we start at layers 1 and 2w.

Cenvos, the native script of Ŋarâþ Crîþ, is written from right to left. This script can be analyzed on two levels: graphemes, which constitute the abstract level, and glyphs, which are the characters actually written. For instance, Cenvos has one grapheme romanized as ⟨c⟩ that corresponds to two different glyphs: the non-final form 𐲀𐲢 (denoted as ²⟨c⟩) and the final form 𐲀 (²⟨c$⟩). As another example, the sequence 𐲌𐲁 (⟨me⟩ = ²⟨me⟩) consists of one glyph but two graphemes.

In this grammar, we primarily use the romanization, whose symbols largely map one-to-one onto Cenvos graphemes. Cenvos has four kinds of graphemes:

Of course, there is also the space.

Table 1: The graphemes of Ŋarâþ Crîþ: true letters, and final forms and ligatures (layer 2w). (The columns are read from left to right.)

The letters ⟨w⟩, ⟨x⟩, ⟨y⟩, and ⟨z⟩ are USR letters. These are used in foreign languages written in Cenvos to represent phonemes that are not approximated by the phonology of Ŋarâþ Crîþ. Each foreign orthography is free to assign them as it pleases.

Cenvos has two graphemes that change form at the end of the word: ⟨c⟩ and ⟨ŋ⟩, as well as several ligatures. We do not distinguish these forms in the romanization.

The marker ⟨*⟩ is used for foreign words, such as loanwords and foreign names. ⟨#⟩ is used to prefix given names. ⟨+⟩ is used to prefix surnames passed by native conventions (i.e. from parent to child within the same gender); ⟨+*⟩ marks a surname passed using non-native conventions. Place names are prefixed with ⟨@⟩. ⟨#⟩, ⟨+⟩, ⟨+*⟩, and ⟨@⟩ can all be used with ⟨*⟩, in which case ⟨*⟩ occurs first. Note that ⟨+*⟩ is a single letter of its own and not a ligature.

At the start of a word, ⟨&⟩ indicates reduplication of an unspecified prefix of the rest of the word. For instance, ⟨&cên⟩ can be pronounced as if it were ⟨cêcên⟩ or ⟨cêncên⟩. (⟨&⟩ occurs after all other markers in this case.) This usage is not productive in standard Ŋarâþ Crîþ, but it appears in a few words, as well as in some idiosyncratic cases. At the middle or the end of a word, or alone, it indicates ellipsis of part or all of the word, most often to abbreviate or censor a word. Lastly, ⟨&{}⟩ is used similarly to the ellipsis in Western punctuation.

Markers can be applied to multi-word strings by surrounding the string with the delimiters ⟨{}⟩. In legal language, ⟨{}⟩ are also used around phrases to resolve ambiguities.

The sentence punctuation ⟨.⟩, ⟨?⟩, and ⟨!⟩ are used as expected. ⟨;⟩ is used to separate two independent clause phrases within the same sentence. The quotation marks, ⟨«»⟩, are used around quotations, direct or indirect. A ⟨.⟩ at the end of a quotation embedded within another sentence is omitted.

⟨’⟩ is used to separate clitics from the rest of the word to which they are attached. ⟨·⟩ indicates lenition; it could be described as a “letter modifier”. It is also used as a decimal point: officially, it is used after the most significant digit of an inexact numeral when written with digits, but it is also used unofficially to write non-integers.

⟨/⟩, as its derivation from ⟨i⟩ suggests, is used to separate the number of mjari from the number of edva when writing currency amounts.

The morpheme boundary marker, ⟨-⟩, is sometimes used metalinguistically to mark a morpheme boundary, but it is not strictly a part of layer 1.

Spaces are placed in the following positions:

[TODO: cover mentions of letters within the language, corresponding to v7 p17 “When letters or markers are referred to, … but the effects on other glyphs are not standardized”]

Digits are interchangeable with short-form numerals, but not with long-form numerals. They are also written right-to-left in Cenvos, with the most significant digit first: 𐲲𐲺𐲳 is 0x2A3 = 675.
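Since the example reads 𐲲𐲺𐲳 as the hexadecimal number 0x2A3, converting between an integer and its digit string is ordinary base-16 positional notation, taken most significant digit first. The following is a minimal Python sketch, assuming the romanized digits are written 0 to 9 and A to F (as in numquote labels such as ⟨B{}⟩); the function names are illustrative only.

```python
DIGITS = "0123456789ABCDEF"  # assumed romanized digit symbols, in value order


def to_digits(n: int) -> str:
    """Return the digit string of a non-negative integer, most significant digit first."""
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    out = []
    while True:
        n, r = divmod(n, 16)
        out.append(DIGITS[r])
        if n == 0:
            break
    return "".join(reversed(out))


def from_digits(s: str) -> int:
    """Inverse of to_digits: each further digit multiplies the running value by 16."""
    value = 0
    for ch in s:
        value = value * 16 + DIGITS.index(ch)  # ValueError for a non-digit
    return value


print(to_digits(675))      # 2A3
print(from_digits("2A3"))  # 675
```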

Table 2: The digits of Ŋarâþ Crîþ. (The columns are read from left to right.)

Letter numbering

Sometimes, an integer must be assigned to each letter. In this case, the assignment shown in the table below is used. Note that numbers are not assigned fully sequentially. Furthermore, this function is valid only for layer 1 graphemes.

Table 3: Letter numbering in Ŋarâþ Crîþ (true letters). (The columns are read from left to right.)

The letter sum of a word is the sum of the numbers assigned to all of its letters. This value is used in some of the noun declension paradigms.

It is theorized that letter numbers were assigned in the following manner:


The true letters and the markers are collated in their respective order, except for ⟨&⟩, which is ignored. Lenited letters are treated as their respective base letters, except when two words differ only by the presence or absence of a lenition mark, in which case the lenited variant is collated after the base letter: ⟨saga⟩ < ⟨sag·a⟩ < ⟨sada⟩ < ⟨saħa⟩. Numerals are collated after all letters.
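The lenition tie-break can be expressed as a two-part sort key: first the sequence of base letters, then the sequence of lenition flags, so that lenition only decides the order when the base letters are identical. This is a minimal Python sketch, not a normative algorithm. The letter order below is an assumption, consistent with the labeling sequence given later in this section but to be checked against Table 1; markers and numerals are not handled.

```python
import unicodedata

# Assumed order of the true letters (check against Table 1 before relying on it).
LETTER_ORDER = "cenŋvosþšrlłmafgptčîjidðhħêôâuwxyz"
RANK = {letter: i for i, letter in enumerate(LETTER_ORDER)}
LENITION_MARK = "\u00b7"  # middle dot


def collation_key(word: str) -> tuple:
    """Sort key: base letters first, lenition flags only as a tie-breaker."""
    word = unicodedata.normalize("NFC", word)
    bases, flags = [], []
    for ch in word:
        if ch == "&":
            continue  # ⟨&⟩ is ignored in collation
        if ch == LENITION_MARK:
            if not flags:
                raise ValueError(f"lenition mark with no preceding letter in {word!r}")
            flags[-1] = 1  # the preceding letter is lenited
            continue
        bases.append(RANK[ch])  # KeyError for graphemes this sketch does not handle
        flags.append(0)
    return (tuple(bases), tuple(flags))


words = ["saħa", "sada", "sag\u00b7a", "saga"]
print(sorted(words, key=collation_key))  # ['saga', 'sag·a', 'sada', 'saħa']
```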

In a directory of personal names, entries are collated on surnames, with given names considered only when surnames are identical. Headings in such a list include the prefix up to and including the first true letter: ⟨+merlan #flirora⟩ would be found under ⟨+m⟩.
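Extracting a directory heading only requires scanning up to the first true letter. In this Python sketch, as a simplification, any alphabetic character counts as a true letter, so that markers such as ⟨+⟩, ⟨+*⟩, ⟨#⟩, ⟨*⟩, and ⟨@⟩ remain part of the heading.

```python
def directory_heading(entry: str) -> str:
    """Return the prefix of an entry up to and including its first true letter."""
    for i, ch in enumerate(entry):
        # Simplification: alphabetic characters stand in for true letters;
        # marker characters are not alphabetic, so they stay in the prefix.
        if ch.isalpha():
            return entry[: i + 1]
    raise ValueError(f"no true letter in {entry!r}")


print(directory_heading("+merlan #flirora"))  # +m
```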

Ordered items can be labeled using numerals (starting from 0) or letters. In the latter case, only the letters ⟨c e n v o s r l m a f g p t î i d h⟩ are used.


A digit immediately preceding text surrounded by quotation or grouping marks constitutes a numquote. The digit is usually not pronounced in this case. Numquotes are mainly used for secondary purposes that lack any dedicated punctuation.

⟨B{}⟩: Contains parenthetical (supplementary) information. The sentence should remain grammatical without the parenthetical content.
⟨1{}⟩: Lists an alias of a referent mentioned by name.
⟨2{}⟩: Surrounds a key-value list, used as follows: ⟨2{3{&{}} 4{&{}} 3{&{}} 4{&{}}}⟩.
⟨3{}⟩: Lists a key inside ⟨2{}⟩.
⟨4{}⟩: Lists a value inside ⟨2{}⟩. When not directly inside a ⟨2{}⟩ numquote, it marks a list: elements are delimited by spaces, and ⟨{}⟩ can be used to insert multi-word elements.
⟨9{}⟩: Contains abbreviated quantities in the traditional currency system.
⟨*9{}⟩: Contains abbreviated quantities in a currency system other than the traditional one.
Table 4: Numquotes in Ŋarâþ Crîþ.
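The nesting of ⟨3{}⟩ and ⟨4{}⟩ inside ⟨2{}⟩ can be made concrete with a small parser. This is a minimal Python sketch, not a specified algorithm. It assumes romanized digits 0 to 9 and A to F, treats a digit (optionally preceded by ⟨*⟩) written directly before ⟨{⟩ as a numquote tag, and ignores quotation marks ⟨«»⟩. The words in the usage example are placeholders.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# A numquote tag: one digit, optionally preceded by ⟨*⟩ as in ⟨*9{}⟩.
TAG = re.compile(r"\*?[0-9A-F]")


@dataclass
class Group:
    tag: str | None  # the numquote digit, or None for a bare ⟨{}⟩ group
    children: list = field(default_factory=list)  # words (str) and nested Groups


def parse(text: str) -> Group:
    """Parse space-separated words and ⟨{}⟩ groups into a tree."""
    stack = [Group(None)]
    word = ""

    def flush() -> None:
        nonlocal word
        if word:
            stack[-1].children.append(word)
            word = ""

    for ch in text:
        if ch == "{":
            if TAG.fullmatch(word):
                tag, word = word, ""  # the digit immediately precedes the brace
            else:
                tag = None
                flush()
            stack.append(Group(tag))
        elif ch == "}":
            flush()
            if len(stack) == 1:
                raise ValueError("unbalanced '}'")
            closed = stack.pop()
            stack[-1].children.append(closed)
        elif ch.isspace():
            flush()
        else:
            word += ch
    flush()
    if len(stack) != 1:
        raise ValueError("unclosed '{'")
    return stack[0]


def text_of(group: Group) -> str:
    return " ".join(c if isinstance(c, str) else text_of(c) for c in group.children)


def key_values(group: Group) -> list[tuple[str, str]]:
    """Pair the ⟨3{}⟩ keys with the ⟨4{}⟩ values directly inside a ⟨2{}⟩ numquote."""
    if group.tag != "2":
        raise ValueError("not a 2{} numquote")
    items = [c for c in group.children if isinstance(c, Group)]
    pairs = list(zip(items[0::2], items[1::2]))
    if len(items) % 2 or any((k.tag, v.tag) != ("3", "4") for k, v in pairs):
        raise ValueError("expected alternating 3{} and 4{} items")
    return [(text_of(k), text_of(v)) for k, v in pairs]


tree = parse("2{3{alpha} 4{one} 3{beta} 4{two words}}")
print(key_values(tree.children[0]))  # [('alpha', 'one'), ('beta', 'two words')]
```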

Layer 0

The phonotactics of Ŋarâþ Crîþ can be expressed in terms of a state machine with five states: s (syllabic), g (glide), o (onset), n (nuclear), and ω (terminal). Each transition defined in the state machine has a set of accepted payloads.

Figure 1: The finite state machine describing the phonotactics of Ŋarâþ Crîþ.

A word, or rather the phonotactically relevant part thereof, starts in the syllabic state and ends in the terminal state.

The payloads associated with a transition are strings of manifested grapheme phrases. A manifested grapheme phrase is one of the following: a true letter not followed by a lenition mark (plain letter); any of ⟦p t d č c g m f v ð⟧ followed by a lenition mark (lenited letter); or, word-initially, one of the digraphs ⟦mp vp dt nd gc ŋg vf ðþ lł⟧ (eclipsed letter). All other graphemes are ignored for the purposes of phonotactics.

A manifested grapheme phrase has a base letter. The base letter of a plain letter is itself. The base letter of a lenited letter is the letter without the lenition mark. The base letter of an eclipsed letter is the second letter of the digraph.
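The definitions above amount to a small tokenizer. This is a minimal Python sketch, assuming NFC-normalized romanization with the lenition mark written as U+00B7 MIDDLE DOT. The letter inventory is an assumption to be checked against Table 1, and ⟨&⟩, markers, and punctuation are simply dropped, as they do not take part in phonotactics.

```python
from __future__ import annotations

import unicodedata
from typing import NamedTuple

# Assumed inventory of true letters in the romanization (check against Table 1).
TRUE_LETTERS = set("cenŋvosþšrlłmafgptčîjidðhħêôâuwxyz")
LENITABLE = set("ptdčcgmfvð")
ECLIPSED = {"mp", "vp", "dt", "nd", "gc", "ŋg", "vf", "ðþ", "lł"}
LENITION_MARK = "\u00b7"


class Phrase(NamedTuple):
    kind: str  # "plain", "lenited", or "eclipsed"
    text: str
    base: str


def manifested_phrases(word: str) -> list[Phrase]:
    """Split one word into manifested grapheme phrases, ignoring other graphemes."""
    word = unicodedata.normalize("NFC", word.lower())
    chars = [ch for ch in word if ch in TRUE_LETTERS or ch == LENITION_MARK]
    phrases: list[Phrase] = []
    i = 0
    # Eclipsed letters occur only word-initially; the base is the digraph's second letter.
    if len(chars) >= 2 and chars[0] + chars[1] in ECLIPSED:
        phrases.append(Phrase("eclipsed", chars[0] + chars[1], chars[1]))
        i = 2
    while i < len(chars):
        ch = chars[i]
        if ch == LENITION_MARK:
            raise ValueError(f"misplaced lenition mark in {word!r}")
        if i + 1 < len(chars) and chars[i + 1] == LENITION_MARK:
            if ch not in LENITABLE:
                raise ValueError(f"{ch!r} cannot be lenited")
            phrases.append(Phrase("lenited", ch + LENITION_MARK, ch))
            i += 2
        else:
            phrases.append(Phrase("plain", ch, ch))
            i += 1
    return phrases


print([p.text for p in manifested_phrases("gcrîþ")])  # ['gc', 'r', 'î', 'þ']
```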

A vowel is any of ⟦e o a î i ê ô â u⟧. ⟦j⟧ is a semivowel. All other manifested grapheme phrases are consonants.

An effective plosive is a manifested grapheme phrase whose base letter is any of ⟦p t d c g⟧. An effective fricative is a manifested grapheme phrase whose base letter is any of ⟦f v þ ð s š h ħ⟧.

A hatted vowel is one of ⟦î ê ô â⟧. All other vowels are unhatted vowels.
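Given a manifested grapheme phrase, each of these categories depends only on its base letter. This Python sketch takes a single phrase written in romanization (for example ⟨c·⟩ or ⟨vf⟩) and assumes that it is already a valid phrase; the category labels it returns are illustrative names.

```python
VOWELS = set("eoaîiêôâu")
HATTED_VOWELS = set("îêôâ")
PLOSIVE_BASES = set("ptdcg")
FRICATIVE_BASES = set("fvþðsšhħ")
LENITION_MARK = "\u00b7"


def base_letter(phrase: str) -> str:
    """Base letter of a plain letter, a lenited letter (X·), or an eclipsed digraph."""
    if phrase.endswith(LENITION_MARK):
        return phrase[0]
    if len(phrase) == 2:
        return phrase[1]  # eclipsed digraph: the second letter is the base
    return phrase


def classify(phrase: str) -> str:
    base = base_letter(phrase)
    if base in VOWELS:
        return "hatted vowel" if base in HATTED_VOWELS else "unhatted vowel"
    if base == "j":
        return "semivowel"
    if base in PLOSIVE_BASES:
        return "effective plosive"
    if base in FRICATIVE_BASES:
        return "effective fricative"
    return "other consonant"


print(classify("vf"))       # effective fricative
print(classify("m\u00b7"))  # other consonant
```

Note that ⟦m·⟧ falls into neither the plosive nor the fricative class under these definitions, even though it is pronounced as a fricative.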

An initial is the beginning of a syllable and consists of one of the following:

The set of valid initials is denoted by the capital Greek letter iota, Ι.

A medial may either be empty or ⟦j⟧. The set of medials is denoted by the capital Greek letter mu, Μ.

The set of vowels is denoted by the capital Greek letter nu, Ν.

A coda is either a simple coda or a complex coda. A simple coda is one of ⟦s r n þ rþ l t c f cþ⟧ or nothing at all. A complex coda is one of ⟦st lt nt ns ls nþ lþ m⟧, with ⟦-m⟧ used only in a handful of function words. The set of all simple codas is denoted by the capital Greek letter kappa, Κ, and the set of all simple or complex codas is denoted by the capital Greek letter omega, Ω.

An onset is an initial plus a medial. A bridge is the coda of one syllable plus the onset of the following syllable.


Valid morphemes have additional criteria that they must satisfy:

A bridge is canonical if it follows the maximal-onset principle; that is, if the onset has the maximal number of consonants for the given sequence of manifested grapheme phrases. For instance, ⟦-n-t-⟧ and ⟦-r-þl-⟧ are canonical, but ⟦-c-þ-⟧ and ⟦-rþ-l-⟧ are not (as they can be regrouped as ⟦-∅-cþ-⟧ and ⟦-r-þl-⟧).
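Canonicalization can be sketched as trying every split of the consonant cluster, shortest coda first, and keeping the first split whose coda is in Ω and whose initial is in Ι. In this Python sketch the coda set is taken from the definitions above, but the initial set is a hypothetical subset (just large enough for the examples), since the full list of initials is not reproduced here; medial ⟦j⟧ and lenition are ignored.

```python
from __future__ import annotations


def canonicalize(cluster: str, codas: set[str], initials: set[str]) -> tuple[str, str]:
    """Split a consonant cluster into (coda, initial) with the longest valid initial."""
    # The shortest coda is tried first, so the first valid split maximizes the initial.
    for split in range(len(cluster) + 1):
        coda, initial = cluster[:split], cluster[split:]
        if coda in codas and initial in initials:
            return coda, initial
    raise ValueError(f"no valid split for {cluster!r}")


# Ω: all simple and complex codas, including the empty coda.
CODAS = {"", "s", "r", "n", "þ", "rþ", "l", "t", "c", "f", "cþ",
         "st", "lt", "nt", "ns", "ls", "nþ", "lþ", "m"}
# Hypothetical subset of Ι for illustration only; not the real list of initials.
INITIALS = {"", "t", "þ", "l", "c", "cþ", "þl"}

print(canonicalize("nt", CODAS, INITIALS))   # ('n', 't')
print(canonicalize("rþl", CODAS, INITIALS))  # ('r', 'þl')
print(canonicalize("cþ", CODAS, INITIALS))   # ('', 'cþ')
```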

A bridge is valid if it can arise as the result of repairing a canonical bridge. Bridge repair is intended to change a bridge that is awkward to pronounce into one that is less so. It has the following properties:

  1. Lenition in the onset does not affect whether bridge repair preserves the bridge.
  2. The presence of ⟦j⟧ in the onset has no influence on bridge repair.
  3. All bridges with a coda that is null, ⟦-r⟧, or ⟦-l⟧ are unaffected by bridge repair.
  4. If a bridge with a complex initial I is not changed by bridge repair, then the bridge with an initial containing only the first consonant of I is also unchanged.

Importantly, bridge repair is not idempotent: ⟦-sð-⟧ is repaired to ⟦-ss-⟧, but ⟦-ss-⟧ is repaired to ⟦-þ-⟧. In addition, bridge repair might yield the pseudo-coda ⟦-ŋ⟧, which changes the preceding medial and vowel.

The following subsections describe the rules for bridge repair. A bridge that is modified by one rule might be further changed by later rules.

Coalescence of ⟦-tš-⟧

The bridge ⟦-tš-⟧ is changed to ⟦-č-⟧.

Fortition of ⟦h-⟧ and ⟦ħ-⟧

The onset ⟦h-⟧ is fortited to ⟦c-⟧ after ⟦-s⟧, ⟦-þ⟧, ⟦-rþ⟧, ⟦-t⟧, ⟦-c⟧, ⟦-f⟧, or ⟦-cþ⟧. ⟦hr-⟧ and ⟦hl-⟧ are fortited analogously.

The onset ⟦ħ-⟧ is fortited to ⟦g-⟧ after ⟦-t⟧, ⟦-c⟧, and ⟦-f⟧. ⟦ħr-⟧ and ⟦ħl-⟧ are fortited analogously.

Metathesis of ⟦t⟧ before ⟦c⟧ or ⟦g⟧

⟦-tc-⟧ and ⟦-tg-⟧ are metathesized to ⟦-ct-⟧ and ⟦-cd-⟧, respectively. Likewise, ⟦-tcr-⟧, ⟦-tcl-⟧, ⟦-tgr-⟧, and ⟦-tgl-⟧ are metathesized to ⟦-ctr-⟧, ⟦-ctl-⟧, ⟦-cdr-⟧, and ⟦-cdl-⟧.

Similar bridges with lenited onsets, such as ⟦-tc·-⟧ and ⟦-tg·r-⟧, are treated analogously, with the resulting onset remaining lenited.

⟦t⟧ is deleted before ⟦cf⟧, ⟦cþ⟧, ⟦cs⟧, ⟦cš⟧, ⟦gv⟧, and ⟦gð⟧, devoicing the last two of these.
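Because each rule sees the output of the previous one, the first three rules can be chained as functions on a (coda, onset) pair. This Python sketch covers only unlenited onsets without ⟦j⟧, and it does not model the ⟦t⟧-deletion subrule or any of the later rules; it illustrates the rule ordering rather than being a complete bridge-repair implementation.

```python
from __future__ import annotations

# Codas that trigger fortition of ⟦h-⟧ and ⟦ħ-⟧ onsets, per the rules above.
H_FORTITION_CODAS = {"s", "þ", "rþ", "t", "c", "f", "cþ"}
HBAR_FORTITION_CODAS = {"t", "c", "f"}
# Metathesis after ⟦-t⟧: onset ⟦c⟧ → ⟦t⟧ and ⟦g⟧ → ⟦d⟧, with the coda becoming ⟦-c⟧.
METATHESIS = {"c": "t", "g": "d"}


def coalesce_tsh(coda: str, onset: str) -> tuple[str, str]:
    """⟦-tš-⟧ → ⟦-č-⟧."""
    if (coda, onset) == ("t", "š"):
        return "", "č"
    return coda, onset


def fortite(coda: str, onset: str) -> tuple[str, str]:
    """⟦h- hr- hl-⟧ → ⟦c- cr- cl-⟧ and ⟦ħ- ħr- ħl-⟧ → ⟦g- gr- gl-⟧ after the listed codas."""
    if onset in {"h", "hr", "hl"} and coda in H_FORTITION_CODAS:
        return coda, "c" + onset[1:]
    if onset in {"ħ", "ħr", "ħl"} and coda in HBAR_FORTITION_CODAS:
        return coda, "g" + onset[1:]
    return coda, onset


def metathesize(coda: str, onset: str) -> tuple[str, str]:
    """⟦-tc(r/l)-⟧ → ⟦-ct(r/l)-⟧ and ⟦-tg(r/l)-⟧ → ⟦-cd(r/l)-⟧ (unlenited onsets only)."""
    if coda == "t" and onset in {"c", "cr", "cl", "g", "gr", "gl"}:
        return "c", METATHESIS[onset[0]] + onset[1:]
    return coda, onset


RULES = (coalesce_tsh, fortite, metathesize)  # applied in this order


def repair(coda: str, onset: str) -> tuple[str, str]:
    """Apply each rule to the output of the previous one."""
    for rule in RULES:
        coda, onset = rule(coda, onset)
    return coda, onset


print(repair("t", "h"))   # ('c', 't'): fortited to ⟦-tc-⟧, then metathesized
print(repair("t", "ħr"))  # ('c', 'dr'): fortited to ⟦-tgr-⟧, then metathesized
```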

Nasal assimilation

For these rules, ⟦m·⟧ is counted as a nasal, even though it is pronounced as a fricative.

⟦-t⟧ before a nasal onset assimilates to ⟦-n⟧.

⟦-c⟧ before a nasal onset assimilates to the pseudo-coda ⟦-ŋ⟧. As a special case, ⟦-cŋ-⟧ is repaired to ⟦-ŋ-⟧ instead.

Denasalization of ⟦ŋ-⟧

After ⟦-s⟧, ⟦-þ⟧, ⟦-rþ⟧, ⟦-f⟧, and ⟦-cþ⟧, ⟦ŋ-⟧ is denasalized to ⟦g-⟧.

Devoicing of ⟦v-⟧ and ⟦ð-⟧

After ⟦-þ⟧, ⟦-rþ⟧, ⟦-t⟧, ⟦-c⟧, ⟦-f⟧, and ⟦-cþ⟧, ⟦v-⟧ devoices to ⟦f-⟧ and ⟦ð-⟧ devoices to ⟦þ-⟧. Additionally, ⟦ð-⟧ is devoiced after ⟦-s⟧.

This process occurs analogously for the onsets ⟦vr-⟧, ⟦vl-⟧, ⟦ðr-⟧, and ⟦ðl-⟧, except that ⟦-þ⟧ is deleted before ⟦vr-⟧ and ⟦vl-⟧ instead. Additionally, ⟦-rþCR-⟧ bridges (with R = ⟦r⟧ or ⟦l⟧ and C = ⟦v⟧ or ⟦ð⟧) are corrected to ⟦-RC-⟧.

As usual, similar rules apply to lenited onsets: ⟦v·⟧ devoices to ⟦f·⟧, and ⟦ð·⟧ is replaced with a copy of the preceding consonant.

Assimilation of ⟦s⟧ after ⟦þ⟧

After a ⟦þ⟧, ⟦s⟧ is replaced with ⟦þ⟧. Additionally, ⟦ss⟧ is coalesced into ⟦þ⟧, unless it is not followed by a consonant and the latter ⟦s⟧ arose from a ⟦ð⟧ in the previous step.

Degemination before another consonant

⟦þ⟧, ⟦t⟧, ⟦c⟧, and ⟦f⟧ are degeminated before another consonant in the onset; for instance, ⟦-ffr-⟧ is corrected to ⟦-fr-⟧, and ⟦-ccs-⟧ is corrected to ⟦-cs-⟧. ⟦td⟧ is degeminated to ⟦d⟧, and ⟦cg⟧ is degeminated to ⟦g⟧.

This rule also applies when the second instance of the degeminated consonant is lenited, in which case the first copy of the consonant is elided: ⟦-tt·l-⟧ is corrected to ⟦-t·l-⟧.

Partial coda elision of bridges with ⟦-rþ⟧ and ⟦-cþ⟧ codas

If the coda is ⟦-rþ⟧, then it becomes ⟦-r⟧ before a fricative followed by ⟦r⟧ or ⟦l⟧, or before the onsets ⟦cf-⟧, ⟦cþ-⟧, ⟦cs-⟧, ⟦cš-⟧, ⟦tf-⟧, and ⟦dv-⟧. Before any other two-letter onset, it becomes ⟦-þ⟧.

If the coda is ⟦-cþ⟧, then it is maintained before the onsets ⟦þ-⟧, ⟦š-⟧, ⟦m·-⟧, ⟦t-⟧, ⟦ħ-⟧, ⟦m·-⟧, or ⟦t·-⟧. Before ⟦cf-⟧, ⟦cþ-⟧, ⟦cs-⟧, ⟦cš-⟧, or ⟦tf-⟧, or before any of the onsets consisting of ⟦þ⟧, ⟦š⟧, or ⟦ħ⟧ followed by ⟦r⟧ or ⟦l⟧, the onset loses its first consonant, and ⟦cs-⟧ additionally becomes ⟦þ-⟧. In all other cases, the coda becomes ⟦-þ⟧.

The pseudo-coda ⟦-ŋ⟧

Nasal assimilation might produce the pseudo-coda ⟦-ŋ⟧ instead of an actual (simple) coda. In this case, the preceding vowel becomes ⟦o⟧ for ⟦a o u⟧, ⟦jo⟧ for ⟦e i⟧, ⟦ô⟧ for ⟦â ô⟧, and ⟦jô⟧ for ⟦ê î⟧, with any glides merging with the preceding glide. The pseudo-coda itself becomes ⟦-r⟧.
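The vowel changes triggered by the pseudo-coda form a small lookup table. This Python sketch assumes that the medial is represented as either an empty string or "j", and it reads the glide-merging clause as collapsing an existing ⟦j⟧ medial and a newly introduced ⟦j⟧ into a single ⟦j⟧.

```python
from __future__ import annotations

# Replacement nucleus for each vowel before the pseudo-coda ⟦-ŋ⟧.
PSEUDO_CODA_NUCLEUS = {
    "a": "o", "o": "o", "u": "o",
    "e": "jo", "i": "jo",
    "â": "ô", "ô": "ô",
    "ê": "jô", "î": "jô",
}


def resolve_pseudo_coda(medial: str, vowel: str) -> tuple[str, str, str]:
    """Return (medial, vowel, coda) after resolving a ⟦-ŋ⟧ pseudo-coda.

    `medial` is "" or "j"; the resulting coda is always ⟦-r⟧.
    """
    nucleus = PSEUDO_CODA_NUCLEUS[vowel]
    if nucleus.startswith("j"):
        # A glide introduced here merges with an existing ⟦j⟧ medial.
        medial, nucleus = "j", nucleus[1:]
    return medial, nucleus, "r"


print(resolve_pseudo_coda("", "e"))   # ('j', 'o', 'r')
print(resolve_pseudo_coda("j", "a"))  # ('j', 'o', 'r')
print(resolve_pseudo_coda("", "â"))   # ('', 'ô', 'r')
```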


Concatenating two morphemes invokes repair processes to maintain validity invariants. In addition, there are environments that may naturally (if rarely) occur within a morpheme but are repaired away when created by appending morphemes.

Deduplication, which occurs on concatenation, affects fricatives in the onset position that precede an unhatted vowel followed by a homophonous manifested grapheme phrase:

Overall, concatenation invokes the following processes in order:

  1. Any new instances of ⟦j⟧ before ⟦i⟧, ⟦î⟧, or ⟦u⟧ are elided.
  2. Deduplication rules are applied.
  3. Newly formed bridges are canonicalized and repaired.

Note that deduplication happens before any canonicalization. For instance, appending the syllables ⟦reþ⟧ and ⟦eþ⟧ gives ⟦reþeþ⟧, not ⟦reteþ⟧, because the first ⟦þ⟧ is still a coda when deduplication applies; appending the suffix ⟦eþ⟧ to the stem ⟦reþ⟧, whose final ⟦þ⟧ is an onset, does give ⟦reteþ⟧.

Stem fusion

In Ŋarâþ Crîþ, a stem consists of one or more syllables followed by an onset. In addition, the final onset of a stem must not contain a lenited consonant.

Stem fusion describes a set of related processes on a stem. Stem fusion with a null consonant turns a stem into a word (with a terminal end). Stem fusion with a non-null consonant combines a stem with one of ⟦t⟧, ⟦n⟧, or ⟦þ⟧ into another stem.

To describe stem fusion, we use the following notation:

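Given a stem $\tau \in \mathrm{T}$, $\tau^{\varepsilon} \in \mathcal{S}_{s\omega}$ is the result of fusing $\tau$ with a null consonant, and $\tau^{\theta} \in \mathcal{S}_{so}$ is the result of fusing $\tau$ with a non-null consonant $\theta \in \{\mathsf{n}, \mathsf{t}, \text{þ}\}$.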

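The following variables are used: $\Sigma_{xy} \in \mathcal{S}_{xy}$, $\gamma \in \Gamma$, $\iota \in \mathrm{I}$, $\nu \in \mathrm{N}$, $\kappa \in \mathrm{K}$, $\omega \in \Omega$, $\theta \in \{\mathsf{n}, \mathsf{t}, \text{þ}\}$.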

Earlier rules take precedence over later ones. In addition, we use a shorthand for rules that yield a common sequence of syllables regardless of the fusion consonant: a rule such as τσ, where τΤ and σ𝒮ss, is interpreted as the rules τε=σsω and τθ=σ:θ.

Another shorthand used in this document is ττ, which implies τε=(τ)ε and τθ=(τ)θ.

TODO: The rules for stem fusion are not finalized and therefore are temporarily omitted.

Properties of stem fusion

Fusion with ⟦t⟧ is invariant (i.e. yields the same stem as the original) only when the final onset of the stem is ⟦t-⟧.

Fusion with ⟦n⟧ is invariant only when the final bridge of the stem is ⟦-nn-⟧.

Fusion with ⟦þ⟧ is invariant only when the final onset of the stem is ⟦þ-⟧ or ⟦cþ-⟧.

Layer 2s

TODO: deal with complex codas before a clitic boundary

Traditionally, only manifested grapheme phrases are considered to be significant in the conversion from layer 1 to layer 2s. However, other graphemes such as punctuation can affect prosody.

n ndnčt͡ʂ
ŋ ŋgŋî
v m· vpvjj
ssd dtd
þ t·θð d· ðþð
š č·ʂh c·x
rɹħ g·ʕ
l lłlê
m mpmâ
f p·ff· v· ð·
g gcɡ
Table 5: Layer 1 to layer 2s conversions.

Layer 2 has a two-way tone contrast between vowels: the high tone (H) is the default, being contrasted with the low tone (L). For historical reasons, the presence or absence of a low tone on a vowel is called [±creaky].

Layer 3s

The conversion from layer 2s to layer 3s is comparatively more complex.

First, the following changes are made:

Plosives in a coda are unreleased. All unvoiced plosives and affricates outside of a coda are aspirated.

While Ŋarâþ Crîþ has two tone levels phonemically, their realization at the phonetic level is more complex. It is common to describe phonetic tone using seven levels, from 0 (the lowest) to 6 (the highest). Each syllable has one or more tones.

In order to describe tone, we must introduce the concept of “stress”, which is placed according to the following rules:

We also introduce the concept of a tone accounting unit (TAU), which is the level at which tones are realized. That is, the tone of a syllable depends only on the contents of the TAU in which it lies. Instances of content words occupy different TAUs from each other, but some function words occupy the same TAU as the preceding or following word (in particular, such words have no stressed syllable and are confined to a relatively fixed position):

(Stress is accounted by orthographic word, not by TAU.)

First, two adjacent vowels are fused into a diphthong if the vowels are not identical, the first vowel is stressed, the second vowel is [i] or [u̜], and the syllable to which the second vowel belongs can be interpreted as having an empty coda. For purposes of tonekeeping, a diphthong is considered to be composed of two different syllables.

In general, unstressed H and L syllables have tone levels 4 and 2, respectively; stressed H and L syllables have tone levels 5 and 1. However, an open H or L syllable before a stressed syllable gets level 3 or 1, respectively, instead. Diphthongs get different values: 65 for HH, 53 for HL, 13 for LH, and 21 for LL.
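The initial level assignment above can be sketched as a function over the syllables of one TAU. This Python sketch assumes exactly one stressed syllable per TAU and reads "before a stressed syllable" as "immediately before"; diphthongs are given only as a lookup table, and the later adjustments (identical-vowel shifts, contours, and the final lowering step) are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Syllable:
    tone: str      # phonemic tone: "H" or "L"
    is_open: bool  # True if the syllable has an empty coda


def base_levels(syllables: list[Syllable], stressed: int) -> list[int]:
    """Tone levels (0 to 6) for the monophthong syllables of one TAU,
    before any contour or later adjustments."""
    levels = []
    for i, syl in enumerate(syllables):
        high = syl.tone == "H"
        if i == stressed:
            levels.append(5 if high else 1)
        elif syl.is_open and i + 1 == stressed:
            levels.append(3 if high else 1)
        else:
            levels.append(4 if high else 2)
    return levels


# Level pairs for diphthongs, keyed by the tones of their two halves.
DIPHTHONG_LEVELS = {"HH": (6, 5), "HL": (5, 3), "LH": (1, 3), "LL": (2, 1)}

example = [Syllable("H", True), Syllable("H", False), Syllable("L", False)]
print(base_levels(example, stressed=1))  # [3, 5, 2]
```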

If two adjacent copies of an identical vowel have the same tone level at this stage, then the one closer to the stressed syllable rises by one tone level and the one farther from it falls by one level.

A tone level of n is then changed into a tone contour in the following situations, unless doing so would result in an out-of-bounds tone level:

In addition, other syllables change their tone levels:

Finally, if all tones have a level of 4 or higher, then the lowest tone (breaking ties by preferring later tones) is lowered to 3, and all other tones in the same syllable are lowered by the same amount. All level-3 tones are then lowered to level 2.


The isochrony of Ŋarâþ Crîþ falls somewhere between syllable and mora timing, where:


Ŋarâþ Crîþ has two kinds of initial mutations: lenition and eclipsis. Neither kind of mutation has any effect on plosive-fricative onsets or any of ⟦r l n ŋ ħ⟧.

Lenition tends to turn plosives into fricatives and is indicated with a middle dot ⟦·⟧ after the consonant affected. In particular, it affects ⟦p t d č c g m f v ð⟧. (See Layer 2s for pronunciation details.) Partial lenition does not affect any of ⟦f v ð⟧; that is, it does not lenite consonants that would become silent. Unless otherwise qualified, lenition refers to total lenition, which also affects ⟦f v ð⟧.

In a word containing ⟦&⟧, both instances of the reduplicated prefix are lenited. For example, ⟨&d·enfo⟩ can be pronounced as [ðeðenfo] but not as *[ðedenfo].
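Word-initial lenition can be sketched as a string transformation on the romanization. This Python sketch handles only the first consonant of a plain, lowercase word; lenition of later syllabic onsets, words with ⟨&⟩, and eclipsed words are out of scope. It also assumes a reading of "plosive-fricative onset" as one of ⟦p t d c g⟧ directly followed by one of ⟦f v þ ð s š h ħ⟧. The test words are arbitrary strings, not vocabulary.

```python
LENITION_MARK = "\u00b7"
LENITABLE = set("ptdčcgmfvð")
SILENT_WHEN_LENITED = set("fvð")  # skipped by partial lenition
PLOSIVES = set("ptdcg")
FRICATIVES = set("fvþðsšhħ")


def lenite_initial(word: str, partial: bool = False) -> str:
    """Lenite the word-initial consonant of a plain, lowercase romanized word."""
    if not word or word[0] not in LENITABLE:
        return word  # vowels, ⟦r l n ŋ ħ⟧, and other letters are untouched
    rest = word[1:]
    if rest[:1] == LENITION_MARK:
        return word  # already lenited
    if word[0] in PLOSIVES and rest[:1] in FRICATIVES:
        return word  # plosive-fricative onsets are untouched (assumed reading)
    if partial and word[0] in SILENT_WHEN_LENITED:
        return word
    return word[0] + LENITION_MARK + rest


print(lenite_initial("tasa"))                # t·asa
print(lenite_initial("fara", partial=True))  # fara
```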

Lenition occurs in the following environments:

Eclipsis tends to add voice to voiceless consonants and change voiced stops into nasals. It is indicated by prefixing a consonant: ⟦t d c g f þ ł⟧ become ⟦dt nd gc ŋg vf ðþ lł⟧, respectively. ⟦p⟧ becomes ⟦vp⟧ before any of ⟦i e u î ê⟧ and ⟦mp⟧ elsewhere. If a word starts with a vowel, then it is eclipsed by prefixing ⟦g⟧.

In a word containing ⟦&⟧, only the first instance of the reduplicated prefix is eclipsed. For example, ⟨n&denfin⟩ can be pronounced as [nedenfin] but not as *[nenenfin].
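Eclipsis can likewise be sketched as a rewrite of the start of a word. This Python sketch handles plain, lowercase words without markers or ⟨&⟩, reads "before any of ⟦i e u î ê⟧" as "immediately before", and uses the same assumed reading of "plosive-fricative onset" (one of ⟦p t d c g⟧ directly followed by one of ⟦f v þ ð s š h ħ⟧).

```python
ECLIPSIS = {"t": "dt", "d": "nd", "c": "gc", "g": "ŋg", "f": "vf", "þ": "ðþ", "ł": "lł"}
VOWELS = set("eoaîiêôâu")
VP_VOWELS = set("ieuîê")  # ⟦p⟧ → ⟦vp⟧ before these, ⟦mp⟧ elsewhere
PLOSIVES = set("ptdcg")
FRICATIVES = set("fvþðsšhħ")


def eclipse(word: str) -> str:
    """Eclipse a plain, lowercase romanized word (no markers or ⟨&⟩)."""
    if not word:
        return word
    first, rest = word[0], word[1:]
    if first in VOWELS:
        return "g" + word  # vowel-initial words take a prefixed ⟦g⟧
    if first in PLOSIVES and rest[:1] in FRICATIVES:
        return word  # plosive-fricative onsets are untouched (assumed reading)
    if first == "p":
        return ("vp" if rest[:1] in VP_VOWELS else "mp") + rest
    if first in ECLIPSIS:
        return ECLIPSIS[first] + rest
    return word  # ⟦r l n ŋ ħ⟧ and other letters are unaffected


print(eclipse("crîþ"))  # gcrîþ
print(eclipse("arâþ"))  # garâþ
```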

Eclipsis occurs in the following environments:

Lenition can happen on any syllabic onset of a word, but eclipsis is limited to word-initial positions.

In this documentation, lenition is sometimes marked with an empty circle ○, and eclipsis with a filled circle ●. Partial lenition is marked with an empty triangle △.


Almost all loanwords in Ŋarâþ Crîþ are nouns.

Generally, when borrowing from languages that use the Cenvos script or a script related to it, and whose orthographies in the script in question do not deviate too far from Ŋarâþ Crîþ usage, Ŋarâþ Crîþ prefers to borrow the word graphemically rather than phonemically.

The typography of Ŋarâþ Crîþ

TODO: This section has not changed and is therefore omitted.