Orthography and phonology (Elaine)

The phonology and orthography of Ŋarâþ Crîþ can be divided into eight layers in two modes (writing and speaking):

Layer 0 is the underlying morphographemic representation. Content in this layer exists structurally instead of linearly. In this grammar, text in this layer is written in double square brackets: ⟦tanc-a⟧.
Layer 1 is the graphemic representation. This representation is subsequently exported to the spoken and written modes. Text in this layer is written with angle brackets: ⟨tanca⟩.
Layer 2w is the surface glyphic representation. This represents the sequence of Cenvos glyphs that is written, observing required ligatures and final forms. Text in this layer is written with double angle brackets: ²⟨tanca⟩; for a more interesting example, ⟨mencoc⟩ becomes ²⟨mencoc$⟩.
Layer 2w* is an intermediate layer between 2w and 3w, in which discretionary ligatures are introduced to 2w text. For instance, ²⟨#flirora⟩ can be realized as ²*⟨#fliro ra⟩.
Layer 3w is the topological representation, showing optional ligatures as well as stroke order variations. Text in this layer is written with double angle brackets: ³⟨t_1α a_1γ n_1α c_1α a_1α ⟩. More interestingly, ²⟨mencoc$⟩ could become ³⟨me_1α n_1α c_1α o_1α c$_1α ⟩.
Layer 4w is the presentational representation, adding to 3w variations in the strokes themselves and how strokes within a glyph are joined. Text in this layer is written with double angle brackets: ⁴⟨t_1α a_1γ n_1α c_1α a_1α ⟩.
Layer 2s is the phonemic representation. We use slashes for this, as usual: /tanka/.
Layer 3s is the phonetic representation, or what is pronounced. We use square brackets for this, as usual: [tʰa⁴ɲcʰa²].

The conversions from 0 to 1, 1 to 2w, and 2s to 3s are functional: each valid input corresponds to exactly one output. The conversion from 1 to 2s is almost so, except when a ⟨&⟩ is present. In the opposite direction, the conversions from 4w to 3w, from 3w to 2w*, and from 2w* to 2w are functional. Furthermore, for any conversion, it can be determined whether a given input can be converted into a given output without external information.

In addition, the conversion between 1 and 2w is bijective: valid layer-1 and layer-2w representations can be paired with each other.

Layers 1 and 2w: Cenvos and its romanization

Rather than starting at layer 0, we start at layers 1 and 2w.

Cenvos, the native script of Ŋarâþ Crîþ, is written from right to left. This script can be analyzed on two levels: graphemes, which constitute the abstract level, and glyphs, which are the characters being written. For instance, Cenvos has one grapheme romanized as ⟨c⟩ that corresponds to two different glyphs: the non-final form 𐲀𐲢 (denoted as ²⟨c⟩) and the final form 𐲀 (²⟨c$⟩). As another example, the sequence 𐲌𐲁 (⟨me⟩ = ²⟨me⟩) consists of one glyph but two graphemes.

In this grammar, we primarily use the romanization, whose symbols largely map one-to-one with Cenvos graphemes. Cenvos has four kinds of graphemes:

True letters are graphemes that represent sounds.
Markers, while considered letters, do not represent sounds. Instead, they indicate that the words affected are treated specially. They occur on the level of a word and do not actively participate in morphology.
Punctuation includes the clause-end punctuation ⟨.⟩, ⟨;⟩, ⟨?⟩, and ⟨!⟩; the clitic boundary mark ⟨’⟩; the lenition mark ⟨·⟩; the grouping brackets ⟨{}⟩; and the quotation marks ⟨«»⟩.
Digits can be used to write short numerals.

Of course, there is also the space.

Cen	Name	Rom	Cen	Name	Rom	Cen	Name	Rom
True letters
𐲀𐲢	ca	c	𐲌	ma	m	𐲘	ar	h
𐲁	e	e	𐲍	a	a	𐲙	ħo	ħ
𐲂	na	n	𐲎	fa	f	𐲚	ên	ê
𐲃𐲢	ŋa	ŋ	𐲏	ga	g	𐲛	ôn	ô
𐲄	va	v	𐲐	pa	p	𐲜	ân	â
𐲅	o	o	𐲑	ta	t	𐲝	uħo	u
𐲆	sa	s	𐲒	ča	č	𐳀	cełaŋa	w
𐲇	þa	þ	𐲓	în	î	𐳁	avarte	x
𐲈	ša	š	𐲔	ja	j	𐳂	priþnos	y
𐲉	ra	r	𐲕	i	i	𐳃	telrigjon	z
𐲊	la	l	𐲖	da	d
𐲋	ła	ł	𐲗	ða	ð
Final forms and ligatures (layer 2w)
𐲀		c$	𐲌𐲁		me	𐳀𐳀		ww
𐲃		ŋ$	𐲌𐲌		mm	𐳁𐳁		xx
𐲁𐲁		ee	𐲔𐲜		jâ	𐳂𐳂		yy
𐲁𐲌		em	𐲜𐲔		âj	𐳃𐳃		zz
Markers
𐲤	carþ	#	𐲦	njor	+*	𐲨	nef	*
𐲥	tor	+	𐲧	es	@	𐲯	sen	&
Punctuation
𐲞	gen	.	𐲩	ŋos	’	𐲭	fos	«
𐲟	tja	;	𐲪	łil	·	𐲮	þos	»
𐲠	šac	?	𐲫	rin	{	𐳄	jedva	/
𐲡	cjar	!	𐲬	cin	}	𐲣	mivaf·ome	-

Table 1: The graphemes of Ŋarâþ Crîþ. (The columns are read from left to right.)

The letters ⟨w⟩, ⟨x⟩, ⟨y⟩, and ⟨z⟩ are USR letters. These are used in foreign languages written in Cenvos to represent phonemes that are not approximated by the phonology of Ŋarâþ Crîþ. Each foreign orthography is free to assign them as it pleases.

Cenvos has two graphemes that change form at the end of the word: ⟨c⟩ and ⟨ŋ⟩, as well as several ligatures. We do not distinguish these forms in the romanization.
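The step from layer 1 to layer 2w can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `to_glyphs` is hypothetical, the input is assumed to contain true letters only, and ligatures are assumed to be matched greedily from the start of the word (the text above does not say how overlapping candidates such as ⟨eme⟩ are resolved).

```python
# Required ligatures listed under "Final forms and ligatures" in Table 1.
LIGATURES = ("ee", "em", "me", "mm", "jâ", "âj", "ww", "xx", "yy", "zz")
# Graphemes that have a distinct word-final glyph.
FINAL_FORMS = {"c": "c$", "ŋ": "ŋ$"}


def to_glyphs(word):
    """Split a romanized layer-1 word into a list of layer-2w glyph names."""
    glyphs = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in LIGATURES:
            # Assumption: ligatures are matched greedily, left to right.
            glyphs.append(pair)
            i += 2
        else:
            glyphs.append(word[i])
            i += 1
    # A word-final c or ŋ that is not part of a ligature takes its final form.
    if glyphs and glyphs[-1] in FINAL_FORMS:
        glyphs[-1] = FINAL_FORMS[glyphs[-1]]
    return glyphs


print(to_glyphs("mencoc"))  # ['me', 'n', 'c', 'o', 'c$']
```

The result for ⟨mencoc⟩ matches the glyph grouping used in the layer 3w example earlier in this section.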

The marker ⟨*⟩ is used for foreign words, such as loanwords and foreign names. ⟨#⟩ is used to prefix given names. ⟨+⟩ is used to prefix surnames passed by native conventions (i.e. from parent to child within the same gender); ⟨+*⟩ marks a surname passed using non-native conventions. Place names are prefixed with ⟨@⟩. ⟨#⟩, ⟨+⟩, ⟨+*⟩, and ⟨@⟩ can all be used with ⟨*⟩, in which case ⟨*⟩ occurs first. Note that ⟨+*⟩ is a single letter of its own and not a ligature.

At the start of a word, ⟨&⟩ indicates reduplication of an unspecified prefix of the rest of the word. For instance, ⟨&cên⟩ can be pronounced as if it were ⟨cêcên⟩ or ⟨cêncên⟩. (⟨&⟩ occurs after all other markers in this case.) This usage is not productive in standard Ŋarâþ Crîþ, but it appears in a few words, as well as in some idiosyncratic cases. At the middle or the end of a word, or alone, it indicates ellipsis of part or all of the word, most often to abbreviate or censor a word. Lastly, ⟨&{}⟩ is used similarly to the ellipsis in Western punctuation.

Markers can be applied to multi-word strings by surrounding the string with the delimiters ⟨{}⟩. In legal language, ⟨{}⟩ are also used around phrases to resolve ambiguities.

The sentence punctuation ⟨.⟩, ⟨?⟩, and ⟨!⟩ are used as expected. ⟨;⟩ is used to separate two independent clause phrases within the same sentence. The quotation marks, ⟨«»⟩, are used around quotations, direct or indirect. A ⟨.⟩ at the end of a quotation embedded within another sentence is omitted.

⟨’⟩ is used to separate clitics from the rest of the word to which they are attached. ⟨·⟩ indicates lenition; it could be described as a “letter modifier”. It is also used as a decimal point: officially, it is used after the most significant digit of an inexact numeral when written with digits, but it is also used unofficially to write non-integers.

⟨/⟩, as its derivation from ⟨i⟩ suggests, is used to separate the number of mjari from the number of edva when writing currency amounts.

The morpheme boundary marker, ⟨-⟩, is sometimes used metalinguistically to mark a morpheme boundary, but it is not strictly a part of layer 1.

Spaces are placed in the following places:

between orthographic words, but not between a clitic and the word to which it is attached
after (but not before) ⟨.⟩, ⟨;⟩, ⟨?⟩, and ⟨!⟩
before ⟨«⟩ and after ⟨»⟩ (but not on the other sides)
around ⟨&{}⟩

[TODO: cover mentions of letters within the language, corresponding to v7 p17 “When letters or markers are referred to, … but the effects on other glyphs are not standardized”]

Digits are interchangeable with short-form numerals, but not with long-form numerals. They are also written right-to-left in Cenvos, with the most significant digit first: 𐲲𐲺𐲳 is 0x2A3 = 675.

Cen	#	Cen	#	Cen	#	Cen	#
𐲰	0	𐲱	1	𐲲	2	𐲳	3
𐲴	4	𐲵	5	𐲶	6	𐲷	7
𐲸	8	𐲹	9	𐲺	A	𐲻	B
𐲼	C	𐲽	D	𐲾	E	𐲿	F

Table 2: The digits of Ŋarâþ Crîþ. (The columns are read from left to right.)
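As a small illustration of the hexadecimal digit system, the following Python sketch converts between digit strings and integers. The function names are hypothetical, and the digit characters are copied from Table 2; the string is processed in logical order, which is most significant digit first.

```python
# Cenvos digits in ascending order of value (0 through F), copied from Table 2.
CENVOS_DIGITS = "𐲰𐲱𐲲𐲳𐲴𐲵𐲶𐲷𐲸𐲹𐲺𐲻𐲼𐲽𐲾𐲿"


def cenvos_to_int(numeral):
    """Read a string of Cenvos digits as a base-16 number."""
    value = 0
    for digit in numeral:
        # str.index raises ValueError for anything that is not a Cenvos digit.
        value = value * 16 + CENVOS_DIGITS.index(digit)
    return value


def int_to_cenvos(value):
    """Write a non-negative integer with Cenvos digits."""
    if value < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    digits = CENVOS_DIGITS[value % 16]
    value //= 16
    while value:
        digits = CENVOS_DIGITS[value % 16] + digits
        value //= 16
    return digits


# The example from the text: digits 2, A, 3 give 0x2A3 = 675.
example = CENVOS_DIGITS[2] + CENVOS_DIGITS[0xA] + CENVOS_DIGITS[3]
print(cenvos_to_int(example))  # 675
```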

Letter numbering

Sometimes, an integer must be assigned to each letter. In this case, the assignment shown in the table below is used. Note that numbers are not assigned fully sequentially. Furthermore, this function is valid only for layer 1 graphemes.

Letter	Hex	Dec	Letter	Hex	Dec	Letter	Hex	Dec
True letters
c	0	0	m	20	32	h	11	17
e	1	1	a	9	9	ħ	12	18
n	2	2	f	A	10	ê	101	257
ŋ	2B	43	g	B	11	ô	104	260
v	3	3	p	C	12	â	109	265
o	4	4	t	D	13	u	13	19
s	5	5	č	DE	222	w	−1	−1
þ	55	85	î	E	14	x	−2	−2
š	5E	94	j	6E	110	y	−3	−3
r	6	6	i	F	15	z	−4	−4
l	7	7	d	10	16
ł	77	119	ð	155	341
Markers
#	14	20	+*	16	22	*	19	25
+	15	21	@	17	23	&	1A	26

Table 3: Letter numbering in Ŋarâþ Crîþ. (The columns are read from left to right.)

The letter sum of a word is the sum of all of its letters. This value is used in some of the noun declension paradigms.
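A letter sum can be computed directly from Table 3. The sketch below is illustrative only: `letter_sum` is a hypothetical name, and it assumes that markers count toward the sum and that punctuation such as the lenition mark contributes nothing, neither of which is stated explicitly above.

```python
# Letter numbers from Table 3 (hexadecimal values as given there).
LETTER_NUMBERS = {
    "c": 0x0, "e": 0x1, "n": 0x2, "ŋ": 0x2B, "v": 0x3, "o": 0x4,
    "s": 0x5, "þ": 0x55, "š": 0x5E, "r": 0x6, "l": 0x7, "ł": 0x77,
    "m": 0x20, "a": 0x9, "f": 0xA, "g": 0xB, "p": 0xC, "t": 0xD,
    "č": 0xDE, "î": 0xE, "j": 0x6E, "i": 0xF, "d": 0x10, "ð": 0x155,
    "h": 0x11, "ħ": 0x12, "ê": 0x101, "ô": 0x104, "â": 0x109, "u": 0x13,
    "w": -1, "x": -2, "y": -3, "z": -4,
    "#": 0x14, "+": 0x15, "+*": 0x16, "@": 0x17, "*": 0x19, "&": 0x1A,
}


def letter_sum(word):
    """Sum the letter numbers of a romanized layer-1 word."""
    total = 0
    i = 0
    while i < len(word):
        if word.startswith("+*", i):
            # ⟨+*⟩ is a single letter, so it is looked up as a unit.
            total += LETTER_NUMBERS["+*"]
            i += 2
        else:
            # Assumption: graphemes without a number contribute nothing.
            total += LETTER_NUMBERS.get(word[i], 0)
            i += 1
    return total


print(letter_sum("tanca"))  # 13 + 9 + 2 + 0 + 9 = 33
```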

It is theorized that letter numbers were assigned in the following manner:

The basic true letters inherited from Necarasso Cryssesa (i.e. those corresponding to ⟨c e n v o s r l m a f g p t î i d h⟩) received sequential numbers from zero. The number of ⟨m⟩ was changed due to superstitions against the number eight.
⟨ŋ þ š ł č ð⟩ received numbers based on what letter pairs (or triplets in the case of ⟨ð⟩) they were based on.
⟨ê⟩, ⟨ô⟩, and ⟨â⟩ were numbered as 256 + base glyph number.
The other letters and the markers received sequential numbers after ⟨h⟩, skipping 0x18.

Collation

The true letters and the markers are collated in their respective order, except for ⟨&⟩, which is ignored. Lenited letters are treated as their respective base letters, except when two words differ only by the presence or absence of a lenition mark, in which case the lenited variant is collated after the base letter: ⟨saga⟩ < ⟨sag·a⟩ < ⟨sada⟩ < ⟨saħa⟩. Numerals are collated after all letters.
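The lenition tie-break can be expressed as a two-part sort key. The following Python sketch is a simplification under stated assumptions: `collation_key` is a hypothetical name, only true letters are ranked (markers other than ⟨&⟩ and numerals are left out for brevity), and the letter order is the column-wise reading of Table 1.

```python
# True letters in Table 1 order (columns read top to bottom, left to right).
TRUE_LETTERS = "cenŋvosþšrlłmafgptčîjidðhħêôâuwxyz"
RANK = {letter: i for i, letter in enumerate(TRUE_LETTERS)}
LENITION_MARK = "·"


def collation_key(word):
    """Return a sort key: base letters first, lenition only as a tie-break."""
    primary = []
    lenited = []
    for ch in word:
        if ch == LENITION_MARK:
            if lenited:
                lenited[-1] = 1  # flag the preceding letter as lenited
        elif ch in RANK:
            primary.append(RANK[ch])
            lenited.append(0)
        # Anything else, including ⟨&⟩, is skipped in this sketch.
    return (tuple(primary), tuple(lenited))


words = ["saħa", "sada", "sag·a", "saga"]
print(sorted(words, key=collation_key))  # ['saga', 'sag·a', 'sada', 'saħa']
```

Because the lenition flags form the second component of the key, they matter only when the base letters of two words are identical, which is the behaviour described above.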

In a directory of personal names, entries are collated on surnames, with given names considered only when surnames are identical. Headings in such a list include the prefix up to and including the first true letter: ⟨+merlan #flirora⟩ would be found under ⟨+m⟩.

Ordered items can be labeled using numerals (starting from 0) or letters. In the latter case, only the letters ⟨c e n v o s r l m a f g p t î i d h⟩ are used.

Numquotes

A digit immediately preceding text surrounded by quotation or grouping marks constitutes a numquote. The digit is usually not pronounced in this case. Numquotes are mainly used for secondary purposes that lack any dedicated punctuation.

Numquote	Meaning
B{}	Contains parenthetical information: provides supplementary information. The sentence should still be grammatical without the parenthetical content.
1{}	Lists an alias of a referent mentioned by name.
2{}	Surrounds a key-value list. Used as such: ⟨2{3{&{}} 4{&{}} 3{&{}} 4{&{}}}⟩
3{}	Used for listing a key inside ⟨2{}⟩.
4{}	Used for listing a value inside ⟨2{}⟩. When not directly inside a ⟨2{}⟩ numquote, marks a list: elements are delimited by spaces, and ⟨{}⟩ can be used to insert multi-word elements.
9{}	Used to contain abbreviated quantities in the traditional currency system.
*9{}	Used to contain abbreviated quantities in a currency system other than the traditional one.

Table 4: Numquotes in Ŋarâþ Crîþ.

Layer 0

The phonotactics of Ŋarâþ Crîþ can be expressed in terms of a state machine with five states: $s$ (syllabic), $g$ (glide), $o$ (onset), $n$ (nuclear), and $ω$ (terminal). Each transition defined in the state machine has a set of accepted payloads.

Figure 1: The finite state machine describing the phonotactics of Ŋarâþ Crîþ.

A word, or rather the phonotactically relevant part thereof, starts in the syllabic state and ends in the terminal state.

The syllabic state, $s$ , is reached between syllables. In this state, an initial can be accepted to transition to the glide state.
The glide state, $g$ , is reached immediately after an initial. This state can accept a medial to transition to the onset state.
From the onset state, $o$ , a vowel (also called a nucleus) leads to the nuclear state.
From the nuclear state, $n$ , a simple coda can be accepted to transition back to the syllabic state. Alternatively, a simple or complex coda may be accepted to transition to the terminal state.
The terminal state is the end state for a word and marks the end of the final syllable. There are no transitions from this state.

The payloads associated with a transition are strings of manifested grapheme phrases. A manifested grapheme phrase is either a true letter not followed by a lenition marker (plain letter), any of ⟦p t d č c g m f v ð⟧ followed by a lenition mark (lenited letter), or, word-initially, one of the digraphs ⟦mp vp dt nd gc ŋg vf ðþ lł⟧ (eclipsed letter). All other graphemes are ignored for the purposes of phonotactics.

A manifested grapheme phrase has a base letter. The base letter of a plain letter is itself. The base letter of a lenited letter is the letter without the lenition mark. The base letter of an eclipsed letter is the second letter of the digraph.

A vowel is any of ⟦e o a î i ê ô â u⟧. ⟦j⟧ is a semivowel. All other manifested grapheme phrases are consonants.

An effective plosive is a manifested grapheme phrase whose base letter is any of ⟦p t d c g⟧. An effective fricative is a manifested grapheme phrase whose base letter is any of ⟦f v þ ð s š h ħ⟧.

A hatted vowel is one of ⟦î ê ô â⟧. All other vowels are unhatted vowels.
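The definitions above translate fairly directly into a tokenizer. The Python sketch below is one possible reading, not a reference implementation: all names are hypothetical, a word-initial digraph from the eclipsed list is always read as an eclipsed letter, and a lenition mark after a letter that cannot be lenited is silently dropped.

```python
LENITION_MARK = "·"
TRUE_LETTERS = set("cenŋvosþšrlłmafgptčîjidðhħêôâuwxyz")
LENITABLE = set("ptdčcgmfvð")
ECLIPSED = {"mp", "vp", "dt", "nd", "gc", "ŋg", "vf", "ðþ", "lł"}
VOWELS = set("eoaîiêôâu")
PLOSIVE_BASES = set("ptdcg")
FRICATIVE_BASES = set("fvþðsšhħ")


def split_mgps(word):
    """Split a word into manifested grapheme phrases (MGPs)."""
    # All graphemes other than true letters and lenition marks are ignored.
    text = "".join(ch for ch in word if ch in TRUE_LETTERS or ch == LENITION_MARK)
    mgps = []
    i = 0
    if text[:2] in ECLIPSED:  # eclipsed letters occur only word-initially
        mgps.append(text[:2])
        i = 2
    while i < len(text):
        ch = text[i]
        if ch == LENITION_MARK:
            i += 1  # assumption: a stray lenition mark is dropped
        elif ch in LENITABLE and text[i + 1:i + 2] == LENITION_MARK:
            mgps.append(ch + LENITION_MARK)
            i += 2
        else:
            mgps.append(ch)
            i += 1
    return mgps


def base_letter(mgp):
    """Plain: itself; lenited: the letter without the mark; eclipsed: second letter."""
    if len(mgp) == 1:
        return mgp
    return mgp[0] if mgp.endswith(LENITION_MARK) else mgp[1]


def classify(mgp):
    if mgp in VOWELS:
        return "vowel"
    return "semivowel" if mgp == "j" else "consonant"


def is_effective_plosive(mgp):
    return base_letter(mgp) in PLOSIVE_BASES


def is_effective_fricative(mgp):
    return base_letter(mgp) in FRICATIVE_BASES


print(split_mgps("d·enfo"))  # ['d·', 'e', 'n', 'f', 'o']
```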

An initial is the beginning of a syllable and consists of one of the following:

nothing at all
a single consonant
an effective plosive or fricative plus ⟦r⟧ or ⟦l⟧
any of ⟦cf cþ cs cš gv gð tf dv⟧; that is, a plosive plus a fricative of the same voicing, such that the plosive has a more retracted place of articulation than the fricative

The set of valid initials is denoted by the capital Greek letter iota, $Ι$ .

A medial may either be empty or ⟦j⟧. The set of medials is denoted by the capital Greek letter mu, $Μ$ .

The set of vowels is denoted by the capital Greek letter nu, $Ν$ .

A coda is either a simple coda or a complex coda. A simple coda is one of ⟦s r n þ rþ l t c f cþ⟧ or nothing at all. A complex coda is one of ⟦st lt nt ns ls nþ lþ m⟧, with ⟦-m⟧ used only in a handful of function words. The set of all simple codas is denoted by the capital Greek letter kappa, $Κ$ , and the set of all simple or complex codas is denoted by the capital Greek letter omega, $Ω$ .

An onset is an initial plus a medial. A bridge is the coda of one syllable plus the onset of the following syllable.
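The state machine and the sets $Ι$, $Μ$, $Ν$, $Κ$, and $Ω$ can be put together into a small acceptor. The Python sketch below checks only the syllable shape of a word written with plain letters; it assumes no lenited or eclipsed letters, and it does not apply the bridge and morpheme validity criteria. The names `accepts`, `from_syllabic`, and `from_nuclear` are hypothetical.

```python
from functools import lru_cache

VOWELS = set("eoaîiêôâu")
CONSONANTS = set("cnŋvsþšrlłmfgptčdðhħ")
PLOSIVES_AND_FRICATIVES = set("ptdcgfvþðsšhħ")

INITIALS = (
    {""}
    | CONSONANTS
    | {c + liquid for c in PLOSIVES_AND_FRICATIVES for liquid in "rl"}
    | {"cf", "cþ", "cs", "cš", "gv", "gð", "tf", "dv"}
)
MEDIALS = {"", "j"}
SIMPLE_CODAS = {"", "s", "r", "n", "þ", "rþ", "l", "t", "c", "f", "cþ"}
COMPLEX_CODAS = {"st", "lt", "nt", "ns", "ls", "nþ", "lþ", "m"}
ALL_CODAS = SIMPLE_CODAS | COMPLEX_CODAS


def accepts(word):
    """Return True if the word can be read as a path from state s to state ω."""

    @lru_cache(maxsize=None)
    def from_syllabic(i):
        # s -> g (initial), g -> o (medial), o -> n (vowel)
        for initial in INITIALS:
            if not word.startswith(initial, i):
                continue
            j = i + len(initial)
            for medial in MEDIALS:
                if not word.startswith(medial, j):
                    continue
                k = j + len(medial)
                if k < len(word) and word[k] in VOWELS and from_nuclear(k + 1):
                    return True
        return False

    @lru_cache(maxsize=None)
    def from_nuclear(i):
        # n -> ω: any coda may end the word.
        if word[i:] in ALL_CODAS:
            return True
        # n -> s: only a simple coda may precede another syllable.
        return any(
            word.startswith(coda, i) and from_syllabic(i + len(coda))
            for coda in SIMPLE_CODAS
        )

    return from_syllabic(0)


print(accepts("tanca"), accepts("tnca"))  # True False
```

Every pass through `from_syllabic` consumes at least one vowel, so the mutual recursion always terminates.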

Validation

Valid morphemes have additional criteria that they must satisfy:

All bridges must be valid.
⟦j⟧ cannot precede ⟦i⟧, ⟦î⟧, or ⟦u⟧.
⟦h⟧ cannot occur word-initially.
Conversely, eclipsed letters may only occur word-initially.

A bridge is canonical if it follows the maximal-onset principle; that is, if the onset has the maximal number of consonants for the given sequence of manifested grapheme phrases. For instance, ⟦-n-t-⟧ and ⟦-r-þl-⟧ are canonical, but ⟦-c-þ-⟧ and ⟦-rþ-l-⟧ are not (as they can be regrouped as ⟦-∅-cþ-⟧ and ⟦-r-þl-⟧).
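Canonicalization under the maximal-onset principle amounts to trying the longest possible onset first. The Python sketch below illustrates this for plain consonants only; it ignores the medial ⟦j⟧ and lenition marks, and the name `canonicalize` is hypothetical.

```python
CONSONANTS = set("cnŋvsþšrlłmfgptčdðhħ")
PLOSIVES_AND_FRICATIVES = set("ptdcgfvþðsšhħ")
INITIALS = (
    {""}
    | CONSONANTS
    | {c + liquid for c in PLOSIVES_AND_FRICATIVES for liquid in "rl"}
    | {"cf", "cþ", "cs", "cš", "gv", "gð", "tf", "dv"}
)
SIMPLE_CODAS = {"", "s", "r", "n", "þ", "rþ", "l", "t", "c", "f", "cþ"}


def canonicalize(consonants):
    """Split a consonant string into (coda, initial) with the longest valid initial."""
    # A shorter coda means a longer initial, so the first hit is maximal.
    for split in range(len(consonants) + 1):
        coda, initial = consonants[:split], consonants[split:]
        if coda in SIMPLE_CODAS and initial in INITIALS:
            return coda, initial
    raise ValueError(f"no valid coda/initial split for {consonants!r}")


print(canonicalize("cþ"))   # ('', 'cþ')
print(canonicalize("rþl"))  # ('r', 'þl')
```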

A bridge is valid if it can arise as the result of repairing a canonical bridge. Bridge repair is intended to change a bridge that is awkward to pronounce into one that is less so. It has the following properties:

Lenition in the onset does not affect whether bridge repair preserves the bridge.
The presence of ⟦j⟧ in the onset has no influence on bridge repair.
All bridges with a coda that is null, ⟦-r⟧, or ⟦-l⟧ are unaffected by bridge repair.
If a bridge with a complex initial I is not changed by bridge repair, then the bridge with an initial containing only the first consonant of I is also unchanged.

Importantly, bridge repair is not idempotent: ⟦-sð-⟧ is repaired to ⟦-ss-⟧, but ⟦-ss-⟧ is repaired to ⟦-þ-⟧. In addition, bridge repair might yield the pseudo-coda ⟦-ŋ⟧, which changes the preceding medial and vowel.

The following subsections describe the rules for bridge repair. A bridge that is modified by one rule might be further changed by later rules.

Coalescence of ⟦-tš-⟧

The bridge ⟦-tš-⟧ is changed to ⟦-č-⟧.

Fortition of ⟦h-⟧ and ⟦ħ-⟧

The onset ⟦h-⟧ is fortited to ⟦c-⟧ after ⟦-s⟧, ⟦-þ⟧, ⟦-rþ⟧, ⟦-t⟧, ⟦-c⟧, ⟦-f⟧, or ⟦-cþ⟧. ⟦hr-⟧ and ⟦hl-⟧ are fortited analogously.

The onset ⟦ħ-⟧ is fortited to ⟦g-⟧ after ⟦-t⟧, ⟦-c⟧, and ⟦-f⟧. ⟦ħr-⟧ and ⟦ħl-⟧ are fortited analogously.

Metathesis of ⟦t⟧ before ⟦c⟧ or ⟦g⟧

⟦-tc-⟧ and ⟦-tg-⟧ are metathesized to ⟦-ct-⟧ and ⟦-cd-⟧, respectively. Likewise, ⟦-tcr-⟧, ⟦-tcl-⟧, ⟦-tgr-⟧, and ⟦-tgl-⟧ are metathesized to ⟦-ctr-⟧, ⟦-ctl-⟧, ⟦-cdr-⟧, and ⟦-cdl-⟧.

Similar bridges with lenited onsets, such as ⟦-tc·-⟧ and ⟦-tg·r-⟧ are treated analogously, with the resulting onset remaining lenited.

⟦t⟧ is deleted before ⟦cf⟧, ⟦cþ⟧, ⟦cs⟧, ⟦cš⟧, ⟦gv⟧, and ⟦gð⟧, devoicing the last two of these.

Nasal assimilation

For these rules, ⟦m·⟧ is counted as a nasal, even though it is pronounced as a fricative.

⟦-t⟧ before a nasal onset assimilates to ⟦-n⟧.

⟦-c⟧ before a nasal onset assimilates to the pseudo-coda ⟦-ŋ⟧. As a special case, ⟦-cŋ-⟧ is repaired to ⟦-ŋ-⟧ instead.

Denasalization of ⟦ŋ-⟧

After ⟦-s⟧, ⟦-þ⟧, ⟦-rþ⟧, ⟦-f⟧, and ⟦-cþ⟧, ⟦ŋ-⟧ is denasalized to ⟦g-⟧.

Devoicing of ⟦v-⟧ and ⟦ð-⟧

After ⟦-þ⟧, ⟦-rþ⟧, ⟦-t⟧, ⟦-c⟧, ⟦-f⟧, and ⟦-cþ⟧, ⟦v-⟧ devoices to ⟦f-⟧ and ⟦ð-⟧ devoices to ⟦þ-⟧. Additionally, ⟦ð-⟧ is devoiced after ⟦-s⟧.

This process occurs analogously for the onsets ⟦vr-⟧, ⟦vl-⟧, ⟦ðr-⟧, and ⟦ðl-⟧, except that ⟦-þ⟧ is deleted before ⟦vr-⟧ and ⟦vl-⟧ instead. Additionally, ⟦-rþCR-⟧ onsets (with R = ⟦r⟧ or ⟦l⟧ and C = ⟦v⟧ or ⟦ð⟧) are corrected to ⟦-RC-⟧.

As usual, similar rules apply to lenited onsets: ⟦v·⟧ devoices to ⟦f·⟧, and ⟦ð·⟧ is replaced with a copy of the preceding consonant.

Assimilation of ⟦s⟧ after ⟦þ⟧

After a ⟦þ⟧, ⟦s⟧ is replaced with ⟦þ⟧. Additionally, ⟦ss⟧ is coalesced into ⟦þ⟧, unless it is not followed by a consonant and the latter ⟦s⟧ arose from a ⟦ð⟧ in the previous step.

Degemination before another consonant

⟦þ⟧, ⟦t⟧, ⟦c⟧, and ⟦f⟧ are degeminated before another consonant in the onset; for instance, ⟦-ffr-⟧ is corrected to ⟦-fr-⟧, and ⟦-ccs-⟧ is corrected to ⟦-cs-⟧. ⟦td⟧ is degeminated to ⟦d⟧, and ⟦cg⟧ is degeminated to ⟦g⟧.

This rule also applies when the second instance of the degeminated consonant is lenited, in which case the first copy of the consonant is elided: ⟦-tt·l-⟧ → ⟦-t·l-⟧.

Partial coda elision of bridges with ⟦-rþ⟧ and ⟦-cþ⟧ codas

If the coda is ⟦-rþ⟧, then it becomes ⟦-r⟧ before a fricative followed by ⟦r⟧ or ⟦l⟧, or before the onsets ⟦cf-⟧, ⟦cþ-⟧, ⟦cs-⟧, ⟦cš-⟧, ⟦tf-⟧, and ⟦dv-⟧. Before any other two-letter onset, it becomes ⟦-þ⟧.

If the coda is ⟦-cþ⟧, then it is maintained before the onsets ⟦þ-⟧, ⟦š-⟧, ⟦m·-⟧, ⟦t-⟧, ⟦ħ-⟧, ⟦m·-⟧, or ⟦t·-⟧. Before ⟦cf-⟧, ⟦cþ-⟧, ⟦cs-⟧, ⟦cš-⟧, or ⟦tf-⟧, or before any of the onsets consisting of ⟦þ⟧, ⟦š⟧, or ⟦ħ⟧ followed by ⟦r⟧ or ⟦l⟧, the onset loses its first consonant, and ⟦cs-⟧ additionally becomes ⟦þ-⟧. In all other cases, the coda becomes ⟦-þ⟧.

The pseudo-coda ⟦-ŋ⟧

Nasal assimilation might produce the pseudo-coda ⟦-ŋ⟧ instead of an actual (simple) coda. In this case, the preceding vowel becomes ⟦o⟧ for ⟦a o u⟧, ⟦jo⟧ for ⟦e i⟧, ⟦ô⟧ for ⟦â ô⟧, and ⟦jô⟧ for ⟦ê î⟧, with any glides merging with the preceding glide. The pseudo-coda itself becomes ⟦-r⟧.
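The resolution of the pseudo-coda is a simple table lookup, sketched here in Python. The function name `resolve_pseudo_coda` and the tuple-based interface are assumptions made for illustration.

```python
# Vowel before the pseudo-coda -> (glide introduced, resulting vowel)
PSEUDO_CODA_VOWELS = {
    "a": ("", "o"), "o": ("", "o"), "u": ("", "o"),
    "e": ("j", "o"), "i": ("j", "o"),
    "â": ("", "ô"), "ô": ("", "ô"),
    "ê": ("j", "ô"), "î": ("j", "ô"),
}


def resolve_pseudo_coda(medial, vowel):
    """Return (medial, vowel, coda) after resolving a pseudo-coda ŋ."""
    glide, new_vowel = PSEUDO_CODA_VOWELS[vowel]
    # A newly introduced glide merges with an existing one rather than doubling it.
    new_medial = "j" if (medial == "j" or glide == "j") else ""
    return new_medial, new_vowel, "r"


print(resolve_pseudo_coda("", "e"))   # ('j', 'o', 'r')
print(resolve_pseudo_coda("j", "î"))  # ('j', 'ô', 'r')
```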

Concatenation

Concatenating two morphemes invokes repair processes to maintain validity invariants. In addition, there are environments that may naturally (if rarely) occur within a morpheme but are repaired away when created by appending morphemes.

Deduplication, which occurs on concatenation, affects fricatives in the onset position that precede a non-hatted vowel followed by a homophonous manifested grapheme phrase:

The onset ⟦f⟧ or ⟦tf⟧ followed by a non-hatted vowel then ⟦f⟧ or ⟦p·⟧ is replaced with ⟦t⟧.
The onset ⟦þ⟧ or ⟦cþ⟧ followed by a non-hatted vowel then ⟦þ⟧ or ⟦t·⟧ is replaced with ⟦t⟧. In addition, a preceding ⟦þ⟧ or ⟦cþ⟧ coda is replaced with ⟦s⟧, and a preceding ⟦rþ⟧ coda is replaced with ⟦r⟧.
⟦h⟧ followed by a non-hatted vowel then ⟦h⟧ or ⟦c·⟧ is replaced with ⟦p⟧.
⟦v⟧ followed by a non-hatted vowel then ⟦v⟧ or ⟦m·⟧ is replaced with ⟦n⟧.
⟦ð⟧ followed by a non-hatted vowel then ⟦ð⟧ or ⟦d·⟧ is replaced with ⟦ŋ⟧.
⟦ħ⟧ followed by a non-hatted vowel then ⟦ħ⟧ or ⟦g·⟧ is replaced with ⟦g⟧.
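The deduplication rules above can be summarized as a lookup on three pieces of context: the onset, the vowel, and the manifested grapheme phrase that follows. The Python sketch below covers only the replacement of the onset; the accompanying coda changes for ⟦þ⟧ and ⟦cþ⟧ are not modelled, and the name `deduplicate_onset` is hypothetical.

```python
UNHATTED_VOWELS = set("eoaiu")

# (affected onsets, triggering phrases after the vowel, replacement onset)
DEDUPLICATION_RULES = [
    ({"f", "tf"}, {"f", "p·"}, "t"),
    ({"þ", "cþ"}, {"þ", "t·"}, "t"),
    ({"h"}, {"h", "c·"}, "p"),
    ({"v"}, {"v", "m·"}, "n"),
    ({"ð"}, {"ð", "d·"}, "ŋ"),
    ({"ħ"}, {"ħ", "g·"}, "g"),
]


def deduplicate_onset(onset, vowel, following):
    """Return the onset after deduplication, or the onset unchanged."""
    if vowel in UNHATTED_VOWELS:
        for onsets, triggers, replacement in DEDUPLICATION_RULES:
            if onset in onsets and following in triggers:
                return replacement
    return onset


print(deduplicate_onset("v", "a", "v"))  # n
print(deduplicate_onset("v", "â", "v"))  # v (hatted vowel, so no change)
```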

Overall, concatenation invokes the following processes in order:

Any new instances of ⟦j⟧ before ⟦i⟧, ⟦î⟧, or ⟦u⟧ are elided.
Deduplication rules are applied.
Newly formed bridges are canonicalized and repaired.

Note that deduplication happens before any canonicalization; for instance, appending the syllables ⟦reþ⟧ and ⟦eþ⟧ together gives ⟦reþeþ⟧, not ⟦reteþ⟧ (although appending the stem ⟦reþ⟧ to the suffix ⟦eþ⟧ does give ⟦reteþ⟧).

Stem fusion

In Ŋarâþ Crîþ, a stem consists of one or more syllables followed by an onset. In addition, the final onset of a stem must not contain a lenited consonant.

Stem fusion describes a set of related processes on a stem. Stem fusion with a null consonant turns a stem into a word (with a terminal end). Stem fusion with a non-null consonant combines a stem with one of ⟦t⟧, ⟦n⟧, or ⟦þ⟧ into another stem.

To describe stem fusion, we use the following notation:

$𝒮_{x y}$ is the set of all morphemes with start type $x$ and end type $y$ , with $x$ and $y$ being one of $s$ (syllabic), $g$ (glide), $o$ (onset), $n$ (nuclear), or $ω$ (terminal).
- $𝒮_{x y}^{n}$ is the subset of $𝒮_{x y}$ whose elements undergo $n$ cycles from $x$ back to $x$ .
  - Ex: $𝒮_{x x}^{0}$ contains only the empty string for all boundary types $x$ .
  - Ex: $𝒮_{s g}^{0}$ is the set of all initials.
  - Ex: $𝒮_{n o}^{1}$ includes ⟦stafc⟧ but not ⟦þþj⟧ or ⟦tatag⟧.
Given $α \in 𝒮_{x y}$ and $β \in 𝒮_{y z}$ , $α : β \in 𝒮_{x z}$ is the result of appending $α$ and $β$ , performing repair processes as necessary.
- Ex: if $α = 𝖿𝖾𝗏𝖺 \in 𝒮_{s s}$ and $β = 𝗏𝖾 \in 𝒮_{s s}$ , then $α : β = 𝖿𝖾𝗇𝖺𝗏𝖾 \in 𝒮_{s s}$ .
- This operation is also defined for $α \in 𝒮_{x o}$ and $β \in 𝒮_{g z}$ , in which case the glides at the end of $α$ and the start of $β$ are merged.
- $α β$ is the result of appending $α$ and $β$ without performing any repair processes.
$Ι = 𝒮_{s g}^{0}$ is the set of all initials.
$Μ = 𝒮_{g o}^{0} = \{ε_{Μ}, 𝗃\}$ is the set of all glides.
$Ν = 𝒮_{o n}^{0}$ is the set of all vowels.
$Κ = 𝒮_{n s}^{0}$ is the set of all simple codas.
$Ω = 𝒮_{n ω}^{0}$ is the set of all codas, simple or complex.
$Γ = 𝒮_{n g}^{0}$ is the set of all coda–onset pairs. The glide is not included because stems ending in ⟦j⟧ are treated specially in stem fusion.
$Π$ is the set of effective plosives and fricatives – that is, the set of consonants that can form an initial when followed by ⟦l⟧ or ⟦r⟧.
$Τ \subset 𝒮_{s o} ∖ 𝒮_{s o}^{0}$ is the set of valid stems. A stem must contain at least one syllable, and its final onset must not contain a lenited consonant.

Given a stem $τ \in Τ$ , $τ^{ε} \in 𝒮_{s ω}$ is the result of fusing $τ$ with a null consonant, and $τ^{θ} \in 𝒮_{s o}$ is the result of fusing $τ$ with a non-null consonant $θ \in \{𝗇, 𝗍, þ\}$ .

The following variables are used: $Σ_{x y} \in 𝒮_{x y}$ , $γ \in Γ$ , $ι \in Ι$ , $ν \in Ν$ , $κ \in Κ$ , $ω \in Ω$ , $θ \in \{𝗇, 𝗍, þ\}$ .

Earlier rules take precedence over later ones. In addition, we use a shorthand for rules that yield a common sequence of syllables regardless of the fusion consonant: a rule such as $τ ⇝ σ$ , where $τ \in Τ$ and $σ \in 𝒮_{s s}$ , is interpreted as the rules $τ^{ε} = σ_{s ω}$ and $τ^{θ} = σ : θ$ .

Another shorthand used in this document is $τ ↷ τ^{'}$ , which implies $τ^{ε} = (τ^{'})^{ε}$ and $τ^{θ} = (τ^{'})^{θ}$ .

TODO: The rules for stem fusion are not finalized and therefore are temporarily omitted.

Properties of stem fusion

Fusion with ⟦t⟧ is invariant (i.e. yields the same stem as the original) only when the final onset of the stem is ⟦t-⟧.

Fusion with ⟦n⟧ is invariant only when the final bridge of the stem is ⟦-nn-⟧.

Fusion with ⟦þ⟧ is invariant only when the final onset of the stem is ⟦þ-⟧ or ⟦cþ-⟧.

Layer 2s

TODO: deal with complex codas before a clitic boundary

Traditionally, only manifested grapheme phrases are considered to be significant in the conversion from layer 1 to layer 2s. However, other graphemes such as punctuation can affect prosody.

MGPs	IPA	MGPs	IPA
c	k	p	p
e	e	t	t
n nd	n	č	t͡ʂ
ŋ ŋg	ŋ	î	ì
v m· vp	v	j	j
o	o	i	i
s	s	d dt	d
þ t·	θ	ð d· ðþ	ð
š č·	ʂ	h c·	x
r	ɹ	ħ g·	ʕ
l lł	l	ê	è
ł	ɬ	ô	ò
m mp	m	â	à
a	a	u	u̜
f p·	f	f· v· ð·	∅
g gc	ɡ

Table 5: Layer 1 to layer 2s conversions.
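Table 5 can be applied mechanically once a word has been split into manifested grapheme phrases. The Python sketch below does both steps for a single word; `to_phonemic` is a hypothetical name, the input is assumed to contain only true letters and lenition marks, and the eclipsed digraph ⟦vf⟧ is left out because Table 5 does not list it.

```python
# Manifested grapheme phrase -> phoneme, transcribed from Table 5.
IPA = {
    "c": "k", "e": "e", "n": "n", "nd": "n", "ŋ": "ŋ", "ŋg": "ŋ",
    "v": "v", "m·": "v", "vp": "v", "o": "o", "s": "s",
    "þ": "θ", "t·": "θ", "š": "ʂ", "č·": "ʂ", "r": "ɹ",
    "l": "l", "lł": "l", "ł": "ɬ", "m": "m", "mp": "m", "a": "a",
    "f": "f", "p·": "f", "g": "ɡ", "gc": "ɡ",
    "p": "p", "t": "t", "č": "t͡ʂ", "î": "ì", "j": "j", "i": "i",
    "d": "d", "dt": "d", "ð": "ð", "d·": "ð", "ðþ": "ð",
    "h": "x", "c·": "x", "ħ": "ʕ", "g·": "ʕ",
    "ê": "è", "ô": "ò", "â": "à", "u": "u̜",
    "f·": "", "v·": "", "ð·": "",  # silent
}
# Word-initial eclipsed digraphs that Table 5 covers.
ECLIPSED = {"mp", "vp", "dt", "nd", "gc", "ŋg", "ðþ", "lł"}


def to_phonemic(word):
    """Convert a romanized layer-1 word to a layer-2s transcription."""
    phonemes = []
    i = 0
    if word[:2] in ECLIPSED:
        phonemes.append(IPA[word[:2]])
        i = 2
    while i < len(word):
        pair = word[i:i + 2]
        if pair.endswith("·") and pair in IPA:  # lenited letter
            phonemes.append(IPA[pair])
            i += 2
        else:
            phonemes.append(IPA[word[i]])  # KeyError for anything not in Table 5
            i += 1
    return "/" + "".join(phonemes) + "/"


print(to_phonemic("tanca"))   # /tanka/
print(to_phonemic("d·enfo"))  # /ðenfo/
```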

Layer 2s has a two-way tone contrast between vowels: the high tone (H) is the default, being contrasted with the low tone (L). For historical reasons, the presence or absence of a low tone on a vowel is called [±creaky].

Layer 3s

The conversion from layer 2s to layer 3s is comparatively more complex.

First, the following changes are made:

kθ → x͡θ
ʕ → ħ / V[+creaky] _
n → m / _ C[+labial]
n → ɱ / _ C[+labiodental]
n → n̪ / _ C[+dental]
n → ɳ / _ C[+retroflex]
n C₁[+velar] → ɲ C₁[+palatal]
n → ŋ / _ C[+lateral] V[+front]
sʂ → ʂː
C₁={ɹ, ɬ} → w / C₁V _
l → ɾ / V[+back] _ V
θ → θ̠ / s_, _s, _ʂ
ʂj → ʃ
ʂ → ʃ / _ i
t͡ʂj → t͡ʃ
t͡ʂ → t͡ʃ / _ i
C₁[+voiced] → C₁[−voiced, −aspirated] / C₂[−voiced]

Plosives in a coda are unreleased. All unvoiced plosives and affricates outside of a coda are aspirated.

While Ŋarâþ Crîþ has two tone levels phonemically, their realization at the phonetic level is more complex. It is common to describe phonetic tone using seven levels, from 0 (the lowest) to 6 (the highest). Each syllable has one or more tones.

In order to describe tone, we must introduce the concept of “stress”, which is placed according to the following rules:

Syllables with a high tone have a priority over syllables with a low tone – that is, a syllable with a low tone will be selected only if the word in question has only low-tone syllables.
If the coda of the final syllable is either empty, or it consists of only [s] or [n], then the syllables are chosen in the order 2nd-to-last → 3rd-to-last → last → 4th-to-last → … → first.
If the coda of the final syllable is a complex coda, then the syllables are chosen in the order last → 3rd-to-last → 2nd-to-last → 4th-to-last → … → first.
If the coda is anything else, then the syllables are chosen from end to start: last → 2nd-to-last → 3rd-to-last → … → first.
Monosyllabic function words generally lack any stressed syllable.
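The stress rules can be read as a priority ordering over syllables, filtered by tone. The Python sketch below is one interpretation and makes several assumptions: the final coda is passed in romanized letters, positions that do not exist in a short word are skipped, and the exception for monosyllabic function words is not modelled. The names `stress_order` and `stressed_syllable` are hypothetical.

```python
COMPLEX_CODAS = {"st", "lt", "nt", "ns", "ls", "nþ", "lþ", "m"}


def stress_order(syllable_count, final_coda):
    """Return 0-based syllable indices in order of stress priority."""
    # Positions are counted from the end of the word: 1 = last syllable.
    if final_coda in ("", "s", "n"):
        head = [2, 3, 1]
    elif final_coda in COMPLEX_CODAS:
        head = [1, 3, 2]
    else:
        head = [1, 2, 3]
    from_end = head + list(range(4, syllable_count + 1))
    # Assumption: positions beyond the start of a short word are skipped.
    return [syllable_count - k for k in from_end if k <= syllable_count]


def stressed_syllable(tones, final_coda):
    """Pick the stressed syllable given per-syllable tones ('H' or 'L')."""
    order = stress_order(len(tones), final_coda)
    for index in order:
        if tones[index] == "H":
            return index  # high-tone syllables take priority
    return order[0]  # the word has only low-tone syllables


print(stressed_syllable(["H", "H"], ""))  # 0 (second-to-last syllable)
```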

We also introduce the concept of a tone accounting unit (TAU), which is the level at which tones are realized. That is, the tone of a syllable depends only on the contents of the TAU in which it lies. Instances of content words occupy different TAUs from each other, but some function words occupy the same TAU as the preceding or following word (in particular, such words have no stressed syllable and are confined to a relatively fixed position):

Head particles, nominalized verb particles, and monosyllabic determiners occupy the same TAU as the following word.
⟨so⟩, monosyllabic relationals ... occupy the same TAU as the preceding word.

(Stress is accounted by orthographic word, not by TAU.)

First, two adjacent vowels are fused into a diphthong if the vowels are not identical, the first vowel is stressed, the second vowel is [i] or [u̜], and the syllable to which the second vowel belongs can be interpreted as having an empty coda. For purposes of tonekeeping, a diphthong is considered to be composed of two different syllables.

In general, unstressed H and L syllables have tone levels 4 and 2, respectively; stressed H and L syllables have tone levels 5 and 1. However, an open H or L syllable before a stressed syllable gets level 3 or 1, respectively, instead. Diphthongs get different values: 65 for HH, 53 for HL, 13 for LH, and 21 for LL.

If two adjacent copies of an identical vowel have the same tone level at this stage, then the one closer to the stressed syllable rises by one tone level and the one farther from it falls by one level.

A tone level of n is then changed into a tone contour in the following situations, unless doing so would result in an out-of-bounds tone level:

n to (n : n + 1): when the coda is [st] or [x͡θ]
n to (n : n − 1): when the coda is [rθ] or [ns]
n to (n + 1 : n): when the nucleus is preceded by two or more voiceless consonants

In addition, other syllables change their tone levels:

Raise the tone level by 1 (if it is not already 6) if the coda is a voiceless fricative, or if the coda is [x͡θ].
Lower the tone level by 1 if the coda is [ɹ].
Lower the tone level by 1 if the coda is a nasal followed by a voiced obstruent or nasal.

Finally, if all tones have a level of 4 or higher, then the lowest tone (breaking ties by preferring later tones) is lowered to 3, and all other tones in the same syllable are lowered by the same amount. All level-3 tones are then lowered to level 2.

Isochrony

The isochrony of Ŋarâþ Crîþ falls somewhere between syllable and mora timing, where:

The body of a syllable is always 1 unit long.
The coda of a syllable is between 0 and 1 unit long, with the hierarchy /t, k < n < l, ɹ < f, s, θ, ɹθ, kθ < st, lt, ns, ls, nθ/.
Codas are shortened after two consecutive vowels: for instance, the ⟨l⟩ in ⟨moriel⟩ is pronounced for less time than that in ⟨mjarel⟩.

Mutations

Ŋarâþ Crîþ has two kinds of initial mutations: lenition and eclipsis. Neither kind of mutation has any effect on plosive-fricative onsets or any of ⟦r l n ŋ ħ⟧.

Lenition tends to turn plosives into fricatives and is indicated with a middle dot ⟦·⟧ after the consonant affected. In particular, it affects ⟦p t d č c g m f v ð⟧. (See Layer 2s for pronunciation details.) Partial lenition does not affect any of ⟦f v ð⟧; that is, it does not lenite consonants that would become silent. Unless otherwise qualified, lenition refers to total lenition, which affects ⟦f v ð⟧.

In a word containing ⟦&⟧, both instances of the reduplicated prefix are lenited. For example, ⟨&d·enfo⟩ can be pronounced as [ðeðenfo] but not as *[ðedenfo].
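Word-initial lenition can be sketched as follows in Python. This is a simplified illustration: `lenite` is a hypothetical name, the word is assumed to start directly with a true letter (no markers and no ⟦&⟧), and lenition of non-initial onsets is not covered.

```python
LENITION_MARK = "·"
LENITABLE = set("ptdčcgmfvð")
SILENT_WHEN_LENITED = set("fvð")
# Mutations have no effect on plosive-fricative onsets.
PLOSIVE_FRICATIVE_ONSETS = ("cf", "cþ", "cs", "cš", "gv", "gð", "tf", "dv")


def lenite(word, partial=False):
    """Lenite the first consonant of a word, if it can be lenited."""
    if word.startswith(PLOSIVE_FRICATIVE_ONSETS):
        return word
    first = word[:1]
    if first not in LENITABLE:
        return word
    if partial and first in SILENT_WHEN_LENITED:
        return word  # partial lenition never silences a consonant
    if word[1:2] == LENITION_MARK:
        return word  # already lenited
    return first + LENITION_MARK + word[1:]


print(lenite("denfo"))              # d·enfo
print(lenite("fen", partial=True))  # fen
```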

Lenition occurs in the following environments:

On the stem in abessive forms of nouns in paradigms 7, 8, 9, 10, and 13
On a noun modified by ⟨šinen⟩ or ⟨nemen⟩ when used as determiners, if that noun is not a form of ⟨ðên⟩
Partially, on a noun modified by ⟨ruf⟩ not immediately following it
Partially, on a noun modified by ⟨mê⟩ immediately preceding it
On a terrestrial noun modified by a participle-form verb belonging to a Type I genus
To a dative-case nominalized verb phrase as explained in Nominalized forms
Partially, on a verb when receiving the comparative prefixes ⟦mir-⟧ or ⟦ła-⟧
On a classifier attached to the numeral ⟨ces⟩ or any numeral ending in ⟨ħas⟩ or ⟨sreþas⟩
On the second item of a compound noun, if it is neither terrestrial nor a form of ⟨vês⟩
On a verb with the cessative prefix ⟦car-⟧ or the terminative prefix ⟦er-⟧

Eclipsis tends to add voice to voiceless consonants and change voiced stops into nasals. It is indicated by prefixing a consonant: ⟦t d c g f þ ł⟧ become ⟦dt nd gc ŋg vf ðþ lł⟧, respectively. ⟦p⟧ becomes ⟦vp⟧ before any of ⟦i e u î ê⟧ and ⟦mp⟧ elsewhere. If a word starts with a vowel, then it is eclipsed by prefixing ⟦g⟧.

In a word containing ⟦&⟧, only the first instance of the reduplicated prefix is eclipsed. For example, ⟨n&denfin⟩ can be pronounced as [nedenfin] but not as *[nenenfin].
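Eclipsis of a plain word can be sketched in the same way. The Python code below is illustrative only: `eclipse` is a hypothetical name, the word is assumed to start directly with a true letter (no markers and no ⟦&⟧), and the choice between ⟦vp⟧ and ⟦mp⟧ is assumed to depend on the letter immediately after ⟦p⟧.

```python
ECLIPSIS = {
    "t": "dt", "d": "nd", "c": "gc", "g": "ŋg",
    "f": "vf", "þ": "ðþ", "ł": "lł",
}
VOWELS = set("eoaîiêôâu")
VP_VOWELS = set("ieuîê")  # ⟦p⟧ becomes ⟦vp⟧ before these, ⟦mp⟧ elsewhere
# Mutations have no effect on plosive-fricative onsets.
PLOSIVE_FRICATIVE_ONSETS = ("cf", "cþ", "cs", "cš", "gv", "gð", "tf", "dv")


def eclipse(word):
    """Apply eclipsis to the start of a word."""
    if not word or word.startswith(PLOSIVE_FRICATIVE_ONSETS):
        return word
    first, rest = word[0], word[1:]
    if first in VOWELS:
        return "g" + word
    if first == "p":
        return ("vp" if rest[:1] in VP_VOWELS else "mp") + rest
    # Consonants without an eclipsed form are left unchanged.
    return ECLIPSIS.get(first, first) + rest


print(eclipse("tanca"))  # dtanca
print(eclipse("pen"))    # vpen
```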

Eclipsis occurs in the following environments:

On the genitive dual, plural, and singulative forms of nouns
On a noun modified by ⟨lê⟩ or ⟨tê⟩ immediately preceding it
On a noun modified by ⟨dân⟩
On a finite form of a verb or relational with perfective aspect
To a locative, instrumental, or abessive-case nominalized verb phrase that is not an object of a modifying relational, as explained in Nominalized forms
On a short numeral modified by ⟨ceþe⟩

Lenition can happen on any syllabic onset of a word, but eclipsis is limited to word-initial positions.

In this documentation, lenition is sometimes marked with an empty circle ○, and eclipsis with a filled circle ●. Partial lenition is marked with an empty triangle △.

Loanwords

Almost all loanwords in Ŋarâþ Crîþ are nouns.

Generally, when borrowing from languages that use the Cenvos script or a script related to it, and whose orthographies in the script in question do not deviate too far from Ŋarâþ Crîþ usage, Ŋarâþ Crîþ prefers to borrow the word graphemically rather than phonemically.

The typography of Ŋarâþ Crîþ

TODO: This section has not changed and is therefore omitted.