Project Elaine
Reforming Ŋarâþ Crîþ morphophonology.
The current state of Ŋarâþ Crîþ morphophonology
The phonotactics of Ŋarâþ Crîþ distinguish between simple and complex codas, where complex codas may occur only at the end of a word. If morphology causes a complex coda to occur word-medially (such as by adding a suffix), and the bridge containing this coda cannot be reinterpreted as one without a complex coda, then a set of coda simplification rules is applied to remove the word-medial complex coda.
During Project Caladrius, it was observed that some bridges, such as ⟦-cð-⟧ and ⟦-fp-⟧, rarely occur, to the degree that stem fusion with stems ending in such bridges is difficult to define. During Ŋarâþ Crîþ Corner #0x00, I proposed restricting the set of valid bridges to a subset of all coda–onset combinations. To find which bridges are worth saving, I decided to analyze the frequencies of bridges in a corpus of ŊCv9 text, but this has not yet provided much insight.
Originally, Project Caladrius was meant to revise the noun declension system. However, the proposal of an operation that would give a stem a valid coda so that a consonant-initial suffix could be attached to it, and the refinement of that idea into the current notion of stem fusion, expanded the scope of the project and made me question whether the current morphophonology of Ŋarâþ Crîþ was adequate.
Burning bridges
If a statistical analysis is not insightful, then reasoning from first principles to find which bridges should be valid might be. Here are the first three principles that come to mind:
- Lenition in the onset does not affect the validity of a bridge.
- The presence of ⟦j⟧ in the onset does not affect the validity of a bridge.
- All bridges that can be interpreted as having a coda that is null, ⟦-r⟧, or ⟦-l⟧ are valid.
That is, if ⟦-lt-⟧ is a valid bridge, then ⟦-lt·-⟧, ⟦-ltj-⟧, and ⟦-lt·j-⟧ are valid as well. Note that this does not say anything about the validity of the homophonous ⟦-lþ-⟧. Indeed, there might exist invalid bridges that are homophonous with valid bridges.
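The three principles above can be read as a closure over a smaller table of "base" bridges. The following is a minimal sketch of that reading, not the f9i implementation. The names `base_onset`, `is_valid_bridge`, and `valid_base` are hypothetical, the bridge is assumed to be already split into one particular coda and onset, and ⟦j⟧ is assumed to appear last in the onset.

```python
LENITION = "·"  # lenition mark as written in this document

def base_onset(onset: str) -> str:
    """Strip lenition marks and a trailing ⟦j⟧ from an onset (principles 1 and 2)."""
    stripped = onset.replace(LENITION, "")
    if stripped.endswith("j"):
        stripped = stripped[:-1]
    return stripped

def is_valid_bridge(coda: str, onset: str, valid_base: set[tuple[str, str]]) -> bool:
    """Check validity of one (coda, onset) parse under the three principles only.

    `valid_base` is a hypothetical table of valid (coda, plain onset) pairs;
    its contents are not specified at this point in the text.
    """
    # Principle 3: a null, ⟦-r⟧, or ⟦-l⟧ coda always gives a valid bridge.
    if coda in ("", "r", "l"):
        return True
    # Principles 1 and 2: lenition and ⟦j⟧ in the onset are ignored.
    return (coda, base_onset(onset)) in valid_base
```

Because homophonous bridges are looked up under their own spelling, a table built this way can mark ⟦-lt-⟧ as valid without implying anything about ⟦-lþ-⟧.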
After that, there are two ways to approach the problem:
- The synchronic approach: find out which bridges don’t sound too awkward and keep those.
- The diachronic approach: assume that all coda–onset combinations were once valid bridges, then use sound changes to narrow the set. This has the advantage of informing us how illegal bridges created by affixation are treated.
Odds and ends
Project Caladrius has formalized the notion of a stem – namely, as a sequence of one or more syllables plus an onset. That is, the beginning of the stem marks a syllable boundary, while the end of a stem marks the boundary between an onset and the following nucleus.
Some operations are possible on some types of ends but not others. For instance, ending a word or adding a suffix such as ⟦-ten⟧ is not possible on an onset end, but adding a suffix such as ⟦-a⟧ is. (In this case, stem fusion is supposed to provide equivalents of the first two operations on an onset end.)
Affixes also have starts and ends. Prefixes always have syllabic starts, and suffixes always have syllabic ends. A sequence of letters making an affix might be compatible with multiple types of starts or ends; for instance, ⟦-a⟧ may have either a syllabic or onset start, depending on the context.
Additionally, there is another type of end called a terminal end, to which morphemes with syllabic starts cannot be attached. This end is produced by the presence of a complex coda at the end; that is, word-medial complex codas are never produced in the first place.
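The compatibility between ends and starts described in this section can be sketched as a small type check. This is only an illustration under stated assumptions: the `Boundary` enum and `can_attach` function are hypothetical names, only the three boundary types introduced so far are modelled, and the rule that a start attaches to an end of the same type is an assumption drawn from the examples with ⟦-ten⟧ and ⟦-a⟧.

```python
from enum import Enum

class Boundary(Enum):
    """Boundary types introduced so far (hypothetical modelling)."""
    SYLLABIC = "syllabic"
    ONSET = "onset"
    TERMINAL = "terminal"

def can_attach(end: Boundary, start: Boundary) -> bool:
    """Return True if a morpheme with the given start can follow the given end."""
    # Nothing can be attached to a terminal end (a word-final complex coda).
    if end is Boundary.TERMINAL:
        return False
    # Assumption: otherwise the start type must match the end type.
    return end is start
```

Under this sketch, a suffix such as ⟦-a⟧ would be listed with both an onset start and a syllabic start, and the caller would pick whichever one matches the end at hand.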
Detailed rules for valid bridges
We reproduce here the three initial rules proposed above, with some elaborations:
- Lenition in the onset does not affect the validity of a bridge. (This rule, however, does not establish a further relationship between the results of resolving two bridges that differ only by lenition in the onset. It also does not prevent lenition from affecting canonicality.)
- The presence of ⟦j⟧ in the onset does not affect the validity of a bridge.
- All bridges that can be interpreted as having a coda that is null, ⟦-r⟧, or ⟦-l⟧ are valid.
- New: If a bridge with a complex initial I is valid, then the bridge with an initial containing only the first consonant of I is also valid.
The following subsections describe situations in which bridges are invalid and must be repaired during concatenation. A bridge that is repaired might need further repair by later rules.
Importantly, bridge repair might cause earlier segments to change, and it is not idempotent: ⟦-sð-⟧ is repaired to ⟦-ss-⟧, but ⟦-ss-⟧ is repaired to ⟦-þ-⟧.
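To make the non-idempotence concrete, here is a tiny sketch that encodes only the two repairs named in the sentence above as a lookup table. The names `SINGLE_STEP_REPAIRS` and `repair_once` are hypothetical, and the real repair process is context-sensitive rather than a flat table.

```python
# Only the two repairs quoted in the text; bridges are written without hyphens.
SINGLE_STEP_REPAIRS = {"sð": "ss", "ss": "þ"}

def repair_once(bridge: str) -> str:
    """Apply a single repair pass to a bridge, leaving unknown bridges alone."""
    return SINGLE_STEP_REPAIRS.get(bridge, bridge)

once = repair_once("sð")     # "ss"
twice = repair_once(once)    # "þ"
```

Since `once` and `twice` differ, running the repair again on an already repaired bridge is not a no-op, so a caller has to repair exactly once per join rather than "until nothing changes".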
Also note that ⟦-cþ⟧ is now promoted to a simple coda.
Coalescence of ⟦-tš-⟧
The bridge ⟦-tš-⟧ is changed to ⟦-č-⟧.
Fortition of ⟦h-⟧ and ⟦ħ-⟧
The onset ⟦h-⟧ is fortited to ⟦c-⟧ after ⟦-s⟧, ⟦-þ⟧, ⟦-rþ⟧, ⟦-t⟧, ⟦-c⟧, ⟦-f⟧, or ⟦-cþ⟧. ⟦hr-⟧ and ⟦hl-⟧ are fortited analogously.
The onset ⟦ħ-⟧ is fortited to ⟦g-⟧ after ⟦-t⟧, ⟦-c⟧, and ⟦-f⟧. ⟦ħr-⟧ and ⟦ħl-⟧ are fortited analogously.
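The two fortition rules can be sketched as a function of the coda and the onset. This is a minimal illustration, not the f9i code: `fortite` and the two trigger sets are hypothetical names, and "fortited analogously" is read as replacing only the first consonant of the onset.

```python
# Codas that trigger fortition, as listed in the two rules above.
H_TRIGGERS = {"s", "þ", "rþ", "t", "c", "f", "cþ"}
HBAR_TRIGGERS = {"t", "c", "f"}

def fortite(coda: str, onset: str) -> str:
    """Return the onset after fortition of ⟦h-⟧ or ⟦ħ-⟧, if the coda triggers it."""
    if onset in ("h", "hr", "hl") and coda in H_TRIGGERS:
        return "c" + onset[1:]
    if onset in ("ħ", "ħr", "ħl") and coda in HBAR_TRIGGERS:
        return "g" + onset[1:]
    return onset
```

The output may itself be an invalid bridge (for example, a ⟦-t⟧ coda followed by a fortited ⟦c-⟧), which is why later rules are still applied to the result.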
Metathesis of ⟦t⟧ before ⟦c⟧ or ⟦g⟧
⟦-tc-⟧ and ⟦-tg-⟧ are metathesized to ⟦-ct-⟧ and ⟦-cd-⟧, respectively. Likewise, ⟦-tcr-⟧, ⟦-tcl-⟧, ⟦-tgr-⟧, and ⟦-tgl-⟧ are metathesized to ⟦-ctr-⟧, ⟦-ctl-⟧, ⟦-cdr-⟧, and ⟦-cdl-⟧.
Similar bridges with lenited onsets, such as ⟦-tc·-⟧ and ⟦-tg·r-⟧, are treated analogously, with the resulting onset remaining lenited.
⟦t⟧ is deleted before ⟦cf⟧, ⟦cþ⟧, ⟦cs⟧, ⟦cš⟧, ⟦gv⟧, and ⟦gð⟧, devoicing the last two of these.
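The metathesis part of this rule (not the deletion cases in the last sentence) can be sketched as follows. `metathesize` is a hypothetical name, the bridge is assumed to be pre-split into coda and onset, and the placement of the lenition mark in the result (directly after the new first consonant) is an assumption, since the text only says that the onset remains lenited.

```python
def metathesize(coda: str, onset: str) -> tuple[str, str]:
    """Metathesize ⟦-t⟧ with a following ⟦c⟧ or ⟦g⟧; return (coda, onset)."""
    if coda != "t" or onset[:1] not in ("c", "g"):
        return coda, onset
    rest = onset[1:]
    lenited = rest.startswith("·")
    if lenited:
        rest = rest[1:]
    # Only the plain onsets and those followed by ⟦r⟧ or ⟦l⟧ are covered here.
    if rest not in ("", "r", "l"):
        return coda, onset
    # ⟦tc⟧ becomes ⟦ct⟧ and ⟦tg⟧ becomes ⟦cd⟧: the coda is always ⟦c⟧.
    new_first = "t" if onset[0] == "c" else "d"
    return "c", new_first + ("·" if lenited else "") + rest
```

Onsets such as ⟦cf-⟧ fall through unchanged here and are left to the deletion rule.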
Nasal assimilation
For these rules, ⟦m·⟧ is counted as a nasal, even though it is pronounced as a fricative.
⟦-t⟧ before a nasal onset assimilates to ⟦-n⟧.
⟦-c⟧ before a nasal onset assimilates to ⟦-ŋ⟧. This is further corrected: ⟦-aŋ⟧, ⟦-oŋ⟧, and ⟦-uŋ⟧ to ⟦-or⟧, and ⟦-eŋ⟧ and ⟦-iŋ⟧ to ⟦-jor⟧ (coalescing the ⟦j⟧ if necessary), and analogously with the hatted vowels. As a special case, ⟦-cŋ-⟧ is repaired to ⟦-ŋ-⟧ instead.
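A rough sketch of the nasal assimilation rules, operating on a syllable split into glide, vowel, coda, and following onset. All names are hypothetical. Two assumptions are made: a nasal onset is detected by its first letter being ⟦n⟧, ⟦m⟧, or ⟦ŋ⟧ (which also covers ⟦m·⟧), and "analogously with the hatted vowels" is read as producing ⟦-ôr⟧ and ⟦-jôr⟧.

```python
# Vowel outcomes when ⟦-c⟧ assimilates; hatted outcomes are an assumption.
BACK = {"a": "o", "o": "o", "u": "o", "â": "ô", "ô": "ô", "û": "ô"}
FRONT = {"e": "o", "i": "o", "ê": "ô", "î": "ô"}

def assimilate(glide: str, vowel: str, coda: str, onset: str):
    """Return (glide, vowel, coda, onset) after nasal assimilation."""
    if onset[:1] not in ("n", "m", "ŋ"):
        return glide, vowel, coda, onset
    if coda == "t":
        return glide, vowel, "n", onset
    if coda == "c":
        if onset == "ŋ":
            # Special case: ⟦-cŋ-⟧ is repaired to ⟦-ŋ-⟧.
            return glide, vowel, "", onset
        if vowel in BACK:
            return glide, BACK[vowel], "r", onset
        if vowel in FRONT:
            # Setting the glide to ⟦j⟧ also coalesces an existing ⟦j⟧.
            return "j", FRONT[vowel], "r", onset
    return glide, vowel, coda, onset
```

For instance, `assimilate("", "e", "c", "m")` gives `("j", "o", "r", "m")`, which corresponds to ⟦-ec⟧ before ⟦m-⟧ surfacing as ⟦-jor⟧.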
Denasalization of ⟦ŋ-⟧
After ⟦-s⟧, ⟦-þ⟧, ⟦-rþ⟧, ⟦-f⟧, and ⟦-cþ⟧, ⟦ŋ-⟧ is denasalized to ⟦g-⟧.
Devoicing of ⟦v-⟧ and ⟦ð-⟧
After ⟦-þ⟧, ⟦-rþ⟧, ⟦-t⟧, ⟦-c⟧, ⟦-f⟧, and ⟦-cþ⟧, ⟦v-⟧ devoices to ⟦f-⟧ and ⟦ð-⟧ devoices to ⟦þ-⟧. Additionally, ⟦ð-⟧ is devoiced after ⟦-s⟧.
This process occurs analogously for the onsets ⟦vr-⟧, ⟦vl-⟧, ⟦ðr-⟧, and ⟦ðl-⟧, except that ⟦-þ⟧ is deleted before ⟦vr-⟧ and ⟦vl-⟧ instead. Additionally, ⟦-rþCR-⟧ onsets (with R = ⟦r⟧ or ⟦l⟧ and C = ⟦v⟧ or ⟦ð⟧) are corrected to ⟦-RC-⟧.
As usual, similar rules apply to lenited onsets: ⟦v·⟧ devoices to ⟦f·⟧, and ⟦ð·⟧ is replaced with a copy of the preceding consonant.
Assimilation of ⟦s⟧ after ⟦þ⟧
After a ⟦þ⟧, ⟦s⟧ is replaced with ⟦þ⟧. Additionally, ⟦ss⟧ is coalesced into ⟦þ⟧, unless it is not followed by a consonant and the latter ⟦s⟧ arose from a ⟦ð⟧ in the previous step.
Degemination before another consonant
⟦þ⟧, ⟦t⟧, ⟦c⟧, and ⟦f⟧ are degeminated before another consonant in the onset; for instance, ⟦-ffr-⟧ is corrected to ⟦-fr-⟧, and ⟦-ccs-⟧ is corrected to ⟦-cs-⟧. ⟦td⟧ is degeminated to ⟦d⟧, and ⟦cg⟧ is degeminated to ⟦g⟧.
This rule also applies when the second instance of the degeminated consonant is lenited, in which case the first copy of the consonant is elided: ⟦-tt·l-⟧ → ⟦-t·l-⟧.
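The degemination rule can be sketched as dropping the coda when it duplicates the first consonant of a multi-consonant onset. This is an illustration under assumptions: `degeminate` is a hypothetical name, only single-consonant codas are handled, ⟦td⟧ and ⟦cg⟧ are assumed to be subject to the same "before another consonant" condition, and a trailing ⟦j⟧ is assumed not to count as the other consonant.

```python
DEGEMINABLE = {"þ", "t", "c", "f"}
VOICED_PAIRS = {("t", "d"), ("c", "g")}  # ⟦td⟧ → ⟦d⟧, ⟦cg⟧ → ⟦g⟧

def degeminate(coda: str, onset: str) -> tuple[str, str]:
    """Elide the coda if it geminates with a multi-consonant onset."""
    letters = onset.replace("·", "")
    if letters.endswith("j"):
        letters = letters[:-1]
    if len(letters) < 2:
        return coda, onset  # no other consonant follows in the onset
    first = letters[0]
    if (coda in DEGEMINABLE and first == coda) or (coda, first) in VOICED_PAIRS:
        # The first copy (the coda) is elided; the onset keeps any lenition.
        return "", onset
    return coda, onset
```

With this reading, `degeminate("t", "t·l")` returns `("", "t·l")`, matching the ⟦-tt·l-⟧ → ⟦-t·l-⟧ example.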
Partial coda elision of bridges with ⟦-rþ⟧ and ⟦-cþ⟧ codas
If the coda is ⟦-rþ⟧, then it becomes ⟦-r⟧ before a fricative followed by ⟦r⟧ or ⟦l⟧, or before the onsets ⟦cf-⟧, ⟦cþ-⟧, ⟦cs-⟧, ⟦cš-⟧, ⟦tf-⟧, and ⟦dv-⟧. Before any other two-letter onset, it becomes ⟦-þ⟧.
If the coda is ⟦-cþ⟧, then it is maintained before the onsets ⟦þ-⟧, ⟦š-⟧, ⟦m·-⟧, ⟦t-⟧, ⟦ħ-⟧, or ⟦t·-⟧. Before ⟦cf-⟧, ⟦cþ-⟧, ⟦cs-⟧, ⟦cš-⟧, or ⟦tf-⟧, or before any of the onsets consisting of ⟦þ⟧, ⟦š⟧, or ⟦ħ⟧ followed by ⟦r⟧ or ⟦l⟧, the onset loses its first consonant, and ⟦cs-⟧ additionally becomes ⟦þ-⟧. In all other cases, the coda becomes ⟦-þ⟧.
A revised view of the layer-0-to-1 pipeline
Morphemes are no longer viewed as raw MGP sequences; rather, they are viewed structurally. Some constraints apply to all morphemes:
- all codas that do not end a word must be simple
- all bridges must be valid
- there are no instances of ⟦ji⟧, ⟦jî⟧, or ⟦ju⟧
When morphemes are joined together, repair processes must maintain these invariants at the join boundary. Morphemes that end in complex codas are simply forbidden from having anything appended to them. The second invariant is maintained by applying bridge correction when two morphemes are joined at a syllable boundary, and the third invariant is maintained by eliding the ⟦j⟧ if necessary when joining at an onset boundary.
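As an illustration of how the third invariant might be maintained, the following sketch joins two morphemes at an onset boundary and elides a stem-final ⟦j⟧ when needed. `join_at_onset` is a hypothetical name, morphemes are represented as plain strings purely for brevity (the text views them structurally), and the strings in the usage note are made-up placeholders rather than real Ŋarâþ Crîþ morphemes.

```python
# Vowels that may not follow ⟦j⟧, per the third invariant.
NO_J_BEFORE = ("i", "î", "u")

def join_at_onset(stem: str, suffix: str) -> str:
    """Join a stem ending in an onset to a vowel-initial suffix.

    Elides a stem-final ⟦j⟧ if the join would create ⟦ji⟧, ⟦jî⟧, or ⟦ju⟧.
    """
    if stem.endswith("j") and suffix[:1] in NO_J_BEFORE:
        stem = stem[:-1]
    return stem + suffix
```

For example, `join_at_onset("tasj", "u")` yields `"tasu"`, while `join_at_onset("tasj", "a")` keeps the glide and yields `"tasja"`.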
In addition, there are environments that may naturally occur within a morpheme but are repaired away when created by appending morphemes. For now, these include deduplication targets such as ⟦-vav-⟧ and ⟦-ðoð-⟧, which arise when joining morphemes at a glide, onset, or nuclear boundary.
Complex coda simplification will no longer need to occur from layer 0 to 1, since complex codas cannot have anything appended to them in the first place.
More formally, we adopt the following notation:
- is the set of all morphemes with start type and end type , with and being one of (syllabic), (glide), (onset), (nuclear), or (terminal).
- is the subset of whose elements undergo cycles from back to .
- Ex: contains only the empty string for all boundary types .
- Ex: is the set of all initials.
- Ex: includes ⟦stafc⟧ but not ⟦þþj⟧ or ⟦tatag⟧.
- is the subset of whose elements undergo cycles from back to .
- Given and , is the result of appending and , performing repair processes as necessary.
- Ex: if and , then .
- This operation is also defined for and , in which case the glides at the end of and the start of are merged.
- is the result of appending and without performing any repair processes.
- Ι (the capital letter iota) is the set of all initials.
- Μ (capital mu) is the set of all glides.
- Ν (capital nu) is the set of all vowels.
- Κ (capital kappa) is the set of all simple codas.
- is the set of all codas, simple or complex.
- is the set of all coda–onset pairs. The glide is not included because stems ending in ⟦j⟧ are treated specially in stem fusion.
- is the set of effective plosives and fricatives – that is, the set of consonants that can form an initial when followed by ⟦l⟧ or ⟦r⟧.
- is the set of valid stems. A stem must contain at least one syllable, and its final onset must not contain a lenited consonant.
Stem fusion
The difference from Project Caladrius is that the stem and the result are treated structurally instead of linearly. The notation is also different – given a stem :
- is the result of fusing with a null consonant.
- is the result of fusing with a non-null consonant .
The following variables are used: , , , , , , .
As usual, earlier rules take precedence over later ones. In addition, we use a shorthand for C-invariant rules: a rule such as , where and is interpreted as the rules and .
Another shorthand used in this document is , which implies and .
Stems ending in ⟦j⟧
From now on, any explicit instances of will be omitted.
Onset aliasing
Valid codas
Degemination
where is the number of manifested grapheme phrases in and
Vowel epenthesis
Nasal merging
is the ξ-transformation; i.e.
inverts the tone of a vowel; i.e.
Obstruent merging
where
Final devoicing
where
Stems ending in consonant–liquid onsets
Let and . Then .
where for any consonant and coda ,
and denotes the operation of taking the maximal prefix of a coda that is a simple coda.
Stems ending in ⟦r⟧ or ⟦l⟧
Stems ending in ⟦š⟧, ⟦ł⟧, or ⟦č⟧
where is defined as
Stems ending in ⟦c⟧ or ⟦g⟧
where is the set of codas that end in a voiceless consonant.
Stems ending in ⟦p⟧
Stems ending in ⟦h⟧
Stems ending in ⟦ħ⟧
Stems ending in ⟦ŋ⟧
Stems ending in any other onset with two consonants
Coda-based rules
NB: by this point, the only possible onsets at the end of the stem are ⟦n s þ f⟧.
- All two-consonant onsets have already been handled.
- ⟦c ŋ š r l ł g p č h ħ⟧ handled by their respective rules.
- ⟦v m d ð⟧ handled by final devoicing.
- ⟦t⟧ handled by onset aliasing for fusion with and by obstruent merging for fusion with .
By observation, the only possible codas in the final bridge at this point are ⟦s n þ rþ t f⟧.
- The empty coda is obviously eliminated, as all of ⟦n s þ f⟧ are valid simple codas.
- ⟦r⟧: ⟦rþ⟧ is a valid coda; other cases handled by (Epenthesis-LC)
- ⟦l⟧ handled by (Epenthesis-LC); in case of fusion with , ⟦lþ⟧ and ⟦lt⟧ are valid complex codas
- ⟦c⟧: ⟦-cn-⟧ is an invalid bridge in the first place; ⟦-cs-⟧, ⟦-cþ-⟧, and ⟦-cf-⟧ are interpreted as having an empty coda and a complex onset.
- ⟦cþ⟧: ⟦-cþn-⟧, ⟦-cþs-⟧, and ⟦-cþf-⟧ not valid. ⟦-cþþ-⟧ handled by obstruent merging.
Some codas are limited to certain onsets at this point:
- ⟦t⟧ can be followed only by ⟦s⟧: ⟦-tn-⟧ not valid, ⟦-tf-⟧ canonicalizes to a null coda, and ⟦-tþ-⟧ handled by obstruent merging.
- ⟦þ rþ⟧ followed only by ⟦n⟧: neither can precede ⟦s⟧. ⟦þþ rþþ⟧ handled by degemination or obstruent merging. ⟦þf rþf⟧ handled by obstruent merging.
- If fusing with a null consonant, ⟦n⟧ is followed only by ⟦f⟧: ⟦-ns⟧ and ⟦-nþ⟧ are already valid complex codas, ⟦-nn⟧ handled by degemination.
The bridge ⟦-ts-⟧
The codas ⟦-s⟧, ⟦-þ⟧, ⟦-rþ⟧, and ⟦-f⟧
The coda ⟦-n⟧
Properties of stem fusion
Fusion with ⟦t⟧ is invariant only when the final onset of the stem is ⟦t-⟧.
Fusion with ⟦n⟧ is invariant only when the final bridge of the stem is ⟦-nn-⟧.
Fusion with ⟦þ⟧ is invariant only when the final onset of the stem is ⟦þ-⟧ or ⟦cþ-⟧.
Future work
The first complete draft of stem fusion is specified and implemented in f9i, but it needs testing with real-world input to assess what needs to be fixed.
Obsolete rules
Rules too complicated for their own good.
Stems ending in ⟦f⟧, ⟦v⟧, or ⟦m⟧ (first pass)
This pass applies only to onsets ending in these consonants that are not preceded by a nonempty coda.
Stems ending in ⟦p⟧, ⟦t⟧, or ⟦c⟧
For :
where are defined as
Stems ending in ⟦d⟧ or ⟦g⟧
For :
where
Stems ending in ⟦f⟧, ⟦þ⟧, or ⟦s⟧
External links
- The elaine-sm branch of f9i for testing changes made by Project Elaine