Truncation

In many cases, inflection involves the addition of prefixes or suffixes to a stem. It follows that inverting this process would involve the removal of these affixes. However, concatenation often results in phonological changes to one or both of the stem and the affix. For example, I meet you in Ŋarâþ Crîþ would be derived as cenv-a-ve, but the co-occurrence of v in the resulting word is considered undesirable (oginiþe cfarðerþ) and is resolved by replacing the first v with an n. Thus, the resulting word is not **cenvave but rather cennave.

Let $Σ$ be an alphabet and $f : Σ^{*} \times Σ^{*} \to Σ^{*} \times Σ^{*}$ be a function describing concatenation rules that transform pairs of strings $(a, b)$. Generally, $f$ can be seen as the composition of some number of individual ‘rules’ $f_{0}, f_{1}, \dots, f_{n - 1} : Σ^{*} \times Σ^{*} \to Σ^{*} \times Σ^{*}$.

Then let $a \overset{f}{\sim} b$ be the concatenation of $a$ and $b$ under $f$, or $f$-concatenation, defined as

$\begin{array}{lrr} a \overset{f}{\sim} b = a^{'} \cdot b^{'} \quad \text{where } (a^{'}, b^{'}) = f (a, b) \end{array}$

Given $b, w \in Σ^{*}$, we wish to find all $a \in Σ^{*}$ such that $a \overset{f}{\sim} b = w$. If we can compute the preimage of a set under $f$, then this is simply

$\begin{array}{lrr} A = \bigcup_{i = 0}^{| w |} \{ α^{'} \mid (α^{'}, β^{'}) \in f^{\leftarrow} (\{ (w [0 .. i], w [i .. | w |]) \}) \land β^{'} = b \} \end{array}$

That is, we do the following:

Split $w$ into $(α, β)$ in all possible ways.
Find all $(α^{'}, β^{'})$ such that $f (α^{'}, β^{'}) = (α, β)$ .
Collect all such $α^{'}$ where $β^{'} = b$ .
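
The three steps above can be sketched in Python as follows. The names `f`, `f_preimage`, and `truncate` are hypothetical, introduced only for illustration, and the rule is a toy stand-in (stem-final v becomes n before a suffix of the form vowel + v), not the actual Ŋarâþ Crîþ rule set. The preimage is written by hand for this one rule; a real rule set would need its own.

```python
VOWELS = set("aeiou")  # assumption: a simplified vowel inventory for the toy rule


def f(a, b):
    """Toy concatenation rule: stem-final v becomes n before vowel + v."""
    if a.endswith("v") and len(b) >= 2 and b[0] in VOWELS and b[1] == "v":
        return a[:-1] + "n", b
    return a, b


def f_preimage(alpha, beta):
    """Return every pair (a, b) such that f(a, b) == (alpha, beta)."""
    result = set()
    # (alpha, beta) maps to itself whenever the rule does not fire on it.
    if f(alpha, beta) == (alpha, beta):
        result.add((alpha, beta))
    # The rule may have turned a stem-final v into the n we now see.
    if alpha.endswith("n") and len(beta) >= 2 and beta[0] in VOWELS and beta[1] == "v":
        result.add((alpha[:-1] + "v", beta))
    return result


def truncate(w, b, preimage):
    """Find all stems a whose f-concatenation with the suffix b gives w."""
    stems = set()
    for i in range(len(w) + 1):          # step 1: every split of w
        alpha, beta = w[:i], w[i:]
        for a0, b0 in preimage(alpha, beta):  # step 2: preimage of the split
            if b0 == b:                  # step 3: keep stems paired with b
                stems.add(a0)
    return stems
```

With this toy rule, `truncate("cennave", "ave", f_preimage)` yields both `cenn` and `cenv`, while `truncate("cenvave", "ave", f_preimage)` yields the empty set, since no stem produces that form.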

An analogous problem and its solution can be stated for removing prefixes.

This algorithm is quite general, and certain properties of $f$ admit simpler algorithms. (The solution is trivial if $f$ is the identity function.)

If for all $a, b$ and $(a^{'}, b^{'}) = f (a, b)$, the strings $b$ and $b^{'}$ have the same length, then $f$ is said to be right-isometric. In this case, we do not have to try all combinations of $(α, β)$ but rather only the one in which $| β | = | b |$.

More generally, let

$\begin{array}{lrr} D_{f} (b) = \{ | b^{'} | \mid a \in Σ^{*}, (a^{'}, b^{'}) = f (a, b) \} \end{array}$

be the set of possible lengths of the string that $f$ transforms $b$ into. Then we only have to try combinations of $(α, β)$ where $| β | \in D_{f} (b)$.

A stronger property than right isometry is right invariance, which requires $b = b^{'}$; in other words, $f$ is not allowed to change the second string. If $f$ is right-invariant, then we can return an empty set if $b$ is not a suffix of $w$.
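
These shortcuts can be sketched as thin wrappers around the general search. All names here are hypothetical; `lengths` plays the role of $D_{f} (b)$, and `toy_preimage` is the hand-written preimage of a toy right-invariant rule (stem-final v becomes n before vowel + v), used only so that the sketch can be run.

```python
VOWELS = set("aeiou")  # assumption: a simplified vowel inventory for the toy rule


def toy_preimage(alpha, beta):
    """Preimage of a toy right-invariant rule: stem-final v -> n before vowel + v."""
    triggered = len(beta) >= 2 and beta[0] in VOWELS and beta[1] == "v"
    result = set()
    if not (triggered and alpha.endswith("v")):
        result.add((alpha, beta))              # the rule leaves this pair alone
    if triggered and alpha.endswith("n"):
        result.add((alpha[:-1] + "v", beta))   # the n may come from a v
    return result


def truncate_with_lengths(w, b, preimage, lengths):
    """Try only the splits whose right-hand part has a length in `lengths`."""
    stems = set()
    for n in lengths:
        if 0 <= n <= len(w):
            i = len(w) - n
            for a0, b0 in preimage(w[:i], w[i:]):
                if b0 == b:
                    stems.add(a0)
    return stems


def truncate_right_isometric(w, b, preimage):
    """Right-isometric f: the only possible length is len(b)."""
    return truncate_with_lengths(w, b, preimage, {len(b)})


def truncate_right_invariant(w, b, preimage):
    """Right-invariant f: w must literally end in b, else there is no answer."""
    if not w.endswith(b):
        return set()
    return truncate_right_isometric(w, b, preimage)
```

The right-invariant variant rejects a non-matching suffix before any preimage is computed, which is where most of the savings come from.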

Additionally, concatenation rules in human languages rarely change segments far away from the juncture. For $a, b \in Σ^{*}$ and $(a^{'}, b^{'}) = f (a, b)$, let $s$ be the longest common suffix of $b$ and $b^{'}$, leaving $r$ and $r^{'}$ before it (so that $b = r \cdot s$ and $b^{'} = r^{'} \cdot s$). Then the forward dextral radius of influence of $(a, b)$ with respect to $f$ is $R_{d} (f; a, b) = | r |$ and the backward dextral radius of influence is $R_{d}^{'} (f; a, b) = | r^{'} |$. We can also define these radii for the function itself:

$\begin{array}{lrlr} R_{d} (f) & = \sup_{a, b \in Σ^{*}} R_{d} (f; a, b) \\ R_{d}^{'} (f) & = \sup_{a, b \in Σ^{*}} R_{d}^{'} (f; a, b) \end{array}$

Note that for right-isometric $f$, $R_{d} (f) = R_{d}^{'} (f)$.

Given knowledge of these values, suppose that for a given $(α, β)$, we compare $b$ and $β$, aligning them at their ends. If $b$ and $β$ differ in the characters at indices $| b | - i$ and $| β | - i$, respectively, and this position lies outside the radius of influence, that is, $| b | - i \geq R_{d} (f)$ or $| β | - i \geq R_{d}^{'} (f)$, then we can conclude that $f^{\leftarrow} (α, β)$ does not contain any pairs $(α^{'}, β^{'})$ such that $β^{'} = b$ and can thus discard the pair $(α, β)$.
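
A minimal sketch of the radius computation and the resulting pruning test follows. It relies only on the definition of the radii: the characters of $b$ at index $R_{d} (f)$ or later, and those of $b^{'}$ at index $R_{d}^{'} (f)$ or later, belong to the shared suffix and are never altered. The function names are hypothetical.

```python
def pair_radii(b, b_prime):
    """Return (|r|, |r'|) after stripping the longest common suffix of b and b'."""
    s = 0
    while s < len(b) and s < len(b_prime) and b[-1 - s] == b_prime[-1 - s]:
        s += 1
    return len(b) - s, len(b_prime) - s


def can_discard(b, beta, r_fwd, r_bwd):
    """True if beta cannot be the image of b, given the radii of f.

    r_fwd and r_bwd stand for the forward and backward dextral radii of f.
    The strings are compared from their ends; a mismatch i characters from
    the end rules the pair out if it falls outside either radius.
    """
    for i in range(1, min(len(b), len(beta)) + 1):
        if b[-i] != beta[-i]:
            if len(b) - i >= r_fwd or len(beta) - i >= r_bwd:
                return True
    return False
```

For instance, with both radii equal to 2, `can_discard("athens", "oghens", 2, 2)` is false because the mismatches sit in the first two characters, while `can_discard("athens", "athexs", 2, 2)` is true.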

Unfortunately, Ŋarâþ Crîþ v9e’s concatenation rules have a dextral radius of influence of 5, while many of its affixes are shorter than 5 assemblage units long. For that reason, radius-of-influence simplifications are unlikely to be useful for Ŋarâþ Crîþ.

Examples of concatenation rules

The identity function is right-invariant and has a dextral radius of influence of 0.

Let $m$ and $n$ be nonnegative integers and $g : Σ^{m} \times Σ^{n} \to Σ^{*} \times Σ^{*}$. Then denote by $\operatorname{peephole}_{m, n} (g)$ the function $f$ such that

$\begin{array}{lrlr} f (a, b) & = \begin{cases} (a [.. | a | - m] \cdot a^{'}, b^{'} \cdot b [n ..]) & \text{if } | a | \geq m, | b | \geq n, (a^{'}, b^{'}) = g (a [| a | - m ..], b [.. n]) \\ (a, b) & \text{if } | a | < m \text{ or } | b | < n \end{cases} \end{array}$

In effect, $\operatorname{peephole}_{m, n} (g)$ is a function that affects only the $m + n$ characters around the juncture.
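
The definition translates almost directly into code. The sketch below is one possible Python rendering; `assimilate` is a made-up rule (n becomes m before p) used only to exercise it.

```python
def peephole(m, n, g):
    """Lift g, which sees m characters of a and n characters of b, to whole strings."""
    def f(a, b):
        if len(a) < m or len(b) < n:
            return a, b                      # too short: leave the pair alone
        cut = len(a) - m                     # avoids a[-m:], which breaks for m == 0
        a_new, b_new = g(a[cut:], b[:n])
        return a[:cut] + a_new, b_new + b[n:]
    return f


def assimilate(a, b):
    """Hypothetical rule on one character either side: n + p -> m + p."""
    if (a, b) == ("n", "p"):
        return "m", "p"
    return a, b


f = peephole(1, 1, assimilate)
```

Here `f("in", "possible")` gives `("im", "possible")`, while `f("in", "active")` is returned unchanged.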

Then the following statements are true:

$\begin{array}{lrlr} D_{f} (b) & = \{ l + (| b | - n) \mid l \in D_{g} (b [.. n]) \} \quad \text{if } | b | \geq n \\ R_{d} (f) & = n \\ R_{d}^{'} (f) & = \max_{a, b} | b^{'} | \quad \text{where } (a^{'}, b^{'}) = g (a, b) \end{array}$

In particular, $f$ is right-isometric if $g$ is.

Often, we wish to replace one substring with another if a juncture occurs anywhere within the substring. For instance, the rule replacing ‘cat’ with ‘dog’ can be expressed as the composition of two functions $\operatorname{peephole}_{2,1} (g_{2}) \circ \operatorname{peephole}_{1,2} (g_{1})$ where

$\begin{array}{lrlr} g_{1} (a, b) & = \begin{cases} (\text{d}, \text{og}) & \text{if } a = \text{c}, b = \text{at} \\ (a, b) & \text{otherwise} \end{cases} \\ g_{2} (a, b) & = \begin{cases} (\text{do}, \text{g}) & \text{if } a = \text{ca}, b = \text{t} \\ (a, b) & \text{otherwise} \end{cases} \end{array}$

Suppose that we have a function $h : Σ^{k} \to Σ^{k}$ that rewrites substrings of length $k$. If $f = \operatorname{subst}_{k} (h)$ is the corresponding substitution function, then it can be expressed as a composition of peephole functions $f_{k - 1} \circ \dots \circ f_{1}$ where

$\begin{array}{lrlr} f_{i} & = \operatorname{peephole}_{i, k - i} (g_{i}) \\ g_{i} (a, b) & = (s^{'} [.. i], s^{'} [i ..]) \quad \text{where } s^{'} = h (a \cdot b) \end{array}$

$\operatorname{subst}_{k} (h)$ is right-isometric and has a dextral radius of influence of $k - 1$.
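
As an illustration, the substitution function can be assembled from peephole functions as follows. This is a sketch under the assumption that $h$ preserves length; `peephole` and `subst` are hypothetical Python names for the constructions above, and the cat-to-dog rule is the running example.

```python
def peephole(m, n, g):
    """Apply g to the last m characters of a and the first n characters of b."""
    def f(a, b):
        if len(a) < m or len(b) < n:
            return a, b
        cut = len(a) - m
        a_new, b_new = g(a[cut:], b[:n])
        return a[:cut] + a_new, b_new + b[n:]
    return f


def subst(k, h):
    """Build subst_k(h) as the composition f_{k-1} o ... o f_1 of peepholes."""
    def make_g(i):
        def g(a, b):
            s = h(a + b)          # rewrite the k characters around the juncture
            return s[:i], s[i:]   # keep the juncture after the i-th character
        return g

    rules = [peephole(i, k - i, make_g(i)) for i in range(1, k)]

    def f(a, b):
        for rule in rules:        # f_1 is applied first, f_{k-1} last
            a, b = rule(a, b)
        return a, b
    return f


cat_to_dog = subst(3, lambda s: "dog" if s == "cat" else s)
```

`cat_to_dog("c", "at")` gives `("d", "og")` and `cat_to_dog("ca", "t")` gives `("do", "g")`, whereas `cat_to_dog("cat", "s")` is unchanged because the substring does not straddle the juncture.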

We have assumed that the function $h$ preserves the length of the substring. We can generalize $\operatorname{subst}$ to account for functions that change the length of the input, but we must be careful about where the new juncture is placed.

Generalizations

In practice, we often want to check a word $w$ against multiple suffixes from a fixed set $B \subseteq Σ^{*}$. This problem can be solved similarly to the single-suffix case by matching any of $(α^{'}, β^{'}) \in f^{\leftarrow} (\dots)$ where $β^{'} \in B$. This problem might be simplifiable depending on the properties of $f$. If $f$ is right-invariant, for instance, then it is possible to use a trie containing the reversed elements of $B$.
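
For the right-invariant case, the reversed-trie lookup might look like the sketch below. The nested-dictionary trie and the function names are illustrative choices, not a prescribed data structure. Each suffix found this way fixes the split point, after which the single-suffix procedure can be applied.

```python
END = ""  # sentinel key; safe because real keys are single characters


def build_reversed_trie(suffixes):
    """Store each suffix in a trie, reading its characters from last to first."""
    root = {}
    for suffix in suffixes:
        node = root
        for ch in reversed(suffix):
            node = node.setdefault(ch, {})
        node[END] = suffix
    return root


def matching_suffixes(w, trie):
    """List every stored suffix that w ends with, shortest first."""
    found = []
    node = trie
    if END in node:
        found.append(node[END])   # the empty suffix, if it was stored
    for ch in reversed(w):
        node = node.get(ch)
        if node is None:
            break
        if END in node:
            found.append(node[END])
    return found
```

With the suffix set `{"ave", "e", "ve", "as"}`, the word `cennave` matches `e`, `ve`, and `ave` in a single pass from the end of the word.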

Often, we do not want to find all elements of $A$ but rather its intersection with a ‘dictionary set’ $K_{b}$.

Concatenation on regular systems

The notion of truncation can be generalized to be over any pair of formal languages. In this case, the concatenation rules function may have a different codomain from its domain. Usually, we are interested in languages of a regular system $Λ$ in which the end state of the first string is equal to the start state of the second string. We define an extended concatenative system $Φ$ over $Λ$ as a pair of functions $(κ, ϕ)$ where:

$κ : Q \times Q \times Q \to Q$ is a function that takes in three states $(p, q, r)$ and outputs a new middle state $q^{'}$, and
$ϕ : \bigcup_{p, q, r \in Q} [\{ (p, q, r) \} \to Λ (p, q) \times Λ (q, r) \to Λ (p, κ (p, q, r)) \times Λ (κ (p, q, r), r)]$ returns for every triple of states $(p, q, r)$ a concatenation rule function for two strings, possibly changing the middle state according to $κ$.

For $a \in Λ (p, q)$ and $b \in Λ (q, r)$, the extended concatenation $a \underset{p, q, r}{\overset{Φ}{\sim}} b = a \overset{ϕ (p, q, r)}{\sim} b$ is an element of $Λ (p, r)$. If no ambiguity would arise, we omit the state names from the operator and simply write $a \overset{Φ}{\sim} b$.

$N$-ary concatenation

We have looked at binary concatenation; concatenation of more than two operands is often assumed to be left-associative. In other words, concatenating $(a, b, c)$ concatenates $a$ and $b$ first, then concatenates the result with $c$.
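
Left-associative concatenation is a fold over the operands. The sketch below uses a toy rule (the first v of a v + vowel + v sequence spanning the juncture becomes n) that is only a stand-in for real juncture rules; all names are hypothetical.

```python
from functools import reduce

VOWELS = set("aeiou")  # assumption: a simplified vowel inventory for the toy rule


def toy_rule(a, b):
    """Toy rule: in v + vowel + v across the juncture, the first v becomes n."""
    # Case 1: the juncture falls between the vowel and the second v.
    if len(a) >= 2 and a[-2] == "v" and a[-1] in VOWELS and b[:1] == "v":
        return a[:-2] + "n" + a[-1], b
    # Case 2: the juncture falls between the first v and the vowel.
    if a.endswith("v") and len(b) >= 2 and b[0] in VOWELS and b[1] == "v":
        return a[:-1] + "n", b
    return a, b


def f_concat(f, a, b):
    """Binary f-concatenation: apply the rule, then join the two results."""
    a_new, b_new = f(a, b)
    return a_new + b_new


def concat_left(f, parts):
    """Left-associative concatenation of a non-empty sequence of strings."""
    return reduce(lambda acc, nxt: f_concat(f, acc, nxt), parts)
```

Under this toy rule, `concat_left(toy_rule, ["cenv", "a", "ve"])` produces `cennave`: the first step joins `cenv` and `a` unchanged, and the rule fires only when `ve` is attached.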

Another possibility for concatenating multiple morphemes is to apply the juncture rules after all of the concatenations. This means that in this example, any juncture rules applied between $a$ and $b$ would have access to the contents of $c$. This is more complex than repeated binary concatenation but has the advantage of being able to use word-global information (such as stress or syllable position within a word).

The concept of $N$-ary concatenation itself does not specify the order in which juncture rules are applied to each juncture. For instance, if the juncture rules consist of three subprocesses A, B, and C, then given a word with three junctures labeled 1, 2, and 3 from start to end, the rules could trigger in any of the following orders (among other possibilities):

Apply the subprocesses in sequence to each juncture from start to end: A1, B1, C1, A2, B2, C2, A3, B3, C3
Apply the subprocesses in sequence to each juncture from end to start: A3, B3, C3, A2, B2, C2, A1, B1, C1
Apply each subprocess to all junctures from start to end: A1, A2, A3, B1, B2, B3, C1, C2, C3
Apply subprocesses A and C from start to end but B from end to start: A1, A2, A3, B3, B2, B1, C1, C2, C3
Apply subprocess A to all junctures, then B and C to each juncture in sequence: A1, A2, A3, B1, C1, B2, C2, B3, C3
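
The orders listed above arise from two basic loop nestings, which can be combined freely. The helper names below are hypothetical and serve only to reproduce the listed schedules.

```python
def schedule_by_juncture(subprocesses, junctures):
    """Run every subprocess at one juncture before moving to the next juncture."""
    return [f"{s}{j}" for j in junctures for s in subprocesses]


def schedule_by_subprocess(subprocesses, junctures):
    """Run one subprocess at every juncture before moving to the next subprocess."""
    return [f"{s}{j}" for s in subprocesses for j in junctures]


# First, second and third orders from the list:
first = schedule_by_juncture("ABC", [1, 2, 3])
second = schedule_by_juncture("ABC", [3, 2, 1])
third = schedule_by_subprocess("ABC", [1, 2, 3])

# Fourth order: A and C run start to end, B runs end to start.
fourth = (schedule_by_subprocess("A", [1, 2, 3])
          + schedule_by_subprocess("B", [3, 2, 1])
          + schedule_by_subprocess("C", [1, 2, 3]))

# Fifth order: A everywhere first, then B and C juncture by juncture.
fifth = schedule_by_subprocess("A", [1, 2, 3]) + schedule_by_juncture("BC", [1, 2, 3])
```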

In exchange for this flexibility, $N$-ary concatenation has the disadvantage that truncation requires searching a larger space for possible juncture placements.

Case study: Ŋarâþ Crîþ v9e

According to the Ŋarâþ Crîþ v9e grammar, concatenation consists of the following processes applied across the juncture:

Any new instances of ⟦j⟧ before ⟦i⟧, ⟦î⟧, or ⟦u⟧ are elided.
Deduplication rules are applied.
Newly formed bridges are canonicalized and repaired.

We assume that we are working with assemblages in a regular system. Therefore, while a substitution function can be regarded as a composition of multiple functions $f_{n - 1} \circ \dots \circ f_{1}$, most of these functions will have no effect on concatenation at a given juncture state.

The first process, glide elision, is a right-invariant substitution function based on

$\begin{array}{lrlr} h (((μ, g, o), (ν, o, n))) & = \begin{cases} ((ε, g, o), (ν, o, n)) & \text{if } ν \in \{ i, \hat{ı}, u \} \\ ((μ, g, o), (ν, o, n)) & \text{otherwise} \end{cases} \end{array}$

whose preimage is straightforward to compute.

The deduplication rules, which resolve instances of oginiþe cfarðerþ, work as follows:

The onset ⟦f⟧ or ⟦tf⟧ followed by a non-hatted vowel then ⟦f⟧ or ⟦p·⟧ is replaced with ⟦t⟧.
The onset ⟦þ⟧ or ⟦cþ⟧ followed by a non-hatted vowel then ⟦þ⟧ or ⟦t·⟧ is replaced with ⟦t⟧. In addition, a preceding ⟦þ⟧ or ⟦cþ⟧ coda is replaced with ⟦s⟧, and a preceding ⟦rþ⟧ coda is replaced with ⟦r⟧.
⟦h⟧ followed by a non-hatted vowel then ⟦h⟧ or ⟦c·⟧ is replaced with ⟦p⟧.
⟦v⟧ followed by a non-hatted vowel then ⟦v⟧ or ⟦m·⟧ is replaced with ⟦n⟧.
⟦ð⟧ followed by a non-hatted vowel then ⟦ð⟧ or ⟦d·⟧ is replaced with ⟦ŋ⟧.
⟦ħ⟧ followed by a non-hatted vowel then ⟦ħ⟧ or ⟦g·⟧ is replaced with ⟦g⟧.
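
To make the shape of these rules concrete, here is a deliberately simplified sketch over plain strings rather than assemblages. It is not how f9i implements them. It ignores syllable structure, omits the coda adjustments of the second rule, applies each rule once across the whole string instead of only at the juncture, and assumes that the non-hatted vowels are a, e, i, o, u. The table and function names are hypothetical.

```python
import re

PLAIN_VOWELS = "aeiou"  # assumption: the non-hatted vowels

# (onsets that are replaced, following triggers, replacement)
RULES = [
    (("tf", "f"), ("f", "p·"), "t"),
    (("cþ", "þ"), ("þ", "t·"), "t"),   # coda adjustments omitted in this sketch
    (("h",), ("h", "c·"), "p"),
    (("v",), ("v", "m·"), "n"),
    (("ð",), ("ð", "d·"), "ŋ"),
    (("ħ",), ("ħ", "g·"), "g"),
]


def deduplicate(word):
    """Apply each rule once, left to right, to a plain-string word."""
    for onsets, triggers, replacement in RULES:
        onset_alt = "|".join(re.escape(o) for o in onsets)
        trigger_alt = "|".join(re.escape(t) for t in triggers)
        # onset + non-hatted vowel, provided a trigger follows (lookahead)
        pattern = f"(?:{onset_alt})([{PLAIN_VOWELS}])(?={trigger_alt})"
        word = re.sub(pattern, lambda m, r=replacement: r + m.group(1), word)
    return word
```

On the example from the introduction, `deduplicate("cenvave")` returns `cennave`, while a hatted vowel between the two consonants blocks the change.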

In f9i, these rules are implemented twice: once for the case when the vowel in question is followed by a nonterminal coda (thus capturing the following initial), and once for the case when it is followed by a terminal coda (in which case only rules #1 and #2 are applicable). In both cases, the preceding coda is captured if available.

This process is more involved than glide elision, but its preimage is not too difficult to compute, and the process is right-isometric.

The assemblage form makes these changes difficult: the first consonant following a vowel might belong to the coda of the same syllable or to the initial of the following syllable, and it may be part of a complex coda or initial. It also complicates situations in which changing a letter requires the word to be syllabified differently; for this problem, v9e simply chooses to apply bridge resolution after deduplication, although this has the disadvantage of failing to remove some cases of oginiþe cfarðerþ.

Ŋarâþ Crîþ v9e’s deduplication rules are quite crude, prompting ad-hoc workarounds to be made in specific instances of inflection. There are plans in Project Shiva to expand the range of oginiþe cfarðerþ and thus the scope of deduplication, as well as a desire to avoid changing the initial consonant of a word. An even more challenging problem is propagation: if deduplication changes a consonant such that a new instance of oginiþe cfarðerþ arises, then additional invocations of deduplication might be required to resolve it.

The final step of concatenation is bridge resolution, which modifies awkward coda–initial pairs to more convenient ones. This process is also used for canonicalizing these pairs according to the maximal onset principle.

Ŋarâþ Crîþ v9e’s version of this process is complicated by the fact that although v7 allowed ŋ as a coda, v9 does not. When ŋ appeared as a coda, it was changed into r, modifying the preceding vowel. -aŋ, -oŋ, and -uŋ were changed to -or, and -eŋ and -iŋ were changed to -jor. Reflecting this change, bridge resolution first outputs either a true coda or a pseudo-coda of ŋ, subsequently resolving the latter case by applying this change to the vowel. That is, a final of -or might have arisen from one of -or, -aŋ, -oŋ, or -uŋ. Likewise, a final of -jor might have arisen from one of -jor, -jaŋ, -jeŋ, -joŋ, -eŋ, or -iŋ.
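
The reverse direction of the ŋ change can be tabulated directly from the lists above. The sketch below works on plain strings rather than assemblages, and its names are hypothetical. It checks -jor before -or so that a word in -jor receives exactly the six candidates listed.

```python
# Possible sources of each final, as listed in the text.
FINAL_SOURCES = {
    "jor": ["jor", "jaŋ", "jeŋ", "joŋ", "eŋ", "iŋ"],
    "or": ["or", "aŋ", "oŋ", "uŋ"],
}


def final_preimages(word):
    """List the forms that could have yielded `word` through the ŋ-coda change."""
    for final in ("jor", "or"):        # longest final first
        if word.endswith(final):
            stem = word[: -len(final)]
            return [stem + source for source in FINAL_SOURCES[final]]
    return [word]                      # the change cannot have applied
```

For example, `final_preimages("nor")` returns `nor`, `naŋ`, `noŋ`, and `nuŋ`, and a word not ending in -or is returned as its own sole candidate.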

Since the number of possible bridges is relatively small, tabulation can be used to implement the preimage of the first step.

Because bridge resolution is right-isometric, Ŋarâþ Crîþ v9e’s concatenation rules as a whole are as well.