ŋaren crîþa 9 vlefto: Ŋarâþ Crîþ v9

Introduction

Suppose that 𝒦 is the set of all possible lexical entries in a language, C is a set of category labels, and L is the set of all phonotactically valid words in the language. Then suppose f:𝒦×CL is the function that inflects words in the language.

We wish to not only be able to inflect a given lexical entry in a given category but also to perform the reverse task: given an inflected word form, find all lexical entries that it could be derived from, along with the category of the inflected form. That is, we wish to devise an algorithm that computes the preimage f({s}) for any sL.

If we are concerned only with finding matches in a fixed dictionary K𝒦, then we can store a mapping for all kK and cC from f(k,c) to (k,c). This method is conceptually simple and requires no structural knowledge of f, but it takes O(|K||C|) entries and requires C to be finite, making it space-intensive for highly inflecting languages.

Because Ŋarâþ Crîþ v9’s inflection rules have been complex, this brute-force method has been used for f9i. During the development of sp9 for Project Shiva, however, +merlan #flirora proposed investigating alternative approaches that would not require storing all inflected forms.

The compositionality of linguistic inflection

Many inflection paradigms are not perfectly fusional. That is, we can express C as a Cartesian product of other sets C0×C1××Cr1 and define a sequence of functions fi:Li×CiLi+1, where L0=𝒦 and Lr=L. Then f involves applying each fi in succession, passing each category label:

f(k,(c0,c1,,cr1))=fr1(f1(f0(k,c0),c1),,cr1)

This structure allows us to compute f({s}) by inverting each step of the inflection process in reverse order:

Sn={(s,)}Si1=(s,c)Si{(s,(c,c))|(s,c)fi1({s})}f({s})=S0

Additionally, many parts of speech can be categorized into multiple inflectional classes. In mathematical terms, 𝒦 can be partitioned into sets P0,P1,,Pπ1, each of which has a function fj.:Pj×CL=f|Pj×C that is ‘simpler’ to implement than f itself. Hence, f({s}) can be computed by attempting to match against each of the subsets Pj:

f({s})=j=0π1(fj.)({s})

Alternatively, we can partition the set C into sets C0,C1,Cγ1, with corresponding functions f.k:𝒦×CkL=f|𝒦×Ck, such that

f({s})=k=0γ1(f.k)({s})

Importantly, each of these decompositions produce a set of tuples (𝒦,C,L,f) which can be analyzed as a subproblem of the original problem and treated in the same way.

As an example, consider a language that has five classes for nominal inflections, which can be categorized into two broad groups, X and Y. This language also has cases in three groups:

Then we could decompose the problem (𝒦,C,L,f) first by partitioning C into CACBCC. Subsequently, (𝒦,CA,L,f.A) is decomposed by partitioning 𝒦 into P0P4, while (𝒦,CB,L,f.B) is decomposed by partitioning the same set as PXPY.

In contrast, (𝒦,CC,L,f.C) is decomposed by composition into (𝒦,C0C,L1,f0.C) (adding the case affix) and (L1,C1C,L,f1.C) (adding the number affix). We now have the following simpler problems:

(P0,CA,L,f0.A)(P1,CA,L,f1.A)(P2,CA,L,f2.A)(P3,CA,L,f3.A)(P4,CA,L,f4.A)(PX,CB,L,fX.B)(PY,CB,L,fY.B)(𝒦,C0C,L1,f0.C)(L1,C1C,L,f1.C)

This compositionality was the reason that we did not choose to define the set of possible categories C(k) as dependent on the lexical entry. If we did so, then we could partition 𝒦 into equivalence classes on C(k) to yield subproblems that have a fixed set of category labels.