Introduction
Suppose that is the set of all possible lexical entries in a language, is a set of category labels, and is the set of all phonotactically valid words in the language. Then suppose is the function that inflects words in the language.
We wish to not only be able to inflect a given lexical entry in a given category but also to perform the reverse task: given an inflected word form, find all lexical entries that it could be derived from, along with the category of the inflected form. That is, we wish to devise an algorithm that computes the preimage for any .
If we are concerned only with finding matches in a fixed dictionary , then we can store a mapping for all and from to . This method is conceptually simple and requires no structural knowledge of , but it takes entries and requires to be finite, making it space-intensive for highly inflecting languages.
Because Ŋarâþ Crîþ v9’s inflection rules have been complex, this brute-force method has been used for f9i. During the development of sp9 for Project Shiva, however, +merlan #flirora proposed investigating alternative approaches that would not require storing all inflected forms.
The compositionality of linguistic inflection
Many inflection paradigms are not perfectly fusional. That is, we can express as a Cartesian product of other sets and define a sequence of functions , where and . Then involves applying each in succession, passing each category label:
This structure allows us to compute by inverting each step of the inflection process in reverse order:
Additionally, many parts of speech can be categorized into multiple inflectional classes. In mathematical terms, can be partitioned into sets , each of which has a function that is ‘simpler’ to implement than itself. Hence, can be computed by attempting to match against each of the subsets :
Alternatively, we can partition the set into sets , with corresponding functions , such that
Importantly, each of these decompositions produce a set of tuples which can be analyzed as a subproblem of the original problem and treated in the same way.
As an example, consider a language that has five classes for nominal inflections, which can be categorized into two broad groups, X and Y. This language also has cases in three groups:
- Group A cases are the most commonly used cases and have distinct declensions per class.
- Group B cases are less commonly used than group A cases and are declined differently between X classes and Y classes, but the declensions are the same within each group of classes.
- Group C cases are the least commonly used and are declined in the same way across all nouns. Additionally, while group A and B cases are coexponential with number, group C cases are monoexponential in that nouns in group C cases have a case ending followed by a number ending.
Then we could decompose the problem first by partitioning into . Subsequently, is decomposed by partitioning into , while is decomposed by partitioning the same set as .
In contrast, is decomposed by composition into (adding the case affix) and (adding the number affix). We now have the following simpler problems:
This compositionality was the reason that we did not choose to define the set of possible categories as dependent on the lexical entry. If we did so, then we could partition into equivalence classes on to yield subproblems that have a fixed set of category labels.