Regular systems
In this chapter, we introduce a notion of regular systems, which are a collection of related regular languages. Specifically, a regular system consists of a finite state machine, yielding a language for each pair of start and end states. If the set of phonotactically valid words in a natural language is a regular language, then substrings of words can be considered to belong in one of the languages yielded by a corresponding regular system.
An example is Ŋarâþ Crîþ’s finite state machine (Figure 1), which defines five states . Each transition carries a payload defining a component of a syllable. For instance, the transition from to is labeled with , the set of all initials, indicating that any initial is sufficient to transition from to . A phonotactically valid Ŋarâþ Crîþ word starts from to , but individual morphemes can be between a different pair of states. For instance, the morphemes in cenv-as-le you write it have the following structures:
- cenv- from to . In the case that this has at least one full syllable and does not end with a lenited consonant, as is here, this is called a stem.
- -as from to
- -le from to
Formally, we define a language system over a set of states and an alphabet as a function that has the following properties:
- Identity: For all , .
- Transitivity: For all , , where denotes the concatenation of languages. We say that a language system has strict transitivity if .
A regular system is a language system in which all languages are regular.
Given a subset of states , we can define a subsystem over and .
Semiautomata
A semiautomaton (DSA) is a deterministic finite automaton that does not specify a start or accept state. A DSA yields a regular language for every , as supplementing starting and accepting states to a semiautomaton yields a DFA. Additionally, these languages have the identity and transitivity properties required for regular systems. These properties (except for perhaps strict transitivity) still hold when we only look at a subset of states .
We also introduce the notion of a nondeterministic semiautomaton (NSA), which can be seen as a nondeterministic finite automaton without starting or accepting states. More formally, an NSA is a tuple where is the transition function. For states , is the set of strings that can be generated by a path from to . We also define nondeterministic semiautomata with ε-moves (NSA-ε) similarly, in which transitions without an input symbol are allowed. For similar reasons, NSAs and NSA-εs have the properties of regular systems, and these properties (except for perhaps strict transitivity) still hold when we only look at a subset of states . It follows that any regular system can be converted into a subsystem of an NSA-ε by applying Thompson’s construction algorithm between each pair of input states.
However, while it is possible to translate an NFA into a DFA, translating an NSA into a DSA is less trivial. In general, it is not possible to translate an NSA into a DSA with the same regular system if each state of the NSA must correspond to exactly one state in the DSA. A single accepting state in the NSA might correspond to multiple accepting states in the DSA as shown in Figure 2.
Start | End | Language |
---|---|---|
(ab*)* | ||
a+(ba*)* | ||
b+(ab*)* | ||
(ba*)* |
However, it is possible to map each state of the NSA to a set of states in the DSA. In Figure 2, in the NSA would correspond to in the DSA and would correspond to . Let be the function mapping states in to corresponding sets of states in .
However, note that given two states and of , the existence of a path from some to in accepting a string does not imply that . For example, while does not accept the empty string, does. Therefore, we require that for all , there exists a state such that accepts from the state to .
More generally, we can establish a bijection between the set of abstract states of a regular system and the set of sets of concrete states of a semiautomaton . Then a string is in if and only if for all , there exists a such that there is a path from to in for :
In this text, we concern ourselves only with DSAs and their subsystems. This is sufficient for most natural languages if we take the various components of a syllable, including null segments, as symbols themselves and augment each symbol with its source and destination states. For instance, Ŋarâþ Crîþ tfełor would be represented as
We refer to strings represented this way as fully syllabified strings or assemblages and any element of an assemblage as an assemblage unit. A disambiguated regular system is a regular system over in which for each nonempty string , there is at most one such that .