ŋaren crîþa 9 vlefto: Ŋarâþ Crîþ v9

Regular systems

In this chapter, we introduce a notion of regular systems, which are a collection of related regular languages. Specifically, a regular system consists of a finite state machine, yielding a language for each pair of start and end states. If the set of phonotactically valid words in a natural language is a regular language, then substrings of words can be considered to belong in one of the languages yielded by a corresponding regular system.

The finite state machine describing the phonotactics of Ŋarâþ Crîþ.
Figure 1: The finite state machine describing the phonotactics of Ŋarâþ Crîþ.

An example is Ŋarâþ Crîþ’s finite state machine (Figure 1), which defines five states {s,g,o,n,ω}. Each transition carries a payload defining a component of a syllable. For instance, the transition from s to g is labeled with Ι, the set of all initials, indicating that any initial is sufficient to transition from s to g. A phonotactically valid Ŋarâþ Crîþ word starts from s to ω, but individual morphemes can be between a different pair of states. For instance, the morphemes in cenv-as-le you write it have the following structures:

Formally, we define a language system Λ over a set of states Q and an alphabet Σ as a function Q×Q𝒫(Σ) that has the following properties:

A regular system is a language system in which all languages Λ(p,q) are regular.

Given a subset of states QQ, we can define a subsystem Λ=Λ|Q×Q over Q and Σ.

Semiautomata

A semiautomaton (DSA) is a deterministic finite automaton that does not specify a start or accept state. A DSA M=(Q,Σ,δ) yields a regular language M(p,q) for every p,qQ, as supplementing starting and accepting states to a semiautomaton yields a DFA. Additionally, these languages have the identity and transitivity properties required for regular systems. These properties (except for perhaps strict transitivity) still hold when we only look at a subset of states QQ.

We also introduce the notion of a nondeterministic semiautomaton (NSA), which can be seen as a nondeterministic finite automaton without starting or accepting states. More formally, an NSA is a tuple M=(Q,Σ,δ) where δ:Q×Σ𝒫(Q) is the transition function. For states p,qQ, M(p,q) is the set of strings that can be generated by a path from p to q. We also define nondeterministic semiautomata with ε-moves (NSA-ε) similarly, in which transitions without an input symbol are allowed. For similar reasons, NSAs and NSA-εs have the properties of regular systems, and these properties (except for perhaps strict transitivity) still hold when we only look at a subset of states QQ. It follows that any regular system can be converted into a subsystem of an NSA-ε by applying Thompson’s construction algorithm between each pair of input states.

However, while it is possible to translate an NFA into a DFA, translating an NSA into a DSA is less trivial. In general, it is not possible to translate an NSA into a DSA with the same regular system if each state of the NSA must correspond to exactly one state in the DSA. A single accepting state in the NSA might correspond to multiple accepting states in the DSA as shown in Figure 2.

Figure unavailable in HTML. Alt text: n/a
Figure 2: A counterexample to the conjecture that every NSA can be translated into a DSA with an equivalent regular system: the q0 and q1 states both correspond to multiple accepting states in the DSA.
StartEndLanguage
q0q0(ab*)*
q0q1a+(ba*)*
q1q0b+(ab*)*
q1q1(ba*)*
Table 1: The regular expression for each language of the system described by Figure 2.

However, it is possible to map each state of the NSA to a set of states in the DSA. In Figure 2, q0 in the NSA M would correspond to {{q0},{q0,q1}} in the DSA M and q1 would correspond to {{q1},{q0,q1}}. Let D be the function mapping states in M to corresponding sets of states in M.

However, note that given two states p and q of M, the existence of a path from some p*D(p) to q*D(q) in M accepting a string s does not imply that sM(p,q). For example, while M(q0,q1) does not accept the empty string, M({q0,q1},{q0,q1}) does. Therefore, we require that for all p*D(p), there exists a state q*D(q) such that M accepts s from the state p* to q*.

More generally, we can establish a bijection between the set of abstract states 𝒬 of a regular system Λ and the set of sets of concrete states Q of a semiautomaton M. Then a string s is in Λ(p,q) if and only if for all p*p, there exists a q*q such that there is a path from p* to q* in M for s:

Λ(p,q)=p*pq*qM(p*,q*)

In this text, we concern ourselves only with DSAs and their subsystems. This is sufficient for most natural languages if we take the various components of a syllable, including null segments, as symbols themselves and augment each symbol with its source and destination states. For instance, Ŋarâþ Crîþ tfełor would be represented as

((tf,s,g),(ε,g,o),(e,o,n),(ε,n,s),(ł,s,g),(ε,g,o),(o,o,n),(r,n,ω))

We refer to strings represented this way as fully syllabified strings or assemblages and any element of an assemblage as an assemblage unit. A disambiguated regular system is a regular system Λ over Q in which for each nonempty string s, there is at most one (p,q)Q×Q such that sΛ(p,q).