Minimizing a Deterministic Regular Grammar or a DFA
Published: Nov 2, 2023
A deterministic regular grammar
is a DFA (Deterministic Finite Automaton) represented as a regular grammar.
Such a grammar can have rules of the form `<A> ::= a <B>` or `<A> ::= a`,
where, given a nonterminal `<A>` and a terminal symbol `a`, at most one of
its rules will start with the terminal symbol `a`.
Secondly, the ε rule is attached to a single accepting symbol, which is the
only termination point. If the language contains ε, then the
single degenerate rule `<start> ::= ε` is added to the rules.
As you can see, each node has at most one
transition for a given terminal symbol. Hence, this canonical form is
equivalent to a deterministic finite state automaton (DFA).
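To make this concrete, here is a minimal sketch of how such a grammar might be encoded in Python. The dict encoding, the empty rule `[]` standing in for the ε rule, and the nonterminal names are my own assumptions for illustration, and may differ from the representation used in this post.

```python
# A deterministic regular grammar as a Python dict (illustrative encoding):
# each nonterminal maps to a list of rules; a rule is [terminal, nonterminal],
# and the empty rule [] plays the role of the epsilon (termination) rule.
example_grammar = {
    '<start>': [['a', '<A>']],
    '<A>':     [['b', '<B>'], []],   # <A> is an accepting state
    '<B>':     [['a', '<A>'], []],   # <B> is an accepting state
}
# Determinism: for a given nonterminal, no two rules start with the same terminal.
# This grammar accepts 'a', 'ab', 'aba', 'abab', ...
```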
A deterministic regular grammar (equivalently, a DFA) is useful in many applications.
However, in most applications where such a grammar can be used
(e.g., parsing and fuzzing), the performance of algorithms can be improved much
further by minimizing the grammar so that it has the smallest possible size
for the language it represents. This post tackles how to minimize a DFA using
the classical algorithm [1]. Note that it is erroneously referred to as
Hopcroft's algorithm on Wikipedia [2]; however, Hopcroft has
a different algorithm based on partitioning.
Important: Pyodide takes time to initialize.
Initialization completion is indicated by a red border around the Run all button.
Available Packages
These are packages that refer either to my previous posts or to pure Python
packages that I have compiled, and are available at the locations below. As
before, install them if you need to run the program directly on your machine.
To install, simply download the wheel file (`pkg.whl`) and install it using
`pip install pkg.whl`.
We assume that we have a DFA that is represented as a grammar where the
states in the DFA are nonterminal symbols in the grammar and terminals are
the transitions. However, this DFA is not necessarily minimal. That is,
there could be duplicate nonterminal symbols that behave exactly alike:
if we take two such states as the start symbols of the grammar, then
the strings accepted by the resulting grammars would be exactly the same. So, the
idea behind minimizing a regular grammar is to identify nonterminals that are
duplicates of each other, and merge them.
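As a running example for the sketches below, here is a hypothetical grammar, in the dict encoding assumed earlier, for strings over {a, b} that end in `b`. The nonterminals `<B>` and `<C>` behave exactly alike, so minimization should merge them.

```python
# A redundant DFA-as-grammar: strings over {a, b} that end in 'b'.
# <B> and <C> are duplicates: starting from either, the accepted strings
# are identical, so a minimal grammar would merge them.
g1 = {
    '<start>': [['a', '<start>'], ['b', '<B>']],
    '<B>':     [['a', '<start>'], ['b', '<C>'], []],   # accepting
    '<C>':     [['a', '<start>'], ['b', '<C>'], []],   # accepting (duplicate of <B>)
}
g1_start = '<start>'
```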
Interestingly, Brzozowski [3] observed that if you reverse the
arrows in the DFA (resulting in an NFA), convert the NFA back to a DFA, and then
do this once more, the resulting DFA is minimal. However, this can have
exponential worst-case complexity (though it can be much faster on common patterns).
Hence, we look at a more common algorithm for minimization.
The idea is as follows:

1. Mark the start state and all final states
2. Collect all pairs of nonterminals from the grammar
3. While there are pairs remaining to be marked:
   a. for each unmarked pair (p, q) do
      i. for each symbol a in the alphabet do
         * if the pair (delta(p, a), delta(q, a)) is marked, mark (p, q)
4. Construct the reduced grammar by merging the unmarked groups of pairs.
We start with a matrix over pairs of nonterminals, and iteratively refine it. We start by
initializing our data structure.
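A minimal sketch of such an initialization, under the dict encoding assumed earlier (the helper names are mine, not necessarily the ones used in this post): the matrix holds one entry per unordered pair of nonterminals, with `None` meaning "not yet distinguished", and any pair where exactly one member is accepting is marked immediately.

```python
import itertools

def is_accepting(g, key):
    # Assumption: a nonterminal is accepting iff it carries the empty rule [].
    return [] in g[key]

def init_marks(g):
    # marks[(p, q)] is None while p and q are still indistinguishable,
    # and True once a distinguishing suffix has been found.
    marks = {}
    for p, q in itertools.combinations(sorted(g.keys()), 2):
        marks[(p, q)] = True if is_accepting(g, p) != is_accepting(g, q) else None
    return marks
```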
Next, we want to update the markings. To improve performance, we define
a data structure to keep the transitions from each state.
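Here is one way such a transition table could look, as a sketch under the same assumed encoding: for each nonterminal, a map from terminal symbol to the nonterminal it leads to.

```python
def transitions(g):
    # For each nonterminal, map each terminal symbol to the nonterminal
    # that the corresponding rule leads to. Epsilon rules are not transitions.
    return {key: {rule[0]: rule[1] for rule in rules if rule}
            for key, rules in g.items()}
```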
We are now ready to update our markings.
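A sketch of the update step, building on the transition table above: keep sweeping over the unmarked pairs and mark a pair as soon as some symbol leads it to an already-marked pair (or when one member has a transition the other lacks).

```python
def update_marks(g, marks):
    tbl = transitions(g)
    changed = True
    while changed:                      # iterate until no new pair gets marked
        changed = False
        for (p, q), v in marks.items():
            if v is not None: continue  # already distinguished
            if set(tbl[p]) != set(tbl[q]):
                # one of them can take a step the other cannot
                marks[(p, q)] = True
                changed = True
                continue
            for a in tbl[p]:
                pa, qa = tbl[p][a], tbl[q][a]
                if pa == qa: continue
                if marks[tuple(sorted([pa, qa]))]:
                    # delta(p, a) and delta(q, a) are distinguishable,
                    # hence so are p and q.
                    marks[(p, q)] = True
                    changed = True
                    break
    return marks
```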
Using it.
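For example, running the sketch above on the hypothetical grammar `g1`:

```python
marks = update_marks(g1, init_marks(g1))
for pair, v in marks.items():
    print(pair, 'distinguishable' if v else 'equivalent')
# Expected: ('<B>', '<C>') is the only pair left as equivalent.
```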
At this point, all that remains to be done is to merge the nonterminals in the
pairs that are indistinguishable, that is, the pairs whose value is still None.
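A sketch of this merge step (the helper names are mine): collapse each group of indistinguishable nonterminals to a single representative and rewrite the rules accordingly.

```python
def find_rep(rep, k):
    # Follow representative links (a tiny union-find style 'find').
    while rep[k] != k:
        k = rep[k]
    return k

def merge_grammar(g, start, marks):
    # Point each nonterminal at a representative; unmarked pairs get merged.
    rep = {k: k for k in g}
    for (p, q), v in sorted(marks.items()):
        if v is None:
            a, b = sorted([find_rep(rep, p), find_rep(rep, q)])
            rep[b] = a
    # Rewrite every rule in terms of the representatives, dropping duplicates.
    new_g = {}
    for key, rules in g.items():
        new_rules = new_g.setdefault(find_rep(rep, key), [])
        for rule in rules:
            nrule = [find_rep(rep, t) if t in g else t for t in rule]
            if nrule not in new_rules:
                new_rules.append(nrule)
    return new_g, find_rep(rep, start)
```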
Using it.
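Again, applying it to the running example:

```python
min_g, min_start = merge_grammar(g1, g1_start, marks)
for key in min_g:
    print(key, min_g[key])
print('start symbol:', min_start)
# <C> has been folded into <B>, leaving the two-state minimal grammar.
```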
The runnable code for this post is available
here.
Artifacts
The runnable Python source for this notebook is available here.
1. Yingjie Xu, "Describing an n log n algorithm for minimizing states in deterministic finite automaton", 2008.
2. John Hopcroft, "An n log n algorithm for minimizing states in a finite automaton", 1971.
3. Janusz Brzozowski, "Canonical regular expressions and minimal state graphs for definite events", 1963.