Important:Pyodide takes time to initialize.
Initialization completion is indicated by a red border around Run all button.
This post shows you how to convert a regular grammar to a regular expression
directly – that is, without first creating an NFA or a DFA.
Converting a regular grammar to a regular expression is fairly easy
We only need to follow a fairly fixed set of rules.
First, we make sure that we have a start symbol, and a single symbol
that represents the nonterminal in the grammar.
Next, we combine any production rules that end with the same nonterminal
into a regular expression rule with a regular expression prefix, and the
nonterminal suffix.
Next, we handle any self repetitions by using Kleene star expressions
Finally, we start removing nonterminals one by one until we are left with
the regular expression that takes us from start to stop.
We start with importing the prerequisites
Available Packages
These are packages that refer either to my previous posts or to pure python
packages that I have compiled, and is available in the below locations. As
before, install them if you need to run the program directly on the machine.
To install, simply download the wheel file (`pkg.whl`) and install using
`pip install pkg.whl`.
The grammar should have one start symbol
and exactly one stop symbol which is NT_EMPTY
So, what we do is, whenever we have a rule that contains
just a terminal symbol, we append the NT_EMPTY symbol
to the rule. Thus NT_EMPTY symbol becomes the final
nontermainal to be expanded.
Testing it.
We also define an is_nonterminal that knows about regular expressions.
Next, given a rule, we want to split it into a regex part and a nonterminal part.
Prefix Regex
Next, what we want to do is to consolidate rules that have same nonterminals
to a single rule with a regular expression prefix, and the nonterminal suffix.
That is:
convert <A> := a <B> | b <B> to (a|b) <B>
Using it.
Recursion (Repetition)
When there is recursion, that is a rule contains a nonterminal
that is the same as the nonterminal we are defining, we need to
convert that to a kleene star, and add it in front of every other rule.
That is,
A -> b B
| c C
| a A
becomes
A -> a* b B
| a* c C
Using it.
Finally, we start eliminating nonterminals from the grammar one by one.
The idea is to choose a single nonterminal to be eliminated, and find where
it is being used. For each such rules, replace that rule with a set of rules
with the same prefix, and the rules of the nonterminal being eliminated as the
suffix. That is, given
A -> b B
| c D
B -> e E
| f F
Eliminating B will result in
A -> b e E # new
| b f F # new
| c D
# B -> e E
# | f F
Using it.
Regular grammar to regex
Eliminate each nonterminal one by one to get our expression.
Using it.
The runnable code for this post is available
here.
Artifacts
The runnable Python source for this notebook is available here.
The installable python wheel rgtorx is available here.