[This is an actual case study, heavily disguised.]
Let's imagine a programming language called Yazyka.
Let's say the Yazyka compiler is not built to give the user detailed assistance or instruction in the use of the Yazyka language. Essentially, its error messages aren't useful to a human being. This is a common problem among production language compilers.
Say that, because of incredible external pressures on the compiler team, the compiler will never be able to generate this human-friendly instruction itself.
So it might be best to build a separate 'check-module', usable by the system before or during compilation, that would play a role something like ‘lint’, but with far more sophisticated analytic functionality and pervasive user-friendliness. Like ‘lint’, it would also help to flesh out documentation and instructional materials for the Yazyka language.
One state-of-the-art solution is to build this check-module with grogix.
Grogix is an unusual computer language, but it is well suited to this particular kind of work.
Explanation
Say there is a type of statement in Yazyka whose form is:
(A) notify [x] with [y] on [z] ;
This can look like, say:
notify alertPort with “$D” on ALERT_STRINGS;
... and an infinite set of other similar cases with different specifics.
But there is also an infinite set of incorrect cases (a larger infinite set, actually) which do not fit this statement's form (I would hesitate to even call it syntax), any of which could be generated by the user while attempting to write a correct statement of this form.
Let’s look at a tiny variation on the correct form.
* notify alertPort with $D on ALERT_STRINGS;
In this example, the user has used a notation ($D) which turns ‘D’ into a string. But even if ‘D’ is otherwise correct, it turns out that ‘$D’ must be used inside of quotation marks.
The existing Yazyka compiler is ‘aware’ that this is incorrect. But, in its analysis, the "reasons" it has for rejecting this case are not intelligible to a human, and cannot help a human uncover the mistake that has been made.
For this simple mistake, the compiler should ideally provide helpful feedback like:
The $ operator is only for use within quotation marks, e.g.: "$D"
or:
In the notify statement, the stream ALERT_STRINGS requires a string in the with clause. Found $D instead.
or:
“With” should be followed by a string
… et cetera ...
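To make the first of these messages concrete, here is a minimal Python sketch of a check that could produce it. This is not how the check-module is written (that is grogix's job, described below); the function name check_dollar_usage is hypothetical, and the sketch assumes plain double quotes delimit strings.

```python
import re


def check_dollar_usage(statement: str) -> list[str]:
    """Return one friendly message per $name found outside double quotes.

    Hypothetical helper: assumes strings are delimited by plain double quotes.
    """
    messages = []
    in_quotes = False
    i = 0
    while i < len(statement):
        ch = statement[i]
        if ch == '"':
            # Toggle the quoted state on every double quote.
            in_quotes = not in_quotes
        elif ch == "$" and not in_quotes:
            # Capture the $ and the name that follows it, e.g. "$D".
            found = re.match(r"\$\w*", statement[i:]).group(0)
            messages.append(
                "The $ operator is only for use within quotation marks, "
                f'e.g.: "{found}"'
            )
            i += len(found) - 1  # skip the rest of the name
        i += 1
    return messages


print(check_dollar_usage("notify alertPort with $D on ALERT_STRINGS;"))
```

The point is only that the information needed for a helpful message is cheap to obtain once the check is written with the human reader in mind.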
But, instead, the Yazyka compiler responds with these kinds of errors:
Multiple markers at this line
- unexpected keyword: with
- unexpected keyword: (with) in expression
- unexpected keyword: notify
- unexpected keyword: (notify) in action
- unexpected keyword: on
- unexpected keyword: (on) in action
- constructor $ should have 1 arguments
Of course there are valid internal reasons for these messages, related to other work the Yazyka compiler needs to do. The compiler is optimized to build fast, reliable code, not to teach people the Yazyka language. This is also true for most other compilers.
This is why we advocate a language check-module. It’s intended to serve two purposes:
1. maintain an independent, human-verified, approachable formal language definition
2. make use of this definition as the basis of a grogix program that provides user-friendly error messages
What is grogix, how is it helpful, and why is it special?
Grogix is the prototype for a new class of formal languages. A grogix program explicitly represents computational operation, in a concise way, through a simple, coherent, tree-like gradient of operational importance. We call it an “operational grammar”.
A grogix program has a uniform structural description: a cascading, deductive block of statements of only a single statement type (described below). This compels the programmer to “push out” uninteresting implementation details from the operational description of the program. What is left is a “structural essence”, something which looks merely like an outline of operation, but is actually a tight hierarchical structure that handles all cases.
Because it is ‘syntax-centric’, a grogix program can more easily provide consistency checking and targeted human error reporting. Providing this, for all incorrect statements of the type (A) above, is accomplished by this small snippet of code, which demonstrates grogix’s brevity:
. statement(*starts_with(input,‘notify’)) -> notify_statement
. notify_statement -> notify recipient_identifier with expression
.. on stream_identifier($recipient_identifier,$expression) semicolon
. notify(*starts_with(input,‘notify’)) -> ‘notify’
. notify -> *error(‘not a “notify” statement’)
. recipient_identifier -> identifier
. with(*next(input, ‘with’)) -> ‘with’
. with -> *error(‘missing “with”’)
. expression -> *valid_type(expr,$1,$2)
. expression -> *error(‘missing expression after “with”’)
. on(*next(input,‘on’)) -> ‘on’
. on -> *error(‘missing “on” indicator for stream name’)
. stream_identifier -> *validate($expression,$recipient_identifier,identifier)
. semicolon(*next(input,‘;’)) -> ‘;’
. semicolon -> *error(‘missing semicolon’)
Now to explain.
There is only one kind of statement in a grogix program: the conditional production. Its general form is:
. (condition) ->
.. [combination of parameterized terminals, non-terminals and actions]
Note that every statement is preceded by a ‘.’ or continued by a ‘..’. Any other line is a comment: this is a nod to Knuth’s Literate Programming initiative.
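The line convention is simple enough to sketch in a few lines of Python. This is only an illustration of the rule just stated, not the grogix implementation; the function name extract_statements is hypothetical, and the sketch assumes a continuation line always follows the statement it continues.

```python
def extract_statements(source: str) -> list[str]:
    """Separate grogix statements from commentary.

    A line starting with '.' begins a statement, a line starting with '..'
    continues the previous statement, and every other line is a comment.
    """
    statements = []
    for raw in source.splitlines():
        line = raw.strip()
        if line.startswith(".."):
            # Continuation: append to the statement in progress.
            # (A stray continuation with nothing before it is ignored.)
            if statements:
                statements[-1] += " " + line[2:].strip()
        elif line.startswith("."):
            statements.append(line[1:].strip())
        # Anything else is commentary and is skipped.
    return statements


program = """\
. notify_statement -> notify recipient_identifier with expression
.. on stream_identifier semicolon
This is the structure of a notify statement.
. on -> *error('missing on')
"""
for statement in extract_statements(program):
    print(statement)
```

Note that the ‘..’ test has to come before the ‘.’ test, since every continuation line also starts with a single dot.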
Now let’s look at the same program, with explanatory comments added:
. statement(*starts_with(input,‘notify’)) -> notify_statement
We assume that there are more kinds of statements besides notify, which will be inserted here.
. notify_statement -> notify recipient_identifier with expression
.. on stream_identifier($recipient_identifier,$expression) semicolon
This is the structure of a notify statement. In grogix, all the expressions on the right hand side of the ‘->’ will be evaluated, in order, before this statement can return a value upwards.
Note that we’ve passed the return value of two of the non-terminals (recipient_identifier and expression) as parameters (using $) to a third non-terminal (stream_identifier) in order to check type agreement.
. notify(*starts_with(input,‘notify’)) -> ‘notify’
. notify -> *error(‘not a “notify” statement’)
This is redundant! But I wanted to give you an early flavor for the nature of the conditional production. The first notify production is invoked, and returns a string, if the line begins with ‘notify’. Otherwise, the non-terminal definition falls through (in top-to-bottom order) to the second conditional production, which is the ‘default’. The first conditional production to evaluate positively is the one that ‘runs’ (i.e. the one whose right-hand-side is further evaluated).
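The fall-through behavior can be mimicked in Python as an ordered list of (condition, action) pairs, where the first pair whose condition holds is the one that runs. This is a sketch of the idea only; the names Production and evaluate, and the use of None to mark the default production, are my assumptions rather than anything grogix defines.

```python
from typing import Callable, Optional

# A conditional production: (condition, action). A condition of None marks
# the unconditional 'default' production.
Production = tuple[Optional[Callable[[str], bool]], Callable[[str], str]]


def evaluate(productions: list[Production], text: str) -> str:
    """Run the first production, in top-to-bottom order, whose condition holds."""
    for condition, action in productions:
        if condition is None or condition(text):
            return action(text)
    raise LookupError("no production matched")


# The two 'notify' productions from the program above.
notify: list[Production] = [
    (lambda text: text.startswith("notify"), lambda text: "notify"),
    (None, lambda text: 'error: not a "notify" statement'),
]

print(evaluate(notify, "notify alertPort with x on y;"))
print(evaluate(notify, "alert alertPort"))
```

Because order decides which production runs, the default has to come last; placing it first would shadow every conditional production after it.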
Also note that the *error and *starts_with operations are both external references. Everything else in a grogix program is an internal reference: that is, defined within the grogix program. This means the grogix program represents only the operational structure, or ‘essence’, of the program … everything else is pushed out of this structure, or pulled in, via these * operations.
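One way to picture the split between internal and external references is a table of host-supplied functions that the grammar reaches through the * prefix. The Python below is a hedged sketch: the table EXTERNAL_OPS, the function call_external, and the behavior given to each operation are assumptions for illustration, not the actual grogix runtime.

```python
# Host-side implementations of the external (*) operations.
# The behaviors here are illustrative assumptions.
EXTERNAL_OPS = {
    "starts_with": lambda text, word: text.lstrip().startswith(word),
    "error": lambda message: f"error: {message}",
}


def call_external(reference: str, *args):
    """Resolve a '*name' reference against the host-supplied table and call it."""
    if not reference.startswith("*"):
        raise ValueError(f"{reference!r} is not an external reference")
    name = reference[1:]
    if name not in EXTERNAL_OPS:
        raise KeyError(f"no external operation named {name!r}")
    return EXTERNAL_OPS[name](*args)


print(call_external("*starts_with", "notify alertPort", "notify"))
print(call_external("*error", "missing semicolon"))
```

Everything in the table is implementation detail that has been pushed out of the grammar; the grammar itself only names the operation.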
. recipient_identifier -> identifier
‘identifier’ will be defined another time. Also, in this consistency checker, there will be binding considerations for the different kinds of identifiers (we have to check that the streams and variables exist, for example).
. with(*next(input, ‘with’)) -> ‘with’
. with -> *error(‘missing “with”’)
Here we look at the next token at the first level, and, if it is missing at this point in the evaluation, we issue a specific, user-friendly error. (I’ll leave the error format discussion to another time.)
. expression -> *valid_type(expr,$1,$2)
. expression -> *error(‘missing expression after “with”’)
If there is an expression, it does a check against the type declared elsewhere, returning the mismatch if found. If there’s no expression at all, it reports the problem.
. on(*next(input,‘on’)) -> ‘on’
. on -> *error(‘missing “on” indicator for stream name’)
Another simple use of the conditional production to identify missing structure.
. stream_identifier -> *validate($expression,$recipient_identifier,identifier)
‘identifier’ will be defined another time. Notice that we call an external function *validate, which we assume has access to appropriate tables, to validate passed values of other non-terminals (see notify_statement above), and potentially return an error here if there is a problem. For ease of exposition, we’re presenting the case where the other non-terminals have already been evaluated. A different technique is available if the agreement is with non-terminals whose values are not yet available, but our users can usually rewrite the productions to make agreement-evaluation easy.
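As a rough picture of what an external *validate might do with its tables, here is a Python sketch. Everything in it is an assumption made for illustration: the table contents, the idea that a quoted expression has type ‘string’, and the wording of the two 'unknown' messages. Only the mismatch message follows the wording suggested earlier.

```python
from typing import Optional

# Hypothetical tables the external function is assumed to have access to.
STREAM_TYPES = {"ALERT_STRINGS": "string"}
KNOWN_RECIPIENTS = {"alertPort"}


def expression_type(expression: str) -> str:
    """Crude type guess: anything wrapped in double quotes is a string."""
    if len(expression) >= 2 and expression[0] == '"' and expression[-1] == '"':
        return "string"
    return "unknown"


def validate(expression: str, recipient: str, stream: str) -> Optional[str]:
    """Return a user-facing error message, or None if the values agree."""
    if recipient not in KNOWN_RECIPIENTS:
        return f"Unknown recipient {recipient}."
    if stream not in STREAM_TYPES:
        return f"Unknown stream {stream}."
    required = STREAM_TYPES[stream]
    if expression_type(expression) != required:
        return (
            f"In the notify statement, the stream {stream} requires a "
            f"{required} in the with clause. Found {expression} instead."
        )
    return None


print(validate("$D", "alertPort", "ALERT_STRINGS"))
```

Because the expression and recipient values arrive as parameters, the function has everything it needs at the moment the stream name is read, which is the easy case described above.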
. semicolon(*next(input,‘;’)) -> ‘;’
. semicolon -> *error(‘missing semicolon’)
Another simple use of the conditional production to identify missing structure.
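For readers who want to see the structural checks run end to end, here is a hand-written Python approximation of what the productions above do for a single notify statement. It is a sketch under several assumptions: the tokenizer, the function check_notify, and the messages for a missing recipient or stream name are mine, and the type-agreement check is left out. It is also far longer than the grogix version, which is rather the point.

```python
import re

# Tokens: a double-quoted string, a name (optionally prefixed by $),
# or any other single non-space character such as ';'.
TOKEN = re.compile(r'"[^"]*"|\$?\w+|\S')
KEYWORDS = ("with", "on", ";")


def check_notify(line: str) -> list[str]:
    """Report missing pieces of a notify statement, in order of appearance."""
    tokens = TOKEN.findall(line)
    errors: list[str] = []
    pos = 0

    def take(expected: str, message: str) -> None:
        # Consume the expected token, or record the friendly error.
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == expected:
            pos += 1
        else:
            errors.append(message)

    def take_any(message: str) -> None:
        # Consume one non-keyword token (an identifier or an expression).
        nonlocal pos
        if pos < len(tokens) and tokens[pos] not in KEYWORDS:
            pos += 1
        else:
            errors.append(message)

    take("notify", 'not a "notify" statement')
    take_any("missing recipient")  # hypothetical message
    take("with", 'missing "with"')
    take_any('missing expression after "with"')
    take("on", 'missing "on" indicator for stream name')
    take_any("missing stream name")  # hypothetical message
    take(";", "missing semicolon")
    return errors


print(check_notify('notify alertPort "$D" on ALERT_STRINGS;'))
```

Each call to take or take_any plays the part of one pair of conditional productions: the matching case consumes input, and the default case reports what is missing.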
Conclusion
The position of the proposed check-module is architecturally flexible: it can be invoked either after the Yazyka compiler encounters a problem, to provide better output to the user, or beforehand, to ensure better input to the compiler.
Socially, the proposed check-module provides an independent validation of assumptions regarding the Yazyka language. Although this module and the compiler may seem like a “divergence risk” because of “two different language definitions” in two different system modules, in fact their jobs are complementary, both helping to socialize the actual language definition. It is the grogix team's job to ensure that there are no conflicts, and that the check-module provides a world-class level of Yazyka training and support to the user.