of awk (The AWK Programming
Language), which was published in
late 1987. This version became available
to the world with the UNIX System V
Release 3.2.
I bought the book, figuring that now
was my chance to learn awk. It was
(and remains) a great book. Having an
interest in programming languages and
an interest in contributing to the world
at large, I decided to see whether the
GNU project had a version of awk.
Indeed, it did, but it implemented only
old awk (and poorly, at that). Being
single at the time, I decided to get
involved and see if I could work to
make gawk compatible with new awk.
(And, thus, the course of history
As early as 1988, the GNU developers
were corresponding with Brian Kernighan
and other awk implementers to make
sure that the awk semantics were consistent across implementations. System V
Release 4, in 1989, brought a few new
features for new awk (the -v option,
the ENVIRON array, the tolower() and toupper() built-in functions) and the
first POSIX standard (circa 1992) introduced the CONVFMT variable.
Starting in December 1993, Brian
Kernighan was able to release the
code to new awk; it continues to be
available (see Resources) and sees
minor bug fixes from time to time.
GNU Awk was first written around
1986 by Jay Rubin and Paul Finlason,
with some help from Richard Stallman.
It barely implemented the original awk
language, was buggy and was not particularly fast. It worked by building a parse
tree representation of the program
and then recursively evaluating the
parse tree for each input record.
When I got involved in late 1987,
David Trueman had already volunteered to upgrade it to new awk, and
I joined the effort, contributing code
fixes and doing serious work on the
documentation. We worked together
until around 1994, when I became the
Along the way, gawk acquired full
compliance with new awk, including
POSIX, and it improved in code quality,
speed and new features. Throughout
more than 20 years, though,
the basic design remained the same:
build the parse tree and recursively
evaluate it for each input record.
In 2003, out of the blue, a gentleman
named John Haque contacted
me. He had rewritten the gawk internals
to use a byte-code interpreter and
provided an awk-level debugger for
awk programs. This was a startling
innovation. I worked with him to get
his version to the point where it was
stable and passed the test suite, but I
WWW.LINUXJOURNAL.COM SEPTEMBER 2011 | 95