Saturday, December 9, 2023

Another example of why NYC is a great place to live but I wouldn't want to visit.

Today is Santacon. 

It is the day when people young and old, big and small, dress up as Santa Claus. Or, if that's not your thing, just wear something felty or furry in brown and put on a pair of felt antlers; that works too.

I see such diversity in costume today, maybe just the hat, maybe just the pants, as I walk the 1/2 mile to the Saturday Farmer's Market. I look for basically two items.

The first item is Brussels sprouts on the stalk. If the sprouts are very fresh, you can eat the middle of the stalk. It has a carrot-like texture, and the taste is a mild kind of horseradish with the slightly skunky smell you find in cabbage or broccoli, which Brussels sprouts are related to. For the last month or so, they have been appearing at the Farmer's Market.

The second item is Concord grapes. I like them for their pissy taste. That taste comes from the "fox grape," a grape native to the northeastern coastal part of the U.S.

I have never had a fox grape and would like to try one. Like raspberries or low-bush blueberries, they are a bit fragile and do not transport very well. So people make jams with them.

Historically, I imagine they are interesting for this reason. Since they are the native grape, I imagine that some ancient Viking real-estate developer hit on the idea of calling this area "Vinland" after discovering one of these. I imagine also that this was the same guy who previously had a successful real-estate run after naming this other piece of land "Greenland". There is a spot not far from NYC, across the great psychological divide that is the Hudson River, which one of his descendants named "Meadowlands". This is a double lie: it is a swamp.

Anyway, today I could not find either at the Farmer's Market. I imagine it is this kind of specificity in people from NYC that contributes to why others, not from the region, may view us as "snobby".

On the way back, I stopped into a large bookstore on Broadway. Yes, that Broadway; the bookstore is just not in the theater or show-business district. 

I have this weird idea that one day I will find the script to a radio show I once heard, by David Mamet and someone else, that sets music to the lyrics:


Hail to thee, George Topax,
ne'er against the foe lax,
Tibia quo pax,
Ee-i-ee-i-o.

          As we move along,
          through the goolagong,
          we know where you belong,
          on your big fat throne, so ..
          here's to thee, George Topax, ...


I don't find it. I also check the science and math section. And, being a geek, the computer section as well.


I see a book called "The Computer and the Brain" by John von Neumann. 



From the photo attached you can see a 1950's style computer, complete with wires on the front. Those large cylindrical things are some sort of memory.

I open it and read the table of contents. I am interested: PART I: THE COMPUTER. ... PART II: THE BRAIN

Von Neumann is, of course, responsible for the architecture of the modern-day computer with its CPU and memory. It is not surprising to me that he was also interested in the brain. He was that kind of guy. And I find it cool and reassuring that right from the beginning he separates and distinguishes how a computer of his design works from how a brain works. As I look to see the price, I come across this:


$4.00 - I can afford that. And then I notice the writing: M. L. M-something-or-other. Cambridge, Mass., Nov. 1960.

Then it hits me. ...  Could this be Marvin L. Minsky? The beginning matches, but I don't see anything that looks like the end: "sky".  I look on my phone to see if I can find a signature for Minsky. This is in the dingy and slightly damp basement of the Strand. No cell phone connection is available. I pass it by, and move on looking at other sections. 

But I figure: for $4, why not? 

So I go back and pick it up. At the cash register I see that I should have looked at the back of the book: it was $20, not the $4 that was the book's original price. The $20 figure is because the book is "out of print". Yeah, I can understand why that might be.

Coming home, I look to find Minsky's signature. Signatures he writes in "signed copy" books do have a visible "sky". However, I asked a friend who knows more about handwriting analysis, and he says that the "M" and the "k" do look like they were formed the same way.

According to Wikipedia, Minsky joined MIT (in Cambridge MA)  in 1958, the same year as this 3rd printing. The book's topic is clearly a topic he was interested in. 

What do you think? 

Anyway, it was his. That's my story -- and I am sticking to it!

To come back to the title, here is an example of what is cool about NYC. I am just kind of lumbering around on a routine walk and stumble across something like this. On another occasion, on pretty much the same route, I was walking home and passed a theater that said: "World Premiere". It was Samuel Beckett's penultimate play, Worstward Ho. I thought, now there's something you don't see every day (a phrase Edgar often says to Chauncy in the Rocky and Bullwinkle cartoon). But then I thought: well, I am tired tonight; I'll go tomorrow night. And I did.

Another time I was walking around, about three blocks from my apartment on an Easter Sunday, and there was no one out on the street except this guy, Philip Glass, who I later learned lives here.

As for the "but I wouldn't want to visit" part, the touristy kinds of things tend not to thrill me. 

I did, however, like going up to the top of the World Trade Center, because down at street level the view is very myopic and all four compass directions are very, very different. Then you could go to the top and see how it all pieces together. But to really appreciate something like this you have to know something about the area.


Saturday, October 28, 2023

The two times I have been misled by Compiler Canon (and the Dragon book)

Twice in my life there have been these weird moments in compilers where I realized that something I had been taught, something that had been considered canon, turned out to be misleading.

The first time this happened was when I started at IBM Research in a compiler group. Previously, my formal knowledge of compilers came from the first edition of the Dragon Book.

The focus of register allocation was on reducing the stack depth needed to evaluate expressions. There was something called Sethi-Ullman numbers, which compute the maximum stack depth (equivalently, the minimum number of registers) needed to evaluate an expression.

Since one of the authors of the Dragon Book was Jeff Ullman, it is not a surprise that they would be mentioned.

And for the DEC PDP-11 machines this made sense, because there were stack-like increment/decrement instructions on a special "stack-pointer" register. It never made sense for IBM's computers, which had general-purpose registers without such increment/decrement instructions.
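To make the idea concrete, here is a minimal sketch in Python of Sethi-Ullman numbering over a binary expression tree. The Node class and the example expressions are made up for illustration; a real compiler would carry much more per node and would also emit code in the evaluation order the numbering implies.

```python
# Minimal sketch of Sethi-Ullman numbering; not any particular compiler's code.
class Node:
    def __init__(self, op, left=None, right=None):
        self.op, self.left, self.right = op, left, right

def su_number(node):
    """Maximum stack depth (minimum registers) needed to evaluate `node`
    without spilling intermediate results."""
    if node.left is None and node.right is None:
        return 1                     # a leaf loads into one register
    l, r = su_number(node.left), su_number(node.right)
    if l == r:
        return l + 1                 # one side must be held while the other is computed
    return max(l, r)                 # do the needier side first and reuse its registers

# (a + b) * (c + d) needs 3 registers; a + (b + (c + d)) needs only 2.
wide = Node("*", Node("+", Node("a"), Node("b")), Node("+", Node("c"), Node("d")))
deep = Node("+", Node("a"), Node("+", Node("b"), Node("+", Node("c"), Node("d"))))
print(su_number(wide), su_number(deep))   # => 3 2
```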

When I got to IBM around 1982, there was this register-allocation implementation by Greg Chaitin, based on an idea by the Russian computer scientist A. P. Ershov that came via IBM Fellow John Cocke. While John Cocke had been floating the idea around since 1971 or so, Greg's simple, elegant implementation made it happen and put it into the PL.8 compiler.

Subsequent editions of the Dragon Book ditched Sethi-Ullman numbers and described this instead.
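For flavor, here is a highly simplified sketch of the graph-coloring idea: repeatedly remove ("simplify") a node with fewer than k neighbors, then pop nodes back and assign each a color its neighbors don't use. Spilling, coalescing, and rebuilding after spills -- the parts that make a real Chaitin-style allocator work -- are left out, and the interference graph here is invented.

```python
# Toy sketch of register allocation by graph coloring (simplify + select only).
def color(interference, k):
    graph = {v: set(ns) for v, ns in interference.items()}
    stack = []

    # Simplify: keep removing a node with fewer than k neighbors.
    while graph:
        easy = next((v for v in graph if len(graph[v]) < k), None)
        if easy is None:
            raise RuntimeError("some value would have to be spilled to memory")
        stack.append((easy, graph.pop(easy)))
        for neighbors in graph.values():
            neighbors.discard(easy)

    # Select: pop back and give each node a color unused by its neighbors.
    assignment = {}
    while stack:
        v, neighbors = stack.pop()
        used = {assignment[n] for n in neighbors if n in assignment}
        assignment[v] = next(c for c in range(k) if c not in used)
    return assignment

# a, b, c are simultaneously live; d only overlaps with c.
live = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(color(live, 3))   # a valid 3-coloring, e.g. {'d': 0, 'c': 1, 'b': 0, 'a': 2}
```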

The second time, though, is recent, and relates to how compilers work and how decompilation works.

Around 2006, I became interested in decompilation as a way to get precise location information about where a program is when it is running, such as inside a debugger or just before producing a stack dump. There was some code that had been started around 2000 and was largely abandoned a year or two later, after going through two people. I don't think the subsequent maintainers understood it as well, since it has a compiler-centric orientation.

There were a couple of people who wrote theses on this topic, and their work seems to be canon for decompilation. It assumes a general-purpose decompiler -- one that starts with machine code and produces source code. It has no understanding of the source language, translator, or compilation system used. Therefore its output has to be some sort of lower-level language bridging the semantics of machine-code-level instructions and control flow. C is a good language for this kind of thing.

The decompiler-as-a-kind-of-compiler approach, however, is at odds with this. It does assume a specific compilation system and its transformations into a high-level bytecode. And by doing this, the recovered source code is the same as the source code we started out with. Furthermore, by doing this we don't really need to understand the semantics of the bytecode instructions!

At first I thought this was kind of a one-off. Part of this was thinking that bytecode as a general principle is a one-off. I now realize that this is not the case. As the Wikipedia article on p-code machines explains, the idea goes back as early as 1966 and traces through Euler and Pascal via Niklaus Wirth. Rather than being a one-off, bytecode is used in a number of systems like Smalltalk and Java, probably gaining popularity from the Pascal P-code interpreter, UCSD Pascal, and Turbo Pascal.

There was this school of thought at IBM that there was a holy grail of a universal, source-language- and machine-independent, register-based intermediate language, something akin to LLVM. To some extent there still is a school of thought that believes this, and there is possibly some truth to it. However, there are a lot of systems that use a bytecode designed for a particular language or a particular family of languages. Microsoft CIL and the JVM are examples of the family-of-languages camp.

See https://www.youtube.com/watch?v=IT__Nrr3PNI&t=5311s for the history of JVM and how it was influenced by bytecode.


Friday, November 26, 2021

Sysadmin Horror Story with a Moral

Of all of my horror stories working for a NYC financial organization, the worst one was also the most memorable, not because of its horribleness, but rather because there was something that I discovered which has served me well in the decades since.

Back in the days before server clouds and virtual machines, we had real computing hardware.

Sometimes, as in this financial institution, the servers, all 1,000 or so of them, were housed right inside the business. This firm owned an entire 24-floor building not far from Rockefeller Center.

We were informed by NYC's power company, Con Edison, that in order to beef up the supply for our growing power demand they needed to turn off the power to the building for a little while. It might have just been to the two or three floors where we housed all of the computers. This would be done on a Saturday night when all of the financial markets were closed.

The plan was that Saturday evening we'd power down all of the servers and Sunday morning at 6AM the servers would be powered back up. Because every server had to be checked, we divided the power-on phase into two groups: one starting at 6AM, and another starting at noon. We had the more experienced people doing the noon shift, because we expected that some servers might be tricky, so the early group could focus on quantity and leave the more difficult machines for the more advanced group.

This seemed pretty reasonable and clever. I was on the second shift.

Things went pretty well and we got most of the servers up and running.

Except there was one. And it was one of the more important real-time trading servers.

So now we flash back six weeks earlier...

Back in the days before the widespread use of server configuration management and orchestration tools like Kubernetes, Ansible, Chef, or Puppet, we had to write our own scripts to do mundane things.

One time I was assigned the task of writing a script to update root passwords everywhere.

Actually the assignment wasn't really that. A batch of sysadmins had burned out and quit, and upper management wanted to make sure ASAP that they didn't inflict damage on us. Financial institutions weren't friendly places to work.

Taking the longer view of things, I wrote a general program to change root passwords. You weren't going to change the attitude of the financial institution; they treated everyone like a disposable wipe. So you adjusted: this was an activity that was clearly going to happen over and over again.

Part of the task was, of course, to write the program that updates the root password. The other part of the task, doing it everywhere, was actually more difficult: getting and maintaining an inventory of servers. A number of the servers you could get from a service developed at Sun Microsystems called NIS.

However, not all servers used this, so the NIS information had to be melded in with DNS. The two lists were largely distinct because the servers had been set up by two different and somewhat competing groups. The groups were merged sometime after I was hired.

When a server had more than one network interface, it was important to catch that. I had written a blast script that would take a list of servers, copy some code to each server, and run it. The code here was the thing that changed the root passwords.

One of these servers had more than one interface and was listed in both DNS and NIS, so it appeared twice. My blast script updated servers independently rather than serially.
When you have 1,000 or so servers to update, you need to update them in parallel if you want to get any task done in a single work shift. (More about that aspect perhaps some other time; that story has a moral, too.)

There is a funny thing about the Unix file system (and by extension Linux as well): consistency is not guaranteed over independent file writes. In other words, if you have two clients each of which opens the same file and writes to it in parallel with the other client, the result will probably be garbage. And that's what happened when I ran my password update in parallel through two different network interfaces.

The result was that we had a shadow password file which was garbage.

Much later when I learned about "idempotency" in configuration management, a light bulb went off in my head. See definition 3 in https://foldoc.org/idempotent .

Now, pretty soon we discovered the mistake. I went to upper management to report the problem. I figured that, since we were low on sysadmins (which is why I was doing this in the first place), my job was safe for a little while, at least until I got the program working reliably by adding a couple of file locks to it and correcting the inventory for the mistake.
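For the curious, the fix was roughly the kind of guard sketched below -- an exclusive advisory lock around the rewrite, plus a check that makes repeated runs harmless. The path and the transform are stand-ins; the real script worked on the shadow password file through vendor tools, not like this.

```python
# Sketch of the missing safeguard: lock the file so two concurrent updaters
# can't interleave their writes, and skip the write if nothing would change.
import fcntl

def update_file(path, transform):
    with open(path, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)   # a second writer blocks here instead of clobbering
        old = f.read()
        new = transform(old)
        if new == old:                  # idempotent: a repeated run is a no-op
            return
        f.seek(0)
        f.write(new)
        f.truncate()
        # the lock is released when the file is closed
```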

Then I was asked what the current situation of the server was. Was the important trading system still trading?

Interestingly the answer was, well, actually yes. It was. Any program that started out with root privileges and that was running was still functioning normally. It was only new requests that couldn't log in as root.

So upper management said, don't worry about it then: just don't reboot the box.

This is exactly the highly skilled 2-steps-ahead kind of answer typical of those minds that work at places making decisions about your financial future.

So now we come back to the Con Edison shutdown, which had in effect decided for us that we were going to reboot. Most of the important servers were hooked to battery backup with specially monitored power regulators. However, this one server was in the trading area on some guy's desk, hooked into the wall outlet.

And as we learned that day, it also had disk clustering of the root file system, which was also slightly dysfunctional. The process of rebooting involved a magical command run as root, which we were only to find out about later. I hope you see where we are going with this...

If you are still with me, now I get to the main part of the story.

We spent hours trying all sorts of things and the usual tricks to bring up a server in repair mode. But because of the root partition being part of disk clustering which was messed up, we couldn't. All the other folks were told to go home because it was just this one server and it was just me and the sysadmin team lead.

We had a contract with the manufacturer/vendor for support of the important servers, but not for this particular server, which wasn't racked with the others. Finally, the sysadmin team lead, Sam, decides to put a half hour of live support, about $800, on his own personal credit card. (This was in 1999 dollars.) I was impressed: it was resourceful and took guts.

We were told that we would deal with the expert in Veritas Cluster Management, our clustering software, at Sun.

So we were hopeful.

He had us try a few things, but none of them worked. And then, after 3 minutes of elapsed time, he said, "I hope you have good backups to restore from." Well, predictably, this important but rogue trading server that happened to be sitting on some dude's desk and had neither fault-tolerant power nor server support maintenance wasn't in the list of servers to be backed up either.

But Sam just said: I want a second opinion.

The guy said: Okay, I hope your résumés are up to date.

I was very agitated. Using a low-level disk reading command, dd, I was able to see that all the clustering info was there. It was just a matter of reconstructing whatever information was needed at boot so we could then get to a single-user root shell with /etc mounted. To spend $800 to be told in 3 minutes to basically blow it all away and restore everything from backup (which would have taken several hours and lost some amount of trading data) really irked me.


But Sam kept cool. We talked to the manager who explained that this guy really was the top-of-the-line expert in Veritas Clustering that they have.

Sam patiently said this wasn't acceptable and that we still had 15 minutes of paid time to figure out something. The manager said well, in 5 minutes, at midnight, the shift changes and we have another guy on duty. He's just the guy we use to fill in off hours when there is low activity. And he doesn't know that much about Veritas Clustering. Sam said, okay, let's try him.

We wait the 5 minutes and talk to the other guy. He affirms that the person we talked to really was the expert on Veritas Clustering. He didn't know much about it, but he will look up whatever resources he has available and call us back. Okay. And in another 5 minutes he says well, by googling he sees that you can get direct support for Veritas Clustering from the Veritas company itself. Sun Microsystems was just an OEM provider.

So we called up Veritas, and gave them the dd information I had gotten earlier. With that, someone at Veritas was able to write a custom shell script with which we were able to recreate the volume-manager information with all of the current data intact, so that we could boot the server to at least a root shell. YAY!


Moral: It is sometimes more important to find someone willing to help and work with you than someone who has the most knowledge about something. With someone trying to be helpful, you might be able to figure out the things you don't know, things that an expert might overlook.
 

Aftermath:

While we now had root access, we still had a lot of work to recreate the shadow passwords and go over the /etc filesystem. We had both been there over 12 hours. Sam had to get some rest before doing this important, unforgiving, and custom bit of work. And this was wise, too, because when you are tired it is too easy to make a mistake. So Sam was going to go home and sleep until 5AM.

For my part, I just slept on the wall-to-wall carpeted floor in the sysadmin room. The carpeting was there because this was the floor with the trading room, not because the company cared about the comfort of sysadmins sleeping on the floor.

At about 3AM I hear the locked door open. It was the night guard who said he was suspicious because he heard some heavy snoring. I assured him I heard it, too, and was certain it was coming from the next room over.

Monday, November 16, 2020

Greg Chaitin, Computer Programmer

As a friend of Greg Chaitin, and an avid follower and reader of his writings, one thing that strikes me is that there is very little mentioned about the 25 years or so that he spent at IBM, most of it at IBM Research. Perhaps the situation is analogous to that of Archimedes. His mathematical works are well preserved, but we don't really know the extent of his considerable practical engineering and invention.

Greg started out as an IBM Service Engineer in Argentina, and I came to know him when he worked at IBM Research.

We worked on a compiler for IBM's first RISC, then called the 801. It later became known as the POWER architecture.

The work he did on that project was first rate and I learned a lot from him about writing code. Some of this is described below. In addition, he used insightful algorithms. You can find a little bit about his work on the register allocator in a couple of papers listed in the references below. These papers have become the standard references whenever something describes register allocation by graph coloring, which is called Chaitin-style graph coloring.

The papers convey the elegance of the idea but, having studied the working code as it appeared inside the compiler, I don't think they do justice to the underlying engineering or volume of output Greg produced when I had the pleasure of working with him on that project.

It is surprising there is not more about this period in biographical or autobiographical writings, and I would like to address that here.

A little background.

I started working at IBM Research right after getting a master's degree. Although I had worked as a programmer and written code in school and on part-time jobs, this was my first full-time job at any large company. The project I worked on was an optimizing compiler. The code was hundreds of thousands of lines long, written by maybe a dozen people over several years. This kind of scale is still true for many compilers and interpreters. The "optimizing" part of "optimizing compiler" means that clever techniques are applied to make the code run fast.

Needless to say, in any endeavor of this size there is bound to be a lot of chaos. Here is an example that I encountered in watching a bug get fixed.

Before Greg's involvement, after the input source program was parsed, information was mostly managed in expression trees. The structure for an expression node could have a number of fields.

The compiler also consisted of a number of "phases," as found in many compilers. In particular, the phases were:
  1. A parse phase
  2. A code-improvement phase
  3. A register-allocation phase
  4. A final assembly to machine-code phase

One day, a bug turned up because a field that was needed in the final assembly phase got mangled. That field was called "ppb" for "phil's private bits." Phil worked on the parse phase.

Peter, who worked on the code-improvement phase, occasionally needed to store a bit of information for his purposes. He figured that since his phase came after Phil's phase, he could do a little code optimization of his own and reuse "phil's private bits" since, as he put it, Phil wasn't looking. That way he wouldn't have to add another temporary field, which would make the expression-tree structure larger and more unwieldy. So "phil's private bits" became "peter's private bits" when Phil wasn't looking.

Unfortunately Hank, who worked on the final assembly phase, needed the information stored in phil's private bits. But this had been destroyed in the code-improvement phase by Peter.

Greg's phase, affectionately known as Phase III, was the register-allocation phase. When I studied Greg's code in order to take over the phase, I was amazed at how clean, efficient and elegant it was. Not at all like phil's/peter's private bits. At one point, I thought I could come up with an improvement, only to find that, even though it hadn't been mentioned in the papers describing the code, it was already implemented.

The first thing that struck me about Greg's code was that it followed the kind of formalism found in physics equations. Physics often uses short names in a particular domain. For example E, m and c in E = mc^2; this code was like that. There was a short and consistent notation for program variables that represented all registers, another for a single register, an idiom for looping over a register, and so on. This was very different from the rest of the code base.

In contrast to the looseness described in the Phil/Peter bug, there was strict consistency, clarity, succinctness, and focus on purpose which was applicable generally. This was the first time I had encountered code that exuded a sense of art rather than the typical robotic engineering that I was familiar with and pervaded the rest of the code.

As I said, the algorithm behind the register allocation can be found in a couple of papers. However what you won't find printed anywhere is a description of the excellent engineering that made this work. So I'd like to describe this next.

To support working on the register-coloring algorithm, Greg invented a register-transfer language and its display format. It has a couple of features not found in many register-transfer languages, such as a way to indicate that an instruction has more than one output. For example, on the IBM System/370, when a 32-bit multiply is done, the output is stored in a pair of registers when the multiplication completes. So in the register-transfer language, both of those registers were on the left-hand side of an assignment statement. If an instruction, like "add", set a condition bit such as "carry", that bit was indicated as an output as well. When a register went "dead"—the last time a value from that register was used—it had a tick mark after it.

All of this was needed in the register-allocation process, and the concise and precise language Greg invented for this purpose made it very easy to understand what was going on and how registers got allocated or "colored."
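I can't reproduce Greg's actual notation here, but the following invented sketch gives the flavor of a display with multiple outputs on the left-hand side and a tick marking a register's last ("dead") use.

```python
# Invented illustration only -- NOT Greg's actual register-transfer format.
from dataclasses import dataclass, field

@dataclass
class Insn:
    opcode: str
    outputs: list                             # registers and condition bits the instruction sets
    inputs: list                              # registers it reads
    dead: set = field(default_factory=set)    # inputs whose value dies here

    def show(self):
        ins = ", ".join(r + ("'" if r in self.dead else "") for r in self.inputs)
        return f"{', '.join(self.outputs)} := {self.opcode} {ins}"

# A 32-bit multiply writing a register pair, and an add that also sets carry:
print(Insn("MUL", ["r4", "r5"], ["r6", "r7"], dead={"r7"}).show())
print(Insn("ADD", ["r8", "carry"], ["r4", "r5"], dead={"r4", "r5"}).show())
# Output:
#   r4, r5 := MUL r6, r7'
#   r8, carry := ADD r4', r5'
```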

Nowadays, it is not uncommon in compilers to transform the tree representation that comes out of a parser (Phase I) into a register-based representation. But circa 1982 it was not a widespread practice. Greg's work predates the invention of Static Single Assignment, or SSA, which followed shortly afterward at IBM, presumably under Greg's influence. Both promoted the proliferation of register-based intermediate languages.

Greg's work was contemporaneous with the first edition of the Dragon Book which popularized this idea. (The first edition described a register allocation scheme that was suited for stack-architecture CPUs. Later editions of this book switched to describing the register allocation scheme that Greg developed.)

Compare this with the earlier books by David Gries or McKeeman et al. In the 1970s, there were several compilers that worked off of a tree representation. Some interpreters, such as those for Lisp, Perl5, or Korn shell, still work off of expression trees.

When Greg joined the project, the final assembly phase, Phase IV, used the expression trees built in Phase I. There was quite a bit of engineering and coding that Greg had to do to convert the expression trees into his register-transfer language. But he also had to do the work to convert it back again into the expression trees after coloring so that Phase IV could do its work! That idea that you could wall yourself off from the surrounding chaos blew my mind, but by doing so Greg had created a little corner of the world with order in the middle of a big mess.

Eventually, Greg convinced Hank, the person who worked on the phase after Greg's phase, to use his more elegant register-transfer language to drive final assembly. Hank told me that in doing so, his code got a lot better, simpler, shorter, and easier to maintain.

I have used the technique of coming up with an elegant or more appropriate representation and language in which to think and describe things, even if it means writing additional and messy engineering code to do transformations both into and out of that language. Each time this comes up there is great benefit. I am grateful to Greg for adding this technique to my programming arsenal. In algorithms you sometimes find this idea in conjunction with, say, the Fast Fourier Transform. However, it is not something usually mentioned in conjunction with programming.

This kind of elegance and simplification was not an isolated incident, but part of the fiber of Greg's programming. Here is another example.

After this work was completed, Greg embarked on a totally different aspect of the compiler, the binder, sometimes called a linker. This is a piece of code that is needed after the source code is compiled into machine code and just before running or loading the code. It adjusts the code so that it can work in an arbitrary section of computer virtual memory.

To do this, the compiler needed to be modified so that it used only those instructions that were position-independent. Nowadays this is called PIC (Position-Independent Code). In other words, certain instructions or instruction formats had to be avoided. These typically include the "absolute address" forms of "load from memory" and "store to memory" instructions.

The first versions of Greg's binder started out by more or less following what some other clever IBM Researchers, notably Chris Stephenson, did to perform this process on conventional IBM System/370 hardware. In the beginning Greg probably knew no more about this process than any casual programmer would. He finished a preliminary version of the code which worked and reflected his understanding of Chris's code for the 370. It was probably more or less a translation of that code adapted to the new hardware instructions and architecture.

However, after this was completed and working, he had an epiphany—the process of binding together a set of machine-code fragments could be thought of as a data-dependency-driven process: a topological sort pulls everything together, followed by a garbage-collection pass that removes those parts of the code fragments that are bundled in "object decks" but aren't needed for the particular program being bound together.
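A sketch of that idea, with made-up fragment names and a much-simplified model of an "object deck": keep only the fragments reachable from the entry points (the garbage-collection part), then lay them out so that every fragment comes after what it depends on (the topological sort).

```python
# Toy sketch of "binding as reachability plus topological sort".
from graphlib import TopologicalSorter   # standard library in Python 3.9+

deck = {                                  # fragment -> fragments it references
    "main":    ["printf", "compute"],
    "compute": ["sqrt"],
    "sqrt":    [],
    "printf":  [],
    "unused_debug_helper": ["printf"],    # bundled in the deck but never needed
}

def bind(deck, entry_points):
    # "Garbage collection": mark everything reachable from the entry points.
    needed, work = set(), list(entry_points)
    while work:
        frag = work.pop()
        if frag not in needed:
            needed.add(frag)
            work.extend(deck[frag])
    # Topological sort so dependencies are laid out before their users.
    order = TopologicalSorter({f: list(deck[f]) for f in needed})
    return list(order.static_order())

print(bind(deck, ["main"]))   # e.g. ['sqrt', 'printf', 'compute', 'main']
```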

At this point he discarded his thousands of lines of code and rewrote everything from scratch in a couple of days! I had never heard of anyone doing such a thing.

The end result was, of course, a very concise and clean piece of code. Getting this code adapted into IBM's product engineering went pretty quickly. Most of the things that came out of IBM Research needed significant rewriting before they would be considered by the product team, but that was not the case here, due to its usefulness, simplicity, and elegance. It became known as IBM's TOC binder.

One other incident in conjunction with the TOC binder attests to its unity of purpose and simplicity.

One day I happened to be in Greg's office. Usually, everyone worked in a bullpen that had 6-8 terminals in it. (Unix was developed in such an environment, too.) Before the days of IRC and Slack, a bullpen like this had the same effect. If you had a problem, you'd just blurt it out. Someone might hear you and quickly tell you what was going on.

However we also had private offices. Greg's office, like his code, was free from clutter. It consisted of a single terminal on a desk otherwise empty except for a phone. A single framed picture hung on the wall. Nothing more.

I had dropped by there and Dick, who was working on a database for our operating system, came in to describe a problem he had encountered. Just as Sherlock Holmes would do, Greg faced Dick the entire time, listening intently and quietly. After that, there was a pause of about 30 seconds as he reflected. Then after asking a few questions, he offered a theory of what might be wrong. With a couple of clicks on the terminal he brought up a file, just one small part of the rather substantial code. In a couple of clicks more, he found the problem spot. The whole process took less than 5 minutes.

It was amazing to me that he figured out the bug completely without having run the code even once. In order to verify that his theory and fix were correct, the code had to be run, but that came later. This kind of bug fixing is something I have rarely encountered. The fact that something like this can be done at all I attribute to the simpleness, cleanness and overall good organization of the code. He could reason about its behavior without having to run it.

My last example of clean concise code is published.

Greg wrote computer simulations for five well-known physical models:
  • a satellite going around the Earth according to Newton,
  • propagation of an electromagnetic wave according to Maxwell,
  • the same satellite going around the Earth according to Einstein,
  • an electron moving in a one-dimensional potential according to Schrödinger, and
  • sums over all histories according to Feynman.
Each of these simulations fits on a single page of APL2 code. This includes setting up the data and drawing plots of the output. In fact this code was so short and elegant that for a while one of the 30-line simulations replaced the picture that had been hanging in his office.

References



Saturday, October 7, 2017

What the Dragon Book Didn't Tell Us?

Originally posted on Quora, but they've stopped allowing updates. The Quora post has some reader comments.

First I'll say that I like this book on Compilers. It is probably the single book that has changed the landscape of compiling technology the most.

Also, in the first edition there was a nice dragon on the cover, and the tail flowed around the spine over to the back. That green tail on the spine looked nice on a bookshelf of mostly technical books with boring covers and spines. Think of Knuth's Art of Computer Programming or Aho and Ullman's earlier book, The Theory of Parsing, Translation, and Compiling, to get the idea. Later editions didn't look so nice on the shelf.

Andrew Appel's excellent book "Modern Compiler Implementation in Java: Basic Techniques" gets honorable mention for having, I guess, the mid-section of a tiger on the spine, but that is not as easily recognizable on the shelf as a dragon's tail. It also has a nice pun in the epigraph for the chapter on method lookups for object-oriented languages. (The Dragon Book has nothing on how to handle method lookups in OO languages.)

But there is a lot that I have learned about compilers and compiling technology in the 40 years since that book came out that just isn't in there. The last U.S. edition of this book came out around 2007, so I wondered if it has some of the things I've come to learn. At over $100-160 to buy a hardback edition or even a Kindle edition, that's a bit steep for a more-than-a-decade-old book. So I asked a computer scientist in the Maldives for his copy, which can be had for about $15.

The book arrived the other day and I was eager to see what's been updated. None of the topics I had hoped would be covered are in there. So lemme describe what I think should be in there.

In my case, I make lots of mistakes. So I have needed and have used debuggers a lot. And I've written quite a few of them, even for some obscure things like GNU Make, bash, or zsh. (But also for more conventional languages like Python and Go as well.)

More discussion of debuggers, or just of the run-time system and environment support for debuggers, would be nice. The runtime system for dynamic languages versus statically compiled languages is a bit different. And just-in-time compilers extend, but work a bit differently from, either interpreters or compilers. None of this is mentioned in the book.

I’ve needed to beef up the run-time support in many of the dynamic programming languages that I’ve worked on to create or extend a debugger. For example, between bash 2.05b and bash 3, changes were made based on patches that I submitted. Those changes made it possible to write a debugger.

Such information would include properties of stack frames or code objects that assist stack backtraces, execution tracing, execution profiling, and debugging. Or the different ways one could implement breakpoints; the different ways of storing program-location information. Or the various kinds of debugger APIs like remote out-of-process debugging of Java, nodejs/V8, or PHP, versus in-process callback-hook debugging which is found in many dynamic languages.
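To give a feel for the in-process callback-hook style, here is a minimal sketch using Python's real sys.settrace hook. A debugger built on it would keep breakpoint tables, stepping state, and a command loop; this tracer just reports each line event, which is the primitive those debuggers are built on.

```python
# Minimal sketch of an in-process callback hook via sys.settrace.
import sys

def trace(frame, event, arg):
    if event == "line":
        code = frame.f_code
        print(f"{code.co_filename}:{frame.f_lineno} in {code.co_name}()")
    return trace                      # keep tracing inside nested calls

def demo(n):
    total = 0
    for i in range(n):
        total += i
    return total

sys.settrace(trace)
demo(3)
sys.settrace(None)
```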

Or issues around debugging optimized code. One simple idea that I would like to see explored is the optimizer treating transformations as transactions in the database sense, so that it would just log what transformation took place. That might make unravelling the result easier for the hapless programmer who has to unscramble things.
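Everything in the sketch below is invented, but it shows the general shape such a transformation log might take: each record says what was rewritten, into what, and where it came from, so a tool could later replay the log in reverse to explain the optimized code.

```python
# Invented sketch of an optimizer keeping a transaction-style log of its rewrites.
from dataclasses import dataclass

@dataclass
class Transformation:
    name: str        # e.g. "constant-fold", "dead-code-eliminate"
    before: str      # the source or IR span as the optimizer saw it
    after: str       # what it became ("" if deleted)
    location: str    # original file:line, so the mapping survives the rewrite

log = [
    Transformation("constant-fold", "60 * 60 * 24", "86400", "timer.py:12"),
    Transformation("dead-code-eliminate", "if DEBUG: dump()", "", "timer.py:13"),
]

# A debugger stopped in the optimized code could walk the log backwards to
# explain, say, why line 13 no longer exists in the generated code.
for t in reversed(log):
    print(f"{t.location}: {t.name}: {t.before!r} -> {t.after!r}")
```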

In an area similar to debugging and run time, there are object formats and what goes into an object format. Or what goes on in a binder or linker. When I look in the index for object formats like COFF, ELF, EXE, OMF -- or I'll even take the old Unix a.out -- there is nothing. Likewise for the associated symbol-table sections like DWARF or stabs. John R. Levine's book "Linkers and Loaders", while 17 years old, is still very relevant. (And it too has a cool and appropriate cover.)

And there is the corresponding thing to linking and loading that goes on in dynamic languages: importing, requiring or loading and method lookup strategies.

In working with the Go interpreter, I became aware of the "boxed-value" memory layout it uses. And "thunks". Those are not in the index either. The newer book “Engineering a Compiler” by Keith Cooper and Linda Torczon does talk about the memory shape of datatypes and that is a welcome addition to compiler books.
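As a toy illustration of what "boxed" values and thunks mean -- nothing here corresponds to Go's actual layout; it is only the shape of the idea: a type tag plus a payload, where the payload may be a deferred computation forced on demand.

```python
# Toy illustration of boxed values and thunks; not any real interpreter's layout.
from dataclasses import dataclass
from typing import Any

@dataclass
class Box:
    tag: str        # "int", "str", "thunk", ...
    payload: Any    # the value itself, or a zero-argument callable (a thunk)

def force(box):
    """If the box holds a thunk, evaluate it once and cache the result."""
    if box.tag == "thunk":
        box.payload = box.payload()
        box.tag = "int"     # simplification: a real system records the result's type
    return box.payload

plain = Box("int", 42)
lazy = Box("thunk", lambda: sum(range(1_000_000)))
print(force(plain), force(lazy))
```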

Recently I've been interested in using decompilers as a way of communicating the exact location in a program when an error occurs or when inside a debugger. Since Python and Ruby do very little in the way of "compiling", reversing the process is very doable. And in such cases you get freakishly exact results. (Possibly not so surprising in Python, where they try to maintain that there is just one right "Pythonic" way to do things.)

But many decompilers don't handle this as a kind of syntax-directed compilation pipeline (for reasons I need to describe elsewhere), so the implementation of the decompiler code is very hacky. If guidance were given in the same way as it is in the Dragon Book for a compiler pipeline, implementations might be less hacky, have more rigor, and be more robust.

There are some small but important differences between a compiler and a decompiler. Namely:

  • you don't start out with program text but with instructions of some sort,
  • you can assume that what you are given is correct, which means:
  • you don't have to understand everything completely -- for example, you can ignore bookkeeping instructions (popping a stack value to make it match a join with another branch), and
  • you can assume type checking is correct.

I could elaborate on each of the points above, but understanding the differences helps in adapting standard compiling theory.

This is true, for example, with respect to parsing, though what I've found most useful is the Jay Earley parser. It handles ambiguous grammars, which generally would be considered a bad kind of grammar to create in a programming language. But in this context it is the right thing to do. As with translating from one human language to another: there can be several valid ways to do it.

For flexibility and ease of use, then, you want the Earley parser. The Dragon Book mentions Earley only in passing in the last chapter's end notes, and has a dismissive view of its efficiency. Wikipedia says that for most context-free grammars (or almost-context-free grammars, which is the case here), it is linear if you prefer left recursion over right recursion, except in pathological cases. And in my experience this is true. Furthermore, the pathological cases can often be addressed once you know where they are. Due to its extreme flexibility -- like adding rules at run time based on instructions seen, or the ability to do some checks when a grammar reduction occurs -- I wouldn't consider using anything else at the moment.
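As a small, concrete taste, here is a toy ambiguous grammar run through an Earley parser using the third-party lark library (assuming it is installed; this is not a decompiler grammar, just an illustration). With ambiguity="explicit", every valid parse is kept in the tree instead of being an error -- which is exactly what you want when several source forms are equally valid.

```python
# Toy ambiguous grammar with lark's Earley parser (pip install lark).
from lark import Lark

grammar = r"""
    ?start: e
    e: e "+" e      -> add
     | NUMBER       -> num
    %import common.NUMBER
    %import common.WS
    %ignore WS
"""

parser = Lark(grammar, parser="earley", ambiguity="explicit")
tree = parser.parse("1 + 2 + 3")
print(tree.pretty())   # shows both groupings, (1+2)+3 and 1+(2+3), under an _ambig node
```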

Speaking of grammars, missing is the whole topic of how to debug a grammar, and common test practices, e.g. round tripping, bootstrap testing, and running tests on the entire standard library. In the tool I use, I've added grammar coverage profiling to give an understanding of which grammar rules are extraneous, or whether my test suite is adequate.

There is a curse that comes with any book being successful and good at what it does: I suspect that it reduces the incentive for research in compiler topics that broaden and change things slightly. My decompiler example is one such case.

So we are stuck with a book that, while over 950 pages long, is still very abbreviated (or totally lacking) for at least my use cases where compiling technology has come in handy. I wonder to what extent the material in there has stifled research and innovation in other areas, because compilers were considered a done deal. I mentioned the short shrift given to the Earley parser and its pessimistic time complexity. I knew about this, and so it put me off when I encountered the first decompiler I ran into that used it. I had the same feeling when I first learned about register allocation by graph coloring, because the first edition of the Dragon Book talked only about Sethi-Ullman stack-based register allocation. I am pleased to say that that omission has been corrected in later editions.

And speaking of parsers, what about packrat parsing, also known as Parsing Expression Grammars or PEGs? And for that there's something called the omega trick. PEG is not in the index either.

In the 2013 Pearson edition I have, the index, while decently long, omits authors. This is in contrast to the 1986 Addison-Wesley book, "Compilers: Principles, Techniques, and Tools".

Also omitted is an accumulated list of references in which to look up authors -- there is just what is in the end notes. The preface is gone, along with what's changed between the editions, although Amazon's advertising suggests some things. There are no acknowledgements in the newest edition. If you accumulate the end-of-chapter references, there are maybe about half as many as were in the 1986 accumulated bibliography. The end-of-chapter references are so abbreviated that they don't cover the important papers. Only in a dozen or so references is the publication date after 1990, the bulk coming from 1960–1985.

Here is one example of a missing reference, probably among several. In the "Code Generation" section they mention the programming language ML (as in OCaml), which their pedagogical programming language is modeled after. ML is neither listed in the references nor in the index for the pages discussing the characteristics of ML.

So what I'd like to see is a book of similar length covering all the things that I feel are missing in the Dragon Book. This book would cover, or cover in more depth, JIT compilers, decompilers, transpilers, debuggers, object formats, bytecode VMs, packrat grammars, and the dynamic interpreters that are given short shrift in the Dragon Book.

See also Jeffrey Kegler’s Parsing: a timeline -- V3.0


Saturday, February 4, 2017

Name that interval: Brahms 1st Symphony Horn solo

I have an obsession with pitch and timbres which brings me to this ramble.

We start with the Big Ben chimes:

Compare with this horn solo part of the 4th movement of the Brahms 1st symphony:

It is similar.

But the part I want to focus on is the interval between the 8th and 9th notes, which is not in the clock chimes. In the Brahms symphony it is between a B natural and a Bb. By the way, when you listen to the solo flute repeat the theme, you can hear a little pitch glitch between the 5th and 6th notes (a D and E), and you can see what's going on when the fingers go between 011-000 and 110-110. I am always fascinated by such little glitches that give a little character to the sound.

But now I get to the part that fascinated me. I had heard that Brahms's horn parts were not for the modern valve horns you see in the YouTube clip, but for a hand horn. And that instrument doesn't play that interval as perfectly.

In my mind I had it that this interval was slightly less than the equal-temperament note. When I mentioned it to an eminent musicologist, he said: well, you know, in the harmonic series you have it wrong; that note should be slightly greater than equal temperament.

Brahms wrote Clara Schumann in a letter that he got this theme from listening to Alpine Horns when he was on vacation in the Alps. So I went to check out what the alpine horn does. See

and he is right! It is slightly flatter than equal temperament. If you map this to the natural overtone sequence, this is harmonics 12, 11, 9 (G, F#, D). And even more indisputably, listen to this, 7 seconds in:

and even played with a C crook, 4 minutes in:

you will hear it less flat, but not sharp as it was in my memory. So what gives with my bad memory? Is it just more senility?

Possibly. It could be that I was just hearing it as the intervals between F#, E, and C. Or, more likely, translating the slightly-less-than-half-step between C and B on the natural horn when the B is muted with the hand. With hand adjustments, the French horn no longer strictly follows the harmonic series.

Or I was just wrong. I've updated my memory on this, though.
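For the record, the arithmetic behind "slightly flatter than equal temperament" is easy to check. Measuring each harmonic in cents above harmonic 8 (the C two octaves above the fundamental):

```python
# How far the natural-horn harmonics named above sit from equal temperament.
from math import log2

def cents_above_harmonic_8(n):
    return 1200 * log2(n / 8)

# Equal-tempered targets above that C: G = 700 cents, F# = 600, D = 200.
for n, name, equal in [(12, "G", 700), (11, "F#", 600), (9, "D", 200)]:
    c = cents_above_harmonic_8(n)
    print(f"harmonic {n} ({name}): {c:.1f} cents, {c - equal:+.1f} vs equal temperament")
# harmonic 12 (G): 702.0 cents, +2.0 vs equal temperament
# harmonic 11 (F#): 551.3 cents, -48.7 vs equal temperament   <- noticeably flat
# harmonic 9 (D): 203.9 cents, +3.9 vs equal temperament
```

So the 11th harmonic really does sit well below an equal-tempered F#, while the 9th and 12th are only a few cents off.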

Python in 2016 feels like C of 1996

In 2016 I use Python a lot. I've been using it for 10 years or so. But even with that much use it still feels clunky. It reminds me of my experiences with C, which go back even longer.

Let me explain.

There are people who feel that Python has revolutionized the ease with which they work. xkcd had a comic about it (xkcd: Python), and I guess the Pythonistas liked it so much that since Python 2.7 (and PyPy 2.6) it has been in the standard library distribution: you can get to it via the antigravity module that was added after the comic came out.

And yes, many find Python an improvement over, say, earlier languages, much in the same way people felt about C. I can see how they feel that way. It is true that there is such a great body of libraries for the language; like C, Python has about everything.

Except elegance, and orthogonality.

But before diving into that, let me come back to C circa 1996. As I said that's exactly how people felt and I suppose still feel about C. For example, what is the most-used version of Python written in? C, of course. In fact, that version of Python is often called CPython.

Back in 1996, people were ranting about C, I guess rightfully, because it was like the Fortran for systems programming. So I guess in a sense C around 1989 or so was the Fortran of 1959. And I'll go into the Fortran comparison just a little bit as well.

When I first started working at IBM Research I worked with a guy, Dick Goldberg, who worked on the original Fortran project. Back then I felt Fortran was a little clunky too: there was no BNF language definition. (But note that the B in BNF stands for Backus, the guy who was also behind Fortran. Fortran came before BNF, and this aspect was later addressed in general, if not in Fortran.)

And it had these weird rules, like you couldn't index by arbitrary expressions like x[i*j], but you could do some limited forms of that, like x[2*i] -- or is it x[i*2]? And the reason for that was that back then they translated this kind of thing into IBM 360 assembler, which allowed a memory index (a multiplication) along with a memory offset adjustment (an add) as a single instruction, and the Fortran compiler could turn some of these into that form, although it couldn't even re-associate 2*i to i*2 if that was needed.

But the big deal, as Dick explained to me, is that scientific programmers and numerical analysts didn't have to learn assembly language, because Fortran did a good enough job, on average, to obviate that need for the most part.

And that brings us back to C. Fortran is not suitable for writing operating systems or systems code, and so around 1990 many operating systems were written in assembly language. Microsoft's OSes were. IBM's were as well.

But now, for Unix, then Minix, and then Linux, you could write in a higher-level language, the same as you could in Fortran for numerical or scientific software.

And libraries were written, and to this day the bulk of complex libraries are still written in C: for regular expressions, for XML parsing, for database systems like MySQL and Postgres, and for a lot of systems code. Even the guts of NumPy, the numerical package for Python, have a lot of C code at the core, and that is very common. You'll find that other popular programming languages like Ruby or Perl are written in C, and their numerical packages fall back to C as well, as do the bindings that interface to those C systems.

Inelegance

Ok. So now let’s get back to inelegance in Python.

In Python, some functions are top-level functions like len(), hasattr(), type(), getattr(), str(), repr() and so on, and some functions are methods off of an object instance, like the .join() or .append() methods of string and list objects, even though lists and strings are built-in types. So instead of writing uniformly o.join('a').split('x').len().str() or str(len(split('x', join(o)))), you have to go back and forth depending on which arbitrary way it was decided for each function, e.g. str(len('o'.join('a').split('x'))).
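A small runnable illustration of the back-and-forth (the list contents are arbitrary):

```python
# The same little computation makes you alternate between methods and
# top-level functions: join() and split() are methods, len() and str() are not.
veggies = ["sprouts", "kale", "grapes"]
joined = ", ".join(veggies)      # method on the separator string
parts = joined.split(", ")       # method on the result
count = len(parts)               # but length is a top-level function, not parts.len()
label = str(count) + " items"    # and so is str(), not count.str()
print(label)                     # => 3 items
```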

This is like learning, in arithmetic expressions, which operators are infix, postfix, or prefix, and what the precedence is when there aren't parentheses. And yes, Python follows in that tradition, like many programming languages, including C, nowadays as well. This is in contrast to the Polish prefix notation of Lisp, or the postfix of the old HP calculators, Forth, or Adobe's PostScript.

I'm not suggesting languages change it. But I am saying it is more cumbersome. And when you extend that notion, it hurts my little brain.

C was notorious for extending that into "address of" (&) and "indirection of" (*), pre- and post-increment (e.g. ++), bitwise operators (|), logical operators (&&), indexing ([]), and selection (. and ->). Keeping the precedence in mind was so difficult that most people just used parentheses even when not strictly needed.

I do need to back off the comment about top-level versus method functions in Python. But only a little...

"len" for example is a method off of some type like string or list. But it has these ugly double underscores around it, I think that was added to deter people from writing it in the more natural or uniform way like you would in say javascript.

Python does stuff like this: take something that would be usable but muck it up to make it less usable. And there is a community of Python Nazis out there who will scream at you if you use x.__len__(), however ugly it is, instead of len(x).

These are the same kinds of people who insist that you should write:

import os

import sys

rather than the shorter:

import os, sys

When you ask why, they say: just because. Well, actually they don't say it that way; they refer you to an acronym and number like PEP 8, which is taken as law. Everyone has to do it that way. Fascism.

The fascists say that doing things this way makes it easier to understand if people do things the same way. Really? Mankind has long been able to deal with variation in expression without any effort whatsoever. I can write "It is one thing" or "It is a thing" and most people don't obsess over whether it has to be "one" or "a". Actually, in English there is a certain style difference: if I want to emphasize the singleness I might choose "one" over "a".

And so I'd argue that kind of nice subtlety is missed by the Python fascists. For my part, I'd prefer to stipulate that every program imports "os" and "sys", be done with that, and get on with the rest of my life and the more important task of programming. The next best thing is just to string that preamble crap along in one line, the way Jewish blessings always start "Baruch atoy adenoy eluhanu" ("Blessed is God, King of the universe").

Oh, and by the way, the "import" statements have to be at the top (except that sometimes there are situations where they won't work if they are at the top). That seems so 1980s-Pascal-like to me, which required all constants, types, and variables to be put at the top of the program. But in Pascal it was not the same kind of fascism; it was just that the Pascal grammar was limited in that way.

And while on the topic of Pascal, let me mention another aspect related to it. Originally, and to the end of Python 2, "print" was a reserved word, very similar to Pascal's "writeln". And because it was a reserved word, you didn't put parentheses around the arguments to print. I'm sure this was thought clever in the same way it was clever in Pascal. When Python 3 came about, that changed so that print is now a more regular and uniform function call. Oh, but by then a decade-or-so-old code base had to change.
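For anyone who missed that transition, the change looks like this:

```python
# Python 2: print was a statement (a keyword), much like Pascal's writeln:
#     print "hello", 42
# Python 3: print is an ordinary function, so the parentheses are required:
print("hello", 42)
```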

Now you might think, "who knew"? Well, although Pascal had that feature (and most other languages including C just didn't), by the time the successor to Pascal, Modula, was developed the mistake was corrected. And here's the point: this was all done before or contemporaneous with Python's start.

It's one thing to make a mistake and correct it. But another thing to make the same mistake as was just fixed in another language, and then spend several years before you fix it the same way everyone else does. Pascal and Modula were developed in Europe, same as Python, so it's really no excuse not to have known about it.

Stubbornness and Doing things differently

So why is it that such things take so long to address? The Python language has been stubborn and unrelenting in its wrongness. I think Guido now agrees that the indentation thing was a mistake. However, there was a long period of time when its superiority was asserted. To me, the mistake is not so much using indentation to begin a block, but not adding even an optional "end" terminator.

But because of Python's indentation, Python programs reproduced in print are subject to error, since there isn't the needed redundancy for a check. And more seriously, you can't embed Python in another templating language, because the templating can mess up the fragile indenting. So if you are doing, say, web development in Django, you need to learn another language, which of course doesn't have that indentation rule. Ruby programmers don't suffer that limitation, and their templating systems use Ruby.

It also feels to me like stubbornness and arrogance that Python has long resisted using common idioms of other languages. (By the way, I sense that Dennis Ritchie and Guido van Rossum are a little similar here.) So the backtick notation of shell, Perl, and Ruby is not in Python. Nor for a long time was there simple variable interpolation. (In Ruby that's done via # rather than $, but the change was for technical reasons, not arrogance or a desire to do things differently.)

In Python, how to do subprocesses has changed over the years, and still isn't as simple as a backtick. But I suppose that has some cover because there are security considerations. Variable interpolation inside a string was also long resisted, although there was something like the C format specifiers inside a string with the '%'. But once Python added its own flavor of variable interpolation using the .format() method, C-style format specifiers were eschewed.
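For comparison, here is roughly how those two things look in current Python; the command and strings below are just placeholders.

```python
import subprocess

# What shell, Perl, and Ruby spell with backticks, e.g. `uname -a`:
output = subprocess.run(["uname", "-a"], capture_output=True, text=True).stdout

# Three generations of string formatting/interpolation:
name = "world"
oldest = "hello %s" % name          # C-style format specifier
middle = "hello {}".format(name)    # the .format() method
newest = f"hello {name}"            # f-strings, Python 3.6 and later
print(output, oldest, middle, newest)
```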

This kind of being the last on the block to give in to something that is common among other languages, and then adding it in a different way, seems to be the Python way. The ternary if/else operator that C introduced and other languages adopted is another example.
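For instance:

```python
x = -5
# C spells it:   sign = (x >= 0) ? 1 : -1;
# Python puts the condition in the middle:
sign = 1 if x >= 0 else -1
print(sign)   # => -1
```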

For those that want to work across multiple languages this kind of awkward originality just adds more warts.

Like Python, when C first came out it had a number of unfamiliar things, like all those post- and pre-increment operators, I guess the format-specifier thing, variable arguments, and the C preprocessor with its macro expansion. (Many of these weren't strictly unique: you could find examples of these things elsewhere, such as in assemblers; but compared to, say, the high-level ALGOL-like languages and things like COBOL or Fortran, this was different.) In C's case, though, they largely stuck to this and didn't change course. When developing Go, the mistakes of C were just fixed, and so I feel that Go is a very welcome upgrade to C.

Misc other things: comprehensions (not needed in better-designed languages), language drift, import statements.