Saturday, February 4, 2017

Python in 2016 feels like C of 1996

In 2016 I use Python a lot. I've been using it over 10 years or so. But even with that much use it still feels clunky. It reminds me of my experiences with C which go back even longer.

Let me explain.

There are people who feel that Python has revolutionized the ease with which they work. xkcd had an comic about it: xkcd: Python,  and I guess the pythonistas liked it so much that since Python 2.7 (and pypy 2.6) it has been in the standard library distribution that you can get to it via the antigravity module that was added since the comic came out.

And yes, many find Python an improvement over say earlier languages, much in the same way was felt about C. I can see how they feel that way. It is true that there is such a great body of libraries in the language, like C. Python has about everything.

Except elegance, and orthogonality.

But before diving into that, let me come back to C circa 1996. As I said that's exactly how people felt and I suppose still feel about C. For example, what is the most-used version of Python written in? C, of course. In fact, that version of Python is often called CPython.

Back in 1996, people were ranting about C, I guess rightfully, was because it was like the Fortran for systems programming. So I guess in a sense C around 1989 or so was the Fortran of 1959. And I'll go into the Fortran comparison just a little bit as well.

When I first started working at IBM research I worked with a guy, Dick Goldberg, who worked on the original Fortran project. Back then I felt Fortran was a little clunky too: there was no BNF language definition (but note that the B in BNF stands Backus, the guy that also was behind Fortran. Fortran came before this and this aspect was addressed in general, if not in Fortran).

And it had these weird rules like you couldn't index arbitrary expressions like x[i*j] but you could do some limited forms of that like x[2*i] or is it x[i*2]? And the reason for that was that back then they translated this kind of thing into IBM 360 assembler which allowed a memory index (a multiplication) along with a memory offset adjustment (an add) as a single instruction and the Fortran compiler could turn some of these into that form although it couldn't even re-associate 2*i to i*2 if that was needed.

But the big deal, as Dick explained to me, is that scientific programmers and numerical analysts didn't have to learn assembly language, because Fortran did a good enough job, on average, to obviate that need for the most part.

And then that brings us back to C. Fortran is not suitable for writing operating systems, or systems code and so around 1990 many Operating Systems were written in assembly language. Microsoft's OS's were. IBM's were as well.

But now for Unix, then minix and then Linux, you could write in the higher level language, the same as you could in Fortran for numerical or scientific software.

And libraries were written and to this day the bulk of complex libraries are still written in C: for regular expressions, for XML parsing, Database systems like mysql and postgres, and a lot of systems code. Even the guts of numpy, the numerical package for Python, has a lot of C code at its core and that is very common. You'll find other popular programming languages like Ruby, or Perl are written in C and their numerical packages fall back to C as well. Also the bindings that interface to those C systems.

Inelegance

Ok. So now let’s get back to inelegance in Python.

In Python, some functions are top-level functions like len()hasattr()type()getattr() str()repr() and so on, and some functions are methods off of an object instance, like the .join() or .append() methods of string and list objects, even though lists and strings are built-in types. So instead of writing in a uniform o.join('a').split('x').len().str() or str(len(split('x', join(o)))) you have to go back and forth depending on which arbitrary way it was decided for the function, e.g. str(len(‘o’.join(‘a’).split('x'))).

This is like learning in arithmetic expression which operators are infix, postfix, prefix and what the precedence is when there aren't parenthesis. And yes, Python follows in that tradition like many programming languages including C nowadays as well. This is in contrast to Polish prefix of Lisp, or the old HP calculators, Forth, or Adobe's Postfix.

I'm not suggesting languages change it. But I am saying it is more cumbersome. And when you extend that notion, it hurts my little brain.

C was notorious for extending that into "address of" (&) and "indirection of" (*), pre and post increment (e.g. ++) bitwise operators (|), logical operators (&&) and indexing ( [] ), and selection/indexing ( . ->). Keeping in mind the precedence was so difficult that most people just used parenthesis even when not strictly needed.

I do need to back off the comment about top-level versus method functions in Python. But only a little...

"len" for example is a method off of some type like string or list. But it has these ugly double underscores around it, I think that was added to deter people from writing it in the more natural or uniform way like you would in say javascript.

Python does stuff like this: take something that would be usable but muck it up more to make it less usable. And there is a community of Python Nazis out there who will scream at you if you use x.__len__() however ugly it is instead of len(x).

These are the same kinds of people who insist that you should write:

import os

import sys

rather than the shorter:

import os, sys

When you ask why? They say just because. Well, actually they don't say it that way, they refer you to an acronym and number like PEP8, which is taken like legal law. Everyone has to do it that way. Fascism.

The Fascists say that doing things this way makes it easier to understand if people do things the same way. Really? Mankind has long been able to deal with variation in expression without any effort whatsoever. I can write "It is one thing" or "It is a thing" and most people don't obsess over whether it has to be "one" or "a". Actually, in English there is a certain style difference if I want to emphasise the singleness I might choose "one" over "a".

And so I'd argue that kind of nice subtlety is missing by the Python fascists. For my part, I'd prefer stipulating it would be okay to stipulate that every program imports "os" and "sys" and be done with that and get on with the rest of my life and the more important task of programming. The next best thing is just to string along that preamble crap in one line the way Jewish blessings always start "Baruch atoy adenoy eluhanu" (Blessed is God, King of the universe").

Oh, and by the way the "import" statements have to be at the top, (except that sometimes there are situations where they won’t work if they are at the top). That seems so 1980's Pascal like to me which required all constants, types and variables to be put at the top of the program. But in Pascal it was not for the same kind of fascism, but just that the Pascal grammar was limited in that way.

And while on the topic of Pascal, let me mention another aspect related to Pascal. Originally and to the end of Python 2, "print" was a reserved word. Very similar to Pascal's "println". And because it was a reserved word, you didn't put parenthesis around the arguments to print. I'm sure this was thought clever in the same way it was clever in Pascal. When Python 3 came about that changed so that print is now the more regular and uniform function call. Oh, but by then a decade-or-so-old code base then had to change.

Now you might think, "who knew"? Well, although Pascal had that feature (and most other languages including C just didn't), by the time the successor to Pascal, Modula, was developed the mistake was corrected. And here's the point: this was all done before or contemporaneous with Python's start.

It's one thing to make a mistake and correct it. But another thing to make the same mistake as was just fixed in another language, and then spend several years before you fix it the same way everyone else does. Pascal and Modula were developed in Europe, same as Python, so it's really no excuse not to have known about it.

Stubbornness and Doing things differently

So why is it that such things take so long to address? The Python language has been stubborn and unrelenting in its wrongness. I think Guido now agrees that indentation thing was a mistake. However there was a long period of time when its superiority was asserted. To me, the mistake is not so much about using indentation for a block begin, but the lack of adding even an optional "end" terminator.

But because of Python's indentation, printing python programs in print is subject to error since there isn’t the needed redundancy check. And more seriously, you can't embed that in another a templating language because the templating can mess up the fragile indenting. So if you are doing say web development in Django, you need to learn another language which of course doesn't have that indentation rule. Ruby programmers don't suffer that limitation and their templating systems use Ruby.

It also feels to me like stubbornness and arrogance that Python has long resisted using common idioms of other languages. (By the way, I sense that Dennis Ritchie and Guido Rossum are a little similar here). So the backtick notation of shell, Perl, and Ruby is not in Python. Nor for a long time was there simple variable interpolation. (In Ruby that's done via # rather than $ but the change was for technical reasons, not arrogance or a desire to do things differently).

In Python, how to do subprocesses has changed over the years, and still isn't as simple as backtick. But I suppose that has some cover because there are security considerations. Variable interpolation inside a string was also long resisted, although there was something like the C format specifiers inside a string with the '%'. But once Python added its own flavor of variable interpolation using the .format() method C style variable format specifiers are eschewed.

This kind of being the last on the block to give in to something that is common among other languages and then add it in a different way seems to be the Python way. A ternary if/else operator that C introduced and adopted in other languages is another example.

For those that want to work across multiple languages this kind of awkward originality just adds more warts.

Like Python, when C first came out, it had a number of unfamiliar things like all those post and pre operators, I guess the format specifier thing, variable arguments, the C preprocessor with its macro expansion. (Many of these weren’t strictly unique: you could find examples of these things elsewhere, such as in assemblers; but compared to say the high-level ALGOL-like languages and things like Cobol or Fortran, this was different). In C's case though they largely stuck to this and didn't change course. When developing Go, the mistakes of C were just fixed and so there I feel that Go is a very welcome upgrade to C.

Misc other things: comprehensions (not needed in better-designed languages), language drift, import statements.

No comments:

Post a Comment