Friday, April 3, 2015

Sysadmin Interview Questions (circa 2000)

At one of the not-so-swift places I worked that had a high turnover rate, we put together the following set of interview questions. The most important, and basically the only operative, interview question was the last one in the list.


In order to have a successful interview at our-company, you must be able to answer the following questions correctly:

1. What kinds of files are generally kept in /tmp?
  a) files that could be erased
  b) important records and sensitive market data that need to be saved
  c) binaries
  d) comma files (bonus points for describing what comma files are)
  e) personal mail you'd rather your manager not read

2. What are files ending in the extension .o used for?

3. Scenario:
  Let's say there is a program which hogs resources and occasionally
  doesn't complete. What should be done?

  a) understand why it doesn't complete and possibly fix
  b) run it from crontab every 15 minutes
  c) reboot

4. Scenario:
    Let's say there is a mail reader that corrupts the mailbox when /tmp
    fills up. How would you deal with this situation?

  a) fix the mail reader
  b) use another mail reader
  c) use existing SNMP monitors to check for filesystems which are more than
     95% full
  d) increase /tmp
  e) reboot
  f) remember what you did to fix the corrupted file the last time you had this
     problem so you could quickly do it again

5. What are the important options of the AIX shutdown command?

6. What is the best program editor ever created?
  a) more
  b) cat
  c) vi

7. Essay: discuss the advantages of rebooting hundreds of servers at one time.
  [Hint: Make sure you include ease of remembering and quickness of pain]

8. Are you a light sleeper?

Answers:

1. all except a. See also question 4.
A comma file is the predecessor to the "dot" file, e.g. ,profile ,login ,rhosts. Comma comes before dot in the ASCII collating sequence. As the predecessor, it lacks the feature that dot files have of not being expanded by shell metacharacters.

It is used by applications programmers to make Korn shell code look more sophisticated. For example:

  cat </dev/null>,verbose
2. This is a silly question because at our company, the concept of a filetype or class of file doesn't exist. However, some Systems Administrators use .o as the extension for backup or "old" copies of a file. It is handy to use because some programs, such as the C compiler and some Makefiles, will remove them for you automatically.

3. b. And also every 5 minutes

4. d. e & f are only for Senior System Administrators

5. -F 0. [fast, wait 0 seconds] Some applicants will proffer -r (reboot), but that is optional.

6. A trick question: SA's don't program.

7. You can test not only the hardware but also the network, as all the
   servers try to make connections with the single nameserver. It might
   simulate what would happen during a nuclear attack.

8. SA's never sleep!


Executive Email server

The first part of a series on Sysadmin horror stories.

I worked at a large financial organization where the CEO didn't "do" email.

A clever first-line network manager had the idea to get the CEO into the late 20th century. (Yes, it was back then, but still it was at a time when the CEO had no excuse not to use email.)

The CEO had a secretary who handled most of his paper correspondence and meetings and such.

Since there were no official resources for this, the manager repurposed an old Solaris SPARC server dedicated just to this guy's email. Given the technology at the time — this was before cloud email providers and virtual machines — it was a perfectly reasonable thing to do. In fact, I thought it clever.

To isolate it from everything else, he gave the server its own DNS domain name inside the
company. That too was perfectly reasonable.

The domain name had company-exec in it to try to entice people to use it for sending email.

All of this was working fine initially. The CEO didn't get that much email as he didn't use it. Sometimes, his secretary would.

Over time, though, as some of the managers who worked underneath him found out about this email address, they requested boutique addresses on that domain as well. And those managers, while a little more email savvy, didn't really get that much email either.

At that point, I think the network manager tried to get a bigger disk for the little server, but since the CEO was a tightwad about such things and didn't care about email, he wouldn't approve it.

Now here's where things started to go awry. We now get to the next level of managers, who saw that their managers and the CEO all had this exclusive email domain. They now wanted an email account in that domain and on that server.

People at that level did know how to use email, sort of, and got a lot of it. They were the kind of people who insisted on being on automated email lists; the kind of people for whom 90% of their email was a forward of (detailed, automated, or already-forwarded) email that had been sent to them. They would respond with the entire text, adding the choice commentary they got paid so much for:
underling -
plz address 
Yes, these kinds of managers were essentially overpaid switchboard operators. And were it not for the fact that most of these people were male, I wouldn't be surprised if they were in fact laid-off switchboard operators of an earlier era.

Or rather switchboard multiplexers, because the email would be cc'd to the 10 other people on the email list, some of whom were also on the executive email server.

It was becoming clear that the little server just didn't have the disk space to handle this. Again there were attempts to get a beefier server — to no avail.

The shit hit the fan when this got coupled with another quirk.

In general, application software and financial trading software wasn't written all that well. The company relied on the steely nerves and quick responses of systems administrators to correct for application errors. (More on that in another blog).

One of the flaky real-time trading systems had a custom program written for it that checked its health every second (or less) because, well, this was an important real-time program. And what it did when the program stopped working was spam the email system with alert emails.

So shortly after that next level of managers was added to the "executive" email system, there was a problem with the crappy real-time trading application; the server was flooded with gigabytes of email alerts to all of the executives, along with a smattering of their cc's to other people (some also on the executive email server) asking them to fix it. The end result was that during this crisis, all of them were locked out of sending and receiving email.

We took advantage of the outage to swap that server out with a beefier box to which we could add a larger disk later. As for approving the larger disk, the secretary slipped the invoice for it, about $200, in between two of his travel vouchers, each of which was an order of magnitude larger.

Aside from the other features of this story, an aspect that haunts me is how a series of reasonable local decisions, each individually logical or even clever, led to a total disaster. I sometimes think writing software is like that too.

The "Test Driven Development" philosophy addresses this by embracing "refactoring" code. That is, as part of the process is a step where you reassess and rework before going further.

Sunday, March 15, 2015

Runtime Support for Breakpoints, Step Over, and Step Out

Introduction


One of the things my enhanced runtime for Ruby has is support for breakpoints, step over and step out.  I'd like to explain why.

Motivation — Speed

Most debuggers in dynamic languages hook into a callback mechanism provided by the language. The callback is triggered by an event, such as just before running a new statement. Thus, the debugger is written in the same language it is debugging. There are callback hooks to support debugging in Ruby, Python, Perl, and POSIX shell.

Without runtime support for breakpoints, step out, or step over, debuggers suffer from the "are we there yet?" syndrome. At each event the debugger code has to check to see if

  1.  we have reached a breakpoint, 
  2.  we are single stepping, or 
  3.  we have  just stepped out of a function. 

In those cases where we want to "step out", "step over" or "step to the next breakpoint", too often the result is "no, don't stop; keep going".
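
To make this concrete, here is a minimal sketch, in plain Ruby, of what such a callback-based check looks like. It uses the real Kernel#set_trace_func API, but the breakpoint list, stepping flag, and depth bookkeeping are illustrative names of mine rather than code from any particular debugger.

  # Illustrative only: the hook fires on *every* event, and most of the
  # time all three checks answer "no, keep going".
  breakpoints    = [["/tmp/foo.rb", 10]]   # [file, line] pairs
  single_step    = false
  step_out_depth = nil                     # stack depth recorded when "finish" was issued

  set_trace_func proc { |event, file, line, _id, _binding, _klass|
    stop = breakpoints.include?([file, line])          # 1. reached a breakpoint?
    stop ||= single_step                               # 2. single stepping?
    stop ||= (step_out_depth &&                        # 3. just stepped out of a function?
              event == 'return' &&
              caller_locations.size <= step_out_depth)
    puts "stop at #{file}:#{line} (#{event})" if stop
  }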

But dynamic languages are 10-60 times slower than C, and we often want to debug large programs. So this means that running step out or step over under a debugger can be an order of magnitude slower than normal execution.

Existing Art

This is one reason why Kent Sibilev could bill his ruby-debug as the "fast" debugger: he wrote key parts, such as breakpoint handling, in a C extension, whereas Matz's debug.rb is written entirely in Ruby.

I have seen the "are we there yet" slowdown also in Python. Nir Aides promoted his winpdb as a fast Python debugger. As an artifact of the way it is written, the debugger allows you to step at fewer places; consequently it does fewer checks.

Interestingly, the venerable Perl interpreter supports debuggers better in this respect. It has support for breakpoints inside the interpreter.

ruby-debug does its checks for breakpoints in a C extension. Neither it nor Perl has support for "step over" or "step out" in the runtime. I wondered if it was possible to do better, so I embarked on changing the runtime to test whether it was; I think the answer is yes, with a caveat described below. So far I am pleased with the results.

The caveat is that these checks incur a small amount of overhead even when debugging is not enabled. But Mark Moseley did some initial benchmarking and found the overhead negligible. I'll describe how I drive down the overhead later; but since this is a completely separate interpreter, if what you need is that last bit of speed, don't use it. Switch to it only when it is needed. Down the line it might be interesting to figure out how to seamlessly switch interpreters mid-execution.

Let me now describe how breakpoints, step over, and step out are implemented.

Breakpoint Support

In the instruction sequences where we want breakpoints, they are implemented using what can be thought of as a bit vector, although, for speed and simplicity at the expense of space, it is really a byte vector. The vector is allocated only in those instruction sequences that need it.

Even a bit vector for each instruction is a little more space than is strictly needed because not all indices in an instruction sequence are on VM instruction boundaries and I currently only allow setting breakpoints on statement boundaries, specifically VM "TRACE" instructions. If we allow instruction sequences to be modified, we could set this breakpoint bit inside the trace instruction, but we have to be aware of possible run-time implications. There are security, performance, and architecture advantages in having read-only instruction sequences, although the current Ruby interpreter does not make use of these.

So, the additional test for breakpoint stopping occurs at the same place we test for calling trace hooks. And to further speed things up, we first perform a zero-comparison test to see whether there are any breakpoints set in the instruction sequence at all before indexing into the byte vector to see whether the specific instruction has a breakpoint. Since at any given time there is only a small number of breakpoints, often none, we usually don't have to index into the byte vector; we just test for zero. The interpreter's C code uses the UNLIKELY() macro to indicate to a compiler like gcc that this test is likely to be false.
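
The actual check lives in the interpreter's C code, but the shape of it can be modeled in a few lines of Ruby. This is a sketch of the idea only; the class and names below are mine, assuming one byte vector per instruction sequence plus a count used for the fast zero test.

  # Toy Ruby model of the per-instruction-sequence breakpoint byte vector;
  # the real thing is C data attached to the iseq.
  class BreakpointVector
    def initialize(size)
      @bytes = Array.new(size, 0)   # one byte per iseq offset
      @count = 0                    # nonzero only if any breakpoint is set
    end

    def set(offset)
      @count += 1 if @bytes[offset].zero?
      @bytes[offset] = 1
    end

    def unset(offset)
      @count -= 1 unless @bytes[offset].zero?
      @bytes[offset] = 0
    end

    # Done at the same place the trace-hook test is done.
    def breakpoint_at?(offset)
      return false if @count.zero?  # the cheap, usually-taken fast path
      @bytes[offset] != 0
    end
  end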

When no trace hooks are in effect the tests are, in fact, the same tests that Ruby normally uses.

Let's compare what's done here with what the "fast" ruby-debug has to do. Our code is part of the runtime, so there is no callback, not even a callback to C code. That alone saves a lot of time. ruby-debug then has to get the source-code file name and line number, and use those to compare against a list of breakpoints. When there are no breakpoints this isn't so bad, but when there are some, it can't do a quick test to see whether we are in an instruction sequence containing breakpoints at all. We, by contrast, never need to access the source-code filename or line number. ruby-debug for 1.9 and greater could use some of these ideas.

Step Over/Step Out Support

Tests to support step over/step out inside Ruby, while straightforward, are a bit time consuming. To perform step out, we need to check:

  • Is the current nesting level less than or equal to the starting nesting level? (Exception handling can change the nesting level by more than one.)
  • Are we in the same thread?

And of course this means accessing the current thread id, and computing the nesting level (which means counting the number of entries in the call stack). As before, when stepping over large chunks of code, there can be a lot of "no, keep going" results which take time before getting to a "yes, stop".
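
In pure Ruby, those checks boil down to something like the following. The method and its bookkeeping are my own illustration; the point is that Thread.current and a count of caller_locations have to be computed on every single event.

  # What a pure-Ruby hook has to compute on every event for "step out".
  def stop_for_step_out?(start_depth, start_thread, event)
    return false unless Thread.current.equal?(start_thread)  # same thread?
    return false unless event == 'return'
    # Counting the call stack on every event is the expensive part;
    # "<=" rather than "==" because exceptions can pop several frames at once.
    caller_locations.size <= start_depth
  end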

So with runtime support, how can we handle this? The implementation is pretty simple. We record in the stack frame whether we should step-stop in this frame or not. An additional bit indicates whether we should step-stop in frames created from the frame. So there is a small bit-test done on frame creation to set a trace bit or not. But the overhead is measured in terms of a couple of instructions rather than hundreds of instructions, as is the case in Ruby.
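
A toy model of that runtime approach might look like the following. The flag names are made up and the real data lives in MRI's C frame structures, but it shows why the per-frame cost is a couple of bit tests rather than a stack walk.

  # Toy model only: each frame carries two flags, set once at creation time.
  Frame = Struct.new(:step_stop_here, :step_stop_in_children)

  # One bit test on frame creation decides whether events in the new
  # frame should stop (e.g. for "step over" the parent's flag is false).
  def push_frame(parent)
    Frame.new(parent.step_stop_in_children, parent.step_stop_in_children)
  end

  # At a trace event, a single bit test; no thread id, no stack counting.
  def step_stop?(frame)
    frame.step_stop_here
  end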

And as before we can use the UNLIKELY() macro to indicate that most of the time we won't be in "step over" or "step out".

Sunday, February 15, 2015

Ruby Frame objects

Introduction 

I may write more about this later, but a brief intro may be useful here.

As mentioned in A Personal History of Ruby Debuggers, Part 1: Ruby 1.8, when I started there weren't many Ruby debuggers and they weren't of the caliber that you could find for other programming languages at that time, notably Perl or Python. As of now, there are a number of Ruby debuggers and IDEs that hook into Ruby for debugging.

But no matter how many there are, there is a limit on their quality and power that's dictated by the runtime system.

In an exchange with David Rodriguez, I noted a feature request he had made for the ability to get a list of local variables for a call-stack frame.

This has since been added in Ruby 2.1. More detail below.

David indicated he uses this code:


  bind = frame_binding(frame_no)
  bind.eval('local_variables.inject({}){|h, v| h[v] = eval(v.to_s); h}')

I have bad memories of writing lots of code of this ilk in ruby-debug. The problem is that a lot of it is not only slow, but also error prone. And in some cases you can only approximate the information you want.

Specifically, the Ruby method local_variables walks up the call stack and gives local variables defined in this or in enclosing scopes. Suppose you just want those that are defined in the current scope?

As a result of an overexposure in ruby-debug to this kind of Rube Goldberg, convoluted code, I decided that I'd just fix things at the source. Internally, MRI keeps a table of the local variables for the current scope. Therefore, getting that information from the Ruby runtime is really straightforward; it doesn't need to run a special eval() with a binding.
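
For comparison, the facility that landed in Ruby 2.1 (mentioned above) lets you do this through the Binding API without the nested eval(). Here bind is assumed to be the Binding for the frame of interest, obtained as in the snippet earlier.

  # Ruby 2.1+: Binding#local_variables and Binding#local_variable_get
  locals = bind.local_variables.each_with_object({}) do |name, vars|
    vars[name] = bind.local_variable_get(name)
  end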

Call Frame Object 

And that leads me to the call-stack frame object that I've added in my patches.

Adding this one object solves a lot of problems, not just the one cited above.

The frame object contains all of the information that caller_locations() has, such as source location. But it contains dynamic information as well. Specifically, it contains a binding method that can be used in calling eval(). It contains information about the current program counter and stack pointer. It gives access to the instruction sequence if the code is Ruby code. And it has a method to get the list of local variables and retrieve their values, without having to run eval(), since the information is stored directly in the local table.

Location, Location, Location

In my Critique of MRI Ruby 2.0's TracePoint API, I mentioned that I don't use caller_locations() even though it is a very welcome addition to Ruby. Part of the reason, as you can now see, is that I use the Frame object. And in my view there are some things in the way caller_locations() reports locations that I think are misfeatures, if not out-and-out bugs.

So let me next describe the differences.

First, I have something called a source container. Most of the time, expressions and statements come pretty much directly from a file in a filesystem, but that doesn't have to be the case. At runtime you can evaluate a string; in that case the container is the string. Sure, the eval statement might be written in a file. But on the other hand, it might not; it's possible for the statement running the eval to have gotten run automatically too. And how do you indicate that you are at the second or third line of a string? At one time, Ruby did this by adding the line-number position in the string onto the line-number position in the file that contained the eval()!

Also in Ruby there is the possibility that you are running code from a dynamic library. While that is technically a file too, it isn't a text file.

It is also not inconceivable that the code you run is part of a file archive or packaging system. So while the file might be that archive file, you probably also want to know the member name in the
archive.

Source container is there to help out with such things. It is a tuple of container type and value(s). So for an ordinary file the first entry is "file", for an eval string the first entry is "string", and for a dynamic library the first entry is "binary".
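
Purely as illustration, a container might look like one of the following tuples; the values are made up and the exact representation in the patches may differ.

  # Illustrative [type, value] container tuples
  ["file",   "/home/rocky/example.rb"]        # an ordinary source file
  ["string", "1 + 2"]                         # code passed to eval()
  ["binary", "/usr/lib/ruby/libsomething.so"] # a dynamic library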

In addition to the container (traditionally called "filename"), I have source_location(). In reality, there is a program counter of some sort, either a Virtual Machine PC or the PC associated with the C code if you are stopped in a Ruby C function. That's the true location, even if what you want to know is what is the closest reference in the source code. Associating a single source code line with a PC is a tricky problem. In the presence of compiler optimization such as common subexpression elimination it is possible that a VM PC might refer to several locations in the source code. Therefore the source location is an Array of line numbers. In practice for Ruby, the array has length 1.

What is the source location if you are stopped inside a C function? Well, it should be the PC of where you are stopped, a machine address. Right now, I use the PC for the function entry even if you are stopped somewhere else. To me this is still a little more honest than what Ruby does which is report the last Ruby text source-code location.

In the Ruby debugger, for C functions I'll also include the closest Ruby source-code location, but note it with "via". For example you might see:

c_call (address 0x7f195fe7dc10 via /tmp/foo.rb:1)

This machine address you could then use with, say, gdb. If you wanted to know which C function is associated with that address, you could run the gdb command info symbol.
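
For example, with gdb attached to the process, using the address shown above:

   (gdb) info symbol 0x7f195fe7dc10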

For eval strings, I show a bit of the string. And the debugger takes that string and stores it in a temporary file. For example:

line (eval "1+2":1 remapped /tmp/eval-8d79933-20150215-5399-qnfr5p.rb:1 @2)

Above `@2' is the VM PC.

In finishing up this section on location, one concept I hope you come away with is the idea of transparency: if you are stopped in a machine address of a C function, I report that and don't try to sugarcoat it by giving you the last known Ruby source text location.

The Downsides of a Frame Object

Security:

The frame object I provide exposes some of the MRI Ruby runtime system. Some compiler writers are a bit scared to allow programmers this low-level access. Yet, you have the same low-level access if you use gdb, and what's given in the Ruby debugger can be more helpful since it is more high-level. I'd rather debug in a debugger that understands Ruby than one where I am forced to work at the level of C and assembly language.

I trust and hope that programmers use the information responsibly. For the most part, folks will just be crashing their own program.

For those situations where security is paramount, just don't have this interpreter around. That's partly why it is a completely separate Ruby interpreter.

Invalid Frames:

The second problem with the frame object is that it may no longer be valid: the object is valid only for as long as the frame it refers to is on the call stack. It is possible to save a frame object in some sort of variable that persists beyond the lifetime of the frame it refers to. The code I wrote does some basic checks to see whether a frame is still valid. For example, the address of the internal frame pointer has to be inside the range of currently valid frame pointers. And even if it is, we check that static, unchanging frame data is the same as it was when the frame was created.

Saturday, February 14, 2015

A Critique of Ruby 2.1's TracePoint API

In updating Ruby 1.9.1 patches to work on Ruby 2.1.5, I have gotten familiar with the TracePoint API, and more generally, with the changes that facilitate run-time introspection, error reporting, and debugging.

For example, caller_locations() is like the old caller() routine but gives structured data. This is a welcome addition that I recall asking for in 2010. However, in the debugger I've been recently working on, I don't use it. More on this in another blog entry.

Here, I will mostly focus on the TracePoint API.

Side comment: the name TracePoint seems really weird to me. I don't get the point of "Point" in the name. Perhaps this is another word for "Object" or "thingy"?

But whatever its name, TracePoint is a big improvement over Kernel#set_trace_func(). And here's why: set_trace_func() has global effect. Calling that function removes any previous callback.

As far back as I can remember in Ruby 1.9, the thread tracing structures coded in C were a linked list that allowed several hooks to be serviced, but Ruby 1.9 didn't have any API to get access to that list or to the event bitmask. Therefore, in the patches I wrote for 1.9 I added the ability to add and remove entries in this list. I also wrote a Ruby gem, rb_tracer, to give access to this and to act as a multiplexer when there were several hooks.

And here is why you might want to have multiple hooks. Suppose you are interested in gathering statistics for calls and returns. For example you might want to record elapsed time in some functions. So you could write a trace hook that triggers only on calls and returns. And at some point you might want to debug other portions of the code. Or maybe you want to do something with regard to exceptions that get raised. So, one trace function would gather the statistics while the debugger might temporarily use another hook, and another hook might do something for exception events.

Well, the TracePoint API now handles registering multiple hooks and each can register what event they want to trigger on. So those patches and the rb_tracer gem are happily obsolete.
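
For instance, something along these lines is now possible. The bodies here are placeholders, but TracePoint.new, enable, and the tp accessors are the actual 2.x API.

  # Two independent hooks, each with its own event mask; a single global
  # set_trace_func() could not express this.
  timing = TracePoint.new(:call, :return) do |tp|
    # gather call/return statistics here, e.g. keyed by tp.method_id
  end

  exceptions = TracePoint.new(:raise) do |tp|
    warn "raised #{tp.raised_exception.inspect} at #{tp.path}:#{tp.lineno}"
  end

  timing.enable
  exceptions.enable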

Unfortunately, there are limitations.

Once you set the events you want to trigger on, you can't change them. Nor can you see what mask has been set. In the context of a debugger, I allow the user to set and change events. And of course I want to be able to see what events we've registered. So in my patches to 2.1.5, I allow you to set and change the event masks.

Also good is the fact that the TracePoint API is Object Oriented. Again compare this to the older set_trace_func() which is a method. As a result, you no longer have that long list of 6 parameters in the callback. Instead there is just the one object. From that, you can get whatever other information you need, like the event triggered, source position, access to a binding object to be able to run eval() in the context of where you were stopped, and so on.

But ...

In terms of OO design it feels like there are two objects combined into one. Consider this example from ruby-doc:

trace = TracePoint.new(:raise) do |tp|
   p [tp.lineno, tp.event, tp.raised_exception]
end

trace and tp are both of type TracePoint, but they function quite differently. tp has methods which are specific to the callback (lineno(), event(), raised_exception()). But other methods aren't specific to the callback, such as enabling and disabling the tracepoint; setting the event mask isn't specific to the callback either. In the example above, the event mask is :raise.

Finally, there is a weird artifact in the implementation. When the tracepoint is enabled, you will get a "c return" event callback from the enable() method.

Here is how I eliminate that unwanted callback in my patches. In order to facilitate "step in", "step over" and "step out", I have the ability to disable tracing on a per-frame basis. So inside the enable() method, I disable frame tracing for that frame.

Wednesday, February 4, 2015

A Personal History of Ruby Debuggers, Part 1: Ruby 1.8

Introduction


I'm writing yet another debugger for Ruby 2.x. Or more precisely I am extending the debugger I started in Ruby 1.9.x for Ruby 2.x.  And that is an extension of the Ruby debugger(s) I worked on in Ruby 1.8. Why?

I think there has been progress with Ruby debuggers and the run-time support needed for them, but not as much as there could be.

So let me start with a little personal history of debuggers and run-time support — where we've been — before describing what I hope to achieve in the debugger effort this time around.

A Personal History of Ruby Debuggers - Ruby 1.8


My first encounter with a Ruby debugger was in Ruby 1.8. It was the one that Matz wrote. It is still available today. You can invoke the debugger using the -r flag with the module name debug. Putting those together you get an invocation that looks something like this:

  $ ruby -rdebug myprogram.rb

I started extending and changing this to make Matz's code a little more like gdb. For example the frame command in that debugger and gdb's frame command do two different things.

But early on I came across Kent Sibilev's excellent debugger for Ruby 1.8 called ruby-debug. In contrast to Matz's debug module, it came with a command-line program you could invoke called rdebug.

While the command-line name was clever, ultimately I think it caused a lot of confusion between running:


   $ ruby -rdebug myprogram.rb

versus:

   $ rdebug myprogram.rb

Before going on, let me note two facets of my activity in debuggers from the outset:
  • rather than write a debugger from scratch, I started extending someone else's code.
  • I was interested in keeping established command names rather than inventing or reappropriating them; here I follow the gdb commands.
Kent Sibilev billed ruby-debug as the "fast" Ruby debugger. The adjective was, I think, justified. Matz's debugger was written in pure Ruby, and two things made it slow:
  • In order to be able to support evaluating Ruby expressions inside the debugger, the callback API dictates that a binding object be created for every callback.
  • There are lots of callbacks to the Ruby code. This is okay if you are single stepping, but if you are running "step over", "step out" or "step to breakpoint" the slowness is noticeable.
Kent Sibilev handled these two bottlenecks by coding stop-condition determination and binding object creation in C using the API that Matz provided for Ruby 1.8. Furthermore, to mitigate the slowness in tracing overhead, Kent added the ability to turn it on and off. This was done by the calls:

   Debugger.start

and:

   Debugger.stop

But these needed to be used in conjunction with "require 'ruby-debug'". You also needed to make an initial call to the debugger. Other alternatives included passing a block to the start method, or encountering an unhandled exception after setting up post-mortem debugging. Later we added a convenience mechanism to reduce these three steps to one:

   require "ruby-debug/debugger"

Another cool idea Kent had in ruby-debug was a stepping mode where you could "force" the next stopping point to be on a different line. Originally the command invocation was "set force", but I changed it, or rather allowed "set different on" to mean the same thing.

One last thing about ruby-debug is worth mentioning.  From the beginning, Kent allowed different ways to interact with the debugger. One way of course is via a terminal in the same process on the same computer; another way is to run batch debugger commands as happens in running a user profile. A third way was similar to how Java debuggers run: the debugger sets up a listener on a socket. 

Early on, folks working on IDEs were interested in using the C extension as a back end, and of course, there was no interest in the command-line front end. To accommodate this, we split the gem into two parts: a "base" part which IDEs could use, and the pure-Ruby command-line part.

Finally, I'd like to note a couple of aspects that came up from the start.  In my opinion, Ruby 1.8 didn't have enough run-time support to enable writing an industrial-strength debugger. However to Matz's credit, in Ruby version 1.8.4 an API was added that made it possible for Kent Sibilev to provide that run-time support.

Kent Sibilev's a good guy. As sometimes happens in the open-source community things change and people sometime drop out of communication. I haven't heard from Kent since around the end of 2007.

Note: There are a lot of people whose contributions I have omitted. I am sure there are things I don't remember; possibly I've gotten things wrong. If you feel slighted or I've got something wrong, contact me or comment below so I can correct this.

Why a Blog?

This blog is an experiment.

There is a bit of information I'd like to get out there. My experience is that the topics I have something to say about are often too specialized to give as a conference talk or submit as a
paper somewhere.

Yet I just don't see these things out there elsewhere.
Here are some other venues I have tried:

Newsgroups:



Unfortunately there's not a lot of activity on these.

Public Blogs


I've written about my Perl debugger in blogs.perl.org, for example: http://blogs.perl.org/users/rockyb/2012/09/a-plan-for-revamping-and-replacing-the-perl-debugger.html

I don't see that there is anything like this for Ruby.

Project Wiki, Mailing lists, etc.


There are wikis for each of the various debuggers, but a wiki doesn't encourage discussion.

Issue trackers and mailing lists tend to get centered around an individual's problems rather than general discussion.

So, here we go...