Sunday, February 15, 2015

Ruby Frame objects

Introduction 

I may write more about this later, but a brief intro may be useful here.

As mentioned in A Personal History of Ruby Debuggers, Part 1: Ruby 1.8, when I started there weren't many Ruby debuggers and they weren't of the caliber that you could find for other programming languages at that time, notably Perl or Python. As of now, there are a number of Ruby debuggers and IDEs that hook into Ruby for debugging.

But no matter how many there are, there is a limit on their quality and power that's dictated by the runtime system.

In an exchange with David Rodriguez, I noted a feature request he had made for the ability to get a list of local variables for a call-stack frame.

This has since been added in Ruby 2.1. More detail below.

David indicated he uses this code:


  bind = frame_binding(frame_no)
  bind.eval('local_variables.inject({}){|h, v| h[v] = eval(v.to_s); h}')

I have bad memories of writing lots of code of this ilk in ruby-debug. The problem is that a lot of it is not only slow, but also error prone. And in some cases you can only approximate the information you want.
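
For comparison, here is a minimal sketch of both approaches in stock Ruby. It assumes a Ruby recent enough to provide Binding#local_variables and Binding#local_variable_get, and it uses a plain binding in place of the debugger's frame_binding(frame_no). The method names locals_via_eval, locals_via_binding and sample are made up for illustration.

```ruby
# Eval-based approach, as in the snippet above: one string eval per variable.
def locals_via_eval(bind)
  bind.eval('local_variables.inject({}) { |h, v| h[v] = eval(v.to_s); h }')
end

# Binding-based approach: names and values are read without evaluating strings.
# Assumes Binding#local_variables and Binding#local_variable_get are available.
def locals_via_binding(bind)
  bind.local_variables.each_with_object({}) do |name, h|
    h[name] = bind.local_variable_get(name)
  end
end

# Hypothetical method whose frame we want to inspect.
def sample
  count = 3
  label = "frame"
  binding
end

bind = sample
eval_result    = locals_via_eval(bind)
binding_result = locals_via_binding(bind)
```

Both calls should produce the same Hash of names to values; the second simply avoids building and parsing Ruby source text for every variable.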

Specifically, the Ruby method local_variables walks up the call stack and gives local variables defined in this or in enclosing scopes. Suppose you just want those that are defined in the current scope?
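
A small sketch of that behavior in stock Ruby; the method name scope_demo and the variable names are invented for the example:

```ruby
def scope_demo
  outer = 1
  [10].map do |x|
    inner = x + outer
    local_variables   # called from inside the block
  end.first
end

names = scope_demo
# names contains :x and :inner from the block, and also :outer
# from the enclosing method body.
```

There is no way to tell from the returned list alone which names belong to the block and which came from the enclosing method.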

As a result of overexposure in ruby-debug to this kind of convoluted, Rube Goldberg code, I decided that I'd just fix things at the source. Internally, MRI keeps a table of the local variables for the current scope. Therefore, getting that information from the Ruby runtime is really straightforward; it doesn't need to run a special eval() with a binding.

Call Frame Object 

And that leads me to the call-stack frame object that I've added in my patches.

Adding this one object solves a lot of problems, not just the one cited above.

The frame object contains all of the information that caller_locations() has, such as source location. But it contains dynamic information as well. Specifically, it has a binding method whose result can be passed to eval(). It gives the current program counter and stack pointer. It gives access to the instruction sequence if the code is Ruby code. And it has a method to get a list of local variables and retrieve their values without having to run eval(), since the information is stored directly in the local table.
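
For reference, this is roughly the static information that stock Ruby's caller_locations() hands back. The sketch uses only the standard Thread::Backtrace::Location methods; it does not use the frame object from my patches, and the method name where_am_i is made up.

```ruby
def where_am_i
  # Start at frame 0 (this method itself) and ask for a single entry.
  loc = caller_locations(0, 1).first
  # A Thread::Backtrace::Location carries static data only:
  # no binding, no program counter, no local variable table.
  { path: loc.path, lineno: loc.lineno, label: loc.label }
end

info = where_am_i
```

Everything in that Hash is fixed once the code is compiled; nothing in it lets you evaluate an expression in the frame or look at its variables.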

Location, Location, Location

In my Critique of MRI Ruby 2.0's TracePoint API, I mentioned that I don't use caller_locations() even though it is a very welcome addition to Ruby. Part of the reason, as you can now see, is that I use the Frame object instead. And in my view, some of the ways that caller_locations() reports locations are misfeatures, if not out-and-out bugs.

So let me next describe the differences.

First, I have something called a source container. Most of the time, expressions and statements come pretty much directly from a file in a filesystem, but that doesn't have to be the case. At runtime you can evaluate a string. In that case the container is the string. Sure, the eval statement might be written in a file. But on the other hand, it might not. It's possible for the statement running the eval to have itself been generated and run automatically. And how do you indicate that you are at the second or third line in a string? At one time, Ruby did this by adding the line number position in the string onto the line number position in the file that contained eval()!
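
The line-numbering question can be seen with stock Ruby's eval(), which accepts an optional file name and starting line. In this sketch the name "(string: example)" and the path /tmp/host.rb are invented; passing them explicitly keeps the result independent of what a given Ruby version reports by default.

```ruby
# A two-line string; the second line reports where it thinks it is.
code = "nil\n[__FILE__, __LINE__]"

# Treat the string as its own container, with lines counted from 1.
file, line = eval(code, binding, "(string: example)", 1)
# file is "(string: example)" and line is 2.

# Pretend the string belongs to a host file and starts at its line 40.
# The position inside the string is then added onto the host line.
host_file, host_line = eval(code, binding, "/tmp/host.rb", 40)
# host_file is "/tmp/host.rb" and host_line is 41.
```

In the second call, line 41 of /tmp/host.rb need not have anything to do with the text being run, which is the confusion a separate container for the string avoids.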

Also in Ruby there is the possibility that you are running code from a dynamic library. While that is technically a file too, it isn't a text file.

It is also not inconceivable that the code you run is part of a file archive or packaging system. So while the file might be that archive file, you probably also want to know the member name in the archive.

The source container is there to help out with such things. It is a tuple of container type and value(s). So for a file the first entry is "file", for an eval string the first entry is "string", and for a dynamic library the first entry is "binary".
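
As a rough illustration of the shape of such tuples, here is a sketch using plain Ruby arrays. The constant SAMPLE_CONTAINERS, the helper describe_container and the library path are all hypothetical; only the three type strings come from the description above.

```ruby
# Hypothetical examples of [container type, value] tuples.
SAMPLE_CONTAINERS = [
  ["file",   "/tmp/foo.rb"],
  ["string", "1+2"],
  ["binary", "/usr/lib/libexample.so"]   # made-up library path
].freeze

# Hypothetical helper: render a container for display in a debugger.
def describe_container(container)
  type, *values = container
  case type
  when "file"   then "file #{values.first}"
  when "string" then "eval string #{values.first.inspect}"
  when "binary" then "binary #{values.first}"
  else "unknown container #{type.inspect}"
  end
end

descriptions = SAMPLE_CONTAINERS.map { |c| describe_container(c) }
```

Keeping the type as the first element means a debugger can decide how to show the location before it tries to open anything as a text file.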

In addition to the container (traditionally called "filename"), I have source_location(). In reality, there is a program counter of some sort, either a Virtual Machine PC or the PC associated with the C code if you are stopped in a Ruby C function. That's the true location, even if what you want to know is what is the closest reference in the source code. Associating a single source code line with a PC is a tricky problem. In the presence of compiler optimization such as common subexpression elimination it is possible that a VM PC might refer to several locations in the source code. Therefore the source location is an Array of line numbers. In practice for Ruby, the array has length 1.
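
To get a feel for how VM program counters relate to source lines, stock MRI can disassemble a compiled string. This sketch assumes MRI, since RubyVM::InstructionSequence is specific to it, and the exact listing differs between Ruby versions, so no output is shown.

```ruby
# Compile a two-line snippet and dump its instruction sequence.
iseq = RubyVM::InstructionSequence.compile("a = 1\nb = a + 2\n")
listing = iseq.disasm

# Each instruction in the listing starts with its offset (the VM PC);
# the source line number it is attributed to appears in parentheses.
puts listing
```

Several consecutive PCs map to the same line, and a line lookup from a PC has to pick the nearest recorded entry.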

What is the source location if you are stopped inside a C function? Well, it should be the PC of where you are stopped, a machine address. Right now, I use the PC for the function entry even if you are stopped somewhere else. To me this is still a little more honest than what Ruby does, which is to report the last Ruby source-code location.

In the Ruby debugger, for C functions I'll also include the closest Ruby source-code location, but note it with "via". For example you might see:

c_call (address 0x7f195fe7dc10 via /tmp/foo.rb:1)

You could then use this machine address with, say, gdb. If you wanted to know which C function name is associated with that address, you could run the gdb command info symbol.

For eval strings, I show a bit of the string. And the debugger takes that string and stores it in a temporary file. For example:

line (eval "1+2":1 remapped /tmp/eval-8d79933-20150215-5399-qnfr5p.rb:1 @2)

Above `@2' is the VM PC.

In finishing up this section on location, one concept I hope you come away with is the idea of transparency: if you are stopped in a machine address of a C function, I report that and don't try to sugarcoat it by giving you the last known Ruby source text location.

The Downsides of a Frame Object

Security:

The frame object I provide exposes some of the MRI Ruby runtime system. Some compiler writers are a bit scared to allow programmers this low-level access. Yet, you have the same low-level access if you use gdb, and what's given in the Ruby debugger can be more helpful since it is more high-level. I'd rather debug in a debugger that understands Ruby than one where I am forced to work at the level of C and assembly language.

I trust and hope that programmers use the information responsibly. For the most part, folks will just be crashing their own program.

For those situations where security is paramount, just don't have this interpreter around. That's partly why it is a completely separate Ruby interpreter.

Invalid Frames:

The second problem with the frame object is that it may no longer be valid. A frame object is valid only for as long as the frame it refers to is on the call stack, yet it is possible to save a frame object in a variable that outlives that frame. The code I wrote does some basic checks to see if a frame is valid. For example, the address of the internal frame pointer has to be inside the range of currently valid frame pointers. And even if it is, we check that the static, unchanging frame data is the same as it was when the frame object was created.
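
The two checks can be modeled in a few lines of Ruby. This is only a toy model under my own assumptions: the real checks live in C inside the patched interpreter, and the names FrameSnapshot and ToyStack are invented here.

```ruby
# What a saved frame object remembers about the frame it was made from.
FrameSnapshot = Struct.new(:slot, :method_name, :container)

# Toy stand-in for the interpreter's stack of control frames.
class ToyStack
  def initialize
    @frames = []
  end

  # Push a frame and hand back a snapshot that refers to it.
  def push(method_name, container)
    @frames.push([method_name, container])
    FrameSnapshot.new(@frames.size - 1, method_name, container)
  end

  def pop
    @frames.pop
  end

  def valid?(snap)
    # Check 1: the slot must lie within the currently live frames.
    return false unless snap.slot < @frames.size
    # Check 2: the static data in that slot must match what was saved.
    @frames[snap.slot] == [snap.method_name, snap.container]
  end
end

stack = ToyStack.new
outer = stack.push("main", "/tmp/foo.rb")
inner = stack.push("helper", "/tmp/foo.rb")
valid_while_live = stack.valid?(inner)    # true: frame still on the stack

stack.pop
valid_after_pop = stack.valid?(inner)     # false: slot is out of range

stack.push("other", "/tmp/bar.rb")
valid_after_reuse = stack.valid?(inner)   # false: slot reused by another frame
outer_still_valid = stack.valid?(outer)   # true: never left the stack
```

The second check matters because a slot can be reused; it is still only a heuristic, since a later call to the same method from the same place would look identical.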
