Getting Pedantic About Ruby Semantics

I just can’t resist the temptation to be an asshole language lawyer.

Yehuda thinks that it’s important to keep a semantic model of Ruby behavior in mind, and not get hung up on implementation details. He’s right. While it’s important to understand the implementation, with half a dozen or more different implementations of Ruby out there, it’s more and more important that we understand the semantics of Ruby as a language separately from Matz’s Ruby Interpreter (MRI).

The best resource we have for determining the semantics of Ruby-as-a-language is the draft ISO standard. The draft standard is not without its problems, not the least of which is that the wider Ruby community hasn’t exactly been encouraged to participate in the standards process. But it’s the closest thing we have to a definition of Ruby’s semantics divorced from any specific implementation. I looked up the parts of the standard which pertain to blocks, in order to see if Yehuda’s mental model of Ruby agrees with that of the creator of Ruby.

Here’s the part defining blocks:

11.2.2 Blocks

[...]

Semantics
A block is a sequence of statements or expressions passed to a method invocation. A block can be called either by a yield-expression (see §11.2.4) or by invoking the method call on an instance of the class Proc which is created by an invocation of the method Proc.new to which the block is passed (see §15.2.17.3.3).

And here’s the part which specifies the semantics of &block arguments:

A block parameter: This parameter is represented by block-parameter-name. The parameter is bound to the block passed to the method invocation.

[...]

13.3.3 Method invocation

The way in which a list of arguments is created are described in §11.2.

Given the receiver R, the method name M, and the list of arguments A, take the following steps:

a) If the method is invoked with a block, let B be the block. Otherwise, let B be block-not-given.

[...]

Push B onto [[block]]. [This is a notational convention indicating an attribute of an execution context -Ed].

[...]
iv) If the block-parameter of L occurs, let D be the top of [[block]] .

I) If D is block-not-given, let V be nil.

II) Otherwise, invoke the method new on the class Proc with an empty list of arguments and D as the block. Let V be the resulting value of the method invocation.

III) Let n be the block-parameter-name of block-parameter.

IV) Create a variable binding with name n and value V in Sb .

As I read the standard, Yehuda is incorrect in his interpretation of Ruby block semantics. Blocks are non-object entities which form one of the attributes of an execution context. If the method provides a name for the block in the form of an &block parameter, then and only then is an object created, by explicitly calling Proc.new.
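To make that reading concrete, here is a minimal sketch of the behavior the quoted steps describe, as seen from inside a method. The method names (with_yield, with_block_param, maybe_block) are made up for illustration and appear nowhere in the standard. The sketch only shows what a method can observe; it cannot show whether an implementation really invokes Proc.new behind the scenes.

```ruby
# A method that only yields never names the block, so under the
# standard's model no Proc object is needed to run it.
def with_yield
  yield
end

# Naming the block with &block binds the parameter to a Proc
# (step II in the quoted text).
def with_block_param(&block)
  block
end

# With no block given, the parameter is bound to nil (step I).
def maybe_block(&block)
  block
end

yielded = with_yield { :from_yield }
reified = with_block_param { :from_proc }
missing = maybe_block

puts yielded.inspect
puts reified.class
puts reified.call.inspect
puts missing.inspect
```

Whether the Proc seen by with_block_param is created on demand or already existed is exactly the point in dispute; nothing in this sketch can tell the two apart.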

Does it matter that this model differs from Yehuda’s? In at least one respect it does. Yehuda points out that in MRI, the object_id of a given block is always the same, even when the block is passed from one method to another. By my read of the standard, this is an optimization, and not something to rely on as a part of Ruby’s semantics.
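The identity question can be checked directly. The following is a rough sketch, with the method names inner and outer invented for the purpose. It compares objects with equal? rather than printing object_id values, and it deliberately does not claim a result for the literal-block case: what it prints is whatever your implementation happens to do.

```ruby
# Hypothetical helper: returns the Proc bound to its block parameter.
def inner(&block)
  block
end

# Hypothetical helper: captures its own block, forwards it, and
# returns both Procs so the caller can compare them.
def outer(&block)
  [block, inner(&block)]
end

# Case 1: an existing Proc handed over with &.
explicit = proc { 1 }
same_explicit = inner(&explicit).equal?(explicit)

# Case 2: a literal block forwarded through two methods.
first, second = outer { :hello }
same_implicit = first.equal?(second)

puts "existing Proc keeps its identity: #{same_explicit}"
puts "literal block keeps its identity: #{same_implicit}"
```

If the second line ever prints false on some implementation or version, that would count as evidence that identity is an implementation detail rather than part of the language.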

But that’s not the end of the story. The stated intent of the Ruby ISO standard is to formally describe the behavior of MRI 1.8.7. So does it, in this case? Let’s see:

class << Proc
  alias old_new new
  def new(&block)
    puts "in Proc.new"
    old_new(&block)
  end
end

def foo(&block)
  puts "In foo"
  bar(&block)
end

def bar(&block)
  puts "In bar"
  block.call
end

foo do
  puts "In block"
end

# >> In foo
# >> In bar
# >> In block

It certainly doesn’t look like Proc.new is being called here. Now, whether this is because MRI is not flexible enough to enable an effective redefinition of Proc.new, or whether Proc.new is never called, is unclear without the assistance of a C-level debugger. If the latter, then Yehuda’s mental model is closer to the real behavior of MRI than the draft standard.
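One way to narrow down the first possibility without a debugger is to confirm that the hook fires at all when Proc.new is called explicitly. The sketch below does that with a counter instead of puts. It assumes a Ruby new enough to have Module#prepend (2.0 or later), so it is not the 1.8.7 under discussion, and ProcNewSpy is a name I made up for the example.

```ruby
# Hypothetical helper module: counts every dispatch to Proc.new.
module ProcNewSpy
  class << self
    attr_accessor :calls
  end
  self.calls = 0

  def new(*args, &block)
    ProcNewSpy.calls += 1
    super
  end
end

# Put the spy in front of Proc's own singleton method.
Proc.singleton_class.prepend(ProcNewSpy)

def foo(&block)
  bar(&block)
end

def bar(&block)
  block.call
end

# An explicit call goes through ordinary method dispatch.
before = ProcNewSpy.calls
Proc.new { :explicit }
explicit_calls = ProcNewSpy.calls - before

# Passing a literal block through two &block parameters.
before = ProcNewSpy.calls
foo { :in_block }
implicit_calls = ProcNewSpy.calls - before

puts "Proc.new dispatches for an explicit call: #{explicit_calls}"
puts "Proc.new dispatches while passing a block through foo and bar: #{implicit_calls}"
```

If the first count is nonzero and the second is zero, the redefinition is effective and the &block path simply never dispatches to Proc.new, at least on the interpreter running the script.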

And when standards no more reflect reality than any given hacker’s imagination, the question becomes: what semantics do you think the language should have, moving forward? As a statement of what he thinks the semantics ought to be, I personally think Yehuda’s mental model is pretty reasonable, and consistent with Ruby’s “everything is an object” philosophy. But stating that it is the canonical semantics of the language upon which Ruby hackers should base their assumptions – that seems a bit presumptuous to me.

  • Etienne Vallette d'Osia

    Your redefinition is wrong: you actually replace the existing new method instead of adding one (class methods are singleton methods of the class object), so super can't work.
    class << Proc
      alias old_new new
      def new(&block)
        puts "in Proc.new"
        old_new(&block)
      end
    end

    I think Yehuda is right and the specs should be changed, but what *I* think is not important ;-)
    You should send a short email to the mailing list.

    • http://avdi.org avdi

      You're right about super, but it doesn't matter in this example because the method is never called anyway (at least, not in MRI 1.8.7).

      • Etienne Vallette d'Osia

        I have never said this correction fixes this problem :-)

        I tested it on ruby 1.9.1, jruby and rubinius, none of them print the line “in Proc.new”, so I am certain it is a spec issue

        rvm ruby test.rb :

        jruby-1.4.0: jruby 1.4.0 (ruby 1.8.7 patchlevel 174) (2009-11-02 69fbfa3) (Java HotSpot(TM) Client VM 1.6.0_15) [i386-java]

        in foo
        in bar
        in block

        rbx-1.0.0-rc2: rubinius 1.0.0-rc2 (1.8.7 release 2010-01-04 JI) [i686-pc-linux-gnu]

        in foo
        in bar
        in block

        ree-1.8.7-2010.01: ruby 1.8.7 (2009-12-24 patchlevel 248) [i686-linux], MBARI 0x8770, Ruby Enterprise Edition 2010.01

        in foo
        in bar
        in block

        ruby-1.8.7-p249: ruby 1.8.7 (2010-01-10 patchlevel 249) [i686-linux]

        in foo
        in bar
        in block

        ruby-1.9.1-p378: ruby 1.9.1p378 (2010-01-10 revision 26273) [i686-linux]

        in foo
        in bar
        in block

    • http://avdi.org avdi

      I went ahead and updated the code sample, thanks for the note!

  • http://twitter.com/wycats wycats

    @avdi “And when standards no more reflect reality than any given hacker’s imagination, the question becomes: what semantics do you think the language should have, moving forward?”

    Here's the thing. My imagination reflects reality, while the standard does not. So while it's true that “the standards reflect reality no more than my imagination”, they certainly reflect reality less than my imagination.

    I think a very good touchstone for reality is how Rubinius thinks of the world, since much of it is implemented in Ruby. Consider the following code:

    def bar(&block) block end
    foo = proc { 1 }
    bar(&foo)

    In Rubinius, foo is passed directly to bar, because, in Rubinius, &foo means “get the Proc from foo and send the result to the method”.

    This view of the world clearly explains why sending a block across multiple methods retains object identity, while the Proc.new approach in the spec, in addition to being provably false (as you demonstrated) does not explain it.

    When choosing between two competing descriptions of the same phenomenon, we tend to choose the description that most fits with the phenomenon being described, not a “platonic ideal” of what we wish we were seeing.

    • http://avdi.org avdi

      The thing is, we can also posit a third model where, as in the spec, a block is a non-object part of a method's execution context. It is only reified as a Proc when a &block argument is specified, and memoizing that Proc object is an allowable optimization. This model is no less elegant and fits the observed phenomena equally well. At least one commenter on your article put forth this model or one very similar to it, and you dismissed them as if yours was clearly the more valid mental model.

      I don't think yours is a *bad* model, but I don't see what makes it more valid than any other. The fact that one implementation of Ruby chose this model doesn't convince me, especially since I suspect MRI did not go this route (I haven't had time to check the source to confirm that).

      • http://twitter.com/wycats wycats

        “memoizing the Proc is an allowable optimization” disregards the fact that it's actually *true* in all implementations.

        There's an open bug in 1.8.x in which changing the metaclass of the Proc produces a different Proc across method boundaries, but this is (a) fixed in 1.9, and (b) not a bug in Rubinius and JRuby.

        I say it's a bug because if you read the actual C code, you can see the mistake (a use of CLASS_OF without rb_class_real).
        In implementations without this bug, you can modify the metaclass of a Proc and it will persist across method boundaries. Additionally, this is *actually* the implementation in Rubinius.

        Also, this model doesn't neatly handle the case of & where the object itself is not a Proc. In all implementations, &foo calls to_proc on foo, and the returned Proc is passed to the method. In order to understand this behavior using the “reified block” model, you need to also posit that the block is extracted from the returned Proc and then reattached later. This step is entirely unneeded to explain what is happening.

        Since it *is not* always happening in practice (for instance, in Rubinius), and the step is unneeded to explain the observed phenomenon, why should we add it to our model of the phenomenon?

    • http://avdi.org avdi

      As far as I can tell, your model is more a platonic ideal than anything anyone else has put forward – in that it tries to explain observed phenomena in terms of processes which don't actually take place in MRI. This isn't a bad thing – most mature language standards *must* set forth a kind of platonic ideal if they are to remain implementation neutral. They describe the semantics, not the mechanics.

      It just seems to me that you are trying to describe your platonic ideal as THE shared mental model that Ruby coders should keep in mind – and that's going to bite people unless and until you are able to convince Ruby implementors that your described semantics are the ones they should strive to preserve. If that's what you're trying to do here – put forward a semantic model you think *should* be the common mental baseline for Ruby implementors – then just say so. But I don't think it's accurate yet to say that this model is the one that all implementors are already keeping in mind as they build their runtimes.

  • http://blog.sebastianguenther.org/ Sebastian Günther

    Avdi, you have sparked my interest with your comments on Yehuda's posts. I'm questioning whether there is one true source to explain Ruby's semantics at all. I could use a pragmatic approach and use RubySpec and Rubinius to describe the semantics at the same (language) abstraction level. Or I could put much effort into understanding MRI or JRuby and explain what happens there.

    And this is curious: the mental model pre-forms my understanding of the language's internal workings. And as I want to understand how the metaprogramming capabilities of Ruby work, choosing the right model is essential.

    But what is the “right model”?
