Array Set Operations in Ruby

Do you ever find yourself doing this?

tags = %w[foo bar baz]
tags << 'buz' unless tags.include?('buz')

Or:

tags << 'baz'
tags.uniq!

In both cases, we have an Array we want to use as a set, containing only unique elements.

One way to tackle this more cleanly is to simply use a Set.

require 'set'
tags = Set.new(%w[foo bar baz])
tags.add('foo')
tags.add('buz')
tags # => #<Set: {"foo", "bar", "baz", "buz"}>

But the Set and Array interfaces differ in some regards, and if other code is already expecting the collection to be an Array, that solution may not be practical.
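When only one consumer insists on an Array, one option is to keep the Set internally and convert it at the boundary. This is a minimal sketch; `render_tags` is a hypothetical stand-in for the Array-expecting code, not something from a real library.

```ruby
require 'set'

# Hypothetical stand-in for existing code that expects an Array
# (it indexes by position, which Set does not support).
def render_tags(tag_array)
  "#{tag_array[0]} (+#{tag_array.length - 1} more)"
end

tags = Set.new(%w[foo bar baz])
tags.add('foo') # already present, so the Set is unchanged
tags.add('buz')

tag_list = tags.to_a # Set#to_a returns a plain Array
summary = render_tags(tag_list)
```

The conversion copies the elements, so this suits occasional hand-offs better than code that converts back and forth in a loop.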

As it happens, Array supports several basic set operations natively. You may already know about these, but in case you don’t, here are some examples.

Set union:

tags = %w[foo bar]
tags |= %w[foo buz] # => ["foo", "bar", "buz"]

Set difference:

tags = %w[foo bar]
tags - %w[bar baz] # => ["foo"]

Set intersection:

tags = %w[foo bar]
tags & %w[bar baz] # => ["bar"]
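Of the three examples, only the union one reassigns `tags`; `-` and `&` return new arrays and leave the receiver alone. The sketch below, using made-up tag values, shows the compound-assignment forms and how each operator treats duplicates.

```ruby
tags = %w[foo bar baz]

remaining = tags - %w[bar] # returns a new Array; tags is untouched
tags -= %w[baz]            # shorthand for tags = tags - %w[baz]
tags &= %w[foo qux]        # keep only elements present in both arrays

# | and & drop duplicates from their result; - keeps them.
union        = %w[a a b] | %w[b c]
intersection = %w[a a b] & %w[a]
difference   = %w[a a b] - %w[b]
```

The duplicate handling matters if the Array was not already unique: `-` alone will not turn it into a set.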

It’s a small thing, but perhaps it will save you a few lines of code.

UPDATE: My WordPress “related posts” feature points out that I have officially begun to repeat myself. Ah well. If nothing else this article has a bit more explanation than the one from 2010.

  • http://ngauthier.com Nick Gauthier

    I also love using sets in testing when I want to compare two arrays but I don’t care about order. Clearer than sorting them.

    • xternal

      In RSpec, a matcher =~ is provided for comparing arrays without consideration to order. It’s quite helpful. Not saying you do or should use RSpec; it just seems to be a lesser known matcher so sharing in case anybody finds it useful.

      The Set trick is neat too, I will keep that in mind!

      • http://avdi.org Avdi Grimm

        Hey, cool!

      • http://ngauthier.com Nick Gauthier

        Yeah, that’s neat. Kind of an odd overloading, but I guess “matches” makes sense for the arrays.

      • myronmarston

        FWIW, the recommended way to match an array w/o order is:

        expect(array).to match_array(other_array)

        The `expect` syntax is the recommended syntax to use with matchers now, and it intentionally does not support operator matchers. Read this for more info:

        http://myronmars.to/n/dev-blog/2012/06/rspecs-new-expectation-syntax

        • Brendon Murphy

          Thanks for the tip. That definitely reads more explicitly.

  • myronmarston

    Good stuff. I use sets all the time, not simply because I want set semantics, but also because `Set#include?` is O(1) and `Array#include?` is O(N). When you’re checking membership in a collection in a tight loop, using a set rather than an array can make a big difference. Last week I optimized a method that was taking over 20 minutes to run (w/o doing any IO) down to 16 seconds by changing an array to a set.
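A sketch of the pattern described in this comment, with hypothetical data: build the Set once outside the loop, then run the membership checks against it. The timing figures quoted above are the commenter's own; this example only illustrates the shape of the change.

```ruby
require 'set'

allowed = %w[foo bar baz]     # hypothetical list of known tags
allowed_set = allowed.to_set  # build once, outside the loop

incoming = %w[baz qux foo foo] # hypothetical input to filter

# Each include? call is a hash lookup on the Set,
# rather than a linear scan of the Array.
known = incoming.select { |tag| allowed_set.include?(tag) }
```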

  • Dan Bernier

    “My WordPress “related posts” feature points out that I have officially begun to repeat myself.”

    Well, you need to store your posts in a Set, not an Array!

  • Zubin Henner

    In the third example you can do &= to set the variable to the intersected result.

    • http://twitter.com/benhamill Ben Hamill

      You can similarly do -= like the second example. :)

  • KevinSjoberg

    Indeed a very good post. I wasn’t familiar with the union functionality, and I can really see use cases where it might come in handy.

    It’s interesting that I haven’t seen much code using Set. I wonder why that is. Is it just because people aren’t familiar with it?

  • http://twitter.com/tehpeh Tim Preston

    Hey Avdi, I have a small gem that encompasses set operations on Array https://github.com/tehpeh/set_theory

    Cheers!

  • wxianfeng

    Great, it is interesting.