Generating cows with IO.popen()

I find the subject of starting and interacting with other OS processes fascinating. A few years ago I wrote a never-completed series on the many ways to spawn off processes in Ruby:

  • Part 1: Backticks and system()
  • Part 2: Opening pipes to processes
  • Part 3: Open3, PTY, and Shell standard libraries

However, those were all written for Ruby 1.8, and are a little out of date.

Ruby 1.9 substantially expanded and revamped the Ruby process API. Many of the tasks which once required third-party gems like Open4 can now be easily accomplished using the built-in calls. Particularly notable is that Process.spawn was added with a comprehensive set of options for customizing how the process is started, and the other standard process-starting calls were updated to accept the same set of arguments as Process.spawn.
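
As a rough illustration of those options (a minimal sketch, not code from any real application), the following spawns a child with one extra environment variable and its STDOUT redirected into a pipe. The variable name COW_MOOD is made up, and the running Ruby interpreter, located via RbConfig.ruby, stands in for an external program:

```ruby
require "rbconfig"

# Create a pipe so the parent can read whatever the child prints.
reader, writer = IO.pipe

pid = Process.spawn(
  { "COW_MOOD" => "happy" },                    # hypothetical variable, added to the inherited environment
  RbConfig.ruby, "-e", 'print ENV["COW_MOOD"]', # command and arguments, no shell involved
  out: writer                                   # redirect the child's STDOUT into the pipe
)

writer.close          # the parent's copy must be closed, or reader never sees EOF
output = reader.read
Process.wait(pid)     # reap the child; its exit status lands in $?

puts output           # happy
```

The same environment hash and redirection options should also be accepted by system, exec, and IO.popen in Ruby 1.9 and later.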

Here’s how www.cowsays.com starts a Perl process in order to generate ASCII-art cows:

require 'pathname'

cowsay_path = Pathname(__FILE__).dirname + "../bin/cowsay"
perl_path   = "/usr/bin/perl"
cows_path   = Pathname(__FILE__).dirname + "cowsay/cows"
env         = {
  'COWPATH' => cows_path.to_s
}
args        = %W[-f #{@cowfile}]
@io.popen([env, perl_path, cowsay_path.to_s, *args], 'r+') do |process|
  process.write(message)
  process.close_write  # signal EOF so cowsay produces its output
  process.read
end

Notes:

  • Since I know where Perl is found on Heroku boxes, I hardcode the path to it. Unless you have a specific reason for allowing program paths to be overridden, it’s always safer to hardcode the path to executables. That’s one less vector for attacks.
  • @io simply points to the IO class in production. It’s an instance variable so I can write isolated unit tests verifying how my code interacts with the system.
  • IO.popen starts a process and creates a pipe to that process, which your program can use to send data into that process’ STDIN, and read data from its STDOUT.
  • I pass an array instead of a string to IO.popen. If you pass a string, Ruby will start up a shell process and pass the string to the shell for interpretation. This is handy for commands like ls -l *.rb where you want standard shell-expansion to be performed. It’s also very dangerous when dealing with user-supplied data; it opens up a door for shell-injection attacks similar to SQL injection exploits. Shell injection is arguably even more dangerous, since it can potentially execute any command that your server process has the rights to execute.
  • The array version, by contrast, skips the shell. The arguments are passed directly to the process as strings in its ARGV variable. This version is always my preference unless I have a specific reason to want shell expansion. This is also why I avoid backticks (`some-command ...`) or %x() except in little personal-use scripts.
  • I make use of another feature of IO.popen, Process.spawn and friends: I customize the environment variables the process will see by passing a hash as the first argument. I want more Ruby programmers to know about this feature; I sometimes see code which accomplishes this effect by setting ENV['somevar'] before spawning a process and then re-setting it afterwards. Passing a hash is much cleaner.
  • Note that by default, only the keys specified in the environment hash will be changed for the subprocess; the rest will be copied from the current environment. There’s an option, however, to use the hash as a complete replacement for the current environment (unsetenv_others: true). See the Process.spawn documentation for details.
  • I pass r+ as the “file mode” argument. This tells Ruby I want to read from and write to the child process. The default is read-only.
  • Note the line where I call process.close_write. This is very important, and something which bites a lot of first-time users of popen (it sure used to bite me!). With many programs, if you don’t explicitly close the pipe for writing after finishing your write, you’ll wait forever trying to read the program’s output. This is because the program being executed as a subprocess was written to read its input up to EOF before outputting and exiting. And it won’t get that EOF until you close the pipe for writing. In addition, Ruby may buffer the data you write before handing it to the pipe, and calling close_write flushes that buffer.
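
The notes above can be tried out without cowsay installed. Here is a minimal sketch, assuming a Unix-like system; the child process is the running Ruby interpreter (found via RbConfig.ruby) rather than Perl, and the names shout, CHILD_SCRIPT, and GREETING are invented for the example:

```ruby
require "rbconfig"

# Hypothetical stand-in for cowsay: a child Ruby process that reads STDIN
# up to EOF, then prints an environment variable, its first argument, and
# the upcased input.
CHILD_SCRIPT = 'print "#{ENV["GREETING"]} #{ARGV.first}: #{$stdin.read.upcase}"'

def shout(message, name)
  env     = { "GREETING" => "hello" }  # added to the inherited environment
  command = [env, RbConfig.ruby, "-e", CHILD_SCRIPT, name]  # array form: no shell
  IO.popen(command, "r+") do |process|
    process.write(message)
    process.close_write  # signals EOF; without it the child's read never returns
    process.read         # the block's value is returned by IO.popen
  end
end

# The shell metacharacters in the name arrive in the child's ARGV untouched.
puts shout("moo", "$(whoami)")  # hello $(whoami): MOO
```

Commenting out the close_write line should make the call hang, since the child keeps waiting for more input while the parent waits for output.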

I hope you’ve learned something new about starting processes in Ruby from this post. If you have any process tips of your own, or any burning questions about spawning processes, feel free to bring them up in the comments!

By the way, if you’re interested in finding out more about working with UNIX processes in Ruby, the Ruby Rogues book club is currently reading Jesse Storimer’s book on the subject.

  • http://raggi.myopenid.com/ James Tucker

    “I pass an array instead of a string to IO.popen. If you pass a string, Ruby will start up a shell process and pass the string to the shell for interpretation. ”
    ^^ not sure this is entirely true
    i had to read through the code base the other day to solve some signal problems, although i was reading 1.8
    ruby actually has a ton of different cases where it turns these things into sh(1) execs instead
    and it always makes strings instead of actually passing the varargs downstream
    (contrary to popular commentary elsewhere)
    it’s also worth noting that system() has signal trap bugs, and so is not safe for use as subshells and in other scenarios
    you’re almost always better off using fork { exec }; Process.wait
    which handles signals properly

    I would have to check the 1.9 code base to be sure of exactly how much of the above is still true there. It would be well worthwhile reading process.c to verify these comments, as they’re very important for the reasons you already outlined in the article.

    • http://avdi.org Avdi Grimm

      We discussed this offline, replying for the sake of the readers: as James pointed out to me, Ruby may use heuristics to optimize some cases where it looks like the string doesn’t contain any shell-interpreted symbols, and skip the shell. However, you should still always treat process calls which take strings instead of arrays as if they will be executed by a shell.