Ruby Doc (RDoc) (Best Tutorial 2019)

Ruby Documentation RDoc


In this blog, we’re going to look at the finer details of developing reliable software: Ruby documentation (RDoc), error handling, debugging, and testing. Without documenting, testing, and debugging your program, it’s unlikely that anyone but you could work on the code with much success, and you run the risk of releasing faulty scripts and applications.

 

Ruby Documentation

Even if you’re the only person to use and work on your Ruby code, it’s inevitable that over time you’ll forget the nuances of how it was put together and how it works. To guard against code amnesia, you should document your code as you develop it.

 

Traditionally, documentation would often be completed by a third party rather than the developer or would be written after the majority of the development had been completed.

 

Although developers have always been expected to leave comments in their code, true documentation of a quality such that other developers and users can understand it without seeing the source code has had less importance.

 

Ruby makes it extremely easy to document your code as you create it, using a utility called RDoc (standing for “Ruby Documentation”).

 

Generating Documentation with RDoc


RDoc calls itself a “Document Generator for Ruby Source.” It’s a tool that reads through your Ruby source code files and creates structured HTML documentation. It comes with the standard Ruby distribution, so it’s easy to find and use.

 

RDoc understands a lot of Ruby syntax and can create documentation for classes, methods, modules, and numerous other Ruby constructs without much prompting.

 

To document your code in a way that RDoc can use, leave comments directly before the definition of the class, method, or module you want to document. For example:

# This class stores information about people.
class Person
  attr_accessor :name, :age, :gender

  # Create the person object and store their name
  def initialize(name)
    @name = name
  end

  # Print this person's name to the screen
  def print_name
    puts "Person called #{@name}"
  end
end

 

This is a simple class that’s been documented using comments. It’s quite readable already, but RDoc can turn it into a pretty set of HTML documentation in seconds.

To use RDoc, simply run it from the command line using rdoc <name of source file>.rb, like so: rdoc person.rb

 

Note On Linux and OS X this should simply work “out of the box” (as long as the directory containing RDoc—usually /usr/bin or /usr/local/bin—is in the path). On Windows it might be necessary to prefix rdoc with its full location.

 

This command tells RDoc to process person.rb and produce the HTML documentation. By default, it does this by creating a directory called doc within the current directory and placing its HTML and CSS files in there. Once RDoc has completed, you can open index.html, located within the doc directory, and you should see some basic documentation.

 

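In the older, frame-based RDoc template, the HTML documentation is shown with three frames across the top containing links to the documented files, classes, and methods, respectively, and a main frame at the bottom containing the documentation currently being viewed. (Newer RDoc versions use a single page with a navigation sidebar instead.)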

 

The top three frames let you jump between the various classes and methods with a single click. In a large set of documentation, this quickly becomes useful.

 

When viewing the documentation for the Person class, the documentation shows what methods it contains, the documentation for those methods and the attributes the class provides for its objects. RDoc works this out entirely from the source code and your comments.

 

RDoc Techniques

 


In the prior section, you got RDoc to generate documentation from a few simple comments in your source file.

 

However, RDoc is rarely useful on such a small example, and its real power comes into play when you’re working on larger projects and using its advanced functions. This section will cover some of these functions so you can comment the code on your larger projects correctly.

 

Note The following sections give only a basic overview of some of RDoc’s features. To read the full documentation for RDoc and learn about features that are beyond the scope of this blog, visit the official RDoc site at http://docs.ruby-lang.org/en/2.2.0/RDoc.html.

 

Producing Documentation for an Entire Project


Previously you used rdoc along with a filename to produce documentation for a single file. However, in the case of a large project, you could have many hundreds of files that you want to be processed.

 

If you run RDoc with no filenames supplied, RDoc will process all the Ruby files found in the current directory and all other directories under that. The full documentation is placed into the doc directory, as before, and the entire set of documentation is available from index.html.

 

Basic Formatting

Formatting your documentation for RDoc is easy. RDoc automatically recognizes paragraphs within your comments, and can even use spacing to recognize structure. Here’s an example of some of the formatting RDoc recognizes:

#= RDoc Example
#
#== This is a heading
#
#* First item in an outer list
#  * First item in an inner list
#  * Second item in an inner list
#* Second item in an outer list
#  * Only item in this inner list
#
#== This is a second heading
#
#Visit www.rubyinside.com
#
#== Test of text formatting features
#
#Want to see *bold* or _italic_ text? You can even embed
#+text that looks like code+ by surrounding it with plus
#symbols. Indented code will be automatically formatted:
#
#  class MyClass
#    def method_name
#      puts "test"
#    end
#  end

class MyClass
end

To learn more about RDoc’s general formatting features, the best approach is to look at existing code that is extensively documented for RDoc, such as the source code of the Ruby on Rails framework, or to refer to the documentation at http://docs.ruby-lang.org/en/2.2.0/RDoc/Markup.html.

 

Modifiers and Options


RDoc can work without the developer knowing much about it, but to get the most from RDoc it’s necessary to know how several of its features work and how they can be customized. RDoc supports a number of modifiers within comments, along with a plethora of command-line options.

:nodoc: Modifier

 

By default, RDoc will attempt to use anything it considers relevant to build up its documentation. Sometimes, however, you’d rather RDoc ignore certain modules, classes, or methods, particularly if you haven’t documented them yet.

 

To make RDoc ignore something in this way, simply follow the module, class, or method definition with a comment of :nodoc:, like so:

 

# This is a class that does nothing
class MyClass
  # This method is documented
  def some_method
  end

  def secret_method #:nodoc:
  end
end

In this instance, RDoc will ignore secret_method.

:nodoc: only operates directly on the elements upon which it is placed.

If you want :nodoc: to apply to the current element and all those beneath it (all methods within a class, for example), do this:

# This is a class that does nothing
class MyClass #:nodoc: all
  # This method is documented (or is it?)
  def some_method
  end

  def secret_method
  end
end

Now none of MyClass is documented by RDoc.

 

Turning RDoc Processing On and Off


You can stop RDoc from processing comments temporarily using #-- and switch processing back on with #++, like so:

# This section is documented and read by RDoc.
#--
# This section is hidden from RDoc and could contain developer
# notes, private messages between developers, etc.
#++
# RDoc begins processing again here after the ++.

This feature is particularly useful in sections where you want to leave comments to yourself that aren’t for general consumption.

 

Note RDoc doesn’t process comments that are within methods, so your usual code comments are not used in the documentation produced. RDoc will also not process comments that are separated from other comments with blank lines.

 

Command-Line Options


Like most command-line applications, including Ruby itself, you can give RDoc a number of options, as follows:

  • --all: Usually RDoc processes only public methods, but --all forces RDoc to document all methods within the source files.
  • --fmt <format name>: Produce documentation in a certain format (which currently includes darkfish, pot, and ri).
  • --help: Get help with using RDoc’s command-line options and find out which output formatters are available.
  • --inline-source: Usually source code is shown using popups, but this option forces code to be shown inline with the documentation.
  • --main <name>: Set the class, module, or file that appears as the main index page for the documentation to <name> (for example, rdoc --main MyClass).

 

After any command-line options, rdoc is suffixed with the filename(s) of the files you want to have RDoc document. Alternatively, if you specify nothing, RDoc will traverse the current directory and all subdirectories and generate documentation for your entire project.

 

Note RDoc supports many more command-line options than these, and they are all covered in RDoc’s official documentation. Alternatively, run RDoc with rdoc --help at the command line to get a list of its options.

 

Debugging and Errors


Errors happen. It’s unavoidable that programs you develop will contain bugs, and you won’t immediately be able to see what the errors are. A misplaced character in a regular expression, or a typo with a mathematical symbol, can make the difference between a reliable program and one that constantly throws errors or generates undesirable output.

 

Exceptions and Error Handling

An exception is an event that occurs when an error arises within a program. An exception can cause the program to quit immediately with an error message or can be handled by error handling routines within the program to recover from the error in a sensible way.

 

For example, a program might depend on a network connection (the Internet, for example), and if the network connection is unavailable, an error will arise when the program attempts to use the network.

 

Rather than brusquely terminating with an obscure error message, the code can handle the exception and print a human-friendly error message to the screen first.

 

Alternatively, the program might have a mechanism by which it can work offline, and you can use the exception raised by trying to access an inaccessible network or server to enter that mode of operation instead.

 

Raising Exceptions


In Ruby, exceptions are packaged into objects of class Exception or one of Exception’s many subclasses. Ruby has about 30 main predefined exception classes that deal with different types of errors, such as NoMemoryError, StandardError, RuntimeError, SecurityError, ZeroDivisionError, and NoMethodError. You might have already seen some of these in error messages while working in IRB.

 

When an exception is raised (exceptions are said to be raised when they occur within the execution of a program), Ruby immediately looks back up the tree of routines that called the current one (known as the stack) and looks for a routine that can handle that particular exception.

 

If it can’t find any error-handling routines, it quits the program with the raw error message. For example:

irb(main):001:0> puts 10 / 0
ZeroDivisionError: divided by 0
from (irb):1:in `/'
from (irb):1


This error message shows that an exception of type ZeroDivisionError has been raised because you attempted to divide ten by zero.

 

Ruby can raise exceptions automatically when you perform incorrect functions, and you can raise exceptions from your own code too. You do this with the raise method and by using an existing exception class, or by creating one of your own that inherits from the Exception class.

 

One of the standard exception classes is ArgumentError, which is used when the arguments provided to a method are fatally flawed. You can use this class as an exception if bad data is supplied to a method of your own:

class Person
  def initialize(name)
    raise ArgumentError, "No name present" if name.empty?
  end
end

If you create a new object from Person and supply a blank name, an exception will be raised:

fred = Person.new('')

ArgumentError: No name present

Note You can call raise with no arguments at all, and a generic RuntimeError exception will be raised. This is not good practice, though, as the exception will have no message or meaning along with it.

 

Always provide a message and a class with a raise, if possible. However, you could create your own type of exception if you wanted to. For example:

class BadDataException < RuntimeError
end

class Person
  def initialize(name)
    raise BadDataException, "No name present" if name.empty?
  end
end

This time you’ve created a BadDataException class inheriting from Ruby’s standard RuntimeError exception class. At this point, it might not be obvious why raising different types of exceptions is useful.

 

The reason is so that you can handle different exceptions in different ways with your error-handling code, as you’ll do next.

 

Handling Exceptions

 


In the previous section, we looked at how exceptions work. When raised, exceptions halt the execution of the program and trace their way back up the stack to find some code that can handle them. If no handler for the exception is found, the program ceases execution and dies with an error message with information about the exception.

 

However, in most situations, stopping a program because of a single error isn’t necessary. The error might only be minor, or there might be an alternative option to try. Therefore, it’s possible to handle exceptions. In Ruby, the rescue clause is used, along with begin and end, to define blocks of code to handle exceptions. For example:

begin
  puts 10 / 0
rescue
  puts "You caused an error!"
end

You caused an error!

In this case, begin and end define a section of code to be run, where if an exception arises, it’s handled with the code inside the rescue block. First, you try to work out ten divided by zero, which raises an exception of class ZeroDivisionError.

 

However, being inside a block containing a rescue section means that the exception is handled by the code inside that rescue section. Rather than dying with a ZeroDivisionError, the text “You caused an error!” is instead printed to the screen.

 

This can become important in programs that rely on external sources of data. Consider this pseudo-code:

data = ""
begin
  <..code to retrieve the contents of a Web page..>
  data = <..content of Web page..>
rescue
  puts "The Web page could not be loaded! Using default data instead."
  data = <..load data from local file..>
end
puts data

 

This code demonstrates why handling exceptions is extremely useful. If retrieving the contents of a web page fails (if you’re not connected to the Internet, for example), then the error-handling routine rescues the exception, alerts the user of an error, and then loads some data from a local file instead—certainly better than exiting the program immediately!
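The same fallback pattern can be sketched as runnable Ruby. This is only an illustration under a few assumptions: the helper name with_fallback is made up, the network failure is simulated by raising an IOError inside the block rather than by making a real request, and the default text stands in for data loaded from a local file.

```ruby
# Hypothetical helper: run the block, and if it raises any StandardError,
# warn the user and return the supplied default data instead.
def with_fallback(default_data)
  yield
rescue StandardError => e
  warn "The Web page could not be loaded (#{e.class}: #{e.message}). Using default data instead."
  default_data
end

# The block stands in for real page-retrieval code; here the failure is simulated.
data = with_fallback("Default data loaded from a local file") do
  raise IOError, "simulated network failure"
end

puts data
```

Because the rescue clause lives inside the helper, the calling code stays free of begin/rescue noise, and a block that succeeds simply has its own result returned.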

 

In the previous section, we looked at how to create your own exception classes, and the motivation for doing this is that it’s possible to rescue different types of exceptions in a different way.

 

For example, you might want to react differently if there’s a fatal flaw in the code, versus a simple error such as a lack of network connectivity. There might also be errors you want to ignore, and only specific exceptions you wish to handle.

rescue’s syntax makes handling different exceptions in different ways easy:

begin
  ... code here ...
rescue ZeroDivisionError
  ... code to rescue the zero division exception here ...
rescue YourOwnException
  ... code to rescue a different type of exception here ...
rescue
  ... code that rescues all other types of exception here ...
end

This code contains multiple rescue blocks, each of which is triggered depending on the type of exception raised. If a ZeroDivisionError is raised within the code between begin and the rescue blocks, the rescue ZeroDivisionError code is executed to handle the exception.
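To see the dispatch in action, here is a small runnable sketch. The method name classify_failure and the symbols it returns are invented for this illustration, and YourOwnException is defined here only so that the example is self-contained.

```ruby
# Hypothetical custom exception used only for this illustration.
class YourOwnException < StandardError; end

# Runs the given block and reports which rescue clause (if any) handled it.
def classify_failure
  yield
  :no_error
rescue ZeroDivisionError
  :zero_division
rescue YourOwnException
  :custom
rescue
  # A bare rescue handles any other StandardError.
  :other
end

puts classify_failure { 10 / 0 }
puts classify_failure { raise YourOwnException, "custom problem" }
puts classify_failure { Integer("abc") } # raises ArgumentError
puts classify_failure { 1 + 1 }
```

Ruby checks the rescue clauses from top to bottom and runs the first one whose class matches, so more specific exception classes should be listed before more general ones.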

 

Handling Passed Exceptions


As well as handling different types of exceptions using different code blocks, it’s possible to receive exceptions and use them. This is achieved with a little extra syntax on the rescue block:

begin
  puts 10 / 0
rescue => e
  puts e.class
end

ZeroDivisionError

Rather than merely performing some code when an exception is raised, the exception object itself is assigned to the variable e, whereupon you can use that variable however you wish. This is particularly useful if the exception class contains extra functionality or attributes that you want to access.
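As a rough sketch of what "extra functionality or attributes" could look like, the example below gives an exception class an additional reader. The value attribute and the constructor signature are assumptions made for this illustration and are not part of Ruby's built-in exception classes.

```ruby
# Hypothetical exception class that carries the offending value with it.
class BadDataException < RuntimeError
  attr_reader :value

  def initialize(message = "Bad data", value = nil)
    super(message) # let the parent class store the message
    @value = value
  end
end

begin
  raise BadDataException.new("No name present", "")
rescue BadDataException => e
  # e is the exception object, so both the standard message and the
  # custom attribute are available here.
  puts "#{e.class}: #{e.message} (value was #{e.value.inspect})"
end
```

Carrying the bad value on the exception lets the handler decide what to do with it, instead of parsing it back out of the message string.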

 

Catch and Throw

Although creating your own exceptions and exception handlers is useful for resolving error situations, sometimes you want to be able to break out of a thread of execution (say, a loop) during normal operation in a similar way to an exception, but without actually generating an error. Ruby provides two methods, catch and throw, for this purpose.

 

catch and throw work in a way a little reminiscent of raise and rescue, but catch and throw work with symbols rather than exceptions. They’re designed to be used in situations where no error has occurred, but being able to escape quickly from a nested loop, method call, or similar, is necessary.

 

The following example creates a block using catch. The catch block with the :finish symbol as an argument will immediately terminate (and move on to any code after that block) if throw is called with the :finish symbol.

catch(:finish) do
  1000.times do
    x = rand(1000)
    throw :finish if x == 123
  end
  puts "Generated 1000 random numbers without generating 123!"
end

 

Within the catch block you generate 1,000 random numbers, and if the random number is ever 123, you immediately escape out of the block using throw :finish.

 

However, if you manage to generate 1,000 random numbers without generating the number 123, the loop and the block complete, and you see the message. catch and throw don’t have to be directly in the same scope. throw works from methods called from within a catch block:

def generate_random_number_except_123
  x = rand(1000)
  throw :finish if x == 123
end

catch(:finish) do
  1000.times { generate_random_number_except_123 }
  puts "Generated 1000 random numbers without generating 123!"
end

 

This code operates in an identical way to the first. When throw can’t find a catch block using :finish in its current scope, it jumps back up the stack until it can.
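One more detail that may be useful: throw accepts an optional second argument, and catch returns that value. The sketch below uses this to escape from a nested loop with a result; the :found symbol and the search itself are made up for illustration.

```ruby
# Search pairs of numbers for the first pair whose product is 42,
# leaving both loops at once as soon as it is found.
result = catch(:found) do
  (1..10).each do |a|
    (1..10).each do |b|
      throw :found, [a, b] if a * b == 42
    end
  end
  nil # reached only if no pair matched
end

p result
```

Without catch and throw, breaking out of both loops would need a flag variable or an early return from a wrapper method.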

 

The Ruby Debugger


 

Debugging is the process of fixing the bugs in a piece of code. This process can be as simple as changing a small section of your program, running it, monitoring the output, and then looping through this process again and again until the output is correct and the program behaves as expected.

 

However, constantly editing and re-running your program gives you no insight into what’s actually happening deep within your code.

 

Sometimes you want to know what each variable contains at a certain point within your program’s execution, or you might want to force a variable to contain a certain value.

 

You can use puts to show what variables contain at certain points in your program, but you can soon make your code messy by interspersing it with debugging tricks.

 

Ruby provides a debugging tool you can use to step through your code line by line (if you wish), set breakpoints (places where execution will stop for you to check things out), and debug your code. It’s a little like IRB, except you don’t need to type out a whole program. You can specify your program’s filename, and you’ll be acting as if you are within that program.

 

For example, create a basic Ruby script called debugtest.rb:

i = 1
j = 0
until i > 1000000
  i *= 2
  j += 1
end
puts "i = #{i}, j = #{j}"

If you run this code with ruby debugtest.rb, you’ll get the following result:

i = 1048576, j = 20

But say you run it with the Ruby debugger like this:

ruby -r debug debugtest.rb

You’ll see something like this appear:

Debug.rb
Emacs support available.
debugtest.rb:1:i = 1
(rdb:1)

This means the debugger has loaded. The third line shows you the current line of code ready to be executed (the first line, in this case), and the fourth line is a prompt that you can type on.

 

The function of the debugger is similar to IRB, and you can type expressions and statements directly onto the prompt here. However, its main strength is that you can use special commands to run debugtest.rb line by line, or set breakpoints and “watches” (breakpoints that rely on a certain condition becoming true—for example, to stop execution when x is larger than 10).

 

Here are the most useful commands to use at the debugger prompt:

list: Lists the lines of the program currently being worked upon. You can follow list with a range of line numbers to show. For example, list 2-4 shows code lines 2 through 4. Without any arguments, list shows the portion of the program around the current execution point.

 

step: Runs the next line of the program. step literally steps through the program line by line, executing a single line at a time. After each step, you can check variables, change values, and so on. This allows you to trace the exact point at which bugs occur. Follow step with the number of lines you wish to execute if it’s higher than one, such as step 2 to execute two lines.

 

cont: Runs the program without stepping. Execution will continue until the program ends, reaches a breakpoint, or a watch condition becomes true.

 

break: Sets a breakpoint at a particular line number, such as with break 3 to set a breakpoint at line 3. This means that if you continue execution with cont, execution will run until line 3 and then stop again. This is useful for stopping execution at a place where you want to see what’s going on.

 

watch: Sets a condition breakpoint. Rather than choosing a certain line upon which to stop, you specify a condition that causes execution to stop. For example, if you want the program to stop when x is larger than 10, use watch x > 10. This is perfect for discovering the exact point where a bug occurs if it results in a certain condition becoming true.

quit: Exits the debugger.

 

A simple debugging session with your debugtest.rb code might look like this:

ruby -r debug debugtest.rb
Debug.rb
Emacs support available.
debugtest.rb:1:i = 1
(rdb:1) list
[-4, 5] in debugtest.rb
=> 1 i = 1
2 j = 0
3 until i > 1000000
4 i *= 2
5 j += 1
(rdb:1) step
debugtest.rb:2:j = 0
(rdb:1) i
1
(rdb:1) i = 100
100
(rdb:1) step
debugtest.rb:3:until i > 1000000
(rdb:1) step
debugtest.rb:4: i *= 2
(rdb:1) step
debugtest.rb:5: j += 1
(rdb:1) i = 200
(rdb:1) watch i > 10000
Set watchpoint 1:i > 10000
(rdb:1) cont
Watchpoint 1, toplevel at debugtest.rb:5
debugtest.rb:5: j += 1
(rdb:1) i
12800
(rdb:1) j
6
(rdb:1) quit
Really quit? (y/n) y

This debugging session demonstrates stepping through the code, inspecting variables, changing variables in situ, and setting watchpoints. These are the tools you’ll use 99 percent of the time while debugging, and with practice, the debugging environment can become a powerful tool, much like IRB.

 

However, many Ruby developers don’t use the debugger particularly often, as its style of debugging and its workflow can seem a little out of date compared to modern techniques such as test-driven development and unit testing, which we’ll look at next. If the debugger seems like it could be useful, testing will make you drool.

 

Testing


Testing is a powerful part of modern software development and can help you resolve many development snafus. Without a proper testing system in place, you can never be confident that your app is bug-free. With a good testing system in place, you might only be 99 percent bug-free, but it’s a significant improvement.

 

Previously, we’ve looked at how to handle explicit errors, but sometimes your programs might perform oddly in certain situations. For example, certain data might cause an algorithm to return an incorrect result, or invalid data might be produced that, although invalid, does not result in an explicit error.

 

One way to resolve these problems is to debug your code, as you’ve seen, but debugging solves only one problem at a time. It’s possible to debug your code to solve one problem, but create many others! Therefore, debugging alone has become viewed as a poor method of resolving bugs, and testing the overall functionality of code has become important.

 

In the past, users and developers might have performed testing manually by performing certain actions and seeing what happens. If an error occurred, the bug in question was fixed and testing continued. Indeed, there was a time when it was commonplace to rely solely on user feedback as a testing mechanism!

 

However, things have changed quickly with the rapidly growing popularity of test-driven development (also often known as test-first development), a new philosophy that turns software-development practices on their head. Ruby developers have been at the forefront of promoting and encouraging this technique.

 

The Philosophy of Test-Driven Development


Test-driven development is a technique where developers create a set of tests for a system to pass before coding the system itself, and then rigidly use these tests to maintain the integrity of the code.

 

In a lighter form, however, it can also refer to the technique of implementing tests for any code, even if you don’t necessarily create the tests before the code you’re testing.

 

There is a variant called behavior-driven development that is essentially the same but that has different semantics, different buzzwords, and slightly different expectations. For this introduction, however, we’ll focus on test-driven development.

 

Note This section provides only a basic overview of test-driven development. The topic is vast, and many blogs and resources are available if you wish to learn more. Wikipedia’s entry on test-driven development is a great place to start.

 

For example, you might add a simple method to String that’s designed to capitalize text into titles:

class String
  def titleize
    self.capitalize
  end
end

 

Your intention is to create a method that can turn “this is a test” into “This Is A Test”; that is, a method that makes strings look as if they’re titles. titleize, therefore, capitalizes the current string with the capitalize method.

 

If you’re in a rush or not bothering to test your code, disaster will soon strike when the code is released into the wild. capitalize capitalizes only the first letter of a string, not the whole string!

puts "this is a test".titleize

This is a test

 

That’s not the intended behavior! However, with test-driven development, you could have avoided the pain of releasing broken code by first writing some tests to demonstrate the outcome you expect:

raise "Fail 1" unless "this is a test".titleize == "This Is A Test"
raise "Fail 2" unless "another test 1234".titleize == "Another Test 1234"
raise "Fail 3" unless "We're testing titleize".titleize == "We're Testing Titleize"

 

These three lines of code raise exceptions unless the output of titleize is what you expect it to be.

 

Note These tests are also known as assertions, as they’re asserting that a certain condition is true. If titleize passes these three tests, you can expect the functionality to be okay for other examples.

 

Note A set of tests or assertions that test a single component or a certain set of functionality is known as a test case.

 

Your current code fails on the first test of this test case, so let’s write the code to make it work:

class String
  def titleize
    self.gsub(/\b\w/) { |letter| letter.upcase }
  end
end

 

This code takes the current string, finds all word boundaries (with \b), passes in the first letter of each word (as obtained with \w), and converts it to upper case. Is the job done? Run the three tests again.

RuntimeError: Fail 3

 

Why does test 3 fail?

puts "We're testing titleize".titleize

 

We'Re Testing Titleize

\b isn’t smart enough to detect true word boundaries. It merely uses whitespace, or “non-word” characters to discriminate words from non-words. Therefore, in “We’re,” both the W and the R get capitalized. You need to tweak your code:

class String
  def titleize
    self.gsub(/\s\w/) { |letter| letter.upcase }
  end
end

If you make sure the character before the letter to capitalize is whitespace, you’re guaranteed to now be dealing with a true, new word.

Re-run the tests:

RuntimeError: Fail 1

 

You’re back to square one.

One thing you failed to take into account is that looking for whitespace before a word doesn’t allow the first word of each string to be capitalized, because those strings start with a letter and not whitespace.

 

It sounds trivial, but it’s a great demonstration of how complex simple functions can become, and why testing is so vital to eradicate bugs. However, the ultimate solution is simple:

class String
  def titleize
    self.gsub(/(\A|\s)\w/) { |letter| letter.upcase }
  end
end

If you run the tests again, you’ll notice they pass straight through. Success!

This basic example provides a sharp demonstration of why testing is important. Small changes can lead to significant changes in functionality, but with a set of trusted tests in place, you can focus on solving problems rather than worrying if your existing code has bugs.

 

Rather than writing code and waiting for bugs to appear, you can proactively determine what your code should do and then act as soon as the results don’t match up with the expectations.

 

Unit Testing


In the previous section you created some basic tests using raise, unless, and ==, and compared the results of a method call with the expected results.

 

It’s possible to test a lot in this way, but with more than a few tests it soon becomes messy, as there’s no logical place for the tests to go (and you certainly don’t want to include tests with your actual, functional code).

 

Luckily, Ruby comes with a library, Minitest, that makes testing easy and organizes test cases into a clean structure. Unit testing is the primary component of test-driven development and means that you’re testing each individual unit of functionality within a program or system. Minitest is Ruby’s official library for performing unit tests.

 

Note Formerly, Ruby’s standard testing library was called Test::Unit. Minitest was a reimplementation that brought new features to the table, including better performance.

 

Minitest is now considered the de facto official testing library in Ruby. If you see code elsewhere written for Test::Unit, however, never fear, because it will probably work fine as is with Minitest.

 

One of the benefits of Minitest is that it gives you a standardized framework for writing and performing tests. Rather than writing assertions in an inconsistent number of ways, Minitest gives you a core set of assertions to use.

 

Let’s take the titleize method from before to use as a demonstration of Minitest’s features and create a new file called test_titleize.rb:

class String
  def titleize
    self.gsub(/(\A|\s)\w/) { |letter| letter.upcase }
  end
end

require 'minitest/autorun'

class TestTitleize < Minitest::Test
  def test_basic
    assert_equal("This Is A Test", "this is a test".titleize)
    assert_equal("Another Test 1234", "another test 1234".titleize)
    assert_equal("We're Testing", "We're testing".titleize)
  end
end

First, you include the titleize extension to String (typically this would be in its own file that you’d then require in, but for this simple example we’ll keep it associated with the test code). Next, you load the Minitest class using require. Finally, you create a test case by inheriting from Minitest::Test.

 

Within this class, you have a single method (though you can have as many as you like to separate your tests logically) that contains three assertions, similar to the assertions made in the previous section.

 

If you run this script, you’ll see the tests in action:

Run options: --seed 45484

Running:

Finished in 0.002585s, 386.8906 runs/s, 1160.6718 assertions/s.
1 runs, 3 assertions, 0 failures, 0 errors, 0 skips

 

This output shows that the tests are started, a single test method is run (test_basic, in this case), and that a single test method with three assertions passed successfully.

Say you add an assertion to test_basic that’s certainly going to fail, like so:

assert_equal("Let's make a test fail!", "foo".titleize)
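and re-run the tests. (The report below is in the format printed by the older Test::Unit runner, for a copy of the script saved as t2.rb. Minitest lays out its failure report differently, but it carries the same information: the failing test, the line number, and the expected and actual values.)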
Loaded suite t2
Started
F
Failure:
test_basic(TestTitleize)
t2.rb:14:in `test_basic'
assert_equal("This Is A Test", "this is a test".titleize)
assert_equal("Another Test 1234", "another test 1234".titleize)
assert_equal("We're Testing", "We're testing".titleize)
=> 14: assert_equal("Let's make a test fail!", "foo".titleize)
end
end
<"Let's make a test fail!"> expected but was
<"Foo">
diff:
Let's make a test fail!
Foo
Finished in 0.015194 seconds.
1 tests, 4 assertions, 1 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
0% passed

 

You’ve added an assertion that was bound to fail, and it has. However, Minitest has given you a full explanation of what happened. Using this information, you can go back and either fix the assertion or fix the code that caused the test to fail.

 

In this case, you forced it to fail, but if your assertions are created normally, a failure such as this would demonstrate a bug in your code.

 

More Minitest Assertions

In the previous section, you used a single type of assertion, assert_equal. assert_equal asserts that the first and second arguments are equal (whether they’re numbers, strings, arrays, or objects of any other kind).

 

The first argument is assumed to be the expected outcome and the second argument is assumed to be the generated output, as with your prior assertion:

 

assert_equal("This Is A Test", "this is a test".titleize)

Note assert_equal can also accept an optional third argument as a message to be displayed if the assertion fails. A message might, in some cases, prove more useful than the default assertion failure message.

 

You’re likely to find several other types of assertions useful as follows:

assert(<boolean expression>): Passes only if the Boolean expression isn’t false or nil (for example, assert 2 == 1 will always fail). refute is its direct opposite.

 

assert_equal(expected, actual): Passes only if the expected and actual values are equal (as compared with the == operator). assert_equal('A', 'a'.upcase) will pass.

 

refute_equal(expected, actual): The opposite of assert_equal. This test will fail if the expected and actual values are equal. Any negative/“not” assertions can be prefixed with refute_, but it’s a personal preference as to which you use.

 

assert_raises(exception_type, ..) { <code block> }: Passes only if the code block following the assertion raises an exception of the type(s) passed as arguments. (Test::Unit spelled this assert_raise; Minitest uses assert_raises.)

assert_raises(ZeroDivisionError) { 2 / 0 } will pass.

 

assert_instance_of(class_expected, object): Passes only if object is of class class_expected.

flunk: flunk is a special type of assertion in that it will always fail. It’s useful if you haven’t quite finished writing your tests and you want to add a strong reminder that your test case isn’t complete!
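
To see several of these assertions side by side, here is a minimal sketch of a test case. The class name MoreAssertionsTest and the method names are invented for illustration, and the third argument to assert_equal is the optional failure message mentioned earlier.

```ruby
require 'minitest/autorun'

# Hypothetical test case exercising the assertions listed above.
class MoreAssertionsTest < Minitest::Test
  def test_truthiness
    assert 2 == 2, "2 should equal 2"   # passes: the expression is true
    refute 2 == 1                       # passes: the expression is false
  end

  def test_equality
    assert_equal 'A', 'a'.upcase, "upcase should capitalize a single letter"
    refute_equal 'A', 'a'
  end

  def test_exceptions_and_classes
    assert_raises(ZeroDivisionError) { 2 / 0 }
    assert_instance_of String, 'hello'
  end
end
```

Splitting the assertions across three small test methods means that a failure in one group does not hide the results of the others.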

 

Advanced Ruby Features


Before the coverage of useful libraries, frameworks, and Ruby-related technologies in Part 3, this blog rounds off the mandatory knowledge that any proficient Ruby programmer should have. This means that although it jumps between several different topics, each is essential to becoming a professional Ruby developer.

 

Dynamic Code Execution

As a dynamic, interpreted language, Ruby is able to execute code created dynamically. The way to do this is with the eval method. For example:

eval "puts 2 + 2"

4

 

Note that while 4 is displayed, 4 is not returned as the result of the whole eval expression. puts always returns nil. To return 4 from eval, you can do this:

puts eval("2 + 2")

4

Here’s a more complex example that uses strings and interpolation:

my_number = 15

my_code = %{#{my_number} * 2}

puts eval(my_code)

30

 

The eval method simply executes (or evaluates) the code passed to it and returns the result. The first example made eval execute puts 2 + 2, whereas the last one used string interpolation to build the expression 15 * 2, which was then evaluated and printed to the screen using puts.
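
The difference between what eval prints and what it returns is easy to check for yourself. The following is a minimal sketch; the helper name double_via_eval is invented for illustration and is not part of Ruby.

```ruby
# Hypothetical helper: builds the expression "<number> * 2" as a string
# and lets eval compute it.
def double_via_eval(number)
  code = %{#{Integer(number)} * 2}  # Integer() raises unless given a whole number
  eval(code)
end

printed  = eval("puts 2 + 2")  # prints 4, but puts itself returns nil
returned = eval("2 + 2")       # prints nothing; the value of the expression is returned

p printed              # nil
p returned             # 4
p double_via_eval(15)  # 30
```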

 

Bindings


In Ruby, a binding is a reference to a context, scope, or state of execution. A binding includes things such as the current value of variables and other details of the execution environment.

 

It’s possible to pass a binding to eval and to have eval execute the supplied code under that binding rather than the current one. In this way, you can keep things that happen with eval separate from the main execution context of your code.

Here’s an example:

def binding_elsewhere
  x = 20
  return binding
end

remote_binding = binding_elsewhere
x = 10
eval("puts x")
eval("puts x", remote_binding)

10
20

This code demonstrates that eval accepts an optional second parameter, a binding, which in this case is returned from the binding_elsewhere method.

 

The variable remote_binding contains a reference to the execution context within the binding_elsewhere method rather than in the main code. Therefore, when you print x, 20 is shown, as x is defined as equal to 20 in binding_elsewhere!

 

Note   You can obtain the binding of the current scope at any point with the Kernel module’s binding method.

 

Let’s build on the previous example:

eval("x = 10")
eval("x = 50", remote_binding)
eval("puts x")
eval("puts x", remote_binding)

10
50

 

In this example, two bindings are in play: the default binding, and the remote_binding (from the binding_elsewhere method).

 

Therefore, even though you set x first to 10, and then to 50, you’re not dealing with the same x in each case. One x is a local variable in the current context, and the other x is a variable in the context of binding_elsewhere.
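
You can also inspect a binding without building code strings. The sketch below reuses binding_elsewhere from the example above and adds a hypothetical wrapper method, binding_demo, to show that the two x variables stay separate. It relies on Binding#local_variable_get, which is available in Ruby 2.1 and later.

```ruby
def binding_elsewhere
  x = 20
  binding
end

# Hypothetical wrapper so the demonstration has its own local x.
def binding_demo
  remote = binding_elsewhere
  x = 10
  before = remote.local_variable_get(:x)  # the x inside binding_elsewhere
  remote.eval("x = 50")                   # changes only the remote x
  after = remote.local_variable_get(:x)
  [x, before, after]
end

p binding_demo  # [10, 20, 50]
```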

 

Other Forms of eval

Although eval executes code within the current context (or the context supplied with a binding), class_eval, module_eval, and instance_eval can evaluate code within the context of classes, modules, and object instances, respectively.

 

class_eval is ideal for adding methods to a class dynamically:

class Person
end

def add_accessor_to_person(accessor_name)
  Person.class_eval %{
    attr_accessor :#{accessor_name}
  }
end

person = Person.new
add_accessor_to_person :name
add_accessor_to_person :gender
person.name = "Peter Cooper"
person.gender = "male"
puts "#{person.name} is #{person.gender}"

Peter Cooper is male

In this example, you use the add_accessor_to_person method to add accessors dynamically to the Person class. Prior to using the add_accessor_to_person method, neither the name nor gender accessors exist within Person.

 

Note that the key part of the code, the class_eval method, operates by using string interpolation to create the desired code for Person:

Person.class_eval %{

attr_accessor :#{accessor_name}

}

String interpolation makes the eval methods powerful tools for generating different features on the fly. This ability is a power unseen in the majority of programming languages and is one that’s used to great effect in systems such as Ruby on Rails.

 

It’s possible to take the previous example a lot further and add an add_accessor method to every class by putting your class_eval cleverness in a new method, defined within the Class class (from which all other classes descend):

class Class
  def add_accessor(accessor_name)
    self.class_eval %{
      attr_accessor :#{accessor_name}
    }
  end
end

class Person
end

person = Person.new
Person.add_accessor :name
Person.add_accessor :gender
person.name = "Peter Cooper"
person.gender = "male"
puts "#{person.name} is #{person.gender}"

 

In this example, you add the add_accessor method to the Class class, thereby adding it to every other class defined within your program. This makes it possible to add accessors to any class dynamically, by calling add_accessor.

 

(If the logic of this approach isn’t clear, make sure to try this code yourself, step through each process, and establish what is occurring at each step of execution.)

 

The technique used in the previous example also lets you define classes like this:

class SomethingElse

add_accessor :whatever

end

Because add_accessor is being used within a class, the method call will work its way up to the add_accessor method defined in class Class.

 

Moving back to simpler techniques, using instance_eval is somewhat like using regular eval, but within the context of an object (rather than a method). In this example, you use instance_eval to execute code within the scope of an object:

class MyClass
  def initialize
    @my_variable = 'Hello, world!'
  end
end

obj = MyClass.new
obj.instance_eval { puts @my_variable }

Hello, world!
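
Because instance_eval changes what self refers to inside the block, it is often used to build small configuration-style interfaces. The following is a sketch of that idea; the Settings class and the configure method are invented for illustration.

```ruby
# Hypothetical settings holder with a single set method.
class Settings
  attr_reader :values

  def initialize
    @values = {}
  end

  def set(key, value)
    @values[key] = value
  end
end

# Runs the given block with self set to a fresh Settings object,
# so bare calls to set inside the block reach that object.
def configure(&block)
  settings = Settings.new
  settings.instance_eval(&block)
  settings
end

config = configure do
  set :host, "localhost"
  set :port, 8080
end

p config.values
```

Inside the block, set is resolved against the Settings instance rather than the surrounding code, which is the same mechanism that lets the earlier example read @my_variable.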

 

Creating Your Own Version of attr_accessor

So far, you’ve used the attr_accessor method within your classes to generate accessor functions for instance variables quickly. For example, in longhand you might have this code:

class Person
  def name
    @name
  end

  def name=(name)
    @name = name
  end
end

This allows you to do things such as puts person.name and person.name = 'Fred'. Alternatively, however, you can use attr_accessor:

class Person
  attr_accessor :name
end

This version of the class is more concise and has exactly the same functionality as the longhand version. Now it’s time to ask the question, how does attr_accessor work?

 

It turns out that attr_accessor isn’t as magical as it looks, and it’s extremely easy to implement your own version using eval. Consider this code:

class Class
  def add_accessor(accessor_name)
    self.class_eval %{
      def #{accessor_name}
        @#{accessor_name}
      end

      def #{accessor_name}=(value)
        @#{accessor_name} = value
      end
    }
  end
end

At first, this code looks complex, but it’s very similar to the add_accessor code you created in the previous section. You use class_eval to define getter and setter methods dynamically for the attribute within the current class.

 

If accessor_name is equal to name, then the code that class_eval is executing is equivalent to this code:

def name
@name
end
def name=(value)
@name = value
end

 

Thus, you have duplicated the functionality of attr_accessor.

You can use this technique to create a multitude of different “code generators” and methods that can act as a “macro” language to perform things in Ruby that are otherwise lengthy to type out.
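
The same kind of code generator can be written without building code strings, by using define_method together with instance_variable_get and instance_variable_set. This is a different technique from the string-based class_eval shown above, offered as a sketch; the module name SimpleAccessors, the macro simple_accessor, and the Pet class are all invented for illustration.

```ruby
# Hypothetical module providing an attr_accessor-like macro.
module SimpleAccessors
  def simple_accessor(*names)
    names.each do |name|
      # Getter: reads the instance variable with the same name.
      define_method(name) { instance_variable_get("@#{name}") }
      # Setter: writes the instance variable with the same name.
      define_method("#{name}=") { |value| instance_variable_set("@#{name}", value) }
    end
  end
end

class Pet
  extend SimpleAccessors       # makes simple_accessor available inside the class body
  simple_accessor :name, :species
end

pet = Pet.new
pet.name = "Rex"
puts pet.name
```

Extending individual classes with a module, rather than reopening Class, keeps the macro out of classes that do not ask for it.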

 

Running Other Programs from Ruby


Often it’s useful to be able to run other programs on the system from your own programs. In this way, you can reduce the number of features your program needs to implement, as you can pass off work to other programs that are already written.

 

It can also be useful to hook up several of your own programs so that functionality is spread among them. Rather than using the RPC systems covered in the previous blog, you can simply run other programs from your own with one of a few different methods made available by Ruby.

 

Getting Results from Other Programs

There are three simple ways to run another program from within Ruby: the system method (defined in the Kernel module), backtick syntax (``), and delimited input literals (%x{}).

 

Using system is ideal when you want to run another program and aren’t concerned with its output, whereas you should use backticks when you want the output of the remote program returned.

 

These lines demonstrate two ways of running the system’s date program:

x = system("date")

x = `date`

Warning On Windows, you won’t want to call date, as this command attempts to set a new date on the system. Instead try dir, which will list the contents of the current directory.

 

For the first line, x equals true, whereas, on the second line, x contains the output of the date command. Which method you use depends on what you’re trying to achieve. If you don’t want the output of the other program to show on the same screen as that of your Ruby script, then use backticks (or a literal, %x{}).

 

Note %x{} is functionally equivalent to using backticks; for example, %x{date}.
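
When the result of the other program matters, it helps to look at its exit status as well as its output. The sketch below assumes an echo command is available, and it uses the Ruby interpreter itself (located through RbConfig.ruby) as a convenient program that exits with a known status. The method name run_and_report is invented for illustration.

```ruby
require 'rbconfig'

# Hypothetical helper showing what backticks and system each give back.
def run_and_report
  output = `echo hello`                            # captures the program's standard output
  ok = system(RbConfig.ruby, "-e", "exit 3")       # false, because the exit status is non-zero
  { output: output.strip, ok: ok, status: $?.exitstatus }
end

p run_and_report
```

system returns true for a zero exit status, false for a non-zero one, and nil if the command could not be run at all. After either call, $? holds a Process::Status object describing the most recently finished program.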

 

Transferring Execution to Another Program


 

Sometimes it’s desirable to jump immediately to another program and cease execution of the current program. This is useful if you have a multistep process and have written an application for each step. To end the current program and invoke another, simply use the exec method in place of system. For example:

exec "ruby another_script.rb"

puts "This will never be displayed"

In this example, execution is transferred to a different program, and the current program ceases immediately—the second line is never executed.

 

Running Two Programs at the Same Time


Forking is where an instance of a program (a process) duplicates itself, resulting in two processes of that program running concurrently. You can run other programs from this second process by using exec, and the first (parent) process will continue running the original program.

 

fork is a method provided by the Kernel module that creates a fork of the current process. It returns the child process’s process ID in the parent, but nil in the child process; you can use this to determine which process a script is in.

 

The following example forks the current process into two processes, and only executes the exec command within the child process (the process generated by the fork):

if fork.nil?
exec "ruby some_other_file.rb"
end
puts "This Ruby script now runs alongside some_other_file.rb"

 

Caution Don’t run the preceding code from IRB. If IRB forks, you’ll end up with two copies of IRB running simultaneously, and the result will be unpredictable.

 

If the other program (being run by exec) is expected to finish at some point, and you want to wait for it to finish executing before doing something in the parent program, you can pass the child’s process ID to Process.wait, which waits for that child process to finish before continuing. Here’s an example:

child = fork do
  sleep 3
  puts "Child says 'hi'!"
end

puts "Waiting for the child process..."
Process.wait child
puts "All done!"

Waiting for the child process...
<3 second delay>
Child says 'hi'!
All done!

Note Forking is not possible with the Windows version of Ruby, as POSIX-style forking is not natively supported on that platform. However, threads, which are covered later in this blog, provide a reasonable alternative.
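
If the parent needs to know how the child finished, Process.wait2 returns the child’s exit status along with its process ID. The following sketch assumes a platform where fork is available and simply returns nil otherwise; the method name run_child_and_collect_status is invented for illustration.

```ruby
# Hypothetical helper: forks a child that exits with the given code,
# then reports the exit status seen by the parent.
def run_child_and_collect_status(exit_code)
  return nil unless Process.respond_to?(:fork)  # false on platforms without fork

  pid = fork { exit!(exit_code) }   # exit! ends the child without running at_exit handlers
  _, status = Process.wait2(pid)    # blocks until that child has finished
  status.exitstatus
end

p run_child_and_collect_status(7)
```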

 

Interacting with Another Program

The previous methods are fine for simple situations where you just want to get basic results from a remote program and don’t need to interact directly with it in any way while it’s running. However, sometimes you might want to pass data back and forth between two separate programs.

 

Ruby’s IO class has a popen method that allows you to run another program and have an I/O stream between it and the current program. The I/O stream between programs works like the other types of I/O streams, but instead of reading and writing to a file, you’re reading and writing to another program.

 

Obviously, this technique only works successfully with programs that accept direct input and produce direct output at a command prompt level (so not GUI applications).

 

Here’s a simple read-only example:

ls = IO.popen("ls", "r")
while line = ls.gets
puts line
end
ls.close

In this example, you open an I/O stream with ls (the UNIX command to list the contents of the current directory—try it with dir if you’re using Microsoft Windows). You read the lines one by one, as with other forms of I/O streams, and close the stream when you’re done.

 

Similarly, you can also open a program with a read/write I/O stream and handle data in both directions:

handle = IO.popen("other_program", "r+")
handle.puts "send input to other program"
handle.close_write
while line = handle.gets
puts line
end
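
Since other_program above is only a placeholder, here is a sketch of the same read/write pattern that you can actually run. It uses the Ruby interpreter itself (located through RbConfig.ruby) as the other program, with a one-line script that upcases whatever it reads. The method name upcase_via_child is invented for illustration.

```ruby
require 'rbconfig'

# Hypothetical helper: sends text to a child Ruby process and
# returns whatever the child writes back.
def upcase_via_child(text)
  command = [RbConfig.ruby, "-e", "print STDIN.read.upcase"]
  IO.popen(command, "r+") do |child|
    child.write(text)
    child.close_write   # signals end of input so the child can finish reading
    child.read          # the block's value becomes the return value of popen
  end
end

puts upcase_via_child("hello from the parent")
```

The block form of popen closes the stream for you when the block ends, and passing the command as an array avoids going through a shell.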

 

Safely Handling Data and Dangerous Methods

It’s common for Ruby applications to be used in situations where the operation of a program relies on data from an outside source. This data cannot always be trusted, and it can be useful to protect your machines and environments from unfortunate situations caused by bad data or code.

 

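Ruby can be made safer both by considering external data to be tainted and by setting a safe level under which the Ruby interpreter restricts what features are made available to the code it executes. Note that these mechanisms belong to older versions of Ruby: taint tracking was deprecated in Ruby 2.7 and the taint-related methods were removed in Ruby 3.2, so the examples in this section only behave as described on interpreters older than 2.7.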

 

Tainted Data and Objects


In Ruby, data is generally considered to be tainted if it comes from an external source, or if Ruby otherwise has no way of establishing whether it is safe. For example, data collected from the command line could be unsafe, so it’s considered tainted.

 

Data read from external files or over a network connection is also tainted. However, data that is hard-coded into the program, such as string literals, is considered to be untainted.

 

Consider a simple program that illustrates why checking for tainted data can be crucial:

while x = gets
  puts "=> #{eval(x)}"
end

This code acts like a miniature version of irb. It accepts line after line of input from the user and immediately executes it:

10+2
=> 12
"hello".length
=> 5

 

However, what would happen if someone wanted to cause trouble and typed in `rm -rf /*` (including the backticks, which tell Ruby to run the text as a shell command)? It would run!

Caution Do not type the preceding code into the program! On a UNIX-related operating system under the right circumstances, running rm -rf /* is an effective way to wipe clean much of your hard drive!

 

Clearly, there are situations where you need to check whether data has potentially been tainted by the outside world.

 

You can check if an object is considered tainted by using the tainted? method:

x = "Hello, world!"
puts x.tainted?
y = [x, x, x]
puts y.tainted?
z = 20 + 50
puts z.tainted?
a = File.open("somefile").readlines.first
puts a.tainted?
b = [a]
puts b.tainted?

false
false
false
true
false

 

Note One of the preceding examples depends on somefile being a file that actually exists in the local directory.

The first three examples are all operating on data that is already defined within the program (literal data), so are not considered tainted. The last two examples involve data from external sources (a contains the first line of a file). So, why is the last example considered untainted?

b is considered untainted because b is merely an array containing a reference to a. Although a is tainted, an array containing a is not. Therefore, it’s necessary to check whether each piece of data you use is tainted, rather than checking an overall data structure.

 

Note An alternative to having to do any checks is to set the “safe level” of the Ruby interpreter, and any potentially dangerous operations will be disabled for you. This is covered in the following section.

 

It’s possible to force an object to be seen as untainted by calling the untaint method on the object.

For example, here’s an extremely safe version of your Ruby interpreter:

while x = gets
  next if x.tainted?
  puts "=> #{eval(x)}"
end

 

However, it’s incredibly useless, because all data accepted from the user is considered tainted, so nothing is ever run. Safety by inactivity! Let’s assume, however, that you’ve come up with a method that can tell if a certain operation is safe:

def code_is_safe?(code)
code =~ /[`;*-]/ ? false : true
end
while x = gets
x.untaint if code_is_safe?(x)
next if x.tainted?
puts "=> #{eval(x)}"
end

Caution code_is_safe? merely checks if the line of code contains a backtick, semicolon, asterisk, or hyphen, and deems the code unsafe if it does. This is not a valid way to check for safe code and is solely provided as an illustration.
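
A somewhat sturdier idea than rejecting a few known-bad characters is to accept only input that matches a narrow pattern of known-good characters. The sketch below is an alternative to code_is_safe?, not the approach used above: it assumes you only want to allow simple arithmetic, and the names ARITHMETIC_ONLY and safe_calculate are invented for illustration. It does not depend on taint tracking.

```ruby
# Hypothetical allowlist: digits, whitespace, arithmetic operators,
# parentheses, and the decimal point. Nothing else is accepted.
ARITHMETIC_ONLY = /\A[\d\s+\-*\/().]+\z/

# Evaluates the input only if every character is on the allowlist.
# Returns nil for rejected input or for expressions that fail to evaluate.
def safe_calculate(input)
  return nil unless input.match?(ARITHMETIC_ONLY)
  eval(input)
rescue StandardError, SyntaxError
  nil
end

p safe_calculate("10+2")         # 12
p safe_calculate("2 * (3 + 4)")  # 14
p safe_calculate("`ls`")         # nil, because backticks are not on the allowlist
```

Even this is not a security guarantee. An expression with a very large exponent, for instance, could still consume a great deal of time or memory, so treat it as an illustration of the allowlist idea only.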

 

In this example, you explicitly untaint the data if you deem it to be safe, so eval will execute any “safe” code.

Note Similarly, you can explicitly taint an object by calling its taint method.

 

Safe Levels

Although it’s possible to check whether data is tainted and perform preventative actions to clean it up, a stronger form of protection comes with Ruby’s safe levels. Safe levels allow you to specify what features Ruby makes available and how it should deal with tainted data.

 

It’s worth noting before we get too far that Ruby’s safe level functionality is rarely used, so your use of it will probably be limited to very particular situations.

 

Manipulating the safe level is unpopular, primarily because it can tread on the toes of what libraries can do on your system, and secondly because developers usually strive to develop code that’s secure without needing to go to such lengths. An analogy would be that it’s better to drive a car safely than to just have excellent airbags.

 

If you do choose to use safe levels, however, the current safe level is represented by the variable $SAFE. By default, $SAFE is set to 0, providing the lowest level of safety and the highest level of freedom, but three other safe levels are available, as shown in Table 11-1.

 

Table 11-1. Values of $SAFE

Value of $SAFE    Description

0    No restrictions. This is the default safe level.

1    Potentially unsafe methods can’t use tainted data. Also, the current directory is not added to Ruby’s search path for loading libraries.

2    The restrictions of safe level 1, plus Ruby won’t load any external program files from globally writable locations in the filesystem. This is to prevent attacks where hackers upload malicious code and manipulate existing programs to load them. Some potentially dangerous methods are also deactivated, such as File#chmod, Kernel#fork, and Process::setpriority.

3    The restrictions of level 2, plus newly created objects within the program are considered tainted automatically. You also cannot untaint objects.

 

To change the safe level, simply set $SAFE for whichever safe level you want to use. Do note, however, that once you set the safe level, you can only increase the safety level, not decrease it.

 

The reason for not being able to decrease the safe level is that this could allow nefarious code that is executed later (such as an errant library, or through the use of eval) to reduce the safe level by itself and cause havoc!

 

Working with Microsoft Windows


So far in this blog, the examples have been reasonably generic, with a little bias toward UNIX-based operating systems. Ruby is a relative latecomer to the world of Microsoft Windows, but it now includes some libraries that make working directly with Windows’ APIs easy.

 

This section looks at the basics of using the Windows API and Windows’ OLE capabilities from Ruby, although you’ll need in-depth knowledge of these topics if you wish to put together more advanced code.

 

Using the Windows API

Microsoft Windows provides an Application Programming Interface (API) that acts as a library of core Windows-related functions for access to the Windows kernel, graphics interface, control library, networking services, and user interface. Ruby’s Win32API library (included in the standard library) gives developers raw access to the Windows API’s features.

Note No code in this section will work under any operating system other than Microsoft Windows.

 

It’s reasonably trivial to open a dialog box:

require 'Win32API'

title = "My Application"
text = "Hello, world!"

Win32API.new('user32', 'MessageBox', %w{L P P L}, 'I').call(0, text, title, 0)

 

First, you load the Win32API library into the program, and then you set up some variables with the desired title and contents of the dialog box. Next, you create a reference to the MessageBox function provided by the Windows API, before calling it with your text and title.

The parameters to Win32API.new represent the following:

  • The name of the system DLL containing the function you want to access
  • The name of the function you wish to use
  • An array describing the format of each parameter to be passed to the function
  • A character representing the type of data to be returned by the function

 

In this case, you specify that you want to call the MessageBox function provided by user32.dll, that you’ll be supplying four parameters (a number, two strings, and another number—L represents numbers, P represents strings), and that you expect an integer to be returned (I representing integer).

 

Once you have the reference to the function, you use the call method to invoke it with the four parameters. In MessageBox’s case, the four parameters represent the following:

  • The reference to a parent window (none in this case)
  • The text to display within the message box
  • The title to use on the message box
  • The type of message box to show (0 being a basic OK button dialog box)

 

The call method returns an integer that you don’t use in this example, but that will be set to a number representing which button on the dialog box was pressed.

You can, of course, create something more elaborate:

require 'Win32API'
title = "My Application"
text = "Hello, world!"
dialog = Win32API.new('user32', 'MessageBox', 'LPPL', 'I')
result = dialog.call(0, text, title, 1)
case result
when 1
puts "Clicked OK"
when 2
puts "Clicked Cancel"
else
puts "Clicked something else!"
end

 

This example keeps the result from the MessageBox function and uses it to work out which button was pressed. In this case, you call the MessageBox function with the fourth parameter of 1, representing a dialog box containing both an OK and a Cancel button.

 

If the OK button is clicked, dialog.call returns 1, whereas if Cancel is clicked, 2 is returned.

The Windows API provides many hundreds of functions that can do everything from printing to changing the desktop wallpaper, to creating elaborate windows. In theory, you could even put together an entire Windows program using the raw Windows API functions, although this would be a major undertaking.

 

For more information about the Windows API, a good place to start is the Wikipedia entry for it (Windows_API).

 

Controlling Windows Programs

Although the Windows API allows you to access low-level functions of the Microsoft Windows operating system, it can also be useful to access functions made available by programs available on the system.

 

The technology that makes this possible is called Windows Automation. Windows Automation provides a way for programs to trigger one another’s features and to automate certain functions among themselves.

 

Access to Windows Automation is provided by Ruby’s WIN32OLE (also included in the standard library). If you’re already familiar with Windows Automation, COM, or OLE technologies, Ruby’s interface will feel instantly familiar. Even if you’re not, this code should be immediately understood:

require 'win32ole'
web_browser = WIN32OLE.new('InternetExplorer.Application')
web_browser.visible = true
web_browser.navigate('http://www.rubyinside.com/')

 

This code loads the WIN32OLE library and creates a variable, web_browser, that references an OLE automation server called 'InternetExplorer.Application'.

 

This server is provided by the Internet Explorer web browser that comes with Windows, and the OLE automation server allows you to control the browser’s functions remotely. In this example, you make the web browser visible before instructing it to load a certain web page.

 

WIN32OLE does not implement the visible and navigate methods itself. These dynamic methods are handled on the fly by method_missing (a special method that is run within a class whenever no predefined method is found) and passed to the OLE Automation server. Therefore, you can use any methods made available by any OLE Automation server directly from Ruby!
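
The method_missing mechanism is ordinary Ruby, so you can try it without Windows. The following sketch is not how WIN32OLE is implemented; it is a hypothetical CallRecorder class that simply records each unknown method call instead of forwarding it to an OLE Automation server.

```ruby
# Hypothetical stand-in for an object that forwards unknown calls elsewhere.
class CallRecorder
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Invoked by Ruby whenever no predefined method matches the call.
  def method_missing(name, *args)
    return super if name.to_s.start_with?("to_")  # leave conversion methods alone
    @calls << [name, args]
    self
  end

  # Keeps respond_to? consistent with what method_missing accepts.
  def respond_to_missing?(name, include_private = false)
    !name.to_s.start_with?("to_") || super
  end
end

browser = CallRecorder.new
browser.visible = true                         # handled as a call to :visible=
browser.navigate('http://www.rubyinside.com/')
p browser.calls
```

A real forwarding class would pass name and args on to the remote object at the point where this sketch appends them to @calls.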

 

You can extend this example to take advantage of further methods made available by Internet Explorer:

require 'win32ole'
web_browser = WIN32OLE.new('InternetExplorer.Application')
web_browser.visible = true
web_browser.navigate('http://www.rubyinside.com/')
while web_browser.ReadyState != 4
sleep 1
end
puts "Page is loaded"

This example uses the ReadyState property to determine when Internet Explorer has successfully finished loading the page. If the page is not yet loaded, Ruby sleeps for a second and checks again. This allows you to wait until a remote operation is complete before continuing.

 

Once the page loading is complete, Internet Explorer makes available the document property that allows you to get full access to the Document Object Model (DOM) of the web page that it has loaded, much in the same fashion as from JavaScript. For example:

puts web_browser.document.getElementById('header').innerHtml.length

 

This section was designed to demonstrate that although Ruby’s origins are in UNIX-related operating systems, Ruby’s Windows support is significant. You can access Windows’ APIs, use OLE and OLE Automation, and access DLL files.

 

Many Windows-related features are advanced and beyond the scope of this blog, but I hope this section whetted your appetite to research further if this area of development interests you.

 

Threads


Thread is short for thread of execution. You use threads to split the execution of a program into multiple parts that can be run concurrently. For example, a program designed to e-mail thousands of people at once might split the task between 20 different threads that all send an e-mail at once.

 

Such parallelism is faster than processing one item after another, especially on systems with more than one CPU, because different threads of execution can be run on different processors. It can also be faster because rather than wasting time waiting for a response from a remote machine, you can continue with other operations.

 

Ruby 1.8 didn’t support threads in the traditional sense. Typically, threading capabilities are provided by the operating system and vary from one system to another.

 

Ruby 1.8, however, implemented its threading capabilities itself, inside the interpreter, which meant they lacked some of the power of traditional system-level threads. In Ruby 1.9, Ruby began to use system-based threads, and this is now the default expectation among Rubyists.

 

While threads in Ruby 1.9 and 2.x are system (native) threads, a global interpreter lock (GIL) has been left in place in order to remain compatible with 1.8 code, so threads do not truly run Ruby code simultaneously.

 

This means that all of what is covered in this section is relevant to all of 1.8, 1.9, 2.0, and beyond. A Ruby 1.9-and-beyond–only alternative, fibers, is covered in the next primary section of this blog.

 

Basic Ruby Threads in Action

Here’s a basic demonstration of Ruby threading in action:

threads = []

10.times do
  thread = Thread.new do
    10.times { |i| print i; $stdout.flush; sleep rand(2) }
  end
  threads << thread
end

threads.each { |thread| thread.join }

You create an array to hold your Thread objects so that you can easily keep track of them. Then you create ten threads, sending the block of code to be executed in each thread to Thread.new, and add each generated thread to the array.

 

Note When you create a thread, it can access any variables that are within scope at that point. However, any local variables that are then created within the thread are entirely local to that thread. This is similar to the behavior of other types of code blocks.

 

Once you’ve created the threads, you wait for all of them to complete before the program finishes. You wait by looping through all the thread objects in threads and calling each thread’s join method.

 

The join method makes the main program wait until a thread’s execution is complete before continuing. In this way, you make sure all the threads are complete before exiting.
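If you also want the result of the code each thread ran, Thread#value is a close relative of join: it waits in the same way and then returns the value of the thread's block. A minimal sketch, with made-up variable names:

```ruby
# Each thread computes a square; the block's last expression is its result.
workers = [1, 2, 3].map { |n| Thread.new { n * n } }

# value waits for the thread to finish and then returns that result.
results = workers.map(&:value)
p results   # [1, 4, 9]
```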

 

Advanced Thread Operations


As you’ve seen, creating and running basic threads is fairly simple, but threads also offer a number of advanced features. These are discussed in the following subsections.

 

Waiting for Threads to Finish Redux

When you waited for your threads to finish by using the join method, you could have specified a timeout value (in seconds) for which to wait. If the thread doesn’t finish within that time, join returns nil. Here’s an example where each thread is given only one second to execute:

threads.each do |thread|
  puts "Thread #{thread.object_id} didn't finish in 1s" unless thread.join(1)
end
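The timeout behavior is easier to see in isolation. In this sketch the quick and slow threads are hypothetical stand-ins, and the timings are chosen only to make the difference obvious:

```ruby
quick = Thread.new { :done }       # finishes almost immediately
slow  = Thread.new { sleep 5 }     # hypothetical long-running task

quick_result = quick.join(1)       # returns the thread itself
slow_result  = slow.join(0.1)      # returns nil: still running after 0.1s

puts 'quick finished in time' if quick_result
puts 'slow is still running'  if slow_result.nil?

slow.kill                          # tidy up rather than wait out the sleep
```

Because join returns the thread on success and nil on timeout, the return value works directly in a condition, as in the loop above.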

 

Getting a List of All Threads

It’s possible to get a global list of all threads running within your program using Thread.list. In fact, if you didn’t want to keep your own store of threads, you could rewrite the earlier example from the section “Basic Ruby Threads in Action” down to these two lines:

 

10.times { Thread.new { 10.times { |i| print i; $stdout.flush; sleep rand(2) } } }
Thread.list.each { |thread| thread.join unless thread == Thread.main }

 

However, keeping your own list of threads is essential if you’re likely to have more than one group of threads working within an application and you want to keep them separate from one another when it comes to using a join or other features.

 

The list of threads also includes the main thread representing the main program’s thread of execution, which is why we explicitly do not join it in the prior code.
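Here is a rough sketch of keeping two groups apart. The group names downloaders and parsers are invented, and the sleeps stand in for real work:

```ruby
# Two hypothetical groups of workers, each tracked in its own array.
downloaders = 3.times.map { Thread.new { sleep 0.1 } }
parsers     = 2.times.map { Thread.new { sleep 0.3 } }

# Thread.list sees everything, including the main thread.
main_listed = Thread.list.include?(Thread.main)
puts "Main thread listed: #{main_listed}"

# Waiting on one group leaves the other alone.
downloaders.each(&:join)
puts "Downloaders done; parsers still alive: #{parsers.count(&:alive?)}"

parsers.each(&:join)
```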

 

Thread Operations from Within Threads Themselves

Threads aren’t just tiny, dumb fragments of code. They have the ability to talk with the Ruby thread scheduler and provide updates on their status. For example, a thread can stop itself:

Thread.new do
  10.times do |i|
    print i
    $stdout.flush
    Thread.stop
  end
end


Every time the thread created in this example prints a number to the screen, it stops itself. It can then only be restarted or resumed by the parent program calling the run method on the thread, like so:

 

Thread.list.each { |thread| thread.run }
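Putting the two halves together, a self-contained sketch might look like the following. It is only one way to drive a stopped thread: the seen array and the polling loop are assumptions made for the example, and the main thread checks the worker's status so that it only calls run once the worker has actually stopped:

```ruby
seen = []

worker = Thread.new do
  3.times do |i|
    seen << i
    Thread.stop          # sleep until another thread calls run or wakeup
  end
end

# Keep nudging the worker until it has finished all three steps.
while worker.alive?
  worker.run if worker.status == 'sleep'   # only wake it once it has stopped
  Thread.pass                              # give the worker a chance to run
end

p seen   # [0, 1, 2]
```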

A thread can also tell the Ruby thread scheduler that it wants to pass execution over to another thread. The technique of voluntarily ceding control to another thread is often known as cooperative multitasking, because the thread or process itself is saying that it’s okay to pass execution on to another thread or process.

 

Used properly, cooperative multitasking can make threading even more efficient, as you can code in pass requests at ideal locations. Here’s an example showing how to cede control from a thread:

 

2.times { Thread.new { 10.times { |i| print i; $stdout.flush; Thread.pass } } }
Thread.list.each { |thread| thread.join unless thread == Thread.main }

In this example, execution flip-flops between the two threads, typically producing interleaved output such as 00112233445566778899 (the exact order depends on the thread scheduler).

 

Fibers

Fibers offer an alternative to threads in Ruby 1.9 and beyond. Fibers are lightweight units of execution that control their own scheduling (often referred to as cooperative scheduling).

 

Whereas threads will typically run continually, fibers hand over control once they have performed certain tasks. Unlike regular methods, however, once a fiber hands over control, it continues to exist and can be resumed at will.

 

In short, fibers are pragmatically similar to threads, but fibers aren’t scheduled to all run together. You have to manually control the scheduling.

 

A Fiber in Action

 


Nothing will demonstrate fibers as succinctly as a demonstration, so let’s look at a very simple implementation to generate a sequence of square numbers:

sg = Fiber.new do
  s = 0
  loop do
    square = s * s
    Fiber.yield square
    s += 1
  end
end
10.times { puts sg.resume }

In this example, we create a fiber using a block, much in the same style as we created threads earlier. The difference, however, is that the fiber will run solely on its own until the Fiber.yield method is used to yield control back to whatever last told the fiber to run (which, in this case, is the sg.resume method call).

 

Alternatively, if the fiber “ends,” the value of the last executed expression is returned.
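A tiny sketch of both cases, using a made-up fiber called greeter:

```ruby
greeter = Fiber.new do
  Fiber.yield 'first'    # handed back by the first resume
  'last'                 # the block's final expression
end

first_value  = greeter.resume   # "first"
second_value = greeter.resume   # "last": the fiber ran to completion

puts first_value
puts second_value
```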

 

In this example, it’s worth noting that you don’t have to use the fiber forever, although since the fiber contains an infinite loop, it would certainly be possible to do so. Even though the fiber contains an infinite loop, however, the fiber is not continually running, so it results in no performance issues.

 

If you do develop a fiber that has a natural ending point, calling its resume method once it has concluded will result in an exception (which, of course, you can catch—refer to blog 8’s “Handling Exceptions” section) that states you are trying to resume a dead fiber.
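As a rough illustration, here is a fiber with a natural ending point (the countdown fiber is invented for the example). The exception class raised is FiberError; the exact wording of its message varies between Ruby versions, so the sketch simply prints whatever it receives:

```ruby
countdown = Fiber.new do
  Fiber.yield 2
  Fiber.yield 1
  :liftoff
end

values = 3.times.map { countdown.resume }
p values   # [2, 1, :liftoff]

begin
  countdown.resume          # the fiber has already finished
rescue FiberError => e
  message = "Could not resume: #{e.message}"
  puts message
end
```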

 

Passing Data to a Fiber

It is possible to pass data back into a fiber when you resume its execution as well as receive data from it. For example, let’s tweak the square number generator fiber to support receiving back an optional new base from which to provide square numbers:

sg = Fiber.new do
  s = 0
  loop do
    square = s * s
    s += 1
    s = Fiber.yield(square) || s
  end
end
puts sg.resume
puts sg.resume
puts sg.resume
puts sg.resume
puts sg.resume 40
puts sg.resume
puts sg.resume
puts sg.resume 0
puts sg.resume
puts sg.resume

In this case, we start out by getting back square numbers one at a time as before. On the fifth attempt, however, we pass back the number 40, which is then assigned to the fiber’s s variable and used to generate square numbers.

 

After a couple of iterations, we then reset the counter to 0. The number is received by the fiber as the result of calling Fiber.yield.

 

It is not possible to send data into the fiber in this way with the first resume, however, since the first resume call does not follow on from the fiber yielding or concluding in any way. In that case, any data you passed is passed into the fiber block, much as if it were a method.
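The difference between the first resume and later ones can be sketched like this. The counter fiber and its start and step names are hypothetical:

```ruby
counter = Fiber.new do |start|        # the first resume's argument arrives here
  n = start
  loop do
    step = Fiber.yield(n) || 1        # later arguments arrive as yield's result
    n += step
  end
end

a = counter.resume(10)   # 10: passed to the block as start
b = counter.resume       # 11: no argument, so the step defaults to 1
c = counter.resume(5)    # 16: 5 is returned by Fiber.yield inside the fiber

p [a, b, c]
```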

 

Why Fibers?

A motivation to use fibers over threads in some situations is efficiency. Creating hundreds of fibers is a lot faster than creating the equivalent threads, especially in Ruby 1.9 and beyond where threads are created at the operating system level. There are significant memory efficiency benefits, too.

 

The disadvantage of fibers compared to threads, however, is that fibers are not preemptive at all—you can only run one fiber at a time (within a single thread), and you have to do the scheduling. In some situations, of course, this might be a plus!

 

You may decide that fibers have no place in your own code, which is fine. One of the greatest benefits of fibers is in implementing lightweight I/O management routines within other libraries, so even if you don’t use fibers directly, you might still end up benefiting from their use elsewhere.
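One place you may already be benefiting is Ruby's own Enumerator class: in the standard interpreter, stepping through an enumerator one item at a time with next is implemented using a fiber. The sketch below rebuilds the square number generator that way; the squares name is just for illustration:

```ruby
squares = Enumerator.new do |yielder|
  s = 0
  loop do
    yielder << s * s     # hand one value to the caller, then pause
    s += 1
  end
end

first_three = 3.times.map { squares.next }
p first_three            # [0, 1, 4]

# Ordinary Enumerable methods also work, and start again from the beginning.
first_five = squares.take(5)
p first_five             # [0, 1, 4, 9, 16]
```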

 

Ruby Inline


As a dynamic, object-oriented programming language, Ruby wasn’t designed to be a high-performance language in the traditional sense. This is not of particular concern nowadays, as most tasks are not computationally intensive, but there are still situations where raw performance is required for a subset of functionality.

 

In situations where extremely high performance is desirable, it can be a good idea to write the computationally intensive code in a more powerful but less expressive language, and then call that code from Ruby.

 

Luckily there’s a library for Ruby called RubyInline, created by Ryan Davis and Eric Hodel, that makes it possible to write code in other more powerful languages within your Ruby code. It’s most often used to write high-performance code in the C or C++ languages, and we’ll focus on this in this section.

 

Installing RubyInline on UNIX-related platforms (such as Linux and OS X) is easy with RubyGems:

 

gem install RubyInline

If you don’t have gcc—a C compiler—installed, RubyInline’s C support will not work, and RubyInline itself might not install. Refer to your operating system’s documentation on how to install gcc.

 

Why Use C as an Inline Language?


C is a general-purpose, procedural, compiled programming language developed in the 1970s by Dennis Ritchie.

 

It’s one of the most widely used programming languages in the world and is the foundation of nearly every major operating system currently available. C (and its object-oriented sister language, C++) is still a popular language due to its raw speed and flexibility.

 

Although languages such as Ruby have been designed to be easy to develop with, C offers a lot of low-level access to developers, along with blazing speed. This makes C perfect for writing performance-intensive libraries and functions that can be called from other programming languages, such as Ruby.

 

Note This section is not a primer on the C language, as that would be an entire blog in its own right. To learn more about the C programming language itself, visit http://en.wikipedia.org/wiki/C_programming_language.

 

Creating a Basic Method or Function


An ideal demonstration of RubyInline and C’s power is to create a basic method (a function in C) to compute factorials. The factorial of a number is the product of all integers from itself down to 1. So, for example, the factorial of 8 is 8 * 7 * 6 * 5 * 4 * 3 * 2 * 1, or 40,320.

 

Calculating a factorial in Ruby is easy:

# Integer is used rather than Fixnum, which modern Ruby versions no longer define
class Integer
  def factorial
    (1..self).inject { |a, b| a * b }
  end
end
puts 8.factorial
40320

You can use your knowledge of benchmarking to test how fast this method is:

require 'benchmark'
Benchmark.bm do |bm|
  bm.report('ruby:') do
    100000.times do
      8.factorial
    end
  end
end
       user     system      total        real
ruby:  0.930000   0.010000   0.940000 (  1.537101)

The results show that it takes about 1.5 seconds to run 100,000 iterations of your routine to compute the factorial of 8, which is approximately 65,000 iterations per second.
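If you would rather have Ruby do that division for you, a small sketch using Benchmark.realtime (which returns the elapsed time as a number of seconds) could look like this. The timings on your machine will differ from the ones quoted here:

```ruby
require 'benchmark'

iterations = 100_000
elapsed = Benchmark.realtime do
  iterations.times { (1..8).inject { |a, b| a * b } }
end
rate = iterations / elapsed
puts format('%d iterations took %.3f seconds (about %d per second)',
            iterations, elapsed, rate)

# The same arithmetic applied to the "real" figure in the results above:
quoted_rate = 100_000 / 1.537101
puts quoted_rate.round   # roughly 65,000 iterations per second
```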

 

Let’s write a factorial method in C using RubyInline:

require 'inline'
class CFactorial
  inline do |builder|
    builder.c "
      long factorial(int max) {
        int i=max, result=1;
        while (i >= 2) { result *= i--; }
        return result;
      }"
  end
end
c = CFactorial.new()
puts c.factorial(8)

First, you create a CFactorial class to house your new method. Then inline do |builder| starts the RubyInline environment, and builder.c is used to process the C code within the multiline string that follows it (a plain quoted string here, although a %q{ } literal works equally well).

 

The reason for this level of depth is that RubyInline can work with multiple languages at the same time, so you need to enter the RubyInline environment first and then explicitly specify code to be associated with a particular language.

 

The actual C code in the preceding example begins following the builder.c line. Let’s focus on it for a moment:

long factorial(int max) {
  int i=max, result=1;
  while (i >= 2) { result *= i--; }
  return result;
}

This code defines a C function called factorial that accepts a single integer parameter and returns a long integer value. The internal logic counts down from the supplied value to 2, multiplying the running result by each number along the way to obtain the factorial.
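To follow the loop without a C compiler to hand, here is a line-for-line Ruby rendering of the same logic (factorial_countdown is a made-up name). It also hints at a limitation of the C version: result is declared as an int, and assuming the usual 32-bit int, factorials above 12 no longer fit, whereas Ruby's integers simply grow as needed:

```ruby
# A Ruby rendering of the C loop, for comparison only.
def factorial_countdown(max)
  i = max
  result = 1
  while i >= 2
    result *= i   # multiply by the current value...
    i -= 1        # ...then step down, as i-- does in C
  end
  result
end

INT_MAX = 2**31 - 1   # largest 32-bit signed int (an assumption about the platform)

puts factorial_countdown(8)               # 40320
puts factorial_countdown(12) <= INT_MAX   # true:  479001600 still fits
puts factorial_countdown(13) <= INT_MAX   # false: 6227020800 does not
```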

 

Benchmarking C versus Ruby


Now that you have your C-based factorial routine written, let’s benchmark it and compare it to the Ruby-based solution. Here’s a complete program to benchmark the two different routines (C and Ruby):

require 'rubygems'
require 'inline'
require 'benchmark'
class CFactorial
  inline do |builder|
    builder.c "
      long factorial(int max) {
        int i=max, result=1;
        while (i >= 2) { result *= i--; }
        return result;
      }"
  end
end
# Integer is used rather than Fixnum, which modern Ruby versions no longer define
class Integer
  def factorial
    (1..self).inject { |a, b| a * b }
  end
end
Benchmark.bm do |bm|
  bm.report('ruby:') do
    100000.times { 8.factorial }
  end
  bm.report('c:') do
    c = CFactorial.new
    100000.times { c.factorial(8) }
  end
end
       user     system      total        real
ruby:  0.930000   0.010000   3.110000 (  1.571207)
c:     0.020000   0.000000   0.120000 (  0.044347)

The C factorial function is so much faster as to barely leave a whisper on the benchmarking times! It’s at least 30 times faster.

 

There are certainly ways both implementations could be improved, but this benchmark demonstrates the radical difference between the performance of compiled and interpreted code, as well as the effect of Ruby’s object-oriented overhead on performance.

 

Environment Variables

Whenever a program is run on a computer, it’s contained within a certain environment, whether that’s the command line or a GUI.

 

The operating system sets a number of special variables called environment variables that contain information about the environment. They vary by operating system but can be a good way of detecting things that could be useful in your programs.

 

You can quickly and easily inspect the environment variables (as supplied by your operating system) on your current machine with IRB by using the special ENV hash:

irb(main):001:0> ENV.each {|e| puts e.join(': ') }
TERM: vt100
SHELL: /bin/bash
USER: peter
PATH: /bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/opt/local/bin:/usr/local/sbin
PWD: /Users/peter
SHLVL: 1
HOME: /Users/peter
LOGNAME: peter
SECURITYSESSIONID: 51bbd0
_: /usr/bin/irb
LINES: 32
COLUMNS: 120

Specifically, these are the results from my machine, and yours will probably be quite different. For example, when I try the same code on a Windows machine, I get results such as these:

ALLUSERSPROFILE: F:\Documents and Settings\All Users
APPDATA: F:\Documents and Settings\Peter\Application Data
CLIENTNAME: Console
HOMEDRIVE: F:
HOMEPATH: \Documents and Settings\Peter
LOGONSERVER: \\PSHUTTLE
NUMBER_OF_PROCESSORS: 2
OS: Windows_NT
Path: F:\ruby\bin;F:\WINDOWS\system32;F:\WINDOWS
PATHEXT: .COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.RB;.RBW
ProgramFiles: F:\Program Files
SystemDrive: F:
SystemRoot: F:\WINDOWS
TEMP: F:\DOCUME~1\Peter\LOCALS~1\Temp
TMP: F:\DOCUME~1\Peter\LOCALS~1\Temp
USERDOMAIN: PSHUTTLE
USERNAME: Peter
USERPROFILE: F:\Documents and Settings\Peter
windir: F:\WINDOWS

You can use these environment variables to decide where to store temporary files, or to find out what sort of features your operating system offers, in real time, much as you did with RUBY_PLATFORM:

tmp_dir = '/tmp'
if ENV['OS'] =~ /Windows_NT/
  puts "This program is running under Windows NT/2000/XP!"
  tmp_dir = ENV['TMP']
elsif ENV['PATH'] =~ /\/usr/
  puts "This program has access to a UNIX-style file system!"
else
  puts "I cannot figure out what environment I'm running in!"
  exit
end
# [.. do something here ..]
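A slightly more compact variation on the same idea is sketched below. It uses ENV.fetch, which accepts a fallback for variables that are not set, and Dir.tmpdir from the standard library's tmpdir file. The variable names and the made-up environment variable at the end are assumptions for the example:

```ruby
require 'tmpdir'

# ENV.fetch returns the fallback when the variable is not set at all.
tmp_from_env = ENV.fetch('TMP', '/tmp')

# Dir.tmpdir makes a similar decision for you.
tmp_from_ruby = Dir.tmpdir

puts "From ENV:    #{tmp_from_env}"
puts "From tmpdir: #{tmp_from_ruby}"

# A made-up variable name, assumed not to exist on your machine:
puts ENV.fetch('NO_SUCH_VARIABLE_FOR_THIS_DEMO', 'fallback')   # fallback
```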


Note You can also set environment variables with ENV['variable_name'] = value, but only do this if you have a valid reason to. Environment variables set from within a program apply only to the current process and any child processes it starts, so their reach is extremely limited.
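To see a child process inheriting a variable, here is a minimal sketch. DEMO_GREETING is a hypothetical variable name, and the child is simply a second Ruby interpreter located via RbConfig.ruby:

```ruby
require 'rbconfig'

ENV['DEMO_GREETING'] = 'hello'     # values must be strings

# Start a second Ruby interpreter as a child process and ask it to print the variable.
child_output = IO.popen([RbConfig.ruby, '-e', 'print ENV["DEMO_GREETING"]'], &:read)
puts "Child process saw: #{child_output}"

ENV.delete('DEMO_GREETING')        # remove it again from this process
```

The shell that launched your program never sees DEMO_GREETING; only this process and the child it started do.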

 

Although ENV acts like a hash, it’s technically a special object, but you can convert it to a true hash using its .to_hash method, as in ENV.to_hash.
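A quick sketch of the difference (DEMO_ONLY_IN_COPY is an invented key, assumed not to be set on your machine):

```ruby
puts ENV.is_a?(Hash)           # false
env_copy = ENV.to_hash
puts env_copy.is_a?(Hash)      # true

# Changing the copy does not touch the real environment.
env_copy['DEMO_ONLY_IN_COPY'] = 'yes'
puts ENV.key?('DEMO_ONLY_IN_COPY')       # false
```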

 

Accessing Command-Line Arguments


You used a special array called ARGV. ARGV is an array automatically created by the Ruby interpreter that contains the parameters passed to the Ruby program (whether on the command line or by other means).

For example, say you created a script called argvtest.rb:

p ARGV

You could run it like so:

ruby argvtest.rb these are command line parameters
["these", "are", "command", "line", "parameters"]

The parameters are passed into the program and become present in the ARGV array, where they can be processed as you wish. Use of ARGV is ideal for command-line tools where filenames and options are passed in this way.

 

Using ARGV also works if you call a script directly.

On UNIX operating systems, you could adjust argvtest.rb to be like this:
#!/usr/bin/env ruby
p ARGV
And, after making the file executable (for example, with chmod +x argvtest.rb), you could call it in this way:
./argvtest.rb these are command line parameters
["these", "are", "command", "line", "parameters"]

You generally use command-line arguments to pass options, settings, and data fragments that might change between executions of a program. For example, a common utility found on most operating systems is copy or cp, which is used to copy files. It’s used like so:

 

cp /directory1/from_filename /directory2/destination_filename

This would copy a file from one place to another (and rename it along the way) within the filesystem. The two filenames are both command-line arguments, and a Ruby script could receive data in the same way, like so:

#!/usr/bin/env ruby

from_filename = ARGV[0]

destination_filename = ARGV[1]
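Taking that a step further, here is a rough sketch of a complete copy script. The copy_file method name and the copy.rb filename in the usage message are made up for the example; the copying itself is done with FileUtils.cp from the standard library:

```ruby
#!/usr/bin/env ruby
require 'fileutils'

# Copies args[0] to args[1]; returns false (after printing a usage
# message) if the wrong number of arguments was supplied.
def copy_file(args)
  unless args.length == 2
    warn 'Usage: copy.rb from_filename destination_filename'
    return false
  end

  from_filename, destination_filename = args
  FileUtils.cp(from_filename, destination_filename)
  true
end

# Only act on the real command line when run directly as a script.
copy_file(ARGV) if __FILE__ == $PROGRAM_NAME
```

Keeping the logic in a method that takes an array, rather than reading ARGV throughout, makes it easy to try out from IRB with any arguments you like.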

 

Distributing Ruby Libraries as Gems

Over time, it’s likely you’ll develop your own libraries to solve various problems with Ruby so that you don’t need to write the same code over and over in different programs, but can call on the library for support.

 

Usually, you’ll want to make these libraries available to use on other machines, on servers upon which you deploy applications, or to other developers. You might even open source your libraries to get community input and a larger developer base.

 

Luckily, deploying libraries is generally less problematic than deploying entire applications, as the target audience is made up of other developers who are usually familiar with installing libraries.

 

Creating a Gem

There are easy ways to create gems and slightly less easy ways. I’m going to take a “raw” approach by showing how to create a gem from the ground up. Later, we’ll look at a library that will do most of the grunt work for you.

 

Let’s first create a simple library that extends the String class and puts it in a file called string_extend.rb:

class String
  def vowels
    scan(/[aeiou]/i)
  end
end

This code adds a vowels method to the String class, which returns an array of all the vowels in a string:

"This is a test".vowels

["i", "i", "a", "e"]

As a local library within the scope of a larger application, it could be loaded with require or require_relative:

require_relative 'string_extend'

 

However, you want to turn it into a gem that you can use anywhere. Building a gem involves three steps. The first is to organize your code and other files into a structure that can be turned into a gem.

 

The second is to create a specification file that lists information about the gem. The third is to use the gem program to build the gem from the source files and the specification.
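As a rough sketch of the second step, a specification for this library might look something like the following. Every value here is a placeholder (the version, summary, and author are invented), and the lib/string_extend.rb path assumes the common convention of keeping library code in a lib directory:

```ruby
# string_extend.gemspec (a sketch; every value below is a placeholder)
spec = Gem::Specification.new do |s|
  s.name          = 'string_extend'
  s.version       = '0.0.1'
  s.summary       = 'Adds a vowels method to String'
  s.authors       = ['Your Name']
  s.files         = ['lib/string_extend.rb']
  s.require_paths = ['lib']
end

puts "#{spec.name} #{spec.version}"   # string_extend 0.0.1
```

In a real .gemspec file you would leave off the final puts line so that the specification is the last value in the file. The third step is then a matter of running gem build string_extend.gemspec from the same directory.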
