Define Closures

This tutorial defines what a closure is, with worked examples. A closure is a function whose body references a variable that is declared in the parent scope. Furthermore, you can transfer these concepts to other languages that support closures, such as JavaScript, Haskell, and Elixir.

 

Once you get comfortable with the basic concepts, you’ll explore two interesting techniques made possible by using closures. First, you’ll learn that you don’t really need classes at all.

 

With some lambdas sprinkled with variables, you can replicate much of the functionality that classes provide. Second, you’ll see how closures let you write decoupled code using callbacks.

 

Finally, you’ll put this newfound knowledge to use by implementing your own version of Ruby’s Enumerable #reduce method using lambdas, a Ruby language feature that lets you write code in a functional-like programming style.

 

The Foundations of a Closure


Let’s begin with a definition of a closure. To understand what a closure is, you need to grasp two other programming language concepts: lexical scoping and free variables. Let’s look at lexical scoping first, and then dive into free variables.

 

Lexical Scoping: Closest Variable Wins

Lexical scoping rules serve to answer one question: what is the value of this variable at this line? In short, lexical scoping says that the declaration of a variable closest to a given line is the one that determines the variable’s value there.

 

The value of a variable x is given by the innermost statement that declares x. Furthermore, the area in a program where a variable maintains a value is called the scope of that variable.

 

Therefore, you can find out the value of a variable by simply eyeballing a line in your program without having to run it. This should feel intuitive to you; every time you try to work out the value of a variable, you do so by following lexical scoping rules.

 

Lexical scoping is sometimes known by another term: static scoping. I guess these terms are more catchy than “eyeball scoping.”

 

Let’s take a look at an example. Open IRB and follow along with the following example:

>> msg = "drive the principal's car"
=> "drive the principal's car"
>> 3.times do
>>   prefix = "I will not"
>>   puts "#{prefix} #{msg}"
>> end
I will not drive the principal's car
I will not drive the principal's car
I will not drive the principal's car
=> 3

do ... end creates a new scope. Within the block, prefix is declared and has the value "I will not". msg is more interesting.

 

It is not declared within the block but in the outermost, or parent, scope. The inner scope has access to the parent scope. Because of this, msg continues to have the value "drive the principal's car".

 

You have just seen that the inner scope has access to the parent scope. Does the opposite hold true? For example, can prefix be accessed outside of the block? You can easily find out using IRB again:

>> puts prefix

NameError: undefined local variable or method `prefix' for main:Object
from (irb):6

 

You have just proven that this is definitely not the case; prefix is only visible from within the block, and nowhere else. You might not usually think about it, but this is lexical scoping in action. You now know what lexical scoping is. The other piece of the puzzle is the free variable, or more precisely, identifying the free variable.

 

Identifying Free Variables


A free variable is a variable that is referenced inside a function (or block) but declared in a parent scope rather than in the function itself. Let’s look at an example that will make this clearer. We’ll modify the program we just wrote to use lambdas instead. Lambdas are Ruby’s version of anonymous functions found in other languages. Type this program into your IRB session:

>> chalkboard_gag = lambda do |msg|
>> lambda do
>> prefix = "I will not"
>> "#{prefix} #{msg}"
>> end
>> end
=> #<Proc:0x007ffa2f901600@(IRB):1 (lambda)>

 

Instead of seeing the output "drive the principal's car", the session shows that the return value is a lambda, represented by a Proc object. Don’t worry about this for now; we’ll come back to it soon. Instead, try to imagine how you would make chalkboard_gag return the full message. It might not be as straightforward as you think!

 

The trick here is to tease apart chalkboard_gag layer by layer. The first layer is the outermost lambda:

» chalkboard_gag = lambda do |msg|
    lambda do
      prefix = "I will not"
      "#{prefix} #{msg}"
    end
» end

The outermost lambda takes a single argument, msg. But what does it return? Another lambda. Let’s turn our attention to the inner lambda:

  chalkboard_gag = lambda do |msg|
»   lambda do
      prefix = "I will not"
      "#{prefix} #{msg}"
»   end
  end

The body of the inner lambda declares the prefix variable. On the other hand, msg is not declared anywhere in the lambda’s body. Where is it declared then? It’s declared in the parent scope as the argument of the outer lambda. This makes msg a free variable.

 

The parent scope is also called the surrounding lexical scope because the outer lambda wraps around the inner one. It is this wrapping around that allows the inner lambda to access variables declared in the outer one.

 

Let’s put chalkboard_gag to the test. Go back to IRB where you last created the chalkboard_gag lambda. Then supply a value to the outermost lambda. Invoke the lambda by using the call method:

>> inner_lambda = chalkboard_gag.call("drive the principal's car")
=> #<Proc:0x007fca608589a0@(irb):2 (lambda)>

 

As expected, the return result is a lambda. To get to the final result, you need to invoke the inner lambda, which you assigned to inner_lambda. To invoke it, use the call method:

>> inner_lambda.call()
=> "drive the principal's car"

 

Whenever an inner lambda refers to a variable that is not declared within it but is declared in the parent scope of that lambda, that variable is a free variable.
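To see that each inner lambda carries its own binding of msg, you can create two of them in the same IRB session. This is a small sketch continuing the example above; the second message is made up for illustration:

gag_one = chalkboard_gag.call("drive the principal's car")
gag_two = chalkboard_gag.call("skateboard in the halls")

gag_one.call #=> "I will not drive the principal's car"
gag_two.call #=> "I will not skateboard in the halls"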

 

At this stage, you almost have all the tools and knowledge needed to point out a closure. You know what lexical scoping is and how it works. You also know how to identify a free variable. Now, you need to know what separates closures and non-closures.

 

Rules of Identifying a Closure


Recall the definition of a closure:

  1. It needs to be a function...
  2. whose body references some variable that...
  3. is declared in a parent scope.

Since Ruby doesn’t have the concept of a traditional function, we’re going to be a bit loose with the definition. In the context of Ruby, “function” means a block, Proc, or lambda.

 

However, being a block, Proc, or lambda is not enough. The body must contain at least one variable that is declared in the parent scope. What kind of variable is that? That’s right, a free variable. At this point, you might be thinking: what interesting things can I do with closures? The answer might surprise you!
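Before we get to that, here’s a quick sanity check: a minimal sketch (with made-up variable names) contrasting a lambda that is a closure with one that is not.

greeting = "Hello"

# A closure: greeting is a free variable, declared in the parent scope.
closure = lambda { |name| "#{greeting}, #{name}" }

# Not a closure: everything it uses is its own parameter.
plain = lambda { |salutation, name| "#{salutation}, #{name}" }

closure.call("world")        #=> "Hello, world"
plain.call("Howdy", "world") #=> "Howdy, world"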

 

Simulating Classes with Closures

If you think about it, classes are a way of packaging up data and behavior. Instances created from a class are distinct from each other. In other words, each instance has its own state. Closures also provide that functionality. Let’s see how with an example.

 

Say you want to build a very simple counter program. The counter program can do the following:

  • Get the current value of the counter.
  • Increment the counter.
  • Decrement the counter.

This is the essence of what most classes do: retrieve and modify data. Here’s one possible implementation of a Counter class:

closures/counter.rb
class Counter
  def initialize
    @x = 0
  end

  def get_x
    @x
  end

  def incr
    @x += 1
  end

  def decr
    @x -= 1
  end
end
Here’s a sample run in IRB:
>> c = Counter.new
=> #<Counter:0x007f9335939840 @x=0>
>> c.incr => 1
>> c.incr => 2
>> c.get_x => 2
>> c.decr => 1
>> c.decr => 0
>> c.decr => -1

There should not be anything surprising with this example. So let’s add some constraints. Imagine if you didn’t have the ability to create classes. Could you still write a counter program? With lambdas, you most definitely can. Create a new file called lambda_counter.rb and fill it with the following code:

closures/lambda_counter.rb
1: Counter = lambda do
2: x = 0
3: get_x = lambda { x }
4: incr = lambda { x += 1 }
5: decr = lambda { x -= 1 }
6:
7: {get_x: get_x, incr: incr, decr: decr}
8: end

Here, Counter is a lambda. Line 2 declares x, the state of the counter, and initializes it to zero. Line 3 creates a lambda that returns the current state of the counter. Lines 4 and 5 both modify the state of the counter by increasing and decreasing the value of x respectively.

 

It should be apparent to you by now that x is the free variable. Finally, on line 7, the return result of the outermost lambda is a hash whose keys are the names of the respective lambdas.

 

By saving the return values, you can get a reference to the respective lambdas and manipulate the counter. And manipulate the counter you will! Load the lambda_counter.rb file in IRB:

% irb -r ./lambda_counter.rb

 

Create a new counter:


 

>> c1 = Counter.call
=> {:get_x=>#<Proc:0x007fa92904ea28@/counter.rb:4 (lambda)>, :incr=>#<Proc:0x007fa92904e910@/counter.rb:6 (lambda)>, :decr=>#<Proc:0x007fa92904e898@/counter.rb:8 (lambda)>}
Counter c1 is a hash where each key points to a Proc. Let’s perform some operations on the counter:
>> c1[:incr].call => 1
>> c1[:incr].call => 2
>> c1[:incr].call => 3
>> c1[:decr].call => 2
>> c1[:get_x].call => 2
Let’s create another counter, c2. Is c2 distinct from c1? In other words, do they behave like distinct objects?
>> c2 = Counter.call
=> {:get_x=>#<Proc:0x007fa92a1fcc98@/counter.rb:4 (lambda)>, :incr=>#<Proc:0x007fa92a1fcc70@/counter.rb:6 (lambda)>, :decr=>#<Proc:0x007fa92a1fcc48@/counter.rb:8 (lambda)>}
>> c2[:get_x].call => 0
>> c1[:get_x].call => 2

Both c1 and c2 get their own x. So there you have it: it is entirely possible to have objects without classes. In fact, this technique is often used in JavaScript to make sure that variables do not leak out and inadvertently become overridden by some other function or operation.
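If you wanted more operations, the same pattern extends naturally. Here’s a hypothetical variant (the names are made up) that adds a reset lambda closing over the same x:

counter_with_reset = lambda do
  x = 0
  {
    get_x: lambda { x },
    incr:  lambda { x += 1 },
    decr:  lambda { x -= 1 },
    reset: lambda { x = 0 }   # assigns to the same captured x
  }
end

c = counter_with_reset.call
c[:incr].call   #=> 1
c[:incr].call   #=> 2
c[:reset].call  #=> 0
c[:get_x].call  #=> 0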

 

While you probably wouldn’t want to use this technique in your day-to-day Ruby programming, there’s an important lesson to be drawn from here. The scoping rules of Ruby are such that when a lambda is defined, that lambda also has access to all the variables that are in scope at that point.

 

As the counter example has shown, closures restrict access to the variables they close over. This technique will come in handy in later blogs.

 

If you have done any amount of JavaScript programming, you would most definitely have encountered the use of callbacks. When used judiciously, callbacks are a very powerful technique. The next section shows how you can achieve the same benefits of callbacks in Ruby.

 

Implementing Callbacks in Ruby with Lambdas

Ruby with Lambdas

At times, closures allow us to write programs that are otherwise very difficult to express or downright nasty to look at. An example of this is callbacks. Imagine that you’re working with a report-generation tool.

 

The programmer before you hacked together something quick and dirty and has since left for greener pastures. Lucky him. Unfortunately for you, the report-generating method has a couple of bugs, causing crashes to occur 5% of the time.

 

The code is a complete mess and you have no appetite to go near that monstrosity. Instead, you want to know if a report has been successfully generated and send it to your boss right away. However, when things go awry, you want to be notified personally.

 

The report generator is straightforward:

require 'ostruct'

class Generator
  attr_reader :report

  def initialize(report)
    @report = report
  end

  def run
    report.to_csv
  end
end

The Generator takes in a report. Generator#run delegates the call to the report’s to_csv method.

For simplicity’s sake, let’s use the OpenStruct class. This is an example of a report without any error:

good_report = OpenStruct.new(to_csv: "59.99,Great Success")

An erroneous report is represented by a nil value in to_csv like so:

bad_report = OpenStruct.new(to_csv: nil)

 

Let’s take a step back now and try to sketch out how to implement this. Recall that we need to handle two cases. The first case is when things are rosy and the report is sent to the boss; the other is when things go horribly wrong and we want to know about it.

Here’s how this might look:

Notifier.new(Generator.new(good_report),
  on_success: lambda { |r| puts "Send #{r} to boss@acme.co" },
  on_failure: lambda { puts "Send email to ben@acme.co" }).tap do |n|
    n.run
end

A Notifier takes a Generator object and a Hash that represents the callbacks for the success and failure cases, respectively. Finally, the run method is called to invoke the notifier.

 

If you’re looking at the previous code listing and thinking to yourself, “Wait a minute, this looks almost like functional programming!,” give yourself a pat on the back. Although Ruby is an object-oriented language, supporting features such as lambdas blur the lines between object-oriented and functional programming.

 

In particular, the feature that you might be thinking about is passing around functions as first-class values. Understanding what this means is useful especially when you go further into functional programming. So let’s take a short detour and investigate.

 

First-Class Values

Class Values

Think about the way you use an integer or a string. You can assign either to a variable. You can also pass them into methods. Finally, integers and strings can be return values of methods. These characteristics make them values.

 

What about lambdas? Are they values? In order to answer that question, they need to fulfill the same three prerequisites as integers and strings.

Can a lambda be assigned to a variable? Fire up IRB and find out:

>> is_even = lambda { |x| x % 2 == 0 }
=> #<Proc:0x007fa8a309c448@(IRB):1 (lambda)>
>> is_even.call(4)
=> true
>> is_even.call(5)
=> false

 

Check. Next, can a lambda be passed into a method? Define complement. This method takes a predicate lambda and a value, and returns the negated result:

>> def complement(predicate, value)
>> not predicate.call(value)
>> end
=> :complement
>> complement(is_even, 4)
=> false
>> complement(is_even, 5)
=> true

 

So yes, a lambda can most definitely be passed into a method. Now, for the final hurdle: can a lambda be a return value? Modify complement so that it only takes in one argument:

>> def complement(predicate)
>> lambda do |value|
>> not predicate.call(value)
>> end
>> end
=> :complement

 

What would you expect now if you invoked complement(is_even)? Let’s find out:

>> complement(is_even)
=> #<Proc:0x007fa8a31ef4f8@(irb):8 (lambda)>

We get back another lambda: that’s three for three! For completeness, go ahead and supply some values:

>> complement(is_even).call(4)
=> false
>> complement(is_even).call(5)
=> true

As you can see, we have been treating lambdas as first-class functions all along. Being able to pass around functions like values means that we can conveniently assign, pass around, and return tiny bits of computation.

 

It should be noted that Ruby’s methods are not first-class functions. Instead, lambdas, Procs, and blocks step in to fill that role. We will circle back now to implementing Notifier and see how first-class functions can lead to decoupled code.

 

Implementing Notifier

Let’s see how Notifier can be implemented:
closures/notifier.rb
class Notifier
  attr_reader :generator, :callbacks

  def initialize(generator, callbacks)
    @generator = generator
    @callbacks = callbacks
  end

  def run
    result = generator.run
    if result
      callbacks.fetch(:on_success).call(result)
    else
      callbacks.fetch(:on_failure).call
    end
  end
end

The meat of the code lies in the run method. The result variable contains the generated report. If result is non-nil, then the on_success callback is invoked. Otherwise, the on_failure one will be called.

 

There’s a tiny subtlety that is easy to miss, and that’s where the beauty of this technique lies.

Take a closer look at how the on_success callback is defined: on_success: lambda { |r| puts "Send #{r} to boss@acme.co" }

 

What’s the value of r? Well, at the point where this lambda was defined, no one knows. Only at the point when we know that the generated report is non-nil do we pass the result into the success callback.

 

Let’s work with some examples. First, create a good report and run the notifier:

good_report = OpenStruct.new(to_csv: "59.99,Great Success")
Notifier.new(Generator.new(good_report),
  on_success: lambda { |r| puts "Send #{r} to boss@acme.co" },
  on_failure: lambda { puts "Send email to ben@acme.co" }).tap do |n|
    n.run #=> Send 59.99,Great Success to boss@acme.co
end

Now, create a bad report and run the notifier again:

bad_report = OpenStruct.new(to_csv: nil)
Notifier.new(Generator.new(bad_report),
  on_success: lambda { |r| puts "Report sent to boss@acme.co: #{r}" },
  on_failure: lambda { puts "Whoops! Send email to ben@acme.co" }).tap do |n|
    n.run #=> Whoops! Send email to ben@acme.co
end

This is a very flexible technique. Notice that Notifier doesn’t dictate how you should handle success and failure cases. All it does is invoke the appropriate callbacks.

 

This means that you are free to log errors to a file or send your boss an SMS when a report has been successfully generated—all without modifying the original Notifier class.
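As a small, hedged sketch of that flexibility (assuming the Notifier, Generator, and bad_report from above are loaded; the log file name is made up), the failure callback could just as easily append to a log file:

Notifier.new(Generator.new(bad_report),
  on_success: lambda { |r| puts "Send #{r} to boss@acme.co" },
  on_failure: lambda { File.open("errors.log", "a") { |f| f.puts "report generation failed" } }
).run

Notifier itself is untouched; only the lambdas you hand it change.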

 

Now you should be ready for something slightly more challenging. You will get to implement one of the most useful operations inspired by functional programming—fold left.

 

Implementing Enumerable#reduce (or Fold Left)


Now that you know what closures and first-class functions are, let’s put your new skills to the test. We will conclude this blog by learning how to implement the reduce method. Ruby already has an implementation in the Enumerable class.

 

However, for our purposes, we are only going to use lambdas. reduce (sometimes known as fold left) is one of the operations that can be found in almost every functional programming language. In fact, it is so useful that many other non-functional languages have adopted it as part of the standard library.

 

Let’s do a quick refresher on how reduce works. It needs an array, a binary operation (an operation that takes two operands), and a starting value, commonly known as an accumulator. reduce then combines all the elements of the array by applying the binary operation and returns the accumulated value. An example is in order.

 

Summing Values Using reduce

One common use for reduce is to sum up the values of an array. Try it out in IRB:

>> [1,2,3,4,5].reduce(10) { |acc, x| p "#{acc}, #{x}"; acc + x }
"10, 1"
"11, 2"
"13, 3"
"16, 4"
"20, 5"
=> 25

reduce is given a starting value of 10, followed by a block that adds two numbers. The first argument of the block is the accumulator, while the second argument represents the current element of the array.

 

How does this work behind the scenes? Instead of adding reduce directly into Array, we will spice things up by using only lambdas. Furthermore, we will pretend that Enumerable#each doesn’t exist. We will take baby steps and figure out how to implement this:

adder.call(10, [1, 2, 3, 4, 5]) #=> 25

 

adder is a lambda that takes two arguments. The first argument is the accumulator, given a starting value of 10. The other argument is the array. We want to add all the elements of the array. Given this information, we can sketch out the adder lambda:

adder = lambda do |acc, arr|
  # To be filled in
end

Let’s think about what we need to accomplish. We need to iterate through the array somehow. Each element of the array is going to be added to the accumulator—that is, if there are still elements left in the array. Once the iteration completes, we can return the accumulated result.

 

Let’s tackle the easy case first. When the array is empty, the accumulated result is returned:

adder = lambda do |acc, arr|
  if arr.empty?
    acc
  else
    # To be filled in
  end
end

Now comes the fun bit. Given that we cannot use Enumerable#each, how else can you iterate through the list?

 

Recursion to the rescue! Here’s the trick: each time we add the first element of the list to the accumulator, we invoke adder with the new accumulated value, along with the remainder of the list.

 

This means that eventually, we will run out of elements of the list, and the final accumulated value will be returned.

 

The final implementation is beautiful in its simplicity and elegance:

closures/adder.rb
1: adder = lambda do |acc, arr|
2:   if arr.empty?
3:     acc
4:   else
5:     adder.call(acc + arr.first, arr.drop(1))
6:   end
7: end

Notice the recursive call to adder on line 5. Each call adds the first element of the array to the accumulator. It also decreases the size of the array by removing the first element of the array.
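Tracing a smaller call by hand makes the recursion concrete:

# adder.call(10, [1, 2, 3])
#   -> adder.call(10 + 1, [2, 3])
#   -> adder.call(11 + 2, [3])
#   -> adder.call(13 + 3, [])
#   -> 16
adder.call(10, [1, 2, 3]) #=> 16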

 

Multiplying Values Using reduce

Let’s implement something similar, but for the multiply operation. Here’s how we’d invoke it:

multiplier.call(2, [2, 4, 6]) #=> 96

Can you figure out what the implementation of multiplier would look like? It turns out that it looks quite similar to adder:

closures/multiplier.rb
multiplier = lambda do |acc, arr|
  if arr.empty?
    acc
  else
    multiplier.call(acc * arr.first, arr.drop(1))
  end
end

What has changed? Other than the variable names, the only thing that changed is the binary operation represented by the * multiplication symbol. Your “Don’t Repeat Yourself” alarm should be going off right about now because there’s a lot of duplication between adder and multiplier. Let’s put our refactoring hat on and get cracking!

 

Abstracting a Reducer


As it stands, each binary operation is hard-coded into the branch when the array is not empty. What we can do here is to pull out the binary operation, make it an argument, and pass it into the recursive call of the lambda.

 

Here’s a possible first attempt. The reducer lambda takes an additional parameter, binary_function, a lambda that represents the binary operation. Try this out and you’ll realize that it works as expected:

reducer = lambda do |acc, arr, binary_function|
  if arr.empty?
    acc
  else
    reducer.call(binary_function.call(acc, arr.first), arr.drop(1), binary_function)
  end
end

reducer.call(1, [1,2,3,4,5], lambda { |x, y| x + y }) #=> 16

What do you think of this piece of code? I don’t like that binary_function is being passed as the third argument in the recursive call. In other words, since we are already passing in binary_function when reducer is invoked, we shouldn’t have to explicitly pass it in at every other invocation.

 

We can achieve this by abstracting a lambda that takes just the accumulator and the array, as seen on line 2. Here’s the result:

closures/reducer.rb
1:  reducer = lambda do |acc, arr, binary_function|
2:    reducer_aux = lambda do |acc, arr|
3:      if arr.empty?
4:        acc
5:      else
6:        reducer_aux.call(binary_function.call(acc, arr.first), arr.drop(1))
7:      end
8:    end
9:
10:   reducer_aux.call(acc, arr)
11: end
12:
13: reducer.call(1, [1,2,3,4,5], lambda { |x, y| x + y }) #=> 16

 

binary_function is a free variable inside of reducer_aux. Its value is supplied by reducer, the outermost lambda.
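Because the binary operation is now just an argument, the same reducer covers both of the earlier lambdas. A quick check, assuming the reducer above is loaded:

reducer.call(0, [1, 2, 3, 4, 5], lambda { |x, y| x + y }) #=> 15
reducer.call(2, [2, 4, 6],       lambda { |x, y| x * y }) #=> 96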

So there you have it. You should now have a better idea of how Enumerable#reduce is implemented behind the scenes. reduce is a very useful tool to have, and is used in most functional languages to great effect. This implementation using lambdas is very similar to how functional languages might implement it.

 

Having lambdas as a language construct enables a very different kind of programming style that you might not otherwise be used to—a testament to the versatility of Ruby.

 

Test Your Understanding!

Now it’s time to work that gray matter! Do the following exercises to be sure you have grasped the concepts that were presented in this blog. Don’t skip this!

1. What is the definition of a closure?

2. Identify the free variable in the following:

def is_larger_than(amount)
  lambda do |a|
    a > amount
  end
end
Here’s an example run:
>> larger_than_5 = is_larger_than(5)
>> larger_than_5.call(4)
=> false
>> larger_than_5.call(5)
=> false
>> larger_than_5.call(6)
=> true

 

3. You work in a music store and you’ve been tasked with writing a miniature database to store artists and album titles. The database should be able to insert, delete, and list entries, but you cannot use objects other than arrays and hashes. Only lambdas are allowed. Here’s the API:

>> db = new_db.call
>> db[:insert].call("Eagles", "Hell Freezes Over")
=> Hell Freezes Over
>> db[:insert].call("Pink Floyd", "The Wall")
=> The Wall
>> db[:dump].call
=> {"Eagles"=>"Hell Freezes Over", "Pink Floyd"=>"The Wall"}
>> db[:delete].call("Pink Floyd")
=> The Wall
>> db[:dump].call
=> {"Eagles"=>"Hell Freezes Over", "Pink Floyd"=>nil}
4. The complement method was previously defined as such:
def complement(predicate)
lambda do |value|
not predicate.call(value)
end
end
Convert complement into a lambda that returns another lambda. You should then be able to invoke complement like so:
>> complement.call(is_even).call(4)
=> false
>> complement.call(is_even).call(5)
=> true

5. Usually, we think of reducing as combining the elements of a list into a single value. However, you might be surprised to realize that it is more general than that. Here’s a challenge. By only using reduce, take [1, 2, 3, 4, 5] and turn it into [2, 4, 6, 8, 10].

 

Beautiful Blocks


Blocks are effectively a type of closure. Blocks capture pieces of code that can be passed into methods to be executed later. In a sense, they act like anonymous functions.

 

Blocks are ubiquitous in Ruby and are one of the defining characteristics of Ruby—you can immediately tell that it’s Ruby code once you see the familiar do ... end or curly braces.

 

It’s virtually impossible to write any meaningful Ruby program without using blocks. In order to understand and appreciate real-world Ruby code, it’s imperative that you understand how blocks work and how to use them.

 

There are two main objectives in this blog. The first is to make sure you understand how blocks are used. In order to do that, you will learn about the yield keyword and the block_given? method by writing your own methods that take blocks as input. You will also learn what block variables are, and their relationship to blocks acting as closures.

 

The second objective is to get you well acquainted with the various ways that blocks are used in Ruby—block patterns if you will. You will write code that enumerates a collection such as an array or a hash.

 

Having the skills to use blocks in conjunction with the classes in the Ruby Standard Library will save you precious time, especially when you start to realize how blocks can make methods extremely versatile.

 

However, blocks have a lot more to offer than going through the elements of a collection. Other block patterns that are pervasive in real-world Ruby code include resource management, object initialization, and the abstraction of pre- and post-processing. You will be writing code that explores each of these patterns in the sections that follow.

 

Along the way, you will get to work with some meta-programming goodness and learn the secret to creating Ruby DSLs.

 

By the end of this blog, you will gain a deeper appreciation of blocks and understand how to use them effectively in your own code. You will be confident in writing your own code that uses blocks.

 

You will also have an understanding of how DSLs are built in Ruby, and you won’t be intimidated when you look at a foreign-looking DSL.

 

Separating the General from the Specific


The ability to encapsulate behavior into blocks and pass it into methods is an extremely useful programming technique. This lets you separate the general and specific pieces of your code. Open IRB and let’s explore what this separation of concerns looks like.

Suppose you have a range of numbers from 1 to 20, and you’re interested in only getting the even numbers. In Ruby, this is how you can do it:

 

>> Array(1..20).select { |x| x.even? }
=> [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

Later, you decide that the list is too big and you want to add another condition: the even numbers must also be greater than 10:

>> Array(1..20).select { |x| x.even? and x > 10 }
=> [12, 14, 16, 18, 20]

 

Notice that the only code that you had to change was contained within the blocks. That is the actual “business logic” piece. You didn’t have to implement your own special version of Array#select in order to cope with a change in requirements. This also comes up pretty often with sorting.

 

Imagine that you’re working on an e-commerce site that sells sports shoes, and you want to display a selection of the products on the main page:

>> require 'ostruct'
>> catalog = []
>> catalog << OpenStruct.new(name: 'Nike', qty: 20, price: 99.00)
>> catalog << OpenStruct.new(name: 'Adidas', qty: 10, price: 109.00)
>> catalog << OpenStruct.new(name: 'New Balance', qty: 2, price: 89.00)

 

It’s plain to see that we have a pretty wide selection of footwear. Now, the boss wants to display the products by the lowest priced first:

>> catalog.sort_by { |x| x.price }
=> [#<OpenStruct name="New Balance", qty=2, price=89.0>,
#<OpenStruct name="Nike", qty=20, price=99.0>,
#<OpenStruct name="Adidas", qty=10, price=109.0>]
What if now she wants the products with the highest quantity to be displayed first?
>> catalog.sort_by { |x| x.qty }.reverse
=> [#<OpenStruct name="Nike", qty=20, price=99.0>, #<OpenStruct name="Adidas", qty=10, price=109.0>, #<OpenStruct name="New Balance", qty=2, price=89.0>]

 

In both instances, all you had to change was the code in the block. In fact, you didn’t have to change the implementation of Enumerable#sort_by. You were able to trust that the method would do its job provided you gave it a reasonable sorting criterion to work with.

So how is this possible? With yield.

Executing Blocks with the yield Keyword

 

When you see yield anywhere in a Ruby method, you should think “execute the block.” Try out the following in IRB:

>> def do_it
>> yield
>> end
=> :do_it
This is a pretty plain-looking piece of code that just executes any block you give it:
>> do_it { puts "I'm doing it" }
I'm doing it
=> nil

 

Printing a string with puts returns nil, so the block has no meaningful return value. In other words, this block is executed merely for its side effects. Now, let’s make a block that returns a value and pass it into the do_it method:

>> do_it { [1,2,3] << 4 }
=> [1, 2, 3, 4]

 

What happens when we don’t pass in a block to do_it?

>> do_it

LocalJumpError: no block given (yield)

from (IRB):28:in `do_it'

 

IRB helpfully informs you that the method was not given a block to execute. You might want to pass arguments into a block. For example, say you want a method that passes in two arguments to a block:

 >> def do_it(x, y)
>> yield(x, y)
>> end
=> :do_it
Now, we can pass in and execute any block that takes two arguments:
>> do_it(2, 3) { |x, y| x + y }
=> 5
>> do_it("Ohai", "Benevolent Dictator") do |greeting, title|
"#{greeting}, #{title}!!!"
end
=> "Ohai, Benevolent Dictator!!!"

There’s a tiny gotcha to yield’s argument-passing behavior. It is more tolerant of missing and extra arguments than you might expect. Missing arguments will be set to nil, and extra arguments will be silently discarded.

 

Let’s modify the method and give it fewer arguments. In this definition of do_it, yield is only given a single argument:

>> def do_it(x)
>> yield x
>> end
=> :do_it
Observe what happens when the method receives a block that expects two arguments:
>> do_it(42) { |num, line| "#{num}: #{line}" }
=> "42: "

 

If you find this behavior slightly strange, you can think of yield acting a little like a parallel assignment, in that nils are assigned to missing arguments:

>> a, b = 1 # => 1

>> b # => nil

As previously noted, missing arguments are assigned nil, which explains the lack of an error. What happens if the yield is given more arguments than expected? Once again, think about the parallel assignment analogy:

>> a, b = 1,2,3
=> [1, 2, 3]
>> a
=> 1
>> b
=> 2
In this case, 3 is discarded. Redefine do_it once more:
>> def do_it
>> yield "this", "is", "ignored!"
>> end
=> :do_it
Now, pass in a block that takes in no arguments:
>> do_it { puts "Ohai!" } => Ohai!

 

Once again, Ruby executes the code without a hitch. This is also consistent with the parallel assignment behavior as previously demonstrated.

 

Keep in mind this argument passing behavior; otherwise, you might waste precious time figuring out why Ruby doesn’t throw an exception when you think it would, especially when writing unit tests. Now let’s look at the relationship between blocks and closures.

 

Blocks as Closures and Block Local Variables


In Ruby, blocks act like anonymous functions. After all, blocks carry a bunch of code, to be called only when yielded. A block also carries around the context in which it was defined:

def chalkboard_gag(line, repetition)
  repetition.times { |x| puts "#{x}: #{line}" }
end

chalkboard_gag("I will not drive the principal's car", 3)
This returns:
0: I will not drive the principal's car
1: I will not drive the principal's car
2: I will not drive the principal's car

 

What’s the free variable here? It is line. That’s because line is not a block-local variable. Instead, the block reaches into the outer scope, all the way up to the arguments of chalkboard_gag.

 

The behavior of the preceding code shouldn’t be too surprising, because it seems rather intuitive. Imagine now if Ruby didn’t have closures. The block then wouldn’t be able to access the arguments. You can simulate this by declaring line to be a block local variable by preceding it with a semicolon:

def chalkboard_gag(line, repetition)
»   repetition.times { |x; line| puts "#{x}: #{line}" }
end

Block local variables are declared after the semicolon. Now line in the block no longer refers to the arguments of chalkboard_gag:
0:
1:
2:

 

Block local variables are a way to ensure that the variables within a block don’t override another outer variable of the same name. This essentially circumvents the variable capturing behavior of a closure.

Here’s another example:

x = "outside x"

1.times { x = "modified from the outside block" }

puts x # => "modified from the outside block"

 

In this example, the outer x is modified by the block, because the block closes over the outer x, and therefore has a reference to it. If we want to prevent this behavior, we could do this:

x = "outside x"
1.times { |;x| x = "modified from the outside block" }
puts x # => "outside x"

 

That covers most of what there is to know about block variables. In the next section, we take a look at different block patterns that are often seen in Ruby code. These patterns cover enumeration, resource management, and object initialization. Next, let’s look at some patterns that use blocks, starting with enumeration.

 

Block Pattern #1: Enumeration


You may have fallen in love with Ruby because of the way it does enumeration:

>> %w(look ma no for loops).each do |x|
>> puts x
>> end
look
ma
no
for
loops
=> ["look", "ma", "no", "for", "loops"]

 

Besides being very expressive, enumeration using blocks is more concise and less error-prone. It is concise because the block captures exactly what we want to do with each element (printing it out to the console). It is less error-prone compared to traditional for loops because it does away with indices that are prone to the infamous off-by-one error.

 

You should be familiar with this way of iterating over a collection, such as an Array. What is interesting is how these methods are implemented under the hood.

Going through the process of building your own implementation will give you a much deeper understanding of how methods and blocks work.

 

Implementing Fixnum#times

While it’s not surprising that Ruby is an object-oriented language, the extent of “object-orientedness” often surprises newcomers to Ruby. For example, most wouldn’t associate a number with the notion of an object. However, Ruby begs to differ by making code like this possible:

>> 3.times { puts "D'oh!" }
D'oh!
D'oh!
D'oh!
=> 3

How is this possible? The answer is two-fold. First, 3 is an object of the Fixnum class. Second, the Fixnum#times method is what makes the preceding code possible.

 

What can we say about the Fixnum#times method? Well, it executes the block exactly three times. This information is taken from the instance of the Fixnum, 3. This detail is important, as you will soon see.

What can we say about the parameters of the block? Well, not much, since the block doesn’t take any parameters. Let’s implement Fixnum#times. Additionally, we will assume that each doesn’t exist.

 

Create a file called fixnum_times.rb. Fill in an initial implementation like so:

class Fixnum
  def times
    puts "This does nothing yet!"
  end
end

 

Thanks to Ruby’s open classes, we have now just overridden the default version of Fixnum#times and replaced it with our own (currently non-working) one. Load the file in IRB using the following command:

$ irb -r ./fixnum_times.rb

Let’s try this out:

>> 3.times { puts "D'oh!" }
This does nothing yet!
=> nil

 

For now, nothing happens since we have overridden the default Fixnum#times method with our empty implementation. Remember that we imposed the constraint that we cannot use Array#each? The reason is that it would make things too easy for us. We can fall back to a while loop:

blocks/fixnum_times.rb
class Fixnum
  def times
    x = 0
    while x < self
      x += 1
      yield
    end
    self
  end
end
Now, redo the steps with the updated code:
% irb -r ./fixnum_times.rb
>> 3.times { puts "D'oh!" }
D'oh!
D'oh!
D'oh!
=> 3

 

Again, self is the Fixnum instance, also known as 3 in our example. In other words, it is using the value of the number to perform the same number of iterations. Pretty nifty, if you ask me. The most important part of the code here is yield.

 

In this example, yield is called without any arguments, which is exactly what the original implementation expects. The return value of the times method is the number itself, hence self is returned at the end of the method.

 

Let’s keep up the momentum and implement Array#each.

Implementing Array#each

Take a close look at how the Array#each method is invoked:
>> %w(look ma no for loops).each do |x|
>> puts x
>> end
look
ma
no
for
loops
=> ["look", "ma", "no", "for", "loops"]

 

The block accepts one argument. Create a new file, array_each.rb. As with the previous example, fill it with an empty implementation of the method that we are going to override:

class Array
  def each
  end
end

%w(look ma no for loops).each do |x|
  puts x
end
Let’s test that this implementation does override the default behavior:
$ ruby array_each.rb
# Returns nothing

 

Since you don’t have the help of Array#each (that’s the whole point of this exercise), the iteration needs to be tracked manually. Once again, it’s time to reach for the humble while loop:

blocks/array_each.rb
class Array
  def each
    x = 0
    while x < self.length
      yield self[x]
      x += 1
    end
  end
end

%w(look ma no for loops).each do |x|
  puts x
end
Now when you run array_each.rb, the results get printed as expected:
$ ruby array_each.rb
look
ma
no
for
loops

Notice how self is being used here. First, the while loop uses self.length to determine if it should continue looping or break out of the loop. Second, the individual elements of the array are accessed via self[x]. Each element is passed into the supplied block, which, in our example, simply prints it out.
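The same recipe (a while loop plus yield) carries over to other enumeration methods. As a hedged sketch, here is a minimal each_with_index written in the same style; the real method already exists, this simply mirrors the approach above:

class Array
  def each_with_index
    x = 0
    while x < self.length
      yield self[x], x   # pass both the element and its index to the block
      x += 1
    end
    self
  end
end

%w(look ma no).each_with_index do |word, i|
  puts "#{i}: #{word}"
end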

 

Blocks can do more than enumeration. In fact, one common use case of blocks is to handle resource management. Let’s explore how.

 

Block Pattern #2: Managing Resources


Blocks are an excellent way to abstract pre- and post-processing. A wonderful example of that is resource management. Examples of resources that require extra care include file handles, socket connections, and database connections.

 

For example, failure to close a database connection means that down the line, another connection attempt might be refused, since the number of connections that a database can handle is finite and limited.

 

Remembering to open and close the resource is a largely manual affair. This is error-prone and requires a bit of boilerplate. In the following example, the programmer is trying to open a file and write a few lines to it. The last line is where the programmer closes the file handle:

f = File.open('Leo Tolstoy - War and Peace.txt', 'w')
f << "Well, Prince, so Genoa and Lucca"
f << " are now just family estates of the Buonapartes."
f.close

 

What happens if the programmer forgets to close the file with f.close? The severity depends on how long the program runs. If this code were to be part of a one-off script, then the situation wouldn’t be that bad.

 

The file handle would be terminated once the script finished execution. But if you have a long-running application like a daemon or web application, then this is bad news.

 

That’s because the operating system can only support a finite number of file handles. If the long-running daemon continuously opens files and doesn’t close them, soon enough the file handles will run out, and you’ll get a call or page in the middle of the night. In other words, you have a resource leak on your hands.

 

If you think about it, the only thing we really want is to write to the file. Having to remember to close the file handle is a hassle. Ruby has a very elegant way of doing this, using blocks:

File.open('Leo Tolstoy - War and Peace.txt', 'w') do |f|
  f << "Well, Prince, so Genoa and Lucca"
  f << " are now just family estates of the Buonapartes."
end

 

By passing a block into File.open, Ruby helps you, the over-burdened (and downright lazy) developer, by closing the file handle when you’re done writing to it.

 

Notice that the file handle is nicely scoped within the block. In other words, f only exists within the confines of the block. But where exactly is the file closing taking place? Let’s find out.

 


 

Implementing File.open


Let’s unravel the mysteries of File.open. First of all, the Ruby documentation provides an excellent overview of File.open. If you read carefully, it even provides hints of how it is implemented: with no associated block, File.open is a synonym for ::new.

 

If the optional code block is given, it will be passed the opened file as an argument, and the File object will automatically be closed when the block terminates. The value of the block will be returned from File.open.

 

This description alone is enough to kickstart our File.open implementation. Create file_open.rb and follow along.

 

1. If no block is given, File.open is the same as File.new:

class File
  def self.open(name, mode)
    new(name, mode) unless block_given?
  end
end

 

2. If there’s a block, the block is then passed the opened file as an argument...

class File
  def self.open(name, mode)
    file = new(name, mode)
    return file unless block_given?
»   yield(file)
  end
end

 

3. ...and the file is automatically closed when the block terminates...

class File
  def self.open(name, mode)
    file = new(name, mode)
    return file unless block_given?
    yield(file)
»   file.close
  end
end

There’s a gotcha here. What happens if an exception is raised in the block? file.close will not be called, which defeats the whole point of this exercise. Thankfully, this is an easy fix with the ensure keyword:

blocks/file_open.rb
class File
  def self.open(name, mode)
    file = new(name, mode)
    return file unless block_given?
    yield(file)
» ensure
»   file.close
» end
end

 

Now, file.close is guaranteed to run even if the block raises an exception.
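As a quick check (a sketch, assuming the file_open.rb version above is loaded), an exception raised inside the block still triggers the ensure clause:

begin
  File.open("file_open.rb", "r") do |f|
    raise "boom"   # simulate something going wrong mid-block
  end
rescue RuntimeError => e
  # We land here, but the ensure clause has already closed the file handle.
  puts "rescued: #{e.message}"
end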

 

4. The value of the block will be returned from File.open.

Let’s see if this works. Open file_open.rb in IRB:

% irb -r ./file_open.rb
Let’s get meta and open the file you just opened:
>> File.open("file_open.rb", "r") do |f|
>> puts f.path
>> puts f.ctime
>> puts f.size
>> end
file_open.rb
2016-11-13 08:32:24 +0800
238
=> nil

 

Since yield(file) is the last expression evaluated (the ensure clause does not change the return value), the value of the block will be returned from File.open. With a little bit of work, File.open frees you from having to remember to close file handles, handles exceptional cases, and, to top it off, lets you do all this through a simple and beautiful API. Speaking of beautiful, blocks are also great for object initialization, as you will soon see in the next section.

 

Block Pattern #3: Beautiful Object Initialization

There are a couple of ways to initialize an object in Ruby. Oftentimes, this is described as applying configuration to an object; the two usually mean the same thing. Here’s an example taken from the Twitter Ruby gem:

client = Twitter::REST::Client.new do |config|
  config.consumer_key = "YOUR_CONSUMER_KEY"
  config.consumer_secret = "YOUR_CONSUMER_SECRET"
  config.access_token = "YOUR_ACCESS_TOKEN"
  config.access_token_secret = "YOUR_ACCESS_SECRET"
end

See if you can guess how this is implemented. Here’s a hint: consumer_key and the rest are accessors of Twitter::REST::Client. You’re going to build the bare minimum to get the preceding code to work. The code is doing two things: instantiating a Twitter::REST::Client object, and passing in a block that configures that object.

 

The object instantiation bit is trivial. What’s interesting here is that the initializer of Twitter::REST::Client accepts a block. This block takes a single argument called config. Within the block body, the fields of the config object are set using various values.

 

Implementing an Object Initialization DSL


Create a file called object_init.rb. You will need to set up the modules and class for Twitter::REST::Client:
blocks/object_init.rb
module Twitter
  module REST
    class Client
    end
  end
end

client = Twitter::REST::Client.new do |config|
  config.consumer_key = "YOUR_CONSUMER_KEY"
  config.consumer_secret = "YOUR_CONSUMER_SECRET"
  config.access_token = "YOUR_ACCESS_TOKEN"
  config.access_token_secret = "YOUR_ACCESS_SECRET"
end

 

If you try to run this code, you won’t get any errors. Why? Because Ruby simply ignores a block that is never invoked, and there is no yield in the method body (yet). Let’s make that block do something. We know from looking at it that:

  1. It is being called from the initializer.
  2. It accepts one single argument, the config object.
  3. The config object has a couple of setters, such as consumer_key.

 

The main thing to realize is that the config object can be the same instance created by Twitter::REST::Client.new. Why can, rather than must?

 

The short answer is that you could make things a bit more complicated by passing in a separate configuration object, but let’s keep it simple. Therefore, you now know one additional thing:

 

4. config and the instantiated object can be the same thing. Now you should have a better idea:

blocks/object_init.rb
module Twitter
  module REST
    class Client
»     attr_accessor :consumer_key, :consumer_secret,
»                   :access_token, :access_token_secret
»
»     def initialize
»       yield self
»     end
    end
  end
end

client = Twitter::REST::Client.new do |config|
  config.consumer_key = "YOUR_CONSUMER_KEY"
  config.consumer_secret = "YOUR_CONSUMER_SECRET"
  config.access_token = "YOUR_ACCESS_TOKEN"
  config.access_token_secret = "YOUR_ACCESS_SECRET"
end

 

In the initializer, self (that is, the Twitter::REST::Client instance) is passed into the block. Within the block body, the instance methods are called. These instance methods were created with attr_accessor. You can now try it out:

client = Twitter::REST::Client.new do |config|
  config.consumer_key = "YOUR_CONSUMER_KEY"
  config.consumer_secret = "YOUR_CONSUMER_SECRET"
  config.access_token = "YOUR_ACCESS_TOKEN"
  config.access_token_secret = "YOUR_ACCESS_SECRET"
end
p client.consumer_key

What happens if you initialize the client without a block? You’ll get a LocalJumpError complaining that no block is given:

'initialize': no block given (yield) (LocalJumpError)

This is an easy fix. Remember block_given?? You can use it in initialize:

def initialize
  yield self if block_given?
end
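With block_given? in place, both styles work; a quick check (a sketch, assuming the class above is loaded):

# Without a block: no LocalJumpError, just an unconfigured client.
bare = Twitter::REST::Client.new
p bare.consumer_key #=> nil

# With a block: configured as before.
configured = Twitter::REST::Client.new { |config| config.consumer_key = "KEY" }
p configured.consumer_key #=> "KEY"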

Run the code again, and this time everything should work as expected. We will revisit this example soon, where we consider an even more flexible approach to initializing the config object. But before that, let’s enter into the world of meta-programming and DSL creation.

 

Implementing a Router DSL


In the previous section, you saw one flavor of object initialization using blocks. Here’s another example, adapted and modified from Rails. Let’s imagine that you want to define a bunch of routes in a web framework, such as Rails.

 

Routes are rules that you declare for an incoming web request. These rules invoke the appropriate controller and controller method, depending on the pattern of the URL of the incoming web request.

 

For example, if the web server receives a request with http://localhost:3000/users, a route would parse the incoming request URL and ensure that the index method of the UsersController will be invoked.

 

In Rails, this file is located in config/routes.rb. In older versions of Rails, the syntax looked like this:

routes = Router.new do |r|
  r.match '/about' => 'home#about'
  r.match '/users' => 'users#index'
end

However, as Rails evolved, the way routes were defined also changed:

routes = Router.new do
  match '/about' => 'home#about'
  match '/users' => 'users#index'
end

In the new syntax, the block no longer expects an argument. With nothing to pass into yield, which object does match belong to? Learning the techniques behind this variation on object instantiation will also allow you to create DSLs.

 

Create a file called router.rb, and fill it in with this initial implementation:

class Router
  # We are going to implement this!
end

routes = Router.new do
  match '/about' => 'home#about'
  match '/users' => 'users#index'
end
The end goal is to print out the routes:
{"/about"=>"home#about"}
{"/users"=>"users#index"}

An actual router implementation will parse the string and invoke the appropriate controller and method. For our purpose, printing out the routes will suffice. Your job is to fill in the body of the Router class. One way the task can potentially be simplified is to create an implementation that looks like this:

routes = Router.new do |r|
  r.match '/about' => 'home#about'
  r.match '/users' => 'users#index'
end

 

This should look familiar to you. Let’s go through building up this class, though this time the pace will be faster.

 

The initializer of Router takes a block that accepts a single argument. That argument is the object itself. You should also be able to infer that the Router has an instance method called match that takes a hash as an argument. Here’s the fastest way to implement this class:

class Router
  def initialize
    yield self
  end

  def match(route)
    puts route
  end
end

Of course, the match method doesn’t do anything interesting. But that’s not the point. Here’s the challenge: how do you get from

routes = Router.new do |r|
  r.match '/about' => 'home#about'
  r.match '/users' => 'users#index'
end

to this?

routes = Router.new do
  match '/about' => 'home#about'
  match '/users' => 'users#index'
end

To make that leap, you will need to know about instance_eval and some meta-programming gymnastics. Let’s get to that right away.

 

Using instance_eval to Change self

When a method is called without a receiver (the object that the method is called on), it is assumed that the receiver is self. What does self mean in a block? Open IRB and let’s find out. Let’s ask IRB the oldest existential question in the world: what is self?

>> self
=> main
>> self.class
=> Object

 

Recall that in Ruby, everything is an object. This means that when you perform some operation, this is done within the context of some object. In this case, main is an object that belongs to the Object class.

 

Now, let’s answer the next question. Within the confines of a block, what does self refer to? Try this out in IRB:

>> def foo
>> yield self
>> end
=> :foo
>> foo do
>> puts self
>> end
main
=> nil

In a block, self is whatever it was in the context where the block was defined. Since the block was defined in the main scope, self refers to main. This means that doing

routes = Router.new do
match '/about' => 'home#about'
end
will result in an error:
in `block in <main>': undefined method `match' for main:Object (NoMethodError)

This is because Ruby is trying to look for the match method defined on main. You need to tell Ruby to evaluate the match method in the context of the Router. In less malleable languages, it is extremely difficult to accomplish this. Not with Ruby.

 

Changing Context with instance_eval

You need to somehow convince Ruby that the self in self.match '/about' => 'home#about'

refers to the Router instance, not the main object. This is exactly what instance_eval is for. instance_eval evaluates code in the context of the instance. In other words, instance_eval changes self to point to the instance you tell it to.
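Here’s a tiny standalone sketch of that idea; the Greeter class is made up purely for illustration:

class Greeter
  def initialize(name)
    @name = name
  end
end

g = Greeter.new("Ruby")

# Inside instance_eval, self is g, so g's instance variable is in reach.
g.instance_eval { puts @name } # prints "Ruby"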

 

Back to router.rb. Modify initialize to look like this:

class Router
  def initialize(&block)
    # self is the Router instance
    instance_eval &block
  end

  # ...
end

There are some new things with this code. Previously, all our encounters with block invocation were implicit via yield. Here, the code makes explicit that the block should be captured in &block.

 

The reason it needs to be stored in a variable is that the block needs to be passed into instance_eval. What about the "&" in &block? For that, we need to take a quick detour to learn about block-to-Proc conversion.

 

Block-to-Proc Conversion


We have not covered Procs yet. For now, think of them as lambdas. Blocks are not represented as objects in Ruby. However, instance_eval expects to be given an object. Therefore, we need to somehow turn a block into an object.

 

In Ruby, this is done via a special syntax: &block. When Ruby sees this, it internally converts the captured block into a Proc object. The rules for block-to-Proc conversion can be utterly confusing. Here’s a useful way to remember it:

  1. Block → Proc if &block is in a method argument.
  2. Proc → Block if &block is in the method body.
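A small sketch of both rules in action (the names capture and doubler are made up):

# Rule 1: &blk in the parameter list captures the incoming block as a Proc.
def capture(&blk)
  blk             # blk is a Proc object here
end

doubler = capture { |x| x * 2 }
doubler.call(5)         #=> 10

# Rule 2: &doubler in a method call converts the Proc back into a block.
[1, 2, 3].map(&doubler) #=> [2, 4, 6]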

 

Now that you understand block-to-Proc conversion, let’s circle back to where we left off and see how everything is put together. This is the final result:

 

blocks/router.rb
class Router
  def initialize(&block)
    instance_eval &block
  end

  def match(route)
    puts route
  end
end

routes = Router.new do
  match '/about' => 'home#about'
end

When the Router instance is given a block, it is converted into a Proc, then passed into instance_eval. Since the context where instance_eval is invoked is Router, the Proc object is also evaluated in the Router context. This means that the match method is invoked on the Router instance.

 

Now that you know how instance_eval and block-to-Proc conversion works, let’s revisit an earlier example on object initialization and apply your newfound knowledge.

 

Object Initialization, Revisited


Let’s say you want the initializer to be slightly more flexible and take an options hash. Furthermore, you also think that passing a config argument into the block and writing config.consumer_secret everywhere is too verbose. In other words, something like this:

client = Twitter::REST::Client.new({consumer_key: "YOUR_CONSUMER_KEY"}) do
  self.consumer_secret = "YOUR_CONSUMER_SECRET"
  self.access_token = "YOUR_ACCESS_TOKEN"
  self.access_token_secret = "YOUR_ACCESS_SECRET"
end

Note the explicit self receiver on each writer: inside an instance_eval’d block, a bare consumer_secret = ... would create a local variable instead of calling the consumer_secret= method.

 

How could this be implemented? First, the hash of options would need to be iterated through. This would then be followed by calling the block:

» def initialize(options = {}, &block)
»   options.each { |k,v| send("#{k}=", v) }
»   instance_eval(&block) if block_given?
» end

 

Take note of the extra = in calling send. Because the methods were constructed using attr_accessor, Ruby uses the trailing = to indicate that the method is an attribute writer. Otherwise, it assumes it is an attribute reader.

blocks/object_init_revised.rb
module Twitter
  module REST
    class Client
      attr_accessor :consumer_key, :consumer_secret, :access_token, :access_token_secret

»     def initialize(options = {}, &block)
»       options.each { |k,v| send("#{k}=", v) }
»       instance_eval(&block) if block_given?
»     end
    end
  end
end

client = Twitter::REST::Client.new({consumer_key: "YOUR_CONSUMER_KEY"}) do
  self.consumer_secret = "YOUR_CONSUMER_SECRET"
  self.access_token = "YOUR_ACCESS_TOKEN"
  self.access_token_secret = "YOUR_ACCESS_SECRET"
end

p client.consumer_key  # => YOUR_CONSUMER_KEY
p client.access_token  # => YOUR_ACCESS_TOKEN

 

Now, anyone initializing a Twitter::REST::Client can choose to do it via an options hash, a block, or a combination of both. Now it’s time to flex those programming muscles and work on the exercises.

 

Test Your Understanding!

Test Your Understanding

Time to put on that thinking cap and flex that gray matter. Remember, in order to really understand the material, you should attempt to complete the following exercises. None of them should take too long, and you have my permission to peek at the solutions if you get stuck.

 

1. Implement Array#map using Array#each:

%w(look ma no for loops).map do |x|
x.upcase
end
This should return ["LOOK", "MA", "NO", "FOR", "LOOPS"].

 

2. Implement String#each_word:

"Nothing lasts forever but cold November Rain".each_word do |x| puts x
end
This should output:
Nothing
lasts
forever
but
cold
November
Rain

3. It’s your turn to implement File.open. Start off with the Ruby documentation. The key here is to understand where to put pre- and post-processing code, where to put yield, and how to ensure that resources are cleaned up.

 

4. Here’s some real-world code adapted from the Ruby Redis library:

module Redis
class Server
# ... more code ...
def run
loop do
session = @server.accept
begin
return if yield(session) == :exit
ensure
session.close
end
end
rescue => ex
$stderr.puts "Error running server: #{ex.message}"
$stderr.puts ex.backtrace
ensure
@server.close
end
# ... more code ...
end
end

Notice the similarities to the File.open example. Does run require a block to be passed in? How is the return result of the block used? How could this code be called?

 

5. Implementing the ActiveRecord DSL

ActiveRecord DSL

Active Record is an object-relational mapper used in Rails, which connects objects, such as Micropost, to a database table. A migration in Active Record is a file that, when executed, makes changes to the database. Here’s an example of a migration in Active Record:

ActiveRecord::Schema.define(version: 20130315230445) do
  create_table "microposts", force: true do |t|
    t.string "content"
    t.integer "user_id"
    t.datetime "created_at"
    t.datetime "updated_at"
  end
end

 

You don’t have to know how Active Record works. Your job is to implement the DSL. Here’s the full code, with placeholders for you to implement your solution:

module ActiveRecord
class Schema
def self.define(version, &block)
# *** <code here> ***
end
def self.create_table(table_name, options = {}, &block)
t = Table.new(table_name, options)
# *** <code here> ***
end
end
class Table
def initialize(name, options)
@name = name
@options = options
end
def string(value)
puts "Creating column of type string named #{value}"
end
def integer(value)
puts "Creating column of type integer named #{value}"
end
def datetime(value)
puts "Creating column of type datetime named #{value}"
end
end
end
ActiveRecord::Schema.define(version: 20130315230445) do
create_table "microposts", force: true do |t|
t.string "content"
t.integer "user_id"
t.datetime "created_at"
t.datetime "updated_at"
end
end

If you’ve done everything right, you will see:

Creating column of type string named content
Creating column of type integer named user_id
Creating column of type datetime named created_at
Creating column of type datetime named updated_at

 

The Power of Procs and Lambdas

Recall that blocks by themselves are not objects—they cannot exist by themselves. In order to do anything interesting with a block, you need to pass a block into a method.

 

Procs have no such restrictions because they are objects. They allow you to represent a block of code (anything between a do ... end) as an object. Some languages call these anonymous functions, and indeed, they do play the part. 

 

Procs are ubiquitous in real-world Ruby code, although chances are, you might not be using them that much. Through the examples, you’ll learn how to use them effectively in your own code.

 

Ruby also uses Procs to perform some really nifty tricks.

For example, have you ever wondered how ["o","h","a","i"].map(&:upcase) expands to ["o","h","a","i"].map { |c| c.upcase }?

 

Procs also assume another form: lambdas. While they serve similar functions (pun intended!), it is also important to learn about their differences, so that you will know when to use which at the right time.

 

One technique that Procs enable but that hasn’t seen very wide use is currying, a functional programming concept. Although its practical uses (with respect to Ruby programming) are pretty limited, it’s still a fun topic to explore.

 

Procs and the Four Ways of Calling Them


Unlike the language named after a certain serpent, Ruby embraces TMTOWTDI (pronounced as Tim Toady), or There’s more than one way to do it. The calling of Procs is a wonderful example. In fact, Ruby gives you four different ways:

1. Proc#call(args)
2. .(args)
3. Threequals
4. Lambdas

Fire up IRB. Let’s begin by creating a very simple Proc:

>> p = proc { |x, y| x + y }

=> #<Proc:0x007ffb12907940@(IRB):1>

 

There are two things to notice here. First, the return value tells you that a Proc has been created. Second, Ruby provides a shorthand to create Procs. This is really a method in the Kernel class:

>> p = Kernel.proc { |x, y| x + y }
=> #<Proc:0x007ffb12907940@(IRB):1>
Of course, since Proc is just like any other class, you can create an instance of it the usual way:
>> p = Proc.new { |x, y| x + y }
=> #<Proc:0x007ffb12907940@(IRB):1>
Now you know how to create a Proc. Time to make it do some work. The first
way is to use Proc#call(args):
>> p = proc { |x,y| x + y }
>> p.call("oh", "ai") => "ohai"
>> p.call(4, 2)
=> 6

In fact, this is my preferred way of invoking Procs because it conveys the intent of invocation much better than the alternatives, which are presented next.

 

Ruby provides a shorthand for the call(args) method: .(args). Therefore, the previous example could have been rewritten as such:

>> p = proc { |x,y| x + y }
>> p.("oh", "ai")
>> p.(4, 2)
Here’s an interesting Ruby tidbit. Turns out, the .() syntax works across any class that implements the call method. For example, here’s a class with only the call method:
class Carly
  def call(who)
    "call #{who}, maybe"
  end
end

c = Carly.new
c.("me") # => "call me, maybe"

 

You should avoid using .() if you can, because it could potentially confuse other people who aren’t familiar with the syntax.

 

Ruby has an even quirkier syntax for invoking Procs:

p = proc { |x,y| x + y }
p === ["oh", "ai"]
The === operator is also known as the threequals operator. This operator makes it possible to use a Proc in a case statement. Look at the following code:
even = proc { |x| x % 2 == 0 }

case 11
when even
  "number is even"
else
  "number is odd"
end
When even is used in a when clause, the case statement calls it via === with the number being tested, and the true or false it returns decides which branch runs. For example:
>> even = proc { |x| x % 2 == 0 }
>> even === 11
=> false
>> even === 10
=> true
Note that invoking a Proc that expects a single argument this way is incorrect and results in a confusing error message:
>> even = proc { |x| x % 2 == 0 }
>> even === [11]

NoMethodError: undefined method `%' for [11]:Array
from (IRB):1:in `block in IRB_binding'

Next, let’s look at lambdas and how they relate to Procs.

 

Lambdas—Same, But Different

Procs have a lot of similarities with lambdas. In fact, you might be surprised to learn that a lambda is a Proc:

>> lambda {}.class
=> Proc

A Proc, however, is not a lambda:

>> proc {}.class
=> Proc
Fortunately, Ruby has a helpful predicate method that lets you disambiguate
procs and lambdas via the Proc#lambda? method:
>> lambda {}.lambda?
=> true
>> proc {}.lambda?
=> false

 

You might be wondering why there isn’t a corresponding proc? method. Some thinking leads you to realize that this method would be pretty useless. Let’s assume the method existed:

# NOTE: This method doesn't exist!
>> lambda {}.proc?
=> true

# NOTE: This method doesn't exist!
>> proc {}.proc?
=> true

This hypothetical example shows that a method like proc? would not help us differentiate between a lambda and a Proc, since both are Procs.

 

Invoking lambdas is identical to invoking Procs, with the exception of the lambda keyword:

lambda { |x, y| x + y }.call(x, y)
lambda { |x, y| x + y }[x, y]
lambda { |x, y| x + y }.(x, y)
lambda { |x, y| x + y } === [x, y]
If you find typing lambda too verbose for your taste, Ruby offers an alternative, affectionately known as the stabby lambda syntax:
->(x, y){ x + y }.call(x, y)
->(x, y){ x + y }[x, y]
->(x, y){ x + y }.(x, y)
->(x, y){ x + y } === [x, y]

If you tilt your head a certain way and cross your eyes a little, you might see that -> looks like the Greek letter lambda, λ. So now that you’ve seen the similarities between lambdas and procs, what’s the difference between these two? And since they’re so similar, when should you use one or the other?

 

The Difference Between a Lambda and a Proc

 Lambda Proc

A lambda and a proc have two important differences: arity and return semantics.

 

Arity refers to the number of arguments a function takes. While this definition usually applies to functions or methods, it is also applicable to lambdas and procs. Fire up IRB and let’s do some exploring. Create a lambda and a proc:

>> l = lambda { |x, y| puts "x: #{x}, y: #{y}" }
>> p = proc { |x, y| puts "x: #{x}, y: #{y}" }
Then invoke them:
>> l.call("Ohai", "Gentle Reader")
>> p.call("Ohai", "Gentle Reader")
You’ll see the following result from each:
x: Ohai, y: Gentle Reader

 

Now here comes the interesting bit. What happens if we supply one less argument? Here’s Proc’s response:

>> p.call("Ohai")
#=> x: Ohai, y:
The Proc seems perfectly fine having less than expected arguments. What about
lambda?
>> l.call("Ohai")
=> ArgumentError: wrong number of arguments (1 for 2)
Turns out, lambdas get upset if you give them fewer arguments than expected. Now, let’s try the same exercise but with one extra argument. Again, here’s the Proc:
>> p.call("Ohai", "Gentle", "Reader")
#=> x: Ohai, y: Gentle Reader
Here’s the lambda:
>> l.call("Ohai", "Gentle", "Reader")
#=> ArgumentError: wrong number of arguments (3 for 2)

So, the moral of the story is that lambdas, unlike Procs, expect the exact number of arguments to be passed in. For Procs, unassigned arguments are given nil. Extra arguments are silently ignored.

 

Now, let’s look at the other difference: return semantics. A Proc always returns from the context in which it was created. Let’s unpack this a little and see why this is important. Create a new file called someclass.rb with the following contents:

procs_lambdas/someclass.rb
class SomeClass
  def method_that_calls_proc_or_lambda(procy)
    puts "calling #{proc_or_lambda(procy)} now!"
    procy.call
    puts "#{proc_or_lambda(procy)} gets called!"
  end

  def proc_or_lambda(proc_like_thing)
    proc_like_thing.lambda? ? "Lambda" : "Proc"
  end
end

 

SomeClass has two methods:


method_that_calls_proc_or_lambda takes a Proc or lambda, prints out a message, then invokes the Proc or lambda. Depending on how the Proc or lambda returns, the final puts statement may or may not execute.

proc_or_lambda is a tiny helper function that tells us if proc_like_thing is a lambda or
Proc. Let’s begin the experiment with the lambda:
>> c = SomeClass.new
>> c.method_that_calls_proc_or_lambda lambda { return }
With a lambda, the second puts statement is executed:
calling Lambda now!
Lambda gets called!
Next, do the same thing, but use a proc instead:
>> c = SomeClass.new
>> c.method_that_calls_proc_or_lambda proc { return }
Observe:
'block in <main>': unexpected return (LocalJumpError)

What just happened? Not only did the second puts statement not execute, but we also landed ourselves a LocalJumpError. In order to understand why, let’s go back to the statement that started off this section:

 

A Proc always returns from the context in which it was created.

So, what is the context from which the Proc was created?

c.method_that_calls_proc_or_lambda proc { return }

 

It should be clear by now that the Proc was created in the main context. Therefore, proc { return } means returning from the main context. That is impossible because the main context is the top-most level. Look at the error again:

'block in <main>': unexpected return (LocalJumpError)

Now you can understand what the error means. The <main> bit refers to the main context. The unexpected return is due to the Proc returning from the main context, an impossible feat.

 

Should I Use lambdas or Procs?

Use lambdas

For the most part, you should be fine using lambdas. That’s because the return semantics of lambdas resemble the intuitive behavior of methods.
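Here is a minimal sketch of what “resemble methods” means in practice: a return inside a lambda exits only the lambda, just as a return inside a method exits only that method.

def try_lambda
  l = lambda { return 10 }
  l.call            # returns 10 from the lambda and nothing more
  "after lambda"    # this line still runs
end

try_lambda # => "after lambda"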

 

Sometimes, though, you might want to use Procs. One reason might be that you need to support multiple arities. You’ll learn more in the Symbol#to_proc section, where you’ll reimplement a very nifty Ruby trick made possible by Procs.

 

Now let’s review the Ruby method Symbol#to_proc and see how it’s implemented.

 

How Symbol#to_proc Works


Symbol#to_proc is one of the finest examples of the flexibility and beauty of Ruby. This syntactic sugar allows us to take a statement such as

words.map { |s| s.length }

and turn it into something more succinct:

words.map(&:length)

 

Let’s unravel this syntactical sleight of hand by figuring out how this works.

The first step is to figure out the role of the &: symbol. How does Ruby know that it has to call a to_proc method, and why is this only specific to the Symbol class?

 

When Ruby sees a & and an object—any object—it will try to turn it into a block. This is simply a form of type coercion.

Take to_s, for example. Ruby allows you to do 2.to_s, which returns the string representation of the integer 2. Similarly, to_proc will attempt to turn an object— again, any object—into a Proc.

 

This might seem a little abstract, so in order to make things more concrete, it’s time to open IRB.

 

Reimplementing Symbol#to_proc

Symbol

In order to understand what happens behind the scenes, you’ll create an object and then pass it into the map. If you’re expecting this to fail, you are absolutely right, but that is the whole point. The error messages that Ruby provides will guide you to enlightenment.

>> obj = Object.new

>> [1,2,3].map &obj

 

TypeError: wrong argument type Object (expected Proc)

The error message is telling us exactly what you need to know. It’s saying that obj is, well, an Object and not a Proc. In other words, we must teach the Object class how to turn itself into a Proc. Therefore, the Object class must have a to_proc method that returns a Proc. Let’s do the simplest thing possible:

>> class Object
>> def to_proc
>> proc {}
>> end
>> end
=> :to_proc
>> obj = Object.new
>> [1, 2, 3].map &obj
=> [nil, nil, nil]

 

When you run this again, you’ll get no errors. But notice that the result is an array of nils. How can each element be accessed and, say, printed out? The Proc needs to accept arguments:

>> class Object
>> def to_proc
>> proc { |x| "Here's #{x}!" }
>> end
>> end
=> :to_proc
>> obj = Object.new
>> [1,2,3].map(&obj)
=> ["Here's 1!", "Here's 2!", "Here's 3!"]
This hints at a possible implementation of Symbol#to_proc. Let’s start with what you know and redefine to_proc in the Symbol class:
>> class Symbol
>> def to_proc
>> proc { |obj| obj }
>> end
>> end
=> :to_proc
You now know that an expression such as
words.map(&:length)
is equivalent to
words.map { |w| w.length }

 

Here, the Symbol instance is :length. The value of the symbol corresponds to the name of the method. You also know how to access each yielded object by making the Proc returned by to_proc take an argument.

 

For the preceding example, this is what you want to achieve:

class Symbol
def to_proc
proc { |obj| obj.length }
end
end
You can even try this out:
>> class Symbol
>> def to_proc
>> proc { |obj| obj.length }
>> end
>> end
=> :to_proc
>> ["symbol", "cymbals", "sambal"].map(&:obj) => [6, 7, 6]

Unfortunately, this only works on objects that have the length method.

 

How can Symbol#to_proc be made more general?

Symbol Method

Well, how can the name of the symbol be turned into a method call on the obj? This can be answered in two parts.

First, using Kernel#send, any method can be invoked on an object dynamically as long as the right symbol is supplied. For example:

>> "ohai".send(:length) => 4

 

In other words, send allows you to dynamically invoke methods using a symbol. In this example, length is hard-coded in the Symbol#to_proc method. The next step is to make the method more general, which brings us to the next part of the answer.

 

Instead of hard-coding :length, you can make use of self, which in the case of a Symbol returns the value of the symbol.

Therefore, you can make use of self, which holds the value of the name of the method such as length, and pass it to the send method to invoke the method on obj. I hereby present you our own implementation of Symbol#to_proc:

procs_lambdas/symbol_to_proc.rb
class Symbol
def to_proc
proc { |obj| obj.send(self) }
end
end
Try it out. Save this code to a file called symbol_to_proc.rb and then load it in IRB:

$ irb -r ./symbol_to_proc.rb

Then test it out:

>> ["symbols", "cymbals", "sambal"].map(&:length)
=> [7, 7, 6]
>> ["symbols", "cymbals", "sambal"].map(&:upcase)
=> ["SYMBOLS", "CYMBALS", "SAMBAL"]
self is the symbol object (:length in our example), which is exactly what #send expects.

 

Improving on Symbol#to_proc

The initial implementation of Symbol#to_proc is naïve. The reason is that only the obj in the body of the Proc is considered, while any additional arguments are totally ignored. Recall that unlike lambdas, Procs are more relaxed when it comes to the number of arguments they’re given.

 

Fortunately, it’s easy to get around this limitation. First, though, it’s instructive to see what happens when a lambda is used instead of a Proc.

 

First, we return a lambda instead of a Proc in to_proc. Recall that lambda is a Proc, so everything should work as normal:

class Symbol
  def to_proc
»   lambda { |obj| obj.send(self) }
  end
end

words = %w(underwear should be worn on the inside)
words.map &:length # => [9, 6, 2, 4, 2, 3, 6]

 

Since lambdas are picky when it comes to the number of arguments, is there a method that requires two arguments? Of course: inject/reduce. The usual way of writing inject is:

[1, 2, 3].inject(0) { |result, element| result + element } # => 6

 

As you can see, the block in inject takes two arguments. Let’s see how our implementation does by using the &: symbol notation:

[1, 2, 3].inject(&:+)
Here’s the error we get:
ArgumentError: wrong number of arguments (2 for 1)
from (IRB):10:in `block in to_proc'
from (IRB):14:in `each'
from (IRB):14:in `inject'

 

You can now clearly see that an argument is missing. The lambda currently accepts only one argument, but what it received was two arguments. You need to allow the lambda to take in more arguments:

class Symbol
  def to_proc
»   lambda { |obj, args| obj.send(self, *args) }
  end
end

[1, 2, 3].inject(&:+) # => 6

 

Now it works as expected! The splat operator (that’s the * in *args) enables the method to support a variable number of arguments. Before you go about celebrating, there’s one problem. The following code doesn’t work anymore:

words = %w(underwear should be worn on the inside)
words.map &:length # => [9, 6, 2, 4, 2, 3, 6]
You’ll see the following output when you run it:
ArgumentError: wrong number of arguments (1 for 2)
from (IRB):3:in `block in to_proc'
from (IRB):8:in `map'

 

There are two ways to fix this. First, you can supply args with a default value:

procs_lambdas/symbol_to_proc_lambda.rb
class Symbol
  def to_proc
»   lambda { |obj, args=nil| obj.send(self, *args) }
  end
end

words = %w(underwear should be worn on the inside)
words.map &:length    # => [9, 6, 2, 4, 2, 3, 6]
[1, 2, 3].inject(&:+) # => 6

Alternatively, you can just use a Proc again:

procs_lambdas/symbol_to_proc_final.rb
class Symbol
  def to_proc
»   proc { |obj, args| obj.send(self, *args) }
  end
end

words = %w(underwear should be worn on the inside)
words.map &:length    # => [9, 6, 2, 4, 2, 3, 6]
[1, 2, 3].inject(&:+) # => 6

 

This is one of the few places where having a more relaxed requirement with respect to arities is important and even required.

 

Currying with Procs


The word “curry” comes from the mathematician Haskell Curry. In functional programming, currying is the process of turning a function that takes n arguments into a chain of n functions, each of which takes a single argument.

 

For example, given a lambda that accepts three parameters:

>> discriminant = lambda { |a, b, c| b**2 - 4*a*c }
>> discriminant.call(5, 6, 7)
=> -104
you could convert it into this:
>> discriminant = lambda { |a| lambda { |b| lambda { |c| b **2 - 4*a*c } } }
>> discriminant.call(5).call(6).call(7)
=> -104
In Ruby, there’s a shorter way to do this using Proc#curry:
>> discriminant = lambda { |a, b, c| b**2 - 4*a*c }.curry
>> discriminant.call(5).call(6).call(7)
=> -104

Notice that using Proc#curry alleviates the need to have nested lambdas, as seen in the previous example.

 

Why Was Proc#curry Added?

Here’s the cheeky response coming straight from Yukihiro Matsumoto, creator of Ruby:

I consider this method (Proc#curry) to be trivial and should be treated like an Easter egg for functional programming kids.

 

Alright, so even Ruby’s creator thinks that you wouldn’t have much use for Proc#curry. But don’t let that dampen your learning spirit! Currying is very useful for creating new functions from existing ones.

 

It gets more useful in functional languages (and languages such as Haskell use it to great effect), but you can see examples of this in Ruby:

 >> greeter = lambda do |greeting, salutation, name|
>> "#{greeting} #{salutation} #{name}"
>> end
In the preceding function, if you wanted to use greeter, you would have to supply all three arguments:
>> greeter.call("Dear", "Mr.", "Gorbachev") => "Dear Mr. Gorbachev"

 

What if you wanted to construct a greeter that always started with “Dear”? With a curried Proc or lambda, you very well can:

>> greeter = lambda do |greeting, salutation, name|
>>   "#{greeting} #{salutation} #{name}"
>> end
>> dear_greeter = greeter.curry.call("Dear")
=> #<Proc:0x007f902ba542f0 (lambda)>

With dear_greeter defined, you can use it like so:

>> dear_greeter.call("Great").call("Leader")
=> "Dear Great Leader"

Of course, if you find .call slightly verbose, you can always write it this way:

>> dear_greeter.("Great").("Leader")
=> "Dear Great Leader"
Here, dear_greeter is constructed from greeter by partially applying the first argument. Let’s see another example:

sum_ints = lambda do |start, stop|
  (start..stop).inject { |sum, x| sum + x }
end

sum_of_squares = lambda do |start, stop|
  (start..stop).inject { |sum, x| sum + x*x }
end

sum_of_cubes = lambda do |start, stop|
  (start..stop).inject { |sum, x| sum + x*x*x }
end

What do you notice? All three of the preceding lambdas have the same structure:

sum = lambda do |start, stop|
  (start..stop).inject { |sum, x| sum + ??? }
end

 

This suggests that some form of refactoring can occur. The only thing that is different is the portion marked as ???. It’s straightforward to extract out the common logic and make it an argument:

sum = lambda do |fun, start, stop|
(start..stop).inject { |sum, x| sum + fun.call(x) }
end
So what does this buy you? Well, now you can make use of sum to build other methods:
sum_of_ints = sum.(lambda { |x| x }, 1, 10)
sum_of_squares = sum.(lambda { |x| x*x }, 1, 10)
sum_of_cubes = sum.(lambda { |x| x*x*x }, 1, 10)

 

Of course, even doing it this way requires you to specify all the arguments up front. What if you wanted to calculate sum_of_squares but wanted to defer supplying the ranges? For example, a user has the option to select the range via a form. Here’s currying to the rescue:

sum_of_squares = sum.curry.(lambda { |x| x*x })
Now, you can make use of sum_of_squares with any valid range you desire:
sum_of_squares.(1).(10)   # => 385
sum_of_squares.(50).(100) # => 295475

 

That’s enough currying for now. It’s time to open that terminal and work that brain with some exercises.

Test Your Understanding!

1. Reimplement Symbol#to_proc. Now that you’ve seen how Symbol#to_proc is implemented, you should have a go at it yourself.

2. You can use #to_proc to instantiate classes. Consider this behavior:
class SpiceGirl
def initialize(name, nick)
@name = name
@nick = nick
end
def inspect
"#{@name} (#{@nick} Spice)"
end
end
spice_girls = [["Mel B", "Scary"], ["Mel C", "Sporty"],
["Emma B", "Baby"], ["Geri H", "Ginger",], ["Vic B", "Posh"]]
p spice_girls.map(&SpiceGirl)
This returns:
[Mel B (Scary Spice), Mel C (Sporty Spice),
Emma B (Baby Spice), Geri H (Ginger Spice), Vic B (Posh Spice)]

 

This example demonstrates how to_proc can be used to initialize a class.

Implement this.

3. How can you tell the difference between a Proc and lambda in Ruby?

4. What is the class of proc {}? What about lambda {}?

5. Which of these will cause an error? Why?

join_1 = proc { |x, y, z| "#{x}, #{y}, #{z}" }
join_2 = lambda { |x, y, z| "#{x}, #{y}, #{z}" }

join_1.call("Hello", "World")
join_2.call("Hello", "World")
6. Which of these will cause an error? Why?
join_1 = proc { |x, y, z| x + y + z }
join_2 = lambda { |x, y, z| x + y + z }
join_1.call(1, 2)
join_2.call(1, 2)

 

Building Your Own Lazy Enumerables

Lazy Enumerables

In this blog, you’ll implement your own cheap counterfeit of Ruby’s lazy enumerable using the concepts you’ve learned in the rest of this blog. Going through this process will also expose you to some of the more advanced concepts of blocks, enumerables, and enumerations.

 

Understanding Lazy Enumerables

Let’s do a brief recap of terminology before we dive into lazy enumerables. What’s the difference between an enumerable and an enumerator?

 

In Ruby, Enumerable is a module mixed into collection classes (such as Array and Hash) that provides methods for traversal, searching, and sorting.

An Enumerator, on the other hand, is an object that performs the actual enumeration. There are two kinds of enumeration, internal and external, which will be explained in External vs. Internal Iteration.
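A quick illustration of the distinction:

arr  = [1, 2, 3]   # Array mixes in Enumerable
enum = arr.each    # calling each without a block returns an Enumerator
enum.class         # => Enumerator
enum.next          # => 1 -- external iteration, one element at a time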

 

Lazy enumeration was introduced in Ruby 2.0. What exactly is lazy? Is Ruby trying to slack off? Well, the “lazy” in lazy enumeration refers to the style of evaluation. To understand this better, let’s review the opposite of lazy evaluation: eager evaluation.

 

You’re already familiar with eager evaluation, as that’s the usual way most Ruby code is written. But sometimes, as in life, being overly eager is a bad thing. For instance, what do you think the following code evaluates to?

 

>> 1.upto(Float::INFINITY).map { |x| x * x }.take(10)

You might expect the result to be:

[1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

Unfortunately, the Ruby interpreter just goes on and on infinitely. The main offender is this piece:

 

1.upto(Float::INFINITY)

1.upto(Float::INFINITY) represents an infinite sequence:

>> 1.upto(Float::INFINITY)

=> #<Enumerator: 1:upto(Infinity)>

 

No surprise here; that expression returns an enumerator. Note that no results are returned at this point, just a representation of an infinite sequence.

Now, let’s try to force values out from the enumerator using the Enumerator#to_a method. Try it on a small and finite sequence first:

>> 1.upto(5).to_a
=> [1, 2, 3, 4, 5]

Now, repeat the same method call, but this time on the infinite sequence:

>> 1.upto(Float::INFINITY).to_a

 

You shouldn’t be surprised by now that this leads to an infinite loop. Enumerator#to_a “forces” values out of an enumerator.

 

As an interesting side note, to_a is aliased to force. This method is useful when you want to know all the values produced by an enumerator. You will be using this method later on.

 

So how can you convince Ruby not to evaluate every single value? Enter Enumerable#lazy. This method creates a lazy enumerable. Now, to infinity and beyond:

>> 1.upto(Float::INFINITY).lazy.map { |x| x * x }
=> #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator: 1:upto(Infinity)>>:map>

With lazy, the 1.upto(Float::INFINITY) enumerator has been made lazy by being wrapped up in an Enumerator::Lazy class, which has a lazy version of map.

 

Let’s try the very first expression that caused the infinite computation, but this time with Enumerable#lazy:

>> 1.upto(Float::INFINITY).lazy.map { |x| x * x }.take(10)
=> #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator::Lazy: #<Enumerator: 1:upto(Infinity)>>:map>:take(10)>

What just happened? Instead of getting ten values, it turns out that even Enumerable#take is wrapped up! How can you get your values then?

Enumerator#to_a to the rescue:

>> 1.upto(Float::INFINITY).lazy.map { |x| x * x }.take(10).to_a
=> [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]

Why is Enumerable#take also wrapped up? This lets you do more lazy chaining. It allows you to hold off getting values out of the enumerator up until the point where you really need the values.
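For instance, here’s an assumed example of chaining one more lazy operation before forcing any values:

1.upto(Float::INFINITY)
  .lazy
  .map    { |x| x * x }
  .select { |x| x.even? }
  .first(3) # => [4, 16, 36]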

 

How does this sorcery work behind the scenes? You are about to find out. You will create your own version of lazy enumerables, albeit a minimalistic version. Along the way, you will also learn interesting aspects of Ruby’s enumerators that you probably wouldn’t have known about. Let’s get started.

 

Building Our Skeleton

Skeleton

First, you have to decide where the implementation should live. To do that, you have to find out where the Enumerator::Lazy class lives. If you head over to the official documentation, you’ll find a clue: Lazy is nested under the Enumerator class.

So, the Enumerator class is the parent of the Lazy class. This is easy enough to translate to code:

class Lazy < Enumerator
end
For our exercise, we’ll use another name instead of reopening the existing Ruby class. A quick trip to the thesaurus yields a synonym for Lazy. Introducing Lax:
lazy_enumerable/skeleton/lax.rb
class Lax < Enumerator
end

Notice that we’re inheriting from Enumerator. Let’s look at why.

 

External vs. Internal Iteration

What does inheriting from Enumerator buy you? In order to answer that question, let’s review the Enumerator class. According to the documentation, Enumerator is:

 

“A class which allows both internal and external iteration.”

What is the difference between the two flavors of iteration? The key lies in who controls the iteration: the enumerable or the enumerator. For internal iteration, it is the Array object (or any Enumerable) that controls the iteration. In fact, that’s how you normally interact with Enumerables.

 

External iteration, on the other hand, is controlled by some other object wrapped around an Enumerable. Why would you want external iterators in the first place? Sometimes, you do not want to iterate through all of the elements in one pass.

 

You might want to say, “Give me exactly one now, and when I need the next one, I will ask again.” In other words, external iterators let you control the state of the enumeration. That lets you pause and rewind the enumeration as you see fit.

Internal iterators do not give you that ability. Once you kick-start an enumeration, there’s no turning back.
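Here is a small sketch of that pause-and-rewind ability using an external iterator:

e = [10, 20, 30].each   # an Enumerator over the array
e.next    # => 10
e.next    # => 20
e.rewind  # start over
e.next    # => 10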

 

Creating an Enumerator from an Enumerable

Remember, an Enumerator wraps an Enumerable. You can see this in action in an IRB session:

>> e = Enumerator.new([1,2,3])
warning: Enumerator.new without a block is deprecated; use Object#to_enum
>> e.next
=> 1
>> e.next
=> 2
>> e.next
=> 3
>> e.next
StopIteration: iteration reached an end
from (IRB):7:in `next'

When you wrap an array with an enumerator, you can then call the Enumerator#next method multiple times to retrieve the next value. When there are no more values left, the StopIteration exception is raised.

 

Notice that in the first snippet you entered, Ruby complains and suggests either using Object#to_enum or creating the enumerator with a block. Let’s pick the second option and use a block:

>> e = Enumerator.new do |yielder|
>>   [1,2,3].each do |val|
>>     yielder << val
>>   end
>> end
=> #<Enumerator: #<Enumerator::Generator:0x007fb9798e0668>:each>

And as usual, we can call Enumerator#next:

>> e.next
=> 1
>> e.next
=> 2
>> e.next
=> 3
>> e.next
StopIteration: iteration reached an end
from (IRB):16:in `next'
from (IRB):16
from /Users/benjamintan/.rbenv/versions/2.2.0/bin/irb:11:in `<main>'

 

Let’s look at the code again, because there’s more than meets the eye. There are a few questions that come up:

e = Enumerator.new do |yielder|

[1,2,3].each do |val|

yielder << val

end

end
  • 1. What is this yielder object that is passed into the block?
  • 2. What does yielder << value do?
  • 3. Most importantly, how is it possible that simply wrapping an Enumerable enables the ability to retrieve each element of the enumerable one by one?

 

Let’s tackle the first two questions. The yielder object is passed into the block when an Enumerator object is created with a block. The purpose of the yielder object is to store the instructions for the next yield. Note that it stores the instructions, not the value: yielder << val specifies what should be handed out the next time a value is requested.

 

Here’s an interesting and potentially confusing side note: the << is aliased to the yield method, but it has nothing to do with the yield keyword. When the Enumerator#next method is called, it emits the next value and returns.
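A minimal sketch showing that the two spellings do the same thing:

e = Enumerator.new do |y|
  y.yield 1   # the yield *method* on the yielder...
  y << 2      # ...is the same operation as <<
end

e.to_a # => [1, 2]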

 

This suggests that the yielder object must keep some form of state. How is this achieved? The return value from the previous code listing gives us a clue:

#<Enumerator: #<Enumerator::Generator:0x007fb9798e0668>:each>

This tells us that an Enumerator object contains another object called Enumerator::Generator.

 

Generators and Fibers, Oh My!

Let’s take a detour and explore the Enumerator::Generator class. Generators can convert an internal iterator, such as iterating over [1,2,3] with each, into an external one. Generators are the secret sauce that allows the one-by-one retrieval of the elements of an enumerable.

 

Here’s how a generator works:

  • 1. First, it computes some result.
  • 2. This result is handed back to the caller.
  • 3. In addition, it also saves the state of the computation so that the caller can resume that computation to generate the next result.

 

One way to do this is to use a little-known Ruby construct called a fiber. The Fiber class is perfect for converting an internal iterator to an external one. You probably won’t use fibers often, but they’re pretty fun to explore. Let’s do a quick run through of the basics.

 

You create a fiber with Fiber.new, with a block that represents the computation:

f = Fiber.new do
x = 0
loop do
Fiber.yield x
x += 1
end
end

 

This block contains an infinite loop. Note the Fiber.yield method in the loop body. That, dear reader, is the secret sauce! Before we get into more details, try running the example in IRB:

>> f = Fiber.new do
>> x = 0
>> loop do
>> Fiber.yield x
>> x += 1
>> end
>> end
=> #<Fiber:0x007fb979023e58>

 

When you create a fiber like this, the block isn’t executed immediately. So how then is the block executed?

This is done with the Fiber#resume method. Observe:
>> f.resume
=> 0
>> f.resume
=> 1
>> f.resume
=> 2

Now back to the secret sauce. What you have just created here is an infinite number generator. The reason the loop doesn’t run indefinitely is the behavior of the Fiber.yield method, not to be confused with the yield keyword.

 

When the code executes Fiber.yield x, the result is returned to the caller, and control is given back to the caller. When Fiber#resume is called again, the variable x is incremented. The loop goes for another round, executing Fiber.yield x again, and once again gives control back to the caller.
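To tie this back to generators, here’s a sketch (not Ruby’s actual Enumerator internals) of using a fiber to step through an internal iterator one element at a time:

fib = Fiber.new do
  [1, 2, 3].each { |x| Fiber.yield x }   # internal iteration inside the fiber
end

fib.resume # => 1 -- each call hands back one element and pauses
fib.resume # => 2
fib.resume # => 3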

 

Keep this ability of fibers to start, pause, and resume execution in mind as we move into the next section; it’ll help you understand what’s happening as we build our implementation.

 

Implementing Lax

Implementing Lax

Now that you understand how generators work, let’s turn our attention back to our Enumerator::Lazy-style implementation. You won’t have to use Enumerator::Generator directly, since that is taken care of for us by the yielder object.

 

Let’s think about how the client would use your code. In fact, we will try to mimic the real implementation as far as possible. Therefore, here’s an example:

1.upto(Float::INFINITY)
  .lax
  .map { |x| x*x }
  .map { |x| x+1 }
  .take(5)
  .to_a

This should return [2, 5, 10, 17, 26].

 

Here’s where the sleight of hand comes in. When the lax method is invoked, an instance of the Lax enumerator is returned. When map is called, this method is called on a Lax instance, not the map defined on the enumerable.

 

To enable something like 1.upto(Float::INFINITY).lax and [1,2,3].lax and return a new Lax instance, you have to add a new method to the Enumerable module:
lazy_enumerable/skeleton/lax.rb
» module Enumerable
»   def lax
»     Lax.new(self)
»   end
» end

class Lax < Enumerator

 

Within lax, self is the actual enumerable, and it is passed in as an argument to the Lax constructor. If you run the code now, you won’t get any errors, but you will still get the infinite loop. That’s because all you did was provide an extra level of indirection without doing anything interesting. The next step is to populate the yielder, so let’s do that:
lazy_enumerable/skeleton/lax.rb
1: class Lax < Enumerator
2: »   def initialize(receiver)
3: »     super() do |yielder|
4: »       receiver.each do |val|
5: »         yielder << val
6: »       end
7: »     end
8: »   end
9: end
super vs. super()

 

Notice that the call to super() on line 3 has parentheses. These are absolutely necessary.

The difference between super and super() is that the former passes all of the current method’s arguments along to the method of the same name in the superclass, while the latter calls that method with no arguments at all. The short sketch below illustrates the difference.
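Here’s a small standalone sketch (with made-up classes) contrasting the two:

class Base
  def initialize(*args)
    puts "Base got: #{args.inspect}"
  end
end

class PassesAll < Base
  def initialize(a, b)
    super       # forwards a and b
  end
end

class PassesNone < Base
  def initialize(a, b)
    super()     # forwards nothing
  end
end

PassesAll.new(1, 2)  # prints: Base got: [1, 2]
PassesNone.new(1, 2) # prints: Base got: []

Even with this tiny piece of code, you can already iterate through an infinite collection one by one: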

>> e = 1.upto(Float::INFINITY).lax
=> #<Lax: #<Enumerator::Generator:0x007f8324155c30>:each>
>> e.next
=> 1
>> e.next
=> 2
>> e.next
=> 3

Let’s look through the code a little more closely. On line 2, a Lax instance is created by taking in an enumerable, represented by receiver. Next, the argumentless initializer of Enumerator is invoked and given a block. The iteration is defined by the block given to the enumerator on lines 3-7. The enumerable receiver is essentially wrapped by this block.

 

Line 4 doesn’t trigger immediately when you create a Lax instance, which is the whole point of creating an enumerator and wrapping it around the enumerable. The block is only invoked when Enumerator#next is called, for example.

 

Each time Enumerator#next is called, and while there are still elements left to iterate over, each value is stuffed into yielder on line 5, and in turn, the yielder hands the values over to the chained method. Let’s illustrate this with an example:

class Lax < Enumerator
  def initialize(receiver)
    super() do |yielder|
      receiver.each do |val|
»       puts "add: #{val}"
        yielder << val
      end
    end
  end
end

lax = Lax.new([1, 2, 3])
lax.map { |x| puts "map: #{x}"; x }
Running this produces the following output (line breaks added for clarity):

add: 1
map: 1

add: 2
map: 2

add: 3
map: 3

The most important point to note here is that add and map are interspersed. Each time a value is added to the yielder, the yielder then hands the value over to the map block.

Now, let’s say you were to replace the last line with this:

lax.take(0)

 

You should expect no output, and indeed there is none. You can think of it as supply and demand: the enumerator and yielder control and manage the supply, while methods such as Enumerator#next and Enumerator#map create the demand. Values get added to the yielder only when needed. With the foundation in place, let’s implement the map method.

 

Implementing Lazy map

Let’s implement method chaining on methods such as map and take. Enumerable#lax returns a Lax instance, which means we need to define Lax versions of map and take. Each invocation of map and take will in turn return yet another Lax instance.

 

Recall how the original implementation “wraps” each method call with a “lazy layer.” You’ll do exactly the same thing here.

 

How would the map method in Lax look? Here’s a start:

lazy_enumerable/skeleton/lax.rb
def map(&block)
end

 

You also know that you need to return a new Lax instance. And since map is going to be chained, you should expect that self is going to be another Lax instance:

lazy_enumerable/skeleton/lax.rb
def map(&block)
»   Lax.new(self)
end

 

For the method’s logic, you’ll want to populate the yielder object with elements that are going to be modified by the map method. That means that you must somehow pass two things into the Lax instance in the map: the yielder object, and the element to be mapped on. You can pass these via a block argument like so:

lazy_enumerable/map/lax.rb
def map(&block)
» Lax.new(self) do |yielder, val|
» yielder << block.call(val)
» end
end

 

What does this do? It looks like the block is invoked with val, and the result is then passed into the yielder. That’s not completely accurate, though. Instead, it’s more that the instructions for how to produce a mapped value are stored inside the yielder object.

 

Since the Lax initializer accepts a block, the constructor needs to be modified:

lazy_enumerable/map/lax.rb
def initialize(receiver)
  super() do |yielder|
    receiver.each do |val|
»     if block_given?
»       yield(yielder, val)
»     else
»       yielder << val
»     end
    end
  end
end

Let’s trace through what happens when a block is supplied, using the Lax version of the map method.

When we supply a block, as in the case of the Lax#map method, it is yielded, where both the yielder object and the element val are given to the yield method.

Does it work? Let’s find out:

>> 1.upto(Float::INFINITY).lax.map { |x| x*x }.map { |x| x+1 }.first(5)
=> [2, 5, 10, 17, 26]

 

Enumerable#first(5) returns the first five elements of the enumerable, which gives you [2, 5, 10, 17, 26]. It is this method that “forces” the Lax enumerable to produce values. That takes care of map; let’s implement take.

 

Implementing Lazy take

Now that you’ve implemented map, you can have a go at implementing the take method. As its name suggests, Enumerable#take(n) returns the first n elements from the Enumerable.

 

As with the lazy version of map, the lazy version of take also returns a Lax instance, this time wrapping the behavior of Enumerable#take. Here’s how it looks:

lazy_enumerable/take/lax.rb
def take(n)
taken = 0
Lax.new(self) do |yielder, val|
if taken < n
yielder << val
taken += 1
else
raise StopIteration
end
end
end

 

The logic for take should be easy enough for you to follow. The interesting thing here is how take signals that the iteration has ended: when taken reaches the limit, a StopIteration exception is raised to break out of the block.

 

While the use of exceptions for control flow is generally frowned upon, this is exactly how Ruby implements take: an exception is raised once the enumerator has taken the right number of elements, and that exception is handled inside the constructor.
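As a side note, this pattern of raising StopIteration to end an iteration shows up elsewhere in Ruby too; for example, Kernel#loop quietly rescues it, which is why bare enumerator loops terminate cleanly:

e = [1, 2].each
loop do
  puts e.next   # prints 1, then 2; the third call raises StopIteration
end             # loop rescues StopIteration and exits normally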

 

What should be done once you get all the values you need? Well, nothing much really. All you need is to handle the exception silently in the initialize method:

lazy_enumerable/take/lax.rb
def initialize(receiver)
  super() do |yielder|
»   begin
      receiver.each do |val|
        if block_given?
          yield(yielder, val)
        else
          yielder << val
        end
      end
»   rescue StopIteration
»   end
  end
end

Before you take the code for another spin, here’s what you should have so far:

lazy_enumerable/final/lax.rb
module Enumerable
  def lax
    Lax.new(self)
  end
end

class Lax < Enumerator
  def initialize(receiver)
    super() do |yielder|
      begin
        receiver.each do |val|
          if block_given?
            yield(yielder, val)
          else
            yielder << val
          end
        end
      rescue StopIteration
      end
    end
  end

  def map(&block)
    Lax.new(self) do |yielder, val|
      yielder << block.call(val)
    end
  end

  def take(n)
    taken = 0
    Lax.new(self) do |yielder, val|
      if taken < n
        yielder << val
        taken += 1
      else
        raise StopIteration
      end
    end
  end
end

With that in place, let’s try out the code:

>> 1.upto(Float::INFINITY).lax.map { |x| x*x }.map { |x| x+1 }.take(5).to_a
=> [2, 5, 10, 17, 26]

 

If you get [2, 5, 10, 17, 26], then you should pat yourself on the back. If not, go back and review your implementation carefully.
