Ruby Language Basics (Best Tutorial 2019)


This tutorial discusses Ruby basics and the popular applications that utilize the Ruby programming language. It contains recipes for working with files and strings. I’ll start, however, with a section on the theory of manipulating strings.

 

It is designed to make you aware of certain ways of manipulating strings (and to give you some useful tools).

 

Manipulating Strings

The Ruby String class has plenty of methods. Following a somewhat “minimalist knowledge” approach (i.e., knowing only as much as is required), only some functions are discussed in this section.

 

Note that most of the methods discussed here return a new string (a part of, or a modified version of, the original) and do not modify the original string. The exceptions in this section are << and insert, which change the string they are called on.
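To make the difference between returning a copy and changing the original concrete, here is a minimal sketch. The variable names are only for illustration.

```ruby
# Most String methods return a new string and leave the receiver alone.
original = "hello"
shouted  = original.upcase     # a new string; original is untouched
puts original                  # hello
puts shouted                   # HELLO

# A few methods change the receiver in place instead.
greeting = "hello".dup         # dup gives a separate, modifiable copy
greeting << " world"           # << appends to greeting itself
puts greeting                  # hello world
```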

 

length or size

The length method is used to get the size (in characters) of the string. (The same can be achieved by using the size function.)


"abcd".length => 4

empty?

The empty? method returns true if the string is empty; otherwise, it returns false.

irb(main):001:0> "hello".empty?

=> false

irb(main):002:0> "".empty?

=> true

strip

strip removes the leading and trailing whitespace (and trailing NUL) characters.

" hello ".strip => "hello"

The lstrip and rstrip functions may be used to remove whitespace from only the left or the right side.
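As a quick sketch of the three variants side by side (inspect is used so that the remaining spaces are visible in the output):

```ruby
padded = "  hello  "

puts padded.strip.inspect    # "hello"
puts padded.lstrip.inspect   # "hello  "
puts padded.rstrip.inspect   # "  hello"
```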

<<

<< is used for concatenation.

irb(main):007:0> a = "hello"

=> "hello"

irb(main):008:0> a << "world"

=> "helloworld"

<=>

<=> compares two strings. It returns -1, 0, or 1 based on whether the first string is less than, equal to, or greater than the second.

irb(main):009:0> "hello" <=> "world"

=> -1

irb(main):011:0> 'ddd' <=> 'ccc'

=> 1

capitalize

capitalize returns a copy with the first letter capitalized and the rest in lowercase. (There are quite a few functions in Ruby’s String class that deal with the cases of letters.)


"hello".capitalize => "Hello"

"Hello".capitalize => "Hello"

"HELLO".capitalize => "Hello"

downcase and upcase

As the names suggest, downcase and upcase return strings with the case converted.

irb(main):003:0> "Hello".downcase

=> "hello"

irb(main):004:0> "Hello".upcase

=> "HELLO"

chars

chars returns an array corresponding to the characters in the string.

irb(main):001:0> "abracadabra".chars

=> ["a", "b", "r", "a", "c", "a", "d", "a", "b", "r", "a"]

index

index returns the index of the first occurrence of a character, substring, or pattern in a string. It returns nil if not found.

"Hello".index('e') => 1

 

The search can also start from an offset position: pass the index position (offset) as the second argument. For example, to look from the second index position (the third character) onward, pass 2.


irb(main):001:0> "Hello".index('e',1)

=> 1

irb(main):002:0> "Hello".index('e',2)

=> nil

In the latter case, 'e' does not occur at or after index 2 (the third character).

insert

insert inserts one given string into another, prior to the given index position.

irb(main):006:0> "abraabra".insert(4,'cad')

=> "abracadabra"

delete

delete returns a new string with characters deleted, as specified. It has a few different forms.


irb(main):007:0> "hello".delete "l" #delete any 'l' from "hello"

=> "heo"

irb(main):008:0> "hello".delete "lo" #delete any 'l' or 'o'

=> "he"

irb(main):009:0> "hello".delete "aeiou", "^e" #delete any of 'a','e', 'i','o','u' except 'e'

=> "hell"

irb(main):010:0> "hello".delete "ek-m" #delete any 'e' or any of 'k' to 'm'

=> "ho"

include?

include? returns a Boolean that indicates whether the argument string is part of the first string.


irb(main):005:0> "Hello world".include?("world")

=> true

 

slice

slice returns part of the string (somewhat like a substring function) or returns nil if the requested part lies outside the string. Note that this function has many forms. Only one form is discussed in this section.


<String>.slice(start index,length), e.g.

"hello".slice(1,3) => "ell"

 

count

count counts the given character(s). It has a few different forms.

irb(main):001:0> "hello".count('l') # how many 'l' in "hello"

=> 2
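The other forms of count use the same character-set notation as delete. A small sketch, with the results worked out by hand for the string "hello":

```ruby
word = "hello"

p word.count('l')     # 2  (the two l characters)
p word.count('lo')    # 3  (any 'l' or 'o')
p word.count('a-h')   # 2  (characters in the range a to h: 'h' and 'e')
p word.count('^l')    # 3  (everything except 'l')
```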

 

partition

partition partitions a string into an array of strings based on (the first occurrence of) a given character or pattern.


irb(main):012:0> "hello".partition('l')

=> ["he", "l", "lo"]

 

tr

tr transforms a string by replacing some characters with others, as specified. It has multiple forms.


irb(main):015:0> "hello".tr('l','m')

=> "hemmo"

irb(main):016:0> "hello".tr('a-f','x')

=> "hxllo"

 

reverse

reverse returns the reverse of the string.


irb(main):017:0> "hello".reverse

=> "olleh"

sub (and gsub)

 

sub and gsub have more than one form. One form is discussed in this section; it substitutes specified parts of the string with a replacement.

 

It works with patterns; however, patterns (which could be regular expressions) are discussed in detail later in the book. Here, only results with simple patterns are shown.

 

sub works for the first occurrence and gsub works for all occurrences in the string (gsub is a global substitution).


irb(main):007:0> "Hello".sub('H','W')

=> "Wello"

irb(main):008:0> "Hello".sub('l','x')

=> "Hexlo"

irb(main):009:0> "Hello".sub('ll','x')

=> "Hexo"

irb(main):010:0> "Hello".gsub('l','x')

=> "Hexxo"

 

scan

scan has multiple forms. The general (non-block) form returns an array by dividing the string into tokens of the given pattern. It is best understood in the context of regular expressions. However, it is a very important string function and hence mentioned here.

 

Suppose the pattern /[a-z]+/ means one or more contiguous characters that are anything from a to z. Take a look at the following as an example.


irb(main):018:0> "hello world".scan(/[a-z]+/)

=> ["hello", "world"]

It scans in the string for any such pattern (contiguous a–z). Two such patterns are found, and hence the returned array has those two patterns.

 

Let’s look at another example. Suppose /.../ means that a pattern is signified by any three contiguous characters (exactly three). Then, take a look at the following example.


irb(main):020:0> "hello world".scan(/.../)

=> ["hel", "lo ", "wor"]

It finds only three such patterns because the remaining ld is not three characters long.

 

split

split is a very important function, especially for recognizing columns in an input data file. It splits a string based on a given separator (or on whitespace, if no separator is specified). This is the general form:

str.split(pattern=$;, [limit]) => array

 

A full discussion of the function is not warranted at this point. However, the pattern is optional and could be a regular expression. The limit is also optional (limits, in general, indicate the number of columns that are to be returned; the last column includes the rest of the string).

 

If the limit is omitted, trailing empty fields are suppressed. If it is 1, the entire string is returned as the only element of the array. If it is negative, there is no limit to the number of fields returned; trailing null fields are not suppressed.
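The effect of the limit is easiest to see on a row that ends with empty fields. The following is a small sketch; the sample row is made up for illustration.

```ruby
row = "a,b,,"

p row.split(',')       # ["a", "b"]          trailing empty fields are dropped
p row.split(',', 1)    # ["a,b,,"]           the whole string as one element
p row.split(',', 2)    # ["a", "b,,"]        the last element keeps the rest
p row.split(',', -1)   # ["a", "b", "", ""]  trailing empty fields are kept
```

The negative limit matters for CSV-like data, where an empty last column should still count as a column.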

 

The pattern=$; syntax means that the default value of the pattern is $; (a predefined variable whose value is nil by default; when it is nil, the string is split on whitespace, as if the separator were a single space).

 

Now it is time for some demonstrations.


irb(main):001:0> arr = "hello world".split

=> ["hello", "world"]

irb(main):002:0> arr = "hello world".split(' ')

=> ["hello", "world"]

irb(main):003:0> arr = "hello world".split('  ')

=> ["hello world"]

 

Note that when two spaces are given as the split pattern, the resulting array has only one element (it could not split on the space in between, because that is a single space).


irb(main):004:0> arr = "hello world".split('ll')

=> ["he", "o world"]

irb(main):005:0> arr = "abc,def,ghi".split(',')

=> ["abc", "def", "ghi"]

 

Comma separation is especially useful for CSV file manipulation.


irb(main):006:0> arr = "John,Doe,101 Nowhere Street".split(',',2)

=> ["John", "Doe,101 Nowhere Street"]

irb(main):007:0> arr = "John,Doe,101 Nowhere Street ".split(',',2)

=> ["John", "Doe,101 Nowhere Street "]

irb(main):009:0> arr = "John,Doe,101 Nowhere Street ".split(',')

=> ["John", "Doe", "101 Nowhere Street "]

Note how specifying the limit restricts the returned array to two elements; the last element has the rest of the string.

 

String Formatting

A string can be formatted in particular ways to print a number in some desired format.

The following example briefly illustrates this.


puts "zero padding"

x = "%05d" % 123 # should be "00123"

puts x

puts "decimal formatting"

y = "%.2f" % 34.9 # should be "34.90"

puts y
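When several values need formatting at once, the format method (which takes the same directives as the % operator) accepts them as separate arguments. The following is only a sketch; the column widths and sample values are arbitrary choices for illustration.

```ruby
name  = "Widget"
count = 7
price = 34.9

# %-10s : left-justified text, padded to 10 characters
# %5d   : integer, right-justified in 5 characters
# %8.2f : float with 2 decimals, right-justified in 8 characters
line = format("%-10s|%5d|%8.2f", name, count, price)
puts line

# The % operator does the same when given an array on the right-hand side.
same = "%-10s|%5d|%8.2f" % [name, count, price]
puts same
```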

 

Accepting Input from the Console

Problem

Take input from the console in Ruby.

 

Solution

If writing to the console uses puts, it is a natural logical extrapolation that the gets function should be used to read from the console.

 

If you are going to use scripting for batch programming alone, you will possibly never need to read input interactively from the console. However, this is a rather basic function of Ruby (and indeed of programming tasks in general) and worth discussing here.

 

Note that this is not the only way you can take input from the console, but perhaps this is the most generally programmatic way for Ruby to take input from a console.

 

Well, a demonstration is in order. Run the following piece of code. Write it in a file and give it a name, such as inp1.rb. Save it and then run it from the console.

x = gets

 

puts x

You will find that the execution got stuck at a point (the beginning of the next line to the command) without coming back to the command prompt.

=>ruby inp1.rb

 

This is because it is waiting for user input from the console (call to gets). Provide the number 3 as input and press the Return key. You should observe the following behavior.

=>ruby inp1.rb

3

3

How It Works

It takes in the value in the x variable and prints it (the value of x) through the puts statement. Try experimenting with a few other inputs (of different types) and see what happens. (Remember to press the Return key every time after you input).


=>ruby inp1.rb

2.5

2.5

=>ruby inp1.rb

c

c

=>ruby inp1.rb

abc

abc

It seems to be handling different types very well at the outset. Note, however, that it is actually taking everything as a string. So the 3 that it printed was a String, not a Fixnum (Integer in Ruby 2.4 and later). But even that is not the full story.

Try the following code.


x = gets

puts x * 2

If you provide 3 as input, it does not produce 6; but you see something that might seem strange at first glance.

=>ruby inp2.rb

3

3

3

The first 3 is the input given, of course. The other 3s are the output. The occurrence of the string has been multiplied (i.e., essentially two strings added side by side), but notice also that the interpreted value of x has a newline character in it.
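The same behavior can be reproduced without typing anything. In this sketch the string "3\n" stands in for what gets returns when the user types 3 and presses Return.

```ruby
x = "3\n"          # what gets would hand back for the input 3

p x                # "3\n"     the newline is part of the string
p x * 2            # "3\n3\n"  the string repeated, newline included
p x.chomp * 2      # "33"      chomp removes the trailing newline first
```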

 

Accepting Numbers as Input

Problem

Accept numbers as input from the console.

 

Solution

Let’s untangle this part by part. First, how to take it as an integer (I would use “integer” instead of Fixnum in many places because somehow it seems more natural).

Let’s convert the input using the to_i function.

If you run the following code


x = gets.to_i

puts x * 2

and provide 3 as input, the result, I would think, is quite as per expectation.

=>ruby inpint.rb

3

6

 

How It Works

The to_i function did its job of converting the input to an integer. (Notice the newline is also no longer an issue here.) It would not require a great stretch of the imagination to guess that to_f is the corresponding function for converting to Float.

The following code


x = gets.to_f

puts x * 2

with 2.5 as input, should run as follows.

=>ruby inpflt.rb

2.5

5.0
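It is worth knowing what to_i does with input that is not a clean number. The sketch below uses literal strings in place of console input; Integer() is shown as a stricter alternative, in case silent zeros are not wanted.

```ruby
p "3\n".to_i      # 3    the trailing newline is ignored
p "2.5\n".to_f    # 2.5
p "12abc".to_i    # 12   parsing stops at the first non-digit
p "abc".to_i      # 0    no leading number at all, so the result is 0

# Integer() raises an error instead of quietly returning 0.
begin
  Integer("abc")
rescue ArgumentError => e
  puts "not a number: #{e.message}"
end
```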

 

More on Getting Rid of the Newline

I guess that this is serious enough to warrant a bit more demonstration.

Consider a programmer writing code in a language that requires each statement to end with a semicolon. He is doing this on a Friday evening with a bottle of beer on his desk. (His office environment is rather relaxed, especially on Friday afternoons. Besides, who would notice? His boss was also drinking.) He needed to finish the piece of code soon; otherwise, he would not have done it while having a beer.

 

After a long while, he noticed that he forgot all the semicolons; although he is pretty sure (that’s what he is saying) that everything else is OK and that the code should be otherwise bug-free.

 

By this time he already had a good amount of beer in his system. He does not feel like editing the file just to put in so many semicolons at the end of each line, but he needs to compile the code.

 

Suppose he comes to you for a solution. Perhaps you could do something with a bit of Ruby scripting so that he can quickly compile and test the code, get it done and over with, and head to the nearest pub.

 

■■Note This particular scenario could very easily be done by a regex substitution through a good text editor; but as of now, we will focus on a Ruby solution.

 

As a first approximation, you may want to try the following code (assume the input file name is coord.txt).


infile = File.open('coord.txt','r')

outfile = File.open('modcoord.txt','w')

while (line = infile.gets)

outfile.puts line + ';'

end

outfile.close

infile.close

You will almost be successful—but not quite. The output file has the following content.

int x = 0

;

int y = 0

;

int r = 5

;

float areavar = x * x + y * y - r * r;

 

This is because, except for the last line, each line in the input file comes with a trailing newline character. When the line is read, the character, along with other parts of the line, is added to (assigned as part of) the line variable.

 

When the output string is constructed, the newline part is still there, and hence, the line breaks as they appear.

 

Change the line containing puts as follows


outfile.puts line.chomp + ';'

and then run the program (after saving the file, of course). Now you are truly successful.

This is the output:

int x = 0;

int y = 0;

int r = 5;

float areavar = x * x + y * y - r * r;

 

Again, you can see that chomp is a very useful function in Ruby batch programming. Note that it works equally well when there is a carriage return character along with the newline at the end, and it does not create trouble if there is no newline at the end.


irb(main):003:0> str1="abc\n"

=> "abc\n"

irb(main):004:0> str1.chomp

=> "abc"

irb(main):005:0> str2="abc\r\n"

=> "abc\r\n"

irb(main):006:0> str2.chomp

=> "abc"

irb(main):007:0> str3="abc"

=> "abc"

irb(main):008:0> str3.chomp

=> "abc"

Note that this function can also take an argument (record separator), although this form is highly unlikely to be used in practice.

irb(main):009:0> str="abcd"

=> "abcd"

irb(main):010:0> str.chomp("d")

=> "abc"

If nothing is given as an argument (as in the case study of putting a semicolon at the end of each line), chomp uses the default record separator (the $/ variable, normally a newline). With that default, it removes a trailing \n, \r\n, or \r.
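The semicolon fix can also be tried on an in-memory string with mixed line endings, which shows chomp coping with each of them. This is a sketch only; add_semicolons is a hypothetical helper name, not part of Ruby.

```ruby
# Hypothetical helper: appends a semicolon to every line of the given text.
def add_semicolons(text)
  text.each_line.map { |line| line.chomp + ';' }.join("\n")
end

# One line ends with \n, one with \r\n, and the last has no line ending.
source = "int x = 0\nint y = 0\r\nint r = 5"
puts add_semicolons(source)
```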

 

Formatting Strings

Problem

You need to have a string formatted the way that you want, with one or more variables replaced with their proper values.

 

This is especially useful for reporting purposes but also has many other uses. Think of a use case where you have been given a letter format with a subject and text, but the addressee is given as a variable whose values may come from a list of people. Essentially, it is the same letter to be sent to multiple people, but addressing each of them separately by name.

 

Note that in this case (as indeed in many other cases), adding multiple strings with blanks (e.g., + " " +) is far from the ideal solution.

Solution


Consider the following code.

name = "John"

puts "Hello #{name} how are you ?"

It should run as follows.

=>ruby formstr.rb

Hello John how are you ?

 

How It Works

As you can see, the variable placeholder is defined by the variable name encased in #{} within the string (i.e., using a #{<variable name>} construct within the string).

Note that in this case (i.e., for formatted string), the double quotes cannot be replaced by single quotes. In other words, the following code won’t work.


name = "John"

puts 'Hello #{name} how are you ?'

Here, the #{name} part is taken literally, and not as an interpreted value.

=>ruby formstr_bad.rb

Hello #{name} how are you ?

 

It is still a valid string, however, and hence no error is thrown.

Does this substitution also work for other basic types of variables, such as Float? What prevents us from experimenting? Write the following code in a file named formstr2.rb. Save the file and run the code.

company = "Rhombus Inc"

year = 2015

total = 1289965.45

puts "In year #{year} net sales of #{company} was #{total} dollars."

The result is not disappointing.

=>ruby formstr2.rb

In year 2015 net sales of Rhombus Inc was 1289965.45 dollars.

Evidently, it also works with integer and float values in the same fashion.

 

Processing Command-Line Arguments

Problem

For any serious programming language, being able to accept a command-line argument is probably indispensable. In Java, for instance, the main method has an argument that is an array of strings. These arguments to the main method come from command-line arguments (if any).

 

Solution

In a Ruby script, command-line arguments are available in a predefined constant (an array) named ARGV (note that the name is case sensitive).

Run the following code the usual way.


name = ARGV[0]

puts "Hello #{name} how are you ?"

The result is not very impressive.

=>ruby argvtst.rb

Hello  how are you ?

Use a command-line argument, however, and the result is better.


=>ruby argvtst.rb John

Hello John how are you ?

 

How It Works

Consider the following:

The script does not wait for the argument; if none is given, ARGV[0] is simply nil and the placeholder is left empty.

In Ruby, ARGV[0] denotes the first argument rather than the program name (unlike C, where argv[0] is the program name; in Java, args[0] is also the first argument).

For the Ruby script, ARGV is already available in the context. Even though we did not have any explicitly defined main method with named parameter(s), predefined constants can be used this way.

 

An array is a type of collection that you are most likely familiar with from another programming language. Clearly, in Ruby arrays are zero-based (the index starts with 0) and the elements are accessed as <Array-name>[<index_number>]. (e.g., ARGV[0]).

 

In order to accept a second argument, you would use ARGV[1].

Try the following code. You shouldn’t be disappointed.


first_name = ARGV[0]

last_name = ARGV[1]

puts "Hello #{first_name} #{last_name} how are you ?"

It should run as follows.

=>ruby argvtst2.rb John Doe

Hello John Doe how are you ?

 

It is easy to extrapolate what you could do to work with three arguments.
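Because ARGV is an ordinary array, a script can also handle any number of arguments, or none, without indexing each one. The following is a sketch under that assumption; the greeting helper and the script name argvtst3.rb are made up for illustration.

```ruby
# Hypothetical helper: builds the greeting from an argument list such as ARGV.
def greeting(args)
  # With no arguments, explain how to call the script instead of greeting nobody.
  return "Usage: ruby argvtst3.rb FIRST_NAME [LAST_NAME]" if args.empty?

  "Hello #{args.join(' ')} how are you ?"
end

puts greeting(ARGV)
```

Run with John Doe as arguments, this prints the same greeting as before; run without arguments, it prints the usage line.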

 

Reading from a File

Problem

One of the very basic tasks that you may need to perform for a lot of scripting functionalities is reading data from a file.

 

Quite often you need to read part of a file based on certain criteria—for example, a column containing pieces of data with particular values (e.g., a person’s address). So, you need to know some basic operations, such as opening a file in reading mode.

 

Solution

In the blog code directory, create a file named input.txt with only one line of text that contains the word welcome.

 

In the same directory, you need a readfl.rb program file with the following content.

infile = File.open('input.txt','r')

myword = infile.gets

puts myword

infile.close

Run the code from the command prompt. It should look like this:

=>ruby readfl.rb

welcome

 

How It Works

A bit of explanation is in order.

The first line of code, infile = File.open('input.txt','r'), means this:
  1. Open a file (from the current directory) named input.txt.
  2. Open it in reading mode.
  3. Store the file handler in a variable named infile.

The infile file handler is needed for further operations on the file.

 

In order to read a line, the gets function is used; however, this time it is called on the file handler (infile.gets), hence the instruction is to read the input from the file, which is then stored in the variable named myword.

 

The third line (puts myword) outputs the value of myword. Note that it is not using the #{} construct, because the value is output by itself, not embedded in another string.

 

The fourth line simply closes the file (which was opened for reading) by calling the close function on the handler.

The second and third lines could have been combined in a single line, as follows.

puts infile.gets

 

The same effect would have been achieved. But for the purpose of better understanding and clarity, the first form is preferable. (Consider someone else trying to maintain your code).

 

Further down the road (figuratively speaking), there are recipes that focus on more interesting issues related to reading from a file.
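One variation worth knowing is the block form of File.open, which closes the file automatically when the block ends. The sketch below creates its own throwaway input.txt in a temporary directory, so that it can run anywhere; in the recipe above, the file would already exist in the current directory.

```ruby
require 'tmpdir'

first_line = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, 'input.txt')    # throwaway copy of input.txt
  File.write(path, "welcome\n")

  # The file is closed when the block finishes, so no explicit close is needed.
  first_line = File.open(path, 'r') { |infile| infile.gets }
end

puts first_line
```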

 

Writing to a File

Problem

How do you write to a file in Ruby?

Solution

Try the following code.


outfile = File.open('output.txt','w')
myword = 'welcome'

outfile.puts myword

outfile.close

You should not be surprised if it works. The command prompt should reappear (after running the program) and you should find a file named output.txt created. Its first line has the word welcome, followed by a newline character (which is why an editor may show an empty second line).

 

How It Works

For opening the file (the first line of code), the file name was given. The mode, in this case, is w (open for writing). If you are to perform a write operation on a file, it cannot be opened in reading mode.

 

There are other modes possible, however (such as a for append mode, and r+ for “read and”—as in “read and write”). One of the valid modes can be chosen based on the use case. (However, at this point, let’s make do with the w mode).

 

The puts function has been called on the output file handler to output the value of the myword variable. This line (outfile.puts myword) does the writing. The last line closes the file.

 

If you run the code repeatedly, you will find that the file is getting overwritten (the modification timestamp should update). But how does the program behave if you are trying to read an input file that is not present?

 

Rename the file input.txt to input1.txt and run the readfl.rb program written earlier. It comes back with an error.

=>ruby readfl.rb

readfl.rb:1:in `initialize': No such file or directory @ rb_sysopen - input.txt (Errno::ENOENT)

from readfl.rb:1:in `open'

from readfl.rb:1:in `<main>'

This doesn’t look very nice, does it?

 

The situation is understandable, as the file does not exist. In a situation like this, it may be more desirable (especially for the end user of the batch code, who may be a non-IT person or may not have a programming background) to trap the error and provide a more user-friendly message.

 

(Remember, sometimes a set of error messages is more voluminous and it may be difficult even for the developer to quickly get to the real cause of the error). 
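Trapping the error could look roughly like the following sketch. It rescues only Errno::ENOENT, the error class shown in the message above; first_line_of and the file name are hypothetical and chosen for illustration.

```ruby
# Hypothetical helper: returns the first line of a file, or nil if it is missing.
def first_line_of(path)
  File.open(path, 'r') { |infile| infile.gets }
rescue Errno::ENOENT
  # Only the "no such file" case is handled here; other errors still surface.
  puts "Could not find the file '#{path}'. Please check the name and try again."
  nil
end

first_line_of('no_such_file_for_this_demo.txt')
```

Rescuing the specific error class, rather than everything, keeps unrelated problems (such as a permissions error) from being reported with a misleading message.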

 

Creating and Deleting Directories

Problem

Working with directories may be needed for many day-to-day tasks. For example, consider that you have been given a directory that includes a lot of files with different extensions.

 

Some of them have SQL extensions, which are code files; others have data extensions, which are data files.

 

You may want to separate those files, based on their extensions, into two different directories. Working with directories in such fashion is perhaps not done as often as reading from or writing to files, but it is still very useful knowledge.

 

Solution

Run the following code.


require 'fileutils'

FileUtils.mkdir('credit')

A directory named credit is created under the current directory (unless it exists already, of course).

Deleting isn't hugely different. Replacing the mkdir function with the rm_rf function should do the trick.

require 'fileutils'

FileUtils.rm_rf('credit')

 

■■Note rm_rf does not complain if the directory does not exist. Creating is different: FileUtils.mkdir raises an error (Errno::EEXIST) if the directory already exists, whereas mkdir_p does not. So if your program is working with the expectation that the directory should always exist prior to deletion, you may want to put checks in place to ensure that it does.
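Such checks can be written with Dir.exist?. The following sketch works inside a temporary directory so that it does not touch your real files; the states array exists only to record what happened.

```ruby
require 'fileutils'
require 'tmpdir'

states = []
Dir.mktmpdir do |base|                      # scratch area, removed afterwards
  target = File.join(base, 'credit')

  # Create the directory only if it is not there yet.
  FileUtils.mkdir(target) unless Dir.exist?(target)
  states << Dir.exist?(target)              # the directory is there now

  # Delete only if it exists; otherwise report it.
  if Dir.exist?(target)
    FileUtils.rm_rf(target)
  else
    puts "Nothing to delete: #{target} does not exist"
  end
  states << Dir.exist?(target)              # the directory is gone again
end

p states
```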

 

How It Works

mkdir is a method defined in the FileUtils module (which can be accessed as FileUtils.mkdir).

If you do not load the fileutils.rb file (i.e., if you omit the require statement), FileUtils is unknown to the program, and running it would produce an error.

 

A module is a way of grouping together methods, classes, and constants. (Although that is not all a module is about).

 

If you were to define some methods that are not instance-specific (like static methods), a module may be a good place to define them. In Java, in a similar situation, you might have used a package, but the analogy is rather remote.
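As a rough sketch of what such a module might look like, the following defines one method on the module itself. TextTools and shout are invented names used only for illustration.

```ruby
# Hypothetical module, used only for illustration.
module TextTools
  # "def self.shout" defines a method that is called on the module itself,
  # in the same way that FileUtils.mkdir is called on FileUtils.
  def self.shout(text)
    text.upcase + '!'
  end
end

puts TextTools.shout('welcome')
```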

 

Creating a Whole Directory Path

Problem

How do you create an entire directory path (e.g., a/b/c) in Ruby?

 

Solution

A slight variation on creating a directory with mkdir is the mkdir_p function, which creates all directories in the path as required.

How It Works


Try the following code to see the effect.

require 'fileutils'

FileUtils.mkdir_p('region/div/dept')

 

Alternatively, for the same directory structure (i.e., multiple directories with the '/' separator), if you used mkdir instead of mkdir_p, things would not be so smooth. The following code produces an error.


require 'fileutils'

FileUtils.mkdir('region/div/dept')

For a better error message, for this case too, you can use rescue.

begin

require 'fileutils'

FileUtils.mkdir('region/div/dept')

rescue

puts 'Wrong function used'

end

Although that does not solve the problem of functionality, it does present the case nicely.

=>ruby crpath1.rb

 

Wrong function used

Reading Multiple Lines from a File

Problem

What if you need to read multiple lines instead of one from an input file? The following is the earlier program.


infile = File.open('input.txt','r')

myword = infile.gets

puts myword

infile.close

If you were to use it on a three-line input file, as follows, it would output the first line only.

welcome

to

Seattle

However, in a batch script, quite often you may need to scan through all the lines in an input file. In contrast, this program opens the file, reads only the first line, prints it out on the console, and closes the file.

 

Solution

Between opening and closing the file, the middle part (reading and printing out) needs to be repeated until all lines of the input file have been read. A while loop can be used to do the job nicely. Although this is not the only possible way, it can be considered a general enough approach.


infile = File.open('inplines.txt','r')

while (line = infile.gets)

puts line

end

infile.close

■■Note Prior to running the code, create a multiline text file in the directory named inplines.txt.

 

The code should run as expected and print all three lines.


=>ruby readmulti.rb

welcome

to

Seattle

 

How It Works

Ruby offers some control statements. (You would have already seen for). while is one such control statement. This is the normal construct of a while statement:


while (condition)

statements

end

Here, the while (condition) line serves as the beginning of the block and the end marks the end of the block. Let’s come back to the code for reading lines from the file. Note that this is the while line:


while (line = infile.gets)

This means that the condition is as follows.

line = infile.gets

 

But wasn’t it supposed to be a Boolean? Yet this is an assignment, isn’t it?

It is actually both an assignment and a Boolean. This is one of the peculiarities of Ruby if you are coming from a Java background, for instance.

 

First, for any assignment in Ruby, after the right-hand side expression is evaluated, the value that is assigned to the variable is the value of the whole assignment (i.e., the assignment itself evaluated to that value).

 

■■Note The same applies to return values of function calls in Ruby. (i.e., the function itself evaluates to that value).

 

Hence, the value of the statement a = 2 + 3 + 5 is 10, which is the value posted to the variable after the expression evaluation. So the variable and the assignment as a whole both evaluate to 10 in this case.

 

Second, the gets function (used on the file object) returns nil if there is no line left to read. nil in Ruby is equivalent to null (as in Java). And nil is treated as false in a Boolean context (for instance, if you supply nil where a condition is expected, nil is taken as false).
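Both points can be checked in a few lines. This is a small sketch; the list of sample values is arbitrary.

```ruby
result = (a = 2 + 3 + 5)   # the assignment itself evaluates to 10
p result                   # 10
p a                        # 10

# Only nil and false count as false in a condition; everything else,
# including 0, an empty string, and a line of text, counts as true.
[nil, false, 0, "", "welcome\n"].each do |value|
  puts "#{value.inspect} is treated as #{value ? 'true' : 'false'}"
end
```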

 

Hence, when no more lines are found (all the lines have been read), infile.gets returns nil, which is the value of the assignment at that point, which in this case means false, and the while loop ends.

 

So long as the lines are available (unless there are any other errors that prevent reading), they are assigned in turn to the line variable (for a proper line, the assignment would evaluate to true and the while will go on).

 

The line variable may be used inside the loop body for processing purposes. The choice of the name of the variable representing a line (line in this case) is arbitrary.

 

This construct with while (line = infile.gets) is a convenient way to read an input file, line by line, and process the data therein.
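Another common way to get the same line-by-line behavior is File.foreach, which opens the file, hands each line to a block, and closes the file afterwards. The sketch below writes its own throwaway inplines.txt to a temporary directory so that it can run anywhere.

```ruby
require 'tmpdir'

lines = []
Dir.mktmpdir do |dir|
  path = File.join(dir, 'inplines.txt')       # throwaway copy of the input file
  File.write(path, "welcome\nto\nSeattle\n")

  # Each line arrives with its trailing newline, so chomp it before storing.
  File.foreach(path) do |line|
    lines << line.chomp
  end
end

puts lines
```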

 

Things can get slightly better (definitely from a typing point of view) with while. Remember the part about function arguments not requiring parentheses? This makes puts "Hello" and puts("Hello") equivalent. Hence, this statement


infile = File.open('inplines.txt','r')

can be replaced with the following one.

infile = File.open 'inplines.txt','r'

(Don't miss the gap between the end of open and the beginning of 'inplines.txt'.)

 

Well, it happens with conditions too. Hence, you can safely omit the parentheses and write the while line (the line containing while and condition) as follows.


while line = infile.gets

Our earlier code for a multiline read may boil down to this:

infile = File.open 'inplines.txt','r'

while line = infile.gets

puts line

end

infile.close

 

It still works as usual. In fact, if you need only one statement inside the while (like here), you can make the entire while loop inline instead of the while block, and it still works.


puts line while line = infile.gets

The code is now three lines in total.

infile = File.open 'inplines.txt','r'

puts line while line = infile.gets

infile.close

 

Reading a File in One Shot

Problem

You need to read the whole file in one shot (i.e., in a single string).

Solution

Use the read function on the file as follows, for example.


text = File.read 'inplines.txt'

puts text

It should run as follows, printing all the lines (which are part of the text string variable in this program).

=>ruby fullfl.rb

welcome

to

Seattle

 

How It Works

First, be aware that for a rather big file, this may not be a good idea. Reading the entire file directly into memory can stop a machine in its tracks if the file is too big. It should be done only when you know in advance how big the file is and you’ve got plenty of RAM.

Second, note that the code is not using any file handler (so the file need not be explicitly closed from the program).

Third, the newlines are part of the string. (This has to be catered for if you want to extract individual lines from the text string for processing).

 

One use case (but not the only one by any means), could be when you are looking for multiple occurrences of a particular word in a file, but the word may be split across lines (without a hyphen or space at the split point, just the newline).
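That use case can be sketched as follows. The text variable stands in for the result of File.read on a file whose lines break in the middle of a word; the sample content is made up.

```ruby
# Stands in for File.read on a file where "Seattle" is split across lines.
text = "Seattle is rainy. Sea\nttle is green. Seat\ntle is home.\n"

p text.scan('Seattle').length      # 1  only the unbroken occurrence is found

joined = text.delete("\n")         # remove the newlines so split words rejoin
count  = joined.scan('Seattle').length
p count                            # 3
```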

 

Working with Strings

Problem

You want to work with strings in Ruby.

Solution

If you know any other programming language, chances are that you already understand strings.

Strings in Ruby are a sequence of characters (or bytes) that are typically used to represent text. Strings are objects of the String class in Ruby.

 

How It Works

There are many ways to construct a string literal, not all of which are equally common (and hence, some are probably not worth thinking much about unless you want to learn the language comprehensively). In my opinion, the following are the more prominent ones.

 Encased in single quotes:

'This is a book'

'That isn\'t the case' => That isn't the case

'double quote " n' => double quote " n

 

Note: In this form, you cannot use variable substitution with #{expression}.

Encased in double quotes:


"Hello World"

"isn't it" => isn't it

"The value is #{2 +3}" => The value is 5

 

You would have already seen that the expression in #{expression} can be a variable (e.g., #{name}). However, this can even be one or more statements. The following code


puts "Check this out #{

j = 0

for i in 1..5

j = j + i

end

j}"

translates as follows.

Check this out 15

 

Being a sequence of characters, a string lets you access individual characters by index (in this sense, it behaves like a zero-based array). Hence, the following code prints e (the second character) followed by a newline.


a = "Hello"

puts a[1]
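A few more forms of indexing, shown as a small sketch on the same string, may be useful alongside the single-index form.

```ruby
a = "Hello"

puts a[0]      # first character: H
puts a[-1]     # last character (negative indexes count from the end): o
puts a[1, 3]   # three characters starting at index 1: ell
puts a[1..3]   # characters at indexes 1 through 3: ell
p a[10]        # an index past the end gives nil
```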

Concatenation

You have already seen a string concatenation with the + operator.

'abc' + 'def' => "abcdef"

 

Expression Evaluation

String expression evaluation can also be used to concatenate strings.

 

Note, you cannot straightaway add an integer to a string. "Hello" + 3 will result in an error.


irb(main):001:0> "Hello" + 3

TypeError: no implicit conversion of Fixnum into String

from (irb):1:in `+'

from (irb):1

This is the way forward:

irb(main):002:0> "Hello #{3}"

=> "Hello 3"

There are other ways of concatenating an integer or float to a string.

 

Converting Numbers to a String

Problem

How do you convert an integer or a float to a string, and vice versa?

 

Solution

When called on a string, the to_i function converts it to an integer (Fixnum in Ruby versions before 2.4, Integer from 2.4 onward). This is especially useful when reading from a console or an input file and the data is expected to be an integer.

 

Note that it does not complain when the data is not an integer. For a string (not a number), it simply returns 0. (So you need to be careful; otherwise, for the wrong data, it might silently produce a result that may be far from what you expected, and not easy to recognize as wrong.)


irb(main):001:0> "12".to_i

=> 12

irb(main):002:0> "12.5".to_i

=> 12

irb(main):003:0> "abc".to_i

=> 0

The to_f function is similar, except that it converts to a float.

"12".to_f => 12.0

"12.5".to_f => 12.5

"abc".to_f => 0.0

The to_s function (although available on a string also) is more useful when called on other things, especially an integer (Fixnum) or a float (Float).

 

Notice that "Hello" + 3 causes an error, but the following works perfectly.

irb(main):006:0> "Hello " + 3.to_s

=> "Hello 3"

The to_i and to_f functions have been discussed already in the context of reading input from a console. Here they were presented briefly for the sake of putting them in one place.
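Since to_i silently returns 0 for bad data, a stricter conversion can help when wrong input must be noticed. The sketch below uses the Kernel methods Integer and Float, which raise an error instead; the helper name strict_int is an assumption for illustration.

```ruby
# Hypothetical helper: returns an Integer, or nil when `text`
# is not a valid base-10 integer.
def strict_int(text)
  Integer(text, 10)
rescue ArgumentError, TypeError
  nil
end

p strict_int("12")    # 12
p strict_int("12.5")  # nil, where "12.5".to_i would give 12
p strict_int("abc")   # nil, where "abc".to_i would give 0
p Float("12.5")       # 12.5 (Float raises ArgumentError for "abc")
```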

 

Extracting Information from Strings

Problem

A string, such as a line from a file, may contain information, only part of which may be of interest in a certain context. For instance, a data file may contain the first name, last name, age, and telephone number of people (let’s say customers), one record per line.

 

If you wish to know the name of the youngest person in the data available in the file, the name and age are important, but the telephone number has no relevance.

 

Thus, it is often useful to be able to extract information contained as part of a string and then work on this information. How do you go about doing that?

This section describes a couple of tasks and demonstrates in context.

 

Task: Change the Order of Names

A data file named nameaddr.csv consists of three columns: last name, first name, and the first line of the address.


carver,anita,12 Ross St

dell,sarah,15 Jesse St

yehuda,perez,20 Margaret St

chinoy,ron,23 Madox Square

 

The task is to print (modified) records to a file so that each resulting record has only the <first name> and the <last name>, separated by a space and both capitalized. The following illustrates an example.

carver,anita,12 Ross St => Anita Carver

 

Solution

The following code should do the trick.


infile = File.open('nameaddr.csv','r')

outfile = File.open('names.txt','w')

while (line = infile.gets)

arr = line.chomp.split(',')

outfile.puts arr[1].capitalize + " " + arr[0].capitalize

end

outfile.close

infile.close

How It Works

This code uses the chomp, split, and capitalize functions, as well as concatenating using +.
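The per-line transformation can also be isolated in a small function, which makes it easy to try out without any files. This is only a sketch; the name format_name is an assumption.

```ruby
# Hypothetical helper: turns "carver,anita,12 Ross St" into "Anita Carver".
def format_name(line)
  # chomp drops the trailing newline; split breaks the record into columns.
  last, first = line.chomp.split(',')
  "#{first.capitalize} #{last.capitalize}"
end

puts format_name("carver,anita,12 Ross St\n")
puts format_name("dell,sarah,15 Jesse St\n")
```

It should print Anita Carver and Sarah Dell, one per line.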

 

Totaling the Shopping List

Given the following content in an input file (like the prices of items from shopping), write a program to calculate the total amount spent. It is a CSV file named shopping.csv.

The format is item_name, quantity (the number of units or another measure; for example, 1 implies 1 unit, or 1kg, or 1 liter, based on the unit specified), and unit_price, separated by commas. The unit price is followed by a description of the unit, separated from it by a single space.


Banana,6,2.50 each

Eggplant,2,10.00 per kg

Milk,3,4.50 per litre

Cold drinks,6,8.25 per bottle

 

Solution

Clearly, the result should be the sum of each of the first set of numbers multiplied by each of the second set of numbers. However, note that just separating by a comma won't give you the second set of numbers. The following code should work.


infile = File.open('shopping.csv','r')

sum = 0

while (line = infile.gets)

arr = line.chomp.split(',')

arr2 = arr[2].split(' ')

sum = sum + arr[1].to_i * arr2[0].to_f

end

infile.close

puts sum

(If the result is not as expected, check the input file for lines with a missing field or a stray comma. Extra spaces between the unit price and the unit description are harmless, because split(' ') treats a run of whitespace as a single separator.)
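To check the arithmetic without a file, the same logic can be applied to a string holding the sample data. This is a sketch under that assumption; the helper name shopping_total is hypothetical.

```ruby
# Hypothetical helper: totals quantity * unit price over lines
# in the shopping format described above.
def shopping_total(text)
  text.each_line.inject(0.0) do |sum, line|
    _item, quantity, price_and_unit = line.chomp.split(',')
    # "2.50 each" -> "2.50" -> 2.5
    unit_price = price_and_unit.split(' ').first.to_f
    sum + quantity.to_i * unit_price
  end
end

data = "Banana,6,2.50 each\n" \
       "Eggplant,2,10.00 per kg\n" \
       "Milk,3,4.50 per litre\n" \
       "Cold drinks,6,8.25 per bottle\n"

puts shopping_total(data)
```

For the sample data, the four line totals are 15.0, 20.0, 13.5, and 49.5, so the program should print 98.0.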

 

Handling Exceptions

Problem

You want to handle any exceptions that occur in your program.

Solution

Use a combination of rescue and ensure. rescue wraps the code for handling errors. However, sometimes it may be necessary to run a piece of code in the end, no matter whether the execution of a block was normal or encountered an exception.

 

ensure comes in handy in such situations. (It somewhat corresponds to the finally block in Java.) It comes after the rescue. The structure may look like the following code.


f = File.open("input.txt")

begin

# code for processing

rescue

# code for handling errors

ensure

f.close unless f.nil?

end

 

An else construct can be used with this. The part of the code for the else is executed only if no error has been encountered in the main part (i.e., it is an else to the rescue).


f = File.open("input.txt")

begin

# code for processing

rescue

# code for handling errors

else

# code that runs only if no error was raised

ensure

f.close unless f.nil?

end

To raise an exception, raise can be used (in one of three ways).

raise

raise "no file found"

raise ArgumentError, "Too big name", caller

The first form simply raises the current exception (or a RuntimeError if there is none). The second one creates a new RuntimeError, setting its message to the string that it specifies.

 

The third form uses the first argument to create an exception, sets the message with the second argument, and sets the stack trace to the third argument.
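The interplay of rescue, else, and ensure is easier to see in a small runnable sketch. The function name divide and the use of ZeroDivisionError are assumptions chosen only for illustration.

```ruby
# Hypothetical helper: records which clauses ran, in order.
def divide(a, b)
  steps = []
  begin
    steps << "result #{a / b}"
  rescue ZeroDivisionError => e
    # Runs only when the begin part raised this error.
    steps << "rescue: #{e.message}"
  else
    # Runs only when the begin part raised nothing.
    steps << "else"
  ensure
    # Runs in both cases.
    steps << "ensure"
  end
  steps
end

p divide(6, 3)  # ["result 2", "else", "ensure"]
p divide(6, 0)  # ["rescue: divided by 0", "ensure"]
```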

 

Single Line Rescue

rescue has a single-line form, which may be handy for small pieces of code. It is easy to demonstrate through an example. Suppose you define an array of two elements and try to multiply the third element (which does not exist in the array) by 2; the result is an error.


irb(main):001:0> a = [1,2]

=> [1, 2]

irb(main):002:0> a[2] * 2

NoMethodError: undefined method `*' for nil:NilClass

from (irb):2

You can, however, wrap up the exception with a rescue, as follows.

irb(main):003:0> a[2] * 2 rescue 'No such element'

=> "No such element"

irb(main):004:0> a[1] * 2 rescue 'No such element'

=> 4

Note that for an existing element, it provides the proper result.

 

Working with Predefined Variables and Constants

Problem: Ruby has a lot of predefined variables for various purposes. Some of them are quite useful for batch processing.

 

Solution

Take $@ for instance, which holds the stack trace (as an array of strings) generated by the last exception.

The following code provides a small illustration.


begin

raise

rescue

print $@

end

It could run like this.

=>ruby test.rb

["test.rb:2:in `<main>'"]=>

The following (all read-only and local to the scope) are useful in pattern matching cases.

$& – The matched string (after a successful pattern match).
$` – The string preceding the match in a successful pattern match.
$' – The string following the match in a successful pattern match.
$1 to $9 – The contents of successive groups in a successful pattern match.
$~ – Local to the scope but not read-only; a MatchData object that encapsulates the result of a successful pattern match.

 

The following piece of code illustrates the use of some of these variables.


"abracadabra".match(/rac/)

puts $&

puts $`

puts $'

It prints the following.

rac

ab

adabra

So it prints the part of the string that matches /rac/ (which is 'rac' itself), the part before the match (which is 'ab'), and the part after the match (which is 'adabra').
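The same three pieces are also available from the MatchData object that match returns, without going through the global variables. A minimal sketch follows; the helper name split_around is an assumption.

```ruby
# Hypothetical helper: returns [before, matched, after],
# or nil when the pattern does not match.
def split_around(text, pattern)
  m = text.match(pattern)
  return nil unless m
  [m.pre_match, m[0], m.post_match]
end

p split_around("abracadabra", /rac/)  # ["ab", "rac", "adabra"]
p split_around("abracadabra", /xyz/)  # nil
```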

Some of the execution environment variables are as follows.

$0 – The name of the top-level Ruby program being executed (typically, the name of the program file)
$? – The exit status of the last child process terminated (read-only and local)
$* – Command-line arguments (a synonym for ARGV)

 

Here are some of the input-output variables.

 $_ – The last line read (scope local to the thread).
 $/ – The input record separator (newline by default). The gets function, for instance, uses this; setting it to nil results in reading an entire file, for example.
 $. – The number of the last line read from the current input file.
 $, – The output separator (string) to methods such as Kernel#print and Array#join.
 $; – The default separator used by String#split.

 

For a CSV (comma-separated) file, you normally need to specify ',' as the argument to the split (to get different columns). Take the following input (in a file named input.txt), for example.


Seattle,is,a,city

Washington,is,a,state

USA,is,a,country

and the code

infile = File.open 'input.txt','r'

while line = infile.gets

col = line.split(',')

puts "#{col[0]} #{col[3]}"

end

infile.close

If you change the value of $; appropriately, then you won’t have to call split with that argument.

$; = ','

infile = File.open 'input.txt','r'

while line = infile.gets

col = line.split

puts "#{col[0]} #{col[3]}"

end

infile.close

And that prints the following.

Seattle city

Washington state

USA country

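Although it is a simple use case, you may find more ingenious uses for it. Be aware, however, that Ruby 2.7 and later may emit a deprecation warning when $; is set to a non-nil value, so passing the separator to split explicitly is the safer choice in new code.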

 

Predefined Constants

There are some predefined constants as well. One such is ARGV, which has already been discussed (in the context of command-line arguments). Most of the others are perhaps not as interesting or useful as the predefined variables, but they include STDIN, STDOUT, STDERR, RUBY_VERSION, and ENV.

 

You may want to go into IRB, type ENV, and press the Return key. The output (which is the environment variables involved in your Ruby programming environment) may be interesting to watch.

 

Running OS Commands

Problem

In a batch execution, you want to run something like an OS command (command line), get the output, and process the same in your own way within the script.

 

Solution

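Running an OS command from Ruby is achieved by using backquote delimiters. For instance, the following code runs ls *.txt, captures its output (the names of the .txt files in the current directory) in val, and prints it.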

val = `ls *.txt`

print val

You may parse this for your particular data of interest. This feature can be used to great advantage in batch scripting.
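When parsing command output, it usually helps to check whether the command succeeded first. The sketch below combines backquotes with the $? variable described earlier; the helper name run_lines and the echo command are assumptions for illustration.

```ruby
# Hypothetical helper: runs `command` and returns its output as an
# array of lines, or nil when the command exits with a non-zero status.
def run_lines(command)
  output = `#{command}`
  # $? holds the exit status of the child process that just finished.
  return nil unless $?.success?
  output.lines.map(&:chomp)
end

p run_lines('echo hello')
```

Building the command from untrusted input is risky, because the text may be handed to the shell as-is.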

 

Initializing and Finalizing Code

Problem

You need to do some initializing (e.g., setting the value of a default variable) that works for the entire program, not just a specific code block. The same is true for some finalizing activity.

 

Solution

One way of accomplishing this is to use BEGIN and END blocks. They are used to set predefined variables for the length of the script, for instance. BEGIN blocks execute prior to the main script body and END blocks execute after the main script body.


BEGIN { puts "abc" }

for i in 1..5

puts i

end

END { puts "def" }

 

Note that the blocks can be multiline. Also, note that there may be multiple BEGIN and END blocks in a program. BEGIN blocks execute in the order of occurrence and END blocks execute in reverse order. 

 

Defining Functions

Problem

How do you define your own functions in Ruby code?

 

Solution

A simple function in Ruby (without parameters) can be defined as follows.


def <method_name>

<code>

end

def and end are keywords, and <code> represents one or more statements. The function may be called simply with the function name (in the same code body). For example,


def say_hello

print "hello world"

end

say_hello

when executed, should print the following.

hello world

 

The first three lines of code define a function named say_hello, and the fourth line of code calls the function. For calls outside the class, a dot operator is used. You have already seen a lot of examples of this type of call.

By convention, a function defined in a class is referred to as a method.

 

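Note that a function must be defined before it is called. Calling say_hello before its definition has run results in an error like the following.

NameError: undefined local variable or method `say_hello' for main:Object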

Functions with Arguments

Functions with arguments may be declared in various forms.

The following code illustrates a function with arguments and a call to it.


def sum_of (a, b)

c = a + b

end

d = sum_of 2,2

print d

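Because a Ruby function returns the value of its last evaluated expression (here, the assignment to c), d receives the sum. It prints as follows.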

4
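One of the other forms is an argument with a default value. The following is a small sketch; the function name sum_with_default is an assumption.

```ruby
# Hypothetical example: `b` defaults to 10 when the caller omits it.
def sum_with_default(a, b = 10)
  a + b  # the last expression is the return value
end

puts sum_with_default(2, 2)  # 4
puts sum_with_default(2)     # 12
```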

 

Querying a CSV File

Problem

Imagine a situation like this. Your company has an elaborate payroll and HR system. However, your department’s HR director keeps her own text file with a few details (such as names, birthdays, and so on).

 

She keeps this for somewhat unofficial occasions, like buying a cake for someone's birthday in the department, which is not an enterprise-wide (but departmental and that too somewhat unofficial) event. Suppose it is a generally accepted practice in some other departments too.

 

The department HR director comes to you, having heard that you are the somewhat recognized expert in manipulating text files (with Ruby scripting), and asks you for help.

 

The file in question has the following data.

Robin,Sen,20/11/1965,360 Karin Drive NSW 2322

Karina,Rhea,23/05/1982, 3/25 West Avenue NSW 2455

Marvin,Major,08/12/1967,210 Racheal Place Vic 3222

John,Doe,15/12/1968,210 Racheal Place Vic 3222

Roland,Boyd,19/02/1992,21 Palm Avenue TAS 5525

 

The birth dates are in dd/mm/yyyy format. The task she wants you to perform (actually a number of subtasks), consists of the following.

  1. Write a Ruby program to find out a person’s birthday, given the first name and the last name as arguments.
  2. Write a Ruby program to find the youngest person and the oldest person in the department (from the file, of course).
  3. Write a Ruby program to find out the names of all the people with birthdays in a given month (say, December). She is willing to pass the argument as a numeric value (such as 12 for December), rather than as a string, to make your task easier.

 

She also mentioned that she is not very good at running computer programs, and sometimes she forgets to provide the right arguments while running a program.

 

She does not want to be surprised with a lengthy and/or unintelligible error message. (Those error messages are more for developers). The same is true for a name that she might have misspelled and for which the birthday is not found in the file.

 

You may argue that for such a small file, someone could just open and read it. But consider that the file could have been considerably bigger. (This is, after all, just a learning exercise. Real-life problems could indeed be larger in scale.)

 

Besides, you may not want to argue with a nice person on such points. She may have come to you because she rather likes you. Also, it may be half a day’s worth of work for you and your manager knows about it, so no trouble that way.

 

Solution

Let’s go into each subtask.

 

Subtask 1: A Person’s Birthday

This (sub)task can be further broken down to do the following.

  1. Get the arguments: first and last names.
  2. Check the argument count. You could do other checks, such as whether it starts with a letter or not, but for now, let’s restrict the validation check to the number of arguments only.
  3. Find and print out the person’s date of birth.
  4.  Display a nice message when the name is not found.
  5. Print out a nice error message when the file is not found in the directory. 

Sound good?

Command-line arguments and ARGV arrays were discussed in Recipe 2.5. They come in handy for argument getting and checking.

 

Writing the Code

The following code works fine and part of it may be used for argument (getting and) checking.


if ARGV.length != 2

puts "Please provide first_name and last_name"

exit

end

first_name = ARGV[0]

last_name = ARGV[1]

puts "Getting birthday for #{first_name} #{last_name}"

Note that the arguments to the program should be separated by a space, not a comma. Note also the use of exit for exiting the program midway.

When the program is run without exactly two arguments, it prints the following.

Please provide first_name and last_name

Splitting a string based on a particular separator (e.g., ',') has been discussed. To get the date of birth, the third column is needed, while the first and second columns are given as arguments.

 

To get the birthday (including file opening and some error checks), let’s first try a hard-coded name. The planned approach is to code it partly and then combine as required.

 

The following code does the job.


first_name = "John"

last_name = "Doe"

begin

infile = File.open('hrfile.txt','r')

found = false

while line = infile.gets

col = line.split(',')

#if first and second columns match with the names

if col[0] == first_name && col[1] == last_name

#print date of birth

puts "Date of Birth for #{first_name} #{last_name} is #{col[2]}"

#mark as found and break

found = true

break

end

end

#at this point found false means no line has matched the names

if not found

puts "Sorry birthday for #{first_name} #{last_name} not found - check spelling"

end

rescue

puts "Could not find file hrfile.txt - check the directory."

ensure

infile.close unless infile.nil?

end

With the first and last names hard-coded, the program does not need arguments to run. This is a way of developing part of a program as full, runnable code, which can later be converted, without much ado, to a function. (This “fast tracks” things a bit so that attention can be focused on other parts of the task).

 

It has a rescue portion, which prints a message if the input file is not found in the directory (although since this is a general rescue, any other error in its scope will also produce the same error message). The ensure part closes the file unless the file handle is nil.

 

Within the while loop, each line picked up is split on commas. The first and second columns are compared with the given first_name and last_name, respectively.

 

If a match is found (that means the person's record has been found), the third column (which is the date of birth) is printed, a boolean is marked true, and the while loop is exited with a break. There is no need to read another line if the match is already found.

 

On the other hand, if all the lines are exhausted and the match is still not found, then an error message is printed, indicating the same (with a slight hint that the spelling may be wrong).

 

Testing

The objective of testing is to check that a program works per its intended purpose.

In very general terms, the intended purpose of a program falls into two main categories. It should run successfully when all the settings and arguments are proper (the valid case(s)).

It should fail (with proper error messages, as desirable) when the conditions/arguments are not right (the invalid case(s)). In a good program, the result (output message, etc.) of an invalid case should be indicative of what has gone wrong.

 

In this case, the success objective is that the program should run as it stands and give John Doe's birthday as '15/12/1968' (from the file). And it does not disappoint. It produces the following.

Date of Birth for John Doe is 15/12/1968

 

For error cases in our testing, one very important check makes sure that if the input data file is not present, then this is indicated.

 

The other check makes sure that if a name is not present in the file, a meaningful message should display to indicate that the name was not found.

 

For both of these (invalid) cases, the testing requires a bit of tweaking. To simulate the absence of the file, rename the data file to something else (say, hrfile1.txt) and run the code. The output should be as follows.

Could not find file hrfile.txt - check the directory.

 

And for the second case, the code itself could be changed (but rename the data file back to its original name). Change the hard-coded first name to “Joe” from “John”.

 

With “Joe” as the first name, the code should produce the following.

Sorry birthday for Joe Doe not found - check spelling

It appears that the result is satisfactory.

 

Appropriately combining this code with the earlier getting and checking arguments (making small additions/deletions/modification as required in the process), the following code is reached.


if ARGV.length != 2

puts "Please provide first_name and last_name"

exit

end

first_name = ARGV[0]

last_name = ARGV[1]

begin

infile = File.open('hrfile.txt','r')

found = false

while line = infile.gets

col = line.split(',')

#if first and second columns match with the names

if col[0] == first_name && col[1] == last_name

#print date of birth

puts "Date of Birth for #{first_name} #{last_name} is #{col[2]}"

#mark as found and break

found = true

break

end

end

#at this point found false means no line has matched the names

if not found

puts "Sorry birthday for #{first_name} #{last_name} not found - check spelling"

end

rescue

puts "Could not find file hrfile.txt - check the directory."

ensure

infile.close unless infile.nil?

end

And, it works well.
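As mentioned earlier, the lookup part lends itself to being converted to a function. The following sketch does that under the assumption that the lines are passed in as an array (instead of being read from hrfile.txt), which keeps it easy to test; the name find_birthday is hypothetical.

```ruby
# Hypothetical helper: returns the date of birth (third column) for the
# given names, or nil when no record matches.
def find_birthday(lines, first_name, last_name)
  lines.each do |line|
    col = line.chomp.split(',')
    return col[2] if col[0] == first_name && col[1] == last_name
  end
  nil
end

records = [
  "Marvin,Major,08/12/1967,210 Racheal Place Vic 3222\n",
  "John,Doe,15/12/1968,210 Racheal Place Vic 3222\n"
]

p find_birthday(records, "John", "Doe")  # "15/12/1968"
p find_birthday(records, "Joe", "Doe")   # nil
```

Returning nil for a missing name leaves the choice of message to the caller.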

Subtask 2: (The Names of) the Youngest and the Oldest Persons

 

This (sub)task does not require getting any argument to the program. It works on all the rows. This can be broken down into three parts.

  1. Get the third column.
  2. Change it into yyyymmdd format (this will make it easy to compare numerically).
  3. Sort the yyyymmdd values to get the minimum and the maximum. Store the corresponding names.

 

Again, you can take a build by portions approach.

Getting the third column is simple. But for the latter part of the code, it is better if we get the first name and last name along with that (because eventually, we have to print the names of the youngest and the oldest persons, not their dates of birth).

 

With the rescue and ensure parts tagged on, the code looks like this:


begin

infile = File.open('hrfile.txt','r')

while line = infile.gets

col = line.split(',')

puts "#{col[0]} #{col[1]} #{col[2]}"

end

rescue

puts "Could not find file hrfile.txt - check the directory."

ensure

infile.close unless infile.nil?

end


For the second part (changing the date into yyyymmdd format), again, you can resort to writing a small piece of code with a hard-coded value.


The following code does the job.

orgdate = "20/11/1965"

dtpart = orgdate.split('/')

print "#{dtpart[2]}#{dtpart[1]}#{dtpart[0]}"

It prints the date converted in that format. It should output as follows (in this case without adding a newline at the end of the output, as the print is being used).

 

19651120

Finally, for the third part (you actually need to find two values: the minimum and the maximum), you use another piece of code, which works on a smaller data file (named input2.txt and having data as shown here).


Robin Sen 19651120

Karina Rhea 19820523

Marvin Major 19671208

The following code seems to work.

mindate = 30000000; maxdate = 1

infile = File.new('input2.txt','r')

while (line = infile.gets)

col = line.chomp.split

date = col[2].to_i

if (mindate > date)

mindate = date

oldest = "#{col[0]} #{col[1]}"

end

if (maxdate < date)

maxdate = date

youngest = "#{col[0]} #{col[1]}"

end

end

infile.close

puts "Youngest : #{youngest}"

puts "Oldest : #{oldest}"

When run, it should produce this:

Youngest : Karina Rhea

Oldest : Robin Sen

Note that mindate is set at a rather high value (higher than you should expect in the data set) and maxdate is set at a rather low value, to start with. This is to ensure that the very first comparison finds a new minimum value (or a new maximum value, as the case may be); otherwise, the algorithm may not work properly.

 

For any one comparison, if the date needs switching (a new candidate date is found), the designated (oldest or youngest) name is reassigned too (to the value from the corresponding row).

 

After all the rows are processed, it is left with the names of the youngest and the oldest persons. Since the whole solution is being built in a piecewise fashion, this part of the program uses an intermediate data format (in a limited quantity) to develop the processing logic.

 

Putting it all together (and doing some amendments), the final code, which reads from the actual data file, looks like this:


mindate = 30000000; maxdate = 1

begin

#open input file

infile = File.open('hrfile.txt','r')

#read and process lines in a loop

while line = infile.gets

#split line for individual columns

col = line.split(',')

#split date for individual date parts

dtpart = col[2].split('/')

#reassemble date parts in yyyymmdd format for easy sorting

date = "#{dtpart[2]}#{dtpart[1]}#{dtpart[0]}".to_i

#check if it is a new minimum

if (mindate > date)

mindate = date

oldest = "#{col[0]} #{col[1]}"

end

#check if it is a new maximum

if (maxdate < date)

maxdate = date

youngest = "#{col[0]} #{col[1]}"

end

end

rescue

puts "Could not find file hrfile.txt - check the directory."

ensure

infile.close unless infile.nil?

end

#print the result(s)

puts "Youngest : #{youngest}"

puts "Oldest : #{oldest}"

Note that this code may be further optimized, but it shows a generalist approach to solving a problem. For instance, the checks for maximum and minimum could have been done within an if-else structure, rather than using two if statements.

 As another example, the date formatting could have been handled using a proper API.
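As one possible variation, the minimum and maximum can be found with the min_by and max_by methods instead of manual comparisons. This is a sketch, not the approach used above: the helper names date_key and oldest_and_youngest are assumptions, and the records are passed in as an array rather than read from hrfile.txt.

```ruby
# Hypothetical helper: converts dd/mm/yyyy into an integer yyyymmdd key.
def date_key(ddmmyyyy)
  day, month, year = ddmmyyyy.split('/')
  "#{year}#{month}#{day}".to_i
end

# Hypothetical helper: returns [oldest_name, youngest_name].
def oldest_and_youngest(lines)
  people = lines.map { |line| line.chomp.split(',') }
  oldest = people.min_by { |col| date_key(col[2]) }    # earliest birth date
  youngest = people.max_by { |col| date_key(col[2]) }  # latest birth date
  ["#{oldest[0]} #{oldest[1]}", "#{youngest[0]} #{youngest[1]}"]
end

records = [
  "Robin,Sen,20/11/1965,360 Karin Drive NSW 2322\n",
  "Karina,Rhea,23/05/1982, 3/25 West Avenue NSW 2455\n",
  "Marvin,Major,08/12/1967,210 Racheal Place Vic 3222\n"
]

oldest, youngest = oldest_and_youngest(records)
puts "Youngest : #{youngest}"
puts "Oldest : #{oldest}"
```

For these three records, it should report Karina Rhea as the youngest and Robin Sen as the oldest.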

 

Date Handling by API

Ruby has a Date class that has an elaborate API for parsing, formatting, and otherwise using dates. You need to require 'date' to use it.

 

Here is a small example to show the parsing and formatting of dates. Using these, you could process the date for this task.

The following code


require 'date'

dt = Date.parse('3/2/1965')

puts dt.strftime('%Y%m%d')

should produce this:

19650203

It is possible to provide the parsing format explicitly by using Date.strptime (Date.parse itself does not take a format string), for example:

dt = Date.strptime('03/02/1965','%d/%m/%Y')

Sometimes it may be necessary.
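For this task, an explicit format also gives a way to reject impossible dates. The following sketch uses Date.strptime together with strftime; the helper name to_yyyymmdd is an assumption.

```ruby
require 'date'

# Hypothetical helper: converts dd/mm/yyyy text to "yyyymmdd",
# or returns nil when the text is not a valid date in that format.
def to_yyyymmdd(text)
  Date.strptime(text, '%d/%m/%Y').strftime('%Y%m%d')
rescue ArgumentError
  # Invalid dates raise an ArgumentError (or a subclass of it).
  nil
end

puts to_yyyymmdd('20/11/1965')  # 19651120
p to_yyyymmdd('31/02/1965')     # nil (there is no 31 February)
```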

Subtask 3: Persons with a Birthday in a Given Month

 

This one is quite simple. Broadly, the steps are as follows.

  1. Check that the first argument (integer value) is between 1 and 12 (both ends included).
  2. Add 0 to the front if the integer is less than 10.
  3. Compare it with the middle part (as split by '/') of the third column (as split by ','), and if a match is found, print the name.

The following is the code, detailed properly.


if ARGV.length < 1

puts "Please provide the month [1 to 12]"

exit

end

month1 = ARGV[0].to_i #Dec will become 0

if month1 < 1 or month1 > 12

puts "Wrong format or month number : valid 1 to 12"

exit

end

if month1 < 10

month = "0" + month1.to_s

else

month = month1.to_s

end

begin

infile = File.open('hrfile.txt','r')

found = false

while line = infile.gets

col = line.split(',')

birthmonth = col[2].split('/')[1]

if birthmonth.eql?(month)

puts "#{col[0]} #{col[1]}"

found = true

end

end

rescue

puts "Could not find file hrfile.txt - check the directory."

ensure

infile.close unless infile.nil?

end

puts "No record found for a given month" if not found

 

The following are some of the main test cases for this:

  1. Provide no argument.
  2. Provide a string as an argument, such as Dec.
  3. Provide a valid two-digit month (such as 12) that should fetch record(s).
  4. Provide a valid single-digit month that should fetch a record.
  5. Provide a month number (such as 1) that should not fetch any record.

With no argument, this is the output:


Please provide the month [1 to 12]

With Dec as the argument, this is the output:

Wrong format or month number : valid 1 to 12

With 12 as the argument, this is the output:

Marvin Major

John Doe

With 5 as the argument, this is the output:

Karina Rhea

And with 1 as the argument (it does not have a corresponding record in the data file), this is the output:

No record found for a given month

 

Note that if you provide more than one argument, the second argument onward is ignored. (No check is in place for the argument count.) Also, initially, the first argument is converted to an integer (Fixnum in Ruby versions before 2.4, Integer afterward). This makes comparison easier.
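The validation and zero-padding steps (steps 1 and 2) can also be written more compactly with a range check and format. This is only a sketch; the helper name month_key is an assumption.

```ruby
# Hypothetical helper: returns "01".."12" for a valid month argument,
# or nil when the argument is not a month number.
def month_key(arg)
  month = arg.to_i  # non-numeric text such as "Dec" becomes 0
  return nil unless (1..12).cover?(month)
  format('%02d', month)  # pads with a leading zero when needed
end

p month_key("5")    # "05"
p month_key("12")   # "12"
p month_key("Dec")  # nil
```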

 

Now you can confidently deliver the programs to the department HR director (if the situation was not fictitious, that is).

 

Sorting Text

Problem

The next task demonstrates taking user input (from a console) in a loop and processing the data once the end of the input is signaled. So you want to take names, one by one in a loop, from the command prompt, unless the user enters the string END. Then, sort those names in alphabetical order and print them out.

 

Solution

The following code will work.


print "Name [enter END to end] : "

name_arr = []

while name = gets.chomp

case name

when "END"

puts "No more input signalled by user"

break # break from asking loop

else # some name

#append the name to the array

name_arr << name

#print the prompt again for further input

print "Name [enter END to end] : "

end

end

#sort the array and print the result

name_arr.sort.each {|name| puts name}

When run, this code should keep printing the prompt and wait for the user to input a name (one at a time). Once the input is END (all uppercase), it stops asking for input and provides the output (i.e., prints a sorted list of the names—one per line, as expected from puts function).

 

How It Works

The code has a lot of comments and it would be helpful to follow them. But as you can see, it is essentially using a while loop to get the names one by one, and using chomp to remove the newline characters following the names (because the user is supposed to press the Enter key every time after entering the name). It also has a conditional break in case END is entered instead of a name.

 

A print for prompt is required before the while in order to prompt for the first name because the while condition has a gets call (where it would stop without giving any decent clue to the user that it is waiting for a name).

 

Once the names are all taken, the real work happens in one line of code (sorting and printing). This is where Ruby shines (over Java for instance) in this kind of quick scripting.
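The collect-until-END logic can be tried without a console by feeding it an array of lines. This is a sketch under that assumption; the helper name sorted_names is hypothetical.

```ruby
# Hypothetical helper: collects names until "END" is seen,
# then returns them sorted.
def sorted_names(lines)
  names = []
  lines.each do |line|
    name = line.chomp
    break if name == "END"  # stop at the end marker
    names << name
  end
  names.sort
end

puts sorted_names(["Zoe\n", "adam\n", "Bob\n", "END\n", "ignored\n"])
```

Note that the default sort places uppercase letters before lowercase ones, so this prints Bob, Zoe, and adam, in that order; names.sort_by(&:downcase) would give a case-insensitive order.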

 

Storing Data in a Structured Manner

Problem

Sometimes we need to store data in a structured manner, access and change (or otherwise process) it as part of the structure, and provide the necessary output. It may be more convenient (conceptually easier to reason about, or easier to maintain) to keep data in a structured format that represents an entity in the business domain.

 

Solution

In this situation, a struct can be very helpful. A struct is a class that makes it easy to organize and handle data.

Suppose we need to keep our customers’ names, addresses, and telephone numbers to do various processing.

 

It would be nice if, for each customer, we could group this information together (with possibly a short name [actually a variable] that identifies each customer for later retrieval and/or processing of his/her information). We can do this in the following ways:


Struct.new("Customer", :name, :addr, :tel)

or

Customer = Struct.new(:name, :addr, :tel)

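Both forms create a structure with the fields described (i.e., three fields named name, addr, and tel, in that order). Note the difference in naming, though: the first form defines the class as Struct::Customer, whereas the second assigns it to the constant Customer, which is the form used in the rest of this section.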

And multiple customer data can be created using the structure, as follows.

john = Customer.new("John Connor", "123 Rachel Close", 3456)

jane = Customer.new("Jane Greystoke", "12 Jungle House", 4568)

turno = Customer.new("Sarah Turnbull", "50 Sunset Boulevard", 1254)

The variables (john, jane, etc.) can then be used to access particular data in those structures. Here is an example.

irb(main):005:0> john.name

=> "John Connor"

irb(main):006:0> jane.tel

=> 4568

It can even be changed by assigning a new value.

irb(main):007:0> jane.tel = 1111

=> 1111

irb(main):008:0> jane.tel

=> 1111

It is possible to define a structure with methods also. Check the following code.

Customer = Struct.new(:name, :addr, :tel) do

def greeting

puts "Hello #{name}!"

end

end

john = Customer.new("John Connor", "123 Rachel Close", 3456)

john.greeting

It does the job nicely, and when run, it should produce this:

Hello John Connor!

 

There are multiple ways to access the fields in a struct. For instance, each of the following accesses John's name (and should return “John Connor”).


john['name']

john[:name]

john[0]

One way may be more desirable than others in some situations. It also may be a matter of style, but I would recommend following the john.name style unless some other style is really required for the situation.
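The access styles can be compared side by side. This is a minimal sketch; the struct is named CustomerRecord here (an arbitrary name chosen only to avoid clashing with the Customer constant defined earlier), and struct_access_demo is a hypothetical helper.

```ruby
# CustomerRecord is a hypothetical name used only for this sketch.
CustomerRecord = Struct.new(:name, :addr, :tel)

# Returns the same field read in four different ways.
def struct_access_demo
  john = CustomerRecord.new("John Connor", "123 Rachel Close", 3456)
  [
    john.name,     # accessor method
    john['name'],  # string key
    john[:name],   # symbol key
    john[0]        # positional index
  ]
end

p struct_access_demo
p CustomerRecord.members   # the field names, in definition order
```

All four expressions read the same field, so the array holds four copies of the same string.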

 

Note that it is also possible to very easily use a customer array, which may be iterated through for processing.

 

The following code

Customer = Struct.new(:name, :addr, :tel)

cust = []

cust[0] = Customer.new("John Connor", "123 Rachel Close", 3456)

cust[1] = Customer.new("Jane Greystoke", "12 Jungle House", 4568)

cust[2] = Customer.new("Sarah Turnbull", "50 Sunset Boulevard", 1254)

cust.each { |c| puts c.name }

should produce this:

John Connor

Jane Greystoke

Sarah Turnbull

 

It is also possible to iterate through each field of a single struct’s data. For example, in the preceding structure, if we define another customer like this:


joe = Customer.new("Joe Smith", "123 Maple St", 12345)

we can iterate through each field of this particular customer data, as follows.

joe.each_pair {|name, val| puts("#{name} => #{val}") }

It should produce the following.

name => Joe Smith

addr => 123 Maple St

tel => 12345

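There are other methods in the Struct API for various functionalities. For instance, the == method checks the equality of two structures: it returns true when both belong to the same struct class and all their fields are equal. (Do not confuse it with equal?, which is true only when both operands are the very same object.)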

 

The following code

Customer = Struct.new(:name, :addr, :tel)

cust = []

cust[0] = Customer.new("John Connor", "123 Rachel Close", 3456)

joe = Customer.new("Joe Smith", "123 Maple St", 12345)

j2 = Customer.new("John Connor", "123 Rachel Close", 3456)

puts j2 == joe

puts j2 == cust[0]

should produce this:

false

true
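The difference between comparing field values and comparing object identity can be seen in a short sketch. The struct name Client and the helper equality_report are assumptions made for illustration.

```ruby
# Client is a hypothetical struct used only for this sketch.
Client = Struct.new(:name, :tel)

def equality_report
  a = Client.new("John Connor", 3456)
  b = Client.new("John Connor", 3456)
  {
    same_values: a == b,       # compares the fields
    same_object: a.equal?(b),  # compares object identity
    self_check:  a.equal?(a)   # an object is always identical to itself
  }
end

p equality_report
```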

This concludes our current discussion on structs.

 

Union

If you are to look for a union of two sets of characters (say, two ranges) you could nest the square brackets containing one set within another, like [A-Z[a-z]].

 

Note that if both are ranges, it is equivalent to contiguous ranges, as shown earlier. That is, [A-Z[a-z]] is equivalent to [A-Za-z]. There are other cases where they would be equivalent.
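A quick way to convince yourself of this equivalence is to run both patterns over the same text. The helper names below are hypothetical, and the sample string is just an illustration.

```ruby
# Union written with nested brackets
def letters_nested(text)
  text.scan(/[A-Z[a-z]]/)
end

# The same set written as contiguous ranges
def letters_flat(text)
  text.scan(/[A-Za-z]/)
end

sample = "A Tale of 2 Cities!"
p letters_nested(sample)
p letters_nested(sample) == letters_flat(sample)
```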

 

Intersection

From the input file (desc.txt), how do you find any character that is the intersection of the set of letters A to V (set1) and T to Z (set 2)?

 

You could find out the intersecting set and use it as a range. But suppose you don’t want to think so much? You’d rather let the program do it.

 

The intersection of two sets of characters can be defined with the following construct.

[<set1>&&[<set2>]]

This is an example.

[A-V&&[T-Z]]

The [A-V&&[T-Z]] pattern matches any single character, which is common to the range A–V and T–Z (which is T–V).

It’s no wonder that the following code identifies 'T' from both lines. (This is the only available capital letter in either line in the range T–V).

infile = File.open 'desc.txt','r'

while line = infile.gets

matched = line.match(/([A-V&&[T-Z]])/)

puts matched.captures if matched

end

infile.close

 

Intersection with Negation

It gets interesting when you mix intersection with negation. It could be helpful when you have a big range, but only a few characters are to be left out (all except).

 

Suppose that you wanted to extract all lowercase letters, except vowels, from the same input file. How do you do it?

The [a-z&&[^aeiou]] pattern works well.

The following code


infile = File.open 'desc.txt','r'

while line = infile.gets

matched = line.match(/([a-z&&[^aeiou]])/)

puts matched.captures if matched

end

infile.close

produces 'l' and 'f', respectively, from the two lines. You may verify that those are the first (or only) lowercase consonants.

This would also work with two ranges, such as [A-V&&[^T-Z]].

 

An intersection with negation, where the inner set is a complete subset of the outer set, may be termed as subtraction.

 

In our earlier example of matching lowercase consonants, the pattern used was a subtraction pattern; however, the same cannot be said for [a-j&&[^aeiou]], because o and u lie outside the range a-j.
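To see intersection and intersection with negation at work on every match (rather than only the first one), scan can be used with the same character classes. The helper names are hypothetical; this is a sketch, not part of the original recipe.

```ruby
# Capital letters common to A-V and T-Z (that is, T to V)
def t_to_v(text)
  text.scan(/[A-V&&[T-Z]]/)
end

# Lowercase letters except vowels (a subtraction pattern)
def consonants(text)
  text.scan(/[a-z&&[^aeiou]]/)
end

p t_to_v("A Tale of Two Cities")
p consonants("of Two")
```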

 

Predefined Character Classes

The meanings of the terms digit, non-digit (any character other than a digit), word character, and non-word character (by negation, anything that is not a word character) should be pretty intuitive. Some classes are less intuitive. Let’s start with . (dot).

 

Any Single Character: dot

The . (dot) is a predefined character class. It represents a wildcard that matches any single character (except a newline, unless the m modifier is used). To see how it works, run the following code.


infile = File.open 'desc.txt','r'

while line = infile.gets

matched = line.match(/(.)/)

puts matched.captures

end

infile.close

Which will come up with

A

o

No wonder it identified the first characters from each line.

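Note that for dot, you should not use square brackets. Inside square brackets, the dot loses its wildcard meaning and matches only a literal dot; since neither input line contains one, match returns nil and the call to captures fails with an error when the code is run. Ranges (single or contiguous) should be enclosed within square brackets.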
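The behavior of the dot inside and outside square brackets can be checked with a small sketch. The helper names are assumptions for illustration; each returns the captured character, or nil when there is no match.

```ruby
# Dot as a wildcard: first character that is not a newline
def wildcard_match(text)
  m = text.match(/(.)/)
  m && m[1]
end

# Dot inside square brackets: only a literal dot matches
def literal_dot_match(text)
  m = text.match(/([.])/)
  m && m[1]
end

p wildcard_match("A Tale")
p literal_dot_match("A Tale")   # no dot in the text, so no match
p literal_dot_match("a.b")
```

Guarding with `m &&` avoids calling a method on nil when nothing matches.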

 

Whitespace and Non-Whitespace

To illustrate that \s matches a whitespace character (in this case the first space in each line), you may run the following code.


infile = File.open 'desc.txt','r'

while line = infile.gets

matched = line.match(/(\s)/)

puts "-" + matched.captures[0] + "-"

end

infile.close

This comes up with the following.

- -

- -

The '-' characters at either end make the space visible. The reason that matched.captures[0] has to be concatenated is that captures returns an array, which cannot be directly concatenated to strings.

 

The first element (index 0) of that array, however, is a string, so that can be concatenated in the way shown. Try removing the [0] and the program will not run successfully.

 

In the first line of the input file, replace the first space with a tab and run the program.


A Tale

The output is somewhat different, for an obvious reason: this time the whitespace character matched in the first line is the tab.

- -

- -

Try putting a space (first) and then a tab between A and Tale. The output is the same as before. Space is picked up in this case.

 

Now restore the input file to its original condition and replace the \s in the code with \S (for non-whitespace characters). Upon running the code, the first letters are picked up on a pattern match.

-A-

-o-

This won’t change if you use a number of contiguous spaces and tabs at the beginning of these input lines. It will still match the first non-whitespace character in each line.

 

Escape Sequence

Let’s look at handling characters with special meaning to express their literal representation. There are tokens that represent a lot of things, such as a dot (.) to represent a single character, or ^ to represent the beginning of a line.

 

What if you wanted to look for those characters at their literal face value? For example, you want to find an actual dot (.).

 

The approach is to use an escape sequence, which is a backslash (\), to escape the meaning of the character (and use it at face value). Thus, if you want to look for a dot, you would use /\./.

The following code

print "matched" if "a.b".match(/\./)

prints this:

matched

 

The following code, however, does not.

print "matched" if "ab".match(/\./)

This is true for all such special characters (+ ? . * ^ $ ( ) [ ] { } | \), which include the backslash itself. In order to look for a single backslash, you need to use two in the pattern (i.e., \\).
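When the text to search for comes from a variable, escaping by hand is error-prone. Ruby's Regexp.escape adds the backslashes for you. The helper contains_literal? below is a hypothetical name used for this sketch.

```ruby
# Returns true if fragment occurs in text, treating every character
# of fragment at face value.
def contains_literal?(text, fragment)
  text.match?(/#{Regexp.escape(fragment)}/)
end

p Regexp.escape("a.b")            # the dot gets a backslash in front of it
p contains_literal?("a.b", ".")
p contains_literal?("ab", ".")
```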

 

End of a Source String

To find the last character, you should again use the . (dot), but you also need an anchor to indicate the end of the source string, and that is $ (the dollar sign).

Try the following program.

infile = File.open 'desc.txt','r'

while line = infile.gets

matched = line.match(/(.$)/)

puts matched.captures

end

infile.close

 

You won’t be disappointed. ('e' and 's' are identified as the last characters of those two lines.) Note, however, that the dollar sign appears after the dot (not before it, as was the case with the other anchor, ^).

 

This is important. Regexp tokens maintain their relative position (wherever applicable) in the search, as they appear in the pattern.

  1. ^. says to look for the single character just after the beginning.
  2. .$ says to look for the single character just before the end (newline is effectively the record separator, so it isn’t counted as part of the source string for this purpose).

That is how the first character and last character are specified in regexp.
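Both anchored patterns can be tried on a single line of text without a file. This is a sketch with a hypothetical helper name; String#[] with a regexp returns the matched text.

```ruby
# Returns the first and the last character of a line
# (the trailing newline is not counted).
def first_and_last(line)
  [line[/^./], line[/.$/]]
end

p first_and_last("A Tale\n")
p first_and_last("of Two Cities\n")
```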

 

Word Boundary and Non-Word Boundary

In order to understand the concept of a word boundary, here is a bit of an explanation.

A word boundary (\b) is a zero-width match that can match:

  • Between a word character (\w) and a non-word character (\W) or
  • Between a word character and the start or end of the string.
Note that, by definition, a word character (\w) is [A-Za-z0-9_] (in general).

Take the string “bread, and jam.” (including the comma and the full stop). The word boundary matches the (zero-width) places shown by the character '|'.

|bread|, |and| |jam|.

On the other hand, a non-word boundary (\B) matches any zero-width place that is not a word boundary (it is the negation of a word boundary).

 

It can match a zero-width place that is

  1. Between two word characters.
  2. Between two non-word characters.
  3. Between a non-word character and the start or end of the string.
  4. The empty string.

In the string “bread, and jam.”, it matches the places shown with | in the following (any place that is not a word boundary, so the negation of the places shown earlier):

b|r|e|a|d,| a|n|d j|a|m.|

 

Note that in this example (non-word boundaries), if the full stop was not there after the word jam, then the end of the string would be a word boundary instead of a non-word boundary. Now, let’s look at an actual demonstration on our input file, which consists of the following.

 

A Tale

of Two Cities

Run the following code.


infile = File.open 'desc.txt','r'

while line = infile.gets

matched = line.match(/(.\b)/)

puts matched.captures

end

infile.close

It produces this:

A

f

'A' is the end of the first word in the first line (the whole word consists of a single letter, and hence, that is also the last character). 'f' is the end of the first word (in the second line), which is 'of'.

 

.\b says to get the single character just before the (applicable) word boundary; applicable, in this case, means the first such word boundary that has a character before it, not just the first word boundary, because it is looking for a pattern that is a character followed by a word boundary (so, the first occurrence of such a combination).

 

If you were to change the pattern to \b. (i.e., a word boundary followed by a character), 'o' would be picked up instead of 'f' in the second input line. The following code (for a character followed by a non-word boundary)


infile = File.open 'desc.txt','r'

while line = infile.gets

matched = line.match(/(.\B)/)

puts matched.captures

end

infile.close

produces this:

T

o

For the first line, the first non-word boundary, preceded by a character, is the zero-width place after T (of the word Tale). In fact, that is the first non-word boundary in that line.
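The positions marked with | in the earlier illustrations of the string "bread, and jam." can also be listed programmatically. The sketch below collects the offsets at which a zero-width pattern matches; the helper name zero_width_positions is an assumption made for illustration.

```ruby
# Returns the offsets (0 = before the first character) at which the
# given zero-width pattern matches in text.
def zero_width_positions(text, pattern)
  positions = []
  text.scan(pattern) { positions << Regexp.last_match.begin(0) }
  positions
end

text = "bread, and jam."
p zero_width_positions(text, /\b/)   # word boundaries
p zero_width_positions(text, /\B/)   # non-word boundaries
```

Every offset from 0 to the length of the string appears in exactly one of the two lists, since a position is either a word boundary or it is not.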

 

Using Non-Capturing Groups

Problem

You may wish that a second group be part of the pattern (to indicate the alternatives) but not be captured. How can you do that?

What would happen if we used captures on "white and black".match(/(wh(eat|ite))/)? What’s the point in trying to guess, when it can easily be found out?

 

The following code


print "white and black".match(/(wh(eat|ite))/).captures

prints this:

["white", "ite"]

It has captured two group matches. The second group is the one nested (i.e., (eat|ite), of which 'ite' is a match).

 

Solution

This can be accomplished by making the group passive (or non-capturing). The way to do that is to put a '?:' at the beginning of the group. The following code


print "white and black".match(/(wh(?:eat|ite))/).captures

prints this:

["white"]

This will work even when the groups are not nested. For example, the following

print "white,black, or yellowish".match(/(white)(.*)(yellow).*/).captures

prints this:

["white", ",black, or ", "yellow"]

Yet, the following code

print "white,black, or yellowish".match(/(white)(?:.*)(yellow).*/).captures

prints this:

["white", "yellow"]

 

Named Backreferences

A named backreference is captured with the (?<name>pattern) construct and referred to with the \k<name> construct.

Suppose that you want to deal with two backreferences, and instead of referring to them as \1 and \2, you want to use names 'Hansel' and 'Gretel', respectively.

 

The following will work.

print "matched" if "cd1abcab3cd".match(/(?<Hansel>cd).(?<Gretel>ab).\k<Gretel>.\k<Hansel>/)

The first group has the form (?<name>pattern), with the name 'Hansel' and the pattern 'cd'. The matched substring is 'cd', which is assigned to the backreference named Hansel and can later be invoked as \k<Hansel>.

 

The following is the equivalent of our earlier code (with numbered backreferences).


print "matched" if "cd1cd2cd".match(/(ab|cd|ef).\1.\1/)

It can be written in the named backreference parlance, as follows.

print "matched" if "cd1cd2cd".match(/(?<x>ab|cd|ef).\k<x>.\k<x>/)

The choice of the name 'x' is arbitrary here (the name 'y' could serve equally well, if it is used consistently).
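Named groups are also convenient after the match, because the captured text can be read back by name from the MatchData object. A minimal sketch, with parse_pair as a hypothetical helper name:

```ruby
# Returns the named captures, or nil when the text does not match.
def parse_pair(text)
  m = text.match(/(?<Hansel>cd).(?<Gretel>ab)/)
  m && { hansel: m[:Hansel], gretel: m[:Gretel], names: m.names }
end

p parse_pair("cd1ab")
p parse_pair("xyz")     # no match
```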

 

Finding a Match and Excluding Some of It in the Result

Problem

If you wished to match qu you could use the pattern /qu/. 

So long as it is purely for determining whether there is a match, it can be done as follows.


print "matched" if "aqua".match(/qu/)

If you are to get the matched string in return, you could use this:

print "aqua".match(/qu/)

It prints the following.

qu

But what if you wanted to print a 'q' on the match, only if it has u following it, but you did not want to get u along with the returned value?

 

Solution

Things will get very tricky. If you just use the /q/ pattern, it will match even if there is no 'u' following the 'q'.

And if you use /qu/ as a pattern, the return value will contain 'qu'.

 

Note that the non-capturing group will not help you here because you are interested in the whole pattern, not individual groups. So the following code


print "aqua".match(/q(?:u)/).captures

prints the following empty array, because the only group within the pattern is a passive (non-capturing) group.

[]

And this code:

print "aqua".match(/q(?:u)/)

prints the following.

qu

This is because the code is about the whole pattern, not the groups within. Hmmm. How do you keep the u (following the 'q') from being returned, while still checking for it as part of the match?


The following code succeeds.

print "aqua".match(/q(?=u)/)

It prints this:

q

And the construct that is used— (?=sub-pattern)— is a lookahead assertion.

I could not find a formal definition of assertions. In my own understanding, an assertion can be described generally as a subpattern whose presence or absence (as specified), immediately ahead of or behind another character or subpattern (as specified), causes the match to succeed, but which does not feature in the match returned.

 

So it is essentially a subpattern that features in the search but does not feature in the returned value (along with some other characteristics).

 

Let’s focus on the presence or absence and ahead or behind. If we combine these two sets of possibilities, we can come up with four types of assertions.

  • Present and ahead—or a lookahead assertion (an example of which has already been discussed)
  • Not present ahead—or a negative lookahead assertion
  • Present and behind—or a lookbehind assertion
  • Not present behind—or a negative lookbehind assertion
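The four kinds of assertion listed above can be sketched with the same 'q' and 'u' example. The constructs are (?=...), (?!...), (?<=...), and (?<!...); the helper names are hypothetical. Each helper returns the matched text, or nil when the assertion fails.

```ruby
# Lookahead: 'q' only if 'u' follows
def q_before_u(text)
  text[/q(?=u)/]
end

# Negative lookahead: 'q' only if 'u' does not follow
def q_not_before_u(text)
  text[/q(?!u)/]
end

# Lookbehind: 'u' only if 'q' precedes
def u_after_q(text)
  text[/(?<=q)u/]
end

# Negative lookbehind: 'u' only if 'q' does not precede
def u_not_after_q(text)
  text[/(?<!q)u/]
end

p q_before_u("aqua")
p q_not_before_u("aqua")
p u_after_q("aqua")
p u_not_after_q("aqua")
```

In every case, the asserted character is checked but never becomes part of the returned match.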

 

Replacing Substrings Using Regular Expressions

Problem

You need to replace the part(s) of a string as you search for a particular pattern. Take a look at the following string:

“All the land belongs to John Doe. All the horses belong to John Doe. And the farmhouse belongs to John Doe”
How would you replace “John Doe” with “me”?

 

Solution

You could do it this way.

print "All the land belongs to John Doe. All the horses belong to John Doe. And the farmhouse belongs to John Doe.".gsub(/John Doe/,'me')

 

The preceding code prints as follows.

All the land belongs to me. All the horses belong to me. And the farmhouse belongs to me.

 

It uses the gsub function (for global substitution) with two parameters. The first one is the pattern. The second one indicates the substring that should be used to replace each of the matches.

 

If you used the sub function instead of gsub in the same fashion, only the first occurrence of 'John Doe' would be replaced.
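The difference between the two functions is easy to see on a short string. A minimal sketch, with hypothetical helper names:

```ruby
# Replaces only the first occurrence
def replace_first(text)
  text.sub(/John Doe/, 'me')
end

# Replaces every occurrence
def replace_all(text)
  text.gsub(/John Doe/, 'me')
end

sample = "John Doe and John Doe"
puts replace_first(sample)
puts replace_all(sample)
```

Both functions return a new string and leave the original untouched; sub! and gsub! are the variants that modify the string in place.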

 

You may try this on a file level. On an input file containing

  • All the land belongs to John Doe.
  • All the horses belong to John Doe.
  • And the farmhouse belongs to John Doe

this code


infile = File.open 'inp.txt','r'

outfile = File.open 'outfile.txt','w'

while line = infile.gets

outfile.print line.gsub(/John Doe/,'me')

end

infile.close

outfile.close

would produce this

All the land belongs to me.

All the horses belong to me.

And the farmhouse belongs to me

in the output file.

 

It is possible to use gsub with a block to do further processing on each match before it is replaced.

 

The following code finds each group of digits (the groups are separated by spaces), converts the group to an integer, and doubles it prior to replacing the group with the result (of doubling).

print "12 10 16".gsub(/(\d+)/) { |m| m.to_i * 2 }

It prints as follows.

24 20 32

 

Using the scan Function with Regular Expressions

scan can also work with regular expressions.

You have already seen that the following code


print "this is the theatre".scan("th")

produces this:

["th", "th", "th"]

Try the following code.

str = "this is the theatre"

rslt = str.scan("th")

puts rslt.inspect

The result is the same. The inspect function offers an alternate means to inspect the result of the scan. (The to_s function behaves the same way as inspect here.)

 

Try a regular expression in place of a string for the scan (as shown next).


str = "this is the theatre"

rslt = str.scan(/t|h/)

puts rslt.inspect

The result is very different.

["t", "h", "t", "h", "t", "h", "t"]

Since match with regular expressions (unless otherwise specified) finds the first match, using a scan with regular expressions may provide an easy way to find all the matches of the pattern in the source string.
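The contrast between match (first match only) and scan (all matches) can be put side by side. The helper names below are hypothetical.

```ruby
# First match only; nil if there is none
def first_match(text)
  m = text.match(/t|h/)
  m && m[0]
end

# Every match, in order of appearance
def all_matches(text)
  text.scan(/t|h/)
end

str = "this is the theatre"
p first_match(str)
p all_matches(str)
```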

 

Removing Block Commented Code

Problem

The first task is rather easy. Suppose you have a Java project with multiple files in multiple subfolders (packages). You developed it little by little, experimenting with this and that. In the process, you commented out a lot of functions entirely.

 

Sometimes you commented out a large chunk of code within a function. You are done with your experimentation; however, there is too much commented-out code.

 

Not that it would do anyone any harm beyond a bit of disk space, etc., but you want the code to be neat. Why bother to keep 3,000 lines of code if you can get away with 1,000 lines?

 

There are some comments, however, that you want to keep. For now, consider that you want to remove only block comments (which start with /* and end with */ and may span multiple lines).

 

You want to keep line style comments (which start with //) because they may contain important descriptions for the developer (or maintainer)—unless, of course, those line style comments appear within a block comment, in which case they should be removed anyway.

 

Solution

For the purpose of coding and testing, just take two subdirectories at the same level (a and b) and put the file abc.java in a and def.java in b. (Even if the files were in different subdirectory levels, the trick as to how to tackle them has already been covered.)


a/abc.java

b/def.java

For testing purposes, you don’t need to write real Java code in those files. Put the following in the abc.java file.

//Project

import something.something;

/* This is a comment */

some more statements

/* This too is a commenting spanning three lines */

good bye

Put the following in the def.java file.

//Project

Nothing to import

/* This is a comment */

few statements

and /* this too is */ a comment

/* This too

is a commenting

//having another comment trapped within */

few more statements

//Alternatively : instead of good bye you may say

see you

The art of comment removal can be perfected in one file first. Extending it to multiple files will not be difficult. For this purpose, def.java is the most suitable. So copy it in the current folder.

 

Reading the whole file at once makes the task easier. See the following code.


text = File.read('def.java')

text1 = text.gsub(/\/\*.*?\*\//m,'')

print text1

 

It works almost correctly—almost because it leaves an empty line for two of the block comments in the input file (def.java).

The /\/\*.*?\*\// pattern generally means this: /*, any number of characters (without being greedy), and */.

Since the file is read in one shot and the m modifier is used in the gsub pattern (it makes the dot match newline characters as well), a block comment that spans multiple lines is handled just like one on a single line.

 

The non-greedy specification is needed because otherwise, it will start at the first /* and end at the last */ (the end of last block comment), taking everything in between. You may test that yourself by removing the ? from the pattern.
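The effect of the ? can be seen on a small string instead of a file. In this sketch the patterns are written with %r{...} only to avoid escaping the forward slashes; the helper names are hypothetical.

```ruby
# Non-greedy: each comment is removed on its own
def strip_block_comments(source)
  source.gsub(%r{/\*.*?\*/}m, '')
end

# Greedy: everything from the first /* to the last */ is removed
def strip_greedy(source)
  source.gsub(%r{/\*.*\*/}m, '')
end

source = "a /* x */ b /* y\nz */ c"
p strip_block_comments(source)
p strip_greedy(source)
```

With the greedy pattern, the code between the two comments (b) is lost along with them.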

 

But this approach is not something that we should finally adopt. We should not go the full file read path. For large input files, it is not a good idea.

 

If we go about doing our business line by line, a single pattern may not suffice. We can still apply the pattern without the m modifier in order to clear all the block comments that start and end on the same line. And then, on the output of that, we can try the trick for multiline pattern search/replacement.

 

The following code does the first part (except one thing) and writes the output in a file named tmp.txt.


infile = File.open 'def.java','r'

outfile = File.open 'tmp.txt','w'

while line = infile.gets

line = line.gsub(/\/\*.*?\*\//,'')

outfile.print line

end

infile.close

outfile.close

The part that it does not do is for the block comment that has a newline after it. It does not take care of the newline (so an empty line is in the output in place of the first block comment).


//Project

Nothing to import

few statements

and a comment

/* This too

is a commenting

//having another comment trapped within */

few more statements

//Alternatively : instead of good bye you may say

see you

 

To take care of such an empty line, we can look for a newline after the end of the block comment (*/), either immediately after it or with any number of spaces and tabs in between. So our pattern should be /\/\*.*?\*\/[\s]*\n/.

 

This, however, means that the second block comment (which has non-whitespace characters after it on the same line) will not be matched and replaced. To avoid this, we can apply both filters one after another. The following code does this and achieves the goal.


infile = File.open 'def.java','r'

outfile = File.open 'tmp.txt','w'

while line = infile.gets

line = line.gsub( /\/\*.*?\*\/[\s]*\n/,'')

line = line.gsub( /\/\*.*?\*\//,'')

outfile.print line

end

infile.close

outfile.close

 

To tackle multiline comments (while reading line by line), we have to first look for the opening pattern (/*) and once we find it, mark a flag, and then look for the closing pattern.

 

The following code does the job.


infile = File.open 'tmp.txt','r'

while line = infile.gets

if line.match(/\/\*/)

commentline = true

end

print line if (not commentline)

if commentline

commentline = false if line.match(/\*\//)

end

end

infile.close

This is the output:

//Project

Nothing to import

few statements

and a comment

few more statements

//Alternatively : instead of good bye you may say

see you

 

So far, so good. But what if the multiline comment has some text before the comment (for the opening line) and after the comment (for the closing line)? That is something like the following as the input file data.


abcd

123 /* open

and

close */ 456

efgh

The current code won’t work in this case. We need to print part of the opening and closing lines, not skip them wholly. The following code should work.


infile = File.open 'plinecmt.txt','r'

while line = infile.gets

if line.match(/\/\*/)

commentline = true

print $`

end

print line if (not commentline)

if commentline

if line.match(/\*\//)

commentline = false

print $'

end

end

end

infile.close

The predefined $` and $' variables are used to get the part before and after the (last) match, as appropriate.

 

Putting it together, making it function-based, and making a small change to avoid a newline for the closing comment line, we have the following.


def remove_comment(javafilename)

infile = File.open javafilename,'r'

outfile = File.open 'tmp.txt','w'

while line = infile.gets

line = line.gsub( /\/\*.*?\*\/[\s]*\n/,'')

line = line.gsub( /\/\*.*?\*\//,'')

outfile.print line

end

infile.close

outfile.close

infile = File.open 'tmp.txt','r'

outfile = File.open javafilename,'w'

while line = infile.gets

if line.match(/\/\*/)

commentline = true

outfile.print $`

end

outfile.print line if (not commentline)

if commentline

if line.match(/\*\//)

commentline = false

endpart = $'

endpart.gsub!(/^[\s]*\n/,'')

outfile.print endpart

end

end

end

infile.close

outfile.close

end

remove_comment('def.java')

Notice this part.

if line.match(/\*\//)

commentline = false

endpart = $'

endpart.gsub!(/^[\s]*\n/,'')

outfile.print endpart

end

If the end part (the part after the comment close marker until the end of the line) consists only of whitespace and a newline, it is removed entirely. This avoids leaving an extra empty line in place of the block comment.

The objective for making it function-oriented is to make it easier to be adopted for multiple files.

 

In the preceding code, replace the line containing the call to the function with the following code.


arr = Dir.glob('**/*.java')

arr.each {|filename|

remove_comment(filename)

}

Save and close the file. Run the code. The comments are gone.
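For experimenting without touching any files, the flag-based idea can also be sketched as a method that works on a string. Note that this is a different technique from the code above: it scans each line for the /* and */ markers with String#index instead of using two regexp passes and a temporary file. The method name strip_comments_by_line is hypothetical, and, like the original, the sketch does not recognize comment markers that appear inside string literals.

```ruby
# Hypothetical helper: removes /* ... */ comments from text while
# processing it one line at a time.
def strip_comments_by_line(text)
  in_comment = false
  out = []
  text.each_line do |line|
    kept = "".dup          # the part of this line outside any comment
    rest = line
    until rest.empty?
      if in_comment
        idx = rest.index("*/")
        if idx
          in_comment = false
          rest = rest[(idx + 2)..-1]   # continue after the close marker
        else
          rest = ""                    # whole remainder is comment
        end
      else
        idx = rest.index("/*")
        if idx
          kept << rest[0...idx]        # keep the text before the comment
          in_comment = true
          rest = rest[(idx + 2)..-1]
        else
          kept << rest
          rest = ""
        end
      end
    end
    # Drop lines that became blank only because a comment was removed.
    out << kept unless kept.strip.empty? && kept != line
  end
  out.join
end

print strip_comments_by_line("abcd\n123 /* open\nand\nclose */ 456\nefgh\n")
```

As in the file-based version, the code before and after a multiline comment ends up joined on one line.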
