Ruby File and Directory


Working with Ruby File and Directory 

When you need to work at the directory level of the file system in Ruby, the methods of the Dir class come in handy. In this blog, we explain how to create and use files and directories in Ruby. Objects of the Dir class are directory streams that represent directories in the file system.

 

Using this API, you can create directories, change directories, list files in a directory, and so on without resorting to firing an OS-level command through the Ruby code. What is more, you can work further on the return values.

 

For instance, when you list the files with the appropriate Dir method, you get the list back as an array, so you can iterate through it and take some action on each file. The Dir methods usually work with both relative paths (relative to the current directory in the execution context) and absolute paths.

 

Note The current directory in the execution context may not be the directory from which you fired the script. It is possible that you programmatically changed the directory to a new one (in which case the new directory becomes your current directory in the program execution context).

 

Some of these functions can work with blocks.

 

mkdir

The mkdir function creates a directory.


Dir.mkdir('test1') => will create a directory 'test1' under the current working directory.

Dir.mkdir('test2', 0777) => will create directory 'test2' with '777' permissions (the mode must be written in octal, and it is still subject to the umask of the process).
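The following is a minimal sketch of the permission argument, assuming a POSIX system. The helper name make_dir_with_mode and the directory name 'reports' are made up for illustration, and the scratch directory comes from Dir.mktmpdir so nothing is left behind.

```ruby
require 'tmpdir'

# Creates dir_name under base with the given mode and returns the
# permission bits the directory actually ended up with.
def make_dir_with_mode(base, dir_name, mode)
  target = File.join(base, dir_name)
  Dir.mkdir(target, mode)           # mode is an Integer, usually written in octal
  File.stat(target).mode & 0o777    # keep only the permission bits
end

Dir.mktmpdir do |base|
  bits = make_dir_with_mode(base, 'reports', 0o755)
  puts format('%o', bits)           # 755 with a typical umask of 022

  begin
    Dir.mkdir(File.join(base, 'reports'))   # the directory already exists
  rescue Errno::EEXIST => e
    puts "mkdir failed: #{e.class}"
  end
end
```

Note that creating a directory that already exists raises Errno::EEXIST rather than silently succeeding.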

rmdir

The rmdir function removes the named directory, if empty. It raises an error otherwise. It can work with a relative or an absolute path.

Dir.rmdir('/tmp/tst') #absolute path

Dir.rmdir('test2') #relative path
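To see the "if empty" rule in action, here is a small sketch that assumes a POSIX system. The helper try_rmdir and the names 'full' and 'note.txt' are hypothetical; the helper turns the errors into symbols so the three cases are easy to compare.

```ruby
require 'tmpdir'

# Tries to remove a directory and reports what happened instead of raising.
def try_rmdir(path)
  Dir.rmdir(path)
  :removed
rescue Errno::ENOTEMPTY, Errno::EEXIST
  :not_empty          # some platforms report EEXIST for a non-empty directory
rescue Errno::ENOENT
  :missing
end

Dir.mktmpdir do |base|
  full = File.join(base, 'full')
  Dir.mkdir(full)
  File.write(File.join(full, 'note.txt'), 'hello')

  puts try_rmdir(full)                       # directory still has a file in it
  File.delete(File.join(full, 'note.txt'))
  puts try_rmdir(full)                       # now empty, so it can be removed
  puts try_rmdir(full)                       # already gone
end
```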

pwd

The pwd function returns the path to the current directory in the execution context. (It does not print the path; it only returns it, so use puts or a similar method if you need it printed.)


irb(main):001:0> currDir = Dir.pwd

=> "/Users/Shared/chap02"

irb(main):002:0> puts currDir

/Users/Shared/chap02

=> nil

chdir

The chdir function changes the directory programmatically. Once changed, the new directory becomes the current directory in the execution context.


irb(main):001:0> Dir.chdir('test1')

=> 0

irb(main):002:0> puts Dir.pwd

/Users/Shared/chap06/test1

=> nil

chdir has a few forms. Without an argument, it changes to the HOME directory (the HOME variable should be set in the environment).

 

Used with a block, it changes the directory to the named directory, executes the block, and upon exiting from the block, the original working directory (which was current prior to the chdir) is restored in the execution context. The return value of chdir, in this case, is the return value of the block.

The following code


puts Dir.pwd

Dir.chdir('test1') {

puts Dir.pwd

2 + 2

}

puts Dir.pwd

produces this:


/Users/Shared/chap06

/Users/Shared/chap06/test1

/Users/Shared/chap06
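The example above discards the value of the block. As a minimal sketch, the return value can be captured like this; the helper name visit is hypothetical and a temporary directory stands in for test1.

```ruby
require 'tmpdir'

# Runs the block form of Dir.chdir and returns [block_value, pwd_inside].
def visit(dir)
  inside = nil
  value = Dir.chdir(dir) do
    inside = Dir.pwd
    2 + 2                # the last expression becomes the return value of chdir
  end
  [value, inside]
end

before = Dir.pwd
Dir.mktmpdir do |tmp|
  value, inside = visit(tmp)
  puts value                    # 4
  puts inside != before         # true: the block ran in a different directory
  puts Dir.pwd == before        # true: the original directory is restored
end
```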

 

home

Without an argument, the home function returns the home directory of the current user.

With an argument, it returns the home directory of the named user.


Dir.home => Returns the home directory of the current user.

Dir.home('root') => Returns root's home directory.

exist?

For the given argument, the exist? function checks whether it is the name of an existing directory. It returns false if the path does not exist or if it exists but is not a directory.

Dir.exist?('test2') => checks if the directory 'test2' exists directly under the current directory.
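The three possible cases can be sketched as follows. The helper dir_checks and the names 'notes.txt' and 'missing' are hypothetical and only serve the illustration.

```ruby
require 'tmpdir'

# Returns the result of Dir.exist? for a directory, a plain file,
# and a path that does not exist, all created under base.
def dir_checks(base)
  dir_path  = File.join(base, 'test2')
  file_path = File.join(base, 'notes.txt')
  missing   = File.join(base, 'missing')
  Dir.mkdir(dir_path)
  File.write(file_path, '')
  [Dir.exist?(dir_path), Dir.exist?(file_path), Dir.exist?(missing)]
end

Dir.mktmpdir do |base|
  p dir_checks(base)    # [true, false, false]
end
```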

 

entries

The common form of the entries function takes one argument, which is the name of a directory whose entries are required. For a valid argument (directory exists), it returns an array containing the names of all files and directories in that directory. (A non-existing directory as an argument raises an error).

 

For the current directory, '.' may be passed (dot, surrounded by quotes) as the argument.


irb(main):003:0> Dir.entries('.')

=> [".", "..", "test1", "test2"]

irb(main):004:0> Dir.entries('test1')

=> [".", "..", "x.txt", "y.txt"]

new

The new function returns a new directory object for the named directory. This can be used as a handle for further action. You can call the close function on this handle (directory stream) to close it after the job is done.

 

each

The each function works on a directory stream with a block. The name of each file or directory in the directory that the stream (handle) points to is passed to the block as an argument.

The following code illustrates the point.


dir = Dir.new('test1')

print dir.entries

puts

dir.each {|x| puts 'Got '+ x}

dir.close

It can produce something like this:

[".", "..", "test3", "x.txt", "y.txt"]

Got .

Got ..

Got test3

Got x.txt

Got y.txt

Note that the '.' and '..' are also included in the array.

 

foreach

The foreach function has many forms, but only one is discussed here. It takes a block and works similarly to each. However, instead of explicitly opening the directory stream with Dir.new, here the directory name is passed as an argument (hence, no explicit closing is required). In this sense, it is more convenient than each (less code).

 

In the test1 example, the following one-liner


Dir.foreach('test1') {|x| puts "Name : #{x}"}

produces this:

Name : .

Name : ..

Name : test3

Name : x.txt

Name : y.txt
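In practice you often want to skip the '.' and '..' entries. A minimal sketch of that, assuming a hypothetical helper named real_entries and a scratch directory that mimics test1:

```ruby
require 'tmpdir'

# Returns the names in dir, leaving out the '.' and '..' entries.
def real_entries(dir)
  names = []
  Dir.foreach(dir) do |name|
    next if name == '.' || name == '..'
    names << name
  end
  names.sort     # the order of foreach is not guaranteed, so sort for stable output
end

Dir.mktmpdir do |base|
  File.write(File.join(base, 'x.txt'), '')
  File.write(File.join(base, 'y.txt'), '')
  Dir.mkdir(File.join(base, 'test3'))
  p real_entries(base)      # ["test3", "x.txt", "y.txt"]
end
```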

glob

 

The glob function is, by far, the most useful function in the Dir class, so it is going to be discussed in detail.

It essentially filters the entries to be worked on, rather than returning all of them, and that is often very useful. (Filtering is done on names, so files and directories are treated alike: glob picks up the names of either files or directories that match.)

 

This is shown in the following examples, as well as in the context of at least one upcoming task.

 

It takes shell-style wildcard patterns (globs, which are not regular expressions) for filtering. As a further bonus, you don’t have to deal with the '.' and '..'. Some examples are given next.


Dir.glob('*') #returns all files in the current directory (but excludes '.' and '..')

irb(main):002:0> Dir.glob('*')

=> ["CHDIR.rb", "each.rb", "foreach.rb", "test1", "test2"]

It is possible to get a list of files with a particular extension (e.g., .rb).

Dir.glob('*.rb') #gets a list of file (and directory) names ending in .rb from the current directory.

irb(main):004:0> Dir.glob("*.rb")

=> ["CHDIR.rb", "each.rb", "foreach.rb"]

'**' works recursively. So finding any file with the .rb extension in the current directory or in any subdirectory under it can be achieved using Dir.glob('**/*.rb'). Note that glob patterns use the forward slash as the separator on every operating system; File.join produces exactly that, so you can use it to build up the pattern instead of typing the string directly.


irb(main):001:0> path = File.join('**','*.rb')

=> "**/*.rb"

irb(main):002:0> Dir.glob(path) #effectively Dir.glob("**/*.rb") in this case

=> ["CHDIR.rb", "each.rb", "foreach.rb", "test1/test3/z.rb", "test1/x.rb", "test1/y.rb"]

 

It is possible to restrict the recursive search to any subdirectory with a particular name. For instance, we can get all the .rb files under any test3 directory anywhere (at any sublevel) under the current directory, as follows.


irb(main):003:0> Dir.glob('**/test3/*.rb')

=> ["test1/test3/z.rb"]

It is also possible to use an expression like '**/test1/**/*.rb', which indicates any .rb file at any sublevel of any directory named test1 (which itself could be at any sublevel under the current directory).

irb(main):004:0> Dir.glob('**/test1/**/*.rb')

=> ["test1/test3/z.rb", "test1/x.rb", "test1/y.rb"]

 

Other patterns are possible too (and a sensible combination of patterns would also work). Any files (or directories) whose names start with t can be found as follows.


irb(main):005:0> Dir.glob('t*')

=> ["test1", "test2"]

And any file (or directory) that has each in it would be as follows.

irb(main):006:0> Dir.glob('*each*')

=> ["each.rb", "foreach.rb"]

 

It is possible to search among multiple extensions. The following code finds all files (or directories) in the current directory that have either the .rb or the .txt extension.


irb(main):001:0> Dir.glob('*.{rb,txt}')

=> ["CHDIR.rb", "each.rb", "foreach.rb", "x.txt", "y.txt"]

This pattern uses brace alternation of the form {p,q}. It is glob syntax rather than a regular expression, so do not put a space after the comma unless the space is part of the name.

 

It is possible to find files (or directories) that have an extension whose first character is not r (anything but r). Here, a character set is used. (The set [^r] means a single character that is anything but r, just as it does in a regular expression.)


irb(main):003:0> Dir.glob('*.[^r]*')

=> ["x.txt", "y.txt"]

Note that it will not pick up a file (name) that does not have a '.' in its name (thus all files without extensions will be excluded). This is because the overall pattern includes the '.' character, and hence, it looks for the dot in the name of the file (or directory).
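The patterns above can be tried out safely in a scratch directory. The following is a sketch only: the helper glob_in_sample_tree and the file names are made up for illustration, and the results are sorted because the order returned by glob can differ between platforms and Ruby versions.

```ruby
require 'tmpdir'

# Builds a small tree and returns the sorted glob results for a pattern,
# relative to the root of that tree.
def glob_in_sample_tree(pattern)
  Dir.mktmpdir do |root|
    Dir.mkdir(File.join(root, 'test1'))
    %w[each.rb foreach.rb notes.txt README test1/x.txt test1/y.rb].each do |name|
      File.write(File.join(root, name), '')
    end
    Dir.chdir(root) { Dir.glob(pattern).sort }
  end
end

p glob_in_sample_tree('*.rb')           # ["each.rb", "foreach.rb"]
p glob_in_sample_tree('**/*.rb')        # ["each.rb", "foreach.rb", "test1/y.rb"]
p glob_in_sample_tree('**/*.{rb,txt}')  # ["each.rb", "foreach.rb", "notes.txt", "test1/x.txt", "test1/y.rb"]
p glob_in_sample_tree('*.[^r]*')        # ["notes.txt"]  (README has no dot, so it is skipped)
p glob_in_sample_tree('*each*')         # ["each.rb", "foreach.rb"]
```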

 

Dividing Files into Subdirectories

This task is rather simple. There are some files in a directory. All of them have the .sql extension. But some are table creation scripts; others are procedure creation scripts.

 

From the name or extension, it is not distinguishable whether a file has a table creation script or a procedure creation script inside it. Your task is to write a script to do the following.

  1. Create two subdirectories (named tbl and proc) in the current directory.
  2. Get the .sql files, one by one, and find out whether the first line matches table or procedure.
  3. Move the file to the appropriate subfolder.

 

Solution

To test the program, you need input data (files). Create four files named a.sql, b.sql, c.sql, and d.sql, respectively. In the first two files, put 'create table a' and 'create table b' in the first line (and some text in the second line). Here is an example.


create table a

col a1 null

For the last two files, use 'create procedure' in the first line. Here is an example.

create procedure c

begin

If you think about the steps in the task, how to create a directory (the first part) has already been discussed (using the Dir API). How to move a file programmatically (the third part) has not been. 

 

In the second part of the task, given the file name, you could open it, get the first line, and use the match operator to find out if it contains 'table' or 'procedure'. The Dir API can also be used to get only the .sql files in the directory.

 

To move a file, you can use the mv function of FileUtils. One example is given next. (Note that this is a rather crude example without any exception handling, but it shows the basic code.)

 

require 'fileutils'

FileUtils.mv('abc.txt','tbl')

#FileUtils.mv('abc.txt','tbl/abc.txt')

Provided that the abc.txt file exists in the current directory, the second line of the code will rename the file to tbl (unless a tbl directory already exists under the current directory, in which case the file is moved into it). The third line (when uncommented) will have a proper move effect (and not rename), provided that the files and directories exist as desired.
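The difference between the rename and the move can be checked with a short sketch. The helper mv_outcome is hypothetical; it runs FileUtils.mv('abc.txt', 'tbl') in a scratch directory, once without and once with an existing tbl directory.

```ruby
require 'fileutils'
require 'tmpdir'

# Moves abc.txt to 'tbl' inside a scratch directory and reports what 'tbl'
# turned out to be. make_dir_first decides whether tbl exists beforehand.
def mv_outcome(make_dir_first)
  Dir.mktmpdir do |base|
    Dir.chdir(base) do
      File.write('abc.txt', 'sample')
      Dir.mkdir('tbl') if make_dir_first
      FileUtils.mv('abc.txt', 'tbl')
      if File.file?('tbl')
        :renamed_to_tbl            # abc.txt is now a plain file called tbl
      elsif File.file?(File.join('tbl', 'abc.txt'))
        :moved_into_tbl            # abc.txt now lives at tbl/abc.txt
      end
    end
  end
end

puts mv_outcome(false)   # renamed_to_tbl
puts mv_outcome(true)    # moved_into_tbl
```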

 

To get the names of all the .sql files in the current directory, you can use the following code.

arr = Dir.glob('*.sql')

print arr

 

It takes the file names in an array and prints the array.


["a.sql", "b.sql", "c.sql", "d.sql"]

And the array can be iterated over using block structure and the each method.

Putting it all together, the code looks like this:

require 'fileutils'

Dir.mkdir('tbl')

Dir.mkdir('proc')

arr = Dir.glob('*.sql')

arr.each {|filename|

infile = File.open(filename,'r')

firstline = infile.gets #just need to read the first line

infile.close

FileUtils.mv(filename,'tbl') if firstline =~ /table/

FileUtils.mv(filename,'proc') if firstline =~ /procedure/

}

Note Instead of Dir.mkdir('tbl'), FileUtils.mkdir('tbl') will also work.

This is somewhat crude but it works. Since we need only the first line, there is no need to use a while loop on the files. Also, it is very important that the opened file be closed prior to the move.

 

This was a simple use case. In reality, a file’s content may be more complicated (such as the table keyword appearing on the second line, or the word 'procedure' appearing first in a table creation script, inside a comment, not to mention case-insensitive keywords). Also, no proper error handling has been added to this code.

 

In a real-life task, unless you are running it yourself and you are able to monitor the run and the results, it is imperative that proper error handling is in place.

 

Repeated running of the code would create a problem because the (sub) directories are already created. To avoid this, you could change the lines for directory creation as follows.


Dir.mkdir('tbl') unless File.directory?('tbl')

Dir.mkdir('proc') unless File.directory?('proc')

This means that creating a directory is not attempted if it is already present.
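As one possible direction for a sturdier version, here is a sketch that matches the keywords case-insensitively, leaves unrecognised files alone, and reports file-system errors instead of stopping. The helper names target_dir_for and sort_sql_files and the exact regular expressions are assumptions made for this illustration; it still looks only at the first line.

```ruby
require 'fileutils'
require 'tmpdir'

# Returns 'tbl', 'proc', or nil for a file, judging only by its first line.
def target_dir_for(filename)
  first_line = File.open(filename, 'r') { |f| f.gets }   # block form closes the file
  case first_line
  when /\bcreate\s+table\b/i     then 'tbl'
  when /\bcreate\s+procedure\b/i then 'proc'
  end
end

# Sorts every .sql file in dir into tbl/ or proc/ and returns a list of
# [filename, destination] pairs for the files that were moved.
def sort_sql_files(dir = '.')
  moved = []
  Dir.chdir(dir) do
    %w[tbl proc].each { |d| Dir.mkdir(d) unless File.directory?(d) }
    Dir.glob('*.sql').sort.each do |filename|
      begin
        dest = target_dir_for(filename)
        next unless dest                      # leave unrecognised files alone
        FileUtils.mv(filename, dest)
        moved << [filename, dest]
      rescue SystemCallError => e
        warn "Skipping #{filename}: #{e.message}"
      end
    end
  end
  moved
end

Dir.mktmpdir do |base|
  File.write(File.join(base, 'a.sql'), "CREATE TABLE a\ncol a1 null\n")
  File.write(File.join(base, 'c.sql'), "create procedure c\nbegin\n")
  p sort_sql_files(base)   # [["a.sql", "tbl"], ["c.sql", "proc"]]
  p sort_sql_files(base)   # []  (a second run finds nothing left to move)
end
```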

 

Adding Text to Files Using a Batch Operation

Problem


(The following is a fictitious situation. Any resemblance …)

Dale is the team leader of Zoran’s team.

Dale stormed into the meeting room.

“Guys, we have a situation.”

 

The team members waited eagerly in anticipation. “Our team has been chosen to be audited this year.” This was not good news, thought Zoran. “You know how fussy they are about coding standards. Do those Java files in our project have a header with the project name and the code owner’s name?” Dale asked.

 

Zoran didn’t like where this was going. He was the unofficial batch script expert on the team, and he was pretty sure nobody bothered to put those comments in place (he himself didn’t).

“Zoran?” Dale looked around to face Zoran as he spoke. “Write a script that can run from the project root directory, identify all the .java files, and add the header as a comment on the first line. Let me know when it is done.”

“Who should I put for the code owner’s name?” Zoran asked.

“Use my name for now. My full name,” Dale replied.

 

Solution

For this task, the first thing to do is to identify the (.java) file names in the project, using the full path from the root directory.

 

I will show two ways of achieving it. The second one is really easy for the task, but the first approach may be useful (with some modifications as appropriate) in other situations.

 

To test the code, create a set of directories (and subdirectories) under the current directory.

a

b/1

a and b are the immediate subdirectories. 1 is a subdirectory of b.

In a, create a file called abc.java. In 1, create another file named def.java. Each of the files should have two lines.

111

222

 

That is not Java code (far from it). But our aim is to test our script, and this should be fine for our purpose. Finally, these are the subfolders and files of concern (other than the Ruby script itself):

a/abc.java

b/1/def.java

 

Approach 1: Output From Command Execution

On macOS or Linux (tested on a Mac), the following command (the pattern is quoted so that the shell does not expand it before find sees it)


find . -name '*.java' -print

outputs as follows.

./a/abc.java

./b/1/def.java

You can get the output of a command, as a single string, with the backquote construct.

val = `find . -name '*.java' -print`

The %x() construct also works in the same way. The following code

val = %x(find . -name '*.java' -print)

val.gsub!("\n",'')

puts val

arr = val.split("./")

print arr

prints as follows.

./a/abc.java./b/1/def.java

["", "a/abc.java", "b/1/def.java"]

The first line of the code captures the whole output of the command in a string (val). The second line removes all the newlines from the string, in place. (Make sure to use double quotes for \n, not single quotes).

 

The fourth line splits the string based on the ./.

Note that we still need to get rid of the first element of the array. Check the following code.


#get the output of the command in a string

val = %x(find . -name '*.java' -print)

#replace all \n characters

val.gsub!("\n",'')

#split by ./ and take the second element onwards

#array of .java filenames with full path starting from current directory

arr = val.split("./")[1..-1]

print arr

This code populates the arr array the way that we need.

["a/abc.java", "b/1/def.java"]

This approach of running an OS-level command, getting the output, and processing may be useful elsewhere.
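Splitting on "./" after removing the newlines can go wrong if a directory name happens to end in a dot. A per-line variant avoids that. The following is a sketch under a few assumptions: a POSIX find is available on the PATH, Ruby is 2.5 or later (for delete_prefix), and the helper name java_files_via_find is hypothetical.

```ruby
require 'fileutils'
require 'tmpdir'

# Runs find under dir and returns the .java paths relative to dir.
def java_files_via_find(dir)
  Dir.chdir(dir) do
    output = `find . -name '*.java' -print`
    # One path per line: strip the newline and the leading "./" from each.
    output.lines.map { |line| line.chomp.delete_prefix('./') }.sort
  end
end

Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'a'))
  FileUtils.mkdir_p(File.join(root, 'b', '1'))
  File.write(File.join(root, 'a', 'abc.java'), "111\n222\n")
  File.write(File.join(root, 'b', '1', 'def.java'), "111\n222\n")
  p java_files_via_find(root)    # ["a/abc.java", "b/1/def.java"]
end
```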

Approach 2: Use Dir.glob

This one is really easy.


arr = Dir.glob('**/*.java')

print arr

It prints as follows.

["a/abc.java", "b/1/def.java"]
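With the list of files in hand, adding the header could look something like the sketch below. The header text is a placeholder (the project name and the owner's full name are not given here), and the helper name add_header is hypothetical. The sketch skips files that already start with the header, so running it twice does not duplicate the line.

```ruby
require 'fileutils'
require 'tmpdir'

# Placeholder header text: the project and owner names are hypothetical.
HEADER = "// Project: <project name>, Code owner: <owner's full name>\n"

# Prepends header to every .java file under root, skipping files that
# already start with it, and returns the list of files that were changed.
def add_header(root, header = HEADER)
  changed = []
  Dir.chdir(root) do
    Dir.glob('**/*.java').sort.each do |path|
      content = File.read(path)
      next if content.start_with?(header)    # safe to run more than once
      File.write(path, header + content)
      changed << path
    end
  end
  changed
end

Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'a'))
  FileUtils.mkdir_p(File.join(root, 'b', '1'))
  File.write(File.join(root, 'a', 'abc.java'), "111\n222\n")
  File.write(File.join(root, 'b', '1', 'def.java'), "111\n222\n")

  p add_header(root)                                  # ["a/abc.java", "b/1/def.java"]
  p add_header(root)                                  # []  (nothing left to change)
  puts File.read(File.join(root, 'a', 'abc.java'))    # header line, then 111 and 222
end
```

Reading the whole file into memory is fine for source files of ordinary size; very large files would call for a streaming approach instead.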

 
