SQL Insert and SQL Query (Best Tutorial 2019)


What is SQL and how does SQL INSERT work

SQL stands for Structured Query Language, and it is the de facto standard for interacting with relational databases. Almost all database management systems you’ll come across will have an SQL implementation.

 

While SQL is a computer language, it is not like the other programming languages that you may have heard of like Python or C. Such programming languages are generic in nature, suitable for a wide variety of tasks from programming basic calculating systems to advanced simulation models.

 

SQL is a special-purpose query language meant for interacting with relational databases; it has little use outside that context.

 

Advantages of Using SQL

It is standardized – no matter which relational database you choose, it will have an SQL query interpreter built in. The sheer popularity of SQL makes it worth the time of anyone who interacts with a data system.

 

It has a reasonable English-like syntax. None of the painstaking detail of programming languages like C or Java has to be specified when using SQL. It is concise, easy to understand, and easy to write database queries with.

 

It is declarative in nature, meaning you only have to declare what you want to achieve rather than going over the steps to achieve the results.
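For instance, the hypothetical sketch below (assuming an employees table with name and salary fields) states only what we want – the names of well-paid employees – and nothing about how the database should scan or filter its storage to produce them:

SELECT name
FROM employees
WHERE salary > 50000;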

It allows a uniform way to query and administer a relational database. Many of the database administration commands are standard SQL commands making the transfer of skills much easier.

 

SQL Commands Classification


SQL is a language for interacting with databases. It consists of a number of commands, with further options that allow you to carry out your operations on a database. While DBMSs differ in the command subset they provide, you would usually find the classifications below.

 

Data Definition Language (DDL): CREATE TABLE, ALTER TABLE, DROP TABLE, etc. These commands allow you to create or modify your database structure.

 

Data Manipulation Language (DML): INSERT, UPDATE, DELETE. These commands are used to manipulate data stored inside your database.

 

Data Query Language (DQL): SELECT. Used for querying or selecting a subset of data from a database.

 

Data Control Language (DCL): GRANT, REVOKE, etc. Used for controlling access to data within a database, commonly used for granting user privileges.

 

Transaction Control Commands: COMMIT, ROLLBACK, etc. Used for managing groups of statements as a unit of work. Besides these, your database management system may give you other sets of commands to work more efficiently or to provide extra features. But it is safe to say that the ones above would be present in almost all DBMSs you encounter.
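To make the unit-of-work idea concrete, here is a minimal sketch assuming a hypothetical accounts table with id and balance fields; the two updates either both take effect or are both undone:

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;
-- Running ROLLBACK instead of COMMIT would undo both updates.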

 

Explaining Tables

A table in a relational database is nothing but a two-dimensional matrix of data where the columns describe the type of data and the rows contain the actual data to be stored.

 

A key concept of tables is that they are conceptual in nature and may not have any bearing upon the actual files where the data is stored. When users create a spreadsheet, they associate a file name with the spreadsheet and place it somewhere on their disk. But relational databases hide all these details from the user.

 

The physical storage of a table on disk might be a single file, or many files, or even many tables stored within a single file. It is the responsibility of your DBMS to provide a way to read and write to tables.

 

Data Types in SQL


Just like programming languages, SQL has data types to define the kind of data that will be stored in its fields. In the programming languages table we are about to build, the fields language and author must store English-language characters.

 

The id and year fields both store whole numbers. The commonly used data types you will encounter in subsequent blogs are shown in the table below.

Table. Various Data Types in SQL

Category          Data types
Character types   char, varchar
Integer values    integer, smallint
Decimal numbers   numeric, decimal
Date values       date

 

A string of characters is usually stored in either char or varchar. The former reserves as much space as you want when you specify the field, but if the value you store in it is shorter, the remaining space is wasted.

 

A varchar, however, stands for a varying character and will occupy the exact length of the string, nothing wasted. There is, however, a maximum limit to how long a string value you can assign to such a field, and that is specified during the field definition itself.

char(12)

varchar(12)

 

If you store the value 'McCarthy', which is eight characters long, a char(12) will store it but waste four characters. A varchar(12) will occupy exactly eight characters, but this dynamism comes at a slight cost in speed.

 

Nonetheless, the speed difference is small enough that for most scenarios you would see the varying character data type being used.

 

In the case of number values, we get a split across two major classes – an integer for storing whole numbers and numeric for storing numerical values with a decimal point in them.

 

The ranges and limits of the values being stored in them vary with your choice of DBMS. However, a good rule of thumb to follow is to use the smallest data type that will suffice for the present and foreseeable future of your application.

 

For example, if I were storing student roll numbers, using a smallint would suit just fine. In most implementations, this data type allows a maximum value of 32767, a number I expect to be much greater than the number of students in any class.
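As a sketch, such a table might be defined as below, with class_register and its fields being hypothetical names:

CREATE TABLE class_register (
roll_no SMALLINT,
student_name VARCHAR(40));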

 

Decimal point numbers are trickier to specify. We use the numeric data type to fix how large the number could be and how many numbers can occur after the decimal point.

numeric(precision, scale)

numeric(5, 2)

 

The total number of digits is specified by the precision, and the number of digits after the decimal point is given by the scale. So in the example above, we would be able to store a number like 999.99 but nothing larger.

 

Since data types still vary from one DBMS implementation to another, I suggest you keep your DBMS manual handy. Each implementation gives you many other types to work with, but for our learning purposes, the ones above should suffice.

 

Getting Your Database Ready


The best way to learn SQL is to practice writing the commands on a real relational database management system. In this blog, SQL is taught using either one of the following systems: PostgreSQL or SQLite.

 

The reasons for choosing these DBMS systems are simple – they are free and open source, with availability on most major platforms.

 

PostgreSQL is a full-featured enterprise-class database management system with a great community. SQLite is a small but robust system that is especially suited for learning purposes. Choose the latter if you are not comfortable with software installations.

 

However, any relational database product that you can get your hands on should serve you just fine. In some cases, you may already have access to one in your organization, but be sure to ask for permissions to use it for learning purposes.

 

There might be minor incompatibilities between different vendors, so if you choose something else to practice on while reading this blog, it would be a good idea to keep the database vendor's user manual handy.

 

Since this text deals largely with teaching SQL in a product-independent manner rather than teaching a specific DBMS system, details with respect to installation and specific operations of the product will be kept to a minimum.

 

Emphasis is instead placed on a few specific steps that will help you to get working on writing SQL as fast as possible.

 

Using PostgreSQL


The latest version of PostgreSQL as of writing this blog was 9.6. You don’t absolutely need the latest version; in fact, I use version 9.5 in this text.

 

You can download the latest version of PostgreSQL for your platform from the official downloads page at https://www.postgresql.org/download/. For the fastest and easiest installation, I would recommend you choose your platform from the Binary Packages list.

 

Pre-built binaries mean that you can simply download and install PostgreSQL like any other software using a graphical step-by-step installer.

 

After choosing your platform, you might still get multiple ways to perform an installation. I’d recommend choosing the graphical installer version from third-party vendors like BigSQL or EnterpriseDB.

 

I had chosen the EnterpriseDB installer for my Fedora Linux machine, and a friendly installation procedure popped up when I ran the downloaded file, asking for details like the installation directory.

 

Some other details will also be asked for, most importantly the port number and password. The default value of 5432 for the port number should suffice. At the end of the installation, you would have a user named 'postgres' on your system and a working database installation.

 

You can quickly verify that everything went well using psql, a command-line utility to interact with your PostgreSQL installation. I am capturing the command and output from my system below.

Listing. Launching the psql shell

[~]$ /opt/PostgreSQL/9.5/bin/psql -U postgres
psql.bin (9.5.8)
Type "help" for help.
postgres=#

If you get a similar output, you are ready to start using your PostgreSQL installation. If not, I’m afraid you will have to do some digging on your own. You can also choose SQLite, which is discussed in the next section and has a much easier installation procedure.

 

Using SQLite


If installing PostgreSQL seems like a daunting task, you are in luck. There is a very credible, free alternative database for you to practice on. It is called SQLite, and its creator D. Richard Hipp has generously released it into the public domain.

 

You can download it from the project page at https://www.sqlite.org/download.html. If you are on a Linux or macOS system, chances are high that you already have SQLite installed. To test this, you can attempt launching the SQLite shell, sqlite3.

 

Listing. Launching the sqlite3 shell

[~]$ sqlite3
SQLite version 3.13.0 2016-05-18 10:57:30
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite>

 

If you get the output as above, you have everything you need to run SQLite. Alternatively, if you get an error message, it means you have to install it yourself. You can either use a precompiled binary for your platform or your system's package manager.

 

For systems like Red Hat Enterprise Linux, Scientific Linux, and CentOS, you can use yum to install SQLite.

# yum install sqlite
On a Fedora Linux system, you have to use dnf as below.
# dnf install sqlite
If you happen to use a Debian- or Ubuntu-based system, you can achieve the same result with the following.
$ sudo apt-get install sqlite3

Once the installation is done, you can verify the installation by launching the SQLite shell as before.

 

Creating Your Own Database


Database management systems like PostgreSQL allow you to create multiple databases. For practice purposes, it's advisable to create your own database, so that you are free to perform any operations on it.

 

Most database systems differ in the way they provide database creation facilities. PostgreSQL provides multiple ways to do this, including the pgAdmin III graphical utility.

 

However, for didactic purposes, we will instead use a command operation to create our database. Open up the psql shell and enter the command as below.

 

Listing. Creating a database in PostgreSQL

CREATE DATABASE TestDB;

The command CREATE DATABASE is used to create a database that will serve as a holding envelope for your tables. In the example and output shown above, we created a database called TestDB for our use.

 

The login user you used while connecting with psql, in most cases postgres, is now the owner of this database and has full control of entities within it. This is analogous to creating a file in an operating system, where the creator gets full access control rights and may choose to give other users and groups specific rights.

 

The SQL standard by definition allows commands and keywords to be written in a case-insensitive manner. In this blog, we will use uppercase letters while writing them in statements, which is a widely accepted practice.
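To illustrate, the two statements below are equivalent as far as the SQL interpreter is concerned; only the casing of the keywords differs:

CREATE DATABASE TestDB;
create database TestDB;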

 

If you are using SQLite, fire up the command shell, and you will be greeted with a text printing version information. This is exactly the same message we saw in the previous section when we were verifying our SQLite installation.

 

Listing. Opening the SQLite shell

SQLite version 3.13.0 2016-05-18 10:57:30
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
sqlite>

Here we enter the .open command, which both creates a SQLite database and opens it in case it already exists.

sqlite> .open testdb

On a Linux system, you could also simply write the database name after the sqlite3 command as below, and the said database would be opened.

sqlite3 testdb

 

Interestingly, this invocation would not immediately result in the creation of testdb; it would simply open it if it exists. If you don't perform any other operation and close the shell (Ctrl-D), there would be no testdb file on your machine on Linux. On a Windows system, you would get an empty file with a length of 0 bytes.

 

Table Creation


We have already explored the concept of a table in a relational model. It is now time to create one using a standard SQL command – CREATE TABLE.

Listing. General syntax of a CREATE TABLE statement

CREATE TABLE <Table_Name>
(<Field 1> <Data Type>,
<Field 2> <Data Type>,
. . .
<Field N> <Data Type>);

 

This is the simplest valid statement that will create a table for you, devoid of any extra options. We'll extend it with clauses and constraints as we go along, but for now, let us use this general syntax to actually create the table of programming languages we introduced in the blog.

 

Listing Creating the programming languages table

CREATE TABLE proglang_tbl (
id INTEGER,
language VARCHAR(20),
author VARCHAR(25),
year INTEGER);

 

We key this command in on the psql shell. Notice that when we launch the shell, the last line where our cursor waits looks as below:

postgres=#

 

This actually means that we are connected to the database named postgres, which the PostgreSQL installation uses internally for management purposes. We have already created our very own database, so let's switch to that before creating our tables, using \c.

 

Listing. Connecting to a different database in psql

postgres=# \c TestDB
You are now connected to database "TestDB" as user "postgres".
TestDB=#

Notice that the text on the last line has changed, indicating the current database we are connected to. Now you can key in the table creation statement, and if you don’t miss any of the important punctuation or misspell the keywords written in uppercase, your table would be created and the shell would reply simply with:

 

CREATE TABLE
TestDB=#

A non-successful command would yield an error with a helpful explanation. To see this in action, let’s run the exact same table creation command again. The shell would now respond:

 

ERROR: relation "proglang_tbl" already exists

The statement by itself is simple enough since it resembles the general syntax we saw above.

It is interesting to note the data types chosen for the fields. Both id and year are specified as integers for simplicity, even though there are better alternatives.

 

The language field is given a space of 20 characters to store the name of the programming language while the author field can hold 25 characters for the creator's name.

The semicolon at the last position is the delimiter for SQL statements, and it marks the end of a statement.

 

Listing. Creating the programming languages table in SQLite

[~]$ sqlite3 testdb
SQLite version 3.13.0 2016-05-18 10:57:30
Enter ".help" for usage hints.
sqlite> CREATE TABLE proglang_tbl (
...> id INTEGER,
...> language VARCHAR(20),
...> author VARCHAR(25),
...> year INTEGER);
sqlite>

 

Notice that there is no successful operation message. If all goes well, the SQLite shell simply moves on.

Just for some experimentation, if we try to create the same table again, we get an error saying Error: table proglang_tbl already exists, which is again a helpful and somewhat friendlier error message.

 

Inserting Data


The table we have just created is empty so our task now becomes the insertion of some sample data inside it. To populate this data in the form of rows, we use the DML command INSERT, whose general syntax is given below.

Listing. General syntax of INSERT

INSERT INTO <Table Name>
VALUES ('Value1', 'Value2', ...);

Listing. Inserting data into the proglang_tbl table

INSERT INTO proglang_tbl
VALUES (1, 'Fortran', 'Backus', 1955);
INSERT INTO proglang_tbl
VALUES (2, 'Lisp', 'McCarthy', 1958);
INSERT INTO proglang_tbl
VALUES (3, 'Cobol', 'Hopper', 1959);

 

If you do not receive any errors from psql or sqlite3 (or the SQL interface for your chosen DBMS), then you have managed to successfully insert three rows of data into your table.

 

Notice how we've carefully kept the ordering of the fields in the same sequence as we used for creating our table. This strict ordering limitation can be removed, and we will see how to achieve that later on.

If you ran these three statements in psql, at the end of each executed statement you would receive a message like

INSERT 0 1

indicating success.

 

Writing Your First Query


Let us now turn our attention to writing a simple query to check the results of our previous operations in which we created a table and inserted three rows of data into it. For this, we would use a Data Query Language (DQL) command called SELECT.

 

A query is simply a SQL statement that allows you to retrieve a useful subset of data contained within your database. You might have noticed that the INSERT and CREATE TABLE commands were referred to as statements, but a fetching operation with SELECT falls under the query category.

 

Most of your day-to-day operations in an SQL environment would involve queries since you'd be creating the database structure once (modifying it only on a need basis) and inserting rows only when new data is available.

 

While a typical SELECT query is fairly complex with many clauses, we will begin our journey by writing down a query just to verify the contents of our table. The general syntax of a simple query is given below.

 

General syntax of a simple SQL query

SELECT <Selection> FROM <Table Name>;

Transforming this into our result verification query is a simple task. We already know the table we wish to query – proglang_tbl – and for our selection we would use * (star), which will select all rows and fields from the table.

SELECT * FROM proglang_tbl;

 

The output of this query would be all three rows displayed in a matrix format, just as we intended. If you are running this through psql, you would get an output similar to the one given below.

Listing. Verifying the contents of our table in psql

TestDB=# select * from proglang_tbl;
 id | language | author   | year
----+----------+----------+------
  1 | Fortran  | Backus   | 1955
  2 | Lisp     | McCarthy | 1958
  3 | Cobol    | Hopper   | 1959
(3 rows)
TestDB=#

The output from SQLite would be slightly messier at first, but let’s fix that one step at a time.

Listing. Verifying the contents of our table in sqlite3

sqlite> select * from proglang_tbl;
1|Fortran|Backus|1955
2|Lisp|McCarthy|1958
3|Cobol|Hopper|1959
sqlite>

Clearly not the cleanest output, but setting a few options would fix that.

 

The first of them is called .mode column, and it outputs a neatly spaced resultset rather than the squashed one we saw before.

Listing. Turning on the column mode

sqlite> .mode column
sqlite> select * from proglang_tbl;
1           Fortran     Backus      1955
2           Lisp        McCarthy    1958
3           Cobol       Hopper      1959
sqlite> .headers on
sqlite> select * from proglang_tbl;
id          language    author      year
----------  ----------  ----------  ----------
1           Fortran     Backus      1955
2           Lisp        McCarthy    1958
3           Cobol       Hopper      1959

I recommend that you keep these options turned on for your learning sessions. The output becomes much easier to verify at a glance.
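If you'd rather not re-enter these settings every session, SQLite reads dot commands at startup from a file named .sqliterc in your home directory, so you can put them there once – a minimal sketch of that file:

.mode column
.headers on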

 

The Benefit of Constraints


Relational databases are well into their fourth decade of dominance as a data storage and organization mechanism. A large part of this success is owed to the flexibility of the data model. It is easy to visualize all kinds of data fitting into a neat tabular structure with predefined columns.

 

The flexibility also extends to querying – once tables are created and populated, little restriction is placed upon what you can query from them. You might generate completely new insights from a table that you hadn't thought of before.

 

To enable all of this, relational databases expect a certain amount of discipline and thought to be put in upfront when designing your tables. Neat tables with well-defined data types are essential for success, and certain rules help you keep on this path of good database design.

 

A constraint is a rule that you apply or abide by while doing SQL operations. They are useful in cases where you wish to make the data inside your database more meaningful and/or structured.

 

The Null Constraint


 

Consider the example of the programming languages table – every programming language that has been created must have an author (whether a single person, a pair, or a committee).

 

Similarly, it should have a year when it was introduced, be it the year it first appeared as a research paper or the year a working compiler for it was written. In such cases, it makes sense to create your table in such a way that certain fields do not accept a NULL (empty) value.

 

A null value does not mean 0 (zero) or an empty string like ‘’. Think of it as either empty or undefined. If you haven’t captured someone’s age while populating a table, you can’t assume their age to be 0.

 

This might have serious implications if someone was using this data for statistical analysis. Putting a null value there makes much more sense.

 

We now modify our previous CREATE TABLE statement so that we can apply the NULL constraint to some fields.

Listing. Creating a table with NULL constraints

CREATE TABLE proglang_tblcopy (

id INTEGER NOT NULL,
language VARCHAR(20) NOT NULL,
author VARCHAR(25) NOT NULL,
year INTEGER NOT NULL,
standard VARCHAR(10) NULL);

 

In this table, we only allow the standard field to have a null value. Every other field ends with the option NOT NULL, which specifies that this field must necessarily have a value.

 

All fields are nullable by default in most database management systems, so you have to explicitly mark a field as non-nullable. Writing the word NULL to specify a nullable field is optional.

 

If we try to insert a row into this table with a NULL value in one of the non-nullable fields like year, we expect an error message to be thrown at us. In both SQLite and PostgreSQL, we represent a null value with the literal null – note the lack of the quotation marks that generally enclose strings.

 

Null is not a string value and writing ‘null’ makes an actual string of length 4 and is decidedly non-null. Other database management systems might represent null values in a different way, so check your manual for such details.

 

Listing. Inserting a null value in SQLite

sqlite> INSERT INTO proglang_tblcopy
VALUES (1, 'Fortran', 'Backus', null, 'ANSI');
Error: NOT NULL constraint failed: proglang_tblcopy.year

Listing. Inserting a null value in PostgreSQL

TestDB=# INSERT INTO proglang_tblcopy
VALUES (1, 'Fortran', 'Backus', null, 'ANSI');
ERROR: null value in column "year" violates not-null constraint
DETAIL: Failing row contains (1, Fortran, Backus, null, ANSI).

 

We see in this case that we have achieved our objective of creating a table in which the fields id, language, author, and year cannot be empty for any row, but the new field standard can take empty values. We now go about trying to insert new rows into this table using an alternative INSERT syntax.

 

Selective Fields INSERT


 

From our last encounter with the INSERT statement, we saw that we had to specify the data to be inserted in the same order as specified during the creation of the table in question.

 

We now look at another variation that will allow us to overcome this limitation and handle inserting rows with embedded NULL values in their fields by not specifying them at all. While this approach may seem verbose initially, its advantages quickly outweigh any statement length-related concerns.

 

 General Syntax of INSERT with selected fields

INSERT INTO <Table_Name>
(<Field Name 1>,
<Field Name 2>,
. . .
<Field Name N>)
VALUES
(<Value For Field 1>,
<Value For Field 2>,
. . .
<Value For Field N>);

 

Since we specify the field order in the statement itself, we are free to reorder the values sequence in the same statement, thus removing the first limitation.

 

Also, if we wish to enter an empty (NULL) value in any of the fields for a record, it is easy to do so by simply not including the field’s name in the first part of the statement.

 

The statement would run fine without specifying any fields you wish to omit, provided they do not have a NOT NULL constraint attached to them.

 

We now write some INSERT statements for the proglang_tblcopy table, in which we try to insert some languages that have not been standardized by any organizations and some which have been.

Inserting new data into the proglang_tblcopy table
INSERT INTO proglang_tblcopy
(id, language, author, year, standard)
VALUES
(1, 'Prolog', 'Colmerauer', '1972', 'ISO');
INSERT INTO proglang_tblcopy
(id, language, author, year)
VALUES
(2, 'Perl', 'Wall', '1987');
INSERT INTO proglang_tblcopy
(id, year, standard, language, author)
VALUES
(3, '1964', 'ANSI', 'APL', 'Iverson');

 

When you run this through your SQL interface, three new rows would be inserted into the table. Notice the ordering of the third row; it is not the same sequence we used to create the table.

 

Also, since Perl (row id 2) has not been standardized by an international body yet, we do not specify the field name itself while doing the INSERT operation. This ensures that the standard field for that row is populated with null.

 

To verify the results of these statements and to make sure that the correct data went into the correct fields, we run a simple query as before.

SELECT * FROM proglang_tblcopy;

Table. Result of the Query Run on proglang_tblcopy

id  language  author      year  standard
1   Prolog    Colmerauer  1972  ISO
2   Perl      Wall        1987
3   APL       Iverson     1964  ANSI

 

Nulls are often shown by SQL interfaces by a blank space or a question mark (?) or sometimes even the word ‘null’ or ‘(null)’. Each implementation is free to choose its representation since it is not standardized among vendors.

 

Check Constraints


Data must be meaningful for someone to derive insights from it. A great advantage of relational databases is that they enable good structuring of data, proper data type-based storage, and null value rules.

 

Check constraints go a step further by providing validation of what values are allowed in a particular field.

 

They allow you to provide a logical expression against which inserted values are tested and subsequently accepted or rejected.

 

For example, suppose we wish to ensure that in our programming languages table, no language creation year could be less than or equal to 1950. This would stop someone from entering values like 190 for the year, which makes sense unless we wish to capture programming languages created during the Roman Empire!

Listing Creating a check constraint for the year field
CREATE TABLE proglang_constraints (
id INTEGER NOT NULL,
language VARCHAR(20) NOT NULL,
author VARCHAR(25) NOT NULL,
year INTEGER NOT NULL
CHECK (year > 1950),
standard VARCHAR(10) NULL);

 

Note the full definition of the year field, which defines the check constraint after the NOT NULL constraint. The logical expression we are testing against is year > 1950, which disallows any row containing a year value less than 1951. Let's try entering such a row to test the hypothesis.

Trying to violate a check constraint
TestDB=# INSERT INTO proglang_constraints (id, language, author, year)
VALUES
(1, 'Short Code', 'Mauchly', 1949);
ERROR: new row for relation "proglang_constraints" violates check constraint "proglang_constraints_year_check"
DETAIL: Failing row contains (1, Short Code, Mauchly, 1949, null).

 

While an underused feature, check constraints are extremely useful. A lot of application software code is written with the purpose of validating data to be inserted, an area where check constraints can help immensely.

 

Primary Key Constraint


The mathematical concept behind the relational data model is set theory. This area of discrete mathematics deals with unordered collections of values that can be uniquely identified, that is, collections containing no duplicates. For a table, a value is a record of data, and a key column for each record is the perfect way to identify it.

 

A primary key is used to make each record unique in at least one way by forcing a field to have a unique value.

 

They do not have to be restricted to only one field; a combination of them can also be defined as a primary key for a table. One must think carefully about the logical implications of choosing a field or a combination of them as a primary key.
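As a sketch of such a composite primary key, consider a hypothetical enrollment table where neither student_id nor course_id is unique on its own, but every (student_id, course_id) pair is:

CREATE TABLE enrollment_tbl (
student_id INTEGER NOT NULL,
course_id INTEGER NOT NULL,
PRIMARY KEY (student_id, course_id));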

 

Often the best primary key candidates are not our instinctive identifiers for a collection. If you were storing data about people, their names are what we identify them with in real-life scenarios.

 

But what would happen in the unforgiving world of primary keys if two people were named ‘David Childs’?

 

In our programming languages table, the id field is a good choice for applying the primary key constraint. We will now modify our CREATE TABLE statement to incorporate this.

Listing. A CREATE TABLE statement with a primary key

CREATE TABLE proglang_tbltmp (
id INTEGER NOT NULL PRIMARY KEY,
language VARCHAR(20) NOT NULL,
author VARCHAR(25) NOT NULL,
year INTEGER NOT NULL,
standard VARCHAR(10) NULL);

 

ID fields are usually chosen as primary keys. Note that in this particular table, the language field would have also worked, since a language name is unique. However, if we have a table that describes people, we should try to find a logically unique field like their Social Security number or employee ID number.

 

Even though the concept of a primary key seems natural and necessary, most database implementations don't actually require one. This includes the two databases used for examples in this blog – PostgreSQL and SQLite.

 

You are free to create a table without any primary keys (like we did before this section came along) and insert exactly duplicated data again and again. Not an ideal scenario but allowed nonetheless if you are so inclined.

 

Let us add some duplicated rows in our proglang_tblcopy table that we were working with in the beginning of the blog.

 

Inserting duplicate data in a table without a primary key


INSERT INTO proglang_tblcopy
(id, language, author, year)
VALUES
(2, 'Perl', 'Wall', '1987');
INSERT INTO proglang_tblcopy
(id, language, author, year)
VALUES
(2, 'Perl', 'Wall', '1987');

 

Note that we already had three unique rows in the table to which we added two duplicated ones. The execution of the INSERT statements was silent, indicating success. Let’s verify the contents of the table now.

SELECT * FROM proglang_tblcopy;

Table. Result of the Query Run on proglang_tblcopy Showing Duplicated Rows

id  language  author      year  standard
1   Prolog    Colmerauer  1972  ISO
2   Perl      Wall        1987
3   APL       Iverson     1964  ANSI
2   Perl      Wall        1987
2   Perl      Wall        1987

If we try to add duplicate records in our table containing the primary key constraint – proglang_tbltmp, we promptly get an error thrown at us.

Trying to add a duplicate record violating a primary key constraint

INSERT INTO proglang_tbltmp
(id, language, author, year)
VALUES
(2, 'Perl', 'Wall', '1987');
ERROR: duplicate key value violates unique constraint "proglang_tbltmp_pkey"
DETAIL: Key (id)=(2) already exists.

 

Unique Key Constraints


A unique key, like a primary key, is also used to make each record inside a table unique. Once you have defined the primary key of a table, any other field you wish to enforce as unique is handled through this constraint.

 

Well-thought-out unique constraints go a long way toward ensuring that the data inside the table is sane.

 

For example, in our database, it now makes sense to have a unique key constraint on the language field. This would ensure none of the records would duplicate information about the same programming language even if the id field was non-matching.

 

Listing. Programming languages table with the unique key constraint

CREATE TABLE proglang_tbluk (
id INTEGER NOT NULL PRIMARY KEY,
language VARCHAR(20) NOT NULL UNIQUE,
author VARCHAR(25) NOT NULL,
year INTEGER NOT NULL,
standard VARCHAR(10) NULL);

We will now try to insert two rows about the language Prolog, cleverly changing the id field to test out our unique constraint.

Listing. Inserting duplicate data in a table with a unique key constraint

TestDB=# INSERT INTO proglang_tbluk (id, language, author, year, standard)
VALUES
(1, 'Prolog', 'Colmerauer', 1972, 'ISO');
INSERT 0 1
TestDB=# INSERT INTO proglang_tbluk (id, language, author, year, standard)
VALUES
(2, 'Prolog', 'Colmerauer', 1972, 'ISO');
ERROR: duplicate key value violates unique constraint "proglang_tbluk_language_key"
DETAIL: Key (language)=(Prolog) already exists.

Note that we write the word UNIQUE in front of the field and omit the word KEY in the table creation command. You can have as many fields with unique constraints as you wish.

 

Differences Between a Primary Key and a Unique Key


You might have noticed that the two constraints discussed above are similar in their purpose. However, there are a couple of differences between them.

1. A primary key field cannot take on a NULL value, whereas a field with a unique constraint can. How many such rows are allowed varies by implementation: some allow only a single NULL, while others (including PostgreSQL and SQLite) treat NULLs as distinct from one another and permit several.

 

2. You are allowed to define only one primary key constraint for a table, but you can apply the unique constraint to as many fields as you like.

 

This is a favorite interview question for any job that deals with SQL as far as my experience goes. It is not too unfair considering the importance of these constraints to a logical data model. Just remember to think of a primary key as a NOT NULL UNIQUE constraint.
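To see why this mental model works, compare the two hypothetical definitions below: in both cases the id field rejects NULLs and duplicate values alike, though only the first is formally the table's primary key.

CREATE TABLE pk_demo (id INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE uk_demo (id INTEGER NOT NULL UNIQUE);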

 

A primary key ensures a logical way to differentiate between rows of a table. It is the bare minimum criterion for a differentiated record.

 

Unique constraints are usually added as additional rules to ensure data sanity while keeping the business or domain rules in mind. It’s not necessary to have them, but they act as gatekeepers to allow only good data through.

 

Operations on Tables


 

Tables are the fundamental storage containers of the relational world. A database will typically contain many tables, each representing a collection of entities.

 

As requirements evolve, so do the tables within a database, and database administrators (DBAs) routinely perform administrative operations on individual tables, like deleting them or changing their definition.

 

While typical database users are not granted permission to perform such operations on large production databases, it is nonetheless important to be familiar with them for didactic purposes.

 

You might have noticed that we keep on making new tables whenever we are introducing a new concept.

 

This has had the not-so-desirable effect of populating our database with many similar tables each holding programming languages data but with slightly varying definitions and constraints. We will now go about dropping unneeded tables and modifying existing ones to suit our needs.

 

Dropping Tables

The deletion of tables in SQL is achieved through the DROP TABLE command. DROP is actually a top-level SQL command, much like CREATE, which performs a deletion operation on many kinds of database objects. To delete a table, we simply append it with the database object type – a TABLE in this case.

 

We will now drop any superfluous tables we have created during the previous lessons. Note that dropping a table means deleting a table and any data inside it without a chance of recovery. So be careful while writing DROP commands.

Dropping the temporary tables we created
DROP TABLE proglang_tbl;
DROP TABLE proglang_tblcopy;
DROP TABLE proglang_constraints;
DROP TABLE proglang_tbltmp;

 

If you get no errors returned, it means that the tables have been deleted. Conventionally, DROP TABLE drops a single table at a time, though there are ways to delete multiple tables with a single statement, as shown below.
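The exact support varies by vendor. PostgreSQL, for instance, accepts a comma-separated list of tables in one statement (SQLite does not), so the first two drops above could have been written as the sketch below:

DROP TABLE proglang_tbl, proglang_tblcopy;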

 

To verify whether the tables have actually been dropped you have two choices. A simplistic one is to write any query for the table, and you would get an error back similar to Error: no such table: proglang_tbl.

 

The other way is to get the listing of currently existing tables from the database catalog, which is a database that the DBMS internally uses to keep track of databases, tables, and other objects that users create. Querying the catalog in SQLite for a listing of tables is extremely simple.

 

Listing existing tables in SQLite


sqlite> .tables

proglang_tbluk

Doing the same thing in PostgreSQL is slightly longer, but it is a SELECT query on the catalog database.

 

Listing existing tables in PostgreSQL

TestDB=# SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'public'
AND table_type = 'BASE TABLE';
table_name
----------------------
proglang_tbluk
(1 row)

 

If the query seems complicated to you, it is because it contains parts and syntax that we have not covered yet. But rest assured, the syntax will start making perfect sense by the end of the blog on queries. For now, we can infer that the table data in PostgreSQL is stored in the information_schema.tables catalog table.

 

Creating New Tables from Existing Tables

You might have noticed that we have dropped the proglang_tbl table, and we now have with us only the proglang_tbluk table that has all the necessary constraints and fields.

 

The latter's name was chosen when we were discussing the unique key constraint, but it now seems logical to migrate this table structure (and any corresponding data) back to the name proglang_tbl.

 

We achieve this by creating a copy of the table using a combination of the CREATE TABLE and SELECT commands, learning a new clause in the process – AS. This combination has a particularly catchy name – CTAS (Create Table As Select). It was introduced in the SQL:2003 standard, but not all DBMS systems implement it yet, notably Microsoft SQL Server.

 

The general syntax for creating a new table from an existing one


CREATE TABLE <New Table>
AS
SELECT <Selection> FROM <Old Table>;
Since our proglang_tbluk contains only one record, we will push some more sample data into it so that we can later verify whether the records themselves got copied or not. Notice that we give the field names explicitly; otherwise the second row (which contains no standard field value) would give an error in SQLite similar to the following.

sqlite> INSERT INTO proglang_tbluk
VALUES
(2, 'Perl', 'Wall', 1987);
Error: table proglang_tbluk has 5 columns but 4 values were supplied

A lot of other DBMSs like Ingres would also not accept such a cavalier approach to inserting data. PostgreSQL, however, would accept such a statement provided it could unambiguously insert the data, which in this case it can, since the omitted value is the last nullable field only.

Listing. Inserting some more data into the proglang_tbluk table

INSERT INTO proglang_tbluk (id, language, author, year) VALUES (2, 'Perl', 'Wall', '1987');
INSERT INTO proglang_tbluk (id, year, standard, language, author) VALUES (3, '1964', 'ANSI', 'APL', 'Iverson');

 

To create an exact copy of the existing table, we use the same selection criteria as we have seen before - * (star). This will select all the fields from the existing table and create the new table with them along with any records. It is possible to use only a subset of fields from the old table by modifying the selection criteria and we will see this later.

 

Listing. Re-creating a new table from an existing one

CREATE TABLE proglang_tbl
AS
SELECT * FROM proglang_tbluk;

 

If you are using psql, you would see the shell respond with SELECT followed by a count, which indicates how many rows were selected and inserted into the new table. We now run a simple SELECT query to see whether our objective was achieved or not.

SELECT * FROM proglang_tbl;

Table. Result of the Query Run on proglang_tbl

id  language  author      year  standard
1   Prolog    Colmerauer  1972  ISO
2   Perl      Wall        1987
3   APL       Iverson     1964  ANSI

 

The two tables are now exactly identical but are not linked to each other in any way. If you drop any of the tables, the other one will not be affected. Similarly, inserting new data in one of them will not insert the data in the other one from now on.

 

Modifying Tables


 

After a table has been created, you can still modify its structure using the ALTER TABLE command. What we mean by modifying is that you can change field types, sizes, even add or delete columns. Not all database management systems support all operations of ALTER TABLE.

 

To get around such limitations, people frequently copy the data over to a new table that has the newly required structure. While altering a table is not an SQL command you’d use very often (hopefully!), you should be familiar with it.
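A sketch of that copy-based workaround, using SQLite syntax and a hypothetical scratch name proglang_new (and assuming you recreate whatever constraints you need on it), looks like this:

CREATE TABLE proglang_new (
id INTEGER NOT NULL,
language VARCHAR(20) NOT NULL,
author VARCHAR(30) NOT NULL,
year INTEGER NOT NULL,
standard VARCHAR(10) NULL);
INSERT INTO proglang_new SELECT * FROM proglang_tbluk;
DROP TABLE proglang_tbluk;
ALTER TABLE proglang_new RENAME TO proglang_tbluk;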

 

There are some rules you have to abide by while altering a table and these are usually spelled out in detail by your particular DBMS manual. For now, we will see a simple example to modify the field author for the proglang_tbl table.

 

General syntax of a simple ALTER TABLE command

ALTER TABLE <Table name> <Operation> <Field with clauses>;

 

To keep our proglang_tbl intact, we are going to make our changes to the old proglang_tbluk table. We want to increase the maximum length of the author field to 30 characters from 25. The operation to choose in this case is ALTER COLUMN, which modifies an existing field.

 

Altering the author field

ALTER TABLE proglang_tbluk
ALTER COLUMN author TYPE varchar(30);

If you are not using SQLite, the above statement should execute quietly in PostgreSQL. SQLite unfortunately does not support altering a column size, but it happily supports adding new columns. So let's take on another requirement: adding a nullable column current_status to our table.

Listing. Adding a new current_status field
ALTER TABLE proglang_tbluk
ADD COLUMN current_status VARCHAR(32) NULL;

 

We have used the ADD COLUMN operation in this case for the ALTER TABLE command. Unsurprisingly we hope to add this new 32-character length column to our proglang_tbluk with this statement.

 

The many faces of ALTER TABLE: altering a table is one of those commands where, even after three decades, there are discrepancies between vendors. For example, you already saw that ALTER COLUMN doesn't work in SQLite. Ingres, another DBMS, expects you to write just ALTER, which coincidentally also works fine in PostgreSQL.

 

We choose the former to be explicit. Oracle, on the other hand, has gone a completely different way, and it uses MODIFY instead of ALTER COLUMN. 

Similarly, while adding a column, you could write only ADD <column name> in PostgreSQL or SQLite and expect it to work. 

 

However, while altering data types, PostgreSQL expects you to write TYPE between the column name and the new data type specification whereas Ingres wouldn’t expect it.
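For instance, the author alteration we performed earlier would be written in Oracle along the lines of the sketch below (note Oracle's VARCHAR2 type; treat the exact form as an assumption to verify against the Oracle manual):

ALTER TABLE proglang_tbluk MODIFY (author VARCHAR2(30));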

 

Showing Table Information in PostgreSQL


If you are thinking about using the database system catalog to get table definition information to verify your ALTER TABLE results, congratulations! You are indeed correct in thinking that.

 

As before, the query might seem a little more than we can handle correctly at this point in the text, but its output is highly readable.

Listing. Viewing field information in PostgreSQL by querying the catalog

TestDB=# SELECT column_name,
data_type,
character_maximum_length
FROM INFORMATION_SCHEMA.COLUMNS
WHERE table_name = 'proglang_tbluk';
  column_name   |     data_type     | character_maximum_length
----------------+-------------------+--------------------------
 id             | integer           |
 language       | character varying | 20
 author         | character varying | 30
 year           | integer           |
 standard       | character varying | 10
 current_status | character varying | 32
(6 rows)

 

Both changes – to the author and current_status fields – seem to be reflected correctly. There are a few other databases where such a query would work, but unfortunately, this is another area where a lot of DBMS implementations differ widely.

 

A PostgreSQL psql-specific method is to use \d+ <table name>, which gives almost the same information along with some other values by default.

Listing. Describing the schema of the table in psql

TestDB=# \d+ proglang_tbluk
Table "public.proglang_tbluk"
     Column     |          Type          | Modifiers | Storage  |
----------------+------------------------+-----------+----------+
 id             | integer                |           | plain    |
 language       | character varying(20)  |           | extended |
 author         | character varying(30)  |           | extended |
 year           | integer                |           | plain    |
 standard       | character varying(10)  |           | extended |
 current_status | character varying(32)  |           | extended |
Indexes:
    "proglang_tbluk_pkey" PRIMARY KEY, btree (id)
    "proglang_tbluk_language_key" UNIQUE CONSTRAINT, btree (language)

 

Showing Table Information in SQLite


As we have already discussed, SQLite, as of the writing of this text, does not support modifying column sizes in a table using ALTER TABLE.

 

It does, however, allow you to add a new column, and we added the current_status field just as we did in PostgreSQL. Let's now verify this by looking at the table information inside the SQLite shell.

 

SQLite has its own special dot syntax commands that allow certain useful database management tasks. We have already seen the .open command used to create and open a database and .tables to list the table names. Similarly, we can use the .schema command to get table information.

Listing. Displaying schema information in SQLite
sqlite> .schema proglang_tbluk
CREATE TABLE proglang_tbluk (
id INTEGER NOT NULL PRIMARY KEY,
language VARCHAR(20) NOT NULL UNIQUE,
author VARCHAR(25) NOT NULL,
year INTEGER NOT NULL,
standard VARCHAR(10) NULL,
current_status VARCHAR(32) NULL);

 

Notice how the new column is added at the end while the length of the author field remains 25 characters.

 

Showing Table Information in Other DBMS’s


 

If you are not practicing on either of the DBMS implementations mentioned above, there might be other ways to verify table field-level information. For example, Ingres utilizes the HELP TABLE <table name> command, which can be run on its isql shell.

 

A lot of other DBMSs like Oracle use the DESCRIBE command to view a table definition. While the information this command shows may vary from one DBMS to another, it will at least show the field name, its data type, and whether or not NULL values are allowed for the particular field. The general syntax of the command is given below.

 

 The general syntax of the DESCRIBE statement

DESCRIBE <table name>;

Writing Basic Queries

 

A query is an SQL statement that is used to extract a subset of data from your database and presents it in a readable format. As we have seen previously, the SELECT command is used to run queries in SQL. You can further add clauses to your query to get a filtered, more meaningful result.

 

The level of flexibility afforded by SQL is one of the reasons it has succeeded as a query language. While there is an entire gamut of add-ons to SELECT, in this blog we will focus on only two – ORDER BY and WHERE.

 

Database administration tasks for a well-thought-out schema are few and far between, but retrieving meaningful results using queries is something everyone does routinely.

 

Since the majority of operations on a database involve queries, it is important to understand them in detail. While this blog will only deal with queries run on a single table, you can run a SELECT operation on multiple tables in a single statement.

 

Selecting a Limited Number of Columns

The intention since the beginning of SQL was to provide an easy-to-use query system to everyday users. They should not have to reach for a programming language to make their report readable.

 

A major facility for this is the ability to display a finite set of columns in the output rather than all the fields of a table.

 

We have already seen how to extract all the data from a table when we were verifying our results in the previous blogs. But as you might have noted in some of our catalog queries – we can extract a subset of data too.

 

We first test this by limiting the number of fields to show in the query output by not specifying the * selection criteria, but by naming the fields explicitly as a comma-separated list.

Listing. Selecting a subset of fields from a table

SELECT language,
year
FROM proglang_tbl;

language  year
Prolog    1972
Perl      1987
APL       1964

 

You can see that the query we constructed mentioned the fields we wish to see, that is, language and year. Also, note that the result of this query is useful by itself as a report for looking at the chronology of programming language creation.

 

While this is not a rule enforced by SQL or a relational database management system, it makes sense to construct your query in such a way that the meaning is self-evident if the output is meant to be read by a human.

 

This is the reason we left out the field id in the query since it has no inherent meaning to the reader except if they wish to know the sequential order of the storage of records in the table.

 

You are free to decide the ordering of the fields in your output. The positioning of a field in a CREATE TABLE statement has no effect on any SELECT query you run on it.

 

Indeed, you are even free to duplicate a field as many times as you wish in your output. Whether it makes sense to do so is debatable! But as long as the field names in the comma-separated list given to SELECT are valid, they will show up in the output, as the query below demonstrates.
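For example, the query below is perfectly legal and simply prints the language column twice for every row:

SELECT language,
language
FROM proglang_tbl;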

 

Ordering the Results


You might have noticed that in our previous query output, the languages were printed in the same order as we had inserted them. But what if we wanted to sort the results by the year the language was created in?

 

The chronological order might make more sense if we wish to view the development of programming languages through the decades. In such cases, we take the help of the ORDER BY clause. To achieve our purpose, we modify our query with this additional clause.

Listing. Usage of the ORDER BY clause

SELECT language,
year
FROM proglang_tbl
ORDER BY year;

language  year
APL       1964
Prolog    1972
Perl      1987

The astute reader will notice that the ordering of our output was ascending. This is the default, which can be made explicit by appending the argument ASC to the column we wish to sort by. To reverse it, we pass the argument DESC to our ORDER BY clause as below.

Listing. Usage of the ORDER BY clause with the DESC argument

SELECT language,
year
FROM proglang_tbl
ORDER BY year DESC;

language  year
Perl      1987
Prolog    1972
APL       1964

 

Ordering is not limited to numeric fields. You can order character-based columns too. The sorting method is alphabetical, starting with the first character and moving on to subsequent characters whenever characters match. Let us try ordering our query result by the language field this time.

 

Listing. Usage of the ORDER BY clause with a character-based column

SELECT language,
year
FROM proglang_tbl
ORDER BY language;

language  year
APL       1964
Perl      1987
Prolog    1972

 

Ordering Using Field Abbreviations

A useful shortcut in SQL involves ordering a query result using an integer abbreviation instead of the complete field name.

 

The abbreviations are formed starting with 1, which is given to the first field specified in the query; 2 to the second field; and so on. Let’s rewrite our query to sort the output by descending year using field abbreviations.

Listing. Ordered SELECT query in descending order using field abbreviations

SELECT language,
year
FROM proglang_tbl
ORDER BY 2 DESC;

language  year
Perl      1987
Prolog    1972
APL       1964

 

The 2 argument given to the ORDER BY clause signifies ordering by the second field specified in the query, namely year.

 

Over time I have realized that the best use of field abbreviations is while you are querying a database system interactively. Rarely is it a good idea to use field abbreviations if you are embedding SQL inside a programming language.

 

Ordering by Multiple Columns


What if you wanted to order your results by more than one column? It would be a plausible scenario where some of the values of the ordering column are the same. For example, suppose you had a table having student grades and names.

 

You want to order the students by their grades, but a lot of students have gotten the Grade A. So you apply the second ordering by name, sorting alphabetically all grade A students, then grade B students and so on.

 

Let’s try to see a working example of this using our programming languages table. But for that, we need to insert a few more rows in there.

Inserting a few more languages in our table
INSERT INTO proglang_tbl
(id, language, author, year, standard)
VALUES
(4, 'JOVIAL', 'Schwartz', 1959, 'US-DOD');
INSERT INTO proglang_tbl
(id, language, author, year, standard)
VALUES
(5, 'APT', 'Ross', 1959, 'ISO');

Now let us order our programming languages table by year and language keeping in mind that our newly inserted languages have the same year of creation.

 

Listing. Combining different ordering types

SELECT language,
year
FROM proglang_tbl
ORDER BY year DESC, language ASC;

language  year
Perl      1987
Prolog    1972
APL       1964
APT       1959
JOVIAL    1959

Notice how APT came before JOVIAL because we had mentioned an ascending order for the language field.

 

Putting Conditions with WHERE


We have already seen how to select a subset of data available in a table by limiting the fields queried. We will now limit the number of records retrieved in a query using conditions.

 

The WHERE clause is used to achieve this, and it can be combined with explicit field selection or ordering clauses to provide meaningful output.

 

For a query to run successfully and fetch data from a table, it must have at least two parts – the SELECT and the FROM clause. After this, we place the optional WHERE condition and then the ordering clause.

 

Thus, if we wanted to see the programming language (and its author) that was standardized by ANSI, we would write the query below. (As an aside, if we let go of the "from a table" requirement, we can write a query with just a SELECT – try SELECT 1 in your DBMS and see the output.)

SELECT language,
author
FROM proglang_tbl
WHERE standard = 'ANSI';

language  author
APL       Iverson

 

As you may have noticed, the query we formulated specified the language and author fields, but the condition was imposed on a separate field altogether – standard. Thus we can safely say that while we can choose what columns to display, our conditionals can work on a record with any of its fields.

 

You are by no means restricted to use = (equals) for your conditions. It is perfectly acceptable to choose other operators like < and >. You can also include the ORDER BY clause and sort your output. An example is given below.

 

Combining the WHERE and ORDER BY


SELECT language,
author,
year
FROM proglang_tbl
WHERE year > 1970
ORDER BY author;

language  author      year
Prolog    Colmerauer  1972
Perl      Wall        1987

Notice that the output only shows programming languages developed after 1970 (at least according to our database). Also since the ordering is done by a varchar field, the sorting is done alphabetically in an ascending order.

 

Combining Conditions

If we can only specify one condition using the WHERE clause, it will fulfill only a tiny fraction of real-world requirements. We can, however, construct complex conditions using the boolean operators AND and OR.

When we want our resultset to satisfy all of the multiple conditions, we use the AND operator.

 

Using the AND operator to combine conditions
SELECT language,
author,
year
FROM proglang_tbl
WHERE year > 1970 AND standard IS NULL;
language author year
Perl Wall 1987
We have now combined the two conditions, meaning any row in the resultset must satisfy both criteria. In our case, there is only one such row – Perl.

An interesting point to note is our construction of the second conditional. We specify that the standard field should be a null value by writing IS NULL. This is not the same as saying standard = NULL. If we attempted the latter as our conditional, we would get an empty result.

 

While this may seem counterintuitive, it actually makes perfect sense. A null signifies an undefined value, not a precise value like infinity or 0 or even a complex number. The precise equivalence operator = cannot be rationalized for a null, and thus SQL interpreters provide the IS NULL comparison.
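To see the difference concretely, compare the two queries below (with our current data, only Perl lacks a standard):

SELECT language FROM proglang_tbl WHERE standard = NULL;
-- returns no rows: comparing to NULL with = is never true
SELECT language FROM proglang_tbl WHERE standard IS NULL;
-- returns Perl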

 

If we want our resultset to satisfy any one of our conditions, we use the OR operator. Let’s use this operator in the above example but with a different intention. We want all languages that were either created after 1970 or don’t have a standardizing body.

Using the OR operator
SELECT language,
author,
year
FROM proglang_tbl
WHERE year > 1970 OR standard IS NULL;
language author year
Prolog Colmerauer 1972
Perl Wall 1987

 

Prolog satisfies only the first criterion – it was created after 1970 but was actually standardized by ISO. Perl satisfies both criteria and is also rightly shown. If we had a language in our table without a standardizing body but created before 1970, it would sneak into the resultset here too.

 

We can create yet more complex queries by combining the AND and OR operators, as sketched below. One has to be careful not to let the filtering logic become complex or unreadable when using these operators together.
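For example, parentheses make the intent of a combined condition explicit:

SELECT language,
author,
year
FROM proglang_tbl
WHERE (year > 1970 AND standard IS NULL)
OR standard = 'ISO';
-- with our current data, this fetches Perl (which passes both
-- AND conditions) plus Prolog and APT (standardized by ISO)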

 

Manipulating Data

In this blog, we study the Data Manipulation Language (DML) part of SQL that is used to make changes to the data inside a relational database. The three basic commands of DML are as follows.

 

INSERT – populates tables with new data
UPDATE – updates existing data
DELETE – deletes data from tables

We have already seen a few examples on the INSERT statement including simple inserts, selective field insertions, and null value inserts. Thus we will concentrate on other ways to use this statement.

 

Inserting Data into a Table from Another Table

You can insert new records into a table from another one by using a combination of INSERT and SELECT. This is pretty close to the way we combined CREATE TABLE and SELECT to create a new table with rows from another table.

 

Since a query would return you some records, combining it with an insertion command would enter these records into the new table. You can even use a WHERE conditional to limit or filter the records you wish to enter into the new table.

 

We will now create a new table called stdlang_tbl, which will have only two fields – language and standard. In this, we would insert rows from the proglang_tbl table that have a non-null value in the standard field. This will also demonstrate our first use of a boolean operator – NOT.

 

Using INSERT and SELECT to conditionally load data into another table

CREATE TABLE stdlang_tbl
(language varchar(20),
standard varchar(10));
INSERT INTO stdlang_tbl
SELECT language,
standard
FROM proglang_tbl
WHERE standard IS NOT NULL;

 

Note that we had to create the table separately in this case and then load it using INSERT and SELECT. The NOT inverts the IS NULL test: where IS NULL alone returns true for a null value, adding NOT makes the conditional return false.

 

NOT, NULL, True and False

If we consider boolean logic principles, the closest analog to NULL is the value false. However, NOT inverts a boolean value: true becomes false, and false becomes true.

 

Testing a value with IS NULL is an SQL comparison that returns true for every null value. This is what we inverted in the example above using NOT. Don't confuse the meaning of this comparison with the boolean value a NULL most closely represents.

 

When you view the contents of this table, you will notice that it has picked up the languages that actually had a standard column value.

Contents of Our Newly Created stdlang_tbl table
language standard
Prolog ISO
APL ANSI
JOVIAL US-DOD
APT ISO

 

The data populated by an INSERT INTO … SELECT must adhere to the constraints defined during table creation.

 

If our stdlang_tbl had defined language as its primary key, our insert statements would run fine unless we encountered a duplicate language value. This is not a problem in our case currently since the languages are themselves unique.

 

An altered definition of stdlang_tbl with the primary key

DROP TABLE stdlang_tbl;
CREATE TABLE stdlang_tbl
(language varchar(20) PRIMARY KEY,
standard varchar(10));
INSERT INTO stdlang_tbl
SELECT language,
standard
FROM proglang_tbl
WHERE standard IS NOT NULL;

 

What would happen if we somehow violated the constraints? For example, let us go about creating a new table standardizing_bodies that contains only one field – name. The only constraint on it is UNIQUE. We already know that both Prolog and APT from our proglang_tbl were standardized by ISO. Let's try to simulate this using code.

Violating the UNIQUE constraint while INSERT INTO … SELECT
CREATE TABLE standardizing_bodies ( name varchar(10) UNIQUE );
INSERT INTO standardizing_bodies
SELECT standard FROM proglang_tbl
WHERE standard IS NOT NULL;
ERROR: duplicate key value violates unique constraint "standardizing_bodies_name_key"
DETAIL: Key (name)=(ISO) already exists.

 

Note that the contents of this new table standardizing_bodies will be empty. Our INSERT operation was a single statement, not a collection of independent inserts. Thus when the constraint was violated, no data was inserted. A quick count verifies this.
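Verifying that no rows made it into the table

SELECT COUNT(*) FROM standardizing_bodies;
> 0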

 

Updating Existing Data

To modify data in a record, we use the UPDATE command. While it cannot add or delete records (those responsibilities are delegated to other commands), if a record exists, UPDATE can modify its data, even affecting multiple fields in one go and applying conditions. The general syntax of an UPDATE statement is given below.

 

General syntax of the UPDATE command

UPDATE <table_name> SET
<column1> = <value>,
<column2> = <value>,
<column3> = <value>
. . .
WHERE <condition>;

 

Let us now return to our proglang_tbl table and add new rows for the Forth and Tcl programming languages.

 Populating some more data in our programming languages table

INSERT INTO proglang_tbl
(id, language, author, year, standard)
VALUES
(6, 'Forth', 'Moore', 1973, NULL);
INSERT INTO proglang_tbl
(id, language, author, year, standard)
VALUES
(7, 'Tcl', 'Ousterhout', 1988, NULL);

 

What if we suddenly wanted to add 10 years to each language's creation year? Since we want to apply the UPDATE logic to every row, we can forgo the search condition.

 

Running an UPDATE on all rows of a table

UPDATE proglang_tbl SET
year = year + 10;

 

This query would increase all language creation years by 10. There is no ambiguity here, since the right-hand side year + 10 is calculated first and then assigned to the year field. This happens for all rows. To get back to our original dates, we simply run the same query with the SET clause as year = year - 10.
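For completeness, the reverting statement:

UPDATE proglang_tbl SET
year = year - 10;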

 

We later realize that the Forth language was created closer to 1972 (instead of 1973), and that it was actually standardized by ANSI in 1994. Thus we now go about correcting our mistakes by writing update queries to reflect this data.

 

We should note that we must include a search condition for the Forth language only.

Updating the Forth language details
UPDATE proglang_tbl SET
year = 1972
WHERE language = 'Forth';
UPDATE proglang_tbl SET
standard = 'ANSI'
WHERE language = 'Forth';
We could have easily combined the updates to multiple fields into a single statement, saving the DBMS engine the trouble of finding the row again.
Updating multiple fields in a single statement
UPDATE proglang_tbl SET
year = 1972,
standard = 'ANSI'
WHERE language = 'Forth';

 

If you've typed the statements correctly and no errors were thrown back, the record in question will have been modified as intended. Verifying the result involves a simple query of the kind we have seen in previous examples, for instance the one below.
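Verifying the corrected Forth record

SELECT language,
year,
standard
FROM proglang_tbl
WHERE language = 'Forth';
language year standard
Forth 1972 ANSI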

 

Deleting Data from Tables

You can use the DELETE command to delete records from a table. This means that you can choose which records you want to delete based on a condition or delete all records, but you cannot delete certain fields of a record using this statement. The general syntax of the DELETE statement is given below.

 

The general syntax of DELETE

DELETE FROM <table_name>

WHERE <condition>;

While putting a conditional clause in the DELETE is optional, it is almost always used – simply because omitting it would delete all the records from the table, which is rarely what you need. Luckily, we have a spare table stdlang_tbl that is no longer needed, so let's try deleting all rows from it.

 

 Deleting all records from a table

DELETE FROM stdlang_tbl;

If we try to verify the contents of this table, we’d get no data rows back.

Only the column headers would be visible.

language standard

 

We now write the full statement to delete the record corresponding to Forth from the table. Again, we have to include the search condition in the WHERE clause.

Deleting a record from the proglang_tbl table

DELETE FROM proglang_tbl WHERE language = 'Forth';

proglang_tbl contents after the record deletion
id language author year standard
1 Prolog Colmerauer 1972 ISO
2 Perl Wall 1987
3 APL Iverson 1964 ANSI
4 JOVIAL Schwartz 1959 US-DOD
5 APT Ross 1959 ISO
7 Tcl Ousterhout 1988

 

You should always be careful about the WHERE clauses you put on a DELETE statement. They should never be too broad, lest you end up deleting more data than you intended. A common safeguard is sketched below.
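One way to stay safe is to preview the rows with a SELECT using the same WHERE clause before running the DELETE. For the Forth deletion above, that would look like this:

SELECT * FROM proglang_tbl
WHERE language = 'Forth';
-- inspect the rows returned; the DELETE below removes exactly these
DELETE FROM proglang_tbl
WHERE language = 'Forth';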

 

Organizing Your Data

Since this is a text meant to teach SQL to people unfamiliar with it, our data has been very simplistic. In practice, you would wish to store many more fields than the five-column table we saw in earlier blogs.

 

Also, some assumptions were made intrinsically on the kind of data we will store in the table. But this is not always the case in real life.

In reality, the data we encounter will be complex, even redundant. This is where the study of data modeling techniques and database design come in.

 

While the reader is advised to refer to a more comprehensive treatise on this subject, we will nonetheless study some good relational database design principles, since they come in handy while learning SQL statements that work across multiple tables.

 

Normalization

Let us suppose we have a database of employees in a fictional institution as given below. If the database structure has not been modeled but has simply been extracted from a raw collection of available information, redundancy is to be expected.

 The Fictional Firm’s Employee Data

employee_id name skill manager_id location
1 Socrates Philosophy (null) Greece
2 Plato Writing 1 Greece
3 Aristotle Science 2 Greece
4 Descartes Philosophy (null) France
4 Descartes Philosophy (null) the Netherlands

 

We can see that Descartes has two rows because he spent his life in both France and the Netherlands. Doesn't seem very elegant, does it? If at a later point we decide to classify him with a different skill, we would have to update both of his rows, since they should contain an identical (primary) skill.

 

Wouldn’t it be saner to have a separate table for skills and somehow allow the records that share the same skill to refer to this table? This way if we wish to reflect that both Socrates and Descartes were thinkers in Western Philosophy, renaming the skill record in the second table would do the trick.
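A minimal sketch of this idea – the names skills_tbl and skill_id here are hypothetical, not part of our running example:

CREATE TABLE skills_tbl
(skill_id INTEGER PRIMARY KEY,
skill_name VARCHAR(30));
-- each employee row would then carry a skill_id referencing
-- this table instead of repeating the skill text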

 

This process of breaking down a raw database into logical tables and removing redundancies is called normalization. There are levels of normalization, called normal forms, that dictate how to achieve the desired design.

 

There are five accepted normal forms that serious database administrators and developers are familiar with, ranging from the first normal form (1NF) to the fifth normal form (5NF). These forms are progressive in nature, meaning that a design in 3NF is also 1NF and 2NF compliant.

 

Since the origin of these forms is based on academic research, the working developers usually restrict themselves to 3NF or 4NF in most cases.

 

Again, we advise the reader to refer to a more comprehensive text dealing with database design and normalization. We will do mere lip service in exploring these vast fields. For now, let’s turn to our programming languages data to see the need for normalization playing out.

 

Repeating Groups

Another simple (but ultimately wrong) approach that comes to mind is to split the author field into two parts – author1 and author2. If a language has only one author, the author2 field would contain a null value. Can you spot the problem that will arise from this design decision?

A Table with a Repeating Group
id language author1 author2 year standard
1 Prolog Colmerauer (null) 1972 ISO
2 Perl Wall (null) 1987 (null)
3 APL Iverson (null) 1964 ANSI
4 Tcl Ousterhout (null) 1988 (null)
5 BASIC Kemeny Kurtz 1964 ANSI

 

This imposes an artificial constraint on how many authors a language can have. It seems to work fine for a couple of them, but what if a programming language was designed by a committee of a dozen or more people, and we did want to include all of them in the credits? At the database design time, how do we fix the number of authors we wish to support?

 

This kind of design is referred to as a repeating group and must be actively avoided. It also has the ugly effect of scattering too many null values across some of the fields, a first sign of bad database design.

 

Splitting the Table

Our first stab at table design lumps the languages and authors together. It is natural to think that way because our understanding of the data at first glance views all the fields as a logical whole. All the data to us belongs to the programming languages, the entity being described.

 

But as we have seen above, the authors of the languages seem to be a distinct entity in our data. We have not even begun to capture multiple languages by the same author, and already we feel a pressing need to distinguish between languages and authors as entities.

 

The correct design to remove the problems listed above is to split the table into two – one holding the author details and one detailing the language.

A Table Holding Programming Language Details
id language year standard
1 Prolog 1972 ISO
2 Perl 1987 (null)
3 APL 1964 ANSI
4 Tcl 1988 (null)
5 BASIC 1964 ANSI
A Table Holding Author Details
author_id author language_id
1 Colmerauer 1
2 Wall 2
3 Ousterhout 4
4 Iverson 3
5 Kemeny 5
6 Kurtz 5

 

Once you have removed the non-atomicity of fields and the repeating groups, and assigned unique id's to your tables, your table structure is in the first normal form. The authors table's language_id field refers to the id field of the language table; this relationship is called a foreign key constraint.

Creating the new tables
CREATE TABLE newlang_tbl
(id INTEGER NOT NULL PRIMARY KEY,
language VARCHAR(20) NOT NULL,
year INTEGER NOT NULL,
standard VARCHAR(10) NULL);
CREATE TABLE authors_tbl
(author_id INTEGER NOT NULL,
author VARCHAR(25) NOT NULL,
language_id INTEGER REFERENCES newlang_tbl(id));

 

Notice that in the author’s table we’ve made a foreign key constraint by making the language_id field reference the id field of the new programming languages table using the keyword REFERENCES.

 

A foreign key can only reference a column that is a primary key or carries a unique constraint; otherwise, at constraint creation time, we would receive an error similar to the following.

 

ERROR: there is no unique constraint matching given keys for referenced table "newlang_tbl"

Since we have created a reference through language_id, inserting a row into the authors table whose language does not yet have an entry would result in an error, called a referential integrity violation.

A referential integrity violation
INSERT INTO authors_tbl
(author_id, author, language_id)
VALUES
(5, 'Kemeny', 5);
ERROR: insert or update on table "authors_tbl" violates foreign key constraint "authors_tbl_language_id_fkey"
DETAIL: Key (language_id)=(5) is not present in table "newlang_tbl".
However, when done sequentially, that is, the language first and then its corresponding entry in the author table, everything works out.
Making entries for BASIC in both the tables
INSERT INTO newlang_tbl
(id, language, year, standard)
VALUES
(5, 'BASIC', 1964, 'ANSI');
INSERT INTO authors_tbl
(author_id, author, language_id)
VALUES
(5, 'Kemeny', 5);

 

Referential Integrity in SQLite

If you tried to run the failing insert above in SQLite, you wouldn't get an error back despite there being no language_id 5 in newlang_tbl. SQLite, by default, turns off referential integrity checking for backward compatibility reasons. To turn it on for your database, run the following pragma in its command shell.

PRAGMA foreign_keys = ON;

If you now violate referential integrity, you would get the familiar Error: FOREIGN KEY constraint failed message.

 

Referential integrity is a key benefit of good relational database design. Since it applies to related entities, it ensures that the values of these remain in sync. In our example above, this constraint made sure that we never have an author’s data whose created programming language is not captured in the languages table.

 

When designing databases to solve a business problem, deciding how referential integrity comes into play is a big decision.

 

This is done mainly in discussion with domain experts who understand the business logic of the entities whose data you are trying to capture. The remaining statements to fully populate the tables are given below.

 

Fully populating the newly created tables

INSERT INTO newlang_tbl
(id, language, year, standard)
VALUES
(1, 'Prolog', 1972, 'ISO');
INSERT INTO newlang_tbl
(id, language, year)
VALUES
(2, 'Perl', 1987);
INSERT INTO newlang_tbl
(id, language, year, standard)
VALUES
(3, 'APL', 1964, 'ANSI');
INSERT INTO newlang_tbl
(id, language, year)
VALUES
(4, 'Tcl', 1988);
INSERT INTO authors_tbl
(author_id, author, language_id)
VALUES (6, 'Kurtz', 5);
INSERT INTO authors_tbl
(author_id, author, language_id)
VALUES (1, 'Colmerauer', 1);
INSERT INTO authors_tbl
(author_id, author, language_id)
VALUES (2, 'Wall', 2);
INSERT INTO authors_tbl
(author_id, author, language_id)
VALUES (3, 'Ousterhout', 4);
INSERT INTO authors_tbl
(author_id, author, language_id)
VALUES (4, 'Iverson', 3);

The Pursuit of Normalization

The man who created the relational model and, in turn, normalization – Dr. Codd – was an academic genius. Even though he was working at IBM at the time, a decidedly non-academic setting, the whole thing has a whiff of mathematical purity about it.

 

But the cost of removing redundancies in data is speed. While there are many levels of normal forms – 1NF through 5NF plus the lesser-known Boyce-Codd Normal Form – we must not be too pedantic about pursuing the higher ones. Common sense must prevail in the head of the database designer.

 

In some cases, denormalization does have the benefit of faster access. Indeed, many in-vogue NoSQL database systems tout redundant data storage as a feature. This obviously comes at the cost of a consistent version of the truth.

 

But then again, when our seemingly random clicks on the Web are captured and analyzed to decide our most suitable insurance provider, perhaps a loss of truth is acceptable.

 

Doing More with Queries

SQL as a language was created for the end users of the database systems. It just happened to be used by programmers too, but the goal was always a simple, declarative, English-like language to allow anybody familiar with computers and the domain to make sensible reports out of the database system.

 

These reporting capabilities were the direct output of an SQL query, and thus from the very beginning, there have been a lot of options and clauses that can be used with SELECT to make the output more legible. 

 

We have already seen some basic queries, how to order the results of a query, and how to put conditions on the query output. Let us now see more examples of how we can modify our SELECT statements to suit our ever-growing reporting needs.

 

Query to count the number of records in the table

SELECT COUNT(*) FROM proglang_tbl;

The output returned will be a single record with a single field containing the value 6. The COUNT function took one argument, namely what to count, and we provided it with *, which means the entire record. Thus we achieved our purpose of counting records in a table.

 

What would happen if instead of giving an entire record to count, we explicitly specify a column? And what if the column had null values? Let's see this scenario by counting on the standard field of the table.

 

Query to count the number of standard field values in the table

SELECT COUNT(standard) FROM proglang_tbl;

If you guessed the output of this query as the value 4, you are correct. Out of the six rows, two records contain null as their standard value, leaving four languages with a standard.

 

Using DISTINCT with COUNT

The astute reader might have noticed that the number of standardized languages was computed by counting the non-null standard values. However, the resultset contained a duplicate standards body value – ISO, for both APT and Prolog.

 

Sometimes it is useful to be able to leave out such duplicates. The DISTINCT clause allows us to utilize only non-duplicated values of the input specified and is commonly used in conjunction with COUNT. Before seeing it in action, let’s add another row to our table so that the results of using DISTINCT jump out clearly.

 

Inserting a new row in our programming languages table

INSERT INTO proglang_tbl
(id, language, author, year, standard)
VALUES
(6, 'PL/I', 'IBM', 1964, 'ECMA');
Note the new data choices in this row. With PL/I we now have a fourth distinct standards organization – ECMA. PL/I also shares its birth year with APL (1964), giving us a duplicate year value. Now let us run a query to check what distinct year values we have in the table.
Distinct year values in the table
SELECT DISTINCT year
FROM proglang_tbl;
year
1972
1988
1987
1964
1959

 

Both 1964 and 1959 make an appearance only once as we desired. A common use case for DISTINCT is to combine it with the COUNT function to output the number of unique values we have in the table. Attempting the same for year, we get our expected result of 5.

Counting distinct year values
SELECT COUNT (DISTINCT year)
FROM proglang_tbl;
> 5
Using DISTINCT on the standard field has a slightly different output than we might expect at first guess.
Listing distinct standard values
SELECT DISTINCT standard
FROM proglang_tbl;
standard
(null)
ECMA
ANSI
ISO
US-DOD

 

We actually get five rows in our output, including the null value, because the DISTINCT clause treats it as a uniquely separate value. Combining this with COUNT removes the significance of the null row, giving us the value 4.

SELECT COUNT (DISTINCT standard)
FROM proglang_tbl;
> 4
Column Aliases

 

Queries are frequently consumed directly as reports, since SQL provides enough functionality to give a meaningful representation to the data stored inside an RDBMS. One of the features allowing this is column aliases, which let you rename column headings in the resultant output. The general syntax is given below.

SELECT <column1> <alias1>,
<column2> <alias2>,
...
FROM <table>;
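Most implementations also accept the optional AS keyword between a column and its alias, which reads a little more explicitly:

SELECT author AS creator
FROM proglang_tbl;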

 

For example, suppose we wish to output our programming languages table with a few columns only, but we do not wish to label the authors of the languages as authors. The person requesting the report wishes them to be called creators. This can be done simply by using the query below.

SELECT id,
language,
author creator
FROM proglang_tbl;
id language creator
1 Prolog Colmerauer
2 Perl Wall
3 APL Iverson
4 JOVIAL Schwartz
5 APT Ross
7 Tcl Ousterhout
6 PL/I IBM

 

While creating a column alias will not permanently rename a field, the alias will show up in the resultant output. Implementations differ on whether they allow column aliases in parts of the query other than the column listing. For example, let's try using the column alias creator in the WHERE clause of a query in PostgreSQL.

Using column aliases in the WHERE clause in PostgreSQL
SELECT id,
language,
author creator
FROM proglang_tbl WHERE creator = 'Ross';
ERROR: column "creator" does not exist
LINE 4: FROM proglang_tbl WHERE creator = 'Ross';
Aha, PostgreSQL explicitly told us that this is a no-go. Let’s see if SQLite is slightly more forgiving.

 

Using column aliases in the WHERE clause in SQLite
sqlite> SELECT id,
language,
author creator
FROM proglang_tbl
WHERE creator='Ross';
id language creator
---------- ---------- ----------
5 APT Ross

 

While SQLite did allow it, I'm not really fond of using column aliases for anything other than renaming columns in the output. I'd advise you to do the same unless there is a very strong case of readability improvement and your implementation allows it (and very few do).

 

Order of Execution of SELECT Queries

A query is not evaluated from left to right; there is a specific sequence in which its various parts are evaluated as given below.

1. FROM clause
2. WHERE clause
3. GROUP BY clause
4. HAVING clause
5. SELECT clause
6. ORDER BY clause

 

There is an interesting corollary of having the SELECT evaluation being lower than the WHERE clause. Can you guess what it is?

 

It is the inability of database management systems like PostgreSQL, DB2, and Microsoft SQL Server to use column aliases in the WHERE clause. By the time query execution reaches the filtering stage, the column aliases of the query have still not been resolved.

 

Let’s test this by running a query in PostgreSQL where we use the column alias in the ORDER BY clause, the only one with a lower precedence.

 

Using column aliases in the ORDER BY clause

testdb=# SELECT id,
language,
author creator
FROM proglang_tbl
ORDER BY creator;
id | language | creator
----+ ---------- +------------
1 | Prolog | Colmerauer
6 | PL/I | IBM
3 | APL | Iverson
7 | Tcl | Ousterhout
5 | APT | Ross
4 | JOVIAL | Schwartz
2 | Perl | Wall
(7 rows)

Our reasoning was rewarded with the correct output, not that I would change my advice of using column aliases only with proper thought.
