How to Monitor PostgreSQL

By Dr. Mohit Bansal
Published: 26-10-2017
Monitoring and Diagnosis

In this chapter, we will cover the following recipes:

- Checking whether a user is connected
- Checking which queries are running
- Checking which queries are active or blocked
- Knowing who is blocking a query
- Killing a specific session
- Detecting an in-doubt prepared transaction
- Knowing whether anybody is using a specific table
- Knowing when a table was last used
- Usage of disk space by temporary data
- Understanding why queries slow down
- Investigating and reporting a bug
- Producing a daily summary of log file errors
- Analyzing the real-time performance of your queries

Introduction

In this chapter, you will find recipes for some common monitoring and diagnosis actions you may want to perform inside your database. They are meant to answer specific questions that you often face when using PostgreSQL.

Databases are not isolated entities. They live on computer hardware using CPUs, RAM, and disk subsystems. Users access databases over networks. Depending on the setup, databases themselves may need network resources to function: performing authentication checks when users log in, using disks that are mounted over the network (not generally recommended), or making remote function calls to other databases.

This means that monitoring only the database is not enough. At a minimum, you should also monitor everything directly involved in using the database. This means knowing the following:

- Is the database host available? Does it accept connections?
- How much of the network bandwidth is in use? Have there been network interruptions and dropped connections?
- Is there enough RAM available for the most common tasks? How much of it is left?
- Is there enough disk space available? When will it run out?
- Is the disk subsystem keeping up? How much more load can it take?
- Can the CPU keep up with the load? How many spare idle cycles do the CPUs have?
- Are other network services that the database depends on (if any) available? For example, if you use Kerberos for authentication, you need to monitor it as well.
- How many context switches are happening when the database is running?

For most of these things, you are interested in history: how have things evolved? Was everything mostly the same yesterday or last week? When did the disk usage start changing rapidly? For any larger installation, you probably already have something in place to monitor the health of your hosts and network.

The two aspects of monitoring are collecting historical data to see how things have evolved, and getting alerts when things go seriously wrong. Tools based on the Round Robin Database Tool (RRDtool), such as Cacti and Munin, are quite popular for collecting historical information on all aspects of the servers and presenting it in an easy-to-follow graphical form. Seeing several statistics on the same timescale can really help when trying to figure out why the system is behaving the way it is. Another popular open source solution is Ganglia, a distributed monitoring solution particularly suitable for environments with several servers in multiple locations.

The other aspect of monitoring is getting alerts when something goes really wrong and needs (immediate) attention. For alerting, one of the most widely used tools is Nagios, with its fork, Icinga, being an emerging solution. The aforementioned trending tools can integrate with Nagios. However, if you need a single solution for both the alerting and trending aspects of monitoring, you might want to look into Zabbix.

Then, of course, there is the Simple Network Management Protocol (SNMP), which is supported by a wide array of commercial monitoring solutions. Basic support for monitoring PostgreSQL through SNMP is found in pgsnmpd, although this project does not seem very active.
You can find more information about pgsnmpd, and download it, from the project's website.

Providing PostgreSQL information to monitoring tools

Historical monitoring information is most useful when all of it is available from the same place and on the same timescale. Most monitoring systems are designed for generic purposes, while allowing application and system developers to integrate their specific checks with the monitoring infrastructure. This is possible through a plugin architecture: adding new kinds of data inputs means installing a plugin. Sometimes, you may need to write such a plugin yourself, but writing a plugin for something such as Cacti is easy: you just have to write a script that outputs the monitored values in a simple text format.

In most common scenarios, the monitoring system is centralized, and data is collected either directly (and remotely) by the system itself, or through distributed components that are responsible for sending the observed metrics back to the main node.

As far as PostgreSQL is concerned, some useful things to include in graphs are the number of connections, disk usage, number of queries, number of WAL files, most numbers from pg_stat_user_tables and pg_stat_user_indexes, and so on, as shown here:

[Figure: an example of a dashboard in Cacti]

The preceding Cacti screenshot includes data for CPU, disk, and network usage; the pgbouncer connection pooler; and the number of PostgreSQL client connections. As you can see, they are nicely correlated.

One Swiss Army knife script, which can be used from both Cacti and Nagios/Icinga, is check_postgres. It is available online, and it has ready-made reporting actions for a large array of things worth monitoring in PostgreSQL. For Munin, there are some PostgreSQL plugins available in the Munin plugin repository.
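Such a plugin really can be a few lines long. The sketch below is illustrative only: the metric names and the psql invocation in the comments are assumptions made for this example, not part of any official plugin API. It shows the space-separated name:value output format that a Cacti script data source expects.

```python
# Minimal sketch of a Cacti-style data-input script for PostgreSQL.
# The metric names and the psql call shown below are illustrative
# assumptions, not an official interface.

def format_cacti_output(metrics):
    """Render metrics as the single 'name:value name:value' line Cacti parses."""
    return " ".join("%s:%s" % (name, value) for name, value in metrics.items())

if __name__ == "__main__":
    # A real plugin would fill this dict from the database, for example by
    # running: psql -At -c "SELECT count(*) FROM pg_stat_activity"
    sample = {"connections": 12, "wal_files": 4}
    print(format_cacti_output(sample))
```

The same script shape works for Munin with only minor changes to the output format, which is why these plugins tend to be so short.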
The following screenshot shows a Munin graph of PostgreSQL buffer cache hits for a specific database, where cache hits (blue line) dominate reads from the disk (green line):

[Figure: Munin graph of buffer cache hits versus disk reads]

Finding more information about generic monitoring tools

Setting up the tools themselves is a larger topic, and it is beyond the scope of this book; in fact, each of these tools has more than one book written about it. The basic setup information and the tools themselves can be found on the respective project websites of RRDtool, Cacti, Ganglia, Icinga, Munin, Nagios, and Zabbix.

Real-time viewing using pgAdmin

You can also use pgAdmin to get a quick view of what is going on in the database. For better control, you need to install the adminpack extension in the destination database by issuing this command:

```sql
CREATE EXTENSION adminpack;
```

This extension is part of the additionally supplied modules of PostgreSQL (also known as contrib). It provides several administration functions that pgAdmin (and other tools) can use to manage, control, and monitor a Postgres server from a remote location.

Once you have installed adminpack, connect to the database and then go to Tools | Server Status. This opens a window reporting locks and running transactions.

Checking whether a user is connected

Here, we will show you how to learn whether a certain database user is currently connected to the database.

Getting ready

Make sure that you are logged in as a superuser.
How to do it…

Issue the following query to see whether the user bob is connected:

```sql
SELECT datname FROM pg_stat_activity WHERE usename = 'bob';
```

If this query returns any rows, then bob is connected to the database. The returned value is the name of the database to which the user is connected.

How it works…

PostgreSQL's pg_stat_activity system view keeps track of all running PostgreSQL backends. This includes information such as the query that is currently being executed (or, from version 9.2 onwards, the last query that was executed by a backend); who is connected; when the connection, the transaction, and/or the query were started; and so on.

There's more…

If you have managed different versions of PostgreSQL, you may be aware that the pg_stat_activity view has undergone some important changes, especially from version 9.2 onwards. Here is a list of the relevant changes that have occurred since 9.0:

- PostgreSQL 9.1 introduced the client_hostname column (reporting the reverse DNS lookup of client_addr for connections over IP)
- In PostgreSQL 9.2, the procpid column was renamed pid (the identifier of the backend process)
- In PostgreSQL 9.2, the current_query field was renamed query, and it now reports the statement that is being executed, or the statement that was last executed by that backend
- PostgreSQL 9.2 introduced the state and state_change columns, reporting the state of the backend (for example, active, idle, or idle in transaction) and when the last change of state occurred, respectively
- The state column allows database administrators to analyze the queries being executed at that very moment (where state is active), as well as the most recent query executed by a backend that is idle or idle in transaction
- PostgreSQL 9.4 adds two more columns, backend_xid and backend_xmin, reporting the top-level transaction identifier and the xmin horizon of the current backend, respectively

I strongly advise you to spend a few minutes reading the more detailed information about pg_stat_activity in the PostgreSQL documentation. You can find answers to many administration-related questions by analyzing the pg_stat_activity view; one common example is outlined in the following section.

What if I want to know whether a certain computer is connected?

Often, several different processes may connect as the same database user. In that case, you may actually want to know whether there is a connection from a specific computer. You can still get this information from the pg_stat_activity view, as it includes the connected clients' IP address, port, and hostname (where applicable). The port is only needed if you have more than one connection from the same client computer and you need to dig further to see which process there connects to which database. Run the following command:

```sql
SELECT datname, usename, client_addr, client_port, application_name
  FROM pg_stat_activity;
```

The client_addr and client_port fields help you look up the exact computer, and even the process on that computer, that has connected to the specific database. From version 9.1 onwards, you can also retrieve the hostname of the remote computer through the client_hostname field (this requires log_hostname to be set to on). Finally, I would always recommend including application_name in your reports.
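Because pg_stat_activity gained columns across releases, a monitoring script may need to adapt its query to the server version. The helper below is a hypothetical illustration of that idea (it is not a PostgreSQL or driver API); it builds the connection report shown above, adding client_hostname only where it exists.

```python
# Hypothetical helper: build the "who is connected from where" report for
# pg_stat_activity, including client_hostname only on PostgreSQL 9.1+.

def connection_report_query(server_version):
    """Return an SQL string listing connected users and their client info."""
    cols = ["datname", "usename", "client_addr", "client_port"]
    if server_version >= (9, 1):
        # only populated when log_hostname is set to on
        cols.append("client_hostname")
    cols.append("application_name")
    return "SELECT %s FROM pg_stat_activity;" % ", ".join(cols)
```

On a 9.0 server this yields the query shown above without client_hostname; either way, the report ends with application_name.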
This field was introduced in PostgreSQL 9.0, and it is becoming widely recognized and honored by third-party application developers (I advise you to do the same with your own applications). For information on how to set the application name for your connections, refer to Database Connection Control Functions in the PostgreSQL documentation.

What if I want to repeatedly execute a query in psql?

PostgreSQL 9.3 introduced the \watch meta-command, which allows psql users to automatically (and continuously) re-execute a query. This behavior is similar to the watch utility found in some Linux and Unix environments. In the following example, we run a simple query on pg_stat_activity and ask psql to repeat it every 5 seconds. You can exit at any time by pressing Ctrl + C:

```
gabriele=# SELECT count(*) FROM pg_stat_activity;
 count
-------
     1
(1 row)

gabriele=# \watch 5
Watch every 5s    Tue Aug 27 21:47:24 2013

 count
-------
     1
(1 row)
... snip ...
```

For further information about the psql utility, refer to the PostgreSQL documentation.

Checking which queries are running

Here, we will show you how to check which queries are currently running.

Getting ready

Make sure that you are logged in as a superuser or as the same database user you want to check. Also, ensure that the parameter track_activities = on is set (the default). This can be done either in the postgresql.conf file or by the superuser, using the following SQL statement:

```sql
SET track_activities = on;
```

The way PostgreSQL allows users to change configuration parameters at runtime is very powerful; you are advised to read the SET reference page in the documentation.

How to do it…

To see what the connected users are running at this moment, just run the following:

```sql
SELECT datname, usename, state, query FROM pg_stat_activity;
```

On systems with many users, you may notice that the majority of backends have state set to idle.
This denotes that no query is actually running, and PostgreSQL is waiting for new commands from the user. The query field shows the statement that was last executed by that particular backend.

Users of PostgreSQL 9.0 and 9.1 must use a different SQL statement, due to the aforementioned changes to the pg_stat_activity view:

```sql
SELECT datname, usename, current_query FROM pg_stat_activity;
```

In this case, idle backends are identified by the special value <IDLE> in the current_query field.

If, on the other hand, you are interested in active queries only, limit your selection to those records that have state set to active:

```sql
SELECT datname, usename, state, query
  FROM pg_stat_activity
 WHERE state = 'active';
```

Again, PostgreSQL 9.0 and 9.1 users must use a different SQL statement and exclude the idle backends from their result set:

```sql
SELECT datname, usename, current_query
  FROM pg_stat_activity
 WHERE current_query <> '<IDLE>';
```

How it works…

When track_activities = on is set, PostgreSQL collects data about all running queries. Users with sufficient rights can then view this data through the pg_stat_activity system view.

The pg_stat_activity view uses a system function named pg_stat_get_activity(procpid int). You can use this function directly to watch the activity of a specific backend by supplying the process ID as an argument. Passing NULL returns information for all backends.

There's more…

Sometimes, you don't care about all the queries that are currently running, or you may not want to connect to the database just to see what is running.

Catching queries which only run for a few milliseconds

As most queries on modern online transaction processing (OLTP) systems take only a few milliseconds to run, it is often hard to catch the active ones when simply probing the pg_stat_activity view.
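One rough workaround is to sample the view repeatedly and aggregate whatever you catch. The sketch below assumes you supply a fetch_active callback of your own (for example, a function that runs SELECT query FROM pg_stat_activity WHERE state = 'active' over an open connection); the polling logic is the point of the example.

```python
import time

# Sampling sketch: fetch_active is a caller-supplied function returning
# the list of currently active query texts (an assumption of this example).

def sample_active_queries(fetch_active, samples=20, interval=0.25):
    """Poll repeatedly; return the set of distinct active queries observed."""
    seen = set()
    for _ in range(samples):
        seen.update(fetch_active())
        time.sleep(interval)
    return seen
```

Even with frequent sampling, queries shorter than the polling interval can be missed entirely, so treat this only as a cheap first look.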
Most likely, you will be able to see only the last executed query for those backends whose state is different from active. In some cases, this can be enough. Remember that if you are using PostgreSQL 9.0 or 9.1, you cannot benefit from the information regarding the last executed query of a backend through the pg_stat_activity view.

In general, if you need to perform a deeper analysis, I strongly recommend installing and configuring the pg_stat_statements module, which is described in the Analyzing the real-time performance of your queries recipe of this chapter. Another option is to run a post analysis of log files using pgBadger. Depending on the workload of your system, you may want to limit the production of highly granular log files (that is, logging all queries) to a short period of time. For further information on pgBadger, refer to the Producing a daily summary of log file errors recipe of this chapter.

Watching the longest queries

Another thing of interest that you may want to look for is long-running queries. To get a list of running queries ordered by how long they have been executing, use the following:

```sql
SELECT current_timestamp - query_start AS runtime,
       datname, usename, query
  FROM pg_stat_activity
 WHERE state = 'active'
 ORDER BY 1 DESC;
```

This will return the currently running queries, with the longest-running ones at the top. On versions older than PostgreSQL 9.2, use a slightly different query:

```sql
SELECT current_timestamp - query_start AS runtime,
       datname, usename, current_query
  FROM pg_stat_activity
 WHERE current_query <> '<IDLE>'
 ORDER BY 1 DESC;
```

On busy systems, you may want to limit the set of queries returned to only the first few (add LIMIT 10 at the end), or to only the queries that have been running over a certain period of time. For example, to get the list of queries that have been running for more than a minute, use this query:

```sql
SELECT current_timestamp - query_start AS runtime,
       datname, usename, query
  FROM pg_stat_activity
 WHERE state = 'active'
   AND current_timestamp - query_start > interval '1 min'
 ORDER BY 1 DESC;
```

Watching queries from ps

If you want, you can also make the queries being run show up in the process titles by setting the following in the postgresql.conf file:

update_process_title = on

Although the ps and top outputs are not the best places for watching database queries, they may make sense in some circumstances.

See also

The related settings are covered in the run-time statistics section of PostgreSQL's online documentation.

Checking which queries are active or blocked

Here, we will show you how to find out whether a query is actually running or waiting for another query.

Getting ready

Again, log in as a superuser.

How to do it…

Run this query:

```sql
SELECT datname, usename, query
  FROM pg_stat_activity
 WHERE waiting = true;
```

You will get a list of queries that are waiting on other backends. The following query will run on PostgreSQL versions older than 9.2:

```sql
SELECT datname, usename, current_query
  FROM pg_stat_activity
 WHERE waiting = true;
```

How it works…

The pg_stat_activity system view has a Boolean field named waiting. This field indicates that a certain backend is waiting on a system lock, and the preceding query uses it to filter out only those queries that are waiting.

There's more…

Some more explanations about the preceding queries are appropriate here.

No need for the = true part

As the waiting column is already Boolean, you can safely omit the = true part and simply write the following:

```sql
SELECT datname, usename, query
  FROM pg_stat_activity
 WHERE waiting;
```

This catches only queries waiting on locks

The pg_stat_activity.waiting field shows only whether the query is waiting on a PostgreSQL internal lock.
Although this is the main cause of waiting when using pure SQL, it is possible to write queries in any of PostgreSQL's embedded languages that wait on other system resources, such as an HTTP response, a file write completing, or just a timer. As an example, you can make your backend "sleep" for a certain number of seconds using pg_sleep(seconds). While you are monitoring pg_stat_activity, open a new terminal session with psql and run the following statement in it:

```
db=# SELECT pg_sleep(10); -- it "stops" for 10 seconds here
 pg_sleep
----------

(1 row)
```

The backend will show up as not waiting in the pg_stat_activity view, even though the query is, in fact, "blocked" in the timer.

Knowing who is blocking a query

Once you have found out that a query is blocked, you need to know who or what is blocking it.

Getting ready

Just get a superuser account to run the queries.

How to do it…

Run the following query on PostgreSQL 9.2 or later versions:

```sql
SELECT w.query AS waiting_query,
       w.pid AS waiting_pid,
       w.usename AS waiting_user,
       l.query AS locking_query,
       l.pid AS locking_pid,
       l.usename AS locking_user,
       t.schemaname || '.' || t.relname AS tablename
  FROM pg_stat_activity w
  JOIN pg_locks l1 ON w.pid = l1.pid AND NOT l1.granted
  JOIN pg_locks l2 ON l1.relation = l2.relation AND l2.granted
  JOIN pg_stat_activity l ON l2.pid = l.pid
  JOIN pg_stat_user_tables t ON l1.relation = t.relid
 WHERE w.waiting;
```

This returns the process ID, user, and current query for both the blocked and the blocking backends, plus the fully qualified name of the table that causes the blocking.

The equivalent query for PostgreSQL 9.0 and 9.1 is as follows:

```sql
SELECT w.current_query AS waiting_query,
       w.procpid AS waiting_pid,
       w.usename AS waiting_user,
       l.current_query AS locking_query,
       l.procpid AS locking_pid,
       l.usename AS locking_user,
       t.schemaname || '.' || t.relname AS tablename
  FROM pg_stat_activity w
  JOIN pg_locks l1 ON w.procpid = l1.pid AND NOT l1.granted
  JOIN pg_locks l2 ON l1.relation = l2.relation AND l2.granted
  JOIN pg_stat_activity l ON l2.pid = l.procpid
  JOIN pg_stat_user_tables t ON l1.relation = t.relid
 WHERE w.waiting;
```

How it works…

This query first selects all waiting queries (WHERE w.waiting), then gets the locks those queries are waiting on (JOIN pg_locks l1 ON w.pid = l1.pid AND NOT l1.granted), and then looks up the lock that is granted on the same table (JOIN pg_locks l2 ON l1.relation = l2.relation AND l2.granted). Finally, it looks up the row in pg_stat_activity corresponding to the granted lock. It also resolves the relation identifier (relid) of the table to its full name using the pg_stat_user_tables system view.

Killing a specific session

Sometimes, the only way to let the system continue as a whole is by surgically terminating some offending database sessions. Yes, you read that right: surgically. You might be tempted to reboot the server, but you should think of that as a last resort in a business continuity scenario. In this recipe, you will learn how to intervene, from gracefully canceling a query to brutally killing the actual process from the command line.

How to do it…

You can run the termination function as a superuser or, when using PostgreSQL 9.2 or newer versions, as the same user as the offending backend (look at the usename field in the pg_stat_activity view). Once you have figured out which backend you need to kill, use the function pg_terminate_backend(pid) to kill it.

How it works…

When a backend executes the pg_terminate_backend(pid) function, a SIGTERM signal is sent to the target backend, after verifying that the process identified by pid is actually a PostgreSQL backend. The backend receiving this signal stops whatever it is doing and terminates in a controlled way. The client using that backend loses the connection to the database.
Depending on how the client application is written, it may silently reconnect, or it may show an error to the user.

There's more…

Killing the session may not always be what you really want, so consider other options as well. It is also a good idea to look at the Server Signaling Functions section in the PostgreSQL documentation.

Trying to cancel the query first

First, you may want to try pg_cancel_backend(pid), a milder version of pg_terminate_backend(pid). The difference between the two is that pg_cancel_backend() just cancels the current query, whereas pg_terminate_backend() really kills the backend (and can therefore be used for idle or idle in transaction backends).

What if the backend won't terminate?

If pg_terminate_backend(pid) fails to kill the backend and you really need to reset the database state to make it continue processing requests, you have one more option: sending SIGKILL to the offending backend. This can be done only from the command line, as the root or postgres system user, on the same host the database is running on:

kill -9 backend_pid

This command kills the backend immediately, without giving it a chance to clean up. Consequently, the postmaster is forced to kill all the other backends as well and restart the whole cluster, so it actually does not matter which of the PostgreSQL backends you kill.

You must be extremely careful if you have set the synchronous_commit parameter to off: you may end up losing some supposedly committed transactions if you use kill -9 on a backend. Thus, kill -9 is the last resort, to be used only if nothing else helps, and never on a regular basis.

Using statement_timeout to clean up queries that take too long to run

Often, you know that you have no use for queries that run longer than a given time.
Maybe your web frontend just refuses to wait more than 10 seconds for a query to complete, returning a default answer to users if it takes longer and abandoning the query. In such a case, it might be a good idea to set statement_timeout = '10s', either in postgresql.conf or as a per-user or per-database setting. Once you do so, queries that run too long won't consume precious resources and make other queries fail.

The queries terminated by a statement timeout show up in the log as follows:

```
hannu=# SET statement_timeout TO '3 s';
SET
hannu=# SELECT pg_sleep(10);
ERROR:  canceling statement due to statement timeout
```

In older versions of PostgreSQL, they show up with a more confusing message: query canceled due to user request.

Killing idle in transaction sessions

Sometimes, people start a transaction, run some queries, and then just leave without ending the transaction. This can leave some system resources in a state where certain housekeeping processes cannot run. They may even have done something more serious, such as locking a table, causing an immediate denial of service for other users who need it.

You can use the following query to kill all backends that have an open transaction but have been doing nothing for the last 10 minutes:

```sql
SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
 WHERE state = 'idle in transaction'
   AND current_timestamp - query_start > interval '10 min';
```

For PostgreSQL 8.4, 9.0, or 9.1, you need a slightly different query:

```sql
SELECT pg_terminate_backend(procpid)
  FROM pg_stat_activity
 WHERE current_query = '<IDLE> in transaction'
   AND current_timestamp - query_start > interval '10 min';
```

You can even schedule this to run every minute while you are trying to track down a specific frontend application that ignores open transactions, a lazy administrator who leaves a psql connection open, or a flaky network that drops clients without the server noticing.
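To make the selection rule explicit, here is the same "idle in transaction for too long" logic expressed as plain Python. The (pid, state, idle_seconds) row format is an assumption made for this sketch; in practice, the SELECT pg_terminate_backend(...) statement shown above does all of this in one step on the server.

```python
# Illustrative model of the idle-in-transaction cleanup rule; the row
# layout (pid, state, idle_seconds) is an assumption for this example.

def idle_in_transaction_victims(sessions, max_idle_seconds=600):
    """Return pids holding a transaction open while idle for too long."""
    return [pid for pid, state, idle_seconds in sessions
            if state == "idle in transaction"
            and idle_seconds > max_idle_seconds]

sessions = [
    (101, "active", 2),                  # working: leave it alone
    (102, "idle in transaction", 1200),  # candidate for termination
    (103, "idle", 4000),                 # idle, but no open transaction
]
# each returned pid would be passed to pg_terminate_backend()
```

Note that plain idle sessions are deliberately left alone: only an open transaction blocks housekeeping.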
Killing the backend from the command line

Another way to terminate a backend is the Unix/Linux command kill N, which sends the SIGTERM signal to process N on the system where it is run. You have to be either the root user or the user running the database backends (usually postgres) to be able to send signals to these processes.

You can cancel a backend (simulating the pg_cancel_backend(pid) function) by sending a SIGINT signal:

kill -SIGINT backend_pid

For more detailed information and the exact syntax, type man kill in your favorite shell environment.

Detecting an in-doubt prepared transaction

While using two-phase commit (2PC), you may end up in a situation where you have something locked but cannot find a backend that holds the locks. This recipe describes how to detect such a case.

How to do it…

You need to look in pg_locks for entries with an empty pid value. Run this query:

```sql
SELECT t.schemaname || '.' || t.relname AS tablename,
       l.pid, l.granted
  FROM pg_locks l
  JOIN pg_stat_user_tables t ON l.relation = t.relid;
```

The output will be something similar to the following:

```
 tablename | pid   | granted
-----------+-------+---------
 db.x      |       | t
 db.x      | 27289 | f
(2 rows)
```

The preceding example shows a lock on the db.x table that has no process associated with it. If you need to remove a particular prepared transaction, refer to the Removing old prepared transactions recipe in Chapter 9, Regular Maintenance.

Knowing whether anybody is using a specific table

This recipe helps you when you are in doubt whether some obscure table is still used, or whether it is left over from old times and just takes up space.

Getting ready

Make sure that you are a superuser, or at least have full rights to the table in question.
