How to scale Relational Database

how to scale database horizontally and how to scale mysql database and how to scale-out mysql database and how to scale a database server
Dr.MohitBansal Profile Pic
Dr.MohitBansal,Canada,Teacher
Published Date:26-10-2017
Your Website URL(Optional)
Comment
Scaling Your Database with PL/Proxy If you have followed the advice in the previous chapters for doing all your database access through functions, you are in a great position to scale your database by "horizontally" distributing the data over multiple servers. Horizontal distribution means that you keep just a portion of a table on each "partition" database, and that you have a method to automatically access the right database when accessing the data. We will gently introduce the concepts leading to the PL/Proxy partitioning language, and then delve into the syntax and proper usage of the language itself. Let's start with writing a scalable application from scratch. First, we will write it to be as highly performing as possible on one server. Then, we will scale it by spreading it out on several servers. We will first get this implemented in PL/Pythonu and then as samples done in the theme special language for this chapter—PL/Proxy. This approach is worth taking only if you have (plans for) a really large database. For most databases, one server plus one or perhaps two hot standby servers should be more than enough. Simple single-server chat Perhaps, the simplest application needing this kind of scalability is a messaging (or chat) application; so let's write one. The initial single-server implementation has the following specifications: • There should be users and messages. • Each user has a username, password, e-mail, list of friends, and a flag to indicate if the user wants to get messages from only their friends, or from everybody.Scaling Your Database with PL/Proxy • For users, there are methods for: ° Registering new users ° Updating the list of friends ° Logging in • Each message has a sender, receiver, message body, and timestamps for sending and reading the message. • For messages, there are methods for: ° Sending a message ° Retrieving new messages A minimalistic system implementing this could look like the following: ws: / / ws: / / HUB Queue Notify Here, a web page opens a WebSocket (ws://) to a HUB (a message concentrator) which in turn talks to a database. On each new connection, the HUB logs in and on successful login opens a WebSocket connection to the web page. It then sends all new messages that have accumulated for the logged-in user since the last time they retrieved their messages. After that, the HUB waits for new messages and pushes them to the web page as they arrive. 192 Chapter 9 The database part has two tables, theuser_info table: CREATE TABLE user_info ( username text primary key, pwdhash text not null, base64 encoded md5 hash of password email text, friend_list text, list of buddies usernames friends_only boolean not null default false ); As well as themessage table: CREATE TABLE message ( from_user text not null references user_info(username), sent_at timestamp not null default current_timestamp, to_user text not null references user_info(username), read_at timestamp, when was this retrieved by to_user msg_body text not null, delivery_status text not null default 'outgoing' ('sent', "failed") ); As this is still an "all-in-one database" implementation, the database functions corresponding to application methods are very simple. Creating a user: CREATE or REPLACE FUNCTION new_user( IN i_username text, IN i_pwdhash text, IN i_email text, OUT status int, OUT message text ) AS BEGIN INSERT INTO user_info( username, pwdhash, email) VALUES ( i_username, i_pwdhash, i_email); status = 200; message = 'OK'; EXCEPTION WHEN unique_violation THEN status = 500; message = 'USER EXISTS'; END; LANGUAGE plpgsql SECURITY DEFINER; This method just fails when the user is already defined. A more "real-life" function would propose a list of available usernames in this case. 193 Scaling Your Database with PL/Proxy The method for login returns status500 for failure and200 or201 for success.201 means that there are unread messages for this user: CREATE OR REPLACE FUNCTION login( IN i_username text, IN i_pwdhash text, OUT status int, OUT message text ) AS BEGIN PERFORM 1 FROM user_info WHERE ( username, pwdhash) = ( i_username, i_pwdhash); IF NOT FOUND THEN status = 500; message = 'NOT FOUND'; END IF; PERFORM 1 FROM message WHERE to_user = i_username AND read_at IS NULL; IF FOUND THEN status = 201; message = 'OK. NEW MESSAGES'; ELSE status = 200; message = 'OK. NO MESSAGES'; END IF; END; LANGUAGE plpgsql SECURITY DEFINER; The other two user methods are for changing the friends list and telling the system whether they want to receive mails that are only from friends. Error checking is omitted here for brevity: CREATE or REPLACE FUNCTION set_friends_list( IN i_username text, IN i_friends_list text, OUT status int, OUT message text ) AS BEGIN UPDATE user_info SET friend_list = i_friends_list WHERE username = i_username; status = 200; message = 'OK'; END; LANGUAGE plpgsql SECURITY DEFINER; CREATE or REPLACE FUNCTION msg_from_friends_only( 194 Chapter 9 IN i_username text, IN i_friends_only boolean, OUT status int, OUT message text ) AS BEGIN UPDATE user_info SET friends_only = i_friends_only WHERE username = i_username; status = 200; message = 'OK'; END; LANGUAGE plpgsql SECURITY DEFINER; The function used for messaging simply send messages is as follows: CREATE or REPLACE FUNCTION send_message( IN i_from_user text, IN i_to_user text, IN i_message text, OUT status int, OUT message text ) AS BEGIN PERFORM 1 FROM user_info WHERE username = i_to_user AND (NOT friends_only OR friend_list ARRAYi_from_user); IF NOT FOUND THEN status = 400; message = 'SENDING FAILED'; RETURN; END IF; INSERT INTO message(from_user, to_user, msg_body, delivery_status) VALUES (i_from_user, i_to_user, i_message, 'sent'); status = 200; message = 'OK'; EXCEPTION WHEN foreign_key_violation THEN status = 500; message = 'FAILED'; END; LANGUAGE plpgsql SECURITY DEFINER; The function used for messaging simply get messages is as follows: CREATE or REPLACE FUNCTION get_new_messages( IN i_username text, OUT o_status int, OUT o_message_text text, OUT o_from_user text, OUT o_sent_at timestamp) RETURNS SETOF RECORD AS 195 Scaling Your Database with PL/Proxy BEGIN FOR o_status, o_message_text, o_from_user, o_sent_at IN UPDATE message SET read_at = CURRENT_TIMESTAMP, delivery_status = 'read' WHERE to_user = i_username AND read_at IS NULL RETURNING 200, msg_body, from_user , sent_at LOOP RETURN NEXT; END LOOP; END; LANGUAGE plpgsql SECURITY DEFINER; We are almost done with the database part of our simple server. To finish it up, we need to do some initial performance tuning, and for that we need some data in our tables. The easiest way is to use thegenerate_series() function to generate a list of numbers, which we will use as usernames. For our initial testing, names like 7 or 42 are as good as Bob, Mary, or Jill: hannu= SELECT new_user(generate_series::text, 'pwd', generate_ series::text 'pg.org') hannu- FROM generate_series(1,100000); hannu= WITH ns(n,len) AS ( hannu( SELECT ,(random() 10)::int FROM generate_ series(1,100000)) hannu- SELECT set_friends_list(ns.n::text, hannu( ARRAY( (SELECT (random() 100000)::int hannu( FROM generate_series(1,len)) )::text hannu( ) hannu- FROM ns ; Now we have 100,000 users with 0 to 10 friends each, for a total of 501,900 friends. hannu= SELECT count() FROM (SELECT username,unnest(friend_list) FROM user_info) a; - RECORD 1 - count 501900 Now, let's send each of the friends a message: hannu= SELECT send_message(username,unnest(friend_list),'hello friend') FROM user_info; 196 Chapter 9 Look how fast we can retrieve the messages: hannu= select get_new_messages('50000'); get_new_messages (200,"hello friend",49992,"2012-01-09 02:23:28.470979") (200,"hello friend",49994,"2012-01-09 02:23:28.470979") (200,"hello friend",49995,"2012-01-09 02:23:28.470979") (200,"hello friend",49996,"2012-01-09 02:23:28.470979") (200,"hello friend",49997,"2012-01-09 02:23:28.470979") (200,"hello friend",49999,"2012-01-09 02:23:28.470979") (200,"hello friend",50000,"2012-01-09 02:23:28.470979") (7 rows) Time: 763.513 ms Spending almost a second getting seven messages seems slow, so we need to optimize a bit. The first thing to do is to add indexes for retrieving the messages: hannu= CREATE INDEX message_from_user_ndx ON message(from_user); CREATE INDEX Time: 4341.890 ms hannu= CREATE INDEX message_to_user_ndx ON message(to_user); CREATE INDEX Time: 4340.841 ms And check if this helped to solve our problem: hannu= select get_new_messages('52000'); get_new_messages (200,"hello friend",51993,"2012-01-09 02:23:28.470979") (200,"hello friend",51994,"2012-01-09 02:23:28.470979") (200,"hello friend",51996,"2012-01-09 02:23:28.470979") (200,"hello friend",51997,"2012-01-09 02:23:28.470979") (200,"hello friend",51998,"2012-01-09 02:23:28.470979") (200,"hello friend",51999,"2012-01-09 02:23:28.470979") (200,"hello friend",52000,"2012-01-09 02:23:28.470979") (7 rows) Time: 2.949 ms Much better—indexed lookups are 300 times faster than sequential scans, and this difference will grow as tables get bigger 197 Scaling Your Database with PL/Proxy As we are updating the messages and setting their status to read, it is also a good idea to set the fillfactor to something less than 100 percent. Fillfactor tells PostgreSQL not to fill up database pages completely but to leave some space for HOT updates. When PostgreSQL updates a row, it only marks the old row for deletion and adds a new row to the data file. If the row that is updated only changes unindexed fields and there is enough room in the page to store a second copy, a HOT update will be done instead. In this case, the copy can be found using original index pointers to the first copy, and no expensive index updates are done while updating. hannu= ALTER TABLE message SET (fillfactor = 90); ALTER TABLE Time: 75.729 ms hannu= CLUSTER message_from_user_ndx ON message; CLUSTER Time: 9797.639 ms hannu= select get_new_messages('55022'); get_new_messages (200,"hello friend",55014,"2012-01-09 02:23:28.470979") (200,"hello friend",55016,"2012-01-09 02:23:28.470979") (200,"hello friend",55017,"2012-01-09 02:23:28.470979") (200,"hello friend",55019,"2012-01-09 02:23:28.470979") (200,"hello friend",55020,"2012-01-09 02:23:28.470979") (200,"hello friend",55021,"2012-01-09 02:23:28.470979") (200,"hello friend",55022,"2012-01-09 02:23:28.470979") (7 rows) Time: 1.895 ms Still better. The fillfactor made the get_new_messages() function another 20 to 30 percent faster, thanks to enabling the faster HOT updates 198 Chapter 9 Dealing with success – splitting tables over multiple databases Now, let's roll forward in time a little and assume you have been successful enough to attract tens of thousands of users and your single database starts creaking under the load. My general rule of thumb is to start planning for a bigger machine or splitting the database when you are over 80 percent utilization at least for a few hours a day. It's good to have a plan earlier, but now you have to start doing something about really carrying out the plan. What expansion plans work and when There are a couple of popular ways to grow database-backed systems. Depending on your use case, not all ways will work. Moving to a bigger server If you suspect that you are near your top load for the service or product, you can simply move to a more powerful server. This may not be the best long-time scaling solution if you are still in the middle, or even in the beginning of your growth. You will run out of "bigger" machines to buy long before you are done. Servers also become disproportionately more expensive as the size increases, and you will be left with at least one "different" and thus not easily replaceable server once you implement a proper scaling solution. On the other hand, this will work for some time and is often the easiest way to get some headroom while implementing real scaling solutions. Master-slave replication – moving reads to slave Master-slave replication, either trigger-based or WAL-based, works reasonably well in cases where the large majority of the database accesses are reads. Some things that fall under this case are website content managers, blogs, and other publishing systems. As our chat system has more or less a 1:1 ratio of writes and reads, moving reads to a separate server will buy us nothing. The replication itself is more expensive than the possible win from reading from a second server. 199 Scaling Your Database with PL/Proxy Multimaster replication Multi-master replication is even worse than master-slave(s) when the problem is scaling a write-heavy workload. It has all the problems of master-slave, plus it introduces extra load via cross-partition locking or conflict resolution requirements, which further slows down the whole cluster. Data partitioning across multiple servers The obvious solution to scaling writes is to split them between several servers. Ideally you could have, for example, four servers and each of them getting exactly ¼th of the load. In this case, each server would hold a quarter of users and messages, and serve a quarter of all requests. HUB HUB Proxy 1 Proxy 2 P0 P1 P2 P3 users 200 Chapter 9 To make the change transparent for database clients, we introduce a layer of proxy databases. These proxy databases can either reside on the same hosts as the partition databases or be on their own host. The role of the proxy databases is to pretend to be the database for clients, but in fact delegate the real work to partitions by calling the right function in the right partition database. This client transparency is not terribly important if you have just one application accessing the database. If you did, you could then do the splitting in the client application. It becomes very handy as your system grows to have several applications, perhaps using many different platforms and frameworks on the client side. Having a separate layer of proxy databases enables easy management of data splitting so that the client applications don't need to know anything about the underlying data architecture. They just call the functions they need and that's all they need to know. In fact, you can switch out the whole database structure without the clients ever noticing anything except the better performance from the new architecture. More on how exactly the proxy works later. For now, let us tackle splitting the data. Splitting the data If we split the data, we need a simple and efficient way to determine which server stores each data row. If the data had an integer primary key, you could just go round-robin, store the first row on the first server, the second row on the second, and so on. This would give you a fairly even distribution, even when rows with certain IDs are missing. The partitioning function for selecting between four servers would be simply: partition_nr = id & 3 The partitioning mask3 (binary 11) is for the first two bits. For eight partitions, you would use7 (binary 111), and for 64 servers it would be63 (00111111). It is not as easy with things like usernames, where putting all names starting with an A first, B second, and so on does not produce an even distribution. Turning the username into a fairly evenly distributed integer via the hash function solves this problem and can be used directly to select the partition. partition_nr = hashtext(username) & 3 201 Scaling Your Database with PL/Proxy This would distribute the users in the following manner: hannu= SELECT username, hashtext(username) & 3 as partition_nr FROM user_info; - RECORD 1 + username bob partition_nr 1 - RECORD 2 + username jane partition_nr 2 - RECORD 3 + username tom partition_nr 1 - RECORD 4 + username mary partition_nr 3 - RECORD 5 + username jill partition_nr 2 - RECORD 6 + username abigail partition_nr 3 - RECORD 7 + username ted partition_nr 3 - RECORD 8 + username alfonso partition_nr 0 So partition0 gets useralfonso, partition1 bob andtom, partition2 jane andjill, and partition3 getsmary,abigail, andted. The distribution is not exactly ¼th to each partition; but as the number of partitions increase, it will be pretty close where this actually matters. If we had no PL/Proxy language, we could write the partitioning functions in the most untrusted PL languages. For example, a simple login proxy function written in PL/Pythonu looks like this: CREATE OR REPLACE FUNCTION login( IN i_username text, IN i_pwdhash text, OUT status int, OUT message text ) AS import psycopg2 partitions = 'dbname=chap9p0 port=5433', 202 Chapter 9 'dbname=chap9p1 port=5433', 'dbname=chap9p2 port=5433', 'dbname=chap9p3 port=5433', partition_nr = hash(i_username) & 3 con = psycopg2.connect(partitionspartition_nr) cur = con.cursor() cur.execute('select from login(%s,%s)', ( i_username, i_ pwdhash)) status, message = cur.fetchone() return (status, message) LANGUAGE plpythonu SECURITY DEFINER; Here, we defined a set of four partition databases, given by their connect strings stored as a list in variablepartitions. When executing the function, we first evaluate the hash function on the username argument (hash(i_username)) and extract two bits from it (& 3) to get index into the partitions list (the partition number) for executing each call. Then, we open a connection to a partition database using the connect string selected by the partition number (con=psycopg2.connect(partitionspartition_nr)). Finally, we execute a remote query in the partition database and return the results of this to the caller of this proxy function. This works reasonably well if implemented like this, but also has at least two places where it is suboptimal: • First, it opens a new database connection each time the function is called, which kills performance • Second, it is a maintenance nightmare if you hard-wire the partition information in full in all functions The performance problem can be solved by caching the open connections, and the maintenance problem can be solved by having a single function returning the partition information. However, even when we do these changes and stay with PL/Pythonu for partitioning, we will still be doing a lot of copy and paste programming in each of our proxy functions. Once we had reached the preceding conclusions when growing our database systems at Skype, the next logical step was quite obvious. We needed a special partitioning language, which would do just this one thing—calling remote SQL functions, and then make it as fast as possible; and thus the PL/Proxy database partitioning language was born. 203 Scaling Your Database with PL/Proxy PL/Proxy – the partitioning language The rest of this chapter is devoted to the PL/Proxy language. First, we will install it. Then, we will look at its syntax and ways to configure the partitions for its use. Finally, we will discuss how to do the actual data migration from a single database to a partitioned one and then look at several usage examples. Installing PL/Proxy If you are on Debian, Ubuntu, or a Red Hat variant, installing the language is easy. 1. First, you have to install the required packages on your operating system: sudo apt-get install postgresql-9.2-plproxy Or: sudo yum install plproxy92 2. Then, install the language in the database as an extension, which will be hosting the PL/Proxy functions: -bash-4.2 psql -c "CREATE EXTENSION plproxy" proxy1 CREATE EXTENSION At the time of writing this book, the PL/Proxy language is still not completely integrated with the PostgreSQL standard distribution. The SQL commandsCREATE LANGUAGE plproxy and its command-line equivalentcreatelang plproxy do not work. This may have been fixed by the time you read this, so you can try these first. PL/Proxy language syntax The PL/Proxy language itself is very simple. The purpose of a PL/Proxy function is to hand off the processing to another server so that it only needs six statements: • CONNECT orCLUSTER andRUN ON for selecting the target database partition • SELECT andTARGET for specifying the query to run • SPLIT for splitting anARRAY argument between several sub-arrays for running on multiple partitions 204 Chapter 9 CONNECT, CLUSTER, and RUN ON The first group of statements handle the remote connectivity to the partitions. The help determines which database to run the query on. You specify the exact partition to run the query usingCONNECT: CONNECT 'connect string' ; Here,connect string determines the database to run.connect string is the standard PostgreSQL connect string you would use to connect to the database from a client application, for example:dbname=p0 port=5433. Or, you can specify a name usingCLUSTER: CLUSTER 'usercluster'; - Or finally, you can specify a partition number usingRUN ON: RUN ON part_func(arg, ...) ; part_func() can be any existing or user-defined PostgreSQL function returning an integer. PL/Proxy calls that function with the given arguments and then uses N lower bits from the result to select a connection to a cluster partition. There are two more versions of theRUN ON statement: RUN ON ANY; This means that the function can be executed on any partition in a cluster. This can be used when all the required data for a function is present on all partitions. The other version is: RUN ON ALL; This runs the statement on all partitions in parallel and then returns a concatenation of results from the partitions. This has at least three main uses: • For cases when you don't know where the required data row is, like when getting data using non-partition keys. For example, getting a user by its e-mail when the table is partitioned by username. • Running aggregate functions over larger subsets of data, say counting all users. For example, getting all the users who have a certain user in their friend's lists. • Manipulating data that needs to be the same on all partitions. For example, when you have a price list that other functions are using, then one simple way to manage this price list is using aRUN ON ALL function. 205 Scaling Your Database with PL/Proxy SELECT and TARGET The default behavior of a PL/Proxy function if noSELECT orTARGET is present is to call the function with the exact same signature as itself in the remote partition. Suppose we have the function: CREATE OR REPLACE FUNCTION login( IN i_username text, IN i_pwdhash text, OUT status int, OUT message text ) AS CONNECT 'dbname=chap9 host=10.10.10.1'; LANGUAGE plproxy SECURITY DEFINER; If it is defined in schema public, the following call select from login('bob', 'secret') connects to the databasechap9 on host 10.10.10.1 and runs the following SQL statement there: SELECT FROM public.login('bob', 'secret') This retrieves the result and returns it to its caller. If you don't want to define a function inside the remote database, you can substitute the defaultselect from thisfunction(arg1, ...) call with your own by writing it in the function body of PL/Proxy function: CREATE OR REPLACE FUNCTION get_user_email(i_username text) RETURNS SETOF text AS CONNECT 'dbname=chap9 host=10.10.10.1'; SELECT email FROM user_info where username = i_username; LANGUAGE plproxy SECURITY DEFINER; Only a singleSELECT is supported; for any other or more complex SQL statements, you have to write a remote function and call it. The third option is to still call a function similar to itself, but named differently. For example, if you have a proxy function defined not in a separate proxy database, but in a partition, you may want it to target the local database for some data: CREATE OR REPLACE FUNCTION public.get_user_email(i_username text) RETURNS SETOF text AS CLUSTER 'messaging'; RUN ON hashtext(i_username); TARGET local.get_user_email; LANGUAGE plproxy SECURITY DEFINER; 206 Chapter 9 In this setup, the local version ofget_user_email() is in schema local on all partitions. Therefore, if one of the partitions connects back to the same database that it is defined in, it avoids circular calling. SPLIT – distributing array elements over several partitions The last PL/Proxy statement is for cases where you want some bigger chunk of work to be done in appropriate partitions. For example, if you have a function to create several users in one call and you still want to be able to use it after partitioning, the SPLIT statement is a way to tell PL/Proxy to split the arrays between the partitions based on the partitioning function: CREATE or REPLACE FUNCTION create_new_users( IN i_username text, IN i_pwdhash text, IN i_email text, OUT status int, OUT message text ) RETURNS SETOF RECORD AS BEGIN FOR i IN 1..array_length(i_username,1) LOOP SELECT INTO status, message FROM new_user(i_usernamei, i_pwdhashi, i_emaili); RETURN NEXT; END LOOP; END; LANGUAGE plpgsql SECURITY DEFINER; The following PL/Proxy function definition created on the proxy database can be used to split the calls across the partitions: CREATE or REPLACE FUNCTION create_new_users( IN i_username text, IN i_pwdhash text, IN i_email text, OUT status int, OUT message text ) RETURNS SETOF RECORD AS CLUSTER 'messaging'; RUN ON hashtext(i_username); SPLIT i_username, i_pwdhash, i_email; LANGUAGE plproxy SECURITY DEFINER; It would be called by sending in three arrays to the function: SELECT FROM create_new_users( ARRAY'bob', 'jane', 'tom', ARRAYmd5('bobs_pwd'), md5('janes_pwd'), md5('toms_pwd'), ARRAY'bobmail.com', 'janemail.com', 'tommail.com' ); 207 Scaling Your Database with PL/Proxy It will result in two parallel calls to partitions1 and2 (as usinghashtext(i_username) tom and bob map to partition1 and mary to partition2 of total for partitions as explained earlier), with the following arguments for partition1: SELECT FROM create_new_users( ARRAY'bob', 'tom', ARRAY'6c6e5b564fb0b192f66b2a0a60c751bb', 'edcc36c33f7529f430a1bc6eb7191dfe', ARRAY'bobmail.com','tommail.com' ); And this for partition2: SELECT FROM create_new_users( ARRAY'jane', ARRAY'cbbf391d3ef4c60afd851d851bda2dc8', ARRAY'janemail.com' ); Then, it returns a concatenation of the results: status message +- 200 OK 200 OK 200 OK (3 rows) Distribution of data First, what is a cluster in PL/Proxy? Well, the cluster is a set of partitions that make up the whole database. Each cluster consists of a number of partitions as determined by the cluster configuration. Each partition is uniquely specified by its connect string. The list of connection strings is what makes up a cluster. The position of the partition in this list is what determines the partition number, so the first element in the list is partition0, the second partition is1, and so on. The partition is selected by the output of theRUN ON function, and then masked by the right number of bits to map it on partitions. So, ifhashtext(i_username) returns 14 and there are four partitions (2 bits, mask binary 11 or 3 in decimal), the partition number will be 14 and 3 = 2, and the function will be called on partition2 (starting from zero), which is the third element in partition list. 208 Chapter 9 The constraint that the number of partitions has to be a power of two may seem an unnecessary restriction at first, but it was done in order to make sure that it is, and it will remain to be, easy to expand the number of partitions without the need to redistribute all the data. For example, if you tried to move from three partitions to four, most likely 3/4th of the data rows in partitions0 to2 have to be moved to new partitions to evenly cover0 to3. On the other hand, when moving from four to eight partitions, the data for partitions0 and1 is exactly the same that was previously on partition0, 2-3 is old 1 and so on. That is, your data does not need to be moved immediately, and half of the data does not need to be moved at all. The actual configuration of the cluster, the definition of partitions can be done in two ways, either by using a set of functions in schemaplproxy, or you can take advantage of the SQL/MED connection management. (SQL/MED is available starting PostgreSQL 8.4 and above.) Configuring PL/Proxy cluster using functions This is the original way to configure PL/Proxy, which works on all versions of PostgreSQL. When a query needs to be forwarded to a remote database, the function plproxy.get_cluster_partitions(cluster) is invoked by PL/Proxy to get the connection string to use for each partition. The following function is an example which returns information for a cluster with four partitions,p0 top3: CREATE OR REPLACE FUNCTION plproxy.get_cluster_partitions(cluster_name text) RETURNS SETOF text AS BEGIN IF cluster_name = 'messaging' THEN RETURN NEXT 'dbname=p0'; RETURN NEXT 'dbname=p1'; RETURN NEXT 'dbname=p2'; RETURN NEXT 'dbname=p3'; ELSE RAISE EXCEPTION 'Unknown cluster'; END IF; END; LANGUAGE plpgsql; 209 Scaling Your Database with PL/Proxy A production application might query some configuration tables or even read some configuration files to return the connection strings. Once again, the number of partitions returned must be a power of two. If you are absolutely sure that some partitions are never used, you can return empty strings for these. We also need to define a plproxy.get_cluster_version(cluster_name) function. This is called on each request and if the cluster version has not changed, the output from a cached result fromplproxy.get_cluster_partitions can be reused. So, it is best to make sure that this function is as fast as possible: CREATE OR REPLACE FUNCTION plproxy.get_cluster_version(cluster_name text) RETURNS int4 AS BEGIN IF cluster_name = 'messaging' THEN RETURN 1; ELSE RAISE EXCEPTION 'Unknown cluster'; END IF; END; LANGUAGE plpgsql; The last function needed isplproxy.get_cluster_config, which enables you to configure the behavior of PL/Proxy. This sample will set the connection lifetime to 10 minutes: CREATE OR REPLACE FUNCTION plproxy.get_cluster_config( in cluster_name text, out key text, out val text) RETURNS SETOF record AS BEGIN lets use same config for all clusters key := 'connection_lifetime'; val := 1060; RETURN NEXT; RETURN; END; LANGUAGE plpgsql; 210