Split, Master and Global Tables
One of the most important ScaleBase concepts is its data-splitting techniques. They are transparent to the application, but can be configured. Because they have a major effect on application performance, I feel it is important to explain the differences between these table types.
Our example data will be an employee table, with the following values (those of you with an Oracle Database experience will recognize these values from the test HR schema that comes with the Oracle database).
| ID | First name | Last name |
| 100 | Steven | King |
| 101 | Neena | Kochhar |
| 102 | Lex | De Haan |
| 103 | Alexander | Hunold |
| 104 | Bruce | Ernst |
| 105 | David | Austin |
| 106 | Valli | Pataballa |
Also, for our example, we will consider the following ScaleBase
installation (without any high availability options), which contains
four databases – A,B,C and D.
Split Tables
A split table is a table whose data is distributed across several database servers. The table definition itself resides on all the database servers, but the data is divided between them, based on a split algorithm that can be configured (hash, list or range are currently supported – we will explain these options in a future blog). This option should be used for large tables or tables that have many write operations on them. Since each database contains a portion of the data, write operations (including index updating) are much faster.
So, in our example, if the employees table was configured to be a split table, we would have receive the following data distribution:
| Database A | Database B | Database C | Database D | ||||||||||||||||||||||
|
|
|
|
Global Tables
A global table means that the table configuration exists on all database servers, and each database server contains the exact data as all the other database servers. This option should be used for smaller tables, and is typically usually used in join operations with the split tables.
In our example, had we configured the employees table to be a global table, we would have received the following data distribution:
| Database A | Database B | Database C | Database D | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
|
|
Master Tables
Master tables are tables that only exist on one database (for the sake of this discussion, we’ll assume this is database A). This option is useful for data that is not used in joins. In this way, updating and selecting it are much faster.
| Database A | Database B | Database C | Database D | ||||||||||||||||||||||||
|
Summary
There are, of course, implications for each configuration option. Master tables cannot take part in joins with other tables, for example, without causing a performance impact. Global tables require XA transactions in order to ensure consistency between all nodes, and massive writes to them may cause lock situations that can hurt performance. Split tables are great, but you cannot split all tables, since complex queries are slow to execute.
We will be offering an offline analysis tool shortly that will assist in determining how to configure your tables. So, stay tuned to our site and Facebook page.
I hope this short post helped to explain the different options of our database configuration. Post back if you are interested in more information.
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)



