
2.4. DW Reliability

 

There are two levels of failover in a Distribware system, and each can operate independently of the other.  First, client applications can switch between different locations (stored as connections in their keyfiles) in the event one location is having a problem.  This can be the only level of redundancy used, in which case three different locations would be desired, but in general this configuration would only be used for small-scale systems.
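
As a rough illustration only, a client might walk through the location connections in its keyfile like this.  The URLs, names, and use of HttpClient below are invented for this sketch and are not part of the Distribware API:

using System;
using System.Collections.Generic;
using System.Net.Http;

// Minimal sketch of location-level failover on the client side.
// The location URLs would come from the connections stored in the client keyfile.
var locationUrls = new List<string>
{
    "https://location1.example.com/api/service",
    "https://location2.example.com/api/service",
    "https://location3.example.com/api/service"
};

using var http = new HttpClient { Timeout = TimeSpan.FromSeconds(5) };
string? result = null;

foreach (var url in locationUrls)
{
    try
    {
        result = await http.GetStringAsync(url);   // this location answered
        break;
    }
    catch (Exception)
    {
        // This location is having a problem; fall through to the next one in the keyfile.
    }
}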

Second, there can also be redundancy within each location.  For Distribware, the mechanisms used to increase reliability within a given location are divided into two main categories:  processing node reliability and storage node reliability.  Reliability in all cases is based on hardware and software redundancy.  One other important concept is the endpoint.  All of the server-side components in a system are defined in the configuration data as endpoints.  The web service APIs (processing nodes) are defined by their URLs in the client application keyfiles, and the database servers (storage nodes) by their ADO.NET connection strings in the system configuration data.
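
The exact shape of that configuration data is not shown here, but conceptually each endpoint can be pictured as something like the following (class and property names are invented for illustration):

// Hypothetical models of the two endpoint types described above.
public class ProcessingNodeEndpoint
{
    public string Name { get; set; }              // e.g. "Location1-FarmA"
    public string Url { get; set; }               // web service URL stored in client keyfiles
}

public class StorageNodeEndpoint
{
    public string Name { get; set; }              // e.g. "Location1-SecDB-1"
    public string ConnectionString { get; set; }  // ADO.NET connection string from the configuration data
    public bool IsOnline { get; set; }            // flagged false when the server is unavailable
    public bool IsPrimary { get; set; }           // primary vs. backup within the location
}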

Processing node reliability is easy to achieve in the form of web server farms.  Each node in a farm is a duplicate of the others, and together they share the processing workload of a location.  Processing nodes in Distribware are web servers hosting ASP.NET, and they are deployed into either stateful (DMZ) or stateless (internal network) web server farms.  Scaling out a web server farm increases both the reliability and the scalability of the system at the same time: the system stays online when an individual web server fails (reliability), and the workload is divided among all the available servers in the farm (scalability).

Storage nodes are not necessarily identical duplicates of each other, and they are neither a true farm nor a cluster, but the redundant data they host acts as the foundation for both the reliability and scalability of Distribware as a whole.  In the platform, storage nodes are basically collections of stand-alone database servers.  Data storage reliability is based on two major features:  1) redundant hardware to host multiple independent copies of a database, and 2) synchronization of the data between the multiple copies of each database in the system.  All of the logic to perform this functionality is concentrated within the web service APIs – none of it exists at the database level (which is why the logic works with any of the major database engines).

Within a location, the database server endpoints are defined by ADO.NET connection strings used by SqlConnection objects in the APIs.  These connection strings form the basis for both the manual and automatic failover between database storage nodes within a location when redundant hardware is available.  Failover is simply a matter of changing which one of the connection strings is used by the web service APIs as the primary, and which is used as the backup.  A general convention used by Distribware systems is that for both mirrors and replicas, there are only two copies of a database per location.  This provides redundancy within the location while also simplifying the failover logic within the APIs.
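
In other words, failover is little more than swapping two designations held in memory by the APIs.  A simplified sketch, reusing the hypothetical StorageNodeEndpoint above, might look like this:

// Sketch only: each location holds two storage endpoints, and failover swaps
// which one the web service APIs treat as the primary.
public class LocationStorage
{
    public StorageNodeEndpoint Primary { get; set; }
    public StorageNodeEndpoint Backup { get; set; }

    // Called when a write to the primary fails and the backup is online.
    public void FailOver()
    {
        var failed = Primary;
        Primary = Backup;
        Backup = failed;
        Backup.IsOnline = false;   // the failed server stays flagged offline until it is repaired
    }
}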

The database servers host databases with one of three types of Distribware schemas:  1) standard, 2) mirror, or 3) replica.  The standard schema is to be used for databases that do not require any sort of redundancy, or when some other type of redundancy (replication, clustering, etc.) is preferred to the Distribware versions.  Otherwise, it is the mirror and replica schemas that provide both reliability and scalability within the system.  Both mechanisms are essentially real-time or near real-time incremental backups of the records in a database table, and therefore neither is what would generally be considered true mirroring or true replication.

All Distribware database servers are configured as single, stand-alone hosts collected into a loose confederation of mirrors or replicas via configuration data.  Databases that are unavailable for any reason are automatically flagged as offline, and the system continues to run as long as one copy of the database is available.  Under these circumstances, repairing the system consists of bringing the database and/or server back online, and changing its status in the configuration data.  Repair/synchronization of the data after downtime is automatic, except for corrupted data, which requires an administrator to decide which version of the data is correct.

Since all of the logic for mirroring and replication is implemented within the web service APIs, the logic can easily work with any RDBMS that has an ADO.NET driver.  This feature allows organizations to use whatever RDBMS they already have, rather than be forced to switch to a new database engine.  Data can also be “backed up” from one type of database server to another, but that is an example of an ETL process.  The logic is very similar to replication, but includes a transform sandwiched in between the extract and load.  While any database engine can be used, the cost of the software license per storage node is a major factor (only MySQL and PostgreSQL are recommended for this reason).
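
The reason the logic is engine-neutral is that it can be written against ADO.NET's abstract provider types rather than a specific driver.  The snippet below is only a sketch of that idea; the provider name and connection string are examples, and on modern .NET the provider factory has to be registered before it can be looked up:

using System.Data.Common;

// Any RDBMS with an ADO.NET driver can be used through the same abstract types.
// e.g. DbProviderFactories.RegisterFactory("Npgsql", Npgsql.NpgsqlFactory.Instance);
DbProviderFactory factory = DbProviderFactories.GetFactory("Npgsql");

using DbConnection conn = factory.CreateConnection();
conn.ConnectionString = "Host=storage-node-1;Database=secdb;Username=dw;Password=secret";
conn.Open();

using DbCommand cmd = conn.CreateCommand();
cmd.CommandText = "SELECT COUNT(*) FROM customer";
object rowCount = cmd.ExecuteScalar();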

 

Expanded

Mirror = Immediate Backup

Immediate backup (mirroring) works quite well when no backup server has been defined, or if a backup server exists but is offline.  The APIs only require the primary.  Repair of the system involves bringing the failed server back online, and then changing the state of the server in the configuration data to indicate its online status.  Mirrored data is repaired automatically in the backup each time a record in the primary is modified.  In the event data could not be copied prior to a failover, “upsert” logic will attempt to insert a missing record that failed to be updated.

The term immediate backup is used to describe a pair of database write operations performed sequentially within a single API method.  A database write (insert, update, delete) is executed against the database whose connection string (endpoint) is designated as the primary.

If the write is successful, the logic then tests to see a) whether a second database connection string exists, and b) if so, whether that database is online.  If the backup exists and is online, the same record written to the primary is immediately written to the backup.  The information regarding database and server state is stored in memory for each web service.

If the original write was not successful, a second database connection exists, and the second database is online, the method will a) attempt to write the original record to the backup, and b) if successful, will switch the connection string endpoints designating the primary and the backup servers (and flag the newly designated backup as offline).
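
Put together, the write path looks roughly like the sketch below.  This is not the actual Distribware implementation; the method, class, and variable names are invented, it reuses the hypothetical LocationStorage/StorageNodeEndpoint sketches above, and the two methods would live inside one of the web service classes:

using Microsoft.Data.SqlClient;   // or System.Data.SqlClient

// Rough sketch of the immediate-backup (mirror) write sequence described above.
public bool MirroredWrite(LocationStorage storage, string writeSql)
{
    // 1) Write to whichever endpoint is currently designated as the primary.
    if (TryWrite(storage.Primary, writeSql))
    {
        // 2) Primary succeeded: copy the same record to the backup if it exists and is online.
        if (storage.Backup != null && storage.Backup.IsOnline)
            TryWrite(storage.Backup, writeSql);
        return true;
    }

    // 3) Primary failed: attempt the write against the backup instead.
    if (storage.Backup != null && storage.Backup.IsOnline && TryWrite(storage.Backup, writeSql))
    {
        storage.FailOver();   // swap primary/backup and flag the failed server offline
        return true;
    }

    return false;             // neither copy of the database was reachable
}

private bool TryWrite(StorageNodeEndpoint node, string writeSql)
{
    try
    {
        using var conn = new SqlConnection(node.ConnectionString);
        conn.Open();
        using var cmd = new SqlCommand(writeSql, conn);
        cmd.ExecuteNonQuery();
        return true;
    }
    catch (Exception)
    {
        return false;
    }
}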

This mechanism is obviously not true mirroring.  It is more like an immediate backup performed one record at a time, but only if and when a backup server is available.  This distinction in terminology is important, because it more accurately reflects the fact that the API logic works with a single server at a time, and the write operation to a backup server is optional (also, users experience immediate consistency of the data).  The database servers involved are themselves configured as independent, stand-alone servers, and all the backup logic exists within the web service API’s.

When failing over from a primary to a backup, “upsert” logic helps to correct any problems with un-synchronized records.  An automated background repair process exists to periodically ensure that the primary and backup databases are completely in sync, which also corrects the problems that can occur during a failover.
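
The upsert step itself is simple.  A minimal sketch is shown below; the table, column, and variable names are invented, and backup, customerId, and newState are assumed to be in scope:

// Try to UPDATE the record in the backup; if no row was affected because the
// record was never copied before the failover, INSERT it instead.
using var conn = new SqlConnection(backup.ConnectionString);
conn.Open();

using var update = new SqlCommand(
    "UPDATE customer SET state = @state WHERE customer_id = @id", conn);
update.Parameters.AddWithValue("@id", customerId);
update.Parameters.AddWithValue("@state", newState);

if (update.ExecuteNonQuery() == 0)
{
    using var insert = new SqlCommand(
        "INSERT INTO customer (customer_id, state) VALUES (@id, @state)", conn);
    insert.Parameters.AddWithValue("@id", customerId);
    insert.Parameters.AddWithValue("@state", newState);
    insert.ExecuteNonQuery();
}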

The SysDB uses a synchronous (mirror) backup schema, and data is only copied within a location (never between locations).

 

Replica = Asynchronous Backup

Asynchronous backup (replication) also works when no replicas have been defined, or if any/all of those replicas are offline.  The APIs only require the primary.  From a system maintenance perspective, recovering from server downtime is similar to the mirroring process described above.

The replication logic itself is based on a type of MVCC (Multi-Version Concurrency Control) combined with a distributed form of “Race and Repair” functionality.  Database table rows are considered to be immutable with the exception of a single state field, and the database tables themselves are append-only.  The replication process is transactional in that it guarantees each record will be copied “once, and only once” from a specific source table to each specific replica table.  It does so using a placeholder mechanism (a forward-scrolling cursor through the source data), and any problems will cause the process to pause until it is able to resume replication where it left off with the previous successful attempt.  The replication process also has logic to prevent the creation of duplicate records, even if the placeholder data is lost and the process is reset and rolled forward.  Unlike the immediate backup, the asynchronous backup logic runs as a separate automated background process, continuously polling for new source table records to copy to a specific destination table.  If the mechanism were a type of replication rather than a backup, it would be described as n-way multi-master merge replication.
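
A heavily simplified sketch of that background process is shown below.  Every identifier here is invented, and the helper methods are placeholders for ordinary ADO.NET reads and writes; the point is only to show the placeholder cursor, the duplicate guard, and the polling loop:

using System.Threading;

// Simplified sketch of the asynchronous replication loop for one
// source table / replica table pair.  Helper methods are placeholders.
while (replicationEnabled)
{
    // Placeholder: the id of the last source row successfully copied to this replica.
    long lastCopiedId = LoadPlaceholder(sourceTable, replica);

    // Forward-scrolling read of newly appended, immutable source rows.
    foreach (var row in ReadRowsAfter(source, sourceTable, lastCopiedId))
    {
        // "Once, and only once": skip the row if it already exists in the replica,
        // which also protects against duplicates if the placeholder is ever lost
        // and the process is reset and rolled forward.
        InsertIfNotExists(replica, sourceTable, row);

        // Advance the placeholder only after the copy succeeds.
        SavePlaceholder(sourceTable, replica, row.RowId);
    }

    Thread.Sleep(pollInterval);   // poll again for new source records
}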

A second independent process (verification) uses a table-comparison mechanism to crawl through all the data in a source table and ensure that no gaps, duplicates, or corrupted data exist in each destination when compared to the source.  This verification process is intended to be run in the background frequently to guarantee that each destination is exactly in sync with each source – and to repair any problems in the data, no matter how or when they were caused.  In theory, the verification process could be used to completely repopulate an empty replica from scratch (but the replication process is faster).  Also, while the base replication mechanism is a forward-scrolling cursor through each source table, the verification repair process scrolls backward through the source data from the point in time it was triggered.  The verification process can be defined with ranges of records within a table, so that multiple processes can be run in parallel to reduce the time needed for a full verification.
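
Conceptually, the verification pass for one range of one table looks something like the sketch below (again, every name is invented and the helpers stand in for ordinary ADO.NET reads and writes):

// Sketch of the verification/repair crawl over one range of source rows.
// It scrolls backward from the point in time the pass was triggered.
void VerifyRange(StorageNodeEndpoint source, StorageNodeEndpoint destination,
                 string table, long fromId, long toId)
{
    for (long id = toId; id >= fromId; id--)
    {
        var sourceRow = ReadRow(source, table, id);
        if (sourceRow == null) continue;                        // nothing to verify for this id

        var copies = ReadAllCopies(destination, table, id);

        if (copies.Count == 0)
            InsertRow(destination, table, sourceRow);           // gap: re-copy the missing row
        else if (copies.Count > 1)
            RemoveDuplicates(destination, table, id);           // duplicates: keep a single copy
        else if (!RowsMatch(sourceRow, copies[0]))
            OverwriteRow(destination, table, sourceRow);        // corrupted: repair from the source
    }
}

Because the range boundaries are parameters, several such passes can run in parallel over disjoint ranges of the same table, which is how the full verification time is reduced.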

The SecDB uses an asynchronous (replica) backup schema, and is copied both within locations and between them.

Reliability and Scalability are closely related.  Please refer to the Scalability page for more details.
