2.2. DW Overview

Distribware is a distributed multi-computer platform used to build custom software systems.  It consists of the following logical/physical components:

  • Web service APIs
  • NewSQL data storage
  • Scheduler service

All logic in the system has been consolidated into the web service APIs.  In practice, this means the web service APIs are used for a) external client/server communication with applications and b) all internal inter-process communication and automated processing within the system.  All external and internal communication is therefore funneled through the API message processing pipeline, which contains many layers of security.  The web service APIs are designed and built using a variation of the microservices style of architecture, so the functionality in each part of the system can be extended continuously by adding more services, APIs and methods.  The platform is designed to allow continuous modification of both the APIs and the underlying database schemas without causing breaking changes for the client applications that depend on a given system.  When sufficient redundant hardware is used, systems built using the platform are able to run with no maintenance windows and no downtime, planned or otherwise.

The platform uses a simplified data access architecture based on ADO.NET that contains built-in NewSQL database mirroring, replication and sharding functionality.  These capabilities are implemented within the middle tier APIs and can therefore be used with any of the major database servers on the market.  Database servers are not configured as clusters; they are just a bunch of independent servers (the database equivalent of JBOD for storage).  Support for MySQL and PostgreSQL is included initially, but any database with an ADO.NET driver can be used.  The NewSQL logic is optional, so a system can work just as well using a MySQL cluster, for example.  Other types of storage such as MongoDB can be used as fully integrated systems, but not as the main system storage.
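As a rough illustration of mirroring across independent servers from the middle tier, the sketch below runs the same parameterized statement on every node and reads from the first node that answers.  It is written in Python rather than the platform's C#/ADO.NET, uses in-memory SQLite databases as stand-ins for separate database servers, and every function and table name in it is hypothetical; it is not Distribware's actual implementation.

```python
import sqlite3


def make_node():
    """Create one independent 'server' (an in-memory SQLite database stands in for it)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT)")
    return conn


def mirrored_write(nodes, sql, params):
    """Run the same parameterized statement on every node; return how many succeeded."""
    succeeded = 0
    for conn in nodes:
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(sql, params)
            succeeded += 1
        except sqlite3.Error:
            # Assumption: a node that missed a write is repaired later
            # by a separate replication/repair process.
            pass
    return succeeded


def read_first_available(nodes, sql, params):
    """Read from the first node that answers (simple failover)."""
    for conn in nodes:
        try:
            return conn.execute(sql, params).fetchall()
        except sqlite3.Error:
            continue
    raise RuntimeError("no storage node available")


nodes = [make_node(), make_node()]
written = mirrored_write(nodes, "INSERT INTO item (id, name) VALUES (?, ?)", (1, "widget"))
rows = read_first_available(nodes, "SELECT name FROM item WHERE id = ?", (1,))
```

Because the nodes know nothing about each other, all coordination lives in the calling code, which is what lets the same approach work with any database engine that has a driver.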

Automated processing within the system is also implemented as APIs, and is event driven.  While any type of event can trigger the execution of an automated process, the scheduler service is typically used for this purpose (calling the appropriate API method to execute the automated process logic).  The “out-of-the-box” automated processes include the replication of data between redundant servers, replicated data validation and repair, refresh/update of cached configuration data on each server, ETL processes, data cleansing, and data categorization and tagging, among others.  Most automated processes are intended to run both frequently and continuously through very large volumes of data.  Since automated processes are just API methods in the system, any logic that can be written in C# as a reusable library of code can be automated.


** An important fact about Distribware is that it has been designed to take advantage of the high IOPS, high throughput and low latency of SSDs.  It can be implemented using mechanical disk storage, but only for small-scale systems.  Any system that requires good performance and scalability must, at a minimum, use SSDs for the system DB and security DB database servers.


The architecture and code of the entire system have been adapted to use techniques from the books “Writing High Performance .NET Code” (Ben Watson, 2014, writinghighperf.net) and “Ultra-Fast ASP.NET 4.5” (Rick Kiessling, Apress, 2012, 12titans.net), as well as many other sources of high-performance .NET implementation guidelines.

The stateless web service APIs act as the public-facing conduit to all the reusable logic contained in the system as internal libraries of code.  They combine the Front Controller and Single Message Argument service design patterns to implement a modular message-based API (refer to Service Design Patterns:  http://www.amazon.com/Service-Design-Patterns-Fundamental-Solutions/dp/032154420X).  One of the primary requirements for the web services is that a single instance be usable by all major programming languages, platforms and devices.  The other main requirements for the web services are security, high performance, and efficiency.

The message-based APIs provide the following functionality:

  1. The web services expose a single controller method with a single parameter (which accepts request messages) as the only attack surface in the system.
  2. The web services contain optional integrated message encryption, which can be used in place of SSL/TLS where appropriate.
  3. The front controller method provides a natural place for a message processing pipeline:
    • a tolerant reader securely parses the API request message while tolerating additional elements in the message that are used elsewhere in the system
    • a message TTL check rejects stale messages to guard against replay attacks
    • account authentication of the request message
    • authorization for each individual internal method call
    • mapping of external API method labels to internal concrete methods
    • mapping of message name/value pairs to concrete method parameters
    • execution of the internal method(s), followed by a method-level assertion test of the results (to be included in the response message)
  4. The messages can contain a collection of multiple API method calls, as a batch.  This helps reduce the “chattiness” between the client application and the server:
    • batching methods reduces the workload for the identity store authentication/authorization
    • batching improves the overall performance and efficiency of the APIs while also reducing the amount of network bandwidth used
  5. The response message can contain multiple result values per method (substitute mechanism for “out” params).
  6. The response message can contain additional information beyond the method result(s), offering 2-way communication between the client and the server:
    • each response message from the server is authenticated to the client, preventing man-in-the-middle attacks
    • a simple assertion value is sent back indicating the outcome of each method’s execution (OK, empty, error, or exception), which is used by end-to-end testing
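To make the pipeline more concrete, here is a minimal Python sketch of a front controller that accepts a single message argument.  The message layout, the HMAC-based signature, the 30-second TTL window and all names (`handle`, `build_request`, `METHOD_MAP`, `PERMISSIONS`) are assumptions made for illustration, not Distribware's real wire format or API.  Message encryption and response authentication are left out for brevity.

```python
import hashlib
import hmac
import json
import time

SECRET = b"demo-shared-secret"  # hypothetical shared key for one account
MAX_AGE_SECONDS = 30            # hypothetical TTL window

# Hypothetical internal methods and the external labels mapped to them.
METHOD_MAP = {
    "Math.Add": lambda a, b: a + b,
    "Util.Echo": lambda text: text,
}
# Hypothetical authorization table: account -> labels it may call.
PERMISSIONS = {"alice": {"Math.Add", "Util.Echo"}}


def sign(account, timestamp, calls):
    """HMAC over the account, timestamp and canonical JSON of the batch."""
    body = json.dumps(calls, sort_keys=True)
    payload = f"{account}|{timestamp}|{body}".encode("utf-8")
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def build_request(account, calls, timestamp, **extra):
    """Client side: one message carrying a batch of method calls."""
    message = {"account": account, "timestamp": timestamp, "calls": calls,
               "signature": sign(account, timestamp, calls)}
    message.update(extra)  # extra elements that the tolerant reader ignores
    return json.dumps(message)


def _error(reason):
    return json.dumps({"status": "error", "reason": reason})


def run_call(account, call):
    """Authorize, map and execute one call; attach an assertion value."""
    label = call.get("method") if isinstance(call, dict) else None
    if (not isinstance(label, str) or label not in METHOD_MAP
            or label not in PERMISSIONS.get(account, set())):
        return {"method": label, "assert": "error", "values": []}
    try:
        # Name/value pairs in the message become keyword arguments.
        value = METHOD_MAP[label](**call.get("params", {}))
    except Exception as exc:
        return {"method": label, "assert": "exception", "values": [str(exc)]}
    outcome = "empty" if value is None or value == "" else "OK"
    return {"method": label, "assert": outcome, "values": [value]}


def handle(raw_message, now=None):
    """The single controller method: one request message in, one response out."""
    now = time.time() if now is None else now
    try:
        message = json.loads(raw_message)
    except ValueError:
        return _error("malformed")
    if not isinstance(message, dict):
        return _error("malformed")
    # Tolerant reader: read only the known elements, ignore everything else.
    account = message.get("account")
    timestamp = message.get("timestamp")
    calls = message.get("calls")
    signature = message.get("signature")
    if not (isinstance(account, str) and isinstance(timestamp, (int, float))
            and isinstance(calls, list) and isinstance(signature, str)):
        return _error("malformed")
    # TTL check: reject messages outside the freshness window.
    if abs(now - timestamp) > MAX_AGE_SECONDS:
        return _error("expired")
    # Authentication happens once per message, not once per method call.
    expected = sign(account, timestamp, calls)
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return _error("auth")
    results = [run_call(account, call) for call in calls]
    return json.dumps({"status": "ok", "results": results})
```

A client would send one batch such as `build_request("alice", [{"method": "Math.Add", "params": {"a": 2, "b": 3}}], timestamp=int(time.time()))` and receive one response holding a result and an assertion value per call.  Note that a TTL check on its own only narrows the replay window; a stricter design would also track recently seen message identifiers.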

Some additional advantages of message-based APIs in general are documented by ServiceStack ( https://github.com/ServiceStack/ServiceStack/wiki/Advantages-of-message-based-web-services ).

On the back end, Distribware data storage focuses heavily on using low-cost relational database engines as redundant storage nodes for reliability, fault-tolerance and horizontal scalability.  The storage logic contains NewSQL functionality to provide database mirroring, replication, and sharding, which can be used with any of the major database engines.  This logic is optional, and can easily be replaced with a more standard type of data access when using storage systems that already provide their own fault-tolerance and scalability, such as MySQL clusters.
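The source does not say how records are assigned to shards.  As one common possibility, the Python sketch below routes each key to a shard with a stable hash; the function names and the choice of hash-modulo routing are assumptions, not a description of Distribware's sharding logic.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a record key to a shard index using a hash that is stable across processes."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def group_by_shard(keys, shard_count):
    """Group keys by target shard so each storage node receives one batch."""
    groups = {index: [] for index in range(shard_count)}
    for key in keys:
        groups[shard_for(key, shard_count)].append(key)
    return groups


groups = group_by_shard(range(1, 101), shard_count=4)
```

A cryptographic hash is used instead of Python's built-in `hash()` because the latter is randomized per process for strings.  With plain modulo routing, changing the shard count remaps most keys, so a real system would need a plan for resharding.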

[Figure: EFPerformance (chart from the ADO.NET blog comparing EF 4 and EF 5 performance)]

  1. The data access logic is based on raw ADO.NET for performance.  Any layer of abstraction adds overhead, which can dramatically reduce performance.  This diagram was taken from the ADO.NET blog ( http://blogs.msdn.com/b/adonet/archive/2012/02/14/sneak-preview-entity-framework-5-0-performance-improvements.aspx ) and compares EF 4 performance to the improvements made in EF 5.  Even so, according to the diagram, ADO.NET is still twice as fast as the best EF options, or four times as fast as the most commonly used EF functionality.  This is one example of the overall performance guidelines from the book “Writing High Performance .NET Code”, which stresses eliminating layers of abstraction in favor of the most direct, lowest-level development techniques possible to achieve the best performance.
  2. Reliability logic (database mirroring and n-way multi-master merge replication) depends on the flexibility of the simple builder methods and the dependency injection utility classes that execute the SqlCommand objects passed to them.  Redundant servers in a location allow for failover within that location and for no maintenance windows (maintenance is performed while the system is running), in addition to failover by client applications between locations.  Note:  replication is based on a variant of the Multi-Version Concurrency Control (MVCC) pattern.
  3. Scalability logic is an extension of the flexible ADO.NET mirror and replica functionality to enable horizontal scale out (sharding) of each schema.
  4. Continuous modification and deployment of both the APIs and database schemas.  Eliminating the generated static data class object models and their tight coupling with the API code allows for continuous append-only modification of both the database schemas and APIs without causing any breaking changes to the client applications.  It also simplifies the automated deployment of schema modifications and API code modifications to their respective farms of processing nodes and storage nodes within a given system.
  5. Programmatic modification of the parameterized DML syntax at runtime.  The builder classes in Distribware’s data access object model are able to modify the parameterized SQL syntax at runtime, both to provide true optional data parameter functionality and to allow conflicting updates of the same record to be merged together (a common occurrence in a distributed system).  Such flexibility is not possible using static ORM-generated data classes.
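The idea of optional parameters in item 5 can be sketched as a builder that emits a parameterized UPDATE containing only the columns a caller actually supplied.  This is an illustrative Python example using SQLite; `build_update`, the `person` table and the column whitelist are hypothetical and do not reflect Distribware's builder classes.

```python
import sqlite3

# Hypothetical whitelist: identifiers are interpolated into the SQL text,
# so they must come from trusted code, never from user input.
ALLOWED_COLUMNS = {"name", "email", "phone"}


def build_update(table, key_column, key_value, fields):
    """Build a parameterized UPDATE that sets only the fields that were supplied."""
    supplied = {col: val for col, val in fields.items() if val is not None}
    if not supplied:
        raise ValueError("nothing to update")
    unknown = set(supplied) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    assignments = ", ".join(f"{col} = ?" for col in supplied)
    sql = f"UPDATE {table} SET {assignments} WHERE {key_column} = ?"
    return sql, list(supplied.values()) + [key_value]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT)")
conn.execute("INSERT INTO person VALUES (1, 'Ann', 'ann@old.example', '555-0100')")

# Two writers change different columns of the same record.
changes = (
    {"email": "ann@new.example", "phone": None},
    {"email": None, "phone": "555-0199"},
)
for change in changes:
    sql, params = build_update("person", "id", 1, change)
    conn.execute(sql, params)

row = conn.execute("SELECT name, email, phone FROM person WHERE id = 1").fetchone()
```

Because each statement touches only the columns it was given, the two updates combine instead of the second overwriting the first with stale values.  This sketch treats `None` as "not supplied", so it cannot set a column to NULL; a fuller version would need a separate sentinel for that.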

Finally, the automated processing logic has itself been implemented as modular APIs which can easily be extended.  Execution of the automated processes can be triggered by internal events or by other API methods, but is primarily triggered by the scheduler.  A Windows service is paired with each web service to act as the dedicated scheduler for that service.  The scheduler attempts to claim records in the systemwork database table, which store state information about each process and also act as tokens granting permission for a specific thread to execute the process.  Since the automated processes are API methods, they are executed using the full security of the standard message processing pipeline.  The one-way message processing pipeline contains the logic to “close the loop” in the state information of each process by updating the record in the systemwork table.

Summary

This information is a general, high-level overview of the Distribware platform.  For more detail, please refer to the documentation pages for each specific section of the system.

