Designing Temporal Analytics with MariaDB

Summary: This post is the first of several related articles. My intention is to describe, as succinctly as possible, the development of a data-intensive WooCommerce plug-in that provides temporal information. The overall aim of this plug-in is to create a Woo-compatible, time-oriented mini DSS analytical environment that can easily be queried using ad hoc SQL statements.

* * *

MariaDB 10.3 provides built-in functionality called System Versioning (1). This has several use-cases, such as supporting sophisticated audit trails. In fact, MariaDB provides a plug-in that can be used to log a server’s activities for exactly this purpose (2).

System Versioning can also be used to produce data analytics that are of interest in a retailing environment.

For example, a WooCommerce end-user may be tasked with producing a graph that tracks, say, the price changes of store products over some unit of time.

Producing such a report using native SQL and the existing WooCommerce database schema is far from straightforward.

This is due to the latter’s underlying complexity – which is a result of WooCommerce reliance on, and conformance to, the semantics of the WordPress data model (3), which is extensible — a feature that WooCommerce uses to its full advantage.

This extensibility reduces the need for plug-ins to overpopulate the WP database with their own custom tables.

When shop owners run WooCommerce’s pre-installed or custom reports to obtain business-related information about their store, there are  several important factors to consider.

Multi-user ad hoc querying of a production WooCommerce platform can result in performance degradation of transactional execution (i.e., the shopping cart experience), slow down shopper catalog browsing, and impact the refresh rate of the somewhat sluggish WP Admin panel.

Moreover, particularly in entry-level hosted environments, running frequent reports may also cause a store owner to exceed a given plan’s CPU time in a shared server configuration

Finally, it can be difficult to produce customized WooCommerce reports without costly programming that requires skilled knowledge of WooCommerce internals. (For an excellent tutorial on the semantics of WP’s core database, here is an excellent multi-part tutorial.).  

A separate DSS environment would be the ideal solution proposed by this experimental project – which I am calling Project SB, for Sandbox.

Theoretically, this environment could be hosted on an end user’s local machine; alternatively, it could be hosted in the cloud on a Saas basis. 

But for now, the goal of Project SB is to produce a plug-in that simplifies an end-user’s ability to write simple yet sophisticated temporal queries of the WooCommerce database that will initially reside in the WP/Woo database itself, as a proof of concept..

For example, an end user (such as a store owner, or someone who is authorized to examine transactional data) may wish to know how many widgets were sold between time t1 and t2, and group the result of this query by price.

SB enables an end-user to formulate such a query as simply as possible – without needing to write extremely complex SQL statements or add mysterious PHP snippets to a child theme that they might be asked to create.

This suggested approach would bypass entirely the elaborate PHP/JS REST API machinery that characterizes the current WooCommerce ecosystem.

As just stated, however, it is not difficult to envision this DSS environment running on a different server, thereby minimizing the impact on the production environment.

For large-scale retailers, it is even possible to also envision the implementation of such a DSS database using a massively parallel data architecture, such as that provided by MariaDB’s AX new columnar storage engine (4), along with all the other advanced technology that it makes available to Enterprise-level customers.

But let us not get ahead of ourselves.

Instead, let me start as simply as possible, then refine my approach through iteration, as our understanding of the problem to be solved improves.

Ultimately, the goal is to end up with a flexible data model that can successfully be mapped to not only to the existing WooCommerce schema, but perhaps other schemas in the future.

Because the initial requirement (stated earlier) clearly mentions an interest in the temporal tracking of product prices, I will kick things off by sketching a star-schema flavored Product/Price model of this universe, but which also shows the relationship of this star schema to other entities of interest.

Here’s what I came up with.

pricing data model
Figure 1

As you can see, this model follows a classic design pattern that is often used see in DSS systems that, say, warehouse multiple sources of transactional stock and bond of other financial instrument trading information.

The model is anchored around three kernel entities (colored in blue), which specify that each Product is associated with an Inventory instance, and that each Inventory instance in turn can be associated with a Transaction.

I have left out cardinality constraints, as this level of detail is not yet needed.

This model represents an abstraction of things of recognizable interest in retailing, and can be used to organize the data structures that will be used in this application.

For example, the model allows for units of Inventory to be located in different warehouses or stores.

It also specifies the Product entity as having a unary comprises relationship: this is a preliminary attempt to indicate that a Product instance can be associated with several other multiple products that can be sold as a bundle; say, a towel and a bar of soap. Experienced data modelers will instantly recognize this as BOM problem, which can be notoriously tricky to solve using a relational database.

We will have to try out different scenarios and test how comprises can best be implemented in this particular situation.

A Customer can engage in multiple Transactions (these can be Buy/Sell/Return, etc: the model as it stands does not specify a characteristics entity of Transaction Type table, which will be added later on, when we focus on shopping cart semantics); this is also valid for an instance in Supplier.

Transactions can be related to a Discount Type, which as show here is associated with the weak entity Checkout Price. This proposed association must be reviewed programatically to ensure that the information produced is consistent.

Furthermore, it is clear that the cardinality of the weak Price entities will vary. For example, the MSRP is likely to change slowly during non-inflationary times, while Checkout Price instances may have a wide distribution in a short span of time.

There is the tricky issue of how to calculate the actual checkout price of an item – given that a discount can apply to an entire cart, or to one item in a shopping cart basket of goods.

Luckily for us, the job of capturing these details is that of the POS system; all a DSS has to do is reflect the baseline data provided by its feeder transactional systems, then massage this data to produce analytical information.

By collapsing the pricing star schema shown here, it might be possible to reduce the complexity of this section of the SB model.

In the next post, we shall examine how this might be achieved.

Then we will produce MariaDB-specific DDL that implements tables that only contain PK and FK columns, write DML inserts to add sample data, then test out various SQL queries against the proposed design.

The objective of that exercise will be to verify that the “DSS-optimized” version of this section of the model works correctly.

Following this step, we will determine what other attributes might be needed “fill out” the Product and Price entities, and begin to map these columns to their equivalents in the WooCommerce 3.5.x schema.

  1. see https://mariadb.com/kb/en/library/system-versioned-tables/
  2. see https://mariadb.com/kb/en/library/mariadb-audit-plugin/
  3. see https://codex.wordpress.org/Database_Description
  4. see https://mariadb.com/kb/en/library/mariadb-columnstore/
  5. As used here, a data model loosely refers to a pictorial depiction of the entities, attributes, and business relationships that are of interest to Project SB. Such a visual representation – which can be equated to a simplified architectural blueprint – can easily be produced using ER modeling tools of varying cost or functionality — or simply sketched by hand, using “the poor person’s” approach, before producing the needed DDL. The term schema will be applied to a collection of physical tables, views and other database objects specified in a database engine that, together, implement the SB data model. This nomenclature may not prove satisfactory from a data modeling purist’s perspective, but it is useful enough to serve our purpose, which is to produce a WP plug-in that works correctly and efficiently. In doing so, I shall steer clear of terminology wars, and focus instead on building the application, while touching on the various factors that play a role in the data modeling phase of this project.