The Great Leap Forward


stripe schema
v .01

Here is an ERD diagram of 2 important STRIPE tables that jumped out as being of immediate interest.

There are approximately 50 tables in the SIGMA SQL interface (I don’t know how these map to actual tables on Stripe’s production backend). I’m definitely not going to dupe them all.

Finding the table that represents the master Stripe Account (where store owner funds are deposited before transfer) is next.  This will give me a readout of the balance-to-date in Stripe.

Just as a reminder, I am primarily interested in capturing relevant data about money flows of a WooCommerce account over time,  That’s it.  No need to get bogged down in extraneous Stripe detail.

A StoreBoss owner would typically be interested in the following questions:

How much money is in my Stripe account right now?

How much of it is available to me? (ie, net of any holds by Stripe on my money?)

How fast is Stripe transferring the money to my bank?

What is Stripe charging me for this service?

Is Stripe dropping any sales transactions that I made through WooCommerce or getting any sales amounts wrong or miscalculated batch amounts?

What about refunds/chargebacks?

Once I have the tables that can help provide answers to these questions set up in my StoreBoss XAMPP dev env, I’ll start to look at the query templates that Stripe has made available and maybe translate these too in SQLite speak.

I am not going to spend any time customizing the canned Sigma queries, as I have made the decision that StoreBoss is going directly retrieve data via the Stripe REST API.

Doing so would allow me to load the data off Stripe without Sigma’s 2 or 3-day delay (and save the end user the $40/mo). Eventually, I’m going to have to do the same with Bank info.

I’ve changed my mind about integrating these sources of data on the desktop.

Companies like Leapfin already do similar sort of things on the cloud.

Leapfin, of course, is targeted at large companies, with beaucoup dollars — leaving typically thrifty Woo retailers who also need automated Cash Reconciliation functionality somewhere out to the left of Siberia.

So if I can develop a WP plug-in that does a decent job of transaction matching and offers some flashy temporal analytics, then I think StoreBoss might be of some interest to the Woo customer base.

And even if not, I will still use it for my own Woo e-commerce store, and save myself a lot of tedious checking every month for the state taxes, and at year’s end when it’s time for the Federal.



stripe payments file
Structure of sb_etrans table – it mirrors the redundant Stripe payment output file almost exactly while adding a Created_localtime descriptor.

Time is a central concept everywhere — in e-commerce, as it is in life. Some data architects have even argued that non-temporal data is a “trivial extension” of temporal data.

Knowing how long a transaction takes or fails to complete its entire life cycle can produce insights into the order flow pipeline, as well as facilitate reconciliation.

Similarly, knowing what time of day, or what day of the week, or what month of the year that customers shop most, can give store owners the ability to discern purchasing patterns that would otherwise go unnoticed.

Unfortunately, obtaining normalized, temporal information is not always the easiest thing to do.

A credit card processor may use UTC time to indicate when a transaction took place. UTC time is convenient in terms of data processing, but it does very little to enhance a merchant’s understanding of localized temporal shopping trends.

It would be simplistic to say that a one statement solution would be sufficient to convert everything to normalize.  An example of a naive temporal query is as follows:

select UTC_TIMESTAMP() - now();

A WC-plugin programmer has to infer a merchant’s actual timezone if he or she does not know in advance where the application will run.

Algorithms that take into account a global merchant’s entire temporal ecosystem can get quite complex.

Such a merchant may have customers located in any time zone around the world, or have a geo-distributed platform, in which time zones support daylight savings times vary.

There is also always the possibility — however remote, but a consideration that a completist developer would take into account — that an operating system may be set to one-time zone, the RDBM(S) running under it to another (which can be further complicated by distributed master-slave server deployments), and the runtime programming language (such as PHP) is defaulting to yet another TZ setting.

It is therefore obvious that some means must be found to determine events in equivalent time: for example, the local (to the customer) time that a customer purchased an item, and the local (to the merchant) time that funds became available in their treasury account(s).

My goal in this post is to create a simple function CONVERT_UTC_2_LOCALT that converts a UTC time value that is found in Stripe’s payment file to a localized one.

Here are the (ideal) requirements:

1. It must convert UTC time for the current year (at time of coding, this is 2019)
2. It must also convert UTC time for the previous year
3. The conversion must work irrespective of an end user’s local timezone
4. Conversion must take into account DST (in the US, and, after testing, in Europe and perhaps elsewhere)
5. If UTC values are outside this range, then the local time bucket is set to being identical to UTC
6. The entire function must be written in SQL
7. Must be tested against a sufficiently rich data set for performance evaluation

Here is a read-only, key | value table that is useful as a fast lookup of hardcoded data values.

CREATE TABLE sb_temporality (
sb_meta_key VARCHAR(255) NOT NULL,
sb_event_marker_UTC DATE NULL,
sb_ordinal_date_UTC SMALLINT NULL, -- aka "Julian date"
PRIMARY KEY ( sb_meta_id ) );

CREATE UNIQUE INDEX sb_tempindx ON sb_temporality(sb_meta_key);

Here’s the data contained in this “helper” table.

// 2019 DST rows

INSERT sb_temporality
'2019:03:10 02:00:00',

INSERT sb_temporality
'2019:11:03 02:00:00',

// 2018 DST rows

INSERT sb_temporality
'2018:03:11 02:00:00',

INSERT sb_temporality
'2018:11:04 02:00:00',
storeboss table
The extensible “temporality” table – does it look familiar?

Note that rows can easily be added to support additional time zones, or other sorts of calculations that need quick lookup values; for example, to support domains that require, say, decimal values or other data types.

Here’s the stored function.



/* variables */

DECLARE now, maintenant,
local_t DATETIME;

DECLARE curr_start_dst,
prev_end_dst SMALLINT;

/* initialization */

SET now = now();
SELECT sb_ordinal_date into curr_start_dst
from sb_temporality where sb_meta_key = '2019_start_dst_us';
SELECT sb_ordinal_date into curr_end_dst
from sb_temporality where sb_meta_key =  '2019_end_dst_us';
SELECT sb_ordinal_date into prev_start_dst
from sb_temporality where sb_meta_key= '2019_start_dst_us';
SELECT sb_ordinal_date into prev_end_dst
from sb_temporality where sb_meta_key = '2019_end_dst_us';

/* mainline */

-- current year, within DST interval

IF ( YEAR(Created_UTC) = YEAR(now) ) 	
AND ( DAYOFYEAR(Created_UTC) BETWEEN curr_start_dst AND curr_end_dst )
THEN SELECT (CONVERT_TZ(Created_UTC,'+00:00', LEFT(timediff( now(), utc_timestamp() ), 6)) )
into maintenant;

-- current year, outside DST interval

IF ( YEAR(Created_UTC) = YEAR(now) ) 	
AND ( DAYOFYEAR(Created_UTC) NOT BETWEEN curr_start_dst AND curr_end_dst )
THEN SELECT (CONVERT_TZ(Created_UTC,'+00:00', LEFT(timediff( now(), utc_timestamp() ), 6))  - INTERVAL 1 HOUR )
into maintenant;

-- previous year, within DST interval

IF ( YEAR(Created_UTC) = (YEAR(now) + INTERVAL -1 YEAR ) ) ) 	
AND ( DAYOFYEAR(Created_UTC) BETWEEN prev_start_dst AND prev_end_dst )
THEN SELECT (CONVERT_TZ(Created_UTC,'+00:00', LEFT(timediff( now(), utc_timestamp() ), 6)) )
into maintenant;

-- previous year, outside DST interval

IF ( YEAR(Created_UTC) = (YEAR(now) + INTERVAL -1 YEAR ) ) 	
AND ( DAYOFYEAR(Created_UTC) NOT BETWEEN prev_start_dst AND prev_end_dst )
THEN SELECT (CONVERT_TZ(Created_UTC,'+00:00', LEFT(timediff( now(), utc_timestamp() ), 6)) - INTERVAL 1 HOUR)
into maintenant;

RETURN maintenant;


Now all you have to do is run the following query.

UPDATE sb_etrans
SET Created_localtime = 

In this implementation, CONVERT_UTC_2_LOCALT(datetime) handles DST and non-DST intervals for “-0500” UTC offsets (which includes EST). It performs temporal normalization on transactions that were generated during 2018-19. Thanks to the sb_temporality table, it’s a straightforward exercise to extend the TZ conversion to other countries or date ranges.

Here’s the before and after.

CONVERT_UTC_2_LOCALT(datetime) greatly simplifies normalizing UTC, and can easily be added to a script that loads Stripe data to the appropriate StoreBoss table on the WP server.


Jumping Ahead – Handling Daylight Savings Time

StopreBoss Data Flows
Figure 1: StoreBoss Data Flows

StoreBoss™ relies on three important temporal data sources:

  1. the WooCommerce (WC) DB;
  2. Stripe transactional data; and
  3. a WC store owner’s bank Statement of Account.

StoreBoss performs a “tick-and-tie” matching operation between these sources of transactional data. Discrepancies may arise due to differences in time settings.*

DST adds a further complication, by causing transactional time to be shifted on recurring intervals, depending on which TZ a store owner has set for his or her WP installation.

Per Stripe’s documentation,



All payments and payouts are processed according to UTC time. As a result, the processed date may not be the same as your local time zone.


A desirable solution is to normalize the Stripe charge date by extending MariaDB (or MySQL) with a stored function that automatically and consistently performs a correction in all use cases.

By applying this function to produce a new temporal value (which is persisted in a custom descriptor during file loads), StoreBoss can calculate normalized lag times between the various servers that produce transactional data.

Thus, StoreBoss will be able to more accurately verify a store owner’s Cash Balance Ledger position.

Let’s get into some technical details.

Assume you have downloaded from Stripe the previous day’s transactions, and manually load it to a work table whose structure matches the file’s.  While creating the necessary DDL, you might notice that a field named “Created(UTC)” is used by Stripe to indicate when news of the transaction was conveyed to your bank.

After uploading the CSV file (which happens to contain 1 record) to a work table on Maria whose structure matches this file, you decide to create a view that contains a column that shows the time of the event in local units. The code for such a view that produces the local time translation for any global time zone would look something like this.

Created_UTC AS 'CreateD UTC',
CONVERT_TZ(Created_UTC,'+00:00', LEFT(timediff( now(), utc_timestamp() ), 6) )
AS 'Event instance',
FROM sb_etrans;

As you can see below, this code does its job perfectly well — except for two fatal flaws… it does not check to see if a transaction falls within the DST interval for your timezone, whose start and end date (expressed as a number, not as a weekday) changes yearly.

For example,  the second Sunday of March in 2019 was March 10, and the first Sunday of November will be the 3rd. In 2018, these dates were March 11th, and 4th.

storeboss utc testing


To facilitate its use, the modified code could be encapsulated, as follows:**

Created_UTC AS 'CreateD UTC',
AS 'Event instance',
FROM sb_etrans;

We shall be exploring in some depth CONVERT_UTC_2_LOCALT.

In future posts, I will explore Stripe’s Sigma product, which allows WC store owners) to perform extensive data analysis.

Sigma may be a preferred way for SQL-savvy end users to obtain needed data from Stripe — without having to go through Stripe’s REST API and JSON file decoding and OAuth authorizations.

To be sure, Sigma is not free; to many SMB WC storeowners, its price point — particularly in view of Sigma’s significant data lag — may not be worth $500 a year.

Moreover, the hurdle of its SQL interface can be quite intimidating to a nontechnical end-user.

Having said all that, let’s write the UTC conversion function for our manually-downloaded CSV STRIPE payments file.

*UTC has been in the news of late!

**Realistically you would only have to account for the last 2 years of transactions, on a rolling basis, as this is typically a banking archive limit

The TransEvents Table

storeboss table

Much of the ensuing discussion will be about this table.  Please click here to see its the early beta version its DDL and load file DML.

There will be significant changes to this data object in the weeks ahead.  Stayed tuned for more info, as I further progress in the development of StoreBoss.

Please note that I am targeting MySQL on the Server, which means that much of the advanced temporal discussion on this blog will not apply directly to StoreBoss.

Also, most hosting providers do not allow users to save SQL output to a custom file location — unless they are paying for some super expensive plan.  With this in mind, I’m developing StoreBoss as a WP plugin that populates data to this (and perhaps other) custom tables. I will try to do most if not all of the data prep using SQL, given that the row counts are likely to be small enough that StoreBoss will not significantly impact production.

The data it contains could be downloaded, if necessary, for further processing by an end-user.

WC Orders – more FSM analysis

wp erd
WP 4.4.2 schema

If you take a look at the entity-relationship diagram for WP’s original core tables (see above), which comprised a baker’s dozen or so tables, you might get the impression that RI constraints are being enforced via an underlying database engine.

In reality, most of WP/WC’s RI constraints are maintained through aplication code.

This can easily be verified by querying the MariaDB information schema catalog, as shown here.

Why was native RDBMS RI not implemented by WP?

According to Mike Little (one of the original developers of WordPress), it was an accident of history.

The initial DB schema for WP was created by Michel Valdrighi in the early 2000s. Valdrighi was the author of B2/Cafelog, from which WP was forked.

Per Little, either MySQL’s support for referential integrity was lacking back then, or, the database drivers for PHP at the time did not support RI.

The net result is that Woocommerce — an OLTP plugin that extends some core WordPress tables, as well as adds another half dozen or so tables to WP’s original schema — ended up being relational in name only. This despite being a domain that’s ideally suited for the kind of normalized schema implementation that you will find in any database 101 textbook.

In reality, WC product and transactional persistence is achieved via a hard-to-understand, key-value scheme that scales poorly and contains serialized columns that make porting or migrating this database less straightforward than might otherwise be desired.

We will now write a query (Q1) that materializes temporal order-related information.

In the example below, Q1 results in data pertaining to Order 117.

p.ID AS "Trans ID",
p.post_date AS "Trans Date",
p.post_modified AS "Trans Mod Date",
c.comment_content AS "Trans Status", 
c.comment_date AS "Status Date",
x1.meta_value AS "Order Key",
x2.meta_value AS "Currency",
o.meta_value AS "Order Total",
x3.meta_value AS "Completed Date",
x4.meta_value AS "Paid Date"

FROM wp_posts p

LEFT JOIN wp_comments c ON
(p.post_type = "shop_order"
AND p.ID = comment_post_ID 
AND c.comment_type = "order_note")

INNER JOIN wp_postmeta o ON 
(o.`meta_key`= "_order_total"
AND p.ID = o.post_id)

INNER JOIN wp_postmeta x1 ON 
(x1.`meta_key`= "_order_key"
AND p.ID = x1.post_id)

INNER JOIN wp_postmeta x2 ON 
(x2.`meta_key`= "_order_currency"
AND p.ID = x2.post_id)

INNER JOIN wp_postmeta x3 ON 
(x3.`meta_key`= "_completed_date"
AND p.ID = x3.post_id)

INNER JOIN wp_postmeta x4 ON 
(x4.`meta_key`= "_paid_date"
AND p.ID = x4.post_id)

WHERE p.ID = 117
ORDER BY p.post_date;


Q1 produces the following result.

wc data

We now have a starting point for analyzing temporal data with respect to state changes that a WC order can go through.

The FSM state transitions for the WC Orders domain  can be listed as follows:

OS0 unknown (custom state extension)
OS1 pending payment
OS2 processing
OS3 on hold
OS3 completed
OS4 canceled
OS5 refunded
OS6 failed
OS7 manager approved (custom state extension)

In our next few posts, we will further analyze the results produced by a variant of Q1, using a more comprehensive example.

We’ll also inspect a few additional tables — such as the notorious wp_options tables (home of ballooning transient data sets) as well as the quasi indecipherable woocommerce_order_items and woocommerce_order_itemmeta tables.

To anyone who follows Woo or is a Woo customer, it will not come as a revelation that Woo can quickly become cumbersome, complex, and slow.  As many respected observers have already noted, the WP/Woo schema is not particularly well-suited for e-commerce.

With this in mind, products such as Glew have been developed and are available for major e-commerce retailers. But they are expensive.  Still, it does not cost anything to learn from their YouTube tutorials.

Speaking of expensive, Liquid Web offers a pricey WooCommerce managed solution; we will eventually be taking a look at their custom orders plug-in, which is available for free on Github.

Our goal is to try to devise an open-source (ie, affordable!), best-practices way of producing sophisticated order analytics on a highly secure platform that does not consume costly hosting CPU cycles — namely, the desktop.

But first, we will have some fun in serverland, and take a look at the temporal semantics of WC orders, and how MariadDB 10.3+ might be used to extend these such that the average non-technical store owner can better, as the StoreBoss™ tagline goes, mind his or her business.


Woo Orders – Part 1

One of the main ideas behind my StoreBoss™ project is explore in different, interesting and useful ways how to provide a store owner with the ability to keep forensic track of Woo orders coming in over the digital transom.

Unfortunately, Woo reports tend to be skimpy in that regard, and often do not have the sort of details you might be interested in knowing.  Also, WooCommerce is not really ideally suited for ad hoc reporting or data analysis on the fly.

It always possible to write your own reports of course, but that means humping out the WP way of doing things, which entails learning PHP and mastering an elaborate gauntlet of hurdles that stand between you and your data.

It would be nice to simply be able to query the WP database directly, without having to go through lumbering and slow “REST APIs”, when all you want maybe is to quickly see what orders are coming in, how long it’s taking to ship product out the door, how long is it taking before you get paid, and determine precisely the haircut imposed by third-party payment processors like STRIPE.  A store owner might also wish to verify that no sales records are being manipulated by inhouse staff.

It’s unreasonable to expect the small B&M store owner to have the time or technical knowledge or personal inclination to master the contortions required to query the WP/WOO database schema.

However, if one is interested in stripped down, just the facts information about, say, Orders, it is not really that hard to write an SQL query that does the job by directly querying the database.

Have you ever wondered how you can verify that the data you see in Woo reports (or the various third party Woo plugins) is accurate?  Wouldn’t it be nice to be able to verify their accuracy for yourself?

Moreover, have you ever wanted to be able to look at a simple, up-to date report and see in a instant your cash flow position at any given moment?

That is what I am going to attempt to build in StoreBoss™. Please note that the ideas described here are somewhat experimental, and rely on MariaDB 10.3 or above as the WP Storage Engine.  Your WP hosting provider may require a dedicated server for such a configuration. A StoreBoss™ plug-in would of course eliminate this dependency.

Please note, also, the WP in general frowns on directly querying its database.  Since we are not touching any Woo or WP data directly (other than via the SELECT statement), we can safely ignore this directive — while recognizing that any query run against the DB will incur a performance cost.

Our goal therefore will be to create a staging  table for StoreBoss in the WP database, then download this data to the user’s machine where it can be stored in SQLite and analyzed using a Python application.  SQLite is a robust, single user DB engine does not require any technical knowledge to configure, and is free.

In this manner, security and privacy issues will be dealt with in a strict fashion: Woo and other data (such as Bank Statement information, which we will get to later on) will be controlled on the end user’s machine, and thus be far less susceptible to SQL injection and other hacker annoyances and malicious attacks that must be taken into account by WP and Woo in the cloud.

This will free StoreBoss™ from having to rely on tediously baroque and ineficient methods of calling the inordinately complex underlying Woo DB (using PDO etc) out of a constant paranoia of being hacked, a situation that also hamstrings the ability of the end user to query data on an ad hoc basis.

With all the needed data (which can be encrypted, if need be) updated on his or her private desktop on notebook machine, a store owner will NOT risk burdening the WOO prod env with an undue amount of extra CPU cycles that may exceed some hosting provider threshhold or slow down catalog displaying or transactional order processing.

Let’s get started with a simple example.

I’ll begin with 1 order that I will enter manually in a clean copy of WP/Woo that I’ve installed on my desktop.  You can set up something like this too, without too much trouble — in my case, I’m running XAMPP on a refurbished 10 year old HP machine running 64-bit Windows 8.1.  So there is nothing fancy whatsoever about my setup.  Store owners of course would not have to do this; my setup is for dev purposes only.

Here is what my Woo Order panel looked like after I manually enered my first order.

woo panel

To verify the accuracy of this data, I queried the underlying database using PhPMyAdmin.  This is the query I wrote to retrieve the basic information I need to know about this order.

p.`ID` AS `Trans ID`,
p.`post_status` AS `Trans Status`,
p.`post_date` AS `Trans Date`,
o.`meta_value` AS `Trans Amt`
FROM  `wp_posts` p
INNER JOIN  `wp_postmeta` o ON (
o.`meta_key`= "_order_total")
`post_type` = "shop_order"
p.`ID` = `post_id`;

This query produced a single row of Order data that’s in a Pending payment state.

I then changed the state of the Order to Processing – which meant that Payment was received (of course no actual money had changed hand, as this was just a test).

The Woo Orders panel now looked like this.

Notice how there was now a highighted “1” in a small red circle now in the WP Admin Panel.

I again used my query, which gave me the following result.

Note how there was still only one order row?  It had changed state (check out the Trans Status column), but did not provide an audit trail that I could query as to how much time had elapsed between when the order was received and when the money had “come in.” (We will go with Woo’s loose concept of being in receipt of funds, for now.)

Luckily, I had previously created a custom table in the WP db, and written out the first Order row to it.  Here is the DDL I used to create the table (the sb_ prefix stands for StoreBoss™, which will help reduce the chances of database-object naming conflicts in the WP/Woo persistence engine and its accompanying plug-in eco system.

`Trans_ID` bigint(20) NOT NULL,
`Trans_status` varchar(20) NOT NULL,
`Trans_date` datetime NOT NULL,
`Trans_amt` decimal(13,2) NOT NULL,
PRIMARY KEY (`Trans_ID`, `Trans_status`) )

Note the Engine I’ve chosen: ISAM is faster for DSS or Analytical processing, which is what we will be doing here.

I then updated my table with the order row in this current state, using the following DML.

INSERT INTO `sb_trans_1` (`Trans_ID`, 
p.`ID` AS `Trans ID`,
p.`post_status` AS `Trans Status`,
p.`post_date` AS `Trans Date`,
o.`meta_value` AS `Trans Amt`
FROM `wp_posts` p
INNER JOIN `wp_postmeta` o ON (
o.`meta_key`= "_order_total")
`post_type` = "shop_order"
p.`ID` = `post_id`;

Now I had 2 rows of order data, but unfortunately I had a small problem:  since the transaction date had not changed (which of course is correct), I had no way of knowing how much time had elapsed between these two events.

To remedy the situation, I added a timestamp to my table, using the following code.

ALTER TABLE `sb_trans_1` 
ADD `Trans_timestamp` TIMESTAMP NULL DEFAULT NULL AFTER `Trans_amt`;

Now I actually had temporal state information pertaining to the changes in status that my order had gone through,  I now marked the Order as paid, and updated my custom table.  It now looked like this.

Note the Trans_timestamp column on the far right. Now I finally have a way of adding valuable audit trail data to my Store Management toolkit. The best part is that all this ETL scrubbing can be automated and displayed on an unclutterd PHP-driven report that might look something like this.

storeboss prototype
Early working Storeboss responsive prototype that uses PHP and Stored Procedures to display transaction data

Woo: Relational in Name Only

My idea for StoreBoss™ came out of running my wife’s Woocommerce estore for 5 years.

In case you didn’t already know this: Woo uses a hybrid quasi-relational / EVA data model. It is implemented on top of WP’s underlying, equally non-relational meta schema, which can run atop MySQL or MariaDB, typically using the InnoDB engine.

The WP data model was designed initially to provide data persistence to an early CMS system. It allowed the addition of n-number of attributes to a post entity type, as needs evolved, and without knowing in advance what these might be.

At its core,  the WP data model consists of rows  in a centrally important table that can be joined via a SELECT statement to a vertically structured (i.e., “thin”) table which specifies whatever attributes (instantiated as rows) WP would need, as time went by, for its post and, later, page types.

This has MAJOR performance implications.

Also, partly because of WP’s flexible,”meta” structure, Woo does not make “normal” usage of Primary and Foreign relationships — business rules are implemented via PHP code. This leads to issues — with WP and Woo’s feverish update schedule a sign of these.

Moreover, Woo uses serialization, which makes some of its content non portable and undoubtedly violates one of the relational model’s 12 rules, though you have me as to which one that might be.