The TransEvents Table

storeboss table

Much of the ensuing discussion will be about this table.  Please click here to see the early beta version of its DDL and load-file DML.

There will be significant changes to this data object in the weeks ahead.  Stay tuned for more info as development of StoreBoss progresses.

Please note that I am targeting MySQL on the server side, which means that much of the advanced temporal discussion on this blog will not apply directly to StoreBoss.

Also, most hosting providers do not allow users to save SQL output to a custom file location — unless they are paying for some super expensive plan.  With this in mind, I’m developing StoreBoss as a WP plugin that populates this (and perhaps other) custom tables. I will try to do most if not all of the data prep in SQL, given that the row counts are likely to be small enough that StoreBoss will not significantly impact production.

The data it contains could be downloaded, if necessary, for further processing by an end-user.


Unlocking WC Orders


To regular store owners, the WC REST API is intimidating and, in practice, useless — a roadblock that offers no simple way to run the customized, ad hoc queries they might wish to run.

To SQL-savvy data analysts who prefer straight-up relational declarations (ones that are not chopped into little pieces by SQL-injection-driven paranoia), retrieving sales info through the REST API via some procedural intermediate language is an inelegant impedance-mismatch hack.  It is a baroque and oppressive apparatus, particularly when an app does not expose the RDBMS to the outside world and is only meant for in-house consumption by trusted users who can be granted various levels of authorization.

Why not just get at the data directly?

Q1 and its result set, run using phpMyAdmin

After testing various forms of Q1 (the query I’ve been playing around with during the last few posts), I arrived at a version that provides a succinct way of presenting Order-related information.

SELECT
    p.ID AS "Trans ID",
    p.post_date AS "Trans Datetime",
    c.comment_content AS "Event Status",
    c.comment_date AS "Event Datetime",
    o.meta_value AS "Order Total"
FROM
    wp_posts p
-- Order notes: one row per state-change event, minus inventory noise
LEFT JOIN wp_comments c ON
    (
        p.post_type = "shop_order"
        AND p.ID = c.comment_post_ID
        AND c.comment_type = "order_note"
        AND c.comment_content NOT LIKE "Item%"
        AND c.comment_content NOT LIKE "Stock%"
    )
-- The order total lives in the wp_postmeta EAV table
INNER JOIN wp_postmeta o ON
    (
        o.`meta_key` = "_order_total" AND p.ID = o.post_id
    )
-- Restrict to orders that have a completion date
INNER JOIN wp_postmeta x3 ON
    (
        x3.`meta_key` = "_completed_date" AND p.ID = x3.post_id
    )
ORDER BY p.post_date;

As shown up top, Q1 produces a result set that lists, in ascending date order, the predetermined state transitions that every Order (aka transaction) goes through in the WC database.

I have added (relative to previous incarnations of this query) two NOT LIKE predicates to eliminate superfluous events from the Order’s event-driven timeline.

This does not completely filter out all of the non-payment-related events (such as inventory changes) that are recorded in the wp_comments table.

The format of these comments has changed over the years, and it would be inefficient to add enough string processing to Q1 to account for every variation; besides, WC seems to have stabilized how it persists inventory-change information as of 2019, so this should not be much of a problem going forward.

Q1 results can be saved to a local file on the server (with a slight modification to the code, sketched after the list below), then loaded into a custom table that shows:

  1. when a Sale was recorded in WC;
  2. when STRIPE (which is the payment processor I am using for test purposes) processed the CC payment, and,
  3. when the order itself was marked as completed (indicating that it was fulfilled by the merchant).
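Here is a minimal sketch of that save-and-load step, MariaDB-style.  The file path, the sb_trans_events staging table, and its column names are purely illustrative (this is not the final StoreBoss DDL), and writing a server-side file assumes the secure_file_priv setting allows it (which, as noted earlier, many hosts do not).

-- 1) Dump Q1's result set to a CSV file on the server by appending an
--    INTO OUTFILE clause to Q1 after its ORDER BY (the rest of Q1 is unchanged):
ORDER BY p.post_date
INTO OUTFILE '/tmp/q1_events.csv'
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    LINES TERMINATED BY '\n';

-- 2) Bulk-load the file into the staging table:
LOAD DATA INFILE '/tmp/q1_events.csv'
INTO TABLE sb_trans_events
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(Trans_ID, Trans_datetime, Event_status, Event_datetime, Order_total);

LOAD DATA is much faster than row-by-row inserts; that matters little at these row counts, but it is a nice habit for the scheduled load discussed further down.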

As I’ll demo in the next post, this custom Orders table can then be queried by an end-user to produce temporal analytics.
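To give a flavor of what I mean, here is a hedged example against the illustrative sb_trans_events staging table sketched above: the elapsed time between when each order was recorded and each subsequent event.

SELECT
    Trans_ID,
    Event_status,
    TIMESTAMPDIFF(MINUTE, Trans_datetime, Event_datetime) AS Minutes_elapsed
FROM sb_trans_events
ORDER BY Trans_ID, Event_datetime;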

Q1 also returns other information of interest, such as when a card was declined, or when a customer was sent a confirmation email.

I tested the query in a production WC environment with thousands of records in the wp_posts table.  It works correctly, and is also quite fast — using well under a second of CPU time.

Here is a real-world result set. I ran Q1 against a production WC site that I manage. We can see a Stripe Charge event being recorded against a $503 sale that took place this past Sunday (8-11-19). Note that the transaction time is captured down to the second.

However, I have not stress-tested the query in a huge database with millions of orders.

Any such merchant will have an in-house IT department that performs these sorts of backend tasks.  The idea with StoreBoss (which is the code name of the plug-in app I’m building) is to provide DSS functionality on a Windows or Mac desktop, using SQLite and Python, from data that is derived (or “unlocked,” if you will) from the WC database — hence the catchy Widgeteria tag line, “AI for the little guy.” (‘Guy’ here being generic, not gender-specific.)

Armed with this simple query, and a vanilla script front end that sits in the WP admin panel, store owners will have an additional way of downloading their order data to the desktop (in addition, that is, to whatever downloadable Orders data WC offers, free or paid).

But even within the WP admin panel environment itself, the data is presented so as to be easily readable.  It also permits straightforward ad hoc querying of the temporal state changes that an Order goes through in WC — without causing locks on the WC OLTP environment.

How cool is that?!

I manually entered this order using the WC admin order panel. I used the sample product db provided by WC; unfortunately, WC does not provide a simple way to upload a testbed transaction db. Testing has to be done manually unless you are willing to try your hand at this. Btw, note that transaction seconds are not displayed in the WC Order panel.

In production, the query will be implemented as a parameterized stored procedure, one that accepts sanitized date ranges.
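A minimal sketch of what that stored procedure might look like is below.  The procedure name and parameter handling are illustrative rather than the final StoreBoss code; the body is simply Q1 with a date-range predicate added.

DELIMITER //
CREATE PROCEDURE sb_get_trans_events(IN p_from DATETIME, IN p_to DATETIME)
BEGIN
    SELECT
        p.ID AS "Trans ID",
        p.post_date AS "Trans Datetime",
        c.comment_content AS "Event Status",
        c.comment_date AS "Event Datetime",
        o.meta_value AS "Order Total"
    FROM wp_posts p
    LEFT JOIN wp_comments c ON
        (p.post_type = "shop_order" AND p.ID = c.comment_post_ID
         AND c.comment_type = "order_note"
         AND c.comment_content NOT LIKE "Item%"
         AND c.comment_content NOT LIKE "Stock%")
    INNER JOIN wp_postmeta o ON
        (o.`meta_key` = "_order_total" AND p.ID = o.post_id)
    INNER JOIN wp_postmeta x3 ON
        (x3.`meta_key` = "_completed_date" AND p.ID = x3.post_id)
    WHERE p.post_date >= p_from AND p.post_date < p_to
    ORDER BY p.post_date;
END //
DELIMITER ;

-- Example call: all of Sunday 8-11-19
CALL sb_get_trans_events('2019-08-11 00:00:00', '2019-08-12 00:00:00');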

It should also be possible to run this query as a MariaDB event (like a cron file), one that runs, say, at midnight + 1 min every 24 hours, in order to produce a running tally of the previous day’s order processing, which could be then downloaded to the desktop for DSS or tick-and-tie processing.
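Here is a hedged sketch of such an event.  The event name, the target staging table, and the exact date predicate are all illustrative, and the MariaDB event scheduler must be enabled for it to fire at all.

-- The event scheduler must be switched on first:
SET GLOBAL event_scheduler = ON;

CREATE EVENT IF NOT EXISTS sb_daily_trans_tally
ON SCHEDULE EVERY 1 DAY
STARTS CURRENT_DATE + INTERVAL 1 DAY + INTERVAL 1 MINUTE  -- i.e., midnight + 1 min
DO
    INSERT INTO sb_trans_events
        (Trans_ID, Trans_datetime, Event_status, Event_datetime, Order_total)
    SELECT p.ID, p.post_date, c.comment_content, c.comment_date, o.meta_value
    FROM wp_posts p
    JOIN wp_comments c ON
        (p.post_type = "shop_order" AND p.ID = c.comment_post_ID
         AND c.comment_type = "order_note")
    JOIN wp_postmeta o ON
        (o.meta_key = "_order_total" AND p.ID = o.post_id)
    WHERE c.comment_date >= CURRENT_DATE - INTERVAL 1 DAY
      AND c.comment_date < CURRENT_DATE;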

It seems that the WC Order analytics report does not completely love Orders that are manually entered!

But for now, I am still experimenting with how to extract the simplest, cleanest data set possible from the WP database without having to deal with the complex and resource-consuming WooCommerce REST API.

Q1 seems to be a good start.

When doing temporal analytics, make sure you first set WP to the correct time zone. In this case, I set it to EST. This will save a lot of headaches later on, when we perform some basic date math. The subject of temporal databases, particularly distributed ones, is vast; I will touch upon relevant aspects in various posts to come.

We are not too far away now from figuring out how to match STRIPE payment data to the receipt of actual funds (some unit of time later, minus any haircut) in a real bank account for honest-to-goodness real-world transaction verification!

 

Woo Orders – Part 2


WooCommerce relies on a two-table schema — one fat, one skinny (wp_posts and wp_postmeta) — as the primary method for constructing Order data.  A variety of other tables provide auxiliary information in flight.

It is difficult for a non-database professional to extract data from this sink.

Woo does not expect its customers to know how to write PHP code.  In Woo’s view, this is the job of specialists, sort of like old-school IT departments — i.e., highly trained developers who are versed in the “technology stack” that is the hallmark of modern Web development.

Woo expects these specialists to use an interface called a “REST API” to extract or manipulate data that persists in the Woo/WP datastore.

Absent is the alluring promise of relational databases — with their declarative, somewhat verbose, set-based programming sublanguages that, in theory, should allow a programmer to be largely unconcerned with how structured data is saved to disk or retrieved.

Thus, a non-technical Woo store owner cannot easily query the Woo database directly in an ad hoc fashion; for example, to independently verify and/or audit the validity of Woo’s assertions as to what it holds to be relational facts.

Specialized, rather arcane knowledge is required.

In my previous post, we saw how to extract a subset of Order data, particularly if we are interested in performing a form of applied ELH (entity life history) analysis on a WooCommerce Order — in other words, how does a Woo order get born, live, then die?

A simple query with only one inner join was needed to populate (insert) a MyISAM DSS table with Order transaction rows.

We also saw how an Order passes through multiple states.

Woo maintains a log of these states in a table called wp_comments.  Rows with the field comment_type equal to ‘order_note’ trace each step along the way.
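For example, pulling the raw log for a single order is a one-liner (order ID 117 is just an illustrative value):

SELECT comment_date, comment_content
FROM wp_comments
WHERE comment_type = 'order_note'
  AND comment_post_ID = 117
ORDER BY comment_date;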

That is a valuable table, a crucial one, in fact, for the purpose of creating a temporal ELH table.

A common way to capture temporal info is to declare a point-based data model, which is what we shall do with a table called trans_1.

The role of trans_1 is to:

  1.  Provide information relating to Order-level state transitions
  2. Create an efficient means to download or extract Order data from Woo without having to deal with its REST API
  3. Serve as an independent accounting-level verification for the data that is provided by Woo’s reports

Let’s assume for a moment, as a learning exercise, that there is no wp_comments table.  This will help us better understand the Woo Schema and how it uses WP’s underlying tables to track Orders data.

In such a case, how could we populate trans_1?

Leaving aside performance considerations, a trigger might be an attractive option for automatically populating our custom Trans (i.e., Order) table. This, in fact, is one of the traditional data-processing solutions to this classic problem; another is having an app write twice: once to the TRANS table in question, and once to an associated historical, timestamped “versioning” table.  These normally feature rolling time-period views (such as yearly, semi-annual, quarterly, monthly, weekly, and daily).

The trigger approach would eliminate the need to write a cron job that periodically runs the INSERT query we saw in the previous post. Polling raises awkward questions: how frequently should the job poll the wp_posts table, and how do we ensure that state transitions are not missed?

To sidestep this problem, triggers may offer a low-code solution.

I will try to demonstrate how this might work with an INSERT trigger, MariaDB-style.  Essentially, the trigger populates Table B every time a row is inserted into Table A.

(In our real-world Order example, Table B would also need an UPDATE trigger to retrieve order-level Transaction Amt data from the auxiliary EAV table that wp_posts uses to construct virtual Order tables on the fly. This sucks, but there you have it.)
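For what it’s worth, here is a hedged sketch of what that companion trigger might look like.  The trigger name is illustrative, and it assumes the order total arrives as an insert into wp_postmeta with meta_key '_order_total'; depending on how Woo actually writes that row, an UPDATE trigger on wp_postmeta might be needed instead.

DELIMITER //
CREATE TRIGGER upd_sb_trans_amt
AFTER INSERT ON wp_postmeta
FOR EACH ROW
BEGIN
    -- When the order total arrives, copy it into our custom Trans table
    IF NEW.meta_key = '_order_total' THEN
        UPDATE sb_trans_1
        SET Trans_amt = NEW.meta_value
        WHERE Trans_ID = NEW.post_id;
    END IF;
END //
DELIMITER ;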

Here goes.

First, I will create a table in phpMyAdmin that replicates or mirrors the structure of the sb_trans_1 table that we saw in the previous post.  I will call this table sb_trans_2.

CREATE TABLE IF NOT EXISTS `sb_trans_2`
(
    `Trans_ID` bigint(20) NOT NULL,
    `Trans_status` varchar(20) NOT NULL,
    `Trans_date` datetime NOT NULL,
    `Trans_amt` decimal(13,2) NOT NULL,
    CONSTRAINT `Trans_pk`
        PRIMARY KEY (`Trans_ID`, `Trans_status`)
)
ENGINE = 'MyISAM';

Here it is, in my local Sandbox MariaDB database.

Next, I will attach a trigger to sb_trans_1.  For demo purposes, I will use an artificial firing condition — that the Trans ID exceeds the number of an Order I manually created in the database.  In a real-world implementation, the test condition would, of course, be a `post_type` = "shop_order" column evaluation on the wp_posts table.

Here’s the trigger:

DELIMITER //
CREATE TRIGGER addrow_sb_trans_2
AFTER INSERT ON sb_trans_1
FOR EACH ROW
BEGIN
    -- Copy the new row into sb_trans_2, but only for orders past the test cutoff
    IF NEW.Trans_ID > 117 THEN
        INSERT INTO sb_trans_2 (Trans_ID, Trans_status, Trans_date, Trans_amt)
        VALUES (NEW.Trans_ID,
                NEW.Trans_status,
                NEW.Trans_date,
                NEW.Trans_amt);
    END IF;
END //
DELIMITER ;

Now let’s add an order to the trans_1 table, #118. We can use the Insert panel in phpMyAdmin to accomplish this task.  As you can see below, trans_2 is automatically updated.
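If you prefer SQL to the Insert panel, the equivalent statement is below (the status, date, and amount values are made up for the demo):

INSERT INTO sb_trans_1 (Trans_ID, Trans_status, Trans_date, Trans_amt)
VALUES (118, 'wc-processing', '2019-08-04 10:00:00', 19.99);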

Conclusion:  the trigger idea shows promise — maybe.

Let’s try it on wp_posts, just for laughs.  This is the Big Kahuna table in the WP/WOO database universe.  We add the following trigger.

DELIMITER //
CREATE TRIGGER addrow_sb_trans_1
AFTER INSERT ON wp_posts
FOR EACH ROW
BEGIN
    -- Mirror every new shop_order row into our custom Trans table
    IF NEW.post_type = "shop_order" THEN
        INSERT INTO sb_trans_1 (Trans_ID, Trans_status, Trans_date)
        VALUES (NEW.ID, NEW.post_status, NEW.post_date);
    END IF;
END //
DELIMITER ;

Now let’s try to add a row using WooCommerce -> Orders in the WP admin panel.

Disaster.

And here is some further unexpected behavior.

The naive moral of this exercise is: don’t use triggers on WP/WC tables.

Out of curiosity, I investigated what caused the Fatal Error.  It turns out that our seemingly innocuous trigger resulted in the invocation of code that fires off a call at Line 160 that crashes the plugin. Why this happened shall forever remain one of the mysteries of life.

brackets code

Btw, I also tried, just for kicks, attaching a trigger to the wp_wc_order_stats table, which is also populated with Orders data.

woo order stats table

This, too, didn’t work, but we might use this table down the road, as it provides a useful breakdown of the dollar-amount components that make up a total Order amount.
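For example, a peek at that breakdown for a single order might look like the query below.  The column names are the ones I see in recent WC versions and may vary from release to release; order 117 is again just an illustrative value.

SELECT order_id, status, num_items_sold,
       total_sales, tax_total, shipping_total, net_total
FROM wp_wc_order_stats
WHERE order_id = 117;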

In conclusion, it’s a good job that wp_comments exists.

But we’ve learned quite a bit about how Orders are handled internally in the WP database.  This will help us construct the SELECT queries that we will be needing.

In my next post, I will create a parameterized stored procedure that extracts and transforms Order data from the Woo tables mentioned above and loads the result into trans_1. Most likely I will write the result set to disk, then load the file into the table, as this is far faster than doing inserts, and I won’t have transaction overhead to deal with. Besides, trans_1 is a MyISAM table: it runs like greased lightning for DSS queries, less so for OLTP.

From a data architecture standpoint, trans_1 will provide an ad hoc means to populate StoreBoss™ with dynamic Order data on a temporal basis.

All I would then have to do is put together a simple PHP GUI WP plug-in that supports CSV exporting to a local machine, and we can go on from there.

Most likely the trans_1 SP would look something like this:

SELECT
    p.ID AS "Trans ID",
    p.post_date AS "Trans Date",
    c.comment_content AS "Trans Status",
    c.comment_date AS "Status Date",
    o.meta_value AS "Trans Amt"
FROM wp_posts p
LEFT JOIN wp_comments c ON
    (p.post_type = "shop_order"
     AND p.ID = c.comment_post_ID
     AND c.comment_type = "order_note")
INNER JOIN wp_postmeta o ON
    (o.`meta_key` = "_order_total"
     AND p.ID = o.post_id)
ORDER BY p.post_date;

which results in

sql result

Bingo.

Woo Orders – Part 1

One of the main ideas behind my StoreBoss™ project is to explore, in different, interesting, and useful ways, how to provide a store owner with the ability to keep forensic track of Woo orders coming in over the digital transom.

Unfortunately, Woo reports tend to be skimpy in that regard, and often do not have the sort of detail you might be interested in knowing.  Also, WooCommerce is not ideally suited for ad hoc reporting or data analysis on the fly.

It is always possible to write your own reports, of course, but that means following the WP way of doing things, which entails learning PHP and mastering an elaborate gauntlet of hurdles that stand between you and your data.

It would be nice simply to be able to query the WP database directly, without having to go through lumbering and slow “REST APIs,” when all you want is to quickly see what orders are coming in, how long it’s taking to ship product out the door, how long it takes before you get paid, and precisely what haircut third-party payment processors like STRIPE impose.  A store owner might also wish to verify that no sales records are being manipulated by in-house staff.

It’s unreasonable to expect the small B&M store owner to have the time or technical knowledge or personal inclination to master the contortions required to query the WP/WOO database schema.

However, if one is interested in stripped down, just the facts information about, say, Orders, it is not really that hard to write an SQL query that does the job by directly querying the database.

Have you ever wondered how you can verify that the data you see in Woo reports (or the various third party Woo plugins) is accurate?  Wouldn’t it be nice to be able to verify their accuracy for yourself?

Moreover, have you ever wanted to be able to look at a simple, up-to-date report and see in an instant your cash flow position at any given moment?

That is what I am going to attempt to build in StoreBoss™. Please note that the ideas described here are somewhat experimental, and rely on MariaDB 10.3 or above as the WP Storage Engine.  Your WP hosting provider may require a dedicated server for such a configuration. A StoreBoss™ plug-in would of course eliminate this dependency.

Please note, also, that WP in general frowns on directly querying its database.  Since we are not touching any Woo or WP data directly (other than via SELECT statements), we can safely ignore this directive — while recognizing that any query run against the DB will incur a performance cost.

Our goal therefore will be to create a staging table for StoreBoss in the WP database, then download this data to the user’s machine, where it can be stored in SQLite and analyzed using a Python application.  SQLite is a robust, single-user DB engine that does not require any technical knowledge to configure, and is free.

In this manner, security and privacy issues will be dealt with in a strict fashion: Woo and other data (such as Bank Statement information, which we will get to later on) will be controlled on the end user’s machine, and thus be far less susceptible to SQL injection and other hacker annoyances and malicious attacks that must be taken into account by WP and Woo in the cloud.

This will free StoreBoss™ from having to rely on tediously baroque and inefficient methods of calling the inordinately complex underlying Woo DB (using PDO, etc.) out of a constant paranoia of being hacked, a situation that also hamstrings the ability of the end user to query data on an ad hoc basis.

With all the needed data (which can be encrypted, if need be) updated on his or her private desktop or notebook machine, a store owner will NOT risk burdening the Woo prod env with an undue amount of extra CPU cycles that might exceed some hosting provider threshold or slow down catalog display or transactional order processing.

Let’s get started with a simple example.

I’ll begin with one order that I will enter manually in a clean copy of WP/Woo that I’ve installed on my desktop.  You can set up something like this too, without too much trouble — in my case, I’m running XAMPP on a refurbished 10-year-old HP machine running 64-bit Windows 8.1.  So there is nothing fancy whatsoever about my setup.  Store owners of course would not have to do this; my setup is for dev purposes only.

Here is what my Woo Order panel looked like after I manually entered my first order.

woo panel

To verify the accuracy of this data, I queried the underlying database using phpMyAdmin.  This is the query I wrote to retrieve the basic information I need to know about this order.

SELECT
    p.`ID` AS `Trans ID`,
    p.`post_status` AS `Trans Status`,
    p.`post_date` AS `Trans Date`,
    o.`meta_value` AS `Trans Amt`
FROM `wp_posts` p
INNER JOIN `wp_postmeta` o ON
    (o.`meta_key` = "_order_total")
WHERE
    p.`post_type` = "shop_order"
    AND p.`ID` = o.`post_id`;

This query produced a single row of Order data that’s in a Pending payment state.

I then changed the state of the Order to Processing, which means that payment was received (of course, no actual money had changed hands, as this was just a test).

The Woo Orders panel now looked like this.

Notice how there was now a highlighted “1” in a small red circle in the WP Admin Panel.

I again used my query, which gave me the following result.

Note how there was still only one order row.  It had changed state (check out the Trans Status column), but there was no audit trail I could query to see how much time had elapsed between when the order was received and when the money had “come in.” (We will go with Woo’s loose concept of being in receipt of funds, for now.)

Luckily, I had previously created a custom table in the WP db, and written out the first Order row to it.  Here is the DDL I used to create the table (the sb_ prefix stands for StoreBoss™, which will help reduce the chances of database-object naming conflicts in the WP/Woo persistence engine and its accompanying plug-in ecosystem).

CREATE TABLE IF NOT EXISTS `sb_trans_1`
(
    `Trans_ID` bigint(20) NOT NULL,
    `Trans_status` varchar(20) NOT NULL,
    `Trans_date` datetime NOT NULL,
    `Trans_amt` decimal(13,2) NOT NULL,
    CONSTRAINT `Trans_pk`
        PRIMARY KEY (`Trans_ID`, `Trans_status`)
)
ENGINE = 'MyISAM';

Note the Engine I’ve chosen: MyISAM is faster for DSS or analytical processing, which is what we will be doing here.

I then updated my table with the order row in this current state, using the following DML.

INSERT INTO `sb_trans_1` (`Trans_ID`,
    `Trans_status`,
    `Trans_date`,
    `Trans_amt`)
SELECT
    p.`ID` AS `Trans ID`,
    p.`post_status` AS `Trans Status`,
    p.`post_date` AS `Trans Date`,
    o.`meta_value` AS `Trans Amt`
FROM `wp_posts` p
INNER JOIN `wp_postmeta` o ON
    (o.`meta_key` = "_order_total")
WHERE
    p.`post_type` = "shop_order"
    AND p.`ID` = o.`post_id`;

Now I had 2 rows of order data, but unfortunately I had a small problem:  since the transaction date had not changed (which of course is correct), I had no way of knowing how much time had elapsed between these two events.

To remedy the situation, I added a timestamp to my table, using the following code.

ALTER TABLE `sb_trans_1` 
ADD `Trans_timestamp` TIMESTAMP NULL DEFAULT NULL AFTER `Trans_amt`;

Now I actually had temporal state information pertaining to the changes in status that my order had gone through.  I then marked the Order as paid, and updated my custom table.  It now looked like this.

Note the Trans_timestamp column on the far right. Now I finally have a way of adding valuable audit trail data to my Store Management toolkit. The best part is that all this ETL scrubbing can be automated and displayed on an uncluttered PHP-driven report that might look something like this.

Early working Storeboss responsive prototype that uses PHP and Stored Procedures to display transaction data

How I turned into a snake in the grass

Python in the grass

I’ve spent the last month coding in Florida, where the “local” Burmese python population is apparently exploding.

I was putting together an app that used PHP, HTML, CSS, some JS, and a custom SVG icon.  It currently runs on a desktop XAMPP server configuration, with MariaDB 10.3 as the storage engine.

Initially, I thought I would eventually turn this application into a WP plug-in, but I have since changed my mind.  The more I got my hands dirty with PHP and Maria, the less enthusiastic I became about this whole plug-in idea.

Instead, I realized that my app would best be implemented as a series of “widgets” written in Python, with SQLite as the data store and PySimpleGUI on the front end.

Going the XAMPP route was hardly a waste of time, though.  But rather than have my Data Analysis/DSS app hit the WP DB in real time, I’ll create a simple WP helper plugin that downloads the STRIPE and Woo Order info that’s needed by my app.  Instead of using plain old CSV, I’m thinking of implementing a micromodel download, as described here. I’m not sure yet how this would work, but the concept of downloading relations (in a Coddian sense) instead of data cell arrays sounds far more appealing.

I’m using DB Browser for now to look at my test data and play around with SQLite, until I learn Python well enough to write the actual app.

SQLite seems perfect for my application, which is meant to be used by a single user — whom I envision to be a non-technical business owner running a small online and/or brick-and-mortar shop.

Since the app deals with sensitive financial data, it’s actually best — from a peace of mind standpoint — for the data to live securely where no one (presumably) can get at it but its one legitimate user — the store owner who is looking at store cash flow information.

What a relief not to have to write the contorted sanitizing code in order to deal with SQL injections and other annoyances!

However, I still have to look into the RCE vulnerability in SQLite that was identified earlier this year.

So far, though, SQLite seems to be lightning fast, especially given the small test sample I am using (around 400 rows).  There are peculiarities in its syntax that I have to study; a good primer on this topic can be found here.  Another peculiarity is that atypical code of conduct manifesto that made the news recently; and that is all I will say about that!

My work is cut out for me.  I have to simultaneously learn Python 3.7, PySimpleGUI, and how to use GitLab — not to mention figure out the algorithms for my app.

I hope to resurface in Pythonesque mode in a month or two or three.

What’s the rush?