StoreBoss… the plugin!

storeboss(tm).io
StoreBoss: first sighting…

Okay… so it’s just a PHP file with header info, for now, but I think it looks right at home in WP admin.

I did some more data modeling and source data checking today to make sure that the “tick-and-tie” algorithm will work, given the test source data available from WC, Stripe, and Bank of America.

Much of my day was spent wrestling with the question as to whether I should implement part of StoreBoss on a desktop. I discounted MS Access and/or Libre Base as unsatisfactory options for a variety of technical and functional reasons.

So what other choices do I have?

For example, if I use SQLite as the DB engine, what language should I use for the GUI part?

Python has been all the rage for data analysis for some time, so Python/SQLite is an obvious choice.

But…

  1. I don’t know Python;
  2. I don’t really want to hassle with Tk or other Python frameworks to produce a desktop GUI that will probably look amateurish;
  3. I don’t care for some of the limitations of SQLite.

On the other hand…

  1. I am stunned — stunned! — at what PHP programmers have to go through if they wish to debug their apps by stepping through code;
  2. I want to avoid, as much as possible, being overly ensnared by the WP way of doing things;
  3. You can do a lot of cool things in terms of data visualization with existing JS libraries.

So it looks like the split might be SQL + basic PHP for extracting and prepping WC data for querying by parameterized stored procedures on the prod WP server. By focusing on doing much of this work via SQL, I can quickly create a few custom tables in WP, populate them via SQL, and provide value-add transactional information via a bare-bones PHP script that (safely) calls the queries, then displays the results using JS razzmatazz.
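
To make that concrete, here is a minimal sketch of the kind of parameterized procedure a bare-bones PHP script might call; the table and column names (borrowed from the sb_trans_1 table discussed later in this series) are illustrative, not a finished design.

-- Illustrative only: a parameterized procedure that a PHP layer
-- could call with bound values instead of building SQL strings.
DELIMITER //
CREATE PROCEDURE sb_orders_between(IN p_from DATETIME, IN p_to DATETIME)
BEGIN
  -- Return order-level rows for a caller-supplied date range.
  SELECT `Trans_ID`, `Trans_status`, `Trans_date`, `Trans_amt`
  FROM `sb_trans_1`
  WHERE `Trans_date` BETWEEN p_from AND p_to
  ORDER BY `Trans_date`;
END //
DELIMITER ;

-- Example invocation:
-- CALL sb_orders_between('2019-10-01 00:00:00', '2019-10-31 23:59:59');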

StoreBoss would also provide a download facility to either a JS/SQLite SPA app (again, this is a lot of work for one person, especially for someone who is more of a data modeler/architect than a down-in-the-trenches developer who eats Laravel magical callbacks for breakfast) or EasyMorph (a bit on the expensive side, but fits right into the code-free data analysis mantra) or MS Power BI (more achievable, in a shorter time frame, with fewer GUI headaches, despite the learning curve) for sophisticated data analytics on the desktop.

This type of hybrid architecture is a simpler cousin to the sort of Big Data deployments ably discussed by Lockwood Lyon here.

I am eagerly awaiting, by the way, the short-term release roadmap that is planned for EasyMorph.

The “Actions for REST APIs” might allow for downloading of data from WC’s headless interface. The ability to pipeline data to Power BI in December ’19 is pretty exciting too — although Power BI’s notions of what constitutes a DSS data store and its associated BASE standards compliance require further investigation, as do the pros and cons of time spent mastering non-portable, non-transferable MS data manipulation sub-languages.

We’ll get to all that down the road. But for now, plain ol’ StoreBoss will live in the open-source WP cloud.

Initially, users will have to manually upload their bank file transaction data, and that’s a drag.

But that’s how it has to be, at least for now.

B of A (which is the bank I use for the e-commerce site I run) would not accept a wildcard SSL certificate comparable to the one used by SiteGround for my site, so I would not be able to programmatically access B of A’s CashPro Account Info API from my hosted SG server.

I’ll have to figure out how to make the uploading of banking data to StoreBoss’s staging table as simple as possible; there may be existing, open-source software out there that could help in that regard.

Maybe I can just lay out a sample Libre Calc template and my prospective customers can do the work of mapping CSV fields to the DB columns. If the file structure is not too complicated and the mapping is explained clearly, the payoff might be worth the tedious hassle. But I don’t like this part of the solution… so we shall see how all this pans out. At the very least, I should be able to develop reusable template mappings for the transactional files of the top 3 US banks, and have those available out of the box.
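
For what it’s worth, here is a rough sketch of what that staging-table load might look like on the SQL side; the sb_bank_staging table, the three-column CSV layout, and the availability of LOAD DATA LOCAL INFILE on a given host are all assumptions for illustration only.

-- Hypothetical staging table for manually uploaded bank transactions.
CREATE TABLE IF NOT EXISTS `sb_bank_staging`
(
`Bank_date` date NOT NULL,
`Bank_desc` varchar(255) NOT NULL,
`Bank_amt` decimal(13,2) NOT NULL
)
ENGINE = 'MyISAM';

-- Load a CSV exported from the bank, skipping its header row;
-- the date format in the CSV is an assumption and would need to
-- match the template mapping for each bank.
LOAD DATA LOCAL INFILE '/path/to/bank_export.csv'
INTO TABLE `sb_bank_staging`
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(@d, `Bank_desc`, `Bank_amt`)
SET `Bank_date` = STR_TO_DATE(@d, '%m/%d/%Y');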

Speaking of reusability, one of the interesting things (to me, at least) about this proposed StoreBoss architecture is that almost none of the code touches or relies on WP core. What this means is that I can use JS libraries such as DataTables (which in turn depends on jQuery) to greatly reduce the coding I have to do to come up with attractive tables for StoreBoss, and even if I end up using other libraries (say, for charting, or date-fns for temporal calculations) that carry MIT licensing, it won’t matter, since I’m hitting the database directly and not depending directly on WP/WC. I will have to verify this with the plug-in gods at WP, but I think I’m right. Why re-invent the wheel?

Meanwhile, I am close to finalizing the sb_transaction_event table, and ran a few test analytics SQL queries against real-world data.

Boy was I unpleasantly surprised at some of the things I found! (Hint: certain online CC processors appear to charge over 3% in some cases, which I think is pretty rich, considering that my client’s in-store credit card processor charges less than 2%.)

Whoa.

StoreBoss can’t come soon enough.

It shouldn’t be only big corporations that can shell out for automated systems that check this sort of thing.

What about the SMB WooCommerce store owner who can’t afford fancy cloud-based systems and an army of accountants and data analysts?

If smaller-scale WC store owners are doing any kind of volume, you can bet there’s money probably slipping through the cracks. And even if there isn’t, wouldn’t it be nice to know for sure?

Time to get cracking!


The TransEvents Table

storeboss table

Much of the ensuing discussion will be about this table.  Please click here to see the early beta version of its DDL and load-file DML.

There will be significant changes to this data object in the weeks ahead.  Stay tuned for more info as I make further progress in the development of StoreBoss.

Please note that I am targeting MySQL on the Server, which means that much of the advanced temporal discussion on this blog will not apply directly to StoreBoss.

Also, most hosting providers do not allow users to save SQL output to a custom file location — unless they are paying for some super expensive plan.  With this in mind, I’m developing StoreBoss as a WP plugin that populates this custom table (and perhaps others). I will try to do most if not all of the data prep using SQL, given that the row counts are likely to be small enough that StoreBoss will not significantly impact production.

The data it contains could be downloaded, if necessary, for further processing by an end-user.

WC Orders – more FSM analysis

wp erd
WP 4.4.2 schema

If you take a look at the entity-relationship diagram for WP’s original core tables (see above), which comprised a baker’s dozen or so tables, you might get the impression that RI constraints are being enforced via an underlying database engine.

In reality, most of WP/WC’s RI constraints are maintained through application code.

This can easily be verified by querying the MariaDB information schema catalog, as shown here.
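
For example, a query along these lines should come back empty on a stock install (swap in your actual schema name):

-- List declared foreign keys for the WP database; on a stock
-- install this typically returns no rows.
SELECT TABLE_NAME, CONSTRAINT_NAME
FROM information_schema.TABLE_CONSTRAINTS
WHERE CONSTRAINT_TYPE = 'FOREIGN KEY'
AND TABLE_SCHEMA = 'wordpress';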

Why was native RDBMS RI not implemented by WP?

According to Mike Little (one of the original developers of WordPress), it was an accident of history.

The initial DB schema for WP was created by Michel Valdrighi in the early 2000s. Valdrighi was the author of B2/Cafelog, from which WP was forked.

Per Little, either MySQL’s support for referential integrity was lacking back then, or, the database drivers for PHP at the time did not support RI.

The net result is that WooCommerce — an OLTP plugin that extends some core WordPress tables, as well as adding another half dozen or so tables to WP’s original schema — ended up being relational in name only. This despite being a domain that’s ideally suited for the kind of normalized schema implementation that you will find in any database 101 textbook.

In reality, WC product and transactional persistence is achieved via a hard-to-understand, key-value scheme that scales poorly and contains serialized columns that make porting or migrating this database less straightforward than might otherwise be desired.

We will now write a query (Q1) that materializes temporal order-related information.

In the example below, Q1 results in data pertaining to Order 117.

SELECT 
p.ID AS "Trans ID",
p.post_date AS "Trans Date",
p.post_modified AS "Trans Mod Date",
c.comment_content AS "Trans Status", 
c.comment_date AS "Status Date",
x1.meta_value AS "Order Key",
x2.meta_value AS "Currency",
o.meta_value AS "Order Total",
x3.meta_value AS "Completed Date",
x4.meta_value AS "Paid Date"

FROM wp_posts p

LEFT JOIN wp_comments c ON
(p.post_type = "shop_order"
AND p.ID = c.comment_post_ID 
AND c.comment_type = "order_note")

INNER JOIN wp_postmeta o ON 
(o.`meta_key`= "_order_total"
AND p.ID = o.post_id)

INNER JOIN wp_postmeta x1 ON 
(x1.`meta_key`= "_order_key"
AND p.ID = x1.post_id)

INNER JOIN wp_postmeta x2 ON 
(x2.`meta_key`= "_order_currency"
AND p.ID = x2.post_id)

INNER JOIN wp_postmeta x3 ON 
(x3.`meta_key`= "_completed_date"
AND p.ID = x3.post_id)

INNER JOIN wp_postmeta x4 ON 
(x4.`meta_key`= "_paid_date"
AND p.ID = x4.post_id)

WHERE p.ID = 117
ORDER BY p.post_date;

 

Q1 produces the following result.

wc data

We now have a starting point for analyzing temporal data with respect to state changes that a WC order can go through.

The FSM state transitions for the WC Orders domain can be listed as follows:

OS0 unknown (custom state extension)
OS1 pending payment
OS2 processing
OS3 on hold
OS4 completed
OS5 canceled
OS6 refunded
OS7 failed
OS8 manager approved (custom state extension)

In our next few posts, we will further analyze the results produced by a variant of Q1, using a more comprehensive example.

We’ll also inspect a few additional tables — such as the notorious wp_options table (home of ballooning transient data sets) as well as the quasi-indecipherable woocommerce_order_items and woocommerce_order_itemmeta tables.

To anyone who follows Woo or is a Woo customer, it will not come as a revelation that Woo can quickly become cumbersome, complex, and slow.  As many respected observers have already noted, the WP/Woo schema is not particularly well-suited for e-commerce.

With this in mind, products such as Glew have been developed and are available for major e-commerce retailers. But they are expensive.  Still, it does not cost anything to learn from their YouTube tutorials.

Speaking of expensive, Liquid Web offers a pricey WooCommerce managed solution; we will eventually be taking a look at their custom orders plug-in, which is available for free on Github.

Our goal is to try to devise an open-source (ie, affordable!), best-practices way of producing sophisticated order analytics on a highly secure platform that does not consume costly hosting CPU cycles — namely, the desktop.

But first, we will have some fun in serverland, and take a look at the temporal semantics of WC orders, and how MariaDB 10.3+ might be used to extend these such that the average non-technical store owner can better, as the StoreBoss™ tagline goes, mind his or her business.
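
As a teaser, MariaDB 10.3 introduced system-versioned tables, which keep row history automatically. A sketch of how a StoreBoss table could opt in follows; this is purely illustrative, not the final design.

-- MariaDB 10.3+ system versioning: the engine retains every prior
-- version of a row, queryable with FOR SYSTEM_TIME.
CREATE TABLE `sb_trans_versioned`
(
`Trans_ID` bigint(20) NOT NULL,
`Trans_status` varchar(20) NOT NULL,
`Trans_amt` decimal(13,2) NOT NULL,
PRIMARY KEY (`Trans_ID`)
)
WITH SYSTEM VERSIONING;

-- What did this order look like at the start of the month?
SELECT *
FROM `sb_trans_versioned`
FOR SYSTEM_TIME AS OF TIMESTAMP '2019-11-01 00:00:00'
WHERE `Trans_ID` = 117;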

 

Woo Orders – Part 2

sales receipt

WooCommerce relies on a two-table schema — one fat, one skinny — as the primary method for constructing Order data.  A variety of other tables provide auxiliary information in flight.

It is difficult for a non-database professional to extract data from this sink.

Woo does not expect its customers to know how to write PHP code.  In Woo’s view, this is the job of specialists, sort of like old school IT departments — ie, highly trained developers who are versed in the “technology stack” that is the hallmark of modern Web development.

Woo expects these specialists to use an interface called a “REST API” to extract or manipulate data that persists in the Woo/WP datastore.

Absent is the alluring promise of relational databases — with their declarative, somewhat verbose, set-based programming sublanguages, that in theory should allow a programmer to largely be unconcerned with how structured data is saved to disk or retrieved.

Thus, a non-technical Woo store owner cannot with ease directly query the Woo database in an ad hoc fashion; for example, to independently verify and/or audit the validity of Woo’s assertion as to what it holds to be relational facts.

Specialized, rather arcane knowledge is required.

In my previous post, we saw how to extract a subset of Order data, particularly if we are interested in performing a form of applied ELH analysis to a WooCommerce Order — in other words, how does a Woo order get born, live, then die?

A simple query with only 1 inner join was needed to populate (insert) an ISAM DSS table with Order transaction rows.

We also saw how an Order passes through multiple states.

Woo maintains a log of these states in a table called wp_comments.  Rows with the field comment_type equal to ‘order_note’ trace each step along the way.

That is a valuable table, a crucial one, in fact, for the purpose of creating a temporal ELH table.

A common way to capture temporal info is to declare a point-based data model, which is what we shall do with a table called trans_1.

The role of trans_1 is to:

  1.  Provide information relating to Order-level state transitions
  2. Create an efficient means to download or extract Order data from Woo without having to deal with its REST API
  3. Serve as an independent accounting-level verification for the data that is provided by Woo’s reports

Let’s assume for a moment, as a learning exercise, that there is no wp_comments table.  This will help us better understand the Woo Schema and how it uses WP’s underlying tables to track Orders data.

In such a case, how could we populate trans_1?

Leaving aside performance considerations, a trigger might be an attractive option to automatically populate our custom Trans (ie, Order) table. This, in fact, is one of the traditional solutions in data processing to this classic problem; another is having an app write twice, once to the TRANS table in question, the other to an associated historical, timestamped “versioning” table.  These normally feature rolling time-period views (such as yearly, semi-annually, quarterly, monthly, weekly, and daily).

The trigger approach would eliminate the need to write a cron job that periodically runs the INSERT query we saw in the previous post. A polling job raises awkward questions: how frequently should it poll the wp_posts table, and how do we ensure that state transitions are not missed?

Triggers sidestep this polling problem and may offer a low-code solution.

I will try to demonstrate how this might work, with an Insert trigger, MariaDB-style.  Essentially the trigger populates Table B, every time a row is inserted in Table A.

(In our real-world Order example, Table B would also need an Update trigger, to retrieve Transaction Amt order-level data from the auxiliary EVA table that the wp_posts table uses to construct virtual Order tables on the fly; a rough sketch of one way to do that follows below. This sucks, but there you have it.)
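
Purely as an illustration, that companion trigger might look something like the sketch below. The choice of firing event (AFTER INSERT on wp_postmeta) and the assumption that the _order_total meta row arrives as an insert are mine, and, as we are about to see, triggers on core tables come with their own problems.

-- Illustrative only: copy the order total into our custom table
-- whenever the _order_total meta row lands in wp_postmeta.
DELIMITER //
CREATE TRIGGER upd_sb_trans_1_amt
AFTER INSERT ON wp_postmeta
FOR EACH ROW
BEGIN
IF NEW.meta_key = '_order_total' THEN
UPDATE sb_trans_1
SET Trans_amt = NEW.meta_value
WHERE Trans_ID = NEW.post_id;
END IF;
END; //
DELIMITER ;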

Here goes.

First, I will create a table in phpMyAdmin that replicates or mirrors the structure of the sb_trans_1 table that we saw in the previous post.  I will call this table sb_trans_2.

CREATE TABLE IF NOT EXISTS `sb_trans_2`
 (
`Trans_ID` bigint(20) NOT NULL, 
`Trans_status` varchar(20) NOT NULL, 
`Trans_date` datetime NOT NULL,
`Trans_amt` decimal(13,2) NOT NULL, 
CONSTRAINT `Trans_pk`
PRIMARY KEY (`Trans_ID`, `Trans_status`) ) 
ENGINE = 'MyISAM';

Here it is, in my local Sandbox MariaDB database.

Next, I will attach a trigger to sb_trans_1.  For demo purposes, I will use an artificial firing condition — that the Order Number exceeds that of an Order I manually created in the database.  In a real-world implementation, the test condition would, of course, be to perform a `post_type` = “shop_order” column eval on the wp_posts table.

Here’s the trigger:

DELIMITER //
CREATE TRIGGER addrow_sb_trans_2
AFTER INSERT ON sb_trans_1
FOR EACH ROW
BEGIN
IF NEW.Trans_ID > 117 THEN
INSERT INTO sb_trans_2 (Trans_ID, Trans_status, Trans_date, Trans_amt)
VALUES (NEW.Trans_ID,
NEW.Trans_status,
NEW.Trans_date,
NEW.Trans_amt);
END IF;
END; //
DELIMITER ;

Now let’s add an order to the trans_1 table, #118. We can use the Insert panel in phpMyAdmin to accomplish this task.  As you can see below, trans_2 is automatically updated.

Conclusion:  the trigger idea shows promise — maybe.

Let’s try it on wp_posts, just for laughs.  This is the Big Kahuna table in the WP/WOO database universe.  We add the following trigger.

DELIMITER //
CREATE TRIGGER addrow_sb_trans_1 
AFTER INSERT ON wp_posts 
FOR EACH ROW 
BEGIN 
IF NEW.post_type = "shop_order" 
THEN INSERT INTO sb_trans_1 
(Trans_ID, Trans_status, Trans_date) 
VALUES (NEW.ID, NEW.post_status, NEW.post_date);
END IF; 
END; //
DELIMITER ;

Now let’s try to add a row using WooCommerce -> Orders in the WP admin panel.

Disaster.

And here is some further unexpected behavior.

The naive moral of this exercise is: don’t use triggers on WP/WC tables.

Out of curiosity, I investigated what caused the Fatal Error.  It turns out that our seemingly innocuous trigger resulted in the invocation of code that fires off a call at Line 160 that crashes the plugin. Why this happened shall forever remain one of the mysteries of life.

brackets code

Btw, I also tried, just for kicks, attaching a trigger to wp_wc_order_stats, which is also populated with Orders data.

woo order stats table

This, too, didn’t work, but we might use this table down the road, as it provides a useful breakdown of the dollar amt descriptors that comprise a total Order amount.

In conclusion, it’s a good job that wp_comments exists.

But we’ve learned quite a bit about how Orders are handled internally in the WP database.  This will help us construct the SELECT queries that we will be needing.

In my next post, I will create a parameterized stored procedure that extracts and transforms Order data from the Woo tables mentioned above and loads the result into trans_1. Most likely I will write the result set to disk, then load the file into the table, as this is far faster than doing inserts, and I won’t have transaction overhead to deal with. Besides, trans_1 is an ISAM table: it runs like greased lightning for DSS queries, though it’s not so fast for OLTP.
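
In rough terms, the dump-and-reload step might look like the sketch below. The file path is illustrative, and, as noted earlier, many hosting plans disallow INTO OUTFILE, so this part may be confined to a local or dedicated environment.

-- Dump the extracted order rows to a flat file...
SELECT p.ID, p.post_status, p.post_date, o.meta_value
FROM wp_posts p
JOIN wp_postmeta o ON
(p.ID = o.post_id
AND o.meta_key = '_order_total')
WHERE p.post_type = 'shop_order'
INTO OUTFILE '/tmp/sb_trans_extract.csv'
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';

-- ...then bulk-load it, avoiding per-row INSERT overhead.
LOAD DATA INFILE '/tmp/sb_trans_extract.csv'
INTO TABLE sb_trans_1
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(Trans_ID, Trans_status, Trans_date, Trans_amt);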

From a data architecture standpoint, trans_1 will provide an ad hoc means to populate StoreBoss™ with dynamic Order data on a temporal basis.

All I would then have to do is put together a simple PHP GUI WP plug-in that supports CSV exporting to a local machine, and we can go on from there.

Most likely the trans_1 SP would look something like this:

SELECT 
p.ID AS "Trans ID",
p.post_date AS "Trans Date",
c.comment_content AS "Trans Status", 
c.comment_date AS "Status Date",
o.meta_value AS "Trans Amt"
FROM wp_posts p
LEFT JOIN wp_comments c ON
(p.post_type = "shop_order"
AND p.ID = c.comment_post_ID 
AND c.comment_type = "order_note")
INNER JOIN wp_postmeta o ON 
(o.`meta_key`= "_order_total"
AND p.ID = o.post_id)
ORDER BY p.post_date;

which results in

sql result

Bingo.

Woo Orders – Part 1

One of the main ideas behind my StoreBoss™ project is to explore, in different, interesting, and useful ways, how to provide a store owner with the ability to keep forensic track of Woo orders coming in over the digital transom.

Unfortunately, Woo reports tend to be skimpy in that regard, and often do not have the sort of details you might be interested in knowing.  Also, WooCommerce is not really ideally suited for ad hoc reporting or data analysis on the fly.

It is always possible to write your own reports, of course, but that means wrestling with the WP way of doing things, which entails learning PHP and running an elaborate gauntlet of hurdles that stand between you and your data.

It would be nice to simply be able to query the WP database directly, without having to go through lumbering and slow “REST APIs”, when all you want, maybe, is to quickly see what orders are coming in, how long it’s taking to ship product out the door, how long it’s taking before you get paid, and to determine precisely the haircut imposed by third-party payment processors like Stripe.  A store owner might also wish to verify that no sales records are being manipulated by in-house staff.

It’s unreasonable to expect the small B&M store owner to have the time or technical knowledge or personal inclination to master the contortions required to query the WP/WOO database schema.

However, if one is interested in stripped down, just the facts information about, say, Orders, it is not really that hard to write an SQL query that does the job by directly querying the database.

Have you ever wondered how you can verify that the data you see in Woo reports (or the various third party Woo plugins) is accurate?  Wouldn’t it be nice to be able to verify their accuracy for yourself?

Moreover, have you ever wanted to be able to look at a simple, up-to-date report and see in an instant your cash flow position at any given moment?

That is what I am going to attempt to build in StoreBoss™. Please note that the ideas described here are somewhat experimental, and rely on MariaDB 10.3 or above as the WP Storage Engine.  Your WP hosting provider may require a dedicated server for such a configuration. A StoreBoss™ plug-in would of course eliminate this dependency.

Please note, also, that WP in general frowns on directly querying its database.  Since we are not touching any Woo or WP data directly (other than via SELECT statements), we can safely ignore this directive — while recognizing that any query run against the DB will incur a performance cost.

Our goal therefore will be to create a staging table for StoreBoss in the WP database, then download this data to the user’s machine, where it can be stored in SQLite and analyzed using a Python application.  SQLite is a robust, single-user DB engine that does not require any technical knowledge to configure, and it is free.

In this manner, security and privacy issues will be dealt with in a strict fashion: Woo and other data (such as Bank Statement information, which we will get to later on) will be controlled on the end user’s machine, and thus be far less susceptible to SQL injection and other hacker annoyances and malicious attacks that must be taken into account by WP and Woo in the cloud.

This will free StoreBoss™ from having to rely on tediously baroque and inefficient methods of calling the inordinately complex underlying Woo DB (using PDO etc) out of a constant paranoia of being hacked, a situation that also hamstrings the ability of the end user to query data on an ad hoc basis.

With all the needed data (which can be encrypted, if need be) updated on his or her private desktop or notebook machine, a store owner will NOT risk burdening the WOO prod env with an undue amount of extra CPU cycles that may exceed some hosting provider threshold or slow down catalog displaying or transactional order processing.

Let’s get started with a simple example.

I’ll begin with 1 order that I will enter manually in a clean copy of WP/Woo that I’ve installed on my desktop.  You can set up something like this too, without too much trouble — in my case, I’m running XAMPP on a refurbished 10 year old HP machine running 64-bit Windows 8.1.  So there is nothing fancy whatsoever about my setup.  Store owners of course would not have to do this; my setup is for dev purposes only.

Here is what my Woo Order panel looked like after I manually entered my first order.

woo panel

To verify the accuracy of this data, I queried the underlying database using phpMyAdmin.  This is the query I wrote to retrieve the basic information I need to know about this order.

SELECT 
p.`ID` AS `Trans ID`,
p.`post_status` AS `Trans Status`,
p.`post_date` AS `Trans Date`,
o.`meta_value` AS `Trans Amt`
FROM `wp_posts` p
INNER JOIN `wp_postmeta` o ON (
o.`meta_key` = "_order_total"
AND p.`ID` = o.`post_id`)
WHERE
p.`post_type` = "shop_order";

This query produced a single row of Order data that’s in a Pending payment state.

I then changed the state of the Order to Processing – which meant that Payment was received (of course no actual money had changed hands, as this was just a test).

The Woo Orders panel now looked like this.

Notice how there was now a highlighted “1” in a small red circle in the WP Admin Panel.

I again used my query, which gave me the following result.

Note how there was still only one order row?  It had changed state (check out the Trans Status column), but did not provide an audit trail that I could query as to how much time had elapsed between when the order was received and when the money had “come in.” (We will go with Woo’s loose concept of being in receipt of funds, for now.)

Luckily, I had previously created a custom table in the WP db, and written out the first Order row to it.  Here is the DDL I used to create the table (the sb_ prefix stands for StoreBoss™, which will help reduce the chances of database-object naming conflicts in the WP/Woo persistence engine and its accompanying plug-in ecosystem).

CREATE TABLE IF NOT EXISTS `sb_trans_1`
(
`Trans_ID` bigint(20) NOT NULL,
`Trans_status` varchar(20) NOT NULL,
`Trans_date` datetime NOT NULL,
`Trans_amt` decimal(13,2) NOT NULL,
CONSTRAINT `Trans_pk`
PRIMARY KEY (`Trans_ID`, `Trans_status`) )
ENGINE = 'MyISAM';

Note the engine I’ve chosen: MyISAM is faster for DSS or analytical processing, which is what we will be doing here.

I then updated my table with the order row in this current state, using the following DML.

INSERT INTO `sb_trans_1` (`Trans_ID`, 
`Trans_status`,
`Trans_date`,
`Trans_amt`)
SELECT 
p.`ID` AS `Trans ID`,
p.`post_status` AS `Trans Status`,
p.`post_date` AS `Trans Date`,
o.`meta_value` AS `Trans Amt`
FROM `wp_posts` p
INNER JOIN `wp_postmeta` o ON (
o.`meta_key` = "_order_total"
AND p.`ID` = o.`post_id`)
WHERE
p.`post_type` = "shop_order";

Now I had 2 rows of order data, but unfortunately I had a small problem:  since the transaction date had not changed (which of course is correct), I had no way of knowing how much time had elapsed between these two events.

To remedy the situation, I added a timestamp to my table, using the following code.

ALTER TABLE `sb_trans_1` 
ADD `Trans_timestamp` TIMESTAMP NULL DEFAULT NULL AFTER `Trans_amt`;
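
With the timestamp column in place, a query along the following lines becomes possible; note that the wc-* status strings shown are assumptions and should be checked against the actual post_status values your store produces.

-- How long did each order take to move between two states?
SELECT a.Trans_ID,
TIMESTAMPDIFF(HOUR, a.Trans_timestamp, b.Trans_timestamp) AS Hours_to_complete
FROM sb_trans_1 a
JOIN sb_trans_1 b ON a.Trans_ID = b.Trans_ID
WHERE a.Trans_status = 'wc-processing'
AND b.Trans_status = 'wc-completed';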

Now I actually had temporal state information pertaining to the changes in status that my order had gone through. I then marked the Order as paid, and updated my custom table.  It now looked like this.

Note the Trans_timestamp column on the far right. Now I finally have a way of adding valuable audit trail data to my Store Management toolkit. The best part is that all this ETL scrubbing can be automated and displayed on an uncluttered PHP-driven report that might look something like this.

storeboss prototype
Early working Storeboss responsive prototype that uses PHP and Stored Procedures to display transaction data