Batch Report Beta

css responsive app
Galaxy S5 view of the report, which is written using CSS flexbox

This is the STOREBOSS financial analysis report I am currently developing.

The purpose of this merchant-facing app is to allow a store owner to upload transactions from a credit card processor to a custom table, and perform various comparative analytics.

Currently, this is not possible to do in WooCommerce.

The app is written in PhP, and uses JS on the front end, and MariaDB on the back.

Once I am done developing the beta version of Batch Totals Report (the name of the app ), I will turn it into a stand alone WP plug in.

Understanding the Woo Product Schema

wordpress data model
The yellow wp_posts row highlights how the  Product custom post type anchors the WooCommerce database schema

Let’s analyze how WooCommerce stores a Product instance as a custom post row in the wp_posts table.

Here is the query.

SHOW COLUMNS
FROM
    `wp_posts`

It produces a 24-row result.

ID  (PK)
post_author
post_date 
post_date_gmt
post_content
post_title text
post_excerpt
post_status
publish
comment_status
ping_status
post_password
post_name
to_ping
pinged
post_modified
post_modified_gmt
post_content_filtered
post_parent
guid
menu_order
post_type varchar
post_mime_type
comment_count

As you can see, most of these attributes have little to do with Products. This is due to WP core being a content management system. 

It is, however, extensible.

For example, many product-related attributes can be found in the wp_postmeta table.  These are stored as EAV pairs, which of course violates the 1st normal form of relational database design.

This scheme also allows the addition of n-number of attributes to a custom entity; however, this flexibility results in a performance hit — which WP tries to compensate for by keeping retrieved objects in memory. It also makes the model less understandable, particularly for end users wishing to write ad hoc database queries.

Let’s now take a look at the Product-related attributes found in the wp_postmeta table.

Here is the query:

SELECT DISTINCT
    q.meta_key AS 'ATTRIBUTE'
FROM
    wp_posts p
JOIN wp_postmeta q ON
    (p.id = q.post_id)
WHERE
    post_type = 'product'
ORDER BY
    q.meta_key

This query produces the following 35-row result.

_backorders
_crosssell_ids
_default_attributes
_download_expiry
_download_limit
_downloadable
_edit_last
_edit_lock
_height
_length
_low_stock_amount
_manage_stock
_price
_product_image_gallery
_product_version
_purchase_note
_regular_price
_sale_price
_sale_price_dates_from
_sale_price_dates_to
_sku
_sold_individually
_stock
_stock_status
_tax_class
_tax_status
_thumbnail_id
_upsell_ids
_virtual
_wc_average_rating
_wc_rating_count
_wc_review_count
_weight
_width
total_sales *

These are the descriptors associated by WooCommerce with a Product entity — except for Product Type, Product Category, and Product Tag.

Here is the query that returns those descriptors for every and any product in the Woo/WP database.

SELECT
    P.post_name AS "Prod Name",
    b.taxonomy AS "Prod Taxnmy",
    b.description AS "Prod Taxnmy Desc",
    b.parent AS "Prod Taxnmy Parent",
    b.count AS "Prox Taxnmy Cnt",
    c.term_group AS "Prod Taxnmy Group",
    c.name AS "Product Type"
FROM
    wp_posts P
INNER JOIN wp_term_relationships a ON
    (P.ID = a.object_id)
INNER JOIN wp_term_taxonomy b ON
    (
        a.term_taxonomy_id = b.term_taxonomy_id
    )
INNER JOIN wp_terms c ON
    (b.term_id = c.term_id)
WHERE
    post_type = 'product'

If you look at this on PhPMyAdmin, the results will look something like this.

Now that have all our candidate descriptors for a custom Product table lined up, our next step will be to eliminate those bearing information that is superfluous to a streamlined stock inventory analytics model.

Any serialized data should be decoded, in order for the data to be human readable. PHP serialized/deserialized data may pose an additional security risk.  

To make the custom products database readable and queryable by ordinary store owners, it is imperative to deserialize and normalize any encoded arrarys in the Woo schema.

 

* Note this field is inconsistently named Also, please note that MSRP is missing.  It can be added by purchasing an expensive plug-in from WooCommerce, or added and updated on a private Prodcut analytics.

Product Data in Woo

The (free) WooCommerce Product Export facility writes out a CSV file that contain product-related descriptors.

Here is an example csv file showing the 39 fields output by the Product Export facility.

I produced this file in a local host dev env that only contains 1 plug-in: WooCommerce, running on MariaDB 10.3. .

Let’s examine a sample row I created in this sandbox db.

One thing that is already of interest in that the Weight Value (Column 19) of this record is “5..55”.  This was a deliberate typo on my part, and shows that in some cases Woo does not validate input data. (

Basic data validation of this type must therefore be handled at some point in our database load process. Should the entire load fail, when an error of this type is encountered, or should the error be ignored and the record uploaded with an error record added to s log file.

Another item of interest is the second column of this example CSV file.

It has a column of name of Type.  Below is an extended discussion of the naming of this column heading, and how we can find it’s intensional value (via SQL).

Here are the Woo REST API Product properties, which exceed in number those found in the CSV export file.

The  following query (props to Dataeo) is useful to locate where a column exists (ie, in which tables) in Woo’s Information schema.

Just go to your PHPmyadmin SQL tab and type this query (changing values in red!):

select tab.table_schema as database_name, tab.table_name
from information_schema.tables
as tab inner join information_schema.columns
as col on col.table_schema = tab.table_schema
and col.table_name = tab.table_name and column_name = ‘your column name
where tab.table_type = ‘BASE TABLE’
order by tab.table_schema, tab.table_name;

In the Woo CSV Product export file, we may note that the second column heading is called Type.

I  had a very hard time using the above query to locate a column with this name.

There was a very good reason for this:  there is no such column.

To understand how Product Types were modeled, we need to understand what Woo calls a “custom taxonomy.”

Do not confuse this concept with that of a Variable product that has a custom attribute. Because they both contain the word “custom”, this can be misleading.

A variable product is a product that has, well, variations; these can be quite minor, such as color, or major, such as having a Stinkability attribute.

As you can see from the Add New Product panel below, I am here adding a new variant that has a custom attribute STINKABILITY. Some products of this type will stink, other variations will not.  The Woo engine can tell the difference by parsing the binary options provided, which are separated by a verti bar in the Value(s) box.woocommerce add product

Let’s ‘s return now to the mysterious “Type” column (that we see as the 2nd column the Product export CSV file).

To understand it, we can first think of the notion of a plain vanilla post Category.  This is the classification scheme that WP bloggers use to classify post by topic (which can be anything).  Shop owners can also use Category to segment products.

This can be conceptually diagrammed as follows.

Prodiuct classification

The built-in Product Taxonomy is a little different.

There are four (4) provided Product Types that come out of the box with WooCommerce: Simple product, Grouped product, External/affiliate product, and Variable product.

Woo uses the same modeling technique it employs in user-defined Categories to create instances of these built-in Product Types, of which Simple product is by far the most popular.

To see these built-in Product Types associated to their correct Product instance, you need a 3-way inner join:

(1) use column ID from wp-posts as your initial join clause… (ID is a Product’s unique ID number, as assigned in semi random ascending order by PHP code)

→ to a column in wp_term_relationships called object_ID (which is part of a composite PK)

this table is an “intersection table” which resolves the M:N relationship between wp_term_taxonomy and wp_posts)

the second part of its composite key contains a column called term_taxonomy_id., which numerically identifies the  code for a product ID instance

(2) → this column (term_taxonomy_id) is the join clause predicate used to match the intensional value of a column in wp_term_taxonomy. that’s also called term_taxonomy_id,.  This join enables the query to correctly select the term_id (“term” is WP/Woo’s nomenclature for a type of product; in this case, it’s the code used to represent “simple”) associated with this instance of product ID

(3) → join with the wp_terms table, using the term_ID column as the join clause with the term_id PK column in the  wp_terms table. This terminative join delivers the desired result set. In this case, it’s a Product type instance with the value “Simple:” in the Name column

The following diagram illustrates this σisSimple 3-join query:

sql query joins

#

Let’s look how this works in practice, by using the Add Product panel in WooCommerce. to add a simple product type called Widget 1.

woocommerce
Click or tap to enlarge

And for the moment you’ve all been waiting for, here is the query itself (you can replace the single quotes with double quotes if you encounter problems running this code):

SELECT P.post_name as ‘Product Name’, c.name as ‘Product Type’
FROM wp_posts P
INNER JOIN wp_term_relationships a     (1)
ON (P.ID = a.object_id )
INNER JOIN wp_term_taxonomy b            (2)
ON ( a.term_taxonomy_id = b.term_taxonomy_id )
INNER JOIN wp_terms c                                (3)
ON ( b.term_id = c.term_id )

Here is the result of the query off PHPmyadmin.

sql query result

As you can see there are funky rows that get added to the result set.  This probably has to do with autosave UI behavior. Extraneous detail will later have to scrubbed.

Refining the Woo Pricing Analytics Model

In a previous post, we  discovered that a Woocommerce Product instance is by default a “Simple Product.”

We also learned that three other Product types are available: 
a) Grouped product
b) External/Affiliate product
c) Variable product

Any of these four Product types can optionally be “virtual” or “downloadable,” which are in effect a subtype of Product.

There are also extensions sold by WooCommerce that at first blush appear to specify additional Product types.

These are described here.

However, upon closer examination, we realize that these various product bundling extensions are not in fact new Product types, but are more or less a way for a store owner to market products using various forms of discounting or promotions.

Thus, extensions can simply be modeled as Discount Types.

We will not be concerned with BOM functionality for now, since the majority of Woo users are most likely small business store owners with no need for complex recursive M:N functionality with respect to Products.

 

A Closer Look at Woo Products

woocommerce admin panel
Figure 1

WooCommerce supports the notion of “grouped” products. This is implemented by a store owner creating a Product abstraction, sometimes referred to by Woo in the documentation as a parent product.

This parent instance of a group of products does not itself have a price.

Its functional role is to visually link via some theme UI interface X number of products and presented to a shopper as a unit.

This unit is not invariant, as Product instance is a group can be purchased separately, by removing items from the Cart.

Moreover, there is no discount for buying grouped products.

In other words, being grouped does not affect the normal retail price in any way – unless a coupon is applied at checkout, the total cost is arithmetically additive.

The shop owner’s ability to group products is built into Woo core; in other words, it’s free.

In order to present a customer with more sophisticated marketing offers, a store owner must purchase extensions, such as the one for Product Bundling, or another for what is referred to in WooLand as a Composite Product.

These extensions are not free.

WooCommerce utilizes the term “virtual” to describe a subtype of Product, 

Once the sku and Type are specified, a shop owner user is free to link n-number of other Product types (except for abstract classes: in other words, you can have groupings of grouped products) to this parent instance.

In the WooCommerce world, a Product instance is by default a “Simple Product.”

Three other Product types are also available:
a) Grouped product
b) External/Affiliate product
c) Variable product

Any of these four Product types can optionally be “virtual” or “downloadable, which are subtypes of Product.

In general, the notion of virtual products refers to intangible products, such as a subscription to some pricey newsletter.

A downloadable product subtype is something tangible (such as the zip file for a plug in, or the .pdf of some newsletter) that you can download your computer.

This Product class hierarchy provides flexibility to the shop owner.

In can be confusing at first, but speaking generally, what the designers of Woo mean by the term “virtual” is something that you cannot see or touch, and downloadable is typically thought of as an executable of some sort.

There are additional complexities to Product that we need to keep in mind.

For example, what exactly are Variable Products, and what is meant by Custom Attributes?

A class of Grouped Product types is not a true M:N BOM (bill-of-materials) case, which usually leads to elaborate complexity in order to implement correctly, and often entails a significant performance hit.

 

Designing Temporal Analytics for Woo

MariaDB 10.3 provides built-in functionality called System Versioning (1). This has several use-cases, such as supporting sophisticated audit trails. In fact, MariaDB provides a plug-in that can be used to log a server’s activities for exactly this purpose (2).

System Versioning can also be used to produce data analytics that are of interest in a retailing environment.

For example, a WooCommerce end-user may be tasked with producing a graph that tracks, say, the price changes of store products over some unit of time.

Producing such a report using native SQL and the existing WooCommerce database schema is far from straightforward.

This is due to the latter’s underlying complexity – which is a result of WooCommerce reliance on, and conformance to, the semantics of the WordPress data model (3), which is extensible — a feature that WooCommerce uses to its full advantage.

This extensibility reduces the need for plug-ins to overpopulate the WP database with their own custom tables.

When shop owners run WooCommerce’s pre-installed or custom reports to obtain business-related information about their store, there are  several important factors to consider.

Multi-user ad hoc querying of a production WooCommerce platform can result in performance degradation of transactional execution (i.e., the shopping cart experience), slow down shopper catalog browsing, and impact the refresh rate of the somewhat sluggish WP Admin panel.

Moreover, particularly in entry-level hosted environments, running frequent reports may also cause a store owner to exceed a given plan’s CPU time in a shared server configuration

Finally, it can be difficult to produce customized WooCommerce reports without costly programming that requires skilled knowledge of WooCommerce internals. (For an excellent tutorial on the semantics of WP’s core database, here is an excellent multi-part tutorial.).  

A separate DSS environment would be the ideal solution proposed by this experimental project – which I am calling Project SB, for Sandbox.

Theoretically, this environment could be hosted on an end user’s local machine; alternatively, it could be hosted in the cloud on a Saas basis. 

For example, an end user (such as a store owner, or someone who is authorized to examine transactional data) may wish to know how many widgets were sold between time t1 and t2, and group the result of this query by price.

An end-user should be able to formulate such a query as simply as possible – without needing to write extremely complex SQL statements or add mysterious PHP snippets to a child theme that they might be asked to create.

It  is even possible to envision the implementation of a DSS/Analytical database in a SaaS model using, say,  MariaDB’s AX new columnar storage engine (4), that would keep the DSS and transactional envs in sync in quasi real time.

 
  1. see https://mariadb.com/kb/en/library/system-versioned-tables/
  2. see https://mariadb.com/kb/en/library/mariadb-audit-plugin/
  3. see https://codex.wordpress.org/Database_Description
  4. see https://mariadb.com/kb/en/library/mariadb-columnstore

Woo: Relational in Name Only

In case you didn’t already know this: Woo uses a hybrid quasi-relational / EVA NoSQL data model. It is implemented on top of WP’s underlying, equally non-relational meta schema, which runs atop MySQL or MariaDB.

The WP data model was designed initially to provide data persistence to an early CMS system. It allowed the addition of n-number of attributes to a post entity type, as needs evolved.

At its core,  the WP data model consists of rows  in a centrally important table that can be joined via a SELECT statement to a vertically structured (i.e., “thin”) table which specifies whatever attributes (instantiated as rows) WP would need, as time went by, for its post and, later, page types.

This has performance implications.

Also, partly because of WP’s flexible,”meta” structure, Woo does not make “normal” usage of Primary and Foreign relationships — business rules are implemented via PHP code. This leads to maintainability issues.

Moreover, Woo uses serialization, which makes its db non portable and violates the relational model’s scalar rule.