Mapping Product and Price Data (Updated version)

fjord

The (free) WooCommerce Product Export facility writes out a file that lists the comma-separated attribute names and associated rows that capture product-related content.

Unfortunately, this file is insufficient to fully specify the Pricing Analytics data entities that we have created in the previous article.

The Woo REST API for Products contains additional variables that are also of interest to our project.

These will be helpful to help us populate the Pricing Analytics with other important attributes..

Our ultimate challenge in this project will be to minimize reliance on, PHP 7.x.

Instead, we will pursue the traditional Data Warehouse route, using batch file updating.

In particular , we will use MariaDB’s MYSQLDUMP facility to export the necessary CSV files, then write a Python script to mungue them in order to create new CSV files that conform to our Pricing Analytics schema.

After that, we can easily populate our Product and Price-related tables using the MariaDB LOAD DATA INFILE statement.

One can imagine a beta version of this DSS/Pricing Analytics front end looking something like this.

#

But before any of this can take place, we must go through a mapping exercise to determine where the data is in Woo’s database that’s needed for our proposed DSS schema is located.

Here is an example csv file showing the 39 fields output by the Product Export facility.

I produced this file in a local host dev env that only contains 1 plug-in: WooCommerce, running on MariaDB 10.3. Then I imported it into Google sheets and published it as a link (WP does not allow iframe embeds).

There are no other plug-ins installed on my local db, nor have I downloaded any themes, including the ones that you can download for free with Woo.

There are no posts or pages either; and I got rid of Hello Dolly (a simple plug-in that comes bundled with every install of WP)..

This way, we can focus on exactly how various product types are stored in the Woo database, without extraneous information to distract us from this reverse engineering examination.

I started with Widget-1, a product of type Simple.

One thing that is already of interest in that the Weight Value (Column 19) of this record is “5..55”.  This was a deliberate typo on my part, and shows that in some cases Woo does not validate input data. (There are other discrepancies in weight and size, but was due to the hassle of having to reinstall WP., thinking I had trashed a table. It turns out that in my XAMPP local hosted configuration, tables sometimes vanish in the PHPmyadmin sidebar, thought you can still query them in the MariaDB engine thru the SQL box. It was fine when I reinstalled WOO, but this is why I will only unit test a few records at a time, and will only bulk test on the cloud server when I think I have a working prototype.)

Basic data validation of this type must therefore be handled at some point in our database load process. Should the entire load fail, when an error of this type is encountered, or should the error be ignored and the record uploaded with an error record added to s log file.

Another item of interest is the second column of this example CSV file.

It has a column of name of Type.  Below is an extended discussion of the naming of this column heading, and how we can find it’s intensional value (via SQL).

Here are the Woo REST API Product properties, which exceed in number those found in the CSV export file.

Why is that?

More importantly, based on the SB data model we saw in the previous post, what should our mapping table look like like?

#

Data Detective Query

The  following query (hats off to Dataeo) is useful to locate where a column exists (ie, in which tables) in Woo’s Information schema.

Just go to your PHPmyadmin SQL tab and type this query (changing values in red!):

select tab.table_schema as database_name, tab.table_name
from information_schema.tables
as tab inner join information_schema.columns
as col on col.table_schema = tab.table_schema
and col.table_name = tab.table_name and column_name = ‘your column name
where tab.table_type = ‘BASE TABLE’
order by tab.table_schema, tab.table_name;

#

In the Woo CSV Product export file, we may note that the second column heading is called Type.

I  had a very hard time using the above query to locate a column with this name.

There was a very good reason for this:  there is no such column.

To understand how Product Types were modeled, we need to understand what Woo calls a “custom taxonomy.”

Do not confuse this concept with that of a Variable product that has a custom attribute. Because they both contain the word “custom”, this can be misleading.

A variable product is a product that has, well, variations; these can be quite minor, such as color, or major, such as having a Stinkability attribute.

As you can see from the Add New Product panel below, I am here adding a new variant that has a custom attribute STINKABILITY. Some products of this type will stink, other variations will not.  The Woo engine can tell the difference by parsing the binary options provided, which are separated by a verti bar in the Value(s) box.woocommerce add product

Let’s ‘s return now to the subject of Woo custom taxonomies and our mysterious “Type” column (that we see as the 2nd column the Product export CSV file).

To understand it, we can first think of the notion of a plain vanilla post Category.  This is the classification scheme that WP bloggers use to classify post by topic (which can be anything).  Shop owners can also use Category to segment products.

This can be conceptually diagrammed as follows.

Prodiuct classification

The built-in Product Taxonomy is a little different.

There are four (4) provided Product Types that come out of the box with WooCommerce: Simple product, Grouped product, External/affiliate product, and Variable product.

Woo uses the same modeling technique it employs in user-defined Categories to create instances of these built-in Product Types, of which Simple product is by far the most popular.

To see these built-in Product Types associated to their correct Product instance, you need a 3-way inner join:

(1) use column ID from wp-posts as your initial join clause… (ID is a Product’s unique ID number, as assigned in semi random ascending order by PHP code)

→ to a column in wp_term_relationships called object_ID (which is part of a composite PK)

this table is an “intersection table” which resolves the M:N relationship between wp_term_taxonomy and wp_posts)

the second part of its composite key contains a column called term_taxonomy_id., which numerically identifies the  code for a product ID instance

(2) → this column (term_taxonomy_id) is the join clause predicate used to match the intensional value of a column in wp_term_taxonomy. that’s also called term_taxonomy_id,.  This join enables the query to correctly select the term_id (“term” is WP/Woo’s nomenclature for a type of product; in this case, it’s the code used to represent “simple”) associated with this instance of product ID

(3) → join with the wp_terms table, using the term_ID column as the join clause with the term_id PK column in the  wp_terms table. This terminative join delivers the desired result set. In this case, it’s a Product type instance with the value “Simple:” in the Name column

The following diagram illustrates this σisSimple 3-join query:

sql query joins

#

Let’s look how this works in practice, by using the Add Product panel in WooCommerce. to add a simple product type called Widget 1.

woocommerce
Click or tap to enlarge

And for the moment you’ve all been waiting for, here is the query itself (you can replace the single quotes with double quotes if you encounter problems running this code):

SELECT P.post_name as ‘Product Name’, c.name as ‘Product Type’
FROM wp_posts P
INNER JOIN wp_term_relationships a     (1)
ON (P.ID = a.object_id )
INNER JOIN wp_term_taxonomy b            (2)
ON ( a.term_taxonomy_id = b.term_taxonomy_id )
INNER JOIN wp_terms c                                (3)
ON ( b.term_id = c.term_id )

Here is the result of the query off PHPmyadmin.

sql query result

As you can see there are funky rows that get added to the result set.  This probably has to do with autosave UI behavior. Extraneous detail will later have to scrubbed by the DSS loader.

Steps ahead…

(In my next post, I will produce a set of mappings that identify the data attributes in the DSS model in the Pricing related tables.  I am not going to go through an extended exercise showing how I identified each attribute of interest, but will give relevant examples here and there.)