Designing the Product Table

wordpress data model
The yellow wp_posts row highlights how the Widget-1 Product custom post type anchors the WooCommerce database schema

In the previous post, we initially considered candidate attributes for Product table by analyzing the output of the Product Export facility provided by WooCommerce.

We used the Woo Add a Product panel and saw how WooCommerce divides Products into four (4) major types (there are variations on this, which we’ll get into as we continue our analysis).

Earlier on in this series, I showed you a schema that I designed to support our Price Analytics database — the development of which is the ultimate objective of this series, although we might take interesting detours along the the way.

This schema contains a Product entity.  In the SB (sandbox) conceptual data model, I showed the relationship between Product and other entities of interest by using primary and foreign keys only.

Now I’m going to focus on the Product entity, and add the attributes that will allow us to write the DDL for our Product table.

A reasonably efficient way of doing this is to query the WP database information schema, and list out all the attributes that are related to Product, which we already know is anchored by a custom post type row in the wp_posts table.  

I’ll only retain attributes that describe the notion of a Product, and discard the synthetic ones WooCommerce uses in modeling a Product (for example, attributes that relate to the Product visibility in the Woo catalog).

Let’s start off by analyzing how WooCommerce stores a Product instance as a custom post row in the wp_posts table.

In passing, let me make an obvious point: data analysis here involves examining how Woo has structured its schema by (1) leveraging the underlying framework of WP core tables as a base, and, (2) creating a not insignificant number of custom or supplemental tables..

Data modeling in this series refers to schema analysis, and not what is meant by the term when it used in the context of, say, BI tools or a plain old spreadsheet. These applications are used to manipulate the contents of a data store, although desktop tools such as Microsoft’s BI can also be used to produce derived structures of source data in a given scenario or view.

Here is the query.


It produces a 24-row result.

ID  (PK)
post_title text
post_type varchar

As you can see, most of these attributes have little to do with Products. This is due to WP being a content management system. 

It is, however, extensible.

For example, many product-related attributes can be found in the wp_postmeta table.  These are stored as EAV pairs, which of course violates the 1st normal form of relational database design, but can reduce the number of tables needed to model a given domain.

This scheme also allows the addition of n-number of attributes to a custom entity; however, this flexibility results in a performance hit — which WP tries to compensate for by keeping retrieved objects in memory. It also makes the model less understandable, particularly for end users wishing to write ad hoc database queries.

One thing to note are the temporal attributes included in wp_posts.  We will soon examine them more closely, as we determine the precise usage/function of any attribute that we decide to include in our SB model.

Let’s now take a look at the Product-related attributes found in the wp_postmeta table.

Here is the query:

    q.meta_key AS 'ATTRIBUTE'
    wp_posts p
JOIN wp_postmeta q ON
    ( = q.post_id)
    post_type = 'product'

This query produces the following 35-row result.

total_sales *

 So far, so good.  We appear to have captured all of the descriptors associated by WooCommerce with a Product entity — except for Product Type, Product Category, and Product Tag.

Fortunately for us, the code for that is based on the hard work we did in the previous post.  Here is the query that returns those descriptors for every and any product in the Woo/WP database.

    P.post_name AS "Prod Name",
    b.taxonomy AS "Prod Taxnmy",
    b.description AS "Prod Taxnmy Desc",
    b.parent AS "Prod Taxnmy Parent",
    b.count AS "Prox Taxnmy Cnt",
    c.term_group AS "Prod Taxnmy Group", AS "Product Type"
    wp_posts P
INNER JOIN wp_term_relationships a ON
    (P.ID = a.object_id)
INNER JOIN wp_term_taxonomy b ON
        a.term_taxonomy_id = b.term_taxonomy_id
INNER JOIN wp_terms c ON
    (b.term_id = c.term_id)
    post_type = 'product'

And here is the result set for the simple test product we added earlier to our Sandbox WooCommerce database.

Now that we appear to have all our candidate descriptors for Product lined up, our next step will be to eliminate those bearing information that is superfluous to our analytics model.

Should any descriptors of interest turn out to be serialized, we shall also have to devise an efficient strategy to decompose these fields into data independent (i.e., portable) Product attributes. It should also be mentioned that PHP serialized/deserialized data has been proven to pose a security risk.

Our last step before creating the Product table DDL will be to run a number of manual test cases, to make sure that our new design correctly captures every possible combination of Product Type or Variation in WooCommerce.

There is still much work ahead, but we are making progress.

* Note this field is inconsistently named Also, please note that MSRP is missing.  It can be added via a custom PHP snippet, or by purchasing an expensive plug-in from WooCommerce.  We will of course reflect this important attribute in our DSS schema.