Designing the Product Table

wordpress data model
The yellow wp_posts row highlights how the Widget-1 Product custom post type anchors the WooCommerce database schema

In the previous post, we initially considered candidate attributes for Product table by analyzing the output of the Product Export facility provided by WooCommerce.

We used the Woo Add a Product panel and saw how WooCommerce divides Products into four (4) major types (there are variations on this, which we’ll get into as we continue our analysis).

Earlier on in this series, I showed you a schema that I designed to support our Price Analytics database — the development of which is the ultimate objective of this series, although we might take interesting detours along the the way.

This schema contains a Product entity.  In the SB (sandbox) conceptual data model, I showed the relationship between Product and other entities of interest by using primary and foreign keys only.

Now I’m going to focus on the Product entity, and add the attributes that will allow us to write the DDL for our Product table.

A reasonably efficient way of doing this is to query the WP database information schema, and list out all the attributes that are related to Product, which we already know is anchored by a custom post type row in the wp_posts table.  

I’ll only retain attributes that describe the notion of a Product, and discard the synthetic ones WooCommerce uses in modeling a Product (for example, attributes that relate to the Product visibility in the Woo catalog).

Let’s start off by analyzing how WooCommerce stores a Product instance as a custom post row in the wp_posts table.

In passing, let me make an obvious point: data analysis here involves examining how Woo has structured its schema by (1) leveraging the underlying framework of WP core tables as a base, and, (2) creating a not insignificant number of custom or supplemental tables..

Data modeling in this series refers to schema analysis, and not what is meant by the term when it used in the context of, say, BI tools or a plain old spreadsheet. These applications are used to manipulate the contents of a data store, although desktop tools such as Microsoft’s BI can also be used to produce derived structures of source data in a given scenario or view.

Here is the query.

SHOW COLUMNS
FROM
    `wp_posts`

It produces a 24-row result.

ID  (PK)
post_author
post_date 
post_date_gmt
post_content
post_title text
post_excerpt
post_status
publish
comment_status
ping_status
post_password
post_name
to_ping
pinged
post_modified
post_modified_gmt
post_content_filtered
post_parent
guid
menu_order
post_type varchar
post_mime_type
comment_count

As you can see, most of these attributes have little to do with Products. This is due to WP being a content management system. 

It is, however, extensible.

For example, many product-related attributes can be found in the wp_postmeta table.  These are stored as EAV pairs, which of course violates the 1st normal form of relational database design, but can reduce the number of tables needed to model a given domain.

This scheme also allows the addition of n-number of attributes to a custom entity; however, this flexibility results in a performance hit — which WP tries to compensate for by keeping retrieved objects in memory. It also makes the model less understandable, particularly for end users wishing to write ad hoc database queries.

One thing to note are the temporal attributes included in wp_posts.  We will soon examine them more closely, as we determine the precise usage/function of any attribute that we decide to include in our SB model.

Let’s now take a look at the Product-related attributes found in the wp_postmeta table.

Here is the query:

SELECT DISTINCT
    q.meta_key AS 'ATTRIBUTE'
FROM
    wp_posts p
JOIN wp_postmeta q ON
    (p.id = q.post_id)
WHERE
    post_type = 'product'
ORDER BY
    q.meta_key

This query produces the following 35-row result.

_backorders
_crosssell_ids
_default_attributes
_download_expiry
_download_limit
_downloadable
_edit_last
_edit_lock
_height
_length
_low_stock_amount
_manage_stock
_price
_product_image_gallery
_product_version
_purchase_note
_regular_price
_sale_price
_sale_price_dates_from
_sale_price_dates_to
_sku
_sold_individually
_stock
_stock_status
_tax_class
_tax_status
_thumbnail_id
_upsell_ids
_virtual
_wc_average_rating
_wc_rating_count
_wc_review_count
_weight
_width
total_sales *

 So far, so good.  We appear to have captured all of the descriptors associated by WooCommerce with a Product entity — except for Product Type, Product Category, and Product Tag.

Fortunately for us, the code for that is based on the hard work we did in the previous post.  Here is the query that returns those descriptors for every and any product in the Woo/WP database.

SELECT
    P.post_name AS "Prod Name",
    b.taxonomy AS "Prod Taxnmy",
    b.description AS "Prod Taxnmy Desc",
    b.parent AS "Prod Taxnmy Parent",
    b.count AS "Prox Taxnmy Cnt",
    c.term_group AS "Prod Taxnmy Group",
    c.name AS "Product Type"
FROM
    wp_posts P
INNER JOIN wp_term_relationships a ON
    (P.ID = a.object_id)
INNER JOIN wp_term_taxonomy b ON
    (
        a.term_taxonomy_id = b.term_taxonomy_id
    )
INNER JOIN wp_terms c ON
    (b.term_id = c.term_id)
WHERE
    post_type = 'product'

And here is the result set for the simple test product we added earlier to our Sandbox WooCommerce database.

Now that we appear to have all our candidate descriptors for Product lined up, our next step will be to eliminate those bearing information that is superfluous to our analytics model.

Should any descriptors of interest turn out to be serialized, we shall also have to devise an efficient strategy to decompose these fields into data independent (i.e., portable) Product attributes. It should also be mentioned that PHP serialized/deserialized data has been proven to pose a security risk.

Our last step before creating the Product table DDL will be to run a number of manual test cases, to make sure that our new design correctly captures every possible combination of Product Type or Variation in WooCommerce.

There is still much work ahead, but we are making progress.

* Note this field is inconsistently named Also, please note that MSRP is missing.  It can be added via a custom PHP snippet, or by purchasing an expensive plug-in from WooCommerce.  We will of course reflect this important attribute in our DSS schema.