Incorta Webinar 12-12-18 Notes

data storage
Source: “Accelerating Analytics” article by Jen Underwood

For anyone interested in what this cool-sounding BI company is doing lately, here are the quick and dirty notes I took as the webinar went along (I did not put them through the prettify edit process). The presentation was kind of breezy and high level, and both presenters sounded quite knowledgeable about the subject, which was hyped as THE DEATH OF THE DEATH STAR MODEL. The questions at the end sounded rather entry-level, and the two big questions that were of most interest to me were not touched upon at all.

These being:

  1. How do business end users perform ad-hoc querying?
  2. How does this architecture guarantee verifiable data integrity?

Bottom Line: There was some hand waving at the end of the presentation about machine learning doing away with schema analysis and about how Apache Spark (via Python) solves all the difficult stuff almost automatically, but then yes, you still need ETL analysis. So I was left unclear as to what this product is exactly from a technical data modeling / data analysis perspective.

+++ Here are my notes, uncensored and unedited (any mistakes are mine, not the presenters'!). The presenters were the founder of the company @Claudia_Imhoff @incorta and Matthew Halliday, chief evangelist and head of product development. (I took screenshots during the presentation, which you can see at the bottom of these notes, and I am also going to read their white paper. I would definitely be interested in learning more about their product.)

And without further ado, the notes….

Death of the star schema, going schema-less: the new world order is that of
the 3 legged stool (see pic below)

Now users can ask impromptu questions vs canned
queries?

more complex questions

no more indexes!!!! let us be agile! incorta!

matthew halliday will come on after dr claudia’s presentation

which started off as a high level history of data warehousing 101 in 5-10 minutes

(claudia has obviously personally gone through many of the early phases of DW so knows what that was like)

etl is no longer necessary

how do users query data?

in a big company, if i want to understand product profitability, it takes a year
to implement and costs over 1 million dollars

etl / star schemas are designed around specific
deterministic or, rather, predetermined questions

new questions take a long time to implement (personal note: not sure I agree that SQL is predetermined; rather, I would say that with a good model there are many things it does well in terms of ad hoc)

data enrichment brings in value
machine learning
will and can answer questions such as how are my best vendors

real time aggregations of data, no etl, no star
schema. it is dead now because it is rigid (note: really? is adding another dimension to some fact entity that inflexible and hard?)
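On the question in that last note: a minimal sketch, using Python's built-in sqlite3 module and an entirely hypothetical toy schema (fact_sales, dim_product and dim_region are made-up names, nothing from the webinar), of what adding a dimension to a fact table involves mechanically. It says nothing about the effort at warehouse scale, where the backfill and the ETL changes are the real cost.

```python
import sqlite3

# Hypothetical toy star schema: one fact table, one existing dimension.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales (product_id INTEGER, amount REAL);
    INSERT INTO dim_product VALUES (1, 'widget'), (2, 'gadget');
    INSERT INTO fact_sales VALUES (1, 10.0), (1, 5.0), (2, 7.5);
""")

# Adding a dimension: create the dimension table, add a key column to the
# fact table, and backfill it. The backfill rule here is arbitrary and only
# exists so the example has data; in practice it comes from the source system.
con.executescript("""
    CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO dim_region VALUES (1, 'east'), (2, 'west');
    ALTER TABLE fact_sales ADD COLUMN region_id INTEGER;
    UPDATE fact_sales SET region_id = CASE WHEN amount > 6 THEN 1 ELSE 2 END;
""")

# An ad hoc question that uses the new dimension alongside the old one.
rows = con.execute("""
    SELECT r.name, p.name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_region r ON r.region_id = f.region_id
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY r.name, p.name
    ORDER BY r.name, p.name
""").fetchall()

for region, product, total in rows:
    print(region, product, total)
```

The schema change itself is three statements; whether that counts as "rigid" depends mostly on how painful the backfill and the upstream load jobs are.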

what’s needed is aggregation on the fly of millions of records

who are the biggest users of “data collectors”?

who engages with analytics?

oracle EBS, which somehow is related (as I understood it) to Incorta (though it was just a passing reference so I could be wrong!)

it was emphasized that this is not a virtualization only product

but data can be brought in the shape it comes in

Achilles’ heel of 3NF is the join

3NF does not scale

again, no to the notion that this is only a data visualization tool and pretty charts
but ability to drill down into data quickly

1.1 billion row tables that mirror the prod env

data analysis projects go from 1 year to 10 weeks

no more data schema but business schemas (hmmm…… marketing speak: how are these views created?)

such as a revenue view: they don't care where the data
comes from .. not clear how different representations of the same thing are resolved, if at all

50 questions queued up after presentation, only a few answered

1st question sounded like a joke… now that
programmers have more free time on their hands, what should they
do with it? answer: get into ML and
predictive analytics and add data science and
prescriptive analytics to your skillset

2nd question: how do you handle the culture shift from
star to non star schema

answer: your job is secure, but it is going to
take some time to move the star schemas into the
new env, and some star schemas for some reasons
will never be moved. you have to change your
value: your value is now not to program but to bring
the business value of answering questions. this will
make you more marketable and give you agility.
matthew says this is not a threat but a great
opportunity
… then some more blah

BEST QUESTION: is this a schema on read approach? (which I sometimes think of as the “kicking the can down the road” model)

ans

schema on read = star schema-less

schema on read = data lakes

we need data scientists to analyze the lakes

so it can be a data warehouse and a data lake
joined

(personal observation: the answer to this crucial question was definitely hand waving. this could have been the heart of a live demo, showing Incorta in action with what I understand is some kind of NLP query interface)
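For anyone who has not met the term: schema on read means the raw records are stored in whatever shape they arrive, and field names and types are only imposed when somebody queries them. A minimal sketch in plain Python, with made-up records and a hypothetical read_with_schema helper; it is a generic illustration of the idea and not a description of how Incorta works.

```python
import json

# Raw records land as-is in the "lake" (hypothetical, inconsistent shapes).
raw_lake = [
    '{"vendor": "acme", "amount": "120.50", "ts": "2018-12-01"}',
    '{"vendor": "globex", "amt": 80}',
    '{"supplier": "acme", "amount": 30}',
]

def read_with_schema(line):
    """Apply a schema at query time: reconcile field names, coerce types."""
    rec = json.loads(line)
    vendor = rec.get("vendor") or rec.get("supplier")
    amount = rec.get("amount", rec.get("amt"))
    return {"vendor": vendor, "amount": float(amount)}

# The "query": total amount per vendor, with the schema applied on the way out.
totals = {}
for line in raw_lake:
    row = read_with_schema(line)
    totals[row["vendor"]] = totals.get(row["vendor"], 0.0) + row["amount"]

print(totals)
```

All the reconciliation work (vendor vs supplier, amount vs amt, string vs number) still has to be done by someone; it has just moved from load time to read time, which is the can being kicked down the road.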

… next question

is your product just data virtualization?

ans. a data warehouse relies on queries running against a data store
system and creating visualized answer sets

so going against an ERP system will take hours to
come back with something meaningful and
it will slow down operational system

so data virtualization is appealing but non
interactive

question: how do you maintain data start and end date of
relevance? (this is another good one: it gets at temporal data features such as the system-versioned tables now found in MariaDB 10.3)

answer: the way we do that is bring in a
baseline, then bring in deltas, then maintain
versions (temporal filters, i.e. versioned tables for example). (note: this sounds pretty traditional)
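The baseline-plus-deltas approach in that answer can be sketched in a few lines of plain Python. This is the traditional valid-from / valid-to pattern under assumed names (apply_delta, as_of_view and the sku keys are all hypothetical); the presenters did not show how Incorta implements it.

```python
from datetime import date

OPEN_END = date.max  # marks the version that is currently valid

def apply_delta(history, key, value, as_of):
    """Close the current version of `key` (if any) and append a new one."""
    for row in history:
        if row["key"] == key and row["valid_to"] == OPEN_END:
            row["valid_to"] = as_of
    history.append({"key": key, "value": value,
                    "valid_from": as_of, "valid_to": OPEN_END})

def as_of_view(history, when):
    """Temporal filter: the value of each key as it stood on `when`."""
    return {r["key"]: r["value"] for r in history
            if r["valid_from"] <= when < r["valid_to"]}

history = []
# Baseline load.
apply_delta(history, "sku-1", 9.99, date(2018, 1, 1))
apply_delta(history, "sku-2", 4.50, date(2018, 1, 1))
# A later delta changes one row; the old version is kept, not overwritten.
apply_delta(history, "sku-1", 11.49, date(2018, 6, 1))

print(as_of_view(history, date(2018, 3, 1)))
print(as_of_view(history, date(2018, 12, 12)))
```

Every change adds a row instead of updating one in place, so any past date can be queried with the same filter.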

last question: how do you manage master data
when data comes from disparate sources

answer: 3 large erp systems were brought into
incorta by geographic regions so this was no problem
in that use case

data resides in separate schemas and you can join
across sinks

so how do you join across disparate data set
schemas?

well, python spark (PySpark) is used to get rid of these problems
and enrich the results (personal note: this to me sounded like an unconvincing buzzword answer, but I have no experience with PySpark so I reserve judgement)
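Whatever engine does the work, the master data question comes down to how local keys from each source get mapped to one shared identity. One common, engine-independent way to do that is a crosswalk table. The sketch below uses Python's built-in sqlite3 with two attached in-memory databases standing in for separate regional schemas; every name in it (emea, apac, xwalk, the customer codes) is hypothetical, and it is not a claim about what Incorta or Spark does under the hood.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Two hypothetical regional ERP extracts, kept in separate schemas.
con.execute("ATTACH DATABASE ':memory:' AS emea")
con.execute("ATTACH DATABASE ':memory:' AS apac")
con.executescript("""
    CREATE TABLE emea.orders (cust_code TEXT, amount REAL);
    CREATE TABLE apac.orders (customer_no INTEGER, amount REAL);
    INSERT INTO emea.orders VALUES ('C-17', 100.0), ('C-22', 40.0);
    INSERT INTO apac.orders VALUES (9001, 60.0);

    -- Crosswalk: maps each source's local key to one master customer id.
    CREATE TABLE xwalk (source TEXT, local_key TEXT, master_id TEXT);
    INSERT INTO xwalk VALUES
        ('emea', 'C-17', 'M1'), ('emea', 'C-22', 'M2'), ('apac', '9001', 'M1');
""")

# Join across the two schemas through the crosswalk, then aggregate
# per master customer.
rows = con.execute("""
    SELECT x.master_id, SUM(o.amount)
    FROM (
        SELECT 'emea' AS source, cust_code AS local_key, amount
        FROM emea.orders
        UNION ALL
        SELECT 'apac', CAST(customer_no AS TEXT), amount
        FROM apac.orders
    ) o
    JOIN xwalk x ON x.source = o.source AND x.local_key = o.local_key
    GROUP BY x.master_id
    ORDER BY x.master_id
""").fetchall()

print(rows)
```

The join itself is easy; the hard part, which the answer skipped, is who builds and maintains the crosswalk.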

okay that was it.

and now the pics. (there was a downloadable paper, but it was mostly marketese) Note the Clint Eastwood slide, in the week Sondra Locke passed away!