For anyone interested in what this cool-sounding BI company is doing lately, here are my quick and dirty notes (which I did not put through the prettify edit process), taken as the presentation went along. The presentation was kind of breezy and high level, and both presenters sounded quite knowledgeable about the subject, which was hyped as THE DEATH OF THE DEATH STAR MODEL. The questions at the end sounded rather entry-level, and the two big questions that were of most interest to me were not touched upon at all:
- how do business end users perform ad-hoc querying?
- how does this architecture guarantee verifiable data integrity?
Bottom Line: There was some hand waving at the end of the presentation about Machine Learning doing away with schema analysis and about how Apache Spark (used from Python) solves all the difficult stuff almost automatically, but then yes, you still need ETL analysis. So I was left unclear as to what this product is exactly from a technical Data Modeling / Data Analysis perspective.
+++ Here are my notes, uncensored (any mistakes are mine, not the presenters'!) and unedited. The presenters were the founder of the company @Claudia_Imhoff @incorta and Matthew Halliday, chief evangelist and head of product development. (I took screenshots during the presentation, which you can see at the bottom of these notes, and I am also going to read their white paper. I would definitely be interested in learning more about their product.)
And without further ado, the notes….
Death of Star Schema-less: the new world order is that of
the 3 legged stool (see pic below)
Now users can ask impromptu questions vs canned ones
more complex questions
no more indexes!!!! let us be agile! incorta!
matthew halliday will come on after dr claudia’s presentation
which started off as a high level history of data warehousing 101 in 5-10 minutes
(claudia has obviously personally gone through many of the early phases of DW so knows what that was like)
etl is no longer necessary
how do users query data?
in a big company, if i want to understand product profitability, it is a year
to implement and costs over 1 million dollars
etl / star schemas designed around specific
deterministic or, rather, pre-determined questions
new questions take a long time to implement (personal note: not sure I agree that SQL is predetermined: rather, I would say that with a good model there are many things it does well in terms of ad hoc)
data enrichment brings in value
will and can answer questions such as: how are my best vendors?
real time aggregations of data no etl no star
schema. it is dead now because it is rigid (note: really? is adding another dimension to some fact entity that inflexible and hard?)
what’s needed is aggregation on the fly of millions of records
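(editor-style aside: to make "aggregation on the fly" concrete for myself, here is a minimal sketch in plain Python. It is not how Incorta does it; the `orders` rows, the field names and the `aggregate` helper are all made up for illustration. The point is only that the grouping is chosen at question time, over the raw rows, instead of being baked into a pre-built summary table.)

```python
from collections import defaultdict

# Hypothetical raw order lines, kept in the shape they arrive in (no star schema).
orders = [
    {"vendor": "Acme", "product": "bolt", "revenue": 120.0, "cost": 80.0},
    {"vendor": "Acme", "product": "nut", "revenue": 60.0, "cost": 45.0},
    {"vendor": "Bolt Co", "product": "bolt", "revenue": 200.0, "cost": 150.0},
]


def aggregate(rows, group_key, measure):
    """Sum `measure(row)` per `group_key` at query time, with no pre-built summary."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += measure(row)
    return dict(totals)


def profit(row):
    return row["revenue"] - row["cost"]


# Two different "impromptu" questions answered from the same raw rows.
profit_by_vendor = aggregate(orders, "vendor", profit)
profit_by_product = aggregate(orders, "product", profit)
```

The obvious catch is that this scans every row per question, which is exactly where the "millions of records" claim has to be earned by the engine.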
who are the biggest users of “data collectors”?
who engages with analytics?
oracle EBS, which somehow is related (as I understood it) to Incorta (though it was just a passing reference so I could be wrong!)
it was emphasized that this is not a virtualization only product
but data can be brought in the shape it comes in
Achilles’ heel of 3NF is the join
3NF does not scale
again, no to the notion that this is only a data visualization tool and pretty charts
but ability to drill down into data quickly
1.1 billion row tables that mirror prod env
data analysis projects go from 1 year to 10 weeks
no more data schema but business schemas (hmmm…… marketing speak: how are these views created?)
such as revenue view: they don't care where data
comes from .. not clear how different representations of the same thing are resolved, if at all
50 questions queued up after presentation, only a few answered
1st question sounded like a joke… now that
programmers have more free time on their hands, what should they
do with it? answer: get into ML and
predictive analytics and add data scientist and
prescriptive analytics to your skillset
2nd question: how do you handle the culture shift from
star to non star schema
answer: your job is secure, but it is going to
take some time to move the star schemas in the
new env and some star schemas for some reasons
will never be moved. you have to change your
value: your value is now not to program but to bring
the business value of answering questions. this will
make you more marketable and give you agility.
matthew says this is not a threat but great
… then some more blah
BEST QUESTION: is this a schema on read approach? (which I sometimes think of as the “kicking the can down the road” model)
schema on read = star schema-less
schema on read = data lakes
we need data scientists to analyze the lakes
so it can be a data warehouse and a data lake
(personal observation: the answer to this important, crucial question was definitely hand waving… this could have been the heart of a live demo, showing Incorta in action with what I understand is some kind of NLP query interface)
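(for readers who have not met the term: a minimal sketch of what "schema on read" means, in plain Python. Nothing here comes from the presentation; the sample records, the field names and the `read_with_schema` helper are my own assumptions. The raw lines are stored untouched, and types, defaults and field-name reconciliation are only decided when somebody reads them.)

```python
import json

# Hypothetical raw records, landed as-is; note the inconsistent fields.
raw_lines = [
    '{"id": 1, "amount": "19.90", "region": "EU"}',
    '{"id": 2, "amount": "5", "country": "US"}',
    '{"id": 3}',
]


def read_with_schema(lines):
    """Apply types and defaults at read time instead of at load time."""
    for line in lines:
        record = json.loads(line)
        yield {
            "id": int(record["id"]),
            "amount": float(record.get("amount", 0.0)),
            # Reconciling "region" vs "country" is a modeling decision made here, by the reader.
            "region": record.get("region") or record.get("country") or "UNKNOWN",
        }


rows = list(read_with_schema(raw_lines))
```

Every reader has to repeat (or share) those decisions, which is why I think of it as kicking the can down the road.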
… next question
is your product just data virtualization?
ans. a data warehouse relies on queries running against a data store
system and creating visualized answer sets
so going against an ERP system will take hours to
come back with something meaningful and
it will slow down operational system
so data virtualization is appealing but non
question: how do you maintain data start and end dates of
relevance? (this is another good one: it gets at a temporal data engine such as those now found in MariaDB 10.3)
answer: the way we do that is bring in a
baseline then bring in deltas then maintain
versions (temporal filters, i.e. versioned tables for example) (note: this sounds pretty traditional)
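(to show why this sounds traditional to me, here is a minimal sketch of baseline plus deltas plus versions in plain Python. It is my own illustration, not Incorta's mechanism; the `valid_from` / `valid_to` columns, the helper names and the sample prices are assumptions.)

```python
from datetime import date

OPEN_END = date.max  # marks the currently valid version of a key


def load_baseline(rows, loaded_on):
    """Turn an initial snapshot {key: value} into open-ended versions."""
    return [
        {"key": k, "value": v, "valid_from": loaded_on, "valid_to": OPEN_END}
        for k, v in rows.items()
    ]


def apply_delta(history, delta, loaded_on):
    """Close the current version of each changed key and append the new one."""
    for key, value in delta.items():
        for version in history:
            if version["key"] == key and version["valid_to"] == OPEN_END:
                version["valid_to"] = loaded_on
        history.append(
            {"key": key, "value": value, "valid_from": loaded_on, "valid_to": OPEN_END}
        )
    return history


def snapshot(history, when):
    """Temporal filter: the values that were valid on the date `when`."""
    return {
        v["key"]: v["value"]
        for v in history
        if v["valid_from"] <= when < v["valid_to"]
    }


history = load_baseline({"P1": 10.0, "P2": 20.0}, date(2018, 1, 1))
apply_delta(history, {"P1": 12.0}, date(2018, 6, 1))
before_change = snapshot(history, date(2018, 3, 1))
after_change = snapshot(history, date(2018, 7, 1))
```

This is essentially a type 2 slowly changing dimension, i.e. the kind of thing star schema people have been doing for a long time.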
last question: how do you manage master data
when data comes from disparate sources
answer: 3 large erp systems were brought into
incorta by geographic regions so this was no problem
in that use case
data resides in separate schemas and you can join
so how do you join across disparate data sets?
well python spark is used to get rid of these problems
and enrich the results (personal note: this to me sounded like an unconvincing buzzword answer, but I have no experience with python spark so reserve judgement)
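(what I would have liked to hear is something like the following. This is a plain Python sketch of my own, not PySpark and not anything shown in the presentation; the two ERP extracts, the `crosswalk` table and the `conform` helper are hypothetical. The hard part of master data is the crosswalk itself, and no engine produces that automatically.)

```python
# Hypothetical extracts from two regional ERP systems with different id fields.
erp_emea = [
    {"cust_id": "E-100", "revenue": 500.0},
    {"cust_id": "E-200", "revenue": 50.0},
]
erp_amer = [
    {"customer": "A-7", "revenue": 300.0},
    {"customer": "A-9", "revenue": 10.0},  # deliberately missing from the crosswalk
]

# Hypothetical crosswalk from (source, local id) to a master customer id.
crosswalk = {
    ("emea", "E-100"): "CUST-1",
    ("amer", "A-7"): "CUST-1",
    ("emea", "E-200"): "CUST-2",
}


def conform(rows, source, id_field):
    """Attach the master id (or None) to each row of one source."""
    for row in rows:
        master_id = crosswalk.get((source, row[id_field]))
        yield {"master_id": master_id, "source": source, "revenue": row["revenue"]}


combined = list(conform(erp_emea, "emea", "cust_id")) + list(
    conform(erp_amer, "amer", "customer")
)

revenue_by_customer = {}
unmatched = []
for row in combined:
    if row["master_id"] is None:
        unmatched.append(row)  # these rows are the real master data problem
        continue
    revenue_by_customer[row["master_id"]] = (
        revenue_by_customer.get(row["master_id"], 0.0) + row["revenue"]
    )
```

Somebody still has to build and maintain that mapping, and decide what happens to the unmatched rows.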
okay that was it.
and now the pics. (there was a downloadable paper, but it was mostly marketese.) Note the Clint Eastwood slide, in the week Sondra Locke passed away!