Reflection 1: The Six Principles of Modern Data Architecture
The first article I want to reflect on lists the six principles of a modern data architecture. I almost dismissed it as a list of banal bullet points we have seen many times over. However, it was the last bullet point (#6 Eliminate Data Copies and Movement) that grabbed my attention.
I’ve always focused on establishing single points-of-entry (into the enterprise) for critical data entities. Once these single points-of-entry are established, a robust SOA can allow the rest of the enterprise to request the data, rather than creating redundant data capture capabilities which always lead to data inconsistencies throughout the enterprise.
My thinking also assumes that systems requesting data from the point-of-entry system(s) will store the data locally for performance purposes. Even though the data will be replicated, at least it will be single-sourced and consistent. It is that very assumption that made principle #6 so compelling. Eliminating data redundancies and movement is a great vision, but has technology progressed enough to make it a data architecture principle that all enterprises can adopt?
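To make that assumption concrete, here is a minimal sketch of the pattern in Python. Everything in it is hypothetical: the customer entity, the names, and the stand-in for the actual SOA call.

```python
# Minimal sketch of the "single point-of-entry plus local copy" pattern.
# fetch_from_point_of_entry is a stand-in for a real SOA request; the
# customer entity and all names here are hypothetical.

_local_cache = {}  # the local replica kept purely for read performance

def fetch_from_point_of_entry(customer_id: str) -> dict:
    """Stand-in for a request to the system of entry (e.g., a web service)."""
    return {"id": customer_id, "name": "Jane Example", "tier": "gold"}

def get_customer(customer_id: str) -> dict:
    """Return the single-sourced customer record, caching it locally.

    The cache is the redundant copy my assumption tolerates: replicated,
    but at least single-sourced and consistent at the time of capture.
    """
    if customer_id not in _local_cache:
        _local_cache[customer_id] = fetch_from_point_of_entry(customer_id)
    return _local_cache[customer_id]

print(get_customer("C-1001"))
```

Principle #6, taken literally, would remove even that local cache: every consumer would read from the point-of-entry system directly, which is exactly why I question whether technology has progressed far enough.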
The same observation was noted in Andrew Richard’s blog posting from September 11, 2016, where he notes the usual and customary approach of replicating data for performance purposes, even when a robust SOA is available to share the data. Mr. Richard goes on to ask whether there are any good reference architectures that address the replication problem. The short write-up for principle #6 mentions a platform called Hadoop, which I had never heard of before and which is where my second reflection heads.
Reflection 2: Running Operational Applications (OLTP) on Hadoop...
After Googling “Hadoop” and reading various articles, I quickly learned that it is an open-source big data platform. I will admit that I am not well versed in the big data trend. My high-level understanding is that it is largely pertinent to data warehousing and analytics. However, for the question raised in Reflection 1, what I’m really interested in are the traditional relational databases used by OLTP applications. Can Hadoop (along with a compatible RDBMS) and big data reach beyond the data warehousing and analytics world and solve the replication problem in the OLTP world? Can a business actually have a single “big data” OLTP database serving much, or maybe even all, of the enterprise?
Those questions brought me to the second article I want to reflect on. This article asks and answers four different questions about Hadoop. The first question sets the stage quite well...
Hadoop is primarily known for running batch based, analytic workloads. Is it ready to support real-time, operational and transactional applications?
This is exactly what I want to understand, so let’s look at the second question which asks how enterprises can take advantage of Hadoop.
How can enterprises, specifically in the Retail industry, take advantage of a Hadoop RDBMS?
The author responds to this question with a scalability and performance answer, stating that enterprises can leverage Hadoop and Splice Machine (a Hadoop-compatible RDBMS) to provide scalability and performance for extremely large RDBMSs. So, scalability and performance are examined, while eliminating data replication and movement is not. My hope that this article would address principle #6 from my Reflection 1 article now seems very much in doubt (even though principle #6 specifically mentions Hadoop). Let’s take a look at the third question.
Can we run mixed workloads – transactional (OLTP) and analytical (OLAP) – on the same Hadoop cluster?
Now I see how this article addresses principle #6 and why Hadoop was mentioned in the first place. Recall that principle #6 is titled “Eliminate Data Copies and Movement”. I immediately interpreted that as a call to eliminate redundancies between OLTP databases and the movement of redundant data between them. However, this third question points to the more likely intent: eliminating data redundancies and movement between OLTP and OLAP environments. I can’t be certain that is what the first article had in mind with principle #6, but I believe it is.
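To picture what a mixed workload means in practice, here is a small, hypothetical sketch. I use Python’s built-in sqlite3 purely as a stand-in so the example runs anywhere; in the article’s scenario, a Hadoop RDBMS such as Splice Machine would play the database role at far greater scale.

```python
# Illustrative only: one database serving both OLTP and OLAP queries.
# sqlite3 is a stand-in so this sketch runs anywhere; in the article's
# scenario a Hadoop RDBMS (e.g., Splice Machine) fills this role at scale.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL)")

# OLTP side: small, latency-sensitive transactional writes.
cur.execute("INSERT INTO orders VALUES (?, ?, ?)", (1001, 42, 59.95))
cur.execute("INSERT INTO orders VALUES (?, ?, ?)", (1002, 42, 12.50))
conn.commit()

# OLAP side: a scan-heavy aggregate over the very same table, with no ETL
# copy into a separate warehouse, which is what principle #6 asks for.
for customer_id, total in cur.execute(
    "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id"
):
    print(customer_id, total)

conn.close()
```

The appeal is that the analytical query reads the live transactional table, so there is no nightly ETL job copying data into a separate OLAP store.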
The fourth question simply explores the return on investment of implementing a Hadoop/Splice Machine solution for massive RDBMSs that mix OLTP and OLAP capabilities.
Even though article 1 was not specific about what it meant by principle #6, and even though article 2 did not actually explore the possibility of an enterprise one day building a single enterprise-wide OLTP database, I can’t help but think the enabling technology exists with Hadoop and Splice Machine; the biggest challenge with such a crazy concept will likely be organizational.
Reflection 3: Do You Know What Your Company’s Data Is Worth?
The third article I want to reflect on shifts gears to a different topic: treating data as an asset. Among the Lesson 3: Enterprise Data Architecture reading material for EA-874 is an article from Gartner titled “Managing Information as an Asset: Enterprise Architects, Beware!”, which describes the “information as an asset” concept as treating your data with care, ensuring consistency, accuracy, accessibility, utility, safety, and transparency. Upon reading the title, I wanted to include the article among my three reflections. Upon reading the article itself, I no longer felt it was worth reflecting on: the idea that businesses need to treat their data with care, like an asset, seemed too obvious and not very interesting.
However, just a few days later I came across the article this reflection is named for. It actually does talk about assigning a dollar value to a company’s data, which is a new and interesting concept, and one I want to reflect on here.
My initial position was that a data valuation doesn’t belong on the balance sheet because its value is already reflected in intangible asset categories such as goodwill, brand equity, and intellectual capital. However, what I learned from the article is that valuing data is not about adding new assets to a company’s balance sheet; it is about taking the intangible assets valuation and breaking it down further, assigning a portion of that valuation to the company’s data.
The important realization for me is that the total value of intangible assets will still be calculated in the traditional ways. Here is one very simple approach:
Intangible assets = market capitalization - (year-end sales + tangible assets)
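Plugging in purely hypothetical numbers makes the arithmetic concrete: a company with a $50B market capitalization, $15B in year-end sales, and $20B in tangible assets would have $50B - ($15B + $20B) = $15B in intangible assets. If, say, one fifth of that were judged attributable to data, the company’s data would carry a $3B valuation, and the balance sheet totals would not change at all.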
The article isn’t saying businesses are now expected to mess with their balance sheets and inflate them with valuations of the data they possess. It’s simply saying that businesses need to take their intangible assets valuation and figure out how much of that valuation is attributable to their data.
So why is this important? The answer is risk management. The higher the data valuation, the greater the exposure to such things as cyber attacks and system outages (such as Delta Air Lines’ recent outage). Once investors, business leaders, and IT leaders have a clearer picture of that risk exposure, it stands to reason that better decisions will be made about safeguarding a company’s data.