I’ve been in the data world for quite a long time now and have been a technology enthusiast almost all my life. I’ve lived in the world of relational databases and have been learning a lot about all kinds of new “databases”. There are so many terms that your brain can hurt after a while, and they just keep coming. Graph, Redshift, JSON, Big Data, In Memory, Hadoop, Hive, HBase, Mongo, Redis, Columnar, Spark, Cassandra – just to name a few. Wait another week and you’ll find out about a new entry into the market with a different flavor of features.
We know there are lots of different types of data stores that are used for different purposes and different environments. The world has changed from having one general-purpose relational database you put all your data in to having many options. We always knew that persistent data was lurking in our systems in files – but you thought the majority of the important data was stored in a set of identifiable relational databases. And relational databases, to some degree, are self-documenting in that the object names (tables, columns) and relationships alone can help you understand what data you have.
But the trend toward newer, more varied technologies can lead us to a place where nothing is documented or governed anymore. Even though I’ve been steeped in the relational world, I’m very interested in understanding and using these new technologies where they make sense. What I don’t agree with is the idea that because they’re not relational, there is suddenly no need for governance of our persistent data. A phrase that I found myself saying is that “The auditors or the government don’t care how you store the data” – in that they will expect you to govern it whether it’s in a traditional relational database or a document or hierarchical store.
Many companies are responsible for maintaining and safeguarding other people’s information. Personally, if you have my Social Security Number, I expect you to protect it no matter how you store it. If it gets breached, the impact to me is the same regardless of whether you used SQL Server or HBase. To me it’s all data – no matter what technology you use. It’s what the data is that drives what type of governance to use – not how it’s stored.