
2015-05-07

PolyGlot Management

PolyGlot Data Management

 

What is the data stored in? 


A traditional database administrator is familiar with a set of utilities, SQL, and a stack of tools specific to the RDBMS that they are supporting.


For SQL Server alone that means SQL Server, SQL Server Analysis Services, SQL Server Integration Services, SQL Server Reporting Services, and SQL Server Management Studio.

Not to mention the disk partitioning, SAN layout, RAID levels, and such that they need to know.

Oracle has its own set of peculiarities, as do DB2, MySQL, and others.

There is a new kid on the block.

NoSQL.

Martin Fowler does a phenomenal job introducing the topic of NoSQL in this talk.

While most people do not get into many of the details of how their data is stored and structured, in the world of #DataOps how and where the data is stored becomes incredibly important.

On any given day managing a PolyGlot environment, the command line tools used could be sqlplus, bcp, hdfs, spark-shell, and xd-shell.
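As a rough illustration of that spread, here is a minimal Python sketch that checks which of the command line tools listed above are installed on a given machine. The tool names come from the list; the engine labels and the function name available_tools are my own assumptions for the example, not part of any product.

```python
import shutil

# Tool names are taken from the list above. The engine labels are an
# assumption about what each tool is typically used with.
TOOLS = {
    "sqlplus": "Oracle",
    "bcp": "SQL Server",
    "hdfs": "Hadoop",
    "spark-shell": "Spark",
    "xd-shell": "Spring XD",
}


def available_tools(tools=TOOLS):
    """Return {tool: path or None} for each CLI, looked up on PATH."""
    return {name: shutil.which(name) for name in tools}


if __name__ == "__main__":
    for name, path in available_tools().items():
        status = path if path else "not found on PATH"
        print(f"{name:12} ({TOOLS[name]}): {status}")
```

Run on an admin workstation, this gives a quick picture of how many of the platforms can actually be reached from that one seat.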

Each of these data storage engines has different requirements. The manner in which databases are clustered differs considerably between RDBMS and NoSQL platforms.

Putting Hadoop or Cassandra on a SAN deserves careful thought before doing so.

Likewise, creating an environment with isolated disks for an Oracle cluster may not be the correct solution.


Managing a PolyGlot environment is by itself a challenge. Sometimes this requires a team; sometimes it requires a lightly shaded purple squirrel that is equally at home at the command line, the SQL prompt, a REPL, a console, a white board, an AWS management browser, or a management console like Grid Control, Ops Center, or Cloudera Manager.

Working with this variety of data, and with the variety of teams and people that need access to it, or even to a subset of it, requires its own level of understanding of the data, of data management, and of how to make the data contribute to the bottom line of the organization.

Are your purple squirrels only on the data science team? Probably not.



2 comments:

  1. Hi Doug,

    Thank you for this insightful post. I completely agree with you: management and skill sets at the NoSQL layers of data storage are generally overlooked by those who purchase the solution and bring it in. Today, there is still a big need for this kind of business user "education" about just what the solution IS and what it is NOT, and of course what it requires in order to make it work properly for the business.

    Cheers mate,
    Dan Linstedt

  2. Great comment, Dan! While it is useful to "learn as you go" at times, this should be built into the schedule, or at least into the awareness of whoever wants the system built. Training comes at a premium, not because of the cost of the training itself, but because of the time involved in applying those lessons learned to the real world.

    Doug
