Recently I mentioned to a few people that I was doing an
analysis on a Data Structure Graph. The immediate response was:
What is a Data Structure Graph?
The image shows a Data Structure Graph rendered for a fictional organization, where the applications are the dots (Nodes) and the data transfers between the applications are the lines (Edges).
A Definition
A Data Structure Graph is a group of atomic entities that
are related to each other, stored in a repository, then moved from one
persistence layer to another, rendered as a Graph.
A group of atomic entities.
An atomic entity is an entity that cannot be broken down any
further. This Entity (Vertex) could be an application, a table in a
database, or a document stored on a file system. These Entities are related to
each other through some mechanism (Edge). The mechanism could be a simple
foreign key or the transfer of a subset of data.
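To make the vocabulary concrete, here is a minimal sketch of entities and the mechanisms that relate them, written in plain Python. The class names, the field names, and the sample entities (Billing, Reporting, orders, customers) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    """An atomic entity: an application, a table, or a document."""
    name: str
    kind: str  # assumed labels: "application", "table", "document"


@dataclass(frozen=True)
class Edge:
    """The mechanism that relates one entity to another."""
    source: Vertex
    target: Vertex
    mechanism: str  # assumed labels: "foreign_key", "data_transfer"


# Hypothetical entities
orders = Vertex("orders", "table")
customers = Vertex("customers", "table")
billing = Vertex("Billing", "application")
reporting = Vertex("Reporting", "application")

edges = [
    Edge(orders, customers, "foreign_key"),     # orders refers to customers
    Edge(billing, reporting, "data_transfer"),  # a subset of data is copied
]

# The vertex set is simply every entity that appears on an edge.
vertices = {v for e in edges for v in (e.source, e.target)}
```

Freezing the dataclasses makes the entities hashable, so they can be collected into a set without duplicates.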
Related to each other.
For two entities to have a relationship means that one
entity refers to another entity. In a relational database, this is what
is called a foreign key. I have also seen cases where documents are related to
each other by having some type of business key in the title of the document,
along with a prefix or suffix indicating what type of document it represents.
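The document case can be sketched in a few lines. The naming convention below (a document type prefix, a dash, then a numeric business key) is an assumption made up for this example, as are the file names. Real conventions vary, so the pattern would need to be adapted.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical naming convention: <TYPE>-<business key>.<extension>
TITLE_PATTERN = re.compile(r"^(?P<doc_type>[A-Z]+)-(?P<key>\d+)\.\w+$")

# Hypothetical document titles found on a file server
titles = ["PO-1001.pdf", "INV-1001.pdf", "INV-1002.pdf", "notes.txt"]

# Group the titles by the business key embedded in each title.
by_key = defaultdict(list)
for title in titles:
    match = TITLE_PATTERN.match(title)
    if match:  # titles that do not follow the convention are skipped
        by_key[match["key"]].append(title)

# Documents that share a business key are related: one edge per pair.
document_edges = [
    pair
    for group in by_key.values()
    for pair in combinations(sorted(group), 2)
]
```

Here the purchase order and the invoice for key 1001 end up connected, while the lone invoice for 1002 and the unstructured notes file produce no edges.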
Stored in a repository.
I use the term repository here because it is no longer the
case that all data is stored in a relational database. In the document case, the
repository could simply be a file server holding the documents. In the
application case, any application that an enterprise relies on for its duties
persists its data in some type of repository.
Moved from one persistence layer to another.
This is generally the portion of the definition where we make the transition
from a Level 1 to a Level 2 Data Structure Graph. In my experience, one
application is seldom sufficient to meet the needs of an entire organization,
so data has to flow from some systems to other systems. This is generally
where a Data Architect should have the most influence in an organization. This
portion of the definition covers data that has to move between applications
and then be persisted in the receiving application for some period of time,
for use-cases that were not originally designed as part of the application of
origin.
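Data moving between applications has a direction, so it can be sketched as a directed graph. The application names and transfers below are hypothetical, and the two dictionaries are just one simple way to hold the structure.

```python
from collections import defaultdict

# Hypothetical transfers: (source application, target application)
transfers = [
    ("CRM", "Billing"),
    ("CRM", "Warehouse"),
    ("Billing", "Warehouse"),
    ("Warehouse", "Reporting"),
]

# For each application, record where it sends data and where it receives from.
outbound = defaultdict(set)
inbound = defaultdict(set)
for source, target in transfers:
    outbound[source].add(target)
    inbound[target].add(source)

applications = sorted(set(outbound) | set(inbound))
for app in applications:
    print(app, "sends to", sorted(outbound[app]),
          "and receives from", sorted(inbound[app]))
```

An application with many inbound transfers is persisting data that originated elsewhere, which is exactly the situation this portion of the definition describes.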
Rendered as a Graph.
Why do I say it is rendered as a Graph? In a number of
instances where I have worked as a data architect, more time was spent
discussing the layout of a diagram than the content of the diagram. On a recent
project I set out to change the discussion. Using some of the tools available
for Graph analysis, such as Gephi, NetworkX, and iGraph, together with the tenets
of graph theory and a bit of data cleansing, I was able to gather,
consolidate, render, and present data about the overall structure and
relationships of enterprise applications to my customer in a rapid fashion.
This completely changed the conversation from the specifics of the diagram to
the manner in which things needed to be changed to get away from the hairball
they had created.
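As a rough illustration of the cleanse, consolidate, and hand-off steps (not the actual process used on that project), the sketch below normalises application names, removes duplicate transfers, counts how many transfers touch each application, and writes an edge list as CSV. The raw rows, the clean function, and its upper-casing rule are assumptions. The Source and Target column headers are the ones I would expect a tool such as Gephi to look for when importing an edge list, but check the import settings of whichever tool you use.

```python
import csv
import io
from collections import Counter

# Hypothetical raw inventory rows, as they might arrive from several teams
raw_rows = [
    ("crm ", "Billing"),
    ("CRM", "billing"),
    ("Billing", "Warehouse"),
    ("CRM", "Warehouse"),
    ("Warehouse", "Reporting"),
]


def clean(name):
    """Normalise an application name so spelling variants collapse together."""
    return name.strip().upper()


# Consolidate: a cleaned, de-duplicated, sorted edge list.
edge_list = sorted({(clean(s), clean(t)) for s, t in raw_rows})

# Degree: the number of distinct transfers touching each application.
degree = Counter()
for source, target in edge_list:
    degree[source] += 1
    degree[target] += 1

# Write the edge list as CSV text for a graph tool to render.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Source", "Target"])
writer.writerows(edge_list)
edge_csv = buffer.getvalue()
```

Sorting applications by degree is a quick first look at which systems sit in the middle of the hairball.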
I will be writing more about Data Structure Graphs as time
goes on. This blog gives us a
foundational definition of what a Data Structure Graph is. In future blogs I
will write about particular use-cases and interesting analyses done with
Data Structure Graphs.
Update: The book is out now for Kindle: Data Structure Graph Book