Recently I mentioned to a few people that I was doing an analysis on a Data Structure Graph. The immediate response was:

What is a Data Structure Graph?

The image shows a Data Structure Graph rendered for a fictional organization, where the applications are the dots (Nodes) and the data transfers between the applications are the lines (Edges).


A Definition

A Data Structure Graph is a group of atomic entities that are related to each other, stored in a repository, then moved from one persistence layer to another, rendered as a Graph.

A group of atomic entities.

An atomic entity is an entity that cannot be broken down any further. This Entity (Vertex) could be an application, a table in a database, or a document stored on a file system. These Entities are related to each other through some mechanism (Edge). The mechanism could be a simple foreign key, or the transfer of a subset of data.
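To make the Vertex and Edge vocabulary concrete, here is a minimal sketch in plain Python. The class names (`Entity`, `Relationship`) and the sample entities are hypothetical, chosen only for illustration; they are not taken from any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A Vertex: something that is not broken down any further."""
    name: str
    kind: str  # e.g. "application", "table", or "document"


@dataclass(frozen=True)
class Relationship:
    """An Edge: the mechanism relating one entity to another."""
    source: Entity
    target: Entity
    mechanism: str  # e.g. "foreign key" or "data transfer"


# Hypothetical sample entities, for illustration only.
orders = Entity("orders", "table")
customers = Entity("customers", "table")
crm = Entity("CRM", "application")
billing = Entity("Billing", "application")

edges = [
    Relationship(orders, customers, "foreign key"),
    Relationship(crm, billing, "data transfer"),
]

# The vertex set is every entity that appears on either end of an edge.
vertices = {entity for rel in edges for entity in (rel.source, rel.target)}
```

The entities are frozen (immutable and hashable) so they can be collected into a set, which is one simple way to keep each Vertex unique.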

Related to each other.

Two entities have a relationship when one entity refers to the other. In a relational database, this reference is called a foreign key. I have also seen cases where documents are related to each other by having some type of business key in the title of the document, along with a prefix or suffix indicating what type of document it represents.
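The document case can be sketched roughly as follows. The naming convention here (an uppercase type prefix, a hyphen, then a numeric business key, as in `INV-10042`) and the sample titles are assumptions made up for the example; a real file server will have its own conventions.

```python
import re
from collections import defaultdict
from itertools import combinations

# Assumed convention: "<TYPE>-<business key>", e.g. "INV-10042 ...".
TITLE_PATTERN = re.compile(r"^(?P<doc_type>[A-Z]+)-(?P<business_key>\d+)")

# Hypothetical document titles.
titles = [
    "INV-10042 March invoice.pdf",
    "PO-10042 purchase order.pdf",
    "INV-10077 April invoice.pdf",
    "README.txt",
]


def related_documents(titles):
    """Return pairs of titles that share the same business key."""
    by_key = defaultdict(list)
    for title in titles:
        match = TITLE_PATTERN.match(title)
        if match:  # titles that do not follow the convention are skipped
            by_key[match["business_key"]].append(title)

    edges = []
    for group in by_key.values():
        # Every pair of documents sharing a key becomes one edge.
        edges.extend(combinations(sorted(group), 2))
    return edges


document_edges = related_documents(titles)
```

With these sample titles, only the invoice and purchase order for key 10042 end up related; the other two documents have no partner.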

Stored in a repository.

I use the term repository here because it is no longer the case that all data is stored in a relational database. In the document case, the repository could simply be a file server. In the application case, any application that an enterprise relies on for its duties persists data in some type of repository.

Moved from one persistence layer to another.

In general, this is the portion of the definition where we make the transition from a Level 1 to a Level 2 Data Structure Graph. In my experience, one application is seldom sufficient to meet the needs of an entire organization. Data has to flow from some systems to other systems. This is generally where a Data Architect should have the most influence in an organization. This portion of the definition covers data that has to move between applications and then be persisted in the new application for some period of time, for use-cases that were not originally designed as part of the application of origin.

Rendered as a Graph.

Why do I say it is rendered as a Graph? In a number of instances where I have worked as a data architect, more time was spent discussing the layout of a diagram than the content of the diagram. On a recent project I set out to change the discussion. Using some of the tools available for Graph analysis, such as Gephi, NetworkX, and iGraph, together with the tenets of graph theory and a bit of data cleansing, I was able to gather, consolidate, render, and present data about the overall structure and relationships of enterprise applications to my customer in a rapid fashion. This completely changed the conversation from the specifics of the diagram to the manner in which things needed to be changed to get away from the hairball they had created.
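As a rough illustration of the kind of question a graph view makes easy to ask, the sketch below counts how many data transfers touch each application. It uses only the Python standard library rather than any of the tools named above (NetworkX, for example, exposes degree measures directly), and the application names and transfers are invented for the example.

```python
from collections import Counter

# Hypothetical data transfers as (source application, target application).
transfers = [
    ("CRM", "Billing"),
    ("CRM", "Warehouse"),
    ("ERP", "Billing"),
    ("Billing", "Warehouse"),
    ("Billing", "Reporting"),
    ("ERP", "Reporting"),
]

# Out-degree: transfers an application sends; in-degree: transfers it receives.
out_degree = Counter(source for source, _ in transfers)
in_degree = Counter(target for _, target in transfers)

applications = sorted(set(out_degree) | set(in_degree))
total_degree = {app: out_degree[app] + in_degree[app] for app in applications}

# The application involved in the most transfers is a candidate hub,
# and a natural place to start untangling a hairball.
busiest = max(applications, key=lambda app: total_degree[app])
```

In this made-up data, Billing takes part in four of the six transfers, so it stands out as the hub. A simple count like this gives a conversation about structure something to start from, instead of a conversation about diagram layout.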

I will be writing more about Data Structure Graphs as time goes on. This post gives us a foundational definition of what a Data Structure Graph is. In future posts I will write about particular use-cases and interesting analyses done with Data Structure Graphs.

Update: The book is out now for Kindle: Data Structure Graph Book