How Does Centrality Affect Your Architecture?
Some time ago, I was responsible for a data architecture that I had mostly inherited. I worked on a number of tweaks to refine the monolithic nature of the main database. It was a time of upheaval in the organization: it had outgrown its legacy Computer Telephony Interface application, and it was time to create something new.
A large application development team was brought in to build the new software.
Labor and processing were sharply divided. Some functions were handled by the new application, while a separate system was developed to handle the data. Reporting, cleansing, analysis, ingress feeds, and egress feeds all went through this “less important” system. This was the system I was responsible for.
While thinking about how best to explain a Data Structure Graph, I spent some time revisiting this architecture and put it into a format that could be analyzed with the tools of Network Analysis.
After anonymizing the data a bit and limiting it to the principal data flows, I constructed a CSV file to load into Gephi for analysis. Each row is one directed data flow from Source to Target:
Source | Target | Edge_Label
Spider | ODS | Application
ODS | Spider | Prospect
Vendor1 | ODS | Prospect
Vendor2 | ODS | Prospect
Vendor3 | ODS | Prospect
ODS | Servicing | Application
Legacy | ODS | Application
ODS | Legacy | Prospect
ODS | Dialer1 | Prospect
ODS | Dialer2 | Prospect
Gov | ODS | DNC
ODS | Spider | LegacyData1
ODS | Spider | LegacyData2
ODS | Spider | LegacyData3
Spider | ODS | LegacyData1
Spider | ODS | LegacyData2
Spider | ODS | LegacyData3
ODS | ThirdParty | Prospect
ThirdParty | ODS | Application
Legacy | ODS | Application
Legacy | ODS | DialerStats
Dialer1 | ODS | DialerStats
Dialer2 | ODS | DialerStats
I ran a few simple statistics on the graph, then partitioned it so that each node's color makes its degree apparent. This is the first output of Gephi:
The actual statistics Gephi calculated are in this table:
Id | Label | PageRank | Eigenvector Centrality | In-Degree | Out-Degree | Degree
Vendor1 | Vendor1 | 0.01991719 | 0.00000000 | 0 | 1 | 1
Vendor2 | Vendor2 | 0.01991719 | 0.00000000 | 0 | 1 | 1
Vendor3 | Vendor3 | 0.01991719 | 0.00000000 | 0 | 1 | 1
Gov | Gov | 0.01991719 | 0.00000000 | 0 | 1 | 1
Spider | Spider | 0.08121259 | 0.44698155 | 1 | 1 | 2
Servicing | Servicing | 0.08121259 | 0.44698155 | 1 | 0 | 1
Legacy | Legacy | 0.08121259 | 0.44698155 | 1 | 1 | 2
Dialer1 | Dialer1 | 0.08121259 | 0.44698155 | 1 | 1 | 2
Dialer2 | Dialer2 | 0.08121259 | 0.44698155 | 1 | 1 | 2
ThirdParty | ThirdParty | 0.08121259 | 0.44698155 | 1 | 1 | 2
ODS | ODS | 0.43305573 | 1.00000000 | 9 | 6 | 15
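The degree columns count distinct neighbors rather than rows. The ODS sends or receives all 23 flows in the edge table, 14 of them inbound, yet its In-Degree is 9, which is exactly the number of distinct systems that feed it. That suggests Gephi merged repeated Source/Target pairs into a single edge on import. Here is a minimal Python sketch that reproduces the three degree columns under that assumption; the helper names distinct_edges and degrees are my own and are not part of Gephi.

```python
import csv
import io
from collections import Counter

# The edge table above, written out as CSV text (one row per data flow).
EDGE_CSV = """Source,Target,Edge_Label
Spider,ODS,Application
ODS,Spider,Prospect
Vendor1,ODS,Prospect
Vendor2,ODS,Prospect
Vendor3,ODS,Prospect
ODS,Servicing,Application
Legacy,ODS,Application
ODS,Legacy,Prospect
ODS,Dialer1,Prospect
ODS,Dialer2,Prospect
Gov,ODS,DNC
ODS,Spider,LegacyData1
ODS,Spider,LegacyData2
ODS,Spider,LegacyData3
Spider,ODS,LegacyData1
Spider,ODS,LegacyData2
Spider,ODS,LegacyData3
ODS,ThirdParty,Prospect
ThirdParty,ODS,Application
Legacy,ODS,Application
Legacy,ODS,DialerStats
Dialer1,ODS,DialerStats
Dialer2,ODS,DialerStats
"""


def distinct_edges(text):
    """Read the CSV and keep one directed edge per Source/Target pair.

    Assumption: this mirrors how Gephi appears to have merged parallel flows.
    """
    reader = csv.DictReader(io.StringIO(text))
    return {(row["Source"], row["Target"]) for row in reader}


def degrees(edges):
    """Return {node: (in_degree, out_degree, degree)} for a set of directed edges."""
    in_deg = Counter(dst for _src, dst in edges)
    out_deg = Counter(src for src, _dst in edges)
    nodes = set(in_deg) | set(out_deg)
    return {n: (in_deg[n], out_deg[n], in_deg[n] + out_deg[n]) for n in nodes}


table = degrees(distinct_edges(EDGE_CSV))
# Highest degree first; ties broken alphabetically for a stable listing.
for node, (i, o, d) in sorted(table.items(), key=lambda kv: (-kv[1][2], kv[0])):
    print(f"{node:<11} in={i} out={o} degree={d}")
```

The first line printed is the ODS with in=9, out=6, and degree=15, matching Gephi's figures.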
From the Data Architecture perspective, which “application” would have the greatest impact on the organization if it failed? Which “application” should have the greatest degree of protection, redundancy, and expertise associated with it?
Two metrics in the middle of that table deserve particular attention: PageRank and Eigenvector Centrality.
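I will write separate blog entries on PageRank and Eigenvector Centrality to explain how each is actually calculated; the math can be a bit cumbersome, and each algorithm deserves attention on its own. The short version is that both reward a node for receiving links from other important nodes, not merely for having many links. That is why Servicing, with a degree of 1, scores exactly the same as Spider, with a degree of 2: the only system that feeds either of them is the ODS.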
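Until those entries are written, here is a rough Python sketch of both ideas applied to the distinct edges in the edge table. It is not Gephi's code. It assumes the conventional PageRank damping factor of 0.85, spreads the score of Servicing (which has no outgoing flows) evenly across all nodes, and computes eigenvector centrality from incoming links, scaled so the top node scores 1.0, which is how the table appears to be scaled. Expect values close to Gephi's, for example about 0.433 for the ODS's PageRank and about 0.447 for each node fed by the ODS, rather than matches to every digit, since iteration counts and convergence settings may differ.

```python
# Distinct directed edges from the edge table (parallel flows merged).
EDGES = [
    ("Spider", "ODS"), ("ODS", "Spider"),
    ("Vendor1", "ODS"), ("Vendor2", "ODS"), ("Vendor3", "ODS"),
    ("Gov", "ODS"), ("ODS", "Servicing"),
    ("Legacy", "ODS"), ("ODS", "Legacy"),
    ("Dialer1", "ODS"), ("ODS", "Dialer1"),
    ("Dialer2", "ODS"), ("ODS", "Dialer2"),
    ("ThirdParty", "ODS"), ("ODS", "ThirdParty"),
]
NODES = sorted({node for edge in EDGES for node in edge})


def pagerank(edges, nodes, damping=0.85, iterations=100):
    """Power-iteration PageRank; a node with no out-links spreads its score evenly."""
    n = len(nodes)
    out = {v: [dst for src, dst in edges if src == v] for v in nodes}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out[v])
        # Random-jump share plus an even share of the dangling nodes' score.
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for dst in out[v]:
                new[dst] += damping * rank[v] / len(out[v])
        rank = new
    return rank


def eigenvector_centrality(edges, nodes, iterations=100):
    """Power iteration on incoming links, scaled so the top node scores 1.0.

    Each step computes x + A^T x instead of A^T x. The shift leaves the
    eigenvector unchanged but stops the iteration from oscillating on this
    hub-and-spoke graph, where every edge touches the ODS.
    """
    score = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        new = dict(score)
        for src, dst in edges:
            new[dst] += score[src]
        top = max(new.values())
        score = {v: s / top for v, s in new.items()}
    return score


pr = pagerank(EDGES, NODES)
ev = eigenvector_centrality(EDGES, NODES)
for v in sorted(NODES, key=lambda v: -pr[v]):
    print(f"{v:<11} PageRank={pr[v]:.4f} eigenvector={ev[v]:.4f}")
```

Vendor1, Vendor2, Vendor3, and Gov end up with an eigenvector score of essentially zero because nothing flows into them, which matches the zeros in Gephi's table.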
The point of this analysis is to determine which component of the architecture should have additional resources devoted to it. Any customer-facing application should be given due attention and infrastructure. However, one question I have seen many of my clients struggle with is how to prioritize the back-end infrastructure. Should one component of the architecture be given more attention than another? If I have 90 databases throughout the organization, which one is the most important?
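These centrality calculations make it clear which component of the architecture would have the most impact in the event of an outage, and where an upgrade would provide the most value: the ODS. It has the highest PageRank (0.433, more than five times that of any other node), the maximum Eigenvector Centrality of 1.0, and the highest degree (15); every data flow in the graph either starts or ends there.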
This type of analysis can begin to shed light on the answers to these questions. A methodical, data-driven approach to an architecture, rather than deferring to whichever division screams the loudest, can give insight into how that architecture is truly implemented.
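As one small example of letting the data speak, the sketch below uses a deliberately crude proxy for outage impact. It is my own illustration, not a Gephi statistic: for each system, count how many of the 23 flows in the edge table would stop if that system went down, assuming a flow stops whenever either end fails and ignoring any knock-on effects further downstream.

```python
from collections import Counter

# Every row of the edge table as a (source, target) pair, one per data flow,
# including repeated flows between the same two systems.
FLOWS = [
    ("Spider", "ODS"), ("ODS", "Spider"), ("Vendor1", "ODS"),
    ("Vendor2", "ODS"), ("Vendor3", "ODS"), ("ODS", "Servicing"),
    ("Legacy", "ODS"), ("ODS", "Legacy"), ("ODS", "Dialer1"),
    ("ODS", "Dialer2"), ("Gov", "ODS"), ("ODS", "Spider"),
    ("ODS", "Spider"), ("ODS", "Spider"), ("Spider", "ODS"),
    ("Spider", "ODS"), ("Spider", "ODS"), ("ODS", "ThirdParty"),
    ("ThirdParty", "ODS"), ("Legacy", "ODS"), ("Legacy", "ODS"),
    ("Dialer1", "ODS"), ("Dialer2", "ODS"),
]


def flows_interrupted(flows):
    """Count, per system, the flows that stop if that system is down.

    Assumption: a flow is interrupted when either its source or its target
    fails. No flow in this table starts and ends at the same system.
    """
    hits = Counter()
    for src, dst in flows:
        hits[src] += 1
        hits[dst] += 1
    return hits


impact = flows_interrupted(FLOWS)
for system, count in impact.most_common():
    print(f"{system:<11} {count} of {len(FLOWS)} flows")
```

Every flow touches the ODS, so it tops the list at 23 of 23; the next most exposed system, Spider, touches 8.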
I call these artifacts Data Structure Graphs.