2021-09-15

What is an Enrichment Platform?

An Enrichment Platform:

The place for Knowledge workers.

In my book the Enrichment Game, available from Technics Publications, I write about the players, processes, tools and techniques to create an Enrichment Platform for knowledge workers to create data products.

The book is available from Technics Publications

While I wrote about what an Enrichment Platform could do, I never properly defined what an Enrichment Platform actually is.

An Enrichment Platform will be different things to meet different needs. However, the essence of an Enrichment Platform will always be consistent.

Separate your operational data from your analytical and reporting data.
Provide for easily accessible tools to create reports that are not created by application developers.
Knowledge workers like Data Scientists have the tools they are familiar with to create new data products.
Data Operations are managing the flow of data throughout the organization.
Data Governance rules define how and where data is used.

An Enrichment Platform looks like this:

The Value proposition of the Enrichment Platform consists of:

Focusing Application Developers on creating, maintaining, or updating existing products.
Focusing Data Flows through a common organization.
Focuses Data Scientists to be able to work with high quality data.
Increasing Innovation, and reducing the time to new data products.

This focus gives your organization the ability to adapt to change while having a stable, reliable, and repeatable environment for the knowledge workers to be able to make your organization more innovative.

2021-02-23

Visualizing tasks in Snowflake.

I have created a number of tasks in snowflake that are have a various dependencies.

I wanted a simple way to document these tasks, and the graph nature of how they run.

Using a few simple queries, you can feed the output of the show tasks command into another query, and that query can be used to format the output to feed it into a tool called graphviz.

Graphviz has a very simple notation for creating sophisticated images of different types of graphs.

In this case we will use a the digraph option for a dot file to create a simple image.

The dot notation is very simple especially when visualizing hierarchical graphs.

Create a file in your favorite editor (notepad++) with this syntax:

digraph G {

}

Run the following queries:

show tasks;

with task_table as (
select split_part("predecessors", '.', 3) as parent,"name" as child from (select * from table(result_scan(last_query_id(-1))))
)
select parent||'->'||child||';' from task_table where parent is not null;

The output from this command you can copy from the worksheet editor and paste between the brackets of your dot file so that the dot file will now look like this:

digraph G {
TASKD->TASKE;
TASKB->TASKC;
TASKA->TASKB;
TASKA->TASKI;
TASKC->TASKL;
TASKA->TASKM;
TASKG->TASKH;
TASKA->TASKN;
TASKH->TASKJ;
TASKI->TASKK;
TASKA->TASKD;
TASKE->TASKF;
TASKJ->TASKO;
TASKK->TASKP;
}

Save your file as something meaningful like Blog_demo.dot

So long as you have graphviz installed correctly, once you save the file you can convert the .dot file to a png you can use for documentation like this:

dot -Tpng Blog_demo.dot > blog_demo.png

And here is my anonymized graph visualization.

Graphviz is a powerful tool for visualization, it is not so much a graph analysis tool like gephi, but it is quite sufficient for documentation, and sharing images that represent the graphs we work with every day.

2021-02-07

Wordcloud your resume

Word Cloud your resume.

I am working with more text now than I normally do, and I had an idea for helping people get noticed on LinkedIN. Word Cloud your resume.

Getting noticed on LinkedIN is largely a matter of timing, luck, and who you know. I make no claim that creating an image out of the words that make up your resume will guarantee to get you noticed, but humans tend to be more visual creatures.

Popping up an image that summarizes your expertise cannot hurt, and there is a possibility that going through this process you may learn something about how to express yourself.

I create an R notebook located at: https://github.com/dougneedham/WCYR

This is a simple R notebook that anyone can download and run with the latest version of RStudio.

The notebook walkthrough.

For this section, you should download the R code and follow along.

There are a few packages we need to load first.

These are the packages for reading word documents, text processing, creating a wordcloud, and letting wordcloud choose various colors.

The first Cell reads in a resume document. In this case, this is my most recent resume.

Using the readtext package reads a word document, then puts all of the text into the variable named text for the result.

Passing this variable to the original wordcloud package and specifying we want to only display words with a minimum frequency of 2 we get a basic wordcloud image that could be used. However, I want to create something a little more colorful.

In order to use the wordcloud2 package, the data must be munged a bit into a data frame that lists the words and their frequency rather than just a raw bag of words.

Using the Corpus function from the tm package gives us just what we are looking for.

In the next couple of steps we want to lower case all of our words, then remove the standard stopwords from the list of words we are displaying.

Based on some early displays, I found a few words that kept showing up, so I added them to the standard stopword list to keep them from showing up in our display.

Now that we have a TermDocumentMatrix we convert that to a standard matrix, then do a summary of the words for some metrics.

Finally we create a data frame that the wordcloud2 package is expecting. A list of words along with their frequencies.

Now we run the actual wordcloud command with some color and shape options. I chose the Star shape since I am from Texas.

Displaying a wordcloud on my RStudio screen is cool, but I need a file to attach to postings. The HTMLWidgets and webshot packages allow me to create files based on web pages.Since the wordcloud2 package actually creates an interactive wordcloud that you can hover over, and actually get counts associated with each word we will need to do a few transforms in order to get a proper image out of it.

In the final cell of the notebook, we save the wordcloud as an image to be manipulated. Then using that image we create an HTML file that can be referenced for later. And finally the webshot function saves the generated HTML as a PNG for attaching to posts, or emailing to your friends. :)

This is an interesting way of enriching your resume, don't you think?

If you want to give this a go, please reach out and let me know if you have any trouble.

Code found at github

2020-12-22

JSON in Snowflake

Snowflake is an amazing database. It has auto-scaling capabilities, the ability to separate workloads, and automatically recover data using its time-travel capabilities.

However, one of the most amazing things that I like about Snowflake is its ability to directly query JSON data stored in a variant column Query JSON Data

Combine this with some other tools like Kakfa and Dbezium and you have some very rapid prototyping capabilities for doing analysis of your application database.

Dbezium monitors the transaction logs of your application database, and publishes change events to Kafka.

Using the Snowflake Kafka Connector, you can capture these events and store the data in what would otherwise be called staging tables.

The Kafka Connector stores the data as JSON, but Snowflake's ability to query JSON data is simple enough that any SQL developer can extract the pertinent data from the JSON Structure.

An example from their documentation is:

SELECT src:device_type
  FROM raw_source

The src column of the raw_source table is where the JSON is stored. Once you know the JSON structure you want to query the :COLUMNAME is all that is needed to query the data from the variant column where the JSON data is stored.

For more complicated JSON structures you may need to use some of the flattening technique in order to get all of the data from the JSON structure.

With tools like this rapid prototyping is a breeze, the more complicated ETL jobs may not be necessary if the goal is to rapidly create a dimensional data model from your source application, then expose that data model with your favorite reporting tool (like Tableau)

Combining data from multiple sources using this technique is also very straightforward so long as there are common keys in the systems that relates the data together.

This allows architects to rapidly play The Enrichment Game, by combining data from multiple sources, rationalizing the data, enriching one application with the data of another, then expose that data through a reporting tool.

There are a number of problems I had to solve in different ways without having this capability. Using this particular combination of tools to rapidly create reports and respond to business needs drives more conversations about using Data to solve problems.

These conversations are much more valuable rather than the conversations about how complicated a Data Ecosystem is, and the need to either write more Data Munging code in order to get data migrated from one system to another.

2020-06-21

No Response?

Does No Response mean: Yes, No, or Piss off?

I am an old guy. I know this means I have certain expectations for the way things should be done. Please and Thank you never go out of style. Sorry is a sign of weakness, but if you make a mistake or miscommunication you should acknowledge that.

One of the earliest lessons in protocol I learned in Marine Corps boot camp. Our platoon was waiting on something (which is a good portion of what boot camp is all about learning to be prepared for your next set of instructions.) Our Drill Instructor told us to watch something.

There were a group of officers some distance away waiting for their next set of instructions, and a lone recruit was walking towards them.

The recruit stopped, saluted,and requested permission to walk by.

Every officer in the group of about 8 returned the salute.

The Drill Instructor then got our attention and in words I don't quite recall said something to this affect.

The Salute is a sacred duty. Every Marine has to salute superior officers. But here is the duty behind the salute. Every officer must acknowledge and return the salute.

The lowliest Private can Salute the commandant of the Marine Corps, the Commandant will return the Salute. It is an acknowledgement of our shared duty and heritage as Marines.

This may be a touching story, but what does this have to do with anything?

For those who are a Director, Vice President, or C-level executive, if one of your people in your reporting structure communicates something to you. You should at the very least acknowledge the communication.

Something as simple as, "I have received your message, I need to think about it.", a thumbs up on Slack, or even a simple OK is better than nothing.

Communication goes both ways.

We are flooded with communication today. Slack, Teams, email, Text, alerts, etc.

Many of these are automated and do not stop until they are acknowledged.

Should we not have enough professionalism to at least give a little:

The-Data-Guy

Pages