
2021-09-15

What is an Enrichment Platform?

An Enrichment Platform: the place for knowledge workers.

In my book, The Enrichment Game, available from Technics Publications, I write about the players, processes, tools, and techniques for building an Enrichment Platform that knowledge workers use to create data products.


While I wrote about what an Enrichment Platform could do, I never properly defined what an Enrichment Platform actually is. 

An Enrichment Platform will take different forms to meet different needs. However, its essence is always the same:

  • Operational data is kept separate from analytical and reporting data.
  • Easily accessible tools let people other than application developers create reports.
  • Knowledge workers such as Data Scientists have the tools they are familiar with to create new data products.
  • Data Operations manages the flow of data throughout the organization.
  • Data Governance rules define how and where data is used.

An Enrichment Platform looks like this: 

[Figure: An Enrichment Platform]


The value proposition of the Enrichment Platform consists of:

  • Focusing Application Developers on creating, maintaining, or updating products.
  • Focusing Data Flows through a common organization.
  • Focusing Data Scientists on working with high-quality data.
  • Increasing innovation and reducing the time to new data products.

This focus lets your organization adapt to change while giving knowledge workers a stable, reliable, and repeatable environment in which to make the organization more innovative.





2021-02-23

Visualizing tasks in Snowflake.

I have created a number of tasks in Snowflake that have various dependencies.


I wanted a simple way to document these tasks, and the graph nature of how they run. 


Using a few simple queries, you can feed the output of the show tasks command into a second query that formats it as input for a tool called Graphviz.

Graphviz has a very simple notation for creating sophisticated images of different types of graphs. 

In this case we will use the digraph option in a dot file to create a simple image.

The dot notation is very simple, especially when visualizing hierarchical graphs.


Create a file in your favorite editor (Notepad++, for example) with this syntax:


digraph G {

}


Run the following queries: 

show tasks;

-- Run this immediately after SHOW TASKS: result_scan(last_query_id(-1))
-- reads back the output of the previous statement.
with task_table as (
    select split_part("predecessors", '.', 3) as parent,
           "name" as child
    from table(result_scan(last_query_id(-1)))
)
select parent || '->' || child || ';'
from task_table
where parent is not null;

Copy the output of this query from the worksheet editor and paste it between the braces of your dot file, so that the file now looks like this:

digraph G {
TASKD->TASKE;
TASKB->TASKC;
TASKA->TASKB;
TASKA->TASKI;
TASKC->TASKL;
TASKA->TASKM;
TASKG->TASKH;
TASKA->TASKN;
TASKH->TASKJ;
TASKI->TASKK;
TASKA->TASKD;
TASKE->TASKF;
TASKJ->TASKO;
TASKK->TASKP;
}
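
If you would rather script this step than copy and paste, here is a minimal Python sketch. It assumes you already have the parent/child pairs from the query above as a list of tuples. The function name edges_to_dot and the sample pairs are hypothetical, made up for illustration. The task names are wrapped in double quotes, which dot accepts and which keeps names with unusual characters valid.

```python
def edges_to_dot(edges, graph_name="G"):
    """Return Graphviz dot text for a list of (parent, child) task pairs."""
    lines = [f"digraph {graph_name} {{"]
    for parent, child in edges:
        # One directed edge per dependency, with quoted node names.
        lines.append(f'    "{parent}" -> "{child}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical pairs, as they might come back from the query above.
sample_edges = [("TASKA", "TASKB"), ("TASKB", "TASKC")]
dot_text = edges_to_dot(sample_edges)
print(dot_text)
```

Writing dot_text to a file gives you the same kind of .dot file as the hand-edited one.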

 

Save your file under a meaningful name, such as Blog_demo.dot.

As long as Graphviz is installed correctly, you can then convert the .dot file to a PNG for your documentation like this:

dot -Tpng Blog_demo.dot > blog_demo.png
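
The same conversion can be scripted. This is a minimal sketch, assuming the dot executable is on your PATH. The helper names dot_command and render_png are hypothetical. It uses dot's -o option to name the output file instead of shell redirection.

```python
import shutil
import subprocess


def dot_command(dot_path, png_path):
    """Build the argument list for rendering a .dot file to PNG."""
    return ["dot", "-Tpng", dot_path, "-o", png_path]


def render_png(dot_path, png_path):
    """Run Graphviz, failing early with a clear message if it is missing."""
    if shutil.which("dot") is None:
        raise RuntimeError("Graphviz 'dot' executable not found on PATH")
    # check=True raises CalledProcessError if dot reports a problem.
    subprocess.run(dot_command(dot_path, png_path), check=True)
```

Calling render_png("Blog_demo.dot", "blog_demo.png") would then produce the same image as the command line above.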

 

And here is my anonymized graph visualization. 


Graphviz is a powerful tool for visualization. It is not so much a graph analysis tool like Gephi, but it is quite sufficient for documentation and for sharing images that represent the graphs we work with every day.



 

 


2021-02-07

Wordcloud your resume


I am working with more text now than I normally do, and I had an idea for helping people get noticed on LinkedIn: word cloud your resume.

Getting noticed on LinkedIn is largely a matter of timing, luck, and who you know. I make no claim that creating an image out of the words that make up your resume guarantees you will get noticed, but humans tend to be visual creatures.

Popping up an image that summarizes your expertise cannot hurt, and going through this process may teach you something about how you express yourself.

I created an R notebook, located at: https://github.com/dougneedham/WCYR

This is a simple R notebook that anyone can download and run with the latest version of RStudio. 

The notebook walkthrough. 

For this section, you should download the R code and follow along. 

There are a few packages we need to load first. 

These are the packages for reading Word documents, text processing, creating a wordcloud, and letting wordcloud choose various colors.

The first cell reads in a resume document. In this case, this is my most recent resume.

The readtext package reads a Word document and puts all of its text into a variable named text.

Passing this variable to the original wordcloud package, and specifying that we only want to display words with a minimum frequency of 2, we get a basic wordcloud image that could be used. However, I want to create something a little more colorful.

In order to use the wordcloud2 package, the data must be munged a bit into a data frame that lists the words and their frequency rather than just a raw bag of words. 

Using the Corpus function from the tm package gives us just what we are looking for. 

In the next couple of steps, we lowercase all of our words, then remove the standard stopwords from the list of words we are displaying.

Based on some early displays, I found a few words that kept showing up, so I added them to the standard stopword list to keep them from showing up in our display. 

Now that we have a TermDocumentMatrix, we convert it to a standard matrix, then summarize the words for some metrics.

Finally, we create the data frame that the wordcloud2 package expects: a list of words along with their frequencies.
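
The notebook does all of this in R with the tm package. Purely to illustrate the idea behind the steps above (lowercase, drop stopwords, count), here is a standard-library Python sketch. It is not the notebook's code: the function name, the tiny stopword list, and the sample sentence are all made up for the example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list. The notebook relies on tm's standard
# English stopwords plus a few custom words, which are not reproduced here.
STOPWORDS = {"the", "and", "a", "of", "in", "to", "with"}


def word_frequencies(text, extra_stopwords=()):
    """Return (word, count) pairs, most frequent first."""
    # Lowercase first, then split into runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    blocked = STOPWORDS | set(extra_stopwords)
    counts = Counter(word for word in words if word not in blocked)
    return counts.most_common()


sample = "Data modeling and data architecture with SQL and data pipelines"
freq = word_frequencies(sample, extra_stopwords={"pipelines"})
print(freq)
```

The extra_stopwords argument plays the role of the custom words added to the standard stopword list.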

Now we run the actual wordcloud command with some color and shape options. I chose the Star shape since I am from Texas.

Displaying a wordcloud on my RStudio screen is cool, but I need a file to attach to postings. The htmlwidgets and webshot packages allow me to create files based on web pages. Since the wordcloud2 package actually creates an interactive wordcloud that you can hover over to get the count associated with each word, we need to do a few transforms in order to get a proper image out of it.

In the final cell of the notebook, we save the wordcloud as an image to be manipulated. Then, using that image, we create an HTML file that can be referenced later. Finally, the webshot function saves the generated HTML as a PNG for attaching to posts or emailing to your friends. :)

 

This is an interesting way of enriching your resume, don't you think? 

 

If you want to give this a go, please reach out and let me know if you have any trouble. 

 

 

The code can be found on GitHub.