
2021-02-23

Visualizing tasks in Snowflake.

I have created a number of tasks in Snowflake that have various dependencies.


I wanted a simple way to document these tasks, and the graph nature of how they run. 


With a few simple queries, you can feed the output of the SHOW TASKS command into another query that formats the result for a tool called Graphviz.

Graphviz has a very simple notation for creating sophisticated images of different types of graphs. 

In this case we will use the digraph option of a dot file to create a simple image.

The dot notation is very simple, especially when visualizing hierarchical graphs.


Create a file in your favorite editor (Notepad++) with this syntax:


digraph G { 


Run the following queries: 

show tasks;

with task_table as (
    select
        split_part("predecessors", '.', 3) as parent,
        "name" as child
    from table(result_scan(last_query_id(-1)))
)
select parent || '->' || child || ';'
from task_table
where parent is not null;

Copy the output of this query from the worksheet editor and paste it between the braces of your dot file, so that the dot file now looks like this:

 digraph G {
TASKD->TASKE;
TASKB->TASKC;
TASKA->TASKB;
TASKA->TASKI;
TASKC->TASKL;
TASKA->TASKM;
TASKG->TASKH;
TASKA->TASKN;
TASKH->TASKJ;
TASKI->TASKK;
TASKA->TASKD;
TASKE->TASKF;
TASKJ->TASKO;
TASKK->TASKP;
}

 

Save your file as something meaningful like Blog_demo.dot 

As long as you have Graphviz installed correctly, once you save the file you can convert the .dot file to a PNG for your documentation like this:

dot -Tpng Blog_demo.dot > blog_demo.png

 

And here is my anonymized graph visualization. 


Graphviz is a powerful visualization tool. It is not a graph analysis tool like Gephi, but it is quite sufficient for documentation and for sharing images that represent the graphs we work with every day.



 

 


2021-02-07

Wordcloud your resume


I am working with more text now than I normally do, and I had an idea for helping people get noticed on LinkedIn: word cloud your resume.

Getting noticed on LinkedIn is largely a matter of timing, luck, and who you know. I make no claim that creating an image out of the words that make up your resume is guaranteed to get you noticed, but humans tend to be visual creatures.

Popping up an image that summarizes your expertise cannot hurt, and in going through this process you may learn something about how you express yourself.

I created an R notebook, located at: https://github.com/dougneedham/WCYR

This is a simple R notebook that anyone can download and run with the latest version of RStudio. 

The notebook walkthrough. 

For this section, you should download the R code and follow along. 

There are a few packages we need to load first. 

These are the packages for reading Word documents, text processing, creating a wordcloud, and letting wordcloud choose various colors.
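As a rough sketch, the setup cell loads something like the following. I am assuming the package set is readtext, tm, wordcloud, wordcloud2, and RColorBrewer; the notebook has the exact list.

# Assumed package set; see the notebook for the exact list.
library(readtext)      # reading Word documents
library(tm)            # text processing: corpus, stopwords, term matrix
library(wordcloud)     # the classic wordcloud
library(wordcloud2)    # the HTML-widget wordcloud
library(RColorBrewer)  # color palettes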

The first cell reads in a resume document. In this case, it is my most recent resume.

The readtext package reads the Word document and puts all of the text into a variable named text.

Passing this variable to the original wordcloud package and specifying a minimum word frequency of 2, we get a basic wordcloud image that could be used as-is. However, I want to create something a little more colorful.
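Roughly, those first cells look like this, assuming the resume file is named Resume.docx (a placeholder name):

# Placeholder file name; substitute your own resume.
resume <- readtext("Resume.docx")
text   <- resume$text

# Basic wordcloud: only show words that appear at least twice.
wordcloud(text, min.freq = 2)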

In order to use the wordcloud2 package, the data must be munged a bit into a data frame that lists the words and their frequency rather than just a raw bag of words. 

Using the Corpus function from the tm package gives us just what we are looking for. 

In the next couple of steps we want to lower case all of our words, then remove the standard stopwords from the list of words we are displaying. 

Based on some early displays, I found a few words that kept showing up, so I added them to the standard stopword list to keep them from showing up in our display. 

Now that we have a TermDocumentMatrix, we convert it to a standard matrix and then summarize the word counts for some metrics.

Finally, we create the data frame that the wordcloud2 package expects: a list of words along with their frequencies.
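The munging steps look roughly like this; the extra stopwords below are placeholders, since the real list came from inspecting my own early output:

# Build a corpus from the raw text and clean it up.
docs <- Corpus(VectorSource(text))
docs <- tm_map(docs, content_transformer(tolower))
docs <- tm_map(docs, removeWords, stopwords("english"))
# Words that kept showing up but added no meaning (placeholders here).
docs <- tm_map(docs, removeWords, c("will", "can", "using"))

# TermDocumentMatrix -> plain matrix -> word frequencies.
tdm   <- TermDocumentMatrix(docs)
m     <- as.matrix(tdm)
freqs <- sort(rowSums(m), decreasing = TRUE)
summary(freqs)

# The data frame wordcloud2 expects: a word column and a freq column.
df <- data.frame(word = names(freqs), freq = freqs)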

Now we run the actual wordcloud2 command with some color and shape options. I chose the star shape since I am from Texas.
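Something along these lines; the star shape is the one I used, while the specific color options here are just an illustration:

# wordcloud2 returns an interactive HTML widget.
wc <- wordcloud2(df, shape = "star",
                 color = "random-light", backgroundColor = "black")
wc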

Displaying a wordcloud on my RStudio screen is cool, but I need a file to attach to postings. The htmlwidgets and webshot packages allow me to create files based on web pages. Since the wordcloud2 package actually creates an interactive wordcloud that you can hover over to get the count for each word, we need to do a few transforms to get a proper image out of it.

In the final cell of the notebook, we save the wordcloud widget, use it to generate an HTML file that can be referenced later, and finally the webshot function saves the generated HTML as a PNG for attaching to posts or emailing to your friends. :)
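A sketch of that final cell, with placeholder file names:

library(htmlwidgets)
library(webshot)
# webshot renders web pages via PhantomJS; install it once with webshot::install_phantomjs()

# Save the widget to an HTML file that can be referenced later,
# then render that HTML to a PNG.
saveWidget(wc, "wordcloud.html", selfcontained = FALSE)
webshot("wordcloud.html", "wordcloud.png", delay = 5, vwidth = 1000, vheight = 800)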

 

This is an interesting way of enriching your resume, don't you think? 

 

If you want to give this a go, please reach out and let me know if you have any trouble. 

 

 

Code can be found on GitHub: https://github.com/dougneedham/WCYR

2020-12-22

JSON in Snowflake

Snowflake is an amazing database. It has auto-scaling capabilities, the ability to separate workloads, and the ability to automatically recover data using its time-travel feature.

However, one of the things I like most about Snowflake is its ability to directly query JSON data stored in a VARIANT column (see Query JSON Data in the documentation).

Combine this with some other tools like Kafka and Debezium and you have some very rapid prototyping capabilities for analyzing your application database.

Debezium monitors the transaction logs of your application database and publishes change events to Kafka.

Using the Snowflake Kafka Connector, you can capture these events and store the data in what would otherwise be called staging tables. 

The Kafka Connector stores the data as JSON, but Snowflake's ability to query JSON data is simple enough that any SQL developer can extract the pertinent data from the JSON structure.

An example from their documentation is:

SELECT src:device_type
  FROM raw_source;

The src column of the raw_source table is where the JSON is stored. Once you know the JSON structure you want to query, the colon notation (src:column_name) is all that is needed to pull the data out of the VARIANT column.
 
For more complicated JSON structures, you may need to use some of the flattening techniques in order to get all of the data out of the JSON structure.
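For example, here is a minimal sketch using Snowflake's LATERAL FLATTEN, with hypothetical field names on the same raw_source table:

-- Hypothetical: assume src:events is a JSON array of event objects.
SELECT
    src:device_type::string     AS device_type,
    e.value:event_name::string  AS event_name,
    e.value:event_ts::timestamp AS event_ts
FROM raw_source,
     LATERAL FLATTEN(input => src:events) e;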

With tools like this, rapid prototyping is a breeze. More complicated ETL jobs may not be necessary if the goal is to rapidly create a dimensional data model from your source application and then expose that data model with your favorite reporting tool (like Tableau).

Combining data from multiple sources using this technique is also very straightforward, so long as there are common keys in the systems that relate the data together.

This allows architects to rapidly play The Enrichment Game: combining data from multiple sources, rationalizing the data, enriching one application with the data of another, and then exposing that data through a reporting tool.

There are a number of problems I have had to solve in more complicated ways without this capability. Using this particular combination of tools to rapidly create reports and respond to business needs drives more conversations about using data to solve problems.
 
These conversations are much more valuable than conversations about how complicated a data ecosystem is, or about the need to write more data-munging code to get data migrated from one system to another.
 
 

2020-06-21

No Response?

Does No Response mean: Yes, No, or Piss off? 


I am an old guy. I know this means I have certain expectations for the way things should be done. Please and Thank you never go out of style. Sorry is a sign of weakness, but if you make a mistake or miscommunicate, you should acknowledge it.

One of the earliest lessons in protocol I learned was in Marine Corps boot camp. Our platoon was waiting on something (a good portion of boot camp is learning to be prepared for your next set of instructions) when our Drill Instructor told us to watch something.

There was a group of officers some distance away waiting for their next set of instructions, and a lone recruit was walking toward them.

The recruit stopped, saluted, and requested permission to walk by.

Every officer in the group of about 8 returned the salute. 

The Drill Instructor then got our attention and, in words I don't quite recall, said something to this effect.

The Salute is a sacred duty. Every Marine has to salute superior officers. But here is the duty behind the salute. Every officer must acknowledge and return the salute. 

The lowliest private can salute the Commandant of the Marine Corps, and the Commandant will return the salute. It is an acknowledgement of our shared duty and heritage as Marines.

This may be a touching story, but what does this have to do with anything? 

If you are a Director, Vice President, or C-level executive and someone in your reporting structure communicates something to you, you should at the very least acknowledge the communication.

Something as simple as "I have received your message; I need to think about it," a thumbs-up on Slack, or even a simple OK is better than nothing.

Communication goes both ways. 

We are flooded with communication today: Slack, Teams, email, texts, alerts, and so on.

Many of these are automated and do not stop until they are acknowledged. 

Should we not have enough professionalism to at least give a little: 


2020-06-17

Poor Planning

Poor Planning on your part does not constitute an Emergency on my part.  

(Except when it does.)


Recently I was involved in a major hardware failure at work.

There had been indications that our disk storage was under duress for quite some time. It had been in my status reports for months; I finally quit reporting on it since no one on our leadership team even acknowledged the concern.

They would say, "Yeah, we know we need to do something there." Or even better: "Your predecessor complained about that as well."

Over a long weekend involving most of my team, teams from other groups, infrastructure support vendors, and hardware vendors, we fought for our customers, not for management.

War-room calls were set up to run around the clock, with people stepping in and out of the meeting, and we finally were able to get things ironed out.

But at what cost? 

Proactively taking care of this issue would have required spending a bit of money up front to either buy new infrastructure or upgrade what we had.

However, by being down for the amount of time that we were, we broke faith with our customers.

I don't know the long term impacts of that on our organization. 

In cases like this it is never a good idea to say "I told you so."

A current joke going around says: "At the start of every disaster movie, there's a scientist being ignored."

When you are in the middle of a disaster, even if you were the one who could have prevented it, the only thing you can do is work hard on the recovery.

There is a tendency to shoot the messenger when dealing with issues, but that should be the last thing on anyone's mind. The messenger is the one who has known about the problem the longest. Usually they are the person who has been kept up at night stressing over what to do.

The messenger probably has the most ideas about how to solve the problem.

Knee-jerk reactions do not create long term solutions. Careful design, planning, testing, and proactively building in resiliency and stability create long term solutions.

None of these are cheap, but as the saying goes: "Pay me now, or pay me later."

Proactive "pay me now" situations appear to be expensive.

Reactive "pay me later" situations make the proactive look like nickels and dimes.

When you are getting warnings about things that need to be addressed, don't ignore them. If you are the one giving the warnings, don't give up.

Keep warning.

Keep telling.

Above all, Keep planning.

When the disaster strikes, someone has to be the voice of reason.