Pages

2020-12-22

JSON in Snowflake

 Snowflake is an amazing database. It has auto-scaling capabilities, the ability to separate workloads, and automatically recover data using its time-travel capabilities. 

However, one of the most amazing things that I like about Snowflake is its ability to directly query JSON data stored in a variant column Query JSON Data 

Combine this with some other tools like Kakfa and Dbezium and you have some very rapid prototyping capabilities for doing analysis of your application database. 

 Dbezium monitors the transaction logs of your application database, and publishes change events to Kafka. 

Using the Snowflake Kafka Connector, you can capture these events and store the data in what would otherwise be called staging tables. 

The Kafka Connector stores the data as JSON, but Snowflake's ability to query JSON data is simple enough that any SQL developer can extract the pertinent data from the JSON Structure. 

An example from their documentation is:

SELECT src:device_type
  FROM raw_source
The src column of the raw_source table is where the JSON is stored. Once you know the JSON structure you want to query the :COLUMNAME is all that is needed to query the data from the variant column where the JSON data is stored. 
 
For more complicated JSON structures you may need to use some of the flattening technique in order to get all of the data from the JSON structure. 

With tools like this rapid prototyping is a breeze, the more complicated ETL jobs may not be necessary if the goal is to rapidly create a dimensional data model from your source application, then expose that data model with your favorite reporting tool (like Tableau

Combining data from multiple sources using this technique is also very straightforward so long as there are common keys in the systems that relates the data together. 

This allows architects to rapidly play The Enrichment Game, by combining data from  multiple sources, rationalizing the data, enriching one application with the data of another, then expose that data through a reporting tool. 

There are a number of problems I had to solve in different ways without having this capability. Using this particular combination of tools to rapidly create reports and respond to business needs drives more conversations about using Data to solve problems. 
 
These conversations are much more valuable rather than the conversations about how complicated a Data Ecosystem is, and the need to either write more Data Munging code in order to get data migrated from one system to another. 
 
 

1 comment:

  1. For simple JSON, it will work. 95% of time JSON data are complex where simple SQL query will not help.

    ReplyDelete