Last Updated Nov 3, 2021

How to slice, dice and visualize data with sql queries

by JR · 7 minute read

In this post we are going to talk about how to use the query tool to query and visualize data.

If you aren't familiar with the query tool, you may want to read the first post about it here https://www.graphjson.com/guides/queries

Today we are going to construct a query where we can visualize a cohort of users and how they retain over time. Let's imagine we have a social network and we log two types of events

The first event is a register event. The JSON might something like this

{
  user_id: "123",
  event: "register"
}

The second event is a news feed view event. The JSON might look something like this

{
  user_id: "123",
  event: "feed_view"
}

Now the query we want to construct is one that will show for all users that registered in some month X. How many came back at X+1 month, X+2 month etc.

The first query we will construct is getting a unique list of user ids that signed up in a particular month. The query we will use is

SELECT
    user_id
FROM
  (
    SELECT
      toUnixTimestamp(
        toDateTime(toStartOfMonth(toDateTime(event_timestamp, 'America/Los_Angeles')))
      ) AS month,
      JSONExtractString(json, 'user_id') AS user_id
    FROM
      all_events
    WHERE
      JSONExtractString(json, 'event') = 'register'
      AND month = 1635739200
    LIMIT
      10
  )

The next query we'll construct is getting user activity grouped by month.

  SELECT
    toUnixTimestamp(
      toDateTime(toStartOfMonth(toDateTime(event_timestamp, 'America/Los_Angeles')))
    ) AS month,
    JSONExtractString(json, 'user_id') AS user_id
  FROM
    all_events
  WHERE
    JSONExtractString(json, 'user_id') IN user_ids
    AND JSONExtractString(json, 'event') = 'feed_view'
  GROUP BY
    month,
    json

Lastly we'll compose all of these queries together into a single query using WITH clauses

WITH user_ids AS (
  SELECT
    user_id
  FROM
    (
      SELECT
        toUnixTimestamp(
          toDateTime(toStartOfMonth(toDateTime(event_timestamp, 'America/Los_Angeles')))
        ) AS month,
        JSONExtractString(json, 'user_id') AS user_id
      FROM
        all_events
      WHERE
        JSONExtractString(json, 'event') = 'register'
        AND month = 1635739200
      LIMIT
        10
    )
), activity AS (
  SELECT
    toUnixTimestamp(
      toDateTime(toStartOfMonth(toDateTime(event_timestamp, 'America/Los_Angeles')))
    ) AS month,
    JSONExtractString(json, 'user_id') AS user_id
  FROM
    all_events
  WHERE
    JSONExtractString(json, 'user_id') IN user_ids
    AND JSONExtractString(json, 'event') = 'feed_view'
  GROUP BY
    month,
    json
), (
    SELECT COUNT(*) FROM user_ids
) AS cohort_size
SELECT
  COUNT(user_id) / cohort_size * 100 as percentage,
  month AS timestamp
FROM
  activity
GROUP BY
  month

And bam! We have the results.

Query

Now all we need to do is configure the visualize tool and we have the graph we want!

Query

If we want we can now click on the "Export" button and export it to a dashboard. We can then share a link to this dashboard to our investors! We can also optionally add password protection on the dashboard!

GraphJSON by @TheBuilderJR

© GraphJSON 2021