Motivation
From time to time you may need to review the users who are querying a Snowflake table. It’s common for legal and compliance teams to ask to review which internal users are actually accessing sensitive user data.
Another reason is operational, such as when engineers on your data team want to delete unused tables. In this case, they need to understand recent usage and whether a table is safe to delete.
Account Usage Views
Before we jump in, let’s take a brief detour to Snowflake’s Account Usage schema:
ACCOUNT_USAGE and READER_ACCOUNT_USAGE schemas enable querying object metadata, as well as historical usage data, for your account and all reader accounts (if any) associated with the account.
These views are super useful for getting a meta understanding of our Snowflake usage and security policies. In fact, Snowflake offers a few custom roles for these views: Object Viewer, Usage Viewer, Security Viewer, and Governance Viewer. Each role has access to a different set of data.
Becoming a Governance Viewer
To access audit history views, we need the GOVERNANCE_VIEWER
role. We can’t apply this database role to our user directly, so we need to create a role or use an existing one.
CREATE ROLE governance_users;
GRANT DATABASE ROLE snowflake.governance_viewer
TO ROLE governance_users;
GRANT ROLE governance_users TO USER alice;
With our governance_users role inheriting from the GOVERNANCE_VIEWER
role, Alice can now dive head first into our Snowflake account’s audit history!
Querying the Audit History
To help us on our quest, we’re using two Account Usage tables:
query_history — provides us the actual query text.
access_history — provides us the exact names of accessed objects.
Now let’s find all the queries for a specific table in the last 90 days. The query below uses the access_history.base_objects_accessed
column to find the name of the object being accessed.
In plain english, this means we report the underlying table being accessed, and skip over any higher level materialized views that sit between a user’s query and the data.
SELECT
qh.user_name,
qh.query_text,
value:objectName::string as "TABLE"
FROM
snowflake.account_usage.query_history AS qh
JOIN
snowflake.account_usage.access_history AS ah
ON
qh.query_id = ah.query_id,
LATERAL FLATTEN(input => ah.base_objects_accessed)
WHERE
query_type = 'SELECT' AND
value:objectName = 'SNOWFLAKE_SAMPLE_DATA.TPCDS_SF10TCL.CUSTOMER' AND
start_time > dateadd(day, -90, current_date());
This returns a list of users and their queries for the TPCDS_SF10TCL.CUSTOMER
sample data table. Example output:
USER_NAME | QUERY_TEXT
------------|---------------------------------------------------
ALICE | select * from merchants.production.customer_view_14;
BOB | select * from merchants.production.customer_view_35;
CHARLIE | select * from merchants.production.customer_view_2;
Notice the query_text which indicates the users accessed the table using a variety of different Snowflake views, which gives us extra context into the access itself.
From here, we can summarize access for all tables and aggregate the users to fit our reporting requirements.
Limitations
While useful as a demonstration, the sample code above only works for basic queries on tables. There is a lot more depth to the sorts of access you may want to track about your internal users.
You’d also want to automate this process so that you’re not spending our valuable time dealing with creating reports for legal, compliance, and governance teams.
About Spyglass
Since you’re here, let me tell you what we’ve cooked up at Spyglass. In short, we make Snowflake data access controls easy - or provide an automated and better way to do the above.
If you’ve nodded your head while reading this, reach out at spyglass.software (or demo@spyglass.software) and we’ll show you a product demo to give you a taste of the future of data access management.