Security in Snowflake

Introduction

Snowflake is a cloud-based data warehousing platform that is designed to handle data storage and analysis for large organizations. It uses a unique architecture that allows it to be highly scalable, efficient, and flexible, making it an attractive option for businesses that need to store and analyze large amounts of data.

Snowflake also provides robust security features to protect data and ensure compliance with industry regulations.

In this blog, we will learn about all the intriguing security features that Snowflake has to offer us.

Security Consideration

Before diving into the security considerations, we need to discuss the guiding principles that serve as the foundation of every organization’s security architecture. In practice, they serve as the goals and objectives of any security program.

The CIA triad, which is intended to guide policies for data security, is one of the most significant models when discussing security at the organizational level.

CIA stands for:

  1. Confidentiality
    Confidentiality means that only authorized individuals/systems can view sensitive or classified information, whether it is in motion or at rest.

  2. Integrity
    Integrity means making sure that data is trustworthy and free from tampering at every stage of the data lifecycle.

  3. Availability
    Availability means that systems and data should be readily available to their users. To ensure availability, administrators should maintain hardware, make regular upgrades, have a plan for failover, and prevent bottlenecks in the network.

Keeping this gold standard in mind, we will now discuss security considerations at every stage of the data lifecycle.

Let’s go through an example.

There is a company named “Alfa”. It produces large amounts of data, which its employees consume on a regular basis to generate reports, perform reconciliation, handle analytical workloads & support auditing.

Data creation is the first step.
Once the data has been generated, it is immediately stored in a storage system or on a network drive.

On stored data, or while storing data, a user/program generally performs two types of actions:

  • Processing the data, such as transformation or querying.
  • Loading/unloading data.

Since our company produces very sensitive data, we need to make sure that our storage system is accessible only from a few trusted IP addresses while querying & … loading/unloading of data is done via private channels only. Also, data should be encrypted both at rest & in motion.

In this way, we can reduce our exposure to phishing, DoS attacks & man-in-the-middle attacks, which are major cyber threats to any organization.

DATA USAGE

Since our company is big, it has different departments: HR, which is eligible to view employees’ sensitive information, and Finance, which is eligible to view the balance sheet. An ordinary employee, however, is eligible to view neither.

Therefore, now is the right time to set up access control frameworks that help us define data usage policies. This can be achieved with different authorization techniques such as role-based access control, rule-based access control & risk-based access control.

In this way, we can avoid data leakage & attacks from malicious insiders.

DATA SHARING

The data we have generated will be shared & consumed via different mediums like APIs, dashboards, and drivers, or by another group of users via client screens. Therefore, we need a secure way of gaining access to our systems, which can be achieved through authentication methods like MFA, OAuth, etc.

In this way, we can avoid hijacking or misuse of our accounts.

Security Architecture

With all these considerations in mind, we can now look at a well-designed security architecture that covers the security features provided by Snowflake.

Snowflake secures customer data using defence in depth with three security layers.

  • Network Security
  • IAM
  • Data Encryption

Network Security

Network security is the first line of defence against malicious individuals trying to access Snowflake customer accounts.

Snowflake provides two forms of network security safeguards to protect against malicious users: network policies and private connectivity.

  1. Network Policies

Network access to Snowflake is managed and limited using Snowflake network policies. A network policy restricts access based on the client’s IP address: it designates allowed IP addresses or CIDR ranges and, optionally, blocked ones. Only users with the SECURITYADMIN role or higher, or a role with the global CREATE NETWORK POLICY privilege, can create network policies. A network policy’s ownership can be transferred to another role. Network policies can be managed via the Snowflake web interface or SQL and applied at the account level, at the user level, or on a security integration such as OAuth. To determine whether a network policy is set on your account or for a specific user, execute the SHOW PARAMETERS command.
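To make the allow/block behaviour concrete, here is a minimal Python sketch of how such a policy could be evaluated. It is not Snowflake code: the function name and the sample addresses are made up for illustration, and it assumes (as we understand Snowflake's documented behaviour) that the blocked list is checked before the allowed list.

```python
import ipaddress


def is_connection_allowed(client_ip, allowed_list, blocked_list=()):
    """Illustrative check of a client IP against allowed/blocked entries.

    Entries may be single addresses ("192.168.1.99") or CIDR ranges
    ("192.168.1.0/24"). Assumption: a match in the blocked list wins
    over a match in the allowed list.
    """
    addr = ipaddress.ip_address(client_ip)

    def matches(entries):
        # strict=False tolerates entries whose host bits are set
        return any(addr in ipaddress.ip_network(e, strict=False) for e in entries)

    if matches(blocked_list):
        return False
    return matches(allowed_list)


# Hypothetical policy: allow an office range, block one host inside it.
allowed = ["192.168.1.0/24"]
blocked = ["192.168.1.99"]

print(is_connection_allowed("192.168.1.10", allowed, blocked))
print(is_connection_allowed("192.168.1.99", allowed, blocked))
print(is_connection_allowed("10.0.0.1", allowed, blocked))
```

The first address falls inside the allowed range, the second is explicitly blocked, and the third matches neither list, so only the first would be let through.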

  2. Private Connectivity

Business Critical Feature

We can connect to Snowflake over a private IP address by using the private connectivity offered by cloud service providers, such as AWS PrivateLink or Azure Private Link.

With this feature, our Snowflake account shows up as a resource in our own network. Here are a few guidelines for using it effectively.

  • Setting up DNS to resolve Snowflake’s private URL is our responsibility. The best strategy is to use private DNS in our cloud provider network, since it enables clients running both on-premises and in the cloud provider network to resolve the Snowflake account. A DNS forwarding rule for the Snowflake account can then be created in our on-premises DNS.
  • If we want to restrict access to the public endpoint after configuring private connectivity, we can create an account-level network policy that only allows connections from the private IP range on our network.
  • If we want to let client apps running outside our network connect to our account, they would do so via the public endpoint. To grant access, we can add the client application’s IP range to the allowed list of the account-level, user-level, or OAuth integration network policy, depending on the use case.

Identity and Access Management

Once our Snowflake account is reachable over the network, the next step in gaining access is to authenticate the user. Users must be created in Snowflake before they can gain access.
Once the user has been authenticated, a session with roles is created and used to authorise access to objects in Snowflake.

This section covers best practices for:

1. Managing users and roles
2. Authentication and single sign-on
3. Sessions
4. Object-level access control (authorization)
5. Column-level access control
6. Row-level access control

Managing users and roles

Snowflake suggests using SCIM, when our identity provider supports it, to provision and externally manage users and roles in Snowflake. Identity providers can be further configured to synchronise users and roles with our Active Directory users and groups. If SCIM is not an option for us for any reason, we can build our own AD sync tool using a Snowflake driver.

Snowflake recommends using federated single sign-on (SSO) and reserving passwords for certain use cases only, such as service accounts and users with the Snowflake ACCOUNTADMIN system role. For such cases, the password management best practices are as follows:

  • Enable built-in Duo multi-factor authentication for additional security.
  • Use long, complex passwords that are preferably managed by a privileged access management (PAM) platform. To utilise Hashicorp Vault with Snowflake, see the sample.
  • Passwords should be changed on a frequent basis. Snowflake does not presently enable password expiry; however, we can force password changes by using platforms for secrets management or privileged access management (PAM).

Authentication and single sign-on

Snowflake offers a variety of authentication methods, depending on the interface being used: client applications using drivers, the UI, or Snowpipe.

Snowflake advises compiling a spreadsheet that lists each client application that connects to it, along with the authentication methods that application supports. If an app supports multiple authentication methods, choose one in the priority order listed below.

  • OAuth (either Snowflake OAuth or External OAuth).
  • External browser authentication, if the application is a desktop programme and OAuth is not supported.
  • Okta native authentication, if we’re using Okta and the app supports it but supports neither OAuth nor external browser authentication.
  • Key pair authentication, which is mostly used by service account users. Because this requires the client application to manage private keys, complement it with our internal key management software.
  • Password, as the last resort if the application supports none of the alternatives above. Users connecting via third-party ETL apps with service account credentials typically use this option.

Additionally, Snowflake advises always employing MFA because it adds an extra layer of security for user access.
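The priority order above can be captured in a few lines of Python, for example when filling in the spreadsheet of client applications. This is only a sketch: the method labels and the `pick_auth_method` helper are hypothetical names chosen for this example, not Snowflake identifiers.

```python
# Priority order from the list above, most preferred first.
# The string labels are hypothetical and used only in this sketch.
AUTH_PRIORITY = [
    "oauth",
    "external_browser",
    "okta_native",
    "key_pair",
    "password",
]


def pick_auth_method(supported, uses_okta=False):
    """Return the most preferred method that the client app supports."""
    for method in AUTH_PRIORITY:
        if method == "okta_native" and not uses_okta:
            continue  # only relevant when Okta is our identity provider
        if method in supported:
            return method
    raise ValueError("application supports none of the known methods")


# Hypothetical inventory of client applications and their capabilities.
apps = {
    "bi_dashboard": {"oauth", "password"},
    "desktop_sql_client": {"external_browser", "password"},
    "nightly_etl_job": {"key_pair", "password"},
    "legacy_tool": {"password"},
}

for name, supported in apps.items():
    print(name, "->", pick_auth_method(supported))
```

Keeping the order in a single list makes it easy to review the choice for every application in one place.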

Object-level Access Control

In Snowflake, roles are used to control access to objects like tables, views, and functions. Roles form hierarchies and can contain other roles. When a session is created for a user, a primary role is associated with it. To perform authorisation, all roles in the primary role’s hierarchy are activated for the session. We should spend some time upfront designing a proper role hierarchy model.

Snowflake recommends the following best practices for access control:

  • Define functional roles and access roles
  • Avoid granting access roles to other access roles
  • Use future grants
  • Set default_role property for the user
  • Create a role per user for cross-database join use cases
  • Use managed access schema to centralize grant management
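To illustrate how a role hierarchy drives authorisation, here is a small Python model of functional roles that are granted access roles. All role, privilege, and object names are invented for this sketch, and the model is deliberately simplified compared with Snowflake's actual privilege system.

```python
# role -> roles granted to it (its children in the hierarchy).
# Functional roles (ANALYST, ENGINEER) are granted access roles
# (SALES_READ, SALES_WRITE); access roles are not granted to each other.
ROLE_GRANTS = {
    "ANALYST": ["SALES_READ"],
    "ENGINEER": ["SALES_READ", "SALES_WRITE"],
    "SALES_READ": [],
    "SALES_WRITE": [],
}

# access role -> set of (privilege, object) pairs it holds
PRIVILEGES = {
    "SALES_READ": {("SELECT", "SALES.ORDERS")},
    "SALES_WRITE": {("INSERT", "SALES.ORDERS")},
}


def active_roles(primary_role):
    """All roles activated for a session: the primary role plus its hierarchy."""
    seen, stack = set(), [primary_role]
    while stack:
        role = stack.pop()
        if role not in seen:
            seen.add(role)
            stack.extend(ROLE_GRANTS.get(role, []))
    return seen


def is_authorised(primary_role, privilege, obj):
    """True if any active role holds the requested privilege on the object."""
    return any(
        (privilege, obj) in PRIVILEGES.get(role, set())
        for role in active_roles(primary_role)
    )


print(is_authorised("ANALYST", "SELECT", "SALES.ORDERS"))
print(is_authorised("ANALYST", "INSERT", "SALES.ORDERS"))
```

Separating functional roles from access roles keeps the privilege grants in one layer, so changing what an analyst can do means changing one role grant rather than many object grants.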

Column-level Access Control

If we want to restrict access to sensitive information held in particular columns, such as PII, PHI, or financial data, Snowflake advises using the following data governance capabilities:

  1. Dynamic Data Masking: a built-in feature that can dynamically obfuscate column data based on who’s querying it.
  2. External Tokenization: integrates with partner solutions to detokenize data at query time for authorized users.
  3. Secure Views: these let us hide the columns entirely from unauthorized users.

Both Dynamic Data Masking and External Tokenization use masking policies to restrict sensitive data to authorised users. Additionally, Snowflake suggests the following guidelines for masking policies:

  • Determine up-front whether we want to take a centralized or a decentralized approach to policy management.
  • Use invoker_role() in the policy condition to let unauthorized users view aggregate data while keeping individual values hidden.
  • Avoid using the SHA2 function in the policy to allow joins on protected columns for unauthorized users, since it can lead to unintended query results.
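The idea behind dynamic masking, where the same column returns different values depending on who is querying, can be sketched in plain Python. This is a conceptual model only, not Snowflake's masking policy syntax; the role name `HR_ADMIN` and the masking format are assumptions made for this example.

```python
# Roles allowed to see the clear-text value (hypothetical name).
AUTHORISED_ROLES = frozenset({"HR_ADMIN"})


def mask_email(value, current_role):
    """Return the real e-mail for authorised roles, a masked form otherwise."""
    if current_role in AUTHORISED_ROLES:
        return value
    # Keep the domain so the masked value stays useful for rough analysis.
    _local, sep, domain = value.partition("@")
    return "*****@" + domain if sep else "*****"


rows = ["alice@alfa.example", "bob@alfa.example"]

print([mask_email(v, "HR_ADMIN") for v in rows])
print([mask_email(v, "ANALYST") for v in rows])
```

The stored data never changes; only the value returned at query time depends on the role of the caller.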

Row-level Access Control

Snowflake offers row-level security through row access policies, which determine which rows are returned in the query result. A row access policy can be as basic as allowing one role to view rows, or as sophisticated as including a mapping table in the policy definition to decide access to rows in the query result.

A row access policy is a schema-level object that determines whether a given row in a table or view can be accessed from the following types of statements:

  • SELECT statements.
  • Rows selected by UPDATE, DELETE, and MERGE statements.

Row access policies can include conditions and functions in the policy expression to transform the data at query runtime when those conditions are met. The policy-driven model encourages the separation of duties, allowing governance teams to define policies that restrict the exposure of sensitive data.

This approach also applies to the object owner (i.e. the role with the OWNERSHIP privilege on the object, such as a table or view), who would normally have complete access to the underlying data. Note: A single policy can be applied to several tables and views at the same time.

The main advantage of a row access policy is that it gives an organisation an extensible policy with which to balance data security, governance, and analytics. The policy’s extensible design allows one or more conditions to be added or removed at any moment, in order to keep the policy up to date with changes to the data, the mapping tables, and the RBAC hierarchy.
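A mapping-table-driven row access policy can be modelled in Python as a filter applied to every row before it is returned. The sketch below is conceptual rather than Snowflake syntax; the role names, regions, and the `SALES_EXEC` "see everything" role are all hypothetical.

```python
# Mapping table: which role may see rows of which region (hypothetical data).
REGION_MAPPING = {
    ("SALES_EMEA", "EMEA"),
    ("SALES_APAC", "APAC"),
}


def row_visible(current_role, row_region):
    """Policy expression: decides, per row, whether the role may see it."""
    if current_role == "SALES_EXEC":  # assumed role with access to all rows
        return True
    return (current_role, row_region) in REGION_MAPPING


def run_query(rows, current_role):
    """Return only the rows the policy allows for the current role."""
    return [row for row in rows if row_visible(current_role, row["region"])]


orders = [
    {"id": 1, "region": "EMEA", "amount": 120},
    {"id": 2, "region": "APAC", "amount": 80},
    {"id": 3, "region": "EMEA", "amount": 45},
]

print([r["id"] for r in run_query(orders, "SALES_EMEA")])
print([r["id"] for r in run_query(orders, "SALES_EXEC")])
```

Adding or removing an entry in the mapping table changes what a role can see without touching the policy logic, which is the extensibility described above.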

Data Encryption

Snowflake provides us with end-to-end encryption (E2EE), a method that prevents third parties from reading data while it is at rest or in transit to and from Snowflake.

Aside from E2EE, Snowflake provides us with two features that serve as the icing on the cake.

  • Periodic Rekeying
  • Tri-secret secure

Tri-secret secure

Snowflake manages data encryption keys to safeguard customer data. This management happens automatically, with no customer involvement required. In addition, customers can manage their own encryption key using the key management service of the cloud platform that hosts their Snowflake account.

When this is enabled, the customer-managed key is combined with a Snowflake-maintained key to produce a composite master key that secures the Snowflake data. This is known as Tri-Secret Secure.
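The property that matters here is that the composite key cannot be reproduced unless both input keys are available. The Python sketch below illustrates that property only. It is not Snowflake's key derivation, which is not described in this post; HMAC-SHA256 is used purely as a stand-in, and the function name is hypothetical.

```python
import hashlib
import hmac
import secrets


def composite_master_key(customer_key: bytes, snowflake_key: bytes) -> bytes:
    """Derive one 32-byte key that depends on BOTH input keys.

    Stand-in construction for illustration only (an assumption of this
    sketch, not the real mechanism).
    """
    return hmac.new(customer_key, snowflake_key, hashlib.sha256).digest()


customer_key = secrets.token_bytes(32)   # held in the customer's cloud KMS
snowflake_key = secrets.token_bytes(32)  # maintained by Snowflake

key = composite_master_key(customer_key, snowflake_key)

# The same two inputs always give the same composite key ...
print(key == composite_master_key(customer_key, snowflake_key))
# ... but swapping in any other customer key gives a different one.
print(key == composite_master_key(secrets.token_bytes(32), snowflake_key))
```

This is why withdrawing access to the customer-managed key is enough to make the protected data unreadable: the composite key can no longer be rebuilt.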

Periodic Rekeying

All Snowflake-managed keys are automatically rotated by Snowflake when they are more than 30 days old. Active keys are retired, and new keys are created.

The following image illustrates key rotation for one table master key (TMK) over a period of three months:

The TMK rotation works as follows:

  1. Version 1 of the TMK is active in April. Data inserted into this table in April is protected with TMK v1.

  2. In May, this TMK is rotated: TMK v1 is retired and a new, completely random key, TMK v2, is created. TMK v1 is now used only to decrypt data from April. New data inserted into the table is encrypted using TMK v2.

  3. In June, the TMK has rotated again: TMK v2 is retired and a new TMK, v3, is created. TMK v1 is used to decrypt data from April, TMK v2 is used to decrypt data from May, and TMK v3 is used to encrypt and decrypt new data inserted into the table in June.
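The rotation steps above can be simulated with a short Python sketch. The class and field names are hypothetical, and the random bytes merely stand in for real key material; the only rule taken from the text is that a key older than 30 days is retired and replaced.

```python
import secrets
from dataclasses import dataclass
from datetime import date, timedelta

ROTATION_AGE = timedelta(days=30)  # keys older than this are rotated


@dataclass
class KeyVersion:
    version: int
    created: date
    material: bytes
    active: bool = True  # active: encrypts and decrypts; retired: decrypts only


class TableMasterKey:
    def __init__(self, today):
        self.versions = [KeyVersion(1, today, secrets.token_bytes(32))]

    @property
    def current(self):
        return self.versions[-1]

    def rotate_if_due(self, today):
        """Retire the active key and create a new one once it is too old."""
        key = self.current
        if today - key.created > ROTATION_AGE:
            key.active = False
            self.versions.append(
                KeyVersion(key.version + 1, today, secrets.token_bytes(32))
            )
        return self.current


tmk = TableMasterKey(date(2023, 4, 1))
for day in (date(2023, 4, 20), date(2023, 5, 2), date(2023, 6, 2)):
    print(day, "-> encrypting with TMK v%d" % tmk.rotate_if_due(day).version)
```

Retired versions stay in the list because they are still needed to decrypt the data written while they were active.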

Key rotation, as described above, replaces active keys with new keys on a periodic basis and retires the old keys. Periodic data rekeying completes the life cycle.

If periodic rekeying is enabled, then when the retired encryption key for a table is older than one year, Snowflake automatically creates a new encryption key and re-encrypts all data previously protected by the retired key using the new key. The new key is used to decrypt the table data going forward.

Periodic rekeying works as follows:

  1. In April of the following year, after TMK v1 has been retired for an entire year, it is rekeyed (generation 2) using a fully new random key.

  2. The data files protected by TMK v1 generation 1 are decrypted and re-encrypted using TMK v1 generation 2. Having no further purpose, TMK v1 generation 1 is destroyed.

  3. In May, Snowflake performs the same rekeying process on the table data protected by TMK v2.

  4. And so on.
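As a rough model of the rekeying rule, the Python sketch below bumps the generation of a retired key once it has been retired for more than a year. The names are hypothetical, and the actual decryption and re-encryption of data files is only indicated in a comment.

```python
from dataclasses import dataclass
from datetime import date, timedelta

REKEY_AGE = timedelta(days=365)  # retired keys older than this are rekeyed


@dataclass(frozen=True)
class RetiredKey:
    version: int      # TMK version, e.g. 1 for TMK v1
    generation: int   # increases with every rekeying
    since: date       # when this generation started its retired life


def rekey_if_due(key, today):
    """Return the key generation that should protect the data as of today."""
    if today - key.since > REKEY_AGE:
        # Here the data files would be decrypted with the old generation,
        # re-encrypted with the new one, and the old generation destroyed.
        return RetiredKey(key.version, key.generation + 1, today)
    return key


tmk_v1 = RetiredKey(version=1, generation=1, since=date(2023, 5, 1))

print(rekey_if_due(tmk_v1, date(2024, 4, 1)).generation)
print(rekey_if_due(tmk_v1, date(2024, 5, 2)).generation)
```

Only the generation changes; the key keeps its version number, which matches the "TMK v1 generation 2" naming used in the steps above.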

Summary

In the age of data-driven digital transformation, ensuring the security of your business data has never been more important. As a leading digital transformation service provider, Bluepi understands the importance of robust security measures for businesses undergoing digital transformation. Our team of experts has extensive experience in providing the best digital transformation services and solutions, making us one of the top consulting firms for digital transformation.

As the best Snowflake services provider in India, we offer a wide range of Snowflake services and solutions, including Snowpark services in India, Data Lake in Snowflake, Data Warehousing in Snowflake, Data Analytics in Snowflake, and Data Engineering with Snowflake. Our expertise in Snowflake allows us to offer unparalleled security measures for businesses in various industries.

Partnering with Bluepi means working with a digital transformation consulting company that prioritizes data security and provides the most effective digital transformation services and solutions. Trust us to help you secure your business data and achieve your digital transformation goals.

Tanika Jindal
Written by
Rishabh Sengar

Technical Lead