Announcing BigQuery Encrypt and Decrypt function compatibility with Sensitive Data Protection

7 months ago 34
News Banner

Looking for an Interim or Fractional CTO to support your business?

Read more

Organizations collect vast amounts of data to create innovative solutions, perform ground breaking research, or optimize their designs. With this comes the responsibility to ensure data is adequately protected to meet regulatory, compliance, contractual or internal security requirements.

For organizations that want to move their data warehouses from on-premises to cloud-first systems, such as BigQuery, protecting sensitive data from unauthorized access or accidental exposure is crucial. Using encryption-based tokenization is a vital tool to create an additional layer of defense and fine-grained data control. 

In addition to storage-level encryption, whether using Google-managed or customer-managed keys, BigQuery now has seamless integration with Sensitive Data Protection supporting native SQL functions that allow interoperable deterministic encryption and decryption between BigQuery and Sensitive Data Protection. 

In short, this makes it easier to protect sensitive data across a variety of scenarios:

  • Protect sensitive data in BigQuery: Securely protect data containing personally identifiable information (PII), healthcare records, or financial data at query time while maintaining compliance with regulations.

  • Share sensitive data securely: Collaborate with  external parties (partners or consumers) while keeping sensitive information protected by sharing encrypted data (encrypted externally with Sensitive Data Protection), providing decryption keys separately and decrypting with function in BigQuery.

  • Compatible tokenization anywhere you need it:  Whether you create tokens with the Sensitive Data Protection APIs for workloads outside of BigQuery or create tokens natively in BigQuery, you can join, aggregate, and keep referential integrity where you need it. 

  • Improved performance:  Enhanced performance for example, like functions, leverages BigQuery distributed architecture to execute Sensitive Data Protection token-based encryption and decryption tasks natively in parallel across multiple BigQuery nodes to significantly accelerate the operations.

First-Image

Using Sensitive Data Protection functions in BigQuery

Here are the steps to get you started:

1. Identify your sensitive data: Use the Sensitive Data Protection discovery service to pinpoint BigQuery tables and columns containing confidential information.

2. Generate your encryption keys: generate your data keys and use Cloud KMS to protect your data keys.

3. Apply encryption: Use DLP_DETERMINISTIC_ENCRYPT to encrypt the identified data fields.

code_block <ListValue: [StructValue([('code', 'CREATE OR REPLACE FUNCTION\r\n`project_name.dataset.sdp_encrypt`(column_to_encrypt STRING,\r\n column_ad STRING) AS ((\r\n SELECT\r\n DLP_DETERMINISTIC_ENCRYPT(DLP_KEY_CHAIN("gcp-kms://projects/<project>/locations/<location>/keyRings/<keyring name>/cryptoKeys/<key-name>",\r\nb\'\\012\\044\\000\\066\\206\\201\\007\\....\'),\r\n column_to_encrypt,\r\n column_ad )));'), ('language', ''), ('caption', <wagtail.rich_text.RichText object at 0x3e491ec5ab50>)])]>
code_block <ListValue: [StructValue([('code', 'CREATE TABLE `project_name.dataset.sdp_demodata_enc` AS\r\nSELECT\r\nID,\r\ndemo_dataset.sdp_encrypt(Card_Number,"CC") AS Encrypted_CC_Num,\r\ndemo_dataset.sdp_encrypt(SSN,"SSN") AS SSN_encrypted,\r\nFROM `project_name.dataset.sdp_demodata`;'), ('language', ''), ('caption', <wagtail.rich_text.RichText object at 0x3e491f50eee0>)])]>

4. Store and process securely: Continue working with your encrypted data within BigQuery, safe in the knowledge that it's protected.

Query from encrypted table

Second-image

 Run aggregates on encrypted columns:

code_block <ListValue: [StructValue([('code', 'SELECT SSN_encrypted, count(Encrypted_CC_Num) as Card_count FROM `project_name.dataset.sdp_demodata_enc` GROUP BY SSN_encrypted;'), ('language', ''), ('caption', <wagtail.rich_text.RichText object at 0x3e491f5c3d60>)])]>

5. Decrypt when needed: Use DLP_DETERMINISTIC_DECRYPT to access the original data only when necessary at query time and for authorized users.

code_block <ListValue: [StructValue([('code', 'CREATE OR REPLACE FUNCTION\r\n`project_name.dataset.sdp_decrypt`(column_to_decrypt STRING,\r\n column_ad STRING) AS ((\r\nSELECT DLP_DETERMINISTIC_DECRYPT(DLP_KEY_CHAIN("gcp-kms://projects/<project>/locations/<location>/keyRings/<keyring name>/cryptoKeys/<key-name>",\r\nb\'\\012\\044\\000\\066\\206\\201\\007\\....\'),\r\n column_to_decrypt,\r\n column_ad )));'), ('language', ''), ('caption', <wagtail.rich_text.RichText object at 0x3e491f8456d0>)])]>

Query from decrypted records:

code_block <ListValue: [StructValue([('code', 'SELECT\r\nID, \r\ndataset.sdp_decrypt(Encrypted_CC_Num,"CC") AS Card_Number_decrypted,\r\ndataset.sdp_decrypt(SSN_encrypted,"SSN") AS SSN_decrypted \r\nFROM `project_name.dataset.sdp_demodata_enc`;'), ('language', ''), ('caption', <wagtail.rich_text.RichText object at 0x3e491ccc6d00>)])]>
Third-Image

Next steps

Sensitive Data Protection and BigQuery data security functions are powerful tools for protecting sensitive data in the cloud. By understanding how they function, and how their capabilities can be best used, you can enhance your data security posture, reduce the risk of data breaches, and help with the confidentiality of sensitive information while safeguarding your privacy.

Ready to dive deeper? Check out the Sensitive Data Protection: DLP-compatible encrypt function documentation for detailed instructions.

Read Entire Article