Databricks Clean Rooms Integration Guide

This guide is for advertisers and data providers who want to convert their user data to raw UID2s in a Databricks environment.

Integration Overview

This solution enables you to securely share consumer identifier data without exposing sensitive directly identifying information (DII), by processing your data in an instance of the Databricks Clean Rooms feature. This feature provides a secure and privacy-protecting environment for working on sensitive data.

After you set up the Databricks Clean Rooms environment, you establish a trust relationship with the UID2 service and allow the service to convert the data that you share in the clean room to raw UID2s.

Functionality

The following table summarizes the functionality available with the UID2 Databricks integration.

Encrypt Raw UID2 to UID2 Token for Sharing	Decrypt UID2 Token to Raw UID2	Generate UID2 Token from DII	Refresh UID2 Token	Map DII to Raw UID2s
—	—	—	—	✅

Key Benefits

Here are some key benefits of integrating with Databricks for your UID2 processing:

Native support for managing UID2 workflows within a Databricks data clean room.
Secure identity interoperability between partner datasets.
Direct lineage and observability for all UID2-related transformations and joins, for auditing and traceability.
Streamlined integration between UID2 identifiers and The Trade Desk activation ecosystem.
Self-service support for marketers and advertisers through Databricks.

Preparing DII for Processing

It's critical that the input data that you are converting to raw UID2s is in an acceptable format. If it isn't, you won't get the expected results. For example, you must normalize phone numbers to include the country code, as explained in Phone Number Normalization.

For details, see Preparing Emails and Phone Numbers for Processing.
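
If you plan to supply hashed values rather than raw email addresses, the preparation can be sketched as follows. This is a minimal illustration, not the official implementation: it assumes that normalization means trimming whitespace, lowercasing, and, for gmail.com addresses only, removing periods and any plus-sign suffix from the local part, and that a hash is the Base64-encoded SHA-256 digest of the normalized value. The function names are hypothetical. Confirm the rules against Preparing Emails and Phone Numbers for Processing before relying on them.

```python
import base64
import hashlib


def normalize_email(email: str) -> str:
    """Normalize an email address (assumed rules; confirm against the UID2 docs)."""
    # Remove surrounding whitespace and lowercase the whole address.
    email = email.strip().lower()
    local, sep, domain = email.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError("input is not an email address")
    if domain == "gmail.com":
        # gmail.com only: drop everything from the first "+" and remove periods.
        local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"


def hash_identifier(normalized: str) -> str:
    """Return the Base64-encoded SHA-256 digest of an already normalized value."""
    digest = hashlib.sha256(normalized.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    normalized = normalize_email("  Jane.Doe+Home@Gmail.com ")
    print(normalized)
    print(hash_identifier(normalized))
```

A Base64-encoded SHA-256 digest is always 44 characters long, which makes a quick sanity check for values destined for the `email_hash` or `phone_hash` input types.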

To validate the full token generation pipeline end to end, confirming that tokens generated from your normalized, hashed, and encoded values are correct, use the UID2 Token Validator.

Integration Steps

At a high level, the following are the steps to set up your Databricks integration and process your data:

Create a clean room for UID2 collaboration.
Send your Databricks sharing identifier to your UID2 contact.
Add data to the clean room.
Map DII by running the clean room notebook.

Create Clean Room for UID2 Collaboration

As a starting point, create a Databricks Clean Rooms environment—a secure environment for you to collaborate with UID2 to process your data.

Follow the steps in Create clean rooms in the Databricks documentation. Use the correct sharing identifier based on the UID2 environment you want to connect to: see UID2 Sharing Identifiers.

important

After you've created a clean room, you cannot change its collaborators. If you have the option to set clean room collaborator aliases—for example, if you’re using the Databricks Python SDK to create the clean room—your collaborator alias must be creator and the UID2 collaborator alias must be collaborator. If you’re creating the clean room using the Databricks web UI, the correct collaborator aliases are set for you.

Send Sharing Identifier to UID2 Contact

Before you can use the clean room notebook, you'll need to send your Databricks sharing identifier to your UID2 contact.

The sharing identifier is a string in this format: <cloud>:<region>:<uuid>.

Follow these steps:

Find the sharing identifier for the Unity Catalog metastore that is attached to the Databricks workspace where you’ll work with the clean room.

For information on how to find this value, see Finding a Sharing Identifier.
Send the sharing identifier to your UID2 contact.
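
Before sending the value, you might want to confirm that it matches the `<cloud>:<region>:<uuid>` format described above. The following is a minimal sketch of such a check. The function name, the regular expression, and the sample value are illustrative assumptions, not part of any Databricks or UID2 API. Note that the UID2 sharing identifiers listed in the Reference section contain additional segments, so this check is meant only for your own metastore's identifier.

```python
import re
import uuid

# Assumed shape: lowercase cloud name, region with letters/digits/hyphens, 36-character UUID.
_SHARING_ID = re.compile(
    r"(?P<cloud>[a-z]+):(?P<region>[a-z0-9-]+):(?P<uuid>[0-9a-fA-F-]{36})"
)


def parse_sharing_identifier(value: str) -> dict:
    """Split a sharing identifier into cloud, region, and uuid, or raise ValueError."""
    match = _SHARING_ID.fullmatch(value.strip())
    if match is None:
        raise ValueError("expected a value in the form <cloud>:<region>:<uuid>")
    parts = match.groupdict()
    # uuid.UUID raises ValueError if the last segment is not a well-formed UUID.
    uuid.UUID(parts["uuid"])
    return parts


if __name__ == "__main__":
    # Hypothetical identifier, for illustration only.
    print(parse_sharing_identifier("aws:us-west-2:123e4567-e89b-12d3-a456-426614174000"))
```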

Add Data to the Clean Room

Add one or more tables or views to the clean room. You can use any names for the schema, tables, and views. Tables and views must follow the schema detailed in Input Table.

Map DII

Run the identity_map_v3 Databricks Clean Rooms notebook to map email addresses, phone numbers, or their respective hashes to raw UID2s.

A successful notebook run results in raw UID2s populated in the output table. For details, see Output Table.

Running the Clean Rooms Notebook

This section provides details to help you use your Databricks Clean Rooms environment to process your DII into raw UID2s, including the following:

Notebook Parameters
Input Table
DII Format and Normalization
Output Table
Output Table Schema

Notebook Parameters

You can use the identity_map_v3 notebook to map DII in any table or view that you've added to the creator catalog of the clean room.

The notebook has two parameters, input_schema and input_table. Together, these two parameters identify the table or view in the clean room that contains the DII to be mapped.

For example, to map DII in the clean room table named creator.default.emails, set input_schema to default and input_table to emails.

Parameter Name	Description
`input_schema`	The schema containing the table or view.
`input_table`	The name you specify for the table or view containing the DII to be mapped.
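
The relationship between a table's full name and the two parameters can be sketched as follows. This is only an illustration of the naming rule described above; the helper name is hypothetical, and it assumes a three-part name whose first part is the `creator` catalog.

```python
def notebook_parameters(full_name: str, catalog: str = "creator") -> dict[str, str]:
    """Derive input_schema and input_table from a name like creator.default.emails."""
    parts = full_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected a name in the form <catalog>.<schema>.<table>")
    if parts[0] != catalog:
        raise ValueError(f"table must be in the {catalog} catalog of the clean room")
    return {"input_schema": parts[1], "input_table": parts[2]}


if __name__ == "__main__":
    print(notebook_parameters("creator.default.emails"))
```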

Input Table

The input table or view must have the two columns shown in the following table. The table or view can have additional columns, but the notebook ignores them.

Column Name	Data Type	Description
`INPUT`	string	The DII to map.
`INPUT_TYPE`	string	The type of DII to map. Allowed values: `email`, `email_hash`, `phone`, and `phone_hash`.
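
As a rough illustration of this schema, the following sketch builds rows with the two required columns and rejects any `INPUT_TYPE` outside the allowed values. The helper name and the sample values are hypothetical; how you load the rows into a table or view depends on your own environment.

```python
ALLOWED_INPUT_TYPES = {"email", "email_hash", "phone", "phone_hash"}


def make_input_rows(values, input_type: str) -> list[dict]:
    """Build rows with the INPUT and INPUT_TYPE columns the notebook expects."""
    if input_type not in ALLOWED_INPUT_TYPES:
        raise ValueError(f"INPUT_TYPE must be one of {sorted(ALLOWED_INPUT_TYPES)}")
    return [{"INPUT": value, "INPUT_TYPE": input_type} for value in values]


if __name__ == "__main__":
    # Sample values for illustration only.
    rows = make_input_rows(["user@example.com"], "email")
    rows += make_input_rows(["+12345678901"], "phone")
    for row in rows:
        print(row)
```

In a Databricks notebook, rows shaped like this could be turned into a DataFrame with a call such as `spark.createDataFrame(rows, "INPUT string, INPUT_TYPE string")` and saved as a table, assuming a Spark session is available.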

DII Format and Normalization

The normalization requirements depend on the type of DII you're processing, as follows:

Email address: The notebook automatically normalizes the data using the UID2 Email Address Normalization rules.
Phone number: You must normalize the phone number before mapping it with the notebook, using the UID2 Phone Number Normalization rules.
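
Because phone numbers are not normalized for you, a pre-check before loading them can help. The sketch below assumes that a normalized phone number is in E.164 form: a plus sign, a country code that does not start with zero, and at most 15 digits in total, with no spaces or punctuation. That assumption and the function name are illustrative; the UID2 Phone Number Normalization rules are the authority.

```python
import re

# Assumed E.164 shape: "+", a non-zero first digit, then 1 to 14 more digits.
_E164 = re.compile(r"\+[1-9][0-9]{1,14}")


def is_normalized_phone(phone: str) -> bool:
    """Return True if the value looks like an E.164 phone number."""
    return _E164.fullmatch(phone) is not None


if __name__ == "__main__":
    for candidate in ["+12345678901", "(234) 567-8901", "12345678901"]:
        print(candidate, is_normalized_phone(candidate))
```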

Output Table

If the clean room has an output catalog, the mapped DII is written to a table in the output catalog. Output tables are stored for 30 days.

For details, see Overview of output tables in the Databricks documentation.

Output Table Schema

The following table provides information about the structure of the output data, including field names and values.

Column Name	Data Type	Description
`UID`	string	If the DII was successfully mapped: the raw UID2 associated with the DII. Otherwise: `NULL`.
`PREV_UID`	string	If the DII was successfully mapped and the current raw UID2 was rotated in the last 90 days: the previous raw UID2. Otherwise: `NULL`.
`REFRESH_FROM`	timestamp	If the DII was successfully mapped: the timestamp indicating when this raw UID2 should be refreshed. Otherwise: `NULL`.
`UNMAPPED`	string	If the DII was successfully mapped: `NULL`. Otherwise: the reason why the identifier was not mapped, one of `OPTOUT`, `INVALID IDENTIFIER`, or `INVALID INPUT TYPE`. For details, see Values for the UNMAPPED Column.

note

The raw UID2 does not change before the refresh timestamp. After the refresh timestamp, remapping the DII returns a new refresh timestamp, but the raw UID2 might or might not change. It is possible for the raw UID2 to remain unchanged for multiple refresh intervals.
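
The refresh rule in this note can be sketched as a simple comparison against `REFRESH_FROM`. This is a minimal illustration under stated assumptions: the function name is hypothetical, a timestamp without time zone information is treated as UTC, and rows with a `NULL` refresh timestamp (unmapped DII) are reported as not due.

```python
from datetime import datetime, timezone


def needs_refresh(refresh_from, now=None) -> bool:
    """Return True once the refresh timestamp has been reached."""
    if refresh_from is None:
        # Unmapped rows have no refresh timestamp to compare against.
        return False
    if now is None:
        now = datetime.now(timezone.utc)
    if refresh_from.tzinfo is None:
        # Assumption: naive timestamps are in UTC.
        refresh_from = refresh_from.replace(tzinfo=timezone.utc)
    return now >= refresh_from


if __name__ == "__main__":
    due = datetime(2025, 1, 1, tzinfo=timezone.utc)
    print(needs_refresh(due, now=datetime(2025, 1, 2, tzinfo=timezone.utc)))
```

When a row is due, remapping the DII returns a new refresh timestamp. Compare the new `UID` with the stored value rather than assuming that it changed.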

Values for the UNMAPPED Column

The following table shows possible values for the UNMAPPED column in the output table schema.

Value	Meaning
`NULL`	The DII was successfully mapped.
`OPTOUT`	The user has opted out.
`INVALID IDENTIFIER`	The email address or phone number is invalid.
`INVALID INPUT TYPE`	The value of `INPUT_TYPE` is invalid. Valid values for `INPUT_TYPE` are: `email`, `email_hash`, `phone`, `phone_hash`.
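
To see how many rows fall under each value, the output rows can be tallied as in the following sketch. It assumes the rows have already been read into Python dictionaries keyed by column name; the function name and the `MAPPED` label for `NULL` are illustrative choices, not values produced by the notebook.

```python
from collections import Counter


def summarize_unmapped(rows) -> Counter:
    """Count output rows by UNMAPPED value, labeling NULL (None) as MAPPED."""
    return Counter(
        "MAPPED" if row["UNMAPPED"] is None else row["UNMAPPED"] for row in rows
    )


if __name__ == "__main__":
    # Sample rows for illustration only.
    sample = [
        {"UID": "example-raw-uid2", "UNMAPPED": None},
        {"UID": None, "UNMAPPED": "OPTOUT"},
        {"UID": None, "UNMAPPED": "INVALID IDENTIFIER"},
    ]
    print(summarize_unmapped(sample))
```

A high count of `INVALID IDENTIFIER` usually points back to the input preparation, for example phone numbers that were not normalized before mapping.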

Testing in the Integ Environment

If you'd like to test the Databricks Clean Rooms implementation before signing a UID2 POC, you can ask your UID2 contact for access to the integ (integration) environment. This environment is for testing only, and has no production data.

In the request, include your sharing identifier.

While you're waiting to hear back, you can complete the following actions:

Create the clean room, using the UID2 sharing identifier for the integration environment.
Put your assets into the clean room.

For details, see Integration Steps.

When your access is ready, your UID2 contact notifies you.

Reference

This section includes the following reference information:

UID2 Sharing Identifiers
Finding a Sharing Identifier

UID2 Sharing Identifiers

UID2 sharing identifiers can change. Before creating a new clean room, check this section to make sure you have the latest sharing identifier.

Environment	UID2 Sharing Identifier
Production	`aws:us-east-2:21149de7-a9e9-4463-b4e0-066f4b033e5d:673872910525611:010d98a6-8cf2-4011-8bf7-ca45940bc329`
Integration	`aws:us-east-2:4651b4ea-b29c-42ec-aecb-2377de70bbd4:2366823546528067:c15e03bf-a348-4189-92e5-68b9a7fb4018`

Finding a Sharing Identifier

To find your sharing identifier, which you send to your UID2 contact, follow these steps:

In your Databricks workspace, in the Catalog Explorer, click Catalog.

At the top, click the gear icon and select Delta Sharing.

On the Shared with me tab, in the upper right, click your Databricks sharing organization and then select Copy sharing identifier.

For details, see Request the recipient's sharing identifier in the Databricks documentation.
