Manage access to cloud storage using Unity Catalog (2024)

This article gives an overview of how to use Unity Catalog to manage access to cloud storage from Databricks. It introduces the concepts of external location, storage credential, and managed storage.

Note

If you want to use Unity Catalog to govern access to an external service rather than cloud storage, see Manage access to external cloud services using service credentials.

External locations and storage credentials

All data that is governed by Unity Catalog must be in cloud storage in your cloud provider account. Unity Catalog governs access to cloud storage using a securable object called an external location, which defines a path to a cloud storage location and the credentials required to access that location. Those credentials are, in turn, defined in a Unity Catalog securable object called a storage credential. By granting and revoking access to external location securables in Unity Catalog, you control access to the data in the cloud storage location. By granting and revoking access to storage credential securables in Unity Catalog, you control the ability to create external location objects.

Here’s a little more detail about these two securable objects:

  • A storage credential represents an authentication and authorization mechanism for accessing data stored on your cloud tenant, using an IAM role for S3 buckets or an R2 API token for Cloudflare R2 buckets. Privileges granted in Unity Catalog control which users and groups can use the credential to define external locations. Permission to create and use storage credentials should only be granted to users who need to create external location objects. See Create a storage credential for connecting to AWS S3 and Create a storage credential for connecting to Cloudflare R2.

  • An external location combines a cloud storage path with a storage credential that authorizes access to the cloud storage path. Privileges granted in Unity Catalog control which users and groups can access the cloud storage path defined by the external location. Permission to create and use external locations should only be granted to users who need to create external tables, external volumes, or managed storage locations. See Create an external location to connect cloud storage to Databricks.
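
As a rough illustration of how privileges on the two securables differ, the sketch below builds `GRANT` statements as strings. The privilege names (`CREATE EXTERNAL LOCATION`, `READ FILES`, `CREATE EXTERNAL TABLE`) are Unity Catalog privilege types; the credential, location, and group names are hypothetical, and the helper functions are not part of any Databricks library.

```python
def quote_ident(name: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def grant_statement(privilege: str, securable_type: str,
                    securable_name: str, principal: str) -> str:
    """Build a GRANT statement for one privilege on one securable."""
    return (
        f"GRANT {privilege} ON {securable_type} {quote_ident(securable_name)} "
        f"TO {quote_ident(principal)}"
    )


# Hypothetical names: finance_cred, finance_loc, and the three groups.
statements = [
    # Controls who may define external locations with this credential.
    grant_statement("CREATE EXTERNAL LOCATION", "STORAGE CREDENTIAL",
                    "finance_cred", "storage-admins"),
    # Controls who may read files under the external location's path.
    grant_statement("READ FILES", "EXTERNAL LOCATION",
                    "finance_loc", "finance-analysts"),
    # Controls who may register external tables under that path.
    grant_statement("CREATE EXTERNAL TABLE", "EXTERNAL LOCATION",
                    "finance_loc", "data-engineers"),
]

for statement in statements:
    print(statement)
```

Each generated statement could then be run in a SQL editor or notebook by a user who is allowed to grant privileges on that object.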

External locations are used in Unity Catalog both for external data assets, like external tables and external volumes, and for managed data assets, like managed tables and managed volumes. For more information about the difference between external and managed data assets in Unity Catalog, see What are tables and views? and What are Unity Catalog volumes?.

To learn about best practices for using external locations, see Manage external locations, external tables, and external volumes.

Using external locations when you create external tables and volumes

External tables and external volumes registered in Unity Catalog are pointers to data in cloud storage that you manage outside of Databricks. When you create an external table or external volume in Unity Catalog, you must reference a cloud storage path that falls within an external location on which you have been granted adequate privileges. For the difference between external and managed data assets, see What are tables and views? and What are Unity Catalog volumes?. For privileges, see Grant permissions on an external location.
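
A minimal sketch of what such a reference might look like: the helpers below assemble `CREATE TABLE ... LOCATION` and `CREATE EXTERNAL VOLUME ... LOCATION` statements as strings. The catalog, schema, table, and bucket names are hypothetical, and the helpers are illustrative rather than a Databricks API. The statements would only succeed if the path is covered by an external location you hold the relevant privileges on.

```python
def path_literal(path: str) -> str:
    """Return the path as a single-quoted SQL literal.

    Paths containing quotes or backslashes are rejected instead of escaped,
    to keep this sketch simple.
    """
    if "'" in path or "\\" in path:
        raise ValueError(f"unsupported character in path: {path!r}")
    return f"'{path}'"


def external_table_ddl(table: str, columns: str, path: str) -> str:
    """Build DDL for an external table whose data lives at `path`."""
    return f"CREATE TABLE {table} ({columns}) LOCATION {path_literal(path)}"


def external_volume_ddl(volume: str, path: str) -> str:
    """Build DDL for an external volume rooted at `path`."""
    return f"CREATE EXTERNAL VOLUME {volume} LOCATION {path_literal(path)}"


# Hypothetical three-level names and bucket path.
table_sql = external_table_ddl(
    "finance.raw.invoices",
    "id BIGINT, amount DECIMAL(10, 2)",
    "s3://my-bucket/finance/invoices",
)
volume_sql = external_volume_ddl(
    "finance.raw.landing", "s3://my-bucket/finance/landing"
)

print(table_sql)
print(volume_sql)
```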

Using external locations when you create managed storage

Managed tables and managed volumes are fully managed by Unity Catalog. They are stored by default in a managed storage location, which can be defined at the metastore, catalog, or schema level. When you assign a managed storage location to a metastore, catalog, or schema, you must reference an external location object, and you must have adequate privileges to use it. See Specify a managed storage location in Unity Catalog and Unity Catalog best practices.
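
To make the catalog and schema levels concrete, the sketch below builds `CREATE CATALOG ... MANAGED LOCATION` and `CREATE SCHEMA ... MANAGED LOCATION` statements as strings. The names and bucket paths are hypothetical, and the helper is not a Databricks API. The metastore level is left out because this sketch assumes it is configured on the metastore itself rather than through these statements.

```python
def managed_location_ddl(level: str, name: str, path: str) -> str:
    """Build DDL that assigns a managed storage location to a catalog or schema.

    `path` must fall within an external location the caller may use for
    managed storage; this sketch does not check that.
    """
    if level not in ("CATALOG", "SCHEMA"):
        raise ValueError("level must be 'CATALOG' or 'SCHEMA'")
    if "'" in path or "\\" in path:
        raise ValueError(f"unsupported character in path: {path!r}")
    return f"CREATE {level} IF NOT EXISTS {name} MANAGED LOCATION '{path}'"


# Hypothetical catalog, schema, and bucket paths.
catalog_sql = managed_location_ddl(
    "CATALOG", "finance", "s3://my-bucket/finance/managed"
)
schema_sql = managed_location_ddl(
    "SCHEMA", "finance.reporting", "s3://my-bucket/finance/managed/reporting"
)

print(catalog_sql)
print(schema_sql)
```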

Workflow for managing access to cloud storage in Unity Catalog

To manage access to cloud storage using Unity Catalog, you do the following:

  1. Create a storage credential object that encapsulates the cloud credential (an IAM role for S3 buckets, or an R2 API token for Cloudflare R2 buckets) that gives access to the cloud storage path.

  2. Create an external location object that references the storage path and the storage credential object.

  3. Reference a path that is included in the external location when you create external tables, external volumes, or default managed storage locations. This can be the exact path defined in the external location or a subpath.
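
Steps 2 and 3 can be sketched as follows. The first helper builds a `CREATE EXTERNAL LOCATION` statement as a string; the second illustrates the "exact path or subpath" rule with a segment-by-segment comparison. All names and paths are hypothetical, and `is_covered` is only a model of the rule for illustration, not the logic Unity Catalog itself uses to resolve paths.

```python
from urllib.parse import urlparse


def create_external_location_ddl(name: str, url: str, credential: str) -> str:
    """Build DDL for an external location backed by a storage credential."""
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{name}` URL '{url}' "
        f"WITH (STORAGE CREDENTIAL `{credential}`)"
    )


def _split(url: str):
    """Split a storage URL into (scheme, bucket, non-empty path segments)."""
    parsed = urlparse(url)
    segments = [part for part in parsed.path.split("/") if part]
    return parsed.scheme, parsed.netloc, segments


def is_covered(path: str, location_url: str) -> bool:
    """Return True if `path` equals the location URL or is a subpath of it.

    Whole segments are compared, so 'finance-archive' is not treated as
    being inside 'finance'.
    """
    p_scheme, p_bucket, p_segments = _split(path)
    l_scheme, l_bucket, l_segments = _split(location_url)
    same_root = (p_scheme, p_bucket) == (l_scheme, l_bucket)
    return same_root and p_segments[: len(l_segments)] == l_segments


# Hypothetical external location and credential names.
location_url = "s3://my-bucket/finance"
location_sql = create_external_location_ddl(
    "finance_loc", location_url, "finance_cred"
)
print(location_sql)
print(is_covered("s3://my-bucket/finance/2024/q1", location_url))
print(is_covered("s3://my-bucket/finance-archive", location_url))
```

Comparing whole path segments rather than raw string prefixes avoids treating a sibling directory with a similar name as a subpath.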

Next steps

  • Create a storage credential for connecting to AWS S3

  • Create a storage credential for connecting to Cloudflare R2

  • Create an external location to connect cloud storage to Databricks

  • Specify a managed storage location in Unity Catalog

  • Manage storage credentials

  • Manage external locations

Author: Maia Crooks Jr
