- Azure Data Catalog is a fully managed cloud service that helps users discover the data sources they need and understand the data sources they find.
- With Data Catalog, any user (analyst, data scientist, or developer) can discover, understand, and consume data sources.
- Data Catalog includes a crowd-sourcing model of metadata and annotations.
- It is a single, central place for all of an organization's users to contribute their knowledge and build a community and culture of data.
Traditional Challenges for Data Consumers
- Users might not be aware that a data source exists unless they come into contact with it as part of another process.
- There is no central location where data sources are registered.
- Unless users know the location of a data source, they cannot connect to the data by using a client application.
- Data-consumption experiences require users to know the connection string or path.
- Unless users know the location of a data source's documentation, they cannot understand the intended uses of the data.
- Data sources and documentation might live in a variety of places and be consumed through a variety of experiences.
- If users have questions about an information asset, they must locate the expert or team that's responsible for the data and engage them offline.
- There is no explicit connection between data and those with expert perspectives on its use.
- Unless users understand the process for requesting access to the data source, discovering the data source and its documentation still does not help them access the data.
Discovery Challenges for Data Producers
- Annotating data sources with descriptive metadata is often a lost effort.
- Client applications typically ignore descriptions that are stored in the data source.
- Creating documentation for data sources is often a lost effort.
- Keeping documentation in sync with data sources is an ongoing responsibility, and users might lack trust in documentation that's perceived as being out of date.
- Creating and maintaining documentation for data sources is complex and time-consuming.
- Making that documentation readily available to everyone who uses the data source can be even harder.
- Restricting access to data sources and ensuring that data consumers know how to request access is an ongoing challenge.
Azure Data Catalog Features
- Data Catalog is designed to address these problems and to help enterprises get the most value from their existing information assets.
- Data Catalog makes data sources easily discoverable and understandable by the users who manage the data.
- Data Catalog provides a cloud-based service into which a data source can be registered.