OpenMetadata 0.8.0 Release

By Sachin Chaurasiya, published in OpenMetadata, Jan 27, 2022

OpenMetadata 0.8.0 Release — Event Notification via Webhooks, Slack Integration, Access Control Policy, and Manual Lineage

Written by Aashit Kothari, akash-jain-10, Ayushshah, Mithun Mathew, Parth Panchal, Sachin Chaurasiya

It’s 2022, and we are happy to announce our first release of the year: OpenMetadata 0.8. It brings important new features, including the Role-Based Access Control Policy for creating well-defined User Roles with pre-set Rules that grant access to metadata operations on any entity. Another exciting feature is Manual Lineage, which augments the lineage captured from machine metadata with user knowledge.

After working on metadata change events in the past releases, we now support subscribing to event notifications via webhooks, starting with Slack notifications. The Slack integration is another major step toward fostering collaboration between the producers and consumers of metadata in an organization by providing timely updates. There’s also progress on the data quality front: we are building a new Data Profiler that will also support the creation and scheduling of data quality tests.

Community Update

Over the past several months, we’ve witnessed a steady growth in community membership. It’s been great to watch the increasing participation of the community with all the feature requests, code contributions, helpful feedback, and lively interactions around metadata. Your collective experience in Data has been enriching OpenMetadata.

Key Metrics

  1. 100+ new members joined our Slack in the last 30 days
  2. 278 commits were merged into the 0.8 release
  3. 43 contributors provided features/improvements to the 0.8 release
  4. We have bi-weekly meetings scheduled for OpenMetadata. Please join our Meetup and RSVP to these meetings
  5. If you like what we are doing, please give us a GitHub star. This really helps OpenMetadata reach a wider audience.

Be a part of the OpenMetadata community to strive together to build awesome features around metadata!

New in 0.8.0 OpenMetadata Release

Access Control Policy

In the 0.8 release, the Role-Based Access Control (RBAC) policy for metadata operations has been designed. New entities ‘Role’ and ‘Policy’ have been added to support access control. A User has a Role. Further, the Role is associated with a Policy with a set of Rules. Rules are used to provide access to metadata operations such as Update Description, Update Tags, and Update Owners.

The current version has a standard set of Roles, Policies, and Rules. Roles such as Admin and Bot can perform any metadata operation on any entity. In OpenMetadata, Bots have Admin privileges because we have ‘Ingestion Bots’, which connect to various data sources and ingest metadata. User roles will have access to update certain parts of the metadata. Other roles cannot update metadata; they can only suggest updates to descriptions and tags and follow/unfollow data assets. Using Policy and Role, we can support custom access rules that suit your organization’s roles.

With regard to the architecture, an Admin can define policies through the Policy API; these policies are stored in the database. For any OpenMetadata User, all calls go through the Authorizer. The entity information and the metadata operation from the call are run through a Policy Evaluator, which applies all the pre-defined rules. The Policy Evaluator checks whether a specific user has permission to perform the metadata operation. Individual rules can be enabled or disabled, and an entire policy can be disabled as well.
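The evaluation flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OpenMetadata’s actual implementation: the class names, the operation identifiers such as `UpdateDescription`, and the deny-by-default behavior are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rule:
    operation: str        # e.g. "UpdateDescription" (hypothetical identifier)
    allow: bool = True
    enabled: bool = True  # disabled rules are skipped during evaluation


@dataclass
class Policy:
    name: str
    rules: List[Rule] = field(default_factory=list)
    enabled: bool = True  # assumption: a disabled policy grants nothing


@dataclass
class Role:
    name: str
    policy: Policy


def is_allowed(role: Role, operation: str) -> bool:
    """Return True if the role's policy has an enabled rule allowing the operation."""
    policy = role.policy
    if not policy.enabled:
        return False
    for rule in policy.rules:
        if rule.enabled and rule.operation == operation:
            return rule.allow
    return False  # assumption: deny when no enabled rule matches


steward = Role(
    "DataSteward",
    Policy("steward-policy", [Rule("UpdateDescription"), Rule("UpdateTags", enabled=False)]),
)
print(is_allowed(steward, "UpdateDescription"))  # True
print(is_allowed(steward, "UpdateTags"))         # False, because the rule is disabled
```

Keeping the `enabled` flag on both the rule and the policy mirrors the two levels of switching mentioned above: one rule can be turned off without touching the rest of the policy.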

Manual Lineage

The manual lineage feature aims to enrich lineage by letting users curate it: they can edit the lineage and connect entities with a no-code editor. Earlier, we supported a read-only Lineage graph, which displayed the lineage relationships collected from the sources. A drag-and-drop UI has been designed to allow users to add lineage information manually at the table and column levels. Entities like tables, pipelines, and dashboards can be dragged and dropped onto the lineage graph to create a node. The required entity can be searched for and clicked to insert it into the graph. Users can add a new edge as well as delete an existing edge. The lineage data updates in real time, and the nodes load incrementally.

Event Notification via Webhooks and Slack Integration

In the 0.8 release, we integrated the LMAX Disruptor to publish change events. The webhook interface allows you to build applications that receive all the data changes happening in your organization through APIs. We’ve worked on webhook integration to register URLs to receive metadata event notifications. These events will be pushed to Elasticsearch and Slack via Pluggable Consumers that can read the events published to LMAX. We’ve built the Slack integration on top of webhooks, allowing users to set up Slack notifications. In the future, we’ll integrate with other services similar to Slack to publish metadata events and keep teams informed about metadata changes in their organization.
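As a rough illustration of the receiving side, the sketch below shows a small Python endpoint that could be registered as a webhook URL. It assumes events arrive as a JSON object or a JSON list of objects in the POST body, and the field names `eventType` and `entityType` are assumptions made for this example; check the actual change event schema before relying on them.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List


def summarize_events(body: bytes) -> List[str]:
    """Turn a webhook request body into one short line per change event."""
    payload = json.loads(body)
    events = payload if isinstance(payload, list) else [payload]
    return [
        # Field names are assumptions, not a confirmed schema.
        f"{event.get('eventType', 'unknown')}: {event.get('entityType', 'unknown')}"
        for event in events
    ]


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        for line in summarize_events(self.rfile.read(length)):
            print(line)
        # Acknowledge receipt so the sender does not treat the delivery as failed.
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Listen on localhost; the port is an arbitrary choice for this sketch."""
    HTTPServer(("localhost", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener. A real consumer would replace the `print` call with its own logic, such as forwarding the summary to a chat channel.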

Data Quality — Data Profiler

Until now, we used the Great Expectations profiler. We have replaced it with a new, more efficient data profiler that not only collects profiling stats but also helps in writing and executing tests.

Entity Deletion

Entities have a lot of user-generated metadata, such as descriptions, tags, ownership, and tiering. There’s also rich metadata generated by OpenMetadata through the data profiler, usage data, lineage, test results, and other graph relationships with other entities. When an entity is deleted, all of this rich information is lost, and it’s not easy to recreate it.

Now, API support has been added for entity deletion, covering both soft delete and hard delete. On deletion, a dataset is marked as soft-deleted in the OpenMetadata backend instead of being hard-deleted, so users can restore accidentally deleted entities. Ingestion support has been added to publish entity deletion. We’ve enabled version support for deleted entities. We’ve also added support for deleted entities in Elasticsearch so that users can now search for deleted entities and restore them if necessary. Please refer to GitHub for more details.
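The difference between the two deletion modes can be illustrated with a small in-memory sketch. The `EntityStore` class and its method names are hypothetical and are not the OpenMetadata API; the point is only that a soft delete flips a flag and keeps the metadata, while a hard delete removes the record.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entity:
    name: str
    description: str = ""
    deleted: bool = False  # soft-delete marker


class EntityStore:
    """Hypothetical in-memory store used only to illustrate the idea."""

    def __init__(self) -> None:
        self._entities: Dict[str, Entity] = {}

    def add(self, entity: Entity) -> None:
        self._entities[entity.name] = entity

    def delete(self, name: str, hard: bool = False) -> None:
        if hard:
            del self._entities[name]            # record and its metadata are gone
        else:
            self._entities[name].deleted = True  # metadata is kept

    def restore(self, name: str) -> None:
        self._entities[name].deleted = False

    def search(self, include_deleted: bool = False) -> List[str]:
        return sorted(
            e.name for e in self._entities.values()
            if include_deleted or not e.deleted
        )


store = EntityStore()
store.add(Entity("orders", description="Customer orders"))
store.delete("orders")                     # soft delete
print(store.search())                      # []
print(store.search(include_deleted=True))  # ['orders']
store.restore("orders")
print(store.search())                      # ['orders']
```

Because the soft-deleted record stays in the store, its description survives the delete and restore cycle unchanged.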

Updates to Metadata Versioning

The metadata version panel has been added for all the entities: Table, Topic, Pipeline, and Dashboard. Previously, change descriptions were available for only a limited set of fields of the Topic entity; several other fields have now been included.

New Connectors

With this release, we are adding to our ever-growing list of connectors. If there’s a connector that you’d like to see in our upcoming release, please file a ticket here. We now support Delta Lake, an open-source project that enables building a Lakehouse architecture on top of data lakes.

We refactored the SQL connectors to extract lineage. The Connector API was also refactored to capture the configs on the OpenMetadata side and to schedule ingestion via the UI.

Other Features

  • A DataSource attribute has been added to the ML model entity.
  • The Python API has been updated to add lineage for ML Model entities.
  • OpenMetadata now supports ingestion via environment variables to configure connectivity for both Elasticsearch and Airflow.
  • A new tab called ‘Bots’ has been added to group users with isBot set to true.
  • We now support Application Default Credentials or a keyless, default service account in BigQuery data ingestion.
  • The usability of the metadata docker commands has been improved.

Planned for 0.9.0 Release

The focus of the 0.9 release will be on user-defined data quality tests and conversation threads. The following enhancements will be covered in the upcoming release:

  1. Data Quality: Build functionality to allow users to create their own semantic tests to test the quality of data, as well as provide data quality support for more databases.
  2. Collaboration: Conversations will be added in the main feed to give users the ability to ask questions, add suggestions, and reply. We’ll also add the ability to convert conversation threads into tasks. These tasks will be displayed in MyData. We’ll provide a Glossary. We’ll also provide table details that show which services are using the table and which queries are pulling from it.
  3. DBT: Add the ability to capture lineage and test metadata.
  4. Lineage:
    a. Provide versioning support for lineage
    b. Add more components into the lineage
    c. Propagate attributes such as Tags and Tier through lineage
    d. Spark Lineage support
    e. Column level lineage API support
  5. Pipeline: Capture pipeline status and bring in more data quality controls based on the status.
  6. Security: Provide security policies through the UI and also provide the ability to configure personas and authorization, i.e. Role-Based Access Control (RBAC) for metadata operations.

Thanks to our Contributors

It makes us extremely happy to witness the constant growth of the OpenMetadata community. We wholeheartedly thank this awesome community for their active participation, code contributions, continued support, and honest feedback. We’d especially like to say a thank you to the following community members:

If you are interested in contributing code, we have created good starting issues to get you going. The goal is to drive SonarCloud-flagged issues to zero to get to the best possible code quality. If you have any questions about code, installation, or docs, please reach out to us on Slack. If you have feature requests, please file a GitHub issue or reach out to us on Slack. Thanks for taking the time to explore and contribute to OpenMetadata. We look forward to your feedback, questions, and comments.
