
Data mesh products – security and automation

Creating a series of new automation data products and capabilities to form the foundation of the General Data Mesh and implementing SCIM integration with Azure AD.

Tags

Data Platforms, Data and AI, Modern Data Strategy, Product Discovery, Product Development, Technology

Date

Oct 2023

The Client and Challenge

The client is a West Australian company that leads globally in the iron ore industry. Embracing innovation is deeply ingrained in their approach to business and is core to their vision of being the safest, lowest cost, most profitable iron ore producer.

Following a strategic review of their Data Platform, it was identified that the existing processes for platform administration were largely manual, unrepeatable and error-prone, consumed a substantial amount of time, and relied on key individuals with elevated levels of access.

The client required robust, secure and scalable, self-service patterns for provisioning of data and analytics resources and workspaces – with key building blocks including automation, security, compliance and observability.

The challenge was twofold –

  1. to implement SCIM integration with Azure AD for the Data Platform, enabling single sign-on (SSO), and
  2. to create a series of new automation products and capabilities to form the foundation of the General Data Mesh.

The Partnership and Approach

The client had previously partnered with Mechanical Rock to undertake a data platform review and strategy. Now they sought to commence the first phase of delivery and achieve their strategic objectives.

Mechanical Rock provided a team of expert data platform specialists with deep experience using infrastructure-as-code to deliver automated data pipelines that allow for rapid configuration, scaling and baked-in security; coupled with a product and user experience specialist to understand the customer’s needs and problems.

Delivering a data platform focused on users

The team combined user research with a service design approach to understand the users’ goals, tool landscape and workflow processes for various administration tasks, the jobs that needed to be done, and the problems faced by the data teams.

Visual maps of the admin tasks, together with user insights, helped the team understand the complexities of the data platform, its friction points, and the areas for improvement and opportunities that would deliver the most value to the users of the data ecosystem.

Image: Current State for Admin Task 1 - Creating a new Data Area in Snowflake

Image: Removed steps for Admin Task 1

Image: Delivered Future State for Admin Task 1 – shows how the new data platform streamlines the creation of a new Data Area with automated steps, automated tests and automatic creation of data resources.

The new workflow removes 12 manual actions, leaving the Data Administrator with only one manual action to perform.

The Solution and Achievements

Azure AD and SCIM Integration

For the Data Platform, Mechanical Rock led the migration of Snowflake accounts to integrate with Azure AD and to enable SSO.

The main benefits included:

  1. Enhanced security across the whole Data Platform
  2. Improved compliance with regulatory requirements
  3. Improved role-based access control and management
  4. A simplified login process
  5. Streamlined user provisioning and deprovisioning
  6. Standardised password resets
  7. A reduced burden on the IT help desk and data administration teams.
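Under the hood, Azure AD connects to Snowflake through security integrations. A minimal sketch of what such a setup can look like in Snowflake SQL – the integration names, tenant ID and certificate below are placeholders, not the client’s actual configuration:

```sql
-- SAML2 integration enabling single sign-on via Azure AD
-- (issuer, URL and certificate values are placeholders)
CREATE SECURITY INTEGRATION azure_ad_sso
  TYPE = SAML2
  ENABLED = TRUE
  SAML2_ISSUER = 'https://sts.windows.net/<tenant-id>/'
  SAML2_SSO_URL = 'https://login.microsoftonline.com/<tenant-id>/saml2'
  SAML2_PROVIDER = 'CUSTOM'
  SAML2_X509_CERT = '<base64-encoded-certificate>';

-- Role that Azure AD assumes when provisioning users and roles over SCIM
CREATE ROLE IF NOT EXISTS aad_provisioner;
GRANT CREATE USER ON ACCOUNT TO ROLE aad_provisioner;
GRANT CREATE ROLE ON ACCOUNT TO ROLE aad_provisioner;

-- SCIM integration so that joiners, movers and leavers in Azure AD
-- are reflected automatically in Snowflake
CREATE SECURITY INTEGRATION aad_scim_provisioning
  TYPE = SCIM
  SCIM_CLIENT = 'AZURE'
  RUN_AS_ROLE = 'AAD_PROVISIONER';
```

Once both integrations are in place, user lifecycle management happens entirely in Azure AD, which is what removes password resets and manual provisioning from the help desk’s workload.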


Image: Current State for Admin Task 2 - Grant Access to Data Area

Image: Removed steps for Admin Task 2 - Grant Access to Data Area

Image: Delivered Future State for Admin Task 2 – shows how the new data platform streamlines the workflow with automated steps, automated tests and automatic creation of data resources.
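With SCIM-managed roles in place, granting access to a data area largely reduces to a group membership change in Azure AD. On the Snowflake side, the remaining step can be as small as a single grant binding a data area’s access role to an Azure AD-synced role – all role names here are illustrative, not the client’s:

```sql
-- "sales_analysts" is synced from an Azure AD group via SCIM;
-- "sales_data_reader" is the data area's read-access role
GRANT ROLE sales_data_reader TO ROLE sales_analysts;
```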

Data Admin Automation Capabilities

The new automation capabilities for the Data Platform aimed to:

  • Minimise administrative efforts, reducing costs and allowing for resources to be allocated to higher-value tasks.
  • Empower business self-service for various common tasks, enabling autonomy and streamlining the delivery of analytical outcomes.

1. Infrastructure provisioning

This new capability enables the automatic and consistent creation of Snowflake infrastructure for data domain workspaces, including Cookiecutter templates, automation tools and Terraform Cloud workspaces.

The benefits included:

  • A centralised repository for managing infrastructure provisioning.
  • Version control and transparent history via the adoption of GitHub.
  • Template tooling for infrastructure deployments, in line with domain naming conventions and infrastructure requirements.
  • Role grants managed in Azure AD.

2. Streamlined code management

This repository template delivers a streamlined process for code management.

It helps data consumers establish and deploy the resources needed for the initial data pipeline and its automated tests, and also allows data teams to generate and manage service account key-pairs.

3. Provisioning and code management for data migration

A deployment framework was created to perform a “lift and shift” of data from the outdated infrastructure to the new data platform.

Data administrators are now able to:

  • Manage code using source control.
  • Better control code via continuous integration and continuous delivery (CI/CD).
  • Gain insights from examples of schema translations and patterns for data migrations.

4. Snowflake Administration

This capability tackled various Snowflake administration tasks, such as auditing SQL history, cost management, object virtualisation, database admin code, security auditing, and the migration of Snowflake administrative schema objects into Terraform management.

As a step towards platform automation, data administrators will now be able to:

  • Manage code in a robust and controlled process, with version control and code transparency.
  • Have a centralised, coherent and consistent working methodology.
  • Deploy new capabilities with a faster turnaround time.
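Capabilities like cost management typically build on Snowflake’s built-in ACCOUNT_USAGE views rather than bespoke tooling. As one illustrative example (the 30-day window is an assumption, not the client’s reporting period), a query of this shape surfaces credit consumption per warehouse:

```sql
-- Credits consumed per warehouse over the last 30 days,
-- read from Snowflake's standard ACCOUNT_USAGE share
SELECT warehouse_name,
       SUM(credits_used) AS credits_last_30_days
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits_last_30_days DESC;
```

Keeping queries like this in source control alongside the other admin code gives the team the same versioned, reviewable workflow for observability as for provisioning.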

5. Data delivery templates via GitHub Actions

The data delivery templates provide a simple and efficient way to manage the deployment of data pipelines to data environments, allowing data engineers and developers to focus on developing high-quality pipelines and code.

The templates simplify the migration of dbt projects and provide scheduling of transformations via GitHub Actions. They enable logging, monitoring and alerting for the dbt and GitHub Actions components, ensuring that data pipelines run reliably and efficiently.

6. Airflow Platform Pipeline Template

Mechanical Rock delivered an Airflow Platform Pipeline allowing users to architect, orchestrate and monitor data workflows in a safe, robust and consistent way.

Utilising Terraform as its infrastructure-as-code tool, the Airflow platform automates the provisioning of managed workflows, secrets, policies and DAGs in all environments, as well as managing the monitoring, alerting and logging configurations.



THINK WE CAN HELP YOU?

Get in Touch

Reach out to us and a member of our team will be in touch right away.

contact@mechanicalrock.io