The Client
Rio Tinto is a leading global mining group that focuses on finding, mining and processing the Earth's mineral resources. Rio Tinto has started a programme of work to ingest and provision geoscientific data for Rio Tinto Exploration (RTX). The data takes many forms, such as spectral and image data, and comes from a myriad of sources, including devices, aircraft, third-party APIs, databases, and internal apps.
The Challenge
Mechanical Rock were engaged to build a solution that takes images from a third-party repository and replicates them in the central data platform of choice, in this case Databricks. The solution architecture follows patterns defined by a centralised team. It covers ingestion and processing within AWS, with Databricks used for storage and analytics, including AI/ML workloads using SageMaker. The data did not need to be real-time, but the data platform had to hold a faithful replica of all data in the third-party repository, refreshed at least once daily.
The Solution
Running daily, the application calls the third-party API while operating within the environmental constraints set by Rio Tinto, and only processes changes made since the previous run. AWS Lambda functions were chosen as the primary compute mechanism to enable horizontal scaling. AWS queues allow failed messages to be retried; after three failed attempts, the message is sent to a dead-letter queue and a notification is sent to a human user to resolve the failing process. Amazon EventBridge is used to schedule and initiate the ingestion process, which constructs a hierarchy to determine how to optimally batch and download images.
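The retry behaviour described above can be illustrated with a small in-memory simulation. This is a minimal sketch only, not the production implementation: the real system uses AWS queues and Lambda functions, whereas the `RetryQueue` class, the `download` handler and the batch names here are hypothetical stand-ins. The only detail taken from the project is the limit of three failed attempts before a message is dead-lettered and a person is notified.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable

MAX_ATTEMPTS = 3  # from the case study: three failed attempts, then dead-letter


@dataclass
class Message:
    body: str
    attempts: int = 0


@dataclass
class RetryQueue:
    """Hypothetical in-memory stand-in for a work queue with a dead-letter queue."""
    pending: deque = field(default_factory=deque)
    dead_letter: list = field(default_factory=list)
    notifications: list = field(default_factory=list)

    def send(self, body: str) -> None:
        self.pending.append(Message(body))

    def drain(self, handler: Callable[[str], None]) -> list:
        """Process messages until the queue is empty; return the bodies that succeeded."""
        succeeded = []
        while self.pending:
            msg = self.pending.popleft()
            msg.attempts += 1
            try:
                handler(msg.body)
                succeeded.append(msg.body)
            except Exception as exc:
                if msg.attempts >= MAX_ATTEMPTS:
                    # Give up: park the message and record a notification for a human.
                    self.dead_letter.append(msg)
                    self.notifications.append(
                        f"{msg.body} failed {msg.attempts} times: {exc}"
                    )
                else:
                    # Put the message back on the queue for another attempt.
                    self.pending.append(msg)
        return succeeded


def download(batch_id: str) -> None:
    """Assumed handler: one batch always fails, to exercise the retry path."""
    if batch_id == "batch-b":
        raise RuntimeError("simulated API error")


queue = RetryQueue()
for batch in ("batch-a", "batch-b", "batch-c"):
    queue.send(batch)

succeeded = queue.drain(download)
```

In this sketch the healthy batches complete on their first attempt, while the failing batch is retried until it reaches the limit and is then moved aside, so one bad message does not block the rest of the run.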
To make the Imago data in S3 more generally available to consumers, Mechanical Rock built a catalogue of Imago metadata in Databricks.
The Benefits
The data from the third-party repository is now available, and updated daily, in a central data platform at Rio Tinto. It can now serve as a single point of access to data products for any customer at Rio Tinto. Furthermore, this was the first use case for Databricks at RTX, and it has provided a template for future data ingestion that can be reused for other data sources, underpinned by robust software engineering processes to ensure quality and repeatability.