Attention
This library is in very early stages. Like the idea of it? Please star us on GitHub and contribute via the issues board and roadmap.
Twined¶
twined is a library to help create and connect Digital Twins and data services.
“Twined” [t-why-nd] ~ encircled, twisted together, interwoven
A digital twin is a virtual representation of a real life being - a physical asset like a wind turbine or car - or even a human. Like real things, digital twins need to interact, so can be connected together, but need a common communication framework to do so.
twined helps you to define a single file, a “twine”, that defines a digital twin / data service, specifying its data interfaces, its connections to other twins, and other requirements.
Any person, or any computer, can read a twine and understand what-goes-in and what-comes-out. That makes it easy to collaborate with other teams, since everybody is crystal clear about what’s needed.
Digital twins / data services connected in a hierarchy. Each blue circle represents a twin, coupled to its neighbours. Yellow nodes are where schema are used to connect twins.¶
Aims¶
twined provides a toolkit to help create and validate “twines” - descriptions of a digital twin, what data it requires, what it does and how it works.
The goals of the twined library are as follows:
Provide a clear framework for what a twine can and/or must contain
Provide functions to validate incoming data against a known twine
Provide functions to check that a twine itself is valid
Provide (or direct you to) tools to create twines describing what you require
In Anatomy Of The Twine File, we describe the different parts of a twine (examining how digital twins connect and interact… building them together in hierarchies and networks). But you may prefer to dive straight in with the Quick Start guide.
The scope of twined is not large. Many other libraries will deal with hosting and deploying digital twins, still more will deal with the actual analyses done within them. twined purely deals with parsing and checking the information exchanged.
Raison d’etre¶
Octue believes that a lynchpin of solving climate change is the ability for all engineering, manufacturing, supply chain and infrastructure plant to be connected together, enabling strong optimisation and efficient use of these systems.
To enable engineers and scientists to build, connect and run digital twins in large networks (or even in small teams!) it is necessary for everyone to be on the same page - the Gemini Principles are a great way to start with that, which is why we’ve released this part of our technology stack as open source, to support those principles and help develop a wider ecosystem.
The main goal is to help engineers and scientists focus on doing engineering and science - instead of apis, data cleaning/management, and all this cloud-pipeline-devops-test-ci-ml BS that takes up 90% of a scientist’s time, when they should be spending their valuable time researching migratory patterns of birds, or cell structures, or wind turbine performance, or whatever excites them.
Uses¶
At Octue, twined is used as a core part of our application creation process:
As a format to communicate requirements to our partners in research projects
As a tool to validate incoming data to digital twins
As a framework to help establish schema when designing digital twins
As a source of information on digital twins in our network, to help map and connect twins together
We’d like to hear about your use case. Please get in touch!
We use the GitHub Issue Tracker to manage bug reports and feature requests. Please note, this is not a “general help” forum; we recommend Stack Overflow for such questions. For really gnarly issues or for help designing digital twin schema, Octue is able to provide application support services for those building digital twins using twined.
Quick Start¶
Installation¶
twined is available on pypi, so installation into your python virtual environment is dead simple:
pip install twined
Don’t have a virtual environment with pip? You probably should! pyenv is your friend. Google it.
Compilation¶
There is presently no need to compile twined, as it’s written entirely in python.
Third party library installation¶
twined is for python >= 3.6, so expects that. Other dependencies can be checked in setup.py, and will be automatically installed during the installation above.
Third party build requirements¶
Attention
Woohoo! There are no crazy dependencies that you have to compile and build for your particular system (you know the ones… they never actually compile, right?). We aim to keep it this way.
Create your first twine¶
Let’s say we want a digital twin that accepts two values, uses them to make a calculation, then gives the result. Anyone connecting to the twin will need to know what values it requires, and what it responds with.
First, create a blank text file, call it twine.json. We’ll give the twin a title and description. Paste in the following:
{
"title": "My first digital twin... of an atomising discombobulator",
"description": "A simple example... estimates the `foz` value of an atomising discombobulator."
}
Now, let’s define an input values strand, to specify what values are required by the twin. For this we use a json schema (you can read more about them in Introducing JSON Schema). Add the input_values_schema field, so your twine looks like this:
{
"title": "My first digital twin",
"description": "A simple example to build on..."
"input_values_schema": {
"$schema": "http://json-schema.org/2019-09/schema#",
"title": "Input Values schema for my first digital twin",
"description": "These values are supplied to the twin by another program (often over a websocket, depending on your integration provider). So as these values change, the twin can reply with an update.",
"type": "object",
"properties": {
"foo": {
"description": "The foo value... speed of the discombobulator's input bobulation module, in m/s",
"type": "number",
"minimum": 10,
"maximum": 500
},
"baz": {
"description": "The baz value... period of the discombobulator's recombulation unit, in s",
"type": "number",
"minimum": 0,
"maximum": 1000
}
}
}
}
Finally, let’s define an output values strand, to define what kind of data is returned by the twin:
"output_values_schema": {
"$schema": "http://json-schema.org/2019-09/schema#",
"title": "Output Values schema for my first digital twin",
"description": "The twin will output data that matches this schema",
"type": "object",
"properties": {
"foz": {
"description": "Estimate of the foz value... efficiency of the discombobulator in %",
"type": "number",
"minimum": 10,
"maximum": 500
}
}
}
Load the twine¶
twined provides a Twine() class to load a twine (from a file or a json string). The loading process checks the twine itself is valid. It’s as simple as:
from twined import Twine
my_twine = Twine(source='twine.json')
Validate some inputs¶
Say we have some json that we want to parse and validate, to make sure it matches what’s required for input values.
my_input_values = my_twine.validate_input_values(json='{"foo": 30, "baz": 500}')
You can read the values from a file too. Paste the following into a file named input_values.json
:
{
"foo": 30,
"baz": 500
}
Then parse and validate directly from the file:
my_input_values = my_twine.validate_input_values(source="input_values.json")
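If the supplied values don’t match the schema, validation fails with a descriptive error. Here’s a minimal sketch (we catch broadly, since the exact exception class isn’t specified here):

from twined import Twine

my_twine = Twine(source="twine.json")

try:
    # "foo" is below the schema minimum of 10, so this should fail validation
    my_twine.validate_input_values(source='{"foo": 5, "baz": 500}')
except Exception as error:
    print(f"Input values are invalid: {error}")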
Attention
LIBRARY IS UNDER CONSTRUCTION! WATCH THIS SPACE FOR MORE!
Anatomy Of The Twine File¶
The main point of twined is to enable engineers and scientists to easily (and rigorously) define a digital twin or data service.
This is done by adding a twine.json file to the repository containing your code. Adding a twine means you can:
communicate (to you or a colleague) what data is required by this service
communicate (to another service / machine) what data is required
deploy services automatically with a provider like Octue.
To just get started building a twine, check out the Quick Start. To learn more about twines in general, see About Twines. Here, we describe the parts of a twine (“strands”) and what they mean.
Strands¶
A twine has several sections, called strands. Each defines a different kind of data required (or produced) by the twin.
Strand | Describes the twin’s requirements for…
---|---
configuration_values_schema | Data, in JSON form, used for configuration of the twin/service
configuration_manifest | Files/datasets required by the twin at configuration/startup
input_values_schema | Data, in JSON form, passed to the twin in order to trigger an analysis
input_manifest | Files/datasets passed with Input Values to trigger an analysis
output_values_schema | Data, in JSON form, that will be produced by the twin (in response to inputs)
output_manifest | Files/datasets that will be produced by the twin (in response to inputs)
credentials | Credentials that are required by the twin in order to access third party services
children | Other twins, access to which are required for this twin to function
monitor_message_schema | Visual and progress outputs from an analysis
Values-based Strands¶
The configuration_values_schema, input_values_schema and output_values_schema strands are values-based, meaning the data that matches these strands is in JSON form. Each of these strands is a json schema which describes that data.
This strand is a configuration_values_schema, used to check the validity of any configuration_values data supplied to the twin at startup.
The Configuration Values Strand is generally used to define control parameters relating to what the twin should do, or how it should operate.
For example, should it produce output images as low resolution PNGs or as SVGs? How many iterations of a fluid flow solver should be used? What is the acceptable error level on a classifier algorithm?
This strand is an input_values_schema, used to check the validity of input_values data supplied to the twin at the beginning of an analysis task.
The Input Values Strand is generally used to define the actual data which will be processed by the twin. Sometimes, it may also be used to define control parameters specific to an analysis.
For example, if a twin cleans and detects anomalies in a 10-minute timeseries of 1Hz data, the input_values might contain an array of data and a list of corresponding timestamps. It may also contain a control parameter specifying which algorithm is used to do the detection.
Note
Depending on the way the twin is deployed (see Deployment), the input_values might come in from a web request, over a websocket, or directly from the command line or another library. However they arrive, if the new input_values validate against the input_values_schema strand, then analysis can proceed.
This strand is an output_values_schema, used to check results (output_values) computed during an analysis. This ensures that the application wrapped up within the twine is operating correctly, and enables other twins/services or the end users to see what outputs they will get.
For example, if a twin cleans and detects anomalies in a 10-minute timeseries of 1Hz data, the output_values might contain an array of data interpolated onto regular timestamps, with missing values filled in, and a list of warnings where anomalies were found.
Let’s look at basic examples for twines containing each of these strands:
This twine contains an example configuration_values_schema with one control parameter. Many more detailed and specialised examples are available in the GitHub repository.
{
"configuration_values_schema": {
"title": "The example configuration form",
"description": "The Configuration Values Strand of an example twine",
"type": "object",
"properties": {
"n_iterations": {
"description": "An example of an integer configuration variable, called 'n_iterations'.",
"type": "integer",
"minimum": 1,
"maximum": 10,
"default": 5
}
}
}
}
Matching configuration_values data could look like this:
{
"n_iterations": 8,
}
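Validating configuration data works just like validating input values. A hedged sketch, assuming a validate_configuration_values() method symmetric with validate_input_values():

from twined import Twine

twine = Twine(source="twine.json")
configuration = twine.validate_configuration_values(source='{"n_iterations": 8}')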
This twine contains an example input_values_schema with one input value, which is marked as required. Many more detailed and specialised examples are available in Examples.
{
"input_values_schema": {
"title": "Input Values",
"description": "The input values strand of an example twine, with a required height value",
"type": "object",
"properties": {
"height": {
"description": "An example of an integer value called 'height'",
"type": "integer",
"minimum": 2
}
},
"required": ["height"]
}
}
Matching input_values data could look like this:
{
"height": 13,
}
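Because height is marked as required, a payload without it should fail validation. A quick sketch (the exact error wording is illustrative):

from twined import Twine

twine = Twine(source="twine.json")
twine.validate_input_values(source='{"height": 13}')  # passes
twine.validate_input_values(source="{}")  # fails: "height" is a required property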
Manifest-based Strands¶
Frequently, twins operate on files containing some kind of data. These files need to be made accessible to the code running in the twin, in order that their contents can be read and processed. Conversely, a twin might produce an output dataset which must be understood by users.
The configuration_manifest, input_manifest and output_manifest strands describe what kind of datasets (and associated files) are required / produced.
Note
Files are always contained in datasets, even if there’s only one file. It’s so that we can keep nitty-gritty file metadata separate from the more meaningful, higher level metadata like what a dataset is for.
This describes datasets/files that are required at startup of the twin / service. They typically contain a resource that the twin might use across many analyses.
For example, a twin might predict failure for a particular component, given an image. It will require a trained ML model (saved as a *.pickle or *.json). While many thousands of predictions might be done over the period that the twin is deployed, all predictions are done using this version of the model - so the model file is supplied at startup.
These files are made available for the twin to run a particular analysis with. Each analysis will likely have different input datasets.
For example, a twin might be passed a dataset of LiDAR *.scn files and be expected to compute atmospheric flow properties as a timeseries (which might be returned in the output values for onward processing and storage).
Files are created by the twin during an analysis, tagged and stored as datasets for some onward purpose. This strand is not used for sourcing data; it enables users or other services to understand appropriate search terms to retrieve datasets produced.
Describing Manifests¶
Manifest-based strands are a description of what files are needed. The purpose of the manifest strands is to provide a helper to a wider system providing datafiles to digital twins.
Example twine containing this strand:
{
// Manifest strands contain lists, with one entry for each required dataset
"configuration_manifest": {
"datasets": [
{
// Once the inputs are validated, your analysis program can use this key to access the dataset
"key": "trained_model",
// General notes, which are helpful as a reminder to users of the service
"purpose": "The trained classifier"
}
]
}
}
A matching file manifest:
{
"id": "8ead7669-8162-4f64-8cd5-4abe92509e17",
"datasets": [
{
"id": "7ead7669-8162-4f64-8cd5-4abe92509e17",
"name": "training data for system abc123",
"organisation": "megacorp",
"tags": {"system": "abc123"},
"labels": ["classifier", "damage"],
"files": [
{
"path": "datasets/7ead7669/blade_damage.mdl",
"cluster": 0,
"sequence": 0,
"extension": "csv",
"tags": {},
"labels": [],
"posix_timestamp": 0,
"id": "abff07bc-7c19-4ed5-be6d-a6546eae8e86",
"last_modified": "2019-02-28T22:40:30.533005Z",
"name": "blade_damage.mdl",
"size_bytes": 59684813,
"sha-512/256": "somesha"
}
]
}
]
}
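A hedged sketch of validating a manifest like the one above, assuming a validate_configuration_manifest() method symmetric with the values validators:

from twined import Twine

twine = Twine(source="twine.json")
configuration_manifest = twine.validate_configuration_manifest(source="manifest.json")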
Here we specify that two datasets (and all or some of the files associated with them) are required, for a service that cross-checks meteorological mast data and power output data for a wind farm.
Example twine containing this strand:
{
// Manifest strands contain lists, with one entry for each required dataset
"input_manifest": {
"datasets": [
{
// Once the inputs are validated, your analysis program can use this key to access the dataset
"key": "met_mast_data",
// General notes, which are helpful as a reminder to users of the service
"purpose": "A dataset containing meteorological mast data"
},
{
"key": "scada_data",
"purpose": "A dataset containing scada data"
}
]
}
}
A matching file manifest:
{
"id": "8ead7669-8162-4f64-8cd5-4abe92509e17",
"datasets": [
{
"id": "7ead7669-8162-4f64-8cd5-4abe92509e17",
"name": "meteorological mast dataset",
"tags": {"location": 108346},
"labels": ["met", "mast", "wind"],
"files": [
{
"path": "input/datasets/7ead7669/mast_1.csv",
"cluster": 0,
"sequence": 0,
"extension": "csv",
"tags": {},
"labels": [],
"posix_timestamp": 1551393630,
"id": "abff07bc-7c19-4ed5-be6d-a6546eae8e86",
"last_modified": "2019-02-28T22:40:30.533005Z",
"name": "mast_1.csv",
"size_bytes": 59684813,
"sha-512/256": "somesha"
},
{
"path": "input/datasets/7ead7669/mast_2.csv",
"cluster": 0,
"sequence": 1,
"extension": "csv",
"tags": {},
"labels": [],
"posix_timestamp": 1551394230,
"id": "bbff07bc-7c19-4ed5-be6d-a6546eae8e45",
"last_modified": "2019-02-28T22:50:40.633001Z",
"name": "mast_2.csv",
"size_bytes": 59684813,
"sha-512/256": "someothersha"
}
]
},
{
"id": "5cf9e445-c288-4567-9072-edc31003b022",
"name": "scada data exports",
"tags": {"location": 108346, "system": "ab32"},
"labels": ["wind", "turbine", "scada"],
"files": [
{
"path": "input/datasets/7ead7669/export_1.csv",
"cluster": 0,
"sequence": 0,
"extension": "csv",
"tags": {},
"labels": [],
"posix_timestamp": 1551393600,
"id": "78fa511f-3e28-4bc2-aa28-7b6a2e8e6ef9",
"last_modified": "2019-02-28T22:40:00.000000Z",
"name": "export_1.csv",
"size_bytes": 88684813,
"sha-512/256": "somesha"
},
{
"path": "input/datasets/7ead7669/export_2.csv",
"cluster": 0,
"sequence": 1,
"extension": "csv",
"tags": {},
"labels": [],
"posix_timestamp": 1551394200,
"id": "204d7316-7ae6-45e3-8f90-443225b21226",
"last_modified": "2019-02-28T22:50:00.000000Z",
"name": "export_2.csv",
"size_bytes": 88684813,
"sha-512/256": "someothersha"
}
]
}
]
}
Example twine containing this strand:
{
"output_manifest": {
"datasets": [
{
// Twined will prepare a manifest with this key, which you can add to during the analysis or once it's complete
"key": "met_scada_checks",
// General notes, which are helpful as a reminder to users of the service
"purpose": "A dataset containing figures showing correlations between mast and scada data"
}
]
}
}
A matching file manifest:
{
"id": "8ead7669-8162-4f64-8cd5-4abe92509e17",
"datasets": [
{
"id": "4564deca-5654-42e8-aadf-70690b393a30",
"name": "visual cross check data",
"organisation": "megacorp",
"tags": {"location": 108346},
"labels": ["figure", "met", "mast", "scada", "check"],
"files": [
{
"path": "datasets/7ead7669/cross_check.fig",
"cluster": 0,
"sequence": 0,
"extension": "fig",
"tags": {},
"labels": [],
"posix_timestamp": 1551394800,
"id": "38f77fe2-c8c0-49d1-a08c-0928d53a742f",
"last_modified": "2019-02-28T23:00:00.000000Z",
"name": "cross_check.fig",
"size_bytes": 59684813,
"sha-512/256": "somesha"
}
]
}
]
}
File tag templates¶
Datafiles can be tagged with key-value pairs of relevant metadata that can be used in analyses. Certain datasets might need one set of metadata on each file, while others might need a different set. The required (or optional) file tags can be specified in the twine in the file_tags_template property of each dataset of any manifest strand. Each file in the corresponding manifest strand is then validated against its dataset’s file tag template to ensure the required tags are present.
The example below is for an input manifest, but the format is the same for configuration and output manifests.
Example twine containing a manifest strand with a file tag template:
{
"input_manifest": {
"datasets": [
{
"key": "met_mast_data",
"purpose": "A dataset containing meteorological mast data",
"file_tags_template": {
"type": "object",
"properties": {
"manufacturer": {"type": "string"},
"height": {"type": "number"},
"is_recycled": {"type": "boolean"}
},
"required": ["manufacturer", "height", "is_recycled"]
}
}
]
}
}
A matching file manifest:
{
"id": "8ead7669-8162-4f64-8cd5-4abe92509e17",
"datasets": [
{
"id": "7ead7669-8162-4f64-8cd5-4abe92509e17",
"name": "met_mast_data",
"tags": {},
"labels": ["met", "mast", "wind"],
"files": [
{
"path": "input/datasets/7ead7669/file_1.csv",
"cluster": 0,
"sequence": 0,
"extension": "csv",
"labels": ["mykeyword1", "mykeyword2"],
"tags": {
"manufacturer": "vestas",
"height": 500,
"is_recycled": true
},
"id": "abff07bc-7c19-4ed5-be6d-a6546eae8e86",
"name": "file_1.csv"
},
{
"path": "input/datasets/7ead7669/file_1.csv",
"cluster": 0,
"sequence": 1,
"extension": "csv",
"labels": [],
"tags": {
"manufacturer": "vestas",
"height": 500,
"is_recycled": true
},
"id": "abff07bc-7c19-4ed5-be6d-a6546eae8e86",
"name": "file_1.csv"
}
]
}
]
}
A remote reference can also be given for a file tag template. If the tag template is hosted somewhere public, this is useful for sharing the template between teams working on the same type of data.
The example below is for an input manifest, but the format is the same for configuration and output manifests. It also shows two different tag templates being specified for two different types of dataset required by the manifest.
Example twine using a remote tag template:
{
"input_manifest": {
"datasets": [
{
"key": "met_mast_data",
"purpose": "A dataset containing meteorological mast data",
"file_tags_template": {
"$ref": "https://refs.schema.octue.com/octue/my-file-type-tag-template/0.0.0.json"
}
},
{
"key": "some_other_kind_of_dataset",
"purpose": "A dataset containing something else",
"file_tags_template": {
"$ref": "https://refs.schema.octue.com/octue/another-file-type-tag-template/0.0.0.json"
}
}
]
}
}
A matching file manifest:
{
"id": "8ead7669-8162-4f64-8cd5-4abe92509e17",
"datasets": [
{
"id": "7ead7669-8162-4f64-8cd5-4abe92509e17",
"name": "met_mast_data",
"tags": {},
"labels": ["met", "mast", "wind"],
"files": [
{
"path": "input/datasets/7ead7669/file_1.csv",
"cluster": 0,
"sequence": 0,
"extension": "csv",
"labels": ["mykeyword1", "mykeyword2"],
"tags": {
"manufacturer": "vestas",
"height": 500,
"is_recycled": true
},
"id": "abff07bc-7c19-4ed5-be6d-a6546eae8e86",
"name": "file_1.csv"
},
{
"path": "input/datasets/7ead7669/file_1.csv",
"cluster": 0,
"sequence": 1,
"extension": "csv",
"labels": [],
"tags": {
"manufacturer": "vestas",
"height": 500,
"is_recycled": true
},
"id": "abff07bc-7c19-4ed5-be6d-a6546eae8e86",
"name": "file_1.csv"
}
]
},
{
"id": "7ead7669-8162-4f64-8cd5-4abe92509e29",
"name": "some_other_kind_of_dataset",
"tags": {},
"labels": ["my-label"],
"files": [
{
"path": "input/datasets/7eadpp9/interesting_file.dat",
"cluster": 0,
"sequence": 0,
"extension": "dat",
"labels": [],
"tags": {
"length": 864,
"orientation_angle": 85
},
"id": "abff07bc-7c19-4ed5-be6d-a6546eae9071",
"name": "interesting_file.csv"
}
]
}
]
}
How Filtering Works¶
It’s the job of twined to make sure of two things: that the twine file itself is valid, and that any data supplied to (or produced by) the twin is valid against it.
File data (input, output)
Files are not streamed directly to the digital twin (this would require extreme bandwidth in whatever system is orchestrating all the twins). Instead, files should be made available on the local storage system; i.e. a volume mounted to whatever container or VM the digital twin runs in.
Groups of files are described by a manifest, where a manifest is (in essence) a catalogue of files in a dataset. A digital twin might receive multiple manifests, if it uses multiple datasets. For example, it could use a 3D point cloud LiDAR dataset, and a meteorological dataset.
{ "manifests": [ { "type": "dataset", "id": "3c15c2ba-6a32-87e0-11e9-3baa66a632fe", // UUID of the manifest "files": [ { "id": "abff07bc-7c19-4ed5-be6d-a6546eae8e86", // UUID of that file "sha1": "askjnkdfoisdnfkjnkjsnd" // for quality control to check correctness of file contents "name": "Lidar - 4 to 10 Dec.csv", "path": "local/file/path/to/folder/containing/it/", "type": "csv", "metadata": { }, "size_bytes": 59684813, "tags": {"special_number": 1}, "labels": ["lidar", "helpful", "information", "like"], // Searchable, parsable and filterable }, { "id": "abff07bc-7c19-4ed5-be6d-a6546eae8e86", "name": "Lidar - 11 to 18 Dec.csv", "path": "local/file/path/to/folder/containing/it/", "type": "csv", "metadata": { }, "size_bytes": 59684813, "tags": {"special_number": 2}, "labels": ["lidar", "helpful", "information", "like"] // Searchable, parsable and filterable }, { "id": "abff07bc-7c19-4ed5-be6d-a6546eae8e86", "name": "Lidar report.pdf", "path": "local/file/path/to/folder/containing/it/", "type": "pdf", "metadata": { }, "size_bytes": 484813, "tags": {}, "labels": ["report"] // Searchable, parsable and filterable } ] }, { // ... another dataset manifest ... } ] }
Credentials Strand¶
In order to:
GET/POST data from/to an API,
query a database, or
connect to a socket (for receiving Values or emitting Values, Monitors or Logs),
a digital twin must have access to the service in question. API keys, database URIs, etc. must be supplied to the digital twin, but treated with best practice with respect to security considerations. The purpose of the credentials strand is to dictate what credentials the twin requires in order to function.
Defining the Credentials Strand¶
This is the simplest of the strands, containing a list of credentials (whose NAMES_SHOULD_BE_SHOUTY_SNAKE_CASE) with a reminder of the purpose of each.
{
"credentials": [
{
"name": "SECRET_THE_FIRST",
"purpose": "Token for accessing a 3rd party API service"
},
{
"name": "SECRET_THE_SECOND",
"purpose": "Token for accessing a 3rd party API service"
},
{
"name": "SECRET_THE_THIRD",
"purpose": "Another secret, like a password for a sandbox or local database"
}
]
}
Supplying Credentials¶
Attention
Credentials should never be hard-coded into application code
Do you trust the twin code? If you insert credentials to your own database into a digital twin provided by a third party, you better be very sure that twin isn’t going to scrape all that data out then send it elsewhere!
Alternatively, if you’re building a twin requiring such credentials, it’s your responsibility to give the end users confidence that you’re not abusing their access.
There’ll be a lot more discussion on these issues, but it’s outside the scope of twined - all we do here is make sure a twin has the credentials it requires.
Credentials should be securely managed by whatever system is managing the twin, then made accessible to the twin in the form of environment variables:
SERVICE_API_KEY=someLongTokenTHatYouProbablyHaveToPayTheThirdPartyProviderLoadsOfMoneyFor
Credentials may also reside in a .env file in the current directory, either in the format above (with a new line for each variable) or, for convenience, as bash exports like:
export SERVICE_API_KEY=someLongTokenTHatYouProbablyHaveToPayTheThirdPartyProviderLoadsOfMoneyFor
The validate_credentials() method of the Twine class checks for their presence and, where contained in a .env file, ensures they are loaded into the environment.
Monitor Message Strand¶
The monitor_message_schema strand is values-based, meaning the data that matches the strand is in JSON form. It is a json schema which describes a monitor message.
There are two kinds of monitoring data required from a digital twin.
Monitor data (output)
Values for health and progress monitoring of the twin, for example percentage progress, iteration number and status - perhaps even residuals graphs for a converging calculation. Broadly speaking, this should be user-facing information.
This kind of monitoring data should be in a form suitable for display on a dashboard.
Log data (output)
Logged statements, typically in iostream form, produced by the twin (e.g. via python’s logging module) must be capturable as an output for debugging and monitoring purposes. Broadly speaking, this should be developer-facing information.
Let’s look at basic examples for twines containing each of these strands:
Monitor data (output)
{
"monitor_message_schema": {
"type": "object",
"properties": {
"my_property": {
"type": "number"
}
},
"required": ["my_property"]
}
}
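A hedged sketch of checking a message against this strand, assuming a validate_monitor_message() method analogous to the values validators:

from twined import Twine

my_twine = Twine(source="twine.json")
message = my_twine.validate_monitor_message(source='{"my_property": 42.0}')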
Children Strand¶
Attention
Coming Soon!
Twine File Schema¶
Because the twine.json file itself is in JSON format with a strict structure, twined uses a schema to make sure that twine files are correctly written (a “schema-schema”, if you will, since a twine already contains schema). Try not to think about it. But if you must, the twine schema is here.
The first thing twined always does is check that the twine.json file itself is valid, and give you a descriptive error if it isn’t.
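In practice, simply constructing a Twine performs this check. A minimal sketch (we catch broadly, since the exception class depends on what’s wrong):

from twined import Twine

try:
    twine = Twine(source="twine.json")
except Exception as error:
    print(f"The twine file itself is invalid: {error}")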
Other External I/O¶
A twin might:
GET/POST data from/to an external API,
query/update a database,
upload files to an object store,
trigger events in another network, or
perform pretty much any interaction you can think of with other applications over the web.
However, such data exchange may not be controllable by twined (which is intended to operate at the boundaries of the twin) unless the resulting data is returned from the twin (and must therefore be compliant with the schema).
So, there’s nothing for twined to do here, and no need for a strand in the twine file. However, interacting with third party APIs or databases might require some credentials. See Credentials Strand for help with that.
Note
This is actually a very common scenario. For example, the purpose of the twin might be to fetch data (like a weather forecast) from some external API then return it in the output_values for use in a network of digital twins. But it’s the twin developer’s job to do the fetchin’ and make sure the resulting data is compliant with the output_values_schema (see Values-based Strands).
About Twines¶
Twined is a framework for describing a digital twin or data service.
We call these descriptions “twines”. To just get started building a twine, check out the Quick Start. To get into the detail of what’s in a twine, see Anatomy Of The Twine File.
Here, we look at requirements for the framework, our motivations and background, and some of the decisions made while developing twined.
Digital Twins¶
A digital twin is a virtual representation of a real life being - a physical asset like a wind turbine or car - or even a human.
There are three reasons why you might want to create a digital twin:
Monitoring
Prediction
Optimisation
On its own, a digital twin can be quite useful. For example, a twin might embody an AI-based analysis to predict power output of a turbine.
A digital twin consists of some kind of analysis or processing task, which could be run many times per second, or daily, down to occasionally or sometimes only once (the same as a “normal” analysis).¶
Coupling digital twins is generally even more useful. You might wish to couple your turbine twin with a representation of the local power grid, and a representation of a factory building to determine power demand… enabling you to optimise your factory plant for lowest energy cost whilst intelligently selling surplus power to the grid.
A hierarchy of digital twins. Each blue circle represents a twin, coupled to its neighbours. Yellow nodes are where schema are used to connect twins.¶
Gemini Principles¶
The Gemini Principles have been derived by the Centre for Digital Built Britain (CDBB). We strongly recommend you give them a read if embarking on a digital twins project.
The aim of twined is to enable the following principles. In particular:
Openness (open-source project to create schema for twins that can be run anywhere, anywhen)
Federation (encouraging a standardised way of connecting twins together)
Security (making sure schemas and data can be read safely)
Public Good (see our nano-rant about climate change in Raison d’etre)
Requirements of the framework¶
A twine must describe a digital twin, and has multiple roles. It must:
Define what data is required by a digital twin, in order to run
Define what data will be returned by the twin following a successful run
Define the formats of these data, in such a way that incoming data can be validated
Define what other (1st or 3rd party) twins / services are required by this one in order for it to run.
If this weren’t enough, the description:
Must be trustable (i.e. a twine from an untrusted, corrupt or malicious third party should be safe to at least read)
Must be machine-readable and machine-understandable 1
Must be human-readable and human-understandable 1
Must be discoverable (that is, searchable/indexable), otherwise people won’t know it’s there in order to use it.
Fortunately for digital twin developers, several of these requirements have already been addressed by the data interchange formats developed for the web. twined uses JSON and JSONSchema to help interchange data.
If you’re not already familiar with JSONSchema (or wish to know why twined uses JSON over the seemingly more appropriate XML standard), see Introducing JSON Schema.
Introducing JSON Schema¶
JSON is a data interchange format that has rapidly taken over as the de facto web-based data communication standard in recent years.
JSONSchema is a way of specifying what a JSON document should contain. Schema are, themselves, written in JSON!
Whilst schema can become extremely complicated in some scenarios, they are best designed to be quite succinct. See below for the schema (and matching JSON) for an integer and a string variable.
JSON:
{
"id": 1,
"name": "Tom"
}
Schema:
{
"type": "object",
"title": "An id number and a name",
"properties": {
"id": {
"type": "integer",
"title": "An integer number",
"default": 0
},
"name": {
"type": "string",
"title": "A string name",
"default": ""
}
}
}
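To see this in action, here’s a minimal sketch using the jsonschema python library (which twined itself depends on - see Third Party Libraries) to validate the example document against the example schema:

import jsonschema

schema = {
    "type": "object",
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
    },
}

jsonschema.validate(instance={"id": 1, "name": "Tom"}, schema=schema)  # passes silently
jsonschema.validate(instance={"id": "one", "name": "Tom"}, schema=schema)  # raises ValidationError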
Useful resources¶
Link | Resource
---|---
jsonschema.net | Useful web tool for inferring schema from existing json
jsoneditoronline.org | A powerful online editor for json, allowing manipulation of large documents better than most text editors
json.org | The JSON standard spec
json-schema.org | The (draft standard) JSONSchema spec
react-jsonschema-form | A front end library for generating webforms directly from a schema
Human readability¶
Back in our Requirements of the framework section, we noted it was important for humans to read and understand schema. The schema documents themselves are pretty easy for technical users to read. But, for non-technical users, readability can be enhanced even further by the ability to turn JSONSchema into web forms automatically. For our example above, we can autogenerate a web form straight from the schema:

Web form generated from the example schema above.¶
Thus, we can take a schema (or a part of a schema) and use it to generate a control form for a digital twin in a web interface without writing a separate form component - great for ease and maintainability.
Why not XML?¶
In a truly excellent three-part blog, writer Seva Savris takes us through the ups and downs of JSON versus XML; well worth a read if wishing to understand the respective technologies better.
In short, both JSON and XML are generalised data interchange specifications and both can do what we want here. We choose JSON because:
- Textual representation is much more concise and easy to understand (very important where non-developers like engineers and scientists must be expected to interpret schema).
- Attack vectors. Because entities in XML are not necessarily primitives (unlike in JSON), an XML document parser in its default state may leave a system open to XXE injection attacks and DTD validation attacks, and therefore requires hardening. JSON documents are similarly afflicted (just like any kind of serialised data), but default parsers, operating on the premise of only deserialising to primitive types, are safe by default; it is only when non-default parsing or deserialisation techniques (such as JSONP) are used that the application becomes vulnerable. By utilising a default JSON parser we can therefore significantly shrink the attack surface of the system. See this blog post for further discussion.
- XML is powerful… perhaps too powerful. The standard can be adapted greatly, resulting in high encapsulation and a high resilience to future unknowns. Both beneficial. However, this requires developers of twins to maintain interfaces of very high complexity, adaptable to a much wider variety of input. To enable developers to progress, we suggest handling changes and future unknowns through well-considered versioning, whilst keeping their API simple.
- XML allows baked-in validation of data and attributes. Whilst advantageous in some situations, this is not a benefit here. We wish validation to be one-sided: validation of data accepted/generated by a digital twin should occur within (at) the boundaries of that twin.
- Required validation capabilities built into XML are achievable with JSONSchema (otherwise missing from the pure JSON standard).
- JSON is a more compact expression than XML, significantly reducing memory and bandwidth requirements. Whilst not a major issue for most modern PCs, sensors on the edge may have limited memory, and both memory and bandwidth at scale are extremely expensive. Thus for extremely large networks of interconnected systems there could be significant speed and cost savings.
Other Considerations¶
A variety of thoughts that arose whilst architecting twined.
Bash-style stdio¶
Some thought was given to using a very old-school-unix approach to piping data between twins, via stdout.
Whilst attractive (as being a wildly fast way of piping data between twins on the same machine) it was felt this was insufficiently general, eg:
where twins don’t exist on the same machine or container, making it cumbersome to engineer common iostreams
where slight differences between different shells might lead to incompatibilities or changes in behaviour
And also unfriendly, eg:
engineers or scientists unfamiliar with subtleties of bash shell scripting encounter difficulty piping data around
difficult to build friendly web based tools to introspect the data and configuration
bound to be headaches on windows platforms, even though windows now supports bash
easy to corrupt using third party libraries (e.g. which print to stdout)
Units¶
Being used (mostly) for engineering and scientific analysis, it was tempting to add in a specified sub-schema for units. For example, mandating that where values can be given in units, they be specified in a certain way, like:
{
"wind_speed": {
"value": 10.2,
"units": "mph"
}
}
or (more succinct):
{
"wind_speed": 10.2,
"wind_speed_units": "mph"
}
It’s still extremely tempting to provide this facility; or at least provide some way of specifying in the schema what units a value should be provided in. Thinking about it but don’t have time right now. If anybody wants to start crafting a PR with an extension or update to twined that facilitates this; please raise an issue to start progressing it.
Variable Style¶
A pre-emptive stamp on the whinging…
Note that in the JSON descriptions above, all variables are named in snake_case rather than camelCase. This decision, more likely than even Brexit to divide opinions, is based on:
- The reservation of snake case for the schema spec has the subtle advantage that, in future, we might be able to use camelCase within the spec to denote class types in some useful way, just like in python. Not sure yet; just mulling.
- The Requirements of the framework mention human-readability as a must; this paper suggests a 20% slower comprehension of camel case than snake, although to be fair that’s probably arguable.
- We’re starting in Python so are taking a lead from PEP8, which is bar none the most successful style guide on the planet, because it got everybody on the same page really early on.
If existing code that you’re dropping in uses camelCase, please don’t file that as an issue… converting property names automatically after schema validation is trivial; there are tons of libraries (like humps) to do it.
We’d also consider a pull request for a built-in utility that does this conversion following validation and prior to returning results. Suggest your proposed approach on the issues board.
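For example, a one-liner sketch using the humps library mentioned above:

import humps  # pip install pyhumps

humps.decamelize({"rotorSpeed": 12.1, "bladeCount": 3})
# -> {"rotor_speed": 12.1, "blade_count": 3}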
Language Choice¶
twined is presently released in python only. It won’t be too hard to replicate functionality in other languages, and we’re considering other languages at present, so might be easily persuadable ;)
If you require implementation of twined in a different language, and are willing to consider sponsorship of development and maintenance of that library, please file an issue.
Deployment¶
Deploying with Octue¶
Octue provides automated deployment to a cloud provider (like GCP or Azure), along with permissions and user management, monitoring, logging and data storage management out of the box.
There are also a whole bunch of collaborative helper tools, like the graphical twine builder and manifesting tools, designed to speed up the process of building and using twines.
The full set of services is in early beta; get in touch and we can help you architect systems, from small data services to large networks of Digital Twins.
Coming Soon - Deploying with doctue¶
Once we’ve bedded down our services internally at Octue, we’ll be open-sourcing more parts of our build/deploy process, including docker containers with pre-configured servers to run and monitor twine-based services and digital twins.
This will allow services to be easily spun up on GCP, Azure Digital Ocean etc., and be a nice halfway house between fully managed system on Octue and running your own webserver. Of course, without all the collaborative and data management features that Octue provides ;)
We’re looking for commercial sponsors for this part of the process - if that could be you, please get in touch.
Deploying as a command-line application¶
Use the open-source octue app template as a guide. Write your new python code (or call your existing tools/libraries) within it. It’s set up to wrap and check configuration, inputs and outputs using twined. Follow the instructions there to set up your inputs, and your files, and run an analysis.
Deploying with your own web server¶
You can use any python based web server (need another language? see Language Choice):
- Add configuration_values_data to your webserver config.
- Set up an endpoint to handle incoming requests / socket messages; these will be input_values_data.
- Treat these requests / messages as events which trigger a task.
- In your task framework (e.g. your celery task), either:
  - use twined directly to validate the input_values_data / output_values_data (and, on startup, the configuration_values_data) and handle running any required analysis yourself, or
  - import your analysis app (as built in Deploying as a command-line application) and call it with the configuration and input data.
- Return the result to the client (a minimal sketch follows below).
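This sketch assumes Flask, a run_analysis helper and a validate_output_values method symmetric with validate_input_values - all illustrative, not prescriptive:

import json

from flask import Flask, jsonify, request
from twined import Twine

app = Flask(__name__)
twine = Twine(source="twine.json")

def run_analysis(input_values):
    # Hypothetical stand-in for your own analysis code
    return {"foz": 42.0}

@app.route("/analyse", methods=["POST"])
def analyse():
    # Check the incoming payload against the twine's input_values_schema
    input_values = twine.validate_input_values(source=request.get_data(as_text=True))

    output_values = run_analysis(input_values)

    # Check the result against the output_values_schema before returning it
    twine.validate_output_values(source=json.dumps(output_values))
    return jsonify(output_values)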
License¶
Octue maintains twined as an open source project, under the MIT license.
The boring bit¶
Copyright (c) 2013-2019 Octue Ltd, All Rights Reserved.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Third Party Libraries¶
twined includes or is linked against the following third party libraries:
Plotly.js¶
The MIT License (MIT)
Copyright (c) 2020 Plotly, Inc
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
jsonschema¶
Copyright (c) 2013 Julian Berman
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Version History¶
Origins¶
twined began as an internal tool at Octue, enabling applications to be connected together in the Octue ecosystem.
The twined library is presently being ported out of Octue’s SDKs as it became clear that it would be most beneficial to open-source the framework we developed to connect applications and digital twins together.
0.0.x¶
Initial library framework - development version. Highly unstable! Let’s see what happens…
New Features¶
- Documentation
- Travis- and RTD-based test and documentation build with Codecov integration
- Load and validation of twine itself against twine schema
- Main Twine() class with strands set as attributes
- Validation of input, config and output values against twine
- Validation of manifest json
- Credential parsing from the environment and validation
- Hook allowing instantiation of inputs and config to a given class, e.g. Manifest
- Tests to cover the majority of functionality
Backward Incompatible API Changes¶
n/a (Initial release)
Bug Fixes & Minor Changes¶
n/a (Initial Release)