Defining a schema: Bringing your own properties into the graph
You can create your own schemas to enrich instances with custom properties. The schemas consist of three key elements:
- Containers: define physical storage which contains properties.
- Views: establish logical schemas which map properties.
- Data models: group one or more views for graph data consumption and ingestion.
All three elements are scoped to a space, just like instances.
Containers
Containers are the physical storage for properties. Each container is defined within a space and holds a set of properties that logically belong together. You must define a type for each property. You can also add optional constraints that the data must adhere to, and define indexes to optimize query performance.
Containers store properties for instances (nodes and edges). An instance can have properties in multiple containers.
You can populate the containers for an instance. The example below populates them for a node.
This data:
externalId: 'xyz42'
equipment:
  manufacturer: 'Acme Inc.'
pump:
  maxPressure: 1.2
translates to this:
You can define containers in a different space from the one holding the instances. This is useful if you want to use the same schema for nodes in different spaces, which is often the case given the access control model.
As you add data to these containers for more nodes, the physical storage of the containers will look similar to this:
Note that only node.{space, externalId, type} is included in the Node base container for brevity.
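To make the layout concrete, here is a minimal Python sketch of how the example payload above could be split into per-container rows. The function name `split_into_container_rows`, the dictionary-based "tables", and the space name `my-space` are illustrative assumptions and are not part of any CDF API.

```python
from collections import defaultdict


def split_into_container_rows(space, instance, tables):
    """Store each container's properties in its own table, keyed by (space, externalId)."""
    key = (space, instance["externalId"])
    # The base Node table holds the identity of the node.
    tables["Node"][key] = {"space": space, "externalId": instance["externalId"]}
    for container, properties in instance.items():
        if container == "externalId":
            continue  # identity is already stored in the Node table
        tables[container][key] = dict(properties)
    return tables


# Hypothetical in-memory stand-in for the physical storage.
tables = defaultdict(dict)
split_into_container_rows(
    "my-space",
    {
        "externalId": "xyz42",
        "equipment": {"manufacturer": "Acme Inc."},
        "pump": {"maxPressure": 1.2},
    },
    tables,
)
```

Every container table shares the same `(space, externalId)` key, which is what lets one node carry properties in several containers at once.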
This layout is similar to relational database schemas, where (space, externalId) constitutes a foreign key to the core node table, resulting in a snowflake schema. Importantly, this data lives on a different plane than the graph data discussed in the previous section. For example, nothing ensures that a node has data in Pump just because it has node.type set to [types, pump]. Validating the data content is left to the client, but you can use views to make this more ergonomic.
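Because the storage layer does not tie node.type to container contents, a client that needs this guarantee has to check it itself. The following is a minimal sketch of such a check, assuming hypothetical in-memory tables keyed by `(space, externalId)`. The function name, the `assets` space, and the second node `abc17` are invented for illustration.

```python
def nodes_missing_container_data(node_table, container_table, node_type):
    """Return keys of nodes with the given type that have no row in the container."""
    return [
        key
        for key, node in node_table.items()
        if node.get("type") == node_type and key not in container_table
    ]


# Hypothetical data: two nodes typed as pumps, only one has a row in Pump.
node_table = {
    ("assets", "xyz42"): {"type": ("types", "pump")},
    ("assets", "abc17"): {"type": ("types", "pump")},
}
pump_table = {("assets", "xyz42"): {"maxPressure": 1.2}}

missing = nodes_missing_container_data(node_table, pump_table, ("types", "pump"))
```

Here `missing` lists the pump-typed nodes that have no Pump properties, which the client can then reject or report.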
Which types of instances can you use a container for?
The usedFor field lets you define which types of instances the containers can be used for. Specify one of these values:
- node: the container can only be used to populate properties on a node.
- edge: the container can only be used to populate properties on an edge.
- all: the container can be used to populate properties on both nodes and edges.
If you use all, ingesting to the container will be more expensive than using only node or edge.
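The three usedFor values can be read as a simple lookup from the container setting to the instance types it accepts. The sketch below illustrates that rule only; `container_accepts` is a hypothetical helper, not a function in any CDF SDK.

```python
# Instance types accepted for each usedFor value described above.
ALLOWED_INSTANCE_TYPES = {
    "node": {"node"},
    "edge": {"edge"},
    "all": {"node", "edge"},
}


def container_accepts(used_for, instance_type):
    """Return True if a container with this usedFor value can hold the instance type."""
    if used_for not in ALLOWED_INSTANCE_TYPES:
        raise ValueError(f"Unknown usedFor value: {used_for!r}")
    return instance_type in ALLOWED_INSTANCE_TYPES[used_for]
```

For example, `container_accepts("node", "edge")` is false, while `container_accepts("all", "edge")` is true.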
Properties
When you define a container, you must specify the properties it will contain. Data modeling supports the following basic data types for properties:
Property type | Description |
---|---|
text | A string of characters. |
int64 | A 64-bit integer. |
float64 | A 64-bit floating point number. |
float32 | A 32-bit floating point number. |
boolean | A boolean value. |
timestamp | A timestamp (with timezone). |
date | A date (without timezone). |
json | A JSON object. |
direct | A direct relation to another instance. |
In addition to these property types, we support native reference types that point to resources in other CDF APIs. This lets you reference data not suited for storage in a property graph. We support the following native resource reference types:
Native resource reference type | Description |
---|---|
TimeSeries | A reference to one specific time series. You can use GraphQL queries to expand data from the time series, including data points. |
File | A reference to a file stored in CDF and uploaded through the files service. |
Sequences | A reference to a sequence stored in CDF. |
With the exception of the direct type, we support declaring all of these base and reference types as lists. For example, to store a list of file references: files: [File]
You can specify whether the property is nullable, and provide a default value.
The full specification of a required string property can look like this:
name: myStringProperty
description: A string property
nullable: false
defaultValue: foo
type:
  type: text
  list: false
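As a rough illustration of how nullable and defaultValue might interact, the sketch below resolves a property value from supplied data. It assumes that the default is used when the property is omitted and that a missing or null value is rejected when nullable is false. These semantics and the helper name `resolve_property_value` are assumptions for illustration, not a description of the service's actual behavior.

```python
def resolve_property_value(spec, supplied):
    """Pick the supplied value, fall back to the default, and enforce nullability."""
    name = spec["name"]
    # Assumption: the default applies only when the property is omitted entirely.
    value = supplied.get(name, spec.get("defaultValue"))
    if value is None and not spec.get("nullable", True):
        raise ValueError(f"Property {name!r} is not nullable and has no value")
    return value


# Mirrors the name, nullable, and defaultValue fields of the specification above.
spec = {"name": "myStringProperty", "nullable": False, "defaultValue": "foo"}

from_default = resolve_property_value(spec, {})
from_input = resolve_property_value(spec, {"myStringProperty": "bar"})
```

Under these assumptions, an explicit null for `myStringProperty` raises an error, because the property is declared with nullable: false.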