|
1 | 1 | # Meteor Metadata Model |
2 | 2 |
|
3 | | -We have a set of defined metadata models which define the structure of metadata |
4 | | -that meteor will yield. To visit the metadata models being used by different |
5 | | -extractors please visit [here](extractors.md). We are currently using the |
6 | | -following metadata models: |
7 | | - |
8 | | -- [Bucket][proton-bucket]: Used for metadata being extracted from buckets. |
9 | | - Buckets are the basic containers in Google cloud services, or Amazon S3, etc., |
10 | | - that are used for data storage, and quite popular because of their features of |
11 | | - access management, aggregation of usage and services and ease of |
12 | | - configurations. Currently, Meteor provides a metadata extractor for the |
13 | | - buckets mentioned [here](extractors.md#bucket) |
14 | | - |
15 | | -- [Dashboard][proton-dashboard]: Dashboards are an essential part of data |
16 | | - analysis and are used to track, analyze, and visualize. These Dashboard |
17 | | - metadata model includes some basic fields like `urn` and `source`, etc., and a |
18 | | - list of `Chart`. There are multiple dashboards that are essential for Data |
19 | | - Analysis such as metabase, grafana, tableau, etc. Please refer to the list of |
20 | | - 'Dashboard' extractors meteor currently |
21 | | - supports [here](extractors.md#dashboard). |
22 | | - |
23 | | - - [Chart][proton-dashboard]: Charts are included in all the Dashboard and are |
24 | | - the result of certain queries in a Dashboard. Information about them |
25 | | - includes the information of the query and few similar details. |
26 | | - |
27 | | -- [User][proton-user]: This metadata model is used for defining the output of |
28 | | - extraction on User accounts. Some of these sources can be GitHub, Workday, |
29 | | - Google Suite, LDAP. Please refer to the list of 'User' extractors meteor |
30 | | - currently supports [here](extractors.md#user). |
31 | | - |
32 | | -- [Table][proton-table]: This metadata model is being used by extractors based |
33 | | - around databases, typically for the ones that store data in tabular format. It |
34 | | - contains various fields that include `schema` of the table and other access |
35 | | - related information. Please refer to the list of 'Table' extractors meteor |
36 | | - currently supports [here](extractors.md#table). |
37 | | - |
38 | | -- [Job][proton-job]: A job can represent a scheduled or recurring task that |
39 | | - performs some transformation in the data engineering pipeline. Job is a |
40 | | - metadata model built for this purpose. Please refer to the list of 'Job' |
41 | | - extractors meteor currently supports [here](extractors.md#table). |
42 | | - |
43 | | -- [Topic][proton-topic]: A topic represents a virtual group for logical group of |
44 | | - messages in message bus like kafka, pubsub, pulsar etc. Please refer to the |
45 | | - list of 'Topic' extractors meteor currently |
46 | | - supports [here](extractors.md#topic). |
47 | | - |
48 | | -- [Machine Learning Feature Table][proton-featuretable]: A Feature Table is a |
49 | | - table or view that represents a logical group of time-series feature data as |
50 | | - it is found in a data source. Please refer to the list of 'Feature Table' |
51 | | - extractors meteor currently |
52 | | - supports [here](extractors.md#machine-learning-feature-table). |
53 | | - |
54 | | -- [Application][proton-application]: An application represents a service that |
55 | | - typically communicates over well-defined APIs. Please refer to the list of ' |
56 | | - Application' extractors meteor currently |
57 | | - supports [here](extractors.md#application). |
58 | | - |
59 | | -- [Machine Learning Model][proton-model]: A Model represents a Data Science |
60 | | - Model commonly used for Machine Learning(ML). Models are algorithms trained on |
61 | | - data to find patterns or make predictions. Models typically consume ML |
62 | | - features to generate a meaningful output. Please refer to the list of 'Model' |
63 | | - extractors meteor currently |
64 | | - supports [here](extractors.md#machine-learning-model). |
65 | | - |
66 | | -`Proto` has been used to define these metadata models. To check their |
67 | | -implementation please refer [here][proton-assets]. |
| 3 | +Meteor uses an **Entity + Edge** model to represent metadata. Each extractor emits one or more **Records**, where each Record contains an **Entity** and zero or more **Edges**. |
68 | 4 |
|
69 | | -## Usage |
| 5 | +## Entity |
| 6 | + |
| 7 | +An Entity represents a metadata resource (table, dashboard, topic, job, user, etc.). All entity types share a single flat structure: |
| 8 | + |
| 9 | +| Field | Type | Description | |
| 10 | +|:--------------|:------------------------|:--------------------------------------------------------| |
| 11 | +| `urn` | `string` | Unique resource name. Format: `urn:{source}:{scope}:{type}:{name}` | |
| 12 | +| `type` | `string` | Entity type: `table`, `dashboard`, `topic`, `job`, `user`, `bucket`, `application`, `model`, `feature_table`, `metric`, `experiment`, `group` | |
| 13 | +| `name` | `string` | Human-readable name | |
| 14 | +| `description` | `string` | Description of the entity | |
| 15 | +| `source` | `string` | Source system (e.g. `bigquery`, `postgres`, `kafka`) | |
| 16 | +| `properties` | `structpb.Struct` | Flat key-value map holding all type-specific metadata (schema, columns, charts, config, labels, etc.) | |
| 17 | + |
| 18 | +There are no separate typed schemas (e.g. no `Table`, `Dashboard`, `Bucket` proto types). All metadata is stored as flat key-value pairs in `properties`. |
| 19 | + |
| 20 | +## Edge |
| 21 | + |
| 22 | +An Edge represents a relationship between two entities (ownership, lineage, etc.): |
| 23 | + |
| 24 | +| Field | Type | Description | |
| 25 | +|:--------------|:------------------|:--------------------------------------------------------| |
| 26 | +| `source_urn` | `string` | URN of the source entity | |
| 27 | +| `target_urn` | `string` | URN of the target entity | |
| 28 | +| `type` | `string` | Relationship type: `owned_by`, `lineage`, etc. | |
| 29 | +| `source` | `string` | Source system that reported this relationship | |
| 30 | +| `properties` | `structpb.Struct` | Additional metadata about the relationship | |
| 31 | + |
| 32 | +### Relationship Types |
| 33 | + |
| 34 | +- **`owned_by`**: Indicates ownership. Replaces the old `owners` field. |
| 35 | +- **`lineage`**: Indicates data flow (upstream/downstream). Replaces the old `lineage.upstreams` and `lineage.downstreams` fields. |
70 | 36 |
|
71 | | -[//]: # "@formatter:off" |
| 37 | +## Record |
| 38 | + |
| 39 | +A Record is the unit of data flowing through the Meteor pipeline. It wraps an Entity and its associated Edges: |
| 40 | + |
| 41 | +- `record.Entity()` returns the Entity. |
| 42 | +- `record.Edges()` returns the list of Edges. |
| 43 | + |
| 44 | +## Supported Entity Types |
| 45 | + |
| 46 | +- **bucket**: Cloud storage containers (GCS, S3, etc.) |
| 47 | +- **dashboard**: Data visualization dashboards (Metabase, Grafana, Tableau, etc.) |
| 48 | +- **table**: Database tables and views (BigQuery, Postgres, MySQL, etc.) |
| 49 | +- **topic**: Message bus topics (Kafka, Pub/Sub, Pulsar, etc.) |
| 50 | +- **job**: Scheduled/recurring data transformation tasks |
| 51 | +- **user**: User accounts (GitHub, LDAP, Google Suite, etc.) |
| 52 | +- **application**: Services communicating over APIs |
| 53 | +- **model**: Machine learning models |
| 54 | +- **feature_table**: ML feature tables |
| 55 | +- **metric**: Metric definitions |
| 56 | +- **experiment**: A/B experiments |
| 57 | +- **group**: User groups |
| 58 | + |
| 59 | +The [extractor list](extractors.md) shows which extractors emit which entity types.
| 60 | + |
| 61 | +## Usage |
72 | 62 |
|
73 | 63 | ```golang |
74 | | -import( |
75 | | - assetsv1beta1 "github.com/raystack/meteor/models/raystack/assets/v1beta1" |
76 | | - "github.com/raystack/meteor/models/raystack/assets/facets/v1beta1" |
| 64 | +import ( |
| 65 | + "github.com/raystack/meteor/models" |
| 66 | + meteorv1beta1 "github.com/raystack/proton/meteor/v1beta1" |
| 67 | + "google.golang.org/protobuf/types/known/structpb" |
77 | 68 | ) |
78 | 69 |
|
79 | | -func main(){ |
80 | | - // result is a var of data type of assetsv1beta1.Table one of our metadata model |
81 | | - result := &assetsv1beta1.Table{ |
82 | | - // assigining value to metadata model |
83 | | - Urn: fmt.Sprintf("%s.%s", dbName, tableName), |
84 | | - Name: tableName, |
| 70 | +func main() { |
| 71 | + // Build properties (the error is discarded here for brevity; check it in real code)
| 72 | + props, _ := structpb.NewStruct(map[string]interface{}{ |
| 73 | + "schema": map[string]interface{}{ |
| 74 | + "columns": []interface{}{ |
| 75 | + map[string]interface{}{ |
| 76 | + "name": "column_name", |
| 77 | + "data_type": "varchar", |
| 78 | + "is_nullable": true, |
| 79 | + "length": 256, |
| 80 | + }, |
| 81 | + }, |
| 82 | + }, |
| 83 | + }) |
| 84 | + |
| 85 | + // Create an Entity |
| 86 | + entity := &meteorv1beta1.Entity{ |
| 87 | + Urn: "urn:postgres:mydb:table:mydb.my_table", |
| 88 | + Type: "table", |
| 89 | + Name: "my_table", |
| 90 | + Source: "postgres", |
| 91 | + Properties: props, |
85 | 92 | } |
86 | 93 |
|
87 | | - // using column facet to add metadata info of schema |
88 | | - |
89 | | - var columns []*facetsv1beta1.Column |
90 | | - columns = append(columns, &facetsv1beta1.Column{ |
91 | | - Name: "column_name", |
92 | | - DataType: "varchar", |
93 | | - IsNullable: true, |
94 | | - Length: 256, |
95 | | - }) |
96 | | - result.Schema = &facetsv1beta1.Columns{ |
97 | | - Columns: columns, |
| 94 | + // Express ownership as an Edge; a lineage Edge has the same shape with Type "lineage"
| 95 | + edges := []*meteorv1beta1.Edge{ |
| 96 | + { |
| 97 | + SourceUrn: "urn:postgres:mydb:table:mydb.my_table", |
| 98 | + TargetUrn: "urn:user:myorg:user:alice", |
| 99 | + Type: "owned_by", |
| 100 | + Source: "postgres", |
| 101 | + }, |
98 | 102 | } |
| 103 | + |
| 104 | + // Wrap in a Record for the pipeline |
| 105 | + record := models.NewRecord(entity, edges) |
| 106 | + _ = record |
99 | 107 | } |
100 | 108 | ``` |
101 | | - |
102 | | -[//]: # "@formatter:on" |
103 | | -[proton-bucket]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/bucket.proto |
104 | | -[proton-dashboard]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/dashboard.proto |
105 | | -[proton-user]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/user.proto |
106 | | -[proton-table]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/table.proto |
107 | | -[proton-job]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/job.proto |
108 | | -[proton-topic]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/topic.proto |
109 | | -[proton-featuretable]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/feature_table.proto |
110 | | -[proton-application]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/application.proto |
111 | | -[proton-model]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/model.proto |
112 | | -[proton-assets]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2 |
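
The URN format `urn:{source}:{scope}:{type}:{name}` and the `owned_by` edge introduced in this change can be illustrated without the Meteor packages. The following is a minimal sketch using only the standard library. The `Edge` struct is a local stand-in that mirrors the documented fields, and `BuildURN` and `OwnedBy` are hypothetical helpers named for illustration; none of them are part of Meteor's API.

```go
package main

import (
	"fmt"
	"strings"
)

// Edge is a local stand-in mirroring the documented Edge fields.
// It is NOT the generated meteorv1beta1.Edge type.
type Edge struct {
	SourceURN string
	TargetURN string
	Type      string
	Source    string
}

// BuildURN (hypothetical helper) joins the parts in the documented
// order: urn:{source}:{scope}:{type}:{name}.
func BuildURN(source, scope, kind, name string) string {
	return strings.Join([]string{"urn", source, scope, kind, name}, ":")
}

// OwnedBy (hypothetical helper) builds an "owned_by" edge that points
// from an asset to its owner, as in the usage example of this change.
func OwnedBy(assetURN, ownerURN, source string) Edge {
	return Edge{
		SourceURN: assetURN,
		TargetURN: ownerURN,
		Type:      "owned_by",
		Source:    source,
	}
}

func main() {
	table := BuildURN("postgres", "mydb", "table", "mydb.my_table")
	owner := BuildURN("user", "myorg", "user", "alice")
	e := OwnedBy(table, owner, "postgres")
	fmt.Printf("%s -[%s]-> %s\n", e.SourceURN, e.Type, e.TargetURN)
}
```

Building URNs through one helper keeps the `source_urn` and `target_urn` of every edge consistent with the `urn` of the entities they refer to, which matters because edges reference entities only by URN.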