Commit 7565ce2

docs: update documentation to reflect Entity+Edge model
- Rewrite metadata_models.md for Entity+Edge instead of typed Asset protos
- Update script processor README with entity.properties examples
- Update HTTP extractor/sink READMEs with entity terminology
- Update application_yaml README with Entity and Edge output tables
- Update CLAUDE.md with new architecture and data model sections
1 parent 3feefd2 commit 7565ce2

6 files changed: +308 -338 lines changed

CLAUDE.md

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
1+
# Meteor
2+
3+
Meteor is a plugin-driven metadata collection agent. It extracts metadata from data stores/services via **extractors**, transforms it via **processors**, and pushes it to catalog services via **sinks**.
4+
5+
## Architecture
6+
7+
```
8+
Recipe (YAML) → Extractor → Processor(s) → Sink(s)
9+
```
10+
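The flow above can be sketched in a few lines of Go. This is a minimal illustration only: `Record`, `Extractor`, `Processor`, `Sink`, and `runPipeline` are hypothetical names defined here for the example, not Meteor's actual plugin interfaces, and the real agent also handles batching, retries, and concurrency.

```go
package main

import (
	"fmt"
	"strings"
)

// Record is a hypothetical stand-in for the record type that flows through the pipeline.
type Record struct {
	URN  string
	Name string
}

// Illustrative function types; Meteor's real plugins are richer than this.
type Extractor func() []Record
type Processor func(Record) Record
type Sink func(Record)

// runPipeline passes every extracted record through each processor in order,
// then hands the result to every sink. It returns the number of records handled.
func runPipeline(extract Extractor, processors []Processor, sinks []Sink) int {
	count := 0
	for _, rec := range extract() {
		for _, p := range processors {
			rec = p(rec)
		}
		for _, s := range sinks {
			s(rec)
		}
		count++
	}
	return count
}

func main() {
	extract := func() []Record {
		return []Record{{URN: "urn:postgres:mydb:table:mydb.my_table", Name: "my_table"}}
	}
	upper := func(r Record) Record {
		r.Name = strings.ToUpper(r.Name)
		return r
	}
	printSink := func(r Record) { fmt.Println(r.URN, r.Name) }

	runPipeline(extract, []Processor{upper}, []Sink{printSink})
}
```
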
11+
Each extractor emits **Records**. A Record contains:
12+
- **Entity**: urn, type, name, description, source, properties (flat structpb.Struct)
13+
- **Edges**: list of relationships, each with source_urn, target_urn, type, source, properties
14+
15+
Ownership is represented as edges with type `owned_by`. Lineage (upstreams/downstreams) is represented as edges with type `lineage`.
16+
17+
- **Extractors**: 34+ plugins (bigquery, postgres, kafka, github, etc.)
18+
- **Processors**: Transform/enrich records in-flight
19+
- **Sinks**: Push to destinations (compass, kafka, file, http, etc.)
20+
- **Agent**: Orchestrates the pipeline with batching, retries, concurrency
21+
22+
## Key Directories
23+
24+
```
25+
models/ Core data model (Record wrapping Entity + Edges)
26+
plugins/
27+
extractors/ Source plugins (one dir per source)
28+
processors/ Transform plugins
29+
sinks/ Destination plugins (compass, kafka, file, etc.)
30+
agent/ Pipeline orchestration
31+
recipe/ Recipe parsing and validation
32+
cmd/ CLI commands (run, lint, list, info, gen)
33+
```
34+
35+
## Data Model
36+
37+
**Entity** (`meteorv1beta1.Entity`):
38+
- `urn` - Unique resource name
39+
- `type` - Entity type (table, dashboard, topic, job, user, bucket, application, model, etc.)
40+
- `name` - Human-readable name
41+
- `description` - Description
42+
- `source` - Source system (e.g. bigquery, postgres, kafka)
43+
- `properties` - Flat key-value map (structpb.Struct) holding all type-specific metadata
44+
45+
**Edge** (`meteorv1beta1.Edge`):
46+
- `source_urn` - URN of the source entity
47+
- `target_urn` - URN of the target entity
48+
- `type` - Relationship type (`owned_by`, `lineage`, etc.)
49+
- `source` - Source system
50+
- `properties` - Additional metadata
51+
52+
## Compass Integration
53+
54+
The Compass sink (`plugins/sinks/compass/`) sends entities and edges to Compass. Each Record is an Entity with flat properties, plus Edges for ownership and lineage.
55+
56+
## Build & Test
57+
58+
```
59+
go build ./...
60+
go test ./...
61+
make lint
62+
```
63+
64+
## Plan: Align Meteor with Compass v2
65+
66+
See `.claude/plans/compass-v2-alignment.md` for the implementation plan.
Lines changed: 95 additions & 99 deletions
@@ -1,112 +1,108 @@
11
# Meteor Metadata Model
22

3-
We have a set of defined metadata models which define the structure of metadata
4-
that meteor will yield. To visit the metadata models being used by different
5-
extractors please visit [here](extractors.md). We are currently using the
6-
following metadata models:
7-
8-
- [Bucket][proton-bucket]: Used for metadata being extracted from buckets.
9-
Buckets are the basic containers in Google cloud services, or Amazon S3, etc.,
10-
that are used for data storage, and quite popular because of their features of
11-
access management, aggregation of usage and services and ease of
12-
configurations. Currently, Meteor provides a metadata extractor for the
13-
buckets mentioned [here](extractors.md#bucket)
14-
15-
- [Dashboard][proton-dashboard]: Dashboards are an essential part of data
16-
analysis and are used to track, analyze, and visualize. These Dashboard
17-
metadata model includes some basic fields like `urn` and `source`, etc., and a
18-
list of `Chart`. There are multiple dashboards that are essential for Data
19-
Analysis such as metabase, grafana, tableau, etc. Please refer to the list of
20-
'Dashboard' extractors meteor currently
21-
supports [here](extractors.md#dashboard).
22-
23-
- [Chart][proton-dashboard]: Charts are included in all the Dashboard and are
24-
the result of certain queries in a Dashboard. Information about them
25-
includes the information of the query and few similar details.
26-
27-
- [User][proton-user]: This metadata model is used for defining the output of
28-
extraction on User accounts. Some of these sources can be GitHub, Workday,
29-
Google Suite, LDAP. Please refer to the list of 'User' extractors meteor
30-
currently supports [here](extractors.md#user).
31-
32-
- [Table][proton-table]: This metadata model is being used by extractors based
33-
around databases, typically for the ones that store data in tabular format. It
34-
contains various fields that include `schema` of the table and other access
35-
related information. Please refer to the list of 'Table' extractors meteor
36-
currently supports [here](extractors.md#table).
37-
38-
- [Job][proton-job]: A job can represent a scheduled or recurring task that
39-
performs some transformation in the data engineering pipeline. Job is a
40-
metadata model built for this purpose. Please refer to the list of 'Job'
41-
extractors meteor currently supports [here](extractors.md#table).
42-
43-
- [Topic][proton-topic]: A topic represents a virtual group for logical group of
44-
messages in message bus like kafka, pubsub, pulsar etc. Please refer to the
45-
list of 'Topic' extractors meteor currently
46-
supports [here](extractors.md#topic).
47-
48-
- [Machine Learning Feature Table][proton-featuretable]: A Feature Table is a
49-
table or view that represents a logical group of time-series feature data as
50-
it is found in a data source. Please refer to the list of 'Feature Table'
51-
extractors meteor currently
52-
supports [here](extractors.md#machine-learning-feature-table).
53-
54-
- [Application][proton-application]: An application represents a service that
55-
typically communicates over well-defined APIs. Please refer to the list of '
56-
Application' extractors meteor currently
57-
supports [here](extractors.md#application).
58-
59-
- [Machine Learning Model][proton-model]: A Model represents a Data Science
60-
Model commonly used for Machine Learning(ML). Models are algorithms trained on
61-
data to find patterns or make predictions. Models typically consume ML
62-
features to generate a meaningful output. Please refer to the list of 'Model'
63-
extractors meteor currently
64-
supports [here](extractors.md#machine-learning-model).
65-
66-
`Proto` has been used to define these metadata models. To check their
67-
implementation please refer [here][proton-assets].
3+
Meteor uses an **Entity + Edge** model to represent metadata. Each extractor emits one or more **Records**, where each Record contains an **Entity** and zero or more **Edges**.
684

69-
## Usage
5+
## Entity
6+
7+
An Entity represents a metadata resource (table, dashboard, topic, job, user, etc.). All entity types share a single flat structure:
8+
9+
| Field | Type | Description |
10+
|:--------------|:------------------------|:--------------------------------------------------------|
11+
| `urn` | `string` | Unique resource name. Format: `urn:{source}:{scope}:{type}:{name}` |
12+
| `type` | `string` | Entity type: `table`, `dashboard`, `topic`, `job`, `user`, `bucket`, `application`, `model`, `feature_table`, `metric`, `experiment`, `group` |
13+
| `name` | `string` | Human-readable name |
14+
| `description` | `string` | Description of the entity |
15+
| `source` | `string` | Source system (e.g. `bigquery`, `postgres`, `kafka`) |
16+
| `properties` | `structpb.Struct` | Flat key-value map holding all type-specific metadata (schema, columns, charts, config, labels, etc.) |
17+
18+
There are no separate typed schemas (e.g. no `Table`, `Dashboard`, or `Bucket` proto types). All type-specific metadata is stored as key-value pairs in `properties`; values can themselves be nested, as with `schema.columns` in the Usage example.
19+
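As a rough illustration of the URN format in the table above, the sketch below joins the four parts with colons. `Entity` here is a local struct that only mirrors the documented fields, and `buildURN` is a hypothetical helper; neither is part of Meteor or the generated `meteorv1beta1` package.

```go
package main

import (
	"fmt"
	"strings"
)

// Entity mirrors the documented fields. It is a local illustration,
// not the generated meteorv1beta1.Entity type.
type Entity struct {
	URN         string
	Type        string
	Name        string
	Description string
	Source      string
	Properties  map[string]interface{}
}

// buildURN is a hypothetical helper that assembles
// urn:{source}:{scope}:{type}:{name} from its parts.
func buildURN(source, scope, entityType, name string) string {
	return strings.Join([]string{"urn", source, scope, entityType, name}, ":")
}

func main() {
	table := Entity{
		URN:    buildURN("postgres", "mydb", "table", "mydb.my_table"),
		Type:   "table",
		Name:   "my_table",
		Source: "postgres",
		// Type-specific metadata lives in Properties rather than in typed fields.
		Properties: map[string]interface{}{
			"labels": map[string]interface{}{"team": "search"},
		},
	}
	fmt.Println(table.URN) // prints "urn:postgres:mydb:table:mydb.my_table"
}
```
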
20+
## Edge
21+
22+
An Edge represents a relationship between two entities (ownership, lineage, etc.):
23+
24+
| Field | Type | Description |
25+
|:--------------|:------------------|:--------------------------------------------------------|
26+
| `source_urn` | `string` | URN of the source entity |
27+
| `target_urn` | `string` | URN of the target entity |
28+
| `type` | `string` | Relationship type: `owned_by`, `lineage`, etc. |
29+
| `source` | `string` | Source system that reported this relationship |
30+
| `properties` | `structpb.Struct` | Additional metadata about the relationship |
31+
32+
### Relationship Types
33+
34+
- **`owned_by`**: Indicates ownership. Replaces the old `owners` field.
35+
- **`lineage`**: Indicates data flow (upstream/downstream). Replaces the old `lineage.upstreams` and `lineage.downstreams` fields.
7036

71-
[//]: # "@formatter:off"
37+
## Record
38+
39+
A Record is the unit of data flowing through the Meteor pipeline. It wraps an Entity and its associated Edges:
40+
41+
- `record.Entity()` returns the Entity.
42+
- `record.Edges()` returns the list of Edges.
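The accessor shape described above can be sketched with local types. Everything in this block (`Record`, `NewRecord`, `withProperty`, and the trimmed `Entity` and `Edge` structs) is a hypothetical stand-in written for illustration and may differ from the real `models` package. It shows how a processor-style step might read a record through its accessors and return an enriched copy.

```go
package main

import "fmt"

// Entity and Edge are trimmed local stand-ins for the documented messages.
type Entity struct {
	URN        string
	Properties map[string]interface{}
}

type Edge struct {
	SourceURN, TargetURN, Type string
}

// Record wraps one Entity and its Edges behind two accessors.
type Record struct {
	entity Entity
	edges  []Edge
}

func NewRecord(e Entity, edges []Edge) Record { return Record{entity: e, edges: edges} }
func (r Record) Entity() Entity               { return r.entity }
func (r Record) Edges() []Edge                { return r.edges }

// withProperty returns a new record whose entity carries one extra property.
// The properties map is copied, so the input record is left unchanged.
func withProperty(rec Record, key string, value interface{}) Record {
	e := rec.Entity()
	props := make(map[string]interface{}, len(e.Properties)+1)
	for k, v := range e.Properties {
		props[k] = v
	}
	props[key] = value
	e.Properties = props
	return NewRecord(e, rec.Edges())
}

func main() {
	rec := NewRecord(
		Entity{URN: "urn:postgres:mydb:table:mydb.my_table", Properties: map[string]interface{}{"version": "1"}},
		[]Edge{{SourceURN: "urn:postgres:mydb:table:mydb.my_table", TargetURN: "urn:user:myorg:user:alice", Type: "owned_by"}},
	)
	enriched := withProperty(rec, "team", "search")
	fmt.Println(len(rec.Entity().Properties), len(enriched.Entity().Properties), len(enriched.Edges()))
}
```
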
43+
44+
## Supported Entity Types
45+
46+
- **bucket**: Cloud storage containers (GCS, S3, etc.)
47+
- **dashboard**: Data visualization dashboards (Metabase, Grafana, Tableau, etc.)
48+
- **table**: Database tables and views (BigQuery, Postgres, MySQL, etc.)
49+
- **topic**: Message bus topics (Kafka, Pub/Sub, Pulsar, etc.)
50+
- **job**: Scheduled/recurring data transformation tasks
51+
- **user**: User accounts (GitHub, LDAP, Google Suite, etc.)
52+
- **application**: Services communicating over APIs
53+
- **model**: Machine learning models
54+
- **feature_table**: ML feature tables
55+
- **metric**: Metric definitions
56+
- **experiment**: A/B experiments
57+
- **group**: User groups
58+
59+
To see which extractors emit which entity types, see the [extractors list](extractors.md).
60+
61+
## Usage
7262

7363
```golang
74-
import(
75-
assetsv1beta1 "github.com/raystack/meteor/models/raystack/assets/v1beta1"
76-
"github.com/raystack/meteor/models/raystack/assets/facets/v1beta1"
64+
import (
65+
"github.com/raystack/meteor/models"
66+
meteorv1beta1 "github.com/raystack/proton/meteor/v1beta1"
67+
"google.golang.org/protobuf/types/known/structpb"
7768
)
7869

79-
func main(){
80-
// result is a var of data type of assetsv1beta1.Table one of our metadata model
81-
result := &assetsv1beta1.Table{
82-
// assigining value to metadata model
83-
Urn: fmt.Sprintf("%s.%s", dbName, tableName),
84-
Name: tableName,
70+
func main() {
71+
// Build properties
72+
props, _ := structpb.NewStruct(map[string]interface{}{
73+
"schema": map[string]interface{}{
74+
"columns": []interface{}{
75+
map[string]interface{}{
76+
"name": "column_name",
77+
"data_type": "varchar",
78+
"is_nullable": true,
79+
"length": 256,
80+
},
81+
},
82+
},
83+
})
84+
85+
// Create an Entity
86+
entity := &meteorv1beta1.Entity{
87+
Urn: "urn:postgres:mydb:table:mydb.my_table",
88+
Type: "table",
89+
Name: "my_table",
90+
Source: "postgres",
91+
Properties: props,
8592
}
8693

87-
// using column facet to add metadata info of schema
88-
89-
var columns []*facetsv1beta1.Column
90-
columns = append(columns, &facetsv1beta1.Column{
91-
Name: "column_name",
92-
DataType: "varchar",
93-
IsNullable: true,
94-
Length: 256,
95-
})
96-
result.Schema = &facetsv1beta1.Columns{
97-
Columns: columns,
94+
// Represent ownership as an Edge; lineage edges use the same shape with Type "lineage"
95+
edges := []*meteorv1beta1.Edge{
96+
{
97+
SourceUrn: "urn:postgres:mydb:table:mydb.my_table",
98+
TargetUrn: "urn:user:myorg:user:alice",
99+
Type: "owned_by",
100+
Source: "postgres",
101+
},
98102
}
103+
104+
// Wrap in a Record for the pipeline
105+
record := models.NewRecord(entity, edges)
106+
_ = record
99107
}
100108
```
101-
102-
[//]: # "@formatter:on"
103-
[proton-bucket]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/bucket.proto
104-
[proton-dashboard]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/dashboard.proto
105-
[proton-user]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/user.proto
106-
[proton-table]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/table.proto
107-
[proton-job]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/job.proto
108-
[proton-topic]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/topic.proto
109-
[proton-featuretable]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/feature_table.proto
110-
[proton-application]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/application.proto
111-
[proton-model]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/model.proto
112-
[proton-assets]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2

plugins/extractors/application_yaml/README.md

Lines changed: 27 additions & 27 deletions
@@ -33,11 +33,11 @@ description: "string"
3333
url: "string"
3434
version: "string"
3535
inputs: # OPTIONAL
36-
# Format: "urn:{service}:{scope}:{type}:{name}"
36+
# Format: "urn:{source}:{scope}:{type}:{name}"
3737
- urn:bigquery:bq-raw-internal:table:bq-raw-internal:dagstream.production_feast09_s2id13_30min_demand
3838
- urn:kafka:int-dagstream-kafka.yonkou.io:topic:staging_feast09_s2id13_30min_demand
3939
outputs: # OPTIONAL
40-
# Format: "urn:{service}:{scope}:{type}:{name}"
40+
# Format: "urn:{source}:{scope}:{type}:{name}"
4141
- urn:kafka:1-my-kafka.com:topic:staging_feast09_mixed_granularity_demand_forecast_3es
4242
create_time: "2006-01-02T15:04:05Z"
4343
update_time: "2006-01-02T15:04:05Z"
@@ -62,34 +62,34 @@ following env vars are utilised for it:
6262

6363
## Outputs
6464

65-
The application is mapped to an [`Asset`][proton-asset] with model specific
66-
metadata stored using [`Application`][proton-application]. Please refer the
67-
proto definitions for more information.
68-
69-
| Field | Value | Sample Value |
70-
| :-------------------------- | :------------------------------------------------------------ | :----------------------------------------------------------------------------- |
71-
| `resource.urn` | `urn:application_yaml:{scope}:application:{application.name}` | `urn:application_yaml:integration:application:order-manager` |
72-
| `resource.name` | `{application.name}` | `order-manager` |
73-
| `resource.service` | `application_yaml` | `application_yaml` |
74-
| `resource.type` | `application` | `application` |
75-
| `resource.url` | `{application.url}` | `https://github.com/mycompany/order-manager` |
76-
| `resource.description` | `{application.description` | `Order-Manager is the order management system for MyCompany` |
77-
| `application_id` | `application.id` | `0adf3214-676c-4a74-ab37-9d4a4b8ade0e` |
78-
| `version` | `application.version` | `d6ec883` |
79-
| `create_time` | `{application.create_time}` | `2022-08-08T03:17:54Z` |
80-
| `update_time` | `{application.update_time}` | `2022-08-08T03:57:54Z` |
81-
| `ownership.owners[0].urn` | `{application.team.id}` | `9ebcc2f8-5894-47c6-83a9-160b7eaa3f6b` |
82-
| `ownership.owners[0].name` | `{application.team.name}` | `Search` |
83-
| `ownership.owners[0].email` | `{application.team.email}` | `search@mycompany.com` |
84-
| `lineage.upstreams[].urn` | `{application.inputs[]}` | `urn:kafka:int-kafka.yonkou.io:topic:staging_30min_demand` |
85-
| `lineage.downstreams[].urn` | `{application.outputs[]}` | `urn:bigquery:bq-internal:table:bq-internal:dagstream.production_30min_demand` |
86-
| `resource.labels` | `map[string]string` | `{"team": "Booking Experience"}` |
65+
The extractor emits a Record containing an Entity and Edges.
66+
67+
### Entity
68+
69+
| Field | Value | Sample Value |
70+
| :------------------ | :------------------------------------------------------------ | :----------------------------------------------------------- |
71+
| `urn` | `urn:application_yaml:{scope}:application:{application.name}` | `urn:application_yaml:integration:application:order-manager` |
72+
| `name` | `{application.name}` | `order-manager` |
73+
| `source` | `application_yaml` | `application_yaml` |
74+
| `type` | `application` | `application` |
75+
| `description` | `{application.description}` | `Order-Manager is the order management system for MyCompany` |
76+
| `properties.url` | `{application.url}` | `https://github.com/mycompany/order-manager` |
77+
| `properties.id` | `{application.id}` | `0adf3214-676c-4a74-ab37-9d4a4b8ade0e` |
78+
| `properties.version`| `{application.version}` | `d6ec883` |
79+
| `properties.create_time` | `{application.create_time}` | `2022-08-08T03:17:54Z` |
80+
| `properties.update_time` | `{application.update_time}` | `2022-08-08T03:57:54Z` |
81+
| `properties.labels` | `map[string]string` | `{"team": "Booking Experience"}` |
82+
83+
### Edges
84+
85+
| Edge Type | Description | Example |
86+
|:------------|:----------------------------------------|:-----------------------------------------------------------------------------------|
87+
| `owned_by` | Team ownership from `application.team` | `source_urn: <app_urn>`, `target_urn: {team.id}`, `properties: {name, email}` |
88+
| `lineage` | Upstream from `application.inputs[]` | `source_urn: {input_urn}`, `target_urn: <app_urn>`, `type: lineage` |
89+
| `lineage` | Downstream from `application.outputs[]` | `source_urn: <app_urn>`, `target_urn: {output_urn}`, `type: lineage` |
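The edge table above can be read as a mapping from the application file to a list of edges. The following sketch follows the directions given in the table: inputs point into the application, and the application points to its outputs and to its team. `Application`, `Edge`, and `buildEdges` are hypothetical names for this example only, not the extractor's real types, and skipping the ownership edge when no team is set is an assumption.

```go
package main

import "fmt"

// Edge is a local illustration of the documented edge fields.
type Edge struct {
	SourceURN  string
	TargetURN  string
	Type       string
	Properties map[string]string
}

// Application holds only the fields from the YAML file that produce edges.
type Application struct {
	TeamID, TeamName, TeamEmail string
	Inputs, Outputs             []string
}

// buildEdges maps team, inputs, and outputs to owned_by and lineage edges.
func buildEdges(appURN string, app Application) []Edge {
	var edges []Edge
	if app.TeamID != "" { // assumption: no team means no ownership edge
		edges = append(edges, Edge{
			SourceURN:  appURN,
			TargetURN:  app.TeamID,
			Type:       "owned_by",
			Properties: map[string]string{"name": app.TeamName, "email": app.TeamEmail},
		})
	}
	for _, in := range app.Inputs { // upstream: input -> application
		edges = append(edges, Edge{SourceURN: in, TargetURN: appURN, Type: "lineage"})
	}
	for _, out := range app.Outputs { // downstream: application -> output
		edges = append(edges, Edge{SourceURN: appURN, TargetURN: out, Type: "lineage"})
	}
	return edges
}

func main() {
	app := Application{
		TeamID:    "9ebcc2f8-5894-47c6-83a9-160b7eaa3f6b",
		TeamName:  "Search",
		TeamEmail: "search@mycompany.com",
		Inputs:    []string{"urn:kafka:int-kafka.yonkou.io:topic:staging_30min_demand"},
		Outputs:   []string{"urn:bigquery:bq-internal:table:bq-internal:dagstream.production_30min_demand"},
	}
	for _, e := range buildEdges("urn:application_yaml:integration:application:order-manager", app) {
		fmt.Printf("%s: %s -> %s\n", e.Type, e.SourceURN, e.TargetURN)
	}
}
```
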
8790

8891
## Contributing
8992

9093
Refer to
9194
the [contribution guidelines](../../../docs/docs/contribute/guide.md#adding-a-new-extractor)
9295
for information on contributing to this module.
93-
94-
[proton-asset]: https://github.com/raystack/proton/blob/fabbde8/raystack/assets/v1beta2/asset.proto#L14
95-
[proton-application]: https://github.com/raystack/proton/blob/fabbde8/raystack/assets/v1beta2/application.proto#L11
