diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 66 additions & 0 deletions
diff --git a/docs/docs/reference/metadata_models.md b/docs/docs/reference/metadata_models.md
Lines changed: 95 additions & 99 deletions
diff --git a/plugins/extractors/application_yaml/README.md b/plugins/extractors/application_yaml/README.md
Lines changed: 27 additions & 27 deletions
@@ -0,0 +1,66 @@
+# Meteor
+
+Meteor is a plugin-driven metadata collection agent. It extracts metadata from data stores/services via **extractors**, transforms it via **processors**, and pushes it to catalog services via **sinks**.
+
+## Architecture
+
+```
+Recipe (YAML) → Extractor → Processor(s) → Sink(s)
+```
+
+Each extractor emits **Records**. A Record contains:
+- **Entity**: urn, type, name, description, source, properties (flat structpb.Struct)
+- **Edges**: list of relationships, each with source_urn, target_urn, type, source, properties
+
+Ownership is represented as edges with type `owned_by`. Lineage (upstreams/downstreams) is represented as edges with type `lineage`.
+
+- **Extractors**: 34+ plugins (bigquery, postgres, kafka, github, etc.)
+- **Processors**: Transform/enrich records in-flight
+- **Sinks**: Push to destinations (compass, kafka, file, http, etc.)
+- **Agent**: Orchestrates the pipeline with batching, retries, concurrency
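The Agent's batching and retry duties can be pictured with a small sketch. `Record`, `Sink`, `flakySink`, and `sendInBatches` are hypothetical names, not Meteor's actual interfaces. The sketch runs sequentially with a fixed retry count and leaves out the real Agent's concurrency and any backoff policy.

```go
package main

import (
	"errors"
	"fmt"
)

// Record is a hypothetical stand-in for Meteor's Record type.
type Record struct {
	URN string
}

// Sink is a hypothetical sink interface that receives one batch at a time.
type Sink interface {
	Sink(batch []Record) error
}

// flakySink fails its first failuresLeft calls, to exercise the retry path.
type flakySink struct {
	failuresLeft int
	received     [][]Record
}

func (s *flakySink) Sink(batch []Record) error {
	if s.failuresLeft > 0 {
		s.failuresLeft--
		return errors.New("transient sink error")
	}
	s.received = append(s.received, batch)
	return nil
}

// sendInBatches splits records into batches of at most batchSize and sends
// each batch to the sink, retrying a failed batch up to maxRetries times.
func sendInBatches(records []Record, batchSize, maxRetries int, sink Sink) error {
	if batchSize <= 0 {
		return errors.New("batchSize must be positive")
	}
	for start := 0; start < len(records); start += batchSize {
		end := start + batchSize
		if end > len(records) {
			end = len(records)
		}
		batch := records[start:end]

		var err error
		for attempt := 0; attempt <= maxRetries; attempt++ {
			if err = sink.Sink(batch); err == nil {
				break
			}
		}
		if err != nil {
			return fmt.Errorf("batch starting at %d: %w", start, err)
		}
	}
	return nil
}

func main() {
	records := []Record{{"urn:a"}, {"urn:b"}, {"urn:c"}, {"urn:d"}, {"urn:e"}}
	sink := &flakySink{failuresLeft: 1}
	if err := sendInBatches(records, 2, 3, sink); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(sink.received), "batches delivered") // 3 batches delivered
}
```

The sketch retries each batch on its own. A transient failure therefore resends only the batch that failed, not records that were already delivered.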
+
+## Key Directories
+
+```
+models/          Core data model (Record wrapping Entity + Edges)
+plugins/
+  extractors/    Source plugins (one dir per source)
+  processors/    Transform plugins
+  sinks/         Destination plugins (compass, kafka, file, etc.)
+agent/           Pipeline orchestration
+recipe/          Recipe parsing and validation
+cmd/             CLI commands (run, lint, list, info, gen)
+```
+
+## Data Model
+
+**Entity** (`meteorv1beta1.Entity`):
+- `urn` - Unique resource name
+- `type` - Entity type (table, dashboard, topic, job, user, bucket, application, model, etc.)
+- `name` - Human-readable name
+- `description` - Description
+- `source` - Source system (e.g. bigquery, postgres, kafka)
+- `properties` - Flat key-value map (structpb.Struct) holding all type-specific metadata
+
+**Edge** (`meteorv1beta1.Edge`):
+- `source_urn` - URN of the source entity
+- `target_urn` - URN of the target entity
+- `type` - Relationship type (`owned_by`, `lineage`, etc.)
+- `source` - Source system
+- `properties` - Additional metadata
+
+## Compass Integration
+
+The Compass sink (`plugins/sinks/compass/`) sends entities and edges to Compass. Each Record carries one Entity with flat properties, plus the Edges that describe its ownership and lineage.
+
+## Build & Test
+
+```
+go build ./...
+go test ./...
+make lint
+```
+
+## Plan: Align Meteor with Compass v2
+
+See `.claude/plans/compass-v2-alignment.md` for the implementation plan.
@@ -1,112 +1,108 @@
 # Meteor Metadata Model
 
-We have a set of defined metadata models which define the structure of metadata
-that meteor will yield. To visit the metadata models being used by different
-extractors please visit [here](extractors.md). We are currently using the
-following metadata models:
-
-- [Bucket][proton-bucket]: Used for metadata being extracted from buckets.
-  Buckets are the basic containers in Google cloud services, or Amazon S3, etc.,
-  that are used for data storage, and quite popular because of their features of
-  access management, aggregation of usage and services and ease of
-  configurations. Currently, Meteor provides a metadata extractor for the
-  buckets mentioned [here](extractors.md#bucket)
-
-- [Dashboard][proton-dashboard]: Dashboards are an essential part of data
-  analysis and are used to track, analyze, and visualize. These Dashboard
-  metadata model includes some basic fields like `urn` and `source`, etc., and a
-  list of `Chart`. There are multiple dashboards that are essential for Data
-  Analysis such as metabase, grafana, tableau, etc. Please refer to the list of
-  'Dashboard' extractors meteor currently
-  supports [here](extractors.md#dashboard).
-
-  - [Chart][proton-dashboard]: Charts are included in all the Dashboard and are
-    the result of certain queries in a Dashboard. Information about them
-    includes the information of the query and few similar details.
-
-- [User][proton-user]: This metadata model is used for defining the output of
-  extraction on User accounts. Some of these sources can be GitHub, Workday,
-  Google Suite, LDAP. Please refer to the list of 'User' extractors meteor
-  currently supports [here](extractors.md#user).
-
-- [Table][proton-table]: This metadata model is being used by extractors based
-  around databases, typically for the ones that store data in tabular format. It
-  contains various fields that include `schema` of the table and other access
-  related information. Please refer to the list of 'Table' extractors meteor
-  currently supports [here](extractors.md#table).
-
-- [Job][proton-job]: A job can represent a scheduled or recurring task that
-  performs some transformation in the data engineering pipeline. Job is a
-  metadata model built for this purpose. Please refer to the list of 'Job'
-  extractors meteor currently supports [here](extractors.md#table).
-
-- [Topic][proton-topic]: A topic represents a virtual group for logical group of
-  messages in message bus like kafka, pubsub, pulsar etc. Please refer to the
-  list of 'Topic' extractors meteor currently
-  supports [here](extractors.md#topic).
-
-- [Machine Learning Feature Table][proton-featuretable]: A Feature Table is a
-  table or view that represents a logical group of time-series feature data as
-  it is found in a data source. Please refer to the list of 'Feature Table'
-  extractors meteor currently
-  supports [here](extractors.md#machine-learning-feature-table).
-
-- [Application][proton-application]: An application represents a service that
-  typically communicates over well-defined APIs. Please refer to the list of '
-  Application' extractors meteor currently
-  supports [here](extractors.md#application).
-
-- [Machine Learning Model][proton-model]: A Model represents a Data Science
-  Model commonly used for Machine Learning(ML). Models are algorithms trained on
-  data to find patterns or make predictions. Models typically consume ML
-  features to generate a meaningful output. Please refer to the list of 'Model'
-  extractors meteor currently
-  supports [here](extractors.md#machine-learning-model).
-
-`Proto` has been used to define these metadata models. To check their
-implementation please refer [here][proton-assets].
+Meteor uses an **Entity + Edge** model to represent metadata. Each extractor emits one or more **Records**, where each Record contains an **Entity** and zero or more **Edges**.
 
-## Usage
+## Entity
+
+An Entity represents a metadata resource (table, dashboard, topic, job, user, etc.). All entity types share a single flat structure:
+
+| Field         | Type                    | Description                                             |
+|:--------------|:------------------------|:--------------------------------------------------------|
+| `urn`         | `string`                | Unique resource name. Format: `urn:{source}:{scope}:{type}:{name}` |
+| `type`        | `string`                | Entity type: `table`, `dashboard`, `topic`, `job`, `user`, `bucket`, `application`, `model`, `feature_table`, `metric`, `experiment`, `group` |
+| `name`        | `string`                | Human-readable name                                     |
+| `description` | `string`                | Description of the entity                               |
+| `source`      | `string`                | Source system (e.g. `bigquery`, `postgres`, `kafka`)    |
+| `properties`  | `structpb.Struct`       | Flat key-value map holding all type-specific metadata (schema, columns, charts, config, labels, etc.) |
+
+There are no separate typed schemas (e.g. no `Table`, `Dashboard`, or `Bucket` proto types). All type-specific metadata is stored as key-value pairs in `properties`; values may themselves be nested, as in the `schema.columns` example under Usage.
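A URN in the `urn:{source}:{scope}:{type}:{name}` format can be built and split with the standard library. `BuildURN` and `ParseURN` below are hypothetical helpers, not part of Meteor. The parser splits into at most five parts so that a name containing colons stays intact, such as a BigQuery name of the form `project:dataset.table`. It assumes `source`, `scope`, and `type` never contain colons.

```go
package main

import (
	"fmt"
	"strings"
)

// URN holds the components of urn:{source}:{scope}:{type}:{name}.
type URN struct {
	Source, Scope, Type, Name string
}

// BuildURN formats the components using the documented layout.
func BuildURN(source, scope, typ, name string) string {
	return fmt.Sprintf("urn:%s:%s:%s:%s", source, scope, typ, name)
}

// ParseURN splits a URN into its components. SplitN with n=5 keeps any
// colons inside the final name segment intact.
func ParseURN(s string) (URN, error) {
	parts := strings.SplitN(s, ":", 5)
	if len(parts) != 5 || parts[0] != "urn" {
		return URN{}, fmt.Errorf("invalid urn %q", s)
	}
	return URN{Source: parts[1], Scope: parts[2], Type: parts[3], Name: parts[4]}, nil
}

func main() {
	urn := BuildURN("bigquery", "bq-raw-internal", "table", "bq-raw-internal:dagstream.demand")
	fmt.Println(urn) // urn:bigquery:bq-raw-internal:table:bq-raw-internal:dagstream.demand

	parsed, err := ParseURN(urn)
	if err != nil {
		panic(err)
	}
	fmt.Println(parsed.Type, parsed.Name) // table bq-raw-internal:dagstream.demand
}
```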
+
+## Edge
+
+An Edge represents a relationship between two entities (ownership, lineage, etc.):
+
+| Field         | Type              | Description                                             |
+|:--------------|:------------------|:--------------------------------------------------------|
+| `source_urn`  | `string`          | URN of the source entity                                |
+| `target_urn`  | `string`          | URN of the target entity                                |
+| `type`        | `string`          | Relationship type: `owned_by`, `lineage`, etc.          |
+| `source`      | `string`          | Source system that reported this relationship            |
+| `properties`  | `structpb.Struct` | Additional metadata about the relationship              |
+
+### Relationship Types
+
+- **`owned_by`**: Indicates ownership. Replaces the old `owners` field.
+- **`lineage`**: Indicates data flow (upstream/downstream). Replaces the old `lineage.upstreams` and `lineage.downstreams` fields.
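Code that relied on the old `lineage.upstreams` and `lineage.downstreams` lists can rebuild them from the edge list. This sketch assumes a `lineage` edge points in the direction of data flow, so its `source_urn` is upstream of its `target_urn`. `Edge` and `Lineage` are hypothetical stand-ins, and the URNs are illustrative.

```go
package main

import "fmt"

// Edge is a hypothetical, trimmed-down stand-in for the Edge proto.
type Edge struct {
	SourceURN, TargetURN, Type string
}

// Lineage returns the upstream and downstream URNs of urn, assuming lineage
// edges point from upstream (source) to downstream (target).
func Lineage(urn string, edges []Edge) (upstreams, downstreams []string) {
	for _, e := range edges {
		if e.Type != "lineage" {
			continue // skip owned_by and other relationship types
		}
		if e.TargetURN == urn {
			upstreams = append(upstreams, e.SourceURN)
		} else if e.SourceURN == urn {
			downstreams = append(downstreams, e.TargetURN)
		}
	}
	return upstreams, downstreams
}

func main() {
	job := "urn:scheduler:prod:job:daily_orders"
	edges := []Edge{
		{SourceURN: "urn:kafka:k1:topic:orders", TargetURN: job, Type: "lineage"},
		{SourceURN: job, TargetURN: "urn:bigquery:p:table:p:ds.orders", Type: "lineage"},
		{SourceURN: job, TargetURN: "urn:github:org:user:alice", Type: "owned_by"},
	}
	up, down := Lineage(job, edges)
	fmt.Println(up)   // [urn:kafka:k1:topic:orders]
	fmt.Println(down) // [urn:bigquery:p:table:p:ds.orders]
}
```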
 
-[//]: # "@formatter:off"
+## Record
+
+A Record is the unit of data flowing through the Meteor pipeline. It wraps an Entity and its associated Edges:
+
+- `record.Entity()` returns the Entity.
+- `record.Edges()` returns the list of Edges.
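The sketch below shows a minimal Record-like wrapper and a toy processor that enriches the entity in-flight. `Record`, `NewRecord`, `addLabel`, and the trimmed `Entity`/`Edge` types are hypothetical local stand-ins that mirror the accessors described above. Meteor's real `Record` and processors may differ.

```go
package main

import "fmt"

// Entity and Edge are hypothetical, trimmed-down stand-ins for the protos.
type Entity struct {
	URN        string
	Type       string
	Properties map[string]any
}

type Edge struct {
	SourceURN, TargetURN, Type string
}

// Record wraps one Entity and its Edges, exposing them through accessors.
type Record struct {
	entity *Entity
	edges  []*Edge
}

func NewRecord(entity *Entity, edges []*Edge) Record {
	return Record{entity: entity, edges: edges}
}

func (r Record) Entity() *Entity { return r.entity }
func (r Record) Edges() []*Edge  { return r.edges }

// addLabel is a toy processor: it adds a label to the entity's properties.
func addLabel(r Record, key, value string) Record {
	props := r.Entity().Properties
	if props == nil {
		props = map[string]any{}
		r.Entity().Properties = props
	}
	labels, _ := props["labels"].(map[string]string)
	if labels == nil {
		labels = map[string]string{}
		props["labels"] = labels
	}
	labels[key] = value
	return r
}

func main() {
	urn := "urn:postgres:mydb:table:mydb.orders"
	rec := NewRecord(
		&Entity{URN: urn, Type: "table"},
		[]*Edge{{SourceURN: urn, TargetURN: "urn:github:org:user:alice", Type: "owned_by"}},
	)
	rec = addLabel(rec, "team", "payments")
	fmt.Println(rec.Entity().Properties["labels"], len(rec.Edges())) // map[team:payments] 1
}
```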
+
+## Supported Entity Types
+
+- **bucket**: Cloud storage containers (GCS, S3, etc.)
+- **dashboard**: Data visualization dashboards (Metabase, Grafana, Tableau, etc.)
+- **table**: Database tables and views (BigQuery, Postgres, MySQL, etc.)
+- **topic**: Message bus topics (Kafka, Pub/Sub, Pulsar, etc.)
+- **job**: Scheduled/recurring data transformation tasks
+- **user**: User accounts (GitHub, LDAP, Google Suite, etc.)
+- **application**: Services communicating over APIs
+- **model**: Machine learning models
+- **feature_table**: ML feature tables
+- **metric**: Metric definitions
+- **experiment**: A/B experiments
+- **group**: User groups
+
+To see which extractors emit which entity types, see the [extractors reference](extractors.md).
+
+## Usage
 
 ```golang
-import(
-    assetsv1beta1 "github.com/raystack/meteor/models/raystack/assets/v1beta1"
-    "github.com/raystack/meteor/models/raystack/assets/facets/v1beta1"
+package main
+
+import (
+    "github.com/raystack/meteor/models"
+    meteorv1beta1 "github.com/raystack/proton/meteor/v1beta1"
+    "google.golang.org/protobuf/types/known/structpb"
 )
 
-func main(){
-    // result is a var of data type of assetsv1beta1.Table one of our metadata model
-    result := &assetsv1beta1.Table{
-        // assigining value to metadata model
-        Urn:  fmt.Sprintf("%s.%s", dbName, tableName),
-        Name: tableName,
+func main() {
+    // Build the type-specific properties: here, a table schema with one column
+    props, err := structpb.NewStruct(map[string]interface{}{
+        "schema": map[string]interface{}{
+            "columns": []interface{}{
+                map[string]interface{}{
+                    "name":        "column_name",
+                    "data_type":   "varchar",
+                    "is_nullable": true,
+                    "length":      256,
+                },
+            },
+        },
+    })
+    if err != nil {
+        panic(err)
+    }
+
+    // Create an Entity
+    entity := &meteorv1beta1.Entity{
+        Urn:         "urn:postgres:mydb:table:mydb.my_table",
+        Type:        "table",
+        Name:        "my_table",
+        Source:      "postgres",
+        Properties:  props,
     }
 
-    // using column facet to add metadata info of schema
-
-    var columns []*facetsv1beta1.Column
-    columns = append(columns, &facetsv1beta1.Column{
-            Name:       "column_name",
-            DataType:   "varchar",
-            IsNullable: true,
-            Length:     256,
-        })
-    result.Schema = &facetsv1beta1.Columns{
-        Columns: columns,
+    // Create relationships as Edges (here, a single ownership edge)
+    edges := []*meteorv1beta1.Edge{
+        {
+            SourceUrn: "urn:postgres:mydb:table:mydb.my_table",
+            TargetUrn: "urn:user:myorg:user:alice",
+            Type:      "owned_by",
+            Source:    "postgres",
+        },
     }
+
+    // Wrap in a Record for the pipeline
+    record := models.NewRecord(entity, edges)
+    _ = record
 }
 ```
-
-[//]: # "@formatter:on"
-[proton-bucket]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/bucket.proto
-[proton-dashboard]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/dashboard.proto
-[proton-user]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/user.proto
-[proton-table]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/table.proto
-[proton-job]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/job.proto
-[proton-topic]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/topic.proto
-[proton-featuretable]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/feature_table.proto
-[proton-application]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/application.proto
-[proton-model]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2/model.proto
-[proton-assets]: https://github.com/raystack/proton/tree/main/raystack/assets/v1beta2
@@ -33,11 +33,11 @@ description: "string"
 url: "string"
 version: "string"
 inputs: # OPTIONAL
-  # Format: "urn:{service}:{scope}:{type}:{name}"
+  # Format: "urn:{source}:{scope}:{type}:{name}"
   - urn:bigquery:bq-raw-internal:table:bq-raw-internal:dagstream.production_feast09_s2id13_30min_demand
   - urn:kafka:int-dagstream-kafka.yonkou.io:topic:staging_feast09_s2id13_30min_demand
 outputs: # OPTIONAL
-  # Format: "urn:{service}:{scope}:{type}:{name}"
+  # Format: "urn:{source}:{scope}:{type}:{name}"
   - urn:kafka:1-my-kafka.com:topic:staging_feast09_mixed_granularity_demand_forecast_3es
 create_time: "2006-01-02T15:04:05Z"
 update_time: "2006-01-02T15:04:05Z"
@@ -62,34 +62,34 @@ following env vars are utilised for it:
 
 ## Outputs
 
-The application is mapped to an [`Asset`][proton-asset] with model specific
-metadata stored using [`Application`][proton-application]. Please refer the
-proto definitions for more information.
-
-| Field                       | Value                                                         | Sample Value                                                                   |
-| :-------------------------- | :------------------------------------------------------------ | :----------------------------------------------------------------------------- |
-| `resource.urn`              | `urn:application_yaml:{scope}:application:{application.name}` | `urn:application_yaml:integration:application:order-manager`                   |
-| `resource.name`             | `{application.name}`                                          | `order-manager`                                                                |
-| `resource.service`          | `application_yaml`                                            | `application_yaml`                                                             |
-| `resource.type`             | `application`                                                 | `application`                                                                  |
-| `resource.url`              | `{application.url}`                                           | `https://github.com/mycompany/order-manager`                                   |
-| `resource.description`      | `{application.description`                                    | `Order-Manager is the order management system for MyCompany`                   |
-| `application_id`            | `application.id`                                              | `0adf3214-676c-4a74-ab37-9d4a4b8ade0e`                                         |
-| `version`                   | `application.version`                                         | `d6ec883`                                                                      |
-| `create_time`               | `{application.create_time}`                                   | `2022-08-08T03:17:54Z`                                                         |
-| `update_time`               | `{application.update_time}`                                   | `2022-08-08T03:57:54Z`                                                         |
-| `ownership.owners[0].urn`   | `{application.team.id}`                                       | `9ebcc2f8-5894-47c6-83a9-160b7eaa3f6b`                                         |
-| `ownership.owners[0].name`  | `{application.team.name}`                                     | `Search`                                                                       |
-| `ownership.owners[0].email` | `{application.team.email}`                                    | `search@mycompany.com`                                                         |
-| `lineage.upstreams[].urn`   | `{application.inputs[]}`                                      | `urn:kafka:int-kafka.yonkou.io:topic:staging_30min_demand`                     |
-| `lineage.downstreams[].urn` | `{application.outputs[]}`                                     | `urn:bigquery:bq-internal:table:bq-internal:dagstream.production_30min_demand` |
-| `resource.labels`           | `map[string]string`                                           | `{"team": "Booking Experience"}`                                               |
+The extractor emits a Record containing an Entity and Edges.
+
+### Entity
+
+| Field                    | Value                                                         | Sample Value                                                 |
+| :----------------------- | :------------------------------------------------------------ | :----------------------------------------------------------- |
+| `urn`                    | `urn:application_yaml:{scope}:application:{application.name}` | `urn:application_yaml:integration:application:order-manager` |
+| `name`                   | `{application.name}`                                          | `order-manager`                                              |
+| `source`                 | `application_yaml`                                            | `application_yaml`                                           |
+| `type`                   | `application`                                                 | `application`                                                |
+| `description`            | `{application.description}`                                   | `Order-Manager is the order management system for MyCompany` |
+| `properties.url`         | `{application.url}`                                           | `https://github.com/mycompany/order-manager`                 |
+| `properties.id`          | `{application.id}`                                            | `0adf3214-676c-4a74-ab37-9d4a4b8ade0e`                       |
+| `properties.version`     | `{application.version}`                                       | `d6ec883`                                                    |
+| `properties.create_time` | `{application.create_time}`                                   | `2022-08-08T03:17:54Z`                                       |
+| `properties.update_time` | `{application.update_time}`                                   | `2022-08-08T03:57:54Z`                                       |
+| `properties.labels`      | `map[string]string`                                           | `{"team": "Booking Experience"}`                             |
+
+### Edges
+
+| Edge Type   | Description                             | Example                                                                            |
+|:------------|:----------------------------------------|:-----------------------------------------------------------------------------------|
+| `owned_by`  | Team ownership from `application.team`  | `source_urn: <app_urn>`, `target_urn: {team.id}`, `properties: {name, email}`      |
+| `lineage`   | Upstream from `application.inputs[]`    | `source_urn: {input_urn}`, `target_urn: <app_urn>`, `type: lineage`                |
+| `lineage`   | Downstream from `application.outputs[]` | `source_urn: <app_urn>`, `target_urn: {output_urn}`, `type: lineage`               |
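The mapping in the two tables above can be sketched in Go. `Application`, `Entity`, `Edge`, and `toRecord` are hypothetical stand-ins. YAML decoding, `create_time`, and `update_time` are omitted, and plain maps replace `structpb.Struct`. Setting each edge's `source` to `application_yaml` is an assumption. The edge directions follow the table: ownership points from the application to the team, inputs point into the application, and outputs point out of it.

```go
package main

import "fmt"

// Application holds the subset of application YAML fields used below (hypothetical).
type Application struct {
	ID, Name, Description, URL, Version string
	TeamID, TeamName, TeamEmail         string
	Inputs, Outputs                     []string
	Labels                              map[string]string
}

type Entity struct {
	URN, Type, Name, Description, Source string
	Properties                           map[string]any
}

type Edge struct {
	SourceURN, TargetURN, Type, Source string
	Properties                         map[string]any
}

const source = "application_yaml"

// toRecord maps an application to an Entity plus owned_by and lineage Edges.
func toRecord(scope string, app Application) (Entity, []Edge) {
	urn := fmt.Sprintf("urn:%s:%s:application:%s", source, scope, app.Name)
	entity := Entity{
		URN: urn, Type: "application", Name: app.Name,
		Description: app.Description, Source: source,
		Properties: map[string]any{
			"url": app.URL, "id": app.ID, "version": app.Version, "labels": app.Labels,
		},
	}

	var edges []Edge
	if app.TeamID != "" { // ownership: application -> team
		edges = append(edges, Edge{
			SourceURN: urn, TargetURN: app.TeamID, Type: "owned_by", Source: source,
			Properties: map[string]any{"name": app.TeamName, "email": app.TeamEmail},
		})
	}
	for _, in := range app.Inputs { // upstream: input -> application
		edges = append(edges, Edge{SourceURN: in, TargetURN: urn, Type: "lineage", Source: source})
	}
	for _, out := range app.Outputs { // downstream: application -> output
		edges = append(edges, Edge{SourceURN: urn, TargetURN: out, Type: "lineage", Source: source})
	}
	return entity, edges
}

func main() {
	entity, edges := toRecord("integration", Application{
		Name: "order-manager", TeamID: "9ebcc2f8-5894-47c6-83a9-160b7eaa3f6b", TeamName: "Search",
		Inputs:  []string{"urn:kafka:int-kafka.yonkou.io:topic:staging_30min_demand"},
		Outputs: []string{"urn:bigquery:bq-internal:table:bq-internal:dagstream.production_30min_demand"},
	})
	fmt.Println(entity.URN) // urn:application_yaml:integration:application:order-manager
	fmt.Println(len(edges)) // 3
}
```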
 
 ## Contributing
 
 Refer to
 the [contribution guidelines](../../../docs/docs/contribute/guide.md#adding-a-new-extractor)
 for information on contributing to this module.
-
-[proton-asset]: https://github.com/raystack/proton/blob/fabbde8/raystack/assets/v1beta2/asset.proto#L14
-[proton-application]: https://github.com/raystack/proton/blob/fabbde8/raystack/assets/v1beta2/application.proto#L11