I'm working on dataset arithmetic operations with our internal representation, which uses dataforge.DataFrame.
The current arithmetic test suite uses the following structure:
```js
{
  type,
  columns,
  resolve
}
```
(see https://github.com/InseeFr/VTL-Tools/blob/master/src/engine/tests/interpretors/arithmetic.spec.js#L14)
The idea here is to store the type and role of each column, for example:
```js
{
  Id_1: { type: VtlParser.STRING, role: VtlParser.DIMENSION },
  [...]
}
```
This structure helps when performing arithmetic operations (e.g. when summing two dataframes, we only want to apply the operator to columns with a NUMBER type and a MEASURE role).
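To make the idea concrete, here is a minimal sketch of such a column-filtered sum in plain JavaScript. The `sumRows` helper is hypothetical (not part of VTL-Tools), and the string constants stand in for the `VtlParser` values:

```javascript
// Hypothetical helper: given a columns metadata map, apply addition only to
// NUMBER/MEASURE columns; other columns (e.g. DIMENSION identifiers) are
// carried over from the first row unchanged.
const sumRows = (columns, rowA, rowB) => {
  const out = {};
  for (const [name, meta] of Object.entries(columns)) {
    out[name] =
      meta.type === "NUMBER" && meta.role === "MEASURE"
        ? rowA[name] + rowB[name]
        : rowA[name]; // identifier columns are not summed
  }
  return out;
};

const columns = {
  Id_1: { type: "STRING", role: "DIMENSION" },
  Me_1: { type: "NUMBER", role: "MEASURE" },
};

const result = sumRows(columns, { Id_1: "A", Me_1: 2 }, { Id_1: "A", Me_1: 3 });
// result: { Id_1: "A", Me_1: 5 }
```

This is why the metadata has to travel with the dataset: without the type and role of each column, the operator cannot decide which columns to combine.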
On the other hand, our current implementation of a dataset is simpler:
(see https://github.com/InseeFr/VTL-Tools/blob/master/src/engine/visitors/Variable.js#L52)
In this case, we return a generic shape for a variable, providing only the type and the resolve function. It then seems impractical to add members to this structure depending on the type of the parsed variable (e.g. adding a columns field for a dataset).
But if we keep this simple structure, how do we pass column types and roles at runtime?
Here is a third option: carrying this "metadata" information dataforge style.
Indeed, the dataforge DataFrame can be provided with information about the columns, for example:
```js
const df = new dataForge.DataFrame({
  columnNames: ["Col1", "Col2"],
  columns: {
    Col1: [{ type: "NUMBER", role: "MEASURE" }]
  },
  rows: [
    [1, 'hello'],
    [5, 'computer'],
    [10, 'good day']
  ]
});
```
It looks like the cleanest solution. However, further parsing of the dataset object provided by clients will be necessary in order to build this columns object.
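That client-side parsing could look roughly like this. Note that the shape of the client payload (`dataStructure` with `{ name, type, role }` entries) is an assumption for illustration; the real input format may differ:

```javascript
// Hypothetical client payload describing each column's name, type and role.
const clientDataset = {
  dataStructure: [
    { name: "Id_1", type: "STRING", role: "DIMENSION" },
    { name: "Me_1", type: "NUMBER", role: "MEASURE" },
  ],
};

// Fold the array of column descriptors into the columns metadata object
// keyed by column name, as used in the examples above.
const columns = clientDataset.dataStructure.reduce(
  (acc, { name, type, role }) => ({ ...acc, [name]: { type, role } }),
  {}
);
// columns: { Id_1: { type: "STRING", role: "DIMENSION" },
//            Me_1: { type: "NUMBER", role: "MEASURE" } }
```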
Does this approach look right to you?