Columns information for a dataset implementation #53

@romaintailhurat

I'm working on arithmetic operations on datasets, using our internal representation based on dataForge.DataFrame.

The current arithmetic test suite uses the following structure:

{
    type,
    columns,
    resolve
}

(see https://github.com/InseeFr/VTL-Tools/blob/master/src/engine/tests/interpretors/arithmetic.spec.js#L14)

The idea here is to store the type and role of each column, for example:

{
    Id_1: { type: VtlParser.STRING, role: VtlParser.DIMENSION },
[...]
}

This structure helps when doing arithmetic operations (e.g. when summing two dataframes, we only want to apply the operator to columns that have a NUMBER type and a MEASURE role).

On the other hand, our current implementation of a dataset is simpler:
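As a minimal sketch of how such a columns descriptor could drive a sum, the snippet below filters on type and role before applying the operator. The string constants stand in for the `VtlParser` constants, and `numericMeasures`, `sumRows`, `Me_1` and `Me_2` are hypothetical names used only for illustration. Rows are matched by position to keep the example short.

```javascript
// Hypothetical stand-ins for VtlParser.NUMBER, VtlParser.STRING, etc.
const NUMBER = "NUMBER";
const STRING = "STRING";
const MEASURE = "MEASURE";
const DIMENSION = "DIMENSION";

// Columns descriptor in the shape used by the test suite.
const columns = {
  Id_1: { type: STRING, role: DIMENSION },
  Me_1: { type: NUMBER, role: MEASURE },
  Me_2: { type: NUMBER, role: MEASURE },
};

// Names of the columns an arithmetic operator should be applied to.
function numericMeasures(columns) {
  return Object.keys(columns).filter(
    (name) => columns[name].type === NUMBER && columns[name].role === MEASURE
  );
}

// Sum two row sets; rows are paired by position (a simplification).
function sumRows(left, right, columns) {
  const targets = numericMeasures(columns);
  return left.map((row, i) => {
    const result = { ...row }; // non-measure columns are copied as-is
    for (const name of targets) {
      result[name] = row[name] + right[i][name];
    }
    return result;
  });
}

const a = [{ Id_1: "A", Me_1: 1, Me_2: 10 }];
const b = [{ Id_1: "A", Me_1: 2, Me_2: 20 }];
const summed = sumRows(a, b, columns);
```

Here the `Id_1` column is left untouched because it is neither a NUMBER nor a MEASURE.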

{
    type,
    resolve
}

(see https://github.com/InseeFr/VTL-Tools/blob/master/src/engine/visitors/Variable.js#L52)

In this case, we return a generic shape for a variable, providing only the type and the resolve function. It seems impractical to then add members to this structure depending on the type of the parsed variable (e.g. adding a columns field for a dataset).

But if we keep this simple structure, how do we pass column types and roles at runtime?

Here is a third way: using column "metadata", dataForge style.

Indeed, the dataForge DataFrame can be provided with information about its columns, for example:

const df = new dataForge.DataFrame({
  columnNames: ["Col1","Col2"],
  columns: {
    Col1: [{type: "NUMBER", role: "MEASURE"}]
  },
  rows: [
    [1, 'hello'],
    [5, 'computer'],
    [10, 'good day']
  ]
});

This looks like the cleaner solution. However, the dataset object provided by clients will need further parsing in order to build this columns object.

Do you agree with this approach?
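The parsing step mentioned above could look roughly like the following sketch. The shape of the client dataset is not defined in this issue, so the `structure` and `rows` fields, as well as the `toColumnsMetadata` helper, are assumptions made purely for illustration.

```javascript
// Hypothetical client-provided dataset: the real shape may differ.
const clientDataset = {
  structure: [
    { name: "Id_1", type: "STRING", role: "DIMENSION" },
    { name: "Me_1", type: "NUMBER", role: "MEASURE" },
  ],
  rows: [
    ["A", 1],
    ["B", 5],
  ],
};

// Build the column names and a name -> { type, role } map
// from the (assumed) structure description.
function toColumnsMetadata(dataset) {
  const columnNames = [];
  const columns = {};
  for (const { name, type, role } of dataset.structure) {
    columnNames.push(name);
    columns[name] = { type, role };
  }
  return { columnNames, columns };
}

const metadata = toColumnsMetadata(clientDataset);
```

The resulting `columnNames` and `columns` values could then be handed to whatever builds the internal dataframe, alongside the rows.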

Labels: design (discussing or reviewing design decisions)
