I'm working on dataset arithmetic operations with our internal representation, which uses dataforge.DataFrame.
The current arithmetic test suite uses the following structure:
```js
{
  type,
  columns,
  resolve
}
```
(see https://github.com/InseeFr/VTL-Tools/blob/master/src/engine/tests/interpretors/arithmetic.spec.js#L14)
The idea here is to store the type and role of each column, for example:
```js
{
  Id_1: { type: VtlParser.STRING, role: VtlParser.DIMENSION },
  [...]
}
```
This structure helps when performing arithmetic operations (e.g. when summing two dataframes, we only want to apply the operator to columns with a NUMBER type and a MEASURE role).
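To make the idea concrete, here is a minimal sketch of such a column-filtered sum in plain JavaScript. The `sumRows` helper is hypothetical (not part of VTL-Tools), and the string constants stand in for the `VtlParser` values:

```javascript
// Hypothetical helper: given a columns metadata map, apply addition only to
// NUMBER/MEASURE columns; other columns (e.g. DIMENSION identifiers) are
// carried over from the first row unchanged.
const sumRows = (columns, rowA, rowB) => {
  const out = {};
  for (const [name, meta] of Object.entries(columns)) {
    out[name] =
      meta.type === "NUMBER" && meta.role === "MEASURE"
        ? rowA[name] + rowB[name]
        : rowA[name]; // identifier columns are not summed
  }
  return out;
};

const columns = {
  Id_1: { type: "STRING", role: "DIMENSION" },
  Me_1: { type: "NUMBER", role: "MEASURE" },
};

const result = sumRows(columns, { Id_1: "A", Me_1: 2 }, { Id_1: "A", Me_1: 3 });
// result: { Id_1: "A", Me_1: 5 }
```

This is why the metadata has to travel with the dataset: without the type and role of each column, the operator cannot decide which columns to combine.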
On the other hand, our current implementation of a dataset is simpler:
(see https://github.com/InseeFr/VTL-Tools/blob/master/src/engine/visitors/Variable.js#L52)
In this case, we return a generic shape for a variable, providing only the type and the resolve function. It then seems impractical to add members to this structure depending on the type of the parsed variable (e.g. adding a columns field for a dataset).
But if we keep this simple structure, how do we pass column types and roles at runtime?
Here is a third option: carrying this "metadata" information dataforge style.
Indeed, the dataforge DataFrame can be provided with information about the columns, for example:
```js
const df = new dataForge.DataFrame({
  columnNames: ["Col1", "Col2"],
  columns: {
    Col1: [{ type: "NUMBER", role: "MEASURE" }]
  },
  rows: [
    [1, 'hello'],
    [5, 'computer'],
    [10, 'good day']
  ]
});
```
It looks like the cleanest solution. However, further parsing of the dataset object provided by clients will be necessary in order to build this columns object.
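That client-side parsing could look roughly like this. Note that the shape of the client payload (`dataStructure` with `{ name, type, role }` entries) is an assumption for illustration; the real input format may differ:

```javascript
// Hypothetical client payload describing each column's name, type and role.
const clientDataset = {
  dataStructure: [
    { name: "Id_1", type: "STRING", role: "DIMENSION" },
    { name: "Me_1", type: "NUMBER", role: "MEASURE" },
  ],
};

// Fold the array of column descriptors into the columns metadata object
// keyed by column name, as used in the examples above.
const columns = clientDataset.dataStructure.reduce(
  (acc, { name, type, role }) => ({ ...acc, [name]: { type, role } }),
  {}
);
// columns: { Id_1: { type: "STRING", role: "DIMENSION" },
//            Me_1: { type: "NUMBER", role: "MEASURE" } }
```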
Does this approach look right to you?