
516 dynamic tables and columns #718

Open
Simon-Will wants to merge 34 commits into OpenEnergyPlatform:develop from Simon-Will:516-dynamic-tables-and-columns

Conversation

@Simon-Will Simon-Will commented Jan 24, 2026

A first attempt at creating tables based on the MaStR XSD files.

Checklist until this is really ready:

  • Download current documentation and create database model from it
  • Use new database model for all the insertion code
  • Use fallback XSD for when using the current documentation fails for some reason
  • Implement CSV export
    • I radically simplified the existing CSV export. It was pretty complex, joined several tables and backfilled the basic units table. I found that a bit much for an export. There's probably a way to make it work in much the same way as it used to work, but I frankly didn't want to spend the time to fully understand all of what's going on there. Let's talk about it!
  • Implement translation feature
  • Give the user an easy way to use the mastr_table_to_db_model returned from Mastr.generate_data_model, e.g. by adding a function that generates a Python code snippet with the SQLAlchemy models/tables.
    • I solved this by having Mastr.generate_data_model return SQLAlchemy core tables, not ORM models. They are easy to just print and a user can then copy them to their code & modify them. They are also the best common ground. After all, some users might not use the ORM.
    • There's also a function format_mastr_table_to_db_table that makes printing easy for the user.
  • Clear up the date situation. I made a couple of changes to utils_download_bulk.py because I found the date handling unnecessarily complex. #697 ("Add interactive download functionality for MaStR date selection #696") changes the same code and adds support for retrieving available XML download dates. If #697 is merged, we have to update this retrieval logic to also retrieve the documentation download dates.
    • I reverted my changes regarding the dates.
    • I added the docs download to the download browsing, etc. Note that some old XSD files are invalid (e.g. 20240101) and cannot be read with XMLSchema. We fall back to the XSD files in the library in that case.
  • Think about how we handle the transition from users' existing databases, especially w.r.t. translated databases and all the renamed columns.
    • My proposal: Since it is extremely difficult to provide an upgrade path from the old table & column names to the way they are done now, I think we should just tell users to adjust their existing queries so they fit the newest open-mastr version. This is fine imo because this whole thing here will trigger a major version bump anyway.
    • Please let's talk about table name translations. I'm using the old names here in this new code, but would rather like to create new names that are closer to the names of the original MaStR export files.
  • Create usage examples
  • Address a couple of open points
    • How to determine primary key of tables? By hardcoding it for MaStR tables we know? Or by checking the available columns and choosing the most likely one based on some hierarchy (e.g. "Id > MastrNummer > EinheitMastrNummer > …") Cf. this code
      • I hard-coded the id column for the tables we know. For unknown future tables, a column "openMastrId" will be inserted. This is also done for the EinheitenAenderungNetzbetreiberzuordnungen table because there is no primary key.
    • How much do we want to adjust/normalize column names? Cf. this code
      • I decided to only do straightforward changes (MaStR -> Mastr, ß -> ss, deleting surrounding whitespace, etc.). No singularization/pluralization of column names à la VerknuepfteEinheitenMaStRNummern -> VerknuepfteEinheit.
    • Do we want to handle the case where adding only some columns to a table fails? Cf. this code
      • I decided not to add special handling for that.
  • Go through the library and remove newly obsolete code
  • Add tests
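
The primary-key decision from the open points above (hard-coded id columns for known tables, an artificial "openMastrId" column otherwise) could be sketched roughly as follows. The mapping `KNOWN_PRIMARY_KEYS`, its entries, and the function name `choose_primary_key` are hypothetical and not taken from this branch; only the column name `openMastrId` and the table `EinheitenAenderungNetzbetreiberzuordnungen` come from the checklist.

```python
from __future__ import annotations

# Hypothetical mapping from MaStR table name to its hard-coded id column.
# The entries are illustrative, not the ones used in this PR.
KNOWN_PRIMARY_KEYS = {
    "EinheitenWind": "EinheitMastrNummer",
    "Marktakteure": "MastrNummer",
}

# Name of the artificial key column mentioned in the checklist.
ARTIFICIAL_PRIMARY_KEY = "openMastrId"


def choose_primary_key(table_name: str, columns: list[str]) -> tuple[str, bool]:
    """Return (primary key column, True if the column has to be added)."""
    known = KNOWN_PRIMARY_KEYS.get(table_name)
    if known is not None and known in columns:
        # Known table: use the hard-coded id column as-is.
        return known, False
    # Unknown table or table without a usable id: add an artificial key.
    return ARTIFICIAL_PRIMARY_KEY, True
```

With this shape, a table that has no natural key simply has no entry in the mapping and falls through to the artificial column.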

Type of change (CHANGELOG.md)

Added

  • Add the new method Mastr.generate_data_model that downloads the newest MaStR documentation and uses the XSD file to build SQLAlchemy models from the contained definitions
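
To illustrate the idea of deriving column definitions from an XSD file, here is a minimal sketch. It uses the standard library's `xml.etree.ElementTree` instead of the XMLSchema library mentioned in the checklist, and the miniature XSD, the type mapping `XSD_TO_SQL_TYPE`, and the function `columns_from_xsd` are invented for illustration. It is not how `Mastr.generate_data_model` is implemented.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Invented miniature XSD for illustration; real MaStR XSD files are larger.
EXAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="EinheitenWind">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="EinheitWind" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="EinheitMastrNummer" type="xs:string" />
              <xs:element name="Bruttoleistung" type="xs:decimal" minOccurs="0" />
              <xs:element name="Inbetriebnahmedatum" type="xs:date" minOccurs="0" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

# Assumed mapping from XSD simple types to SQLAlchemy type names.
XSD_TO_SQL_TYPE = {"xs:string": "String", "xs:decimal": "Float", "xs:date": "Date"}


def columns_from_xsd(xsd_text):
    """Return (column name, SQL type name) pairs for all typed elements."""
    root = ET.fromstring(xsd_text)
    columns = []
    for element in root.iter(f"{XS}element"):
        xsd_type = element.get("type")
        if xsd_type is None:
            continue  # container elements carry no simple type
        # Unknown XSD types fall back to String in this sketch.
        columns.append((element.get("name"), XSD_TO_SQL_TYPE.get(xsd_type, "String")))
    return columns
```

The resulting pairs could then be turned into SQLAlchemy Core columns, with everything except the primary key nullable.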

Updated

  • Update the method Mastr.download with two optional new arguments: mastr_table_to_db_table, with which the user can pass their own database schema, and alter_database_tables, with which the user can prevent open-mastr from issuing any DDL statements.
  • Change CSV export by removing joining tables. The tables are now exported as they are.
  • Change the default names of tables and columns that are created and used for the import.
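
Exporting a table "as it is", without joins, amounts to dumping all rows with a header line. The sketch below shows this with the standard library's `sqlite3` and `csv` modules; the function name `export_table_to_csv` is hypothetical and this is not the export code from this branch.

```python
import csv
import sqlite3


def export_table_to_csv(connection, table_name, out_file):
    """Write all rows of one table to out_file, header first, no joins."""
    # table_name is interpolated into the SQL text, so only pass trusted names.
    cursor = connection.execute(f'SELECT * FROM "{table_name}"')
    writer = csv.writer(out_file)
    # cursor.description holds one tuple per column; item 0 is the column name.
    writer.writerow([description[0] for description in cursor.description])
    writer.writerows(cursor)
```

A caller would open the target file with `newline=""` and pass it as `out_file`, as recommended for the `csv` module.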

Removed

  • Remove the method Mastr.translate. The user can now get English table and column names by passing english=True to the generate_data_model or download method.

Workflow checklist

Automation

Closes #516
Closes #577

PR-Assignee

Reviewer

  • 🐙 Follow the Reviewer Guidelines
  • 🐙 Provide feedback and show sufficient appreciation for the work done

@Simon-Will Simon-Will marked this pull request as draft January 24, 2026 15:40
@Simon-Will Simon-Will force-pushed the 516-dynamic-tables-and-columns branch from c85c1fe to fc5bfec Compare January 24, 2026 15:45
@Simon-Will Simon-Will force-pushed the 516-dynamic-tables-and-columns branch from 8550e9a to f31622c Compare February 3, 2026 18:13
@Simon-Will Simon-Will force-pushed the 516-dynamic-tables-and-columns branch 2 times, most recently from 68eb2ad to 2158b8a Compare February 5, 2026 11:35
@pt-kkraemer
Collaborator

pt-kkraemer commented Feb 9, 2026

On 19.02.2026 a new public version of the Gesamtdatenexport will have a bugfix concerning attributes in the .xsd files:
[image]

This will probably have no effect on what you have done already, right?

@Simon-Will
Author

Interesting, thanks for pointing that out, @pt-kkraemer! It shouldn't make any difference for now because we make all attributes except the primary keys nullable anyway.

But I'll be sure to check out the difference in the XSD files to make sure I understand that point correctly.

As for this whole PR, it's almost ready now as you can see from the checklist. If anyone already has comments on the approach, please let them be heard. I don't think I will make substantial changes to the non-testing code anymore.


# TODO: Should we really mess with the original column names?
# The BNetzA "choice" to sometimes write MaStR and sometimes Mastr is certainly confusing,
# but are we the ones who should change that?
Member

We should not be the ones to change that. But sadly we decided to do it anyway several years ago and now have to live with that decision 😮‍💨

@Simon-Will Simon-Will force-pushed the 516-dynamic-tables-and-columns branch from 01d3f9d to 9b92e57 Compare March 3, 2026 10:14
@Simon-Will
Author

Simon-Will commented Mar 3, 2026

I just added some view functionality. Now I'm running the download & import with the current develop branch and will afterwards run the import again with the current state of this branch here to check if transitioning from an existing version will work. (This takes a while because the download takes a long time on my connection.)

Edit: Looks like it worked. Some column names of course change, but the views that preserve the old table names are in place.
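
For readers unfamiliar with the approach: a view that preserves an old table name can be as simple as the sketch below, shown with `sqlite3`. The function name `create_compatibility_view` and the table names in the usage note are illustrative assumptions; the views created in this branch may be defined differently (for example with explicit column lists).

```python
import sqlite3


def create_compatibility_view(connection, old_name, new_name):
    """Expose the table new_name under its former name old_name via a view."""
    # Identifiers are interpolated into the SQL text, so only pass trusted names.
    connection.execute(
        f'CREATE VIEW IF NOT EXISTS "{old_name}" AS SELECT * FROM "{new_name}"'
    )
```

After calling, say, `create_compatibility_view(connection, "wind_extended", "EinheitenWind")`, existing queries against `wind_extended` keep working, although they see the new column names.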

@FlorianK13
Member

FlorianK13 commented Mar 5, 2026

> I just added some view functionality. Now I'm running the download & import with the current develop branch and will afterwards run the import again with the current state of this branch here to check if transitioning from an existing version will work. (This takes a while because the download takes a long time on my connection.)
>
> Edit: Looks like it worked. Some column names of course change, but the views that preserve the old table names are in place.

So great that this View thing works 🎉 I still found one bug, namely that the views now get the TempID

image

Edit: This is not a bug in Views but it is for wind in general. Somehow in my local branch I have an additional commit from you called "Improve clarity around artificial primary keys" where the primary ID for wind is commented in sqlalchemy_tables.py - I have no idea where this is coming from, but it is not part of this PR so I guess you can ignore this error.

Member

@FlorianK13 FlorianK13 left a comment

Some more comments

dt = datetime.strptime(bulk_date_string, "%Y%m%d")
stichtag_url = (
"https://download.marktstammdatenregister.de/Stichtag/"
"Dokumentation%20MaStR%20Gesamdatenexport%20"
Collaborator

@pt-kkraemer pt-kkraemer Mar 6, 2026

Isn't there a "t" missing in "Gesamtdatenexport"? I could not find a link with this particular spelling on the MaStR site.

Author

Ooh, thanks a lot for catching that!

)
return True
log.info(
f"MaStR XML ZIP file already present but missing the following data: {bulk_data_list}"
Collaborator

@pt-kkraemer pt-kkraemer Mar 6, 2026

A question for my understanding: what happens if only part of the bulk_data_list is already downloaded? Wouldn't this message then claim that the whole bulk_data_list is missing? That could be misleading.

Edit: I tested the following case where I first downloaded "biomass" and then ["biomass","wind"] and in the second run "biomass" got downloaded again:
[image]

Author

True! I think I missed the incremental nature of the download when refactoring this part of the code a bit. 😬

I now changed it up a bit more and it should be correct and incremental now.
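
The incremental behaviour discussed in this thread boils down to downloading only what is requested but not yet present. A minimal sketch, with the hypothetical helper name `missing_bulk_data` (not the function used in this branch):

```python
def missing_bulk_data(requested, already_downloaded):
    """Return the requested data names that still need to be downloaded."""
    present = set(already_downloaded)
    # Keep the order of the request so log messages stay predictable.
    return [name for name in requested if name not in present]
```

For the case tested above, requesting ["biomass", "wind"] after "biomass" has been downloaded leaves only "wind" to fetch, and the log message can list exactly that remainder.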

@Simon-Will
Author

Hi @FlorianK13 and @pt-kkraemer, thanks a lot for your reviews and suggestions! I addressed your comments with the two latest commits. Could you have another look?

@pt-kkraemer
Collaborator

I just had the same error as #723 when using the latest pull from your fork @Simon-Will.
[image]

@FlorianK13 wrote in #723 that this will be fixed, but I don't think it is. Maybe we can talk about this in our next dev meeting?

@Simon-Will
Author

All discussions resolved. I'm waiting for the 0.17 release and will then update this branch before it can be merged. :)


3 participants