
NEW: Dependency config design and content #2812

Open
tech3371 wants to merge 2 commits into IMAP-Science-Operations-Center:dev from tech3371:dependency_config_doc

Conversation

Contributor

@tech3371 tech3371 commented Mar 3, 2026

Change Summary

closes IMAP-Science-Operations-Center/sds-data-manager#1151

Overview

File changes

This contains the final design of the new config file. It includes information such as the filename convention, the new file content, and the required/optional fields with their defaults. The part I need feedback on most is the time range options and the example content.

Testing

- ``p`` - pointing
- ``h`` - hourly
- ``d`` - daily
- ``l`` - last_processed
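A token such as ``3p`` or ``-7d`` could be parsed with a small helper; a minimal sketch, where the ``parse_cadence`` name and the tuple return shape are assumptions rather than part of the config spec:

```python
import re

# Map of single-letter cadence codes to their meanings (from the list above).
CADENCE_CODES = {
    "p": "pointing",
    "h": "hourly",
    "d": "daily",
    "l": "last_processed",
}


def parse_cadence(token: str) -> tuple[int, str]:
    """Parse a token like '3p' or '-7d' into (count, cadence_name).

    A bare code such as 'l' is treated as a count of 1.
    """
    match = re.fullmatch(r"(-?)(\d*)([phdl])", token)
    if match is None:
        raise ValueError(f"unrecognized cadence token: {token!r}")
    sign, digits, code = match.groups()
    count = int(sign + (digits or "1"))
    return count, CADENCE_CODES[code]
```

For example, ``parse_cadence("3p")`` yields ``(3, "pointing")`` and ``parse_cadence("-7d")`` yields ``(-7, "daily")``.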
Contributor Author


Similar to ``last_processed``, we need a "nearest in the past" option.

But Hi wants the nearest 7 pointings irrespective of past or future.

Contributor Author


Options:
``past_nearest``, ``future_nearest``, ``any_nearest``.

or
(7p) - this means any future or past data.
(1n, 0n) - this means get the nearest data from the past.

What do we do if a Hi science file event comes in and we then need to look up a SWE dependency, which is daily?
When the pointing number 9 file comes in, we look up the nearest 7 files, derive a date range from the earliest and latest pointing IDs of those 7 files using the pointing table, and then use that date range to query for SPICE and the other dependencies. If the dependencies are found, the job can be kicked off.
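The lookup described above could be sketched roughly as follows; every name here (``nearest_pointings``, ``date_range``, the in-memory pointing table) is an illustrative assumption, not the SDC implementation:

```python
from datetime import datetime

# Hypothetical sketch: given the pointing number of a newly arrived Hi file,
# find the 7 nearest pointings, then derive a date range from the pointing
# table to use when querying daily dependencies such as SWE or SPICE.


def nearest_pointings(pointing: int, all_pointings: list[int], n: int = 7) -> list[int]:
    """Return the n pointing numbers closest to the given pointing."""
    return sorted(sorted(all_pointings, key=lambda p: abs(p - pointing))[:n])


def date_range(
    pointings: list[int],
    pointing_table: dict[int, tuple[datetime, datetime]],
) -> tuple[datetime, datetime]:
    """Derive (start, end) from the earliest and latest pointing in the set."""
    start, _ = pointing_table[min(pointings)]
    _, end = pointing_table[max(pointings)]
    return start, end
```

With pointing 9 arriving and pointings 1-12 available, ``nearest_pointings(9, ...)`` selects pointings 6 through 12, and the date range spans from the start of pointing 6 to the end of pointing 12.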

Contributor Author


One class each for daily and for pointing. Then instrument classes can inherit from those as needed, depending on whether they are ENA instruments or not.
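A minimal sketch of that layout, with all class and method names assumed for illustration:

```python
# Illustrative sketch of the class layout suggested above: one base class per
# cadence (daily vs. pointing), which instrument-specific dependency checkers
# inherit as needed. None of these names come from the actual codebase.


class DailyDependency:
    """Dependency lookup for products produced on a daily cadence."""

    def query_range(self, event_date):
        # One calendar day: the triggering file's date bounds the query.
        return event_date, event_date


class PointingDependency:
    """Dependency lookup for products produced per pointing."""

    def query_range(self, pointing_id):
        # Would resolve the pointing to start/end times via the pointing table.
        raise NotImplementedError


class HiDependency(PointingDependency):
    """ENA instruments like Hi inherit the pointing-based behavior."""
```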


This step is what determines if an instrument and level is ready for processing, by checking dependencies. For each file that arrives, the system checks to see what the downstream dependencies are -
meaning, what future files need this file in order to complete processing. For example, if a MAG L1A file arrived, this step would determine that the MAG L1B ``mago`` and ``magi`` files are dependent on
This step is what determines if an instrument and level is ready for processing, by checking dependencies.
Contributor

@subagonsouth subagonsouth Mar 4, 2026


Suggested change
This step is what determines if an instrument and level is ready for processing, by checking dependencies.
After indexing, the batch starter lambda is triggered in order to determine what jobs may be ready for processing.

Comment on lines +56 to +58
For each file that arrives, the system checks to see what the downstream dependencies are -
meaning, what future files need this file in order to complete processing. For example, if a MAG L1A
file arrived, this step would determine that the MAG L1B ``mago`` and ``magi`` files are dependent on
Contributor


Suggested change
For each file that arrives, the system checks to see what the downstream dependencies are -
meaning, what future files need this file in order to complete processing. For example, if a MAG L1A
file arrived, this step would determine that the MAG L1B ``mago`` and ``magi`` files are dependent on
For each file that arrives, the system checks to see what jobs may need to be run by looking at the downstream dependencies. For example, if a MAG L1A
file arrived, this step would determine that the MAG L1B ``mago`` and ``magi`` files are dependent on


The status of different files is recorded in the status tracking table. This table records the status of each anticipated output file as "in progress", "complete", or "failed." Through this,
we can track processing for specific files and determine if a file exists quickly.
Then, for each anticipated job, the batch starter process checks to see if all the upstream
Contributor


Suggested change
Then, for each anticipated job, the batch starter process checks to see if all the upstream
Then, for each possible job, the batch starter process checks to see if all the upstream

dependencies are met. Although we know we have one of the upstream dependencies for an
expected job, it's possible that there are other required dependencies that have not yet
arrived. If we are missing any required dependencies, then the system does not kick off the
processing job. When the missing file arrives, it will trigger the same process of checking
Contributor


Suggested change
processing job. When the missing file arrives, it will trigger the same process of checking
processing job. When the missing upstream dependency arrives, it will trigger the same process of checking

Comment on lines +80 to +82
The status of different files is recorded in the status tracking table. This table records
the status of each anticipated output file as "in progress", "complete", or "failed." Through
this, we can track processing for specific files and determine if a file exists quickly.
Contributor


I think this is talking about the ProcessingJob table, right?

Suggested change
The status of different files is recorded in the status tracking table. This table records
the status of each anticipated output file as "in progress", "complete", or "failed." Through
this, we can track processing for specific files and determine if a file exists quickly.
The status of each job is recorded in the status tracking table as "in progress", "complete", or "failed." Through this, we can track processing for specific files and determine if a file exists quickly.

I think one piece that we are missing is checking for upstream dependencies that have jobs that are "in progress". I have added such a check to the Hi Goodtimes special handling. The idea is to avoid race conditions where multiple jobs for the same product get triggered in fast succession.
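Such a guard might look roughly like this; the status strings mirror the tracking-table values discussed above, but the ``should_start`` function and the ``job_status`` callable are assumptions for illustration:

```python
# Hypothetical guard against the race condition described above: before
# starting a job, skip it if the job itself is already running or done, or
# if any upstream dependency is not yet "complete". The job_status lookup
# (job name -> status string or None) is illustrative.


def should_start(job: str, upstream: list[str], job_status) -> bool:
    """Return True only if the job is not already running or done and
    every upstream dependency has completed."""
    if job_status(job) in ("in progress", "complete"):
        return False
    return all(job_status(dep) == "complete" for dep in upstream)
```

A dict's ``get`` method can serve as ``job_status`` in a quick test: a job whose upstream dependency is still "in progress" is skipped rather than triggered twice.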

~~~~~~~~~~~~~~~~~~~~

Primary descriptor can be one of the following:
Upstream Product Name
Contributor


I personally like Descriptor better for this. I think that product name has several possible meanings.

Contributor Author


Cool. Then let's keep descriptor, and I will make the remaining changes.

- For science or ancillary data, the product names are defined by the instrument and SDC.

- For ``spin`` and ``repoint`` data types, ``historical`` is the only valid descriptor.
- For ``spice`` data types, ``historical`` and ``best`` are the valid product names.
Contributor


What about predict? I specify Hi SPICE dependencies individually, for example:

ephemeris_reconstructed, spice, historical, hi, l1c, 45sensor-pset, HARD_NO_TRIGGER, DOWNSTREAM
ephemeris_predicted, spice, best, hi, l1c, 45sensor-pset, HARD_NO_TRIGGER, DOWNSTREAM

Contributor Author


We had the intention of using 'best' at one time, but maybe we didn't enforce it. We can remove that option.

~~~~~~~~~~~~~~~~~~~~

Same as primary_data_type, but for the dependent file.
Kickoff_job (Optional)
Contributor


My nitpick on terminology: It seems like we use trigger more widely.

Suggested change
Kickoff_job (Optional)
Trigger_job (Optional)

- (imap_frames, spice, historical)

(l1b, 45sensor-goodtimes):
- (hi, l1b, 45sensor-de, true, true, (-3p, 3p))
Contributor


One possible way to do this:

(l1b, 45sensor-goodtimes):
      # Entry for getting the 7 nearest pointings
      - (hi, l1b, 45sensor-de, {required: true, trigger: true, nearest: 7p})
      # Entry for getting the past 3 and future 3 pointings, if they exist
      - (hi, l1b, 45sensor-de, {required: false, trigger: true, past: 3p, future: 3p})
      # Entry for getting the 3 nearest available pointings in the past
      - (hi, l1b, 45sensor-de, {required: false, trigger: true, nearest_past: 3p})

Co-authored-by: Tim Plummer <timothy.plummer@lasp.colorado.edu>


Development

Successfully merging this pull request may close these issues.

Feature: New dependency config
