
COO-173 DAG based pipeline engine #4

Open

Yashvi-Sharma wants to merge 6 commits into main from COO-173_dag


@Yashvi-Sharma (Collaborator)

Major upgrades to the pipeline engine:

  • A pipeline config can contain multiple pipelines, each with its own tasks.
  • Lazy execution is configured per pipeline: all tasks in a given pipeline run either lazily or eagerly, and other pipelines can use a different setting.
  • Double DAG order: a DAG is built for the tasks within each pipeline, and one more DAG is built for the dependencies between pipelines.
  • Tasks (or pipelines) in the same generation of the DAG order are executed in parallel.

Added a base class for preprocessing tasks to reduce redundant code, and updated task outputs to conform to a dict structure.
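A minimal sketch of that base-class pattern, assuming tasks operate on array-like image data. Shared input handling and the dict-shaped output live in the base class, and each subclass implements only `process`. `PreprocessingTask`, `SubtractBias`, and the output keys `task`/`params`/`data` are hypothetical, not the PR's actual classes or schema.

```python
from abc import ABC, abstractmethod
from typing import Any

import numpy as np


class PreprocessingTask(ABC):
    """Hypothetical base class: shared input handling plus a uniform dict output."""

    name = "preprocessing_task"

    def __init__(self, **params: Any) -> None:
        self.params = params

    def run(self, data: Any) -> dict[str, Any]:
        # Boilerplate shared by every task lives here once.
        array = np.asarray(data, dtype=float)
        return {
            "task": self.name,
            "params": dict(self.params),
            "data": self.process(array),
        }

    @abstractmethod
    def process(self, data: np.ndarray) -> np.ndarray:
        """Subclasses implement only the task-specific step."""


class SubtractBias(PreprocessingTask):
    # Hypothetical example task: subtract a constant bias level.
    name = "subtract_bias"

    def process(self, data: np.ndarray) -> np.ndarray:
        return data - self.params["bias_level"]
```

With this in place, `SubtractBias(bias_level=1000.0).run([1010, 1020])` returns `{"task": "subtract_bias", "params": {"bias_level": 1000.0}, "data": array([10., 20.])}`.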

Minor updates and fixes to utils, data models, and tasks.


@weatherhead99 left a comment

I think this all looks really nice.

I haven't run anything to test it yet, though. We should probably work on getting an example script ready to go on real DEIMOS data, just so we have something to run.

```yaml
run:
  params:
    input_source: "/Users/yashvi/Desktop/Detector Characterization Tools/DTU_dettest/DTU_singledet_acceptance/PTC/SCI/20250812-101350/*_bias_*.fits"
    identifier_func: tasks.custom.guess_image_type_from_filename_DEIMOS
```

This all looks great so far. I'm a little concerned about the absolute paths here. Hopefully that's just a way for you to get this running, but do we need to think about a way of making these configurations portable between machines?

(One way would be to use environment variables here... shudders.) Another way would be to allow, e.g., Jinja template variables, with a local per-computer config loaded from the command line to supply the substitutions.
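A minimal sketch of the template-variable idea. It uses the standard library's `string.Template` (`${name}` placeholders) as a dependency-free stand-in for Jinja. The `render_config` helper, the `data_root` variable, the YAML nesting and the example values are hypothetical.

```python
from string import Template


def render_config(template_text: str, local_vars: dict[str, str]) -> str:
    """Fill ${name} placeholders in a shared config with per-machine values.

    Template.substitute raises KeyError for any placeholder without a value,
    so a missing local setting fails loudly instead of yielding a bad path.
    """
    return Template(template_text).substitute(local_vars)


# Hypothetical shared config: only the machine-specific prefix is a variable.
SHARED_CONFIG = """\
run:
  params:
    input_source: "${data_root}/PTC/SCI/20250812-101350/*_bias_*.fits"
    identifier_func: tasks.custom.guess_image_type_from_filename_DEIMOS
"""

# Hypothetical per-computer values, e.g. read from a local file named on the command line.
local_vars = {"data_root": "/data/DTU_singledet_acceptance"}
print(render_config(SHARED_CONFIG, local_vars))
```

With Jinja2 itself, the placeholders would be written `{{ data_root }}` and rendered with `render(...)`. By default Jinja2 renders undefined variables as empty strings, so an environment created with `undefined=jinja2.StrictUndefined` would be needed to get the same fail-loudly behaviour.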

```python
elif isinstance(data, xr.DataArray):
    return data.values
else:
    raise TypeError("data must be an xarray.DataArray, or numpy.ndarray")
```

This might be a place to use a match/case statement rather than an if/elif/else chain.

2 participants