diff --git a/docs/source/FAQ/predictor_evaluation_migration.rst b/docs/source/FAQ/predictor_evaluation_migration.rst index 47fb83f35..2a6e0dfd3 100644 --- a/docs/source/FAQ/predictor_evaluation_migration.rst +++ b/docs/source/FAQ/predictor_evaluation_migration.rst @@ -5,7 +5,7 @@ Migrating to Predictor Evaluations Summary ======= -In version 4.0, :py:class:`Predictor Evaluation Workflows ` and :py:class:`Predictor Evaluation Executions ` (collectively, PEWs) will be merged into a single entity called :py:class:`Predictor Evaluations `. The new entity will retain the functionality of its predecessors, while simplyfing interactions with it. And it will support the continuing evolution of the platform. +`Predictor Evaluation Workflows` and `Predictor Evaluation Executions` (collectively, PEWs) have been merged into a single entity called :py:class:`Predictor Evaluations `. The new entity retains the functionality of its predecessors, while simplifying interactions with it. And it will support the continuing evolution of the platform. 
Basic Usage =========== diff --git a/docs/source/data_extraction.rst b/docs/source/data_extraction.rst index 0fc53e6f1..486340991 100644 --- a/docs/source/data_extraction.rst +++ b/docs/source/data_extraction.rst @@ -102,7 +102,7 @@ Assume that the process template is accessible from a known Project, ``project`` table_config = table_config.add_all_ingredients( process_template=LinkByUID('id', '3a308f78-e341-f39c-8076-35a2c88292ad'), - project=project, + team=team, quantity_dimension=IngredientQuantityDimension.MASS ) diff --git a/docs/source/getting_started/ai_modules.rst b/docs/source/getting_started/ai_modules.rst index db48a9e10..ca59c3a03 100644 --- a/docs/source/getting_started/ai_modules.rst +++ b/docs/source/getting_started/ai_modules.rst @@ -41,4 +41,4 @@ They also include a :doc:`Score <../workflows/scores>` which codifies goals of t Predictor Evaluation ^^^^^^^^^^^^^^^^^^^^ -:doc:`Predictor Evaluations <../workflows/predictor_evaluation_workflows>` analyze the quality of a Predictor. +:doc:`Predictor Evaluations <../workflows/predictor_evaluations>` analyze the quality of a Predictor. diff --git a/docs/source/getting_started/basic_functionality.rst b/docs/source/getting_started/basic_functionality.rst index d56a8d4a8..df4b52f48 100644 --- a/docs/source/getting_started/basic_functionality.rst +++ b/docs/source/getting_started/basic_functionality.rst @@ -59,18 +59,18 @@ Similarly, the ``wait_while_executing`` function will wait for a design or predi Checking Status --------------- -After registering an asset, the ``status`` command can be used to obtain a static readout of the state of the asset on the platform (e.g., VALID, INVALID, VALIDATING, SUCCEEDED, FAILED, INPROGRESS). +After registering an asset, the ``status`` command can be used to obtain a static readout of the state of the asset on the platform (e.g., READY, INVALID, VALIDATING, SUCCEEDED, FAILED, INPROGRESS). .. 
code-block:: python sintering_model = sintering_project.predictors.register(sintering_model) sintering_model.status -The ``status_info`` command returns additional details about an asset's status that can be very useful for debugging. +The ``status_detail`` command returns additional details about an asset's status that can be very useful for debugging. .. code-block:: python - sintering_model.status_info + sintering_model.status_detail Reading ------- diff --git a/docs/source/getting_started/code_examples.rst b/docs/source/getting_started/code_examples.rst index 3236cf336..c6c221e95 100644 --- a/docs/source/getting_started/code_examples.rst +++ b/docs/source/getting_started/code_examples.rst @@ -38,17 +38,17 @@ Note that all resources are given descriptive names and summaries. print("My new team has name {} and id {}".format( band_gaps_team.name, band_gaps_team.uid)) - Strehlow_Cook_description = "Band gaps for elemental and binary " \ + strehlow_cook_description = "Band gaps for elemental and binary " \ "semiconductors with phase and temperature of measurement. DOI 10.1063/1.3253115" - Strehlow_Cook_dataset = Dataset(name="Strehlow and Cook", - summary="Strehlow and Cook band gaps", description=Strehlow_Cook_description) - Strehlow_Cook_dataset = band_gaps_team.datasets.register(Strehlow_Cook_dataset) + strehlow_cook_dataset = Dataset(name="Strehlow and Cook", + summary="Strehlow and Cook band gaps", description=strehlow_cook_description) + strehlow_cook_dataset = band_gaps_team.datasets.register(strehlow_cook_dataset) Find an existing Team and Dataset ------------------------------------ Often you will work with existing resources. -The code below retrieves a Team with the name "Copper oxides team" and a dataset with a known unique id that is stored as ``dataset_A_uid``. +The code below retrieves a Team with the name "Copper oxides team" and a dataset with a known unique id that is stored as ``dataset_a_uid``. 
For more information on retrieving resources, see :ref:`Reading Resources `. .. code-block:: python @@ -58,7 +58,7 @@ For more information on retrieving resources, see :ref:`Reading Resources `. @@ -47,11 +46,11 @@ Assume you have a "band gaps team" with known id, ``band_gaps_team_id``, and are band_gaps_team = citrine.teams.get(band_gaps_team_id) # create the Dataset object - Strehlow_Cook_description = "Band gaps for elemental and binary semiconductors with phase and temperature of measurement. DOI 10.1063/1.3253115" - Strehlow_Cook_dataset = Dataset(name="Strehlow and Cook", summary="Strehlow and Cook band gaps", description=Strehlow_Cook_description) + strehlow_cook_description = "Band gaps for elemental and binary semiconductors with phase and temperature of measurement. DOI 10.1063/1.3253115" + strehlow_cook_dataset = Dataset(name="Strehlow and Cook", summary="Strehlow and Cook band gaps", description=strehlow_cook_description) # pass the Dataset object to the registration endpoint - Strehlow_Cook_dataset = band_gaps_team.datasets.register(Strehlow_Cook_dataset) + strehlow_cook_dataset = band_gaps_team.datasets.register(strehlow_cook_dataset) Deleting a Dataset diff --git a/docs/source/getting_started/projects.rst b/docs/source/getting_started/projects.rst index 85cc1fd9c..ecc479bff 100644 --- a/docs/source/getting_started/projects.rst +++ b/docs/source/getting_started/projects.rst @@ -5,7 +5,7 @@ Projects A Project is the basic container for AI Assets on the Citrine Platform, such as GEM Tables, Predictors, Design Spaces, and Design Workflows. Access rights on resources inside a Project are managed, granted, and revoked at the Team level. -Users are individuals using the Citrine Platform, and they are made members of Team. +Users are individuals using the Citrine Platform, and they are made members of a Team. A user who is a member of a Team has access to all of the Projects that the Team has access to. 
Every interaction with every other type of resource is scoped to a single Team. @@ -33,8 +33,15 @@ To retrieve a Project in the team, either find the Project in the list: project_name = "Copper oxides project" all_projects = team.projects.list() - copper_oxides_project = next((project for project in all_projects - if project.name == project_name), None) + project = next((project for project in all_projects if project.name == project_name), None) + +or use the :func:`~citrine.seeding.find_or_create.find_or_create_project` convenience method: + +.. code-block:: python + + from citrine.seeding.find_or_create import find_or_create_project + project_name = "Copper oxides project" + project = find_or_create_project(project_collection=team.projects, project_name=project_name) or get it by unique identifier: diff --git a/docs/source/workflows/code_examples.rst b/docs/source/workflows/code_examples.rst index e3c01cf14..bbe85d8da 100644 --- a/docs/source/workflows/code_examples.rst +++ b/docs/source/workflows/code_examples.rst @@ -79,8 +79,7 @@ This pattern is also extremely useful for performing optimization over complex o outputs = [ final_ph, final_loaf_hydration, - ], - training_data=[training_table] + ] ) shelf_life_calculator = ExpressionPredictor( @@ -104,7 +103,8 @@ This pattern is also extremely useful for performing optimization over complex o dough_hydration_calculator, physical_properties_predictor, shelf_life_calculator - ] + ], + training_data=[training_table] ) .. |Bread Predictor Graph Visualization| image:: bread_predictor_graph_viz.jpg diff --git a/docs/source/workflows/data_sources.rst b/docs/source/workflows/data_sources.rst index 0ee5b9e42..210f1f5d5 100644 --- a/docs/source/workflows/data_sources.rst +++ b/docs/source/workflows/data_sources.rst @@ -19,7 +19,7 @@ The example below assumes that the uuid and the version of the desired GEM Table .. 
code:: python from citrine.informatics.data_sources import GemTableDataSource - from citrine.informatics.predictors import AutoMLPredictor + from citrine.informatics.predictors import AutoMLPredictor, GraphPredictor from citrine.informatics.descriptors import RealDescriptor, CategoricalDescriptor, ChemicalFormulaDescriptor data_source = GemTableDataSource( @@ -27,7 +27,7 @@ The example below assumes that the uuid and the version of the desired GEM Table table_version = "2" ) - predictor = AutoMLPredictor( + auto_ml_predictor = AutoMLPredictor( name = "Band gap predictor", description = "Predict the band gap from the chemical formula and crystallinity", inputs = [ @@ -35,9 +35,15 @@ The example below assumes that the uuid and the version of the desired GEM Table CategoricalDescriptor("terminal~crystallinity", categories=[ "Single crystalline", "Amorphous", "Polycrystalline"]) ], - outputs = [RealDescriptor("terminal~band gap", lower_bound=0, upper_bound=20, units="eV")], + outputs = [RealDescriptor("terminal~band gap", lower_bound=0, upper_bound=20, units="eV")] + ) + + predictor = GraphPredictor( + name = "Root predictor", + predictors = [auto_ml_predictor], training_data = [data_source] ) + Note that the descriptor keys above are the headers of the *variable* not the column in the table. The last term in the column header is a suffix associated with the specific column definition rather than the variable. diff --git a/docs/source/workflows/descriptors.rst b/docs/source/workflows/descriptors.rst index 995e4e44c..2ef75b211 100644 --- a/docs/source/workflows/descriptors.rst +++ b/docs/source/workflows/descriptors.rst @@ -7,7 +7,7 @@ Descriptors allow users to define a controlled vocabulary with which to describe Each descriptor defines a term in that vocabulary, which is comprised of a name, a datatype, and bounds on that data type. If you are familiar with the GEMD data model, descriptors are roughly equivalent to :class:`AttributeTemplates `. 
-The AI Engine currently supports 5 kinds of descriptors: +The AI Engine currently supports 6 kinds of descriptors: - `Real Descriptors <#real-descriptor>`__ - `Integer Descriptor <#integer-descriptor>`__ diff --git a/docs/source/workflows/design_spaces.rst b/docs/source/workflows/design_spaces.rst index e6441acb1..5b8f71c03 100644 --- a/docs/source/workflows/design_spaces.rst +++ b/docs/source/workflows/design_spaces.rst @@ -5,61 +5,18 @@ A Design Space defines a set of materials that should be searched over when perf Design Spaces must be registered to be used in a :doc:`design workflow `. Currently, there are four Design Spaces: -- `EnumeratedDesignSpace <#enumerated-design-space>`__ - `ProductDesignSpace <#product-design-space>`__ +- `HierarchicalDesignSpace <#hierarchical-design-space>`__ - `DataSourceDesignSpace <#data-source-design-space>`__ - `FormulationDesignSpace <#formulation-design-space>`__ -Enumerated design space ------------------------ - -An :class:`~citrine.informatics.design_spaces.enumerated_design_space.EnumeratedDesignSpace` is composed of an explicit list of candidates. -Each candidate is specified using a dictionary keyed on the key of a corresponding :class:`~citrine.informatics.descriptors.Descriptor`. -A list of descriptors defines what key-value pairs must be present in each candidate. -If a candidate is missing a descriptor key-value pair, contains extra key-value pairs or any value is not valid for the corresponding descriptor, it is removed from the design space. - -As an example, an enumerated design space that represents points from a 2D Cartesian coordinate system can be created using the Citrine Python client: - -.. 
code:: python - - from citrine.informatics.descriptors import RealDescriptor - from citrine.informatics.design_spaces import EnumeratedDesignSpace - - x = RealDescriptor(key='x', lower_bound=0, upper_bound=10, units="") - y = RealDescriptor(key='y', lower_bound=0, upper_bound=10, units="") - descriptors = [x, y] - - # create a list of candidates - # invalid candidates will be removed from the design space - candidates = [ - {'x': 0, 'y': 0}, - {'x': 0, 'y': 1}, - {'x': 2, 'y': 3}, - {'x': 10, 'y': 10}, - # invalid because x > 10 - {'x': 11, 'y': 10}, - # invalid because z isn't in descriptors - {'x': 11, 'y': 10, 'z': 0}, - # invalid because y is missing - {'x': 10} - ] - - design_space = EnumeratedDesignSpace( - name='2D coordinate system', - description='Design space that contains (x, y) points', - descriptors=descriptors, - data=candidates - ) - - registered_design_space = project.design_spaces.register(design_space) - -Product design space +Product Design Space -------------------- Materials from a :class:`~citrine.informatics.design_spaces.product_design_space.ProductDesignSpace` are composed of the `Cartesian product`_ of independent factors. -Each factor can be a separate design space _or_ a univariate dimension. +Each factor can be a separate design space *or* a univariate dimension. Any other type of design space can be a valid subspace. -Subspaces can either be registered on the platform and referenced through their uid, or they can be defined anonymously and embedded in the product design space. +Subspaces are defined anonymously and embedded in the product design space. A :class:`~citrine.informatics.dimensions.Dimension` defines valid values of a single variable. Valid values can be discrete sets (i.e., enumerated using a list) or continuous ranges (i.e., defined by upper and lower bounds on real numbers). @@ -115,29 +72,23 @@ Note, each factor must be **independent**. 
This means that the same descriptor may not appear more than once in a product design space. As an example, let's create a produt design space that defines the ways in which we might mix two pigments together and stir at some temperature. -We are only interested in specific amounts of each pigment, so we create an enumerated design space that defines the amounts we wish to test. +We are only interested in specific amounts of each pigment, so we create a data source design space that references a data source defining the amounts we wish to test. The mixing speed is discrete, so we describe it with an enumerated dimension. And temperature is described by a continuous dimension. .. code:: python - from citrine.informatics.descriptors import RealDescriptor, CategoricalDescriptor + from citrine.informatics.data_sources import GemTableDataSource + from citrine.informatics.descriptors import CategoricalDescriptor, RealDescriptor + from citrine.informatics.design_spaces import DataSourceDesignSpace, ProductDesignSpace from citrine.informatics.dimensions import ContinuousDimension, EnumeratedDimension - from citrine.informatics.design_spaces import ProductDesignSpace, EnumeratedDesignSpace - pigmentA_descriptor = RealDescriptor(key='Amount of Pigment A', lower_bound=0, upper_bound=100, units='g') - pigmentB_descriptor = RealDescriptor(key='Amount of Pigment B', lower_bound=0, upper_bound=100, units='g') - enumerated_space = EnumeratedDesignSpace( + pigment_data_source = GemTableDataSource(table_id=table_id, table_version=table_version) + enumerated_space = DataSourceDesignSpace( name="amounts of pigments A and B", description="total amount of pigment is 100 grams", - data=[ - {'Amount of Pigment A': 10.0, 'Amount of Pigment B': 90.0}, - {'Amount of Pigment A': 15.0, 'Amount of Pigment B': 85.0}, - {'Amount of Pigment A': 20.0, 'Amount of Pigment B': 80.0} - ] + data_source=pigment_data_source ) - enumerated_space_registered = 
project.design_spaces.register(enumerated_space) - enumerated_space_uid = enumerated_space_registered.uid temp_descriptor = RealDescriptor(key='Temperature', lower_bound=273, upper_bound=1000, units='K') temp_dimension = ContinuousDimension(descriptor=temp_descriptor, lower_bound=300, upper_bound=400) @@ -148,15 +99,12 @@ And temperature is described by a continuous dimension. product_space = ProductDesignSpace( name="Mix 2 pigments at some speed and temperature", description="Pigments A and B, temperatures between 300 and 400 K, and either Slow or Fast", - subspaces=[enumerated_space_uid], + subspaces=[enumerated_space], dimensions=[temp_dimension, speed_dimension] ) product_space = project.design_spaces.register(product_space) -In the approach shown above, the enumerated design space is registered on-platform and can be used in other contexts. -It would also be valid, however, to not register the enumerated design space and to include it in the product design space directly as opposed to through its uid: `subspaces=[enumerated_space]`. - The enumerated design space defined in this way might product the following candidates: .. code:: python @@ -172,33 +120,43 @@ The enumerated design space defined in this way might product the following cand ... # enumerated factors repeat while continuously sampling Temperature ] +Hierarchical Design Space +------------------------- + Data Source Design Space ------------------------ -A :class:`~citrine.informatics.design_spaces.data_source_design_space.DataSourceDesignSpace` is similar in spirit to an enumerated design space, but the candidates are drawn from an existing data source instead of being specified through a list of dictionaries. +A :class:`~citrine.informatics.design_spaces.data_source_design_space.DataSourceDesignSpace` draws its candidates from an existing data source. Any data source can be used and no additional information is needed. 
+When registered, this type of design space must be a subspace of a :class:`~citrine.informatics.design_spaces.product_design_space.ProductDesignSpace`. For example, assume you have a :class:`~citrine.resources.gemtables.GemTable` that contains one :class:`~citrine.gemtables.rows.Row` for each candidate that you wish to test. -Assume the table's `table_id` and `table_version` are known. +Assume the table's ``table_id`` and ``table_version`` are known. The example code below creates a registers a design space based on this Gem Table. .. code:: python from citrine.informatics.data_sources import GemTableDataSource - from citrine.informatics.design_spaces import DataSourceDesignSpace + from citrine.informatics.design_spaces import DataSourceDesignSpace, ProductDesignSpace data_source = GemTableDataSource( table_id=table_id, table_version=table_version ) - design_space = DataSourceDesignSpace( + data_source_design_space = DataSourceDesignSpace( name="my candidates", description="450 potential formulations", data_source=data_source ) + design_space = ProductDesignSpace( + name="top-level design space", + description="contains a single data source design space.", + subspaces=[data_source_design_space] + ) + registered_design_space = project.design_spaces.register(design_space) Formulation Design Space @@ -246,6 +204,8 @@ Ingredient fractions in recipes sampled from a formulation design space will alw Label information defines which labels are applied to each ingredient in the recipe. These labels will always be a subset of all labels from the design space. +When registered, this type of design space must be a subspace of a :class:`~citrine.informatics.design_spaces.product_design_space.ProductDesignSpace` or part of a :class:`~citrine.informatics.design_spaces.hierarchical_design_space.HierarchicalDesignSpace`. 
+ The following demonstrates how to create a formulation design space of saline solutions containing three ingredients: water, salt, and boric acid (a common antiseptic). We will require that formulations contain 2 ingredients, that no more than 1 solute is present, and that the total fraction of water is between 0.95 and 0.99. @@ -274,7 +234,7 @@ We will require that formulations contain 2 ingredients, that no more than 1 sol IngredientFractionConstraint(formulation_descriptor=descriptor, ingredient="water", min=0.95, max=0.99) } - design_space = FormulationDesignSpace( + formulation_design_space = FormulationDesignSpace( name = "Saline solution design space", description = "Composes formulations from water, salt, and boric acid", formulation_descriptor = descriptor, @@ -282,6 +242,12 @@ We will require that formulations contain 2 ingredients, that no more than 1 sol labels = labels, constraints = constraints ) + + design_space = ProductDesignSpace( + name="top-level design space", + description="contains a single formulation design space.", + subspaces=[formulation_design_space] + ) registered_design_space = project.design_spaces.register(design_space) diff --git a/docs/source/workflows/design_workflows.rst b/docs/source/workflows/design_workflows.rst index afc11fb5e..ae10cb01f 100644 --- a/docs/source/workflows/design_workflows.rst +++ b/docs/source/workflows/design_workflows.rst @@ -30,9 +30,9 @@ The following example demonstrates how to use the Citrine Python client to regis # print final validation status validated_workflow = project.design_workflows.get(workflow.uid) print(validated_workflow.status) - # status info will contain relevant validation information + # status detail will contain relevant validation information # (i.e. 
why the workflow is valid/invalid) - print(validated_workflow.status_info) + print(validated_workflow.status_detail) Execution and results @@ -148,13 +148,13 @@ Branches -------- Branches are purely an organizational concept, used to group design workflows with similar goals under a single name. -They are the primary organizational concept of AI assets as displayed in our web UI. +They are the primary organizational concept of AI assets as displayed in the web UI. In the context of the Citrine Python client, they can be thought of as a bucket of design workflows. -If you do not wish to interact with them in the python client, ignore the ``branch_id`` on a DesignWorkflow, and it will be handled for you. +They are managed for you in the python client, but you can view the associated ``branch_root_id`` and ``branch_version`` on a :class:`~citrine.informatics.workflows.design_workflow.DesignWorkflow`. A branch has a name, along with any number of design workflows. A DesignWorkflow can be created and retrieved, and you can list all design workflows on a branch. -You can still list all design workflows on the project as before. +You can also list all design workflows in the project directly. .. code:: python @@ -181,15 +181,15 @@ You can still list all design workflows on the project as before. # print final validation status validated_workflow = branch.design_workflows.get(workflow.uid) print(validated_workflow.status) - # status info will contain relevant validation information + # status detail will contain relevant validation information # (i.e. why the workflow is valid/invalid) - print(validated_workflow.status_info) + print(validated_workflow.status_detail) When you're done with a branch, it can be archived, removing it from the results of ``list`` and setting the ``archived`` flag. ``list_archived`` lists all archived branches in a project. An archived branch can be restored via its unique ID. 
-Note that archiving branches is independent of archiving the design workflows contained within it. +Archiving branches is independent of archiving the design workflows contained within it. Archiving a branch will hide the entire branch from default displays in the web UI. As a result, the design workflows it contained within it will also be hidden. Yet archiving th branch will *not* change the archived status of the contained design workflows in the context of design workflow listing methods. diff --git a/docs/source/workflows/getting_started.rst b/docs/source/workflows/getting_started.rst index 22a226778..6d9aca59b 100644 --- a/docs/source/workflows/getting_started.rst +++ b/docs/source/workflows/getting_started.rst @@ -11,7 +11,7 @@ These capabilities include generating candidates for Sequential Learning, identi Workflows Overview ------------------ -Currently, there are two workflows on the AI Engine: the :doc:`DesignWorkflow ` and the :doc:`PredictorEvaluation `. +Currently, there are two workflows on the AI Engine: the :doc:`DesignWorkflow ` and the :doc:`PredictorEvaluation `. There are two different types of modules, and these are discussed in greater detail below. Design Workflow @@ -35,12 +35,11 @@ Branches ######## A ``Branch`` is a named container which can contain any number of design workflows, and is purely a tool for organization. -If you do not see branches in the Citrine Platform, you do not need to change how you work with design workflows. They will contain an additional field ``branch_id``, which you can ignore. Predictor Evaluation ******************** -The :doc:`PredictorEvaluation ` is used to analyze a :doc:`Predictor `. +The :doc:`PredictorEvaluation ` is used to analyze a :doc:`Predictor `. They helps users understand how well their predictor module works with their data: in essence, it describes the trustworthiness of their model. These outcomes are captured in a series of response metrics. 
@@ -51,8 +50,8 @@ Modules are re-usable computational tools used to construct workflows. The modules dictate how the platform utilizes research data to generate computational results. There are 2 types of modules on the platform: -- :doc:`Design Spaces ` define the domain of controllable experimental parameters, their allowable values and relevant bounds. -- :doc:`Predictors ` define relations between variables in a table of experimental data. +* :doc:`Design Spaces ` define the domain of controllable experimental parameters, their allowable values and relevant bounds. +* :doc:`Predictors ` define relations between variables in a table of experimental data. A predictor can be composed of machine-learned models, featurizers, and analytical relations. .. _archiving_label: @@ -61,7 +60,7 @@ Archiving ********* Modules and workflows start active by default when created. -An archived resource will not show up when listing, and an archived module cannot be used in workflows. +An archived resource will not show up when listing, although an archived module can be used in workflows. To archive a resource with a known ``uid``, use the ``.archive()`` method of the relevant collection (e.g., :meth:`DesignWorkflowCollection.archive() `). Use ``.restore()`` to un-archive the resource. @@ -69,13 +68,13 @@ Use ``.restore()`` to un-archive the resource. Registration and validation --------------------------- -Both modules and workflows are registered with a project and validated before they are ready for use. Once registered, validation occurs automatically. +Both modules and workflows are registered with a project and validated before they are ready for use. When registered via the Citrine Python client, validation occurs automatically. Validation status can be one of the following states: -- **Created:** The module/workflow has been registered with a project and has been queued for validation. 
+- **Created:** The module/workflow has been registered with a project, but validation has not begun. - **Validating:** The module/workflow is currently validating. The status will be updated to one of the subsequent states upon completion. - **Invalid:** Validation completed successfully but found errors with the workflow/module. - **Ready:** Validation completed successfully and found no errors. - **Error:** Validation did not complete. An error was raised during the validation process that prevented an invalid or ready status to be determined. -Validation of a workflow and all constituent modules must complete with ready status before the workflow can be executed. +Validation of a workflow and all constituent modules must complete with Ready status before the workflow can be executed. diff --git a/docs/source/workflows/index.rst b/docs/source/workflows/index.rst index 865ea3960..e09214327 100644 --- a/docs/source/workflows/index.rst +++ b/docs/source/workflows/index.rst @@ -15,6 +15,6 @@ AI Engine scores data_sources predictor_reports - predictor_evaluation_workflows + predictor_evaluations generative_design code_examples diff --git a/docs/source/workflows/predictor_evaluation_workflows.rst b/docs/source/workflows/predictor_evaluations.rst similarity index 98% rename from docs/source/workflows/predictor_evaluation_workflows.rst rename to docs/source/workflows/predictor_evaluations.rst index 874ffa996..52588e95f 100644 --- a/docs/source/workflows/predictor_evaluation_workflows.rst +++ b/docs/source/workflows/predictor_evaluations.rst @@ -99,8 +99,8 @@ For categorical responses, performance metrics include the area under the receiv .. _execution-and-results: -Execution and results ---------------------- +Evaluation and results +---------------------- Once triggered, you can track the evaluation's progress using its ``status`` and ``status_detail`` properties. The ``status`` can be one of ``INPROGRESS``, ``SUCCEEDED``, or ``FAILED``. 
@@ -166,14 +166,21 @@ The predictor we'll evaluate is defined below: x = RealDescriptor(key='x', lower_bound=0.0, upper_bound=1.0, units='') y = RealDescriptor(key='y', lower_bound=0.0, upper_bound=1.0, units='') - predictor = AutoMLPredictor( + auto_ml_predictor = AutoMLPredictor( name='y predictor', description='predicts y given x', inputs=[y], - outputs=[x], + outputs=[x] + ) + + predictor = GraphPredictor( + name='root predictor', + description='container for the auto ML predictor.', + predictors=[auto_ml_predictor], training_data=[data_source] ) + This predictor expects ``x`` as an input and predicts ``y``. Training data is provided by a :class:`~citrine.informatics.data_sources.GemTableDataSource` that contains ``x`` and ``y``. diff --git a/docs/source/workflows/predictor_reports.rst b/docs/source/workflows/predictor_reports.rst index aeeb73d72..2a75c138a 100644 --- a/docs/source/workflows/predictor_reports.rst +++ b/docs/source/workflows/predictor_reports.rst @@ -4,7 +4,7 @@ Predictor Reports Training a predictor generally produces a set of inter-connected models. A predictor report describes those models, for example their settings and what features are important to the model. It does not include predictor evaluation metrics. -To learn more about predictor evaluation metrics, please see :doc:`PredictorEvaluationMetrics `. +To learn more about predictor evaluation metrics, please see :doc:`PredictorEvaluationMetrics `. The report can be accessed via ``predictor.report``. A task to generate a predictor report is scheduled when a predictor is registered. @@ -70,28 +70,28 @@ Assume that there is a training data table with known id and version. 
# create ML predictor auto_ml_predictor = AutoMLPredictor( - name='ML predictor for z', - description='Predicts z from x and y', - inputs=[x, y], - outputs=[z], - training_data=[GemTableDataSource( - table_id = training_table_id, - table_version = training_table_version - )] + name='ML predictor for z', + description='Predicts z from x and y', + inputs=[x, y], + outputs=[z] ) # register a predictor with a project predictor = project.predictors.register( - GraphPredictor( - name='ML predictor for z', - description='Predicts z from x and y', - predictors=[auto_ml_predictor] - ) + GraphPredictor( + name='ML predictor for z', + description='Predicts z from x and y', + predictors=[auto_ml_predictor], + training_data=[GemTableDataSource( + table_id = training_table_id, + table_version = training_table_version + )] + ) ) # wait for the predictor report to be ready while project.predictors.get(predictor.uid).report.status == 'PENDING': - sleep(10) + sleep(10) # print the json report report = project.predictors.get(predictor.uid).report diff --git a/docs/source/workflows/predictors.rst b/docs/source/workflows/predictors.rst index 8472e0379..f6185860b 100644 --- a/docs/source/workflows/predictors.rst +++ b/docs/source/workflows/predictors.rst @@ -13,11 +13,10 @@ Versioning ------------------------ All predictors have a version number. When you create a new predictor, it will be version 1, and its `draft` flag will be `True`. -While `draft` is True, any edits will overwrite the current version. -Once the predictor trains successfully, `draft` will be set to `False`. Any further edits will apply to the next version. -If training fails, `draft` will remain `True`. -To act on a specific version of the predictor (where allowed), use the function which accepts a `version` argument. -Any which don't accept a version act on the most recent version of the predictor. For example, `get()` vs. `get_version()`. +While ``draft`` is ``True``, any edits will overwrite the current version. 
+Once the predictor trains successfully, ``draft`` will be set to ``False``. Any further edits will apply to the next version. +If training fails, ``draft`` will remain ``True``. +To act on a specific version of the predictor (where allowed), pass the ``version`` argument. Auto ML predictor ------------------------- @@ -27,7 +26,7 @@ AutoMLPredictors allow you to use your domain knowledge to construct custom `Gra Each AutoMLPredictor is defined by a set of inputs and outputs. Inputs are used as input features to the machine learning model. The outputs are the properties that you would like the model to predict. -Currently, only one output per AutoML predictor is supported, and there must be at least one input. +There must be at least one input. Only one model is trained from inputs to the outputs. AutoMLPredictors support both regression and classification. @@ -45,7 +44,7 @@ The following example demonstrates how to use the Citrine Python client to creat .. code:: python - from citrine.informatics.predictors import AutoMLPredictor + from citrine.informatics.predictors import AutoMLPredictor, GraphPredictor from citrine.seeding.find_or_create import create_or_update # create AutoMLPredictor (assumes descriptors for @@ -54,12 +53,16 @@ The following example demonstrates how to use the Citrine Python client to creat name = 'Predictor name', description = 'Predictor description', inputs = [input_descriptor_1, input_descriptor_2], - outputs = [output_descriptor_1], + outputs = [output_descriptor_1] + ) + graph_predictor = GraphPredictor( + name = 'Root predictor', + predictors = [auto_ml_predictor], training_data = [GemTableDataSource(table_id=training_data_table_uid, table_version=training_data_table_version)] ) predictor = create_or_update(collection=project.predictors, - resource=auto_ml_predictor + resource=graph_predictor ) @@ -69,23 +72,20 @@ Graph predictor The :class:`~citrine.informatics.predictors.graph_predictor.GraphPredictor` stitches together multiple 
predictors into a directed bipartite graph. The predictors are connected based on their descriptors -- using a descriptor as the output of one predictor and also as the input of another will ensure that the predictors are wired together. The graph structure is quite flexible. -A descriptor can be the output and/or input of multiple predictors, and cycles are permitted. +A descriptor can be the output and/or input of multiple predictors. -A ``GraphPredictor`` is created by specifying the sub-predictors. -These can either be references to predictors that exist on-platform, or they can be predictors that are defined locally. +A ``GraphPredictor`` is created by specifying the sub-predictors, defined locally. A sub-predictor **cannot** be another ``GraphPredictor``. Training data can be specified when creating a graph predictor. -This will be combined with any training data present in the sub-predictors. -The following example demonstrates how to create a :class:`~citrine.informatics.predictors.graph_predictor.GraphPredictor` from on-platform and locally-defined predictors. -Assume that there exists a GEMD table with columns for time, bulk modulus, and Poisson's ratio. +The following example demonstrates how to create a :class:`~citrine.informatics.predictors.graph_predictor.GraphPredictor`. +Assume that there exists a GEM Table with columns for time, bulk modulus, and Poisson's ratio. We train ML models to predict bulk modulus and Poisson's ratio, then apply an expression to calculate Young's modulus. -The ML models are independently registered on-platform, but the expression predictor is defined locally and hence cannot be re-used. .. 
code:: python - from citrine.informatics.predictors import GraphPredictor, AutoMLPredictor, ExpressionPredictor + from citrine.informatics.predictors import AutoMLPredictor, ExpressionPredictor, GraphPredictor from citrine.informatics.data_sources import GemTableDataSource time = RealDescriptor("tempering time", lower_bound=0, upper_bound=30, units="s") @@ -93,23 +93,17 @@ The ML models are independently registered on-platform, but the expression predi poissons_ratio = RealDescriptor("Poisson\'s Ratio", lower_bound=0, upper_bound=0.5, units="") training_data = GemTableDataSource(table_id=training_data_table_uid, table_version=training_data_table_version) - bulk_modulus_predictor = project.predictors.register( - AutoMLPredictor( - name="predict bulk modulus from tempering time", - description="", - inputs=[time], - outputs=[bulk_modulus], - training_data=[training_data] - ) + bulk_modulus_predictor = AutoMLPredictor( + name="predict bulk modulus from tempering time", + description="", + inputs=[time], + outputs=[bulk_modulus] ) - poissons_ratio_predictor = project.predictors.register( - AutoMLPredictor( - name="predict Poisson\'s ratio from tempering time", - description="", - inputs=[time], - outputs=[poissons_ratio], - training_data=[training_data] - ) + poissons_ratio_predictor = AutoMLPredictor( + name="predict Poisson\'s ratio from tempering time", + description="", + inputs=[time], + outputs=[poissons_ratio] ) youngs_modulus = RealDescriptor("Young\'s Modulus", lower_bound=0, upper_bound=1E4, units="GPa") @@ -126,11 +120,11 @@ The ML models are independently registered on-platform, but the expression predi name = "Big elastic constant predictor", description = "", predictors = [ - bulk_modulus_predictor.uid, - poissons_ratio_predictor.uid, + bulk_modulus_predictor, + poissons_ratio_predictor, expression_predictor ], - training_data = [] + training_data=[training_data] ) ) @@ -172,7 +166,8 @@ The following example demonstrates how to create an
:class:`~citrine.informatics .. code:: python - from citrine.informatics.predictors import ExpressionPredictor + from citrine.informatics.data_sources import GemTableDataSource + from citrine.informatics.predictors import ExpressionPredictor, GraphPredictor youngs_modulus = RealDescriptor(key='Property~Young\'s modulus', lower_bound=0, upper_bound=100, units='GPa') poissons_ratio = RealDescriptor(key='Property~Poisson\'s ratio', lower_bound=-1, upper_bound=0.5, units='') @@ -189,13 +184,19 @@ The following example demonstrates how to create an :class:`~citrine.informatics } ) + graph_predictor = GraphPredictor( + name = 'Root predictor', + predictors = [shear_modulus_predictor], + training_data = [GemTableDataSource(table_id=training_data_table_uid, table_version=training_data_table_version)] + ) + # register or update predictor by name predictor = create_or_update( collection=project.predictors, resource=graph_predictor ) -For an example of expression predictors used in a graph predictor, see :ref:`AI Engine Code Examples `. +For a more involved example of expression predictors used in a graph predictor, see :ref:`AI Engine Code Examples `. Molecular Structure Featurizer ------------------------------------ @@ -215,7 +216,7 @@ The following example demonstrates how to use a :class:`~citrine.informatics.pre .. 
code:: python from citrine.informatics.descriptors import MolecularStructureDescriptor, RealDescriptor - from citrine.informatics.predictors import MolecularStructureFeaturizer, AutoMLPredictor, GraphPredictor + from citrine.informatics.predictors import AutoMLPredictor, GraphPredictor, MolecularStructureFeaturizer from citrine.seeding.find_or_create import create_or_update from citrine.informatics.data_sources import GemTableDataSource @@ -251,8 +252,7 @@ The following example demonstrates how to use a :class:`~citrine.informatics.pre name='ML Model for Density', description='Predict the density, given molecular features of the solvent', inputs = features, - output = [output_desc], - training_data = [] + outputs = [output_desc] ) # use a graph predictor to wrap together the featurizer and the machine learning model @@ -276,7 +276,7 @@ computes a configurable set of features on chemical formula data by using the pr and their stoichiometric amounts. Many of the features are stoichiometrically weighted generalized means of element-level properties, though some are more complex functions of the chemical formula. The generalized means are configured with the ``powers`` argument, which is a list of means to calculate. -For example, setting ``powers=[1, 3]`` will calculate the mean and 3-mean of all applicable features. +For example, setting ``powers=[1.0, 3.0]`` will calculate the mean and 3-mean of all applicable features. The features to compute are configured using the ``features`` and ``excludes`` arguments, which accept either feature names or predefined aliases. The default is the `standard` alias, corresponding to a variety of features that are intuitive and often correlate with properties of interest.
@@ -287,14 +287,13 @@ The feature names and descriptors are automatically constructed from the name of The ``from_predictor_responses`` method will grab the descriptors for the features so that they can be fed into other predictors, e.g., the :class:`~citrine.informatics.predictors.auto_ml_predictor.AutoMLPredictor`, as inputs. - The following example demonstrates how to use a :class:`~citrine.informatics.predictors.chemical_formula_featurizer.ChemicalFormulaFeaturizer` and :class:`~citrine.informatics.predictors.auto_ml_predictor.AutoMLPredictor` to model a property of an alloy: .. code:: python from citrine.informatics.descriptors import ChemicalFormulaDescriptor, RealDescriptor - from citrine.informatics.predictors import ChemicalFormulaFeaturizer, AutoMLPredictor, GraphPredictor + from citrine.informatics.predictors import AutoMLPredictor, ChemicalFormulaFeaturizer, GraphPredictor from citrine.seeding.find_or_create import create_or_update from citrine.informatics.data_sources import GemTableDataSource @@ -316,7 +315,7 @@ The following example demonstrates how to use a :class:`~citrine.informatics.pre description="Featurize the Alloy's chemical formula using the default features and a 2-mean.", input_descriptor=input_desc, features=['standard'], - powers=[2] + powers=[2.0] ) # get the feature names @@ -331,8 +330,7 @@ The following example demonstrates how to use a :class:`~citrine.informatics.pre name='ML Model for Melting Temperature', description='Predict the melting temperature, given chemical features of the alloy', inputs = features, - outputs = [output_desc], - training_data = [] + outputs = [output_desc] ) # use a graph predictor to wrap together the featurizer and the machine learning model @@ -363,7 +361,7 @@ An ``input_descriptor`` with key 'Formulation' is automatically generated that r the associated material history of the input formulation is traversed to determine the leaf ingredients. 
These leaf ingredients are then summed across all leaves of the mixing processes, with the resulting candidates described by an automatically generated ``output_descriptor`` formulation descriptor named 'Flat Formulation'. -The ``training_data`` parameter is used as a source of formulation recipes to be used in flattening hierarchical mixtures. +The ``training_data`` of the parent :class:`~citrine.informatics.predictors.graph_predictor.GraphPredictor` is used as a source of formulation recipes to be used in flattening hierarchical mixtures. The following example illustrates how a :class:`~citrine.informatics.predictors.simple_mixture_predictor.SimpleMixturePredictor` can be used to flatten the ingredients used in aqueous dilutions of hypertonic saline, @@ -371,7 +369,7 @@ yielding just the quantities of the leaf constituents salt and water. .. code:: python - from citrine.informatics.predictors import SimpleMixturePredictor + from citrine.informatics.predictors import GraphPredictor, SimpleMixturePredictor # table with simple mixtures and their ingredients data_source = GemTableDataSource( @@ -379,12 +377,19 @@ yielding just the quantities of the leaf constituents salt and water. 
table_version=table_version ) - SimpleMixturePredictor( + mixture_predictor = SimpleMixturePredictor( name='Simple mixture predictor', - description='Constructs a formulation descriptor that flattens a hierarchy of simple mixtures into the quantities of leaf ingredients', - training_data=[data_source] + description='Constructs a formulation descriptor that flattens a hierarchy of simple mixtures into the quantities of leaf ingredients' + ) + + graph_predictor = GraphPredictor( + name = 'Root predictor', + predictors = [mixture_predictor], + training_data = [data_source] ) + + Mean property predictor ----------------------- @@ -460,7 +465,7 @@ to compute the mean solute density and the distribution of acetone solubility in from citrine.informatics.data_sources import GemTableDataSource from citrine.informatics.descriptors import FormulationDescriptor, RealDescriptor - from citrine.informatics.predictors import MeanPropertyPredictor + from citrine.informatics.predictors import GraphPredictor, MeanPropertyPredictor # descriptor that holds formulation data formulation = FormulationDescriptor.hierarchical() @@ -485,7 +490,6 @@ to compute the mean solute density and the distribution of acetone solubility in properties=[density, acetone_solubility], # compute the response with component quantities weighted evenly p=1, - training_data=[data_source], # impute ingredient properties, if missing impute_properties=True, # if missing, use provided defaults @@ -493,6 +497,12 @@ to compute the mean solute density and the distribution of acetone solubility in # only featurize ingredients labeled as a solute label='solute' ) + + graph_predictor = GraphPredictor( + name = 'Root predictor', + predictors = [mean_property_predictor], + training_data = [data_source] + ) This predictor will compute a real descriptor with a key ``mean of property density with label solute in formulation`` and a categorical descriptor with key ``distribution of property acetone solubility with label solute 
in formulation``, @@ -591,7 +601,7 @@ Predictor reports A :doc:`predictor report ` describes a machine-learned model, for example its settings and what features are important to the model. It does not include predictor evaluation metrics. -To learn more about predictor evaluation metrics, please see :doc:`PredictorEvaluation `. +To learn more about predictor evaluation metrics, please see :doc:`PredictorEvaluation `. Training data ------------- @@ -606,11 +616,7 @@ Deduplication is additive. Given three rows with identifiers ``[a]``, ``[b]`` and ``[a, b]``, deduplication will result in a single row with three identifiers (``[a, b, c]``) and the union of all data from these rows. Care must be taken to ensure uids and identifiers aren't shared across multiple data sources to avoid unwanted deduplication. -When using a :class:`~citrine.informatics.predictors.graph_predictor.GraphPredictor`, training data provided by the graph predictor and all sub-predictors are combined into a single deduplicated list. Each predictor is trained on the subset of the combined data that is valid for the model. -Note, data may come from sources defined by other sub-predictors in the graph. -Because training data are shared by all predictors in the graph, a data source does not need to be redefined by all sub-predictors that require it. -If all data sources required to train a predictor are specified elsewhere in the graph, the ``training_data`` parameter may be omitted. If the graph contains a predictor that requires formulations data, e.g. a :class:`~citrine.informatics.predictors.simple_mixture_predictor.SimpleMixturePredictor` or :class:`~citrine.informatics.predictors.mean_property_predictor.MeanPropertyPredictor`, any GEM Tables specified by the graph predictor that contain formulation data must provide a formulation descriptor, and this descriptor must match the input formulation descriptor of the sub-predictors that require these data. 
diff --git a/src/citrine/resources/project.py b/src/citrine/resources/project.py index ee3a7388b..f59e7206c 100644 --- a/src/citrine/resources/project.py +++ b/src/citrine/resources/project.py @@ -26,7 +26,7 @@ class Project(Resource['Project']): """ A Citrine Project. - A project is a collection of datasets and AI assets, some of which belong directly + A project is a collection of training sets and AI assets, some of which belong directly to the project, and some of which have been shared with the project. Parameters