Add a few more methods to indicate the output data of a processor by Logiquo · Pull Request #827 · sunlabuiuc/PyHealth

Logiquo · 2026-02-07T09:51:06Z

This PR proposes a new set of methods that every processor needs to implement.

This should greatly reduce the complexity of generalizing both models and interpretability methods across multiple inputs, since they gain a standard way to understand the data they receive.

The new APIs are:

    def is_token(self) -> bool:
        """Returns whether the output (in particular, the value tensor) of the processor
        represents discrete token indices (True) or continuous values (False). This is used to
        determine whether to apply token-based transformations (e.g. `nn.Embedding`) or
        value-based transformations (e.g. `nn.Linear`).

        Returns:
            True if the output of the processor represents discrete token indices, False otherwise.
        """
        raise NotImplementedError("is_token method is not implemented for this processor.")
    
    def schema(self) -> tuple[str, ...]:
        """Returns the schema of the processed feature. For a processor that emits a single tensor,
        this should just return `("value",)`. For a processor that emits a tuple of tensors,
        this should return a tuple of the same length, containing the semantic name of each tensor,
        such as `("time", "value")`, `("value", "mask")`, etc.
        
        Typical semantic names include:
            - "value": the main processed tensor output of the processor
            - "time": the time tensor output of the processor (mostly for StageNet)
            - "mask": the mask tensor output of the processor (if applicable)
        
        Returns:
            Tuple of semantic names corresponding to the output of the processor.
        """
        raise NotImplementedError("schema method is not implemented for this processor.")
    
    def dim(self) -> tuple[int, ...]:
        """Number of dimensions (`Tensor.dim()`) for each output
        tensor, in the same order as the output tuple.

        Returns:
            Tuple of integers corresponding to the number of dimensions of each output tensor.
        """
        raise NotImplementedError("dim method is not implemented for this processor.")
    
    def spatial(self, i: int) -> tuple[bool, ...]:
        """Whether each dimension (axis) of the i-th output tensor is spatial (i.e. corresponds to a spatial 
        axis like time, height, width, etc.) or not. This is used to determine how to apply 
        augmentations and other transformations that should only be applied to spatial dimensions.
        
        E.g. for CNN or RNN features, this would help determine which dimensions to apply spatial augmentations to, 
        and which dimensions to treat as channels or features.
        
        Args:
            i: Index of the output tensor to check.
        
        Returns:
            Tuple of booleans corresponding to whether each axis of the i-th output tensor is spatial or not.
        """
        raise NotImplementedError("spatial method is not implemented for this processor.")
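To make the contract concrete, here is a minimal sketch of how the four methods fit together. It assumes per-sample output tensors (no batch axis), and every name in it (`ProcessorAPI`, `CodeSequenceProcessor`, `TimeSeriesProcessor`, `InputPlan`, `plan_inputs`) is hypothetical and not part of PyHealth. Two toy processors implement the methods, and a consumer derives per-feature wiring (embedding vs. linear encoder, the positions of the value and mask tensors, which axes are spatial) from the API alone, without knowing the concrete processor type.

```python
from __future__ import annotations

from dataclasses import dataclass


class ProcessorAPI:
    """Hypothetical stand-in for the processor base class, declaring only
    the four methods proposed in this PR."""

    def is_token(self) -> bool:
        raise NotImplementedError

    def schema(self) -> tuple[str, ...]:
        raise NotImplementedError

    def dim(self) -> tuple[int, ...]:
        raise NotImplementedError

    def spatial(self, i: int) -> tuple[bool, ...]:
        raise NotImplementedError


class CodeSequenceProcessor(ProcessorAPI):
    """Hypothetical processor emitting (value, mask): code indices of shape
    [seq_len] plus a padding mask of the same shape."""

    def is_token(self) -> bool:
        return True  # values are vocabulary indices, so an embedding fits

    def schema(self) -> tuple[str, ...]:
        return ("value", "mask")

    def dim(self) -> tuple[int, ...]:
        return (1, 1)

    def spatial(self, i: int) -> tuple[bool, ...]:
        # The single axis of both outputs is the sequence axis.
        return ((True,), (True,))[i]


class TimeSeriesProcessor(ProcessorAPI):
    """Hypothetical processor emitting (time, value): timestamps of shape [T]
    and continuous measurements of shape [T, n_features]."""

    def is_token(self) -> bool:
        return False  # continuous values, so a linear projection fits

    def schema(self) -> tuple[str, ...]:
        return ("time", "value")

    def dim(self) -> tuple[int, ...]:
        return (1, 2)

    def spatial(self, i: int) -> tuple[bool, ...]:
        # Axis 0 (time) is spatial; the feature axis of "value" is a channel axis.
        return ((True,), (True, False))[i]


@dataclass
class InputPlan:
    encoder: str                    # "embedding" or "linear"
    value_index: int                # position of "value" in the output tuple
    spatial_axes: tuple[int, ...]   # axes of "value" that augmentations may touch
    mask_index: int | None          # position of "mask", if the processor emits one


def plan_inputs(processors: dict[str, ProcessorAPI]) -> dict[str, InputPlan]:
    """Derive per-feature wiring purely from the proposed processor API."""
    plans = {}
    for name, proc in processors.items():
        names, dims = proc.schema(), proc.dim()
        if len(names) != len(dims):
            raise ValueError(f"{name}: schema() and dim() disagree in length")
        v = names.index("value")
        flags = proc.spatial(v)
        if len(flags) != dims[v]:
            raise ValueError(f"{name}: spatial({v}) needs one flag per axis")
        plans[name] = InputPlan(
            encoder="embedding" if proc.is_token() else "linear",
            value_index=v,
            spatial_axes=tuple(axis for axis, s in enumerate(flags) if s),
            mask_index=names.index("mask") if "mask" in names else None,
        )
    return plans


if __name__ == "__main__":
    plans = plan_inputs({"conditions": CodeSequenceProcessor(),
                         "vitals": TimeSeriesProcessor()})
    print(plans["conditions"])
    # InputPlan(encoder='embedding', value_index=0, spatial_axes=(0,), mask_index=1)
    print(plans["vitals"])
    # InputPlan(encoder='linear', value_index=1, spatial_axes=(0,), mask_index=None)
```

Because the consumer only calls `schema()`, `dim()`, `spatial()` and `is_token()`, a new processor type could plug in without changes to the model code, as long as it implements the four methods consistently. The length checks in `plan_inputs` are one possible way to catch inconsistent implementations early.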

Logiquo added 3 commits February 7, 2026 01:13

- add some API for the processors (7e1c46a)
- Implement for some of the processors (aa292d2)
- More processors (4861a6b)

Logiquo requested a review from jhnwu3 February 7, 2026 09:51

Logiquo added 2 commits February 7, 2026 05:03

- Add comment (8c11754)
- rename method to is_token to better reflects is usage (b87f795)

Logiquo wants to merge 5 commits into sunlabuiuc:master from Logiquo:add/processor-api
