Customized subtraction and aggregation run much slower than the PyTorch implementation #13

@mzy97

Description

Environment:
PyTorch 1.5.1
CUDA 10.1
Tested on a small input tensor of shape (2, 8, 5, 5).

When using the test script in lib/sa/functions to compare speeds, I found that the corresponding implementation built from the PyTorch API is much faster than your CUDA code in backward propagation (about 50× faster). The forward times are close, with the customized op slightly faster than the PyTorch API version.

[screenshot: benchmark timing results]

So why did you choose to implement the operation yourself?
