environment:
pytorch 1.5.1
cuda 10.1
tested on a small input tensor (2, 8, 5, 5)
When using the test method in lib/sa/functions to measure speed, I found that the corresponding implementation using the PyTorch API is much faster than your C code in backward propagation (about 50x faster). The forward times of the two are relatively close, with the customized op slightly faster than the PyTorch API.
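For reference, here is a minimal sketch of the kind of forward/backward timing comparison described above. It is an assumption on my part that the test looks roughly like this; `torch_api_version` below is a hypothetical pure-PyTorch stand-in (a 3x3 weighted neighborhood aggregation), not the repo's actual op, and timings are on CPU for simplicity (on GPU you would call `torch.cuda.synchronize()` around each timed region):

```python
import time
import torch

def torch_api_version(x, w):
    # Hypothetical pure-PyTorch aggregation: weighted sum over each
    # pixel's 3x3 neighborhood. Stands in for the custom op under test.
    pad = torch.nn.functional.pad(x, (1, 1, 1, 1))
    patches = pad.unfold(2, 3, 1).unfold(3, 3, 1)   # (n, c, h, w, 3, 3)
    return (patches * w.view(1, 1, 1, 1, 3, 3)).sum(dim=(-1, -2))

def bench(fn, x, w, iters=50):
    # Average forward and backward wall-clock time separately.
    # On CUDA, add torch.cuda.synchronize() before each perf_counter().
    fwd = bwd = 0.0
    for _ in range(iters):
        xi = x.clone().requires_grad_(True)
        t0 = time.perf_counter()
        out = fn(xi, w)
        fwd += time.perf_counter() - t0
        t0 = time.perf_counter()
        out.sum().backward()
        bwd += time.perf_counter() - t0
    return fwd / iters, bwd / iters

x = torch.randn(2, 8, 5, 5)   # the small test tensor from this issue
w = torch.randn(3, 3)
fwd_t, bwd_t = bench(torch_api_version, x, w)
print(f"forward {fwd_t * 1e6:.1f} us, backward {bwd_t * 1e6:.1f} us")
```

Running the same harness over both the custom kernel and the PyTorch-API version on identical inputs is what would surface the ~50x backward gap mentioned above.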

So why did you choose to implement the operation yourself?