When I increase the batch size, the per-image inference time on TensorRT does not change. For example, if inference on a batch of size 8 takes 20 ms, inference on a batch of size 16 takes 40 ms, so batching gives no speedup. I am not sure why this is happening ...
I converted an EfficientNet backbone from TensorFlow to ONNX, and then to TensorRT. In TensorFlow I specified the batch size as follows:
# save backbone model w/ full signature!
@tf.function()
def my_predict(my_prediction_inputs, **kwargs):
    prediction = mod(my_prediction_inputs, training=False)
    return {"prediction": prediction}

my_signatures = my_predict.get_concrete_function(
    my_prediction_inputs=tf.TensorSpec([batch_size, 256, 256, 3], dtype=tf.float32, name="image")
)

tf.saved_model.save(mod, bbone_name, signatures=my_signatures)
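A quick way to confirm the fixed batch dimension in the saved signature is to reload the SavedModel and print its input spec (an illustrative check; "serving_default" is the usual key when a single signature is passed to tf.saved_model.save):

import tensorflow as tf

# Reload the SavedModel and inspect the serving signature's input spec.
loaded = tf.saved_model.load(bbone_name)
sig = loaded.signatures["serving_default"]
print(sig.structured_input_signature)  # expect shape (batch_size, 256, 256, 3)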
Converting the TensorFlow model to ONNX:
$ python -m tf2onnx.convert --saved-model mods/effnet-l/bbone --output mods/effnet-l/bbone.onnx
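To double-check what tf2onnx exported, the graph's input dimensions can be printed with the onnx package (a small illustrative sketch; the path matches the command above):

import onnx

# Print each graph input and its (possibly symbolic) dimensions.
m = onnx.load("mods/effnet-l/bbone.onnx")
for inp in m.graph.input:
    dims = [d.dim_value if d.dim_value > 0 else d.dim_param
            for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)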
Converting the ONNX model to TensorRT and saving it:
import engine as eng
from onnx import ModelProto
import tensorrt as trt

base_dir = "mods/effnet-l"
# base_dir = "mods/resnet152/"
onnx_path = base_dir + "/bbone.onnx"
engine_name = base_dir + "/bbone.plan"
batch_size = 8

model = ModelProto()
with open(onnx_path, "rb") as f:
    model.ParseFromString(f.read())

shape = [batch_size, 256, 256, 3]
engine = eng.build_engine(onnx_path, shape=shape)
eng.save_engine(engine, engine_name)
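The engine module imported above is not shown here; for context, the build_engine/save_engine helpers are assumed to look roughly like this (a sketch against the TensorRT 8.x Python API, not the exact file):

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, shape):
    # Parse the ONNX file into an explicit-batch network and build an engine.
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            return None
    config = builder.create_builder_config()
    config.max_workspace_size = 1 << 30  # 1 GiB; older attribute name, still present in TRT 8.4
    # With a fixed batch baked into the ONNX graph, `shape` is informational only;
    # a dynamic batch dimension would need an optimization profile here instead.
    return builder.build_engine(network, config)

def save_engine(engine, path):
    with open(path, "wb") as f:
        f.write(engine.serialize())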
Here is the inference code for TensorRT. Everything works correctly; the only problem is the speed. If I double the batch size, the inference time also doubles, so batching does not reduce the total inference time for a given number of images.
std::vector<float> EffnetBBone::convert_mat_to_fvec(cv::Mat mat)
{
    std::vector<float> array;
    if (mat.isContinuous())
    {
        array.assign((float *)mat.data, (float *)mat.data + mat.total() * mat.channels());
    }
    else
    {
        for (int i = 0; i < mat.rows; ++i)
        {
            array.insert(array.end(), mat.ptr<float>(i), mat.ptr<float>(i) + mat.cols * mat.channels());
        }
    }
    return array;
}

EffnetBBone::EffnetBBone(std::string base_dir, bool half_precision)
{
    onnx_net = new Trt();
    if (half_precision)
    {
        onnx_net->EnableFP16();
    }
    onnx_net->BuildEngine(base_dir + "/bbone.onnx", base_dir + "/bbone.plan");
    onnx_net->SetLogLevel((int)Severity::kINTERNAL_ERROR);
}
std::vector<float> EffnetBBone::run_batch(std::vector<cv::Mat> batch_img, bool normalized)
{
    cv::Mat crop;
    std::vector<float> batch_fvec;
    // Output elements per image: 327680 bytes / sizeof(float).
    int size = batch_img.size() * (327680 / 4);
    std::vector<float> output(size);
    for (int i = 0; i < batch_img.size(); i++)
    {
        std::vector<float> fvec;
        crop = batch_img[i];
        cv::Mat img_f32;
        crop.convertTo(img_f32, CV_32F);
        if (normalized == false)
        {
            img_f32 = img_f32 / 256.f;
        }
        fvec = convert_mat_to_fvec(img_f32);
        batch_fvec.insert(batch_fvec.end(), fvec.begin(), fvec.end());
    }
    onnx_net->CopyFromHostToDevice(batch_fvec, inputBindIndex);
    bool state = onnx_net->Forward();
    assert(state == true);
    onnx_net->CopyFromDeviceToHost(output, outputBindIndex);
    return output;
}
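To isolate whether the serialized engine itself scales with batch size, independent of the C++ wrapper, it can be timed directly from Python (an illustrative sketch using pycuda; buffer handling is simplified and binding shapes are taken from the engine as built):

import time
import numpy as np
import pycuda.autoinit  # noqa: F401 -- creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
with open("mods/effnet-l/bbone.plan", "rb") as f, trt.Runtime(logger) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())
context = engine.create_execution_context()

# One device buffer per binding, sized from the engine's static shapes.
bindings = []
for i in range(engine.num_bindings):
    dtype = trt.nptype(engine.get_binding_dtype(i))
    n_bytes = trt.volume(engine.get_binding_shape(i)) * np.dtype(dtype).itemsize
    bindings.append(int(cuda.mem_alloc(n_bytes)))

context.execute_v2(bindings)  # warm-up
start = time.time()
for _ in range(100):
    context.execute_v2(bindings)  # synchronous inference on the fixed batch
print("mean latency per forward pass:", (time.time() - start) / 100, "s")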
System environment:
- Device: GeForce RTX 3090
- OS: Ubuntu 20.04
- Driver version: 470.103.01
- CUDA version: 11.2
- TensorRT version: 8.4.0