Skip to content

Conversation

@jungan
Copy link
Collaborator

@jungan jungan commented Apr 18, 2022

  • Make sure make distclean can clean up everything in mpi-proxy-split
  • Check error returns
  • Make some functions static when applicable
  • Free pointers in g_async_calls
  • Remove unused function & variable definitions

* Make sure `make distclean` can clean up everything in mpi-proxy-split
* Check error returns
* Make some functions static when applicable
* Free pointers in `g_async_calls`
* Remove unused function & variable definitions
}
JTRACE("Reading segment from lh_proxy")
(remote_iov[0].iov_base)(remote_iov[0].iov_len);
ret = process_vm_readv(childpid, remote_iov, IOV_SZ, remote_iov, IOV_SZ, 0);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rohgarg @gc00 @xuyao0127 Not sure if I understand it correctly, I tested it and works fine.
Please comment if there is anything I missed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the intent here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to take the advantage of process_vm_readv?
I know it doesn't have too much difference given IOV_SZ is 2 here, I think using the standard way probably performs better than we do the loop here when IOV_SZ is large.

int rc = MPI_Recv(buf, count, MPI_BYTE, status.MPI_SOURCE, status.MPI_TAG,
comm, MPI_STATUS_IGNORE);
if (rc != MPI_SUCCESS) {
fprintf(stderr, "MPI_Recv failed\n");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to use JTRACE here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, will update the code.

call->comm);
g_recvBytesByRank[worldRank] += call->count * size;
bytesReceived += call->count * size;
JALLOC_HELPER_FREE(call);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to check, but IIRC, the call object couldn't be freed here because it was being referenced in some other place later. So, I had decided to live with the memory leak. I'll have to go through the code again to make sure.

Copy link
Collaborator Author

@jungan jungan Apr 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, shall we use a shared_ptr?

if (call->type == ISEND_REQUEST) {
UPDATE_REQUEST_MAP(request, MPI_REQUEST_NULL);
// FIXME: We should free `call' to avoid memory leak
JALLOC_HELPER_FREE(call);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above

}
JTRACE("Reading segment from lh_proxy")
(remote_iov[0].iov_base)(remote_iov[0].iov_len);
ret = process_vm_readv(childpid, remote_iov, IOV_SZ, remote_iov, IOV_SZ, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the intent here?

Copy link
Collaborator

@gc00 gc00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like @rohgarg and @jungan have a continuing conversation here. So, I'll add my review solely as a "comment". @jungan said that he intends to provide an updated version of this PR later.

void *buf = JALLOC_HELPER_MALLOC(count);
MPI_Recv(buf, count, MPI_BYTE, status.MPI_SOURCE, status.MPI_TAG,
int rc = MPI_Recv(buf, count, MPI_BYTE, status.MPI_SOURCE, status.MPI_TAG,
comm, MPI_STATUS_IGNORE);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we re-indent the arguments comm, MPI_STATUS_ERROR to be underneath buf on the previous line?

close(pipefd_out[1]); // close write end of pipe
// FIXME: should be a readall. Check for return error code.
if (read(pipefd_out[0], &lh_info, sizeof lh_info) < sizeof lh_info) {
if (read(pipefd_out[0], &lh_info, sizeof lh_info) < (ssize_t) sizeof lh_info) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do not commit the change o this line.

The issue here is that read returns an int and sizeof returns a size_t. (unsigned). Some compiler complain or warn about comparing an int and unsigned int. The type ssize_t stands for "signed size", which is almost the same thing as an int. So, this cast is needed to make some compilers stop issuing warnings.

@gc00
Copy link
Collaborator

gc00 commented May 13, 2022

@jungan and @rohgarg ,
Where are we on this PR? Shall we split off a portion of this that we all agree is useful, and then leave the rest for further consideration?
Best,

  • Gene

@jungan
Copy link
Collaborator Author

jungan commented May 13, 2022

@jungan and @rohgarg , Where are we on this PR? Shall we split off a portion of this that we all agree is useful, and then leave the rest for further consideration? Best,

  • Gene

Firstly I think I need to rebase the code on top of feature/dmtcp-master, then I will dive deeper into the issue @rohgarg mentioned and see what's the best solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants