Feature/stats_shm#413
Conversation
| shm::unregister_path(m_stats_path.c_str()); | ||
| } | ||
|
|
||
| void open_shm(int rank, int comm_size, int local_size, std::string uuid) { |
There was a problem hiding this comment.
This name is a little confusing with shm_open. Might be better to use comm_stats_shm_open or similar. If this is changed, change close_shm the same way.
There was a problem hiding this comment.
Sounds good, will change to be more descriptive.
| // Tracking array. Mutated by register/unregister_path in main thread; | ||
| // read by signal handler to unlink in case of abnormal behavior. | ||
| inline int num_active_paths = 0; | ||
| inline char active_shm_paths[MAX_SHM_PATHS][kShmPathMaxLen] = {0}; |
There was a problem hiding this comment.
Why not use the live_comm_uuids from ygm_uuids for unlinking files during a crash? The use of a C array of strings with MAX_SHM_PATHS entries is going to be unnecessarily limiting.
There was a problem hiding this comment.
Manipulation/use of std::set and std::string are very likely to violate signal handler calling context assumptions due to use of dynamic memory, and many associated functions not safe for potential reentry behavior. Current code was intentionally written to not use any operations or structures that are not verifiably async-signal-safe.
It may be overly complicated now that we are considering session removal / recovery in YGM-top. If it is overkill I can roll back to live_comm_uuids.
There was a problem hiding this comment.
I think we can work around this in our case. We aren't trying to call malloc in the signal handler. Just read from the set. We can also get away with not removing the UUIDs when we are in the signal handler.
The one case that could be an issue is if the signal was raised during the insertion of the UUID into live_comm_uuids. If we want to be safe in this case, I believe we could use sigprocmask to turn signals off when adding/removing UUIDs, and then turn them back on.
There was a problem hiding this comment.
I see what you mean. I'll change it to read from live_comm_uuid and reconstruct the shm paths in handler.
…s for comm_stats data storage.
…ined handler; Tests not yet refactored
Implements features to be used in coordination with YGM-top.
Adds shared memory backing capability for ygm communicator stats, allowing external reader to see comm stats data during live program runs.
Also adds cleanup infrastructure to account for abnormal program exit conditions where possible.