Skip to content

[WIP] Add Portable::MatrixFree-based MechanicalOperatorDevice#467

Draft
masterleinad wants to merge 6 commits intoadamantine-sim:masterfrom
masterleinad:matrixfree-mech-operator
Draft

[WIP] Add Portable::MatrixFree-based MechanicalOperatorDevice#467
masterleinad wants to merge 6 commits intoadamantine-sim:masterfrom
masterleinad:matrixfree-mech-operator

Conversation

@masterleinad
Copy link
Copy Markdown
Collaborator

No description provided.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the MechanicalOperatorDevice class and its template implementation to support GPU-accelerated mechanical simulations using Kokkos and deal.II. It also updates the MechanicalPhysics class to integrate this new operator. Key feedback includes addressing inverted instantiation logic, removing hardcoded finite element degrees, and adding a null check for triangulation casting. Additionally, the reviewer suggested optimizing the cell-to-quadrature mapping and simplifying stress calculations using tensor operations.

Comment on lines +72 to +77
if constexpr (std::is_same_v<MemorySpaceType, dealii::MemorySpace::Host>)
_mechanical_operator = std::make_unique<
MechanicalOperatorDevice<dim, 1, n_materials, p_order, MaterialStates>>(
communicator, _material_properties);
else
Kokkos::abort("Not implemented");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The if constexpr logic appears to be inverted. MechanicalOperatorDevice is intended for device-accelerated execution using Kokkos and Portable::MatrixFree, yet it is currently instantiated only when MemorySpaceType is dealii::MemorySpace::Host. Conversely, the implementation aborts for other memory spaces (like device spaces). This likely prevents the code from running on GPUs as intended.

_mechanical_operator =
if constexpr (std::is_same_v<MemorySpaceType, dealii::MemorySpace::Host>)
_mechanical_operator = std::make_unique<
MechanicalOperatorDevice<dim, 1, n_materials, p_order, MaterialStates>>(
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The fe_degree template parameter for MechanicalOperatorDevice is hardcoded to 1. This contradicts the fe_degree argument passed to the MechanicalPhysics constructor (line 29) and prevents the use of higher-order finite elements. Since MechanicalOperatorDevice requires the degree at compile-time, you may need to template MechanicalPhysics on the degree or use a dispatch mechanism to instantiate the correct operator.

* Pointer to the MechanicalOperatorDevice
*/
std::unique_ptr<
MechanicalOperatorDevice<dim, 1, n_materials, p_order, MaterialStates>>
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The fe_degree template parameter is hardcoded to 1 here as well. This should be consistent with the degree used throughout the physics class.

Comment on lines +102 to +105
_n_owned_cells =
dynamic_cast<dealii::parallel::DistributedTriangulationBase<dim> const *>(
&dof_handler.get_triangulation())
->n_locally_owned_active_cells();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The result of dynamic_cast to dealii::parallel::DistributedTriangulationBase<dim> const * is not checked for nullptr before dereferencing. If the triangulation is not distributed (e.g., a serial triangulation), this will cause a crash.

Comment on lines +113 to +133
auto graph = _matrix_free.get_colored_graph();
unsigned int const n_colors = graph.size();
for (unsigned int color = 0; color < n_colors; ++color)
{
auto gpu_data = _matrix_free.get_data(color);
unsigned int const n_cells = gpu_data.n_cells;
auto gpu_data_host = dealii::Portable::copy_mf_data_to_host<dim, double>(
gpu_data, _matrix_free_data.mapping_update_flags);
for (unsigned int cell_id = 0; cell_id < n_cells; ++cell_id)
{
auto cell = graph[color][cell_id];
std::vector<unsigned int> quad_pos(n_q_points_per_cell);
for (unsigned int i = 0; i < n_q_points_per_cell; ++i)
{
unsigned int const pos =
gpu_data_host.local_q_point_id(cell_id, n_q_points_per_cell, i);
quad_pos[i] = pos;
}
_cell_it_to_mf_pos[cell] = quad_pos;
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This block for building the cell-to-quadrature-position mapping is inefficient. Specifically, dealii::Portable::copy_mf_data_to_host is called inside the color loop (line 119), leading to multiple host-device transfers. Additionally, using a std::map with cell iterators as keys (line 131) is slow for large meshes. Consider using a more direct indexing scheme or a vector-based lookup if possible.

Comment on lines +58 to +65
dealii::Tensor<2, dim, double> stress;
for (unsigned int i = 0; i < dim; ++i)
for (unsigned int j = 0; j < dim; ++j)
{
stress[i][j] = 2.0 * mu * eps[i][j];
if (i == j)
stress[i][j] += lambda * trace_eps;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The stress calculation can be simplified using dealii tensor operations for better readability.

          dealii::Tensor<2, dim, double> stress = 2.0 * mu * eps;
          for (unsigned int i = 0; i < dim; ++i)
            stress[i][i] += lambda * trace_eps;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant