[WIP] Add Portable::MatrixFree-based MechanicalOperatorDevice#467
[WIP] Add Portable::MatrixFree-based MechanicalOperatorDevice#467masterleinad wants to merge 6 commits intoadamantine-sim:masterfrom
Conversation
…vmult/vmult_add and device material parameters)
There was a problem hiding this comment.
Code Review
This pull request introduces the MechanicalOperatorDevice class and its template implementation to support GPU-accelerated mechanical simulations using Kokkos and deal.II. It also updates the MechanicalPhysics class to integrate this new operator. Key feedback includes addressing inverted instantiation logic, removing hardcoded finite element degrees, and adding a null check for triangulation casting. Additionally, the reviewer suggested optimizing the cell-to-quadrature mapping and simplifying stress calculations using tensor operations.
| if constexpr (std::is_same_v<MemorySpaceType, dealii::MemorySpace::Host>) | ||
| _mechanical_operator = std::make_unique< | ||
| MechanicalOperatorDevice<dim, 1, n_materials, p_order, MaterialStates>>( | ||
| communicator, _material_properties); | ||
| else | ||
| Kokkos::abort("Not implemented"); |
There was a problem hiding this comment.
The if constexpr logic appears to be inverted. MechanicalOperatorDevice is intended for device-accelerated execution using Kokkos and Portable::MatrixFree, yet it is currently instantiated only when MemorySpaceType is dealii::MemorySpace::Host. Conversely, the implementation aborts for other memory spaces (like device spaces). This likely prevents the code from running on GPUs as intended.
| _mechanical_operator = | ||
| if constexpr (std::is_same_v<MemorySpaceType, dealii::MemorySpace::Host>) | ||
| _mechanical_operator = std::make_unique< | ||
| MechanicalOperatorDevice<dim, 1, n_materials, p_order, MaterialStates>>( |
There was a problem hiding this comment.
The fe_degree template parameter for MechanicalOperatorDevice is hardcoded to 1. This contradicts the fe_degree argument passed to the MechanicalPhysics constructor (line 29) and prevents the use of higher-order finite elements. Since MechanicalOperatorDevice requires the degree at compile-time, you may need to template MechanicalPhysics on the degree or use a dispatch mechanism to instantiate the correct operator.
| * Pointer to the MechanicalOperatorDevice | ||
| */ | ||
| std::unique_ptr< | ||
| MechanicalOperatorDevice<dim, 1, n_materials, p_order, MaterialStates>> |
| _n_owned_cells = | ||
| dynamic_cast<dealii::parallel::DistributedTriangulationBase<dim> const *>( | ||
| &dof_handler.get_triangulation()) | ||
| ->n_locally_owned_active_cells(); |
| auto graph = _matrix_free.get_colored_graph(); | ||
| unsigned int const n_colors = graph.size(); | ||
| for (unsigned int color = 0; color < n_colors; ++color) | ||
| { | ||
| auto gpu_data = _matrix_free.get_data(color); | ||
| unsigned int const n_cells = gpu_data.n_cells; | ||
| auto gpu_data_host = dealii::Portable::copy_mf_data_to_host<dim, double>( | ||
| gpu_data, _matrix_free_data.mapping_update_flags); | ||
| for (unsigned int cell_id = 0; cell_id < n_cells; ++cell_id) | ||
| { | ||
| auto cell = graph[color][cell_id]; | ||
| std::vector<unsigned int> quad_pos(n_q_points_per_cell); | ||
| for (unsigned int i = 0; i < n_q_points_per_cell; ++i) | ||
| { | ||
| unsigned int const pos = | ||
| gpu_data_host.local_q_point_id(cell_id, n_q_points_per_cell, i); | ||
| quad_pos[i] = pos; | ||
| } | ||
| _cell_it_to_mf_pos[cell] = quad_pos; | ||
| } | ||
| } |
There was a problem hiding this comment.
This block for building the cell-to-quadrature-position mapping is inefficient. Specifically, dealii::Portable::copy_mf_data_to_host is called inside the color loop (line 119), leading to multiple host-device transfers. Additionally, using a std::map with cell iterators as keys (line 131) is slow for large meshes. Consider using a more direct indexing scheme or a vector-based lookup if possible.
| dealii::Tensor<2, dim, double> stress; | ||
| for (unsigned int i = 0; i < dim; ++i) | ||
| for (unsigned int j = 0; j < dim; ++j) | ||
| { | ||
| stress[i][j] = 2.0 * mu * eps[i][j]; | ||
| if (i == j) | ||
| stress[i][j] += lambda * trace_eps; | ||
| } |
No description provided.