add high-level PETSc SNES interface for nonlinear problems#19
jonas-heinzmann wants to merge 6 commits into bpachev:main from …
Conversation
…ES (resembling dolfinx.fem.petsc.NonlinearProblem), and add demo solving a nonlinear problem with said interface class
…residual function
@bpachev One common use case for nonlinear problems is to apply the BCs incrementally, i.e. to have some sort of load stepping and update the values of the BCs with the load step (or one might have time-dependent BCs). It seems to me that this does not work if one uses the packed …

@jonas-heinzmann The CUDADirichletBC object has an …

I see, sorry for missing that! What do you think of calling it in the …

I think that is a great idea. I have very little documentation currently, so it is easy to miss. The method does require the list of boundary conditions that need to be updated, but it should be fine to update all of them just in case.
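To make the load-stepping idea concrete, here is a minimal plain-Python sketch of the loop structure. solve_step is a toy stand-in for the nonlinear solve, and the commented bc/cuda_bc lines only mark where the host-side BC data would be changed and then pushed to the device via the update method discussed above; those names and signatures are assumptions, not the verified cuDOLFINx API.

```python
# Hypothetical sketch of incremental BC application (load stepping).
def solve_step(u_prev, bc_value):
    # Toy stand-in for a nonlinear solve at a fixed load level;
    # it relaxes the solution toward the current boundary value.
    return 0.5 * (u_prev + bc_value)

full_load = 1.0
n_steps = 10
u = 0.0
for k in range(1, n_steps + 1):
    bc_value = full_load * k / n_steps  # ramp the boundary value
    # bc.g.value[...] = bc_value        # update host-side BC data (illustrative)
    # cuda_bc.update([bc])              # push all BCs to the GPU, just in case
    u = solve_step(u, bc_value)
print(round(u, 4))  # → 0.9001
```

The point of the pattern is that the device-side copy of the BC values must be refreshed after every host-side change, before the next solve.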
Previously, the CUDAVector binding had a hardcoded …

@bpachev I believe this should be ready now, could you take another look? Thanks a lot!
bpachev left a comment:
Good catch on the ghosting for x0 - this is another reason to rework the API to directly use PETSc Vectors. Would you be able to address the comments about copying b and using ghost_layer_mesh in the demo? Also, have you tested to ensure that the results on the GPU are the same as those on the CPU, particularly in parallel?
python/cudolfinx/petsc.py (outdated)

    )
    # copy assembled vector to PETSc work vector
    self._cuda_b.vector.copy(b)
If you create a CUDAVector for b like what you have for x and use that in the assembly routines, you can avoid this copy.
I used nsys to profile the example you shared, with and without CUDAVector for b.
Original:
** CUDA GPU MemOps Summary (by Size) (cuda_gpu_mem_size_sum):
Total (MB) Count Avg (MB) Med (MB) Min (MB) Max (MB) StdDev (MB) Operation
---------- ----- -------- -------- -------- -------- ----------- ------------------------------
541.068 653 0.829 0.003 0.000 65.059 4.935 [CUDA memcpy Host-to-Device]
192.048 453 0.424 0.042 0.002 3.903 0.531 [CUDA memcpy Device-to-Device]
165.536 427 0.388 0.003 0.000 7.712 1.110 [CUDA memset]
108.620 391 0.278 0.000 0.000 32.529 2.019 [CUDA memcpy Device-to-Host]
Using CUDAVector instead of copying:
** CUDA GPU MemOps Summary (by Size) (cuda_gpu_mem_size_sum):
Total (MB) Count Avg (MB) Med (MB) Min (MB) Max (MB) StdDev (MB) Operation
---------- ----- -------- -------- -------- -------- ----------- ------------------------------
541.068 653 0.829 0.003 0.000 65.059 4.935 [CUDA memcpy Host-to-Device]
179.314 441 0.407 0.042 0.002 3.903 0.528 [CUDA memcpy Device-to-Device]
165.536 427 0.388 0.003 0.000 7.712 1.110 [CUDA memset]
108.620 391 0.278 0.000 0.000 32.529 2.019 [CUDA memcpy Device-to-Host]
The impact on runtime is pretty negligible since that is dominated by the linear solver. But it does avoid the copy and does not increase memory movement elsewhere.
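As a quick sanity check, the two summaries above are consistent: the drop in device-to-device traffic divides evenly over the removed copy operations. The numbers are copied directly from the nsys tables.

```python
# Device-to-device traffic before and after the change, from the nsys
# summaries above (MB totals and operation counts).
before_mb, before_ops = 192.048, 453
after_mb, after_ops = 179.314, 441

saved_mb = before_mb - after_mb      # total traffic avoided
saved_ops = before_ops - after_ops   # number of copies avoided
print(round(saved_mb, 3), saved_ops, round(saved_mb / saved_ops, 3))
# → 12.734 12 1.061  (about 1 MB per avoided copy of b)
```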
Good point! I now wrapped the b provided by PETSc into a CUDAVector inside _assemble_residual.
    def main(res: int = 30, degree: int = 1, dim: int = 3, cuda: bool = True):
        """solve a nonlinear problem on a CPU or GPU"""

        domain = create_mesh(res, dim=dim)
In order for the demo to work in the CUDA case with more than one process, you need to do domain = cudolfinx.ghost_layer_mesh(domain), like in the Poisson example. Have you tested the correctness of your interface with multiple MPI processes?
I had this just a few lines below that, so the demo also works in parallel. But good point, I added a test in python/test/test_nonlinear_problem.py to compare CPU and GPU results (which also works in parallel).
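The shape of such a CPU-vs-GPU regression test can be sketched without any FEniCSx dependency. Here main is a deterministic stand-in for the demo's entry point, and all names are illustrative; the real test would call the actual solver twice and compare the gathered solution vectors.

```python
import numpy as np

# Stand-in for the demo's main(): returns the solution array for a given
# resolution, computed with cuda=True or cuda=False. Both paths produce the
# same deterministic data, modulo a tiny perturbation mimicking device
# round-off.
def main(res=10, cuda=True):
    rng = np.random.default_rng(res)  # deterministic "mesh" data
    u = rng.random(res)
    return u + (1e-12 if cuda else 0.0)

u_gpu = main(cuda=True)
u_cpu = main(cuda=False)
# GPU and CPU results should agree up to solver/round-off tolerance
assert np.allclose(u_gpu, u_cpu, rtol=1e-8, atol=1e-10)
print("CPU and GPU solutions match")
```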
… hence to avoid a copy in _assemble_residual function
force-pushed from ae754d5 to a887842
force-pushed from a887842 to 5d59b61
Adds a NonlinearProblem class in python/cudolfinx/petsc.py, which is adapted from and directly resembles dolfinx.fem.petsc.NonlinearProblem, as well as a demo in python/examples making use of this class.

Special attention is necessary to make the SNES line searches work: these rely on internal work vectors for both the solution iterate and the residual to be assembled at that iterate. This necessitates copies between the assembled vectors and the PETSc work vectors.
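The role of those work vectors can be illustrated with a generic Newton iteration plus backtracking line search (plain Python, no PETSc or cuDOLFINx; this is only the shape of what SNES does internally): the line search evaluates the residual at trial iterates x + t*dx held in solver-owned storage, which is why the residual callback must assemble at whatever vector the solver hands it, not at the previous solution.

```python
# Generic Newton + backtracking line search, a toy model of the SNES flow.
def F(x):                  # toy residual with a root at sqrt(2)
    return x * x - 2.0

def J(x):                  # Jacobian of F
    return 2.0 * x

x = 1.0
for _ in range(20):
    dx = -F(x) / J(x)      # Newton direction from the Jacobian solve
    t = 1.0
    # The residual is (re)assembled at the trial iterate x + t*dx, which in
    # SNES lives in an internal work vector, not in the previous solution.
    while abs(F(x + t * dx)) >= abs(F(x)) and t > 1e-4:
        t *= 0.5           # backtrack until the residual decreases
    x = x + t * dx
print(round(x, 8))         # → 1.41421356
```

In the real interface the same structure means the residual callback receives x and b from SNES, so the GPU-side assembly has to read from and write to exactly those vectors (or copy between them, as the description notes).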
Output for said example with python nonlinear.py --res 50 and with python nonlinear.py --res 50 --no-cuda: