Sync to CICE-Consortium (2026-02-08) #108
…rs to remove residual ice (CICE-Consortium#1067) Removes residual amounts of ice that are not otherwise handled by the numerics. The controlling parameters (itd_area_min and itd_mass_min, implemented in Icepack) set minimum ice area and mass values below which all ice is removed following the thermodynamics and ridging calculations. For the B-grid, these parameters are currently set to the dynamics stability minima, which are being reduced to extremely small values based on testing in multiple modeling systems. If needed, users can revert these parameters to the original, larger values by adding them to ice_in. Setting them to 0 turns off the new zapping completely. These parameters are set to the larger, original values in the C-grid test scripts, pending further work. This updates Icepack and changes answers.
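The thresholding described in this commit message can be sketched in a few lines of Python. This is only an illustration, not the Icepack implementation: the function name `zap_residual`, the per-cell `(area, mass)` layout, the rule that either threshold alone triggers removal, and all numeric values are assumptions. Only the parameter names `itd_area_min` and `itd_mass_min` and the "0 turns zapping off" behavior come from the text above.

```python
def zap_residual(cells, itd_area_min, itd_mass_min):
    """Return a copy of cells with residual ice removed.

    cells: list of (area, mass) pairs, one per grid cell (hypothetical layout).
    A cell is zapped when its area or its mass is below the matching minimum
    (assumed rule). With both minima set to 0, no non-negative value is
    below the threshold, so nothing is removed.
    """
    zapped = []
    for area, mass in cells:
        if area < itd_area_min or mass < itd_mass_min:
            zapped.append((0.0, 0.0))   # remove all ice in this cell
        else:
            zapped.append((area, mass))
    return zapped


# Hypothetical values, chosen only to exercise each branch.
cells = [(0.5, 400.0), (1.0e-12, 1.0e-9), (0.0, 0.0)]
print(zap_residual(cells, itd_area_min=1.0e-11, itd_mass_min=1.0e-10))
print(zap_residual(cells, itd_area_min=0.0, itd_mass_min=0.0))
```

The second call leaves every cell untouched, which matches the statement that setting both parameters to 0 disables the new zapping.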
After an upgrade, several of Carpenter's modules were removed. These changes update the modules and software versions used to compile and run CICE.
There is a great deal of confusion about how various history variables are time-averaged; see, e.g., "SIMIP history output implementation" (CICE-Consortium#1038) and "Fixes for sitemptop, sitempbot, and sitempsnic" (CICE-Consortium#1054). This PR attempts to clarify the situation. These averages are also relevant for conservative coupling.
…ormat) (CICE-Consortium#1079) Add the ability to read an extended grid (supported for the pop netcdf file format). Add subroutine popgrid_nc_ext to read an extended-grid pop netcdf file. Add 'nc_ext' option to grid_format namelist. The extended grid applies to the kmt file as well, since both are specified by the same grid_format namelist. Modify gridbox_verts to operate on a local array instead of a global array; this should improve performance and removes redundant extrapolation calculations. This approach also supports both regular and extended grid reads. The implementation in subroutine popgrid_nc_ext largely duplicates subroutine popgrid_nc, but for an extended grid. The extended grid represents the active points plus the full halo. As much as possible, the extended grid (LON, LAT, ANGLE, KMT) is read in on the halo instead of being computed. For some grid metrics (DXT, DYT, DXU, DYU, etc.), extrapolation onto the halo is still required. Remove some trailing blanks in other places as needed.
Adds a namelist flag to allow significant wave height to be passed into the ice model from a coupler. In addition, this PR moves wave_spec_height out of icepack interface argument lists, since it is initialized via icepack_init_parameters. See CICE-Consortium/Icepack#545. Update Icepack to #0bcde255637a594. Update ice_step_mod.F90 in the opticep unit test to be consistent with the latest changes. --------- Co-authored-by: apcraig <anthony.p.craig@gmail.com>
Updated all of the variable names, long names, and units to correspond to the CMIP7 data request. Added new variables requested in the CMIP7 data request. Added documentation about the CMIP6 to CMIP7 update. Simplified the accumulation of some fields where possible and added prognostic sea ice density. Added accumulation of variables relative to aice_init or aice. Bug fix for flwout (sifllwutop) where aice_init = 0, but aice > 0. Bug fix for shortwave absorbed and albedo computation (more coming later). Bug fix: some variables that were scaled by aice should be multiplied by aice (not aice_init) to get the _ai quantities, including fswabs, fsens, flat, etc. Removed the f_CMIP flag and added a set_nml.cmip option instead. Added a comment field for SIMIP variables that uses part of the description field in the CMIP data request table. Added a long_name field to address the issue "time_bounds, lat?_bounds, lon?_bounds attributes" (CICE-Consortium#1057). Partly addresses aice versus aice_init: "aice vs. aice/aice_init factor in ice_history" (CICE-Consortium#1033). Partial fix for albedo variables: "[albedo]_ai history variables over 100%" (CICE-Consortium#1051). Addresses the issue "Some CMIP variables are computed using a mix of U and T quantities" (CICE-Consortium#904).
Add history restart to netcdf and pio IO options. Binary was not included due to the complexity of having to track history fields in binary files. History restart files are written automatically for history streams that are averaged and when a restart is written during the middle of a history accumulation period. There is one history restart file per history stream. Files are written in the restart directory using the history name, an appended "_r[histfreq]", and the model date. An ice_read_hist subroutine was added to the ice_history_write.F90 file. For binary, calling this returns with a warning message that history restarts are not implemented. When history restarts are read, the model will only read files and fields that are found and continue with the accumulator initialized to zero for fields that are not found. For production runs, this should work fine. If a user modifies the history streams in the middle of a run, then an assessment should be made of which fields are valid on the first restart run. The history restart files are basically history files, written at double precision, writing the accumulated fields. In addition, some additional fields are written, including time_beg, avgct, albcnt, and snwcnt, which represent accumulation counters for time-averaged history output. A new histall10d set_nml option was added that turns on 3 averaged history streams and all history fields. When used in a restart test, the scripts will verify bit-for-bit history files and history restart files across the restart. Several tests were added to the io_suite to include formal testing of bit-for-bit history restarts. Two fields, mlt_onset and frz_onset, are not turned on with histall10d because they do not restart properly and they are unable to restart bit-for-bit on the history file; see CICE-Consortium#1068. Several history fields have a bug in them and have been written out incorrectly, and these bugs were fixed. 
The bug in these cases was that the fields were accumulated during the timestep across categories but were not zeroed out at the start of the timestep. As a result, those fields were accumulating over the entire run incorrectly. The fields that had to be zeroed out were evaps and evaps plus upNO, upNH, bTiz, bphi, iDi, and iki associated with bgc. The bit-for-bit history restart test discovered these errors. Add a new namelist, write_histrest, to turn off history restart writing. The default is that history restarts are on. Update set_nml.cmip to fix an error in f_apond_ai setting.
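The accumulation bug described above follows a common pattern, sketched here in Python under stated assumptions. This is not the CICE code: the function `run`, the constant `NCAT`, and the contribution of 1.0 per category are made up for illustration. The sketch only shows why a field summed over categories must be zeroed at the start of each timestep.

```python
NCAT = 3  # number of thickness categories (hypothetical)


def run(n_steps, zero_each_step):
    """Return the per-step value of a field that is summed over categories."""
    field = 0.0
    per_step = []
    for _ in range(n_steps):
        if zero_each_step:
            field = 0.0          # the fix: reset before the category loop
        for _ in range(NCAT):
            field += 1.0         # each category contributes 1.0 (made-up value)
        per_step.append(field)
    return per_step


print(run(3, zero_each_step=False))  # keeps growing over the run
print(run(3, zero_each_step=True))   # same value every step
```

Without the reset, the value written at each step includes every earlier step's contribution, which is the "accumulating over the entire run" symptom. A bit-for-bit restart test can expose this kind of error because a restarted run begins from a fresh accumulator while the continuous run does not.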
Update Copyright to 2026. Remove trailing whitespace. Update Icepack to #2f31ee37f3a70, Icepack v1.5.3.
Bug fix for lwout in CESM driver. Also some FSD stuff for coupling. Fix define for sitimefrac. Add the CESM3 namelist changes.
…ICE-Consortium#1089) Update Icepack to #daa41638c6cef to include "Enforce minimum snow grain radius" (CICE-Consortium#552). If the snow grain radius is set to zero, possibly because of zapping small ice or if ice disappears mid-timestep, then updates of snow grain radius will produce NaNs. Snow grain radius is usually bounded between a min and max so this generally doesn't happen, but a recent coupled E3SM bgc run crashed with this error. While the error seems to be relatively rare, this bug fix changes answers when the snow grain radius is nonzero but still less than the minimum.
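A minimal Python sketch of enforcing the bound, assuming a simple clamp. The function name `enforce_snow_grain_radius`, the bounds `RSNW_MIN` and `RSNW_MAX`, and the sample radii are hypothetical and are not taken from Icepack; the sketch only shows which inputs change under such a fix.

```python
def enforce_snow_grain_radius(rsnw, rsnw_min, rsnw_max):
    """Clamp a snow grain radius into [rsnw_min, rsnw_max]."""
    return max(rsnw_min, min(rsnw, rsnw_max))


# Hypothetical bounds and inputs.
RSNW_MIN, RSNW_MAX = 50.0, 1500.0
for r in (0.0, 10.0, 300.0):
    print(r, "->", enforce_snow_grain_radius(r, RSNW_MIN, RSNW_MAX))
```

The zero input corresponds to the failure case in the commit message, the small nonzero input is the case where answers change, and a radius already inside the bounds passes through unchanged.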
Add new author(s)
Derecho shared node jobs intermittently abort with error message "start failed on dec2436: No reply from shepherd after 108s" due to PBS/MPI launch conflicts. Derecho qstat output was also recently changed to return output for completed jobs which prevented the job checking scripts from identifying jobs that have completed. Update derecho shared batch job submission to both increase the number of shared node jobs and control the number of jobs per shared node by submitting the shared jobs on more cores than needed. In the end, an upgrade to PBS seemed to fix the shared node aborts, so this change was commented out in the PR. Derecho will continue to be closely watched. Fix potential bug in setting ICE_MACHINE_QSTAT if the string has spaces in it. Update job checking logic to avoid PBS output that shows completed jobs, added -v " historical ". This is far from ideal and not particularly future proof, but PBS qstat has become a mess. Update create fails to identify test suite jobs that failed to run then generate a script to resubmit them.
|
Could we get a review of this PR so that we can schedule its WM parent PR 3086? |
|
Thanks @gspetro-NOAA . I'd like to check the new baselines for a few more days and then I'll request reviews |
Minor fix to initialize worka=0 for sifb history variable accumulation (like elsewhere in ice_history) so that uninitialized values are not accumulated. This is needed to fix out-of-range history values for sifb (in UFS).
Initialize worka for sifb
|
@DeniseWorthen Any chance you can review this PR so that we can process ufs-community/ufs-weather-model#3086 ? |
|
Sure, I had looked at it previously but didn't formally approve. |
    enddo
    enddo
    do j = 1,ny_block
    do i = 1,nx_block
       if (kmt(i,j,iblk) >= p5) hm(i,j,iblk) = c1
    enddo
    enddo
TODO: check kmt and hm mpi exchange after this
yes, in makemask
    enddo
    enddo
    endif
    call scatter_global(work1, work_g2, &
TODO: gridbox_verts used to scatter_global
|
Can I check in with you here @apcraig @DeniseWorthen ? The changes brought in here to update EMC/CICE to CICE-Consortium fail to reproduce when changing the number of MPI tasks in several ufs-weather-model regression tests (ufs-community/ufs-weather-model#3086 (comment)). The changes in cicecore/cicedyn/infrastructure/ice_grid.F90 are really all I see. Does any of this look suspicious to you? |
|
OK, just deleted my last comment. Our standalone testing suggests bit-for-bit results when running different tasks/threads. @NickSzapiro-NOAA, are you suggesting that is no longer the case in UFS or are you asking whether the CICE answers have changed relative to UFS current version? |
|
These UFS mpi regression tests pass with CICE at CICE-Consortium#1054 @apcraig . That is no longer the case for CICE at top of CICE-Consortium/main |
|
OK, CICE-Consortium#1054 is Nov, 2025. Since then answer changes were introduced in CICE-Consortium#1067, CICE-Consortium#1089 if you're using the new snow physics, |
|
Maybe I should be clearer. The baselines have changed (with zap residual). Coupled runs using different MPI tasks don't match each other |
|
OK, I understand now. Standalone CICE testing suggests runs are bit-for-bit with different block size, task, and thread counts. Let me know if I can help pin this issue down. |
|
Great that CICE standalone passes @apcraig ! We use grid_format='nc' so that narrows things. I can only figure it's related to CICE-Consortium#1079 , particularly in the changes around gridbox_verts as now |
|
Another idea. We do not test a MOM grid. It would be great to add one. Maybe someone can provide a coarse grid we can use. But, it's possible something was missed along the way related to the MOM grid implementation? |
|
I'll try to isolate the troublesome commit. Well, @DeniseWorthen has made several MOM fix files from 1/4 to 9 degrees: I'm sorry, I should know more about CICE standalone+unit testing @apcraig |
|
@NickSzapiro-NOAA, there are no standalone tests with any MOM grids. I will try to setup some standalone testing with one of the lower resolution grids. Separately, I encourage you to try to identify when the problem was introduced. There were several updates over the last few months, including some that impacted infrastructure. |
|
Thanks @apcraig . fwiw, reverting extended grid commit 26a5cfe doesn't fix it. I'm trying the zap residual and history changes too, separately. Denise found that the first restart reproduces, but the next one doesn't ... maybe the thresholding in zap residual is sensitive to the order of operations across tasks (?) |
|
My understanding of zap residual is that it's just local and the parameters are identical at all grid points. I would be surprised if that result had a decomposition issue. But definitely worth confirming. |
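The distinction drawn here, between a purely local threshold and an operation whose result depends on ordering, can be sketched in Python. This is only an illustration with made-up data and a made-up threshold; the helper names `pointwise_zap` and `blocked` are hypothetical and nothing here is taken from CICE or Icepack.

```python
def pointwise_zap(values, threshold):
    """Apply the same threshold to each value independently."""
    return [0.0 if v < threshold else v for v in values]


def blocked(values, block_size, func):
    """Apply func block by block and concatenate, mimicking a decomposition."""
    out = []
    for start in range(0, len(values), block_size):
        out.extend(func(values[start:start + block_size]))
    return out


values = [0.1, 0.2, 0.3, 1.0e-13, 0.4, 0.0]  # made-up data
threshold = 1.0e-11                           # made-up threshold

# Pointwise operation: identical for any block size.
whole = pointwise_zap(values, threshold)
same = all(
    blocked(values, n, lambda b: pointwise_zap(b, threshold)) == whole
    for n in (1, 2, 3, 6)
)
print("pointwise identical across block sizes:", same)

# Reduction: grouping changes the rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print("sums equal:", left == right)
```

A threshold applied point by point with identical parameters gives the same answer however the domain is split, whereas a floating-point sum grouped differently can differ in the last bits. That is why a decomposition dependence would more plausibly come from a reduction or a halo exchange than from the zap itself.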
|
Quick summary is that there is a lack of reproducibility in this update when changing the number of MPI tasks for 3 UFS RTs (cpld_mpi_p8, cpld_mpi_gfsv17, cpld_mpi_pdlib_p8). Towards isolating the problem, I tried these today:
Also, there is a code difference in ice_read_hist for netcdf vs. pio if the history restart file does not exist |
|
Changing code and fixing some cases sounds to me like a compiler issue. Maybe the next step is to reduce the optimization of the compilation in the ice model just to see if that makes the problem go away. If it does, then that doesn't absolve the ice model completely, but might point us one way or another. Is there something about the compilation in UFS that is too aggressive? Does this happen on all machines? Has the CICE code update created a situation where the compiler is being too aggressive or generating an error? Can a CICE modification provide a temporary fix? What machine does this happen on? What compiler is being used? What compiler options are being used? I could try to duplicate in the standalone model. |
|
Thanks @apcraig . It's the same failure on Ursa, Derecho, and Gaea.c6 at least, all with Intel OneAPI as in spack-stack 1.9.2 like I haven't tried with less optimization and will test in debug mode It's puzzling ... it's only happening on the 1 degree mesh coupled to active atmosphere. |
|
@NickSzapiro-NOAA Using the |
|
Interesting. Is there ice in Hudson Bay with that @DeniseWorthen ? Maybe RTs have ice on land or such |
|
Just throwing out some additional ideas. Could there be a mismatch with the mask on the grid and initial condition? I don't know why this would be a problem though. Could there be an issue with a diagnostic that is seeing sea ice on land from the initial condition? What is not bit-for-bit, does the entire model solution diverge or are just some sea ice diagnostics different? |
|
So diffs are small (~1.0e-7) and only in Hudson Bay. There are no local mods in your version, right? There was a change, CICE-Consortium#1062, that updated the haloUpdate and other infrastructure features quite dramatically. This should have been bit-for-bit and it came before CICE-Consortium#1054, so I assume it's not the problem. Your CICE-Consortium#1054 "version" has CICE-Consortium#1062 merged too, right? Maybe a haloUpdate has changed somewhere and during initialization, halo values are not updated when they should be. That would be consistent with different results with different block sizes. Can you run on 1 pe with different block sizes? If not, how about a fixed pe count with different block sizes? |
|
So in addition to cpld_mpi_p8 , cpld_mpi_gfsv17 , cpld_mpi_pdlib_p8 all failing in "intel" compiler options across HPCs. The EMC/CICE branch is kept similar to CICE-Consortium/main. Currently, the only code difference is not relevant (i.e., the UFS tracing PR has been merged here but not yet at Consortium) fyi, these "mpi" tests change the number of tasks for each coupled component from their control, like: Most notably, if I change only the ice MPI tasks from the control and keep the other components with the same tasks, tests pass in intel debug mode on Ursa. So it's weird ... how can the code changes in CICE relate to changing the number of tasks in another component? Since diffs are in the first coupling interval, I imagine it's ice related to ATM or CMEPS. On another line of thought, I'm still curious why reverting the CMIP7 history PRs changed a test failure to a pass. Maybe flwout? |
|
@NickSzapiro-NOAA I think you're on to something here w/ the ATM min_seaice parameter. It would explain why I can't get a DATM config to fail. I don't remember checking explicitly, but if it is in ATM, I think we'd see that the fields toATM on the first coupling timestep are the same, but the first fields from ATM (2nd coupling interval) are different. |
|
Several thoughts. The CMIP7 history PRs do change model output and diagnostics but do not alter the prognostic solution. But not sure how that affects answer changes with different pes. Can you clarify a few things. If you revert the zap PR, CICE-Consortium#1067, either by undoing the PR or setting do you recover the prior bit-for-bit capability? I know you tested these set to zero, but that turns it off. What you want to do is "turn it up" to recover the prior settings. If that has an impact, my guess is there is some interaction (still TBD) between the initial condition, the zapping parameter, and the coupling MIN_SEAICE. Although none of those things should be block/task variable. With the smaller values currently implemented (or setting the two parameters to zero), you will carry around more small ice concentrations. Maybe, with the old dyn_*_min values, the MIN_SEAICE was never invoked but now it is? Do your initial conditions have small ice concentrations that used to get zapped and now don't, particularly in the Labrador? |
|
Let me take a few more hours to clean up a branch with reproducer(s) For your clarifying points, we still have with a todo to try to reduce these with current UFS regression tests. And no, I was not able to keep current answers after the zap_residual PR. UFS tests do have concentrations down to ~ puny . It seemed that the added For UFS, maybe a good constraint is to have dyn_area_min < min_seaice so CICE is solving over the ice atmosphere sees. If it all works... |
|
I started a ufs-weather-model reproducer branch here: Two points to highlight:
I made the reproducer branch pointing to CICE-Consortium/main , if that's cleaner And thanks for the dialogue. It's been so helpful |
|
Changing ice+ocn and ice+wav tasks reproduces the control, but changing ice+atm+med fails. So the lack of reproducibility is in UFSATM or CMEPS. Tracing fice through the code, current suspect is when use_cice_alb this is uninitialized if fice<min_seaice: The code is in 2 places for some reason (?) |
|
For my test case, the first differences come back from the ATM; ICE sends the identical fields I think we need to track back in ATM commits to find the culprit. |
|
Changing line in ufsatm/io/fv3atm_sfc_io.F90 I can try some older atmosphere hashes but there's no guarantee this ever worked (as zap residual is new). At what point do we make this a UFSATM issue? |
|
My thinking is that a) all the differences show up only on tile3 and only in that specific region; we're zapping ice globally, so why don't we see differences globally? b) ATM gets identical values from ICE w/ either mpi or control but sends back diff values c) the control reproduces itself (not definitive, but less likely to be an uninitialized variable) d) I can't get a DATM config to fail. I think it has to be a decomp bug in the ATM. I've also edited my comment associated w/ the ice difference field from the mediator ice history files (above). This difference is the 2nd coupling timestep. On the first coupling, the ice mediator fields are identical. At the 2nd, the ice sends back diff values because it got different values from ATM at the first coupling. |

For detailed information about submitting Pull Requests (PRs) to the CICE-Consortium,
please refer to: https://github.com/CICE-Consortium/About-Us/wiki/Resource-Index#information-for-developers
PR checklist
Sync CICE-Consortium/main into EMC fork, including baseline changes for zapping residual ice, bug fixes in history variables, and new CMIP7 history variables. Also adds history restart feature.
See PRs at CICE-Consortium
UFS regression testing (Update CICE (2026-02) ufs-community/ufs-weather-model#3086) and preceding CICE-Consortium testing
EMC/CICE sync, including baseline changes for zapping residual ice, bug fixes in history variables, and new CMIP7 history variables. Also adds history restart feature. Closes #109