
Status Updates


Semi-regular status updates on libasdf development progress

2025-10-10

Progress has continued on ASDF read support in SourceXtractor++. Every part of the code that can read array data out of FITS files can now read it equivalently out of ASDF files, though it will be useful to get some real-world feedback on its use (WCS in ASDF is not yet supported, but more on that below).

I wrote some basic instructions on using SE++ with ASDF here: https://github.com/embray/SourceXtractorPlusPlus/wiki/Example-usage

One interesting feature of SE++ which I have not explored much yet is its Python-based configuration system: in addition to simple flat config files, it can also be configured using an arbitrarily complex Python script for more complicated and dynamic processing pipelines. This includes a Python package (built into SE++) containing utilities for use in configuration scripts; the Python code is read directly by SE++ through its built-in Python interpreter. My work includes some provisional code for working with ASDF files in this environment as well, though further work is needed on the Python interface side for this to truly work. We'll also need to add new config settings for ASDF: for example, instead of "HDU numbers" one wants to reference a YAML path within an ASDF file for the data arrays, and we will need a config setting for where to look in an ASDF file for WCS info.

We will punt on that work for now pending further feedback from users in the mission office.

As for WCS support, I made progress on that this week, implementing C-native data structures for small pieces of the GWCS specification--just enough to read a WCS with the fitswcs_imaging transform from pixel to celestial coordinates.

It turns out that while not technically challenging, it is very time-consuming to manually write code wrapping all of the GWCS specification, especially once one gets into the transforms. And here we're not even talking about actually implementing the GWCS transforms themselves--I refer simply to reading the steps, the frames, the transforms, etc. out of YAML and into easier-to-work-with C-native data structures.
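
To give a flavor of the kind of C-native structures involved, here is a minimal sketch. All type and field names here are hypothetical (the actual libasdf structures may differ), but they mirror the rough shape of a GWCS object: a list of steps, each pairing a coordinate frame with a transform:

```c
#include <stddef.h>

/* Hypothetical C-native mirror of a (heavily simplified) GWCS object:
 * a pipeline of steps, each associating a coordinate frame with the
 * transform that maps it toward the next frame's coordinates. */
typedef enum {
    GWCS_FRAME_2D,        /* e.g. detector/pixel coordinates */
    GWCS_FRAME_CELESTIAL  /* e.g. ICRS sky coordinates */
} gwcs_frame_type_t;

typedef struct {
    gwcs_frame_type_t type;
    const char *name;           /* e.g. "detector", "world" */
    const char *axes_names[2];  /* e.g. {"ra", "dec"} */
    const char *unit[2];        /* e.g. {"deg", "deg"} */
} gwcs_frame_t;

typedef struct {
    /* Just enough of fitswcs_imaging to hand off to wcslib later */
    double crpix[2];
    double crval[2];
    double cdelt[2];
    double pc[2][2];
    const char *projection;     /* e.g. "gnomonic" */
} gwcs_fitswcs_imaging_t;

typedef struct {
    gwcs_frame_t frame;
    gwcs_fitswcs_imaging_t transform;  /* only transform type in this sketch */
} gwcs_step_t;

typedef struct {
    gwcs_step_t *steps;
    size_t n_steps;
} gwcs_t;
```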

If we do continue down this path in the future it will probably make sense to turn to some meta-programming to generate code from the JSON Schemas. I think this would actually be quite doable.

But in the meantime I will continue implementing just the fitswcs_imaging transform. I'm not convinced that a first pass really needs full data structures for all the possible projection transforms--it's enough to be able to map, e.g., gnomonic -> TAN and so on, without worrying too much about things like the input and output names. The easiest way to go, then, seems to be to convert these to the corresponding FITS CTYPEi header values like RA---TAN to put in the wcsprm. This is looking a little more complicated than I initially thought, but I think I can get a working test case by next week.
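
As a rough illustration of that approach, here is a sketch of mapping a GWCS projection name to a FITS projection code and filling in a wcslib wcsprm. The lookup table is a hypothetical stand-in covering only a handful of projections; a real version would need the full set:

```c
#include <stdio.h>
#include <string.h>
#include <wcslib/wcs.h>

/* Map a GWCS projection name to its 3-letter FITS projection code.
 * Only a few illustrative entries are shown here. */
static const char *projection_code(const char *gwcs_name) {
    static const struct { const char *name, *code; } table[] = {
        {"gnomonic",            "TAN"},
        {"orthographic",        "SIN"},
        {"stereographic",       "STG"},
        {"zenithal_equal_area", "ZEA"},
    };
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (strcmp(table[i].name, gwcs_name) == 0)
            return table[i].code;
    return NULL;
}

/* Fill in a wcsprm for a simple celestial image WCS, given values read
 * out of a fitswcs_imaging transform. */
int init_wcsprm(struct wcsprm *wcs, const double crpix[2],
                const double crval[2], const double cdelt[2],
                const char *gwcs_projection) {
    const char *code = projection_code(gwcs_projection);
    if (!code)
        return -1;

    wcs->flag = -1;            /* required before the first wcsini() */
    if (wcsini(1, 2, wcs))     /* allocate for a 2-axis WCS */
        return -1;

    snprintf(wcs->ctype[0], 72, "RA---%s", code);
    snprintf(wcs->ctype[1], 72, "DEC--%s", code);
    for (int i = 0; i < 2; i++) {
        wcs->crpix[i] = crpix[i];
        wcs->crval[i] = crval[i];
        wcs->cdelt[i] = cdelt[i];
    }
    return wcsset(wcs);        /* compute derived quantities */
}
```

A caller would then pass the resulting wcsprm to wcsp2s() for pixel-to-celestial conversion.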

2025-09-15

We are back from vacation and up and running again. A prototype implementation of (read-only) ASDF support in SourceXtractor++ is now starting to work: detection images can be read from ASDF files.

This required some refactoring of SE++ to abstract file reading a bit more, specifically to make the code more agnostic about the underlying file type when opening image files for reading.

There are still some FITS-specific bits, namely reading metadata out of the FITS headers used by SourceXtractor--particularly things like the GAIN, FLXSCALE, and SATURATE keywords understood by SE. We would need to define an ASDF-specific convention for this as well, perhaps as keywords on the ndarray object in the ASDF tree. These values can also be passed in from the configuration file, so they do not strictly have to come from the image file itself, though I think it's fairly common for them to come from the file (not sure).
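
To make the idea concrete, here is one possible shape such a convention could take. Both the tree layout in the comment and the struct are purely illustrative, not a settled design:

```c
#include <stdio.h>

/* Hypothetical convention: SE-style calibration keywords stored as
 * plain scalar keys next to the ndarray in the ASDF tree, e.g.:
 *
 *   image:
 *     data: !core/ndarray-1.0.0 { source: 0, datatype: float32, ... }
 *     gain: 3.1
 *     flxscale: 1.0
 *     saturate: 65535.0
 *
 * On the C side this would land in a small metadata struct, with the
 * config file able to override any value not present in the tree. */
typedef struct {
    double gain;      /* detector gain, e-/ADU */
    double flxscale;  /* flux scaling factor */
    double saturate;  /* saturation level in ADU */
} se_image_metadata_t;

int main(void) {
    /* Defaults, as if neither the file nor the config supplied values */
    se_image_metadata_t meta = { .gain = 1.0, .flxscale = 1.0, .saturate = 0.0 };
    printf("gain=%g flxscale=%g saturate=%g\n",
           meta.gain, meta.flxscale, meta.saturate);
    return 0;
}
```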

Next steps will include extending support for ASDF files to other parts of the SE++ code, which should now be straightforward, then looking into reading basic WCS...

2025-08-05

Progress has slowed a bit in the last month due to vacations, but continues. Milestone 1 can broadly be considered reached, with Milestone 2 in progress, especially the work on issue #38, the first prototype of the extension registry system.

This is working well, and some of the core ASDF schemas are already implemented, including core/asdf, core/history_entry, core/extension_metadata, and core/software, with more to come soon. The interface is shaping up nicely and should integrate well with future plans for external extension plug-ins (issue #39).

The most complex schema to support is of course ndarray. Work on this will start out as a minimum viable product, with the ability to get raw access to array data, as well as the metadata required to interpret it (only for N-D arrays of scalar values to start; more complex cases like record arrays will come later).

The bare minimum required to understand the datatype and dimensions of an array will already be straightforward with the new extension interface. This is the last piece needed to begin implementing prototype ASDF support in SourceXtractor++ (documented in issue #24).
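
For a sense of what that bare minimum looks like, here is a hypothetical sketch of the metadata a reader needs before it can hand raw array data to a consumer (the names and layout are illustrative, not the actual libasdf types):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of minimum ndarray metadata: enough to locate
 * the bytes and interpret them, for N-D arrays of scalars only. */
typedef enum {
    NDARRAY_INT8, NDARRAY_UINT8,
    NDARRAY_INT16, NDARRAY_UINT16,
    NDARRAY_INT32, NDARRAY_UINT32,
    NDARRAY_INT64, NDARRAY_UINT64,
    NDARRAY_FLOAT32, NDARRAY_FLOAT64
} ndarray_datatype_t;

typedef enum { NDARRAY_LITTLE, NDARRAY_BIG } ndarray_byteorder_t;

typedef struct {
    ndarray_datatype_t datatype;
    ndarray_byteorder_t byteorder;
    size_t ndim;
    uint64_t *shape;     /* ndim entries, slowest-varying axis first */
    size_t source;       /* index of the binary block holding the data */
    uint64_t offset;     /* byte offset of the data within that block */
} ndarray_info_t;
```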

Finally, as the library is nearly in a usable state (especially once ndarray support is added), we will begin some API documentation, especially for the high-level API, with example code (this is issue #22). I'm a bit hesitant about this, however, as there is still some roughness around the edges when it comes to memory management. Internally, the code is reasonably clear and consistent, with no memory leaks that I'm aware of, but I would like to make the library more user-friendly in this respect, with fewer cases where its users have to manually track memory allocations. This is covered in issue #34.

2025-06-27

Block index support

Status update for the week ending June 27: I have been on vacation and working only sporadically on libasdf the last two weeks. However, we made progress on parsing the optional end-of-file block index, and using it to locate the start positions of blocks within files. This complicates the code somewhat (which is likely why the Java ASDF library has not implemented it yet, nor, I believe, have either of the C++ libraries). In fact, trying to implement it myself led to finding some shortcomings in how the standard specifies it--the very design has a bit of ambiguity in it. Some discussion of these problems took place on #3. There are already discussions for ASDF Standard v2.0 about overhauling the block index, and I think we need to continue those discussions and/or make some prototypes.

My thinking falls along these lines:

  • Just as there is a binary block header, I believe there should also be a (small, fixed-size) block footer, which may duplicate some of the block header, but may also contain information not found in the header (see the sketch after this list). I'm thinking:

    • The footer must be aligned to some boundary to make searching for it simpler. I would propose aligning it to a 4096-byte boundary.
    • It could contain a magic string, the offset of the start of the block, the total size of the block, and a fixed-length checksum of the block data (TBD).
  • Change the block index to be a binary block itself:

    • Include a flag bit in the header that indicates that it is a block index.
    • It can still be found at the end of the file.
    • Having a predictable alignment at which we can expect to find a block footer tells us predictably where to find the end of a block. If the data at that offset looks like a valid block footer, then it also tells us unambiguously where to find the block header, and hence whether that block represents a block index.
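
As a concrete sketch of the footer idea (the exact fields, sizes, and magic value here are all placeholders, not a proposal for the standard's wire format):

```c
#include <stdint.h>

/* Hypothetical fixed-size block footer, written at the next 4096-byte
 * boundary after the block data. Fields would be big-endian on disk,
 * like the existing block header, and a real implementation would
 * serialize field-by-field rather than dumping the struct as-is. */
#define BLOCK_FOOTER_MAGIC "\xd3" "BLKEND"  /* placeholder magic string */

typedef struct {
    char     magic[8];       /* identifies a footer when scanning */
    uint64_t header_offset;  /* absolute offset of the block's header */
    uint64_t block_size;     /* total size of the block, header included */
    uint8_t  checksum[16];   /* fixed-length checksum of the block data (TBD) */
} block_footer_t;

/* To locate the footer from the end of a block's data: round the
 * current offset up to the next 4096-byte boundary and read there. */
static inline uint64_t footer_offset(uint64_t data_end) {
    return (data_end + 4095) & ~(uint64_t)4095;
}
```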

There is no real advantage to having the block index in YAML. As it is, the binary portion of an ASDF file is not all "human readable" (block headers, for example, are binary as well). At the time this was deemed a reasonable tradeoff--the block headers serve more as hints to the software reading the data than as structural interpretation of the data (which is found in the human-readable YAML tree). The block index is in this same category: it's entirely optional, and only serves as a hint to the software. I don't recall why we decided to write it as YAML in the first place; it only makes it more difficult and more ambiguous to parse.

Considering ASDF support in SourceXtractor++

We discussed the possibility of adding native support for reading ASDF files in either the original SExtractor or in its successor SourceXtractor++. The latter was deemed more likely to accept patches and, being built on a more modern code base, more likely to be easily extended.

After researching the code for each project I came to the same conclusion. SourceXtractor++ is not too badly tied to FITS, at least when it comes to reading image data. Some places become more FITS-specific when it comes to reading or writing file metadata, but these could still be adapted to ASDF files.

The trickiest part is WCS handling, as ASDF does not impose a single WCS format (one of its strengths). The flip side is that any software that wants to read WCS data out of an ASDF file has to be able to recognize the WCS extension in use, and have code to integrate it. Ideally this should be "easy" (one of the goals of the libasdf project is to make it relatively easy to implement reading of extension types as plugin libraries that can be included in other code). Then it's a question of having a C or C++ library that actually implements the WCS transforms on the pixel data given the WCS information from the file.

For an initial prototype the concern is reading image files for RST. As I understand it, RST uses the GWCS schemas, but only a very limited subset of their features, so we may be able to start with just that subset--simple enough to be expressible to wcslib.

The SourceXtractor++ code is somewhat more FITS-specific here:

  • There is a class called WCS, whose constructor assumes an instance of the FitsImageSource class.
  • There is a FitsImageSource.getFitsHeaders method that essentially returns a FITS header as a string.
  • The WCS class initializes wcslib using those FITS headers directly.

IIRC wcslib does not have to be used this way, though, so this WCS class could be extended to support our hypothetical AsdfImageSource, with its own method to return WCS data that can initialize wcslib.
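
Indeed, wcslib can be initialized programmatically rather than from a header string: instead of parsing headers with wcspih(), one can fill in a wcsprm struct directly. A minimal sketch, with made-up values and only rudimentary error handling:

```c
#include <stdio.h>
#include <string.h>
#include <wcslib/wcs.h>

/* Sketch: initialize wcslib directly from values (as they might be read
 * from an ASDF file) rather than from a FITS header string. */
int main(void) {
    struct wcsprm wcs;
    wcs.flag = -1;                 /* required before the first wcsini() */
    if (wcsini(1, 2, &wcs)) return 1;

    strcpy(wcs.ctype[0], "RA---TAN");
    strcpy(wcs.ctype[1], "DEC--TAN");
    wcs.crpix[0] = 512.0;   wcs.crpix[1] = 512.0;
    wcs.crval[0] = 150.0;   wcs.crval[1] = 2.0;
    wcs.cdelt[0] = -2.8e-5; wcs.cdelt[1] = 2.8e-5;
    if (wcsset(&wcs)) return 1;    /* compute derived quantities */

    /* Convert one pixel coordinate to world coordinates */
    double pixcrd[2] = {512.0, 512.0}, imgcrd[2], phi, theta, world[2];
    int stat[1];
    if (wcsp2s(&wcs, 1, 2, pixcrd, imgcrd, &phi, &theta, world, stat) == 0)
        printf("RA = %.6f, Dec = %.6f\n", world[0], world[1]);

    wcsfree(&wcs);
    return 0;
}
```

An AsdfImageSource would supply the ctype/crpix/crval/cdelt values here from the ASDF tree rather than from FITS headers.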

2025-06-17

With the merging of PR #20 I think the most important work on "low-level" parsing of ASDF files is in good shape. There is more work to be done on specific use cases, especially streaming I/O, but that is a somewhat niche case and not of highest priority. Remaining to do there:

  • Support reading and taking advantage of the ASDF block index
  • Capture ranges of padding / empty space in files -- this will be important to make note of for writing updates to an existing file

I would like to get the block index working. Capturing the padding ranges should be mostly straightforward as well, but I will defer that until thinking more about writing.

The next step is to start adding APIs for actually reading values out of the tree. As a first pass this is just a thin wrapper around libfyaml, but with a few enhancements:

  • Extended support for different types of scalar values: libfyaml, like some other low-level YAML parsers, punts on how to interpret scalar values as native types. This makes sense: if I have a key like count: 8 I could read that as virtually any integer type in C. Python will interpret it as an int, though I could technically read it as a string (char *) as well if I wanted to.

My idea is to have an asdf_value_t type that would be a tagged union of different types. If it is a scalar value, by default it remains "uninterpreted", and there is a function to retrieve its raw, uninterpreted value. But libasdf will have a preferred default interpretation of the value, following predictable logic similar to Python's, and functions for accessing the value as a given type where possible (or returning an error result if not). The asdf_value_t is then (lazily) re-cast to a more specific value type.
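
Here is a rough sketch of what such a tagged union might look like. All names, the exact set of types, and the accessor signature are provisional; the real libasdf interface may well differ:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Provisional sketch of a lazily-interpreted tagged union for tree values */
typedef enum {
    ASDF_VALUE_UNINTERPRETED,  /* raw scalar text, not yet interpreted */
    ASDF_VALUE_INT64,
    ASDF_VALUE_DOUBLE,
    ASDF_VALUE_BOOL,
    ASDF_VALUE_STRING,
    ASDF_VALUE_MAPPING,
    ASDF_VALUE_SEQUENCE
} asdf_value_type_t;

typedef struct asdf_value {
    asdf_value_type_t type;
    union {
        struct { const char *text; size_t len; } raw;  /* uninterpreted */
        int64_t     as_int64;
        double      as_double;
        bool        as_bool;
        const char *as_string;
        /* mapping/sequence variants would hold handles into the parsed tree */
    } u;
} asdf_value_t;

typedef enum { ASDF_OK, ASDF_ERR_TYPE } asdf_err_t;

/* Hypothetical accessor: on success, (lazily) re-casts the value to
 * ASDF_VALUE_INT64 and writes the result; otherwise returns an error. */
asdf_err_t asdf_value_as_int64(asdf_value_t *value, int64_t *out);
```

A caller would then write something like: int64_t count; if (asdf_value_as_int64(val, &count) == ASDF_OK) { ... }, with the value's tag updated on success.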

This interpretation logic applies (as a first pass) to the standard built-in scalar types, but will also eventually be used for custom tagged types (which can include mappings and arrays, as well as scalars). I will document my prototype concept for extension types in the next status update.
