Conversation
…w the primary header
|
@misanthropealoupe Thanks! It would be great if you could also either post or e-mail a script you're currently using to read your data, and a link to where your data is located. We might not want to create a test suite with it (or do we?), but it would help my understanding greatly. |
|
you'll find a sample file here: baseband/data/sample_asp.asp my workflow is basically just using an ASPStreamReader to read frames |
cczhu
left a comment
There was a problem hiding this comment.
I've gone through base.py and made some preliminary structural comments. More to come. For now though, doing:
from baseband import asp
from baseband.data import SAMPLE_ASP
fh_raw = io.open(SAMPLE_ASP)
fh = asp.ASPStreamReader(fh_raw)
fh.read(10)
returns a TypeError because the stream reader doesn't know what the sample shape is. Moreover the header doesn't seem to have frame_nbytes, shape, payload_nbytes, get_time, set_time, etc. required for the stream reader.
|
|
||
| def read_frame(self): | ||
| frame = ASPFrame.fromfile(self._fh_raw) | ||
| # associate the original file header with |
There was a problem hiding this comment.
Super nitpicky, but I've been capitalizing and adding trailing periods to all new and existing comments. (I don't think Marten's mandated that, though.)
| class ASPStreamReaderBase(VLBIStreamReaderBase): | ||
|
|
||
| # unfinished | ||
| def __init__(self, fh_raw, header0): |
There was a problem hiding this comment.
Formats like VDIF have a stream base class because there are routines common to the reader and writer to inherit. Since ASP has no writer, it doesn't make sense to separate this class from the ASPStreamReader class below. I would unify the two classes, and inherit directly from VLBIStreamReaderBase.
It would be reasonable, though, to create an ASPFileReader class that includes read_frame, read_header, etc., like with VDIF. Doing so would make the code cleaner and fit better with the other formats, and make writing a tutorial revolving around adding ASP support more understandable.
|
|
||
| fh_raw.seek(pos) | ||
|
|
||
| super(ASPStreamReader, self).__init__(fh_raw, header0) |
There was a problem hiding this comment.
Even as a hack, it would be useful to have an open function (not built by vlbi_base.base.make_opener, since that allows for 'rb' and 'w' modes). The reason is because the stream reader currently doesn't open a string filename with io.open, so the user has to do it themselves.
open can be as simple as this:
def open(name):
# Open file using io if not already opened.
got_fh = hasattr(name, 'read')
if not got_fh:
name = io.open(name, 'rb')
# Wrap file pointer with stream reader.
try:
return ASPStreamReader(name)
except Exception as exc:
if not got_fh:
try:
name.close()
except Exception: # pragma: no cover
pass
raise exc
This simply tries to open the filename. If you want automatic file sequence opening, you can do:
from ..helpers import sequentialfile as sf
def open(name):
# If sequentialfile object, check that it's opened properly.
if isinstance(name, sf.SequentialFileBase):
assert name.mode == 'rb', (
"open only accepts sequential files opened in 'rb' "
"mode for reading.")
# If passed some kind of list, open a sequentialfile object.
if isinstance(name, (tuple, list, sf.FileNameSequencer)):
name = sf.open(name, 'rb')
# Open file using io if not already opened.
got_fh = hasattr(name, 'read')
if not got_fh:
name = io.open(name, 'rb')
# Wrap file pointer with stream reader.
try:
return ASPStreamReader(name)
except Exception as exc:
if not got_fh:
try:
name.close()
except Exception: # pragma: no cover
pass
raise exc
| 'open'] | ||
|
|
||
|
|
||
| def fr_open(path, mode='rb'): |
There was a problem hiding this comment.
Since fr_open and FileReader are used only by the ASP format, they should go in the ASP folder, not here.
There was a problem hiding this comment.
Some further thoughts: I don't think at_start is a property that's currently called by anything in asp. As far as I can tell, it's meant to allow the stream reader to seek to the beginning of file to read the file header, but this is only useful when the stream reader is passed an offset file pointer. Currently, no format can properly handle opening these. For example, with VDIF:
from baseband import vdif
from baseband.data import SAMPLE_VDIF
import io
fh_raw = io.open(SAMPLE_VDIF, 'rb')
fh_raw.seek(134)
fh = vdif.open(fh_raw, 'rs')
will return an EOFError. Doing the same thing with DADA returns a KeyError.
Instead of having at_start, it may make more sense just to let ASPStreamReader try and fail to read the header, since it's consistent with the other formats.
cczhu
left a comment
There was a problem hiding this comment.
Now had a look at the entire code. Most of my comments are for base.py in order to get the stream reader's read, tell, etc. methods reading properly, and fixes to header.py to properly tell header and payload sizes. Some outstanding issues include sequential file support and a small quirk of the sample file; see code comments.
Once the code itself (and test suite) is finalized, we should think a bit about documentation and presentation. I don't think it necessary to write a briefing or have examples as with the DADA docs, but it might be reasonable to have asp.open documented like dada.open is, including parameters and what they control. I also think an ASP entry should be made under File Formats in the docs, with a quick note on the ASP page itself that the format is read-only. Lastly, the notes file should be converted into a full-fledged tutorial, though that can be done in a separate pull request.
I'm personally a sticker for every commit included in master to be working code with a descriptive commit comment and without commented-out blocks or incomplete functions (though some of my own early PRs completely fail those requirements), so we may either squash and merge or do an interactive rebase before merging.
Also, we need to get your name on AUTHORS.rst!
| 'open'] | ||
|
|
||
|
|
||
| def fr_open(path, mode='rb'): |
There was a problem hiding this comment.
Some further thoughts: I don't think at_start is a property that's currently called by anything in asp. As far as I can tell, it's meant to allow the stream reader to seek to the beginning of file to read the file header, but this is only useful when the stream reader is passed an offset file pointer. Currently, no format can properly handle opening these. For example, with VDIF:
from baseband import vdif
from baseband.data import SAMPLE_VDIF
import io
fh_raw = io.open(SAMPLE_VDIF, 'rb')
fh_raw.seek(134)
fh = vdif.open(fh_raw, 'rs')
will return an EOFError. Doing the same thing with DADA returns a KeyError.
Instead of having at_start, it may make more sense just to let ASPStreamReader try and fail to read the header, since it's consistent with the other formats.
| from astropy.utils import lazyproperty | ||
|
|
||
| __all__ = ['FileNameSequencer', 'SequentialFileReader', 'SequentialFileWriter', | ||
| __all__ = ['FileReader', 'FileNameSequencer', 'SequentialFileReader', 'SequentialFileWriter', |
There was a problem hiding this comment.
The bigger problem with sequential files is that there's a file header for every file in an ASP sequence, which we want to skip over by seeking past the file header bytes for all files other than the first. This requires some reworking of both SequentialFileBase (notably, _open, _file_sizes and the way it's treated) and SequentialFileReader (possibly file_size, for consistency).
I think the modified versions of SequentialFileBase and SequentialFileReader - call them ASPSequentialFileBase, etc., should go in asp/base.py. Note that you can take advantage of diamond inheritance by doing:
class ASPSequentialFileBase(helpers.SequentialFileBase):
...
class ASPSequentialFileReader(helpers.SequentialFileReader, ASPSequentialFileBase):
...
which will let you override the SequentialFileBase methods as needed without having to rewrite all of the SequentialFileReader methods that depend on them.
There was a problem hiding this comment.
Should be done with help of getting the raw offset from the frame index
| @@ -0,0 +1,32 @@ | |||
| import pytest | |||
There was a problem hiding this comment.
I'm not certain for ASP whether it's necessary to test anything other than the stream reader itself. @mhvk, what do you think - should the payload, header and frame classes also be tested?
We should also consider testing sequential files.
| @@ -0,0 +1,29 @@ | |||
| notes: | |||
There was a problem hiding this comment.
Thank you, this is quite useful!
I'm hoping for these notes to eventually be turned into a design document and tutorial within the docs. It should be left out of the code folder itself.
| ('epoch', '<f4'), | ||
| ('pad', 'V2')]) # manual padding hack | ||
|
|
||
| _header_parser = HeaderParser(make_parser_from_dtype(_dtype)) |
There was a problem hiding this comment.
It's not obvious to me why the header parser is needed. Currently fromvalues and __setitem__ don't work (for good reasons), and __getitem__ has been over written. For keys, it's possible to just do _dtype.names to recover a tuple of names.
|
|
||
|
|
||
| NPOL = 2 | ||
| NDIM = 2 |
There was a problem hiding this comment.
As far as I can tell these are also hardcoded into dspsr, but the sample file's file header gives nchan = 8, as does /mnt/scratch-lustre/mahajan/Ar_P2067/P2067_1/p2067/node16/2006_05_18_01/000007.asp.raw. Are the file headers not to be trusted?
| 0059000.dat. | ||
| """ | ||
|
|
||
| SAMPLE_ASP = _full_path('sample_asp.asp') |
There was a problem hiding this comment.
The file header contains imjd and fmjd values:
imjd: 53873,
fmjd: 0.32216575686339866,
while the frame header seems to always have the less accurate:
iMJD: 53873.0,
fMJD: 0.3225,
Reading /mnt/scratch-lustre/mahajan/Ar_P2067/P2067_1/p2067/node16/2006_05_18_01/000007.asp.raw, however, I see that the frame header has just as accurate a fractional MJD value as the file header. Is this the case for all ASP files, and if so, can the sample file be altered to match?
| pos = pos2 | ||
|
|
||
| fh_raw.seek(pos) | ||
|
|
There was a problem hiding this comment.
To get read, tell, etc., working in units of time, we could add:
@property
def start_time(self):
return self.header0.file_header.time
@lazyproperty
def stop_time(self):
"""Time at the end of the file, just after the last sample.
See also `start_time` for the start time of the file, and `time` for
the time of the sample pointer's current offset.
"""
return (self.header0.file_header.time +
(self._nsample / self.sample_rate).to(u.s))
@lazyproperty
def _nframe(self):
# May misbehave for partial last frames...
curr_pos = self.fh_raw.tell()
nframe = ((self.fh_raw.seek(0, 2) - self.header0.file_header.nbytes) //
self.header0.frame_nbytes)
self.fh_raw.seek(curr_pos)
return nframe
@lazyproperty
def _nsample(self):
"""Number of complete samples in the stream data."""
return self._nframe * self.header0.samples_per_frame
def _read_frame(self, index):
self.fh_raw.seek(index * self.header0.frame_nbytes +
self.header0.file_header.nbytes)
return self.read_frame()
|
closing in favour of #439, which builds on this PR, addressing the comments, and providing more complete support and better integration. |
basic asp StreamReader support