ASP support by misanthropealoupe · Pull Request #263 · mhvk/baseband

misanthropealoupe · 2018-08-01T15:23:08Z

basic asp StreamReader support

…w the primary header

cczhu · 2018-08-01T16:12:19Z

@misanthropealoupe Thanks! It would be great if you could also either post or e-mail a script you're currently using to read your data, and a link to where your data is located. We might not want to create a test suite with it (or do we?), but it would help my understanding greatly.

misanthropealoupe · 2018-08-01T16:26:55Z

you'll find a sample file here: baseband/data/sample_asp.asp
there's a simple header integrity check in baseband/asp/tests/test_asp.py::TestASP

my workflow is basically just using an ASPStreamReader to read frames

cczhu

I've gone through base.py and made some preliminary structural comments. More to come. For now though, doing:

from baseband import asp
from baseband.data import SAMPLE_ASP
fh_raw = io.open(SAMPLE_ASP)
fh = asp.ASPStreamReader(fh_raw)
fh.read(10)

returns a TypeError because the stream reader doesn't know what the sample shape is. Moreover the header doesn't seem to have frame_nbytes, shape, payload_nbytes, get_time, set_time, etc. required for the stream reader.

cczhu · 2018-08-09T21:21:32Z

+
+    def read_frame(self):
+        frame = ASPFrame.fromfile(self._fh_raw)
+        # associate the original file header with


Super nitpicky, but I've been capitalizing and adding trailing periods to all new and existing comments. (I don't think Marten's mandated that, though.)

cczhu · 2018-08-09T22:59:36Z

+class ASPStreamReaderBase(VLBIStreamReaderBase):
+
+    # unfinished
+    def __init__(self, fh_raw, header0):


Formats like VDIF have a stream base class because there are routines common to the reader and writer to inherit. Since ASP has no writer, it doesn't make sense to separate this class from the ASPStreamReader class below. I would unify the two classes, and inherit directly from VLBIStreamReaderBase.

It would be reasonable, though, to create an ASPFileReader class that includes read_frame, read_header, etc., like with VDIF. Doing so would make the code cleaner and fit better with the other formats, and make writing a tutorial revolving around adding ASP support more understandable.

cczhu · 2018-08-09T23:07:49Z

+
+        fh_raw.seek(pos)
+
+        super(ASPStreamReader, self).__init__(fh_raw, header0)


Even as a hack, it would be useful to have an open function (not built by vlbi_base.base.make_opener, since that allows for 'rb' and 'w' modes). The reason is because the stream reader currently doesn't open a string filename with io.open, so the user has to do it themselves.

open can be as simple as this:

def open(name): # Open file using io if not already opened. got_fh = hasattr(name, 'read') if not got_fh: name = io.open(name, 'rb') # Wrap file pointer with stream reader. try: return ASPStreamReader(name) except Exception as exc: if not got_fh: try: name.close() except Exception: # pragma: no cover pass raise exc

This simply tries to open the filename. If you want automatic file sequence opening, you can do:

from ..helpers import sequentialfile as sf def open(name): # If sequentialfile object, check that it's opened properly. if isinstance(name, sf.SequentialFileBase): assert name.mode == 'rb', ( "open only accepts sequential files opened in 'rb' " "mode for reading.") # If passed some kind of list, open a sequentialfile object. if isinstance(name, (tuple, list, sf.FileNameSequencer)): name = sf.open(name, 'rb') # Open file using io if not already opened. got_fh = hasattr(name, 'read') if not got_fh: name = io.open(name, 'rb') # Wrap file pointer with stream reader. try: return ASPStreamReader(name) except Exception as exc: if not got_fh: try: name.close() except Exception: # pragma: no cover pass raise exc

cczhu · 2018-08-09T23:09:37Z

           'open']


+def fr_open(path, mode='rb'):


Since fr_open and FileReader are used only by the ASP format, they should go in the ASP folder, not here.

Some further thoughts: I don't think at_start is a property that's currently called by anything in asp. As far as I can tell, it's meant to allow the stream reader to seek to the beginning of file to read the file header, but this is only useful when the stream reader is passed an offset file pointer. Currently, no format can properly handle opening these. For example, with VDIF:

from baseband import vdif from baseband.data import SAMPLE_VDIF import io fh_raw = io.open(SAMPLE_VDIF, 'rb') fh_raw.seek(134) fh = vdif.open(fh_raw, 'rs')

will return an EOFError. Doing the same thing with DADA returns a KeyError.

Instead of having at_start, it may make more sense just to let ASPStreamReader try and fail to read the header, since it's consistent with the other formats.

cczhu

Now had a look at the entire code. Most of my comments are for base.py in order to get the stream reader's read, tell, etc. methods reading properly, and fixes to header.py to properly tell header and payload sizes. Some outstanding issues include sequential file support and a small quirk of the sample file; see code comments.

Once the code itself (and test suite) is finalized, we should think a bit about documentation and presentation. I don't think it necessary to write a briefing or have examples as with the DADA docs, but it might be reasonable to have asp.open documented like dada.open is, including parameters and what they control. I also think an ASP entry should be made under File Formats in the docs, with a quick note on the ASP page itself that the format is read-only. Lastly, the notes file should be converted into a full-fledged tutorial, though that can be done in a separate pull request.

I'm personally a sticker for every commit included in master to be working code with a descriptive commit comment and without commented-out blocks or incomplete functions (though some of my own early PRs completely fail those requirements), so we may either squash and merge or do an interactive rebase before merging.

Also, we need to get your name on AUTHORS.rst!

cczhu · 2018-08-10T20:52:06Z

           'open']


+def fr_open(path, mode='rb'):


Some further thoughts: I don't think at_start is a property that's currently called by anything in asp. As far as I can tell, it's meant to allow the stream reader to seek to the beginning of file to read the file header, but this is only useful when the stream reader is passed an offset file pointer. Currently, no format can properly handle opening these. For example, with VDIF:

from baseband import vdif from baseband.data import SAMPLE_VDIF import io fh_raw = io.open(SAMPLE_VDIF, 'rb') fh_raw.seek(134) fh = vdif.open(fh_raw, 'rs')

will return an EOFError. Doing the same thing with DADA returns a KeyError.

Instead of having at_start, it may make more sense just to let ASPStreamReader try and fail to read the header, since it's consistent with the other formats.

cczhu · 2018-08-10T21:33:37Z

 from astropy.utils import lazyproperty

-__all__ = ['FileNameSequencer', 'SequentialFileReader', 'SequentialFileWriter',
+__all__ = ['FileReader', 'FileNameSequencer', 'SequentialFileReader', 'SequentialFileWriter',


The bigger problem with sequential files is that there's a file header for every file in an ASP sequence, which we want to skip over by seeking past the file header bytes for all files other than the first. This requires some reworking of both SequentialFileBase (notably, _open, _file_sizes and the way it's treated) and SequentialFileReader (possibly file_size, for consistency).

I think the modified versions of SequentialFileBase and SequentialFileReader - call them ASPSequentialFileBase, etc., should go in asp/base.py. Note that you can take advantage of diamond inheritance by doing:

class ASPSequentialFileBase(helpers.SequentialFileBase): ... class ASPSequentialFileReader(helpers.SequentialFileReader, ASPSequentialFileBase): ...

which will let you override the SequentialFileBase methods as needed without having to rewrite all of the SequentialFileReader methods that depend on them.

Should be done with help of getting the raw offset from the frame index

cczhu · 2018-08-10T21:38:20Z

@@ -0,0 +1,32 @@
+import pytest


I'm not certain for ASP whether it's necessary to test anything other than the stream reader itself. @mhvk, what do you think - should the payload, header and frame classes also be tested?

We should also consider testing sequential files.

cczhu · 2018-08-10T22:36:46Z

@@ -0,0 +1,29 @@
+notes:


Thank you, this is quite useful!

I'm hoping for these notes to eventually be turned into a design document and tutorial within the docs. It should be left out of the code folder itself.

cczhu · 2018-08-10T23:30:10Z

+        ('epoch', '<f4'),
+        ('pad', 'V2')])    # manual padding hack
+
+    _header_parser = HeaderParser(make_parser_from_dtype(_dtype))


It's not obvious to me why the header parser is needed. Currently fromvalues and __setitem__ don't work (for good reasons), and __getitem__ has been over written. For keys, it's possible to just do _dtype.names to recover a tuple of names.

cczhu · 2018-08-11T22:07:15Z

+
+
+NPOL = 2
+NDIM = 2


As far as I can tell these are also hardcoded into dspsr, but the sample file's file header gives nchan = 8, as does /mnt/scratch-lustre/mahajan/Ar_P2067/P2067_1/p2067/node16/2006_05_18_01/000007.asp.raw. Are the file headers not to be trusted?

cczhu · 2018-08-11T22:16:28Z

 0059000.dat.
 """
+
+SAMPLE_ASP = _full_path('sample_asp.asp')


The file header contains imjd and fmjd values:

imjd: 53873, fmjd: 0.32216575686339866,

while the frame header seems to always have the less accurate:

iMJD: 53873.0, fMJD: 0.3225,

Reading /mnt/scratch-lustre/mahajan/Ar_P2067/P2067_1/p2067/node16/2006_05_18_01/000007.asp.raw, however, I see that the frame header has just as accurate a fractional MJD value as the file header. Is this the case for all ASP files, and if so, can the sample file be altered to match?

cczhu · 2018-08-11T22:34:50Z

+            pos = pos2
+
+        fh_raw.seek(pos)
+


To get read, tell, etc., working in units of time, we could add:

@property def start_time(self): return self.header0.file_header.time @lazyproperty def stop_time(self): """Time at the end of the file, just after the last sample. See also `start_time` for the start time of the file, and `time` for the time of the sample pointer's current offset. """ return (self.header0.file_header.time + (self._nsample / self.sample_rate).to(u.s)) @lazyproperty def _nframe(self): # May misbehave for partial last frames... curr_pos = self.fh_raw.tell() nframe = ((self.fh_raw.seek(0, 2) - self.header0.file_header.nbytes) // self.header0.frame_nbytes) self.fh_raw.seek(curr_pos) return nframe @lazyproperty def _nsample(self): """Number of complete samples in the stream data.""" return self._nframe * self.header0.samples_per_frame def _read_frame(self, index): self.fh_raw.seek(index * self.header0.frame_nbytes + self.header0.file_header.nbytes) return self.read_frame()

mhvk · 2020-07-11T22:38:39Z

closing in favour of #439, which builds on this PR, addressing the comments, and providing more complete support and better integration.

misanthropealoupe added 8 commits July 22, 2018 13:08

header parser prototype for asp file format

2b23bc1

added basic stream reader functionality

e04869b

added correct sample file

6abcd62

moved file header management to ASPStreamReader; ASPBlockHeader is no…

c988484

…w the primary header

added simple asp header test

663b76a

merged changes from mhvk baseband

b4152a7

fixed decoding error

80147eb

merged latest from mhkv master

29c4f62

misanthropealoupe added 5 commits August 2, 2018 03:27

added proper EOF handling

d77f58a

fixed syntax

c468a45

formatting

01cb409

more formatting

01ecdeb

more formatting

8c5a659

cczhu suggested changes Aug 9, 2018

View reviewed changes

cczhu self-assigned this Aug 10, 2018

cczhu added the asp label Aug 10, 2018

cczhu added this to the 2.0 milestone Aug 10, 2018

cczhu changed the title ~~Asp~~ ASP support Aug 10, 2018

cczhu suggested changes Aug 11, 2018

View reviewed changes

mhvk modified the milestones: 2.0, 3.0, 1.2.1 Dec 3, 2018

mhvk modified the milestones: 3.0, 3.1 Aug 29, 2019

mhvk removed this from the 3.1 milestone Jan 24, 2020

mhvk added this to the 4.0 milestone Jan 24, 2020

mhvk mentioned this pull request Jul 11, 2020

Add support for the ASP format #439

Open

4 tasks

mhvk closed this Jul 11, 2020


		fh_raw.seek(pos)

		super(ASPStreamReader, self).__init__(fh_raw, header0)



		NPOL = 2
		NDIM = 2

Conversation

misanthropealoupe commented Aug 1, 2018

Uh oh!

cczhu commented Aug 1, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

misanthropealoupe commented Aug 1, 2018

Uh oh!

cczhu left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cczhu Aug 9, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cczhu left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mhvk commented Jul 11, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

cczhu commented Aug 1, 2018 •

edited

Loading

cczhu Aug 9, 2018 •

edited

Loading