Hi,
First off, I want to express my appreciation that you developed this package. Until now I converted to mzML with msconvert and then imported with mzR. The loss of the spectrum header data in this process always annoyed me.
That said, I have noticed the following with a 135 MB file acquired on an Orbitrap Elite in MS1 mode (m/z 50-2000, 8796 scans). On a PC with 16 GB of memory, reading the whole file with
> spec <- rawrr::readSpectrum(files, scan = 1:8796)
Error in .rawrrSystem2Source(rawfile, input = scan, rawrrArgs = "scans", :
Parsing the output of 'C:\Users\operator\AppData\Local/R/cache/R/rawrr/rawrrassembly/rawrr.exe' failed for an unknown reason.
Please check the debug files:
C:\Users\operator\AppData\Local\Temp\RtmpQVfxD5\file291814a545c5.stderr
C:\Users\operator\AppData\Local\Temp\RtmpQVfxD5\file291829ef2f34.stdout
and the System Requirements
fails. The following works:
> beRaw <- Spectra::backendInitialize(
+   MsBackendRawFileReader::MsBackendRawFileReader(),
+   files = files)
> spec <- rawrr::readSpectrum(files, scan = 1:2932)
The object sizes are 814 kB for beRaw (full file) and 179 MB for spec (1/3 of the file), which brings me to the reason for this issue. The rtime and mz slots of all spectra are empty in beRaw:
> beRaw
MsBackendRawFileReader with 8796 spectra
       msLevel     rtime scanIndex
     <integer> <numeric> <integer>
1            1        NA         1
2            1        NA         2
3            1        NA         3
4            1        NA         4
5            1        NA         5
...        ...       ...       ...
8792         1        NA      8792
8793         1        NA      8793
8794         1        NA      8794
8795         1        NA      8795
8796         1        NA      8796
... 26 more variables/columns.
file(s):
02_eggpep.raw
> mz(beRaw)
NumericList of length 8796
[[1]] numeric(0)
[[2]] numeric(0)
[[3]] numeric(0)
[[4]] numeric(0)
[[5]] numeric(0)
[[6]] numeric(0)
[[7]] numeric(0)
[[8]] numeric(0)
[[9]] numeric(0)
[[10]] numeric(0)
...
<8786 more elements>
In spec those slots are filled with data:
> spec[1:10] |> map(~ length(.x$mZ))
[[1]]
[1] 828
[[2]]
[1] 799
[[3]]
[1] 802
[[4]]
[1] 921
[[5]]
[1] 960
[[6]]
[1] 841
[[7]]
[1] 836
[[8]]
[1] 826
[[9]]
[1] 846
[[10]]
[1] 881
> spec[1:10] |> map("StartTime")
[[1]]
[1] 0.006031667
[[2]]
[1] 0.01105833
[[3]]
[1] 0.018325
[[4]]
[1] 0.02559167
[[5]]
[1] 0.03284667
[[6]]
[1] 0.04016667
[[7]]
[1] 0.04972167
[[8]]
[1] 0.05707
[[9]]
[1] 0.06377833
[[10]]
[1] 0.071045
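The two listings above can be condensed into one table per scan. The following is only a minimal sketch in base R: `summarise_scans()` is a hypothetical helper, not part of rawrr, and `spec_mock` is made-up data that merely mimics the `mZ` and `StartTime` elements seen in the output above.

```r
# Made-up stand-in for a readSpectrum() result: one list per scan,
# each with an `mZ` vector and a `StartTime` value (assumed layout).
spec_mock <- list(
  list(StartTime = 0.006, mZ = c(100.1, 200.2, 300.3)),
  list(StartTime = 0.011, mZ = c(150.5, 250.5))
)

# Hypothetical helper: one row per scan with peak count and start time.
summarise_scans <- function(spec) {
  data.frame(
    index     = seq_along(spec),
    nPeaks    = vapply(spec, function(x) length(x$mZ), integer(1)),
    StartTime = vapply(spec, function(x) as.numeric(x$StartTime), numeric(1))
  )
}

scan_summary <- summarise_scans(spec_mock)
print(scan_summary)
```

With real data the call would be `summarise_scans(spec)`; the result is far smaller than the spectrum list itself.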
Am I doing something wrong here with my MS data, or is this a bug in MsBackendRawFileReader? And I know this is not the right place, but can anything be done about the memory hunger of rawrr::readSpectrum? I can read in several files like the one above with MSnbase::readMSData without a problem.
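Since reading one third of the scans works, one possible workaround is to read the file in chunks and reduce each chunk before reading the next. This is an untested sketch, not a confirmed fix: `chunk_scans()` and `read_in_chunks()` are hypothetical helpers, and the reader is passed in as an argument (here a fake one) so the code runs without a raw file.

```r
# Split scan numbers into consecutive chunks of at most `chunk_size`.
chunk_scans <- function(scans, chunk_size) {
  split(scans, ceiling(seq_along(scans) / chunk_size))
}

# Call `reader(rawfile, scan = idx)` once per chunk, apply `reduce` to each
# chunk's result, and concatenate the reduced results.
read_in_chunks <- function(rawfile, scans, chunk_size, reader,
                           reduce = identity) {
  out <- lapply(chunk_scans(scans, chunk_size), function(idx) {
    reduce(reader(rawfile, scan = idx))
  })
  unlist(out, recursive = FALSE, use.names = FALSE)
}

# Fake reader for illustration only; it returns one list per requested scan.
fake_reader <- function(rawfile, scan) {
  lapply(scan, function(i) list(scan = i, mZ = numeric(3)))
}

res <- read_in_chunks("example.raw", 1:10, chunk_size = 4, reader = fake_reader)
length(res)

# With a real file the call might look like this (assumption, not run here):
# spec <- read_in_chunks(files, 1:8796, 2932, rawrr::readSpectrum)
```

Keeping every chunk in full still needs the same total memory, so the `reduce` argument is where the saving would come from, for example by keeping only the fields that are needed.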
EDIT:
I just realized that peaksData(beRaw) does provide the m/z values, albeit very slowly and without the rtime.
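For reference, a minimal sketch of pulling the m/z vectors out of such a result, assuming peaksData() returns a list-like object with one numeric matrix per spectrum and columns named "mz" and "intensity". `mz_from_peaks()` is a hypothetical helper and `pd_mock` is made-up data, not output from my file.

```r
# Made-up stand-in for a peaksData() result: one two-column matrix per
# spectrum, with columns "mz" and "intensity" (assumed layout).
pd_mock <- list(
  cbind(mz = c(100.1, 200.2), intensity = c(10, 20)),
  cbind(mz = 150.5, intensity = 5)
)

# Hypothetical helper: extract the m/z column of every spectrum.
mz_from_peaks <- function(pd) {
  lapply(pd, function(m) unname(m[, "mz"]))
}

mz_list <- mz_from_peaks(pd_mock)
lengths(mz_list)
```

With the real backend this would be `mz_from_peaks(peaksData(beRaw))`, which does not make the read itself any faster.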
ANOTHER EDIT:
Is data read in with MsBackendRawFileReader "on-disk" or "in-memory"? That is not clear from the help documentation, but judging by the object size it appears to be "on-disk".