I read a paragraph from the tmap-book, but I do not understand the phrase "lexicographically smallest DNA base" or why R is converted to C. Can you explain this?
Ambiguous IUPAC codes in the reference/target FASTA will be converted to the
lexicographically smallest DNA base that is not compatible to the IUPAC code to ensure
minimum reference bias. For example, an IUPAC base R, which represents an A or a G, will be
converted to a C. All Ns in the reference will be converted to As. Furthermore, any
non-IUPAC character will be treated as an N. The ambiguity codes will only be re-considered
when calculating the NM and MD SAM record optional tags.
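
To make the question concrete, here is one possible reading of that rule as a minimal Python sketch. It is only an illustration of the quoted paragraph, not TMAP's actual code: the names `IUPAC`, `convert_base`, and `convert_sequence` are hypothetical, and it assumes "lexicographically smallest" means the first base in the alphabetical order A < C < G < T.

```python
# Standard IUPAC nucleotide codes and the bases each one is compatible with.
# U (uracil) is left out for brevity; that omission is an assumption of this sketch.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def convert_base(code: str) -> str:
    """Hypothetical helper illustrating the quoted rule (not TMAP's implementation)."""
    # Characters missing from the table are treated as N, as the paragraph states.
    compatible = IUPAC.get(code.upper(), IUPAC["N"])
    if len(compatible) == 1:
        return compatible  # an unambiguous base is kept unchanged
    # Walk the bases in alphabetical order and take the first one
    # that the ambiguity code does NOT cover.
    for base in "ACGT":
        if base not in compatible:
            return base
    # N covers every base, so no incompatible base exists;
    # the paragraph says N becomes A.
    return "A"


def convert_sequence(seq: str) -> str:
    """Apply convert_base to every character of a reference sequence."""
    return "".join(convert_base(c) for c in seq)


if __name__ == "__main__":
    for code in "RYSWKMBDHVN":
        print(code, "->", convert_base(code))
```

Under this reading, R covers A and G, so A is skipped and C is the first base that R does not cover, which matches the example in the text.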