Probably the most common form of
sequencing these days is "paired-end" sequencing. It's very
impressive: the sequencing machine can process the same nucleic acid
fragment from both ends! This means that each observation looks like:
| forward read | gap | reverse read |
Because accuracy ("quality") tends to drop off as you sequence further
into a fragment, sequencing from both ends gives you much more
accurate data than trying to sequence the whole thing from one end.
And because we build up larger sequences ("contigs") by piecing
together overlapping ones ("assembly"), two sequences of bases
separated by a gap are actually usually more helpful than the same
number of bases without a gap.
It's common to refer to paired-end sequencing with designations like
"2x150", where the "2x" tells us it's paired-end, and the "150" tells
us it reads for 150 bases from each end, for a total of 300 bases per
But this introduces a terminology question: what is a read? When we
only had "single-end" sequencing it was clear: each sequenced fragment,
each contiguous sequence of bases, was a read. With paired-end
sequencing, however, these are no longer the same thing! There are
two things a "read" could mean:
For example, say we have:
This is a forward read (SRR14530724.2 2/1) and a reverse read
(SRR14530724.2 2/2) that together comprise a single observation of a
fragment from the sample and would generally by analyzed together.
Does this count as one read or two?
Turns out people do both, and it leads to a lot of misunderstandings!
Illumina counts them as two. They say the "25B" flow
cell on the NovaSeq X will
produce 52B paired end reads, or ~8Tb ("terabases", or
trillion bases) of 2x150. Since 52B * 150b = 7.8Tb (what they call
~8Tb), that tells us they're counting both the forward and the
Element counts them as one. They
say a 2x150 high output flow cell produces 300 Gb and 1B reads.
Since 1B * 150b * 2 = 300 Gb, that tells us they're counting the
forward and reverse read together as one.
Singular is not
clear but I'm pretty sure they're counting each fragment's
observations as a single read.
The European Nucleotide Archive counts them as one. For
example, if you visit ERR1470825,
which is Illumina MiSeq paired end sequencing at 2x250, you'll see it
says 2.2M reads and if you download the fastq.gz files you'll find
2.2M reads in each of the forward and reverse files.
et. al 2021 counts them as one. They say "paired reads" on first
use and then "reads" later, and you can tell that they are counting
them as one because (a) they often give odd numbers for
things like "there were only 337 SARS-CoV-2 reads" and (b) if you reanalyze the data their
numbers only make sense if they're counting pairs.
An academic group I was recently talking to about a potential
partnership counted them as one in a recent paper I
reanalyzed and as two when talking over email.
A commercial sequencing company I was recently talking to
counted them as two.
Asking ChatGPT and Claude, both count them as two. Ex: "In
short-read paired-end sequencing, a forward-reverse pair is typically
considered as two reads".
This is a mess! And, to make it worse, as far as I can tell there's
no standard term other than "read" either "what both the forward and
reverse read are examples of" or "what the forward and reverse read
are when considered together".
I've been using "read" to mean "read pair", but given the ambiguity I
think I should switch to another term. The NCBI SRA uses "spots", but no
one else seems to use this terminology. You can just say "read pair", which
is pretty good, but a bit long. Possible "pairs" or "mates" would be
Comment via: facebook, mastodon
"Paired-end reads" is the most common way to phrase it (at least in my experience). One paired-end read comprises both ends and is equivalent to a read pair. I agree that "read pair" may be less confusing.
I wish "paired end read" always meant that, but I've encountered people saying "N paired end reads" to mean "N reads generated by paired end sequencing", where "read" is either a forward or reverse read.
For example, copying from a recent email: "100 M paired end reads (50 M in each direction)"