
Bioinformatics: Sequence Editing

Good Sequence:

Why is this good?


1) Peaks are relatively equal in width
2) Little to no background noise
3) One base called per peak
*** NOTE *** Ideally we would like to see equal peak heights
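
The criteria above are judged by eye, but the idea of "relatively equal" peaks can be sketched numerically. The example below is only an illustration: the peak measurements are made up, and the function names and the 0.25 cutoff are hypothetical choices, not settings from the CEQ software.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (unitless spread)."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return pstdev(values) / m

def looks_uniform(peak_widths, peak_heights, max_cv=0.25):
    """Hypothetical check: True when widths and heights both vary little.

    max_cv is an arbitrary illustrative cutoff, not an instrument setting.
    """
    return (coefficient_of_variation(peak_widths) <= max_cv
            and coefficient_of_variation(peak_heights) <= max_cv)

# Made-up measurements, for illustration only
even_widths = [10, 11, 10, 9, 10]
even_heights = [1000, 950, 1050, 1000, 980]
uneven_widths = [6, 18, 9, 25, 7]

print(looks_uniform(even_widths, even_heights))
print(looks_uniform(uneven_widths, even_heights))
```

A low coefficient of variation corresponds to the "good" traces; a high one corresponds to the uneven peaks described for mediocre and bad sequence.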

Also good. Peaks are low, but (mostly) equal in width and height.

Note that in both of these examples, the bases called at the top of the window line up with
the peaks, there is very little overlap between peaks, and there is no background noise.

Mediocre Sequence:

This sequence may be usable, but it needs to be edited and checked by eye for miscalled and
multiply-called bases. Note specifically that peak widths and heights are uneven. There also
seems to be some background noise, especially in the G trace. The best plan is to look for
multiply-called bases, circled in red here. If you have a good reverse sequence (or any
overlapping sequence from the same individual) to compare against, these issues can be resolved
with fairly high confidence. The analysis algorithm in the CEQ 8800 looks at both peak height and
peak width, among other data, to call bases, so a broad or uneven peak can be called as more than
one base.
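
The comparison against a reverse read can be sketched in a few lines. This is a minimal illustration, not the CEQ software's method: it assumes both reads cover the same region, uses Python's general-purpose difflib.SequenceMatcher in place of a real sequence aligner, and the read strings and function names are hypothetical.

```python
from difflib import SequenceMatcher

# Complement table for A, C, G, T and N only (assumption: other IUPAC codes are not handled)
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement so a reverse read can be compared to a forward read."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

def find_disagreements(forward_read, reverse_read):
    """List the places where the forward read differs from the reverse read.

    Each item is (tag, forward_position, forward_bases, reverse_bases).
    An extra base in the forward read shows up with empty reverse_bases.
    """
    reference = reverse_complement(reverse_read)
    forward = forward_read.upper()
    matcher = SequenceMatcher(None, forward, reference, autojunk=False)
    return [
        (tag, i1, forward[i1:i2], reference[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

# Made-up reads: the forward read has one peak called twice (an extra A)
forward_read = "GACTAAGCTA"
reverse_read = "TAGCTAGTC"
for tag, pos, fwd, ref in find_disagreements(forward_read, reverse_read):
    print(tag, pos, repr(fwd), repr(ref))
```

Every position reported this way still has to be checked against the trace by eye; the code only points to where the two reads disagree.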

Same as above, only worse. Note the excessive number of bases called for many of the peaks, more
background noise, and more heterogeneity in peak width and height. Calling this sequence requires a
good sequence from the same region and individual to refer to, or a great deal of experience.
You will likely find that the beginning of a sequence (i.e., close to the primer sequence) has
similar problems. This is due to an excess of unincorporated ddNTPs (single labeled bases) at the
"short" end of the read (the first to pass the sensor). This noise should be deleted from the
sequence prior to a BLAST search.

Bad Sequence:

Both of these have a very poor signal-to-noise ratio. Note the extreme heterogeneity in peak width
and height. This is typically what the longest end of a sequence looks like. The gel (the
electrophoretic matrix) tends to break down over the course of running a single sample, leading to
poor length separation. Sequence that looks like this toward the long end (i.e., far from the
primer) should be deleted prior to a BLAST search.
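
Trimming both the noisy start and the degraded end before a BLAST search can be sketched as follows. This is only an illustration under stated assumptions: it assumes a per-base quality score is available for each called base (this handout does not say whether the CEQ software exports one), and the function name, window size, and cutoff of 20 are hypothetical. Trimming by eye in the editor achieves the same result.

```python
def trim_low_quality_ends(seq, quals, window=5, min_mean=20):
    """Keep the stretch between the first and last good-quality window.

    seq   : called bases as a string
    quals : one quality score per base (assumed to be available)
    window, min_mean : illustrative values, not instrument settings
    Returns "" when no window reaches min_mean.
    """
    if len(seq) != len(quals):
        raise ValueError("seq and quals must have the same length")
    n = len(seq)
    good_starts = [
        i for i in range(n - window + 1)
        if sum(quals[i:i + window]) / window >= min_mean
    ]
    if not good_starts:
        return ""
    start = good_starts[0]
    end = good_starts[-1] + window
    # Drop any individual poor bases left at the edges of the kept stretch
    while start < end and quals[start] < min_mean:
        start += 1
    while end > start and quals[end - 1] < min_mean:
        end -= 1
    return seq[start:end]

# Made-up read: two poor bases near the primer and two at the long end
read = "TTACGTACGTGG"
scores = [5, 5, 30, 30, 30, 30, 30, 30, 30, 30, 5, 5]
print(trim_low_quality_ends(read, scores, window=4))
```

The trimmed string is what would be pasted into the BLAST search; the deleted ends correspond to the regions near the primer and far from the primer described above.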

More Bad Sequence:

Low signal. Not enough template was added to the cycle sequencing reaction. Peaks are fairly
uniform in width, but peak height is low. Using this sequence is not advisable.

NOISE. DO NOT USE. This may have been caused by current fluctuations during the run,
extreme gel breakdown, non-specific PCR product (i.e., the primer amplifies multiple loci), or
problems with reagents.

If you open your file and you see nothing but grey, the CEQ was not able to call any bases, likely
owing to problems with technique. Use the reference sequence for your unknown to complete this
part of the project, and be sure to state this in your final paper.
