
(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 9, No. 2, 2011

HS-MSA: New Algorithm Based on Meta-heuristic Harmony Search for Solving Multiple Sequence Alignment
Survey and Proposed Work

Mubarak S. Mohsen
School of Computer Sciences, Universiti Sains Malaysia, Penang, Malaysia
mobarak_seif@yahoo.com

Rosni Abdullah
School of Computer Sciences, Universiti Sains Malaysia, Penang, Malaysia
rosni@cs.usm.my

Abstract: Aligning multiple biological sequences such as protein or DNA/RNA sequences is a fundamental task in bioinformatics and sequence analysis. In the functional, structural and evolutionary studies of sequence data, the role of multiple sequence alignment (MSA) cannot be denied. Accurate alignment is imperative when predicting the RNA structure. MSA is a major bioinformatics challenge as it is NP-complete. In addition, the lack of a reliable scoring method makes it harder to align the sequences and evaluate the alignment outcomes. Scalability, biological accuracy, and computational complexity must be taken into consideration when solving the MSA problem. The harmony search algorithm is a recent meta-heuristic method which has been successfully applied to a number of optimization problems. In this paper, an adapted harmony search algorithm (HS-MSA) methodology is proposed to solve the MSA problem. In addition, a hybrid method of finding the conserved regions using the Divide-and-Conquer (DAC) method is proposed to reduce the search space. The proposed method (HS-MSA) is extended to a parallel approach in order to exploit the benefits of multi-core and GPU systems so as to reduce computational complexity and time.

Keywords: RNA, Multiple sequence alignment, Harmony search algorithm.

I. INTRODUCTION

Living organisms are related to each other throughout evolution. A pair of organisms sometimes has a common ancestor in the past from which they evolved. MSA tries to discover the similarities among the sequences and recover the mutations that took place.

A sequence is an ordered list of symbols from a set of letters of the alphabet, S (20 amino acids for protein and 4 nucleotides for RNA/DNA). In bioinformatics, an RNA sequence is written as s = AUUUUCUGUAA. It is a string of nucleotide symbols comprising adenine (A), cytosine (C), guanine (G) and uracil (U): S = {A, C, G, U}.

Alignment is a method to arrange the sequences one over the other to show the matches and mismatches between the residues. A column which has matching residues shows that no mutation has occurred, whereas a column with mismatched symbols indicates that several mutation events have happened. To improve the alignment score, the character “-” is used to correspond to a space introduced in the sequence. This space is usually called a gap. The gap is viewed as an insertion in one sequence and a deletion in the other. A score is used to measure the alignment performance. The alignment with the highest score is regarded as the best alignment.

For clarity's sake, the generic MSA problem is expressed using the following declaration: “Insert gaps within a given set of sequences in order to maximize a similarity criterion”[1]. Finding an accurate MSA from the sequences is very difficult. It is a time-consuming and computationally NP-hard problem[2, 3]. The MSA problem can be divided into three difficulties, that is, scalability, optimization, and the objective function.

In fact, the complexity that arises from all three problems must be addressed simultaneously. The first problem, scalability, is about finding the alignment of many long sequences. The second problem, optimization, deals with finding the alignment with the highest score based on a given objective function among the sequences. Optimization of even a simple objective function is an NP-hard problem. The third problem, the objective function (OF), involves speeding up the calculation in order to measure the alignment.

MSA covers two closely related problems: global MSA and local MSA. Global MSA aligns sequences across their whole length while local MSA aligns certain parts of the sequences, and locates conserved regions along with them as shown in Figure 1.

Figure 1. Global and local MSA
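
As a minimal sketch of the alignment terminology used in the Introduction, the following Python snippet labels each column of a small alignment as a match, a mismatch, or a gapped column. The toy two-row alignment, the function name `classify_columns`, and the use of "-" as the gap character are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter

GAP = "-"  # assumed gap character for this sketch


def classify_columns(alignment):
    """Label each column of an alignment as 'match', 'mismatch', or 'gap'.

    `alignment` is a list of equal-length strings, one row per sequence.
    """
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned rows must have the same length")
    labels = []
    for column in zip(*alignment):
        if GAP in column:
            labels.append("gap")        # insertion in one row, deletion in another
        elif len(set(column)) == 1:
            labels.append("match")      # every residue in the column is identical
        else:
            labels.append("mismatch")   # at least two different residues
    return labels


# Hypothetical toy alignment of two RNA sequences.
rows = ["AUG-CUA",
        "AUGGCAA"]
labels = classify_columns(rows)
print(labels)
print(Counter(labels))
```

For this toy input the fourth column is reported as a gap and the sixth as a mismatch, while the remaining five columns are matches.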



70 http://sites.google.com/site/ijcsis/
ISSN 1947-5500

In bioinformatics, MSA is a major problem of interest and constitutes the basis for other molecular biology analyses. MSA has been used to address many critical problems in bioinformatics. Studying these alignments provides scientists with the information needed to determine the evolutionary relationships between the sequences, find the sequences of a family, detect the structure of protein/DNA, reveal sequence homologies, predict the functions of protein/DNA sequences, and predict patients' diseases or discover drug-like compounds that can bind to the sequences.

In general, the primary step in secondary structure prediction is MSA, particularly in the prediction of the structure of RNA sequences. RNA structure prediction is strongly affected by the quality of the alignment[4]. Indeed, prediction of an accurate RNA secondary structure relies on multiple sequence alignments to provide data on co-varying bases[5]. MSA significantly improves the accuracy of protein/RNA structure prediction. For example, current RNA secondary structure prediction methods using aligned sequences have achieved a higher prediction accuracy than those using a single sequence[6]. Nucleic acid sequences are of primary concern in our proposed method to evaluate and improve the influence of the alignment tools on RNA secondary structure prediction.

Many different approaches have been proposed to solve the MSA problem. Dynamic programming, progressive, iterative, consistency and segment-based approaches are the most commonly used approaches[7]. Although many MSA algorithms are available, a solution has yet to be found that is applicable to all possible alignment situations[7].

It is a well-known fact that the MSA problem can be solved by using the dynamic programming (DP) algorithm[8, 9]. Unfortunately, such an approach is notorious for its large consumption of processing time. Finding the optimal alignment under the sum-of-pairs score has been shown to be an NP-complete problem[10],[11]. Algorithms that provide the optimal solution are time-consuming and have a running time that grows exponentially with the increase in the number of sequences and their lengths.

In essence, all widely used MSA tools seek an alignment with a high sum-of-pairs score. This optimization problem is NP-complete[2, 3] and thus motivates the research into heuristics. Over the last decade, evolutionary and meta-heuristic approaches have been among the most recent approaches used to solve the optimization problem. Evolutionary and meta-heuristic algorithms have been used in several problem domains, including science, commerce, and engineering. Consequently, most of the practical MSA algorithms are based on heuristics that obtain a reasonably accurate MSA within a moderate computational time and usually produce a quasi-optimal alignment. Although many algorithms are now available, there is still room to improve their computational complexity, accuracy, and scalability.

In this paper, a novel algorithm (HS-MSA) based on the meta-heuristic technique known as the harmony search algorithm is proposed to solve the long-standing MSA problem. The MSA problem is viewed as an optimization problem and can be resolved by adapting a harmony search algorithm. Since the search space in HS is wide, a modified algorithm is proposed (MHS-MSA) to find the conserved blocks using well-known regions, and then align the mismatch regions between the successive blocks to form a final alignment. HS-MSA is extended to include the divide-and-conquer (DCA) approach in which DCA is used to cut and combine the sub-sequences to form the final MSA. Another proposed technique is to use the harmony search algorithm as an MSA improver (HSI-MSA) in which the initial alignment can be obtained from the conventional algorithms or their combinations. HS-MSA can be extended to a parallel algorithm (PHS-MSA) in order to exploit the benefits of multi-core and GPU systems to reduce computational complexity and time.

This paper is organized as follows: Section 2 reviews the related literature and describes the state-of-the-art MSA approaches. Section 3 explains the proposed algorithm. The evaluation and analysis methodology that is used to assess our proposed algorithm is explained in Section 4. Lastly, Section 5 provides the conclusion and summary of the paper.

II. LITERATURE REVIEW

There are several MSA algorithms reported in the literature. For a deeper understanding of the MSA algorithms, the basic concepts of MSA alignment representation, gap penalty, alignment scores, dataset benchmarks, MSA approaches, and the harmony search algorithm need to be understood. As such, subsection 2.1 briefly reviews the representation of MSA alignment, followed by the details about gap penalty in subsection 2.2. The alignment scores, RNA datasets and benchmarks, and current MSA approaches are explained in subsections 2.3, 2.4 and 2.5 respectively. Subsection 2.6 provides a summary of the MSA algorithms, and subsection 2.7 concludes with the harmony search algorithm.

A. Representation of MSA Alignment

There are several ways to represent a multiple sequence alignment. Usually, the final alignment is a listing of the entire sequences aligned one over the other. However, during the alignment process, it is helpful to encode the alignment of the sequences in a form known as a representation. Some of the representations that have been used in previous algorithms include a bit matrix as used in[12], a matrix of gap positions as used in[13], multiple number-strings as used in[14],[15],[16],[17], string representation[18],[19],[20] as used in SAGA[18], four parallel chromosomes as used in[21], directed acyclic graph (DAG) as used in[22, 23], A-Bruijn graph as used in[24-26], and dispersion graph as used in[27].

B. Gap Penalty

A negative score or a penalty can be assigned to a set of gaps. Two types of gap models which were mentioned in the previous reviews[28] are defined as follows:

- Linear gap model – in this model a gap is always given the same penalty wherever it is placed in the alignment. The penalty is proportional to the length of the gap and is given by gap = n × g_o, where g_o < 0 is the opening penalty of a gap and n is the number of consecutive gaps.
- Affine gap model – in this model the new gap and the extension gap are not given the same penalty. The insertion of a new gap has a greater penalty than the extension of an existing gap and is given by gap = g_o + (n − 1) × g_e, where g_o < 0 is the gap opening penalty and g_e < 0 is the gap extension penalty, such that |g_e| < |g_o|.

C. Alignment Score

The MSA objective function is defined for assessing the alignment quality either explicitly or implicitly. An efficient algorithm is used to find the optimal or a near-optimal alignment according to the objective function. Matches, mismatches, substitutions, insertions, and deletions need to be scored in the scoring function. The scoring function can be divided into two parts: substitution matrices and gap penalties. The former provides a numerical score for matches and mismatches while the latter allows for numerical quantification of insertions and deletions. All possible substitutions between the 20 amino acids, or the 4 nucleotides, are represented in a substitution matrix, which is a two-dimensional array of size 20 x 20 for amino acids and 4 x 4 for nucleic acids.

Usually a simple matrix used for DNA or RNA sequences involves assigning a positive value for a match and a negative value for a mismatch[20]. Meanwhile, the scores for aligned protein residues are given as log-odds[29] substitution matrices such as PAM[30], GONNET[31], or BLOSUM[32].

There are several models for assessing the score of a given MSA, and many MSA tools have adopted them. A brief review of the scoring methods that have been used to calculate the alignment score is as follows:

- Sum-of-Pairs (SP): It was introduced by Carrillo and Lipman[10]. More details about the sum-of-pairs will be presented later.
- Weighted sum-of-pairs score[33],[34]: The weighted sum-of-pairs (WSP) score is an extension of the SP score so that each pair-wise alignment score contributes differently to the whole score.
- Maximal expected accuracy (MEA)[35]: The basic idea of MEA is to maximize the expected number of “correctly” aligned residue pairs[36]. It has been used in the PRIME[37] and ProbCons[38] algorithms.
- Consistency-based scoring: This consistency concept was originally introduced by Gotoh[9] and later refined by Vingron and Argos[39]. Consistency-based scoring is used in the T-Coffee[40], MAFFT[41], and Align-m[42] algorithms.
- Probabilistic consistency scoring function: This scoring function was introduced in ProbCons[38]. It is a novel modification of the traditional sum-of-pairs scoring system. This promising idea is implemented and extended in the PECAN[43], MUMMALS[44], PROMALS[45], ProbAlign[46], ProDA[47], and PicXAA[48] programs.
- Segment-to-segment objective function: It is used by DIALIGN[49] to construct an alignment through comparison of whole segments of the sequences rather than residue-to-residue comparison.
- NorMD[50] objective function: It is a conservation-based score which measures the mean distance between the similarities of the residue pairs at each alignment column. NorMD is used in RASCAL[51] and AQUA[52].
- MUSCLE profile scoring function: MUSCLE[53] uses a scoring function which is defined for a pair of profile positions. In addition to the profile sum-of-pairs (PSP) score, MUSCLE uses a new profile function which is called the log-expectation (LE) score.

D. RNA Database and Benchmarks

Typically, a benchmark of reference alignments is used to validate an MSA program. The accuracy score is given by comparing the aligned sequences (test sequences) produced by the program with the corresponding reference alignment. Most alignment programs have been extensively investigated for protein. To date, few attempts have been made to benchmark nucleic acid sequences.

RNA reference alignments exist in several databases. It must be noted that although these databases provide a substantial amount of information to the specialist, they differ in the file formats used and the data obtained. A brief review of the benchmarks and databases that have been used for multiple RNA sequence alignment is given in Table 1.

TABLE I. DATABASE AND BENCHMARKS

| RNA Database | Description | Website |
|---|---|---|
| Rfam[54],[55] | It is a compilation of alignment and covariance models including many regular non-coding RNA families[55]. | http://rfam.sanger.ac.uk/, http://rfam.janelia.org/index.html |
| BRAliBase[56],[57] | It is a compilation of RNA reference alignments especially designed for the benchmark of RNA alignment methods[57]. | http://www.biophys.uni-duesseldorf.de/bralibase/, http://projects.binf.ku.dk/pgardner/bralibase/ |
| Comparative RNA Website (CRW)[58] | It has alignments for rRNA (5S / 16S / 23S), Group I intron, Group II intron, and tRNA for various organisms[58]. | http://www.rna.ccbb.utexas.edu/ |
| European Ribosomal RNA Database[59],[60] | It is a collection of all complete or nearly complete SSU (small subunit) and LSU (large subunit) ribosomal RNA sequences available from public sequence databases[60]. | http://bioinformatics.psb.ugent.be/webtools/rRNA/ |
| The Ribonuclease P Database[61] | It contains a collection of sequence alignments, RNase P sequences, three-dimensional models, secondary structures, and accessory information[61]. | http://www.mbio.ncsu.edu/RnaseP/ |
| 5S Ribosomal RNA Database[62] | It is a collection of data on 5S rRNA, a component of the large subunit of most organellar ribosomes and all cytoplasmic ribosomes. This database is intended to provide information on nucleotide sequences of 5S rRNAs and their genes[62]. | http://biobases.ibch.poznan.pl/5SData/ |
| tmRNA[63] | The tmRNA (also known as 10Sa RNA or SsrA) website contains a compilation of sequences, alignments, secondary structures and other information. It shows secondary structure, together with careful documentation[63]. | http://www.indiana.edu/~tmrna/ |
| The tmRDB (tmRNA database)[64] | tmRDB provides aligned sequences and the secondary and tertiary structure of each tmRNA molecule. The alignment is available in several formats. | http://www.ag.auburn.edu/mirror/tmRDB/ |
| RNAdb[65],[66] | It provides sequences and annotations for tens of thousands of non-coding RNAs. | http://research.imb.uq.edu.au/rnadb/default.aspx |
| Noncoding RNA (ncRNA) database[67] | It provides information on the non-coding RNA sequences and the functions of transcripts (non-coding RNA does not code for proteins, but performs regulatory roles in the cell). | http://biobases.ibch.poznan.pl/ncRNA/ |
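
The gap models of subsection B and the sum-of-pairs idea of subsection C can be illustrated with a short Python sketch. The penalty values, the simple match/mismatch scheme, and the function names below are assumptions chosen for illustration; they are not the scoring function of any particular tool cited in this survey. In `pair_score` a residue aligned with a gap is charged a flat per-position penalty, and a column in which both rows hold a gap scores zero.

```python
from itertools import combinations


def linear_gap(n, g_open=-5.0):
    """Linear model: gap = n * g_o for a run of n consecutive gaps."""
    return n * g_open


def affine_gap(n, g_open=-5.0, g_ext=-1.0):
    """Affine model: gap = g_o + (n - 1) * g_e, with |g_e| < |g_o|."""
    if n <= 0:
        return 0.0
    return g_open + (n - 1) * g_ext


def pair_score(a, b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Score one aligned pair of symbols; '-' denotes a gap."""
    if a == "-" and b == "-":
        return 0.0                      # gap against gap is ignored
    if a == "-" or b == "-":
        return gap                      # residue against gap
    return match if a == b else mismatch


def sum_of_pairs(alignment):
    """Add pair_score over every pair of rows and every column."""
    total = 0.0
    for row_a, row_b in combinations(alignment, 2):
        for a, b in zip(row_a, row_b):
            total += pair_score(a, b)
    return total


# Hypothetical toy alignment of three RNA sequences.
toy = ["AUG-C",
       "AUGGC",
       "A-GGC"]
print(linear_gap(3), affine_gap(3), sum_of_pairs(toy))
```

With the assumed penalties, a run of three gaps costs -15.0 under the linear model and -7.0 under the affine model, which shows why the affine model favours a few long gaps over many short ones. The toy alignment has a sum-of-pairs score of 3.0.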

E. Current MSA Approaches

Many studies on MSA algorithms have been published in the last thirty years and reviewed by a few researchers such as[7],[68],[69],[70]. The published algorithms vary in the way the researchers choose the specified order to do the alignment, and in the procedure used to align and score the sequences. Existing algorithms can be classified into one or combinations of the following basic approaches: exact, progressive, iterative algorithms, group alignment, block-based, consistency-based, probabilistic, computational intelligence, and heuristic. The following subsections provide a brief overview of the consistency-based, block-based and heuristic optimization approaches. These approaches are related in one way or the other to our proposed work. The consistency-based approach is explained in subsection 2.5.1 followed by the block-based approach in subsection 2.5.2. Finally, the heuristic optimization approach is explained in subsection 2.5.3.

1) Consistency-based Approach

The “consistency-based” approach is one of the strategies that has been proposed to improve the MSA scoring function. This approach tries to reduce the chance of early errors when constructing the alignment instead of correcting the existing errors via post-processing[40],[38]. This is typically achieved by improving the pair-wise alignment quality based on other sequences in the alignment so as to obtain pair-wise alignments that are consistent with one another. This consistency strategy was originally described by Gotoh[9] and later refined by Vingron and Argos[39]. This strategy has been modified by several methods since then.

SAGA[18] optimized the alignment with COFFEE, a consistency measure known as the consistency-based objective function.

Later, Dialign2[71] represented the consistency-based method incorporating the segment-by-segment approach.

Similarly, Align-m[42] used local alignments as a guide to a non-progressive global alignment. Align-m used the pair-wise alignment consistency to find the parts that are consistent with each other.

T-Coffee[40] also implemented this idea by using a consistency-based alignment measure based on a library of pair-wise alignments. This method was later brought into a probabilistic framework by ProbCons[38], MUMMALS[44], ProbAlign[46], PROMALS[45], and MSAProbs[72].

Nonetheless, a combination of different strategies can be used. For instance, PCMA[73] (profile consistency multiple sequence alignment) combined two different alignment strategies, that is, progressive and consistency approaches.

2) Block-based Approach

Block-based MSA is a method in which an alignment is constructed by first identifying the conserved regions, which are called “blocks”. Then, the regions between the successive blocks are aligned to form a final alignment[74]. Block-based methods can be included in the consistency or probability-based[75] approach. A block may also be referred to as a sub-sequence, a segment, a region, or a fragment[76]. A fragment is defined as a pair of ungapped segments of the input sequences[77]. A weight score is assigned to each possible fragment to find the consistent fragments with a high overall sum of fragment scores. Those fragments are integrated from pair-wise alignments into a multiple alignment.

Searching for these conserved blocks in many block-based methods is very time-consuming. Therefore, the key issue is how to construct the possible set of blocks efficiently[75].

Some of the previous algorithms, such as those by Boguski et al.[78], Miller[79], and Miller et al.[80], construct blocks either by pair-wise alignment or from blocks that are not matched by all the N sequences. Instead of starting from pair-wise alignments, Match-Box[81] aims to identify conserved blocks (or boxes) among the sequences without performing a pair-wise alignment. Similarly, Zhao and Jiang[74] introduced the BMA algorithm, which allows for internal gaps and some degree of mismatch in the method used to identify the blocks.

Based on a combination of local and global alignment, Dialign[71],[82],[83] makes extensive use of segment-by-segment methods. It combines local and global alignment features by identifying and adding the conserved regions (blocks) shared between the sequences based on their consistency weights.

Based on anchored alignment, CHAOS[84] used fast local alignments as "seeds" for a slower global alignment. CHAOS is used to improve DIALIGN[71] and LAGAN[85].

Recently, Wang et al.[75] produced a block-based algorithm called BlockMSA. It combined the biclustering and divide-and-conquer approaches to align the sequences.

3) Heuristic Optimization Approaches

Many optimization problems from various fields have been solved by using diverse optimization algorithms. Computational intelligence (CI) plays an important role in solving the sequence alignment problem. Evolutionary algorithms have the advantage of operating on several solutions simultaneously, combining an exploratory search through the solution space with the exploitation of current results[15]. They place no restrictions on the number of sequences or their lengths, and they are very flexible in optimizing the solution with low complexity. Many efforts have attempted to solve the MSA problem using evolutionary programming[86],[87]. Since MSA is computationally difficult, there is no single best method for solving it.

Heuristic optimization approaches include genetic algorithm, ant colony, swarm intelligence, simulated annealing, tabu search, and combinations thereof. In the following subsections, several heuristic optimization techniques are explained to show how they are applied to solve the MSA problem.

a) Genetic Algorithm

Genetic Algorithm (GA) is a heuristic search that performs an adaptive search to find optimal solutions of large-scale optimization problems with multiple local minima[15] using techniques that simulate natural evolution.

GA is well suited for solving some NP-complete problems such as MSA. Sequence Alignment by Genetic Algorithm (SAGA)[18] is the earliest GA to be used to solve MSA problems. With the GA approach there are different methods that can be applied to solve the MSA problem, such as those used in[13],[12],[17],[88],[19],[20].

Some methods are hybrids with other approaches. Zhang and Wong[89] presented a method that used a pair-wise dynamic programming (DP) technique based on GA. Similarly, utilizing GA in a progressive approach has been presented in[90]. Later, Wang and Lefkowitz[91] produced the GenAlignRefine algorithm, which uses a genetic algorithm to improve local region alignment, which in turn improves the overall quality of global multiple alignments. In[92] GA is used as an iterative method to refine the alignment score obtained by the progressive method. The use of GA to find the cut-off point in the divide-and-conquer approach is presented in[93]. Using similar combinations, a novel algorithm combining a genetic algorithm with ant colony optimization (GA-ACO) was presented by Lee et al.[94]. Chen et al.[95] reported a method which employs a new selection scheme to avoid premature convergence in GAs. Taheri and Zomaya[96] presented RBT-GA using a combination of the Rubber Band Technique (RBT) and the Genetic Algorithm (GA). Jeevitesh et al.[97] proposed the PASA algorithm, which used the alignment outputs of two MSA programs (MCoffee and ProbCons) and combined them in a genetic algorithm model.

b) Ant Colony

The ant colony optimization algorithm (ACO) is a probabilistic technique for solving computational problems. It belongs to the family of swarm intelligence methods. The ACO algorithm is used as a new cooperative search algorithm in solving optimization problems. ACO was inspired by the observation of the activities of real ants[98],[99],[100]. Recently, ACO has been used to solve NP-complete problems.

It shows efficiency in solving MSA problems such as those reported in[101],[102], where each proposed algorithm was based on ant colony optimization and the divide-and-conquer technique. Other researchers such as[103],[104],[27],[105] relied on the ant colony to solve the MSA problem in their research work.

c) Particle Swarm Optimization

Particle swarm optimization (PSO) is a swarm intelligence technique for numerical optimization. It simulates the behaviour of bird flocking or fish schooling. PSO was presented by Kennedy and Eberhart[106] in 1995. The simplicity of implementation, quick convergence, and few parameters have resulted in PSO gaining popularity.

Many researchers have made modifications to the PSO idea and utilized this technique widely in solving MSA problems. Rasmussen and Krink[107] used a combination of particle swarm optimization and evolutionary algorithms to train HMMs for protein sequence alignment. Meanwhile, Pedro et al.[108] presented an algorithm based on PSO to improve a sequence alignment previously obtained using ClustalX. Juang and Su[109] produced an algorithm which combined pair-wise DP and particle swarm optimization (PSO) to overcome local optimum problems. Xu and Chen[110] designed an improved particle swarm optimization to solve MSA. Based on the idea of chaos optimization, Lei et al.[111] produced chaotic PSO (CPSO) to solve MSA. A novel algorithm of mutation-based binary particle swarm optimization (M-BPSO) was presented by Hai-Xia et al.[112] for solving MSA.

d) Simulated Annealing

Simulated annealing (SA) was described by Kirkpatrick[113]. Simulated annealing is an algorithm that attempts to simulate the physical process of annealing. The basic concept of simulated annealing algorithms is based on observing the change of energy as materials solidify from the liquid state to the solid state[114].

Several SA algorithms have been used to solve the MSA problem. Kim et al.[115] used simulated annealing to develop the MSASA algorithm for solving MSA. Uren et al.[116] presented MAUSA, which used simulated annealing to perform a search through the space of possible guide trees. Meanwhile, Keith et al.[117] described a new algorithm for finding a consensus sequence by using the SA method. Omar et al.[118] produced a combination of Genetic Algorithm and Simulated Annealing to solve MSA problems. Roc[114] presented a method for multiple DNA sequence alignment in which an optimal cut-off point is chosen by the genetic simulated annealing (GSA) technique. Joo et al.[119] presented a new method called MSACSA for MSA, which is based on conformational space annealing (CSA). CSA combines three traditional global optimization methods, that is, SA, genetic algorithm (GA), and Monte Carlo with minimization (MCM).

e) Tabu Search

Tabu search is a meta-heuristic approach used to solve combinatorial optimization problems. Tabu search (TS) and simulated annealing are similar in that both traverse the solution space by testing mutations of an individual solution. However, they differ in the number of generated solutions.

While simulated annealing generates only one mutated solution, tabu search generates many mutated solutions and moves to the solution with the lowest energy of those generated. TS has been used to solve MSA problems. Riaz et al.[120] implemented the adaptive memory features of tabu search to refine MSA. Lightner[121] used a tabu search approach to obtain multiple sequence alignment and explored iterative refinement techniques such as the hidden Markov model and the intensification heuristic approach to further improve the alignment.

F. Summary of Related Algorithms for MSA

Table 2 lists the most current algorithms that are in use. This list is incomplete but includes the most related algorithms explained above. Online availability is the link to the online server or the site from which the particular algorithm can be downloaded or accessed.

TABLE II. CURRENT MSA ALGORITHMS

| Algorithm | Approach | RNA | Online Availability | Reference |
|---|---|---|---|---|
| MAFFT | Consistency | Y | http://mafft.cbrc.jp/alignment/server/ | [122] |
| MUSCLE | Progressive/ refinement | Y | http://www.ebi.ac.uk/Tools/msa/muscle/ | [123] |
| Dialign2 | Consistency/ segment | Y | http://bibiserv.techfak.uni-bielefeld.de/cgi-bin/dialign_submit | [71] |
| Align-m | Consistency | N | http://bioinformatics.vub.ac.be/software/software.html | [42] |
| BlockMSA | 3-way consistency/ Block/DCA | Y | http://aug.csres.utexas.edu/msa/ | [75] |
| MAUSA | SA | N | http://eprints.utas.edu.au/208/ | [116] |
| SAGA | Iterative/Stochastic/GA | Y | http://www.tcoffee.org/Projects_home_page/saga_home_page.html | [18] |
| Mishima | k-tuple | Y | http://esper.lab.nig.ac.jp/study/mishima/ | [124] |
| MSAProbs | Pair-HMM and partition function | Y | http://sourceforge.net/projects/msaprobs/ | [72] |
| pecan | Consistency/ progressive | - | http://www.ebi.ac.uk/~bjp/pecan/ | [43] |
| PicXAA | posterior probability/ consistency | Y | http://www.ece.tamu.edu/~bjyoon/picxaa/ | [48] |
| PRIME | GROUP-TO-GROUP/ ANCHOR | Y | http://prime.cbrc.jp/ | [37] |
| ProAlign | HMM/ progressive | Y | http://applications.lanevol.org/ProAlign/ | [125] |
| PROBCONS | posterior probability pair-hmm | N | http://probcons.stanford.edu/index.html | [38] |
| ProDA | repeated and shuffled elements | Y | http://proda.stanford.edu/ | [47] |
| Probalign | posterior probabilities | Y | http://probalign.njit.edu/probalign/login | [46] |
| REFINER | Refinement/ Block | - | ftp://ftp.ncbi.nih.gov/pub/REFINER | [126],[127] |
| AIMSA | Region | - | - | [128] |
| PRALINE | Profile/iterative/progressive | - | http://www.ibi.vu.nl/programs/pralinewww/ | [129] |
| T-COFFEE | Consistency/ Progressive | Y | http://www.tcoffee.org/ | [40] |
| MUMMALS | Probability HMM | N | http://prodata.swmed.edu/mummals/mummals.php | [44] |
| PROMALS | k-mer/ Pair-HMM consistency | Y | http://prodata.swmed.edu/promals/promals.php | [45] |
| PCMA | k-mer/ Profile/consistency | - | ftp://iole.swmed.edu/pub/PCMA/pcma/ | [73] |
| BMA | Conserved block | Y | - | [74] |
| GA-ACO | GA and Ant colony | - | - | [94] |
| PASA | Refine by GA | - | - | [97] |
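
The contrast drawn in the tabu search discussion between simulated annealing (one mutated candidate per step) and tabu search (many candidates, then a move to the lowest-energy one) can be sketched as follows. This is a generic illustration on a toy integer problem, not an MSA implementation and not the method of any cited work. The energy function, the one-unit neighbourhood, the cooling schedule, and all names are assumptions made for the example.

```python
import math
import random


def energy(x):
    """Toy energy to minimize; an MSA solver could use a negated alignment score."""
    return (x - 3) ** 2


def neighbours(x):
    """Assumed mutation operator: move one unit left or right."""
    return [x - 1, x + 1]


def sa_step(x, temperature, rng):
    """Simulated annealing: test ONE random mutation (Metropolis acceptance)."""
    candidate = rng.choice(neighbours(x))
    delta = energy(candidate) - energy(x)
    if delta <= 0 or rng.random() < math.exp(-delta / temperature):
        return candidate
    return x


def tabu_step(x, tabu):
    """Tabu search: generate ALL mutations, move to the best one not in the tabu list."""
    allowed = [c for c in neighbours(x) if c not in tabu]
    if not allowed:
        return x
    return min(allowed, key=energy)


def tabu_search(start, steps=10, tabu_size=3):
    x, best, tabu = start, start, [start]
    for _ in range(steps):
        x = tabu_step(x, tabu)
        tabu = (tabu + [x])[-tabu_size:]   # short-term memory of recent solutions
        if energy(x) < energy(best):
            best = x
    return best


def simulated_annealing(start, steps=200, t0=5.0, cooling=0.95, seed=0):
    rng = random.Random(seed)
    x, best, t = start, start, t0
    for _ in range(steps):
        x = sa_step(x, t, rng)
        if energy(x) < energy(best):
            best = x
        t = max(t * cooling, 1e-3)         # geometric cooling with a floor
    return best


print(tabu_search(0), simulated_annealing(10))
```

Starting from 0, the tabu search walks 1, 2, 3 and then keeps moving past the minimum because recent positions are forbidden, which is why the best solution seen so far is tracked separately and returned (here 3).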

G. Harmony Search Algorithm

The harmony search (HS) algorithm was developed by Geem[130]. HS is a meta-heuristic optimization algorithm based on music.

HS simulates a team of musicians together trying to seek the best state of harmony. Each player generates a sound based on one of three options (memory consideration, pitch adjustment, and random selection). This is the equivalent of finding the optimal solution in an optimization process.

Geem et al.[130] model the HS components as three quantitative optimization processes as follows:

- The harmony memory (HM): It is used to keep good harmonies. A harmony from HM is selected randomly based on the parameter called harmony memory considering (or accepting) rate, HMCR ∈ [0,1]. It typically uses HMCR = 0.7 ~ 0.95.

- The pitch adjustment: It is similar to a local search. It is used to generate a slightly different solution from the HM depending on the pitch-adjusting rate (PAR) values. PAR controls the degree of the adjustment by the pitch bandwidth (brange). It usually uses PAR = 0.1 ~ 0.5 in most applications.

- The random selection: A new harmony is generated randomly to increase the diversity of the solutions. The probability of randomization is Prandom = 1 - HMCR, and the actual probability of the pitch adjustment is Ppitch = HMCR × PAR.

The pseudo code of the basic HS algorithm with these three components is summarized in Figure 2.

Harmony Search Algorithm
Begin
  Declare the objective function f(x), x = (x1, x2, …, xn)
  Initialize the harmony memory accepting rate (HMCR)
  Initialize pitch adjusting rate (PAR) and other parameters
  Initialize Harmony Memory with random harmonies
  While (t < max number of iterations)
    While (i <= number of variables)
      If (rand < HMCR),
        Choose a value from HM for the variable i
        If (rand < PAR), Adjust the value by adding certain amount
        End if
      Else choose a new random value
      End if
    End while
    Calculate the objective function
    Accept the new harmony (solution) if better
    Update HM
  End while
  Find the current best solution in HM
End
Figure 2. Pseudo Code of the Harmony Search Algorithm[131]

Later, Geem[132] proposed an ensemble harmony search (EHS) where a new ensemble consideration operation is added to the original HS structure. The new operation takes into account the relationship among the decision variables, and the value of each decision variable can be chosen based on the other variables.

Thereafter, Mahdavi et al.[133] produced an improved harmony search (IHS), in which the parameter PAR and the pitch bandwidth are adjusted dynamically in the improvisation step.

Omran and Mahdavi[134] proposed a global-best harmony search (GHS) in which the performance of HS is improved by borrowing concepts from swarm intelligence to modify the pitch-adjustment step such that the new harmony is assigned by the best harmony in the HM.

Meanwhile, Pan et al.[135] produced a local-best harmony search algorithm with dynamic subpopulations (DLHS) for solving continuous optimization problems. The DLHS algorithm differs from the existing HS in that the whole harmony memory (HM) is divided into many sub-HMs and independent processes are performed in each sub-HM. A periodic regrouping schedule is used to exchange information between the sub-HMs, so that the population diversity and the improvement in the accuracy of the final solution are maintained. In addition, the parameters are adjusted using a newly developed adaptive strategy to suit a particular problem or phase of the search process.

Recently, Zou et al.[136] proposed a novel global harmony search algorithm (NGHS) to solve reliability problems. NGHS modifies the improvisation step of the HS. Position updating and genetic mutation are new operations included in NGHS. Position updating enables the worst harmony of HM to move toward the global best harmony rapidly, while genetic mutation prevents NGHS from becoming trapped in a local optimum.

III. THE PROPOSED ALGORITHM

In this article several algorithms are proposed to solve the MSA problem by using the adapted harmony search algorithm (HS). Adaptive HS for MSA is explained in the next subsection 3.1. A modified HS algorithm for reducing the search space is explained in subsection 3.2. Subsection 3.3 describes the HS Improver. Finally, in subsection 3.4 a parallel HS-MSA is introduced which can be implemented on different parallel platforms such as multi-core CPUs and GPUs. Figure 3 shows the stages of the proposed research framework.

Figure 3. Research Framework.
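Since every method proposed in this section builds on the basic HS loop of Figure 2, a minimal Python sketch of that loop may help fix ideas. It minimises a toy continuous function rather than an alignment score. The name harmony_search, the bandwidth value, the clamping to the bounds, and the sphere test function are illustrative assumptions and are not taken from the cited HS papers.

```python
import random

def harmony_search(f, lower, upper, hms=10, hmcr=0.9, par=0.3,
                   bandwidth=0.1, iterations=2000, seed=0):
    """Minimise f over the box [lower[i], upper[i]] with a basic HS loop."""
    rng = random.Random(seed)
    n = len(lower)
    # Harmony memory: hms random solutions plus their objective values.
    memory = [[rng.uniform(lower[i], upper[i]) for i in range(n)]
              for _ in range(hms)]
    scores = [f(h) for h in memory]
    for _ in range(iterations):
        new = []
        for i in range(n):
            if rng.random() < hmcr:
                # Memory consideration: reuse variable i of a stored harmony.
                value = rng.choice(memory)[i]
                if rng.random() < par:
                    # Pitch adjustment: small move, kept inside the bounds.
                    value += rng.uniform(-1.0, 1.0) * bandwidth
                    value = min(max(value, lower[i]), upper[i])
            else:
                # Random selection: draw a fresh value for variable i.
                value = rng.uniform(lower[i], upper[i])
            new.append(value)
        new_score = f(new)
        # Accept the new harmony only if it beats the worst stored one.
        worst = max(range(hms), key=scores.__getitem__)
        if new_score < scores[worst]:
            memory[worst], scores[worst] = new, new_score
    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]

def sphere(x):
    """Toy objective (assumption): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)

best, best_score = harmony_search(sphere, [-5.0, -5.0], [5.0, 5.0])
```

With the default values used here (HMCR = 0.9, PAR = 0.3), each variable is drawn at random with probability 1 - HMCR = 0.1 and pitch-adjusted with probability HMCR × PAR = 0.27.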

A. Proposed Harmony Search Algorithm for MSA
The main goal of MSA algorithms is to detect and align the homologous regions across the different sequences. This is achieved by optimizing an objective function that measures the quality of the alignment. The harmony search is a new meta-heuristic optimization algorithm which has a history of solving NP-complete problems[137]. This subsection explains how the harmony search algorithm can be applied to the MSA problem. The alignment representation, the objective function, the harmony memory initialization, and the adaptive harmony search algorithm for MSA are explained in greater detail.

1) Alignment Representation
An alignment of N sequences with different lengths from L1 to LN is represented as an N x W matrix where each row contains the gap positions encoded for one sequence. The length of the rows in the matrix is W = ⌈αLmax⌉, where Lmax = max{L1, L2, ..., LN}, ⌈x⌉ is the smallest integer greater than or equal to x, and the parameter α is a scaling factor[86]. The value α is chosen according to the probability distribution. The value of α can be 1.2 as used in[94] or 1.5 as used in[138],[13],[20]. The choice of 1.2 allows the aligned sequences to be 20% longer than the longest sequence, while the choice of 1.5 allows the alignment to be 50% longer than the longest sequence in the test, as in [138].

2) Objective Function
To find the optimal solution in the HS-MSA, the sum-of-pairs (SP) score described in[139],[140],[10],[107] will be used to calculate the Objective Function (OF) where there is no prior knowledge of the reference alignment. The general form of the OF score of an alignment of N sequences which consists of M columns is:

$$ OF = \sum_{i=1}^{l} \left[ S(m_i) - G(m_i) \right], $$

where S(m_i) is the similarity score of the column m_i, G(m_i) is the gap penalty of the column m_i, and l is the aligned sequence length (the number of columns M). The similarity score of the column m_i can be measured by the sum-of-pairs (SP). The SP-score S(m_i) for the i-th column m_i is calculated as follows:

$$ S(m_i) = \sum_{j=1}^{N-1} \sum_{k=j+1}^{N} s(m_i^j, m_i^k), $$

where m_i^j is the j-th row in the i-th column. For aligning two residues x and y, the substitution matrix s(x,y) is used to give the similarity score.

3) Harmony Memory Initialization
For a given set of 5 sequences, the procedure to initialize the harmony memory is as follows: the maximum sequence length is MaxS = 7, the minimum sequence length is MinS = 4, the maximum length of the alignment is W = ⌈1.2 × 7⌉ = 9, the maximum number of gaps in sequence Si is (W - Li) where Li is the length of sequence i, and the maximum number of gaps is Gs = 9 - 4 = 5.
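The quantities in this example, and the random gap placement used to fill each row of the harmony memory, can be sketched in Python as follows. This is a minimal sketch using the five sequences of Figure 4. The function names (alignment_width, place_gaps, random_row, init_harmony_memory) and the fixed seed are hypothetical choices made for illustration, not part of the proposed implementation.

```python
import math
import random

GAP = "-"

def alignment_width(sequences, alpha=1.2):
    """W = ceil(alpha * Lmax), the number of columns of the alignment matrix."""
    return math.ceil(alpha * max(len(s) for s in sequences))

def place_gaps(sequence, width, gap_positions):
    """Build one row: gaps at the given 1-based columns, residues elsewhere."""
    gaps = set(gap_positions)
    residues = iter(sequence)
    row = []
    for column in range(1, width + 1):
        row.append(GAP if column in gaps else next(residues))
    return "".join(row)

def random_row(sequence, width, rng):
    """Draw W - Li distinct gap columns from 1..W, sort them, and place them."""
    drawn = rng.sample(range(1, width + 1), width - len(sequence))
    return place_gaps(sequence, width, sorted(drawn))

def init_harmony_memory(sequences, hms, alpha=1.2, seed=0):
    """Return hms random alignments (harmonies), each a list of gapped rows."""
    rng = random.Random(seed)
    width = alignment_width(sequences, alpha)
    return [[random_row(s, width, rng) for s in sequences] for _ in range(hms)]

sequences = ["AUCAA", "UAAUCAA", "AUCA", "UAAUCAU", "AUGAUU"]
W = alignment_width(sequences)                  # ceil(1.2 * 7)
max_gaps = W - min(len(s) for s in sequences)   # gaps needed by the shortest sequence
memory = init_harmony_memory(sequences, hms=3)
```

Only W - Li gap columns are drawn per sequence, so short draws suffice for sequences that are close to the alignment width.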
Sequence        Length Li   Generated gap positions (W-Li)   Gap positions sorted ascending (W-Li)
A U C A A       5           4 1 8 7                          1 4 7 8
U A A U C A A   7           3 2                              2 3
A U C A         4           3 4 7 8 9                        3 4 7 8 9
U A A U C A U   7           6 2                              2 6
A U G A U U     6           7 2 9                            2 7 9
A. Gaps Position

- A U - C A - - A
U - - A A U C A A
A U - - C A - - -
U - A A U - C A U
A - U G A U - U -
B. Aligned sequence
Figure 4. Harmony memory initialization
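An alignment such as the one in Figure 4B can then be scored with a sum-of-pairs objective. The sketch below is one possible reading of the objective function of 2). It assumes a match/mismatch substitution score of +1/-1 and a linear penalty of 1 per gap symbol that is subtracted from the column similarity. These values, and the function names, are assumptions for illustration only; the paper does not fix the substitution matrix or the gap penalty.

```python
from itertools import combinations

GAP = "-"

def substitution(x, y, match=1.0, mismatch=-1.0):
    """Toy substitution score s(x, y); a real matrix would replace this."""
    return match if x == y else mismatch

def column_similarity(column):
    """S(m_i): sum of s(x, y) over all residue pairs in the column (gaps skipped)."""
    residues = [c for c in column if c != GAP]
    return sum(substitution(x, y) for x, y in combinations(residues, 2))

def column_gap_penalty(column, gap_cost=1.0):
    """G(m_i): assumed linear penalty, gap_cost per gap symbol in the column."""
    return gap_cost * column.count(GAP)

def objective(alignment):
    """OF: column similarity minus column gap penalty, summed over all columns."""
    return sum(column_similarity(col) - column_gap_penalty(col)
               for col in zip(*alignment))

figure_4b = ["-AU-CA--A",
             "U--AAUCAA",
             "AU--CA---",
             "U-AAU-CAU",
             "A-UGAU-U-"]
score = objective(figure_4b)
```

In an HS-MSA setting this value would be computed for every harmony in the memory and for each newly improvised alignment.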

The initial harmony memory is randomly generated and the rows are initialized in the following way: First, a random permutation of W-Li gap positions is generated from the range of values 1 to W for each sequence Si with length Li. Second, those W-Li numbers are sorted and used to indicate where the corresponding gaps are placed in the matrix. Finally, the positions in the matrix rows which are not occupied by gaps are filled with the base symbols taken from the original sequence.

The random initialization procedure that produces the initial harmony memory is illustrated in Figure 4. This is similar to the procedure used in [94]. The first difference in our procedure is that the gap positions are generated and not the residue positions as in[94]; fewer gap positions than residue positions have to be generated for each sequence. The second difference is related to the first step, in that the number of permuted values is (W-Li) and not W as in[94].

4) Adaptive Harmony Search Algorithm for MSA (AHS-MSA)
The purpose of AHS-MSA is to aid scientists in producing high-quality MSAs that may lead to better RNA structure prediction (Figure 5) as well as help with other issues in molecular biology. In reviewing the approaches to solving the MSA problem or to predicting the secondary structure of multiple RNA sequences, we have found no studies that have incorporated the harmony search algorithm. The only research that has involved HS in bioinformatics is that of Mohsen et al.[141], which predicted the secondary structure for a single RNA sequence based on Minimum Free Energy.

[Figure 5 diagram: RNA Sequences, HS-MSA Algorithm, Aligned RNA Sequences, HS Prediction Algorithm, RNA 2D Struct.]

Figure 5. The impact of MSA in RNA secondary structure prediction

The HS algorithm has been successfully applied to several optimization problems[142]. As such, this study aims to investigate the use and adaptation of the HS algorithm in finding solutions to the MSA problem. The MSA problem can be considered as an optimization problem with minimal disruption of the accuracy, complexity, and speed rules. MSA can be resolved by adapting the harmony search algorithm. Moreover, HS possesses several advantages over conventional optimization techniques[143] such as:

1. HS does not require initial value settings for decision variables;
2. HS is a population-based meta-heuristic algorithm, which means that a group of multiple harmonies can be used simultaneously. Proper parallelism usually leads to better performance with higher efficiency and speed;
3. HS uses stochastic random searches which explore the search space more widely and efficiently;
4. HS does not need derivative information;
5. HS is less sensitive to the chosen parameters;
6. HS can solve various NP-complete problems[137];
7. The structure of the HS algorithm is relatively simple;
8. HS is a very successful meta-heuristic algorithm due to its way of handling intensification and diversification;
9. HS is very versatile, being able to combine with other meta-heuristic algorithms[134].

These characteristics increase the reliability and flexibility of the HS algorithm in producing better solutions.

The AHS-MSA algorithm as described in Figure 6 combines and adapts the HS idea to solve the MSA problem. The steps of the AHS-MSA algorithm are as follows:

1. Initialize the harmony parameters (HMCR, PAR, NI, and HMS).
2. Initialize the harmony memory with HMS random harmonies (solutions). Each solution is an alignment.
3. Calculate the objective function (OF) for each harmony.
4. Improvise the new harmony.
5. Accept/reject the new harmony.
6. Update the harmony memory.

[Figure 6 flowchart boxes: Start; Initialize Parameters; Initialize HM of alignment (HM); Objective Function; Improvise New Harmony; Accept New Harmony (Yes/No); Update of HM; Terminal Cond. (No/Yes); End]

Figure 6. The flowchart of the proposed HS-MSA algorithm

B. A Modified Harmony Search Algorithm for MSA (MHS-MSA)
To reduce the search space, a combination of methods is proposed. A hybrid method of HS and a segment-based approach is proposed and explained in the next subsection 3.2.1. In subsection 3.2.2, a hybrid method of HS and a combination of segment-based and divide-and-conquer approaches is proposed and explained.

3.2.1 A Harmony Search algorithm with a Segment-based Approach
Identifying areas of local conservation before finding the global alignment has lately been gaining popularity among researchers. Conserved regions can be a helpful guide in identifying the homology of sequences and assisting the process of MSA. This idea is not new and has been implemented in other algorithms such as DIALIGN[49], MLAGAN[85], CHAOS[84], align-m[42], and MAFFT[144], where blocks are first detected from the pair-wise sequence alignment and that information is then used to detect MSA. Other algorithms, such as MISHIMA[124], also used this idea in

which k-tuple is explored and analyzed from the original sequence. In the same way, well-aligned regions were seen in RASCAL[51],[128] where a consistency-based objective function called NorMD[50] was used.

The method proposed in our research is to reduce the search space of the previous AHS-MSA algorithm by combining pair-wise alignments into multiple alignments. It works by finding the conserved blocks through all the sequences before starting the MSA process. It explores all possible regions, which is more correct and consistent. All matched blocks are used to guide the MSA alignment. The idea is first to detect the conserved blocks in the sequences pair-wise and then to apply HS to identify the MSA from those conserved columns.

The multiple alignment search space can be narrowed down to a number of possible regions per sequence pair. If parts of these residue pairs are consistent with each other, they are considered acceptable. Consistency means that if symbol Ai (residue i of sequence A) is aligned correctly with symbol Bj, and Bj with Ck, then Ai and Ck should also be aligned. Therefore, this property can be used to define the consistent parts among all the pair-wise alignments which can be considered acceptable, and the gap positions can be defined at the rest of the aligned residue pairs.

The ability to determine the well-aligned regions has at least two advantages. It prevents the same region from being changed in the later process. Additionally, it speeds up the optimization process. The modified steps of the HS-MSA algorithm can be summarized as follows:

1. Find all possible residue pairs in each sequence pair using the pair-wise algorithm.
2. By using the consistency concept, find all possible blocks or columns that are acceptable.
3. Calculate the score value for each block by using the sum-of-pairs objective function.
4. Identify and analyze the potentially useful blocks, and select those that are more consistent with each other.
5. Apply the HS algorithm to initialize the final alignment from these blocks and find the optimal alignment.

3.2.2 A Harmony Search algorithm with Segment-based and Divide-and-conquer Approaches
The previous proposed method can be extended by combining it with the divide-and-conquer (DCA)[145] method.

Sammeth et al.[146], and Kryukov and Saitou[124] used the DCA approach in solving MSA. Kryukov and Saitou[124] produced an adapted DCA in which k-tuples are used to find the segments, and these segments are aligned by CLUSTALW and MAFFT. Sammeth et al.[146], on the other hand, integrated the global divide-and-conquer approach with the local segment-based approach as in DIALIGN.

A set of consistent columns can form segments in the alignment. The DCA protocol is to cut the sequences at a point and repeat that cutting procedure until the sub-sequences no longer exceed a threshold length. Then the obtained sub-sequences are aligned independently and the results are combined to form a complete MSA alignment. The method proceeds as follows:

1. Find all possible residue pairs in each sequence pair using the pair-wise algorithm.
2. By using the consistency concept, find all the possible blocks or columns that are acceptable.
3. Calculate the score value for each column by using the sum-of-pairs objective function.
4. Identify and analyze the potentially useful columns, and select those that are more consistent with each other.
5. Add these conserved blocks/fragments to the fragment set F; they can be considered as cutting points.
6. Divide the sequences into sub-sequences based on these cutting points.
7. Apply the HS algorithm to construct the final alignment from these regions and find the optimal one.

C. A Harmony Search Algorithm Improver for MSA (HSI-MSA)
Another proposed method in our research work is the use of HSI-MSA to combine many multiple alignments into one improved alignment. Any conventional MSA program or a combination of them can initialize the harmony memory. Then the harmony search algorithm can be applied as an iterative method to refine/combine the alignments to find the best alignment result. Here HS takes on the role of an improver of the accuracy of the current alignment. The goal of this study is to investigate whether this approach improves the accuracy of the different alignments or not. This improver idea is similar to the PASA algorithm[97], which used a genetic algorithm model to combine the alignment outputs of two MSA programs, M-Coffee and ProbCons. It has also been used in ComAlign[147], M-Coffee[148] and AQUA[52]. The proposed method can be summarized as follows:

1. Initialize the harmony memory by using well-known MSA algorithms including our alignment gained from the previous step.
2. Calculate the score for each alignment.
3. Apply the HS algorithm to improve and find the optimal alignment.

This will combine all the alignment parts from the different alignments to find the optimal alignment within them, and not just select the best of them.

D. A Parallel Harmony Search Algorithm for MSA (PHS-MSA)
In addition to the foregoing proposed methods, another way to reduce the computational complexity and time consumed is to parallelize the HS-MSA algorithm using multi-core and multi-GPU platforms.

CUDA (Compute Unified Device Architecture) is an extension of C/C++ developed by NVIDIA to run thousands of threads in parallel[149] and to execute on the GPUs[150]. GPUs' architectures are "manycore" with

hundreds of cores[149]. GPUs were implemented as streaming processors.

GPUs are a good alternative for high performance computing and are expected to become even more capable in the near future. Furthermore, availability, low price, and easy installation are the main advantages[151] of GPUs compared to other architectures.

Re-developing the algorithm and the data structure based on computer graphics concepts is the main obstacle facing the use of GPUs[151],[152]. Moreover, other limitations of the streaming architecture have to be taken into consideration (i.e. memory random access, cross fragment, persistent state).

Many researchers have shown the design and implementation of bioinformatics algorithms using GPUs. Examples that use GPUs to parallelize sequence alignment algorithms in bioinformatics are[153], [154], [151], [155], [156], [157].

Our approach is motivated by the rapidly increasing power of GPUs. Our proposed approach is to implement the proposed HS-MSA algorithm using NVIDIA's GPUs, to explore and develop high performance solutions for multiple sequence alignment. To program the GPU, the HS-MSA will be implemented in CUDA on an NVIDIA GeForce 9400 GT. The computation will be conducted on NVIDIA GPUs installed in a 2.66 GHz Intel Core 2 Quad CPU computer equipped with 3 GB RAM, running Microsoft Windows XP Professional.

Moreover, to utilize multiple CPU threads to incorporate GPU devices into one single program, the proposed method can be extended to use hybrid multi-core and GPU code with CUDA and OpenMP. This can lead to quicker implementation and greater efficiency on both GPU and multi-core CPU[158].

IV. EVALUATION AND ANALYSIS

To evaluate and analyse the performance of the proposed HS-MSA algorithm in greater depth there is a need for an objective criterion to assess the quality of the aligned sequences. The quality attained can be evaluated by comparing the results of the test alignment with the reference alignment[139].

The comparison can use scores that may be dependent on the alignment itself (e.g., Sum-of-Pairs, Total Column Score) or independent from it (structure sensitivity and selectivity). This section describes in detail the benchmark dataset, the reference comparison, the alignment comparison and the structure comparison, which can be investigated to evaluate the test alignments.

A. Benchmark Dataset
The proposed algorithm will be tested using the following datasets: Rfam, BRAliBase 2.1, Comparative RNA website (CRW), the Ribonuclease P database, 5S Ribosomal RNA database, tmRNA, tRNA, SRPDB, RNAdb, and ncRNA as explained in section 2.6. Different RNA datasets will be used from a variety of families and lengths such as 5S (5S.B.alphaproteobacteria, 5S.B.betaproteobacteria, 5S.B.actinobacteria) and 16S (16S.B.fibrobacteres, 16S.E.entamoebidae, 16S.E.perkinsea) ribosomal RNA.

B. Reference Comparison
Assessing the quality of the aligned sequences requires a reference alignment from the benchmark database. The comparison is between the test alignment and the reference alignment.

Sum-of-pairs (SPS) and column score (CS) are two different score functions that can be used to estimate this comparison. The SPS score is the percentage of the correctly aligned residue pairs in the test alignment that occur in the reference alignment[159]. The CS score is the percentage of the entire columns in the test alignment that occur completely in the reference alignment[159].

In a given test alignment consisting of M columns, the ith column is denoted by Ai1, Ai2, ..., AiN where N is the number of sequences. For each pair of residues Aij and Aik, pi(j,k) is defined such that pi(j,k) = 1 if residues Aij and Aik from the test alignment are aligned with each other in the reference alignment, otherwise pi(j,k) = 0. The score of the ith column can be calculated as follows:

$$ S_i = \sum_{j=1}^{N} \sum_{k=1, k \neq j}^{N} p_i(j,k). $$

Then, the sum-of-pairs score for a given test alignment can be calculated as follows:

$$ \mathrm{SPS} = \frac{\sum_{i=1}^{M} S_i}{\sum_{i=1}^{M_r} S_{ri}}, $$

where Mr is the number of columns in the reference alignment and Sri is the score Si for the ith column in the reference alignment.

Column score (CS): Using the same symbols as shown above, the score Ci of the ith column is equal to 1 if all the residues in that column are aligned in the reference alignment, otherwise it is equal to 0. Therefore, the column score is:

$$ \mathrm{CS} = \frac{\sum_{i=1}^{M} C_i}{M}. $$

To compare the test alignment with the corresponding reference alignment, the sum-of-pairs function and column score are used as described in[139],[107],[160],[161],[162].

C. Alignment Comparison
This comparison is to evaluate the performance of the proposed algorithm with respect to the other MSA aligners. Typically, the MSA aligners are validated by using a benchmark data set of reference alignments.

The sum-of-pairs (SPS) and column scores (CS) of every alignment produced by each aligner program, including our proposed algorithm, are used to compare with the reference alignment.

The proposed algorithm HS-MSA can be compared to the commonly used MSA programs on the above reference alignment benchmark.
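The SPS and CS scores of subsection B can be sketched in Python as follows. This is a minimal sketch under stated assumptions: alignments are lists of equal-length gapped strings over the same sequences in the same order, a residue is identified by its row and its index within the ungapped sequence, and a test column counts as correct only if the identical column occurs in the reference. The function names are hypothetical and do not come from any benchmark tool.

```python
from itertools import combinations

GAP = "-"

def residue_columns(alignment):
    """For each column, the residue index of every row (None for a gap)."""
    counters = [0] * len(alignment)
    columns = []
    for col in zip(*alignment):
        entry = []
        for row, symbol in enumerate(col):
            if symbol == GAP:
                entry.append(None)
            else:
                entry.append(counters[row])
                counters[row] += 1
        columns.append(tuple(entry))
    return columns

def aligned_pairs(alignment):
    """Set of residue pairs that share a column, as ((row, idx), (row, idx))."""
    pairs = set()
    for column in residue_columns(alignment):
        present = [(row, idx) for row, idx in enumerate(column) if idx is not None]
        pairs.update(combinations(present, 2))
    return pairs

def sum_of_pairs_score(test, reference):
    """SPS: fraction of reference residue pairs reproduced by the test alignment."""
    ref_pairs = aligned_pairs(reference)
    return len(aligned_pairs(test) & ref_pairs) / len(ref_pairs)

def column_score(test, reference):
    """CS: fraction of test columns that occur identically in the reference."""
    ref_columns = set(residue_columns(reference))
    test_columns = residue_columns(test)
    return sum(col in ref_columns for col in test_columns) / len(test_columns)

reference = ["AC-U", "ACGU"]
test = ["ACU-", "ACGU"]
sps = sum_of_pairs_score(test, reference)   # 2 of 3 reference pairs recovered
cs = column_score(test, reference)          # 2 of 4 test columns match
```

Each unordered residue pair is counted once here, whereas the double sum in S_i counts it twice; the SPS ratio is unaffected because the factor cancels between numerator and denominator.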

D. Structure Comparison
It might be expected that a more accurate alignment would lead to a more accurate RNA secondary structure. The proposed method is to investigate the impact of alignment accuracy on the accuracy of the RNA secondary structure using standard benchmarks and comparing them with the common well-known MSA algorithms.

Both the alignment process and the prediction process can affect the accuracy of the secondary structure prediction, but here only the alignment process is investigated.

The evaluation is performed with respect to sensitivity, selectivity or positive predictive value (PPV), and Matthews correlation coefficient (MCC) of the RNA secondary structure as used by Gardner and Giegerich[163]. The secondary structure of the test alignment produced by the proposed algorithm will be compared with that of others. The sensitivity and selectivity of the alignment process will be studied to investigate the effect of the proposed aligner on the accuracy of the structure as shown in Figure 7.

[Figure 7 diagram: RNA Sequences are aligned by HS-MSA, MSA Tool1, MSA Tool2 and MSA Tool3; each set of Aligned RNA Sequences is passed to an RNA Secondary Structure Tool; the resulting structures are compared with the Reference Structure (Structures Comparison).]

Figure 7. Structure comparison

V. CONCLUSION
Multiple sequence alignment is a fundamental technique in many bioinformatics applications. Many algorithms have been developed to achieve optimal alignment. Some programs are exhaustive in nature; some are heuristic. Because exhaustive programs are not feasible in most cases, heuristic programs are commonly used. These include progressive, iterative, and block-based approaches.

This paper describes briefly the basic concepts of MSA and reviews the common approaches in MSA. To this end, this paper proposes a novel meta-heuristic method to solve the MSA problem. A meta-heuristic algorithm (HS-MSA), which has not been used up to now, is proposed for multiple sequence alignment that promises to greatly speed up the alignment process and improve its accuracy. The optimization method introduced herein is inspired by the so-called harmony search algorithm (HS). A new optimization algorithm combining HS-MSA with segment-based multiple alignment is also proposed and extended to include parallel techniques.

ACKNOWLEDGMENTS
This research is supported by the Universiti Sains Malaysia (USM) Fellowship awarded to the corresponding authors. The authors extend their appreciation to the School of Computer Sciences as well as Universiti Sains Malaysia for their facilities and assistance. The authors acknowledge with gratitude the help of USM-IPS for proof-editing this paper. The authors are appreciative of the efforts of the reviewers for their helpful comments.

REFERENCES
[1] Zablocki, F.B.R., Multiple Sequence Alignment using Particle Swarm Optimization, in Department of Computer Science. 2007, University of Pretoria.
[2] Bonizzoni, P. and G. Della Vedova, The complexity of multiple sequence alignment with SP-score that is a metric. Theoretical Computer Science, 2001. 259(1-2): p. 63-79.
[3] Just, W., Computational complexity of multiple sequence alignment with SP-Score. Journal of Computational Biology, 2001. 8(6): p. 615-623.
[4] Hickson, R.E., C. Simon, and S.W. Perrey, The performance of several multiple-sequence alignment programs in relation to secondary-structure features for an rRNA sequence. Molecular Biology and Evolution, 2000. 17(4): p. 530-539.
[5] Pace, N.R., B.C. Thomas, and C.R. Woese, Probing RNA structure, function, and history by comparative analysis. Cold Spring Harbor Monograph Series, 1999. 37: p. 113-142.
[6] Bernhart, S.H., et al., RNAalifold: improved consensus structure prediction for RNA alignments. BMC Bioinformatics, 2008. 9: p. -.
[7] Notredame, C., Recent progress in multiple sequence alignment: a survey. Pharmacogenomics, 2002. 3(1): p. 131-144.
[8] Smith, T.F. and M.S. Waterman, Identification of Common Molecular Subsequences. Journal of Molecular Biology, 1981. 147(1): p. 195-197.
[9] Gotoh, O., Consistency of Optimal Sequence Alignments. Bulletin of Mathematical Biology, 1990. 52(4): p. 509-525.
[10] Carrillo, H. and D. Lipman, The Multiple Sequence Alignment Problem in Biology. SIAM Journal on Applied Mathematics, 1988. 48(5): p. 1073-1082.
[11] Wang, L. and T. Jiang, On the complexity of multiple sequence alignment. Journal of Computational Biology, 1994. 1(4): p. 337-348.
[12] Isokawa, M., M. Wayama, and T. Shimizu, Multiple sequence alignment using a genetic algorithm. Genome Informatics, 1996. 7: p. 176-177.
[13] Lai, C.C., C.H. Wu, and C.C. Ho, Using Genetic Algorithm to Solve Multiple Sequence Alignment Problem. International Journal of Software Engineering and Knowledge Engineering, 2009. 19(6): p. 871-888.
[14] Horng, J.T., et al., A genetic algorithm for multiple sequence alignment. Soft Computing, 2005. 9(6): p. 407-420.
[15] Bi, C., Computational intelligence in multiple sequence alignment. International Journal of Intelligent Computing and Cybernetics, 2008. 1(1): p. 8-24.

[16] Yang, B.-H., An Approach to Multiple Protein Sequence Alignment Using A Genetic Algorithm. 2000, National Central University.
[17] Jorng-Tzong Horng, et al. Using Genetic Algorithms to Solve Multiple Sequence Alignments. in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2000). 2000. Morgan Kaufmann, Las Vegas, Nevada, USA.
[18] Notredame, C. and D.G. Higgins, SAGA: Sequence alignment by genetic algorithm. Nucleic Acids Research, 1996. 24(8): p. 1515-1524.
[19] da Silva, F.J.M., et al., AlineaGA: A Genetic Algorithm for Multiple Sequence Alignment. New Challenges in Applied Intelligence Technologies, 2008. 134: p. 309-318.
[20] Gondro, C. and B.P. Kinghorn, A simple genetic algorithm for multiple sequence alignment. Genetics and Molecular Research, 2007. 6(4): p. 964-982.
[21] Shyu, C. and J.A. Foster, Evolving consensus sequence for multiple sequence alignment with a genetic algorithm. Genetic and Evolutionary Computation - Gecco 2003, Pt Ii, Proceedings, 2003. 2724: p. 2313-2324.
[22] Lee, C., C. Grasso, and M.F. Sharlow, Multiple sequence alignment using partial order graphs. Bioinformatics, 2002. 18(3): p. 452-464.
[23] Grasso, C. and C. Lee, Combining partial order alignment and progressive multiple sequence alignment increases alignment speed and scalability to very large alignment problems. Bioinformatics, 2004. 20(10): p. 1546-1556.
[24] Raphael, B., et al., A novel method for multiple alignment of sequences with repeated and shuffled elements. Genome Research, 2004. 14(11): p. 2336-2346.
[25] Pevzner, P.A., H.X. Tang, and G. Tesler, De novo repeat classification and fragment assembly. Genome Research, 2004. 14(9): p. 1786-1796.
[26] Jones, N.C., D.G. Zhi, and B.J. Raphael, AliWABA: alignment on the web through an A-Bruijn approach. Nucleic Acids Research, 2006. 34: p. W613-W616.
[27] Chen, W.Y., et al., Multiple Sequence Alignment Algorithm Based on a Dispersion Graph and Ant Colony Algorithm. Journal of Computational Chemistry, 2009. 30(13): p. 2031-2038.
[28] Richer, J.M., V. Derrien, and J.K. Hao, A new dynamic programming algorithm for multiple sequence alignment. Combinatorial Optimization and Applications, Proceedings, 2007. 4616: p. 52-61.
[29] Altschul, S.F., Amino-Acid Substitution Matrices from an Information Theoretic Perspective. Journal of Molecular Biology, 1991. 219(3): p. 555-565.
[39] Vingron, M. and P. Argos, Motif Recognition and Alignment for Many Sequences by Comparison of Dot-Matrices. Journal of Molecular Biology, 1991. 218(1): p. 33-43.
[40] Notredame, C., D.G. Higgins, and J. Heringa, T-Coffee: A novel method for fast and accurate multiple sequence alignment. Journal of Molecular Biology, 2000. 302(1): p. 205-217.
[41] Katoh, K. and H. Toh, Recent developments in the MAFFT multiple sequence alignment program. Briefings in Bioinformatics, 2008. 9(4): p. 286-298.
[42] Van Walle, I., I. Lasters, and L. Wyns, Align-m - a new algorithm for multiple alignment of highly divergent sequences. Bioinformatics, 2004. 20(9): p. 1428-1435.
[43] Paten, B., et al., Sequence progressive alignment, a framework for practical large-scale probabilistic consistency alignment. Bioinformatics, 2009. 25(3): p. 295-301.
[44] Pei, J.M. and N.V. Grishin, MUMMALS: multiple sequence alignment improved by using hidden Markov models with local structural information. Nucleic Acids Research, 2006. 34(16): p. 4364-4374.
[45] Pei, J. and N.V. Grishin, PROMALS: towards accurate multiple sequence alignments of distantly related proteins. Bioinformatics, 2007. 23(7): p. 802.
[46] Roshan, U. and D.R. Livesay, Probalign: multiple sequence alignment using partition function posterior probabilities. Bioinformatics, 2006. 22(22): p. 2715-2721.
[47] Phuong, T.M., et al., Multiple alignment of protein sequences with repeats and rearrangements. Nucleic Acids Research, 2006. 34(20): p. 5932-5942.
[48] Sahraeian, S.M.E. and B.J. Yoon, PicXAA: greedy probabilistic construction of maximum expected accuracy alignment of multiple sequences. Nucleic Acids Research.
[49] Morgenstern, B., et al., DIALIGN: Finding local similarities by multiple sequence alignment. Bioinformatics, 1998. 14(3): p. 290-294.
[50] Thompson, J.D., et al., Towards a reliable objective function for multiple sequence alignments. Journal of Molecular Biology, 2001. 314(4): p. 937-951.
[51] Thompson, J.D., J.C. Thierry, and O. Poch, RASCAL: rapid scanning and correction of multiple sequence alignments. Bioinformatics, 2003. 19(9): p. 1155-1161.
[52] Muller, J., et al., AQUA: automated quality improvement for multiple sequence alignments. Bioinformatics, 2010. 26(2): p. 263-265.
[53] Edgar, R.C., MUSCLE: a multiple sequence alignment method with
[30] Dayhoff, M.O., R.M. Schwartz, and B.C. Orcutt, A model of reduced time and space complexity. Bmc Bioinformatics, 2004. 5: p. 1-
evolutionary change in proteins. Atlas of protein sequence and 19.
structure, 1978. 5(Suppl 3): p. 345–352. [54] Griffiths-Jones, S., et al., Rfam: an RNA family database. Nucleic
[31] Gonnet, G.H., M.A. Cohen, and S.A. Benner, Exhaustive Matching of Acids Research, 2003. 31(1): p. 439-441.
the Entire Protein-Sequence Database. Science, 1992. 256(5062): p. [55] Griffiths-Jones, S., et al., Rfam: annotating non-coding RNAs in
1443-1445. complete genomes. Nucleic Acids Research, 2005. 33: p. D121-D124.
[32] Henikoff, S. and J.G. Henikoff, Amino-Acid Substitution Matrices [56] Gardner, P.P., A. Wilm, and S. Washietl, A benchmark of multiple
from Protein Blocks. Proceedings of the National Academy of Sciences sequence alignment programs upon structural RNAs. Nucleic Acids
of the United States of America, 1992. 89(22): p. 10915-10919. Research, 2005. 33(8): p. 2433-2439.
[33] Altschul, S.F., R.J. Carroll, and D.J. Lipman, Weights for Data Related [57] Wilm, A., I. Mainz, and G. Steger, An enhanced RNA alignment
by a Tree. Journal of Molecular Biology, 1989. 207(4): p. 647-653. benchmark for sequence alignment programs. Algorithms for
[34] Gotoh, O., A Weighting System and Algorithm for Aligning Many Molecular Biology, 2006. 1: p. -.
Phylogenetically Related Sequences. Computer Applications in the [58] Cannone, J.J., et al., The Comparative RNA Web (CRW) Site: an
Biosciences, 1995. 11(5): p. 543-551. online database of comparative sequence and structure information for
[35] Gotoh, O., Multiple sequence alignment: algorithms and applications. ribosomal, intron, and other RNAs. Bmc Bioinformatics, 2002. 3: p. -.
Advances in Biophysics, 1999. 36(1): p. 159-206. [59] Wuyts, J., et al., The European Large Subunit Ribosomal RNA
[36] Miyazawa, S., A reliable sequence alignment method based on Database. Nucleic Acids Research, 2001. 29(1): p. 175-177.
probabilities of residue correspondences. Protein Engineering, 1995. [60] Wuyts, J., G. Perriere, and Y. Van de Peer, The European ribosomal
8(10): p. 999-1009. RNA database. Nucleic Acids Research, 2004. 32: p. D101-D103.
[37] Yamada, S., O. Gotoh, and H. Yamana, Improvement in Speed and [61] Brown, J.W., The Ribonuclease P Database. Nucleic Acids Research,
Accuracy of Multiple Sequence Alignment Program PRIME. IPSJ 1999. 27(1): p. 314-314.
Transactions on Bioinformatics, 2008. 1(0): p. 2-12.
[62] Szymanski, M., et al., 5S ribosomal RNA database. Nucleic Acids
[38] Do, C.B., et al., ProbCons: Probabilistic consistency-based multiple Research, 2002. 30(1): p. 176-178.
sequence alignment. Genome Research, 2005. 15(2): p. 330-340.
[63] de Novoa, P.G. and K.P. Williams, The tmRNA website: reductive
evolution of tmRNA in plastids and other endosymbionts. Nucleic
Acids Research, 2004. 32: p. D104-D108.

82 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 9, No. 2, 2011
[64] Zwieb, C., et al., tmRDB (tmRNA database). Nucleic Acids Research, [89] Zhang, C. and A.K.C. Wong, Toward efficient multiple molecular
2003. 31(1): p. 446-447. sequence alignment: A system of genetic algorithm and dynamic
[65] Pang, K.C., et al., RNAdb - a comprehensive mammalian noncoding programming. Ieee Transactions on Systems Man and Cybernetics Part
RNA database. Nucleic Acids Research, 2005. 33: p. D125-D130. B-Cybernetics, 1997. 27(6): p. 918-932.
[66] Pang, K.C., et al., RNAdb 2.0-an expanded database of mammalian [90] Cai, L.M., D. Juedes, and E. Liakhovitch, Evolutionary computation
non-coding RNAs. Nucleic Acids Research, 2007. 35: p. D178-D182. techniques for multiple sequence alignment. Proceedings of the 2000
Congress on Evolutionary Computation, Vols 1 and 2, 2000: p. 829-
[67] Mattick, J.S. and I.V. Makunin, Non-coding RNA. Human Molecular 835.
Genetics, 2006. 15: p. R17-R29.
[91] Wang, C.L. and E.J. Lefkowitz, Genomic multiple sequence
[68] Kemena, C. and C. Notredame, Upcoming challenges for multiple alignments: refinement using a genetic algorithm. Bmc Bioinformatics,
sequence alignment methods in the high-throughput era. 2005. 6: p. -.
Bioinformatics, 2009. 25(19): p. 2455-2465.
[92] Ergezer, H. and K. Leblebicioglu, Refining the progressive multiple
[69] Edgar, R.C. and S. Batzoglou, Multiple sequence alignment. Current
sequence alignment score using genetic algorithms. Artificial
Opinion in Structural Biology, 2006. 16(3): p. 368-373.
Intelligence and Neural Networks, 2006. 3949: p. 177-184.
[70] Wallace, I.M., G. Blackshields, and D.G. Higgins, Multiple sequence
[93] Chen, S.M., C.H. Lin, and S.J. Chen, Multiple DNA sequence
alignments. Current Opinion in Structural Biology, 2005. 15(3): p. 261-
alignment based on genetic algorithms and divide-and-conquer
266.
techniques. International Journal of Applied Science and Engineering,
[71] Morgenstern, B., DIALIGN 2: improvement of the segment-to-segment 2005. 3(2): p. 89-100.
approach to multiple sequence alignment. Bioinformatics, 1999. 15(3): [94] Lee, Z.J., et al., Genetic algorithm with ant colony optimization (GA-
p. 211-218.
ACO) for multiple sequence alignment. Applied Soft Computing,
[72] Liu, Y., B. Schmidt, and D.L. Maskell, MSAProbs: multiple sequence 2008. 8(1): p. 55-78.
alignment based on pair hidden Markov models and partition function
[95] Chen, Y., et al., Multiple sequence alignment based on genetic
posterior probabilities. Bioinformatics, 2010: p. btq338.
algorithms with reserve selection. Proceedings of 2008 Ieee
[73] Pei, J.M., R. Sadreyev, and N.V. Grishin, PCMA: fast and accurate International Conference on Networking, Sensing and Control, Vols 1
multiple sequence alignment based on profile consistency. and 2, 2008: p. 1511-1516.
Bioinformatics, 2003. 19(3): p. 427-428.
[96] Taheri, J. and A.Y. Zomaya, RBT-GA: a novel metaheuristic for
[74] Zhao, P. and T. Jiang, A heuristic algorithm for multiple sequence solving the multiple sequence alignment problem. Bmc Genomics,
alignment based on blocks. Journal of Combinatorial Optimization, 2009.
2001. 5(1): p. 95-115.
[97] Jeevitesh.M.S, et al., Higher accuracy protein Multiple Sequence
[75] Wang, S., R.R. Gutell, and D.P. Miranker, Biclustering as a method for Alignment by Stochastic Algorithm. 2010.
RNA local multiple sequence alignment. Bioinformatics, 2007. 23(24):
[98] Dorigo, M., V. Maniezzo, and A. Colorni, Ant system: Optimization by
p. 3289-3296.
a colony of cooperating agents. Ieee Transactions on Systems Man and
[76] Chan, S.C., A.K.C. Wong, and D.K.Y. Chiu, A Survey of Multiple Cybernetics Part B-Cybernetics, 1996. 26(1): p. 29-41.
Sequence Comparison Methods. Bulletin of Mathematical Biology,
[99] Dorigo, M., G. Di Caro, and L.M. Gambardella, Ant algorithms for
1992. 54(4): p. 563-598.
discrete optimization. Artificial Life, 1999. 5(2): p. 137-172.
[77] Morgenstern, B., et al., Multiple sequence alignment with user-defined [100] Dorigo, M. and C. Blum, Ant colony optimization theory: A survey.
anchor points. Algorithms for Molecular Biology, 2006. 1: p. -. Theoretical Computer Science, 2005. 344(2-3): p. 243-278.
[78] Boguski, M.S., et al., Analysis of Conserved Domains and Sequence
[101] Chen, Y.X., et al., Multiple sequence alignment by ant colony
Motifs in Cellular Regulatory Proteins and Locus-Control Regions
optimization and divide-and-conquer. Computational Science - Iccs
Using New Software Tools for Multiple Alignment and Visualization. 2006, Pt 2, Proceedings, 2006. 3992: p. 646-653.
New Biologist, 1992. 4(3): p. 247-260.
[102] Liu, W., L. Chen, and J. Chen, An efficient algorithm for multiple
[79] Miller, W., Building Multiple Alignments from Pairwise Alignments.
sequence alignment based on ant colony optimisation and divide-and-
Computer Applications in the Biosciences, 1993. 9(2): p. 169-176.
conquer method. New Zealand Journal of Agricultural Research, 2007.
[80] Miller, W., et al., Constructing aligned sequence blocks. Journal of 50(5): p. 617-626.
Computational Biology, 1994. 1(1): p. 51-64.
[103] Moss, J. and C.G. Johnson, An ant colony algorithm for multiple
[81] Depiereux, E. and E. Feytmans, Match-Box - a Fundamentally New sequence alignment in bioinformatics. Artificial Neural Nets and
Algorithm for the Simultaneous Alignment of Several Protein Genetic Algorithms, Proceedings, 2003: p. 182-186.
Sequences. Computer Applications in the Biosciences, 1992. 8(5): p.
[104] Chen, Y.X., et al., Partitioned optimization algorithms for multiple
501-509.
sequence alignment. 20th International Conference on Advanced
[82] Subramanian, A.R., et al., DIALIGN-T: An improved algorithm for Information Networking and Applications, Vol 2, Proceedings, 2006:
segment-based multiple sequence alignment. Bmc Bioinformatics, p. 618-622.
2005. 6: p. -.
[105] Zhao, Y.D., et al., An Improved Ant Colony Algorithm for DNA
[83] Subramanian, A.R., M. Kaufmann, and B. Morgenstern, DIALIGN- Sequence Alignment. Isise 2008: International Symposium on
TX: greedy and progressive approaches for segment-based multiple Information Science and Engineering, Vol 2, 2008: p. 683-688.
sequence alignment. Algorithms for Molecular Biology, 2008. 3: p. -.
[106] Kennedy, J. and R. Eberhart, Particle swarm optimization. 1995 Ieee
[84] Brudno, M., et al., Fast and sensitive multiple alignment of large International Conference on Neural Networks Proceedings, Vols 1-6,
genomic sequences. Bmc Bioinformatics, 2003. 4: p. -. 1995: p. 1942-1948.
[85] Brudno, M., et al., LAGAN and Multi-LAGAN: Efficient tools for [107] Rasmussen, T.K. and T. Krink, Improved Hidden Markov Model
large-scale multiple alignment of genomic DNA. Genome Research, training for multiple sequence alignment by a particle swarm
2003. 13(4): p. 721-731. optimization - evolutionary algorithm hybrid. Biosystems, 2003. 72(1-
[86] Chellapilla, K. and G.B. Fogel. Multiple sequence alignment using 2): p. 5-17.
evolutionary programming. 1999. [108] Pedro F. Rodriguez, L.F. Nino, and O.M. Alonso, Multiple sequence
[87] Kupis, P. and J. Mandziuk, Multiple sequence alignment with alignment using swarm intelligence. International Journal of
evolutionary-progressive method. Adaptive and Natural Computing Computational Intelligence Research 2007. 3(2): p. pp. 123-130.
Algorithms, Pt 1, 2007. 4431: p. 23-30. [109] Juang, W.S. and S.F. Su, Multiple sequence alignment using modified
[88] Zhang, C. and A.K.C. Wong, A genetic algorithm for multiple dynamic programming and particle swarm optimization. Journal of the
molecular sequence alignment. Computer Applications in the Chinese Institute of Engineers, 2008. 31(4): p. 659-673.
Biosciences, 1997. 13(6): p. 565-581.

83 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 9, No. 2, 2011
[110] Xu, F.S. and Y.H. Chen, A Method for Multiple Sequence Alignment [132] Geem, Z.W., Improved harmony search from ensemble of music
Based on Particle Swarm Optimization. Emerging Intelligent players. Knowledge-Based Intelligent Information and Engineering
Computing Technology and Applications: With Aspects of Artificial Systems, Pt 1, Proceedings, 2006. 4251: p. 86-93.
Intelligence, 2009. 5755: p. 965-973. [133] Mahdavi, M., M. Fesanghary, and E. Damangir, An improved harmony
[111] Lei, X.J., J.J. Sun, and Q.Z. Ma, Multiple Sequence Alignment Based search algorithm for solving optimization problems. Applied
on Chaotic PSO. Computational Intelligence and Intelligent Systems, Mathematics and Computation, 2007. 188(2): p. 1567-1579.
2009. 51: p. 351-360. [134] Omran, M.G.H. and M. Mahdavi, Global-best harmony search.
[112] Hai-Xia, L., et al., Multiple Sequence Alignment Based on a Binary Applied Mathematics and Computation, 2008. 198(2): p. 643-656.
Particle Swarm Optimization Algorithm, in Proceedings of the 2009 [135] Pan, Q.K., et al., A local-best harmony search algorithm with dynamic
Fifth International Conference on Natural Computation - Volume 03. subpopulations. Engineering Optimization, 2010. 42(2): p. 101-117.
2009, IEEE Computer Society.
[136] Zou, D.X., et al., A novel global harmony search algorithm for
[113] Kirkpatrick, S., C.D. Gelatt, and M.P. Vecchi, Optimization by reliability problems. Computers & Industrial Engineering, 2010. 58(2):
Simulated Annealing. Science, 1983. 220(4598): p. 671-680. p. 307-316.
[114] Roc, R.O.C., Multiple DNA Sequence Alignment Based on Genetic [137] Mahdavi, M., Solving NP-Complete Problems by Harmony Search.
Simulated Annealing Techniques. Information and Management, 2007. Music-Inspired Harmony Search Algorithm, 2009: p. 53-70.
18(2): p. 97-111.
[138] Thomsen, R., G.B. Fogel, and T. Krink, A clustal alignment improver
[115] Kim, J., S. Pramanik, and M.J. Chung, Multiple Sequence Alignment using evolutionary algorithms. Cec'02: Proceedings of the 2002
Using Simulated Annealing. Computer Applications in the Congress on Evolutionary Computation, Vols 1 and 2, 2002: p. 121-
Biosciences, 1994. 10(4): p. 419-426. 126.
[116] Uren, P.J., R.M. Cameron-Jones, and A.H.J. Sale, MAUSA: Using [139] Thompson, J.D., F. Plewniak, and O. Poch, A comprehensive
simulated annealing for guide tree construction in multiple sequence comparison of multiple sequence alignment programs. Nucleic Acids
alignment. Ai 2007: Advances in Artificial Intelligence, Proceedings, Research, 1999. 27(13): p. 2682-2690.
2007. 4830: p. 599-608.
[140] Lipman, D.J., S.F. Altschul, and J.D. Kececioglu, A Tool for Multiple
[117] Keith, J.M., et al., A simulated annealing algorithm for finding Sequence Alignment. Proceedings of the National Academy of
consensus sequences. Bioinformatics, 2002. 18(11): p. 1494-1499. Sciences of the United States of America, 1989. 86(12): p. 4412-4415.
[118] Omar, M.F., et al., Multiple Sequence Alignment Using Optimization [141] Mohsen, A.M., A.T. Khader, and D. Ramachandram, HSRNAFold: A
Algorithms. International Journal of Computational Intelligence, 2005. Harmony Search Algorithm for RNA Secondary Structure Prediction
1: p. 2. Based on Minimum Free Energy. Iit: 2008 International Conference on
[119] Joo, K., et al., Multiple Sequence Alignment by Conformational Space Innovations in Information Technology, 2008: p. 326-330.
Annealing. Biophysical Journal, 2008. 95(10): p. 4813-4819. [142] Ingram, G. and T. Zhang, Overview of applications and developments
[120] Riaz, T., Y. Wang, and L. Kuo-Bin, A TABU SEARCH in the harmony search algorithm. Music-Inspired Harmony Search
ALGORITHM FOR POST-PROCESSING MULTIPLE SEQUENCE Algorithm, 2009: p. 15-37.
ALIGNMENT. Journal of Bioinformatics & Computational Biology, [143] G. Ingram and T. Zhang, Music-Inspired Harmony Search Algorithm.
2005. 3(1): p. 145-156. Springer Berlin / Heidelberg, ed. c.O.o.A.a. and p. Developments in
[121] Lightner, C.A., A Tabu Search Approach to Multiple Sequence the Harmony Search Algorithm. 2009.
Alignment. 2008. [144] Katoh, K., et al., MAFFT: a novel method for rapid multiple sequence
[122] Katoh, K., et al., MAFFT version 5: improvement in accuracy of alignment based on fast Fourier transform. Nucleic Acids Research,
multiple sequence alignment. Nucleic acids research, 2005. 33(2): p. 2002. 30(14): p. 3059-3066.
511. [145] Stoye, J., V. Moulton, and A.W.M. Dress, DCA: An efficient
[123] Edgar, R.C., MUSCLE: multiple sequence alignment with high implementation of the divide-and-conquer approach to simultaneous
accuracy and high throughput. Nucleic Acids Research, 2004. 32(5): p. multiple sequence alignment. Computer Applications in the
1792-1797. Biosciences, 1997. 13(6): p. 625-626.
[124] Kryukov, K. and N. Saitou, MISHIMA - a new method for high speed [146] Sammeth, M., B. Morgenstern, and J. Stoye, Divide-and-conquer
multiple alignment of nucleotide sequences of bacterial genome scale multiple alignment with segment-based constraints. Bioinformatics,
data. Bmc Bioinformatics, 2010. 11: p. -. 2003. 19: p. Ii189-Ii195.
[125] Loytynoja, A. and M.C. Milinkovitch, A hidden Markov model for [147] Bucka-Lassen, K., O. Caprani, and J. Hein, Combining many multiple
progressive multiple alignment. Bioinformatics, 2003. 19(12): p. 1505- alignments in one improved alignment. Bioinformatics, 1999. 15(2): p.
1513. 122-130.
[126] Chakrabarti, S., et al., State of the art: refinement of multiple sequence [148] Wallace, I.M., et al., M-Coffee: combining multiple sequence
alignments. Bmc Bioinformatics, 2006. 7: p. -. alignment methods with T-Coffee. Nucleic Acids Research, 2006.
[127] Chakrabarti, S., et al., Refining multiple sequence alignments with 34(6): p. 1692-1699.
conserved core regions. Nucleic Acids Research, 2006. 34(9): p. 2598- [149] Luebke, D., CUDA: Scalable parallel programming for high-
2606. performance scientific computing. 2008 Ieee International Symposium
[128] Wang, Y. and K.B. Li, An adaptive and iterative algorithm for refining on Biomedical Imaging: From Nano to Macro, Vols 1-4, 2008: p. 836-
multiple sequence alignment. Computational Biology and Chemistry, 838.
2004. 28(2): p. 141-148. [150] Lindholm, E., et al., NVIDIA Tesla: A unified graphics and computing
[129] Simossis, V.A. and J. Heringa, PRALINE: a multiple sequence architecture. Ieee Micro, 2008. 28(2): p. 39-55.
alignment toolbox that integrates homology-extended and secondary [151] Liu, W.G., et al., GPU-ClustalW: Using graphics hardware to
structure information. Nucleic Acids Research, 2005. 33: p. W289- accelerate multiple sequence alignment. High Performance Computing
W294. - HiPC 2006, Proceedings, 2006. 4297: p. 363-374.
[130] Geem, Z.W., J.H. Kim, and G.V. Loganathan, A new heuristic [152] Liu, W., et al. Bio-sequence database scanning on a GPU. 2006: IEEE.
optimization algorithm: Harmony search. Simulation, 2001. 76(2): p. [153] Liu, W., et al., Streaming algorithms for biological sequence alignment
60-68. on GPUs. Ieee Transactions on Parallel and Distributed Systems, 2007.
[131] Yang, X.-S., Harmony Search as a Metaheuristic Algorithm, in Music- 18(9): p. 1270-1281.
Inspired Harmony Search Algorithm. 2009. p. 1-14. [154] Liu, Y., et al., GPU accelerated Smith-Waterman. Computational
Science - Iccs 2006, Pt 4, Proceedings, 2006. 3994: p. 188-195.

84 http://sites.google.com/site/ijcsis/
ISSN 1947-5500
(IJCSIS) International Journal of Computer Science and Information Security,
Vol. 9, No. 2, 2011
[155] Jung, S.B., Parallelized pairwise sequence alignment using CUDA on
multiple GPUs. Bmc Bioinformatics, 2009. 10: p. -.
[156] Liu, Y.C., B. Schmidt, and D.L. Maskell, Parallel Reconstruction of
Neighbor-Joining Trees for Large Multiple Sequence Alignments using
CUDA. 2009 Ieee International Symposium on Parallel & Distributed
Processing, Vols 1-5, 2009: p. 1538-1545.
[157] Liu, Y.C., B. Schmidt, and D.L. Maskell, MSA-CUDA: Multiple
Sequence Alignment on Graphics Processing Units with CUDA. 2009
20th Ieee International Conference on Application-Specific Systems,
Architectures and Processors, 2009: p. 121-128.
[158] Jang, H., A. Park, and K. Jung. Neural network implementation using
cuda and openmp. 2008: IEEE.
[159] Wheeler, T.J. and J.D. Kececioglu, Multiple alignment by aligning
alignments. Bioinformatics, 2007. 23(13): p. I559-I568.
[160] Lassmann, T. and E.L.L. Sonnhammer, Automatic assessment of
alignment quality. Nucleic Acids Research, 2005. 33(22): p. 7120-
7128.
[161] O'Sullivan, O., et al., APDB: a novel measure for benchmarking
sequence alignment methods without reference alignments.
Bioinformatics, 2003. 19: p. i215-i221.
[162] Lassmann, T. and E.L.L. Sonnhammer, Quality assessment of multiple
alignment programs. Febs Letters, 2002. 529(1): p. 126-130.
[163] Gardner, P.P. and R. Giegerich, A comprehensive comparison of
comparative RNA structure prediction approaches. Bmc
Bioinformatics, 2004. 5: p. -.

Mobarak Saif received his Bachelor's Degree in Computer Science,
Alzarqa, Jordan in 2000 and his Master's Degree in Computer Science
from Universiti Sains Malaysia, Penang, Malaysia in 2005. He is
currently a PhD candidate under the supervision of Professor Dr. Rosni
Abdullah at the School of Computer Sciences, Universiti Sains Malaysia,
in the area of Parallel Algorithms Applied to Bioinformatics
Applications.

Rosni Abdullah received her Bachelor's Degree in Computer Science and
Applied Mathematics and her Master's Degree in Computer Science from
Western Michigan University, Kalamazoo, Michigan, U.S.A. in 1984 and
1986 respectively. She joined the School of Computer Sciences at
Universiti Sains Malaysia in 1987 as a lecturer. She received an award
from USM in 1993 to pursue her PhD at Loughborough University, United
Kingdom, in the area of Parallel Algorithms. She was promoted to
Associate Professor in 2000 and to Professor in 2008. She has held
several administrative positions such as First Year Coordinator,
Programme Chairman and Deputy Dean for Postgraduate Studies and
Research. She is currently the Dean of the School of Computer Sciences
and also Head of the Parallel and Distributed Processing Research
Group, which focuses on grid computing and bioinformatics research. Her
current research work is in the area of Parallel Algorithms for
Bioinformatics Applications.
