Sequence Alignment Program For Mac

 
Published online 2016 Feb 17. doi: 10.1186/s13104-016-1927-4

Clustal 2 comes in two flavors: the command-line version, Clustal W, and the graphical version, Clustal X. Precompiled executables of the most recent version (currently 2.1) are available for Linux, Mac OS X and Windows (including XP and Vista), along with the source code. MAFFT is a multiple sequence alignment program for Unix-like operating systems that aligns amino acid or nucleotide sequences. It offers a range of multiple alignment methods, including L-INS-i (accurate; for alignment of ∼200 sequences) and FFT-NS-2. An all-in-one MAFFT package for Mac OS X also exists, but it is recommended only when redistributing MAFFT within another program package.

PMID: 26887850

Abstract

Background

Accurate multiple sequence alignment is central to bioinformatics and molecular evolutionary analyses. Although sophisticated sequence alignment programs are available, manual adjustments are often required to improve alignment quality. Unfortunately, few programs offer a simple and intuitive way to edit sequence alignments.

Results

We present Seqotron, a sequence editor that reads and writes files in a wide variety of sequence formats. Sequences can be easily aligned and manually edited using the mouse and keyboard. The program also allows the user to estimate both phylogenetic trees and distance matrices.

Conclusions

Seqotron will benefit researchers who need to manipulate and align complex sequence data. Seqotron is an open source project for Mac OS X and is available from GitHub at https://github.com/4ment/seqotron/.

Keywords: Sequence editor, Alignment, Phylogenetics

Background

State-of-the-art methods of multiple sequence alignment such as MUSCLE [1] and MAFFT [2] are usually used to automatically generate alignments. Unfortunately, these methods can be inaccurate when the input sequences are highly dissimilar or when sequencing errors have been incorporated. Hence, it is important to visually inspect any sequence alignment prior to subsequent analysis to detect and correct potential errors. A large number of sequence editors allow sequence alignments to be displayed, including Se-Al [3], Jalview [4], SeaView [5], Mesquite [6] and UGENE [7]. However, only a few (e.g. Se-Al) provide a simple and intuitive way to edit sequence alignments. In addition, converting files between formats is often problematic, even though different applications require a wide variety of formats.

Herein, we present a user-friendly application for visualizing, aligning, and manually editing genomic and protein sequences, and for converting between a variety of file formats. Alignments can be generated automatically using the MUSCLE [1] and MAFFT [2] packages, and the quality of the alignment can be visually inspected and manually corrected using simple mouse-based and keyboard-based operations. In addition, Seqotron allows the computation of distance matrices and the inference of phylogenetic trees through the Physher program [8].

Implementation

Seqotron is written in Objective-C and uses Cocoa, the native application programming interface for the Mac OS X operating system.

Results and discussion

Seqotron is designed for visualizing, aligning, and editing nucleotide and amino acid sequences (Fig. 1). Unaligned sequences and multiple sequence alignments can be imported and exported in a wide range of formats including FASTA, NEXUS, NEWICK, PHYLIP, MEGA, Clustal, NBRF, Stockholm, and GDE. The sequence viewer can display sequences using different preset color schemes, such as the standard ClustalX coloring scheme. In addition, Seqotron allows the user to create personalized coloring schemes using a color editor. Sequences can be aligned or realigned using MUSCLE [1] and MAFFT [2]. Protein-coding DNA sequences can also be aligned through their amino acid translation, which is used during the alignment process before reverting to the DNA sequences [9]. A single sequence or a group of sequences can be manually edited by dragging regions of the alignment with the mouse, in a similar way to Se-Al. In addition, selected regions can be removed in an intuitive way using the keyboard. A nucleotide sequence alignment can be temporarily translated according to any available genetic code, and the user can simply revert to the original nucleotide sequences. Manual editing of translated sequences is also available. Another function, useful for the phylogenetic analysis of segmented genomes (such as those found in some viruses, including influenza), is the ability to concatenate sequences with identical names. This option is provided when several files are open at the same time.
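The translation-guided alignment of protein-coding DNA described above can be illustrated with a minimal Python sketch. It assumes the amino acid sequence has already been aligned (gaps written as `-`) and threads the original codons back onto it. The function name `back_translate` is hypothetical, and this is not Seqotron's or transAlign's actual code.

```python
def back_translate(aligned_protein: str, dna: str) -> str:
    """Thread an unaligned coding sequence onto its aligned amino acid sequence.

    Each residue in the protein alignment is replaced by its original codon,
    and each gap character by a three-nucleotide gap.
    """
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if n_residues != len(codons):
        raise ValueError("number of residues and number of codons differ")

    next_codon = iter(codons)
    return "".join("---" if aa == "-" else next(next_codon)
                   for aa in aligned_protein)


# The protein alignment places a gap after the first residue,
# so the DNA alignment receives a three-nucleotide gap there.
print(back_translate("M-KV", "ATGAAAGTT"))  # ATG---AAAGTT
```

Because gaps are only ever inserted in multiples of three, the reading frame of the nucleotide alignment is preserved.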

Fig. 1 Visualisation of a nucleotide alignment in Seqotron. This screenshot displays a region of an alignment

Given an accurate alignment of homologous sequences, it is natural to investigate the evolutionary history of the underlying organisms using phylogenetic methods. Seqotron allows the inference of phylogenetic trees using Physher [8] from both amino acid and nucleotide sequences using distance-based (neighbor-joining and UPGMA) and maximum likelihood methods. Statistical support for each branch can be assessed through non-parametric bootstrapping and jackknifing. These resampling methods can be parallelized across multiple cores for higher efficiency. Physher's binaries are packaged with the Seqotron application, so no third-party programs or libraries need to be installed. Seqotron provides a tree viewer (Fig. 2) to display newly generated trees or trees stored on file in NEXUS or NEWICK formats. The tree viewer provides additional functionalities such as taxa coloring, search by taxon name, re-rooting, node rotation, printing, and exporting to NEWICK-based text and PDF files. Another common task is to extract a subset of sequences for further investigation based on their evolutionary relationships. To this end, Seqotron allows the selection of sequences through the tree viewer. In the case of segmented genomes, a single tree can be used to select the same sequences in different alignments.
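To make the resampling step concrete, here is a minimal Python sketch of how bootstrap and jackknife pseudo-replicates can be drawn from the columns of an alignment. It is an illustration under stated assumptions, not Physher's implementation: the function names are hypothetical, and the 50% retention used for the jackknife is an assumed default. Each pseudo-replicate would then be passed to a tree-building method, and branch support counted across replicates.

```python
import random


def bootstrap(alignment, rng):
    """Draw as many columns as the alignment has sites, with replacement."""
    n_sites = len(alignment[0])
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


def jackknife(alignment, rng, keep_fraction=0.5):
    """Keep a random subset of columns, without replacement, in original order.

    keep_fraction is an assumed parameter for this sketch.
    """
    n_sites = len(alignment[0])
    n_keep = max(1, round(n_sites * keep_fraction))
    picks = sorted(rng.sample(range(n_sites), n_keep))
    return ["".join(seq[i] for i in picks) for seq in alignment]


alignment = ["ACGTACGT", "ACGAACGT", "TCGAACGA"]
rng = random.Random(42)  # seeded so that replicates are reproducible

replicates = [bootstrap(alignment, rng) for _ in range(100)]
print(len(replicates), len(replicates[0][0]))  # 100 replicates of 8 sites each
print(len(jackknife(alignment, rng)[0]))       # 4 sites retained
```

Since every replicate is independent of the others, the loop over replicates is the natural place to distribute work across cores.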

Fig. 2 Visualisation of a phylogenetic tree in Seqotron. This screenshot displays a neighbour-joining phylogenetic tree inferred from the data set in Fig. 1. Bootstrap values computed from 100 replicates are shown next to each branch. The tree was built using Physher, a program included in Seqotron

Finally, Seqotron natively supports the Quick Look technology, which enables the Finder to display a quick preview of an alignment file along with other useful information such as the number of sequences and the alignment length.

A comparison of the features available in Seqotron and other editors is provided in Table 1. Seqotron uses the native language of Mac OS X and therefore tends to be more memory efficient than editors written in other programming languages. Indeed, a common problem with programs written in Java is that they are prone to consuming a large amount of memory. In some cases, when the amount of memory required to run the program exceeds a certain threshold, the user has to adjust the maximum heap size by trial and error and restart the application. We compared the memory consumption of Seqotron with that of other programs using an alignment in a FASTA file containing 2813 sequences and 2277 sites on an iMac running Mac OS X 10.11 with a 3.2 GHz Intel Core i5 processor and 16 gigabytes of memory. The physical memory reported by the program top is given. Se-Al was not included since it does not run on Intel-based Apple computers. Seqotron is slightly more memory efficient than SeaView, requiring 54 megabytes (MB) compared with 85 MB. Mesquite and Jalview showed the largest memory footprints, requiring 2.98 gigabytes (333 MB when the data set is loaded from a NEXUS file) and 446 MB of memory, respectively. We also profiled the memory consumption and speed of each program using the Instruments tool during the inference of a neighbor-joining tree. The same alignment was used to infer the tree, and the total runtime includes the calculation of an uncorrected pairwise distance matrix. Seqotron estimated the phylogenetic tree in 37 s, with memory peaking at 115 MB. SeaView was considerably slower, taking 5 min 56 s, and its memory peaked higher, at 769 MB, during the inference of the tree. After the alignment was read as a NEXUS file, Mesquite calculated the tree in 23 min and its memory peak was 439 MB. Jalview used 635 MB and required more than 6 h to complete the analysis.
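The uncorrected pairwise distance matrix mentioned above is simply the proportion of differing sites between each pair of aligned sequences. The Python sketch below shows one way to compute it. Skipping sites where either sequence has a gap is an assumption of this sketch, as are the function names; the benchmarked programs may treat gaps and ambiguous characters differently.

```python
from itertools import combinations


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of differing sites among sites where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # assumed gap handling: ignore the site for this pair
        compared += 1
        if x != y:
            differences += 1
    return differences / compared if compared else float("nan")


def distance_matrix(sequences: dict) -> dict:
    """Symmetric matrix of uncorrected distances, keyed by sequence name."""
    names = list(sequences)
    matrix = {a: {b: 0.0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        matrix[a][b] = matrix[b][a] = p_distance(sequences[a], sequences[b])
    return matrix


alignment = {"s1": "ACGTACGT", "s2": "ACGTACGA", "s3": "AC-TTCGA"}
matrix = distance_matrix(alignment)
print(matrix["s1"]["s2"])  # 0.125 (1 difference over 8 compared sites)
```

The number of pairs grows quadratically with the number of sequences, which is why this step contributes noticeably to the runtimes reported for a 2813-sequence alignment.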

Table 1

Editors compared: Seqotron, Jalview, SeaView, Se-Al, Mesquite
Concatenate sequences: Yes, Yes, Yes, Yes
Mouse-based alignment: Yes, Yes, Yes
Transalign (a): Yes, Yes
Temporary translation: Yes, Yes
Alignment zooming: Yes, Yes
Distance matrix: Yes, Yes
Loading tree formats: NEXUS, NEWICK; NEWICK; NEXUS
Estimating trees: NJ (b), UPGMA (c), ML (d); NJ, UPGMA; NJ, MP (e), ML; NJ, MP
Tree resampling: Bootstrap, jackknife; Bootstrap

a: Alignment of protein-coding DNA sequences using their amino acid translation

b: Neighbor-joining (NJ)

c: Unweighted pair-group method using arithmetic averages (UPGMA)

d: Maximum likelihood (ML)

e: Maximum parsimony (MP)

Conclusions

We have presented an open source, memory-efficient, and user-friendly desktop application to automatically or manually align and edit multiple nucleotide and amino acid sequences. Seqotron also provides the option to estimate phylogenetic trees and distance matrices. We aim to add more functionality in the future, such as a plugin mechanism and algorithms for searching for sequence motifs.

Availability and requirements

Project name: Seqotron.

Project home page: https://github.com/4ment/seqotron/.

Operating system: Macintosh OS X (Intel) version 10.8 and higher.

Programming language: Objective-C/Cocoa.

License: GNU GPL version 3.

Any restrictions to use by non-academics: None.

Authors’ contributions

MF designed and implemented the software. All authors contributed to the writing of this manuscript. All authors read and approved the final manuscript.

Acknowledgements

MF is currently supported by a postdoctoral research fellowship from the University of Sydney. ECH is supported by an NHMRC Australia Fellowship.

Competing interests

Both authors declare that they have no competing interests.

Abbreviations

MUSCLE: multiple sequence comparison by log-expectation
MAFFT: multiple alignment using fast Fourier transform
PHYLIP: phylogeny inference package
MEGA: molecular evolutionary genetics analysis
NBRF: National Biomedical Research Foundation
GDE: genetic data environment
UPGMA: unweighted pair group method with arithmetic mean

Contributor Information

Mathieu Fourment, Email: mathieu.fourment@uts.edu.au.

Edward C. Holmes, Email: edward.holmes@sydney.edu.au.

References

1. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucl Acids Res. 2004;32(5):1792–1797. doi: 10.1093/nar/gkh340.
2. Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30(4):772–780. doi: 10.1093/molbev/mst010.
3. Rambaut A. Se-Al: sequence alignment editor. http://evolve.zoo.ox.ac.uk/software/Se-Al.
4. Waterhouse AM, Procter JB, Martin DM, Clamp M, Barton GJ. Jalview Version 2 - a multiple sequence alignment editor and analysis workbench. Bioinformatics. 2009;25(9):1189–1191. doi: 10.1093/bioinformatics/btp033.
5. Gouy M, Guindon S, Gascuel O. SeaView version 4: a multiplatform graphical user interface for sequence alignment and phylogenetic tree building. Mol Biol Evol. 2010;27(2):221–224. doi: 10.1093/molbev/msp259.
6. Maddison WP, Maddison DR. Mesquite: a modular system for evolutionary analysis. Version 3.04. http://mesquiteproject.org.
7. Okonechnikov K, Golosova O, Fursov M, UGENE team. Unipro UGENE: a unified bioinformatics toolkit. Bioinformatics. 2012;28(8):1166–1167. doi: 10.1093/bioinformatics/bts091.
8. Fourment M, Holmes EC. Novel non-parametric models to estimate evolutionary rates and divergence times from heterochronous sequence data. BMC Evolut Biol. 2014;14:163. doi: 10.1186/s12862-014-0163-6.

9. Bininda-Emonds OR. transAlign: using amino acids to facilitate the multiple alignment of protein-coding DNA sequences. BMC Bioinform. 2005;6:156. doi: 10.1186/1471-2105-6-156.
Articles from BMC Research Notes are provided here courtesy of BioMed Central

This list of sequence alignment software is a compilation of software tools and web portals used in pairwise sequence alignment and multiple sequence alignment. See structural alignment software for structural alignment of proteins.

Database search only

Name | Description | Sequence type* | Link | Authors | Year
BLAST | Local search with fast k-tuple heuristic (Basic Local Alignment Search Tool) | Both | NCBI, EMBL-EBI, DDBJ, DDBJ (psi-blast), GenomeNet, PIR (protein only) | Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ[1] | 1990
CS-BLAST | Sequence-context specific BLAST, more sensitive than BLAST, FASTA, and SSEARCH. Position-specific iterative version CSI-BLAST more sensitive than PSI-BLAST | Protein | CS-BLAST server, download[permanent dead link] | Angermueller C, Biegert A, Soeding J[2] | 2013
CUDASW++ | GPU accelerated Smith Waterman algorithm for multiple shared-host GPUs | Protein |  | Liu Y, Maskell DL and Schmidt B | 2009/2010
DIAMOND | BLASTX and BLASTP aligner based on double indexing | Protein |  | Buchfink B, Xie, C and Huson DH[3] | 2015
FASTA | Local search with fast k-tuple heuristic, faster but less sensitive than BLAST | Both | EMBL-EBI, DDBJ, GenomeNet, PIR (protein only) |  | 
GGSEARCH, GLSEARCH | Global:Global (GG), Global:Local (GL) alignment with statistics | Protein | FASTA server |  | 
Genoogle | Genoogle uses indexing and parallel processing techniques for searching DNA and Proteins sequences. It is developed in Java and open source. | Both |  | Albrecht F | 2015
HMMER | Local and global search with profile Hidden Markov models, more sensitive than PSI-BLAST | Both | download | Durbin R, Eddy SR, Krogh A, Mitchison G[4] | 1998
HH-suite | Pairwise comparison of profile Hidden Markov models; very sensitive | Protein |  | Söding J[5][6] | 2005/2012
IDF | Inverse Document Frequency | Both | download |  | 
Infernal | Profile SCFG search | RNA | download | Eddy S | 
KLAST | High-performance general purpose sequence similarity search tool | Both |  |  | 2009/2014
LAMBDA | High performance local aligner compatible to BLAST, but much faster; supports SAM/BAM | Protein |  | Hannes Hauswedell, Jochen Singer, Knut Reinert[7] | 2014
MMseqs2 | Software suite to search and cluster huge sequence sets. Similar sensitivity to BLAST and PSI-BLAST but orders of magnitude faster | Protein | homepage | Steinegger M, Mirdita M, Galiez C, Söding J[8] | 2017
USEARCH | Ultra-fast sequence analysis tool | Both | homepage | Edgar, RC (2010) Search and clustering orders of magnitude faster than BLAST, Bioinformatics 26(19), 2460-2461. doi: 10.1093/bioinformatics/btq461 publication | 2010
OSWALD | OpenCL Smith-Waterman on Altera's FPGA for Large Protein Databases | Protein | homepage | Rucci E, García C, Botella G, De Giusti A, Naiouf M, Prieto-Matías M[9] | 2016
parasail | Fast Smith-Waterman search using SIMD parallelization | Both | homepage | Daily J | 2015
PSI-BLAST | Position-specific iterative BLAST, local search with position-specific scoring matrices, much more sensitive than BLAST | Protein | NCBI PSI-BLAST | Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ[10] | 1997
PSI-Search | Combining the Smith-Waterman search algorithm with the PSI-BLAST profile construction strategy to find distantly related protein sequences, and preventing homologous over-extension errors. | Protein | EMBL-EBI PSI-Search | Li W, McWilliam H, Goujon M, Cowley A, Lopez R, Pearson WR[11] | 2012
ScalaBLAST | Highly parallel Scalable BLAST | Both | ScalaBLAST | Oehmen et al.[12] | 2011
Sequilab | Linking and profiling sequence alignment data from NCBI-BLAST results with major sequence analysis servers/services | Nucleotide, peptide | server |  | 2010
SAM | Local and global search with profile Hidden Markov models, more sensitive than PSI-BLAST | Both | SAM | Karplus K, Krogh A[13] | 1999
SSEARCH | Smith-Waterman search, slower but more sensitive than FASTA | Both |  |  | 
SWAPHI | First parallelized algorithm employing the emerging Intel Xeon Phis to accelerate Smith-Waterman protein database search | Protein | homepage | Liu Y and Schmidt B | 2014
SWAPHI-LS | First parallel Smith-Waterman algorithm exploiting Intel Xeon Phi clusters to accelerate the alignment of long DNA sequences | DNA | homepage | Liu Y, Tran TT, Lauenroth F, Schmidt B | 2014
SWIMM | Smith-Waterman implementation for Intel Multicore and Manycore architectures | Protein | homepage | Rucci E, García C, Botella G, De Giusti A, Naiouf M and Prieto-Matías M[14] | 2015
SWIPE | Fast Smith-Waterman search using SIMD parallelization | Both | homepage | Rognes T | 2011

*Sequence type: protein or nucleotide

Pairwise alignment

Name | Description | Sequence type* | Alignment type** | Link | Author | Year
ACANA | Fast heuristic anchor based pairwise alignment | Both | Both | download | Huang, Umbach, Li | 2005
AlignMe | Alignments for membrane protein sequences | Protein | Both | download, server | M. Stamm, K. Khafizov, R. Staritzbichler, L.R. Forrest | 2013
ALLALIGN | For DNA, RNA and proteins with summed length n, generates all local alignments in O(n) time using approximate suffix tree matching or mapped density dynamic alignment | Both | Local | allalign | E. Wachtel | 2017
Bioconductor Biostrings::pairwiseAlignment | Dynamic programming | Both | Both + Ends-free | site | P. Aboyoun | 2008
BioPerl dpAlign | Dynamic programming | Both | Both + Ends-free | site | Y. M. Chan | 2003
BLASTZ, LASTZ | Seeded pattern-matching | Nucleotide | Local | download, download | Schwartz et al.[15][16] | 2004, 2009
CUDAlign | DNA sequence alignment of unrestricted size in single or multiple GPUs | Nucleotide | Local, SemiGlobal, Global | download | E. Sandes[17][18][19] | 2011-2015
DNADot | Web-based dot-plot tool | Nucleotide | Global | server | R. Bowen | 1998
DNASTAR Lasergene Molecular Biology Suite | Software to align DNA, RNA, protein, or DNA + protein sequences via pairwise and multiple sequence alignment algorithms including MUSCLE, Mauve, MAFFT, Clustal Omega, Jotun Hein, Wilbur-Lipman, Martinez Needleman-Wunsch, Lipman-Pearson and Dotplot analysis. | Both | Both | DNASTAR site | DNASTAR | 1993-2016
DOTLET | Java-based dot-plot tool | Both | Global | applet | M. Pagni and T. Junier | 1998
FEAST | Posterior based local extension with descriptive evolution model | Nucleotide | Local | site | A. K. Hudek and D. G. Brown | 2010
Genome Compiler | Align chromatogram files (.ab1, .scf) against a template sequence, locate errors, and correct them instantly. | Nucleotide | Local | Free online & download | Genome Compiler Corporation | 2014
G-PAS | GPU-based dynamic programming with backtracking | Both | Local, SemiGlobal, Global | site+download | W. Frohmberg, M. Kierzynka et al. | 2011
GapMis | Does pairwise sequence alignment with one gap | Both | SemiGlobal | site | K. Frousios, T. Flouri, C. S. Iliopoulos, K. Park, S. P. Pissis, G. Tischler | 2012
GGSEARCH, GLSEARCH | Global:Global (GG), Global:Local (GL) alignment with statistics | Protein | Global in query | FASTA server | W. Pearson | 2007
JAligner | Java open-source implementation of Smith-Waterman | Both | Local | JWS | A. Moustafa | 2005
K*Sync | Protein sequence to structure alignment that includes secondary structure, structural conservation, structure-derived sequence profiles, and consensus alignment scores | Protein | Both | Robetta server | D. Chivian & D. Baker[20] | 2003
LALIGN | Multiple, non-overlapping, local similarity (same algorithm as SIM) | Both | Local non-overlapping |  | W. Pearson | 1991 (algorithm)
NW-align | Standard Needleman-Wunsch dynamic programming algorithm | Protein | Global | server and download | Y Zhang | 2012
mAlign | modelling alignment; models the information content of the sequences | Nucleotide | Both | doc, code[permanent dead link] | D. Powell, L. Allison and T. I. Dix | 2004
matcher | Waterman-Eggert local alignment (based on LALIGN) | Both | Local | Pasteur | I. Longden (modified from W. Pearson) | 1999
MCALIGN2 | explicit models of indel evolution | DNA | Global | server | J. Wang et al. | 2006
MUMmer | suffix tree based | Nucleotide | Global | download | S. Kurtz et al. | 2004
needle | Needleman-Wunsch dynamic programming | Both | SemiGlobal |  | A. Bleasby | 1999
Ngila | logarithmic and affine gap costs and explicit models of indel evolution | Both | Global | download | R. Cartwright | 2007
NW | Needleman-Wunsch dynamic programming | Both | Global | download | A.C.R. Martin | 1990-2015
parasail | C/C++/Python/Java SIMD dynamic programming library for SSE, AVX2 | Both | Global, Ends-free, Local | site | J. Daily | 2015
Path | Smith-Waterman on protein back-translation graph (detects frameshifts at protein level) | Protein | Local |  | M. Gîrdea et al.[21] | 2009
PatternHunter | Seeded pattern-matching | Nucleotide | Local | download | B. Ma et al.[22][23] | 2002–2004
ProbA (also propA) | Stochastic partition function sampling via dynamic programming | Both | Global | download | U. Mückstein | 2002
PyMOL | 'align' command aligns sequence & applies it to structure | Protein | Global (by selection) | site | W. L. DeLano | 2007
REPuter | suffix tree based | Nucleotide | Local | download | S. Kurtz et al. | 2001
SABERTOOTH | Alignment using predicted Connectivity Profiles | Protein | Global | download on request[permanent dead link] | F. Teichert, J. Minning, U. Bastolla, and M. Porto | 2009
Satsuma | Parallel whole-genome synteny alignments | DNA | Local | download | M.G. Grabherr et al. | 2010
SEQALN | Various dynamic programming | Both | Local or global | server | M.S. Waterman and P. Hardy | 1996
SIM, GAP, NAP, LAP | Local similarity with varying gap treatments | Both | Local or global | server | X. Huang and W. Miller | 1990-6
SIM | Local similarity | Both | Local | servers | X. Huang and W. Miller | 1991
SPA: Super pairwise alignment | Fast pairwise global alignment | Nucleotide | Global | available upon request | Shen, Yang, Yao, Hwang | 2002
SSEARCH | Local (Smith-Waterman) alignment with statistics | Protein | Local |  | W. Pearson | 1981 (Algorithm)
Sequences Studio | Java applet demonstrating various algorithms from[24] | Generic sequence | Local and global |  | A.Meskauskas | 1997 (reference book)
SWIFT suit | Fast Local Alignment Searching | DNA | Local | site | K. Rasmussen,[25] W. Gerlach | 2005, 2008
stretcher | Memory-optimized Needleman-Wunsch dynamic programming | Both | Global | Pasteur | I. Longden (modified from G. Myers and W. Miller) | 1999
tranalign | Aligns nucleic acid sequences given a protein alignment | Nucleotide | NA | Pasteur | G. Williams (modified from B. Pearson) | 2002
UGENE | Opensource Smith-Waterman for SSE/CUDA, Suffix array based repeats finder & dotplot | Both | Both | UGENE site | UniPro | 2010
water | Smith-Waterman dynamic programming | Both | Local |  | A. Bleasby | 1999
wordmatch | k-tuple pairwise match | Both | NA | Pasteur | I. Longden | 1998
YASS | Seeded pattern-matching | Nucleotide | Local |  | L. Noe and G. Kucherov[26] | 2004

*Sequence type: protein or nucleotide **Alignment type: local or global
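Many of the tools listed above are built on two dynamic programming recurrences: Needleman-Wunsch for global alignment and Smith-Waterman for local alignment. The Python sketch below computes only the optimal score for each, using a linear gap penalty and illustrative scoring values (match +1, mismatch -1, gap -1). These parameters and function names are assumptions for the example; the listed programs use their own substitution matrices, gap models, and optimizations.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal global alignment score; both sequences are aligned end to end."""
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-1):
    """Optimal local alignment score; cells are floored at zero."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


print(needleman_wunsch_score("ACGT", "AGT"))  # 2: three matches and one gap
print(smith_waterman_score("AAAA", "TTTT"))   # 0: no similar region at all
```

The only differences between the two are the boundary conditions and the floor at zero, which lets a local alignment start and end anywhere. Both functions keep just two rows of the matrix, which is enough for the score but not for recovering the alignment itself.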

Multiple sequence alignment

Name | Description | Sequence type* | Alignment type** | Link | Author | Year | License
ABA | A-Bruijn alignment | Protein | Global | download | B.Raphael et al. | 2004 | Proprietary, freeware for education, research, nonprofit
ALE | manual alignment; some software assistance | Nucleotides | Local | download | J. Blandy and K. Fogel | 1994 (latest version 2007) | Free, GPL2
ALLALIGN | For DNA, RNA and proteins with summed length n, generates all local alignments in O(n) time using approximate suffix tree matching or mapped density dynamic alignment | Both | Local | allalign | E. Wachtel | 2017 | Free
AMAP | Sequence annealing | Both | Global | server | A. Schwartz and L. Pachter | 2006 | 
anon. | fast, optimal alignment of three sequences using linear gap costs | Nucleotides | Global | paper, software[permanent dead link] | D. Powell, L. Allison and T. I. Dix | 2000 | 
BAli-Phy | Tree+multi-alignment; probabilistic-Bayesian; joint estimation | Both + Codons | Global | WWW+download | BD Redelings and MA Suchard | 2005 (latest version 2018) | Free, GPL
Base-By-Base | Java-based multiple sequence alignment editor with integrated analysis tools | Both | Local or global | download | R. Brodie et al. | 2004 | Proprietary, freeware, must register
CHAOS, DIALIGN | Iterative alignment | Both | Local (preferred) | server | M. Brudno and B. Morgenstern | 2003 | 
ClustalW | Progressive alignment | Both | Local or global |  | Thompson et al. | 1994 | Free, LGPL
CodonCode Aligner | Multi-alignment; ClustalW & Phrap support | Nucleotides | Local or global | download | P. Richterich et al. | 2003 (latest version 2009) | 
Compass | COmparison of Multiple Protein sequence Alignments with assessment of Statistical Significance | Protein | Global | download and server | R.I. Sadreyev, et al. | 2009 | 
DECIPHER | Progressive-iterative alignment | Both | Global | download | Erik S. Wright | 2014 | Free, GPL
DIALIGN-TX and DIALIGN-T | Segment-based method | Both | Local (preferred) or Global | download and server | A.R.Subramanian | 2005 (latest version 2008) | 
DNA Alignment | Segment-based method for intraspecific alignments | Both | Local (preferred) or Global | server | A.Roehl | 2005 (latest version 2008) | 
DNA Baser Sequence Assembler | Multi-alignment; Full automatic sequence alignment; Automatic ambiguity correction; Internal base caller; Command line seq alignment | Nucleotides | Local or global | www.DnaBaser.com | Heracle BioSoft SRL | 2006 (latest version 2018) | Commercial (some modules are freeware)
DNADynamo | linked DNA to Protein multiple alignment with MUSCLE, Clustal and Smith-Waterman | Both | Local or global | download | DNADynamo | 2004 (newest version 2017) | 
DNASTAR Lasergene Molecular Biology Suite | Software to align DNA, RNA, protein, or DNA + protein sequences via pairwise and multiple sequence alignment algorithms including MUSCLE, Mauve, MAFFT, Clustal Omega, Jotun Hein, Wilbur-Lipman, Martinez Needleman-Wunsch, Lipman-Pearson and Dotplot analysis. | Both | Local or global | DNASTAR site | DNASTAR | 1993-2016 | 
EDNA | Energy Based Multiple Sequence Alignment for DNA Binding Sites | Nucleotides | Local or global | sourceforge.net/projects/msa-edna/ | Salama, RA. et al. | 2013 | 
FAMSA | Progressive alignment for extremely large protein families (hundreds of thousands of members) | Protein | Global | download | Deorowicz et al. | 2016 | 
FSA | Sequence annealing | Both | Global | download and server | R. K. Bradley et al. | 2008 | 
Geneious | Progressive-Iterative alignment; ClustalW plugin | Both | Local or global | download | A.J. Drummond et al. | 2005 (latest version 2017) | 
Kalign | Progressive alignment | Both | Global |  | T. Lassmann | 2005 | 
MAFFT | Progressive-iterative alignment | Both | Local or global |  | K. Katoh et al. | 2005 | Free, BSD
MARNA | Multi-alignment of RNAs | RNA | Local |  | S. Siebert et al. | 2005 | 
MAVID | Progressive alignment | Both | Global | server | N. Bray and L. Pachter | 2004 | 
MSA | Dynamic programming | Both | Local or global | download | D.J. Lipman et al. | 1989 (modified 1995) | 
MSAProbs | Dynamic programming | Protein | Global | download | Y. Liu, B. Schmidt, D. Maskell | 2010 | 
MULTALIN | Dynamic programming-clustering | Both | Local or global |  | F. Corpet | 1988 | 
Multi-LAGAN | Progressive dynamic programming alignment | Both | Global | server | M. Brudno et al. | 2003 | 
MUSCLE | Progressive-iterative alignment | Both | Local or global | server | R. Edgar | 2004 | 
Opal | Progressive-iterative alignment | Both | Local or global | download | T. Wheeler and J. Kececioglu | 2007 (latest stable 2013, latest beta 2016) | 
Pecan | Probabilistic-consistency | DNA | Global | download | B. Paten et al. | 2008 | 
Phylo | A human computing framework for comparative genomics to solve multiple alignment | Nucleotides | Local or global | site | McGill Bioinformatics | 2010 | 
PMFastR | Progressive structure aware alignment | RNA | Global | site | D. DeBlasio, J Braund, S Zhang | 2009 | 
Praline | Progressive-iterative-consistency-homology-extended alignment with preprofiling and secondary structure prediction | Protein | Global | server | J. Heringa | 1999 (latest version 2009) | 
PicXAA | Nonprogressive, maximum expected accuracy alignment | Both | Global | download and server | S.M.E. Sahraeian and B.J. Yoon | 2010 | 
POA | Partial order/hidden Markov model | Protein | Local or global | download | C. Lee | 2002 | 
Probalign | Probabilistic/consistency with partition function probabilities | Protein | Global | server | Roshan and Livesay | 2006 | Free, public domain
ProbCons | Probabilistic/consistency | Protein | Local or global | server | C. Do et al. | 2005 | Free, public domain
PROMALS3D | Progressive alignment/hidden Markov model/Secondary structure/3D structure | Protein | Global | server | J. Pei et al. | 2008 | 
PRRN/PRRP | Iterative alignment (especially refinement) | Protein | Local or global |  | Y. Totoki (based on O. Gotoh) | 1991 and later | 
PSAlign | Alignment preserving non-heuristic | Both | Local or global | download | S.H. Sze, Y. Lu, Q. Yang. | 2006 | 
RevTrans | Combines DNA and Protein alignment, by back translating the protein alignment to DNA. | DNA/Protein (special) | Local or global | server | Wernersson and Pedersen | 2003 (newest version 2005) | 
SAGA | Sequence alignment by genetic algorithm | Protein | Local or global | download | C. Notredame et al. | 1996 (new version 1998) | 
SAM | Hidden Markov model | Protein | Local or global | server | A. Krogh et al. | 1994 (most recent version 2002) | 
Se-Al | Manual alignment | Both | Local | download | A. Rambaut | 2002 | 
StatAlign | Bayesian co-estimation of alignment and phylogeny (MCMC) | Both | Global | download | A. Novak et al. | 2008 | 
Stemloc | Multiple alignment and secondary structure prediction | RNA | Local or global | download | I. Holmes | 2005 | Free, GPL 3 (part of DART)
T-Coffee | More sensitive progressive alignment | Both | Local or global |  | C. Notredame et al. | 2000 (newest version 2008) | Free, GPL 2
UGENE | Supports multiple alignment with MUSCLE, KAlign, Clustal and MAFFT plugins | Both | Local or global | download | UGENE team | 2010 (newest version 2012) | Free, GPL 2
VectorFriends | VectorFriends Aligner, MUSCLE plugin, and ClustalW plugin | Both | Local or global | download | BioFriends team | 2013 | Proprietary, freeware for academic use
GLProbs | Adaptive pair-Hidden Markov Model based approach | Protein | Global | download | Y. Ye et al. | 2013 | 

*Sequence type: protein or nucleotide. **Alignment type: local or global

Genomics analysis

Name | Description | Sequence type* | Link
ACT (Artemis Comparison Tool) | Synteny and comparative genomics | Nucleotide | server
AVID | Pairwise global alignment with whole genomes | Nucleotide | server
BLAT | Alignment of cDNA sequences to a genome. | Nucleotide | [27]
DECIPHER | Alignment of rearranged genomes using 6 frame translation | Nucleotide | download
FLAK | Fuzzy whole genome alignment and analysis | Nucleotide | server
GMAP | Alignment of cDNA sequences to a genome. Identifies splice site junctions with high accuracy. | Nucleotide | http://research-pub.gene.com/gmap
Splign | Alignment of cDNA sequences to a genome. Identifies splice site junctions with high accuracy. Able to recognize and separate gene duplications. | Nucleotide | https://www.ncbi.nlm.nih.gov/sutils/splign
Mauve | Multiple alignment of rearranged genomes | Nucleotide | download
MGA | Multiple Genome Aligner | Nucleotide | download
Mulan | Local multiple alignments of genome-length sequences | Nucleotide | server
Multiz | Multiple alignment of genomes | Nucleotide | download
PLAST-ncRNA | Search for ncRNAs in genomes by partition function local alignment | Nucleotide | server
Sequerome | Profiling sequence alignment data with major servers/services | Nucleotide, peptide | server
Sequilab | Profiling sequence alignment data from NCBI-BLAST results with major servers-services | Nucleotide, peptide | server
Shuffle-LAGAN | Pairwise glocal alignment of completed genome regions | Nucleotide | server
SIBsim4, Sim4 | A program designed to align an expressed DNA sequence with a genomic sequence, allowing for introns | Nucleotide | download
SLAM | Gene finding, alignment, annotation (human-mouse homology identification) | Nucleotide | server

*Sequence type: protein or nucleotide


Motif finding

Name | Description | Sequence type* | Link
PMS | Motif search and discovery | Both | 
FMM | Motif search and discovery (can get also positive & negative sequences as input for enriched motif search) | Nucleotide | server
BLOCKS | Ungapped motif identification from BLOCKS database | Both | server
eMOTIF | Extraction and identification of shorter motifs | Both | servers
Gibbs motif sampler | Stochastic motif extraction by statistical likelihood | Both | 
HMMTOP | Prediction of transmembrane helices and topology of proteins | Protein | homepage & download
I-sites | Local structure motif library | Protein | server
JCoils | Prediction of Coiled coil and Leucine Zipper | Protein | homepage & download
MEME/MAST | Motif discovery and search | Both | server
CUDA-MEME | GPU accelerated MEME (v4.4.0) algorithm for GPU clusters | Both | homepage
MERCI | Discriminative motif discovery and search | Both | homepage & download
PHI-Blast | Motif search and alignment tool | Both | Pasteur
Phyloscan | Motif search tool | Nucleotide | server
PRATT | Pattern generation for use with ScanProsite | Protein | server
ScanProsite | Motif database search tool | Protein | server
TEIRESIAS | Motif extraction and database search | Both | server
BASALT | Multiple motif and regular expression search | Both | homepage

*Sequence type: protein or nucleotide
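Several of the tools listed above search sequences for motifs written as patterns or regular expressions. The following is a minimal sketch of that idea in Python, not the method of any listed tool. The function name `find_motif` and the example sequence are hypothetical, and the pattern is a hand-written regular-expression translation of the PROSITE N-glycosylation pattern N-{P}-[ST]-{P}.

```python
import re


def find_motif(sequence, pattern):
    """Return (start, matched_text) for every occurrence of a regex motif.

    Hypothetical helper for illustration. The pattern is wrapped in a
    lookahead so that overlapping occurrences are reported as well.
    """
    regex = re.compile(f"(?=({pattern}))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # N-{P}-[ST]-{P}: N, then any residue except P, then S or T,
    # then any residue except P.
    n_glyc = "N[^P][ST][^P]"
    protein = "MNGSANPTQNNTS"  # made-up sequence, not real data
    for start, hit in find_motif(protein, n_glyc):
        print(start, hit)
```

Positions are 0-based. The lookahead wrapper is a design choice: a plain `finditer` would skip occurrences that overlap an earlier match.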


Benchmarking

Name | Link | Authors
PFAM 30.0 (2016) | |
SMART (2015) | website | Letunic, Copley, Schmidt, Ciccarelli, Doerks, Schultz, Ponting, Bork
BAliBASE 3 (2015) | website | Thompson, Plewniak, Poch
Oxbench (2011) | download | Raghava, Searle, Audley, Barber, Barton
Benchmark collection (2009) | website | Edgar
HOMSTRAD (2005) | website | Mizuguchi
PREFAB 4.0 (2005) | website | Edgar
SABmark (2004) | download | Van Walle, Lasters, Wyns

Alignment viewers, editors

Please see List of alignment visualization software.

Short-read sequence alignment

Name | Description | Paired-end option | Use FASTQ quality | Gapped | Multi-threaded | License | Link | Reference | Year
AriocComputes Smith-Waterman gapped alignments and mapping qualities on one or more GPUs. Supports BS-seq alignments. Processes 100,000 to 500,000 reads per second (varies with data, hardware, and configured sensitivity).YesNoYesYesFree, BSDgithub[28]2015
BarraCUDAA GPGPU accelerated Burrows-Wheeler transform (FM-index) short read alignment program based on BWA, supports alignment of indels with gap openings and extensions.YesNoYesYes, POSIX Threads and CUDAFree, GPLlink
BBMapUses short k-mers to rapidly index the genome; no size or scaffold count limit. Higher sensitivity and specificity than Burrows-Wheeler aligners, with similar or greater speed. Performs affine-transform-optimized global alignment, which is slower but more accurate than Smith-Waterman. Handles Illumina, 454, PacBio, Sanger, and Ion Torrent data. Splice-aware; capable of processing long indels and RNA-seq. Pure Java; runs on any platform. Used by the Joint Genome Institute.YesYesYesYesFree, BSDlink2010
BFASTExplicit time and accuracy tradeoff with a prior accuracy estimation, supported by indexing the reference sequences. Optimally compresses indexes. Can handle billions of short reads. Can handle insertions, deletions, SNPs, and color errors (can map ABI SOLiD color space reads). Performs a full Smith-Waterman alignment.Yes, POSIX ThreadsFree, GPLlink[29]2009
BigBWARuns the Burrows-Wheeler Aligner (BWA) on a Hadoop cluster. Supports the BWA-MEM, BWA-ALN, and BWA-SW algorithms with paired and single reads. Running on a Hadoop cluster substantially reduces computation time and adds scalability and fault tolerance.YesLow quality bases trimmingYesYesFree, GPL 3link[30]2015
BLASTNBLAST's nucleotide alignment program, slow and not accurate for short reads, and uses a sequence database (EST, Sanger sequence) rather than a reference genome.link
BLATMade by Jim Kent. Can handle one mismatch in initial alignment step.Yes, client-serverProprietary, freeware for academic and noncommercial uselink[31]2002
BowtieUses a Burrows-Wheeler transform to create a permanent, reusable index of the genome; 1.3 GB memory footprint for human genome. Aligns more than 25 million Illumina reads in 1 CPU hour. Supports Maq-like and SOAP-like alignment policiesYesYesNoYes, POSIX ThreadsFree, Artisticlink[32]2009
BWAUses a Burrows-Wheeler transform to create an index of the genome. It's a bit slower than Bowtie but allows indels in alignment.YesLow quality bases trimmingYesYesFree, GPLlink[33]2009
BWA-PSSMA probabilistic short read aligner based on the use of position specific scoring matrices (PSSM). The aligner is adaptable in the sense that it can take into account the quality scores of the reads and models of data specific biases, such as those observed in Ancient DNA, PAR-CLIP data or genomes with biased nucleotide compositions.[34]YesYesYesYesFree, GPLlink[34]2014
CASHXQuantify and manage large quantities of short-read sequence data. CASHX pipeline contains a set of tools that can be used together, or separately as modules. This algorithm is very accurate for perfect hits to a reference genome.NoProprietary, freeware for academic and noncommercial uselink
CloudburstShort-read mapping using Hadoop MapReduceYes, HadoopMapReduceFree, Artisticlink
CUDA-ECShort-read alignment error correction using GPUs.Yes, GPU enabledlink
CUSHAWA CUDA compatible short read aligner to large genomes based on Burrows-Wheeler transformYesYesNoYes (GPU enabled)Free, GPLlink[35]2012
CUSHAW2Gapped short-read and long-read alignment based on maximal exact match seeds. This aligner supports both base-space (e.g. from Illumina, 454, Ion Torrent and PacBio sequencers) and ABI SOLiD color-space read alignments.YesNoYesYesFree, GPLlink2014
CUSHAW2-GPUGPU-accelerated CUSHAW2 short-read aligner.YesNoYesYesFree, GPLlink
CUSHAW3Sensitive and accurate base-space and color-space short-read alignment with hybrid seedingYesNoYesYesFree, GPLlink[36]2012
drFASTRead mapping alignment software that implements cache obliviousness to minimize main/cache memory transfers, like mrFAST and mrsFAST, but is designed for the SOLiD sequencing platform (color space reads). It also returns all possible map locations for improved structural variation discovery.YesYes, for structural variationYesNoFree, BSDlink
ELANDImplemented by Illumina. Includes ungapped alignment with a finite read length.
ERNEExtended Randomized Numerical alignEr for accurate alignment of NGS reads. It can map bisulfite-treated reads.YesLow quality bases trimmingYesMultithreading and MPI-enabledFree, GPL 3link
GASSSTFinds global alignments of short DNA sequences against large DNA banksMultithreadingCeCILL version 2 License.link[37]2011
GEMHigh-quality alignment engine (exhaustive mapping with substitutions and indels). More accurate and several times faster than BWA or Bowtie 1/2. Many standalone biological applications (mapper, split mapper, mappability, and other) provided.YesYesYesYesDual, freeware for noncommercial use; GEM source is currently unavailablelink[38]2012
Genalice MAPUltra fast and comprehensive NGS read aligner with high precision and small storage footprint.YesLow quality bases trimmingYesYesProprietary, commerciallink
Geneious AssemblerFast, accurate overlap assembler with the ability to handle any combination of sequencing technology, read length, any pairing orientations, with any spacer size for the pairing, with or without a reference genome.YesProprietary, commerciallink
GensearchNGSComplete framework with a user-friendly GUI for analysing NGS data. It integrates a proprietary high-quality alignment algorithm and a plug-in mechanism for integrating various public aligners, so that users can import short reads, align them, detect variants, and generate reports. It is made for resequencing projects, namely in a diagnostic setting.YesNoYesYesProprietary, commerciallink
GMAP and GSNAPRobust, fast short-read alignment. GMAP: longer reads, with multiple indels and splices (see entry above under Genomics analysis); GSNAP: shorter reads, with one indel or up to two splices per read. Useful for digital gene expression, SNP and indel genotyping. Developed by Thomas Wu at Genentech. Used by the National Center for Genome Resources (NCGR) in Alpheus.YesYesYesYesProprietary, freeware for academic and noncommercial uselink
GNUMAPAccurately performs gapped alignment of sequence data obtained from next-generation sequencing machines (specifically of Solexa-Illumina) back to a genome of any size. Includes adaptor trimming, SNP calling and Bisulfite sequence analysis.Yes, also supports Illumina *_int.txt and *_prb.txt files with all 4 quality scores for each baseMultithreading and MPI-enabledlink[39]2009
HIVE-hexagonUses a hash table and bloom matrix to create and filter potential positions on the genome. For higher efficiency uses cross-similarity between short reads and avoids realigning non unique redundant sequences. It is faster than Bowtie and BWA and allows indels and divergent sensitive alignments on viruses, bacteria, and more conservative eukaryotic alignments.YesYesYesYesProprietary, freeware for academic and noncommercial users registered to HIVE deployment instancelink[40]2014
IMOSImproved Meta-aligner and Minimap2 On Spark. A long read distributed aligner on Apache Spark platform with linear scalability w.r.t. single node execution.YesYesYesFreegithub
IsaacFully uses all the computing power available on one server node; thus, it scales well over a broad range of hardware architectures, and alignment performance improves with hardware abilitiesYesYesYesYesFree, GPLgithub
LASTUses adaptive seeds and copes more efficiently with repeat-rich sequences (e.g. genomes). For example, it can align reads to genomes without repeat-masking and without becoming overwhelmed by repetitive hits.YesYesYesNoFree, GPLlink[41]2011
MAQUngapped alignment that takes into account quality scores for each base.Free, GPLlink
mrFAST, mrsFASTGapped (mrFAST) and ungapped (mrsFAST) alignment software that implements cache obliviousness to minimize main/cache memory transfers. They are designed for the Illumina sequencing platform and they can return all possible map locations for improved structural variation discovery.YesYes, for structural variationYesNoFree, BSD
MOMMOM or maximum oligonucleotide mapping is a query matching tool that captures a maximal length match within the short read.Yeslink
MOSAIKFast gapped aligner and reference-guided assembler. Aligns reads using a banded Smith-Waterman algorithm seeded by results from a k-mer hashing scheme. Supports reads ranging in size from very short to very long.Yeslink
MPscanFast aligner based on a filtration strategy (no indexing, use q-grams and Backward Nondeterministic DAWG Matching)link[42]2009
Novoalign & NovoalignCSGapped alignment of single end and paired end Illumina GA I & II, ABI Colour space & ION Torrent reads. High sensitivity and specificity, using base qualities at all steps in the alignment. Includes adapter trimming, base quality calibration, Bi-Seq alignment, and options for reporting multiple alignments per read. Use of ambiguous IUPAC codes in reference for common SNPs can improve SNP recall and remove allelic bias.YesYesYesMulti-threading and MPI versions available with paid licenseProprietary, freeware single threaded version for academic and noncommercial useNovocraft
NextGENeDeveloped for use by biologists performing analysis of next generation sequencing data from Roche Genome Sequencer FLX, Illumina GA/HiSeq, Life Technologies Applied BioSystems’ SOLiD System, PacBio and Ion Torrent platforms.YesYesYesYesProprietary, commercialSoftgenetics
NextGenMapFlexible and fast read mapping program (twice as fast as BWA), achieves a mapping sensitivity comparable to Stampy. Internally uses a memory efficient index structure (hash table) to store positions of all 13-mers present in the reference genome. Mapping regions where pairwise alignments are required are dynamically determined for each read. Uses fast SIMD instructions (SSE) to accelerate alignment calculations on CPU. If available, alignments are computed on GPU (using OpenCL/CUDA) further reducing runtime 20-50%.YesNoYesYes, POSIX Threads, OpenCL/CUDA, SSEFreeOfficial GitHub Page[43]2013
Omixon Variant ToolkitIncludes highly sensitive and highly accurate tools for detecting SNPs and indels. It offers a solution to map NGS short reads with a moderate distance (up to 30% sequence divergence) from reference genomes. It poses no restrictions on the size of the reference, which, combined with its high sensitivity, makes the Variant Toolkit well-suited for targeted sequencing projects and diagnostics.YesYesYesYesProprietary, commercialwww.omixon.com
PALMapperEfficiently computes both spliced and unspliced alignments at high accuracy. Relying on a machine learning strategy combined with a fast mapping based on a banded Smith-Waterman-like algorithm, it aligns around 7 million reads per hour on one CPU. It refines the originally proposed QPALMA approach.YesFree, GPLlink
Partek FlowFor use by biologists and bioinformaticians. It supports ungapped, gapped and splice-junction alignment of single and paired-end reads from Illumina, Life Technologies SOLiD, Roche 454 and Ion Torrent raw data (with or without quality information). It integrates quality control at the FASTQ/Qual level and on aligned data. Additional functionality includes trimming and filtering of raw reads, SNP and InDel detection, mRNA and microRNA quantification and fusion gene detection.YesYesYesMultiprocessor-core, client-server installation possibleProprietary, commercial, free trial version[1]
PASSIndexes the genome, then extends seeds using pre-computed alignments of words. Works with base space, color space (SOLID), and can align genomic and spliced RNA-seq reads.YesYesYesYesProprietary, freeware for academic and noncommercial usePASS_HOME
PerMIndexes the genome with periodic seeds to quickly find alignments with full sensitivity up to four mismatches. It can map Illumina and SOLiD reads. Unlike most mapping programs, speed increases for longer read lengths.YesFree, GPLlink[44]
PRIMEXIndexes the genome with a k-mer lookup table with full sensitivity up to an adjustable number of mismatches. It is best for mapping 15-60 bp sequences to a genome.NoNoYesNo, multiple processes per searchlink[2]2003
QPalmaCan use quality scores, intron lengths, and computational splice site predictions to perform an unbiased alignment. Can be trained to the specifics of an RNA-seq experiment and genome. Useful for splice site/intron discovery and for gene model building. (See PALMapper for a faster version).Yes, client-serverFree, GPL 2link
RazerSNo read length limit. Hamming or edit distance mapping with configurable error rates. Configurable and predictable sensitivity (runtime/sensitivity tradeoff). Supports paired-end read mapping.Free, LGPLlink
REAL, cREALREAL is an efficient, accurate, and sensitive tool for aligning short reads obtained from next-generation sequencing. The programme can handle an enormous amount of single-end reads generated by the next-generation Illumina/Solexa Genome Analyzer. cREAL is a simple extension of REAL for aligning short reads obtained from next-generation sequencing to a genome with circular structure.YesYesFree, GPLlink
RMAPCan map reads with or without error probability information (quality scores) and supports paired-end reads or bisulfite-treated read mapping. There are no limitations on read length or number of mismatches.YesYesYesFree, GPL 3link
rNAA randomized Numerical Aligner for Accurate alignment of NGS readsYesLow quality bases trimmingYesMultithreading and MPI-enabledFree, GPL 3link
RTG InvestigatorExtremely fast, tolerant to high indel and substitution counts. Includes full read alignment. Product includes comprehensive pipelines for variant detection and metagenomic analysis with any combination of Illumina, Complete Genomics and Roche 454 data.YesYes, for variant callingYesYesProprietary, freeware for individual investigator uselink
SegemehlCan handle insertions, deletions, mismatches; uses enhanced suffix arraysYesNoYesYesProprietary, freeware for noncommercial uselink[45]2009
SeqMapUp to 5 mixed substitutions and insertions-deletions; various tuning options and input-output formatsProprietary, freeware for academic and noncommercial uselink
ShrecShort read error correction with a suffix tree data structureYes, Javalink
SHRiMPIndexes the reference genome as of version 2. Uses masks to generate possible keys. Can map ABI SOLiD color space reads.YesYesYesYes, OpenMPFree, BSD derivativelink[46][47]2009-2011
SLIDERSlider is an application for the Illumina Sequence Analyzer output that uses the 'probability' files instead of the sequence files as an input for alignment to a reference sequence or a set of reference sequences.YesYesNoNolink[48][49]2009-2010
SOAP, SOAP2, SOAP3, SOAP3-dpSOAP: robust with a small (1-3) number of gaps and mismatches. Speed improvement over BLAT, uses a 12 letter hash table. SOAP2: using bidirectional BWT to build the index of reference, and it is much faster than the first version. SOAP3: GPU-accelerated version that could find all 4-mismatch alignments in tens of seconds per one million reads. SOAP3-dp, also GPU accelerated, supports arbitrary number of mismatches and gaps according to affine gap penalty scores.YesNoYes, SOAP3-dpYes, POSIX Threads; SOAP3, SOAP3-dp need GPU with CUDA supportFree, GPLlink[50][51]
SOCSFor ABI SOLiD technologies. Significant increase in time to map reads with mismatches (or color errors). Uses an iterative version of the Rabin-Karp string search algorithm.YesFree, GPLlink
SparkBWAIntegrates the Burrows-Wheeler Aligner—BWA on an Apache Spark framework running atop Hadoop. Version 0.2 of October 2016, supports the algorithms BWA-MEM, BWA-backtrack, and BWA-ALN. All of them work with single-reads and paired-end reads.YesLow quality bases trimmingYesYesFree, GPL 3link[52]2016
SSAHA, SSAHA2Fast for a small number of variantsProprietary, freeware for academic and noncommercial uselink
StampyFor Illumina reads. High specificity, and sensitive for reads with indels, structural variants, or many SNPs. Slow, but speed increased dramatically by using BWA for first alignment pass.YesYesYesNoProprietary, freeware for academic and noncommercial uselink[53]2010
SToRMFor Illumina or ABI SOLiD reads, with SAM native output. Highly sensitive for reads with many errors, indels (full from 0 to 15, extended support otherwise). Uses spaced seeds (single hit) and a very fast SSE-SSE2-AVX2-AVX-512 banded alignment filter. For fixed-length reads only, authors recommend SHRiMP2 otherwise.NoYesYesYes, OpenMPFreelink[54]2010
Subread, SubjuncSuperfast and accurate read aligners. Subread can be used to map both gDNA-seq and RNA-seq reads. Subjunc detects exon-exon junctions and maps RNA-seq reads. They employ a novel mapping paradigm named seed-and-vote.YesYesYesYesFree, GPL 3
TaipanDe-novo assembler for Illumina readsProprietary, freeware for academic and noncommercial uselink
UGENEVisual interface both for Bowtie and BWA, and an embedded alignerYesYesYesYesFree, GPLlink
VelociMapperFPGA-accelerated reference sequence alignment mapping tool from TimeLogic. Faster than Burrows-Wheeler transform-based algorithms like BWA and Bowtie. Supports up to 7 mismatches and/or indels with no performance penalty. Produces sensitive Smith-Waterman gapped alignments.YesYesYesYesProprietary, commercialTimeLogic
XpressAlignFPGA based sliding window short read aligner which exploits the embarrassingly parallel property of short read alignment. Performance scales linearly with number of transistors on a chip (i.e. performance guaranteed to double with each iteration of Moore's Law without modification to algorithm). Low power consumption is useful for datacentre equipment. Predictable runtime. Better price/performance than software sliding window aligners on current hardware, but not better than software BWT-based aligners currently. Can manage large numbers (>2) of mismatches. Will find all hit positions for all seeds. Single-FPGA experimental version, needs work to develop it into a multi-FPGA production version.Proprietary, freeware for academic and noncommercial uselink
ZOOM100% sensitivity for reads of 15-240 bp with practical mismatches. Very fast. Supports insertions and deletions. Works with Illumina & SOLiD instruments, not 454.Yes (GUI), no (CLI)Proprietary, commerciallink[55]
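
Several aligners in the table above (for example Bowtie, BWA, CUSHAW and SOAP2) are described as indexing the genome with a Burrows-Wheeler transform. The following is a toy sketch of that idea, assuming a naive suffix-array construction and exact matching only; it is not the implementation used by any of these tools. The function names `build_fm_index` and `find_exact` are hypothetical.

```python
from collections import Counter


def build_fm_index(reference):
    """Build a toy FM-index (suffix array, C table, Occ table).

    Illustrative only: the suffix array is built by naive sorting,
    which is far too slow and memory-hungry for a real genome.
    """
    text = reference + "$"  # '$' sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to '$').
    bwt = "".join(text[i - 1] for i in sa)
    # c_table[ch]: number of characters in text strictly smaller than ch.
    counts = Counter(text)
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i]: number of occurrences of ch in bwt[:i].
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for c in occ:
            occ[c][i + 1] = occ[c][i] + (1 if c == ch else 0)
    return sa, c_table, occ


def find_exact(read, index):
    """Return sorted 0-based positions where read occurs exactly."""
    sa, c_table, occ = index
    lo, hi = 0, len(sa)  # half-open range of matching suffixes
    for ch in reversed(read):  # backward search, last base first
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))


if __name__ == "__main__":
    index = build_fm_index("ACGTACGTTACG")
    print(find_exact("ACG", index))  # [0, 4, 9]
```

The index is built once and reused for every read, and each lookup takes one step per base of the read regardless of genome length. Handling mismatches and indels, as the listed aligners do, requires additional search logic that this sketch omits.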

References

  1. ^Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (October 1990). 'Basic local alignment search tool'. Journal of Molecular Biology. 215 (3): 403–10. doi:10.1016/S0022-2836(05)80360-2. PMID2231712.
  2. ^Angermüller, C.; Biegert, A.; Söding, J. (Dec 2012). 'Discriminative modelling of context-specific amino acid substitution probabilities'. Bioinformatics. 28 (24): 3240–7. doi:10.1093/bioinformatics/bts622. PMID23080114.
  3. ^Buchfink, Xie and Huson (2015). 'Fast and sensitive protein alignment using DIAMOND'. Nature Methods. 12 (1): 59–60. doi:10.1038/nmeth.3176. PMID25402007.
  4. ^Durbin, Richard; Eddy, Sean R.; Krogh, Anders; Mitchison, Graeme, eds. (1998). Biological sequence analysis: probabilistic models of proteins and nucleic acids. Cambridge, UK: Cambridge University Press. ISBN978-0-521-62971-3.[page needed]
  5. ^Söding J (April 2005). 'Protein homology detection by HMM-HMM comparison'. Bioinformatics. 21 (7): 951–60. doi:10.1093/bioinformatics/bti125. PMID15531603.
  6. ^Remmert, Michael; Biegert, Andreas; Hauser, Andreas; Söding, Johannes (2011-12-25). 'HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment'. Nature Methods. 9 (2): 173–175. doi:10.1038/nmeth.1818. hdl:11858/00-001M-0000-0015-8D56-A. ISSN1548-7105. PMID22198341.
  7. ^Hauswedell H, Singer J, Reinert K (2014-09-01). 'Lambda: the local aligner for massive biological data'. Bioinformatics. 30 (17): 349–355. doi:10.1093/bioinformatics/btu439. PMC4147892. PMID25161219.
  8. ^Steinegger, Martin; Soeding, Johannes (2017-10-16). 'MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets'. Nature Biotechnology. 35 (11): 1026–1028. doi:10.1038/nbt.3988. hdl:11858/00-001M-0000-002E-1967-3. PMID29035372.
  9. ^Rucci, Enzo; Garcia, Carlos; Botella, Guillermo; Giusti, Armando E. De; Naiouf, Marcelo; Prieto-Matias, Manuel (2016-06-30). 'OSWALD: OpenCL Smith–Waterman on Altera's FPGA for Large Protein Databases'. International Journal of High Performance Computing Applications. 32 (3): 337–350. doi:10.1177/1094342016654215. ISSN1094-3420.
  10. ^Altschul SF, Madden TL, Schäffer AA, et al. (September 1997). 'Gapped BLAST and PSI-BLAST: a new generation of protein database search programs'. Nucleic Acids Research. 25 (17): 3389–402. doi:10.1093/nar/25.17.3389. PMC146917. PMID9254694.
  11. ^Li W, McWilliam H, Goujon M, et al. (June 2012). 'PSI-Search: iterative HOE-reduced profile SSEARCH searching'. Bioinformatics. 28 (12): 1650–1651. doi:10.1093/bioinformatics/bts240. PMC3371869. PMID22539666.
  12. ^Oehmen, C.; Nieplocha, J. (August 2006). 'ScalaBLAST: A scalable implementation of BLAST for high-performance ...'.
  13. ^Hughey, R.; Karplus, K.; Krogh, A. (2003). SAM: sequence alignment and modeling software system. Technical report UCSC-CRL-99-11 (Report). University of California, Santa Cruz, CA.
  14. ^Rucci, Enzo; García, Carlos; Botella, Guillermo; De Giusti, Armando; Naiouf, Marcelo; Prieto-Matías, Manuel (2015-12-25). 'An energy-aware performance analysis of SWIMM: Smith–Waterman implementation on Intel's Multicore and Manycore architectures'. Concurrency and Computation: Practice and Experience. 27 (18): 5517–5537. doi:10.1002/cpe.3598. ISSN1532-0634.
  15. ^Schwartz S, Kent WJ, Smit A, Zhang Z, Baertsch R, Hardison RC, Haussler D, Miller W (2003). 'Human-mouse alignments with BLASTZ'. Genome Research. 13 (1): 103–107. doi:10.1101/gr.809403. PMC430961. PMID12529312.
  16. ^Harris R S (2007). Improved pairwise alignment of genomic DNA (Thesis).
  17. ^Sandes, Edans F. de O.; de Melo, Alba Cristina M.A. (May 2013). 'Retrieving Smith-Waterman Alignments with Optimizations for Megabase Biological Sequences Using GPU'. IEEE Transactions on Parallel and Distributed Systems. 24 (5): 1009–1021. doi:10.1109/TPDS.2012.194.
  18. ^Sandes, Edans F. de O.; Miranda, G.; De Melo, A.C.M.A.; Martorell, X.; Ayguade, E. (May 2014). CUDAlign 3.0: Parallel Biological Sequence Comparison in Large GPU Clusters. Cluster, Cloud and Grid Computing (CCGrid), 2014 14th IEEE/ACM International Symposium on. p. 160. doi:10.1109/CCGrid.2014.18.
  19. ^Sandes, Edans F. de O.; Miranda, G.; De Melo, A.C.M.A.; Martorell, X.; Ayguade, E. (August 2014). Fine-grain Parallel Megabase Sequence Comparison with Multiple Heterogeneous GPUs. Proceedings of the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. pp. 383–384. doi:10.1145/2555243.2555280.
  20. ^Chivian, D; Baker, D (2006). 'Homology modeling using parametric alignment ensemble generation with consensus and energy-based model selection'. Nucleic Acids Research. 34 (17): e112. doi:10.1093/nar/gkl480. PMC1635247. PMID16971460.
  21. ^Girdea, M; Noe, L; Kucherov, G (January 2010). 'Back-translation for discovering distant protein homologies in the presence of frameshift mutations'. Algorithms for Molecular Biology. 5 (6): 6. doi:10.1186/1748-7188-5-6. PMC2821327. PMID20047662.
  22. ^Ma, B.; Tromp, J.; Li, M. (2002). 'PatternHunter: faster and more sensitive homology search'. Bioinformatics. 18 (3): 440–445. doi:10.1093/bioinformatics/18.3.440. PMID11934743.
  23. ^Li, M.; Ma, B.; Kisman, D.; Tromp, J. (2004). 'Patternhunter II: highly sensitive and fast homology search'. Journal of Bioinformatics and Computational Biology. 2 (3): 417–439. CiteSeerX10.1.1.1.2393. doi:10.1142/S0219720004000661. PMID15359419.
  24. ^Gusfield, Dan (1997). Algorithms on strings, trees and sequences. Cambridge university press. ISBN978-0-521-58519-4.
  25. ^Rasmussen K, Stoye J, Myers EW (2006). 'Efficient q-Gram Filters for Finding All epsilon-Matches over a Given Length'. Journal of Computational Biology. 13 (2): 296–308. CiteSeerX10.1.1.465.2084. doi:10.1089/cmb.2006.13.296. PMID16597241.
  26. ^Noe L, Kucherov G (2005). 'YASS: enhancing the sensitivity of DNA similarity search'. Nucleic Acids Research. 33 (suppl_2): W540–W543. doi:10.1093/nar/gki478. PMC1160238. PMID15980530.
  27. ^'Index of /admin/exe'.
  28. ^Wilton, Richard; Budavari, Tamas; Langmead, Ben; Wheelan, Sarah J.; Salzberg, Steven L.; Szalay, Alexander S. (2015). 'Arioc: high-throughput read alignment with GPU-accelerated exploration of the seed-and-extend search space'. PeerJ. 3: e808. doi:10.7717/peerj.808. PMC4358639. PMID25780763.
  29. ^Homer, Nils; Merriman, Barry; Nelson, Stanley F. (2009). 'BFAST: An Alignment Tool for Large Scale Genome Resequencing'. PLOS ONE. 4 (11): e7767. doi:10.1371/journal.pone.0007767. PMC2770639. PMID19907642.
  30. ^Abuín, J.M.; Pichel, J.C.; Pena, T.F.; Amigo, J. (2015). 'BigBWA: approaching the Burrows–Wheeler aligner to Big Data technologies'. Bioinformatics. 31 (24): 4003–5. doi:10.1093/bioinformatics/btv506. PMID26323715.
  31. ^Kent, W. J. (2002). 'BLAT---The BLAST-Like Alignment Tool'. Genome Research. 12 (4): 656–664. doi:10.1101/gr.229202. ISSN1088-9051. PMC187518. PMID11932250.
  32. ^Langmead, Ben; Trapnell, Cole; Pop, Mihai; Salzberg, Steven L (2009). 'Ultrafast and memory-efficient alignment of short DNA sequences to the human genome'. Genome Biology. 10 (3): R25. doi:10.1186/gb-2009-10-3-r25. ISSN1465-6906. PMC2690996. PMID19261174.
  33. ^Li, H.; Durbin, R. (2009). 'Fast and accurate short read alignment with Burrows-Wheeler transform'. Bioinformatics. 25 (14): 1754–1760. doi:10.1093/bioinformatics/btp324. ISSN1367-4803. PMC2705234. PMID19451168.
  34. ^Kerpedjiev, Peter; Frellsen, Jes; Lindgreen, Stinus; Krogh, Anders (2014). 'Adaptable probabilistic mapping of short reads using position specific scoring matrices'. BMC Bioinformatics. 15 (1): 100. doi:10.1186/1471-2105-15-100. ISSN1471-2105. PMC4021105. PMID24717095.
  35. ^Liu, Y.; Schmidt, B.; Maskell, D. L. (2012). 'CUSHAW: a CUDA compatible short read aligner to large genomes based on the Burrows-Wheeler transform'. Bioinformatics. 28 (14): 1830–1837. doi:10.1093/bioinformatics/bts276. ISSN1367-4803. PMID22576173.
  36. ^Liu, Y.; Schmidt, B. (2012). 'Long read alignment based on maximal exact match seeds'. Bioinformatics. 28 (18): i318–i324. doi:10.1093/bioinformatics/bts414. ISSN1367-4803. PMC3436841. PMID22962447.
  37. ^Rizk, Guillaume; Lavenier, Dominique (2010). 'GASSST: global alignment short sequence search tool'. Bioinformatics. 26 (20): 2534–2540. doi:10.1093/bioinformatics/btq485. PMC2951093. PMID20739310.
  38. ^Marco-Sola, Santiago; Sammeth, Michael; Guigó, Roderic; Ribeca, Paolo (2012). 'The GEM mapper: fast, accurate and versatile alignment by filtration'. Nature Methods. 9 (12): 1185–1188. doi:10.1038/nmeth.2221. ISSN1548-7091. PMID23103880.
  39. ^Clement, N. L.; Snell, Q.; Clement, M. J.; Hollenhorst, P. C.; Purwar, J.; Graves, B. J.; Cairns, B. R.; Johnson, W. E. (2009). 'The GNUMAP algorithm: unbiased probabilistic mapping of oligonucleotides from next-generation sequencing'. Bioinformatics. 26 (1): 38–45. doi:10.1093/bioinformatics/btp614. ISSN1367-4803. PMC6276904. PMID19861355.
  40. ^Santana-Quintero, Luis; Dingerdissen, Hayley; Thierry-Mieg, Jean; Mazumder, Raja; Simonyan, Vahan (2014). 'HIVE-Hexagon: High-Performance, Parallelized Sequence Alignment for Next-Generation Sequencing Data Analysis'. PLOS ONE. 9 (6): e99033. doi:10.1371/journal.pone.0099033. PMC4053384. PMID24918764.
  41. ^Kielbasa, S.M.; Wan, R.; Sato, K.; Horton, P.; Frith, M.C. (2011). 'Adaptive seeds tame genomic sequence comparison'. Genome Research. 21 (3): 487–493. doi:10.1101/gr.113985.110. PMC3044862. PMID21209072.
  42. ^Rivals, Eric; Salmela, Leena; Kiiskinen, Petteri; Kalsi, Petri; Tarhio, Jorma (2009). mpscan: Fast Localisation of Multiple Reads in Genomes. Algorithms in Bioinformatics. Lecture Notes in Computer Science. 5724. pp. 246–260. CiteSeerX10.1.1.156.928. doi:10.1007/978-3-642-04241-6_21. ISBN978-3-642-04240-9.
  43. ^Sedlazeck, Fritz J.; Rescheneder, Philipp; von Haeseler, Arndt (2013). 'NextGenMap: fast and accurate read mapping in highly polymorphic genomes'. Bioinformatics. 29 (21): 2790–2791. doi:10.1093/bioinformatics/btt468. PMID23975764.
  44. ^Chen, Yangho; Souaiaia, Tade; Chen, Ting (2009). 'PerM: efficient mapping of short sequencing reads with periodic full sensitive spaced seeds'. Bioinformatics. 25 (19): 2514–2521. doi:10.1093/bioinformatics/btp486. PMC2752623. PMID19675096.
  45. ^Searls, David B.; Hoffmann, Steve; Otto, Christian; Kurtz, Stefan; Sharma, Cynthia M.; Khaitovich, Philipp; Vogel, Jörg; Stadler, Peter F.; Hackermüller, Jörg (2009). 'Fast Mapping of Short Sequences with Mismatches, Insertions and Deletions Using Index Structures'. PLoS Computational Biology. 5 (9): e1000502. doi:10.1371/journal.pcbi.1000502. ISSN1553-7358. PMC2730575. PMID19750212.
  46. ^Rumble, Stephen M.; Lacroute, Phil; Dalca, Adrian V.; Fiume, Marc; Sidow, Arend; Brudno, Michael (2009). 'SHRiMP: Accurate Mapping of Short Color-space Reads'. PLOS Computational Biology. 5 (5): e1000386. doi:10.1371/journal.pcbi.1000386. PMC2678294. PMID19461883.
  47. ^David, Matei; Dzamba, Misko; Lister, Dan; Ilie, Lucian; Brudno, Michael (2011). 'SHRiMP2: Sensitive yet Practical Short Read Mapping'. Bioinformatics. 27 (7): 1011–1012. doi:10.1093/bioinformatics/btr046. PMID21278192.
  48. ^Malhis, Nawar; Butterfield, Yaron S. N.; Ester, Martin; Jones, Steven J. M. (2009). 'Slider – Maximum use of probability information for alignment of short sequence reads and SNP detection'. Bioinformatics. 25 (1): 6–13. doi:10.1093/bioinformatics/btn565. PMC2638935. PMID18974170.
  49. ^Malhis, Nawar; Jones, Steven J. M. (2010). 'High Quality SNP Calling Using Illumina Data at Shallow Coverage'. Bioinformatics. 26 (8): 1029–1035. doi:10.1093/bioinformatics/btq092. PMID20190250.
  50. ^Li, R.; Li, Y.; Kristiansen, K.; Wang, J. (2008). 'SOAP: short oligonucleotide alignment program'. Bioinformatics. 24 (5): 713–714. doi:10.1093/bioinformatics/btn025. ISSN1367-4803. PMID18227114.
  51. ^Li, R.; Yu, C.; Li, Y.; Lam, T.-W.; Yiu, S.-M.; Kristiansen, K.; Wang, J. (2009). 'SOAP2: an improved ultrafast tool for short read alignment'. Bioinformatics. 25 (15): 1966–1967. doi:10.1093/bioinformatics/btp336. ISSN1367-4803. PMID19497933.
  52. ^Abuín, José M.; Pichel, Juan C.; Pena, Tomás F.; Amigo, Jorge (2016-05-16). 'SparkBWA: Speeding Up the Alignment of High-Throughput DNA Sequencing Data'. PLOS ONE. 11 (5): e0155461. doi:10.1371/journal.pone.0155461. ISSN1932-6203. PMC4868289. PMID27182962.
  53. ^Lunter, G.; Goodson, M. (2010). 'Stampy: A statistical algorithm for sensitive and fast mapping of Illumina sequence reads'. Genome Research. 21 (6): 936–939. doi:10.1101/gr.111120.110. ISSN1088-9051. PMC3106326. PMID20980556.
  54. ^Noe, L.; Girdea, M.; Kucherov, G. (2010). 'Designing efficient spaced seeds for SOLiD read mapping'. Advances in Bioinformatics. 2010: 708501. doi:10.1155/2010/708501. PMC2945724. PMID20936175.
  55. ^Lin, H.; Zhang, Z.; Zhang, M.Q.; Ma, B.; Li, M. (2008). 'ZOOM! Zillions of oligos mapped'. Bioinformatics. 24 (21): 2431–2437. doi:10.1093/bioinformatics/btn416. PMC2732274. PMID18684737.

Retrieved from 'https://en.wikipedia.org/w/index.php?title=List_of_sequence_alignment_software&oldid=912704530'