Bioinformatic Analysis of Mosquito COX1 Gene Based on the Distance Values of Species Across India and the Globe
Divya Damodaran*, Sudarsanam D
Department of Advanced Zoology and Biotechnology, Loyola College, University of Madras, Chennai, India
*Corresponding author: Divya Damodaran, Department of Advanced Zoology and Biotechnology, Loyola College, University of Madras, Chennai, India. Email: divya.shaheed@gmail.com
Received Date: 27 September, 2017; Accepted Date: 23 October, 2017; Published Date: 30 October, 2017
1. Abstract
Molecular phylogenetics provides insights into relationships among organisms through species trees, and into the evolution and history of genes through gene trees. It applies a blend of molecular and statistical methods to infer evolutionary relationships among organisms or genes. The essential goal of molecular phylogenetic studies is to recover the order of evolutionary events and represent them in evolutionary trees that graphically illustrate relationships among species or genes over time. Distance analysis compares two aligned sequences at a time and builds a matrix of all possible sequence pairs. During each comparison, the number of changes (base substitutions and insertion/deletion events) is counted and presented as a proportion of the overall sequence length. The final estimates of the difference between all possible pairs of sequences are known as pairwise distances.
1. Introduction
Currently, the most commonly used barcode region for animals is a 5′‐segment of the mitochondrial gene Cytochrome Oxidase I (COI) called the ‘Universal’ or ‘Folmer’ region. This region is the standard marker chosen by the Barcode of Life Database (BOLD), which is an online platform for collating and curating DNA barcoding information from around the world (Ratnasingham and Hebert 2007). Genetic techniques are considered to be relatively free from the subjectivity of identifying morphological features and can reveal the presence of cryptic species complexes that are often overlooked (e.g., Hemmerter et al. 2007). As such, barcoding as a method for identifying mosquitoes is vital to the accuracy of a surveillance program. In this study, we evaluated the use of the COI fragment as a barcode and compared the distance values of mosquito species present globally and in the Indian subcontinent. The study also discusses the relationships between different mosquito species and the composition of mosquito genera.
The distance matrix method is a phenetic approach preferred by many molecular biologists for DNA and protein work. It estimates the mean number of changes (per site in the sequence) between two taxa that have descended from a common ancestor. Gene sequences contain a great deal of information, which must be simplified in order to compare only two species at a time. The relevant measure is the number of differences between the two sequences, which can be interpreted as the distance between the species in terms of relatedness. The overall mean distance values of global and Indian mosquito species were also studied.
2. Keywords: Bioinformatics; DNA Barcoding; Distance Values; Phylogenetic Analysis
3. Materials and Methods
3.1. Distance Matrix
Distance-matrix methods of phylogenetic analysis explicitly rely on a measure of "Genetic distance" between the sequences being classified, and therefore they require an MSA (multiple sequence alignment) as an input. Distance is often defined as the fraction of mismatches at aligned positions, with gaps either ignored or counted as mismatches. Distance methods attempt to construct an all-to-all matrix from the sequence query set describing the distance between each sequence pair. From this is constructed a phylogenetic tree that places closely related sequences under the same interior node and whose branch lengths closely reproduce the observed distances between sequences. Distance-matrix methods may produce either rooted or unrooted trees, depending on the algorithm used to calculate them. They are frequently used as the basis for progressive and iterative types of multiple sequence alignment. The main disadvantage of distance-matrix methods is their inability to efficiently use information about local high-variation regions that appear across multiple subtrees.
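The all-to-all matrix described above can be illustrated with a minimal Python sketch. It uses the simplest definition given in the text, the fraction of mismatches at aligned positions with gapped sites ignored, and not the Maximum Composite Likelihood model applied in this study. The function names and the toy alignment are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of mismatched sites between two aligned sequences.

    Sites where either sequence has a gap are ignored, which is one of
    the two gap treatments mentioned in the text.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped sites
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def distance_matrix(alignment):
    """Symmetric all-to-all matrix (dict of dicts) of p-distances."""
    names = list(alignment)
    matrix = {name: {name: 0.0} for name in names}
    for x, y in combinations(names, 2):
        d = p_distance(alignment[x], alignment[y])
        matrix[x][y] = d
        matrix[y][x] = d
    return matrix


# Hypothetical toy alignment, not data from this study.
toy_alignment = {
    "seq_A": "ATGCATGCAT",
    "seq_B": "ATGCATGCAA",
    "seq_C": "ATGAAT-CTT",
}
matrix = distance_matrix(toy_alignment)
for name, row in matrix.items():
    print(name, {other: round(d, 3) for other, d in row.items()})
```

A tree-building algorithm such as neighbour joining would then take this matrix as its input.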
3.2. Estimating Evolutionary Distances Using Pairwise Distance
MEGA 6 software was used to calculate average pairwise distances between sequences in several ways: Overall, Within Groups, Between Groups and Net Between Groups.
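The four averages can be sketched in Python as follows. This is an illustration under stated assumptions, not MEGA's implementation: the net between-group mean is taken as the between-group mean minus the average of the two within-group means, and the sequence names and distance values are hypothetical.

```python
from itertools import combinations, product


def build_matrix(pairs):
    """Turn {(a, b): distance} into a symmetric dict-of-dicts matrix."""
    dist = {}
    for (a, b), d in pairs.items():
        dist.setdefault(a, {})[b] = d
        dist.setdefault(b, {})[a] = d
    return dist


def mean(values):
    values = list(values)
    if not values:
        raise ValueError("no pairs to average")
    return sum(values) / len(values)


def within_group(dist, members):
    """Mean distance over all pairs inside one group."""
    return mean(dist[a][b] for a, b in combinations(members, 2))


def between_groups(dist, group_1, group_2):
    """Mean distance over all pairs with one member from each group."""
    return mean(dist[a][b] for a, b in product(group_1, group_2))


def net_between_groups(dist, group_1, group_2):
    """Between-group mean corrected for the variation within each group."""
    within = (within_group(dist, group_1) + within_group(dist, group_2)) / 2
    return between_groups(dist, group_1, group_2) - within


# Hypothetical pairwise distances for two groups of two sequences each.
dist = build_matrix({
    ("a1", "a2"): 0.02, ("b1", "b2"): 0.04,
    ("a1", "b1"): 0.10, ("a1", "b2"): 0.12,
    ("a2", "b1"): 0.11, ("a2", "b2"): 0.13,
})
group_a, group_b = ["a1", "a2"], ["b1", "b2"]

overall = within_group(dist, group_a + group_b)  # all pairs, groups ignored
between = between_groups(dist, group_a, group_b)
net = net_between_groups(dist, group_a, group_b)
print(round(overall, 4), round(between, 4), round(net, 4))
```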
3.2.1. Basic Descriptive Statistics in SAS
PROC MEANS is used in a variety of analytic, business intelligence, reporting and data management situations. Its capabilities may be employed in “data cleansing” or “exploratory data analysis” tasks to determine whether the data set contains incorrect or “bad” values of analysis variables that must be transformed or removed prior to further analysis. PROC MEANS is included in the Base module of SAS System software.
/* Descriptive statistics of the global distance values for Aedes */
PROC MEANS DATA=Aedes NOPRINT SKEW KURTOSIS MEAN STD T PROBT;
  VAR Distance_global;
  OUTPUT OUT=Aedes_mean_global
    SKEW=SKEWNESS_global KURTOSIS=KURTOSIS_global
    MEAN=MEAN_global STD=STD_global
    T=NORMAL_global PROBT=PVALUE_global;
RUN;
DATA= - specifies the data set to use
NOPRINT - suppresses printed output
VAR variable - specifies which numeric variables to analyse
OUTPUT OUT=datasetname - writes the statistics to a SAS data set
SKEW= - specifies the name of the output column that holds the skewness produced by PROC MEANS
KURTOSIS= - specifies the name of the output column that holds the kurtosis produced by PROC MEANS
MEAN - arithmetic average
STD - standard deviation
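For readers without access to SAS, the four statistics can be reproduced with a short Python sketch. It assumes the bias-corrected sample formulas that SAS uses by default (VARDEF=DF), under which the standard deviation needs at least two values, skewness at least three and kurtosis at least four. This would be consistent with the blank cells for small genera in Tables 5 and 6. The function name and the example values are hypothetical, and the T and PROBT statistics are not covered.

```python
import math


def describe(values):
    """Mean, standard deviation, skewness and excess kurtosis of a sample.

    Statistics that are undefined for the given sample size (or for a
    sample with zero variance) are returned as None.
    """
    x = [float(v) for v in values]
    n = len(x)
    if n == 0:
        raise ValueError("at least one value is required")
    mean = sum(x) / n
    result = {"n": n, "mean": mean, "std": None,
              "skewness": None, "kurtosis": None}
    if n < 2:
        return result
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    result["std"] = std
    if std == 0:
        return result
    z = [(v - mean) / std for v in x]  # standardised values
    if n >= 3:
        result["skewness"] = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
    if n >= 4:
        result["kurtosis"] = (
            n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
        )
    return result


# Hypothetical distance values, not data from this study.
stats_five = describe([1.0, 2.0, 3.0, 4.0, 5.0])
stats_three = describe([2.1, 2.9, 3.4])
print(stats_five)
print(stats_three)
```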
3.3. Line Overlay Plots
A line plot is a graphical display of data along a number line with symbols connected by a line. The symbols represent data points and can also represent frequency. It is best to use a line plot when comparing fewer than 25 numbers. It is a quick, simple way to organize data.
A line plot may contain outliers. An outlier is a number that is much greater or much less than the other numbers in the data set. Outliers are usually plotted as they are, without any data transformation.
A line plot consists of a horizontal line, the x-axis, divided into equal intervals. It is important for a line plot to have a title and an x-axis label to give the reader an overview of what is being displayed. Line plots should also have legends to explain what is being measured.
3.3.1. Line Overlay Plots in SAS
The GPLOT procedure plots the values of two or more variables on a set of coordinate axes (X and Y). The coordinates of each point on the plot correspond to two variable values in an observation of the input data set. When output is sent to the LISTING destination, the IMAGEMAP= option of the procedure creates a temporary SAS data set that is used to generate an image map in an SVG file. (This option is not necessary when output is sent to the HTML destination.) The drill-down URLs in the image map must be provided by variables in the input data set.
PROC GPLOT DATA=Aedes;
  TITLE "Aedes";
  /* One SYMBOL statement per series: dots joined by a line */
  SYMBOL1 I=JOIN V=DOT WIDTH=1 HEIGHT=1 CV=GREEN CI=GREEN; /* Global */
  SYMBOL2 I=JOIN V=DOT WIDTH=1 HEIGHT=1 CV=BLUE CI=BLUE;   /* Indian */
  SYMBOL3 I=JOIN V=DOT WIDTH=1 HEIGHT=1 CV=BROWN CI=BROWN; /* Kerala */
  /* Overlay the three distance series on a single set of axes */
  PLOT (distance_global distance_indian distance_kerala)*distance
       / OVERLAY NOLEGEND CAXIS=BROWN CFRAME=LIGHTBLUE;
RUN;
QUIT;
DATA= - specifies the SAS data set from which the plots are drawn
TITLE "" - specifies the title for the plot output
SYMBOL1, SYMBOL2, SYMBOL3 statements - specify the output symbols and colours for the different line plots
PLOT statement - specifies the list of numeric variables to be plotted in the overlay plot
OVERLAY option - specifies that the line plots are drawn on the same set of axes
4. Results and Discussion
4.1. Estimates of Evolutionary Divergence between Sequences
Tables of evolutionary divergence between the mosquito species of Kerala, India and across the globe were generated using the Maximum Composite Likelihood model; each shows the number of base substitutions per site between sequences. The analysis of Kerala species involved 28 nucleotide sequences. Codon positions included were 1st+2nd+3rd+Noncoding. All positions containing gaps and missing data were eliminated. There was a total of 611 positions in the final dataset. Evolutionary analyses were conducted in MEGA6.
For Indian species, the number of base substitutions per site between sequences is shown. Analyses were conducted using the Maximum Composite Likelihood model [1]. The analysis involved 16 nucleotide sequences. Codon positions included were 1st+2nd+3rd+Noncoding. All positions containing gaps and missing data were eliminated. There was a total of 490 positions in the final dataset. For global species, analyses were conducted using the Maximum Composite Likelihood model. The analysis involved 15 nucleotide sequences. Codon positions included were 1st+2nd+3rd+Noncoding. All positions containing gaps and missing data were eliminated. There was a total of 512 positions in the final dataset. Evolutionary analyses were conducted in MEGA6 (Tables 2-6).
The distance values of global species are more often negatively skewed than positively skewed. The distance values of Indian mosquito species are largely positively skewed. The distance values of Kerala species, which are distributed only in the genera Aedes and Culex, are negatively skewed.
4.2. Scatter plots - Comparing the distance values of the three groups
Further, scatter plots were obtained for the distance values of each genus. It should be noted here that the top hits of distance values are taken for the analysis, though the genus may actually be present in all three groups (Global, Indian and Kerala). In the scatter plots below, the green line represents distance values of global species, the blue line represents distance values of Indian species and the brown line represents distance values of Kerala species.
5. Discussion
The biostatistical analysis of the mosquito species of the three geographic regions suggested that Toxorhynchites, Orthopodomyia and Mimomyia occur only in the Indian species. Limatus was not found to be related to any species in the Indian and Kerala species groups. Hodgesia species were found only in India. Borichinda, Chagasia and Nyctomyia are found only in the global species.
The greater the genetic distance between populations, the less breeding there is between them and the more isolated they are from one another. The lower the genetic distance between populations, the more breeding there is between them and the less isolated they are from one another. Values on the high end indicate some isolation between populations, and most likely mean that the populations are not currently breeding with one another. Values on the low end indicate that the populations are sharing their genetic material through high levels of breeding. The overall mean distance value in the Indian species is recorded as 4.67, the highest of the three groups, suggesting there is more isolation among the Indian species. If two species have a small distance between them (as measured by the number of differences in their character sequences), then they have a recent common ancestor; but if they are far apart, then their common ancestor is in the remote past. We can use the distance between the species as a measure of the time since the species diverged. These two quantities, the number of character differences and the time since divergence, will be approximately proportional when they are relatively small.
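Why the proportionality holds only for small distances can be illustrated with the Jukes-Cantor (JC69) correction, d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites and d the estimated number of substitutions per site. JC69 is used here purely as a simple, well-known example; it is not the Maximum Composite Likelihood model applied in this study, and the function name and the p values below are hypothetical.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor estimate of substitutions per site.

    p is the observed proportion of differing sites; the correction is
    only defined for 0 <= p < 0.75.
    """
    if not 0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


# For small p the corrected and observed values are nearly equal; as p
# grows, repeated substitutions at the same site make observed
# differences underestimate the true amount of change.
for p in (0.01, 0.05, 0.10, 0.30, 0.50):
    print(f"p = {p:.2f}  d = {jc69_distance(p):.4f}")
```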
6. Conflict of Interests: The author declares no conflict of interests.
Figure 1: Aedeomyia (Aedeomyia was found to be closer to other species among the Indian genera).
Figure 2: Aedes (Aedes species are found across the globe; however, they were prevalent in Kerala. This was evident from the distance values being more in Kerala).
Figure 3: Anopheles (The distance values of Anopheles were found in global species).
Figure 4: Armigeres (The distance values of Armigeres species were closer only in Indian species).
Figure 5: Borichinda (Borichinda species are found across the globe but not in India).
Figure 6: Chagasia (Chagasia species are found across the globe but not in India).
Figure 7: Culex (Culex species are found across the globe; however, they were more prevalent in Kerala. This was evident from the distance values being more in Kerala).
Figure 8: Ficalbia (Ficalbia species are closely related to Indian species and hence are present only in the group of distance values of Indian species).
Figure 9: Hodgesia (Hodgesia species are found only in India).
Figure 10: Limatus (Only one distance value was obtained for the genus Limatus in the group of global mosquito species. Limatus was not found to be related to any species in the Indian and Kerala species groups).
Figure 11: Lutzia (The species of Lutzia were closely related to global and Indian species, as evident from the chart).
Figure 12: Malaya (Species of Malaya are present in the global species group with distance values close to a few other global species).
Figure 13: Mansonia (Mansonia species were found among the global species).
Figure 14: Mimomyia (Mimomyia species were found only in the Indian species group).
Figure 15: Nyctomyia (Nyctomyia species occur only in the global species group).
Figure 16: Ochlerotatus (Species of Ochlerotatus were found in the global and Indian species groups with a wide range of distance values).
Figure 17: Orthopodomyia (Species of Orthopodomyia were found only in Indian species with a small range of distance values).
Figure 18: Psorophora (The genus Psorophora occurs only in the global genus group with a small range of distance values).
Figure 19: Toxorhynchites (The genus Toxorhynchites occurs only in the Indian genus group with a very small range of distance values against a few Indian species).
Figure 20: Tripteroides (The genus Tripteroides occurs only in the Indian genus group with high distance values against a few Indian species).
Figure 21: Uranotaenia (The genus Uranotaenia shows a wide range of distance values in the global and Indian species groups. Though it shows relatedness only to very few Indian species, the range of distance values is very high).
Figure 22: Wyeomyia (Wyeomyia shows close relatedness to only two species in the global group).
| Global genus | Indian genus | Kerala genus |
|---|---|---|
| Aedes | Aedes | Aedes |
| Culex | Ficalbia | Culex |
| Anopheles | Mimomyia | |
| Borichinda | Orthopodomyia | |
| Chagasia | Uranotaenia | |
| Limatus | Aedeomyia | |
| Lutzia | Anopheles | |
| Mansonia | Malaya | |
| Nyctomyia | Tripteroides | |
| Ochlerotatus | Lutzia | |
| Psorophora | Ochlerotatus | |
| Uranotaenia | Armigeres | |
| Wyeomyia | Hodgesia | |
| | Toxorhynchites | |

Distance value analysis across genus.
Table 1: Distribution of mosquito genus across the globe, in India and in Kerala.
Table 2: Estimates of Evolutionary Divergence between Sequences of Kerala mosquito sp.
Table 3: Estimates of Evolutionary Divergence between Sequences of Indian mosquito species
Table 4: Estimates of Evolutionary Divergence between Sequences of global mosquitoes
| Genus | No of species | MEAN_Global | STD_Global | MEAN_Indian | STD_Indian | MEAN_Kerala | STD_Kerala |
|---|---|---|---|---|---|---|---|
| Aedeomyia | 14 | | | 4.61015968 | 0.8209613 | | |
| Aedes | 84 | 2.14150969 | 1.6723311 | 5.91519674 | 1.70899657 | 3.34697739 | 1.02060716 |
| Anopheles | 13 | 2.19742519 | 1.56109312 | 3.56893041 | | | |
| Armigeres | 11 | | | 5.18577073 | 0.97010325 | | |
| Borichinda | 6 | 2.89695411 | 0.48829907 | | | | |
| Chagasia | 3 | 2.99673371 | 0.68455408 | | | | |
| Culex | 50 | 1.92248503 | 1.63237414 | | | 3.55076552 | 0.83660842 |
| Ficalbia | 13 | | | 4.56522958 | 1.32820502 | | |
| Hodgesia | 9 | | | 4.25205447 | 0.80472496 | | |
| Limatus | 1 | 3.30051513 | | | | | |
| Lutzia | 9 | 1.84121742 | 1.92963567 | 5.56352614 | 1.14175693 | | |
| Malaya | 4 | | | 4.29329846 | 1.47338557 | | |
| Mansonia | 8 | 2.27360141 | 1.85616911 | | | | |
| Mimomyia | 12 | | | 4.59420894 | 0.87637907 | | |
| Nyctomyia | 5 | 2.4187205 | 1.33145748 | | | | |
| Ochlerotatus | 20 | 2.16678597 | 1.45729738 | 4.55916076 | 1.22457303 | | |
| Orthopodomyia | 6 | | | 3.89686994 | 1.23080656 | | |
| Psorophora | 11 | 3.03573939 | 1.17872096 | | | | |
| Toxorhynchites | 7 | | | 3.50589741 | 0.62954431 | | |
| Tripteroides | 8 | | | 5.31990604 | 1.39029998 | | |
| Uranotaenia | 19 | 2.74427837 | 1.11870989 | 3.78157214 | 3.41777328 | | |
| Wyeomyia | 2 | 1.618676 | 2.16322465 | | | | |

The mean and standard deviation values for all three groups are tabulated above.
Table 5: Basic Descriptive Statistics of Distance Values of all Genus.
| Genus | Number of species | SKEWNESS_Global | KURTOSIS_Global | SKEWNESS_Indian | KURTOSIS_Indian | SKEWNESS_Kerala | KURTOSIS_Kerala |
|---|---|---|---|---|---|---|---|
| Aedeomyia | 14 | | | 0.5778686 | 0.877181 | | |
| Aedes | 84 | -0.328022 | -1.869409 | | | -0.5113681 | 1.4596398 |
| Anopheles | 13 | -0.637107 | -1.734709 | | | | |
| Armigeres | 11 | | | 0.0026733 | 0.162524 | | |
| Borichinda | 6 | 0.3483025 | -1.147263 | | | | |
| Chagasia | 3 | -1.729392 | | | | | |
| Culex | 50 | -0.20929 | -2.116797 | | | -0.32894 | -1.0505693 |
| Ficalbia | 13 | | | 0.2051113 | -1.45355 | | |
| Hodgesia | 9 | | | 0.9890915 | 2.132938 | | |
| Limatus | 1 | | | | | | |
| Lutzia | 9 | 8.72E-05 | -5.999483 | -0.069212 | -0.618239 | | |
| Malaya | 4 | | | 1.8710308 | 3.512792 | | |
| Mansonia | 8 | -0.231863 | -1.838699 | | | | |
| Mimomyia | 12 | | | 0.5489878 | 1.588763 | | |
| Nyctomyia | 5 | -1.521145 | 1.71543 | | | | |
| Ochlerotatus | 20 | -0.682133 | -1.50357 | 1.1189143 | 0.715222 | | |
| Orthopodomyia | 6 | | | 0.0898244 | -3.015679 | | |
| Psorophora | 11 | 0.4350519 | -1.719519 | | | | |
| Toxorhynchites | 7 | | | 0.4797761 | -1.558486 | | |
| Tripteroides | 8 | | | 0.0635613 | -1.328823 | | |
| Uranotaenia | 19 | -1.643276 | 2.064902 | -0.439821 | | | |
| Wyeomyia | 2 | | | | | | |

The distance values of global species are more often negatively skewed than positively skewed. The distance values of Indian mosquito species are largely positively skewed. The distance values of Kerala species, which are distributed only in the genera Aedes and Culex, are negatively skewed.
Table 6: The Skewness and Kurtosis for the Three Groups.
| Group | Overall mean distance |
|---|---|
| Global species | 2.371 |
| Indian species | 4.678 |
| Kerala species | 3.33 |

Overall Mean Distance Values in the Three Groups.
7. References
- Berger J. Introduction to Molecular Phylogeny Construction. BIOL 334.
- Day WHE (1986) Computational complexity of inferring phylogenies from dissimilarity matrices. Bulletin of Mathematical Biology 49: 461-467.
- Endo T, Ogishima S, Tanaka H (2003) Standardized phylogenetic tree: a reference to discover functional evolution. J Mol Evol 57: 174-181.
- Felsenstein J (2004) Inferring Phylogenies. Sinauer Associates, Sunderland, Massachusetts.
- Felsenstein J (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution 17: 368-376.
- Fitch WM, Margoliash E (1967) Construction of phylogenetic trees. Science 155: 279-284.
- Hillis DM, Moritz C, Mable BK, eds. (1996) Molecular Systematics, 2nd ed. Sinauer Associates, Sunderland, Massachusetts.
- http://study.com/academy/lesson/what-is-a-line-plot-in-math-definition-examples.html
- http://support.sas.com/documentation/cdl/en/graphref/67881/HTML/default/viewer.htm#n1ca4rvgoodca6n19cn5npeiwccb.htm
- http://www2.sas.com/proceedings/sugi29/240-29.pdf
- http://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm
- http://www.mathplanet.com/education/algebra-2/equations-and-inequalities/line-plots-and-stem-and-leaf-plots
- http://www.socialresearchmethods.net/kb/statdesc.php
- http://www.stattutorials.com/SAS/TUTORIAL-PROC-MEANS.htm
- Mount DW (2004) Bioinformatics: Sequence and Genome Analysis, 2nd ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
- Pagel M (1999) Inferring historical patterns of biological evolution. Nature 401: 877-884.
- Avise JC (2000) Phylogeography: The History and Formation of Species. Harvard University Press, Cambridge, Massachusetts.
- Li WH (1997) Molecular Evolution. Sinauer Associates.
- Whelan S, Lio P, Goldman N (2001) Molecular phylogenetics: state-of-the-art methods for looking into the past. Trends in Genetics 17: 262-272.
- Zuckerkandl E, Pauling L (1962) Molecular disease, evolution, and genetic heterogeneity. In: Horizons in Biochemistry (Kasha M, Pullman B, eds), 189-225. Academic Press.
© by the Authors & Gavin Publishers. This is an Open Access Journal Article Published Under Attribution-Share Alike CC BY-SA: Creative Commons Attribution-Share Alike 4.0 International License.