Using Perl for Bioinformatics

Bioinformatics is the use of computer technology to analyze biological information. Since the Human Genome Project, the need for bioinformatics has been greater than ever. Bioinformatics tools make analyzing genetic information significantly easier and more efficient.

In this tutorial, we will use Perl to write a simple bioinformatics program that simulates the generation of a complementary strand of DNA, which is what happens in a cell when DNA is copied and replicated. If you are using a Mac or Linux OS, you can get started right away. PC users will need to download Strawberry Perl first. Once that’s done, we can get started.

This is what our program is going to look like once it is executed:

As you can see above, this program asks the user to input a single DNA strand sequence. It then returns the appropriate complementary strand of DNA and asks the user to press any key to exit the program.

Let’s start by asking the user for the input string, which in this case is a strand of DNA. Open up your text editor of choice (I like Notepad++ for coding in Perl) and type in the following:

When executed, the program will print “Enter single DNA strand: ”. The $ symbol marks a variable, and in this case the variable $dnaseq is assigned <STDIN>, which reads whatever input comes in from the keyboard.
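A minimal sketch of this input step, assuming only the prompt text and the $dnaseq variable named above. The chomp call and the fallback for missing input are additions for tidiness and are not part of the description.

```perl
use strict;
use warnings;

# Show the prompt; there is no newline, so the cursor stays on the same line.
print "Enter single DNA strand: ";

# Read one line from the keyboard into the scalar $dnaseq.
my $dnaseq = <STDIN>;

# Assumption: if no input arrives at all, fall back to an empty string.
$dnaseq = "" unless defined $dnaseq;

# Assumption: strip the trailing newline that comes with the Enter key.
chomp $dnaseq;
```

Without chomp, the newline from the Enter key stays at the end of $dnaseq and would be treated as one more character of the sequence.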

DNA is composed of As, Gs, Ts, and Cs. As pair up with Ts while Gs pair up with Cs, which gives DNA its double-helical structure. This complementary pairing of nucleotides can be represented through code. We will want to use $dnaseq to generate the complementary strand of DNA and then display what the opposite strand actually is. This can be done by writing the following lines of code:

The for loop shown above will continue until it reaches the end of $dnaseq. The if and elsif statements that follow will print out the nucleotides that are complementary to those found in the original DNA sequence.
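The loop described above can be sketched roughly as follows. This is one possible version, not necessarily the original listing: a hard-coded sample sequence stands in for the keyboard input so the block runs on its own, and the hypothetical variable $complement collects the result so it can be printed in one go rather than one nucleotide at a time.

```perl
use strict;
use warnings;

# Sample input; in the full program this value comes from <STDIN>.
my $dnaseq = "ATGCCGTA";

# Hypothetical name: holds the complementary strand as it is built.
my $complement = "";

# Walk through the sequence one nucleotide at a time until the end.
for (my $i = 0; $i < length($dnaseq); $i++) {
    # Take the character at position $i; uc() lets lowercase input work too.
    my $base = uc(substr($dnaseq, $i, 1));

    # Append the nucleotide that pairs with the current one.
    if    ($base eq "A") { $complement .= "T"; }
    elsif ($base eq "T") { $complement .= "A"; }
    elsif ($base eq "G") { $complement .= "C"; }
    elsif ($base eq "C") { $complement .= "G"; }
    # Assumption: any other character (spaces, typos) is silently skipped.
}

print "Complementary strand: $complement\n";
```

For the sample input ATGCCGTA, this prints TACGGCAT as the complementary strand.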

Once you’ve written all of this out, you can execute the program. Enter a random series of As, Ts, Gs, and Cs if you’d like, and you’ll see the complementary strand of your made-up DNA strand. Or, you can look up actual DNA sequences on the Online Mendelian Inheritance in Man (OMIM) website to generate the complementary strands of known genetic data.

A program like this is extremely useful in science because the sequence of a gene can range from 10 base pairs (As, Ts, Gs, and Cs) to thousands of base pairs in length. Processing this kind of information with a program is significantly more efficient than doing it by hand. This code can also serve as the basis for more complicated programs, such as one that transcribes a DNA strand to an RNA strand. Such a program would aid in understanding protein synthesis, since RNA is what is used to create proteins in cells.
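As a rough idea of what the transcription program mentioned above might look like, here is a minimal sketch. It assumes the input is the coding strand, so the RNA has the same sequence with U in place of T. The variable names and the sample sequence are made up for illustration.

```perl
use strict;
use warnings;

# Sample coding-strand sequence (made up for illustration).
my $dna = "ATGCCGTA";

# Copy the sequence into $rna, then replace every T with U using tr///.
(my $rna = $dna) =~ tr/T/U/;

print "RNA strand: $rna\n";
```

If the input were the template strand instead, the program would first take the complement, as in the earlier loop, and then swap T for U.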

