Monday, February 13, 2012

This is research


Repost: http://cteg.berkeley.edu/~nielsen/2012/this-is-research/

After reading Rori's post, I was inspired to put together a video (or more) on my past and current projects, but first I felt compelled to write a little about what goes into the research, not just the results.

Quick: when I say I'm doing research, what is the first thing you think of? Probably something like this:


[Photo: Sarah Sawah, photographed by Michael Wakely (Almanac, April 14, 2009, Vol. 55, No. 29)]

Someone standing in a lab, probably messing around with some kind of chemicals or measuring wee beasties.

Yes, some people do those things, but there is so much more to the world of research. I do my research on a computer, and yes, it can be very exciting. Sure, if you wandered into our lab space, you'd find yourself surrounded by cubicles and people tap, tap, tapping away on their keyboards, but the research we do is so much more than typing.

Let's start with what my research looks like. Now, if you're like I was in undergrad (terrified of computer programming), you might want to brace yourself. On a typical day, I write several small programs, something like this one, which filters a file of RefSeq gene annotations:


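#!/usr/bin/perl -w
use strict;
#-------------------------------------------------------------
# What: This program will remove NR_* gene annotations by
#       keeping the header line plus every NM_* (mRNA) line
#-------------------------------------------------------------
my $usage = "remove_NR.pl [RefSeq chr input]\n";
die $usage unless @ARGV == 1;
#-------------------------------------------------------------

# Read the whole annotation file into an array, one line per element
open(my $in, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
my @genes = <$in>;
close($in);
die "$ARGV[0] is empty\n" unless @genes;

# Name the output after the input, minus its extension:
# chr1.txt -> chr1_no_NR.txt
(my $file = $ARGV[0]) =~ s/\.[^.\/]*$//;
my $outfile = $file . "_no_NR.txt";

open(my $out, '>', $outfile) or die "Cannot write $outfile: $!\n";
print $out $genes[0];    # keep the header line

# Print only NM_* annotations (skipping the header already written)
foreach my $line (@genes[1 .. $#genes]) {
    print $out $line if $line =~ m/NM_[0-9]/;
}
close($out);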



Still with me? Good. :)

After overcoming my fear of computer programming, I've discovered how accessible, and useful, and yes, fun, writing programs can be (if you're rolling your eyes at that one, I understand, but I would argue that writing your first successful program may bring out the inner techie in you too!).

I write programs to analyze DNA sequences, but for me the real research lies not in writing the code itself (although some scientists do study programming, and are very good at it), but in deciding what questions to ask and how to design the code that tests our hypotheses.

For example, I am broadly interested in how sex chromosomes evolve. In mammals, males have an X and a Y chromosome while females have two copies of the X, yet more than 150 million years ago the X and Y were an ordinary, identical pair of chromosomes. Today, however, the X is still very large (~1100 genes) while the Y is small and degraded (fewer than 100 genes). To study the differences between them, I use programs to align the remaining X and Y sequences, to look for differences between the sequences (like an A to T change, or an insertion in one of the sequences), and to estimate how quickly those sequences are changing.
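As a rough illustration of the "look for differences" and "how quickly" steps, here is a minimal Perl sketch. It assumes the two sequences have already been aligned to the same length, with '-' marking a gap. The subroutine names `compare_aligned` and `jukes_cantor` are hypothetical, the sequences are made up, and the Jukes-Cantor correction is simply the most basic textbook model for turning a raw proportion of differing sites into a distance; it is not necessarily the method used in this research, where more elaborate models are common.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: compare two sequences that are already aligned
# to the same length, with '-' marking a gap.
# Returns (substitutions, gapped columns, columns compared).
sub compare_aligned {
    my ($seq1, $seq2) = @_;
    die "Aligned sequences must be the same length\n"
        unless length($seq1) == length($seq2);

    my ($subs, $gaps, $sites) = (0, 0, 0);
    for my $i (0 .. length($seq1) - 1) {
        my $base1 = uc substr($seq1, $i, 1);
        my $base2 = uc substr($seq2, $i, 1);
        if ($base1 eq '-' || $base2 eq '-') {
            $gaps++;    # an insertion or deletion in one sequence
            next;
        }
        $sites++;
        $subs++ if $base1 ne $base2;    # e.g. an A to T change
    }
    return ($subs, $gaps, $sites);
}

# Hypothetical helper: Jukes-Cantor correction, converting the raw
# proportion of differing sites (p) into estimated substitutions per
# site, which accounts for repeated changes at the same position.
sub jukes_cantor {
    my ($p) = @_;
    die "p must be below 0.75 for the Jukes-Cantor correction\n" if $p >= 0.75;
    return -0.75 * log(1 - (4 / 3) * $p);
}

# Toy example: made-up X and Y fragments, not real data
my ($x_seq, $y_seq) = ("ACGTACGT-A", "ACTTACGTGA");
my ($subs, $gaps, $sites) = compare_aligned($x_seq, $y_seq);
my $p = $sites ? $subs / $sites : 0;
printf "substitutions=%d gaps=%d sites=%d\n", $subs, $gaps, $sites;
printf "p-distance=%.4f JC distance=%.4f\n", $p, jukes_cantor($p);
# prints:
# substitutions=1 gaps=1 sites=9
# p-distance=0.1111 JC distance=0.1203
```

Dividing by the compared sites rather than the full alignment length keeps gapped columns from diluting the distance; how to treat gaps is itself a modeling choice.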

Studying billions of base pairs of DNA by hand is simply not feasible. Further, using computer programs can (if they are properly debugged!) eliminate many kinds of accidental human error. And really, how cool is it that this morning it took me 3 minutes and 27 seconds to read in and analyze ten thousand genes? Ten thousand! In less than four minutes! Really, that's amazing.

Yes, we need biologists who work in the lab, who collect samples, who study beasties big and small, but there is a whole contingent of biologists who follow the same scientific method, who spend hours poring over data, who discover new and exciting results, and who happen to conduct their research at a desk.

1 comment:

Mike Russo said...

I'm right there with you; I <3 Perl! I use it all the time to do data analysis of my simulations. Everything from tracking individual atoms during the course of a simulation, to generating input files for a ray tracing program to make fancy movies of my work :-)