Saturday, February 9, 2013

Teaching moment

I was very excited to get to teach one lesson for my advisor's course this week. The course is co-taught, and although it was the other professor's turn to teach, both were out of town, so my advisor asked me and one other postdoc to step up for the week.

I had a blast! The course covers methods in statistical genomics and is aimed mainly at statistics and mathematics students. The students were alert (for a 9:30am class at Berkeley, I was especially impressed) and engaged. They asked questions and really seemed interested in what I was talking about. My main goals for the class were to get them thinking about what kinds of data are available (to apply their statistics to), to help them become familiar with where and how to access the data, and to get them thinking about the diversity of questions they can ask.

Below is my outline for the class, and some references I handed out to the students. I took about an hour to go through the first three points, and my fellow postdoc spent the remaining half hour on the fourth point.

Introduction to Bioinformatics: Finding Data

1. What kind of data is there: Overview of the Genome
    1. Central Dogma
        1. DNA --transcribed--> RNA --translated--> Protein
            1. DNA is a double helix (forward/reverse), four nucleotides (Adenine, Guanine, Cytosine, and Thymine)
            2. RNA polymerase transcribes the DNA to form single strands of RNA (Adenine, Guanine, Cytosine, and Uracil)
            3. RNA is translated into protein
                1. read in triplets
                2. 64 possible triplets of three nucleotides, but only 20 amino acids, plus three stop codons
                3. starting with the start codon, ATG (Methionine), and ending in one of the three stop codons: TAG, TGA, TAA

    2. Coding regions
        1. Affected by selection
        2. Genes
            1. 5', 3' UTR, exons, introns
            2. multiple isoforms (major and minor, mostly similar exons)
        3. Transcripts
            1. miRNA, snoRNA, lncRNA

    3. Repetitive
        1. Transposable elements (SINEs, LINEs)
        2. Simple tandem repeats (microsatellites, mini-satellites)
        3. Copy number variants

    4. Neutral regions
        1. Noncoding
        2. Far or near genes?
        3. CpG sites - mutation rate is 15-30x higher than at non-CpG sites
            1. Cytosine deaminated into a Uracil -> becomes a Thymine upon repair

2. What kind of data do you want?
    1. Across species: Comparative Genomics
        1. Multiple alignments - mammals, vertebrates, worms, flies
        2. What kinds of questions?
            1. How has evolved across species
            2. Has a gene family (opsins, olfactory, brain-related) expanded in certain lineages?
            3. Which genes are highly conserved across species? (Difficult to ask the opposite, because highly diverged genes will align poorly)
            4. What is the genome structure across viruses (influenza, HIV)?
            5. Gene content evolution (e.g. yeast - bread/beer, or bacteria - gut microbiome)

    2. Within species: Population Genetics
        1. Data for multiple individuals
        2. Human
            1. Complete Genomics (fewer individuals, higher coverage)
            2. 1000 Genomes (more individuals, lower coverage)
            3. HapMap, dbSNP
        3. Non-Human
            1. dbSNP
            2. FlyBase, WormBase
        4. What kinds of questions?
            1. Demographic history - out of Africa, human dispersal around the world, mating patterns
            2. Identify genes subject to natural selection (high altitude adaptation or lactose digestion in humans, response to climate change)
            3. Effects of artificial selection (rice domestication, changes in dog genome due to selective breeding)
            4. Evolution of mimicry (poisonous versus nonpoisonous species - butterflies and frogs)

3. How to get the data?
    1. UCSC Genome Browser - Example downloading gene coding positions on chrX
    2. Galaxy - Example of interface, extracting multiple alignments for all genes on chrX

4. R example for parsing and analyzing files
    1. Background of the 1000 Genomes Project, explain VCF
    2. R code to extract .vcf
    3. PCA with subset of 1000 Genomes
    4. Clustering (UPGMA, Neighbor-Joining)
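
For anyone who wants to play with the Central Dogma part of the outline, here is a minimal Python sketch (not code from the class). It builds the standard genetic code table, confirms the 64 codons / 20 amino acids / three stop codons counts, and translates a short sequence from the first ATG to the first stop codon. The function names `transcribe` and `translate` and the example sequence are made up for illustration, and the sketch assumes the input is the coding (sense) strand.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
# One-letter amino acid codes; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(coding_dna):
    """Translate from the first ATG up to (not including) the first stop codon.

    Codons containing anything other than A, C, G, T are reported as 'X'.
    """
    seq = coding_dna.upper()
    start = seq.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        amino_acid = CODON_TABLE.get(seq[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
print(len(CODON_TABLE))                       # 64
print(len(set(CODON_TABLE.values()) - {"*"}))  # 20
print(stop_codons)                            # ['TAA', 'TAG', 'TGA']
print(transcribe("ATGGCC"))                   # AUGGCC
print(translate("CCATGGCCTAAGG"))             # MA
```

Real gene annotation is messier than this (introns, reverse-strand genes, alternative start sites), so treat it as a toy for the counting argument rather than a gene finder.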

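The VCF step in point 4 was done in R in class, and that code is not reproduced here. As a rough stand-in, the Python sketch below shows the general idea of pulling genotypes out of a VCF: skip the `##` metadata lines, read sample names from the `#CHROM` header line, and convert each `GT` entry into a count of non-reference alleles (the kind of matrix one would then feed into PCA or clustering). The function name `parse_vcf`, the sample names S1-S3, and the two variants are all invented for illustration.

```python
import re


def parse_vcf(lines):
    """Return (sample_names, variants) from an iterable of VCF text lines.

    Each variant is (chrom, pos, ref, alt, counts), where counts holds the
    number of non-reference alleles per sample, or None for a missing call.
    """
    samples = []
    variants = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # blank line or metadata
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample columns follow the 9 fixed columns
            continue
        chrom, pos, _id, ref, alt = fields[:5]
        gt_index = fields[8].split(":").index("GT")
        counts = []
        for sample_field in fields[9:]:
            genotype = sample_field.split(":")[gt_index]
            alleles = re.split(r"[|/]", genotype)  # phased or unphased
            if "." in alleles:
                counts.append(None)
            else:
                counts.append(sum(allele != "0" for allele in alleles))
        variants.append((chrom, int(pos), ref, alt, counts))
    return samples, variants


# Toy input: made-up samples and variants, not real 1000 Genomes data.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.1",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
               "INFO", "FORMAT", "S1", "S2", "S3"]),
    "\t".join(["X", "100", ".", "A", "G", ".", "PASS", ".", "GT",
               "0|0", "0|1", "1|1"]),
    "\t".join(["X", "250", ".", "C", "T", ".", "PASS", ".", "GT:DP",
               "0/1:12", "./.:0", "0/0:9"]),
])

samples, variants = parse_vcf(EXAMPLE_VCF.splitlines())
print(samples)
for variant in variants:
    print(variant)
```

For real 1000 Genomes files, which are large and compressed, a dedicated VCF library is a better choice than hand-rolled parsing; this is only meant to show what the columns contain.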

Get Data
1. UCSC Genome Browser:   
2. Ensembl:
3. NCBI:
    a. Nucleotide
    b. Gene
    c. dbGaP
    d. dbVar
    e. dbSNP
    f. PubMed
4. WormBase:
5. FlyBase:
6. Complete Genomics:
7. HapMap:
8. 1000 Genomes Project:
10. DAVID Functional Annotation:
11. BioPerl:
12. GitHub:
13. Introduction to Unix:
15. ExPASy Bioinformatics Resource Portal:


Amit said...

I gave a similar guest lecture to the undergrad bioinformatics class here when my boss was out of town. It was titled 'Gene Safari' and was about all the different web-based bioinformatics resources (UCSC, NCBI, etc.).

I tried to motivate the lecture by showing them resources to learn more about FOXP2 as an example.
These were undergrads, and I think most of them were either half asleep (a 9am class) or surfing the internet on their laptops. The whole experience was underwhelming because I didn't know if they were bored by my lecture, or just weren't into it.

mathbionerd said...

Sometimes there is nothing you can do when the students show up not wanting to learn. All the interesting material and enthusiasm will fall flat.