    • Undergraduate section: BIOL 478 / CS 478 / STAT 490B (3 credits)
      • Meets MWF 12:30-13:20, Lilly G432
    • Graduate section: BIOL 595B (4 credits)
      • Meets with BIOL 478 above and U:11:30, Lilly G230

    Graduates and undergraduates attend the same lectures and take the same exams. Graduates will also participate in a one-hour weekly seminar with Dr. Gribskov (time to be arranged).

    • Course Coordinator: Michael Gribskov (gribskov@purdue.edu)
      • Lilly G-233, 494-6933; office hours by appointment
    • Co-Instructor: Daisuke Kihara (dkihara@purdue.edu)
    • Grad TA: Yifeng (David) Yang (yang41@purdue.edu)
      • Office hours: Th 1:30-3:00 pm, Lilly B404 (B404 is inside room B401; I am in the cubicle with a Monet painting)


    Course Description

    Bioinformatics is broadly defined as the computational study of biological information, particularly the enormous volume of information in DNA and protein sequences, gene expression data, and protein structure and function. Topics in this course will include understanding the sequences of DNA and proteins, the evolution of genes and genomes, the structure and function of proteins, and the dynamics of gene expression in biological processes (transcriptomics). Bioinformatics is inherently interdisciplinary, melding biology with applications of computer science and statistics. This course introduces analytical methods from biology, statistics, and computer science that are necessary for bioinformatics investigations, and demonstrates some applications of these methods to biological problems.

    The undergraduate section is intended for juniors and seniors from various science backgrounds. Our objective is to develop the skills of both tool users and tool designers in this important new field of research.

    The graduate section has the same goals as the undergraduate section, with the additional goal of learning to read and critically analyze bioinformatics research literature.

    Course Content: BIOL 478/595 provides a survey of the major areas of bioinformatics at the macromolecular level. Many interesting areas of bioinformatics, such as cellular modeling, organismal interactions, and ecology, cannot be covered in a single semester. The course content is divided into four modules:

    1. Genomics (DNA and protein sequence analysis)
    2. Evolution and Phylogenetics
    3. Systems Biology
    4. Protein structure

    Prerequisites: Background in both biology and computer science is helpful in this course. The expected background in molecular biology is represented by either BIOL 295E or both BIOL 231 and 241, and the expected background in computer science by CS 158, CS 177, CS 180, or an equivalent first-year introductory programming course. Students who have not earned grades of C or better in these courses should consult the instructor. Students who have not taken these courses may also be admitted with the consent of the instructor.

    Course Activities

    Regular lectures: Unless otherwise indicated on the schedule below, all classes will be regular lectures. Readings for the next lecture will be announced at the beginning of each lecture. See Course Materials below for a description of the text and background references.

    Tutorials: Practical demonstrations of bioinformatic analyses using available programs.

    Homework: Homework assignments are generally weekly. They will usually be published on Blackboard on Wednesday or Friday and will be due the following Friday (e.g., published 27 Aug, due 5 Sep).

    Mini-projects: There will be three long homework assignments, or mini-projects, during the semester. Each mini-project will take a multidisciplinary approach and include both biological and computational analyses. The due date for each mini-project will be listed on the schedule, and each assignment will be published on Blackboard 2-4 weeks before it is due.

    Evening exams: A midterm is tentatively scheduled for 7:00-9:00 PM, Monday, 29 September (location TBA). To compensate for the evening exam, there will be no required lecture during class time that day; instead, the 29 September class period will be used for an optional review session. The exam is open book and open notes, but no calculators or computers are allowed. It will cover material through Friday, 26 September.

    Quizzes: There will be two quizzes, each lasting 30 minutes. The class time before each quiz will be used to catch up on lectures and for a question-and-answer review of the material. You may prepare a single one-sided sheet of notes for reference; otherwise the quizzes are closed book and closed notes, with no calculators or computers.

    Assessment and Grading: Grades in the course will be based on one midterm and one final exam, together worth 40% of the course grade. The remaining 60% will be based on homework assignments, mini-projects, and quizzes. The point breakdown follows (it may be adjusted slightly).

    Activity        Points
    Midterm         100
    Final           300
    Homework        20 each = 200
    Mini-projects   100 each = 300
    Quizzes         50 each = 100
    Total           1000

    Note: If the grade on the Final is better than the grade on the Midterm, the Final grade can replace the Midterm grade, making the Final worth 400 points. Semester grades will be awarded based on the following minimum percentages: 85% = A, 75% = B, 65% = C, and 50% = D; below 50% = F. The instructors may lower these thresholds at their discretion, making each letter grade easier to earn, but the thresholds will not be raised. There will be no strict curve (everyone can earn an A, and we hope everyone does).
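    The grading arithmetic can be checked with a short calculation. The Python sketch below is a hypothetical helper, not an official course calculator: it assumes the point values in the table above, interprets "better grade" as a higher percentage of that exam's points, and uses the published (unadjusted) letter-grade minima.

```python
# Hypothetical helper illustrating the grading arithmetic; not an official course calculator.
# Assumption: a "better grade" means a higher percentage of that exam's available points.

POINTS = {"midterm": 100, "final": 300, "homework": 200, "mini_projects": 300, "quizzes": 100}

# Published minimum percentages (instructors may lower these, but never raise them).
CUTOFFS = [(85.0, "A"), (75.0, "B"), (65.0, "C"), (50.0, "D")]


def course_percent(earned: dict[str, float]) -> float:
    """Return the course percentage, applying the Final-replaces-Midterm rule."""
    scores = dict(earned)  # copy so the caller's dict is not modified
    final_frac = scores["final"] / POINTS["final"]
    midterm_frac = scores["midterm"] / POINTS["midterm"]
    if final_frac > midterm_frac:
        # Substitute the Final percentage for the Midterm, so the Final effectively counts 400 points.
        scores["midterm"] = final_frac * POINTS["midterm"]
    total = sum(scores[key] for key in POINTS)
    return 100.0 * total / sum(POINTS.values())


def letter_grade(percent: float) -> str:
    """Map a course percentage to a letter grade using the published minima."""
    for cutoff, letter in CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"


if __name__ == "__main__":
    student = {"midterm": 60, "final": 270, "homework": 180, "mini_projects": 255, "quizzes": 80}
    pct = course_percent(student)
    print(f"{pct:.1f}% -> {letter_grade(pct)}")  # 87.5% -> A
```

    In this made-up example, the 90% Final replaces the 60% Midterm, raising the total from 845 to 875 points (84.5%, a B, to 87.5%, an A).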

    Course Policies

    Academic behavior: Academic dishonesty of any kind (cheating, plagiarism, fabrication of data, improper collaboration, etc.) will not be tolerated and is grounds for failing the course (grade F) and for notification of the University administration for further disciplinary action. Each assignment will be explicitly labeled as individual or group work; groups will be given the rules for collaboration. All questions about course policy and administration should be directed to the Course Coordinator.

    Important dates 

    • Last day to drop without it appearing on your record - 8 Sep
    • Last day to drop without a grade  - 22 Sep
    • Last day to drop with W or WF on your record - 29 Oct

    We assign W (not WF) to encourage students who find the course unsuitable to drop it rather than feel they must stay.

    Course Materials

    Course Web Page: http://blackboard.purdue.edu or http://pkr.genomics.purdue.edu:888/Biol478

    Primary Text

    Mount, David W. 2004. Bioinformatics: sequence and genome analysis. Second edition. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, USA. ISBN 0-87969-712-1 (paperback)

    Additional readings will be provided for specific lectures; generally these will be posted on Blackboard as PDF files.

    Background Biology References:


    • NCBI website (http://www.ncbi.nlm.nih.gov). See Science Primers on Molecular Genetics, Bioinformatics, etc.
    • Biology and Biochemistry: Berg, Jeremy M. 2006. Biochemistry. Sixth Edition. W. H. Freeman. ISBN 0-7167-8724-5 (hardcover). An excellent text to refresh your knowledge of the biochemistry underlying bioinformatics; any edition newer than the third is equally valuable for background reading. The Fifth Edition is also available on the web at http://www.ncbi.nlm.nih.gov/books/bv...er.TOC&depth=2


    Lecture  Date    Topic
    1        25-Aug  Introduction
    2                Sequences and evolution
    3                Sequences and evolution
    4                Genomics: Sequence database searching
    5                Genomics: Sequence database searching
    6                Genomics: Scoring systems
    7                Genomics: Scoring systems and pairwise alignments
    8        12-Sep  Genomics: Scoring systems and pairwise alignments
    9        15-Sep  Genomics: Statistics of alignments
    10       17-Sep  Genomics: Genome Assembly
    11       19-Sep  Genomics: Gene modeling
    12       22-Sep  Genomics: Gene modeling
    13       24-Sep  Genomics: Quiz and Bayesian Statistics (slides 1-17 only)
    14       26-Sep  Genomics: Sequence Motifs
             1-Oct   Multiple alignments and trees
             3-Oct   Multiple alignments and trees
             6-Oct   Multiple alignments and trees
             8-Oct   Multiple alignments and trees
             10-Oct  Multiple alignments and trees


    Graduate Section

    The graduate section of Introduction to Bioinformatics focuses on reading original papers from the scientific literature. These papers are a mix of “classics” and more recent papers that relate to the weekly content of the lecture course. The goal of the graduate section is to gain experience in critically reading the scientific literature and to provide a survey of the background literature in computational biology as a stepping stone to advanced studies.

    Reviewer responsibilities (assignment)

    All: Prepare a brief review of each paper, as discussed in our first session.

    1. Primary: Present a brief synopsis of the paper. Your written review should be slightly more detailed than when you are the secondary or tertiary reviewer.
    2. Secondary: Be prepared to add any points the primary reviewer misses.
    3. Everyone: Be prepared to discuss the paper intelligently.

    Assigned Papers

    Date Topic Papers
    1 Sep - 5 Sep Organization Handout
    8 Sep - 12 Sep Scoring & Alignment

    ·Dayhoff MO, Schwartz RM, Orcutt BC. A model of evolutionary change in proteins. In: Dayhoff MO, ed. Atlas of Protein Sequence and Structure, Vol. 5, Suppl. 3, 1978. National Biomedical Research Foundation, Georgetown University Medical Center, Washington, DC.
    ·Henikoff S, Henikoff JG. Amino acid substitution matrices from protein blocks. Proc Natl Acad Sci U S A. 89, 10915-10919, 1992.

    15 Sep - 19 Sep Alignment & Gene Finding

    ·Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 48, 443-453, 1970.
    ·Altschul SF, Madden TL, Schaeffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389-3402, 1997.

    22 Sep - 25 Sep Sequence Motif

    ·Gribskov M, Veretnik S. Identification of Sequence Patterns with Profile Analysis. Methods in Enzymology 266, 198-212, 1996.
    ·Lawrence CE, Altschul SF, Boguski MS, Liu JS, Neuwald AF, Wootton JC. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262, 208-214, 1993.

    29 Sep - 3 Oct Multiple Alignment

    ·Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673-4680, 1994.
    ·Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792-1797, 2004.


    6 Oct - 10 Oct

    ·Vigilant L, Stoneking M, Harpending H, Hawkes K, Wilson AC. African populations and the evolution of human mitochondrial DNA. Science 253, 1503-1507, 1991.
    ·Templeton AR. Human origins and analysis of mitochondrial DNA sequences. Science 255, 737, 1992. (The PDF also includes Hedges et al.)
    ·Hedges SB, Kumar S, Tamura K. Human origins and analysis of mitochondrial DNA sequences. Science 255, 737-739, 1992.
    ·Ke Y, Su B, Song X, Lu D, Chen L, Li H, Qi C, Marzuki S, Deka R, Underhill P, Xiao C, Shriver M, Lell J, Wallace D, Wells RS, Seielstad M, Oefner P, Zhu D, Jin J, Huang W, Chakraborty R, Chen Z, Jin L. African origin of modern humans in East Asia: a tale of 12,000 Y chromosomes. Science 292, 1151-1153, 2001.

    Comparative Genomics

    ·Overbeek R, Fonstein M, D'Souza M, Pusch GD, Maltsev N. The use of gene clusters to infer functional coupling. Proc Natl Acad Sci U S A. 96, 2896-2901, 1999.
    ·Tamames J. Evolution of gene order conservation in prokaryotes. Genome Biology 2(6), research0020.1-0020.11, 2001.
    27 Oct - 31 Oct Protein
    Sequence and
    ·Chothia C, Lesk AM. The relation between the divergence of sequence and structure in proteins. EMBO J. 5, 823-826, 1986.
    ·Pazos F, Valencia A. Protein co-evolution, co-adaptation and interactions. EMBO J., 2008 Sep 25 [Epub ahead of print].
    3 Nov - 7 Nov
    ·Network biology: understanding the cell's functional organization.

    Because of the late posting, I will lead a discussion of some of the ideas in this paper. Please read it, but no reviews are needed.

    10 Nov - 13 Nov
    Special topics

    ·Cowperthwaite MC, Economo EP, Harcombe WR, Miller EL, Meyers LA. The Ascent of the Abundant: How Mutational Networks Constrain Evolution. PLoS Computational Biology 4, e1000110, 2008.

    Because this is a long paper, it is the only one assigned this week. Please review it.

    17 Nov - 21 Nov
    Neural Systems

    ·Qian N, Sejnowski TJ. Predicting the Secondary Structure of Globular Proteins Using Neural Network Models. J Mol Biol. 202, 865-884, 1988.
    ·Krogh A, Larsson B, von Heijne G, Sonnhammer ELL. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol. 305, 567-580, 2001.

    24 Nov - 28 Nov

    Apparently there was some confusion: because of an error on the wiki, the papers did not appear in this box (I did post them, really). Since only three people showed up, we will discuss these papers next week.

    · Meiler

    · Englebienne

    1 Dec - 5 Dec


    · Englebienne

    8 Dec - 12 Dec

    ·Huang H-L, Lee C-C, Ho S-Y. Selecting a minimal number of relevant genes from microarray data to design accurate tissue classifiers. Biosystems 90, 78-86, 2007.
    ·Liu Y, Yokota H. Artificial ants deposit pheromone to search for regulatory DNA elements. BMC Genomics 7, 221. http://www.biomedcentral.com/content...2164-7-221.pdf

    More information

    Sample BIOL 595B Paper Report

    (Reviewer Name withheld)
    Phylogeny based discovery of regulatory elements
    Jason Gertz, Justin C Fay, Barak A Cohen
    BMC Bioinformatics. 2006 May 22;7:266.


    This paper discusses a method of motif discovery that not only searches genomes exhaustively but also considers the underlying phylogeny of the sequences being analyzed, weighting the significance of conserved sequences in proportion to the phylogenetic distances between them. The HKY85 nucleotide substitution model underlies both the neutral and the conserved models of transcription factor binding site evolution. The likelihoods of the conserved and neutral models are then compared to determine whether the conserved model is a significantly better fit. A chi-squared distribution was used to approximate the expected distribution of the likelihood comparisons, and not a single false positive was observed. The observation that true regulatory elements have a different distribution of log likelihood ratios than random DNA motifs led the authors to investigate whether likelihood comparisons could identify novel transcription factor binding sites (TFBS); to do so, they searched the genome exhaustively and performed likelihood comparisons for all hexamers, assuming that conserved hexamers are part of binding sites. The results suggested that intergenic sequences are not significantly conserved. Significant hexamers were identified using a chi-squared test and combined by alignment, but 50% of these were false positives. One motif prediction was validated experimentally.

    Strong points

    1. The method also considers phylogeny and is not based solely on motif searching.
    2. The methodology is well supported by logical arguments. For example, exhaustive search, rather than EM or Gibbs sampling (usually used for co-regulated genes), was used to search whole genomes efficiently.
    3. Experimental validation is performed, though only for one prediction.

    Weak points

    1. To determine the sensitivity of the method, perfect alignments are assumed at every evolutionary distance; this is unrealistic, since alignment quality degrades as distance increases.
    2. When alignments (obtained by combining hexamers) were matched with known binding sites, 50% were false positives, a high error rate.
    3. The reasons why the identified regulatory motif did not match any of the databases, and did not correlate with motifs identified in large-scale studies, were not explained.
    4. The computational requirements of the method are not discussed.

    Validity of conclusions:

    The data set used to test the method does not seem to strongly support the anticipated results. Nonetheless, the method is significant because it combines exhaustive genome searching with consideration of phylogeny, and more testing and experimental validation would be useful to support the conclusions with greater confidence.
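    The likelihood comparison summarized in the sample report can be illustrated with a generic likelihood ratio test. The Python sketch below is not the authors' code: it assumes the log-likelihoods of the conserved and neutral models have already been computed (under HKY85 in the paper), assumes one degree of freedom for the chi-squared approximation purely for illustration, and uses a hypothetical 0.05 significance cutoff.

```python
import math

# Sketch of a likelihood ratio test with a chi-squared approximation.
# Assumptions (not from the paper): log-likelihoods are precomputed, the test has
# 1 degree of freedom, and alpha = 0.05 is an illustrative significance cutoff.


def lr_statistic(loglik_conserved: float, loglik_neutral: float) -> float:
    """Twice the log-likelihood difference between the conserved and neutral models."""
    return 2.0 * (loglik_conserved - loglik_neutral)


def chi2_sf_1df(x: float) -> float:
    """P(X > x) for X ~ chi-squared with 1 df, via P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))."""
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def conserved_model_preferred(loglik_conserved: float, loglik_neutral: float,
                              alpha: float = 0.05) -> bool:
    """True if the conserved model fits significantly better than the neutral model."""
    p_value = chi2_sf_1df(lr_statistic(loglik_conserved, loglik_neutral))
    return p_value < alpha


if __name__ == "__main__":
    # Hypothetical log-likelihoods for one hexamer, for illustration only.
    stat = lr_statistic(-100.0, -103.0)
    print(stat, round(chi2_sf_1df(stat), 4))  # 6.0 0.0143
```

    The closed-form tail probability is exact only for one degree of freedom; for other degrees of freedom a general chi-squared survival function would be needed.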