
Assignments and Reading

    Table of contents
    1. Reading Assignments
    2. Assignments

    Reading Assignments

    Date Reading
    28 August
    • Ekblom R, Wolf JBW. A field guide to whole-genome sequencing, assembly and annotation. Evol Appl. DOI: 10.1111/eva.12178, 2014.  PDF
    • Zhou X, Rokas A. Prevention, diagnosis and treatment of high-throughput sequencing data pathologies. Molec. Ecol. 23, 1679–1700, 2014.  PDF
    2 September
    • Miller JR, Koren S, Sutton G. Assembly algorithms for next-generation sequencing data. Genomics 95, 315-327, 2010. PDF
    • Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821-829, 2008. PDF
    • Butler J, MacCallum I, Kleber M, Shlyakhter IA, Belmonte MK, Lander ES, Nusbaum C, Jaffe DB. ALLPATHS: de novo assembly of whole-genome shotgun microreads. Genome Res. 18, 810-20, 2008. PDF

    4 September
    • Settles, MA et al. Sequence-indexed mutations in maize using the UniformMu transposon tagging population. BMC Genomics 8, 116, 2007 PDF
    • Tsai, H. et al. Discovery of rare mutations in populations: TILLING by Sequencing. Plant Physiol 156, 1257–1268, 2011 PDF
    9 September
    • Sonah H, Bastien M, Iquira E, Tardivel A, Légaré G, et al. (2013) An Improved Genotyping by Sequencing (GBS) Approach Offering Increased Versatility and Efficiency of SNP Discovery and Genotyping. PLoS ONE 8(1): e54603. doi:10.1371/journal.pone.0054603 PDF
    • Schneeberger, K, Stephan Ossowski, Christa Lanz, Trine Juul, Annabeth Høgh Petersen, Kåre Lehmann Nielsen, Jan-Elo Jørgensen, Detlef Weigel & Stig Uggerhøj Andersen (2009) SHOREmap: simultaneous mapping and mutation identification by deep sequencing. Nature Methods 6, 550-551 doi:10.1038/nmeth0809-550 PDF
    • Schneeberger et al. Supplemental Data PDF

    11 September
    • Flicek P, Birney E. Sense from sequence reads: methods for alignment and assembly. Nat Methods. 2009 Nov;6(11 Suppl):S6-S12. PDF

    16 September
    • Salzberg SL, Phillippy AM, Zimin A, Puiu D, Magoc T, Koren S, Treangen TJ, Schatz MC, Delcher AL, Roberts M, Marçais G, Pop M, Yorke JA. GAGE: A critical evaluation of genome assemblies and assembly algorithms. Genome Res. 2012 Mar;22(3):557-67. PDF
    • Bradnam KR, Fass JN, Alexandrov A, Baranay P, Bechner M, Birol I, Boisvert S, Chapman JA, Chapuis G, Chikhi R, Chitsaz H, Chou WC, Corbeil J, Del Fabbro C, Docking TR, Durbin R, Earl D, Emrich S, Fedotov P, Fonseca NA, Ganapathy G, Gibbs RA, Gnerre S, Godzaridis E, Goldstein S, Haimel M, Hall G, Haussler D, Hiatt JB, Ho IY, Howard J, Hunt M, Jackman SD, Jaffe DB, Jarvis ED, Jiang H, Kazakov S, Kersey PJ, Kitzman JO, Knight JR, Koren S, Lam TW, Lavenier D, Laviolette F, Li Y, Li Z, Liu B, Liu Y, Luo R, Maccallum I, Macmanes MD, Maillet N, Melnikov S, Naquin D, Ning Z, Otto TD, Paten B, Paulo OS, Phillippy AM, Pina-Martins F, Place M, Przybylski D, Qin X, Qu C, Ribeiro FJ, Richards S, Rokhsar DS, Ruby JG, Scalabrin S, Schatz MC, Schwartz DC, Sergushichev A, Sharpe T, Shaw TI, Shendure J, Shi Y, Simpson JT, Song H, Tsarev F, Vezzi F, Vicedomini R, Vieira BM, Wang J, Worley KC, Yin S, Yiu SM, Yuan J, Zhang G, Zhang H, Zhou S, Korf IF. Assemblathon 2: evaluating de novo methods of genome assembly in three vertebrate species. Gigascience. 2013 Jul 22;2(1):10. PDF
    6 October
    • Cech, T and Steitz, J. (2014) The Noncoding RNA Revolution—Trashing Old Rules to Forge New Ones Cell 157:77-94 PDF
    • Parla, J, et al (2011) A comparative analysis of exome capture Genome Biology 12:R97 PDF
       
    9 October
    • Yandell M, Ence D. A beginner's guide to eukaryotic genome annotation. Nat Rev Genet. 2012 Apr 18;13(5):329-42. PDF
    • Liang C, Mao L, Ware D, Stein L. Evidence-based gene predictions in plant genomes. Genome Res. 2009 Oct;19(10):1912-23. PDF
    • Lomsadze A, Ter-Hovhannisyan V, Chernoff YO, Borodovsky M. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res. 2005 Nov 28;33(20):6494-506. PDF
    • Ter-Hovhannisyan V, Lomsadze A, Chernoff YO, Borodovsky M. Gene prediction in novel fungal genomes using an ab initio algorithm with unsupervised training. Genome Res. 2008 Dec;18(12):1979-90. PDF
    28 October
    • Narayana Annaluru et al. Total Synthesis of a Functional Designer Eukaryotic Chromosome. Science 344, 55-58 (2014) PDF
    4 November

    • Nehrt NL, Clark WT, Radivojac P, Hahn MW. Testing the ortholog conjecture with comparative functional genomic data from mammals. PLoS Comput Biol. 2011 Jun;7(6):e1002073. PDF

    11 November
    • Schulz MH, Zerbino DR, Vingron M, Birney E. Oases: robust de novo RNA-seq assembly across the dynamic range of expression levels. Bioinformatics. 2012 Apr 15;28(8):1086-92. PDF
    • Haas BJ, Papanicolaou A, Yassour M, Grabherr M, Blood PD, Bowden J, Couger MB, Eccles D, Li B, Lieber M, Macmanes MD, Ott M, Orvis J, Pochet N, Strozzi F, Weeks N, Westerman R, William T, Dewey CN, Henschel R, Leduc RD, Friedman N, Regev A. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat Protoc. 2013 Aug;8(8):1494-512. PDF
    • Martin JA, Wang Z. Next-generation transcriptome assembly. Nat Rev Genet. 2011 Sep 7;12(10):671-82. PDF
    13 November
    • Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinformatics. 2011 Aug 4;12:323. doi:10.1186/1471-2105-12-323. PDF
    • Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11(10):R106. PDF
    • Leng N, Dawson JA, Thomson JA, Ruotti V, Rissman AI, Smits BM, Haag JD, Gould MN, Stewart RM, Kendziorski C. EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics. 2013 Apr 15;29(8):1035-43. PDF
    • Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010 Jan 1;26(1):139-40. PDF
    • Burden CJ, Qureshi SE, Wilson SR. Error estimates for the analysis of differential expression from RNA-seq count data. PeerJ. 2014 Sep 23;2:e576. doi: 10.7717/peerj.576. eCollection 2014. PDF
    • Guo Y, Li CI, Ye F, Shyr Y. Evaluation of read count based RNAseq analysis methods. BMC Genomics. 2013;14 Suppl 8:S2. doi: 10.1186/1471-2164-14-S8-S2. PDF
    • Zhang ZH, Jhaveri DJ, Marshall VM, Bauer DC, Edson J, Narayanan RK, Robinson GJ, Lundberg AE, Bartlett PF, Wray NR, Zhao QY. A comparative study of techniques for differential expression analysis on RNA-Seq data. PLoS One. 2014 Aug 13;9(8):e103207. PDF

    Assignments

    Date Assignment
    28 August
    1. Log on to the wiki
    2. Log on to scholar.rcac.purdue.edu
    3. cd to your scratch directory
    4. Look at the primary data file: /scratch/carter/m/mgribsko/monascus/data/Monpu1.genome.rawReads.fastq
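A quick way to inspect a FASTQ file without loading it all into memory is to read only the first few records. The sketch below is a minimal illustration, assuming the common four-line record layout (header, sequence, `+`, quality) with no wrapped sequence lines; the function names `read_fastq` and `peek` and the example reads are made up for illustration.

```python
import itertools


def read_fastq(lines):
    """Yield (header, sequence, quality) tuples from an iterable of FASTQ lines.

    Assumes each record occupies exactly four lines (no wrapped sequences).
    """
    it = iter(lines)
    while True:
        record = list(itertools.islice(it, 4))
        if len(record) < 4:
            return  # end of input (or a truncated final record)
        header, seq, _plus, qual = (s.rstrip("\n") for s in record)
        yield header, seq, qual


def peek(lines, n=3):
    """Return the first n records as a list."""
    return list(itertools.islice(read_fastq(lines), n))


# Tiny made-up example; for a real file, pass an open file handle instead.
example = [
    "@read1\n", "ACGTACGT\n", "+\n", "IIIIIIII\n",
    "@read2\n", "GGGTTTAA\n", "+\n", "HHHHHHHH\n",
]
for header, seq, qual in peek(example, n=2):
    print(header, len(seq), seq)
```

For the course data, the same function could be called as `peek(open(path))` with `path` set to the raw reads file named above; only the first few records are read.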
    2 September
    1. Choose one of the adapter cleaning programs and clean the Monpu1.genome.rawReads.fastq data
      • 5 pts - use one of the installed programs available with module load
      • 10 pts - download and use one of the other programs
      • you may do both
    2. Check the success of your cleaning
      • using FastQC
      • using grep (a file with the adapter sequences is in the NGS data cleaning directory)
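The grep check can also be sketched in Python, which makes it easy to compare the raw and cleaned data side by side. This is only an illustration: the function name `count_adapter_hits`, the `match_len` parameter, and the adapter shown are hypothetical (the real adapter sequences are in the file mentioned above). The default of 14 bases follows the exact-match length requested in the 9 September posting instructions. Like a plain grep, it searches the forward strand only.

```python
def count_adapter_hits(sequences, adapters, match_len=14):
    """Count reads that contain the first match_len bases of each adapter.

    sequences: iterable of read sequences (strings)
    adapters:  dict mapping an adapter name to its sequence
    Returns a dict mapping adapter name to the number of reads with a hit.
    """
    probes = {name: seq[:match_len].upper() for name, seq in adapters.items()}
    counts = {name: 0 for name in adapters}
    for read in sequences:
        read = read.upper()
        for name, probe in probes.items():
            if probe in read:  # exact substring match, forward strand only
                counts[name] += 1
    return counts


# Placeholder adapter, NOT one of the real adapter sequences.
adapters = {"adapterA": "AAAACCCCGGGGTTTTAA"}
raw = ["TTAAAACCCCGGGGTTTTAA", "ACGTACGTACGTACGTACGT"]
cleaned = ["TT", "ACGTACGTACGTACGTACGT"]

print("raw:    ", count_adapter_hits(raw, adapters))
print("cleaned:", count_adapter_hits(cleaned, adapters))
```

A successful cleaning run should drive the counts for the cleaned data toward zero. Remaining hits may indicate partial adapters or adapters present in reverse-complement orientation, which this sketch does not check.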
    9 September
    1. Use a mapping program (I suggest bowtie2) to check your reads for contamination. One obvious contaminant could be mitochondrial DNA, but others could be bacterial, viral, or human, to name a few.
    2. Look at the available DNA assembly programs, read about them, and choose one you would like to try on our data. On Thursday, you should be ready to begin assembly.
    3. Post the details of your best cleaned data set on the data cleaning page. Provide the number of reads, the number of bases, and the counts of hits to each adapter, using a 14-base exact match to each of the four adapter sequences as determined with grep. If contaminating sequences have been removed, list how many reads were removed.
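The read and base totals, and the number of reads that mapped to a contaminant, can be tallied with a short script. The sketch below assumes four-line FASTQ records and a SAM file such as the one a mapper like bowtie2 writes; in SAM, FLAG bit 0x4 marks an unmapped read, and bits 0x100 and 0x800 mark secondary and supplementary alignments. The function names and the example records are made up for illustration.

```python
def fastq_totals(lines):
    """Return (number_of_reads, number_of_bases) for four-line FASTQ records."""
    reads = bases = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:  # the sequence is the second line of each record
            reads += 1
            bases += len(line.strip())
    return reads, bases


def count_mapped(sam_lines):
    """Count primary alignment records in SAM text that are mapped."""
    mapped = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # blank or malformed line
        flag = int(fields[1])
        # Skip unmapped (0x4), secondary (0x100), and supplementary (0x800).
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        mapped += 1
    return mapped


# Made-up example data.
fastq = [
    "@read1\n", "ACGTACGT\n", "+\n", "IIIIIIII\n",
    "@read2\n", "GGGTTTAA\n", "+\n", "HHHHHHHH\n",
]
sam = [
    "@HD\tVN:1.0\tSO:unsorted\n",
    "read1\t0\tmito\t1\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tGGGTTTAA\tHHHHHHHH\n",
]

reads, bases = fastq_totals(fastq)
print("reads:", reads, "bases:", bases)
print("reads mapped to contaminant:", count_mapped(sam))
```

With paired-end data each mate appears as its own SAM record, so the mapped count is per record rather than per pair. Whether a mapped read is actually removed is a separate decision; this only counts candidates.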
    30 October

    Annotation

    1. Find a partner and form a two person group.
    2. Each group should annotate at least 50 genes.
    3. Choose one or more contigs to edit and note them on the annotation page. If you plan to do more than one contig, I suggest you finish the first one before reserving a second one.
    4. What is an annotation?
    • Is the gene model correct? Locations of exons, introns, TSS, polyA site, UTRs, start codon
    • Is the predicted protein sequence correct?
    • Is there a BLAST match good enough to assign an exact or approximate function (putative or possible)? Or is the sequence merely like a known protein?
    • Or is it conserved but of unknown function (conserved hypothetical protein)?
    • Are there functional domains? For example, InterPro, Pfam, PROSITE, PANTHER, etc.
    • Can you annotate the gene with GO terms?