    Sequence 205 Annotation

     

    Sequence Composition

    Sequence 205 is a 400,000 base pair fragment of the maize (Zea mays) genome. As determined by Artemis, it is composed of 26.49% adenine, 26.27% thymine, 23.46% cytosine, and 23.06% guanine, with 0.92% unidentified bases.  Analysis of the sequence composition with the EMBOSS program Compseq indicated a relatively low frequency of the dinucleotides GC and CG, at 0.88 and 0.72 times the expected frequency, respectively.  Regions with a high CG dinucleotide content, known as CpG islands, are often indicative of genes, because CpG islands tend to remain unmethylated.  AT and TA occurred at 1.1 and 0.90 times the expected frequency, and AC and CA at 0.87 and 1.06 times the expected frequency.  For more information regarding the composition of the sequence, please see Supplemental Figure S1.
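    The observed/expected ratios quoted above can be illustrated with a minimal sketch. It assumes the expected frequency of a dinucleotide is the product of its two single-base frequencies; Compseq's own expected values depend on how it was run, so this is not necessarily the calculation behind the reported numbers. The function name dinucleotide_ratios is hypothetical.

```python
from collections import Counter

BASES = "ACGT"

def dinucleotide_ratios(seq):
    """Return observed/expected ratios for all 16 dinucleotides.

    Assumption: expected frequency = freq(first base) * freq(second base).
    Positions containing anything other than A, C, G, T (e.g. N) are skipped.
    """
    seq = seq.upper()
    base_counts = Counter(b for b in seq if b in BASES)
    total_bases = sum(base_counts.values())
    if total_bases == 0:
        return {}

    # Count overlapping dinucleotides made only of identified bases.
    pair_counts = Counter()
    for i in range(len(seq) - 1):
        a, b = seq[i], seq[i + 1]
        if a in BASES and b in BASES:
            pair_counts[a + b] += 1
    total_pairs = sum(pair_counts.values())

    ratios = {}
    for a in BASES:
        for b in BASES:
            expected = (base_counts[a] / total_bases) * (base_counts[b] / total_bases)
            observed = pair_counts[a + b] / total_pairs if total_pairs else 0.0
            ratios[a + b] = observed / expected if expected else float("nan")
    return ratios
```

    A ratio below 1 (such as the 0.72 reported for CG) means the dinucleotide occurs less often than the base composition alone would predict.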

    Figure 1: CpG Island Plot

    cpgplot.txt.png

    Analysis with RepeatMasker revealed that Sequence 205 contains 138 retroelements (70.42%), 44 transposons (7.99%), and 2 simple repeats.  High retrotransposon content is expected for maize, whose genome typically consists of at least 60% retrotransposons.  The RepeatMasker output is located in Supplemental Figure S2.  BLAST analysis of the retrotransposons showed close similarity to Cinful1, Zeon1, gag-pol precursors, and gypsy-type transposable elements.  Sequences not masked by RepeatMasker were analyzed with GeneMark and FGENESH.
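    As a rough illustration of how the unmasked regions could be pulled out of a masked sequence, the sketch below assumes hard masking, where repeats are replaced by N (or X) characters. The helper name unmasked_segments and the min_len parameter are hypothetical, and runs of unidentified bases already present in the sequence would be treated as masked too.

```python
import re

def unmasked_segments(masked_seq, min_len=1):
    """Yield (start, end, segment) for each run of unmasked bases.

    Coordinates are 1-based and inclusive. Assumption: masked positions
    appear as N or X (upper or lower case).
    """
    for match in re.finditer(r"[^NXnx]+", masked_seq):
        if match.end() - match.start() >= min_len:
            yield match.start() + 1, match.end(), match.group()
```

    Each yielded segment could then be submitted to a gene predictor, with the start coordinate used to map predictions back onto the full sequence.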

     

     

    Predicted Proteins

    Genes were predicted with GeneMark and FGENESH: GeneMark yielded 17 predicted proteins and FGENESH yielded 4. Three of the four FGENESH predictions matched GeneMark predictions; the fourth matched none. BLAST searches suggest that a polygalacturonase gene and a triacylglycerol lipase gene are present, each with an E value of 0 for the FGENESH prediction (highlighted in yellow). Another predicted protein, RNA-binding protein 45 (RBP45), shows fairly strong similarity to proteins in A. thaliana and N. plumbaginifolia (E=3e-36 and E=4e-36, respectively; highlighted in light blue, needs a closer look).  BLAST outputs can be accessed by clicking on the highlighted Gene IDs.

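    The table below summarizes the gene predictions and their BLAST results. Where both programs predicted the same gene, values are listed in the order G, F: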

     

    Gene ID | Start (From First Exon) | End (From Last Exon) | Strand Orientation | # of Exons | Predictor (F=FGENESH, G=GeneMark) | BLAST Conclusion
    --- | --- | --- | --- | --- | --- | ---
    205-1 | 14082 (13925) | 11242 / 11242 | - / - | 5 / 5 | G / F | G: Polygalacturonase (Z. mays) → E=4e-122; Polygalacturonase, putative (R. communis) → E=7e-61; Polygalacturonase, putative (A. thaliana) → E=4e-60. F: Polygalacturonase (Z. mays) → E=0
    205-2 | 27678 | 28107 | + | 2 | G | No match.
    205-3 | 62420 | 62600 | + | 2 | G | Could be part of 3,2-trans-enoyl-CoA Isomerase or dodecenoyl CoA Isomerase (E>2), but probably not a match at all.
    205-4 | 95031 / 95031 | 94497 / 94497 | - / - | 2 / 2 | G / F | G: Some matches with A. thaliana (E>0.50), but probably no match. F: Some matches with Z. mays (E>1), no match.
    205-5 | 95040 | 95456 | + | 1 | G | No match.
    205-6 | 112856 | 113050 | + | 2 | G | No match.
    205-7 | 146879 | 146349 | - | 2 | G | Many close matches, but theme and function seem to be RNA-binding. Lots of unknown/hypothetical proteins with Z. mays, V. vinifera, and many other plant genomes. RNA-binding protein 45 (RBP45) (A. thaliana) → E=3e-36; RNA-binding protein 45 (N. plumbaginifolia) → E=4e-36; RNA-binding protein 45, putative (O. sativa Japonica Group) → E=1e-34
    205-8 | 183145 | 182595 | - | 3 | G | No match.
    205-9 | 228585 | 229283 | + | 3 | G | No match.
    205-10 | 235188 | 234815 | - | 2 | G | No match.
    205-11 | 271957 | 272117 | + | 2 | G | All matches with E>9, no match.
    205-12 | 321727 | 322098 | + | 3 | G | No match.
    205-FGENESH | 357006 | 355265 | - | 3 | F | No match.
    205-13 | 367145 | 366129 | - | 3 | G | No match.
    205-14 | 368507 | 368581 | + | 2 | G | No match.
    205-15 | 374792 | 372824 | - | 8 | G | No match.
    205-16 | 376826 | 377079 | + | 2 | G | No match.
    205-17 | 381037 (381052) | 388385 / 388385 | + / + | 5 / 7 | G / F | G: Triacylglycerol lipase (Z. mays) → E=7e-118; Triacylglycerol lipase, putative (R. communis) → E=2e-39; Lipase class 3 family protein themed. F: Triacylglycerol lipase (Z. mays) → E=0
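    The pairing of FGENESH predictions with GeneMark predictions can be sketched as a same-strand coordinate overlap test. This is only an illustration, not necessarily how the matches above were decided. The coordinates are taken from the table; the assumption that the parenthesized start belongs to FGENESH, the labels F-a to F-d, and all function names are hypothetical.

```python
def to_interval(start, end):
    """Return (low, high, strand); start > end is read as the reverse strand."""
    strand = "-" if start > end else "+"
    return min(start, end), max(start, end), strand

def overlaps(a, b):
    """True if two intervals are on the same strand and share at least one base."""
    return a[2] == b[2] and a[0] <= b[1] and b[0] <= a[1]

def match_predictions(fgenesh, genemark):
    """Map each FGENESH prediction to the GeneMark predictions it overlaps."""
    matches = {}
    for f_id, f_coords in fgenesh.items():
        f_iv = to_interval(*f_coords)
        matches[f_id] = [
            g_id for g_id, g_coords in genemark.items()
            if overlaps(f_iv, to_interval(*g_coords))
        ]
    return matches

# (start, end) pairs from the table; a subset of GeneMark rows is shown.
GENEMARK = {
    "205-1": (14082, 11242),
    "205-4": (95031, 94497),
    "205-5": (95040, 95456),
    "205-13": (367145, 366129),
    "205-15": (374792, 372824),
    "205-17": (381037, 388385),
}
# Assumed FGENESH coordinates; the F-* labels are made up for this example.
FGENESH = {
    "F-a": (13925, 11242),
    "F-b": (95031, 94497),
    "F-c": (357006, 355265),
    "F-d": (381052, 388385),
}

matches = match_predictions(FGENESH, GENEMARK)
```

    Under these assumptions, three FGENESH predictions overlap a GeneMark prediction and one (the 357006 to 355265 prediction) overlaps none, consistent with the summary given before the table.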

     

    The polygalacturonase and triacylglycerol lipase gene annotations were both verified in Artemis by checking splice junctions and start/stop codons. Exons were mapped as well, using the FGENESH start and stop positions for each gene and its exons. The DNAPlotter figure below contains five circles, numbered from 1 (outermost) to 5 (innermost). Circle 1 is the nucleotide sequence of seq205. Circle 2 shows the two genes found through BLAST analysis (marked in green); they can be told apart by their locations in the sequence. Circle 3 shows the exons of each gene (marked in yellow). Circle 4 is a plot of G+C content in a 10 kb window; gold indicates above-average G+C content and purple indicates below-average G+C content. Circle 5 is a plot of GC skew ([G-C]/[G+C]) in a 10 kb window; gold indicates more C than G and purple indicates more G than C.
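    For readers who want to reproduce the two inner plots numerically, the following minimal sketch computes G+C content and GC skew per window. It assumes GC skew is (G-C)/(G+C) and uses non-overlapping 10 kb windows by default; DNAPlotter's actual step size and averaging are not stated here, so the numbers may differ from the figure. The function name gc_windows is hypothetical.

```python
def gc_windows(seq, window=10_000, step=None):
    """Return a list of (start, gc_content, gc_skew) tuples, one per window.

    start is 1-based. gc_content is (G+C)/(A+C+G+T) and gc_skew is
    (G-C)/(G+C); both are reported as 0.0 when the denominator is zero.
    """
    step = step or window
    seq = seq.upper()
    results = []
    for start in range(0, max(len(seq) - window + 1, 1), step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        a, t = chunk.count("A"), chunk.count("T")
        identified = a + c + g + t
        gc_content = (g + c) / identified if identified else 0.0
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start + 1, gc_content, gc_skew))
    return results
```

    Comparing each window's values with the sequence-wide average gives the above-average and below-average regions that the figure colors gold and purple.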
     

    205 GC and genes.jpg

     

    For more information regarding our putative genes, please click on the following links:

    Seq205-01

    Seq205-17

    Other Putative Genes

     

     

    Links of Interest

    Works Cited

    Supplemental  Information

     

    Annotator Contact Information: 

    Sara Reagan: reagan0@purdue.edu

    Jeff Grabowski: jgrabows@purdue.edu

    Viewing 15 of 19 comments: view all
    Hey I'm working on doing a Blast2Go analysis on our sequence. I'll post what I find.

    Scratch that...seems like a simple BLAST should suffice with this sequence. edited 11:52, 23 Sep 2010
    Posted 11:39, 20 Sep 2010
    I've moved our outline and rough stuff to this page. I figure we can outline what we find, then write the annotation together at a later date.
    Posted 10:25, 21 Sep 2010
    When I run FGENESH, I get a pretty big file. When I compared this to 190 annotation, we have a ton more. However, when I put seq190 into FGENESH, I do not get the same output as the professors did. Maybe I did something wrong.
    Posted 11:16, 21 Sep 2010
    Loaded repeat masker output into FGENESH. I think that fixed the problem of too much info.
    Posted 13:00, 21 Sep 2010
    Uploaded xls spreadsheet with accession numbers of the top hits in BLAST. Note that 2 have >6 hits (though one is mostly from rice). Check it out. One hit looks like a polygalacturonase, and the other looks like triacylglycerol lipase. However, I did pull some stuff out with seq205 (no repeat masker), through FGENESH. Some hits were transposons....some were fragments of other things, like a "Cinful1 polyprotein" and a putative gag-pol precursor -orf of Z. mays.
    Posted 13:19, 21 Sep 2010
    May need to repeat the "repeat masker". Links on pdf are not working. They worked fine before....
    Posted 14:35, 28 Sep 2010
    Rerunning repeat masker.

    -sara
    Posted 16:57, 4 Oct 2010
    Was rummaging through Artemis today with our sequence, and created a pretty neat, organized, and straightforward figure with GC content. Gold around the circle is GC content above average and the purple is GC content below average. Gives a regional perspective.
    Posted 23:34, 4 Oct 2010
    nice!

    I'll attach the pdfs from repeat masker.
    Posted 08:28, 5 Oct 2010
    Ran FGENESH and Genemark. Have not tried Glimmer or GENSCAN yet.
    Posted 08:37, 5 Oct 2010
    I've attached the BLAST using FGENESH outputs of putative proteins.
    Posted 11:06, 5 Oct 2010
    Awesome work, partner!
    Posted 17:34, 14 Oct 2010
    Check out the first amino acid sequence given by the genemark using the whole 205 sequence. I've attached the BLAST file from it, and it looks like there might be a rust resistance gene in there!!!!
    Posted 20:31, 24 Oct 2010
    Cool, but E value is quite high. We'll have to look at it more.
    Posted 16:17, 2 Nov 2010
    Nice work, Jeff. Can you figure out how to paste the RepeatMasker output to the Supplemental page? I'm having issues for some reason. I'll proceed to hyperlink stuff. If we can't get it on the page, then I'll just link it to the pdf.
    Posted 15:28, 30 Nov 2010