

    Seq191 Supplemental Information


    Sequence Architecture

    The first reference genome sequence of maize has been released, and the next step in its analysis is annotation of the reference genome. The sequence we are annotating, sequence 191, contains 400,000 bp located on the short arm of chromosome 10. It has a relatively high A/T content (56%). The two-bp words AA and TT occur more often than others, at 1.46 and 1.39 times the average frequency, respectively. As expected, GC and CG have the smallest ratios; however, AC and GT are also under-represented in the sequence (0.84 and 0.85, respectively). A CpG plot of sequence 191 (Figure 1) shows GC-rich islands, including two large, extended islands: one within the first 20,000 bp and the other near the 320,000 bp mark.
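    The text does not state exactly how the two-bp word ratios or the CpG plot were computed. As a rough illustration only, the sketch below assumes the common observed/expected definition f(XY) / (f(X) * f(Y)) for word ratios, which may differ from the "average" baseline used above. It also assumes a sliding-window CpG test whose window size (100 bp) and thresholds (obs/exp >= 0.6, GC >= 50%) are cpgplot-style assumptions, not necessarily the settings behind Figure 1. All function names are hypothetical.

```python
from collections import Counter

BASES = "ACGT"

def base_composition(seq: str) -> dict:
    """Fraction of each nucleotide, counting only A, C, G and T (N and other symbols are skipped)."""
    counts = Counter(b for b in seq.upper() if b in BASES)
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in BASES}

def dinucleotide_ratios(seq: str) -> dict:
    """Assumed observed/expected ratio for each two-bp word: f(XY) / (f(X) * f(Y))."""
    seq = seq.upper()
    mono = base_composition(seq)
    # Count overlapping two-bp words whose two positions are both unambiguous bases.
    pairs = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_pairs = sum(pairs.values())
    ratios = {}
    for x in BASES:
        for y in BASES:
            expected = mono[x] * mono[y]
            observed = pairs[x + y] / n_pairs if n_pairs else 0.0
            ratios[x + y] = observed / expected if expected else 0.0
    return ratios

def cpg_windows(seq: str, window: int = 100, min_oe: float = 0.6, min_gc: float = 0.5) -> list:
    """0-based start positions of windows whose CpG obs/exp and GC fraction pass the (assumed) thresholds."""
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        w = seq[start:start + window]
        c, g = w.count("C"), w.count("G")
        expected_cpg = c * g / window      # CpG count expected from C and G content alone
        observed_cpg = w.count("CG")       # "CG" cannot overlap itself, so str.count is exact
        oe = observed_cpg / expected_cpg if expected_cpg else 0.0
        if oe >= min_oe and (c + g) / window >= min_gc:
            hits.append(start)
    return hits
```

    Re-slicing every window keeps the logic easy to read but is slow on a 400,000 bp sequence; a practical version would update C, G and CpG counts incrementally as the window slides.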


    The higher AA and TT word content can be explained by the common occurrence of repeat elements within the sequence. RepeatMasker masked 80.3% of Seq 191. Most of the masking (78.15%) was due to retroelements (Figure 2), especially the LTR elements Copia (44.69%) and Gypsy (31.90%). DNA transposons made up 1.87% of the sequence, and low-complexity and simple repeats accounted for the remaining 0.27%. Unmasked regions of the sequence were located in islands among the masked regions, as illustrated in Figure 3: the bottom plot represents the masked regions of the sequence and the top plot represents the unmasked regions. This suggests that most of the genes in sequence 191 lie within these unmasked islands, either clustered in gene-rich areas or isolated among the repeats.
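    The masked percentage and the unmasked islands in Figure 3 can be recovered from a masked sequence. This minimal sketch assumes a hard-masked sequence in which repeats are replaced by N (RepeatMasker's default) or X; soft-masked (lowercase) output would need a different test, and any N gaps already present in the assembly would also be counted as masked. The function names and the min_len parameter are hypothetical.

```python
import re

def masked_fraction(masked_seq: str) -> float:
    """Fraction of positions replaced by N or X in a hard-masked sequence."""
    if not masked_seq:
        return 0.0
    n_masked = sum(1 for b in masked_seq.upper() if b in "NX")
    return n_masked / len(masked_seq)

def unmasked_islands(masked_seq: str, min_len: int = 1) -> list:
    """(start, end) of each unmasked run, 0-based and half-open, at least min_len bp long."""
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[^NnXx]+", masked_seq)
        if m.end() - m.start() >= min_len
    ]
```

    Raising min_len filters out short unmasked fragments between adjacent repeats, leaving the larger islands where gene predictions are most likely to fall.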


    Gene Modeling

    Gene modeling was conducted on the unmasked regions of sequence 191 using FGENESH and GeneMark. FGENESH predicted 9 genes, the ninth of which is incomplete due to sequence truncation. The number of exons per gene ranges from 1 to 13, and exon lengths range from 6 bp to 744 bp (FGENESH Gene Models for 191Masked.pdf). GeneMark predicted 14 genes, the last of which is likewise incomplete due to sequence truncation. The number of exons per gene ranges from 2 to 11, and exon lengths range from 6 bp to 746 bp (GeneMark Gene Prediction.docx). Seven of the gene predictions were identified by both programs, as shown in Figure 4. However, all 7 shared predictions differed between the two programs in gene length, exon number, or exon length. These differences may reflect the different prediction algorithms, or they may point to alternative splice sites within these gene models. Both programs predicted genes on both the positive and negative strands. Table 1 lists specific information for each gene model from both programs. GeneMark tends to predict shorter genes: 6 of the 7 genes predicted only by GeneMark are shorter than 600 bp.
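    The text does not say how a prediction was judged to be "shared" between FGENESH and GeneMark. A minimal sketch, assuming two models match when they overlap by at least one base on the same strand; the GeneModel class and the helper functions are hypothetical, and the coordinates in any usage would be placeholders rather than the real models in Table 1.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GeneModel:
    gene_id: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"

def overlaps(a: GeneModel, b: GeneModel) -> bool:
    """True if two models are on the same strand and share at least one base."""
    return a.strand == b.strand and a.start <= b.end and b.start <= a.end

def shared_predictions(models_a, models_b) -> list:
    """All (a, b) pairs of overlapping models from the two programs."""
    return [(a, b) for a in models_a for b in models_b if overlaps(a, b)]

def combined_count(models_a, models_b) -> int:
    """Distinct loci, assuming each shared prediction is a one-to-one match."""
    return len(models_a) + len(models_b) - len(shared_predictions(models_a, models_b))
```

    With the counts reported above, 9 FGENESH models + 14 GeneMark models - 7 shared = 16 distinct gene models, which matches the combined total used in the protein analysis. If one model from one program overlapped two from the other, the one-to-one assumption in combined_count would no longer hold.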


    Protein Prediction

    The predicted amino acid sequence of each gene model from FGENESH and GeneMark was queried against all available sequences in the NCBI database using the basic local alignment search tool for proteins (BLASTp) on the NCBI website. Most of the protein sequences predicted for sequence 191 aligned most significantly with predicted proteins, especially predicted proteins from Sorghum bicolor. The most significant alignment (lowest E-value) for each gene model is listed in Table 1.
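    Because a lower E-value means a more significant alignment, the top hit for each gene model is the one with the smallest E-value. A minimal sketch of selecting it, assuming BLAST+ tabular output (-outfmt 6) with its default 12 columns; ties are broken here by the higher bit score, which is a choice made for this sketch. The function name is hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def best_hits(tabular_text: str) -> dict:
    """Map each query ID to its hit with the lowest E-value (higher bit score breaks ties)."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        if current is None or (hit["evalue"], -hit["bitscore"]) < (current["evalue"], -current["bitscore"]):
            best[hit["qseqid"]] = hit
    return best
```

    Sorting on the E-value alone would be enough in most cases; the bit score tie-break matters when several hits share an E-value of zero, as reported for 191c and 191p.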


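    [Table 1: Gene models predicted by FGENESH and GeneMark for sequence 191, with the most significant BLASTp alignment for each]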


    Of the 16 combined gene models from both prediction programs, only four yielded identifiable domains or motifs in BLASTp or InterProScan searches. These gene models (listed by ID from Table 1) are described below.
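    Tallying which gene models carry any domain or motif hit is a simple grouping step. This sketch assumes InterProScan 5 tab-separated output, in which column 1 is the protein accession, column 4 the analysis (member database), column 5 the signature accession and column 6 the signature description. BLASTp conserved-domain hits would have to be merged in separately. The function names are hypothetical.

```python
import csv
import io
from collections import defaultdict

def domains_by_protein(tsv_text: str) -> dict:
    """Map each protein ID to the set of 'analysis:signature description' hits."""
    hits = defaultdict(set)
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) < 6:
            continue  # skip blank or truncated lines
        protein_id, analysis, sig_acc, sig_desc = row[0], row[3], row[4], row[5]
        hits[protein_id].add(f"{analysis}:{sig_acc} {sig_desc}")
    return dict(hits)

def models_with_domains(all_ids, tsv_text: str) -> list:
    """Sorted IDs of the gene models that have at least one domain or motif hit."""
    hits = domains_by_protein(tsv_text)
    return sorted(pid for pid in all_ids if hits.get(pid))
```

    Passing the full list of 16 model IDs makes the models without any hit easy to see, since they simply drop out of the result.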



    The BLASTp search using the predicted protein sequence of 191a returned specific hits to the crotonase-like family, the crotonase-like superfamily, and the PRK05617 multi-domain. A protein motif search with InterProScan identified an enoyl-CoA hydratase motif. Enoyl-CoA hydratase, also known as crotonase, is involved in fatty acid metabolism. Several of the most significant BLASTp alignments were to hydroxyisobutyryl-CoA hydrolase genes in a range of species, including Zea mays, Oryza sativa, and Glycine max.



    The BLASTp search of the predicted protein sequence of 191b retrieved no putative domains, and most alignments were to hypothetical proteins of Sorghum bicolor. However, a protein motif search with InterProScan identified a zinc finger motif (ZnF_C2HC). Zinc fingers are involved in binding DNA, RNA, proteins, or lipid substrates, which suggests that this gene could belong to a family of transcription factors.
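    InterProScan detects ZnF_C2HC with a profile model, not a simple pattern, so the regular expression below is only a crude illustration of what the motif looks like. It assumes the CCHC zinc-knuckle consensus commonly written as C-X2-C-X4-H-X4-C, where X is any residue; real hits may deviate from this spacing. The pattern and function names are hypothetical.

```python
import re

# Simplified CCHC zinc-knuckle consensus: C-X2-C-X4-H-X4-C (X = any residue).
# The zero-width lookahead lets overlapping candidate sites be reported.
ZNF_CCHC = re.compile(r"(?=(C.{2}C.{4}H.{4}C))")

def find_zinc_knuckles(protein: str) -> list:
    """Return (0-based start, matched segment) for each consensus match."""
    return [(m.start(), m.group(1)) for m in ZNF_CCHC.finditer(protein.upper())]
```

    A pattern scan like this is useful as a quick sanity check on a predicted protein, but an absent match does not rule out the domain, and a match does not confirm it.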



    The BLASTp search for the protein sequence of 191c returned a 99% alignment, with an E-value of zero, to the maternal protein pumilio of Zea mays. This is strong evidence that 191c is the sequence of the pumilio gene itself. Alignments with slightly lower scores to proteins from Sorghum bicolor and Oryza sativa suggest that these could be orthologs of the Z. mays gene. The gene contains domains from the pumilio family and pumilio superfamily, which are involved in RNA binding in Drosophila and humans.



    The BLASTp query of protein sequence 191p returned a 97% alignment, with an E-value of zero, to a JmjC domain-containing protein in Oryza sativa. Hypothetical proteins with 97-98% coverage in Sorghum bicolor, Vitis vinifera, and Arabidopsis thaliana were also identified. A motif search using InterProScan confirmed the presence of a JmjC domain in 191p. Proteins with the JmjC domain, a member of the cupin metalloenzyme superfamily, are usually involved in chromatin organization. However, JmjC-containing proteins have also been implicated in protecting genes from DNA methylation and RNA silencing in Arabidopsis. Further analysis is needed to determine the function of 191p in Zea mays.
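    Percentages such as "97% alignment" and "97-98% coverage" can mean either percent identity or query coverage, and the text does not always say which. As a sketch of the second, query coverage can be taken as the share of query residues spanned by the union of all high-scoring segment pairs (HSPs). This is an assumed definition that approximates, but is not guaranteed to reproduce, NCBI's "Query cover" value; the function name is hypothetical.

```python
def query_coverage(hsps, query_length: int) -> float:
    """Percent of query residues covered by the union of HSP intervals (1-based, inclusive)."""
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    covered = 0
    cur_start = cur_end = None
    for s, e in sorted((min(s, e), max(s, e)) for s, e in hsps):
        if cur_end is None or s > cur_end + 1:
            # Gap before this HSP: close the current merged interval and start a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = s, e
        else:
            # Overlapping or adjacent HSP: extend the current merged interval.
            cur_end = max(cur_end, e)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / query_length
```

    Merging the intervals first prevents overlapping HSPs from being counted twice, which would otherwise inflate coverage above the true value.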



    Through our annotation of sequence 191 on chromosome 10 in maize, we have narrowed 400,000 bp down to four genes with putative functions. Further molecular techniques will be needed to verify the gene models, discover gene function, and elucidate the phenotypic effects of different alleles. These techniques include ChIP analysis, yeast two-hybrid systems, mutation characterization, cell transformation, and association studies, among others.
