
Gene 3

    Gene Models

    Evidence suggests that alternative splicing occurs. Two models resulting from such splicing are outlined below.

    **There was no expression evidence to support the existence of exons 2, 3, and 4 that were predicted by FGeneSH and GeneMark.

     

    Model 1, 10 exons:

    gene 3 model 1 table.jpg


    Model 2, 12 exons:

    gene 3 model 2 table.jpg


    cDNAs supporting models.png

     

    Coding Sequence

    Click here for coding sequences file

    model 1 cds.png

    gene 3 model 2 cds.jpg

    MaizeGDB model CDS.png

    Exons 1-7 of Models 1 and 2 agree perfectly with the MaizeGDB gene model of lyce1, GRMZM2G012966. Exon 8 is extended in both gene models, and exons 9 and 10 of Model 1 and exons 9, 10, 11, and 12 of Model 2 are additions to what exists in GRMZM2G012966.

    Evidence

    Gene Predictions

    Exon  Strand  FGeneSH Start  FGeneSH End  GeneMark Start  GeneMark End
    1     +       82817          83131        82817           83131
    2     +       83224          83315        83270           83315
    3     +       83394          83460        83420           83460
    4     +       84163          84168        84163           84168
    5     +       84287          84458        84287           84458
    6     +       84568          84703        84568           84703
    7     +       84808          85021        84919           85021
    8     +       85172          85315        85406           85513
    9     +       85406          85513        85611           85682
    10    +       85586          85682        85986           86048
    11    +       85759          85887        87022           87223
    12    +       87029          87508        87302           87396
    13    +       87665          88190        87494           87615
    14    +       88289          89626        87686           88190
    15    +       -              -            88289           89626
               

    There are many similarities in the results from the two prediction programs. Both span the better part of a 10 kb region and share many of the same predicted exon/intron junctions.
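    The overlap between the two predictions can be counted directly from the table above. The following is a minimal sketch that hard-codes the exon coordinates from this page; the variable names are illustrative and not part of either prediction program.

```python
# Exon (start, end) coordinates copied from the gene prediction table on this page.
fgenesh = [
    (82817, 83131), (83224, 83315), (83394, 83460), (84163, 84168),
    (84287, 84458), (84568, 84703), (84808, 85021), (85172, 85315),
    (85406, 85513), (85586, 85682), (85759, 85887), (87029, 87508),
    (87665, 88190), (88289, 89626),
]
genemark = [
    (82817, 83131), (83270, 83315), (83420, 83460), (84163, 84168),
    (84287, 84458), (84568, 84703), (84919, 85021), (85406, 85513),
    (85611, 85682), (85986, 86048), (87022, 87223), (87302, 87396),
    (87494, 87615), (87686, 88190), (88289, 89626),
]

# Exons predicted with identical start and end by both programs.
identical_exons = sorted(set(fgenesh) & set(genemark))

# Boundaries shared even when the other end of the exon differs.
shared_starts = sorted({s for s, _ in fgenesh} & {s for s, _ in genemark})
shared_ends = sorted({e for _, e in fgenesh} & {e for _, e in genemark})

print(len(identical_exons), "identical exons")
print(len(shared_starts), "shared exon starts")
print(len(shared_ends), "shared exon ends")
```

    Exon ends are shared more often than exon starts in this table, so the two programs agree on more donor-side boundaries than acceptor-side ones.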

    The predicted coding sequence and protein sequence (both below) fail to align to lyce1, suggesting that the gene model needs to be corrected. The predicted protein does produce weak alignments (E-value <= 2.53e^-42) to two domains: the DUF3633 superfamily gives the strongest alignment, and various members of the protein kinase-like superfamily give many weaker ones.
     
    See results of blastp here
     
    FGeneSH predicted coding sequence:
    ATGATTAGATTCTCGTTTGCGGCCTGGCCGGCCCTACGCGCTCTGACCAGTTTCCCGCGT
    GACGACCGGCCCGGCAGCTCATCAATTCAATACTGGCGCTCTGGCAGTGGCAGTGGCAGT
    GGCAGTGGCAGCTCAGATTTGCCGAAGCCCAAAACGAAACCAGACGACCAGACCCAGTCC
    CCCAAAACGTCGCAGGCAACACAACAACAAAAGATATTAAAAAGACCAATAATTGGACCA
    TGCAACAGAACCATAAATATGATCACAGGGCCATACAAACTCATTCGACAGTGTGAAGTG
    ACTGCAATTCTTATATTGTACGAGCTGCCAAGGCTGCTAACTGGCTCGATTCTAGCTCAT
    GAGATGGTGCATGCTTACCTCCGACTTACAGATGCGTCGTCGTCATCTTCTACATCTTCC
    TCAATGAAAAAGGGTGCAAAGACGGAATTCGAAAAAAGACTTGGCGAGTTCTTCAAGCAC
    CAAATCGAAACAGATTCTTCTGTGGCTTATGGAGATGGGTTTCGAGCACTACACGTTATT
    CCTTTGCCTGAACGAACAGAATTAGAGGACCCTGATGAAATGCAGAAATGGAAGTGGAGT
    TTAAAGAAGGCAAAGAAAACTAACCGTGAGCTGCATGCTGAGAGATGTGACAGTGAGCTC
    AAACTCTTGATACTGTGGTGCAGTTTATATGCTATTGCTTTGCTGGAATTATACTTGTTT
    CTTTTCTGTAAACATGTTGCTCGTAAATTGAGAGAGAAGGATGGATTTTATTACCCTCAC
    AATTTAGATTTCCGTGGGCGTGCATACCCTATGCATCCTCATTTAAGTCACCTAGGTTCA
    GATCTTTGTCGGGGTGTTCTAGAGTATGCTGAAGGGCGGCCACTAGGAAAGTATGGGTTG
    TGCTGGTTGAAGATACACTTGGCTAACAAATATGGTGGTGGCATTGAAAAGCTTTCTCAC
    GAGGGCAAGCTAGCTTTTGTGGAGAACCAACTGTTTGATATATTTGATTCAGCATCAAAT
    CCTGTTGATGGGAACTGCTGGTGGACGAATGTTGAGGATCCCTTTCAGTGTTTAGCTGCA
    TGCATGGACCTTTCTGATGCCTTAAGAAGTCCATCACCTTACCATGCTGTCTCTCACCTG
    CCTATTCATCAGGATGGTTCATGCAATGGTTTACAACATTATGCCGCTCTTGGAAGGGAC
    TATCAGAAGCTATGCCTTTCAATTAAGAATGGTAGAATTGGTCTAGATGGGAAGGGGCTC
    GCCGGCACGCTCCCGCCCGCCGTCGCCGGGCTCCGCTCCCTCACGGGCCTCTACCTCCAC
    TACAACGCGCTCCGCGGTGGCATCCCGCGGGAGCTGGCGGCGCTCGCCGCGCTCACCGAC
    CTCTACCTCGATGTCAACAACTTCTCGGGGCCCGTCCCGCCGGAGATCGGCGCCATGGCA
    TCACTGCAAGTGGTGCAGCTCTGCTACAACCAGCTCACTGGGAGCATCCCCACCCAGCTG
    GGCAACCTCAGCAGGCTCACCGTGCTGGCCCTGCAGTCCAACCGCCTCAACGGCGCCATC
    CCCGCCAGCCTCGGCGACCTGCCGCTGCTTGCGCGCCTGGATCTCAGCTTCAACCGCCTC
    TTCGGCTCCATCCCCGTCAGGCTCGCGCAGCTGCCCAGCCTTGTCGCGCTCGACGTCAGG
    AACAACTCACTCACTGGCAGCGTGCCCGCCGAACTGGCCGCCAAGTTGCAGGCCGGCTTC
    CAGTACGGGAATAACAGCGACCTGTGCGGCGCGGGGCTGCCCGCGCTCCGTCCGTGCACC
    CCGGCGGACCTCATCGACCCGGACAGGCCGCAGCCCTTCAGCGCTGGCATCGCGCCGCAG
    GTCAGGCCCTCCGACGGGCGCGCGCCATCTACCAGGGCGCTCGCGGCAGTGGTCGTCGTC
    GCCGTGGCCCTCCTCGCGGCCACGGGGGTCGGCCTTCTCGCGCTCTCATGGCGTCGGTGG
    CGCCGACAGAGGGTCGCGGGGGGATCGCCGTCGACGGTCTCCGGCGGCCGGTGCAGCACC
    GAGGCTGCGCCGTCCGCGGCGAAGGCGTCGTCGGCTCGCAAGAGCGCGTCGTCGGCGCTG
    GCGAGCCTCGAGTACTCCAACGCCTGGGACCCGCTGGCCGACGCGCGCGGCGGGCTGGGC
    CTCTTCTCGCAGGACGCGCTGGCACAGAGCCTCCGCATCAGCACGGAGGAGGTGGAGTCC
    GCGACGCGCTACTTCTCGGAGCTCAACCTCCTCGGGAGGCGCGGCAAGAAGGCCGGCGGC
    CTGGCGGGCACGTACAGGGGCACGCTCCGCGACGGCACCTCCGTGGCCGTGAAGCGGCTG
    GGCAAGACGTGCTGCCGCCAGGAGGAGGCCGACTTCCTGAGCGGGCTGAGGCTGCTCGCG
    GAGCTCCGCCACGACAATGTGGTCGCGCTGAGGGGGTTCTGCTGCTCCAGGGCGCGCGGG
    GAGTGCTTCCTCGTCTACGACTTCGTGCCCAACGGCAGCCTGTCGCAGTTCCTCGACGTC
    GACGCCGACAACGCCGGCGGCGGCAGCGGCCGTGTCCTCCAGTGGTCCACGAGGATCTCC
    ATCATCAAGGGCATCGCCAAAGGAATTGAGTATCTGCACAGCACAAGGACGAACAAGCCG
    GCCCTCGTCCACCAAAACATCTCAGCGGACAAGGTCCTGCTGGACTACGCGTACAGGCCC
    CTCATCTCCGGGTGCGGCCTGCACAAGCTCCTCGTGGACGACCTCGTCTTCTCGACGCTC
    AAAGCCAGCGCCGCCATGGGGTACCTGGCGCCGGAGTACACCACCGTGGGCCGCTTCTCG
    GAGAAGAGCGACGTGTACGCGTTCGGGGTCATCGTGTTCCAGGTGCTCACCGGCAAGAGC
    AAGGTGACGACGACGCATGCGCAGCTGCCTGACAACGACGTCGATGAGCTCGTCGACGGC
    AACCTGCAGGGGGATAATTACTCGGCGGCCGAGGCCGCCCAGCTGGCGAAGATCGGCTCG
    GCTTGCACCAGCGAGAACCCGGACCAGAGGCCGACGATGGCGGAGCTGCTCCAAGAACTG
    AGCACCGTCTGA
     
    FGeneSH predicted protein sequence:
    MIRFSFAAWPALRALTSFPRDDRPGSSSIQYWRSGSGSGSGSGSSDLPKPKTKPDDQTQS
    PKTSQATQQQKILKRPIIGPCNRTINMITGPYKLIRQCEVTAILILYELPRLLTGSILAH
    EMVHAYLRLTDASSSSSTSSSMKKGAKTEFEKRLGEFFKHQIETDSSVAYGDGFRALHVI
    PLPERTELEDPDEMQKWKWSLKKAKKTNRELHAERCDSELKLLILWCSLYAIALLELYLF
    LFCKHVARKLREKDGFYYPHNLDFRGRAYPMHPHLSHLGSDLCRGVLEYAEGRPLGKYGL
    CWLKIHLANKYGGGIEKLSHEGKLAFVENQLFDIFDSASNPVDGNCWWTNVEDPFQCLAA
    CMDLSDALRSPSPYHAVSHLPIHQDGSCNGLQHYAALGRDYQKLCLSIKNGRIGLDGKGL
    AGTLPPAVAGLRSLTGLYLHYNALRGGIPRELAALAALTDLYLDVNNFSGPVPPEIGAMA
    SLQVVQLCYNQLTGSIPTQLGNLSRLTVLALQSNRLNGAIPASLGDLPLLARLDLSFNRL
    FGSIPVRLAQLPSLVALDVRNNSLTGSVPAELAAKLQAGFQYGNNSDLCGAGLPALRPCT
    PADLIDPDRPQPFSAGIAPQVRPSDGRAPSTRALAAVVVVAVALLAATGVGLLALSWRRW
    RRQRVAGGSPSTVSGGRCSTEAAPSAAKASSARKSASSALASLEYSNAWDPLADARGGLG
    LFSQDALAQSLRISTEEVESATRYFSELNLLGRRGKKAGGLAGTYRGTLRDGTSVAVKRL
    GKTCCRQEEADFLSGLRLLAELRHDNVVALRGFCCSRARGECFLVYDFVPNGSLSQFLDV
    DADNAGGGSGRVLQWSTRISIIKGIAKGIEYLHSTRTNKPALVHQNISADKVLLDYAYRP
    LISGCGLHKLLVDDLVFSTLKASAAMGYLAPEYTTVGRFSEKSDVYAFGVIVFQVLTGKS
    KVTTTHAQLPDNDVDELVDGNLQGDNYSAAEAAQLAKIGSACTSENPDQRPTMAELLQEL
    STV
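    The protein sequence above is the translation of the coding sequence in reading frame 1. As a quick consistency check, the first line of the coding sequence can be translated with the standard genetic code. This is a minimal sketch; the function and table names are illustrative, and a library such as Biopython could be used instead.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(cds):
    """Translate a DNA coding sequence in frame 1, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# First and last lines of the FGeneSH predicted coding sequence on this page.
first_line = "ATGATTAGATTCTCGTTTGCGGCCTGGCCGGCCCTACGCGCTCTGACCAGTTTCCCGCGT"
last_line = "AGCACCGTCTGA"

print(translate(first_line))  # MIRFSFAAWPALRALTSFPR
print(translate(last_line))   # STV
```

    Both outputs match the start and end of the predicted protein sequence listed above.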

     

    GeneMark originally predicted two different genes (exons 1-10 and exons 11-15), but the cDNA evidence suggests that the candidate gene lycopene epsilon cyclase 1 (lyce1, GRMZM2G012966) spans the region, as there are multiple high-confidence hits to it. A dot plot was made to check for a tandem repeat; none was found.

    dot plot of Gene 3.png
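    The page does not say which tool produced the dot plot. The underlying idea can be sketched as a word-match comparison of a sequence against itself: matches off the main diagonal indicate repeated segments, and a tandem repeat appears as a line parallel to the diagonal. The function names and the toy sequence below are illustrative only.

```python
def kmer_matches(seq_a, seq_b, k):
    """Return all (i, j) positions where the k-mer at seq_a[i] equals the k-mer at seq_b[j]."""
    index = {}
    for j in range(len(seq_b) - k + 1):
        index.setdefault(seq_b[j:j + k], []).append(j)
    matches = []
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], []):
            matches.append((i, j))
    return matches

def off_diagonal(matches):
    """Keep only matches above the main diagonal (i < j), which indicate repeats in a self-comparison."""
    return sorted((i, j) for i, j in matches if i < j)

# Toy example: "ACGT" repeated in tandem gives one off-diagonal match at k = 4.
toy = "ACGTACGT"
print(off_diagonal(kmer_matches(toy, toy, 4)))  # [(0, 4)]

# A sequence with no repeated 4-mer gives no off-diagonal matches.
print(off_diagonal(kmer_matches("ACGTTGCA", "ACGTTGCA", 4)))  # []
```

    For a real genomic region a larger word size would be used to suppress chance matches; an empty off-diagonal result corresponds to the negative outcome reported above.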

    Expression

    MaizeGDB EST/cDNA

    est and cDNA from MaizeGDB.png

     

    mRNA from NCBI blastn against nucleotide db

    Click here for full results

    Summary of Results:

    mRNA evidence
    ID                  Start  End   Associated Predicted Exon(s) (FGSH/GM)  % Match  E.value

    gb|BT037027.1; GENE ID: 100216601 LOC100216601
    Gene3:cDNA1.14/15   5892   7376  14 and 15                               99       0
    Gene3:cDNA1.13/14   5191   5794  13 and 14                               100      0

    gb|BT067056.1|
    Gene3:cDNA2.14/15   5892   7386  14 and 15                               93       0
    Gene3:cDNA2.13/14   5290   5794  13 and 14                               88       2e^-164
    Gene3:cDNA2.0/0     4448   4817  none and none                           87       6e^-110
    Gene3:cDNA2.0/13    5097   5218  none and 13                             94       e^-42
    Gene3:cDNA2.0/12    4905   5007  none and 12                             95       e^-35

    gb|BT063754.1|; GENE ID: 100280448 lyce1
    Gene3:cDNA3.11/0    3360   4245  11 and none                             100      0
    Gene3:cDNA3.1/1     329    734   1 and 1                                 100      0
    Gene3:cDNA3.7/7     2409   2626  7 and 7                                 100      e^-107
    Gene3:cDNA3.5/5     1888   2061  5 and 5                                 100      3e^-83
    Gene3:cDNA3.8/0     2773   2919  8 and none                              100      3e^-68
    Gene3:cDNA3.6/6     2169   2306  6 and 6                                 100      3e^-63
    Gene3:cDNA3.9/8     3007   3117  9 and 8                                 100      3e^-48
    Gene3:cDNA3.10/9    3187   3289  10 and 9                                99       4e^-42

    gb|EU924262.1|; lcyE-W22 allele
    Gene3:cDNA4.1/1     281    734   1 and 1                                 99       0
    Gene3:cDNA4.7/7     2409   2626  7 and 7                                 100      e^-107
    Gene3:cDNA4.0/0     3828   4017  none and none                           100      4e^-92
    Gene3:cDNA4.5/5     1888   2061  5 and 5                                 100      3e^-83
    Gene3:cDNA4.8/0     2773   2919  8 and none                              100      3e^-68
    Gene3:cDNA4.6/6     2169   2306  6 and 6                                 100      3e^-63
    Gene3:cDNA4.0/10    3588   3722  none and 10                             100      e^-61
    Gene3:cDNA4.11/0    3360   3490  11 and none                             100      2e^-59
    Gene3:cDNA4.9/8     3007   3117  9 and 8                                 100      3e^-48
    Gene3:cDNA4.10/9    3187   3289  10 and 9                                99       4e^-42

    GENE ID: 100216601 LOC100216601 = lycopene epsilon cyclase1 [Zea mays]
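    The hits for gb|BT063754.1 (annotated as lyce1) in the summary table can be ordered along the query to see where the cDNA leaves gaps. The sketch below hard-codes the start and end values from that table; the names are illustrative, and the gaps are simply the unmatched stretches between consecutive hits.

```python
# (FGeneSH exon label, start, end) for the gb|BT063754.1 hits in the summary table.
cdna3_hits = [
    ("11", 3360, 4245), ("1", 329, 734), ("7", 2409, 2626), ("5", 1888, 2061),
    ("8", 2773, 2919), ("6", 2169, 2306), ("9", 3007, 3117), ("10", 3187, 3289),
]

# Order the hits by their start position on the query.
ordered = sorted(cdna3_hits, key=lambda hit: hit[1])
exon_order = [label for label, _, _ in ordered]

# Number of unmatched bases between each pair of consecutive hits.
gaps = [
    nxt[1] - prev[2] - 1
    for prev, nxt in zip(ordered, ordered[1:])
]

print(exon_order)  # ['1', '5', '6', '7', '8', '9', '10', '11']
print(gaps)        # [1153, 107, 102, 146, 87, 69, 70]
```

    The hits jump from predicted exon 1 straight to predicted exon 5, with a single 1153-base gap between them. This is consistent with the note at the top of the page that no expression evidence supports predicted exons 2, 3, and 4.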

    EST from NCBI

    Click here for blastn results of gene 3 region using NCBI's EST database. A fair number of ESTs span the region and support the existence of an expressed gene consistent with gene models 1 and 2.

    Protein

    Predicted protein sequence of GRMZM2G012966 from MaizeGDB

    >lcl|GRMZM2G012966_P03 seq=translation; coord=8:138882594..138889638:1; parent_transcript=GRMZM2G012966_T03; parent_gene=GRMZM2G012966
    MGLSGATISAPLGCCVLRCGAVGGGKALKADAERWRRAGWSRRVGGPKVRCVATEKHDETAAVGAAVGVDFADEEDYRKG
    GGGELLYVQMQSTKPMESQSKIASKLSPISDENTVLDLVIIGCGPAGLSLASESAKKGLTVGLIGPDLPFTNNYGVWEDE
    FKDLGLESCIEHVWKDTIVYLDNNKPILIGRSYGRVHRDLLHEELLKRCYEAGVTYLNSKVDKIIESPDGHRVVCCDKGR
    EIICRLAIVASGAASGRLLEYEVGGPRVCVQTAYGVEVEVENNPYDPSLMVFMDYRDCFKEEFSHTEQENPTFLYAMPMS
    PTRVFFEETCLASKDAMSFDLLKKRLMYRLNAMGIRILKVYEEEWSYIPVGGSLPNTDQKNLAFGAAASMVHPATGYSVV
    RSLSEAPRYASVISDILGNRVPAEYMLGNSQNYSPSMLAWRTLWPQERKRQRSFFLFGLALIIQLNNEGIQTFFEAFFRV
    PRWMWRGFLGSTLSSVDLILFSFYMFAIAPNQLRMNLVRHLLSDPTGSSMIKTYLTL

    Blastx of Model 1 CDS

    gene 3 model 1 blastx.png

    When expanded, the "NADB_Rossmann superfamily" hits (blue bars) in all three reading frames line up exactly with domains of lyce1. The two most significant hits align to lyce1 with E-values of 0 and 2e^-179, respectively.

    Blastx of Model 2 CDS

    Gene 3 model 2 blastx.png

    When expanded, the "NADB_Rossmann superfamily" hits (blue bars) in all three reading frames line up exactly with domains of lyce1. The two most significant hits align to lyce1 with E-values of 0 and 2e^-178, respectively.

    Blastp of GRMZM2G012966 protein sequence

    Full results file click here

    MaizeGDB model blastx results.png

    Summary of protein results

     

    blastx of MaizeGDB gene model
    Organism              % Match  E.value
    Arabidopsis thaliana  71%      0
    Tomato                72%      0
    Tobacco               38%      4e^-89

     

    The blastp of the GRMZM2G012966 model gives the most complete alignment to lyce1, so that model is likely the most correct. The lyce1 alignments for Models 1 and 2 are truncated at the 3' end, suggesting that the models have a frameshift somewhere in exon 8.

    Models 1 and 2 do provide some additional information, however. The 3' end of the sequence contains a protein kinase-like (PKc_like superfamily) region, which suggests that both models should in fact be split into two different genes. This means that the cDNA gb|BT037027.1 (GENE ID: 100216601 LOC100216601) is not part of the lyce1 gene, supporting the conclusion that the gene is confined between coordinates 82,726 and 85,759 (or 138882727 and 138885760 in the Reference Genome).

    The blastx results of Model 2 show that exons 8 and 9 (the ones not contained in Model 1 or GRMZM2G012966) align to lyce1 (see reading frame +1 between 2000 and 2500 of the query). This supports the idea that an alternative gene model involving alternatively spliced exons may exist for lyce1; for example, a combination of the GRMZM2G012966 model with exons 8 and/or 9 from Model 2 may encode an alternatively spliced lyce1 protein.

    Conclusion

    Considering the three models, it seems likely that the true gene model for lyce1 contains 8 exons and lies between coordinates 82817 and 85887 of the 150 kb segment (138882818 and 138885888 of the reference genome). The existing model, GRMZM2G012966, seems to be the best fit. Model 2 also aligns to lyce1 better than Model 1, so while the two may be alternative transcripts of one another, it is also possible that Model 1 was designed from a cDNA that was captured before mRNA processing was complete.
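    The segment and reference-genome coordinates quoted on this page differ by a constant offset. The sketch below infers that offset from the coordinate pairs given in the text (it is not taken from MaizeGDB) and checks that every quoted pair is consistent with it. The function names are illustrative.

```python
# Offset inferred from the pairs quoted on this page (an assumption, not a MaizeGDB value):
# reference position = segment position + SEGMENT_OFFSET.
SEGMENT_OFFSET = 138_800_001

def segment_to_reference(pos):
    """Convert a position on the 150 kb segment to a reference-genome position."""
    return pos + SEGMENT_OFFSET

def reference_to_segment(pos):
    """Convert a reference-genome position back to a position on the 150 kb segment."""
    return pos - SEGMENT_OFFSET

# (segment, reference) pairs as quoted in the text.
quoted_pairs = [
    (82726, 138882727),
    (85759, 138885760),
    (82817, 138882818),
    (85887, 138885888),
]

for segment_pos, reference_pos in quoted_pairs:
    print(segment_pos, segment_to_reference(segment_pos), reference_pos)
```

    All four quoted pairs agree with the same offset, so the two coordinate systems are interchangeable for the intervals discussed here.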

    GRMZM2G012966 gene model.jpg

    From Models 1 and 2 we learned that the 3' end of this region (87588 to 88289) codes for a putative protein kinase-like enzyme. Exons 9 and 10 from Model 1 (the same as exons 11 and 12 from Model 2) could therefore form their own model for a putative protein kinase-like enzyme.

    Furthermore, this evidence confines lyce1 to the region noted above. The gene does not extend to the 3' end of the region, as might have been believed from the blastn hit to the gb|BT037027.1| cDNA (described above). Similarly, transcripts T01 and T02 in the above image, which are putative transcript variants of lyce1, are unlikely to be real since they overlap the PKc_like superfamily region.
