Group 6, Gene I


    Gene I (referred to elsewhere as Gene 2) was predicted by both GeneMark and FGENESH, and the two programs predicted exactly the same 3 exons. We ran a nucleotide BLAST search of the prediction against the refseq_rna database; the top hit was:

    >ref|NM_001139368.1| Zea mays uncharacterized LOC100194332 (LOC100194332), mRNA gb|BT036851.1| Zea mays full-length cDNA clone ZM_BFb0143J15 mRNA, complete cds Length=1384

    Score =  1051 bits (569),  Expect = 0.0 Identities = 569/569 (100%), Gaps = 0/569 (0%) Strand=Plus/Plus

    A multiple alignment was then made between the translated sequences of Gene I and of the predicted mRNA. It showed the following:

    translated sequences of GENE I and predicted mRNA.png

    Compared with the predicted protein, Gene I had some additional sequence before the start codon. We went back to our sequence and removed this additional sequence, treating it as part of an intron or otherwise unexpressed region.

    Conversely, the predicted protein contained a stretch of amino acids that was absent from the translated Gene I sequence. We suspected that the corresponding nucleotides had been removed from our original sequence as an intron. Returning to the original sequence, we found that a segment between exon 2 and exon 3 had indeed been removed because both FGENESH and GeneMark predicted it to be an intron. This so-called intron matched the nucleotide sequence encoding the missing amino acids, except that our sequence had a run of 8 Ts where the reference has 9. We therefore concluded that the missing T is probably a sequencing error, and that it led the two gene prediction programs to call part of an exon an intron.

    We restored this segment, with the extra T added, between exon 2 and exon 3, which reduced the number of exons from 3 to 2. We then aligned the two nucleotide sequences (i.e. the sequence encoding the predicted protein and the corrected Gene I) and found that they matched except for one extra C in Gene I.

    alignment 0f the two nucleotide sequences.png

    Considering this extra C a sequencing error as well, we deleted it from our Gene I. We then translated the corrected Gene I sequence and aligned the translation with the sequence of the predicted protein, which gave a perfect match.

    perfectly matched alignment.png
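The reasoning above (a single missing base shifts the reading frame, and restoring it recovers the expected protein) can be illustrated with a minimal sketch. The fragment below is hypothetical: it is not our raw read, only a short sequence built so that its 9-T form translates to the `HLSLFFSVRN` stretch of the predicted protein. The function names are ours, and the codon table is the standard genetic code written in the usual TCAG order.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate in frame 1; trailing bases that do not fill a codon are ignored."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def longest_run(seq, base):
    """Return (start, length) of the longest run of `base`, or (-1, 0) if absent."""
    runs = list(re.finditer(re.escape(base) + "+", seq))
    if not runs:
        return -1, 0
    best = max(runs, key=lambda m: len(m.group()))
    return best.start(), len(best.group())


def insert_into_longest_run(seq, base):
    """Add one `base` to the longest run of that base (the suspected miscount)."""
    start, length = longest_run(seq, base)
    if length == 0:
        raise ValueError("no run of %r found" % base)
    return seq[:start] + base + seq[start:]


# Hypothetical fragment with a run of 8 Ts (illustration only).
observed = "CATCTCTCTC" + "T" * 8 + "CTGTACGTAAT"
corrected = insert_into_longest_run(observed, "T")

print(longest_run(observed, "T"))   # position and length of the T run
print(translate(observed))          # frame is shifted after the run
print(translate(corrected))         # frame restored with 9 Ts
```

In this toy case the 8-T fragment translates to `HLSLFFLYV`, while the 9-T fragment translates to `HLSLFFSVRN`. The same comparison applies to the extra C: deleting one base restores the frame downstream of it.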

    Gene I Model:

    -predicted identity: Zea mays uncharacterized protein LOC100194332

    -protein features: Soluble protein

    -number of exons: 2

    -bp range: 12,202-13,479 in group 6 sequence

    -starts with start codon: Yes

    -ends with stop codon: Yes

    -all splice sites appear correct: Yes

    -nucleotide sequence:

    -start codon in green, stop codon in red, splice sites in blue, intron in orange, exons in black

    ATGCTCTGCGAAGGTGGCGACGACAATAATGGCCGCAGCGGCCTAAGGC
    TCTTCGGAGTGCAGGTCCGTATCGGCGGCGGCGGTGGCGCAGGGTCGGCG
    TCCATGAAGAAGAGCTACAGCATGGACTGCCTGCAGCTCGCGGCTCCTCA
    TGCTTGTTCGTCCCTCGTCTCGTCCCCGTCGTCGTCGTCGTTGTGCTCGT
    CCTCGTCCCCGTCGTCGCTGCTGCTGTCGATTGATGATGGGTTGCAGAGA

    GGGGCGGCCGATGGATACCTGTCTGATGGACCTCATGGCAGAGCTGTGCA

    GGAGAGGAAGAAAGGTACGTCAAGAAATTTTTTTCCCTTCCTGATGAAAA

    GACAAAAGAGGAAGATTTGTTCTTTGGAAATTCAGAAACTTCACCTTACA
    CGAATTTCTTTTCTTATTGGTAATACGTACTGTCCATCAGTCGTCTGACC
    TAAAAAAAAAGTTGCATTATGCTCTACTTTTTTTTGGCCTAAAAAAAGAG
    AGTAATATAGCTTTATCTTCCACTGATGCTGCAACCCTTCAACTGCAAGT
    AACAATGAAGCAGCTGAGACTGAGAGATACTAGTGCTTCCCTGCACATTA

    TTGAGTCCTTGTTGGGTTGGTCGGTAGCAGCTGATGTTTCTAAACAAAAA

    CAATACTCTGGATATATACCGACGACGCCAGGAGTTCCATGGAGCGAGGA

     

    GGAGCACAGGCAGTTCCTTGCCGGCCTGGACAAGCTGGGCAAGGGCGACT
    GGCGAGGCATCGCCAGGAGCTACGTGCCGACGAGGACCCCGACGCAGGTC
    GCCAGCCACGCGCAGAAGTTCTTCCTCCGGCAGAGCAGCATGGGGAAGAA
    GAAGCGCCGGTCCAGCCTCTTCGACATGGTATGCGCATGCAGGCACCAAC
    ATCTCTCTCTTTTTTTTCTGTACGTAATTGCTTGAGTTTTGATAACCGGG
    TCCAGCGGCAGGTGCCGATCTGCGAGAACAGTGCGAGCATCTCCGATCCG
    CTGGACGTGGCACGCCATGGCGTCGCCTCGGCAGACTCCGAGAGGGCGGC
    CGTGGACGTGGACCTGATGAATTCGACGACGGAGGGTGACGACGGCAGGA
    GCAGGGACGTCGTGTCGTCGGCGTCAGGTGCAGGGACCGCACTGCGGTTC
    CCGGCAGCGGCAGCTCAGACGGAGCCGCTGCATCCTTCGTCATCGCATGG

    CCATGGGCGCGGCCACCACTGCTCTCCGTTGGACCTGGAGCTGGGCATGT

    CCCTGCCCGCCCCATCCGTCGGAACGTGA

     

    -amino acid length: 303 without stop codon

    -amino acid sequence:

    MLCEGGDDNNGRSGLRLFGVQVRIGGGGGAGSASMKKSYSMDCLQLAAPHACSSLVSSPS
    SSSLCSSSSPSSLLLSIDDGLQRGAADGYLSDGPHGRAVQERKKGVPWSEEEHRQFLAGL
    DKLGKGDWRGIARSYVPTRTPTQVASHAQKFFLRQSSMGKKKRRSSLFDMVCACRHQHLS
    LFFSVRNCLSFDNRVQRQVPICENSASISDPLDVARHGVASADSERAAVDVDLMNSTTEG
    DDGRSRDVVSSASGAGTALRFPAAAAQTEPLHPSSSHGHGRGHHCSPLDLELGMSLPAPS
    VGT-
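The checklist items in the gene model (start codon, stop codon, splice sites, amino acid length) can also be checked programmatically. The sketch below is only an illustration on a made-up 24 bp toy sequence, not on the Gene I sequence itself. It assumes 1-based inclusive exon coordinates and canonical GT...AG introns; the function name `check_gene_model` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_gene_model(genomic, exons):
    """Check a gene model given as 1-based inclusive (start, end) exon coordinates."""
    # Coding sequence: exons joined in order.
    cds = "".join(genomic[start - 1:end] for start, end in exons)
    # Introns: everything between consecutive exons.
    introns = [
        genomic[end_prev:start_next - 1]
        for (_, end_prev), (start_next, _) in zip(exons, exons[1:])
    ]
    return {
        "starts_with_start_codon": cds.startswith("ATG"),
        "ends_with_stop_codon": cds[-3:] in STOP_CODONS,
        "length_multiple_of_three": len(cds) % 3 == 0,
        # Canonical introns begin with GT and end with AG.
        "splice_sites_canonical": all(
            i.startswith("GT") and i.endswith("AG") for i in introns
        ),
        "amino_acids_without_stop": len(cds) // 3 - 1,
    }


# Toy example (made up): exon 1, one intron, exon 2.
toy_genomic = "ATGGCT" + "GTAAGTTTTCAG" + "GAATGA"
toy_exons = [(1, 6), (19, 24)]

for item, value in check_gene_model(toy_genomic, toy_exons).items():
    print(item, value)
```

For a model like ours, one would pass the group 6 sequence and the two exon ranges; a 303-residue protein corresponds to a coding sequence of 912 nucleotides including the stop codon.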

    Evidence from a BLAST search against the EST database

    >gb|DV023755.1| ZM_BFb0143J15.r ZM_BFb Zea mays cDNA 5', mRNA sequence. Length=855

    Score =  1463 bits (792),  Expect = 0.0 Identities = 797/799 (99%), Gaps = 1/799 (0%) Strand=Plus/Plus

    The EST covered both exon 1 and exon 2, except for 114 nucleotides of exon 2.
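The identity and gap figures in a BLAST hit line can be reproduced from a pairwise alignment by simple counting. The sketch below uses a made-up pair of aligned strings rather than the actual EST alignment. The helper names are hypothetical, and rounding the percentage down is an assumption that happens to agree with the figures reported above (797/799 shown as 99%).

```python
def alignment_stats(query, subject):
    """Count identities and gap columns in two equal-length aligned strings."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(query, subject))
    identities = sum(q == s and q != "-" for q, s in pairs)
    gaps = sum(q == "-" or s == "-" for q, s in pairs)
    return identities, gaps, len(pairs)


def percent_floor(count, total):
    """Integer percentage, rounded down (assumed to mirror the reported values)."""
    return 100 * count // total


# Made-up aligned pair: one gap column and one mismatch.
identities, gaps, length = alignment_stats("ACGT-ACGT", "ACGTTACCT")
print(identities, gaps, length)

# Figures taken from the EST hit reported above.
print(percent_floor(797, 799), percent_floor(1, 799))
```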

    Alignment of EST sequence with the corrected Gene I

     Alignment of EST & corr. gene I.png

    Function prediction for the uncharacterized protein

    -A search for conserved motifs revealed a DNA-binding domain in the protein. It belongs to the MYB DNA-binding domain superfamily (SANT superfamily, myb_SHAQKYF), so the protein is predicted to be a MYB-related transcription factor. Myb proteins bind to DNA and regulate gene expression. They contain the Myb domain, a sequence of approximately 50 amino acids, and multiple copies of this domain are frequently present as tandem repeats within a single protein. The first plant member of the Myb gene family to be identified was the C1 locus of maize. BLASTX, InterPro and other searches all showed the presence of this conserved domain. (Ref: http://www.stanford.edu/group/lipsic...myb%20long.htm).
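As a quick sanity check of the domain assignment, the signature that gives the myb_SHAQKYF class its name can be looked for directly in the protein. The sketch below is a deliberate simplification: the real searches (BLASTX, InterPro) use profile-based models, whereas this uses a plain regular expression. The pattern `SHAQK[YF]F`, which allows F in place of Y, is our own assumption. The fragment is copied from the amino acid sequence above.

```python
import re

# Fragment of the Gene I protein, copied from the amino acid sequence above.
fragment = "GIARSYVPTRTPTQVASHAQKFFLRQ"

# Assumed, simplified signature: SHAQKYF with F tolerated at the Y position.
SHAQKYF_LIKE = re.compile(r"SHAQK[YF]F")

match = SHAQKYF_LIKE.search(fragment)
if match:
    print("motif", match.group(), "found at fragment position", match.start() + 1)
else:
    print("motif not found")
```

A hit of this kind only shows that the signature residues are present; it does not replace the domain-level evidence from the database searches.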
