
Sequence 204 Gene Prediction 204e Analysis Process


     

    Initial predicted protein (Genemark)


    >seq204 1600001-2000000 Xiaoqing Yu 09-16-2010_4|GeneMark.hmm|gene 4|191_aa
    MSAAGQQGPEIWRPRPSAIPVSRVPVEPDTAVAEVAPARPASLVEAIVVSSEDEAEAAPH
    VASPTVSDLLRAILDTPEVTPSSGAGCAEGASSSRSPWRIVYGPLLGDETIIEPPIIGVS
    GDLVRSGPDPTLWGGSPLAWTSTEGDPYFVLDDVEERDMVTKVFLPPQELMKRSMSKSIF
    LHRERCVWEAL


    Initial Blast result (Genemark)

    No match

    204e-G_Blastp.jpg
     
    Initial predicted protein (FGENESH)


    >FGENESH:   2   2 exon (s)  76128  -  76824   159 aa, chain -
    MSAAGQQGPEIWRPRPSAIPVSRVPVEPDTAVAEVAPARPASLVEAIVVSSEDEAEAAPH
    VASPTVSDLLRAILDTPEVTPSSGAGCAEGASSSRSPWRIVYGPLLGDETIIEPPIIGVS
    GDLVRSGPDPTLWGGSPLAWTSTEGDPYFVLDDVEEREY


    Initial Blast result (FGENESH)  

    No match

    204e-F_Blastp.jpg
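Before moving on, it can help to see how far the two initial predictions agree. The sketch below is not part of the original analysis: it takes the GeneMark and FGENESH protein sequences listed above and counts the residues they share from the N-terminus. The helper name `common_prefix_length` is illustrative.

```python
# Protein sequences copied from the GeneMark and FGENESH predictions on this page.
genemark = (
    "MSAAGQQGPEIWRPRPSAIPVSRVPVEPDTAVAEVAPARPASLVEAIVVSSEDEAEAAPH"
    "VASPTVSDLLRAILDTPEVTPSSGAGCAEGASSSRSPWRIVYGPLLGDETIIEPPIIGVS"
    "GDLVRSGPDPTLWGGSPLAWTSTEGDPYFVLDDVEERDMVTKVFLPPQELMKRSMSKSIF"
    "LHRERCVWEAL"
)
fgenesh = (
    "MSAAGQQGPEIWRPRPSAIPVSRVPVEPDTAVAEVAPARPASLVEAIVVSSEDEAEAAPH"
    "VASPTVSDLLRAILDTPEVTPSSGAGCAEGASSSRSPWRIVYGPLLGDETIIEPPIIGVS"
    "GDLVRSGPDPTLWGGSPLAWTSTEGDPYFVLDDVEEREY"
)


def common_prefix_length(a: str, b: str) -> int:
    """Count identical residues shared from the start of both sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


shared = common_prefix_length(genemark, fgenesh)
print("GeneMark length:", len(genemark))
print("FGENESH length:", len(fgenesh))
print("Shared N-terminal residues:", shared)
```

If the two predictions differ only near the C-terminus, the two programs most likely chose different ends for the last exon rather than predicting two different genes.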
    Two regions combined sequence (masked) 74K-79K


    >seq204 1600001-2000000 Xiaoqing Yu 09-16-2010 [ Using bases 74000 to 79000 ]
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGCTGCACCCTCC
    TACTTAAGCCCCTCGCACCGGGACGTCAGGTCAGCCACTTTGGCGCTCCTCCGTGCCAGC
    TGCTCATTCGACCGAAGCAGCGCGGTCCGATGGCGCTACAGCGCTTCCCAGACACAGCGT
    TCGCGGTGGAGGAAGATGGACTTGGACATCGACCTCTTCATGAGCTCCTGCGGGGGAAGG
    AAAACTTTCGTCACCATGCTGCTAAGGGACAAATAATCTATACAAAAGAGAGCAACGCCC
    GTACGTTGGCCAACGCCCTGATCTCAGCCTAGTACTCCCGCTCCTCGACGTCGTCGAGGA
    CAAAGTATGGGTCGCCCTCGGTAGACGTCCACGCCAGAGGCGACCCTCCCCATAACGTGG
    GGTCTGGCCCGGAGCGAACAAGGTCACCGCTAACCCCTATGATAGGGGGCTCGATGATCG
    TTTCATCACCGAGCAGCGGCCCGTAGACTATTCTCCATGGCGACCTCGAGGAGCTGGCCC
    CCTCGGCGCATCCCGCCCCCGAGCTCGGCGTGACTTCGGGGGTGTCGAGGATGGCGCGTA
    GGAGGTCGGAGACCGTGGGCGACGCGACGTGCGGCGCAGCTTCTGCCTCGTCCTCGGAGG
    AGACGACAATGGCCTCCACCAGCGACGCGGGACGAGCGGGAGCCACCTCAGCCACAGCCG
    TGTCAGGCTCAACCGGCACCCTGAGAGAGGGCACAGCCACCTCAGCGCCCTGCTGAGGCA
    GCTCCTAGACATAAGGGGGGCAGGACGAGGCCCCCACAGACATCCCCGCACCGCGCTTCA
    CCGCCTTGGTGATAGTAAAGATCGGCGCATTGGCTCAACGCTTGGAGCTAAGGACAGAAA
    CGACAGTCAGAGGAAAAACGGTGCATGAAAAGACGTTCTCGCACTTTGACTACATACCTG
    GAGACAGGTATTGCTGAAGGCCTTGGCCTCCAGATTTCTGGCCCCTGCTGCCCCGCCGCC
    GACATAGGCAGGGAATGGTCACCTCTTCTGCAACCCCCGACTCAGGGGCCGGGGATGGAA
    CGATGGCCGAGGTCACTATGGTCCCCAGGACGACGACATGAGGGGAGTGACGGGCCCCTC
    CTCCATGTTCCCCCCAGCGGGCGGCCGCTGAGGGGACTCGGCCGGCGCACCCGTAGGCGC
    GACGACGACTTCCACGGGAAGGGGAGCGACTTCCCCCTCCACGACCTCTGCCTCTTCCAC
    CGAGGCAGGGATGGGGGCAGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNTCTTTGGGCAGAA
    GGTGAAAACAGCGTTCGAAGGCAAAGGAATGAAACAACAAAGGTAACGGTCTGAAGCTTC
    CCCCTTGGTCACCTTTATATAGGGACAGAGCCACCAAAGTGTGAATCTAAAAAGGCCGGA
    CGACGGTCGTTTCGAATCCTACGCGCACGCGTGCCGAGAATGGTCGGCTGACCGGCAGTA
    TGTCCTGTCGCATTAAATGGCCATGTCCCCACCGAGATAAAATGTGTCGAGCGTCATTCC
    TCATCCCGCCTCAAAAGGTACACCCTCTACCCTCGACTTCTCTGCCCTTTATCCTCTTGG
    CAATCGGACAAGAAGGAAGTGTCTTCTGCCTCGGGCAACCCCCCTCGGAAAAGGGAGCAG
    CGAATCAGGAAAAGTCGTCTTCTGCCTCGGCATGTCACTCCGGGAGGAAGAAAGGCAAGG
    AAAGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNN
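The unmasked stretches in a masked sequence like the one above can be located programmatically. This is a minimal sketch, assuming the masking character is `N` and using 1-based inclusive coordinates. The function names and the toy sequence are illustrative and are not taken from the analysis.

```python
import re


def unmasked_intervals(seq: str, mask_char: str = "N"):
    """Return 1-based inclusive (start, end) intervals of non-masked bases."""
    pattern = re.compile(f"[^{re.escape(mask_char)}]+")
    return [(m.start() + 1, m.end()) for m in pattern.finditer(seq)]


def masked_fraction(seq: str, mask_char: str = "N") -> float:
    """Fraction of the sequence that is masked."""
    return seq.count(mask_char) / len(seq) if seq else 0.0


# Toy sequence for illustration only (not the real 74K-79K window).
toy = "NNNNACGTACNNNNNNGGTTNN"
print(unmasked_intervals(toy))
print(round(masked_fraction(toy), 3))
```

Applied to the full FASTA record (with line breaks removed), the same functions would list the unmasked regions that the BLAST searches below can actually see.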


    BlastX confirmation


    No good match
    204e-combined(masked)-blastx.jpg
     

    Blastn confirmation

     

    Blastn was used because a nucleotide-level comparison is not affected by frame shifts. It found several matches to hypothetical genes:
    Hypothetical protein LOC100502455: e-value = 0, region 1855-2565, corresponding to (75885, 76565) in the GeneMark-predicted region. Zea mays mRNA: e-value = e-35, regions 1855-2030 + 2110-2278 (only the longer, upper match is considered).
    The first hypothesis is that a frame shift exists.
    204e-combined(masked)-blastn.jpg
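The BLAST hit positions above are relative to the extracted 74K-79K window, so they have to be shifted back to coordinates in the original sequence. The sketch below assumes that position 1 of the window corresponds to base 74000 (1-based, inclusive, following the "Using bases 74000 to 79000" header). That convention is an assumption, and the function names are illustrative.

```python
WINDOW_START = 74000  # first base of the extracted window (assumed 1-based, inclusive)


def window_to_source(pos: int, window_start: int = WINDOW_START) -> int:
    """Map a 1-based position in the window to a coordinate in the source sequence."""
    if pos < 1:
        raise ValueError("window positions are 1-based")
    return window_start + pos - 1


def source_to_window(coord: int, window_start: int = WINDOW_START) -> int:
    """Inverse mapping: source coordinate back to a 1-based window position."""
    return coord - window_start + 1


# Hit ranges reported by blastn on the combined (masked) window.
for start, end in [(1855, 2565), (1855, 2030), (2110, 2278)]:
    print((start, end), "->", (window_to_source(start), window_to_source(end)))
```

If the offset is simply added instead (74000 + position), every result shifts by one base. Under either convention a start of 1855 maps close to 75855, so the coordinates quoted on this page are worth re-checking against the BLAST output.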

    LOC100502455 protein sequence:


    >gi|308080604|ref|NP_001183862.1| hypothetical protein LOC100502455 [Zea mays]
    MCALPIALRTGAPRAAFPSSSAFSALTIAPHRRLQPSSSVPQPLSQHMTSCAGASLFAIVSQRAQPSSGH
    PLHLHAIHLQASRNRRSSFDDQVIKLSVFYETEDPDYLFGESFLWCNLCSGK


    Compare this to my GeneMark protein


    Only 4 aa match. A frame shift does appear to exist, because the DNA sequences are the same but the proteins do not match each other.
    LOC100502455 Blast against 204e-G protein.jpg
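The reasoning above relies on the fact that the same DNA gives unrelated proteins when it is read in different frames. The sketch below illustrates this with the standard genetic code and a short made-up DNA string. The toy sequence and the `translate` helper are illustrative only and are not part of the original analysis.

```python
# Standard genetic code, built from the usual TCAG ordering of codons.
BASES = "TCAG"
AMINO = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def translate(dna: str, frame: int = 0) -> str:
    """Translate DNA in the given forward frame (0, 1 or 2); 'X' marks unknown codons."""
    dna = dna.upper()
    protein = []
    for i in range(frame, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


# Toy DNA for illustration only (not taken from sequence 204).
toy_dna = "ATGGCCATTGTAATGGGCCGCTGA"
for frame in range(3):
    print("frame", frame, translate(toy_dna, frame))
```

The three frames produce three different peptides from identical DNA. The same pattern is seen here: blastn matches strongly at the nucleotide level, while the predicted protein and LOC100502455 share almost no residues.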
     

     

    Blast LOC100502455 against the reference protein database


    LOC100502455 Blast against reference protein database.jpg

    All hits are from maize, so they cannot be trusted as independent support; no good evidence.

    Blast LOC100502455 against Swiss-Prot

    The hits cannot be trusted; no good evidence. No putative conserved domains were detected.

    PSI-Blast LOC100502455 against NR (non-redundant protein database)


    No good evidence: the most significant e-value is only e-15. The protein is very short, so the analysis stops here and cannot go further.
     
     LOC100502455 PSI-Blast against NR (non-redundant protein) database.jpg


    Last step: blastn of the unmasked region, 75500-77000


    The unmasked blastn gives the same result as before. The first 300 bp are a retrotransposon region and should be ignored.
    The analysis stops here. I conclude that we do not have strong evidence that a gene exists in this region. The most likely explanation is that the LOC100502455 annotation was predicted incorrectly. Less likely, this is a gene unique to maize or a new gene.

    Unmasked region 75500-77000 blastn result.jpg
