Annotator(s)

    Michael Gribskov

    smo9dottup.1.png

    Simple Statistics

    • Sequence: Smo9
    • Scaffold: 89 
    • Bases: 535000-660000 
    • Length: 125001 bases
    • GC content: 47%
    • Contains run of 212 Ns at bases 34250-34462
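The statistics above can be recomputed directly from the sequence. The following is a minimal sketch, assuming the region is available as a plain string. The function name `simple_stats` and the toy input are illustrative and not part of the original analysis, and computing GC content over non-N bases only is an assumption about how the 47% figure was derived.

```python
import re

def simple_stats(seq):
    """Return (length, GC percent over non-N bases, runs of N).

    Each run of N is reported as a 1-based inclusive (start, end, length) tuple.
    """
    seq = seq.upper()
    called = sum(1 for base in seq if base != "N")   # bases that are not N
    gc = sum(1 for base in seq if base in "GC")
    gc_percent = 100.0 * gc / called if called else 0.0
    n_runs = [(m.start() + 1, m.end(), m.end() - m.start())
              for m in re.finditer(r"N+", seq)]
    return len(seq), gc_percent, n_runs

# Toy input, not the Smo9 sequence
print(simple_stats("ACGTNNNNGGCC"))   # (12, 75.0, [(5, 8, 4)])
```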

    Sequence Repeats

    Dotplot

    The dotplot shows several direct repeats:

    • an approximately 2 kb direct repeat at bases 1 and 15k
    • tandem repeats at bases 43k-53k
    • possibly three diverged tandem repeats at 96k-125k. Some of these are also found below by etandem. The first etandem repeat is a degenerate repeat of a shorter sequence, tcctacttgggc.
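For readers unfamiliar with word-match dotplots, the idea can be sketched in a few lines. The image file name suggests the plot was produced with EMBOSS dottup, which plots exact word matches; the code below only illustrates that principle and is not dottup's implementation. The function name `direct_repeat_dots`, the word size, and the toy sequence are assumptions made for illustration.

```python
from collections import defaultdict

def direct_repeat_dots(seq, word=10):
    """Return sorted (i, j) pairs, i < j, where the same word starts at both positions.

    Positions are 0-based. Each pair is one off-diagonal dot in a self-comparison
    dotplot; a run of dots along one diagonal indicates a direct repeat.
    Words containing N are skipped so that runs of N do not match themselves.
    """
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - word + 1):
        w = seq[i:i + word]
        if "N" in w:
            continue
        positions[w].append(i)
    dots = []
    for starts in positions.values():
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                dots.append((starts[a], starts[b]))
    return sorted(dots)

# Toy input: an 8-base unit repeated after a 4-base spacer
print(direct_repeat_dots("ACGTACGTTTGGACGTACGT", word=8))   # [(0, 12)]
```

The pairwise loop is quadratic in the number of copies of a word, so a low-complexity sequence with a short word size can produce very many dots.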

    Etandem

      Start     End   Score   Size  Count Identity Consensus
     109775  110239     158     93      5     77.0 gactccatctcggactccaactcgggtccatctcgggtcctacttgggctccatttcggactccatctcgg
                                                   actccatctcgggtccatctcg
     116809  117552     142    186      4     72.0 gtcaaagtgggaatttgtacaaagccatgaagatgttcgattggatgccagacaagaatttgatctcatgg
                                                   aactcaattctaagagcctttgctcatcatgggcaacttgacgaagcaaagatattatttgataaaatgcc
                                                   cgagtgggacctaatgtcgttgaattcaatgcttgcggcatata
     117603  118253     106     93      7     65.3 cctgagcgaaatcttgtttcttggaacgctatgcttgcagcatatgctcaacatgggcatattgaagatgc
                                                   aaaggtgctgtttgataacatg
      34251   34462     104    106      2     99.5 nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn
                                                   nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnna
      32398   32553      71     39      4     85.3 actctccaacttactttgattcatcgcagtagctccatc
      19938   20059      47     61      2     94.3 accgaaacacatcaagaaaaggaagacctcgaaactaggattgatagaacaagagctcacc
       5405    5500      42     32      3     88.5 cgcaagaacaacaccatgaatcgcaaagaaaa
     109582  109773      38     96      2     84.9 ggcctcggtctcggtctcggtcccgatctcggtctcggtctcggtctcggtcccggtctcggtctcgactt
                                                   cggactccatctcggactccatctc
     113787  114065      34     93      3     72.8 acttggtgtcgtggaacaccatgcttgcagcttatacgaaggccggggatttgagaggagctagggagatt
                                                   ttcagcgccatgctggggaagg
      15489   15558      31     35      2     97.1 ttgctggatggagttggaccaagttcgatcgctgc
      15028   15111      28     42      2     91.7 cacaacgctttcgaaaagatgcaatgtttatatggaaattta
     111425  111516      27     23      4     77.2 ttggttttgttaagtttattgct
        496     605      27     55      2     87.3 agctagattgctgcaccgctgcgttgttggatggagctggatcgctgtttgctcg
     105153  105306      26     22      7     65.6 attattttaataatataaaaat
      15603   15682      26     40      2     91.2 gctggattgctgcttgcttggctcaggctgcttgtctggt
     106291  106386      26     48      2     88.5 ttaaaatttaatacttccatcattttttggcatgcaaaaccaccaaat
     104739  104885      24     21      7     65.3 ataaaaaatgaaaaaataaaa
       3572    3731      24     40      4     70.0 attttaataaaaatattaataaaatatattttcttatttg
      36853   36904      22     26      2     96.2 gaagatccacttaatcctcttaccgt
     116413  116475      20     21      3     82.5 gggcggcggcggtggtggggg
      96621   96684      20     32      2     90.6 agtgtcttctcatttgattgtaaagaacttca
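The note above that the first etandem repeat is a degenerate repeat of tcctacttgggc can be checked roughly by scanning the reported consensus for near matches to that 12-mer. This is a sketch only: the function name `approx_hits` and the mismatch threshold are illustrative choices, and Hamming distance ignores insertions and deletions, so it will miss copies that have diverged by indels.

```python
def approx_hits(text, motif, max_mismatch):
    """Return 0-based start positions where motif matches text
    with at most max_mismatch substitutions (no indels)."""
    hits = []
    for i in range(len(text) - len(motif) + 1):
        window = text[i:i + len(motif)]
        mismatches = sum(1 for x, y in zip(window, motif) if x != y)
        if mismatches <= max_mismatch:
            hits.append(i)
    return hits

# Consensus of the first etandem hit (109775-110239), joined from its two lines
consensus = ("gactccatctcggactccaactcgggtccatctcgggtcctacttgggctccatttcggactccatctcgg"
             "actccatctcgggtccatctcg")
motif = "tcctacttgggc"

print(approx_hits(consensus, motif, 0))   # exact occurrences
print(approx_hits(consensus, motif, 3))   # occurrences with up to 3 substitutions
```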
    

    Repeat Masker

       SW   perc perc perc  query        position in query     matching  repeat           position in repeat
    score   div. del. ins.  sequence     begin end    (left)   repeat    class/family   begin  end    (left)  ID
    
     333   35.9  6.5  2.2  scaffold_89  102633 102942 (22059) C RIRE3_I   LTR/Gypsy      (2347)   3405   3083  11  
     339   29.4  2.0  2.0  scaffold_89  114662 114807 (10194) C SZ-17_I   LTR/Copia       (418)   2698   2553  17  
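A quick consistency check on the two hits: in this output the query end plus the number in parentheses (bases remaining to the right of the match) should equal the length of the query. The sketch below uses only the numbers shown above; the variable names are illustrative.

```python
REGION_LENGTH = 125001  # from "Length: 125001 bases"

# (repeat name, query begin, query end, bases left in query)
hits = [
    ("RIRE3_I", 102633, 102942, 22059),
    ("SZ-17_I", 114662, 114807, 10194),
]

results = []
for name, begin, end, left in hits:
    span = end - begin + 1                    # matched bases in the query
    consistent = (end + left == REGION_LENGTH)
    results.append((name, span, consistent))

print(results)   # [('RIRE3_I', 310, True), ('SZ-17_I', 146, True)]
```

Both rows sum to 125001, which indicates that the query positions are relative to the 125001-base region rather than to the full scaffold.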
    

    Gene Predictions 

    Computational Predictions 

      Some comparisons over the first 30 kb. Several gene predictions are consistent among the three programs, but many differ widely.


    Genemark (maize)                | FGENESH (rice)                  | Genscan (maize)
     #  begin    end  exons  strand |  #  begin    end  exons  strand |  #  begin    end  exons  strand
     1    553   6056     20       - |  1    553   6062     16       - |  2    696   7211     10       -
     2   6585   7955      7       + |  2   6585  10297      5       + |
     3   8000  10297      1       + |                                 |  3   8225  10297      4       +
     4  10479  11227      4       - |                                 |
     5  11415  15049      8       + |  3  12385  15049      6       + |
                                    |                                 |  4  14627  21216     10       -
     6  15376  17090      6       - |  4  15376  21216     12       - |
     7  17552  21216      9       - |                                 |
     8  22747  23201      2       + |                                 |
     9  24193  26868     13       - |  5  24294  27657     15       - |  5         26868      4       -
    10  27012  27725      3       - |  6                              |  6  27012  27725      5       -
    11  27996  29854      7       + |  7  28286  29717      8       - |  7  28495  27775      4       -

    Note on the first row (Genemark 1, FGENESH 1, Genscan 2): lipase, thymidylate kinase, and possibly more. This model is certainly more than one gene. The Genemark models give significantly better blastp matches. The Genscan model contains only the lipase, with a poorer match.
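One way to put a number on "consistent" versus "wildly different" is to compare the predicted gene spans pairwise. The sketch below does this for the first model from each program, using the begin and end coordinates listed above. The function names and the choice of a Jaccard score (shared bases divided by bases covered by either span) are illustrative assumptions, not part of the original comparison, and the score looks only at gene spans, not at exon structure.

```python
def overlap(a, b):
    """Number of bases shared by two 1-based inclusive intervals (begin, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

def jaccard(a, b):
    """Shared bases divided by bases covered by either interval."""
    shared = overlap(a, b)
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - shared
    return shared / union

genemark_1 = (553, 6056)
fgenesh_1 = (553, 6062)
genscan_2 = (696, 7211)

print(overlap(genemark_1, fgenesh_1), round(jaccard(genemark_1, fgenesh_1), 3))
print(overlap(genemark_1, genscan_2), round(jaccard(genemark_1, genscan_2), 3))
```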

    Curated/Reviewed predictions

    Smo9-A: unknown or pseudogene, or contaminant?

    • Genemark (GM): Gene 1, exons 10-11
    • FGENESH (FG):
    • Genscan (GS):
    • Comments: Similar to thymidylate kinase, but the closest matches are in fungi, and the C-terminal quarter of the protein is missing. The missing part is highly conserved. (BlastP)

      Type  Strand  Begin   End
      Exon  -        3109  2998
      Exon  -        3283  3171

    Smo9-1: Triacylglycerol lipase 1

    • Genemark (GM): Gene 1, exons 1-9
    • FGENESH (FG): Gene 1, exons 9-16, missing GM exon 3
    • Genscan (GS): missing GM exons 1, 2, 9
    • Comments: Matches 4-377/393 to At LIP1 in BLASTP. Small insertions are not near intron/exon boundaries. (BlastP)

      Type  Strand  Begin   End
      Exon  -        6056  5892
      Exon  -        5818  5708
      Exon  -        5657  5517
      Exon  -        5159  5049
      Exon  -        4994  4851
      Exon  -        4743  4703
      Exon  -        4641  4565
      Exon  -        4513  4418
      Exon  -        4340  4060
      Exon  -
      Exon  -
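As a sanity check on the Smo9-1 model, the nine exons with coordinates can be summed and compared with the BLASTP alignment, which covers positions 4-377 of a 393-residue protein. This sketch assumes the listed exons are entirely coding, which the page does not state; the variable names are illustrative.

```python
# Minus-strand exons of Smo9-1 as (begin, end), with begin > end as listed above
exons = [
    (6056, 5892), (5818, 5708), (5657, 5517),
    (5159, 5049), (4994, 4851), (4743, 4703),
    (4641, 4565), (4513, 4418), (4340, 4060),
]

lengths = [begin - end + 1 for begin, end in exons]   # inclusive coordinates
total = sum(lengths)

print(lengths)             # [165, 111, 141, 111, 144, 41, 77, 96, 281]
print(total, total % 3)    # 1167 0
print(total // 3)          # 389
```

The total is a multiple of three, and 389 codons is close to the 393-residue length of the matched protein.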



    Files (5)

    File                      Description                         Size        Date                 Attached by
    genscan.maize.pep.fa      Genscan.maize peptide predictions   8.63 kB     22:54, 10 Oct 2008   gribskov
    smo9.fgenesh.monocot.fa   No description                      85.94 kB    10:17, 10 Oct 2008   gribskov
    smo9.fgenesh.pdf          No description                      560.18 kB   10:17, 10 Oct 2008   gribskov
    smo9.genemarkhmm.fa       No description                      25.32 kB    10:21, 10 Oct 2008   gribskov
    smo9dottup.1.png          No description                      7.45 kB     08:43, 10 Oct 2008   gribskov