Sequence 190 Supplemental results

    Table S1: Sequence Composition

    Result of the EMBOSS compseq program.


    # The Expected frequencies are calculated on the (false) assumption that every
    # word has equal frequency.
    #
    # The input sequences are:
    # seq190 Word size 2 Total count 399999
    #
    Word    Obs N    Obs Freq    Exp Freq    Obs/Exp Freq
    AA    31319    0.0782977    0.0625000    1.2527631
    AC    21031    0.0525776    0.0625000    0.8412421
    AG    25454    0.0636352    0.0625000    1.0181625
    AT    27698    0.0692452    0.0625000    1.1079228  
    CA    25930    0.0648252    0.0625000    1.0372026
    CC    23033    0.0575826    0.0625000    0.9213223
    CG    16958    0.0423951    0.0625000    0.6783217
    CT    25651    0.0641277    0.0625000    1.0260426
    GA    25722    0.0643052    0.0625000    1.0288826
    GC    21821    0.0545526    0.0625000    0.8728422
    GG    23090    0.0577251    0.0625000    0.9236023
    GT    22199    0.0554976    0.0625000    0.8879622
    TA    22526    0.0563151    0.0625000    0.9010423
    TC    25688    0.0642202    0.0625000    1.0275226
    TG    27331    0.0683277    0.0625000    1.0932427
    TT    32427    0.0810677    0.0625000    1.2970832

    Other  2121    0.0053025    0.0000000    
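    The Obs Freq and Obs/Exp Freq columns can be reproduced from the raw counts. This is a minimal sketch, not part of the original analysis: each count is divided by the number of overlapping 2-mers in a 400000 bp sequence (399999), and the ratio is taken against compseq's uniform expectation of 1/16. The names COUNTS, OTHER and composition_row are hypothetical, introduced here for illustration.

```python
# Dinucleotide counts for seq190 as reported by EMBOSS compseq (Table S1, word size 2).
COUNTS = {
    "AA": 31319, "AC": 21031, "AG": 25454, "AT": 27698,
    "CA": 25930, "CC": 23033, "CG": 16958, "CT": 25651,
    "GA": 25722, "GC": 21821, "GG": 23090, "GT": 22199,
    "TA": 22526, "TC": 25688, "TG": 27331, "TT": 32427,
}
OTHER = 2121          # words containing a character other than A, C, G or T
TOTAL_WORDS = 399999  # a 400000 bp sequence has 400000 - 2 + 1 overlapping 2-mers
EXP_FREQ = 1 / 16     # compseq's uniform expectation: 4**2 equally likely words


def composition_row(word):
    """Return (Obs Freq, Obs/Exp Freq) for one dinucleotide, computed as in the table."""
    obs_freq = COUNTS[word] / TOTAL_WORDS
    return obs_freq, obs_freq / EXP_FREQ


# The 16 counts plus "Other" account for every word in the sequence.
assert sum(COUNTS.values()) + OTHER == TOTAL_WORDS

for word in ("AA", "CG", "TT"):
    obs, ratio = composition_row(word)
    print(f"{word}  {obs:.7f}  {ratio:.7f}")
```

    Printed to seven decimals, these values agree with the AA, CG and TT rows of the table.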

    Table S2: Location of Undefined Sequences

    Length  Position  Segment
       100      2347     2346
       100      9827     7380
       100     10585      658
       100     65681    54996
       100     84093    18312
       100     84589      396
       100    114812    30123
       100    123229     8317
       100    124091      762
       100    136838    12647
       100    154807    17869
       100    161635     6728
       100    163379     1644
       100    167832     4353
       100    250910    82978
       100    256385     5375
       100    266082     9597
       100    303135    36953
       100    367505    64270
       100    386360    18755
       100    388749     2289

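    Length is the length of each run of Ns (every run here is 100 bp). Position is the 1-based coordinate of the first N in the run. Segment is the length of the N-free region between the last N of the previous run (or the start of the sequence) and the first N of this run. The terminal segment, after the last run, is 11152 bases long.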
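    The Segment column and the terminal segment follow directly from the run positions. A minimal sketch, assuming 1-based coordinates and a 400000 bp sequence; RUN_STARTS and segment_lengths are hypothetical names. As a cross-check, the same runs account for the 397900 bp "excl N/X-runs" figure in Table S4 and, assuming each run is flanked by defined bases (true here, since no run touches a sequence end or another run), for the 2121 "Other" dinucleotides in Table S1.

```python
# First-N positions (1-based) of the 21 N runs in Table S2; every run is 100 bp.
RUN_STARTS = [2347, 9827, 10585, 65681, 84093, 84589, 114812,
              123229, 124091, 136838, 154807, 161635, 163379, 167832,
              250910, 256385, 266082, 303135, 367505, 386360, 388749]
RUN_LENGTH = 100
SEQ_LENGTH = 400000


def segment_lengths(run_starts, run_length, seq_length):
    """Return (N-free segment before each run, terminal segment after the last run)."""
    segments = []
    prev_end = 0  # last base of the previous run; 0 means "before base 1"
    for start in run_starts:
        segments.append(start - prev_end - 1)  # bases prev_end+1 .. start-1
        prev_end = start + run_length - 1
    return segments, seq_length - prev_end


segments, terminal = segment_lengths(RUN_STARTS, RUN_LENGTH, SEQ_LENGTH)
total_n = RUN_LENGTH * len(RUN_STARTS)
# An interior run of L Ns touches L + 1 dinucleotides: L - 1 "NN" words plus two boundary words.
n_words = (RUN_LENGTH + 1) * len(RUN_STARTS)

print(segments[:3], terminal)
print(total_n, SEQ_LENGTH - total_n, n_words)
```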

    Table S3: CpG Analysis

    Result of the EMBOSS cpgplot program.

    CPGPLOT islands of unusual CG composition
    seq190 from 1 to 400000
    
         Observed/Expected ratio > 0.60
         Percent C + Percent G > 50.00
         Length > 200
    
     Length 214 (48..261)
     Length 201 (1005..1205)
     Length 309 (2032..2340)
     Length 280 (7186..7465)
     Length 201 (8783..8983)
     Length 657 (10682..11338)
     Length 296 (15432..15727)
     Length 284 (16457..16740)
     Length 276 (17267..17542)
     Length 546 (18091..18636)
     Length 251 (19575..19825)
     Length 240 (20007..20246)
     Length 235 (21053..21287)
     Length 565 (21341..21905)
     Length 289 (22552..22840)
     Length 262 (22863..23124)
     Length 373 (23580..23952)
     Length 376 (24411..24786)
     Length 432 (29166..29597)
     Length 462 (29614..30075)
     Length 292 (34145..34436)
     Length 327 (49521..49847)
     Length 732 (50172..50903)
     Length 204 (52093..52296)
     Length 232 (52971..53202)
     Length 717 (57386..58102)
     Length 203 (59258..59460)
     Length 201 (60141..60341)
     Length 256 (61134..61389)
     Length 348 (61796..62143)
     Length 484 (63556..64039)
     Length 242 (64124..64365)
     Length 908 (70224..71131)
     Length 352 (71780..72131)
     Length 228 (75163..75390)
     Length 401 (79268..79668)
     Length 328 (80927..81254)
     Length 459 (81801..82259)
     Length 291 (82463..82753)
     Length 363 (85045..85407)
     Length 1225 (88168..89392)
     Length 1276 (91078..92353)
     Length 485 (92862..93346)
     Length 205 (93730..93934)
     Length 964 (95970..96933)
     Length 366 (100698..101063)
     Length 232 (101519..101750)
     Length 315 (102531..102845)
     Length 208 (105212..105419)
     Length 352 (109076..109427)
     Length 312 (110686..110997)
     Length 370 (111824..112193)
     Length 291 (116648..116938)
     Length 211 (117160..117370)
     Length 583 (122655..123237)
     Length 702 (123366..124067)
     Length 553 (127123..127675)
     Length 558 (133325..133882)
     Length 215 (134785..134999)
     Length 245 (135135..135379)
     Length 333 (135626..135958)
     Length 313 (136528..136840)
     Length 823 (136947..137769)
     Length 628 (141419..142046)
     Length 219 (142812..143030)
     Length 265 (143177..143441)
     Length 456 (160223..160678)
     Length 201 (160965..161165)
     Length 245 (161337..161581)
     Length 202 (166475..166676)
     Length 414 (167962..168375)
     Length 288 (168981..169268)
     Length 702 (170030..170731)
     Length 214 (172645..172858)
     Length 308 (177087..177394)
     Length 220 (181655..181874)
     Length 257 (181938..182194)
     Length 264 (193605..193868)
     Length 495 (193945..194439)
     Length 216 (194585..194800)
     Length 284 (195719..196002)
     Length 461 (196039..196499)
     Length 395 (196968..197362)
     Length 640 (197372..198011)
     Length 223 (200801..201023)
     Length 355 (201195..201549)
     Length 746 (204637..205382)
     Length 1062 (205423..206484)
     Length 1416 (206514..207929)
     Length 211 (208074..208284)
     Length 361 (208350..208710)
     Length 353 (208780..209132)
     Length 1364 (209202..210565)
     Length 728 (210583..211310)
     Length 548 (211579..212126)
     Length 336 (212132..212467)
     Length 346 (212525..212870)
     Length 206 (213113..213318)
     Length 577 (213534..214110)
     Length 962 (214333..215294)
     Length 753 (215439..216191)
     Length 481 (216496..216976)
     Length 1734 (217011..218744)
     Length 1724 (218949..220672)
     Length 350 (220710..221059)
     Length 583 (221152..221734)
     Length 451 (236525..236975)
     Length 351 (244323..244673)
     Length 377 (247003..247379)
     Length 390 (256476..256865)
     Length 646 (256878..257523)
     Length 480 (259960..260439)
     Length 211 (264624..264834)
     Length 771 (266943..267713)
     Length 644 (267726..268369)
     Length 386 (273888..274273)
     Length 440 (274740..275179)
     Length 206 (279138..279343)
     Length 257 (284088..284344)
     Length 264 (284514..284777)
     Length 365 (285831..286195)
     Length 256 (293107..293362)
     Length 305 (293386..293690)
     Length 369 (294246..294614)
     Length 205 (299941..300145)
     Length 504 (300176..300679)
     Length 310 (302640..302949)
     Length 1038 (303377..304414)
     Length 356 (304484..304839)
     Length 2010 (304914..306923)
     Length 539 (306952..307490)
     Length 1065 (307534..308598)
     Length 853 (308780..309632)
     Length 1611 (309650..311260)
     Length 1029 (311319..312347)
     Length 1777 (312364..314140)
     Length 355 (314201..314555)
     Length 327 (316587..316913)
     Length 258 (319839..320096)
     Length 897 (321192..322088)
     Length 276 (331826..332101)
     Length 315 (332120..332434)
     Length 202 (332623..332824)
     Length 289 (336586..336874)
     Length 490 (337817..338306)
     Length 243 (338427..338669)
     Length 248 (343859..344106)
     Length 262 (344332..344593)
     Length 214 (345995..346208)
     Length 360 (347218..347577)
     Length 323 (351026..351348)
     Length 219 (363539..363757)
     Length 359 (363944..364302)
     Length 314 (367631..367944)
     Length 204 (368411..368614)
     Length 393 (369256..369648)
     Length 554 (369830..370383)
     Length 339 (370388..370726)
     Length 849 (370926..371774)
     Length 1133 (371778..372910)
     Length 275 (372940..373214)
     Length 641 (373241..373881)
     Length 1120 (373988..375107)
     Length 1062 (375158..376219)
     Length 236 (376232..376467)
     Length 1339 (376477..377815)
     Length 1138 (377892..379029)
     Length 237 (391703..391939)
     Length 428 (393220..393647)
     Length 559 (394267..394825)
     Length 232 (395054..395285)
     Length 775 (395287..396061)
     Length 983 (396070..397052)
     Length 1182 (397131..398312)
     Length 343 (398398..398740)
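    The three thresholds above can be illustrated with a simplified sliding-window scan. This is a sketch under stated assumptions, not EMBOSS's implementation: the 100 bp window is an assumed value (the output above does not report it), cpg_islands is a hypothetical helper, and merging passing windows into islands by marking every base they cover is a simplification whose boundaries may differ from cpgplot's. Observed/expected CpG is taken as the CpG count divided by (C count x G count / window length).

```python
def cpg_islands(seq, window=100, min_obs_exp=0.6, min_gc=0.5, min_length=200):
    """Return 1-based inclusive (start, end) spans of putative CpG islands.

    Every base covered by a window that passes both thresholds is marked;
    runs of marked bases longer than min_length are reported.
    """
    seq = seq.upper()
    n = len(seq)
    marked = [False] * n
    for start in range(n - window + 1):
        w = seq[start:start + window]
        c, g = w.count("C"), w.count("G")
        cpg = w.count("CG")  # "CG" cannot overlap itself, so str.count is exact
        gc_fraction = (c + g) / window
        # Observed/expected CpG = CpG / (C * G / window length)
        obs_exp = cpg * window / (c * g) if c and g else 0.0
        if gc_fraction > min_gc and obs_exp > min_obs_exp:
            marked[start:start + window] = [True] * window

    islands = []
    i = 0
    while i < n:
        if not marked[i]:
            i += 1
            continue
        j = i
        while j < n and marked[j]:
            j += 1
        if j - i > min_length:
            islands.append((i + 1, j))  # 0-based half-open -> 1-based inclusive
        i = j
    return islands


print(cpg_islands("CG" * 150))
print(cpg_islands("AT" * 200 + "CG" * 150 + "AT" * 200))
```

    On these synthetic inputs the scan returns [(1, 300)] and [(352, 749)]: the second island extends 49 bp into each AT flank, because any 100 bp window with at least 51 C/G bases still clears the 50 % threshold.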
    

    Table S4: Retroelement and Transposon Analysis

    Result of the RepeatMasker web server, http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker

    total length:     400000 bp  (397900 bp excl N/X-runs)
    GC level:         46.35 %
    bases masked:     302901 bp ( 75.73 %)
    ==================================================
                   number of      length   percentage
                   elements*    occupied  of sequence
    --------------------------------------------------
    Retroelements          109       264030 bp   66.01 %
       SINEs:                0            0 bp    0.00 %
       Penelope              0            0 bp    0.00 %
       LINEs:                2         3499 bp    0.87 %
        CRE/SLACS            0            0 bp    0.00 %
         L2/CR1/Rex          0            0 bp    0.00 %
         R1/LOA/Jockey       0            0 bp    0.00 %
         R2/R4/NeSL          0            0 bp    0.00 %
         RTE/Bov-B           0            0 bp    0.00 %
         L1/CIN4             2         3499 bp    0.87 %
       LTR elements:       107       260531 bp   65.13 %
         BEL/Pao             0            0 bp    0.00 %
         Ty1/Copia          35        79983 bp   20.00 %
         Gypsy/DIRS1        72       180548 bp   45.14 %
           Retroviral        0            0 bp    0.00 %
    
    DNA transposons         64        37209 bp    9.30 %
       hobo-Activator        5         3158 bp    0.79 %
       Tc1-IS630-Pogo        0            0 bp    0.00 %
       En-Spm               49        32925 bp    8.23 %
       MuDR-IS905            0            0 bp    0.00 %
       PiggyBac              0            0 bp    0.00 %
       Tourist/Harbinger     1          131 bp    0.03 %
       Other (Mirage,        0            0 bp    0.00 %
        P-element, Transib)
    
    Rolling-circles          0            0 bp    0.00 %
    
    Unclassified:            0            0 bp    0.00 %
    
    Total interspersed repeats:      301239 bp   75.31 %
    
    
    Small RNA:               0            0 bp    0.00 %
    
    Satellites:              0            0 bp    0.00 %
    Simple repeats:         10          832 bp    0.21 %
    Low complexity:         20          830 bp    0.21 %
    ==================================================
    
    * most repeats fragmented by insertions or deletions
      have been counted as one element
                                                          
    
    The query species was assumed to be zea           
    RepeatMasker version open-3.2.9 , default mode
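    A quick arithmetic check, offered as a sketch: the "percentage of sequence" column appears to be computed against the full 400000 bp (N runs included) rather than the 397900 bp of defined sequence. For example, 264030 / 397900 would give 66.36 % instead of the reported 66.01 %. The snippet recomputes each non-zero row and the two totals; REPORTED and percent_of_sequence are hypothetical names.

```python
TOTAL_LENGTH = 400000  # bp, N runs included

# (bp occupied, reported percentage) for the non-zero rows of Table S4
REPORTED = {
    "Retroelements": (264030, 66.01),
    "LINEs (L1/CIN4)": (3499, 0.87),
    "LTR elements": (260531, 65.13),
    "Ty1/Copia": (79983, 20.00),
    "Gypsy/DIRS1": (180548, 45.14),
    "DNA transposons": (37209, 9.30),
    "hobo-Activator": (3158, 0.79),
    "En-Spm": (32925, 8.23),
    "Tourist/Harbinger": (131, 0.03),
    "Simple repeats": (832, 0.21),
    "Low complexity": (830, 0.21),
}


def percent_of_sequence(bp, total=TOTAL_LENGTH):
    return 100 * bp / total


for name, (bp, reported) in REPORTED.items():
    # Each reported figure should equal bp / 400000 up to rounding to two decimals.
    assert abs(percent_of_sequence(bp) - reported) < 0.005, name

# "Total interspersed repeats" = retroelements + DNA transposons;
# "bases masked" additionally includes simple repeats and low-complexity regions.
interspersed = REPORTED["Retroelements"][0] + REPORTED["DNA transposons"][0]
masked = interspersed + REPORTED["Simple repeats"][0] + REPORTED["Low complexity"][0]

print(interspersed, f"{percent_of_sequence(interspersed):.2f}")
print(masked, f"{percent_of_sequence(masked):.2f}")
```

    The two totals come out as 301239 bp (75.31 %) and 302901 bp (75.73 %), matching the table.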
    

    Table S5: FGENESH Gene Prediction

    Program: FGENESH (Softberry, Inc.)

    Input: repeat-masked sequence

    FGENESH 2.6 Prediction of potential genes in Monocot genomic DNA
     Time    :   Thu Sep 16 17:51:23 2010
     Seq name: seq190 1-400000
     Length of sequence: 400000
     Number of predicted genes 11: in +chain 7, in -chain 4.
     Number of predicted exons 31: in +chain 15, in -chain 16.
     Positions of predicted genes and exons: Variant   1 from   1, Score:286.823096
       G Str   Feature   Start        End    Score           ORF           Len
    
       1 -      PolA     16467               -6.16
       1 -    1 CDSl     16677 -     16982   -5.61     16677 -     16982    306
       1 -    2 CDSi     17205 -     17348    7.13     17205 -     17348    144
       1 -    3 CDSi     18277 -     18412   15.17     18277 -     18411    135
       1 -    4 CDSi     18459 -     18559   -9.02     18461 -     18559     99
       1 -    5 CDSi     18617 -     18820   10.79     18617 -     18820    204
       1 -    6 CDSi     18937 -     19053    2.88     18937 -     19053    117
       1 -    7 CDSi     19094 -     19420   14.41     19094 -     19420    327
       1 -    8 CDSf     19656 -     19787    3.84     19656 -     19787    132
       1 -      TSS      19901               -4.18
    
       2 +      TSS      33459               -2.48
       2 +    1 CDSf     34269 -     34418    4.32     34269 -     34418    150
       2 +    2 CDSl     34533 -     34703   19.56     34533 -     34703    171
       2 +      PolA     34946               -5.46
    
       3 -      PolA     80169                0.44
       3 -    1 CDSl     81435 -     81658    2.03     81435 -     81656    222
       3 -    2 CDSi     82212 -     82287    2.53     82213 -     82287     75
       3 -    3 CDSi     82421 -     82570    1.82     82421 -     82570    150
       3 -    4 CDSf     82782 -     82982   17.48     82782 -     82982    201
       3 -      TSS      84851               -6.28
    
       4 -      PolA     87657               -2.56
       4 -    1 CDSo     88223 -     89377  119.45     88223 -     89377   1155
       4 -      TSS      89944               -2.48
    
       5 +      TSS      90767               -5.78
       5 +    1 CDSo     91169 -     92317  103.65     91169 -     92317   1149
       5 +      PolA     92428                0.44
    
       6 +      TSS     102262               -1.38
       6 +    1 CDSo    110346 -    110636   13.21    110346 -    110636    291
       6 +      PolA    111539                0.44
    
       7 +      TSS     146510               -6.68
       7 +    1 CDSo    146921 -    147088    8.26    146921 -    147088    168
       7 +      PolA    147545                0.44
    
       8 +      TSS     181589               -8.78
       8 +    1 CDSf    181731 -    181866   13.50    181731 -    181865    135
       8 +    2 CDSi    181967 -    182173    9.39    181969 -    182172    204
       8 +    3 CDSl    182809 -    182837   -3.93    182811 -    182837     27
       8 +      PolA    183545                0.44
    
       9 +      TSS     291450               -7.28
       9 +    1 CDSf    291830 -    291832   -5.36    291830 -    291832      3
       9 +    2 CDSi    292300 -    292421   18.74    292300 -    292419    120
       9 +    3 CDSl    292497 -    292545    4.35    292498 -    292545     48
       9 +      PolA    292610               -2.36
    
      10 -      PolA    297736                0.44
      10 -    1 CDSl    297947 -    298174   10.80    297947 -    298174    228
      10 -    2 CDSi    299988 -    300109   12.70    299988 -    300107    120
      10 -    3 CDSf    300942 -    300960   -4.88    300943 -    300960     18
      10 -      TSS     300964               -3.98
    
      11 +      TSS     318934               -1.58
      11 +    1 CDSf    321297 -    322032   18.73    321297 -    322031    735
      11 +    2 CDSi    323985 -    324043    0.75    323987 -    324043     57
      11 +    3 CDSi    324125 -    324181    4.74    324125 -    324181     57
      11 +    4 CDSl    324488 -    324688    6.30    324488 -    324688    201
      11 +      PolA    324740               -1.06
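    A useful sanity check on the FGENESH output is that the ORF lengths of a gene's CDS rows should sum to a multiple of three. The following is a minimal sketch that parses rows copied from genes 1, 2 and 11 above (gene 2 is shown with its TSS and PolA rows, which the parser skips). The 12-column layout of CDS rows is inferred from this table, and coding_lengths is a hypothetical helper.

```python
from collections import defaultdict

# Rows copied from Table S5 for genes 1, 2 and 11.
FGENESH_ROWS = """\
   1 -    1 CDSl     16677 -     16982   -5.61     16677 -     16982    306
   1 -    2 CDSi     17205 -     17348    7.13     17205 -     17348    144
   1 -    3 CDSi     18277 -     18412   15.17     18277 -     18411    135
   1 -    4 CDSi     18459 -     18559   -9.02     18461 -     18559     99
   1 -    5 CDSi     18617 -     18820   10.79     18617 -     18820    204
   1 -    6 CDSi     18937 -     19053    2.88     18937 -     19053    117
   1 -    7 CDSi     19094 -     19420   14.41     19094 -     19420    327
   1 -    8 CDSf     19656 -     19787    3.84     19656 -     19787    132
   2 +      TSS      33459               -2.48
   2 +    1 CDSf     34269 -     34418    4.32     34269 -     34418    150
   2 +    2 CDSl     34533 -     34703   19.56     34533 -     34703    171
   2 +      PolA     34946               -5.46
  11 +    1 CDSf    321297 -    322032   18.73    321297 -    322031    735
  11 +    2 CDSi    323985 -    324043    0.75    323987 -    324043     57
  11 +    3 CDSi    324125 -    324181    4.74    324125 -    324181     57
  11 +    4 CDSl    324488 -    324688    6.30    324488 -    324688    201
"""


def coding_lengths(text):
    """Sum the ORF length (last column) of every CDS row, per predicted gene."""
    totals = defaultdict(int)
    for line in text.splitlines():
        fields = line.split()
        # CDS rows: gene, strand, exon#, type, start, '-', end, score, orf_start, '-', orf_end, len
        if len(fields) == 12 and fields[3].startswith("CDS"):
            orf_start, orf_end, length = int(fields[8]), int(fields[10]), int(fields[11])
            if orf_end - orf_start + 1 != length:
                raise ValueError(f"inconsistent ORF length in: {line!r}")
            totals[int(fields[0])] += length
    return dict(totals)


lengths = coding_lengths(FGENESH_ROWS)
for gene, nt in sorted(lengths.items()):
    print(gene, nt, nt % 3 == 0)
```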

    Table S6: GeneMark Gene Prediction

    Program: Eukaryotic GeneMark.hmm

    Input: repeat-masked sequence

    References:
    Lomsadze A., Ter-Hovhannisyan V., Chernoff Y. and Borodovsky M.,
    "Gene identification in novel eukaryotic genomes by self-training algorithm",
    Nucleic Acids Research, 2005, Vol. 33, No. 20, 6494-6506

     

    GeneMark.hmm (Version 2.2a)
    Sequence name: Thu Sep 16 18:04:24 EDT 2010
    Sequence length: 400000 bp
    G+C content: 86.75%
    Matrix: corn
    Thu Sep 16 18:04:28 2010
    
    Predicted genes/exons
    
    Gene Exon Strand Exon           Exon Range     Exon      Start/End
      #    #         Type                         Length       Frame
    
      1     1   +  Initial      16456     16544      89          1 2
      1     2   +  Internal     16733     16880     148          3 3
      1     3   +  Terminal     17340     17432      93          1 3
    
      2     6   -  Terminal     17770     17893     124          3 3
      2     5   -  Internal     18013     18089      77          2 1
      2     4   -  Internal     18277     18412     136          3 3
      2     3   -  Internal     18621     18820     200          2 1
      2     2   -  Internal     19094     19420     327          3 1
      2     1   -  Initial      19713     19787      75          3 1
    
      3     1   +  Initial      34281     34418     138          1 3
      3     2   +  Terminal     34533     34703     171          1 3
    
      4     1   +  Initial      81056     81061       6          1 3
      4     2   +  Internal     81181     81321     141          1 3
      4     3   +  Terminal     82541     82660     120          1 3
    
      5     2   -  Terminal     82918     83065     148          3 3
      5     1   -  Initial      83222     83322     101          2 1
    
      6     1   +  Initial      85243     85325      83          1 2
      6     2   +  Terminal     85591     85837     247          3 3
    
      7     1   -  Single       88223     89377    1155          3 1
    
      8     1   +  Single       91169     92317    1149          1 3
    
      9     2   -  Terminal    100674    100811     138          3 1
      9     1   -  Initial     100827    100853      27          3 1
    
     10     1   +  Initial     100907    101041     135          1 3
     10     2   +  Terminal    101482    101865     384          1 3
    
     11     1   +  Initial     110346    110364      19          1 1
     11     2   +  Terminal    110467    110636     170          2 3
    
     12     2   -  Terminal    125610    125717     108          3 1
     12     1   -  Initial     126074    126199     126          3 1
    
     13     3   -  Terminal    146651    146737      87          3 1
     13     2   -  Internal    146914    147017     104          3 2
     13     1   -  Initial     147300    147477     178          1 1
    
     14     1   +  Initial     181731    181866     136          1 1
     14     2   +  Internal    181967    182114     148          2 2
     14     3   +  Terminal    182292    182355      64          3 3
    
     15     2   -  Terminal    221929    222002      74          3 2
     15     1   -  Initial     222109    222157      49          1 1
    
     16     1   +  Initial     271738    271746       9          1 3
     16     2   +  Terminal    273091    273183      93          1 3
    
     17     1   +  Initial     292367    292414      48          1 3
     17     2   +  Internal    292497    292546      50          1 2
     17     3   +  Terminal    293252    293561     310          3 3
    
     18     3   -  Terminal    299625    299726     102          3 1
     18     2   -  Internal    299988    300077      90          3 1
     18     1   -  Initial     300170    300265      96          3 1
    
     19     1   +  Initial     321726    321874     149          1 2
     19     2   +  Internal    321942    321998      57          3 2
     19     3   +  Internal    322110    322163      54          3 2
     19     4   +  Internal    323791    323860      70          3 3
     19     5   +  Internal    323963    324043      81          1 3
     19     6   +  Internal    324125    324181      57          1 3
     19     7   +  Internal    324280    324378      99          1 3
     19     8   +  Internal    324488    324621     134          1 2
     19     9   +  Terminal    326130    326184      55          3 3
    
     20     1   -  Terminal    367492    367498       7          3 3
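    Because Tables S5 and S6 were run on the same repeat-masked input, their exon coordinates can be compared directly. The following is a minimal sketch for the 16000 to 20000 bp region. It uses the FGENESH Start/End columns (not the ORF columns), classifies each FGENESH exon as an exact match, a same-strand overlap, or unique with respect to GeneMark, and relies on the hypothetical helper compare_exons. It is an illustration, not an analysis reported on this page.

```python
# (start, end, strand) of coding exons in the 16000-20000 bp region, copied from
# Table S5 (FGENESH gene 1) and Table S6 (GeneMark genes 1 and 2).
FGENESH = [
    (16677, 16982, "-"), (17205, 17348, "-"), (18277, 18412, "-"), (18459, 18559, "-"),
    (18617, 18820, "-"), (18937, 19053, "-"), (19094, 19420, "-"), (19656, 19787, "-"),
]
GENEMARK = [
    (16456, 16544, "+"), (16733, 16880, "+"), (17340, 17432, "+"),
    (17770, 17893, "-"), (18013, 18089, "-"), (18277, 18412, "-"),
    (18621, 18820, "-"), (19094, 19420, "-"), (19713, 19787, "-"),
]


def compare_exons(a, b):
    """Split exons in a into exact matches, same-strand overlaps, and unique exons vs b."""
    exact, overlapping, unique = [], [], []
    b_set = set(b)
    for start, end, strand in a:
        if (start, end, strand) in b_set:
            exact.append((start, end, strand))
        # Closed intervals [start, end] and [bs, be] overlap when each begins before the other ends.
        elif any(s == strand and bs <= end and start <= be for bs, be, s in b):
            overlapping.append((start, end, strand))
        else:
            unique.append((start, end, strand))
    return exact, overlapping, unique


exact, overlapping, unique = compare_exons(FGENESH, GENEMARK)
print("exact:", exact)
print("overlapping:", overlapping)
print("unique:", unique)
```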
    
    