
Seq 204 Supplemental page

     

     

     

    Table S1. Sequence composition

    Dinucleotide composition of seq204, generated on 09-16-2010 with EMBOSS
    (NUCLEIC COMPOSITION --> compseq, default settings).
    
    # Output from 'compseq'
    #
    # The Expected frequencies are calculated on the (false) assumption that every
    # word has equal frequency.
    #
    # The input sequences are:
    #	seq204
    
    
    Word size	2
    Total count	399999
    
    #
    # Word	Obs Count	Obs Frequency	Exp Frequency	Obs/Exp Frequency
    #
    AA	33064		0.0826602	0.0625000	1.3225633
    AC	20757		0.0518926	0.0625000	0.8302821
    AG	25018		0.0625452	0.0625000	1.0007225
    AT	27595		0.0689877	0.0625000	1.1038028
    CA	26150		0.0653752	0.0625000	1.0460026
    CC	22581		0.0564526	0.0625000	0.9032423
    CG	15929		0.0398226	0.0625000	0.6371616
    CT	25876		0.0646902	0.0625000	1.0350426
    GA	25241		0.0631027	0.0625000	1.0096425
    GC	21369		0.0534226	0.0625000	0.8547621
    GG	22580		0.0564501	0.0625000	0.9032023
    GT	21328		0.0533201	0.0625000	0.8531221
    TA	21971		0.0549276	0.0625000	0.8788422
    TC	25832		0.0645802	0.0625000	1.0332826
    TG	26995		0.0674877	0.0625000	1.0798027
    TT	33774		0.0844352	0.0625000	1.3509634
    
    Other	3939		0.0098475	0.0000000	10000000000.0000000
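
The header of the compseq output states that expected frequencies assume every word is equally likely, so for word size 2 each of the 16 dinucleotides has an expected frequency of 1/16 = 0.0625. The following is a minimal sketch of that calculation, not the EMBOSS implementation; the function names `dinucleotide_table` and `obs_exp_ratio` are hypothetical and introduced only for illustration.

```python
from collections import Counter
from itertools import product


def dinucleotide_table(seq, word_size=2):
    """Count overlapping words and compare them with a uniform expectation.

    Returns (total_count, table), where table maps each ACGT word to
    (observed count, observed frequency, expected frequency, obs/exp).
    Words containing other symbols (for example N) still contribute to the
    total, which mirrors the 'Other' row of the table above.
    """
    seq = seq.upper()
    words = ["".join(p) for p in product("ACGT", repeat=word_size)]
    counts = Counter(seq[i:i + word_size]
                     for i in range(len(seq) - word_size + 1))
    total = sum(counts.values())
    expected = 1.0 / len(words)  # 0.0625 for word size 2
    table = {}
    for word in words:
        observed = counts.get(word, 0) / total
        table[word] = (counts.get(word, 0), observed, expected,
                       observed / expected)
    return total, table


def obs_exp_ratio(count, total, word_size=2):
    """Obs/Exp ratio for a single row, given its count and the total count."""
    return (count / total) / (1.0 / 4 ** word_size)


# Reproduce the AA row from the table: 33064 of 399999 words.
print(round(obs_exp_ratio(33064, 399999), 7))
```

A 400000 bp sequence yields 399999 overlapping dinucleotides, which matches the "Total count" line in the output.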
    
    

    Table S2. Retroelements, transposons, and other repeat sequence analysis

    ==================================================
    file name: RM2_C__Documents_and_Settings_yu87_Desktop_Seq_204.txt_1284751110
    sequences:             1
    total length:     400000 bp  (396100 bp excl N/X-runs)
    GC level:         45.71 %
    bases masked:     297275 bp ( 74.32 %)
    ==================================================
                   number of      length   percentage
                   elements*    occupied  of sequence
    --------------------------------------------------
    Retroelements          148       284383 bp   71.10 %
       SINEs:                0            0 bp    0.00 %
       Penelope              0            0 bp    0.00 %
       LINEs:                2         1123 bp    0.28 %
        CRE/SLACS            0            0 bp    0.00 %
         L2/CR1/Rex          0            0 bp    0.00 %
         R1/LOA/Jockey       0            0 bp    0.00 %
         R2/R4/NeSL          0            0 bp    0.00 %
         RTE/Bov-B           1           70 bp    0.02 %
         L1/CIN4             1         1053 bp    0.26 %
       LTR elements:       146       283260 bp   70.81 %
         BEL/Pao             0            0 bp    0.00 %
         Ty1/Copia          69       119315 bp   29.83 %
         Gypsy/DIRS1        77       163945 bp   40.99 %
           Retroviral        0            0 bp    0.00 %
    
    DNA transposons         30        11512 bp    2.88 %
       hobo-Activator        6         1253 bp    0.31 %
       Tc1-IS630-Pogo        2          412 bp    0.10 %
       En-Spm               12         8155 bp    2.04 %
       MuDR-IS905            0            0 bp    0.00 %
       PiggyBac              0            0 bp    0.00 %
       Tourist/Harbinger     4          574 bp    0.14 %
       Other (Mirage,        0            0 bp    0.00 %
        P-element, Transib)
    
    Rolling-circles          0            0 bp    0.00 %
    
    Unclassified:            0            0 bp    0.00 %
    
    Total interspersed repeats:      295895 bp   73.97 %
    
    
    Small RNA:               0            0 bp    0.00 %
    
    Satellites:              1          179 bp    0.04 %
    Simple repeats:         15          652 bp    0.16 %
    Low complexity:         11          549 bp    0.14 %
    ==================================================
    
    * most repeats fragmented by insertions or deletions
      have been counted as one element
                                                          
    
    The query species was assumed to be zea           
    RepeatMasker version open-3.2.9 , default mode
                                       
    run with blastp version 3.0SE-AB [2009-10-30] [linux26-x64-I32LPF64 2009-10-30T17:06:09]
    RepBase Update 20090604, RM database version 20090604
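
The subtotals in the RepeatMasker summary can be cross-checked with simple arithmetic. The sketch below uses only the base-pair figures listed in the table and assumes, as the reported values suggest, that percentages are taken relative to the full 400000 bp length rather than the length excluding N/X runs. The variable names are illustrative only.

```python
TOTAL_LENGTH = 400000  # bp, the "total length" line of the summary

# Occupied lengths (bp) copied from the summary table.
sines = 0
penelope = 0
line_elements = 1123
ltr_families = {"Ty1/Copia": 119315, "Gypsy/DIRS1": 163945}
dna_transposons = 11512
satellites = 179
simple_repeats = 652
low_complexity = 549

ltr_total = sum(ltr_families.values())
retroelements = sines + penelope + line_elements + ltr_total
interspersed = retroelements + dna_transposons
masked = interspersed + satellites + simple_repeats + low_complexity


def percent(bp, total=TOTAL_LENGTH):
    """Share of the sequence occupied, in percent."""
    return 100.0 * bp / total


print("LTR elements:", ltr_total, "bp")
print("Retroelements:", retroelements, "bp")
print("Total interspersed repeats:", interspersed, "bp")
print("Bases masked:", masked, "bp", f"({percent(masked):.2f} %)")
```

Each computed subtotal agrees with the corresponding line of the table (283260, 284383, 295895, and 297275 bp).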
    


     

    Table S3. CpG analysis

    Output of the EMBOSS cpgplot program.

    CPGPLOT islands of unusual CG composition
    seq204 from 1 to 400000
     
         Observed/Expected ratio > 0.60
         Percent C + Percent G > 50.00
         Length > 200
     
     Length 419 (2271..2689)
     
     Length 407 (10036..10442)
     
     Length 336 (11951..12286)
     
     Length 531 (15444..15974)
     
     Length 553 (15986..16538)
     
     Length 636 (18299..18934)
     
     Length 389 (18947..19335)
     
     Length 359 (39432..39790)
     
     Length 241 (42473..42713)
     
     Length 869 (43734..44602)
     
     Length 684 (51940..52623)
     
     Length 206 (56579..56784)
     
     Length 317 (60203..60519)
     
     Length 215 (69781..69995)
     
     Length 254 (70522..70775)
     
     Length 709 (75821..76529)
     
     Length 226 (76829..77054)
     
     Length 427 (77059..77485)
     
     Length 202 (77846..78047)
     
     Length 410 (78126..78535)
     
     Length 362 (78567..78928)
     
     Length 399 (79104..79502)
     
     Length 353 (80459..80811)
     
     Length 236 (93061..93296)
     
     Length 236 (100021..100256)
     
     Length 235 (100533..100767)
     
     Length 261 (101946..102206)
     
     Length 281 (110795..111075)
     
     Length 233 (114226..114458)
     
     Length 295 (115567..115861)
     
     Length 455 (119665..120119)
     
     Length 271 (121304..121574)
     
     Length 765 (121911..122675)
     
     Length 207 (122714..122920)
     
     Length 469 (122999..123467)
     
     Length 392 (123483..123874)
     
     Length 363 (123921..124283)
     
     Length 1754 (124870..126623)
     
     Length 547 (126877..127423)
     
     Length 277 (127545..127821)
     
     Length 503 (130786..131288)
     
     Length 397 (131592..131988)
     
     Length 260 (132816..133075)
     
     Length 472 (136450..136921)
     
     Length 243 (137182..137424)
     
     Length 380 (141376..141755)
     
     Length 634 (142541..143174)
     
     Length 249 (145788..146036)
     
     Length 433 (150365..150797)
     
     Length 260 (152265..152524)
     
     Length 1080 (152588..153667)
     
     Length 750 (153692..154441)
     
     Length 1243 (154522..155764)
     
     Length 1053 (155778..156830)
     
     Length 203 (163478..163680)
     
     Length 235 (164020..164254)
     
     Length 907 (164267..165173)
     
     Length 856 (165849..166704)
     
     Length 256 (168392..168647)
     
     Length 371 (171575..171945)
     
     Length 266 (173602..173867)
     
     Length 217 (174194..174410)
     
     Length 728 (182327..183054)
     
     Length 310 (186395..186704)
     
     Length 376 (187210..187585)
     
     Length 404 (190476..190879)
     
     Length 933 (200000..200932)
     
     Length 252 (206506..206757)
     
     Length 793 (208939..209731)
     
     Length 200 (218820..219019)
     
     Length 566 (225721..226286)
     
     Length 223 (228876..229098)
     
     Length 256 (229329..229584)
     
     Length 307 (230702..231008)
     
     Length 580 (239134..239713)
     
     Length 588 (240371..240958)
     
     Length 226 (241293..241518)
     
     Length 262 (241765..242026)
     
     Length 480 (242413..242892)
     
     Length 342 (246550..246891)
     
     Length 384 (251147..251530)
     
     Length 214 (251713..251926)
     
     Length 339 (256384..256722)
     
     Length 224 (256951..257174)
     
     Length 570 (257882..258451)
     
     Length 507 (259349..259855)
     
     Length 1070 (259921..260990)
     
     Length 504 (261014..261517)
     
     Length 225 (261569..261793)
     
     Length 336 (261869..262204)
     
     Length 703 (262216..262918)
     
     Length 1206 (262983..264188)
     
     Length 347 (273469..273815)
     
     Length 224 (274991..275214)
     
     Length 628 (279726..280353)
     
     Length 255 (281605..281859)
     
     Length 458 (282433..282890)
     
     Length 354 (285892..286245)
     
     Length 317 (286383..286699)
     
     Length 326 (286769..287094)
     
     Length 260 (287101..287360)
     
     Length 819 (287362..288180)
     
     Length 203 (288416..288618)
     
     Length 853 (288677..289529)
     
     Length 493 (289548..290040)
     
     Length 259 (290049..290307)
     
     Length 763 (290328..291090)
     
     Length 276 (291442..291717)
     
     Length 336 (292364..292699)
     
     Length 631 (293522..294152)
     
     Length 504 (296632..297135)
     
     Length 255 (297749..298003)
     
     Length 224 (298376..298599)
     
     Length 213 (302925..303137)
     
     Length 460 (304414..304873)
     
     Length 340 (305785..306124)
     
     Length 265 (306505..306769)
     
     Length 202 (308905..309106)
     
     Length 533 (311920..312452)
     
     Length 1077 (312519..313595)
     
     Length 855 (313683..314537)
     
     Length 1778 (314611..316388)
     
     Length 208 (318157..318364)
     
     Length 271 (318551..318821)
     
     Length 216 (318942..319157)
     
     Length 652 (319191..319842)
     
     Length 720 (320112..320831)
     
     Length 819 (320901..321719)
     
     Length 770 (321984..322753)
     
     Length 444 (322946..323389)
     
     Length 516 (323394..323909)
     
     Length 451 (324603..325053)
     
     Length 600 (325062..325661)
     
     Length 722 (325677..326398)
     
     Length 445 (326549..326993)
     
     Length 271 (327011..327281)
     
     Length 711 (327609..328319)
     
     Length 254 (328357..328610)
     
     Length 685 (328642..329326)
     
     Length 938 (329336..330273)
     
     Length 822 (330502..331323)
     
     Length 222 (331496..331717)
     
     Length 730 (331724..332453)
     
     Length 299 (333196..333494)
     
     Length 420 (334444..334863)
     
     Length 248 (334945..335192)
     
     Length 802 (335271..336072)
     
     Length 324 (336241..336564)
     
     Length 306 (336899..337204)
     
     Length 595 (337708..338302)
     
     Length 770 (341237..342006)
     
     Length 276 (349880..350155)
     
     Length 321 (350893..351213)
     
     Length 205 (351219..351423)
     
     Length 399 (354373..354771)
     
     Length 238 (355607..355844)
     
     Length 323 (362515..362837)
     
     Length 208 (367182..367389)
     
     Length 306 (370015..370320)
     
     Length 578 (370833..371410)
     
     Length 471 (371889..372359)
     
     Length 652 (372572..373223)
     
     Length 203 (375531..375733)
     
     Length 287 (376271..376557)
     
     Length 297 (376635..376931)
     
     Length 710 (377285..377994)
     
     Length 1033 (378077..379109)
     
     Length 480 (379145..379624)
     
     Length 300 (379910..380209)
     
     Length 268 (380587..380854)
     
     Length 271 (380918..381188)
     
     Length 219 (382464..382682)
     
     Length 1072 (383755..384826)
     
     Length 935 (384849..385783)
     
     Length 800 (386079..386878)
     
     Length 364 (387101..387464)
     
     Length 1059 (387476..388534)
     
     Length 425 (388630..389054)
     
     Length 653 (389070..389722)
     
     Length 905 (389889..390793)
     
     Length 1079 (390859..391937)
     
     Length 287 (392001..392287)
     
     Length 859 (392797..393655)
     
     Length 344 (394191..394534)
     
     Length 542 (394544..395085)
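
The three thresholds listed at the top of the cpgplot report (observed/expected ratio, percent C + percent G, and minimum length) can be illustrated for a single candidate region. This is a simplified sketch: it takes the expected CpG count as (number of C × number of G) / region length, and it does not reproduce cpgplot's sliding-window averaging. The function names are hypothetical.

```python
def cpg_region_stats(region):
    """Return (observed/expected CpG ratio, percent C+G) for one region."""
    region = region.upper()
    n = len(region)
    if n == 0:
        return 0.0, 0.0
    c_count = region.count("C")
    g_count = region.count("G")
    cpg_count = region.count("CG")  # "CG" cannot overlap itself
    expected = (c_count * g_count) / n
    obs_exp = cpg_count / expected if expected else 0.0
    gc_percent = 100.0 * (c_count + g_count) / n
    return obs_exp, gc_percent


def passes_thresholds(region, min_ratio=0.60, min_gc=50.0, min_len=200):
    """Apply the three report thresholds to one region.

    The listing includes a 200 bp island, so the length test is treated
    as inclusive here (an assumption made for this sketch).
    """
    obs_exp, gc_percent = cpg_region_stats(region)
    return len(region) >= min_len and obs_exp > min_ratio and gc_percent > min_gc


# Toy regions, not taken from seq204.
print(cpg_region_stats("CG" * 100))
print(passes_thresholds("CG" * 100), passes_thresholds("AT" * 100))
```

The coordinates in the listing are 1-based and inclusive, so each reported length equals end - start + 1 (for example, 2689 - 2271 + 1 = 419).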

     

    Figure 1. CpG analysis (image file: Seq204-cpgplot.png)

    Table S4. FGENESH gene prediction

            Program: FGENESH, Softberry, Inc.

    Input: repeat-masked sequence (seq204)

     

    FGENESH 2.6 Prediction of potential genes in Monocot genomic DNA
     Time    :   Sun Oct 24 09:30:48 2010
     Seq name: seq204 1600001-2000000 Xiaoqing Yu 09-16-2010
     Length of sequence: 400000
     Number of predicted genes 8: in +chain 3, in -chain 5.
     Number of predicted exons 34: in +chain 9, in -chain 25.
     Positions of predicted genes and exons: Variant   1 from   1, Score:532.120410
       G Str   Feature   Start        End    Score           ORF           Len

       1 -      PolA     32243                     0.44
       1 -    1 CDSl     32419 -     32718   15.92     32419 -     32718    300
       1 -    2 CDSi     32837 -     33184   15.72     32837 -     33184    348
       1 -    3 CDSi     33264 -     33362    8.66     33264 -     33362     99
       1 -    4 CDSi     33440 -     33532    0.37     33440 -     33532     93
       1 -    5 CDSi     33897 -     34147   12.19     33897 -     34145    249
       1 -    6 CDSi     34248 -     34392    9.23     34249 -     34392    144
       1 -    7 CDSi     34720 -     34774    1.13     34720 -     34773     54
       1 -    8 CDSi     35485 -     35601    5.36     35487 -     35600    114
       1 -    9 CDSi     35657 -     36021    1.86     35659 -     36021    363
       1 -   10 CDSi     37912 -     38123   12.90     37912 -     38121    210
       1 -   11 CDSf     43786 -     43858    8.53     43787 -     43858     72
       1 -      TSS      44869               -2.68

       2 -      PolA     76045               -9.56
       2 -    1 CDSl     76128 -     76539   26.56     76128 -     76538    411
       2 -    2 CDSf     76757 -     76824    2.01     76759 -     76824     66
       2 -      TSS      78551                     -1.88

       3 -      PolA    148828                    -4.26
       3 -    1 CDSl    149005 -    149361   30.79    149005 -    149361    357
       3 -    2 CDSi    149457 -    149762    7.15    149457 -    149762    306
       3 -    3 CDSf    151760 -    151894    9.95    151760 -    151894    135
       3 -      TSS     151985                     -9.98

       4 +      TSS     165010                    -0.28
       4 +    1 CDSf    166294 -    166992   43.73    166294 -    166992    699
       4 +    2 CDSi    167633 -    167755   10.48    167633 -    167755    123
       4 +    3 CDSi    167847 -    167913    6.65    167847 -    167912     66
       4 +    4 CDSl    168368 -    168372   -3.36    168370 -    168372      3
       4 +      PolA    168470                    -1.06

       5 -      PolA    175722                      0.44
       5 -    1 CDSl    176135 -    176237    4.98    176135 -    176236    102
       5 -    2 CDSi    176907 -    177193   12.63    176909 -    177193    285
       5 -    3 CDSi    177295 -    177372   12.04    177295 -    177372     78
       5 -    4 CDSi    182334 -    182518   22.17    182334 -    182516    183
       5 -    5 CDSf    182623 -    182686    9.75    182624 -    182686     63
       5 -      TSS     183205                     -8.38

       6 +      TSS     312275                     -3.48
       6 +    1 CDSf    312591 -    313327   49.27    312591 -    313325    735
       6 +    2 CDSi    313684 -    314533   90.01    313685 -    314533    849
       6 +    3 CDSl    314638 -    316446  171.89    314638 -    316446   1809
       6 +      PolA    317010                     0.44

       7 +      TSS     341193                     -5.48
       7 +    1 CDSf    341280 -    341463   24.50    341280 -    341462    183
       7 +    2 CDSl    341580 -    341881   25.36    341582 -    341881    300
       7 +      PolA    343037                     0.44

       8 -      PolA    347134                      0.44
       8 -    1 CDSl    347278 -    347297    3.25    347278 -    347295     18
       8 -    2 CDSi    348407 -    348761   14.05    348408 -    348761    354
       8 -    3 CDSi    348867 -    348982    4.19    348867 -    348980    114
       8 -    4 CDSf    349143 -    349188    3.96    349144 -    349188     45
       8 -      TSS     349277                     -6.88
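
As a quick sanity check on the FGENESH table, the coding length of a predicted gene is the sum of its CDS exon lengths (end - start + 1, since coordinates are 1-based and inclusive), and a complete coding sequence should be a multiple of 3. The sketch below applies this to genes 2, 6, and 7 using the Start and End columns; the `genes` dictionary and `cds_length` helper are illustrative names only.

```python
# (start, end) of CDS exons, 1-based inclusive, copied from the table above.
genes = {
    2: [(76128, 76539), (76757, 76824)],
    6: [(312591, 313327), (313684, 314533), (314638, 316446)],
    7: [(341280, 341463), (341580, 341881)],
}


def cds_length(exons):
    """Total coding length in bp for a list of (start, end) exons."""
    return sum(end - start + 1 for start, end in exons)


for gene_id, exons in sorted(genes.items()):
    length = cds_length(exons)
    codons, remainder = divmod(length, 3)
    print(f"gene {gene_id}: {length} bp, {codons} codons, remainder {remainder}")
```

For these three genes the totals are 480, 3396, and 486 bp, each divisible by 3.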

    Table S5. GeneMark gene prediction

            Program: Eukaryotic GeneMark.hmm

    Input: repeat-masked sequence (seq204)

    References:
    Lomsadze A., Ter-Hovhannisyan V., Chernoff Y. and Borodovsky M.,
    "Gene identification in novel eukaryotic genomes by self-training algorithm",
    Nucleic Acids Research, 2005, Vol. 33, No. 20, 6494-6506

     

    GeneMark.hmm (Version 2.2a)
    Sequence name: >seq204 1600001-2000000 Xiaoqing Yu 09-16-2010
    
    Sequence length: 400000 bp
    G+C content: 85.88%
    Matrix: corn
    Sun Oct 24 09:56:32 2010
    
    Predicted genes/exons
    
    Gene Exon Strand Exon           Exon Range     Exon      Start/End
      #    #         Type                         Length       Frame
    
      1     1   +  Initial      15815     15933     119          1 2
      1     2   +  Terminal     15999     16050      52          3 3
    
      2     1   +  Initial      18719     18889     171          1 3
      2     2   +  Terminal     18979     19107     129          1 3
    
      3     1   +  Initial      55692     55703      12          1 3
      3     2   +  Terminal     56473     56778     306          1 3
    
      4     3   -  Terminal     75955     76057     103          3 3
      4     2   -  Internal     76135     76539     405          2 3
      4     1   -  Initial      76757     76824      68          2 1
    
      5     1   +  Initial     103450    103473      24          1 3
      5     2   +  Terminal    103568    103639      72          1 3
    
      6     3   -  Terminal    142875    142976     102          3 1
      6     2   -  Internal    143016    143153     138          3 1
      6     1   -  Initial     143495    143554      60          3 1
    
      7     3   -  Terminal    149005    149361     357          3 1
      7     2   -  Internal    149457    149514      58          3 3
      7     1   -  Initial     150323    150372      50          2 1
    
      8     2   -  Terminal    150427    150515      89          3 2
      8     1   -  Initial     150692    150806     115          1 1
    
      9     1   +  Initial     164826    164872      47          1 2
      9     2   +  Internal    166152    166992     841          3 3
      9     3   +  Internal    167633    167755     123          1 3
      9     4   +  Terminal    167847    167939      93          1 3
    
     10     8   -  Terminal    172660    172816     157          3 3
     10     7   -  Internal    173046    173128      83          2 1
     10     6   -  Internal    173323    173418      96          3 1
     10     5   -  Internal    173675    173779     105          3 1
     10     4   -  Internal    182334    182518     185          3 2
     10     3   -  Internal    182623    182735     113          1 3
     10     2   -  Internal    182977    183041      65          2 1
     10     1   -  Initial     183590    183679      90          3 1
    
     11     1   +  Initial     226021    226122     102          1 3
     11     2   +  Terminal    229998    230090      93          1 3
    
     12     1   +  Initial     279709    279781      73          1 1
     12     2   +  Terminal    279901    279998      98          2 3
    
     13     1   +  Initial     312591    313609    1019          1 2
     13     2   +  Internal    313708    314533     826          3 3
     13     3   +  Terminal    314638    316446    1809          1 3
    
     14     1   +  Initial     341280    341463     184          1 1
     14     2   +  Terminal    341580    341881     302          2 3
    
     15     1   -  Terminal    355721    355843     123          3 1
    
    
