Overall Characteristics

    Base and Dinucleotide content (EMBOSS compseq)

    Total count	150000
    # Word	Obs Count	Obs Frequency	Exp Frequency	Obs/Exp Frequency
    A	41947		0.2796467	0.2500000	1.1185867
    C	32635		0.2175667	0.2500000	0.8702667
    G	33334		0.2222267	0.2500000	0.8889067
    T	41084		0.2738933	0.2500000	1.0955733
    Other 	1000		0.0066667	0.0000000	NA
    
    AA	13686		0.0912406	0.0782023	1.1667260
    AC	8058		0.0537204	0.0608418	0.8829516
    AG	9289		0.0619271	0.0621449	0.9964942
    AT	10912		0.0727472	0.0765934	0.9497841
    CA	10173		0.0678205	0.0608418	1.1147017
    CC	7929		0.0528604	0.0473353	1.1167227
    CG	5280		0.0352002	0.0483491	0.7280430
    CT	9249		0.0616604	0.0595901	1.0347432
    GA	9616		0.0641071	0.0621449	1.0315737
    GC	7499		0.0499937	0.0483491	1.0340141
    GG	8082		0.0538804	0.0493847	1.0910336
    GT	8135		0.0542337	0.0608664	0.8910284
    TA	8471		0.0564737	0.0765934	0.7373186
    TC	9144		0.0609604	0.0595901	1.0229962
    TG	10680		0.0712005	0.0608664	1.1697829
    TT	12786		0.0852406	0.0750176	1.1362749

    Other is presumably 'N' bases; indeed, there are 10 runs of Ns in this sequence, each 100 bases long.  Because every run has the same length, I infer that the Ns do not represent actual estimated gap lengths in the scaffold.  Note that CG dinucleotides are less frequent than random expectation (Obs/Exp 0.73), though only barely less so than TA (0.74).  The overall GC content, 43.98%, is similar to, but somewhat lower than, the overall GC content of the maize genome (46.5%; ref. 1).  This suggests that this segment of DNA may have a lower than average gene density.
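
    The figures quoted above can be recomputed directly from the compseq counts.  The sketch below is only a cross-check: it assumes that the expected dinucleotide frequency is the product of the two observed base frequencies (which is consistent with the Exp Frequency column) and that frequencies are taken over the 149999 overlapping 2-mers of a 150000 bp sequence.

```python
# Counts copied from the compseq table above.
mono = {"A": 41947, "C": 32635, "G": 33334, "T": 41084}
n_other = 1000                          # "Other" words, presumably N
total = sum(mono.values()) + n_other    # 150000 bases

gc = mono["G"] + mono["C"]
gc_all = gc / total                     # N bases kept in the denominator
gc_acgt = gc / (total - n_other)        # N bases excluded


def obs_exp(dinuc, count, n_words):
    """Observed/expected ratio; expected = product of the two base frequencies."""
    observed = count / n_words
    expected = (mono[dinuc[0]] / total) * (mono[dinuc[1]] / total)
    return observed / expected


n_dinuc = total - 1                     # overlapping 2-mers in the sequence
cg_ratio = obs_exp("CG", 5280, n_dinuc)
ta_ratio = obs_exp("TA", 8471, n_dinuc)

print(f"GC content (all bases):    {gc_all:.2%}")
print(f"GC content (A/C/G/T only): {gc_acgt:.2%}")
print(f"CG obs/exp: {cg_ratio:.3f}   TA obs/exp: {ta_ratio:.3f}")
```

    The two GC values differ only in whether the 1000 N bases are counted in the denominator, which explains why compseq-derived and RepeatMasker-derived GC levels need not agree exactly.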

    Transposons and Other Repeats (Repeatmasker)

    repeats.jpg


    About 95 kbp (63.5%) of this sequence comprises repeated sequences.  The majority of the repeats are Gypsy and Copia type retrotransposons (91.7% of repeated sequence).  The relatively high content of transposons may account for the relatively low GC content.  In comparison, the overall maize genome comprises only 30.7% repeats, of which 61.3% are Copia and Gypsy retroelements (ref. 1).  The overall distribution of repeats is shown in the figure above (repeats.jpg).  Repeats on the forward strand are shown in blue, and those on the reverse strand in red.  The two lower tracks show the GC content (top) and GC skew, (G - C) / (G + C) (bottom), with positive values shown in yellow and negative values in purple.  GC skew has been found to be high near transcriptional start sites in plants (ref. 2).  Some of the highest GC skew values are associated with transposon-poor regions of the sequence, suggesting the presence of non-transposon genes.
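
    For reference, the GC skew track described above can be sketched as a sliding-window calculation of (G - C) / (G + C).  The window size, step, and toy sequence below are illustrative assumptions; the settings used for the figure are not recorded here.

```python
def gc_skew(window):
    """Return (G - C) / (G + C) for one window, or 0.0 if it has no G or C."""
    g = window.count("G")
    c = window.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def skew_track(seq, size, step):
    """Yield (start, skew) for each full window; start is 0-based."""
    seq = seq.upper()
    for start in range(0, len(seq) - size + 1, step):
        yield start, gc_skew(seq[start:start + size])


# Hypothetical toy sequence, not taken from sequence 7.
toy = "GGGGCATTTTCCCCGAAA"
track = list(skew_track(toy, size=6, step=6))
for start, skew in track:
    print(start, round(skew, 3))
```

    Windows made up entirely of N, A, or T have no defined skew; this sketch reports 0.0 for them, which is a design choice rather than a standard.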

    total length:     150000 bp  (149000 bp excluding N/X-runs)
    GC level:         44.27 %
    bases masked:      95331 bp ( 63.55 %)
    ====================================================
                   number of      length   percentage
                   elements*    occupied  of sequence
    ----------------------------------------------------
    Retroelements           44        87627 bp   58.42 %
       SINEs:                0            0 bp    0.00 %
       Penelope              0            0 bp    0.00 %
       LINEs:                1          180 bp    0.12 %
        CRE/SLACS            0            0 bp    0.00 %
         L2/CR1/Rex          0            0 bp    0.00 %
         R1/LOA/Jockey       0            0 bp    0.00 %
         R2/R4/NeSL          0            0 bp    0.00 %
         RTE/Bov-B           0            0 bp    0.00 %
         L1/CIN4             1          180 bp    0.12 %
       LTR elements:        43        87447 bp   58.30 %
         BEL/Pao             0            0 bp    0.00 %
         Ty1/Copia          19        64439 bp   42.96 %
         Gypsy/DIRS1        24        23008 bp   15.34 %
           Retroviral        0            0 bp    0.00 %
    
    DNA transposons         17         5884 bp    3.92 %
       hobo-Activator        9         4323 bp    2.88 %
       Tc1-IS630-Pogo        1          144 bp    0.10 %
       En-Spm                0            0 bp    0.00 %
       MuDR-IS905            0            0 bp    0.00 %
       PiggyBac              0            0 bp    0.00 %
       Tourist/Harbinger     4          444 bp    0.30 %
       Other (Mirage,        0            0 bp    0.00 %
        P-element, Transib)
    
    Rolling-circles          0            0 bp    0.00 %
    
    Unclassified:            8          800 bp    0.53 %
    
    Total interspersed repeats:       94311 bp   62.87 %
    
    
    Small RNA:               0            0 bp    0.00 %
    
    Satellites:              0            0 bp    0.00 %
    Simple repeats:          7          754 bp    0.50 %
    Low complexity:          6          266 bp    0.18 %
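
    The percentages quoted in the text follow from the RepeatMasker totals above.  A small cross-check, using only numbers copied from the table (variable names are mine):

```python
total_bp = 150000          # total length
masked_bp = 95331          # bases masked
interspersed_bp = 94311    # total interspersed repeats
simple_bp = 754            # simple repeats
low_complexity_bp = 266    # low complexity
copia_bp = 64439           # Ty1/Copia
gypsy_bp = 23008           # Gypsy/DIRS1

# Masked bases = interspersed repeats + simple repeats + low complexity.
components_bp = interspersed_bp + simple_bp + low_complexity_bp

masked_fraction = masked_bp / total_bp
ltr_share = (copia_bp + gypsy_bp) / masked_bp

print(f"masked: {masked_fraction:.2%} of the sequence")
print(f"Copia + Gypsy: {ltr_share:.1%} of masked bases")
```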
    

     

    Gene Predictions

    Summary Gene Predictions

    Gene predictions were made with Genemark (http://exon.gatech.edu/eukhmm.cgi) and FGENESH using Maize parameters.

    Begin  End Comments
    43209 47476

    Gene 7-1 [details]

    Appears to be a member of the RENi family.  Matches are seen throughout the land plants, including Physcomitrella, monocots, and dicots.  The protein comprises a SUMO-like domain fused to an upstream domain.  This overall organization is reminiscent of RAD60 or NIP45, which have been termed RENi proteins (Rad60-Esc2p-Nip45).  This family of proteins possesses both SUMO-like domains and SUMO-binding motifs; in RAD60 these domains apparently function in self-association.  While the family was originally identified in animals, plant proteins belonging to this family have been identified in Arabidopsis, Zea, and Oryza (ref. 4).

       

    Strong EST Support for this gene model (> 10 ESTs for most exons)

          43209           43361           UTR          
          43363           43370           exon 1
          43832           43925           exon 2
          43994           44061           exon 3
          44147           44286           exon 4
          45831           46044           exon 5
          46961           47123           exon 6
          47124           47476           UTR
    70370 71365

    Pseudogene 7-2 [details]

    Very significant matches to protein phosphatase 2C, but only the middle half of the protein appears to be present.  Even in the matching regions, many missense and frameshift errors must be assumed to construct a gene model.  The presence of nearby matches to a retrotransposon gag polyprotein further suggests that this gene may have been interrupted by a transposon insertion.  ESTs that match in this region show relatively poor matches (<95% identity), suggesting that they do not come from this gene region.  There are no high-identity matches, implying that this region is not transcribed.

    82688 71953

    Gene 7-4 [details]

    Homolog of Arabidopsis LEUNIG, a WD-40 transcriptional co-repressor.  Very good EST support for a full-length gene.  Several alternatively spliced regions can be seen from the ESTs.  FGENESH suggests a TSS at 82688, while the farthest upstream EST is at 82440; a TSS near either position seems equally plausible.  In the gene model below, all splice junctions are canonical and consistent with available EST information (991 ESTs).  The final gene model relies most heavily on the 312 ESTs with >95% identity to the genomic sequence.

    In plants, LEUNIG is usually present as two homologs, referred to as LEUNIG (LUG) and LEUNIG-homolog (LUH).  The genes have at least partial functional overlap based on knockout analysis (Sitaraman et al., 2008).  The gene model presented below is an end-to-end match with other plant LEUNIG genes.

    LEUNIG is a transcriptional co-repressor that lacks a DNA interaction domain.  Interaction specificity is usually provided by an adaptor protein such as SEUSS (SEU).  The LUG/SEUSS complex appears to act similarly to the yeast Tup1/Ssn6 complex, and is believed to be a repressor of AGAMOUS (AG) (Liu & Meyerowitz, 1995).  lug/luh mutants also show defects in leaf growth and polarity (Stahle et al., 2009), and LUG and LUH possibly act in conjunction with YAB proteins.
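
    The statement that all splice junctions are canonical can be checked mechanically once the genomic sequence is loaded.  The sketch below assumes "canonical" means GT...AG introns read on the coding strand, takes 1-based inclusive coordinates on the forward strand, and uses a made-up toy sequence; the function names are hypothetical helpers, not part of any tool used here.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def introns_between(exons):
    """Given (start, end) exon pairs, return the gaps between neighbours."""
    ordered = sorted(exons)
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(ordered, ordered[1:])]


def intron_is_canonical(genome, start, end, strand):
    """True if the intron starts with GT and ends with AG on its own strand."""
    intron = genome[start - 1:end].upper()
    if strand == "-":
        intron = revcomp(intron)
    return intron.startswith("GT") and intron.endswith("AG")


# Toy example: two exons flanking a GT...AG intron on the forward strand.
toy_plus = "AAAGTCCCCAGTTT"
toy_exons = [(1, 3), (12, 14)]
toy_introns = introns_between(toy_exons)
plus_ok = [intron_is_canonical(toy_plus, s, e, "+") for s, e in toy_introns]

# The same gene placed on the reverse strand of another toy sequence.
toy_minus = revcomp(toy_plus)
minus_ok = intron_is_canonical(toy_minus, 4, 11, "-")
```

    For a minus-strand gene such as 7-4, the intron is reverse-complemented before the check, so the donor GT appears as AC at the high-coordinate end of the forward-strand sequence.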

    71953 72256   UTR
    72257 72400   exon 18                    
    72736 72942   exon 17               
    73430 73549   exon 16               
    73650 73814   exon 15               
    73902 74051   exon 14               
    74136 74245   exon 13               
    74629 74746   exon 12               
    74853 75083   exon 11               
    75647 75844   exon 10               
    77217 77513   exon 9               
    77905 77954   exon 8               
    78111 78186   exon 7               
    78290 78380   exon 6               
    79556 79641   exon 5               
    79739 80149   exon 4 
    80684 80793   exon 3              
    80917 81014   exon 2               
    82020 82051   exon 1  
    82052 82688   UTR   
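
    As a quick sanity check on the exon tables for genes 7-1 and 7-4, the coding exon lengths can be summed and tested for divisibility by three.  This is a necessary but not sufficient condition for an intact reading frame, and it assumes the listed exon ranges exclude the UTRs and include the stop codon.

```python
# Coding exons (start, end), 1-based inclusive, copied from the tables above.
GENE_7_1 = [
    (43363, 43370), (43832, 43925), (43994, 44061),
    (44147, 44286), (45831, 46044), (46961, 47123),
]
GENE_7_4 = [
    (72257, 72400), (72736, 72942), (73430, 73549), (73650, 73814),
    (73902, 74051), (74136, 74245), (74629, 74746), (74853, 75083),
    (75647, 75844), (77217, 77513), (77905, 77954), (78111, 78186),
    (78290, 78380), (79556, 79641), (79739, 80149), (80684, 80793),
    (80917, 81014), (82020, 82051),
]


def cds_length(exons):
    """Total coding length in bp for inclusive (start, end) exon ranges."""
    return sum(end - start + 1 for start, end in exons)


for name, exons in (("7-1", GENE_7_1), ("7-4", GENE_7_4)):
    length = cds_length(exons)
    print(f"gene {name}: {len(exons)} exons, {length} bp, "
          f"in frame: {length % 3 == 0}")
```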
    
    82000 99000 There are several possible matches in this region: a calcium dependent or calcium/calmodulin dependent kinase, a partial nucleolin, and a fragment of an E3 ubiquitin ligase.  I have not been able to complete a full length gene model for any of them, and there is no EST support for genes in this region. The full length/long cDNAs that map here are, at least for the most part, fairly low identity.
         
         
         
         

     

    Working List

    NP = Not predicted

    Gene    Genemark Begin    Genemark End    FGENESH Begin    FGENESH    EST    Blast
                 
     7-5 125041 125158       protein phosphatase 2c, putative.
     7-5.1  125041  125158        
     7-5.2  125225  125313        
                 
     7-6 125868 127543        
     7-6.1 125868 127262        
     7-6.2 127373 127543        
                 
    7-7 136895 131540        
    7-7.1 136895 135874        
    7-7.2 135786 135506        
    7-7.3 132211 131844        
    7-7.4 131818 131540        
                 
    7-8 145538 145410        
    7-8.1 145538 145410        

    Putative genes with mRNA/cDNA support

    Blast search (NCBI) filtered for Z. mays specific repeats then filtered for genes that have mRNA/cDNA support.
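
    The identity filtering used throughout this page (for example, the >95% identity cutoff applied to ESTs) can be sketched as follows.  This assumes the hits have been exported in the standard 12-column BLAST tabular format; the searches here were run through the NCBI web interface, so the export step, the function name, and the two example rows are all hypothetical.

```python
import csv
import io

# Standard 12-column BLAST tabular layout.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def high_identity_hits(tabular_text, min_identity=95.0):
    """Return hits (as dicts) whose percent identity exceeds min_identity."""
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    hits = []
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        if float(hit["pident"]) > min_identity:
            hits.append(hit)
    return hits


# Two made-up rows for illustration only.
rows = [
    ["seq7", "est_A", "99.1", "450", "4", "0", "72257", "72706", "1", "450", "0.0", "800"],
    ["seq7", "est_B", "88.0", "300", "36", "2", "70370", "70669", "1", "300", "1e-90", "330"],
]
example = "\n".join("\t".join(r) for r in rows)
kept = high_identity_hits(example)
print([h["sseqid"] for h in kept])
```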

    Needs color key.  I'm guessing, black are not interesting, yellow maybe, and pink definitely?  But red seems to mean a pseudogene.

    Accession Description Max score Total score Query coverage E value Max ident Begin End
    FJ614806.1 Similarity to Zea mays cultivar B73 region flanking p gene cluster on Chr 1, but not P genes themselves 5777 22636 9% 0 93% 1734 2996
    NM_001143648 PUTATIVE LEUNIG-LIKE TRANSCRIPTION FACTOR            4,180 2,434
    EU940857.1 Zea mays clone 1164521 mRNA sequence 991 2274 1% 0 94% 4196 3305
    EU940857.1 Zea mays clone 1164521 mRNA sequence           6686 6666
    EU952235.1 likely retroelement polyprotein           10941 11795
    EU956257.1 Zea mays clone 1559671 hypothetical protein mRNA, complete cds 1095 1095 0% 0 92% 11836 12618
    EU943223.1 may be 3' end of this gene?           12615 12866
    EU948150.1 Only mRNA hits are from Ceres project, rest are either genomic or retroelement           12895 13820
    EU940857.1 Zea mays clone 1164521 mRNA sequence 991 2274 1% 0 94% 15122 14331
    NM_001176968.1 RETROELEMENT           17,647 16,295
    EU949068.1 Putative hAT family Tpase like protein           17,596 18,842
    NM_001175949.1 RETROELEMENT 1449 1562 0% 0 90% 22486 22627
      RETROELEMENT           24955 24210
    NM_001196821.1 RETROELEMENT           26,819 25,703
    EU944967.1 Zea mays clone 236552 mRNA sequence, not retro element, no hits in blastx search 2502 2502 0% 0 99% 33076 31195
    AF465266.1 CytP450 monooxygenase CYP72A27           33989 33389
    NM_001196821.1 RETRO           35,376 36,261
    AY883559.2 tga1 fragmented pseudogene           36300 37407
    X15704.1 alpha tubulins 1 and 2           42354 42466
    AY110978.1 Zea mays CL7313_1 mRNA sequence, ZmATG12 attached to SUMO? Homologs in six grasses 1095 1715 0% 0 100% 43209 47,770
    NM_001177137.1 Possible three RNAs corresponding  to cytosolic aldehyde dehydrogenase, may contain sequencing errors           ~47500 ~48800
    XM_002447018.1 Sorghum bicolor hypothetical protein, mRNA 1099 1099 0% 0 87% 49869 48971
    No maize RNA match, BUT… Significant homology to RNAs in barley, sorghum, rice, Brachypodium, others.  Medicago and Arabidopsis nuclear pore complex protein           ~50000 ~49000
      Homology only to genomic DNA, no mRNAs from any species           ~50000 ~51000
    EU966944.1  hypothetical protein, mRNA 1020 2067 1% 0 83% 51993 51285
    EU949068.1             52,068 52,937
    X81828.1 Z.mays CYP71C1 gene for cytochrome P-450 1222 2355 1% 0 81% 52112 53445
    NM_001176828.1 RETROELEMENT           57535 56500
    NM_001150942.1 Still checking           69170 68166
    X73151.1 Z.mays GapC2 gene 1552 2492 1% 0 86% 70143 71729
    EU943322.1 Zea mays clone 1599166 mRNA sequence 1579 2404 1% 0 87% 70271 71729
    NM_001176968.1 Maize oleosin gene?  Has an upstream, in-frame stop codon, and the protein does not start until nt 550 of the 1 kb mRNA; check for sequencing errors.  Zea mays full-length cDNA clone ZM_BFb0384J11 mRNA, complete cds.  (I don't see any oleosin gene in the blastx search, only protein phosphatase 2C as discussed above.) 1400 2018 1% 0 90% 71731 70700
    BT067929.1 full length cDNA clone           79,933 72,015
    NM_001196307.1 Zea mays uncharacterized LOC100501624 (LOC100501624), mRNA           82,296 79,559
    EU964336.1 Zea mays clone 278065 mRNA sequence; blastx: no significant similarity 1281 1328 0% 0 89% 82772 83719
    NM_001165861.1 Zea mays uncharacterized LOC100304427 (LOC100304427), mRNA >gb|BT061124.2| Zea mays full-length cDNA clone ZM_BFb0108N24 mRNA, complete cds; blastx: no significant similarity 1115 1115 0% 0 85% 83715 82769
    NM_001175949.1 RETROELEMENT 1449 1562 0% 0 90% 83836 82271
    X58700.1 RETROELEMENT 1016 1016 0% 0 79% 84059 82772
      RETROELEMENT           85966 84933
    NM_001196821.1 RETROELEMENT           88599 86656
    NM_001174429.1 Zea mays uncharacterized LOC100381611 (LOC100381611), mRNA; very partial (110/567) E3 ubiquitin protein ligase           89,877 89,595
    NM_001150968.1 Zea mays uncharacterized LOC100277401 (LOC100277401), mRNA; 1e-12 match to chloroquine resistance transporter           91,526 90,076
    BT024069.1 Zea mays clone EL01N0450D06 mRNA sequence; very good match to rice nucleolin-like protein (1e-105), but only residues 180-634 of 707 are present in this clone           92,144 91,820
                     
    NM_001176828.1 RETROELEMENT           99,410 98,375
    NM_001196821.1 RETROELEMENT           106438 108835
    NM_001196821.1 RETROELEMENT           111191 112731
    NM_001175949.1 RETROELEMENT           114460 113132
    NM_001196821.1 RETROELEMENT           120730 122722
    NM_001176828.1 RETROELEMENT           123541 125392
    XM_002455883.1 Sorghum bicolor hypothetical protein, mRNA 2259 2259 1% 0 91% 125873 127460
    EU944967.1 Zea mays clone 236552 mRNA sequence 2502 2502 0% 0 99% 126830 128,218
    NM_001149643.1 alternative 3' end of sequence above?           128830 129,020
    BT068923.2 Zea mays full-length cDNA clone ZM_BFc0009K22 mRNA, complete cds, antisense of 3' end for NM_001152559.1 1097 1250 0% 0 86% 131272 132588
    BT086111.2 Zea mays full-length cDNA clone ZM_BFc0117A05 mRNA, complete cds, antisense to 5' end of NM_001152559.1 2073 2073 0% 0 100% 135,854 137,002
    NM_001152559.1 Zea mays uncharacterized LOC100279562 (LOC100279562), mRNA           137,042 132217
    NM_001050069.2 Oryza sativa Japonica Group Os01g0607900 (Os01g0607900) mRNA, complete cds; receptor-like protein kinase 1 1310 2090 1% 0 86% 136818 135500
    JQ420135.1 Oryza sativa Indica Group receptor-like protein kinase 1 mRNA, complete cds 1292 2060 1% 0 86% 136818 135505
    XM_002458086.1 Sorghum bicolor hypothetical protein, mRNA 1952 2927 1% 0 92% 136895 131540
    EU956737.1 Zea mays clone 1572084 mRNA sequence 2037 2037 1% 0 88% 137042 135502
    NM_001152559.1 Zea mays uncharacterized LOC100279562 (LOC100279562), mRNA >gb|BT067205.1| Zea mays full-length cDNA clone ZM_BFb0287F05 mRNA, complete cds 2037 3127 1% 0 89% 137088 131272
    EU957017.1 Zea mays clone 1580606 mRNA sequence 1579 2840 1% 0 89% 137104 131296
    EU974079.1 Zea mays clone 439222 integral membrane protein like mRNA, complete cds           146,423 147,340
    BT064696.1 Zea mays full-length cDNA clone ZM_BFb0038L06 mRNA, complete cds., alternatively spliced form of EU974079.1           146,423 147,518

    Supplemental Data

    Genemark

    Genemark was run at its website using the Z. mays parameters.

    Genemark on Masked Sequence

    Relatively few predicted genes are found using the masked sequence.

    GeneMark.hmm (Version 2.2a)
    Sequence name: Seq 7
    Sequence length: 150000 bp
    G+C content: 80.41%
    Matrix: corn
    Fri Oct 19 09:52:40 2012
    
    Predicted genes/exons
    
    Gene Exon Strand Exon           Exon Range     Exon      Start/End
      #    #         Type                         Length       Frame
    
      1     1   +  Initial      43362     43370       9          1 3
      1     2   +  Internal     43832     43925      94          1 1
      1     3   +  Internal     43994     44061      68          2 3
      1     4   +  Internal     44147     44286     140          1 2
      1     5   +  Internal     45831     46044     214          3 3
      1     6   +  Internal     46813     46857      45          1 3
      1     7   +  Terminal     46961     47128     168          1 3
    
      2     1   +  Initial      70370     70429      60          1 3
      2     2   +  Terminal     70764     70940     177          1 3
    
      3     1   +  Initial      71005     71057      53          1 2
      3     2   +  Terminal     71233     71365     133          3 3
    
      4    19   -  Terminal     72254     72400     147          3 1
      4    18   -  Internal     72834     72942     109          3 3
      4    17   -  Internal     73509     73549      41          2 1
      4    16   -  Internal     73650     73814     165          3 1
      4    15   -  Internal     73920     74032     113          3 2
      4    14   -  Internal     74629     74746     118          1 1
      4    13   -  Internal     75647     75748     102          3 1
      4    12   -  Internal     77217     77513     297          3 1
      4    11   -  Internal     77905     77954      50          3 2
      4    10   -  Internal     78111     78186      76          1 1
      4     9   -  Internal     78271     78345      75          3 1
      4     8   -  Internal     80684     80793     110          3 2
      4     7   -  Internal     80917     80968      52          1 1
      4     6   -  Internal     90279     90329      51          3 1
      4     5   -  Internal     90416     90451      36          3 1
      4     4   -  Internal     91822     91976     155          3 2
      4     3   -  Internal     92096     92195     100          1 1
      4     2   -  Internal     92747     92785      39          3 1
      4     1   -  Initial      93769     94326     558          3 1
    
      5     1   +  Initial     125041    125158     118          1 1
      5     2   +  Terminal    125225    125313      89          2 3
    
      6     1   +  Initial     125868    127262    1395          1 3
      6     2   +  Terminal    127373    127543     171          1 3
    
      7     4   -  Terminal    131540    131818     279          3 1
      7     3   -  Internal    131844    132211     368          3 2
      7     2   -  Internal    135506    135786     281          1 3
      7     1   -  Initial     135874    136895    1022          2 1
    
      8     1   -  Terminal    145410    145538     129          3 1
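
    The Begin/End pairs in the Working List can be derived from an exon table like the one above.  The following is a minimal sketch, assuming each exon row has nine whitespace-separated fields (gene, exon, strand, type, start, end, length, and the two frame values) and that minus-strand genes are reported with Begin greater than End, as in the Working List.  The function name is hypothetical.

```python
def gene_spans(report_lines):
    """Map gene number -> (begin, end); begin > end for minus-strand genes."""
    genes = {}
    for line in report_lines:
        fields = line.split()
        if len(fields) < 9 or not fields[0].isdigit():
            continue  # skip headers, blank lines, and anything not an exon row
        gene, strand = int(fields[0]), fields[2]
        start, end = int(fields[4]), int(fields[5])
        low, high, _ = genes.get(gene, (start, end, strand))
        genes[gene] = (min(low, start), max(high, end), strand)
    return {gene: ((high, low) if strand == "-" else (low, high))
            for gene, (low, high, strand) in genes.items()}


# Rows for genes 5 and 7, copied from the masked-sequence output above.
sample = """
Gene Exon Strand Exon           Exon Range     Exon      Start/End
  5     1   +  Initial     125041    125158     118          1 1
  5     2   +  Terminal    125225    125313      89          2 3
  7     4   -  Terminal    131540    131818     279          3 1
  7     3   -  Internal    131844    132211     368          3 2
  7     2   -  Internal    135506    135786     281          1 3
  7     1   -  Initial     135874    136895    1022          2 1
""".splitlines()

spans = gene_spans(sample)
print(spans)
```

    Only the first nine fields are read, so rows with trailing BlastP notes, as in the unmasked listing, would parse the same way.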

    Genemark on Unmasked Sequence

    This is a sanity check to make sure that parts of genes are not cut off when using the masked sequence.  Indeed, besides yielding additional predicted genes, the unmasked run changes the gene models of genes 7-4 and 7-8.  Gene 7-4 was already suspicious based on the blast search, which suggested at least two different proteins in the gene model (fused genes).

    GeneMark.hmm (Version 2.2a)
    Sequence length: 150000 bp
    G+C content: 44.65%
    Matrix: corn
    
    Predicted genes/exons
    
    Gene Exon Strand Exon           Exon Range     Exon      Start/End
      #    #         Type                         Length       Frame
    
      1     1   +  Initial       1840      2364     525          1 3     BlastP: No hits < E=1 
      1     2   +  Terminal      2513      3091     579          1 3
    
      2     4   -  Terminal     26468     26542      75          3 1     BlastP: Retrovirus Pol protein
      2     3   -  Internal     26614     27546     933          3 1
      2     2   -  Internal     29501     30469     969          3 1
      2     1   -  Initial      30903     30908       6          3 1
    
      3     1   +  Initial      34871     35006     136          1 1     BlastP: uncharacterized Mt protein   E=0.002
      3     2   +  Internal     35106     35286     181          2 2             retrovirus Pol polyprotein   E=0.045
      3     3   +  Internal     35393     35557     165          3 2             CBL interact. prot. kinase 9 E=0.11
      3     4   +  Internal     35662     35737      76          3 3
      3     5   +  Terminal     36307     36390      84          1 3
    
      4     1   +  Initial      37572     37654      83          1 2     BlastP: Put. F-box/FBD/LRR-repeat E=0.008
      4     2   +  Internal     37760     38087     328          3 3
      4     3   +  Internal     38194     38311     118          1 1
      4     4   +  Terminal     38759     38883     125          2 3
    
      5     2   -  Terminal     39634     39909     276          3 1     BlastP: Metal tolerance protein 4 E=0.049
      5     1   -  Initial      40017     40058      42          3 1             Put. F-box/LRR-repeat     E=0.055 
    
      6     1   +  Initial      43362     43370       9          1 3     gene 7-1
      6     2   +  Internal     43832     43925      94          1 1     BlastP: Put. SUMO E=5e-07 
      6     3   +  Internal     43994     44061      68          2 3
      6     4   +  Internal     44147     44286     140          1 2
      6     5   +  Internal     45831     46044     214          3 3
      6     6   +  Internal     46813     46857      45          1 3
      6     7   +  Terminal     46961     47128     168          1 3
    
      7     2   -  Terminal     66827     67053     227          3 2     BlastP: Agamous-like MADS-box protein E=0.011
      7     1   -  Initial      67153     67480     328          1 1
    
      8     1   +  Initial      70370     70429      60          1 3     gene 7-2
      8     2   +  Terminal     70764     70940     177          1 3     BlastP: Prob. protein phosphatase E=1e-17
    
      9     1   +  Initial      71005     71057      53          1 2     gene 7-3
      9     2   +  Terminal     71233     71365     133          3 3     BlastP: Prob. protein phosphatase 2C E=5e-10
    
     10    15   -  Terminal     72254     72400     147          3 1     This gene appears as gene 7-4 in the masked
     10    14   -  Internal     72834     72942     109          3 3     version, but begins in a different place and 
     10    13   -  Internal     73509     73549      41          2 1     has an additional intron.  The 7-4 version
     10    12   -  Internal     73650     73814     165          3 1     clearly corresponds to two genes, it looks
     10    11   -  Internal     73920     74032     113          3 2     like they are split apart here, 
     10    10   -  Internal     74629     74746     118          1 1     see gene 11 and 12 immediately below.
     10     9   -  Internal     75647     75748     102          3 1     BlastP: Trans. corep. LEUNIG E=9e-59
     10     8   -  Internal     77217     77513     297          3 1             EMBRYO DEFECTIVE 2776 E=1e-09
     10     7   -  Internal     77905     77954      50          3 2             Mut11p E=1e-07
     10     6   -  Internal     78111     78186      76          1 1
     10     5   -  Internal     78271     78345      75          3 1
     10     4   -  Internal     79739     80149     411          3 1
     10     3   -  Internal     80684     80793     110          3 2
     10     2   -  Internal     80917     80968      52          1 1
     10     1   -  Initial      81460     81468       9          3 1
    
     11     1   +  Initial      84052     84270     219          1 3     BlastP: Pyruvate dehydrogenase E1 E=0.070
     11     2   +  Terminal     84592     84669      78          1 3             Put. membrane protein ycf1 E=0.086
    
     12     9   -  Terminal     86893     86951      59          3 2     BlastP: Retrovirus-related Pol polyprotein
     12     8   -  Internal     87496     88872    1377          1 2
     12     7   -  Internal     88969     89326     358          1 1
     12     6   -  Internal     90279     90329      51          3 1
     12     5   -  Internal     90416     90451      36          3 1
     12     4   -  Internal     91822     91976     155          3 2
     12     3   -  Internal     92096     92195     100          1 1
     12     2   -  Internal     92747     92785      39          3 1
     12     1   -  Initial      93769     94326     558          3 1
    
     13     2   -  Terminal     96091     97526    1436          3 2     BlastP: Retrovirus-related Pol polyprotein
     13     1   -  Initial      97716     98037     322          1 1
    
     14     1   +  Initial      99418    100170     753          1 3     BlastP: Retrovirus-related Pol polyprotein
     14     2   +  Internal    100645    100813     169          1 1
     14     3   +  Internal    102372    102614     243          2 1
     14     4   +  Internal    102804    103756     953          2 3
     14     5   +  Internal    105711    106068     358          1 1
     14     6   +  Internal    106168    106222      55          2 2
     14     7   +  Internal    106374    106643     270          3 2
     14     8   +  Internal    106716    106791      76          3 3
     14     9   +  Internal    107343    107387      45          1 3
     14    10   +  Internal    107485    107565      81          1 3
     14    11   +  Internal    110387    111142     756          1 3
     14    12   +  Internal    111314    112099     786          1 3
     14    13   +  Terminal    112363    112422      60          1 3
    
     15     1   +  Initial     116581    116890     310          1 1     BlastP: Retrovirus-related Pol polyprotein
     15     2   +  Internal    117092    118377    1286          2 3
     15     3   +  Internal    120008    120270     263          1 2
     15     4   +  Internal    120666    120932     267          3 2
     15     5   +  Internal    121004    121079      76          3 3
     15     6   +  Internal    121630    121674      45          1 3
     15     7   +  Internal    121772    121852      81          1 3
     15     8   +  Terminal    124632    124700      69          1 3
    
     16     1   +  Initial     125041    125158     118          1 1     gene 7-5
     16     2   +  Terminal    125225    125313      89          2 3     BlastP: Prob. protein phosphatase 2C E=2e-09
    
     17     1   +  Initial     125868    127262    1395          1 3     gene 7-6
     17     2   +  Terminal    127373    127543     171          1 3     BlastP: Pentatricopeptide repeat-containing protein E=3e-133
    
     18     3   -  Terminal    131540    131818     279          3 1     gene 7-7, but starts at second exon
     18     2   -  Internal    131844    132211     368          3 2     BlastP:  LRR Receptor-like Prot. Kin E=6e-176
     18     1   -  Initial     135506    136895    1390          1 1
    
     19    10   -  Terminal    138092    138807     716          3 2     BlastP: Peroxisomal (S)-2-hydroxy-acid oxidase E=0.017
     19     9   -  Internal    139042    139214     173          1 3             Uncharacterized mitochondrial protein E=0.071
     19     8   -  Internal    139310    139399      90          2 3
     19     7   -  Internal    139551    139654     104          2 1
     19     6   -  Internal    139970    140137     168          3 1
     19     5   -  Internal    140477    140545      69          3 1
     19     4   -  Internal    140959    141015      57          3 1
     19     3   -  Internal    141392    143079    1688          3 2
     19     2   -  Internal    143366    143435      70          1 1
     19     1   -  Initial     145233    145304      72          3 1
    
     20     2   -  Terminal    145410    145538     129          3 1     gene 7-8, with additional initial exon
     20     1   -  Initial     145619    145642      24          3 1     BlastP: Urocanate hydratase E=0.31     
    
     21     1   +  Initial     149545    149601      57          1 3     BlastP: Formin-like protein E=0.59
     21     2   +  Internal    149707    149812     106          1 1
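
    The difference between the masked and unmasked Genemark models for gene 7-4 can be made explicit by comparing exon sets.  The sketch below uses only coordinates copied from the two listings (masked gene 4; unmasked genes 10 and 12) and treats two exons as identical only when both boundaries match exactly, which is a simplifying assumption.

```python
masked_gene4 = {
    (72254, 72400), (72834, 72942), (73509, 73549), (73650, 73814),
    (73920, 74032), (74629, 74746), (75647, 75748), (77217, 77513),
    (77905, 77954), (78111, 78186), (78271, 78345), (80684, 80793),
    (80917, 80968), (90279, 90329), (90416, 90451), (91822, 91976),
    (92096, 92195), (92747, 92785), (93769, 94326),
}
unmasked_gene10 = {
    (72254, 72400), (72834, 72942), (73509, 73549), (73650, 73814),
    (73920, 74032), (74629, 74746), (75647, 75748), (77217, 77513),
    (77905, 77954), (78111, 78186), (78271, 78345), (79739, 80149),
    (80684, 80793), (80917, 80968), (81460, 81468),
}
unmasked_gene12 = {
    (86893, 86951), (87496, 88872), (88969, 89326), (90279, 90329),
    (90416, 90451), (91822, 91976), (92096, 92195), (92747, 92785),
    (93769, 94326),
}

shared = masked_gene4 & unmasked_gene10          # exons common to both models
masked_only = masked_gene4 - unmasked_gene10     # exons lost from gene 10
unmasked_only = unmasked_gene10 - masked_gene4   # exons gained in gene 10

print(len(shared), "shared exons")
print("only in masked model:  ", sorted(masked_only))
print("only in unmasked model:", sorted(unmasked_only))
print("masked-only exons all in unmasked gene 12:",
      masked_only <= unmasked_gene12)
```

    Under these assumptions, the six exons unique to the masked model all reappear in unmasked gene 12, which fits the note that the masked gene 7-4 prediction fuses two genes.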

    FGENESH

    Masked Sequence

     FGENESH 2.6 Prediction of potential genes in Monocot genomic DNA
     Time    :   Sat Oct 20 09:14:36 2012
     Seq name: sequence_7 
     Length of sequence: 150000 
     Number of predicted genes 5: in +chain 2, in -chain 3.
     Number of predicted exons 41: in +chain 9, in -chain 32.
     Positions of predicted genes and exons: Variant   1 from   1, Score:428.246191 
       G Str   Feature   Start        End    Score           ORF           Len
    
       1 +      TSS      42802               -6.28
       1 +    1 CDSf     43362 -     43370    6.92     43362 -     43370      9
       1 +    2 CDSi     43832 -     43925   11.45     43832 -     43924     93
       1 +    3 CDSi     43994 -     44061   10.88     43996 -     44061     66
       1 +    4 CDSi     44147 -     44286   11.16     44147 -     44284    138
       1 +    5 CDSi     45831 -     46044   22.65     45832 -     46044    213
       1 +    6 CDSi     46401 -     46496    1.28     46401 -     46496     96
       1 +    7 CDSi     46961 -     47107   26.38     46961 -     47107    147
       1 +    8 CDSl     47156 -     47425   -3.88     47156 -     47425    270
       1 +      PolA     47587                0.44
    
       2 -      PolA     70648                0.44
       2 -    1 CDSl     71480 -     71584   -2.86     71480 -     71584    105
       2 -    2 CDSi     72290 -     72400   17.10     72290 -     72400    111
       2 -    3 CDSi     72736 -     72942    6.24     72736 -     72942    207
       2 -    4 CDSi     73430 -     73549    6.13     73430 -     73549    120
       2 -    5 CDSi     73650 -     73814    5.00     73650 -     73814    165
       2 -    6 CDSi     73902 -     74051    5.01     73902 -     74051    150
       2 -    7 CDSi     74136 -     74245    7.47     74136 -     74243    108
       2 -    8 CDSi     74629 -     74746   16.40     74630 -     74746    117
       2 -    9 CDSi     74853 -     75083    6.95     74853 -     75083    231
       2 -   10 CDSi     75647 -     75847   13.84     75647 -     75847    201
       2 -   11 CDSi     77217 -     77513   22.26     77217 -     77513    297
       2 -   12 CDSi     77905 -     77954    5.08     77905 -     77952     48
       2 -   13 CDSi     78111 -     78186    6.75     78112 -     78186     75
       2 -   14 CDSi     78271 -     78345    4.10     78271 -     78345     75
       2 -   15 CDSi     79113 -     79218   -2.72     79113 -     79217    105
       2 -   16 CDSi     79556 -     79641    5.90     79558 -     79641     84
       2 -   17 CDSi     80684 -     80793    5.07     80684 -     80791    108
       2 -   18 CDSi     80917 -     81014    1.95     80918 -     81013     96
       2 -   19 CDSf     82020 -     82051    4.10     82022 -     82051     30
       2 -      TSS      82606               -7.68
    
       3 -      PolA     89983               -1.06
       3 -    1 CDSl     90267 -     90278   -3.17     90267 -     90278     12
       3 -    2 CDSi     90416 -     90451    7.30     90416 -     90451     36
       3 -    3 CDSi     90527 -     90626   -2.08     90527 -     90625     99
       3 -    4 CDSi     91484 -     91521    4.03     91486 -     91521     36
       3 -    5 CDSi     91822 -     91976   14.35     91822 -     91974    153
       3 -    6 CDSi     92096 -     92274   19.45     92097 -     92273    177
       3 -    7 CDSi     92310 -     92396    1.41     92312 -     92395     84
       3 -    8 CDSi     92655 -     92785   13.84     92657 -     92785    129
       3 -    9 CDSf     93769 -     94326   35.57     93769 -     94326    558
       3 -      TSS     114587               -5.98
    
       4 +      TSS     125579               -7.98
       4 +    1 CDSo    125868 -    127703  106.91    125868 -    127703   1836
       4 +      PolA    128741                0.44
    
       5 -      PolA    131072                0.44
       5 -    1 CDSl    131200 -    131295    2.31    131200 -    131295     96
       5 -    2 CDSi    131727 -    132211   29.67    131727 -    132209    483
       5 -    3 CDSi    135506 -    135786   24.44    135507 -    135785    279
       5 -    4 CDSf    136192 -    136895   69.04    136194 -    136895    702
       5 -      TSS     137452               -2.68
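
The exon rows in this FGENESH table can be summarized per gene with a short script. The following is a minimal sketch, assuming the prediction has been saved as plain text in the column layout shown above; `parse_fgenesh` and `summarize` are hypothetical helper names, not part of FGENESH. Coding length is computed from the Start/End columns, not from the ORF length column.

```python
import re

# CDS rows look like:
#    5 -    2 CDSi    131727 -    132211   29.67    131727 -    132209    483
# Columns: gene, strand, exon number, feature, start, "-", end, score, then ORF fields.
CDS_ROW = re.compile(
    r"^\s*(\d+)\s+([+-])\s+(\d+)\s+(CDS[foil])\s+(\d+)\s+-\s+(\d+)\s+(-?\d+\.\d+)"
)


def parse_fgenesh(text):
    """Return {gene: {"strand": "+" or "-", "exons": [(start, end, score), ...]}}."""
    genes = {}
    for line in text.splitlines():
        match = CDS_ROW.match(line)
        if match is None:
            continue  # TSS/PolA rows, headers, and blank lines are skipped
        gene, strand, _number, _feature, start, end, score = match.groups()
        entry = genes.setdefault(int(gene), {"strand": strand, "exons": []})
        entry["exons"].append((int(start), int(end), float(score)))
    return genes


def summarize(genes):
    """Return one (gene, strand, exon_count, coding_bp, first, last) tuple per gene."""
    rows = []
    for gene in sorted(genes):
        exons = genes[gene]["exons"]
        # Coordinates are inclusive, so each exon spans end - start + 1 bases.
        coding_bp = sum(end - start + 1 for start, end, _score in exons)
        first = min(start for start, _end, _score in exons)
        last = max(end for _start, end, _score in exons)
        rows.append((gene, genes[gene]["strand"], len(exons), coding_bp, first, last))
    return rows


# Rows copied from the masked-sequence prediction for gene 5.
SAMPLE = """
   5 -      PolA    131072                0.44
   5 -    1 CDSl    131200 -    131295    2.31    131200 -    131295     96
   5 -    2 CDSi    131727 -    132211   29.67    131727 -    132209    483
   5 -    3 CDSi    135506 -    135786   24.44    135507 -    135785    279
   5 -    4 CDSf    136192 -    136895   69.04    136194 -    136895    702
   5 -      TSS     137452               -2.68
"""

for row in summarize(parse_fgenesh(SAMPLE)):
    print(row)
```

Only CDS rows are kept; the TSS and PolA rows carry a single position and would need a second pattern if promoter or polyadenylation sites are also of interest.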
    

    Unmasked Sequence

     FGENESH 2.6 Prediction of potential genes in Monocot genomic DNA
     Time    :   Sat Oct 20 08:15:40 2012
     Seq name:  test sequence 
     Length of sequence: 150000 
     Number of predicted genes 21: in +chain 14, in -chain 7.
     Number of predicted exons 98: in +chain 48, in -chain 50.
     Positions of predicted genes and exons: Variant   1 from   1, Score:2271.243164 
       G Str   Feature   Start        End    Score           ORF           Len
    
       1 +      TSS       1350               -7.38
       1 +    1 CDSf      1840 -      2364   20.11      1840 -      2364    525
       1 +    2 CDSl      2513 -      3091   27.27      2513 -      3091    579
       1 +      PolA      3380               -2.36
    
       2 +      TSS       4814               -6.18
       2 +    1 CDSo      5440 -      6072   20.04      5440 -      6072    633
       2 +      PolA      6140                0.44
    
       3 +      TSS       7578               -6.18
       3 +    1 CDSo      8204 -      8683   11.59      8204 -      8683    480
       3 +      PolA      8904                0.44
    
       4 +      TSS      10202               -2.28
       4 +    1 CDSo     10695 -     10817    4.12     10695 -     10817    123
       4 +      PolA     11068                0.44
    
       5 +      TSS      15842               -6.18
       5 +    1 CDSf     16555 -     16561    3.85     16555 -     16560      6
       5 +    2 CDSi     17723 -     17845    5.11     17725 -     17844    120
       5 +    3 CDSl     18175 -     18845    8.88     18177 -     18845    669
       5 +      PolA     18963               -4.06
    
       6 +      TSS      20716               -4.08
       6 +    1 CDSf     22395 -     22446    4.06     22395 -     22445     51
       6 +    2 CDSl     22913 -     24030   34.65     22915 -     24030   1116
       6 +      PolA     24089               -4.96
    
       7 -      PolA     24820                0.44
       7 -    1 CDSl     25590 -     26031    5.48     25590 -     26030    441
       7 -    2 CDSi     26079 -     26125    0.36     26081 -     26125     45
       7 -    3 CDSi     26188 -     27980   65.26     26188 -     27978   1791
       7 -    4 CDSi     28113 -     28263   10.06     28114 -     28263    150
       7 -    5 CDSf     28298 -     30691  129.28     28298 -     30691   2394
       7 -      TSS      31372               -6.38
    
       8 +      TSS      33952               -8.38
       8 +    1 CDSf     34286 -     34353   -8.98     34286 -     34351     66
       8 +    2 CDSi     34540 -     35299   32.99     34541 -     35299    759
       8 +    3 CDSi     35393 -     36271   42.41     35393 -     36271    879
       8 +    4 CDSi     36307 -     36386    1.43     36307 -     36384     78
       8 +    5 CDSi     37321 -     37350    1.21     37322 -     37348     27
       8 +    6 CDSl     38094 -     38379    1.76     38095 -     38379    285
       8 +      PolA     39450               -1.06
    
       9 -      PolA     39504               -5.36
       9 -    1 CDSo     39634 -     39879   23.47     39634 -     39879    246
       9 -      TSS      40009               -2.08
    
      10 +      TSS      42802               -6.28
      10 +    1 CDSf     43362 -     43370    6.92     43362 -     43370      9
      10 +    2 CDSi     43832 -     43925   11.45     43832 -     43924     93
      10 +    3 CDSi     43994 -     44061   10.88     43996 -     44061     66
      10 +    4 CDSi     44147 -     44286   11.16     44147 -     44284    138
      10 +    5 CDSi     45831 -     46044   22.65     45832 -     46044    213
      10 +    6 CDSi     46401 -     46496    1.28     46401 -     46496     96
      10 +    7 CDSi     46961 -     47107   26.38     46961 -     47107    147
      10 +    8 CDSl     47156 -     47425   -3.88     47156 -     47425    270
      10 +      PolA     47587                0.44
    
      11 +      TSS      49563               -4.18
      11 +    1 CDSf     51056 -     51073    1.09     51056 -     51073     18
      11 +    2 CDSl     51250 -     53442   57.38     51250 -     53442   2193
      11 +      PolA     53559               -4.06
    
      12 +      TSS      55311               -4.08
      12 +    1 CDSf     56989 -     57040    4.06     56989 -     57039     51
      12 +    2 CDSl     57507 -     58990   50.60     57509 -     58990   1482
      12 +      PolA     59341               -1.06
    
      13 -      PolA     65279                0.44
      13 -    1 CDSl     66827 -     67069   14.72     66827 -     67069    243
      13 -    2 CDSf     67298 -     67480    7.52     67298 -     67480    183
      13 -      TSS      68793               -4.48
    
      14 -      PolA     68832               -1.06
      14 -    1 CDSl     69041 -     69274    5.98     69041 -     69274    234
      14 -    2 CDSi     70734 -     70985    4.84     70734 -     70985    252
      14 -    3 CDSi     72290 -     72400   17.10     72290 -     72400    111
      14 -    4 CDSi     72736 -     72942    6.24     72736 -     72942    207
      14 -    5 CDSi     73430 -     73549    6.13     73430 -     73549    120
      14 -    6 CDSi     73650 -     73814    5.00     73650 -     73814    165
      14 -    7 CDSi     73902 -     74051    5.01     73902 -     74051    150
      14 -    8 CDSi     74136 -     74245    7.47     74136 -     74243    108
      14 -    9 CDSi     74629 -     74746   16.40     74630 -     74746    117
      14 -   10 CDSi     74853 -     75083    6.95     74853 -     75083    231
      14 -   11 CDSi     75647 -     75847   13.84     75647 -     75847    201
      14 -   12 CDSi     77217 -     77513   22.26     77217 -     77513    297
      14 -   13 CDSi     77905 -     77954    5.08     77905 -     77952     48
      14 -   14 CDSi     78111 -     78186    6.75     78112 -     78186     75
      14 -   15 CDSi     78271 -     78345    4.10     78271 -     78345     75
      14 -   16 CDSi     79113 -     79287    3.53     79113 -     79286    174
      14 -   17 CDSi     79556 -     79641    5.90     79558 -     79641     84
      14 -   18 CDSi     79739 -     80173   67.76     79739 -     80173    435
      14 -   19 CDSi     80684 -     80793    5.07     80684 -     80791    108
      14 -   20 CDSi     80917 -     81014    1.95     80918 -     81013     96
      14 -   21 CDSf     82020 -     82051    4.10     82022 -     82051     30
      14 -      TSS      82668               -5.38
    
      15 +      TSS      83414               -1.18
      15 +    1 CDSo     84052 -     84669   37.81     84052 -     84669    618
      15 +      PolA     85456               -2.56
    
      16 -      PolA     85830                0.44
      16 -    1 CDSl     86468 -     86539   -4.09     86468 -     86539     72
      16 -    2 CDSi     87476 -     89326  128.30     87476 -     89326   1851
      16 -    3 CDSi     90279 -     90329    9.08     90279 -     90329     51
      16 -    4 CDSi     90416 -     90451    7.30     90416 -     90451     36
      16 -    5 CDSi     90527 -     90626   -2.08     90527 -     90625     99
      16 -    6 CDSi     91484 -     91521    4.03     91486 -     91521     36
      16 -    7 CDSi     91822 -     91976   14.35     91822 -     91974    153
      16 -    8 CDSi     92096 -     92274   19.45     92097 -     92273    177
      16 -    9 CDSi     92310 -     92396    1.41     92312 -     92395     84
      16 -   10 CDSi     92655 -     92785   13.84     92657 -     92785    129
      16 -   11 CDSi     93769 -     94343   17.71     93769 -     94341    573
      16 -   12 CDSi     94676 -     95047   14.66     94677 -     95045    369
      16 -   13 CDSi     95192 -     95330   10.81     95193 -     95330    138
      16 -   14 CDSi     95365 -     96066   19.03     95365 -     96066    702
      16 -   15 CDSf     96160 -     98037  157.79     96160 -     98037   1878
      16 -      TSS      98627               -2.18
    
      17 +      TSS      98896               -8.78
      17 +    1 CDSf     99418 -    100813   73.60     99418 -    100812   1395
      17 +    2 CDSi    101025 -    101350    3.33    101027 -    101350    324
      17 +    3 CDSi    102302 -    104080  121.36    102302 -    104080   1779
      17 +    4 CDSi    104150 -    104959   22.92    104150 -    104959    810
      17 +    5 CDSi    104994 -    105144   11.05    104994 -    105143    150
      17 +    6 CDSi    105277 -    106643   60.13    105279 -    106643   1365
      17 +    7 CDSi    106716 -    106791    3.24    106716 -    106790     75
      17 +    8 CDSi    106832 -    106927    6.67    106834 -    106926     93
      17 +    9 CDSi    107129 -    107565   18.86    107131 -    107565    435
      17 +   10 CDSi    110387 -    111046   35.07    110387 -    111046    660
      17 +   11 CDSl    111260 -    112792   84.99    111260 -    112792   1533
      17 +      PolA    113178                0.44
    
      18 +      TSS     115963               -2.18
      18 +    1 CDSf    116581 -    119154  179.31    116581 -    119154   2574
      18 +    2 CDSi    119291 -    119429   10.83    119291 -    119428    138
      18 +    3 CDSi    119574 -    119644   -0.99    119576 -    119644     69
      18 +    4 CDSi    120008 -    120313   21.71    120008 -    120313    306
      18 +    5 CDSi    120561 -    121391   47.55    120561 -    121391    831
      18 +    6 CDSi    121427 -    121832   15.80    121427 -    121831    405
      18 +    7 CDSl    122078 -    122526    0.20    122080 -    122526    447
      18 +      PolA    123452                0.44
    
      19 +      TSS     125579               -7.98
      19 +    1 CDSo    125868 -    127703  106.91    125868 -    127703   1836
      19 +      PolA    128595                0.44
    
      20 -      PolA    131072                0.44
      20 -    1 CDSl    131200 -    131295    2.31    131200 -    131295     96
      20 -    2 CDSi    131727 -    132211   29.67    131727 -    132209    483
      20 -    3 CDSf    135506 -    136895  147.60    135507 -    136895   1389
      20 -      TSS     137452               -2.68
    
      21 -      PolA    138055               -4.96
      21 -    1 CDSl    138092 -    143107  232.24    138092 -    143107   5016
      21 -    2 CDSi    143782 -    143873   -2.66    143782 -    143871     90
      21 -    3 CDSf    144995 -    145304    7.48    144996 -    145304    309
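
The masked and unmasked predictions can be compared exon by exon. The sketch below is one simple way to do this, assuming exons are represented as inclusive `(start, end)` pairs; `compare_predictions` and `coding_bp` are hypothetical helper names. The coordinates are copied from the two tables above (masked gene 5 and unmasked gene 20).

```python
def coding_bp(exons):
    """Total coding length of a list of inclusive (start, end) exons."""
    return sum(end - start + 1 for start, end in exons)


def compare_predictions(masked, unmasked):
    """Split exons into those with identical coordinates and those unique to one run."""
    masked_set, unmasked_set = set(masked), set(unmasked)
    return {
        "shared": sorted(masked_set & unmasked_set),
        "masked_only": sorted(masked_set - unmasked_set),
        "unmasked_only": sorted(unmasked_set - masked_set),
    }


# Masked gene 5 and unmasked gene 20 (both on the minus strand).
MASKED_GENE_5 = [(131200, 131295), (131727, 132211), (135506, 135786), (136192, 136895)]
UNMASKED_GENE_20 = [(131200, 131295), (131727, 132211), (135506, 136895)]

result = compare_predictions(MASKED_GENE_5, UNMASKED_GENE_20)
for label, exons in result.items():
    print(label, exons)
print("masked coding bp:", coding_bp(MASKED_GENE_5))
print("unmasked coding bp:", coding_bp(UNMASKED_GENE_20))
```

For this pair, the first two exons are identical in both runs, while the unmasked prediction replaces the two masked exons 135506-135786 and 136192-136895 with a single exon spanning 135506-136895. Exact coordinate matching is deliberately strict; a comparison based on overlapping ranges would be more tolerant of shifted splice sites.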
    

    References

    1. Haberer G, Young S, Bharti AK, Gundlach H, Raymond C, Fuks G, Butler E, Wing RA, Rounsley S, Birren B, Nusbaum C, Mayer KF, Messing J. Structure and architecture of the maize genome. Plant Physiol 139:1612-1624, 2005.
    2. Fujimori S, Washio T, Tomita M. GC-compositional strand bias around transcription start sites in plants and fungi. BMC Genomics 6:26, 2005.
    3. Raffa GD, Wohlschlegel J, Yates JR 3rd, Boddy MN. SUMO-binding motifs mediate the Rad60-dependent response to replicative stress and self-association. J Biol Chem 281:27973-27981, 2006.
    4. Novatchkova M, Bachmair A, Eisenhaber B, Eisenhaber F. Proteins with two SUMO-like domains in chromatin-associated complexes: the RENi (Rad60-Esc2-NIP45) family. BMC Bioinformatics 6:22, 2005.

    Files (3)

    repeats.jpg: plot of repeats and GC content and skew (24.54 kB; attached 09:31, 19 Oct 2012 by gribskov)
    RM2_seq7.fa.txt_1350606859.masked: RepeatMasker masked file (149.43 kB; attached 08:01, 19 Oct 2012 by gribskov)
    seq7.fa.txt: no description (152.36 kB; attached 13:32, 18 Sep 2012 by gribskov)