
Supplemental Analysis

    General Info

    A blastn search of the beginning and ending portions of the sequence on MaizeGDB places it on maize chromosome 8, at coordinates Chr8 138800001..138950000.

    Gramene blast result chromosome picture.png

    Intrinsic Sequence Analysis

    Dot Plot

    dot plot whole 150kb sequence, name size = 15.png

    Comment: The dot plot shows regions of the sequence that warrant a closer look because they may contain repeats, insertions, or deletions. The approximate regions are:

    1. 1 - 1000bp
    2. 16,000 - 35,000bp
    3. 50,000 - 60,000bp
    4. 90,000 - 105,000bp
    5. 125,000 - 140,000bp
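
    As a rough illustration of what the dot plot encodes, the sketch below lists the off-diagonal "dots" of a sequence compared against itself, assuming exact word matches. The function name, the toy sequence, and the word size are hypothetical and are not the settings used for the plot above.

```python
from collections import defaultdict

def self_dot_plot(seq, word_size):
    """Return sorted (i, j) pairs, i < j, where the words starting at
    0-based positions i and j are identical (off-diagonal dots only)."""
    positions = defaultdict(list)
    for i in range(len(seq) - word_size + 1):
        positions[seq[i:i + word_size]].append(i)
    dots = []
    for starts in positions.values():
        # Every pair of start positions sharing a word is one dot.
        for a in range(len(starts)):
            for b in range(a + 1, len(starts)):
                dots.append((starts[a], starts[b]))
    return sorted(dots)

toy = "ACGTTGCAACGTTGCA"  # hypothetical: an 8 bp unit repeated twice
print(self_dot_plot(toy, word_size=4))
```

    A run of dots parallel to the main diagonal, such as (0, 8), (1, 9), (2, 10) here, is the signature of a direct repeat.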

     

    tiede 24 October 2012

    "Word" frequency

    # Output from 'compseq'
    #
    # The Expected frequencies are calculated from the observed single
    # base or residue frequencies in these sequences
    #
    # The input sequences are:
    #	sequence_1
    
    Word size	2
    Total count	149999
    
    #
    # Word	Obs Count	Obs Frequency	Exp Frequency	Obs/Exp Frequency
    #
    AA	10748		0.0716538	0.0625734	1.1451170
    AC	8295		0.0553004	0.0617012	0.8962612
    AG	9406		0.0627071	0.0621348	1.0092110
    AT	9072		0.0604804	0.0624033	0.9691867
    CA	9593		0.0639538	0.0617012	1.0365079
    CC	9768		0.0651204	0.0608412	1.0703353
    CG	8109		0.0540604	0.0612687	0.8823487
    CT	9525		0.0635004	0.0615334	1.0319660
    GA	9804		0.0653604	0.0621348	1.0519141
    GC	9430		0.0628671	0.0612687	1.0260881
    GG	9927		0.0661804	0.0616992	1.0726296
    GT	8098		0.0539870	0.0619659	0.8712383
    TA	7373		0.0491537	0.0624033	0.7876778
    TC	9504		0.0633604	0.0615334	1.0296908
    TG	9816		0.0654404	0.0619659	1.0560725
    TT	10724		0.0714938	0.0622336	1.1487973
    
    Other	807		0.0053800	0.0053333	1.0087567

    GC content - Used geecee in the EMBOSS package

    #Sequence   GC content
    sequence_1    0.50
    

    The geecee results give a GC content of 50%, which is higher than the 46.5% cited in the literature [1]. Perhaps our 150kb segment encompasses a gene-rich region of the genome. Also, the AA and TT "words" are noticeably more frequent than expected in our sequence (obs/exp of about 1.15 each), while the "word" TA is noticeably less frequent than expected (obs/exp of about 0.79). Perhaps there are a significant number of homopolymer A and T runs (poly-A and poly-T tracts).
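
    The columns of the compseq table can be reproduced in a few lines. The sketch below is a simplified stand-in for compseq, not its actual implementation: it counts overlapping 2-letter words and takes the expected frequency as the product of the two single-base frequencies, as the compseq header describes. The function name is hypothetical, and ambiguous bases such as N are not grouped into an "Other" row.

```python
from collections import Counter

def dinucleotide_obs_exp(seq):
    """Return {word: (count, obs_freq, exp_freq, obs/exp)} for 2-letter words."""
    seq = seq.upper()
    n = len(seq)
    # Single-base frequencies drive the expected word frequencies.
    base_freq = {base: count / n for base, count in Counter(seq).items()}
    pairs = Counter(seq[i:i + 2] for i in range(n - 1))
    total = n - 1  # number of overlapping 2-letter windows
    table = {}
    for word, count in pairs.items():
        obs = count / total
        exp = base_freq[word[0]] * base_freq[word[1]]
        table[word] = (count, obs, exp, obs / exp)
    return table

for word, row in sorted(dinucleotide_obs_exp("AATTAACG").items()):
    print(word, row)
```

    For a 150,000 bp sequence the window count is 149,999, which matches the "Total count" reported above.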

    tiede 24 October 2012

    RepeatMasker

    ==================================================
    file name: RM2sequpload_1351040604  
    sequences:             1
    total length:     150000 bp  (149200 bp excl N/X-runs)
    GC level:         49.77 %
    bases masked:      99668 bp ( 66.45 %)
    ==================================================
                   number of      length   percentage
                   elements*    occupied  of sequence
    --------------------------------------------------
    Retroelements           38        94476 bp   62.98 %
       SINEs:                0            0 bp    0.00 %
       Penelope              0            0 bp    0.00 %
       LINEs:                3          578 bp    0.39 %
        CRE/SLACS            0            0 bp    0.00 %
         L2/CR1/Rex          0            0 bp    0.00 %
         R1/LOA/Jockey       0            0 bp    0.00 %
         R2/R4/NeSL          0            0 bp    0.00 %
         RTE/Bov-B           0            0 bp    0.00 %
         L1/CIN4             3          578 bp    0.39 %
       LTR elements:        35        93898 bp   62.60 %
         BEL/Pao             0            0 bp    0.00 %
         Ty1/Copia          15        35327 bp   23.55 %
         Gypsy/DIRS1        19        58480 bp   38.99 %
           Retroviral        0            0 bp    0.00 %
    
    DNA transposons         12         4101 bp    2.73 %
       hobo-Activator        3          372 bp    0.25 %
       Tc1-IS630-Pogo        0            0 bp    0.00 %
       En-Spm                0            0 bp    0.00 %
       MuDR-IS905            0            0 bp    0.00 %
       PiggyBac              0            0 bp    0.00 %
       Tourist/Harbinger     0            0 bp    0.00 %
       Other (Mirage,        0            0 bp    0.00 %
        P-element, Transib)
    
    Rolling-circles          0            0 bp    0.00 %
    
    Unclassified:            1          164 bp    0.11 %
    
    Total interspersed repeats:       98741 bp   65.83 %
    
    
    Small RNA:               0            0 bp    0.00 %
    
    Satellites:              0            0 bp    0.00 %
    Simple repeats:          6          206 bp    0.14 %
    Low complexity:         10          721 bp    0.48 %
    ==================================================
    * most repeats fragmented by insertions or deletions
      have been counted as one element                                                      
    
    The query species was assumed to be zea           
    RepeatMasker version open-3.3.0 , default mode
                                       
    run with cross_match version 1.080812
    RepBase Update 20110920, RM database version 20110920
    
    LTR = Long Terminal Repeat; LINEs = Long Interspersed Elements
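
    As a consistency check on the summary table, the sketch below re-adds the category totals. All numbers are copied from the RepeatMasker output above; the variable and function names are hypothetical.

```python
TOTAL_LENGTH = 150000  # bp, "total length" in the summary header

# bp occupied per top-level category, copied from the table above
occupied = {
    "Retroelements": 94476,
    "DNA transposons": 4101,
    "Unclassified": 164,
    "Simple repeats": 206,
    "Low complexity": 721,
}

def percent(bp, total=TOTAL_LENGTH):
    """Percentage of the sequence occupied, rounded as in the table."""
    return round(100 * bp / total, 2)

interspersed = (occupied["Retroelements"]
                + occupied["DNA transposons"]
                + occupied["Unclassified"])
masked = interspersed + occupied["Simple repeats"] + occupied["Low complexity"]

print(interspersed, percent(interspersed))
print(masked, percent(masked))
```

    The sums reproduce the reported 98741 bp (65.83 %) of interspersed repeats and 99668 bp (66.45 %) of masked bases, which also shows that the percentages are relative to the full 150000 bp rather than the 149200 bp that excludes N/X-runs.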

    For masked sequence (masked by an N) click here

    For RepeatMasker annotation file click here

    For RepeatMasker alignment file click here

    tiede 24 October 2012

    Nucleotide Density

    -After masking repeats

    Regions with no peak correspond to masked regions. The masked regions correspond well to the rough estimates of problematic areas predicted from the dotplot shown above.

     

    Peaks of high GC content (blue) may correspond to genic regions.
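
    A nucleotide density track of this kind can be approximated with a sliding window. The sketch below is a minimal version, assuming that masked bases are written as N and that a fully masked window should produce no value (the "no peak" regions). The function name, window size, and step are hypothetical; the settings used for the plot are not recorded on this page.

```python
def gc_windows(seq, window, step):
    """Yield (start, gc_fraction) per full window; start is 1-based.
    gc_fraction is None when the window holds only masked (N) bases."""
    seq = seq.upper()
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        called = sum(chunk.count(base) for base in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        # GC is computed over unmasked bases only.
        yield start + 1, (gc / called if called else None)

for start, gc in gc_windows("GGCCNNNNAATT", window=4, step=4):
    print(start, gc)
```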

    tiede 24 October 2012

    CpG Island

    -See plot in the "Gene Predictions" section

    CPGPLOT islands of unusual CG composition
    sequence_1 from 1 to 150000
    
         Observed/Expected ratio > 0.60
         Percent C + Percent G > 50.00
         Length > 200
    
     Length 416 (21782..22197)
     Length 201 (59401..59601)
     Length 440 (60386..60825)
     Length 401 (61289..61689)*
     Length 864 (61701..62564)*
     Length 215 (79672..79886)
     Length 710 (82400..83109)*
     Length 386 (86909..87294)*
     Length 1166 (87646..88811)*
     Length 840 (88848..89687)*
     Length 204 (102599..102802)*
     Length 303 (102889..103191)*
     Length 1017 (103277..104293)*
     Length 982 (109208..110189)*
     Length 1159 (110643..111801)*
     Length 495 (112272..112766)*
     Length 286 (119513..119798)
     Length 275 (119832..120106)
     Length 427 (138807..139233)
     Length 634 (142953..143586)*
    
    * corresponds to a gene predicted by FGeneSH
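
    The first two thresholds in the CPGPLOT header can be evaluated for a single window as sketched below. This uses the commonly cited definition of the ratio, observed CpG count divided by (C count x G count / window length), and is an assumption-laden simplification: it does not reproduce CPGPLOT's window averaging or the minimum island length of 200. Function names are hypothetical.

```python
def cpg_stats(window_seq):
    """Return (observed/expected CpG ratio, percent C+G) for one window."""
    s = window_seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so count() is exact
    expected = c * g / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    percent_gc = 100 * (c + g) / n if n else 0.0
    return obs_exp, percent_gc

def passes_thresholds(window_seq, min_ratio=0.60, min_gc=50.0):
    """Apply the ratio and C+G thresholds listed in the CPGPLOT header."""
    obs_exp, percent_gc = cpg_stats(window_seq)
    return obs_exp > min_ratio and percent_gc > min_gc

print(cpg_stats("CGCGCGCGAT"), passes_thresholds("CGCGCGCGAT"))
print(cpg_stats("AATTAATTCG"), passes_thresholds("AATTAATTCG"))
```

    The second toy window has a high ratio but only 20% C+G, so it fails; both conditions have to hold.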

    Gene Predictions

    Overview

    Click HERE to see a comparison (Excel spreadsheet) of the predicted genes from FGeneSH and AUGUSTUS. The spreadsheet also lists which genes will be analyzed further.

    FGeneSH

    Post-masking results, click here

    Summary of predicted genes:

    1)  44,007 - 47,228 BP - 5 exons, 141 AA long protein predicted

    2)  61,190 - 62,728 BP - 3 exons, 313 AA long protein predicted

    -Blastn results show a strong match (E-value = 0) to the predicted gene 1-aminocyclopropane-1-carboxylate oxidase 1 in maize and Brachypodium distachyon

    3)  82,397 - 89,906 BP - 14 exons, 1277 AA long protein predicted

    -Blastn results show a strong match (E-value = 0) to a predicted gene annotated as lycopene epsilon cyclase

    4)  101,949 - 113,682 BP - 13 exons, 1023 AA long protein predicted

    5)  141,659 - 143,980 BP - 2 exons, 201 AA long protein predicted

     

    The genes predicted by FGeneSH line up well with the regions of high observed/expected CpG ratio (plot below) predicted by the CPGPLOT program in EMBOSS. CpG-rich regions are associated with genic regions [2], which supports the gene prediction results from FGeneSH.
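
    The correspondence between CpG islands and FGeneSH genes can be checked by simple interval overlap. In the sketch below the island coordinates are copied from the CPGPLOT list in the "CpG Island" section and the gene spans from the FGeneSH summary above; gene 4's start is read as 101,949, which is an assumption. Function names are hypothetical.

```python
# CpG islands (start, end) from the CPGPLOT output
islands = [
    (21782, 22197), (59401, 59601), (60386, 60825), (61289, 61689),
    (61701, 62564), (79672, 79886), (82400, 83109), (86909, 87294),
    (87646, 88811), (88848, 89687), (102599, 102802), (102889, 103191),
    (103277, 104293), (109208, 110189), (110643, 111801), (112272, 112766),
    (119513, 119798), (119832, 120106), (138807, 139233), (142953, 143586),
]

# FGeneSH gene spans; gene 4 start read as 101,949 (assumption)
fgenesh = {1: (44007, 47228), 2: (61190, 62728), 3: (82397, 89906),
           4: (101949, 113682), 5: (141659, 143980)}

def overlaps(a, b):
    """True when two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]

def islands_in_genes(islands, genes):
    """Map each island to the gene it overlaps (islands with no gene are omitted)."""
    hits = {}
    for island in islands:
        for gene_id, span in genes.items():
            if overlaps(island, span):
                hits[island] = gene_id
    return hits

hits = islands_in_genes(islands, fgenesh)
print(len(hits), "of", len(islands), "islands overlap a predicted gene")
```

    Under these assumptions 13 of the 20 islands overlap genes 2 to 5, which agrees with the starred entries in the CPGPLOT list; gene 1 has no overlapping island.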

    cpgplot whole MASKED sequence.png

     

    AUGUSTUS

    For detailed results click here

    AUGUSTUS results browser image.png

    Summary of predicted AUGUSTUS genes:

    1) "g1" --> 61,306 - 62,499 BP

    -Blastn results show a strong match (E-value = 0) to the predicted gene 1-aminocyclopropane-1-carboxylate oxidase 1 in maize and Brachypodium distachyon

    -Same Blastn result as "gene 2" predicted by FGeneSH; the BP coordinates are very similar as well

    2) "g2" --> 82,817 - 86,391 BP

    3) "g3" --> 87,022 - 89,626 BP

    4) "g4" --> 109,545 - 112,756 BP

    5) "g5" --> 138,554 - 143,926 BP
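
    To quantify how the AUGUSTUS and FGeneSH predictions line up, the sketch below reports the number of shared base pairs between each AUGUSTUS gene span and each FGeneSH gene span. Coordinates are copied from the two summaries on this page, with two readings that are assumptions: FGeneSH gene 4 starts at 101,949 and AUGUSTUS g5 spans 138,554 to 143,926. Only outer gene spans are compared, not exon structure, and the function names are hypothetical.

```python
fgenesh = {1: (44007, 47228), 2: (61190, 62728), 3: (82397, 89906),
           4: (101949, 113682), 5: (141659, 143980)}
augustus = {"g1": (61306, 62499), "g2": (82817, 86391), "g3": (87022, 89626),
            "g4": (109545, 112756), "g5": (138554, 143926)}

def overlap_bp(a, b):
    """Number of positions shared by two inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

def match_predictions(query, reference):
    """For each query gene, map overlapping reference genes to shared bp."""
    matches = {}
    for name, span in query.items():
        matches[name] = {ref: overlap_bp(span, ref_span)
                         for ref, ref_span in reference.items()
                         if overlap_bp(span, ref_span) > 0}
    return matches

for name, found in match_predictions(augustus, fgenesh).items():
    print(name, found)
```

    With these inputs every AUGUSTUS gene overlaps exactly one FGeneSH gene; g2 and g3 both fall inside FGeneSH gene 3, and FGeneSH gene 1 has no AUGUSTUS counterpart.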

    GeneMark

    PDF graphical/detailed results click here

    GeneMark.hmm (Version 2.2a)
    Sequence name: Group 1 GeneMark Masked Sequence
    Sequence length: 150000 bp
    G+C content: 82.05%
    Matrix: corn
    Wed Oct 24 23:50:36 2012
    
    Predicted genes/exons
    
    Gene Exon Strand Exon           Exon Range     Exon      Start/End
      #    #         Type                         Length       Frame
    
      1     2   -  Terminal     45230     45349     120          3 1
      1     1   -  Initial      45431     45550     120          3 1
    
      2     1   +  Initial      61318     61425     108          1 3
      2     2   +  Internal     61545     61789     245          1 2
      2     3   +  Terminal     61911     62499     589          3 3
    
      3     1   +  Initial      69752     69825      74          1 2
      3     2   +  Internal     70116     70179      64          3 3
      3     3   +  Terminal     70244     70498     255          1 3
    
      4     1   +  Initial      82817     83131     315          1 3
      4     2   +  Internal     83270     83315      46          1 1
      4     3   +  Internal     83420     83460      41          2 3
      4     4   +  Internal     84163     84168       6          1 3
      4     5   +  Internal     84287     84458     172          1 1
      4     6   +  Internal     84568     84703     136          2 2
      4     7   +  Internal     84919     85021     103          3 3
      4     8   +  Internal     85406     85513     108          1 3
      4     9   +  Internal     85611     85682      72          1 3
      4    10   +  Terminal     85986     86048      63          1 3
    
      5     1   +  Initial      87022     87223     202          1 1
      5     2   +  Internal     87302     87396      95          2 3
      5     3   +  Internal     87494     87615     122          1 2
      5     4   +  Internal     87686     88190     505          3 3
      5     5   +  Terminal     88289     89626    1338          1 3
    
      6     4   -  Terminal    102970    103049      80          3 2
      6     3   -  Internal    103501    103548      48          1 2
      6     2   -  Internal    103658    103742      85          1 1
      6     1   -  Initial     103848    103949     102          3 1
    
      7     1   +  Initial     109263    109393     131          1 2
      7     2   +  Internal    109772    110001     230          3 1
      7     3   +  Internal    110573    110833     261          2 1
      7     4   +  Internal    110920    111810     891          2 1
      7     5   +  Terminal    112287    112756     470          2 3
    
      8     4   -  Terminal    138472    138549      78          3 1
      8     3   -  Internal    138821    139013     193          3 3
      8     2   -  Internal    139101    139198      98          2 1
      8     1   -  Initial     142968    143528     561          3 1
    
      9     1   +  Initial     143726    143750      25          1 1
      9     2   +  Terminal    143847    143905      59          2 3
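
    To compare these exon-level predictions with the gene-level summaries from FGeneSH and AUGUSTUS, the exon rows can be collapsed into one span per gene. The sketch below assumes the whitespace-separated row layout shown in the table above (gene, exon, strand, type, start, end, length, two frame columns); the function name is hypothetical.

```python
def genemark_gene_spans(lines):
    """Collapse exon rows into {gene: (strand, start, end, coding_bp)}."""
    spans = {}
    for line in lines:
        fields = line.split()
        # Exon rows have 9 fields and begin with the gene number;
        # header and blank lines are skipped.
        if len(fields) != 9 or not fields[0].isdigit():
            continue
        gene, strand = int(fields[0]), fields[2]
        start, end, length = int(fields[4]), int(fields[5]), int(fields[6])
        if gene in spans:
            _, s, e, bp = spans[gene]
            spans[gene] = (strand, min(s, start), max(e, end), bp + length)
        else:
            spans[gene] = (strand, start, end, length)
    return spans

rows = [
    "  #    #         Type                         Length       Frame",
    "  2     1   +  Initial      61318     61425     108          1 3",
    "  2     2   +  Internal     61545     61789     245          1 2",
    "  2     3   +  Terminal     61911     62499     589          3 3",
]
print(genemark_gene_spans(rows))
```

    For GeneMark gene 2 this gives a span of 61318 to 62499 with 942 coding bp, that is 314 codons, which is consistent with the 313 AA protein FGeneSH predicts for its gene 2 if the last codon is a stop.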
    
     

    tiede 25 October 2012

    Alignments

    Blastn

    NCBI Blastn result using database nr; maize included in search NCBI Blastn masked sequence.pdf

    *-Plan to run a Blastn search on predicted gene sequences to see if any genes/gene models already exist

    *-Plan to run a Blastn search without maize to see if any regions of interest pop up to investigate

    *-Plan to run a Blastx search with and without maize to see if I can find any protein evidence to support the gene predictions or point to any regions not predicted by FGeneSH
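
    For the planned searches on individual predicted genes, each gene's sequence first has to be cut out of the 150kb sequence. The sketch below is one minimal way to do that, assuming the sequence is already loaded as a plain string and that coordinates are 1-based and inclusive, as in the gene summaries above. The function name and record naming scheme are hypothetical.

```python
def extract_region(seq, start, end, name):
    """Return a FASTA record for the 1-based, inclusive region start..end."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("coordinates fall outside the sequence")
    region = seq[start - 1:end]
    # Wrap the sequence at 60 characters per line.
    body = "\n".join(region[i:i + 60] for i in range(0, len(region), 60))
    return f">{name}_{start}_{end}\n{body}\n"

print(extract_region("ACGTACGTAC", 3, 6, "toy"))
```

    The resulting record can be pasted into the NCBI Blast query box or saved to a file for upload.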

     

     

    The Blast results below need to be re-examined:

    • Whole sequence, nothing excluded:
      • Click on link below to open PDF of results
      • The sequence appears to come from the Zea mays (maize) genome
        • BAC clones from multiple maize chromosomes align to the sequence
    • Whole sequence, Zea mays excluded:

     

    Blastx results of whole sequence using Swissprot of NCBI:

     

    tiede 02 October 2012

     

    RepeatMasker analysis:

    Species: Zea mays  output file (table)

    Yanzhu 02 October 2012

    References

    1. Haberer G, Young S, Bharti AK, Gundlach H, Raymond C, Fuks G, Butler E, Wing RA, Rounsley S, Birren B, Nusbaum C, Mayer KF, Messing J. Structure and architecture of the maize genome. Plant Physiol. 139:1612-1624, 2005.
    2. EMBO J. 1988 August; 7(8): 2295–2299.