    Screen shot 2010-09-20 at 1.40.08 PM.png

    General analysis of seq203

    First, to determine which species sequence 203 (seq203) comes from, we ran a BLAST search at NCBI; most hits were to Zea mays. The query is 400,000 bases long and aligned to several BAC clones from different chromosomes, mostly chromosomes 5, 7, 8 and 10. Hits on several chromosomes are probably caused by the abundance of transposable elements that characterizes the maize genome (we found many retrotransposons in our sequence; data below). The red boxes in the figure on the right, which indicate aligned sequences, look choppy. This may be because the sequences of these clones had not yet been fully updated in the NCBI database.

    Screen shot 2010-09-20 at 1.40.25 PM.png

    Details of the BLAST results are posted in the supplementary information.

    Sequence composition

    The EMBOSS compseq program shows that sequence 203 consists of 400,000 bases, including 3,131 unknown bases (0.78% of the entire sequence). AA and TT occur 1.18 and 1.23 times more often than expected, respectively. In contrast, CG, GC, AC, GT and TA occur less often than expected, at 0.74, 0.92, 0.83, 0.87 and 0.80 times the expected frequencies, respectively. A likely reason for the excess of AA and TT and the deficit of CG and GC is that seq203 contains a large amount of repetitive sequence and little genic sequence. The overall GC content is 0.48, but it is not evenly distributed along sequence 203: several regions (170-180 kb, 205-210 kb, 215-225 kb, 220-235 kb and 345-360 kb) have a higher GC content.
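
    The observed/expected ratios and the GC content reported above can be reproduced in outline with a few lines of Python. This is only a minimal sketch, not the compseq implementation: it assumes the expected dinucleotide frequency is the product of the two observed single-base frequencies (the setting used for the compseq run is not stated in this report), it ignores unknown bases, and the function names and the 5 kb window are chosen here for illustration.

```python
from collections import Counter

BASES = "ACGT"


def dinucleotide_obs_exp(seq):
    """Return {dinucleotide: observed/expected}, skipping unknown bases.

    Assumption: expected frequency = product of single-base frequencies.
    """
    seq = seq.upper()
    known = [b for b in seq if b in BASES]
    base_freq = {b: n / len(known) for b, n in Counter(known).items()}
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)
             if seq[i] in BASES and seq[i + 1] in BASES]
    total = len(pairs)
    return {pair: (n / total) / (base_freq[pair[0]] * base_freq[pair[1]])
            for pair, n in Counter(pairs).items()}


def gc_content(seq):
    """Fraction of G+C among the known (A, C, G, T) bases."""
    seq = seq.upper()
    known = sum(seq.count(b) for b in BASES)
    return (seq.count("G") + seq.count("C")) / known if known else 0.0


def windowed_gc(seq, window=5000, step=5000):
    """GC content in consecutive windows, as (start, gc) pairs."""
    return [(start, gc_content(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]
```

    Running windowed_gc on the full sequence would give per-window values that can be compared with the GC-rich regions listed above.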

    cpgplot.png

     

    Analysis of non-Genic sequences

    To characterize seq203, we used a dot plot, which is a good tool for identifying tandem repeats visually. Seq203 was compared with itself, so the solid central diagonal simply reflects that the sequences on the x-axis and y-axis are identical. A tandem repeat would appear as many short diagonal lines stacked vertically and running parallel to the central diagonal, whereas lines perpendicular to the central diagonal would indicate inversions. Our result does not show any clear tandem repeats or inversions. The sporadic faint lines turned out later in the project to be retrotransposons (data shown below). Because class I retrotransposons use a copy-and-paste mechanism based on transcription and reverse transcription, many copies of elements 100 bp to 5 kbp long (LTR retrotransposons) are expected in the maize genome, which may explain the blastn hits on multiple chromosomes.
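
    The idea behind a self-comparison dot plot can be sketched as a word-match search: every pair of positions (i, j) where the same k-letter word occurs is a dot, and words that match the reverse complement mark candidate inversions. This is an illustrative sketch only; the report does not say which dot plot program or word size was used, and the function names and the default word size are assumptions.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (N is left unchanged)."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_index(seq, k):
    """Map each k-letter word to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def self_dotplot(seq, k=10):
    """Return (forward, inverted) lists of (i, j) word matches.

    forward: same word at i and j with j > i (off the main diagonal).
    inverted: word at i equals the reverse complement of the word at j.
    """
    seq = seq.upper()
    index = kmer_index(seq, k)
    forward, inverted = [], []
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if "N" in word:
            continue  # unknown bases give no reliable match
        forward.extend((i, j) for j in index.get(word, []) if j > i)
        inverted.extend((i, j)
                        for j in index.get(reverse_complement(word), []))
    return forward, inverted
```

    In this sketch a tandem repeat produces runs of forward dots at a constant offset j - i, which is what appears as short lines parallel to the central diagonal. On a 400 kb sequence the lists would be large, so a real run would use a longer word and plot the points rather than print them.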

    seq203 dot plot.jpg

     

    Next, to identify the number and types of repeats, we analyzed the sequence with RepeatMasker.

    More than 85% of the sequence (342,734 bp) was masked by RepeatMasker, and most of the masked sequence consisted of LTR retroelements. These fell into two groups: Ty1/Copia (122,121 bp) and Gypsy/DIRS1 (206,932 bp). The remaining repeats were DNA transposons: hobo-Activator, En-Spm, MuDR-IS905 and Tourist/Harbinger.
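
    As a quick check of the proportions, the base-pair counts reported above can be converted to percentages of the 400,000-base sequence. The sketch below uses only the numbers given in the text; the "other masked" label for the remainder is hypothetical, since the text attributes the rest of the masked sequence to DNA transposons without giving per-family counts.

```python
SEQ_LENGTH = 400_000      # length of seq203, from the text
TOTAL_MASKED = 342_734    # bp masked by RepeatMasker, from the text
LTR_GROUPS = {"Ty1/Copia": 122_121, "Gypsy/DIRS1": 206_932}


def percent_of_sequence(bp, length=SEQ_LENGTH):
    """Express a base-pair count as a percentage of the sequence length."""
    return 100.0 * bp / length


ltr_total = sum(LTR_GROUPS.values())
remainder = TOTAL_MASKED - ltr_total  # masked bp not in the two LTR groups

for name, bp in LTR_GROUPS.items():
    print(f"{name}: {bp} bp ({percent_of_sequence(bp):.1f}%)")
print(f"other masked: {remainder} bp ({percent_of_sequence(remainder):.1f}%)")
print(f"total masked: {TOTAL_MASKED} bp "
      f"({percent_of_sequence(TOTAL_MASKED):.1f}%)")
```

    With these inputs the two LTR groups account for about 30.5% and 51.7% of the sequence, and the masked total is about 85.7%.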

    Screen shot 2010-09-23 at 5.49.28 PM.png

     

     

    blastp.png

    Analysis of putative functional region

    When the sequence was searched against the protein database using tblastx and blastx, no specific proteins were detected in the whole sequence (see supplementary data). Therefore, the longest putative protein predicted by GeneMark was searched against the maize protein database using blastp.

    Most (?) of the predicted proteins were related to the heat-shock response, and one StbA domain, which belongs to a plasmid stability protein, was identified.

     

    To find any putative genes in the region, the RepeatMasker-masked sequence was used as the query for a BLAST search against the Arabidopsis genome (WU-BLAST result). The result was consistent with that of the NCBI BLAST search against the maize protein database. Most of the matching Arabidopsis genes were related to protein folding, cadmium ion and heat shock. Their products were located mostly in the cell wall, plasma membrane, chloroplast and mitochondria.

    Using the same RepeatMasker-masked sequence, putative transcribed regions were predicted with FGENESH and GeneMark. The table shows the gene predictions from both programs. The translated amino acid sequence of each predicted gene was searched against the plant protein database using blastp.

     

    Of the 15 predicted genes, one (203_8) showed a strong, significant match to a heat shock protein; two (203_4 and 203_6) appear to encode DNA polymerase alpha subunit B family proteins, as suggested by PSI-BLAST; and two (203_12 and 203_14) showed weak matches.

     

    Program: F = FGENESH, G = GeneMark. Coordinates in parentheses are the predicted TSS (Begin column) and PolA site (End column).

    ID       Begin (TSS)  End (PolA)  Strand  N Exons  Program  BLAST
    203_1    29194        29367       +       2        G        no match
    203_2    63874        63626       -       2        G        no match
    203_3    112835       113143      +       2        G        no match
    203_4    114480       119729      +       6        G        hypothetical protein; probably DNA polymerase alpha subunit B family protein
             (114191)     116856      +       8        F
             114480       (116894)
    203_5    119956       120254      +       2        G        no match
             (120765)                 -       2        F        no match
             120210       119568
                          (119530)
    203_6    158833       156963      -       7        G
             (164993)                 -       9        F        hypothetical protein; probably DNA polymerase alpha subunit B family protein (matches the one expressed in Oryza)
             162609       156963
                          (156442)
    203_7    162994       162695      -       2        G        no match
    203_8    165597       170326      +       7        G        stromal 70 kDa heat shock-related protein (see note)
             (165132)                 +       8        F
             165597       169928
                          (170561)
    203_9    186732       186279      -       2        G        no match
    203_10   236043       236408      +       3        G        no match
    203_11   (237589)     237651      +       1        F        no match
                          (238566)
    203_12   258845       257671      -       2        G        hypothetical protein
             (259346)                 -       1        F        unknown function
             257793       (257761)
    203_13   309434       308737      -       2        G        no match
    203_14   312443       310252      -       3        G        weak match to catalytic/protein phosphatase type 2C (see note)
    203_15   352199       352126      -       1        G        no match

    Note on 203_8: this protein consists of 703 amino acids, and the FGENESH prediction is the correct one. FGENESH predicted the entire region of this protein, whereas GeneMark predicted 96% of the sequence, even though the sequence matches strongly from end to end in Zea mays, Oryza sativa and Triticum aestivum.

    Note on 203_14: the match to catalytic/protein phosphatase type 2C is weak. The amino acid sequence obtained from rice is not found in its entirety in maize seq203, and there is a deletion of 28 amino acids, so this prediction might not be correct.

     

     

    Comparative analysis with 70kDa heat shock-related protein

    (which is the most reliably predicted gene) 

     

    203_8, which significantly matches the stromal 70 kDa heat shock-related protein, is well conserved among monocot species (e.g. Oryza sativa, Sorghum bicolor and Triticum aestivum) and eudicot species (e.g. Medicago truncatula, Vitis vinifera, Ipomoea nil, Populus trichocarpa and Cucumis sativus) (supplemental data). In some species, such as Oryza sativa, Sorghum bicolor and Vitis vinifera, the best match to this HSP is annotated as a hypothetical protein. However, the Zea mays 70 kDa heat shock-related protein and these hypothetical proteins align well, so we consider the hypothetical proteins to have the same function as the heat shock-related protein. This is a good example of how comparative genomics can transfer knowledge from one species to another.

    Surprisingly, this heat shock-related protein in maize is also well conserved even in lower plant species such as the green algae Volvox and Chlamydomonas. Heat shock proteins are known to be involved in the stress response. The conservation of this gene across so many species suggests that the protein is essential for plant survival.

    Luminal binding protein (BiP) is also a member of the heat shock protein 70 (HSP70) family. Part of the 203_8 sequence (from amino acid 69 onward) showed a significant match to luminal binding proteins in Arabidopsis, Glycine max, Gossypium hirsutum and Nicotiana benthamiana.

    To examine the phylogenetic relationships among the orthologs, we constructed a bootstrapped neighbor-joining (N-J) tree, using the best hit from each of 27 species.

    Phylogenetic tree.png

    Zm: Zea mays, Sb: Sorghum bicolor, Os: Oryza sativa, Ta: Triticum aestivum, Mt: Medicago truncatula, Vv: Vitis vinifera, In: Ipomoea nil, Pg: Pennisetum glaucum, Rc: Ricinus communis, Cs: Cucumis sativus, Pt: Populus trichocarpa, So: Spinacia oleracea, Cl: Citrullus lanatus, Ps: Picea sitchensis, Pp: Physcomitrella patens subsp. patens, At: Arabidopsis thaliana, Sl: Solanum lycopersicum, Sm: Selaginella moellendorffii, Cr: Chlamydomonas reinhardtii, Vc: Volvox carteri f. nagariensis, Cv: Chlorella variabilis, Mp: Micromonas pusilla CCMP1545, Ds: Dunaliella salina, C: Cyanothece sp. PCC 7822, S: Synechocystis sp. PCC 6803, Cr: Cylindrospermopsis raciborskii CS-505, Ma: Microcystis aeruginosa NIES-843
     

    For a more specific analysis, we examined the conservation of the 70 kDa HSP among maize, sorghum, rice and Arabidopsis. We obtained the protein sequences by searching each species' protein database with the predicted 70 kDa heat shock-related protein using BLAST. The protein sequences were then aligned and compared using ClustalW (alignment details).

    Most of the protein was well conserved across the four species.
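
    One way to put a number on "well conserved" is to compute the percent identity of aligned sequence pairs and the fraction of alignment columns that are identical in all species. The sketch below is a minimal illustration under stated assumptions: the input strings are rows of an existing alignment with "-" as the gap character, gap columns are skipped when computing identity (other conventions exist), and the function names and toy sequences are hypothetical rather than taken from the ClustalW output.

```python
def percent_identity(row_a, row_b):
    """Percent identity of two aligned rows, ignoring columns with a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = matches = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue  # skip gap columns (one of several conventions)
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


def conserved_fraction(rows):
    """Fraction of columns where all rows share the same non-gap residue."""
    columns = list(zip(*rows))
    conserved = sum(1 for col in columns
                    if col[0] != "-" and all(c == col[0] for c in col))
    return conserved / len(columns) if columns else 0.0


# Toy alignment rows (hypothetical, not the real HSP70 sequences).
toy = ["MKVL", "MKVL", "MKIL", "MKVL"]
print(percent_identity("MKV-LA", "MKVQLG"))  # 4 of 5 compared columns match
print(conserved_fraction(toy))               # 3 of 4 columns fully conserved
```

    Applied to the four-species alignment, percent_identity would be run on each pair of rows and conserved_fraction on all four rows together.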

    Screen shot 2010-12-09 at 9.08.24 PM.png

    Screen shot 2010-12-09 at 9.08.54 PM.png

    Screen shot 2010-12-09 at 9.09.27 PM.png

    Screen shot 2010-12-09 at 9.09.52 PM.png

    Screen shot 2010-12-09 at 9.11.25 PM.png

    An average distance tree was drawn from the ClustalW alignment. As expected, Arabidopsis was separated from the other three species because it is a dicot. Sorghum, a close relative of maize, was the closest to maize.
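
    An average distance tree is usually built by average-linkage clustering (UPGMA): the two closest clusters are merged repeatedly, and the distance from the merged cluster to every other cluster is the size-weighted average of the old distances. The sketch below assumes that this is the method behind the tree here, which the report does not state. The pairwise distances are hypothetical values chosen only to illustrate the procedure, not results from the alignment, and the output is a Newick string without branch lengths.

```python
from itertools import combinations


def upgma(names, distances):
    """Average-linkage clustering; returns a Newick topology string.

    distances: {(name_a, name_b): distance} for every unordered pair.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of leaves
    d = {frozenset(pair): value for pair, value in distances.items()}
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = f"({a},{b})"
        # Size-weighted average distance from the merged cluster to the rest.
        for other in sizes:
            if other in (a, b):
                continue
            d[frozenset((merged, other))] = (
                sizes[a] * d[frozenset((a, other))]
                + sizes[b] * d[frozenset((b, other))]
            ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)) + ";"


# Hypothetical pairwise distances, for illustration only.
toy_distances = {
    ("maize", "sorghum"): 0.02,
    ("maize", "rice"): 0.06,
    ("sorghum", "rice"): 0.07,
    ("maize", "arabidopsis"): 0.20,
    ("sorghum", "arabidopsis"): 0.21,
    ("rice", "arabidopsis"): 0.19,
}
print(upgma(["maize", "sorghum", "rice", "arabidopsis"], toy_distances))
```

    With these toy distances maize and sorghum are joined first, rice next and Arabidopsis last, the same branching order as described for the tree above.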

    average distance tree.png

     

    Six putative transcribed regions were detected, and each predicted amino acid sequence was analyzed with NCBI blastp. The first and third amino acid sequences contained domains belonging to the POL12 and MPP superfamilies, respectively. Both are related to a DNA polymerase subunit, but their functions are not clearly known. The fourth sequence, which is the longest, was predicted to be a heat shock protein, as in the previous analysis. The last one contained a domain similar to the UEV and UBCc superfamilies.

     

    Next, for higher reliability, seq203 was searched with BLAST against sorghum, a close relative of maize. The best-scoring hit was about 4.5 kb long and was anchored to chromosome 8 of sorghum. The E-value was 0, and the identity and similarity were both 89.2%, which means the sequence is well conserved between the two species. A gene that is conserved across several species is more likely to play an important role than one that is not. (Open questions: is chromosome 8 the only hit? What about the repetitive regions?)

    Screen shot 2010-10-21 at 11.00.04 AM.png

