
    Seq 204 Annotation Page

     

    Input: Seq 204

    Annotator: Xiaoqing Yu

    Predicted Proteins: GeneMark, FGENESH

    More Tables: Seq204 Supplemental page

     

     

    Sequence Decomposition

    The results are similar to seq190:

    CG, AC, GT and GC are the least represented words, with observed/expected frequency ratios of 0.637, 0.830, 0.853 and 0.855 respectively.

    Unknown sequence (Ns): 0.99%
     

    Word Obs Count  Obs Frequency  Exp Frequency  Obs/Exp Frequency
    CG 15929 0.03982 0.0625 0.637
    AC 20757 0.05189 0.0625 0.830
    GT 21328 0.05332 0.0625 0.853
    GC 21369 0.05342 0.0625 0.855
    TA 21971 0.05493 0.0625 0.879
    GG 22580 0.05645 0.0625 0.903
    CC 22581 0.05645 0.0625 0.903
    AG 25018 0.06255 0.0625 1.001
    GA 25241 0.06310 0.0625 1.010
    TC 25832 0.06458 0.0625 1.033
    CT 25876 0.06469 0.0625 1.035
    CA 26150 0.06538 0.0625 1.046
    TG 26995 0.06749 0.0625 1.080
    AT 27595 0.06899 0.0625 1.104
    AA 33064 0.08266 0.0625 1.323
    TT 33774 0.08444 0.0625 1.351
    Other 3939 0.00985    
    Total 399999      
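The columns of the table above can be reproduced with a short script. The following is a minimal sketch, not the tool actually used for this page: it assumes overlapping 2-mers, a uniform expected frequency of 1/16 = 0.0625 per word, and that any word containing a non-ACGT character (such as N) is counted as "Other". The function name `dinucleotide_table` is hypothetical.

```python
from collections import Counter
from itertools import product


def dinucleotide_table(seq):
    """Return (rows, other, total) for overlapping 2-mers of seq.

    rows: (word, obs_count, obs_freq, exp_freq, obs/exp) per ACGT word,
          sorted by obs/exp in ascending order.
    other: number of 2-mers containing a non-ACGT character (e.g. N).
    total: number of overlapping 2-mers, i.e. len(seq) - 1.
    """
    seq = seq.upper()
    total = len(seq) - 1
    if total < 1:
        raise ValueError("sequence must contain at least two bases")

    # Count every overlapping window of length 2.
    counts = Counter(seq[i:i + 2] for i in range(total))

    words = ["".join(p) for p in product("ACGT", repeat=2)]
    exp_freq = 1 / 16  # assumption: all 16 words equally likely

    rows = []
    for word in words:
        obs_freq = counts[word] / total  # denominator includes "Other"
        rows.append((word, counts[word], obs_freq, exp_freq, obs_freq / exp_freq))

    other = total - sum(counts[word] for word in words)
    rows.sort(key=lambda row: row[4])
    return rows, other, total
```

Under these assumptions the CG row follows directly from the counts: 15929 / 399999 ≈ 0.03982, and 0.03982 / 0.0625 ≈ 0.637.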

     

    Figure 1: CpG Island Plot

     

    Seq204-cpgplot.png

     


    Predicted Proteins:

    Twenty-two candidate proteins were predicted by GeneMark and FGENESH. Based on physical position, nearby predictions were placed in the same group for analysis. Sixteen groups of predicted proteins were selected in total, labeled 204a to 204p. The analysis followed these steps:

    1. BLAST all 22 proteins at NCBI with blastp, restricting the organism to flowering plants.
    2. To further confirm the results, use the expanded sequence (both masked and unmasked, +1000 bp at the upstream and downstream ends; in some cases expanded up to 3000 bp) to search for closely related regions with nucleotide BLAST and/or blastx. When doing this, examine the matches carefully to make sure they are not just matches to repeated regions or transposons.
    3. Look for matches in other species such as rice, sorghum and Arabidopsis to collect enough evidence to clarify the function of the candidate predictions.
    4. If a candidate domain is found, BLAST the domain database for further confirmation.
    5. If a good match is found, BLAST its mRNA sequence back against sequence 204.
    6. Repeat steps 1-5 several times until the annotation is satisfactory.
    7. BLAST the candidate genes against the maize EST/cDNA database to confirm expression.
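The grouping of nearby predictions that precedes these steps can be sketched in code. The exact criterion used for this page is not stated, so the sketch below assumes that two gene models belong to the same group when their genomic spans overlap (or lie within an optional `max_gap`). The function name `group_predictions` and the model labels are hypothetical. The coordinates are taken from the annotation table.

```python
def group_predictions(models, max_gap=0):
    """Group gene models whose genomic spans overlap or lie within max_gap bp.

    models: iterable of (name, begin, end). begin > end is allowed for
    minus-strand models, so each span is normalised to (low, high) first.
    Returns a list of dicts with keys "start", "end" and "members".
    """
    spans = sorted((min(b, e), max(b, e), name) for name, b, e in models)

    groups = []
    for low, high, name in spans:
        if groups and low <= groups[-1]["end"] + max_gap:
            # Span touches the current group: extend it.
            groups[-1]["end"] = max(groups[-1]["end"], high)
            groups[-1]["members"].append(name)
        else:
            # Span starts a new group.
            groups.append({"start": low, "end": high, "members": [name]})
    return groups


# Example with three models from the table (labels are illustrative).
models = [
    ("204e-F", 78551, 76045),
    ("204e-G", 76842, 75955),
    ("204f-G", 103450, 103639),
]
for group in group_predictions(models):
    print(group["start"], group["end"], group["members"])
```

With the default `max_gap=0`, the two 204e models fall into one group and 204f stays separate, which agrees with how those rows are labeled in the table.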

    After the analysis above, eight genes or domains were found in the 400 kb sequence (seq204). Six of them have very strong evidence, and their expression was confirmed against the maize EST/cDNA database. The other two may need further checking, since they are transposase-domain-containing genes. For details, refer to the table below.

    Orange indicates a definite gene or gene domain with strong evidence; blue needs more checking; white stands for no match or no gene finally found.

    ID  Begin (TSS)  End (PolA)  Length  Strand  N Exons  F=FGENESH, G=GeneMark  BLAST
    204a  15815  16050  56aa  +  2  G  No match (a, b)
    204b  18719  19107  99aa  +  2  G  No match (a, b). Combined 204a and 204b: still no match
    204c  43858  32243  685aa  -  11  F  Transposase_24 superfamily (see Evidence)
    204d  55692  56788  105aa  +  2  G  No match (a, b)
    204e  78551  76045  159aa  -  2  F  No match (a, b). Combined F and G results: still no match (see Analysis process)
    204e  76842  75955  191aa  -  3  G  (as above)
    204f  103450  103639  31aa  +  2  G  No match (a, b)
    204g  143554  142875  99aa  -  3  G  Protein phosphatase type 2C (a, b); expanded up to 200aa (see Evidence)
    204h  151985  148828  265aa  -  3  F  Part of a pre-mRNA-splicing factor containing an MA3 domain. Matched to rice, sorghum, grape and Arabidopsis lyrata with e-value < e-99. The F model is better (see Evidence)
    204h  150372  149005  154aa  -  3  G  (as above)
    204h  150806  150427  67aa  -  2  G  (as above)
    204i  165010  168470  297aa  +  4  F  Part of a pre-mRNA-splicing factor containing a MIF4G domain. Matched to the same group of reference genes as 204h. Possibly can be combined with 204h into one complete pre-mRNA-splicing factor. Details in Evidence
    204i  164826  167939  367aa  +  4  G  (as above)
    204j  183679  172660  297aa  -  8  G  Fiber protein Fb2 / Di19 domain-containing protein. Matched to sorghum, rice and Populus with e-values ranging from e-30 to e-105 (see Evidence)
    204j  183205  175722  238aa  -  5  F  (as above)
    204k  226021  230090  64aa  +  2  G  Initially no match found. The gene was then expanded to about 440aa in genome region 225159-228046, coding for a putative protein containing a FAR1 DNA-binding domain and a MULE transposase domain (see Evidence)
    204l  279709  279998  56aa  +  2  G  No match (a)
    204m  312275  317010  1131aa  +  3  F  Cellulose synthase. More than 50 matches with e-value = 0 (rice, sorghum, tobacco, AT, Ricinus communis, etc.) (see Expression Confirmation)
    204m  312591  316446  1217aa  +  3  G  (as above)
    204n  341193  343037  161aa  +  2  F  Pathogenesis-related protein in rice and wheat; e-values 4e-88 and 3e-47 respectively (see Expression Confirmation)
    204n  341280  341881  161aa  +  2  G  (as above)
    204o  349277  347134  177aa  -  4  F  Very weak match, which can be considered random (a)
    204p  355843  355721  40aa  -  1  G  No match (a)

    a: Confirmed by blastn of genomic DNA (masked sequence + 1000 bp at the upstream and downstream ends)

    b: Confirmed by blastn of genomic DNA (unmasked sequence + 1000 bp at the upstream and downstream ends)
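The "+ 1000 bp" expansion in notes a and b can be sketched as a small helper. This is only an illustration under stated assumptions: coordinates are treated as 1-based and inclusive, the sequence length is taken as 400000 bp, and the names `expand_region` and `extract` are hypothetical. Minus-strand models (begin > end) are normalised before the flanks are added, and the result is clamped to the sequence bounds.

```python
def expand_region(begin, end, flank=1000, seq_len=400000):
    """Return the 1-based inclusive (start, stop) of a model plus flanks.

    begin/end may be given in either order (minus-strand models have
    begin > end). The window is clamped to the range 1..seq_len.
    """
    low, high = min(begin, end), max(begin, end)
    return max(1, low - flank), min(seq_len, high + flank)


def extract(seq, start, stop):
    """Slice a 1-based inclusive window out of a Python string."""
    return seq[start - 1:stop]


# Example: 204g (minus strand) with the default 1000 bp flanks.
start, stop = expand_region(143554, 142875)
print(start, stop)
```

Passing `flank=3000` gives the wider window mentioned in step 2 of the analysis.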
