Annotation Page (seq 207)


    Sequence 207 Annotation

     

    Annotators: Jie Yin, Jiajie Huang

    Annotators' contact information

    DNA: Sequence 207, Repeat Masked Sequence

    Predicted proteins: FGENESH, GeneMark

    Supplemental Information

     

    Sequence Composition

    Sequence 207 contains 400,000 bases, including 2,929 unknown bases (0.73%), which appear as Ns in the sequence. As expected, the contents of CG and its complement GC are low, at 0.59 and 0.84 times the expected frequencies, respectively.

    Surprisingly, AC content is also low, at 0.85 times the expected frequency. AA and TT contents are unusually high, at 1.36 and 1.31 times the expected frequencies, respectively. The high AA/TT content may be due to the presence of repeated sequences, but the reason for the low GT and TA content is unknown. For further details, see supplemental Table S1.
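    The observed/expected ratios quoted above can be reproduced along the following lines. This is a minimal sketch, not the annotators' actual procedure: it assumes the expected frequency of a dinucleotide XY is the product of the single-base frequencies of X and Y, and that any dinucleotide containing an N is skipped. The function name `dinucleotide_ratios` is hypothetical.

```python
from collections import Counter


def dinucleotide_ratios(seq):
    """Return observed/expected ratios for dinucleotides made of A, C, G, T.

    Assumption: expected frequency of XY = freq(X) * freq(Y).
    Bases other than A, C, G, T (for example N) are ignored.
    """
    seq = seq.upper()
    bases = Counter(b for b in seq if b in "ACGT")
    pairs = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in "ACGT" and seq[i + 1] in "ACGT"
    )
    n_bases = sum(bases.values())
    n_pairs = sum(pairs.values())
    ratios = {}
    if n_bases == 0 or n_pairs == 0:
        return ratios
    for x in "ACGT":
        for y in "ACGT":
            expected = (bases[x] / n_bases) * (bases[y] / n_bases)
            if expected > 0:
                observed = pairs[x + y] / n_pairs
                ratios[x + y] = observed / expected
    return ratios
```

    A ratio near 1 means the dinucleotide occurs about as often as base composition alone predicts; values such as 0.59 for CG indicate depletion.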

    Sequence 207 contains several regions (15-35 kb, 180-190 kb, 200-210 kb, 220-230 kb and 260-270 kb) with conspicuously different GC content (Fig. 1). Approximately 82% of sequence 207 is made up of retrotransposons and other repeats (Table S3).
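    Regions of unusual GC content like those listed above can be located with a simple windowed scan. The sketch below is illustrative only: the 10 kb window and step are assumptions chosen to match the resolution of the ranges quoted, and the function name `gc_windows` is hypothetical. It is not the method used to produce Figure 1.

```python
def gc_windows(seq, window=10_000, step=10_000):
    """Yield (start, end, gc_fraction) for successive windows of seq.

    Unknown bases (anything other than A, C, G, T) are excluded from the
    denominator; a window with no known bases yields None as its fraction.
    """
    seq = seq.upper()
    for start in range(0, len(seq), step):
        chunk = seq[start:start + window]
        known = sum(chunk.count(b) for b in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        yield start, start + len(chunk), (gc / known if known else None)
```

    Windows whose GC fraction departs strongly from the sequence-wide value are candidates for closer inspection.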

    Figure 1: CpG Island Plot

    cpgplot.png

    Predicted Proteins

    Some initial predicted proteins are listed in Table 1. One gene (207g) appears to encode a DNA polymerase I family protein similar to one in Oryza sativa. Pink indicates a definite gene, white needs more checking, and blue seems uninteresting.

    To date, at least 12 classes of DNA polymerase have been identified in animals: α, β, γ, δ, ε, ζ, η, θ, ι, κ, λ and µ. However, information concerning plant DNA polymerases is still very limited. Only two DNA polymerases from higher plants have been isolated: the catalytic subunit of DNA polymerase α, and the catalytic and small subunits of DNA polymerase δ.

     

    Table 1: Proteins Predicted by FGENESH and GeneMark

    Predictor: F = FGENESH, G = GeneMark. Where both programs predicted the gene and their values differ, the values are given as F / G.

    ID    Begin (TSS)  End (PolA)       Strand  N Exons  Predictor  BLAST
    207a  32665        32962            +       2        G          no match
    207b  75556        75971            -       3        G          hypothetical protein LOC100193428
    207c  94736        97100            -       7        G          hypothetical protein LOC100216784
    207d  129228       129471           +       2        G          no match
    207e  160825       162012           +       2        F, G       Cinful1 polyprotein
    207f  201954       202281           +       2        G          hypothetical protein
    207g  212181       221125 / 221229  +       12 / 10  F, G       see BLAST hits for 207g below
    207h  230091       256674 / 231144  +       18 / 3   F, G       hypothetical protein LOC100382802
    207i  253572       265050           -       3        G          hypothetical protein LOC100383572
    207j  299856       300000           -       2        G          hypothetical protein LOC100501802
    207k  342726       344660           +       2        G          Unknown
    207l  368559       378154           +       2        G          hypothetical protein LOC100279776
    207m  385411       386100           +       2        G          protein brittle-1, chloroplastic/amyloplastic precursor

    BLAST hits for 207g

    F (BLASTX result)
    DNA polymerase I family protein, expressed [Oryza sativa (japonica cultivar-group)] e-value=0
    hypothetical protein OsJ_35810 [Oryza sativa Japonica Group] e-value=0
    hypothetical protein OsI_38041 [Oryza sativa Indica Group] e-value=0

    G (BLASTP result)
    hypothetical protein SORBIDRAFT_08g011930 [Sorghum bicolor] e-value=3e-173
    hypothetical protein OsJ_35810 [Oryza sativa Japonica Group] e-value=1e-101
    DNA polymerase I family protein, expressed [Oryza sativa (japonica cultivar-group)] e-value=2e-101

     

     

    Sequence 207g

    We ran InterProScan on the amino acid sequence of 207g and found a cysteine peptidase active site, a DEAD/DEAH box type DNA/RNA helicase domain (N-terminal), a DEAD-like helicase domain (N-terminal), and a helicase superfamily 1/2 ATP-binding domain. (InterProScan result)

     

    Peptidases

    In the MEROPS database peptidases and peptidase homologues are grouped into clans and families. Clans are groups of families for which there is evidence of common ancestry based on a common structural fold:

    • Each clan is identified with two letters, the first representing the catalytic type of the families included in the clan (with the letter 'P' being used for a clan containing families of more than one of the catalytic types serine, threonine and cysteine). Some families cannot yet be assigned to clans, and when a formal assignment is required, such a family is described as belonging to clan A-, C-, M-, S-, T- or U-, according to the catalytic type. Some clans are divided into subclans because there is evidence of a very ancient divergence within the clan, for example MA(E), the gluzincins, and MA(M), the metzincins.
    • Peptidase families are grouped by their catalytic type, the first character representing the catalytic type: A, aspartic; C, cysteine; G, glutamic acid; M, metallo; S, serine; T, threonine; and U, unknown. The serine, threonine and cysteine peptidases utilise the amino acid as a nucleophile and form an acyl intermediate - these peptidases can also readily act as transferases. In the case of aspartic, glutamic and metallopeptidases, the nucleophile is an activated water molecule.
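    The letter convention described in the two points above lends itself to a small lookup. The sketch below is illustrative only: it encodes just the letters named in the text, and the function name `catalytic_type` is hypothetical rather than part of any MEROPS tool.

```python
# Catalytic-type letters as listed in the text above.
# 'P' is used only for clans that mix serine, threonine and cysteine types.
CATALYTIC_TYPES = {
    "A": "aspartic",
    "C": "cysteine",
    "G": "glutamic acid",
    "M": "metallo",
    "S": "serine",
    "T": "threonine",
    "U": "unknown",
    "P": "mixed (serine, threonine and/or cysteine; clans only)",
}


def catalytic_type(identifier):
    """Return the catalytic type encoded by the first letter of a
    MEROPS family or clan identifier such as 'C1A' or 'MA'."""
    key = identifier.strip()[:1].upper()
    if key not in CATALYTIC_TYPES:
        raise ValueError(f"unrecognised catalytic type letter in {identifier!r}")
    return CATALYTIC_TYPES[key]
```

    For example, `catalytic_type("C1A")` gives `"cysteine"`, matching the papain family mentioned later in this section.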

    In many instances the structural protein fold that characterises the clan or family may have lost its catalytic activity, yet retain its function in protein recognition and binding.

    Cysteine peptidases have characteristic molecular topologies, which can be seen not only in their three-dimensional structures, but commonly also in the two-dimensional structures. These are peptidases in which the nucleophile is the sulphydryl group of a cysteine residue. Cysteine proteases are divided into clans (proteins which are evolutionarily related), and further sub-divided into families, on the basis of the architecture of their catalytic dyad or triad [1].

    This entry represents the catalytic triad of the cysteine peptidases that are found in the MEROPS peptidase families C1A (papain), C1B (bleomycin hydrolase) and C2 (calpain).

    Some of the proteins in this family are also allergens. Allergies are hypersensitivity reactions of the immune system to specific substances called allergens (such as pollen, stings, drugs, or food) that, in most people, result in no symptoms. A nomenclature system has been established for antigens (allergens) that cause IgE-mediated atopic allergies in humans [2]. This nomenclature system is defined by a designation that is composed of the first three letters of the genus; a space; the first letter of the species name; a space and an arabic number. In the event that two species names have identical designations, they are discriminated from one another by adding one or more letters (as necessary) to each species designation.

    The allergens in this family include allergens with the following designations: Der f 1, Der m 1 and Der p 1.
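    The naming rule described above (first three letters of the genus, first letter of the species, then an arabic number) can be written out as a short function. This is a minimal sketch; the function name is hypothetical, the house dust mite species names in the usage note are supplied here for illustration, and the extra disambiguating letters mentioned in the text are not handled.

```python
def allergen_designation(genus, species, number):
    """Build an allergen designation from genus, species and number.

    Follows the rule in the text: three genus letters, a space, one
    species letter, a space, and an arabic number.
    """
    if len(genus) < 3 or not species:
        raise ValueError("genus needs at least 3 letters and species 1")
    return f"{genus[:3].capitalize()} {species[0].lower()} {number}"
```

    Under this rule, `allergen_designation("Dermatophagoides", "pteronyssinus", 1)` yields `Der p 1`, and the species farinae and microceras yield `Der f 1` and `Der m 1`.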

    The pathways in which peptidases are involved are shown in the KEGG PATHWAY diagrams below:

    pathway_peptidase_1.png

    pathway_peptidase_2.png

     

    Helicases

    Helicases have been classified in 5 superfamilies (SF1-SF5). All of the proteins bind ATP and, consequently, all of them carry the classical Walker A (phosphate-binding loop or P-loop) and Walker B (Mg2+-binding aspartic acid) motifs. For the two largest groups, commonly referred to as SF1 and SF2, a total of seven characteristic motifs has been identified [1]. These two superfamilies encompass a large number of DNA and RNA helicases from archaea, eubacteria, eukaryotes and viruses that seem to be active as monomers or dimers. RNA and DNA helicases are considered to be enzymes that catalyse the separation of double-stranded nucleic acids in an energy-dependent manner [2].
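    A candidate Walker A motif can be located in a protein sequence with a simple pattern scan. The text above does not give a consensus, so this sketch assumes the commonly used P-loop pattern [AG]-x(4)-G-K-[ST] (the PROSITE ATP/GTP-binding site motif A). The function name and the sequence in the usage note are hypothetical, and a pattern hit is only a hint, not evidence of ATP binding.

```python
import re

# Assumed P-loop (Walker A) pattern: [AG]-x(4)-G-K-[ST].
# The lookahead lets overlapping candidates be reported as well.
WALKER_A = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_walker_a(protein):
    """Return a list of (zero-based start, matched text) for every
    candidate Walker A motif in an amino acid sequence."""
    protein = protein.upper()
    return [(m.start(), m.group(1)) for m in WALKER_A.finditer(protein)]
```

    For a made-up sequence such as `"MKLAQSGTGKTAV"`, the scan reports one candidate, `AQSGTGKT`, starting at index 3.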

    The various structures of SF1 and SF2 helicases present a common core with two alpha-beta RecA-like domains [2, 3]. The structural homology with the RecA recombination protein covers the five contiguous parallel beta strands and the tandem alpha helices. ATP binds to the amino proximal alpha-beta domain, where the Walker A (motif I) and Walker B (motif II) are found. The N-terminal domain also contains motif III (S-A-T) which was proposed to participate in linking ATPase and helicase activities. The carboxy-terminal alpha-beta domain is structurally very similar to the proximal one even though it is bereft of an ATP-binding site, suggesting that it may have originally arisen through gene duplication of the first one.

    Some members of helicase superfamilies 1 and 2 are listed below:

    • DEAD-box RNA helicases. The prototype of DEAD-box proteins is the translation initiation factor eIF4A. The eIF4A protein is an RNA-dependent ATPase which functions together with eIF4B as an RNA helicase [4].
    • DEAH-box RNA helicases. Mainly pre-mRNA-splicing factor ATP-dependent RNA helicases [4].
    • Eukaryotic DNA repair helicase RAD3/ERCC-2, an ATP-dependent 5'-3' DNA helicase involved in nucleotide excision repair of UV-damaged DNA.
    • Eukaryotic TFIIH basal transcription factor complex helicase XPB subunit. An ATP-dependent 3'-5' DNA helicase which is a component of the core-TFIIH basal transcription factor, involved in nucleotide excision repair (NER) of DNA and, when complexed to CAK, in RNA transcription by RNA polymerase II. It acts by opening DNA either around the RNA transcription start site or the DNA.
    • Eukaryotic ATP-dependent DNA helicase Q. A DNA helicase that may play a role in the repair of DNA that is damaged by ultraviolet light or other mutagens.
    • Bacterial and eukaryotic antiviral SKI2-like helicase. SKI2 has a role in the 3'-mRNA degradation pathway. It represses dsRNA virus propagation by specifically blocking translation of viral mRNAs, perhaps recognising the absence of CAP or poly(A).
    • Bacterial DNA-damage-inducible protein G (DinG). A probable helicase involved in DNA repair and perhaps also replication [5].
    • Bacterial primosomal protein N' (PriA). PriA protein is one of seven proteins that make up the restart primosome, an apparatus that promotes assembly of replisomes at recombination intermediates and stalled replication forks.
    • Bacterial ATP-dependent DNA helicase recG. It has a critical role in recombination and DNA repair. It helps process Holliday junction intermediates to mature products by catalysing branch migration. It has a DNA unwinding activity characteristic of a DNA helicase with a 3' to 5' polarity.
    • A variety of DNA and RNA virus helicases and transcription factors

    This entry represents the ATP-binding domain found within most SF1 and SF2 helicases.

    The pathways in which helicases are involved are shown in the KEGG PATHWAY diagrams below:

    pathway_helicase_1.png

     

    pathway_helicase_2.png

     

    pathway_helicase_3.png

     

    I-TASSER predicted protein structure

    We used I-TASSER to predict the structure of the protein encoded by sequence 207g, which the BLAST results suggest is a DNA polymerase I family protein. DNA Polymerase I (or Pol I) is an enzyme that participates in the process of DNA replication in prokaryotes. It can sequentially catalyze multiple polymerisations. It was initially characterized in E. coli, although it is ubiquitous in prokaryotes. In E. coli and many other bacteria, the gene which encodes Pol I is known as polA.


    Pol I possesses three enzymatic activities:


       1. A 5' -> 3' (forward) DNA polymerase activity, requiring a 3' primer site and a template strand
       2. A 3' -> 5' (reverse) exonuclease activity that mediates proofreading
       3. A 5' -> 3' (forward) exonuclease activity mediating nick translation during DNA repair.

    In the replication process, DNA Polymerase I removes the RNA primer (created by Primase) from the lagging strand and fills in the necessary nucleotides of the Okazaki fragments (see DNA replication) in 5' -> 3' direction, proofreading for mistakes as it goes. It is a template-dependent enzyme - it only adds nucleotides that correctly base pair with an existing DNA strand acting as a template. Ligase then joins the various fragments together into a continuous strand of DNA.

    These 3D models are built based on multiple-threading alignments by LOMETS and iterative TASSER simulations.

    LOMETS (LOcal MEta-Threading-Server) is a locally installed meta-server method for protein structure prediction. It generates protein structure predictions by ranking and selecting models from 8 state-of-the-art threading programs. Spatial restraints are combined from the consensus of the top 20 threading alignments.

    Protein threading, also known as fold recognition, is a method of protein modeling (i.e. computational protein structure prediction) which is used to model those proteins which have the same fold as proteins of known structures, but do not have homologous proteins with known structure. It differs from the homology modeling method of structure prediction in that it is used for proteins which do not have their homologous protein structures deposited in the Protein Data Bank (PDB), whereas homology modeling is used for those proteins which do. Threading works by using statistical knowledge of the relationship between the structures deposited in the PDB and the sequence of the protein which one wishes to model.

     

    Model 1

    model 1.png

     

    Model 2

    model 2.png

     

    Model 3

    model 3.png

     

    Model 4

    model 4.png

     

    Model 5

    model 5.png


    Files (21)

    File                               Description                Size       Date                Attached by
    207g FGENESH blastx.pdf            207e FGENESH blastx        219.77 kB  01:11, 9 Dec 2010   huang147
    207g InterProScan.pdf              207g InterProScan result   52.68 kB   02:14, 9 Dec 2010   huang147
    cpgplot.png                        CpG plot                   8.71 kB    23:48, 18 Oct 2010  jyin
    fgenesh gene prediction.txt        FGENESH Predicted Protein  9.87 kB    23:24, 23 Oct 2010  jyin
    GeneMark blastp.pdf                207g GeneMark blastp       184.02 kB  01:32, 9 Dec 2010   huang147
    genemark gene prediction.txt       genemark_proteins          2.63 kB    23:23, 23 Oct 2010  jyin
    model 1.png                        model 1                    244.65 kB  12:19, 11 Dec 2010  huang147
    model 2.png                        model 2                    225.45 kB  12:36, 11 Dec 2010  huang147
    model 3.png                        model 3                    154.49 kB  12:36, 11 Dec 2010  huang147
    model 4.png                        model 4                    260.26 kB  12:36, 11 Dec 2010  huang147
    model 5.png                        model 5                    127.15 kB  12:36, 11 Dec 2010  huang147
    model2.pdb                         model 2                    823.77 kB  12:24, 11 Dec 2010  huang147
    model3.pdb                         model 3                    823.32 kB  12:24, 11 Dec 2010  huang147
    model4.pdb                         model 4                    823.77 kB  12:24, 11 Dec 2010  huang147
    model5.pdb                         model 5                    823.77 kB  12:24, 11 Dec 2010  huang147
    pathway_helicase_1.png             pathway_helicase_1         80.29 kB   02:32, 9 Dec 2010   huang147
    pathway_helicase_2.png             pathway_helicase_2         118.5 kB   02:32, 9 Dec 2010   huang147
    pathway_helicase_3.png             pathway_helicase_3         69.83 kB   02:32, 9 Dec 2010   huang147
    pathway_peptidase_1.png            pathway_peptidase_1        58.96 kB   02:30, 9 Dec 2010   huang147
    pathway_peptidase_2.png            pathway_peptidase_2        86.72 kB   02:30, 9 Dec 2010   huang147
    RM2_207.txt_1287783987.masked.txt  Repeat masked sequence     398.46 kB  23:04, 23 Oct 2010  jyin