Seq 202-14 Further Analysis

    1. Protein Sequence of 202-14

    >Seq 202-14

    MERKGGCCLAPRYAATAAAQQAGAAWQMGRIMLKFRPIAPKPAAMAPAPAPASAPVTGSA
    AGAGRGKRKAACGGGGRRGRKPKKAAKVAMVTAAPAATAAAQDVGDCREHCDKEKSSSSP
    SSSSSGTSSVDSSPPPRPQQRQLATLPLMPVTAAEDKAAACPATVGPELVPSQVATAARP
    LAPRAMRPAAAAAYLVTVEEVTATWRDGEAPASATGHDEAPAFVSDQWGRVTLWNAAFVR
    AASADDGDEAAAPVVLGGALPAWGTCAGFTCRVRARHWSARRVGSPVVAPCDVWRLDAAG
    SYLWRLDLQAALTLGGCL

     

    2. Conserved Domain in Seq 202-14

    We searched the NCBI Conserved Domain Database (CDD) with the Seq 202-14 protein sequence. With the default options, no conserved domain was found (figure 1).

     

    202-14_figure_1.PNG

    figure 1

     

    However, when we unchecked "Apply low-complexity filter", a hit to the model PRK07003 (DNA polymerase III subunits gamma and tau) was found (figure 2). PRK07003 is classified as a model that may span more than one domain, and it is not assigned to any domain superfamily.

     

    202-14_figure_2.PNG

    figure 2

     

    Most of the matched part of Seq 202-14 is a low-complexity region, which is why nothing was found at first. Because low-complexity regions can cause inaccurate annotation, we compared Seq 202-14 with the listed proteins that are known to contain this domain, using the ClustalW2 server at EBI (default parameters, used sequences).
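
To make the notion of a low-complexity region concrete, the sketch below scans Seq 202-14 with a sliding window and reports windows whose Shannon entropy is low. This is only a simplified stand-in for the idea behind low-complexity filtering; it is not the filter that CD-Search applies. The function names, the window width (12) and the entropy threshold (2.2 bits) are illustrative assumptions, not values taken from CDD.

```python
import math
from collections import Counter

# Seq 202-14, copied from section 1 of this page.
SEQ_202_14 = (
    "MERKGGCCLAPRYAATAAAQQAGAAWQMGRIMLKFRPIAPKPAAMAPAPAPASAPVTGSA"
    "AGAGRGKRKAACGGGGRRGRKPKKAAKVAMVTAAPAATAAAQDVGDCREHCDKEKSSSSP"
    "SSSSSGTSSVDSSPPPRPQQRQLATLPLMPVTAAEDKAAACPATVGPELVPSQVATAARP"
    "LAPRAMRPAAAAAYLVTVEEVTATWRDGEAPASATGHDEAPAFVSDQWGRVTLWNAAFVR"
    "AASADDGDEAAAPVVLGGALPAWGTCAGFTCRVRARHWSARRVGSPVVAPCDVWRLDAAG"
    "SYLWRLDLQAALTLGGCL"
)


def shannon_entropy(window):
    """Shannon entropy (bits per residue) of one sequence window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq, width=12, threshold=2.2):
    """Return (start, window, entropy) for every window at or below threshold.

    width and threshold are illustrative assumptions, not CDD settings.
    start is a 0-based index into seq.
    """
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        entropy = shannon_entropy(window)
        if entropy <= threshold:
            hits.append((i, window, entropy))
    return hits


if __name__ == "__main__":
    # Print 1-based start position, window and entropy for each hit.
    for start, window, entropy in low_complexity_windows(SEQ_202_14):
        print(f"{start + 1:4d}  {window}  {entropy:.2f}")
```

Windows dominated by one or two residue types (for example the serine run in the middle of the sequence) have low entropy and are the kind of segment a low-complexity filter would mask before the search.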

    The alignment result is here. Seq 202-14 did not align well with these proteins, whereas these proteins aligned very well with each other. We conclude that there is no PRK07003 domain in Seq 202-14. This is consistent with Wootton's observation that a high similarity score between a pair of unrelated protein sequences can occur because the sequences contain compositionally biased or non-random (so-called low-complexity) regions rich in a few amino acid residues (Wootton, 1994).
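
One simple way to put a number on "did not align well" is the pairwise percent identity between rows of the multiple alignment. The sketch below is a minimal illustration of that calculation; the function names are our own, and the three short aligned rows are a hypothetical toy example, not the actual ClustalW2 output for Seq 202-14.

```python
from itertools import combinations


def percent_identity(row_a, row_b, gap="-"):
    """Percent identity over columns where neither aligned row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


def pairwise_identities(alignment):
    """Map each pair of sequence names to its percent identity."""
    return {
        (name_a, name_b): percent_identity(alignment[name_a], alignment[name_b])
        for name_a, name_b in combinations(alignment, 2)
    }


# Hypothetical toy alignment (NOT real data): two reference rows that agree
# with each other and one query row that does not.
TOY_ALIGNMENT = {
    "ref_A": "MKT-AYIAKQR",
    "ref_B": "MKTLAYIAKQR",
    "query": "MAAPAAPA-QR",
}

if __name__ == "__main__":
    for (name_a, name_b), identity in pairwise_identities(TOY_ALIGNMENT).items():
        print(f"{name_a} vs {name_b}: {identity:.1f}%")
```

Applied to a real alignment, a query whose identity to every reference is much lower than the identities among the references themselves would show the same pattern we observed for Seq 202-14.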

    Seq 202-14 also contains many low-complexity regions. We think it would be difficult to determine the 3D structure of Seq 202-14 by homology modeling, for two reasons. First, as noted above, low-complexity regions can produce spurious matches, so a search of the PDB database may return proteins that are not true homologs. Second, structural genomics groups usually select proteins that lack low-complexity regions in order to improve the chance of successful structure determination (Bannen et al., 2008), so there may be few proteins in the PDB with similarity to Seq 202-14. The result of a BLAST search against the PDB database supported this expectation (figure 3).

    BLAST_seq_202-14.PNG

     figure 3

    References

    http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi, Conserved Domain Database (CDD) on NCBI, 2010-12-08

    http://www.ebi.ac.uk/Tools/msa/clustalw2/, ClustalW2 server on EBI, 2010-12-08

    Wootton, J.C. 1994 Sequences with 'unusual' amino acid compositions. Curr. Opin. Struct. Biol. 4:413–421

    Bannen RM, Bingman CA, Phillips GN Jr. 2008 Effect of low-complexity regions on protein structure determination. J Struct Funct Genomics. 8(4):217–226

    Files 4

    File                     Size      Date                Attached by
    202-14_figure_1.PNG      17.92 kB  12:00, 9 Dec 2010   xiao12
    202-14_figure_2.PNG      50.82 kB  12:05, 9 Dec 2010   xiao12
    BLAST_seq_202-14.PNG     50.78 kB  13:08, 9 Dec 2010   xiao12
    clustalw_sequences.txt   9.62 kB   12:23, 9 Dec 2010   xiao12