
Gene 1 (CCDC94)

    Coiled-coil domain-containing protein 94 (CCDC94)  

    Gene Model:

    5009-5206 exon_1

    7973-8122 exon_2

    8217-8351 exon_3

    8444-8509 exon_4

    8683-8748 exon_5

    8896-8985 exon_6

    9934-10014 exon_7

    11091-11288 exon_8
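
As a quick sanity check on the coordinates above, the following minimal Python sketch computes exon and intron lengths and the summed exon length. It assumes the coordinates are 1-based and inclusive; the source does not say whether the stop codon is included in the last exon, so the sketch only reports the sums. The names `EXONS`, `exon_lengths`, and `intron_lengths` are illustrative.

```python
# Exon coordinates copied from the gene model above
# (assumed 1-based, inclusive, on the + strand).
EXONS = [
    (5009, 5206), (7973, 8122), (8217, 8351), (8444, 8509),
    (8683, 8748), (8896, 8985), (9934, 10014), (11091, 11288),
]


def exon_lengths(exons):
    """Length of each exon in bp for inclusive coordinates."""
    return [end - start + 1 for start, end in exons]


def intron_lengths(exons):
    """Length of each intron, i.e. the gap between consecutive exons."""
    return [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]


lengths = exon_lengths(EXONS)
total = sum(lengths)
print("exon lengths:", lengths)
print("intron lengths:", intron_lengths(EXONS))
print("summed exon length:", total, "bp; multiple of 3:", total % 3 == 0)
```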

     

    Final Protein Sequence:

    MGERKVLNKYYPPDFDPSKIPRRRQPKNQQIKVRMMLPMSIQCGTCGTYIYKGTKFNSRKEDVVGETYLGIQIFRFYFKCTKCSAEITFKT
    DPQNSDYTVESGASRNFEPWRAQDEAADKEKIKRDAEEMGDAMKALENRAMDSKQDMDILAALEEMRSMKSRHAGVSVDQMLEILKRSAHE
    KEEKAIAELDEEDEELIKSITFRNSGFYVKRIEDDDDDDDEDLAPGQSSRTIKINESSESVTKPTDVLSKNNGSEVANKEGSKSWMPKFIV
    KPKSASADSHKRQKIESTAVQDTGKGQDDEQKSEPAKQTNVLQSLCQNYDSDDSE

     

    Function:

    CCDC94 encodes a protein that has few known functions or informative domains but is highly conserved from yeast to humans. It was recently shown in vertebrate cells that CCDC94 is a functional component of the Prp19 complex, which is involved in mRNA splicing and DNA repair. In addition, CCDC94 negatively regulates expression of the p53 gene and protects cells from ionizing radiation [Sorrells S et al. 2012].

     

    Annotation Process

    1. Comparison of the two predicted gene models by FGENESH and GeneMark:

    There are some minor discrepancies between the two predictions, but combining them and taking the EST support into account gives a well-supported gene model.

    GeneMark (1074 bp, 357 aa)      FGENESH (1101 bp, 366 aa)       EST evidence (identity > 95%)
    ID    Strand  Start   End       ID    Strand  Start   End       Start   End     Matches
    2     +       4455    11288     1     +       5009    11288
    2_1   +       4455    4529
    2_2   +       4997    5206      1_1   +       5009    5206      4853    5206    >10
    2_3   +       7973    8122      1_2   +       7973    8122      7971    8123    >10
    2_4   +       8217    8351      1_3   +       8217    8351      8219    8351    >10
    2_5   +       8444    8509      1_4   +       8444    8509      8442    8510    >10
    2_6   +       8683    8748      1_5   +       8683    8748      8681    8748    >5
    2_7   +       8896    8985      1_6   +       8896    8985      8896    8985    >5
                                    1_7   +       9425    9454
                                    1_8   +       9508    9591
    2_8   +       9934    10015     1_9   +       9934    10014     9932    10016   >5
    2_9   +       11089   11288     1_10  +       11091   11288     11087   11287   >10
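
The comparison in the table above can be sketched in Python as follows. This is only an illustration of the idea, not the procedure actually used for the annotation: exons from the two predictions are sorted into identical exons, overlapping exons with different boundaries, and exons found by only one program. The function names `overlaps` and `compare` are hypothetical; the coordinates are copied from the table.

```python
# Exon coordinates copied from the comparison table (inclusive, + strand).
GENEMARK = [
    (4455, 4529), (4997, 5206), (7973, 8122), (8217, 8351), (8444, 8509),
    (8683, 8748), (8896, 8985), (9934, 10015), (11089, 11288),
]
FGENESH = [
    (5009, 5206), (7973, 8122), (8217, 8351), (8444, 8509), (8683, 8748),
    (8896, 8985), (9425, 9454), (9508, 9591), (9934, 10014), (11091, 11288),
]


def overlaps(a, b):
    """True if two inclusive intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def compare(model_a, model_b):
    """Classify exons as identical, boundary-shifted, or unique to one model."""
    identical = [a for a in model_a if a in model_b]
    shifted = [
        (a, b) for a in model_a for b in model_b if a != b and overlaps(a, b)
    ]
    only_a = [a for a in model_a if not any(overlaps(a, b) for b in model_b)]
    only_b = [b for b in model_b if not any(overlaps(a, b) for a in model_a)]
    return identical, shifted, only_a, only_b


identical, shifted, only_genemark, only_fgenesh = compare(GENEMARK, FGENESH)
print("identical exons:", identical)
print("different boundaries:", shifted)
print("GeneMark only:", only_genemark)
print("FGENESH only:", only_fgenesh)
```

The exons reported as unique to one program are the candidates to check against EST and protein evidence.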

     

    Blastn for EST evidence:

    G1_EST.bmp

    Exons 1-6, 9, and 10 of the FGENESH model have good EST matches in the Blastn search; exons 7 and 8 have none.
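
The EST check can be illustrated with a minimal Python sketch, assuming the Start/End values in the EST evidence columns of the table are the genomic intervals covered by EST hits. For each FGENESH exon it reports the fraction of the exon covered by the best overlapping interval. The names `FGENESH_EXONS`, `EST_HITS`, and `covered_fraction` are illustrative.

```python
# FGENESH exons (exon number -> inclusive coordinates) from the table.
FGENESH_EXONS = {
    1: (5009, 5206), 2: (7973, 8122), 3: (8217, 8351), 4: (8444, 8509),
    5: (8683, 8748), 6: (8896, 8985), 7: (9425, 9454), 8: (9508, 9591),
    9: (9934, 10014), 10: (11091, 11288),
}
# EST-supported intervals from the table (assumed inclusive coordinates).
EST_HITS = [
    (4853, 5206), (7971, 8123), (8219, 8351), (8442, 8510),
    (8681, 8748), (8896, 8985), (9932, 10016), (11087, 11287),
]


def covered_fraction(exon, hits):
    """Fraction of the exon covered by the single best-overlapping hit."""
    start, end = exon
    best = 0
    for hit_start, hit_end in hits:
        overlap = min(end, hit_end) - max(start, hit_start) + 1
        best = max(best, overlap)  # negative overlaps never beat 0
    return best / (end - start + 1)


coverage = {n: covered_fraction(e, EST_HITS) for n, e in FGENESH_EXONS.items()}
supported = [n for n, frac in coverage.items() if frac > 0]
unsupported = [n for n, frac in coverage.items() if frac == 0]
print("EST-supported exons:", supported)
print("exons without EST support:", unsupported)
```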

     

    ClustalW2 alignment: 

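    By comparing the translated protein sequences, we can identify the extra amino acids in each prediction. The parts shown in red in the original alignment (the N-terminal extension in the GeneMark prediction and the internal insertion in the FGENESH prediction) are not supported by the Blastp and Blastx evidence.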

                                 (this part FGENESH is right)
    FGENESH_                     -----------------------------MGERKVLNKYYPPDFDPSKIP 21
    GMK_2|GeneMark.hmm|gene      MDHRRWIRLGWNLLSRGRARSGGTTGSGKMGERKVLNKYYPPDFDPSKIP 50
                                                              *********************
    
    FGENESH_                     RRRQPKNQQIKVRMMLPMSIQCGTCGTYIYKGTKFNSRKEDVVGETYLGI 71
    GMK_2|GeneMark.hmm|gene      RRRQPKNQQIKVRMMLPMSIQCGTCGTYIYKGTKFNSRKEDVVGETYLGI 100
                                 **************************************************
    
    FGENESH_                     QIFRFYFKCTKCSAEITFKTDPQNSDYTVESGASRNFEPWRAQDEAADKE 121
    GMK_2|GeneMark.hmm|gene      QIFRFYFKCTKCSAEITFKTDPQNSDYTVESGASRNFEPWRAQDEAADKE 150
                                 **************************************************
    
    FGENESH_                     KIKRDAEEMGDAMKALENRAMDSKQDMDILAALEEMRSMKSRHAGVSVDQ 171
    GMK_2|GeneMark.hmm|gene      KIKRDAEEMGDAMKALENRAMDSKQDMDILAALEEMRSMKSRHAGVSVDQ 200
                                 **************************************************
    
    FGENESH_                     MLEILKRSAHEKEEKAIAELDEEDEELIKSITFRNSGFYVKRIEDDDDDD 221
    GMK_2|GeneMark.hmm|gene      MLEILKRSAHEKEEKAIAELDEEDEELIKSITFRNSGFYVKRIEDDDDDD 250
                                 **************************************************
    
    FGENESH_                     DEDLAPGQSSRTIKDEGTVRAKKKQCTNFQIVTGGCIWITCSRLAGQEAG 271
    GMK_2|GeneMark.hmm|gene      DEDLAPGQSSRTIK------------------------------------ 264
                                 **************  (this part GeneMark is right)
    
    FGENESH_                     NAINESSESVTKPTDVLSKNNGSEVANKEGSKSWMPKFIVKPKSASADSH 321
    GMK_2|GeneMark.hmm|gene      --INESSESVTKPTDVLSKNNGSEVANKEGSKSWMPKFIVKPKSASADSH 312
                                   ************************************************
    
    FGENESH_                     KRQKIESTAVQDTGKGQDDEQKSEPAKQTNVLQSLCQNYDSDDSE 366
    GMK_2|GeneMark.hmm|gene      KRQKIESTAVQDTGKGQDDEQKSEPAKQTNVLQSLCQNYDSDDSE 357
                                 ********************************************* 
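
The extra residues can also be read off the alignment programmatically. The sketch below is a hedged illustration, not part of the original workflow: it takes two gapped alignment rows and collects each run of columns where only one sequence has residues. The rows are excerpts built from the alignment blocks above (the identical blocks in between are omitted), and `gap_runs` is a hypothetical helper name.

```python
from itertools import groupby

# Excerpts of the two alignment rows: the first block, then the blocks
# around the internal insertion. Identical blocks in between are omitted.
FGENESH_ROW = (
    "-" * 29 + "MGERKVLNKYYPPDFDPSKIP"
    + "DEDLAPGQSSRTIKDEGTVRAKKKQCTNFQIVTGGCIWITCSRLAGQEAG"
    + "NAINESSESVTK"
)
GENEMARK_ROW = (
    "MDHRRWIRLGWNLLSRGRARSGGTTGSGKMGERKVLNKYYPPDFDPSKIP"
    + "DEDLAPGQSSRTIK" + "-" * 36
    + "--INESSESVTK"
)


def gap_runs(row_a, row_b, name_a="a", name_b="b"):
    """Return (owner, residues) for each run where only one row has residues."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")

    def state(pair):
        a, b = pair
        if a != "-" and b == "-":
            return name_a
        if a == "-" and b != "-":
            return name_b
        return None  # both residues (or both gaps): nothing extra

    runs = []
    for owner, columns in groupby(zip(row_a, row_b), key=state):
        columns = list(columns)
        if owner == name_a:
            runs.append((owner, "".join(a for a, _ in columns)))
        elif owner == name_b:
            runs.append((owner, "".join(b for _, b in columns)))
    return runs


extras = gap_runs(FGENESH_ROW, GENEMARK_ROW, "FGENESH", "GeneMark")
for owner, residues in extras:
    print(owner, len(residues), residues)
```

Removing the GeneMark-only run from the 357 aa prediction and the FGENESH-only run from the 366 aa prediction leaves the same length in both cases.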

     

    Query from GeneMark, Blastp

    coiled-coil domain-containing protein 94 homolog [Brachypodium distachyon]

    Score =   503 bits (1295),  Expect = 8e-180, Method: Compositional matrix adjust.

    Identities = 264/331 (80%), Positives = 289/331 (87%), Gaps = 4/331 (1%)

    (the alignment starts at residue 30 of the query)

    Query  30   MGERKVLNKYYPPDFDPSKIPRRRQPKNQQIKVRMMLPMSIQCGTCGTYIYKGTKFNSRK  89
                MGERKVLNKYYPPDFDPSKIPRRRQPKNQQIKVRMMLPMSI+CGTCGTYIYKGTKFNSRK
    Sbjct  1    MGERKVLNKYYPPDFDPSKIPRRRQPKNQQIKVRMMLPMSIRCGTCGTYIYKGTKFNSRK  60
     
    Query  90   EDVVGETYLGIQIFRFYFKCTKCSAEITFKTDPQNSDYTVESGASRNFEPWRAQDEAADK  149
                ED +GETYLGIQIFRFYFKCT+CSAEITFKTDPQNSDYTVESGASRNFEPWR +DE  DK
    Sbjct  61   EDCIGETYLGIQIFRFYFKCTRCSAEITFKTDPQNSDYTVESGASRNFEPWREEDEVVDK  120
     
    Query  150  EKIKRDAEEMGDAMKALENRAMDSKQDMDILAALEEMRSMKSRHAGVSVDQMLEILKRSA  209
                EK KR+AEEMGDAM+ALENRAMDSKQDMDILAALEEMRSMKSRHAGVSVDQMLEILK S 
    Sbjct  121  EKRKREAEEMGDAMRALENRAMDSKQDMDILAALEEMRSMKSRHAGVSVDQMLEILKHST  180
     
    Query  210  HEKEEKAIAELDEEDEELIKSITFRNSGFYVKRIEDDDDDDDEDLAPGQSSRTIKINESS  269
                H+KEEK +AELDEEDEELIKSITFRNS  YVKRIEDDDD+D++    GQSS T KIN SS
    Sbjct  181  HQKEEKTVAELDEEDEELIKSITFRNSKDYVKRIEDDDDEDEDSFVAGQSSVTSKINGSS  240
     
    Query  270  ESVTKPTDVLSKNNGSEVANKEGSKSW---MPKFIVKPKSASADSHKRQKIESTAVQDTG  326
                ESV  PTDVL+K NG E  +KE +KSW   MPKFIVKPK  +   +K+QK E+ A Q+ G
    Sbjct  241  ESVLHPTDVLTKTNGPESGSKEENKSWASKMPKFIVKPKPTATSPNKKQKTEAAASQNNG  300
     
    Query  327  KGQDDEQKSEPAKQTNVLQSLCQNYDSDDSE  357
                K    E+KSE +++TNVLQSLCQ YDSD+S+
    Sbjct  301  KAPVAEEKSEDSEKTNVLQSLCQ-YDSDESD  330

      

    Query from FGENESH, Blastp

    coiled-coil domain-containing protein 94 homolog [Brachypodium distachyon]
    Score =   484 bits (1247),  Expect = 3e-170, Method: Compositional matrix adjust.
    Identities = 264/369 (72%), Positives = 289/369 (78%), Gaps = 42/369 (11%)

     

    Query  1    MGERKVLNKYYPPDFDPSKIPRRRQPKNQQIKVRMMLPMSIQCGTCGTYIYKGTKFNSRK  60
                MGERKVLNKYYPPDFDPSKIPRRRQPKNQQIKVRMMLPMSI+CGTCGTYIYKGTKFNSRK
    Sbjct  1    MGERKVLNKYYPPDFDPSKIPRRRQPKNQQIKVRMMLPMSIRCGTCGTYIYKGTKFNSRK  60
     
    Query  61   EDVVGETYLGIQIFRFYFKCTKCSAEITFKTDPQNSDYTVESGASRNFEPWRAQDEAADK  120
                ED +GETYLGIQIFRFYFKCT+CSAEITFKTDPQNSDYTVESGASRNFEPWR +DE  DK
    Sbjct  61   EDCIGETYLGIQIFRFYFKCTRCSAEITFKTDPQNSDYTVESGASRNFEPWREEDEVVDK  120
     
    Query  121  EKIKRDAEEMGDAMKALENRAMDSKQDMDILAALEEMRSMKSRHAGVSVDQMLEILKRSA  180
                EK KR+AEEMGDAM+ALENRAMDSKQDMDILAALEEMRSMKSRHAGVSVDQMLEILK S 
    Sbjct  121  EKRKREAEEMGDAMRALENRAMDSKQDMDILAALEEMRSMKSRHAGVSVDQMLEILKHST  180
     
    Query  181  HEKEEKAIAELDEEDEELIKSITFRNSGFYVKRIEDDDDDDDEDLAPGQSSRTIKDEGTV  240
                H+KEEK +AELDEEDEELIKSITFRNS  YVKRIEDDDD+D++    GQSS T K     
    Sbjct  181  HQKEEKTVAELDEEDEELIKSITFRNSKDYVKRIEDDDDEDEDSFVAGQSSVTSK-----  235
                          (redundant exons)
    Query  241  RAKKKQCTNFQIVTGGCIWITCSRLAGQEAGNAINESSESVTKPTDVLSKNNGSEVANKE  300
                                                 IN SSESV  PTDVL+K NG E  +KE
    Sbjct  236  ---------------------------------INGSSESVLHPTDVLTKTNGPESGSKE  262
     
    Query  301  GSKSW---MPKFIVKPKSASADSHKRQKIESTAVQDTGKGQDDEQKSEPAKQTNVLQSLC  357
                 +KSW   MPKFIVKPK  +   +K+QK E+ A Q+ GK    E+KSE +++TNVLQSLC
    Sbjct  263  ENKSWASKMPKFIVKPKPTATSPNKKQKTEAAASQNNGKAPVAEEKSEDSEKTNVLQSLC  322
     
    Query  358  QNYDSDDSE  366
                Q YDSD+S+
    Sbjct  323  Q-YDSDESD  330
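
The two Blastp summaries can be compared with a little arithmetic. The sketch below recomputes the percentages from the printed counts and shows that the additional gap columns in the FGENESH alignment equal the difference in alignment length. It assumes plain rounding to the nearest integer, which reproduces the printed values here but is not claimed to be BLAST's exact formatting routine; `blast_percent` is a hypothetical helper.

```python
# Counts copied from the two Blastp summary lines above.
GENEMARK_HIT = {"identities": 264, "positives": 289, "gaps": 4, "length": 331}
FGENESH_HIT = {"identities": 264, "positives": 289, "gaps": 42, "length": 369}


def blast_percent(count, alignment_length):
    """Percentage of alignment columns, rounded to the nearest integer."""
    return round(100 * count / alignment_length)


for name, hit in (("GeneMark", GENEMARK_HIT), ("FGENESH", FGENESH_HIT)):
    summary = {
        key: blast_percent(hit[key], hit["length"])
        for key in ("identities", "positives", "gaps")
    }
    print(name, summary)

# Identities and positives are the same in both hits; only the gap
# columns differ, and they account for the longer FGENESH alignment.
extra_gaps = FGENESH_HIT["gaps"] - GENEMARK_HIT["gaps"]
extra_columns = FGENESH_HIT["length"] - GENEMARK_HIT["length"]
print("extra gap columns:", extra_gaps, "extra alignment columns:", extra_columns)
```

The drop in percent identity for the FGENESH query therefore comes from the longer, gap-filled alignment rather than from fewer matching residues.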
     
    Blastp after deletion of the extra protein parts:
     
    G1.bmp
     
     
    InterProScan for Protein Function

    G1_interpro.bmp

    Reference:

    Sorrells S, Carbonneau S, et al. Ccdc94 protects cells from ionizing radiation by inhibiting the expression of p53. PLoS Genet. 2012 Aug;8(8):e1002922.
