
Candidate Genes - Group 5

    Candidate Genes

    We ran gene prediction software (FGENESH and Augustus) on our 150,000 bp sequence. We then ran different BLAST searches on the predicted genes to verify candidate genes, looking at hits in other organisms as well as at their EST/cDNA data.

    Our in-depth evaluation of the sequence with identified candidate genes can be seen by clicking here.


    Below is a description of the identified gene loci, with exon coordinates and a brief explanation of why we believe they are in fact genes:
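    Each locus below is listed as a span plus exon coordinates. The exon lengths given for the 66287-69085 locus (e.g. 66287-67405 = 1119 bp) imply 1-based, inclusive coordinates, so intron intervals follow directly from consecutive exons. A minimal Python sketch under that assumption; the function name `introns` is our own, for illustration only:

```python
# Coordinates are assumed to be 1-based and inclusive, as in the exon lists on this page.
Exon = tuple[int, int]


def introns(exons: list[Exon]) -> list[Exon]:
    """Return the intron intervals that lie between consecutive exons."""
    ordered = sorted(exons)  # sort by genomic start, whichever strand the gene is on
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


# First locus (137-3498, forward strand), FGENESH exons
locus_137 = [(137, 2325), (2462, 2600), (2745, 3116), (3179, 3234)]
for start, end in introns(locus_137):
    print(f"intron {start}-{end} ({end - start + 1} bp)")
```

    Sorting first means the same helper works for reverse-strand loci, whose exons are sometimes listed from the highest coordinate down.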


    137-3498 Forward Strand

    Exon 1: 137-2325

    Exon 2: 2462-2600

    Exon 3: 2745-3116

    Exon 4: 3179-3234

    Identified through both FGENESH and Augustus; however, the Augustus prediction shows a much larger gene, with a huge intron at the end leading to one extra tiny exon. This needs further investigation, but the FGENESH prediction is posted here.

    BLAST shows homology with loci near 9008 in both the B73 and Mo17 cultivars. When maize is excluded, it shows homology to Gossypium raimondii clone GR__Ba0131I15-ho.
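    The "excluding maize" search can also be repeated offline on saved results. A minimal sketch, assuming the hits were exported as BLAST+ tabular output with the column list `-outfmt "6 qseqid sseqid evalue bitscore sscinames"` (the `sscinames` column is only filled in when the NCBI taxonomy database is available locally). The example rows and the function name `hits_excluding` are hypothetical:

```python
import csv
import io

# Column order assumed for: -outfmt "6 qseqid sseqid evalue bitscore sscinames"
FIELDS = ["qseqid", "sseqid", "evalue", "bitscore", "sscinames"]


def hits_excluding(tabular: str, excluded: str) -> list[dict]:
    """Parse tab-separated BLAST hits, drop those from `excluded`, and sort by e-value."""
    kept = []
    for row in csv.DictReader(io.StringIO(tabular), fieldnames=FIELDS, delimiter="\t"):
        organisms = row["sscinames"].split(";")  # several names are separated by ';'
        if excluded in organisms:
            continue
        row["evalue"] = float(row["evalue"])
        kept.append(row)
    return sorted(kept, key=lambda r: r["evalue"])  # best (smallest) e-value first


# Hypothetical rows, made up for illustration
example = (
    "gene1\tsubjA\t0.0\t2100\tZea mays\n"
    "gene1\tsubjB\t3e-40\t180\tGossypium raimondii\n"
    "gene1\tsubjC\t1e-12\t75\tSorghum bicolor\n"
)
for hit in hits_excluding(example, "Zea mays"):
    print(hit["sseqid"], hit["evalue"], hit["sscinames"])
```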


    7206-10856 Reverse Strand

    Exon 1: 7421-7456

    Exon 2: 7593-7721

    Exon 3: 8031-8117

    Exon 4: 8518-8583

    Exon 5: 9922-10017

    Exon 6: 10127-10243

    FGENESH and Augustus give very similar predictions. Strong candidate gene.

    Has homology to maize nuclear ribonuclease Z mRNA as well as several Sorghum bicolor hypothetical proteins.  


    13766-26718 Forward Strand

    Exon 1: 15600-15663

    Exon 2: 16520-16620

    Exon 3: 17778-17872

    Exon 4: 18824-19196

    Exon 5: 19689-19918

    Exon 6: 21401-21458

    Exon 7: 21961-22036

    Exon 8: 22619-22718

    Exon 9: 24884-24968

    Exon 10: 25189-25272

    Exon 11: 25364-25696

    Exon 12: 25775-25792

    Again predicted by both FGENESH and Augustus, but FGENESH predicted a larger gene with 6 more exons. The FGENESH prediction is used here because its wider span lets us look more closely at potential elements within the gene, though further refinement is required. The Augustus prediction is most likely more accurate, but something else might be going on in this gene.
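    When FGENESH and Augustus disagree, it helps to list which exons both tools predict identically and which are unique to one of them. A minimal sketch using exact coordinate matches. The Augustus exon set below is hypothetical (the text only tells us it has 6 fewer exons), so the output is an illustration, not a result; exons with shifted splice sites would need an overlap test rather than exact matching:

```python
Exon = tuple[int, int]


def compare_predictions(a: list[Exon], b: list[Exon]) -> dict[str, list[Exon]]:
    """Split exons into those predicted identically by both tools and those unique to each."""
    set_a, set_b = set(a), set(b)
    return {
        "shared": sorted(set_a & set_b),
        "only_a": sorted(set_a - set_b),
        "only_b": sorted(set_b - set_a),
    }


# FGENESH exons for the 13766-26718 locus (from the list above)
fgenesh = [(15600, 15663), (16520, 16620), (17778, 17872), (18824, 19196),
           (19689, 19918), (21401, 21458), (21961, 22036), (22619, 22718),
           (24884, 24968), (25189, 25272), (25364, 25696), (25775, 25792)]
# HYPOTHETICAL Augustus model: the last 6 FGENESH exons, for illustration only
augustus = fgenesh[6:]

result = compare_predictions(fgenesh, augustus)
print(len(result["shared"]), "shared;", len(result["only_a"]), "FGENESH-only")
```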

    Shows homology to maize ZM_BFb0153F14 mRNA as well as single myb histone 1 mRNA and Sorghum bicolor hypothetical proteins.  There were also several lower scoring hits to rice myb proteins.


    38446-43264 Forward Strand

    Exon 1: 38764-39102

    Exon 2: 39189-39271

    Exon 3: 39400-39471

    Exon 4: 41058-41175

    Exon 5: 41858-41946

    Exon 6: 42139-42220

    Predicted in both Augustus and FGENESH; as usual, the FGENESH prediction is longer and has more exons.

    Shows homology to maize ZM_BFc0037B10 mRNA as well as several Sorghum bicolor hypothetical proteins.


    46516-48075 Forward Strand

    Exon 1: 47239-48036

    Despite being predicted in both Augustus and FGENESH (albeit a bit longer), this gene looks a bit weird.  I've put it here because of the strong prediction, but we should be careful about what it actually is.

    Shows homology to a B73 putative gag protein as well as the MMC1 retrotransposon xilon, which leads me to think this may be a transposable element.


    66287-69085 Forward Strand

    Exon 1: 66287-67405 

    Exon 2: 67439-67615 

    Exon 3: 68270-69085 

    May be from a transposable element, but has hits to RNase domains.

    RNase_HI_archaeal_like[cd09279], RNAse HI family that includes Archaeal RNase HI;

    RT_LTR[cd01647], RT_LTR: Reverse transcriptases (RTs) from retrotransposons and retroviruses which have long terminal repeats (LTRs) in their DNA copies but not in their RNA template.

    rve[pfam00665], Integrase core domain

    RVT_3[pfam13456], Reverse transcriptase-like; This domain is found in plants and appears to be part of a retrotransposon.

    RNase_HI_RT_Ty3[cd09274], Ty3/Gypsy family of RNase HI in long-term repeat retroelements;

    RNase_H[cd06222], RNase H is an endonuclease that cleaves the RNA strand of an RNA/DNA hybrid in a sequence non-specific manner

    RNase_H[pfam00075], RNase H; RNase H digests the RNA strand of an RNA/DNA hybrid. Important enzyme in retroviral replication cycle.

    RVT_1[pfam00078], Reverse transcriptase (RNA-dependent DNA polymerase)

    PRK07238[PRK07238], bifunctional RNase H/acid phosphatase

    PRK07708[PRK07708], hypothetical protein; Validated
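    The reasoning above (reverse transcriptase, integrase and Ty3/Gypsy RNase H hits pointing to a transposable element) can be written down as a simple keyword screen over the conserved-domain hits. A rough sketch: the prefix list is our own choice, built from the domain names reported on this page, not a curated classification, and a hit count is no substitute for inspecting the alignments:

```python
# Our own (hypothetical) list of domain-name prefixes typical of retrotransposons,
# taken from the conserved-domain hits reported on this page.
TE_DOMAIN_PREFIXES = ("RT_LTR", "RT_DIRS1", "RVT_", "rve", "Retrotrans_gag", "RNase_HI_RT_Ty3")


def te_domain_hits(domain_hits: list[str]) -> list[str]:
    """Return the hits whose domain name starts with a transposon-associated prefix."""
    return [hit for hit in domain_hits if hit.startswith(TE_DOMAIN_PREFIXES)]


def looks_like_te(domain_hits: list[str], min_hits: int = 1) -> bool:
    """Flag a prediction as a likely transposable element if enough TE-type domains are present."""
    return len(te_domain_hits(domain_hits)) >= min_hits


hits_66287 = ["RNase_HI_archaeal_like[cd09279]", "RT_LTR[cd01647]", "rve[pfam00665]",
              "RVT_3[pfam13456]", "RNase_HI_RT_Ty3[cd09274]", "RNase_H[cd06222]",
              "RNase_H[pfam00075]", "RVT_1[pfam00078]", "PRK07238[PRK07238]",
              "PRK07708[PRK07708]"]
print(te_domain_hits(hits_66287))
print(looks_like_te(["RNase_HI_archaeal_like[cd09279]"]))  # the 86898-88664 locus
```

    A single archaeal-like RNase HI hit, as at 86898-88664, does not trigger the screen on its own, which fits the more cautious wording used for that locus.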


    86898-88664 Forward Strand

    Exon 1: 86898-87090

    Exon 2: 87304-88664

    May be from a transposable element, but has hits to RNase domains.

    RNase_HI_archaeal_like[cd09279], RNAse HI family that includes Archaeal RNase HI

    101001-102816 Forward Strand

    Exon 1: 101001-101557

    Exon 2: 102747-102816

    Identified through Augustus; FGENESH predicted the same first exon but a different second exon. A blastp of both predictions showed better homology for the Augustus result.

    Shows high homology to Sb03g026500 and Os01g0589700, with supporting EST data from Sorghum bicolor, Saccharum officinarum (sugar cane), and Zea mays (Phytozome). Plantgdb.org shows both EST and cDNA data supporting my gene structure.

    A splice donor site is present after exon 1 and a splice acceptor site before exon 2. No polypyrimidine tract was identified.
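    The splice-site checks described above can be scripted once the intron sequence has been extracted. A minimal sketch, assuming canonical GT...AG intron boundaries and using a crude stand-in for the polypyrimidine tract: the fraction of C/T in a window just upstream of the acceptor AG. The window size (20 nt), the 0.7 threshold, and the helper name `check_intron` are our own illustrative choices, not values from the prediction programs:

```python
def check_intron(intron: str, ppt_window: int = 20, ppt_min_fraction: float = 0.7) -> dict[str, bool]:
    """Check GT...AG boundaries and look for a pyrimidine-rich stretch before the 3' AG.

    `intron` is the intron sequence on the gene's own strand, written 5'->3'.
    """
    seq = intron.upper()
    upstream = seq[-(ppt_window + 2):-2]  # bases immediately before the acceptor AG
    pyrimidines = sum(base in "CT" for base in upstream)
    has_ppt = bool(upstream) and pyrimidines / len(upstream) >= ppt_min_fraction
    return {
        "donor_GT": seq.startswith("GT"),
        "acceptor_AG": seq.endswith("AG"),
        "polypyrimidine_tract": has_ppt,
    }


# Hypothetical toy introns
good = "GTAAGTATGA" + "TTCTTTCCTTTCTCTTTCCT" + "AG"
poor = "GTAAGT" + "A" * 20 + "AG"
print(check_intron(good))
print(check_intron(poor))
```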


    107614-108618 Forward Strand

    Exon 1: 107614-108056

    Exon 2: 108504-108618

    Augustus and FGENESH predicted exactly the same gene structure. Nucleotide BLAST (nr) showed a plant light-harvesting domain (PLN00014).

    Shows high homology to Sb03g026510 (2e-91) with supporting EST/cDNA data, and also has supporting EST data from Saccharum officinarum (sugar cane) and Zea mays (Phytozome). Maizegdb.org shows both EST and cDNA data supporting my gene structure.


    112206-115963 Reverse Strand

    Exon 1: 115759-115963

    Exon 2: 112455-112528

    Exon 3: 112206-112358

    Both Augustus and FGENESH show a possible gene at this locus. Augustus gives two possible gene structures, but based on BLAST and homology to Sorghum bicolor I believe that transcript 1 is more correct. BlastX gives a hit to a coiled-coil domain-containing protein 94 (6e-12). This is a relatively low e-value. Reading about these domains, it appears they are more common in animals, and transposons in rice have been shown to contain them. Nucleotide BLAST has only one hit, to a putative Sorghum bicolor protein. I therefore believe there may be a transposable element at this locus, although it still has potential as a candidate gene.
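    On the reverse strand, exon 1 is the exon with the highest coordinates (as in the list above), and each exon's sequence is the reverse complement of its forward-strand coordinates. A minimal sketch of assembling a spliced sequence from a genome string under that convention, assuming 1-based inclusive coordinates; `spliced_transcript` is a hypothetical helper, and the toy genome only makes the result easy to check by hand:

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def spliced_transcript(genome: str, exons: list[tuple[int, int]], strand: str) -> str:
    """Join exon sequences (1-based inclusive coordinates) in transcript order."""
    if strand == "+":
        return "".join(genome[s - 1:e] for s, e in sorted(exons))
    # Reverse strand: exon 1 has the highest coordinates, and each piece is reverse-complemented
    return "".join(reverse_complement(genome[s - 1:e]) for s, e in sorted(exons, reverse=True))


toy_genome = "ATGCCCTTTGGA"
print(spliced_transcript(toy_genome, [(1, 3), (7, 9)], "+"))
print(spliced_transcript(toy_genome, [(1, 3), (7, 9)], "-"))
```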


    128236-128604 Forward Strand

    Exon 1: 128236-128365

    Exon 2: 128531-128604

    Augustus and FGENESH both show the same gene structure. A blastp of the proposed protein gave a top hit to Arabinogalactan peptide 22 from Arabidopsis (1e-06). This is a relatively high e-value, but the protein did show the conserved domain DUF1070.

    The DUF1070 family consists of short hypothetical proteins of unknown function (http://aranet.mpimp-golm.mpg.de/pfamnet/DUF1070). Gene Ontology shows that Arabinogalactan peptide 22 is a plasma-membrane-anchored protein (multiple GO terms) that appears to be involved in transport (GO:0006826 iron ion transport and GO:0015706 nitrate transport). Gene Ontology also suggests a possible role in brassinosteroid biosynthesis (GO:0016132). (http://www.ebi.ac.uk/QuickGO/GProtein?ac=Q9FK16)


    143398-145767 (or 146054) Forward Strand

    Exon 1: 143398-143493

    Exon 2: 143562-143652

    Exon 3: 144047-144164

    Exon 4: 144238-144318

    Exon 5: 144399-144447

    Exon 6: 144715-144829

    Exon 7: 144945-145036

    Exon 8: 145136-145283

    Exon 9: 145743-145849

    Exon 10: 145881-145978

    Exon 11: 146048-146054

    Identified through both Augustus and FGENESH. Augustus gives two possible gene structures (4 or 5 exons, differing by one additional internal exon). FGENESH predicts 11 exons. I believe FGENESH is more accurate according to the EST/cDNA data (see below).

    A blastp of Augustus transcript 1 showed a hit to a serine/threonine-protein kinase (PKc_like superfamily domain) with an e-value of 5e-31. Augustus transcript 2 did not match as well. FGENESH actually gave a stronger hit, 2e-138 (Serine/threonine-protein kinase AFC3), so I used the FGENESH prediction. I also tried translating the EST-supported structure in which exons 3, 4, and 5 (or just 3 and 4 in the cDNA/EST) form a single exon, and got stop codons. The FGENESH prediction also matched other cDNA/EST data in which exons 3, 4, and 5 are separate.
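    The stop-codon check described above (translating a structure in which exons 3, 4 and 5 are merged into one) amounts to translating a candidate coding sequence and looking for premature stops. A minimal sketch using the standard genetic code; the helper names and toy sequences are hypothetical and only show how retaining an intron can introduce an in-frame stop:

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str, frame: int = 0) -> str:
    """Translate codon by codon; codons with ambiguous bases become 'X'."""
    seq = cds.upper()[frame:]
    return "".join(CODON_TABLE.get(seq[p:p + 3], "X") for p in range(0, len(seq) - 2, 3))


def internal_stops(cds: str, frame: int = 0) -> list[int]:
    """1-based amino-acid positions of stop codons before the final codon."""
    protein = translate(cds, frame)
    return [pos + 1 for pos, aa in enumerate(protein[:-1]) if aa == "*"]


exon_a, intron, exon_b = "ATGGCC", "GTAAGTTAG", "GCCTAA"  # hypothetical toy pieces
print(translate(exon_a + exon_b), internal_stops(exon_a + exon_b))
print(translate(exon_a + intron + exon_b), internal_stops(exon_a + intron + exon_b))
```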

    EST serine threonine- protein kinase.jpg

    cDNA serine threonine- protein kinase.jpg



    Unlikely Genes from Gene Prediction Software (Low Support)

    These are potential genes predicted by the gene prediction programs. We do not believe they are genes because of low support from BLAST/EST/cDNA data. Many of them appear to come from transposable elements.
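    The split between the two lists rests partly on BLAST e-values. A minimal sketch of that triage; the cut-offs (1e-10 and 1e-3) and the function name `support_level` are our own illustrative choices, not thresholds used on this page, and the examples reuse top-hit e-values quoted on this page. E-value alone is not enough: several loci in this section have strong hits (e.g. 7e-19) to retrotransposons, which is why the domain content matters as well.

```python
def support_level(evalue: float, strong: float = 1e-10, moderate: float = 1e-3) -> str:
    """Bin a BLAST e-value into a rough support tier (hypothetical cut-offs)."""
    if evalue <= strong:
        return "strong"
    if evalue <= moderate:
        return "moderate"
    return "weak"


top_hits = {
    "Serine/threonine-protein kinase AFC3": 2e-138,
    "Sb03g026510": 2e-91,
    "Arabinogalactan peptide 22": 1e-06,
    "Uronate isomerase": 0.049,
    "3-dehydroquinate synthase": 2.8,
}
for name, evalue in top_hits.items():
    print(f"{name}: {evalue:g} -> {support_level(evalue)}")
```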


    27759-31589 Reverse Strand

    Exon 1: 29387-29802

    Exon 2: 30527-30734

    Only shows up in the FGENESH prediction. Looking at the structure, I am unsure whether it is a gene, but I do think something is going on. Further work is required.

    Shows homology to loci near 9009 in both B73 and Mo17 as well as Gossypium raimondii clone GR__Ba0131I15-hog. 


    32825-35595 Reverse Strand

    Exon 1: 33353-33516

    Exon 2: 34458-34545

    Exon 3: 34619-34708

    Exon 4: 35008-35067

    Exon 5: 35143-35259

    Another gene that only shows up in the FGENESH prediction. Structurally, it looks like there is a gene here, but since it was not predicted by both programs, I am posting it here.

    Shows homology to maize PCO147975 mRNA sequence as well as Brachypodium distachyon sucrose nonfermenting 4-like protein.  


    36702-38203 Forward Strand

    Exon 1: 37861-38187

    Only predicted in FGENESH and is unlikely to be a gene as structurally it just doesn't look like one.  It is more likely to be a transposon or some other element.

    Shows homology to maize 506018 hypothetical protein mRNA as well as several qLTG3-1 genes in rice. 


    62665-64698 Forward Strand

    Exon 1: 62665-63676 

    Exon 2: 63853-64698

    Both FGENESH and Augustus predicted this as a gene. Blastp shows that it contains the conserved domain "Retrotrans_gag[pfam03732], Retrotransposon gag protein; Gag or Capsid-like proteins from LTR retrotransposons".


    66287-69085 Forward Strand

    Exon 1: 66287-67405 (1119 bp)

    Exon 2: 67439-67615 (177 bp)

    Exon 3: 68270-69085 (816 bp)

    Both FGENESH and Augustus predicted this as a gene. Blastp shows that it contains the following conserved domains:

    "RNase_HI_archaeal_like[cd09279], RNAse HI family that includes Archaeal RNase HI"

    "RT_LTR[cd01647], RT_LTR: Reverse transcriptases (RTs) from retrotransposons and retroviruses"


    82383-88664 Forward Strand

    Exon 1: 82383-83722 (1338 bp)

    Exon 2: 84124-84298 (174 bp)

    Exon 3: 84369-85018 (684 bp)

    Exon 4: 85130-85433 (303 bp)

    Exon 5: 85920-86500 (579 bp)

    Exon 6: 86862-87035 (174 bp)

    Exon 7: 87327-88664 (1338 bp)

    Both FGENESH and Augustus predicted this as a gene. Blastp shows that it contains the following conserved domains:

    "RT_LTR[cd01647], RT_LTR: Reverse transcriptases (RTs) from retrotransposons and retroviruses"

    "RNase_HI_archaeal_like[cd09279], RNAse HI family that includes Archaeal RNase HI"

    "RT_DIRS1[cd03714], RT_DIRS1: Reverse transcriptases (RTs) occurring in the DIRS1 group of retransposons"

    "RVT_1[pfam00078], Reverse transcriptase (RNA-dependent DNA polymerase); A reverse transcriptase gene is usually indicative of a mobile element such as a retrotransposon or retrovirus."

    These sequences are therefore probably transposons.
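    For 1-based, inclusive coordinates, an exon's length is end - start + 1; this reproduces the lengths listed for the 66287-69085 locus. A quick check, which also tests whether the summed exon length is a multiple of 3, as expected if the exons are entirely coding and the reading frame is intact. The helper name `exon_length` is our own:

```python
def exon_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval (works whichever way round it is listed)."""
    return abs(end - start) + 1


exons_66287 = [(66287, 67405), (67439, 67615), (68270, 69085)]
lengths = [exon_length(s, e) for s, e in exons_66287]
total = sum(lengths)
print(lengths, total, "multiple of 3" if total % 3 == 0 else "frame shift?")
```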


    117336-120152 Forward Strand

    Exon 1: 117336-120152

    Both Augustus and FGENESH show this as a gene. Blastx and blastp excluding maize both show strong hits (~1e-30) to retrotransposons. Most likely not an actual gene.


    120627-122528 Forward Strand

    Exon 1: 120627-122258

    Exon 2: 122415-122528

    Augustus shows this as a possible gene. Blastp gives a top hit to an uncharacterized mitochondrial protein from Arabidopsis (e-value = 0.80). The putative conserved domains show hits to retrotrans_gag (5') and retropepsin_like (3'). This is most likely a retroelement and not an actual gene.


    123178-123753 Reverse Strand

    Exon 1: 123178-123753

    Augustus shows a possible gene at this locus. Blastx and blastp show only very weak hits (top hit: Uronate isomerase, e-value = 0.049). Most likely not a real gene.


    126466-127960 Reverse Strand

    Exon 1: 127819-127960

    Exon 2: 126608-126940

    Exon 3: 126466-126482

    Augustus shows a possible gene at this locus; however, the 3' end has very low GC content. Blastp shows poor hits, the top one being 3-dehydroquinate synthase (e-value = 2.8). Most likely not an actual gene.
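    The low GC content at the 3' end can be quantified with a sliding window. A minimal sketch; the window (100 nt) and step (50 nt) are arbitrary illustrative defaults, the helper names are our own, and ambiguous bases such as N are left out of the denominator:

```python
def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def gc_profile(seq: str, window: int = 100, step: int = 50) -> list[tuple[int, float]]:
    """GC fraction in sliding windows, as (1-based window start, GC fraction) pairs."""
    return [(i + 1, gc_content(seq[i:i + window]))
            for i in range(0, max(len(seq) - window, 0) + 1, step)]


# Hypothetical toy sequence: GC-rich 5' half, AT-rich 3' half
toy = "GC" * 50 + "AT" * 50
for start, gc in gc_profile(toy, window=100, step=50):
    print(start, round(gc, 2))
```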


    134333-139660 Reverse Strand

    Exon 1: 139399-139660

    Exon 2: 138262-138746

    Exon 3: 135950-136307

    Exon 4: 134333-134373

    Augustus and FGENESH both predict genes at approximately this locus. Blastp shows a good hit to a retrovirus-related transposable element (7e-19), and there is a conserved rve superfamily domain, an integrase core domain. It is unlikely that this is an actual gene. Note that a zinc-finger domain is also present about 160 aa downstream of the start codon.
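    To locate the zinc-finger region in the genome, an amino-acid position has to be mapped back through the exons, which on the reverse strand means walking from the highest coordinate downwards. A minimal sketch, assuming the coding sequence starts at the first base of exon 1 (no 5' UTR in the model) and 1-based inclusive coordinates; `protein_to_genomic` is our own hypothetical helper. It returns the first base of the codon only, so a codon spanning an exon junction is only partly located:

```python
def protein_to_genomic(aa_pos: int, exons: list[tuple[int, int]], strand: str) -> int:
    """Genomic coordinate of the first base of codon number `aa_pos` (1-based)."""
    offset = 3 * (aa_pos - 1)  # 0-based offset into the spliced coding sequence
    for start, end in sorted(exons, reverse=(strand == "-")):  # transcript order
        length = end - start + 1
        if offset < length:
            return start + offset if strand == "+" else end - offset
        offset -= length
    raise ValueError("position lies beyond the annotated exons")


exons_134333 = [(139399, 139660), (138262, 138746), (135950, 136307), (134333, 134373)]
print(protein_to_genomic(160, exons_134333, "-"))
```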


    149039-149581 Forward Strand

    Exon 1: 149039-149581

    Only FGENESH showed this as a possible gene. Blastp gave a top hit to Acetate non-utilizing protein 9, mitochondrial (e-value = 0.87).

    This is a large e-value, and Augustus did not predict a gene here. I do not believe this is in fact a gene.
