Sequence analysis of the first 50,000 bases

    Analysis of the first 50,000 bases of SMO5 (2209000-2259000) (link to summary table)

    by Pasajee Kongsila

    Gene prediction programs

    1. FGENESH using the monocot parameters

    2. Genemark-hmm using the Oryza sativa parameters

    For the gene prediction summary table with alignments and BLAST results, follow the link above.

    The 1st predicted gene

    FGENESH predicted: base 280-669 (single exon 390 bp), amino acid length =130

    The best hit of tblastn: XM_001774108 Physcomitrella patens subsp. patens early light-induced protein 9 (ELIP9) mRNA, complete cds.

    E-value = 1e-23

    Genemark predicted: base 200-592 (single exon 393 bp), aa length = 120

    The best hit of tblastn: XM_001779259 Physcomitrella patens subsp. patens early light-induced protein 7 (ELIP7) mRNA, complete cds. E-value = 1e-11

    Some predictions: XM_001774108 Physcomitrella patens subsp. patens early light-induced protein 9 (ELIP9) mRNA, complete cds. E-value = 1e-09

    Analysis: This prediction gives the same BLAST result as the predictions at 3948-4690 bp, 5869-6603 bp, and 7818-8569 bp (see genes 2-4 below). These predictions correspond to the repeats in the dot plot, which shows 4 repeats within the 1-10000 bp range.

    For this gene, both programs predicted an early light-induced protein (ELIP) gene: the best hit is ELIP9 for FGENESH and ELIP7 for Genemark. The FGENESH prediction scored slightly better (lower E-value, 1e-23 versus 1e-11).

    The 2nd predicted gene

    FGENESH predicted: base 3948-4690 (two exons: 3948-4241 (294 bp-terminal), 4301-4690 (390 bp-initial)), aa length = 227

    The best hit of tblastn: XM_001774108 Physcomitrella patens subsp. patens early light-induced protein 9 (ELIP9) mRNA, complete cds. E-value = 1e-59

    Genemark predicted: base 3948-4690 (two exons: 3948-4241 (294 bp-terminal), 4301-4690 (390 bp-initial)), aa length = 227

    The best hit of tblastn: XM_001774108 Physcomitrella patens subsp. patens early light-induced protein 9(ELIP9) mRNA, complete cds E-value = 1e-59

    Analysis: Both programs predicted exactly the same gene.
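
The exon sizes and protein length reported for gene 2 can be cross-checked with a little arithmetic. The sketch below assumes that the coordinates are 1-based and inclusive, and that the reported aa length excludes the stop codon. Both assumptions are inferred from the numbers on this page, not taken from the programs' documentation. The function names are hypothetical.

```python
def exon_length(start, end):
    """Length in bp of an exon given 1-based inclusive coordinates (assumed)."""
    return end - start + 1


def protein_length(exons):
    """Amino acids encoded by a complete CDS, assuming the final codon is a stop."""
    cds = sum(exon_length(start, end) for start, end in exons)
    if cds % 3 != 0:
        raise ValueError(f"CDS length {cds} is not a multiple of 3")
    return cds // 3 - 1


# Gene 2 exons as reported by both FGENESH and Genemark
gene2_exons = [(3948, 4241), (4301, 4690)]

lengths = [exon_length(start, end) for start, end in gene2_exons]
print(lengths)                      # [294, 390]
print(protein_length(gene2_exons))  # 227
```

Under these assumptions the two exons total 684 bp, which gives 228 codons, or 227 amino acids once the stop codon is removed, matching the reported aa length.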

    The 3rd predicted gene

    FGENESH predicted: base 5869-6603 (two exons: 5869-6162 (294 bp-terminal), 6229-6603 (375 bp-initial)), aa length = 222

    The best hit of tblastn: XM_001774108 Physcomitrella patens subsp. patens early light-induced protein 9(ELIP9) mRNA, complete cds. E-value = 5e-60

    Genemark predicted: base 5014-6603 (three exons: 5014-5134 (130 bp-terminal), 5873-6162 (290 bp-internal), 6229-6603 (375 bp-initial)), aa length = 264

    The best hit of tblastn: XM_001774108 Physcomitrella patens subsp. patens early light-induced protein 9 (ELIP9) mRNA, complete cds. E-value = 1e-59

    Analysis: FGENESH predicted a shorter gene with two exons, whereas Genemark predicted three. Both programs predicted the same initial exon (6229-6603), and the exons ending at 6162 differ by only 4 bp at their start (5869 versus 5873).
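
The comparison between the two gene 3 predictions can also be made mechanically. The following is a minimal sketch, not part of the original analysis, that classifies exons as identical, partially overlapping, or unique to one program. It assumes 1-based inclusive coordinates, and the helper names are hypothetical.

```python
def overlap(a, b):
    """Number of bases shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def compare(pred_a, pred_b):
    """Classify exons from two predictions of the same locus."""
    identical = [e for e in pred_a if e in pred_b]
    partial = [(a, b) for a in pred_a for b in pred_b
               if a != b and overlap(a, b) > 0]
    only_b = [b for b in pred_b
              if all(overlap(a, b) == 0 for a in pred_a)]
    return identical, partial, only_b


# Gene 3 exon coordinates as listed above
fgenesh = [(5869, 6162), (6229, 6603)]
genemark = [(5014, 5134), (5873, 6162), (6229, 6603)]

identical, partial, only_genemark = compare(fgenesh, genemark)
print(identical)      # [(6229, 6603)]
print(partial)        # [((5869, 6162), (5873, 6162))]
print(only_genemark)  # [(5014, 5134)]
```

The same function can be applied to the gene 4 coordinates to check that they follow the same pattern.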

    The 4th predicted gene

    FGENESH predicted: base 7818-8569 (two exons: 7818-8111 (294 bp-terminal), 8186-8569 (384 bp-initial)), aa length = 225

    The best hit of tblastn: XM_001774108 Physcomitrella patens subsp. patens early light-induced protein 9(ELIP9) mRNA, complete cds. E-value = 3e-60

    Genemark predicted: base 7506-8569 (three exons: 7506-7629 (124 bp-terminal), 7822-8111 (290 bp-internal), 8186-8569 (384 bp-initial)), aa length = 265

    The best hit of tblastn: XM_001774108 Physcomitrella patens subsp. patens early light-induced protein 9(ELIP9) mRNA, complete cds. E-value = 6e-60

    Analysis: These predictions follow the same pattern as in gene 3.

    The 5th predicted gene

    FGENESH predicted: base 9337-14094 (five exons: 9337-9490 (153 bp-initial), 9728-10807 (1077 bp-internal), 11428-11510 (81 bp-internal), 12449-12526 (78 bp-internal), 12697-14094 (1398 bp-terminal)), aa length = 928

    The best hit of tblastn: XM_001756373 Physcomitrella patens subsp. patens predicted protein (PHYPADRAFT_24385) mRNA, partial cds. E-value = 1e-67

    Some predictions: NM_122278 Arabidopsis thaliana nucleotide binding (AT5G23730) mRNA, complete cds. E-value = 2e-54

    NM_124604 Arabidopsis thaliana transducin family protein / WD-40 repeat family protein (AT5G52250) mRNA, complete cds. E-value = 9e-51

    NM_104188 Arabidopsis thaliana SPA4 (SPA1-RELATED 4); signal transducer (SPA4) mRNA, complete cds. E-value = 2e-49

    Genemark predicted (two genes): base 9409-11579 (four exons: 9409-9490 (82 bp-initial), 9728-10807 (1080 bp-internal), 11017-11226 (210 bp-internal), 11446-11579 (134 bp-terminal)), aa length = 502

    The best hit of tblastn: AC189209 Brassica rapa subsp. pekinensis clone KBrB007M04, complete sequence. E-value = 5e-31

    Some predictions: NM_148393 Arabidopsis thaliana RPD1 (ROOT PRIMORDIUM DEFECTIVE 1) (RPD1) mRNA, complete cds. E-value = 8e-20

    Base 12476-14094 (two exons: 12476-12526 (51 bp-initial), 12697-14094 (1398 bp-terminal)), aa length = 482

    The best hit of tblastn: XM_001756373 Physcomitrella patens subsp. patens predicted protein(PHYPADRAFT_24385) mRNA, partial cds E-value = 3e-68

    Some predictions: NM_124604 Arabidopsis thaliana transducin family protein / WD-40 repeat family protein (AT5G52250) mRNA, complete cds. E-value = 3e-51

    NM_104188 Arabidopsis thaliana SPA4 (SPA1-RELATED 4); signal transducer(SPA4) mRNA, complete cds E-value = 2e-49

    The 6th predicted gene

    FGENESH predicted: base 14409-16995 (five exons: 14409-15288 (879 bp-terminal), 16037-16309 (270 bp-internal), 16361-16473 (111 bp-internal), 16536-16757 (222 bp-internal), 16810-16995 (186 bp-initial)), aa length = 555

    The best hit of tblastn: XM_001754133 Physcomitrella patens subsp. patens predicted protein (PHYPADRAFT_114928) mRNA, complete cds. E-value = 1e-110

    Some predictions: NM_112392 Arabidopsis thaliana 3-hydroxybutyryl-CoA dehydrogenase, putative(AT3G15290) mRNA, complete cds. E-value = 4e-94

    Genemark predicted: base 14409-16995 (six exons: 14409-15254 (846 bp-terminal), 15702-15816 (115 bp-internal), 15971-16309 (339 bp-internal), 16361-16473 (113 bp-internal), 16536-16757 (222 bp-internal), 16810-16995 (186 bp-initial)), aa length = 606

    The best hit of tblastn: XM_001754133 Physcomitrella patens subsp. patens predicted protein(PHYPADRAFT_114928) mRNA, complete cds. E-value = 6e-122

    Some predictions: NM_112392 Arabidopsis thaliana 3-hydroxybutyryl-CoA dehydrogenase, putative (AT3G15290) mRNA, complete cds. E-value = 6e-122

    The 7th predicted gene

    FGENESH predicted: base 17202-18382 (five exons: 17202-17263 (60 bp-initial), 17299-17396 (96 bp-internal), 17455-17759 (303 bp-internal), 17818-18223 (405 bp-internal), 18267-18382 (114 bp-terminal)), aa length = 325

    The best hit of tblastn: EF082524 Picea sitchensis clone WS0279_P09 unknown mRNA. E-value = 3e-95

    Some predictions: NM_126173 Arabidopsis thaliana APG2 (ALBINO AND PALE GREEN 2) (APG2) mRNA, complete cds. E-value = 2e-92

    Genemark predicted: base 17202-18382 (five exons: 17202-17263 (62 bp-initial), 17299-17396 (98 bp-internal), 17455-17759 (305 bp-internal), 17818-18108 (291 bp-internal), 18161-18382 (222 bp-terminal)), aa length = 325

    The best hit of tblastn: EF082524 Picea sitchensis clone WS0279_P09 unknown mRNA E-value = 8e-113

    Some predictions: NM_126173 Arabidopsis thaliana APG2 (ALBINO AND PALE GREEN 2) (APG2) mRNA, complete cds. E-value = 8e-109

    The 8th predicted gene

    FGENESH predicted: base 22143-22436 (single exon, 294 bp), aa length = 97

    The best hit of tblastn: AC158184 Selaginella moellendorffii clone JGIASXY-5E21, complete sequence. E-value = 8e-13

    Genemark predicted (two genes): base 19414-21489 (two exons: 19414-19492 (79 bp-terminal), 21326-21489 (164 bp-initial)), aa length = 80

    The best hit of tblastn: AC158184 Selaginella moellendorffii clone JGIASXY-5E21, complete sequence E-value = 7e-07

    Base 21592-22436 (three exons: 21592-21698 (107 bp-terminal), 21854-22006 (157 bp-internal), 22229-22436 (208 bp-initial)), aa length = 155

    The best hit of tblastn: AC158184 Selaginella moellendorffii clone JGIASXY-5E21, complete sequence E-value = 2e-13

    The 9th predicted gene

    FGENESH predicted: base 27984-32821 (16 exons: 27984-28034 (51 bp-terminal), 28081-28164 (84 bp-internal), 28216-28323 (108 bp-internal), 28382-28441 (60 bp-internal), 28495-28566 (72 bp-internal), 29042-29224 (183 bp-internal), 29287-29415 (129 bp-internal), 29472-29673 (201 bp-internal), 29735-29856 (120 bp-internal), 29909-30085 (177 bp-internal), 30147-30278 (132 bp-internal), 30332-30676 (345 bp-internal), 30730-31020 (291 bp-internal), 31081-31281 (201 bp-internal), 31332-31734 (402 bp-internal), 31788-32821 (1032 bp-initial)), aa length = 1195

    The best hit of tblastn: XM_001773270 Physcomitrella patens subsp. patens predicted protein (PHYPADRAFT_191392) mRNA, complete cds. E-value = 2e-174

    Some predictions: NM_111305 Arabidopsis thaliana EMB2458 (EMBRYO DEFECTIVE 2458); ATPase (EMB2458) mRNA, complete cds. E-value = 1e-133

    Genemark predicted: base 27260-32821 (19 exons: 27260-27268 (9 bp-terminal), 27892-27942 (51 bp-internal), 28081-28164 (84 bp-internal), 28216-28323 (108 bp-internal), 28382-28441 (60 bp-internal), 28495-28566 (72 bp-internal), 28622-28807 (186 bp-internal), 28864-28983 (120 bp-internal), 29042-29224 (183 bp-internal), 29287-29415 (129 bp-internal), 29472-29673 (202 bp-internal), 29735-29856 (122 bp-internal), 29942-30085 (144 bp-internal), 30147-30278 (132 bp-internal), 30332-30672 (345 bp-internal), 30730-31020 (291 bp-internal), 31081-31281 (201 bp-internal), 31332-31734 (403 bp-internal), 31788-32821 (1034 bp-initial)), aa length = 1291

    The best hit of tblastn: XM_001773270 Physcomitrella patens subsp. patens predicted protein(PHYPADRAFT_191392) mRNA, complete cds. E-value = 0.0

    Some predictions: NM_111305 Arabidopsis thaliana EMB2458 (EMBRYO DEFECTIVE 2458); ATPase(EMB2458) mRNA, complete cds. E-value = 0.0

    The 10th predicted gene

    FGENESH predicted: base 33215-35059 (two exons: 33215-33501 (285 bp-initial), 33937-35059 (1122 bp-terminal)), aa length = 468

    The best hit of tblastn: AK247525 Solanum lycopersicum cDNA, clone: LEFL1041DD09, HTC in leaf E-value = 6e-135

    Some predictions: NM_113340 Arabidopsis thaliana glycosyl hydrolase family 17 protein (AT3G24330) mRNA, complete cds E-value = 2e-131

    Genemark predicted: base 33140-35059 (two exons: 33140-33501 (342 bp-initial), 33937-35059 (1123 bp-terminal)), aa length = 494

    The best hit of tblastn: AK247525 Solanum lycopersicum cDNA, clone: LEFL1041DD09, HTC in leaf E-value = 4e-135

    Some predictions: NM_113340 Arabidopsis thaliana glycosyl hydrolase family 17 protein (AT3G24330) mRNA, complete cds E-value = 3e-132

    The 11th predicted gene

    FGENESH predicted: base 37784-39157 (two exons: 37784-38247 (462 bp-initial), 38323-39157 (834 bp-terminal)), aa length = 431

    The best hit of tblastn: AK245119 Glycine max cDNA, clone: GMFL01-21-H20 E-value = 6e-144

    Some predictions: NM_124149 Arabidopsis thaliana protein kinase, putative (AT5G47750) mRNA, complete cds. E-value = 2e-139

    Genemark predicted: base 37784-39157 (two exons: 37784-38247 (464 bp-initial), 38323-39157 (835 bp-terminal)), aa length = 432

    The best hit of tblastn: AK245119 Glycine max cDNA, clone: GMFL01-21-H20 E-value = 6e-144

    Some predictions: NM_124149 Arabidopsis thaliana protein kinase, putative (AT5G47750) mRNA, complete cds. E-value = 2e-139

    The 12th predicted gene

    FGENESH predicted: base 39791-43119 (8 exons: 39791-40755 (963 bp-terminal), 40842-40890 (48 bp-internal), 41145-41306 (162 bp-internal), 41362-41457 (96 bp-internal), 42380-42470 (90 bp-internal), 42750-42861 (108 bp-internal), 42929-43000 (69 bp-internal), 43056-43119 (63 bp-initial)), aa length = 532

    The best hit of tblastn: EU262743 Selaginella moellendorffii putative gibberellin receptor mRNA, complete cds. E-value = 5e-57

    Some predictions: XM_001757014 Physcomitrella patens subsp. patens GLP1 GID1-like protein (GLP1) mRNA, complete cds. E-value = 1e-53

    Genemark predicted (two genes): base 39791-42449 (7 exons: 39791-40755 (965 bp-terminal), 40842-40890 (49 bp-internal), 41145-41306 (162 bp-internal), 41362-41457 (96 bp-internal), 41919-41972 (54 bp-internal), 42074-42225 (152 bp-internal), 42380-42449 (70 bp-initial)), aa length = 515

    The best hit of tblastn: EU262743 Selaginella moellendorffii putative gibberellin receptor mRNA E-value = 1e-56

    Base 42617-43119 (three exons: 42617-42861 (245 bp-terminal), 42929-43000 (72 bp-internal), 43056-43119 (64 bp-initial)), aa length = 126

    The best hit of tblastn: EF081679 Picea sitchensis clone WS02813_H13 unknown mRNA E-value = 7e-25

    Some prediction: AB013853 Vigna radiata ARG8 mRNA for GPI-anchored protein, complete cds E-value = 2e-13

     

    The 13th predicted gene

    FGENESH predicted: base 43595-45184 (1590 bp), aa length = 529

    The best hit of tblastn: XM_001767223 Physcomitrella patens subsp. patens predicted protein(PHYPADRAFT_230053) mRNA, complete cds. E-value = 2e-166

    Some predictions: NM_115255 Arabidopsis thaliana glyoxal oxidase-related (AT3G53950) mRNA, complete cds. E-value = 6e-145

    Genemark predicted: base 43595-45184 (1590 bp), aa length = 529

    The best hit of tblastn: XM_001767223 Physcomitrella patens subsp. patens predicted protein (PHYPADRAFT_230053) mRNA, complete cds. E-value = 2e-166

    Some predictions: NM_115255 Arabidopsis thaliana glyoxal oxidase-related (AT3G53950) mRNA, complete cds. E-value = 6e-145

     

    The 14th predicted gene

    FGENESH predicted: base 48850-49940 (4 exons: 48850-49014 (165 bp-initial), 49080-49325 (246 bp-internal), 49394-49492 (99 bp-internal), 49542-49940 (399 bp-terminal)), aa length = 302

    The best hit of tblastn: EF084225 Picea sitchensis clone WS02712_G18 unknown mRNA E-value = 3e-84

    Some predictions: NM_102923 Arabidopsis thaliana AT-IE (Arabidopsis thaliana bifunctional HisI-HisE protein) (AT-IE) mRNA, complete cds. E-value = 2e-83

    AY079376 Arabidopsis thaliana putative phosphoribosyl-ATP pyrophosphohydrolase At-IE (At1g31860) mRNA, complete cds. E-value = 2e-83

    Genemark predicted: base 48850-49940 (4 exons: 48850-49014 (165 bp-initial), 49080-49325 (246 bp-internal), 49394-49492 (99 bp-internal), 49542-49940 (399 bp-terminal)), aa length = 302

    The best hit of tblastn: EF084225 Picea sitchensis clone WS02712_G18 unknown mRNA E-value = 3e-84

    Some prediction: NM_102923 Arabidopsis thaliana AT-IE (Arabidopsis thaliana bifunctional HisI-HisE protein) (AT-IE) mRNA, complete cds E-value = 2e-83

    AY079376 Arabidopsis thaliana putative phosphoribosyl-ATP pyrophosphohydrolase At-IE (At1g31860) mRNA, complete cds E-value = 2e-83

     

     

     
