Orphelia currently provides two models for scoring open reading frames in sequence fragments: Net700 and Net300. To run the webserver, you need to specify which model should be used to predict genes in your input data.
- Net700 was trained on 700 bp fragments. This model should be used for input data of Sanger read length (~700 bp) or longer.
- Net300 is a scoring model for fragments shorter than 300 bp, such as those produced by pyrosequencing. When evaluated on simulated 300 bp fragments from annotated genomes that were excluded from training, Net300 shows a higher specificity than Net700. Both models have similar sensitivity on 300 bp fragments.
The user may define the maximal overlap between two predicted genes. If the maximal overlap is set to 60, for example, no two predicted genes will overlap by more than 60 bp. A more conservative overlap parameter, e.g. 0, will lead to the prediction of fewer genes, while permitting a larger overlap, e.g. 400 bp, may lead to overprediction. See a length histogram of overlapping genes in the Orphelia training species (x-axis: number of overlapping base pairs, y-axis: absolute frequency).
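To make the overlap parameter concrete, the following minimal Python sketch computes how many base pairs two predicted genes share and checks that value against a maximal overlap. The function names `overlap_bp` and `exceeds_max_overlap` are hypothetical, and the coordinates are assumed to be 1-based and inclusive. How Orphelia itself decides which of two conflicting genes to keep is not described here, so the sketch only illustrates the quantity being limited.

```python
def overlap_bp(gene_a, gene_b):
    """Return the number of base pairs shared by two genes.

    Each gene is a (pos_left, pos_right) tuple; coordinates are assumed
    to be 1-based and inclusive (an assumption, not stated in the text).
    """
    left = max(gene_a[0], gene_b[0])
    right = min(gene_a[1], gene_b[1])
    return max(0, right - left + 1)


def exceeds_max_overlap(gene_a, gene_b, max_overlap=60):
    """True if the two genes overlap by more than max_overlap bp."""
    return overlap_bp(gene_a, gene_b) > max_overlap


if __name__ == "__main__":
    a, b = (1, 300), (250, 600)
    print(overlap_bp(a, b))                           # shared base pairs
    print(exceeds_max_overlap(a, b, max_overlap=60))  # permissive setting
    print(exceeds_max_overlap(a, b, max_overlap=0))   # conservative setting
```

With a maximal overlap of 0, any two genes sharing at least one base pair are in conflict, which is why that setting yields fewer predictions.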
Input file in multiple FASTA format:
> Seq No 1
CCTCCTCCTGTTTTTCCCTCAATACAACCTCATTGGATTATTCAATTCAC
CATCCTGCCCTTGTTCCTTCCATTATACAGCTGTCTTTGCCCTCTCCTTC
TCTCGCTGGACTGTTCACCAACTCTCAGCCCGCGATCCCAATTTCCAGAC
AACCCATCTTATCAGCTTGGCCACGGCCTCGACCCGAACAGACCGGCGTC
CAGCGAGAAGAGCGTCGCCTCGACGCCTCTGCTTGACCGCACCTTGATGC
TCAAGACTTATCGCGATGCCAAGAAGCGTCTCATCATGTTCGACTACGA
> Seq No 2
CGAAACGGGCACCTATACAACGATTGAAACCATTATTCAAGCTCAGCAAG
CGTCTATGCTAGCGGTTATTGCGAGCACTTCAGCGGTTGCTACTACGACT
ACTACTTGATAAATGAAACGGCTATAAAAGAGGCTGGGGCAAAAGTATGT
TAGTTGAAGGGTGACCTGAACGATGAATCGGTCGAATTTTTTATTGGCAG
AGGGAAGGTAGGTTTACTCAATTTAGTTACTTCTAGCCGTTGATTGGAGG
AGCGCAAGCGACGAGGAGGCTCATCGGCCGCCCGCGGAAAGCGTAGTCT
TACACGGAAATCAACGGCGGTGTCATAAGCGAG
> Seq No 3
.....
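Before uploading, it can help to check that a file really parses as multiple FASTA. The sketch below is a small, dependency-free reader written for illustration only; `read_multi_fasta` is a hypothetical helper and not part of Orphelia. It treats every line starting with `>` as a header and joins the following lines into one sequence.

```python
def read_multi_fasta(text):
    """Parse multiple-FASTA text into a list of (header, sequence) tuples.

    Records are returned in input order. Sequence lines that appear
    before the first header are ignored.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    example = "> Seq No 1\nCCTCCTCC\nTGTTTTTC\n> Seq No 2\nCGAAACGG\n"
    for header, seq in read_multi_fasta(example):
        print(header, len(seq))
```

The sequence lengths reported this way can also guide the choice between Net700 and Net300.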
The results are sent via email.
- gene.pred : The final prediction in coord format
- FragmentNo : Number of the fragment (i.e. its position in the input order)
- ORFinFragNo : Counter of the predicted ORF in this fragment (simply incremented)
- posLeft : Left coordinate of the ORF in the fragment sequence
- posRight : Right coordinate of the ORF in the fragment sequence
- +|- : Strand
- Frame : Reading frame of a predicted gene, counted from the 5'-end of the input sequence. Reading frame 1 begins at the 1st position of the sequence, reading frame 2 at the 2nd, and reading frame 3 at the 3rd.
Examples for Frame:
---------------------- DNA Fragment
   ATG------TAG        Gene, begins on position 4 -> Fr. 1
---------------------- DNA Fragment
  ATG------TAG         Gene, begins on position 3 -> Fr. 3
---------------------- DNA Fragment
    ATG------TAG       Gene, begins on position 5 -> Fr. 2
- C|I : Is the predicted gene complete (C) or incomplete (I)?
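The Frame examples above follow a simple rule: positions 1, 4, 7, ... fall in frame 1, positions 2, 5, 8, ... in frame 2, and positions 3, 6, 9, ... in frame 3. The Python sketch below expresses that rule; `reading_frame` is a hypothetical helper name. It assumes 1-based positions and covers only genes shown as in the examples (start position counted from the 5'-end of the input sequence); how the frame is reported for genes on the minus strand is not spelled out here, so that case is deliberately left out.

```python
def reading_frame(start_pos):
    """Return the reading frame (1, 2 or 3) for a gene that begins at
    the given 1-based position, counted from the 5'-end of the fragment.
    """
    if start_pos < 1:
        raise ValueError("positions are 1-based and must be >= 1")
    return (start_pos - 1) % 3 + 1


if __name__ == "__main__":
    # The three start positions used in the Frame examples.
    for pos in (4, 3, 5):
        print(pos, "->", reading_frame(pos))
```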
Please direct your questions and comments to firstname.lastname@example.org.