Practical 10 – Gene Prediction

Gene prediction programs:
   o GENSCAN
   o HMMgene
   o GeneMark
   o Glimmer
   o GRAIL
   o GrailEXP
   o GENVIEW
   o Genie
   o GeneFinder
   o MZEF
   o FGENESH
   o NETGENE2

Other useful programs:
   o RepeatMasker: masks repeats

1. For sequence #1 in the appendix, carry out gene prediction as follows:
   o Use one of the methods from the list (the instructor will assign it in
     class) and answer:
      - How many exons does the prediction contain?
      - What are their probabilities?
      - If possible, enable the signal prediction options (donor, acceptor,
        promoter, etc.) and run the program again.
      - Run RepeatMasker on the sequence and repeat the prediction. Have the
        results changed?
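To judge whether masking could have affected the prediction, it helps to know how much of the sequence was actually masked. The following is a minimal sketch, assuming the masked output uses hard masking with N or X characters; the function name and the toy sequence are illustrative and not part of the practical. Any N already present in the input is counted as masked too.

```python
def masked_fraction(seq: str, mask_chars: str = "NnXx") -> float:
    """Return the fraction of residues in seq that are masking characters."""
    seq = "".join(seq.split())  # drop whitespace and line breaks
    if not seq:
        return 0.0
    masked = sum(1 for base in seq if base in mask_chars)
    return masked / len(seq)


# Toy example (hypothetical data): 8 of 20 bases replaced by N.
masked_seq = "ACGTNNNNNNNNACGTACGT"
print(f"Masked fraction: {masked_fraction(masked_seq):.2f}")
```

If the fraction is zero, no repeats were found and the prediction should be identical before and after masking.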
2. Repeat the previous exercise with sequence #2, but using a different
   program (use the one assigned to the group on your left). Take the
   predicted exons, concatenate them, and search NCBI for known proteins or
   ESTs that support the prediction.
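Concatenating the predicted exons by hand is error-prone, so a small script can do it. This is a sketch under stated assumptions: exon coordinates are 1-based and inclusive (as most prediction programs report them), all exons lie on the forward strand, and the coordinates and toy sequence below are hypothetical, not real predictions for sequence #2.

```python
def read_fasta_sequence(text: str) -> str:
    """Return the sequence of a single-record FASTA string, without the header."""
    lines = [ln.strip() for ln in text.splitlines()]
    return "".join(ln for ln in lines if ln and not ln.startswith(">")).upper()


def concatenate_exons(seq: str, exons: list[tuple[int, int]]) -> str:
    """Join exons given as (start, end) pairs, 1-based inclusive, in genomic order."""
    return "".join(seq[start - 1:end] for start, end in sorted(exons))


# Toy record (hypothetical): two exons separated by a GT...AG intron.
fasta = """>toy
AAATGGCCGTAAGTTTTCAG
GAATAGCCC
"""
sequence = read_fasta_sequence(fasta)
predicted_exons = [(3, 8), (21, 26)]  # hypothetical coordinates
print(concatenate_exons(sequence, predicted_exons))  # ATGGCCGAATAG
```

The resulting string can be pasted into a BLAST search at NCBI. For a gene predicted on the reverse strand, the concatenated sequence would need to be reverse-complemented first.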
3. Given sequence HS307871 in the appendix, predict its genes, this time
   using the program assigned to the group in front of you. Look up the
   sequence in NCBI and compare your results with the annotated exons and
   introns.
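The comparison between predicted and annotated exons can be made systematic by classifying each predicted exon as an exact match, a partial overlap, or a miss. The sketch below assumes both sets are lists of 1-based inclusive (start, end) pairs on the same strand; the coordinates are hypothetical placeholders and are not the real annotation of HS307871.

```python
def overlap(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Number of bases shared by two 1-based inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def compare_exons(predicted, annotated):
    """Pair each predicted exon with its best-overlapping annotated exon."""
    report = []
    for p in predicted:
        best = max(annotated, key=lambda a: overlap(p, a), default=None)
        shared = overlap(p, best) if best is not None else 0
        if shared == 0:
            report.append((p, None, "no match"))
        elif best == p:
            report.append((p, best, "exact"))
        else:
            report.append((p, best, "partial"))
    return report


# Hypothetical coordinates for illustration only.
predicted = [(100, 200), (300, 420), (600, 650)]
annotated = [(100, 200), (310, 420), (500, 550)]
for pred, ann, status in compare_exons(predicted, annotated):
    print(pred, ann, status)
```

A partial match usually means that one splice site was placed correctly and the other was not, which is worth noting when discussing the results.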
4. Repeat the previous exercise, this time using GrailEXP with the option
   “Perceval Exon Candidates”. Then run it again, replacing that option with
   “Galahad EST/mRNA/cDNA Alignments” and enabling “Gawain Gene Models” to
   predict the coding sequence.
5. A bit of manual work... Take the “short sequence” from the appendix and
   carry out the following steps:
   o Determine the start and end of the gene using Transcriptional Start
     Site Finder.
   o Look for evidence of a poly(A) signal using Poly A Signal Predictor.
   o Plot the data obtained and predict in which direction the gene will be
     transcribed.
   o Repeat these two steps for the reverse-complement sequence
     (http://www.dnalc.org/bioinformatics/dnalc_nucleotide_analyzer.htm#permutator).
   o Use the Splice site prediction program with a donor score cutoff of
     0.88 and an acceptor score cutoff of 0.94 (donor site: at the start of
     the intron; acceptor site: at the end of the intron).
   o Using all the results obtained so far, plot the alternative structures
     for the gene. Locate the start and stop codons.
   o Identify the possible sequences encoded by this gene and run a search
     in NCBI.
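Some of the manual steps above can be cross-checked with a few lines of code. The sketch below is not a replacement for the web tools named in the exercise: it only builds the reverse complement, looks for the canonical AATAAA polyadenylation hexamer, and lists simple ATG-to-stop open reading frames on one strand. The function names, the `min_codons` parameter, and the toy sequence are assumptions made for illustration. Scanning for bare GT and AG dinucleotides with `find_motif` is possible as well, but it is a far cruder filter than a scored splice site predictor.

```python
import re

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, preserving letter case."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> list[int]:
    """1-based start positions of every (possibly overlapping) motif occurrence."""
    pattern = re.compile(f"(?={re.escape(motif)})", re.IGNORECASE)
    return [m.start() + 1 for m in pattern.finditer(seq)]


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """ORFs on the given strand as 1-based inclusive (start, end), ATG through stop."""
    seq = seq.upper()
    stops = {"TAA", "TAG", "TGA"}
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in stops:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start + 1, i + 3))
                start = None
    return sorted(orfs)


# Toy sequence (hypothetical): one short ORF followed by a poly(A) signal.
toy = "ccATGGCCTAAgcAATAAAgg"
print(reverse_complement("ATGC"))   # GCAT
print(find_motif(toy, "AATAAA"))    # [14]
print(find_orfs(toy))               # [(3, 11)]
```

Running the same functions on `reverse_complement(seq)` covers the opposite strand; positions are then relative to the reversed sequence. Because introns interrupt the reading frame, ORFs found on the unspliced genomic sequence are only a rough guide until the introns have been removed.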
Appendix – Sequences
Sequence #1
>gi|2789671|gb|AF040714.1|AF040714 Homo sapiens
ATGCCAGGCCCCCCACCAGCCACGTTGGGGCAGCCCCCACAGCTCCCGGCCTTCGGGCCAAGGTGTCGGG
GTGCGTCTCCTGGCCCATCAATACAGATTACATATTTATATCAATCGCGGGCTCTGAGGGCGCCCTCGGA
GAGCGGCCCCGCGCCTACGAAACCAAACTGGGAGTGGTCGCGCGGAAACTCTGGCTCGGGATTGGCTGCG
GGCGCCCGCCGCGGTGCGGGGGGATTGCTAATCGTATTCAGCATGTTTTGCACAAGAAATGTCAGCCAGA
AAGGGCTATCTGCTCCCTTCGCCAAATTATCCCACAACAATGTCATGCTCGGAGAGCCCCGCCGCGAACT
CTTTTTTGGTCGACTCGCTCATCAGCTCGGGCAGAGGCGAGGCAGGCGGCGGTGGTGGTGGCGCGGGGGG
CGGCGGCGGTGGCGGTTACTACGCCCACGGCGGGGTCTACCTGCCGCCCGCCGCCGACCTGCCATACGGG
CTGCAGAGCTGCGGGCTCTTCCCCACGCTGGGCGGCAAGCGCAATGAGGCAGCGTCGCCGGGCAGCGGTG
GCGGTGGCGGGGGTCTAGGTCCCGGGGCGCACGGCTACGGGCCCTCGCCCATAGACCTGTGGCTAGACGC
GCCCCGGTCTTGCCGGATGGAGCCGCCTGACGGGCCGCCGCCGCCGCCCCAGCAGCAGCCGCCGCCCCCG
CCGCAACCACCCCAGCCAGCGCCGCAGGCCACCTCGTGCTCTTTCGCGCAGAACATCAAAGAAGAGAGCT
CCTACTGCCTCTACGACTCGGCGGACAAATGCCCCAAAGTCTCGGCCACCGCCGCCGAACTGGCTCCCTT
CCCGCGGGGCCCGCCGCCCGACGGCTGCGCCCTGGGCACCTCCAGCGGGGTGCCAGTGCCTGGCTACTTC
CGCCTTTCTCAGGCCTACGGCACCGCCAAGGGCTATGGCAGCGGCGGCGGCGGCGCGCAGCAACTCGGGG
CTGGCCCGTTCCCCGCGCAGCCCCCGGGGCGCGGTTTCGATCTCCCGCCCGCGCTAGCCTCCGGCTCGGC
CGATGCGGCCCGGAAGGAGCGAGCCCTCGATTCGCCGCCGCCCCCCACGCTGGCTTGCGGCAGCGGCGGG
GGCTCGCAGGGCGACGAGGAGGCGCACGCGTCGTCCTCGGCCGCGGAGGAGCTCTCCCCGGCCCCTTCCG
AGAGCAGCAAAGCCTCGCCGGAGAAGGATTCCCTGGGTAAGCAGGGCTGCAGAGGGCTGCAGTCAGGCGG
GCAGACAGGCAGACACAAGGAGGAGAAGGATCAGAAAACTAGGAGCCCGCGCAGCAGCCGGCCGGCCTTG
GCCCAAGCTGCAGGCAGGCTGACCTTGTGAACTTGCTTTTTAATATTTGGGCGTGGGGGCGCAGTAAAAT
TCATGTCCGGCTTAGCGCCCCACAGCAAGACGTCCTCGGCGCTGGCCTCAGCTCCCCCTGACTAGGGACG
AGGACACCAGCGAGCAGGCCCCCTCCTGTGCGCTCTTTCCTGTGGCCGGGAGGACCCAGAGCCCTGGTCC
CTGCCCAGCCTGCGCGGCGCGGCCCACGCGGGGGGAGGGGGAGGGAGGGAAAGTAGCTCGCCCGCAGATA
GCGCGGATGTTTGTAAGGCATCCAAAATAAGCAGCCGCCAGCGCCAATAAATAAGCCCATTAACCGGCGA
AGTTCGAGTGTACGATCCCCCATGCTTTTTTCAAAGTTGCTGAGGGGCGGGAATCTTCGTGGCGGGAAGA
AGAAAAGGCAAATCCGGCCTGGAAGCGGGGGGCCCTGAGCTGAGAGCCAGAGAAGGGCCATTTCCCTTCC
CCTGGACCTCGGAATCGCCCAGCTATGTATCCTGGCTCCTGGAGAAACTTGAGGGAGGGCCCTTGACCCC
CGAATCGGTTTTTCCTGCCTTCCCCATTGGACCAATGATGCCCTTCTTTCTCCCCTTATCGAGTCTTGGG
CAATCAGGGCCCTGGGGTGAGACAGCCAAGCTGCCTGGCCCATCTTCCAAGTAAGCACCCCGCGCTCCTA
GCCTGGGGGCTACAGGAAATGCTTGTCTGCCATATGGCAAGAGGCAAAGAAAAGCGTTAAGTTCAAGATG
TACAGCCTGCCCTCCCAGGCCTTTCCTTCTGCAAGCATCTACGGCTTAGCGCTAAAACAGGTGTTTGGAA
AAGTGGGGGAAATGTAAATTGGAAGGGTCATGTAGATTGAAGGCCCACTCAATTTTTGTCATGACTTATG
GAGGAACTGCTTGCTCTCAGCAAGCCAAAAACGGGGGCACGACTCTCTTCTCTGTGACTTGGGACATCTC
TCTTATGGGAGAAACGGAGGCAATTCACCCCCGCGGGCAGCCCGTGTGGCCTCGACTTAATCATCCCCTC
TTTATTCTCTTACATGCCAGGCAATTCCAAAGGTGAAAACGCAGCCAACTGGCTCACGGCAAAGAGTGGT
CGGAAGAAGCGCTGCCCCTACACGAAGCACCAGACACTGGAGCTGGAGAAGGAGTTTCTGTTCAATATGT
ACCTTACTCGAGAGCGGCGCCTAGAGATTAGCCGCAGCGTCCACCTCACGGACAGACAAGTGAAAATCTG
GTTTCAGAACCGCAGGATGAAACTGAAGAAAATGAATCGAGAAAACCGGATCCGGGAGCTCACAGCCAAC
TTTAATTTTTCCTGATGAATCTCCAGGCGAC
Sequence #2
>gi|2739430|gb|U70368.1|MMU70368 Mus musculus hematopoietic-specific IL-2 deubiquitinating enzyme (DUB-2) gene, complete cds
GGAAGGAAAACCAGACCTAGGCTGCTTATACTGGTTCTGTGTGGTTAGCAAGGTAACAGAAACTCTTGTA
TGGCATGTGTAGTCATCTATTTGACATGATTTTGTAACTTTATTCCAAGTAAAACCCAAGCTTAAGACAC
CTAGGAAATTGGAGCTAAATTCAGGGAAATGCACTCCAATAATGTGACATTTCTGAGCTGCTTTGCAGAA
ACCACACCCAAATTGGGAGAAGCTTGTCTGGGATTGGCTGTCCTTGGAAGACTGTAGGCGTGGTCACAAG
ACTGGAGTATAAAAGACTGAGCATTTGTCCTCACTTGCAGAGATTCTCTGGAGGGAAAGACTTCCTTCTG
CTCCCTTAGAAGACTCCAGCAAGTTATTTGAAGAGGTCTTTGGAGACATGGTGGTTTCTCTTTCCTTCCC
AGAAGGTAAGTCTCACTGTAAGGTCTTTATGTCTTGTGTGTCCCCCAGCAGCCTTGTCATCTCCGGCTGC
CCTAGACCTGCATAAGGACAGATTGAGTGTGCTGGGATAGACTTTTGTTGACAAAGGGGCTGCTCTGCCC
TTCTAAGAGGTTGAGTCTCATCATAAGGCCTTTTGCAGCTTGCATGTGTAGTGCCAGGAAAGAGTAGTCA
TCCCCCAAAACCAGACAGGAACTGACGAGATGCAATCACTGTGTGGACTTTTTACCAGCTAGCTAGGGCA
CTACCATGAGCCACTGTCTAGCAGGGAGGCTTTGGGGATGGTGTGCCCCGAATATCTCTCAGGGTAAGAG
TTTACAGTAAGCAGCAAGCAGAGGGGTGTGGGTGAGTGTGCAAGTATCTAATTGGCTAGTTTTTGTGGCC
TGTAACATATTGGTGGGTGTTGGGAGTCATAAGCTAAATGTTTGCTTTCCTCTGCATTGGTGGTCATTAG
GGAGGGGGCAGATTATGAACCTAGGTTGCAGATCTGTTGGAGTAATAACAAGACACTGGTCTTGTTGGGG
GTATAACCTAGAGACTCGATTTATGTTCATGTTTGGTTTGGGATGGGTTTTATGTGAGTGTTTTCTTTTT
TGGGGAGGGGGTCGGTTAACTTGGAAAGTAATGCTAGGTACTGTCCTGTTCATTTCCCTGAGGTGAAAGT
TAGGTCAGGTTTTCTAGAATGGAGTCTGAAGGTAAAARATTTGGCCACTGGCATGCCCTAAAGTCTTTTT
GTGTTCTTGTCCCCTAGCAGATCCAGCCCTATCATCTCCTGGTGCCCAACAGCTGCATCAGGATGAAGCT
CAGGTAGTGGTGGAGCTAACTGCCAATGACAAGCCCAGTCTGAGTTGGGAATGTCCCCAAGGACCAGGAT
GCGGGCTTCAGAACACAGGCAACAGCTGCTACCTGAATGCAGCCCTGCAGTGCTTGACACACACACCACC
TCTAGCTGACTACATGCTGTCCCAGGAGTACAGTCAAACCTGTTGTTCCCCAGAAGGCTGTAAGATGTGT
GCTATGGAAGCCCATGTAACCCAGAGTCTCCTGCACTCTCACTCGGGGGATGTCATGAAGCCCTCCCAGA
TTTTGACCTCTGCCTTCCACAAGCACCAGCAGGAAGATGCCCATGAGTTTCTCATGTTCACCTTGGAAAC
AATGCATGAATCCTGCCTTCAAGTGCACAGACAATCAGAACCCACCTCTGAGGACAGCTCACCCATTCAT
GACATATTTGGAGGCTTGTGGAGGTCTCAGATCAAGTGTCTCCATTGCCAGGGTACCTCAGATACATATG
ATCGCTTCCTGGATGTCCCCCTGGATATCAGCTCAGCTCAGAGTGTAAATCAAGCCTTGTGGGATACAGA
GAAGTCAGAAGAGCTACGTGGAGAGAATGCCTACTACTGTGGTAGGTGTAGACAGAAGATGCCAGCTTCC
AAGACCCTGCATATTCATAGTGCCCCAAAGGTACTCCTGCTAGTGTTAAAGCGCTTCTCGGCCTTCATGG
GTAACAAGTTGGACAGAAAAGTAAGCTACCCAGAGTTCCTTGACCTGAAGCCATACCTGTCCCAGCCTAC
TGGAGGACCTTTGCCTTATGCCCTCTATGCTGTCCTGGTCCATGAAGGTGCGACTTGTCACAGTGGACAT
TACTTCTCTTATGTCAAAGCCAGACATGGGGCATGGTACAAGATGGATGATACTAAGGTCACCAGCTGCG
ATGTGACTTCTGTCCTGAATGAGAATGCCTATGTGCTCTTCTATGTGCAGCAGACTGACCTCAAACAGGT
CAGTATTGACATGCCAGAGGGCAGAGTACATGAGGTTCTCGACCCTGAATACCAGCTGAAGAAATCCCGG
AGAAAAAAGCATAAGAAGAAAAGCCCTTGCACAGAAGATGCGGGAGAGCCCTGCAAAAACAGGGAGAAGA
GAGCAACCAAAGAAACCTCCTTAGGGGAGGGGAAAGTGCYTCAGGAAAAGAACCACAAGAAAGCTGGGCA
GAAACATGAGAATACCAAACTTGTGCCTCAGGAACAGAACCACCAGAAACTTGGGCAGAAACACAGGATC
AATGAAATCTTGCCTCAGGAACAGAACCACCAGAAAGCTGGGCAGAGCCTCAGGAACACGGAAGGTGAAC
TTGATCTGCCTGCTGATGCAATTGTGATTCACCTGCTCAGATCCACAGAAAACTGGGGCAGGGATGCTCC
AGACAAGGAGAATCAACCCTGGCACAATGCTGACAGGCTCCTCACCTCTCAGGACCCTGTGAACACTGGG
CAGCTCTGTAGACAGGAAGGAAGACGAAGATCAAAGAAGGGGAAGAACAAGAACAAGCAAGGGCAGAGGC
TTCTGCTTGTTTGCTAGTGTTCACTCACCCACTCACACAGGCTCCTGTGGACACCCTGCCAACCCAAGGT
GCCTGGAACAAGAGGTTTGGACCTCTGTCCCAGGCAGGGACAATGCCTCACCCTTCATGTGGGGTCCACC
TATCCTCTGGGCCCTTGCCTGTTTTTACTGACTGACTCTCTGAGAATGGTCATTTGAATGTGGAAAAAAA
ATGCCCAGGGTGTTGCTACAGGTTAAAGACAGGAAAGCTGGACAGTCAGGGGAGGTCTGCATAGCCTCTC
CTGCAACTCATGGGATCTGAGTAGCGTAGAGACTAAATCACCACACTGGAGCTTTCTTTACTTTGCTTTC
CTTTTTTTTTAATTTATTTTTTGTTATTAGATATTTTCTTTATTTACATTTCAAATGCTATCCCAAAAGT
TCCCTATACCCTCCCCCCCCCCGCTAACCTACCCACCCA
>HS307871
AAGCTTCGTAAGCACCTCTCGCGGCACGAAAGCCAGCGCTGCCTAGGCGCCGCCCGGCGC
GAGGCTCTCACCTCTGCCAAGAAGCGCACCGGCCCAGCAGCTGCCGGGGGGACTCCAGCA
CCGCGCCGGGCCATGGACCCGCCATGAGTCAGCTGGCGCGACCGCGGACAGAGCTTCCCA
CCACGCCCTTCCCCGCCTTTGGCCAGCCTTTGCCGTATGTTCTGGACTAAGCGCACCCCA
GCTCTCACTGTATTGGACTGTGTACTCCCACACTCAACCATATTACTTATCTCTGTGCCA
CCCTAACCCAGCCGACCAAACCCAAGATTGGTGATTGCTACCTGATCAATCTCCCTCTCT
CCATTTCCTTGTGACTACCATTTTATCTCTACTGCTACTACCCTCATTCAAGTCACCATT
CTAGCTAGCCTGGGTCATTGCCAACAGTCATTTTTCTGGTTCTTCGGCCTGCTGTTTTTC
CTCCCACTCCCAGCGAATCTGCTGGACTCCCTATCCTATGGGTGGTGTGATTAAAGTGTT
TGAGACAATGGCCCCTTCCCCTGCCACTGACAGGAGTCTTGAGTCATTAGGGTTGAGTTC
TGTTTGACACTCCTAATCCCAAGGACACTGGAGATCATTATTCATTTTAATGTGATTGCT
GATTTCTGTTTCCCCAGTCTTGTAGCTCCTTAAAGGCTGGGGTGTCTTGAGCAGAGCTAA
CCTCTGCACCTACTATAGGTCCAGGCTATAGTATGGACCTGGCTGGATAAGACTGTTGGT
ATCATAGTTGGGACTTGCGCCAAGCTCCGGATACCCAGACTGTCAGATGAGAACAAATTC
CTCATGTCACCGTAAGATACATTTACAGCGGAGTTTTCTTTTGGGCCTTTGTTGTTTCGT
CGCTACAGCAAACTTTACGGTGAAAAAAGGTAGGGGTCTACGGCAGCAGCAGGGCAGCCC
TGGAGCTGTCGCTGGAGTCCGATCATGTGATCTTCAACATGGCGACGCTCTTGGTTCCCT
ACAGAAAGGGGCGGAGCCTGGACTGGGGGGCAGGCTCAGATTCAGGTTAAATTGTGGATT
GAGCTCGCAGTTACAGACAGCTGACCATGGAAGCGAATGGGTTGGGGTGAGTTCTCCAGA
GCACGCGGTGTGGCTAGCCGGGCTTCTAATTTGAGTCTTCCAACTCAGGACTCTATCCCT
CTACTCCCCTTTCCCCACCCTGGAGAACCTCCCAACCTGAACTCCGTTAGCTGGATCCTG
AATCCTAAAACCATGGATTTTTGAGATGTTCATCCCAGGGCCTTAATTCAAGGGATGCCT
CAGGATTTCCAACCAGGATCTTCATTCTGGGACCATCAACTCTGATCCCTCTTTATCCCC
CAGCCTGGGTATTTCTCAGCCCCTGAACCAGCCCAGTGACATTTCCCGGTTTCTGAGGCT
CACTAGTTCGAAGACCCCCAAACTATCCTTAGTGGGCCTTCATTCCCTCCCCCCAGTCCC
TCTGGTTGCTTCGAGCTTGGAAGAGTAGAGACTAAGTGGAGGGAAGAGGCCCCAGGGCGG
GCCCTTCTGGAGTTTGTGCACTGATAGGCAGAGAGGAGGCGGAACGGGCGGAAAGCCAGG
GTTTGGGAGCTGGCCTGGAGGAGGTAGGATAGCGGTCCTGGACTGAATCGGCCTTATGAA
CCCGCGCTTTCCCCAGCCGTCCAACGTAGCATACTGACACCTACCCCCACCCCCACCTGA
TCGCCAGACCTCAGGGTTTTCCGGAGCTGAAGAATGACACATTCCTGCGAGCAGCCTGGG
GAGAGGAAACAGACTACACTCCCGTTTGGTGCATGCGCCAGGCAGGCCGTTACTTACCAG
GTAAGAGTCAGGGTCTGGAAATCTAGATAAAACTCCGGAGGAGAAAAGTTTTCGAGGGGC
AGGGGAGGGCTCTGGAGGGCCTCAAGGCTGAGCCCTGTCTTCCCTCTGTATGCAGAGTTT
AGGGAAACCCGGGCTGCCCAGGACTTTTTCAGCACGTGTCGCTCTCCTGAGGCCTGCTGT
GAACTGACTCTGCAGGTGAGGGGTCCACAAAAGAGGGAAAGATTTATGCCTTCAGTCTGC
CACCTAGCAACCTGTCTCCTGTTTCCTACAGCCACTGCGTCGCTTCCTTCTGGATGCTGC
CATCATTTTCTCCGACATCCTTGTTGTACCCCAGGTACCCACTCAAACCTGATCCTAGAA
TATAATCCAAGGACGCCTTGAAAATCCTTCTATCAGTCCAGTCAAGGTTTACAATAAGCA
CTTATCCTAACTGGATCGAGGGAAAAACTAAGGTTGAAAGAAATGGAGTTTGGCAGAGTT
TTATTCTCCTTTTCCTTCCTCCTGGAATGAGCTGAACAGAACCTTTCCTCCTGGATTCCA
TTTTGGGAACCCAGATGTTTTCTCCCCCTCCAGGCACTGGGCATGGAGGTGACCATGGTA
CCTGGCAAAGGACCCAGCTTCCCAGAGCCATTAAGAGAAGAGCAGGACCTAGAACGCCTA
CGGGATCCAGAAGTGGTAGCCTCTGAGCTAGGCTATGTGTTCCAAGCCATCACCCTTACC
CGACAACGACTGGCTGGACGTGTGCCGCTGATTGGCTTTGCTGGTGCCCCAGTAATGTGG
GACAGGGCAGGGACTCGGGGCGCGGGGAGATCACTCTGGAAGGTCTGGGGTAGACAAAAG
GAAGGGTCAGTCTGGCTTCTGTGACACCATCTTTCTATCCTTCTCTAGTGGACCCTGATG
ACATACATGGTTGAGGGTGGTGGCTCAAGCACCATGGCTCAGGCCAAGCGCTGGCTCTAT
CAGAGACCTCAGGCTAGTCACCAGCTGCTTCGCATCCTCACTGATGCTCTGGTCCCATAT
CTGGTAGGACAAGTGGTGGCTGGTGCCCAGGTGAGTCCTGAGAGAGAGAGAAATAGGCTG
GGATTTGGTCTGTAAGGCCGAGAAGCAAGAGTGTCCTAAACCTGAGAGGGCAGGGGTCTT
AATGCTAGGGATGAAAGAACCTTGGCCTCCAGTGATCTAGCTGAGCAGCCAAGCCCATCC
TGACACTGACAGTGGGGCTTAATGCTCTAAGTATTCAGACACCAAAGTTAGTGCTGGGAT
CTGAGGAAAGTAAATTTTTTTTTTTTTAATTACTGGGTTTTTAGGGTCAGGCAGTATCAG
GGATTGAAGTCATTTGGGGAAAATTGAGGTGGATTTTGTATGTGGGGGAAACTTCCTCTT
TGTGTGTTACATATTTTTCTTCACCATACCCTAACTAGGCATTGCAGCTGTTTGAGTCCC
ATGCAGGGCATCTTGGCCCACAGCTCTTCAACAAGTTTGCACTGCCTTACATCCGTGATG
TGGCCAAGCAAGTGAAGGCCAGGTTGCGGGAGGCAGGCCTGGCACCAGTGCCCATGGTGA
GGATTGGGATGGGTTGAGTGAAGGTGGTCCTGTGGAGCTTTCAGGCTAAGTCCTGCATGG
ACTGGAGTGACCACTGGAGGGCAGCAGAAGTACAGTCAAGAAAGATTAGTGGTTGTAGCA
AGGCCCTCTGTAGCCTGAGATCTGCTTTTTTCTAGATCATCTTTGCTAAGGATGGGCATT
TTGCCCTGGAGGAGCTGGCCCAAGCTGGCTATGAGGTGGTTGGGCTTGACTGGACAGTGG
CCCCAAAGAAAGCCCGGTAAGCCATGGAAGGGTGAGGCCTTGAGGTTGAGGTGGGGGTGT
TGGCTGGGGGAGCTGCCATGTATGCAGTTACCAGAACGTGGCGCTGGCTTTGCTTCCAGG
GAGTGTGTGGGGAAGACGGTGACATTGCAGGGCAACCTGGACCCCTGTGCCTTGTATGCA
TCTGAGGTAACAGCCAGGGCCCCTCTGTGTGTCCTGTTACTGTGCACTCCTGTGGCCTGT
GGTTGTATTATTCTGTGTGCACTTGTTTTTAATGTCTGTCTGTCCTTTTCTTCTCATCTG
TACACCATAAGCCCTAGAAAGACCGGACTTTTTGTTGCTGTTGTTCATTTGTGTTTATGC
TTCATGCCTGGGTCCATACTAGGGATCTATAAATTTTATTGAATGACTGAATAACACTGA
GTTAGAAGCATGCCTACCATATGCGTTTCTACTAGTATATATAGGGAGGACAAAGGCTTG
CTGGTCCTCCTGTAGCCAGTGCCCTGTTGGTCCCCCAGGAGGAGATCGGGCAGTTGGTGA
AGCAGATGCTGGATGACTTTGGACCACATCGCTACATTGCCAACTTGGGCCATGGGCTTT
ATCCTGACATGGACCCAGAACATGTGGGCGCCTTTGTGGATGCTGTGCATAAACACTCAC
GTCTGCTTCGACAGAACTGACCGTATACCTTTACCCTCAAGTACCACTAACACAGATGAT
TGATCGTTTCCAGGACAATAAAAGTTTCGGAGTTGAACCTATTGTGTAGTTTTGTTTGTG
AAAGATTGTCCCATATCCTCAGTTCTTCTTAGCCTCTGTCTCCTTCCCTGGGACCCTCTC
ATATCCTCTTATAG
>unknown
AGGTCGACTGAACCCCACAGGTGATCTCTAAGTGGTGTGCCCCCCACCCCCCCGTCTTCATGGTACGCCT
TACCTCCTAA
GGGTTGTCGAGCATAGCTAGGTGAAGGATGTACACTTGGAGTTTAAACTATTGAGGAAGCCGAGGTTGGG
GGAGTTCAAA
GCCAGCCTGAACAATGTACCAAGACATCTTCTAACAAAACAAAACACCGGCTGGTGAGGTACCTCAGTGG
GTAAAGGTGC
GTAGCCCTAAGCCTGATAACCGGAGTTTGCTCTCTCTAGAACTGACGTGGGAGAAGAGAAGCTGTCTCTA
CAGTCCTCCT
CTGACCTCCACACCATGCTGCAACATTCACCCCCAGCCCCAACGAGAGTAGTAAAAACTCAAAACAAAAC
AAACAAAACA
GGGAGGGACTGGAGAGATGGCTCAGTGGTTAGGGCCACCAGGCTGCTCTTCTGGAGGACATCCACATTCA
CAACCACCTC
TGACTCTCGTTCCAGAGGATCTAACATCTTCTTCCAGCCTCTACGTAGAGGCACCAGGAATGCGTGGCGC
ACACAGATGT
ACATGGGGACAAAACATGCACATAAAATACAGTAATAAGCCGGGCAGTGGTGGCACATGCATTCAGGAGG
CAGAGGCCAG
CCTGGTCTACAGAGTGAGTTCCAGGACAGCCAGGGATACACAGAGAAGCCCTGCCTCAAAAAACCAACAA
CAACATAAAA
ATTAATAAAAAATTTTTTTGATTTACTTTATTTATATGAGAACACTGTCACTGTCTTCAGACACCAGGAA
AGGACATCAG
ACCCCATTACAGATGGTTGTGAGTCACCATGTGGTTTTTAACCACTGAGCCATCTCTCCAGCCCTATATA
TATATTTTTT
TTTAAGATTTATTTATTTATTTATTATATGTGTGTATGCTGTAGCTGTCTTCAGACACTCCAGAAGAGGG
CATCAGATTT
TTGTTACAGATGGTTGTGAGCCACCATGTGGTTGCTGGGATTTGAACTCAGGACCTTTGGAAGAGCAGTC
GGTGCTCTTA
ACCGCTGAGCCATCTCACCAGCCCCAAAATATATTAAAACAACAACAACAAGAGAGTGTGAAACACAGCC
TCTGGGGCCC
CCCACAGAGTCCTGTGTCCCTATTCTAAGGATCTGACAATTTAACCCTACCTCCTCCATGGTGAGGCCCC
AGTGGAGCTA
GGGGCATAGGCACAGACAGGACCATTGGACTAGAGTTTATATTGGGGTTCTTAGCACTTCTGAGACTTCC
TTTCCTAACT
AAGGGTGACAATAGTACCTATTATTGTTGACACTGGTATTATTTTTATTGTTGTTTGTTGTTATTTACAG
AGTCTTGCCA
CATAGCCAGGTTAGCCTTGCTAGTACTAGCCCCTGAGTGTCTCTGATACTATACTCTTCTGTGTGGGTCT
CACTGTGTAG
CCAGGGCTGCTTGGAACTTACTATGTAGACCAGGATGGCTTTGAACCACTGAGACCTGACTTTTTTTAAC
CCGCTAAGTG
CTGGGATTAAAGGCGTGTGCTGCCATGCCTGGCTTTGCACACACACACATACTGTGTGGCGTGCAGGTGT
GTGCACATGC
TAGGACATGTGCATGTCACAGTGCACATGTGGCAGTCAGAACACAACTTCAGGTGTCATCCTCCTGTCTC
TGCTTTACAT
CTGGACCTAGGAGAGTTGGGATTGCAGATAGGAGCAACTTCTGTGGGTTCTGGGGATCCGAACTCGGGCC
CTCAGACTTG
CTCAGAAAGCACTTACCTACTATGCCATCTTCCCAACGCTATGAGAAGTAATTTTTAAAAACATTTGCTG
ATTTTACATG
TGCATCTGTGTGTGTGTATGTAAGGAGTGCACAGGGGCAATGCACACTGCTCACGGCATGTGTGCAAAGA
ACAGAGAACA
ACTTGCAGGAGCTGGCTCTCGCCTGCAACCGCGTGGCCTCTAGGGCTTGAACTCAGGTCGTTGATCTTGG
TGGCAAGCAT
CTTTGCCTCCTGAGCCAGCTCCCATAAAAAGTAATGTGTGTAATATGCTTAAAAGAATCAGCCAATACCT
TGTGTTATTA
CTAACAGTCAATGAGTAGTTGTTGCTATTCGCCCATTCTTTGATCATGGAGACTTCTATTCCTGGACCTA
GAAATGGGGC
AAGAAGAGGCGTAGACATGGTAGTATATATACCTGTAATCCTAGCATTAGGGAGGCTGAGGCAGGAGTTC
TGGGCTAGCC
TGTGTTACACAGAGGAATTCTGCTTTACTCTTCACCCTAAAGCATGGACATAAAGGAGCAGACACTGTCT
TTGCACTCCT
GCACAATCGAGTGTTTTCTCAGGGGGGAAATGCATGCACCCAAGGTCCTTGTCTCCTTCCATCCCTTCCG
GGGCTGCCTC
AGGGCCCTCAGATTTCTTCTCTCCAGTCTTATGGAGTCAGGCAGGAAAGGTCAGGCTCTAGGATAGGGAA
CCAACAAGAG
ACCCTCCAAAAGGCTCTGCCCACTTTGGCTTCCATGTTCGGCAGCCCCCGTTTTGCCTTTTTCCTTCCTG
GCTTAGGGCC
AGTTTCTCATTTGCCCTAAACTCGTCCCTGAGTGAGGGAGGGCAGAGTAAGAGAATCAGGAAGCCTGATG
CTGTGTTCCT
GCATTCTCAGGCTCAGGTCCGTCCTCGGTGTGGGCGCCTTGTGGCCTTCAGCTCTGGTGGTGAGAATCCC
CATGGTGTAT
GGGCTGTGACTCGGGGACGGCGCTGTGCCCTAGCACTGTGGCACACGTGGGCACCTGAGCACAGTGAACA
GGTAGGGAGG
AAGAGGTGAGGGGGTGGGGGGGTGGGCAGGTGGGCAGGTGGTCACTGGAGAAGTTTGTGAGTGGGTGAGA
CCCCCAGCAA
AGCGCTTGGGTGACTGAGATAGCCCAGGGGTGGTCACTGGAGAAGTTTGTGAGTGGGTGAGACCCCCAGC
AAAGCTTGGG
TGACTGAGATAGCCCAGGAGAGGTGGATAAAGGAGACTGAACCACACTTCTCCCCATTCCCAGGAGTGGA
CAGAAGCCAA
AGAGCTGCTGCAGGAGGAAGAGGAGGAAGAAGAGGAGGAAGACATTCTCAGCAGAGACCCTTCCCCAGAA
CCCCCAAGTC
ACAAGCTTCAGCGAGTCCAGGAGAAAGCTGGGAAGCCCCGCCGGGTCCGGGTCCGAGAGGAACTGTGAAT
GGCTGAGCCT
GCTTCTCAGGATCAGGCCACTCAACTTGGGAAGGAACTGATGAGAAGGCTCTGGAGGATATCAGGAACAT
AGTAGCATGC
CAAGTCTACCATCTCGGGGACTTACAAGGGCTACCAGACCCTGGACTCACAAGCTTGCTACACAGACTTA
GCCTACAGCA
CATCAGGCCCGGGAGCCAGGTCTGGCCCCAGCTGAGGGACCTGCAAGGTCCCCAGGACAGACAAAAATCA
CTATGCCTCC
CTGAAAGGCAGGCATGTGGAGGAGTGCAGAGCAACTGCTTCTAATAAGAAACACACAGAGGGGCTGGAGA
GATGGCTCAG
CGGTTAAGAGCACTGACTGCTCTTCTGAGGTCCAGAGTTCAAATCCCAGCAACCACATGGTGGCTCACAA
CCATCCGTAA
TGAGATCTGGCGCCCTCTTCTGGGGCGTCTGAGGACAGCAACAGTGTACTTACATATAATAAATAAATAA
ATCTTTAAAA
AAAGAAAAAGAAAAGAAAAGAAACACACAGAGGAGACAGTCCCATCCTCT
>short sequence, 1500 bases
tccggcgcccgaaaggaaagggtggcgctgcgctccggggtgcacgagcc
gacagcgcccgaccccaacgggccggccccgccagcgccgctaccgccct
gcccccgggcgagcgggatgggcgggagtggagtggcgggtggagggtgg
agacgtcctggcccccgccccgcgtgcacccccaggggaggccgagcccg
ccgcccggccccgcgcaggccccgcccgggactcccctgcggtccaggcc
gcgccccgggctccgcgccagccaatgagcgccgcccggccgggcgtgcc
cccgcgccccaagtataaaccctggcgcgctcgcggcccggcactcttct
ggtccccacagactcagagagaacccaccatggtgctgtctcctgccgac
aagaccaacgtcaaggccgcctggggtaaggtcggcgcgcacgctggcga
gtatggtgcggaggccctggagaggtgaggctccctcccctgctccgacc
cgggctcctcgcccgcccggacccacaggccaccctcaaccgtcctggcc
ccggacccaaaccccacccctcactctgcttctccccgcaggatgttcct
gtccttccccaccaccaagacctacttcccgcacttcgacctgagccacg
gctctgcccaggttaagggccacggcaagaaggtggccgacgcgctgacc
aacgccgtggcgcacgtggacgacatgcccaacgcgctgtccgccctgag
cgacctgcacgcgcacaagcttcgggtggacccggtcaacttcaaggtga
gcggcgggccgggagcgatctgggtcgaggggcgagatggcgccttcctc
gcagggcagaggatcacgcgggttgcgggaggtgtagcgcaggcggcggc
tgcgggcctgggccctcggccccactgaccctcttctctgcacagctcct
aagccactgcctgctggtgaccctggccgcccacctccccgccgagttca
cccctgcggtgcacgcctccctggacaagttcctggcttctgtgagcacc
gtgctgacctccaaataccgttaagctggagcctcggtggccatgcttct
tgccccttgggcctccccccagcccctcctccccttcctgcacccgtacc
cccgtggtctttgaataaagtctgagtgggcggcagcctgtgtgtgcctg
agttttttccctcagcaaacgtgccaggcatgggcgtggacagcagctgg
gacacacatggctagaacctctctgcagctggatagggtaggaaaaggca
ggggcgggaggaggggatggaggagggaaagtggagccaccgcgaagtcc
agctggaaaaacgctggaccctagagtgctttgaggatgcatttgctctt
tcccgagttttattcccagacttttcagattcaatgcaggtttgctgaaa
taatgaatttatccatctttacgtttctgggcactcttgtgccaagaact