Practical 10 – Gene prediction

Gene prediction programs: GENSCAN, HMMgene, GeneMark, Glimmer, GRAIL, GrailEXP, GENVIEW, Genie, GeneFinder, MZEF, FGENESH, NETGENE2

Other useful programs: RepeatMasker (masks repeats)

1. For sequence #1 in the appendix, perform gene prediction following these steps:
o Use one of the methods from the list (the instructor will assign it in class) and answer: How many exons does the prediction have? What are their probabilities?
o If possible, enable the signal prediction option (donor, acceptor, promoter, etc.) and run the program again.
o Apply RepeatMasker to the sequence and run the program again. Have the results changed?
2. Repeat the previous exercise with sequence #2, but with a different program (use the one assigned to the group on your left). Take the predicted exons and concatenate them, then search NCBI for existing proteins or ESTs that corroborate the prediction.
3. Given the sequence HS307871 in the appendix, predict its genes, this time using the program assigned to the group in front of you. Look up the sequence in NCBI and compare your results with the annotated exons and introns.
4. Repeat the previous exercise, this time using GRAILEXP with the “Perceval Exon Candidates” option. Then run it again, replacing that option with “Galahad EST/mRNA/cDNA Alignments” and enabling “Gawain Gene Models” to predict the coding sequence.
5. A bit of manual work… Take the “short sequence” from the appendix and carry out the following steps:
o Determine the start and end of the gene using Transcriptional Start Site Finder.
o Look for evidence of a polyA signal using Poly A Signal Predictor.
o Plot the data obtained and predict in which direction the gene will be transcribed.
o Repeat these two steps for the reverse-complement sequence (http://www.dnalc.org/bioinformatics/dnalc_nucleotide_analyzer.htm#permutator).
o Use the Splice site prediction program with a donor score cutoff of 0.88 and an acceptor score cutoff of 0.94 (donor site: at the start of the intron; acceptor site: at the end of the intron).
o With all the results obtained so far, plot the different alternatives for the gene. Locate the start and stop codons.
o Identify the possible sequences encoded by this gene and run a search in NCBI.

Appendix – Sequences

Sequence #1
>gi|2789671|gb|AF040714.1|AF040714 Homo sapiens
ATGCCAGGCCCCCCACCAGCCACGTTGGGGCAGCCCCCACAGCTCCCGGCCTTCGGGCCAAGGTGTCGGG GTGCGTCTCCTGGCCCATCAATACAGATTACATATTTATATCAATCGCGGGCTCTGAGGGCGCCCTCGGA GAGCGGCCCCGCGCCTACGAAACCAAACTGGGAGTGGTCGCGCGGAAACTCTGGCTCGGGATTGGCTGCG GGCGCCCGCCGCGGTGCGGGGGGATTGCTAATCGTATTCAGCATGTTTTGCACAAGAAATGTCAGCCAGA AAGGGCTATCTGCTCCCTTCGCCAAATTATCCCACAACAATGTCATGCTCGGAGAGCCCCGCCGCGAACT CTTTTTTGGTCGACTCGCTCATCAGCTCGGGCAGAGGCGAGGCAGGCGGCGGTGGTGGTGGCGCGGGGGG CGGCGGCGGTGGCGGTTACTACGCCCACGGCGGGGTCTACCTGCCGCCCGCCGCCGACCTGCCATACGGG CTGCAGAGCTGCGGGCTCTTCCCCACGCTGGGCGGCAAGCGCAATGAGGCAGCGTCGCCGGGCAGCGGTG GCGGTGGCGGGGGTCTAGGTCCCGGGGCGCACGGCTACGGGCCCTCGCCCATAGACCTGTGGCTAGACGC GCCCCGGTCTTGCCGGATGGAGCCGCCTGACGGGCCGCCGCCGCCGCCCCAGCAGCAGCCGCCGCCCCCG CCGCAACCACCCCAGCCAGCGCCGCAGGCCACCTCGTGCTCTTTCGCGCAGAACATCAAAGAAGAGAGCT CCTACTGCCTCTACGACTCGGCGGACAAATGCCCCAAAGTCTCGGCCACCGCCGCCGAACTGGCTCCCTT CCCGCGGGGCCCGCCGCCCGACGGCTGCGCCCTGGGCACCTCCAGCGGGGTGCCAGTGCCTGGCTACTTC CGCCTTTCTCAGGCCTACGGCACCGCCAAGGGCTATGGCAGCGGCGGCGGCGGCGCGCAGCAACTCGGGG CTGGCCCGTTCCCCGCGCAGCCCCCGGGGCGCGGTTTCGATCTCCCGCCCGCGCTAGCCTCCGGCTCGGC CGATGCGGCCCGGAAGGAGCGAGCCCTCGATTCGCCGCCGCCCCCCACGCTGGCTTGCGGCAGCGGCGGG GGCTCGCAGGGCGACGAGGAGGCGCACGCGTCGTCCTCGGCCGCGGAGGAGCTCTCCCCGGCCCCTTCCG AGAGCAGCAAAGCCTCGCCGGAGAAGGATTCCCTGGGTAAGCAGGGCTGCAGAGGGCTGCAGTCAGGCGG GCAGACAGGCAGACACAAGGAGGAGAAGGATCAGAAAACTAGGAGCCCGCGCAGCAGCCGGCCGGCCTTG GCCCAAGCTGCAGGCAGGCTGACCTTGTGAACTTGCTTTTTAATATTTGGGCGTGGGGGCGCAGTAAAAT TCATGTCCGGCTTAGCGCCCCACAGCAAGACGTCCTCGGCGCTGGCCTCAGCTCCCCCTGACTAGGGACG
AGGACACCAGCGAGCAGGCCCCCTCCTGTGCGCTCTTTCCTGTGGCCGGGAGGACCCAGAGCCCTGGTCC CTGCCCAGCCTGCGCGGCGCGGCCCACGCGGGGGGAGGGGGAGGGAGGGAAAGTAGCTCGCCCGCAGATA GCGCGGATGTTTGTAAGGCATCCAAAATAAGCAGCCGCCAGCGCCAATAAATAAGCCCATTAACCGGCGA AGTTCGAGTGTACGATCCCCCATGCTTTTTTCAAAGTTGCTGAGGGGCGGGAATCTTCGTGGCGGGAAGA AGAAAAGGCAAATCCGGCCTGGAAGCGGGGGGCCCTGAGCTGAGAGCCAGAGAAGGGCCATTTCCCTTCC CCTGGACCTCGGAATCGCCCAGCTATGTATCCTGGCTCCTGGAGAAACTTGAGGGAGGGCCCTTGACCCC CGAATCGGTTTTTCCTGCCTTCCCCATTGGACCAATGATGCCCTTCTTTCTCCCCTTATCGAGTCTTGGG CAATCAGGGCCCTGGGGTGAGACAGCCAAGCTGCCTGGCCCATCTTCCAAGTAAGCACCCCGCGCTCCTA GCCTGGGGGCTACAGGAAATGCTTGTCTGCCATATGGCAAGAGGCAAAGAAAAGCGTTAAGTTCAAGATG TACAGCCTGCCCTCCCAGGCCTTTCCTTCTGCAAGCATCTACGGCTTAGCGCTAAAACAGGTGTTTGGAA AAGTGGGGGAAATGTAAATTGGAAGGGTCATGTAGATTGAAGGCCCACTCAATTTTTGTCATGACTTATG GAGGAACTGCTTGCTCTCAGCAAGCCAAAAACGGGGGCACGACTCTCTTCTCTGTGACTTGGGACATCTC TCTTATGGGAGAAACGGAGGCAATTCACCCCCGCGGGCAGCCCGTGTGGCCTCGACTTAATCATCCCCTC TTTATTCTCTTACATGCCAGGCAATTCCAAAGGTGAAAACGCAGCCAACTGGCTCACGGCAAAGAGTGGT CGGAAGAAGCGCTGCCCCTACACGAAGCACCAGACACTGGAGCTGGAGAAGGAGTTTCTGTTCAATATGT ACCTTACTCGAGAGCGGCGCCTAGAGATTAGCCGCAGCGTCCACCTCACGGACAGACAAGTGAAAATCTG GTTTCAGAACCGCAGGATGAAACTGAAGAAAATGAATCGAGAAAACCGGATCCGGGAGCTCACAGCCAAC TTTAATTTTTCCTGATGAATCTCCAGGCGAC Sequence #2 >gi|2739430|gb|U70368.1|MMU70368 Mus musculus hematopoietic-specific IL-2 deubiquitinating enzyme (DUB-2) gene, complete cds GGAAGGAAAACCAGACCTAGGCTGCTTATACTGGTTCTGTGTGGTTAGCAAGGTAACAGAAACTCTTGTA TGGCATGTGTAGTCATCTATTTGACATGATTTTGTAACTTTATTCCAAGTAAAACCCAAGCTTAAGACAC CTAGGAAATTGGAGCTAAATTCAGGGAAATGCACTCCAATAATGTGACATTTCTGAGCTGCTTTGCAGAA ACCACACCCAAATTGGGAGAAGCTTGTCTGGGATTGGCTGTCCTTGGAAGACTGTAGGCGTGGTCACAAG ACTGGAGTATAAAAGACTGAGCATTTGTCCTCACTTGCAGAGATTCTCTGGAGGGAAAGACTTCCTTCTG CTCCCTTAGAAGACTCCAGCAAGTTATTTGAAGAGGTCTTTGGAGACATGGTGGTTTCTCTTTCCTTCCC AGAAGGTAAGTCTCACTGTAAGGTCTTTATGTCTTGTGTGTCCCCCAGCAGCCTTGTCATCTCCGGCTGC CCTAGACCTGCATAAGGACAGATTGAGTGTGCTGGGATAGACTTTTGTTGACAAAGGGGCTGCTCTGCCC 
TTCTAAGAGGTTGAGTCTCATCATAAGGCCTTTTGCAGCTTGCATGTGTAGTGCCAGGAAAGAGTAGTCA TCCCCCAAAACCAGACAGGAACTGACGAGATGCAATCACTGTGTGGACTTTTTACCAGCTAGCTAGGGCA CTACCATGAGCCACTGTCTAGCAGGGAGGCTTTGGGGATGGTGTGCCCCGAATATCTCTCAGGGTAAGAG TTTACAGTAAGCAGCAAGCAGAGGGGTGTGGGTGAGTGTGCAAGTATCTAATTGGCTAGTTTTTGTGGCC TGTAACATATTGGTGGGTGTTGGGAGTCATAAGCTAAATGTTTGCTTTCCTCTGCATTGGTGGTCATTAG GGAGGGGGCAGATTATGAACCTAGGTTGCAGATCTGTTGGAGTAATAACAAGACACTGGTCTTGTTGGGG GTATAACCTAGAGACTCGATTTATGTTCATGTTTGGTTTGGGATGGGTTTTATGTGAGTGTTTTCTTTTT TGGGGAGGGGGTCGGTTAACTTGGAAAGTAATGCTAGGTACTGTCCTGTTCATTTCCCTGAGGTGAAAGT TAGGTCAGGTTTTCTAGAATGGAGTCTGAAGGTAAAARATTTGGCCACTGGCATGCCCTAAAGTCTTTTT GTGTTCTTGTCCCCTAGCAGATCCAGCCCTATCATCTCCTGGTGCCCAACAGCTGCATCAGGATGAAGCT CAGGTAGTGGTGGAGCTAACTGCCAATGACAAGCCCAGTCTGAGTTGGGAATGTCCCCAAGGACCAGGAT GCGGGCTTCAGAACACAGGCAACAGCTGCTACCTGAATGCAGCCCTGCAGTGCTTGACACACACACCACC TCTAGCTGACTACATGCTGTCCCAGGAGTACAGTCAAACCTGTTGTTCCCCAGAAGGCTGTAAGATGTGT GCTATGGAAGCCCATGTAACCCAGAGTCTCCTGCACTCTCACTCGGGGGATGTCATGAAGCCCTCCCAGA TTTTGACCTCTGCCTTCCACAAGCACCAGCAGGAAGATGCCCATGAGTTTCTCATGTTCACCTTGGAAAC AATGCATGAATCCTGCCTTCAAGTGCACAGACAATCAGAACCCACCTCTGAGGACAGCTCACCCATTCAT GACATATTTGGAGGCTTGTGGAGGTCTCAGATCAAGTGTCTCCATTGCCAGGGTACCTCAGATACATATG ATCGCTTCCTGGATGTCCCCCTGGATATCAGCTCAGCTCAGAGTGTAAATCAAGCCTTGTGGGATACAGA GAAGTCAGAAGAGCTACGTGGAGAGAATGCCTACTACTGTGGTAGGTGTAGACAGAAGATGCCAGCTTCC AAGACCCTGCATATTCATAGTGCCCCAAAGGTACTCCTGCTAGTGTTAAAGCGCTTCTCGGCCTTCATGG GTAACAAGTTGGACAGAAAAGTAAGCTACCCAGAGTTCCTTGACCTGAAGCCATACCTGTCCCAGCCTAC TGGAGGACCTTTGCCTTATGCCCTCTATGCTGTCCTGGTCCATGAAGGTGCGACTTGTCACAGTGGACAT TACTTCTCTTATGTCAAAGCCAGACATGGGGCATGGTACAAGATGGATGATACTAAGGTCACCAGCTGCG ATGTGACTTCTGTCCTGAATGAGAATGCCTATGTGCTCTTCTATGTGCAGCAGACTGACCTCAAACAGGT CAGTATTGACATGCCAGAGGGCAGAGTACATGAGGTTCTCGACCCTGAATACCAGCTGAAGAAATCCCGG AGAAAAAAGCATAAGAAGAAAAGCCCTTGCACAGAAGATGCGGGAGAGCCCTGCAAAAACAGGGAGAAGA GAGCAACCAAAGAAACCTCCTTAGGGGAGGGGAAAGTGCYTCAGGAAAAGAACCACAAGAAAGCTGGGCA GAAACATGAGAATACCAAACTTGTGCCTCAGGAACAGAACCACCAGAAACTTGGGCAGAAACACAGGATC 
AATGAAATCTTGCCTCAGGAACAGAACCACCAGAAAGCTGGGCAGAGCCTCAGGAACACGGAAGGTGAAC TTGATCTGCCTGCTGATGCAATTGTGATTCACCTGCTCAGATCCACAGAAAACTGGGGCAGGGATGCTCC AGACAAGGAGAATCAACCCTGGCACAATGCTGACAGGCTCCTCACCTCTCAGGACCCTGTGAACACTGGG CAGCTCTGTAGACAGGAAGGAAGACGAAGATCAAAGAAGGGGAAGAACAAGAACAAGCAAGGGCAGAGGC TTCTGCTTGTTTGCTAGTGTTCACTCACCCACTCACACAGGCTCCTGTGGACACCCTGCCAACCCAAGGT GCCTGGAACAAGAGGTTTGGACCTCTGTCCCAGGCAGGGACAATGCCTCACCCTTCATGTGGGGTCCACC TATCCTCTGGGCCCTTGCCTGTTTTTACTGACTGACTCTCTGAGAATGGTCATTTGAATGTGGAAAAAAA ATGCCCAGGGTGTTGCTACAGGTTAAAGACAGGAAAGCTGGACAGTCAGGGGAGGTCTGCATAGCCTCTC CTGCAACTCATGGGATCTGAGTAGCGTAGAGACTAAATCACCACACTGGAGCTTTCTTTACTTTGCTTTC CTTTTTTTTTAATTTATTTTTTGTTATTAGATATTTTCTTTATTTACATTTCAAATGCTATCCCAAAAGT TCCCTATACCCTCCCCCCCCCCGCTAACCTACCCACCCA >HS307871 AAGCTTCGTAAGCACCTCTCGCGGCACGAAAGCCAGCGCTGCCTAGGCGCCGCCCGGCGC GAGGCTCTCACCTCTGCCAAGAAGCGCACCGGCCCAGCAGCTGCCGGGGGGACTCCAGCA CCGCGCCGGGCCATGGACCCGCCATGAGTCAGCTGGCGCGACCGCGGACAGAGCTTCCCA CCACGCCCTTCCCCGCCTTTGGCCAGCCTTTGCCGTATGTTCTGGACTAAGCGCACCCCA GCTCTCACTGTATTGGACTGTGTACTCCCACACTCAACCATATTACTTATCTCTGTGCCA CCCTAACCCAGCCGACCAAACCCAAGATTGGTGATTGCTACCTGATCAATCTCCCTCTCT CCATTTCCTTGTGACTACCATTTTATCTCTACTGCTACTACCCTCATTCAAGTCACCATT CTAGCTAGCCTGGGTCATTGCCAACAGTCATTTTTCTGGTTCTTCGGCCTGCTGTTTTTC CTCCCACTCCCAGCGAATCTGCTGGACTCCCTATCCTATGGGTGGTGTGATTAAAGTGTT TGAGACAATGGCCCCTTCCCCTGCCACTGACAGGAGTCTTGAGTCATTAGGGTTGAGTTC TGTTTGACACTCCTAATCCCAAGGACACTGGAGATCATTATTCATTTTAATGTGATTGCT GATTTCTGTTTCCCCAGTCTTGTAGCTCCTTAAAGGCTGGGGTGTCTTGAGCAGAGCTAA CCTCTGCACCTACTATAGGTCCAGGCTATAGTATGGACCTGGCTGGATAAGACTGTTGGT ATCATAGTTGGGACTTGCGCCAAGCTCCGGATACCCAGACTGTCAGATGAGAACAAATTC CTCATGTCACCGTAAGATACATTTACAGCGGAGTTTTCTTTTGGGCCTTTGTTGTTTCGT CGCTACAGCAAACTTTACGGTGAAAAAAGGTAGGGGTCTACGGCAGCAGCAGGGCAGCCC TGGAGCTGTCGCTGGAGTCCGATCATGTGATCTTCAACATGGCGACGCTCTTGGTTCCCT ACAGAAAGGGGCGGAGCCTGGACTGGGGGGCAGGCTCAGATTCAGGTTAAATTGTGGATT GAGCTCGCAGTTACAGACAGCTGACCATGGAAGCGAATGGGTTGGGGTGAGTTCTCCAGA GCACGCGGTGTGGCTAGCCGGGCTTCTAATTTGAGTCTTCCAACTCAGGACTCTATCCCT 
CTACTCCCCTTTCCCCACCCTGGAGAACCTCCCAACCTGAACTCCGTTAGCTGGATCCTG AATCCTAAAACCATGGATTTTTGAGATGTTCATCCCAGGGCCTTAATTCAAGGGATGCCT CAGGATTTCCAACCAGGATCTTCATTCTGGGACCATCAACTCTGATCCCTCTTTATCCCC CAGCCTGGGTATTTCTCAGCCCCTGAACCAGCCCAGTGACATTTCCCGGTTTCTGAGGCT CACTAGTTCGAAGACCCCCAAACTATCCTTAGTGGGCCTTCATTCCCTCCCCCCAGTCCC TCTGGTTGCTTCGAGCTTGGAAGAGTAGAGACTAAGTGGAGGGAAGAGGCCCCAGGGCGG GCCCTTCTGGAGTTTGTGCACTGATAGGCAGAGAGGAGGCGGAACGGGCGGAAAGCCAGG GTTTGGGAGCTGGCCTGGAGGAGGTAGGATAGCGGTCCTGGACTGAATCGGCCTTATGAA CCCGCGCTTTCCCCAGCCGTCCAACGTAGCATACTGACACCTACCCCCACCCCCACCTGA TCGCCAGACCTCAGGGTTTTCCGGAGCTGAAGAATGACACATTCCTGCGAGCAGCCTGGG GAGAGGAAACAGACTACACTCCCGTTTGGTGCATGCGCCAGGCAGGCCGTTACTTACCAG GTAAGAGTCAGGGTCTGGAAATCTAGATAAAACTCCGGAGGAGAAAAGTTTTCGAGGGGC AGGGGAGGGCTCTGGAGGGCCTCAAGGCTGAGCCCTGTCTTCCCTCTGTATGCAGAGTTT AGGGAAACCCGGGCTGCCCAGGACTTTTTCAGCACGTGTCGCTCTCCTGAGGCCTGCTGT GAACTGACTCTGCAGGTGAGGGGTCCACAAAAGAGGGAAAGATTTATGCCTTCAGTCTGC CACCTAGCAACCTGTCTCCTGTTTCCTACAGCCACTGCGTCGCTTCCTTCTGGATGCTGC CATCATTTTCTCCGACATCCTTGTTGTACCCCAGGTACCCACTCAAACCTGATCCTAGAA TATAATCCAAGGACGCCTTGAAAATCCTTCTATCAGTCCAGTCAAGGTTTACAATAAGCA CTTATCCTAACTGGATCGAGGGAAAAACTAAGGTTGAAAGAAATGGAGTTTGGCAGAGTT TTATTCTCCTTTTCCTTCCTCCTGGAATGAGCTGAACAGAACCTTTCCTCCTGGATTCCA TTTTGGGAACCCAGATGTTTTCTCCCCCTCCAGGCACTGGGCATGGAGGTGACCATGGTA CCTGGCAAAGGACCCAGCTTCCCAGAGCCATTAAGAGAAGAGCAGGACCTAGAACGCCTA CGGGATCCAGAAGTGGTAGCCTCTGAGCTAGGCTATGTGTTCCAAGCCATCACCCTTACC CGACAACGACTGGCTGGACGTGTGCCGCTGATTGGCTTTGCTGGTGCCCCAGTAATGTGG GACAGGGCAGGGACTCGGGGCGCGGGGAGATCACTCTGGAAGGTCTGGGGTAGACAAAAG GAAGGGTCAGTCTGGCTTCTGTGACACCATCTTTCTATCCTTCTCTAGTGGACCCTGATG ACATACATGGTTGAGGGTGGTGGCTCAAGCACCATGGCTCAGGCCAAGCGCTGGCTCTAT CAGAGACCTCAGGCTAGTCACCAGCTGCTTCGCATCCTCACTGATGCTCTGGTCCCATAT CTGGTAGGACAAGTGGTGGCTGGTGCCCAGGTGAGTCCTGAGAGAGAGAGAAATAGGCTG GGATTTGGTCTGTAAGGCCGAGAAGCAAGAGTGTCCTAAACCTGAGAGGGCAGGGGTCTT AATGCTAGGGATGAAAGAACCTTGGCCTCCAGTGATCTAGCTGAGCAGCCAAGCCCATCC TGACACTGACAGTGGGGCTTAATGCTCTAAGTATTCAGACACCAAAGTTAGTGCTGGGAT 
CTGAGGAAAGTAAATTTTTTTTTTTTTAATTACTGGGTTTTTAGGGTCAGGCAGTATCAG GGATTGAAGTCATTTGGGGAAAATTGAGGTGGATTTTGTATGTGGGGGAAACTTCCTCTT TGTGTGTTACATATTTTTCTTCACCATACCCTAACTAGGCATTGCAGCTGTTTGAGTCCC ATGCAGGGCATCTTGGCCCACAGCTCTTCAACAAGTTTGCACTGCCTTACATCCGTGATG TGGCCAAGCAAGTGAAGGCCAGGTTGCGGGAGGCAGGCCTGGCACCAGTGCCCATGGTGA GGATTGGGATGGGTTGAGTGAAGGTGGTCCTGTGGAGCTTTCAGGCTAAGTCCTGCATGG ACTGGAGTGACCACTGGAGGGCAGCAGAAGTACAGTCAAGAAAGATTAGTGGTTGTAGCA AGGCCCTCTGTAGCCTGAGATCTGCTTTTTTCTAGATCATCTTTGCTAAGGATGGGCATT TTGCCCTGGAGGAGCTGGCCCAAGCTGGCTATGAGGTGGTTGGGCTTGACTGGACAGTGG CCCCAAAGAAAGCCCGGTAAGCCATGGAAGGGTGAGGCCTTGAGGTTGAGGTGGGGGTGT TGGCTGGGGGAGCTGCCATGTATGCAGTTACCAGAACGTGGCGCTGGCTTTGCTTCCAGG GAGTGTGTGGGGAAGACGGTGACATTGCAGGGCAACCTGGACCCCTGTGCCTTGTATGCA TCTGAGGTAACAGCCAGGGCCCCTCTGTGTGTCCTGTTACTGTGCACTCCTGTGGCCTGT GGTTGTATTATTCTGTGTGCACTTGTTTTTAATGTCTGTCTGTCCTTTTCTTCTCATCTG TACACCATAAGCCCTAGAAAGACCGGACTTTTTGTTGCTGTTGTTCATTTGTGTTTATGC TTCATGCCTGGGTCCATACTAGGGATCTATAAATTTTATTGAATGACTGAATAACACTGA GTTAGAAGCATGCCTACCATATGCGTTTCTACTAGTATATATAGGGAGGACAAAGGCTTG CTGGTCCTCCTGTAGCCAGTGCCCTGTTGGTCCCCCAGGAGGAGATCGGGCAGTTGGTGA AGCAGATGCTGGATGACTTTGGACCACATCGCTACATTGCCAACTTGGGCCATGGGCTTT ATCCTGACATGGACCCAGAACATGTGGGCGCCTTTGTGGATGCTGTGCATAAACACTCAC GTCTGCTTCGACAGAACTGACCGTATACCTTTACCCTCAAGTACCACTAACACAGATGAT TGATCGTTTCCAGGACAATAAAAGTTTCGGAGTTGAACCTATTGTGTAGTTTTGTTTGTG AAAGATTGTCCCATATCCTCAGTTCTTCTTAGCCTCTGTCTCCTTCCCTGGGACCCTCTC ATATCCTCTTATAG >unknown AGGTCGACTGAACCCCACAGGTGATCTCTAAGTGGTGTGCCCCCCACCCCCCCGTCTTCATGGTACGCCT TACCTCCTAA GGGTTGTCGAGCATAGCTAGGTGAAGGATGTACACTTGGAGTTTAAACTATTGAGGAAGCCGAGGTTGGG GGAGTTCAAA GCCAGCCTGAACAATGTACCAAGACATCTTCTAACAAAACAAAACACCGGCTGGTGAGGTACCTCAGTGG GTAAAGGTGC GTAGCCCTAAGCCTGATAACCGGAGTTTGCTCTCTCTAGAACTGACGTGGGAGAAGAGAAGCTGTCTCTA CAGTCCTCCT CTGACCTCCACACCATGCTGCAACATTCACCCCCAGCCCCAACGAGAGTAGTAAAAACTCAAAACAAAAC AAACAAAACA GGGAGGGACTGGAGAGATGGCTCAGTGGTTAGGGCCACCAGGCTGCTCTTCTGGAGGACATCCACATTCA CAACCACCTC TGACTCTCGTTCCAGAGGATCTAACATCTTCTTCCAGCCTCTACGTAGAGGCACCAGGAATGCGTGGCGC 
ACACAGATGT ACATGGGGACAAAACATGCACATAAAATACAGTAATAAGCCGGGCAGTGGTGGCACATGCATTCAGGAGG CAGAGGCCAG CCTGGTCTACAGAGTGAGTTCCAGGACAGCCAGGGATACACAGAGAAGCCCTGCCTCAAAAAACCAACAA CAACATAAAA ATTAATAAAAAATTTTTTTGATTTACTTTATTTATATGAGAACACTGTCACTGTCTTCAGACACCAGGAA AGGACATCAG ACCCCATTACAGATGGTTGTGAGTCACCATGTGGTTTTTAACCACTGAGCCATCTCTCCAGCCCTATATA TATATTTTTT TTTAAGATTTATTTATTTATTTATTATATGTGTGTATGCTGTAGCTGTCTTCAGACACTCCAGAAGAGGG CATCAGATTT TTGTTACAGATGGTTGTGAGCCACCATGTGGTTGCTGGGATTTGAACTCAGGACCTTTGGAAGAGCAGTC GGTGCTCTTA ACCGCTGAGCCATCTCACCAGCCCCAAAATATATTAAAACAACAACAACAAGAGAGTGTGAAACACAGCC TCTGGGGCCC CCCACAGAGTCCTGTGTCCCTATTCTAAGGATCTGACAATTTAACCCTACCTCCTCCATGGTGAGGCCCC AGTGGAGCTA GGGGCATAGGCACAGACAGGACCATTGGACTAGAGTTTATATTGGGGTTCTTAGCACTTCTGAGACTTCC TTTCCTAACT AAGGGTGACAATAGTACCTATTATTGTTGACACTGGTATTATTTTTATTGTTGTTTGTTGTTATTTACAG AGTCTTGCCA CATAGCCAGGTTAGCCTTGCTAGTACTAGCCCCTGAGTGTCTCTGATACTATACTCTTCTGTGTGGGTCT CACTGTGTAG CCAGGGCTGCTTGGAACTTACTATGTAGACCAGGATGGCTTTGAACCACTGAGACCTGACTTTTTTTAAC CCGCTAAGTG CTGGGATTAAAGGCGTGTGCTGCCATGCCTGGCTTTGCACACACACACATACTGTGTGGCGTGCAGGTGT GTGCACATGC TAGGACATGTGCATGTCACAGTGCACATGTGGCAGTCAGAACACAACTTCAGGTGTCATCCTCCTGTCTC TGCTTTACAT CTGGACCTAGGAGAGTTGGGATTGCAGATAGGAGCAACTTCTGTGGGTTCTGGGGATCCGAACTCGGGCC CTCAGACTTG CTCAGAAAGCACTTACCTACTATGCCATCTTCCCAACGCTATGAGAAGTAATTTTTAAAAACATTTGCTG ATTTTACATG TGCATCTGTGTGTGTGTATGTAAGGAGTGCACAGGGGCAATGCACACTGCTCACGGCATGTGTGCAAAGA ACAGAGAACA ACTTGCAGGAGCTGGCTCTCGCCTGCAACCGCGTGGCCTCTAGGGCTTGAACTCAGGTCGTTGATCTTGG TGGCAAGCAT CTTTGCCTCCTGAGCCAGCTCCCATAAAAAGTAATGTGTGTAATATGCTTAAAAGAATCAGCCAATACCT TGTGTTATTA CTAACAGTCAATGAGTAGTTGTTGCTATTCGCCCATTCTTTGATCATGGAGACTTCTATTCCTGGACCTA GAAATGGGGC AAGAAGAGGCGTAGACATGGTAGTATATATACCTGTAATCCTAGCATTAGGGAGGCTGAGGCAGGAGTTC TGGGCTAGCC TGTGTTACACAGAGGAATTCTGCTTTACTCTTCACCCTAAAGCATGGACATAAAGGAGCAGACACTGTCT TTGCACTCCT GCACAATCGAGTGTTTTCTCAGGGGGGAAATGCATGCACCCAAGGTCCTTGTCTCCTTCCATCCCTTCCG GGGCTGCCTC AGGGCCCTCAGATTTCTTCTCTCCAGTCTTATGGAGTCAGGCAGGAAAGGTCAGGCTCTAGGATAGGGAA CCAACAAGAG 
ACCCTCCAAAAGGCTCTGCCCACTTTGGCTTCCATGTTCGGCAGCCCCCGTTTTGCCTTTTTCCTTCCTG GCTTAGGGCC AGTTTCTCATTTGCCCTAAACTCGTCCCTGAGTGAGGGAGGGCAGAGTAAGAGAATCAGGAAGCCTGATG CTGTGTTCCT GCATTCTCAGGCTCAGGTCCGTCCTCGGTGTGGGCGCCTTGTGGCCTTCAGCTCTGGTGGTGAGAATCCC CATGGTGTAT GGGCTGTGACTCGGGGACGGCGCTGTGCCCTAGCACTGTGGCACACGTGGGCACCTGAGCACAGTGAACA GGTAGGGAGG AAGAGGTGAGGGGGTGGGGGGGTGGGCAGGTGGGCAGGTGGTCACTGGAGAAGTTTGTGAGTGGGTGAGA CCCCCAGCAA AGCGCTTGGGTGACTGAGATAGCCCAGGGGTGGTCACTGGAGAAGTTTGTGAGTGGGTGAGACCCCCAGC AAAGCTTGGG TGACTGAGATAGCCCAGGAGAGGTGGATAAAGGAGACTGAACCACACTTCTCCCCATTCCCAGGAGTGGA CAGAAGCCAA AGAGCTGCTGCAGGAGGAAGAGGAGGAAGAAGAGGAGGAAGACATTCTCAGCAGAGACCCTTCCCCAGAA CCCCCAAGTC ACAAGCTTCAGCGAGTCCAGGAGAAAGCTGGGAAGCCCCGCCGGGTCCGGGTCCGAGAGGAACTGTGAAT GGCTGAGCCT GCTTCTCAGGATCAGGCCACTCAACTTGGGAAGGAACTGATGAGAAGGCTCTGGAGGATATCAGGAACAT AGTAGCATGC CAAGTCTACCATCTCGGGGACTTACAAGGGCTACCAGACCCTGGACTCACAAGCTTGCTACACAGACTTA GCCTACAGCA CATCAGGCCCGGGAGCCAGGTCTGGCCCCAGCTGAGGGACCTGCAAGGTCCCCAGGACAGACAAAAATCA CTATGCCTCC CTGAAAGGCAGGCATGTGGAGGAGTGCAGAGCAACTGCTTCTAATAAGAAACACACAGAGGGGCTGGAGA GATGGCTCAG CGGTTAAGAGCACTGACTGCTCTTCTGAGGTCCAGAGTTCAAATCCCAGCAACCACATGGTGGCTCACAA CCATCCGTAA TGAGATCTGGCGCCCTCTTCTGGGGCGTCTGAGGACAGCAACAGTGTACTTACATATAATAAATAAATAA ATCTTTAAAA AAAGAAAAAGAAAAGAAAAGAAACACACAGAGGAGACAGTCCCATCCTCT >short sequence, 1500 bases tccggcgcccgaaaggaaagggtggcgctgcgctccggggtgcacgagcc gacagcgcccgaccccaacgggccggccccgccagcgccgctaccgccct gcccccgggcgagcgggatgggcgggagtggagtggcgggtggagggtgg agacgtcctggcccccgccccgcgtgcacccccaggggaggccgagcccg ccgcccggccccgcgcaggccccgcccgggactcccctgcggtccaggcc gcgccccgggctccgcgccagccaatgagcgccgcccggccgggcgtgcc cccgcgccccaagtataaaccctggcgcgctcgcggcccggcactcttct ggtccccacagactcagagagaacccaccatggtgctgtctcctgccgac aagaccaacgtcaaggccgcctggggtaaggtcggcgcgcacgctggcga gtatggtgcggaggccctggagaggtgaggctccctcccctgctccgacc cgggctcctcgcccgcccggacccacaggccaccctcaaccgtcctggcc ccggacccaaaccccacccctcactctgcttctccccgcaggatgttcct gtccttccccaccaccaagacctacttcccgcacttcgacctgagccacg 
gctctgcccaggttaagggccacggcaagaaggtggccgacgcgctgacc aacgccgtggcgcacgtggacgacatgcccaacgcgctgtccgccctgag cgacctgcacgcgcacaagcttcgggtggacccggtcaacttcaaggtga gcggcgggccgggagcgatctgggtcgaggggcgagatggcgccttcctc gcagggcagaggatcacgcgggttgcgggaggtgtagcgcaggcggcggc tgcgggcctgggccctcggccccactgaccctcttctctgcacagctcct aagccactgcctgctggtgaccctggccgcccacctccccgccgagttca cccctgcggtgcacgcctccctggacaagttcctggcttctgtgagcacc gtgctgacctccaaataccgttaagctggagcctcggtggccatgcttct tgccccttgggcctccccccagcccctcctccccttcctgcacccgtacc cccgtggtctttgaataaagtctgagtgggcggcagcctgtgtgtgcctg agttttttccctcagcaaacgtgccaggcatgggcgtggacagcagctgg gacacacatggctagaacctctctgcagctggatagggtaggaaaaggca ggggcgggaggaggggatggaggagggaaagtggagccaccgcgaagtcc agctggaaaaacgctggaccctagagtgctttgaggatgcatttgctctt tcccgagttttattcccagacttttcagattcaatgcaggtttgctgaaa taatgaatttatccatctttacgtttctgggcactcttgtgccaagaact
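
Several of the manual steps in exercises 2 and 5 (concatenating predicted exons, taking the reverse complement, looking for a polyA signal, locating start and stop codons) can be sketched in a few lines of Python. This is only a minimal illustration and does not replace the web tools named in the exercises. The function names, the toy sequence and its exon coordinates are all hypothetical and invented for the example; the only biological inputs assumed are the canonical polyA hexamer AATAAA and the standard stop codons TAA, TAG and TGA.

```python
import re

STOP_CODONS = {"TAA", "TAG", "TGA"}
# Only A/C/G/T are complemented; other symbols (e.g. N, R, Y) pass through unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq, motif):
    """Return the 0-based start of every (possibly overlapping) motif hit."""
    pattern = re.compile("(?=" + re.escape(motif) + ")", re.IGNORECASE)
    return [m.start() for m in pattern.finditer(seq)]


def join_exons(seq, exons):
    """Concatenate exons given as 1-based, inclusive (start, end) pairs."""
    return "".join(seq[start - 1:end] for start, end in sorted(exons))


def first_orf(seq):
    """Return the first ATG-to-stop reading frame found, or '' if there is none."""
    seq = seq.upper()
    start = seq.find("ATG")
    while start != -1:
        # Walk codon by codon from this ATG until an in-frame stop codon.
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return seq[start:i + 3]
        start = seq.find("ATG", start + 1)
    return ""


# Hypothetical toy gene: 5' flank, exon 1, intron (GT...AG), exon 2, 3' flank.
toy = "CC" + "ATGAAAG" + "GTAAGTTTTCAG" + "CCTAA" + "GGAATAAAGG"

spliced = join_exons(toy, [(3, 9), (22, 26)])  # made-up exon coordinates
print("spliced exons:", spliced)
print("first ORF:", first_orf(spliced))
print("polyA signal positions:", find_motif(toy, "AATAAA"))
print("reverse complement:", reverse_complement(toy))
```

To try this on the appendix data, paste a sequence into a string with the spaces and line breaks removed, and replace the made-up coordinates with the exon boundaries reported by the prediction program. The concatenated exons are then what would be submitted to an NCBI search.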