COURSE ON BIOINFORMATICS
FOR BIOMEDICAL RESEARCH
Vall d’Hebron Institut de Recerca (VHIR)
Health research institute accredited by the Instituto de Salud Carlos III (ISCIII)
NEXT GENERATION SEQUENCING
TECHNOLOGIES AND APPLICATIONS
Rosa Prieto
Head of the High Tech Unit
15/05/2014
Index
1 INTRODUCTION TO NGS
2 NGS TECHNOLOGY OVERVIEW
3 NGS APPLICATIONS OVERVIEW
4 WHAT IS NEXT IN SEQUENCING TECHNOLOGIES?
Introduction
Personalized medicine era
-The right therapeutic strategy for the right person at the right time
-Predisposition to disease
-Early and targeted prevention
Biomarker identification:
•Diagnostic
•Susceptibility/risk (prevention)
•Prognostic (indolent vs. aggressive)
•Predictive (response)
Introduction: “omics”
“Omics”
Omics aims at the collective characterization and quantification of pools of biological molecules that
translate into the structure, function, and dynamics of an organism or organisms (Wikipedia).
High-throughput technologies:
Genomics, Transcriptomics, Proteomics, Epigenomics, Metagenomics, Metabolomics, Lipidomics
http://www.genomicglossaries.com/content/omes.asp
Everything can be sequenced…
Next generation sequencing
The future is here, now?
Introduction to NGS technologies
[Figure: sequencing timeline across 1st, 2nd and 3rd generation. Labels: ABI automatic sequencer (1987); 3,234.83 Mb (haploid); $2.7 billion; GS20. Source: http://www.ipc.nxgenomics.org/newsletter/no11.htm]
Sequencing technology milestones
First generation sequencing
Second generation sequencing
NGS increases capacity and reduces costs
Moore’s Law: the number of transistors in an
integrated circuit doubles about every 2 years (1965).
Date     Cost per Mb   Cost per Genome   % cost vs. Sep-01
Sep-01   $5,292.39     $95,263,072       100%
Sep-02   $3,413.80     $61,448,422       64.5039%
Oct-03   $2,230.98     $40,157,554       42.1544%
Oct-04   $1,028.85     $18,519,312       19.4402%
Oct-05   $766.73       $13,801,124       14.4874%
Oct-06   $581.92       $10,474,556       10.9954%
Oct-07   $397.09       $7,147,571        7.5030%
Oct-08   $3.81         $342,502          0.3595%
Oct-09   $0.78         $70,333           0.0738%
Oct-10   $0.32         $29,092           0.0305%
Oct-11   $0.086        $7,743            0.0081%
Oct-12   $0.074        $6,618            0.0069%
Oct-13   $0.057        $5,096            0.0053%
Jan-14   $0.045        $4,008            0.0042%
Source - NHGRI : http://www.genome.gov/sequencingcosts/
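The comparison with Moore’s Law can be checked with a few lines of arithmetic. The sketch below is only an illustration: it copies three cost-per-genome values from the NHGRI table, assumes each date falls on the first day of the month, and reads the Moore-like curve as a cost that halves every two years. The function names are made up for this example.

```python
from datetime import date

# (date, cost per genome in USD) copied from three rows of the NHGRI table.
# Assumption: the day of the month is not given in the table, so the 1st is used.
COST_PER_GENOME = [
    (date(2001, 9, 1), 95_263_072),
    (date(2007, 10, 1), 7_147_571),
    (date(2014, 1, 1), 4_008),
]


def percent_of_baseline(cost, baseline):
    """Cost expressed as a percentage of the baseline cost."""
    return 100.0 * cost / baseline


def moore_percent(start, end, halving_years=2.0):
    """Percentage of the baseline left if the cost halved every `halving_years`."""
    years = (end - start).days / 365.25
    return 100.0 * 0.5 ** (years / halving_years)


start, baseline = COST_PER_GENOME[0]
for when, cost in COST_PER_GENOME:
    observed = percent_of_baseline(cost, baseline)
    expected = moore_percent(start, when)
    print(f"{when:%b-%y}  observed {observed:9.4f}%  Moore-like {expected:9.4f}%")
```

With these inputs, the Jan-14 cost per genome falls far below the halving-every-two-years curve, which is the point the slide makes about NGS.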
Sanger sequencing vs. NGS (2nd and 3rd generation)
Sanger
1. DNA fragmentation
2. Cloning into vectors; transformation of bacteria; growth and isolation of vector DNA
3. Cycle sequencing (sequence, primer, polymerase, dNTPs, labelled ddNTPs)
4. Electrophoresis (1 sequence per capillary), e.g. CTATGCTCG; image processing
2nd generation NGS
1. DNA fragmentation
2. In vitro adapter ligation and clonal amplification
3. Massively parallel sequencing
4. Image processing and data analysis
3rd generation NGS
1. DNA fragmentation
2. and 3. In vitro adapter ligation and massive sequencing WITHOUT amplification
4. Image processing and data analysis
Comparison of different NGS platforms
-Similarities (and differences vs. Sanger):
•library preparation:
  -starting material: short fragments of nucleic acids
  -adapter ligation
  -multiplexing (MID tags)
•clonal amplification (not for 3rd generation sequencing)
•massively parallel sequencing
•the use of physical location to identify unique reads is a critical concept for all next
generation sequencing systems. The density of the reads and the ability to record
them without interfering noise are vital to the throughput of a given instrument.
•the signal needs to be processed and post-treated to get the individual sequences
•complex data analysis due to the large amount of data
-Differences:
•Clonal amplification method/sequencing technology/signal detection
•Throughput
•Read-length
•Run time
•Cost per base
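Multiplexing with MID tags means that reads from several samples share one run and are separated afterwards by the tag at the start of each read. The sketch below shows the idea under simple assumptions: the two tags, the sample names and the reads are illustrative only, and matching is exact, whereas real demultiplexing tools usually tolerate sequencing errors in the tag.

```python
from collections import defaultdict

# Hypothetical MID tags (barcodes) and sample names, chosen for this sketch.
MID_TO_SAMPLE = {
    "ACGAGTGCGT": "sample_A",
    "ACGCTCGACA": "sample_B",
}


def demultiplex(reads, mid_to_sample):
    """Group reads by their leading MID tag and trim the tag off."""
    by_sample = defaultdict(list)
    for read in reads:
        for mid, sample in mid_to_sample.items():
            if read.startswith(mid):
                by_sample[sample].append(read[len(mid):])
                break
        else:
            # No known tag at the start of the read.
            by_sample["unassigned"].append(read)
    return dict(by_sample)


reads = [
    "ACGAGTGCGTCTATGCTCG",
    "ACGCTCGACATTGACCA",
    "GGGGGGGGGGCTATG",
]
print(demultiplex(reads, MID_TO_SAMPLE))
```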
2nd generation NGS platforms
Roche: GS Junior 454, GS FLX+ 454
Illumina: MiSeq, NextSeq 500, HiSeq 2500, HiSeq X Ten (expected 2014)
Life Technologies: SOLiD 5500xl, Ion Proton, Ion PGM
[Figure label: benchtop instruments]
NGS general workflow
1 Library preparation
  -DNA fragmentation and in vitro adaptor ligation
  -Different kinds of libraries (amplicons, shotgun, cDNA...)
2 Clonal amplification
  -emulsion PCR
  -bridge PCR
3 Cyclic array sequencing
  -Pyrosequencing: 454 sequencing
  -Semiconductor sequencing: Ion Proton/PGM
  -4-colour fluorescent nucleotides: Illumina technology
Clonal amplification by emPCR (454, Ion)
emPCR-based systems (Roche, SOLiD, Ion)
[Figure label: high-speed shaker]
-1 effective starting fragment per microreactor
-~10^6 microreactors per ml
-All processed in parallel (clonal amplification)
Clonal amplification by emPCR (454, Ion)
Clonal amplification??
-No empty beads
-No beads containing more than one amplified fragment
1) Bead vs. starting DNA quantity titration
2) Optimal enrichment: 5-20% OK
Enrichment (figure labels): dsDNA; melt; binding of biotin-labelled primer to capture beads carrying ssDNA; addition of streptavidin-coated magnetic beads; melt
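The trade-off behind the bead vs. DNA titration can be illustrated with a simple statistical model. The sketch below assumes, which the slides do not state, that fragments land on beads following a Poisson distribution, and it reads "enrichment" as the fraction of beads that carry any DNA. Under those assumptions it estimates how many of the DNA-carrying beads would hold more than one fragment. The function names are hypothetical.

```python
import math


def bead_loading(mean_fragments_per_bead):
    """Poisson fractions of beads with 0, exactly 1, and more than 1 fragment."""
    lam = mean_fragments_per_bead
    p_empty = math.exp(-lam)
    p_single = lam * math.exp(-lam)
    p_multi = 1.0 - p_empty - p_single
    return p_empty, p_single, p_multi


def mean_for_enrichment(fraction_with_dna):
    """Mean fragments per bead that leaves the given fraction of beads non-empty."""
    return -math.log(1.0 - fraction_with_dna)


for enrichment in (0.05, 0.10, 0.20):
    lam = mean_for_enrichment(enrichment)
    p_empty, p_single, p_multi = bead_loading(lam)
    mixed_share = p_multi / (p_single + p_multi)
    print(f"{enrichment:.0%} beads with DNA: {lam:.3f} fragments/bead, "
          f"{mixed_share:.1%} of those beads are mixed")
```

In this model, adding more DNA per bead leaves fewer beads empty but raises the share of mixed beads, which is why the two requirements on the slide pull in opposite directions.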
Bridge amplification (Illumina)
HiSeq 2500: 2 flow cells, 8 lanes per flow cell
Binding of single strands to the adapters
Cluster generation: “bridge” PCR
Double-stranded clonal clusters
Removal of the reverse strands
Blocking and addition of the sequencing primer
100-200 million clusters
GS FLX 454 sequencing
Metal-coated PTP (PicoTiterPlate) reduces crosstalk
29 μm well diameter (20/bead)
3,400,000 wells per PTP
GS FLX 454 sequencing
Pyrosequencing (sequencing by synthesis)
CCD camera
“Flowgram”: signal intensity is proportional to the number of nucleotides incorporated in the sequence
- throughput is limited by the number of wells in the PTP
- errors in homopolymers (454)
- long sequences (up to 1000 bp) are achieved
- low throughput, very expensive reagents
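Since the flowgram signal is proportional to the number of nucleotides incorporated in each flow, a toy base caller only needs to round each intensity. The sketch below assumes a repeating T, A, C, G flow order and uses invented intensities. It is not the vendor's base-calling algorithm, only an illustration of the principle.

```python
def call_flowgram(signals, flow_order="TACG"):
    """Turn per-flow signal intensities into a base sequence."""
    bases = []
    for i, signal in enumerate(signals):
        nucleotide = flow_order[i % len(flow_order)]
        # Intensity is roughly proportional to homopolymer length, so round it.
        count = max(0, int(signal + 0.5))
        bases.append(nucleotide * count)
    return "".join(bases)


# Hypothetical intensities for two cycles of T, A, C, G flows.
signals = [1.02, 0.05, 2.10, 0.93, 0.08, 3.04, 0.00, 1.10]
print(call_flowgram(signals))  # TCCGAAAG
```

The same rounding shows where homopolymer errors come from: an intensity of 6.45 is called as six bases and 6.55 as seven, so a small amount of noise on a long homopolymer changes the called length.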
Illumina sequencing
Reversible dye terminator nucleotides (sequencing by synthesis)
Cycle (figure labels): delivery of the 4 fluorescent nucleotides; incorporation; image capture; removal of the 3’ terminator
- Limited by the fragment length that can “bridge”
- Labelled nucleotides are not incorporated as efficiently as native ones
- Short sequences
- Strand-specific errors, substitutions towards the end of the read, base substitution errors (systematic error GGT>GGG)
- High throughput, expensive machines, cost per Mb OK
Ion Torrent sequencing
ION TORRENT (Life Technologies)
1. Fragmentation and adapter sequences
2. Clonal amplification (emPCR on beads)
3. Deposition of the beads+DNA into the wells of the chip
Sequential delivery of unmodified nucleotides
The incorporation of a nucleotide by the polymerase releases an H+
Direct and simultaneous detection of a pH change in all wells.
•pH meter, no optical system: rapid output improvement based on chips
•Fast runs (native nucleotides)
•Inexpensive machine and reagents
•Fails in homopolymer detection
NGS data analysis
Pyrosequencing
454 sequencing
NGS platforms comparison
PLATFORM               ROCHE GS FLX+ 454    ILLUMINA HISEQ 2500          ION PROTON
Clonal amplification   emPCR                Bridge amplification         emPCR
Sequencing chemistry   Pyrosequencing       Reversible dye terminators   pH change
Read length            Up to 1000 bp        From 2x125 bp to 2x300 bp    Up to 200 bp
Run time               22 hrs               7 hrs-6 days                 From 2 to 4 hrs
Throughput/run         Up to 700 Mb         500-1000 Gb (1 Tb)           10 Gb (PI), 100 Gb (PII)
Equipment cost         $500,000             $750,000                     $250,000
Reagents cost/run      $8,000               $5,500                       $1,000

GOOD!
ROCHE GS FLX+ 454: longest read length
ILLUMINA HISEQ 2500: high throughput, low cost per base, ease of use
ION PROTON: quick, easy to use and cheap
BAD!
ROCHE GS FLX+ 454: high error rate in homopolymers (>6); very expensive; low throughput; not automated at all
ILLUMINA HISEQ 2500: short sequences; strand-specific errors; substitutions towards the end of the read; base substitution errors (systematic error GGT>GGG)
ION PROTON: library preparation; errors in homopolymers; higher bias than Illumina
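The "cost per base" judgement in the comparison can be made concrete by dividing reagent cost per run by throughput per run. The sketch below is a rough illustration only: it uses the upper-bound throughput of each platform, assumes the $1,000 Ion Proton reagent cost refers to the PI chip (the slide does not say), and ignores equipment, labour and failed runs.

```python
# Upper-bound throughput (Gb per run) and reagent cost (USD per run) from the comparison.
# Assumption: the Ion Proton reagent cost is paired with the PI chip throughput.
PLATFORMS = {
    "Roche GS FLX+ 454": {"gb_per_run": 0.7, "reagents_usd": 8_000},
    "Illumina HiSeq 2500": {"gb_per_run": 1_000, "reagents_usd": 5_500},
    "Ion Proton (PI chip)": {"gb_per_run": 10, "reagents_usd": 1_000},
}


def reagent_cost_per_gb(gb_per_run, reagents_usd):
    """Reagent cost in USD for each Gb of raw output."""
    return reagents_usd / gb_per_run


for name, spec in PLATFORMS.items():
    print(f"{name}: ${reagent_cost_per_gb(**spec):,.2f} per Gb")
```

On these figures the three platforms differ by orders of magnitude per Gb, which matches the "very expensive" and "low cost per base" labels on the slide.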
NGS High-Throughput Platforms comparison
HiSeq 2500
Two modes: Rapid Run and High Output
Single/Dual Flow Cells
PE 2 x 125 bp
120 Gb in 27 hours (Rapid)
1 Tb in 6 days (High)
20 exomes in a day
1 human genome in a day
30 RNAseq samples in 5 hours
HiSeq X Ten (10 HiSeq X)
Only High Output mode
Single/Dual Flow Cells
PE 2 x 150 bp
600 Gb in a day (dual flow cell)
1.8 Tb in 3 days (4x faster than HiSeq 2500)
HiSeq X Ten: 10,000 genomes at 30x per year
Human exome, 30x: approx. 800-1000 €
Human RNAseq (30 M reads, 100 bp PE, strand specific): approx. 800-1000 €
Human whole genome, 30x: 4000 €
Source: Nextgenseek.com & Allseq.com.
All these costs are indicative as of May 2014 and in no way binding for the UAT
Ion Proton
Ion PI chip:
Up to 20 Gb output (specification: 10 Gb)
Read length: up to 200 bp
Run time: 2-4 hrs
1 human exome (approx. 1000 €)
Ion PII chip:
Up to 100 Gb output (expected 2014), now reduced to 20-30 Gb at launch
Run time: 2-4 hrs
Read length: 100 bp
Human whole genome (10x, ?)
Ion PIII chip (???): 200 Gb output per run
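The question mark next to the 10x whole genome can be explored with a back-of-the-envelope coverage estimate: run output divided by genome size. The sketch below uses the haploid genome size quoted earlier in the slides and treats every sequenced base as usable and evenly distributed, which is an optimistic assumption. The function name is hypothetical.

```python
# Haploid human genome size in Mb, as quoted earlier in the slides.
HAPLOID_GENOME_MB = 3234.83


def mean_coverage(output_gb, genome_mb=HAPLOID_GENOME_MB):
    """Average depth if every sequenced base were usable and evenly spread."""
    return output_gb * 1000.0 / genome_mb


for label, output_gb in [
    ("Ion PI, 10 Gb", 10),
    ("Ion PII at launch, 20 Gb", 20),
    ("Ion PII at launch, 30 Gb", 30),
    ("Ion PII, 100 Gb", 100),
]:
    print(f"{label}: about {mean_coverage(output_gb):.1f}x")
```

Under these assumptions a 20-30 Gb run stays just below 10x for a human genome, while the originally announced 100 Gb would give roughly 30x.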
NGS Platforms specifications and applications
Illumina
Ion PGM/Ion Proton
NGS Platforms specifications and applications
Roche 454
PacBio RSII (3rd generation)
NGS advantages and limitations
Journal of Investigative Dermatology (2013) 133