Next-Generation Sequencing (NGS) is a technology that rapidly sequences large amounts of DNA or RNA by reading millions or even billions of fragments in parallel. NGS is also referred to as high-throughput or massively parallel sequencing[1][2][3].
The primary objective of NGS is to determine the exact sequence of nucleotides (A, T, C, G for DNA; A, U, C, G for RNA) in genetic material. A typical workflow has four steps:
Step | Description |
---|---|
1. Nucleic Acid Isolation | Extraction and purification of DNA or RNA from the sample. High yield and purity are crucial for accuracy[7][8] |
2. Library Preparation | The DNA/RNA is fragmented, and specialized adapters (short DNA sequences) are attached to both ends. These adapters allow fragments to bind to sequencer surfaces and may include barcodes for sample identification[7][9][8] |
3. Clonal Amplification & Sequencing | DNA fragments are clonally amplified and immobilized on surfaces such as beads or flow cells. On sequencing-by-synthesis platforms such as Illumina, the sequencer detects a fluorescent signal as each nucleotide is incorporated and uses it to determine the exact base order[7][10] |
4. Data Analysis | Bioinformatics tools process the data: aligning sequences to reference genomes, annotating variants, and interpreting the results to generate useful biological insights[7][6] |
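
The barcodes mentioned under library preparation are what allow several samples to share one sequencing run. As a minimal sketch, assuming every read starts with a fixed-length inline barcode (a simplification, since many platforms read the index separately), reads could be sorted back into samples as follows. The barcode sheet, sample names, and the `demultiplex` helper are hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical barcode sheet: real barcodes come from the library prep kit.
BARCODES = {"ACGT": "sample_A", "TGCA": "sample_B"}


def demultiplex(reads, barcodes, barcode_len=4):
    """Group reads by their leading barcode and strip the barcode off."""
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:barcode_len], read[barcode_len:]
        # Reads whose barcode is not on the sheet go to a catch-all bin.
        sample = barcodes.get(tag, "undetermined")
        by_sample[sample].append(insert)
    return dict(by_sample)


reads = ["ACGTTTGACC", "TGCAGGATCC", "ACGTCCCAAA", "NNNNACGTAC"]
result = demultiplex(reads, BARCODES)
for sample, inserts in result.items():
    print(sample, inserts)
```

Exact matching is used here for brevity; production demultiplexers usually tolerate a small number of barcode mismatches.
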
NGS data analysis follows a structured workflow, generally involving three core stages, each with its own objectives and associated bioinformatic tools:
Stage 1: Primary analysis
Objective: Convert raw instrument data into base calls and per-base quality scores, then assign reads to samples.
- Base calling tools: usually built into the sequencer software (e.g., Illumina RTA, ONT Guppy).
- Demultiplexing tools: bcl2fastq (Illumina), Guppy (ONT).
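
To make "base calls and quality scores" concrete, here is a minimal sketch that parses a FASTQ record and decodes its quality string. It assumes the common Phred+33 encoding and a strict four-lines-per-record layout; the function names and the example read are invented for illustration.

```python
import io


def parse_fastq(handle):
    """Yield (name, sequence, quality_string) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().strip()
        if not header:
            break  # end of input
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().strip()
        yield header[1:], seq, qual  # drop the leading '@'


def phred_scores(qual, offset=33):
    """Decode an ASCII quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in qual]


def error_probability(q):
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


example = io.StringIO("@read1\nACGT\n+\nII5#\n")
records = list(parse_fastq(example))
name, seq, qual = records[0]
scores = phred_scores(qual)
print(name, seq, scores)  # read1 ACGT [40, 40, 20, 2]
```

A score of 20 corresponds to an estimated 1 in 100 chance that the base call is wrong, and a score of 40 to 1 in 10,000.
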
Stage 2: Secondary analysis
Objective: Process raw reads into interpretable genome/transcriptome alignments or assemblies.
Typical Steps & Tools:
- Quality control: FastQC, fastp, MultiQC[11][12]
- Trimming and adapter removal: Trimmomatic, Cutadapt, fastp[12]
- Alignment/mapping: BWA, Bowtie2, STAR (RNA-Seq), minimap2[11][12][13]
- BAM file processing: SAMtools, Picard, GATK MarkDuplicates[11][13]
- Variant calling: GATK HaplotypeCaller/Mutect2, Strelka2, VarScan, FreeBayes[11][14][13]
- Quantification: featureCounts, HTSeq, Salmon, Kallisto
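
The variant calling step can be illustrated with a toy pileup: stack aligned reads over the reference, count bases at each position, and report positions where a non-reference base dominates. This is only a sketch of the underlying idea, not how GATK, Strelka2, or FreeBayes work internally; it assumes ungapped alignments with known start positions, and the reference, reads, and thresholds are all hypothetical.

```python
from collections import Counter

REFERENCE = "ACGTACGTAC"  # hypothetical 10-base reference

# (0-based start position, read sequence); alignments are assumed ungapped.
ALIGNED_READS = [
    (0, "ACGTAT"),
    (1, "CGTACG"),
    (2, "GTATGT"),
    (3, "TACGTA"),
    (4, "ATGTAC"),
]


def pileup(reads, ref_len):
    """Count the observed bases at every reference position."""
    columns = [Counter() for _ in range(ref_len)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos < ref_len:
                columns[pos][base] += 1
    return columns


def call_variants(reference, reads, min_depth=3, min_alt_fraction=0.5):
    """Report (pos, ref, alt, depth, alt_fraction) where an alternate base dominates."""
    calls = []
    for pos, counts in enumerate(pileup(reads, len(reference))):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too little evidence at this position
        ref_base = reference[pos]
        alt_counts = {b: n for b, n in counts.items() if b != ref_base}
        if not alt_counts:
            continue  # every read agrees with the reference
        alt_base, alt_n = max(alt_counts.items(), key=lambda kv: kv[1])
        fraction = alt_n / depth
        if fraction >= min_alt_fraction:
            calls.append((pos, ref_base, alt_base, depth, fraction))
    return calls


variants = call_variants(REFERENCE, ALIGNED_READS)
print(variants)
```

Real callers additionally weigh base and mapping qualities, handle insertions and deletions, and model genotype likelihoods rather than applying a fixed fraction cutoff.
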
Stage 3: Tertiary analysis
Objective: Biological interpretation, visualization, and reporting.
Steps & Tools:
- Annotation: ANNOVAR, SnpEff, VEP[14][11]
- Visualization: Integrative Genomics Viewer (IGV), R, Python (matplotlib, seaborn)[15]
- Differential expression and taxonomic profiling: DESeq2, edgeR, MetaPhlAn.
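
As a rough illustration of what expression-level interpretation starts from, the sketch below normalizes raw read counts to counts per million (CPM) and computes a naive log2 fold change between two samples. It is deliberately not the statistical model used by DESeq2 or edgeR, which rely on replicates and negative binomial modelling; the gene names, counts, and helper functions are hypothetical.

```python
import math

# Hypothetical raw read counts per gene and sample.
COUNTS = {
    "geneA": {"control": 100, "treated": 400},
    "geneB": {"control": 300, "treated": 300},
    "geneC": {"control": 600, "treated": 300},
}


def library_sizes(counts):
    """Total number of counted reads per sample."""
    sizes = {}
    for per_sample in counts.values():
        for sample, n in per_sample.items():
            sizes[sample] = sizes.get(sample, 0) + n
    return sizes


def cpm(counts):
    """Scale counts to counts per million so samples of different depth are comparable."""
    sizes = library_sizes(counts)
    return {
        gene: {s: n / sizes[s] * 1_000_000 for s, n in per_sample.items()}
        for gene, per_sample in counts.items()
    }


def log2_fold_change(cpm_table, numerator, denominator, pseudocount=1.0):
    """Naive log2 ratio; the pseudocount avoids division by zero."""
    return {
        gene: math.log2((v[numerator] + pseudocount) / (v[denominator] + pseudocount))
        for gene, v in cpm_table.items()
    }


normalized = cpm(COUNTS)
lfc = log2_fold_change(normalized, "treated", "control")
for gene, value in lfc.items():
    print(f"{gene}: {value:+.2f}")
```

With these made-up counts, geneA comes out roughly fourfold up, geneB unchanged, and geneC roughly twofold down in the treated sample.
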
Step | Objective | Example Tools |
---|---|---|
Base Calling/Demultiplexing | Raw data → Reads per sample | Illumina RTA, Guppy, bcl2fastq |
Quality Control & Trimming | Filter/clean sequence data | FastQC, Trimmomatic, Cutadapt, fastp |
Alignment/Mapping | Locate reads on reference | BWA, Bowtie2, STAR, minimap2 |
BAM File Processing | Organize/process alignments | SAMtools, Picard, GATK |
Variant Calling/Quantification | Find variants/expressed genes | GATK, Strelka2, FreeBayes, featureCounts |
Annotation | Biological interpretation | ANNOVAR, SnpEff, VEP |
Visualization/Reporting | Summarize and interpret results | IGV, MultiQC, R, Python |
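
The table above reads naturally as an ordered pipeline in which each step consumes the previous step's output. The sketch below only builds the command lines for a single-end sample and prints them without running anything. File names and the `build_pipeline` helper are hypothetical, the flags shown are the basic input/output options of each tool, `bwa mem -o` assumes a reasonably recent BWA release, and the reference is assumed to be already indexed with `bwa index`.

```python
from pathlib import Path


def build_pipeline(reads, reference, outdir):
    """Return (step name, command) pairs; nothing is executed here."""
    out = Path(outdir)
    stem = Path(reads).name.split(".")[0]
    trimmed = str(out / f"{stem}.trimmed.fastq")
    sam = str(out / f"{stem}.sam")
    bam = str(out / f"{stem}.sorted.bam")
    return [
        ("quality control", ["fastqc", reads, "-o", str(out)]),
        ("trimming", ["fastp", "-i", reads, "-o", trimmed]),
        # Assumes `bwa index <reference>` has been run beforehand.
        ("alignment", ["bwa", "mem", "-o", sam, reference, trimmed]),
        ("sorting", ["samtools", "sort", "-o", bam, sam]),
        ("indexing", ["samtools", "index", bam]),
    ]


steps = build_pipeline("sample.fastq", "ref.fa", "results")
for name, cmd in steps:
    print(f"{name}: {' '.join(cmd)}")
```

To actually run the steps, each command list could be passed to `subprocess.run(cmd, check=True)` in order, provided the tools are installed and the output directory exists.
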
NGS data is generated from sequencing devices and passed through multiple steps: