NGS Tutorial Series

1. What is NGS?

Next-Generation Sequencing (NGS) is a technology that rapidly sequences large amounts of DNA or RNA in a massively parallel fashion: millions or even billions of fragments are sequenced simultaneously. NGS is also referred to as high-throughput sequencing or massively parallel sequencing[1][2][3].

Objective of NGS

The primary objective of NGS is to determine the exact sequence of nucleotides (A, T, C, G for DNA; A, U, C, G for RNA) in genetic material. It is used to:

Study genetic variation and mutations
Identify pathogens
Decode entire genomes
Analyze gene expression
Understand genetic contributions to diseases[2][4][3].

What is Analyzed with NGS?

Genomic DNA (entire genomes, specific regions, or genes).
RNA (for transcriptome analysis, such as RNA-Seq).
Microbial communities (metagenomics).
Genetic mutations and variants, including single-nucleotide polymorphisms (SNPs), insertions and deletions (indels), copy number variations (CNVs), and gene fusions[2][4][5][6].
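
To make the small-variant categories above concrete, here is a minimal sketch that labels a variant from its reference and alternate alleles. The function name classify_variant and the VCF-style allele strings are assumptions made for illustration; CNVs and gene fusions cannot be recognized from a single allele pair and are outside this sketch.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Label a small variant from its reference (ref) and alternate (alt) alleles."""
    if ref == alt:
        return "no_change"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"          # one base replaced by another
    if len(ref) < len(alt):
        return "insertion"    # alternate allele is longer
    if len(ref) > len(alt):
        return "deletion"     # alternate allele is shorter
    return "MNP"              # same length > 1: several adjacent bases replaced


for ref, alt in [("A", "G"), ("A", "ATT"), ("CTG", "C")]:
    print(ref, alt, classify_variant(ref, alt))
```

Real variant callers also normalize and left-align alleles before classifying them, which this sketch does not attempt.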

NGS Workflow Steps

Step	Description
1. Nucleic Acid Isolation	Extraction and purification of DNA or RNA from the sample. High yield and purity are crucial for accuracy[7][8]
2. Library Preparation	The DNA/RNA is fragmented, and specialized adapters (short DNA sequences) are attached to both ends. These adapters allow fragments to bind to sequencer surfaces and may include barcodes for sample identification[7][9][8]
3. Clonal Amplification & Sequencing	DNA fragments are amplified and immobilized on surfaces such as beads or flow cells. On fluorescence-based platforms (for example, Illumina), the sequencer detects a fluorescent signal as each nucleotide is incorporated and uses it to determine the base order[7][10]
4. Data Analysis	Bioinformatics tools process the data: aligning sequences to reference genomes, annotating variants, and interpreting the results to generate useful biological insights[7][6]

🎯 NGS is a transformative technology for rapidly reading genetic information. It has revolutionized genomics research and modern medicine by providing highly accurate, large-scale genetic data for diagnostics, research, and understanding biological mechanisms[1][2][3].

2. Steps in NGS Data Analysis

NGS data analysis follows a structured workflow, generally involving three core stages, each with its own objectives and associated bioinformatic tools:

1. Primary Analysis

Objective: Convert raw instrument data into base calls and basic quality scores.

Base Calling: Raw signals/images from the sequencer are turned into nucleotide sequences (reads).

Tools: Usually built into the sequencer software (e.g., Illumina RTA, ONT Guppy).
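
The quality scores attached to base calls are conventionally Phred-scaled: a score Q corresponds to an estimated error probability P = 10^(-Q/10). The sketch below decodes a FASTQ quality string, assuming the common Phred+33 ASCII encoding; the function names are illustrative and do not belong to any base caller.

```python
def decode_quality(quality_string: str, offset: int = 33) -> list[int]:
    """Convert a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def phred_to_error_prob(q: int) -> float:
    """Estimated probability that a base with Phred score q was called wrongly."""
    return 10 ** (-q / 10)


scores = decode_quality("II5+!")
print(scores)  # [40, 40, 20, 10, 0]
print([phred_to_error_prob(q) for q in scores])
```

A score of 20 therefore means roughly a 1 in 100 chance of a wrong call, and a score of 40 roughly 1 in 10,000.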

Demultiplexing: Assigns reads to samples using barcode sequences.

Tools: bcl2fastq (Illumina), Guppy (ONT).
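
The idea behind demultiplexing can be sketched in a few lines: each read's index sequence is compared with the known sample barcodes, and the read is assigned to a sample only when exactly one barcode matches within a mismatch tolerance. This is a simplified illustration, not the algorithm used by bcl2fastq or Guppy; the sample names, barcodes, and the max_mismatches parameter are hypothetical.

```python
from collections import defaultdict


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, max_mismatches=1):
    """Group reads by sample.

    reads:    iterable of (index_sequence, read_sequence) pairs
    barcodes: dict mapping sample name -> barcode sequence
    """
    bins = defaultdict(list)
    for index_seq, read_seq in reads:
        matches = [
            sample
            for sample, barcode in barcodes.items()
            if len(barcode) == len(index_seq)
            and hamming(barcode, index_seq) <= max_mismatches
        ]
        # Ambiguous or unmatched reads are kept apart rather than guessed.
        sample = matches[0] if len(matches) == 1 else "undetermined"
        bins[sample].append(read_seq)
    return dict(bins)


barcodes = {"sampleA": "ACGT", "sampleB": "TGCA"}  # hypothetical barcodes
reads = [("ACGT", "AAAA"), ("TGCC", "CCCC"), ("GGGG", "TTTT")]
print(demultiplex(reads, barcodes))
```

In this toy input the second read is assigned to sampleB despite one mismatch, while the third read matches neither barcode and lands in the undetermined bin.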

2. Secondary Analysis

Objective: Process raw reads into interpretable genome/transcriptome alignments or assemblies.

Typical Steps & Tools:

Quality Control (QC): Assess and improve read quality.

Tools: FastQC, fastp, MultiQC[11][12]

Filtering/Trimming: Remove low-quality reads and sequencing adapters.

Tools: Trimmomatic, Cutadapt, fastp[12]

Alignment/Mapping: Align reads to a reference genome or transcriptome.

Tools: BWA, Bowtie2, STAR (RNA-Seq), minimap2[11][12][13]

BAM Processing: Sorting, indexing, marking PCR duplicates.

Tools: SAMtools, Picard, GATK MarkDuplicates[11][13]

Variant Calling (for DNA): Identify variants (SNPs, indels, CNVs).

Tools: GATK HaplotypeCaller/Mutect2, Strelka2, VarScan, FreeBayes[11][14][13]

Transcript Quantification (for RNA): Quantify gene expression levels.

Tools: featureCounts, HTSeq, Salmon, Kallisto
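
Raw read counts depend on gene length and sequencing depth, so quantification results are often reported in normalized units such as transcripts per million (TPM), which Salmon and Kallisto output directly. The sketch below shows the arithmetic only, under the assumption that per-gene counts and gene lengths are already available; the gene names and numbers are made up.

```python
def tpm(counts: dict, lengths_bp: dict) -> dict:
    """Transcripts per million from raw counts and gene lengths in base pairs."""
    # Step 1: reads per kilobase, which corrects for gene length.
    rates = {gene: counts[gene] / (lengths_bp[gene] / 1000) for gene in counts}
    total = sum(rates.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    # Step 2: rescale so that all values sum to one million.
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


counts = {"geneA": 100, "geneB": 300}        # hypothetical read counts
lengths_bp = {"geneA": 1000, "geneB": 3000}  # hypothetical gene lengths
print(tpm(counts, lengths_bp))
```

Although geneB has three times as many reads, it is also three times longer, so both genes end up with the same TPM value.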

3. Tertiary Analysis

Objective: Biological interpretation, visualization, and reporting.

Steps & Tools:

Variant Annotation: Add functional/clinical information to variants.

Tools: ANNOVAR, SnpEff, VEP[14][11]

Data Visualization/Exploration: Generate plots, cluster analysis, PCA.

Tools: Integrative Genomics Viewer (IGV), R, Python (matplotlib, seaborn)[15]

Specialized Analyses: Differential expression (RNA), metagenomic profiling, etc.

Tools: DESeq2, edgeR, MetaPhlAn.
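
The quantity at the heart of differential expression is the log2 fold change between conditions. The sketch below computes it from mean counts with a pseudocount to avoid division by zero. It is only meant to convey the effect-size idea: DESeq2 and edgeR fit negative binomial models with normalization and dispersion estimation, none of which is reproduced here, and the pseudocount value is an assumption.

```python
import math


def log2_fold_change(treated, control, pseudocount=1.0):
    """log2 ratio of mean expression in treated versus control samples."""
    mean_treated = sum(treated) / len(treated)
    mean_control = sum(control) / len(control)
    return math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))


# Hypothetical normalized counts for one gene in two replicates per condition.
print(log2_fold_change([30, 32], [7, 7]))  # 2.0, i.e. about 4-fold higher
```

A value of 0 means no change, positive values mean higher expression in the treated group, and negative values mean lower expression.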

Step	Objective	Example Tools
Base Calling/Demultiplexing	Raw data → Reads per sample	Illumina RTA, Guppy, bcl2fastq
Quality Control & Trimming	Filter/clean sequence data	FastQC, Trimmomatic, Cutadapt, fastp
Alignment/Mapping	Locate reads on reference	BWA, Bowtie2, STAR, minimap2
BAM File Processing	Organize/process alignments	SAMtools, Picard, GATK
Variant Calling/Quantification	Find variants/expressed genes	GATK, Strelka2, FreeBayes, featureCounts
Annotation	Biological interpretation	ANNOVAR, SnpEff, VEP
Visualization/Reporting	Summarize and interpret results	IGV, MultiQC, R, Python

Notes

Workflow and tools vary with analysis type (DNA, RNA, exome, metagenome, etc.).

Platforms like Galaxy and BaseSpace allow GUI-based or cloud analysis.

Pipeline frameworks such as Nextflow or Snakemake help automate and scale analyses[16].
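
What such frameworks automate, at their core, is running steps in an order that respects their dependencies. The sketch below illustrates that idea with Python's standard graphlib module. It is not Nextflow or Snakemake syntax, and the step names are hypothetical labels for the stages described in this section.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it can start.
dependencies = {
    "qc": set(),
    "trim": {"qc"},
    "align": {"trim"},
    "sort_index": {"align"},
    "call_variants": {"sort_index"},
    "annotate": {"call_variants"},
}


def execution_order(deps):
    """Return the steps in an order that satisfies every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = execution_order(dependencies)
print(" -> ".join(order))
```

Real workflow managers add much more on top of this ordering, such as tracking input and output files, resuming failed runs, and distributing jobs across a cluster or cloud.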

🛠️ In summary, NGS data analysis is a multi-stage process in which a range of bioinformatic tools transform raw sequencing output into biologically meaningful insights through quality control, sequence alignment, variant discovery, and data interpretation[17][11][12].

3. NGS Data Analysis (Overview)

NGS data is generated from sequencing devices and passed through multiple steps:

Sequenced fragments (reads) are assembled with dedicated assembly software
Read quality is checked and overlapping reads are joined into contigs
Redundant or poor-quality regions are trimmed
Output files contain annotated genes and proteins
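
The assembly step in the list above rests on finding overlaps between fragments. As a toy illustration, the sketch below merges two fragments when the end of one matches the start of the other. The function names and the min_overlap threshold are assumptions, and real assemblers rely on far more elaborate graph-based methods and error handling.

```python
def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of left that equals a prefix of right."""
    longest_possible = min(len(left), len(right))
    for k in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def merge_fragments(left: str, right: str, min_overlap: int = 3):
    """Join two fragments into one contig, or return None if they do not overlap."""
    k = overlap_length(left, right, min_overlap)
    if k == 0:
        return None
    return left + right[k:]


print(merge_fragments("ATGCGT", "CGTTAG"))  # ATGCGTTAG
```

Here the shared bases CGT are written only once in the merged contig.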

💡 Bioinformatics enables genome reconstruction, sequence analysis, and full annotation after raw sequencing.

References
