MaGIC QC & Normalization Tool

Welcome to the QC & Normalization Tool by the Molecular and Genomics Informatics Core (MaGIC).


What This Tool Does

This is the front door of the MaGIC bulk-expression pipeline. Upload a raw count matrix and sample metadata, inspect the quality of your samples, flag outliers, and export a normalized expression matrix in the exact format the downstream MaGIC tools (DEG, Volcano, Heatmap, Dimensionality Reduction) expect.


How to Use This Tool

  1. Data Input. Upload your raw count matrix and sample metadata (and, optionally, a gene-info table for biotype QC and TPM). Confirm that the sample IDs match, then click Submit.
  2. QC Overview. Review library sizes, detected genes, biotype composition, sample–sample correlation, and expression distributions.
  3. PCA / Outlier. Project samples onto principal components, colour by metadata, and flag outliers by Mahalanobis distance.
  4. Normalization. Pick a normalization method, compare before/after distributions, and download the normalized matrix and the per-sample QC metrics.
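
Two of the QC Overview metrics, library size and detected genes, can be sketched in a few lines. This is an illustration only, not the tool's own code: the function name per_sample_qc and the min_count cutoff (a gene counts as "detected" when its count is at least 1) are assumptions made for the example.

```python
import csv
import io


def per_sample_qc(count_csv_text, min_count=1):
    """Return {sample: {"library_size": int, "detected_genes": int}}.

    min_count is an assumed cutoff for calling a gene "detected";
    the tool's actual rule is not documented here.
    """
    reader = csv.reader(io.StringIO(count_csv_text), skipinitialspace=True)
    header = [h.strip() for h in next(reader)]
    samples = header[1:]  # first column holds gene identifiers
    qc = {s: {"library_size": 0, "detected_genes": 0} for s in samples}
    for row in reader:
        if not row:  # skip blank lines
            continue
        for sample, value in zip(samples, row[1:]):
            count = int(value.strip())
            qc[sample]["library_size"] += count  # total counts per sample
            if count >= min_count:
                qc[sample]["detected_genes"] += 1
    return qc


# The small count matrix shown under "Required Input Data Formats".
example = """GeneID, Control1, Control2, Treatment1
ENSG0001, 149, 122, 218
ENSG0002, 409, 151, 46
"""
metrics = per_sample_qc(example)
print(metrics["Control1"])
```

A sample whose library size is far below the others (like Sham3 in the demo dataset) would stand out in this table before any plot is drawn.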

Required Input Data Formats

Raw Count Matrix
  • File format: CSV or TSV
  • Rows: genes (one gene per row)
  • Columns: samples (one sample per column)
  • First column: gene identifiers
  • Values: raw (un-normalized) integer counts

Example:
GeneID,   Control1, Control2, Treatment1
ENSG0001, 149,      122,      218
ENSG0002, 409,      151,      46

Sample Metadata
  • File format: CSV or TSV
  • Rows: samples (one sample per row)
  • First column: sample names — must match matrix column names
  • Additional columns: any annotation (condition, batch, time, ...)

Example:
Sample,    Group,     Batch
Control1,  Control,   B1
Control2,  Control,   B2

Gene Info Table (optional)
  • Maps gene IDs to symbol, biotype, and length
  • Biotype column enables the biotype-composition QC plot
  • Length column (in bp) enables TPM normalization

Example:
GeneID,   Symbol, Biotype,         Length
ENSG0001, GENE1,  protein_coding,  2400
ENSG0002, GENE2,  lncRNA,          1800
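
Before uploading, you may want to check locally that the sample names in the metadata match the matrix column names. The sketch below is one way to do that with the Python standard library; the function names are hypothetical and it is not the check the tool itself runs. The abbreviated examples above are used as input, so Treatment1 is reported as missing from the metadata.

```python
import csv
import io


def matrix_sample_ids(matrix_text):
    """Sample IDs are the header cells after the gene-identifier column."""
    reader = csv.reader(io.StringIO(matrix_text), skipinitialspace=True)
    return [h.strip() for h in next(reader)][1:]


def metadata_sample_ids(metadata_text):
    """Sample IDs are the first column of every non-header row."""
    reader = csv.reader(io.StringIO(metadata_text), skipinitialspace=True)
    next(reader)  # skip the header row
    return [row[0].strip() for row in reader if row]


def compare_sample_ids(matrix_text, metadata_text):
    """Report IDs present in one file but not the other (hypothetical helper)."""
    in_matrix = matrix_sample_ids(matrix_text)
    in_metadata = metadata_sample_ids(metadata_text)
    return {
        "missing_from_metadata": [s for s in in_matrix if s not in in_metadata],
        "missing_from_matrix": [s for s in in_metadata if s not in in_matrix],
    }


matrix_text = """GeneID, Control1, Control2, Treatment1
ENSG0001, 149, 122, 218
ENSG0002, 409, 151, 46
"""
metadata_text = """Sample, Group, Batch
Control1, Control, B1
Control2, Control, B2
"""
report = compare_sample_ids(matrix_text, metadata_text)
print(report)
```

For TSV files, pass delimiter="\t" to csv.reader instead of relying on the comma default.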

Input Data


Raw Count Matrix

Salmon Quant Files

Raw counts are taken from the NumReads column of each Salmon quant file. Sample names come from the file names, so either rename the files (quant.sf → sample1.sf, …) or upload them from per-sample folders. Gene lengths are read automatically to enable TPM.
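
As a rough illustration of what is read from a Salmon quant file, the sketch below parses the tab-separated quant.sf columns (Name, Length, EffectiveLength, TPM, NumReads) and derives a sample name from the path. The helper names, the rule of using the parent folder name for files still called quant.sf, the choice of the Length column, and the two data rows are assumptions made for the example, not a description of the tool's internals.

```python
import csv
import io
from pathlib import PurePath


def sample_name_from_path(path):
    """'sample1.sf' -> 'sample1'; 'sample1/quant.sf' -> 'sample1'.

    Falling back to the folder name for un-renamed quant.sf files is an
    assumption about how per-sample folders are handled.
    """
    p = PurePath(path)
    if p.name == "quant.sf" and p.parent.name:
        return p.parent.name
    return p.stem


def read_quant_sf(text):
    """Return {feature_name: (num_reads, length_bp)} from quant.sf content."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {
        row["Name"]: (float(row["NumReads"]), int(row["Length"]))
        for row in reader
    }


# Invented rows in the quant.sf layout, for illustration only.
quant_text = (
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "FEATURE_A\t2400\t2150.000\t10.500000\t149.000\n"
    "FEATURE_B\t1800\t1550.000\t40.250000\t409.000\n"
)
counts = read_quant_sf(quant_text)
print(sample_name_from_path("sample1/quant.sf"), counts["FEATURE_A"])
```

NumReads is parsed as a float because Salmon can report fractional read counts.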

Sample Metadata

Gene Info (optional)


Explore the tool with a built-in demo dataset.

Demo: 300 genes across 9 samples in 3 groups (Control / Treatment / Sham), with batch and time metadata and a gene-info table. Sham3 is a deliberately low-depth sample.




QC Options

Options change with the selected plot.






Plot settings: X axis, Y axis, Legend



PCA Options

Options change with the selected plot.


Components


The Mahalanobis distance is computed on the PCs shown. Samples beyond the threshold are highlighted and listed in the Outlier Table.
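
To make the outlier rule concrete, here is a minimal sketch of Mahalanobis distance on two principal-component scores per sample. It assumes the sample covariance (n - 1 denominator) and a user-supplied distance threshold; the function names, the threshold value, and the PC scores are invented for illustration, and the tool's own defaults may differ.

```python
import math


def mahalanobis_2d(scores):
    """Mahalanobis distance of each (pc_x, pc_y) point from the sample mean."""
    n = len(scores)
    mx = sum(x for x, _ in scores) / n
    my = sum(y for _, y in scores) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in scores) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in scores) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in scores) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in scores:
        dx, dy = x - mx, y - my
        # d^2 = v^T * inverse(covariance) * v, written out for the 2x2 case.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(max(d2, 0.0)))
    return distances


def flag_outliers(scores_by_sample, threshold):
    """Return sample names whose distance exceeds the threshold."""
    names = list(scores_by_sample)
    dists = mahalanobis_2d([scores_by_sample[s] for s in names])
    return [s for s, d in zip(names, dists) if d > threshold]


# Invented PC scores: eight clustered samples and one far from the rest.
pc_scores = {
    "S1": (1, 0), "S2": (-1, 0), "S3": (0, 1), "S4": (0, -1),
    "S5": (1, 1), "S6": (-1, -1), "S7": (1, -1), "S8": (-1, 1),
    "S9": (8, 8),
}
print(flag_outliers(pc_scores, threshold=2.0))
```

Because the outlier itself inflates the covariance estimate, distances in small sample sets are bounded (the squared distance cannot exceed (n - 1)^2 / n), so a threshold that works for large cohorts may flag nothing with only a handful of samples.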

Plot settings: X axis, Y axis, Legend



Normalization
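
TPM is the one method this page names explicitly (it needs the gene-info Length column). As a sketch of the standard TPM definition for a single sample: divide each count by the gene length in kilobases, then rescale so the values sum to one million. The function name tpm is chosen for the example, and whether the tool uses raw or effective length is not stated here; raw length in bp is assumed.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million for one sample.

    counts and lengths_bp are parallel lists, one entry per gene.
    Lengths are assumed to be raw gene lengths in base pairs.
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    if any(length <= 0 for length in lengths_bp):
        raise ValueError("gene lengths must be positive")
    # Reads per kilobase: corrects for longer genes collecting more reads.
    rates = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(counts)
    # Rescale so that each sample sums to one million.
    return [r / total * 1e6 for r in rates]


# Control1 counts with the lengths from the gene-info example.
control1_tpm = tpm([149, 409], [2400, 1800])
print(control1_tpm)
```

Since every sample is rescaled to the same total, TPM values are comparable as proportions within a sample, which is why the length column is required before this option can be used.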



Before / After Plot
Plot settings: X axis, Y axis, Legend

Download Normalized Matrix