Version 7 (modified by r.w.w.brouwer@…, 5 years ago)


User guide


NARWHAL obtains sample information from a sample-sheet: a tab-delimited file with a fixed column order. Within a column, values can be delimited up to two levels deep, using commas for the first level and semicolons for the second. The format of the sample-sheet is as follows:

Column	Description	Comment
1	Sample ID	This value should not contain spaces. The LaTeX document processor used to create the PDF QC report cannot process sample IDs that contain spaces.
2	Lane number	The lane number in which the sample is present.
3	Data reads*	If multiple data reads are present, these should be comma delimited.
4	Multiplex sequence (barcode)*	The barcode sequence.
5	Multiplex start sites*	In NARWHAL, barcodes are not required to start at the first base. This field holds the 0-offset barcode start.
7	Index read*	The read that contains the barcode sequence.
8	Reference sequence	The path to the reference sequence.
9	Paired end alignment	Should a paired-end alignment be performed?
10	Application profile	The application profile specifies the alignment settings and the tool used to align the data files for this sample. The application profiles are defined in the JSON configuration file.
11	Additional options	Additional options can be specified in this field. The options should be comma delimited, each in the format [sub-tool]:key=value. Key and value are mandatory; the sub-tool is optional. For an example see example 1.
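
The field conventions described above can be sketched in Python. This is a minimal illustration and not NARWHAL's own parser: the function names and the example option names are made up, and it assumes that the square brackets in [sub-tool]:key=value only mark the sub-tool as optional and are not typed literally.

```python
def split_levels(field):
    """Split a field on commas (level 1), then on semicolons (level 2)."""
    if not field:
        return []
    return [part.split(";") for part in field.split(",")]


def parse_option(text):
    """Parse one option of the form [sub-tool]:key=value.

    Assumption: the brackets mark the sub-tool as optional and are not
    typed literally, so 'tool:key=value' and 'key=value' are both accepted.
    """
    target, equals, value = text.partition("=")
    if not equals or not value:
        raise ValueError("option needs key=value: %r" % text)
    subtool, _, key = target.rpartition(":")
    if not key:
        raise ValueError("option has an empty key: %r" % text)
    return (subtool or None, key, value)


def check_sample_id(sample_id):
    """Reject sample IDs that are empty or contain whitespace."""
    if not sample_id or any(ch.isspace() for ch in sample_id):
        raise ValueError("invalid sample ID: %r" % sample_id)
    return sample_id


# Hypothetical option names, for illustration only.
options = [parse_option(opt) for opt in "aligner:n=2,q=20".split(",")]
print(options)
print(split_levels("1,2"))
```

Checking the sample ID before starting a run catches the space problem early, well before the PDF report is generated.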

Quick start

After creating the sample-sheet, NARWHAL can be run using a single command:

# simple run
> ./ -s path_to_sample_sheet path_to_BASECALLS path_to_OUTPUT

The script runs the and the scripts in sequence.

Detailed run

Before starting a long-running analysis, it is usually worth checking the configuration for errors.

# perform a dry run to test the configuration
> python -Ds path_to_sample_sheet path_to_BASECALLS path_to_OUTPUT
# Output describing the various processes that will be performed

# make the working directory
> python -s path_to_sample_sheet path_to_BASECALLS path_to_OUTPUT
# Output describing the various processes that will be performed

The "" script will set up the analysis output folder and all of the parameter files required to run the NARWHAL tools. This runfolder has the following fixed folder structure:


The raw_data folder will contain the FastQ files obtained from the Qseq conversions. These FastQ files are produced per tile. They are used by the demultiplexing procedure, which writes its output to the demultiplexed folder. The alignment uses those files and writes the aligned reads in SAM format to the alignment folder. In that folder, the SAM to BAM format conversions are also performed. The output from the QC analysis is written to the stats folder. The projects folder is reserved for sample-specific analyses.
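
As a quick sanity check before starting the tools, the folder layout described above can be verified with a few lines of Python. This is only a sketch: the sub-folder names are taken from the paragraph above, but their exact on-disk spelling and nesting are an assumption, and the helper name missing_subfolders is hypothetical.

```python
import tempfile
from pathlib import Path

# Sub-folder names as mentioned in the text; the exact on-disk names and
# nesting used by NARWHAL are an assumption here.
SUBFOLDERS = ["raw_data", "demultiplexed", "alignment", "stats", "projects"]


def missing_subfolders(runfolder):
    """Return the expected sub-folders that are absent from runfolder."""
    root = Path(runfolder)
    return [name for name in SUBFOLDERS if not (root / name).is_dir()]


# Demonstration on a throw-away directory rather than a real runfolder.
with tempfile.TemporaryDirectory() as tmp:
    for name in SUBFOLDERS[:3]:
        (Path(tmp) / name).mkdir()
    print(missing_subfolders(tmp))
```

An empty list would mean that all five folders are present.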

To start the analyses, simply run the script on the runfolder.

# run the tools
> bash $runfolder


After the script has finished, numerous files will have been created in the various sub-folders of the runfolder. Depending on which files need to be exported to the researchers, the operator will need to go to the demultiplexed or alignment folders. Of particular importance is the stats folder, which holds the alignment QC reports generated from the BAM files.

The QC reports contain the general alignment statistics, together with figures showing the edit rates, the alignment percentage per chromosome, the read-length distribution and the replication distributions. Examples of these graphs are shown below:

Table 1) Alignment statistics

Total number of reads	49922743
Number of aligned reads	48165331
Alignment percentage	96.4
Forward reads	21304254
Reverse reads	26861077
Edit-rate	0.0033234
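
The relationships between the values in Table 1 can be checked with a few lines of Python. This is a sketch that assumes the alignment percentage is the number of aligned reads divided by the total number of reads, which the table suggests but this guide does not state explicitly.

```python
# Values copied from Table 1.
total_reads = 49922743
aligned_reads = 48165331
forward_reads = 21304254
reverse_reads = 26861077

# Forward and reverse reads should add up to the aligned reads.
strands_add_up = forward_reads + reverse_reads == aligned_reads

# Alignment percentage as a plain ratio; how NARWHAL rounds it is not stated.
alignment_percentage = 100.0 * aligned_reads / total_reads

print(strands_add_up)
print(alignment_percentage)
```

With these values the forward and reverse counts sum to the aligned-read count, and the ratio lies between 96.4 and 96.5 percent, consistent with the table.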
