|Version 8 (modified by r.w.w.brouwer@…, 4 years ago)|
NARWHAL obtains sample information from a tab-delimited sample-sheet with a fixed column order. Within a column, values can be delimited up to two levels deep, using commas for the first level and semicolons for the second. The format of the sample-sheet is as follows:
|1||Sample ID||This value should not contain spaces. The LaTeX document processor used to create the PDF QC report cannot process sample IDs that contain spaces.|
|2||Lane number||The lane number in which the sample is present.|
|3||Data reads*||If multiple data reads are present these should be comma delimited.|
|4||multiplex sequence (barcode)*||The barcode sequence.|
|5||multiplex start sites*||In NARWHAL, barcodes are not required to start at the first base. This field holds the 0-based start position of the barcode.|
|7||Index read*||The read that contains the barcode sequence.|
|8||Reference sequence||The path to the reference sequence.|
|9||Paired end alignment||Should a paired-end alignment be performed?|
|10||Application profile||The alignment settings and tool that will be used to align the data files for this sample are specified by the application profile. The application profiles are specified in the json configuration file.|
|11||Additional options||Additional options can be specified in this field. The options should be comma delimited in a specific option format. This option format should be [sub-tool]:key=value. Key and value are mandatory and the sub-tool is optional. For an example see example 1.|
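The two-level delimiting and the option format described in the table can be sketched in a few lines of Python. This is only an illustration, not NARWHAL's own parser. It assumes that the square brackets in [sub-tool]:key=value mark the sub-tool as optional, so that both key=value and sub-tool:key=value are accepted. The function names and the option names in the usage note are hypothetical.

```python
def split_nested(field):
    """Split a field on commas (level 1), then on semicolons (level 2)."""
    return [part.split(";") for part in field.split(",")]


def parse_options(field):
    """Parse a comma-delimited list of 'sub-tool:key=value' items.

    Assumption: the sub-tool prefix is optional, so 'key=value' is also
    valid. Items without a sub-tool are returned with None in its place.
    Returns a list of (sub_tool, key, value) tuples.
    """
    options = []
    for item in field.split(","):
        item = item.strip()
        if not item:
            continue
        # Everything before the first '=' is the (optionally prefixed) key.
        key_part, sep, value = item.partition("=")
        # The sub-tool, if present, is separated from the key by ':'.
        sub_tool, _, key = key_part.rpartition(":")
        if not sep or not key or not value:
            raise ValueError(f"option {item!r} needs both a key and a value")
        options.append((sub_tool or None, key, value))
    return options
```

For example, `parse_options("aligner:n=2,q=20")` yields one option scoped to a sub-tool called `aligner` and one unscoped option, and `split_nested("1,2;3")` yields `[["1"], ["2", "3"]]`.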
After creating the sample-sheet, NARWHAL can be run using a single command:
# simple run
> ./narwhal.sh -s path_to_sample_sheet path_to_BASECALLS path_to_OUTPUT
The narwhal.sh script runs the prepare_run.py and the run_tools.sh scripts in sequence.
In most cases, users may want to perform some error checking prior to running a long-running analysis.
# Perform a dry run to test the configuration
> python prepare_run.py -Ds path_to_sample_sheet path_to_BASECALLS path_to_OUTPUT
#
# Output describing the various processes that will be performed
#

# make the working directory
> python prepare_run.py -s path_to_sample_sheet path_to_BASECALLS path_to_OUTPUT
# Output describing the various processes that will be performed
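When the dry run and the real preparation are scripted together, the two commands above can be wrapped so that the working directory is only created after the dry run succeeds. The sketch below is a hypothetical wrapper, not part of NARWHAL. It uses only the -Ds and -s flags shown above and assumes that prepare_run.py exits with a non-zero status when it detects a problem, which the documentation does not state.

```python
import subprocess


def prepare_run_command(sample_sheet, basecalls, output, dry_run=False):
    """Build the prepare_run.py command line shown in the documentation."""
    flags = "-Ds" if dry_run else "-s"
    return ["python", "prepare_run.py", flags, sample_sheet, basecalls, output]


def prepare_with_dry_run(sample_sheet, basecalls, output):
    """Run the dry run first and only then create the working directory.

    Assumption: a failed check makes prepare_run.py return a non-zero
    exit status, which check=True turns into CalledProcessError.
    """
    for dry_run in (True, False):
        cmd = prepare_run_command(sample_sheet, basecalls, output, dry_run)
        subprocess.run(cmd, check=True)
```

Keeping the command construction in its own function makes it possible to inspect the exact command line without starting an analysis.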
The "prepare_run.py" script will set up the analysis output folder and all of the parameter files required to run the NARWHAL tools. This runfolder has the following fixed folder structure:
output_folder/NARWHAL_illumina_runfolder/
    raw_data/
    demultiplexed/
        sample_name_1/
        ..
        sample_name_N/
    alignment/
        sample_name_1/
        ..
        sample_name_N/
    stats/
        sample_name_1/
        ..
        sample_name_N/
    projects/
The raw_data folder will contain the FastQ files obtained from the Qseq conversions. These FastQ files are produced per tile and are used by the demultiplex procedure, which writes its output to the demultiplexed folder. The alignment uses those files and writes the aligned reads in SAM format to the alignment folder. In that folder, the SAM to BAM format conversions will be performed. The output from the QC analysis will be written to the stats folder. The projects folder is reserved for sample-specific analyses.
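Because the layout is fixed, the folders expected for a given set of samples can be listed programmatically, for instance to check a runfolder before exporting files. The following is a minimal sketch based on one reading of the structure above (per-sample sub-folders under demultiplexed, alignment and stats). The function name and the sample names used with it are hypothetical.

```python
from pathlib import PurePosixPath

# Folders that hold one sub-folder per sample, in pipeline order.
PER_SAMPLE_FOLDERS = ("demultiplexed", "alignment", "stats")


def expected_folders(runfolder, samples):
    """List the folders the fixed runfolder layout implies for the samples."""
    root = PurePosixPath(runfolder)
    folders = [root / "raw_data"]
    for stage in PER_SAMPLE_FOLDERS:
        folders.append(root / stage)
        folders.extend(root / stage / sample for sample in samples)
    folders.append(root / "projects")
    return [str(folder) for folder in folders]
```

Comparing this list against the directories actually present on disk gives a quick indication of which stage has not yet produced output for a sample.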
To start the analyses, simply run the run_tools.sh script on the runfolder.
# run the tools
> bash run_tools.sh $runfolder
After the run_tools.sh script has finished, numerous files will have been created in the various sub-folders of the runfolder. Depending on which files need to be exported to the researchers, the operator will need to go to the demultiplexed or alignment folders. Of particular importance is the stats folder. This folder holds the alignment QC reports generated from the BAM files.
In the QC reports, information is present on the general alignment statistics and figures showing the edit rates, alignment percentage per chromosome, read-length distribution and replication distributions. Examples of these graphs are shown below:
Table 1) Alignment statistics
|Total number of reads||49922743|
|Number of aligned reads||48165331|
|Alignment percentage||96.4|
|Forward reads||21304254|
|Reverse reads||26861077|
|Edit-rate||0.0033234|
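The figures in Table 1 are related by simple arithmetic, which can be used as a sanity check on a report. The short sketch below recomputes them from the values in the table. The variable names are illustrative, and the way NARWHAL itself derives or rounds the percentage is not documented here.

```python
# Values copied from Table 1.
total_reads = 49922743
aligned_reads = 48165331
forward_reads = 21304254
reverse_reads = 26861077

# Forward and reverse reads together account for all aligned reads.
strand_sum = forward_reads + reverse_reads

# Alignment percentage as aligned reads over total reads.
alignment_pct = 100.0 * aligned_reads / total_reads

print(strand_sum == aligned_reads)
print(round(alignment_pct, 2))
```

The recomputed percentage is about 96.48, slightly above the 96.4 shown in the table, which would be consistent with the report truncating rather than rounding the value.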
- example_simple.txt (0.6 KB) - added by r.w.w.brouwer@… 4 years ago.
- example_multiplex.txt (1.3 KB) - added by r.w.w.brouwer@… 4 years ago.
- example_multiplex_multiple_lanes.txt (0.7 KB) - added by r.w.w.brouwer@… 4 years ago.
(270 bytes) - added by m.vandenhout@…
3 years ago.
Narwhal samplesheet example for FastQ mode