Version 6 (modified by r.straver@…, 10 years ago)

Welcome to project WISECONDOR

Welcome to the project page of WISECONDOR (WIthin-SamplE COpy Number aberration DetectOR), a tool to detect fetal trisomies and smaller CNVs in a maternal plasma sample using whole-genome data.

This page is meant as a short introduction to get you started.

Explanation

For details on the method, see the WISECONDOR paper which should be available soon. Additional information will be put on this page in the future.

Getting Started

Obtaining Scripts

To obtain WISECONDOR, use

svn co https://trac.nbic.nl/svn/wisecondor/trunk wisecondor

on a Linux machine to check out the most recent scripts available. To obtain the latest stable version, pick the highest version number you can find under the tags instead of the trunk.

Dependencies

WISECONDOR was developed and tested using Python 2.7. Using any other version may cause errors or faulty results. The working version is tested using SAMtools on .bam files created by BWA.

WISECONDOR uses several Python modules. The following are part of the standard library and need no separate installation:

sys
pickle
math
glob
argparse

Additional, less common packages that most likely have to be installed separately are:

numpy
biopython
matplotlib
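
Before running any of the scripts, it may help to check that the separately installed packages can actually be imported. The snippet below is a minimal sketch and not part of WISECONDOR; the function name find_missing_modules is made up for illustration. Note that Biopython is imported under the name Bio.

```python
import importlib

# Import names for the third-party packages listed above.
# Biopython is imported as "Bio", not "biopython".
REQUIRED_MODULES = ["numpy", "Bio", "matplotlib"]


def find_missing_modules(module_names):
    """Return the names from module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules can be imported.")
```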

First Steps

To help you understand the system and the steps that follow, here is a flow diagram showing where each piece of data is produced and where it is required.

Flow diagram of the scripts used in WISECONDOR

Count GC Frequency per Bin

TODO - This little step fell off in a transition to BWA. Hang on.

Convert BAM to PICKLE

Currently, WISECONDOR's entry point expects to receive a sorted, SAM-formatted input stream. It filters the reads to remove so-called read towers and counts the number of reads left per bin. You can do this from a bash script or by running something like the following in a terminal:

/path/to/samtools view ex_sample.bam | python consam.py /path/to/ex_sample.pickle
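
To illustrate what counting reads per bin on a sorted SAM stream involves, here is a minimal sketch. It is not the code of consam.py: the function name count_reads_per_bin is hypothetical, and the duplicate-position check is only a crude stand-in for the actual read-tower filter, whose rules are not described on this page. It relies only on the SAM layout, in which header lines start with @ and the third and fourth tab-separated fields hold the reference name and the 1-based position.

```python
from collections import defaultdict


def count_reads_per_bin(sam_lines, binsize=1000000):
    """Count reads per bin from an iterable of SAM-formatted lines.

    Returns a dict mapping reference name -> list of read counts per bin.
    Simplification (an assumption, not consam.py's filter): a read that
    starts at the same position as the previous read on the same
    reference is skipped, which relies on the input being sorted.
    """
    counts = defaultdict(list)
    prev_key = None
    for line in sam_lines:
        if line.startswith("@"):  # SAM header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:  # blank or malformed line
            continue
        chrom, pos = fields[2], int(fields[3])
        if chrom == "*" or pos < 1:  # unmapped read
            continue
        key = (chrom, pos)
        if key == prev_key:  # same start as previous read: skip
            continue
        prev_key = key
        bin_index = (pos - 1) // binsize  # SAM positions are 1-based
        bins = counts[chrom]
        if len(bins) <= bin_index:
            bins.extend([0] * (bin_index + 1 - len(bins)))
        bins[bin_index] += 1
    return dict(counts)
```

In a pipeline like the one above, such a function could be called with sys.stdin as sam_lines and its result stored with pickle.dump. The layout of the pickle that consam.py really writes may differ.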

Creating a Reference Table

To teach WISECONDOR which bins behave alike, we need to feed it a set of (healthy) reference samples. Copy or move the files of healthy samples created in the previous step into a separate folder. Then tell WISECONDOR to build a new reference table from all .pickle files in that directory, using the previously created GC-count file (to apply GC correction), and to store the reference table in a file for later use. For example:

python newref.py /path/to/refdir/ /path/to/gccountperbin.pickle /path/to/reftable.pickle

This step may take several minutes, depending mostly on the number of reference samples you provide.
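
The idea of bins that "behave alike" can be sketched as follows: describe each bin by its values across all reference samples, then pick the bins whose profiles are closest to that of the target bin. This is only an illustration under assumptions. The function names, the squared-distance measure, and the input layout (one list of normalized read frequencies per sample) are chosen for the example; newref.py may select reference bins differently.

```python
def squared_distance(a, b):
    """Sum of squared differences between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def find_reference_bins(samples, target_bin, max_refs=2):
    """Return indices of the bins that behave most like target_bin.

    samples is a list of per-sample lists holding normalized read
    frequencies per bin; all samples must have the same number of bins.
    The distance measure is an assumption made for this sketch.
    """
    n_bins = len(samples[0])
    # Profile of a bin = its values across all reference samples.
    profiles = [[sample[i] for sample in samples] for i in range(n_bins)]
    target = profiles[target_bin]
    candidates = [i for i in range(n_bins) if i != target_bin]
    # Closest profiles first.
    candidates.sort(key=lambda i: squared_distance(profiles[i], target))
    return candidates[:max_refs]
```

The pairwise comparison of all bins across all samples also gives an intuition for why building the table takes longer as more reference samples are added.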

Testing A Sample

Now that WISECONDOR knows which bins on the genome are likely to behave alike, we can feed it a sample, and it will try to find areas that differ greatly from their own set of reference bins. To test a sample, run the test script and pass it the sample pickle, the GC-count file (again, to apply GC correction), the reference table, and a path plus basename so it knows where to write a plot of the results. The results are also written to stdout, which you may want to save for later use by redirecting them to a file using >.

python test.py /path/to/ex_sample.pickle /path/to/gccountperbin.pickle /path/to/reftable.pickle /path/to/ex_sample.plot > /path/to/ex_sample.result
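
One plausible way to quantify "differs greatly from its own set of reference bins" is a standard score (z-score) of a bin against its reference bins within the same sample. The sketch below assumes this approach for illustration only; the function names, the input layout, and the threshold of 3.0 are assumptions, and test.py may use a different statistic.

```python
import math


def z_score(value, reference_values):
    """Standard score of value relative to reference_values."""
    n = len(reference_values)
    if n < 2:
        raise ValueError("need at least two reference values")
    mean = sum(reference_values) / float(n)
    variance = sum((v - mean) ** 2 for v in reference_values) / (n - 1)
    if variance == 0:
        raise ValueError("reference values have zero variance")
    return (value - mean) / math.sqrt(variance)


def flag_aberrant_bins(sample, reference_bins, threshold=3.0):
    """Return indices of bins whose z-score magnitude exceeds threshold.

    sample: list of normalized read frequencies per bin.
    reference_bins: dict mapping bin index -> list of reference bin
    indices, all taken from the same sample (within-sample comparison).
    The threshold value is an assumption made for this sketch.
    """
    flagged = []
    for bin_index, refs in sorted(reference_bins.items()):
        score = z_score(sample[bin_index], [sample[r] for r in refs])
        if abs(score) > threshold:
            flagged.append(bin_index)
    return flagged
```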

Tweaking and Fine-Tuning

WISECONDOR has a large number of parameters that may require tweaking to work well on your data, as results can differ with the methods and machines used to obtain your NGS data. In the steps described above, all parameters are left at their defaults to keep the commands readable, but they can easily be altered. To see what can be changed, run any script with the '-h' argument: it returns a list of options with their descriptions and default values. Keep in mind that several options must be exactly the same across scripts; for example, the binsize used in every step must be the same, or the results will be meaningless. Any option for which this is true has the same argument name in each script and is marked as such in its description. For example, using '-h' on the newref.py script:

$>python newref.py -h

usage: newref.py [-h] [-binsize BINSIZE] [-gccmaxn GCCMAXN]
                 [-gccminrd GCCMINRD] [-gccfval GCCFVAL] [-gccival GCCIVAL]
                 refdir gccount refout

Create a new reference table from a set of reference samples. Applies gc-correction. Outputs table as pickle to a specified output file.

positional arguments:
  refdir              directory containing samples to be used as reference
                      (pickle)
  gccount             gc-counts file used for gc-correction (pickle)
  refout              reference table output, used for sample testing (pickle)

optional arguments:
  -h, --help          show this help message and exit
  -binsize BINSIZE    binsize used for samples (default: 1000000)
  -gccmaxn GCCMAXN    maximum relative amount of unknown (n) bases in bin used
                      for gc-correction (equals arg used in test) (default:
                      0.1)
  -gccminrd GCCMINRD  minimum relative amount of reads in bin used for gc-
                      correction (equals arg used in test) (default: 0.0001)
  -gccfval GCCFVAL    width of data used in loess function used for gc-
                      correction (equals arg used in test) (default: 0.1)
  -gccival GCCIVAL    amount of fitting iterations in loess function used for
                      gc-correction (equals arg used in test) (default: 3)
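
For readers who want to script around these options, the following sketch shows how a command-line interface like the one printed above can be declared with argparse. It mirrors the names, help texts, and defaults from the help output, but it is a reconstruction rather than the code of newref.py. The function name build_parser and the argument types (int or float) are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the newref.py help text shown above."""
    parser = argparse.ArgumentParser(
        prog="newref.py",
        description="Create a new reference table from a set of reference "
                    "samples. Applies gc-correction. Outputs table as "
                    "pickle to a specified output file.")
    parser.add_argument("refdir",
                        help="directory containing samples to be used as "
                             "reference (pickle)")
    parser.add_argument("gccount",
                        help="gc-counts file used for gc-correction (pickle)")
    parser.add_argument("refout",
                        help="reference table output, used for sample "
                             "testing (pickle)")
    # Argument types below are assumptions inferred from the defaults.
    parser.add_argument("-binsize", type=int, default=1000000,
                        help="binsize used for samples "
                             "(default: %(default)s)")
    parser.add_argument("-gccmaxn", type=float, default=0.1,
                        help="maximum relative amount of unknown (n) bases "
                             "in bin used for gc-correction "
                             "(default: %(default)s)")
    parser.add_argument("-gccminrd", type=float, default=0.0001,
                        help="minimum relative amount of reads in bin used "
                             "for gc-correction (default: %(default)s)")
    parser.add_argument("-gccfval", type=float, default=0.1,
                        help="width of data used in loess function used "
                             "for gc-correction (default: %(default)s)")
    parser.add_argument("-gccival", type=int, default=3,
                        help="amount of fitting iterations in loess "
                             "function (default: %(default)s)")
    return parser
```

Calling build_parser().parse_args(["-binsize", "500000", "refs/", "gc.pickle", "out.pickle"]) yields a namespace whose binsize is 500000 while the other options keep their defaults. The same -binsize value would then have to be passed to every other script in the workflow.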

Default Page Info

Mailing lists

This project provides the following mailing lists.

Source access

If available, anonymous readonly subversion access works as follows:

  svn co https://trac.nbic.nl/svn/wisecondor wisecondor

Write access is only available to registered developers.

You can become a developer by registering yourself if you haven't already done so, and requesting write access on the wisecondor-users mailing list.

Starting Points

Other NBIC software projects

All active NBIC software projects can be accessed from the project index.
