Version 8 (modified by r.straver@…, 10 years ago)
Welcome to project WISECONDOR
Welcome to the project page of WISECONDOR (WIthin-SamplE COpy Number aberration DetectOR), a tool that detects fetal trisomies and smaller CNVs in a maternal plasma sample using whole-genome data.
This page is meant as a short introduction to get you started.
Explanation
For details on the method, see the WISECONDOR paper which should be available soon. Additional information will be put on this page in the future.
Getting Started
Obtaining Scripts
To obtain WISECONDOR, use
svn co https://trac.nbic.nl/svn/wisecondor/trunk wisecondor
on a Linux machine to check out the latest development scripts. To obtain the latest stable version, check out the highest version number you can find under the tags instead of the trunk.
Dependencies
WISECONDOR was developed and tested using Python 2.7. Using any other version may cause errors or faulty results. The working version is tested using SAMtools on .bam files created by BWA.
WISECONDOR uses several Python modules. The following are part of the Python standard library:
sys pickle math glob argparse
The following less common packages most likely have to be installed separately:
numpy biopython matplotlib
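Before running any of the scripts, it may be worth checking that these packages can be imported by the Python interpreter you intend to use. The helper below is a minimal sketch and not part of WISECONDOR; the function name missing_packages is made up for illustration. Note that Biopython is imported under the module name Bio.

```python
import importlib


def missing_packages(module_names):
    """Return the module names from module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Import names of the additional packages listed above
# (Biopython installs as the module "Bio").
REQUIRED_MODULES = ["numpy", "Bio", "matplotlib"]

if __name__ == "__main__":
    absent = missing_packages(REQUIRED_MODULES)
    if absent:
        print("Missing packages: " + ", ".join(absent))
    else:
        print("All additional packages are importable.")
```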
As for the reference genome, any reference that contains every autosomal chromosome once should do. Such a reference can be built by downloading every chromosome from UCSC and concatenating them into a single fasta file:
http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/chromosomes/
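The concatenation step can be sketched as follows. This is only an illustration, not a script shipped with WISECONDOR: the function name, the local file names (chr1.fa.gz to chr22.fa.gz) and the output name reference.fa are assumptions, and the sketch assumes every input file ends with a newline so that records do not run together.

```python
import gzip
import shutil


def concatenate_fasta(chromosome_files, output_path):
    """Append each per-chromosome FASTA file, in the given order, to output_path.

    Files ending in ".gz" are decompressed on the fly; all others are
    copied as-is. The output is always written uncompressed.
    """
    with open(output_path, "wb") as out_handle:
        for path in chromosome_files:
            opener = gzip.open if path.endswith(".gz") else open
            with opener(path, "rb") as in_handle:
                shutil.copyfileobj(in_handle, out_handle)


def autosome_file_names(suffix=".fa.gz"):
    """Return hypothetical local file names for chromosomes 1 to 22."""
    return ["chr%d%s" % (number, suffix) for number in range(1, 23)]


# Example call, assuming the files were downloaded to the current directory:
# concatenate_fasta(autosome_file_names(), "reference.fa")
```

Listing the files explicitly, instead of using a wildcard, keeps each autosome in the output exactly once and in a predictable order.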
Also, to map your reads and produce a .bam file, we suggest using BWA:
http://bio-bwa.sourceforge.net/
To read the produced .bam file we use SAMtools, which can be obtained here: http://samtools.sourceforge.net/
First Steps
To understand the system and what we will do next, see the flow diagram (attachment wisecondor.svg), which shows where each piece of data is produced and where it is required.
Count GC Frequency per Bin
TODO - This little step fell off in a transition to BWA. Hang on.
Convert BAM to PICKLE
Currently, WISECONDOR's entry point expects a sorted, SAM-formatted input stream. It filters the reads to remove so-called read towers and counts the number of reads left per bin. You can run this step from a bash script or by entering something like this in a terminal:
/path/to/samtools view ex_sample.bam | python consam.py /path/to/ex_sample.pickle
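To make the idea of counting reads per bin concrete, here is a simplified sketch of what such a step might look like. It is not the code of consam.py: it skips the read-tower filter entirely, and the function name and the choice to assign a 1-based position p to bin (p - 1) // binsize are assumptions made for illustration. The default bin size of 1000000 matches the default shown in the script help.

```python
from collections import defaultdict


def count_reads_per_bin(sam_lines, binsize=1000000):
    """Count mapped reads per chromosome and bin from SAM-formatted lines.

    Returns a dict mapping chromosome name to a dict of
    {bin index: number of reads starting in that bin}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # not a complete alignment record
        chromosome = fields[2]   # RNAME column
        position = int(fields[3])  # POS column, 1-based
        if chromosome == "*" or position == 0:
            continue  # unmapped read
        counts[chromosome][(position - 1) // binsize] += 1
    # Convert to plain dicts so the result can be pickled.
    return dict((name, dict(bins)) for name, bins in counts.items())
```

In a pipeline like the one above, the lines would come from standard input (for example sys.stdin) and the resulting dictionary would be written out with pickle.dump.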
Creating a Reference Table
To teach WISECONDOR which bins behave alike, we need to feed it a set of (healthy) reference samples. Copy or move the healthy sample files created in the previous step into a separate folder. Then tell WISECONDOR to build a new reference table from all .pickle files in that directory, using the previously created GC-count file (to apply GC-correction), and to store the reference table in a file for later use. For example:
python newref.py /path/to/refdir/ /path/to/gccountperbin.pickle /path/to/reftable.pickle
This step may take several minutes, depending mostly on the number of reference samples you provide.
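Since the reference table is built from every .pickle file in the directory, it can help to list those files before starting. The sketch below is not part of WISECONDOR; it assumes that only files directly inside the directory with a .pickle extension are picked up, which is a guess based on the description above. If that holds, other pickle files (such as the GC-count file or the reference table itself) should presumably be kept outside the reference directory.

```python
import glob
import os


def list_reference_samples(refdir):
    """Return the sorted paths of all .pickle files directly inside refdir."""
    return sorted(glob.glob(os.path.join(refdir, "*.pickle")))


# Example call with the placeholder path used on this page:
# for path in list_reference_samples("/path/to/refdir/"):
#     print(path)
```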
Testing A Sample
Now that WISECONDOR knows which bins on the genome are likely to behave alike, we can feed it a sample and it will try to discover areas that differ greatly from their own set of reference bins. To test a sample, run the test script and give it the sample pickle, the GC-count file (again, to apply GC-correction), the reference file and a path plus basename that tells it where to write a plot of the results. The textual output goes to stdout, which you may want to save for later use by redirecting it to a file using >.
python test.py /path/to/ex_sample.pickle /path/to/gccountperbin.pickle /path/to/reftable.pickle /path/to/ex_sample.plot > /path/to/ex_sample.result
Tweaking and Fine-Tuning
WISECONDOR has a large number of parameters that may require some tweaking to work well on your data, as results may differ between the methods and machines used to obtain your NGS data. In the steps described above, all parameters are left at their defaults to keep the commands readable, but they can easily be altered.
If you want to tweak some parameters, try running any script with the '-h' argument. It returns a list of options with their descriptions and default values. Keep in mind that several options need to be exactly the same across different scripts. For example, the binsize used in every step should be the same or the results will be meaningless. Any such option has the same argument name in every script that uses it and is marked in its description. For example, using '-h' on the newref.py script:
$> python newref.py -h
usage: newref.py [-h] [-binsize BINSIZE] [-gccmaxn GCCMAXN]
                 [-gccminrd GCCMINRD] [-gccfval GCCFVAL] [-gccival GCCIVAL]
                 refdir gccount refout

Create a new reference table from a set of reference samples. Applies
gc-correction. Outputs table as pickle to a specified output file.

positional arguments:
  refdir              directory containing samples to be used as reference
                      (pickle)
  gccount             gc-counts file used for gc-correction (pickle)
  refout              reference table output, used for sample testing (pickle)

optional arguments:
  -h, --help          show this help message and exit
  -binsize BINSIZE    binsize used for samples (default: 1000000)
  -gccmaxn GCCMAXN    maximum relative amount of unknown (n) bases in bin used
                      for gc-correction (equals arg used in test) (default: 0.1)
  -gccminrd GCCMINRD  minimum relative amount of reads in bin used for
                      gc-correction (equals arg used in test) (default: 0.0001)
  -gccfval GCCFVAL    width of data used in loess function used for
                      gc-correction (equals arg used in test) (default: 0.1)
  -gccival GCCIVAL    amount of fitting iterations in loess function used for
                      gc-correction (equals arg used in test) (default: 3)
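One way to keep shared options identical across scripts is to define them once and build every command line from that single definition. The sketch below is an illustration only: the helper name shared_flags is made up, the values are the defaults from the help text above, and it assumes that test.py accepts the same four gc-correction options, as the "(equals arg used in test)" notes suggest.

```python
def shared_flags(options):
    """Turn {"gccmaxn": "0.1"} into ["-gccmaxn", "0.1"], sorted by option name."""
    flags = []
    for name in sorted(options):
        flags.extend(["-" + name, str(options[name])])
    return flags


# Options the help text marks as "equals arg used in test",
# set here to their documented defaults.
GC_OPTIONS = {
    "gccmaxn": "0.1",
    "gccminrd": "0.0001",
    "gccfval": "0.1",
    "gccival": "3",
}

# Both commands reuse the same flags, so the values cannot drift apart.
newref_command = ["python", "newref.py"] + shared_flags(GC_OPTIONS) + [
    "/path/to/refdir/",
    "/path/to/gccountperbin.pickle",
    "/path/to/reftable.pickle",
]
test_command = ["python", "test.py"] + shared_flags(GC_OPTIONS) + [
    "/path/to/ex_sample.pickle",
    "/path/to/gccountperbin.pickle",
    "/path/to/reftable.pickle",
    "/path/to/ex_sample.plot",
]

if __name__ == "__main__":
    print(" ".join(newref_command))
    print(" ".join(test_command))
```

The printed lines can be pasted into a terminal. For the test command you would still add the stdout redirection (> /path/to/ex_sample.result) yourself.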
Default Page Info
Mailing lists
This project provides the following mailing lists.
- wisecondor-users: a list intended for general discussion on the project.
- wisecondor-commits: a list that receives source code commit messages.
- wisecondor-devel: a list intended for discussion among developers (subscription is restricted to registered developers).
Source access
If available, anonymous read-only Subversion access works as follows:
svn co https://trac.nbic.nl/svn/wisecondor wisecondor
Write access is only available to registered developers.
You can become a developer by registering yourself if you haven't already done so, and requesting write access on the wisecondor-users mailing list.
Attachments (2)
- wisecondor.svg (56.5 KB) - added by r.straver@… 10 years ago. Flow diagram of the scripts used in WISECONDOR
- wiselogo.svg (12.3 KB) - added by r.straver@… 10 years ago.