Changes between Version 1 and Version 2 of WikiStart


Timestamp:
Apr 3, 2013, 5:52:45 PM (11 years ago)
Author:
r.straver@…
Comment:

--

  • WikiStart

    v1 v2  
    1 
    21= Welcome to project WISECONDOR =
    32
     3Welcome to the project page of WISECONDOR (WIthin-SamplE COpy Number aberration DetectOR): detect fetal trisomies and smaller CNVs in a maternal plasma sample using whole-genome data.
     4
     5This page is meant as a short introduction to get you started.
     6
     7= Explanation =
     8
     9For details on the method, see the WISECONDOR paper, which should be available soon. Additional information will be added to this page in the future.
     10
     11= Getting Started =
     12
     13== Obtaining Scripts ==
     14To obtain WISECONDOR, use
     15{{{
     16svn co https://trac.nbic.nl/svn/wisecondor/trunk wisecondor
     17}}}
     18on a Linux machine to check out the most recent development scripts. To obtain the latest stable version, check out the highest version number you can find under the tags instead of the trunk.
     19
     20
     21== Dependencies ==
     22WISECONDOR was developed and tested using Python 2.7. Using any other version may cause errors or faulty results. The working version was tested using SAMtools on .bam files created by BWA.
     23
     24WISECONDOR uses several Python modules. The following are part of the Python 2.7 standard library:
     25{{{
     26sys
     27pickle
     28math
     29glob
     30argparse
     31}}}
     32
     33The following packages are less common and most likely have to be installed separately:
     34{{{
     35numpy
     36biopython
     37matplotlib
     38}}}
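A quick way to see which of the modules listed above are missing is to try importing them. The snippet below is a minimal sketch, not part of WISECONDOR; the helper name `missing_modules` is hypothetical. Note that biopython is imported under the name `Bio`.

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Import names of the modules listed above ("Bio" is biopython's import name).
STANDARD = ["sys", "pickle", "math", "glob", "argparse"]
EXTERNAL = ["numpy", "Bio", "matplotlib"]

if __name__ == "__main__":
    print("Missing modules: %s" % (missing_modules(STANDARD + EXTERNAL) or "none"))
```

Any name reported as missing has to be installed before running the scripts.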
     39
     40== First Steps ==
     41
     42=== Count GC Frequency per Bin ===
     43TODO - This little step fell off in a transition to BWA. Hang on.
     44
     45=== Convert BAM to PICKLE ===
     46Currently, WISECONDOR's entry point expects a sorted, SAM-formatted input stream. It filters the reads to remove so-called read towers and counts the number of reads left per bin. This can be done from a bash script or by entering something like this into a terminal:
     47{{{
     48/path/to/samtools view ex_sample.bam | python consam.py /path/to/ex_sample.pickle
     49}}}
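To illustrate what "counting reads per bin" means, here is a minimal sketch that bins reads from SAM-formatted lines by their start position. It is not the code of consam.py: the function name `count_reads_per_bin` is hypothetical, the read-tower filter is left out, and the bin index is assumed to be the zero-based start position divided by the bin size.

```python
def count_reads_per_bin(sam_lines, binsize=1000000):
    """Count mapped reads per (chromosome, bin) from SAM-formatted lines.

    Assumption: a read belongs to the bin containing its leftmost
    position. Read-tower filtering is NOT implemented here.
    """
    counts = {}
    for line in sam_lines:
        if line.startswith("@"):       # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:            # not a valid alignment line
            continue
        flag, chrom, pos = int(fields[1]), fields[2], int(fields[3])
        if flag & 4 or chrom == "*":   # skip unmapped reads
            continue
        bin_index = (pos - 1) // binsize   # SAM positions are 1-based
        chrom_bins = counts.setdefault(chrom, {})
        chrom_bins[bin_index] = chrom_bins.get(bin_index, 0) + 1
    return counts
```

Fed with `sys.stdin`, a function like this could sit at the receiving end of the `samtools view` pipe shown above.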
     50
     51=== Creating a Reference Table ===
     52To teach WISECONDOR which bins behave alike, we need to feed it a set of (healthy) reference samples. Copy or move the healthy sample files created in the previous step into a separate folder. Then tell WISECONDOR to build a new reference table from all .pickle files in that directory, using the GC-count file created earlier (to apply GC correction), and to store the reference table in a file for later use. For example:
     53{{{
     54python newref.py /path/to/refdir/ /path/to/gccountperbin.pickle /path/to/reftable.pickle
     55}}}
     56This step may take several minutes, depending mostly on the number of reference samples you provide.
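As an illustration of what "bins that behave alike" could mean, the sketch below ranks bins by how closely their values follow a target bin across the reference samples. This is an assumption made for illustration only, not the documented algorithm of newref.py; the function name `closest_bins` and the squared-difference measure are hypothetical choices.

```python
def closest_bins(samples, target, k=2):
    """Return the k bins whose values are most similar to bin `target`.

    samples: list of per-sample lists with one (normalized) value per bin.
    Similarity is the sum of squared differences over all samples
    (an illustrative choice, smaller means more alike).
    """
    n_bins = len(samples[0])
    distances = []
    for other in range(n_bins):
        if other == target:
            continue
        dist = sum((s[target] - s[other]) ** 2 for s in samples)
        distances.append((dist, other))
    distances.sort()
    return [bin_index for _, bin_index in distances[:k]]


# Two reference samples with four bins each.
reference = [[1.0, 1.5, 3.0, 1.25],
             [2.0, 2.5, 0.5, 2.25]]
print(closest_bins(reference, target=0, k=2))
```

In this toy data, bins 3 and 1 follow bin 0 most closely, while bin 2 behaves differently in both samples.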
     57
     58=== Testing A Sample ===
     59Now that WISECONDOR knows which bins on the genome are likely to behave alike, we can feed it a sample, and it will try to discover areas that differ greatly from their own set of reference bins. To test a sample, run the test script and give it the sample pickle, the GC-count file (again, to apply GC correction), the reference file, and a path plus basename so it knows where to write a plot of the results. The results are also written to stdout, which you may want to save for later use by redirecting it to a file using >.
     60{{{
     61python test.py /path/to/ex_sample.pickle /path/to/gccountperbin.pickle /path/to/reftable.pickle /path/to/ex_sample.plot > /path/to/ex_sample.result
     62}}}
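One common way to express "differs greatly from its reference bins" is a z-score of a bin against the values of its reference bins within the same sample. The sketch below shows that idea only; it is an assumption for illustration and not necessarily what test.py computes. The function name `bin_z_score` is hypothetical.

```python
import math


def bin_z_score(value, reference_values):
    """Z-score of `value` against the values of its reference bins.

    Uses the population standard deviation; returns None when the
    reference values show no spread, since the score is then undefined.
    """
    n = len(reference_values)
    mean = sum(reference_values) / float(n)
    variance = sum((v - mean) ** 2 for v in reference_values) / float(n)
    if variance == 0:
        return None
    return (value - mean) / math.sqrt(variance)


# A bin with value 5.0 whose reference bins have values 1.0 and 3.0.
print(bin_z_score(5.0, [1.0, 3.0]))
```

A large positive or negative score marks a bin that deviates from the bins it normally behaves like.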
     63
     64=== Tweaking and Fine-Tuning ===
     65WISECONDOR has a large number of parameters that may require some tweaking to work well on your data, as results may differ depending on the methods and machines used to obtain your NGS data. In the steps described above, all parameters are left at their defaults to keep the commands readable, but they can easily be altered. To see what can be tweaked, run any script with the '-h' argument: it returns a list of options with their descriptions and default values. Keep in mind that several options need to be exactly the same across different scripts, e.g. the binsize used in every step should be the same or the results will be meaningless. Any option for which this is true has the same argument name in each script and is marked as such in its description.
     66For example, using '-h' on the newref.py script:
     67{{{
     68$>python newref.py -h
     69
     70usage: newref.py [-h] [-binsize BINSIZE] [-gccmaxn GCCMAXN]
     71                 [-gccminrd GCCMINRD] [-gccfval GCCFVAL] [-gccival GCCIVAL]
     72                 refdir gccount refout
     73
     74Create a new reference table from a set of reference samples. Applies gc-
     75correction. Outputs table as pickle in stdout, use > to write this table to a
     76file.
     77
     78positional arguments:
     79  refdir              directory containing samples to be used as reference
     80                      (pickle)
     81  gccount             gc-counts file used for gc-correction (pickle)
     82  refout              reference table output, used for sample testing (pickle)
     83
     84optional arguments:
     85  -h, --help          show this help message and exit
     86  -binsize BINSIZE    binsize used for samples (default: 1000000)
     87  -gccmaxn GCCMAXN    maximum relative amount of unknown (n) bases in bin used
     88                      for gc-correction (equals arg used in test) (default:
     89                      0.1)
     90  -gccminrd GCCMINRD  minimum relative amount of reads in bin used for gc-
     91                      correction (equals arg used in test) (default: 0.0001)
     92  -gccfval GCCFVAL    width of data used in loess function used for gc-
     93                      correction (equals arg used in test) (default: 0.1)
     94  -gccival GCCIVAL    amount of fitting iterations in loess function used for
     95                      gc-correction (equals arg used in test) (default: 3)
     96}}}
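Because shared options must match across scripts, it can help to define them once and build both command lines from that single definition. The following is a minimal sketch: the helper `build_commands` is hypothetical, and it assumes that test.py accepts the same `-binsize` and `-gcc...` flags as newref.py, as the help text above suggests.

```python
def build_commands(shared, refdir, gccount, reftable, sample, plot):
    """Build newref.py and test.py command lines with identical shared options.

    shared: dict mapping option names (without the leading '-') to values,
    e.g. {"binsize": 500000}. Assumes both scripts accept these options.
    """
    flags = []
    for name in sorted(shared):
        flags += ["-" + name, str(shared[name])]
    newref_cmd = ["python", "newref.py"] + flags + [refdir, gccount, reftable]
    test_cmd = ["python", "test.py"] + flags + [sample, gccount, reftable, plot]
    return newref_cmd, test_cmd


newref_cmd, test_cmd = build_commands(
    {"binsize": 500000},
    "/path/to/refdir/", "/path/to/gccountperbin.pickle",
    "/path/to/reftable.pickle", "/path/to/ex_sample.pickle",
    "/path/to/ex_sample.plot",
)
print(" ".join(newref_cmd))
print(" ".join(test_cmd))
```

The resulting lists can be passed to `subprocess.call`; the stdout of test.py still has to be redirected to a result file as described above.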
     97
     98----
     99
     100= Default Page Info =
    4101== Mailing lists ==
    5102This project provides the following mailing lists.