Changes between Version 13 and Version 14 of WikiStart
Timestamp: Apr 4, 2013, 3:40:25 PM
WikiStart
v13 v14
58 58   As several steps require information about the GC content of areas on the genome, we need to prepare the necessary information for WISECONDOR. This step only needs to be repeated when the reference genome is replaced.
59 59   {{{
60    - python countgc.py /path/to/hg19.fasta gccountperbin.pickle
   60 + python countgc.py hg19.fasta gccountperbin.pickle
61 61   }}}
62 62   This takes several minutes, as the implementation favors quick functionality over speed. As you rarely have to repeat this step, this should not be a problem.
… …
65 65   Currently, WISECONDOR's entry point expects to receive a SAM-formatted, sorted input stream. It will filter the reads to remove so-called Read-Towers and count the number of reads left per bin. Such a stream can be produced by a bash script or by entering something like this in a terminal:
66 66   {{{
67    - /path/to/samtools view ex_sample.bam | python consam.py /path/to/ex_sample.pickle
   67 + /path/to/samtools view ex_sample.bam | python consam.py ex_sample.pickle
68 68   }}}
69 69   This step is required for every test and reference sample used.
… …
72 72   To teach WISECONDOR which bins behave alike, we need to feed it a set of (healthy) reference samples. Copy or move the healthy sample files created in the previous step into a separate folder and tell WISECONDOR to build a new reference table using all .pickle files in that directory and the GC-count file created previously (to apply GC correction), storing the reference table in a file for later use, for example:
73 73   {{{
74    - python newref.py /path/to/refdir/ /path/to/gccountperbin.pickle /path/to/reftable.pickle
   74 + python newref.py refdir/ gccountperbin.pickle reftable.pickle
75 75   }}}
76 76   This step may take several minutes, depending mostly on the number of reference samples provided. Due to the design of WISECONDOR, the more reference samples available, the merrier. Even adding extremely low-coverage samples (e.g. 0.03 times coverage) may improve the reliability of WISECONDOR.
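The reference step above (gather the healthy .pickle files in one folder, then call newref.py) can be sketched from Python as follows. This is only an illustration: the helper names `collect_reference_samples` and `newref_command` are hypothetical and not part of WISECONDOR, and the argument order is copied from the example command above.

```python
import shutil
from pathlib import Path


def collect_reference_samples(sample_pickles, ref_dir):
    """Copy healthy sample pickles into ref_dir and return the copied paths.

    Hypothetical helper: WISECONDOR itself only expects a folder of .pickle files.
    """
    ref_dir = Path(ref_dir)
    ref_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for sample in sample_pickles:
        sample = Path(sample)
        if sample.suffix != ".pickle":
            continue  # the reference step is described as using .pickle files only
        copied.append(Path(shutil.copy2(sample, ref_dir / sample.name)))
    return sorted(copied)


def newref_command(ref_dir, gc_counts, ref_table, python="python"):
    """Build the argv for the reference step, in the order shown in the example."""
    return [python, "newref.py", f"{Path(ref_dir)}/", str(gc_counts), str(ref_table)]
```

The list returned by `newref_command` could be passed to `subprocess.run(..., check=True)` from the directory that contains newref.py.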
        As these samples are only used to build a reference, any healthy whole-genome sample that was produced in the same manner as the samples you would like to test will most likely do fine: male, female, pregnant, non-pregnant, different lanes, different times, different coverages. Just make sure it is run on the same machine and prepared the same way. This also means that, if done right, at some point no additional reference samples need to be sequenced for testing, as the previously made reference samples provide enough information.
… …
80 80   Now that WISECONDOR knows which bins on the genome are likely to behave alike, we can feed it a sample, and it will try to discover areas that differ greatly from their own set of reference bins. To test a sample, run the test script and feed it the sample pickle, the GC count file (again, to apply GC correction), the reference file and a path+basename so it knows where to write a plot of the results. The output goes to stdout, which you may want to save for later use by redirecting it to a file using >.
81 81   {{{
82    - python test.py /path/to/ex_sample.pickle /path/to/gccountperbin.pickle /path/to/reftable.pickle /path/to/ex_sample.plot > /path/to/ex_sample.result
   82 + python test.py ex_sample.pickle gccountperbin.pickle reftable.pickle ex_sample.plot > ex_sample.result
83 83   }}}
84 84   Output formatting is a bit confusing at this point and may improve over time.
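The test invocation above saves stdout with `>`. A minimal Python sketch of the same idea follows, assuming test.py takes its arguments in the order shown in the example. The helpers `build_test_command` and `run_and_save` are hypothetical names introduced for illustration, and the `python` default simply mirrors the interpreter name used in the commands above.

```python
import subprocess
from pathlib import Path


def build_test_command(sample, gc_counts, ref_table, plot_base,
                       script="test.py", python="python"):
    """Build the argv for the test step, in the order shown in the example."""
    return [python, script, str(sample), str(gc_counts),
            str(ref_table), str(plot_base)]


def run_and_save(command, result_path):
    """Run a command and write its stdout to result_path (like `> file`)."""
    with open(result_path, "w") as out:
        # check=True raises CalledProcessError if the script exits non-zero
        subprocess.run(command, stdout=out, check=True)
    return Path(result_path)
```

For example, `run_and_save(build_test_command("ex_sample.pickle", "gccountperbin.pickle", "reftable.pickle", "ex_sample.plot"), "ex_sample.result")` would correspond to the added command on line 82, provided it is run from the directory that holds test.py.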