IDP-ASE

  • IDP-ASE is a tool for haplotyping and quantifying allele-specific expression
    from Second Generation Sequencing and PacBio sequencing.

Latest News: IDP-ASE_20161017 is released.

Tutorial

This setup guide is in progress. For now, see the GitHub repository.

Step 1 - Prepare gene level data

 mkdir -p ~/temp/ ~/gene_files/ ~/isoform_files/ ~/gene_out/ ~/isoform_out/
 julia ~/.julia/v0.4/IDPASE/src/prep_runs.jl -a test_data/SGS.psl test_data/TGS.psl \
          -g test_data/TDRKH.gpd -v test_data/sim.vcf -q test_data/SGS.fq test_data/TGS.fq \
          -d ~/temp/ -c 1 -f 1 1 -o ~/gene_files/ -p sim
  
where -a is a space-separated list of PSL files, -g is the GPD file, -v is the VCF file, -q is a space-separated list of FASTQ files, -d is a directory for intermediate output, -c is a space-separated list of chromosomes of interest, and -f is a space-separated list of FASTQ formats corresponding, in order, to the FASTQ files given to -q (1 indicates PHRED+33, 2 indicates PHRED+64). The flag -o is the output directory and -p is the output prefix, so in the example ~/gene_files/ is the directory and output files are prefixed by sim.
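If it is unclear which value to pass to -f for a given FASTQ file, the quality encoding can often be guessed from the quality lines. The helper below is a heuristic sketch and not part of IDP-ASE. The function name guess_fastq_format is hypothetical, and it assumes plain four-line FASTQ records. A file that contains only high-quality PHRED+33 reads (or an empty file) would be reported as 2, so treat the answer as a hint.

```sh
# Heuristic sketch (not part of IDP-ASE): guess the -f value for one FASTQ file.
# Prints 1 (PHRED+33) if any quality character lies in ASCII 33..58,
# which cannot occur under a +64 offset; otherwise prints 2 (PHRED+64).
# Assumes four-line FASTQ records, so every 4th line is a quality line.
guess_fastq_format() {
    LC_ALL=C awk '
        NR % 4 == 0 && /[!-:]/ { low = 1; exit }
        END { print (low ? 1 : 2) }
    ' "$1"
}
```

For example, guess_fastq_format test_data/SGS.fq prints the value to place in the position of -f that corresponds to SGS.fq.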

Step 2 - Obtain individual run commands

 julia ~/.julia/v0.4/IDPASE/scripts/phase_by_loci.jl -a ~/gene_files/ -o ~/gene_out/ -n SGS TGS \
          -m 1 0 0 1 1 1 -d ~/.julia/v0.4/IDPASE/scripts/ -p sim > to_run_curr.sh
  
where -a is the directory given to -o in step 1, -o is an output directory, -n is a list of unique names corresponding to the PSL files, and -m is a vector indicating which combinations of the sequencing data to use with IDPASE. In the above example, 3 runs of IDPASE will be issued: 1 0 indicates SGS only, 0 1 indicates TGS only, and 1 1 indicates Hybrid-Seq. The flag -p is the prefix specified in step 1, and -d is the directory where the IDPASE scripts are stored. The output is a list of commands, one set per gene, to be run individually.
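To make the layout of the -m vector concrete, the sketch below splits it into consecutive groups, one entry per name given to -n, and prints which data sets each run would use. It only illustrates the description above. The function describe_runs is hypothetical and does not call IDPASE.

```sh
# Illustrative sketch: decode an -m vector into runs.
# $1: space-separated names as given to -n
# $2: space-separated 0/1 vector as given to -m
# Note: plain POSIX sh, so the helper variables are global.
describe_runs() {
    names=$1
    n=0
    for name in $names; do n=$((n + 1)); done
    [ "$n" -gt 0 ] || return 1
    set -- $2                      # the vector entries become $1, $2, ...
    run=1
    while [ "$#" -ge "$n" ]; do
        used=""
        for name in $names; do     # consume one vector entry per name
            [ "$1" = 1 ] && used="$used $name"
            shift
        done
        echo "run $run:$used"
        run=$((run + 1))
    done
}

describe_runs "SGS TGS" "1 0 0 1 1 1"
# run 1: SGS
# run 2: TGS
# run 3: SGS TGS
```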

Step 3 - Run each gene level command

 bash to_run_curr.sh 
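When many genes are processed, it can help to record which commands failed instead of running the script in one pass. The wrapper below is a sketch under the assumption that to_run_curr.sh contains one self-contained command per line. That layout is not confirmed here, so check the generated file first. The function name run_each and the log file are hypothetical.

```sh
# Sketch: run each line of a command file separately and log failures.
# $1: file with one command per line (assumed layout)
# $2: file that receives the commands that exited non-zero
run_each() {
    failed=0
    while IFS= read -r cmd || [ -n "$cmd" ]; do
        [ -z "$cmd" ] && continue
        # </dev/null keeps the command from consuming the loop's input
        if ! bash -c "$cmd" </dev/null; then
            printf '%s\n' "$cmd" >> "$2"
            failed=$((failed + 1))
        fi
    done < "$1"
    echo "$failed command(s) failed"
}
```

A call such as run_each to_run_curr.sh failed_genes.txt would then leave the failed commands in failed_genes.txt so they can be re-run on their own.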

Step 4 - Concatenate gene level results

 find ~/gene_out/ -name "REAL*" | xargs cat > ~/gene_out/gene_results.txt 

Step 5 - Prepare isoform level data

 julia ~/.julia/v0.4/IDPASE/src/prep_runs.jl -a test_data/SGS.psl test_data/TGS.psl \
          -g test_data/TDRKH.gpd -v test_data/sim.vcf -q test_data/SGS.fq test_data/TGS.fq \
          -d ~/temp/ -c 1 -f 1 1 -o ~/isoform_files/ -p sim -l 100 -i -s -e \
          -r ~/gene_out/gene_results.txt
  
where -l is the read length for short reads, -s skips file pre-processing (if using the same GPD/VCF files as at the gene level), -e uses the estimated haplotypes from the gene level (otherwise information from the VCF is used, assuming it is phased), and -r is the gene level results file.
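If -e is omitted, the VCF is assumed to be phased, so it may be worth checking that before running. In VCF, phased genotypes separate alleles with "|" and unphased genotypes with "/". The sketch below counts both kinds; it is not part of IDP-ASE, the function name count_phasing is hypothetical, and it inspects only the first sample column.

```sh
# Sketch: count phased ("|") and unphased ("/") genotypes in a VCF.
# Assumes GT is the first FORMAT key, as the VCF specification requires
# when GT is present, and looks at the first sample column only.
count_phasing() {
    awk -F '\t' '
        /^#/ { next }
        $9 ~ /^GT(:|$)/ {
            split($10, f, ":")
            if (index(f[1], "|")) phased++
            else if (index(f[1], "/")) unphased++
        }
        END { printf "phased=%d unphased=%d\n", phased, unphased }
    ' "$1"
}
```

Running count_phasing test_data/sim.vcf and seeing a non-zero unphased count would suggest using -e with the gene level results instead.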

Step 6 - Obtain individual run commands

 julia ~/.julia/v0.4/IDPASE/scripts/phase_isoforms_by_loci.jl -i ~/isoform_files/ \
          -o ~/isoform_out/ -b ~/.julia/v0.4/IDPASE/scripts -a -p sim > to_run_isofs.sh
  

Step 7 - Run each isoform level command

 bash to_run_isofs.sh 

Step 8 - Concatenate isoform level results

 find ~/isoform_out/ -name "EXTRA*" | xargs cat > ~/isoform_results.txt