Getting Started
These simple steps will help you integrate IDP into your transcriptomics analysis pipeline.
- Read the requirements for running IDP.
- Download and set up the IDP package.
- Follow the tutorial to see how IDP works on some example data.
- Read the manual if anything is unclear.
- Check the guide for generating long-read files suitable for isoform prediction or fusion detection.
- You're ready, Happy IDPing!
Latest publication
Kin Fai Au, Vittorio Sebastiano, Pegah Tootoonchi Afshar, Jens Durruthy Durruthy, Lawrence Lee, Brian A. Williams, Honoratus Van Bakel, Eric Schadt, Renee A. Reijo Pera, Jason Underwood, Wing Hung Wong. Characterization of the human ESC transcriptome by hybrid sequencing.
Proc. Natl. Acad. Sci. USA 2013 110 (50) E4821-E4830 [preprint]
SpliceMap-LSC-IDP pipeline
The hESC transcriptome was identified by the SpliceMap-LSC-IDP pipeline.
SpliceMap takes short reads from second-generation sequencing platforms, such as Illumina, and detects exon junctions.
LSC uses the high-quality short reads to correct the long reads from the PacBio platform. Its output is the error-corrected long reads.
IDP uses the detected junctions and the alignments of the error-corrected long reads to detect relatively short isoforms at full length and to predict very long isoforms by statistical modeling.
Latest News
08-04-2014 - Major Update to IDP
This major update to IDP includes changes to software requirements, additional features, and bug fixes.
- 1. The IDP software is now licensed under the Apache License 2.0, a permissive open-source license.
- 2. The BLAT and seqmap (part of SpliceMap) aligners are no longer bundled with IDP. If they are not installed under their default names, the paths to the aligner executables must be specified in the config file.
- 3. GMAP can now be used instead of BLAT. Select the aligner by setting 'aligner_choice' to either 'gmap' or 'blat' in the config file. When GMAP is used, the folder holding the GMAP index must also be specified in the config file.
- 4. Maximum likelihood estimation (MLE) can now be used instead of maximum a posteriori (MAP) estimation by setting 'estimator_choice' to 'MAP' or 'MLE' in the config file. MAP is the default, but MLE should be used for data sets with few long reads, where few isoforms are detected.
- 5. Fixed a bug from the previous version: when 'detected_exp_len' was left blank, IDP should have generated the file but did not.
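A minimal Python sketch of how the config options in this update fit together. It assumes a simple `key=value` text format for the config file and uses a hypothetical key, `gmap_index_folder`, for the GMAP index location; the real file format and key names are described in the manual. Only `aligner_choice`, `estimator_choice`, their allowed values, and the MAP default come from the notes above; treating a blank `estimator_choice` as MAP is an assumption.

```python
from typing import Dict

# Allowed values from the 08-04-2014 release notes.
ALIGNERS = {"gmap", "blat"}
ESTIMATORS = {"MAP", "MLE"}
# Hypothetical key for the GMAP index folder; check the manual for the real name.
GMAP_INDEX_KEY = "gmap_index_folder"


def parse_config(text: str) -> Dict[str, str]:
    """Parse assumed 'key=value' lines, skipping blank lines and '#' comments."""
    config: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed config line: {raw!r}")
        config[key.strip()] = value.strip()
    return config


def check_options(config: Dict[str, str]) -> Dict[str, str]:
    """Validate the aligner/estimator options and apply the MAP default."""
    checked = dict(config)
    aligner = checked.get("aligner_choice", "")
    if aligner not in ALIGNERS:
        raise ValueError(f"aligner_choice must be 'gmap' or 'blat', got {aligner!r}")
    if aligner == "gmap" and not checked.get(GMAP_INDEX_KEY):
        raise ValueError("aligner_choice=gmap also needs the GMAP index folder")
    # MAP is the default; MLE is recommended when few long reads/isoforms are found.
    if not checked.get("estimator_choice"):
        checked["estimator_choice"] = "MAP"
    if checked["estimator_choice"] not in ESTIMATORS:
        raise ValueError("estimator_choice must be 'MAP' or 'MLE'")
    return checked
```

For example, `check_options(parse_config("aligner_choice=blat"))` returns a dictionary whose `estimator_choice` is `"MAP"`, while choosing `gmap` without an index folder raises a `ValueError`.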
08-01-2014 - Preparing LSC long-read outputs for use in IDP
Please concatenate the LSC outputs corrected.fa and full.fa, and use the combined FASTA file as the long-read input for IDP.
The reason is that corrected.fa loses the flanking sequences of long reads that were not corrected by short reads, and those flanking regions may still contain informative junctions. Using only corrected.fa would discard this information. full.fa includes those flanking regions in addition to the corrections that were made. However, if only full.fa were used, the IDP algorithm would likely discard many of those long reads because it cannot find short-read support for junctions in those regions. Combining the two datasets avoids any loss of information, and IDP handles any redundancies.
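In a shell, the concatenation is simply `cat corrected.fa full.fa > combined.fa`. The Python sketch below does the same thing and also guards against one pitfall: if corrected.fa does not end with a newline, its last sequence line would be fused with the first header of full.fa. The output name `combined.fa` and the function name `concatenate_fasta` are illustrative choices, not something IDP requires.

```python
from typing import Iterable


def concatenate_fasta(inputs: Iterable[str], output: str, chunk_size: int = 1 << 20) -> None:
    """Copy each input FASTA file, in order, into a single output file.

    If an input does not end with a newline, one is written before the next
    file starts, so records from adjacent files cannot run together.
    """
    with open(output, "wb") as out:
        for path in inputs:
            last = b"\n"  # an empty input needs no separator
            with open(path, "rb") as src:
                # Stream in chunks so large long-read files are not loaded at once.
                while chunk := src.read(chunk_size):
                    out.write(chunk)
                    last = chunk[-1:]
            if last != b"\n":
                out.write(b"\n")
```

Usage, with the LSC output names from the note above: `concatenate_fasta(["corrected.fa", "full.fa"], "combined.fa")`. Then point IDP's long-read input at `combined.fa`.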
04-24-2014 - IDP 0.1.2 minor update is released
This minor update fixes several bugs.
04-17-2014 - IDP 0.1.1 minor update is released
This minor update fixes several bugs and is accompanied by a small, convenient test dataset, available in the tutorial.
11-26-2013 - IDP 0.1, the manual, and a tutorial are released
IDP integrates short reads (e.g. Illumina data) and long reads (e.g. PacBio data) to identify gene isoforms (transcripts) in a transcriptome (see Figure above).
- One input to IDP is the short-read RNA-seq results: junctions (BED file) AND alignments of short reads (SAM file). Most RNA-seq tools, such as SpliceMap and TopHat, can output these two files.
- The other input is the long reads: raw sequences (FASTA file) OR alignments of long reads (PSL file from BLAT, or GPD file). Error-corrected long reads from PacBio data are preferred; LSC is our default error-correction tool.
- The IDP outputs are gene isoform identifications and quantification of genes and gene isoforms.
The hESC transcriptome (H1 cell line) is the first transcriptome identified by this method. For more details of this transcriptome, please see its homepage http://www.augroup.org/IDP/hESC.html and our paper Characterization of the human ESC transcriptome by hybrid sequencing [preprint].
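The input rule above (both short-read files are required, while either raw long reads or long-read alignments suffice) can be written as a small pre-flight check. This is a hypothetical helper, not part of IDP: it only looks at file extensions (`.bed`, `.sam`, `.fa`/`.fasta`, `.psl`, `.gpd`), which is an assumption, and IDP itself takes its inputs through its config file.

```python
from pathlib import Path
from typing import Dict, Iterable, List

# Extension-to-role mapping assumed for illustration only.
SHORT_READ_KINDS = {".bed": "junctions", ".sam": "short_read_alignments"}
LONG_READ_KINDS = {
    ".fa": "long_read_sequences",
    ".fasta": "long_read_sequences",
    ".psl": "long_read_alignments",
    ".gpd": "long_read_alignments",
}


def classify_inputs(paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group input files by the role their extension suggests."""
    kinds = {**SHORT_READ_KINDS, **LONG_READ_KINDS}
    groups: Dict[str, List[str]] = {}
    for p in paths:
        kind = kinds.get(Path(p).suffix.lower())
        if kind is None:
            raise ValueError(f"unrecognized input type: {p}")
        groups.setdefault(kind, []).append(p)
    return groups


def missing_inputs(paths: Iterable[str]) -> List[str]:
    """Return a list of problems; an empty list means the inputs look complete."""
    groups = classify_inputs(paths)
    problems: List[str] = []
    # Short reads: junctions AND alignments are both required.
    for kind in ("junctions", "short_read_alignments"):
        if kind not in groups:
            problems.append(f"missing {kind}")
    # Long reads: raw sequences OR alignments (either one suffices).
    if "long_read_sequences" not in groups and "long_read_alignments" not in groups:
        problems.append("missing long reads (FASTA, PSL or GPD)")
    return problems
```

For instance, `missing_inputs(["junctions.bed", "reads.sam", "corrected_long_reads.fa"])` returns an empty list, whereas dropping the BED file reports `"missing junctions"`. The file names here are made up for the example.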
11-26-2013 - Homepage of the hESC transcriptome identified by the SpliceMap-LSC-IDP pipeline is released
The homepage of the hESC transcriptome (H1 cell line) is released. You can also find novel genes, novel isoforms of existing genes (including pluripotency markers), and novel ncRNAs on this website. The details of this hESC transcriptome can be found in our publication: Characterization of the human ESC transcriptome by hybrid sequencing [preprint].
11-26-2013 - IDP and hESC transcriptome paper is released
Kin Fai Au, Vittorio Sebastiano, Pegah Tootoonchi Afshar, Jens Durruthy Durruthy, Lawrence Lee, Brian A. Williams, Honoratus Van Bakel, Eric Schadt, Renee A. Reijo Pera, Jason Underwood, Wing Hung Wong. Characterization of the human ESC transcriptome by hybrid sequencing [preprint]
In press
For detailed information about this release, please see the release notes.