• LSC is a long read error correction tool.
    It offers fast correction with high sensitivity
    and good accuracy.


Getting Started

These simple steps will help you integrate LSC into your transcriptomics analysis pipeline.

Latest publication

Kin Fai Au, Jason Underwood, Lawrence Lee and Wing Hung Wong
Improving PacBio Long Read Accuracy by Short Read Alignment [Manuscript]
PLoS ONE 2012. 7(10): e46679. doi:10.1371/journal.pone.0046679

Latest News

04-11-2016: Major update version 2.0


02-02-2015: Minor updates and bug fixes LSC 1.beta

Feature Updates:

Bug Fixes:

12-01-2013: Faster and much less memory-required LSC 1.alpha is released

In LSC 0.3.0 and 0.3.1, we optimized the Bowtie2 and BWA settings to obtain many more short read alignments, which greatly improves the accuracy of error correction. However, the additional alignments also require much more running time (in both the alignment and the subsequent error correction step) and more memory. As a result, a few users had difficulty running LSC 0.3.0 or 0.3.1.

In LSC 1.alpha, we apply a probabilistic algorithm (the "SCD" option) to select "enough" short read alignments for error correction. LSC 1.alpha does NOT sacrifice error correction performance (sensitivity and specificity). Thus, we save running time and memory usage significantly. The running time is 30-50% of that of LSC 0.3.1. The peak memory usage decreases to ~10G regardless of the data size.
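The idea of probabilistically keeping only "enough" alignments can be illustrated with a minimal sketch. This is not LSC's actual SCD implementation, which is not described here. It assumes a simple scheme in which each alignment on a long read is kept with probability target depth / observed depth. The function name, the `(start, end)` alignment representation, and the `target_depth` parameter are all hypothetical.

```python
import random


def subsample_alignments(alignments, read_length, target_depth, seed=0):
    """Randomly thin short read alignments on one long read.

    alignments   : list of (start, end) positions on the long read (hypothetical format)
    read_length  : length of the long read in bases
    target_depth : desired average coverage depth (hypothetical parameter)
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible

    # Average coverage depth = total aligned bases / long read length.
    aligned_bases = sum(end - start for start, end in alignments)
    observed_depth = aligned_bases / read_length

    # Already at or below the target: keep everything.
    if observed_depth <= target_depth:
        return list(alignments)

    # Otherwise keep each alignment independently with this probability,
    # so the expected depth after thinning equals target_depth.
    keep_probability = target_depth / observed_depth
    return [a for a in alignments if rng.random() < keep_probability]
```

Under this assumption the expected number of retained alignments no longer grows with the amount of short read data, which is consistent with a bounded memory footprint.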

New features:

Miscellaneous changes:

If you want to see the manual and tutorial of the old LSC (before 1.alpha), links to the Old manual and Old tutorial are kept in the right sidebar.

09-30-2013: More robust and faster LSC 0.3.1

LSC 0.3.1 no longer uses a pseudo chromosome, which reduces the alignment time to ~10% (in Bowtie2 mode). You can also easily re-run crashed jobs now.

New features:

Miscellaneous changes:

08-07-2013: Big changes in LSC 0.3

LSC 0.3 brings a few updates: very IMPORTANT updates, new features, and small bug fixes.

Very IMPORTANT updates:

  • Support for Bowtie2 and RazerS3 as initial aligners. Now, BWA, Bowtie2, RazerS3 and Novoalign work in LSC. Please see the comparison details of aligners in the "Short read - Long read aligner#manual".
  • Added SR length coverage percentage on LR (SR-covered length/full length of corrected LR) to corrected_LR output file. Here is an example, where the last number 0.82 is the SR length coverage percentage on LR:
  • Added support for three modes for step-wise runs:
      • mode 0: end-to-end
      • mode 1: generating file
      • mode 2: correction step
  • Generates FASTQ output based on the correction probability given short read coverage. Please refer to the LSC paper and manual page for more details. You can select well-corrected reads for downstream analyses by using the quality in the FASTQ output or the SR length coverage percentage above. Please see the filtering in the "Output#manual".
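Selecting well-corrected reads by FASTQ quality could look like the following minimal sketch. It assumes well-formed four-line FASTQ records and Phred+33 quality encoding, neither of which is confirmed here for LSC output. The function names and the threshold are hypothetical.

```python
def read_fastq(lines):
    """Yield (name, sequence, quality) from an iterable of FASTQ lines.

    Assumes well-formed records of exactly four lines each.
    """
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        sequence = next(it).rstrip("\n")
        next(it)  # '+' separator line, ignored
        quality = next(it).rstrip("\n")
        yield header[1:], sequence, quality  # drop the leading '@'


def mean_phred(quality, offset=33):
    """Mean Phred score of a quality string (Phred+33 is an assumption)."""
    if not quality:
        return 0.0
    return sum(ord(c) - offset for c in quality) / len(quality)


def filter_by_quality(lines, min_mean_quality):
    """Keep records whose mean Phred score reaches the (hypothetical) cutoff."""
    return [
        (name, seq, qual)
        for name, seq, qual in read_fastq(lines)
        if mean_phred(qual) >= min_mean_quality
    ]
```

For a real file, the lines can be supplied by passing an open file handle, for example `filter_by_quality(open(path), 20)`, where the path and the cutoff of 20 are placeholders.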

New features

  • Uses the python path in the cfg file instead of the default /usr/bin path
  • Added option (-clean_up) to remove intermediate files or not (Note: important/useful ones will still be there in temp folder)
  • Support for input fastq format for LR (long reads) and/or SR (short reads)
  • Updated default BWA and novoalign commands options
  • Printing out original LR names in the output file
  • Support for printing out version number using -v/-version option

Small bug fixes

  • Fixed the removal of the XZ pattern that was printed at the end of some uncorrected_LR sequences
  • Fixed samParser bug (which was ignoring some valid alignments in BWA output)

06-09-2013: BWA is accepted in LSC 0.2.4

We use a short read aligner in the first step of LSC. By default, Novoalign is used. You can also use BWA for this step, which can be much faster. The new aligner options are described on the ".cfg file format" page.

The default settings of Novoalign options are:

	-r All -F FA  -n 300 -o sam -o FullNW

The default settings of BWA options are:

	-n 0.08 -o 10 -e 3 -d 0 -i 0 -M 1 -O 1 -E  1 -N

You can change these aligner settings. The details of these options can be found on the aligners' home pages.

In addition, a bug is fixed:

Some uncertain corrections could exist at the right ends of the long reads in older LSC versions. LSC 0.2.4 resolves this problem and likely improves the accuracy further.

02-06-2013: a bin path bug is fixed in LSC 0.2.3

If you run LSC from the bin folder (that is, the bin folder is the working directory) or set the bin folder as the default path, you may hit this bug. LSC 0.2.3 fixes the bug in locating the correct bin folder. You can download LSC 0.2.3.

10-17-2012: LSC 0.2.2 Released and a RemoveBothTail bug is fixed

LSC 0.2.2 fixes a bug in the "I_RemoveBothTails" option. LSC 0.2.1 ran this option even if you set it to "N". This could halt the process in LSC 0.2.1 because the read name does not allow "RemoveBothTails". Now you can choose whether to use this option.

10-13-2012: The manual and a tutorial are released

The manual is released and you can also learn LSC by running the example in the tutorial.

10-4-2012: LSC paper is released

Kin Fai Au, Jason Underwood, Lawrence Lee and Wing Hung Wong
Improving PacBio Long Read Accuracy by Short Read Alignment

8-12-2012: LSC 0.2.1 Released

LSC 0.2.1 fixes the python path bug. Another bug in removing redundant reads is also fixed. LSC takes a long read data set (>=100bp) and a short read data set (50 - 100bp) as input. Both should be in FASTA format. Running time is almost linear with the number of threads.

8-7-2012: some bugs are found

Great thanks to Hans Jansen@SEQanswers for testing LSC 0.2. Some bugs in the python and novoalign path settings will be fixed in the coming version very soon. The thread in SEQanswers may be helpful for your questions:

If you need to run LSC 0.2 now, please add novoalign to your default path and change the first line of every script from "#!/home/stow/swtree/bin/python2.6" to "#!/usr/bin/python".
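
The shebang change described above can be applied to many scripts at once with a small helper. This is a minimal sketch, assuming the LSC scripts are `*.py` files in a single folder. The function names and the folder argument are hypothetical, and only the two shebang strings come from the note above.

```python
from pathlib import Path

OLD_SHEBANG = "#!/home/stow/swtree/bin/python2.6"
NEW_SHEBANG = "#!/usr/bin/python"


def rewrite_shebang(text, old=OLD_SHEBANG, new=NEW_SHEBANG):
    """Return text with its first line replaced if it is the old shebang."""
    first_line, newline, rest = text.partition("\n")
    if first_line.strip() == old:
        return new + newline + rest
    return text  # any other first line is left untouched


def rewrite_scripts(script_dir):
    """Rewrite the shebang of every *.py file in script_dir (assumed layout).

    Returns the names of the files that were changed.
    """
    changed = []
    for path in sorted(Path(script_dir).glob("*.py")):
        original = path.read_text()
        updated = rewrite_shebang(original)
        if updated != original:
            path.write_text(updated)
            changed.append(path.name)
    return changed
```

Running `rewrite_scripts` on a copy of the scripts folder first makes it easy to check the result before touching the installed files.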
5-2-2012: LSC 0.2 Released

LSC 0.2 takes a long read data set (>=100bp) and a short read data set (50 - 100bp) as input. Both should be in FASTA format. Running time is almost linear with the number of threads.

For detailed information about this release, please see the release notes.