The Opinionated Guide to Sequencing and Assembly

Authors: Holly Bik, C. Titus Brown, Nick Loman, Lex Nederbragt, and Jared Simpson

Expiration date: 6/1/2014.

(Meta)genome | Goal | Dataset | Assembly strategy
Bacteria | (Near) completion | PacBio 100x | HGAP/Celera
Bacteria | Draft few contigs | Ilmn Nextera/TruSeq PE 2x250 c50x + Nextera MP 5kbp c50x | SPADES/MIRA
Bacteria | Draft 10s - 100(s) of contigs | Ilmn Nextera/TruSeq PE 2x250 c50x | SPADES/A5/MIRA
Small eukaryote up to 100 Mbp | contigs | Ilmn Nextera/TruSeq PE 2x250 c50x | SOAPdenovo, MIRA, SGA
Small eukaryote up to 100 Mbp | scaffolds | Ilmn Nextera/TruSeq PE 2x250 c50x + Nextera MP 3-10kbp c50x (each); optional: PacBio | SOAPdenovo, MIRA, SGA (ALLPATHS_LG with right libraries); PBJelly and/or AHA
Eukaryote 100-500 Mbp | contigs | Ilmn Nextera/TruSeq PE 2x250 c50x | SOAPdenovo, SGA
Eukaryote 100-500 Mbp | scaffolds | Ilmn Nextera/TruSeq PE 2x250 MiSeq OR 2x150 HiSeq c50x; optional: multiple fr. lengths; Nextera MP 3-10kbp c50x (each); optional: PacBio | SOAPdenovo, SGA, MaSuRCA, CA, Abyss (ALLPATHS_LG with right libraries); PBJelly and/or AHA
Eukaryotes over 500 Mbp | contigs / non-repetitive components | Ilmn Nextera/TruSeq PE 2x250 MiSeq OR 2x150 HiSeq c50x | SOAPdenovo, SGA, diginorm + velvet
Eukaryotes over 500 Mbp | scaffolds | as for 100-500 Mbp; add more library types | SOAPdenovo, SGA, MaSuRCA, CA, Abyss (ALLPATHS_LG with right libraries); PBJelly and/or AHA
Metagenome low diversity (2-50 “species”) | Diversity estimates, gene mining | Ilmn Nextera/TruSeq PE 2x150 HiSeq (tip: long insert) | IDBA-UD, SPADES, MIRA
Metagenome low diversity (2-50 “species”) | Complete genomes | PacBio or Moleculo | IDBA-UD, diginorm + velvet/SGA, Ray
Metagenome medium diversity (50-500 “species”) | Diversity estimates, gene mining | Ilmn Nextera/TruSeq PE 2x150 HiSeq (tip: long insert) | IDBA-UD, diginorm + velvet/SGA, Ray
Metagenome high-diversity (e.g. soil, sediment) | Diversity estimates, gene mining | Ilmn Nextera/TruSeq PE 2x150 HiSeq (tip: long insert) | diginorm + velvet/SGA
Metatranscriptome | Expression, gene mining | Ilmn Nextera/TruSeq PE 2x150 HiSeq | diginorm + velvet/SGA, Ray?
Single-cell genome bacterial | Partial genome | Ilmn Nextera/TruSeq PE 2x250 c50x | SPADES
Single-cell genome eukaryote (protist) | Partial genome | Ilmn Nextera/TruSeq PE 2x250 c50x | SPADES?, diginorm + velvet/SGA
RNA-seq | De novo transcriptome | Ilmn TruSeq/Nextera PE 2x100 HiSeq; 50 - 100 million reads per tissue, 300-500 bp fragment | Trinity
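
Many of the dataset entries give a target depth such as "c50x" alongside a read layout such as "PE 2x250". As a rough planning aid, the sketch below converts a target depth into a number of read pairs using the usual mean-coverage relation (coverage = sequenced bases / genome size). It assumes "c50x" means roughly 50-fold mean coverage, which the guide does not spell out. The function name and the 5 Mbp genome size are hypothetical, and the estimate ignores quality trimming, duplicates, and uneven coverage.

```python
import math


def read_pairs_needed(genome_size_bp, coverage, read_length_bp, paired=True):
    """Estimate how many read pairs (or single reads) give a target mean coverage.

    Mean coverage is total sequenced bases divided by genome size, so the
    number of fragments is genome_size * coverage / bases_per_fragment.
    """
    # A paired-end fragment contributes two reads of read_length_bp each.
    bases_per_fragment = read_length_bp * (2 if paired else 1)
    return math.ceil(genome_size_bp * coverage / bases_per_fragment)


if __name__ == "__main__":
    # Hypothetical 5 Mbp bacterial genome, PE 2x250 reads, 50x target depth.
    pairs = read_pairs_needed(5_000_000, 50, 250)
    print(f"read pairs needed: {pairs}")
```

The same function can be reused for the larger genome classes by changing the genome size. In practice it is sensible to order somewhat more than the computed minimum, since some reads are lost to filtering.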

LICENSE: This documentation and all textual/graphic site content is licensed under the Creative Commons - 0 License (CC0) -- fork @ github. Presentations (PPT/PDF) and PDFs are the property of their respective owners and are under the terms indicated within the presentation.

Development and posting of this material, and the associated workshop, were supported by Grant Number R25HG006243 from the National Human Genome Research Institute and an NSF OCI supplement to NSF DBI-0939454.

