Genome Assembly Terminology

Below is a list of commonly used terms and definitions in the field of genomics (source : Genome Reference Consortium).

  • Assembly : a set of chromosomes, unlocalized and unplaced sequences and alternate loci used to represent an organism’s genome
  • Chromosome Assembly : a relatively complete pseudo-molecule, assembled from smaller sequences, that represents a biological chromosome
  • Diploid Assembly : a genome assembly for which a Chromosome Assembly is available for both sets of an individual’s chromosomes
  • Haploid Assembly : the collection of Chromosome Assemblies, unlocalized and unplaced sequences and alternate loci that represent an organism’s genome
  • Primary Assembly : the collection of assembled chromosomes, unlocalized sequences and unplaced sequences that, when combined, should represent a non-redundant haploid genome
  • Assembly Units : collections of sequences used to define discrete parts of an assembly
  • Genome Patch : a contig sequence that is released outside of the full assembly release cycle
  • FIX patch : FIX patches are released to correct an error in the assembly and will be removed when the new full assembly is released
  • NOVEL patch : NOVEL patches are sequences that were not in the last full assembly release and will be retained with the next full assembly release
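  • Alternate Locus : a sequence that provides an alternate representation of a locus found in a largely haploid assembly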
  • Unlocalized Sequence : a sequence found in an assembly that is associated with a specific chromosome but cannot be ordered or oriented on that chromosome
  • Unplaced Sequence : a sequence found in an assembly that is not associated with any chromosome
  • PAR (Pseudo-autosomal region) : a region found on the X and Y chromosomes of mammals that allows recombination between the sex chromosomes
  • AGP File : a file used to describe the instructions for building a contig, scaffold or chromosome sequence
  • Contig : a contiguous sequence generated by determining the non-redundant path along an ordered set of component sequences
  • Component : a low-level genomic sequence used to construct the genome, typically a clone sequence, a WGS sequence or a PCR fragment
  • Join : the sequence overlap between two adjacent components in a contig
  • Scaffold : an ordered and oriented set of contigs with gaps
  • Switch Point : the base at which the contig sequence stops being generated from one component sequence and switches to using the next component sequence
  • TPF (Tiling Path File) : a file that provides the order of the component sequences used to build a contig, scaffold or chromosome
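
To make the relationship between components, joins, switch points, contigs and scaffolds more concrete, here is a minimal Python sketch. It is an illustration only: the Part record is a simplified, hypothetical stand-in for AGP-style build instructions and not the actual AGP file format, and the component names and sequences (c1, c2, c3) are invented toy data.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Part:
    """One simplified, AGP-like build instruction (hypothetical layout)."""
    component_id: Optional[str]  # None marks a gap between contigs
    start: int = 0               # 1-based inclusive start within the component
    end: int = 0                 # 1-based inclusive end within the component
    orientation: str = "+"       # "+" as given, "-" for reverse complement
    gap_length: int = 0          # used only when component_id is None

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string over A, C, G, T, N."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]

def build(parts, components):
    """Concatenate the requested component slices, writing gaps as runs of N."""
    pieces = []
    for part in parts:
        if part.component_id is None:
            pieces.append("N" * part.gap_length)
            continue
        seq = components[part.component_id][part.start - 1:part.end]
        if part.orientation == "-":
            seq = reverse_complement(seq)
        pieces.append(seq)
    return "".join(pieces)

# Toy components: the last 4 bases of c1 ("GTAA") overlap the first 4 of c2.
# That overlap plays the role of the join.
components = {
    "c1": "ACGTACGTAA",
    "c2": "GTAACCGGTT",
    "c3": "TTTGGG",
}

# Contig: use all of c1, then switch to c2 from base 5 onward so the
# overlapping bases are not duplicated. Base 10 of c1 is the switch point.
contig_parts = [
    Part("c1", 1, 10, "+"),
    Part("c2", 5, 10, "+"),
]

# Scaffold: the contig above, a gap of 5 bases, then c3 in reverse orientation.
scaffold_parts = contig_parts + [
    Part(None, gap_length=5),
    Part("c3", 1, 6, "-"),
]

contig = build(contig_parts, components)
scaffold = build(scaffold_parts, components)
```

In this sketch the ordered list of component names (c1, c2, c3) corresponds loosely to what a tiling path records, while the coordinates and orientations correspond to the kind of information an AGP file carries.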