Next, 20% of the reads from each of them were randomly selected to create a simulated pooled sample at an average of 20× coverage. There was a region of chromosome 4 with drastically fewer variants in our study (Appendix Fig.). Zebrafish variant comparisons after sequencing and masking a pooled subsample. Alignment, variant calling, and filtering were all performed with the previous parameters. In humans, an average of 4.6 M SNPs and 0.68 M indels was found per African individual, 3.75 M SNPs and 0.60 M indels per Caucasian, and 3.69 M SNPs and 0.54 M indels per Asian. The team identified 154 pseudogenes in the zebrafish genome, a fraction of the 13,000 or so pseudogenes found in the human genome. We have generated lesions ranging from small indels to full gene deletions. Even before filters were applied, only 49.8% as many variants were detected in this pooled sample as in the whole dataset. Because of the optical transparency of its embryos, its small size at maturity, and its ease of culture, the zebrafish has become a popular organism for biologists worldwide who study embryonic development. CRISPR/Cas9 and next-generation gene-editing techniques using cytidine deaminase fused with Cas9 nickase provide fast and efficient tools able to induce sequence … The zebrafish came to the attention of the scientific community, at least of geneticists, around 1981, when the late George Streisinger, looking for a new genetic organism to study, picked up a couple of zebrafish at the local pet store and started to do a few experiments with them.
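As a rough illustration of the subsampling step described above, the sketch below keeps each read independently with a fixed probability (0.2, matching the 20% in the text). The function name `subsample_reads`, the `seed` parameter, and the use of plain read identifiers are assumptions made for illustration; the text does not state which tool or random scheme was actually used.

```python
import random


def subsample_reads(reads, fraction=0.2, seed=0):
    """Keep each read independently with probability `fraction`.

    Hypothetical helper: `reads` is any iterable of read records or IDs.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return [read for read in reads if rng.random() < fraction]


if __name__ == "__main__":
    # Toy input: 10,000 made-up read identifiers.
    reads = [f"read_{i}" for i in range(10_000)]
    subset = subsample_reads(reads)
    print(len(subset), "of", len(reads), "reads kept")
```

Because each read is kept independently, the retained fraction is only approximately 20%; drawing an exact number of reads would call for `random.sample` instead.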
We then empirically compared genomic characteristics of our zebrafish population with murine and human reference populations, as well as across other zebrafish lines. Zebrafish (Danio rerio) are small freshwater fish commonly found in the tropics. The Collaborative Cross (Chesler et al. 2008) was implemented to randomly mix the genomes of eight founder strains to create hundreds of isogenic RILs (Churchill et al.). The file included 17,089,212 variant calls (15,680,057 SNPs) and genotypes for the two founders based on high-coverage individual whole genome sequencing and alignment to GRCz10 without masking. However, little is known about zebrafish α-crystallin promoter function, how it compares to that of mammals, or whether mammalian α-crystallin promoter activity can be assessed using zebrafish embryos. The zebrafish genome project at the Wellcome Sanger Institute produced the zebrafish reference assembly of the Tuebingen strain. Another dataset (… 2014) was downloaded for use in a separate line comparison because of differences in sequencing strategy and alignment to different versions of the reference genome. Balik-Meisner, M., Truong, L., Scholl, E.H. et al. There are also long-term benefits associated with creating a database of known SNPs in zebrafish populations. And they breed very, very well. … of a non-reference base for at least one individual. After the simulated analysis, median read depth per variant site for T5D was 14 (within the range of 8–16 mentioned previously for the other 4 lines). Reads with a mapping quality below 20 were not included, and a minimum Phred-scaled confidence threshold of 10 was required.
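Both thresholds above are on the Phred scale, where a score Q corresponds to an error probability of 10^(-Q/10). The following minimal sketch converts a score to a probability and applies the two cut-offs from the text. The function names are illustrative assumptions and do not reflect the interface of the tools used in the study.

```python
def phred_to_error_prob(q):
    """Convert a Phred-scaled score Q to an error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def keep_read(mapq, min_mapq=20):
    """Reads with mapping quality below `min_mapq` are excluded."""
    return mapq >= min_mapq


def keep_call(confidence, min_confidence=10):
    """Variant calls need at least `min_confidence` Phred-scaled confidence."""
    return confidence >= min_confidence


if __name__ == "__main__":
    for q in (10, 20):
        print(f"Q{q} -> error probability {phred_to_error_prob(q):.3f}")
```

Under this scale, a mapping quality of 20 corresponds to a 1 in 100 chance that the read is misplaced, and a call confidence of 10 to a 1 in 10 chance that the call is wrong.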
In this regard, zebrafish are useful because the embryo is transparent, it develops outside of its mother, and its development from egg to larva happens in just three days. To assess the similarity of T5D variation to a hybrid population that has previously been sequenced individually, SNP sites were compared to NHGRI-1 SNP sites. Samples were sheared to ~320 bp, and 100 ng was used in the WaferGen robotic DNA library prep. T5D variant counts and proportions of non-reference reads moved closer to those observed in other lines (Fig.). In Chinese individuals, an average of 3.5 M SNPs and 0.63 M indels were identified (Shi et al.). For indels, the count decreased from 2,966,260 to 2,608,746 to 2,339,775. This was done to approximate more closely the variants that would have been called in T5D had a pooled approach been employed instead of individual sequencing. All library preparation and sequencing were performed at Oregon State University’s Center for Genome Research and Biocomputing (http://cgrb.oregonstate.edu/core). Like the genomes of its rodent counterparts, the zebrafish genome is highly conserved with that of humans, with 82% of disease-related human genes possessing a zebrafish ortholog (Howe et al., 2013). This likely means that (1) many of the variants discovered in T5D are present in other lines as well but have not been found due to pooling, low coverage, and sample size restrictions in previous zebrafish experiments, and (2) there are many more rare alleles that are yet to be discovered. The T5D allele frequencies are based on 276 individual whole genome sequences. Chesler EJ, Miller DR, Branstetter LR et al (2008) The Collaborative Cross at Oak Ridge National Laboratory: developing a powerful resource for systems genetics.
The y axis displays the variant count partitioned into 1 Mb bins of genomic sequence (x axis). The filtered delta files were also run through dnadiff. Prior to library prep, the quality and quantity were verified using a fluorometric plate reader and Bioanalyzer. The red box contains the variant effects for the 20.1 M SNPs found in T5D. SNP and indel VCF files based on the GATK best practices recommendations were used. You can also browse the zebrafish Anatomical Ontology (AO) to show anatomical terms that are present at that stage. Through a powerful combination of genetics and experimental embryology, significant inroads have been made into the regulation of embryonic axis formation, organogenesis, and the development of neural networks. Zebrafish are commonly used to understand gene function. … 2016), so exposure would not have altered constitutive DNA sequence. This is more than have been identified in individual human genomes. Zebrafish variant comparisons. b Allele frequency spectrum for common human variants. GRCz11 shows a significant reduction in scaffold numbers and an increase in scaffold N50, whilst the overall genome size was not affected. Comparisons between named strains and inter-lab populations of zebrafish have shown variability in several phenotypes, providing the rationale that constitutive genetic variation may contribute to the variability in exposure response (Lange et al. 2014). Subsampling to simulate a pooled sequencing approach showed that T5D variation is in line with the more variable zebrafish laboratory strains (Fig.). For these samples, 350 ng of DNA was used in the library preparation. Approximately 51% of the genome is masked for having highly repetitive content.
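The per-bin variant counts mentioned in the figure caption above (1 Mb bins along the genome) could be computed along the following lines. This is a sketch under the assumption that variant positions are 1-based, as in VCF files; `bin_variant_counts` is a hypothetical helper name, and the positions in the example are made up.

```python
from collections import Counter

BIN_SIZE = 1_000_000  # 1 Mb bins, as in the figure caption


def bin_variant_counts(positions, bin_size=BIN_SIZE):
    """Count variants per bin for one chromosome.

    `positions` are 1-based coordinates; bin 0 covers positions
    1..bin_size, bin 1 covers bin_size+1..2*bin_size, and so on.
    """
    counts = Counter((pos - 1) // bin_size for pos in positions)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Made-up positions on a single chromosome.
    example = [1, 999_999, 1_000_000, 1_000_001, 2_500_000]
    print(bin_variant_counts(example))
```

Bins with no variants are simply absent from the result, so a plotting step would need to fill them with zero to show low-variability regions such as the one reported on chromosome 4.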
Finally, we explored whether the higher apparent diversity observed in our T5D line could be due to experimental design factors that tend to underestimate diversity in other published lines.
Volume 29, pages 90–100 (2018).