Publications by authors named "Arong Luo"

10 Publications

The complete mitochondrial genome of (Timberlake) (Hymenoptera: Encyrtidae).

Mitochondrial DNA B Resour 2021 Feb 14;6(2):550-552. Epub 2021 Feb 14.

Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.

The complete mitochondrial genome of the (Timberlake, 1916) (Hymenoptera: Encyrtidae) was obtained via next-generation sequencing. This mitochondrial genome is 15,749 bp in length and contains the 37 classical eukaryotic mitochondrial genes and an A + T-rich region. All 13 protein-coding genes (PCGs) begin with typical ATN codons; 12 of them terminate with TAA and only one with TAG. All 22 tRNA genes range from 58 to 72 bp and have the typical cloverleaf structure, except for trnS1 and trnE, whose dihydrouridine arm forms a simple loop. A dramatic gene rearrangement with a large inversion of six protein-coding genes (----) was also found in . . Phylogenetic analysis highly supported the monophyly of Pteromalidae, Eupelmidae, and Encyrtidae are sister groups. Within Encyrtidae, and are close to each other.
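To make the start- and stop-codon statements concrete, here is a minimal Python sketch that checks whether a protein-coding gene begins with an ATN codon and ends with a complete TAA or TAG stop codon. It assumes complete, in-frame coding sequences and the invertebrate mitochondrial genetic code (NCBI translation table 5); the function names and toy sequences are hypothetical illustrations, not the authors' annotation pipeline.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# Assumption: complete stop codons under the invertebrate mitochondrial
# genetic code (NCBI translation table 5), which applies to insects.
STOP_CODONS = {"TAA", "TAG"}


def has_atn_start(cds: str) -> bool:
    """True if the coding sequence begins with an ATN codon (ATA, ATC, ATG or ATT)."""
    start = cds[:3].upper()
    return len(start) == 3 and start[:2] == "AT" and start[2] in "ACGT"


def stop_codon(cds: str) -> Optional[str]:
    """Return the final codon if it is a complete TAA/TAG stop codon, else None."""
    seq = cds.upper()
    if len(seq) < 3 or len(seq) % 3 != 0:
        return None  # not a whole number of codons
    last = seq[-3:]
    return last if last in STOP_CODONS else None


def summarize_pcgs(pcgs: Dict[str, str]) -> Tuple[int, Counter]:
    """Count genes with ATN start codons and tally their stop codons."""
    atn_starts = sum(has_atn_start(seq) for seq in pcgs.values())
    stops = Counter(stop_codon(seq) for seq in pcgs.values())
    return atn_starts, stops


# Hypothetical toy sequences, not data from the paper.
toy = {"geneA": "ATGAAATAA", "geneB": "ATTCCCTAG", "geneC": "ATAGGGTAA"}
print(summarize_pcgs(toy))  # (3, Counter({'TAA': 2, 'TAG': 1}))
```

Real mitogenome annotations often also contain incomplete stop codons (T or TA) completed by polyadenylation; this sketch deliberately returns None for those.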

Source
DOI: http://dx.doi.org/10.1080/23802359.2021.1872450
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7889208
February 2021

The First Draft Genome of the Plasterer Bee Colletes gigas (Hymenoptera: Colletidae: Colletes).

Genome Biol Evol 2020 06;12(6):860-866

Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.

Despite intense interest in bees, no genomes are available for the bee family Colletidae. Colletes gigas, one of the largest species of the genus Colletes in the world, is an ideal candidate to fill this gap. Endemic to China, C. gigas has been the focus of studies on its nesting biology and on its pollination of the economically important oil tree Camellia oleifera, which is chemically defended. To enable deeper study of its biology, we sequenced the whole genome of C. gigas using single-molecule real-time sequencing on the Pacific Biosciences Sequel platform. In total, 40.58 Gb (150×) of long reads were generated, and the final assembly of 326 scaffolds was 273.06 Mb with an N50 length of 8.11 Mb, capturing 94.4% complete Benchmarking Universal Single-Copy Orthologs. We predicted 11,016 protein-coding genes, of which 98.50% and 84.75% were supported by protein- and transcriptome-based evidence, respectively. In addition, we identified a repeat content of 26.27% and 870 noncoding RNAs. The bee phylogeny including this newly sequenced colletid genome is consistent with previous results, supporting Colletidae as sister to Halictidae when Stenotritidae is not included. Gene family evolution analyses identified 9,069 gene families, of which 70 experienced significant expansions (33 families) or contractions (37 families); olfactory receptors and carboxylesterases may be involved in specialization on, and detoxification of, Ca. oleifera pollen. Our high-quality draft genome for C. gigas lays the foundation for insights into the biology and behavior of this species, including its evolutionary history, nesting biology, and interactions with the plant Ca. oleifera.
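The assembly statistics for C. gigas quote an N50 of 8.11 Mb. As a hedged illustration of what that figure means, the following minimal Python sketch computes N50 from a list of scaffold lengths: sort the lengths from longest to shortest and return the length at which the running total first reaches half of the total assembly size. The function name and toy lengths are hypothetical, and assembly-statistics tools may differ slightly in edge-case conventions.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return N50: the length L such that scaffolds of length >= L together
    contain at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no scaffold lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length  # add scaffolds from longest to shortest
        if running >= half:
            return length
    return ordered[-1]  # not reached for non-negative lengths


# Hypothetical toy scaffold lengths (not from the C. gigas assembly).
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Here the total is 54, so half is 27; the three longest scaffolds (10 + 9 + 8) first reach 27, making 8 the N50.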

Source
DOI: http://dx.doi.org/10.1093/gbe/evaa090
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7313665
June 2020

A Simulation-Based Evaluation of Tip-Dating Under the Fossilized Birth-Death Process.

Syst Biol 2020 03;69(2):325-344

School of Life and Environmental Sciences, University of Sydney, Sydney, New South Wales 2006, Australia.

Bayesian molecular dating is widely used to study evolutionary timescales. This procedure usually involves phylogenetic analysis of nucleotide sequence data, with fossil-based calibrations applied as age constraints on internal nodes of the tree. An alternative approach is tip-dating, which explicitly includes fossil data in the analysis. This can be done, for example, through the joint analysis of molecular data from present-day taxa and morphological data from both extant and fossil taxa. In the context of tip-dating, an important development has been the fossilized birth-death process, which allows non-contemporaneous tips and sampled ancestors while providing a model of lineage diversification for the prior on the tree topology and internal node times. However, tip-dating with fossils faces a number of considerable challenges, especially those associated with fossil sampling and evolutionary models for morphological characters. We conducted a simulation study to evaluate the performance of tip-dating using the fossilized birth-death model. We simulated fossil occurrences and the evolution of nucleotide sequences and morphological characters under a wide range of conditions. Our analyses of these data show that the number and the maximum age of fossil occurrences have a greater influence on estimates of node times and the tree topology than do the degree of among-lineage rate variation and the number of morphological characters. Tip-dating with the fossilized birth-death model generally performs well in recovering the relationships among extant taxa but has difficulty correctly placing fossil taxa in the tree and identifying the number of sampled ancestors. The method yields accurate estimates of the ages of the root and crown group, although the precision of these estimates varies with the probability of fossil occurrence.
The exclusion of morphological characters results in a slight overestimation of node times, whereas the exclusion of nucleotide sequences has a negative impact on inference of the tree topology. Our results provide an overview of the performance of tip-dating using the fossilized birth-death model, which will inform further development of the method and its application to key questions in evolutionary biology.

Source
DOI: http://dx.doi.org/10.1093/sysbio/syz038
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC7175741
March 2020

A High-quality Draft Genome Assembly of Sinella curviseta: A Soil Model Organism (Collembola).

Genome Biol Evol 2019 02;11(2):521-530

Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing, China.

Sinella curviseta, among the most widespread springtails (Collembola) in the Northern Hemisphere, has often been treated as a model organism in soil ecology and environmental toxicology. However, the scarcity of genetic information on this species severely hinders our understanding of its adaptations to the soil habitat. We present the largest genome assembly within Collembola, generated from ∼44.86 Gb (118×) of single-molecule real-time sequencing data on the Pacific Biosciences Sequel platform. The final assembly of 599 scaffolds was ∼381.46 Mb with an N50 length of 3.28 Mb, and it captured 95.3% complete and 1.5% partial arthropod Benchmarking Universal Single-Copy Orthologs (n = 1066). Transcripts and the circularized mitochondrial genome were also assembled. We predicted 23,943 protein-coding genes, of which 83.88% were supported by transcriptome-based evidence and 82.49% matched protein records in UniProt. In addition, we identified 222,501 repeats and 881 noncoding RNAs. Phylogenetic reconstructions for Collembola support Tomoceridae as sister to the remaining Entomobryomorpha, with the position of Symphypleona not fully resolved. Gene family evolution analyses identified 9,898 gene families, of which 156 experienced significant expansions or contractions. Our high-quality reference genome of S. curviseta provides the genetic basis for future investigations in evolutionary biology, soil ecology, and ecotoxicology.
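The coverage figure in this abstract follows from simple arithmetic: mean depth is the total number of sequenced bases divided by the genome size. The sketch below assumes the assembly length (∼381.46 Mb) is used as the genome size; whether the authors divided by the assembly length or by a separate genome-size estimate is not stated here, so treat this as a consistency check under that assumption. The function name is hypothetical.

```python
def mean_depth(total_bases: float, genome_size: float) -> float:
    """Mean sequencing depth = total sequenced bases / genome (or assembly) size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Figures quoted in the abstract: ~44.86 Gb of reads, ~381.46 Mb assembly.
depth = mean_depth(44.86e9, 381.46e6)
print(f"{depth:.1f}x")  # 117.6x, consistent with the reported 118-fold coverage
```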

Source
DOI: http://dx.doi.org/10.1093/gbe/evz013
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6389355
February 2019

The molecular clock and evolutionary timescales.

Biochem Soc Trans 2018 10 28;46(5):1183-1190. Epub 2018 Aug 28.

School of Life and Environmental Sciences, University of Sydney, Sydney, NSW 2006, Australia

The molecular clock provides a valuable means of estimating evolutionary timescales from genetic and biochemical data. Proposed in the early 1960s, it was first applied to amino acid sequences and immunological measures of genetic distances between species. The molecular clock has undergone considerable development over the years, and it retains profound relevance in the genomic era. In this mini-review, we describe the history of the molecular clock, its impact on evolutionary theory, the challenges brought by evidence of evolutionary rate variation among species, and the statistical models that have been developed to account for these heterogeneous rates of genetic change. We explain how the molecular clock can be used to infer rates and timescales of evolution, and we list some of the key findings that have been obtained when molecular clocks have been applied to genomic data. Despite the numerous challenges that it has faced over the decades, the molecular clock continues to offer the most effective method of resolving the details of the evolutionary timescale of the Tree of Life.
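Under the simplest, strict molecular clock, two lineages that split t time units ago have each accumulated substitutions at the same rate r, so their expected pairwise distance is d = 2rt. The following Python sketch (hypothetical numbers and function names, strict clock only) calibrates r from one dated split and then inverts the relation to date another pair. The relaxed-clock models discussed in the review exist precisely because this constant-rate assumption is often violated.

```python
def rate_from_calibration(distance: float, split_time: float) -> float:
    """Strict clock: two lineages diverge at 2r per unit time, so r = d / (2t)."""
    if split_time <= 0:
        raise ValueError("split_time must be positive")
    return distance / (2 * split_time)


def divergence_time(distance: float, rate: float) -> float:
    """Invert d = 2rt to estimate the time since two lineages split."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return distance / (2 * rate)


# Hypothetical numbers: a calibrated pair 10 Myr apart differs by 0.2 subs/site.
r = rate_from_calibration(0.2, 10.0)  # 0.01 substitutions/site/Myr per lineage
t = divergence_time(0.06, r)          # another pair differing by 0.06 subs/site
print(f"rate = {r:.3f}, time = {t:.1f} Myr")  # rate = 0.010, time = 3.0 Myr
```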

Source
DOI: http://dx.doi.org/10.1042/BST20180186
October 2018

Comparison of Methods for Molecular Species Delimitation Across a Range of Speciation Scenarios.

Syst Biol 2018 09;67(5):830-846

Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China.

Species are fundamental units in biological research and can be defined on the basis of various operational criteria. There has been growing use of molecular approaches for species delimitation. Among the most widely used methods, the generalized mixed Yule-coalescent (GMYC) and Poisson tree processes (PTP) were designed for the analysis of single-locus data but are often applied to concatenations of multilocus data. In contrast, the Bayesian multispecies coalescent approach in the software Bayesian Phylogenetics and Phylogeography (BPP) explicitly models the evolution of multilocus data. In this study, we compare the performance of GMYC, PTP, and BPP using synthetic data generated by simulation under various speciation scenarios. We show that in the absence of gene flow, the main factor influencing the performance of these methods is the ratio of population size to divergence time, while the number of loci and the sample size per species have smaller effects. Given appropriate priors and correct guide trees, BPP shows lower rates of species overestimation and underestimation, and is generally robust to various potential confounding factors except high levels of gene flow. The single-threshold GMYC and the best strategy that we identified in PTP generally perform well for scenarios involving more than a single putative species when gene flow is absent, but PTP outperforms GMYC when fewer species are involved. Both methods are more sensitive than BPP to the effects of gene flow and potential confounding factors. Case studies of bears and bees further validate some of the findings from our simulation study, and reveal the importance of using an informed starting point for molecular species delimitation. Our results highlight the key factors affecting the performance of molecular species delimitation, with potential benefits for using these methods within an integrative taxonomic framework.

Source
DOI: http://dx.doi.org/10.1093/sysbio/syy011
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC6101526
September 2018

A simulation study of sample size for DNA barcoding.

Ecol Evol 2015 12 1;5(24):5869-79. Epub 2015 Dec 1.

Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China; College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China.

For some groups of organisms, DNA barcoding can provide a useful tool in taxonomy, evolutionary biology, and biodiversity assessment. However, the efficacy of DNA barcoding depends on the degree of sampling per species, because a large enough sample size is needed to provide a reliable estimate of genetic polymorphism and for delimiting species. We used a simulation approach to examine the effects of sample size on four estimators of genetic polymorphism related to DNA barcoding: mismatch distribution, nucleotide diversity, the number of haplotypes, and maximum pairwise distance. Our results showed that mismatch distributions derived from subsamples of ≥20 individuals usually bore a close resemblance to that of the full dataset. Estimates of nucleotide diversity from subsamples of ≥20 individuals tended to follow a bell-shaped distribution around the value for the full dataset, whereas estimates from smaller subsamples did not. As expected, greater sampling generally led to an increase in the number of haplotypes. We also found that subsamples of ≥20 individuals allowed a good estimate of the maximum pairwise distance of the full dataset, while smaller ones were associated with a high probability of underestimation. Overall, our study confirms the expectation that larger samples are beneficial for the efficacy of DNA barcoding and suggests that a minimum sample size of 20 individuals is needed in practice for each population.
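The four estimators named in this abstract can be computed directly from an alignment. The following Python sketch is a minimal illustration, not the authors' simulation code: it counts raw mismatches (uncorrected p-distances, no gap handling) and computes the mismatch distribution, nucleotide diversity (mean pairwise differences per site), the number of haplotypes, and the maximum pairwise p-distance, plus a hypothetical `subsample` helper that repeats the calculation on random subsets in the spirit of the study's subsampling design.

```python
import random
from collections import Counter
from itertools import combinations
from typing import Dict, List


def n_differences(a: str, b: str) -> int:
    """Number of mismatched sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def barcode_stats(seqs: List[str]) -> Dict[str, object]:
    """Mismatch distribution, nucleotide diversity (pi), haplotype count and
    maximum pairwise p-distance for one set of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    diffs = [n_differences(a, b) for a, b in combinations(seqs, 2)]
    return {
        "mismatch": Counter(diffs),                # pairwise-difference counts
        "pi": sum(diffs) / (len(diffs) * length),  # mean pairwise differences per site
        "haplotypes": len(set(seqs)),              # distinct sequences
        "max_p": max(diffs) / length,              # largest pairwise p-distance
    }


def subsample(seqs: List[str], k: int, reps: int = 100, seed: int = 0) -> List[Dict[str, object]]:
    """Recompute the statistics on `reps` random subsamples of size k."""
    rng = random.Random(seed)
    return [barcode_stats(rng.sample(seqs, k)) for _ in range(reps)]


# Hypothetical toy alignment (not data from the study).
toy = ["ACGTACGTAC", "ACGTACGTAA", "ACGTTCGTAA", "ACGTACGTAC"]
full = barcode_stats(toy)
print(full["haplotypes"], full["max_p"])  # 3 0.2
```

Because every pair in a subsample is also a pair in the full set, a subsample's maximum distance can never exceed the full-data value, which is why small samples tend to underestimate it.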

Source
DOI: http://dx.doi.org/10.1002/ece3.1846
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4717336
December 2015

Positive selection on hemagglutinin and neuraminidase genes of H1N1 influenza viruses.

Virol J 2011 Apr 21;8:183. Epub 2011 Apr 21.

Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China.

Background: Since its emergence in March 2009, the pandemic 2009 H1N1 influenza A virus has posed a serious threat to public health. To trace the evolutionary path of these new pathogens, we performed a selection-pressure analysis of a large number of hemagglutinin (HA) and neuraminidase (NA) gene sequences of H1N1 influenza viruses from different hosts.

Results: Phylogenetic analysis revealed that both HA and NA genes have evolved into five distinct clusters, with further analyses indicating that the pandemic 2009 strains have experienced the strongest positive selection. We also found evidence of strong selection acting on the seasonal human H1N1 isolates. However, swine viruses from North America and Eurasia were under weak positive selection, while there was no significant evidence of positive selection acting on the avian isolates. A site-by-site analysis revealed that the positively selected sites were located in both of the cleaved products of HA (HA1 and HA2), as well as NA. In addition, the pandemic 2009 strains were subject to differential selection pressures compared to seasonal human, North American swine and Eurasian swine H1N1 viruses.

Conclusions: Most of these positively and/or differentially selected sites were situated in the B-cell and/or T-cell antigenic regions, suggesting that selection at these sites might be responsible for the antigenic variation of the viruses. Moreover, some sites were also associated with glycosylation and receptor-binding ability. Thus, selection at these positions might have helped the pandemic 2009 H1N1 viruses to adapt to the new hosts after they were introduced from pigs to humans. Positive selection on position 274 of NA protein, associated with drug resistance, might account for the prevalence of drug-resistant variants of seasonal human H1N1 influenza viruses, but there was no evidence that positive selection was responsible for the spread of the drug resistance of the pandemic H1N1 strains.
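Positive selection is conventionally diagnosed by a ratio of nonsynonymous to synonymous substitution rates (dN/dS, or ω) greater than 1. The listing does not say which method the authors used, and site-by-site analyses like theirs normally rely on likelihood-based codon models; the Python sketch below instead shows the simpler counting approach, assuming the numbers of nonsynonymous and synonymous sites and differences have already been counted, with a Jukes-Cantor correction applied to each proportion. Function names and counts are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor correction of a proportion of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def omega(n_sites: float, s_sites: float, n_diffs: float, s_diffs: float) -> float:
    """dN/dS from counts of nonsynonymous/synonymous sites and differences."""
    dn = jukes_cantor(n_diffs / n_sites)
    ds = jukes_cantor(s_diffs / s_sites)
    if ds == 0:
        raise ZeroDivisionError("no synonymous differences: dN/dS is undefined")
    return dn / ds


# Hypothetical counts for one pair of aligned sequences (not study data).
w = omega(n_sites=300, s_sites=100, n_diffs=30, s_diffs=5)
print(f"dN/dS = {w:.2f}")  # a value > 1 suggests positive selection
```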

Source
DOI: http://dx.doi.org/10.1186/1743-422X-8-183
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3094300
April 2011

Potential efficacy of mitochondrial genes for animal DNA barcoding: a case study using eutherian mammals.

BMC Genomics 2011 Jan 28;12:84. Epub 2011 Jan 28.

Key Laboratory of Zoological Systematics and Evolution, Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, PR China.

Background: A well-informed choice of genetic locus is central to the efficacy of DNA barcoding. Current DNA barcoding in animals involves the use of the 5' half of the mitochondrial cytochrome oxidase 1 gene (CO1) to diagnose and delimit species. However, there is no compelling a priori reason for the exclusive focus on this region, and it has been shown that it performs poorly for certain animal groups. To explore alternative mitochondrial barcoding regions, we compared the efficacy of the universal CO1 barcoding region with the other mitochondrial protein-coding genes in eutherian mammals. Four criteria were used for this comparison: the number of recovered species, sequence variability within and between species, resolution to taxonomic levels above that of species, and the degree of mutational saturation.

Results: Based on 1,179 mitochondrial genomes of eutherians, we found that the universal CO1 barcoding region is a good representative of mitochondrial genes as a whole, because its high species-recovery rate (>90%) was similar to that of other mitochondrial genes, and there were no significant differences in intra- or interspecific variability among genes. However, an overlap between intra- and interspecific variability was still problematic for all mitochondrial genes. Our results also demonstrated that any choice of mitochondrial gene for DNA barcoding failed to offer significant resolution at higher taxonomic levels.

Conclusions: We suggest that the CO1 barcoding region, the universal DNA barcode, is preferred among the mitochondrial protein-coding genes as a molecular diagnostic at least for eutherian species identification. Nevertheless, DNA barcoding with this marker may still be problematic for certain eutherian taxa and our approach can be used to test potential barcoding loci for such groups.
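The "overlap between intra- and interspecific variability" mentioned in this abstract is often summarized as a barcode gap: for each species, the smallest distance to any other species minus the largest distance within the species. The Python sketch below computes that gap from labelled, aligned sequences using uncorrected p-distances. It is a common simplification offered for illustration and is not necessarily the species-recovery criterion used in the paper; the function names and sequences are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gaps(samples: List[Tuple[str, str]]) -> Dict[str, float]:
    """For each species, min interspecific minus max intraspecific distance.
    A positive gap means the species' sequences are separated from all others."""
    max_intra: Dict[str, float] = {}
    min_inter: Dict[str, float] = {}
    for (sp1, s1), (sp2, s2) in combinations(samples, 2):
        d = p_distance(s1, s2)
        if sp1 == sp2:
            max_intra[sp1] = max(max_intra.get(sp1, 0.0), d)
        else:
            for sp in (sp1, sp2):
                min_inter[sp] = min(min_inter.get(sp, float("inf")), d)
    species = {sp for sp, _ in samples}
    return {sp: min_inter.get(sp, float("inf")) - max_intra.get(sp, 0.0) for sp in species}


# Hypothetical labelled sequences for two species (not data from the study).
samples = [
    ("sp1", "AAAAAAAAAA"), ("sp1", "AAAAAAAAAT"),
    ("sp2", "CCAAAAAAAA"), ("sp2", "CCAAAAAAAT"),
]
print(barcode_gaps(samples))
```

A negative gap flags exactly the overlap problem the authors describe: some conspecific sequences are more different from each other than from another species.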

Source
DOI: http://dx.doi.org/10.1186/1471-2164-12-84
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3042414
January 2011

Performance of criteria for selecting evolutionary models in phylogenetics: a comprehensive study based on simulated datasets.

BMC Evol Biol 2010 Aug 9;10:242. Epub 2010 Aug 9.

Institute of Zoology, Chinese Academy of Sciences, Beijing 100101, China.

Background: Explicit evolutionary models are required in maximum-likelihood and Bayesian inference, the two methods that are overwhelmingly used in phylogenetic studies of DNA sequence data. Appropriate selection of nucleotide substitution models is important because the use of incorrect models can mislead phylogenetic inference. To better understand the performance of different model-selection criteria, we used 33,600 simulated data sets to analyse the accuracy, precision, dissimilarity, and biases of the hierarchical likelihood-ratio test, Akaike information criterion, Bayesian information criterion, and decision theory.

Results: We demonstrate that the Bayesian information criterion and decision theory are the most appropriate model-selection criteria because of their high accuracy and precision. Our results also indicate that in some situations different models are selected by different criteria for the same dataset. Such dissimilarity was the highest between the hierarchical likelihood-ratio test and Akaike information criterion, and lowest between the Bayesian information criterion and decision theory. The hierarchical likelihood-ratio test performed poorly when the true model included a proportion of invariable sites, while the Bayesian information criterion and decision theory generally exhibited similar performance to each other.

Conclusions: Our results indicate that the Bayesian information criterion and decision theory should be preferred for model selection. Together with model-adequacy tests, accurate model selection will serve to improve the reliability of phylogenetic inference and related analyses.
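Three of the four criteria compared in this study have short closed forms: AIC = 2k − 2 lnL, BIC = k ln(n) − 2 lnL, and the likelihood-ratio statistic 2(lnL₁ − lnL₀) for nested models, referred to a chi-square distribution with degrees of freedom equal to the difference in free parameters. The Python sketch below applies them to hypothetical log-likelihoods; decision theory is not sketched. It assumes n is the number of alignment sites and counts only substitution-model parameters (branch lengths add the same constant to every model on a fixed tree and do not change the ranking). The toy numbers show how AIC and BIC can pick different models for the same data.

```python
import math
from typing import Callable, Dict, Tuple


def aic(lnl: float, k: int) -> float:
    """Akaike information criterion: 2k - 2 lnL (lower is better)."""
    return 2 * k - 2 * lnl


def bic(lnl: float, k: int, n: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 lnL, with n the sample size
    (here assumed to be the number of alignment sites)."""
    return k * math.log(n) - 2 * lnl


def lr_statistic(lnl_simple: float, lnl_complex: float) -> float:
    """Likelihood-ratio statistic for nested models, 2 (lnL1 - lnL0); compare it with
    a chi-square distribution whose df is the difference in free parameters."""
    return 2 * (lnl_complex - lnl_simple)


def best(models: Dict[str, Tuple[float, int]], score: Callable[[float, int], float]) -> str:
    """Name of the model with the lowest score."""
    return min(models, key=lambda name: score(*models[name]))


# Hypothetical log-likelihoods and substitution-parameter counts, 1,000-site alignment.
models = {"JC69": (-5000.0, 0), "HKY85": (-4900.0, 4), "GTR+G": (-4890.0, 9)}
n_sites = 1000
print(best(models, aic))                                   # GTR+G
print(best(models, lambda lnl, k: bic(lnl, k, n_sites)))  # HKY85
print(lr_statistic(-4900.0, -4890.0))                      # 20.0 (5 extra parameters)
```

BIC's ln(n) penalty (about 6.9 per parameter here, versus 2 for AIC) is what makes it favour the smaller HKY85 model in this toy case.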

Source
DOI: http://dx.doi.org/10.1186/1471-2148-10-242
PMC: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2925852
August 2010