Gene duplications create genetic redundancy and can have various effects, including detrimental mutations or divergent evolution. Gene duplication is the process by which a region of DNA coding for a gene is copied. Gene duplication can occur as the result of an error in recombination or through a retrotransposition event. Duplicate genes are often immune to the selective pressure under which genes normally exist. This can result in a large number of mutations accumulating in the duplicate gene code.
This may render the gene non-functional or in some cases confer some benefit to the organism. There are multiple mechanisms by which gene duplication can occur. Duplications can arise from unequal crossing-over that occurs during meiosis between misaligned homologous chromosomes. The product of this recombination is a duplication at the site of the exchange and a reciprocal deletion.
Ectopic recombination is typically mediated by sequence similarity at the duplicate breakpoints, which form direct repeats. Repetitive genetic elements, such as transposable elements, offer one source of repetitive DNA that can facilitate recombination, and they are often found at duplication breakpoints in plants and mammals. [Figure: Gene duplication. Schematic of a region of a chromosome before and after a duplication event.] Replication slippage is an error in DNA replication, which can produce duplications of short genetic sequences.
During replication, DNA polymerase begins to copy the DNA, and at some point during the replication process, the polymerase dissociates from the DNA and replication stalls.
When the polymerase reattaches to the DNA strand, it aligns the replicating strand to an incorrect position and incidentally copies the same section more than once. Replication slippage is also often facilitated by repetitive sequence but requires only a few bases of similarity.
During cellular invasion by a replicating retroelement or retrovirus, viral proteins copy their genome by reverse transcribing RNA to DNA.
If viral proteins attach irregularly to cellular mRNA, they can reverse-transcribe copies of genes to create retrogenes. Retrogenes usually lack intronic sequence and often contain poly-A sequences that are also integrated into the genome. Many retrogenes display changes in gene regulation in comparison to their parental gene sequences, which sometimes results in novel functions. Aneuploidy occurs when nondisjunction at a single chromosome results in an abnormal number of chromosomes.
Aneuploidy is often harmful and in mammals regularly leads to spontaneous abortions. Some aneuploid individuals are viable. For example, trisomy 21 in humans leads to Down syndrome, but it is not fatal.
Aneuploidy often alters gene dosage in ways that are detrimental to the organism and is therefore unlikely to spread through populations. Gene duplications are an essential source of genetic novelty that can lead to evolutionary innovation. Because a redundant copy is often shielded from selective pressure, duplicate genes accumulate mutations faster than a functional single-copy gene over generations of organisms, and it is possible for one of the two copies to develop a new and different function.
This is an example of neofunctionalization. Gene duplication is believed to play a major role in evolution; this stance has long been held by members of the scientific community. It has been argued that gene duplication is the most important evolutionary force since the emergence of the universal common ancestor.
Another possible fate for duplicate genes is that both copies are equally free to accumulate degenerative mutations, so long as any defects are complemented by the other copy.
In this process, known as subfunctionalization, the functions of the ancestral gene are divided between the two copies. Neither gene can be lost, as both now perform important non-redundant functions, but ultimately neither is able to achieve novel functionality. Subfunctionalization can occur through neutral processes in which mutations accumulate with no detrimental or beneficial effects. However, in some cases subfunctionalization can occur with clear adaptive benefits.
If an ancestral gene is pleiotropic and performs two functions, often neither of these functions can be changed without affecting the other. In this way, partitioning the ancestral functions into two separate genes can allow for adaptive specialization of subfunctions, thereby providing an adaptive benefit.
Genetic divergence is the process in which two or more populations of an ancestral species accumulate independent genetic changes through time, often after the populations have become reproductively isolated for some period of time. In some cases, subpopulations living in ecologically distinct peripheral environments can exhibit genetic divergence from the remainder of a population, especially where the range of a population is very large.
Genetic drift, or allelic drift, is the change in the frequency of a gene variant (allele) in a population due to random sampling. The alleles in the offspring are a sample of those in the parents, and chance has a role in determining whether a given individual survives and reproduces. Genetic drift may cause gene variants to disappear completely and thereby reduce genetic variation. When there are few copies of an allele, the effect of genetic drift is larger, and when there are many copies the effect is smaller.
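The sampling argument above can be illustrated with a minimal simulation. The sketch below assumes a simple Wright-Fisher-style model (a fixed number of gene copies, non-overlapping generations, no selection or mutation); the function names `wright_fisher` and `loss_fraction` and all parameter values are hypothetical and chosen only for illustration.

```python
import random

def wright_fisher(n_copies, pop_size, generations, seed=0):
    """Track the copy number of one allele under pure drift (no selection).

    pop_size is the total number of gene copies in the population; each
    generation is drawn by resampling the previous one at random.
    """
    rng = random.Random(seed)
    trajectory = [n_copies]
    for _ in range(generations):
        freq = trajectory[-1] / pop_size
        # Each copy in the next generation is drawn independently from the parents.
        n_next = sum(1 for _ in range(pop_size) if rng.random() < freq)
        trajectory.append(n_next)
        if n_next in (0, pop_size):  # allele lost or fixed: drift has finished
            break
    return trajectory

def loss_fraction(n_copies, pop_size, generations, trials=200):
    """Fraction of independent runs in which the allele disappears."""
    lost = 0
    for seed in range(trials):
        if wright_fisher(n_copies, pop_size, generations, seed)[-1] == 0:
            lost += 1
    return lost / trials

if __name__ == "__main__":
    # A single-copy allele is lost far more often than one held by half the population.
    print(loss_fraction(1, 50, 200), loss_fraction(25, 50, 200))
```

Under these assumptions, an allele present in a single copy is lost in most runs, while an allele carried by half of the gene copies is lost in roughly half of them, which is the asymmetry the paragraph describes.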
These changes in gene frequency can contribute to divergence. Divergent evolution usually results from the spread of a species into different, isolated environments, which blocks gene flow among the resulting populations and allows characteristics to become fixed differently through genetic drift and natural selection.
The concept of divergent evolution also applies to molecular characteristics. It can describe a pathway in two or more organisms or cell types, and it can describe genes and proteins, such as nucleotide sequences or protein sequences that are derived from two or more homologous genes. Both orthologous genes resulting from a speciation event and paralogous genes resulting from gene duplication within a population can be said to display divergent evolution.
Noncoding DNA consists of sequences that do not encode protein sequences but can be transcribed to produce important regulatory molecules. The amount of noncoding DNA varies greatly among species. Many types of noncoding DNA sequences have important biological functions, including the transcriptional and translational regulation of protein-coding sequences, origins of DNA replication, centromeres, telomeres, scaffold attachment regions (SARs), genes for functional RNAs, and many others.
Other noncoding sequences have likely, but as-yet undetermined, functions. Some sequences, such as endogenous retroviruses, may have no biological function for the organism. The amount of total genomic DNA varies widely between organisms, and the proportion of coding and noncoding DNA within these genomes varies greatly as well. While overall genome size, and by extension the amount of noncoding DNA, is correlated with organism complexity, there are many exceptions.
For example, the genome of the unicellular Polychaos dubium (formerly known as Amoeba dubia) has been reported to contain many times the amount of DNA in humans.
The extensive variation in nuclear genome size among eukaryotic species is known as the C-value enigma or C-value paradox. Most of the genome size difference appears to lie in the noncoding DNA.
The match length distribution (MLD) computed for simulated sequences with various divergence times. In all panels, the red dotted line represents the theoretical distribution obtained when computing the same experiment on random iid sequences with the same length and the same nucleotide frequencies as the simulated sequences.
For small lengths, MLDs are consistent with these expectations. All empirical data are represented using logarithmic binning to reduce the sampling noise. Each plot shows the probability distribution obtained for 10^4 sequences of length 10^6 bp.
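Logarithmic binning of match lengths can be sketched as follows. This is an illustrative example, not the authors' code: the function names, the choice of five bins per decade, and the normalization (count divided by bin width and by the total number of matches) are assumptions.

```python
import math
from collections import Counter

def match_length_distribution(lengths):
    """Return {length: count} for a list of positive integer match lengths."""
    return dict(Counter(lengths))

def log_binned_density(lengths, bins_per_decade=5):
    """Bin match lengths logarithmically and return (bin_center, density) pairs.

    Density is count / (bin_width * total), so values from bins of different
    width are comparable and approximate a probability density.
    """
    counts = Counter(
        math.floor(bins_per_decade * math.log10(r)) for r in lengths if r >= 1
    )
    total = sum(counts.values())
    result = []
    for k in sorted(counts):
        lo = 10 ** (k / bins_per_decade)
        hi = 10 ** ((k + 1) / bins_per_decade)
        center = math.sqrt(lo * hi)  # geometric midpoint of the bin
        result.append((center, counts[k] / ((hi - lo) * total)))
    return result

if __name__ == "__main__":
    example = [1, 1, 2, 3, 5, 8, 13, 40, 120]
    for center, density in log_binned_density(example):
        print(f"{center:8.2f}  {density:.4f}")
```

Because the bins widen with length, the sparse counts in the tail are pooled, which is what reduces the sampling noise for long matches.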
In this article, we have shown that only certain evolutionary scenarios are able to account for various empirical power law behaviors in the MLDs of a self-alignment of whole genomes, of processed pseudogenomes, and of the comparative alignment of two distantly related genomes.
The basic and necessary ingredients for these scenarios are point mutations, duplications, and a heterogeneity of mutation rates. Such a heterogeneity reflects the existence of neutrally evolving regions and conserved parts of the genomes, such as ultraconserved elements (UCEs). For illustrative purposes, we also developed an in silico model of such an evolution and are able to reproduce the empirically observed properties of MLDs in genomes.
Above we demonstrate that this function has different shapes for various evolutionary scenarios (see fig.). In the genomic context, this condition implies that segmental duplications occur continuously and therefore homologous pairs of sequences that have not diverged yet exist.
In the genomic context, the first condition indicates that all homologous sequences have already diverged, and the second one implies that the number of closely related homologous pairs increases linearly with their divergence.
As explained in the text, different ways of analyzing genomic data either performing a self-alignment or aligning two genomes or focusing on distinct compartments e. The functional forms of the first three scenarios are given in the article; the last one is a convolution of two exponential distributions (the exact functional form affects the exponent of the power law tail; see eq.).
This observation also implies that random segmental duplications occurred continuously and at a constant rate in the history of these species, and that duplication is an ongoing process. If other processes (for instance retroduplication, whole genome duplication, or bursts of segmental duplication) did occur in these genomes, their contribution to the statistical properties of those genomes is negligible compared with random segmental duplications.
Note that one cannot judge whether duplicated sequences are prone to duplicate again or not from the knowledge of the MLD alone. In the first case, the duplicated sequences follow a branching process and the Yule framework developed in this article should be used.
Otherwise the simple random duplication model introduced by Massip and Arndt can be used. However, it has been observed that exact matches occurring several times i. This observation could be accounted for in the Yule framework, but not with the simple random duplication model where exact matches with more than two occurrences are rare.
S2C, Supplementary Material online could therefore also be due to a higher rate of retroduplication in this particular genome.
Understanding the MLDs of comparative alignments requires a different reasoning. In the text above, we further argue that continuous duplication processes in the two genomes after their split cannot account for the observed power law tail in the MLD. In general, a power law tail can be accounted for by assuming a distribution of mutation rates along the genomes, as we have shown analytically and numerically. This indicates that the mutation rate in the studied genomes fulfils three conditions.
First, the mutation rate of well-conserved segments is correlated along the genome with a typical correlation length of at least hundreds of base pairs. Second, there should be long nonmutating regions, such that the distribution of the mutation rate does not vanish at zero. Indeed, comparisons of eukaryotic genomes have identified numerous such regions (Bejerano et al.). Third, the mutation rate of well-conserved regions is not the same for all the regions but is continuously distributed.
In summary, the distribution of mutation rates of well-conserved regions is a smooth function which does not vanish at zero. We observed this behavior for the self-alignment of the genomes of the plant and fish model organisms Arabidopsis thaliana and Danio rerio (zebrafish), in which a whole-genome duplication event occurred recently (Van de Peer; Nakatani et al.). For example, we compared the human (H) and mouse (M) exomes. This indicates that the distribution of exomic mutation rates vanishes for small rates in at least one of the species, which could be due to relaxed selective constraints on synonymous sites (see supplementary data, Supplementary Material online).
MLDs computed from the self-alignment of many other genomes have been presented by Taillefer and Miller. However, genomes with long and highly similar sequences, which are generated by segmental duplications and especially tandem duplications, are not easy to sequence and assemble when using short-read next-generation sequencing technologies. As the power law behavior only holds for long matches (typically longer than the read length), such power law behavior often remains highly questionable unless the genomic assembly is of high quality, that is, comparable with that of the human and mouse genomes.
Any deviation from this behavior could in principle be interpreted as a lack of proper repeat masking (notably if one observes peaks for certain lengths in the MLD), a prevalence of another biological process (if one observes a power law with a different exponent), or a poor assembly quality (if one observes a strong deviation from power law behavior).
Computing the MLD of a genome, which is a simple and fast computational procedure, can in this sense be of great help in order to understand the biological processes that shape the evolution of this genome and to assess the quality of its assembly.
In conclusion, we have shown that different duplication mechanisms left different footprints in the MLD of eukaryotic genomes. Notably, we have shown that exact self-similarities as long as 1, bp in a typical eukaryotic genome could occur without involving any selection.
Besides, we have shown that the distribution of matches in a genomic alignment of two species goes through qualitatively different regimes as the genomes diverge (fig.). Such a power law therefore occurs naturally in the MLD of two diverging genomes; it is a signature of differences in functional constraints and does not arise neutrally. To compute the MLD from either a given sequence or two distinct sequences, we first used the MUMmer software to obtain all maximal matches (Kurtz et al.).
We then simply counted the resulting number of matches for each length to obtain the MLD. To do so, we first retrieved all the sequences matching in the two genomes (each match between the two genomes corresponds to one sequence). Segments that do match with another segment are then considered nonunique. Namely, we define a match as nonunique if it shares a continuous segment of more than 20 bp with any other match.
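The nonuniqueness criterion above can be sketched as an interval-overlap filter. In this hypothetical example, each match is represented as a `(start, length)` tuple giving its position in one genome; this representation and the function name `unique_matches` are assumptions for illustration and do not correspond to any MUMmer output format.

```python
def unique_matches(matches, max_shared=20):
    """Keep matches that share at most max_shared bp with every other match.

    matches: list of (start, length) tuples giving positions in one genome.
    A match sharing a continuous segment of more than max_shared bp with any
    other match is treated as nonunique and dropped.
    """
    order = sorted(range(len(matches)), key=lambda i: matches[i][0])
    nonunique = set()
    for a in range(len(order)):
        i = order[a]
        start_i, len_i = matches[i]
        end_i = start_i + len_i
        for b in range(a + 1, len(order)):
            j = order[b]
            start_j, len_j = matches[j]
            if start_j >= end_i - max_shared:
                break  # later matches start even further right
            shared = min(end_i, start_j + len_j) - start_j
            if shared > max_shared:
                nonunique.add(i)
                nonunique.add(j)
    return [m for k, m in enumerate(matches) if k not in nonunique]

if __name__ == "__main__":
    example = [(0, 100), (50, 100), (300, 40), (330, 30)]
    print(unique_matches(example))
```

In the example, the first two matches share 50 bp and are both discarded, while the last two share only 10 bp and are kept. Sorting by start position lets the inner loop stop early, since a match that starts too far to the right cannot share more than 20 bp with the current one.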
In supplementary fig. S3, Supplementary Material online, we show the distribution obtained after filtering out all the matches that were not unique in both the mouse and human genomes. To simulate the dynamical evolution of a genome under the discussed processes (duplications and mutations with given rates), we use a Kinetic Monte Carlo scheme.
The first process, mutation, replaces one nucleotide by another one. Depending on the evolutionary scenario, we consider different duplication processes. The copied segment replaces the K pre-existing nucleotides such that the total length L of the sequence remains constant. In this model, we also reduce the rate of nucleotide exchanges for the first K positions to mimic the selection on those sites due to functional constraints on a genomic locus. To generate sequences for self-alignments, we apply the dynamics until a stationary state is reached.
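A Kinetic Monte Carlo scheme of this kind can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' implementation: the mutation rate `mu` is uniform across sites, duplications copy a fixed-length segment between uniformly chosen positions, and the reduced exchange rate for selected positions is omitted. The function name and all parameter names are hypothetical.

```python
import random

def simulate_genome(length, mu, dup_rate, seg_len, t_max, seed=0):
    """Evolve a random sequence by point mutations and segmental duplications.

    mu       : mutation rate per site per unit time (assumed uniform here)
    dup_rate : duplication rate for the whole sequence per unit time
    seg_len  : length K of each copied segment
    t_max    : total simulated time
    Requires mu * length + dup_rate > 0.
    """
    rng = random.Random(seed)
    seq = [rng.choice("ACGT") for _ in range(length)]
    total_rate = mu * length + dup_rate
    t = 0.0
    while True:
        t += rng.expovariate(total_rate)  # waiting time to the next event
        if t > t_max:
            break
        if rng.random() < mu * length / total_rate:
            # Point mutation: replace one nucleotide by a different one.
            pos = rng.randrange(length)
            seq[pos] = rng.choice([b for b in "ACGT" if b != seq[pos]])
        else:
            # Duplication: copy K bases over K pre-existing bases, so the
            # total length L stays constant.
            src = rng.randrange(length - seg_len + 1)
            dst = rng.randrange(length - seg_len + 1)
            seq[dst:dst + seg_len] = seq[src:src + seg_len]
    return "".join(seq)

if __name__ == "__main__":
    print(simulate_genome(60, mu=0.01, dup_rate=0.5, seg_len=10, t_max=20.0))
```

Each step draws an exponentially distributed waiting time from the total event rate and then picks the event type in proportion to its rate, which is the core of a Kinetic Monte Carlo (Gillespie-style) update.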
We also use a Kinetic Monte Carlo procedure to generate sequences of species diverging from a common ancestor while including mutation rate heterogeneity. This way, some regions are highly conserved (with a low mutation rate), while others evolve fast. In this model, we also include random segmental duplications.
We then simulate the dynamics until a stationary state is reached and then duplicate the whole sequence to mimic a speciation event.
The dynamics are then simulated for some divergence time t_1. The two sequences are aligned to find exactly matching segments. All the repeat-masked genomes we analyze in this article were downloaded from the Ensembl website, version 72 (Flicek et al.).
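The divergence step can be illustrated with a simplified sketch. It assumes piecewise-constant mutation rates drawn uniformly from an interval that includes zero, no duplications or indels after the split, and at most one effective substitution per site; exact matches are then found by comparing the two descendants position by position instead of running an aligner such as MUMmer. All function names and parameters are hypothetical.

```python
import math
import random

def regional_rates(length, region_len, mu_max, rng):
    """Piecewise-constant rates: one uniform draw from [0, mu_max] per region."""
    rates = []
    while len(rates) < length:
        rates.extend([rng.uniform(0.0, mu_max)] * region_len)
    return rates[:length]

def diverge(ancestor, rates, t, rng):
    """Return a descendant of `ancestor` after time t.

    A site with rate r changes with probability 1 - exp(-r * t); repeated
    hits at the same site are ignored (a simplifying assumption).
    """
    out = []
    for base, rate in zip(ancestor, rates):
        if rng.random() < 1.0 - math.exp(-rate * t):
            base = rng.choice([b for b in "ACGT" if b != base])
        out.append(base)
    return "".join(out)

def exact_match_lengths(seq_a, seq_b):
    """Lengths of maximal runs of identical positions in two aligned sequences."""
    lengths, run = [], 0
    for a, b in zip(seq_a, seq_b):
        if a == b:
            run += 1
        else:
            if run:
                lengths.append(run)
            run = 0
    if run:
        lengths.append(run)
    return lengths

if __name__ == "__main__":
    rng = random.Random(42)
    n = 100_000
    ancestor = "".join(rng.choice("ACGT") for _ in range(n))
    rates = regional_rates(n, region_len=500, mu_max=0.2, rng=rng)
    species_a = diverge(ancestor, rates, 5.0, rng)
    species_b = diverge(ancestor, rates, 5.0, rng)
    lengths = exact_match_lengths(species_a, species_b)
    print(len(lengths), max(lengths))
```

Both descendants use the same per-site rates, so slowly evolving regions stay similar in the two lineages and produce the long exact matches that populate the tail of the MLD.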