Genes + artificial intelligence: where is precision medicine headed?


The genetics industry has been attracting a lot of attention lately. On the 17th of last month, BGI announced a new business unit built around artificial intelligence, which set off a good deal of speculation in the industry. Then, on July 29th, CCTV devoted a full feature to precision medicine and covered genetic testing at length. Everyone is talking about genes. Meanwhile, a Canadian company called Deep Genomics was quietly founded and quickly made headlines in major foreign media, although it has received little coverage in China.

So what exactly does this company do, and what is so special about it? Let us first look at how the foreign media have described it. Canada's The Globe and Mail wrote that "this Toronto startup intends to shake up the gene sequencing market"; the Washington Post called Deep Genomics "a startup that brings the power of deep learning to genomics"; Gizmag said that "Deep Genomics intends to use deep learning to revolutionize genetic medicine"; Wired had earlier reported that "machine intelligence deciphers genetic control"; and Scientific American put it more cryptically: "Some corners of our DNA hide clues to disease: the light of deep learning illuminates the little-known corners of genetic mutation".

In short, Deep Genomics is the product of a marriage between artificial intelligence and genomics: "Deep Learning + Genomics". It has opened the first window onto an era of genomics research driven by deep learning.

You may be wondering: genetic testing has been around for a long time and can already detect many diseases, so why does genomics need deep learning? Here is an example. Suppose a city suffers a sudden power outage. There are two ways to find the cause: the first is to check every wire until the damaged spot turns up; the second is to inspect first the locations that are known to fail most often. If we analyze the causes of power outages in 100 different cities, we will find that some causes occur frequently while others occur only rarely.

The same is true of the human body. The total number of single-nucleotide variants (SNVs) in the population is probably in the hundreds of millions. Variants with a population frequency above 1% are called SNPs (single-nucleotide polymorphisms), and there are about 3 million of them. Studying the relationship between diseases and SNPs requires a very large sample of patients, in which the SNPs of the patient population are compared with those of the healthy population. SNVs with a frequency below 1% are very numerous in total, but each one individually is too rare to reach statistical significance, so it is automatically filtered out of disease analyses. The numbers make the problem plain: if genetic testing lacks in-depth analysis of SNVs with a frequency below 1%, precision medicine will remain confined to a narrow range.
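The statistical problem with rare variants can be made concrete with a small sketch. It is a simplified illustration rather than a method described in this article: the 1% threshold comes from the text above, while the cohort sizes, carrier counts, and function names are hypothetical, and carriers are counted per person rather than per allele. The test used is a one-sided Fisher exact test.

```python
from math import comb

SNP_THRESHOLD = 0.01  # the 1% population frequency cut-off described above


def classify(frequency):
    """Label a variant by population frequency (simplified two-way split)."""
    return "SNP" if frequency > SNP_THRESHOLD else "rare SNV"


def fisher_one_sided(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher exact test: probability of seeing at least this many
    carriers among the cases if the variant were unrelated to the disease."""
    carriers = case_carriers + control_carriers
    total = n_cases + n_controls
    upper = min(carriers, n_cases)
    favourable = sum(
        comb(carriers, k) * comb(total - carriers, n_cases - k)
        for k in range(case_carriers, upper + 1)
    )
    return favourable / comb(total, n_cases)


# Hypothetical cohort: 1000 patients and 1000 healthy controls.
p_common = fisher_one_sided(300, 1000, 200, 1000)  # 500 carriers in total
p_rare = fisher_one_sided(3, 1000, 1, 1000)        # only 4 carriers in total

print(classify(500 / 2000), p_common)
print(classify(4 / 2000), p_rare)
```

With these made-up numbers the rare variant is three times as frequent in patients as in controls, yet with only four carriers in total the test cannot separate it from chance, whereas the common variant reaches significance easily.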

At present, the categories approved for clinical testing by the National Health and Family Planning Commission are genetic disease diagnosis, prenatal screening and diagnosis, preimplantation genetic diagnosis of embryos, and tumor diagnosis and treatment. What these four categories have in common is that each disease is associated with only one or a few susceptibility genes. In fact, apart from single-gene genetic diseases, the number of known susceptibility genes for a disease depends on how thoroughly the disease has been studied. For example, current genetic testing for breast cancer focuses mainly on the BRCA1 and BRCA2 genes. A large number of mutations have been found in these two genes, but we still lack a deep understanding of how those mutations affect breast cancer. Moreover, as more breast cancer samples have been studied, 40 genes related to breast cancer have been found, and each gene may of course contain multiple SNVs. From the perspective of genetic testing, then, precision medicine is still a long way off.

The founder of Deep Genomics, Professor Frey of the University of Toronto in Canada, began working in this field early on. His academic team has published research results in the leading international journals "Science", "Nature Biotechnology" and "Bioinformatics", and hopes to use deep learning to transform precision medicine, genetic testing, diagnosis and treatment.

Next, I will explain how Deep Genomics analyzes the relationship between disease and SNVs with a frequency below 1%. To make its approach clear, a little more background is needed. Readers who have no background in biology and have only just picked up some genetics tend to think of genes as soon as disease is mentioned, but in fact there are several steps between genes and disease. If a cast pot turns out badly, the fault may lie in the design drawings, or it may lie in the mold.

Suppose we want to build a robot. We must first produce design drawings and material cutting diagrams (DNA), then make molds (RNA) according to those drawings and diagrams, then cast the various components (proteins) from the molds, and finally assemble the components into a functioning robot. Life works in the same way. Information is passed from DNA, which carries the genes, to RNA, and then to biologically active proteins, and in the end all of the activities of life are carried out by proteins.
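The flow of information from DNA to RNA to protein can be sketched in a few lines. This is a deliberately minimal illustration: the example sequence and function names are made up, transcription is reduced to rewriting the coding strand in the RNA alphabet, and translation simply reads three letters at a time using the standard genetic code until it meets a stop codon.

```python
# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(dna):
    """DNA coding strand -> RNA (simplified: T is replaced by U)."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """RNA -> protein, one codon at a time, stopping at a stop codon ('*')."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        codon = rna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


normal_dna = "ATGGCCAAGTAA"   # hypothetical toy gene
mutated_dna = "ATGGACAAGTAA"  # same gene with a single-letter change (C -> A)

print(translate(transcribe(normal_dna)))   # MAK
print(translate(transcribe(mutated_dna)))  # MDK
```

A single changed letter in the DNA is enough to put a different amino acid into the protein, which is the simplest way an error in the "drawings" reaches the finished component.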

In building the robot, mistakes can appear in the design drawings (the genes) or in the material cutting diagrams (the instructions that control how RNA is spliced). Either kind of error can make the robot malfunction. Current genetic testing analyzes how frequently occurring variants in genes affect disease, but largely ignores the effect of variants that alter gene splicing. The reason is simply that each variant controlling splicing occurs at low frequency and so is not statistically significant. Yet their number is enormous, in the hundreds of millions. Deep Genomics currently provides predictions of how 328 million SNVs affect the splicing of RNA, the material from which the mold is made. So how does Deep Genomics do it?
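To see how a single variant can disturb splicing, consider the toy model below. It is only a sketch under strong assumptions: the sequence and intron coordinates are invented, and the rule that an intron is removed only when it begins with GT and ends with AG is a simplification of the canonical splice-site signals. Real splicing depends on many more sequence features.

```python
def splice(pre_mrna, introns):
    """Join the exons of a transcript (written in the DNA alphabet).

    introns is a list of (start, end) positions, end exclusive. An intron is
    cut out only if its GT...AG boundaries are intact; otherwise it is kept
    in the mature transcript (intron retention).
    """
    pieces = []
    position = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        pieces.append(pre_mrna[position:start])  # exon before this intron
        if not (intron.startswith("GT") and intron.endswith("AG")):
            pieces.append(intron)  # broken splice site: intron retained
        position = end
    pieces.append(pre_mrna[position:])  # final exon
    return "".join(pieces)


# Hypothetical transcript: exon 1 + intron + exon 2.
reference = "ATGGCC" + "GTAAGTTTCAG" + "AAGTAA"
introns = [(6, 17)]

# A single-letter variant (G -> A) at the first base of the intron.
variant = reference[:6] + "A" + reference[7:]

print(splice(reference, introns))  # ATGGCCAAGTAA
print(splice(variant, introns))    # ATGGCCATAAGTTTCAGAAGTAA
```

The variant leaves both exons untouched, yet the spliced product is completely different, which is why variants of this kind are missed by tests that look only inside the coding part of a gene.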

With the conventional approach to genetic testing, these SNVs are hard to analyze. Deep Genomics therefore brings in deep learning. First, Frey's team built a mathematical model and trained it on the whole-genome sequences and RNA sequences of healthy people, so that the model learns how RNA is normally spliced; next, the model is validated and corrected; finally, its accuracy is tested against data from known cases. Following this approach, Deep Genomics launched its first product, SPIDEX. Given sequencing results and a cell type, SPIDEX analyzes the effect of a variant on RNA splicing and estimates the relationship between the variant and disease.
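Once a model has learned normal splicing, one plausible way to score a variant is to predict splicing for the reference sequence and for the altered sequence and compare the two. The sketch below shows only that comparison step. It is not the SPIDEX implementation: `toy_predict_psi` is a hypothetical stand-in for a trained model, the sequence and variants are invented, and the score used here, the change in predicted exon inclusion, is an assumption chosen for illustration.

```python
def apply_snv(sequence, position, alt_base):
    """Return the sequence with one base substituted at the given position."""
    return sequence[:position] + alt_base + sequence[position + 1:]


def toy_predict_psi(sequence):
    """HYPOTHETICAL stand-in for a trained model.

    Returns a predicted fraction of correctly spliced transcripts in [0, 1].
    A real model would be learned from genome and RNA data; this one just
    checks for a single splice-site motif.
    """
    return 0.9 if "GTAAGT" in sequence else 0.1


def delta_psi(reference, position, alt_base, predict=toy_predict_psi):
    """Predicted splicing change caused by one variant (variant - reference)."""
    variant = apply_snv(reference, position, alt_base)
    return predict(variant) - predict(reference)


def rank_variants(reference, variants, predict=toy_predict_psi):
    """Score every (position, alt_base) pair and sort by size of effect."""
    scored = [
        (position, alt_base, delta_psi(reference, position, alt_base, predict))
        for position, alt_base in variants
    ]
    return sorted(scored, key=lambda item: abs(item[2]), reverse=True)


reference = "ATGGCCGTAAGTTTCAGAAGTAA"          # hypothetical sequence
candidates = [(2, "A"), (6, "A"), (20, "C")]  # hypothetical rare variants

for position, alt_base, score in rank_variants(reference, candidates):
    print(position, alt_base, round(score, 2))
```

Because each variant is scored from its sequence alone, this kind of approach does not need the variant to be common in the population, which is what makes it applicable to SNVs below the 1% threshold.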

If Deep Genomics' deep learning analysis becomes accurate enough, the technology's contribution will be clear: it will allow low-frequency variants to be analyzed directly in relation to disease, and it will accelerate genomics research and drug development. At the same time, we should be clear-eyed about its limits: the current SPIDEX technology can only analyze the relationship between disease and RNA splicing changes caused by SNVs, and can do nothing for diseases with other causes. Even so, the application of artificial intelligence to genetic analysis is well worth looking forward to. It may prove to be a golden key for decoding the mysteries of genes and disease.


Caretium Medical Instruments Co., Limited, founded in 2001, is a high-tech company focused on the research and development, manufacturing, sales and after-sale service of in-vitro diagnostic equipment and reagents. Caretium has been certified as a national "High-tech Enterprise" in China since 2011 and holds the CE mark, ISO 13485, ISO 9001, a GMP certification from South Korea, and other certifications.

