HZAU Scholars Advocate CropGPT - A New Model of Crop Intelligent Breeding

Ensuring food security remains the common mission of agricultural scientists around the world. Over the past 80 years, gains in food production can be attributed to both improved crop management practices and genetic crop breeding (especially cross-breeding). However, as the global population keeps growing, new challenges are emerging for food supply and breeding reform. Although we are on the cusp of intelligent breeding, breeding worldwide still relies mainly on conventional methods, especially in developing countries. In this context, it is imperative to integrate the latest technologies into breeding so as to accelerate genetic improvement.

Recently, in partnership with several domestic research teams, Professor Li Lin of the Maize Team at HZAU published a paper online titled “The CropGPT project: A call for a global, coordinated effort in precision design breeding driven by AI using biological big-data” in the international journal Molecular Plant. The paper advocates an open, shared and win-win cooperation model, CropGPT, to facilitate intelligent design breeding worldwide. It also discusses how to integrate existing resources (including germplasm and biological big-data) with artificial intelligence (AI) methods to upgrade existing breeding technology, and it puts forward possible operational processes and a potential theoretical basis.

Figure 1 The proposed workflow of the CropGPT project

Generative Pre-trained Transformers (GPT) such as GPT-2 and GPT-3 are powerful language models that use the Transformer architecture (a neural network based on a self-attention mechanism) to learn from extensive training data and generate text. The rapid progress of GPT and natural language processing (NLP) raises hopes of achieving accurate, intelligent design breeding through better analysis of biological big-data. The researchers therefore put forward the new concept of CropGPT in this paper and call for worldwide cooperation on CropGPT intelligent design breeding. It is an open, cooperative, and win-win team breeding model that requires cooperation among breeders, biologists, mathematicians, computer scientists, breeding companies, and biotechnology companies (Figure 1A).

Firstly, breeders provide high-quality basic germplasm resources as elite founders, and breeding companies carry out the engineered construction of large populations of elite germplasm using doubled-haploid (DH) technology or other methods. Secondly, biologists build gene networks based on the multi-omics data generated and collected in the early stage and, in combination with AI systems, investigate the regulatory relationships between genes and traits and rapidly clone functional gene sets for important traits in batches. Thirdly, Li Lin’s team has designed a number of precise intelligent breeding chips based on functional genes for important traits across the whole genome. These chips can carry out high-throughput genotyping of gene sets with specific functions in a population at low cost. Meanwhile, researchers use intelligent phenotyping platforms to conduct high-throughput phenotypic identification of the population. Fourthly, mathematicians integrate multi-modal inputs (such as genotypic data, phenotypic data, and environmental factors) to develop ideal big-data models, while computer scientists use the models to predict optimized hybrid combinations and provide breeding advice. Finally, breeders make the hybrid combinations and carry out yield and stress resistance evaluations according to the breeding advice. High-quality materials can be applied commercially right away and can also be added to the elite founders. Through this iterative cycle, the big-data model can be continuously optimized, enhancing the prediction accuracy and the intelligent breeding ability of CropGPT.
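
To make the prediction step of the workflow concrete, here is a minimal sketch in Python of how candidate hybrids might be ranked from chip genotypes. It is an illustration only and not the method of the paper: the marker effects, the line names and the purely additive model (which ignores dominance, heterosis and environmental factors) are all assumptions introduced for this example.

```python
from itertools import combinations

# Hypothetical additive marker effects (trait units per copy of an allele).
# In practice these would be estimated from genotype and phenotype data.
marker_effects = [0.5, -0.25, 1.0]

# Hypothetical DH lines. DH lines are homozygous, so each marker
# carries either 0 or 2 copies of the allele.
dh_lines = {
    "DH1": [2, 0, 0],
    "DH2": [0, 0, 2],
    "DH3": [2, 2, 2],
}


def hybrid_dosage(parent_a, parent_b):
    """Allele dosage of an F1 hybrid: one gamete from each homozygous parent."""
    return [(a + b) // 2 for a, b in zip(parent_a, parent_b)]


def predict(dosage):
    """Additive prediction: sum of marker effect times allele dosage."""
    return sum(effect * d for effect, d in zip(marker_effects, dosage))


# Score every pairwise cross and rank from best to worst predicted value.
ranked = sorted(
    (
        (predict(hybrid_dosage(dh_lines[a], dh_lines[b])), a, b)
        for a, b in combinations(sorted(dh_lines), 2)
    ),
    reverse=True,
)

best_value, best_a, best_b = ranked[0]
print(f"best predicted cross: {best_a} x {best_b} ({best_value})")
```

In the iterative cycle described above, the field evaluation of the recommended crosses would feed back into re-estimating the marker effects, so the ranking improves from one round to the next.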

Mathematicians will develop an appropriate breeding language model by fine-tuning existing general-purpose large language models (LLMs) on a large-scale breeding corpus, a key foundation for the success of CropGPT. Building on that, researchers aim to develop a feature fusion approach in CropGPT that uses independent pre-trained encoders to process multi-modal data, including phenotypes, environmental factors, genotypes, multi-omics data, gene networks, and text, thus unifying the language of crop life (Figure 1B). Ideally, neural networks and self-supervised learning techniques will be incorporated into CropGPT to perform alignment and translation between the multiple data modalities and human natural language. Ultimately, CropGPT is expected to answer free-text inquiries, digest multi-modal inputs, and support diverse downstream tasks.
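
The feature fusion idea can be sketched in a few lines of Python. The example below is only a toy illustration under stated assumptions: the three encoder functions are hand-written stand-ins for the independent pre-trained encoders, the field names (`temp_c`, `rain_mm`) and keywords are hypothetical, and fusion is done by simple concatenation, which is one common option rather than the approach confirmed by the paper.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def encode_genotype(calls: Sequence[int]) -> Vector:
    """Toy stand-in for a genotype encoder: frequency of dosages 0, 1, 2."""
    n = len(calls)
    return [sum(1 for c in calls if c == dosage) / n for dosage in (0, 1, 2)]


def encode_environment(readings: Dict[str, float]) -> Vector:
    """Toy stand-in for an environment encoder: roughly scaled features."""
    return [readings["temp_c"] / 40.0, readings["rain_mm"] / 1000.0]


def encode_text(note: str) -> Vector:
    """Toy stand-in for a text encoder: presence of a few trait keywords."""
    lowered = note.lower()
    return [1.0 if k in lowered else 0.0 for k in ("drought", "yield", "lodging")]


# One independent encoder per modality (all hypothetical).
ENCODERS: Dict[str, Callable] = {
    "genotype": encode_genotype,
    "environment": encode_environment,
    "text": encode_text,
}


def fuse(sample: Dict[str, object]) -> Vector:
    """Encode each modality separately, then concatenate in a fixed order."""
    fused: Vector = []
    for name in sorted(ENCODERS):  # fixed order keeps the layout stable
        fused.extend(ENCODERS[name](sample[name]))
    return fused


sample = {
    "genotype": [0, 1, 2, 2],
    "environment": {"temp_c": 20.0, "rain_mm": 500.0},
    "text": "High yield under drought stress",
}
embedding = fuse(sample)
print(len(embedding), embedding)
```

Keeping the encoders independent means each modality can be pre-trained or replaced on its own, while the downstream model only ever sees the fused vector.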

In summary, the CropGPT project aims to integrate diverse resources (germplasm and biological big-data) and make rapid and accurate predictions by using advanced technologies including DH technology, high-throughput genotyping, high-throughput phenotyping and AI. In doing so, it is expected to optimize and upgrade breeding technology to serve breeders.

Associate Professor Zhu Wanchao from the College of Agronomy at Northwest A&F University (formerly a postdoctoral researcher at HZAU) is the first author of the paper. Li Lin, a professor at the College of Plant Sciences & Technology of HZAU, and Li Weifu, an associate professor at the College of Informatics of HZAU, are the corresponding authors. Researcher Fan Xingming from the Yunnan Academy of Agricultural Sciences, Associate Researcher Zhang Hongwei from the Institute of Crop Science, Chinese Academy of Agricultural Sciences, and Professor Chen Hong and Associate Professor Feng Zaiwen from HZAU participated in the design and revision of the paper.

Translated by Xia Xinyi
Proofread by Zhu Kaiyue
Supervised by Wang Xiaoyan