Word embeddings are distributed numerical representations of words that have recently sparked significant interest in the research community. Companies such as Google and Facebook use them in various applications, including enhancing search and recommendation engines. Many word embedding techniques, such as word2vec, doc2vec, GloVe, BERT, RoBERTa and others, have recently been published. Word embedding methods characterise a target word by the words that accompany it and convert textual representations into numerical vectors on which arithmetic calculations can be performed. Extracting knowledge from written text into structured formats such as datasets usually takes a significant amount of time and resources. For example, collecting data about underutilised crops from the literature can be a tedious process. This research aims to partially automate the collection of data that can be used to build domain-specific decision support systems. To demonstrate the usefulness of neural information retrieval, we apply it to the extraction of soil requirements of underutilised crops. We trained a word2vec model on 72 GB of Wikipedia text covering 178 languages and 8.1 billion words. The trained model was used to predict the soil classification requirements of 2201 crops, and the predictions were verified against FAO Ecocrop data. We tested several arithmetic methods for comparing crops' scientific and common English names against soil classes in the word embedding vector space. We achieved an accuracy of 76.11% when matching crop scientific names against soil classes, while matching common English names against soil classes resulted in 49.97% accuracy. The results suggest that neural information retrieval is effective for knowledge extraction. They also suggest that word embeddings of crops' scientific names provide higher accuracy than those of common English names in this domain.
Future work will involve other types of predictions relevant to agriculture, including macro- and micronutrients, crop types, and crop usage.
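
The comparison of crop names against soil classes in the embedding space can be pictured as a nearest-neighbour lookup. The sketch below is illustrative only: the abstract does not say which arithmetic methods were tested, so cosine similarity is an assumption, and the three-dimensional vectors, the placeholder keys `crop_scientific_name` and `crop_common_name`, and the soil class labels are all hypothetical stand-ins for vectors from a trained word2vec model.

```python
import numpy as np

# Hypothetical toy vectors standing in for trained word2vec embeddings.
# Real embeddings would have hundreds of dimensions; these values are made up.
EMBEDDINGS = {
    "crop_scientific_name": np.array([0.9, 0.1, 0.2]),
    "crop_common_name": np.array([0.5, 0.5, 0.3]),
    "sandy": np.array([1.0, 0.0, 0.1]),
    "loamy": np.array([0.1, 1.0, 0.2]),
    "clay": np.array([0.0, 0.2, 1.0]),
}
SOIL_CLASSES = ["sandy", "loamy", "clay"]  # assumed label set, for illustration


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def predict_soil_class(name, embeddings=EMBEDDINGS, soil_classes=SOIL_CLASSES):
    """Return the soil class whose vector is most similar to the name's vector."""
    query = embeddings[name]
    scores = {s: cosine_similarity(query, embeddings[s]) for s in soil_classes}
    return max(scores, key=scores.get), scores


def accuracy(predictions, reference):
    """Fraction of reference entries whose predicted class matches."""
    hits = sum(predictions[key] == reference[key] for key in reference)
    return hits / len(reference)


if __name__ == "__main__":
    for name in ("crop_scientific_name", "crop_common_name"):
        best, _ = predict_soil_class(name)
        print(name, best)
```

In an evaluation along the lines described above, the predicted class for each crop would be compared with a reference label (FAO Ecocrop in this work), and `accuracy` would be computed separately for scientific and common names.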