Text Mining in Medical Science based on Virtual Library of Science

ICM UW develops and supplies the Taxila framework in Poland for quick and practical analysis of scientific texts. Thanks to cooperation with the Systems Biology Institute in Tokyo and the resources of the Virtual Library of Science (WBN), dozens of doctors and scientists from recognized medical centers throughout Poland have already benefited from workshops on the use of Taxila in oncology and in COVID-19 research.

Taxila is an artificial-intelligence platform that analyzes text at large scale and extracts scientific insight from it. The framework uses text-mining technology, which converts text contained in different types of documents into data structures suitable for various algorithms. In particular, by operating on a huge collection of publications, Taxila makes it possible to generate scientific hypotheses that combine different areas of knowledge contained in the text, using tools such as tag analysis, searching for correlations between concepts, and graph visualization.
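The idea of turning text into a data structure and then looking for links between concepts can be illustrated with a minimal Python sketch. It is not Taxila's actual method or API: the concept vocabulary, the sample sentences and the function names are all hypothetical, and simple co-occurrence counting stands in for whatever correlation measure the platform really uses.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical concept vocabulary, chosen only for illustration.
CONCEPTS = {"covid-19", "inflammation", "cytokine", "tumor"}

def extract_concepts(text, vocabulary=CONCEPTS):
    """Return the set of vocabulary terms that occur in the text."""
    tokens = set(re.findall(r"[a-z0-9][a-z0-9\-]*", text.lower()))
    return tokens & vocabulary

def cooccurrence_counts(documents, vocabulary=CONCEPTS):
    """Count how many documents mention each pair of concepts together."""
    counts = Counter()
    for doc in documents:
        found = sorted(extract_concepts(doc, vocabulary))
        counts.update(combinations(found, 2))
    return counts

# Invented example sentences standing in for article texts.
documents = [
    "Cytokine release drives inflammation in severe COVID-19.",
    "Chronic inflammation can promote tumor growth.",
    "Cytokine signalling and inflammation in the tumor microenvironment.",
]
pair_counts = cooccurrence_counts(documents)

# List concept pairs from most to least frequently co-mentioned.
for pair, n in pair_counts.most_common():
    print(pair, n)
```

Each counted pair can be read as a weighted edge between two concepts, which is the kind of structure a graph visualization would draw.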

ICM provides the Taxila tool under a scientific cooperation agreement with the Systems Biology Institute in Tokyo (SBI) and under the licenses for resources collected in the Virtual Library of Science, which is available to licensed Polish institutions. For the oncology workshops, integrating Taxila with the WBN makes it possible to analyze 25,000 full-text scientific articles, mainly from Springer and Elsevier journals. The Library is constantly expanding as new articles are added.

“So far, we are the only research unit in the world with the Taxila system installed, which, combined with the WBN corpus, enables the automatic analysis of thousands of scientific articles. Our immediate goal is to increase the pool of scientific articles from 25,000 to 100,000 and to expand training for the medical community, and in the future to give doctors, researchers and students of medical universities access to the framework, including making the application part of the offering of the newly established Medical Faculty at the University of Warsaw,” says Dr. Robert Sot, director of the Interdisciplinary Center for Mathematical and Computational Modelling at the University of Warsaw.

Text-mining workshops are held in groups of twenty and are conducted in English by SBI instructors. So far, ICM has hosted meetings devoted to medical topics: “Taxila: Empowering the fight against COVID-19 through text” and two editions of “Taxila global scientific literature text-mining intelligence for oncology research”. The events were attended by academics, doctors and researchers from universities and medical universities in Gdańsk, Kraków, Lublin, Białystok, Katowice and Warsaw, the National Institute of Oncology (Warsaw, Gliwice), the Institute of Mother and Child, the International Institute of Molecular and Cell Biology, the Medical Research Agency and several other research units.

Details of the next editions of the workshops will be published on the ICM Academy website.

The Virtual Library of Science (WBN) is a program for purchasing and sharing global knowledge resources in the form of electronic journals, books and databases for Polish academic and scientific institutions. The program is co-financed by the Ministry of Education and Science and is mostly carried out by ICM. Access to resources on publishers’ servers is purchased under annually renewed licenses, but a large part of the resources is currently also archived on the Infona server at ICM, which ensures unrestricted use of the archives in the event of temporary or permanent loss of access to publishers’ servers, in particular if a license is not renewed.

WBN provides 26,000 journal titles and 157,000 book titles to over 500 Polish institutions as part of national and consortium licenses. In 2021, downloads amounted to 18 million articles and 5.2 million book chapters, and researchers searched the databases 7.9 million times. Details on licenses and open publishing programs are available at wbn.icm.edu.pl.

See more:

COVID-19 Taxila – open source version
Akujuobi U., Spranger M., Palaniappan S.K., Zhang X. (2020). T-PAIR: Temporal Node-pair Embedding for Automatic Biomedical Hypothesis Generation. IEEE Transactions on Knowledge and Data Engineering.
Hiroaki Kitano, Nobel Turing Challenge — Creating the Engine of Scientific Discovery (VIRTUAL ICM SEMINARS) https://www.youtube.com/watch?v=OwINVXEusQY