MBA: A LITERATURE MINING SYSTEM FOR EXTRACTING BIOMEDICAL ABBREVIATIONS

Abstract

Background: The explosive growth of the biomedical literature presents many challenges for biological researchers. One such challenge is the heavy use of abbreviations. Accurately extracting abbreviations and their definitions helps biologists and also facilitates biomedical text analysis.

Existing approaches fall into four broad categories: rule-based, machine-learning-based, text-alignment-based, and statistics-based. State-of-the-art methods either focus exclusively on acronym-type abbreviations or cannot recognize rare abbreviations. We propose a systematic method to extract abbreviations effectively.

First, a scoring method classifies abbreviations into acronym-type and non-acronym-type. Their definitions are then identified by two different methods: a text alignment algorithm for the former and a statistical method for the latter.

Results: A literature mining system, MBA, was constructed to extract both acronym-type and non-acronym-type abbreviations. An abbreviation-tagged literature corpus, the Medstract gold-standard corpus, was used to evaluate the system.
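The two-stage idea described above (score an abbreviation, then route it to alignment or to a statistical step) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the abstract does not specify MBA's scoring function, threshold, or alignment algorithm. Here the score is simply the fraction of abbreviation characters that match word-initial letters, the threshold of 0.5 is arbitrary, and the alignment is a simple right-to-left character match in the style of the well-known Schwartz-Hearst heuristic, swapped in as a stand-in. The statistical step for non-acronym-type abbreviations is left as a placeholder.

```python
from typing import List, Optional, Tuple


def initials_score(abbr: str, words: List[str]) -> float:
    """Hypothetical score: fraction of the abbreviation's alphanumeric
    characters that match, in order, the first letters of the words."""
    letters = [c.lower() for c in abbr if c.isalnum()]
    if not letters:
        return 0.0
    pos = 0
    for word in words:
        if pos < len(letters) and word and word[0].lower() == letters[pos]:
            pos += 1
    return pos / len(letters)


def align_definition(abbr: str, words: List[str]) -> Optional[str]:
    """Right-to-left character alignment (a stand-in, not MBA's algorithm).

    Returns the shortest trailing span of the words that contains the
    abbreviation's characters in order, with the first character at the
    start of a word, or None if no such span exists."""
    letters = [c.lower() for c in abbr if c.isalnum()]
    if not letters:
        return None
    joined = " ".join(words)
    text = joined.lower()
    i = len(text) - 1
    for k in range(len(letters) - 1, -1, -1):
        ch = letters[k]
        # Skip characters that do not match; the first abbreviation
        # character must additionally sit at the beginning of a word.
        while i >= 0 and (
            text[i] != ch or (k == 0 and i > 0 and text[i - 1].isalnum())
        ):
            i -= 1
        if i < 0:
            return None
        i -= 1
    return joined[i + 1:]


def classify_and_extract(
    abbr: str, preceding_words: List[str], threshold: float = 0.5
) -> Tuple[str, Optional[str]]:
    """Route an abbreviation by its score (threshold is an assumption)."""
    if initials_score(abbr, preceding_words) >= threshold:
        return "acronym-type", align_definition(abbr, preceding_words)
    # A corpus-statistics method would handle this case; not sketched here.
    return "non-acronym-type", None


print(classify_and_extract("CNS", "we studied the central nervous system".split()))
print(classify_and_extract("5-HT", "levels of serotonin".split()))
```

In this sketch, "CNS" scores 1.0 against the preceding words and is aligned to "central nervous system", while "5-HT" shares no initials with "serotonin" and is routed to the non-acronym branch, where a statistical method would take over.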

MBA achieved a recall of 88% at a precision of 91% on the Medstract gold-standard corpus.

Conclusion: We present MBA, a new literature mining system for extracting biomedical abbreviations. Our evaluation indicates that MBA performs better than the other systems.

It can identify the definitions not only of acronym-type abbreviations, including slightly irregular ones (e.g., ), but also of non-acronym-type abbreviations (e.g., ).
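The abstract reports precision and recall separately. When comparing extraction systems it is common to combine the two into a single F-measure; the abstract does not state one, so the short sketch below is only an illustration of how the reported figures (precision 91%, recall 88%) would combine under the standard harmonic-mean definition. The function name `f_measure` and the `beta` parameter are choices made for this example.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Standard F-beta score; beta = 1 gives the harmonic mean (F1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Figures reported in the abstract for the Medstract gold-standard corpus.
f1 = f_measure(precision=0.91, recall=0.88)
print(round(f1, 3))  # 0.895
```

This value is derived here from the two reported numbers and is not a figure quoted from the paper.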
