Technological background of a speech recognition system for the dictation of thyroid gland medical reports

Authors

  • András Kocsor, Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, H-6720 Szeged, Aradi vértanúk tere 1.
  • András Bánhalmi, Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, H-6720 Szeged, Aradi vértanúk tere 1.
  • Dénes Paczolay, Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, H-6720 Szeged, Aradi vértanúk tere 1.

Keywords:

continuous speech recognizer, ASR, automatic speech recognition, dictation system, HMM, Hidden Markov Model, MSD, morphosyntactic descriptor, grammar, acoustic model, N-gram

Abstract

With the considerable development of speech recognition technologies, the demand for so-called speech-based documentation has grown in several professions that require extensive administration. This is particularly true of the documentation of medical reports, so accelerating this procedure is of great importance. For smaller languages with special linguistic features, few systems for dictating medical reports have been developed so far, which can be attributed to these linguistic peculiarities and to high development costs. In Szeged we developed a core module capable of automatic recognition of the Hungarian language, on which several domain-oriented systems can be built. The core module contains the so-called acoustic model; to build this model we used two significantly different approaches. One is the Hidden Markov Model, well known in speech recognition; the other is the novel stochastic segmental approach developed in Szeged. For the development of both models we used a large speech corpus with 500 speakers, and the performance of the modules was then tested on test databases. To accompany the core module we built a language module (for the Windows environment) suitable for the dictation of thyroid gland medical reports, in order to demonstrate the applicability of the developed methods. The module was built on 9231 written thyroid medical reports and over 2500 word forms. We present the structure of the language and acoustic models that were built and the test results describing the efficiency of the models, and we also discuss various aspects of the application and technology of the software.

Author Biography

  • András Kocsor, Research Group on Artificial Intelligence of the Hungarian Academy of Sciences and University of Szeged, H-6720 Szeged, Aradi vértanúk tere 1.

    corresponding author
    kocsor@inf.u-szeged.hu

Published

2006-02-15

How to Cite

Kocsor, A., Bánhalmi, A., & Paczolay, D. (2006). Technological background of a speech recognition system for the dictation of thyroid gland medical reports. Acta Agraria Kaposváriensis, 10(1), 113-128. https://journal.uni-mate.hu/index.php/aak/article/view/1764