Generating Non-English Synthetic Medical Data Sets

Oct 7, 2024 (55 views)

Using synthetic datasets to train medicine-focused machine learning models has been shown to enhance their performance; however, most research focuses on English texts. In this paper, we explore generating non-English synthetic medical texts. We propose a methodology for generating synthetic medical data and showcase it by generating Greeklish medical texts relating to hypertension. We evaluate our approach with seven different language models and assess the quality of the resulting datasets by training a classifier to distinguish between original and synthetic examples. We find that Llama-3 performs best for our task.
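The classifier-based quality check mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say which classifier or features were used, so the character-trigram naive Bayes model below, along with the names `char_ngrams`, `NaiveBayes`, and `discriminator_accuracy`, are assumptions chosen only to make the idea concrete with the Python standard library.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lowercased text."""
    t = text.lower()
    return [t[i:i + n] for i in range(max(len(t) - n + 1, 0))]


class NaiveBayes:
    """Multinomial naive Bayes over character n-grams, add-one smoothing.

    Hypothetical stand-in for whatever classifier the paper trained.
    """

    def fit(self, texts, labels):
        self.counts = {label: Counter() for label in set(labels)}
        self.docs = Counter(labels)  # number of training texts per label
        for text, label in zip(texts, labels):
            self.counts[label].update(char_ngrams(text))
        self.vocab = set().union(*self.counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.docs.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.counts):
            counts = self.counts[label]
            # +1 in the denominator reserves mass for unseen n-grams.
            denom = sum(counts.values()) + len(self.vocab) + 1
            score = math.log(self.docs[label] / total_docs)
            for gram in char_ngrams(text):
                score += math.log((counts[gram] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def discriminator_accuracy(train, test):
    """Fit on (text, label) pairs and return accuracy on held-out pairs."""
    model = NaiveBayes().fit([t for t, _ in train], [y for _, y in train])
    hits = sum(model.predict(t) == y for t, y in test)
    return hits / len(test)
```

Under this reading, each text would be labelled `"original"` or `"synthetic"`, and the held-out accuracy serves as a rough quality signal: a score close to chance would suggest the synthetic texts are hard to tell apart from the originals, while a score close to 1.0 would suggest they are easy to spot. How the paper actually turns classifier performance into a ranking of the seven models is not stated in the abstract.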

Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license.