Dataset size sensitivity analysis of machine learning classifiers to differentiate molecular markers of paediatric low-grade gliomas based on MRI


Author(s): Matthias W. Wagner*, Khashayar Namdar, Abdullah Alqabbani, Nicolin Hainc, Liana Nobre Figuereido, Min Sheng, Manohar M Shroff, Eric Bouffet, Uri Tabori, Cynthia Hawkins, Michael Zhang, Kristen W. Yeom, Farzad Khalvati and Birgit B. Ertl-Wagner

Objectives: BRAF status has important implications for the prognosis and therapy of Pediatric Low-Grade Gliomas (pLGG). Machine Learning (ML) approaches can predict the BRAF status of pLGG on pre-therapeutic brain MRI, but the impact of training sample size and of the type of ML model has not been established.

Methods: In this bi-institutional retrospective study, 251 pLGG FLAIR MRI datasets from two children’s hospitals were included. Radiomics features were extracted from tumor segmentations, and five models (Random Forest, XGBoost, Neural Network (NN) 1 (100:20:2), NN2 (50:10:2), NN3 (50:20:10:2)) were tested for classifying BRAF status. Classifiers were cross-validated on data from institution 1 and independently validated on data from institution 2. Starting with 10% of the training data, the sample size was increased in steps of 2.25%, and at every step the models were evaluated with 4-fold cross-validation.
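The resampling protocol described in the Methods can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the function names, the random (unstratified) subsampling, the rounding of patient counts, and the fixed seed are all assumptions, and no classifier is trained here.

```python
import random


def size_schedule(n_total, start_bp=1000, step_bp=225, stop_bp=10000):
    """Return (fraction, n_samples) pairs from 10% to 100% in 2.25% steps.

    Percentages are held in basis points (1 bp = 0.01%) so that repeated
    addition of 2.25% does not accumulate floating-point error.
    Rounding to whole patients is an assumption.
    """
    return [(bp / 10000, round(n_total * bp / 10000))
            for bp in range(start_bp, stop_bp + 1, step_bp)]


def k_fold_splits(indices, k=4):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


def sensitivity_plan(n_total=220, seed=0):
    """Build the list of (fraction, n_samples, folds) for every step.

    Assumption: each step draws a fresh random subset of the training
    cohort; the source does not state how subsets were selected.
    """
    rng = random.Random(seed)
    all_indices = list(range(n_total))
    plan = []
    for fraction, n_samples in size_schedule(n_total):
        subset = rng.sample(all_indices, n_samples)
        plan.append((fraction, n_samples, list(k_fold_splits(subset))))
    return plan


if __name__ == "__main__":
    for fraction, n_samples, folds in sensitivity_plan()[:3]:
        fold_sizes = [len(validation) for _, validation in folds]
        print(f"{fraction:.4f} of training data: {n_samples} patients, "
              f"validation fold sizes {fold_sizes}")
```

In a full experiment, each (train, validation) pair would be passed to a classifier and the validation AUC recorded, giving one learning-curve point per step.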

Results: Two hundred and twenty patients (mean age 8.53 ± 4.94 years, 114 males, 67% BRAF fusion) were included in the training dataset and 31 patients (mean age 7.97 ± 6.20 years, 18 males, 77% BRAF fusion) in the independent dataset. NN1 (100:20:2) yielded the highest area under the receiver operating characteristic curve (AUC). It predicted BRAF status with a mean AUC of 0.85, 95% CI (0.83, 0.87) using 60% of the training data and with a mean AUC of 0.83, 95% CI (0.82, 0.84) on the independent validation dataset.
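For readers unfamiliar with how a mean AUC and its confidence interval can be derived, the sketch below shows one common approach: AUC computed as the Mann-Whitney probability that a positive case outscores a negative one, and a normal-approximation 95% interval over repeated runs. The abstract does not state how the reported intervals were calculated, so the method, the function names, and the toy inputs are assumptions for illustration.

```python
import math
import statistics


def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present to compute AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def mean_with_ci(values, z=1.96):
    """Return (mean, lower, upper) using a normal-approximation interval.

    Assumption: z = 1.96 for a 95% interval over independent repetitions.
    """
    mean = statistics.mean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - half_width, mean + half_width


if __name__ == "__main__":
    # Toy labels and scores, not study data.
    print(auc_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

With several AUC values from repeated cross-validation runs, `mean_with_ci` would summarize them in the "mean AUC, 95% CI (lower, upper)" form used above.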

Conclusion: Neural networks achieved a higher AUC for predicting BRAF status than Random Forest and XGBoost. The highest AUC for training and independent data was reached at 60% of the training population (132 patients).

Onkologia i Radioterapia received 208 citations as per Google Scholar report

Onkologia i Radioterapia peer review process verified at publons
Indexed In
  • Directory of Open Access Journals
  • Scimago
  • MIAR
  • Euro Pub
  • Google Scholar
  • Medical Project Poland
  • Cancer Index
  • Gdansk University of Technology, Ministry Points 20