Abstract
This work presents a comprehensive evaluation of transformer-based models for Turkish Question Answering (QA), introducing CrammedBERTurk, which is pretrained and fine-tuned for this domain for the first time. CrammedBERTurk was pretrained on a single consumer GPU within 48 hours, demonstrating efficient language model (LM) training under constrained computational resources; by comparison, pretraining BERT-base required 16 TPUs over four days. For Turkish QA, CrammedBERTurk was compared against BERTurk, XLM-RoBERTa, and ALBERT using Exact Match (EM), F1, ROUGE scores, and an LLM-as-Judge assessment. CrammedBERTurk achieved results competitive with BERTurk and, on several datasets, outperformed it. On TQuAD, CrammedBERTurk reached an EM score of 67.94% and an F1 score of 85.21%, a relative improvement of 4.3% in EM and 4.8% in F1 over BERTurk. On THQuAD, it set a new state of the art with an EM score of 69.4% and an F1 score of 86.58%, a relative improvement of 8.3% in EM and 5.3% in F1 over BERTurk. On the closed-domain EMUQuAD dataset, it achieved an EM score of 41.32% and an F1 score of 73.73%. The study offers insights into developing efficient transformers for low-resource languages under constrained computational budgets.
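For context, the EM and F1 figures quoted above follow the convention used for extractive QA benchmarks: EM checks whether the normalized predicted span matches the reference exactly, while F1 is the harmonic mean of token-level precision and recall between the two spans. The minimal Python sketch below illustrates that convention; the example inputs are hypothetical, and the abstract does not specify the authors' exact normalization or evaluation code.

```python
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace.
    (SQuAD-style evaluation also strips English articles; Turkish has none,
    so only the language-independent steps are shown here.)"""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    return " ".join(text.split())

def exact_match(prediction: str, reference: str) -> float:
    """EM is 1.0 when the normalized prediction equals the normalized reference."""
    return float(normalize(prediction) == normalize(reference))

def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over overlapping tokens."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Hypothetical predictions and references; corpus scores are per-example averages.
preds = {"q1": "Mustafa Kemal Atatürk", "q2": "1923"}
refs = {"q1": "Atatürk", "q2": "1923"}
em = 100 * sum(exact_match(preds[k], refs[k]) for k in preds) / len(preds)
f1 = 100 * sum(f1_score(preds[k], refs[k]) for k in preds) / len(preds)
print(f"EM={em:.2f}  F1={f1:.2f}")
```

Under this convention, a "relative improvement of 4.3% in EM" means the new EM divided by the baseline EM is about 1.043, rather than a 4.3-point absolute gain.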
Publication Info
- Year: 2025
- Type: article
- Citations: 0
- Access: Closed
Identifiers
- DOI: 10.1145/3780096