The Historical Development of Corpus Linguistics and Machine Translation: A Translation Studies Perspective
Abstract
Technological developments have profoundly transformed translation, as they have many other disciplines. Especially since the 1990s, as computers became widely accessible, computer-assisted translation (CAT) tools and machine translation systems have become increasingly prevalent in translation practice. Today, traditional translation work has largely shifted toward the post-editing and revision of output produced by neural machine translation (NMT) engines and large language models (LLMs). A major milestone in this transformation was Google's introduction of its neural machine translation system in 2016, which markedly accelerated technology's impact on the field. That impact has become even more evident with the rapid spread of contemporary artificial intelligence systems. Accordingly, research on customized machine translation engines and large language models has gained importance. The most fundamental component of these customized systems is data, particularly parallel corpora, which have become indispensable for training and adapting such systems and for improving their output quality. Whereas corpora were traditionally compiled mainly for linguistic and descriptive studies, they are now increasingly integrated into technological applications and built systematically for NMT engines and LLMs. In this context, the present study examines the historical development of corpora, the evolution of machine translation, and English–Turkish corpus studies.
This work is licensed under a Creative Commons Attribution 4.0 International License.