References
Allal, L., & von Werra, L. (2024). Huggingface/text-clustering: Easily embed, cluster and semantically label text datasets. GitHub. Retrieved June 25, 2025, from https://github.com/huggingface/text-clustering
Akiba, T., Sano, S., Yanase, T., Ohta, T., & Koyama, M. (2019). Optuna: A next-generation hyperparameter optimization framework. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. https://doi.org/10.1145/3292500.3330701
Akshat, A., Tripathi, K., Raj, G., Sar, A., Choudhury, T., Saraf, S., & Dewangan, B. K. (2024, June). A comparative study between Chat GPT, T5 and LSTM for machine language translation. In 2024 OPJU International Technology Conference (OTCON) on Smart Computing for Innovation and Advancement in Industry 4.0 (pp. 1–6). IEEE.
Alves, D. M., Pombal, J., Guerreiro, N. M., Martins, P. H., Alves, J., Farajian, A., Peters, B., Rei, R., Fernandes, P., Agrawal, S., & Colombo, P. (2024). Tower: An open multilingual large language model for translation-related tasks. CoRR, abs/2402.17733. https://doi.org/10.48550/arXiv.2402.17733
Anthropic. (2025, May). System card: Claude Opus 4 & Claude Sonnet 4. https://www-cdn.anthropic.com/4263b940cabb546aa0e3283f35b686f4f3b2ff47.pdf
Brants, T., Popat, A., Xu, P., Och, F. J., & Dean, J. (2007, June). Large language models in machine translation. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL) (pp. 858–867).
Chen, X., Wang, H., & Xiang, W. (2019). Implementation of Tibetan–Chinese translation platform based on LSTM algorithm. Proceedings of the ACM Turing Celebration Conference – China (ACM TURC ’19) (Article 142), 1–5. https://doi.org/10.1145/3321408.3326670
Chu, C., & Wang, R. (2018). A survey of domain adaptation for neural machine translation. In Proceedings of the 27th International Conference on Computational Linguistics (pp. 1304–1319). Association for Computational Linguistics. https://doi.org/10.48550/arXiv.1806.00258
DeHaven, M., & Billa, J. (2022). Improving low-resource speech recognition with pretrained speech models: Continued pretraining vs. semi-supervised training. arXiv preprint arXiv:2207.00659.
Deshwal, M., & Chawla, A. (2024). PHUDGE: Phi-3 as Scalable Judge. arXiv e-prints, arXiv-2405.
Dewangan, V., Suri, G., Raj, S., & Sonavane, R. (2025). When every token counts: Optimal segmentation for low-resource language models. Proceedings of the First Workshop on Language Models for Low-Resource Languages, 294–308. Association for Computational Linguistics.
Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., & Ganapathy, R. (2024). The Llama 3 herd of models. arXiv e-prints, arXiv-2407.
Fan, A., Bhosale, S., Schwenk, H., Ma, Z., El-Kishky, A., Goyal, S., & Ott, M. (2021). Beyond English-centric multilingual machine translation. Journal of Machine Learning Research, 22(107), 1–48. https://doi.org/10.48550/arXiv.2010.11125
Freitag, M., & Al-Onaizan, Y. (2016). Fast Domain Adaptation for Neural Machine Translation. arXiv e-prints, arXiv-1612.
Gan, S., Yin, Y., Jiang, Z., Xie, L., & Lu, S. (2023, October). Towards real-time sign language recognition and translation on edge devices. In Proceedings of the 31st ACM International Conference on Multimedia (pp. 4502–4512).
Gemma Team, Kamath, A., Ferret, J., Pathak, S., Vieillard, N., Merhej, R., Perrin, S., Matejovicova, T., Rouillard, L., Mesnard, T., & Cideron, G. (2025). Gemma 3 Technical Report. arXiv e-prints. https://doi.org/10.48550/arXiv.2503.19786
Gordon, M. A., Duh, K., & Kaplan, J. (2021). Data and parameter scaling laws for neural machine translation. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 5915–5922. Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.emnlp-main.478
Hurst, A., Lerer, A., Goucher, A. P., Perelman, A., Ramesh, A., Clark, A., & Kivlichan, I. (2024). GPT-4o System Card. arXiv e-prints, arXiv-2410.
Jiao, X., Yin, Y., Shang, L., Jiang, X., Chen, X., Li, L., Wang, F., & Liu, Q. (2020). TinyBERT: Distilling BERT for natural language understanding. Findings of EMNLP 2020, 4163–4174. Association for Computational Linguistics. https://doi.org/10.18653/v1/2020.findings-emnlp.372
Kim, Y., & Rush, A. M. (2016). Sequence-level knowledge distillation. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 1317–1327. Association for Computational Linguistics. https://doi.org/10.18653/v1/D16-1139
Khalili, L., You, Y., & Bohannon, J. (2022). BabyBear: Cheap inference triage for expensive language models. arXiv e-prints, arXiv-2205.
Khandro, S. (2025). Prayer to Ārya Tārā (A. Pearcey, Trans.). Lotsawa House. Retrieved June 23, 2025, from https://www.lotsawahouse.org/tibetan-masters/sera-khandro/tara-prayer-protect-all-fears [Licensed under CC BY-NC 4.0].
Kocmi, T., & Federmann, C. (2023). Large Language Models Are State-of-the-Art Evaluators of Translation Quality. Proceedings of the 24th Annual Conference of the European Association for Machine Translation, 193–203.
Le, C. (2025). Privacy-Preserving Real-Time Vietnamese-English Translation on iOS using Edge AI. arXiv preprint arXiv:2505.07583.
Liu, Y. (2025). Improving machine translation accuracy for underrepresented languages in linguistic research using transformer models. Journal of Computational Methods in Sciences and Engineering, Article 14727978251337995.
Lommel, A. R., Burchardt, A., & Uszkoreit, H. (2013). Multidimensional quality metrics: A flexible system for assessing translation quality. In Proceedings of Translating and the Computer 35.
Magister, L. C., Mallinson, J., Adamek, J., Malmi, E., & Severyn, A. (2023). Teaching Small Language Models to Reason. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 1773–1781).
McInnes, L., Healy, J., & Astels, S. (2017). hdbscan: Hierarchical density based clustering. Journal of Open Source Software, 2(11), 205.
McInnes, L., Healy, J., Saul, N., & Großberger, L. (2018). UMAP: Uniform manifold approximation and projection. Journal of Open Source Software, 3(29), 861. https://doi.org/10.48550/arXiv.1802.03426
Mistral AI. (2023). Mixtral-8x7B-Instruct-v0.1.
Miyagawa, S. (2023). Machine translation for highly low-resource language: A case study of Ainu, a critically endangered indigenous language in northern Japan. Proceedings of the Joint 3rd International Conference on Natural Language Processing for Digital Humanities and 8th International Workshop on Computational Linguistics for Uralic Languages (pp. 120–124).
Nehrdich, S., & Keutzer, K. (2025). MITRA: A Large-Scale Parallel Corpus and Multilingual Pretrained Language Model for Machine Translation and Semantic Retrieval for Pāli, Sanskrit, Buddhist Chinese, and Tibetan. Unpublished manuscript.
Nguyen, X. P., Aljunied, M., Joty, S., & Bing, L. (2024). Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 3501–3516. https://doi.org/10.18653/v1/2024.acl-long.192
Palzang, K. N. (2024). Bending mind toward good (J. McClellan, Trans.). Lotsawa House. Retrieved June 23, 2025, from https://www.lotsawahouse.org/tibetan-masters/khenchen-ngawang-palzang/advice-bending-mind-to-good [Licensed under CC BY-NC 4.0].
Pang, J., Yang, B., Wong, D. F., Wan, Y., Liu, D., Chao, L. S., & Xie, J. (2024). Rethinking the exploitation of monolingual data for low-resource neural machine translation. Computational Linguistics, 50(1), 25–47. https://doi.org/10.1162/coli_a_00496
Papineni, K., Roukos, S., Ward, T., & Zhu, W. J. (2002). Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics (pp. 311–318).
Popović, M. (2015). chrF: character n-gram F-score for automatic MT evaluation. In Proceedings of the tenth workshop on statistical machine translation (pp. 392–395).
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21(140), 1–67. https://doi.org/10.48550/arXiv.1910.10683
Reimers, N., & Gurevych, I. (2020). Making monolingual sentence embeddings multilingual using knowledge distillation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). https://doi.org/10.48550/arXiv.2004.09813
Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv e-prints, arXiv-1910.
Schwenk, H., Wenzek, G., Edunov, S., Grave, É., Joulin, A., & Fan, A. (2021). CCMatrix: Mining billions of high-quality parallel sentences on the web. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics, 1, 6490–6500. https://doi.org/10.18653/v1/2021.acl-long.507
Sennrich, R., Haddow, B., & Birch, A. (2015). Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909.
Sennrich, R., & Zhang, B. (2019). Revisiting low-resource neural machine translation: A case study. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 211–221. https://doi.org/10.18653/v1/P19-1021
Shazeer, N., & Stern, M. (2018). Adafactor: Adaptive learning rates with sublinear memory cost. Proceedings of the International Conference on Machine Learning, 4596–4604. https://doi.org/10.48550/arXiv.1804.04235
Shazeer, N. (2020). GLU variants improve transformer. arXiv preprint arXiv:2002.05202.
Shibata, Y., Kida, T., Fukamachi, S., Takeda, M., Shinohara, A., Shinohara, T., & Arikawa, S. (1999). Byte pair encoding: A text compression scheme that accelerates pattern matching. Technical Report DOI-TR-161, Department of Informatics, Kyushu University.
Shu, P., Chen, J., Liu, Z., Wang, H., Wu, Z., Zhong, T., Li, Y., Zhao, H., Jiang, H., Pan, Y., & Zhou, Y. (2024). Transcending Language Boundaries: Harnessing LLMs for Low-Resource Language Translation. arXiv e-prints, arXiv-2411.
Snover, M., Dorr, B., Schwartz, R., Micciulla, L., & Makhoul, J. (2006). A study of translation edit rate with targeted human annotation. In Proceedings of the 7th Conference of the Association for Machine Translation in the Americas: Technical Papers (pp. 223–231).
Suryakusuma, M. R., Shiddiq, M. F. A., Lucky, H., & Iswanto, I. A. (2023, November). Investigating T5 Generation Neural Machine Translation Performance on English to German. In 2023 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS) (pp. 12–15). IEEE.
Tan, Z., Yang, Z., Zhang, M., Liu, Q., Sun, M., & Liu, Y. (2022). Dynamic multi-branch layers for on-device neural machine translation. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30, 958–967.
Tay, Y., Dehghani, M., Rao, J., Fedus, W., Abnar, S., Chung, H. W., & Metzler, D. (2021). Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers. arXiv e-prints, arXiv-2109.
Thupten, T., Rinchen, D., Nyima, T., Yu, Y., & Deng, Q. (2021). Research on Chinese–Tibetan machine translation model based on improved byte pair encoding. Journal of the University of Electronic Science and Technology of China, 50(2), 249–255. https://doi.org/10.12178/1001-0548.2020218
Tiedemann, J. (2012, May). Parallel data, tools and interfaces in OPUS. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012) (pp. 2214–2218).
Tiedemann, J. (2020, November). The Tatoeba Translation Challenge – Realistic Data Sets for Low Resource and Multilingual MT. In Proceedings of the Fifth Conference on Machine Translation (pp. 1174–1182).
Tournadre, N. (2010). The Classical Tibetan cases and their transcategoriality: From sacred grammar to modern linguistics. Himalayan Linguistics, 9(2).
Tournadre, N., & Dorje, S. (2003). Manual of Standard Tibetan. Snow Lion.
Tsarfaty, R., Seddah, D., Kübler, S., & Nivre, J. (2013). Parsing morphologically rich languages: Introduction to the special issue. Computational Linguistics, 39(1), 15–22.
Usui, H., & Komiya, K. (2023, December). Translation from Historical to Contemporary Japanese Using Japanese T5. In Proceedings of the Joint 3rd International Conference on Natural Language Processing for Digital Humanities and 8th International Workshop on Computational Linguistics for Uralic Languages (pp. 27–35).
Verma, N., Murray, K., & Duh, K. (2022). Strategies for adapting multilingual pre-training for domain-specific machine translation. Proceedings of the 15th Conference of the Association for Machine Translation in the Americas, 1, 31–44.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 5998–6008. https://doi.org/10.48550/arXiv.1706.03762
Wang, W., Wei, F., Dong, L., Bao, H., Yang, N., & Zhou, M. (2020). MiniLM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers. Advances in Neural Information Processing Systems, 33, 5776–5788. https://doi.org/10.48550/arXiv.2002.10957
Watt, T., Chrysoulas, C., & Gkatzia, D. (2023, October). Edge NLP for Efficient Machine Translation in Low Connectivity Areas. In 2023 IEEE 9th World Forum on Internet of Things (WF-IoT) (pp. 1–6). IEEE.
Wilson, J. B. (1998). Translating Buddhism from Tibetan. Snow Lion.
Wu, Z., Liu, Z., Lin, J., & Han, S. (2020). Lite Transformer with long-short range attention. Proceedings of the International Conference on Learning Representations. https://doi.org/10.48550/arXiv.2004.11886
Zaki, M. Z. (2024). Revolutionising translation technology: A comparative study of variant transformer models – BERT, GPT and T5. Computer Science and Engineering: An International Journal, 14(3), 15–27.
Zhang, T., Kishore, V., Wu, F., Weinberger, K. Q., & Artzi, Y. (2019). BERTScore: Evaluating Text Generation with BERT. arXiv e-prints, arXiv-1904.
Zhang, P., Zeng, G., Wang, T., & Lu, W. (2024). TinyLlama: An open-source small language model. arXiv preprint arXiv:2401.02385.
Zheng, J., Hong, H., Liu, F., Wang, X., Su, J., Liang, Y., & Wu, S. (2024). Fine-tuning large language models for domain-specific machine translation. arXiv preprint arXiv:2402.15061.
Zhou, M. (2024). Research on Tibetan-Chinese neural machine translation integrating statistical method. In Proceedings of the 2023 6th International Conference on Machine Learning and Natural Language Processing, 126–129. https://doi.org/10.1145/3639479.3639506