Academic Work

Publications

Research spanning AI for Social Good, Natural Language Processing, Signal Processing, and Wireless Communications Engineering, comprising journal articles, international conference papers, and ongoing PhD research.

  1. Journal Article · Published · 2026

    Inter‐Model Feature Fusion for Robust Low‐Resource Speech Recognition

    Kimanuka, U., wa Maina, C., Büyük, O.

    Applied AI Letters

    Substantial improvements in automatic speech recognition performance have been realized through supervised fine-tuning after self-supervised pre-training of a speech foundation model. However, the large size of foundation models, along with their varied losses and objective functions, makes it impractical to obtain optimal results from any single model, and fine-tuning each model independently for several downstream tasks is prohibitively expensive. We therefore propose an inter-model feature-fusion approach whose methodology consists of three phases: feature extraction, feature fusion, and prediction. During feature extraction, several pre-trained models, each with different losses and objective functions, are used to derive representations. During feature fusion, a designed co-attentional fusion mechanism enables the network to adaptively weight different fusion operations and acquire common representations across models. Finally, a connectionist temporal classification (CTC) layer generates transcription predictions. The proposed self-supervised feature-fusion transformer block (SSF-FT), incorporating inter-model techniques, effectively captures both shared and distinctive information across all fused representations. We conducted an interpretability study in high-resource (English) and low-resource (Congolese) scenarios; in both settings, we observe that features performing well with shallow ensemble methods also perform well with attention-weighted soft mixing. Experimental results demonstrate that our approach offers complementary strengths to existing ensemble techniques, with particular improvements in acoustically challenging and low-resource scenarios.

    speech recognition · feature fusion · self-supervised learning · foundation models · low-resource ASR · co-attention · CTC · Congolese languages
  2. Journal Article · Published · 2025

    A Congolese Swahili Task-Oriented Dialogue System for Addressing Humanitarian Crises

    Kimanuka, U., wa Maina, C., Büyük, O., Masika Kassay, G.

    Engineering, Technology & Applied Science Research

    As Artificial Intelligence (AI) advances, conversational agents are increasingly used across sectors, including humanitarian response. However, current systems and datasets mainly support high-resource languages and open-domain tasks, leaving low-resource, domain-specific needs largely unmet. This study addresses that gap by using a Congolese Swahili corpus, collected from Short Message Service (SMS) messages and call-center humanitarian questions, to develop an effective Task-Oriented Dialogue System (ToDS) that assists displaced persons seeking humanitarian information in Congolese Swahili. We built a pipeline-based ToDS that converts natural language into SPARQL by combining a trained Named Entity Recognition (NER) model with a Dual Intent and Entity Transformer (DIET) classifier. The system includes a humanitarian-specific ontology and dynamically queries a local triple store populated with data derived from the Humanitarian Data Exchange (HDX). Preliminary results indicate high accuracy in entity recognition and intent classification, enabling precise and timely responses: the agent provides context-relevant answers to humanitarian questions in crisis interactions. The findings demonstrate that applying Natural Language Understanding (NLU) methods in a low-resource, crisis-based context is viable and impactful, and that this ToDS offers a scalable solution for improving information accessibility during humanitarian emergencies and forced internal displacements.

    conversational AI · Task-Oriented Dialogue System · SPARQL · ontology · low-resource languages · humanitarian crisis · NER · DIET classifier
  3. Journal Article · Published · 2025

    AI Governance through Fractal Scaling: Integrating Universal Human Rights with Emergent Self-Governance for Democratized Technosocial Systems

    Eglash, R., Nayebare, M., Robinson, K., Robert, L., Bennett, A., Kimanuka, U., Maina, C.

    AI & Society – Springer

    📚 6 citations

    One of the challenges facing AI governance is the need for multiple scales. Universal human rights require a global scale. If someone asks AI if education is harmful to women, the answer should be "no" regardless of their location. But economic democratization requires local control: if AI's power over an economy is dictated by corporate giants or authoritarian states, it may degrade democracy's social and environmental foundations. AI democratization, in other words, needs to operate across multiple scales. Nature allows the multiscale flourishing of biological systems through fractal distributions. In this paper, we show that key elements of the fractal scaling found in nature can be applied to the AI democratization process. We begin by looking at fractal trees in nature and applying similar analytics to tree representations of online conversations. We first examine this application in the context of OpenAI's "Democratic Inputs" projects for globally acceptable policies. We then look at the advantages of independent AI ownership at local micro-levels, reporting on initial outcomes for experiments with AI and related technologies in community-based systems. Finally, we offer a synthesis of the two, micro and macro, in a multifractal model. Just as nature allows multifractal systems to maximize biodiverse flourishing, we propose a combination of community-owned AI at the micro-level, and globally democratized AI policies at the macro-level, for a more egalitarian and sustainable future.

    fractal · democratic AI · egalitarian · self-organization · OpenAI · AI governance · human rights · technosocial systems
  4. Conference Paper · Published · 2025

    Leveraging Electronic Syndromic Surveillance Synthetic Data to Predict Diarrhoea in Zimbabwean Children Under-Five: An Explainable AI Framework

    Chikotie, T., Watson, B., Kimanuka, U., Banda, T.

    2025 IST-Africa Conference (IST-Africa)

    📚 1 citation

    This study investigates the use of synthetic data in developing an electronic syndromic surveillance system to predict diarrhoeal outcomes among Zimbabwean children under five. Given the high morbidity and mortality rates linked to diarrhoeal diseases in Zimbabwe, implementing a real-time surveillance system can significantly enhance early outbreak detection and response. Machine learning models, including Random Forest, XGBoost, and Long Short-Term Memory (LSTM) networks, were trained on synthetic data created with Generative Adversarial Networks (GANs), simulating real-world conditions and enhancing dataset diversity. An explainable AI technique, SHAP, was employed for model interpretability, revealing crucial predictors of diarrhoeal outcomes such as healthcare-seeking behaviours and socio-demographic factors. The findings emphasise the potential of AI-driven surveillance in low-resource settings, offering actionable insights for public health interventions. This research provides a foundational framework for implementing electronic syndromic surveillance to improve public health resilience in Zimbabwe.

    syndromic surveillance · diarrhoea · explainable AI · synthetic data · GAN · SHAP · public health · Zimbabwe
  5. Journal Article · Published · 2024

    Speech Recognition Datasets for Low-Resource Congolese Languages

    Kimanuka, U., wa Maina, C., Büyük, O.

    Elsevier Data in Brief

    📚 20 citations

    Large pre-trained Automatic Speech Recognition (ASR) models have shown improved performance in low-resource languages due to the increased availability of benchmark corpora and the advantages of transfer learning. However, only a limited number of languages possess ample resources to fully leverage transfer learning. In such contexts, benchmark corpora become crucial for advancing methods. In this article, we introduce two new benchmark corpora for low-resource languages spoken in the Democratic Republic of the Congo: the Lingala Read Speech Corpus, with 4 hours of labelled audio, and the Congolese Speech Radio Corpus, which offers 741 hours of unlabelled audio spanning four significant low-resource languages of the region. For the Lingala Read Speech Corpus, thirty-two distinct adult speakers with different accents were recorded, each in a unique context and under various settings. The Congolese Speech Radio raw data were taken from the archive of a broadcast station and passed through a designed curation process. The datasets, freely accessible to all researchers, serve as a valuable resource for investigating and developing monolingual and multilingual approaches for linguistically similar and distant languages. Using supervised and self-supervised learning techniques, they enable the first benchmarking of speech recognition systems for Lingala and the first multilingual model tailored to four Congolese languages spoken by an aggregated population of 95 million.

    ASR · low-resource languages · Congolese languages · Lingala · speech corpus · transfer learning · self-supervised learning · DRC
  6. Conference Paper · Published · 2023

    MasakhaNEWS: News Topic Classification for African Languages

    Adelani, D. I., Masiak, M., Azime, I. A., Alabi, J., Tonja, A. L., Mwase, C., Ogundepo, O., Kimanuka, U., et al.

    Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)

    📚 55 citations

    Despite representing roughly a fifth of the world's population, African languages are underrepresented in NLP research, in part due to a lack of datasets. While there are individual language-specific datasets for several tasks, only a handful of tasks (e.g. named entity recognition and machine translation) have datasets covering geographically and typologically diverse African languages. In this paper, we develop MasakhaNEWS — the largest dataset for news topic classification covering 16 languages widely spoken in Africa. We provide and evaluate a set of baseline models by training classical machine learning models and fine-tuning several language models. Furthermore, we explore several alternatives for transfer learning to improve classification in low-resource settings.

    news classification · African languages · NLP · Masakhane · multilingual · text classification
  7. Technical Report · Published · 2023

    Interim Report for Ubuntu-AI: A Bottom-up Approach to More Democratic and Equitable Training and Outcomes for Machine Learning

    Nayebare, M., Eglash, R., Kimanuka, U., Baguma, R., Mounsey, J., wa Maina, C.

    Democratic Inputs for AI (Conference / OpenAI Grant Report)

    📚 8 citations

    Artificial Intelligence (AI) can be a threat to creative arts and design, taking data and images without permission or compensation. But with AI becoming a global portal for human knowledge access, anyone resisting inclusion in its data inputs will become invisible to its outputs. This is the AI double bind, in which the threat of exclusion forces us to give up any claims of ownership to our creative endeavors. To address such problems, this project develops an experimental platform designed to return value to those who create it, using a case study on African arts and design. If successful, it will allow African creatives to work with AI instead of against it, creating new opportunities for funding, gaining wider dissemination of their work, and creating a database for machine learning that results in more inclusive knowledge of African arts and design for AI outputs.

    democratic AI · African creatives · AI governance · equitable ML · OpenAI · data equity
  8. Journal Article · Published · 2018

    Turkish Speech Recognition Based on Deep Neural Networks

    Kimanuka, U. A., Büyük, O.

    Süleyman Demirel University Journal of Natural and Applied Sciences (Special Issue)

    📚 21 citations

    In this paper, we develop a Turkish speech recognition (SR) system using deep neural networks and compare it with the previous state-of-the-art Gaussian mixture model-hidden Markov model (GMM-HMM) approach on the same Turkish speech dataset and the same large-vocabulary Turkish corpus. Most SR systems deployed worldwide, and particularly in Turkey, use hidden Markov models to handle the temporal variations of speech, with Gaussian mixture models estimating how well each HMM state fits a short frame of coefficients representing the acoustic input. A feed-forward deep neural network offers an alternative way to estimate this fit: it takes several frames of coefficients as input and outputs posterior probabilities over HMM states. Deep neural networks have been shown to outperform traditional GMM-HMM systems in other languages such as English and German. The agglutinative nature of Turkish and the lack of large amounts of speech data complicate the design of a high-performing SR system; deep neural networks improve performance, but the results still trail those for English because far less speech data is available. We present various architectural and training techniques for the Turkish DNN-based models and test them on a Turkish database collected from mobile devices. In the experiments, we observe that the Turkish DNN-HMM system reduced the word error rate by approximately 2.5% compared to the traditional GMM-HMM system.

    Turkish speech recognition · deep neural network · Gaussian mixture model · Hidden Markov model · GMM-HMM · DNN-HMM
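The attention-weighted soft mixing examined in the interpretability study of the feature-fusion work (entry 1) can be illustrated with a minimal NumPy sketch. This is not the paper's SSF-FT implementation: the function name, array shapes, and the scalar per-model scores below are illustrative assumptions.

```python
import numpy as np

def soft_mix(features, scores):
    """Attention-weighted soft mixing of per-model features.

    features: list of K arrays, each (T, D) -- frame-level representations
              from K pre-trained models (assumed projected to a common D).
    scores:   array of K unnormalised relevance scores, one per model.
    Returns a single (T, D) fused representation.
    """
    weights = np.exp(scores - scores.max())
    weights = weights / weights.sum()              # softmax over models
    stacked = np.stack(features, axis=0)           # (K, T, D)
    return np.tensordot(weights, stacked, axes=1)  # weighted sum -> (T, D)

# Toy example: two "models", 3 frames, 4-dimensional features.
rng = np.random.default_rng(0)
feats = [rng.standard_normal((3, 4)) for _ in range(2)]
fused = soft_mix(feats, np.array([2.0, 0.0]))      # favours the first model
```

With equal scores the mixing reduces to a plain average of the model features; the softmax lets the weighting emphasise whichever model's representation is most reliable for a given input, which is the intuition behind comparing it against shallow ensembles.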

* Some entries are placeholders or in preparation. DOI links will be added upon publication. Contact me directly for preprint copies.
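The natural-language-to-SPARQL step in the Congolese Swahili dialogue system (entry 2) can be sketched as intent-conditioned query templating: the classified intent selects a query template and the extracted entities fill its slots. The ontology prefix `ho:`, the intent name, and the entity slots below are hypothetical illustrations, not the paper's actual schema.

```python
def build_sparql(intent, entities):
    """Map a classified intent plus extracted entities to a SPARQL query
    against a hypothetical humanitarian ontology (names are illustrative)."""
    templates = {
        "find_service": (
            "SELECT ?service ?location WHERE {{\n"
            "  ?service a ho:{service_type} ;\n"
            "           ho:locatedIn ?location .\n"
            "  FILTER(CONTAINS(LCASE(STR(?location)), \"{location}\"))\n"
            "}}"
        ),
    }
    return templates[intent].format(**entities)

# E.g. a user asking where to find health care near Goma might yield:
query = build_sparql("find_service",
                     {"service_type": "HealthFacility", "location": "goma"})
```

In the pipeline described in the abstract, the intent would come from the DIET classifier and the entity slots from the NER model; the resulting query would then be run against the local triple store populated from HDX data.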