Academic Work
Publications
Research spanning AI for Social Good, Natural Language Processing, Signal Processing, and Wireless Communications Engineering, published as journal articles and international conference papers, alongside ongoing PhD research.
- Journal Article ● Published 2026
Inter‐Model Feature Fusion for Robust Low‐Resource Speech Recognition
Kimanuka, U., Ciira wa Maina, Büyük, O.
Applied AI Letters
Substantial improvements in automatic speech recognition performance have been realized through supervised fine‐tuning after self‐supervised pre‐training of a speech foundation model. However, the large size of foundation models, along with their varied losses and objective functions, makes it impractical to obtain optimal results from any single model, and fine‐tuning each model independently for several downstream tasks is prohibitively expensive. We propose a feature‐fusion methodology consisting of three phases: feature extraction, feature fusion, and prediction. During feature extraction, several pre‐trained models, each with varying losses and objective functions, are used to derive representations. During feature fusion, a designed co‐attentional mechanism enables the network to adaptively weight different fusion operations to acquire common representations across models. Finally, a connectionist temporal classification (CTC) layer generates transcription predictions. Moreover, the proposed self‐supervised feature‐fusion transformer block (SSF‐FT), incorporating inter‐model techniques, effectively captures both shared and distinctive information across all fused representations. We conducted an interpretability study in high‐resource (English) and low‐resource (Congolese) scenarios. In both settings, features that perform well with shallow ensemble methods also perform well with attention‐weighted soft mixing. Experimental results demonstrate that our approach offers complementary strengths to existing ensemble techniques, with particular improvements in acoustically challenging and low‐resource scenarios.
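The attention‐weighted soft mixing described in the abstract can be sketched, in a highly simplified form, as frame‐level cross‐attention between two models' feature streams. This is a toy NumPy illustration, not the paper's SSF‐FT implementation; all dimensions, weights, and the 0.5 mixing coefficient are hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention_mix(feats_a, feats_b, w_q, w_k):
    """Queries from model A attend over model B's frames; the attended
    context is soft-mixed with A's own features into a shared view."""
    q = feats_a @ w_q                               # (T, D) queries
    k = feats_b @ w_k                               # (T, D) keys
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (T, T) attention weights
    context = attn @ feats_b                        # A's attended view of B
    return 0.5 * (feats_a + context)                # simple soft mixing

rng = np.random.default_rng(0)
T, D = 50, 16                        # frames, feature dimension (arbitrary)
feats_a = rng.normal(size=(T, D))    # e.g. features from one foundation model
feats_b = rng.normal(size=(T, D))    # e.g. features from another
w_q, w_k = rng.normal(size=(D, D)), rng.normal(size=(D, D))
fused = co_attention_mix(feats_a, feats_b, w_q, w_k)
print(fused.shape)  # (50, 16)
```

A real transformer fusion block would learn the projections and mixing weights jointly with the CTC objective rather than fixing them as above.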
speech recognition, feature fusion, self-supervised learning, foundation models, low-resource ASR, co-attention, CTC, Congolese languages
- Journal Article ● Published 2025
A Congolese Swahili Task-Oriented Dialogue System for Addressing Humanitarian Crises
Kimanuka, U., Ciira wa Maina, Büyük, O., Masika Kassay Godelive
Engineering, Technology & Applied Science Research
As Artificial Intelligence (AI) advances, conversational agents are increasingly used across sectors, including humanitarian response. However, current systems and datasets mainly support high-resource languages and open-domain tasks, leaving low-resource, domain-specific needs largely unaddressed. This study addresses that gap by developing an effective Task-Oriented Dialogue System (ToDS) to assist displaced persons seeking humanitarian information in Congolese Swahili, using a corpus collected from Short Message Service (SMS) messages and call-center humanitarian questions. We built a pipeline-based ToDS that converts natural language into SPARQL by utilizing a trained Named Entity Recognition (NER) model and a Dual Intent and Entity Transformer (DIET) classifier. The system includes a humanitarian-specific ontology and dynamically queries a local triple store with data derived from the Humanitarian Data Exchange (HDX). Preliminary results indicate high accuracy in entity recognition and intent classification, enabling precise and timely information responses. The agent effectively provides context-relevant answers to humanitarian questions in crisis interactions. The findings demonstrate that applying Natural Language Understanding (NLU) methods in a low-resource, crisis-based context is viable and impactful. This ToDS offers a scalable solution for improving information accessibility in humanitarian emergencies and during forced internal displacement.
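The pipeline's final step, mapping a classified intent and extracted entities onto a SPARQL query, can be sketched roughly as template filling. This is a minimal illustration; the intent names, the `hum:` ontology prefix, and the predicates below are invented, not the paper's actual schema:

```python
# Hypothetical SPARQL templates keyed by intent; the hum: ontology
# prefix and predicates are invented for illustration only.
SPARQL_TEMPLATES = {
    "find_shelter": (
        'PREFIX hum: <http://example.org/humanitarian#>\n'
        'SELECT ?site WHERE {{ ?site a hum:Shelter ; '
        'hum:locatedIn "{location}" . }}'
    ),
    "food_distribution": (
        'PREFIX hum: <http://example.org/humanitarian#>\n'
        'SELECT ?event ?date WHERE {{ ?event a hum:FoodDistribution ; '
        'hum:locatedIn "{location}" ; hum:scheduledOn ?date . }}'
    ),
}

def nlu_to_sparql(intent: str, entities: dict) -> str:
    """Fill the template for a classified intent with extracted entities."""
    template = SPARQL_TEMPLATES.get(intent)
    if template is None:
        raise ValueError(f"unsupported intent: {intent}")
    return template.format(**entities)

# Mocked output of the NER model + DIET classifier for one user turn:
query = nlu_to_sparql("find_shelter", {"location": "Goma"})
print(query)
```

In the actual system the intent and entities would come from the trained DIET classifier and NER model, and the resulting query would be executed against the local triple store populated from HDX.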
conversational AI, Task-Oriented Dialogue System, SPARQL, ontology, low-resource languages, humanitarian crisis, NER, DIET classifier
- Journal Article ● Published 2025
AI Governance through Fractal Scaling: Integrating Universal Human Rights with Emergent Self-Governance for Democratized Technosocial Systems
Eglash, R., Nayebare, M., Robinson, K., Robert, L., Bennett, A., Kimanuka, U., Maina, C.
AI & Society – Springer
📚 6 citations
One of the challenges facing AI governance is the need for multiple scales. Universal human rights require a global scale. If someone asks AI if education is harmful to women, the answer should be "no" regardless of their location. But economic democratization requires local control: if AI's power over an economy is dictated by corporate giants or authoritarian states, it may degrade democracy's social and environmental foundations. AI democratization, in other words, needs to operate across multiple scales. Nature allows the multiscale flourishing of biological systems through fractal distributions. In this paper, we show that key elements of the fractal scaling found in nature can be applied to the AI democratization process. We begin by looking at fractal trees in nature and applying similar analytics to tree representations of online conversations. We first examine this application in the context of OpenAI's "Democratic Inputs" projects for globally acceptable policies. We then look at the advantages of independent AI ownership at local micro-levels, reporting on initial outcomes for experiments with AI and related technologies in community-based systems. Finally, we offer a synthesis of the two, micro and macro, in a multifractal model. Just as nature allows multifractal systems to maximize biodiverse flourishing, we propose a combination of community-owned AI at the micro-level, and globally democratized AI policies at the macro-level, for a more egalitarian and sustainable future.
fractal, democratic AI, egalitarian, self-organization, OpenAI, AI governance, human rights, technosocial systems
- Journal Article ● Published 2024
Speech Recognition Datasets for Low-Resource Congolese Languages
Kimanuka, U., Ciira wa Maina, Büyük, O.
Elsevier Data in Brief
📚 20 citations
Large pre-trained Automatic Speech Recognition (ASR) models have shown improved performance in low-resource languages due to the increased availability of benchmark corpora and the advantages of transfer learning. However, only a limited number of languages possess ample resources to fully leverage transfer learning. In such contexts, benchmark corpora become crucial for advancing methods. In this article, we introduce two new benchmark corpora designed for low-resource languages spoken in the Democratic Republic of the Congo: the Lingala Read Speech Corpus, with 4 hours of labelled audio, and the Congolese Speech Radio Corpus, which offers 741 hours of unlabelled audio spanning four significant low-resource languages of the region. For the Lingala Read Speech Corpus, we recorded thirty-two distinct adult speakers with different accents, each in a unique context and under varied recording settings. The Congolese Speech Radio raw data were drawn from a broadcast station's archive and then passed through a designed curation process. The datasets, freely accessible to all researchers, serve as a valuable resource for investigating and developing monolingual and multilingual approaches for linguistically similar and distant languages. Using supervised and self-supervised learning techniques, they enable inaugural benchmarking of speech recognition systems for Lingala and the first multilingual model tailored for four Congolese languages spoken by an aggregated population of 95 million.
ASR, low-resource languages, Congolese languages, Lingala, speech corpus, transfer learning, self-supervised learning, DRC
- Journal Article ● Published 2018
Turkish Speech Recognition Based on Deep Neural Networks
Kimanuka, U. A., Buyuk, O.
Süleyman Demirel University Journal of Natural and Applied Sciences (Special Issue)
📚 21 citations
In this paper we develop a Turkish speech recognition (SR) system using deep neural networks and compare it with the previous state-of-the-art Gaussian mixture model-hidden Markov model (GMM-HMM) method, using the same Turkish speech dataset and the same large-vocabulary Turkish corpus. Most SR systems deployed worldwide, and particularly in Turkey, use Hidden Markov Models to deal with the temporal variations of speech. Gaussian mixture models are used to estimate how well each state of each HMM fits a short frame of coefficients representing the acoustic input. A feed-forward deep neural network is another way to estimate this fit: it takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. The use of deep neural networks has been shown to outperform the traditional GMM-HMM in other languages such as English and German. The fact that Turkish is an agglutinative language, together with the lack of a large amount of speech data, complicates the design of a performant SR system. Deep neural networks improve performance, but results still fall short of those for English owing to the difference in the availability of speech data. We present various architectural and training techniques for the Turkish DNN-based models. The models are tested using a Turkish database collected from mobile devices. In the experiments, we observe that the Turkish DNN-HMM system decreased the word error rate by approximately 2.5% compared to the traditional GMM-HMM system.
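The DNN estimator described above, stacking neighbouring context frames and emitting posterior probabilities over HMM states, can be sketched as follows. This is a toy NumPy illustration with random weights, not the paper's trained models; the layer sizes, context width, and state count are arbitrary:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dnn_state_posteriors(frames, params, context=5):
    """Stack +/-context neighbouring frames, run a one-hidden-layer
    feed-forward net, and return per-frame posteriors over HMM states."""
    T, _ = frames.shape
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    stacked = np.stack([padded[t:t + 2 * context + 1].ravel()
                        for t in range(T)])                      # (T, (2c+1)*d)
    hidden = np.maximum(stacked @ params["W1"] + params["b1"], 0.0)  # ReLU
    logits = hidden @ params["W2"] + params["b2"]
    return softmax(logits)                                       # rows sum to 1

rng = np.random.default_rng(0)
T, d, hidden_dim, n_states = 100, 13, 64, 48  # e.g. 13 MFCCs, 48 HMM states
frames = rng.normal(size=(T, d))              # acoustic feature frames
c = 5
params = {
    "W1": rng.normal(scale=0.1, size=((2 * c + 1) * d, hidden_dim)),
    "b1": np.zeros(hidden_dim),
    "W2": rng.normal(scale=0.1, size=(hidden_dim, n_states)),
    "b2": np.zeros(n_states),
}
posteriors = dnn_state_posteriors(frames, params, context=c)
print(posteriors.shape)  # (100, 48)
```

In a DNN-HMM system these posteriors are converted to scaled likelihoods (dividing by state priors) before Viterbi decoding, replacing the GMM likelihoods of the traditional pipeline.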
Turkish speech recognition, deep neural network, Gaussian mixture model, Hidden Markov model, GMM-HMM, DNN-HMM
* Some entries are placeholders or in preparation. DOI links will be added upon publication. Contact me directly for preprint copies.