This page showcases my work as a machine learning engineer and software developer, spanning data science, automation pipelines, cross-platform applications, and developer tooling.


Professional Work

My professional work centers on machine learning research and data processing pipelines for linguistic analysis. During my 3-year paid internship on the PRODIS project, I built the complete ML and data processing infrastructure, including a first-of-its-kind phoneme-level GPT model for Polish.

The majority of this code is not publicly available due to legal and privacy constraints (e.g., GDPR-protected personal data). Some components may be open-sourced in the future.

| Name | Stack | Type | Description |
| --- | --- | --- | --- |
| model | Python, PyTorch, Pandas, Matplotlib | CLI tool | Pipeline for training a phoneme-level GPT model on Polish IPA with custom tokenizer and multithreaded postprocessing tools. |
| asr | Python, Whisper | CLI tool | CLI wrapper around OpenAI Whisper for batch automatic speech recognition with stereo-to-mono conversion. |
| datasets | Python, Pandas | CLI tool | Data processing scripts for downloading, cleaning, and phonemizing OpenSubtitles, OSCAR, and Wikipedia corpora into IPA. |
| survey | Python, Pandas | CLI tool | CI-based tool for cleaning and standardizing Microsoft Forms exports with translation and data validation. |
| transcriptions | Python | CLI tool | CI-based repository scanner that verifies audio transcription structure and generates verification status reports. |
| wordfreq-stress | Python, Pandas | CLI tool | Data processing scripts for Polish word frequency statistics and X-SAMPA syllabification with cleaning and alignment. |
| prodis-opus19.github.io | HTML, CSS, JavaScript | Website | Main website for the PRODIS project, with experiment subpage for collecting data. |
| fattura | C++17, SFML2 | Desktop app | Cross-platform GUI app for editing verification statuses with auto-saving and keyboard navigation. |

Personal Projects

I believe in the saying “if you want to understand how something works, build it yourself.” My personal projects thus tackle challenging programming problems from the ground up. A notable example is vroom, a 2D game engine built from scratch.

As a developer, I also build tools to eliminate recurring problems that slow down my workflow. One such project is header-warden, a static C++ dependency checker that’s now part of my C++ workflow.

| Name | Stack | Type | Description |
| --- | --- | --- | --- |
| vroom | C++20, SFML3, ImGui | Game Engine | Cross-platform 2D racing game with arcade drift physics, procedurally generated tracks, and waypoint AI. |
| header-warden | C++17 | CLI tool | Cross-platform multithreaded CLI tool that identifies and reports missing standard library headers in C++ code. |
| aegyo | C++17, SFML3 | Desktop app | Cross-platform GUI app for learning Korean Hangul with mouse and keyboard input. |
| yt-table | C++17 | CLI tool | Cross-platform CLI tool for managing YouTube subscriptions locally through a shell-like interface. |
| asset-packer | C17 | CLI tool | *nix CLI tool for embedding assets (e.g., images, sounds, fonts) into C++ headers. |
| applefetch | C++17 | CLI tool | macOS CLI system information tool, inspired by neofetch. |
| py-template | Python, Poetry | Template | Minimal Python project template (poetry, pytest, CI). |
| google-usa-search | JavaScript | Browser extension | Firefox extension that forces Google to display search results in American English. |