This page showcases my work as a machine learning engineer and software developer, spanning data science, automation pipelines, cross-platform applications, and developer tooling.
Professional Work
My professional work centers on machine learning research and data processing pipelines for linguistic analysis. During my 3-year paid internship on the PRODIS project, I built the complete ML and data processing infrastructure, including a first-of-its-kind phoneme-level GPT model for Polish.
The majority of this code is not publicly available due to legal and privacy constraints (e.g., GDPR-protected personal data). Some components may be open-sourced in the future.
Name | Stack | Type | Description |
---|---|---|---|
model | Python, PyTorch, Pandas, Matplotlib | CLI tool | Pipeline for training a phoneme-level GPT model on Polish IPA with custom tokenizer and multithreaded postprocessing tools. |
asr | Python, Whisper | CLI tool | CLI wrapper around OpenAI Whisper for batch automatic speech recognition with stereo-to-mono conversion. |
datasets | Python, Pandas | CLI tool | Data processing scripts for downloading, cleaning, and phonemizing OpenSubtitles, OSCAR, and Wikipedia corpora into IPA. |
survey | Python, Pandas | CLI tool | CI-based tool for cleaning and standardizing Microsoft Forms exports with translation and data validation. |
transcriptions | Python | CLI tool | CI-based repository scanner that verifies audio transcription structure and generates verification status reports. |
wordfreq-stress | Python, Pandas | CLI tool | Data processing scripts for Polish word frequency statistics and X-SAMPA syllabification with cleaning and alignment. |
prodis-opus19.github.io | HTML, CSS, JavaScript | Website | Main website for the PRODIS project, with experiment subpage for collecting data. |
fattura | C++17, SFML2 | Desktop app | Cross-platform GUI app for editing verification statuses with auto-saving and keyboard navigation. |
Personal Projects
I believe in the saying “if you want to understand how something works, build it yourself.” My personal projects thus tackle challenging programming problems from the ground up. A notable example is vroom, a 2D game engine built from scratch.
As a developer, I also build tools to eliminate recurring problems that slow down my workflow. One such project is header-warden, a static C++ dependency checker that’s now part of my C++ workflow.
Name | Stack | Type | Description |
---|---|---|---|
vroom | C++20, SFML3, ImGui | Game engine | Cross-platform 2D racing game with arcade drift physics, procedurally generated tracks, and waypoint AI. |
header-warden | C++17 | CLI tool | Cross-platform multithreaded CLI tool that identifies and reports missing standard library headers in C++ code. |
aegyo | C++17, SFML3 | Desktop app | Cross-platform GUI app for learning Korean Hangul with mouse and keyboard input. |
yt-table | C++17 | CLI tool | Cross-platform CLI tool for managing YouTube subscriptions locally through a shell-like interface. |
asset-packer | C17 | CLI tool | *nix CLI tool for embedding assets (e.g., images, sounds, fonts) into C++ headers. |
applefetch | C++17 | CLI tool | macOS CLI system information tool, inspired by neofetch. |
py-template | Python, Poetry | Template | Minimal Python project template (Poetry, pytest, CI). |
google-usa-search | JavaScript | Browser extension | Firefox extension that forces Google to display search results in American English. |