This page showcases my work as a machine learning engineer and software developer, spanning data science, automation pipelines, cross-platform applications, and developer tooling.
Professional Work
My prior work focused on applied machine learning and large-scale linguistic data processing.
During a 3-year paid internship on the PRODIS project, I developed the project’s complete ML and data infrastructure, including the first phoneme-level GPT model for Polish.
Due to legal and privacy restrictions involving GDPR-protected data, most source code cannot be released publicly. Select components may be open-sourced later.
| Name | Stack | Type | Description |
|---|---|---|---|
| model | Python, PyTorch, Pandas, Matplotlib | CLI tool | Pipeline for training a phoneme-level GPT model on Polish IPA, with a custom tokenizer and multithreaded postprocessing tools. |
| asr | Python, Whisper | CLI tool | CLI wrapper around OpenAI Whisper for batch automatic speech recognition with stereo-to-mono conversion. |
| datasets | Python, Pandas | CLI tool | Data processing scripts for downloading, cleaning, and phonemizing OpenSubtitles, OSCAR, and Wikipedia corpora into IPA. |
| survey | Python, Pandas | CLI tool | CI-based tool for cleaning and standardizing Microsoft Forms exports with translation and data validation. |
| transcriptions | Python | CLI tool | CI-based repository scanner that verifies audio transcription structure and generates verification status reports. |
| wordfreq-stress | Python, Pandas | CLI tool | Data processing scripts for Polish word frequency statistics and X-SAMPA syllabification with cleaning and alignment. |
| prodis-opus19.github.io | HTML, CSS, JavaScript | Website | Main website for the PRODIS project, with an experiment subpage for collecting data. |
| fattura | C++17, SFML2 | Desktop app | Cross-platform GUI app for editing verification statuses with auto-saving and keyboard navigation. |
Personal Projects
My independent work focuses on low-level system design and automation.
A representative example is vroom, a 2D game engine developed from scratch. Other projects, such as header-warden, have become part of my regular C++ development workflow.
| Name | Stack | Type | Description |
|---|---|---|---|
| vroom | C++20, SFML3, ImGui | Game engine | Cross-platform 2D racing game with arcade drift physics, procedurally generated tracks, and waypoint AI. |
| header-warden | C++17 | CLI tool | Cross-platform multithreaded CLI tool that identifies and reports missing standard library headers in C++ code. |
| aegyo | C++17, SFML3 | Desktop app | Cross-platform GUI app for learning Korean Hangul with mouse and keyboard input. |
| ungpt | C++20, SFML3 | Desktop app | Cross-platform GUI app that converts ChatGPT’s smart punctuation and symbols to plain ASCII. |
| yt-table | C++17 | CLI tool | Cross-platform CLI tool for managing YouTube subscriptions locally through a shell-like interface. |
| asset-packer | C17 | CLI tool | Unix-like CLI tool for embedding assets (e.g., images, sounds, fonts) into C++ headers. |
| applefetch | C++17 | CLI tool | macOS CLI system information tool, inspired by neofetch. |
| py-template | Python, Poetry | Template | Minimal Python project template with Poetry, pytest, and CI. |
| google-usa-search | JavaScript | Browser extension | Firefox extension that forces Google to display search results in American English. |