Data Scientist / ML Engineer

Aleks Kapich

Data Scientist with 3 years of professional experience working with Python, ML models, and data pipelines. Skilled in developing and optimizing machine learning solutions, including deep learning models and data-driven applications. Currently pursuing an MSc in Data Science at Warsaw University of Technology. In my spare time, I apply data science to football analytics.

Core Capabilities

Deep Learning

  • Designing and training CNNs for vision tasks
  • Tuning transformer-based models for NLP and speech
  • Applying transfer learning and fine-tuning on domain-specific data
  • Evaluating models and performing error analysis

Classical Machine Learning

  • Performing feature engineering and preprocessing for tabular data
  • Selecting and optimizing models
  • Applying statistical modeling and validation strategies
  • Running experiments and benchmarking performance

ML Engineering

  • Building end-to-end ML pipelines
  • Tracking experiments and ensuring reproducibility
  • Developing containerized workflows (Docker)
  • Building interactive data and ML applications (Streamlit)

Projects

Satellite Image Change Detection

Built a U-Net model with a ResNet-34 encoder to detect changes between satellite image pairs from the LEVIR-CD dataset.

Handled severe class imbalance with a weighted loss, tracked experiments with MLflow, and packaged the model in a Docker container. The model reached a validation F1 of 0.81 and also proved useful on Google Earth imagery outside the training dataset.

  • PyTorch
  • Docker
  • MLflow
  • OpenCV

Speech Recognition with Transformers and CNNs

Compared a Transformer approach with a CNN approach (based on mel spectrograms) for audio classification on the TF Speech Recognition dataset.

Wav2Vec2 transformer embeddings with MLP/RNN heads achieved an F1 of 0.92 and 94.4% accuracy. Tackled severe class imbalance with a two-stage CNN pipeline, reaching an F1 of 0.83.

  • PyTorch
  • Hugging Face Transformers

Reinforcement Learning Agent for Doom Game

Developed RL agents using PPO and Advantage Actor-Critic in the OpenAI Gym environment.

Built a training pipeline with TensorBoard experiment tracking, model checkpointing, and hyperparameter optimization. Trained agents across multiple scenarios, achieving convergence through systematic reward shaping.

  • OpenAI Gym
  • Stable-Baselines3
  • OpenCV
  • TensorBoard

Image Classification using CNNs

Implemented and compared CNN architectures such as ResNet and VGG16 in PyTorch for 10-class image classification on the CINIC dataset.

Experimented with data augmentation, regularisation, hyperparameter optimisation, and ensemble methods, with a soft-voting ensemble achieving 81.6% accuracy. Also applied few-shot learning.

  • PyTorch
  • TorchVision
  • EasyFSL

Web App for Spatiotemporal Data Exploration

Streamlit-based app facilitating exploration of StatsBomb 360 contextual event data.

The tool can be used in two modes: choosing a particular moment of the match with a time slider, or selecting freeze frames for particular shots taken during the chosen match. It also features Voronoi diagrams for match events, visualizing pitch control.

  • Streamlit
  • StatsBomb
  • Pandas
  • Matplotlib

Clustermatic - simple AutoML library for clustering tasks

Library designed to accelerate clustering tasks using scikit-learn.

Serves as a quick tool for selecting the optimal clustering algorithm and its hyperparameters, providing visualizations and metrics for comparison, with easy HTML reporting.

  • Scikit-learn
  • Scikit-optimize
  • SciPy

Writing

Long-Form Articles

Mathematics Behind Predicting Football Results

End-to-end explanation of how the Poisson model, the Skellam distribution, and Elo ratings can be leveraged to predict football match outcomes, with a practical implementation in Python, a discussion of real-world performance, and data visualizations.

Open Article

Adapting Elo Ratings for Draw Possibility

Mathematical explanation of how to adapt Elo ratings to account for the possibility of draws in sports event predictions.

Open Article

Football Data-Driven Analysis: Case Study of 2023/2024 Cercle Brugge

Unveiling the peculiarities of the Belgian side's tactics and performance in the 2023/2024 season through data-driven analysis, with particular attention to their focal point, striker Kévin Denkey.

Open Article

Leveraging Mathematics for Football Scouting: Analysis of 2023/2024 Polish Ekstraklasa

Searching for Ibrahim Osman's replacement for Nordsjælland: data-driven scouting in the Polish Ekstraklasa 2023/24. The article walks through the whole process of identifying the best-suited players using data.

Open Article

Data Viz Tutorials

Football Match Momentum

Full walkthrough of calculating and plotting match momentum, a popular football metric, from event data, in a style resembling Opta visualizations.

Open Tutorial

Passing Sonars for Football

Modern, geometrical approach to showcasing passing behaviour for each player using football event data, with a full walkthrough of the Python code.

Open Tutorial

Career

Experience

3+ years

Commercial experience as a Data Scientist since 2023, working on statistical modeling & data pipelines.

Education

Data Science BSc + MSc (in progress)

BSc Engineering in Data Science @ WUT, Faculty of Mathematics and Information Science. Currently pursuing MSc in Data Science at the same faculty.

Contact

Feel free to reach out for any professional matters!