Zheniya Mogilevski

    Grinds that pay off


    Selected learnings from the job
    • Simplifying the Data Stack with Ponder

      Running pandas directly on data warehouses

      June 29, 2023

      Not affiliated, but spreading the love, because being able to run pandas directly on data warehouses is no small thing. [Read More]
      Tags:
      • pandas
      • data engineering
      • dataops
      • workflow
      • ponder
    • Building Data Pipelines with Docker

      From introduction to Docker to a Python pipeline app

      April 29, 2023

      This blog post serves as a tutorial, aiming to provide an overview of how Docker Compose can be utilized with Python and Postgres to create a robust data ingestion pipeline app. Divided into three parts, this tutorial walks the reader through the fundamental concepts of Docker and Dockerfiles for container... [Read More]
      Tags:
      • docker
      • data engineering
      • dataops
      • workflow
      • tutorial
    • Data Scientist Joining CI/CD party, Part 2

      From prototype to production - a data scientist's journey in CI/CD

      April 15, 2023

      This blog post is the second in a series discussing the closely related notions of version control, dependency management, and continuous integration/continuous deployment practices. In the first part, I walked the reader through the basics of version control and dependency management. In the current post, I show how a continuous... [Read More]
      Tags:
      • devops
      • github actions
      • CI/CD
      • software development
      • workflows
      • version control
      • automated testing
    • Data Scientist Joining CI/CD party, Part 1

      From prototype to production - a data scientist's journey in CI/CD

      April 8, 2023

      This blog post is the first in a series discussing the closely related notions of version control, dependency management, and continuous integration/continuous deployment practices. In this series, I continue to draw on the benefits of introducing good software development practices in the data scientists’ workflow. In the first part, I... [Read More]
      Tags:
      • devops
      • github actions
      • CI/CD
      • software development
      • workflows
      • version control
      • automated testing
    • Speaking at Python Web Conf 2023

      Being lightning fast from idea to implementation

      March 14, 2023

      On March 14, 2023, I found myself giving my first public conference talk ever. At my first-ever Python conference! This post is to mark the event and to reflect on how quickly things fell into place. [Read More]
      Tags:
      • speaking
      • conferences
    • Boosting Primary Data Quality through Machine Learning Techniques

      Anomaly detection in batch process manufacturing data using an autoencoder

      December 17, 2022

      In this blog post, I demonstrate how I applied a convolutional reconstruction autoencoder to detect quality issues in sensor data. I also show that a small percentage of anomalous data points can result in a disproportionately large percentage of downstream calculations being inaccurate. [Read More]
      Tags:
      • sensor data
      • time series
      • anomaly detection
      • neural networks
      • autoencoder
      • primary data
      • data quality rating
      • product carbon footprint
      • pcf
      • ocf
    • Comparing Distribution Differences of Generated Vectors - A Cautionary Tale

      Examining the impact of different methods on generated data

      November 12, 2022

      This time, I implement a function that generates vectors with a truncated normal distribution. I compare two different versions of the function and examine the impact of the different methods used on the resulting distributions. [Read More]
      Tags:
      • generated data
      • vectors
      • distributions
      • data visualization
      • statistical testing
      • nonparametric test
      • unit testing
      • tdd
    • Implementing Label Filtering in a GitHub Workflow

      Python package automated publishing example

      October 15, 2022

      When building workflows while running multiple projects in one repository, it is crucial to filter PRs based on specific labels, given that each PR can receive labels from different projects. The documentation did not cover our case specifically; thus, it took some trial and error to figure out the proper... [Read More]
      Tags:
      • devops
      • github actions
      • package publishing
      • filtering
    • Applying OOP Principles to Manufacturing Analytics Testing Data Generation

      From separate methods to a unified class

      August 6, 2022

      In this blog post, I explore how to move away from procedure-based code and increase code usability by following object-oriented programming principles. To this end, I refactor the code from one of the previous blog posts, namely Generating Realistic Testing Data for Manufacturing Analytics Software. By streamlining the code and... [Read More]
      Tags:
      • generated data
      • sensor data
      • time series
      • software development
      • oop
      • data visualization
    • Generating Realistic Testing Data for Manufacturing Analytics Software

      Modeling real-life data imperfections

      July 16, 2022

      When developing software that includes data tools for analytics on batches in chemical manufacturing, realistic testing data is critical to ensure accurate and reliable results. In this blog, I exemplify how such testing data can be generated, starting with a basic example of... [Read More]
      Tags:
      • generated data
      • sensor data
      • time series
      • unit testing
      • tdd
      • data visualization
    • Making Sense of Chemical Manufacturers' Corporate Goals on Reducing Carbon Footprint

      Rethinking the role of process flow analysis in identifying carbon reduction opportunities in chemical manufacturing

      December 28, 2021

      The chemical manufacturing industry is a significant contributor to global greenhouse gas emissions, although estimates of its share of the total carbon footprint vary depending on the data source and the specific sector or region considered. For example, according to the Global Change Data Lab, the chemical industry accounted... [Read More]
      Tags:
      • industry

    Zheniya Mogilevski  •  2023
