I’m a data engineer with a background in mechanical engineering and robotics.

Zanwar Faraj

About Me


Skills: Data engineering, Software development, Data cleaning, Database design, Version control, Machine learning, Time series forecasting, Data analysis, Data visualization, Statistics, Data mining, Natural language processing, Web development, Project documentation, Mechanical engineering


Tools: AWS, Azure, Python, SQL, Spark / PySpark, Airflow, MySQL, PostgreSQL, Git, GitHub, Java, C#, Scala, R, pandas, scikit-learn, NumPy, Matplotlib, Power BI, Tableau, Jupyter Notebook, Excel, Regular expressions, Cosmos, HTML, CSS, Bootstrap, SolidWorks

Education

M.S. in Data Science
The University of Texas at Austin

Aug. 2021 – Present

  • GPA: 4.00

B.S. in Mechanical Engineering
Columbia University in the City of New York

Sep. 2013 - May 2017

  • GPA: 4.05 departmental / 3.87 cumulative

Experience

Data Engineer
at Amazon

Oct. 2022 – Mar. 2023

  • Supported Amazon Flex Analytics by creating and maintaining datasets for analysis and machine learning use cases
  • Built and monitored scheduled data pipelines with AWS, PySpark, and Airflow to keep production databases up to date
  • Created Redshift tables using an ETL workflow and altered existing tables to satisfy evolving business requirements
  • Resolved urgent data availability and quality issues in production datasets in response to alarms and internal tickets

Data Engineer / Senior Consultant
at Microsoft / Design Laboratory

Aug. 2021 – Oct. 2022

  • Helped Microsoft’s MSAI Search Relevance team improve the quality of search results across Microsoft products
  • Processed large datasets of search queries with Python and Spark to extract insights and engineer ML features
  • Developed tooling and data pipelines on the Azure Machine Learning platform to automate engineering workflows
  • Created Power BI dashboards to visualize key metrics, enable exploratory data analysis, and guide engineering effort


Oct. 2018 – Apr. 2020

  • Established a sole proprietorship business providing in-person and online lessons in the Seattle area
  • Taught Python and math to middle and high school students via a personalized curriculum

Mechanical Engineer

May 2017 – Feb. 2018

  • Developed mechanical designs that improved the performance of ASML’s photolithography machines
  • Designed and conducted statistical experiments and analyzed experimental data to guide design decisions

at Columbia University Creative Machines Lab

Sep. 2015 – May 2017

  • Designed and manufactured Eva, a robotic face capable of making facial expressions and head movements
  • Eva is an open-source platform designed to facilitate artificial intelligence research in Python

Data Analyst Intern
at Microsoft

June 2013 – Aug. 2013

  • Improved Bing’s search results by using SQL, regex, and NLP to find and normalize business name synonyms
  • For example, querying “7 11” initially returned fewer results than “7-Eleven”; after my work, both queries returned identical results
  • Normalized 38,910 distinct businesses and propagated website information between 49,145 distinct businesses

Projects

Robotics / AI: Eva

  • Designed and manufactured Eva, an expressive robotic face, for the Columbia University Creative Machines Lab
  • Eva is an open-source platform designed to facilitate artificial intelligence research, particularly in the domain of human-robot interaction

Skills: Mechanical engineering, Software development, Project documentation

Tools: Python, SolidWorks

Database Implementation: Financial Data

  • Designed and implemented a MySQL database hosted on AWS to store data about US stocks
  • Created a Python script to clean and transfer financial data from Interactive Brokers into the MySQL database
  • Designed and implemented a custom version control workflow using Git and GitHub

Skills: Data engineering, Database design, Data cleaning, Software development, Version control, Project documentation

Tools: Python, SQL, MySQL, AWS, Regular expressions, Git, GitHub, Jupyter Notebook

Machine Learning: Hotel Bookings

  • Predicted room rates at two hotels in Portugal by training a random forest regressor and an elastic net model
  • Predicted whether a booking would be cancelled by training a random forest classifier and a logistic regression model
  • Performed customer segmentation with k-means clustering to identify customers for a possible rewards program

Skills: Machine learning, Exploratory data analysis, Data cleaning, Data visualization, Project documentation

Tools: Python, scikit-learn, pandas, Matplotlib, NumPy, Jupyter Notebook

Data Analysis: Cost of Living

  • Compared the cost of living in Seattle to other cities and the cost of living in the US to other countries
  • Determined the most uniformly priced goods and analyzed how the prices of goods are correlated with each other
  • Created data visualizations using pandas, Matplotlib, and Tableau to communicate findings

Skills: Data analysis, Data visualization, Statistics, Data cleaning, Project documentation

Tools: Python, pandas, Matplotlib, NumPy, Tableau, Jupyter Notebook

Data Mining: Bing Search Engine

  • Improved Bing's search results by finding and normalizing business name synonyms during my data analyst internship at Microsoft
  • Before my work, searching for “7 11” returned fewer results than “7-Eleven”. After my work, both queries returned the same number of results.
  • Normalized 38,910 distinct businesses and propagated website information between 49,145 distinct businesses

Skills: Data mining, Data analysis, Natural language processing, Project documentation

Tools: SQL, Regular expressions, Cosmos


Send me a message below and I'll reply via email as soon as possible.