I’m a data engineer with a background in mechanical engineering, robotics, software development, and teaching.

Zanwar Faraj

Data Science Portfolio

Data Mining: Bing Search Engine

  • Improved Bing's search results by finding and normalizing business name synonyms during my data analyst internship at Microsoft
  • Before my work, searching for “7 11” returned fewer results than “7-Eleven”. After my work, both queries returned the same number of results.
  • Normalized 38,910 distinct businesses and propagated website information between 49,145 distinct businesses

Skills: Data mining, Data analysis, Natural language processing, Project documentation

Tools: SQL, Regular expressions, Cosmos

Robotics / AI: Eva

  • Designed and manufactured Eva, a facially expressive robotic face, for the Columbia University Creative Machines Lab
  • Eva is an open-source platform designed to facilitate artificial intelligence research, particularly in the domain of human-robot interaction.

Skills: Mechanical engineering, Software development, Project documentation

Tools: Python, SolidWorks

Machine Learning: Hotel Bookings

  • Predicted the rates of two hotels in Portugal by training a random forest regressor and an Elastic Net model
  • Predicted whether a booking would be cancelled by training a random forest classifier and a logistic regression model
  • Performed customer segmentation with k-means clustering to identify customers for a possible rewards program

Skills: Machine learning, Exploratory data analysis, Data cleaning, Data visualization, Project documentation

Tools: Python, scikit-learn, pandas, Matplotlib, NumPy, Jupyter Notebook

Data Analysis: Cost of Living

  • Compared the cost of living in Seattle to other cities and the cost of living in the US to other countries
  • Determined the most uniformly priced goods and analyzed how the prices of goods are correlated with each other
  • Created data visualizations using pandas, Matplotlib, and Tableau to communicate findings

Skills: Data analysis, Data visualization, Statistics, Data cleaning, Project documentation

Tools: Python, pandas, Matplotlib, NumPy, Tableau, Jupyter Notebook

Database Design: Financial Data

  • Designed and implemented a MySQL database hosted on AWS to store data about US stocks
  • Created a Python script to clean and transfer financial data from Interactive Brokers into the MySQL database
  • Designed and implemented a custom version control system using Git and GitHub

Skills: Database design, Data engineering, Data cleaning, Software development, Version control, Project documentation

Tools: Python, SQL, MySQL, AWS, Regular expressions, Git, GitHub, Jupyter Notebook

About Me


M.S. in Data Science
The University of Texas at Austin

Aug. 2021 – Present

  • GPA: 4.00

B.S. in Mechanical Engineering
Columbia University in the City of New York

Sep. 2013 – May 2017

  • GPA: 4.05 departmental / 3.87 cumulative


Data Engineer
at Amazon

Oct. 2022 – Present

Data Engineer - Senior Consultant
at Microsoft - Design Laboratory

Aug. 2021 – Oct. 2022

  • Helped Microsoft’s MSAI Search Relevance team improve the quality of search results across Microsoft products
  • Processed large datasets of search queries with Python and Spark to extract insights and engineer ML features
  • Developed tooling and data pipelines on the Azure Machine Learning platform to automate engineering workflows
  • Created Power BI dashboards to visualize key metrics, enable exploratory data analysis, and guide engineering effort


Oct. 2018 – Apr. 2020

  • Established a sole proprietorship business providing in-person and online lessons in the Seattle area
  • Taught Python and math to middle and high school students via a personalized curriculum

Mechanical Engineer

May 2017 – Feb. 2018

  • Developed mechanical designs that improved the performance of ASML’s photolithography machines
  • Designed and conducted statistical experiments and analyzed experimental data to guide design decisions

at Columbia University Creative Machines Lab

Sep. 2015 – May 2017

  • Designed and manufactured Eva, a robotic face capable of making facial expressions and head movements
  • Eva is an open-source platform designed to facilitate artificial intelligence research in Python

Data Analyst Intern
at Microsoft

June 2013 – Aug. 2013

  • Improved Bing’s search results by using SQL, regex, and NLP to find and normalize business name synonyms
  • Initially, querying “7 11” returned fewer results than “7-Eleven”. After my work, the results were identical.
  • Normalized 38,910 distinct businesses and propagated website information between 49,145 distinct businesses


Data engineering, Software development, Data visualization, Data cleaning, Database design, Machine learning, Time series forecasting, Data analysis, Statistics, Data mining, Natural language processing, Web development, Version control, Project documentation, Mechanical engineering


Python, pandas, scikit-learn, Matplotlib, NumPy, Spark / PySpark, Power BI, Tableau, Azure Machine Learning, SQL, MySQL, Java, C#, Scala, R, Git, GitHub, Azure DevOps, Jupyter Notebook, Excel, Regular expressions, Cosmos, AWS, HTML, CSS, Bootstrap, SolidWorks


I am no longer actively teaching, but if you need help, please let me know and I'll offer a free lesson when I'm available.



  • General-purpose programming
  • Data science


  • Statistics
  • Precalculus
  • Algebra


Online lessons

  • 30-minute minimum lesson length
  • via Skype, Zoom, or similar video conferencing software

In-person lessons

  • 60-minute minimum lesson length
  • Meet in student's home or at a library


Send me a message below and I'll reply via email as soon as possible.