I’m a data engineer with a background in mechanical engineering, robotics, software development, and teaching.

Zanwar Faraj


Data Science Portfolio

Data Mining: Bing Search Engine

  • Improved Bing's search results by finding and normalizing business name synonyms during my data analyst internship at Microsoft
  • Before my work, searching for “7 11” returned fewer results than “7-Eleven”. After my work, both queries returned the same number of results.
  • Normalized 38,910 distinct businesses and propagated website information across 49,145 distinct businesses
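Here is one way to normalize business name synonyms, as a minimal Python sketch. It assumes a simple rule-based approach: lowercase the name, treat punctuation as a separator, and map number words to digits, so that “7 11”, “7-Eleven”, and “Seven Eleven” share one key. The `NUMBER_WORDS` table, the helper names, and the example URL in the test are hypothetical. The actual project used SQL, regular expressions, and Cosmos DB, not this code.

```python
import re
from collections import defaultdict

# Hypothetical rule: spell number words as digits so "Seven Eleven" matches "7 11".
# A real synonym table would be far larger; these two entries are illustrative only.
NUMBER_WORDS = {"seven": "7", "eleven": "11"}


def normalize_business_name(name):
    """Reduce a name to a canonical key: lowercase, punctuation as spaces, digits for number words."""
    text = re.sub(r"[^a-z0-9]+", " ", name.lower())
    return " ".join(NUMBER_WORDS.get(token, token) for token in text.split())


def group_synonyms(names):
    """Group raw business names that share the same canonical key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_business_name(name)].append(name)
    return dict(groups)


def propagate_websites(records):
    """Fill a missing 'website' field from any synonym of the same business that has one."""
    known = {}
    for record in records:
        if record.get("website"):
            known.setdefault(normalize_business_name(record["name"]), record["website"])
    return [
        {**record, "website": record.get("website") or known.get(normalize_business_name(record["name"]))}
        for record in records
    ]


print(group_synonyms(["7-Eleven", "7 11", "Seven Eleven", "Starbucks"]))
# {'7 11': ['7-Eleven', '7 11', 'Seven Eleven'], 'starbucks': ['Starbucks']}
```

Grouping by a canonical key is also what makes propagation possible. Once two records are known to be the same business, a website known for one can be copied to the other, as `propagate_websites` does.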

Skills: Data mining, Data analysis, Natural language processing, Project documentation

Tools: SQL, Regular expressions, Cosmos DB

Robotics / AI: Eva

  • Designed and manufactured Eva, an expressive robotic face, for the Columbia University Creative Machines Lab
  • Eva is an open-source platform designed to facilitate artificial intelligence research, particularly in human-robot interaction.

Skills: Mechanical engineering, Software development, Project documentation

Tools: Python, SolidWorks

Machine Learning: Hotel Bookings

  • Predicted room rates at two hotels in Portugal by training a random forest regressor and an Elastic Net model
  • Predicted whether a booking would be cancelled by training a random forest classifier and a logistic regression model
  • Performed customer segmentation with k-means clustering to identify customers for a possible rewards program
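The project itself used scikit-learn. To show what k-means does under the hood, here is a from-scratch sketch of Lloyd's algorithm that uses only the Python standard library. The six customer points are made up and assumed to be already scaled. They are not taken from the hotel dataset, and the function is an illustration, not the project's code.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]  # random initial centroids

    def nearest(point):
        # Index of the centroid closest to this point (Euclidean distance).
        return min(range(k), key=lambda i: math.dist(point, centroids[i]))

    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            clusters[nearest(point)].append(point)
        # Move each centroid to the mean of its cluster; leave it in place if the cluster is empty.
        new_centroids = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments can no longer change
            break
        centroids = new_centroids

    return centroids, [nearest(point) for point in points]


# Made-up customers described by two already-scaled features each.
customers = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(customers, k=2)
```

With scikit-learn, the equivalent is roughly `KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)`. Features are usually standardized first, for example with `StandardScaler`, because k-means relies on Euclidean distance.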

Skills: Machine learning, Exploratory data analysis, Data cleaning, Data visualization, Project documentation

Tools: Python, scikit-learn, pandas, Matplotlib, NumPy, Jupyter Notebook

Data Analysis: Cost of Living

  • Compared the cost of living in Seattle with that of other cities, and the cost of living in the US with that of other countries
  • Determined the most uniformly priced goods and analyzed how the prices of different goods correlate with one another
  • Created data visualizations using pandas, Matplotlib, and Tableau to communicate findings
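The sketch below shows one way to measure "most uniformly priced" and price correlation. It assumes uniformity is measured by the coefficient of variation (standard deviation divided by the mean) across cities; the original analysis may have used a different measure. Correlation is Pearson's r between two goods' prices over the same cities. The prices are made up, and the standard library stands in for pandas so the example is self-contained.

```python
import math
from statistics import mean, pstdev

# Made-up prices for three goods across the same three cities (illustrative only).
PRICES = {
    "milk": [1.00, 1.10, 0.90],
    "bread": [2.00, 2.40, 1.60],
    "rent": [1000.0, 2500.0, 600.0],
}


def coefficient_of_variation(values):
    """Standard deviation relative to the mean; lower means more uniformly priced."""
    return pstdev(values) / mean(values)


def pearson(x, y):
    """Pearson correlation between two goods' prices across the same cities."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def rank_by_uniformity(prices):
    """Goods sorted from most to least uniformly priced."""
    return sorted(prices, key=lambda good: coefficient_of_variation(prices[good]))


print(rank_by_uniformity(PRICES))  # ['milk', 'bread', 'rent']
```

In pandas, with cities as rows and goods as columns, the same quantities come from `df.std(ddof=0) / df.mean()` and `df.corr()`.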

Skills: Data analysis, Data visualization, Statistics, Data cleaning, Project documentation

Tools: Python, pandas, Matplotlib, NumPy, Tableau, Jupyter Notebook

Database Design: Financial Data

  • Designed and implemented a MySQL database hosted on AWS to store data about US stocks
  • Created a Python script to clean and transfer financial data from Interactive Brokers into the MySQL database
  • Designed and implemented a custom version control workflow using Git and GitHub

Skills: Database design, Data engineering, Data cleaning, Software development, Version control, Project documentation

Tools: Python, SQL, MySQL, AWS, Regular expressions, Git, GitHub, Jupyter Notebook


About Me

Education

M.S. in Data Science
The University of Texas at Austin

Aug. 2021 – Present

B.S. in Mechanical Engineering
Columbia University in the City of New York

Sep. 2013 – May 2017

  • Departmental GPA: 4.05
  • Cumulative GPA: 3.87

Experience

Senior Consultant - Data Engineer
at Design Laboratory - Microsoft

Aug. 2021 – Present

  • Working in a data engineer role with Microsoft as a Design Laboratory consultant

Teaching

Oct. 2018 – Apr. 2020

  • Established a sole proprietorship providing in-person and online lessons in the Seattle area
  • Taught Python and math to middle and high school students via a personalized curriculum

Mechanical Engineer
at ASML

May 2017 – Feb. 2018

  • Developed mechanical designs that improved the performance of ASML’s photolithography machines
  • Designed and conducted statistical experiments and analyzed experimental data to guide design decisions

Researcher
at Columbia University Creative Machines Lab

Sep. 2015 – May 2017

  • Designed and manufactured Eva, a robotic face capable of making facial expressions and head movements
  • Eva is an open-source platform designed to facilitate Python-based artificial intelligence research

Data Analyst Intern
at Microsoft

June 2013 – Aug. 2013

  • Improved Bing’s search results by using SQL, regex, and NLP to find and normalize business name synonyms
  • Initially, querying “7 11” returned fewer results than “7-Eleven”. After my work, the results were identical.
  • Normalized 38,910 distinct businesses and propagated website information across 49,145 distinct businesses

Skills

Data engineering, Software development, Data cleaning, Database design, Machine learning, Time series forecasting, Data analysis, Data visualization, Statistics, Data mining, Natural language processing, Web development, Version control, Project documentation, Mechanical engineering

Tools

Python, pandas, scikit-learn, Matplotlib, NumPy, SQL, MySQL, Java, C#, R, Jupyter Notebook, Tableau, Excel, Regular expressions, Git, GitHub, Cosmos DB, AWS, HTML, CSS, Bootstrap, SolidWorks


Teaching

I am no longer actively teaching, but if you need help, please let me know and I'll offer a free lesson when I'm available.

Subjects

Python

  • General-purpose programming
  • Data science

Math

  • Statistics
  • Precalculus
  • Algebra

Policies

Online lessons

  • 30-minute minimum lesson length
  • Via Skype, Zoom, or similar video conferencing software

In-person lessons

  • 60-minute minimum lesson length
  • Meet at the student's home or at a library

I am not currently offering in-person lessons due to COVID-19.


Contact

Send me a message below and I'll reply via email as soon as possible.