Work History
-
2019
Amazon Music
(March 2019 - Present) | San Francisco, California, United States
Software Engineer at Amazon Music on the Music Metrics team.
-
2017
Capital One
(July 2017 - March 2019) | San Francisco, California, United States
Data Engineer at Capital One on the Digital Experimentation team. Refactored core backend analysis logic to Spark running on AWS EMR, cutting processing time by 50% and reducing Redshift resource usage. Migrated data to the enterprise cloud data warehouse (Snowflake) and data lake, making it available to Capital One associates for business analysis. Designed and built an ML model pipeline to run 100+ A/B tests in parallel using distributed cloud computing on AWS EC2, Lambda, S3, and Redshift, reducing A/B testing time by 50%. Implemented APIs to dynamically allocate A/B test web traffic based on ML model output, minimizing the opportunity cost of testing. Refactored legacy ETL code to Spark running on AWS EMR, cutting processing time by 50% and increasing data traffic capacity by 30%. Deployed multiple large-scale ETL pipelines to ingest terabytes of data, producing more accurate ML models for A/B testing. Optimized the Redshift database, increasing storage capacity by 30% and cutting query processing time by 50%.
-
2017
Berkeley Institute for Data Science
(January 2017 - May 2017) | Berkeley, California, United States
Data Science Connector Assistant for History 100S: Text Analysis for Digital Humanists and Social Scientists. Developed lecture materials in IPython Notebook on NLP, text analysis, and machine learning techniques so that humanities students with no prior background could understand them and apply them in their final research papers. Designed coding exercises for students to practice and gain familiarity with text analysis in Python using libraries such as NumPy, SciPy, Matplotlib, scikit-learn, pandas, and NLTK.
-
2016
UC Berkeley Computer Vision Group
(March 2016 - May 2017) | Berkeley, California, United States
Research Assistant on computer vision research focused on analyzing and identifying the photo parameters that affect the realism of a captured photo compared with the natural scene. Rebuilt the iOS photo editing application in Objective-C for subjects to use during experiments, collecting raw photo data for research purposes. Implemented a high-dynamic-range (HDR) photo processing algorithm in Objective-C with the OpenCV library so that subjects could capture more accurate photos.
-
2016
Lockheed Martin Space Systems Company
Data Scientist Intern
(May 2016 - August 2016) | Sunnyvale, California, United States
Data Scientist Intern supporting the Market Analysis Team within Competitive Intelligence in analyzing the space industry. Designed a sentiment analysis tool (Java) using NLP to track trends among LM’s major competitors from over 3,000 articles scraped from space-industry news websites, enabling quick market analysis. Developed a predictive analytics tool (R) that predicts project costs using regression trees trained on an existing LM database of over 15,000 records of previous space projects, to assist with strategic business planning. Generated statistical reports of estimated project costs from the predictive analytics tool's results for the team to use in market analysis and strategic planning for future space projects.
-
2014
UC Berkeley Physics
(August 2014 - May 2016) | Berkeley, California, United States
Online Database of Extragalactic Transient Theoretical Astrophysics (ODETTA) is an open-source astrophysics website and database that uses supernova data gathered from the Lawrence Berkeley National Laboratory to run computer simulations on the NERSC supercomputers and improve our knowledge of supernovae. This research applies machine learning techniques to help analyze and predict the categories of supernova data and its features. Implemented the k-means clustering algorithm in Python for supernova classification on the backend server of the ODETTA research website, for use by astrophysicists in the laboratory. Applied Principal Component Analysis (PCA) in Python, using the NumPy, SciPy, Matplotlib, and scikit-learn libraries, to extract key features from over 1,000,000 data points and build predictive models for classification. Created a data processing pipeline to process all observed and computer-simulated data on the backend server.
Education History
-
2013
University of California, Berkeley College of Engineering
Bachelor of Science, Engineering Mathematics and Statistics with emphasis in Computer Science
(2013 - 2017) | Berkeley, California, United States
Majored in engineering mathematics and statistics with an emphasis in computer science. The major covers both theoretical and applied courses in mathematics, statistics, and computer science. Relevant coursework includes Machine Learning, Database, Artificial Intelligence, Efficient Algorithms and Intractable Problems, Data Structures, Computational Photography, Concepts in Probability, Concepts in Statistics, Concepts in Data Computing, Linear Algebra, Numerical Analysis, Real Analysis, Complex Analysis, and Optimization.
Programming Skills
- Python
- SQL
- Java
Language Skills
- Native English
- Native Chinese