Data-Science-Portfolio
Some of my Data Science Projects
Project 1: Deep Learning for Lending Club Loan Data Analysis
- DESCRIPTION: Create a model that predicts whether or not a loan will default, using historical data.
- Problem Statement: For companies like Lending Club, correctly predicting whether or not a loan will default is very important. In this project, using historical data from 2007 to 2015, you have to build a deep learning model that predicts the chance of default for future loans. As you will see later, this dataset is highly imbalanced and includes a large number of features, both of which make the problem more challenging.
- Domain: Finance
- Analysis to be done: Perform data preprocessing and build a deep learning prediction model.
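Because the loan data is highly imbalanced, one common option is to weight the rare class more heavily during training. The sketch below is an illustration only, not necessarily what this project does: it computes "balanced" class weights as `n_samples / (n_classes * class_count)`. The function name `balanced_class_weights` and the toy labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight}, with weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels (not taken from the real dataset): 1 = default, 0 = no default.
labels = [0] * 90 + [1] * 10
weights = balanced_class_weights(labels)

# The rare "default" class gets a much larger weight than the common class.
print({cls: round(w, 3) for cls, w in weights.items()})  # {0: 0.556, 1: 5.0}
```

A dictionary like this can typically be handed to a training routine that accepts per-class weights, for example the `class_weight` argument of Keras `Model.fit`.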
Project 2: Big data analysis with Pyspark
- Domain: Banking (market analysis)
- Background and Objective: Your client, a Portuguese banking institution, ran a marketing campaign to convince potential customers to invest in a bank term deposit scheme. The campaign was based on phone calls. Often, the same customer was contacted more than once by phone to assess whether they would subscribe to the term deposit. You have to perform a marketing analysis of the data generated by this campaign.
Project 3: Using NLP and ML, make a model to identify hate speech (racist or sexist tweets) in Twitter.
- Problem Statement: Twitter is one of the biggest platforms where anybody and everybody can have their views heard. Some of these voices spread hate and negativity, and Twitter is wary of its platform being used as a medium to spread hate. You are a data scientist at Twitter, and you will help Twitter identify tweets containing hate speech so they can be removed from the platform. You will use NLP techniques, perform cleanup specific to tweet data, and build a robust model.
- Domain: Social Media
- Analysis to be done: Clean up the tweets with tweet-specific preprocessing and build a classification model using NLP techniques. Apply regularization, and tune hyperparameters with stratified k-fold cross-validation to select the best model.
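As an illustration of what tweet-specific cleanup can look like, here is a minimal sketch using only the standard library. The exact rules (dropping handles and URLs, keeping the hashtag word, removing non-letters) are assumptions made for this example, and `clean_tweet` is a hypothetical helper, not the project's actual code.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")  # links
HANDLE_RE = re.compile(r"@\w+")                # @user mentions
NON_ALPHA_RE = re.compile(r"[^a-z\s]")         # anything that is not a letter or whitespace


def clean_tweet(text):
    """Lowercase a tweet and strip URLs, user handles, '#' symbols and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = HANDLE_RE.sub(" ", text)
    text = text.replace("#", " ")      # keep the hashtag word, drop the symbol
    text = NON_ALPHA_RE.sub(" ", text)
    return " ".join(text.split())      # collapse repeated whitespace


example = "@user Check this out!! https://t.co/abc #MachineLearning is GREAT :)"
print(clean_tweet(example))  # check this out machinelearning is great
```

The cleaned text would then go through a vectorizer and a classifier. Stratified k-fold cross-validation matters here because each fold keeps roughly the same share of hate-speech tweets as the full dataset.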
Project 4: Anomaly Detection in Finance using Machine Learning and Deep Learning
- Problem Statement: The finance industry is one of the biggest employers of data scientists. It faces constant attacks by fraudsters who try to trick the system. Correctly identifying fraudulent transactions is often compared to finding a needle in a haystack because of the low event rate. It is important that credit card companies can recognize fraudulent credit card transactions so that customers are not charged for items they did not purchase.
- You are required to try various techniques, such as supervised models with oversampling, unsupervised anomaly detection, and heuristics, to achieve good fraud-detection accuracy.
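To make the oversampling idea concrete, here is a minimal sketch of random oversampling in plain Python: minority-class rows are duplicated at random until every class is as large as the biggest one. The helper `random_oversample` and the toy transactions are hypothetical. Treat this as one possible approach, not the project's actual pipeline.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes match the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    target = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical toy data: one feature (amount) per transaction, 1 = fraud.
rows = [[10.0], [12.5], [9.9], [11.2], [950.0]]
labels = [0, 0, 0, 0, 1]

new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))  # Counter({0: 4, 1: 4})
```

Oversampling is normally applied to the training split only. Duplicating fraud rows before the train/test split would leak copies of test examples into training.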
Project 5: Recommendation System for Netflix
- Problem Statement: The dataset provided contains movie reviews given by Amazon customers. Reviews were given between May 1996 and July 2014.
- Analysis Task: Perform exploratory data analysis, then build a recommendation model. Some of the movies have not been watched and are therefore not rated by the users. Netflix would like to take this as an opportunity to build a machine learning recommendation algorithm that predicts these missing ratings for each user. Build a recommendation model on the training data and make predictions on the test data.
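As a rough sketch of what "predicting missing ratings" can mean, the example below fits a simple baseline: global mean plus a per-movie offset plus a per-user offset. This is a common starting point rather than the model used in this project. The names `fit_baseline` and `predict`, the toy ratings, and the 1 to 5 rating scale are all assumptions made for illustration.

```python
from collections import defaultdict


def fit_baseline(ratings):
    """ratings: list of (user, movie, rating). Returns (global_mean, user_bias, movie_bias)."""
    global_mean = sum(r for _, _, r in ratings) / len(ratings)

    # Movie offset: how far a movie's ratings sit from the global mean, on average.
    movie_dev = defaultdict(list)
    for _, movie, r in ratings:
        movie_dev[movie].append(r - global_mean)
    movie_bias = {m: sum(v) / len(v) for m, v in movie_dev.items()}

    # User offset: what is left after removing the global mean and the movie offset.
    user_dev = defaultdict(list)
    for user, movie, r in ratings:
        user_dev[user].append(r - global_mean - movie_bias[movie])
    user_bias = {u: sum(v) / len(v) for u, v in user_dev.items()}

    return global_mean, user_bias, movie_bias


def predict(model, user, movie, lo=1.0, hi=5.0):
    """Predict a rating, clipped to the assumed [lo, hi] scale; unseen ids get offset 0."""
    global_mean, user_bias, movie_bias = model
    score = global_mean + user_bias.get(user, 0.0) + movie_bias.get(movie, 0.0)
    return min(hi, max(lo, score))


# Hypothetical training ratings (user, movie, rating).
train = [("u1", "m1", 5), ("u1", "m2", 4), ("u2", "m1", 4), ("u2", "m3", 2), ("u3", "m2", 3)]
model = fit_baseline(train)

# u3 has not rated m1, so the model fills in an estimate.
print(round(predict(model, "u3", "m1"), 2))
```

A baseline like this is useful as a reference point: a collaborative-filtering or matrix-factorization model should beat it on the test data to justify the extra complexity.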