Project - Analyze Loan Listing Data



Prosper Loan Dataset Exploration

This project explores loan data from Prosper using plots of one, two and more variables, and produces a short presentation to illustrate interesting properties, trends and relationships discovered in the dataset. It is part of the deliverables for my Data Analyst Nanodegree with Udacity.

Dataset

The data consists of 6,123 loan listings created between 1 July 2008 and 31 December 2009 on Prosper, an online peer-to-peer lending business. The data has 16 features (excluding two ID-type variables for borrower ID and listing key). Most of them are quantitative, but there are a few categorical ones as well.

A data feature dictionary describing the variables is available here.

The dataset for exploration can be downloaded here, and the clean dataset for explanatory purpose can be downloaded here.

Summary of Findings

In the exploration, I focused on 6,123 listings created between 1 July 2008 and 31 December 2009, and examined selected borrower and loan attributes, together with time series events, to see their effects on loan listing distributions.

In univariate exploration, I found:

  • the distributions of listing categories, borrower professions and states reveal that the majority of listings were for debt consolidation, that most borrowers held professional jobs, and that residents of CA had the highest number of listings.

  • the distribution of listing amount shows that the majority of loan listings were below $3,000 in value, and the employment status distribution shows that a combined 97.02% of borrowers were in full-time, self-employed or part-time employment. Yet the share of listings in charged-off and defaulted statuses is significant, at 26.14% combined.
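Combined percentages like the ones above can be derived from normalized value counts. The following is a minimal sketch on made-up data; the status labels and the helper `category_shares` are assumptions for illustration, so the numbers it prints are not the project's figures.

```python
import pandas as pd

def category_shares(series):
    """Percentage share of each category, largest first, rounded to 2 decimals."""
    return (series.value_counts(normalize=True) * 100).round(2)

# Made-up employment statuses (10 borrowers); labels are assumptions.
employment = pd.Series(
    ["Full-time"] * 6 + ["Self-employed"] * 2 + ["Part-time"] + ["Not employed"]
)

shares = category_shares(employment)

# Combined share of borrowers in any form of employment.
employed = shares[["Full-time", "Self-employed", "Part-time"]].sum()

print(shares)
print(f"Combined employed share: {employed:.2f}%")
```

The same helper could be applied to a listing status column to obtain a combined charged-off and defaulted share.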

In bivariate exploration, I observed that the listing amount distribution is concentrated at the lower end of loan values, well below $10,000. This is consistent with my finding in univariate exploration. I also observed that adding a second feature to a previously explored feature can change the distribution trend, as seen in the two scenarios where Occupation and Credit Ranking were added in turn to the listing distribution of the top five borrower states. In both cases, the listing distributions differ from their corresponding distributions in univariate exploration.
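One way to see how a second feature reshapes a single-feature distribution is a cross-tabulation restricted to the most frequent states. This is only a sketch under assumptions: the toy rows, the column names `BorrowerState` and `CreditRanking`, and the choice of the top two states (rather than five, to keep the toy data small) are illustrative.

```python
import pandas as pd

# Toy data; rows and column names are assumptions.
toy = pd.DataFrame({
    "BorrowerState": ["CA", "CA", "CA", "TX", "TX", "NY"],
    "CreditRanking": ["A", "B", "B", "A", "C", "B"],
})

# Univariate view: states ranked by number of listings.
top_states = toy["BorrowerState"].value_counts().head(2).index.tolist()

# Bivariate view: split each top state's listings by credit ranking.
top = toy[toy["BorrowerState"].isin(top_states)]
counts = pd.crosstab(top["BorrowerState"], top["CreditRanking"])

# Row-normalized shares make states with different totals comparable.
row_shares = pd.crosstab(
    top["BorrowerState"], top["CreditRanking"], normalize="index"
)

print(counts)
print(row_shares.round(2))
```

A table like `counts` can be passed to a heat map or clustered bar chart to visualize the same comparison.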

In multivariate exploration, I found that adding a related new feature to a bivariate distribution resulted in the related features strengthening each other. For example:

  • adding a third feature, Borrower APR, to the bivariate distribution of listing amount and credit ranking makes the varying concentrations of listing amounts at different APRs visible.

  • adding a fourth feature, Listing Category, to the multivariate distribution of listing amount and borrower APR by credit ranking shows how borrower APR, listing amount and credit ranking are segmented by listing category.

  • the three features Time, Principal borrowed and Delinquent amount enhance each other and yield a clear trend in principal borrowed and delinquent amount between 1 July 2008 and 31 December 2009.

  • the three features Listing amount, Credit ranking and Period enhance each other and enable comparison of the listing amount and credit ranking distributions across two different time periods.

  • the three features Listing amount, Listing category and Period enhance each other and add visibility to the listing amount and listing category distributions across two different time periods.
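The period comparisons in the last two points can be sketched with a pivot table that summarizes listing amount by credit ranking and period. Everything specific here is an assumption for illustration: the toy rows, the column names, the "before"/"after" labels, and in particular the split date of 1 July 2009, which is hypothetical and not the boundary used in the project.

```python
import pandas as pd

# Toy data; rows and column names are assumptions.
toy = pd.DataFrame({
    "ListingCreationDate": pd.to_datetime([
        "2008-08-01", "2008-11-15", "2009-02-10",
        "2009-09-05", "2009-10-20", "2009-12-01",
    ]),
    "CreditRanking": ["A", "B", "A", "A", "B", "B"],
    "ListingAmount": [3000, 1500, 5000, 4000, 2000, 2500],
})

# Hypothetical boundary between the two periods being compared.
split = pd.Timestamp("2009-07-01")
toy["Period"] = (toy["ListingCreationDate"] < split).map(
    {True: "before", False: "after"}
)

# Third feature (Period) becomes the columns; each cell is a median amount.
pivot = toy.pivot_table(
    index="CreditRanking",
    columns="Period",
    values="ListingAmount",
    aggfunc="median",
)

print(pivot)
```

Swapping `CreditRanking` for a listing category column would give the analogous comparison for the final point.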

Through univariate, bivariate and multivariate explorations, I was able to look at features pertaining to Prosper borrowers, loan listings and time periods in the dataset, study their influence on Prosper loan distributions, and see the effect of three historically significant events on Prosper loan listings between 1 July 2008 and 31 December 2009.

Key Insights for Presentation

For the presentation, I focused on showing the effects of selected borrower attributes, loan attributes and time series events on Prosper loan listing distributions.

I started by introducing univariate visuals of listing distributions by borrower profession, employment status, resident state, listing category, listing status and listing amount.

Next, I demonstrated how adding a second attribute to a univariate listing distribution can change that distribution.

Lastly, I created multivariate visuals by adding APR and time series attributes to bivariate listing distributions, showcasing the effects of these attributes on the resulting distributions.

Tool

Jupyter Notebook (Python)

Programming library

pandas, numpy, matplotlib, seaborn

Artifact

  • Communicate_Data_Findings-Prosper_Loan_Part1.ipynb, Communicate_Data_Findings-Prosper_Loan_Part1.html
  • Communicate_Data_Findings-Prosper_Loan_Part2.ipynb, Communicate_Data_Findings-Prosper_Loan_Part2.html
  • Communicate_Data_Findings-Prosper_Loan_Part2.slides.html (slide show)
  • readme.md
  • prosperLoanData_src.csv (for data wrangling use), prosperLoanData.csv (for exploration and explanatory use)

See code here.


Acknowledgement

Special thanks to the legendary Udacity Data Analyst Nanodegree mentor Myles Callan, whose knowledge is impressive and whose dedication to serving students is unparalleled. I have benefited and learned a great deal from his remarkable style of coaching.

Thank you, Myles, for the privilege of learning from you. You are simply wonderful!
