Tuesday, March 24, 2020

Credit Score using Machine Learning

The goal is to use machine learning to create a credit score for customers. This score gives the degree of confidence that the customer will meet the agreed payments. The higher the score, the greater the probability of non-payment.

Multiple Linear Regression in Python with Scikit-Learn


Simple linear regression involves just two variables. Almost all the real-world problems you are going to encounter will involve more than two.



Linear regression involving multiple variables is called “multiple linear regression” (sometimes loosely called multivariate linear regression). The steps to perform multiple linear regression are almost the same as for simple linear regression.
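To make the idea concrete, here is a minimal sketch of multiple linear regression with Scikit-Learn. The data is synthetic, and the two feature names and the weights used to build the target are assumptions chosen only for illustration; they are not the values used in this project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: two hypothetical features per customer.
rng = np.random.default_rng(0)
avg_days_late = rng.uniform(0, 60, size=200)
avg_billing = rng.uniform(1_000, 50_000, size=200)
X = np.column_stack([avg_days_late, avg_billing])

# Made-up, noise-free linear target, used only to show that the fit
# recovers the weights of both variables at once.
y = 2.0 * avg_days_late - 0.001 * avg_billing + 100.0

model = LinearRegression()
model.fit(X, y)

print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
```

The only difference from simple linear regression is that X has more than one column; the model learns one coefficient per column plus an intercept.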

We will use customer information to generate a ‘trust’ score for the customer. The scoring formula can be adapted by each company to its own credit context. In this example, we use the customer's average number of days late and the average billing amount over the past 2 years to calculate a score that combines the two pieces of information.

After calculating the score, we feed the data to a machine learning model built with Scikit-Learn, so that the system can predict new scores based on what it has learned.

Our formula for the score calculation is described in Score calculation.xlsx.

Customer information is in the Excel file Customers_CODE.XLSX.


Customer company information:

  1. Customer since (date)
  2. State, region, postcode, salesman, main CNAE (Brazil's national classification of economic activities)
  3. Date of highest billing
  4. Maximum billing amount
  5. Date of last invoice issued
  6. Date of largest credit exposure
  7. Highest credit exposure
  8. Average historical delay
  9. Average revenue over the last 48 months
  10. Amount payable overdue
  11. Amount payable due
  12. Date of the customer's last order
  13. Date of this information

SERASA Information

(Serasa is a company that sells information about other companies.)
  1. Serasa score
  2. Probability of non-payment
  3. Date of last non-payment
  4. Amount of unpaid documents
  5. Value of unpaid documents
  6. Date of last bad check
  7. Amount of bad checks
  8. Date of last protest
  9. Value of protests
  10. Date of last judicial action
  11. Value of judicial actions
  12. Date of last overdue debt
  13. Value of overdue debts

I wrote some Python code to read and clean the data in my data set.
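A cleaning step of this kind could look like the sketch below. The column names (customer_since, avg_days_late, avg_billing) are hypothetical stand-ins for the spreadsheet's real headers, and the small in-memory table replaces the Excel file so the example can run on its own.

```python
import pandas as pd

def clean_customers(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy: parsed dates, numeric amounts, no unusable rows."""
    out = df.copy()
    # Dates that cannot be parsed become NaT instead of raising an error.
    out["customer_since"] = pd.to_datetime(out["customer_since"], errors="coerce")
    # Values that are not numbers become NaN.
    for col in ["avg_days_late", "avg_billing"]:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Drop rows missing either of the two fields the score depends on.
    out = out.dropna(subset=["avg_days_late", "avg_billing"])
    return out.reset_index(drop=True)

# In practice the frame would come from the spreadsheet, for example:
# raw = pd.read_excel("Customers_CODE.XLSX")
raw = pd.DataFrame({
    "customer_since": ["2015-03-01", "not a date", "2018-07-15"],
    "avg_days_late": [12, "n/a", 3.5],
    "avg_billing": ["15000", 22000, None],
})
clean = clean_customers(raw)
print(clean)
```

Coercing bad values to missing and then dropping incomplete rows is only one possible policy; filling gaps with a default may suit other data sets better.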

Next, a routine reads all the records and calculates my score (newscore) for each one, using my definitions. I keep the definitions for calculating the score in an Excel file.
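The real definitions live in the Excel file and are not reproduced here. Purely as an illustration of such a routine, the sketch below combines the two inputs with invented weights and caps; every constant, the column names, and the 0 to 1000 scale are assumptions, not the formula used in this project.

```python
import pandas as pd

# Hypothetical caps and weights; the actual values are defined in the spreadsheet.
MAX_DAYS_LATE = 90.0
MAX_BILLING = 100_000.0
WEIGHT_DELAY = 0.7
WEIGHT_BILLING = 0.3

def compute_newscore(df: pd.DataFrame) -> pd.DataFrame:
    """Add a 0-1000 'newscore' column; in this sketch a higher value means higher risk."""
    # Share of the maximum delay, capped to the range 0..1.
    delay_part = df["avg_days_late"].clip(0, MAX_DAYS_LATE) / MAX_DAYS_LATE
    # Assumption: a larger average billing lowers the risk component.
    billing_part = 1.0 - df["avg_billing"].clip(0, MAX_BILLING) / MAX_BILLING
    out = df.copy()
    combined = WEIGHT_DELAY * delay_part + WEIGHT_BILLING * billing_part
    out["newscore"] = (1000 * combined).round(1)
    return out

customers = pd.DataFrame({
    "avg_days_late": [0.0, 45.0, 120.0],
    "avg_billing": [100_000.0, 50_000.0, 0.0],
})
scored = compute_newscore(customers)
print(scored)
```

Working on whole columns avoids an explicit loop over records, and the caps keep one extreme customer from stretching the scale.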


A Keras model is then trained and tested. The data is split into train and test sets, 80% for training and 20% for testing. Plotting the train and test results side by side makes them easy to compare.
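The 80/20 split and the train-versus-test comparison can be sketched as follows. To keep the example short and runnable, it swaps in Scikit-Learn's LinearRegression in place of the Keras model, and it uses synthetic data with made-up coefficients as a stand-in for the real features and newscore.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 500
# Two hypothetical features: average days late and average billing.
X = np.column_stack([rng.uniform(0, 90, n), rng.uniform(0, 100_000, n)])
# Synthetic target with a little noise, standing in for the computed score.
y = 7.0 * X[:, 0] - 0.003 * X[:, 1] + 300.0 + rng.normal(0, 5.0, n)

# 80% of the rows for training, 20% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)

# Similar errors on both sets suggest the model is not overfitting.
mae_train = mean_absolute_error(y_train, model.predict(X_train))
mae_test = mean_absolute_error(y_test, model.predict(X_test))
print(f"train MAE: {mae_train:.2f}  test MAE: {mae_test:.2f}")
```

A test error much larger than the train error would be the signal to look for overfitting or for a split that is not representative.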


I need to predict individual records, so I wrote a Python function to predict the score.
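A function of that kind might look like the sketch below. The name predict_score, the feature order in FEATURES, and the tiny stand-in model fitted on invented numbers are all assumptions made so the example runs by itself; any fitted model with a predict method could be passed in instead.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical feature order; it must match the order used during training.
FEATURES = ["avg_days_late", "avg_billing"]

def predict_score(model, record: dict) -> float:
    """Predict the score for one customer given as a dict of feature values."""
    # Models expect a 2-D array: one row per record, one column per feature.
    row = np.array([[float(record[name]) for name in FEATURES]])
    return float(model.predict(row)[0])

# Tiny stand-in model fitted on made-up numbers so the function can be run.
X = np.array(
    [[0, 100_000], [45, 50_000], [90, 0], [30, 20_000]], dtype=float
)
y = 10.0 * X[:, 0] + 0.001 * X[:, 1]
model = LinearRegression().fit(X, y)

score = predict_score(model, {"avg_days_late": 20, "avg_billing": 30_000})
print(f"predicted score: {score:.1f}")
```

Passing the record as a dict keeps the call readable and makes a missing field fail early with a KeyError rather than silently shifting columns.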


Now to test it, I run the function to predict one of my records.

With the tkinter Python library, we can build a screen with a better visual presentation.




Conclusion:

Creating a score from the information known about a customer can automate a credit system and make it more reliable. This approach contributes to a more dispassionate and more reliable risk analysis, based only on market data and information, and removes from the system criteria such as closeness to the client and the emotions that a negotiation can generate.


