The goal is to use machine learning to create a credit score for customers. This score gives the degree of confidence that the customer will meet the agreed payments. The higher the score, the greater the probability of non-payment.
Multiple Linear Regression in Python with Scikit-Learn
In the section above, we performed linear regression involving two variables. Almost all the real-world problems that you are going to encounter will have more than two variables.
Linear regression involving multiple variables is called “multiple linear regression” (it is sometimes loosely called multivariate linear regression, although strictly that term refers to models with more than one dependent variable). The steps to perform multiple linear regression are very similar to those for simple linear regression.
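As a minimal sketch of what multiple linear regression looks like in Scikit-Learn, the example below fits a model on two features. The data is synthetic and the two feature meanings (average days late, average billing amount) are only illustrative; the target is built from a known linear rule so the fitted coefficients can be checked.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic rows, one per customer: [average days late, average billing amount].
# These numbers are made up for illustration; they are not real customer data.
X = np.array([
    [0.0, 1000.0],
    [5.0, 2000.0],
    [10.0, 1500.0],
    [20.0, 500.0],
    [30.0, 3000.0],
])

# Target generated from a known linear rule, so the fit is easy to verify.
y = 2.0 * X[:, 0] + 0.01 * X[:, 1] + 5.0

model = LinearRegression()
model.fit(X, y)

print(model.coef_)       # one coefficient per feature
print(model.intercept_)  # the constant term
```

With more than one feature the only change from simple linear regression is that `X` has several columns, and `coef_` holds one weight per column.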
We will use customer information to generate a ‘trust’ score for each customer. The scoring formula can be adapted by each company according to its credit context. In this example, we use the average number of days the customer is late and the average billing amount for the past 2 years to calculate a score that combines these two pieces of information.
After calculating the score, we feed the data to a machine learning model built with Scikit-Learn, so that the system can predict new scores based on what it has learned.
Our formula for the score calculation is described in Score calculation.xlsx
Customer information is in the Excel file Customers_CODE.XLSX
Customer company information:
- Customer since (date)
- State, Region, Postcode, Salesman, Main CNAE (type of company classification in Brazil)
- Highest Billing Date
- Maximum billing amount
- Date of last invoice issued
- Largest credit exposure date
- Highest credit exposure
- Average historical delay
- Average revenue last 48 months
- Amount payable Overdue
- Amount payable due
- Customer Last Order Date
- Date of this information
SERASA Information
(Serasa is a company that sells information about other companies)
- Serasa Score
- Probability of not paying
- Last date non payment
- Amount of unpaid documents
- Value of unpaid documents
- Last date bad checks
- Amount of bad checks
- Last date protests
- Value protests
- Last date judicial actions
- Value judicial actions
- Last date overdue debts
- Value overdue debts
I wrote some Python code to read and clean the data from my data set.
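The cleaning step could look roughly like the sketch below, written with pandas. The column names (`customer_since`, `avg_delay_days`, `avg_revenue`) and the sample rows are assumptions made for illustration; the real spreadsheet columns may be named and typed differently. In practice the frame would probably be loaded with `pd.read_excel` instead of being built by hand.

```python
import pandas as pd

# Hypothetical raw rows standing in for the spreadsheet contents.
raw = pd.DataFrame({
    "customer_since": ["2015-03-01", "2018-07-15", None],
    "avg_delay_days": ["12", " 3 ", "n/a"],
    "avg_revenue": [1500.0, None, 800.0],
})

def clean_customers(df):
    """Return a cleaned copy of the customer table (illustrative rules only)."""
    out = df.copy()
    # Parse dates; unparseable or missing values become NaT.
    out["customer_since"] = pd.to_datetime(out["customer_since"], errors="coerce")
    # Strip stray spaces and convert to numbers; bad values become NaN.
    out["avg_delay_days"] = pd.to_numeric(
        out["avg_delay_days"].str.strip(), errors="coerce"
    )
    # Assumption: a missing revenue figure is treated as zero.
    out["avg_revenue"] = out["avg_revenue"].fillna(0.0)
    # Drop rows with no usable delay value, since the score depends on it.
    return out.dropna(subset=["avg_delay_days"])

clean = clean_customers(raw)
print(clean)
```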
Next, we create a routine that reads all the records and calculates my score (newscore) using my definitions. I have an Excel file with the definitions used to calculate the score.
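The actual formula lives in the Excel file and is not reproduced here, so the function below is only a hypothetical stand-in that shows the general shape of such a routine. The weights, the ceilings (`max_delay`, `max_billing`) and the direction in which billing affects the score are all assumptions, not the author's definitions.

```python
def new_score(avg_delay_days, avg_billing,
              max_delay=90.0, max_billing=100_000.0,
              w_delay=0.7, w_billing=0.3):
    """Hypothetical score in [0, 100]; NOT the formula from the spreadsheet."""
    # Scale each input to the range [0, 1] relative to an assumed ceiling.
    delay_part = min(max(avg_delay_days / max_delay, 0.0), 1.0)
    billing_part = min(max(avg_billing / max_billing, 0.0), 1.0)
    # Assumption: longer delays raise the score, larger billing lowers it.
    return 100.0 * (w_delay * delay_part + w_billing * (1.0 - billing_part))

# Apply the routine to a few made-up records.
records = [
    {"avg_delay_days": 0.0, "avg_billing": 100_000.0},
    {"avg_delay_days": 45.0, "avg_billing": 50_000.0},
    {"avg_delay_days": 90.0, "avg_billing": 0.0},
]
for rec in records:
    rec["newscore"] = new_score(rec["avg_delay_days"], rec["avg_billing"])
    print(rec)
```

Keeping the weights and ceilings as parameters is one way to let each company adapt the formula to its own credit context.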
We apply a Keras model to train and test. We create training and test datasets, with 80% of the records for training and 20% for testing. In a plot, we can compare the training and test results:
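The 80/20 split and the train-versus-test comparison can be sketched as follows. Note that this sketch swaps in Scikit-Learn's `LinearRegression` instead of a Keras model, to stay with the library used in the rest of the post, and that the data is synthetic: the feature ranges, the target rule and the noise level are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic customers: average days late and average billing amount.
rng = np.random.default_rng(0)
n = 200
delay = rng.uniform(0, 90, n)
billing = rng.uniform(0, 100_000, n)
X = np.column_stack([delay, billing])

# Made-up target score with some noise added.
y = 0.8 * delay - 0.0002 * billing + 30 + rng.normal(0, 2.0, n)

# 80% of the rows for training, 20% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LinearRegression().fit(X_train, y_train)

# R^2 on each set; similar values suggest the model is not overfitting.
r2_train = model.score(X_train, y_train)
r2_test = model.score(X_test, y_test)
print(f"train R^2: {r2_train:.3f}  test R^2: {r2_test:.3f}")
```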
I need to predict scores for individual records, so I wrote a Python function to predict the score:
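A function of this kind might look like the sketch below. The name `predict_score`, its two-feature signature and the tiny model fitted for the demonstration are assumptions; the real function would take whatever features the trained model expects.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def predict_score(model, avg_delay_days, avg_billing):
    """Predict the score for one customer from two features (hypothetical)."""
    # Scikit-Learn expects a 2D array: one row per record, one column per feature.
    features = np.array([[avg_delay_days, avg_billing]], dtype=float)
    return float(model.predict(features)[0])

# Small demonstration model fitted on synthetic data with a known linear rule.
X_demo = np.array([
    [0.0, 1000.0],
    [5.0, 2000.0],
    [10.0, 1500.0],
    [20.0, 500.0],
])
y_demo = 1.0 * X_demo[:, 0] + 0.001 * X_demo[:, 1]
demo_model = LinearRegression().fit(X_demo, y_demo)

print(predict_score(demo_model, 10.0, 2000.0))
```

Wrapping the single record in a 2D array is the main detail to get right, since `predict` works on batches of rows rather than on one flat record.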