Test data for testing scoring models

Sold: 4 (last sale: 20.04.2016)
Refunds: 0

Uploaded: 23.07.2010
Content: data_for_research_and_building_scoring_models.rar 2028,25 kB

Description


The archive contains anonymized test data for testing a variety of statistical scoring models, as well as for research aimed at finding statistical regularities.

The modeling data (Modeling_Data.txt inside Modeling_Data.zip, located in the main archive) contains 50,000 records with tab-separated fields. Each record holds depersonalized information on 31 parameters (regressors) of a borrower, plus an indication of whether the loan was granted. Although the data are depersonalized, they retain the regularities of the real domain.
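A minimal sketch of reading such tab-separated records with the Python standard library. It assumes the file has no header row, so the column names are passed in by hand; the five columns and the two sample records below are illustrative and are not taken from the archive.

```python
import csv
import io

# Hypothetical two-record sample; the real Modeling_Data.txt has 50,000 records.
SAMPLE = "1\t15\tF\tS\t32\n2\t15\tM\tD\t45\n"

# Illustrative subset of columns. The real order and the full list of
# fields are described in Variables_List.zip.
COLUMNS = ["ID_CLIENT", "ID_SHOP", "SEX", "MARITAL_STATUS", "AGE"]


def read_records(stream, columns):
    """Yield one dict per tab-separated line, keyed by the given column names."""
    reader = csv.reader(stream, delimiter="\t")
    for row in reader:
        yield dict(zip(columns, row))


records = list(read_records(io.StringIO(SAMPLE), COLUMNS))
print(len(records), records[0]["SEX"])
```

For the real file, the io.StringIO object would be replaced by a file opened with open(..., newline=""). The correct text encoding is not stated in the description and has to be checked against the file itself.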

The file contained in the Variables_List.zip archive describes the fields of the modeling data.

Additional information

All data in the archive are in English, so using them requires at least a basic knowledge of the language (or a willingness to work it out). The data were made publicly available at one of the international data mining competitions.

Some of the fields of the modeling data:
ID_CLIENT - Customer (borrower) ID
ID_SHOP - Identifier of the store where the credit product was purchased
SEX - Sex (M - male, F - female)
MARITAL_STATUS - Marital status (S - single, On - single, D - divorced, V - widowed, O - other)
AGE - Age
QUANT_DEPENDANTS - Number of the borrower's dependents
EDUCATION - Educational level (may be specified)
FLAG_RESIDENCIAL_PHONE - Whether there is a permanent phone number (Y - yes, N - no)
AREA_CODE_RESIDENCIAL_PHONE - Altered area code of the borrower's phone
PAYMENT_DAY - Fixed day of the month on which the regular loan payment is due
SHOP_RANK - Rating of the vendor of the loan product, expressed in financial terms
RESIDENCE_TYPE - Type of housing (P - owned, A - rented, C - parents' home, O - other)
MONTHS_IN_RESIDENCE - Time at the current residence, in months
FLAG_MOTHERS_NAME - Whether the application form gives the name of the borrower's mother (Y - yes, N - no)
FLAG_FATHERS_NAME - Whether the application form gives the name of the borrower's father (Y - yes, N - no)

and so on, up to the last field:
TARGET_LABEL_BAD - Whether the borrower was ultimately given the loan (1 - not given, 0 - given)
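The coded fields listed above can be turned into readable values before modeling. The following is a sketch only: the code tables follow the field descriptions in this listing, the helper names to_int and decode are hypothetical, and the sample record is invented for illustration.

```python
# Code tables following the field descriptions in this listing.
SEX = {"M": "male", "F": "female"}
RESIDENCE_TYPE = {"P": "owned", "A": "rented", "C": "parents' home", "O": "other"}
YES_NO = {"Y": True, "N": False}

FLAG_FIELDS = ("FLAG_RESIDENCIAL_PHONE", "FLAG_MOTHERS_NAME", "FLAG_FATHERS_NAME")
INT_FIELDS = ("AGE", "MONTHS_IN_RESIDENCE", "TARGET_LABEL_BAD")


def to_int(text):
    """Convert text to int; return None for empty, missing or malformed values."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return None


def decode(record):
    """Return a copy of record with coded fields replaced by readable values.

    Unknown or missing codes become None instead of raising an error.
    """
    out = dict(record)
    out["SEX"] = SEX.get(record.get("SEX"))
    out["RESIDENCE_TYPE"] = RESIDENCE_TYPE.get(record.get("RESIDENCE_TYPE"))
    for field in FLAG_FIELDS:
        out[field] = YES_NO.get(record.get(field))
    for field in INT_FIELDS:
        out[field] = to_int(record.get(field))
    return out


# Invented record, for illustration only.
raw = {
    "SEX": "F",
    "RESIDENCE_TYPE": "P",
    "FLAG_RESIDENCIAL_PHONE": "Y",
    "FLAG_MOTHERS_NAME": "Y",
    "FLAG_FATHERS_NAME": "N",
    "AGE": "32",
    "MONTHS_IN_RESIDENCE": "",
    "TARGET_LABEL_BAD": "0",
}
decoded = decode(raw)
print(decoded)
```

Mapping unknown codes to None keeps missing values visible, so the choice of how to impute or drop them is left to the modeling step.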

Possible areas of applied research that could be based on these data:
- Scoring.
- Mathematical statistics (including non-classical branches, for example, statistics of objects of non-numerical nature).
- Neural networks.

In addition, the archive contains two more data sets (Prediction_Data.zip and LeaderBoard_Data.zip) of 10,000 records each, without any indication of whether the borrower paid or not. These data sets can be used to verify the statistical models you have built. They are particularly valuable because they cover other time periods (one field even does not match), which lets you check the robustness of your scoring model to minor short-term socio-economic changes over time. This helps you build models that reflect the real hidden patterns of the domain, that is, its underlying laws.
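Since the data sets do not share exactly the same fields, it can help to compare the field lists before applying a model built on one set to another. A small sketch, assuming the field names of each set are already available as lists. The names FIELD_A and FIELD_B are placeholders, because the description does not say which field differs.

```python
def compare_fields(modeling_fields, other_fields):
    """Report which field names are shared and which appear in only one set."""
    modeling, other = set(modeling_fields), set(other_fields)
    return {
        "only_in_modeling": sorted(modeling - other),
        "only_in_other": sorted(other - modeling),
        "shared": sorted(modeling & other),
    }


# Placeholder field lists, not the real ones from the archive.
modeling = ["ID_CLIENT", "SEX", "AGE", "FIELD_A", "TARGET_LABEL_BAD"]
prediction = ["ID_CLIENT", "SEX", "AGE", "FIELD_B"]

diff = compare_fields(modeling, prediction)
print(diff["only_in_modeling"])
print(diff["only_in_other"])
```

A model restricted to the shared fields can be scored on all three sets without further changes.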

UPD.
From these data one can establish, for example, that women being more conscientious payers is not speculation but a statistical fact at virtually any confidence level, including 95% and 99%.
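The description does not say which test stands behind this statement. One common way to check a claim of this kind is a two-sided two-proportion z-test on the share of records with TARGET_LABEL_BAD = 1 among men and among women. The sketch below uses only the Python standard library, and the counts in it are hypothetical, not taken from the data set.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(bad_a, n_a, bad_b, n_b):
    """Two-sided z-test for equality of two proportions, using the pooled estimate."""
    p_a, p_b = bad_a / n_a, bad_b / n_b
    pooled = (bad_a + bad_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts of TARGET_LABEL_BAD = 1, NOT taken from the data set.
z, p_value = two_proportion_z_test(bad_a=3000, n_a=20000,   # men
                                   bad_b=3600, n_b=30000)   # women

# Two-sided critical values for the 95% and 99% confidence levels.
z_95 = NormalDist().inv_cdf(0.975)
z_99 = NormalDist().inv_cdf(0.995)

print(f"z = {z:.2f}, p = {p_value:.3g}")
print("significant at 95%:", abs(z) > z_95)
print("significant at 99%:", abs(z) > z_99)
```

With 50,000 records even a small difference in proportions gives a large z statistic, which is why the result can hold at both confidence levels at once.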
