AI and machine learning are notoriously complex terms for anyone to grasp. Here at Instantor, we want to make people’s financial lives easier and with the word “easy” in mind, we’ve created this glossary.
The glossary defines general machine learning and artificial intelligence terms as well as terms specific to the FinTech and financing industry. Because here at Instantor, we don’t just throw around these terms – we actually use ML in our solutions on a daily basis. Our ambition is to show what you can achieve with AI and machine learning, and to make these and other tricky terms more accessible to everyone.
Why is this important? We think everyone, no matter their level of knowledge of the financial and tech sectors, should have access to smart financial services that make life easier. Wouldn’t it be great if the Ordinary Joe also understood the meaning behind these brilliant services and tools?
We definitely think it would. This is our attempt to educate the credit risk manager, the millennial, the senior, the newcomer – well, anyone keen on getting a deeper understanding of the lingo used in the FinTech world – on these terms.
Accuracy
Accuracy is usually measured by the extent to which the outcome deviates from the true value. In machine learning, it is a metric for evaluating classification models: the percentage of predictions that the model got right. Gini and AUC are related measures, used for assessing the performance of credit scoring models. In other words, accuracy describes how closely the predicted outcomes match the actual results.
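To make "the percentage of predictions that the model got right" concrete, here is a minimal sketch. The function name and the example labels (1 = repaid, 0 = defaulted) are made up for illustration and are not taken from any Instantor product.

```python
def accuracy(predicted, actual):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one prediction")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels: 1 = repaid, 0 = defaulted.
predicted = [1, 1, 0, 1, 0, 1, 1, 0]
actual = [1, 0, 0, 1, 0, 1, 1, 1]

# 6 of the 8 predictions match the true labels.
print(accuracy(predicted, actual))  # 0.75
```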
AI (Artificial Intelligence)
AI, a frequently used term, is intelligence demonstrated by machines, in contrast to the more traditional intelligence displayed by humans and animals, called natural intelligence. When a machine mimics cognitive functions similar to a human’s, like problem-solving and learning, the term “artificial intelligence” is applied. It can be seen as a process that converts unstructured information into useful knowledge. This can include functions like planning, interpreting, problem-solving and understanding language. Currently, AI is being used to make credit risk decisions that were previously made by humans; this process can now be fully automated.
AISP (Account Information Service Provider)
Account Information Service Providers (AISPs) are licensed companies authorised by the end customer to access their bank information. The AISPs then provide other parties with this information, such as credit companies. Instantor is an AISP as we aggregate end-users' data - with the end user's consent - in order to provide information to our customers.
Algorithm
An algorithm is a specification of how to solve a class of problems, based on calculations or a specific set of rules. It is typically carried out by a computer. In other words, it is a method computers use to solve a specific set of problems.
Alternative credit scoring
Credit bureaus’ current methods of assessing creditworthiness can be a problem for the underserved and millennials, but there is a solution: alternative credit scoring.
As opposed to traditional credit scoring, where the outcome of a score is based on information from the credit bureau and payslips, alternative credit scoring refers to a process of analysing a person’s transactional data by inputting alternative data into a credit scoring model, for example a machine learning model. By doing so, you will extract a realistic, up-to-date and fair prediction of a person’s financial capability.
The beauty of alternative credit scoring is that the approach is much fairer to groups that don’t have an extensive credit history. This is because alternative scoring is not biased by age, lifestyle choices or lack of history, but instead uses other, more accessible methods to assess risk.
AML (Anti-Money Laundering)
Anti-Money Laundering refers to a set of regulations used in the financial industry to prevent the disguising of illegally obtained funds as legitimate income.
AUC (Area Under the Curve)
AUC, or Area Under the Curve, is used in classification analysis to determine the predictive capability of a model. It usually applies to a Receiver Operating Characteristic (ROC) curve, which plots the true positive rate against the false positive rate. The true positive rate is the proportion of actual positives that are correctly identified as such; the false positive rate is the proportion of actual negatives that are incorrectly identified as positive. A true positive is an outcome where the model correctly predicts the positive case. A true negative is an outcome where the model correctly predicts the negative case.
What does this mean? Well, AUC describes the quality of a predictive model (specifically a binary classification model): the higher the number, the better the model. Scores fall between 0 and 1. In practice, 0.5 is the worst score you can get, because it means the model does no better than random guessing.
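One way to see what the number means: the area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes AUC that way, by comparing every positive with every negative. The scores and labels are invented for illustration, and real libraries use faster methods.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores and true labels (1 = positive case).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]

# 8 of the 9 positive/negative pairs are ranked correctly.
print(auc(scores, labels))
```

A model that gives every example the same score wins half of each pair and lands on 0.5, which is why 0.5 corresponds to guessing.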
Automated underwriting
An automated underwriting process utilises digitised tools, such as machine learning, to ensure a borrower meets all requirements for getting a loan. Automated underwriting processes are more efficient and streamlined. This is opposed to traditional underwriting, where there are various manual processes before a credit risk manager can determine a person’s financial capability.
API (Application Programming Interface)
An API, or Application Programming Interface, is a set of programming rules that helps different software platforms build upon each other and communicate by sending instructions and querying data. An API can mean different things in different contexts, but usually we talk about HTTP APIs. These are a means for computer applications to communicate with one another.
The use of APIs is particularly fundamental to the implementation of open banking, where the use of an open API enables third-party developers to build services around the financial institutions.
Bank data
A particular piece of financial information, e.g. account number, personal information and transactional data.
Batch
A batch is a term used in machine learning. When training a model, a batch is the set of samples used in one iteration; the number of samples in it is called the batch size.
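As a rough illustration, the sketch below splits a list of training samples into batches of a fixed size. The helper name `batches` and the sample data are hypothetical.

```python
def batches(samples, batch_size):
    """Yield consecutive slices of `samples`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]


# Ten hypothetical training samples, processed four at a time.
samples = list(range(10))
for batch in batches(samples, 4):
    print(batch)  # the last batch is smaller because 10 is not a multiple of 4
```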
Bias (AI)
Bias is a phenomenon that occurs when faulty assumptions are built into a machine learning model, leading the algorithm to produce results which are systematically prejudiced.
Big data
We generate data whenever we go online, when we are communicating using social media or simply navigating using our phones. Essentially every time we do anything digitally we are creating data. Big data refers to the composite of subsets of this data. The amount of data, or information, is too large or complex for traditional data-processing application software to deal with.
Categorisation
In finance, categorisation refers to the process of identifying the context or purpose of specific bank account transactions. Simply put: dividing financial transactions into different categories. The process employs trained algorithms to identify patterns and recognise keywords.
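The keyword part of this can be sketched with a simple lookup. This is a deliberately simplified, rule-based illustration; the categories and keywords are hypothetical, and a trained model as described above would learn such patterns from data rather than from a hand-written table.

```python
# Hypothetical categories and keywords, for illustration only.
CATEGORY_KEYWORDS = {
    "groceries": ("supermarket", "grocery"),
    "rent": ("rent", "landlord"),
    "salary": ("salary", "payroll"),
}


def categorise(description):
    """Return the first category whose keyword appears in the description."""
    text = description.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return category
    return "uncategorised"


print(categorise("ACME PAYROLL MARCH"))     # salary
print(categorise("City Supermarket 0123"))  # groceries
print(categorise("ATM withdrawal"))         # uncategorised
```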
Compliance
Compliance in finance refers to the different laws and regulations controlling how personal and financial data is used and stored. GDPR and PSD2 are two of the most significant regulations shaking up the financial world this year.
Credit risk management
Credit risk management refers to the practice of identifying the probability of a borrower not repaying their loan. Credit risk managers mitigate potential losses for financial institutions.
Credit risk modelling
Credit risk modelling is a digitised process that financial institutions rely on to predict the likelihood of a borrower defaulting on a loan. Credit risk models look at information such as the loan amount, the term, the employment status of the borrower and other geographic and macroeconomic factors to calculate the risk associated with lending to a particular individual.
This is the process of analysing raw data in order to draw conclusions about that information. For example, one could analyse transactional data to get a deeper understanding of someone’s financial behaviour.
Data aggregation
Refers to a process of extracting information from bank customers' accounts, with their consent. Instantor specialises in data aggregation and in creating reports (JSON and PDF) from transactional data to help our customers make faster and more accurate credit risk decisions.
Decision tree
A decision tree is a tool that uses a tree-like model of decisions and their possible consequences. Decision trees are represented as a sequence of branching statements. For example, if Borrower A has a steady income, then we should ask: “have they repaid any previous loans?”, or if they do not have a steady income then we reject their application for a loan. The decision tree continues to branch until all reasonable questions have been asked.
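The borrower example can be written as a pair of branching statements. This is only a sketch: the glossary does not say what happens after the repayment question, so the "approve" and "refer for manual review" outcomes below are assumptions added for illustration.

```python
def loan_decision(has_steady_income, repaid_previous_loans):
    """Walk a two-question decision tree and return the outcome."""
    # First branch (from the example): no steady income means rejection.
    if not has_steady_income:
        return "reject"
    # Second branch: the follow-up question about previous loans.
    # Both outcomes below are assumed, not taken from the glossary.
    if repaid_previous_loans:
        return "approve"
    return "refer for manual review"


print(loan_decision(has_steady_income=True, repaid_previous_loans=True))
print(loan_decision(has_steady_income=False, repaid_previous_loans=True))
```

A real tree would keep branching on further questions, and in machine learning the questions and thresholds are learned from data rather than written by hand.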
Deep learning
Deep learning is a method, or a subset, of machine learning. It is part of a broader family of machine learning methods based on learning data representations through neural networks inspired by the human brain, as opposed to task-specific algorithms.
Digital underwriting
A digital underwriting process utilises digitised tools, like machine learning and data processing, to ensure a borrower meets all requirements for getting a loan, which makes the process efficient and streamlined. This is opposed to traditional underwriting, where the credit risk manager uses manual methods to determine a person’s financial capability.
Encryption
This is the process of encoding data - such as names and numbers - so that they can’t be seen by unwanted parties. By using algorithms, this data is turned into codes. A key is required in order to turn the code back into useful data. At Instantor, all the reports and data sent to the client are encrypted for security reasons.
Ensemble
An ensemble is a group of machine learning models which are combined to create predictions.
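False negative
A false negative in credit risk management is a measure of how many uncreditworthy applications are incorrectly identified as creditworthy: the model predicted that the borrower wouldn’t default, but in reality the borrower defaulted. This definition treats default as the positive class; where “creditworthy” is treated as the positive class, as in the true positive and true negative definitions in this glossary, the same mistake would be counted as a false positive.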
Feature
An input variable or a measurable characteristic used in machine learning to make predictions.
Feature engineering
Feature engineering is the process of selecting specific features that will be useful for training a model. This also includes the conversion of raw data files into specific features.
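As a small illustration of converting raw data into features, the sketch below turns a list of signed transaction amounts into a few summary numbers a model could use. The feature names and the amounts are hypothetical and are not Instantor's actual features.

```python
def build_features(amounts):
    """Summarise signed transaction amounts (positive = money in)."""
    income = sum(a for a in amounts if a > 0)
    spending = -sum(a for a in amounts if a < 0)
    return {
        "total_income": income,
        "total_spending": spending,
        "net_cash_flow": income - spending,
        "transaction_count": len(amounts),
    }


# Hypothetical month of transactions: one salary payment and four expenses.
amounts = [2500, -800, -120, -60, -45]
features = build_features(amounts)
print(features)
```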
FinTech
FinTech stands for Financial Technology and refers to new tech innovation offered by challengers within the financial market, often challenging traditional methods of delivering financial services.
FSA (Financial Supervisory Authority)
The FSA, or the Financial Supervisory Authority, is a type of government agency found in most countries under different names, with the task of monitoring the financial market. It aims to maintain a stable financial system: it develops rules and makes sure that companies follow them.
Examples of financial supervisory authorities are Finansinspektionen (FI) in Sweden, the Financial Conduct Authority (FCA) in the UK and the Bank of Spain along with the National Securities Market Commission (CNMV) in Spain.
GDPR (General Data Protection Regulation)
GDPR stands for the General Data Protection Regulation and is an EU regulation aiming to control data protection and privacy for all individuals within the European Union and the European Economic Area. The export of personal data outside the EU and EEA areas is also addressed.
Gini coefficient
The Gini coefficient, which is sometimes called the Gini index, is a measure of the degree of variation represented in a set of values in a variable. A Gini coefficient can be used to evaluate the performance of a model and is most commonly used for imbalanced datasets where the probability alone makes it difficult to predict an outcome.
Gini is a standard metric in credit risk assessment because the likelihood of an individual defaulting is relatively low. In the financing industry, Gini can assess the accuracy of a prediction of whether a loan applicant will default or repay.
A Gini score is between 0 and 1, where a score of 1 indicates that the model is 100% accurate in predicting an outcome and a score of 0 means the model does no better than random guessing. In credit risk, a higher Gini means acceptance rates can be increased without taking on more risk.
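In credit scoring, the Gini of a model is commonly derived from its AUC as Gini = 2 × AUC − 1. The glossary does not state this formula, so treat it as an assumption about which variant of Gini is meant. Under that assumption, the conversion is a one-liner:

```python
def gini_from_auc(auc_value):
    """Convert an AUC score to a Gini score using Gini = 2 * AUC - 1."""
    if not 0.0 <= auc_value <= 1.0:
        raise ValueError("AUC must be between 0 and 1")
    return 2.0 * auc_value - 1.0


print(gini_from_auc(0.5))   # 0.0: no better than random guessing
print(gini_from_auc(0.75))  # 0.5
print(gini_from_auc(1.0))   # 1.0: perfect ranking
```

With this formula, an AUC of 0.5 maps to a Gini of 0, which lines up with 0.5 being the practical floor for AUC.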
Gradient descent
An optimisation algorithm used for finding the minimum of a certain function. To put it simply: this is a method used to update models so that they learn correctly.
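A common algorithm matching this description is gradient descent: start somewhere, look at the slope, and take a small step downhill, over and over. The sketch below is a minimal illustration on a one-variable function; the function f(x) = (x − 3)², the learning rate and the step count are all chosen for the example.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to approach a minimum."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x


# Example function f(x) = (x - 3)**2, whose gradient is 2 * (x - 3).
# Its minimum is at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(minimum)  # very close to 3
```

When training a model, the function being minimised is the model's error, and the values being adjusted are its parameters.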
Hyperparameter
This is a parameter that is explicitly set prior to training a model, as opposed to parameters that are derived through training.
Refers to the identification of individuals carried out by comparing the input from end-users with the data aggregated from the bank.
Income verification
The process of verifying someone’s income is one of the most important procedures a bank or lender conducts before approving a loan. This is often a legal requirement where the aim is to determine whether or not an applicant fits the risk profile and is eligible for a loan.
Inference
In machine learning terms, this is the process of taking a model that has already been trained and using that model to make useful predictions. Normally, this involves casting the model mathematically while using the principles of probability to quantify the quality of the match. In short, it means making predictions by applying the trained model to unlabelled examples.
Interpretability
This is the degree to which a human can understand the cause of a decision. In machine learning, it refers to how readily a model’s predictions can be explained.
KYC (Know Your Customer)
Know Your Customer (KYC) is a customer identification process in which a customer’s identity, financial status and address are verified. The idea is that knowing your customers (verifying identities, making sure they’re real, confirming they’re not on any prohibited lists and assessing their risk factors) can keep money laundering, terrorist financing and more run-of-the-mill fraud schemes at bay. Most KYC procedures now run online, with optimised online registration processes.
These are deep neural networks consisting of layers that each handle a specific task. The output of one layer becomes the input to the next layer in the network, and so on.
Logistic regression
Logistic regression is used for classification, that is, for sorting information into categories. It can be used to predict whether an email is spam or not, or whether an applicant is creditworthy or not.
In more scientific terms, this is a model that generates a probability for each class in a classification problem by passing a linear prediction through a sigmoid function.
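The "linear prediction plus sigmoid" idea can be sketched in a few lines. The weights, bias and feature values below are invented for illustration; in a real model they would be learned from training data.

```python
import math


def sigmoid(z):
    """Squash any real number into the range 0 to 1."""
    # Two branches avoid overflow in math.exp for large negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(features, weights, bias):
    """Linear prediction followed by a sigmoid, as in logistic regression."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical applicant with two features and made-up weights.
probability = predict_probability([1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(probability)  # 0.5, because the linear prediction is exactly 0
```

A probability above a chosen threshold, often 0.5, is then read as one class and anything below it as the other.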
Loss function
The loss function measures how far the actual outcome is from the predicted outcome. It compares the prediction and the actual result to determine a scalar value that is the distance between these two values. The ultimate goal of machine learning, and of optimisation in general, is to make this number as small as possible, ideally zero.
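There are many loss functions. One widely used choice, picked here purely as an example, is the mean squared error: the average of the squared differences between predictions and actual values. The numbers are made up.

```python
def mean_squared_error(predicted, actual):
    """Average squared difference between predictions and actual values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Errors are 1 and -2, so the loss is (1 + 4) / 2.
print(mean_squared_error([2.0, 4.0], [1.0, 6.0]))  # 2.5
# Perfect predictions give a loss of zero.
print(mean_squared_error([1.0, 6.0], [1.0, 6.0]))  # 0.0
```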
Machine learning
Machine learning algorithms use programming and other computational methods to learn information directly from data. ML is a way to train an algorithm to identify and learn patterns in information and then progressively improve its performance on a specific task. ML models are built without relying on a predetermined equation as a model; as more data points are provided, the algorithms adapt to improve performance. Natural patterns within data are identified to provide insight and enhance predictive capabilities.
Model
A model is a mathematical representation of a real-world process. It’s based on the learning algorithms’ findings in the training data so that the input parameters correspond to the set goal.
Noise
These are values that contain information that is not useful for solving a particular problem. A noisy data set is one in which there is not much predictability in the data. In essence, noise is data that does not follow the model. Too much noise in a data set typically results in a bad model.
Onboarding
Onboarding is the process of integrating customers into a new product, service or program. This process can vary between companies, but the general concept is to get customers up and running. At Instantor, onboarding means the whole integration process – counting from the moment a customer is interested in using our service all the way to when Instantor’s service is operating in their daily banking activity.
Open banking
This is a term in financial technology that refers to the transparency of banking data and third-party access to it through an open API. Thanks to open banking, AISP companies such as Instantor can access the data of bank customers.
Optimisation
The action of making the best out of a certain situation or resource. The optimal path is what maximises the objective, which could be producing the greatest learning gains; generating the most significant performance improvement; or minimising churn and accelerating the learning process.
Parameter
A parameter is a value derived through training, in contrast to a hyperparameter, where the value is set prior to training the model. In neural networks, parameters are often referred to as weights.
PISP (Payment Initiation Service Provider)
PISPs, or Payment Initiation Service Providers, are service providers that are authorised to initiate payment transactions on behalf of the customer. This means they are able to withdraw the money straight from your account if you’ve given your consent.
Prediction
A prediction is a model’s output when provided with an input example, such as a forecast of an expected outcome. Machine learning is a way of identifying patterns in data and using them to automatically make predictions or decisions in the future.
PSD2 (Payment Services Directive 2)
PSD2, which stands for the 2nd Payment Services Directive, is an EU directive administered by the European Commission. It aims to regulate payment services and payment service providers throughout the EU. Instantor was approved in September 2018, by the FSA, to operate as an Account Information Service Provider under PSD2.
Quantisation
Quantisation is an umbrella term for different techniques that reduce the size and complexity of a model or data to a reasonable size while maintaining high accuracy. A great way to think about this is: if it’s 15:43 and someone asks you the time, and you say “quarter to four”, this is not correct in terms of minutes and seconds, but, except in a limited number of cases, this is as close as people need to know.
In signal processing, quantisation is carried out by analogue-to-digital converters, which create a series of digital values to represent the original analogue signal.
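The clock example can be reproduced by rounding a value to the nearest multiple of a fixed step. This is a minimal sketch; the function name and the 15-minute step are chosen for illustration.

```python
def quantise(value, step):
    """Round `value` to the nearest multiple of `step`."""
    if step <= 0:
        raise ValueError("step must be positive")
    return round(value / step) * step


minutes_since_midnight = 15 * 60 + 43           # 15:43
rounded = quantise(minutes_since_midnight, 15)  # nearest quarter of an hour
hours, minutes = divmod(rounded, 60)
print(f"{hours:02d}:{minutes:02d}")  # 15:45, i.e. "quarter to four"
```

The rounded time needs far fewer distinct values to represent (96 quarter-hours in a day instead of 1,440 minutes), at the cost of a small error.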
Regression
In machine learning, two of the main types of problem are regression and classification. Regression analysis is a method for examining the relationship between two or more variables. While there are many different types of regression analysis, at their core they all determine how an output variable depends on one or more input variables. For example, determining the relationship between income and creditworthiness.
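The simplest case is fitting a straight line through data points with one input and one output, using the standard least-squares formulas. The data below is invented so that the points lie exactly on the line y = 2x + 1.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical input (x) and output (y) values.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```

The slope says how much the output changes when the input increases by one unit, which is the "relationship between input and output" the definition refers to.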
Reinforcement learning
Reinforcement learning is an approach for learning which decisions maximise a measure of reward, based on how previous sequences of decisions have resulted in rewards and punishments.
Scalability
The attribute that describes how capable a system is of performing under significant growth. It can also refer to the system’s potential to be enlarged to accommodate that growth without harming the system.
SHAP (SHapley Additive exPlanations)
The SHapley Additive exPlanations (SHAP) framework provides clear explanations for all kinds of machine learning models – from tree classifiers to deep convolutional neural networks. Every feature used in the model is given a relative importance score: a SHAP value.
Shapley values
These are values that represent the relative contribution of the agents in a cooperative task, or of the different features in a machine learning model.
Supervised machine learning
In supervised machine learning, previous knowledge is used to train a model, which is used to make predictions. For example, banks use data about the behaviour of previous customers to make decisions about new customers. They build general models that allow them to identify patterns in behaviour to determine ideal and non-ideal customers. Banks have learned through statistical analysis that information like salary, expenses, marital status and employment status of the borrower can help predict the creditworthiness of an individual.
Training
The process of training a machine learning model involves providing an ML algorithm with training data to learn from.
Transactional data
Transactional data refers to data that results from financial data aggregation activities. This type of data can give you a deeper understanding of a borrower’s finances in credit risk management.
True negative
A true negative in credit risk management is a measure of how many uncreditworthy applications are correctly identified as uncreditworthy.
True positive
A true positive in credit risk assessment is a measure of how many creditworthy applicants are correctly identified as creditworthy.
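Counting these outcomes is straightforward once predictions and actual results sit side by side. The sketch below follows the convention of the two definitions above, where "positive" means creditworthy (1) and "negative" means uncreditworthy (0). The example data is made up.

```python
def confusion_counts(predicted, actual):
    """Count true/false positives and negatives (1 = creditworthy)."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for p, a in zip(predicted, actual):
        if p == 1 and a == 1:
            counts["tp"] += 1  # creditworthy, identified as creditworthy
        elif p == 0 and a == 0:
            counts["tn"] += 1  # uncreditworthy, identified as uncreditworthy
        elif p == 1 and a == 0:
            counts["fp"] += 1  # uncreditworthy, identified as creditworthy
        else:
            counts["fn"] += 1  # creditworthy, identified as uncreditworthy
    return counts


# Hypothetical model decisions and what actually happened.
predicted = [1, 1, 0, 0, 1, 0]
actual = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(predicted, actual)
print(counts)
```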
Unsupervised machine learning
These types of machine learning algorithms don’t rely on labelled data but instead infer patterns from a dataset without reference to known or labelled outcomes. Unsupervised machine learning can be used for clustering or to discover the underlying structure of the data.
Weight
A weight is a coefficient for an input in a neural network layer. If a weight is zero, its corresponding input feature doesn’t contribute to the model. In plain English, a weight is a way of ranking input data by importance, in other words, assigning relative importance to data or information.