Supervised vs Unsupervised Learning

These are the two primary approaches to machine learning. Supervised learning uses labeled training data, while unsupervised learning uses unlabeled data. Labeled data already has a target variable in mind. For example, if you are trying to predict housing costs and previous housing costs are in your data, then you are using labeled data. Social media datasets are usually unlabeled, and the goal is to find a target customer without a target variable already established.

Supervised methods include regression and classification, and real-life applications include predicting housing prices and filtering spam email.

Unsupervised methods include clustering and dimensionality reduction, and real-life applications include customer segmentation and anomaly detection.

We use input variables (x), also known as features, to conduct these analyses.

Machine Learning

Machine learning is a subfield of artificial intelligence, which is broadly defined as the capability of a machine to imitate intelligent human behavior. Artificial intelligence systems are used to perform complex tasks in a way that is similar to how humans solve problems.

Unlike traditional programming, where explicit instructions are given, machine learning models learn patterns from data. In traditional programming we give the computer a set of rules to follow, while machine learning allows the computer to learn those rules from examples.
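
A minimal Python sketch of this contrast, assuming scikit-learn and an invented spam-filtering toy dataset (the rule and the feature values are illustrative only):

    from sklearn.tree import DecisionTreeClassifier

    # Traditional programming: we write the rule ourselves.
    def rule_based_is_spam(num_links: int, has_keyword: bool) -> bool:
        return num_links > 3 or has_keyword

    # Machine learning: the model infers a similar rule from labeled examples.
    X = [[0, 0], [1, 0], [4, 0], [5, 1], [2, 1], [6, 0]]   # [num_links, has_keyword]
    y = [0, 0, 1, 1, 1, 1]                                  # 1 = spam, 0 = not spam
    model = DecisionTreeClassifier().fit(X, y)
    print(model.predict([[4, 1]]))                          # learned decision, not hand-coded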

Bias-Variance Tradeoff

Bias and variance are the two core sources of error to balance while training a machine learning model.

When we discuss prediction models, prediction errors can be decomposed into two main subcomponents: error due to bias, and error due to variance.

The bias-variance trade-off is the tension between the error introduced by bias and the error produced by variance. To understand how to make the most of this trade-off and avoid underfitting or overfitting our model, let's first define bias and variance.

Error due to bias is the distance between a model's predictions and the true values. With this type of error, the model pays too little attention to the training data, oversimplifies, and does not learn the underlying patterns. The model learns the wrong relationships by not taking all the features into account.

Error due to variance is the variability of a model's prediction for a given data point, or a value that tells us the spread of our predictions. With this type of error, the model pays so much attention to the training data that it memorizes it instead of learning from it. A model with high variance fails to generalize to data it hasn't seen before.

Overfitting and Underfitting

Overfitting is when a model studies the training data so closely that it even fits its outliers, and then performs poorly on new data. A statistical model is said to be overfitted when it does not make accurate predictions on testing data. When a model is trained on too much detail, it starts learning from the noise and inaccurate entries in the data set, and testing on test data then shows high variance. The model fails to categorize the data correctly because of too many details and noise. Overfitting is often associated with non-parametric and non-linear methods, because these types of machine learning algorithms have more freedom in building the model from the dataset and can therefore build unrealistic models. Solutions include using a linear algorithm if we have linear data, or constraining parameters such as the maximal depth if we are using decision trees.

A statistical model or machine learning algorithm is said to underfit when it is too simple to capture the complexity of the data. Underfitting reflects the model's inability to learn the training data effectively, resulting in poor performance on both the training and testing data. In simple terms, an underfit model is inaccurate, especially when applied to new, unseen examples. It mainly happens when we use a very simple model with overly simplified assumptions. To address underfitting, we need more complex models, enhanced feature representation, and less regularization.
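
A minimal sketch of underfitting versus overfitting by varying a decision tree's maximal depth, as suggested above; the synthetic data and depth values are illustrative assumptions:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.uniform(-3, 3, size=(300, 1))
    y = np.sin(X).ravel() + rng.normal(scale=0.3, size=300)   # noisy nonlinear signal
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for depth in (1, 4, None):   # too shallow, reasonable, unrestricted
        tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_train, y_train)
        print(depth, round(tree.score(X_train, y_train), 2), round(tree.score(X_test, y_test), 2))

    # Depth 1 underfits (low score on both sets); an unrestricted tree overfits
    # (near-perfect training score, noticeably worse test score).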

Data mining

Data mining is the process of finding anomalies, patterns and correlations within large data sets to predict outcomes. Using a broad range of techniques, you can use this information to increase revenues, cut costs, improve customer relationships, reduce risks and more.

Cross-Validation and Regularization

Regularization is a way of avoiding overfitting by restricting the magnitude of model coefficients (or, in deep learning, node weights). A simple example of regularization is the use of ridge or lasso regression to fit linear models in the presence of collinear variables or (quasi-)separation. The intuition is that smaller coefficients are less sensitive to idiosyncrasies in the training data and, hence, less likely to overfit.
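
A minimal sketch of that intuition with scikit-learn, assuming an invented dataset with two nearly collinear features:

    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge

    rng = np.random.default_rng(1)
    x1 = rng.normal(size=200)
    x2 = x1 + rng.normal(scale=0.01, size=200)        # nearly collinear with x1
    X = np.column_stack([x1, x2])
    y = 3 * x1 + rng.normal(scale=0.5, size=200)

    print(LinearRegression().fit(X, y).coef_)          # coefficients can be large and unstable
    print(Ridge(alpha=1.0).fit(X, y).coef_)            # shrunk toward smaller, more stable values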

Cross-validation is a way to safely reuse training data in nested model situations. This includes both the case of setting hyperparameters before fitting a model, and the case of fitting models (let's call them base learners) that are then used as variables in downstream models, as shown in Figure 1. In either situation, using the same data twice can lead to models that are overtuned to idiosyncrasies in the training data and more likely to overfit.

Figure 1: Properly nesting models with cross-validation
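
A minimal sketch of plain k-fold cross-validation with scikit-learn; the built-in iris dataset, the model, and the five folds are arbitrary illustrative choices:

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
    print(scores.mean(), scores.std())   # average held-out accuracy across the 5 folds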

Hyper-parameters

When you're training machine learning models, each dataset and model needs a different set of hyperparameters, which are configuration values chosen before training rather than learned from the data. The only way to determine good values is through multiple experiments, where you pick a set of hyperparameters and run them through your model. This is called hyperparameter tuning. In essence, you're training your model sequentially with different sets of hyperparameters. This process can be manual, or you can pick one of several automated hyperparameter tuning methods.

Whichever method you use, you need to track the results of your experiments. You'll have to apply some form of statistical analysis, such as comparing values of the loss function, to determine which set of hyperparameters gives the best result. Hyperparameter tuning is an important and computationally intensive process.
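
A minimal sketch of one automated approach, a cross-validated grid search with scikit-learn; the dataset and the parameter grid are illustrative assumptions:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import GridSearchCV
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    grid = GridSearchCV(
        KNeighborsClassifier(),
        param_grid={"n_neighbors": [1, 3, 5, 7], "weights": ["uniform", "distance"]},
        cv=5,
    )
    grid.fit(X, y)
    print(grid.best_params_, grid.best_score_)   # best hyperparameters found and their CV score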

ROC Curve

More: https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc
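
In brief, an ROC curve plots the true positive rate against the false positive rate at every classification threshold, and the area under the curve (AUC) summarizes performance across all thresholds. A minimal scikit-learn sketch on a synthetic binary problem (the dataset and model are illustrative assumptions):

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score, roc_curve
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    scores = clf.predict_proba(X_test)[:, 1]           # predicted probability of the positive class
    fpr, tpr, thresholds = roc_curve(y_test, scores)   # points on the ROC curve
    print(roc_auc_score(y_test, scores))               # area under the ROC curve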

Deep Learning

Deep learning is a method in artificial intelligence (AI) that teaches computers to process data in a way that is inspired by the human brain. Deep learning models can recognize complex patterns in pictures, text, sounds, and other data to produce accurate insights and predictions. You can use deep learning methods to automate tasks that typically require human intelligence, such as describing images or transcribing a sound file into text.

Artificial intelligence (AI) attempts to train computers to think and learn as humans do. Deep learning technology drives many AI applications used in everyday products, such as digital assistants, voice-enabled devices, and fraud detection systems.

It is also a critical component of emerging technologies such as self-driving cars, virtual reality, and more.

Deep learning models are computer files that data scientists have trained to perform tasks using an algorithm or a predefined set of steps. Businesses use deep learning models to analyze data and make predictions in various applications.

Deep learning has several use cases in automotive, aerospace, manufacturing, electronics, medical research, and other fields; common examples include computer vision, speech recognition, and natural language processing.

Linear Regression

Linear regression analysis is used to predict the value of a variable based on the value of another variable. The variable you want to predict is called the dependent variable. The variable you are using to predict the other variable’s value is called the independent variable.

This form of analysis estimates the coefficients of the linear equation, involving one or more independent variables that best predict the value of the dependent variable. Linear regression fits a straight line or surface that minimizes the discrepancies between predicted and actual output values. There are simple linear regression calculators that use a "least squares" method to discover the best-fit line for a set of paired data. You then estimate the value of the dependent variable (Y) from the independent variable (X).
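
A minimal Python sketch of a least-squares fit with scikit-learn; the five data points are invented for illustration:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    X = np.array([[1], [2], [3], [4], [5]])       # independent variable
    y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])       # dependent variable
    model = LinearRegression().fit(X, y)
    print(model.coef_[0], model.intercept_)       # slope and intercept of the best-fit line
    print(model.predict([[6]]))                   # predicted value for a new observation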

Logistic Regression

Logistic regression is a data analysis technique that uses mathematics to find the relationships between two data factors. It then uses this relationship to predict the value of one of those factors based on the other. The prediction usually has a finite number of outcomes, like yes or no.

For example, let’s say you want to guess if your website visitor will click the checkout button in their shopping cart or not. Logistic regression analysis looks at past visitor behavior, such as time spent on the website and the number of items in the cart. It determines that, in the past, if visitors spent more than five minutes on the site and added more than three items to the cart, they clicked the checkout button. Using this information, the logistic regression function can then predict the behavior of a new website visitor.
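
A minimal sketch of that checkout example, assuming scikit-learn and an invented handful of visitor records (minutes on site, items in cart); the numbers are not real data:

    from sklearn.linear_model import LogisticRegression

    X = [[2, 1], [8, 4], [1, 0], [6, 5], [7, 2], [3, 1]]   # [minutes_on_site, items_in_cart]
    y = [0, 1, 0, 1, 1, 0]                                  # 1 = clicked checkout
    model = LogisticRegression().fit(X, y)
    print(model.predict_proba([[6, 4]])[0, 1])              # probability a new visitor checks out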

Neural Networks

A neural network is a method in artificial intelligence that teaches computers to process data in a way that is inspired by the human brain. It is a type of machine learning process, called deep learning, that uses interconnected nodes or neurons in a layered structure that resembles the human brain. It creates an adaptive system that computers use to learn from their mistakes and improve continuously. Thus, artificial neural networks attempt to solve complicated problems, like summarizing documents or recognizing faces, with greater accuracy.

Neural networks have use cases across many industries, including image recognition, speech recognition, medical diagnosis, and demand forecasting.
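
A minimal sketch of a small feed-forward network with scikit-learn's MLPClassifier; the digits dataset and the layer sizes are arbitrary illustrative choices:

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    net = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
    net.fit(X_train, y_train)
    print(net.score(X_test, y_test))   # accuracy on unseen digits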

Clustering

Introduction to clustering: clustering is an unsupervised learning method, one in which we draw inferences from datasets consisting of input data without labeled responses. Generally, it is used as a process to find meaningful structure, explanatory underlying processes, generative features, and groupings inherent in a set of examples.

Clustering is the task of dividing the population or data points into a number of groups such that data points in the same group are more similar to one another and dissimilar to the data points in other groups. It is basically a grouping of objects on the basis of their similarity and dissimilarity.

For example, data points that lie close together in a scatter plot can be classified into one single group. We can often distinguish the clusters visually; a plot might show three clearly separated clusters.

Decision Tree

A decision tree is a type of supervised machine learning used to categorize or make predictions based on how a previous set of questions were answered. The model is a form of supervised learning, meaning that the model is trained and tested on a set of data that contains the desired categorization.

The decision tree may not always provide a clear-cut answer or decision. Instead, it may present options so the data scientist can make an informed decision on their own. Decision trees imitate human thinking, so it’s generally easy for data scientists to understand and interpret the results.

A decision tree resembles, well, a tree. The base of the tree is the root node. From the root node flows a series of decision nodes that depict decisions to be made. From the decision nodes are leaf nodes that represent the consequences of those decisions. Each decision node represents a question or split point, and the leaf nodes that stem from a decision node represent the possible answers. Leaf nodes sprout from decision nodes similar to how a leaf sprouts on a tree branch. This is why we call each subsection of a decision tree a "branch." Let's take a look at an example. You're a golfer, and a consistent one at that. On any given day you want to predict which of two buckets your score will fall into: below par or over par.
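
A minimal sketch of the golf example with scikit-learn; the features (wind speed, hours of sleep) and the labels are hypothetical and only illustrate how the splits are learned:

    from sklearn.tree import DecisionTreeClassifier, export_text

    X = [[5, 8], [20, 6], [10, 7], [25, 5], [3, 9], [18, 6]]   # [wind_mph, hours_sleep]
    y = ["below par", "over par", "below par", "over par", "below par", "over par"]
    tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
    print(export_text(tree, feature_names=["wind_mph", "hours_sleep"]))   # the learned splits
    print(tree.predict([[15, 7]]))                                        # prediction for a new day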

More: https://towardsdatascience.com/apriori-association-rule-mining-explanation-and-python-implementation-290b42afdfc6

Pattern Recognition

Pattern recognition is the ability of machines to identify patterns in data, and then use those patterns to make decisions or predictions using computer algorithms. It’s a vital component of modern artificial intelligence (AI) systems. This guide provides an overview of the most important techniques used to recognize patterns and real-world applications. We will look into what pattern recognition is, and review practical pattern recognition systems and forms of pattern recognition with AI.

Bayesian Linear Regression

In the Bayesian viewpoint, we formulate linear regression using probability distributions rather than point estimates. The response, y, is not estimated as a single value but is assumed to be drawn from a probability distribution. The model for Bayesian linear regression, with the response sampled from a normal distribution, is y ~ N(β^T X, σ²I): the mean of the response is a linear combination of the weights β and the predictors X, and σ² is the noise variance.

More: https://towardsdatascience.com/introduction-to-bayesian-linear-regression-e66e60791ea7

Algorithms

Machine learning algorithms form the core of data science applications. They enable computers to learn from data and make predictions or decisions without being explicitly programmed. This section explores various machine learning algorithms, including supervised learning algorithms like regression and classification, and unsupervised learning algorithms like clustering and dimensionality reduction. Commonly used algorithms include linear and logistic regression, decision trees, random forests, K-nearest neighbors, K-means clustering, naïve Bayes, and support vector machines.

Apriori Algorithm and Association Rule Mining

The most famous story about association rule mining is the "beer and diapers" example: researchers discovered that customers who buy diapers also tend to buy beer. This classic example shows that there might be many interesting association rules hidden in our daily data.

Association rule mining is a technique to identify underlying relations between different items. There are many methods to perform association rule mining. The Apriori algorithm that we are going to introduce in this article is the most simple and straightforward approach. However, since it’s the fundamental method, there are many different improvements that can be applied to it.

Frequent itemsets, also known as frequent patterns, are simply all the itemsets whose support satisfies the minimum support threshold.

All non-empty subsets of a frequent itemset must also be frequent.
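
A minimal pure-Python sketch of the support-counting idea behind Apriori; the transactions and the minimum support threshold below are invented for illustration:

    from itertools import combinations

    transactions = [
        {"beer", "diapers", "chips"},
        {"beer", "diapers"},
        {"milk", "diapers"},
        {"beer", "chips"},
    ]
    min_support = 0.5   # itemset must appear in at least half of the transactions

    def frequent_itemsets(transactions, size, min_support):
        items = sorted(set().union(*transactions))
        frequent = {}
        for candidate in combinations(items, size):
            support = sum(set(candidate) <= t for t in transactions) / len(transactions)
            if support >= min_support:
                frequent[candidate] = support
        return frequent

    print(frequent_itemsets(transactions, 1, min_support))   # frequent single items
    print(frequent_itemsets(transactions, 2, min_support))   # e.g. ("beer", "diapers")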

Classification

Classification is the process of identifying and grouping objects or ideas into predetermined categories. In data management, classification enables the separation and sorting of data according to set requirements for various business or personal objectives.

In machine learning (ML), classification is used in predictive modeling to assign input data with a class label. For example, an email security program tasked with identifying spam might use natural language processing (NLP) to classify emails as being “spam” or “not spam.”

Examples of classification include spam filtering, sentiment analysis, image recognition, and medical diagnosis.

Normalization vs Standardization

Normalization typically means rescaling the values into a range of [0, 1].

Standardization typically means rescaling data to have a mean of 0 and a standard deviation of 1 (unit variance).
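
A minimal sketch contrasting the two with scikit-learn; the sample values are arbitrary:

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler, StandardScaler

    X = np.array([[1.0], [5.0], [10.0], [20.0]])

    print(MinMaxScaler().fit_transform(X).ravel())     # rescaled into [0, 1]
    print(StandardScaler().fit_transform(X).ravel())   # mean 0, standard deviation 1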

KNN Algorithm

The K-Nearest Neighbors (KNN) algorithm is a popular machine learning technique used for classification and regression tasks. It relies on the idea that similar data points tend to have similar labels or values.

During the training phase, the KNN algorithm stores the entire training dataset as a reference. When making predictions, it calculates the distance between the input data point and all the training examples, using a chosen distance metric such as Euclidean distance.

Next, the algorithm identifies the K nearest neighbors to the input data point based on their distances. In the case of classification, the algorithm assigns the most common class label among the K neighbors as the predicted label for the input data point. For regression, it calculates the average or weighted average of the target values of the K neighbors to predict the value for the input data point.
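
A minimal sketch of KNN classification with scikit-learn; the iris dataset, K = 5, and the Euclidean metric are illustrative choices:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean").fit(X_train, y_train)
    print(knn.score(X_test, y_test))   # accuracy from the majority vote of the 5 nearest neighbors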

K-Means Clustering

From the universe of unsupervised learning algorithms, K-means is probably the most recognized one. This algorithm has a clear objective: partition the data space in such a way so that data points within the same cluster are as similar as possible (intra-class similarity), while data points from different clusters are as dissimilar as possible (inter-class similarity).

K-means is an unsupervised learning method. In K-means, each cluster is represented by its center (called a "centroid"), which corresponds to the arithmetic mean of the data points assigned to the cluster; the centroid is not necessarily a member of the dataset. The algorithm works through an iterative process until each data point is closer to its own cluster's centroid than to any other cluster's centroid, minimizing intra-cluster distance at each step. But how?

K-means searches for a predetermined number of clusters within an unlabelled dataset by using an iterative method to produce a final clustering based on the number of clusters defined by the user (represented by the variable K). For example, by setting “k” equal to 2, your dataset will be grouped in 2 clusters, while if you set “k” equal to 4 you will group the data in 4 clusters.

K-means triggers its process with arbitrarily chosen data points as proposed centroids of the groups and iteratively recalculates new centroids in order to converge to a final clustering of the data points. Specifically, the process works as follows:

  • Choose K initial centroids (for example, K randomly selected data points).
  • Assign each data point to its nearest centroid.
  • Recompute each centroid as the mean of the data points assigned to it.
  • Repeat the assignment and update steps until the assignments no longer change (or a maximum number of iterations is reached).
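
A minimal sketch of K-means with scikit-learn on synthetic blob data; k = 3 and the dataset are illustrative choices:

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
    kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    print(kmeans.cluster_centers_)   # the three centroids
    print(kmeans.labels_[:10])       # cluster assignment of the first ten points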

Anomaly Detection and Outliers

An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism. Anomalies are instances or collections of data that occur very rarely in the data set and whose features differ significantly from most of the data.

Anomaly detection is the practice of examining specific data points and detecting rare occurrences that seem suspicious because they differ from the established pattern of behavior. Anomaly detection isn't new, but as data volumes increase, manual tracking becomes impractical.
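
A minimal sketch of one simple detection rule, flagging values whose z-score exceeds a chosen threshold; the data and the threshold of 2 standard deviations are illustrative assumptions:

    import numpy as np

    data = np.array([10.1, 9.8, 10.3, 9.9, 10.0, 45.0, 10.2])   # one suspicious value
    z_scores = (data - data.mean()) / data.std()
    print(data[np.abs(z_scores) > 2])                            # flags 45.0 as an outlier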

An anomaly detection strategy begins by identifying Key Performance Indicators (KPIs). These are typically tied to the business problem you're working to solve. You'll also need to understand the characteristics of your data. How does it flow into your network? Is it continuous or batch? What data points are you tracking? Answering these questions helps sculpt your strategy, as the data plays a major role in this process. Next, create a budget and set goals. Lastly, make sure each member of your team understands the goals and the role they play in achieving them.

Time Series

Time series analysis is used for non-stationary data—things that are constantly fluctuating over time or are affected by time. Industries like finance, retail, and economics frequently use time series analysis because currency and sales are always changing. Stock market analysis is an excellent example of time series analysis in action, especially with automated trading algorithms. Likewise, time series analysis is ideal for forecasting weather changes, helping meteorologists predict everything from tomorrow’s weather report to future years of climate change. Examples of time series analysis in action include:

  • Weather data
  • Rainfall measurements
  • Temperature readings
  • Heart rate monitoring (EKG)
  • Brain monitoring (EEG)
  • Quarterly sales
  • Stock prices
  • Automated stock trading
  • Industry forecasts
  • Interest rates
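
A minimal sketch of a basic time-series operation, a rolling average over hypothetical daily sales figures (the dates and values are invented):

    import pandas as pd

    dates = pd.date_range("2024-01-01", periods=7, freq="D")
    sales = pd.Series([100, 120, 90, 150, 130, 160, 170], index=dates)
    print(sales.rolling(window=3).mean())   # 3-day moving average smooths day-to-day noise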

Naïve Bayes Classifier

The Naïve Bayes classifier is a supervised machine learning algorithm, which is used for classification tasks, like text classification. It is also part of a family of generative learning algorithms, meaning that it seeks to model the distribution of inputs of a given class or category. Unlike discriminative classifiers, like logistic regression, it does not learn which features are most important to differentiate between classes.
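
A minimal sketch of Naïve Bayes text classification with scikit-learn; the tiny corpus and its labels are invented for illustration:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    texts = ["win a free prize now", "meeting at noon tomorrow",
             "free cash offer", "project status update"]
    labels = ["spam", "not spam", "spam", "not spam"]

    model = make_pipeline(CountVectorizer(), MultinomialNB()).fit(texts, labels)
    print(model.predict(["claim your free prize"]))   # likely "spam"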

More: https://www.ibm.com/topics/naive-bayes

Random Forest

Random forest is a supervised learning algorithm. The “forest” it builds is an ensemble of decision trees, usually trained with the bagging method. The general idea of the bagging method is that a combination of learning models increases the overall result.

One big advantage of random forest is that it can be used for both classification and regression problems, which form the majority of current machine learning systems.

A decision tree is a combination of decisions, and a random forest is a combination of many decision trees. Because it trains and evaluates many trees, a random forest is slower than a single decision tree, which is fast and easy to apply to large data, especially on regression tasks.

Let's look at random forest in classification, since classification is sometimes considered the building block of machine learning. The article linked below illustrates what a random forest model with two trees looks like.
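
A minimal sketch of a random forest classifier with scikit-learn; the iris dataset is an illustrative choice and 100 trees is simply the library default:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
    print(forest.score(X_test, y_test))   # accuracy of the combined (bagged) trees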

More: https://builtin.com/data-science/random-forest-algorithm

Lasso Regression

Lasso regression is like linear regression, but it uses a technique called "shrinkage," in which the regression coefficients are shrunk towards zero.

Linear regression gives you regression coefficients as observed in the dataset. The lasso regression allows you to shrink or regularize these coefficients to avoid overfitting and make them work better on different datasets.

This type of regression is used when the dataset shows high multicollinearity or when you want to automate variable elimination and feature selection.
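
A minimal sketch (synthetic data) of that feature-selection effect: lasso drives the coefficients of uninformative features to exactly zero. The data and the alpha value are illustrative assumptions:

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = 4 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.5, size=200)   # only 2 features matter

    lasso = Lasso(alpha=0.1).fit(X, y)
    print(lasso.coef_)   # close to 4 and 2 for the informative features, zero for the rest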

More: https://dataaspirant.com/lasso-regression/

Hypothesis Testing

When interpreting research findings, researchers need to assess whether these findings may have occurred by chance. Hypothesis testing is a systematic procedure for deciding whether the results of a research study support a particular theory which applies to a population.

Hypothesis testing uses sample data to evaluate a hypothesis about a population. A hypothesis test assesses how unusual the result is, whether it is reasonable chance variation or whether the result is too extreme to be considered chance variation.

More: https://latrobe.libguides.com/maths/hypothesis-testing

P-Value

The P value is defined as the probability under the assumption of no effect or no difference (null hypothesis), of obtaining a result equal to or more extreme than what was actually observed. The P stands for probability and measures how likely it is that any observed difference between groups is due to chance. Being a probability, P can take any value between 0 and 1. Values close to 0 indicate that the observed difference is unlikely to be due to chance, whereas a P value close to 1 suggests no difference between the groups other than due to chance. Thus, it is common in medical journals to see adjectives such as “highly significant” or “very significant” after quoting the P value depending on how close to zero the value is.

Before the advent of computers and statistical software, researchers depended on tabulated values of P to make decisions. This practice is now obsolete and the use of the exact P value is much preferred. Statistical software can give the exact P value and allows appreciation of the range of values that P can take between 0 and 1. Briefly, for example, weights of 18 subjects were taken from a community to determine if their body weight is ideal (i.e. 100 kg). Using Student's t test, t turned out to be 3.76 at 17 degrees of freedom. Comparing the t statistic with the tabulated values, t = 3.76 is more than the critical value of 2.11 at p = 0.05 and therefore falls in the rejection zone. Thus we reject the null hypothesis that μ = 100 and conclude that the difference is significant. But using SPSS (a statistical software package), the following information came out when the data were entered: t = 3.758, P = 0.0016, mean difference = 12.78, and confidence interval 5.60 to 19.95. Methodologists are now increasingly recommending that researchers report the precise P value, for example P = 0.023 rather than P < 0.05. Further, to use P = 0.05 "is an anachronism. It was settled on when P values were hard to compute and so some specific values needed to be provided in tables. Now calculating exact P values is easy (i.e., the computer does it) and so the investigator can report (P = 0.04) and leave it to the reader to (determine its significance)".

Hypothesis Tests

A statistical test provides a mechanism for making quantitative decisions about a process or processes. The purpose is to make inferences about a population parameter by analyzing differences between an observed sample statistic and the results one would expect if some underlying assumption were true. This comparison may be a single observed value versus some hypothesized quantity, or it may be between two or more related or unrelated groups. The choice of statistical test depends on the nature of the data and the study design.

Neyman and Pearson proposed this process to circumvent Fisher's subjective practice of assessing the strength of evidence against the null effect. In its usual form, two hypotheses are put forward: a null hypothesis (usually a statement of null effect) and an alternative hypothesis (usually the opposite of the null hypothesis). Based on the outcome of the hypothesis test, one hypothesis is rejected and the other accepted, using a previously determined benchmark, the significance level, against which the P value is compared. However, one runs the risk of making an error: one may reject one hypothesis when in fact it should be accepted, and vice versa. A type I error or α error occurs when we conclude there was a difference when really there was none, and a type II error or β error occurs when we conclude there was no difference when actually there was one. In its simple format, testing a hypothesis involves the following steps:

  • Identify null and alternative hypotheses.
  • Determine the appropriate test statistic and its distribution under the assumption that the null hypothesis is true.
  • Specify the significance level and determine the corresponding critical value of the test statistic under the assumption that null hypothesis is true.
  • Calculate the test statistic from the data and compare it with the critical value (or compare the P value with the significance level) to decide whether to reject the null hypothesis.
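
A minimal sketch of these steps as a one-sample t-test with SciPy; the weights below are invented, not the 18 subjects described above:

    import numpy as np
    from scipy import stats

    weights = np.array([95, 102, 110, 98, 105, 99, 101, 107, 97, 103])
    t_stat, p_value = stats.ttest_1samp(weights, popmean=100)   # H0: mean weight = 100 kg
    print(t_stat, p_value)
    if p_value < 0.05:            # compare the P value with the chosen significance level
        print("Reject the null hypothesis")
    else:
        print("Fail to reject the null hypothesis")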

Sampling

A sample is a subset of individuals from a larger population. Sampling means selecting the group that you will actually collect data from in your research. For example, if you are researching the opinions of students in your university, you could survey a sample of 100 students.

In statistics, sampling allows you to test a hypothesis about the characteristics of a population.
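
A minimal sketch of drawing a simple random sample; the population of student IDs is hypothetical:

    import numpy as np

    rng = np.random.default_rng(42)
    population = np.arange(1, 5001)                            # 5,000 student IDs
    sample = rng.choice(population, size=100, replace=False)   # random sample of 100 students
    print(sample[:10])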

SVMs

A support vector machine (SVM) is a supervised machine learning model that uses classification algorithms for two-group classification problems. After an SVM model is given sets of labeled training data for each category, it can categorize new text.

Compared to newer algorithms like neural networks, they have two main advantages: higher speed and better performance with a limited number of samples (in the thousands). This makes the algorithm very suitable for text classification problems, where it’s common to have access to a dataset of at most a couple of thousands of tagged samples.
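
A minimal sketch of an SVM for two-group text classification with scikit-learn; the tiny labeled corpus is invented for illustration:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    texts = ["great product, fast shipping", "terrible quality, broke quickly",
             "love it, works perfectly", "waste of money, very disappointed"]
    labels = ["positive", "negative", "positive", "negative"]

    model = make_pipeline(TfidfVectorizer(), LinearSVC()).fit(texts, labels)
    print(model.predict(["really disappointed with this purchase"]))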

More: https://monkeylearn.com/blog/introduction-to-support-vector-machines-svm/

Harmonic Mean

Harmonic mean is a type of average that is calculated by dividing the number of values in a data series by the sum of the reciprocals (1/x_i) of each value in the data series. A harmonic mean is one of the three Pythagorean means (the other two are arithmetic mean and geometric mean). The harmonic mean always shows the lowest value among the Pythagorean means.

The harmonic mean is often used to calculate the average of the ratios or rates. It is the most appropriate measure for ratios and rates because it equalizes the weights of each data point. For instance, the arithmetic mean places a high weight on large data points, while the geometric mean gives a lower weight to the smaller data points.

In finance, the harmonic mean is used to determine the average for financial multiples such as the price-to-earnings (P/E) ratio. The financial multiples should not be averaged using the arithmetic mean because it is biased toward larger values. One of the most common problems in finance that uses the harmonic mean is the calculation of the ratio of a portfolio that consists of several securities.
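
A minimal worked example with hypothetical P/E ratios, checked against Python's built-in implementation:

    from statistics import harmonic_mean

    pe_ratios = [10, 20, 40]
    n = len(pe_ratios)
    manual = n / sum(1 / x for x in pe_ratios)   # n divided by the sum of reciprocals
    print(manual, harmonic_mean(pe_ratios))      # both print roughly 17.14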

More: https://corporatefinanceinstitute.com/resources/data-science/harmonic-mean/

Bootstrapping

Bootstrapping is a statistical procedure that resamples a single dataset to create many simulated samples. This process allows you to calculate standard errors, construct confidence intervals, and perform hypothesis testing for numerous types of sample statistics. Bootstrap methods are alternative approaches to traditional hypothesis testing and are notable for being easier to understand and valid for more conditions.

Compared with conventional statistical methods, bootstrapping requires fewer assumptions about the underlying distribution, which can make it the better choice when those assumptions are questionable; a common application is constructing bootstrapped confidence intervals.
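
A minimal sketch of a bootstrapped 95% confidence interval for a mean; the small sample below is invented:

    import numpy as np

    rng = np.random.default_rng(0)
    sample = np.array([4.3, 5.1, 3.8, 6.0, 4.9, 5.5, 4.1, 5.8, 4.7, 5.2])

    boot_means = [rng.choice(sample, size=sample.size, replace=True).mean()
                  for _ in range(10_000)]                   # resample with replacement
    print(np.percentile(boot_means, [2.5, 97.5]))           # 95% confidence interval for the mean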

Multiple Linear Regression

Regression models are used to describe relationships between variables by fitting a line to the observed data. Regression allows you to estimate how a dependent variable changes as the independent variable(s) change.

Multiple linear regression is used to estimate the relationship between two or more independent variables and one dependent variable. You can use multiple linear regression when you want to know:

  • How strong the relationship is between two or more independent variables and one dependent variable (e.g. how rainfall, temperature, and amount of fertilizer added affect crop growth).
  • The value of the dependent variable at a certain value of the independent variables (e.g. the expected yield of a crop at certain levels of rainfall, temperature, and fertilizer addition).

Multiple linear regression example: You are a public health researcher interested in social factors that influence heart disease. You survey 500 towns and gather data on the percentage of people in each town who smoke, the percentage of people in each town who bike to work, and the percentage of people in each town who have heart disease.

Because you have two independent variables and one dependent variable, and all your variables are quantitative, you can use multiple linear regression to analyze the relationship between them.
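
A minimal sketch of that analysis with scikit-learn, using synthetic (invented) data standing in for the survey of towns:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    smoking = rng.uniform(5, 35, size=500)     # % of people who smoke, per town
    biking = rng.uniform(1, 75, size=500)      # % of people who bike to work, per town
    heart_disease = 15 + 0.3 * smoking - 0.2 * biking + rng.normal(scale=1.0, size=500)

    X = np.column_stack([smoking, biking])
    model = LinearRegression().fit(X, heart_disease)
    print(model.coef_, model.intercept_)       # estimated effect of each independent variable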

Confusion Matrix

A confusion matrix is a performance measurement for machine learning classification problems where the output can be two or more classes. For a binary problem, it is a table with 4 different combinations of predicted and actual values.

It is extremely useful for measuring Recall, Precision, Specificity, Accuracy, and most importantly AUC-ROC curves.

Let’s understand TP, FP, FN, TN in terms of pregnancy analogy.

True Positive:

Interpretation: You predicted positive and it’s true.

You predicted that a woman is pregnant and she actually is.

True Negative:

Interpretation: You predicted negative and it’s true.

You predicted that a man is not pregnant and he actually is not.

False Positive: (Type 1 Error)

Interpretation: You predicted positive and it’s false.

You predicted that a man is pregnant but he actually is not.

False Negative: (Type 2 Error)

Interpretation: You predicted negative and it’s false.

You predicted that a woman is not pregnant but she actually is.

Just remember: we describe predicted values as Positive and Negative, and actual values as True and False.
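
A minimal sketch of computing a confusion matrix and the related metrics with scikit-learn; the binary labels below are invented (1 = positive, 0 = negative):

    from sklearn.metrics import confusion_matrix, precision_score, recall_score

    y_actual    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
    y_predicted = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

    tn, fp, fn, tp = confusion_matrix(y_actual, y_predicted).ravel()
    print(tp, tn, fp, fn)                                   # TP, TN, FP, FN counts
    print(precision_score(y_actual, y_predicted),           # TP / (TP + FP)
          recall_score(y_actual, y_predicted))              # TP / (TP + FN)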

Univariate vs Bivariate vs Multivariate

Univariate data consists of only one variable. The analysis of univariate data is thus the simplest form of analysis, since the information deals with only one quantity that changes. It does not deal with causes or relationships, and the main purpose of the analysis is to describe the data and find patterns that exist within it. An example of univariate data is height.

Bivariate data involves two different variables. The analysis of this type of data deals with causes and relationships, and the analysis is done to find out the relationship between the two variables.

When the data involves three or more variables, it is categorized as multivariate. An example of this type of data: suppose an advertiser wants to compare the popularity of four advertisements on a website; their click rates could be measured for both men and women, and relationships between variables could then be examined. It is similar to bivariate analysis but contains more than one dependent variable. The way to perform analysis on this data depends on the goals to be achieved. Some of the techniques are regression analysis, path analysis, factor analysis, and multivariate analysis of variance (MANOVA).

Monte Carlo Simulation

The Monte Carlo simulation is a mathematical technique that predicts possible outcomes of an uncertain event. Computer programs use this method to analyze past data and predict a range of future outcomes based on a choice of action. For example, if you want to estimate the first month’s sales of a new product, you can give the Monte Carlo simulation program your historical sales data. The program will estimate different sales values based on factors such as general market conditions, product price, and advertising budget.

The Monte Carlo simulation provides multiple possible outcomes and the probability of each from a large pool of random data samples. It offers a clearer picture than a deterministic forecast. For instance, forecasting financial risks requires analyzing dozens or hundreds of risk factors. Financial analysts use the Monte Carlo simulation to produce the probability of every possible outcome.

Companies use Monte Carlo methods to assess risks and make accurate long-term predictions. The following are some examples of use cases.

Business

Business leaders use Monte Carlo methods to project realistic scenarios when making decisions. For example, a marketer needs to decide whether it’s feasible to increase the advertising budget for an online yoga course. They could use the Monte Carlo mathematical model on uncertain factors or variables such as the following:

  • Subscription fee
  • Advertising cost
  • Sign-up rate
  • Retention

The simulation would then predict the impact of changes on these factors to indicate whether the decision is profitable.

Finance

Financial analysts often make long-term forecasts on stock prices and then advise their clients of appropriate strategies. While doing so, they must consider market factors that could cause drastic changes to the investment value. As a result, they use the Monte Carlo simulation to predict probable outcomes to support their strategies.

Online gaming

Strict regulations govern the online gaming and betting industry. Customers expect gaming software to be fair and mimic the characteristics of its physical counterpart. Therefore, game programmers use the Monte Carlo method to simulate results and ensure a fair-play experience.
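
A minimal sketch of a Monte Carlo profit simulation built around the hypothetical yoga-course variables from the business example above; every distribution and number is invented for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    n_runs = 100_000

    subscription_fee = 30.0
    advertising_cost = rng.normal(5_000, 1_000, n_runs)   # uncertain ad spend
    sign_ups = rng.poisson(400, n_runs)                   # uncertain sign-up count
    retention = rng.uniform(0.6, 0.9, n_runs)             # uncertain retention rate

    profit = subscription_fee * sign_ups * retention - advertising_cost
    print(profit.mean())          # expected profit across the simulated scenarios
    print((profit > 0).mean())    # estimated probability the decision is profitable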

LLMs

A large language model (LLM) is a deep learning algorithm that can perform a variety of natural language processing (NLP) tasks. Large language models use transformer models and are trained using massive datasets — hence, large. This enables them to recognize, translate, predict, or generate text or other content.

Large language models are built on neural networks (NNs), which are computing systems inspired by the human brain. These neural networks work using a network of nodes that are layered, much like neurons.

In addition to teaching human languages to artificial intelligence (AI) applications, large language models can also be trained to perform a variety of tasks like understanding protein structures, writing software code, and more. Like the human brain, large language models must be pre-trained and then fine-tuned so that they can solve text classification, question answering, document summarization, and text generation problems. Their problem-solving capabilities can be applied to fields like healthcare, finance, and entertainment where large language models serve a variety of NLP applications, such as translation, chatbots, AI assistants, and so on.

Large language models also have large numbers of parameters, which are akin to memories the model collects as it learns from training. Think of these parameters as the model’s knowledge bank.

Ridge Regression

Ridge regression is a model-tuning method that is used to analyze any data that suffers from multicollinearity. This method performs L2 regularization. When multicollinearity occurs, least-squares estimates are unbiased but their variances are large, which results in predicted values being far away from the actual values.

More: https://www.mygreatlearning.com/blog/what-is-ridge-regression/

Polynomial Regression

  • Polynomial regression is a form of regression analysis in which the relationship between the independent variables and the dependent variable is modeled as an nth-degree polynomial.
  • Polynomial regression models are usually fit with the method of least squares; under the Gauss-Markov theorem, the least-squares estimator has the minimum variance among unbiased linear estimators.
  • Polynomial regression is a special case of linear regression in which we fit a polynomial equation to data with a curvilinear relationship between the dependent and independent variables.
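
A minimal sketch of fitting a degree-2 polynomial by least squares with scikit-learn; the synthetic curvilinear data is an illustrative assumption:

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    rng = np.random.default_rng(0)
    x = rng.uniform(-3, 3, size=(100, 1))
    y = 2 * x.ravel() ** 2 - x.ravel() + rng.normal(scale=1.0, size=100)

    model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(x, y)
    print(model.score(x, y))        # R^2 of the fitted curve
    print(model.predict([[1.5]]))   # prediction on the curve at x = 1.5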

More: https://medium.com/analytics-vidhya/understanding-polynomial-regression-5ac25b970e18

Quantile Regression

Regression is a statistical method broadly used in quantitative modeling. Multiple linear regression is a basic and standard approach in which researchers use the values of several variables to explain or predict the mean values of a scale outcome. However, in many circumstances, we are more interested in the median, or an arbitrary quantile of the scale outcome. Quantile regression models the relationship between a set of predictor (independent) variables and specific percentiles (or “quantiles”) of a target (dependent) variable, most often the median. It has two main advantages over Ordinary Least Squares regression:

  • Quantile regression makes no assumptions about the distribution of the target variable.
  • Quantile regression tends to resist the influence of outlying observations.

Quantile regression is widely used for research in industries such as ecology, healthcare, and financial economics.

Example: What is the relationship between total household income and the proportion of income that is spent on food? Engel's law is an observation in economics stating that as income rises, the proportion of income spent on food falls, even if absolute expenditure on food rises. Applying quantile regression to these data, you can estimate, for families with a given income, the level of food expenditure below which 90% of families fall, rather than just the mean food expense.
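
A minimal sketch of median and 90th-percentile regression with statsmodels; the synthetic income and food-spending figures are invented to echo the Engel's law example:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    income = rng.uniform(1_000, 10_000, size=300)
    food_spend = 300 + 0.3 * income + rng.normal(scale=0.05 * income)   # noisier at higher incomes

    X = sm.add_constant(income)
    median_fit = sm.QuantReg(food_spend, X).fit(q=0.5)   # median (0.5 quantile) regression
    q90_fit = sm.QuantReg(food_spend, X).fit(q=0.9)      # 90th percentile of food spending
    print(median_fit.params, q90_fit.params)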

Probability

"Statistically, the probability of any one of us being here is so small that you'd think the mere fact of existing would keep us all in a contented dazzlement of surprise." (Lewis Thomas)

For anyone taking first steps in data science, probability is a must-know concept. Concepts of probability theory are the backbone of many important ideas in data science, from inferential statistics to Bayesian networks. It would not be wrong to say that the journey of mastering statistics begins with probability.

More: https://www.analyticsvidhya.com/blog/2017/02/basic-probability-data-science-with-examples/