Posts

Random Forest (Easily Explained)

(With Python implementation in depth!)

Random Forest is an ensemble technique that can be used for both regression and classification tasks. An ensemble method combines the predictions of multiple machine learning models to make more accurate predictions than any individual model could. Random Forests combine the simplicity of decision trees with the flexibility of an ensemble, which often improves accuracy substantially over a single tree. Random Forest builds on “Bagging” (Bootstrap Aggregation), and its main goal is to reduce the variance of a decision tree. A fully grown decision tree has low bias and high variance, which leads to over-fitting; Random Forest reduces that variance by training many trees on random subsets of the rows and features and then aggregating their outputs (a majority vote for classification, an average for regression). The idea is to create several subsets of data from the training samples, chosen randomly with replacement...
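The bagging idea above can be sketched in plain Python. This is a minimal, simplified sketch, not the post's implementation: each "tree" is a one-split decision stump (real Random Forests grow much deeper trees), and the helper names `bootstrap_sample`, `fit_stump`, `fit_forest` and `predict` are hypothetical. Each stump is trained on a bootstrap sample (rows drawn with replacement) using a random subset of `max_features` features, and the forest predicts by majority vote. Because a stump makes only one split, picking features once per tree is equivalent to picking them per split, which is what full Random Forests do.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows uniformly at random *with replacement*."""
    return [rng.choice(rows) for _ in rows]

def fit_stump(rows, feature_ids):
    """Fit a one-split tree on the given features, minimizing training errors.
    rows are (features, label) pairs; each side predicts its majority label."""
    best = None
    for f in feature_ids:
        for threshold in sorted({x[f] for x, _ in rows}):
            left = [y for x, y in rows if x[f] <= threshold]  # never empty
            right = [y for x, y in rows if x[f] > threshold]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    _, feat, thr, left_lbl, right_lbl = best
    return lambda x: left_lbl if x[feat] <= thr else right_lbl

def fit_forest(rows, n_trees=25, max_features=1, seed=0):
    """Bagging + random feature selection: each stump sees a bootstrap
    sample of the rows and a random subset of max_features features."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = bootstrap_sample(rows, rng)
        features = rng.sample(range(n_features), max_features)
        forest.append(fit_stump(sample, features))
    return forest

def predict(forest, x):
    """Classification: majority vote over all trees."""
    return Counter(tree(x) for tree in forest).most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters labelled 0 and 1.
rows = [((1, 1), 0), ((2, 1), 0), ((1, 2), 0), ((2, 2), 0),
        ((8, 8), 1), ((9, 8), 1), ((8, 9), 1), ((9, 9), 1)]
forest = fit_forest(rows, n_trees=25, max_features=1, seed=0)
print(predict(forest, (0, 0)), predict(forest, (10, 10)))
```

Averaging many trees trained on different bootstrap samples is what reduces the variance; for real work, scikit-learn's `RandomForestClassifier` and `RandomForestRegressor` implement the full algorithm with deep trees and per-split feature sampling.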

Handle Imbalanced Dataset

(Along with Implementation in python!)

Let's take the example of a cancer patient dataset, where we check whether a person has cancer or not based on the input features. Suppose our dataset has 1000 records: 900 are patients with cancer and the remaining 100 are non-cancer patients. This is clearly an imbalanced dataset, as there are far more rows for people with cancer than for people without it. If we train a model on this imbalanced data and later test it on new data, the model will be heavily biased towards predicting "cancer". Its accuracy can even look misleadingly good: a model that predicts "cancer" for every patient is 90% accurate on this data while never identifying a single non-cancer patient. So how do we handle an imbalanced dataset? Let's look at some techniques that avoid this problem and train our model more reliably. Now, we will use a couple of techniques to resolve this imbalanced dataset problem...
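The accuracy trap described above, and one common fix (random oversampling of the minority class), can be sketched with the standard library. This is an illustrative sketch, not necessarily one of the techniques the post goes on to use; the labels are a hypothetical stand-in for the 900/100 example (1 = cancer, 0 = non-cancer), row ids stand in for real feature vectors, and `majority_baseline_scores` and `random_oversample` are hypothetical helper names.

```python
import random
from collections import Counter

def majority_baseline_scores(y):
    """Score a 'model' that always predicts the most common class.
    Returns (accuracy, recall on the least common class)."""
    counts = Counter(y)
    majority = counts.most_common()[0][0]
    minority = counts.most_common()[-1][0]
    predictions = [majority] * len(y)
    accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
    minority_hits = sum(p == t == minority for p, t in zip(predictions, y))
    return accuracy, minority_hits / counts[minority]

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class (sampling with
    replacement) until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows += extra
        out_labels += [cls] * len(extra)
    return out_rows, out_labels

# Hypothetical labels mirroring the example: 900 cancer (1), 100 non-cancer (0).
labels = [1] * 900 + [0] * 100
accuracy, minority_recall = majority_baseline_scores(labels)
print(f"accuracy={accuracy:.2f}, minority recall={minority_recall:.2f}")

X_balanced, y_balanced = random_oversample(list(range(len(labels))), labels)
print(Counter(y_balanced))
```

Oversampling should be applied only to the training split, after the train/test split; otherwise duplicated rows leak into the test set and inflate the scores. The imbalanced-learn library provides `RandomOverSampler` and `SMOTE` for the same purpose.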