Hello! Welcome to Dimensionality Reduction using feature extraction and feature selection. Dimensionality Reduction is the process of
reducing the number of variables, or features, under consideration. Dimensionality Reduction can be divided into
two subcategories: Feature Selection, which includes Wrappers, Filters, and Embedded methods, and Feature Extraction, which includes Principal
Components Analysis. So how exactly does Dimensionality Reduction
improve performance? It does so by reducing the number of features
that are to be considered. To see how this works, think of a simple algebraic
equation: a + b + c + d = e. If you can define a new variable ab = a + b, combining
two variables into one, you’re using Feature Extraction to reduce the number of variables. Now, consider if c were equal to 0 or an arbitrarily
small number. It wouldn’t really be relevant, so it could be taken out of the equation. By doing so, you’d be using Feature Selection
because you’d be selecting only the relevant variables and leaving out the irrelevant ones. Feature Selection is the process of selecting
a subset of relevant features or variables. There are three main types of feature selection methods:
* Wrappers,
* Filters, and
* Embedded methods.
To help you visualize how feature selection
works, imagine a set of variables (let’s use a series of shapes, for example), with each
shape representing a different dimension or feature. By ignoring the irrelevant variables, or selecting
the ones that improve accuracy, we reduce the amount of strain on the system and produce
better results. Wrappers use a predictive model that scores
feature subsets based on the error rate of the model. While they’re computationally intensive, they
usually produce the best selection of features. A popular technique is called stepwise regression. It’s an algorithm that adds the best feature
or deletes the worst feature at each iteration. Filters use a proxy measure, which is less
computationally intensive but slightly less accurate. So the resulting feature set might give good predictions, but it
still may not be the best. Filters do capture the practicality of the
dataset but, in comparison to error measurement, the feature set that’s selected will be more
general than if a Wrapper was used. An interesting fact about filters is that
they produce a feature set that doesn’t contain assumptions based on the predictive model,
making them a useful tool for exposing relationships between features, such as which variables
are ‘bad’ together and, as a result, lower the accuracy, or ‘good’ together and therefore
raise the accuracy. Embedded algorithms learn which features
best contribute to an accurate model during the model-building process. The most common type is called a regularization
model. In our shape example, it would be similar
to picking the shapes, or good features, at each step of the model-building process. It might mean picking the Triangle feature in
Step One, picking the Cross feature in Step Two, or picking the Lightning feature in Step Three
to obtain our accurate model. Feature Extraction is the process of transforming
or projecting a space composed of many dimensions into a space of fewer dimensions. In other words, data represented in many
dimensions is re-expressed in fewer. This is useful when you need to keep your
information but want to reduce the resources that it may consume during processing. The main linear technique is called Principal
Components Analysis. There are other linear and non-linear techniques,
but reviewing them here is out of scope for this course. Principal Components Analysis is the reduction
of higher-dimensional vector spaces to lower-dimensional ones through projection. It can be used to visualize the dataset through
compact representation and compression of dimensions. An easy representation of this would be the
projection from a 3-dimensional space onto a 2-dimensional plane. A plane is first found which captures most
(if not all) of the information. Then the data is projected onto new axes and
a reduction in dimensions occurs. When the projection of components happens,
new axes are created to describe the relationship. These are called the principal axes, and the
new data are called the principal components. This becomes a more compact visualization
of the data and is thus easier to work with. Thanks for watching!
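
For readers following along with the transcript, the projection idea from the Principal Components Analysis segment can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions, not the method used in the course: the data points are made up, the helper names (mean_center, covariance, principal_axis, project) are hypothetical, and the principal axis is approximated with power iteration rather than a library routine. It reduces 2-dimensional data to 1 dimension.

```python
import math

def mean_center(rows):
    """Subtract each column's mean so the data is centered at the origin."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    return [[r[j] - means[j] for j in range(dims)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n, dims = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1)
             for j in range(dims)] for i in range(dims)]

def principal_axis(cov, iterations=200):
    """Approximate the direction of greatest variance by power iteration.

    Assumes the covariance matrix is non-zero and that the starting
    vector is not orthogonal to the dominant direction.
    """
    dims = len(cov)
    v = [1.0] * dims
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def project(centered, axis):
    """Project each centered point onto the axis (one number per point)."""
    return [sum(x * a for x, a in zip(r, axis)) for r in centered]

# Made-up 2-D data lying close to the line y = x (illustrative only).
data = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8], [5.0, 5.0]]

centered = mean_center(data)
axis = principal_axis(covariance(centered))   # the first principal axis
components = project(centered, axis)          # the 1-D principal components
```

Here axis plays the role of the new axis described above, and components is the lower-dimensional version of the data. Because the points sit close to a straight line, a single number per point keeps most of the variation in the original two columns.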