Hello! Welcome to Dimensionality Reduction using feature extraction and feature selection. Dimensionality Reduction is the process of reducing the number of variables, or features, under consideration. It can be divided into two subcategories: Feature Selection, which includes Wrappers, Filters, and Embedded methods, and Feature Extraction, which includes Principal Components Analysis.

So how exactly does Dimensionality Reduction improve performance? It does so by reducing the number of features
that are to be considered. To see how this works, think of a simple algebraic equation: a + b + c + d = e. If you can define a single new variable, ab = a + b (one name, not the product of a and b), representing two variables as one, you’re using Feature Extraction to reduce the number of variables. Now, consider that if c were equal to 0 or an arbitrarily small number, it wouldn’t really be relevant, so it could be taken out of the equation. By doing so, you’d be using Feature Selection, because you’d be selecting only the relevant variables and leaving out the irrelevant ones.

Feature Selection is the process of selecting
a subset of relevant features or variables. There are three main types of feature selection method:

* Wrappers
* Filters
* Embedded

To help you visualize how feature selection works, imagine a set of variables (let’s use a series of shapes, for example), with each shape representing a different dimension or feature. By ignoring the irrelevant variables, or selecting the ones that improve accuracy, we reduce the amount of strain on the system and produce
better results.

Wrappers use a predictive model that scores feature subsets based on the error rate of the model. While they’re computationally intensive, they usually produce the best-performing feature set for that model. A popular technique is called stepwise regression. It’s an algorithm that adds the best feature,
or deletes the worst feature at each iteration.

Filters use a proxy measure instead of the model’s error rate, which is less computationally intensive but slightly less accurate. So the selected features might give a good prediction, but it still may not be the best. Filters do capture the practicality of the dataset but, in comparison to error measurement, the feature set that’s selected will be more general than if a Wrapper were used. An interesting fact about filters is that they produce a feature set that doesn’t contain assumptions based on the predictive model, making them a useful tool for exposing relationships between features, such as which variables are ‘Bad’ together and, as a result, drop the accuracy or ‘Good’ together and therefore
raise the accuracy.

Embedded algorithms learn which features best contribute to an accurate model during the model building process. The most common type is called a regularization model. In our shape example, it would be similar to picking the shapes, or good features, in each step of the model building process. It might be picking the Triangle feature in Step One, picking the Cross feature in Step Two, or picking the Lightning feature in Step Three
to obtain our accurate model.

Feature Extraction is the process of transforming or projecting a space composed of many dimensions into a space of fewer dimensions. In other words, data represented in many dimensions is re-expressed in fewer. This is useful when you need to keep your information but want to reduce the resources that it may consume during processing. The main linear technique is called Principal Components Analysis. There are other linear and non-linear techniques, but reviewing them here is out of scope for this course.

Principal Components Analysis is the reduction
of higher-dimensional vector spaces to lower-dimensional ones through projection. It can be used to visualize the dataset through compact representation and compression of dimensions. An easy illustration of this would be the projection from a 3-dimensional space onto a 2-dimensional plane. A plane is first found which captures most (if not all) of the information. Then the data is projected onto new axes and
a reduction in dimensions occurs. When the projection happens, new axes are created to describe the relationship. These are called the principal axes, and the new data are called the principal components. This becomes a more compact visualization of the data and thus is easier to work with.

Thanks for watching!
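
To make the filter and wrapper ideas from the transcript a little more concrete, here is a minimal Python sketch using NumPy. Everything in it is an assumption made for illustration: the synthetic dataset, the helper names (`filter_scores`, `fit_error`, `forward_stepwise`), the choice of absolute Pearson correlation as the filter's proxy measure, and the use of training error from a least-squares fit as the wrapper's score. A real wrapper would normally score subsets on held-out data. Embedded methods such as regularization are not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): 200 samples, 4 features.
# Only features 0 and 2 influence the target; features 1 and 3 are noise.
n_samples = 200
X = rng.normal(size=(n_samples, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + 0.1 * rng.normal(size=n_samples)


def filter_scores(X, y):
    """Filter: score each feature by |Pearson correlation| with the target."""
    return np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    )


def fit_error(X_subset, y):
    """Mean squared training error of a least-squares fit on the given columns."""
    A = np.column_stack([X_subset, np.ones(len(y))])  # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean((A @ coef - y) ** 2))


def forward_stepwise(X, y, n_keep):
    """Wrapper: at each iteration, add the feature that lowers the error most."""
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < n_keep:
        best = min(remaining, key=lambda j: fit_error(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected


scores = filter_scores(X, y)
filter_choice = sorted(np.argsort(scores)[-2:].tolist())  # two highest scores
wrapper_choice = sorted(forward_stepwise(X, y, n_keep=2))

print("filter scores:", np.round(scores, 2))
print("filter keeps features:", filter_choice)
print("wrapper keeps features:", wrapper_choice)
```

The filter never fits a model, so it is cheap, while the wrapper refits the model for every candidate feature at every step, which is where the extra computational cost mentioned in the transcript comes from.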

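The projection from three dimensions down to two described in the transcript can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the data is synthetic and built to lie close to a plane, the variable names are made up, and the principal axes are found with a singular value decomposition of the centered data, which is one common way to compute them rather than necessarily the method the video has in mind.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 3-D data (an assumption for illustration) lying close to a plane:
# two latent coordinates drive all three features, plus a little noise.
latent = rng.normal(size=(300, 2))
mixing = np.array([[1.0, 0.5, 0.2],
                   [0.0, 1.0, -0.7]])
X = latent @ mixing + 0.05 * rng.normal(size=(300, 3))

# 1. Center the data so the new axes pass through the mean.
X_centered = X - X.mean(axis=0)

# 2. Find the principal axes: the right singular vectors of the centered
#    data, ordered from most to least variance captured.
_, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
principal_axes = Vt[:2]  # shape (2, 3): spans the best-fitting plane

# 3. Project onto the plane; the new coordinates are the principal components.
components = X_centered @ principal_axes.T  # shape (300, 2)

# Fraction of the total variance retained by the two axes that were kept.
explained = singular_values**2 / np.sum(singular_values**2)
retained = float(explained[:2].sum())

print("components shape:", components.shape)
print("variance retained:", round(retained, 4))
```

Because the synthetic points sit almost exactly on a plane, the two kept axes retain nearly all of the variance, which is the sense in which the plane "captures most (if not all) of the information".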
## 5 comments

## Ryan Denziloe

The "algebraic" explanation is very bad. In algebra, "ab" means the product of a and b, so involves two terms and is no more tractable than a + b. What you really meant is something like substituting x = a + b. But it's still a pretty bad analogy.

## Rush Sina

that is very nice and simple video. Thank you

## Azmain Yakin Srizon

Hey! I'm starting and it helped a lot. Just letting you know. Thanks!

## sanoopk sanu

it is a nice video. very helpful for me ….thank you

## amir mohammad babaee parsa

Thank you short and simple.