2018.12.2 Fix hyperlinks and formatting issues.

This commit is contained in:
DESKTOP-SAT83DL\yimeng.zhang
2018-12-02 22:44:06 +08:00
parent 149c80cedf
commit 918bf6168c
3 changed files with 216 additions and 136 deletions

4
.gitignore vendored

@@ -3,4 +3,6 @@ __pycache__
.ipynb_checkpoints
.gitignore.bak
history
README_bk.md
README_bk.md
A Short Guide for Feature Engineering and Feature Selection.docx
A Short Guide for Feature Engineering and Feature Selection.html

A Short Guide for Feature Engineering and Feature Selection.md

@@ -1,6 +1,75 @@
**Table of Contents**:
[TOC]
A Short Guide for Feature Engineering and Feature Selection
0. Basic Concepts
0.1 What is Machine Learning
0.2 Methodology
0.3 Typical Tasks
0.4 Terminology
1. Data Exploration
1.1 Variables
1.2 Variable Identification
1.3 Univariate Analysis
1.4 Bi-variate Analysis
2. Feature Cleaning
2.1 Missing Values
2.1.1 Why Missing Data Matters
2.1.2 Missing Mechanisms
2.1.3 How to Assume a Missing Mechanism
2.1.4 How to Handle Missing Data
2.2 Outliers
2.2.1 Why Outlier Matters
2.2.2 Outlier Detection
2.2.3 How to Handle Outliers
2.3 Rare Values
2.3.1 Why Rare Value Matters
2.3.2 How to Handle Rare Value
2.4 High Cardinality
2.4.1 Why High Cardinality Matters
2.4.2 How to Handle High Cardinality
3. Feature Engineering
3.1 Feature Scaling
3.1.1 Why Feature Scaling Matters
3.1.2 How to Handle Feature Scaling
3.2 Discretize
3.2.1 Why Discretize Matters
3.2.2 How to Handle Discretization
3.3 Feature Encoding
3.3.1 Why Feature Encoding Matters
3.3.2 How to Handle Feature Encoding
3.4 Feature Transformation
3.4.1 Why Feature Transformation Matters
3.4.1.1 Linear Assumption
3.4.1.2 Variable Distribution
3.4.2 How to Handle Feature Transformation
3.5 Feature Generation
3.5.1 Missing Data Derived Feature
3.5.2 Simple Statistical Derived Feature
3.5.3 Feature Crossing
3.5.4 Ratios and Proportions
3.5.5 Cross Products between Categorical Features
3.5.6 Polynomial Expansion
3.5.7 Feature Learning by Trees
3.5.8 Feature Learning by Deep Networks
4. Feature Selection
4.1 Filter Method
4.2 Wrapper Method
4.2.1 Forward Selection
4.2.2 Backward Elimination
4.2.3 Exhaustive Feature Selection
4.2.4 Genetic Algorithm
4.3 Embedded Method
4.3.1 Regularization with Lasso
4.3.2 Random Forest Importance
4.3.3 Gradient Boosted Trees Importance
4.4 Feature Shuffling
4.5 Hybrid Method
4.5.1 Recursive Feature Elimination
4.5.2 Recursive Feature Addition
4.6 Dimensionality Reduction
5. Data Leakage
# A Short Guide for Feature Engineering and Feature Selection
@@ -137,9 +206,9 @@ Descriptive statistics between two or more variables.
A study on the impact of missing data on different ML algorithms can be found [here](http://core.ecu.edu/omgt/krosj/IMDSDataMining2003.pdf).
#### 2.1.2 Missing Mechanisms[^1]
#### 2.1.2 Missing Mechanisms [1]
It is important to understand the mechanisms by which missing fields are introduced in a dataset. Depending on the mechanism, we may choose to process the missing values differently. The mechanisms were first introduced by Rubin[^2].
It is important to understand the mechanisms by which missing fields are introduced in a dataset. Depending on the mechanism, we may choose to process the missing values differently. The mechanisms were first introduced by Rubin [2].
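To make the distinction concrete, here is a minimal sketch (illustrative only, on made-up data; the column names are hypothetical) of how one might start probing the mechanism: encode the missingness itself as an indicator and check whether it relates to the observed variables.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'age' goes missing more often for high earners (MAR by construction).
rng = np.random.RandomState(0)
df = pd.DataFrame({'age': rng.normal(40, 10, 1000),
                   'income': rng.normal(50, 15, 1000)})
df.loc[df['income'] > 65, 'age'] = np.nan

# Flag the missingness, then check whether it relates to other observed variables.
df['age_missing'] = df['age'].isnull().astype(int)
print(df['age_missing'].mean())                    # overall missing rate
print(df.groupby('age_missing')['income'].mean())  # a large gap hints at MAR rather than MCAR
```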
**Missing Completely at Random**
@@ -201,7 +270,7 @@ simultaneously, so that we both catch the value of missingness and obtain a comp
### 2.2 Outliers
**Definition**: An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism.[^3]
**Definition**: An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism. [3]
**Note**: Outliers, depending on the context, either deserve special attention or should be completely ignored. For example, an unusual transaction on a credit card is usually a sign of fraudulent activity, while a person's height of 1600 cm is very likely due to measurement error and should be filtered out or imputed with something else.
@@ -219,16 +288,16 @@ On the other hand some algorithm are more robust to outliers. For example, decis
#### 2.2.2 Outlier Detection
In fact, outlier analysis and anomaly detection form a huge field of research. Charu Aggarwal's book "Outlier Analysis"[^4] offers great insight into the topic. PyOD[^5] is a comprehensive Python toolkit which contains many of the advanced methods in this field.
In fact, outlier analysis and anomaly detection form a huge field of research. Charu Aggarwal's book "Outlier Analysis" [4] offers great insight into the topic. PyOD [5] is a comprehensive Python toolkit which contains many of the advanced methods in this field.
All the methods listed here are for univariate outlier detection. Multivariate outlier detection is beyond the scope of this guide.
| Method | Definition | Pros | Cons |
| ---------------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ | ------------------------------------------------------------ |
| Detect by arbitrary boundary | identify outliers based on arbitrary boundaries | flexible | requires business understanding |
| Mean & Standard Deviation method[^6][^7] | outlier detection by the Mean & Standard Deviation Method | good for variables with a Gaussian distribution (68-95-99 rule) | sensitive to the extreme values themselves (outliers inflate the standard deviation) |
| IQR method[^8] | outlier detection by the Interquartile Range Rule | more robust than the Mean & SD method as it uses quantiles & IQR; resilient to extremes | can be too aggressive |
| MAD method[^6][^7] | outlier detection by the Median and Median Absolute Deviation Method | more robust than the Mean & SD method; resilient to extremes | can be too aggressive |
| Mean & Standard Deviation method [6],[7] | outlier detection by the Mean & Standard Deviation Method | good for variables with a Gaussian distribution (68-95-99 rule) | sensitive to the extreme values themselves (outliers inflate the standard deviation) |
| IQR method [8] | outlier detection by the Interquartile Range Rule | more robust than the Mean & SD method as it uses quantiles & IQR; resilient to extremes | can be too aggressive |
| MAD method [6],[7] | outlier detection by the Median and Median Absolute Deviation Method | more robust than the Mean & SD method; resilient to extremes | can be too aggressive |
However, beyond these methods, it's more important to keep in mind that the business context should govern how you define and react to these outliers. The meanings of your findings should be dictated by the underlying context, rather than the number itself.
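For illustration only (a sketch on toy data, not code from the repo's demos; the 3-SD and 3.5 robust z-score cut-offs are common conventions, not prescriptions), here are the three rule-based detectors side by side:

```python
import pandas as pd

s = pd.Series([10, 12, 11, 13, 12, 95, 11, 10, 12, 14])  # toy data with one extreme value

# Mean & Standard Deviation method: flag points beyond mean +/- 3 * std.
mean, std = s.mean(), s.std()
print(s[(s - mean).abs() > 3 * std])  # empty here: 95 inflates the std, the weakness noted above

# IQR method: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
print(s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)])  # flags 95

# MAD method: flag points whose robust z-score exceeds ~3.5.
median = s.median()
mad = (s - median).abs().median()
print(s[(0.6745 * (s - median) / mad).abs() > 3.5])  # flags 95
```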
@@ -331,7 +400,6 @@ All these methods attempt to group some of the labels and reduce cardinality. Gr
A comparison of three methods when facing outliers:
<div align=center>
![scaling](images/scaling.png)
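A minimal sketch (toy data, assuming scikit-learn; not the repo's own demo code) that reproduces the spirit of this comparison:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])  # toy column with one outlier

for scaler in (StandardScaler(), MinMaxScaler(), RobustScaler()):
    print(type(scaler).__name__, scaler.fit_transform(X).ravel().round(2))
# RobustScaler centers on the median and scales by the IQR, so the bulk of the data
# keeps a usable spread; MinMaxScaler squashes it toward 0 because of the outlier.
```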
@@ -377,7 +445,7 @@ Below is some additional resource on this topic:
| Equal frequency binning | divides the scope of possible values of the variable into N bins, where each bin carries the same number of observations | may help boost the algorithm's performance | this arbitrary binning may disrupt the relationship with the target |
| K-means binning | using k-means to partition values into clusters | / | needs hyper-parameter tuning |
| Discretization using decision trees | using a decision tree to identify the optimal splitting points that would determine the bins | observations within each bin are more similar to themselves than to those of other bins | 1. may cause over-fitting<br>2. may not get a good performing tree |
| ChiMerge[^11] | supervised hierarchical bottom-up (merge) method that locally exploits the chi-square criterion to decide whether two adjacent intervals are similar enough to be merged | robust and makes use of a priori knowledge | cannot handle unlabeled data |
| ChiMerge[11] | supervised hierarchical bottom-up (merge) method that locally exploits the chi-square criterion to decide whether two adjacent intervals are similar enough to be merged | robust and makes use of a priori knowledge | cannot handle unlabeled data |
In general there is no best choice of discretization method; it depends on the dataset and the learning algorithm that follows. Study your features and context carefully before deciding. You can also try different methods and compare model performance.
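A quick sketch of the first three binning methods on synthetic data (illustrative only; `pd.cut`, `pd.qcut` and `KMeans` are standard pandas/scikit-learn utilities):

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

x = pd.Series(np.random.RandomState(42).exponential(scale=10, size=200))

equal_width = pd.cut(x, bins=5)   # equal width binning: same-sized intervals
equal_freq = pd.qcut(x, q=5)      # equal frequency binning: same count per bin

# K-means binning: cluster the 1-D values and use the cluster id as the bin.
km = KMeans(n_clusters=5, n_init=10, random_state=42).fit(x.values.reshape(-1, 1))
kmeans_bin = pd.Series(km.labels_, index=x.index)

print(equal_width.value_counts().sort_index())
print(equal_freq.value_counts().sort_index())
```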
@@ -399,8 +467,8 @@ We must transform strings of categorical variables into numbers so that algorith
| Ordinal-encoding | replace the labels by ordinal numbers if the ordering is meaningful | straightforward | does not add additional value to make the variable more predictive |
| Count/frequency encoding | replace each label of the categorical variable by the count/frequency within that category | / | 1. may yield the same encoding for two different labels (if they appear the same number of times) and lose valuable info.<br />2. may not add predictive power |
| Mean encoding | replace the label by the mean of the target for that label. (the target must be 0/1 valued or continuous) | 1. Capture information within the label, therefore rendering more predictive features<br/>2. Create a monotonic relationship between the variable and the target<br>3. Do not expand the feature space | Prone to cause over-fitting |
| WOE encoding[^9] | replace the label with the Weight of Evidence of each label. WOE is computed from the basic odds ratio: ln( (Proportion of Good Outcomes) / (Proportion of Bad Outcomes) ) | 1. Establishes a monotonic relationship to the dependent variable<br/>2. Orders the categories on a "logistic" scale which is natural for logistic regression<br>3. The transformed variables can then be compared because they are on the same scale, making it possible to determine which one is more predictive | 1. May incur loss of information (variation) due to binning into few categories<br/>2. Prone to cause over-fitting |
| Target encoding[^10] | similar to mean encoding, but uses both the posterior and the prior probability of the target | 1. Capture information within the label, therefore rendering more predictive features<br/>2. Create a monotonic relationship between the variable and the target<br/>3. Do not expand the feature space | Prone to cause over-fitting |
| WOE encoding[9] | replace the label with the Weight of Evidence of each label. WOE is computed from the basic odds ratio: ln( (Proportion of Good Outcomes) / (Proportion of Bad Outcomes) ) | 1. Establishes a monotonic relationship to the dependent variable<br/>2. Orders the categories on a "logistic" scale which is natural for logistic regression<br>3. The transformed variables can then be compared because they are on the same scale, making it possible to determine which one is more predictive | 1. May incur loss of information (variation) due to binning into few categories<br/>2. Prone to cause over-fitting |
| Target encoding[10] | similar to mean encoding, but uses both the posterior and the prior probability of the target | 1. Capture information within the label, therefore rendering more predictive features<br/>2. Create a monotonic relationship between the variable and the target<br/>3. Do not expand the feature space | Prone to cause over-fitting |
**Note**: if we use one-hot encoding in linear regression, we should keep k-1 binary variables to avoid multicollinearity. This holds for any algorithm that looks at all features at the same time during training, including SVMs, neural networks and clustering methods. Tree-based algorithms, on the other hand, need the entire set of binary variables to select the best split.
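A minimal sketch, on a made-up toy frame, of three of the encodings above using plain pandas (illustration only, not taken from the repo's demos):

```python
import pandas as pd

df = pd.DataFrame({'city': ['A', 'B', 'A', 'C', 'B', 'A'],
                   'target': [1, 0, 1, 0, 1, 0]})

# One-hot encoding; drop_first=True keeps k-1 dummies, per the note above.
onehot = pd.get_dummies(df['city'], prefix='city', drop_first=True)

# Count encoding: map each label to how many times it appears.
df['city_count'] = df['city'].map(df['city'].value_counts())

# Mean encoding: map each label to the target mean for that label.
# In practice, compute this mapping on the training data only, to limit over-fitting.
df['city_mean'] = df['city'].map(df.groupby('city')['target'].mean())

print(pd.concat([df, onehot], axis=1))
```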
@@ -464,16 +532,16 @@ In the situations above, transformation of the original variable can help give t
| Reciprocal transformation | 1/x. Note that x must not be 0. |
| Square root transformation | x**(1/2) |
| Exponential transformation | X**(m) |
| Box-cox transformation[^12] | (X**λ-1)/λ |
| Box-cox transformation[12] | (X**λ-1)/λ |
| Quantile transformation | transform features using quantiles information |
**Log transformation** is useful when applied to skewed distributions, as it tends to expand the values that fall in the range of lower magnitudes and to compress the values that fall in the range of higher magnitudes, which helps make the skewed distribution as normal-like as possible. **Square root transformation** does a similar thing in this sense.
**Box-Cox transformation** in sklearn[^13] is another popular function belonging to the power transform family of functions. This function has a pre-requisite that the numeric values to be transformed must be positive (similar to what log transform expects). In case they are negative, shifting using a constant value helps. Mathematically, the Box-Cox transform function can be denoted as follows.
**Box-Cox transformation** in sklearn [13] is another popular function belonging to the power transform family of functions. This function has a pre-requisite that the numeric values to be transformed must be positive (similar to what log transform expects). In case they are negative, shifting using a constant value helps. Mathematically, the Box-Cox transform function can be denoted as follows.
![](images/box-cox.png)
**Quantile transformation** in sklearn[^14] transforms the features to follow a uniform or a normal distribution. Therefore, for a given feature, this transformation tends to spread out the most frequent values. It also reduces the impact of (marginal) outliers: this is therefore a robust preprocessing scheme. However, this transform is non-linear. It may distort linear correlations between variables measured at the same scale but renders variables measured at different scales more directly comparable.
**Quantile transformation** in sklearn [14] transforms the features to follow a uniform or a normal distribution. Therefore, for a given feature, this transformation tends to spread out the most frequent values. It also reduces the impact of (marginal) outliers: this is therefore a robust preprocessing scheme. However, this transform is non-linear. It may distort linear correlations between variables measured at the same scale but renders variables measured at different scales more directly comparable.
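A short sketch of both transformers on synthetic log-normal data (illustrative only):

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer, QuantileTransformer

# Skewed, strictly positive synthetic data (Box-Cox requires positive inputs).
X = np.random.RandomState(0).lognormal(size=(500, 1))

boxcox = PowerTransformer(method='box-cox').fit_transform(X)
quantile = QuantileTransformer(output_distribution='normal',
                               n_quantiles=100).fit_transform(X)

# Both outputs are far closer to a standard normal than the raw data.
print(X.mean(), boxcox.mean(), quantile.mean())
```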
@@ -481,8 +549,6 @@ We can use **Q-Q plot** to check if the variable is normally distributed (a 45 d
Below is an example showing the effect of sklearn's Box-Cox/Yeo-Johnson/Quantile transforms mapping data from various distributions to a normal distribution.
<div align=center>
![sphx_glr_plot_map_data_to_normal_001](./images/sphx_glr_plot_map_data_to_normal_001.png)
[img source](https://scikit-learn.org/stable/auto_examples/preprocessing/plot_map_data_to_normal.html#sphx-glr-auto-examples-preprocessing-plot-map-data-to-normal-py)
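A quick sketch of drawing such a Q-Q plot with scipy (synthetic data, for illustration):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.random.RandomState(0).exponential(size=300)  # clearly non-normal toy data

# Points fall near the 45-degree line only when the data is roughly normal;
# applying np.log(x) first should straighten the plot considerably.
stats.probplot(x, dist="norm", plot=plt)
plt.show()
```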
@@ -797,7 +863,7 @@ The difference between this method and the step forward feature selection is si
## 5. Data Leakage
This section is a reminder to myself, as I have made huge mistakes by not being aware of this problem. Data leakage is when information from outside the training dataset is used to create the model[^15]. The result is that you may be creating overly optimistic models that are practically useless and cannot be used in production. The model shows great results on both your training and testing data, yet not because it truly generalizes well, but because it uses information from the test data.
This section is a reminder to myself, as I have made huge mistakes by not being aware of this problem. Data leakage is when information from outside the training dataset is used to create the model [15]. The result is that you may be creating overly optimistic models that are practically useless and cannot be used in production. The model shows great results on both your training and testing data, yet not because it truly generalizes well, but because it uses information from the test data.
While it is well known that we should use cross-validation, or at least set aside a validation set, when training and evaluating models, people easily forget to do the same during the feature engineering & selection process. Keep in mind that the test dataset must not be used in any way to make choices about the model, including feature engineering & selection.
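One common safeguard, sketched here with scikit-learn's `Pipeline` (illustrative; the dataset is just a stand-in): keep every preprocessing step inside the cross-validation loop so its statistics are learned from the training folds only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# The scaler is re-fit on each training fold, so no statistics (mean/std)
# from a held-out fold ever reach the model.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
print(cross_val_score(pipe, X, y, cv=5).mean())
```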
@@ -807,18 +873,32 @@ While it is well known to use cross-validation or at least separate a validation
**Reference**
[^1]: http://www.simonqueenborough.info/R/basic/missing-data
[^2]: Rubin, D. B. (1976). Inference and missing data. Biometrika 63(3): 581-592.
[^3]: D. Hawkins. Identification of Outliers, Chapman and Hall, 1980.
[^4]: https://www.springer.com/gp/book/9781461463955
[^5]: https://github.com/yzhao062/pyod
[^6]: https://docs.oracle.com/cd/E40248_01/epm.1112/cb_statistical/frameset.htm?ch07s02s10s01.html
[^7]: https://www.academia.edu/5324493/Detecting_outliers_Do_not_use_standard_deviation_around_the_mean_use_absolute_deviation_around_the_median
[^8]: https://www.purplemath.com/modules/boxwhisk3.htm
[^9]: http://documentation.statsoft.com/StatisticaHelp.aspx?path=WeightofEvidence/WeightofEvidenceWoEIntroductoryOverview
[^10]: A Preprocessing Scheme for High-Cardinality Categorical Attributes in Classification and Prediction Problems. https://kaggle2.blob.core.windows.net/forum-message-attachments/225952/7441/high%20cardinality%20categoricals.pdf
[^11]: https://www.aaai.org/Papers/AAAI/1992/AAAI92-019.pdf
[^12]: http://onlinestatbook.com/2/transformations/box-cox.html
[^13]: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PowerTransformer.html#sklearn.preprocessing.PowerTransformer
[^14]: https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html#sklearn.preprocessing.QuantileTransformer
[^15]: https://machinelearningmastery.com/data-leakage-machine-learning/
1. http://www.simonqueenborough.info/R/basic/missing-data
2. Rubin, D. B. (1976). Inference and missing data. Biometrika 63(3): 581-592.
3. D. Hawkins. Identification of Outliers, Chapman and Hall, 1980.
4. https://www.springer.com/gp/book/9781461463955
5. https://github.com/yzhao062/pyod
6. https://docs.oracle.com/cd/E40248_01/epm.1112/cb_statistical/frameset.htm?ch07s02s10s01.html
7. https://www.academia.edu/5324493/Detecting_outliers_Do_not_use_standard_deviation_around_the_mean_use_absolute_deviation_around_the_median
8. https://www.purplemath.com/modules/boxwhisk3.htm
9. http://documentation.statsoft.com/StatisticaHelp.aspx?path=WeightofEvidence/WeightofEvidenceWoEIntroductoryOverview
10. A Preprocessing Scheme for High-Cardinality Categorical Attributes in Classification and Prediction Problems. https://kaggle2.blob.core.windows.net/forum-message-attachments/225952/7441/high%20cardinality%20categoricals.pdf
11. https://www.aaai.org/Papers/AAAI/1992/AAAI92-019.pdf
12. http://onlinestatbook.com/2/transformations/box-cox.html
13. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PowerTransformer.html#sklearn.preprocessing.PowerTransformer
14. https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html#sklearn.preprocessing.QuantileTransformer
15. https://machinelearningmastery.com/data-leakage-machine-learning/

202
README.md

@@ -2,7 +2,7 @@
## About
A comprehensive [guide]() for **Feature Engineering** and **Feature Selection**, with implementations and examples in Python.
A comprehensive [guide](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#452-recursive-feature-addition) for **Feature Engineering** and **Feature Selection**, with implementations and examples in Python.
@@ -37,111 +37,106 @@ To run the demos or use the customized function, please download the ZIP file f
## Table of Contents and Code Examples
Below is a list of methods currently implemented in the repo. The complete guide can be found [here]().
Below is a list of methods currently implemented in the repo. The complete guide can be found [here](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md).
**1. Data Exploration**
- **1. Data Exploration**
- 1.1 Variables
- 1.2 Variable Identification
- Check Data Types [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#12-variable-identification) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/1_Demo_Data_Explore.ipynb)
- 1.3 Univariate Analysis
- Descriptive Analysis [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#13-univariate-analysis) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/1_Demo_Data_Explore.ipynb)
- Discrete Variable Barplot [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#13-univariate-analysis) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/1_Demo_Data_Explore.ipynb)
- Discrete Variable Countplot [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#13-univariate-analysis) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/1_Demo_Data_Explore.ipynb)
- Discrete Variable Boxplot [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#13-univariate-analysis) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/1_Demo_Data_Explore.ipynb)
- Continuous Variable Distplot [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#13-univariate-analysis) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/1_Demo_Data_Explore.ipynb)
- 1.4 Bi-variate Analysis
- Scatter Plot [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#14-bi-variate-analysis) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/1_Demo_Data_Explore.ipynb)
- Correlation Plot [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#14-bi-variate-analysis) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/1_Demo_Data_Explore.ipynb)
- Heat Map [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#14-bi-variate-analysis) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/1_Demo_Data_Explore.ipynb)
1.1 Variables
1.2 Variable Identification
Check Data Types
1.3 Univariate Analysis
Descriptive Analysis
Discrete Variable Barplot
Discrete Variable Countplot
Discrete Variable Boxplot
Continuous Variable Distplot
1.4 Bi-variate Analysis
Scatter Plot
Correlation Plot
Heat Map
- **2. Feature Cleaning**
- 2.1 Missing Values
- Missing Value Check [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#214-how-to-handle-missing-data) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/2.1_Demo_Missing_Data.ipynb)
- Listwise Deletion [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#214-how-to-handle-missing-data) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/2.1_Demo_Missing_Data.ipynb)
- Mean/Median/Mode Imputation [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#214-how-to-handle-missing-data) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/2.1_Demo_Missing_Data.ipynb)
- End of distribution Imputation [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#214-how-to-handle-missing-data) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/2.1_Demo_Missing_Data.ipynb)
- Random Imputation [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#214-how-to-handle-missing-data) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/2.1_Demo_Missing_Data.ipynb)
- Arbitrary Value Imputation [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#214-how-to-handle-missing-data) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/2.1_Demo_Missing_Data.ipynb)
- Add a variable to denote NA [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#214-how-to-handle-missing-data) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/2.1_Demo_Missing_Data.ipynb)
- 2.2 Outliers
- Detect by Arbitrary Boundary [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#222-outlier-detection) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/2.2_Demo_Outlier.ipynb)
- Detect by Mean & Standard Deviation [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#222-outlier-detection) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/2.2_Demo_Outlier.ipynb)
- Detect by IQR [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#222-outlier-detection) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/2.2_Demo_Outlier.ipynb)
- Detect by MAD [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#222-outlier-detection) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/2.2_Demo_Outlier.ipynb)
- Mean/Median/Mode Imputation [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#223-how-to-handle-outliers) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/2.2_Demo_Outlier.ipynb)
- Discretization [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#223-how-to-handle-outliers) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/3.2_Demo_Discretisation.ipynb)
- Imputation with Arbitrary Value [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#223-how-to-handle-outliers) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/2.2_Demo_Outlier.ipynb)
- Winsorization [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#223-how-to-handle-outliers) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/2.2_Demo_Outlier.ipynb)
- Discard Outliers [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#223-how-to-handle-outliers) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/2.2_Demo_Outlier.ipynb)
- 2.3 Rare Values
- Mode Imputation [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#23-rare-values) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/2.3_Demo_Rare_Values.ipynb)
- Grouping into One New Category [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#23-rare-values) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/2.3_Demo_Rare_Values.ipynb)
- 2.4 High Cardinality
- Grouping Labels with Business Understanding [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#24-high-cardinality)
- Grouping Labels with Rare Occurrence into One Category [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#24-high-cardinality) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/2.3_Demo_Rare_Values.ipynb)
- Grouping Labels with Decision Tree [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#24-high-cardinality) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/3.2_Demo_Discretisation.ipynb)
**2. Feature Cleaning**
- **3. Feature Engineering**
- 3.1 Feature Scaling
- Normalization - Standardization [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#31-feature-scaling) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/3.1_Demo_Feature_Scaling.ipynb)
- Min-Max Scaling [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#31-feature-scaling) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/3.1_Demo_Feature_Scaling.ipynb)
- Robust Scaling [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#31-feature-scaling) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/3.1_Demo_Feature_Scaling.ipynb)
- 3.2 Discretize
- Equal Width Binning [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#32-discretize) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/3.2_Demo_Discretisation.ipynb)
- Equal Frequency Binning [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#32-discretize) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/3.2_Demo_Discretisation.ipynb)
- K-means Binning [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#32-discretize) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/3.2_Demo_Discretisation.ipynb)
- Discretization by Decision Trees [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#32-discretize) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/3.2_Demo_Discretisation.ipynb)
- ChiMerge [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#32-discretize) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/3.2_Demo_Discretisation.ipynb)
- 3.3 Feature Encoding
- One-hot Encoding [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#33-feature-encoding) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/3.3_Demo_Feature_Encoding.ipynb)
- Ordinal-Encoding [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#33-feature-encoding) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/3.3_Demo_Feature_Encoding.ipynb)
- Count/frequency Encoding [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#33-feature-encoding)
- Mean Encoding [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#33-feature-encoding) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/3.3_Demo_Feature_Encoding.ipynb)
- WOE Encoding [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#33-feature-encoding) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/3.3_Demo_Feature_Encoding.ipynb)
- Target Encoding [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#33-feature-encoding) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/3.3_Demo_Feature_Encoding.ipynb)
- 3.4 Feature Transformation
- Logarithmic Transformation [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#34-feature-transformation) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/3.4_Demo_Feature_Transformation.ipynb)
- Reciprocal Transformation [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#34-feature-transformation) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/3.4_Demo_Feature_Transformation.ipynb)
- Square Root Transformation [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#34-feature-transformation) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/3.4_Demo_Feature_Transformation.ipynb)
- Exponential Transformation [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#34-feature-transformation) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/3.4_Demo_Feature_Transformation.ipynb)
- Box-cox Transformation [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#34-feature-transformation) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/3.4_Demo_Feature_Transformation.ipynb)
- Quantile Transformation [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#34-feature-transformation) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/3.4_Demo_Feature_Transformation.ipynb)
- 3.5 Feature Generation
- Missing Data Derived [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#35-feature-generation) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/2.1_Demo_Missing_Data.ipynb)
- Simple Stats [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#35-feature-generation)
- Crossing [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#35-feature-generation)
- Ratio & Proportion [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#35-feature-generation)
- Cross Product [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#35-feature-generation)
- Polynomial [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#35-feature-generation) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/3.5_Demo_Feature_Generation.ipynb)
- Feature Learning by Tree [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#35-feature-generation) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/3.5_Demo_Feature_Generation.ipynb)
- Feature Learning by Deep Network [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#35-feature-generation)
2.1 Missing Values
Missing Value Check
Listwise Deletion
Mean/Median/Mode Imputation
End of distribution Imputation
Random Imputation
Arbitrary Value Imputation
Add a variable to denote NA
2.2 Outliers
Detect by Arbitrary Boundary
Detect by Mean & Standard Deviation
Detect by IQR
Detect by MAD
Mean/Median/Mode Imputation
Discretization
Imputation with Arbitrary Value
Winsorization
Discard Outliers
2.3 Rare Values
Mode Imputation
Grouping into One New Category
2.4 High Cardinality
Grouping Labels with Business Understanding
Grouping Labels with Rare Occurrence into One Category
Grouping Labels with Decision Tree
**3. Feature Engineering**
3.1 Feature Scaling
Normalization - Standardization
Min-Max Scaling
Robust Scaling
3.2 Discretize
Equal Width Binning
Equal Frequency Binning
K-means Binning
Discretization by Decision Trees
ChiMerge
3.3 Feature Encoding
One-hot Encoding
Ordinal-Encoding
Count/frequency Encoding
Mean Encoding
WOE Encoding
Target Encoding
3.4 Feature Transformation
Logarithmic Transformation
Reciprocal Transformation
Square Root Transformation
Exponential Transformation
Box-cox Transformation
Quantile Transformation
3.5 Feature Generation
Missing Data Derived
Simple Stats
Crossing
Ratio & Proportion
Cross Product
Polynomial
Feature Learning by Tree
Feature Learning by Deep Network
**4. Feature Selection**
4.1 Filter Method
Variance
Correlation
Chi-Square
Mutual Information Filter
Univariate ROC-AUC or MSE
Information Value (IV)
4.2 Wrapper Method
Forward Selection
Backward Elimination
Exhaustive Feature Selection
Genetic Algorithm
4.3 Embedded Method
Lasso (L1)
Random Forest Importance
Gradient Boosted Trees Importance
4.4 Feature Shuffling
Random Shuffling
4.5 Hybrid Method
Recursive Feature Elimination
Recursive Feature Addition
- **4. Feature Selection**
- 4.1 Filter Method
- Variance [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#41-filter-method) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/4.1_Demo_Feature_Selection_Filter.ipynb)
- Correlation [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#41-filter-method) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/4.1_Demo_Feature_Selection_Filter.ipynb)
- Chi-Square [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#41-filter-method) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/4.1_Demo_Feature_Selection_Filter.ipynb)
- Mutual Information Filter [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#41-filter-method) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/4.1_Demo_Feature_Selection_Filter.ipynb)
- Information Value (IV) [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#41-filter-method)
- 4.2 Wrapper Method
- Forward Selection [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#42-wrapper-method) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/4.2_Demo_Feature_Selection_Wrapper.ipynb)
- Backward Elimination [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#42-wrapper-method) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/4.2_Demo_Feature_Selection_Wrapper.ipynb)
- Exhaustive Feature Selection [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#42-wrapper-method) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/4.2_Demo_Feature_Selection_Wrapper.ipynb)
- Genetic Algorithm [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#42-wrapper-method)
- 4.3 Embedded Method
- Lasso (L1) [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#43-embedded-method) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/4.3_Demo_Feature_Selection_Embedded.ipynb)
- Random Forest Importance [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#43-embedded-method) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/4.3_Demo_Feature_Selection_Embedded.ipynb)
- Gradient Boosted Trees Importance [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#43-embedded-method) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/4.3_Demo_Feature_Selection_Embedded.ipynb)
- 4.4 Feature Shuffling
- Random Shuffling [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#44-feature-shuffling) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/4.4_Demo_Feature_Selection_Feature_Shuffling.ipynb)
- 4.5 Hybrid Method
- Recursive Feature Elimination [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#451-recursive-feature-elimination) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/4.5_Demo_Feature_Selection_Hybrid_method.ipynb)
- Recursive Feature Addition [[guide]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/A%20Short%20Guide%20for%20Feature%20Engineering%20and%20Feature%20Selection.md#452-recursive-feature-addition) [[demo]](https://github.com/Yimeng-Zhang/feature-engineering-and-feature-selection/blob/master/4.5_Demo_Feature_Selection_Hybrid_method.ipynb)
@@ -155,6 +150,9 @@ Feature Engineering & Selection is the most essential part of building a usable machine learning project
> — Prof. Pedro Domingos
![001](./images/001.png)
Data and features determine the upper limit of a ML project, while models and algorithms merely approach that limit. However, few materials can be found that systematically introduce the art of feature engineering, and even fewer explain the rationale behind it. This repo aims to be a practical guide to Feature Engineering & Selection.