Let’s click on the Principal Component Analysis button.
For the demonstration of the PCA module, we will be using the decathlon data set, which comes with MEDA (you can catch a glimpse of it in the figure below).
When you open the PCA submodule, a rather long graphical user interface (GUI) appears. It might feel intimidating at first, but we will see how it works step by step.
The first part of this GUI, as in most jamovi modules, is about selecting the variables for your analysis and giving them a role. As you can see in the figure above, there are 4 different types of variables for the PCA. First, we have the active variables, which in the case of a PCA must be quantitative. Then comes the French touch of this module: the quantitative and categorical supplementary variables. These variables do not take part in the construction of the axes; they are only used to illustrate them. Lastly, the individual labels variable provides the labels for the points on the individuals factor map.
Now, if we scroll down the GUI, we see an option called Scale to unit variance. The purpose of this option is to normalize the quantitative variables used in the PCA. Since PCA is most often performed on normalized data, this option is checked by default.
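To make the idea of scaling to unit variance concrete, here is a minimal Python sketch using NumPy. It is only an illustration and not the code MEDA runs: the function name, the sample values and the choice of the population standard deviation are all assumptions made for this example.

```python
import numpy as np

def scale_to_unit_variance(X):
    """Center each column and divide it by its standard deviation."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Population standard deviation (ddof=0); this choice is an assumption.
    std = centered.std(axis=0)
    return centered / std

# Made-up values: two variables measured on very different scales.
data = np.array([
    [11.0, 291.0],
    [10.8, 301.0],
    [11.3, 280.0],
    [10.6, 262.0],
])
scaled = scale_to_unit_variance(data)
```

After scaling, every column has mean 0 and standard deviation 1, so a variable with large raw values no longer dominates the analysis. The sketch does not handle a constant column, whose standard deviation is zero.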
Let’s move on to the next set of options: the Graphic options. As the name suggests, these options control what will be plotted on the graphics in the results window. With them, you can choose which components to plot on the x-axis and the y-axis. For the individuals factor map, you can decide to display only the individuals, only the categories (of the categorical variables) or both. You can do the same with the active and supplementary variables for the variables factor map. It’s also possible to color the individuals according to their value for a categorical variable with the option Grouping variable. This option takes as input the position (in the “Categorical Supplementary Variables” field) of the categorical variable used to separate the individuals by color. For instance, if we have 4 categorical variables and we want to color the individuals based on the 2nd categorical variable, we would set the Grouping variable option to 2.
Lastly, we have the options for the numerical indicators. The Significance threshold option from the Automatic Description of the Axes section sets the threshold below which a variable is considered to characterize a dimension. The Number of factors option affects both the number of dimensions to be described and the number of dimensions shown in the individual and variable tables (which are covered right after).
Finally, the individual and variable tables options let you choose which of the indicators resulting from the PCA to display.
Those indicators (computed for every dimension from 1 up to the value of the Number of factors option) are:
The squared cosine
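As an illustration of what the squared cosine measures, the sketch below computes it for the individuals of a small made-up data set. It assumes a standardized PCA computed through a singular value decomposition; the function name and the data are hypothetical, and this is not the code used by the module.

```python
import numpy as np

def individual_cos2(X, n_factors):
    """Squared cosines of the individuals on the first n_factors axes."""
    X = np.asarray(X, dtype=float)
    # Standardize the columns (centered, unit variance).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # SVD of the standardized data: the rows of Vt are the principal axes.
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    coords = U * s  # coordinates of the individuals on every axis
    # Squared distance of each individual to the origin.
    dist2 = (Z ** 2).sum(axis=1, keepdims=True)
    return coords[:, :n_factors] ** 2 / dist2

# Made-up data: 5 individuals described by 3 quantitative variables.
X = np.array([
    [2.0, 1.0, 5.0],
    [3.0, 4.0, 1.0],
    [5.0, 2.0, 2.0],
    [1.0, 3.0, 4.0],
    [4.0, 5.0, 3.0],
])
cos2 = individual_cos2(X, n_factors=2)
```

Each value lies between 0 and 1 and tells how well an individual is represented on a given axis. Summed over all the axes, the squared cosines of an individual add up to 1.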
For the PCA to run correctly, the number of factors in the Automatic Description of the Axes must be less than or equal to the number of active variables selected. If this condition is not met, the analysis will produce an error.
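The condition above can be written as a simple check. The helper below is hypothetical and only mirrors the rule stated in the text; it is not a function provided by the module, and the numbers in the usage lines are illustrative.

```python
def check_number_of_factors(n_factors, n_active_variables):
    """Raise an error if more factors are requested than active variables."""
    if n_factors > n_active_variables:
        raise ValueError(
            f"Number of factors ({n_factors}) must be less than or equal to "
            f"the number of active variables ({n_active_variables})."
        )
    return True

# Illustrative values: 3 factors requested with 10 active variables is valid.
ok = check_number_of_factors(3, 10)
```

Requesting, say, 11 factors with 10 active variables would raise the error instead of returning.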
The data set used for this demonstration is the decathlon data set. All the options are checked and the Grouping variable option is set to 1 to color the individuals based on the competition they participated in. We set the value to 1 because Competition is the first (and only) categorical variable selected. See the figure below for the settings: