There are two main reasons for performing multivariate analysis in biomedical research:
In general, when several parameters (variables) can influence the result, a multivariate analysis makes it possible to adjust the results to take these parameters into account simultaneously.
Let's take a simple example: you want to compare cardiovascular risk in men and women in the general population. It is known that men have a higher cardiovascular risk. However, you may find only a small difference between the two sexes. This may be due to confounding bias: women live longer than men on average, and older age is also a cardiovascular risk factor. Your study population could be older in the female group, which would artificially increase the cardiovascular risk in that group. By including both sex and age in your model, you correct for this confounding bias.
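To make this concrete, here is a minimal sketch in Python that simulates the situation. Everything in it is an assumption made for illustration: the numbers are invented, the outcome is a hypothetical continuous "risk score" rather than a real clinical endpoint, and the small `ols` helper is a hand-written least-squares fit, not the method used by any particular software.

```python
import random

def ols(rows, y):
    """Least-squares coefficients for y ~ rows (each row starts with the intercept 1.0)."""
    k = len(rows[0])
    # Augmented normal-equation matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(a[idx][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(k):
            if row != col:
                factor = a[row][col] / a[col][col]
                a[row] = [x - factor * p for x, p in zip(a[row], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Hypothetical population: women are about 6 years older on average,
# being male adds 5 points of risk, and each year of age adds 0.8 points.
rng = random.Random(0)
sex, age, risk = [], [], []
for _ in range(4000):
    male = rng.randint(0, 1)
    years = rng.gauss(60, 8) + (0 if male else 6)
    sex.append(male)
    age.append(years)
    risk.append(10 + 5.0 * male + 0.8 * years + rng.gauss(0, 3))

# Model 1: sex only. Model 2: sex and age together.
unadjusted = ols([[1.0, s] for s in sex], risk)[1]
adjusted = ols([[1.0, s, a] for s, a in zip(sex, age)], risk)[1]

print(f"Effect of male sex, unadjusted:       {unadjusted:.2f}")
print(f"Effect of male sex, adjusted for age: {adjusted:.2f}")
```

In this simulated data, the model with sex alone shows almost no difference between men and women, while the model that also includes age recovers an effect close to the 5 points built into the simulation.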
We will now detail these steps.
This is usually the easiest step. It corresponds to your research hypothesis.
If you are looking for predictors of post-operative complications, your study variable Y is "post-operative complication".
The type of model depends directly on the first step.
EasyMedStat automatically chooses the type of model for you according to the type of the variable to be studied Y:
Note that if you transform a continuous numeric variable into a binary variable, you will need to use logistic regression. For example, if you are looking to predict when a pain score is greater than 5/10, you are actually analyzing a binary variable (> 5/10 = yes, ≤ 5/10 = no).
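As a small illustration of this transformation, the sketch below turns pain scores into a binary variable. The scores and the `dichotomize` function are hypothetical and only serve to show where the threshold falls.

```python
# Hypothetical pain scores out of 10 for six patients.
pain_scores = [2, 7, 5, 9, 6, 3]

def dichotomize(score, threshold=5):
    """Return 1 ("yes") if the score is strictly above the threshold, else 0 ("no")."""
    return 1 if score > threshold else 0

high_pain = [dichotomize(score) for score in pain_scores]
print(high_pain)  # [0, 1, 0, 1, 1, 0]
```

Note that a score of exactly 5 falls into the "no" group, because the condition is strictly greater than 5. The resulting 0/1 variable is what a logistic regression would then analyze.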
This is the most crucial step in your multivariate analysis!
This is where everything is decided. And the good news is, you don't need advanced statistical knowledge to choose these variables. On the other hand, you do need a good knowledge of the condition you are studying.
There are 2 types of variables that should generally be chosen as predictive variables:
Let us take a simplified example of a study on a new antiplatelet treatment aimed at preventing myocardial infarction in patients who have never had a heart attack (primary prevention). You are testing this new treatment against a placebo. Your hypothesis is that there will be fewer myocardial infarctions in the treatment group than in the placebo group.
You must therefore include in your model not only the treatment received but also age, sex, the presence of diabetes, etc.
As is often the case, the correct answer is "neither too many nor too few".
The number of variables in the model must be adapted to the number of patients you have available for your analysis. A generally accepted rule is to have at least 10 patients for each variable in the model. However, different opinions on the matter exist.
This rule of 10 patients per variable is applied somewhat differently in linear and logistic regression. For linear regression, it applies directly: if you analyze 70 patients, you can put up to 7 predictor variables in the model. For logistic regression, the count is based on the smaller of the two outcome groups. So if you have a binary variable (yes / no) known for 70 patients, with 30 patients who have the value "Yes" and 40 patients who have the value "No", you take the smaller number, that is, 30 patients. You can then include only 3 variables in the model.
Once again, this rule of 10 patients per variable is not universally agreed upon, but it is frequently used.
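The arithmetic behind this rule of thumb can be sketched in a few lines of Python. The function names and the `per_variable` parameter are invented for this illustration. The value 10 is only the convention described above, so it is left adjustable.

```python
def max_predictors_linear(n_patients, per_variable=10):
    """Linear regression: count all analyzed patients."""
    return n_patients // per_variable

def max_predictors_logistic(n_yes, n_no, per_variable=10):
    """Logistic regression: count only the smaller of the two outcome groups."""
    return min(n_yes, n_no) // per_variable

print(max_predictors_linear(70))        # 7
print(max_predictors_logistic(30, 40))  # 3
```

The two calls reproduce the examples above: 70 patients allow 7 predictors in a linear regression, while 30 "Yes" and 40 "No" allow only 3 in a logistic regression.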
When you perform your multivariate analysis on EasyMedStat, the number of predictive variables is automatically checked.
It is also important not to include too few variables in the model. Otherwise, your analysis could be incomplete or even wrong. For example, if you leave out the variable "diabetes" in a study predicting cardiovascular risk, the results may be biased.
As you can see, you have to include enough variables to build a model that is as close as possible to reality, while also analyzing enough patients. This is why multivariate analyses are generally performed on relatively large samples, usually at least 100 patients (although this number is very arbitrary and can vary greatly depending on your data).
Behind the intimidating word "multicollinearity" lies a relatively simple concept: making sure that your explanatory variables X are not too strongly correlated with one another.
For example, you should not include the weight variable in a model along with the BMI variable because there is a direct relationship between these two variables (BMI = weight / height squared).
EasyMedStat automatically checks the multicollinearity of your variables when you include them to avoid this problem.
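To give an idea of how such a check can work in principle, here is a minimal sketch based on the variance inflation factor (VIF). It is not a description of how EasyMedStat performs its check. The patient data are invented, and the shortcut VIF = 1 / (1 - r²) used here is only valid when the model contains exactly two predictors; with more predictors, each VIF requires regressing one predictor on all the others.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two lists of numbers."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)

# Hypothetical patients (values invented for illustration).
weight_kg = [58, 64, 70, 75, 82, 90, 96, 105]
height_m = [1.62, 1.70, 1.68, 1.75, 1.80, 1.78, 1.83, 1.79]
age = [34, 71, 45, 29, 66, 52, 38, 60]
bmi = [w / h ** 2 for w, h in zip(weight_kg, height_m)]

vif_weight_bmi = vif_two_predictors(weight_kg, bmi)
vif_weight_age = vif_two_predictors(weight_kg, age)
print(f"VIF for weight + BMI: {vif_weight_bmi:.1f}")
print(f"VIF for weight + age: {vif_weight_age:.1f}")
```

A VIF close to 1 means the two predictors carry largely independent information. A common rule of thumb treats values above 5 or 10 as a warning sign, which is what happens here for weight and BMI but not for weight and age.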
The statistical validity of your results depends on meeting the assumptions of the model you are using. If you violate these assumptions, your results may be wrong.
These assumptions depend on the type of model you are using. They can include, among other things, linearity, absence of heteroskedasticity, normality of residuals, etc.
However, these advanced concepts are checked automatically when you perform multivariate analysis with EasyMedStat. In the event of a violation of one of the assumptions, you are automatically informed and a solution is offered to you if possible.
As you can see, multivariate analysis is an advanced statistical technique, but suitable software makes it much easier to use.
This is precisely the case with EasyMedStat. You are guided throughout your analysis and you avoid the classic pitfalls into which you might otherwise fall.