With over 9.1 million weekly downloads, SciPy is a package you have probably already used if you have dabbled in Python programming. It is an open source library that provides fundamental algorithms for scientific computing in Python. Open source means that the package's source code is publicly available and anyone can contribute to the project.
EasyMedStat relies on open source code for its core statistical functions because this ensures that the algorithms we use are transparent, have been verified and validated for correctness through a rigorous peer review process, and that the results our users publish can be replicated.
When developing a new statistical feature at EasyMedStat, we follow the latest academic standards and best practices. We review various open source implementations of the published algorithms and validate the code for correctness, handling of edge cases, and performance.
Sometimes we come across implementations that can be improved, and we set out to improve them. In these cases, we contribute our work back to the original package and discuss the merits of the proposed improvement with the project maintainers and the wider community. This ensures that our core statistical functions are validated not only internally but also by experts in the field.
During our work on the Wilcoxon signed-rank test, we noticed that the SciPy implementation limited exact p-value computation to samples with n < 25 and fell back to a normal approximation otherwise. This was because computing exact p-values requires knowing the distribution of the test statistic T under the null hypothesis.
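For context, the normal approximation replaces the exact null distribution of T with a Gaussian. The minimal sketch below takes T to be T+ (the sum of the ranks of the positive differences) and uses the textbook null mean n(n+1)/4 and variance n(n+1)(2n+1)/24. The function name is hypothetical, and the sketch assumes no ties or zero differences and omits any continuity correction, so it is an illustration rather than SciPy's code.

```python
import math


def wilcoxon_normal_pvalue(t_plus: float, n: int) -> float:
    """Two-sided p-value for T+ using the plain normal approximation.

    Hypothetical helper. Assumes no zero differences and no ties, and
    applies no continuity correction (simplifying assumptions).
    """
    mean = n * (n + 1) / 4                  # E[T+] under H0
    var = n * (n + 1) * (2 * n + 1) / 24    # Var[T+] under H0
    z = (t_plus - mean) / math.sqrt(var)
    # Two-sided normal tail: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


# n = 5 and T+ = 0 (every difference negative).
print(wilcoxon_normal_pvalue(0, 5))
```

For n = 5 and T+ = 0 the approximation returns roughly 0.043, below 0.0625 (that is, 2/32), which is the smallest two-sided p-value the exact distribution can produce for n = 5. This shows why the approximation is only trusted for larger samples.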
Because this distribution has no closed-form expression, computing it meant enumerating every possible assignment of signs to the ranks 1, ..., n, that is, computing 2^n sums. Since this is intractable for all but small n, the original SciPy implementation shipped with a precomputed table of the distribution for n < 25.
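The enumeration approach described above can be sketched as follows. Under the null hypothesis, each rank is independently positive or negative with probability 1/2, so counting how many of the 2^n sign assignments give each value of T+ yields its exact distribution. The function name is hypothetical, and this is not SciPy's original table-generation code.

```python
from collections import Counter
from itertools import product


def null_distribution_bruteforce(n: int) -> list[int]:
    """Count, for each t, how many of the 2**n sign assignments give T+ = t.

    Hypothetical helper. P(T+ = t) = counts[t] / 2**n under H0.
    The loop visits every assignment, so the cost grows as 2**n.
    """
    max_t = n * (n + 1) // 2
    counts = Counter()
    for signs in product((0, 1), repeat=n):  # all 2**n sign assignments
        # T+ is the sum of the ranks that received a positive sign
        t_plus = sum(rank for rank, positive in zip(range(1, n + 1), signs) if positive)
        counts[t_plus] += 1
    return [counts[t] for t in range(max_t + 1)]


print(null_distribution_bruteforce(3))  # [1, 1, 1, 2, 1, 1, 1]
```

Each increment of n doubles the work, which is why a precomputed table was the practical option for this approach.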
We found an efficient recursion that computes the exact distribution of T much faster. This allowed EasyMedStat to offer exact p-values for this test up to n = 500.
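The post does not spell out the recursion, so the sketch below uses a well-known one and assumes it is of this kind. Let c_k(t) be the number of subsets of {1, ..., k} whose elements sum to t. Rank k is either excluded from the positive ranks or included, which adds k to T+. This gives c_k(t) = c_{k-1}(t) + c_{k-1}(t - k), and P(T+ = t) = c_n(t) / 2^n. The function names are hypothetical, and this is neither EasyMedStat's nor SciPy's actual code.

```python
def null_distribution_counts(n: int) -> list[int]:
    """Number of subsets of {1, ..., n} summing to t, for t = 0 .. n(n+1)/2.

    Hypothetical helper using c_k(t) = c_{k-1}(t) + c_{k-1}(t - k).
    Python ints have arbitrary precision, so counts stay exact for large n.
    """
    counts = [1]  # c_0: only the empty set, with sum 0
    for k in range(1, n + 1):
        new = counts + [0] * k  # "rank k excluded" term, range grown by k
        for t, c in enumerate(counts):
            new[t + k] += c  # "rank k included" term shifts sums by k
        counts = new
    return counts


def exact_pvalue(t_plus: int, n: int) -> float:
    """Two-sided exact p-value: 2 * min(P(T+ <= t), P(T+ >= t)), capped at 1."""
    counts = null_distribution_counts(n)
    total = 2 ** n
    lower = sum(counts[: t_plus + 1]) / total
    upper = sum(counts[t_plus:]) / total
    return min(1.0, 2 * min(lower, upper))


print(exact_pvalue(0, 5))  # 0.0625
```

The total work is on the order of n^3/6 integer additions, so it is polynomial rather than exponential in n. Even this plain Python version can handle n in the hundreds, although a vectorized implementation would be considerably faster.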
The Wilcoxon signed-rank test assesses the null hypothesis that two related (paired) samples come from the same distribution. More precisely, it tests whether the distribution of the differences x - y is symmetric about zero. It is a non-parametric alternative to the paired t-test.
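To make the definition concrete, here is a minimal sketch of how the statistic is typically computed from paired samples. It assumes Wilcoxon's original convention of discarding zero differences, and it assigns average ranks to tied absolute differences. The function name is hypothetical, and other zero-handling conventions exist; SciPy exposes several of them through the `zero_method` argument of `scipy.stats.wilcoxon`.

```python
def signed_rank_statistic(x: list[float], y: list[float]) -> tuple[float, float, int]:
    """Return (T+, T-, n) for paired samples x and y.

    Hypothetical helper. Zero differences are dropped and tied absolute
    differences receive their average rank.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    d = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of equal absolute differences
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    t_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    t_minus = sum(r for r, di in zip(ranks, d) if di < 0)
    return t_plus, t_minus, len(d)


print(signed_rank_statistic([5, 3, 8, 1], [3, 4, 4, 1]))  # (5.0, 1.0, 3)
```

T+ + T- always equals n(n+1)/2, so either sum determines the other. When ties produce non-integer average ranks, the untied null distribution no longer applies exactly, and the data call for a tie-aware treatment.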
https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test
https://en.wikipedia.org/wiki/Wilcoxon_signed-rank_test#Computing_the_null_distribution