tra-analysis/analysis-master/tra_analysis/StatisticalTest.py
Arthur Lu 9f71ab3aad
tra-analysis v 3.0.0 aggregate PR (#73)
* reflected doc changes to README.md

Signed-off-by: Arthur Lu <learthurgo@gmail.com>

* tra_analysis v 2.1.0-alpha.1

Signed-off-by: Arthur Lu <learthurgo@gmail.com>

* changed setup.py to use __version__ from source
added Topic and keywords

Signed-off-by: Arthur Lu <learthurgo@gmail.com>

* updated Supported Platforms in README.md

Signed-off-by: Arthur Lu <learthurgo@gmail.com>

* moved required files back to parent

Signed-off-by: Arthur Lu <learthurgo@gmail.com>

* moved security back to parent

Signed-off-by: Arthur Lu <learthurgo@gmail.com>

* moved security back to parent
moved contributing back to parent

Signed-off-by: Arthur Lu <learthurgo@gmail.com>

* add PR template

Signed-off-by: Arthur Lu <learthurgo@gmail.com>

* moved to parent folder

Signed-off-by: Arthur Lu <learthurgo@gmail.com>

* moved meta files to .github folder

Signed-off-by: Arthur Lu <learthurgo@gmail.com>

* Analysis.py v 3.0.1

Signed-off-by: Arthur Lu <learthurgo@gmail.com>

* updated test_analysis for submodules, and added missing numpy import in Sort.py

* fixed item one of Issue #58

Signed-off-by: Arthur Lu <learthurgo@gmail.com>

* readded cache searching in postCreateCommand

Signed-off-by: Arthur Lu <learthurgo@gmail.com>

* added myself as an author

* feat: created kivy gui boilerplate

* added Kivy to requirements.txt

Signed-off-by: Arthur Lu <learthurgo@gmail.com>

* feat: gui with placeholders

* fix: changed config.json path

* migrated docker base image to debian

Signed-off-by: ltcptgeneral <learthurgo@gmail.com>

* style: spaces to tabs

* migrated to ubuntu

Signed-off-by: ltcptgeneral <learthurgo@gmail.com>

* fixed issues

Signed-off-by: ltcptgeneral <learthurgo@gmail.com>

* fix: docker build?

* fix: use ubuntu bionic

* fix: get kivy installed

* @ltcptgeneral can't spell

* optim dockerfile for not installing unused packages

* install basic stuff while building the container

* use prebuilt image for development

* install pylint on base image

* rename and use new kivy

* tests: added tests for Array and CorrelationTest

Both are not working due to errors

* use new thing

* use 20.04 base

* symlink pip3 to pip

* use pip instead of pip3

* equation.Expression.py v 0.0.1-alpha
added corresponding .pyc to .gitignore

* parser.py v 0.0.2-alpha

* added pyparsing to requirements.txt

* parser v 0.0.4-alpha

* Equation v 0.0.1-alpha

* added Equation to tra_analysis imports

* tests: New unit tests for submoduling (#66)

* feat: created kivy gui boilerplate

* migrated docker base image to debian

Signed-off-by: ltcptgeneral <learthurgo@gmail.com>

* migrated to ubuntu

Signed-off-by: ltcptgeneral <learthurgo@gmail.com>

* fixed issues

Signed-off-by: ltcptgeneral <learthurgo@gmail.com>

* fix: docker build?

* fix: use ubuntu bionic

* fix: get kivy installed

* @ltcptgeneral can't spell

* optim dockerfile for not installing unused packages

* install basic stuff while building the container

* use prebuilt image for development

* install pylint on base image

* rename and use new kivy

* tests: added tests for Array and CorrelationTest

Both are not working due to errors

* fix: Array no longer has *args and CorrelationTest functions no longer have self in the arguments

* use new thing

* use 20.04 base

* symlink pip3 to pip

* use pip instead of pip3

* tra_analysis v 2.1.0-alpha.2
SVM v 1.0.1
added unvalidated SVM unit tests

Signed-off-by: ltcptgeneral <learthurgo@gmail.com>

* fixed version number

Signed-off-by: ltcptgeneral <learthurgo@gmail.com>

* tests: added tests for ClassificationMetric

* partially fixed and commented out svm unit tests

* fixed some SVM unit tests

* added installing pytest to devcontainer.json

* fix: small fixes to KNN

Namely, removing self from parameters and passing correct arguments to KNeighborsClassifier constructor

* fix, test: Added tests for KNN and NaiveBayes.

Also made some small fixes in KNN, NaiveBayes, and RegressionMetric

* test: finished unit tests except for StatisticalTest

Also made various small fixes and style changes

* StatisticalTest v 1.0.1

* fixed RegressionMetric unit test
temporarily disabled CorrelationTest unit tests

* tra_analysis v 2.1.0-alpha.3

* readded __all__

* fix: floating point issues in unit tests for CorrelationTest

Co-authored-by: AGawde05 <agawde05@gmail.com>
Co-authored-by: ltcptgeneral <learthurgo@gmail.com>
Co-authored-by: Dev Singh <dev@devksingh.com>
Co-authored-by: jzpan1 <panzhenyu2014@gmail.com>

* fixed depreciated escape sequences

* ficed tests, indent, import in test_analysis

* changed version to 3.0.0
added backwards compatibility

* ficed pytest install in container

* removed GUI changes

Signed-off-by: Arthur Lu <learthurgo@gmail.com>

* incremented version to rc.1 (release candidate 1)

Signed-off-by: Arthur Lu <learthurgo@gmail.com>

* fixed NaiveBayes __changelog__

Signed-off-by: Arthur Lu <learthurgo@gmail.com>

* fix: __setitem__  == to single =

* Array v 1.0.1

* Revert "Array v 1.0.1"

This reverts commit 59783b79f7.

* Array v 1.0.1

* Array.py v 1.0.2
added more Array unit tests

* cleaned .gitignore
tra_analysis v 3.0.0-rc2

Signed-off-by: Arthur Lu <learthurgo@gmail.com>

* added *.pyc to gitignore
finished subdividing test_analysis

* feat: gui layout + basic func

* Froze and removed superscript (data-analysis)

* remove data-analysis deps install for devcontainer

* tukey pairwise comparison and multicomparison but no critical q-values

* quick patch for devcontainer.json

* better fix for devcontainer.json

* fixed some styling in StatisticalTest
removed print statement in StatisticalTest unit tests

* update analysis tests to be more effecient

* don't use loop for test_nativebayes

* removed useless secondary docker files

* tra-analysis v 3.0.0

Co-authored-by: James Pan <panzhenyu2014@gmail.com>
Co-authored-by: AGawde05 <agawde05@gmail.com>
Co-authored-by: zpan1 <72054510+zpan1@users.noreply.github.com>
Co-authored-by: Dev Singh <dev@devksingh.com>
Co-authored-by: = <=>
Co-authored-by: Dev Singh <dsingh@imsa.edu>
Co-authored-by: zpan1 <zpan@imsa.edu>
2021-04-28 19:33:50 -05:00

314 lines
11 KiB
Python

# Titan Robotics Team 2022: StatisticalTest submodule
# Written by Arthur Lu
# Notes:
# this should be imported as a python module using 'from tra_analysis import StatisticalTest'
# setup:
__version__ = "1.0.2"
__changelog__ = """changelog:
1.0.2:
- added tukey_multicomparison
- fixed styling
1.0.1:
- fixed typo in __all__
1.0.0:
- ported analysis.StatisticalTest() here
- removed classness
"""
__author__ = (
"Arthur Lu <learthurgo@gmail.com>",
"James Pan <zpan@imsa.edu>",
)
__all__ = [
'ttest_onesample',
'ttest_independent',
'ttest_statistic',
'ttest_related',
'ks_fitness',
'chisquare',
'powerdivergence'
'ks_twosample',
'es_twosample',
'mw_rank',
'mw_tiecorrection',
'rankdata',
'wilcoxon_ranksum',
'wilcoxon_signedrank',
'kw_htest',
'friedman_chisquare',
'bm_wtest',
'combine_pvalues',
'jb_fitness',
'ab_equality',
'bartlett_variance',
'levene_variance',
'sw_normality',
'shapiro',
'ad_onesample',
'ad_ksample',
'binomial',
'fk_variance',
'mood_mediantest',
'mood_equalscale',
'skewtest',
'kurtosistest',
'normaltest',
'tukey_multicomparison'
]
import numpy as np
import scipy
from scipy import stats, interpolate
def ttest_onesample(a, popmean, axis = 0, nan_policy = 'propagate'):
results = scipy.stats.ttest_1samp(a, popmean, axis = axis, nan_policy = nan_policy)
return {"t-value": results[0], "p-value": results[1]}
def ttest_independent(a, b, equal = True, nan_policy = 'propagate'):
results = scipy.stats.ttest_ind(a, b, equal_var = equal, nan_policy = nan_policy)
return {"t-value": results[0], "p-value": results[1]}
def ttest_statistic(o1, o2, equal = True):
results = scipy.stats.ttest_ind_from_stats(o1["mean"], o1["std"], o1["nobs"], o2["mean"], o2["std"], o2["nobs"], equal_var = equal)
return {"t-value": results[0], "p-value": results[1]}
def ttest_related(a, b, axis = 0, nan_policy='propagate'):
results = scipy.stats.ttest_rel(a, b, axis = axis, nan_policy = nan_policy)
return {"t-value": results[0], "p-value": results[1]}
def ks_fitness(rvs, cdf, args = (), N = 20, alternative = 'two-sided', mode = 'approx'):
results = scipy.stats.kstest(rvs, cdf, args = args, N = N, alternative = alternative, mode = mode)
return {"ks-value": results[0], "p-value": results[1]}
def chisquare(f_obs, f_exp = None, ddof = None, axis = 0):
results = scipy.stats.chisquare(f_obs, f_exp = f_exp, ddof = ddof, axis = axis)
return {"chisquared-value": results[0], "p-value": results[1]}
def powerdivergence(f_obs, f_exp = None, ddof = None, axis = 0, lambda_ = None):
results = scipy.stats.power_divergence(f_obs, f_exp = f_exp, ddof = ddof, axis = axis, lambda_ = lambda_)
return {"powerdivergence-value": results[0], "p-value": results[1]}
def ks_twosample(x, y, alternative = 'two_sided', mode = 'auto'):
results = scipy.stats.ks_2samp(x, y, alternative = alternative, mode = mode)
return {"ks-value": results[0], "p-value": results[1]}
def es_twosample(x, y, t = (0.4, 0.8)):
results = scipy.stats.epps_singleton_2samp(x, y, t = t)
return {"es-value": results[0], "p-value": results[1]}
def mw_rank(x, y, use_continuity = True, alternative = None):
results = scipy.stats.mannwhitneyu(x, y, use_continuity = use_continuity, alternative = alternative)
return {"u-value": results[0], "p-value": results[1]}
def mw_tiecorrection(rank_values):
results = scipy.stats.tiecorrect(rank_values)
return {"correction-factor": results}
def rankdata(a, method = 'average'):
results = scipy.stats.rankdata(a, method = method)
return results
def wilcoxon_ranksum(a, b): # this seems to be superceded by Mann Whitney Wilcoxon U Test
results = scipy.stats.ranksums(a, b)
return {"u-value": results[0], "p-value": results[1]}
def wilcoxon_signedrank(x, y = None, zero_method = 'wilcox', correction = False, alternative = 'two-sided'):
results = scipy.stats.wilcoxon(x, y = y, zero_method = zero_method, correction = correction, alternative = alternative)
return {"t-value": results[0], "p-value": results[1]}
def kw_htest(*args, nan_policy = 'propagate'):
results = scipy.stats.kruskal(*args, nan_policy = nan_policy)
return {"h-value": results[0], "p-value": results[1]}
def friedman_chisquare(*args):
results = scipy.stats.friedmanchisquare(*args)
return {"chisquared-value": results[0], "p-value": results[1]}
def bm_wtest(x, y, alternative = 'two-sided', distribution = 't', nan_policy = 'propagate'):
results = scipy.stats.brunnermunzel(x, y, alternative = alternative, distribution = distribution, nan_policy = nan_policy)
return {"w-value": results[0], "p-value": results[1]}
def combine_pvalues(pvalues, method = 'fisher', weights = None):
results = scipy.stats.combine_pvalues(pvalues, method = method, weights = weights)
return {"combined-statistic": results[0], "p-value": results[1]}
def jb_fitness(x):
results = scipy.stats.jarque_bera(x)
return {"jb-value": results[0], "p-value": results[1]}
def ab_equality(x, y):
results = scipy.stats.ansari(x, y)
return {"ab-value": results[0], "p-value": results[1]}
def bartlett_variance(*args):
results = scipy.stats.bartlett(*args)
return {"t-value": results[0], "p-value": results[1]}
def levene_variance(*args, center = 'median', proportiontocut = 0.05):
results = scipy.stats.levene(*args, center = center, proportiontocut = proportiontocut)
return {"w-value": results[0], "p-value": results[1]}
def sw_normality(x):
results = scipy.stats.shapiro(x)
return {"w-value": results[0], "p-value": results[1]}
def shapiro(x):
return "destroyed by facts and logic"
def ad_onesample(x, dist = 'norm'):
results = scipy.stats.anderson(x, dist = dist)
return {"d-value": results[0], "critical-values": results[1], "significance-value": results[2]}
def ad_ksample(samples, midrank = True):
results = scipy.stats.anderson_ksamp(samples, midrank = midrank)
return {"d-value": results[0], "critical-values": results[1], "significance-value": results[2]}
def binomial(x, n = None, p = 0.5, alternative = 'two-sided'):
results = scipy.stats.binom_test(x, n = n, p = p, alternative = alternative)
return {"p-value": results}
def fk_variance(*args, center = 'median', proportiontocut = 0.05):
results = scipy.stats.fligner(*args, center = center, proportiontocut = proportiontocut)
return {"h-value": results[0], "p-value": results[1]} # unknown if the statistic is an h value
def mood_mediantest(*args, ties = 'below', correction = True, lambda_ = 1, nan_policy = 'propagate'):
results = scipy.stats.median_test(*args, ties = ties, correction = correction, lambda_ = lambda_, nan_policy = nan_policy)
return {"chisquared-value": results[0], "p-value": results[1], "m-value": results[2], "table": results[3]}
def mood_equalscale(x, y, axis = 0):
results = scipy.stats.mood(x, y, axis = axis)
return {"z-score": results[0], "p-value": results[1]}
def skewtest(a, axis = 0, nan_policy = 'propogate'):
results = scipy.stats.skewtest(a, axis = axis, nan_policy = nan_policy)
return {"z-score": results[0], "p-value": results[1]}
def kurtosistest(a, axis = 0, nan_policy = 'propogate'):
results = scipy.stats.kurtosistest(a, axis = axis, nan_policy = nan_policy)
return {"z-score": results[0], "p-value": results[1]}
def normaltest(a, axis = 0, nan_policy = 'propogate'):
results = scipy.stats.normaltest(a, axis = axis, nan_policy = nan_policy)
return {"z-score": results[0], "p-value": results[1]}
def get_tukeyQcrit(k, df, alpha=0.05):
'''
From statsmodels.sandbox.stats.multicomp
return critical values for Tukey's HSD (Q)
Parameters
----------
k : int in {2, ..., 10}
number of tests
df : int
degrees of freedom of error term
alpha : {0.05, 0.01}
type 1 error, 1-confidence level
not enough error checking for limitations
'''
# qtable from statsmodels.sandbox.stats.multicomp
qcrit = '''
2 3 4 5 6 7 8 9 10
5 3.64 5.70 4.60 6.98 5.22 7.80 5.67 8.42 6.03 8.91 6.33 9.32 6.58 9.67 6.80 9.97 6.99 10.24
6 3.46 5.24 4.34 6.33 4.90 7.03 5.30 7.56 5.63 7.97 5.90 8.32 6.12 8.61 6.32 8.87 6.49 9.10
7 3.34 4.95 4.16 5.92 4.68 6.54 5.06 7.01 5.36 7.37 5.61 7.68 5.82 7.94 6.00 8.17 6.16 8.37
8 3.26 4.75 4.04 5.64 4.53 6.20 4.89 6.62 5.17 6.96 5.40 7.24 5.60 7.47 5.77 7.68 5.92 7.86
9 3.20 4.60 3.95 5.43 4.41 5.96 4.76 6.35 5.02 6.66 5.24 6.91 5.43 7.13 5.59 7.33 5.74 7.49
10 3.15 4.48 3.88 5.27 4.33 5.77 4.65 6.14 4.91 6.43 5.12 6.67 5.30 6.87 5.46 7.05 5.60 7.21
11 3.11 4.39 3.82 5.15 4.26 5.62 4.57 5.97 4.82 6.25 5.03 6.48 5.20 6.67 5.35 6.84 5.49 6.99
12 3.08 4.32 3.77 5.05 4.20 5.50 4.51 5.84 4.75 6.10 4.95 6.32 5.12 6.51 5.27 6.67 5.39 6.81
13 3.06 4.26 3.73 4.96 4.15 5.40 4.45 5.73 4.69 5.98 4.88 6.19 5.05 6.37 5.19 6.53 5.32 6.67
14 3.03 4.21 3.70 4.89 4.11 5.32 4.41 5.63 4.64 5.88 4.83 6.08 4.99 6.26 5.13 6.41 5.25 6.54
15 3.01 4.17 3.67 4.84 4.08 5.25 4.37 5.56 4.59 5.80 4.78 5.99 4.94 6.16 5.08 6.31 5.20 6.44
16 3.00 4.13 3.65 4.79 4.05 5.19 4.33 5.49 4.56 5.72 4.74 5.92 4.90 6.08 5.03 6.22 5.15 6.35
17 2.98 4.10 3.63 4.74 4.02 5.14 4.30 5.43 4.52 5.66 4.70 5.85 4.86 6.01 4.99 6.15 5.11 6.27
18 2.97 4.07 3.61 4.70 4.00 5.09 4.28 5.38 4.49 5.60 4.67 5.79 4.82 5.94 4.96 6.08 5.07 6.20
19 2.96 4.05 3.59 4.67 3.98 5.05 4.25 5.33 4.47 5.55 4.65 5.73 4.79 5.89 4.92 6.02 5.04 6.14
20 2.95 4.02 3.58 4.64 3.96 5.02 4.23 5.29 4.45 5.51 4.62 5.69 4.77 5.84 4.90 5.97 5.01 6.09
24 2.92 3.96 3.53 4.55 3.90 4.91 4.17 5.17 4.37 5.37 4.54 5.54 4.68 5.69 4.81 5.81 4.92 5.92
30 2.89 3.89 3.49 4.45 3.85 4.80 4.10 5.05 4.30 5.24 4.46 5.40 4.60 5.54 4.72 5.65 4.82 5.76
40 2.86 3.82 3.44 4.37 3.79 4.70 4.04 4.93 4.23 5.11 4.39 5.26 4.52 5.39 4.63 5.50 4.73 5.60
60 2.83 3.76 3.40 4.28 3.74 4.59 3.98 4.82 4.16 4.99 4.31 5.13 4.44 5.25 4.55 5.36 4.65 5.45
120 2.80 3.70 3.36 4.20 3.68 4.50 3.92 4.71 4.10 4.87 4.24 5.01 4.36 5.12 4.47 5.21 4.56 5.30
infinity 2.77 3.64 3.31 4.12 3.63 4.40 3.86 4.60 4.03 4.76 4.17 4.88 4.29 4.99 4.39 5.08 4.47 5.16
'''
res = [line.split() for line in qcrit.replace('infinity','9999').split('\n')]
c=np.array(res[2:-1]).astype(float)
#c[c==9999] = np.inf
ccols = np.arange(2,11)
crows = c[:,0]
cv005 = c[:, 1::2]
cv001 = c[:, 2::2]
if alpha == 0.05:
intp = interpolate.interp1d(crows, cv005[:,k-2])
elif alpha == 0.01:
intp = interpolate.interp1d(crows, cv001[:,k-2])
else:
raise ValueError('only implemented for alpha equal to 0.01 and 0.05')
return intp(df)
def tukey_multicomparison(groups, alpha=0.05):
#formulas according to https://astatsa.com/OneWay_Anova_with_TukeyHSD/
k = len(groups)
df = 0
means = []
MSE = 0
for group in groups:
df+= len(group)
mean = sum(group)/len(group)
means.append(mean)
MSE += sum([(i-mean)**2 for i in group])
df -= k
MSE /= df
q_dict = {}
crit_q = get_tukeyQcrit(k, df, alpha)
for i in range(k-1):
for j in range(i+1, k):
numerator = abs(means[i] - means[j])
denominator = np.sqrt( MSE / ( 2/(1/len(groups[i]) + 1/len(groups[j])) ))
q = numerator/denominator
q_dict["group "+ str(i+1) + " and group " + str(j+1)] = [q, q>crit_q]
return q_dict