Merge branch 'master-staged' of https://github.com/titanscouting/red-alliance-analysis into master-staged

2026-03-16 03:34:14 +00:00 · 2020-10-06 19:17:35 -07:00
parent 5d76eec848 350e0f9ed3
commit ab3585437c
9 changed files with 281 additions and 54 deletions
--- a/.devcontainer/devcontainer.json
+++ b/.devcontainer/devcontainer.json
@@ -24,5 +24,5 @@
 		"ms-python.python",
 		"waderyan.gitblame"
 	],
-	"postCreateCommand": "apt install vim -y ; pip install -r data-analysis/requirements.txt ; pip install -r analysis-master/requirements.txt ; pip install tra-analysis"
+	"postCreateCommand": "apt install vim -y ; pip install -r data-analysis/requirements.txt ; pip install -r analysis-master/requirements.txt ; pip install pylint ; pip install tra-analysis"
 }
--- a/README.md
+++ b/README.md
@@ -1,37 +1,102 @@
 # Red Alliance Analysis &middot; ![GitHub release (latest by date)](https://img.shields.io/github/v/release/titanscout2022/red-alliance-analysis)
+
 Titan Robotics 2022 Strategy Team Repository for Data Analysis Tools. Included with these tools are the backend data analysis engine formatted as a python package, associated binaries for the analysis package, and premade scripts that can be pulled directly from this repository and will integrate with other Red Alliance applications to quickly deploy FRC scouting tools.
-# Getting Started
+
+---
+
+# `tra-analysis`
+
+`tra-analysis` is a higher-level package for data processing and analysis. It is a Python library that combines popular data science tools like numpy, scipy, and sklearn with other tools to create an easy-to-use data analysis engine. tra-analysis covers analyses across the full range of complexity, from basic statistics like mean, median, and mode to complex kernel-based classifiers, and lets users deploy these algorithms more quickly. The package also includes performance metrics for score-based applications, including the elo, glicko2, and trueskill ranking systems.
+
+At the core of the tra-analysis package is the modularity of each analytical tool. The package encapsulates the setup code for the included data science tools. For example, there are many packages that allow users to generate many different types of regressions. With the tra-analysis package, one function can be called to generate many regressions and sort them by accuracy.
+
 ## Prerequisites
+---
+
 * Python >= 3.6
 * Pip which can be installed by running\
 `curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py`\
 `python get-pip.py`\
 after installing python, or with a package manager on linux. Refer to the [pip installation instructions](https://pip.pypa.io/en/stable/installing/) for more information.
+
 ## Installing
-### Standard Platforms
+---
+
+#### Standard Platforms
+
 For the latest version of tra-analysis, run `pip install tra-analysis` or `pip install tra_analysis`. The requirements for tra-analysis should be automatically installed.
-### Exotic Platforms (Android)
+
+#### Exotic Platforms (Android)
+
 [Termux](https://termux.com/) is recommended for a Linux environment on Android. Consult the [documentation](https://titanscouting.github.io/analysis/general/installation#exotic-platforms-android) for advice on installing the prerequisites. After installing the prerequisites, the package should be installed normally with `pip install tra-analysis` or `pip install tra_analysis`. 
+
 ## Use
+
+---
+
 tra-analysis operates like any other python package. Consult the [documentation](https://titanscouting.github.io/analysis/tra_analysis/) for more information.
-# Supported Platforms
+
+## Supported Platforms
+
+---
+
 Although any modern 64 bit platform should be supported, the following platforms have been tested to be working:
 * AMD64 (Tested on Zen, Zen+, and Zen 2)
 * Intel 64/x86_64/x64 (Tested on Kaby Lake, Ice Lake)
 * ARM64 (Tested on Broadcom BCM2836 SoC, Broadcom BCM2711 SoC)
-### 
+
 The following OSes have been tested to be working:
 * Linux Kernel 3.16, 4.4, 4.15, 4.19, 5.4
 	* Ubuntu 16.04, 18.04, 20.04
 	* Debian (and Debian derivatives) Jessie, Buster
 * Windows 7, 10
-### 
+
 The following python versions are supported:
 * python 3.6 (not tested)
 * python 3.7
 * python 3.8
+
+---
+
+# `data-analysis`
+
+To facilitate data analysis of collected scouting data in a user-friendly tool, we created the data-analysis application. At its core, it uses the tra-analysis package to run any number of user-selected tests on data collected from the TRA scouting app. It uploads the results back to MongoDB, where they can be viewed from the app at any time.
+
+The data-analysis application also uses the TRA API to interface with MongoDB and uses the TBA API to collect additional data (match win/loss).
+
+The application can be configured with a configuration tool or by editing the config.json directly.
+
+## Prerequisites
+
+---
+
+Before installing and using data-analysis, make sure that you have installed the following prerequisites:
+- A common operating system like **Windows** or (*most*) distributions of **Linux**. BSD may work, but it has not been tested and is not recommended.
+- [Python](https://www.python.org/) version **3.6** or higher
+- [Pip](https://pip.pypa.io/en/stable/) (installation instructions [here](https://pip.pypa.io/en/stable/installing/))
+
+## Installing Requirements
+
+---
+
+From the data-analysis folder, run `pip install -r requirements.txt` to install all of the required Python libraries.
+
+## Scripts
+
+---
+
+The data-analysis application is a collection of various scripts and one config file. For users, only the main application `superscript.py` and the config file `config.json` are important. 
+
+To run the data-analysis application, navigate to the data-analysis folder once all requirements have been installed and run `python superscript.py`. If you encounter the error:
+
+`pymongo.errors.ConfigurationError: Empty host (or extra comma in host list).`
+
+don't worry: the application has most likely just not been configured yet and should otherwise work. Refer to [the documentation](https://titanscouting.github.io/analysis/data_analysis/Config) to learn how to configure data-analysis.
+
 # Contributing
+
 Read our included contributing guidelines (`CONTRIBUTING.md`) for more information and feel free to reach out to any current maintainer for more information. 
+
 # Build Statuses
 ![Analysis Unit Tests](https://github.com/titanscout2022/red-alliance-analysis/workflows/Analysis%20Unit%20Tests/badge.svg)
-![Superscript Unit Tests](https://github.com/titanscout2022/red-alliance-analysis/workflows/Superscript%20Unit%20Tests/badge.svg?branch=master)
+![Superscript Unit Tests](https://github.com/titanscout2022/red-alliance-analysis/workflows/Superscript%20Unit%20Tests/badge.svg?branch=master)
--- a/SECURITY.md
+++ b/SECURITY.md
@@ -0,0 +1,6 @@
+# Security Policy
+
+
+## Reporting a Vulnerability
+
+Please email `titanscout2022@gmail.com` to report a vulnerability.
--- a/analysis-master/test_analysis.py
+++ b/analysis-master/test_analysis.py
@@ -1,8 +1,11 @@
 from tra_analysis import analysis as an
 from tra_analysis import metrics
+from tra_analysis import fits

 def test_():
 	test_data_linear = [1, 3, 6, 7, 9]
+	x_data_circular = []
+	y_data_circular = []
 	y_data_ccu = [1, 3, 7, 14, 21]
 	y_data_ccd = [1, 5, 7, 8.5, 8.66]
 	test_data_scrambled = [-32, 34, 19, 72, -65, -11, -43, 6, 85, -17, -98, -26, 12, 20, 9, -92, -40, 98, -78, 17, -20, 49, 93, -27, -24, -66, 40, 84, 1, -64, -68, -25, -42, -46, -76, 43, -3, 30, -14, -34, -55, -13, 41, -30, 0, -61, 48, 23, 60, 87, 80, 77, 53, 73, 79, 24, -52, 82, 8, -44, 65, 47, -77, 94, 7, 37, -79, 36, -94, 91, 59, 10, 97, -38, -67, 83, 54, 31, -95, -63, 16, -45, 21, -12, 66, -48, -18, -96, -90, -21, -83, -74, 39, 64, 69, -97, 13, 55, 27, -39]
@@ -28,4 +31,5 @@ def test_():
 	assert all(a == b for a, b in zip(an.Sort().shellsort(test_data_scrambled), test_data_sorted))
 	assert all(a == b for a, b in zip(an.Sort().bubblesort(test_data_scrambled), test_data_sorted))
 	assert all(a == b for a, b in zip(an.Sort().cyclesort(test_data_scrambled), test_data_sorted))
-	assert all(a == b for a, b in zip(an.Sort().cocktailsort(test_data_scrambled), test_data_sorted))
+	assert all(a == b for a, b in zip(an.Sort().cocktailsort(test_data_scrambled), test_data_sorted))
+	assert fits.CircleFit(x=[0,0,-1,1], y=[1, -1, 0, 0]).LSC() == (0.0, 0.0, 1.0, 0.0)
--- a/analysis-master/tra_analysis/fits.py
+++ b/analysis-master/tra_analysis/fits.py
@@ -0,0 +1,85 @@
+# Titan Robotics Team 2022: CPU fitting models
+# Written by Dev Singh
+# Notes:
+#   this module is vectorized with numpy; cupy support is not yet implemented (see todo in CircleFit)
+# setup:
+
+__version__ = "0.0.1"
+
+# changelog should be viewed using print(analysis.fits.__changelog__)
+__changelog__ = """changelog:
+	0.0.1:
+		- initial release, add circle fitting with LSC
+"""
+
+__author__ = (
+	"Dev Singh <dev@devksingh.com>"
+)
+
+__all__ = [
+	'CircleFit'
+]
+
+import numpy as np
+
+class CircleFit:
+	"""Class to fit data to a circle using the Least Square Circle (LSC) method"""
+	# For more information on the LSC method, see: 
+	# http://www.dtcenter.org/sites/default/files/community-code/met/docs/write-ups/circle_fit.pdf
+	def __init__(self, x, y, xy=None):
+		self.ournp = np #todo: implement cupy correctly
+		if isinstance(x, list):
+			x = np.array(x)
+		if isinstance(y, list):
+			y = np.array(y)
+		if isinstance(xy, list):
+			xy = np.array(xy)
+		# identity check: `xy != None` compares elementwise once xy is an array
+		if xy is not None: 
+			self.coords = xy
+		else: 
+			# following block combines x and y into one array if not already done
+			self.coords = self.ournp.vstack(([x.T], [y.T])).T
+	def calc_R(self, x, y, xc, yc):
+		"""Returns distance between center and point"""
+		return self.ournp.sqrt((x-xc)**2 + (y-yc)**2)
+	def f(self, c, x, y):
+		"""Returns deviation of each point's distance to center c from the mean distance"""
+		Ri = self.calc_R(x, y, *c)
+		return Ri - Ri.mean()
+	def LSC(self):
+		"""Fits given data to a circle and returns the center, radius, and sum of squared residuals"""
+		x = self.coords[:, 0]
+		y = self.coords[:, 1]
+		# guessing at a center
+		x_m = self.ournp.mean(x)
+		y_m = self.ournp.mean(y)
+
+		# calculation of the reduced coordinates
+		u = x - x_m
+		v = y - y_m
+
+		# linear system defining the center (uc, vc) in reduced coordinates:
+		#    Suu * uc +  Suv * vc = (Suuu + Suvv)/2
+		#    Suv * uc +  Svv * vc = (Suuv + Svvv)/2
+		Suv  = self.ournp.sum(u*v)
+		Suu  = self.ournp.sum(u**2)
+		Svv  = self.ournp.sum(v**2)
+		Suuv = self.ournp.sum(u**2 * v)
+		Suvv = self.ournp.sum(u * v**2)
+		Suuu = self.ournp.sum(u**3)
+		Svvv = self.ournp.sum(v**3)
+
+		# Solving the linear system
+		A = self.ournp.array([ [ Suu, Suv ], [Suv, Svv]])
+		B = self.ournp.array([ Suuu + Suvv, Svvv + Suuv ])/2.0
+		uc, vc = self.ournp.linalg.solve(A, B)
+
+		xc_1 = x_m + uc
+		yc_1 = y_m + vc
+
+		# Calculate the distances from center (xc_1, yc_1)
+		Ri_1     = self.ournp.sqrt((x-xc_1)**2 + (y-yc_1)**2)
+		R_1      = self.ournp.mean(Ri_1)
+		# calculate residual error
+		residu_1 = self.ournp.sum((Ri_1-R_1)**2)
+		return (xc_1, yc_1, R_1, residu_1)
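The linear system that `LSC` solves can also be sketched as a standalone function, which may be easier to test in isolation. This is a minimal sketch assuming only numpy; the name `fit_circle_lsc` and the sample circle (center (2, -1), radius 3) are hypothetical and not part of this commit.

```python
import numpy as np

def fit_circle_lsc(x, y):
    """Least-squares circle fit; returns (xc, yc, radius, residual)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Reduced coordinates: shift the points so their centroid is the origin.
    x_m, y_m = x.mean(), y.mean()
    u, v = x - x_m, y - y_m

    # Linear system for the center (uc, vc) in reduced coordinates:
    #   Suu*uc + Suv*vc = (Suuu + Suvv) / 2
    #   Suv*uc + Svv*vc = (Svvv + Suuv) / 2
    A = np.array([[np.sum(u * u), np.sum(u * v)],
                  [np.sum(u * v), np.sum(v * v)]])
    b = 0.5 * np.array([np.sum(u**3) + np.sum(u * v**2),
                        np.sum(v**3) + np.sum(u**2 * v)])
    uc, vc = np.linalg.solve(A, b)

    # Back to the original coordinates, then radius and residual.
    xc, yc = x_m + uc, y_m + vc
    r_i = np.hypot(x - xc, y - yc)
    radius = r_i.mean()
    residual = np.sum((r_i - radius) ** 2)
    return float(xc), float(yc), float(radius), float(residual)

# Hypothetical sample data: 12 points on a circle centered at (2, -1), radius 3.
theta = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
xc, yc, r, res = fit_circle_lsc(2 + 3 * np.cos(theta), -1 + 3 * np.sin(theta))
```

Because the returned values are floating point, comparing them with a tolerance is usually more robust than the exact tuple equality used in `test_analysis.py`. `np.linalg.solve` raises `LinAlgError` when the points are collinear, since the matrix is then singular.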
--- a/analysis-master/tra_analysis/regression.py
+++ b/analysis-master/tra_analysis/regression.py
@@ -1,8 +1,9 @@
 # Titan Robotics Team 2022: CUDA-based Regressions Module
+# Not actively maintained, may be removed in future release
 # Written by Arthur Lu & Jacob Levine
 # Notes:
 #   this module has been automatically integrated into analysis.py, and should be callable as a class from the package
-#   this module is cuda-optimized and vectorized (except for one small part)
+#   this module is cuda-optimized (as appropriate) and vectorized (except for one small part)
 # setup:

 __version__ = "0.0.4"
@@ -25,7 +26,7 @@ __changelog__ = """

 __author__ = (
 	"Jacob Levine <jlevine@imsa.edu>",
-	"Arthur Lu <learthurgo@gmail.com>"
+	"Arthur Lu <learthurgo@gmail.com>",
 )

 __all__ = [
@@ -40,14 +41,15 @@ __all__ = [
 	'ExpRegKernel',
 	'SigmoidalRegKernelArthur',
 	'SGDTrain',
-	'CustomTrain'
+	'CustomTrain',
+	'CircleFit'
 ]

 import torch

 global device

-device = "cuda:0" if torch.torch.cuda.is_available() else "cpu"
+device = "cuda:0" if torch.cuda.is_available() else "cpu"

 #todo: document completely

@@ -217,4 +219,4 @@ def CustomTrain(self, kernel, optim, data, ground, loss=torch.nn.MSELoss(), iter
 				ls=loss(pred,ground_cuda)
 				ls.backward()
 				optim.step()
-		return kernel
+		return kernel
--- a/data-analysis/config.json
+++ b/data-analysis/config.json
@@ -1,6 +1,7 @@
 {
+	"max-threads": 0.5,
 	"team": "",
-	"competition": "2020ilch",
+	"competition": "",
 	"key":{
 		"database":"",
 		"tba":""
--- a/data-analysis/requirements.txt
+++ b/data-analysis/requirements.txt
@@ -1,4 +1,4 @@
 requests
 pymongo
 pandas
-dnspython
+tra-analysis
--- a/data-analysis/superscript.py
+++ b/data-analysis/superscript.py
@@ -3,10 +3,18 @@
 # Notes:
 # setup:

-__version__ = "0.7.0"
+__version__ = "0.8.2"

 # changelog should be viewed using print(analysis.__changelog__)
 __changelog__ = """changelog:
+	0.8.2:
+		- readded while true to main function
+		- added more thread config options
+	0.8.1:
+		- optimized matchloop further by bypassing GIL
+	0.8.0:
+		- added multithreading to matchloop
+		- tweaked user log
 	0.7.0:
 		- finished implementing main function
 	0.6.2:
@@ -114,16 +122,25 @@ __all__ = [

 from tra_analysis import analysis as an
 import data as d
+from collections import defaultdict
 import json
+import math
 import numpy as np
+import os
 from os import system, name
 from pathlib import Path
+from multiprocessing import Pool
 import matplotlib.pyplot as plt
+from concurrent.futures import ThreadPoolExecutor
 import time
 import warnings

+global exec_threads
+
 def main():

+	global exec_threads
+
 	warnings.filterwarnings("ignore")

 	while (True):
@@ -138,6 +155,23 @@ def main():
 		metrics_tests = config["statistics"]["metric"]
 		print("[OK] configs loaded")

+		print("[OK] starting threads")
+		cfg_max_threads = config["max-threads"]
+		sys_max_threads = os.cpu_count()
+		if cfg_max_threads > -sys_max_threads and cfg_max_threads < 0 :
+			alloc_processes = sys_max_threads + cfg_max_threads
+		elif cfg_max_threads > 0 and cfg_max_threads < 1:
+			alloc_processes = math.floor(cfg_max_threads * sys_max_threads)
+		elif cfg_max_threads >= 1 and cfg_max_threads <= sys_max_threads:
+			alloc_processes = cfg_max_threads
+		elif cfg_max_threads == 0:
+			alloc_processes = sys_max_threads
+		else:
+			print("[Err] Invalid number of processes, must be between -" + str(sys_max_threads) + " and " + str(sys_max_threads))
+			exit()
+		exec_threads = Pool(processes = alloc_processes)
+		print("[OK] " + str(alloc_processes) + " threads started")
+
 		apikey = config["key"]["database"]
 		tbakey = config["key"]["tba"]
 		print("[OK] loaded keys")
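The `max-threads` handling in this hunk can be sketched as a pure function, which makes the four cases (zero, negative offset, fraction, explicit count) easy to unit test. This is a sketch under stated assumptions: the name `resolve_process_count` is hypothetical, the function raises `ValueError` instead of calling `exit()`, it accepts exactly 1 as a valid count, and it clamps fractional results to at least one worker. None of these choices is confirmed by the commit.

```python
import math
import os

def resolve_process_count(cfg_max, sys_max=None):
    """Map a `max-threads` config value to a number of worker processes."""
    if sys_max is None:
        sys_max = os.cpu_count() or 1  # cpu_count() may return None

    if cfg_max == 0:
        # 0 means "use every available core".
        return sys_max
    if 0 < cfg_max < 1:
        # Fraction of the available cores.
        # Assumption: clamp to 1 so a small machine never gets 0 workers.
        return max(1, math.floor(cfg_max * sys_max))
    if 1 <= cfg_max <= sys_max:
        # Explicit worker count.
        return int(cfg_max)
    if -sys_max < cfg_max < 0:
        # Negative values leave that many cores free.
        # int() truncates toward zero; the commit does not say how
        # non-integer values in this range should be treated.
        return sys_max + int(cfg_max)
    raise ValueError(
        "max-threads must be between -%d and %d" % (sys_max, sys_max)
    )

# With 8 cores, the shipped default of 0.5 resolves to 4 workers.
default_workers = resolve_process_count(0.5, sys_max=8)
```

Passing `sys_max` explicitly keeps the function deterministic in tests. The caller could then build the pool once with `Pool(processes=resolve_process_count(config["max-threads"]))`.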
@@ -151,15 +185,15 @@ def main():
 		pit_data = load_pit(apikey, competition)
 		print("[OK] loaded data in " + str(time.time() - start) + " seconds")

-		print("[OK] running tests")
+		print("[OK] running match stats")
 		start = time.time()
 		matchloop(apikey, competition, match_data, match_tests)
-		print("[OK] finished tests in " + str(time.time() - start) + " seconds")
+		print("[OK] finished match stats in " + str(time.time() - start) + " seconds")

-		print("[OK] running metrics")
+		print("[OK] running team metrics")
 		start = time.time()
 		metricloop(tbakey, apikey, competition, previous_time, metrics_tests)
-		print("[OK] finished metrics in " + str(time.time() - start) + " seconds")
+		print("[OK] finished team metrics in " + str(time.time() - start) + " seconds")

 		print("[OK] running pit analysis")
 		start = time.time()
@@ -217,48 +251,78 @@ def load_match(apikey, competition):

 	return d.get_match_data_formatted(apikey, competition)

+def simplestats(data_test):
+
+	data = np.array(data_test[0])
+	data = data[np.isfinite(data)]
+	ranges = list(range(len(data)))
+
+	test = data_test[1]
+
+	if test == "basic_stats":
+		return an.basic_stats(data)
+
+	if test == "historical_analysis":
+		return an.histo_analysis([ranges, data])
+
+	if test == "regression_linear":
+		return an.regression(ranges, data, ['lin'])
+
+	if test == "regression_logarithmic":
+		return an.regression(ranges, data, ['log'])
+
+	if test == "regression_exponential":
+		return an.regression(ranges, data, ['exp'])
+
+	if test == "regression_polynomial":
+		return an.regression(ranges, data, ['ply'])
+
+	if test == "regression_sigmoidal":
+		return an.regression(ranges, data, ['sig'])
+
 def matchloop(apikey, competition, data, tests): # expects 3D array with [Team][Variable][Match]

-	def simplestats(data, test):
+	global exec_threads

-		data = np.array(data)
-		data = data[np.isfinite(data)]
-		ranges = list(range(len(data)))
-
-		if test == "basic_stats":
-			return an.basic_stats(data)
-
-		if test == "historical_analysis":
-			return an.histo_analysis([ranges, data])
-
-		if test == "regression_linear":
-			return an.regression(ranges, data, ['lin'])
-
-		if test == "regression_logarithmic":
-			return an.regression(ranges, data, ['log'])
-
-		if test == "regression_exponential":
-			return an.regression(ranges, data, ['exp'])
-
-		if test == "regression_polynomial":
-			return an.regression(ranges, data, ['ply'])
-
-		if test == "regression_sigmoidal":
-			return an.regression(ranges, data, ['sig'])
+	class AutoVivification(dict):
+		def __getitem__(self, item):
+			try:
+				return dict.__getitem__(self, item)
+			except KeyError:
+				value = self[item] = type(self)()
+				return value

 	return_vector = {}
+	
+	team_filtered = []
+	variable_filtered = []
+	variable_data = []
+	test_filtered = []
+	result_filtered = []
+	return_vector = AutoVivification()
+
 	for team in data:
-		variable_vector = {}
+
 		for variable in data[team]:
-			test_vector = {}
-			variable_data = data[team][variable]
+
 			if variable in tests:
+
 				for test in tests[variable]:
-					test_vector[test] = simplestats(variable_data, test)
-			else:
-				pass      
-			variable_vector[variable] = test_vector
-		return_vector[team] = variable_vector
+
+					team_filtered.append(team)
+					variable_filtered.append(variable)
+					variable_data.append((data[team][variable], test))
+					test_filtered.append(test)
+
+	result_filtered = exec_threads.map(simplestats, variable_data)
+	i = 0
+
+	result_filtered = list(result_filtered)
+
+	for result in result_filtered:
+
+		return_vector[team_filtered[i]][variable_filtered[i]][test_filtered[i]] = result
+		i += 1

 	push_match(apikey, competition, return_vector)
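The reworked `matchloop` flattens the nested team/variable/test loops into a list of jobs, maps a worker over that list, and then rebuilds the nested result. A minimal sketch of that pattern follows, assuming hypothetical names (`run_tests`, `toy_worker`) and made-up sample data. It defaults to the built-in `map` so it runs without a process pool, and it uses `dict.setdefault` in place of the `AutoVivification` helper.

```python
def run_tests(data, tests, worker, map_fn=map):
    """Flatten (team, variable, test) jobs, map a worker, rebuild nested results."""
    keys, jobs = [], []
    for team, variables in data.items():
        for variable, values in variables.items():
            # Only variables listed in `tests` produce jobs.
            for test in tests.get(variable, []):
                keys.append((team, variable, test))
                jobs.append((values, test))

    # map_fn can be the built-in map or a pool's map method, for example
    # multiprocessing.Pool(...).map; both preserve input order.
    results = {}
    for (team, variable, test), result in zip(keys, map_fn(worker, jobs)):
        results.setdefault(team, {}).setdefault(variable, {})[test] = result
    return results

def toy_worker(job):
    """Stand-in for simplestats: takes one (values, test) tuple."""
    values, test = job
    if test == "basic_stats":
        return {"mean": sum(values) / len(values)}
    return None

# Hypothetical sample input in the [Team][Variable][Match] layout.
data = {"2022": {"balls": [1, 2, 3], "notes": [0, 0, 1]}}
tests = {"balls": ["basic_stats"]}
out = run_tests(data, tests, toy_worker)
# out == {"2022": {"balls": {"basic_stats": {"mean": 2.0}}}}
```

Keeping the keys and the jobs in parallel lists relies on the map preserving order, which holds for both the built-in `map` and `Pool.map`. The worker takes a single tuple because `Pool.map` passes exactly one argument per job, which is also why `simplestats` was changed to accept `data_test`.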