Statistics#

Different statistical tests that can be used to calculate contrast curves and contrast grids.

Parametric#

Parametric tests for contrast and detection limit calculations.

applefy.statistics.parametric.t_statistic_vectorized(noise_samples, planet_samples, numba_parallel_threads=1)#

Computes the test statistic of the ttest / bootstrapping using vectorized code. Usually, noise_samples and planet_samples each contain multiple samples. For every pair of noise and planet sample, this function returns one T value. If more than 10e4 test values have to be computed, a parallel implementation using numba can be used.

Parameters
  • planet_samples (Union[float, ndarray]) – The planet sample containing the observation of the planet \(Y_1\) as a float. In case multiple tests are performed the input can also be a list or 1D array \(((Y_1)_1, (Y_1)_2, ...)\).

  • noise_samples (Union[List[float], ndarray]) – The noise observations containing \((X_1, ..., X_n)\). In case multiple tests are performed the input can also be a list of lists / 1D arrays or a 2D array: \((X_1, ..., X_n)_1, (X_1, ..., X_n)_2, ...\).

  • numba_parallel_threads (int) – The number of parallel threads used by numba. In case the function is used with multiprocessing this number should always be equal to the default of 1.

Return type

Union[float, ndarray]

Returns

The ttest / bootstrapping test statistic \(T_{obs}\). Either a single value or a 1D numpy array.
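The statistic described above can be sketched in plain NumPy. This is a minimal illustration, assuming the standard pooled two-sample statistic with a single planet value (m = 1), which reduces to \(T = (Y_1 - \bar{X}) / (s_X \sqrt{1 + 1/n})\). The function name `t_statistic_sketch` is hypothetical, and the numba code path is not reproduced.

```python
import numpy as np


def t_statistic_sketch(noise_samples, planet_samples):
    """Two-sample t statistic for n noise values and one planet value."""
    # Promote inputs so a single test and a batch share one code path.
    noise = np.atleast_2d(np.asarray(noise_samples, dtype=float))
    planet = np.atleast_1d(np.asarray(planet_samples, dtype=float))

    n = noise.shape[1]                      # noise values per test
    noise_mean = noise.mean(axis=1)
    noise_std = noise.std(axis=1, ddof=1)   # unbiased sample standard deviation

    # Assumption: with m = 1 the pooled standard error is s * sqrt(1 + 1/n).
    t_obs = (planet - noise_mean) / (noise_std * np.sqrt(1.0 + 1.0 / n))
    return t_obs[0] if t_obs.size == 1 else t_obs


# One test: five noise values and one planet value.
single = t_statistic_sketch([1.0, 2.0, 3.0, 4.0, 5.0], 6.0)

# Two tests at once: one row of noise values per planet value.
batch = t_statistic_sketch(
    [[1.0, 2.0, 3.0, 4.0, 5.0], [0.0, 1.0, 0.0, 1.0, 0.0]],
    [6.0, 2.0],
)
```

Each row of the 2D noise input is paired with the planet value at the same index, so the batch call returns one T value per pair.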

class applefy.statistics.parametric.TTest(num_cpus=1)#

Bases: TestInterface

A classical two sample ttest. Assumes iid Gaussian noise and tests for differences in means.

classmethod fpf_2_t(fpf, num_noise_values)#

Computes the required value of \(T_{obs}\) (the test statistic) to get a confidence level of fpf. Takes into account the effect of the sample size by using the t-distribution. Accepts a single value as input as well as a list of fpf values.

Parameters
  • fpf (Union[float, ndarray]) – Desired confidence level(s) as FPF

  • num_noise_values (int) – Number of noise observations. Needed to take the effect of the sample size into account.

Return type

Union[float, ndarray]

Returns

The required value(s) of \(T_{obs}\)

t_2_fpf(statistic_t, num_noise_values)#

Computes the p-value of the ttest given the test statistic \(T_{obs}\). Takes into account the effect of the sample size by using the t-distribution. Accepts a single value as input as well as a list of \(T_{obs}\) values.

Parameters
  • statistic_t (Union[float, ndarray]) – The test statistic value(s) \(T_{obs}\)

  • num_noise_values (int) – Number of noise observations. Needed to take the effect of the sample size into account.

Return type

Union[float, ndarray]

Returns

The uncertainty / p-value / fpf of the test
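The two conversions above are inverses of each other. A minimal sketch with `scipy.stats.t`, assuming a one-sided test with `num_noise_values - 1` degrees of freedom (n noise values plus one planet value, minus two); the function names are hypothetical and this is not the library's own code:

```python
from scipy import stats


def fpf_2_t_sketch(fpf, num_noise_values):
    # Upper-tail quantile of Student's t distribution.
    # Assumption: df = n + m - 2 with m = 1 planet value.
    return stats.t.isf(fpf, df=num_noise_values - 1)


def t_2_fpf_sketch(statistic_t, num_noise_values):
    # One-sided p-value P(T >= T_obs) under the null hypothesis.
    return stats.t.sf(statistic_t, df=num_noise_values - 1)


# The same FPF needs a much larger T_obs when few noise values are available.
t_small_sample = fpf_2_t_sketch(2.87e-7, num_noise_values=6)
t_large_sample = fpf_2_t_sketch(2.87e-7, num_noise_values=10000)

# Converting back recovers the FPF.
fpf_back = t_2_fpf_sketch(t_small_sample, num_noise_values=6)
```

For large sample sizes the threshold approaches the Gaussian value of about 5, while at small sample sizes it is considerably higher.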

test_2samp(planet_samples, noise_samples)#

Performs one (or multiple) two sample T-Tests given noise observations \((X_1, ..., X_n)\) and a planet observation \(Y_1\) with null hypothesis:

\(H_0: \mu_X = \mu_Y\)

against the alternative:

\(H_1: \mu_X < \mu_Y\)

This implementation is similar to the one in scipy but calculates the special case where only one value is given in the planet sample.

Parameters
  • planet_samples (Union[float, ndarray]) – The planet sample containing the observation of the planet \(Y_1\) as a float. In case multiple tests are performed the input can also be a list or 1D array \(((Y_1)_1, (Y_1)_2, ...)\).

  • noise_samples (Union[List[float], ndarray]) – The noise observations containing \((X_1, ..., X_n)\). In case multiple tests are performed the input can also be a list of lists / 1D arrays or a 2D array: \((X_1, ..., X_n)_1, (X_1, ..., X_n)_2, ...\)

Return type

Union[Tuple[float, float], Tuple[ndarray, ndarray]]

Returns

1. The p-value (or a 1d array of p-values) of the test i.e. the FPF.

2. The test statistic \(T_{obs}\) (or a 1d array of the statistics).

constrain_planet(noise_at_planet_pos, noise_samples, desired_confidence_fpf)#

The inverse of test_2samp. Given noise observations \((X_1, ..., X_{n-1})\) and a single noise observation \(X_n\) this function computes how much flux we have to add to \(X_n\) such that a t-test with noise \((X_1, ..., X_{n-1})\) and planet signal \(Y_1 = X_n + f\) rejects the null hypothesis i.e. reaches the desired confidence level. The added flux \(f\) is the flux a potential planet needs to have such that we would count it as a detection (assuming we observe it together with the noise at \(X_n\)). Similar to test_2samp this function also accepts lists as inputs to constrain multiple added_flux values at the same time.

Parameters
  • noise_at_planet_pos (Union[float, List[float], ndarray]) – The noise observation \(X_n\) on top of which the planet is added. In case multiple values are constrained at the same time this can also be a list or 1D array of floats.

  • noise_samples (Union[List[float], ndarray]) – List or 1D array of noise observations containing \((X_1, ..., X_{n-1})\). In case multiple tests are performed the input can also be a list of lists / 1D arrays or a 2D array: \((X_1, ..., X_{n-1})_1, (X_1, ..., X_{n-1})_2, ...\).

  • desired_confidence_fpf (float) – The desired confidence level to reach, given as FPF. For example, a 5 sigma detection corresponds to an FPF of 2.87e-7.

Return type

Union[float, ndarray]

Returns

The flux \(f\) we need to add to noise_at_planet_pos to reach the desired_confidence_fpf for a t-test. In case of multiple tests the output is a 1D array.
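Inverting the test amounts to solving the test statistic for the added flux. A minimal sketch, assuming the pooled statistic \(T = (X_n + f - \bar{X}) / (s_X \sqrt{1 + 1/k})\) for \(k\) noise values and a t-distribution with \(k - 1\) degrees of freedom; the function name `constrain_planet_sketch` is hypothetical:

```python
import numpy as np
from scipy import stats


def constrain_planet_sketch(noise_at_planet_pos, noise_samples,
                            desired_confidence_fpf):
    """Flux f such that Y_1 = X_n + f just reaches the desired FPF."""
    noise = np.asarray(noise_samples, dtype=float)
    k = noise.shape[-1]  # number of noise values per test

    # Required test statistic (assumption: one-sided test, df = k - 1).
    t_required = stats.t.isf(desired_confidence_fpf, df=k - 1)

    # Denominator of the test statistic.
    scale = noise.std(axis=-1, ddof=1) * np.sqrt(1.0 + 1.0 / k)

    # Solve t_required = (X_n + f - mean(X)) / scale for f.
    x_n = np.asarray(noise_at_planet_pos, dtype=float)
    return t_required * scale + noise.mean(axis=-1) - x_n


noise = [1.0, 2.0, 3.0, 4.0, 5.0]
flux = constrain_planet_sketch(3.5, noise, desired_confidence_fpf=0.05)

# Check: the planet signal X_n + f yields exactly the required statistic.
t_check = (3.5 + flux - np.mean(noise)) / (np.std(noise, ddof=1) * np.sqrt(1.2))
fpf_check = stats.t.sf(t_check, df=4)
```

Because the noise value \(X_n\) at the planet position enters the result directly, a bright noise value lowers the flux that has to be added and a faint one raises it.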

Bootstrapping#

This module contains implementations of parametric bootstrapping tests which can be used to calculate contrast curves and contrast grids.

class applefy.statistics.bootstrapping.BootstrapTest(noise_observations, num_cpus=1)#

Bases: TTest

This is a general interface for all bootstrap tests. It implements all functionality needed to save and load bootstrap results as json files and to run a new bootstrap experiment. It extends the classical TTest() as both use the same test statistic \(T_{obs}\). But for a given value of \(T_{obs}\) the bootstrap tests will give a different p-value / fpf.

__init__(noise_observations, num_cpus=1)#

Constructor of a BootstrapTest.

Parameters
  • noise_observations (Optional[Union[List[float], ndarray]]) – Noise observations which are used to run the bootstrap experiments. This can either be a single sample of noise observations \((X_1, ..., X_n)\) or a numpy array of several samples \((X_1, ..., X_n)_1, (X_1, ..., X_n)_2, ...\). It can also be set to None. In that case run_bootstrap_experiment() cannot be used, so this option should only be used to restore a previously run bootstrap experiment from a .json file (see restore_lookups()).

  • num_cpus (int) – The number of CPU cores that will be used in all tests (e.g. to run the bootstrapping).

classmethod construct_from_json_file(lookup_file)#

An alternative constructor to create a BootstrapTest from a previously calculated bootstrap experiment given as a .json file. The function will restore the lookup table which maps \(T_{obs}\) to the p-values and vice versa.

Parameters

lookup_file (str) – A path to a .json file containing the statistics to be restored (lookup tables).

Return type

BootstrapTest

Returns

Instance of the BootstrapTest with restored lookup table.

restore_lookups(lookup_file)#

Restores previously computed bootstrap results / lookup tables from a .json file. The lookup table maps \(T_{obs}\) to the p-values and vice versa. If lookups already exist they are updated. Duplicates are overwritten.

Parameters

lookup_file (str) – A path to a .json file containing the lookup tables.

Return type

None

save_lookups(lookup_file)#

Saves the internal lookup tables into a .json file. The lookup tables map \(T_{obs}\) to the p-values and vice versa. The saved tables can be restored with restore_lookups().

Parameters

lookup_file (str) – The path with filename where lookup tables are saved.

Return type

None
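The save and restore behaviour described above can be illustrated with the standard `json` module. This is only a sketch: the table layout (one entry per noise sample size holding a sigma grid and the matching \(T\) values) and the function names are assumptions, not the file format applefy actually writes.

```python
import json
import os
import tempfile


def save_lookups_sketch(lookups, lookup_file):
    """Write the lookup tables to a .json file."""
    with open(lookup_file, "w") as handle:
        json.dump(lookups, handle)


def restore_lookups_sketch(lookups, lookup_file):
    """Merge tables from a .json file into lookups; duplicates are overwritten."""
    with open(lookup_file) as handle:
        lookups.update(json.load(handle))
    return lookups


# Hypothetical layout: one entry per noise sample size (JSON keys are strings).
stored = {"6": {"sigma": [0.0, 1.0], "t": [0.0, 1.2]}}
current = {
    "6": {"sigma": [0.0], "t": [0.0]},
    "12": {"sigma": [0.0], "t": [0.0]},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "lookups.json")
    save_lookups_sketch(stored, path)
    merged = restore_lookups_sketch(current, path)
```

After restoring, the entry for sample size 6 comes from the file, while the entry for sample size 12 that existed only in memory is kept.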

run_bootstrap_experiment(memory_size, num_noise_values, num_draws=1000000000.0, approximation_interval=None)#

Runs a bootstrapping experiment (resampling) in order to calculate the distribution of the test statistic \(T\) under \(H_0\) given a sample size of m=1 (one planet observation) and n=num_noise_values. Allows the use of multiprocessing and management of memory size. The result is stored as a lookup table using the approximation_interval. The strategy used to resample during the bootstrapping is implemented by the classes inheriting from this class.

Parameters
  • memory_size (int) – Maximum number of float values stored per process. If this number is too small to hold all draws at once, the draws are computed in a loop.

  • num_noise_values (int) – The sample size of the noise observations. This depends on the separation from the star.

  • num_draws (int) – Number of bootstrap experiments (resamples) \(B\).

  • approximation_interval (Optional[ndarray]) – The values in terms of \(\sigma_{\mathcal{N}}\) at which the distribution of \(T_{obs}\) is evaluated and stored as a lookup table. If None, np.linspace(-7, 7, 10000) is used.

Return type

ndarray

Returns

A 1D array with \(B\) bootstrap values \(T^*\).
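A parametric bootstrap of this kind can be sketched as follows. The example assumes Laplace noise, the pooled statistic with one planet value, and a simple chunking rule derived from memory_size; multiprocessing and the lookup table are left out, and `bootstrap_t_sketch` is a hypothetical name rather than the library routine.

```python
import numpy as np


def bootstrap_t_sketch(num_noise_values, num_draws, memory_size, seed=0):
    """Draw T* under H0 for n noise values and m = 1 planet value."""
    rng = np.random.default_rng(seed)
    n = num_noise_values

    # Each draw needs n noise values plus one planet value, so at most
    # memory_size // (n + 1) draws fit into memory at once.
    draws_per_chunk = max(1, memory_size // (n + 1))

    chunks = []
    remaining = num_draws
    while remaining > 0:
        size = min(draws_per_chunk, remaining)
        # Under H0 the planet value follows the same distribution as the noise.
        sample = rng.laplace(size=(size, n + 1))
        noise, planet = sample[:, :n], sample[:, n]
        scale = noise.std(axis=1, ddof=1) * np.sqrt(1.0 + 1.0 / n)
        chunks.append((planet - noise.mean(axis=1)) / scale)
        remaining -= size

    return np.concatenate(chunks)


t_star = bootstrap_t_sketch(num_noise_values=6, num_draws=20000,
                            memory_size=7000)

# Empirical FPF of an observed statistic: fraction of T* at least as large.
fpf_estimate = np.mean(t_star >= 3.0)
```

The number of draws limits the smallest FPF that can be resolved, which is why very small FPF values require a large number of draws.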

t_2_fpf(statistic_t, num_noise_values)#

Computes the p-value of the ttest given the test statistic \(T_{obs}\). Takes into account the effect of the sample size and type of the noise (using previously computed lookup tables). Accepts a single value as input as well as a list of \(T_{obs}\) values.

Parameters
  • statistic_t (Union[float, ndarray]) – The test statistic value(s) \(T_{obs}\)

  • num_noise_values (int) – Number of noise observations. Needed to take the effect of the sample size into account.

Return type

Union[float, ndarray]

Returns

The uncertainty / p-value / fpf of the test

fpf_2_t(fpf, num_noise_values)#

Computes the required value of \(T_{obs}\) (the test statistic) to get a confidence level of fpf. Takes into account the effect of the sample size and type of the noise (using previously computed lookup tables). Accepts a single value as input as well as a list of fpf values.

Parameters
  • fpf (Union[float, ndarray]) – Desired confidence level(s) as FPF

  • num_noise_values (int) – Number of noise observations. Needed to take the effect of the sample size into account.

Return type

Union[float, ndarray]

Returns

The required value(s) of \(T_{obs}\)
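One way such a lookup table could work is sketched below. It assumes the table stores, for every \(\sigma_{\mathcal{N}}\) on a grid, the matching upper quantile of the bootstrap values \(T^*\), and that both conversions interpolate linearly in that table. All function names are hypothetical, the inputs are scalars only, and Student-t draws stand in for real bootstrap results.

```python
from statistics import NormalDist

import numpy as np

_NORMAL = NormalDist()  # standard normal distribution


def build_lookup_sketch(t_star, sigma_grid):
    """For each sigma on the grid, store the matching quantile of T*."""
    fpf_grid = np.array([_NORMAL.cdf(-s) for s in sigma_grid])
    return np.quantile(t_star, 1.0 - fpf_grid)


def t_2_fpf_lookup_sketch(statistic_t, sigma_grid, t_grid):
    # T_obs -> sigma by interpolation, then sigma -> FPF.
    sigma = float(np.interp(statistic_t, t_grid, sigma_grid))
    return _NORMAL.cdf(-sigma)


def fpf_2_t_lookup_sketch(fpf, sigma_grid, t_grid):
    # FPF -> sigma, then sigma -> T_obs by interpolation.
    sigma = -_NORMAL.inv_cdf(fpf)
    return float(np.interp(sigma, sigma_grid, t_grid))


# Stand-in bootstrap values (assumption: Student-t draws with 5 degrees
# of freedom instead of an actual bootstrap experiment).
rng = np.random.default_rng(0)
t_star = rng.standard_t(df=5, size=50_000)

sigma_grid = np.linspace(-2.0, 2.0, 41)
t_grid = build_lookup_sketch(t_star, sigma_grid)

t_required = fpf_2_t_lookup_sketch(0.05, sigma_grid, t_grid)
fpf_back = t_2_fpf_lookup_sketch(t_required, sigma_grid, t_grid)
```

The grid only covers the range that the number of draws can resolve; values of \(T_{obs}\) outside the table are clamped to its ends by `np.interp`.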

class applefy.statistics.bootstrapping.GaussianBootstrapTest(noise_observations, num_cpus=1)#

Bases: BootstrapTest

The GaussianBootstrapTest is a parametric hypothesis test which assumes that the noise is Gaussian. This test is approximately equivalent to the ttest. This implementation is only for illustration purposes and should not be used in practice.

class applefy.statistics.bootstrapping.LaplaceBootstrapTest(noise_observations=None, num_cpus=1)#

Bases: BootstrapTest

The LaplaceBootstrapTest is a parametric hypothesis test which assumes that the distribution of the noise is Laplace. The test accounts for the higher occurrence rate of bright noise values as well as for the small sample size at close separations to the star. Applefy comes with previously computed lookup tables.

__init__(noise_observations=None, num_cpus=1)#

Constructor of a LaplaceBootstrapTest.

Parameters
  • noise_observations (Optional[Union[List[float], ndarray]]) – Noise observations which are used to run the bootstrap experiments. The LaplaceBootstrapTest benefits from pivoting. Hence, noise_observations will have no effect on the result.

  • num_cpus (int) – The number of CPU cores that will be used in all tests (e.g. to run the bootstrapping).

General#

The interface needed to make all statistical tests compatible with the contrast curve calculation.

class applefy.statistics.general.TestInterface(num_cpus=1)#

Bases: ABC

A general interface for two sample tests. This interface guarantees that the code for the contrast curve / grid calculations can be used with all tests implemented.

__init__(num_cpus=1)#

The tests might use multiprocessing.

Parameters

num_cpus (int) – The number of CPU cores that will be used in all tests (e.g. to run the bootstrapping).

abstract test_2samp(planet_samples, noise_samples)#

Performs one (or multiple) two sample tests given noise observations \((X_1, ..., X_n)\) and a planet observation \(Y_1\) with null hypothesis:

\(H_0: \mu_X = \mu_Y\)

against the alternative:

\(H_1: \mu_X < \mu_Y\)

Parameters
  • planet_samples (Union[float, ndarray]) – The planet sample containing the observation of the planet \(Y_1\) as a float. In case multiple tests are performed the input can also be a list or 1D array \(((Y_1)_1, (Y_1)_2, ...)\).

  • noise_samples (Union[List[float], ndarray]) – The noise observations containing \((X_1, ..., X_n)\). In case multiple tests are performed the input can also be a list of lists / 1D arrays or a 2D array: \((X_1, ..., X_n)_1, (X_1, ..., X_n)_2, ...\).

Return type

Union[Tuple[float, float], Tuple[ndarray, ndarray]]

Returns

1. The p-value (or a 1d array of p-values) of the test i.e. the FPF. The interface returns -1.

2. The test statistic \(T_{obs}\) (or a 1d array of the statistics). The interface returns -1.

abstract constrain_planet(noise_at_planet_pos, noise_samples, desired_confidence_fpf)#

The inverse of test_2samp. Given noise observations \((X_1, ..., X_{n-1})\) and a single noise observation \(X_n\) this function computes how much flux we have to add to \(X_n\) such that a test_2samp with noise \((X_1, ..., X_{n-1})\) and planet signal \(Y_1 = X_n + f\) rejects the null hypothesis i.e. reaches the desired confidence level. The added flux \(f\) is the flux a potential planet needs to have such that we would count it as a detection (assuming we observe it together with the noise at \(X_n\)). Similar to test_2samp this function also accepts lists as inputs to constrain multiple added_flux values at the same time.

Parameters
  • noise_at_planet_pos (Union[float, List[float], ndarray]) – The noise observation \(X_n\) on top of which the planet is added. In case multiple values are constrained at the same time this can also be a list or 1D array of floats.

  • noise_samples (Union[List[float], ndarray]) – List or 1D array of noise observations containing \((X_1, ..., X_{n-1})\). In case multiple tests are performed the input can also be a list of lists / 1D arrays or a 2D array: \((X_1, ..., X_{n-1})_1, (X_1, ..., X_{n-1})_2, ...\).

  • desired_confidence_fpf (float) – The desired confidence level to reach, given as FPF. For example, a 5 sigma detection corresponds to an FPF of 2.87e-7.

Return type

Union[float, ndarray]

Returns

The flux \(f\) we need to add to noise_at_planet_pos to reach the desired_confidence_fpf. In case of multiple tests the output is a 1D array. The interface returns -1.
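To illustrate what implementing such an interface involves, the sketch below defines a stand-in abstract base class and one toy subclass. Both classes are hypothetical: `InterfaceSketch` only mirrors the two abstract methods described above and is not the applefy class, and `GaussianZTestSketch` ignores small-sample effects and handles a single test with scalar inputs only.

```python
from abc import ABC, abstractmethod
from statistics import NormalDist, mean, stdev


class InterfaceSketch(ABC):
    """Stand-in for the two-sample test interface (not the applefy class)."""

    def __init__(self, num_cpus=1):
        self.num_cpus = num_cpus

    @abstractmethod
    def test_2samp(self, planet_samples, noise_samples):
        """Return (p-value, test statistic)."""

    @abstractmethod
    def constrain_planet(self, noise_at_planet_pos, noise_samples,
                         desired_confidence_fpf):
        """Return the flux needed to reach the desired FPF."""


class GaussianZTestSketch(InterfaceSketch):
    """Toy test using the normal distribution (illustration only)."""

    @staticmethod
    def _scale(noise_samples):
        n = len(noise_samples)
        return stdev(noise_samples) * (1.0 + 1.0 / n) ** 0.5

    def test_2samp(self, planet_samples, noise_samples):
        t_obs = (planet_samples - mean(noise_samples)) / self._scale(noise_samples)
        return NormalDist().cdf(-t_obs), t_obs

    def constrain_planet(self, noise_at_planet_pos, noise_samples,
                         desired_confidence_fpf):
        t_required = -NormalDist().inv_cdf(desired_confidence_fpf)
        return (t_required * self._scale(noise_samples)
                + mean(noise_samples) - noise_at_planet_pos)


test = GaussianZTestSketch()
noise = [1.0, 2.0, 3.0, 4.0, 5.0]

# constrain_planet is the inverse of test_2samp: adding the returned flux
# to the noise value gives a planet signal that just reaches the FPF.
flux = test.constrain_planet(3.5, noise, desired_confidence_fpf=0.05)
fpf, t_obs = test.test_2samp(3.5 + flux, noise)
```

Any class that provides these two methods in a consistent way can be swapped in wherever the interface is expected, which is the point of the abstraction.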

applefy.statistics.general.fpf_2_gaussian_sigma(confidence_fpf)#

Transforms a confidence level given as false-positive-fraction / p-value into a confidence level in the Gaussian sense \(\sigma_\mathcal{N}\).

Parameters

confidence_fpf (Union[float, ndarray]) – The FPF we want to translate.

Return type

float

Returns

\(\sigma_\mathcal{N}\)

applefy.statistics.general.gaussian_sigma_2_fpf(confidence_sigma)#

Transforms a confidence level given as \(\sigma_\mathcal{N}\) into a confidence level as false-positive-fraction.

Parameters

confidence_sigma (Union[float, ndarray]) – The confidence level as \(\sigma_\mathcal{N}\) that we want to translate.

Return type

float

Returns

p-value / false-positive-fraction
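Both conversions follow from the upper tail of the standard normal distribution. A minimal sketch using only the Python standard library, assuming a one-sided tail; the function names are hypothetical and only scalar inputs are handled:

```python
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def gaussian_sigma_2_fpf_sketch(confidence_sigma):
    # Upper-tail probability P(Z >= sigma), written as P(Z <= -sigma).
    return _STANDARD_NORMAL.cdf(-confidence_sigma)


def fpf_2_gaussian_sigma_sketch(confidence_fpf):
    # Inverse of the function above.
    return -_STANDARD_NORMAL.inv_cdf(confidence_fpf)


fpf_5_sigma = gaussian_sigma_2_fpf_sketch(5.0)
sigma_back = fpf_2_gaussian_sigma_sketch(fpf_5_sigma)
```

For a 5 sigma detection this gives an FPF of about 2.87e-7, the value quoted for desired_confidence_fpf above.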