import numpy as np

rng = np.random.default_rng()
help(rng.random)

Help on built-in function random:

random(...) method of numpy.random._generator.Generator instance
    random(size=None, dtype=np.float64, out=None)
    
    Return random floats in the half-open interval [0.0, 1.0).
    
    Results are from the "continuous uniform" distribution over the
    stated interval.  To sample :math:`Unif[a, b), b > a` use `uniform`
    or multiply the output of `random` by ``(b - a)`` and add ``a``::
    
        (b - a) * random() + a
    
    Parameters
    ----------
    size : int or tuple of ints, optional
        Output shape.  If the given shape is, e.g., ``(m, n, k)``, then
        ``m * n * k`` samples are drawn.  Default is None, in which case a
        single value is returned.
    dtype : dtype, optional
        Desired dtype of the result, only `float64` and `float32` are supported.
        Byteorder must be native. The default value is np.float64.
    out : ndarray, optional
        Alternative output array in which to place the result. If size is not None,
        it must have the same shape as the provided size and must match the type of
        the output values.
    
    Returns
    -------
    out : float or ndarray of floats
        Array of random floats of shape `size` (unless ``size=None``, in which
        case a single float is returned).
    
    See Also
    --------
    uniform : Draw samples from the parameterized uniform distribution.
    
    Examples
    --------
    >>> rng = np.random.default_rng()
    >>> rng.random()
    0.47108547995356098 # random
    >>> type(rng.random())
    <class 'float'>
    >>> rng.random((5,))
    array([ 0.30220482,  0.86820401,  0.1654503 ,  0.11659149,  0.54323428]) # random
    
    Three-by-two array of random numbers from [-5, 0):
    
    >>> 5 * rng.random((3, 2)) - 5
    array([[-3.99149989, -0.52338984], # random
           [-2.99091858, -0.79479508],
           [-1.23204345, -1.75224494]])
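
The docstring above mentions two ways to sample from Unif[a, b): rescaling the output of `random`, or calling `uniform` directly. A minimal sketch comparing the two follows; the seed, the endpoints `a` and `b`, and the sample size are illustrative assumptions, not values from the notebook.

```python
import numpy as np

# Seeded only so the sketch is reproducible (an assumption, not in the notebook)
rng = np.random.default_rng(seed=42)

a, b = -5.0, 0.0  # illustrative endpoints, matching the docstring's [-5, 0) example
n = 1000          # illustrative sample size

# Option 1: rescale draws from [0, 1) to [a, b)
scaled = (b - a) * rng.random(n) + a

# Option 2: let the generator do the rescaling
direct = rng.uniform(a, b, size=n)

# Both sets of samples lie in the half-open interval [a, b)
print(scaled.min(), scaled.max())
print(direct.min(), direct.max())
```

Both arrays are draws from the same distribution, but the individual values differ because each call consumes fresh random numbers.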


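import matplotlib.pyplot as plt

# Draw 1000 samples from Uniform[0, 1)
x = rng.random(size=1000)
print(x[:10])

# Left: sorted samples (roughly a straight line); right: histogram (roughly flat)
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.plot(np.sort(x))
ax2.hist(x);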

[0.23168809 0.20112958 0.75185691 0.62481795 0.82925585 0.03235133
 0.5648251  0.12889729 0.31620117 0.07020354]


true_pos = .994       # P(test + | HIV positive)
true_neg = .998       # P(test - | HIV negative)
pop_rate = 1.2 / 333  # P(HIV positive) in the population


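N = 1_000_000
# Each person is HIV positive independently with probability pop_rate
hiv_status = [rng.uniform() < pop_rate for k in range(N)]
# Array of one-character strings, filled in by the loop below
test_result = np.full((N,), "")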
for k in range(N):
    if hiv_status[k]:
        if rng.uniform() < true_pos:
            result = "+"
        else:
            result = "-"
    else:
        if rng.uniform() < true_neg:
            result = "-"
        else:
            result = "+"
    test_result[k] = result
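
The loop above makes one Python-level call per person, which is slow for a million people. A vectorized sketch of the same simulation is given below, assuming the same parameter values; the seed and the intermediate names `u` and `is_positive` are illustrative additions.

```python
import numpy as np

# Seeded only for reproducibility (an assumption, not in the notebook)
rng = np.random.default_rng(seed=0)

true_pos = 0.994
true_neg = 0.998
pop_rate = 1.2 / 333
N = 1_000_000

# One uniform draw per person decides HIV status
hiv_status = rng.random(N) < pop_rate

# A second, independent draw per person decides the test outcome
u = rng.random(N)

# HIV positive: test is "+" with probability true_pos.
# HIV negative: test is "-" with probability true_neg, so "+" when u >= true_neg.
is_positive = np.where(hiv_status, u < true_pos, u >= true_neg)
test_result = np.where(is_positive, "+", "-")

print(test_result[:10], hiv_status[:10])
```

This draws the same kind of random outcomes as the loop, but the individual results will not match the loop's output because the random numbers are consumed in a different order.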


test_result[:10], hiv_status[:10]

(array(['-', '-', '-', '-', '-', '-', '-', '-', '-', '-'], dtype='<U1'),
 [False, False, False, False, False, False, False, False, False, False])


import pandas as pd

# Cross-tabulate test result (rows) against true HIV status (columns)
test_result = pd.Series(test_result, name="test result")
hiv_status = pd.Series(hiv_status, name="HIV status")
pd.crosstab(test_result, hiv_status)


3469 / (1982 + 3469)  # P(HIV+ | test +) from the "+" row of one simulation run; the counts change with every run

0.6363969913777289
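
Because the counts change on every run, the proportion can be computed from the data instead of being typed in by hand. The sketch below rebuilds arrays with the counts from one crosstab output in this notebook (2007 false positives, 3586 true positives, 994393 true negatives, 14 false negatives) and reads off the fraction of positive tests that belong to HIV-positive people. Rebuilding the arrays with `np.repeat` is purely for illustration; in the notebook the simulated arrays would be used directly.

```python
import numpy as np

# Counts taken from one crosstab output in this notebook:
# (test "+", HIV False), (test "+", HIV True), (test "-", HIV False), (test "-", HIV True)
counts = [2007, 3586, 994393, 14]

# Rebuild per-person arrays with exactly those counts (illustration only)
test_result = np.repeat(["+", "+", "-", "-"], counts)
hiv_status = np.repeat([False, True, False, True], counts)

# Among people who tested positive, the fraction who are HIV positive
ppv = hiv_status[test_result == "+"].mean()
print(ppv)  # about 0.641 for these counts
```

Indexing with `test_result == "+"` keeps only the positive tests, and the mean of a boolean array is the fraction of `True` values, so this is the same ratio as true positives over all positives.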


num = pop_rate * true_pos
denom = num + (1 - pop_rate) * (1 - true_neg)
num / denom

0.6425339366515835
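
The calculation above is Bayes' rule applied to the test. Written out as a sketch, with $+$ denoting a positive test and $\mathrm{HIV}^{+}$, $\mathrm{HIV}^{-}$ denoting true status (this notation is an assumption; the code uses `true_pos`, `true_neg`, and `pop_rate`):

```latex
\begin{aligned}
P(\mathrm{HIV}^{+} \mid +)
  &= \frac{P(+ \mid \mathrm{HIV}^{+})\, P(\mathrm{HIV}^{+})}
          {P(+ \mid \mathrm{HIV}^{+})\, P(\mathrm{HIV}^{+})
           + P(+ \mid \mathrm{HIV}^{-})\, P(\mathrm{HIV}^{-})} \\[4pt]
  &= \frac{0.994 \cdot \tfrac{1.2}{333}}
          {0.994 \cdot \tfrac{1.2}{333}
           + (1 - 0.998)\left(1 - \tfrac{1.2}{333}\right)}
   \approx 0.6425
\end{aligned}
```

Here $P(+ \mid \mathrm{HIV}^{-}) = 1 - \texttt{true\_neg}$ is the false positive rate. The simulated proportion is close to this value but not equal to it, since it is an estimate from a finite sample.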

Probability and Statistics for Data Science

Uncertainty: (how to) deal with it

Goals of this class

Getting random

Example: false positives

Background data

Probability rules

Bayes' rule

HIV status    False  True
test result
+              2007  3586
-            994393    14