import numpy as np

rng = np.random.default_rng()
help(rng.random)

Help on built-in function random:

random(...) method of numpy.random._generator.Generator instance
    random(size=None, dtype=np.float64, out=None)
    
    Return random floats in the half-open interval [0.0, 1.0).
    
    Results are from the "continuous uniform" distribution over the
    stated interval.  To sample :math:`Unif[a, b), b > a` multiply
    the output of `random` by `(b-a)` and add `a`::
    
      (b - a) * random() + a
    
    Parameters
    ----------
    size : int or tuple of ints, optional
        Output shape.  If the given shape is, e.g., ``(m, n, k)``, then
        ``m * n * k`` samples are drawn.  Default is None, in which case a
        single value is returned.
    dtype : dtype, optional
        Desired dtype of the result, only `float64` and `float32` are supported.
        Byteorder must be native. The default value is np.float64.
    out : ndarray, optional
        Alternative output array in which to place the result. If size is not None,
        it must have the same shape as the provided size and must match the type of
        the output values.
    
    Returns
    -------
    out : float or ndarray of floats
        Array of random floats of shape `size` (unless ``size=None``, in which
        case a single float is returned).
    
    Examples
    --------
    >>> rng = np.random.default_rng()
    >>> rng.random()
    0.47108547995356098 # random
    >>> type(rng.random())
    <class 'float'>
    >>> rng.random((5,))
    array([ 0.30220482,  0.86820401,  0.1654503 ,  0.11659149,  0.54323428]) # random
    
    Three-by-two array of random numbers from [-5, 0):
    
    >>> 5 * rng.random((3, 2)) - 5
    array([[-3.99149989, -0.52338984], # random
           [-2.99091858, -0.79479508],
           [-1.23204345, -1.75224494]])
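
The scaling recipe in the docstring above, `(b - a) * random() + a`, can be sketched as a small helper. The function name `uniform_ab` and the seed `42` are illustrative assumptions, not part of NumPy or of these notes.

```python
import numpy as np

def uniform_ab(rng, a, b, size=None):
    """Draw from Unif[a, b) by rescaling rng.random(), as the docstring suggests."""
    if not b > a:
        raise ValueError("need b > a")
    return (b - a) * rng.random(size) + a

rng = np.random.default_rng(42)  # seed is an assumption, used only so reruns match
samples = uniform_ab(rng, -5, 0, size=(3, 2))
print(samples.shape)
print(samples.min() >= -5, samples.max() < 0)
```

NumPy's own `rng.uniform(low, high, size)` covers the same use case, so the helper is only meant to make the arithmetic visible.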


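import matplotlib.pyplot as plt

# draw 1000 samples from Unif[0, 1) and look at the first few
x = rng.random(size=1000)
print(x[:10])

# left: sorted values (close to a straight line if uniform); right: histogram
fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.plot(np.sort(x))
ax2.hist(x);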

[0.80028468 0.1207662  0.02434086 0.9819454  0.99290216 0.74296774
 0.52593177 0.01181226 0.00296221 0.31311247]


true_pos = .994       # P(test is "+" | HIV+)
true_neg = .998       # P(test is "-" | HIV-)
pop_rate = 1.2 / 333  # proportion of the population that is HIV+


import pandas as pd

N = int(1e6)
# simulate the HIV status of N people
hiv_status = pd.Series(rng.random(N) < pop_rate, name="HIV+")
n = np.sum(hiv_status)  # number of HIV+ people
test_result = pd.Series(np.full((N,), ""), name="test")
# HIV+ people: the test is "+" with probability true_pos
test_result[hiv_status] = ["+" if p < true_pos else "-" for p in rng.random(n)]
# HIV- people: the test is "-" with probability true_neg
test_result[~hiv_status] = ["-" if p < true_neg else "+" for p in rng.random(N - n)]

pd.crosstab(hiv_status, test_result, margins=True)



hiv_given_plus = np.sum(hiv_status & (test_result == "+")) / np.sum(test_result == "+")
print(f"The proportion of the {np.sum(test_result == '+')} people "
      f"that had a positive test result that actually have HIV is {100*hiv_given_plus:.2f}%.")

The proportion of the 5648 people that had a positive test result that actually have HIV is 64.59%.


pop_rate * true_pos / (pop_rate * true_pos + (1 - pop_rate) * (1 - true_neg))

0.6425339366515835
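
The expression above is Bayes' rule: P(HIV+ | test +) is the probability of being HIV+ and testing positive, divided by the total probability of testing positive. A minimal sketch that wraps it in a function follows; `prob_condition_given_positive` is a hypothetical helper name, and the three rates are the ones defined earlier.

```python
def prob_condition_given_positive(prior, sensitivity, specificity):
    """Bayes' rule for P(condition | positive test)."""
    true_pos_mass = prior * sensitivity               # has condition and tests "+"
    false_pos_mass = (1 - prior) * (1 - specificity)  # no condition but tests "+"
    return true_pos_mass / (true_pos_mass + false_pos_mass)

pop_rate = 1.2 / 333
true_pos = .994
true_neg = .998
posterior = prob_condition_given_positive(pop_rate, true_pos, true_neg)
print(f"{100 * posterior:.2f}%")
```

This exact value, about 64.25%, is close to the 64.59% seen in the simulation; the gap is sampling noise from one draw of a million people.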

test        +       -      All
HIV+
False    2000  994327   996327
True     3648      25     3673
All      5648  994352  1000000
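
The counts in the table above give the conditional proportions directly. The sketch below copies the four cell counts by hand (the variable names are illustrative) and recovers both the proportion of positive tests that are true positives and the observed test accuracy.

```python
# Cell counts copied from the crosstab above
true_pos_count = 3648     # HIV+ and test "+"
false_neg_count = 25      # HIV+ and test "-"
false_pos_count = 2000    # HIV- and test "+"
true_neg_count = 994327   # HIV- and test "-"

# Proportion of "+" results that belong to HIV+ people
hiv_given_plus = true_pos_count / (true_pos_count + false_pos_count)
# Observed proportion of HIV+ people who test "+"
observed_true_pos = true_pos_count / (true_pos_count + false_neg_count)
# Observed proportion of HIV- people who test "-"
observed_true_neg = true_neg_count / (true_neg_count + false_pos_count)

print(f"{100 * hiv_given_plus:.2f}%")
print(f"{observed_true_pos:.4f} {observed_true_neg:.4f}")
```

The first number reproduces the simulated proportion reported earlier, and the other two land near the `true_pos` and `true_neg` rates that generated the data.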


Probability and Statistics for Data Science

Uncertainty: (how to) deal with it

Goals of this class

Getting random

Example: false positives

Background data

Probability rules

Bayes' rule