{ "cells": [ { "cell_type": "markdown", "id": "59742dcb", "metadata": {}, "source": [ "# Homework 2: Random variables\n", "\n", "*Instructions:*\n", "Please answer the following questions and submit your work\n", "by editing this Jupyter notebook and submitting it on Canvas.\n", "Questions may involve math, programming, or neither,\n", "but you should make sure to *explain your work*:\n", "i.e., you should usually have a cell with at least a few sentences\n", "explaining what you are doing." ] }, { "cell_type": "markdown", "id": "3e94f2a0", "metadata": {}, "source": [ "Several times below I ask you to compare the results of a simulation to a theoretical distribution.\n", "Here is a function that makes this easy:" ] }, { "cell_type": "code", "execution_count": 1, "id": "2d27b846", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import math\n", "rng = np.random.default_rng()\n", "\n", "def comparison_table(x, expected, *sims):\n", "    \"\"\"\n", "    Returns a pandas DataFrame with columns corresponding to:\n", "        x: the possible values\n", "        expected: the expected frequencies these should happen at\n", "        sim1, sim2, ...: these are vectors of simulated values that\n", "            will be tabulated, and the frequencies of each of the values\n", "            of x will be put in columns sim0, sim1, ... of the result.\n", "    \"\"\"\n", "    df = pd.DataFrame(data={\"x\" : x})\n", "    df['expected'] = expected\n", "    for k, sim in enumerate(sims):\n", "        total = len(sim)\n", "        # proportion of simulated values equal to each possible value y\n", "        n = [np.sum(sim == y)/total for y in x]\n", "        df[f\"sim{k}\"] = n\n", "    # Note: the Styler returned here is discarded, so this line does not change df.\n", "    df.style.format(\"{:.3f}\")\n", "    return df" ] }, { "cell_type": "markdown", "id": "2eb4aa2c", "metadata": {}, "source": [ "For instance, here I'm simulating 1000 draws, twice, from a fair die (i.e., uniform draws from 1, 2, 3, 4, 5, 6)\n", "and using the function to see whether the simulated frequencies are close to the expected proportion of 1/6.
Note that `expected` is a vector of length 6, but `sim1` and `sim2` are vectors of length 1000.\n", "A good way to check that your numbers are \"close enough\" is to run the simulation two or three times (as I do here)\n", "and check that the differences between simulations are similar in size to their differences from the expected column." ] }, { "cell_type": "code", "execution_count": 2, "id": "67985d54", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<table>\n",
"<thead>\n",
"<tr><th></th><th>x</th><th>expected</th><th>sim0</th><th>sim1</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>1</td><td>0.166667</td><td>0.180</td><td>0.166</td></tr>\n",
"<tr><th>1</th><td>2</td><td>0.166667</td><td>0.183</td><td>0.173</td></tr>\n",
"<tr><th>2</th><td>3</td><td>0.166667</td><td>0.137</td><td>0.156</td></tr>\n",
"<tr><th>3</th><td>4</td><td>0.166667</td><td>0.151</td><td>0.172</td></tr>\n",
"<tr><th>4</th><td>5</td><td>0.166667</td><td>0.166</td><td>0.170</td></tr>\n",
"<tr><th>5</th><td>6</td><td>0.166667</td><td>0.183</td><td>0.163</td></tr>\n",
"</tbody>\n",
"</table>\n",
"