{ "cells": [ { "cell_type": "code", "execution_count": 3, "id": "39a7e8ac", "metadata": {}, "outputs": [], "source": [ "import pymc as pm\n", "import arviz as az\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "id": "3b2a5cfe", "metadata": {}, "source": [ "# Exercise 1\n", "\n", "Consider a population of 661 people. We want to know the average height of this population. We take a random sample of 40 people and measure their heights. Denote the population mean by $\\bar{y}$, the sample mean by $\\bar{y}_{obs}$, and the out-of-sample mean by $\\bar{y}_{mis}$. Assume that the heights are exchangeable.\n", "First estimate $\\bar{y}_{mis}$, and then obtain $\\bar{y}$ from\n", "$$\n", "\\bar{y} = \\frac{n}{N} \\bar{y}_{obs} + \\frac{N-n}{N} \\bar{y}_{mis}.\n", "$$\n", "\n", "Here $N = 661$ is the population size and $n = 40$ is the sample size.\n" ] }, { "cell_type": "code", "execution_count": null, "id": "f7520535", "metadata": {}, "outputs": [], "source": [ "# Read the height data and draw a reproducible random sample of 40 values\n", "df = pd.read_csv(\"pikkused.csv\")\n", "sample = df[\"pikkus\"].sample(40, random_state=42)" ] }, { "cell_type": "markdown", "id": "358e1aa9", "metadata": {}, "source": [ "# Exercise 2\n", "The following exercises are from BDA3. The numbers correspond to the exercise numbers in the book. " ] }, { "cell_type": "markdown", "id": "5326c85b", "metadata": {}, "source": [ "## 2.13.\n", "\n", "Table 2.2 gives the number of fatal accidents and deaths on scheduled airline flights per year over a ten-year period. We use these data as a numerical example for fitting discrete data models.\n", "\n", "### (a)\n", "Assume that the numbers of fatal accidents in each year are independent with a Poisson(θ) distribution. Set a prior distribution for θ and determine the posterior distribution based on the data from 1976 through 1985. Under this model, give a 95% predictive interval for the number of fatal accidents in 1986. 
You can use the normal approximation to the gamma and Poisson or compute using simulation.\n", "\n", "### (b)\n", "Assume that the numbers of fatal accidents in each year follow independent Poisson distributions with a constant rate and an exposure in each year proportional to the number of passenger miles flown. Set a prior distribution for θ and determine the posterior distribution based on the data for 1976–1985. \n", "*(Estimate the number of passenger miles flown in each year by dividing the appropriate columns of Table 2.2 and ignoring round-off errors.)* \n", "\n", "Give a 95% predictive interval for the number of fatal accidents in 1986 under the assumption that $8 \\times 10^{11}$ passenger miles are flown that year.\n", "\n", "### (c)\n", "Repeat (a) above, replacing *“fatal accidents”* with *“passenger deaths.”*\n", "\n", "### (d)\n", "Repeat (b) above, replacing *“fatal accidents”* with *“passenger deaths.”*\n", "\n", "### (e)\n", "In which of the cases (a)–(d) above does the Poisson model seem more or less reasonable? Why? Discuss based on general principles, without specific reference to the numbers in Table 2.2.\n", "\n", "---\n", "\n", "**Incidentally**, in 1986, there were:\n", "- 22 fatal accidents \n", "- 546 passenger deaths \n", "- A death rate of 0.06 per 100 million miles flown" ] }, { "cell_type": "markdown", "id": "d8ff9b1f", "metadata": {}, "source": [ "## 6.2. Model Checking\n", "\n", "In Exercise 2.13, the counts of airline fatalities in 1976–1985 were fitted to four different Poisson models.\n", "\n", "### (a)\n", "For each of the models, set up posterior predictive test quantities to check the following assumptions:\n", "1. Independent Poisson distributions \n", "2. No trend over time \n", "\n", "### (b)\n", "For each of the models, use simulations from the posterior predictive distributions to measure the discrepancies. 
Display the discrepancies graphically and give *p*-values.\n", "\n", "### (c)\n", "Do the results of the posterior predictive checks agree with your answers in Exercise 2.13(e)?\n", "\n", "---\n", "\n", "## 6.3. Model Improvement\n", "\n", "### (a)\n", "Use the solution to the previous problem and your substantive knowledge to construct an improved model for airline fatalities.\n", "\n", "### (b)\n", "Fit the new model to the airline fatality data.\n", "\n", "### (c)\n", "Use your new model to forecast the airline fatalities in 1986. How does this differ from the forecasts from the previous models?\n", "\n", "### (d)\n", "Check the new model using the same posterior predictive checks as you used in the previous models. Does the new model fit better?" ] }, { "cell_type": "markdown", "id": "edc342da", "metadata": {}, "source": [ "## 8.14. Rounded Data\n", "\n", "The last two columns of Table 2.2 on page 59 give data on passenger airline deaths and deaths per 100 million passenger miles flown. We would like to divide these to obtain the number of passenger miles flown in each year, but the “per mile” data are rounded. \n", "\n", "*(For the purposes of this exercise, ignore the column in the table labeled “Fatal accidents.”)*\n", "\n", "### (a)\n", "Using just the data from 1976 (734 deaths, 0.19 deaths per 100 million passenger miles), obtain inference for the number of passenger miles flown in 1976. Give a 95% posterior interval *(you may do this by simulation).* Clearly specify your model and your prior distribution.\n", "\n", "### (b)\n", "Apply your method to obtain intervals for the number of passenger miles flown each year until 1985, analyzing the data from each year separately.\n", "\n", "### (c)\n", "Now create a model that allows you to use data from all the years to estimate jointly the number of passenger miles flown each year. Estimate the model and give 95% intervals for each year. 
*(Use approximate computational methods.)*\n", "\n", "### (d)\n", "Describe how you would use the results of this analysis to get a better answer for Exercise 2.13." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.8" } }, "nbformat": 4, "nbformat_minor": 5 }