STAT 202 Study Guide for Memory Game
<><><><><><><><><><><><><><><><><><><>
Chapter 1 question: 1
If you have paired data in which both sets are numerical, what is a good first tool to work with to see if there is a relationship?
* Linear Regression
<><><><><><><><><><><><><><><><><><><>
Chapter 1 question: 2
If you have paired data in which both sets are categorical, what is a good first tool to work with to see if there is a relationship?
* Chi Square Test
<><><><><><><><><><><><><><><><><><><>
Chapter 1 question: 3
If you have paired data where one set is categorical and one is numerical, what is a good first tool to work with to see if there is a relationship?
* ANOVA
<><><><><><><><><><><><><><><><><><><>
Chapter 1 question: 4
If you have paired data in which both sets are numerical, what is a good graphical tool to use to represent your findings to a general audience?
* A scatterplot, possibly with a best fit line.
<><><><><><><><><><><><><><><><><><><>
Chapter 1 question: 5
If you have paired data in which both sets are categorical, what is a good graphical tool to use to represent your findings to a general audience?
* A collection of pie charts is often useful.
<><><><><><><><><><><><><><><><><><><>
Chapter 1 question: 6
If you have paired data where one set is categorical and one is numerical, what is a good graphical tool to use to represent your findings to a general audience?
* Side-by-side boxplots, one for each category, are often useful.
<><><><><><><><><><><><><><><><><><><>
Chapter 1 question: 7
What does p-value mean?
* It's the probability, assuming the null hypothesis is true, of obtaining data at least as extreme as what you observed.
<><><><><><><><><><><><><><><><><><><>
Chapter 1 question: 8
What is the magic cutoff value for significance of a p-value?
* There is no magic cutoff.
* In this class, and only in this class, the cutoff is typically 0.05 or 0.01.
* The right cutoff is situation dependent.
<><><><><><><><><><><><><><><><><><><>
Chapter 2 question: 9
What is the difference between a pie chart and a bar plot?
* The pie chart and bar plot give roughly the same information.
* These are two different visuals.
* One is drawn in a circle, and one isn't.
<><><><><><><><><><><><><><><><><><><>
Chapter 2 question: 10
What's important when considering the difference between a bar plot and a histogram?
* Usually a bar plot is drawn with separated bars.
* We never want to forget whether our data are categorical or numerical, so these tools have different names.
* These tools are very similar.
* We want to choose graphics that most clearly present what we want to express.
<><><><><><><><><><><><><><><><><><><>
Chapter 2 question: 11
What are quartiles?
* Quartiles are named Q1, Q2, and Q3.
* Q2 is the same as the median.
* Q1 marks off where the first 25% of the data end.
<><><><><><><><><><><><><><><><><><><>
Chapter 3 question: 12
Name three measures of centrality.
* mean
* median
* mode
<><><><><><><><><><><><><><><><><><><>
Chapter 3 question: 13
Define mean.
* average
<><><><><><><><><><><><><><><><><><><>
Chapter 3 question: 14
Define median.
* middlemost data element
<><><><><><><><><><><><><><><><><><><>
Chapter 3 question: 15
Define mode.
* most common data element
<><><><><><><><><><><><><><><><><><><>
Chapter 3 question: 16
Name three measures of spread.
* Standard Deviation
* Variance
* IQR
<><><><><><><><><><><><><><><><><><><>
Chapter 3 question: 17
Define IQR.
* Distance from Q1 to Q3
* Where the middle half of the data fall
<><><><><><><><><><><><><><><><><><><>
Chapter 3 question: 18
Define Standard Deviation.
* The distance between the mean of the data and the point of inflection (normal data).
* Plus and minus one standard deviation from the mean covers about 68 percent of the data (normal data).
<><><><><><><><><><><><><><><><><><><>
Chapter 3 question: 19
Define Variance.
* The square of the standard deviation.
<><><><><><><><><><><><><><><><><><><>
Chapter 3 question: 20
What is a skewed distribution?
* The shape of the distribution is not symmetric.
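The Chapter 3 definitions above (mean, median, mode, standard deviation, variance, IQR) can be tried out on a small data set. This is only a minimal sketch using Python's built-in statistics module. The data values are made up for illustration, and quartile rules differ slightly between textbooks and software, so your class's Q1 and Q3 may not match exactly.

```python
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9, 11]

# Measures of centrality
mean = statistics.mean(data)      # the average
median = statistics.median(data)  # middlemost element of the sorted data
mode = statistics.mode(data)      # most common element

# Measures of spread (sample versions, which divide by n - 1)
variance = statistics.variance(data)
std_dev = statistics.stdev(data)  # square root of the variance

# Quartiles: with n=4, statistics.quantiles returns [Q1, Q2, Q3].
# Its default rule is one of several in common use.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # distance from Q1 to Q3

# A mean above the median is a common hint of a right-skewed distribution.
skew_hint = "right-skewed" if mean > median else "left-skewed or symmetric"

print("mean:", mean, "median:", median, "mode:", mode)
print("variance:", variance, "std dev:", round(std_dev, 3), "IQR:", iqr)
print("shape hint:", skew_hint)
```

Here the mean is larger than the median, which fits the idea that a few large values pull the mean toward the long tail of a skewed distribution.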
<><><><><><><><><><><><><><><><><><><>
Chapter 3 question: 21
When, if ever, does it make sense to average averages?
* You can only get away with this when everything has the same weight.
* You usually can't get away with doing this safely.
* If all the groups are the same size, you can average the averages for the groups.
<><><><><><><><><><><><><><><><><><><>
Chapter 4 question: 22
What is it called when the sample you take is everything in your population?
* It's called a census.
<><><><><><><><><><><><><><><><><><><>
Chapter 4 question: 23
Let's say we group all current AU students into grads and undergrads, and first, we want to know the proportion of grads who are wearing jeans today. How do we calculate it?
* Total number of grads wearing jeans OVER total number of grads.
<><><><><><><><><><><><><><><><><><><>
Chapter 4 question: 24
What is an expected value in the context of a 2-way contingency table?
* Row sum TIMES column sum divided by overall sum.
* Percent with column trait TIMES percent with row trait TIMES total number overall.
<><><><><><><><><><><><><><><><><><><>
Chapter 4 question: 25
What can cause 'bias' in the context of a survey?
* A survey is written in such a way as to encourage a certain response.
* The surveys are only done in the evening.
* The person asking the questions is wearing a political button.
* The only place the survey is given is in front of the gym.
<><><><><><><><><><><><><><><><><><><>
Chapter 4 question: 26
What does 'bias' mean in the context of a statistic?
* Some statistical tools are inherently mathematically 'biased' in that they tend over time to be too high or too low.
<><><><><><><><><><><><><><><><><><><>
Chapter 5 question: 27
What is the relationship between standard deviation and variance?
* If you square the standard deviation, you get the variance.
<><><><><><><><><><><><><><><><><><><>
Chapter 5 question: 28
What is the population standard deviation?
* The population standard deviation is the standard deviation of the population.
<><><><><><><><><><><><><><><><><><><>
Chapter 5 question: 29
What is the sample standard deviation, and why is it named as such?
* The 'sample standard deviation' is the statistic used to estimate the population's standard deviation.
* It is calculated using the sample, which is how it gets its name.
<><><><><><><><><><><><><><><><><><><>
Chapter 5 question: 30
In software, which standard deviation is usually calculated on a collection of data, and why?
* Most software will default to calculating the 'sample standard deviation'.
* Most data sets are samples.
<><><><><><><><><><><><><><><><><><><>
Chapter 5 question: 31
What is a parameter?
* It is a numerical (or non-numerical) value that represents a population.
* You can only find it if you have a census.
<><><><><><><><><><><><><><><><><><><>
Chapter 5 question: 32
What is a statistic?
* It is a numerical (or non-numerical) value that is calculated from a sample.
<><><><><><><><><><><><><><><><><><><>
Chapter 5 question: 33
Why do we calculate statistics?
* Typically, when we calculate statistics, it's really the population we are interested in.
* Sample statistics are usually used to estimate population parameters.
<><><><><><><><><><><><><><><><><><><>
Chapter 5 question: 34
Why use biased estimations at all?
* Sometimes a biased estimator is the best you can do.
<><><><><><><><><><><><><><><><><><><>
Chapter 5 question: 35
Give an example of a biased statistical tool.
* Using the maximum of a sample to estimate the maximum of the population will tend to give an estimate that's too low.
* Using the formula for population standard deviation on the sample will not give a good estimate of the population's standard deviation.
<><><><><><><><><><><><><><><><><><><>
Chapter 5 question: 36
Do we use the same formula for the population mean and the sample mean? Why?
* Yes.
* The process for finding the mean of the sample gives us a good prediction for the mean of the population.
<><><><><><><><><><><><><><><><><><><>
Chapter 5 question: 37
Do we use the same formula for the population standard deviation and the sample standard deviation? Why?
* No.
* The process we use to determine the population standard deviation, when applied to a sample, would not give a good estimate for the population standard deviation.
<><><><><><><><><><><><><><><><><><><>
Chapter 5 question: 38
If we add 5 to every number in a set of numbers, how does the mean change?
* It increases by 5.
<><><><><><><><><><><><><><><><><><><>
Chapter 5 question: 39
If we add 5 to every number in a set of numbers, how does the standard deviation change?
* It stays the same.
<><><><><><><><><><><><><><><><><><><>
Chapter 5 question: 40
If we add 5 to every number in a set of numbers, how does the median change?
* It increases by 5.
<><><><><><><><><><><><><><><><><><><>
Chapter 5 question: 41
If we multiply every number in a set of numbers by 2, how does the mean change?
* It increases by a factor of 2.
<><><><><><><><><><><><><><><><><><><>
Chapter 5 question: 42
If we multiply every number in a set of numbers by 2, how does the standard deviation change?
* It increases by a factor of 2.
<><><><><><><><><><><><><><><><><><><>
Chapter 5 question: 43
If we multiply every number in a set of numbers by 2, how does the variance change?
* It increases by a factor of 4.
<><><><><><><><><><><><><><><><><><><>
Chapter 5 question: 44
If you order fries and a sandwich, how would you calculate the overall caloric mean of your expected meal (r=0)?
* Add the expected means together.
<><><><><><><><><><><><><><><><><><><>
Chapter 5 question: 45
If you order fries and a sandwich, how would you calculate the overall caloric variance of your expected meal (r=0)?
* Add the expected variances together.
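The shifting, scaling, and meal questions above can be checked numerically. The following is a rough sketch rather than a proof. The data set and the calorie values are invented for illustration, and the meal example pairs every fries value with every sandwich value so that the two are independent (r=0) by construction.

```python
import itertools
import statistics

# Hypothetical data set, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9, 11]
shifted = [x + 5 for x in data]  # add 5 to every number
scaled = [2 * x for x in data]   # multiply every number by 2

print("mean:    ", statistics.mean(data), statistics.mean(shifted), statistics.mean(scaled))
print("median:  ", statistics.median(data), statistics.median(shifted), statistics.median(scaled))
print("std dev: ", statistics.stdev(data), statistics.stdev(shifted), statistics.stdev(scaled))
print("variance:", statistics.variance(data), statistics.variance(shifted), statistics.variance(scaled))

# Hypothetical calorie counts, each value equally likely.
fries = [300, 400, 500]
sandwich = [450, 550, 650, 750]

# Every fries value paired with every sandwich value: independent by construction.
meals = [f + s for f, s in itertools.product(fries, sandwich)]

# Population formulas are used here because the lists describe every possible outcome.
fries_var = statistics.pvariance(fries)
sandwich_var = statistics.pvariance(sandwich)
meal_mean = statistics.mean(meals)
meal_var = statistics.pvariance(meals)
meal_sd = meal_var ** 0.5  # standard deviations do not add; take the root of the variance

print("meal mean:", meal_mean, "=", statistics.mean(fries), "+", statistics.mean(sandwich))
print("meal variance:", meal_var, "vs sum of variances:", fries_var + sandwich_var)
print("meal std dev:", meal_sd)
```

If the fries and sandwich sizes tended to rise and fall together, the last comparison would no longer match, which is why the questions specify r=0.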
<><><><><><><><><><><><><><><><><><><>
Chapter 5 question: 46
If you order fries and a sandwich, how would you calculate the overall caloric standard deviation of your expected meal (r=0)?
* Take the square root of the expected variance.
<><><><><><><><><><><><><><><><><><><>
Chapter 5 question: 47
One student took the SAT math test and one took the ACT math test. How would you compare the two students?
* Convert both to z-scores.
<><><><><><><><><><><><><><><><><><><>
Chapter 6 question: 48
What is the Empirical Rule?
* It reminds us that roughly 68% of the data fall within plus and minus one standard deviation for a normal data set.
* We use '68-95-99.7' to remind us of the area values for plus and minus one, two, and three standard deviations.
<><><><><><><><><><><><><><><><><><><>
Chapter 6 question: 49
What is the placebo effect?
* This only relates to studies on humans.
* Humans often believe there is an effect when they perceive a treatment is happening.
<><><><><><><><><><><><><><><><><><><>
Chapter 6 question: 50
What is a double blind study?
* This only relates to humans.
* Not only can those receiving the treatments be swayed by the placebo effect, so can the researchers.
* The best practice is that not only the participants but also the researchers don't know until after the end of the study which participants were in each group.
<><><><><><><><><><><><><><><><><><><>
Chapter 6 question: 51
Why do medical studies often use placebos?
* In order to test the effectiveness of a treatment, they use placebos on the untreated group so that group's members don't know they're untreated.
<><><><><><><><><><><><><><><><><><><>
Chapter 6 question: 52
What famous medical study happened in Tuskegee, Alabama?
* Between 1932 and 1972, many black men were left untreated for syphilis as a placebo group.
* The men were not informed, nor given the option to quit the study, even after an effective treatment became available (c. 1943).
* Penicillin, which could cure syphilis, was shown to be an effective treatment in 1943.
* About 128 preventable deaths were caused.
<><><><><><><><><><><><><><><><><><><>
Chapter 6 question: 53
What happened in the Milgram Shock experiment?
* In 1963 Milgram wanted to understand why Nazi soldiers had obeyed their orders.
* Milgram designed an experiment in which participants were told they were in a study about learning, but actually the study was about them.
* They were told to cause painful shocks to their 'students' who were actually just acting as if they were in pain.
* It was surprising to many that the 'teachers' obeyed the instructions long after common sense would have suggested they would have stopped.
<><><><><><><><><><><><><><><><><><><>
Chapter 7 question: 54
If you flip all the x and y values in a scatterplot, will you get a line of best fit that's also the exact flip of the one you originally had? (Why or why not?)
* No.
* The best-fit line minimizes vertical (y) distances only, so it is not meant to imply a geometric line of best fit.
* The two solutions you would get are usually quite similar for highly correlated data.
<><><><><><><><><><><><><><><><><><><>
Chapter 7 question: 55
If you calculate a low correlation between paired numerical data points, does this mean there is no relationship?
* It only means there is little to no linear relationship.
* Perhaps there is a different pattern in the data.
<><><><><><><><><><><><><><><><><><><>
Chapter 7 question: 56
If there is no relationship between two paired numerical sets of data, what do you expect the correlation to be?
* A correlation of zero is expected.
<><><><><><><><><><><><><><><><><><><>
Chapter 7 question: 57
If someone thinks 0.8 is a high correlation, what would that same person say about a correlation of -0.85?
* They should say it's even stronger, since the strength of a correlation depends on its magnitude and not its sign.
<><><><><><><><><><><><><><><><><><><>
Chapter 7 question: 58
If I create a random set of x-y data, do I expect the correlation to be zero?
* Yes, if it's truly random.
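The Chapter 7 answer about flipping x and y can be illustrated with a short calculation. This is a sketch on made-up points, with the correlation and least-squares slope written out by hand. The function names are just for this example.

```python
import math

def mean(values):
    return sum(values) / len(values)

def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def slope(xs, ys):
    """Least-squares slope for predicting ys from xs (vertical distances only)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical paired data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r = correlation(x, y)
slope_y_on_x = slope(x, y)  # original fit: predict y from x
slope_x_on_y = slope(y, x)  # fit after flipping x and y

# If the flipped fit were an exact mirror image, its slope would be 1 / slope_y_on_x.
exact_flip_slope = 1 / slope_y_on_x

print("r:", r)
print("slope after flipping:", slope_x_on_y, "exact mirror would be:", exact_flip_slope)
print("product of the two slopes:", slope_y_on_x * slope_x_on_y, "r squared:", r ** 2)

# Strength of correlation is judged by magnitude, so -0.85 is stronger than 0.8.
print(abs(-0.85) > abs(0.8))
```

The product of the two fitted slopes equals r squared, so the flipped fit only matches the mirror image when r is exactly 1 or -1. That is consistent with the two lines being quite similar for highly correlated data.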
<><><><><><><><><><><><><><><><><><><>
Chapter 7 question: 59
What if I create a random set of points and calculate their correlation, and I get a p-value of 0.04? Does that indicate significance?
* No.
* You just created it randomly, so you already know it's not significant.
<><><><><><><><><><><><><><><><><><><>
Chapter 7 question: 60
Does correlation mean there is some causal relationship between your variables?
* Correlation does not promise causation.
* Causation can yield correlation.
<><><><><><><><><><><><><><><><><><><>
Chapter 8 question: 61
What does it mean for two events to be disjoint? Give an example.
* They can't happen together.
* Example: 'It is a Thursday', vs 'It is a Sunday'.
<><><><><><><><><><><><><><><><><><><>
Chapter 8 question: 62
What does it mean for two events to be independent? Give an example.
* They don't affect each other.
* Example: 'It's Sunday', vs 'It's rainy'.
<><><><><><><><><><><><><><><><><><><>
Chapter 8 question: 63
What does it mean for two events to be mutually exclusive? Give an example.
* They can't happen together.
* Example: 'It is a Thursday', vs 'It is a Sunday'.
<><><><><><><><><><><><><><><><><><><>
Chapter 9 question: 64
When can you multiply two probabilities?
* When the events are independent.
* When the correlation is zero (or very close).
<><><><><><><><><><><><><><><><><><><>
Chapter 9 question: 65
When can you add two probabilities?
* When the events are mutually exclusive.
* When the events are disjoint.
* When the events can't occur at the same time.
<><><><><><><><><><><><><><><><><><><>
Chapter 9 question: 66
In combinatorics, would 'select 2 committee members from a group' be with or without replacement?
* Without.
<><><><><><><><><><><><><><><><><><><>
Chapter 9 question: 67
In combinatorics, would 'select 2 officers from a group' be with or without replacement?
* Without.
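The Chapter 9 answers above can be checked by brute-force enumeration. This is a minimal sketch: the group of five names and the dice events are invented for illustration, and exact fractions are used so the sums and products match without rounding.

```python
import itertools
import math
from fractions import Fraction

# Hypothetical group of people, chosen only for illustration.
people = ["Ana", "Ben", "Cam", "Dee", "Eli"]

# Both selections are without replacement: nobody can be picked twice.
committees = list(itertools.combinations(people, 2))      # order ignored
officer_slates = list(itertools.permutations(people, 2))  # order kept (e.g. president, treasurer)

print("committees:", len(committees), "officer slates:", len(officer_slates))

# Adding probabilities: 'roll a 1' and 'roll a 2' are disjoint on a single die.
die = range(1, 7)
p_one_or_two = Fraction(sum(1 for d in die if d in (1, 2)), 6)

# Multiplying probabilities: the two dice are independent of each other.
rolls = list(itertools.product(die, repeat=2))
p_double_six = Fraction(sum(1 for a, b in rolls if a == 6 and b == 6), len(rolls))

print("P(1 or 2):", p_one_or_two, "=", Fraction(1, 6) + Fraction(1, 6))
print("P(6 and 6):", p_double_six, "=", Fraction(1, 6) * Fraction(1, 6))
```

The counts agree with math.comb(5, 2) and math.perm(5, 2) from the standard library, so the enumeration is only needed to see what is being counted.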
<><><><><><><><><><><><><><><><><><><>
Chapter 9 question: 68
From a combinatorics perspective, how do the 'officer' selection and 'committee' selection differ?
* In the committee case, the order in which they are selected doesn't matter.
* In the 'officer' case, the order in which they're selected matters.
<><><><><><><><><><><><><><><><><><><>
Chapter 10 question: 69
If you roll two standard 6-sided dice, what is the expected value for the sum?
* 7
<><><><><><><><><><><><><><><><><><><>
Chapter 11 question: 70
Are natural phenomena usually easily modeled with normal distributions?
* There are lots of other distribution families you can look up and use for that!
* Sometimes.
<><><><><><><><><><><><><><><><><><><>
Chapter 12 question: 71
Is the binomial distribution a normal distribution? Why or why not?
* The binomial distribution has discrete values whereas the normal distribution is continuous.
* No.
* However, they are very similar when you have a large number of outcomes.
<><><><><><><><><><><><><><><><><><><>
Chapter 13 question: 72
What is a sampling distribution?
* It's the new distribution of sample means you would get if you could sample infinitely from your population.
<><><><><><><><><><><><><><><><><><><>
Chapter 13 question: 73
You and your friend each draw a random sample from a common population and create two different 95% confidence intervals. They don't even overlap! What can you conclude?
* It's clear that one or both of you failed to capture the true value!
<><><><><><><><><><><><><><><><><><><>
Chapter 14 question: 74
What is the null hypothesis?
* It's the claim that nothing other than sampling error could be causing the difference between the claim and the observation.
* It's the claim that is being made about how reality is set up.
<><><><><><><><><><><><><><><><><><><>
Chapter 14 question: 75
What is the alternative hypothesis?
* It's the claim that something is causing the claim to differ from the observed (measured) data.
  In other words, that the claim is wrong.
* This says that the null hypothesis is incorrect.
<><><><><><><><><><><><><><><><><><><>
Chapter 14 question: 76
Can you use more than one null hypothesis at a time using basic STAT 202 tools?
* No, you can only have one null hypothesis.
<><><><><><><><><><><><><><><><><><><>
Chapter 14 question: 77
Can you ever have more than one hypothesis for a given set of data?
* Yes, but it's often awkward to talk about them this way.
* We don't talk about multiple hypotheses in introductory courses.
* We can discuss the claims one by one.
<><><><><><><><><><><><><><><><><><><>
Chapter 14 question: 78
What can happen if we have too many claims tested in the same study with the same data?
* You may seemingly demonstrate an alternative hypothesis to be true, just because you've kept trying so much!
* There are ways to prevent this from becoming an issue, but we don't discuss them at this level of study.
<><><><><><><><><><><><><><><><><><><>
Chapter 14 question: 79
What is the xkcd 'Green Jelly Beans Cause Acne' cartoon demonstrating?
* If you keep trying hypothesis after hypothesis, even when nothing is really going on, you will eventually seem to 'prove' one of them.
<><><><><><><><><><><><><><><><><><><>
Chapter 14 question: 80
When doing hypothesis testing, what type of statement is assigned to the null hypothesis versus what is assigned to the alternative hypothesis?
* The null hypothesis is usually the 'boring' option.
* The alternative is that something odd is happening.
<><><><><><><><><><><><><><><><><><><>
Chapter 14 question: 81
In hypothesis testing, who has the burden of proof?
* The person making the counter-claim has the burden of proof.
* Even though it may be false, we behave as if the null hypothesis is true unless proven false.
<><><><><><><><><><><><><><><><><><><>
Chapter 15 question: 82
How does the mean of a sampling distribution relate to the mean of a population?
* The mean of the sampling distribution is the same as the mean of the population.
<><><><><><><><><><><><><><><><><><><>
Chapter 15 question: 83
What do you need to know in order to determine what your sampling distribution looks like?
* You need to know what the population looks like.
* You need to know the size of the samples.
<><><><><><><><><><><><><><><><><><><>
Chapter 15 question: 84
How does the standard deviation of a sampling distribution relate to the standard deviation of the population from which the sample is drawn?
* It gets smaller and smaller as the size of the sample increases.
* It's never larger than the population's standard deviation.
<><><><><><><><><><><><><><><><><><><>
Chapter 15 question: 85
You have a Student-table with t* cutoff values and the p-values across the top row are strictly DECREASING.
* If your score falls off the RIGHT side of the table, your result is VERY significant.
* If your score falls off the LEFT side of the table, your result is NOT AT ALL significant.
<><><><><><><><><><><><><><><><><><><>
Chapter 15 question: 86
When testing against a hypothesis, what value goes in the middle of the confidence interval? Why?
* The hypothesized value goes in the middle.
* The hypothesized value is the standard against which you test your results.
<><><><><><><><><><><><><><><><><><><>
Chapter 15 question: 87
What does the Central Limit Theorem state? Give an example.
* From a population, you create a 'sampling distribution' (of sample means). The mean of the sampling distribution is the mean of the original population.
* The standard deviation of the sampling distribution (of sample means) will get progressively smaller as the sample size increases.
<><><><><><><><><><><><><><><><><><><>
Chapter 16 question: 88
You have two boxes of red and blue balls. Box one has 25 red and 75 blue balls. You don't know what is in the other box. You draw 8 balls from one of the boxes at random and 3 balls are red.
What box did you likely pull from?
* This question can't be answered.
<><><><><><><><><><><><><><><><><><><>
Chapter 16 question: 89
You have two boxes of red and blue balls. Box one has 10 red and 75 blue balls. You don't know what is in the other box. You draw 8 balls from one of the boxes at random and all 8 balls are red. What box did you likely pull from?
* You were unlikely to have pulled from box one.
<><><><><><><><><><><><><><><><><><><>
Chapter 16 question: 90
What is the difference between paired data and non-paired data?
* If you have a natural pairing between two sets of data, you should subtract them and treat the differences as if they were just one set of data.
* If the two sets don't have a natural pairing, then we call it 'unpaired'.
<><><><><><><><><><><><><><><><><><><>
Chapter 17 question: 91
How does a computer create a QQ plot (also called a Normal Quantile plot)?
* Your data are sorted and plotted against the expected z-scores the data would have if they had come from a Normal distribution.
<><><><><><><><><><><><><><><><><><><>
Chapter 18 question: 92
What causes Simpson's Paradox?
* When your analysis depends on whether or not you split your data along a certain variable, you can get a statistically significant result that's absolutely false.
* You can get Simpson's Paradox by EITHER splitting data you should not split or not splitting data you should split!
<><><><><><><><><><><><><><><><><><><>
Chapter 18 question: 93
How can a computer help you avoid Simpson's Paradox?
* Only human insight based on context can tell you which analysis is correct.
* There is nothing a computer or algorithm can do to help you make this judgement call.
<><><><><><><><><><><><><><><><><><><>
Chapter 19 question: 94
You calculate the best fit line for a collection of data, and the line you plot looks very horizontal to you when you look at the computer screen. Do you suspect a high or low correlation for that data? Why?
* Very low.
* High correlations would give us lines sloped up or sloped down.
<><><><><><><><><><><><><><><><><><><>
Chapter 19 question: 95
When is it NOT ok to add variances?
* The same employee at McDonald's makes all meal components and that person is very 'generous' with portion sizes.
* SAT scores: students who do better in math tend to also do better in verbal.
<><><><><><><><><><><><><><><><><><><>
Chapter 19 question: 96
Would it make sense to add variances in the case of student SAT math and verbal scores? Why or why not?
* No.
* We expect a correlation.
<><><><><><><><><><><><><><><><><><><>
Chapter 20 question: 97
How many degrees of freedom are there in a 2-way contingency table, for example, if you were to ask grads and undergrads whether they preferred classes in the morning, afternoon, or evening?
* One less than the number of rows times one less than the number of columns.
* (2 - 1) x (3 - 1) = 1 x 2 = 2
<><><><><><><><><><><><><><><><><><><>
Chapter 21 question: 98
What is the correct way to state the null and alternative hypotheses for an ANOVA test?
* The null hypothesis states that all the groups have a common mean value.
* The alternative hypothesis states that at least one group has a different mean from the others.
<><><><><><><><><><><><><><><><><><><>
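The contingency-table answers from Chapters 4 and 20 (expected value = row sum TIMES column sum divided by overall sum, and degrees of freedom = one less than the rows times one less than the columns) can be worked through on a small table. This is only a sketch: the counts below are invented for the grads/undergrads example, and the chi-square statistic is computed by hand without looking up a p-value.

```python
# Hypothetical observed counts: rows are grads and undergrads,
# columns are morning, afternoon, and evening preferences.
observed = [
    [10, 20, 30],  # grads
    [30, 20, 10],  # undergrads
]

row_sums = [sum(row) for row in observed]
col_sums = [sum(col) for col in zip(*observed)]
total = sum(row_sums)

# Expected count for each cell: row sum TIMES column sum divided by overall sum.
expected = [[r * c / total for c in col_sums] for r in row_sums]

# Degrees of freedom: (rows - 1) times (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Chi-square statistic: sum over all cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print("expected counts:", expected)
print("degrees of freedom:", df)
print("chi-square statistic:", chi_square)
```

With these made-up counts, every expected cell is 20 because the row totals are equal and the column totals are equal. Real survey data would rarely be that tidy.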