*At the end of this lecture, you’ll understand how these terms are relevant in the context of a medical test. You’ll also find them useful when we discuss machine learning.* Let’s consider a hypothetical condition called Very Bad Syndrome (VBS), a condition that scientists have just developed a test for. Let’s suppose that X is the set of people who are taking the test that tells you whether or not you have VBS.

Let’s divide up this set into two subsets. S will be the set of people x in X who have VBS. H will stand for “healthy” and will denote the set of people x in X who do not have VBS.

Now, let’s note that X equals S union H, because every person either has VBS or does not have it.

There is also no one who both has it and does not have it, so S intersect H is empty. In some sense, the whole point of medical testing is to figure out whether a person is in S or in H.

Let’s think about what the test tells us. Let P, for positive, be the set of x in X such that x tests positive for VBS. That is, they come into the lab, they take the test, and whatever marker the test has for positive, they show it. The doctor looks at the test and says, “The test says you have VBS.” It’s very important to realize that this is a different concept from “you have VBS.” All that we’re saying is that you test positive for VBS.

And let N, for negative, be the set of x in X such that x tests negative for VBS. You take the test, your doctor or clinician looks at it, and if it turns red, or does whatever the test does to signal a negative result, then you are considered negative for VBS. This does not mean that you are in the clear; it simply means that your test result came back negative.

Again, assuming that the test is deterministic, we have P union N = everyone and P intersect N = no one (just like we had S union H = everyone and S intersect H = no one). In an ideal world, the sick would be exactly the ones who test positive, and those who test positive would be exactly the sick. In reality, we cannot assume this. There are four intersections that help us talk about the discrepancies: S intersect P, H intersect N, S intersect N, and H intersect P. What do these sets represent? We must consider their sizes and how they relate to the real world.

First, let us consider S intersect P. You are in S, which means you have VBS, and you are in P, which means you test positive. These are what are often called the true positives. The bad news is that you have VBS; the good news is that the test accurately told you that you have VBS, so perhaps you can seek treatment.
The second set, H intersect N, means that you do not have VBS, because you are in H (healthy), and that you are in N (you test negative for this disease). These are the true negatives. The good news is that you don’t have the disease. The additional good news is that the test told you this, so now you don’t need to worry about it.

The next two sets are ones we would like to avoid. We would prefer that both of them were empty, but this is not always the case. The first is S intersect N, the false negatives. A false negative result occurs when a person has a disease, but the test indicates that he or she does not have it. These people are given false hope, because they do not receive the information about their disease that could help them take action to treat it. One common reason for a false negative result is that there is too much overlap between the populations being studied and the group being tested; another common reason is that the test itself is unreliable.

The second is H intersect P, the false positives. A false positive result occurs when a person does not have a disease but tests positive for it; this can result from problems with the populations being tested as well as with the test itself. Although the people in H intersect P do not have the disease, they are still worried about it. Some may even receive treatment that has side effects with no payoff, because they do not have the disease.

Comparing the cardinalities of these sets (the cardinality of a set is the number of elements in it) is useful in medical testing. And we can generalize this idea to machine learning later in this course and in future classes. Let’s consider the ratio of the number of people with VBS in the study (the numerator) to the total number of people in the study (the denominator), that is, the cardinality of S divided by the cardinality of X. This ratio must be less than or equal to 1, because everyone in the study who has VBS is also a person in the study: S is a subset of X.
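To make the four intersections concrete, here is a minimal sketch using Python sets. The ten person IDs and the memberships of S and P are made up purely for illustration; they are not data from any real study.

```python
# Hypothetical toy study: ten people, identified by the integers 0..9.
X = set(range(10))   # everyone who takes the test
S = {0, 1, 2, 3}     # people who have VBS (sick)
H = X - S            # people who do not have VBS (healthy)
P = {0, 1, 2, 4}     # people who test positive
N = X - P            # people who test negative

# Each pair splits X: the union is everyone, the intersection is no one.
assert S | H == X and S & H == set()
assert P | N == X and P & N == set()

# The four intersections discussed in the text.
true_positives = S & P    # sick and test positive
true_negatives = H & N    # healthy and test negative
false_negatives = S & N   # sick but test negative
false_positives = H & P   # healthy but test positive

# Every person lands in exactly one of the four sets.
cells = [true_positives, true_negatives, false_negatives, false_positives]
assert sum(len(cell) for cell in cells) == len(X)

# Cardinality of S divided by cardinality of X; at most 1 because S is a subset of X.
prevalence = len(S) / len(X)
assert prevalence <= 1

print("true positives: ", sorted(true_positives))
print("true negatives: ", sorted(true_negatives))
print("false negatives:", sorted(false_negatives))
print("false positives:", sorted(false_positives))
print("prevalence:     ", prevalence)
```

In this toy study, person 3 is sick but tests negative, and person 4 is healthy but tests positive, so neither of the two sets we would like to be empty actually is.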
This ratio is the prevalence of VBS among the participants. When designing a study, the goal is to collect data about the people in the study that mirrors data about people in a much larger population. If a study takes place at a VBS clinic, for example, then it is reasonable to expect that the proportion of people with VBS in that sample will be close to 1. However, this does not mean that the true proportion of people in the United States with VBS is also close to 1.

We can form the same kind of ratio for the healthy: the cardinality of H divided by the cardinality of X. What should we get, by the way, when we add these two quantities? Think about it for a second. We had better get 1, because you either have it or you don’t.

Let’s think about the cardinality of S intersect P. Remember what those were: the true positives. Now divide by the cardinality of S. So what do we have? In the numerator, on top, we have the number of people who are true positives, and in the denominator we have the number of people who are sick. This is what is called the true positive rate, and we would like it to be large.

A number we would like to be small is the false positive rate, the proportion of healthy people who test positive for a disease. Let’s look at the cardinality of H intersect P, divided by the cardinality of H. In the numerator, we have the number of false positives: people who are actually healthy but test positive for a disease. In the denominator, we have the number of people who are actually healthy. We would like the false positive rate to be as close to 0 as possible.

In the same way, we can compare the false negative rate, which is the size of S intersect N divided by the size of S, with the true negative rate, which is the size of H intersect N divided by the size of H.

To summarize, medical testing theory can be simplified into one sentence: *The true positive rate and the true negative rate should ideally both be 1, and the false positive rate and the false negative rate should ideally both be 0; however, this never happens in practice. For a useful test, the true positive rate and the true negative rate are generally close to 1, while the false positive rate and the false negative rate are generally close to 0.
Later, when you start thinking about business analytics, it may be helpful to consider what false positive rates are acceptable. It’s important to realize that this concept applies far beyond medical testing. If something is either true or false (either you have a disease or you don’t), you can use vocabulary like “positive” and “negative” to describe your test results.*
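The four rates described above can be sketched in a few lines of Python. This is only an illustration: the function name `rates` and the counts passed to it are hypothetical, and the sketch assumes the study contains at least one sick person and at least one healthy person, so that neither denominator is zero.

```python
import math


def rates(tp, fn, fp, tn):
    """Compute the four rates from the cardinalities of the four intersections.

    tp = |S intersect P|, fn = |S intersect N|,
    fp = |H intersect P|, tn = |H intersect N|.
    Assumes at least one sick and one healthy person.
    """
    sick = tp + fn      # |S|
    healthy = fp + tn   # |H|
    return {
        "true_positive_rate": tp / sick,
        "false_negative_rate": fn / sick,
        "false_positive_rate": fp / healthy,
        "true_negative_rate": tn / healthy,
    }


# Hypothetical counts: 4 sick people (3 caught, 1 missed),
# 6 healthy people (1 false alarm, 5 correctly cleared).
r = rates(tp=3, fn=1, fp=1, tn=5)

# Rates that share a denominator add up to 1.
assert math.isclose(r["true_positive_rate"] + r["false_negative_rate"], 1.0)
assert math.isclose(r["false_positive_rate"] + r["true_negative_rate"], 1.0)

for name, value in r.items():
    print(f"{name}: {value:.3f}")
```

Because the true positive rate and the false negative rate share the denominator |S|, pushing one toward 1 forces the other toward 0; the same holds for the true negative rate and the false positive rate over |H|.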
