All rights reserved © 5GL Software, Australia

A medical diagnosis aid/expert system with a large database of thousands of medical condition names (diseases), each associated with a symptom/sign/event pattern. Suitable for personal or professional users. Lab analysis and drug reactions are separate functions. The description of each database entry is intended to aid diagnosis: it rarely contains a complex description, offering clues and hints to diagnosis instead. Treatment information is for the general user only.


5GL-Doctor Mathematics Overview

In mathematics a vector is an entity which has a direction and a magnitude (or focus). What 5GL-Doctor attempts to do is build a vector which is perfectly straight (heading in the right direction) with 100% focus. Each individual mathematical calculation is part of that vector equation. There is more than one vector in this sense: if a top contender emerges, other conditions attempt to “chase it” by forming vectors of their own. For example, the second-best disease might argue: “But hold on, what if another symptom comes along? I can better fit it into my symptom/sign/event pattern. Not only that: the inquiry element ‘temperature and fever’ appears in about 500 patterns and hence is rather meaningless mathematically, but I have the lowest-frequency symptom as part of my disease pattern.” This argument, of course, has to be proven using a vector.

Let’s say we have a database with these patterns:

a,b,c (pattern 1)

a,b,c,d (pattern 2)

b,d,e (pattern 3)

a,b (pattern 4)

Say our inquiry is a,b, which has two elements.

The first P() is the number of inquiry elements matched by a pattern, divided by the number of inquiry elements. The results, expressed as a percentage, are:

2/2 = 100% for pattern 1

2/2 = 100% for pattern 2

1/2 = 50% for pattern 3

2/2 = 100% for pattern 4
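A minimal Python sketch of this first P(), assuming each pattern is held as a set of element names. The `PATTERNS` dictionary and the function name `inquiry_coverage` are illustrative only, not taken from 5GL-Doctor:

```python
# Toy database from the worked example: pattern number -> set of elements.
PATTERNS = {
    1: {"a", "b", "c"},
    2: {"a", "b", "c", "d"},
    3: {"b", "d", "e"},
    4: {"a", "b"},
}


def inquiry_coverage(inquiry, pattern):
    """First P(): fraction of the inquiry elements that the pattern contains."""
    if not inquiry:
        return 0.0
    return len(inquiry & pattern) / len(inquiry)


inquiry = {"a", "b"}
for number, pattern in PATTERNS.items():
    # Pattern 3 prints 50%, the other three print 100%.
    print(f"pattern {number}: {inquiry_coverage(inquiry, pattern):.0%}")
```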

The second P() is the number of matches divided by the number of pattern elements:

2/3 = 66% for pattern 1

2/4 = 50% for pattern 2

1/3 = 33% for pattern 3

2/2 = 100% for pattern 4
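The second P() can be sketched the same way, again assuming set-valued patterns (`pattern_coverage` is a hypothetical name). Truncating to a whole percentage reproduces the figures above, such as 2/3 shown as 66%:

```python
# Toy database from the worked example: pattern number -> set of elements.
PATTERNS = {
    1: {"a", "b", "c"},
    2: {"a", "b", "c", "d"},
    3: {"b", "d", "e"},
    4: {"a", "b"},
}


def pattern_coverage(inquiry, pattern):
    """Second P(): fraction of the pattern's elements that the inquiry matches."""
    if not pattern:
        return 0.0
    return len(inquiry & pattern) / len(pattern)


inquiry = {"a", "b"}
for number, pattern in PATTERNS.items():
    # int() truncates to a whole percentage, as the worked example does (2/3 -> 66).
    print(f"pattern {number}: {int(pattern_coverage(inquiry, pattern) * 100)}%")
```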

Adding the two results together and dividing by 2 (since we have two P()s) gives:

100% for pattern 4

83% for pattern 1

75% for pattern 2

41% for pattern 3
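Putting the two P() values together, a sketch of the averaging and shortlist step might look like this. The function names are hypothetical, and ties are not expected in this toy data:

```python
# Toy database from the worked example: pattern number -> set of elements.
PATTERNS = {
    1: {"a", "b", "c"},
    2: {"a", "b", "c", "d"},
    3: {"b", "d", "e"},
    4: {"a", "b"},
}


def inquiry_coverage(inquiry, pattern):
    """First P(): matches divided by the number of inquiry elements."""
    return len(inquiry & pattern) / len(inquiry)


def pattern_coverage(inquiry, pattern):
    """Second P(): matches divided by the number of pattern elements."""
    return len(inquiry & pattern) / len(pattern)


def combined_score(inquiry, pattern):
    """Add the two P() values and divide by 2."""
    return (inquiry_coverage(inquiry, pattern) + pattern_coverage(inquiry, pattern)) / 2


def shortlist(inquiry, patterns):
    """(pattern number, combined score) pairs, best first."""
    scores = {number: combined_score(inquiry, p) for number, p in patterns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for number, score in shortlist({"a", "b"}, PATTERNS):
    print(f"{int(score * 100)}% for pattern {number}")
# 100% for pattern 4, 83% for pattern 1, 75% for pattern 2, 41% for pattern 3
```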

Let’s build an IRC universe from the top two shortlisted patterns:

IRC name = Top 2 results; IRC population = a,b,c,d

If a pattern fits into this universe (every element of the pattern belongs to the IRC population), we give it a P() of 1, otherwise a P() of 0.

Patterns 1, 2 and 4 fit into this IRC; pattern 3 doesn’t, because its element e is not in the IRC population. So for pattern 1 we have an existing P() of 83% plus a new P() of 1 (100%), hence the new P() = (83 + 100)/2 ≈ 91%. Our shortlist now looks like:

100% for pattern 4

91% for pattern 1

88% for pattern 2

20% for pattern 3

(While pattern 4 is an exact match, note that pattern 1 is closer to the inquiry than pattern 2, in the sense that its number of elements (3) is closer to the number of inquiry elements (2) than pattern 2’s (4) is.)
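The IRC step can be sketched as a subset test followed by the add-and-divide-by-2 rule. This sketch takes the IRC population a,b,c,d exactly as stated rather than deriving it, and starts from the previous combined scores held as exact fractions. `irc_fit` and `add_probability` are hypothetical names:

```python
# Toy database from the worked example: pattern number -> set of elements.
PATTERNS = {
    1: {"a", "b", "c"},
    2: {"a", "b", "c", "d"},
    3: {"b", "d", "e"},
    4: {"a", "b"},
}

# Combined scores from the previous step, as exact fractions of 1.
previous = {1: (1 + 2 / 3) / 2, 2: (1 + 1 / 2) / 2, 3: (1 / 2 + 1 / 3) / 2, 4: 1.0}

# IRC population as given in the text.
IRC_POPULATION = {"a", "b", "c", "d"}


def irc_fit(pattern, population):
    """IRC P(): 1 if every element of the pattern lies inside the IRC population, else 0."""
    return 1.0 if pattern <= population else 0.0


def add_probability(existing, new):
    """Fold a new P() into an existing one: add them and divide by 2."""
    return (existing + new) / 2


updated = {
    number: add_probability(previous[number], irc_fit(pattern, IRC_POPULATION))
    for number, pattern in PATTERNS.items()
}
for number, score in sorted(updated.items(), key=lambda item: item[1], reverse=True):
    print(f"pattern {number}: {score:.3f}")
```

With exact values the new scores are 1, 11/12, 7/8 and 5/24 for patterns 4, 1, 2 and 3, the same order as the shortlist.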

The 5GL-Doctor database defines many things for a disease entry. Besides the disease pattern, with its typical and possible presentations, each entry also has strong indicators and other fields. There are hence a number of ways we can calculate a P(). As long as each new P() is a legitimate probability (a value between 0 and 1), we can add it to the existing one and divide by 2, and the result stays a legitimate P().
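A small sketch of folding any number of P() values in turn, assuming they arrive as a Python list (`fold_probabilities` is a hypothetical name). It rejects values outside 0 to 1, which is what keeps the running result a legitimate probability:

```python
from functools import reduce


def fold_probabilities(probabilities):
    """Start with the first P(), then repeatedly add the next one and divide by 2."""
    if not probabilities:
        raise ValueError("need at least one P() value")
    for p in probabilities:
        # Only legitimate probabilities may be folded in; this keeps the result in [0, 1].
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{p} is not a legitimate probability")
    return reduce(lambda existing, new: (existing + new) / 2, probabilities)


# Pattern 1 from the worked example: first P() = 1, second P() = 2/3, IRC P() = 1.
print(fold_probabilities([1.0, 2 / 3, 1.0]))  # about 0.9167 (11/12)
```

Because each step halves the running score, the most recently added P() always carries half of the weight. The result still stays between 0 and 1 whenever every input does.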

What we cannot do is work out a P() for incidence directly. The information can be difficult to find, and even when it is available, how would we mathematically handle an incidence of 1 in 100,000? Adding 1/100,000 to a P() calculation is pointless because it is so small that the P() will not change. And yet, common sense suggests incidence is an important clue to a diagnosis. What we can do is use trial and error, say:

If incidence is at least 1 in 20, let P() = 1

If incidence is at least 1 in 150, let P() = 0.9

If incidence is at least 1 in 1000, let P() = 0.8

And so on. The only way to gain confidence that this intuitive stab in the dark works is to test the maths against a lot of inquiries and see the results. In the case of 5GL-Doctor: do the results compare well with the differential diagnosis suggestions found in more than one medical source? If the answer is yes, our confidence in the intuitive P() values we tried grows.
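The incidence tiers can be sketched as a lookup table checked from the most common tier downwards. This assumes an incidence of 1 in N is stored as the number N, and that “1 or less in 20” means N is no larger than 20. The table name and the `None` result for conditions rarer than every tier are illustrative choices, since the text leaves the remaining tiers open:

```python
# Tiers from the text as (N, P) pairs: an incidence of at least 1 in N maps to P().
# Only these three rows are given; further rows ("and so on") would be appended here.
INCIDENCE_TIERS = [(20, 1.0), (150, 0.9), (1000, 0.8)]


def incidence_probability(one_in_n, tiers=INCIDENCE_TIERS):
    """Map an incidence of 1 in `one_in_n` to a P() using the first tier it falls within.

    Returns None when the condition is rarer than every tier (no P() defined).
    """
    for limit, p in tiers:
        if one_in_n <= limit:
            return p
    return None


for n in (10, 100, 500, 100_000):
    print(f"1 in {n}: {incidence_probability(n)}")  # 1.0, 0.9, 0.8, then None
```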

Another P() we could try relates to the matched inquiry element that appears least frequently across the patterns. In this example ‘a’ appears in 3 patterns and ‘b’ in 4. Hence any pattern that contains an ‘a’ would be given a slightly better P() value by a calculation. The improvement may be very small, say 0.002, but many such small P() values add up. Even if we get one of the calculations that produce these faint insights wrong, its result is typically so small that it will not make a difference. Any errors become so small they are not noticed overall.
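The text does not give the formula for this faint P(), so the following is only one hypothetical way to compute it. It divides a small constant by the frequency of the rarest matched inquiry element. The constant `scale=0.006` is an illustrative assumption chosen so that matching ‘a’ (3 patterns) gives 0.002. How the value is then combined with the running score is not spelled out in the text, so the sketch only computes the value itself:

```python
from collections import Counter

# Toy database from the worked example: pattern number -> set of elements.
PATTERNS = {
    1: {"a", "b", "c"},
    2: {"a", "b", "c", "d"},
    3: {"b", "d", "e"},
    4: {"a", "b"},
}


def element_frequencies(patterns):
    """Count how many patterns each element appears in."""
    return Counter(element for pattern in patterns.values() for element in pattern)


def rarity_bonus(inquiry, pattern, frequencies, scale=0.006):
    """Hypothetical small P(): scale divided by the frequency of the rarest matched element."""
    matched = inquiry & pattern
    if not matched:
        return 0.0
    rarest = min(frequencies[element] for element in matched)
    return scale / rarest


frequencies = element_frequencies(PATTERNS)  # 'a' -> 3, 'b' -> 4
inquiry = {"a", "b"}
for number, pattern in PATTERNS.items():
    # Patterns containing 'a' get 0.0020; pattern 3 (only 'b' matched) gets 0.0015.
    print(f"pattern {number}: {rarity_bonus(inquiry, pattern, frequencies):.4f}")
```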

Notes

Mathematics such as the above works most of the time, perhaps 95% of the time, but exceptions do arise. Take, for example, Guillain-Barré syndrome. It has so many possible presentations that it proved hard to keep it off many a shortlist even when a key symptom, namely paralysis, was not in the inquiry. Symbolic elements alone do not always allow the system to infer this kind of intelligence, so a technique had to be found to build such intelligence into the mathematics. In the case of this disease, a new field was added to the disease entry which associates a symptom group with the condition. This means one of the symptoms/signs/events belonging to that symptom group MUST be present in the inquiry, or else the disease is rejected from consideration. (In this case the symptom group was named ‘paralysis’.)
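A sketch of such a required-symptom-group filter follows. The disease names, element names and group members below are placeholders rather than real clinical data, and the dictionary layout with a `required_group` field is an assumption about how such a field might be stored:

```python
# Hypothetical entries: names and elements are placeholders, not real clinical data.
SYMPTOM_GROUPS = {
    "paralysis": {"p1", "p2", "p3"},  # placeholder members of the 'paralysis' group
}

DISEASES = {
    "disease with required group": {"pattern": {"a", "b", "p1"}, "required_group": "paralysis"},
    "disease without required group": {"pattern": {"a", "b", "c"}, "required_group": None},
}


def passes_required_group(entry, inquiry, groups):
    """Reject an entry unless the inquiry contains at least one member of its required group."""
    group_name = entry["required_group"]
    if group_name is None:
        return True
    return not groups[group_name].isdisjoint(inquiry)


def candidates(inquiry, diseases, groups):
    """Names of the diseases that stay in consideration for this inquiry."""
    return [name for name, entry in diseases.items()
            if passes_required_group(entry, inquiry, groups)]


print(candidates({"a", "b"}, DISEASES, SYMPTOM_GROUPS))        # only the entry without a group
print(candidates({"a", "b", "p2"}, DISEASES, SYMPTOM_GROUPS))  # both entries
```

Any member of the group satisfies the rule, even one (such as p2 here) that is not listed in the disease’s own pattern.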