[Python-Dev] GBayes design (original) (raw)
Raymond Hettinger python@rcn.com
Thu, 5 Sep 2002 11:43:20 -0400
- Previous message: [Python-Dev] utf8 issue
- Next message: [Python-Dev] GBayes design
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Is it too late to challenge a core design decision?
Instead of multiplying probablities, use fuzzy logic methods. Classify the indicators into damning, strong, weak, neautral, ...
After counting the number of indicators in each class, make a spam/ham decision that can be easily tweaked. This would make it easy to implement variations of Tim's recent clear win, where additional indicators are gathered until the balance shifts sharply to one side.
Some other advantages are: -- easily interpreted score vectors (6 damning, 7 strong, 4 weak, ... ) -- avoids mathematical issues with indicators not being independent -- allows the addition of non-token based indicators. for instance, a preponderance of caps would be a weak indicator. the presence of caps separated by spaces would be a strong indicator. -- the decision logic would be more intuitive -- avoids the issue of having equal amounts of spam and ham in the sample
The core concept would stay the same -- it's really just a shift from continuous to discrete.
of-course-this-is-entirely-outside-my-fields-of-knowledge-ly yours,
Raymond Hettinger
- Previous message: [Python-Dev] utf8 issue
- Next message: [Python-Dev] GBayes design
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]