Personal Health Information detection in unstructured web documents

2013, Proceedings of the 26th IEEE International Symposium on Computer-Based Medical Systems

This paper describes our study of the incidence of Personal Health Information (PHI) on the Web. PHI is usually shared under conditions of confidentiality, protection and trust, and should not be disclosed or made available to unrelated third parties or the general public. We first analyzed the characteristics that can make systems successful at identifying unsolicited or unjustified PHI disclosures. We then designed and implemented an integrated Natural Language Processing/Machine Learning (NLP/ML)-based system that detects PHI disclosures according to these characteristics, including the patterns we detected. We regard this research as a first step toward a learning system that, trained on a limited training set built from the output of the processing chain described in the paper, will detect PHI disclosures on the Web in general.
