For example, documents differ internally in their language (both human and programming), vocabulary (email addresses, links, zip codes, phone numbers, product numbers), type or format (text, HTML, PDF, images, sounds), and may even be machine generated (log files or output from a database).
We must realize that terrorists and their plans of attack are not as clear-cut as a made-for-television movie or a multimillion-dollar Hollywood production.
It takes time to cultivate the information, research the possible scenarios, and discuss the probabilities with intelligent people who have experience in such matters. We cannot expect information to pop up out of the ground or be presented in a box with pretty paper. It is a process of effort, sweat, tears and, yes, life-threatening risk.
The expense of data mining may seem high to most, but I would rather have my tax money spent in this manner than wasted on assisting people who are disrespectful of the US and are holding this nation hostage for welfare. We have too much fraud, waste and abuse from individuals who invade this country. I think we should concentrate on preventing the abuse and horror that has beset, and will again beset, this nation.
(Postscript) (PDF) Per author's description: This paper introduces the idea of a bounding chain for determining when complete coupling has occurred so that Coupling From the Past may be run on the chain to obtain exact samples.
Finally, the paper compares the three clustering algorithms.
Key words: Intrusion detection, K-Means Clustering, Y-Means Clustering, Fuzzy C-Means Clustering, Data Mining, Firewall.
Ye Qing, Wu Xiaoping and Huang Gaofeng, "An Intrusion Detection Approach based on Data Mining", 2nd International Conference on Future Computer and Communication, pp.
We find that the decision tree classifier gives more accurate results if we take the "complete information" of the data set. In this paper we improve the traditional decision tree algorithm, which works with known and precise data, including the Gini index method for determining the goodness of a split and considering the cumulative distribution function.
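As a rough illustration of how the Gini index can score the goodness of a split, here is a minimal Python sketch for the traditional case of known, precise numeric data. It does not reproduce the paper's improved algorithm or its use of the cumulative distribution function. The function names (`gini_impurity`, `gini_of_split`, `best_threshold`) and the choice of midpoints as candidate thresholds are assumptions made for this example.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a collection of class labels: 1 - sum_k p_k^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_of_split(left, right):
    """Size-weighted Gini impurity of a binary split (lower is better)."""
    n = len(left) + len(right)
    if n == 0:
        return 0.0
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


def best_threshold(values, labels):
    """Scan midpoints between sorted distinct values of one numeric attribute.

    Returns (threshold, weighted_gini) for the split `value <= threshold`
    with the lowest weighted Gini impurity. Hypothetical helper: the
    midpoint candidates are an assumption, not taken from the paper.
    """
    pairs = list(zip(values, labels))
    distinct = sorted(set(values))
    best = (None, float("inf"))
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        g = gini_of_split(left, right)
        if g < best[1]:
            best = (t, g)
    return best
```

For instance, `best_threshold([1, 2, 3, 4], ["n", "n", "y", "y"])` selects the threshold 2.5, because that split separates the two classes completely and its weighted Gini impurity is 0.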