KCleaner is a command-line utility for Windows that cleans keyword lists using stop words.
I am the author of this program.
The main difference from other tools of this kind is the ability to process large
volumes of data while keeping high speed. For example, on a rather modest hardware
configuration (AMD Sempron 2500, 1.4 GHz, 512 MB RAM) the utility processes a keyword
file of ~500,000 keywords against a stop-word file of ~50,000 words in 7-8 seconds.
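The text does not say how kCleaner reaches this speed. One common way to make the cost
independent of the stop-list size is a hash set: comparing every keyword with every stop
word would take about 500,000 x 50,000 = 2.5 x 10^10 comparisons, whereas a set needs
roughly one hash lookup per word of each keyword. The Python sketch below is a
hypothetical benchmark harness, not kCleaner's code: `make_data`, `split_keywords` and
`benchmark` are illustrative names, the data is random synthetic text, and the time it
reports depends on the machine and on Python, so it says nothing about kCleaner's figures.

```python
import random
import string
import time


def make_data(n_keywords, n_stop, seed=0):
    """Generate hypothetical keyword phrases and a stop-word list."""
    rng = random.Random(seed)  # fixed seed -> reproducible data
    vocab = ["".join(rng.choices(string.ascii_lowercase, k=6))
             for _ in range(max(2 * n_stop, 1000))]
    stop_words = rng.sample(vocab, n_stop)
    keywords = [" ".join(rng.choices(vocab, k=rng.randint(1, 4)))
                for _ in range(n_keywords)]
    return keywords, stop_words


def split_keywords(keywords, stop_words):
    """Split keywords into (good, bad) lists using a hash set of stop words."""
    stop = set(stop_words)  # built once; membership test is O(1) on average
    good, bad = [], []
    for kw in keywords:
        if any(word in stop for word in kw.split()):
            bad.append(kw)
        else:
            good.append(kw)
    return good, bad


def benchmark(n_keywords=500_000, n_stop=50_000):
    """Time split_keywords on synthetic data; returns (good, bad, seconds)."""
    keywords, stop_words = make_data(n_keywords, n_stop)
    start = time.perf_counter()
    good, bad = split_keywords(keywords, stop_words)
    return len(good), len(bad), time.perf_counter() - start
```

Calling `benchmark()` with its defaults uses the same sizes as the example above; only
the filtering step is timed, not the generation of the test data.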
kCleaner.exe takes 4 parameters:
- The input file containing the complete list of keywords (one per line) to be filtered;
- The input file of stop words (one per line; each entry must be a single word, not a
phrase, i.e. it must contain no spaces);
- The output file that will receive the list of "good" keywords (those that passed
the stop-word filter). The file is created during processing;
- The output file that will receive the list of "bad" keywords (those rejected by
the stop-word filter). This file is also created during processing.
Both input files must be encoded in Windows-1251.
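Because both input files must be in Windows-1251, lists saved in another encoding such
as UTF-8 have to be converted first. A minimal Python sketch, assuming UTF-8 source
files; the helper names `to_windows_1251` and `invalid_stop_words` are hypothetical and
not part of kCleaner. The second helper flags stop-word entries that break the
one-word-per-line rule.

```python
from pathlib import Path


def to_windows_1251(src, dst, src_encoding="utf-8-sig"):
    """Re-encode a text file into Windows-1251 (Python codec name "cp1251").

    "utf-8-sig" reads plain UTF-8 and also drops a leading byte-order mark.
    Raises UnicodeEncodeError if the text has characters that Windows-1251
    cannot represent.
    """
    text = Path(src).read_text(encoding=src_encoding)
    Path(dst).write_text(text, encoding="cp1251")


def invalid_stop_words(path):
    """Return 1-based line numbers of stop-word entries containing whitespace."""
    bad_lines = []
    with open(path, encoding="cp1251") as f:
        for lineno, line in enumerate(f, start=1):
            if len(line.split()) > 1:  # more than one word on the line
                bad_lines.append(lineno)
    return bad_lines
```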
Example call:
> kCleaner.exe in.txt stop.txt good.txt bad.txt
in.txt - input file with keywords/phrases;
stop.txt - input file with stop words;
good.txt and bad.txt - names of the output files for the "good" and "bad" keywords.
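For readers who want to reproduce the same four-argument workflow outside Windows, here is
a hypothetical Python equivalent; it is a sketch, not kCleaner's source. `run` and `main`
are illustrative names, words are assumed to be separated by whitespace, and comparison is
exact (case-sensitive) because the text does not say how kCleaner treats letter case.

```python
def run(in_path, stop_path, good_path, bad_path, encoding="cp1251"):
    """Write each keyword of in_path to good_path or bad_path.

    A keyword is "bad" if any of its whitespace-separated words is a stop word.
    Returns (good_count, bad_count).
    """
    with open(stop_path, encoding=encoding) as f:
        stop = {line.strip() for line in f if line.strip()}
    n_good = n_bad = 0
    with open(in_path, encoding=encoding) as src, \
            open(good_path, "w", encoding=encoding) as good, \
            open(bad_path, "w", encoding=encoding) as bad:
        for line in src:
            keyword = line.strip()
            if not keyword:
                continue  # skip blank lines
            if any(word in stop for word in keyword.split()):
                bad.write(keyword + "\n")
                n_bad += 1
            else:
                good.write(keyword + "\n")
                n_good += 1
    return n_good, n_bad


def main(argv):
    """Mimic the call: in.txt stop.txt good.txt bad.txt. Returns an exit code."""
    if len(argv) != 4:
        print("usage: kcleaner.py in.txt stop.txt good.txt bad.txt")
        return 2
    n_good, n_bad = run(*argv)
    print(f"good: {n_good}, bad: {n_bad}")
    return 0
```

From a script it could be started with `sys.exit(main(sys.argv[1:]))` after `import sys`.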
The utility works by examining each keyword and sorting it by the following rule:
if the keyword contains any stop word (as a separate word, not as a substring of a word!),
the keyword goes to the output file of "bad" keywords. If the keyword contains none of
the stop words, it goes to the file of "good" keywords.
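The difference between whole-word and substring matching is easy to miss, so here is a
small Python sketch of the rule as described. It assumes words are separated by
whitespace and compared exactly; `is_bad`, `is_bad_substring` and `classify` are
hypothetical names, and the substring variant is shown only for contrast, since it is
the behaviour the text explicitly rules out.

```python
def is_bad(keyword, stop_words):
    """Whole-word rule: bad if any word of the keyword is itself a stop word."""
    return any(word in stop_words for word in keyword.split())


def is_bad_substring(keyword, stop_words):
    """Substring rule (NOT the described behaviour), shown for contrast."""
    return any(stop in keyword for stop in stop_words)


def classify(keywords, stop_words):
    """Return (good, bad) lists under the whole-word rule."""
    stop = set(stop_words)
    good, bad = [], []
    for kw in keywords:
        if is_bad(kw, stop):
            bad.append(kw)
        else:
            good.append(kw)
    return good, bad


# Stop word "car" rejects "car rental" but keeps "carpet cleaning":
print(classify(["car rental", "carpet cleaning"], ["car"]))
# → (['carpet cleaning'], ['car rental'])
print(is_bad_substring("carpet cleaning", {"car"}))
# → True (a substring rule would wrongly reject it)
```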
The utility was tested on Windows XP, but it will very likely work
on other versions of Windows as well.