new NaiveBayesClassifier(options) → {Object}
The NaiveBayesClassifier object holds all the properties and methods used by the classifier.
Parameters:
Name | Type | Argument | Description |
---|---|---|---|
options | Object | <optional> | Options that can be used for initialisation |
Properties:
Name | Type | Description |
---|---|---|
VERSION | String | Library version number |
Returns:
- Type
- Object
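As an illustration of the `options` argument, the sketch below builds a custom tokenizer that could be supplied as `options.tokenizer`. The stop-word list and the name `stopWordTokenizer` are hypothetical, and the contract assumed here (a string in, an array of tokens out) is inferred from the `defaultTokenizer(text) → {Array}` signature on this page rather than confirmed by it.

```javascript
// Hypothetical stop-word list; adjust it to the corpus at hand.
const STOP_WORDS = new Set(['a', 'an', 'the', 'and', 'or', 'to', 'of']);

// Assumed contract: takes raw text, returns an array of word tokens.
function stopWordTokenizer(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/) // split on anything that is not an ASCII letter or digit
    .filter((token) => token.length > 0 && !STOP_WORDS.has(token));
}

// The options object that would be handed to the constructor.
const options = { tokenizer: stopWordTokenizer };

console.log(options.tokenizer('The price of the ticket')); // [ 'price', 'ticket' ]
```

Under that assumption, the object would be passed as `new NaiveBayesClassifier(options)`.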
Members
-
<static, constant> VERSION
-
Properties:
Type | Description |
---|---|
String | Library version number |
-
categories :Object
-
Hashmap holding all category names
Type:
- Object
-
docFrequencyCount :Object
-
Document frequency table for each of our categories. For each category, how many documents were mapped to it.
Type:
- Object
-
options :Object
-
Options defined at initialisation
Type:
- Object
Properties:
Name | Type | Description |
---|---|---|
tokenizer | function | Tokenization function (custom or the default). |
-
totalNumberOfDocuments :Number
-
A counter that holds the total number of documents we have learnt from.
Type:
- Number
-
<constant> VERSION
-
Properties:
Type | Description |
---|---|
String | Instance version number |
-
vocabulary :Object
-
Hashmap holding all words that have been learnt
Type:
- Object
-
vocabularySize :Number
-
A counter that holds the size of the NaiveBayesClassifier#vocabulary hashmap.
Type:
- Number
-
wordCount :Object
-
Word count table for each of our categories. For each category, how many words in total were mapped to it.
Type:
- Object
-
wordFrequencyCount :Object
-
Word frequency table for each of our categories. For each category, how frequently did a given word appear.
Type:
- Object
Methods
-
<static> withClassifier(classifier) → {Object}
-
Initialise a new classifier from an existing NaiveBayesClassifier object. For example, the existing object may have been retrieved from a database or localStorage.
Parameters:
Name | Type | Description |
---|---|---|
classifier | NaiveBayesClassifier | An existing NaiveBayesClassifier |
Returns:
- Type
- Object
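The storage round trip that motivates `withClassifier` can be sketched without the library. The helper `restoreState` and the `STATE_KEYS` list are hypothetical; they copy the members documented on this page from a parsed snapshot and are not the library's own restore logic.

```javascript
// Members documented on this page that make up the learnt state.
const STATE_KEYS = [
  'categories', 'docFrequencyCount', 'totalNumberOfDocuments',
  'vocabulary', 'vocabularySize', 'wordCount', 'wordFrequencyCount',
];

// Hypothetical snapshot, as it might come back from a database or localStorage.
const stored = JSON.stringify({
  categories: { spam: true },
  docFrequencyCount: { spam: 1 },
  totalNumberOfDocuments: 1,
  vocabulary: { free: true },
  vocabularySize: 1,
  wordCount: { spam: 1 },
  wordFrequencyCount: { spam: { free: 1 } },
});

// Copy each documented member, failing loudly if the snapshot is incomplete.
function restoreState(snapshot) {
  const restored = {};
  for (const key of STATE_KEYS) {
    if (!(key in snapshot)) {
      throw new Error(`Missing property: ${key}`);
    }
    restored[key] = snapshot[key];
  }
  return restored;
}

const restored = restoreState(JSON.parse(stored));
console.log(restored.wordFrequencyCount.spam.free); // 1
```

Note that `JSON.stringify` drops function values, so a custom tokenizer would not survive this round trip and would have to be supplied again.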
-
addWordToVocabulary(word) → {undefined}
-
Add a word to our vocabulary and increment the NaiveBayesClassifier#vocabularySize counter.
Parameters:
Name | Type | Description |
---|---|---|
word | String | Word to be added to the vocabulary |
Returns:
- Type
- undefined
-
categorize(text) → {String}
-
Determine the category some `text` most likely belongs to. Use Laplace (add-1) smoothing to adjust for words that do not appear in our vocabulary (i.e. unknown words).
Parameters:
Name | Type | Description |
---|---|---|
text | String | Raw text that needs to be tokenized and categorised. |
Returns:
-
category - Category of “maximum a posteriori” (i.e. most likely category), or 'unclassified'
- Type
- String
-
probability - The probability of the category specified
- Type
- Number
-
categories - Hashmap of probabilities for each category
- Type
- Object
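The scoring described for `categorize` can be sketched as a standalone function. This is a minimal sketch, not the library's implementation: the `model` object is hypothetical data shaped like the documented members, the use of log-probabilities and the final normalisation step are assumptions, and tokens are passed in already tokenized.

```javascript
// Hypothetical trained state, shaped like the members documented above.
const model = {
  totalNumberOfDocuments: 3,
  vocabularySize: 4,
  docFrequencyCount: { spam: 2, ham: 1 },
  wordCount: { spam: 4, ham: 2 },
  wordFrequencyCount: {
    spam: { free: 2, money: 2 },
    ham: { hello: 1, friend: 1 },
  },
};

// Safe lookup that ignores inherited keys such as "constructor".
const countOf = (table, key) =>
  Object.prototype.hasOwnProperty.call(table, key) ? table[key] : 0;

function categorize(model, tokens) {
  const logScores = {};
  for (const category of Object.keys(model.docFrequencyCount)) {
    // Log prior: share of training documents mapped to this category.
    let score = Math.log(model.docFrequencyCount[category] / model.totalNumberOfDocuments);
    for (const token of tokens) {
      const frequency = countOf(model.wordFrequencyCount[category], token);
      // Add-1 smoothing keeps unknown words from zeroing the product.
      score += Math.log((frequency + 1) / (model.wordCount[category] + model.vocabularySize));
    }
    logScores[category] = score;
  }
  // Assumed step: normalise the log scores into probabilities that sum to 1.
  const scores = Object.values(logScores);
  const max = Math.max(...scores);
  const sum = scores.reduce((acc, s) => acc + Math.exp(s - max), 0);
  const categories = {};
  let best = 'unclassified';
  let bestProbability = 0;
  for (const [category, score] of Object.entries(logScores)) {
    categories[category] = Math.exp(score - max) / sum;
    if (categories[category] > bestProbability) {
      best = category;
      bestProbability = categories[category];
    }
  }
  return { category: best, probability: bestProbability, categories };
}

const result = categorize(model, ['free', 'money', 'now']);
console.log(result.category); // spam
```

Summing logarithms instead of multiplying raw probabilities avoids floating-point underflow on long texts, which is why the sketch works in log space.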
-
-
frequencyTable(tokens) → {Object}
-
Build a frequency hashmap where the keys are the entries in `tokens` and the values are the frequency of each entry (`token`).
Parameters:
Name | Type | Description |
---|---|---|
tokens | Array | Normalized word array |
Returns:
FrequencyTable
- Type
- Object
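A frequency hashmap of this kind can be built in a few lines. The function below is a hypothetical standalone helper that mirrors the documented behaviour; it is a sketch, not the library's source.

```javascript
// Count how often each token occurs in the array.
function frequencyTable(tokens) {
  // Object.create(null) avoids collisions with inherited keys such as "constructor".
  const table = Object.create(null);
  for (const token of tokens) {
    table[token] = (table[token] || 0) + 1;
  }
  return table;
}

const table = frequencyTable(['free', 'money', 'free']);
console.log(table.free, table.money); // 2 1
```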
-
getOrCreateCategory(categoryName) → {String}
-
Retrieve a category. If it does not exist, then initialize the necessary data structures for a new category.
Parameters:
Name | Type | Description |
---|---|---|
categoryName | String | Name of the category you want to get or create |
Returns:
category
- Type
- String
-
learn(text, category) → {Object}
-
Train our naive-bayes classifier by telling it what `category` some `text` corresponds to.
Parameters:
Name | Type | Description |
---|---|---|
text | String | Some text that should be learnt |
category | String | The category to which the provided text belongs |
Returns:
NaiveBayesClassifier
- Type
- Object
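The bookkeeping that training implies for the documented members can be sketched as follows. `createState` and this standalone `learn` are hypothetical; the sketch takes pre-tokenized input and assumes each call counts as one document, which the page suggests but does not spell out.

```javascript
// Hypothetical empty state with the members documented on this page.
function createState() {
  return {
    categories: Object.create(null),
    docFrequencyCount: Object.create(null),
    totalNumberOfDocuments: 0,
    vocabulary: Object.create(null),
    vocabularySize: 0,
    wordCount: Object.create(null),
    wordFrequencyCount: Object.create(null),
  };
}

function learn(state, tokens, category) {
  // Get or create the category and its per-category tables.
  if (!state.categories[category]) {
    state.categories[category] = true;
    state.docFrequencyCount[category] = 0;
    state.wordCount[category] = 0;
    state.wordFrequencyCount[category] = Object.create(null);
  }
  // One more document for this category and overall.
  state.docFrequencyCount[category] += 1;
  state.totalNumberOfDocuments += 1;

  const counts = state.wordFrequencyCount[category];
  for (const token of tokens) {
    // Add unseen words to the vocabulary and bump its size counter.
    if (!state.vocabulary[token]) {
      state.vocabulary[token] = true;
      state.vocabularySize += 1;
    }
    counts[token] = (counts[token] || 0) + 1;
    state.wordCount[category] += 1;
  }
  return state; // returned so that calls can be chained
}

const trained = learn(
  learn(createState(), ['free', 'money', 'free'], 'spam'),
  ['hello'],
  'ham'
);
console.log(trained.vocabularySize, trained.wordCount.spam); // 3 3
```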
-
tokenProbability(token, category) → {Number}
-
Calculate the probability that a `token` belongs to a `category`.
Parameters:
Name | Type | Description |
---|---|---|
token | String | The token (usually a word) for which we want to calculate a probability |
category | String | The category we want to calculate for |
Returns:
probability
- Type
- Number
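Assuming the Laplace (add-1) smoothing mentioned under `categorize` is applied here, the per-token estimate would be (count of the token in the category + 1) / (total words in the category + vocabulary size). The sketch below works through that arithmetic on hypothetical counts; the `state` object and the standalone function are illustrative only.

```javascript
// Hypothetical counts, shaped like the documented members.
const state = {
  vocabularySize: 3,
  wordCount: { spam: 4 },
  wordFrequencyCount: { spam: { free: 3, money: 1 } },
};

function tokenProbability(state, token, category) {
  const table = state.wordFrequencyCount[category];
  // Ignore inherited keys so that unknown tokens always count as 0.
  const frequency = Object.prototype.hasOwnProperty.call(table, token) ? table[token] : 0;
  // Add-1 smoothing: every vocabulary word gets one extra virtual occurrence.
  return (frequency + 1) / (state.wordCount[category] + state.vocabularySize);
}

const seen = tokenProbability(state, 'free', 'spam');    // (3 + 1) / (4 + 3)
const unseen = tokenProbability(state, 'hello', 'spam'); // (0 + 1) / (4 + 3)
console.log(seen > unseen); // true
```

Because of the added 1, a word never seen in the category still receives a small non-zero probability.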
-
<inner> defaultTokenizer(text) → {Array}
-
Given an input string, tokenize it into an array of word tokens. This tokenizer adopts a naive "independent bag of words" assumption. This is the default tokenization function used if the user does not provide one in NaiveBayesClassifier#options.
Parameters:
Name | Type | Description |
---|---|---|
text | String | Text to be tokenized |
Returns:
String tokens
- Type
- Array
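A bag-of-words tokenizer of this kind might look like the sketch below. The exact rules of the default tokenizer are not given on this page, so the lowercasing, the punctuation stripping and the ASCII-only character class are all assumptions, and `tokenize` is a hypothetical name.

```javascript
// Minimal bag-of-words tokenizer: word order and context are discarded.
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ') // assumed rule: drop anything but ASCII letters, digits, whitespace
    .split(/\s+/)
    .filter((token) => token.length > 0); // remove empty strings left by leading/trailing spaces
}

console.log(tokenize('Free money, NOW!')); // [ 'free', 'money', 'now' ]
```

Text in other scripts would need a Unicode-aware pattern instead of the ASCII class used here.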