Class: NaiveBayesClassifier

NaiveBayesClassifier

new NaiveBayesClassifier(options) → {Object}

The NaiveBayesClassifier object holds all the properties and methods used by the classifier.
Parameters:
Name Type Argument Description
options Object <optional>
Options that can be used for initialisation
Properties
Name Type Description
tokenizer function Custom tokenization function
Properties:
Name Type Description
VERSION String Library version number
Returns:
Type
Object
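
To illustrate the `tokenizer` option, here is a minimal sketch of a custom tokenization function. It assumes a tokenizer receives the raw text as a string and returns an array of string tokens, mirroring the signature documented for `defaultTokenizer`. The name `bigramTokenizer` and its behaviour are hypothetical and not part of the library.

```javascript
// Hypothetical custom tokenizer: lowercases the text and emits word bigrams.
// Assumption: a tokenizer takes a string and returns an array of string tokens.
function bigramTokenizer(text) {
  // Extract ASCII word-like runs; `match` returns null when nothing matches.
  const words = text.toLowerCase().match(/[a-z0-9']+/g) || [];
  const bigrams = [];
  for (let i = 0; i < words.length - 1; i++) {
    bigrams.push(words[i] + ' ' + words[i + 1]);
  }
  return bigrams;
}

// It would be supplied through the documented `options.tokenizer` property, e.g.:
//   const classifier = new NaiveBayesClassifier({ tokenizer: bigramTokenizer });
```

A tokenizer like this trades the single-word view for some word-order information, at the cost of a larger vocabulary.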

Members

<static, constant> VERSION

Properties:
Type Description
String Library version number

categories :Object

Hashmap holding all category names.
Type:
  • Object

docFrequencyCount :Object

Document frequency table for each of our categories. For each category, how many documents were mapped to it.
Type:
  • Object

options :Object

Options defined at initialisation.
Type:
  • Object
Properties:
Name Type Description
tokenizer function Tokenization function (can be custom provided or default).

totalNumberOfDocuments :Number

A counter that holds the total number of documents we have learnt from.
Type:
  • Number

<constant> VERSION

Properties:
Type Description
String Instance version number

vocabulary :Object

Hashmap holding all words that have been learnt.
Type:
  • Object

vocabularySize :Number

A counter that holds the size of the NaiveBayesClassifier#vocabulary hashmap.
Type:
  • Number

wordCount :Object

Word count table for each of our categories. For each category, how many words in total were mapped to it.
Type:
  • Object

wordFrequencyCount :Object

Word frequency table for each of our categories. For each category, how frequently did a given word appear.
Type:
  • Object

Methods

<static> withClassifier(classifier) → {Object}

Initialise a new classifier from an existing NaiveBayesClassifier object. For example, the existing object may have been retrieved from a database or localStorage.
Parameters:
Name Type Description
classifier NaiveBayesClassifier An existing NaiveBayesClassifier
Returns:
Type
Object
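
The sketch below shows why a restore step of this kind is useful when a classifier has been stored as JSON. It uses a plain stand-in object whose keys follow the member names documented on this page. The stored values (for example `true` as the hashmap value) are assumptions, and which fields `withClassifier` actually reads is not stated here.

```javascript
// Stand-in for a trained classifier's state (values are illustrative assumptions).
const trainedState = {
  categories: { spam: true },
  docFrequencyCount: { spam: 1 },
  totalNumberOfDocuments: 1,
  vocabulary: { free: true, money: true },
  vocabularySize: 2,
  wordCount: { spam: 2 },
  wordFrequencyCount: { spam: { free: 1, money: 1 } },
  options: { tokenizer: function (text) { return text.split(' '); } }
};

// Round trip through JSON, as would happen with localStorage or a database.
const stored = JSON.stringify(trainedState);
const restored = JSON.parse(stored);

// The count tables survive the round trip...
console.log(restored.wordFrequencyCount.spam.free);
// ...but function-valued properties are dropped by JSON.stringify,
// and the parsed result is a plain object without classifier methods.
console.log(typeof restored.options.tokenizer);

// The documented static method would then rebuild a usable classifier, e.g.:
//   const classifier = NaiveBayesClassifier.withClassifier(restored);
```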

addWordToVocabulary(word) → {undefined}

Add a word to our vocabulary and increment the NaiveBayesClassifier#vocabularySize counter.
Parameters:
Name Type Description
word String Word to be added to the vocabulary
Returns:
Type
undefined
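
A minimal sketch of this bookkeeping, assuming the counter is incremented only the first time a word is seen (the description above does not say how repeated words are handled). The function name and the explicit `state` argument are hypothetical.

```javascript
// Sketch: record a word in the vocabulary hashmap and keep the size counter in sync.
// Assumption: a word already present is not counted again.
function addWordToVocabularySketch(state, word) {
  // Use hasOwnProperty so that words such as "constructor" are not
  // mistaken for existing entries inherited from Object.prototype.
  if (!Object.prototype.hasOwnProperty.call(state.vocabulary, word)) {
    state.vocabulary[word] = true;
    state.vocabularySize += 1;
  }
}

const vocabState = { vocabulary: {}, vocabularySize: 0 };
addWordToVocabularySketch(vocabState, 'free');
addWordToVocabularySketch(vocabState, 'free');
addWordToVocabularySketch(vocabState, 'constructor');
```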

categorize(text) → {Object}

Determine the category that some `text` most likely belongs to. Laplace (add-1) smoothing is used so that words that do not appear in our vocabulary (i.e. unknown words) do not produce zero probabilities.
Parameters:
Name Type Description
text String Raw text that needs to be tokenized and categorised.
Returns:
  • category - Category of “maximum a posteriori” (i.e. most likely category), or 'unclassified'
    Type
    String
  • probability - The probability for the category specified
    Type
    Number
  • categories - Hashmap of probabilities for each category
    Type
    Object
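
The maximum a posteriori decision described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: it sums log-probabilities to avoid numeric underflow, applies add-1 smoothing per token, and falls back to 'unclassified' only when no category exists. The names `toyModel`, `scoreCategories` and `pickCategory` are hypothetical.

```javascript
// Toy model using the count tables documented on this page.
const toyModel = {
  totalNumberOfDocuments: 2,
  vocabularySize: 4,
  docFrequencyCount: { spam: 1, ham: 1 },
  wordCount: { spam: 2, ham: 2 },
  wordFrequencyCount: {
    spam: { free: 1, money: 1 },
    ham: { meeting: 1, today: 1 }
  }
};

// Own-property lookup, so tokens like "constructor" count as unseen.
function countOf(table, token) {
  return Object.prototype.hasOwnProperty.call(table, token) ? table[token] : 0;
}

// Log-score each category: log prior plus smoothed log likelihood of every token.
function scoreCategories(model, tokens) {
  const scores = {};
  for (const category of Object.keys(model.docFrequencyCount)) {
    let logProb = Math.log(model.docFrequencyCount[category] / model.totalNumberOfDocuments);
    for (const token of tokens) {
      const count = countOf(model.wordFrequencyCount[category], token);
      // Laplace (add-1) smoothing keeps unseen tokens from zeroing the score.
      logProb += Math.log((count + 1) / (model.wordCount[category] + model.vocabularySize));
    }
    scores[category] = logProb;
  }
  return scores;
}

// Pick the category with the highest score.
function pickCategory(scores) {
  let best = 'unclassified';
  let bestScore = -Infinity;
  for (const category of Object.keys(scores)) {
    if (scores[category] > bestScore) {
      best = category;
      bestScore = scores[category];
    }
  }
  return best;
}

const toyScores = scoreCategories(toyModel, ['free', 'money']);
console.log(pickCategory(toyScores));
```

With equal priors, the tokens `free` and `money` each contribute log(2/6) for `spam` but only log(1/6) for `ham`, so `spam` wins.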

frequencyTable(tokens) → {Object}

Build a frequency hashmap where the keys are the entries in `tokens` and the values are the frequency of each entry (`token`).
Parameters:
Name Type Description
tokens Array Normalized word array
Returns:
FrequencyTable
Type
Object
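
One way to build such a table is sketched below. The function name `buildFrequencyTable` is hypothetical, and the use of a prototype-free object is a design choice of this sketch rather than something the documentation specifies.

```javascript
// Sketch: count how often each token occurs in the input array.
function buildFrequencyTable(tokens) {
  // A prototype-free object avoids collisions with inherited keys
  // such as "constructor" or "toString".
  const table = Object.create(null);
  for (const token of tokens) {
    table[token] = (table[token] || 0) + 1;
  }
  return table;
}

const exampleTable = buildFrequencyTable(['free', 'money', 'free']);
console.log(JSON.stringify(exampleTable)); // prints {"free":2,"money":1}
```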

getOrCreateCategory(categoryName) → {String}

Retrieve a category. If it does not exist, then initialize the necessary data structures for a new category.
Parameters:
Name Type Description
categoryName String Name of the category you want to get or create
Returns:
category
Type
String
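
The get-or-create pattern can be sketched as below, assuming the "necessary data structures" are the per-category entries of the tables documented under Members. The function name and the explicit `state` argument are hypothetical.

```javascript
// Sketch: initialise the per-category tables on first use, then return the name.
function getOrCreateCategorySketch(state, categoryName) {
  if (!Object.prototype.hasOwnProperty.call(state.categories, categoryName)) {
    state.categories[categoryName] = true;
    state.docFrequencyCount[categoryName] = 0;
    state.wordCount[categoryName] = 0;
    state.wordFrequencyCount[categoryName] = {};
  }
  return categoryName;
}

const categoryState = {
  categories: {},
  docFrequencyCount: {},
  wordCount: {},
  wordFrequencyCount: {}
};
const createdName = getOrCreateCategorySketch(categoryState, 'spam');
categoryState.wordCount.spam = 5;
// A second call finds the existing category and leaves its counts alone.
getOrCreateCategorySketch(categoryState, 'spam');
```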

learn(text, category) → {Object}

Train our naive-bayes classifier by telling it what `category` some `text` corresponds to.
Parameters:
Name Type Description
text String Some text that should be learnt
category String The category to which the provided text belongs
Returns:
NaiveBayesClassifier
Type
Object
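
A rough sketch of what a training step might update, using the member names documented above. The whitespace tokenization, the function names and the explicit `state` argument are assumptions made for illustration. Returning the state mirrors the documented return value, which allows calls to be chained.

```javascript
// Sketch: an empty set of count tables, keyed by the documented member names.
function createStateSketch() {
  return {
    categories: {}, docFrequencyCount: {}, totalNumberOfDocuments: 0,
    vocabulary: {}, vocabularySize: 0, wordCount: {}, wordFrequencyCount: {}
  };
}

function learnSketch(state, text, category) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

  // Create the category's tables on first use.
  if (!has(state.categories, category)) {
    state.categories[category] = true;
    state.docFrequencyCount[category] = 0;
    state.wordCount[category] = 0;
    state.wordFrequencyCount[category] = {};
  }

  // One more document seen, overall and for this category.
  state.docFrequencyCount[category] += 1;
  state.totalNumberOfDocuments += 1;

  // Assumption: simple lowercase whitespace tokenization.
  const tokens = text.toLowerCase().split(/\s+/).filter(Boolean);
  const frequencies = state.wordFrequencyCount[category];
  for (const token of tokens) {
    if (!has(state.vocabulary, token)) {
      state.vocabulary[token] = true;
      state.vocabularySize += 1;
    }
    frequencies[token] = (has(frequencies, token) ? frequencies[token] : 0) + 1;
    state.wordCount[category] += 1;
  }
  return state;
}

const learntState = learnSketch(
  learnSketch(createStateSketch(), 'free money free', 'spam'),
  'meeting today',
  'ham'
);
```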

tokenProbability(token, category) → {Number}

Calculate the probability that a `token` belongs to a `category`.
Parameters:
Name Type Description
token String The token (usually a word) for which we want to calculate a probability
category String The category we want to calculate for
Returns:
probability
Type
Number
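
Assuming the Laplace (add-1) smoothing mentioned under `categorize` is applied at this step, the probability could be computed as (count of token in category + 1) / (total words in category + vocabulary size). The sketch below uses that formula; the function name and the `counts` argument are hypothetical.

```javascript
// Sketch: add-1 smoothed probability of a token given a category.
function tokenProbabilitySketch(counts, token, category) {
  const frequencies = counts.wordFrequencyCount[category];
  const seen = Object.prototype.hasOwnProperty.call(frequencies, token)
    ? frequencies[token]
    : 0;
  return (seen + 1) / (counts.wordCount[category] + counts.vocabularySize);
}

const exampleCounts = {
  vocabularySize: 4,
  wordCount: { spam: 2 },
  wordFrequencyCount: { spam: { free: 1, money: 1 } }
};

const seenProbability = tokenProbabilitySketch(exampleCounts, 'free', 'spam');      // (1 + 1) / (2 + 4)
const unseenProbability = tokenProbabilitySketch(exampleCounts, 'meeting', 'spam'); // (0 + 1) / (2 + 4)
```

Under this formula an unseen token still receives a small non-zero probability, and the probabilities of all vocabulary words within a category sum to 1.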

<inner> defaultTokenizer(text) → {Array}

Given an input string, tokenize it into an array of word tokens. This tokenizer adopts a naive "independent bag of words" assumption. This is the default tokenization function used if the user does not provide one in NaiveBayesClassifier#options.
Parameters:
Name Type Description
text String Text to be tokenized
Returns:
String tokens
Type
Array
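
A bag-of-words tokenizer in this spirit might look like the sketch below. The exact normalization performed by `defaultTokenizer` is not described here, so the lowercasing and the ASCII-only punctuation stripping are assumptions, and `simpleTokenizer` is a hypothetical name.

```javascript
// Sketch: lowercase, replace anything that is not an ASCII letter, digit or
// whitespace with a space, then split on whitespace and drop empty strings.
function simpleTokenizer(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s]/g, ' ')
    .split(/\s+/)
    .filter(Boolean);
}

console.log(simpleTokenizer('Free money, FREE!')); // prints [ 'free', 'money', 'free' ]
```

Word order and punctuation are discarded, which is what the bag-of-words assumption implies: only which words occur, and how often, is kept.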
Copyright (C) 2015, Hadi Michael. All rights reserved.
Documentation generated by JSDoc 3.2.2 on Fri, May 29th, 2015 using the DocStrap template.