Strings : Trie Data Structure To Store Words

0
1487

Trie Data Structure with Strings at HiringLibrary-Com

Use a trie data structure to store words. Every node contains a list of all letters (pointers to the same node structure) and flags for each letter to indicate the length of the word. Write a method to insert into this kind of data structure. What would you use to store each node?

Rationale : Unlike other data structures, in a trie keys are not identified by information from a single node, but by the path from the root of the tree to a particular node. Thus, each node has a number of children equal to the letters in the alphabet being used, and each edge is labeled with the corresponding letter of the alphabet.The following picture shows a trie storing the strings (words): cup, sing, curve, singer, small, ball:

TrieTree

Before we get to the insertion algorithm, let’s start by analyzing how a representation of this type of data structure would look like:

struct trie

{

int noOfWordsThatEndHere, noOfChildren;

trie *letters[26];

trie()

{

       noOfWordsThatEndHere = noOfChildren = 0;

memset(letters,0,sizeof(letters));

}

};

 

As you can see, a trie is a recursive data structure that stores 26 other tries within its internal representation (one for each letter in the alphabet). Some other information we are storing here is the number of words that end at the current trie and the number of non-NULL children. Insertion can be easily performed recursively. Given a string, we just increase noOfWordsThatEndHere if we reach the end of a word or, otherwise, create a new trie at the position corresponding to the next letter in the word. Insertion in a trie can be done in O(n), where n is the length of the string inserted.


 

Before solving this problem do the following (not limited to):

  • Clarify doubts
  • Write/loudly speak/analyze single or multiple pseudocode(e.g. trivial algorithm to advanced algorithm/Data Structures) which can lead to solution
  • Explain any assumptions or limitations on the written pseudocode
  • Mention major data structures to be used
  • Also how the positive/negative functional tests prove that the resultant solution is correct. 

Professional Teacher at HiringLibrary.com

Clarification Points 

Clarification Point-1:

What is the maximum size of the string given as input?

Clarification Point-2:

What kind of characters can the given string contain?

 

 

Diversity in WorkPlace - Japanese,Korea,Chinese

Pseudo Code

Pseudo code:

Insert(t, s):
  If (current position in s is end of word)
    t.number_of_words++
    return
  if (sub-trie for current letter is NULL)
    create sub-trie
    increase t.number of children
  insert (t.sub-trie, s+1)

How did you find your first job?

Assumptions/Limitations

Assumption/Limitation:

  • We will assume the given string to be inserted into the trie data structure is no longer than 200 characters.
  • We will also assume the given string contains only lower case and upper case letters and the words are separated by ‘ ‘ or a newline.

Career Summary from a Professional for HiringLibrar

Major Data Structures 

Major data structures:

Array of characters (string), trie

Example at Hiring Library

Unit tests:  
Postive Functional Test cases  
Input Expected Output
S = any string with lower case / upper case letters and spaces Program terminates
Negative Functional Test Cases
S = any other string Undefined

Answer of the coding Problem

Here is the sample answer that implements the solution described above

#include <cstdio>
#include <string.h>

using namespace std;

char s[200];

struct trie
{
    int noOfWordsThatEndHere, noOfChildren;
    trie *letters[26];
    trie()
    {
        noOfWordsThatEndHere = noOfChildren = 0;
        memset(letters,0,sizeof(letters));
    }
};

void insert(trie *t,char *s)
{
  for (int stringIndex = 0; stringIndex < strlen(s); stringIndex++)
    if (s[stringIndex] != ' ' && !(s[stringIndex] >= 'a' && s[stringIndex] <= 'z')
        && !(s[stringIndex] >= 'A' && s[stringIndex] <= 'Z'))
    {
      printf("Invalid input!\n");
      return;
    }

  if(*s == '\n' || *s == ' ' || strlen(s) == 1)
    {
        t->noOfWordsThatEndHere++;
        return;
    }

    int position = *s-'a';

    if(t -> letters[position]==0)
    {
        t -> letters[position]=new trie;
        t -> noOfChildren ++;
    }
    insert(t -> letters[position],s+1);
}

int main()
{
    trie *t=new trie;

    strcpy(s, "test sentence for trie");

    insert(t, s);

    return 0;
}

Output at HiringLibrary-Com

Output and Further Excercise

Further Exercise at HiringLibrary

Further exercise:
 

 

 

Q: Write a function that searches for a word in a trie. What is its complexity?

A: We would just need to start at the root and search for each subsequent letter, going along the path corresponding to the word. The complexity is O(N), where N is the length of the word we are searching for.