Factor Language Model Programming

(ii) Inflection Dictionary
This dictionary lists all possible inflections of the Telugu language. Each entry of the stem word dictionary stores indices into this dictionary to indicate which inflections are possible with that stem.
The proposed corpus structure reduces the corpus size drastically. Every stem word may take a number of inflections. If the inflected words were stored as they are, the corpus size would be m*n, where m is the number of stem words and n is the number of inflections. Instead of storing all the inflected words, the proposed structure stores stem words and inflections separately and handles the inflected words through morphology. The corpus then needs only m stem words and n inflections, i.e., m+n entries. For a corpus of 1000 stem words and 10 inflections, the required size is 1000+10=1010 entries, where storing every inflected form would have required 1000*10=10000.
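The size comparison above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `corpus_entries` is hypothetical, and the m*n figure is an upper bound that assumes every inflection can attach to every stem. The per-stem index lists that the stem dictionary also stores are not counted here.

```python
from typing import Tuple


def corpus_entries(num_stems: int, num_inflections: int) -> Tuple[int, int]:
    """Return (full-form entries, factored entries) for a corpus."""
    # Full-form corpus: every stem paired with every inflection.
    full_form = num_stems * num_inflections
    # Factored corpus: stems and inflections are each stored once.
    factored = num_stems + num_inflections
    return full_form, factored


full_form, factored = corpus_entries(1000, 10)
print(full_form, factored)  # 10000 1010
```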

Fig 5.3: Corpus structure of proposed Language Model
Textual Word Segmentation using Proposed Language Model
The proposed language model is used to develop a textual word segmenter. A word segmenter divides a given inflected word into a stem and a single inflection. This is required because the corpus stores stems and inflections separately.
The input to the word segmenter is an inflected word. The syllabifier takes this word, identifies whether each letter is a vowel or a consonant, and divides the word into syllables. Applying the syllabification rules yields the syllabified form of the input, which is then passed to the analyzer. The analyzer separates the stem and the inflection part of the given word. The stem is validated by comparing it with the stem words in the stem dictionary. If the stem is present, the inflection of the input word is compared with the inflections listed for that stem in the inflection dictionary. If the inflection matches, the segmenter displays the output directly; otherwise it selects the appropriate inflection(s) through comparison and then displays the result.
Syllabification is the separation of words into syllables, where syllables are considered the phonological building blocks of words. It divides the word the way it is pronounced, and the separation is marked by a hyphen. The main objective of the morphological analyzer is to divide the given word into the root word and the inflection. To do this, the input word is divided into syllables, and the syllables are compared with the root words and inflections to obtain the root word and the appropriate inflection.

Fig 5.4: Block diagram of Word Segmenter for text
Steps for word segmentation

Receive the inflected word as input from the user.
Syllabify the input.
Analyze the input and validate the stem word.
Identify the appropriate inflection for the given stem word by comparing the inflection of the given word with the inflections listed for that stem in the inflection dictionary.
Display the appropriate inflected word.

For example, consider the word “nAnnagAriki” (నాన్నగారికి), meaning “to father”. The input is given by the user in Roman transliteration format and is first divided into lexemes as:
n – A – n – n – a – g – A – r – i – k – i

This array is then processed by applying the rules of syllabification one by one, which marks each lexeme as a consonant (c) or a vowel (v).

“ No two vowels come together in Telugu literature.”
The given user input does not have two vowels together, so it satisfies this rule and the output after applying the rule is the same as above. If the rule is not satisfied, an error message is displayed stating that the given input is incorrect. Now the array is:
c – v – c – c – v – c – v – c – v – c – v

“ Initial and final consonants in a word go with the first and last vowel respectively.”
Telugu words rarely end with a consonant; almost all of them end with a vowel. The final consonant in this rule therefore does not mean a consonant that ends the string but the last consonant in the string. Applying rule 2 changes the array as follows:
c – v – c – c – v – c – v – c – v – c – v
cv – c – c – v – c – v – c – v – cv
This generated output is further processed by applying the other rules.

“ VCV: The C goes with the right vowel.”
Wherever the string has the form VCV, this rule divides it as V – CV. In the previous rule a consonant was attached to the first or last vowel; in this rule the consonant is attached to the vowel on its right and separated from the vowel on its left. Applying this rule to the output of rule 2 gives:
cv – c – c – v – c – v – c – v – cv
cv – c – c – v – cv – cv – cv
This output is not yet completely syllabified. One more rule has to be applied to finish the syllabification of the given user input.

“ Two or more Cs between Vs – First C goes to the left and the rest to right.”
When the string has the form VCCC*V, this rule splits it as VC – CC*V. In the above output, the VCCV in the string is syllabified as VC – CV. The output then becomes:
cv – c – c – v – cv – cv – cv
cvc – cv – cv – cv – cv
This pattern is now mapped back to the corresponding consonants and vowels, giving the complete syllabified form of the given user input:
nAn – na – gA – ri – ki
cvc – cv – cv – cv – cv
Hence, for the given user input “nAnnagAriki”, the generated syllabified form is “nAn – na – gA – ri – ki”.
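The four rules can be put together in a short Python sketch. It is not the authors' implementation: the vowel set `VOWELS` is an assumption about the Roman transliteration scheme (every other letter is treated as a consonant), and the function names are hypothetical. Instead of rewriting the c/v array step by step, the sketch computes where each syllable starts, which gives the same result for the examples in this text.

```python
# Assumed vowel letters of the Roman transliteration; all others count as consonants.
VOWELS = set("aAiIuUeEoO")


def cv_pattern(word):
    """Label every letter as 'c' (consonant) or 'v' (vowel)."""
    return ["v" if ch in VOWELS else "c" for ch in word]


def syllabify(word):
    """Split a transliterated word into syllables using the four rules."""
    pattern = cv_pattern(word)
    vowel_idx = [i for i, kind in enumerate(pattern) if kind == "v"]
    if not vowel_idx:
        raise ValueError("input contains no vowel")
    # Rule 1: no two vowels may come together.
    for left, right in zip(vowel_idx, vowel_idx[1:]):
        if right == left + 1:
            raise ValueError("two adjacent vowels: the given input is incorrect")
    # Rule 2 (initial part): leading consonants belong to the first vowel.
    starts = [0]
    for left, right in zip(vowel_idx, vowel_idx[1:]):
        consonants_between = right - left - 1
        if consonants_between == 1:
            starts.append(left + 1)  # Rule 3: VCV -> V - CV
        else:
            starts.append(left + 2)  # Rule 4: VCC*V -> VC - C*V
    # Rule 2 (final part): trailing consonants stay with the last vowel.
    ends = starts[1:] + [len(word)]
    return [word[s:e] for s, e in zip(starts, ends)]


print(" - ".join(cv_pattern("nAnnagAriki")))  # c - v - c - c - v - c - v - c - v - c - v
print(" - ".join(syllabify("nAnnagAriki")))   # nAn - na - gA - ri - ki
```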

Fig 5.5: Word Segmenter showing an inflected word without change in stem form

Fig 5.6: Word Segmenter showing an inflected word with a change in stem form
SCIL – Speech Corrector for Indian Languages
In an inflectional language, every word consists of one or several morphemes into which the word can be segmented. The approach used here exploits this characteristic of the Telugu language to reduce the above-mentioned problem of needing a very large corpus for good recognition accuracy.
SCIL is a procedure that is:

Designed to deal with complex word forms
Applied after recognition
Used to correct misrecognized words

Architecture of SCIL
The design of the Speech Corrector for Indian Languages consists of the Syllable Identifier, Phone Sequence Generator, Word Segmenter, and Morpho-Syntactic Analyzer modules. Input speech is decoded by a normal ASR system, which gives the identified word as a string. The Syllable Identifier syllabifies this string, and the Phone Sequence Generator converts the syllables into a sequence of phones. This sequence is the input to the Word Segmenter module, which matches the phonetized input with the root words stored in the dictionary module and generates a possible set of root words. The Morpho-Syntactic Analyzer compares the inflection part of the signal with the list of possible inflections from the database and gives the correct inflection. The result is given to the Morph Analyzer, which applies the morpho-syntactic rules of the language and gives the correct inflected word.

Fig 5.7: Block diagram of SCIL
i) Syllable Identifier
The syllable identifier marks the rough boundaries of the syllables and labels them. At this stage, we get a list of syllables separated by hyphens. The user input is syllabified, and this is the input to the next module. E.g. dE-vA-la-yA-ku
ii) Phone Sequence Generator
As the words in the dictionary are stored as phone-level transcriptions, this module generates the phone sequence from the syllables. E.g. d-E-v-A-l-a-y-A-k-u
iii) Word Segmenter
This module compares the phonetized input, starting from its beginning, with the root words stored in the dictionary module and lists the possible set of root words. For the example above, the possible root word is dEvAlayamu.
iv) Dictionary
The dictionary contains stems and inflections separately. It does not store inflected words, because it is very difficult, if not impossible, to cover all inflected words of the language. The database consists of two dictionaries:

Stem Dictionary
Inflection Dictionary

The Stem Dictionary contains the stem words of the language, the signal information for each stem (the duration and location of its utterance), and the list of indices into the Inflection Dictionary for the inflections that are possible with that stem word.
The Inflection Dictionary contains the inflections of the language and the signal information for each inflection (the duration and location of its utterance). Both dictionaries are implemented using a trie structure in order to reduce the search space.
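A minimal Python sketch of the two-dictionary layout is given below, assuming a plain character-level trie. The modified trie proposed in this work is not reproduced, the signal information (duration and location) is left out, and the class names, the `prefixes_of` helper, and the index lists attached to each stem are all hypothetical.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # next letter -> TrieNode
        self.payload = None  # set only on nodes that end a stored entry


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, key, payload):
        """Store key letter by letter and attach payload to its last node."""
        node = self.root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.payload = payload

    def prefixes_of(self, text):
        """Yield (key, payload) for every stored key that is a prefix of text."""
        node = self.root
        for i, ch in enumerate(text):
            node = node.children.get(ch)
            if node is None:
                return  # no stored entry continues along this path
            if node.payload is not None:
                yield text[: i + 1], node.payload


# Inflection dictionary: the position in the list is the inflection index.
INFLECTIONS = ["ki", "ni", "gAriki"]

# Stem dictionary: each stem carries the indices of its possible inflections.
# These index lists are illustrative, not taken from the actual corpus.
stems = Trie()
stems.insert("nAnna", [0, 1, 2])
stems.insert("nANemu", [0, 1])

for stem, indices in stems.prefixes_of("nAnnacAriku"):
    print(stem, [INFLECTIONS[i] for i in indices])  # nAnna ['ki', 'ni', 'gAriki']
```

Because the lookup walks the trie one letter at a time, a stem that diverges from the input is abandoned at the first mismatching letter, which is how the trie limits the search space.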
v) Morpho-Syntactic Analyzer
This module compares the inflection part of the signal with the list of possible inflections from the database and gives the correct inflection. The result is given to the Morph Analyzer, which applies the morpho-syntactic rules of the language and gives the correct inflected word.
Post Recognition Procedure

Capture the utterance, an isolated inflected word.
Get its syllabified form.
Generate phone sequence from the syllabified word.
Compare the phone sequences with stem words in the dictionary and identify the stem.
Segment the word into stem and inflection.
Get the list of possible inflections.
Compare the inflection signals possible with that stem one by one and apply morpho-syntactic rules of the language to combine stem and inflection.
Display the inflected word.

Using these rules, the possible root words are combined with their possible inflections, and the results are compared with the given user input. If the input is correct, the matching root word and inflection are displayed. If the input is not correct, the inflection part of the input word is compared with the inflections of that particular root word to identify the nearest possible inflection; the root word is then combined with the identified inflection(s), sandhi rules are applied, and the output is displayed. When more than one root word or more than one inflection has the minimum edit distance, the model displays all the possible options and the user can choose the correct one. For example, when the given word is pustakaMdO (పుస్తకందో), two inflections are possible: tO, making it pustakaMtO (పుస్తకంతో), meaning ‘with the book’, and lO, making it pustakaMlO (పుస్తకంలో), meaning ‘in the book’. The present work lists both words and leaves the choice to the user. We are working on improving this by selecting the appropriate word based on the context.
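The tie case can be illustrated with a small Python sketch. It assumes that the comparison is a character-level Levenshtein edit distance over the transliterated text; the actual system compares inflection signals, so this is a simplification. The function names and the candidate list are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def nearest_inflections(observed, candidates):
    """Return every candidate that shares the minimum edit distance."""
    if not candidates:
        return []
    scored = [(edit_distance(observed, c), c) for c in candidates]
    best = min(distance for distance, _ in scored)
    return [c for distance, c in scored if distance == best]


# Inflection part of pustakaMdO against an illustrative candidate list.
print(nearest_inflections("dO", ["tO", "lO", "ki"]))  # ['tO', 'lO']
```

Both tO and lO are one substitution away from dO, so both are returned and the choice is left to the user, as described above.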
SCIL Algorithm

W=Utterance.wav
Syl[]=SyllableIdentifier(W)
Phone[]=phonetizer(Syl[])
Stem=getStem(Phone[])
Infl[]=getInflections(Stem)
While (not exactMatch)

word=MorphAnalyzer(Stem,inflMatch)

display word
Stop

Working of SCIL
Once the possible root words are identified, the given word is segmented into two parts: the root word and the inflection. The inflection part is then compared, in the reverse direction, for a match in the inflection dictionary. Only the inflections listed against the possible root words are considered, which reduces the search space and makes the algorithm faster.
For example, consider “nAnnagAriki” (నాన్నగారికి), meaning “to father”, misrecognized as nAn-na-cA-ri-ku (నాన్నచారికు). SCIL is applied and corrects the recognition error as follows:
The output from the ASR is nAn-na-cA-ri-ku. The phone sequence generator generates the phone sequence n-A-n-n-a-c-A-r-i-k-u. This sequence is matched against the root words stored in the dictionary module, which identifies the possible set of root words from the Stem dictionary as follows:

…….

nAnna (నాన్న)

nANemu (నాణెము)

………

The word is then segmented into the root word and the inflection part, and the inflection part is matched against the inflections listed for the possible root words in the inflection dictionary:

ki (కి)

ni (ని)

gAriki (గారికి)

………

Possible set of inflections in the inflection dictionary
After the possible set of root words and the possible set of inflections are obtained, they are combined with the help of sandhi (SaMdhi) formation rules. In this example, cA-ri-ku is compared with the inflections of the root word nAnna.
The comparison identifies gAriki as the nearest possible inflection; the root word is combined with this inflection and the output is displayed as “nAnnagAriki”.
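The worked example can be traced end to end with the following Python sketch. Several simplifications are assumed and none of them come from the original system: the stem dictionary is a plain dict with illustrative entries, the inflection part is compared as text with a Levenshtein edit distance rather than as a signal, and plain concatenation stands in for the sandhi formation rules (which is adequate here only because this stem does not change form). The names `STEMS` and `correct` are hypothetical.

```python
from functools import lru_cache

# Illustrative stem dictionary: stem -> inflections allowed with that stem.
STEMS = {
    "nAnna": ["ki", "ni", "gAriki"],
    "nANemu": ["ki", "ni"],
}


@lru_cache(maxsize=None)
def edit_distance(a, b):
    """Memoised recursive Levenshtein distance between two strings."""
    if not a or not b:
        return len(a) + len(b)
    return min(
        edit_distance(a[1:], b) + 1,                     # delete from a
        edit_distance(a, b[1:]) + 1,                     # insert into a
        edit_distance(a[1:], b[1:]) + (a[0] != b[0]),    # substitute or match
    )


def correct(asr_output):
    """Return the candidate corrections for a hyphen-syllabified ASR string."""
    phones = asr_output.replace("-", "")
    results = []
    for stem, inflections in STEMS.items():
        if not phones.startswith(stem):
            continue  # this stem is not a possible root word
        observed = phones[len(stem):]  # inflection part of the input
        best = min(edit_distance(observed, infl) for infl in inflections)
        # Keep every inflection that ties for the minimum distance.
        results += [stem + infl for infl in inflections
                    if edit_distance(observed, infl) == best]
    return results


print(correct("nAn-na-cA-ri-ku"))  # ['nAnnagAriki']
```

Here only nAnna is a prefix of the phone sequence, and gAriki is two substitutions away from cAriku while ki and ni are further, so the single correction nAnnagAriki is returned.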
Conclusions
The language model proposed in this work reduces the corpus size by using a factored approach. The search process is sped up by the use of a trie-based structure, for which a change to the standard trie is proposed.
A post-recognition procedure, SCIL, is designed, which uses the proposed language model and corrects the words misrecognized at inflections. The approach is tested using 1500 speech samples. These samples consist of 100 distinct words, each word repeated 3 times and recorded by 5 speakers in the age group 18-50. It is implemented as a speaker-dependent system. An average model is built from the three utterances of each word for each speaker. Each speaker is given a unique ID, using which the average model of that speaker is used for testing.
 
