Key Cleaning Steps Before Generating Ngram


It all comes back to the basics. Serve customers the best-tasting food at a good value in a clean, comfortable restaurant, and they’ll keep coming back.

− Dave Thomas, CEO of Wendy’s, a fast-food restaurant chain.

The efforts of housekeeping speak for themselves. The results of sincere as well as faux housekeeping efforts are noticeable. The housekeeping staff needs to execute cleaning and maintenance tasks at various places inside the hotel. The most important task is cleaning and maintaining guest rooms and guest bathrooms. The guests assess the cleanliness in this area critically.

By following the best cleaning and maintenance practices, the housekeeping staff can help retain satisfied guests and attract new guests who are willing to return to the hotel. This brings more revenue to the hotel business. To achieve both guest satisfaction and work productivity, the housekeeping staff needs to structure the cleaning and maintenance procedures and follow them appropriately.

Setting Chambermaid’s Trolley

The chambermaid’s trolley can be viewed as a large tool box on wheels to aid the hotel housekeeping staff. It has a number of compartments and shelves of various sizes. This trolley is filled with the supplies from the housekeeping supplies store at the end of each shift so that the next shift staff can access it immediately.

The staff considers the following points while loading chambermaid’s trolley.

  • Loading the trolley with adequate supplies depending upon the number and types of the rooms on the floor.

  • Avoiding overloading the trolley, which may lead to accidents.

  • Avoiding underloading the trolley, which may lead to unnecessary trips to the supplies store.

SOP for Setting the Chambermaid’s Trolley

The SOP is given as follows −

  • Empty the trolley.

  • Check rapidly for any broken parts.

  • Clean it by dusting and wiping any stains.

  • Place the items according to their weight: heaviest items at the bottom and lighter items at the top section of the trolley.

  • Place the linen for different purpose separately.

  • Close the lids of cleaner bottles and liquid cans tightly.

  • Record the numbers and types of the items loaded in the trolley for the rooms.

  • Collect the room keys.

  • Take the trolley to the assigned duty floor.

  • Park it outside the room such that the linen side faces outside and the room entrance is blocked.

SOP for Entering the Guest Room

The housekeeping staff should follow the SOP given below for entering the guest room.

  • Leave the DND (Do not Disturb) rooms undisturbed.

  • Knock on the door with your knuckles and announce in a pleasant voice, “Housekeeping…”.

  • Wait for five seconds to hear the guest’s response.

  • In case of no response, announce the same again.

  • If there is no answer the second time either, open the door with the key.

  • Enter the room.

  • If the guest is found sleeping, withdraw from the room quietly.

  • In case the guest answers, ask politely when he would like the room to be serviced.

  • In case the guest wants it later, acknowledge his reply and withdraw from the room.

  • If the housekeeping work is in progress and the guest returns from outside, greet him and ask if the guest would like to return in some time.

SOPs for Cleaning the Guest Room

The SOP for cleaning the guest room is given below. Once the staff enters the room and starts the housekeeping work, he must −

  • Not use guest room linen as a door stopper or for cleaning and dusting the room.

  • Keep the guest room door open while working.

  • Open the curtains and patio door.

  • Assemble the furniture and place appropriately.

  • Keep the vacuum cleaner and other cleaning apparatus in the room.

  • Check the type of bed.

  • Take the bed linen of appropriate size and place it on the nearest chair.

  • Remove previous bedspread and place on the chair.

  • Inspect the bed and pillows for their condition as well as for any lost-and-found.

  • In case of a checkout room, hand over any items left behind by the guest to the floor supervisor. If the room is still occupied by the guest, place the item such that it is safe as well as visible to the guest.

  • Put soiled sheets and pillow covers in the soiled linen cart of the trolley.

  • Empty ashtrays and rubbish from the guest room and bathroom dustbins into the trash cart of the trolley.

  • Pick up used glasses, mugs, ashtray, trays, and place them on bathroom platform.

  • Spray the bathtub, basin, glasses, mugs, and trays with cleaning liquid. Let them soak the chemicals from the liquid.

  • Make the bed.

  • Start dusting from an extreme inside corner of the room and work outwards.

  • Wipe the TV clean.

  • Straighten the guest items.

  • Sweep the room and patio floor.

  • Mop the room and patio floor.

  • Clean the glasses, mugs, and tray.

  • Sanitize glasses, mugs, telephone device, and TV remote.

  • Inspect the condition of bathroom slippers and bathrobe. Replace if soiled.

  • Close the patio door.

  • Close all the curtains.

  • Clean the entrance door.

  • Close and lock the room door.

  • Report any damage spotted to the supervisor.

SOPs for Cleaning the Guest Bathroom

The SOP for cleaning the guest bathroom is given below.

  • Open bathroom ventilation.
  • Sweep the bathroom floor.
  • Scrub and finish the platform, bathtub, and basin.
  • Scrub and finish the toilet bowl, rim, ring, and hinge.
  • Wipe the mirror.
  • Clean bathroom walls using wet mop or sponge.
  • Replace amenities such as toilet roll, toilet block, shampoo, conditioners, and moisturizers.
  • Replace bathroom mat.
  • Wipe down shower curtain working from top to bottom with a dry cloth.
  • Replace bath towels and hand towels.
  • Replace the dustbin liner.
  • Close the bathroom ventilation.
  • Clean the bathroom door.
  • Keep the bathroom door open after cleaning.
  • Check bathroom doormat. Replace if required.
  • Report any damage spotted to the supervisor.

SOPs for Cleaning Balcony / Patio

The balcony or the patio are the extensions of the guest room. The SOPs for cleaning them are given below.

  • Enter the balcony.
  • Spray the walls and railings.
  • Scrub and clean away any bird droppings.
  • Wipe down rocking or sitting chairs and the table.
  • Clean the door tracks on the floor.
  • Sweep the floor.
  • Mop the floor.

SOPs for Do-Not-Disturb (DND) Rooms

Every room has to be entered at least once a day by any housekeeping staff. The guests who do not want to get disturbed by any housekeeping service tag their rooms with a Do-Not-Disturb (DND) sign.

The SOP for these rooms is as given below.

  • Do not disturb the guest by placing a call until 2:00 p.m.

  • After 2:00 p.m., the Supervisor calls the room to know the guest’s needs.

  • The housekeeping staff contacts the supervisor to make sure whether to service the room.

  • If the call was not answered by the guest after two calling attempts, the room is serviced.

  • To his best judgement, the housekeeping staff enters the room and continues with the usual housekeeping work.

Public Area Cleaning SOP

There are various public areas frequented by the hotel guests. The areas and their respective SOPs for housekeeping are given below −

SOPs for Cleaning the Lifts

  • Carry out the lift cleaning task early morning when the least number of guests are expected to use it.
  • Call the lift to the ground floor.
  • Open its door.
  • Put appropriate signboard near it.
  • Clean the lift using the appropriate cleaning liquid according to the wall material of the lift cabin.
  • Wipe the lift doors.
  • Work from top to bottom while cleaning a lift cabin.
  • Keep the lift door open till the floor and walls are dried completely.
  • Spray clean air freshener.

SOPs for Cleaning the Front Office and Lobby

The lobby is active 24 hours. The furniture, carpets, flooring, and ceiling; everything needs to be kept extremely clean at any given time. The SOPs are as follows −

  • Clear all ashtrays into the trash ensuring no cigarettes are burning.

  • Clean and restore them to proper places.

  • Clear the dustbins near front office desk.

  • Replace their lining and keep them as they were.

  • Dust and wipe the telephone device, fax machine, computers, and kiosks. Sanitize the telephone device, computer keyboard, and touchpad of the kiosk.

  • Remove spider webs from ceiling.

  • Remove the dust deposited on walls, windows, furniture, and floor.

  • Remove stains on the carpet and furniture.

  • Clean all artifacts using damp and soft cloth carefully.

  • Sweep and mop the flooring of lobby and front office desk area.

  • Dust and polish any vases, paintings, and art pieces.

  • Spray air freshener with the signature aroma.

  • Play very light and soothing instrumental music.

SOPs for Cleaning Parking Area

The parking area takes the load of pollution created by hotel owned vehicles and guests’ private vehicles. It is heavily polluted with dirt and dust. The parking area needs cleanliness with respect to the following terms −

  • Control the ventilation.
  • Control pollutant discharges occurring from broken drainage or water systems of the hotel.
  • Remove fine-grained sediment particles on parking floor.
  • Clean the area near lift.
  • Hard-sweep the parking floor using street sweeping equipment.
  • Collect and dispose of the debris appropriately.
  • Bring the presence of any unusual debris to the notice of the public area supervisor.

SOPs for Keeping the Garden

The gardener or the team of gardeners work to keep the garden looking beautiful. They must −

  • Water the plants regularly according to the season and requirement of the plants; generally early morning.

  • Remove weeds and fallen leaves daily.

  • Practise the art of arborsculpture to enhance the beauty of the trees and bushes.

  • Keep the gardening tools clean and safe.

  • Report any damage or requirement of tools or plants to the public area supervisor.

  • Keep the lawn grass in healthy condition by cutting it periodically and using a scarifying machine.

  • Keep any artificial waterfalls or artificial water body clean.

  • Fertilize and manure the plants as per the schedule.

  • Recycle the food wastage in the hotel to prepare organic fertilizer.

SOPs for Cleaning the Dining Area

The dining areas need daily cleaning before their working hours start as well as when the restaurant staff requests cleaning. The SOP is given below.

  • Collect all the cleaning equipment and dining area keys.
  • Switch on all the electric lamps.
  • Open all the drapes and blinds for letting in the natural light.
  • Observe the entire area to plan the work.
  • Align all the chairs away from the table to make room for cleaning.
  • Clean the carpet area, using vacuum cleaner.
  • Remove any food stains from the carpet using appropriate cleaner.
  • If there is no carpet on the floor, sweep and mop it.
  • Dust all the furniture in the dining area.
  • Polish the furniture if required.
  • Using a feather duster, dust all the pictures, paintings, artworks, and corners.
  • Clean and disinfect the telephone devices.
  • Polish metal, glass, and wood items if required.
  • Clean the mirrors and windows by wiping them with wet sponge.
  • If any need for maintenance is spotted, consult the engineering department.
  • If any guest items are found, deposit them with the housekeeping control desk.
  • Collect all dirty table linens and replace with the fresh ones.
  • Return the keys to the security department.
  • Record in the housekeeping register.

SOPs for Cleaning the Swimming Pool

Swimming pool cleaning can be conducted in-house by training and employing housekeeping staff. A hotel may have several pools, such as indoor and outdoor pools or separate pools for adults and children. The following steps are taken to clean and maintain the swimming pool −

  • Check water quality more than once a week.
  • Check any broken tiles/pipes inside the swimming pool.
  • Clean the water as soon as possible when required.
  • Check the pool water for contamination daily. Remove leaves using leaf catchers.
  • Check for slippery floor area and the pool bottom. Apply and maintain the anti-slip mats near the pool. Scrub and clean the bottom of the pool.
  • Keep the life-saving and floating apparatus ready at all times.
  • Keep poolside area and basking chairs clean.
  • Keep an appropriate and noticeable signage showing the depth of the swimming pool.
  • Check and keep changing rooms up to good quality.
  • Keep the changing room door open when it is not occupied.
  • Employ lifeguards to carry out a general safety check of the swimming pool once a day during the operating hours.
  • Add adequate amount of chlorine in the pool water.

SOPs for Spring Cleaning

Since spring cleaning is a time-consuming process, it is conducted during low-occupancy periods. The standard procedures are −

  • Request a spring-cleaning date from the front office desk. (The housekeeping department needs to honor whatever date they give, as it is a matter of revenue generation.)

  • Tag the room as 'Not for Sale'.

  • Remove the guest amenities, curtains, and art pieces from the room.

  • Send the curtains to the laundry for dry cleaning.

  • Empty the mini bar and send the beverage items to Food and Beverage store.

  • Roll up the carpets and cover them with a dustsheet.

  • Inspect the furniture and send to the furniture yard for repair or upholstery.

  • Inspect the locks, knobs, latches, leaking pipes, and bathroom.

  • Hand over the room to maintenance department for any painting, sealing, and repairing work required.

  • Once the maintenance work is complete, remove any residual smell of paint and varnish by airing the room.

  • Polish and clean the permanent fixtures.

  • Open, lay, and shampoo the carpet.

  • Check the bathroom sealing and clean the bathroom.

  • Make the bed using fresh bed linen.

  • Restore the art pieces, furniture, and guest supplies.

  • Call room service for restoring mini bar, glasses, and trays.

  • Show the room to the floor supervisor.

  • Release it to the front office desk for selling.

SOPs for Closing Down the Shifts

The floor supervisor closes the shift formally by ensuring the following points from the attendants −

  • Empty garbage bags of the chambermaid’s trolley into the garbage receptacle.

  • Ensure that the soiled linen collected in the chambermaid’s trolley bags is sent to the laundry.

  • Remove the chambermaid’s trolley and check it for any damage and dirt accumulation.

  • Empty the vacuum cleaner bags and replace them with new ones.

  • Tidy the housekeeping department area by stacking the items at their appropriate places.

  • Clean the toilet brushes with hot water for ten minutes every week.

  • Rinse mops in light detergents and hang for drying.

  • Close the doors and hand over the keys to the housekeeping control desk.

  • Sign off the shift.

Natural Language Processing (NLP for short) is the processing of written language with a computer. The processing could be for anything – language modelling, sentiment analysis, question answering, relationship extraction, and much more. In this series, we’re going to look at methods for performing some basic and some more advanced NLP techniques on various forms of input data.

One of the most basic techniques in NLP is n-gram analysis, which is what we’ll start with in this article!

What Is n-gram Analysis

Given a very simple sentence:

The quick brown fox jumped over the lazy dog.

For some grammatical analysis, it would be very useful to split this sentence up into consecutive pairs of words:

(The, quick), (quick, brown), (brown, fox), …

In the above example, we’ve split the sentence up into consecutive tuples of words. The examples above are 2-grams, more commonly known as “bigrams”. A 1-gram is called a “unigram”, and a 3-gram is called a “trigram”. For n-grams with 4 or more members, we generally just stick to calling it a 4-gram, 5-gram, etc. Some examples are in order:

  • Unigram: (The), (quick), (brown), …
  • Bigram: (The, quick), (quick, brown), (brown, fox), …
  • Trigram: (The, quick, brown), (quick, brown, fox), (brown, fox, jumped), …
  • 4-gram: (The, quick, brown, fox), (quick, brown, fox, jumped), (brown, fox, jumped, over), …

How is this useful? Let’s say we have a huge collection of sentences (one of which is our “quick brown fox” sentence from above), and we have created a giant array of all the possible bigrams in it. We are then fed some user input that looks like this:

I really like brown quick foxes

If we also split this sentence up into bigrams, we could say with reasonable certainty that the bigram (brown, quick) is very likely grammatically incorrect because we normally encounter those two words the other way around. Applying n-gram analysis to text is a very simple and powerful technique used frequently in language modelling problems like the one we just showed, and as such is often the foundation of more advanced NLP applications (some of which we’ll explore in this series).
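As a rough illustration of that check, here is a minimal sketch. The two "known" sentences are a hypothetical stand-in for a real collection, and the names `KNOWN_SENTENCES`, `bigrams` and `unseen_bigrams` are made up for this example. It reports every bigram in the input that never appears in the known sentences.

```ruby
require 'set'

# Hypothetical stand-in for a large collection of trusted sentences.
KNOWN_SENTENCES = [
  'The quick brown fox jumped over the lazy dog',
  'I really like quick brown foxes'
].freeze

# Split a sentence into lowercase words and return its consecutive word pairs.
def bigrams(sentence)
  sentence.downcase.split(' ').each_cons(2).to_a
end

# Every bigram seen anywhere in the known sentences.
KNOWN_BIGRAMS = KNOWN_SENTENCES.flat_map { |sentence| bigrams(sentence) }.to_set

# Bigrams of the input that never occur in the known sentences.
def unseen_bigrams(sentence)
  bigrams(sentence).reject { |pair| KNOWN_BIGRAMS.include?(pair) }
end

p unseen_bigrams('I really like brown quick foxes')
# [["like", "brown"], ["brown", "quick"], ["quick", "foxes"]]
```

With so little data, perfectly valid pairs are flagged alongside (brown, quick), which is why a large corpus matters for this kind of check.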

In the examples we’ve shown so far, the “grams” part of “n-grams” could be taken to mean “word”, but that doesn’t necessarily have to be the case. For example, in DNA sequencing, “grams” could mean one character in a base-pair sequence. Take the base-pair sequence “ATCGATTGAGCTCTAGCG” – we could make bigrams from this sequence to calculate which pairs are seen most often together, allowing us to predict what might come next in the sequence. In this short sequence, “A” is followed by “T” twice and by “G” twice, so we would need more data to break the tie. If, over a longer sequence, “A” turned out to be followed most often by “G”, we could hypothesise that the next item after an “A” will be a “G”.
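The counting described for the base-pair sequence can be sketched as follows. The method name `follower_counts` is made up for this example, and `Enumerable#tally` assumes Ruby 2.7 or newer.

```ruby
# Count which characters directly follow a given character in a sequence.
def follower_counts(sequence, gram)
  sequence.chars.each_cons(2)                   # every consecutive character pair
          .select { |first, _second| first == gram }
          .map { |_first, second| second }      # keep only the follower
          .tally                                # follower => number of occurrences
end

p follower_counts('ATCGATTGAGCTCTAGCG', 'A')
# {"T"=>2, "G"=>2}
```

On a sequence this short the counts are close together, so any prediction drawn from them is only as good as the amount of data behind it.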

Choosing a Good Data Source

The one thing that the majority of n-gram analysis relies on is a data source. In the examples above, we needed that huge collection of sentences in order to assert the user-provided sentence had a mistake in it. We also needed enough data in our base-pair sequence in order to make our future DNA predictions.

What we need more generally is some good “training data”. We need some data we can trust that we can use to assert that our analysis is doing the right thing before we can take any old user data and work with it.

For tasks specific to language analysis, it turns out that building huge training data sets for n-gram analysis has been undertaken by many academic institutions. These huge collections of textual data are called “text corpora” (that’s the plural; one of these collections is called a “text corpus”). These corpora are effectively tagged sets of written text.

In 1967, a seminal collection of American English was published, which is now known as the “Brown Corpus”. This collection was compiled with the aim of creating an exhaustive record of as much American English as possible in use at the time. The result was a giant dataset of sentences which reflects, with reasonable accuracy, the frequency one might encounter certain words in normal American English sentences in 1967.

Following its initial publication, it has since had “Part-of-Speech tagging” (often abbreviated as “POS tagging”) applied (hence being called a corpus now). POS tagging is a way of tagging each and every word in the entire corpus with its role (verb, noun). Wikipedia maintains a list of the tags the Brown Corpus uses, which is exhaustive to say the least!

The Brown Corpus remains one of the most popular corpora for analysis, as it’s free for non-commercial purposes and easy to obtain. There are lots of other corpora available, including Google’s n-grams Corpus which contains upwards of 150 billion words!

Obtaining the Brown Corpus

In this first part of the series, we’re going to obtain a copy of the Brown Corpus and do some simple n-gram analysis with it. The tagged version of the Brown Corpus is available in a ZIP file from http://nltk.googlecode.com/svn/trunk/nltk_data/packages/corpora/brown.zip.

Download and extract this ZIP file so that you are left with a brown directory containing a lot of text files:

$ ll brown
total 10588
-rw-r--r-- 1 nathan nathan 20187 Dec 3 2008 ca01
-rw-r--r-- 1 nathan nathan 20357 Dec 3 2008 ca02
-rw-r--r-- 1 nathan nathan 20214 Dec 3 2008 ca03
(..)

Writing an n-grams Class

We need a way to generate a set of n-grams from a sentence. It should take a sentence, split it into chunks, and return consecutive members in groups of n long. Thankfully, this is trivial in Ruby:

The each_cons method is defined in the Enumerable module, and does exactly what we need by returning every consecutive possible set of n elements. It’s effectively a built-in method to do what we want once we have the input data into an Array format.
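A minimal illustration of `each_cons` on the example sentence, assuming we split on spaces (the helper name `consecutive_groups` is made up for this sketch):

```ruby
# Return every run of n consecutive items as an Array of Arrays.
# each_cons without a block returns an Enumerator, so to_a collects the groups.
def consecutive_groups(items, n)
  items.each_cons(n).to_a
end

words = 'The quick brown fox jumped over the lazy dog.'.split(' ')

p consecutive_groups(words, 2).first(3)
# [["The", "quick"], ["quick", "brown"], ["brown", "fox"]]
p consecutive_groups(words, 3).length
# 7
```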

Let’s wrap this up in a really simple class:

We add a constructor method that takes the target string to work on, and allow people to optionally define how we break apart the given target before generating the n-grams (It defaults to splitting it up into words, but you could change it to splitting into characters, as an example.)

We then define three helper methods for our most common n-grams, namely unigrams, bigrams and trigrams.
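The class described above might look something like the following sketch. It follows the description (a target string, an optional way of splitting it, and three helpers), but the exact option name `:regex` and the method name `ngrams` are assumptions rather than a confirmed copy of the original listing.

```ruby
class Ngram
  attr_accessor :options

  # target  - the String to break into n-grams.
  # options - :regex is the delimiter handed to String#split. The default
  #           (a single space) splits into words; an empty regex (//)
  #           splits into characters instead.
  def initialize(target, options = { regex: / / })
    @target = target
    @options = options
  end

  # Every run of n consecutive chunks of the target.
  def ngrams(n)
    @target.split(@options[:regex]).each_cons(n).to_a
  end

  def unigrams
    ngrams(1)
  end

  def bigrams
    ngrams(2)
  end

  def trigrams
    ngrams(3)
  end
end

p Ngram.new('The quick brown fox').bigrams
# [["The", "quick"], ["quick", "brown"], ["brown", "fox"]]
p Ngram.new('ATCG', { regex: // }).trigrams
# [["A", "T", "C"], ["T", "C", "G"]]
```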

Let’s see if it works:

Success! We got exactly the result we were after and wrote barely any code to do it! Now let’s use this class to do something cool with the Brown Corpus we’ve downloaded.

Extracting Sentences From the Corpus

In its raw form, the corpus we obtained contains tagged sentences. In order to apply any sort of n-gram analysis on the contents of the corpus, we need to extract the raw sentences by removing the tags and keeping only the raw words.

A basic sentence in the tagged corpus looks something like this:

The/at Fulton/np-tl County/nn-tl Grand/jj-tl Jury/nn-tl said/vbd Friday/nr an/at investigation/nn of/in Atlanta's/np$ recent/jj primary/nn election/nn produced/vbd ``/`` no/at evidence/nn ''/'' that/cs any/dti irregularities/nns took/vbd place/nn ./.

Words and tags are separated with a /. The tags describe the sort of words we are looking at, for example “noun” or a “verb”, and often the tense and the role of the word, for example “past tense” and “plural”.

Given a line, we know the first thing we need to do is get an array of all the words:

We call strip just to clean off any preceding or trailing white-space we may have, and then we split the sentence up into words by using the space character as a delimiter.

Once we have that array of words, we need to split each word and keep only the first part:

This will, on lines which contain words and tags, give us back a sentence without the tags and any trailing white-space removed.
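The two steps just described (strip and split the line, then keep the part of each token before the slash) can be sketched as one small method. The name `untag` is made up for this example, and the sketch assumes every token has the form word/tag.

```ruby
# Turn one tagged corpus line into a plain sentence.
def untag(line)
  line.strip                                    # drop leading/trailing white-space
      .split(' ')                               # one "word/tag" token per element
      .map { |token| token.split('/').first }   # keep only the word
      .join(' ')
end

p untag("\tThe/at Fulton/np-tl County/nn-tl Grand/jj-tl Jury/nn-tl said/vbd ./.\n")
# "The Fulton County Grand Jury said ."
```

Punctuation is a token of its own in the tagged text, so it comes back separated from the preceding word by a space.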

One other thing we need to take care of is blank lines, which are littered throughout the corpus. Presuming we have a path to a corpus file, here’s a complete script to turn it into an array of sentences:
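A minimal sketch of such a script, written as a method so it can be reused. The method name `sentences_from` and the example path are assumptions for illustration.

```ruby
# Read a tagged corpus file and return an Array of plain sentences.
def sentences_from(path)
  File.foreach(path).each_with_object([]) do |line, sentences|
    # Strip white-space, split into "word/tag" tokens, keep only the words.
    words = line.strip.split(' ').map { |token| token.split('/').first }
    next if words.empty? # skip the blank lines between sentences

    sentences << words.join(' ')
  end
end

# Hypothetical usage, assuming the corpus was extracted into ./brown:
#   sentences_from('brown/ca01').first
```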

One interesting method we call in here is each_with_object. This comes courtesy of the Enumerable module, and is effectively a specialized version of inject. Here’s an example without using either of those methods:

You can see we now have to manage the sentences variable ourselves. The inject method gets us one step closer:

However, we have to return acc (short for “accumulator”, a common name for the object one is building when using inject or other fold-style variants). If we don’t, the acc variable on the next loop run will become whatever the block returned on the previous run. Thus, each_with_object was born, which takes care of always returning the object you are building without you having to worry about it.

Tip: One change between inject and each_with_object is that the acc parameter is passed to the block in the opposite order. Be sure to check the documentation before using these methods to be sure you’ve got it the right way around!
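To make the comparison concrete, here is a small sketch of the same accumulation written three ways. The method names and the sample input are made up for illustration; each method collects the non-blank lines of its input.

```ruby
# 1. Plain each: we create, fill and return the Array ourselves.
def with_each(lines)
  sentences = []
  lines.each { |line| sentences << line unless line.empty? }
  sentences
end

# 2. inject: the accumulator comes first in the block parameters, and the
#    block's return value becomes the accumulator for the next iteration,
#    so acc must be the last expression.
def with_inject(lines)
  lines.inject([]) do |acc, line|
    acc << line unless line.empty?
    acc
  end
end

# 3. each_with_object: the accumulator comes second, and the same object is
#    passed to every iteration regardless of what the block returns.
def with_each_with_object(lines)
  lines.each_with_object([]) do |line, acc|
    acc << line unless line.empty?
  end
end

sample = ['first sentence', '', 'second sentence']
p with_each(sample) == with_inject(sample)              # true
p with_inject(sample) == with_each_with_object(sample)  # true
```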

Now that we have a method written to get the sentences from a single file from the corpus, let’s wrap it up in a class:

Supercharging the Corpus Class

In order to do useful analysis, we need to write one final class to allow us to do the n-gram analysis on one or many of the files in the corpus.

The class is super simple. We have a constructor that takes a glob pattern to select files in our corpus folder (the brown folder that you extracted previously), and a class to which each file that is found is passed. If you’re not familiar with glob patterns, they’re often used in the terminal to wildcard things, e.g. *.txt is a glob pattern that will find all files ending in .txt.

We then define files and sentences methods. The former uses the glob to find all matching files, loops over them, and creates a new instance of the class we passed to the constructor for each one. The latter calls the files method and loops over those instances, calling the sentences method on each and flattening the result into a single-level Array.

Now for some convenience methods that wrap our Ngram class. The ngrams method calls sentences and returns an array of n-grams for those sentences. Note that when we call flatten, we ask it to only flatten one level, otherwise we’ll lose the n-gram Array. The unigrams, bigrams and trigrams methods are just helper methods to make things look nicer.
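Putting the description together, a Corpus class along these lines might look like the sketch below. To keep the block self-contained it includes pared-down stand-ins for the Ngram and BrownCorpusFile classes. The method bodies are assumptions that follow the prose, not a confirmed copy of the original listing.

```ruby
# Pared-down stand-in for the Ngram class (words split on a single space).
class Ngram
  def initialize(target, regex = / /)
    @target = target
    @regex = regex
  end

  def ngrams(n)
    @target.split(@regex).each_cons(n).to_a
  end
end

# Pared-down stand-in for the class that reads one tagged corpus file.
class BrownCorpusFile
  def initialize(path)
    @path = path
  end

  def sentences
    File.foreach(@path).each_with_object([]) do |line, acc|
      words = line.strip.split(' ').map { |token| token.split('/').first }
      acc << words.join(' ') unless words.empty?
    end
  end
end

class Corpus
  # glob  - pattern selecting the corpus files, e.g. 'brown/c*'.
  # klass - class used to read each matching file; it must respond to
  #         new(path) and its instances to sentences.
  def initialize(glob, klass)
    @glob = glob
    @klass = klass
  end

  def files
    @files ||= Dir[@glob].map { |path| @klass.new(path) }
  end

  def sentences
    files.map(&:sentences).flatten
  end

  # flatten(1) merges the per-sentence lists but keeps each n-gram Array intact.
  def ngrams(n)
    sentences.map { |sentence| Ngram.new(sentence).ngrams(n) }.flatten(1)
  end

  def unigrams
    ngrams(1)
  end

  def bigrams
    ngrams(2)
  end

  def trigrams
    ngrams(3)
  end
end

# Hypothetical usage, assuming the corpus was extracted into ./brown:
#   corpus = Corpus.new('brown/c*', BrownCorpusFile)
#   corpus.trigrams.first
```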

Doing Some n-gram Analysis

We have a basic tool ready, so let’s give it a go and do some analysis as a test on the corpus by trying to see how many proper nouns we can extract in a naïve way. This technique is called “named-entity recognition” in NLP lingo.

We start by creating a new instance of the Corpus class we wrote, and tell it the corpus we are looking at uses the BrownCorpusFile format.

Opening the very first corpus file in a text editor (ca01), there are lots of proper nouns (including places and names) mentioned with the word “of” beforehand. An example is the sentence “Janet Jossy of North Plains […]”. Most proper nouns will likely be two words or fewer, so let’s loop over all of the sentences in trigrams, looking for anything that has the first member “of” and the second member starting with a capital letter. We’ll include the third member of the trigram in the result if it also starts with a capital letter.

The output format we’re looking for is a Hash that looks something like the following format:

Where 2 in this example is the number of times the given proper noun occurred.

Here’s the solution to this problem in full:

The example starts by defining a range which contains all the capital letters. This range is used later on to check whether the trigram members begin with a capital. We also create a results Hash, although we use Hash.new(0) to do so which sets all values to 0 by default. This allows us to increment values of each key without ever having to worry if the key was created and set before.

Then, the code starts building the results by looping over all of the trigrams in the corpus. For each trigram:

  • Check whether the first member is “of”.
  • Check whether the first character of the second member is a capital letter.
  • If both of these are true, prepare to store just the second member by creating an Array with just the second member in it.
  • If the third member also starts with a capital letter, add the third member to the Array.
  • Join the Array with a space character and store it in the results Hash, incrementing the value of that key by one.
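The steps above can be sketched as a method that works on any Array of trigrams. The method name `proper_nouns_after_of` and the sample sentence are made up for illustration; with the real corpus you would pass in the trigrams of the Corpus class instead.

```ruby
# Range of capital letters used to test the first character of a word.
CAPITALS = ('A'..'Z')

# Count capitalised words (optionally two in a row) that directly follow "of".
def proper_nouns_after_of(trigrams)
  # Hash.new(0) makes every missing key start at 0, so += 1 always works.
  trigrams.each_with_object(Hash.new(0)) do |(first, second, third), results|
    next unless first == 'of' && CAPITALS.include?(second[0])

    name = [second]
    name << third if CAPITALS.include?(third[0])
    results[name.join(' ')] += 1
  end
end

# Hypothetical sample text standing in for the corpus.
sample = 'Janet Jossy of North Plains met a friend of North Plains and of Oregon too'
trigrams = sample.split(' ').each_cons(3).to_a

p proper_nouns_after_of(trigrams)
# {"North Plains"=>2, "Oregon"=>1}
```

Sorting the resulting Hash by descending count, for example with `sort_by { |_name, count| -count }.first(10)`, gives a top-ten list.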

Eventually we end up with the results Hash populated with key/value pairs. The keys are the proper nouns we’ve found and the values are the number of times we’ve seen that proper noun in the corpus. Here’s the top 10 after running that code over the entire Brown Corpus:

Barring “Af” (which if you delve into the corpus, you would discover is the name given to a function for accelerometer scale functions in physics – how thrilling!), the rest of the results are very interesting – though what conclusions you can draw from them is entirely up to you!

Source Code

The source code for everything we wrote in this article is available in the repository on GitHub: https://github.com/nathankleyn/rubynlp. Specifically, see the file for this part of the series at https://github.com/nathankleyn/rubynlp/blob/master/examples/part_one.rb.

Next Time

In the next part of this series, we’re going to look into being more intelligent with our n-grams by exploring Markov chaining. This fascinating application of mathematical probability allows us to generate pseudo-random text by approximating a language model.