Flickr Badge

Tuesday, July 10, 2007

Generating sentences using Markov chains

So, a bunch of us - Aswin, Kausik, Moyeen, Sagaro - got together at Aswin's house on Sunday to have a python codekata. We did an intro to Python session, and we wrote a program to generate sentences using Markov chains.

I really like this program. Not only is it interesting to write and run but it makes for a very nice language intro. I'll be doing this session again as a workshop at Bangalore BarCamp 4 at the end of this month.

The final version of the program is given below. The example below uses sample text from Alice in Wonderland, downloaded from Project Gutenberg. Remember that this is the final version — In the actual kata, a number of variations were written before arriving here.
import random

def getLines(filename):
return [line[0:-1] for line in open(filename).readlines()]

def getWords(lines):
words = []
for line in lines:
return words

def createProbabilityHash(words):
numWords = len(words)
wordCount = {}
for word in words:
if wordCount.has_key(word):
wordCount[word] += 1
wordCount[word] = 1

for word in wordCount.keys():
wordCount[word] /= 1.0 * numWords
return wordCount

def getRandomWord(wordCount):
randomValue = random.random()
cumulative = 0.0
for word in wordCount:
cumulative += wordCount[word]
if cumulative > randomValue:
return word

# replace with a large text sample. Here we are using Alice in Wonderland
# from Project Gutenberg
words = getWords(getLines("alice.txt"))

wordMap = {}
previous = (words[0], words[1])
for word in words[2:]:
if wordMap.has_key(previous):
wordMap[previous] = [word]
previous = (previous[1], word)

for word in wordMap.keys():
probabilityHash = createProbabilityHash(wordMap[word])
wordMap[word] = probabilityHash

previous = ("The", "next") # The starting words
numWords = 100 # The number of words to print

print previous[0], previous[1],
for i in range(numWords):
word = getRandomWord(wordMap[previous])
print word,
if word.endswith("."):
print "\n"
previous = (previous[1], word)


Matheedubbas said...

Check my site:

I have made an implementation on SQL Server and C# that will render
webpages with markov-linked content. I would like to do the same with
MIDI. Anyone go any Ideas for an application ?



Anonymous said...

i'm new to programming and so when i put this into python it gives me an error. i have changed the txt file that it reads to one that i have and the error reads:

Traceback (most recent call last):
File "C:location", line 77, in
word = getRandomWord(wordMap[previous])
KeyError: ('works', 'properly.')