A few years ago I was playing around with a random name generator. My approach was to cobble together random letter combination like;
leading consonants + vowels + inner consonants + vowels + closing consonants
Basically, I was aiming for something pronounceable with commons letters weighted to appear more often. It produced output such as;
Votharn Eristacark Iplortidot Birtoil Udaeteahieb Aceastoherk Reloist Tharnog Wasterk Femewelav Ublyrrielic Cekird Owritothol Hoogoh Obloukajarriem Sleebont Niestart Pekev Lirtooth Efentoidagix Klyckas Yryfesat KlootonYeah ... that's really, really awful.
So I decided to give it another whack. This time I started with the premise, 'what sounds most like a name?'
Names do!
I found a couple files with over a thousand of the most common male and female first names on the
US Census Bureau's web page and started playing. I wrote a Python script that used regular expressions to slice a batch of words into three lists;
List 1 = Zero or more vowels + One or more consonants at the start of the word
List 2 = One or more vowels + One or more consonants inside the word (not at the start or the end). We can get 0 or more of these patterns depending on the word.
List 3 = One or more vowels + Zero or more consonants at the end of the word.
Side note: If you haven't dug into regular expressions yet I highly recommend you check them out. I avoided them for years and now they're an essential part of my programmer tool box. Another big plus is their utility spans multiple languages.
I also tracked the frequency of each pattern, sorting by most common first and discarding the rares. Finally, I dumped the output formatted as Python lists that I could paste right into the source of the next script.
import re
import operator
_FILENAME = 'data/elves2.txt'
_CULL = 1
## Match 0 or more vowels + 1 or more consonants at the start of the word
_LEAD = re.compile(r'^[aeiouy]*(?:qu|[bcdfghjklmnpqrstvwxz])+')
## Match 1 or more vowels + 1 or more consonants inside a word (not start/end)
_INNER = re.compile(r'\B[aeiouy]+(?:qu|[bcdfghjklmnpqrstvwxz])+\B')
# Match 1 or more vowels + 0 or more consonats at the end of a word
_TRAIL = re.compile(r'[aeiouy]+(?:qu|[bcdfghjklmnpqrstvwxz])+$')
def token_lists(names):
lead, inner, tail = {}, {}, {}
## Populate dictionaries; key=pattern, value=frequency
for name in names:
match = re.match(_LEAD, name)
if match:
pat = match.group(0)
count = lead.get(pat,0)
lead[pat] = count +1
matches = re.findall(_INNER, name)
for pat in matches:
print pat,
count = inner.get(pat,0)
inner[pat] = count +1
match = re.search(_TRAIL, name)
if match:
pat = match.group(0)
count = tail.get(pat,0)
tail[pat] = count +1
## Convert dicts to a list of tuples in the format (pattern, frequency)
lead_srt = sorted(lead.items(),key=operator.itemgetter(1),reverse=True)
inner_srt = sorted(inner.items(),key=operator.itemgetter(1),reverse=True)
tail_srt = sorted(tail.items(),key=operator.itemgetter(1),reverse=True)
## Build lists of patterns ordered most to least frequent and cull rares
lead_list = [ x[0] for x in lead_srt if x[1] > _CULL ]
inner_list = [ x[0] for x in inner_srt if x[1] > _CULL ]
tail_list = [ x[0] for x in tail_srt if x[1] > _CULL ]
return lead_list, inner_list, tail_list
if __name__ == '__main__':
names = open(_FILENAME, 'rt').readlines()
lead_list, inner_list, tail_list = token_lists(names)
print '#', len(lead_list), len(inner_list), len(tail_list)
print '_LEADS = ', lead_list
print '_INNERS = ', inner_list
print '_TAILS = ', tail_list
Next I used a script to assemble random names from these lists. Here's the one for male names:
import random
_LEADS = ['d', 'j', 'm', 'r', 'l', 'w', 'c', 'h', 'g', 'b', 'br', 't', 'k',
'n', 's', 'cl', 'fr', 'f', 'p', 'st', 'v', 'ch', 'sh', 'gr', 'tr']
_INNERS = ['er', 'ar', 'el', 'or', 'an', 'ic', 'arr', 'am', 'ol', 'on',
'al', 'en', 'ill', 'in', 'err', 'and', 'il', 'om', 'et', 'arl', 'ev',
'ac', 'ust', 'av', 'ert', 'enn', 'ent', 'ath', 'onn', 'it', 'os', 'enc',
'yr', 'em', 'ist', 'anc', 'arc', 'ich', 'as', 'est']
_TAILS = ['o', 'on', 'e', 'y', 'ey', 'er', 'in', 'an', 'ie', 'io', 'en',
'is', 'el', 'us', 'es', 'as', 'ian', 'ed']
def namegen():
syllables = random.randint(0,1)
if random.random() > .85:
syllables += 1
name = random.choice(_LEADS)
for x in range(syllables):
name += random.choice(_INNERS)
name += random.choice(_TAILS)
return name.title()
if __name__ == '__main__':
for x in range(100):
print namegen(),
Which gives output like;
Tyron Frian Grathan Charcan Tren Garrian Stasas Cholian Dites Ris Tie Non Lin Lon Wian Wio Ramas Lo Larus Pan Cio Wasie Mely Chie Levey Bustel Lillen Ben Ponny Panian Nyrus Bre Raso Stel Fasis Konandas Starrin Hian Starlio Rey Jathen Frander Ven Talamio Samin Ches Handey Sterian Nencin Sathon Ponel Citel Momen Wus Rencey Dence Rencan Bevon Jo Bre Tis Non Heved Ne Shestas Lales Ferrus Werrus Foso Standie Ver Gis Giner Ver Nin Res Veled Clon Geler Mustin Gio Bry Juste Shanan Cled Trely Chon Trin Binis Jinen Jacy Ralio Bo Metel De Sian Brie Fanan Donen JedSeeding with female names, I get;
Alianne Cler Cishel Kroller Namie Dah Kyn Niton Hie Trosancel Cher Na Angy Sestin Jah Dathe Pin Traton Teris Dishon Goren Sheny Frindis Chalin Frie Lolleen Ware Chrena Hy Veannon Dyn Padica Sher Chie Kon Brollis Cisi Gy Chey Kreni Dulani Gancis Shancey Satheen Chreanny Shy Clin Cessia Kriner Shestie Hancah Ner Trorer Ger Tren Ramerrer Freanne Kaurah Frolah Kritah Harley Kisten Eleen Trellyn Chiannel Trosi Alisin Genery Cla Mie Jate Shosacey De Angoron Kis Gianon Mandite Chryn Wer Shadalie Polon Gita Ner Fron Va Chrabica Vatia Clia Tacel Rareen Treen Angia Henen Angin Poreen Eleannel Ken Brer Branca Hesta