Keep reading to find out more about the name generator, or just start generating names.
What I mean by this is simply that the program generates random output patterns based on the statistical frequency with which the patterns occurred in some input sample.
Many random name generators I've seen don't generate new names at all (they just select a random name from a long list of predetermined names). So while the names may look random to the untrained human, they aren't.
Most of those which actually do generate new names operate by selecting random combinations of predetermined pieces, usually letters or syllables. This can be done well, but has a number of potential pitfalls. If the letters are just combined randomly, the resulting names may be unpronounceable. Care can be taken to alternate between vowels and consonants, but this limits names to a rather restrictive pattern. Using syllables instead solves some of these problems, but requires the programmer to decide which syllables to allow. And even so, not all combinations of syllables may be idiomatic. On top of that, if the programmer ever wants to generate names in a different language, they will need to reprogram the generator for that language.
The basic idea is that my program reads through a list of input names and counts how many times each letter follows each other letter. So given the names James, John and Jacob, the combination Ja occurs twice, the combination Jo occurs once, and so on. The generator then picks letters with the same frequencies: given that the first letter is a J, the second letter in the name should be an a 2/3rds of the time, and an o 1/3rd of the time.
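The counting step can be sketched in a few lines of Python. This is only an illustration of the idea, not the program's actual code; the function names `count_followers` and `follower_probabilities` are made up for the example.

```python
from collections import Counter, defaultdict

def count_followers(names):
    """Count how often each letter is followed by each other letter."""
    counts = defaultdict(Counter)
    for name in names:
        # Pair every letter with the letter that comes right after it.
        for current, following in zip(name, name[1:]):
            counts[current][following] += 1
    return counts

def follower_probabilities(counts, letter):
    """Turn the raw counts for one letter into relative frequencies."""
    total = sum(counts[letter].values())
    return {nxt: n / total for nxt, n in counts[letter].items()}

counts = count_followers(["James", "John", "Jacob"])
print(dict(counts["J"]))  # {'a': 2, 'o': 1}
print(follower_probabilities(counts, "J"))
```

For the three sample names, an a follows J twice and an o follows J once, which is where the 2/3 and 1/3 figures come from.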
Unfortunately, this simple method frequently generates names which are unpronounceable. My algorithm fixes this by taking into account more than one preceding letter. So if I use two letters of context in the preceding example, the combinations Jam, Joh and Jac each occur once.
Taking into account multiple letters at a time helps prevent unpronounceable names from occurring, and also helps guide the transitions between syllables so the names sound more idiomatic as well. The danger in increasing the context length too much is that real names will start occurring in the output. I have found that 4 characters of context seems to be just about right for generating surnames.
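Here is a minimal sketch of a generator that uses several letters of context, written in Python. It is an assumption about how such a generator could look, not the original program: the `^` and `$` boundary markers, the `max_length` cap, and the names `build_model` and `generate` are all invented for the example, and `order` is assumed to be at least 1.

```python
import random
from collections import Counter, defaultdict

START, END = "^", "$"  # hypothetical boundary markers, assumed absent from names

def build_model(names, order):
    """Map each `order`-letter context to a Counter of the letters that follow it."""
    model = defaultdict(Counter)
    for name in names:
        # Pad the front so the first letters also have a full-length context,
        # and mark the end so the model learns where names stop.
        padded = START * order + name + END
        for i in range(len(padded) - order):
            model[padded[i:i + order]][padded[i + order]] += 1
    return model

def generate(model, order, rng=random, max_length=20):
    """Emit one name by repeatedly sampling the next letter given its context."""
    context, letters = START * order, []
    while len(letters) < max_length:  # arbitrary safety cap (an assumption)
        followers = model[context]
        nxt = rng.choices(list(followers), weights=list(followers.values()))[0]
        if nxt == END:
            break
        letters.append(nxt)
        context = (context + nxt)[-order:]  # slide the context window forward
    return "".join(letters)

names = ["James", "John", "Jacob"]
model = build_model(names, order=2)
print(dict(model["Ja"]))  # {'m': 1, 'c': 1}
print(generate(model, order=2))
```

With only three input names, every context leads to a single continuation almost immediately, so this toy model can only reproduce James, John or Jacob. That is the same effect described above: when the context is long relative to the variety in the input, real names come back out.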
The input names come from the US Census Bureau genealogy data, which provides a list of about 89,000 last names in the US.
Yes, it is statistically possible for the name generator to produce a real name. Ignore these if you like, although in my experience, the real names it generates are some of the funniest (e.g. Nunmaker).
Again, I do not filter to avoid printing certain words. So if my name generator prints anything inflammatory, please ignore it.