In Which a Project was Undertaken, Part 1
With two Coursera specializations under my belt (which I detailed in In Which Instructions were Sought, Part 1 (link), Part 2 (link), and Part 3 (link)), it's time to put theory into practice and make something new.
What's the big idea?
*Image: Go Big or Go Home? (YouTube link)*
I have been trying to come up with interesting projects for a while, and here is a list of some of the ideas that I think might be worth exploring:
______________________________________________________
Predicting the origin of a name:
In today's global society, you often get the chance to meet people from faraway places. Sometimes you hear a name and can't place where the person is from. For example, Chinese and Korean names typically have three syllables, but to non-Koreans and non-Chinese it might be difficult to tell them apart. A neural network can be trained to categorize a name, in essence a classification task over all the different origins. The difficulty is collecting these names written in English, since the neural network needs to be trained to tell the names apart, not just recognize the language encoding. I limited the scope to first names and collected over 13,000 from various websites (baby name sites were especially helpful), but spread across more than 50 origins, that amounts to only about 260 names per origin. And considering how varied names can be within one country / culture / people, I suspect that much more data is needed.
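To give a rough sketch of what this classification task looks like, here is a minimal non-neural baseline: a naive Bayes classifier over character bigrams. All names and origin labels below are invented placeholders for illustration, not entries from the actual dataset.

```python
from collections import Counter, defaultdict
import math

def char_ngrams(name, n=2):
    """Split a lowercase name into overlapping character n-grams."""
    name = f"^{name.lower()}$"  # mark word boundaries
    return [name[i:i + n] for i in range(len(name) - n + 1)]

class NaiveBayesNameClassifier:
    """Tiny multinomial naive Bayes over character bigrams."""

    def __init__(self):
        self.ngram_counts = defaultdict(Counter)  # origin -> bigram counts
        self.origin_totals = Counter()            # origin -> total bigram count
        self.vocab = set()

    def fit(self, names, origins):
        for name, origin in zip(names, origins):
            grams = char_ngrams(name)
            self.ngram_counts[origin].update(grams)
            self.origin_totals[origin] += len(grams)
            self.vocab.update(grams)

    def predict(self, name):
        best, best_score = None, -math.inf
        for origin in self.ngram_counts:
            # sum of log-probabilities with add-one smoothing
            score = 0.0
            for g in char_ngrams(name):
                num = self.ngram_counts[origin][g] + 1
                den = self.origin_totals[origin] + len(self.vocab)
                score += math.log(num / den)
            if score > best_score:
                best, best_score = origin, score
        return best

# Toy data: illustrative placeholders, not the real collected names.
names = ["minjun", "jiwoo", "seojun", "giulia", "alessandro", "francesca"]
origins = ["korean", "korean", "korean", "italian", "italian", "italian"]

clf = NaiveBayesNameClassifier()
clf.fit(names, origins)
print(clf.predict("hyunwoo"))   # "korean" on this toy data
print(clf.predict("giovanni"))  # "italian" on this toy data
```

Even this crude model picks up on which letter pairs are common in which origin, which is exactly why a real classifier needs many examples per origin: with only ~260 names each, most bigram statistics would be noise.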
______________________________________________________
Generate nonsense speech for background noise:
Some people prefer to have some background noise when they are working on something. It can be music or a television show playing in the background, or the sound of a bustling cafe with conversation all around. I often find myself listening to music as I type, and I prefer music in a language whose lyrics I can't understand, so that I don't end up paying attention to what the singer is saying. I suspected the same would hold for conversations: they work much better as background noise if the words being spoken are not something you can understand. Therefore, I thought of using a neural network to generate a string of sounds that plausibly could be language, but actually isn't. The complexity does not seem very high, since the network has wide latitude in what counts as acceptable output. However, I am not certain that a neural network is necessary or the best way to go about this, so more research is needed.
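Since a neural network may not even be necessary here, a minimal non-neural sketch shows how far simple random assembly gets: build nonsense words from an onset + vowel + coda syllable template. The sound inventories below are assumptions standing in for real phonotactic data, not drawn from any particular language.

```python
import random

# Hypothetical sound inventory; a real system would derive these from the
# phonotactics of actual languages so the output sounds speech-like.
ONSETS = ["b", "d", "k", "l", "m", "n", "r", "s", "t", "v", ""]
VOWELS = ["a", "e", "i", "o", "u", "ai", "ou"]
CODAS  = ["", "", "n", "s", "l"]  # empty codas repeated to favor open syllables

def make_word(rng, min_syll=1, max_syll=4):
    """Assemble a pronounceable-looking nonsense word from random syllables."""
    n = rng.randint(min_syll, max_syll)
    return "".join(rng.choice(ONSETS) + rng.choice(VOWELS) + rng.choice(CODAS)
                   for _ in range(n))

def make_utterance(rng, n_words=8):
    """String nonsense words into a sentence-shaped utterance."""
    words = [make_word(rng) for _ in range(n_words)]
    return " ".join(words).capitalize() + "."

rng = random.Random(42)  # fixed seed so the sketch is reproducible
print(make_utterance(rng))
```

The open question the paragraph raises stands: a learned model might produce more convincing prosody and word-length statistics than this uniform sampling, but the bar for "plausible background murmur" may be low enough that it doesn't matter.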
______________________________________________________
Figure out text from pen audio:
As if straight out of a spy novel, the idea is to figure out what is being written by capturing the sound of pen on paper. I'm sure the intelligence community has thought of something like this already, but there's nothing out in the public yet. The main hurdle is the lack of available data. Collecting enough for this project would require a large amount of time, effort, and likely money, and although there are applications besides spycraft, such as automatic transcription, they are few and don't seem very viable business-wise. It would remain mostly an expensive toy project, making this idea dead in the water.
______________________________________________________
Determine most attractive face for personalized video:
Nowadays there are various ways to measure an audience's reaction to visual entertainment. Electroencephalography (EEG, or brain wave measurement), pupil dilation tracking, and heart rate monitors have been used in cases such as gauging user response in VR games, to keep the user engaged and highly excited. This technology can be paired with the recent ability of Generative Adversarial Networks to generate photorealistic faces of people who do not exist (link), so that a large number of faces can be shown to a user while automatic measurements are taken of which kinds of faces elicit the largest emotional response. Finally, this data can be used to generate virtual performers or, through deep fake technology, alter the faces of real performers to match the facial features determined through the measurements. The idea has applications in entertainment (adjusting movies and TV shows so that the performers look more impactful in their roles) and advertisement (as suggested by a friend... a scary thought, but one that does have market potential). This is a fairly massive project, so it'll be shelved for now.

Walking Before Running
*Image: You better run~~ run~~ run~~ run~~ run~~ (YouTube link)*
Before I embark on one of these large projects, I wanted to do a quick small one whose main purpose is to get hands-on experience with the parameter selection and hyperparameter tuning process, rather than to make something amazing. Getting the fundamentals down is important so that on more grandiose projects, less time can be spent on the minutiae and more on the big concepts. So I've settled on a lightweight first project: a script generator. I will set up a neural network, train it on the scripts of a show, and have it spew out lines similar to what the characters would say in the show.
I wanted to use the script from something iconic, something that many people would recognize, and yet have a style that is unique enough that it's not just regular everyday conversation... or weird everyday conversation (see neural network generated Friends script: link). So I chose Star Trek: The Next Generation (see my next blog, In Which a Project was Undertaken, Part 2: link). Let the technobabble flow!
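The real project will use a neural network, but as a minimal non-neural illustration of the "train on a script, emit similar lines" idea, here is a character-level Markov chain sketch. The training corpus below is an invented placeholder, not actual TNG dialogue.

```python
import random
from collections import defaultdict

def build_model(text, order=3):
    """Map each length-`order` context to the characters observed after it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model

def generate(model, seed_text, length=80, rng=None):
    """Extend seed_text one character at a time by sampling the model."""
    rng = rng or random.Random(0)
    order = len(next(iter(model)))  # all contexts share the same length
    out = seed_text
    for _ in range(length):
        choices = model.get(out[-order:])
        if not choices:
            break  # context never seen during training
        out += rng.choice(choices)
    return out

# Placeholder training text standing in for real episode scripts.
corpus = ("PICARD: Engage. " * 5
          + "DATA: Captain, sensors detect an anomaly. " * 5
          + "RIKER: Red alert. " * 5)

model = build_model(corpus, order=3)
print(generate(model, "PIC", length=60))
```

A neural network replaces the frequency table with a learned function, which lets it generalize to contexts it never saw verbatim; the Markov version can only replay observed continuations, which is exactly the limitation the real project is meant to move past.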
*Image: Star Trek: The Next Generation (Wikipedia link, IMDB link)*