Recently, this little comic from the Oatmeal, titled “The Terrible & Wonderful Reasons why I run Long Distance” popped up on several friends’ newsfeeds.

It’s a very good piece, and if you are a runner yourself, or maybe a biker, or maybe a swimmer, or maybe just a person, or maybe even a cat of the internet, you should check it out. With the simplest of drawings, it captures very well the essence of an amateur runner.

I myself don’t run in the way that the narrator of this comic runs. I run when it’s nice outside, I run when I’m frustrated, I run when I’m angry, and I run when I’m lost. But I run very rarely, and never do I run fast enough, or long enough, to reach euphoria. My running is always solo, always contemplative, and never has it silenced the noise in my head. Very often, it smooths tangled knots out– if only by virtue of making me feel healthy and sunkissed again.

Recently, I started dancing again. I’ve been dancing since I was five. When I was three, I watched my older sister’s ballet performance and begged my mom to be onstage, too. I love to move, and I love to dance, and in a way dance is a healing process– after 10 years of competitive gymnastics, I love the feeling of being onstage without having my every step marked by point deductions.

In a moment of sheer luck, I’ve discovered a rare gem: the Manhattan dance studio that is relatively affordable, air-conditioned, spacious, and most elusive of all, not filled with judgemental, competitive, rail-thin preprofessional teenagers waiting to eat me up.

The dancers at this studio are very talented. The instructor is friendly and experienced. But best of all, this class– twice a week, an hour-and-a-half each– whips my ass. And it also whips my abs, and my arms, and my calves, and my thighs, and most importantly, my brain.

When I enter the studio directly after work, there are a million thoughts running through my head. Is it bad that I left work before 7? Did I slack off too much today? Will I finish my summer project in time? Do I need a Ph.D.? Am I ever going to be a good scientist? Is the world driven by chance, or does it have a telos? Is it possible to be both intellectually conscious and happy?

But twenty minutes into class, I’m struggling just to breathe, as this killer warm-up asks me to do thirty more sit-ups, forty more crunches. Bass beats rivet off the ceiling, and sweat drips off my nose.

“I’ve always considered the question to be ‘Why am I alive? Why am I here? What’s the point of me?’ And to that I say: WHO CARES! FORGET THE WHY. YOU ARE IN A RAGING FOREST FULL OF BEAUTY AND AGONY… THIS IS BETTER THAN THE WHY. I run because I seek that clarity,” says the little stick figure in the Oatmeal comic.

There is only one reason why I dance. I want, I crave, I need a better way to express myself. (Sometimes, I wish I could sing louder, clearer, just so I could belt it out like Christina Aguilera in Burlesque at the end of a long work day in my little-town-accidentally-sexy waitress outfit. And then become a professional performer with Cher.)

I’m not a great dancer, by any means. Sure, I can bust a few moves at a party. I’m relatively in shape, I’ve been dancing a long time, and most of all I Love it With All My Heart. I’m a slow learner, I’m a little off-beat, but by show time I’m giving it my all.

Sometimes, dancing makes me feel very small and adolescent again, while I watch the slim, beautiful girls at Barnard and Columbia in my dance group rule the stage with their wonderful years of ballet training and their illuminating stage presence. And when that happens, the voice of my fourteen-year-old self tells me again that I’m not thin enough, not blonde enough, and just so damn awkward.

But goddammit, sometimes I like that. Sometimes it’s good to worry so much about things that my Computer Science classmates would find trivial, to feel small and nerdy and inadequate again, to feel the ruthless female competition of beauty and grace. Sometimes it’s good to want so much to be good at something that isn’t just a desk job.

It’s a struggle, and so much work, just to be able to express one-tenth of what I want to say in my movement. But in that moment when I’m finally on the beat, and not stumbling over my own feet, I never feel more alive.


· · · ◊ ◊ ◊ · · ·

One should usually not take advice on modeling from a 5-foot-tall nerdy Asian girl, that is, unless it’s Data Modeling we’re talking about. (Whether or not you should take my advice then is up to your own discretion.) This summer, I’m interning at an education technology company, Knewton, where I have the great opportunity to model student behavior with real data. I’m learning a ton about what it takes to be a Data Scientist, although what concerns me more is the Scientist part, since the term is a bit of a buzzword anyway. Along the way, I figured I’d share some tidbits of knowledge with you. This post is specifically targeted towards non-technical people: my goal is to explain things clearly enough that anyone with a healthy curiosity can follow along. I will focus on examples and fun demos, since that’s how I learn best. Feedback appreciated!

WHAT IS A MARKOV CHAIN?
A Markov Chain, named after this great moustached man, is a type of mathematical system composed of a finite number of discrete states and the transitions between those states, each labeled with a transition probability. The most important thing about a Markov Chain is that it satisfies the Markov Property: each state depends only on the state directly preceding it* and no others. This independence assumption makes a Markov Chain easy to manipulate mathematically. (*This is a Markov Chain of degree 1, but you could also have a Markov Chain of degree n, where we look only at the past n states.) A Markov Chain is a specific kind of Markov Process with discrete states.

A VISUAL
That’s a lot of words for a concept that is in fact very simple. Here’s a picturesque example instead:

Imagine that you are a small frog in a pond of lily pads. The pond is big but there are a countable (discrete) number of lily pads (states). You start on one lily pad (start state) and jump to the next with a certain probability (transition probability). When you’re on one lily pad, you only think of the next one to jump to, and you don’t really care about what lily pads you’ve jumped on in the past (memoryless).
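
If it helps to see the pond in code, here is a minimal sketch (a toy example of my own, not part of the hackday project): the pond is a Python dictionary mapping each lily pad to the pads reachable from it, with their transition probabilities, and the frog chooses its next pad using only the pad it is currently sitting on.

import random

# Each lily pad (state) maps to the pads the frog can jump to next, along
# with transition probabilities. The pads and numbers are made up, but note
# that the probabilities out of each pad sum to 1.
pond = {
    "pad_A": {"pad_B": 0.5, "pad_C": 0.5},
    "pad_B": {"pad_A": 0.25, "pad_C": 0.75},
    "pad_C": {"pad_A": 1.0},
}

def hop(current_pad):
    # Memoryless: the choice depends only on the current pad, never on the path so far.
    next_pads = list(pond[current_pad])
    probabilities = list(pond[current_pad].values())
    return random.choices(next_pads, weights=probabilities)[0]

# Watch the frog take ten hops from its start state.
pad = "pad_A"
path = [pad]
for _ in range(10):
    pad = hop(pad)
    path.append(pad)
print(" -> ".join(path))

Nothing in hop() remembers where the frog has been, which is exactly the Markov Property from the definition above.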

That’s all!

WHY DO WE USE IT?
Markov Chains have many, many applications. (Check out this Wikipedia page for a long list.) They’re useful whenever we have a chain of events, or a discrete set of possible states. A good example is a time series: at time 1, perhaps student S answers question A; at time 2, student S answers question B, and so on.

A RANDOM TEXT GENERATOR
Now, for the fun part!

For Knewton’s company hackday, I’ve built a text analysis “funkit” that can perform a variety of fun things, given an input text file (a corpus). You can clone the source code here. Don’t worry if the word “cloning” sounds very sci-fi; the README that I’ve written (residing at that link) has detailed instructions on how to use the code. As long as you have Python installed on your computer (Macs come with it pre-installed), you should be fine and dandy.

What we’re most interested in is the parrot() function. This is the “Markov Chain Babbler”, or Random Text Generator, that mimics an input text. (Markov Chain Babblers are used to generate Lorem Ipsums, i.e. text fillers, such as this wonderful Samuel L. Ipsum example.)

Included are a few of my favorite sample “corpuses” (a scary word for sample texts), taken from Project Gutenberg:

“memshl.txt”, the complete Memoirs of Sherlock Holmes
“kerouac.txt”, an excerpt from On the Road
“aurelius.txt”, Marcus Aurelius’ Meditations
and finally, “nietzsche.txt”, Nietzsche’s Beyond Good and Evil.

Here’s a prime snippet of length 100, generated using the Nietzsche corpus (one of my favorites):

“CONTEMPT. The moral physiologists. Do not find it broadens and a RIGHT OF RANK, says with other work is much further than a fog, so thinks every sense of its surface or good taste! For my own arts of morals in the influence of life at the weakening and distribution of disguise is himself has been enjoyed by way THERETO is thereby. The very narrow, let others, especially among things generally acknowledged to Me?.. Most people is a philosophy depended nevertheless a living crystallizations as well as perhaps in”

Despite being “nonsense”, it captures the essence of the German philosopher quite well. If you squint a little, it doesn’t take much imagination to see this arise from the mouth of Nietzsche himself.

Here’s some Kerouac text, too:

“Flat on a lot of becoming a wonderful night. I knew I wrote a young fellow in the next door, he comes in Frisco. That’s rights. A western plateau, deep one and almost agreed to Denver whatever, look at exactly what he followed me at the sleeping. He woke up its bad effects, cooked, a cousin of its proud tradition. Well, strangest moment; into the night, grand, get that he was sad ride with a brunette. You reckon if I bought my big smile.”

HOW IT WORKS
Parakeet generates text using a simple degree-1 Markov Chain, just like the one we described above. Let’s break it down:

1. We read the input file and “tokenize” it– in other words, we break it up into words and punctuation.
2. Now, for each word in the text, we store every possible next word that follows it. We do this using a Python dictionary, a.k.a. a hash table.

For example, if we have the following sentence,
“the only people for me are the mad ones, the ones who are mad to live, mad to talk, mad to be saved, desirous of everything at the same time”

We have the following dependencies:

{',': ['the', 'mad', 'mad', 'desirous'],
'are': ['the', 'mad'],
'at': ['the'],
'be': ['saved'],
'desirous': ['of'],
'everything': ['at'],
'for': ['me'],
'live': [','],
'mad': ['ones', 'to', 'to', 'to'],
'me': ['are'],
'of': ['everything'],
'ones': [',', 'who'],
'only': ['people'],
'people': ['for'],
'same': ['time'],
'saved': [','],
'talk': [','],
'the': ['only', 'mad', 'ones', 'same'],
'to': ['live', 'talk', 'be'],
'who': ['are']}

Note that in this naive (not particularly space-efficient) implementation of the text generator, when a word has duplicate next words, for example 'mad': ['ones', 'to', 'to', 'to'], we store the next word once for each occurrence.

3. Now the fun part. Say we want to generate a paragraph of 100 words. First, we randomly choose a start word (a capitalized word to open the paragraph). Then we randomly choose a next word from its list of next words (since frequent next words appear many times in the list, they will be chosen more often), and from that word we continue the process until we reach a paragraph of length 100. A sketch of all three steps in code follows below.
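
To make the three steps concrete, here is a compact sketch in Python. To be clear, this is my own simplified reconstruction of the approach described above, not the actual parrot() code from the funkit; the names build_chain and babble (and the regular expression used to tokenize) are just illustrative choices.

import random
import re

def build_chain(text):
    # Steps 1 and 2: tokenize into words and punctuation, then map each
    # token to every token that ever follows it (duplicates are kept).
    tokens = re.findall(r"[\w']+|[.,!?;]", text)
    chain = {}
    for current_word, next_word in zip(tokens, tokens[1:]):
        chain.setdefault(current_word, []).append(next_word)
    return chain

def babble(chain, length=100):
    # Step 3: start from a random capitalized word, then repeatedly pick a
    # random successor of the current word. Because duplicates were kept,
    # frequent successors are chosen proportionally more often.
    start_words = [w for w in chain if w[:1].isupper()] or list(chain)
    word = random.choice(start_words)
    words = [word]
    while len(words) < length:
        successors = chain.get(word)
        if not successors:          # dead end: restart from a fresh start word
            word = random.choice(start_words)
        else:
            word = random.choice(successors)
        words.append(word)
    return " ".join(words)

kerouac = ("the only people for me are the mad ones, the ones who are mad "
           "to live, mad to talk, mad to be saved, desirous of everything "
           "at the same time")
print(babble(build_chain(kerouac), length=20))

Keeping duplicate successors in plain lists, rather than storing counts, is what lets a uniform random.choice behave like a weighted choice, which is exactly the trade-off noted in step 2 above.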

WHY IS THIS A MARKOV PROCESS?
Well, when we build up our paragraphs, we choose our next word based only on the options generated by our current word. In doing so, we ignore the history of the words we have already chosen (which is why many of the sentences are nonsensical), yet since each choice of next word is sensible given the current one, we end up with something that emulates the original writing style (chaining).

SOURCE CODE:
Check out my code on GitHub, located here. Simply fire up your terminal, type “git clone the-url”, and it will copy the repo into a directory on your local machine. Further instructions are in the README.


· · · ◊ ◊ ◊ · · ·