artist/designer Ishac Bertran has this really cool project called code {poems}: a compilation (ha!) of compilable poems in code.

Inspired by it, I began to dabble in a bit of my own code poetry this afternoon.

theHollowMen.py
"""
T.S. Eliot, 1925
cat, 2014.06.27
Draft1
"""

class World:
    def __str__(self):
        return "this is the way the world ends"

    def ends(self):
        return ["whimper"]

def main():
    world = World()

    for i in range(4):
        print world

    if "bang" not in world.ends():
        print "not with a bang"

    if "whimper" in world.ends():
        print "but with a whimper"

if __name__ == "__main__":
    main()

to run, download source and run python theHollowMen.py.
repo here.
more to come.

· · · ◊ ◊ ◊ · · ·

happy day

13 Mar 2014

Three years ago, I learned how to print Hello World in Java for the first time.
That semester, I stayed up many late nights crying because I was so frustrated with how hard it was for me to fix even the tiniest of bugs. Everyone in class seemed light-years above me.

Today, I have been accepted to MIT Media Lab’s MAS program, and I’ll be joining Deb Roy’s Cognitive Machines lab this fall. It’s truly a nerd dream come true.

I think if my mother taught me one thing,
it is that
it is not how successful you are
or how wealthy you are
or even how hard you work that matters.
what matters is how interesting you are
because that is your human value.

And mom, if you’re reading this, don’t read too hard into it.


· · · ◊ ◊ ◊ · · ·

Hey nerds!
Check out this cool model my friend Andy and I developed at Knewton last summer!

 


· · · ◊ ◊ ◊ · · ·


This semester I have the pleasure of being involved with a pretty cool research project on Government secrecy at Columbia University.

The Declassification Engine, which bills itself as “Computational Analysis of Official Secrecy”, is a joint project between the History, Statistics, and Computer Science (specifically Natural Language Processing) departments at CU (among a slew of other things) to provide tools and better analysis of just what the government has and has not (and will and will not) been withholding from the public over the years.

I myself found a research position on this project by following my favorite TA in my favorite class on Natural Language Processing into his research life, which is how one often finds interesting things.

The project is still quite young and not yet rigidly defined, so it’s fun to be involved early.

I’ll be working on image processing and language processing, among other things I really enjoy doing.

Needless to say, I’m a huge proponent of free speech, free press, and transparency, and I’m quite excited. Plus, it never hurts to feel like a real badass hacker, ripping through hundreds of thousands of censored federal papers in the terminal.

Check out a recent interview about the Declassification Engine on NPR!


· · · ◊ ◊ ◊ · · ·

One should usually not take advice on modeling from a 5-foot-tall nerdy Asian girl, that is, unless it’s Data Modeling we’re talking about. (Whether or not you should take my advice then is up to your own discretion.) This summer, I’m interning at an education technology company, Knewton, where I have the great opportunity to model student behavior with real data. I’m learning a ton about what it takes to be a Data Scientist, although what concerns me more is the Scientist part, since the term is a bit of a buzzword anyway. Along the way, I figured I’d share some tidbits of knowledge with you. This post is specifically targeted at non-technical people: my goal is to explain things so clearly that anyone with a healthy curiosity can follow along. I will focus on examples and fun demos, since that’s how I learn best. Feedback appreciated!

WHAT IS A MARKOV CHAIN?
A Markov chain, named after this great moustached man, is a type of mathematical system composed of a finite number of discrete states and transitions between these states, denoted by their transition probabilities. The most important thing about a Markov Chain is that it satisfies the Markov Property: each state depends only on the state directly preceding it* and no others. This independence assumption makes a Markov Chain easy to manipulate mathematically. (*This is a Markov Chain of degree 1; you could also have a Markov Chain of degree n, where we look at only the past n states.) A Markov Chain is a specific kind of Markov Process, one with discrete states.

A VISUAL
That’s a lot of words for a concept that is in fact very simple. Here’s a picturesque example instead:

Imagine that you are a small frog in a pond of lily pads. The pond is big but there are a countable (discrete) number of lily pads (states). You start on one lily pad (start state) and jump to the next with a certain probability (transition probability). When you’re on one lily pad, you only think of the next one to jump to, and you don’t really care about what lily pads you’ve jumped on in the past (memoryless).
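
If it helps, here is the pond written out as a minimal Python sketch. The pad names and jump probabilities are made up purely for illustration (they don’t come from any real data); the point is just that each hop looks only at the current pad.

import random

# a tiny, made-up pond: each lily pad maps to (next pad, jump probability)
# pairs, and the probabilities leaving each pad sum to 1
transitions = {
    "pad_A": [("pad_A", 0.2), ("pad_B", 0.5), ("pad_C", 0.3)],
    "pad_B": [("pad_A", 0.4), ("pad_B", 0.1), ("pad_C", 0.5)],
    "pad_C": [("pad_A", 0.6), ("pad_B", 0.3), ("pad_C", 0.1)],
}

def hop(current):
    # pick the next pad using only the current pad -- the Markov property
    r = random.random()
    total = 0.0
    for next_pad, prob in transitions[current]:
        total += prob
        if r < total:
            return next_pad
    return transitions[current][-1][0]  # guard against floating-point rounding

# simulate ten hops starting from pad_A
state = "pad_A"
path = [state]
for _ in range(10):
    state = hop(state)
    path.append(state)
print(" -> ".join(path))

Run it a few times and you’ll get a different hop sequence each time; every one of those sequences is a realization of a Markov chain.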

That’s all!

WHY DO WE USE IT?
Markov Chains have many, many applications. (Check out this Wikipedia page for a long list.) They’re useful whenever we have a chain of events, or a discrete set of possible states. A good example is a time series: at time 1, perhaps student S answers question A; at time 2, student S answers question B, and so on.

A RANDOM TEXT GENERATOR
Now, for the fun part!

For Knewton’s company hackday, I built a text analysis “funkit” that can perform a variety of fun things given an input text file (a corpus). You can clone the source code here. Don’t worry if the word “cloning” sounds very sci-fi; the README I’ve written (residing at that link) has detailed instructions on how to use the code. As long as you have Python installed on your computer (Macs come with it pre-installed), you should be fine and dandy.

What we’re most interested in is the parrot() function. This is the “Markov Chain Babbler”, or Random Text Generator, that mimics an input text. (Markov Chain Babblers are used to generate Lorem Ipsums (text fillers) such as this wonderful Samuel L. Ipsum example.)

Included are a few of my favorite sample “corpuses” (a scary word for sample texts), taken from Project Gutenberg:

“memshl.txt”, the complete Memoirs of Sherlock Holmes;
“kerouac.txt”, an excerpt from On the Road;
“aurelius.txt”, Marcus Aurelius’ Meditations;
and finally, “nietzsche.txt”, Nietzsche’s Beyond Good and Evil.

Here’s a prime snippet of text generated from the Nietzsche corpus, 100 words long, and one of my favorites:

“CONTEMPT. The moral physiologists. Do not find it broadens and a RIGHT OF RANK, says with other work is much further than a fog, so thinks every sense of its surface or good taste! For my own arts of morals in the influence of life at the weakening and distribution of disguise is himself has been enjoyed by way THERETO is thereby. The very narrow, let others, especially among things generally acknowledged to Me?.. Most people is a philosophy depended nevertheless a living crystallizations as well as perhaps in”

Despite being “nonsense”, it captures the essence of the German philosopher quite well. If you squint a little, it doesn’t take much imagination to see this arise from the mouth of Nietzsche himself.

Here’s some Kerouac text, too:

“Flat on a lot of becoming a wonderful night. I knew I wrote a young fellow in the next door, he comes in Frisco. That’s rights. A western plateau, deep one and almost agreed to Denver whatever, look at exactly what he followed me at the sleeping. He woke up its bad effects, cooked, a cousin of its proud tradition. Well, strangest moment; into the night, grand, get that he was sad ride with a brunette. You reckon if I bought my big smile.”

HOW IT WORKS
Parakeet generates text using a simple degree-1 (level-1) Markov Chain, just like the one we described above. Let’s break it down:

1. We read the input file and “tokenize” it: in other words, we break it up into words and punctuation.
2. Now, for each word in the text, we store every possible next word that follows it. We do this using a Python dictionary, aka a hash table. (All three steps are sketched in code after step 3 below.)

For example, if we have the following sentence,
“the only people for me are the mad ones, the ones who are mad to live, mad to talk, mad to be saved, desirous of everything at the same time”

We have the following dependencies:

{',': ['the', 'mad', 'mad', 'desirous'],
 'are': ['the', 'mad'],
 'at': ['the'],
 'be': ['saved'],
 'desirous': ['of'],
 'everything': ['at'],
 'for': ['me'],
 'live': [','],
 'mad': ['ones', 'to', 'to', 'to'],
 'me': ['are'],
 'of': ['everything'],
 'ones': [',', 'who'],
 'only': ['people'],
 'people': ['for'],
 'same': ['time'],
 'saved': [','],
 'talk': [','],
 'the': ['only', 'mad', 'ones', 'same'],
 'to': ['live', 'talk', 'be'],
 'who': ['are']}

Note that in this naive (not space-smart) implementation of the text generator, when there are duplicate occurrences of next words, for example
'mad': ['ones', 'to', 'to', 'to'], we store one entry per occurrence.

3. Now the fun part. Say we want to generate a paragraph of 100 words. First, we randomly choose a start word, that is, a capitalized word to begin the paragraph. Then we randomly choose a next word from its list of next words (since frequent next words appear many times in the list, they get chosen more often), and from that word, we continue the process until we reach a paragraph of length 100.
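
To make the three steps concrete, here is a rough, self-contained sketch of the same idea. To be clear, this is my illustrative reconstruction, not the actual parrot() source (grab the repo for that); in particular the tokenizer regex and the capitalized-start-word heuristic are just assumptions for the demo.

import random
import re

def build_chain(text):
    # steps 1 + 2: tokenize into words and punctuation, then map each token
    # to the list of tokens that follow it (duplicates kept on purpose, so
    # frequent next words get picked more often)
    tokens = re.findall(r"[\w']+|[.,!?;]", text)
    chain = {}
    for current, nxt in zip(tokens, tokens[1:]):
        chain.setdefault(current, []).append(nxt)
    return chain

def babble(chain, length=100):
    # step 3: start from a random capitalized token, then repeatedly jump
    # to a random successor of the current token
    starts = [t for t in chain if t[:1].isupper()] or list(chain)
    word = random.choice(starts)
    output = [word]
    for _ in range(length - 1):
        word = random.choice(chain.get(word, list(chain)))
        output.append(word)
    return " ".join(output)

corpus = open("nietzsche.txt").read()  # or any plain-text corpus you like
print(babble(build_chain(corpus), 100))

The only state the generator ever consults is the current word, which is exactly the memorylessness from the frog-and-lily-pad picture above.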

WHY IS THIS A MARKOV PROCESS?
Well, when we build up our paragraphs, we choose our next word based only on the choices generated by our current word. Doing so, we ignore the history of previous words we have chosen (which is why many of the sentences are nonsensical), yet since each choice of the next word is logical based on the current one, we end up with something that emulates the writing style (chaining).

SOURCE CODE:
Check out my code on GitHub, located here. Simply fire up your terminal and type “git clone the-url”; it will copy the repo into a directory on your local machine. Further instructions are in the README.


· · · ◊ ◊ ◊ · · ·


There are many concepts in the human mind that may indeed be figments of one’s imagination: unicorns, flying pigs, and a truly free lunch, for example. Gender bias in math and science, however, is unfortunately not one of these. (Insisting that it is would be called gaslighting.)

It is extremely upsetting to hear that there are men, and even more disappointingly, women, out there who cannot grasp the idea that it might in fact be true, and moreover, statistically significant. This is particularly disturbing considering that students at Columbia are often considered some of the brightest and most liberal in America.

Gender bias in math and science is not a myth. (Check out this post of mine from a while back, and for more on this topic, Cathy’s writing on a recent infographic in the NYTimes about the depressing results of female performance on science exams in the US.)

I will not go in depth today about my feelings on natural aptitude in men versus women in Math and Science, because a) I’m in class and b) I’m really tired of ranting this week, which has been endlessly trying on my patience. But I will leave you with this: my firm belief is that Self-Doubt is the Number One cause of underachievement among women in technical fields. I say this because I do not believe that natural talent has much to do with scientific success: only hard work, interest, and a belief in one’s ability to learn.

I cannot tell you how many times I’ve heard women in Computer Science doubt their ability to code or to take courses (myself entirely included in this camp), yet there are very few times I’ve heard a similar complaint from men, at any level of learning. I myself did not become comfortable debating and discussing Computer Science (the most fun and best way to learn in any field!) until nearly two years after my studies began, and that was only after I found great female (and male) mentors and formed a strong friend group and area of interest (Data Science and Natural Language) within the community.

The problem with self-doubt starts very young: I often find that girls seem to believe they are not “suited” to math or “no good” at it. Almost always this is untrue. If you do not believe me, I will offer myself as an example: I may be no mathematician, but considering that I am currently attending and enjoying Columbia’s engineering school and have accepted a full-time paid Data Science internship this summer, I can pretty safely say that my analytical thinking skills are at least adequate. Yet up until senior year of high school, I always believed I disliked math and considered it a non-option; almost all the “math geniuses” I knew were male and had personalities and interests far removed from my own.

These sentiments were even occasionally echoed by others. When I was a junior in high school, my pre-calculus teacher actually refused to recommend me for the more advanced BC AP Calculus class. She told me that I was not “naturally suited” to mathematics, and had I not had such supportive and encouraging parents (being, at that age, somewhat obedient toward authoritative voices who told me I was no good at things), I might never have asked for a waiver. This was, of course, entirely BS on her part: not only did I enjoy taking Calculus (I went on to take Calculus III and IV at Columbia), I ended up receiving an “A” in the class. Note that there was no concrete evidence whatsoever behind her choice to deny me a recommendation; I had in fact been a stellar student in her class and received an “A+”.

I’m not saying that this specific instance was necessarily driven by gender bias; only that if you are a young girl, especially one who is shy and has not yet developed the thick armor necessary to fend off the dissent and criticism prevalent in daily life, these various doubts from all sides (professors, colleagues, internal) can accumulate and deter a further pursuit of mathematics (or science). Which is a real damn shame.

I only hope that gender bias will one day go the way of the unicorn; until then, I will continue to try my best to lead as a good role model and encourage those around me to do the same, or at least die trying. (Hopefully, die of natural causes after a healthy and happy life of trying.)

 


· · · ◊ ◊ ◊ · · ·

talking to dad

25 Nov 2012

This Thanksgiving dinner, something miraculous happened: I finally felt smart enough to talk to my dad about science.

Now, granted, lest you think I have uncommunicative parents: my dad has been trying to talk to me about science since birth (at 5: do you want to build a computer board with me? now wouldn’t that be fun? me: no! i want to play with Barbies). However, it took me a full 20 years of life and two-and-counting years of Ivy League schooling to finally, finally know what he was talking about.

I feel like this monumental event speaks volumes about a number of important things.

First: that barriers to entry and barriers to culture in science, especially computer science, are, as previously suspected, very, very high. I talked a little bit before about the use of jargon and how it can discourage many new learners, especially newcomers and women. Keep in mind that my dad is probably the nicest guy I know and extremely enthusiastic about getting his children interested in his work, yet prior to this fall, even when I had been coding for more than a year, it was still intimidating and tough to talk to him about my work.

Second: the power of being a role model, and a huge influence on someone’s life, without ever telling them what to do. Growing up, I spent a great deal more time with my mother (who didn’t work) than my father (who is a night owl and worked way past my elementary-school bedtime). Neither of my parents told me what I should study or what sort of career to pursue, and my dad especially never tried to push any sort of academic dogma on me. I always felt that he was incredibly smart, but that coding and computer science were things I had neither the aptitude for nor interest in: plus, since my parent did it, it must be, in some way, decidedly uncool.

Fast-forward 20 years and he has two daughters in the hard sciences: one is a Ph.D. student at Princeton and the other is in Columbia’s engineering school. It’s especially fun to see how similar my area of interest turned out to be to my father’s work: both of us are either studying or working in the field of Natural Language Processing, and in fact, I am taking a class under the direction of one of his former colleagues from the glory days of Bell Labs. Note that, being stubborn and independent, I never, ever talked to my dad much about what I was studying at school or doing at my summer jobs. He didn’t suggest NLP or AI to me; I just sort of fell into it, and loved it. A testament to the power of being a good role model. (Cathy talks about a similar topic in her great post on the making of a girl nerd.)

Or, perhaps the apple really does never fall far from the tree.


· · · ◊ ◊ ◊ · · ·