February 2012
9 posts
Notes on Linked Lists
In an effort to improve my interview questions and to review my own knowledge of the basics, I have started studying basic data structures and algorithms. I actually do this every few years or so, just to keep it fresh. Since I use lists and their variants every day, I thought this might be a good place to start the review. The interesting thing about linked lists is that for 99% of...
Feb 25th
Feb 25th
11,000 plays with no paper trail
This recording has gotten more than 11,000 plays, and neither I nor SC has any idea why. So much for big data analytics.
Feb 24th
Feb 23rd
Programmers are athletes
As my career advances, something is becoming more and more obvious: I am surrounded by people younger than me. At 35, I am not middle-aged, I am an old man. In my corporate environment, there is a preference for hiring very young people right out of school. These kids are bright, hungry, and have almost no opinion about anything. Perfect to mold into the company’s image. The older...
Feb 23rd
1 note
3 tags
Logistic Regression
-NOTES, NEEDS FINISHING- A basic classification algorithm that predicts the probability of an event occurring by fitting data to a logistic curve. Hypothesis Representation: In logistic regression, we want the hypothesis to satisfy $$0 \le h_\theta(x) \le 1$$ For this, we use the Sigmoid or Logistic Function $$g(z) = \frac{1}{1 + e^{-z}}$$ ...
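The sigmoid named in the excerpt above can be sketched in a few lines of Python. This is only an illustration of the formula $$g(z) = \frac{1}{1 + e^{-z}}$$; the function name `sigmoid`, the sample inputs, and the two-branch form used to avoid overflow are assumptions made here, not something taken from the post.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function g(z) = 1 / (1 + e^(-z)); the result always lies in [0, 1]."""
    if z >= 0:
        # Direct form: e^(-z) cannot overflow when z is non-negative.
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the algebraically equal form e^z / (1 + e^z),
    # which avoids computing a huge e^(-z).
    ez = math.exp(z)
    return ez / (1.0 + ez)


if __name__ == "__main__":
    # Hypothetical sample inputs, chosen only to show the S-shaped behaviour.
    for z in (-5.0, 0.0, 5.0):
        print(z, sigmoid(z))
```

Whatever the input, the output stays between 0 and 1, which is the property the hypothesis needs in order to be read as a probability.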
Feb 23rd
Example MapReduce classes in Scala →
Feb 22nd
Google's Gson List deserialization and variable...
I spent much of the day today hung up on two very simple and unrelated problems while running an HBase MapReduce job whose mapper collects data stored as JSON from HBase and writes it to the context as Text (serialized JSON). The reducer deserializes the data and does some long-running calculations on it. The primary reason these calculations must be run in the reducer is that the HBase scanner...
Feb 22nd
HBase and submitting remote MapReduce jobs to...
I’ve been working with the Hadoop ecosystem quite a bit over the last three weeks. For as much documentation as exists on the topic, most of the books, online tutorials, and the like cover only the basic use cases. Many of these use cases revolve around using shell scripts to call command-line tools (tiny apps with a main method that invoke other things). By following the basic tutorials, it is...
Feb 3rd