For the last 5 or 6 years I have been fishing around for branding and identity for the things I produce. I have started a company, closed a company, worked with a talented brand consultant, and almost worked with another. I have also started and stalled several different blogs relating to these endeavors, though none have really stuck. This blog is yet another attempt at expression and documentation.
Here are some of the names I have worked under from around 2005 until 2011.
For Unnature, I was specifically attempting to market the bioacoustic simulations I started making while studying evolutionary and adaptive systems at the University of Sussex. My obsessions there had already produced a paper, and I really wanted to find a way to continue this work. A company was formed with the intent of finding a product incarnation that I could produce and sell. I imagined a shop that made a sort of updated clockwork automaton, but this time in computational media. However, there really was no market, or at least no product-market fit. I had an idea in search of a market. Bad dog.


The deeper I got into the project, the more I realized that it was the conceptual research that was really spinning my gears, not a love of the marketplace. A conference paper and book chapter later, it was totally clear that I should have pursued this as art, and not as business.
By about 2007, I was becoming more interested in the sounds actually made by animals, rather than just trying to simulate them. It started innocently enough - I needed source material for the simulations. I also wanted to document and understand the patterns I was trying to emulate - where did they come from, what motivates choruses, what do they actually sound like. So I started listening and recording. A lot.
I apparently don’t learn my lessons. After accumulating a bunch of recordings and some interesting concepts, I set out to publish the work, but not through established channels - I had to envision it as a business. The first label was “The Scientific Forestry Service” which put out 2 releases, most of which are now sold. After that came “Fieldcraft Records” on which I published most of my 2011 material. Thing is, I didn’t need to start the labels. I just needed to send out the material and let others publish what they wanted and press up the other editions they didn’t want for myself.
I write all this down as a long way of saying that these identities are now dead. What I interpreted from the world is that people want something packaged, coherent, and identifiable. I feel a little foolish now to realize that this identity can be a person, made of experiences and productions, not just some fabricated, controlled image.
In an effort to improve my interview questions and to review my own knowledge of the basics, I have started studying basic data structures and algorithms. I have actually done this every few years or so just to keep it fresh. Since I use lists and their variants every day, I thought this might be a good place to start the review.
The interesting thing about linked lists is that for 99% of programmers, the last time they saw this structure was during study or in preparation for an interview. I asked a colleague when he last saw this structure in its primitive form, and he said it was when he worked in C and had to do everything himself. Thankfully, there are all sorts of data structures whose implementations we take for granted these days. Imagine if every time you wanted to make breakfast you had to gather eggs from the chicken, harvest wheat and leaven up some bread, milk the cow and churn some butter, grow, roast, and grind the beans… you get the idea.
Let’s make breakfast
Here is an example in C
typedef struct IntElement {
struct IntElement *next;
int data;
} IntElement;
And in our favorite language Java (sike)
public class ListElement {
ListElement next;
Object data;
public ListElement(Object data){
this.data = data;
}
}
Here is the Java
public ListElement insertInFront( ListElement list, Object data){
ListElement l = new ListElement(data);
l.next = list;
return l; // the new head!
}
This is easy in most any language (but here is the Java)
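public ListElement find(ListElement head, Object data){
    // check for null first, then compare values with equals() rather than ==
    while(head != null && !java.util.Objects.equals(head.data, data)){
        head = head.next;
    }
    return head; // null if not found
}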
Below is the C implementation for deleting an element from a linked list. If we passed in just a pointer to the head as the first argument (*head, rather than **head), then reassignment will only be to the local copy of the head pointer. So we need a pointer to that pointer! I am so glad that I work so infrequently in C that I forget this all the time.
Here is the C
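#include <stdbool.h>
#include <stdlib.h>

bool deleteElement( IntElement **head, IntElement *deleteMe){
    // nothing to do for an empty list or a null target
    if (!head || !*head || !deleteMe) return false;
    IntElement *elem = *head;
    // special case for head
    if (deleteMe == *head){
        *head = elem->next;
        free(deleteMe); // free, not delete: this is C (assumes nodes came from malloc)
        return true;
    }
    while(elem){
        if(elem->next == deleteMe){
            elem->next = deleteMe->next; // unlink, then release
            free(deleteMe);
            return true;
        }
        elem = elem->next;
    }
    // not found
    return false;
}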
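For comparison, here is a rough sketch of the same operation in Java. Java has no pointer-to-pointer, so a common workaround is to return the (possibly new) head, the same way insertInFront does above. The class name ListDeleteSketch and the method name remove are made up for illustration, and ListElement is redeclared so the snippet stands alone.

```java
// Illustrative sketch only: ListDeleteSketch and remove are hypothetical names.
final class ListDeleteSketch {
    // Same shape as the ListElement class shown earlier, redeclared here
    // so that this snippet compiles on its own.
    static final class ListElement {
        ListElement next;
        Object data;
        ListElement(Object data) { this.data = data; }
    }

    // Unlinks the node that is identical (==) to deleteMe and returns the head
    // of the resulting list. If deleteMe is not in the list, nothing changes.
    static ListElement remove(ListElement head, ListElement deleteMe) {
        if (head == null || deleteMe == null) {
            return head;
        }
        if (head == deleteMe) {
            return head.next; // special case: the second node becomes the head
        }
        for (ListElement elem = head; elem.next != null; elem = elem.next) {
            if (elem.next == deleteMe) {
                elem.next = deleteMe.next; // unlink; the garbage collector does the rest
                break;
            }
        }
        return head;
    }
}
```

The caller has to reassign its own reference, as in head = ListDeleteSketch.remove(head, node), which is the Java stand-in for writing through **head.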
Discuss the stack data structure. Implement a stack in C using either a linked list or a dynamic array, and justify your decision. Design the interface to the stack to be complete, consistent, and easy to use.
head and tail are global pointers to the first and last element respectively, of a singly linked list of integers. Implement C functions for the following prototypes:
bool remove(Element *e);
bool insertAfter(Element *e, int data);
Given a singly-linked list, devise a time and space efficient algorithm to find the nth-to-last element of the list. Define nth-to-last such that when n=0, the last element of the list is returned.
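One possible answer to the nth-to-last question, sketched in Java: walk a lead reference n nodes ahead, then advance a trailing reference in step until the lead hits the last node. That is a single pass with constant extra space. The names NthToLastSketch, Node, and nthToLast are my own, and returning null for a list that is too short is an assumption, since the question does not say what should happen in that case.

```java
// Illustrative sketch only: all names here are hypothetical.
final class NthToLastSketch {
    static final class Node {
        final int data;
        Node next;
        Node(int data, Node next) { this.data = data; this.next = next; }
    }

    // Returns the node that sits n positions before the last node (n == 0 is
    // the last node itself), or null if the list has n or fewer nodes.
    static Node nthToLast(Node head, int n) {
        if (head == null || n < 0) {
            return null;
        }
        // Move the lead reference n nodes ahead of the head.
        Node lead = head;
        for (int i = 0; i < n; i++) {
            lead = lead.next;
            if (lead == null) {
                return null; // fewer than n + 1 nodes
            }
        }
        // Advance both in step; when lead is the last node, trail is the answer.
        Node trail = head;
        while (lead.next != null) {
            lead = lead.next;
            trail = trail.next;
        }
        return trail;
    }
}
```

The tempting alternative is to count the length first and then walk again, which is also O(n) time but takes two passes.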
This is why “Algorithms in a Nutshell” is a great book.
This recording has gotten more than 11000 plays. And neither I nor SC have any idea why. So much for big data analytics.
Mutant brown trout. Not a sufficient enough concern for the EPA to limit Selenium discharge from mines. postnatural
As my career advances, something is becoming more and more obvious. I am surrounded by people younger than me. At 35, I am not middle-aged, I am an old man.
In my corporate environment, there is a preference to hire very young and right out of school. These kids are bright, hungry, and have almost no opinion about anything. Perfect to mold into the company’s image. The older “greybeards” are to be found huddled away maintaining systems that perhaps they have been working on for the last 20 years. A lucky few are leading groups, not so much programmers now, but managers.
What do athletes do when they are past their prime? The successful ones invest in their own businesses, some teach or coach, but it seems that most are just bankrupt, divorced, or unemployed. Programmers certainly have better chances than footballers, but I just do not have too many examples of career paths of programmers who are not .com superstars or reluctant managers.
-NOTES, NEEDS FINISHING-
A basic classification algorithm that predicts the probability of an event occurring by fitting data to a logistic curve.
Hypothesis Representation
In logistic regression, we want the hypothesis to be $$0 \le h_\theta(x) \le 1$$
For this, we use the Sigmoid or Logistic Function
$$\begin{eqnarray} g(z) &=& { \frac{1}{1 + e^{-z}} } \\ h_\theta(x) &=& g(\theta^T x) \\ h_\theta(x) &=& { \frac{1}{1 + e^{-\theta^T x}} } \end{eqnarray}$$
Plot of \(g(z)\)
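As a quick numeric check on the formulas above, here is a minimal Java sketch of \(g(z)\) and \(h_\theta(x)\). The class and method names are hypothetical, and theta and x are assumed to be plain arrays of equal length.

```java
// Illustrative sketch only: SigmoidSketch and its method names are hypothetical.
final class SigmoidSketch {
    // g(z) = 1 / (1 + e^(-z)); the result always lies between 0 and 1.
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // h_theta(x) = g(theta^T x), with theta and x as equal-length arrays.
    static double hypothesis(double[] theta, double[] x) {
        if (theta.length != x.length) {
            throw new IllegalArgumentException("theta and x must have the same length");
        }
        double z = 0.0;
        for (int i = 0; i < theta.length; i++) {
            z += theta[i] * x[i]; // accumulate the dot product theta^T x
        }
        return sigmoid(z);
    }
}
```

With all parameters at zero, theta^T x is 0 and the hypothesis comes out at 0.5, which is where the curve crosses the vertical axis.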
In Octave it looks something like the following
Example MapReduce classes in Scala
I spent much of the day today hung on two very simple and unrelated problems while running an HBase MapReduce job whose mapper collects data stored as JSON from HBase and writes it to the context as Text (serialized JSON). The reducer deserializes the data and does some long running calculations on it.
The primary reason these calculations must be run in the reducer is that the HBase scanner lease times out after 60 seconds. So if you are doing much of anything that requires computation in the map phase, the scanner will time out and the job will fail. Lame. The other reason is that the calculation being done in the reduce phase does not parallelize right now, so the job must collect all the data it needs in the map phase (allowing batching of the job in different tasks).
Hangup #1: When serializing Lists, the concrete implementation chosen by Gson to deserialize is LinkedList. Type reflection will not work to cast it as any other concrete List like ArrayList. This may just be the product of the specific version of Gson I’m using, but there is evidence of this here:
http://groups.google.com/group/google-gson/browse_thread/thread/903d164d76ca1115/3649e02e4e0dd9d5?fwc=1&pli=1
However, this thread implies that List casting is indeed possible
http://stackoverflow.com/questions/5813434/trouble-with-gson-serializing-an-arraylist-of-pojos
This worked for me locally, but not on our remote cluster. I am packaging all the libraries as a jar and plopping it on the DistributedCluster… perhaps another version of Gson exists on the cluster. I don’t know yet (I just finished banging my head on this one).
I’ll say this though: if you are serializing and deserializing Lists, you might as well just treat the result as its base type (List) unless you are using something very special in the concrete List implementation.
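To make the casting problem concrete without dragging Gson in, here is a small stand-alone Java sketch. The deserialized() method is a stand-in I made up for any deserializer that picks its own concrete List type; it is not Gson's API. The sketch shows the downcast failing at runtime and a copy working regardless.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Illustrative sketch only: all names here are hypothetical, and no Gson
// calls are involved.
final class ListCastSketch {
    // Stand-in for a deserializer that hands back a LinkedList behind the
    // List interface.
    static List<String> deserialized() {
        List<String> list = new LinkedList<>();
        list.add("a");
        list.add("b");
        return list;
    }

    // Returns true if downcasting the given list to ArrayList throws.
    static boolean castToArrayListFails(List<String> list) {
        try {
            ArrayList<String> unused = (ArrayList<String>) list;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // Copying works whatever the concrete type is, at the cost of one pass.
    static ArrayList<String> toArrayList(List<String> list) {
        return new ArrayList<>(list);
    }
}
```

Code written against List never notices the difference, which is the argument for sticking to the interface unless ArrayList-specific behavior is really needed.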
OK - on to hangup #2: Passing variables around in MapReduce jobs
This one is probably obvious to many, but if you need your mappers or reducers to retain some sort of state variable, like say the environment under which they are executing, you can pass it around in the Configuration instance using the “set” method from the main job class. In particular, I was setting the environment passed in from a command line flag, which was properly configuring the main class, but not the mappers and reducers. Many folks seem to solve this issue by bootstrapping all sorts of environment variables with Bash, but I really don’t like this. It spreads out the application.