Parsing Akamai logs using an Azure HDInsight Spark Cluster

You have seen many videos on Hadoop/Spark clusters where the ubiquitous MapReduce example is counting the word “Banana” in clean text files. But in real life your log files are not this clean, and they do not sit on the cluster itself. Clusters are expensive affairs, so how do you programmatically create a cluster and automate your processing?

Here is a presentation about developing a real-life application using a Spark cluster. In this presentation, we will parse Akamai logs kept in Azure storage. We will introduce some of the tools available. After getting the script to run in a Jupyter notebook, we will automate the solution so that it can be started by a call to an endpoint.

Watch it here:




Thinking, fast and slow


In “Thinking, Fast and Slow”, Nobel Prize winner Daniel Kahneman talks about our mind. To explain its inner workings, he divides the mind into two actors, “System 1” and “System 2”. “System 1” is fast, instinctive and emotional; it operates automatically and quickly, with little or no effort and no sense of voluntary control. “System 2” is slower, more deliberative, and more logical.

He describes System 1 as effortlessly originating impressions and feelings that are the main sources of our beliefs and choices. He illustrates it with some examples of automatic activities:

  • Detect that one object is more distant than another.
  • Orient to the source of a sudden sound.
  • Detect hostility in a voice.
  • Answer to 2 + 2 = ?
  • Drive a car on an empty road.

System 2 activities require attention and are disrupted when attention is drawn away. Here are some examples:

  • Focus on the voice of a particular person in a crowded and noisy room.
  • Monitor the appropriateness of your behavior in a social situation.
  • Count the occurrences of the letter a in a page of text.
  • Tell someone your phone number.
  • Park in a narrow space.
  • Fill out a tax form.
  • Check the validity of a complex logical argument.

Systems 1 and 2 are both active whenever we are awake. System 1 runs automatically and requires very little effort and energy from you. System 2 is expensive and slow; it needs much more effort and burns most of the glucose in your brain. It is hard to activate System 2. When all goes smoothly, which is most of the time, System 2 adopts the suggestions of System 1 with little or no modification. You generally believe your impressions and act on your desires.

When System 1 runs into difficulty, it calls on System 2 for help. System 2 is activated when System 1 does not offer an answer, e.g. what is 17 × 24?

System 2 is also credited with the continuous monitoring of your own behavior. You need System 2 for self-control. That’s why self-control takes attention and effort.

An active mind engages System 2 more often. However, the majority of people avoid activating it: Kahneman reports that more than 50% of students at Harvard, MIT and Princeton gave the quick, intuitive (and wrong) answer to a simple puzzle, and the number goes above 80% at less selective universities. Now you can imagine where the general population must stand.

You can raise your intelligence by improving the control of attention.

In other words, to be mentally active, to be intelligent, means you are engaging your System 2 more often than the norm.

This matters especially at work, where you are hard pressed for time: people are talking, messengers are popping up messages, mail keeps flowing in, and you need to reply quickly. You survive because of your System 1, but in the hurry you may be making many mistakes too.

Now the question remains: how do you engage System 2 more often and rely less on System 1?

One easy technique I have found: for any issue at work or any design decision, quietly open a notebook in your mind, divide the page into two columns, “pros” and “cons”, and start listing the pros and cons of the issue. This act will activate your System 2. It will often make you slower, but the outcome will be much better.

If System 2 is so important, how can I keep it in good shape and keep the machinery lubricated so that I can activate it more often and with ease?

I think the way to keep System 2 continuously engaged is to remain mindful.

Continuously keep watching yourself.  Keep watching your thoughts. To keep System 2 healthy – meditate daily!

The easier it is for you to activate your System 2, the better off you will be.

System 2 makes you thoughtful and intelligent.



The Role of Development Manager

Just the other day, I was talking to a friend of mine who transitioned from being a world-class developer to being a Development Manager. He is struggling with his new role and questioning his contribution. He came with the mindset that if he produces no code, then he has not contributed.

This blog entry answers the narrow question of new Development Managers who recently transitioned from development and wonder about their contribution when their output is not code.

I tried to explain to him –

“The role of a Development Manager is like a conductor of an orchestra. The conductor never produces a sound himself but rather guides others to create sound.

I think as a Development Manager your role is more that of an enabler. I see a good manager as the training wheels on a kid's bike: they come into play only when the kid is out of balance.

If you are a Dev Manager of a large team, with multiple projects under you, then you cannot code the product by yourself. You need to make sure you have people on the team who can help you achieve your goal. The first quality of a leader is to get work done through others by inspiring them.

A strong Dev Manager cannot be the excited developer who runs after every new and shiny object (new frameworks, languages, paradigms). Instead, you are the adult on the team: you must accept new frameworks and adopt technologies deliberately and with caution. Every theoretical claim must be backed by empirical data. Often you contribute not only by adding features to the product, but also by what you decide not to do.

You are the voice of the customer. You must have extreme empathy with customers. Technology for the sake of technology is of no use.

Like the conductor, who understands every instrument and its role in the symphony, you need to understand every aspect of development. You need to know which language is best for solving a certain problem. You need to know about testing, and you need to be a good Product Manager and a good Project Manager.

You need to know your team: their passions, their weaknesses and their strengths. Your job becomes more interesting because you deal not only with machines and software but also with humans. They are the most sophisticated machines, with emotions and attitudes.

Like a good conductor, you should be able to play any instrument and show exactly what you want. You should be able to code, even though you are not coding for the final symphony. You should be able to tell a bad design from a good design. You should be able to tell how an algorithm can be improved and how shiny code will crumble in production. You keep an eye on the maintainability, security, extensibility, complexity and performance implications of any design.

A note of caution: don’t manage too much. Don’t be in every decision. Have smart people on your team, and get the heck out of their way. I think your success is measured not by how many right decisions you made, but by how many right decisions were made without you!

Sometimes big complex projects are more difficult than conducting a Symphony.

In software development, often symphonies are written as we go along.

Often the end product is not well defined; no one knows how it will look in the end. As a Dev Manager, it becomes your job to explain, motivate, and guide the team to build something that nobody has yet known or seen. Often different products, developed by different teams in different countries, come together to form a perfectly beautiful product.


Communicating with IFrame

Creating an IFrame in a web page is easy, and it is also easy to have one-way communication with it. The web page can send a message to the IFrame using postMessage, but the IFrame cannot send a message back to the parent window. It is not allowed.



There are scenarios where you may need to get a message back from the IFrame. In this post, I will talk about how you can have two-way communication with an IFrame.

Since the IFrame cannot send the message back to the root web page, we are left with only one option: go out to a service and let that service send the message back to the root web page. For good performance, we need to use web sockets for this communication. As you can see in the picture, the web page sends messages to the IFrame using postMessage, and when the IFrame needs to send a message back, it calls our SignalR server, which in turn calls the client method on the parent web page.

You can find the full code here. The code is written in TypeScript for the web pages and in C# for the SignalR server. It shows how a JavaScript client can send and receive messages with a SignalR server. In the IFrame I load another web site, which is called “Extension B” in the code. First run the SignalR server by pressing F5 in Visual Studio. Then run the ExtensionB site, and finally run the RootWeb site. Enjoy!

Requirements of RequireJS

I got a couple of errors on my journey to use RequireJS. Looking at Stack Overflow, I see many people stumble over these problems. Many of the problems stem from not reading the complete RequireJS documentation and from a lot of wrong and incomplete noise on the internet. When you scroll through the RequireJS site and see an example, you say, oh yeah, this is so simple, and without reading much detail you jump into coding. In this article, I will go through the common errors you may receive and their fixes. In the end, we will arrive at working code.

This code targets ECMAScript 5 using TypeScript 1.6. You choose the ECMAScript version in the project properties -> “TypeScript Build” pane in Visual Studio.

Here is my simple ‘Hello world’ application

<!DOCTYPE html>
<html lang="en">
<head>
    <script src="scripts/require.js"></script>
</head>
<body><div id="content"></div></body>
</html>

Here is my simple module

module ModuleA {
    export class A {
        public static add(x: number, y: number): number {
            return x + y;
        }
    }
}

Here is the app.ts file, which refers to ‘ModuleA’, which sits in the scripts/js directory.

/// <reference path="scripts/js/modulea.ts" />
/// <reference path="scripts/typings/requirejs/require.d.ts" />

import moduleA = require('./scripts/js/ModuleA');
import _mA = moduleA.ModuleA;

window.onload = () => {
    document.getElementById('content').innerHTML = "Answer:" + _mA.A.add(2, 2).toString();
};

This is as simple as you can get. But when I tried to run this sample, my first build error was

TS1148 Cannot compile modules unless the ‘--module’ flag is provided.

This one is simple: go to the project properties by right-clicking the project, choose “TypeScript Build”, and then choose “AMD” as the module system.

After fixing this, I got the next error:

Error TS2306 File ‘E:/code/Require/Require/scripts/js/moduleA.ts’ is not a module. Require E:\code\Require\Require\app.ts 4

You look at your code and it clearly says it is a module. But soon you realize that you need to put an ‘export’ in front of the module.

export module ModuleA {
    export class A {
        public static add(x: number, y: number): number {
            return x + y;
        }
    }
}

Now the code compiles and runs, but there is nothing, just a blank page. Nothing on the console, no clue about what is going wrong. I had no option other than to go back and read the RequireJS documentation slowly and thoroughly, and then I realized my mistake. I forgot the “data-main” attribute in my HTML.

<script src="scripts/require.js" data-main="app"></script>

After fixing this, I ran the application, but again there was nothing on the page. Now, however, I could hit a breakpoint in my app.ts file. I hit the breakpoint and pressed F5, but still nothing appeared on the screen.

Back to the RequireJS site, again reading the documentation slowly and thoroughly. And finally I hit upon this:

It is possible when using RequireJS to load scripts quickly enough that they complete before the DOM is ready. Any work that tries to interact with the DOM should wait for the DOM to be ready. For modern browsers, this is done by waiting for the DOMContentLoaded event.

However, not all browsers in use support DOMContentLoaded. The domReady module implements a cross-browser method to determine when the DOM is ready. Download the module and use it in your project like so:

/// <reference path="scripts/js/modulea.ts" />
/// <reference path="scripts/typings/requirejs/require.d.ts" />

import moduleA = require('./scripts/js/ModuleA');
import _mA = moduleA.ModuleA;

require(['./scripts/domReady'], function (domReady) {
    domReady(function () {
        document.getElementById('content').innerHTML = "Answer:" + _mA.A.add(2, 2).toString();
    });
});

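Who knew that window.onload, which works everywhere, would not work with RequireJS? Most likely, by the time RequireJS has loaded app.js asynchronously, the load event has already fired, so the handler never runs.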

However, using “domReady” solved the problem.


Parsing complex JSON

We were building a UI control that gets its data from JSON. To keep things simple, we wanted our users to address the JSON in a simple “.” (dot) notation. For example, if the JSON looks like this

var obj = {
     name: 'Rafat Sarosh',
     address: 'Redmond. Seattle'
};

then the HTML can be as simple as this:

<my-control column='name' data='name'></my-control>

Then, in our directive, we can read the value by simply writing row[data]. Here row refers to the JSON obj and ‘data’ holds the string ‘name’, so row[data] effectively translates to obj['name']. So far so good. This all works fine as long as the JSON is simple and you are more or less certain of its shape. The user of my directive is happy to write simple dot notation in his HTML to access the data. But imagine the JSON is a little more complex, e.g.

var obj = {
      f1: "Angular",
      f2: {
             "name": "Rafat",
             "address": [{ city: 'Seattle'}, {zip: '98059' }]
      },
      f3: {
             "name": "Tim",
             "address": [{ city: 'Redmond'}, {house: () => "I am function" }]
      },
      f4: [1, 2, 3]
};

then it becomes a little difficult to write a simple parsing function. Consider the following test cases, where the user wants to access the data as follows:

'f3.address[1].house' //Should return a function that, when called, returns "I am function"

'f3.address[0].city' //Should return 'Redmond'

'f2.name' //Should return 'Rafat'

'f4[2]' //Should return 3

'f1' //Should return 'Angular'

So here is the code to parse JSON of any complexity.

function getValue<T>(object: any, path: string, replace: boolean = true): T {

  // Convert every ".", "[" and "]" to "*" so the path can be split on "*"
  if (replace) // A switch so this is done only once, not on every recursive call
      path = path.replace(/\]\.|\.|\[|\]/g, "*");
  var index = path.indexOf('*');
  if (index === -1)
      return object[path];

  // Split off the first segment; the rest is resolved recursively
  var firstSegment = path.slice(0, index);
  var remainingSegments = path.slice(index + 1);

  if (object[firstSegment] !== undefined && remainingSegments.length > 0) {
       return getValue<T>(object[firstSegment], remainingSegments, false);
  } else
       return object[firstSegment];
}
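
As a cross-check on the test cases above, here is a minimal iterative sketch of the same lookup. It is a variation, not the code from this post: the name getValueByPath and the sample object are introduced only for illustration, and it assumes paths use only dot and square-bracket notation with no quoted keys.

```typescript
// Hypothetical iterative variant of a path lookup; not the author's getValue.
function getValueByPath<T>(source: unknown, path: string): T | undefined {
  // Split on ".", "[" and "]" and drop the empty pieces left by "]." or a trailing "]".
  const segments = path.split(/[.\[\]]/).filter((segment) => segment.length > 0);

  let current: any = source;
  for (const segment of segments) {
    if (current === null || current === undefined) {
      return undefined; // the path runs past the end of the data
    }
    current = current[segment];
  }
  return current as T;
}

// Same shape as the complex object shown earlier in the post.
const sample = {
  f1: "Angular",
  f2: { name: "Rafat", address: [{ city: "Seattle" }, { zip: "98059" }] },
  f3: { name: "Tim", address: [{ city: "Redmond" }, { house: () => "I am function" }] },
  f4: [1, 2, 3],
};

const house = getValueByPath<() => string>(sample, "f3.address[1].house");
console.log(house ? house() : undefined);                           // I am function
console.log(getValueByPath<string>(sample, "f3.address[0].city")); // Redmond
console.log(getValueByPath<string>(sample, "f2.name"));            // Rafat
console.log(getValueByPath<number>(sample, "f4[2]"));              // 3
console.log(getValueByPath<string>(sample, "f1"));                 // Angular
```

Splitting and filtering up front means this sketch also copes with consecutive indexes such as 'matrix[0][1]', because the empty pieces are dropped before the walk starts.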

Happy Coding!!!

Angular Directive – Table

I had to develop an application where most of the data is displayed in tables. Putting data in a table is pretty straightforward, but when you have multiple pages you would like a table control that is driven by data and can be added to a page with one line of HTML. Something like this:

<rmt-table data="data" columns="columns" on-row-clicked="RowClicked(Id)"></rmt-table>


To understand this post, get the full working code for the table directive from this plnkr. This directive takes two arrays of objects: the column names and the actual data. The table directive encapsulates paging and sorting functionality. Now you can dynamically supply different data, and the page will show different tables.

The code is straightforward, but I learned a few things along the way. My first issue was getting a click on the table back to the main controller with the row's Id value. Unfortunately, this is not as straightforward as you may think.

In main controller I have defined a function as follows:

  $scope.RowClicked = function(id) { /* ... */ };

and as you can see in the HTML, this function is assigned to on-row-clicked.

<rmt-table data="data" columns="columns" on-row-clicked="RowClicked(Id)"></rmt-table>

There are many Angular courses and documents that advise people not to pass the parameter in the function; unfortunately, this advice no longer holds. Not only do you have to pass the parameter in the function, but the parameter name must be exactly the same as the one defined in your function definition.

Here is the function in directive:

angular.module('app').directive('rmtTable', function() {
  return {
    restrict: 'E',
    templateUrl: 'myTable.html',
    scope: {
      originalData: '=data',
      columns: '=columns',
      notifyParent: '&onRowClicked',
      // ... snip ...

      $scope.rowClicked = function(row) {
        $scope.notifyParent({
          Id: row.Id
        });
      };

The notifyParent: '&onRowClicked' entry wires the external function to the directive. Inside rowClicked, the call to notifyParent invokes the external function with the row's Id value. Here is the rub: you would think you could have called the function as follows:

      $scope.rowClicked = function(row) {

But this will not work. This straight and simple way of calling the function will not work, no matter what god you believe in :) You may think that you can do some more JavaScript jugglery like

      $scope.rowClicked = function(row) {

but this will still not work. It has to be an object. And make sure the key of your object matches the parameter you are passing in the function; in my case it is “Id”. Thanks to Dan Wahlin, who taught me this trick.
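
To see why the key has to match, it helps to picture what an '&' binding hands to the directive: a wrapper that evaluates the attribute expression and resolves each name in it from the object you pass. The following is a toy model of that idea in plain TypeScript. It is not AngularJS code; bindExpression and every other name in it are hypothetical, and it only understands expressions of the form "Name(arg1, arg2)".

```typescript
// Toy model of an "&" binding. This is NOT AngularJS source code; all names are hypothetical.
type ParentScope = Record<string, (...args: any[]) => unknown>;
type Locals = Record<string, unknown>;

// Parses an attribute value such as "RowClicked(Id)" and returns a wrapper.
// When called, the wrapper looks up each argument NAME in the locals map
// and calls the parent function with the values it finds there.
function bindExpression(parentScope: ParentScope, expression: string): (locals?: Locals) => unknown {
  const match = /^\s*(\w+)\s*\(([^)]*)\)\s*$/.exec(expression);
  if (match === null) {
    throw new Error("Unsupported expression: " + expression);
  }
  const functionName = match[1] ?? "";
  const argumentNames = (match[2] ?? "")
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name.length > 0);

  return (locals: Locals = {}) => {
    const parentFunction = parentScope[functionName];
    if (typeof parentFunction !== "function") {
      throw new Error("No function named " + functionName + " on the parent scope");
    }
    return parentFunction(...argumentNames.map((name) => locals[name]));
  };
}

// Stand-in for the main controller's scope.
const receivedIds: unknown[] = [];
const parentScope: ParentScope = {
  RowClicked: (id: unknown) => {
    receivedIds.push(id);
  },
};

// Stand-in for on-row-clicked="RowClicked(Id)".
const notifyParent = bindExpression(parentScope, "RowClicked(Id)");
notifyParent({ Id: 42 }); // the key matches the name in the expression
notifyParent({ id: 42 }); // the key differs in case, so no value is found

console.log(receivedIds); // [ 42, undefined ]
```

In this model a bare value has no name attached to it, so there is nothing to match against "Id" in the expression, which is the behaviour described above.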

My second issue was debugging the scripts in plnkr. By default, it shows the output in the same window, and if you press F12 you are presented with hundreds of lines of minified JS. Fortunately, in the display IFrame you will find a blue button which will ‘launch the preview window in a separate window’. In this separate window, it is just you and your code, and it is much easier to debug the issues.

Go ahead and play with this plnkr. Remember, this is a prototype, and there are many things that can be improved (for example, the maximum number of rows could also be supplied to the directive). This is just to get you started on a table directive.

Happy coding!

Tree View control with JavaScript and Knockout for large trees

[To understand this post, get the code from git]

A few days back, I needed a tree view that could show a tree with thousands of nodes: a big tree, a really very big tree.

Most of the existing tree views crashed my browser; they make the naive mistake of loading all the nodes in one shot. The simple solution is to load the nodes on demand. And when you open a node with thousands of children and then open another node that again has thousands of children, it is better to close the previous node and let go of all its children than to crash the browser.

This automatic closing of the previous node can be a little more intelligent and need not close the node if the number of its children is small. So this tree is a little intelligent in a few respects: it loads the children on demand, and it closes and flushes the nodes that create the danger of crashing the browser. This tree trades performance for reliability.
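
The idea can be sketched in a few lines of TypeScript. This is a simplified illustration and not the code from the repository: LazyTree, TreeNode, the loader and the cut-off value are all hypothetical names, and the sketch assumes that big nodes are not nested inside each other.

```typescript
// Hypothetical sketch of on-demand loading with automatic flushing of big nodes.
interface TreeNode {
  id: string;
  expanded: boolean;
  children: TreeNode[] | null; // null means "not loaded yet" or "flushed"
}

type ChildLoader = (parentId: string) => TreeNode[];

class LazyTree {
  private loadChildren: ChildLoader;
  private maxChildrenKeptOpen: number;
  private openBigNode: TreeNode | null = null;

  constructor(loadChildren: ChildLoader, maxChildrenKeptOpen: number) {
    this.loadChildren = loadChildren;
    this.maxChildrenKeptOpen = maxChildrenKeptOpen;
  }

  expand(node: TreeNode): void {
    // Load children only when they are not already in memory.
    const children = node.children ?? this.loadChildren(node.id);
    node.children = children;
    node.expanded = true;

    if (children.length > this.maxChildrenKeptOpen) {
      // Only one big node stays open: close and flush the previous one.
      if (this.openBigNode !== null && this.openBigNode !== node) {
        this.collapseAndFlush(this.openBigNode);
      }
      this.openBigNode = node;
    }
  }

  collapseAndFlush(node: TreeNode): void {
    node.expanded = false;
    node.children = null; // drop the references so the children can be garbage collected
  }
}

function makeNode(id: string): TreeNode {
  return { id: id, expanded: false, children: null };
}

// Fake loader for the example: ids starting with "big" get 5000 children, others get 3.
const fakeLoader: ChildLoader = (parentId) => {
  const count = parentId.indexOf("big") === 0 ? 5000 : 3;
  const result: TreeNode[] = [];
  for (let i = 0; i < count; i++) {
    result.push(makeNode(parentId + "/" + i));
  }
  return result;
};

const lazyTree = new LazyTree(fakeLoader, 1000);
const bigA = makeNode("bigA");
const smallNode = makeNode("small");
const bigB = makeNode("bigB");

lazyTree.expand(bigA);      // loads 5000 children
lazyTree.expand(smallNode); // small node: bigA stays open
lazyTree.expand(bigB);      // second big node: bigA is closed and flushed

console.log(bigA.expanded, bigA.children);                        // false null
console.log(smallNode.children?.length, bigB.children?.length);   // 3 5000
```

Small nodes are never flushed here, which mirrors the point above that a node with few children does not need to be closed.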

To build a tree with JavaScript and Knockout, you need to understand how Knockout templating works. The template binding populates the associated DOM element with the results of rendering a template. Templates are a simple and convenient way to build sophisticated UI structures. String-based templating is a way to connect Knockout to a third-party template engine. Knockout will pass your model values to the external template engine and inject the resulting markup string into your document.

If you don’t know about KO templating, then you should stop here and read more about it here.

To create a new string template engine, start with an existing nativeTemplateEngine.

The createStringTemplateEngine method does that task.

SetEngine() {
    this.customTemplateEngine = TreeTempate.createStringTemplateEngine(
        new ko.nativeTemplateEngine(), this.templates);
}

Inside createStringTemplateEngine, we take the passed templateEngine and override its makeTemplateSource function to return our stringTemplate, as follows:

     templateEngine.makeTemplateSource = function (templateName) {
           return new stringTemplate(templateName, templates);
     };

So again, take Knockout's nativeTemplateEngine, override its makeTemplateSource, and return your own stringTemplate. This makeTemplateSource method is called from the ko.renderTemplate function, which in turn calls your template's “text” method.

Here is how the string template is implemented; you just have to implement one method, ‘text’:

class stringTemplate {

    private _templateName: string;
    private _templates: any;

    constructor (templateName, templates) {
        this._templateName = templateName;
        this._templates = templates;
    }

    // Called with no argument to read the markup, with one argument to write it
    text (value?) {
        if (arguments.length === 0) {
            return this._templates[this._templateName];
        }
        this[this._templateName] = value;
    }
}
Templates is defined as follows:


Pay close attention to how the templates recursively call each other to draw the tree. The first template (tree) calls nodes, and nodes calls nodeCore to draw the real node. nodeCore makes two calls: one to subNodes and the other to nodeContent. Writing this recursive template is the key to drawing your tree in the browser.

KO will call the text method of stringTemplate again and again with the template names it keeps encountering, and we keep returning their values, e.g. for ‘tree’, text will return ”

The html of the page is as simple as this:

       <div data-bind="template: { name: TreeTemplate }"  />

 $(document).ready(function () {
        var d = new data.Data();
        var vm = new Tree.viewModel(d);
        // ...
 });

This is how the tree will look.

When KO hits the HTML data-bind="template: { name: TreeTemplate }", it starts processing the ‘tree’ template in templates, which says: for each item, apply the node template. The node template then says: for the data, apply the nodeCore template. This templating continues until we reach nodeContent.

The rest of the code deals with clicking on nodes and loading children on demand, checking the status of the tree, and automatically closing nodes that are too big and threaten to crash the browser. These cut-off numbers are configurable in the code, and you should be able to change and play with them. The interesting part of the code is the many recursive functions. These functions need to traverse down through the child nodes before they close a parent node.

Hopefully, this will get you started on your journey of creating a tree view with KO and JavaScript.