Reading Cosmos DB Change Feed

To track the changes on the Cosmos DB, you can reads it’s change feed. The change feed support in Azure Cosmos DB enables you to build efficient and scalable solutions.

Azure Cosmos DB change feed provides a sorted list of documents within an Azure Cosmos DB collection in the order in which they were modified. This feed can be used to listen for modifications to data within the collection and perform any action. The change feed is available for each partition key range within the document collection, and thus can be distributed across one or more consumers for parallel processing. Once you get the document which is changed, sky is the limit. You can send that document to Azure Notification hub or trigger any other process.

You can read the change feed in three different ways:

  1. Using the CosmosDB SDK.
  2. Using the Change Feed Processor SDK
  3. Using the serverless Azure function.

In this article we will discuss first two option and subsequent blog post will address the last serverless Azure function options.

I am keeping this article very short and to the point. I am quickly showing you the code snippet which you need to write to get started on reading the change feed. At the end of article, you will find a link to the full working code.

Azure Cosmos DB SDK
Download CosmosDB SDK 

This SDK gives you all the power to read the change feed, but with power comes lots of responsibilities too. If you want to manage checkpoint, and deal with sequence number of documents and have granule control over partition keys then this may be the right approach.

So let’s get started, Read the database, collection name etc from appconfig. You will get this information from Azure portal.

DocumentClient client;
string DatabaseName = ConfigurationManager.AppSettings["database"];
string CollectionName = ConfigurationManager.AppSettings["collection"];
string endpointUrl = ConfigurationManager.AppSettings["endpoint"];
string authorizationKey = ConfigurationManager.AppSettings["authKey"];

Make the client as follows:

using (client = new DocumentClient(new Uri(endpointUrl), authorizationKey,
new ConnectionPolicy { ConnectionMode = ConnectionMode.Direct, ConnectionProtocol = Protocol.Tcp }))
{
}

and then get the partition key ranges

FeedResponse pkRangesResponse = await client.ReadPartitionKeyRangeFeedAsync(
                                      collectionUri,
                                      new FeedOptions
                      {RequestContinuation = pkRangesResponseContinuation });

partitionKeyRanges.AddRange(pkRangesResponse);
pkRangesResponseContinuation = pkRangesResponse.ResponseContinuation;

and then just call ExecuteNextAsync for every partition key ranges


 foreach (PartitionKeyRange pkRange in partitionKeyRanges){
                string continuation = null;
                checkpoints.TryGetValue(pkRange.Id, out continuation);
                IDocumentQuery<Document> query = client.CreateDocumentChangeFeedQuery(
                    collectionUri,
                    new ChangeFeedOptions
                    {
                        PartitionKeyRangeId = pkRange.Id,
                        StartFromBeginning = true,
                        RequestContinuation = continuation,
                        MaxItemCount = -1,
                        // Set reading time: only show change feed results modified since StartTime
                        StartTime = DateTime.Now - TimeSpan.FromSeconds(30)
                    });
                while (query.HasMoreResults)
                {
                    FeedResponse<dynamic> readChangesResponse = query.ExecuteNextAsync<dynamic>().Result;

                    foreach (dynamic changedDocument in readChangesResponse)
                    {
                        Console.WriteLine("document: {0}", changedDocument);
                    }
                    checkpoints[pkRange.Id] = readChangesResponse.ResponseContinuation;
                }
}

If you have multiple readers, you can use ChangeFeedOptions to distribute read load to different threads or different clients. This is it, with these few lines of code you will start reading the change feed. Get the code from here.

Here the last line ResponseContinuation has the last logical sequence number (LSN) of the document, which will be used next time to read new documents after this sequence numbers. Using StartTime of ChangeFeedOption you can widen your net to get the documents. So, If your ResponseContinuation is null, but your StartTime goes back in time then you will get all the documents change since StartTime. But, if your ResponseContinuation has a value then system will get you all the documents since that LSN.

Side Note: One more thing to note, ETag on FeedResponse is different than the _etag you see on the document. _etag is an internal identifier and used to concurrency, it tells about the version of the document and ETag is used for sequencing the feed.

So, you see your checkpoint array is just keeping LSN for each partition. But if you don’t want to deal with the partitions, checkpoints, LSN, Startime etc the simpler option is to use the Change Feed Processor Library.

 Using Change Feed Processor Library 

Azure Cosmos DB Change Feed Processor library, can help you easily distribute event processing across multiple consumers. This library simplifies reading changes across partitions and multiple threads working in parallel.

The main benefit of Feed Processor Library is that you don’t have to manage the each partition, continuation token etc and you don’t have to poll each collection manually.

The FP Library simplifies reading changes across partitions and multiple threads working in parallel.  Change Feed Processor automatically manages reading changes across partitions using a lease mechanism. As you can see in the below image, If I start two clients who are using Processor Library they divide the work among themselves. You can keep increasing the clients and they can keep dividing the work among themselves.

ChangeFeed2.PNG

I started the left client first and it started monitoring all the partitions, then I started the second client and then first let go some of the leases to second one. As you can see this is the nice way to distribute the work between different machines and clients.

To implement Feed library you have to do following:

  1. Implement a DocumentFeedObserver object which implements IChangeFeedObserver.
  2. Implement a DocumentFeedObserverFactory, which implements IChangeFeedObserverFactory.
  3. In the CreateObserver method of DocumentFeedObserverFacory, instantiate ChangeFeedObserver which you made in step 1 and return it.
  4. Instantiate DocumentObserverFactory.
  5. Instantiate a ChangeFeedEventHost
    ChangeFeedEventHost host = new ChangeFeedEventHost(
                     hostName,
                     documentCollectionLocation,
                     leaseCollectionLocation,
                     feedOptions,
                     feedHostOptions);

    Register the DocumentFeedObserverFactory with host.

That’s it. After these few steps you will start seeing the document come in DocumentFeedObserver ProcessChangesAsync method.

Here is the code for step 3.


public IChangeFeedObserver CreateObserver()
{
          DocumentFeedObserver newObserver = new DocumentFeedObserver(this.client,                                                                                         this.collectionInfo);
          return newObserver;
}

 

Here is the code for step 4 & 5


ChangeFeedOptions feedOptions = new ChangeFeedOptions();
feedOptions.StartFromBeginning = true;

ChangeFeedHostOptions feedHostOptions = new ChangeFeedHostOptions();

//Customizing lease renewal interval to 15 seconds. So the if
feedHostOptions.LeaseRenewInterval = TimeSpan.FromSeconds(15);

using (DocumentClient destClient = new DocumentClient(destCollInfo.Uri, destCollInfo.MasterKey))
{
        DocumentFeedObserverFactory docObserverFactory = new DocumentFeedObserverFactory(destClient, destCollInfo);
        ChangeFeedEventHost host = new ChangeFeedEventHost(hostName, documentCollectionLocation, leaseCollectionLocation, feedOptions, feedHostOptions);
        await host.RegisterObserverFactoryAsync(docObserverFactory);
        await host.UnregisterObserversAsync();
}

Complete code you will find here which shows step 1 & 2 and all other steps.

The best option to read the change feed of your collection is to use server less function of Azure. Now Azure functions and Cosmos have native integration.

Advertisements

How focused are you?

Your focus is the most precious thing you have. To be able to focus on one task at a time is the most important skill you need to have.

However, today we have more distractions than ever before. I wrote an application to measure how much time I spend focused on doing one task.  Here is a screen shot of my activity, it tells me how much my mind is thrashing,  and I am jumping from one application to another.

My aim is to see big block of colors on this application which tells me If I was focused on one thing or not. Too many different color lines means I am doing too much time-slicing and I am distracted.

flower

Every 10 sec, this application is making a log of my activity on my machine, and it shows an activity graph with different colors of lines per activity. This is a very simple application, It just log my activity on my PC. Code is on GitHub and you can download this application here.

Here are list of things, you can do to be less distracted:

  • Turn off all the notification on the PC and phone
  • Absolutely, you must turn off email. It is a productivity killer. I check my mail every few hours.
  • Don’t check twitter, FB and other social sites too often.
  • One most annoying distraction is when you open a web page to do something, and it by default start showing you todays news … and you end up reading the news and realize after 15 min that you open the web page to do something important. Turn off the news on default new page.
  • I know Slack, Team, Skype and communicator all are important part of your work, but I say you should turn them off too to avoid being distracted.

Webpack tutorial for beginners

Learn Webpack basics. In this screen cast we start with plain vanilla JavaScript, and then convert the code to different modules AMD and then ES6 module. And on every step of the way we will see how webpack makes our life easy and how seamlessly it works with different module system.

We will also learn how webpack is not only good for module bundling but how it can be extended for CSS, SASS and Lint etc.

webpack

Humans of Microsoft – Edward Sproull

Here is the second episode of Humans of Microsoft.

HOM_Ed.PNG

Let’s meet the man who inspired this show  “Humans of Microsoft”.

When I first come to know about Edward Sproull, I thought I must bring his story to more people, it is such an inspiring and positive story.
Ed Sproull was 27 when he woke in the hospital, disoriented and hung over. He felt a searing pain in his left leg, and suddenly the memory of the motorcycle accident came flooding back to him. He had been racing drunk when the brake line blew out, and he clipped his leg on an oncoming car. His upper leg had been fractured in 80 places, and his lower leg had been severed below the knee.
Losing a limb was a low point in Ed’s life, but he didn’t hit rock bottom until 10 years later, when he landed in prison on a six-year sentence for drug possession. As he lay on a prison cot that first night, using a toilet paper roll as a pillow, he wondered where his life went off the rails.
Ed couldn’t see a way out of this one; his future felt dim. That’s when he met the professor.
Let’s find out Ed’s journey from Prison to Microsoft.
Watch it here.

Humans of Microsoft: Jeffrey Snover

Jeffrey Snover, is a well-known figure of Microsoft. He is the inventor of PowerShell, Architect of Windows server and a Technical Fellow of Microsoft. In this show, we will go beyond the technology and try to meet the human behind the window server and PowerShell. We hope, in this personal and candid interview we will get inspired by his personal story and learn one or two things in the process. Let’s start the conversation and see what Jeffrey’s last tweet will be?

HJOM_JS.PNG

Watch it here.

Parsing Akamai logs using Azure HD Insight Spark Cluster.

You have seen many videos on Hadoop/Spark cluster, where a ubiquitous example for map reduce is used of counting the words “Banana” from a clean text files. But, in real-life your log files are not this clean, and they are not on cluster itself. Clusters are expensive affairs, so how do you programmatically create cluster and automate your processing?

Here is a presentation about developing a real-life application using Spark cluster. In this presentation, we will parse Akamai logs kept on an Azure storage. We will introduce some of the tools available. After having the script run in Jupyter notebook, we will automate the solution which can be started by a call to an endpoint.

Watch it here: https://channel9.msdn.com/Blogs/Beyond-Hello-World/Parsing-Akamai-logs-using-Azure-HD-Insight-Spark-Cluster

 

akamaiazurehdinsightsparkcluster_9601

Thinking, fast and slow

414_3402

In “Thinking, fast and slow”, Nobel prize winner, Daniel Kahneman talks about our mind. To explain our mind inner workings, he defines two actors, he divides our mind into two systems, “System 1” and “System 2”. “System 1” is fast, instinctive and emotional; It operates automatically and quickly, with little or no effort and no sense of voluntary control. “System 2” is slower, more deliberative, and more logical.

He describes System 1 as effortlessly originating impressions and feelings that are the main sources of our beliefs and choices. He describes System 1 with some examples of automatic activities:

  • Detect that one object is more distant than another.
  • Orient to the source of a sudden sound.
  • Detect hostility in a voice.
  • Answer to 2 + 2 = ?
  • Drive a car on an empty road.

System 2 require attention and get disrupted when attention is drawn away. Here are some examples:

  • Focus on the voice of a particular person in a crowded and noisy room.
  • Monitor the appropriateness of your behavior in a social situation.
  • Count the occurrences of the letter a in a page of text.
  • Tell someone your phone number.
  • Park in a narrow space
  • Fill out a tax form.
  • Check the validity of a complex logical argument.

Systems 1 and 2 are both active whenever we are awake. System 1 runs automatically and requires very less effort and energy from you. System 2 is expensive, slow and needs much  more effort and burn most of the glucose in your brain. It is hard to activate System 2. When all goes smoothly, which is most of the time, System 2 adopts the suggestions of System 1 with little or no modification. You generally believe your impressions and act on your desires.

When System 1 runs into difficulty, it calls on System 2 for help. System 2 is activated when System 1 does not offer an answer, e.g. what is 17 × 24?

System 2 is also credited with the continuous monitoring of your own behavior. You need System 2 for self-control. That’s why you need attention and effort for self control.

Active mind engages System 2 more often. However, majority of people, 50% of students at Harvard, MIT and Princeton avoided to activate their System 2. The number goes to 80% in less selective universities. Now, you can imagine where does the general population must  be standing.

You can raise your intelligence by improving the control of attention.

In other words, to be mentally active, to be intelligent means you are  engaging your System 2 more often than the norm.

Specially at work, where you are hard pressed for time, people are talking, communicators popping up messages, continuous mails are flowing in  and you need to reply quick. You are surviving because of your System 1, but in a hurry you may be making many mistakes too.

Now the question remains how to engage System 2 more often and rely less on System 1?

One easy technique, I found is for any issue at work, any design decisions, you quietly open a note book in your mind, divide the page  into two columns, “pro” and “cons”, and start listing the pros and cons of any issue. This act will actually activate your System 2. This will often makes you slow, but the outcome will be much better.

If system 2 is so important, then how can I keep it in good shape and lubricated the machinery to activate it more often with ease?

I think, to keep System 2 engage continuously is to remain mindful.

Continuously keep watching yourself.  Keep watching your thoughts. To keep System 2 healthy – meditate daily!

The easier for you to activate your System 2 the better you will be.

System 2 makes you thoughtful and intelligent.

 

Continue reading