Software System Design – reading list

1. The NoSQL Ecosystem

http://www.aosabook.org/en/nosql.html

2. Kafka documentation

http://kafka.apache.org/documentation.html

3. Questioning the Lambda Architecture

http://radar.oreilly.com/2014/07/questioning-the-lambda-architecture.html

4. Implementing a Real-time, Sliding-Window Application Using Amazon Kinesis and Apache Storm

http://d0.awsstatic.com/whitepapers/building-sliding-window-analysis-of-clickstream-data-kinesis.pdf

5. How to crawl a quarter billion webpages in 40 hours.

http://www.michaelnielsen.org/ddi/how-to-crawl-a-quarter-billion-webpages-in-40-hours/

REST APIs

tl:dr REST is a lightweight alternative to RPC and SOAP.

I’m reading about REST APIs. These are my notes from reading http://rest.elkstein.org/

I also liked the following series of youtube videos for an introduction to RESTful webservice apis.

https://www.youtube.com/watch?v=xkKcdK1u95s

Rest Architecture components:

1. Resources – identified by logical URLs.

2. Web of resources – a single resource should not be large, but instead contain links to other resources.

3. Client server

4. No connection state – interaction is stateless

5. Resources should be cachable. Use HTTP cache control headers.

6. There can be proxy servers.

Disabling 3D unity desktop on Ubuntu 14.04 LTS

I have started using Ubuntu again after 2.5 years. I started using a new installation of 14.04 LTS on a somewhat old box.

But I had forgotten to disable 3D unity desktop and suffered for few weeks. Ubuntu default installation no longer provides the option to choose desktop during login. But the instructions in the below links helped.

https://sites.google.com/site/easylinuxtipsproject/first

https://sites.google.com/site/easylinuxtipsproject/fatalmistakes

https://sites.google.com/site/easylinuxtipsproject/speed

I highly recommend disabling 3D unity desktop. The performance difference is staggering.

Approximate Stream Algorithms

The following paper by Motwani et al gives couple of algorithms for computing frequency counts exceeding a user specified threshold over data streams.

1. Sticky sampling.

2. Lossy counting.

http://www.vldb.org/conf/2002/S10P03.pdf

Lossy Counting:

bucket width w = ceil (1 / epsilon)

fe = frequency of element e in the steam, whose current count is N.

bcurrent = ceil (N / w)

We store the following (e, f, delta) in Data structure D. where e is element, f is freq and delta is maximum error in f.

1. Initially D is empty.

2. When a new item arrives, if its already in D, we increment its f by 1. Else, we create a new entry (e, f, bcurrent -1). We also prune D by deleting some entries at bucket boundaries (N congruent 0 mod w). Delete if f + delta <= bcurrent.

3. When user requests a list of items, with thresold s, we output entries in S, whose freq f is >= (s – epsilon)N.

****

The above paper led to other algorithms including the one by Eric Demaine et al.

http://erikdemaine.org/papers/NetworkStats_TR2002/paper.pdf

Algorithm Frequent

1. Initialize the counters to zero.

2. For each element in the stream:

a. If the current element is not monitored by any counter and some counter is zero, define the current element to be the monitored element of that counter.

b. If the current element is the monitored element of a counter, increment the counter. Otherwise decrement every counter.

****

The related “Britney Spears problem” which the above two papers solve is discussed in the below article.

http://www.americanscientist.org/issues/pub/the-britney-spears-problem/99999

Create a free website or blog at WordPress.com.

Up ↑