Papernotes: Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm

Oops, I forgot to save my notes as a draft. I will come back to this later since it seems to be a very important paper, and one I found hard to grasp.

These are my notes on the legendary paper by Dawid and Skene.


Someone has implemented Dawid and Skene’s example problem (patients), with excellent comments.

Another implementation is available on pypi.

And DS is so useful that someone tried to offer DS as a service! Though it doesn’t seem to be working now.
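The heart of the paper is a simple EM loop, which can be sketched in a few lines of NumPy. This is my own minimal sketch, not the authors’ code: it assumes every annotator labels every item (the paper also handles missing labels) and initialises the E-step with majority-vote proportions.

```python
import numpy as np

def dawid_skene(labels, n_classes, n_iter=50):
    """Minimal Dawid-Skene EM sketch (dense case: every annotator
    labels every item).
    labels: (n_items, n_annotators) array of class ids.
    Returns the posterior over true labels and per-annotator
    confusion matrices pi[a, true, observed]."""
    n_items, n_annot = labels.shape
    # Initialise soft labels with majority-vote proportions.
    T = np.zeros((n_items, n_classes))
    for a in range(n_annot):
        T[np.arange(n_items), labels[:, a]] += 1
    T /= T.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # M-step: class marginals and confusion matrices from soft labels.
        p = T.mean(axis=0)
        pi = np.zeros((n_annot, n_classes, n_classes))
        for a in range(n_annot):
            for k in range(n_classes):
                pi[a, :, k] = T[labels[:, a] == k].sum(axis=0)
        pi /= pi.sum(axis=2, keepdims=True) + 1e-12
        # E-step: posterior over true labels under current parameters.
        logT = np.tile(np.log(p + 1e-12), (n_items, 1))
        for a in range(n_annot):
            logT += np.log(pi[a][:, labels[:, a]].T + 1e-12)
        T = np.exp(logT - logT.max(axis=1, keepdims=True))
        T /= T.sum(axis=1, keepdims=True)
    return T, pi
```

Each iteration alternates an M-step (re-estimate class marginals and per-annotator confusion matrices from the current soft labels) and an E-step (re-estimate the posterior over true labels).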

Paper Notes: Discovering Human Places of Interest from Multimodal Mobile Phone Data

Notes from reading the below paper.


A framework to discover places of interest (POIs).

Two levels of clustering are used.

First, user location points are grouped using a time-based clustering
technique which discovers stay points while dealing with missing
location data. The second level performs clustering on the stay
points to obtain stay regions. A grid-based clustering algorithm
has been used for this purpose.


1. Location Point – GPS coordinates along with a timestamp, e.g. [46.6N, 6.5E], [16:34:57]

2. Stay Point – a cluster of location points. Represented using the coordinates of the centroid of the cluster and the time moments when the user arrived at and left the stay, e.g. [46.6N, 6.5E], [16:34:57], [17:50:12]

3. Stay Region – a cluster of stay points (from several days) with the same semantic meaning. Represented using the coordinates of the centroid of the cluster and the minimum and maximum coordinates of the stay points in the cluster, e.g. [46.6N, 6.5E], [46.595N, 46.599N], [6.498E, 6.502E]. Hence a stay region can be represented by a rectangle centered at the centroid of the cluster whose size depends on the min and max coordinates.

(In this paper, stay region and place of interest are synonymous.)

Fingerprint-based methods for location point detection:

These use data from GSM and WiFi sensors, as opposed to geometry-based methods that use GPS sensors. However, the exact location cannot be obtained.
1. BeaconPrint

2. PlaceSense

Time based method for stay points:

Dmax – 200 to 300 metres
Tmin – time at a location – 20 to 40 minutes
Tmax – time between locations
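As I understand it, the first-level clustering can be sketched like this. This is my own simplification using Dmax and Tmin as above: planar coordinates in metres and timestamps in seconds, ignoring the Tmax handling of missing location data.

```python
import math

def stay_points(points, d_max=250.0, t_min=20 * 60):
    """Time-based stay-point detection sketch.
    points: list of (x, y, t), planar coordinates in metres, t in seconds.
    Emits (cx, cy, t_arrive, t_leave) whenever the user remains within
    d_max of an anchor point for at least t_min seconds."""
    result = []
    i, n = 0, len(points)
    while i < n:
        j = i + 1
        while j < n:
            dist = math.hypot(points[j][0] - points[i][0],
                              points[j][1] - points[i][1])
            if dist > d_max:
                break
            j += 1
        # points[i:j] all lie within d_max of the anchor points[i]
        if points[j - 1][2] - points[i][2] >= t_min:
            xs = [p[0] for p in points[i:j]]
            ys = [p[1] for p in points[i:j]]
            result.append((sum(xs) / len(xs), sum(ys) / len(ys),
                           points[i][2], points[j - 1][2]))
            i = j
        else:
            i += 1
    return result
```

A stay point is thus summarised by its centroid plus arrival and departure times, matching the representation in the definitions above.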

Clustering algorithms for stay regions:

To estimate stay regions from stay points, a couple of clustering algorithms are used.
1. k-means or variants

2. density based method (DBSCAN) – disadvantage: merges stay points with different semantic meanings into the same cluster.

3. grid based method (this performs the best)
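A rough sketch of grid-based clustering of stay points. This is my own simplification, not the paper’s exact algorithm: snap stay points to square cells and flood-fill occupied neighbouring cells into regions; the cell size and density threshold are assumed parameters.

```python
from collections import defaultdict

def grid_stay_regions(stay_points, cell=300.0, min_points=2):
    """Grid-based stay-region sketch: bucket stay points into square
    grid cells, keep cells with at least min_points, and merge
    8-connected dense cells into regions via flood fill."""
    cells = defaultdict(list)
    for x, y in stay_points:
        cells[(int(x // cell), int(y // cell))].append((x, y))
    dense = {c for c, pts in cells.items() if len(pts) >= min_points}
    regions, seen = [], set()
    for c in dense:
        if c in seen:
            continue
        stack, region = [c], []
        while stack:
            cur = stack.pop()
            if cur in seen or cur not in dense:
                continue
            seen.add(cur)
            region.extend(cells[cur])
            cx, cy = cur
            stack.extend((cx + dx, cy + dy)
                         for dx in (-1, 0, 1) for dy in (-1, 0, 1))
        regions.append(region)
    return regions
```

Each resulting region can then be summarised by its centroid and min/max coordinates, i.e. the rectangle representation described above.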

Paper Notes: Ranking annotators for crowdsourced labeling tasks

Notes from reading the below paper by Raykar et al.

1. To get quality labels, requesters typically get the same instance labeled by multiple annotators and use majority voting or other methods to consolidate the annotations.

2. This paper ranks annotators by how spammer-like they are, using a scalar metric: spammers score close to 0, good annotators close to 1.

3. This score is different from the accuracy computed from confusion rate matrix.

Annotator Model:

For example, this paper illustrates with a binary classification model. If the ground truth is 1, the sensitivity (alpha) is defined as the probability that the annotator labels it as 1. If the ground truth is 0, the specificity (beta) is defined as the probability that the annotator labels it as 0.

Spammer: The probability of the observed label is the same irrespective of the true label, i.e. alpha = 1 – beta. A perfect spammer has alpha + beta – 1 = 0, which corresponds to the diagonal line on the ROC curve.

Malicious annotator: if alpha + beta – 1 < 0, then the annotator is a malicious one. Hence the spammer score is S = (alpha + beta – 1)². A spammer has S = 0. A perfect annotator has S = 1.

Accuracy: The above spammer score is different from accuracy. An annotator with high accuracy is a good annotator but a low accuracy annotator is not necessarily a spammer. A malicious annotator may have high accuracy if we flip their labels.

A spammer provides no information in updating the log odds, and hence does not contribute in the estimation of the actual true label.
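The score itself is one line; the sketch below just illustrates the three annotator types discussed above (the example alpha/beta values are mine):

```python
def spammer_score(alpha, beta):
    """Spammer score S = (alpha + beta - 1)^2 for a binary annotator
    with sensitivity alpha and specificity beta."""
    return (alpha + beta - 1) ** 2

print(spammer_score(1.0, 1.0))  # perfect annotator -> 1.0
print(spammer_score(0.5, 0.5))  # random spammer -> 0.0
print(spammer_score(0.0, 0.0))  # malicious, but informative if flipped -> 1.0
```

Note the last case: a malicious annotator also gets S = 1, since flipping their labels makes them perfect, which is exactly why S differs from accuracy.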

Spammer score for categorical labels:

For multinomial categories, the term alpha(k, c) is the probability that the annotator assigns an instance to category k when the true label is c. In other words, a spammer assigns labels without regard to whether the true label is c or c′ – all rows of their confusion matrix are identical.

Spammer score for ordinal labels:

Note: a restaurant rating from 1 to 5 is ordinal. A spammer score can be given to ordinal labels just as in binary classification, except that we give a score to each of the ordinals.


The experiments used the Expectation Maximization algorithm. The annotator model parameters are estimated using EM and then used to calculate the spammer score. The authors reported 95% confidence intervals for the spammer score of each annotator.

Paper Notes: Learning from Crowds

Notes from the above named paper by Raykar et al.


We do not have the ground truth, namely the actual annotation for an instance. We instead have multiple subjective annotations for each instance.

Contributions of this paper:

1. How to adapt conventional supervised learning algorithms when we have multiple annotators providing subjective labels but no objective gold standard.

2. How to evaluate systems when we do not have an absolute gold standard.

3. How to estimate how reliable or trustworthy each annotator is.

Problem with Majority voting:

Imagine there is an expert and a number of novices. Majority voting will favor the novices. A solution is to give annotators weights. Still, how do we measure the performance of the expert when there is no gold standard?
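The weighting idea can be sketched as follows; the weights here are hypothetical (an expert given weight 3 against two novices with weight 1):

```python
from collections import Counter

def weighted_majority(labels, weights):
    """Weighted majority vote: each annotator's label counts
    with that annotator's weight."""
    tally = Counter()
    for label, w in zip(labels, weights):
        tally[label] += w
    return tally.most_common(1)[0][0]

print(weighted_majority([1, 0, 0], [3.0, 1.0, 1.0]))  # 1: the expert wins
print(weighted_majority([1, 0, 0], [1.0, 1.0, 1.0]))  # 0: plain majority favors the novices
```

The open question above remains: where do the weights come from without a gold standard? That is what the EM approach below addresses.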

Maximum Likelihood Estimator – jointly learns the classifier, the annotator accuracies, and the true labels.

1. Performance of each annotator is measured in terms of sensitivity and specificity with respect to the unknown gold standard.

2. Algorithm to automatically discover best experts and assign higher weights to them.

3. To incorporate prior knowledge about an expert, impose a beta prior on the sensitivity and specificity and derive the maximum a posteriori estimate.

4. Final estimation by an Expectation maximization algorithm.

Note 1: Can handle missing labels (each expert is not required to label each instance).

Note 2: Proposed for logistic regression, but it can handle categorical, ordinal and regression problems.

Two coin model for annotators:

The assumption is that sensitivity and specificity do not depend on the instance x.
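Under the two-coin model, the M-step of the paper’s EM algorithm updates each annotator’s sensitivity and specificity from the current posterior mu_i = P(true label = 1). A minimal sketch of just that update (no beta prior):

```python
import numpy as np

def two_coin_m_step(Y, mu):
    """M-step of the two-coin annotator model:
    alpha_j = sum_i mu_i * y_ij / sum_i mu_i                    (sensitivity)
    beta_j  = sum_i (1 - mu_i) * (1 - y_ij) / sum_i (1 - mu_i)  (specificity)
    Y: (n_items, n_annotators) binary labels; mu: posterior P(true = 1)."""
    alpha = (mu[:, None] * Y).sum(axis=0) / mu.sum()
    beta = ((1 - mu)[:, None] * (1 - Y)).sum(axis=0) / (1 - mu).sum()
    return alpha, beta
```

In the full algorithm this alternates with an E-step that recomputes mu from the classifier output and the annotators’ alpha and beta.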

Software System Design – reading list

1. The NoSQL Ecosystem

2. Kafka documentation

3. Questioning the Lambda Architecture

4. Implementing a Real-time, Sliding-Window Application Using Amazon Kinesis and Apache Storm

5. How to crawl a quarter billion webpages in 40 hours.


tl;dr REST is a lightweight alternative to RPC and SOAP.

I’m reading about REST APIs. These are my notes from reading about them.

I also liked the following series of YouTube videos as an introduction to RESTful web service APIs.

REST architecture components:

1. Resources – identified by logical URLs.

2. Web of resources – a single resource should not be large, but instead contain links to other resources.

3. Client server

4. No connection state – interaction is stateless

5. Resources should be cacheable. Use HTTP cache-control headers.

6. There can be proxy servers.
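A minimal sketch of a cacheable, stateless resource using only the Python standard library. The `/status` resource and its JSON body are hypothetical examples of mine, not from any source I read:

```python
import http.client
import http.server
import threading

class ResourceHandler(http.server.BaseHTTPRequestHandler):
    """Serves a single hypothetical /status resource."""

    def do_GET(self):
        body = b'{"status": "ok"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Cache-Control", "max-age=60")  # cacheable for 60 s
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

server = http.server.HTTPServer(("127.0.0.1", 0), ResourceHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Stateless interaction: the request carries everything the server needs.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/status")
resp = conn.getresponse()
body = resp.read()
server.shutdown()
```

The Cache-Control header is what lets clients and proxy servers (component 6) cache the resource without extra coordination.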

Disabling 3D unity desktop on Ubuntu 14.04 LTS

I have started using Ubuntu again after 2.5 years. I started using a new installation of 14.04 LTS on a somewhat old box.

But I had forgotten to disable the 3D Unity desktop and suffered for a few weeks. The default Ubuntu installation no longer provides the option to choose a desktop during login. But the instructions in the below links helped.

I highly recommend disabling 3D unity desktop. The performance difference is staggering.

Approximate Stream Algorithms

The following paper by Motwani et al. gives a couple of algorithms for computing frequency counts exceeding a user-specified threshold over data streams.

1. Sticky sampling.

2. Lossy counting.

Lossy Counting:

bucket width w = ceil (1 / epsilon)

fe = frequency of element e in the stream, whose current length is N.

bcurrent = ceil (N / w)

We store entries (e, f, delta) in a data structure D, where e is the element, f is its frequency and delta is the maximum error in f.

1. Initially D is empty.

2. When a new item arrives, if it’s already in D, we increment its f by 1. Otherwise, we create a new entry (e, 1, bcurrent – 1). We also prune D by deleting some entries at bucket boundaries (N congruent to 0 mod w): delete an entry if f + delta <= bcurrent.

3. When the user requests a list of items with threshold s, we output the entries in D whose frequency f >= (s – epsilon)N.
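The three steps can be sketched in Python. This is my own rendering of the description above, with D as a plain dict:

```python
import math

def lossy_count(stream, epsilon):
    """Lossy Counting sketch: D maps element -> (f, delta)."""
    w = math.ceil(1 / epsilon)  # bucket width
    D = {}
    N = 0
    for e in stream:
        N += 1
        b_current = math.ceil(N / w)
        if e in D:
            f, delta = D[e]
            D[e] = (f + 1, delta)
        else:
            D[e] = (1, b_current - 1)
        if N % w == 0:  # bucket boundary: prune low-count entries
            for key in list(D):
                f, delta = D[key]
                if f + delta <= b_current:
                    del D[key]
    return D, N

def frequent_items(D, N, s, epsilon):
    """Entries whose stored frequency reaches (s - epsilon) * N."""
    return {e for e, (f, delta) in D.items() if f >= (s - epsilon) * N}
```

The pruning is what bounds memory: an element must keep accumulating count faster than the bucket index grows to survive in D.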


The above paper led to other algorithms, including the one by Eric Demaine et al.

Algorithm Frequent

1. Initialize the counters to zero.

2. For each element in the stream:

a. If the current element is not monitored by any counter and some counter is zero, define the current element to be the monitored element of that counter.

b. If the current element is the monitored element of a counter, increment the counter. Otherwise decrement every counter.
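The steps above can be sketched with the counters held in a dict. With k counters, every element occurring more than N/(k+1) times in a stream of length N is guaranteed to survive as a candidate:

```python
def frequent(stream, k):
    """Algorithm Frequent (Misra-Gries) with k counters.
    Returns a dict of candidate heavy hitters and their counter values."""
    counters = {}
    for e in stream:
        if e in counters:
            counters[e] += 1          # step 2b: monitored element
        elif len(counters) < k:
            counters[e] = 1           # step 2a: claim a free counter
        else:
            # step 2b: not monitored and no free counter -> decrement all
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters
```

The counts returned are underestimates, so a second pass over the stream is needed if exact frequencies of the candidates are required.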


The related “Britney Spears problem” which the above two papers solve is discussed in the below article.
