Building govpedia.in

Let's begin!

Consuming tweet stream:

We need to consume public streams from Twitter using its Streaming API.

https://dev.twitter.com/streaming/overview

Twitter provides an HTTP client, hbc (Hosebird Client), for listening to the Streaming API.

https://github.com/twitter/hbc

The SampleStreamExample is good enough for a first cut. I just added a properties file instead of passing the credentials on the command line.

https://github.com/twitter/hbc/tree/master/hbc-example/src/main/java/com/twitter/hbc/example
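The properties-file change can be sketched roughly as below. This is a minimal sketch, not the exact code I used: the class name `TwitterCredentials` and the four property keys are assumptions made for illustration, and it relies only on `java.util.Properties` from the standard library.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical holder for the four OAuth values the example needs.
class TwitterCredentials {
    // Assumed key names; use whatever names your properties file defines.
    static final String[] KEYS = {
        "consumerKey", "consumerSecret", "accessToken", "accessTokenSecret"
    };

    final String consumerKey;
    final String consumerSecret;
    final String accessToken;
    final String accessTokenSecret;

    private TwitterCredentials(Properties props) {
        this.consumerKey = props.getProperty(KEYS[0]).trim();
        this.consumerSecret = props.getProperty(KEYS[1]).trim();
        this.accessToken = props.getProperty(KEYS[2]).trim();
        this.accessTokenSecret = props.getProperty(KEYS[3]).trim();
    }

    // Reads key=value pairs and fails fast if any credential is missing or blank.
    static TwitterCredentials load(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String key : KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                throw new IllegalArgumentException("Missing property: " + key);
            }
        }
        return new TwitterCredentials(props);
    }
}
```

In practice the reader would be something like `new FileReader("twitter.properties")` (the file name is also an assumption), and the four fields replace the four command-line arguments that SampleStreamExample reads. Keeping the file out of version control avoids leaking the secrets.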

Now we have to parse the JSON streamed by the above code. Or, as it turns out, we can index JSON documents directly in Elasticsearch.
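Either way, the raw messages first have to come off the queue that the hbc client fills. In SampleStreamExample that queue is a `BlockingQueue<String>`, so a consumer loop might look like the sketch below. The class name `TweetQueueDrainer` and its parameters are hypothetical, and the handler is just a placeholder for whatever parsing or indexing step comes next.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper: pulls raw JSON strings off the queue and hands them on.
class TweetQueueDrainer {
    // Polls until maxMessages have been handled or nothing arrives within
    // timeoutMillis. Returns the number of messages passed to the handler.
    static int drain(BlockingQueue<String> queue, int maxMessages,
                     long timeoutMillis, Consumer<String> handler) {
        int handled = 0;
        try {
            while (handled < maxMessages) {
                String msg = queue.poll(timeoutMillis, TimeUnit.MILLISECONDS);
                if (msg == null) {
                    break; // timed out: the stream is idle or has stopped
                }
                String trimmed = msg.trim();
                if (trimmed.isEmpty()) {
                    continue; // defensively skip blank messages
                }
                handler.accept(trimmed);
                handled++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        return handled;
    }
}
```

A call such as `TweetQueueDrainer.drain(queue, 1000, 5000, json -> System.out.println(json))` would print up to 1000 messages, giving up after five idle seconds. Passing the handler in keeps the loop unchanged when printing is later swapped for indexing.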

Indexing tweets in Elasticsearch:

Install Elasticsearch!

https://intercityup.com/blog/installing-elasticsearch-mac-os-x-10-9-mavericks-development.html

Preliminary tests in Elasticsearch.

http://www.elasticsearchtutorial.com/elasticsearch-in-5-minutes.html
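The same kind of preliminary test can be done from Java by POSTing a raw JSON string to the REST API. The sketch below uses only the JDK's `HttpURLConnection`; the class name `RawJsonIndexer`, the index name `tweets`, and the type `tweet` are assumptions for illustration, as is a node listening on `http://localhost:9200`. Older Elasticsearch versions use an `index/type` path, while newer ones expect `_doc` in place of the type, so check this against the version you installed.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: indexes one raw JSON document over plain HTTP.
class RawJsonIndexer {
    // Builds the collection URL, e.g. http://localhost:9200/tweets/tweet
    static String indexUrl(String baseUrl, String index, String type) {
        String base = baseUrl.endsWith("/")
            ? baseUrl.substring(0, baseUrl.length() - 1)
            : baseUrl;
        return base + "/" + index + "/" + type;
    }

    // POSTs the document so that Elasticsearch assigns the id.
    // Returns the HTTP status code of the response.
    static int index(String url, String json) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/json");
            byte[] body = json.getBytes(StandardCharsets.UTF_8);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

With a node running, `RawJsonIndexer.index(RawJsonIndexer.indexUrl("http://localhost:9200", "tweets", "tweet"), json)` should come back with a 2xx status if the document was accepted. This is only meant for poking at the server; a proper client library handles connection reuse and error responses far better.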

Java client for Elasticsearch:

There seem to be a number of Java clients available for talking to Elasticsearch. Though the native client is highly recommended, at first look I like the Jest client.

https://www.elastic.co/blog/found-java-clients-for-elasticsearch

Jest client: https://github.com/searchbox-io/Jest/tree/master/jest
