Lambda Architecture implementation using Apache Storm, Hadoop and HBase to perform Twitter real-time image processing analysis.

Alessioventuri/TwitterRealtimeImageProcessing

Lambda Architecture for Twitter Realtime Image Processing

Description

Lambda Architecture implementation using Apache Storm, Hadoop and HBase to perform Twitter real-time image processing analysis. The goal of this project is to find the most representative images among those collected from Twitter for a given keyword.
To find these representative images, we use the K-Means algorithm.
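
As a rough illustration of how K-Means can yield "representative" images, the sketch below clusters feature vectors in memory and then picks, for each center, the input vector closest to it. This is only a minimal single-machine sketch: the project itself runs K-Means on Hadoop, and the class name `KMeansSketch`, its methods, and the toy 2-dimensional data are all hypothetical and not taken from the repository.

```java
import java.util.Arrays;

public class KMeansSketch {

    // Squared Euclidean distance between two vectors of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Index of the center closest to the given point.
    static int nearestCenter(double[] point, double[][] centers) {
        int best = 0;
        for (int c = 1; c < centers.length; c++) {
            if (squaredDistance(point, centers[c]) < squaredDistance(point, centers[best])) {
                best = c;
            }
        }
        return best;
    }

    // Runs a fixed number of assign/update iterations from the given initial centers.
    static double[][] fit(double[][] points, double[][] initialCenters, int iterations) {
        int k = initialCenters.length;
        int dim = points[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = initialCenters[c].clone();
        }
        for (int it = 0; it < iterations; it++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            // Assignment step: add each point to the running sum of its nearest center.
            for (double[] p : points) {
                int c = nearestCenter(p, centers);
                counts[c]++;
                for (int i = 0; i < dim; i++) {
                    sums[c][i] += p[i];
                }
            }
            // Update step: move each non-empty center to the mean of its points.
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous center
                }
                for (int i = 0; i < dim; i++) {
                    centers[c][i] = sums[c][i] / counts[c];
                }
            }
        }
        return centers;
    }

    // For each center, the index of the input point closest to it.
    static int[] representatives(double[][] points, double[][] centers) {
        int[] reps = new int[centers.length];
        for (int c = 0; c < centers.length; c++) {
            int best = 0;
            for (int p = 1; p < points.length; p++) {
                if (squaredDistance(points[p], centers[c]) < squaredDistance(points[best], centers[c])) {
                    best = p;
                }
            }
            reps[c] = best;
        }
        return reps;
    }

    public static void main(String[] args) {
        // Toy data standing in for image feature vectors.
        double[][] points = { {0, 0}, {0, 2}, {1, 1}, {10, 10}, {10, 12}, {11, 11} };
        double[][] centers = fit(points, new double[][] { {0, 0}, {10, 10} }, 10);
        System.out.println(Arrays.toString(representatives(points, centers)));
    }
}
```

Returning the nearest real point rather than the center itself matters here because a center is an average of feature vectors and does not correspond to any actual image.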

Link to the whole paper

Dependencies

Pre-requisites

HBase, Storm and Hadoop must be installed and configured correctly on your pseudo-distributed cluster.

Usage

To collect the images from Twitter, you need your own Twitter Developer credentials, which you insert into a .txt file. An example, FakeCredential.txt, is included in the project.
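
The layout of the credentials file is not described here, so the following is only a sketch under a loud assumption: that the file holds `key=value` lines for the four usual OAuth values. The key names (`consumerKey`, `consumerSecret`, `accessToken`, `accessTokenSecret`) and the class `CredentialSketch` are hypothetical; check FakeCredential.txt for the format the project actually expects.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class CredentialSketch {

    // Hypothetical key names; the real file may use different ones.
    static final String[] REQUIRED_KEYS = {
        "consumerKey", "consumerSecret", "accessToken", "accessTokenSecret"
    };

    // Parses key=value lines and fails fast if a required entry is missing or blank.
    static Properties load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                throw new IllegalArgumentException("Missing credential: " + key);
            }
        }
        return props;
    }

    public static void main(String[] args) {
        // Placeholder values only; never commit real credentials.
        String example = "consumerKey=AAA\n"
                + "consumerSecret=BBB\n"
                + "accessToken=CCC\n"
                + "accessTokenSecret=DDD\n";
        Properties creds = load(new StringReader(example));
        System.out.println(creds.getProperty("consumerKey"));
    }
}
```

To read an actual file instead of the inline string, a reader from `java.nio.file.Files.newBufferedReader` can be passed to `load`.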

Start in this order:

  • Hadoop
  • HBase
  • Storm (in my case, it starts automatically from Eclipse)

After that, you can finally execute in this order:

  • TwitterRealTimeImageProcessing.java
    • Pass the keyword in the arguments list.
  • HadoopDriver.java
    • Choose appropriate values for the number of centers and the threshold, and a file to write the centers to.
    • With the CEDD descriptor, each image yields a 144-dimensional vector.
    • If you want to change the descriptor, also change FeatureExtractorCEDD.java and every place that depends on it.
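
How HadoopDriver.java uses the threshold is not spelled out above. A common reading, assumed here, is that iterations stop once no center has moved by more than the threshold between two consecutive rounds. The sketch below shows that check on 144-dimensional vectors; `ConvergenceSketch`, `hasConverged` and `CEDD_DIMENSIONS` are hypothetical names, not code from the project.

```java
public class ConvergenceSketch {

    // Vector length stated for the CEDD descriptor.
    static final int CEDD_DIMENSIONS = 144;

    // Euclidean distance between two vectors of the same length.
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // True when every center moved by at most `threshold` since the previous iteration.
    static boolean hasConverged(double[][] previous, double[][] current, double threshold) {
        if (previous.length != current.length) {
            throw new IllegalArgumentException("Number of centers changed between iterations");
        }
        for (int c = 0; c < previous.length; c++) {
            if (distance(previous[c], current[c]) > threshold) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two centers; only the first component of the first center moves, by 0.05.
        double[][] previous = new double[2][CEDD_DIMENSIONS];
        double[][] current = new double[2][CEDD_DIMENSIONS];
        current[0][0] = 0.05;

        System.out.println(hasConverged(previous, current, 0.1));  // shift is below the threshold
        System.out.println(hasConverged(previous, current, 0.01)); // shift is above the threshold
    }
}
```

Under this reading, a smaller threshold means more iterations and more stable centers, while a larger one ends the job sooner. Swapping the descriptor would change the vector length, which is one reason the dimension is kept in a single constant in this sketch.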

When K-Means finishes, an HTML page shows the results obtained.

Tips

The whole project was run on Ubuntu 20.04.2 LTS.
