Serverless Pipelines in Lithops

This document summarizes serverless benchmarks and pipelines that are useful for measuring the performance of serverless frameworks such as Lithops. We call each parallel invocation of a set of cloud functions a stage.

Benchmark	Description	Language	Stages	Data set	Data format	LOC
General
FLOPS Computation Test	Analyze Lithops performance in FLOPS.	Python3.10	1	Autogenerated	NumPy array	63
Object Storage Test	Measure the bandwidth from a computation backend to storage.	Python3.10	2	Autogenerated	Bytes	286
Montecarlo	Monte Carlo methods for computations over large amounts of stochastic data.	Python3.10	1	Autogenerated	NumPy array	46
Mandelbrot classic	Mandelbrot set calculated at increasing resolution.	Python3.10	7	Autogenerated	Mandelbrot set	109
Machine Learning
Hyperparameter tuning	Hyperparameter tuning using the grid search algorithm.	Python3.10	1	Amazon customer reviews	ft.txt	111
Geospatial
NDVI	Calculate NDVI from Object Storage images.	Python3.10	2	Sentinel2 satellite image from the AWS Sentinel2 open data repository	Cloud-Optimized GeoTIFF (COG)	1552
Model creation from LiDAR pre-processing	Create terrain models using LiDAR partitioner.	Python3.10	1	laz files	laz	212
Water Consumption	Calculate water consumption from crops using the Penman-Monteith formula and interpolation raster.	Python3.10	9	Instituto Nacional de Informacion Geográfica	Tif files	686
Metabolomics
METASPACE metabolite annotation	Run the METASPACE metabolite annotation pipeline on cloud resources.	Python3.8	16	Examples of datasets and databases	imzML	2642
Genomics
Variant Calling	Alignment of sequencing reads, stored as FASTQ files, to a reference genome, stored as a FASTA file.	Python3.10	9	Trypanosome [Genome, SRR6052133], Human [Genome, SRR15068323, ERR9856489], Bos Taurus [Genome, SRR934415]	fasta, fastq	4174
Astronomics
Astronomica-interferometry	Radio interferometry data processing.	Python3.8	2	SB205.MS SB206.MS SB207.MS SB208.MS SB209.MS SB210.MS	MS	907
Elastic Exploration
UTS	The Unbalanced Tree Search (UTS) benchmark.	Java 11	1	Autogenerated	Dynamic tree	2841
Mandelbrot with Mariani Silver	Render the Mandelbrot set using the Mariani-Silver algorithm.	Java 11	1	Autogenerated	Mandelbrot set	2735
Betweenness Centrality	Compute Betweenness Centrality (BC).	Java 11	1	Autogenerated	Graph	3119
Extreme Sorting
TeraSort	Implementation of the TeraSort benchmark built on Lithops.	Python3.10	2	TeraGen	ascii	827
					TOTAL:	20310

Most benchmarks link to an external repository that contains the code; the others can be found here.

All workflows except the ones in Elastic Exploration utilize Lithops to easily deploy and run code on any major Cloud serverless platform.

Benchmarks

For the geospatial benchmarks you first need to follow these instructions to set up the environment. Find more technical information about the Geospatial, Genomics and Metabolomics benchmarks here.

1. FLOPS Computation Test

This is a benchmark to estimate the floating-point performance of the system for matrix multiplication operations using NumPy. It measures how many floating-point operations per second the system can perform for this specific operation.
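As a rough illustration of the measurement, the sketch below times a dense NumPy matrix product and converts the elapsed time into FLOPS. The function name `matmul_flops`, the matrix size, and the `2 * n**3` operation count are assumptions made for this example and are not taken from the benchmark code.

```python
import time

import numpy as np


def matmul_flops(n: int, repeats: int = 3) -> float:
    """Return the best observed FLOPS for an n x n float64 matrix product."""
    rng = np.random.default_rng(0)
    a = rng.random((n, n))
    b = rng.random((n, n))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        np.dot(a, b)
        best = min(best, time.perf_counter() - start)
    # A dense n x n product needs roughly 2 * n**3 floating-point operations
    # (n multiplications and n - 1 additions per output element).
    return 2.0 * n**3 / best


if __name__ == "__main__":
    print(f"{matmul_flops(512) / 1e9:.2f} GFLOPS")
```

In a serverless run, each function would execute a measurement like this one and the per-function results would be aggregated.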

2. Object Storage Test

This benchmark measures the bandwidth between the storage and computation backends.

3. Montecarlo Simulations

This benchmark contains two applications in which Monte Carlo methods are used to perform computations over large amounts of random data using cloud functions with Lithops.
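To make the pattern concrete, here is a minimal sketch of a Monte Carlo computation split into independent tasks. Estimating pi is an assumed example, not necessarily one of the two applications, and the built-in `map` stands in for the parallel map over cloud functions that forms one stage.

```python
import random


def count_hits(task):
    """Count random points of the unit square that fall inside the quarter circle."""
    seed, samples = task
    rng = random.Random(seed)  # one independent stream per task
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(workers: int, samples_per_worker: int) -> float:
    """Split the sampling into `workers` tasks and combine the partial counts."""
    tasks = [(seed, samples_per_worker) for seed in range(workers)]
    # Local stand-in for one stage: each task would run in its own function.
    hits = sum(map(count_hits, tasks))
    return 4.0 * hits / (workers * samples_per_worker)


if __name__ == "__main__":
    print(estimate_pi(workers=4, samples_per_worker=20000))
```

Because every task only needs a seed and a sample count, the work scales out by adding tasks rather than by enlarging any single one.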

4. Mandelbrot classic

The Mandelbrot set is calculated several times over a bounded region using Lithops. A given region of the linear space is treated as a matrix and divided into chunks so that the work can be distributed among many functions.
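The chunking idea can be sketched as follows: the pixel matrix is split into row ranges, each range is computed independently, and the results are stitched back together. The region bounds, the row-wise split, and all function names are assumptions chosen for illustration; the built-in `map` stands in for the parallel call.

```python
def split_rows(height, chunks):
    """Split row indices [0, height) into contiguous (start, stop) ranges."""
    step = -(-height // chunks)  # ceiling division
    return [(r, min(r + step, height)) for r in range(0, height, step)]


def mandelbrot_rows(task):
    """Compute escape iterations for rows [start, stop) of the pixel matrix."""
    start, stop, width, height, max_iter = task
    x_min, x_max, y_min, y_max = -2.0, 1.0, -1.5, 1.5  # assumed region
    rows = []
    for row in range(start, stop):
        y = y_min + (y_max - y_min) * row / (height - 1)
        line = []
        for col in range(width):
            c = complex(x_min + (x_max - x_min) * col / (width - 1), y)
            z, n = 0j, 0
            while abs(z) <= 2.0 and n < max_iter:
                z = z * z + c
                n += 1
            line.append(n)
        rows.append(line)
    return start, rows


def render(width, height, chunks, max_iter=50):
    """Compute the full matrix chunk by chunk and reassemble it."""
    tasks = [(a, b, width, height, max_iter) for a, b in split_rows(height, chunks)]
    image = [None] * height
    for start, rows in map(mandelbrot_rows, tasks):  # stand-in for a parallel map
        image[start:start + len(rows)] = rows
    return image
```

Each task carries everything it needs (its row range and the grid parameters), so the chunks can be computed in any order.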

5. Hyperparameter Tuning with Grid Search

Perform hyperparameter tuning using the grid search algorithm. We have a dataset consisting of Amazon product reviews and a sklearn classifier to classify these reviews. We take advantage of cloud functions to tune this classifier's hyperparameters and show how Lithops can be used for ML computations.
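The structure of a grid search can be sketched without the real classifier: expand the parameter grid into one task per combination, score each task independently, and keep the best. The `evaluate` function and the parameter names `alpha` and `ngrams` below are hypothetical stand-ins for training and validating the sklearn model.

```python
from itertools import product


def build_grid(param_grid):
    """Expand a dict of candidate values into one dict per combination."""
    names = sorted(param_grid)
    return [dict(zip(names, values))
            for values in product(*(param_grid[name] for name in names))]


def evaluate(params):
    """Hypothetical stand-in for training and validating a classifier."""
    # Toy score that peaks at alpha = 0.1 and ngrams = 2.
    score = 1.0 - abs(params["alpha"] - 0.1) - 0.05 * abs(params["ngrams"] - 2)
    return params, score


def grid_search(param_grid):
    """Score every combination and return the best (params, score) pair."""
    results = map(evaluate, build_grid(param_grid))  # stand-in for a parallel map
    return max(results, key=lambda item: item[1])
```

Since the combinations are independent, each one can be evaluated by a separate cloud function in a single stage.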

6. NDVI Calculation

A serverless image-processing use case that consumes data from Object Storage: the NDVI (Normalized Difference Vegetation Index) is calculated over many images.
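For reference, NDVI is computed per pixel as (NIR - Red) / (NIR + Red). The sketch below applies that formula to small in-memory bands; the function names and the handling of empty pixels are assumptions, and the real pipeline reads the bands from GeoTIFF files rather than Python lists.

```python
def ndvi(nir, red):
    """NDVI for one pixel: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: report 0 where both bands are empty
    return (nir - red) / total


def ndvi_tile(nir_band, red_band):
    """Apply NDVI element-wise to two equally sized 2-D bands."""
    return [[ndvi(n, r) for n, r in zip(nir_row, red_row)]
            for nir_row, red_row in zip(nir_band, red_band)]
```

Each image (or tile of an image) can be processed by its own function, because the index depends only on the two band values of the same pixel.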

7. Model creation from LiDAR pre-processing

We partition LiDAR files based on the density of points. With this partitioned data we create several terrain models used in many geospatial workflows. We study the impact of load balancing by partitioning LiDAR data using the aforementioned density-based partitioner.

8. Water Consumption

Pipeline that calculates water consumption from crops using the Penman-Monteith formula and interpolation rasters of temperature, wind, solar irradiance and humidity for a given day.
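As an illustration of the per-pixel computation, the sketch below evaluates the FAO-56 form of the Penman-Monteith equation for reference evapotranspiration. Whether the pipeline uses exactly this form is an assumption, as are the function names, the default pressure, and the derivation of actual vapour pressure from mean relative humidity.

```python
import math


def saturation_vapour_pressure(temp_c):
    """Saturation vapour pressure in kPa at air temperature temp_c (deg C)."""
    return 0.6108 * math.exp(17.27 * temp_c / (temp_c + 237.3))


def reference_et0(temp_c, wind_2m, net_radiation, rel_humidity,
                  pressure_kpa=101.3, soil_heat_flux=0.0):
    """Reference evapotranspiration in mm/day (FAO-56 Penman-Monteith).

    temp_c: mean air temperature (deg C); wind_2m: wind speed at 2 m (m/s);
    net_radiation and soil_heat_flux: MJ m^-2 day^-1; rel_humidity: percent.
    """
    es = saturation_vapour_pressure(temp_c)
    ea = es * rel_humidity / 100.0                 # actual vapour pressure
    delta = 4098.0 * es / (temp_c + 237.3) ** 2    # slope of the vapour pressure curve
    gamma = 0.000665 * pressure_kpa                # psychrometric constant
    numerator = (0.408 * delta * (net_radiation - soil_heat_flux)
                 + gamma * 900.0 / (temp_c + 273.0) * wind_2m * (es - ea))
    denominator = delta + gamma * (1.0 + 0.34 * wind_2m)
    return numerator / denominator


if __name__ == "__main__":
    print(reference_et0(temp_c=25.0, wind_2m=2.0, net_radiation=15.0, rel_humidity=50.0))
```

In the pipeline, the inputs would come from the interpolated rasters of temperature, wind, solar irradiance and humidity, one value per pixel.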

9. METASPACE metabolite annotation pipeline

Run the METASPACE (a spatial metabolomics cloud platform that conducts molecular annotation of imaging mass spectrometry data) metabolite annotation pipeline on cloud resources using Lithops. The original implementation of this pipeline can be found in the METASPACE repository. We have adapted this implementation to work with Lithops 3.5.1 and with more recent package versions.

More information about this pipeline can be found in this IBM blog post.

An extended analysis of the pipeline, along with a demonstration of Lithops' VM backend, was presented in the Middleware '24 Industrial Track.

10. Variant Calling

In genomics, variant calling entails the alignment process, which is essentially a search for string similarities. This process aligns sequencing reads, typically stored as FASTQ files, with a reference genome, which is stored as a FASTA file. The reference genome and reads are split into smaller chunks for alignment.
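The splitting step can be sketched as below. A FASTQ record always spans four lines, so reads are grouped in multiples of four; the chunk sizes and the helper names are assumptions for this example, and the actual pipeline may split by byte ranges or use overlapping reference chunks.

```python
from itertools import islice


def fastq_chunks(lines, reads_per_chunk):
    """Yield lists of FASTQ records, each record being a 4-line tuple."""
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, 4 * reads_per_chunk))
        if not chunk:
            return
        if len(chunk) % 4:
            raise ValueError("truncated FASTQ record")
        yield [tuple(chunk[i:i + 4]) for i in range(0, len(chunk), 4)]


def sequence_chunks(sequence, size):
    """Split a reference sequence into consecutive pieces of at most `size` bases."""
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]
```

Each pair of a read chunk and a reference chunk then becomes one independent alignment task.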

Our serverless variant calling pipeline was presented at the 9th Workshop on Serverless Computing (WoSC).

11. Astronomica Interferometry

Process radio interferometric data with Lithops, covering all phases: rebinning, calibration and imaging.

We present an adapted version for easier execution. The original implementation can be found at https://github.com/abourramouss/serverlessextract.

12. UTS

The implementation of UTS presented here is the first that tackles elastic resource provisioning.

The UTS pipeline was part of a paper showcasing the use of Cloud Functions in elastic, unbalanced algorithms.

13. Mandelbrot with Mariani Silver

Render the Mandelbrot set using the Mariani-Silver algorithm as an optimization technique. This algorithm relies on the fact that the Mandelbrot set is connected, that is, there is a path between any two points belonging to the set.
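The idea can be sketched in a few lines: evaluate only the border of a rectangle, fill the interior if the whole border has the same escape time, and otherwise subdivide. The benchmark itself is written in Java; this Python version, with its assumed region, `min_block` threshold, and function names, is only a sequential illustration and not the benchmark's implementation.

```python
def escape_time(c, max_iter):
    """Number of iterations before |z| exceeds 2, capped at max_iter."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def mariani_silver(size, max_iter=50, min_block=4):
    """Return (image, evaluated) for a size x size grid over [-2, 1] x [-1.5, 1.5]."""
    image = [[None] * size for _ in range(size)]
    evaluated = 0

    def pixel(col, row):
        nonlocal evaluated
        if image[row][col] is None:
            c = complex(-2.0 + 3.0 * col / (size - 1),
                        -1.5 + 3.0 * row / (size - 1))
            image[row][col] = escape_time(c, max_iter)
            evaluated += 1
        return image[row][col]

    def solve(x0, y0, x1, y1):  # inclusive pixel bounds
        border = {pixel(x, y) for x in range(x0, x1 + 1) for y in (y0, y1)}
        border |= {pixel(x, y) for y in range(y0, y1 + 1) for x in (x0, x1)}
        if len(border) == 1:
            # Uniform border: fill the interior without evaluating it.
            value = border.pop()
            for y in range(y0 + 1, y1):
                for x in range(x0 + 1, x1):
                    image[y][x] = value
        elif x1 - x0 < min_block or y1 - y0 < min_block:
            # Small rectangle: evaluate every remaining pixel directly.
            for y in range(y0 + 1, y1):
                for x in range(x0 + 1, x1):
                    pixel(x, y)
        else:
            # Mixed border: split into four rectangles that share edges.
            xm, ym = (x0 + x1) // 2, (y0 + y1) // 2
            solve(x0, y0, xm, ym)
            solve(xm, y0, x1, ym)
            solve(x0, ym, xm, y1)
            solve(xm, ym, x1, y1)

    solve(0, 0, size - 1, size - 1)
    return image, evaluated
```

The `evaluated` counter shows the saving: it stays below `size * size` whenever at least one rectangle is filled from its border. On a discrete grid the fill is an approximation, since very thin features can slip between border pixels.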

14. Betweenness Centrality

Compute Betweenness Centrality (BC). The implementation follows Brandes’ algorithm as described in the benchmark, augmenting Dijkstra’s single-source shortest paths (SSSP) algorithm for unweighted graphs.
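A compact sketch of Brandes' algorithm for an unweighted, undirected graph is shown below. It uses a breadth-first search for the shortest-path phase, which is what the search reduces to when all edges have equal weight. The benchmark itself is written in Java; this sequential Python version and its adjacency-dict input format are assumptions for illustration.

```python
from collections import deque


def betweenness_centrality(graph):
    """Brandes' algorithm; `graph` maps each vertex to a list of neighbours."""
    bc = {v: 0.0 for v in graph}
    for s in graph:
        # Phase 1: BFS from s, counting shortest paths (sigma) and predecessors.
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies in order of decreasing distance.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: value / 2.0 for v, value in bc.items()}
```

The outer loop over source vertices is the natural unit of parallelism, because each source contributes an independent partial sum to the final scores.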

15. TeraSort

Implementation of the TeraSort benchmark (a distributed sort), built on Lithops. Tasks are executed on cloud functions, and object storage is used for reading and writing data (including the exchange of intermediate files). We have studied serverless data exchanges in depth in a series of publications.
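The two stages of such a sort can be sketched as follows: a map stage partitions each input split by key range and writes one intermediate object per reducer, and a reduce stage reads its objects and sorts them. The in-memory dict standing in for object storage, the key layout, and the function names are assumptions made for this example.

```python
import bisect


def map_stage(records, boundaries, mapper_id, storage):
    """Partition one input split by key range; write one object per reducer."""
    buckets = [[] for _ in range(len(boundaries) + 1)]
    for key in records:
        buckets[bisect.bisect_right(boundaries, key)].append(key)
    for reducer_id, bucket in enumerate(buckets):
        storage[("intermediate", mapper_id, reducer_id)] = bucket


def reduce_stage(reducer_id, num_mappers, storage):
    """Read every mapper's object for this key range and sort the union."""
    merged = []
    for mapper_id in range(num_mappers):
        merged.extend(storage[("intermediate", mapper_id, reducer_id)])
    storage[("output", reducer_id)] = sorted(merged)


def terasort(splits, boundaries):
    """Run both stages sequentially and return the globally sorted keys."""
    storage = {}  # in-memory stand-in for object storage
    for mapper_id, records in enumerate(splits):
        map_stage(records, boundaries, mapper_id, storage)
    num_reducers = len(boundaries) + 1
    for reducer_id in range(num_reducers):
        reduce_stage(reducer_id, len(splits), storage)
    return [key for r in range(num_reducers) for key in storage[("output", r)]]
```

Because the boundaries are sorted, concatenating the reducer outputs in order yields a globally sorted result. The number of intermediate objects grows as mappers times reducers, which is why the data exchange through storage dominates the cost of this kind of workload.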

Corresponding authors

Pipeline tuning, set up and compilation:

Pau Balanzà

Jordi Canosa
