Note: I am looking for application developers who want to use any of my frameworks or projects in real-world applications. Please contact me if you are working on an app and feel that it could benefit from a cloud platform.
cloudMPI: MPI Framework for the Azure Cloud Platform
Cloud computing vendors usually offer a set of services that provide proprietary communication mechanisms for their respective platforms. For instance, Microsoft’s Azure cloud platform offers the Queue storage service for communication among nodes, the Table storage service for structured data storage, and the Blob storage service for persistent data storage. We are exploring the major cloud platforms to architect cloudMPI so that it benefits from the reliability, robustness, and security of a cloud computing platform while avoiding dependence on any platform-specific feature. We will provide APIs similar to those of traditional MPI implementations, but we will implement the communication mechanism on top of the APIs provided by the respective cloud service provider. This will make it easier to extend our APIs to a number of cloud service providers. Although our implementation will target Microsoft’s Azure cloud platform, the APIs will be generic, so that porting applications among future implementations of cloudMPI on different cloud platforms is as seamless as possible.
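To make the layering idea concrete, here is a minimal sketch of an MPI-flavoured send/recv interface sitting on top of a generic queue backend. It is not the cloudMPI API: the names `InMemoryQueueService`, `Communicator`, `send`, and `recv` are hypothetical, and the backend is an in-process stand-in rather than a real cloud queue service. The point is only that application code talks to the communicator, so the backend could be swapped per cloud provider.

```python
import queue
from collections import defaultdict


class InMemoryQueueService:
    """Hypothetical stand-in for a cloud queue service: one FIFO per queue name."""

    def __init__(self):
        self._queues = defaultdict(queue.Queue)

    def put(self, name, message):
        self._queues[name].put(message)

    def get(self, name, timeout=None):
        # Blocks until a message arrives; raises queue.Empty after `timeout` seconds.
        return self._queues[name].get(timeout=timeout)


class Communicator:
    """MPI-like point-to-point interface layered on any backend with put/get."""

    def __init__(self, rank, size, backend):
        self.rank = rank
        self.size = size
        self._backend = backend

    @staticmethod
    def _inbox(dest, source, tag):
        # One queue per (receiver, sender, tag) keeps message matching trivial.
        return f"inbox-{dest}-from-{source}-tag-{tag}"

    def send(self, obj, dest, tag=0):
        self._backend.put(self._inbox(dest, self.rank, tag), obj)

    def recv(self, source, tag=0, timeout=None):
        return self._backend.get(self._inbox(self.rank, source, tag), timeout=timeout)
```

With two communicators sharing one backend, rank 0 can call `send(obj, dest=1)` and rank 1 can call `recv(source=0)`. Porting to another provider would, under this assumption, only require a different backend class with the same `put`/`get` methods.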
AzureBOT: A Framework for Bag-of-tasks Application on the Azure Cloud Platform
Windows Azure is an emerging cloud platform that provides application developers with APIs for writing scientific and commercial applications. However, the steep learning curve of the unique architecture of cloud platforms in general, and the continuously changing Azure APIs in particular, make it difficult for application developers to write cloud-based applications. During our extensive experience with the Azure cloud platform over the past few years, we have identified the need for a framework that abstracts the complexities of working with the platform. Such a framework is essential for the adoption of cloud technologies. Therefore, we have created AzureBOT, a framework for writing bag-of-tasks applications on the Azure cloud platform. AzureBOT provides a straightforward and general interface that lets developers concentrate on their application logic rather than on cloud interaction. While we have implemented AzureBOT on the Azure cloud platform, our framework design is generic enough to apply to most cloud platforms. In this paper, we present the detailed design of our framework's internal architecture, the APIs in brief, and the usability of our framework. We also discuss the implementation of two different applications and their scalability results over 100 Azure workers.
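For readers unfamiliar with the bag-of-tasks pattern, the sketch below shows its general shape: independent tasks go into a shared pool, and workers pull from it until it is empty. This is not the AzureBOT interface. The function name `run_bag_of_tasks` and its parameters are hypothetical, and local threads with an in-memory queue stand in for Azure workers and cloud storage.

```python
import queue
import threading


def run_bag_of_tasks(tasks, process, num_workers=4):
    """Apply `process` to every task using workers that pull from a shared pool.

    Results are returned in the original task order.
    """
    tasks = list(tasks)
    task_pool = queue.Queue()
    results = {}
    lock = threading.Lock()

    # Fill the bag: each entry is independent of all the others.
    for task_id, payload in enumerate(tasks):
        task_pool.put((task_id, payload))

    def worker():
        while True:
            try:
                task_id, payload = task_pool.get_nowait()
            except queue.Empty:
                return  # the bag is empty, so this worker retires
            outcome = process(payload)
            with lock:
                results[task_id] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [results[i] for i in range(len(tasks))]
```

The application developer only supplies `process`. Everything else (distribution, collection, ordering) is the framework's job, which is the separation of concerns the paragraph above describes. Error handling and retries are deliberately left out of this sketch.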
Crayons: A Cloud Based Parallel Framework for GIS Overlay Operations
GIS vector-based spatial data overlay processing is much more complex than raster data processing. GIS data files can be huge, and their overlay processing is computationally intensive. Little work has been done on processing large volumes of vector geospatial data through parallel or distributed computing, and none on cloud platforms. We have created the Crayons system, which we believe to be the first such parallel framework over clouds for overlay analysis of two GIS layers of polygonal data in GML format. The Windows Azure cloud platform was a challenge, as it currently lacks support for traditional distributed computing infrastructures such as MPI or map-reduce. This paper presents the basic design of the Crayons framework and explores the amount of parallelism in GIS computation over Azure. We show how the computation underlying this application can be effectively partitioned into independent tasks, and how Azure communication and storage mechanisms can be used to distribute these tasks among processors (Azure workers). We report on how much scalability the Azure platform affords to the various computational and I/O phases, and point out bottlenecks in both the algorithms and the Azure platform. Our experimental results show excellent speedups for the basic overlay computation, highlight a possible need for a new, distributed representation and storage of GIS files, and promise further scalability over larger clouds and data files. The code for Crayons is maintained at http://gpcoverlay.codeplex.com. For more information and to download manuscripts, please visit the project page.
AzureBench: Benchmarking the Storage Services of the Azure Cloud Platform
Over the last few years, cloud computing has become mainstream for High Performance Computing (HPC) application development. However, even though many vendors have rolled out commercial cloud infrastructures, their service offerings are usually best-effort only, without any performance guarantees. Cloud computing effectively saves the eScience developer the hassle of resource provisioning, but the use of these resources is questionable if they cannot meet the performance expectations of deployed applications. Furthermore, in order to make application design choices for a particular cloud offering, an eScience developer needs to understand the performance capabilities of the underlying cloud platform. Among all clouds, the emerging Azure cloud from Microsoft remains a challenge for HPC program development, both because it lacks support for traditional programming models such as MPI and map-reduce and because its APIs are still evolving. To aid HPC developers, we present an open-source benchmark suite, AzureBench, for the Windows Azure cloud platform. We report a comprehensive performance analysis of the Azure cloud platform’s storage services, which are its primary artifacts for inter-processor coordination and communication. We also report on how much scalability the Azure platform affords using up to 100 processors and point out various bottlenecks in parallel access to the storage services. The paper also offers pointers for overcoming the steep learning curve of HPC application development over Azure. Finally, we provide an open-source generic application framework that can serve as a starting point for developing bag-of-tasks applications over Azure.
AzureBench is an open source project available at http://azurebench.codeplex.com/
Parallel Priority Queues on Multicore Architectures
A parallel priority queue is a versatile data structure for applications such as extracting the earliest events in a discrete-event simulation, identifying urgent tasks in a parallel scheduler, or exploring most promising sub-problems in a state-space search. The traditional heap data structure yields only one item at a time and any update along its tree structure is cache unfriendly due to exponentially increasing distance between parent and child nodes. Fine-grained large-scale applications further cause excessive contention among competing processors due to frequent updates. We present a cache-friendly heap-based parallel priority queue (PPQ) engineered for multicore architectures. Based on the amount of concurrency afforded by a parallel application, it can yield hundreds to thousands of top-priority items at a time. Even for extremely fine-grained applications, it is 2-3 times faster. For larger grain, it scales quickly to near linear speedups.
This research also showcases (i) how parallel data structures can employ the synergy of multiple concurrent operations on wide-nodes to increase cache locality, (ii) how pipelined allocation of threads to update tasks at various tree levels can be dynamically shifted to closely follow the cached data, and (iii) how data-dependent cache prefetching can work by splitting a synchronous pipeline into odd and even phases. Our experimental platform includes 8 and 16-core compute nodes of a Linux cluster comprising AMD quad-core 2376, AMD quad-core 8350, and Intel quad-core Xeons. We present thorough experimental results, with our PPQ maintaining excellent speedups over sequential heaps for a wide range of parameters.
If you want to have a look at the code, use these commands to anonymously check out the latest project source code:
Parallel Priority Queues (PPQs):
svn checkout http://parallelpriorityqueues.googlecode.com/svn/trunk yourfoldername
Logic Simulator employing PPQs:
svn checkout http://logic-simulator.googlecode.com/svn/trunk/ parallelpriorityqueues
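As a rough illustration of what "yielding many top-priority items at a time" means at the interface level, the sketch below wraps a plain sequential binary heap in a batched interface. It does not implement the wide-node, pipelined, cache-friendly design described above and is not taken from the PPQ source. The class and method names are hypothetical, and the only thing shown is the difference between a one-item and a k-item delete-min.

```python
import heapq


class BatchPriorityQueue:
    """Sequential model of a batched priority queue interface (smallest key first)."""

    def __init__(self):
        self._heap = []

    def insert_batch(self, items):
        # Insert several items in one call.
        for item in items:
            heapq.heappush(self._heap, item)

    def delete_min_batch(self, k):
        # Remove and return up to k smallest items, in ascending order.
        count = min(k, len(self._heap))
        return [heapq.heappop(self._heap) for _ in range(count)]

    def __len__(self):
        return len(self._heap)
```

In a discrete-event simulation, for example, each batch returned by `delete_min_batch` could be handed to a group of threads, and the events they generate could be returned in a single `insert_batch` call.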
Cache-Aware Matrix Multiplication Algorithm
I am trying to implement a cache-aware matrix multiplication algorithm for modern multicores. The idea is to introduce blocking so that the number of cache misses is reduced. This should produce a fast matrix multiplication kernel that is on par with standard libraries such as Intel's MKL and GotoBLAS2. This project is more or less abandoned; I may or may not revisit it.
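The blocking idea can be sketched as follows. This is only an illustration of the loop structure, not the project's kernel: the function name `blocked_matmul` and the `block` parameter are hypothetical, and pure Python will not show the cache benefit that the same loop order gives in C or C++. The three outer loops walk over tiles, and the three inner loops multiply one pair of tiles, so each tile is reused while it is still in cache.

```python
def blocked_matmul(a, b, block=2):
    """Multiply an n x m matrix by an m x p matrix (lists of lists) tile by tile."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    # Outer loops: pick a tile of C (ii, jj) and a tile along the shared dimension (kk).
    for ii in range(0, n, block):
        for kk in range(0, m, block):
            for jj in range(0, p, block):
                # Inner loops: multiply the A tile by the B tile and accumulate into C.
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, m)):
                        a_ik = a[i][k]
                        for j in range(jj, min(jj + block, p)):
                            c[i][j] += a_ik * b[k][j]
    return c
```

In a compiled implementation, `block` would typically be chosen so that the three tiles being worked on fit together in the targeted cache level.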
Bilateral Filtering for Multicores Using OMP and SSE Intrinsics
In this project I am exploring the possibility of employing high-level as well as low-level parallelism in a fundamental image processing kernel, bilateral filtering. The current implementation is highly scalable and can use n cores. I tried it on the Intel Multicore Test Lab and obtained a speedup of 30+ using 32 cores. The current implementation can be found at:
svn checkout http://blfilter.googlecode.com/svn/trunk/ blfilter-read-only
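For reference, the bilateral filtering kernel itself can be sketched as below. This is a plain sequential version written for clarity, not the OMP/SSE implementation in the repository. The function name and the parameters `radius`, `sigma_space`, and `sigma_range` are my own illustrative choices. Each output pixel is a weighted average of its neighbours, where the weight combines spatial closeness with intensity similarity, which is what preserves edges.

```python
import math


def bilateral_filter(image, radius=1, sigma_space=1.0, sigma_range=0.1):
    """Filter a grayscale image given as a list of rows of floats."""
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            center = image[y][x]
            weighted_sum = 0.0
            weight_total = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < height and 0 <= nx < width):
                        continue  # skip neighbours outside the image
                    value = image[ny][nx]
                    # Spatial weight: nearby pixels count more.
                    w_space = math.exp(-(dx * dx + dy * dy) / (2 * sigma_space ** 2))
                    # Range weight: pixels with similar intensity count more.
                    w_range = math.exp(-((value - center) ** 2) / (2 * sigma_range ** 2))
                    weight = w_space * w_range
                    weighted_sum += weight * value
                    weight_total += weight
            # The centre pixel always contributes weight 1, so weight_total > 0.
            out[y][x] = weighted_sum / weight_total
    return out
```

Every output pixel depends only on the input image, so rows can be distributed across cores (the high-level parallelism), while the inner window loop is the natural candidate for vector instructions (the low-level parallelism).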