A grid computing framework in Python combining big data capabilities such as HDFS (Hadoop)
“Today’s organisations are looking for a scalable solution that analyses large volumes of data in the shortest possible time, providing business insight for real-time decisions and possible automation of the decision process”
Big Picture – Big Data
R was originally designed to run as a single-threaded, single-process application. As computing has evolved and ever more processes have moved to computers, recent years have seen a burst of data generated from sources such as social media, news, and financial events. With ever-increasing data volumes, storage has evolved towards distributed systems such as the Hadoop Distributed File System (HDFS). Hadoop provides scalable, highly fault-tolerant distributed storage, which is also well suited to grid computing: a data-analysis algorithm can be broken into small parallel tasks, each applied to a subset of the data. These tasks run in parallel on the computers forming the data grid, and once the partial results are produced they are aggregated to form the solution to the overall problem.
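The split-process-aggregate pattern described above can be sketched in Python. The example below is a minimal illustration only: it uses the standard-library `multiprocessing.Pool` on a single machine to stand in for the grid nodes, and a simple sum as the analysis applied to each data subset; the function names `partial_sum` and `grid_style_sum` are hypothetical, not part of any framework named in this document.

```python
from multiprocessing import Pool


def partial_sum(chunk):
    # Each worker applies the analysis to its own subset of the data,
    # analogous to a grid node processing one block of a distributed store.
    return sum(chunk)


def grid_style_sum(data, workers=4):
    # Split the dataset into roughly equal subsets, one per worker.
    chunks = [data[i::workers] for i in range(workers)]
    with Pool(workers) as pool:
        # Run the partial analyses in parallel.
        partials = pool.map(partial_sum, chunks)
    # Aggregate the partial results into the final answer.
    return sum(partials)


if __name__ == "__main__":
    print(grid_style_sum(list(range(1000))))
```

In a real grid deployment the worker pool would be replaced by processes on separate machines reading their subsets directly from HDFS, but the structure, parallel partial computations followed by a final aggregation, stays the same.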