In today’s ‘information age’, organisations that want to stay ahead must identify the relevant data in the influx they receive and act on it using appropriate tools and techniques. The volume of such data was estimated at 2.5 exabytes a day in 2012 (an exabyte is one billion gigabytes), a figure that doubles roughly every 40 months (Harvard Business Review). ‘Big data’ refers to data at such volumes, which can arrive in real time and is drawn from a variety of sources such as mobile phones, online purchases, social networks, GPS, electronic communication and instrumented machinery.
SLASSCOM, in partnership with Zone 24x7, recently conducted a timely ‘Tech talk’ on ‘Smart data engineering’, which addressed the drivers, tools and applications used to process data effectively in a Sri Lankan context. The speakers at the forum included Vance Shipley, CTO at Wavenet; Srinath Perera, Director of Research at WSO2; Hiranya Samarasekera, Director of Engineering and Architecture (Data and Analytics) at Leapset; Dinesh Priyankara, Specialist and Senior Architect at Virtusa; and Gogula Aryalingam, Software Architect at Navantis.
Srinath explained why most organisations collect data about their interactions with customers. This data often reveals customer interests, usage patterns and other details that are invaluable in making both operational and policy decisions. He also discussed the ‘why’, ‘how’ and ‘what’ of making the best use of big data technologies. Vance described how Google ignited a big data boom with its 2004 paper on MapReduce, a programming model inspired by functional programming that allowed the company to leverage its huge-scale compute resources.
The open-source Apache Hadoop project made this method widely available. He demonstrated how MapReduce differs from a traditional database approach and shared lessons learned by “drinking from the firehose” in the real world.
MapReduce is beginning to show its limitations when it comes to real-time data processing. Hiranya Samarasekera’s presentation looked at recent developments in Hadoop YARN and the emergence of real-time stream processing technologies such as Apache Storm and Apache Spark.
Machine learning is a type of artificial intelligence that enables computers to learn without being explicitly programmed. Though it goes unnoticed most of the time, we benefit from it in many ways, including spam filtering and credit card fraud prevention. Many organisations also use machine learning to improve their business processes, increase profit margins and outperform competitors. Dinesh spoke about the concepts and theories of machine learning, followed by a demo on Azure ML as an example platform, to educate the audience on the machine learning aspects of data.
Organisations have long used business intelligence (BI) to gain insight into their businesses from their data, and they use these insights to plan better. Traditional BI approaches, however, are now changing. New trends such as big data, together with evolving user needs, have created a shift towards getting results faster, accessing and making use of "outside" data, and having business users do all of this themselves. Enter self-service business intelligence (SSBI). Gogula spoke about what ‘Self Service BI’ is all about, and showed how any non-technical business user can analyse data to gain valuable insights.
The presentations were followed by a panel discussion moderated by Rasika Karunatilake, General Manager at Leapset, in which the speakers deliberated on local perspectives related to smart data engineering. The topics discussed and debated included career paths for data engineers and the need for local organisations and academia to develop such professionals, the data processing technologies used in specific business and industry situations, and the impact of big data on data security.