MINISTRY OF ELECTRONICS AND INFORMATION TECHNOLOGY, GOVERNMENT OF INDIA
Data is indispensable in today’s business and technology world. Data analytics is the process of examining large data sets to identify insights and patterns. Under the Digital India initiative, there is an increased focus on developing IT-enabled solutions to improve service delivery. Governments at all levels (the centre, states, and local bodies) are making significant investments in e-Governance solutions to realize this vision. This shift to a digital ecosystem has led to tremendous growth in data related to various aspects of government functions and services. Advanced analytical techniques and tools offer new ways to mine this data for insights, from retrospective analysis of what has already happened to prospective analysis that helps decision-makers look ahead and plan accordingly.
At its core, a data analytics platform requires a robust infrastructure capable of storing and processing huge amounts of data. The Data Analytics Service of NIC enables users to build such an infrastructure. The infrastructure will be hosted in the NIC National Cloud, providing an alternative to setting up capital-intensive in-house data analytics infrastructure. Depending on the intended use, users can choose either Hadoop (for big data) or the ELK stack to build their ICT infrastructure.
Hadoop is a big data framework for handling data whose volume, variety, and velocity are beyond what traditional methods can cope with. It uses multiple machines to run processing in parallel in a distributed manner. The Hadoop ecosystem consists of multiple tools and technologies that can be chosen according to need, such as HDFS (Hadoop Distributed File System), Sqoop, Hive, Pig, Spark, and Mahout.
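The idea of splitting work across machines and combining the partial results can be sketched in a few lines of Python. This is only an illustration of the map and reduce pattern, not Hadoop itself: threads stand in for cluster nodes, a list of strings stands in for a file stored in blocks, and the function names (`split_into_blocks`, `map_block`, `reduce_counts`, `word_count`) and the sample records are made up for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(records, block_size):
    """Cut the input into fixed-size blocks, loosely like a file split into blocks."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def map_block(block):
    """Map step: count words within one block, independently of other blocks."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-block counts into one overall result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(records, block_size=2, workers=3):
    blocks = split_into_blocks(records, block_size)
    # Assumption: each thread stands in for one cluster node working on its own block.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_block, blocks))
    return reduce_counts(partials)


# Made-up sample records, used only to exercise the sketch.
sample_records = [
    "pension claim approved",
    "pension claim pending",
    "land record updated",
    "land record pending",
    "pension claim approved",
]
totals = word_count(sample_records)
print(totals.most_common(3))
```

Because each block is counted without reference to any other block, adding more workers (or, in a real cluster, more machines) lets more blocks be processed at the same time; only the final merge needs to see all the partial results.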
The ELK Stack is a collection of three open-source products: Elasticsearch, Logstash, and Kibana.
E stands for Elasticsearch, used for storing, indexing, and searching logs.
L stands for Logstash, used for collecting, processing, and shipping logs to a store such as Elasticsearch.
K stands for Kibana, a visualization tool.
The ELK Stack is designed to let users take data from any source, in any format, and search, analyze, and visualize that data in real time. This makes it well suited to applications with complex search requirements.
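The division of labour between the three components can be illustrated with a toy pipeline in plain Python. None of this uses the real Elasticsearch, Logstash, or Kibana APIs: `parse_line`, `TinyIndex`, and `count_by` are hypothetical stand-ins, the log format (`timestamp level message`) is an assumption, and the log lines are made up for the example.

```python
from collections import Counter, defaultdict


def parse_line(raw):
    """Logstash-like step: turn 'timestamp level message' text into a record."""
    timestamp, level, message = raw.split(" ", 2)
    return {"timestamp": timestamp, "level": level, "message": message}


class TinyIndex:
    """Elasticsearch-like step: store documents plus an inverted index over them."""

    def __init__(self):
        self.docs = []
        self.postings = defaultdict(set)  # term -> ids of documents containing it

    def add(self, doc):
        doc_id = len(self.docs)
        self.docs.append(doc)
        for term in doc["message"].lower().split():
            self.postings[term].add(doc_id)
        return doc_id

    def search(self, query):
        """Return documents whose message contains every term in the query."""
        terms = query.lower().split()
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [self.docs[i] for i in sorted(ids)]


def count_by(docs, field):
    """Kibana-like step: aggregate documents into counts that a chart could plot."""
    return Counter(doc[field] for doc in docs)


# Made-up log lines in the assumed format.
raw_logs = [
    "2024-01-05T10:00:00 ERROR payment gateway timeout",
    "2024-01-05T10:00:03 INFO payment received",
    "2024-01-05T10:00:09 ERROR database connection timeout",
]

index = TinyIndex()
for raw in raw_logs:
    index.add(parse_line(raw))

timeouts = index.search("timeout")
levels = count_by(index.docs, "level")
print(len(timeouts), dict(levels))
```

The inverted index is what makes search fast: a query looks up each term's set of document ids and intersects them, rather than scanning every stored log line.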
The choice of stack should depend on the data type, volume, and use case. The ELK stack is best suited to simple searching and web analytics, whereas the Hadoop stack is suited to use cases that require scaling, the capability to handle a high volume of data, and compatibility with third-party tools.
While the National Cloud provides the means to build infrastructure for data analytics workloads, NIC has also established the Centre of Excellence for Data Analytics (CEDA) (http://ceda.gov.in) to assist government organisations in deriving insights from their data. CEDA provides data analytics services to the government in an efficient and secure manner through its repository of world-class tools and technologies. As part of its service offerings, it will help departments:
Define their analytic needs
Identify the data sets required to meet those needs
Determine access to the relevant data sources (both within and outside the government)
Build the required data analytics solutions
Integrate departmental data silos and deliver integrated analytics for integrated policy formulation
Set up and process large data sets using big data solutions
Develop and deploy business intelligence solutions in the form of dashboards
Use machine learning algorithms for advanced analytics