Introduction
Data is available from many different sources. To extract meaningful information from it, it is often necessary to collect data from several sources and store it in a common location. Once gathered, the data is processed into a usable and consistent format. This transformation is essential because data collected from different sources may be represented in different forms.
After processing, structured data reveals patterns that provide information. This information can then be used for further knowledge discovery and effective decision-making.
Learning Objectives
By the end of this article, you will be able to:
- Explain the importance of data
- Understand the need for data processing
- Identify various data sources
- Describe the ecosystem required for data processing and decision-making
Importance and Need for Data in Data Science
Terms such as data analysis, data mining, and artificial intelligence are now in common use, and each of them depends on data. Because data plays a critical role in almost every domain, it is essential to understand why it matters.
Why Data Is Important
- Improves lives: Modern devices such as smartphones and smartwatches collect and analyze data like heart rate, calorie count, and daily activity to improve lifestyle and health.
- Supports decision-making: Organizations rely on data to make informed business decisions.
- Acts as a strategic asset: Data is treated as a valuable organizational asset for planning and growth.
- Helps in prediction: Historical data is used to predict future outcomes of strategies and decisions.
- Enables real-time monitoring: Data helps track and monitor systems and processes in real time.
- Promotes reusability: Data collected from multiple sources can be reused for different applications.
Data, Information, Knowledge, and Wisdom
Raw data, when organized and processed, becomes information. This information leads to the creation of knowledge, which helps individuals understand a subject. With experience and knowledge, one gains wisdom, which is the ability to make sound judgments.
Example
Consider the following data:
- 5, Colleges, Cities, 1980, Mangoes, Born, Quality, Mr. XYZ
On its own, this is merely a collection of unrelated words and numbers. Once the items are related to one another, for example "5 top colleges in cities" or "Mr. XYZ was born in 1980", the data begins to convey meaning. This organized data becomes information, which in turn leads to knowledge and wisdom.
Another example:
- Data: 100
- Information: 100 miles
- Knowledge: 100 miles is a long distance
- Wisdom: Walking 100 miles is difficult, but using a vehicle is easier
This demonstrates how wisdom emerges from knowledge and experience.
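The progression in the example above can be sketched in a few lines of Python. This is only an illustration: the function names, the `Distance` type, and the 50-mile threshold for "long" are assumptions made for the sketch, not part of any standard model.

```python
from dataclasses import dataclass

# Hypothetical threshold: distances at or above this count as "long".
LONG_DISTANCE_MILES = 50


@dataclass
class Distance:
    value: float
    unit: str


def to_information(raw_value, unit="miles"):
    """Data -> information: attach context (a unit) to a bare number."""
    return Distance(raw_value, unit)


def to_knowledge(distance):
    """Information -> knowledge: interpret the value against what we know."""
    return "long" if distance.value >= LONG_DISTANCE_MILES else "short"


def to_wisdom(knowledge):
    """Knowledge -> wisdom: choose a sensible action."""
    return "use a vehicle" if knowledge == "long" else "walk"


data = 100                                # Data: 100
information = to_information(data)        # Information: 100 miles
knowledge = to_knowledge(information)     # Knowledge: a long distance
advice = to_wisdom(knowledge)             # Wisdom: use a vehicle
print(information, knowledge, advice)
```

Each step adds something the previous one lacked: a unit, an interpretation, and finally a judgment about what to do.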
Data Collection
Data collection is the process of gathering data from various sources based on the objective of the study. Some common data collection techniques include:
- Oral history
- Online marketing
- Interviews
- Questionnaires
- Observation
- Documents and records
- Focus groups
- Social media monitoring
The method is chosen according to the purpose of the analysis. For example, interviews help collect expert opinions, while social media monitoring provides insights into audience behavior and interests.
Data Processing
Once data is collected, it is stored in a common location and then processed. Data processing involves several transformations, such as:
- Filtering
- Segregation
- Normalization
- Cleaning
Raw data is transformed into processed data so that it can be used for analysis, visualization, and decision-making.
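The four transformations listed above can be sketched in plain Python. The records, field names, and the plausible heart-rate range below are invented for illustration, and min-max scaling is only one of several ways to normalize values.

```python
def clean(records):
    """Cleaning: drop records with missing fields, trim names, cast to float."""
    cleaned = []
    for r in records:
        if r.get("name") is None or r.get("heart_rate") is None:
            continue
        cleaned.append({
            "name": r["name"].strip(),
            "source": r["source"],
            "heart_rate": float(r["heart_rate"]),
        })
    return cleaned


def filter_valid(records, low=30.0, high=220.0):
    """Filtering: keep readings inside an assumed plausible range."""
    return [r for r in records if low <= r["heart_rate"] <= high]


def normalize(records):
    """Normalization: min-max scale heart_rate into the range [0, 1]."""
    if not records:
        return []
    values = [r["heart_rate"] for r in records]
    lowest, highest = min(values), max(values)
    span = (highest - lowest) or 1.0  # avoid dividing by zero
    return [{**r, "heart_rate_norm": (r["heart_rate"] - lowest) / span}
            for r in records]


def segregate(records, key="source"):
    """Segregation: group records by the device they came from."""
    groups = {}
    for r in records:
        groups.setdefault(r[key], []).append(r)
    return groups


raw = [
    {"name": " Asha ", "source": "watch", "heart_rate": "60"},
    {"name": "Ben", "source": "phone", "heart_rate": 100},
    {"name": None, "source": "watch", "heart_rate": 75},     # missing name
    {"name": "Chen", "source": "watch", "heart_rate": 500},  # implausible
    {"name": "Dia", "source": "phone", "heart_rate": 80},
]

processed = segregate(normalize(filter_valid(clean(raw))))
for source, rows in processed.items():
    print(source, rows)
```

The order matters in this sketch: cleaning runs first so that later steps can rely on numeric values, and filtering runs before normalization so that an implausible reading does not distort the scale.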
Knowledge Discovery
Knowledge discovery is an iterative process in which patterns are identified in processed data. These patterns help generate insights, which are presented as reports, tables, or analytical models.
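As a minimal sketch of pattern identification, the code below counts which pairs of items appear together in a set of transactions and keeps the pairs that occur often enough. The transactions, the `frequent_pairs` helper, and the support thresholds are assumptions made for this example; real knowledge discovery typically uses far richer methods.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that occur together in at least min_support transactions."""
    counts = Counter()
    for items in transactions:
        # sorted(set(...)) gives each pair one canonical, duplicate-free form
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "milk", "jam"],
]

# First pass with a strict threshold, then a second pass with a relaxed one.
strict_patterns = frequent_pairs(transactions, min_support=3)
relaxed_patterns = frequent_pairs(transactions, min_support=2)

# Present the result as a small report.
for (a, b), n in sorted(relaxed_patterns.items()):
    print(f"{a} + {b}: {n} transactions")
```

Running the search twice with different thresholds reflects the iterative nature of the process: the analyst adjusts the criteria and reviews the patterns again until the results are useful.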
Information System
An information system stores, manages, and processes data to support analysis and decision-making. Systems built on a Database Management System (DBMS) are a common example and are used across many domains.
Stakeholders of an Information System
Stakeholders include:
- Owners
- Users
- Designers
- Developers
- System analysts
- Managers
For example, in a pharmacy, the information system is used not only by the owner but also by staff and customers, making them all stakeholders.
Data Warehouse
A data warehouse is a repository that stores curated and processed data ready for analysis. It supports business intelligence activities such as querying and data mining and often contains historical data.
Data Sources
Apart from data warehouses, data for analysis can also be drawn from:
- Data Lake: A large pool of raw data with no predefined purpose
- Data Mart: A focused subset of a data warehouse designed for specific analysis
Comparison of Data Storage Types
- Data Lake: Broad scope, low preprocessing, hard to navigate
- Data Warehouse: Cleaned data, moderate preprocessing, structured
- Data Mart: Highly focused, high preprocessing, easy navigation
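The relationship between the three storage types can be sketched with Python's built-in `sqlite3` module. Everything here is illustrative: the raw records, the `warehouse` table, and the `sales_mart` view are hypothetical names, and an in-memory SQLite database stands in for what would normally be separate, much larger systems.

```python
import json
import sqlite3

# Data lake: raw records kept as they arrived, in inconsistent shapes.
lake = [
    '{"dept": "sales", "amount": "120.50", "year": 2023}',
    '{"dept": "Sales ", "amount": 80, "year": 2024}',
    '{"dept": "hr", "amount": "15", "year": 2024, "note": "training"}',
    'not valid json',
]

conn = sqlite3.connect(":memory:")

# Data warehouse: cleaned, structured rows with a fixed schema.
conn.execute("CREATE TABLE warehouse (dept TEXT, amount REAL, year INTEGER)")
for raw in lake:
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        continue  # unreadable records stay in the lake only
    conn.execute(
        "INSERT INTO warehouse VALUES (?, ?, ?)",
        (record["dept"].strip().lower(), float(record["amount"]), int(record["year"])),
    )

# Data mart: a focused subset of the warehouse for one kind of analysis.
conn.execute(
    "CREATE VIEW sales_mart AS "
    "SELECT year, SUM(amount) AS total FROM warehouse "
    "WHERE dept = 'sales' GROUP BY year"
)

warehouse_count = conn.execute("SELECT COUNT(*) FROM warehouse").fetchone()[0]
mart_rows = conn.execute("SELECT year, total FROM sales_mart ORDER BY year").fetchall()
print(warehouse_count, mart_rows)
```

The lake accepts anything, the warehouse accepts only what survives cleaning, and the mart exposes just the slice one team needs, which is why navigation gets easier at each stage.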
Summary
- Data collected from multiple sources is processed to generate meaningful information.
- Information leads to knowledge, and knowledge combined with experience leads to wisdom.
- Data lakes store raw data, data warehouses store processed data, and data marts store focused subsets.
- Proper data processing is essential for effective analysis and decision-making.
Glossary
- Data: Raw facts and figures
- Information: Structured data that provides meaning
- Knowledge: Understanding derived from information
- Wisdom: Ability to make sound judgments, gained from knowledge and experience
- Data Lake: Repository of raw data
- Data Warehouse: Repository of processed data
- Data Mart: Focused subset of a data warehouse