After data is collected, it must be stored.
You might wonder why data can't simply be collected and processed right away.
Some reasons for storage include the desire to perform
analysis in the future or recheck past analysis to add certainty to conclusions.
Data that is not stored, or is stored improperly, can be lost,
and recovering or re-collecting lost data is a costly proposition.
You store data every day if you save pictures on your phone
or movies on a hard drive.
Your smartphone and computer have places to store data,
so that you can retrieve things when you need them.
The storage place is called memory,
and how much you can store is measured in bytes.
The more data you need to store,
the more bytes of memory your phone or computer needs to have.
For example, a megabyte is a million bytes.
For larger amounts of data, you will need gigabytes,
terabytes, petabytes, and exabytes of storage.
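To make these units concrete, here is a minimal Python sketch that expresses a byte count in the largest unit that keeps the number at 1 or more. It assumes decimal (SI) units, where each step up is a factor of 1,000; some operating systems instead use factors of 1,024 (kibibytes, mebibytes, and so on), which this sketch does not cover. The function name `human_readable` is just an illustrative choice.

```python
# Decimal (SI) storage units: each step up is a factor of 1,000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]

def human_readable(num_bytes: int) -> str:
    """Express a byte count in the largest decimal unit that keeps the value >= 1."""
    value = float(num_bytes)
    unit_index = 0
    # Divide by 1,000 until the value drops below 1,000 or we run out of units.
    while value >= 1000 and unit_index < len(UNITS) - 1:
        value /= 1000
        unit_index += 1
    return f"{value:g} {UNITS[unit_index]}"

print(human_readable(1_000_000))      # 1 MB
print(human_readable(2_500_000_000))  # 2.5 GB
print(human_readable(10**18))         # 1 EB
```

Notice that petabytes (10^15 bytes) come before exabytes (10^18 bytes) in the list.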
For large data sets,
storage devices much larger than those in
your smartphone or personal computer are needed.
Now imagine the amount of data a manufacturing enterprise can generate.
You are talking about storing massive amounts of data
of many different types.
So you need special storage devices to store this type of data.
It is also a good idea to store data in
a systematic manner so that you can preserve the relationships between different data types.
You might have heard the words databases and servers.
A database stores data.
It is desirable to store data in
a systematic way so that the relationship between the data types can be understood.
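As a rough illustration of storing data systematically, the sketch below uses Python's built-in `sqlite3` module to create a small in-memory database with two related tables. The manufacturing tables and column names (`machines`, `sensor_readings`, `temperature_c`) are hypothetical. The shared `machine_id` column is what records the relationship, so one query can combine both kinds of data.

```python
import sqlite3

# In-memory database; table and column names are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE machines (
    machine_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE sensor_readings (
    reading_id INTEGER PRIMARY KEY,
    machine_id INTEGER NOT NULL REFERENCES machines(machine_id),
    temperature_c REAL NOT NULL
);
""")
conn.executemany("INSERT INTO machines VALUES (?, ?)",
                 [(1, "press"), (2, "lathe")])
conn.executemany(
    "INSERT INTO sensor_readings (machine_id, temperature_c) VALUES (?, ?)",
    [(1, 71.5), (1, 73.0), (2, 55.2)],
)

# Join the tables on machine_id to get the average temperature per machine.
rows = conn.execute("""
    SELECT m.name, AVG(r.temperature_c)
    FROM machines AS m
    JOIN sensor_readings AS r ON r.machine_id = m.machine_id
    GROUP BY m.name
    ORDER BY m.name
""").fetchall()

for name, avg_temp in rows:
    print(f"{name}: {avg_temp:.2f} C")
```

Because the readings point back to the machine they came from, nothing has to be matched up by hand later.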
A server also stores data,
usually a very large amount of it, and makes that data available to other computers over a network.
If you have ever found it very hard
to locate a file stored on your computer,
you have witnessed the problem that disorganized data causes.
Organizing data during storage makes it
easier to process than leaving it unorganized.
Organized data can be processed more quickly and efficiently,
leads to fewer errors,
and requires fewer resources.
Additionally, some efficient analytic algorithms
require that data be in a specific structured format.
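Here is a minimal Python sketch of why organization speeds up processing. The part records are hypothetical. Finding a record in an unorganized list means checking records one by one, while organizing the same data once into a dictionary keyed by part ID lets every later lookup go straight to the record. Python dictionaries use hashing, so on average the lookup time does not grow with the number of records.

```python
# Hypothetical records: (part_id, description) pairs, in no useful order.
records = [(f"P{i:05d}", f"part number {i}") for i in range(100_000)]

def find_unorganized(part_id):
    """Scan records one by one; return (description, number of records checked)."""
    for checked, (pid, desc) in enumerate(records, start=1):
        if pid == part_id:
            return desc, checked
    return None, len(records)

# Organize the same data once: a dictionary keyed by part_id.
index = {pid: desc for pid, desc in records}

def find_organized(part_id):
    """Jump straight to the record; return None if the part is unknown."""
    return index.get(part_id)

print(find_unorganized("P99999"))  # ('part number 99999', 100000)
print(find_organized("P99999"))    # part number 99999
```

The unorganized search had to check all 100,000 records to find the last one; the organized lookup did not scan at all.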
Structured data is data that is well defined and formatted.
For example, data in databases or in Excel sheets is structured.
In an Excel sheet, if we need some information,
we can go to the relevant column and get that information right away.
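The "go to the column" idea can be sketched in Python. As a stand-in for an Excel sheet, this assumes the sheet has been saved as CSV text (reading `.xlsx` files directly would need a third-party library). The order data and column names are hypothetical. Because every row has the same named columns, one column can be pulled out without reading anything else.

```python
import csv
import io

# A small spreadsheet saved as CSV; columns and values are hypothetical.
sheet = io.StringIO(
    "order_id,product,quantity\n"
    "1001,bolts,250\n"
    "1002,nuts,400\n"
    "1003,washers,150\n"
)

# DictReader uses the header row, so each row maps column name -> value.
rows = list(csv.DictReader(sheet))

# Go straight to the "quantity" column.
quantities = [int(row["quantity"]) for row in rows]
print(quantities)       # [250, 400, 150]
print(sum(quantities))  # 800
```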
Semi-structured data is data that does not have a fixed format associated with it.
For example, data in an email or
data stored in a Word document is semi-structured.
If you need some information,
you can go directly to a paragraph or a sentence, but you need
to read that sentence or paragraph to obtain the data you require.
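An email shows both sides of semi-structured data. This sketch uses Python's standard `email` package on a made-up message (the addresses and text are hypothetical). The header fields follow a known format and can be read directly, but a figure buried in the body can only be found by reading, or here crudely searching, the sentences themselves.

```python
from email import message_from_string

# A hypothetical email: structured headers, then a blank line, then free text.
raw = """From: plant.manager@example.com
To: analyst@example.com
Subject: Weekly output

Hi,
Line 2 produced 1,200 units this week, up from last week.
Line 3 was down for maintenance on Tuesday.
"""

msg = message_from_string(raw)

# Headers have a known format, so we can go to them directly.
print(msg["Subject"])  # Weekly output

# The body is free text: to find the production figure we must search its lines.
body = msg.get_payload()
matches = [line for line in body.splitlines() if "units" in line]
print(matches)
```

The keyword search is only a crude stand-in for reading; it works here because we already know the word "units" appears near the figure.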
Unstructured data is the least organized form of data that can exist.
Consider an example of unstructured data: the images gathered by
satellites to predict and forecast weather.
If you look at a series of images,
you will need specialized software or
expert analysis to process this information and forecast the weather.
But if we look at just a single file,
it will be hard to process that information and
infer anything about the weather forecast.
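To see why an image on its own says little, consider that to a computer it is just a grid of brightness values. The Python sketch below is a toy illustration only, nothing like real forecasting software. It uses a tiny hypothetical grayscale "satellite image" and a hypothetical rule that pixels at or above a brightness threshold count as cloud. The grid means nothing until software applies some rule like this, and comparing the result across a series of images is what starts to reveal a trend.

```python
# A tiny hypothetical 4x6 grayscale image: 0 = black, 255 = white.
image = [
    [30, 40, 220, 230, 35, 20],
    [25, 210, 240, 235, 215, 30],
    [20, 45, 225, 205, 40, 25],
    [15, 20, 30, 35, 25, 10],
]

# Hypothetical rule: pixels at or above this brightness are treated as cloud.
CLOUD_THRESHOLD = 200

def cloud_fraction(pixels, threshold=CLOUD_THRESHOLD):
    """Return the fraction of pixels at or above the brightness threshold."""
    total = sum(len(row) for row in pixels)
    bright = sum(1 for row in pixels for value in row if value >= threshold)
    return bright / total

print(f"{cloud_fraction(image):.2f}")  # 0.33 (8 of 24 pixels)
```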