In talking with leaders across numerous consumer goods, manufacturing, and fashion companies, I have found that the most impressive supply chain leaders, and the most fun to talk to, are the ones who have mapped out a medium-term and long-term data and analytics strategy. We have all heard the cliché: data is the new oil. As unoriginal as the line is, it holds true for any company trying to survive today.
A successful data and analytics strategy means having the capability:
- To access and store huge volumes of data
- To access, store, and manage different types and sources of data
- To query and perform analytics on this data store with limited to no constraints on speed
I will use this blog post to go over the different sources and types of data sets you will want to plan for in your data warehouse when designing an analytics strategy. For many data geeks, this might be redundant. But if you are a supply chain professional who wants to get more familiar with data sources, you might find this simple primer useful. The categories below are not all mutually exclusive. The purpose is to give you an easy data dictionary so you can converse confidently on the relevant subject matter.
Structured, semi-structured, and unstructured data
Traditional databases are relational: data is stored in tables of rows and columns, and each row is identified by a unique key. Because it fits neatly into such row-column tables, this kind of data is called structured data. POS (point-of-sale) data that can be transferred over from spreadsheets is an excellent example of structured data.
Semi-structured data has attributes resembling structured data but cannot be entirely saved in a rigid format. Images we obtain of new designs or from the factory floor fall in this category. The photos can have an ID and a file tag that is conducive to a structured store format, but the image cannot be saved in a structured form.
Increasingly, most data flowing from and across the supply chain is unstructured and cannot be stored in a mainstream, rigid data model or relational database. Data from text messages and Word documents, such as cutting tickets and quality check documents, would fall under this category.
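To make the three categories concrete, here is a minimal Python sketch. The sample values, field names, and file name are all made up for illustration; the point is only how each kind of data can (or cannot) be addressed by field.

```python
import csv
import io
import json

# Structured: every row has the same columns, so it maps directly onto a table.
# (Hypothetical POS export; the column names are assumptions.)
pos_csv = (
    "sale_id,store,sku,units\n"
    "1001,NYC-01,TSHIRT-BLK-M,3\n"
    "1002,NYC-01,JEANS-IND-32,1\n"
)
pos_rows = list(csv.DictReader(io.StringIO(pos_csv)))
total_units = sum(int(row["units"]) for row in pos_rows)

# Semi-structured: the ID and tags fit into fields, but the image itself
# is just a file that no column can describe.
image_record = json.loads(
    '{"image_id": "IMG-0042", "tags": ["factory-floor", "line-3"], '
    '"file": "img_0042.jpg"}'
)

# Unstructured: free text with no fields at all (a made-up quality check note).
qc_note = "Batch 7 jackets: two units with loose stitching on left cuff."

print(total_units)            # summing a column is trivial for structured data
print(image_record["tags"])   # fields are queryable, the pixels are not
print(len(qc_note.split()))   # free text needs text processing before analysis
```

Notice that the structured rows can be summed right away, while the quality check note needs some form of text processing before it can answer any question.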
Querying and running high-performance, real-time analytics on the above data sources requires storage and management in the right kind of data warehouse. We will not get into that topic in this post. But one thing to leave you with is that unstructured data is flowing through organizations at a much higher frequency and in larger volumes than ever, often in terabytes and petabytes. It is best to store all this rush of data traffic in a cloud data warehouse that can scale its storage and query capacity on demand.
1. IoT (Internet of Things) data
More and more of the physical world is now connected through sensors. For example, data that flows from the hardware connected to cars, household appliances, and baby monitors falls under this category. Specific to supply chains, sensors can feed data from the components that get moved to the central car assembly factory, from sewing machines on the shop floor, from pallets in which bicycles are transported across the ocean, from trucks that carry finished lighting inland, and more. Such data tends to be a combination of semi-structured and unstructured data.
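As a rough illustration of why sensor data tends to be semi-structured, the sketch below parses two made-up sensor messages in Python. The device names, field names, and readings are assumptions, not a real device format: each device shares a couple of common fields but reports its own set of measurements.

```python
import json

# Hypothetical messages: a sewing machine and a pallet tracker report
# different measurements, but both carry a device name and a timestamp.
raw_messages = [
    '{"device": "sewing-machine-12", "ts": "2024-01-05T08:00:00Z", "rpm": 2400}',
    '{"device": "pallet-88", "ts": "2024-01-05T08:00:05Z", '
    '"temp_c": 4.5, "gps": [51.9, 4.5]}',
]

def normalize(message):
    """Split a message into its common fields and device-specific readings."""
    payload = json.loads(message)
    return {
        "device": payload.pop("device"),
        "ts": payload.pop("ts"),
        "readings": payload,  # whatever is left varies by device
    }

rows = [normalize(m) for m in raw_messages]
for row in rows:
    print(row["device"], sorted(row["readings"]))
```

The common fields can go into regular columns, while the varying readings are kept together in one flexible field.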
2. Open data or publicly available data
There is often value in integrating publicly available data with other data sets to enhance the quality of analytics. For example, information about enterprise customers drawn from Wikipedia and SEC filings might improve the quality of the customer profiles available to sales and marketing teams within your company. Google Trends is another free data source that your design team might want to compare against actual customer data to identify the best new designs for a region.
3. Syndicated/purchased data sources
Nielsen, IRI, and Bloomberg are examples of syndicated data providers. Depending on your industry, your company is likely buying market data from them, from other syndicated sources, or from data exchanges. These data sources provide anonymized and aggregated data across an entire industry or sector, allowing you to evaluate how you measure up. For example, you could compare your new product performance against the product categories in which competitors are succeeding, and compare your average sales performance against your industry's quarterly growth.
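The comparison against industry growth is simple arithmetic. Here is a small Python sketch with invented figures; the sales numbers and the 5% industry growth rate are placeholders, not values from any real syndicated report.

```python
def growth_rate(previous, current):
    """Quarter-over-quarter growth as a fraction (0.075 means 7.5%)."""
    return (current - previous) / previous

# Made-up numbers: your own sales for two consecutive quarters.
company_growth = growth_rate(previous=4_000_000, current=4_300_000)

# Made-up benchmark, standing in for a figure from a syndicated source.
industry_growth = 0.05

# Gap in percentage points; positive means you outgrew the industry.
gap_points = round((company_growth - industry_growth) * 100, 2)
print(f"Company {company_growth:.1%} vs industry {industry_growth:.1%}, "
      f"gap {gap_points} points")
```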
4. Web data
If your distribution channels include a direct-to-consumer website or other e-commerce retail channels, gathering data on customers' online behavior is critical to understanding the value chain's full loop. Websites generate logs with detailed information that helps you understand the user journey as customers navigate different pages to complete a purchase or abandon a cart. Integrating this customer-level data into the enterprise analytics warehouse is vital to having a full view of supply chain success.
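Here is a minimal Python sketch of how log data can reveal abandoned carts. The event format (a session ID paired with a page path) and the page names are assumptions for illustration; real web logs carry far more detail and need parsing first.

```python
from collections import defaultdict

# Hypothetical, already-parsed log events: (session_id, page visited).
events = [
    ("s1", "/product/42"),
    ("s1", "/cart"),
    ("s1", "/checkout/complete"),
    ("s2", "/product/7"),
    ("s2", "/cart"),
    ("s3", "/product/42"),
]

def abandoned_carts(events):
    """Return sessions that reached the cart but never completed checkout."""
    pages_by_session = defaultdict(set)
    for session_id, page in events:
        pages_by_session[session_id].add(page)
    return sorted(
        session_id
        for session_id, pages in pages_by_session.items()
        if "/cart" in pages and "/checkout/complete" not in pages
    )

print(abandoned_carts(events))
```

In this toy data set, session s2 put something in the cart and left, which is exactly the kind of signal worth joining with the rest of your supply chain data.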
5. Big data
Remember that at the start of this post I said the types and sources of data mentioned here are not necessarily mutually exclusive. We cannot end this post without mentioning big data, which comes up in every conversation about data processing and analytics. Big data is simply a description for huge volumes and varieties of data, and it can include one or more of the classifications covered above. Since data sets are now flowing in, or rather drowning us, in much larger volumes and at much higher frequencies and varieties, traditional data processing and data warehousing applications are no longer sufficient to host, manage, and query big data. Most organizations today are preparing for and subscribing to systems to manage big data. Decisions on the technology stack should be made keeping in mind the volumes and varieties of big data an organization is gathering every second.
I hope you found this data primer useful!