Key Aspects of Ecommerce Database Design


Editor’s note: In this article, Tanya describes the core of an ecommerce database and its implementation. Read on, and if you need to design and implement an optimal ecommerce database, check out ScienceSoft’s ecommerce services.

Is it possible to run an ecommerce store without a database behind it? Yes, but it would be feasible only for small businesses selling a limited number of products to a small customer base. If your store doesn’t fall under this description, you’ll need a database to store and process data about your products, customers, orders, etc.

An ecommerce database allows you to organize data in a coherent structure to keep track of your inventories, update your product catalog, and manage transactions. The amount of effort required for your database management directly depends on how well it was designed.

How to design an ecommerce database

Create a layout to organize your data effectively

Designing an ecommerce database architecture is the first step in its development. A thoroughly designed database becomes a long-term solution that won’t require many updates and changes once your business is up and running.

Database design is mainly determined by the database schema – a diagram or a set of rules (integrity constraints) written in a database language like SQL that defines how the data is arranged, stored, and processed.

What forms the center of an ecommerce database?

The three data blocks that form the core of an ecommerce database are the customer, the order, and the product. However, it’s good practice to plan a database with its possible expansion in mind (tables for suppliers, shipments, transactions, and other data).

The Customer

The basic customer information includes a name, an email address, a home address broken down into lines, a login, and a password, as well as additional data like age, gender, purchasing history, etc.

The Order

Order tables hold data about the purchases of each individual client. Usually, this data is organized into two tables. The first table contains order-level information – the order date, the total amount paid, and the Customer ID, a foreign key referencing the customer table.

The second table holds data on each product from the order, including product ID, its price, parameters, and quantity.
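To make this structure more tangible, here is a minimal sketch of the customer and order tables described above, written as SQLite DDL executed from Python. All table and column names are illustrative only, not a prescribed schema.

```python
import sqlite3

# Illustrative schema only: table and column names are assumptions,
# not a prescribed ecommerce design.
conn = sqlite3.connect("shop.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS customer (
    customer_id   INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    email         TEXT UNIQUE NOT NULL,
    address_line1 TEXT,
    address_line2 TEXT
);

-- Order-level information: date, total, and a link to the customer.
CREATE TABLE IF NOT EXISTS customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    order_date  TEXT NOT NULL,
    total_paid  REAL NOT NULL
);

-- Line items: one row per product in an order.
CREATE TABLE IF NOT EXISTS order_item (
    order_id   INTEGER NOT NULL REFERENCES customer_order(order_id),
    product_id INTEGER NOT NULL,
    quantity   INTEGER NOT NULL,
    unit_price REAL NOT NULL,
    PRIMARY KEY (order_id, product_id)
);
""")
conn.close()
```

Keeping order-level data and line items in separate tables lets one order reference any number of products without duplicating order information.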

Design Your Ecommerce Database with Further Expansion in Mind

ScienceSoft can help you design and develop a scalable ecommerce solution.

The Product

Tables with product data include SKUs, product names, prices, categories, weight, and product-specific information like color, size, and material. If you’re selling many products from categories that differ in their attributes, a single table listing all products with null values for some of the product parameters may not be appropriate.

In some cases, an entity–attribute–value (EAV) model may be the optimal approach. In the simplest case, it includes two tables. The first one is a product entity table with product IDs, attribute IDs, and attribute values. The attribute ID is a foreign key into the second table, which holds all attributes and their definitions. This configuration avoids storing every possible attribute (often absent for a specific product type) in the product entity table.
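For illustration, here is a minimal EAV sketch in the same SQLite-from-Python style. The table and column names are assumptions; a production design would typically add typed value columns and indexes.

```python
import sqlite3

# A minimal EAV sketch (names are illustrative): attribute definitions live in
# one table, per-product attribute values in another, so a product only stores
# the attributes it actually has.
conn = sqlite3.connect("shop.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS attribute (
    attribute_id INTEGER PRIMARY KEY,
    name         TEXT NOT NULL,   -- e.g., 'color', 'size', 'material'
    value_type   TEXT NOT NULL    -- e.g., 'text', 'number'
);

CREATE TABLE IF NOT EXISTS product_attribute (
    product_id   INTEGER NOT NULL,
    attribute_id INTEGER NOT NULL REFERENCES attribute(attribute_id),
    value        TEXT NOT NULL,
    PRIMARY KEY (product_id, attribute_id)
);
""")

# Reassemble one product's attributes with a join (product ID 42 is hypothetical).
rows = conn.execute("""
    SELECT a.name, pa.value
    FROM product_attribute pa
    JOIN attribute a ON a.attribute_id = pa.attribute_id
    WHERE pa.product_id = ?
""", (42,)).fetchall()
conn.close()
```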

Develop the core into a tailored solution

To define an optimal ecommerce database design, you need to carefully assess your ecommerce store, considering its size, features, look, integrations, and future evolution. Dedicated database development consultants at ScienceSoft are ready to help you define the tech stack required to support your business throughout its life cycle. If you would like to learn more about the database design options that would be perfect for your store, feel free to ask us.


Are you planning to expand your business online? We will translate your ideas into intelligent and powerful ecommerce solutions.

Self-Service Business Intelligence: Drive Your Company’s Success


Editor’s note: In this article, Marina describes the components of a self-service business intelligence (BI) solution and shares how self-service BI allows non-IT users to create queries and reports on their own. Read on for tips on building your self-service BI project, and if you seek deeper assistance with developing or upgrading your self-service BI solution, consider ScienceSoft’s BI consulting services.

The trend of self-service analytics continues to gain momentum as more and more companies leverage self-service BI tools to put actionable insights in the hands of business users who have little or no background in BI, statistical analysis, or data mining.

As opposed to traditional BI, self-service BI allows business users to make timely strategic and operational decisions without reliance on third parties – IT specialists, data analysts, and data scientists. Still, I would not call self-service BI an alternative to a traditional BI solution but rather its complement, so implementing one doesn’t necessarily require a complete reboot of previous investments.

‘What does self-service BI require, then?’ you might be wondering. Below, you’ll find the answer as I describe the components that support effective self-service business intelligence.

The foundation of self-service BI


To enable the environment in which non-IT professionals can obtain valuable insights from disparate data sources on demand, a self-service BI solution should provide:

Data integration from multiple sources

Your self-service BI solution should allow integrating with different data sources (ERP, CRM, accounting and marketing software, etc.). Additionally, a self-service BI platform has to be flexible enough to integrate big data sources (streaming data, IoT data, social media data, etc.) to further enrich your insights with big data analysis.

Self-service data preparation

To serve the diverse analytical needs of your business users (historical analysis, streaming analytics, predictive analytics, etc.), a self-service BI solution should enable the aggregation of data sets before suitably formatted data is fed into your business intelligence system for reporting and further analysis.

The self-service data preparation tools perform such tasks as:

  • Accessing and cataloging data.
  • Data parsing and profiling.
  • Transforming and modeling data for analysis.
  • Refining and enriching data, etc.

Among the most vivid examples of such tools are Power BI technologies like Power Query and Power BI dataflows, which, besides ingesting data from a variety of data sources, allow cleansing, transforming, integrating, enriching, and schematizing data so you can dive really deep into your business.
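To give a rough idea of what such preparation steps look like under the hood, here is a small Python/pandas sketch of cleansing, transforming, and enriching a data set before it is handed to a BI tool. The file names and columns are hypothetical, and pandas here merely stands in for the self-service tooling mentioned above.

```python
import pandas as pd

# A rough illustration of typical self-service data preparation steps
# (cleansing, transforming, enriching); file names and columns are hypothetical.
orders = pd.read_csv("orders.csv", parse_dates=["order_date"])
customers = pd.read_csv("crm_customers.csv")

orders = orders.drop_duplicates(subset="order_id")               # cleanse duplicates
orders["amount"] = orders["amount"].fillna(0.0)                  # refine missing values
orders["order_month"] = orders["order_date"].dt.to_period("M")   # transform for reporting

# Enrich order data with customer attributes before feeding it to a BI tool.
prepared = orders.merge(customers, on="customer_id", how="left")
prepared.to_csv("prepared_orders.csv", index=False)
```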

Getting Lost in Self-Service BI Technologies?

ScienceSoft’s team is ready to assist you in selecting and configuring the right technology stack to meet your particular business intelligence needs and maximize self-service BI ROI.

The interface suiting every business user

As a self-service BI solution has to satisfy the demands of business users with different levels of technical expertise, it should have an easy-to-use interface for data analysis, reporting and visualization, sharing and collaboration.

As an example, I can think of Power BI, a recognized leader among self-service software, praised for its user-friendly interface. Have a look at how its compelling visualization and intuitive interface helped an international real estate developer analyze financial data from any perspective and, consequently, better understand their business. If you think you may need professional help with assessing how well Power BI can meet your self-service analytics needs or with its implementation, you are welcome to consider our Power BI consulting offer.

To facilitate fast adoption of self-service BI, the solution has to offer a non-technical representation of corporate data, so that all end users can derive value from self-service analytics technologies using common business terms. Thus, they could have a consolidated view of data relevant to such terms as ‘customer’ or ‘product’ with no need to know the relational structure of the tables where the data is stored. If you need an example, check our BI demo to see how easy it is to spot trends and patterns and answer your business questions with such interactive data visualization.

Robust data governance

The agility of self-service BI – a large number of business users accessing critical business data – poses certain risks: data leakage and inconsistent or invalid outputs across business units. So, to stay in control of business analytics quality, I recommend setting up a data governance framework as soon as you deploy self-service BI.

Surely, modern self-service software usually has built-in data governance features. Still, I advise you to develop and set up your own data management procedures in accordance with your industry specifics, BI objectives, the number of users, etc. If you need to dig deeper into the topic, say, to learn how to define your data quality objectives or check data management best practices, explore this insightful guide to data quality management written by my colleague Irene Mikhailouskaya.

Achieve self-service BI success!

Although self-service BI has lots to offer, developing and maintaining such a solution requires substantial effort. Besides choosing proper self-service software, you need to follow an agile approach in developing or upgrading your self-service BI solution and work hard on the solution’s adoption within your business community. If you feel in need of qualified assistance with any of these tasks, my colleagues at ScienceSoft and I are ready to help.


We offer BI consulting services to answer your business questions and make your analytics insightful, reliable and timely.

Can ERP Be the Only Source of Data for Business Intelligence?


Based on our 17 years of experience in BI implementation, we developed a BI framework that allows utilizing data for more structured fact-based decisions. The framework embraces four main components: planning, plan execution, change analysis and optimization. Let’s look at each component separately and check whether the data in a company’s enterprise resource planning (ERP) system is enough to get valuable insights and which other data sources to turn to if ERP data is insufficient or missing.


Planning

The planning component embraces trend analysis, forecasting, and performance analysis.

Trend analysis

Examining only ERP data to identify trends can result in misleading insights. For instance, a manufacturer sees no growth in shipments to a particular region and concludes that the situation remains stable. In reality, the demand for the manufacturer’s products in this region is increasing. The ERP system doesn’t reflect this trend as it’s unaware of unmet demand: the enterprise is working at full capacity and successfully selling everything it produces. However, if the manufacturer turned to their CRM data, they would find an increased number of lost opportunities marked ‘No product available’.

Forecasting

We don’t think that an ERP system is enough for forecasting. Say, to predict customer demand, data from internal and external sources is required, such as detailed sales history (ERP and POS systems), a customer’s location and type (CRM), weather conditions or social media trends (external sources).

Performance analysis

Internal benchmarking also requires not only ERP data. For example, to identify their top-selling stores, a retailer should analyze sales values taken from a POS system, as well as consider the sales floor area and the number of available checkouts, which can be stored in their ERP.

Plan execution

ERP data may be enough to analyze the performance against the plan and spot deviations. However, to run root cause analysis, a company often needs to go beyond ERP data. Say, if a manufacturer has failed to achieve their production plans, they may find the reason in the disrupted deliveries of raw materials by some of their Tier 1 suppliers. To get to these details, a manufacturer has to turn to SCM (supply chain management) data.

Change analysis

ERP can be one of the data sources for change analysis. It’s sufficient for ROI analysis, as finances and assets are tracked in an ERP system. On the other hand, ERP data won’t help in analyzing the effect of redesigning a company’s online store, while the data (e.g., product lists, pictures, descriptions, and visitors’ search and purchase histories) from a content management system and an ecommerce solution will be required for this purpose.

Optimization

We don’t recommend that companies rely solely on ERP data if they take on business process optimization. Take asset management as an example: ERP is likely to contain just the machinery name, purchase date, and price. If a company strives to use their machinery efficiently and reduce overall costs through preventive and even predictive maintenance, they need to know equipment utilization and maintenance schedules. And this info is usually stored in an MES (manufacturing execution system) or dedicated equipment utilization software.

To sum it up

Though ERP is a vital source of data for BI, we don’t recommend embedding business intelligence into ERP. BI can only bring value when it’s on top of all the company’s applications, such as ERP, CRM, SCM, CMS, MES, and POS, and when it uses external data sources.


BI expertise since 2005. Full-cycle services to deliver powerful BI solutions with rich analysis options. Iterative development to bring quick wins.

Major Pros and Cons to Consider


Editor’s note: Do you wonder whether Microsoft Power BI is the right self-service analytics solution for you? Marina shares her vision on its major pros and cons, which can influence your decision. To learn how we support our customers in their business intelligence projects, check ScienceSoft’s Power BI consulting services.

When ScienceSoft’s customers ask for more agility and independence in their analytics and reporting, I recommend considering Microsoft Power BI. Rather than forcing them into an implementation project right away, I suggest they start by carefully mapping their business analytics needs against Power BI’s advantages and disadvantages. Here is the list of Power BI strengths and limitations I share with companies to consider in the first place.

What I like about Power BI

Data integration and visualization capabilities

In my everyday practice, I see how handy Power BI is for creating meaningful data stories by importing data of various formats from a diversity of external and internal data sources. It is especially beneficial for companies that don’t have a data warehouse solution – Power BI acts as a facilitator that enables data set processing.

Also, with the possibility of creating reports and personalized dashboards, business users of every level can use Power BI to make quick and confident decisions. To get an idea of how personalized reports and dashboards may look, check ScienceSoft’s project where we enabled an international real estate developer to gain in-depth insight into their business and spot trends for new business opportunities.

Affordability

I believe Power BI suits every pocket, enabling self-explanatory dashboards and reports at the lowest possible price. Power BI Desktop is free of charge – you just download it and create reports. However, a Power BI Desktop limitation to take into account is the inability to collaborate on insights with your colleagues within a single space. For that, you’ll need to complement Desktop with the Power BI service, which is available at a relatively low price – $9.99/user/month – or at no additional cost for Office 365 E5 users.

Data analytics capabilities

When customers turn to ScienceSoft to upgrade their old and rigid enterprise DWHs to perform particular analytics, Power BI is one of the options we offer them, as it enables necessary analytics within days or even hours and eliminates the need for lengthy development and implementation.

If you want to read about Power BI value in more detail, don’t hesitate to explore our article dedicated to Power BI benefits.

Let us show you the Power BI potential!

If you want to see how Power BI translates raw data into meaningful data stories, have a look at our Power BI Demo.

What I don’t like about Power BI

Complexity

Power BI is intuitive enough when it comes to importing data and creating simple reports. However, I always warn customers who require advanced analytics within the Power BI suite that they will need to master additional tools (Power Pivot, Power Query, etc.) or hire an external consultant to conduct complex analysis.

Costly on-premises storage and processing

When our clients need to keep their data and reports on-premises because of legal regulations or specifics of their industry, I advise them to deploy Power BI Report Server. However, the solution is only available in the Power BI Premium plan, so they will have to pay starting from $4,995/month for the possibility of keeping their Power BI content in-house.

Limitations to face when dealing with huge data sets

When employing Power BI to derive insights from massive data sets, you must keep data set limits in mind. For example, the size of a data set you can import into Power BI Pro is 1 GB, which can naturally limit the complexity of reports and dashboards. If you want to import and analyze larger data sets with Power BI, you can try creating multiple queries to process the entire data set or shift to Power BI Premium.

Is Microsoft Power BI the right solution for you?

Even though Power BI is one of the most powerful facilitators for self-service business intelligence, I believe that the feasibility of each Power BI project should be estimated individually. In case you need assistance with defining the right set of Power BI functions or advising on Power BI implementation, I am ready to answer your questions.


Do you need to get an expert opinion on Microsoft Power BI? Our consultants will analyze your current reporting capabilities and offer an optimal solution to meet your business needs.

Examples, Sources and Technologies Explained


For years, people have asked all-knowing Google how big data can help businesses succeed, what big data technologies are the best, and other important questions. A lot has been written and said about big data already, but the term itself often remains unexplained. To be fair, we don’t count the widespread definition that “big data is big.” This concept only raises another question: what is the measure of “big” – 1 terabyte, 1 petabyte, 1 exabyte or more?

Here, our big data consulting team defines the concept of big data by describing its key features. To give a complete picture, we also share an overview of big data examples from different industries and enumerate the main sources of big data and its fundamental technologies.

What is big data

Big data defined

Here’s our definition:

Big data is the data that is characterized by such informational features as the log-of-events nature and statistical correctness, and that imposes such technical requirements as distributed storage, parallel data processing and easy scalability of the solution.

Below, you can read about these features and requirements in more detail.

Informational features: In contrast to traditional data that may change at any moment (e.g., bank accounts, quantity of goods in a warehouse), big data represents a log of records where each describes some event (e.g., a purchase in a store, a web page view, a sensor value at a given moment, a comment on a social network). Due to its very nature, event data does not change.

Besides, big data may contain omissions and errors, which makes it a bad choice for the tasks where absolute accuracy is crucial. So, it doesn’t make much sense to use big data for bookkeeping. However, big data is correct statistically and can give a clear understanding of the overall picture, trends and dependencies. Another example from Finance: big data can help identify and measure market risks based on the analysis of customer behavior, industry benchmarks, product portfolio performance, interest rates history, commodity price changes, etc.

Technical requirements: Big data has a volume that requires parallel processing and a special approach to storage: one computer (or one node as IT gurus call it) is not sufficient to perform these tasks – we need many, typically from 10 to 100.

Besides, a big data solution needs scalability. To cope with ever-growing data volume, we shouldn’t have to change the software each time the amount of data increases: we just add more nodes, and the data is redistributed among them automatically.

Big data examples

To better understand what big data is, let’s go beyond the definition and look at some examples of practical application from different industries.

1. Customer analytics

To create a 360-degree customer view, companies need to collect, store and analyze a plethora of data. The more data sources they use, the more complete picture they will get. Say, for each of their 10+ million customers they can analyze 5 types of customer big data:

  • Demographic data (this customer is a woman, 35 years old, has two children, etc.).
  • Transactional data (the products she buys each time, the time of purchases, etc.).
  • Web behavior data (the products she puts into her basket when she shops online).
  • Data from customer-created texts (comments about the company that this woman leaves on the internet).
  • Data about product/service use (feedback on the quality of the goods ordered, the speed of delivery, etc.).

Customer analytics is equally beneficial for companies and customers. The former can adjust their product portfolio to better satisfy customer needs and organize efficient marketing activities. The latter can enjoy favorite products, relevant promotions and personalized communication.

2. Industrial analytics

To avoid expensive downtimes that affect all the related processes, manufacturers can use sensor data to foster proactive maintenance. Imagine that the analytical system has been collecting and analyzing sensor data for several months to form a history of observations. Based on this historical data, the system has identified a set of patterns that are likely to end up with a machine breakdown. For instance, the system recognizes that the picture formed by temperature and load sensors is similar to pre-failure situation #3 and alerts the maintenance team to check the machinery.

It’s important to mention that preventive maintenance is not the only example of how manufacturers can use big data. In this article, you’ll find a detailed description of other real-life big data use cases.  

3. Business process analytics

Companies also use big data analytics to monitor the performance of their remote employees and improve the efficiency of the processes. Let’s take transportation as an example. Companies can collect and store the telemetry data that comes from each truck in real time to identify the typical behavior of each driver. Once the pattern is defined, the system analyzes real-time data, compares it with the pattern and signals if there is a mismatch. Thus, the company can ensure safe working conditions (drivers are supposed to switch to have a rest, but they sometimes neglect the rule).

4. Analytics for fraud detection

Banks can detect unusual card behavior in real time (if somebody else, not the owner, is using it) and block suspicious activities or at least postpone them to notify the owner. For example, if the user is trying to withdraw money in Spain while they reside in Texas, before declining the transaction, the bank can check the user’s info on the social network – maybe they are simply on vacation. Besides, the bank can verify if this user has any linkage with fraud-related accounts or activities across all other channels.

Big data sources: internal and external

There are two types of big data sources: internal and external ones. Data is internal if a company generates, owns and controls it. External data is public data or the data generated outside the company; correspondingly, the company neither owns nor controls it.

Let’s look at some self-explanatory examples of data sources.

Internal and external big data sources

Autonomous system or a part of traditional BI?

Big data can be used both as a part of traditional BI and in an independent system. Let’s turn to examples again. A company analyzes big data to identify behavior patterns of every customer. Based on these insights, it allocates the customers with similar behavior patterns to a particular segment. Finally, a traditional BI system uses customer segments as another attribute for reporting. For instance, users can create reports that show the sales per customer segment or their response to a recent promotion.

Another example: Imagine an ecommerce website supported by the analytical system that identifies the preferences of each user by monitoring the products they buy or are interested in (according to the time spent on a product page). Based on this information, the system recommends “you-may-also-like” products. This is an independent system.

Big data technologies: overview of good-to-know names and terms


The world of big data speaks its own language. Let’s look at some good-to-know terms and most popular technologies:

  • Cloud is the delivery of on-demand computing resources on a pay-for-use basis. This approach is widely used in big data, as the latter requires fast scalability. E.g., an administrator can add 20 computers in a few clicks.
  • Hadoop is a framework used for distributed storage of huge amounts of data (its HDFS component) and parallel data processing (Hadoop MapReduce). It breaks a large chunk into smaller ones to be processed separately on different data nodes (computers) and automatically gathers the results across the multiple nodes to return a single result. Quite often Hadoop means the ecosystem that covers multiple big data technologies, such as Apache Hive, Apache HBase, Apache Zookeeper and Apache Oozie.
  • Apache Spark is a framework used for in-memory parallel data processing, which makes real-time big data analytics possible. E.g., an analytical system may identify that a visitor has been spending quite a long time on particular product pages, but has not added them to the cart yet. To motivate a purchase, the system can offer a discount coupon for the product of interest.
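To make the Spark entry above more concrete, here is a minimal PySpark sketch that aggregates page-view events in parallel across a cluster. The file path and column names are hypothetical, and the snippet assumes a working Spark installation.

```python
from pyspark.sql import SparkSession, functions as F

# A minimal Spark sketch: parallel aggregation of page-view events.
# The file path and column names are hypothetical.
spark = SparkSession.builder.appName("pageview-stats").getOrCreate()

events = spark.read.json("data/clickstream/*.json")  # distributed read across nodes

views_per_product = (
    events.filter(F.col("event_type") == "page_view")
          .groupBy("product_id")
          .agg(F.count("product_id").alias("views"),
               F.countDistinct("user_id").alias("unique_visitors"))
)

views_per_product.orderBy(F.desc("views")).show(10)
spark.stop()
```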


Now you know what big data is, don’t you?

Our big data consultants created a short quiz. There are five questions for you to check how much you’ve learned about big data:

  1. What kind of data processing does big data require?
  2. Is big data 100% reliable and accurate?
  3. If your goal is to create a unique customer experience, what kind of big data analytics do you need?
  4. Name at least three external sources of big data.
  5. Is there any similarity between Hadoop and Apache Spark?

Well done! We hope that the article was helpful to you and that after reading it you’ve found the quiz easy.


Big data is another step to your business success. We will help you to adopt an advanced approach to big data to unleash its full potential.

Supplier Risk Assessment with Data Science


More than once, our data scientists have heard complaints from retailers and manufacturers about suppliers missing delivery deadlines, failing to meet quality requirements, or bringing incomplete orders. To avoid these problems, businesses strive to optimize their current approaches to assessing supplier risks. As the issue is common and critical, our data science consultants decided to share best practices and describe an alternative approach to assessing supplier risks. This one relies on deep learning – one of the most advanced data science techniques – and allows businesses to build accurate short-term and long-term predictions of a supplier’s failure to meet their expectations.

Here’s a summary of what we cover in this blog post: the limitations of the traditional approach to supplier risk assessment, a data science-based alternative built around a convolutional neural network, and the advantages and limitations of that approach.

The limitations of the traditional approach to supplier risk assessment

Traditionally, businesses assess supplier risks based on such general data about suppliers as their location, size and financial stability, leaving out suppliers’ daily performance. And even if suppliers’ performance is considered, the traditional approach usually means a simplified classification that can easily result in a table like this:

Traditional approach to supplier risk assessment (sample classification table)

With such an approach, several suppliers end up rated the same, and the table doesn’t reveal particular patterns, say, a trend in the latest deliveries of a particular category by a given supplier.

The data science-based approach to supplier risk assessment

Our data scientists suggest an alternative to the traditional supplier risk assessment – a data science-based approach. However, to enjoy it, a business must have extensive data sets with supplier profiles and delivery details, which serve as a starter kit. Without this data, a business cannot proceed with designing and developing a solution.

Below, we suggest a possible structure for each of the data sets, but they are neither obligatory nor rigid and only serve as an example. To make the set relevant for their needs, businesses can add specific properties, for example, expand a supplier profile with such criteria as financial situation, market reputation, and production capability.

Supplier data

Supplier data for risk assessment (sample data set structure)

A supplier can mean either a company with all its manufacturing facilities or a separate manufacturing facility.

Delivery data

Delivery data for assessing supplier risk (sample data set structure)

Data science allows analyzing this diverse data and converting it into one of the prediction types like in the table below.

Predicting supplier failure (sample prediction types)

The essence of a data science-based solution

Based on our experience, we suggest using a convolutional neural network (CNN) for the solution. Let’s go through its constituent parts.

CNN structure

A CNN has a complex structure consisting of several layers. Their number can vary depending on what criteria a business identifies as meaningful in their specific case, as well as on the output they expect to receive. We’ll take the data from our example to illustrate how the CNN works; the expected output is a binary, non-detailed prediction of whether a supplier will fail within the next 3 and within the next 20 deliveries.

Ingesting data

Let’s put supplier data aside for a while – we’ll need it a bit later – and focus on delivery data. For a CNN to consume it, delivery data should be represented as channels, where each channel corresponds to a certain delivery property, for example, delivery criticality.

How a CNN sees delivery data (channels)

In every channel, each cell (or a ‘neuron’ if we use data science terms) takes a certain value in accordance with the encoding principle chosen. For example, we may choose the following gradation to describe the timeliness property: -1 for ‘very early’, -0.5 for ‘early’, 0 for ‘in time’, 0.5 for ‘late’ and 1 for ‘very late’.
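As a small illustration of this encoding, here is a Python/NumPy sketch that turns a few deliveries of one supplier into channels. The timeliness values follow the gradation above; the other channels and their values are illustrative assumptions.

```python
import numpy as np

# Encode recent deliveries of one supplier as channels (one row per property,
# one column per delivery). Channel choice and values are illustrative.
TIMELINESS = {"very early": -1.0, "early": -0.5, "in time": 0.0,
              "late": 0.5, "very late": 1.0}

deliveries = [
    {"timeliness": "in time",   "criticality": 0.2, "completeness": 1.0},
    {"timeliness": "late",      "criticality": 0.8, "completeness": 0.9},
    {"timeliness": "very late", "criticality": 0.8, "completeness": 1.0},
]

channels = np.array([
    [TIMELINESS[d["timeliness"]] for d in deliveries],   # timeliness channel
    [d["criticality"]            for d in deliveries],   # criticality channel
    [d["completeness"]           for d in deliveries],   # completeness channel
])  # shape: (n_channels, n_deliveries) – the layout a 1D CNN expects
```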

Extracting features

In deep learning, a feature is a repetitive pattern in data that serves as a foundation for predictions. For delivery data, a feature can be a certain combination of values for delivery criticality, batch size and timeliness.

As compared with other machine learning algorithms, a CNN has a strong advantage – it identifies features on its own. Though feature extraction skills are inherent to CNNs, they would come to nothing without special training. When a CNN learns, it examines a plethora of labeled data (the historical data that contains both the details about suppliers and deliveries and the info whether a supplier has failed) and extracts distinctive patterns that influence the output.

When features are known, a CNN performs a number of convolution and pooling operations on the newly incoming data. During convolution, a CNN applies each feature (which serves as a filter) to every possible fragment of the delivery data. Very simple math happens at this stage: each value of the fragment gets multiplied by the corresponding value of the filter, and the sum of these products is divided by their number. After this operation, each initial fragment of delivery data turns into a set of new (filtered) fragments, which are smaller than the initial one but still preserve its features. Across convolutions, the CNN first extracts low-level features and then high-level features, increasing the scale at each new layer. For example, low-level features may cover 3 deliveries, while high-level features may cover 100 deliveries.

During pooling, a CNN takes another filter (called ‘a window’). Contrary to feature filters, this one is designed by data scientists. A CNN slides the window filter over a convoluted fragment and chooses the highest value each time. As a result, the number of fragments does not change, but their size decreases dramatically.
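Here is a toy NumPy sketch of the convolution and pooling steps exactly as described above, applied to a single channel. The filter values are made up for illustration; in a real CNN they are learned during training.

```python
import numpy as np

def convolve(signal, filt):
    """Apply one learned filter to every fragment of the delivery sequence:
    multiply element-wise, sum, and divide by the number of values."""
    n = len(filt)
    return np.array([np.sum(signal[i:i + n] * filt) / n
                     for i in range(len(signal) - n + 1)])

def max_pool(signal, window=2):
    """Slide a window over the convolved fragment and keep the highest value."""
    return np.array([signal[i:i + window].max()
                     for i in range(0, len(signal) - window + 1, window)])

timeliness = np.array([0.0, 0.5, 0.5, 1.0, 1.0, 0.5])   # encoded deliveries
late_streak_filter = np.array([0.5, 1.0, 1.0])          # an illustrative 'late streak' feature

feature_map = convolve(timeliness, late_streak_filter)  # smaller, filtered fragment
pooled = max_pool(feature_map)                          # even smaller after pooling
```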

Classifying into failure/non-failure

After the last pooling operation, the neurons get flattened to form the first layer of a fully connected neural network where each neuron of one layer is connected with each neuron of the following layer. This is another part of a CNN, which is in charge of making predictions.

It’s time to recall our supplier data: we add it to the neurons holding the results of feature extraction to improve the quality of predictions.

At the classification stage, we don’t have filters anymore. Instead, we have weights. To understand the nature of weights, it would be useful to regard them as coefficients that are applied to each neuron’s value to influence the output.

These multiple data transformations end with the output layer, where we have two neurons that say whether the supplier will fail within the next 3 and within the next 20 deliveries. Two neurons are required for our binary, non-detailed prediction, while other prediction types may require a different structure of the output layer.
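For readers who prefer code, below is a minimal PyTorch sketch of such a network: convolution and pooling layers over the delivery channels, a flattening step, concatenation with the supplier profile, and a fully connected classifier ending in two output neurons. The layer sizes and channel counts are assumptions for illustration, not a recommended production architecture.

```python
import torch
import torch.nn as nn

class SupplierRiskCNN(nn.Module):
    """Minimal sketch of the described architecture; layer sizes are assumptions."""
    def __init__(self, n_channels=3, n_deliveries=100, n_supplier_features=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 16, kernel_size=3),  # low-level delivery features
            nn.ReLU(),
            nn.MaxPool1d(2),                           # pooling window
            nn.Conv1d(16, 32, kernel_size=3),          # higher-level features
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        conv_out = 32 * (((n_deliveries - 2) // 2 - 2) // 2)
        self.classifier = nn.Sequential(
            nn.Linear(conv_out + n_supplier_features, 64),
            nn.ReLU(),
            nn.Linear(64, 2),   # two outputs: short-term and long-term failure risk
        )

    def forward(self, delivery_channels, supplier_profile):
        x = self.features(delivery_channels).flatten(1)   # flatten the conv output
        x = torch.cat([x, supplier_profile], dim=1)       # add the supplier data
        return self.classifier(x)                         # raw scores (logits)

model = SupplierRiskCNN()
logits = model(torch.randn(8, 3, 100), torch.randn(8, 5))  # a batch of 8 suppliers
probs = torch.sigmoid(logits)   # probabilities of short- and long-term failure
```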

A few extra words about how a CNN learns

When a CNN starts learning, all its filters and weights are random values. Then the labeled data flows through the CNN and all the filters and weights are applied. Finally, the CNN produces a certain output, for example, this supplier will fail both short-term and long-term. Then it compares the predictions with what really happened to calculate the error it made. Say, in reality, the supplier delivered on their commitments, both short- and long-term. After that, the CNN adjusts all the filters and weights to reduce the error and recalculates the prediction with newly set weights. This process is repeated many times until the CNN finds the filters and weights that produce the minimum error.
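A bare-bones training loop for the SupplierRiskCNN sketch above could look as follows. The dummy tensors stand in for real labeled supplier and delivery history, and the loss and optimizer choices are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Dummy labeled data just to make the sketch runnable; real data would come
# from the supplier and delivery history described above.
dataset = TensorDataset(torch.randn(64, 3, 100),                 # delivery channels
                        torch.randn(64, 5),                      # supplier profiles
                        torch.randint(0, 2, (64, 2)).float())    # short-/long-term failure labels
labeled_loader = DataLoader(dataset, batch_size=8, shuffle=True)

model = SupplierRiskCNN()                 # from the sketch above
criterion = nn.BCEWithLogitsLoss()        # error between prediction and reality
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(20):
    for deliveries, profile, labels in labeled_loader:
        optimizer.zero_grad()
        logits = model(deliveries, profile)
        loss = criterion(logits, labels)  # how wrong the prediction was
        loss.backward()                   # find how to adjust filters and weights
        optimizer.step()                  # adjust them to reduce the error
```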

Why the data science-based approach is good and not so good

Based on the example of the described solution, we can draw a conclusion about some benefits and drawbacks of the data science-based approach.

Advantages

  • An unbiased view of a supplier

A CNN leaves no room for subjective opinions – it sets its filters and weights, and no buyer can influence the transformations that happen. Contrary to the traditional approach, a solution based on data science allows for a unified assessment of supplier risks as it relies on data, not on the personal opinions of category managers or buyers.

  • Captured everyday performance

Instead of deciding on a supplier’s reliability once and for all, data-driven businesses get regular updates about each of their supplier’s performance. If, say, the supplier is late or the order is incomplete or something else is wrong with the delivery, an entry appears in the ERP system and this information is soon fed into a CNN to influence the predictions.

  • A detailed view of a supplier’s performance

The generalized assessment that the traditional approach offers is insufficient for risk management. Suppose a supplier has occasional problems with product quality. Does this mean that a business will face this problem during the supplier’s next delivery? A data science-based approach gives a probabilistic answer to this and other questions as it considers numerous delivery properties and supplier details.

  • Identified non-linear dependencies

Linear dependencies are rare in a business environment. For instance, if the number of critical deliveries for a certain supplier increased by 10% and this led to a 15% rise in short-term failures, it wouldn’t mean that a 10% increase in critical deliveries for another supplier will also lead to 15% more short-term failures. A CNN, like any deep learning algorithm, is built to capture both linear and non-linear dependencies – the neurons of the classification part have non-linear functions at their core.

Limitations

Though a data science-based approach to measuring supplier risks offers many advantages, it also has some serious limitations.

  • Dependence on data amount and quality

To get trained and build predictions that can be trusted, a data science-based solution needs a large amount of data. Therefore, the solution is not suitable for companies that have a small supplier base and/or a very diverse supplier set that doesn’t contain any stable pattern. The frequency of deliveries is an important limitation, too – the approach won’t work for suppliers who deliver rarely.

  • Need for professional data scientists

The accuracy of predictions is in data scientists’ hands. They make a lot of fundamental decisions, for instance, on the solution’s architecture, the number of convolution layers and neurons, and the size of window filters.

  • Serious efforts required for adoption

It’s insufficient just to design and implement a solution based on data science. A business should always think about measures to take to introduce the change smoothly. Without dedicated training on deep learning basics in general and on the solution in particular, category managers or buyers won’t trust the predictions and will continue with their traditional practices of working with Excel tables.

To keep with the tradition or to advance with data science?

Neither the traditional nor the data science-based approach to supplier risk assessment is flawless, but their limitations are of a different nature. The traditional approach is relatively simple in terms of implementation but quite modest in terms of business insights; the data science-based approach is its exact opposite. On the one hand, it’s extremely dependent on the amount and quality of data, and it requires the involvement of professional data scientists and serious adoption efforts. On the other hand, it can produce different types of accurate predictions that consider each supplier’s daily performance. And this can effectively prevent many of the problems triggered by unreliable suppliers.


Bringing data science on board is promising, yet difficult. We’ll solve all the challenges and let you enjoy the advantages that data science offers.

Self-service Analytics, or Your Next Step Towards a Data-driven Company


Editor’s note: Marina explains why self-service analytics has gained much traction and shares how to leverage its capabilities to the fullest. If you consider integrating self-service analytics into your analytics environment or improving the existing solution, don’t hesitate to turn to ScienceSoft’s data analytics consulting services for professional assistance.

Nowadays, with data being widely recognized as a valuable asset, I see how companies are willing to make data analysis easily available for more business users. And the market offers them a solution for that – self-service data analytics. The term speaks for itself: when employed, self-service analytics allows business users to perform queries and generate reports on their own.

I bet this ambitious possibility raises some questions. Does that mean that traditional data analytics, when you have to request reports from the IT department to obtain insights, is a relic of times past? Can every employee access any corporate data now? How can people with no understanding of data modeling conduct effective analysis anyway? Keep on reading for the answers and see how to make self-service analytics work for you, not against you.


Top 3 reasons why I think self-service analytics is worth investing in

Data analytics is a well-known facilitator of data-driven decision-making for the business. However, there are times when traditional data analytics cannot satisfy urgent business requirements promptly. And here comes self-service analytics. Below you can see the self-service analytics benefits that make it a perfect complement to traditional data analysis:

1. Faster decision-making

With self-service analytics, business users don’t need to wait for the reports to be done for them: they can run queries and get whatever data they need to make timely decisions as fast as self-service analytics software allows. For example, in one of ScienceSoft’s projects, self-service analytics software enabled the customer to modify the existing reports according to the current business needs and conduct analysis just by pressing certain buttons on their user application home screen.

2. Self-sufficiency for business users + enhanced productivity for data analysts

Our customers particularly praise self-service analytics for the ability to make ad-hoc reporting and analytics accessible for employees with no technical background.

Additionally, as more employees obtain independence in running queries and conducting data analysis, data scientists and skilled analysts can shift their focus from simple analytics tasks towards their core and more complicated ones.

3. Data democratization

Self-service analytics facilitates data literacy and the spread of data-driven culture by granting access to data to a larger number of employees. Surely, it doesn’t mean that any employee gains free access to critical business data as access should be regulated by data governance policies. However, you should remember that the chosen security procedures may affect the performance of the analytics solution (e.g., it can take too long for the system to produce the required reports). To avoid such a negative outcome, I advise paying particular attention to tuning user access control. You may see this approach realized in one of our projects, where we set up an access model so that it wouldn’t slow down the overall self-service analytics solution. As a result, the customer could democratize data across their corporation with no risks of data breaches and solution performance issues.

Schedule a free demo!

ScienceSoft’s team will leverage top self-service software to present your sample data in the form of immersive reports and dashboards.

3 steps for self-service analytics success

To ensure that self-service analytics brings the desired results, I advise you to focus on:

1. Informed choice of self-service analytics tools

When choosing self-service software, you should think about such aspects as data integration capabilities, advanced analytics capabilities, speed of reporting, visualization capabilities, data security, and much more. Taking into account that there are so many factors to consider, configuring the right technology stack for your self-service solution is much easier with professional help. With experienced consultants, you will be able to opt for software that satisfies both your immediate and long-term business goals.

Many customers I work with consider Microsoft Power BI to be a great facilitator of self-service data analysis. You are welcome to read my article about Power BI pros and cons, which may help you assess the feasibility of this self-service analytics tool for your company.

2. User adoption

Remember that by granting business users access to self-service business intelligence, you don’t automatically give these users the skillset they need for leveraging its potential. To help employees embrace new capabilities, I advise you to:

  • Choose software with a simple and intuitive user interface, so that non-technical users could master it. In case you wonder what an intuitive user interface may look like, feel free to watch our Power BI demo.  
  • Follow the agile adoption approach: don’t rush your employees into harnessing the full breadth of your new software. Let them feel and appreciate the first tangible outcomes and gradually encourage them to explore software further.
  • Conduct proper training for end users to ensure their skills match the self-service analytics software.

3. Data management procedures

Keep in mind that delivered reports are only as accurate as the data depicted in them, so you’ll have to set up proper data management procedures. Regardless of the fact that you employ self-service analytics software, the role of a data analyst is still crucial. Only a professional data analyst can perform such activities as data cleaning, preparing data sets that are further used by end-users, conducting advanced data analysis and monitoring system performance to ensure its high effectiveness.

So, how to let your business users own the data?

Although self-service analytics empowers more business users to make better decisions at the speed of business, reaching this level of data maturity and scaling your corporate analytical culture can be tough. You need to equip business users with properly chosen self-service tools, the right level of access to data based on their business roles, and the guidance they need. More often than not, achieving this is impossible without professional help. If you don’t know how to start your transition to a truly data-driven company, or you’ve encountered some problems with your existing self-service analytics solution, I am here to offer my assistance.


Are you striving for informed decision-making? We will convert your historical and real-time data into actionable insights and set up forecasting.

7 Major Big Data Challenges and Ways to Solve Them


Before going to battle, each general needs to study his opponents: how big their army is, what their weapons are, how many battles they’ve had and what primary tactics they use. This knowledge can enable the general to craft the right strategy and be ready for battle.

Just like that, before going into big data, each decision maker has to know what they are dealing with. Here, our big data consultants cover 7 major big data challenges and offer their solutions. Using this ‘insider info’, you will be able to tame the scary big data creatures without letting them defeat you in the battle for building a data-driven business.


Challenge #1: Insufficient understanding and acceptance of big data

Oftentimes, companies fail to know even the basics: what big data actually is, what its benefits are, what infrastructure is needed, etc. Without a clear understanding, a big data adoption project risks being doomed to failure. Companies may waste lots of time and resources on things they don’t even know how to use.

And if employees don’t understand big data’s value and/or don’t want to change the existing processes for the sake of its adoption, they can resist it and impede the company’s progress.

Solution:

Big data, being a huge change for a company, should be accepted by top management first and then down the ladder. To ensure big data understanding and acceptance at all levels, IT departments need to organize numerous trainings and workshops.

To further support big data acceptance, the implementation and use of the new big data solution need to be monitored and controlled. However, top management should not overdo the control, because it may have an adverse effect.

Challenge #2: Confusing variety of big data technologies


It can be easy to get lost in the variety of big data technologies now available on the market. Do you need Spark or would the speeds of Hadoop MapReduce be enough? Is it better to store data in Cassandra or HBase? Finding the answers can be tricky. And it’s even easier to choose poorly, if you are exploring the ocean of technological opportunities without a clear view of what you need.

Solution:

If you are new to the world of big data, trying to seek professional help would be the right way to go. You could hire an expert or turn to a vendor for big data consulting. In both cases, with joint efforts, you’ll be able to work out a strategy and, based on that, choose the needed technology stack.


Challenge #3: Paying loads of money

Big data on-premises vs. in-cloud costs

Big data adoption projects entail lots of expenses. If you opt for an on-premises solution, you’ll have to mind the costs of new hardware, new hires (administrators and developers), electricity and so on. Plus: although the needed frameworks are open-source, you’ll still need to pay for the development, setup, configuration and maintenance of new software.

If you decide on a cloud-based big data solution, you’ll still need to hire staff (as above) and pay for cloud services, big data solution development as well as setup and maintenance of needed frameworks.

Moreover, in both cases, you’ll need to allow for future expansions to avoid big data growth getting out of hand and costing you a fortune.

Solution:

The particular salvation of your company’s wallet will depend on your company’s specific technological needs and business goals. For instance, companies that want flexibility benefit from the cloud, while companies with extremely strict security requirements go on-premises.

There are also hybrid solutions, when parts of the data are stored and processed in the cloud and parts on-premises, which can also be cost-effective. And resorting to data lakes or algorithm optimizations (if done properly) can also save money:

  1. Data lakes can provide cheap storage opportunities for the data you don’t need to analyze at the moment.
  2. Optimized algorithms, in their turn, can reduce computing power consumption by 5 to 100 times. Or even more.

All in all, the key to solving this challenge is properly analyzing your needs and choosing a corresponding course of action.

Challenge #4: Complexity of managing data quality

Data from diverse sources

Sooner or later, you’ll run into the problem of data integration, since the data you need to analyze comes from diverse sources in a variety of different formats. For instance, ecommerce companies need to analyze data from website logs, call-centers, competitors’ website ‘scans’ and social media. Data formats will obviously differ, and matching them can be problematic. For example, your solution has to know that skis named SALOMON QST 92 17/18, Salomon QST 92 2017-18 and Salomon QST 92 Skis 2018 are the same thing, while companies ScienceSoft and Sciencesoft are not.

Unreliable data

Nobody is hiding the fact that big data isn’t 100% accurate. And all in all, it’s not that critical. But that doesn’t mean you shouldn’t control how reliable your data is at all. Not only can it contain wrong information, it can also contain duplicates and contradictions. And data of extremely inferior quality is unlikely to bring any useful insights or shiny opportunities to your precision-demanding business tasks.

Solution:


There is a whole bunch of techniques dedicated to cleansing data. But first things first: your big data needs to have a proper model. Only after creating that can you go ahead and do other things, like:

  • Compare data to the single point of truth (for instance, compare variants of addresses to their spellings in the postal system database).
  • Match records and merge them, if they relate to the same entity.

But mind that big data is never 100% accurate. You have to know it and deal with it, which is something this article on big data quality can help you with.
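As a tiny illustration of record matching with the ski example above, here is a Python sketch based on the standard difflib module. The normalization rules and similarity threshold are assumptions; real-world matching usually relies on dedicated data quality or MDM tooling and entity-specific rules.

```python
import difflib
import re

def normalize(name: str) -> str:
    """Crude normalization: lowercase, drop season/year tokens and the word 'ski(s)'."""
    name = name.lower()
    name = re.sub(r"\b(20\d\d(-\d\d)?|\d\d/\d\d|skis?)\b", " ", name)
    return re.sub(r"\s+", " ", name).strip()

def same_product(a: str, b: str, threshold: float = 0.9) -> bool:
    """Treat two catalog entries as the same product if their normalized names are close enough."""
    ratio = difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return ratio >= threshold

print(same_product("SALOMON QST 92 17/18", "Salomon QST 92 Skis 2018"))  # True – same product

# Note: matching rules must be entity-specific. Company names that differ only
# in case (ScienceSoft vs. Sciencesoft) may denote different companies, so a
# case-insensitive rule like the one above would be wrong for them.
```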

Challenge #5: Dangerous big data security holes

Security challenges of big data are quite a vast issue that deserves a whole other article dedicated to the topic. But let’s look at the problem on a larger scale.

Quite often, big data adoption projects put security off till later stages. And, frankly speaking, this is not too smart a move. Big data technologies do evolve, but their security features are still neglected, since it’s hoped that security will be ensured on the application level. And what do we get? In both cases (technology advancement and project implementation), big data security just gets cast aside.

Solution:

The precaution against possible big data security challenges is putting security first. It is particularly important at the stage of designing your solution’s architecture, because if you don’t take care of big data security from the very start, it’ll bite you when you least expect it.

Challenge #6: Tricky process of converting big data into valuable insights


Here’s an example: your super-cool big data analytics looks at what item pairs people buy (say, a needle and thread) solely based on your historical data about customer behavior. Meanwhile, on Instagram, a certain soccer player posts his new look, and the two characteristic things he’s wearing are white Nike sneakers and a beige cap. He looks good in them, and people who see that want to look this way too. Thus, they rush to buy a similar pair of sneakers and a similar cap. But in your store, you have only the sneakers. As a result, you lose revenue and maybe some loyal customers.

Solution:

The reason you failed to have the needed items in stock is that your big data tool doesn’t analyze data from social networks or competitors’ web stores, while your rival’s big data solution, among other things, notes trends in social media in near-real time. And their shop has both items and even offers a 15% discount if you buy both.

The idea here is that you need to create a proper system of factors and data sources, whose analysis will bring the needed insights, and ensure that nothing falls out of scope. Such a system should often include external sources, even if it may be difficult to obtain and analyze external data.

Challenge #7: Troubles of upscaling

The most typical feature of big data is its dramatic ability to grow. And one of the most serious challenges of big data is associated exactly with this.

Your solution’s design may be thought through and adjusted to upscaling with no extra effort. But the real problem isn’t the actual process of introducing new processing and storage capacities. It lies in the complexity of scaling up so that your system’s performance doesn’t decline and you stay within budget.

Solution:

The first and foremost precaution for challenges like this is a decent architecture of your big data solution. As long as your big data solution can boast such a thing, fewer problems are likely to occur later. Another highly important thing to do is designing your big data algorithms with future upscaling in mind.

But besides that, you also need to plan for your system’s maintenance and support so that any changes related to data growth are properly attended to. And on top of that, holding systematic performance audits can help identify weak spots and address them in a timely manner.

Win or lose?

As you may have noticed, most of the reviewed challenges can be foreseen and dealt with if your big data solution has a decent, well-organized and thought-through architecture, which means that companies should take a systematic approach to it. But besides that, companies should:

  • Hold workshops for employees to ensure big data adoption.
  • Carefully select technology stack.
  • Mind costs and plan for future upscaling.
  • Remember that data isn’t 100% accurate but still manage its quality.
  • Dig deep and wide for actionable insights.
  • Never neglect big data security.

If your company follows these tips, it has a fair chance to defeat the Scary Seven.


Big data is another step to your business success. We will help you to adopt an advanced approach to big data to unleash its full potential.

4 Retail Data Analytics Trends to Win More Sales


The Christmas season is over, and you’ve heroically lived through it. The beginning of the year is a great time to look around and check the latest retail trends. We’ve scanned industry expectations and selected the ideas that are likely to add value. The compilation has one more unifying theme: all the initiatives should be supported with data analytics. Let’s have a closer look at the four chosen trends.


1. Going omnichannel

Retail started to move beyond brick-and-mortar stores long ago. However, many retailers are still only considering taking the first step in this direction. If you are among them, you need a proper retail data analytics solution.

Let’s say a consumer electronics retailer runs both physical stores and an online one. The brick-and-mortar stores are showing brilliant sales, while the online store is lagging far behind. Naturally, the retailer is not happy with the format that doesn’t bring sales. But is the e-store really useless, and are they just wasting money keeping it running?

After scrutinizing customer big data for both channels, the retailer may find out the following: 75% of website visitors browse the online catalog to compare product features and then finish their purchases in one of the physical stores. In this case, if the retailer abandons the online store, they may also lose a large share of customers, who would rather shop with another retailer in a way they find convenient.

2. Creating unique customer experience

Customers make purchases in brick-and-mortar and online stores, take part in loyalty programs, create shopping lists, and place orders in apps. In short, they interact with a retailer in many different ways and expect a personalized approach in return. As a retailer, you understand how precious customer data is, so you should strive to repay your customers with targeted marketing campaigns and product offers.

Let’s assume you are a drugstore retailer. You collect and analyze customer data to understand your customers’ behavior and preferences. Your analytical system knows that Customer A used to visit your store once a month to buy 3 packs of nappies, washing powder of brand X, and dishwashing liquid of brand Y. But this month, Customer A hasn’t appeared. To encourage them to visit your store, you may send them a 5% coupon for their favorite washing powder brand.
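As an illustration of how such a trigger could work, here is a minimal Python sketch that flags customers who have gone quiet for longer than their usual purchase cycle. The column names, the 1.5× tolerance, and the toy data are our own illustrative assumptions, not a specific vendor’s API.

```python
# Illustrative sketch: flag customers who have missed their usual purchase cycle
# so that a personalized coupon can be triggered. Column names are assumptions.
import pandas as pd

def find_lapsed_customers(purchases: pd.DataFrame, today: pd.Timestamp,
                          tolerance: float = 1.5) -> pd.DataFrame:
    """purchases: one row per transaction with 'customer_id' and 'date' columns."""
    purchases = purchases.sort_values("date")
    stats = purchases.groupby("customer_id")["date"].agg(
        last_purchase="max",
        typical_gap=lambda d: d.diff().dt.days.median(),  # usual days between visits
    )
    days_silent = (today - stats["last_purchase"]).dt.days
    # A customer is "lapsed" when the silence exceeds their usual gap by the tolerance factor
    return stats[days_silent > tolerance * stats["typical_gap"]]

history = pd.DataFrame({
    "customer_id": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime(["2024-01-05", "2024-02-04", "2024-03-06",
                            "2024-01-10", "2024-03-01"]),
})
print(find_lapsed_customers(history, pd.Timestamp("2024-05-01")))
```

In a production system, the same rule would of course run against the full transaction history and feed a marketing automation tool rather than a print statement.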

As a real-life example of data analytics in action, we’ve chosen Nordstrom and its Reserve Online & Try In Store initiative. App users can select items and book them to try on in a particular store at a convenient time. Nordstrom has gone further: the retailer can recognize when a customer is passing by a store and kindly invite them to come in. How can anyone stay indifferent when they receive the message: “Hello from Nordstrom. It looks like you are nearby, and your item is ready to try!” To crown it all, once customers are in the store, they will find their name on the door of a fitting room. This is the personalized approach in action.

3. Dynamic pricing to stay competitive

Competitive intelligence was once a challenge for brick-and-mortar retailers. Naturally, rivals were unwilling to share any information, and price monitoring at competitors’ stores was time-consuming, error-prone, and exhausting. In the era of ecommerce, retailers can benefit from new approaches to competitive intelligence. By definition, online stores are publicly available, and a lot of information, such as product details, promo offers, prices, and category hierarchy, is always at hand. This has made dynamic pricing possible: the system scans competitors’ prices in real time, runs a complex analysis very quickly, and changes a retailer’s prices automatically based on defined rules. Now, if an ecommerce retailer wants to be 5% cheaper than their competitors, big data analytics makes that possible.
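To make the rule concrete, here is a minimal Python sketch of such a repricing step, assuming a simple “undercut the cheapest competitor by 5%” rule with a cost-based floor. The function, its parameters, and the margin values are illustrative assumptions, not a description of any specific pricing engine.

```python
# Minimal sketch of a rule-based repricing step: undercut the cheapest competitor
# by 5% while respecting a cost-based price floor. Margins and names are assumptions.
def reprice(own_price: float, competitor_prices: list[float],
            unit_cost: float, min_margin: float = 0.10,
            undercut: float = 0.05) -> float:
    """Return the new price for one SKU according to a simple 'be 5% cheaper' rule."""
    if not competitor_prices:
        return own_price                      # nothing to react to
    target = min(competitor_prices) * (1 - undercut)
    floor = unit_cost * (1 + min_margin)      # never price below cost plus a minimum margin
    return round(max(target, floor), 2)

# Example: competitors sell at 49.99 and 52.50; our unit cost is 30.00
print(reprice(own_price=51.00, competitor_prices=[49.99, 52.50], unit_cost=30.00))
# -> 47.49
```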

Brick-and-mortar retailers can also rely on big data analytics, though until recently they faced implementation limitations, as store employees had to replace price tags manually, which was time-consuming. With the advent of electronic price tags, dynamic pricing has become available to brick-and-mortar retailers, too. In no time, the system can change prices based on an analysis of sales, stock, competitors’ prices, customer demand, shelf life, etc.

4. Building effective relationships with suppliers

An industry benchmark, Walmart has laid down the rules on how to collaborate with suppliers, and this policy can add $1 billion to Walmart’s revenue. The retailer has implemented a scoring system that assesses suppliers according to the On-Time, In-Full (OTIF) principle. Put simply, whether a delivery is late or early, the supplier pays a fine. The same applies if the supplier delivers on time but the quantity of goods is wrong, their quality is poor, or the packaging is damaged.

If you are to follow Walmart’s best practice, you can think about tuning your data analytics system to differentiate between strategic and non-strategic suppliers and between critical and non-critical product categories. Additionally, you can set different thresholds; for instance, the critical share of troublesome deliveries may be 10%.

With such a scoring system, a retailer can easily identify whether a supplier is reliable. Besides, this approach contributes to more efficient inventory management, so a retailer’s persistent headache caused by overstocks and out-of-stocks should finally subside.
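For illustration, here is a small Python sketch of an OTIF-style scoring check, assuming a delivery counts as “troublesome” when it is early, late, or short/over on quantity, with the 10% threshold mentioned above. The data model and rules are simplified assumptions, not Walmart’s actual policy.

```python
# Illustrative On-Time, In-Full (OTIF) scoring sketch. Thresholds and fields are
# assumptions chosen for the example.
from dataclasses import dataclass

@dataclass
class Delivery:
    days_off_schedule: int   # 0 = on time, negative = early, positive = late
    ordered_qty: int
    delivered_qty: int

def troublesome_share(deliveries: list[Delivery]) -> float:
    bad = sum(1 for d in deliveries
              if d.days_off_schedule != 0 or d.delivered_qty != d.ordered_qty)
    return bad / len(deliveries)

def is_reliable(deliveries: list[Delivery], threshold: float = 0.10) -> bool:
    # e.g. a critical-category supplier may be allowed at most 10% troublesome deliveries
    return troublesome_share(deliveries) <= threshold

history = [Delivery(0, 100, 100), Delivery(2, 100, 100), Delivery(0, 100, 90),
           Delivery(0, 50, 50), Delivery(0, 50, 50)]
print(troublesome_share(history), is_reliable(history))  # 0.4 False
```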

To sum it up

Retailers’ aspirations don’t change significantly over time. Satisfying customer needs better, outperforming competitors, and building effective relationships with suppliers are at the top of every retailer’s wish list. Although the retail industry is constantly evolving and new sophisticated ways to solve daily and strategic challenges keep appearing, this does not guarantee that every retailer reaches the desired result. However, supporting these initiatives with data analytics can make the difference.


Are you striving for informed decision-making? We will convert your historical and real-time data into actionable insights and set up forecasting.

Demand Forecasting Using Data Science


From our consulting practice, we know that even companies that have put significant effort into demand forecasting can still go the extra mile and improve the accuracy of their predictions. So, if you’re one of the companies that want reliable demand forecasts on their radar, this is the right page for you.

Though 100% precision is impossible to achieve, we believe data science can get you closer to it, and we’ll show you how. Our data scientists have chosen the most prominent demand forecasting methods, based on both traditional and contemporary data science, to show you how they work and what their strengths and limitations are. We hope that our overview will help you opt for the right method, which is one of the essential steps to creating a powerful demand forecasting solution.


Traditional data science: The ARIMA model

A well-known traditional data science method is the autoregressive integrated moving average (ARIMA) model. As the name suggests, its main parameters are the autoregressive order (AR), the integration order (I), and the moving average order (MA).

The AR parameter identifies how the values of the previous period influence the values of the current period. For example, tomorrow the sales for SKU X will be high if the sales for SKU X were high during the last three days.

The I parameter defines how the differences between the values of previous periods influence the value in the current period: tomorrow’s sales for SKU X will be about the same if the day-to-day difference in sales for SKU X was minimal during the last three days.

The MA parameter identifies the model’s error based on all the observed errors in its forecasts.
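For readers who want to see the model in action, here is a minimal sketch using the statsmodels library, fitted to a synthetic daily sales series. The (3, 1, 1) order is an illustrative choice; in practice the order is tuned for each SKU.

```python
# A minimal ARIMA sketch with statsmodels, assuming a daily sales series for one SKU.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
dates = pd.date_range("2024-01-01", periods=120, freq="D")
sales = pd.Series(100 + np.cumsum(rng.normal(0, 3, size=120)), index=dates)

# order=(3, 1, 1): AR looks 3 days back, I differences the series once, MA uses 1 error term
model = ARIMA(sales, order=(3, 1, 1)).fit()
print(model.forecast(steps=7))   # demand forecast for the next 7 days
```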

Strengths of the ARIMA model

  • ARIMA works well when the forecast horizon is short-term and when the number of demand-influencing factors is limited.

Limitations of the ARIMA model

  • ARIMA is unlikely to produce accurate long-term forecasts as it doesn’t store insights for long time periods.
  • ARIMA assumes that your data doesn’t show any trend or seasonal fluctuations, while these conditions are rarely met in real life.
  • ARIMA requires extensive feature engineering to capture the root causes of data fluctuations, which is a lengthy and labor-intensive process. For example, a data scientist has to mark particular days of the month as weekends for ARIMA to take this factor into account; otherwise, it won’t recognize the impact of a particular day on sales.
  • The model can be time-consuming as every SKU or subcategory requires separate tuning.
  • It can only handle numerical data, such as sales values. This means that you can’t take into account such factors as weather, store type, store location and promotion influence.
  • It fails to capture non-linear dependencies, which are exactly the kind that occurs most often. For example, with a 5% off promotion, toys from Frozen saw a 3% increase in sales. If the discount is doubled to 10%, this doesn’t mean that the company should expect the increase in sales to double to 6%. Besides, if they run a 5% promotion for Barbie dolls, their sales can increase by 9%, as promotions influence various categories differently.

Contemporary data science: Deep neural networks

Since traditional data science has so many limitations, it’s natural that there are other, more reliable approaches, namely contemporary data science. There’s no better candidate to represent contemporary data science than a deep neural network (DNN). Recent research shows that DNNs often outperform other forecasting approaches in terms of effectiveness and accuracy of predictions. To usher you into the promising world of deep learning, our data scientists have composed a 5-minute introduction to DNNs that comprises both theory and a practical example.

What are DNNs made of?

Deep neural network architecture

Here’s the architecture of a standard DNN. To read this scheme, you should know just 2 terms – a neuron and a weight. Neurons (also called ‘nodes’) are the main building blocks of a neural network. They are organized in layers to transmit the data along the net, from its input layer all the way to the output one.

As to the weights, you can regard them as coefficients applied to the values produced by the neurons of the previous layer. Weights are of extreme importance as they transform the data along its way through a DNN, thus influencing the output. The more layers a DNN has or the more neurons each layer contains, the more weights appear.

What data can DNNs analyze?

DNNs can deal equally well with numerical and categorical values. In the case of numerical values, you simply feed the network the needed figures. In the case of categorical values, you’ll need to use a ‘0-1’ language, known as one-hot encoding. It usually works like this: if you want to input a particular day of the week (say, Wednesday), you reserve seven neurons, give a 1 to the third neuron (which stands for Wednesday), and zeroes to all the rest.
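Here is a tiny sketch of this ‘0-1’ encoding for a day of the week; the helper function is just an illustration of the idea.

```python
# Sketch of the '0-1' encoding described above: a categorical value (day of the week)
# becomes seven input neurons, with a 1 in the slot for Wednesday.
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def one_hot(day: str) -> list[int]:
    return [1 if d == day else 0 for d in days]

print(one_hot("Wed"))  # [0, 0, 1, 0, 0, 0, 0]
```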

The vast diversity of data that a DNN is able to ingest and analyze allows considering multiple factors that can influence demand, thus improving the accuracy of forecasts. The factors can be internal, such as store location, store type and promotion influence, and external ones – weather, changes in GDP, inflation rate, average income rate, etc.

And now, a practical example. Say you are a manufacturer who uses deep neural networks to forecast weekly demand for finished goods. You may choose the following diverse factors and data for analysis.

Factor to analyze | What the factor reflects | Number of neurons for the input layer
8 previous weeks’ sales figures | Latest trends | 8
Weeks of the year | Seasonality | 52 (one per week of the year)
SKUs | Patterns specific to each SKU | 119 (one per SKU in your product portfolio)
Promotion | The influence of promotion | 1 (yes or no)
Total number of input neurons: 180

In addition to showing the diversity of data, the table also draws the connection between the business and technical aspects of the demand forecasting task. Here, you can see how factors are finally converted into neurons. This information will be useful for understanding the sections that follow.
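As a rough illustration of how these factors could be packed into the 180 input neurons, here is a small Python sketch. The helper function and its arguments are our own illustrative assumptions, not part of any specific framework.

```python
# Sketch of packing the table's factors into the 180-dimensional input vector:
# 8 sales figures + 52 week slots + 119 SKU slots + 1 promotion flag = 180.
import numpy as np

N_WEEKS, N_SKUS = 52, 119

def build_input(last_8_weeks_sales, week_of_year: int, sku_index: int,
                on_promotion: bool) -> np.ndarray:
    week_one_hot = np.zeros(N_WEEKS);  week_one_hot[week_of_year - 1] = 1.0
    sku_one_hot = np.zeros(N_SKUS);    sku_one_hot[sku_index] = 1.0
    promo = np.array([1.0 if on_promotion else 0.0])
    return np.concatenate([np.asarray(last_8_weeks_sales, dtype=float),
                           week_one_hot, sku_one_hot, promo])

x = build_input([120, 135, 128, 140, 150, 160, 155, 170],
                week_of_year=23, sku_index=42, on_promotion=True)
print(x.shape)  # (180,)
```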

Where does DNN intelligence come from?

There are two ways for a DNN to get its intelligence, and they peacefully coexist. Firstly, this intelligence comes from data scientists, who set the network’s hyperparameters and choose the most suitable activation functions. Secondly, to get its weights right, a DNN learns from its mistakes.

Activation functions

Each neuron has an activation function at its core. The functions are diverse and each of them takes a different approach to converting the values they take in. Therefore, different activation functions can reveal various complex linear and non-linear dependencies. To ensure the accuracy of demand forecasts and not to miss or misinterpret exponential growth or decline, surges and temporary falls, waves, and other patterns that data shows, data scientists carefully choose the best set of activation functions for each case.

Hyperparameters

There are dozens of hyperparameters, but we’d like to focus on a down-to-earth one: the number of hidden layers required. Choosing this parameter correctly is critical for enabling a DNN to identify complex dependencies. The more layers, the more complex the dependencies a DNN can recognize. Each business task, and consequently each DNN architecture designed to solve it, requires an individual approach to the number of hidden layers.

Suppose that in our example, the data scientists decided the neural network requires 3 hidden layers. They also came up with coefficients that determine the number of neurons in each hidden layer (these coefficients are applied to the number of neurons in the input layer). Here are their findings:

Layer | Coefficient | Number of neurons in the layer
Input layer | - | 180
Hidden layer 1 | 1.5 | 270
Hidden layer 2 | 1 | 180
Hidden layer 3 | 0.5 | 90
Output layer | - | 1
Total number of neurons in the network: 721

Usually, data scientists create several neural networks and test which one shows better performance and higher accuracy of predictions.
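For the technically curious, here is what the network from the table above might look like as a minimal Keras sketch. The ReLU activations are an assumption on our part; as noted above, the actual activation functions are chosen per case by data scientists.

```python
# A minimal Keras sketch of the architecture from the table: 180 input neurons,
# hidden layers of 270, 180 and 90 neurons, and a single output neuron.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(180,)),                 # 180 input neurons
    layers.Dense(270, activation="relu"),      # hidden layer 1 (coefficient 1.5)
    layers.Dense(180, activation="relu"),      # hidden layer 2 (coefficient 1)
    layers.Dense(90, activation="relu"),       # hidden layer 3 (coefficient 0.5)
    layers.Dense(1),                           # output: forecast demand for one SKU/week
])
model.summary()
```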

Weights

To work properly, a DNN should learn which of its actions is right and which one is wrong. Let’s look at how the network learns to set the weights right. At this stage, regard it as a toddler who learns from their personal experience and with some supervision of their parents.

The network takes the inputs from your training data set. This data set is, in fact, your historical sales data broken down to SKU and store level, which may also contain store attributes, prices, promotions, etc. Then, the network lets this data pass through its layers. And, at first, it applies random weights to it and uses predefined activation functions.

However, the network doesn’t stop when it produces an output – a weekly demand forecast for SKU X. Instead, it uses a loss function to calculate how much its output differs from the one your historical data shows. Then, the network triggers an optimization algorithm to reassign the weights and starts the whole process from the very beginning. The network repeats this as many times (it can be thousands or millions) as needed to minimize the error and produce an optimal demand forecast.

To let you understand the scale of it all: the number of weights that a neural network tunes can reach hundreds of thousands. In our example, we’ll deal with 113,490 weights. No serious math is required to get this figure. You should just multiply the number of neurons in one layer by the number of neurons in the layer that follows and sum it all up: 180×270 + 270×180 + 180×90 + 90×1 = 113,490. Impressive, right?
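Continuing the architecture sketch above, the training loop described here could look roughly like this in Keras, with mean squared error standing in for the loss function and Adam as the optimizer; the synthetic data merely stands in for real historical records.

```python
# Sketch of the training step: a loss function measures how far the network's output
# is from historical demand, and an optimizer reassigns the weights.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([            # same architecture as in the previous sketch
    keras.Input(shape=(180,)),
    layers.Dense(270, activation="relu"),
    layers.Dense(180, activation="relu"),
    layers.Dense(90, activation="relu"),
    layers.Dense(1),
])

# Mean squared error plays the role of the loss function; Adam is one common optimizer.
model.compile(optimizer="adam", loss="mse")

X = np.random.rand(5000, 180)         # stand-in for encoded historical records
y = np.random.rand(5000, 1)           # stand-in for the observed weekly demand
model.fit(X, y, epochs=5, batch_size=64, verbose=0)

# Note: count_params() also includes bias terms, so it reports slightly more
# than the 113,490 connection weights computed in the text.
print(model.count_params())
```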

Demand forecasting challenges that DNNs overcome

New product introduction

Challenge: Historical data is either limited or doesn’t exist at all.

Solution: A DNN allows clustering SKUs to find lookalikes (for instance, based on their prices, product attributes, or appearance) and using their sales histories to bootstrap forecasting.

The thing is that you have all the historical data for the lookalikes because they are your tried-and-tested SKUs. So, you can take their weekly sales data and use it as a training data set to estimate the demand for a new product. As discussed earlier, you can also add external data to increase the accuracy of demand predictions – for example, social media data.

Another scenario here could be: a DNN is tuned to cluster new products according to their performance. This helps to predict how a newly launched product will perform based on its behavior at the earliest stages compared to the behavior of other new product launches.
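As a simplified illustration of the lookalike idea, here is a Python sketch that clusters existing SKUs by a few attributes and assigns a new product to the nearest cluster. The features, the cluster count, and the toy data are assumptions chosen for the example.

```python
# Illustrative sketch: cluster existing SKUs by simple attributes to find lookalikes
# for a new product, whose cluster's sales history can then bootstrap its forecast.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Each row: [price, weight_g, is_premium] for an existing SKU
existing_skus = np.array([
    [9.99, 200, 0], [10.49, 210, 0], [24.99, 180, 1],
    [26.50, 175, 1], [11.20, 220, 0], [23.75, 190, 1],
])
scaler = StandardScaler().fit(existing_skus)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(
    scaler.transform(existing_skus))

new_product = np.array([[25.99, 185, 1]])            # the SKU with no sales history
cluster = kmeans.predict(scaler.transform(new_product))[0]
lookalikes = np.where(kmeans.labels_ == cluster)[0]
print(f"New product falls into cluster {cluster}; lookalike SKU indices: {lookalikes}")
```

The same clustering idea underlies the complex-seasonality case below, where SKUs are grouped by the shape of their sales patterns rather than by product attributes.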

Complex seasonality

Challenge: For some products (like skis for the winter or swimsuits for the summer), the seasonality is obvious, while for others, the patterns are not so easy to spot. If you are dealing with multiple seasonal periods or high-frequency seasonality, you need something more capable than trivial methods.

Solution: Just like with new product introductions, the task of identifying complex seasonality can be solved with the help of clustering. A DNN sifts through hundreds and thousands of sales patterns of each SKU to find similar ones. If particular SKUs belong to the same cluster, they are likely to show the same sales patterns in the future.

Weighing the pros and cons of DNNs

Now that we know how a DNN works, we can consider the upsides and downsides of this method.

Strengths of DNNs

Compared to traditional data science approaches, DNNs can:

  • Consider multiple factors based on diverse data (both external and internal, numerical and categorical), thus increasing the accuracy of forecasts.
  • Capture complex dependencies in data (both linear and non-linear) thanks to multiple activation functions embedded into the neurons and cleverly set weights.
  • Successfully solve typical demand forecasting challenges, such as new product introductions and complex seasonality.

Limitations of DNNs

Although DNNs are the smartest data science method for demand forecasting, they still have some limitations:

  • DNNs don’t choose analysis factors on their own. If a data scientist disregards some factor, a DNN won’t know of its influence on the demand.
  • DNNs are greedy for data to learn from. The size of the training data set should not be less than the number of weights. And, as we have already discussed, you can easily end up with hundreds of thousands of weights. Correspondingly, you’ll need as many data records.
  • If a DNN is trained incorrectly, it can fail to distinguish erroneous data from meaningful signals. As a result, such a network can produce accurate forecasts on the training data but distorted outputs when dealing with new incoming data. This problem is called overfitting, and data scientists can fight it with the dropout technique (see the sketch after this list).
  • Non-technical audiences tend to perceive DNNs as ‘magic boxes’ that produce ungrounded figures. You should put some effort into making your account managers trust DNNs.
  • DNNs still can’t take into account force majeure, like natural disasters, government decisions, etc.
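As a quick illustration of the dropout technique mentioned in the list above, here is a minimal Keras sketch; the 20% dropout rate is an illustrative choice, not a recommendation.

```python
# Minimal dropout sketch: Dropout layers randomly switch off a share of activations
# during training, which helps the network generalize instead of memorizing the data.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(180,)),
    layers.Dense(270, activation="relu"),
    layers.Dropout(0.2),                 # drop 20% of activations at training time
    layers.Dense(180, activation="relu"),
    layers.Dropout(0.2),
    layers.Dense(90, activation="relu"),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
```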

So, where does your heart lie?

From our consulting experience, we see that contemporary data science in most cases outperforms traditional methods, especially when it comes to identifying non-linear dependencies in data. However, this doesn’t mean that traditional data science methods should be completely disregarded. They can still be considered for producing short-term forecasts. For example, we recently delivered sales forecasting for an FMCG manufacturer, where we applied linear regression, ARIMA, median forecasting, and zero forecasting.


Bringing data science on board is promising, yet difficult. We’ll solve all the challenges and let you enjoy the advantages that data science offers.