Appendix C: Time-Series Use Case
The following use case is taken from the Time is Money! Automate Your Time-Series Forecasts with Driverless AI blog post.
Time-series forecasting is one of the most common and important tasks in business analytics. Real-world applications include sales, weather, stock market, and energy demand forecasting, to name a few. We strongly believe that automation can help our users deliver business value in a timely manner. Therefore, we once again translated our Kaggle Grand Masters’ time-series recipes into our automatic machine learning platform, Driverless AI (version 1.2).
The key features/recipes that make automation possible are:
- Automatic handling of time groups (e.g., different stores and departments)
- Robust time-series validation
  - Accounts for gaps and forecast horizon
  - Uses past information only (i.e., no data leakage)
- Time-series specific feature engineering recipes
  - Date features like day of week, day of month, etc.
  - AutoRegressive features, like optimal lag and lag-features interaction
  - Different types of exponentially weighted moving averages
  - Aggregation of past information (different time groups and time intervals)
  - Target transformations and differentiation
- Integration with existing feature engineering functions (recipes and optimization)
- Automatic pipeline generation (See this blog post.)
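To make a few of the recipes listed above concrete, here is a minimal pandas sketch of per-group date, lag, and exponentially weighted moving average features, plus a time-based split that leaves a gap before the forecast horizon. It is an illustration only and not Driverless AI's internal implementation. The function names, parameters, and the default lags and smoothing factor are all assumptions made for this example.

```python
import pandas as pd


def add_time_series_features(df, group_cols, time_col, target,
                             lags=(1, 2), ewm_alpha=0.5):
    """Add date, lag, and EWMA features per time group, using past values only.

    Hypothetical helper for illustration; not part of any Driverless AI API.
    """
    out = df.sort_values(list(group_cols) + [time_col]).reset_index(drop=True)

    # Date features derived from the time column.
    out["day_of_week"] = out[time_col].dt.dayofweek
    out["day_of_month"] = out[time_col].dt.day

    grouped = out.groupby(list(group_cols))[target]

    # Lag features: the target value k steps earlier within the same group.
    for k in lags:
        out[f"{target}_lag{k}"] = grouped.shift(k)

    # EWMA over strictly past values: shift first so the current row's
    # target never leaks into its own feature.
    out[f"{target}_ewma"] = grouped.transform(
        lambda s: s.shift(1).ewm(alpha=ewm_alpha, adjust=False).mean()
    )
    return out


def split_with_gap(df, time_col, train_end, gap, horizon):
    """Train on rows up to train_end; validate on the horizon after a gap."""
    valid_start = train_end + gap
    valid_end = valid_start + horizon
    train = df[df[time_col] <= train_end]
    valid = df[(df[time_col] > valid_start) & (df[time_col] <= valid_end)]
    return train, valid
```

Shifting before aggregating is what keeps each feature restricted to past information, and dropping the rows that fall inside the gap mimics the delay between the last available training data and the first period to be forecast.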
A Typical Example: Sales Forecasting
Below is a typical example of sales forecasting based on the Walmart competition on Kaggle. In order to frame it as a machine learning problem, we formulate the historical sales data and additional attributes as shown below:
Data formulated for machine learning
Once you have your data prepared in tabular format (see raw data above), Driverless AI can formulate it for machine learning and sort out the rest. If this is your very first session, the Driverless AI assistant will guide you through the journey.
As in previous Driverless AI examples, you need to select the datasets for training and testing and define the target. For time-series, you also need to define the time column (by choosing AUTO or selecting the date column manually). If weighted scoring is required (as in the Walmart Kaggle competition), you can select the column that holds the weights for the individual samples.
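To illustrate what weighted scoring means here, the sketch below computes a weighted mean absolute error in which each sample's error is scaled by its weight. The Walmart competition used a metric of this form, with holiday weeks weighted more heavily than regular weeks. The function name and the example values are assumptions for illustration, and this is not Driverless AI's own scorer.

```python
import numpy as np


def weighted_mae(y_true, y_pred, weights):
    """Weighted mean absolute error: sum(w * |y - y_hat|) / sum(w).

    Hypothetical helper for illustration only.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * np.abs(y_true - y_pred)) / np.sum(weights))


# Example: the first sample (say, a holiday week) counts five times as much.
score = weighted_mae([10.0, 20.0], [12.0, 20.0], [5.0, 1.0])
```

With these example values, the error of 2 on the heavily weighted sample dominates the score, which is why a weight column matters when some periods are more important to forecast well than others.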
If you prefer to use automatic handling of time groups, you can leave the setting for time groups columns as AUTO.
Expert users can define specific time groups and change other settings as shown below.
Once the experiment is finished, you can make new predictions and download the scoring pipeline, just as with any other Driverless AI experiment.