Huawei Research France

Transfer learning on home network:

Build a transfer learning solution for home network failure prediction

Aladin Virmaux, Illyyne Saffar, Jianfeng Zhang, Balázs Kégl (Huawei Research, Noah's Ark Laboratory, France)

Introduction

Optical access networks (OAN) are a mainstream home broadband access solution around the world, connecting subscribers to their service provider. Network failures affect both the quality of service (QoS) and the user experience (quality of experience, QoE). To reduce the damage, it is important to predict network failures in advance and fix them in time. Machine learning (ML) algorithms have been widely used to build such failure prediction models. However, most ML models are data-specific and prone to degradation when the data distribution changes. This year's first Huawei France data challenge aims at solving this problem. You will receive a labeled optical access network dataset from a city we call "A" (the source domain) and a mostly unlabeled dataset from a city "B" (the target domain). You are asked to build a transfer learning solution using the labeled source data plus the unlabeled target data to train a failure prediction model for city B. This is an unsupervised domain adaptation (UDA) problem. To be precise, we do include a small number of labeled target points in the training set, so the setup can be called "few-shot UDA" or "semi-supervised domain adaptation".

Additional challenges will come from

  1. missing values: there are a lot of missing values in the data;
  2. time series: the samples are multivariate sensor time series;
  3. class imbalance: network failures are rare, so this is a highly imbalanced classification problem.

Context

Transmission technologies have evolved to integrate optical technologies even in access networks, as close as possible to the subscriber. Currently, optical fiber is the transmission medium par excellence due to its ability to propagate a signal over long distances without regeneration, its low latency, and its very high bandwidth. Optical fiber, initially deployed in very long distance and very high speed networks, is now being generalized to offer consumers more bandwidth. These are the FTTH ("Fiber to the Home") technologies.

The FTTH architecture generally adopted by operators is a PON (Passive Optical Network). The PON is a point-to-multipoint architecture based on the following elements: an OLT (Optical Line Terminal) at the operator's central office, passive optical splitters that share a single fiber among several subscribers, and ONTs (Optical Network Terminals) at the subscribers' premises.

The data for this challenge is collected from sensors at the ONT level.

The data

The data comes from two different cities: city A (the source) and city B (the target). The data is labeled for city A but (mostly) unlabeled for city B (only about 20% of city B's data is labeled). For both cities, the data is a time series collected over about 60 days with a granularity of 15 minutes. The samples represent different users (thus different ONTs). At each time step, we have a ten-dimensional measurement of the following features (the unit of each feature is given in parentheses).

The goal of the challenge is to separate weak instances from failures; the good data is given only as side information (it could be used, for example, for calibration). You are thus asked to submit a binary classifier.

Let $x_t$ be the sample collected on day $t$; the corresponding label is computed on day $t+7$. In other words, we aim to predict a failure from data collected seven days earlier.

The data is given to you with shape [users, timestamps, features], with the features in the same order as presented above. For each user and timestamp, we aggregate seven days of data.

Note that the public data set (given to you with the starting kit) and the private data set (used to evaluate your submissions on the server) come from the same distribution, so in principle you could use the labeled public target data to learn a classifier and submit that function. This would defeat the purpose of transfer learning, so we slightly but significantly transformed the private data set to make this strategy ineffective.

Missing data

You will notice that some data is missing in the datasets. There may be several reasons:

  1. No data was gathered on a specific date for a specific user.
  2. The data collection process failed to retrieve a feature.

It is part of the challenge to overcome this real-life difficulty.
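For instance (a minimal sketch on a synthetic array; the array name, shape, and the 10% missing rate are placeholders, not the actual challenge data), missing entries can be imputed per feature with the NaN-aware numpy routines:

import numpy as np

# Placeholder array standing in for the challenge data:
# shape (users, timestamps, features) with missing entries.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 672, 10))
X[rng.random(X.shape) < 0.1] = np.nan  # simulate ~10% missing values

# Per-feature median imputation, ignoring NaNs.
feature_medians = np.nanmedian(X, axis=(0, 1))         # shape (10,)
X_imputed = np.where(np.isnan(X), feature_medians, X)  # broadcast over users and time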

The scoring metrics

In this challenge we evaluate the performance using six different metrics: ap (average precision), rec-5, rec-10, rec-20 (recall-based scores), acc (accuracy), and auc (area under the ROC curve).

Note that the average precision (ap) is the official metric used for the final evaluation.

Competition rules

Participants accept these rules automatically when signing up to the “Transfer learning for detecting failures of optical access network” Data Challenge.

Getting started

Besides the usual pydata libraries, you will need to install ramp-workflow:

pip install git+https://github.com/paris-saclay-cds/ramp-workflow.git

It installs the rampwf library and the ramp-test script that you can use to check your submission before submitting. You do not need to know this package to participate in the challenge, but it may be useful to look at the documentation if you would like to know what happens when we test your model, especially the RAMP execution page to understand ramp-test, and the commands page to understand the different command line options.

Read problem.py so you have access to the same interface as the testing script.

The data

First take the public data set from the #oan_failure_challenge channel of the Slack team (join by clicking here) and unzip it to create ./data, then execute the prepare_data.py script in ./data. Note that the public data given to you is different from the private data used to evaluate your submissions on the server.

The training data is composed of source and target data coming respectively from city A and city B. In real life, the FTTH problem has three classes: 1) the flow is normal and everything is going smoothly (good), 2) the flow is poor but the connection is still working (weak), and 3) failure. For OAN failure detection we are interested in a binary classification between the two classes [weak, failure]. You are free to exploit the data of the good class, but in the scoring you are only judged on the binary classification.

The dataset you are given is composed of:

Since we are interested in the performance of the classifier on the target data, the test set is composed entirely of target data. predict will receive both X_test.target and X_test.target_bkg, and is expected to produce probabilities of the weak and failure labels only for X_test.target.

Read the training and test data.

The input data is three-dimensional (sample, time, features). Time has 672 dimensions (4 measurements per hour $\times$ 24 hours $\times$ 7 days). It contains NaN values, so it should be cleaned.
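For a quick look at the arrays (assuming problem.py exposes the standard RAMP get_train_data / get_test_data functions and the X_test.target field mentioned above; other field names may differ in your kit), something like the following can be run from the kit folder:

import numpy as np
import problem  # the problem.py shipped with the RAMP kit

X_train, y_train = problem.get_train_data()
X_test, y_test = problem.get_test_data()

# X_test.target is the array your classifier is scored on.
target = X_test.target
print(target.shape)             # expected (n_users, 672, 10)
print(np.isnan(target).mean())  # fraction of missing values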

The classification task

You should submit a feature extractor and a classifier. The transform function of the feature extractor is executed on every input array (target, source, bkg) and the resulting arrays are passed to both the fit and the predict functions of the classifier. The feature extractor of the starting kit replaces NaNs by zero and flattens the matrix to (sample, 6720).

The starting kit implements a naive domain adaptation where the model (random forest) trained on the source is used to classify the target.
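A rough sketch of such a submission is given below; the classifier's fit signature is an assumption, so check external_imports/utils/workflow.py and the starting kit for the exact arguments your functions receive.

# feature_extractor.py (sketch)
import numpy as np

class FeatureExtractor:
    def transform(self, X):
        # Replace NaNs by zero and flatten (sample, 672, 10) to (sample, 6720).
        X = np.nan_to_num(X, nan=0.0)
        return X.reshape(len(X), -1)

# classifier.py (sketch): the naive baseline fits a random forest on the
# labeled source data only and applies it unchanged to the target.
from sklearn.ensemble import RandomForestClassifier

class Classifier:
    def fit(self, X_source, y_source, X_target, y_target):
        # Assumed argument names; the real workflow may also pass background data.
        self.clf = RandomForestClassifier(n_estimators=100, random_state=0)
        self.clf.fit(X_source, y_source)
        return self

    def predict_proba(self, X):
        return self.clf.predict_proba(X)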

You can look at the workflow code at external_imports/utils/workflow.py to see exactly how your submissions are loaded and used. You can execute the training and prediction of your submission here in the notebook. When you run ramp-test, we do cross validation; here you use the full training data to train and the test data to test. This page gives you a brief overview of what happens behind the scenes when you run the ramp-test script.

The scores

We compute six scores on the classification. All scores are implemented in external_imports/utils/scores.py so you can look at the precise definitions there. The official score of the competition is ap.
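For local experiments, the ap and auc scores can be reproduced with scikit-learn; the rec-* scores are custom, so refer to scores.py for their exact definitions. The snippet below assumes failure is the positive class; check scores.py for the actual convention.

import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

# y_true: binary labels (assumed 1 = failure), y_score: predicted failure probability.
y_true = np.array([0, 0, 1, 0, 1])
y_score = np.array([0.10, 0.40, 0.35, 0.20, 0.80])

print(average_precision_score(y_true, y_score))  # 'ap', the official metric
print(roc_auc_score(y_true, y_score))            # 'auc'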

The cross validation scheme

We use a ten-fold shuffle split cross-validation (stratified when labels are available) for all data sets. In each fold, 20% of the instances go to the validation set, except for the labeled target data, which serves mostly for validation (so that we obtain an unbiased estimate of the test scores, evaluated entirely on labeled target samples). We do put twenty labeled target points in the training folds. The rationale is that when we extend our broadband services to city B, we may rapidly obtain a small set of labeled data, but we would like to deploy our failure detector without waiting two months to collect a data set comparable to that of city A.

The cross-validation scheme (see problem.get_cv) is implemented in the TLShuffleSplit class of external_imports/utils/cv.py, if you want to take a closer look.
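To make the scheme concrete, here is a simplified toy version of how the labeled target points could be split (this is not the actual TLShuffleSplit code, and it ignores the source and background parts of the folds):

import numpy as np

def toy_target_split(n_labeled_target, n_train=20, n_splits=10, seed=0):
    # Toy version of the idea: in each fold, send a fixed small number of
    # labeled target points to training and keep the rest for validation.
    rng = np.random.default_rng(seed)
    for _ in range(n_splits):
        perm = rng.permutation(n_labeled_target)
        yield perm[:n_train], perm[n_train:]

for train_idx, valid_idx in toy_target_split(200):
    print(len(train_idx), len(valid_idx))  # 20 training points, 180 validation points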

You are free to play with both the train/test cut and the cross-validation when developing your models but be aware that we will use the same setup on the official server as the one in the RAMP kit (on a different set of four campaigns that will not be available to you).

The following cell goes through the same steps as the official evaluation script (ramp-test).

We compute both the mean test score and the score obtained by bagging your ten models. The official ranking will be determined by the bagged test score (on data sets different from the ones you have). Your public score will be the bagged validation score (the averaging is slightly more complicated since we need to take care of the cross-validation masks properly).
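Bagging here essentially means averaging the predicted probabilities of the per-fold models (the server-side computation additionally handles the cross-validation masks mentioned above); a minimal sketch:

import numpy as np

def bag_probas(fold_probas):
    # fold_probas: list of (n_test, 2) probability arrays, one per CV fold.
    # The bagged prediction is their element-wise mean.
    return np.mean(np.stack(fold_probas, axis=0), axis=0)

# Toy example with three folds and two test points.
print(bag_probas([
    np.array([[0.8, 0.2], [0.4, 0.6]]),
    np.array([[0.7, 0.3], [0.5, 0.5]]),
    np.array([[0.9, 0.1], [0.3, 0.7]]),
]))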

Example submissions

Besides the starting kit we give you two other example submissions. The feature extractor is the same in all three. source_rf is similar to the starting kit, it just uses more and deeper trees to obtain a better score. target_rf is the other extreme: it only uses the (few) labeled target training instances to learn a classifier. It has a slightly worse performance than source_rf, which means that the source data does enhance the classifier even though the source and target distributions differ.

Results:

| submission | ap | rec-5 | rec-10 | rec-20 | acc | auc |
|---|---|---|---|---|---|---|
| source_rf | 0.191 ± 0.0026 | 0.073 ± 0.002 | 0.176 ± 0.0032 | 0.357 ± 0.0075 | 0.84 ± 0.0014 | 0.637 ± 0.0063 |
| target_rf | 0.163 ± 0.0218 | 0.067 ± 0.0182 | 0.138 ± 0.0339 | 0.272 ± 0.0537 | 0.813 ± 0.036 | 0.591 ± 0.0399 |

The big transfer learning question to solve is: how to combine the low-bias high-variance target data with the high-bias low-variance source data. Other questions we're expecting to see answers to:

  1. Can we do a better preprocessing (missing data imputation, using the time dimension in a more intelligent way) in the feature extractor?
  2. Normally the background data (good instances) does not participate in the scoring, but it can inform the classifier about the distribution shift. How can this information best be used? One possible direction is sketched after this list.
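One hedged idea for the second question (not part of the provided kit; the function and argument names below are hypothetical): train a domain classifier on source versus target background data and reweight the source samples by the estimated density ratio before fitting the failure classifier.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

def importance_weighted_fit(X_source, y_source, X_source_bkg, X_target_bkg):
    # Domain classifier: 0 = source background, 1 = target background.
    X_dom = np.vstack([X_source_bkg, X_target_bkg])
    y_dom = np.concatenate([np.zeros(len(X_source_bkg)), np.ones(len(X_target_bkg))])
    dom = LogisticRegression(max_iter=1000).fit(X_dom, y_dom)

    # Density-ratio estimate p_target(x) / p_source(x), used as sample weights.
    p_target = dom.predict_proba(X_source)[:, 1]
    weights = p_target / np.clip(1.0 - p_target, 1e-6, None)

    clf = RandomForestClassifier(n_estimators=300, random_state=0)
    clf.fit(X_source, y_source, sample_weight=weights)
    return clf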

Local testing (before submission)

Your submission will contain a feature_extractor.py implementing a FeatureExtractor class with a transform function (no fit) and a classifier.py implementing a Classifier class with fit and predict_proba functions, as in the starting kit. You should place it in the submission/<submission_name> folder in your RAMP kit folder. To test your submission, go to your RAMP kit folder in the terminal and type

ramp-test --submission <submission_name>

It will train and test your submission much like we did above in this notebook, and print the fold-wise and summary scores. You can also try it in this notebook:

If you want to have a local leaderboard, use the --save-output option when running ramp-test, then try ramp-show leaderboard with different options. For example:

ramp-show leaderboard --mean --metric "['ap','auc']" --step "['valid','test']" --precision 3

and

ramp-show leaderboard --bagged --metric "['auc']"

RAMP also has an experimental hyperopt feature, with random grid search implemented. If you want to use it, type

ramp-hyperopt --help

and check out the example submission here.

Submission

  1. First you will need to sign up at the Huawei RAMP site. You will be approved by a system admin shortly after your student status is verified.
  2. You will then need a second sign-up, this time for the OAN failure challenge. If your site sign-up was approved in the previous step, you should see a "Join event" button on the right of the top menu. This request will also be approved by a site admin.
  3. Once you are signed up, you can form or join a team (be careful: you can only change teams while neither you nor the team you would like to join have submitted a solution) and start submitting (once a day). If you are happy with your local scores, copy-paste your submission into the sandbox, press "submit now", name your submission, then give credit to any other submissions you used (in the competitive phase you will only see your own submissions in the list).
  4. Your submission will be sent for training. It will either come back with an error or will be scored. You can follow its status at my submissions.
  5. If there is an error, click on it to see the trace. You can resubmit a failed submission under the same name; this does not count in your daily quota.
  6. There is no way to delete trained submissions. In exceptional cases we can stop a submission that has not been scored yet so you can resubmit. We strongly suggest finishing the training of at least one fold locally (using ramp-test) before submitting so you can estimate the training time.
  7. You can follow the scores of the other participants at the public leaderboard.
  8. The public competition leaderboard displays the top submission (according to the public score) of each participant. You can change which of your submissions enters the competition by pulling out the top submission: click on the particular submission at my submissions and click on the yellow button. The operation is reversible as many times as you want, even after the competition deadline.

Contact

You can contact the organizers in the Slack of the challenge, join by clicking here.