#


This notebook was written for a university project. </font>

#

Homework assignment notebook

General information:

  • The homework assignment can be conducted in groups of 1 to 2 people (1 homework submission per group).
  • The homework should be submitted via email to **rakers[at]pharm.kyoto-u.ac.jp** using the subject "homework - data analysis"
  • **Deadline: 1st of August (Thu)**

General evaluation criteria:

  • Quality of workflow execution
  • Documentation/ reporting of executed steps (data analysis procedures)
  • Interpretation of findings/ results
  • Bonus points for:
    • Creativity (e.g. in applying new statistical or technical methods, finding new ways to visualise results, ...)
    • Depth of analysis and/or interpretation

0. Import the modules necessary for your workflow

We start by importing the display utilities in order to print some markdown, format the dates, and then we import pandas, matplot, numpy and seaborn in order to gather the dataset, manipulate and print some information from it, and we import the sklearn modules to apply the K-means, linear and random forest regressions.

In [1]:
from IPython.display import display, Markdown, Image
from datetime import datetime

import pandas as pd
import matplotlib.pyplot
import numpy as np
import seaborn
import pydot

from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split 
from sklearn.linear_model import LinearRegression
from sklearn import metrics
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import export_graphviz
In [2]:
%matplotlib inline

1. Pick a dataset and load it into jupyter

For this homework, I decided to use the regularity dataset provided by the SNCF for French regional trains.
The dataset is available here (link in french)

We first define the API endpoint in order to download the dataset. We also define useful variables such as the number of datapoints, the date of the first recorded,and the date of the last recorded.

We then convert the dates to readable formats, we fill NaN values with 0 in order to avoid any type of problem, and we convert some fields to their real types as it is not automatically done.

For debugging purposes, we also display the table to check what informations are available, and how they are organized.

In [3]:
url = "https://data.sncf.com/explore/dataset/regularite-mensuelle-ter/download/?format=csv&timezone=Asia/Tokyo"

# Utility variables : number of lines in the dataset, most recent and oldest data
countDatapoints = 0
first = ""
last = ""
In [4]:
test = pd.read_csv(url, sep=';',header=0)

# Setting the datetime type on date with the correct format
test.date = pd.to_datetime(test.date)
test.date = test.date.dt.strftime('%Y-%m')

# Filling NaN values with fake data
test.fillna(0, inplace=True)

# Converting fields to their real types
test.nombre_de_trains_programmes = test.nombre_de_trains_programmes.astype(int)
test.nombre_de_trains_ayant_circule = test.nombre_de_trains_ayant_circule.astype(int)
test.nombre_de_trains_annules = test.nombre_de_trains_annules.astype(int)
test.nombre_de_trains_en_retard_a_l_arrivee = test.nombre_de_trains_en_retard_a_l_arrivee.astype(int)

# Set utility variables
first = test.date.min()
last = test.date.max()
countDatapoints = test.id.count()

# Display the table with all the data
pd.DataFrame(test).sort_values(by='date')
Out[4]:
id date region nombre_de_trains_programmes nombre_de_trains_ayant_circule nombre_de_trains_annules nombre_de_trains_en_retard_a_l_arrivee taux_de_ponctualite nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee commentaires
0 TER_3 2013-01 Auvergne 5785 5732 53 431 92.5 12.3 Conditions météos défavorables.
861 TER_5 2013-01 Bourgogne 8400 8332 68 625 92.5 12.3 Un mois de janvier qui surpasse les six exerci...
862 TER_19 2013-01 Provence Alpes Côte d'Azur 13956 13219 737 1816 86.3 6.3 0
863 TER_20 2013-01 Rhône Alpes 31438 30779 659 3552 88.5 7.7 0
864 TER_13 2013-01 Lorraine 0 0 0 0 0.0 0.0 Le Président de la Région Lorraine s'est oppos...
576 TER_15 2013-01 Nord Pas de Calais 19227 18890 337 2332 87.7 7.1 Vol de câble à Lille Sud.
575 TER_12 2013-01 Limousin 3817 3770 47 210 94.4 17.0 0
574 TER_7 2013-01 Centre 9882 9687 195 812 91.6 10.9 Trois incidents caténaires lourds, dont deux p...
573 TER_6 2013-01 Bretagne 8776 8631 145 554 93.6 14.6 Fortes chutes de neige ayant entrainé des pert...
572 TER_1 2013-01 Alsace 20095 19874 221 897 95.5 21.2 Intempéries.
1150 TER_18 2013-01 Poitou Charentes 3269 3134 135 205 93.5 14.3 Mouvements sociaux des agents du service comme...
860 TER_4 2013-01 Basse Normandie 3331 3297 34 311 90.6 9.6 Grand froid et épisode neigeux les semaines 3 ...
279 TER_16 2013-01 Pays de la Loire 10407 10195 212 713 93.0 13.3 0
278 TER_11 2013-01 Languedoc Roussillon 5024 4897 127 377 92.3 12.0 Chute de neige sur le littoral.
277 TER_10 2013-01 Haute Normandie 5957 5878 79 488 91.7 11.0 Episodes neigeux et difficultés matériels.
276 TER_8 2013-01 Champagne Ardenne 6648 6595 53 334 94.9 18.7 Conditions météorologiques dégradées du 15 au ...
859 TER_2 2013-01 Aquitaine 8099 8014 85 731 90.9 10.0 Intempéries à partir du 20 janvier.
1 TER_9 2013-01 Franche Comté 5826 5744 82 553 90.4 9.4 0
2 TER_14 2013-01 Midi Pyrénées 8208 7941 267 903 88.6 7.8 0
3 TER_17 2013-01 Picardie 11754 11673 81 1245 89.3 8.4 Fortes chutes de neige ayant entrainé des pert...
4 TER_7 2013-02 Centre 8979 8830 149 629 92.9 13.0 0
284 TER_13 2013-02 Lorraine 0 0 0 0 0.0 0.0 Le Président de la Région Lorraine s'est oppos...
283 TER_20 2013-02 Rhône Alpes 28215 27910 305 3771 86.5 6.4 Météo difficile et intrusions.
282 TER_10 2013-02 Haute Normandie 5349 5301 48 211 96.0 24.1 0
281 TER_6 2013-02 Bretagne 7729 7644 85 448 94.1 16.1 Quelques journées perturbées par des épisodes ...
280 TER_4 2013-02 Basse Normandie 3013 2991 22 185 93.8 15.2 Nombreuses difficultés de circulation et plusi...
869 TER_12 2013-02 Limousin 3449 3406 43 219 93.6 14.6 0
868 TER_5 2013-02 Bourgogne 7418 7337 81 630 91.4 10.6 Un mois de février en retrait d'un point sur l...
867 TER_3 2013-02 Auvergne 5229 5166 63 427 91.7 11.1 Conditions météos et dérangement d'installatio...
871 TER_18 2013-02 Poitou Charentes 3014 2991 23 118 96.1 24.3 0
... ... ... ... ... ... ... ... ... ... ...
1455 TER_1 2019-12 Grand Est 15858 15341 517 878 94.3 16.5 0
1143 TER_14 2019-12 Occitanie 2777 2599 178 431 83.4 5.0 0
1142 TER_6 2019-12 Bretagne 3584 3545 39 387 89.1 8.2 0
854 TER_7 2019-12 Centre 4025 3788 237 505 86.7 6.5 0
855 TER_15 2019-12 Hauts de France 10426 9999 427 1419 85.8 6.0 0
1144 TER_19 2019-12 Provence Alpes Côte d'Azur 4336 3459 877 585 83.1 4.9 0
565 TER_5 2019-12 Bourgogne-Franche Comté 5673 5252 421 518 90.1 9.1 0
566 TER_16 2019-12 Pays de la Loire 3835 3726 109 664 82.2 4.6 0
1458 TER_2 2020-01 Aquitaine 10387 10011 376 1153 87.1 6.7 0
569 TER_19 2020-01 Provence Alpes Côte d'Azur 9905 9199 706 1065 85.5 5.9 0
1145 TER_6 2020-01 Bretagne 7664 7584 80 287 96.1 24.9 0
1146 TER_7 2020-01 Centre 7991 7820 171 690 89.9 8.9 0
1147 TER_16 2020-01 Pays de la Loire 9655 9441 214 795 91.4 10.6 0
567 TER_5 2020-01 Bourgogne-Franche Comté 12135 11848 287 879 92.2 11.9 0
568 TER_15 2020-01 Hauts de France 22125 21551 574 2476 87.8 7.2 0
856 TER_1 2020-01 Grand Est 33309 32737 572 1583 94.7 18.0 0
271 TER_3 2020-01 Auvergne-Rhône Alpes 24279 23486 793 2105 89.8 8.8 0
273 TER_14 2020-01 Occitanie 6307 5802 505 717 85.3 5.8 0
272 TER_4 2020-01 Normandie 8356 8102 254 1297 83.3 5.0 0
857 TER_4 2020-02 Normandie 10373 10000 373 1279 87.2 6.8 0
1459 TER_2 2020-02 Aquitaine 16304 16018 286 1162 92.8 12.8 0
275 TER_19 2020-02 Provence Alpes Côte d'Azur 13719 13297 422 1271 90.4 9.5 0
1149 TER_6 2020-02 Bretagne 8733 8580 153 405 95.3 20.2 0
1148 TER_5 2020-02 Bourgogne-Franche Comté 14686 14352 334 1241 91.3 10.6 0
570 TER_1 2020-02 Grand Est 40918 39278 1640 2518 93.6 14.6 0
571 TER_16 2020-02 Pays de la Loire 12174 11941 233 689 94.2 16.3 0
1460 TER_7 2020-02 Centre 9903 9617 286 792 91.8 11.1 0
858 TER_15 2020-02 Hauts de France 29503 27962 1541 3076 89.0 8.1 0
274 TER_3 2020-02 Auvergne-Rhône Alpes 36026 35182 844 2970 91.6 10.8 0
1461 TER_14 2020-02 Occitanie 11512 11172 340 1112 90.0 9.1 0

1462 rows × 10 columns

2. Data inspection

Describe the raw data (For example: What is the source and background of the data? What kind of descriptors does it include? How many data points? Which attribute(s) could be used for prediction tasks (e.g. classification)?).
The following aspects might help to assess the data:

  • Number and types of attributes
  • Number of instances per attribute
  • Numeric data ranges of attributes
  • Distribution of data values per attribute (e.g. through plotting histograms and/or applying kernel density estimator plotting function (KDE))
  • Use scatter plots to visualise the relationship between some of the descriptors
  • ...
In [5]:
display(Markdown("The data represents the monthly regularity rates for the french local trains between "+first+" and "+last+"."))
display(Markdown("There are "+str(countDatapoints)+" datapoints inside of this dataset. It contains the following "+str(len(test.dtypes))+" fields :"))
print(test.dtypes)
display(Markdown("The data distribution for each attribute is represented via the following boxplot :"))
test.plot.box(figsize=(24, 16))

The data represents the monthly regularity rates for the french local trains between 2013-01 and 2020-02.

There are 1462 datapoints inside of this dataset. It contains the following 10 fields :

id                                                               object
date                                                             object
region                                                           object
nombre_de_trains_programmes                                       int32
nombre_de_trains_ayant_circule                                    int32
nombre_de_trains_annules                                          int32
nombre_de_trains_en_retard_a_l_arrivee                            int32
taux_de_ponctualite                                             float64
nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee    float64
commentaires                                                     object
dtype: object

The data distribution for each attribute is represented via the following boxplot :

Out[5]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cbeeb57358>

Source: https://data.sncf.com/explore/dataset/regularite-mensuelle-ter/table/?disjunctive.region&sort=date

The header for the CSV file downloaded previously is the following :

ID;Date;Région;Nombre de trains programmés;Nombre de trains ayant circulé;Nombre de trains annulés;Nombre de trains en retard à l'arrivée;Taux de régularité;Nombre de trains à l'heure pour un train en retard à l'arrivée;Commentaires

A datapoint looks like this :

TER_3;2013-01;Auvergne;5785;5732;53;431;92.5;12.3;Conditions météos défavorables.

Each datapoint has the following 10 fields (the names are french in the dataset, the descriptions are translated for more convenience):

- id : id of the region concerned
- date : YYYY-MM Year and month for those results
- région : name of the concerned region
- nombre de trains programmés : Number of scheduled trains for the month - no range
- nombre de trains ayant circulé : number of scheduled trains that have effectively circulated - between 0 and the value of
nombre_de_trains_programmes
- nombre de trains annulés : number of scheduled trains that were cancelled - between 0 and the value of 
nombre_de_trains_programmes
- nombre de trains en retard à l'arrivée : Number of trains delayed among the scheduled trains - between 0 and the value of
nombre_de_trains_programmes
- taux de ponctualité : punctuality rate (percentage of trains on time) - Between -100 and 100. Some datapoints don't have this
value, so the default value will be 0
- nombre de trains à l'heure pour un train en retard à l'arrivée : number of trains that effectively arrived on time for 1 train 
cancelled (in the example, 10 trains were on time for one delayed.
- commentaires : various comments written by the managers about the reasons why trains were delayed or cancelled, and 
observations about the different indicators and results

Some of those fields can be absent, as one region did not disclosed it's punctuality results until 2015, and some of the datapoints may have been filled incorrectly by the transportation authority. In this case, the fields will contain a 0 when concerned by those restrictions.

For the prediction tasks, we can basically use all the attributes, expect for the Commentaires section which contains only informations meant to have a deeper understanding of the data and that are not formatted in a computer-readable format nor in english language. It would have been better to have a standardized system which classifies the situations depending of the context, but this is strictly a limitation of the dataset.

We now display several scatter plots to display the relationships between several attributes. Those relationships are the most present in the dataset, especially for the linear between the number of scheduled trains and any other attribute related to a number of trains.

In [6]:
display(Markdown("Relationship between the number of scheduled trains and the number of trains that have effectively run"))
test.plot.scatter(x="nombre_de_trains_programmes", y="nombre_de_trains_ayant_circule")

Relationship between the number of scheduled trains and the number of trains that have effectively run

Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cbeef0a2e8>
In [8]:
display(Markdown("Relationship between the number of scheduled trains and the number of cancelled trains"))
test.plot.scatter(x="nombre_de_trains_programmes", y="nombre_de_trains_annules")

Relationship between the number of scheduled trains and the number of cancelled trains

Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cbeefb0e48>
In [9]:
display(Markdown("Relationship between the number of scheduled trains and the punctuality rate"))
test.plot.scatter(x="nombre_de_trains_programmes", y="taux_de_ponctualite")

Relationship between the number of scheduled trains and the punctuality rate

Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cbeeffea58>
In [10]:
display(Markdown("Relationship between the number of scheduled trains and the number of trains on time for one delayed train"))
test.plot.scatter(x="nombre_de_trains_programmes", y="nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee")

Relationship between the number of scheduled trains and the number of trains on time for one delayed train

Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cbef05ea90>
In [11]:
display(Markdown("Relationship between the punctuality rate and the number of trains on time for one delayed train"))
test.plot.scatter(x="taux_de_ponctualite", y="nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee")

Relationship between the punctuality rate and the number of trains on time for one delayed train

Out[11]:
<matplotlib.axes._subplots.AxesSubplot at 0x1cbef0b9550>

3. Basic data exploration

Analyse your data in terms of measures of central tendency and dispersion (Calculation of statistics AND visualization).
(Examples: mean, median, standard deviation, variance, 25th/75th percentile, ...)

In [12]:
display(Markdown("The global data analysis gives us the following :"))
pd.DataFrame(test).describe()

The global data analysis gives us the following :

Out[12]:
nombre_de_trains_programmes nombre_de_trains_ayant_circule nombre_de_trains_annules nombre_de_trains_en_retard_a_l_arrivee taux_de_ponctualite nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee
count 1462.000000 1462.000000 1462.000000 1462.000000 1462.000000 1462.000000
mean 10824.285910 10607.298906 222.194938 947.478796 88.832353 12.330711
std 8333.604446 8169.137232 263.820233 899.507293 15.326816 6.387889
min 0.000000 0.000000 0.000000 0.000000 -38.400000 -0.300000
25% 5498.500000 5426.750000 70.000000 369.000000 88.800000 8.000000
50% 8412.500000 8281.500000 134.000000 654.000000 91.700000 11.000000
75% 13331.500000 12949.750000 280.000000 1106.500000 93.900000 15.500000
max 46329.000000 45569.000000 4024.000000 10996.000000 98.000000 49.700000

We then display plots of the punctuality rates for each region per month, and the means and medians for each set

In [13]:
print("Displaying punctuality rates per month for each region, along with means and medians\n")
for i in range(1,20):
    # Create the id to sort and filter the data
    id = "TER_"+str(i)
    
    # Filter to keep only the data for the selected region, sorted by date
    m = test.loc[test.id == id].sort_values(by='date')
    
    # Draw the plot with all the parameters
    matplotlib.pyplot.rcParams["figure.figsize"] = [16.0, 16.0]
    matplotlib.pyplot.title('Punctuality rate per month for '+m.region.all())
    print("Region : "+m.region.all())
    matplotlib.pyplot.xlabel('Date')
    matplotlib.pyplot.ylabel('Punctuality rate')
    matplotlib.pyplot.xticks(rotation=90)
    matplotlib.pyplot.plot(m.date,m.taux_de_ponctualite)
    
    # Compute the mean, display and draw it
    mean = m.taux_de_ponctualite.mean()
    print("Mean : "+str(mean))
    horiz_line_data = np.array([mean for i in range(0,len(m.date))])
    matplotlib.pyplot.plot(m.date, horiz_line_data)
    
    #compute the median, display and draw it
    median = m.taux_de_ponctualite.median()
    print("Median : "+str(median))
    horiz_line_data2 = np.array([median for i in range(0,len(m.date))])
    matplotlib.pyplot.plot(m.date, horiz_line_data2)
    
    matplotlib.pyplot.show()
    
    
#seaborn.heatmap(punct, cmap='hot')
#matplotlib.pyplot.boxplot(punct, date)
Displaying punctuality rates per month for each region, along with means and medians

Region : Grand Est
Mean : 95.47093023255812
Median : 95.6
Region : Aquitaine
Mean : 89.15813953488373
Median : 89.2
Region : Auvergne-Rhône Alpes
Mean : 91.72674418604652
Median : 92.35
Region : Normandie
Mean : 93.2906976744186
Median : 93.9
Region : Bourgogne-Franche Comté
Mean : 91.25232558139535
Median : 91.5
Region : Bretagne
Mean : 94.66860465116284
Median : 95.15
Region : Centre
Mean : 91.59302325581396
Median : 92.0
Region : Champagne Ardenne
Mean : 94.47916666666667
Median : 94.9
Region : Franche Comté
Mean : 91.71000000000001
Median : 91.95
Region : Haute Normandie
Mean : 94.51666666666665
Median : 94.80000000000001
Region : Languedoc Roussillon
Mean : 88.30999999999997
Median : 88.75
Region : Limousin
Mean : 91.29166666666666
Median : 91.85
Region : Lorraine
Mean : 25.72708333333334
Median : 0.0
Region : Occitanie
Mean : 86.71279069767444
Median : 89.65
Region : Hauts de France
Mean : 88.89186046511628
Median : 91.4
Region : Pays de la Loire
Mean : 92.11162790697676
Median : 92.6
Region : Picardie
Mean : 91.02
Median : 91.05
Region : Poitou Charentes
Mean : 90.88166666666666
Median : 91.1
Region : Provence Alpes Côte d'Azur
Mean : 84.60697674418607
Median : 84.8
In [14]:
display(Markdown("We then display the Boxplots for each region. They report, for each numerical attribute related to the number of trains (number of trains scheduled, that have circulated, delayed at final stop and cancelled), the quartiles, means, and the upper and lower bounds."))
for i in range(1,20):
    # Create the id to sort and filter the data
    id = "TER_"+str(i)
    
    # Filter to keep only the data for the selected region, sorted by date, and with only the attributes that we want
    m = test.loc[test.id == id, ["date","region","nombre_de_trains_programmes","nombre_de_trains_ayant_circule","nombre_de_trains_en_retard_a_l_arrivee","nombre_de_trains_annules"]].sort_values(by='date')
    
    matplotlib.pyplot.rcParams["figure.figsize"] = [24.0, 16.0]
    m.plot.box()
    matplotlib.pyplot.title('Boxplot for '+m.region.all())
    
    matplotlib.pyplot.show()

We then display the Boxplots for each region. They report, for each numerical attribute related to the number of trains (number of trains scheduled, that have circulated, delayed at final stop and cancelled), the quartiles, means, and the upper and lower bounds.

In [15]:
display(Markdown("We do the same for the punctuality rates and the number of trains on time for one delayed train, since their values are much smaller (generally between 0 and 100, with some exceptions)."))
for i in range(1,20):
    # Create the id to sort and filter the data
    id = "TER_"+str(i)
    
    # Filter to keep only the data for the selected region, sorted by date, and with only the attributes that we want
    m = test.loc[test.id == id, ["date","region","taux_de_ponctualite","nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee"]].sort_values(by='date')
    
    matplotlib.pyplot.rcParams["figure.figsize"] = [24.0, 16.0]
    m.plot.box()
    matplotlib.pyplot.title('Boxplot for '+m.region.all())
    
    matplotlib.pyplot.show()

We do the same for the punctuality rates and the number of trains on time for one delayed train, since their values are much smaller (generally between 0 and 100, with some exceptions).

4. Correlation analysis

Calculate covariance and correlation matrices.
Which attributes are highly (positively or negatively) correlated?
Create a heatmap of the correlation matrix.

We output the matrices and heatmaps for the whole dataset, and for each region individually

In [15]:
print("Covariance and correlation matrices for all regions")

m = test.sort_values(by='date')

m.cov()
cl = m.corr()
print(cl)
    
seaborn.heatmap(cl,annot=True, cmap='hot')
matplotlib.pyplot.show()

for i in range(1,20):
    # Create the id to sort and filter the data
    id = "TER_"+str(i)
    
    # Filter to keep only the data for the selected region, sorted by date
    m = test.loc[test.id == id].sort_values(by='date')
    
    print("Covariance and correlation matrices for "+m.region.all())
    m.cov()
    cl = m.corr()
    
    print(cl)
    
    seaborn.heatmap(cl,annot=True, cmap='hot')
    matplotlib.pyplot.show()
Covariance and correlation matrices for all regions
                                                    nombre_de_trains_programmes  \
nombre_de_trains_programmes                                            1.000000   
nombre_de_trains_ayant_circule                                         0.999531   
nombre_de_trains_annules                                               0.625439   
nombre_de_trains_en_retard_a_l_arrivee                                 0.759383   
taux_de_ponctualite                                                    0.186293   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                     0.061704   

                                                    nombre_de_trains_ayant_circule  \
nombre_de_trains_programmes                                               0.999531   
nombre_de_trains_ayant_circule                                            1.000000   
nombre_de_trains_annules                                                  0.605500   
nombre_de_trains_en_retard_a_l_arrivee                                    0.754595   
taux_de_ponctualite                                                       0.188784   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                        0.070155   

                                                    nombre_de_trains_annules  \
nombre_de_trains_programmes                                         0.625439   
nombre_de_trains_ayant_circule                                      0.605500   
nombre_de_trains_annules                                            1.000000   
nombre_de_trains_en_retard_a_l_arrivee                              0.611169   
taux_de_ponctualite                                                 0.047005   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                 -0.203453   

                                                    nombre_de_trains_en_retard_a_l_arrivee  \
nombre_de_trains_programmes                                                       0.759383   
nombre_de_trains_ayant_circule                                                    0.754595   
nombre_de_trains_annules                                                          0.611169   
nombre_de_trains_en_retard_a_l_arrivee                                            1.000000   
taux_de_ponctualite                                                              -0.050789   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                               -0.353818   

                                                    taux_de_ponctualite  \
nombre_de_trains_programmes                                    0.186293   
nombre_de_trains_ayant_circule                                 0.188784   
nombre_de_trains_annules                                       0.047005   
nombre_de_trains_en_retard_a_l_arrivee                        -0.050789   
taux_de_ponctualite                                            1.000000   
nombre_de_trains_a_lheure_pour_un_train_en_reta...             0.499492   

                                                    nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee  
nombre_de_trains_programmes                                                                  0.061704             
nombre_de_trains_ayant_circule                                                               0.070155             
nombre_de_trains_annules                                                                    -0.203453             
nombre_de_trains_en_retard_a_l_arrivee                                                      -0.353818             
taux_de_ponctualite                                                                          0.499492             
nombre_de_trains_a_lheure_pour_un_train_en_reta...                                           1.000000             
Covariance and correlation matrices for Grand Est
                                                    nombre_de_trains_programmes  \
nombre_de_trains_programmes                                            1.000000   
nombre_de_trains_ayant_circule                                         0.999660   
nombre_de_trains_annules                                               0.713599   
nombre_de_trains_en_retard_a_l_arrivee                                 0.920313   
taux_de_ponctualite                                                   -0.521721   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                    -0.498232   

                                                    nombre_de_trains_ayant_circule  \
nombre_de_trains_programmes                                               0.999660   
nombre_de_trains_ayant_circule                                            1.000000   
nombre_de_trains_annules                                                  0.695183   
nombre_de_trains_en_retard_a_l_arrivee                                    0.916457   
taux_de_ponctualite                                                      -0.514000   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                       -0.491722   

                                                    nombre_de_trains_annules  \
nombre_de_trains_programmes                                         0.713599   
nombre_de_trains_ayant_circule                                      0.695183   
nombre_de_trains_annules                                            1.000000   
nombre_de_trains_en_retard_a_l_arrivee                              0.753703   
taux_de_ponctualite                                                -0.579208   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                 -0.532006   

                                                    nombre_de_trains_en_retard_a_l_arrivee  \
nombre_de_trains_programmes                                                       0.920313   
nombre_de_trains_ayant_circule                                                    0.916457   
nombre_de_trains_annules                                                          0.753703   
nombre_de_trains_en_retard_a_l_arrivee                                            1.000000   
taux_de_ponctualite                                                              -0.787124   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                               -0.729527   

                                                    taux_de_ponctualite  \
nombre_de_trains_programmes                                   -0.521721   
nombre_de_trains_ayant_circule                                -0.514000   
nombre_de_trains_annules                                      -0.579208   
nombre_de_trains_en_retard_a_l_arrivee                        -0.787124   
taux_de_ponctualite                                            1.000000   
nombre_de_trains_a_lheure_pour_un_train_en_reta...             0.950966   

                                                    nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee  
nombre_de_trains_programmes                                                                 -0.498232             
nombre_de_trains_ayant_circule                                                              -0.491722             
nombre_de_trains_annules                                                                    -0.532006             
nombre_de_trains_en_retard_a_l_arrivee                                                      -0.729527             
taux_de_ponctualite                                                                          0.950966             
nombre_de_trains_a_lheure_pour_un_train_en_reta...                                           1.000000             
Covariance and correlation matrices for Aquitaine
                                                    nombre_de_trains_programmes  \
nombre_de_trains_programmes                                            1.000000   
nombre_de_trains_ayant_circule                                         0.998909   
nombre_de_trains_annules                                               0.378596   
nombre_de_trains_en_retard_a_l_arrivee                                 0.550463   
taux_de_ponctualite                                                    0.576054   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                     0.619747   

                                                    nombre_de_trains_ayant_circule  \
nombre_de_trains_programmes                                               0.998909   
nombre_de_trains_ayant_circule                                            1.000000   
nombre_de_trains_annules                                                  0.336859   
nombre_de_trains_en_retard_a_l_arrivee                                    0.539071   
taux_de_ponctualite                                                       0.587916   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                        0.631538   

                                                    nombre_de_trains_annules  \
nombre_de_trains_programmes                                         0.378596   
nombre_de_trains_ayant_circule                                      0.336859   
nombre_de_trains_annules                                            1.000000   
nombre_de_trains_en_retard_a_l_arrivee                              0.422782   
taux_de_ponctualite                                                -0.019475   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                 -0.007765   

                                                    nombre_de_trains_en_retard_a_l_arrivee  \
nombre_de_trains_programmes                                                       0.550463   
nombre_de_trains_ayant_circule                                                    0.539071   
nombre_de_trains_annules                                                          0.422782   
nombre_de_trains_en_retard_a_l_arrivee                                            1.000000   
taux_de_ponctualite                                                              -0.291229   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                               -0.282004   

                                                    taux_de_ponctualite  \
nombre_de_trains_programmes                                    0.576054   
nombre_de_trains_ayant_circule                                 0.587916   
nombre_de_trains_annules                                      -0.019475   
nombre_de_trains_en_retard_a_l_arrivee                        -0.291229   
taux_de_ponctualite                                            1.000000   
nombre_de_trains_a_lheure_pour_un_train_en_reta...             0.926968   

                                                    nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee  
nombre_de_trains_programmes                                                                  0.619747             
nombre_de_trains_ayant_circule                                                               0.631538             
nombre_de_trains_annules                                                                    -0.007765             
nombre_de_trains_en_retard_a_l_arrivee                                                      -0.282004             
taux_de_ponctualite                                                                          0.926968             
nombre_de_trains_a_lheure_pour_un_train_en_reta...                                           1.000000             
Covariance and correlation matrices for Auvergne-Rhône Alpes
                                                    nombre_de_trains_programmes  \
nombre_de_trains_programmes                                            1.000000   
nombre_de_trains_ayant_circule                                         0.999878   
nombre_de_trains_annules                                               0.806830   
nombre_de_trains_en_retard_a_l_arrivee                                 0.964094   
taux_de_ponctualite                                                   -0.574404   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                    -0.510346   

                                                    nombre_de_trains_ayant_circule  \
nombre_de_trains_programmes                                               0.999878   
nombre_de_trains_ayant_circule                                            1.000000   
nombre_de_trains_annules                                                  0.797515   
nombre_de_trains_en_retard_a_l_arrivee                                    0.962789   
taux_de_ponctualite                                                      -0.570504   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                       -0.507324   

                                                    nombre_de_trains_annules  \
nombre_de_trains_programmes                                         0.806830   
nombre_de_trains_ayant_circule                                      0.797515   
nombre_de_trains_annules                                            1.000000   
nombre_de_trains_en_retard_a_l_arrivee                              0.822833   
taux_de_ponctualite                                                -0.608475   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                 -0.523841   

                                                    nombre_de_trains_en_retard_a_l_arrivee  \
nombre_de_trains_programmes                                                       0.964094   
nombre_de_trains_ayant_circule                                                    0.962789   
nombre_de_trains_annules                                                          0.822833   
nombre_de_trains_en_retard_a_l_arrivee                                            1.000000   
taux_de_ponctualite                                                              -0.716662   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                               -0.612545   

                                                    taux_de_ponctualite  \
nombre_de_trains_programmes                                   -0.574404   
nombre_de_trains_ayant_circule                                -0.570504   
nombre_de_trains_annules                                      -0.608475   
nombre_de_trains_en_retard_a_l_arrivee                        -0.716662   
taux_de_ponctualite                                            1.000000   
nombre_de_trains_a_lheure_pour_un_train_en_reta...             0.917223   

                                                    nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee  
nombre_de_trains_programmes                                                                 -0.510346             
nombre_de_trains_ayant_circule                                                              -0.507324             
nombre_de_trains_annules                                                                    -0.523841             
nombre_de_trains_en_retard_a_l_arrivee                                                      -0.612545             
taux_de_ponctualite                                                                          0.917223             
nombre_de_trains_a_lheure_pour_un_train_en_reta...                                           1.000000             
Covariance and correlation matrices for Normandie
                                                    nombre_de_trains_programmes  \
nombre_de_trains_programmes                                            1.000000   
nombre_de_trains_ayant_circule                                         0.999282   
nombre_de_trains_annules                                               0.422120   
nombre_de_trains_en_retard_a_l_arrivee                                 0.786327   
taux_de_ponctualite                                                   -0.089720   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                    -0.052969   

                                                    nombre_de_trains_ayant_circule  \
nombre_de_trains_programmes                                               0.999282   
nombre_de_trains_ayant_circule                                            1.000000   
nombre_de_trains_annules                                                  0.387674   
nombre_de_trains_en_retard_a_l_arrivee                                    0.775763   
taux_de_ponctualite                                                      -0.073054   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                       -0.036119   

                                                    nombre_de_trains_annules  \
nombre_de_trains_programmes                                         0.422120   
nombre_de_trains_ayant_circule                                      0.387674   
nombre_de_trains_annules                                            1.000000   
nombre_de_trains_en_retard_a_l_arrivee                              0.571795   
taux_de_ponctualite                                                -0.433886   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                 -0.424369   

                                                    nombre_de_trains_en_retard_a_l_arrivee  \
nombre_de_trains_programmes                                                       0.786327   
nombre_de_trains_ayant_circule                                                    0.775763   
nombre_de_trains_annules                                                          0.571795   
nombre_de_trains_en_retard_a_l_arrivee                                            1.000000   
taux_de_ponctualite                                                              -0.614364   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                               -0.508656   

                                                    taux_de_ponctualite  \
nombre_de_trains_programmes                                   -0.089720   
nombre_de_trains_ayant_circule                                -0.073054   
nombre_de_trains_annules                                      -0.433886   
nombre_de_trains_en_retard_a_l_arrivee                        -0.614364   
taux_de_ponctualite                                            1.000000   
nombre_de_trains_a_lheure_pour_un_train_en_reta...             0.907025   

                                                    nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee  
nombre_de_trains_programmes                                                                 -0.052969             
nombre_de_trains_ayant_circule                                                              -0.036119             
nombre_de_trains_annules                                                                    -0.424369             
nombre_de_trains_en_retard_a_l_arrivee                                                      -0.508656             
taux_de_ponctualite                                                                          0.907025             
nombre_de_trains_a_lheure_pour_un_train_en_reta...                                           1.000000             
Covariance and correlation matrices for Bourgogne-Franche Comté
                                                    nombre_de_trains_programmes  \
nombre_de_trains_programmes                                            1.000000   
nombre_de_trains_ayant_circule                                         0.999101   
nombre_de_trains_annules                                               0.213432   
nombre_de_trains_en_retard_a_l_arrivee                                 0.699082   
taux_de_ponctualite                                                    0.426108   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                     0.501354   

                                                    nombre_de_trains_ayant_circule  \
nombre_de_trains_programmes                                               0.999101   
nombre_de_trains_ayant_circule                                            1.000000   
nombre_de_trains_annules                                                  0.177994   
nombre_de_trains_en_retard_a_l_arrivee                                    0.696578   
taux_de_ponctualite                                                       0.430322   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                        0.507617   

                                                    nombre_de_trains_annules  \
nombre_de_trains_programmes                                         0.213432   
nombre_de_trains_ayant_circule                                      0.177994   
nombre_de_trains_annules                                            1.000000   
nombre_de_trains_en_retard_a_l_arrivee                              0.240723   
taux_de_ponctualite                                                -0.077318   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                 -0.122217   

                                                    nombre_de_trains_en_retard_a_l_arrivee  \
nombre_de_trains_programmes                                                       0.699082   
nombre_de_trains_ayant_circule                                                    0.696578   
nombre_de_trains_annules                                                          0.240723   
nombre_de_trains_en_retard_a_l_arrivee                                            1.000000   
taux_de_ponctualite                                                              -0.319649   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                               -0.236626   

                                                    taux_de_ponctualite  \
nombre_de_trains_programmes                                    0.426108   
nombre_de_trains_ayant_circule                                 0.430322   
nombre_de_trains_annules                                      -0.077318   
nombre_de_trains_en_retard_a_l_arrivee                        -0.319649   
taux_de_ponctualite                                            1.000000   
nombre_de_trains_a_lheure_pour_un_train_en_reta...             0.939437   

                                                    nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee  
nombre_de_trains_programmes                                                                  0.501354             
nombre_de_trains_ayant_circule                                                               0.507617             
nombre_de_trains_annules                                                                    -0.122217             
nombre_de_trains_en_retard_a_l_arrivee                                                      -0.236626             
taux_de_ponctualite                                                                          0.939437             
nombre_de_trains_a_lheure_pour_un_train_en_reta...                                           1.000000             
Covariance and correlation matrices for Bretagne
                                                    nombre_de_trains_programmes  \
nombre_de_trains_programmes                                            1.000000   
nombre_de_trains_ayant_circule                                         0.997662   
nombre_de_trains_annules                                               0.279673   
nombre_de_trains_en_retard_a_l_arrivee                                 0.113021   
taux_de_ponctualite                                                    0.296943   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                     0.200559   

                                                    nombre_de_trains_ayant_circule  \
nombre_de_trains_programmes                                               0.997662   
nombre_de_trains_ayant_circule                                            1.000000   
nombre_de_trains_annules                                                  0.213399   
nombre_de_trains_en_retard_a_l_arrivee                                    0.096074   
taux_de_ponctualite                                                       0.314139   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                        0.220535   

                                                    nombre_de_trains_annules  \
nombre_de_trains_programmes                                         0.279673   
nombre_de_trains_ayant_circule                                      0.213399   
nombre_de_trains_annules                                            1.000000   
nombre_de_trains_en_retard_a_l_arrivee                              0.265958   
taux_de_ponctualite                                                -0.168253   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                 -0.231099   

                                                    nombre_de_trains_en_retard_a_l_arrivee  \
nombre_de_trains_programmes                                                       0.113021   
nombre_de_trains_ayant_circule                                                    0.096074   
nombre_de_trains_annules                                                          0.265958   
nombre_de_trains_en_retard_a_l_arrivee                                            1.000000   
taux_de_ponctualite                                                              -0.899694   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                               -0.836325   

                                                    taux_de_ponctualite  \
nombre_de_trains_programmes                                    0.296943   
nombre_de_trains_ayant_circule                                 0.314139   
nombre_de_trains_annules                                      -0.168253   
nombre_de_trains_en_retard_a_l_arrivee                        -0.899694   
taux_de_ponctualite                                            1.000000   
nombre_de_trains_a_lheure_pour_un_train_en_reta...             0.876132   

                                                    nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee  
nombre_de_trains_programmes                                                                  0.200559             
nombre_de_trains_ayant_circule                                                               0.220535             
nombre_de_trains_annules                                                                    -0.231099             
nombre_de_trains_en_retard_a_l_arrivee                                                      -0.836325             
taux_de_ponctualite                                                                          0.876132             
nombre_de_trains_a_lheure_pour_un_train_en_reta...                                           1.000000             
Covariance and correlation matrices for Centre
                                                    nombre_de_trains_programmes  \
nombre_de_trains_programmes                                            1.000000   
nombre_de_trains_ayant_circule                                         0.895664   
nombre_de_trains_annules                                              -0.019225   
nombre_de_trains_en_retard_a_l_arrivee                                 0.417879   
taux_de_ponctualite                                                    0.033998   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                     0.007339   

                                                    nombre_de_trains_ayant_circule  \
nombre_de_trains_programmes                                               0.895664   
nombre_de_trains_ayant_circule                                            1.000000   
nombre_de_trains_annules                                                 -0.149855   
nombre_de_trains_en_retard_a_l_arrivee                                    0.341472   
taux_de_ponctualite                                                       0.144737   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                        0.159081   

                                                    nombre_de_trains_annules  \
nombre_de_trains_programmes                                        -0.019225   
nombre_de_trains_ayant_circule                                     -0.149855   
nombre_de_trains_annules                                            1.000000   
nombre_de_trains_en_retard_a_l_arrivee                              0.170761   
taux_de_ponctualite                                                -0.250028   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                 -0.289950   

                                                    nombre_de_trains_en_retard_a_l_arrivee  \
nombre_de_trains_programmes                                                       0.417879   
nombre_de_trains_ayant_circule                                                    0.341472   
nombre_de_trains_annules                                                          0.170761   
nombre_de_trains_en_retard_a_l_arrivee                                            1.000000   
taux_de_ponctualite                                                              -0.870350   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                               -0.831030   

                                                    taux_de_ponctualite  \
nombre_de_trains_programmes                                    0.033998   
nombre_de_trains_ayant_circule                                 0.144737   
nombre_de_trains_annules                                      -0.250028   
nombre_de_trains_en_retard_a_l_arrivee                        -0.870350   
taux_de_ponctualite                                            1.000000   
nombre_de_trains_a_lheure_pour_un_train_en_reta...             0.950730   

                                                    nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee  
nombre_de_trains_programmes                                                                  0.007339             
nombre_de_trains_ayant_circule                                                               0.159081             
nombre_de_trains_annules                                                                    -0.289950             
nombre_de_trains_en_retard_a_l_arrivee                                                      -0.831030             
taux_de_ponctualite                                                                          0.950730             
nombre_de_trains_a_lheure_pour_un_train_en_reta...                                           1.000000             
Covariance and correlation matrices for Champagne Ardenne
                                                    nombre_de_trains_programmes  \
nombre_de_trains_programmes                                            1.000000   
nombre_de_trains_ayant_circule                                         0.976279   
nombre_de_trains_annules                                               0.249106   
nombre_de_trains_en_retard_a_l_arrivee                                 0.146028   
taux_de_ponctualite                                                    0.145879   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                     0.130757   

                                                    nombre_de_trains_ayant_circule  \
nombre_de_trains_programmes                                               0.976279   
nombre_de_trains_ayant_circule                                            1.000000   
nombre_de_trains_annules                                                  0.033508   
nombre_de_trains_en_retard_a_l_arrivee                                    0.032561   
taux_de_ponctualite                                                       0.264811   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                        0.228159   

                                                    nombre_de_trains_annules  \
nombre_de_trains_programmes                                         0.249106   
nombre_de_trains_ayant_circule                                      0.033508   
nombre_de_trains_annules                                            1.000000   
nombre_de_trains_en_retard_a_l_arrivee                              0.528422   
taux_de_ponctualite                                                -0.511126   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                 -0.416982   

                                                    nombre_de_trains_en_retard_a_l_arrivee  \
nombre_de_trains_programmes                                                       0.146028   
nombre_de_trains_ayant_circule                                                    0.032561   
nombre_de_trains_annules                                                          0.528422   
nombre_de_trains_en_retard_a_l_arrivee                                            1.000000   
taux_de_ponctualite                                                              -0.951806   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                               -0.917644   

                                                    taux_de_ponctualite  \
nombre_de_trains_programmes                                    0.145879   
nombre_de_trains_ayant_circule                                 0.264811   
nombre_de_trains_annules                                      -0.511126   
nombre_de_trains_en_retard_a_l_arrivee                        -0.951806   
taux_de_ponctualite                                            1.000000   
nombre_de_trains_a_lheure_pour_un_train_en_reta...             0.947839   

                                                    nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee  
nombre_de_trains_programmes                                                                  0.130757             
nombre_de_trains_ayant_circule                                                               0.228159             
nombre_de_trains_annules                                                                    -0.416982             
nombre_de_trains_en_retard_a_l_arrivee                                                      -0.917644             
taux_de_ponctualite                                                                          0.947839             
nombre_de_trains_a_lheure_pour_un_train_en_reta...                                           1.000000             
Covariance and correlation matrices for Franche Comté
                                                    nombre_de_trains_programmes  \
nombre_de_trains_programmes                                            1.000000   
nombre_de_trains_ayant_circule                                         0.698226   
nombre_de_trains_annules                                              -0.272423   
nombre_de_trains_en_retard_a_l_arrivee                                 0.205932   
taux_de_ponctualite                                                    0.125201   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                     0.073597   

                                                    nombre_de_trains_ayant_circule  \
nombre_de_trains_programmes                                               0.698226   
nombre_de_trains_ayant_circule                                            1.000000   
nombre_de_trains_annules                                                 -0.310449   
nombre_de_trains_en_retard_a_l_arrivee                                    0.074959   
taux_de_ponctualite                                                       0.267074   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                        0.306454   

                                                    nombre_de_trains_annules  \
nombre_de_trains_programmes                                        -0.272423   
nombre_de_trains_ayant_circule                                     -0.310449   
nombre_de_trains_annules                                            1.000000   
nombre_de_trains_en_retard_a_l_arrivee                              0.335113   
taux_de_ponctualite                                                -0.468040   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                 -0.408580   

                                                    nombre_de_trains_en_retard_a_l_arrivee  \
nombre_de_trains_programmes                                                       0.205932   
nombre_de_trains_ayant_circule                                                    0.074959   
nombre_de_trains_annules                                                          0.335113   
nombre_de_trains_en_retard_a_l_arrivee                                            1.000000   
taux_de_ponctualite                                                              -0.930019   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                               -0.894502   

                                                    taux_de_ponctualite  \
nombre_de_trains_programmes                                    0.125201   
nombre_de_trains_ayant_circule                                 0.267074   
nombre_de_trains_annules                                      -0.468040   
nombre_de_trains_en_retard_a_l_arrivee                        -0.930019   
taux_de_ponctualite                                            1.000000   
nombre_de_trains_a_lheure_pour_un_train_en_reta...             0.963991   

                                                    nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee  
nombre_de_trains_programmes                                                                  0.073597             
nombre_de_trains_ayant_circule                                                               0.306454             
nombre_de_trains_annules                                                                    -0.408580             
nombre_de_trains_en_retard_a_l_arrivee                                                      -0.894502             
taux_de_ponctualite                                                                          0.963991             
nombre_de_trains_a_lheure_pour_un_train_en_reta...                                           1.000000             
Covariance and correlation matrices for Haute Normandie
                                                    nombre_de_trains_programmes  \
nombre_de_trains_programmes                                            1.000000   
nombre_de_trains_ayant_circule                                         0.979343   
nombre_de_trains_annules                                              -0.158727   
nombre_de_trains_en_retard_a_l_arrivee                                 0.171496   
taux_de_ponctualite                                                    0.086810   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                     0.044911   

                                                    nombre_de_trains_ayant_circule  \
nombre_de_trains_programmes                                               0.979343   
nombre_de_trains_ayant_circule                                            1.000000   
nombre_de_trains_annules                                                 -0.355092   
nombre_de_trains_en_retard_a_l_arrivee                                    0.093944   
taux_de_ponctualite                                                       0.168721   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                        0.127904   

                                                    nombre_de_trains_annules  \
nombre_de_trains_programmes                                        -0.158727   
nombre_de_trains_ayant_circule                                     -0.355092   
nombre_de_trains_annules                                            1.000000   
nombre_de_trains_en_retard_a_l_arrivee                              0.334146   
taux_de_ponctualite                                                -0.422487   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                 -0.416889   

                                                    nombre_de_trains_en_retard_a_l_arrivee  \
nombre_de_trains_programmes                                                       0.171496   
nombre_de_trains_ayant_circule                                                    0.093944   
nombre_de_trains_annules                                                          0.334146   
nombre_de_trains_en_retard_a_l_arrivee                                            1.000000   
taux_de_ponctualite                                                              -0.964419   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                               -0.925600   

                                                    taux_de_ponctualite  \
nombre_de_trains_programmes                                    0.086810   
nombre_de_trains_ayant_circule                                 0.168721   
nombre_de_trains_annules                                      -0.422487   
nombre_de_trains_en_retard_a_l_arrivee                        -0.964419   
taux_de_ponctualite                                            1.000000   
nombre_de_trains_a_lheure_pour_un_train_en_reta...             0.949461   

                                                    nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee  
nombre_de_trains_programmes                                                                  0.044911             
nombre_de_trains_ayant_circule                                                               0.127904             
nombre_de_trains_annules                                                                    -0.416889             
nombre_de_trains_en_retard_a_l_arrivee                                                      -0.925600             
taux_de_ponctualite                                                                          0.949461             
nombre_de_trains_a_lheure_pour_un_train_en_reta...                                           1.000000             
Covariance and correlation matrices for Languedoc Roussillon
                                                    nombre_de_trains_programmes  \
nombre_de_trains_programmes                                            1.000000   
nombre_de_trains_ayant_circule                                         0.972856   
nombre_de_trains_annules                                               0.037765   
nombre_de_trains_en_retard_a_l_arrivee                                 0.317465   
taux_de_ponctualite                                                    0.096394   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                     0.092917   

                                                    nombre_de_trains_ayant_circule  \
nombre_de_trains_programmes                                               0.972856   
nombre_de_trains_ayant_circule                                            1.000000   
nombre_de_trains_annules                                                 -0.194507   
nombre_de_trains_en_retard_a_l_arrivee                                    0.329807   
taux_de_ponctualite                                                       0.093653   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                        0.091190   

                                                    nombre_de_trains_annules  \
nombre_de_trains_programmes                                         0.037765   
nombre_de_trains_ayant_circule                                     -0.194507   
nombre_de_trains_annules                                            1.000000   
nombre_de_trains_en_retard_a_l_arrivee                             -0.078518   
taux_de_ponctualite                                                 0.004179   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                  0.000073   

                                                    nombre_de_trains_en_retard_a_l_arrivee  \
nombre_de_trains_programmes                                                       0.317465   
nombre_de_trains_ayant_circule                                                    0.329807   
nombre_de_trains_annules                                                         -0.078518   
nombre_de_trains_en_retard_a_l_arrivee                                            1.000000   
taux_de_ponctualite                                                              -0.905253   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                               -0.867311   

                                                    taux_de_ponctualite  \
nombre_de_trains_programmes                                    0.096394   
nombre_de_trains_ayant_circule                                 0.093653   
nombre_de_trains_annules                                       0.004179   
nombre_de_trains_en_retard_a_l_arrivee                        -0.905253   
taux_de_ponctualite                                            1.000000   
nombre_de_trains_a_lheure_pour_un_train_en_reta...             0.961579   

                                                    nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee  
nombre_de_trains_programmes                                                                  0.092917             
nombre_de_trains_ayant_circule                                                               0.091190             
nombre_de_trains_annules                                                                     0.000073             
nombre_de_trains_en_retard_a_l_arrivee                                                      -0.867311             
taux_de_ponctualite                                                                          0.961579             
nombre_de_trains_a_lheure_pour_un_train_en_reta...                                           1.000000             
Covariance and correlation matrices for Limousin
                                                    nombre_de_trains_programmes  \
nombre_de_trains_programmes                                            1.000000   
nombre_de_trains_ayant_circule                                         0.912996   
nombre_de_trains_annules                                              -0.146494   
nombre_de_trains_en_retard_a_l_arrivee                                 0.158836   
taux_de_ponctualite                                                    0.052706   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                     0.030030   

                                                    nombre_de_trains_ayant_circule  \
nombre_de_trains_programmes                                               0.912996   
nombre_de_trains_ayant_circule                                            1.000000   
nombre_de_trains_annules                                                 -0.537315   
nombre_de_trains_en_retard_a_l_arrivee                                    0.063206   
taux_de_ponctualite                                                       0.176827   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                        0.147229   

                                                    nombre_de_trains_annules  \
nombre_de_trains_programmes                                        -0.146494   
nombre_de_trains_ayant_circule                                     -0.537315   
nombre_de_trains_annules                                            1.000000   
nombre_de_trains_en_retard_a_l_arrivee                              0.175101   
taux_de_ponctualite                                                -0.319800   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                 -0.294912   

                                                    nombre_de_trains_en_retard_a_l_arrivee  \
nombre_de_trains_programmes                                                       0.158836   
nombre_de_trains_ayant_circule                                                    0.063206   
nombre_de_trains_annules                                                          0.175101   
nombre_de_trains_en_retard_a_l_arrivee                                            1.000000   
taux_de_ponctualite                                                              -0.969480   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                               -0.889295   

                                                    taux_de_ponctualite  \
nombre_de_trains_programmes                                    0.052706   
nombre_de_trains_ayant_circule                                 0.176827   
nombre_de_trains_annules                                      -0.319800   
nombre_de_trains_en_retard_a_l_arrivee                        -0.969480   
taux_de_ponctualite                                            1.000000   
nombre_de_trains_a_lheure_pour_un_train_en_reta...             0.912505   

                                                    nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee  
nombre_de_trains_programmes                                                                  0.030030             
nombre_de_trains_ayant_circule                                                               0.147229             
nombre_de_trains_annules                                                                    -0.294912             
nombre_de_trains_en_retard_a_l_arrivee                                                      -0.889295             
taux_de_ponctualite                                                                          0.912505             
nombre_de_trains_a_lheure_pour_un_train_en_reta...                                           1.000000             
Covariance and correlation matrices for Lorraine
                                                    nombre_de_trains_programmes  \
nombre_de_trains_programmes                                            1.000000   
nombre_de_trains_ayant_circule                                         0.999925   
nombre_de_trains_annules                                               0.872518   
nombre_de_trains_en_retard_a_l_arrivee                                 0.978353   
taux_de_ponctualite                                                    0.989946   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                     0.961425   

                                                    nombre_de_trains_ayant_circule  \
nombre_de_trains_programmes                                               0.999925   
nombre_de_trains_ayant_circule                                            1.000000   
nombre_de_trains_annules                                                  0.866484   
nombre_de_trains_en_retard_a_l_arrivee                                    0.977550   
taux_de_ponctualite                                                       0.989502   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                        0.962147   

                                                    nombre_de_trains_annules  \
nombre_de_trains_programmes                                         0.872518   
nombre_de_trains_ayant_circule                                      0.866484   
nombre_de_trains_annules                                            1.000000   
nombre_de_trains_en_retard_a_l_arrivee                              0.882820   
taux_de_ponctualite                                                 0.878553   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                  0.807112   

                                                    nombre_de_trains_en_retard_a_l_arrivee  \
nombre_de_trains_programmes                                                       0.978353   
nombre_de_trains_ayant_circule                                                    0.977550   
nombre_de_trains_annules                                                          0.882820   
nombre_de_trains_en_retard_a_l_arrivee                                            1.000000   
taux_de_ponctualite                                                               0.957854   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                                0.891409   

                                                    taux_de_ponctualite  \
nombre_de_trains_programmes                                    0.989946   
nombre_de_trains_ayant_circule                                 0.989502   
nombre_de_trains_annules                                       0.878553   
nombre_de_trains_en_retard_a_l_arrivee                         0.957854   
taux_de_ponctualite                                            1.000000   
nombre_de_trains_a_lheure_pour_un_train_en_reta...             0.980191   

                                                    nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee  
nombre_de_trains_programmes                                                                  0.961425             
nombre_de_trains_ayant_circule                                                               0.962147             
nombre_de_trains_annules                                                                     0.807112             
nombre_de_trains_en_retard_a_l_arrivee                                                       0.891409             
taux_de_ponctualite                                                                          0.980191             
nombre_de_trains_a_lheure_pour_un_train_en_reta...                                           1.000000             
Covariance and correlation matrices for Occitanie
                                                    nombre_de_trains_programmes  \
nombre_de_trains_programmes                                            1.000000   
nombre_de_trains_ayant_circule                                         0.996669   
nombre_de_trains_annules                                               0.359899   
nombre_de_trains_en_retard_a_l_arrivee                                 0.075516   
taux_de_ponctualite                                                    0.073352   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                     0.071702   

                                                    nombre_de_trains_ayant_circule  \
nombre_de_trains_programmes                                               0.996669   
nombre_de_trains_ayant_circule                                            1.000000   
nombre_de_trains_annules                                                  0.282616   
nombre_de_trains_en_retard_a_l_arrivee                                    0.070339   
taux_de_ponctualite                                                       0.078733   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                        0.101083   

                                                    nombre_de_trains_annules  \
nombre_de_trains_programmes                                         0.359899   
nombre_de_trains_ayant_circule                                      0.282616   
nombre_de_trains_annules                                            1.000000   
nombre_de_trains_en_retard_a_l_arrivee                              0.083539   
taux_de_ponctualite                                                -0.037958   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                 -0.313073   

                                                    nombre_de_trains_en_retard_a_l_arrivee  \
nombre_de_trains_programmes                                                       0.075516   
nombre_de_trains_ayant_circule                                                    0.070339   
nombre_de_trains_annules                                                          0.083539   
nombre_de_trains_en_retard_a_l_arrivee                                            1.000000   
taux_de_ponctualite                                                              -0.987975   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                               -0.622223   

                                                    taux_de_ponctualite  \
nombre_de_trains_programmes                                    0.073352   
nombre_de_trains_ayant_circule                                 0.078733   
nombre_de_trains_annules                                      -0.037958   
nombre_de_trains_en_retard_a_l_arrivee                        -0.987975   
taux_de_ponctualite                                            1.000000   
nombre_de_trains_a_lheure_pour_un_train_en_reta...             0.631335   

                                                    nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee  
nombre_de_trains_programmes                                                                  0.071702             
nombre_de_trains_ayant_circule                                                               0.101083             
nombre_de_trains_annules                                                                    -0.313073             
nombre_de_trains_en_retard_a_l_arrivee                                                      -0.622223             
taux_de_ponctualite                                                                          0.631335             
nombre_de_trains_a_lheure_pour_un_train_en_reta...                                           1.000000             
Covariance and correlation matrices for Hauts de France
                                                    nombre_de_trains_programmes  \
nombre_de_trains_programmes                                            1.000000   
nombre_de_trains_ayant_circule                                         0.999100   
nombre_de_trains_annules                                               0.522728   
nombre_de_trains_en_retard_a_l_arrivee                                 0.815741   
taux_de_ponctualite                                                    0.521201   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                     0.019484   

                                                    nombre_de_trains_ayant_circule  \
nombre_de_trains_programmes                                               0.999100   
nombre_de_trains_ayant_circule                                            1.000000   
nombre_de_trains_annules                                                  0.489509   
nombre_de_trains_en_retard_a_l_arrivee                                    0.811815   
taux_de_ponctualite                                                       0.522756   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                        0.025011   

                                                    nombre_de_trains_annules  \
nombre_de_trains_programmes                                         0.522728   
nombre_de_trains_ayant_circule                                      0.489509   
nombre_de_trains_annules                                            1.000000   
nombre_de_trains_en_retard_a_l_arrivee                              0.540356   
taux_de_ponctualite                                                 0.206385   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                 -0.228320   

                                                    nombre_de_trains_en_retard_a_l_arrivee  \
nombre_de_trains_programmes                                                       0.815741   
nombre_de_trains_ayant_circule                                                    0.811815   
nombre_de_trains_annules                                                          0.540356   
nombre_de_trains_en_retard_a_l_arrivee                                            1.000000   
taux_de_ponctualite                                                               0.240638   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                               -0.479282   

                                                    taux_de_ponctualite  \
nombre_de_trains_programmes                                    0.521201   
nombre_de_trains_ayant_circule                                 0.522756   
nombre_de_trains_annules                                       0.206385   
nombre_de_trains_en_retard_a_l_arrivee                         0.240638   
taux_de_ponctualite                                            1.000000   
nombre_de_trains_a_lheure_pour_un_train_en_reta...             0.595889   

                                                    nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee  
nombre_de_trains_programmes                                                                  0.019484             
nombre_de_trains_ayant_circule                                                               0.025011             
nombre_de_trains_annules                                                                    -0.228320             
nombre_de_trains_en_retard_a_l_arrivee                                                      -0.479282             
taux_de_ponctualite                                                                          0.595889             
nombre_de_trains_a_lheure_pour_un_train_en_reta...                                           1.000000             
Covariance and correlation matrices for Pays de la Loire
                                                    nombre_de_trains_programmes  \
nombre_de_trains_programmes                                            1.000000   
nombre_de_trains_ayant_circule                                         0.994913   
nombre_de_trains_annules                                               0.166776   
nombre_de_trains_en_retard_a_l_arrivee                                 0.393180   
taux_de_ponctualite                                                    0.189958   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                     0.109093   

                                                    nombre_de_trains_ayant_circule  \
nombre_de_trains_programmes                                               0.994913   
nombre_de_trains_ayant_circule                                            1.000000   
nombre_de_trains_annules                                                  0.066596   
nombre_de_trains_en_retard_a_l_arrivee                                    0.380624   
taux_de_ponctualite                                                       0.204236   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                        0.126640   

                                                    nombre_de_trains_annules  \
nombre_de_trains_programmes                                         0.166776   
nombre_de_trains_ayant_circule                                      0.066596   
nombre_de_trains_annules                                            1.000000   
nombre_de_trains_en_retard_a_l_arrivee                              0.168886   
taux_de_ponctualite                                                -0.117527   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                 -0.158970   

                                                    nombre_de_trains_en_retard_a_l_arrivee  \
nombre_de_trains_programmes                                                       0.393180   
nombre_de_trains_ayant_circule                                                    0.380624   
nombre_de_trains_annules                                                          0.168886   
nombre_de_trains_en_retard_a_l_arrivee                                            1.000000   
taux_de_ponctualite                                                              -0.793699   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                               -0.794092   

                                                    taux_de_ponctualite  \
nombre_de_trains_programmes                                    0.189958   
nombre_de_trains_ayant_circule                                 0.204236   
nombre_de_trains_annules                                      -0.117527   
nombre_de_trains_en_retard_a_l_arrivee                        -0.793699   
taux_de_ponctualite                                            1.000000   
nombre_de_trains_a_lheure_pour_un_train_en_reta...             0.915351   

                                                    nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee  
nombre_de_trains_programmes                                                                  0.109093             
nombre_de_trains_ayant_circule                                                               0.126640             
nombre_de_trains_annules                                                                    -0.158970             
nombre_de_trains_en_retard_a_l_arrivee                                                      -0.794092             
taux_de_ponctualite                                                                          0.915351             
nombre_de_trains_a_lheure_pour_un_train_en_reta...                                           1.000000             
Covariance and correlation matrices for Picardie
                                                    nombre_de_trains_programmes  \
nombre_de_trains_programmes                                            1.000000   
nombre_de_trains_ayant_circule                                         0.995432   
nombre_de_trains_annules                                              -0.006111   
nombre_de_trains_en_retard_a_l_arrivee                                 0.256222   
taux_de_ponctualite                                                    0.081414   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                     0.060147   

                                                    nombre_de_trains_ayant_circule  \
nombre_de_trains_programmes                                               0.995432   
nombre_de_trains_ayant_circule                                            1.000000   
nombre_de_trains_annules                                                 -0.101556   
nombre_de_trains_en_retard_a_l_arrivee                                    0.227022   
taux_de_ponctualite                                                       0.111942   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                        0.096055   

                                                    nombre_de_trains_annules  \
nombre_de_trains_programmes                                        -0.006111   
nombre_de_trains_ayant_circule                                     -0.101556   
nombre_de_trains_annules                                            1.000000   
nombre_de_trains_en_retard_a_l_arrivee                              0.292014   
taux_de_ponctualite                                                -0.324134   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                 -0.379345   

                                                    nombre_de_trains_en_retard_a_l_arrivee  \
nombre_de_trains_programmes                                                       0.256222   
nombre_de_trains_ayant_circule                                                    0.227022   
nombre_de_trains_annules                                                          0.292014   
nombre_de_trains_en_retard_a_l_arrivee                                            1.000000   
taux_de_ponctualite                                                              -0.938062   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                               -0.901118   

                                                    taux_de_ponctualite  \
nombre_de_trains_programmes                                    0.081414   
nombre_de_trains_ayant_circule                                 0.111942   
nombre_de_trains_annules                                      -0.324134   
nombre_de_trains_en_retard_a_l_arrivee                        -0.938062   
taux_de_ponctualite                                            1.000000   
nombre_de_trains_a_lheure_pour_un_train_en_reta...             0.949267   

                                                    nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee  
nombre_de_trains_programmes                                                                  0.060147             
nombre_de_trains_ayant_circule                                                               0.096055             
nombre_de_trains_annules                                                                    -0.379345             
nombre_de_trains_en_retard_a_l_arrivee                                                      -0.901118             
taux_de_ponctualite                                                                          0.949267             
nombre_de_trains_a_lheure_pour_un_train_en_reta...                                           1.000000             
Covariance and correlation matrices for Poitou Charentes
                                                    nombre_de_trains_programmes  \
nombre_de_trains_programmes                                            1.000000   
nombre_de_trains_ayant_circule                                         0.988696   
nombre_de_trains_annules                                              -0.114548   
nombre_de_trains_en_retard_a_l_arrivee                                 0.193943   
taux_de_ponctualite                                                    0.035218   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                    -0.012480   

                                                    nombre_de_trains_ayant_circule  \
nombre_de_trains_programmes                                               0.988696   
nombre_de_trains_ayant_circule                                            1.000000   
nombre_de_trains_annules                                                 -0.262199   
nombre_de_trains_en_retard_a_l_arrivee                                    0.166887   
taux_de_ponctualite                                                       0.061484   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                        0.022863   

                                                    nombre_de_trains_annules  \
nombre_de_trains_programmes                                        -0.114548   
nombre_de_trains_ayant_circule                                     -0.262199   
nombre_de_trains_annules                                            1.000000   
nombre_de_trains_en_retard_a_l_arrivee                              0.142526   
taux_de_ponctualite                                                -0.180706   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                 -0.231810   

                                                    nombre_de_trains_en_retard_a_l_arrivee  \
nombre_de_trains_programmes                                                       0.193943   
nombre_de_trains_ayant_circule                                                    0.166887   
nombre_de_trains_annules                                                          0.142526   
nombre_de_trains_en_retard_a_l_arrivee                                            1.000000   
taux_de_ponctualite                                                              -0.971437   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                               -0.905672   

                                                    taux_de_ponctualite  \
nombre_de_trains_programmes                                    0.035218   
nombre_de_trains_ayant_circule                                 0.061484   
nombre_de_trains_annules                                      -0.180706   
nombre_de_trains_en_retard_a_l_arrivee                        -0.971437   
taux_de_ponctualite                                            1.000000   
nombre_de_trains_a_lheure_pour_un_train_en_reta...             0.926489   

                                                    nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee  
nombre_de_trains_programmes                                                                 -0.012480             
nombre_de_trains_ayant_circule                                                               0.022863             
nombre_de_trains_annules                                                                    -0.231810             
nombre_de_trains_en_retard_a_l_arrivee                                                      -0.905672             
taux_de_ponctualite                                                                          0.926489             
nombre_de_trains_a_lheure_pour_un_train_en_reta...                                           1.000000             
Covariance and correlation matrices for Provence Alpes Côte d'Azur
                                                    nombre_de_trains_programmes  \
nombre_de_trains_programmes                                            1.000000   
nombre_de_trains_ayant_circule                                         0.972571   
nombre_de_trains_annules                                               0.029286   
nombre_de_trains_en_retard_a_l_arrivee                                 0.535467   
taux_de_ponctualite                                                    0.045407   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                     0.025496   

                                                    nombre_de_trains_ayant_circule  \
nombre_de_trains_programmes                                               0.972571   
nombre_de_trains_ayant_circule                                            1.000000   
nombre_de_trains_annules                                                 -0.204024   
nombre_de_trains_en_retard_a_l_arrivee                                    0.496733   
taux_de_ponctualite                                                       0.112794   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                        0.090360   

                                                    nombre_de_trains_annules  \
nombre_de_trains_programmes                                         0.029286   
nombre_de_trains_ayant_circule                                     -0.204024   
nombre_de_trains_annules                                            1.000000   
nombre_de_trains_en_retard_a_l_arrivee                              0.119016   
taux_de_ponctualite                                                -0.293604   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                 -0.280998   

                                                    nombre_de_trains_en_retard_a_l_arrivee  \
nombre_de_trains_programmes                                                       0.535467   
nombre_de_trains_ayant_circule                                                    0.496733   
nombre_de_trains_annules                                                          0.119016   
nombre_de_trains_en_retard_a_l_arrivee                                            1.000000   
taux_de_ponctualite                                                              -0.798956   
nombre_de_trains_a_lheure_pour_un_train_en_reta...                               -0.777194   

                                                    taux_de_ponctualite  \
nombre_de_trains_programmes                                    0.045407   
nombre_de_trains_ayant_circule                                 0.112794   
nombre_de_trains_annules                                      -0.293604   
nombre_de_trains_en_retard_a_l_arrivee                        -0.798956   
taux_de_ponctualite                                            1.000000   
nombre_de_trains_a_lheure_pour_un_train_en_reta...             0.948312   

                                                    nombre_de_trains_a_lheure_pour_un_train_en_retard_a_larrivee  
nombre_de_trains_programmes                                                                  0.025496             
nombre_de_trains_ayant_circule                                                               0.090360             
nombre_de_trains_annules                                                                    -0.280998             
nombre_de_trains_en_retard_a_l_arrivee                                                      -0.777194             
taux_de_ponctualite                                                                          0.948312             
nombre_de_trains_a_lheure_pour_un_train_en_reta...                                           1.000000             

For all regions, we see that there are two groups representated : the number of trains scheduled, the number of trains that ran, the number of cancelled trains and the number of delayed trains have some mid-ranged correlation with each other, oscillating between 0.6 and 0.75; while the punctuality rate and the number of trains on time for one delayed train at arrival have a correlation of 0.5 only. Both groups don't have any strong positive correlation with each other, and they only tend to weak tendances (close to 0.2 at most), or negative values (some reach -0.36 at most). We observe the same tendency with each region analysis, but the values are really different depending on the region studied.

5. Data transformation

We need to select datapoints where the fields that interests us are not equal to 0 as they usually indicates the absence of revelant data. This means that we are basically going to exclude all reported months where there was no data from our model to avoid any problem.

We also reshape our data to have less difficulties during the supervised learning.

In [16]:
ah = test.loc[test.taux_de_ponctualite > 0].sort_values(by='date')
test.describe()
ah.plot(x='nombre_de_trains_programmes', y='nombre_de_trains_en_retard_a_l_arrivee', style='o')  
matplotlib.pyplot.title('Number of scheduled trains vs Number of delayed trains (before reshaping)')  
matplotlib.pyplot.xlabel('Number of scheduled trains')  
matplotlib.pyplot.ylabel('Number of delayed trains')  
matplotlib.pyplot.show()

X = ah['nombre_de_trains_programmes'].values.reshape(-1,1)
y = ah['nombre_de_trains_en_retard_a_l_arrivee'].values.reshape(-1,1)

6. Clustering and/or PCA (Unsupervised learning)

Apply a clustering (e.g. k-means, hierarchical, ...) or PCA algorithm to your dataset.

NOTE: This step can be skipped in case you have a dataset that is suitable for supervised learning!


(Note: If you have a dataset that is suitable for supervised learning but you still want to perform unsupervised learning:
In that case, the dependent variable (class attribute to predict) should be excluded from the data for clustering.
For example, for the iris dataset we would exclude the attribute "species" to see if clustering only based
on the independent variables (i.e. petal and sepal lengths/widths) allows partitioning of the data into
groups that relate to the plant species)

For the unsupervised learning part, we try to apply the k-means clustering to our dataset in order to determine main groups among some of our fields.
Here, we try to predict how many trains are going to be cancelled in average by selecting the group that corresponds the most to our number of scheduled trains.

In order to have an optimal clustering, we first use the elbow method to select a value of k with a low error rate.

In [17]:
df = pd.DataFrame( test.sort_values(by='date'),columns=['nombre_de_trains_programmes','nombre_de_trains_annules'])

ssedata = {}
for k in range(2, 30):
    kmeans = KMeans(n_clusters=k).fit(df)
    centroids = kmeans.cluster_centers_
    ssedata[k] = kmeans.inertia_ 

matplotlib.pyplot.figure()
matplotlib.pyplot.plot(list(ssedata.keys()), list(ssedata.values()))
matplotlib.pyplot.xlabel("Number of cluster")
matplotlib.pyplot.ylabel("SSE")
Out[17]:
Text(0, 0.5, 'SSE')

With this method, we then know that we should select a number of clusters that is superior than 15, because they generate less errors than any number of cluster inferior to this minimal limit. Once we got the good number of cluster, we can compute the k-means graph. Here, we will try with 15.

In [18]:
kmeans = KMeans(n_clusters=15).fit(df)

centroids = kmeans.cluster_centers_
print(centroids)

matplotlib.pyplot.scatter(df['nombre_de_trains_programmes'], df['nombre_de_trains_annules'], c= kmeans.labels_.astype(float), s=50, alpha=0.5)
matplotlib.pyplot.scatter(centroids[:, 0], centroids[:, 1], c='red', s=50)
[[1.09945362e+04 2.06173913e+02]
 [3.07472000e+04 6.10325000e+02]
 [3.68484681e+03 8.71829787e+01]
 [1.76981064e+04 2.70191489e+02]
 [3.94523529e+04 8.00470588e+02]
 [9.26591195e+03 1.81283019e+02]
 [5.63212741e+03 9.74247104e+01]
 [1.52798454e+04 4.79608247e+02]
 [2.53238125e+04 4.18875000e+02]
 [4.41808889e+04 8.61500000e+02]
 [2.76840000e+04 4.77553191e+02]
 [7.92537168e+03 1.40588496e+02]
 [1.93819310e+04 3.64310345e+02]
 [1.30914179e+04 4.03328358e+02]
 [3.65789474e+01 7.89473684e-01]]
Out[18]:
<matplotlib.collections.PathCollection at 0x1cbef3597f0>

According to our final results, if we decide in the future to schedule 9259 trains for a month, there is a probability that there will be 177 trains that will be cancelled on average. Since the data is sprayed, this is not the most reliable estimation available, but it helps us predict global, average outcomes that can help us to verify if the situation for the month currently tested was considerated as normal (within the predictions) or not.

7. Building prediction models (supervised learning)

Depending on your dataset, there is an attribute (which can be categorical (classification task) or numeric (regression task)) that we would like to be able to predict.

(For example, for the iris dataset we had the attribute "species". Using the independent attributes of
sepal and petal lengths and widths, the goal would be to build a prediction model that allows us to predict
the plant species of new incoming plant data.)

Tasks:

  1. Determine which class attribute you want to predict. Then, split your data into training and test set
  2. Use one or more machine learning method(s) to build models based on the training data.
  3. Validate your models by predicting your test data.
  4. Assess your model performances using different metrics (e.g. accuracy, recall, ...).
  5. Optional: Attempt to improve the machine learning process (and thus prediction performance(s)) by implementing feature selection and/or parameter search (for the machine learning algorithm) steps.

    Since the dataset doesn't have a category attribute, we will do a basic linear regression task first, and then try a random forest regression to try multiple models and see which one is the best.
    We want to predict the number of trains that are likely going to be delayed depending on the punctuality rate.

We then create our train and test shapes, apply the linear regression wth the train set, and test it to see if it fits.

In [19]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

print('Training Features Shape:', X_train.shape)
print('Training Labels Shape:', y_train.shape)
print('Testing Features Shape:', X_test.shape)
print('Testing Labels Shape:', y_test.shape)

regressor = LinearRegression()  
regressor.fit(X_train, y_train) #training the algorithm

#To retrieve the intercept:
print(regressor.intercept_)

#For retrieving the slope:
print(regressor.coef_)

y_pred = regressor.predict(X_test)

wow = pd.DataFrame({'Actual': y_test.flatten(), 'Predicted': y_pred.flatten()})
wow
Training Features Shape: (996, 1)
Training Labels Shape: (996, 1)
Testing Features Shape: (427, 1)
Testing Labels Shape: (427, 1)
[41.29517057]
[[0.08272399]]
Out[19]:
Actual Predicted
0 545 715.578397
1 185 290.542546
2 1078 1031.418583
3 1088 1614.788147
4 956 649.647378
5 797 977.730715
6 393 1351.146797
7 327 656.596193
8 3580 2566.941249
9 1186 1612.058255
10 635 389.563160
11 378 587.025320
12 937 1311.356559
13 521 1563.168378
14 639 711.855817
15 663 455.494179
16 806 662.304149
17 740 720.045492
18 311 323.053074
19 220 297.574085
20 240 203.020567
21 1693 1654.330213
22 3162 2271.285716
23 292 405.777062
24 533 1075.345021
25 1278 728.731511
26 334 412.891325
27 486 404.039858
28 500 397.918283
29 799 1612.306427
... ... ...
397 265 458.306794
398 225 575.940305
399 581 567.750630
400 359 487.094742
401 236 513.318246
402 790 639.968672
403 1552 1619.668862
404 740 981.618742
405 1419 903.775470
406 1478 1072.449681
407 597 392.541224
408 274 349.855646
409 322 737.996598
410 2743 3656.085275
411 334 392.541224
412 2322 1340.061783
413 396 506.783051
414 1109 741.884625
415 367 515.303622
416 383 689.520341
417 2542 1358.591956
418 336 554.597516
419 444 317.924187
420 521 541.361678
421 298 663.379561
422 841 1003.788771
423 385 1344.446154
424 883 1125.641206
425 2548 1722.081159
426 767 748.171648

427 rows × 2 columns

In [20]:
matplotlib.pyplot.scatter(X_test, y_test,  color='gray')
matplotlib.pyplot.plot(X_test, y_pred, color='red', linewidth=2)
matplotlib.pyplot.show()

print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))  
print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))  
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
Mean Absolute Error: 340.2254091000616
Mean Squared Error: 259611.72720724894
Root Mean Squared Error: 509.5210763130893

We see with the prediction tables, the graph and the error values that the model gives us outputs that are not perfect for the test set. But, we are actually trying to estimate an attribute that has no strong correlation with the input attribute, so this was to be expected. This model is more adaptated to give us a way to obtain an average number of delayed trains that we can expect depending on the punctuality rate provided on the input, that would act as a way to control if a unnatural situation has occured on that month (by applying an interval around the model prediction where we can still confirm that the value is normal or not).

Thus, we can say that the linear regression does not fit with the real data, but instead focus to trying to provide an average for constraints-free situations.

We then want to apply the random forest regression in order to have another supervised learning technique. The problem is that we obtain a negative accuracy which is synonym of a bad quality regression, so we can't rely on it to obtain accurate results. However, we can still conceive a model that can output realistic values for a somewhat "normal" situation.

In [21]:
# The baseline predictions are the historical averages
baseline_preds = X_test[:, 0]

# Baseline errors, and display average baseline error
baseline_errors = abs(baseline_preds - y_test)

print('Average baseline error: ', round(np.mean(baseline_errors), 2))

# Import the model we are using
rf = RandomForestRegressor(n_estimators = 100, random_state = 42)# Train the model on training data

rf.fit(X_train, y_train.ravel());

# Use the forest's predict method on the test data
predictions = rf.predict(X_test)# Calculate the absolute errors
errors = abs(predictions - y_test)# Print out the mean absolute error (mae)
print('Mean Absolute Error:', round(np.mean(errors), 2))

# Calculate mean absolute percentage error (MAPE)
mape = 100 * (errors / y_test)# Calculate and display accuracy
accuracy = 100 - np.mean(mape)
print('Accuracy:', round(accuracy, 2), '%.')
Average baseline error:  10334.22
Mean Absolute Error: 834.83
Accuracy: -30.55 %.

We output two different trees for our random forest regressor. The biggest one is on tree.dot and tree.png, while a (not so) "smaller" and somewhat more readable tree designed on a 6-level depth is available on small_tree.dot and small_tree.png

In [22]:
# Pull out one tree from the forest
tree = rf.estimators_[5]# Export the image to a dot file

export_graphviz(tree, out_file = 'tree.dot', feature_names = ['nombre_de_trains_programmes'], rounded = True)

# Use dot file to create a graph
(graph, ) = pydot.graph_from_dot_file('tree.dot')

# Write graph to a png file
graph.write_png('tree.png')
display(Image(filename='tree.png'))
In [23]:
# Limit depth of tree to 6 levels
rf_small = RandomForestRegressor(n_estimators=10, max_depth = 6)
rf_small.fit(X_train, y_train.ravel())

# Extract the small tree
tree_small = rf_small.estimators_[5]

# Save the tree as a png image
export_graphviz(tree_small, out_file = 'small_tree.dot', feature_names = ['nombre_de_trains_programmes'], rounded = True, precision = 1)
(graph, ) = pydot.graph_from_dot_file('small_tree.dot')
graph.write_png('small_tree.png')
display(Image(filename='small_tree.png'))

This is where we are supposed to run our model, and to see if our predictions fits the actual data.

In [24]:
predictions_data = pd.DataFrame(data = {'nombre_de_trains_en_retard_a_l_arrivee': predictions})

matplotlib.pyplot.plot(ah['nombre_de_trains_programmes'], ah['nombre_de_trains_en_retard_a_l_arrivee'], 'go', label = 'actual')
matplotlib.pyplot.plot(X_test, predictions_data['nombre_de_trains_en_retard_a_l_arrivee'], 'ro', label = 'prediction')
matplotlib.pyplot.xticks(rotation = '60'); 
matplotlib.pyplot.legend()
Out[24]:
<matplotlib.legend.Legend at 0x1cbfb350cf8>

We see that our predictions are actually pretty good compared to the disparity of the data, and that we can actually make some average predictions to simulate a normal behaviour. I was expecting this model to output values that would be completly out of place and totally unusable, but it turned out better than expected.

In conclusion, we now have a model that, for any number of scheduled trains inputted, is able to predict the number of trains that will be delayed at the final stop while following the same pattern than for the actual data. This gives more realistic values, but which are less likely to be accurate.
But since the accuracy is irrevelant on data based from results mesured on the real world that depends on countless factors, we can still take this model into consideration.

To sum up this part on supervised learning, we obtained two models that each have their own way of running (linear and random forest), and outputs different values for each data inputted. In order to have more accurate predictions, an idea of improvement would be to merge the results given by both models, which would give probably better results. For the time being, the random forest seems closer to the real life situation anticipation, while the linear model is only considerating a situation where there is no external constraint.