
Power demand prediction of a vehicle on a non-fixed route
Jonas Buyl

Promoters: prof. dr. ir. Benjamin Schrauwen, dr. ir. David Verstraeten
Supervisors: Pieter Buteneers, Tim Waegeman
Master's thesis submitted to obtain the academic degree of Master of Engineering: Computer Science

Department of Electronics and Information Systems
Chair: prof. dr. ir. Jan Van Campenhout
Faculty of Engineering and Architecture
Academic year 2011-2012


Power demand prediction of a vehicle on a non-fixed route


Jonas Buyl
Supervisors: prof. dr. ir. Benjamin Schrauwen, dr. ir. David Verstraeten, ir. Pieter Buteneers, ir. Tim Waegeman

Abstract: In this article several approaches are presented to predict the future power demand and speed of a car, as well as the chance of stopping within the next 200 m. We introduce a time series prediction model based on Reservoir Computing, a novel technique for training recurrent neural networks. The model is improved by using information from previous trips and by post-processing the predicted output window. Furthermore, we present an RC-based classifier to predict the chance that the car stops within the next 200 m. This classifier is used to split the model into one model trained on fast data and one trained on intervals where the car stops.
Index Terms: Reservoir Computing, vehicle behavior prediction, stop prediction, road graph, electric vehicles, time series prediction

I. INTRODUCTION

ELECTRIC vehicles (EVs) are increasingly commercially viable, but sales figures remain fairly disappointing, often because of the high price. The battery has been the main way of storing energy in EVs because of its large power-to-weight ratio, but batteries are not as capable as capacitors of handling peaks in power demand. New research proposes using supercapacitors (capacitors with a much greater energy density than ordinary capacitors) in electric vehicles to replace batteries. The ChargeCar project [1] suggests combining the advantages of both, so that cheaper batteries and supercapacitors can be used to reduce the manufacturing costs of EVs. The capacitor is used as a buffer to handle the high spikes in power demand. This extends battery lifetime, increases efficiency in cold weather and can even extend the range of the EV. To direct the energy flows between battery, capacitor and engine, a controller is needed.

In this article we introduce several approaches to predict vehicle behavior and upcoming stops. These predictions can then be used to improve an intelligent controller. We make use of Reservoir Computing (RC), a novel way of training recurrent neural networks [3]. Instead of training all internal weights, only the output weights are trained. The weights of the input and the internal connections are generated randomly and remain constant.

First, we modify a GPS map generation algorithm presented by L. Cao and J. Krumm [2] to keep information about the car's current power demand, speed, acceleration, etc. This information is then used as extra input for time series prediction of the power demand, speed and acceleration profiles using Reservoir Computing [4]. Additionally, we observed that the weight of the predicted output vs. the information from previous trips decreases as the range of the prediction increases. Therefore the predicted output windows are post-processed using a simple linear model. Furthermore, we introduce an RC-based classifier model to predict if the car stops within 200 m. The classifier is then used to separate the time series prediction model for situations where the car stops within 200 m.

II. VEHICLE BEHAVIOR PREDICTION

A. Pre-processing

After building the road graph data structure defined by Cao et al. [2], the complete dataset is mapped on the road segments. To better align the trip data, it was interpolated every meter, converting the data to a distance scale.

B. Single RC time series prediction model (RCLA)

The speed, acceleration and power profiles are used as input in separate systems with a reservoir of 150 neurons each. The neuron output weights are trained using ridge regression. Each predicted value is sent back in an output feedback loop to recursively predict the rest of the sequence. The reservoir state outputs at each step t are extended with the averages of the information of previous trips at each step t, to predict the next output value y(t).

C. Output window post-processing (OWPP)

The training process of the time series consists of only predicting the next step ahead. We observed that the influence of the predicted output vs. the information from previous trips decreases as the range of the prediction increases. The output window is therefore post-processed by applying linear regression at each time step t individually, combining the predicted values with the average values at point t.

D. Stop prediction

A reservoir was used with a logistic regression readout to classify a sample t as a point where the car stops within 200 m. First, the current power demand, acceleration and speed at t are used as input of the reservoir. Secondly, the average acceleration of previous trips at t + 20 is used, excluding trips where the car does not stop within 200 m of t + 20. Lastly, the average chance to stop within 200 m of t + 20 is used.

For the evaluation of this classifier, the area under the ROC curve (AUC) is maximized. The true positive rate and false positive rate are calculated for every threshold that can separate the classes, when classifying the output of the reservoir readout, which lies between [0, 1]. A maximum average AUC of 0.955 was found and 94.5% of the samples were correctly classified, tested on a dataset in which 10% of the samples are actual stops. From Figure 1 we can see that the predicted chance to stop is usually high at points where the car stops. Around places where the car brakes but doesn't stop, the output can be high as well. This could be interpreted as an error, but this output may still be useful for some applications.

E. Split model time series prediction (RCSP)

The RCLA model was separated by training and optimizing one model on a dataset of intervals where the car stops, and one model on the other intervals. The stop classifier is then used to determine which model should be used for the time series prediction. Finally, the OWPP filter was separated as well and applied to the RCSP model to further improve the results.

III. EVALUATION

The proposed RC-based models were compared with a number of linear methods. The best performing methods made use of a time delay window (TDW): a weighted average of the previous values, trained using linear regression. A second model extends the TDW model by including a weighted average of the information of previous trips (TDWAtdw). Trips of one driver were used from the dataset supplied by the ChargeCar project. A random subset of 2,230,500 samples was chosen and divided into 9566 intervals to predict. Of these data, 25% was used for training, another 25% for validation, and the remaining 50% was used to compare the models. The results of the RC models are averages taken over 10 reservoir instances.

The results of all discussed models are given in Table I. The first RC-based model, RCLA, does not yield much better results than the linear methods. However, after output window post-processing the Root Mean Squared Error (RMSE) can be decreased significantly. The RCSP model predicts the speed better than the other models, and when extended with the OWPP filter, the model outperforms every other tested model in predicting the power demand, acceleration and speed profiles. In Figure 2 the absolute deviation is given over the predicted distance. The OWPP improves the result towards the end of predictions, whereas the RCSP model improves the result at the start of predictions.

TABLE I: AVERAGE RMSE ERROR RATES (AND STANDARD DEVIATION)

RMSE (STD)   Power (W)      Speed (m/s)     Acceleration (m/s^2)
TDW          8766           1.481           0.3136
TDWAtdw      8401           1.502           0.3111
RCLA         8416 (4.47)    1.423 (0.018)   0.3130 (0.0005)
RCLA/OWPP    8386 (3.67)    1.311 (0.012)   0.3081 (0.0002)
RCSP         8367 (25.62)   1.304 (0.017)   0.3023 (0.0005)
RCSP/OWPP    8257 (12.48)   1.257 (0.016)   0.2992 (0.0006)

Fig. 1. An example of the output of the RC stop prediction model. The green areas are the target areas where the car stops within 200 m. At the bottom, the speed profile of the trip is given for comparison with the actual car behavior. The chosen threshold line is shown as a grey dashed line.

Fig. 2. Average absolute deviation over the predicted distance, for the power (kW), speed (km/h) and acceleration (m/s^2) profiles of the models TDW, TDWAtdw, RCLA, RCLA/OWPP, RCSP and RCSP/OWPP.

IV. CONCLUSION

It is possible to use data from previous trips and Reservoir Computing to predict the future power demand, speed and acceleration profiles. Using a classifier to predict if a stop is imminent significantly improves the results. Post-processing the predicted output interval further boosts the performance. The average absolute deviation of the predicted speed 200 m ahead is 6 km/h. Both the predicted profiles and the stop predictor could be used for an intelligent vehicle energy management controller.

REFERENCES
[1] H. Benjamin Brown, Illah Nourbakhsh, Christopher Bartley, Jennifer Cross, Paul S. Dille, Joshua Schapiro, and Alexander Styler. ChargeCar community conversions: Practical, custom electric vehicles now! Technical Report CMU-RI-TR-, March 2012.
[2] Lili Cao and John Krumm. From GPS traces to a routable road map. In Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS '09, pages 3-12, New York, NY, USA, 2009. ACM.
[3] David Verstraeten, Benjamin Schrauwen, Michiel D'Haene, and Dirk Stroobandt. An experimental unification of reservoir computing methods. Neural Networks, 20(3):391-403, April 2007.
[4] Francis wyffels, Benjamin Schrauwen, and Dirk Stroobandt. Stable output feedback in reservoir computing using ridge regression. In V. Kurkova, R. Neruda, and J. Koutnik, editors, Proceedings of the 18th International Conference on Artificial Neural Networks, pages 808-817, Prague, September 2008. Springer.

Power demand prediction of a vehicle on a non-fixed route


Jonas Buyl
Promoters: prof. dr. ir. Benjamin Schrauwen, dr. ir. David Verstraeten
Supervisors: ir. Pieter Buteneers, ir. Tim Waegeman

Abstract: In this article we present several approaches to predict the power demand and speed of a vehicle, as well as the chance of stopping within 200 m. We introduce a time series prediction model based on Reservoir Computing, a novel technique for training recurrent neural networks. The model was further improved with information from previous trips and with post-processing of the prediction window. Furthermore, we use an RC classification model that predicts the chance of stopping, in order to split the earlier model by training one model on fast data and another on intervals where the car stops. The stop predictor then determines which model should be used to predict the next interval.
Keywords: Reservoir Computing, vehicle behavior prediction, stop prediction, road graph, electric vehicles, time series prediction

I. INTRODUCTION

ELECTRIC vehicles (EVs) are becoming more and more commercially attractive, but sales figures remain disappointing, often because of the high cost of the battery. The battery is the most commonly used way of storing energy in EVs, but batteries are not as capable as capacitors of absorbing large peaks in power demand. New research proposes using supercapacitors (capacitors with a much greater energy density) in EVs instead of batteries. The ChargeCar project [1] proposes to combine the advantages of both, so that cheaper batteries and supercapacitors can be used. The capacitor is then used as a buffer against high peaks in power demand. This improves the lifetime of the batteries and the efficiency in cold weather, and can even extend the range of the EV. To direct the energy flows between battery, capacitor and engine, a controller is needed.

In this article we introduce several approaches to predict the behavior of a vehicle and upcoming stops. These predictions can then be used to improve an intelligent controller. We make use of Reservoir Computing (RC), a fairly new technique for training recurrent neural networks [3]. Instead of training all internal weights, only the output weights are trained. The remaining connections stay constant and are generated randomly.

First, we adapt an automatic algorithm for generating GPS maps [2] to keep information about the car (such as its current power demand, speed, etc.). This information can then be used to improve the models. Additionally, we saw that the weight of the predicted value relative to the information from previous trips decreases with the predicted distance. Therefore the predicted window is post-processed with a simple linear model. Furthermore, we introduce an RC classification model to predict whether the car stops within 200 m. This is then used to split the time series prediction models according to whether the car stops or not.

II. VEHICLE BEHAVIOR PREDICTION

A. Pre-processing

After building the road graph defined by Cao et al. [2], the complete dataset was mapped onto the road segments. To align the data better, the trips were interpolated every meter so that the data is on a distance scale.

B. Time series prediction with a single RC model (RCLA)

The speed, the acceleration and the power demand are used as input in separate reservoirs of 150 neurons each. The output weights are trained with ridge regression [4]. Each predicted value is fed back to recursively predict the rest of the sequence. The reservoir output at each step t is extended with the information from previous trips at each step t to predict the next output y(t).

C. Output window post-processing (OWPP)

The time series models are only trained to predict the next step. We noticed that the influence of the predicted output relative to the information from previous trips decreases as the prediction distance grows. The output window is therefore post-processed by applying linear regression at each step t separately, combining the predicted value with the average value at point t.

D. Stop prediction

A reservoir was combined with a logistic regression readout to classify a point t as a point where the car stops within 200 meters, with the following inputs: the current power demand, speed and acceleration, the average acceleration of previous trips at point t + 20, and lastly the average chance of stopping within 200 meters of point t + 20. For the evaluation of this model, the area under the ROC curve (AUC) was maximized. The true positive and false positive rates are calculated for every threshold that can separate the classes in the output of the reservoir readout, which lies in [0, 1]. We found a maximum average AUC of 0.955. After minimizing the fraction of misclassifications, a threshold was found that classifies 94.5% of the points correctly. From Figure 1 we can see that the predicted chance to stop is usually high where the car stops. Around places where the car brakes but does not stop, the output can be high as well. This counts as an error, but the output may still be useful for other applications.

E. Split time series prediction (RCSP)

The RCLA model was split by separately training and optimizing one model on a dataset of intervals where the car stops, and another model on the other intervals. The stop predictor is then used to determine which model's prediction should be used. Finally, the OWPP filter is also trained separately and applied to this model to improve the results further.

III. EVALUATION

The RC-based models were compared with a few linear methods. The best models make use of a time delay window (TDW): a weighted average of the previous values, trained with linear regression. A second model extends the TDW model with a weighted average of the information from the road graph (TDWAtdw). The trips of one driver from the ChargeCar project were used. A random subset of 2,230,500 samples was chosen and divided into 9566 intervals to predict. Of these data, 25% was used for training, another 25% for validation, and the remaining 50% was used to compare the models. The results of the RC models are the averages over 10 reservoirs.

The results of all models can be found in Table I. The first RC-based model, RCLA, offers no large improvement over the linear methods, but the OWPP filter can strongly improve the root mean squared error (RMSE). The RCSP model predicts the speed better than the other models, and extended with an OWPP filter it performs better than any other model. In Figure 2 the absolute deviation is given over the predicted distance. The OWPP improves the result at the end of the predictions, whereas the RCSP model improves the results at the beginning.

Table I: AVERAGE RMSE (AND STANDARD DEVIATION)

RMSE (STD)   Power (W)      Speed (m/s)     Acceleration (m/s^2)
TDW          8766           1.481           0.3136
TDWAtdw      8401           1.502           0.3111
RCLA         8416 (4.47)    1.423 (0.018)   0.3130 (0.0005)
RCLA/OWPP    8386 (3.67)    1.311 (0.012)   0.3081 (0.0002)
RCSP         8367 (25.62)   1.304 (0.017)   0.3023 (0.0005)
RCSP/OWPP    8257 (12.48)   1.257 (0.016)   0.2992 (0.0006)

Figure 1. An example of the output of the RC stop prediction model. The green areas are the target areas where the car stops within 200 m. At the bottom, the speed profile of the trip is given for comparison with the actual behavior. The chosen threshold is shown as the grey dashed line.

Figure 2. Average absolute deviation over the predicted distance, for the power (kW), speed (km/h) and acceleration (m/s^2) profiles of the models TDW, TDWAtdw, RCLA, RCLA/OWPP, RCSP and RCSP/OWPP.

IV. CONCLUSION

It is possible to improve the prediction of the speed, power demand and acceleration by using information from previous trips and Reservoir Computing. With a stop predictor this model can be improved further. Post-processing of the output window improves the performance even more. The average absolute deviation of the prediction at meter 200 is 6 km/h. Moreover, the predicted profiles and the stop predictor can be used together in an intelligent controller to direct the energy in an EV.

REFERENCES
[1] H. Benjamin Brown, Illah Nourbakhsh, Christopher Bartley, Jennifer Cross, Paul S. Dille, Joshua Schapiro, and Alexander Styler. ChargeCar community conversions: Practical, custom electric vehicles now! Technical Report CMU-RI-TR-, March 2012.
[2] Lili Cao and John Krumm. From GPS traces to a routable road map. In Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS '09, pages 3-12, New York, NY, USA, 2009. ACM.
[3] David Verstraeten, Benjamin Schrauwen, Michiel D'Haene, and Dirk Stroobandt. An experimental unification of reservoir computing methods. Neural Networks, 20(3):391-403, April 2007.
[4] Francis wyffels, Benjamin Schrauwen, and Dirk Stroobandt. Stable output feedback in reservoir computing using ridge regression. In V. Kurkova, R. Neruda, and J. Koutnik, editors, Proceedings of the 18th International Conference on Artificial Neural Networks, pages 808-817, Prague, September 2008. Springer.

Permission for consultation - Copyright


The author gives permission to make this master dissertation available for consultation and to copy parts of this master dissertation for personal use. In the case of any other use, the limitations of the copyright have to be respected, in particular with regard to the obligation to state expressly the source when quoting results from this master dissertation.

Jonas Buyl

June 10, 2012

Acknowledgments
First, I would like to thank my promoters prof. dr. ir. Benjamin Schrauwen and dr. ir. David Verstraeten for their advice and for making this research possible. I would also like to thank my supervisor Pieter Buteneers for his guidance and for his patience in letting me work and discover at my own pace. On a personal level I owe much gratitude to friends and family for their support. I am especially grateful to Sara for her understanding and patience. Lastly, I thank my parents for giving me the means to study.


Power demand prediction of a vehicle on a non-fixed route
by Jonas Buyl
Thesis submitted in partial fulfillment of a Master's degree in Engineering: Computer Science
Academic year: 2011-2012
Universiteit Gent
Faculty of Engineering

Promoters: prof. dr. ir. Benjamin Schrauwen, dr. ir. David Verstraeten Supervisor: ir. Pieter Buteneers

Summary

In this thesis several approaches are presented to predict the future power demand and speed of a car, as well as other upcoming events that affect this demand. First, a road graph data structure for automatic GPS map generation is adapted to capture local vehicle behavior information. The average local vehicle behavior is then used as extra information for the time series prediction of the power demand, speed and acceleration using Reservoir Computing, a novel technique for training recurrent neural networks. The predicted output window is post-processed using a simple linear technique. Thirdly, another system is presented that uses the current acceleration profile as well as the information in the road graph to predict the chance that a car is going to stop within the next 200 m. Finally, two separate time series prediction models are trained: one for when the car stops over the next 200 m, and one for when it does not. For each prediction, the model used is then determined by the stop prediction model.

Keywords: reservoir computing, vehicle behavior prediction, road graph, electric vehicles


Contents
1 Introduction
  1.1 A battery-capacitor hybrid setup
    1.1.1 Battery vs. supercapacitor
    1.1.2 Controller
  1.2 Problem statement
  1.3 Content and structure
2 Reservoir Computing
  2.1 Introduction to neural networks
  2.2 The Reservoir Computing approach
  2.3 Training
    2.3.1 Linear regression
    2.3.2 Logistic regression
  2.4 Regularization
  2.5 Parameters
    2.5.1 Input scaling
    2.5.2 Spectral radius
    2.5.3 Leak rate
    2.5.4 Bias scaling
    2.5.5 Reservoir size
  2.6 Time series prediction
    2.6.1 Time series prediction using Reservoir Computing
  2.7 Classification
3 Data analysis
  3.1 A road graph
  3.2 Extracting useful information
  3.3 Error measures
    3.3.1 Defining a prediction distance
    3.3.2 Root Mean Square Error (RMSE)
    3.3.3 Kurtosis Difference
    3.3.4 Receiver Operating Characteristic (ROC)
4 Time series prediction of vehicle power, speed and acceleration
  4.1 Evaluation methodology
  4.2 Baseline models
    4.2.1 System setups
    4.2.2 Results
  4.3 A prediction system with Reservoir Computing
    4.3.1 System setups
    4.3.2 Parameter optimization
    4.3.3 Results
  4.4 Output window post-processing
5 Stop prediction
  5.1 Predicting the chance to stop using RC
  5.2 Evaluation
6 Splitting the system for stopping and driving behavior
  6.1 The model setup
  6.2 Evaluation
    6.2.1 RCSP Results
7 Conclusion
A Extra tables
  A.1 Kurtosis difference results
  A.2 Model parameters
B Extra figures
  B.1 Stop prediction model examples
  B.2 Time series prediction examples

Chapter 1

Introduction
Electric cars or vehicles (EVs) are increasingly commercially viable, but seem to be held back by a number of problems. Not only are there a lot of myths around EVs, but some real issues remain. One of the myths, for example, concerns battery life expectancy, an issue that has largely been solved. Nissan even announced an eight-year warranty on the batteries of its electric model, the LEAF (figure 1.1). People usually drive a car for longer than 8 years, so some money will be spent on battery replacements, but these maintenance costs are lower than the more frequent repairs needed in a regular gas-powered car [9]. The LEAF is not as successful as anticipated, however, because the price is still too high for people to switch. Often a third of the total price is spent solely on the expensive lithium batteries. Prices are expected to drop through mass production, and governments all over the world give substantial incentives. A radical change may be necessary, however, to make it interesting for consumers to buy an EV for shorter distances while keeping the regular car for long distances. One idea could be to mitigate the disadvantages of cheaper batteries (such as a short life expectancy) in another, cheaper way, so we could bring the price down.

1.1 A battery-capacitor hybrid setup

The ChargeCar project [1] is committed to finding new ways to bring down the costs for EVs. One of their ideas is to exploit the advantages of both batteries and capacitors.

1.1.1 Battery vs. supercapacitor

The battery has been the main way of storing energy in EVs because of its large power-to-weight ratio. This simply means it allows the car to go further without adding a lot of weight.

Figure 1.1: The Nissan LEAF full electric vehicle

The downside, however, is that batteries are generally more inefficient when coping with large spikes in power demand. Not only do fast charges and discharges decrease battery life expectancy, they also decrease the capacity of batteries. This is especially true for lead-acid batteries, due to the chemical and structural changes in the interface under high load. These increase resistance and therefore decrease capacity. Lithium-based batteries, on the other hand, have been shown to lose capacity because of the higher temperatures caused by high power load [11].

Capacitors, like batteries, are electrical components used to store energy. They consist of two metal plates separated by a thin isolation layer. Electrons can then be transferred from one plate to another, charging and discharging the capacitor. One advantage over batteries is that they show little degradation even after several hundreds of thousands of charge cycles. They're especially more proficient at handling large power demand peaks than batteries. The energy density of capacitors is a lot lower than that of batteries, however. Supercapacitors, on the other hand, have a much greater energy density than ordinary capacitors [24]. The amount of energy that can be stored in a capacitor increases with the surface area of the metal plates. In supercapacitors, the plates are coated with a carbon layer, etched to produce many holes that extend through the material, much like a sponge. This increases the interior surface area by many orders of magnitude, greatly increasing the energy density (> 100,000 times).
Note: supercapacitors are also referred to as electric double-layer capacitors (EDLC), ultracapacitors, etc.

1.1.2 Controller

The solution presented by ChargeCar is to exploit the advantages of both a battery pack and a supercapacitor. The capacitor is used for high spikes in power demand, and to save energy generated while braking (through regenerative braking). When the capacitor is empty, the battery is used to supply power. When generating power while the capacitor is already full, the battery is charged. The supercapacitor effectively works as a buffer between engine and battery, relieving the battery. Using both systems together then allows car manufacturers to use cheaper, more cost-effective components. Furthermore, capacitors function well in temperatures as low as -40 °C, when batteries are at their worst. To direct the energy flows between battery, capacitor and engine, a controller is needed, as shown in figure 1.2.

Figure 1.2: A controller guides power flows between battery, capacitor and engine

When accelerating, as much power as possible should be drained from the capacitor to handle the high energy demand of accelerating. When braking, all energy generated by regenerative braking should be saved in the capacitor. After accelerating and keeping a constant speed, the capacitor will be nearly empty, which means the battery will need to be used. Ideally this is the only time the battery is used. Now consider the following situation: the car is approaching an intersection but only stops there sometimes. It could be useful in this situation to make sure the capacitor is completely filled in case the car does slow down. Another example: the car is driving steadily at 70 km/h but wants to overtake another car. The capacitor should have some energy left to handle the short power burst, which means it's desirable to slowly transfer some energy from the battery to the capacitor when the capacitor is almost empty, to handle any possible peak. Finding an optimal controller, then, is a complex problem. To minimize battery usage it could be beneficial to try to predict vehicle behavior and upcoming driving environments. An intelligent controller could then use these predictions to optimize capacitor usage.

1.2 Problem statement

In this thesis we will investigate to what extent it is possible to predict the future power demand and speed, as well as other upcoming events that affect this demand. These predictions could then be useful for the controller described above. Previous research on this subject includes the prediction of power demand for hybrid vehicles on a fixed route by Bartholomaeus et al. [4] and Johannesson et al. [18], but these make use of a fixed route where the data set is perfectly aligned, and they predict vehicle behavior assuming the vehicle drives along the same route. Reality, however, is much more complex. To truly investigate the possibilities of prediction in real-life situations we do not make the assumptions made by Bartholomaeus and Johannesson, which makes it less relevant to compare results. In this work predictions are made without assuming the vehicle is on a fixed route. The models presented here can easily be adapted to work under real-life circumstances where vehicles are driving on a non-fixed route and data is collected while driving.

1.3 Content and structure

To do this, we first gather information of previous trips in a single data structure in Chapter 3. We then try several approaches using Reservoir Computing and other machine learning techniques, explained in Chapter 2. One approach is to use these techniques for time series prediction of power demand and other factors that it depends on (e.g. speed), which we discuss in Chapter 4. In Chapter 5 we try to calculate the chance of stopping within a short distance. Finally, in Chapter 6, we use the stop predictor from Chapter 5 and determine if it can improve the prediction models presented in Chapter 4.

Chapter 2

Reservoir Computing
When problems become so complex that they can't be solved efficiently by ordinary algorithms, a near-optimal solution can sometimes be found more efficiently using machine learning techniques. This usually comes down to a model that is trained to capture underlying characteristics of data. These can then be used to predict the output of new input data.

2.1 Introduction to neural networks

A neural network is a model based on the biological structure of the brain and consists of several interconnected neurons. Each neuron has input and output connections that connect it to the rest of the network. The output is calculated by taking a weighted sum of those input connections, usually transformed by a non-linear activation function (for the rest of this work the hyperbolic tangent tanh is used). When there are no recurrent connections or cycles in the network, the network is called a feed-forward neural network (FFNN). If there are cycles in the network, it is called a recurrent neural network (RNN). A neural network is trained by adjusting the weights according to the error between target and predicted output. If the output depends on long chains of neurons, the adjustments can become so small that they can't be calculated anymore. In RNNs the output depends on infinite chains of neurons, which makes this type of neural network very hard to train. Algorithms like back-propagation-through-time [30] are able to solve the problem, but the algorithm is very complex and takes a long time to calculate.

2.2 The Reservoir Computing approach

Reservoir Computing (RC) is a fairly new approach to training recurrent neural networks [28]. It's a unifying term for several similar methods discovered independently, the most important ones being Echo State Networks [16] and Liquid State Machines [21]. The idea is to never train the network itself but to only train the weight of each neuron to the output: a readout function. All other weights, such as the input connections and internal connections, are fixed and initialized randomly, but can be scaled and tuned (see section 2.5 on reservoir parameters).

To understand the dynamics of reservoirs, consider the following analogy, which is usually given to explain Liquid State Machines. It does not capture the whole picture of Reservoir Computing, but it gives an idea of what happens to the state of the reservoir when affected by an external input. Imagine the hidden layers of the reservoir network as a real reservoir of liquid. We would like a warning system that warns us when someone throws a large object into the liquid (the input). A single throw will generate ripples in the reservoir, converting that input to a spatio-temporal pattern along the surface of the reservoir. To detect this pattern we place floating sensors in the reservoir, which are evidently connected through the liquid. The state of the reservoir can then be read from the sensor values at a specific point in time. This analogy makes it clear that certain parameters can heavily influence the reservoir dynamics: the number of sensors, the size of the thrown object (the input), the way the connecting surface behaves when an object falls in the water, etc. They are further discussed in section 2.5 in the context of Reservoir Computing specifically.

In general, reservoirs are used to give a high-dimensional dynamic representation of the input, called the state of the reservoir. Because the neurons are interconnected, they possess a memory which depends largely on the scaling of the internal connections. Extra memory can also be introduced for every neuron individually by retaining a part of the previous neuron output value. To work properly, the reservoir needs to satisfy the Echo State Property [16, 15]: the reservoir needs to wash out any information from initial conditions. In practice, a reservoir network consists of the following:


Figure 2.1: A schematic representation of a typical reservoir network. Solid arrow lines are not trained. Dashed arrow lines are trained connections.

u[k]: the reservoir input vector at time step k
x[k]: the reservoir state at time step k
y[k]: the output vector at time step k

A schematic representation is given in Figure 2.1. The state of the reservoir x[k], which retains (1 - λ) of the previous state x[k-1] at each time step k, is given by:

x[k] = (1 - λ) x[k-1] + f(W_r x[k-1] + W_i u[k] + W_b)

The weights of the internal connections W_r are initialized with random values from the normal distribution, but scaled so that the largest absolute eigenvalue of the random matrix is equal to a given parameter value: the spectral radius (see subsection 2.5.2). The input weights W_i are initialized randomly as well, but are rescaled by the input scaling parameter (subsection 2.5.1). A bias is sometimes added to the input with scaling W_b (see subsection 2.5.4). The output is calculated by:

y[k] = W_r^o x[k] + W_i^o u[k] + W_b^o

The output weights W_r^o (reservoir to output), W_i^o (input to output) and W_b^o (output bias) need to be trained. They are the dashed connection arrows in Figure 2.1.
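To make the state update concrete, the following minimal numpy sketch implements the equation above with f = tanh. The 150-neuron reservoir size matches the extended abstract; the leak rate and scaling factors are illustrative assumptions, not values from this thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 3, 150                         # input dimension and reservoir size

# Fixed, randomly initialized weights (their scaling is discussed in section 2.5)
W_i = 0.1 * rng.normal(size=(n_res, n_in))   # input weights (assumed scaling)
W_r = rng.normal(size=(n_res, n_res))        # internal weights, rescaled later
W_b = 0.1 * rng.normal(size=n_res)           # bias weights (assumed scaling)
leak = 0.5                                   # leak rate lambda (assumed value)

def update(x, u):
    """One state update: x[k] = (1 - lambda) x[k-1]
    + tanh(W_r x[k-1] + W_i u[k] + W_b)."""
    return (1 - leak) * x + np.tanh(W_r @ x + W_i @ u + W_b)
```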

2.3 Training

The main advantage of training recurrent neural networks using Reservoir Computing is that only the weights W_r^o from the reservoir neurons to the output need to be trained. An additional linear connection straight from input to output with weight W_i^o is sometimes added, as well as a constant value (or bias) with weight W_b^o.

This training approach not only reduces the time required for training, but also allows a wider variety of training methods to train the output weights. For this work, two different training methods are used:

2.3.1 Linear regression

The first step for training the weights using linear regression consists of letting the reservoir run over all the samples and keeping the reservoir states at every time step k in a matrix A. Suppose we want to train the weights W using simple linear regression; then we want to find the least squares solution for the desired output y and the predicted output ŷ:

W_opt = argmin_W ‖A W - y‖²

There exists a closed-form solution:

W_opt = (AᵀA)⁻¹ Aᵀ y

Although A is large (n_samples × n_neurons), it is still possible to calculate the output weights relatively fast compared to other RNN training techniques such as BPTT.
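As a minimal illustration (not code from the thesis), the closed-form solution can be computed with numpy; a least-squares solver is mathematically equivalent to the explicit inverse but numerically safer.

```python
import numpy as np

def train_readout(A, y):
    """Train readout weights from collected reservoir states.

    A: (n_samples, n_neurons) matrix of reservoir states
    y: (n_samples,) desired outputs
    Solves W_opt = (A^T A)^-1 A^T y via least squares.
    """
    W, *_ = np.linalg.lstsq(A, y, rcond=None)
    return W
```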

2.3.2 Logistic regression

Logistic regression is a classification method that models the probability of an input sample x belonging to a certain class. In contrast to other probabilistic models, logistic regression uses a discriminative approach, which classifies the inputs directly with the following probability:

p(C1|x) = f(x) = σ(wᵀx) = 1 / (1 + exp(-wᵀx - w₀))

and p(C2|x) = 1 - p(C1|x) when solving a binary classification problem.


Figure 2.2: An example of classification using the logistic function. The shaded area is the overlap area between decision spaces. The red line is the suggested hard threshold.

These distributions are visualized in Figure 2.2. When the distributions are not linearly separable¹ there is an overlap area, which means a hard threshold needs to be defined. This is often the point at which both probability functions intersect, but other thresholds can be chosen if the misclassification costs differ between the two classes. The weights of the logistic regression model are found by minimizing the cross-entropy function:

E(w) = -ln p(t|w) = -Σ_{n=1}^{N} [ t_n ln y_n + (1 - t_n) ln(1 - y_n) ]

where t_n ∈ {0, 1} equals 1 if the input sample belongs to class C1, and y_n is the predicted output of the model. Note that correctly classified samples that lie far from the decision line do not get penalized. In ridge regression, however, samples that are correctly classified but lie far from the target output do get penalized. This is further illustrated in Figure 2.3. The solution of the minimization does not have a closed form, but it is a convex problem² so we can find it through gradient descent³.
1. Two sets of points in two dimensions are linearly separable if they can be separated by a single straight line.
2. A convex problem is a problem that has a unique minimum.
3. Gradient descent is an optimization algorithm that finds the minimum error by taking steps along the negative of the gradient (or derivative) of the error function.

Figure 2.3: The error measure E(z) for the mean squared error of target and model output (used e.g. in classification using linear regression) and the cross-entropy function used in logistic regression. In z = y·t, y is the model output and t the target output of the model. For t = 1, a model output y = 2 is penalized more by the mean squared error than a model output of y = 1, although both are classified correctly. For the cross-entropy function this is not true, which could make it more suitable for classification.

There exists a gradient descent approach based on the Newton-Raphson iterative optimization scheme, called iteratively reweighted least squares (IRLS) [23]. The weights are updated each iteration by subtracting the derivative of the error function divided by the second derivative. The derivation and specifics of this algorithm are not important for this work, but each iteration consists of the following basic steps:

y = σ(X w^(τ))
R_nn = y_n (1 - y_n)
z = X w^(τ) - R⁻¹ (y - t)
w^(τ+1) = (Xᵀ R X)⁻¹ Xᵀ R z
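A direct transcription of these steps into numpy might look as follows; this is an illustrative sketch, with a small guard against division by zero that the text does not mention.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def irls(X, t, n_iter=20):
    """Iteratively reweighted least squares for logistic regression.

    X: (n, d) input matrix (bias column included), t: (n,) binary targets.
    Follows the update steps listed above."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        y = sigmoid(X @ w)
        R = y * (1.0 - y)                      # diagonal entries R_nn
        z = X @ w - (y - t) / np.maximum(R, 1e-12)
        XtR = X.T * R                          # X^T R without forming R explicitly
        w = np.linalg.solve(XtR @ X, XtR @ z)  # (X^T R X)^-1 X^T R z
    return w
```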

2.4 Regularization

When training a complex system, the model can become overfitted to the training samples. This means that the model will perform well on the training set, but not on new test data, because it is trained on examples that are not representative of the full range of possibilities. When the model is then tested on a sample it has not seen before in the training set, it won't know what to do with it. For example [27], suppose we want to train a model to predict the Fibonacci sequence [1, 1, 2, 3, 5, 8, ...], and we give it the examples


[1, 1], the model will be trained to output 1. The training examples, like in the Fibonacci sequence, are examples of the underlying characteristics, affected by a small deviation or noise. The underlying characteristic for the Fibonacci sequence is known: the n-th Fibonacci number can be calculated by rounding

F_n = (1/√5) ((1 + √5)/2)^n

The deviation between the training examples and the underlying characteristic therefore always lies between -1/2 and 1/2. If the model is too complex, the deviation is trained as well. One of the ways to smooth out this noise is by constraining the weight size. This makes the model less sensitive to noise and slight deviations. However, if this constraint is too strict, the model is simplified too much to learn the underlying characteristics. A trade-off needs to be made. Using Tikhonov regularization [25] or ridge regression, the trade-off can be tuned with a single regularization parameter λ [32]. To find the least squares solution, the regularized weights W are now found by:

W_opt = argmin_W ‖A W - y‖² + λ ‖W‖²

in which λ is the regularization parameter. This minimization problem has a closed-form solution as well; the weights can be calculated as follows:

W_opt = (AᵀA + λI)⁻¹ Aᵀ y

When λ is large, the size of the squared weights will increase the cost a lot. Setting λ too high, however, will increase the distance between the optimal solution and the regularized solution (referred to as underfitting). To optimize λ, the same model is trained each time with a different λ. The performance of each is then evaluated on new samples that are not part of the training set. For training reservoir networks in particular, it's important to note that the weights depend on the random initialization, and that a regularization parameter needs to be optimized for every reservoir specifically. More regularization is needed as the reservoir size increases, because the complexity increases: in the extreme case there is a reservoir node for every training sample, mapping each sample exactly to the output. On the other hand, if the reservoir size is extremely low, no regularization is needed because the model is not as complex. For logistic regression, proper regularization is often necessary as well. The optimized regularization parameter can be added to the IRLS algorithm easily by modifying the weight update as follows:


w^(τ+1) = (Xᵀ R X + λI)⁻¹ Xᵀ R z

2.5 Parameters

As mentioned before, there are a number of parameters that need to be determined to control the dynamics of the reservoir. The results of a model trained using Reservoir Computing depend on the careful fine-tuning of these parameters. When using regularization in the readout function, the regularization parameter should be optimized separately for every reservoir parameter. Each model is trained using several different regularization parameters and tested on a validation set. The model with the optimal regularization parameter is then evaluated again on a separate test set to be sure of the general performance of the model. We therefore need to divide the dataset into three parts. This could be a problem, especially when using a limited amount of data, because we could accidentally choose a poor set of samples, which can lead to misleading results. The best solution to counter this problem is cross-validation. The dataset is divided into K subsets. Each subset is then used exactly once as a test set and the others as a training set. After this, the result is averaged, making sure the result is valid for the complete dataset. When an extra validation set is needed as well, the subsets used for training are divided again into a smaller training subset and a validation set. For example, suppose the dataset consists of 4 samples; then the cross-validation scheme is shown in Table 2.1.

Training set   Validation set   Test set
1, 2           3                4
1, 3           2                4
2, 3           1                4
1, 2           4                3
1, 4           2                3
2, 4           1                3
1, 3           4                2
1, 4           3                2
3, 4           1                2
2, 3           4                1
2, 4           3                1
3, 4           2                1

Table 2.1: An example cross-validation scheme where a dataset of 4 samples is divided in a training set, a validation set, and a test set [27]
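The scheme in Table 2.1 can be generated mechanically; the following short sketch (illustrative, not thesis code) enumerates all (training, validation, test) splits for a small dataset.

```python
def nested_cv_scheme(samples):
    """Enumerate (training set, validation set, test set) splits:
    every sample serves exactly once as test set, and the remaining
    samples are split into training and validation in every way."""
    splits = []
    for test in samples:
        rest = [s for s in samples if s != test]
        for val in rest:
            train = tuple(s for s in rest if s != val)
            splits.append((train, val, test))
    return splits

# nested_cv_scheme([1, 2, 3, 4]) reproduces the 12 rows of Table 2.1
```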

2.5.1 Input scaling

Input scaling determines the scaling of the random input weights to the reservoir. They determine how much the neurons are excited by new input values. For very low input values the nonlinear neuron activation functions are barely activated resulting in an almost linear system. Very high input values however, will saturate the activation function, resulting almost in a binary step function. In other words: the input scaling determines the degree of nonlinearity in the system.

2.5.2 Spectral radius

The spectral radius of a reservoir is the largest absolute eigenvalue of the weight matrix of the internal connections between the neurons in the reservoir. It therefore defines the factor by which the previous states are multiplied in the reservoir state update (section 2.2).



If we choose a spectral radius smaller than 1, the input values will eventually fade out, ensuring stability and the echo state property. With a spectral radius larger than 1, the reservoir can become unstable if the reservoir is near linear. The internal connections between the neurons add memory to the reservoir. The spectral radius and the scaling of the internal connection weights therefore influence the time scale of the reservoir. For input that evolves slowly, or that has long-range temporal interactions, the spectral radius is usually chosen close to 1, or even higher if the reservoir is nonlinear enough.
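Imposing a chosen spectral radius on the random internal weight matrix is a one-line rescaling, sketched below with numpy (an assumed implementation, consistent with the definition above).

```python
import numpy as np

def set_spectral_radius(W, rho):
    """Rescale internal weights W so that the largest absolute
    eigenvalue equals the desired spectral radius rho."""
    return W * (rho / np.max(np.abs(np.linalg.eigvals(W))))
```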

2.5.3 Leak rate

The leak rate of each neuron in the reservoir controls how much of the previous neuron output is retained. It influences the memory of the reservoir directly. This also means it affects the influence of new state updates, and therefore makes the reservoir adapt more slowly to new situations. A trade-off thus needs to be made between the influence of long-term dynamics and the influence of new input.

2.5.4 Bias scaling

A constant 1 may be added to the input, multiplied by the bias scaling parameter. This shifts the working point on the sigmoid activation function tanh of the neuron. The steepness of the sigmoid is largest around the origin; shifting the working point upward or downward therefore makes the reservoir less dynamic. An illustration of the influence of the bias on the activation function is shown in Figure 2.4.

Figure 2.4: Illustration of the effect of bias scaling. Using 0 as the working point for the input broadens the spectrum of the neuron activation (red line). When shifting the bias, the neuron exhibits less dynamic behavior (green line).

2.5.5 Reservoir size

The reservoir size is the number of neurons in the network. Increasing the reservoir size usually improves the result, assuming sufficient regularization. It's therefore not really an optimizable parameter, as the reservoir size is normally determined according to the computational power available.

2.6 Time series prediction

Predicting time series, as it is basically predicting the future, has been the focus of much research throughout history. The basic idea is to first observe a training sequence and then try to complete the sequence over a number of steps into the future, also referred to as the number of freerun steps in RC literature. To be able to predict by only taking the history into account, there needs to be a pattern in the sequence. That pattern can either be periodical (like 1, 2, 1, 2, 1, 2, ...) or contain a certain trend (such as the Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, ...).


2.6.1 Time series prediction using Reservoir Computing

Reservoir Computing has already been successfully used for the signal generation task [17]. The model is trained by teaching it to always output the next step ahead. After training these signal dynamics, the first unknown signal value is predicted according to the learned signal transitions (teacher forcing). The predicted value is then used as input value to predict the following signal value, and so on. This continues until the required number of predicted steps is reached. It's very important to let the reservoir neurons warm up sufficiently before starting the prediction, as the model obviously can't complete a sequence if it doesn't know the first part of the sequence. Warming up a reservoir is done simply by letting the reservoir run over the history of the signal and feeding back the observed values. The evolution of neuron activations when running over an example power demand profile can be seen in Figure 2.5. In this case, we can see that the neuron output values are initialized at a value close to 0 and take about 100 state updates before patterns begin to appear in the reservoir.

Figure 2.5: An example of the output values of the first 10 neurons over the first 200 state updates when running the reservoir over a power demand profile of a car
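The warm-up and freerun procedure can be sketched as follows, reusing the hypothetical update function from the earlier reservoir sketch (here with a one-dimensional input signal); for brevity it ignores the direct input-to-output and bias terms of the readout.

```python
import numpy as np

def predict_freerun(x, history, n_steps, update, W_out):
    """Warm up the reservoir on the observed history (feeding back the
    observed values), then feed each predicted value back as input."""
    for u in history:                       # warm-up on known values
        x = update(x, np.atleast_1d(u))
    u = history[-1]
    predictions = []
    for _ in range(n_steps):                # recursive freerun steps
        x = update(x, np.atleast_1d(u))
        u = float(W_out @ x)                # readout produces the next value
        predictions.append(u)
    return np.array(predictions)
```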

One of the shortcomings of time series prediction with Reservoir Computing is that it can often only act on short-term time scales, although this can be partly solved by retaining a part of the previous state. However, RC has already been applied successfully to long-term financial time series prediction in [31] and [26]. Financial time series can often be decomposed into periodical patterns, a trend and a remainder. These signals can then be predicted separately and combined again after prediction. In this work we predict the time series of vehicle behavior such as the power demand course, the speed profile


and acceleration. After initial experiments, however, it became clear that these are not as straightforward to predict as the series in previous work, and first results were poor. Our approach to this problem is explained in Chapter 4.

2.7 Classification

Reservoir Computing can also be successfully applied to many temporal classification problems such as speech recognition [27] and the detection of epileptic seizures [7]. In the first example, the samples are isolated and need to be classified as a whole. In the second, the input is a signal where the class needs to be defined at every time step. Every classification then depends on the current input values and the previous input samples (thanks to the memory properties of reservoir networks). The advantage of classification using RC is that the reservoir not only maps the input values to a high-dimensional feature space, but also memorizes input values in the reservoir for a certain time, so that temporal patterns can be detected as well. The reservoir's states can then be classified using any classification technique. The classification problem handled in this thesis is explained and addressed in Chapter 5.


Chapter 3

Data analysis
The data is supplied by ChargeCar.org [1]. It consists of the GPS coordinates and elevation of each point, sampled at a one-second interval. The dataset contains data from multiple drivers, but the GPS points are not gathered in the same neighborhood, so they can't be combined. We'll focus on the available data from one driver alone, as it already allows for extensive research. We will be using the trips from a driver who drove around in south San Francisco. In total, about 6915 km is covered in about 159 hours (Figure 3.1). The driver frequently drives along the same road segments. It could therefore be interesting to keep data from previous trips over the same road. Future predictions can then be based on the previous trips when passing in the neighborhood.

Figure 3.1: ChargeCar GPS data


3.1 A road graph

To access data from previous trips easily, a suitable data structure is needed along with an ecient algorithm to build it. The greatest diculty is detecting when a car is driving on a road where he has already been, and quickly nding the related GPS points. We follow the approach described by L. Cao and J. Krumm to build a graph of directed road segments [8]. They present a new and fully automatic method for creating a routable road map from GPS traces of everyday drivers. The algorithm described by Cao and Krumm performs well for ensuring road connectivity and dierentiating GPS traces from close parallel roads. First, to increase eciency, a separate dataset is made by generally retaining only one point every 30 meter. When the direction change over the last three points is greater than 10 degrees, the GPS points are retained every 10 meter. This increases accuracy when the car is making a turn. Some of these points will be close and should be merged, making sure the connectivity in the graph is not lost. To build the data structure we start with an empty graph. Each trip is then processed sequentially. For each node in a trip the graph is searched to decide whether it should be merged with an existing graph node. Intuitively, a new node n should be merged with node v from the graph if theyre on the same road segment. Let e be a road segment that connects v and another node v from the road graph. Then n should be merged with v if the distance from e to n is small enough, the trip goes in the same direction as e, and n is closer to v than v . An illustration of the process can be seen in Figure 3.2. The rst trip becomes the initial graph. In the second trip, the 2nd, 3rd, 4th and 5th nodes are merged with the existing nodes. The 1st, 6th and 7th node are copied to the graph. The road segments from the second trip are connected from the new nodes to the existing nodes to ensure connectivity. No nodes from trip 3 satisfy the merge conditions so they are simply copied to the graph. The algorithm without optimization is very inecient because every GPS point of the trip needs to be compared to every node in the graph. The time required to add a trip increases dramatically as the number of nodes in the graph increases. However, its clear that the current GPS point will never need to merge with nodes far away. For this purpose, the nodes are kept in a 2-d tree. Using a 2-dimensional distance tree all nodes within range can be looked up in O(log N ) time [19] (where N is the amount of nodes in the road

Figure 3.2: A simple example of the merge algorithm: a) three trips to merge; b) trip 1 added to graph G; c) trip 2 merged with graph G; d) all trips merged with graph G. The circles represent the retained GPS points. The arrowed lines represent the connecting road segments along with the driving direction.

Storing the road graph nodes in a 2-d tree therefore significantly reduces the time required to add a trip to the road graph.
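To illustrate the lookup that makes this feasible, the sketch below uses a k-d tree to retrieve merge candidates near a new GPS node. It is a minimal example, not the thesis implementation: it assumes positions already projected to planar metric coordinates, the function name and the 15 m radius are illustrative, and in practice the tree would be rebuilt or extended as nodes are added.

```python
import numpy as np
from scipy.spatial import cKDTree

def find_merge_candidates(graph_nodes, new_node, radius=15.0):
    """Return indices of graph nodes within `radius` meters of `new_node`."""
    # graph_nodes: (N, 2) array of node positions in planar metric coordinates.
    # Only these candidates are then checked against the full merge conditions
    # (distance to segment, driving direction, closest endpoint).
    tree = cKDTree(graph_nodes)                       # built in O(N log N)
    return tree.query_ball_point(new_node, r=radius)  # queried in O(log N)

# Example: three existing nodes, one new point near the second one.
nodes = np.array([[0.0, 0.0], [30.0, 0.0], [60.0, 0.0]])
print(find_merge_candidates(nodes, np.array([32.0, 4.0])))  # -> [1]
```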

3.2 Extracting useful information

Although we had no access to real data such as speed and power usage, a lot of information can be calculated from the GPS data. Speed, acceleration and power demand are calculated for every sample using the power model described by ChargeCar. After building the graph, the complete dataset is used again and mapped on the road segments between the road graph nodes. This way, the road graph can be built efficiently without losing information about vehicle behavior between those points. The dataset consists of samples at 1-second intervals. This means that the time spent over a road segment of 30 meters between two nodes can be very variable, because it depends on the speed of the vehicle over the road segment. In order to correctly calculate


and align the average behavior over the road segment, while properly retaining any peaks in the profiles, interpolation is needed. The distance traveled over the road segment is almost constant for every trip. The trips are therefore interpolated every meter, converting them from a sample every second to a sample every meter. Throughout the rest of this work, the time series described are converted from a time scale to a distance scale. The effect of this interpolation is illustrated in Figure 3.3.
Figure 3.3: The speed profile on a time scale (left) is interpolated to a distance scale (right)

Through testing we also noticed that vehicles driving exhibit very different behavior from vehicles stopping. Suppose a vehicle stops 50% of the times it passes through an intersection where the speed limit is 50 km/h. The average speed over the road segment would then be 25 km/h. However, it rarely actually drives at this speed: it usually either stops and continues slowly over the intersection, or it doesn't need to stop and keeps driving at 50 km/h. We therefore separate the captured information into a slow set and a fast set on every road segment. When passing through a road segment, the current vehicle behavior is added to the slow set if:
- The car stops on the current road segment, i.e. if the car drives slower than 2 m/s = 7.2 km/h at any point over the road segment.
- Or, the car stops in a road segment within 100 m before or after the current position, and the car's average speed over the segment is more than 2 m/s slower than the total average speed of the fast set.


If neither condition is satisfied, the information about the car's behavior over the segment is added to the fast set. The average profiles over a path in the road graph can now be collected by concatenating the average profiles captured in each road segment. An example of the average speed profiles is given in Figure 3.4. Small jumps in the average profiles couldn't be avoided, because the number of trips that drive over the different road segments in the chosen path is variable.
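The two conditions translate directly into a small classification routine. The sketch below is illustrative only (the names and interfaces are not from the thesis code); it assumes per-meter speed samples for one pass over a segment.

```python
import numpy as np

STOP_SPEED = 2.0  # m/s = 7.2 km/h; below this the car counts as stopped

def is_slow_pass(segment_speeds, stop_within_100m, fast_set_avg_speed):
    # segment_speeds: per-meter speeds (m/s) of one pass over the segment.
    # stop_within_100m: True if the car stops within 100 m before or after.
    # fast_set_avg_speed: total average speed of the segment's fast set.
    if np.min(segment_speeds) < STOP_SPEED:
        return True  # condition 1: the car stops on the segment itself
    if stop_within_100m and np.mean(segment_speeds) < fast_set_avg_speed - STOP_SPEED:
        return True  # condition 2: nearby stop and clearly slower than the fast set
    return False
```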
Figure 3.4: An example of the average speed profiles (slow and fast speed data) over a trip in the road graph

3.3 Error measures

To evaluate the techniques used, some sort of evaluation method is obviously needed. The ultimate goal is to minimize battery usage, but the controller using these predictions remains hypothetical. It's therefore impossible to define a single number that captures the goodness of a model. It's still possible, however, to reason about the usefulness of the proposed models using the following error measures.

3.3.1 Defining a prediction distance

If we want to evaluate the predictive capabilities of each model, we should first specify on what scale prediction is required. Of course, this can be very different for each application. The initial purpose of this work, however, is the improvement of a battery/capacitor controller, so we will focus on this example. The ideal situation would be to allow the capacitor to empty completely while driving at a constant speed, to be ready to store all the energy contained in the moving car. The theoretical maximum prediction distance is then the distance traveled to empty the capacitor.


This can be calculated using the specifications of the vehicle used to capture the GPS data. In the ChargeCar data a Honda Civic is used. The power required for this vehicle driving at a constant speed can be calculated as follows, with the constants and units described in Table 3.1:

P = \frac{A \cdot C_d \cdot D \cdot v^3}{2} + C_r \cdot m \cdot g \cdot v

Symbol   Value         Description
A        1.988 m^2     Frontal area
C_d      0.31          Drag coefficient
D        1.29 kg/m^3   Air density
C_r      0.015         Roll resistance coefficient
m        1200 kg       Car mass
g        9.81 m/s^2    Gravitational acceleration
W        190080 J      Capacitor energy capacity
v        m/s           Vehicle velocity
P        W             Power

Table 3.1: Constants and units needed to calculate the prediction distance

The supercapacitor used in the ChargeCar test car is the Maxwell BMOD0165. The maximum energy stored in this capacitor is 52.8 Wh, or 190080 J. The time needed to use the 190080 J while driving at constant velocity v is then:

t = \frac{190080\,\mathrm{J}}{\frac{A \cdot C_d \cdot D \cdot v^3}{2} + C_r \cdot m \cdot g \cdot v} = \frac{190080}{0.3975 v^3 + 176.58 v}

The distance covered as a function of that time and velocity is then:

d = v \cdot t = v \cdot \frac{190080}{0.3975 v^3 + 176.58 v}


Figure 3.5: Maximum prediction distance vs. vehicle velocity

Figure 3.5 shows that the required distance decreases as speed increases. At 1 km/h the maximum distance is 1076 m. As the speed approaches 0, the distance approaches +∞. From this result we would conclude that we need to predict the coming kilometer. However, the car will usually be driving at 50 km/h or more. The distance required at this speed is 750 m.
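The curve in Figure 3.5 can be reproduced directly from the formulas above; the short script below evaluates the prediction distance at the two speeds discussed in the text.

```python
import numpy as np

A, CD, D = 1.988, 0.31, 1.29    # frontal area, drag coefficient, air density
CR, M, G = 0.015, 1200.0, 9.81  # roll resistance, car mass, gravity
W = 190080.0                    # capacitor energy capacity (J)

def prediction_distance(v):
    """Distance (m) covered while emptying the capacitor at constant v (m/s)."""
    power = 0.5 * A * CD * D * v**3 + CR * M * G * v  # 0.3975 v^3 + 176.58 v
    return v * W / power

print(prediction_distance(50 / 3.6))  # ~750 m at 50 km/h
print(prediction_distance(1 / 3.6))   # ~1076 m at 1 km/h
```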

3.3.2 Root Mean Square Error (RMSE)

The root mean square error is a frequently used measure of the deviation between a predicted model and the actual observed values. It allows us to compare two signals (e.g. the speed profiles) and aggregate the point-wise differences (or residuals) between them into a single number to evaluate the models used. For vectors x (observed values) and \hat{x} (predicted values) the RMSE is calculated as follows:

RMSE(\hat{x}, x) = \sqrt{MSE(\hat{x}, x)} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{x}_i - x_i)^2}
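In code, this measure is essentially a one-liner; a minimal numpy version:

```python
import numpy as np

def rmse(predicted, observed):
    """Root mean square error between two equally long signals."""
    residuals = np.asarray(predicted) - np.asarray(observed)
    return np.sqrt(np.mean(residuals ** 2))
```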

3.3.3 Kurtosis Difference

Kurtosis is a measure of peakedness, in Machine Learning usually used as a measure of non-Gaussianity¹ [3]. Through experiments we noticed that a model sometimes converges to a weighted average of the history of the current trip. This might be the best result according to the RMSE and MAD error measures, but it will not be as useful for a controller, because a controller needs a predictor that can correctly predict energy spikes or other events that have a large influence on energy usage.
¹ Gaussianity is the similarity of a distribution to the normal (or Gaussian) distribution.


It could then be useful to compare the peakedness of the prediction profile with the peakedness of the observed profile, to get a better idea of the similarity of both signals. The peakedness (or kurtosis) of a vector x with mean \bar{x} is calculated by:

Kurt(x) = \frac{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^4}{\left(\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2\right)^2}

To compare the peakedness, the kurtosis difference error measure is presented. No previous work was found on this measure, at least in the context of Machine Learning; the kurtosis difference is merely used in an attempt to quantify the models in a secondary way. The kurtosis difference of the model output ŷ and the target output y is calculated by Kurt(ŷ) − Kurt(y).
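A direct numpy translation of both the kurtosis formula and the difference measure:

```python
import numpy as np

def kurtosis(x):
    """Fourth central moment over the squared second central moment."""
    dev = np.asarray(x, dtype=float) - np.mean(x)
    return np.mean(dev ** 4) / np.mean(dev ** 2) ** 2

def kurtosis_difference(y_pred, y_target):
    """Secondary error measure used here: Kurt(y_pred) - Kurt(y_target)."""
    return kurtosis(y_pred) - kurtosis(y_target)
```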

3.3.4 Receiver Operating Characteristic (ROC)

The Receiver Operating Characteristic (ROC) is a classification error measure developed in WWII by radar engineers, and has since been used in a large number of areas. In recent years it has also gained a lot of interest in the field of machine learning and pattern recognition [13]. The output of classifier models is usually continuous, and it's often hard to evaluate the performance of these models because a hard threshold usually needs to be set to classify the output. The ROC curve is able to visualize the trade-off between the hit rate (or true positive rate, TPR) and the rate of false positives (FPR) in binary classification. The TPR is the proportion of actual positives which are correctly identified. The FPR, on the other hand, is the proportion of negatives which are wrongly identified as positive. The curve is calculated by iterating over every possible hard threshold that can classify the output of the model. Classifiers appearing on the lower left-hand side of an ROC curve can be thought of as strict or conservative (A in Figure 3.6): they only make positive classifications with strong evidence. Classifiers appearing on the right-hand side of the ROC curve are less selective, but result in a lot of false positives (B). Intuitively, the best model stretches as far as possible to the top left-hand side (C). Random classification models result in points along a straight line from the bottom left to the top right of the graph (D). Classifiers under this line (E) can be thought of as worse than random.


Figure 3.6: A basic ROC graph showing five discrete classifiers

However, if we'd invert such a classifier and switch the target classes, the classifier's result is inverted, which makes it perform better than the random classifier again. Consequently, models that extract no knowledge from the data will roughly follow the straight line of the random classifier. The ROC measure also provides a way of evaluating model performance without setting a threshold, by calculating the total surface area under the curve, known as the Area Under Curve (AUC). We'll mainly be using this measure for comparing the performance of models for binary classification. A single threshold can be selected along the ROC curve to know the exact percentage of correctly classified samples for a selected trade-off between false and true positive rates. Often the threshold is chosen where the false and true positive rates are equal. If the classifier is used for a specific purpose, the cost of a false positive is sometimes higher than the cost of a false negative or vice versa; the threshold can then be optimized for that application specifically. Two example ROC curves are shown in Figure 3.7, along with the line at which the two error rates are equal.
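As an illustration, scikit-learn provides both the curve and the AUC directly, and an equal-error-rate style threshold can be read off the curve. The toy labels and scores below are made up for the example.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1])                 # toy binary labels
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])   # toy classifier output

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC:", roc_auc_score(y_true, y_score))

# Threshold closest to the equal error rate point (FPR == 1 - TPR):
eer_idx = np.argmin(np.abs(fpr - (1 - tpr)))
print("threshold near EER:", thresholds[eer_idx])
```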


Figure 3.7: Example of the ROC curves of two models, including the Equal Error Rate line, where the True Positive Rate is equal to the False Positive Rate.


Chapter 4

Time series prediction of vehicle power, speed and acceleration


To predict the future power demand profile of a car using RC techniques, a different approach is needed than for the prediction of financial time series, because there are typically fewer periodic factors in driving a car. Stopping and accelerating often happen in a similar way, but stops come at near-random intervals if there is no pre-existing information about the car's environment. Vehicle power demand depends on many physical factors and can be decomposed in other ways: elevation differences, acceleration, speed, etc. Elevation is not expected to change over several trips, so we can read it directly from the road graph. Predicting the vehicle acceleration and speed, however, is far more complex. In this chapter, we present and evaluate several time series prediction models for the vehicle's power demand, acceleration and speed profile. An example of these profiles can be seen in Figure 4.1. Note that from here on, the profiles are evaluated on a distance scale instead of a time scale, as explained in the previous chapter.

4.1 Evaluation methodology

The theoretical prediction distance calculated in section 3.3.1 was 750 m. However, the memory capabilities of reservoirs are limited, and after predicting a number of steps, the influence of the real observed samples on the reservoir diminishes [14]. In the setup presented by Jaeger, the input was forgotten after about 400 time steps. When adding noise, the memory reduced to around 200 time steps. Experiments showed that trying to predict any further than 200 m with the models presented here yielded misleading results which were difficult to explain. We therefore decided to limit the evaluation to predicting the first 200 m.



As mentioned in Chapter 3, in total about 6915 km is covered. The dataset was converted to a distance scale, which means it now contains about 6,915,000 samples. To train and test the models in this chapter, all trips in the dataset were divided into 200 m intervals. The information of previous trips over each interval was extracted from the road graph and merged with the intervals. For time series prediction, especially using RC, a warm-up period is needed (see section 2.6.1). We couldn't use the reservoir states of the previous intervals, because those states contain the predicted signal over the interval, not the real signal values. A poor prediction would therefore affect the next prediction. Moreover, the previously predicted interval is not always part of the current trip. A warm-up period is therefore added before every prediction interval. This should be long enough to forget the previous states, but short enough to limit memory requirements and the computation time required to run over all the warm-up samples. Because the maximum possible prediction distance seemed to be around 200 m, we expect 300 m to be enough to largely forget the state of the previously predicted interval.


Figure 4.1: An example power demand, speed, acceleration and elevation profile


Since the input weights and the internal weights of a reservoir are chosen randomly, one model could be trained with a particularly good reservoir while another was trained with a very weak one. This could lead to misleading results. Therefore each experiment involving RC is done over 10 different reservoir instances. When plotting the resulting errors, the standard deviation of the results of a model is given as well, using error bars. In text, the standard deviation is given in parentheses along with the average results. To include all intervals, the reservoirs would now need to run 10 times over 17,287,500 samples. Additionally, we're predicting the power demand as well as the speed and acceleration profile. Too much time would be required to finish all experiments in a reasonable time frame. Therefore, a random subset of 2,230,500 samples (containing 9566 prediction intervals) was chosen and fixed for the remaining experiments. Because of the restrictions above, cross-validation was unfortunately not an option.

Figure 4.2: Example of splitting the profiles into warm-up and prediction intervals

Instead, the models were trained on the first 25% of the trips. The next 25% was used as a validation set to optimize the model parameters. The performance of the resulting models was then evaluated on the remaining 50% of the dataset. The test set was chosen large enough to ensure that general conclusions can be drawn from the evaluation.


4.2 Baseline models

4.2.1 System setups

First, some simple, linear models are presented and evaluated. For each model we define an abbreviation in the paragraph title to be able to refer to it more clearly afterwards.

Last value as prediction (LV): The last observed value is used as prediction for the next predicted values: ŷ(t) = y(t-1). As speed is usually quite constant, this should already provide a good estimate. The longest distances in the dataset are often on highways, where speed hardly changes.

Averages from previous trips as prediction (SA/FA): The averages of the previous trips over the path of the current trip are used directly as prediction. The performance of the slow average profiles (SA) and the fast average profiles (FA) is evaluated separately.

Offset averages as prediction (OA/OAf): The predictions of the previous model don't make any use of the current trip, however. At least the first few predicted values should be close to the last known values. To solve this, we first calculate the difference between the current trip and the averages at the last observed sample of the interval. This offset is then added to the averages over the rest of the prediction.

Weighted time delay window (TDW): A weighted average is taken of the previous values. This allows the model to incorporate the recent history of the current trip. The weights of every point in the recent history window y(t-n_window_size), ..., y(t-1) are trained using ridge regression to predict one step ahead: y(t). The predicted value ŷ(t) is fed back and used as part of the current trip history y(t-n_window_size+1), ..., y(t-1), ŷ(t) to predict ŷ(t+1).

Of course, we need to know how many previous values to incorporate: the size of the time delay window needs to be determined. As shown in Figure 4.3, the optimal window size is 2 for the speed profile. For the acceleration profile, a window of size 3 is chosen. Lastly, for the power profile, a very large window size is preferred.


The error function flattens out around window size 200; this value is therefore chosen as the window size.

Figure 4.3: RMSE error values vs. time window size in the TDW model
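A minimal sketch of such a TDW predictor, assuming a 1-D profile array: a ridge regression is trained on delay windows to predict one step ahead, and prediction then feeds each output back into the window. The function names are illustrative, not from the thesis code.

```python
import numpy as np
from sklearn.linear_model import Ridge

def train_tdw(profile, window=3, alpha=1.0):
    """Fit a ridge regression mapping the last `window` values to the next one."""
    X = np.array([profile[i - window:i] for i in range(window, len(profile))])
    y = profile[window:]
    return Ridge(alpha=alpha).fit(X, y)

def predict_tdw(model, history, n_steps=200):
    """Recursively predict n_steps ahead, feeding each output back in."""
    window = list(history[-len(model.coef_):])
    out = []
    for _ in range(n_steps):
        y_next = model.predict(np.array(window)[None, :])[0]
        out.append(y_next)
        window = window[1:] + [y_next]  # slide the delay window
    return np.array(out)

# Example on a toy speed profile (m/s), window size 2 as found for speed:
speed = np.sin(np.linspace(0, 20, 500)) + 10.0
model = train_tdw(speed, window=2)
future = predict_tdw(model, speed[:300], n_steps=200)
```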

Weighted time delay window with averages (TDWAt/TDWAtdw)

Figure 4.4: The weighted time delay model (TDW) used for each profile, with the added averages at step t (TDWAt) shown in a dashed rectangle. The time window of the current trip history y(t-1), ..., y(t-n) and the road graph path averages (slow mean(t), fast mean(t), elevation(t), chance to stop(t)) feed a linear regression with output feedback producing ŷ(t).

The TDW model above can be extended with input from the average profiles of previous trips, to investigate the influence of the information in the road graph. The predicted value at each step is combined with the average value at that step (TDWAt). Additionally, this model is extended again by also including a weighted average of the history of the average profiles (TDWAtdw). Initially, all information from the road graph was included: the slow and fast averages of power demand, speed and acceleration, as well as the average chance to stop over the current road segment and the elevation difference between two successive samples.


Some of the information in the road graph is not useful. To detect the contributing averages, a feature selection is done on the input dimensions using Least Angle Regression (LAR). With this method, the weights of the input dimensions that hardly contribute towards improving the solution can drop to 0, which is not possible using linear (or ridge) regression. For the specifics of this algorithm we refer to the article by Efron et al. [12]. This method was chosen over other similar stepwise methods such as forward feature selection because it's just as fast but generally performs better [12]. No additional experiments were done in this thesis to confirm this, however. For every profile, the last observed value was selected. For the power demand profile, the fast average power value was selected as well. For the speed profile almost all averages remained non-zero, except for the slow average speed. Lastly, for the acceleration profile, both the fast average acceleration and the average chance to stop were chosen. Training the TDWAt model weights using LAR does not perform as well in reducing the RMSE as training the weights using ridge regression¹. We therefore decided to use LAR for nothing but the feature selection above.
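As an illustration of this kind of selection, scikit-learn's Lars estimator can be limited to a budget of nonzero coefficients, after which the surviving inputs are read from the coefficient vector. The data here is synthetic; the thesis setup and stopping criterion may differ.

```python
import numpy as np
from sklearn.linear_model import Lars

rng = np.random.RandomState(2)
X = rng.randn(200, 6)                      # 6 candidate road-graph inputs
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.randn(200)

# Limit LAR to a budget of nonzero coefficients; unused inputs keep
# weight 0 and are dropped from the feature set.
lar = Lars(n_nonzero_coefs=2).fit(X, y)
print(np.flatnonzero(lar.coef_))           # -> [0 3]
```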

4.2.2 Results

Taking the last known value (LV) performs very well for predicting the speed according to the RMSE value (Table 4.1). However, Table A.1 in Appendix A demonstrates that the kurtosis of the predicted signal is much lower than that of the observed signal (for reference, the kurtosis of the normal distribution is 3). This suggests the model doesn't predict any peaks, which is confirmed by the example in Figure 4.6. The complete lack of peaks means the LV model is not very suitable for use in a controller, because most power is spent on accelerating, and most power is regenerated when braking. Furthermore, because of the conversion of the data to a distance scale, the dataset has become unbalanced: the vehicle spends a larger share of its time driving slowly than the share of distance it covers slowly. Any model that's good at predicting the speed while driving fast will thus perform better according to the RMSE than a model that's good at predicting the speed while driving slowly in the city, because the dataset is biased towards fast driving situations.
¹ LAR-trained TDWAt results: power demand profile RMSE = 9338 W; speed profile RMSE = 1.494 m/s; acceleration profile RMSE = 0.328 m/s².


Figure 4.5: Average absolute deviation of the baseline models over the predicted distance.

Cars driving at high speed are often on a highway, where the speed remains nearly constant. This further explains the decent performance of the trivial LV predictor. Taking the averages as a prediction (SA/FA) results in poor performance according to the RMSE, partly because it doesn't make any use of the current information. But the low kurtosis difference of these models suggests more profile similarity: in Figure 4.6, for example, the stop is best predicted by the SA model. The slow averages (SA) contain more peaks than the FA, and the SA predictions therefore have a higher kurtosis than the FA predictions. The SA and FA predictions of the first meters are very poor (Figure 4.5), resulting in a higher average RMSE than the other models. Offsetting the fast averages (OAf) by the difference between the current signal value and the average value at the current position combines a better RMSE with the same peaks as the fast average profiles. By aligning the averages to the current signal, at least the first few predicted values are predicted well. The effect can be seen in Figure 4.6, although in this case the fast averages (FA) without an added offset would result in a lower RMSE. The time window model (TDW) yields the best RMSE for the speed profile (Table 4.1), but struggles like LV to predict peaks (see Figure 4.6). The models with a time delay window (TDW, TDWAt and TDWAtdw) predict the acceleration and power demand better than the other models, which shows that these profiles are quite noisy and benefit from averaging.


The TDWAtdw model, which uses information from the road graph, predicts the acceleration and power demand better than the TDW model (without previous information), demonstrating the usefulness of the road graph.

RMSE (STD)   Power (W)   Speed (m/s)   Acceleration (m/s²)
LV           10354       1.502         0.4039
SA           11380       8.384         0.5666
FA           9287        3.814         0.3357
OAf          10525       1.641         0.4124
TDW          8766        1.481         0.3136
TDWAt        8434        1.502         0.3150
TDWAtdw      8401        1.502         0.3111

Table 4.1: Average RMSE results of the baseline prediction models.

Figure 4.6: Example 2 of the output of the baseline prediction models.


4.3 A prediction system with Reservoir Computing

4.3.1 System setups

The TDW models showed the benefit of including part of the history of the current trip to predict the next values. These models, however, are all linear. In this section we propose a few RC-based prediction models to extract any underlying nonlinear patterns as well (see section 2.6.1 on time series prediction with RC).

System without averages (RCNoA): In the first model only the history of the current trip is used. The speed, acceleration and power profiles are used as input in separate systems, each with a reservoir and its readout function (ridge regression), as illustrated in Figure 4.7. The previous value y(t-1) is also used as direct (linear) input to the readout function. Each predicted output value is sent back in an output feedback loop as the next value in the current trip history, to recursively predict the rest of the sequence.
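For reference, the reservoir update underlying these systems can be sketched with the standard leaky-integrator ESN equations. The exact scalings in the thesis setup differ per profile (see section 4.3.2), so the values below are only placeholders.

```python
import numpy as np

rng = np.random.RandomState(0)
n_res, n_in = 150, 1                              # 150 neurons, as used here

W_in = rng.uniform(-1, 1, (n_res, n_in)) * 0.01   # input scaling (placeholder)
W = rng.uniform(-1, 1, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius 0.9

def reservoir_step(x, u, leak_rate=0.1):
    """One leaky-integrator update:
    x(t) = (1 - a) * x(t-1) + a * tanh(W_in u(t) + W x(t-1))."""
    return (1 - leak_rate) * x + leak_rate * np.tanh(W_in @ u + W @ x)

# Teacher-forced warm-up on observed values; during free-run prediction,
# u would be the fed-back previous readout output instead.
x = np.zeros(n_res)
for u_t in [12.0, 12.4, 12.1]:                    # e.g. observed speed values
    x = reservoir_step(x, np.array([u_t]))
# The readout (ridge regression) maps x, extended with y(t-1), to y(t).
```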

Figure 4.7: RCNoA: RC prediction system without usage of averages from the road graph.

System with linear input of averages (RCLA): The basic RC prediction model is extended with input from the average profiles of previous trips, much like the previously discussed weighted time delay model. The weights of the average values at each step t are trained along with the reservoir state outputs at t in the readout function to predict the next output value y(t), as shown in Figure 4.8.


The averages used are the same as those used in the TDWAt and TDWAtdw models in section 4.2.
Figure 4.8: RCLA: RC prediction system with linear input of previous averages

System with nonlinear (reservoir) input of averages (RCNLA): There may be nonlinear correlations between the predicted output and the averages of the previous trips in the road graph. We therefore also evaluate a system where we use the current trip history as well as the average values at t as input to the reservoir to predict y(t). To predict the speed profile at t, for example, the observed speed at t-1, the slow average speed at t, and the fast average speed at t are used as input to the reservoir. Additionally, the observed speed at t-1 is used as linear input to the readout as well. The other profiles are predicted in the same way, with their respective averages. An illustration is given in Figure 4.9.

Separating time scales (STSRCLA): Time series often have trends and patterns on different time scales. By using a different leak rate, the memory of a reservoir network can be modified directly. In this system we use two reservoirs with different time scales for every profile. One of the reservoirs does not retain anything of the previous neuron output (its leak rate is set to 1.0). The leak rate of the other reservoir is then optimized like the other reservoir parameters (see section 4.3.2). The output weights of the states of both reservoirs are trained together by the same readout function.


Figure 4.9: RCNLA: RC prediction system with average profiles as additional input to the reservoir

4.3.2 Parameter optimization

As mentioned in the evaluation methodology, every parameter is tested on 10 different reservoir instances. For each reservoir, the regularization parameter is first optimized before evaluating the parameter on the validation set. The results are given for the power demand, speed and acceleration profiles of the RC system that does not make use of the information in the road graph (RCNoA). For the RCLA model, no specific parameter optimization was done, because the reservoir input is the same as for RCNoA while training. A separate parameter optimization was done for RCNLA, the parameters of which are given in Table A.7 in Appendix A.

Input scaling: The interval 10^[-6, 2.5] was searched, the results of which can be seen in Figure 4.11. At a certain point, often around an input scaling of 1, the RMSE shows a large peak. Around this point the reservoir output becomes more nonlinear, often too chaotic to train, and the predictions become highly unstable.


To display all RMSE errors for the input scaling used in the power demand profile, they are plotted on a logarithmic scale in Figure 4.10. The lowest RMSE can't be deduced from this figure, however, so the high RMSE values are clipped from here on, such as in Figures 4.11, 4.13 and 4.14.
Figure 4.10: Average RMSE on a logarithmic scale vs. the input scaling used in the RCNoA model for the power profile.

The lowest average RMSE for the acceleration and power demand profiles is found at an input scaling of 10^-2. For the speed profile a slightly better RMSE is found at 10^-1.5. At the other side of the peak, for input scaling larger than 10^0.5, the RMSE becomes lower again, almost lower than the previously found optimum for the speed profile. After close inspection these yielded useless solutions, outputting simply straight lines. The kurtosis difference of the speed profile for high input scaling is lower than the kurtosis difference for input scaling between 10^-4 and 10^0, which also shows that the peakedness of the predicted speed signal is lower than that of the observed speed signal.

Leak rate: To find the optimal leak rate, the interval 2^[-5, 0] was searched (including leak rate 1). The optimal leak rate of the power prediction model is around 0.1, as illustrated in Figure 4.12. Above 0.5, the predictions become unstable and the resulting RMSE very high; these values are therefore omitted. The leak rate of the speed profile is optimal around 0.05. The leak rate graph of the acceleration profile is quite irregular; the lowest RMSE value is found at 0.0625.


Figure 4.11: Average RMSE and standard deviation over the reservoir instances vs. the input scaling used in the RCNoA model.

Spectral radius: The interval [0.5, 1.4] was searched (Figure 4.13). The optimal spectral radius is found around 0.9, for both the RMSE and the kurtosis difference. At spectral radii higher than 1, the reservoir becomes unstable and no longer satisfies the Echo State Property. The low input scaling causes low neuron activations of the nonlinear tanh neuron output function; the system therefore behaves more linearly and is prone to instability, resulting in some extremely high RMSE values for spectral radii higher than 1.

Bias scaling: Because the input signal is skewed, a low bias scaling shifts the working point of half the input signals to the origin. The neuron activation therefore becomes more nonlinear, resulting in higher RMSE values, but a lower kurtosis difference for the speed profile. When the bias scaling becomes very high, the output of each neuron is a constant -1 or +1, resulting in the same predictions as when the input scaling is very high. Eventually, no bias was added for either profile.


Figure 4.12: Average RMSE and standard deviation over the reservoir instances vs. the leak rate used in the RCNoA model.

Figure 4.13: Average RMSE and standard deviation over the reservoir instances vs. the spectral radius used in the RCNoA model.


Figure 4.14: Average RMSE and standard deviation over the reservoir instances vs. the bias scaling used in the RCNoA model.


4.3.3 Results

A model is needed whose output is as peaked as the observed profile. Therefore a balance is needed between the lowest kurtosis difference and the lowest RMSE value. These conditions are sometimes contradictory, especially for the speed profile. We expect that the speed profile is often constant and linear, but otherwise more nonlinear when braking for intersections and such. The challenge for these models is to distinguish between these two situations. According to the results in Table 4.2, the RCLA and RCNoA models perform best at predicting the speed. For predicting the power demand, the influence of the averages is clearly visible, as RCNoA (without input of averages) performs significantly worse than the other models. To test the separation of the time scales (STSRCLA), the RCLA model was extended with a second reservoir with leak rate 1.0. The leak rate of the other reservoir was then re-optimized. This slightly increases performance for the power demand and acceleration profiles, possibly due to the more noisy nature of these signals. For the speed profile the model sometimes seems slightly unstable, as the speed standard deviation is higher than for the other models. To see which models could be useful for a controller, we can look at a few prediction examples in Figure 4.15. The RCNoA model does not predict any of the peaks in the target signal. It only follows the trend of the last few observed values, which often yields a low RMSE, but we expect it would not be very suitable for use in a controller. The other models sometimes slightly follow the peaks in the target signal, but seem to hesitate, because wrongly predicting a lot of brakes causes a high RMSE on the unbalanced data. The models predict an average of the case where the car stops and the case where it keeps driving at a constant speed, with an inclination towards predicting a constant speed. It seems separating both situations could be useful: one model trained for when the car stops, and one for when the car keeps driving. We investigate this in Chapter 6. When comparing the RC models with the best linear models from section 4.2, we can only conclude that using RC does not seem very useful in the setups presented here to predict the power demand and acceleration profiles.


For speed, a much lower RMSE was reached (1.408) in comparison with the best baseline model (1.481). The best power demand and acceleration prediction model remains TDWAtdw (see Table 4.1 and Table 4.2). It became clear at this point that the reliability of the kurtosis difference as an error measure is limited, because the RC models can all be optimized to predict signals with a kurtosis difference around 0, but that doesn't mean the results are truly optimal. The lowest average kurtosis differences found for each RC model are shown in Table A.2.

RMSE (STD)   Power (W)      Speed (m/s)     Acceleration (m/s²)
RCNoA        8693 (6.58)    1.408 (0.017)   0.3165 (0.0007)
RCLA         8416 (4.47)    1.423 (0.018)   0.3130 (0.0005)
RCNLA        8419 (13.45)   1.462 (0.017)   0.3163 (0.0021)
STSRCLA      8398 (3.08)    1.593 (0.304)   0.3113 (0.0004)

Table 4.2: Average RMSE results of the RC-based prediction models. The standard deviation over the reservoir instances is given in parentheses.

Figure 4.15: Example of the output of the RC prediction models.


4.4 Output window post-processing

The training process of the time series consists of only predicting the next step ahead. The weight of the previous value is therefore much higher than that of the profile average at this point. When predicting 200 steps ahead, however, the influence of the last known value could be much lower at the 200th point, and the weight of the averages could increase. To confirm this expectation, all predictions by the RCLA model on the test set are saved. Then, for each prediction, the 200 predicted values are combined with the 200 values of all averages over the 200 predicted meters. To calculate the weights of each value, linear regression is applied separately on each predicted step, and trained to output the target value at that predicted step. The new predicted value y_{new}(t+i) at step i now becomes:

y_{new}(t+i) = w_{i,P} \cdot \hat{y}(t+i) + w_{i,SA} \cdot SA(t+i) + w_{i,FA} \cdot FA(t+i)

in which SA and FA are the averages extracted from the road graph.
Figure 4.16: The weights of the predicted power demand and the slow and fast power averages over the predicted distance.

The resulting weights w_i over all 200 predicted steps i of the predicted power demand, combined with the slow and fast average power demand at each point, are shown in Figure 4.16. The weight ratios evolve over the prediction window; it is therefore possible to further improve the system using a linear combination of the predicted values and the average profiles.


To evaluate the performance of this system, a 5-fold cross-validation is applied on the set of intervals generated by the prediction models on the test set. We test the influence of OWPP on the TDWAtdw, RCNoA and RCLA models.
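The post-processing step amounts to fitting one small least-squares problem per predicted step. A minimal sketch over (n_intervals x 200) arrays, with illustrative names:

```python
import numpy as np

def fit_owpp(pred, slow_avg, fast_avg, target):
    """Fit one least-squares weight triple per predicted step.

    All arguments are (n_intervals, 200) arrays. Returns a (200, 3)
    weight matrix: one row (w_P, w_SA, w_FA) per step ahead.
    """
    n_steps = pred.shape[1]
    weights = np.zeros((n_steps, 3))
    for i in range(n_steps):
        X = np.column_stack([pred[:, i], slow_avg[:, i], fast_avg[:, i]])
        weights[i], *_ = np.linalg.lstsq(X, target[:, i], rcond=None)
    return weights

def apply_owpp(weights, pred, slow_avg, fast_avg):
    """Post-process one predicted 200-step window with the trained weights."""
    return (weights[:, 0] * pred + weights[:, 1] * slow_avg
            + weights[:, 2] * fast_avg)
```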

RMSE (STD)       Power (W)      Speed (m/s)     Acceleration (m/s²)
TDWAtdw          8401           1.502           0.3111
TDWAtdw/OWPP     8408           1.453           0.3084
RCNoA            8693 (6.58)    1.408 (0.017)   0.3165 (0.0007)
RCNoA/OWPP       8406 (4.54)    1.317 (0.016)   0.3093 (0.0002)
RCLA             8416 (4.47)    1.423 (0.018)   0.3130 (0.0005)
RCLA/OWPP        8386 (3.67)    1.311 (0.012)   0.3081 (0.0002)
STSRCLA          8398 (3.08)    1.593 (0.304)   0.3113 (0.0004)
STSRCLA/OWPP     8377 (6.35)    1.423 (0.111)   0.3085 (0.0006)

Table 4.3: Average RMSE with and without output window post-processing on the current best models.

The average RMSE and standard deviation with and without output window post-processing (OWPP) are given in Table 4.3. A significant improvement was found for every profile for the RC-based models by post-processing their output. Applying OWPP to the TDWAtdw model does show an improvement for the speed and acceleration predictions, but the improvement is not as substantial. The RCLA/OWPP model becomes the best time series prediction model for the speed and acceleration. The STSRCLA/OWPP performs slightly better at predicting the power demand.


Chapter 5

Stop prediction
Predicting the peaks in power demand, speed and acceleration remains a problem, especially predicting at what point they will happen. It's possible to know how far the car is from an intersection using the information captured in the road graph, but a car does not always stop exactly at the intersection. With traffic, the car may need to stop 100 m or 200 m away from the intersection, and may even need to wait twice at the same intersection. Furthermore, a car does not always stop at an intersection. The models proposed in Chapter 4 aren't able to fully distinguish between the signal dynamics when the car stops at an intersection and when it doesn't stop (see section 4.3.3). However, instead of predicting the actual power demand, speed or acceleration, the controller may benefit more from knowing whether the car will stop soon, or the chance of that happening, than from knowing exactly when it will stop. We therefore introduce a classification system based on Reservoir Computing which tries to predict at each step whether the car will stop soon or not. Specifically, the model should predict whether or not the car will stop within the next 200 m. Experiments showed that predicting stops any further turned the predictor into a general city-driving classifier, because a car stops frequently in the city, so the chance to stop was always very high during city driving. Although this classification may be interesting as well, we did not investigate it further. Driving environment classification has been researched extensively by others, however. A collection of this research is gathered in D. Prokhorov's book on computational intelligence in automotive applications [22]. Usually these approaches are based on a broad range of general drive cycle parameters such as maximum velocity, trip length, etc. [10]. Feed-forward neural networks have been tried to predict the driving environment by Murphy et al. [20]. Little research


was found that tries to predict stops, however, especially predicting a stop before the car actually starts braking.

5.1 Predicting the chance to stop using RC

We first list some observations that were made while developing the model. First, we noticed that information from the current trip alone was not enough to detect stops 200 m before the intersection, because the car usually only starts slowing down 50 m to 100 m before actually halting. This is illustrated in Figure 5.1. Using information from the road graph is therefore necessary to predict stops before they actually happen.
Figure 5.1: An example of the acceleration profile. The green areas are the areas where the car will stop within 200 m. The actual stop occurs on the right edge of the area. We can see a clear drop in the signal when the car starts braking, but it usually occurs within 100 m of the actual stop.

As a baseline model, to compare with the performance of the presented RC model, we use the average chance that the car is going to stop over the next 200 m. For a lot of sections no stops are recorded, so this baseline model should already provide a good estimate. However, it's interesting to try to predict, when approaching an intersection, whether the car is actually going to stop or not. For this, machine learning techniques could be useful to find underlying correlations between the current trip and the environment information. To approach this problem we introduce an RC-based classification model. The reservoir state outputs are trained using logistic regression. We chose logistic regression over ridge regression because of the advantage specified in section 2.3.2, and because the logistic function output always lies between 0 and 1, which allows us to interpret the model output directly as the predicted chance to stop within 200 m.


The first input of the reservoir is the output of the baseline model: the average chance that the car is stopping within the next 200 m, because this information already showed promising results. Secondly, the slow acceleration averages are used; early experiments showed this average signal was the most significant, but we weren't able to conduct a full feature selection yet. Lastly, of course, we can use the current power demand, speed and acceleration as well. We tested each of these profiles separately as input to the reservoir, as well as using them all together. Furthermore, it's possible to use information from the road graph from a point t+i, i meters further down the road, instead of the information at t. This could improve performance, for example in situations where the car stops a few meters further away from an intersection than it usually does. This gap size i is an extra parameter that needs to be optimized. Additionally, we also tested using multiple future points t+2i and t+3i, but as the results show, this does not improve performance. An illustration of the resulting model is given in Figure 5.2.
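The input construction with a gap can be sketched as follows. The names are illustrative, and for brevity the logistic readout is fit here on the raw inputs of a synthetic trip rather than on reservoir states (the thesis trains the readout with IRLS on reservoir states; scikit-learn's solver differs, but plays the same role).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_stop_inputs(power, speed, accel, slow_acc_avg, stop_chance, gap=20):
    """Stack the classifier inputs at each position t (names illustrative).

    power, speed, accel: current-trip profiles; slow_acc_avg and
    stop_chance: road-graph averages, read `gap` meters ahead (t + 20).
    """
    n = len(power) - gap
    return np.column_stack([power[:n], speed[:n], accel[:n],
                            slow_acc_avg[gap:gap + n],
                            stop_chance[gap:gap + n]])

# In the full model these inputs drive a reservoir and the reservoir
# *states* are classified; here the readout is fit on the raw inputs
# of a synthetic trip purely to show the mechanics.
rng = np.random.RandomState(1)
profiles = [rng.randn(500) for _ in range(5)]
labels = (rng.rand(480) > 0.9).astype(int)          # ~10% stops, as in the data
readout = LogisticRegression().fit(build_stop_inputs(*profiles), labels)
p_stop = readout.predict_proba(build_stop_inputs(*profiles))[:, 1]
```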

Figure 5.2: The stop prediction model with gap size i = 20 m. The current power(t), speed(t) and acceleration(t), together with the average acceleration(t+20) and the chance to stop in [t+20, t+200] from the road graph path predictor, feed the reservoir; an IRLS logistic regression readout outputs the chance to stop over the next 200 meters at point t.


5.2 Evaluation

For evaluation, the Receiver Operating Characteristics are calculated as specified in section 3.3.4, as well as the total area under the ROC curve (AUC). We used the same dataset selected in Chapter 4. In this case, however, we did not need to divide the trips in the dataset into intervals, because no output feedback is used that could affect the next prediction. Additionally, we only needed to evaluate the output of one reservoir, instead of the three reservoirs needed to predict the power demand, speed and acceleration profiles separately. This significantly reduced the time required to evaluate each sample, and therefore allowed us to use cross-validation to optimize the parameters. We could not use a large training set, however, because of the high memory requirements of training a logistic regression readout function using the IRLS algorithm (all reservoir states are saved and processed later). Parameter optimization was therefore done on the first 50% of the dataset of Chapter 4 with a 3-fold cross-validation. The performance of the final model was then evaluated on the remaining 50% as test set.

Reservoir input       Area Under Curve (STD)
Only power demand     0.929 (0.0010)
Only speed            0.944 (0.0009)
Only acceleration     0.919 (0.0015)
All profiles          0.952 (0.0007)

Table 5.1: The AUC of the RC model using different reservoir inputs on the test set.

Based on the results in Table 5.1, we decided to use the power demand, acceleration and speed profiles together as input to the reservoir. The effect of using the road graph information at point t+i instead of t can be seen in Figure 5.3. The optimal gap size is found at i = 20. Adding an extra time frame t+2i or t+3i did not improve the result. The optimal reservoir parameters used for the RC model can be found in Table A.9 in the appendix.


Figure 5.3: The effect of gap width i and using several time frames on the resulting AUC after cross-validation on the training set.

Model      Area Under Curve (STD)   Minimum miss rate (STD)
Baseline   0.866                    0.077
RC Model   0.955 (0.0004)           0.055 (0.0003)

Table 5.2: The AUC of the baseline model and the average AUC of the RC model over 10 instances (with standard deviation), and the minimum misclassification rate achieved.

For the baseline model, the average is taken over the AUC of each trip separately. For the RC model these results are averaged again over the 10 reservoir instances; the standard deviation of the AUC over the 10 reservoirs is given in parentheses. The corresponding ROC curves are shown in Figure 5.4. From these numbers we conclude that the RC model works significantly better as a classifier than the baseline model. It's also useful to optimize a threshold to calculate the minimum misclassification rate. The threshold is optimized on the training set over 10 instances of the RC model. From Figure 5.5 we conclude that the optimal threshold to classify stops is 0.60 for the baseline model, while for the RC model the optimal threshold is found at 0.50. The misclassification rate with this threshold on the test set for each model is given in Table 5.2.

The output of the RC model is compared with that of the baseline model in Figure 5.6. One of the RC model inputs is the output of the baseline model, which is the average


Figure 5.4: ROC curves of the baseline vs. the proposed Reservoir Computing model

chance to stop within 200 m. The influence of this input is clearly noticeable: some of the peaks are very similar in both models. However, these peaks are sometimes poorly aligned in the baseline model, such as the peak at meter 1400 in Figure 5.6, because the car usually doesn't stop at the exact same point before an intersection. When comparing the baseline model with the RC model output at this point, we see the model can correct the poor alignment by also incorporating the current acceleration: the RC model's output chance to stop increases very fast when the car is actually stopping. Furthermore, at meter 2300, a stop occurs that was not clearly detected by either pattern, but the RC model still detects the stop when the car actually brakes. Additionally, we can see in the baseline model output that the average chance to stop within 200 m becomes very high right after meter 2300. The RC model, however, seems to know at this point that a stop just occurred and that it's unlikely the car will stop again so soon. Lastly, around 4000 m, the average chance to stop is fairly high, but because the speed remains high, the predicted chance to stop remains low. More examples are given in Appendix B.


Figure 5.5: Misclassification error rates for the baseline and RC models over the thresholds [0, 1]


Figure 5.6: Example of the chance to stop predicted by the baseline model and the RC model. The green areas are the target areas where the car stops within 200 m. At the bottom, the speed profile of the trip is given for comparison with the actual car behavior. The chosen threshold line is shown as a grey dashed line.


Chapter 6

Splitting the system for stopping and driving behavior


The models presented in Chapter 4 struggled to distinguish between situations where the car stops in the near future and situations where it keeps driving at a fairly constant speed. Using the results presented in Chapter 5, however, it's now possible to split the models presented in Chapter 4.

6.1 The model setup

To separate a model based on whether or not the car stops over the next 200 m, the dataset was divided into a slow set (if the car velocity is lower than 2 m/s at any point over the next 200 m) and a fast set (if the car does not stop over the next 200 m). One model (the slow model) is then trained on the slow set, while another (the fast model) is trained on the fast set. Additionally, a parameter optimization was done as in Chapter 4 for each model individually. The influence of the classifiers can now be evaluated by predicting, for every prediction interval, whether the car will stop within the next 200 m. If the classifier predicts the car will stop, the prediction of the slow model is used; otherwise, the prediction of the fast model is used. We first test what performance can be achieved by splitting the models using a perfect oracle classifier with a 0% error rate. Secondly, the real classifier presented in Chapter 5 is used. The split system using this classifier is referred to as the RCSP model. We use the hard threshold optimized in Chapter 5 to classify the output of the classifier model at the last observed point. This model is illustrated in Figure 6.1.


The threshold could be optimized for this application specifically, but these results could not yet be fully investigated. After evaluating these systems, the output window post-processing filter (OWPP) proposed in section 4.4 is applied again. This filter can be trained on the separated slow and fast datasets as well, further improving the proposed model.
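The dispatch logic of the split system is simple; a sketch with illustrative interfaces:

```python
STOP_THRESHOLD = 0.5   # threshold optimized for the RC stop predictor

def predict_interval(stop_predictor, slow_model, fast_model, history, averages):
    """Dispatch one 200 m prediction to the slow or the fast model.

    A sketch of the RCSP setup with hypothetical interfaces: the stop
    predictor returns the chance to stop at the last observed point;
    the chosen model then predicts the next 200 m window (to which the
    OWPP filter trained on the matching dataset would be applied).
    """
    p_stop = stop_predictor(history, averages)
    model = slow_model if p_stop >= STOP_THRESHOLD else fast_model
    return model(history, averages)
```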
Figure 6.1: The RCSP/OWPP model to predict the power demand, speed and acceleration profiles of a vehicle at the next point y(t), and recursively y(t+1), ..., y(t+n). The RC stop predictor, fed with the current trip history, the slow average acceleration(t+20) and the chance to stop in [t+20, t+220], selects the slow or the fast model, whose output is post-processed with OWPP.

6.2 Evaluation

To evaluate this system, we split a model similar to the RCLA model from Chapter 4. In this chapter, instead of using the averages found with feature selection, we use the slow averages from the road graph as additional input to the slow model, and the fast averages as additional input to the fast model. For the experiments, each model's output predictions on the test set are generated. Each prediction of the test set is therefore generated twice: once by the slow model, and once by the fast model. Furthermore, each prediction is generated by 10 different reservoir instances to make sure we can draw general conclusions.


Figure 6.2: Average absolute deviation of the RCSP and RCSP/OWPP models over the predicted distance. The RCLA models are given to compare with previous models.

As mentioned before, parameter optimization was done as in Chapter 4, but now for each model individually. The resulting parameters used for this chapter can be found in Table A.8 in the appendix. In Table 6.1, the RMSE results of the RCLA and RCLA/OWPP, and the STSRCLA and STSRCLA/OWPP models from section 4.4 are given again for comparison. In Figure 6.2 the absolute deviation is given as the predicted distance increases. The results for the oracle model show a clear improvement over the other models for the acceleration and speed profiles. With output window post-processing (Oracle/OWPP), also trained separately on the slow and fast datasets, the improvement is more significant, especially for predicting the power demand. Unfortunately, these results are only valid for the oracle model, but they do suggest an improvement is possible. In Figure 6.3, an example is given of the RCSP model's performance right before stopping. The predicted speed profile clearly predicts the car slowing down. The RCSP/OWPP model with output window post-processing matches the observed signal even better, but doesn't predict the braking as clearly. The predicted profile is quite noisy because there are fewer intervals where the car stops on which to train the OWPP filter. Adding a regularization parameter did not seem to significantly improve the result, however.


RMSE (STD)       Power (W)       Speed (m/s)     Acceleration (m/s²)
RCLA             8416 (4.47)     1.423 (0.018)   0.3130 (0.0005)
RCLA/OWPP        8386 (3.67)     1.311 (0.012)   0.3081 (0.0002)
STSRCLA          8398 (3.08)     1.593 (0.304)   0.3113 (0.0004)
STSRCLA/OWPP     8377 (6.35)     1.423 (0.111)   0.3085 (0.0006)
RCSP             8367 (25.6)     1.304 (0.017)   0.3023 (0.0005)
RCSP/OWPP        8257 (12.48)    1.257 (0.016)   0.2992 (0.0006)
Oracle           8428 (8.56)     1.274 (0.017)   0.3082 (0.0006)
Oracle/OWPP      8251 (18.47)    1.162 (0.015)   0.297 (0.0013)

Table 6.1: Average RMSE error rates of the RCSP and RCSP/OWPP models. The RCLA and STSRCLA models are given to compare with previous models. The results of the Oracle model are shown to compare with the performance of the split system with a perfect classifier.

6.2.1 RCSP Results

The RCSP model uses the stop predictor presented in Chapter 5. The results of this model are not as significant as those of the hypothetical oracle, but it still offers a clear improvement over the RCLA model. After the output window post-processing, the average RMSE of the RCSP/OWPP model is brought down to 8257 W for the power demand profile, 1.257 m/s for the speed profile, and 0.2992 m/s² for the acceleration profile. This is therefore the best performing model found in this thesis. Furthermore, when inspecting the examples, we see that peaks are often much more pronounced, which suggests this system could be quite suitable for a controller.

[Figure: three panels showing power (kW), speed (km/h) and acceleration (m/s²) against predicted distance (−100 m to 200 m) for the target signal and the predicted outputs.]
Figure 6.3: Example of the output of the RCLA, RCLA/OWPP, STSRCLA, STSRCLA/OWPP models from Chapter 4, and the RCSP and RCSP/OWPP models.


Chapter 7

Conclusion
First, a road graph data structure for automatic GPS map generation was adapted to capture local vehicle behavior information. The profiles for speed, acceleration and power demand are separated into a slow and a fast set, depending on whether the car stops on the road segment. The road graph allowed the prediction models to use the averages of previous trips for the prediction of the current trip.

Secondly, a number of simple linear models were compared with RC-based models. It became clear that any model that predicts the speed to remain constant already results in a low RMSE, but we expect these predictions are not useful for controllers because of the lack of peaks in the predictions. The overall best RC-based model from Chapter 4 was the RCLA model with output window post-processing (OWPP), with an RMSE of 8386 W for the power demand, 1.31 m/s for the speed, and 0.308 m/s² for the acceleration profile. However, the RC-based models still seemed incapable of distinguishing between situations where the car stops within the next 200m and where it doesn't.

In Chapter 5, a model was presented to classify the prediction intervals by predicting whether the car will stop within the first 200m. The area under the ROC-curve of this model reached a maximum of 0.955 on the test set and a minimum misclassification rate of 5.5%, tested on a dataset in which 10% of the samples are actual stops.

Finally, a modified RCLA/OWPP model was trained twice, separately on a slow dataset and a fast dataset. The classifier presented in Chapter 5 was then used to determine which of the two models to use to predict the next 200m. The results of this model outperformed any other model presented in this work.


The power demand profile RMSE was brought down to 8257 W, the speed profile RMSE to 1.25 m/s, and the acceleration profile RMSE to 0.299 m/s².

In an attempt to quantify the usefulness of a prediction model, a secondary error measure, the kurtosis difference, was introduced. The conclusions that can be drawn from it are mixed. A small kurtosis difference does not mean the prediction covers the same spectrum as the observed signal, especially for the acceleration and power demand. For predicting the speed, however, the models that are expected to be more suitable in a controller always had a smaller kurtosis difference than less useful models. Furthermore, these predictions seemed capable of predicting some peaks.

We believe further significant improvements could be made with research focused on improving the road graph structure towards an integrated intelligent driving environment, with vehicles exchanging information with each other about the road ahead. More research could also be done on confirming the results with controllers and investigating which predictions are most useful. Many more predictions could be made as well, should more data become available: traffic situation prediction, image processing to detect whether an intersection light is red or green, and so on. This research could not only improve energy efficiency, but aid automated driving in complex situations as well.


Appendix A

Extra tables
A.1 Kurtosis difference results
Model     Power   Speed    Acceleration
LV        2.711   2.494    2.735
SA        0.566   0.420    0.345
FA        0.072   0.166    0.008
OAf       0.072   0.166    0.008
TDW       2.660   0.6944   1.643
TDWAt     2.099   0.290    0.818
TDWAtdw   1.936   0.278    0.750

Table A.1: Average kurtosis difference, Kurt(predicted) − Kurt(observed), of the baseline prediction models.
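For reference, the kurtosis difference reported in these tables can be computed as in the small sketch below. Whether the Fisher (excess) or Pearson definition of kurtosis was used is not stated in this excerpt, so the scipy default here is an assumption.

```python
from scipy.stats import kurtosis

# A small sketch of the kurtosis difference used as secondary error
# measure. Note that scipy's kurtosis() returns the excess (Fisher)
# kurtosis by default; this choice of definition is an assumption.
def kurtosis_difference(predicted, observed):
    return kurtosis(predicted) - kurtosis(observed)
```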

Model     Power            Speed            Acceleration
RCNoA     2.093 (0.101)    0.223 (0.037)    1.166 (0.146)
RCLA      1.656 (0.051)    0.239 (0.026)    0.765 (0.286)
RCNLA     1.624 (0.082)    0.195 (0.023)    0.353 (0.084)
STSRCLA   1.840 (0.0870)   0.264 (0.00)     1.174 (0.0836)

Table A.2: Average kurtosis difference of the RC-based prediction models. The standard deviation over the reservoir instances is given in parentheses.


Model          Power           Speed           Acceleration
TDWAtdw        1.936           0.278           0.750
TDWAtdw/OWPP   1.495           0.074           0.894
RCNoA          2.093 (0.101)   0.223 (0.037)   1.166 (0.146)
RCNoA/OWPP     1.156 (0.017)   0.113 (0.030)   0.893 (0.019)
RCLA           1.656 (0.051)   0.239 (0.026)   0.765 (0.286)
RCLA/OWPP      1.354 (0.015)   0.081 (0.038)   0.975 (0.021)
STSRCLA        1.907 (0.073)   0.246 (0.059)   1.441 (0.620)
STSRCLA/OWPP   1.482 (0.057)   0.086 (0.105)   0.975 (0.035)

Table A.3: Average kurtosis difference for several models with and without output window post-processing.

Model         Power (W)       Speed (m/s)      Acceleration (m/s²)
RCSP          1.691 (0.081)   0.117 (0.037)    0.710 (0.222)
RCSP/OWPP     1.359 (0.048)   0.056 (0.038)    0.483 (0.073)
Oracle        1.753 (0.090)   -0.151 (0.051)   0.785 (0.240)
Oracle/OWPP   1.519 (0.050)   0.238 (0.046)    0.756 (0.083)

Table A.4: Average kurtosis difference of the RCSP and RCSP/OWPP models. The results of the Oracle model are shown to compare with the performance of the split system with a perfect classifier.

A.2 Model parameters
Parameter                                   Value
Max. road segment angle difference          [40, 40] degrees
Max. dist. from road segment to new node    0.0002 degrees
2-d distance tree max. node distance        0.002 degrees

Table A.5: Node merge rules used for building the road graph. See Cao et al. [8].


Parameter         Power profile   Speed profile   Acceleration profile
Reservoir size    150             150             150
Input scaling     10^-2.5         10^-1.5         10^-3
Leak rate         0.1             0.05            0.0625
Spectral radius   0.8             0.9             0.9
Bias scaling      0               0               0

Regularization param. interval: 10^[-8,-3]

Table A.6: Parameters used for the RCNoA and RCLA reservoirs.
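To illustrate the roles of these parameters, the sketch below builds a leaky-integrator reservoir with the power-profile values of Table A.6 (reservoir size 150, input scaling 10^-2.5, leak rate 0.1, spectral radius 0.8, zero bias). It uses one common ESN formulation and is not the exact implementation used in the thesis.

```python
import numpy as np

# A minimal stand-alone sketch of a leaky-integrator reservoir update.
rng = np.random.default_rng(0)
n_neurons, leak_rate, spectral_radius, input_scaling = 150, 0.1, 0.8, 10**-2.5

W = rng.standard_normal((n_neurons, n_neurons))
W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))    # rescale recurrent weights
W_in = input_scaling * rng.uniform(-1.0, 1.0, (n_neurons, 1))  # scaled input weights

def update(x, u):
    # Leaky integration: mix the previous state with the new activation.
    return (1 - leak_rate) * x + leak_rate * np.tanh(W @ x + W_in @ u)

# One step: x has shape (150,), u is the current 1-d input sample.
x = update(np.zeros(n_neurons), np.array([0.5]))
```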


Parameter         Power profile   Speed profile   Acceleration profile
Reservoir size    150             150             150
Input scaling     0.01            0.01            0.1
Leak rate         0.125           0.125           0.5
Spectral radius   0.8             0.5             0.5
Bias scaling      0               0               1

Regularization param. interval: 10^[-8,-3]

Table A.7: System with nonlinear input of averages.


Parameter         Power profile   Speed profile   Acceleration profile
Reservoir size    150 / 150       150 / 150       150 / 150
Input scaling     10^-2.5 / 0.1   0.03 / 0.1      0.05 / 0.1
Leak rate         0.125 / 0.05    0.15 / 0.08     0.125 / 0.1
Spectral radius   0.9 / 0.8       0.99 / 0.8      0.5 / 0.8
Bias scaling      0 / 0           0 / 0           1.5 / 0

Regularization param. interval: 10^[-8,-3]

Table A.8: System optimized for the fast/slow sections. The parameters for the fast model are shown on the left of each pair, those for the slow model on the right.

Parameter                         Value
Reservoir size                    150
Input scaling                     0.1
Leak rate                         0.3
Spectral radius                   0.6
Bias scaling                      0
Gap size i                        20
Regularization param. interval    10^[-8,-3]

Table A.9: RC stop prediction model.
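As a rough sketch of how the classification targets of this model could be derived from a speed profile interpolated every meter, consider the following. The stop-speed threshold and the exact stop definition are assumptions, and the gap parameter of Table A.9 (inputs taken 20 m ahead) is not modeled here.

```python
import numpy as np

# A small sketch of binary stop targets: label sample t positive if the
# car (nearly) stops within the next `window` meters. The stop_speed
# threshold is an illustrative assumption.
def stop_targets(speed, window=200, stop_speed=1.0):
    stopped = np.asarray(speed) < stop_speed
    return np.array([
        int(stopped[t:t + window].any()) for t in range(len(stopped))
    ])
```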


Appendix B

Extra figures
B.1 Stop prediction model examples

[Figure: predicted chance to stop (top) and speed in km/h (bottom) against distance, over a trip of about 3000 m.]

Figure B.1: The green areas are the target areas where the car stops within 200m. At the bottom, the speed profile of the trip is given.


[Figure: predicted chance to stop (top) and speed in km/h (bottom) against distance, over a trip of about 5000 m.]

Figure B.2: The green areas are the target areas where the car stops within 200m. At the bottom, the speed profile of the trip is given.

[Figure: predicted chance to stop (top) and speed in km/h (bottom) against distance, over a trip of about 2000 m.]

Figure B.3: The green areas are the target areas where the car stops within 200m. At the bottom, the speed profile of the trip is given.


[Figure: predicted chance to stop (top) and speed in km/h (bottom) against distance, over a trip of about 1200 m.]

Figure B.4: The green areas are the target areas where the car stops within 200m. At the bottom, the speed profile of the trip is given.

B.2 Time series prediction examples

[Figure: power (kW), speed (km/h) and acceleration (m/s²) against predicted distance (−100 m to 200 m) for the target signal and the predicted outputs.]
Figure B.5: Example of the output of the RCLA, RCLA/OWPP, STSRCLA, STSRCLA/OWPP models from Chapter 4, and the RCSP and RCSP/OWPP models.


[Figure: power (kW), speed (km/h) and acceleration (m/s²) against predicted distance (−100 m to 200 m) for the target signal and the predicted outputs.]
Figure B.6: Example of the output of the RCLA, RCLA/OWPP, STSRCLA, STSRCLA/OWPP models from Chapter 4, and the RCSP and RCSP/OWPP models.

[Figure: power (kW), speed (km/h) and acceleration (m/s²) against predicted distance (−100 m to 200 m) for the target signal and the predicted outputs.]
Figure B.7: Example of the output of the RCLA, RCLA/OWPP, STSRCLA, STSRCLA/OWPP models from Chapter 4, and the RCSP and RCSP/OWPP models.


[Figure: power (kW), speed (km/h) and acceleration (m/s²) against predicted distance (−100 m to 200 m) for the target signal and the predicted outputs.]
Figure B.8: Example of the output of the RCLA, RCLA/OWPP, STSRCLA, STSRCLA/OWPP models from Chapter 4, and the RCSP and RCSP/OWPP models.


Bibliography
[1] ChargeCar community at chargecar.org.
[2] Scholarpedia article on echo state networks.
[3] D. Barber. Bayesian Reasoning and Machine Learning. Cambridge University Press, 2011. In press.
[4] R. Bartholomaeus, A. Fischer, and M. Klingner. Real-time predictive control of hybrid fuel cell drive trains, volume 54, page 258. 2007.
[5] Christopher M. Bishop. Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2006.
[6] H. Benjamin Brown, Illah Nourbakhsh, Christopher Bartley, Jennifer Cross, Paul S. Dille, Joshua Schapiro, and Alexander Styler. ChargeCar community conversions: Practical, custom electric vehicles now! March 2012.
[7] Pieter Buteneers. De detectie van epileptische aanvallen met reservoir computing [The detection of epileptic seizures with reservoir computing]. Master's thesis, Ghent, 2008.
[8] Lili Cao and John Krumm. From GPS traces to a routable road map. In Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS '09, pages 3–12, New York, NY, USA, 2009. ACM.
[9] Mark Delucchi and Timothy Lipman. An analysis of the retail and lifecycle cost of battery-powered electric vehicles. Working paper series, Institute of Transportation Studies, UC Davis, 2001.
[10] F. DiGenova, R. Dulla, Sierra Research (Firm), and California Dept. of Transportation. SCF Improvement Cycle Development. Sierra Research, Incorporated, 2002.
[11] Dennis Doerffel and Suleiman Abu-Sharkh. A critical review of using the Peukert equation for determining the remaining capacity of lead-acid and lithium-ion batteries. Journal of Power Sources, 155(2):395–400, 2006.
[12] Bradley Efron, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. Least angle regression. Annals of Statistics, 32:407–499, 2004.
[13] Tom Fawcett. ROC graphs: Notes and practical considerations for researchers. Technical report, 2004.
[14] H. Jaeger. Short term memory in echo state networks. GMD Report 152, German National Research Center for Information Technology, 2002.
[15] H. Jaeger. Tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the echo state network approach. Technical report, Fraunhofer Institute AIS, St. Augustin, Germany, 2002.
[16] Herbert Jaeger. The echo state approach to analysing and training recurrent neural networks - with an erratum note. 2001.
[17] Herbert Jaeger and Harald Haas. Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science, 304(5667):78–80, 2004.
[18] L. Johannesson and B. Egardt. A novel algorithm for predictive control of parallel hybrid powertrains based on dynamic programming. Advances in Automotive Control, 5(1), 2007.
[19] Songrit Maneewongvatana and David M. Mount. Analysis of approximate nearest neighbor searching with clustered point sets. CoRR, cs.CG/9901013, 1999.
[20] Yi Lu Murphey, ZhiHang Chen, Leonadis Kiliaris, Jungme Park, Ming Kuang, M. Abul Masrur, and Anthony M. Phillips. Neural learning of driving environment prediction for vehicle power management. In IJCNN, pages 3755–3761, 2008.
[21] T. Natschläger, W. Maass, and H. Markram. The "liquid computer": A novel strategy for real-time computing on time series. Special Issue on Foundations of Information Processing of TELEMATIK, 8(1):39–43, 2002.
[22] Danil V. Prokhorov, editor. Computational Intelligence in Automotive Applications, volume 132 of Studies in Computational Intelligence. Springer, 2008.
[23] D.B. Rubin. Iteratively reweighted least squares. Encyclopedia of Statistical Sciences, 4:272–275, 1983.
[24] J. Schindall. The charge of the ultracapacitors. IEEE Spectrum, 44(11):42–46, 2007.
[25] A. N. Tikhonov. Solution of incorrectly formulated problems and the regularization method. Soviet Math. Dokl., 4:1035–1038, 1963.
[26] Gertjan Van Droogenbroeck. Invloed van decompositiemethoden bij het voorspellen van tijdsreeksen met reservoir computing [Influence of decomposition methods in the prediction of time series with reservoir computing]. Master's thesis, 2010.
[27] David Verstraeten. Reservoir Computing: computation with dynamical systems. PhD thesis, Ghent University, 2009.
[28] David Verstraeten, Benjamin Schrauwen, Michiel D'Haene, and Dirk Stroobandt. An experimental unification of reservoir computing methods. Neural Networks, 20(3):391–403, 2007.
[29] David Verstraeten, Benjamin Schrauwen, Sander Dieleman, Philemon Brakel, Pieter Buteneers, and Dejan Pecevski. Oger: Modular learning architectures for large-scale sequential processing. Journal of Machine Learning Research, submitted.
[30] P. Werbos. Backpropagation through time: What it does and how to do it. In Proceedings of the IEEE, volume 78, pages 1550–1560, 1990.
[31] Francis wyffels and Benjamin Schrauwen. A comparative study of reservoir computing strategies for monthly time series prediction. Neurocomputing, 73(10-12):1958–1964, 2010.
[32] Francis wyffels, Benjamin Schrauwen, and Dirk Stroobandt. Stable output feedback in reservoir computing using ridge regression. In V. Kurkova, R. Neruda, and J. Koutnik, editors, Proceedings of the 18th International Conference on Artificial Neural Networks, pages 808–817, Prague, 2008. Springer.
[33] Tiziano Zito, Niko Wilbert, Laurenz Wiskott, and Pietro Berkes. Modular toolkit for data processing (MDP): a Python data processing framework. Frontiers in Neuroinformatics, 2:7, 2009.

