ROLLING COSTS:

Exploring CPI trends in Public Transportation and Fuel

Overview

Background of the Study

Currently, 80% of Filipinos living in urban areas rely on public transportation. In Metro Manila alone, 70% of people depend on public transportation to get from place to place. Every day, 8,959,000 commuters ride jeepneys and 1,865,000 ride buses.

While there may be other reasons why Filipinos choose to commute, the main benefit of public transportation is that it is far more affordable than owning a private vehicle and constantly spending money on fuel, maintenance, and parking fees. According to a 2018 Consumer Finance Survey, the monthly expenditure level in Metro Manila was P27,507. Of this, 10.5% was spent on transportation. Most Filipino commuters are from low- to middle-income classes. The minimum wage in the Philippines varies per province, with Metro Manila having the highest at P610/day or around P18,300/month; thus, it is difficult for many to allocate such a large budget for transportation.

After Russia’s invasion of Ukraine, the price of gas and fuel surged worldwide. Data released by the Department of Energy shows that the Philippines imports 77.4% of its gas, 80% of its diesel, and 84% of its kerosene. As a result, the global increase in gas prices has led to a significant increase in local gas prices, which has added considerably to the financial burden of Filipinos.

References: [1] [2] [3] [4] [5] [6] [7] [8]



Problem

Many Filipinos are reliant on public transportation as a means to travel to different destinations. Because of this, they are heavily affected by changes in its prices.

Objectives

  • To analyze the CPI of public transportation in the Philippines over the past five years.
  • To determine the relationship between the CPI of fuel/gas and public transportation over the past five years.

Solution

Through data science, the group aims to draw insights regarding the state of public transportation prices in the Philippines and their implications for Filipino commuters.

Primary Research Question

What is the overall trend of the CPI of public transportation over the past five years?

Null Hypothesis

The CPI of public transportation has not increased over the past five years.

Alternative Hypothesis

The CPI of public transportation has increased over the past five years.

Secondary Research Question

Is there a relationship between the CPI of fuel/gas and public transportation over the past five years?

Null Hypothesis

There is no relationship between the CPI of fuel/gas and public transportation over the past five years.

Alternative Hypothesis

There is a relationship between the CPI of fuel/gas and public transportation over the past five years.

Data & Methods

Data Collection

To address the overarching problem and the proposed research questions, our group used the data set “Consumer Price Index for All Income Households by Commodity Group (2018=100): January 2018 - March 2024” from the OpenStat website, an initiative of the Philippine Statistics Authority.

The data set as a whole is massive: it lists the Consumer Price Index (CPI) of various commodities under different commodity groups, with the CPI computed monthly per province or region.

Given the scope of our study, we modify the data set by reformatting it as a whole and then preprocessing it. Details about our data set are presented below.



Step 0  :  Preparation of the Data Set
At this step, we first extracted the relevant data from the original data set. After doing so, we manually removed excess rows and filled in the missing Geolocation and Year entries. We then restructured the data set into a different format to make data access easier.

Step 1  :  Preprocessing
After preparing our data set, we performed preprocessing to ensure that the data is clean and consistent. During this step, we found that only the CPI column has null values, and we took steps to address them. Entries with a null CPI are either dropped or replaced by an aggregate value, depending on how many entries of the same Geolocation are also null. Finally, all CPI values are converted into floats, as they are initially in string format.
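As a rough illustration, the null-handling rule described above could be sketched in Python with pandas as follows. The toy rows, the 50% threshold, and the use of the per-Geolocation mean as the aggregate value are assumptions made for this example; they are not necessarily the exact choices used in the study.

```python
import pandas as pd

# Hypothetical toy rows, NOT the study's data. CPI starts out as strings.
raw = pd.DataFrame({
    "Geolocation": ["NCR", "NCR", "NCR", "Region X", "Region X", "Region X"],
    "CPI": ["101.5", None, "103.5", None, None, "99.0"],
})

# Assumed rule: if more than half of a Geolocation's CPI entries are null,
# drop its null rows; otherwise fill them with that Geolocation's mean CPI.
MAX_NULL_SHARE = 0.5

df = raw.copy()
# Convert strings to floats; missing or unparsable values become NaN.
df["CPI"] = pd.to_numeric(df["CPI"], errors="coerce")

# Share of null CPI entries and mean CPI, both computed per Geolocation
# and broadcast back to every row of that Geolocation.
null_share = df["CPI"].isna().astype(float).groupby(df["Geolocation"]).transform("mean")
group_mean = df.groupby("Geolocation")["CPI"].transform("mean")

# Drop null rows in heavily-null groups, then fill the remaining nulls.
df = df[~(df["CPI"].isna() & (null_share > MAX_NULL_SHARE))].copy()
df["CPI"] = df["CPI"].fillna(group_mean)

print(df)
```

In this toy case, the single missing NCR value is filled with the NCR mean, while the two missing Region X rows are dropped because most of that group is null.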

About the data set

After preprocessing and reformatting, our data set contains a total of 44,850 entries. Moreover, the data set now has five columns instead of the two found in the raw data set. From left to right, these columns are:

Geolocation
Commodity Description
CPI
Year
Month

We consider the years 2019 to 2023, since we specifically wanted to know the transportation situation over the past five years. The year 2024 was excluded because its data is still incomplete. The months we consider are January to December, with an additional average value. Since every entry represents a distinct CPI value, no sampling method is used here.

Exploratory Data Analysis (EDA)

Now that our data set is cleaned and well-formatted, we can proceed to exploratory data analysis. This step involves creating clear visualizations to help us understand the data and get a first indication of whether our hypotheses hold. Following this, we perform hypothesis testing to see what results we can derive from the data.



Step 2  :  Data Visualization
For this study, we designed two plots to help us answer our research questions and understand our project visually.

Plot 1


What is the overall trend of the CPI of public transportation over the past five years?

From our visualization, we can see that the CPI of all transportation groups trends upward over the years. Generally, the CPI of each group rises continuously until it plateaus, then rises again after some time.



Plot 2


Is there a relationship between the CPI of fuel/gas and public transportation over the past five years?

From this plot, we can see that the CPI for both fuel and land transportation seem to be going upward, albeit at different rates. This suggests that they may be correlated with each other. Nevertheless, it is worth noting that the CPI for fuel fluctuates, whereas the CPI for land transportation is relatively more stable.



Step 3  :  Hypothesis Testing
Before choosing a statistical test for each research question, we first used a Q-Q plot to check the normality of our data. For both research questions, the data was found not to follow a normal distribution. Thus, from the tests we have learned, we chose non-parametric ones that do not assume normality.
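To illustrate the idea behind a Q-Q check, the sketch below uses `scipy.stats.probplot`, which pairs theoretical normal quantiles with the ordered sample values and fits a straight line through them. Both samples are synthetic and generated only for this example; the study's actual check was done on its own CPI data, and whether it used this function is an assumption.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic samples, NOT the study's data.
normal_like = rng.normal(loc=100.0, scale=5.0, size=300)  # roughly normal
skewed = 100.0 + rng.exponential(scale=5.0, size=300)     # right-skewed

# probplot returns ((theoretical quantiles, ordered values), (slope, intercept, r)).
# r is the correlation of the Q-Q points with the fitted line; values near 1
# mean the points lie close to a straight line, as expected under normality.
(_, _), (_, _, r_normal) = stats.probplot(normal_like, dist="norm")
(_, _), (_, _, r_skewed) = stats.probplot(skewed, dist="norm")

print(f"normal-like sample: r = {r_normal:.3f}")
print(f"skewed sample:      r = {r_skewed:.3f}")
```

A visibly curved Q-Q plot (or a noticeably lower r) is the kind of evidence that motivates switching to tests that do not rely on normality.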

What is the overall trend of the CPI of public transportation over the past five years?

Null hypothesis:  The CPI of public transportation has not increased over the past five years.
Alternative hypothesis:  The CPI of public transportation has increased over the past five years.

The Mann-Whitney U test was then used to decide whether to reject the null hypothesis. We obtained a p-value of 7.714x10^(-6), which is less than our set significance level of 0.05. Thus, we reject the null hypothesis.

This implies that the CPI of passenger transportation has increased significantly over the past five years.
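For readers unfamiliar with the test, a minimal sketch of a one-sided Mann-Whitney U test with `scipy.stats.mannwhitneyu` is given below. The two samples are made-up values, and comparing an earlier period against a later one is an assumption about how the samples could be formed; the exact grouping used in the study is not restated here.

```python
from scipy.stats import mannwhitneyu

# Made-up monthly CPI values, NOT the study's data.
cpi_earlier = [100.2, 100.5, 100.9, 101.0, 101.3, 101.8]
cpi_later = [112.4, 112.9, 113.5, 114.0, 114.2, 115.1]

# One-sided alternative: values in the later period tend to be larger.
u_stat, p_value = mannwhitneyu(cpi_later, cpi_earlier, alternative="greater")

ALPHA = 0.05  # significance level used throughout the study
reject_null = p_value < ALPHA

print(f"U = {u_stat}, p = {p_value:.5f}, reject H0: {reject_null}")
```

Because the test works on ranks rather than raw values, it remains valid even when the CPI values are not normally distributed.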

Is there a relationship between the CPI of fuel/gas and public transportation over the past five years?

Null hypothesis:  There is no relationship between the CPI of fuel/gas and public transportation over the past five years.
Alternative hypothesis:  There is a relationship between the CPI of fuel/gas and public transportation over the past five years.

The Spearman correlation test was then used to decide whether to reject the null hypothesis. We obtained a p-value of 3.626x10^(-12), which is less than our set significance level of 0.05. Thus, we reject the null hypothesis. Additionally, the Spearman coefficient rho of 0.7336 tells us that there is a positive correlation between the CPI of fuel and public transportation: both tend to increase together over time.

This implies that there is an observed correlation between the CPI of fuel and public transportation over the past five years, from 2019 to 2023.
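As a small illustration of how such a test can be run, the sketch below applies `scipy.stats.spearmanr` to two short, made-up series. The numbers are invented for this example and are unrelated to the rho and p-value reported above.

```python
from scipy.stats import spearmanr

# Made-up paired monthly CPI values, NOT the study's data.
fuel_cpi = [100.0, 104.0, 98.0, 120.0, 135.0, 128.0]
transport_cpi = [100.0, 101.0, 100.5, 103.0, 110.0, 112.0]

# Spearman's rho is the Pearson correlation of the ranks, so it measures
# whether the two series move together monotonically, not necessarily linearly.
rho, p_value = spearmanr(fuel_cpi, transport_cpi)

ALPHA = 0.05
print(f"rho = {rho:.4f}, p = {p_value:.4f}, reject H0: {p_value < ALPHA}")
```

A positive rho close to 1 means that months with a higher fuel CPI also tend to have a higher transportation CPI; it does not by itself show that one causes the other.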

Modelling & Interpretation

After exploratory data analysis, we apply machine learning to achieve the following:

Primary Research Question

  1. To better show the trend of CPI of Land Transportation from 2019 to 2023, and
  2. To predict the possible trend of CPI values in 2024, based on this information

Secondary Research Question

  1. To better show the trend of CPI of Fuel from 2019 to 2023, and
  2. To predict the possible trend of CPI values in 2024, based on this information


To do this, we use a regression model, Support Vector Regression. Specifically, we try to fit a polynomial of degree n to predict the values.

Primary Research Question

Input Feature: Months column
Output Feature: CPI column

Our model examines the relationship between the CPI of Land Transportation and time (represented as the integer number of months elapsed since the earliest date in our data). Once the model has been fitted to the data, it predicts the CPI of land transportation for January 2024 to December 2024.
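A minimal sketch of this setup with scikit-learn is shown below. Using `SVR` with `kernel="poly"` is one way to fit a polynomial of a chosen degree with Support Vector Regression; the synthetic CPI series, the degree, and the values of `C`, `epsilon`, and `coef0` are all illustrative assumptions rather than the settings used in the study.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Made-up series of 60 monthly CPI values (2019 to 2023), NOT the study's data.
months = np.arange(60).reshape(-1, 1)   # months elapsed since January 2019
cpi = 100.0 + 0.3 * months.ravel()      # simple upward trend for illustration

# Scaling the month index keeps the polynomial kernel numerically well-behaved.
# degree, coef0, C, and epsilon are illustrative choices only.
model = make_pipeline(
    StandardScaler(),
    SVR(kernel="poly", degree=2, coef0=1.0, C=100.0, epsilon=0.1),
)
model.fit(months, cpi)

# Predict the 12 months of 2024 (month indices 60 to 71).
months_2024 = np.arange(60, 72).reshape(-1, 1)
cpi_2024 = model.predict(months_2024)

print(np.round(cpi_2024, 2))
```

Note that the 2024 values are extrapolations beyond the fitted range, so they are sensitive to the chosen degree and hyperparameters and should be read as a possible trend rather than a forecast.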



Plot 1

From the plot, we can see that the model predicts the CPI of Land Transportation to rise throughout 2024. This continues the upward trend of the past five years.



As with our primary research question, we follow the same data preparation, fitting, and prediction process for the secondary research question.


Secondary Research Question

Input Feature: Months column
Output Feature: CPI column

Our model examines the relationship between the CPI of Fuel and time (represented as the integer number of months elapsed since the earliest date in our data). Once the model has been fitted to the data, it predicts the CPI of fuel for January 2024 to December 2024.


Plot 2

The machine learning model appears to treat the sudden spike and fall of the fuel CPI as a trend that continues into 2024. However, without further context on the current economy, it is hard to conclude from this data alone whether the CPI will continue downward or once again reach the high it did in 2022.

Conclusion

By the end of our exploratory data analysis and machine learning phases, we have gathered the following insights:
  • The CPI of all transportation groups seems to be going upward over the years. It does so in a stepwise way: the CPI values increase continuously until they "plateau", then increase again after some period of time.
  • The CPI of passenger transportation has significantly increased over the past five years.
  • The CPI for both fuel and land transportation seem to have been going upward from 2019 to 2023, albeit at different rates. In particular, the CPI for fuel fluctuates, whereas the CPI for land transportation is relatively more stable. However, the CPI of fuel may experience a continued dip over 2024, as suggested by our machine learning model. It is uncertain whether the sudden spike and fall in the fuel CPI during 2022 is merely a fluke or a sign that the trend is heading in this direction.
  • There is an observed correlation between the CPI of fuel and public transportation over the past five years, from 2019 to 2023.


Implications

After performing exploratory data analysis and creating a machine learning model on our data set, we gained valuable insights into the transportation scene in the Philippines. Specifically, we found that the CPI of Land Transportation in the Philippines has significantly increased from 2019 to 2023. Furthermore, the subsequent regression analysis suggests that this upward trend may continue throughout 2024.

Our exploratory data analysis also found a correlation between the CPI of land transportation and the CPI of fuel from 2019 to 2023. Both values have increased alongside each other over the past five years. However, given the current trend of the fuel CPI, there is a possibility of it decreasing in 2024, as suggested by the regression analysis.

While the future prices of fuel and land transportation are uncertain without further economic analysis on the national and global scales, these insights tell us that the transportation system in the Philippines is in a less-than-ideal state. This is mostly because transportation prices seem to be going up continuously, which will eventually become a heavy burden on Filipinos. Based on our research, this issue is already quite apparent in the Philippines. Being aware of this can serve as a wake-up call for agencies such as the DOTr, prompting them to implement policies that may slow the rise of fuel prices.

Future Recommendations

While this study is insightful, we have some recommendations for those who wish to work on something similar in the future. In particular, we recommend taking confounding variables into greater consideration to ensure that associations are drawn correctly. Examining correlations between the CPI of transportation and other factors, such as changes in government policies and subsidies and the overall trend in the cost of living, may prove beneficial in further research.

Our team

Diego Montenejo

I'm Diego Montenejo, a 3rd year BS Computer Science student with a knack for game development and music production. Despite my inclination towards different creative avenues, I still enjoy the analytical and computational side of computer science and information technology.

Jasmin Pascual

I'm Jasmin Pascual, a 4th year BS Computer Science student who has a passion for graphic design. Because of this, I would love to be a UI/UX designer in the future. In my free time, you can find me watching volleyball, playing sudoku, or exercising at the gym.

Jeanne Toledo

I'm Jeanne Toledo, a graduating BS Computer Science student who finds joy in learning new things. In Computer Science, I particularly enjoy web and software development since it is where I can apply my learnings whilst adding my own creative twist. Other than Computer Science, other things that I like are graphic design and Winnie the Pooh.