
Literature Survey

Road traffic congestion has become a major problem in many urban cities. It causes various problems for modern city management, such as untimely delays for passengers, higher carbon dioxide emissions and longer travel times. Therefore, controlling traffic congestion on the road has become a strong necessity today.

Current traffic control systems fail to consider the volume of vehicles in their operation. Traffic light systems play an important role in controlling road traffic properly. Therefore, many researchers attempt to find solutions that control road traffic by acting on the traffic light system using information technology. Many solutions have already been proposed using various IT-based methods, such as vehicle-to-infrastructure communication, image processing, radio frequency identification and machine learning. In practice, however, applying these methods efficiently remains the major problem with these solutions.

Reinforcement Learning

A main goal of artificial intelligence is to develop machines that can reproduce intelligent human behaviour. For this purpose, an "AI system should be able to interact with the environment and learn how to correctly act inside it" []. Reinforcement learning is an area of machine learning that supports exactly this kind of experience-driven, autonomous learning, and it has proved successful for traffic light signal control.


“In a Reinforcement Learning (RL) problem, an autonomous agent observes the environment and perceives a state s_t, which is the state of the environment at time t. Then the agent chooses an action a_t which leads to a transition of the environment to the state s_{t+1}. After the environment transition, the agent obtains a reward r_{t+1} which tells the agent how good a_t was with respect to a performance measure.”[]

Learning in Traffic Signal Control

The traffic light agent has the goal of maximizing the efficiency of the traffic flow around a road junction. Several reasons make reinforcement learning attractive for controlling traffic lights:

1. RL agents are able to adapt to different situations in the environment.

2. An RL agent has the ability to learn by itself, without supervision or prior knowledge.

3. The model the RL agent needs to learn is greatly simplified, because the agent learns from a reward based on a system performance metric.

RL can address the following challenges when applied to traffic light control:

1. Inappropriate traffic light sequence – Traditional traffic light systems usually use a predefined static policy. This can lead to an inappropriate traffic light phase being chosen, which increases the traffic.

2. Inappropriate traffic light duration – Traditional traffic light systems use a predefined duration for each phase that does not depend on the current traffic conditions. This can lead to inappropriately long traffic light durations.

The state representation, the available actions and the reward function are essential to define when applying an RL algorithm. We discuss these terms in the context of traffic signal control below.

“The state is the agent’s perception of the environment in an arbitrary step.” According to the information density, state representations can be divided into two categories. In low information density representations, the lanes of an intersection are discretized into cells along their length. Those cells are then mapped into a vector: a cell is marked 1 if a vehicle is inside it and 0 otherwise. In some approaches, the velocities of the vehicles and the current traffic light phase have also been added to the state as two further vectors.

In high information density representations, the agent receives an image of the whole intersection, obtained from a simulator. Multiple snapshots of the current state are stacked together to give the agent a sense of vehicle motion.

In practice, however, high information density agents are hard to implement, so let us focus on the cell-based approach to state representation.

The cells are not all the same length: the further a cell lies from the stop line, the longer it is. If cells are too long, some cars approaching the stop line may go undetected; if they are too short, the number of states grows and leads to higher computational complexity. Therefore, more recent research has chosen the cell lengths as in the figure, because the agent can then obtain a clear picture of the presence or absence of vehicles in every area of the incoming lanes.
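As a rough illustration of this cell-based representation, the sketch below maps the vehicles on one incoming lane to a binary occupancy vector; the cell boundaries are assumed, illustrative values rather than the lengths used in the surveyed work.

# Minimal sketch of a cell-based occupancy vector for one incoming lane.
# The boundaries below are illustrative assumptions: cells grow longer the
# further they lie from the stop line.
CELL_BOUNDARIES = [7, 14, 21, 28, 40, 60, 100, 160, 400, 750]  # metres from the stop line

def lane_occupancy(vehicle_distances):
    """Mark a cell 1 if at least one vehicle is inside it, 0 otherwise."""
    cells = [0] * len(CELL_BOUNDARIES)
    for distance in vehicle_distances:
        for i, cell_end in enumerate(CELL_BOUNDARIES):
            if distance <= cell_end:
                cells[i] = 1
                break
    return cells

# Example: two vehicles queued near the stop line and one still far away.
print(lane_occupancy([3.0, 12.5, 180.0]))  # -> [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]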

Four action sets have been adopted in the recent literature; they are represented in the figure below.

If the actions chosen in the current and previous steps are the same, the green phase continues and there is no yellow phase. If the actions are different, a yellow phase is initiated between them. The SUMO simulator has mostly been used to simulate these actions.
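As an illustration of this transition rule, a minimal sketch follows; the phase durations are assumed, illustrative values rather than settings from the surveyed work.

# Minimal sketch of the phase-transition rule: the green phase continues when
# the same action is chosen twice in a row, otherwise a yellow phase is
# inserted before the new green phase.
GREEN_DURATION = 10   # seconds, assumed value
YELLOW_DURATION = 4   # seconds, assumed value

def phases_for_step(previous_action, current_action):
    """Return the (phase, duration) pairs to apply in this simulation step."""
    if current_action == previous_action:
        # Same action: the green phase simply continues, no yellow phase.
        return [("green", GREEN_DURATION)]
    # Different action: initiate a yellow phase between the two green phases.
    return [("yellow", YELLOW_DURATION), ("green", GREEN_DURATION)]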

“The reward represents the feedback from the environment after the agent has chosen an action.”[] The agent uses the reward to understand the quality of that action and to improve the rewards obtained by future actions. A positive reward is generated for good actions and a negative reward for bad actions.
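As a purely illustrative example (the concrete reward differs between the surveyed works), the reward could be defined as the decrease in total waiting time at the intersection between two steps, r_{t+1} = W_t − W_{t+1}, where W_t is the total waiting time of all vehicles at step t; the reward is then positive when waiting time decreases and negative when it increases.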

Deep Q Learning

The most widely used machine learning method for traffic signal control is Q-learning, because of its high performance. If we assume that we know the expected reward of each action needed to reach the next state, the agent knows exactly which action to perform: it performs the sequence of actions that gains the maximum total reward on the way to that state. This total reward is called the Q-value, and we can write a formula for this strategy as below.
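In the notation used in the next paragraph, this formula takes the standard form

Q(s,a) = r(s,a) + γ · max_{a'} Q(s',a')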

This equation implies that the Q-value obtained from being at state s and performing action a is the immediate reward r(s,a) plus the highest possible Q-value from the next state s'. The factor gamma is the discount factor, which controls the contribution of rewards further in the future. Q(s',a) will again depend on Q(s'',a), the Q-value of the state after s'. Deriving in this way, the Q-value depends on the future Q-values as below.
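Expanding the recursion, and assuming the best action is chosen at every subsequent step, gives

Q(s,a) = r(s,a) + γ · r(s',a') + γ² · r(s'',a'') + …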

“Since this is a recursive equation, we can start with making arbitrary assumptions for all q-values. With experience, it will converge to the optimal policy. In practical situations, this is implemented as an update:”[]
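Written in the notation above, this update takes the standard form

Q(s,a) ← Q(s,a) + α · [ r(s,a) + γ · max_{a'} Q(s',a') − Q(s,a) ]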

Here, alpha is the learning rate. The above expression determines to what extent newly acquired information overrides old information.

“Q-learning is a simple yet quite powerful algorithm to create a cheat sheet for our agent. This helps the agent figure out exactly which action to perform.”[] However, if the number of states and actions per state is too large, the problem becomes difficult to handle: building a Q-table entry for every state would require too much memory and an unrealistic amount of time. As a solution to these problems, researchers use a neural network to approximate the Q-value function; this is called deep Q-learning. Here the state is given as the input to the network, and the Q-values of all possible actions are received as the output.
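As a rough illustration of this idea, here is a minimal sketch of such a network, assuming the cell-based state from earlier flattened into an 80-element occupancy vector (an assumed size) and one output Q-value per traffic light phase; the layer sizes and the use of Keras are illustrative choices, not details of the surveyed works.

# Minimal sketch of a deep Q-network: the state vector is the input and the
# network outputs one Q-value per possible action (traffic light phase).
import numpy as np
from tensorflow.keras import layers, models

NUM_STATE_CELLS = 80   # assumed size of the occupancy-vector state
NUM_ACTIONS = 4        # one action per candidate green phase

def build_q_network():
    inputs = layers.Input(shape=(NUM_STATE_CELLS,))
    x = layers.Dense(64, activation="relu")(inputs)
    x = layers.Dense(64, activation="relu")(x)
    outputs = layers.Dense(NUM_ACTIONS, activation="linear")(x)  # one Q-value per action
    model = models.Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

# Usage: given a state, pick the action with the highest predicted Q-value.
q_network = build_q_network()
state = np.zeros((1, NUM_STATE_CELLS))
q_values = q_network.predict(state, verbose=0)
best_action = int(np.argmax(q_values[0]))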

The Training Process

According to recent research, ‘experience replay’ is the technique chosen for the training process because of the higher agent performance and learning efficiency it provides. It operates on a randomized group of samples called a batch, where each sample can be interpreted as below.

m = {s_t, a_t, r_{t+1}, s_{t+1}}

Here m is a sample, and a batch of samples is extracted from a data structure called the memory. When the memory is full, the oldest sample is removed as new ones arrive. For example, if the memory holds 50,000 samples, the extracted batch size can be 100. One batch is used for one training instance, and the Q-value function is learned iteratively over the training instances. From the standpoint of a single sample, the following operations are executed (a code sketch of the memory and of one training instance follows the list).

1. Prediction of the Q-values Q(s_t), which represent the current knowledge the agent has about the action values from s_t.

2. Prediction of the Q-values Q'(s_{t+1}), which represent the agent’s knowledge about the action values starting from the state s_{t+1}.

3. Update of Q(s_t,a_t), which represents the value of the action a_t selected by the agent during the simulation.

4. Training of the neural network. Here the Q-value update is expected to move towards the maximum future reward. The input for the training is s_t and the desired output is the updated Q-value Q(s_t,a_t).
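The sketch below combines the replay memory and the four operations above into one training instance; q_network follows the earlier network sketch, the memory and batch sizes are the example values from the text, and the discount factor GAMMA is an assumed value rather than one reported in the surveyed works.

# Minimal sketch of experience replay and one training instance, assuming the
# sample layout m = {s_t, a_t, r_{t+1}, s_{t+1}}.
import random
from collections import deque

import numpy as np

MEMORY_SIZE = 50_000
BATCH_SIZE = 100
GAMMA = 0.75  # assumed discount factor

memory = deque(maxlen=MEMORY_SIZE)  # the oldest sample is dropped automatically when full

def add_sample(state, action, reward, next_state):
    """Store one sample m = (s_t, a_t, r_{t+1}, s_{t+1}) in the memory."""
    memory.append((state, action, reward, next_state))

def train_one_instance(q_network):
    """Run one training instance on a randomized batch extracted from the memory."""
    if len(memory) < BATCH_SIZE:
        return
    batch = random.sample(memory, BATCH_SIZE)
    states = np.array([s for (s, a, r, s_next) in batch])
    next_states = np.array([s_next for (s, a, r, s_next) in batch])

    # 1. Predict Q(s_t): the agent's current knowledge of action values from s_t.
    q_current = q_network.predict(states, verbose=0)
    # 2. Predict Q'(s_{t+1}): the action values starting from the next state.
    q_next = q_network.predict(next_states, verbose=0)

    for i, (s, a, r, s_next) in enumerate(batch):
        # 3. Update Q(s_t, a_t) towards the reward plus the discounted
        #    maximum future Q-value.
        q_current[i, a] = r + GAMMA * np.max(q_next[i])

    # 4. Train the neural network: input s_t, desired output the updated Q-values.
    q_network.fit(states, q_current, epochs=1, verbose=0)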
