Project Id: 19-124

Project Proposal Report

IT16140080 – E.G.H.Sajeewan

IT16548374 – H.G.K.A.Wijayasiriwardhana

IT16126084 – N.M.Hettiarachchi

IT16125858 – A.J.M.L.M.Jayasundara

Bachelor of Science in Information Technology

Department of Information Technology Sri Lanka Institute of Information Technology

Sri Lanka


(Proposal documentation submitted in partial fulfillment of the requirement for the Degree of Bachelor of Science Special (Honors) in Information Technology)


Student ID Name Signature

IT16140080 E.G.H.Sajeewan

IT16548374 H.G.K.A.Wijayasiriwardhana

IT16126084 N.M.Hettiarachchi

IT16125858 A.J.M.L.M.Jayasundara



Mrs. Shashika Lokuliyanage



Dr. Dharshana Kasthurirathna



We declare that this is our own work and that this project proposal does not incorporate, without acknowledgement, any material previously submitted for a Degree or Diploma in any other University or institute of higher learning, and that to the best of our knowledge and belief it does not contain any material previously published or written by another person except where acknowledgement is made in the text.

The above candidates are carrying out research for the undergraduate Dissertation under my supervision.

Signature of the supervisor: Date: 3/5/2019


Businesses process many types of overhead documents, such as invoices and goods return notices, to obtain business-related data. Processing these documents manually can be very complex and time consuming. Then there is the human error factor. The efficiency of each employee can also differ. Taking all these factors into account, automating the document processing mechanism can be very effective and efficient. “Sarotis” is a product aimed at solving this issue.

“Sarotis” is a tool capable of processing the data in business documents and formatting it into a common format such as JSON or spreadsheet files, so that the information is easily accessible and can be manipulated. Basic computer vision, machine learning and image processing techniques are used to implement this solution.

The main objective of every business is to maximize profit, and in the 21st century information is the main driving force behind every business. Businesses process that information to make future business decisions, generate reports, make predictions, produce accounting statements and much more, with business decision making being the primary focus. “Sarotis” is also capable of analyzing the processed data to give predictions on business transactions so that the management can use them to make decisions. Furthermore, this tool can be used to produce reports on various business areas.






Table of Figures

List of Tables


1.1. Background

1.2. Literature Review

1.2.1. Sales Analysis and prediction module

1.2.2. Optical character recognition and processing

1.3. Research Gap

1.4. Research Problem


2.1. Main objectives

2.2. Specific Objectives


3.1. High-level architecture

3.2. Major Components

3.2.1. Computer-generated document analyzer

3.2.2. Handwritten document analyzer

3.2.3. Template training and accounting statement generator

3.2.4. Business data analysis and predictions module

3.4. Testing

3.5. Marketability

3.6. Gantt chart


5. BUDGET



Table of Figures

Figure 1 – Level of Benefits and Scope

Figure 2 – Error Rate Graph

Figure 3 – The Different Areas of Character Recognition

Figure 4 – OCR Heavy Print Example

Figure 5 – OCR Light Print Example

Figure 6 – The High-Level Architecture Diagram

Figure 7 – Computer-Generated Document Analyzer

Figure 8 – Business Data Analysis and Predictions Module

Figure 9 – Gantt Chart

List Of Tables

Table 1 – Description of personnel and facilities


1.1. Background

In today’s world, the main goal of almost every business is to maximise profit while reducing costs and workload. Efficiency in all key aspects of the business is crucial to achieving this goal. Today’s businesses depend extensively on data, and handling that data efficiently can boost overall efficiency by a significant margin. All business organizations handle business-related documents in various capacities. Medium-scale businesses in particular process such documents at a considerable level, as their operating capacity is comparatively high and they have not invested in methodical automation of the process. “Sarotis” provides a cost-effective and efficient solution in such instances.

Labour cost, or employee salaries, can be a huge cost to the business even if only a single employee is hired for data entry and document processing. Employee salaries and the other benefits that have to be granted to employees increase day by day with ever-increasing living costs and government-imposed rules and regulations on employers. “Sarotis” provides a simple-to-handle solution which does not need special training or high-level technical skills. Mid-level management can easily operate it.

The information contained in these documents can be very sensitive to the business, and the fewer layers this data passes through, the better. Businesses these days value the privacy and security of their data very highly. “Sarotis” is a cloud-based solution, and access by unauthorized parties can be restricted while permitting easy access to relevant parties.

Businesses sometimes require insight into how to make future investments. Entrepreneurs with years of experience and technical understanding can make such decisions effectively without much effort, but even then they need past sales information and information on other parameters to make them. A business prediction system that analyses past information intensively and makes well-founded predictions is an added advantage.

1.2. Literature Review

1.2.1. Sales Analysis and prediction module

To get a better understanding of the business, we extract and analyze internal data from the organization to give predictions and suggestions for improving the condition of the business, and we analyze market behavior to make predictions about its future behavior. We use purchasing data to analyze and understand sales conditions, and based on that we make predictions about future purchases. For example, such predictions are important when preparing for the upcoming seasons of the year. Furthermore, this information could be used to make decisions on how a business can channel its resources and make investments.

Machines and humans have distinct strengths and weaknesses in the context of prediction. As prediction machines improve, businesses must adjust their division of labor between humans and machines in response. Prediction machines are better than humans at factoring in complex interactions among different indicators, especially in settings with rich data.

As the number of dimensions for such interactions grows, the ability of humans to form accurate predictions diminishes, especially relative to machines. However, humans are often better than machines when understanding the data generation process confers a prediction advantage, especially in settings with thin data.

Hugh J. Watson, Barbara H. Wixom, “The Current State of Business Intelligence”. [1]

As business users mature to performing analysis and prediction, the level of benefits becomes more global in scope and more difficult to quantify. For example, the most mature uses of BI (Business Intelligence) might facilitate a strategic decision to enter a new market, change a company’s orientation from product-centric to customer-centric, or help launch a new product line.

Figure 1 [1]

As business users mature to performing analysis and prediction, the level of benefits becomes more global in scope and more difficult to quantify.

Ajay Agrawal, Joshua Gans, Avi Goldfarb, “Prediction Machines: The Simple Economics of Artificial Intelligence”. [2]

Humans make mistakes around 5 percent of the time. Prediction is the process of filling in missing information. Prediction takes information we have, often called “data,” and uses it to generate information we don’t have. In addition to generating information about the future, prediction can generate information about the present and the past. [2]

Figure 2 [2]

Between the first year of the competition in 2010 and the final contest in 2017, prediction got much better. Figure 2 shows the accuracy of the contest winners by year. The vertical axis measures the error rate, so lower is better. In 2010, the best machine predictions made mistakes 28 percent of the time. Predictions are therefore an important means of understanding a business better and of making decisions about the future behavior of the market.

Decisions about Data

Data is often costly to acquire, but prediction machines cannot operate without it. They require data to create, operate, and improve.

We therefore must make decisions around the scale and scope of data acquisition. How many different types of data do we need? How many different objects are required for training? How frequently do we need to collect data? More types, more objects, and more frequency mean higher cost but also potentially higher benefit. In thinking through this decision, we must carefully determine what we want to predict. The particular prediction problem will tell us what we need.

According to the above, prediction machines use three types of data:

(1) Training data for training the AI.

(2) Input data for predicting.

(3) Feedback data for improving prediction accuracy.
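The three kinds of data listed above can be illustrated with a minimal sketch. The `MeanPredictor` class, its method names and the sample spending figures are hypothetical and chosen only for illustration; a real prediction module would use a proper learning algorithm rather than a running average.

```python
from statistics import mean


class MeanPredictor:
    """Toy prediction machine: predicts the average amount seen per category.

    Hypothetical illustration only, not the proposed system's model.
    """

    def __init__(self):
        self.history = {}  # category -> list of observed amounts

    def train(self, training_data):
        # (1) Training data: past (category, amount) observations.
        for category, amount in training_data:
            self.history.setdefault(category, []).append(amount)

    def predict(self, category):
        # (2) Input data: the category we want a prediction for.
        amounts = self.history.get(category)
        if not amounts:
            return None  # rare or unseen event: no data, so no prediction
        return mean(amounts)

    def feedback(self, category, actual_amount):
        # (3) Feedback data: the real outcome, folded back into the model.
        self.history.setdefault(category, []).append(actual_amount)


model = MeanPredictor()
model.train([("stationery", 100.0), ("stationery", 120.0)])
first = model.predict("stationery")    # based on training data only
model.feedback("stationery", 140.0)    # actual outcome arrives later
second = model.predict("stationery")   # prediction after feedback
unseen = model.predict("machinery")    # category with no data at all
```

In this sketch the prediction for a category with no observations is simply `None`, which mirrors the point that a machine cannot predict events it has not observed.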

For better accuracy, a prediction machine needs more data, and high prediction accuracy often enables machines to perform tasks well. Prediction machines may, however, lack data because some events are rare. If a machine cannot observe enough examples of an event, it cannot predict decisions involving that event, and its predictions for such events are poor.

“Machines are bad at prediction for rare events. Managers make decisions on mergers, innovation, and partnerships without data on similar past events for their firms. Humans use analogies and models to make decisions in such unusual situations. Machines cannot predict judgment when a situation has not occurred many times in the past.” [2]

1.2.2. Optical character recognition and processing.

Optical character recognition (OCR) is the process of classifying optical patterns contained in a digital image. Character recognition is achieved through segmentation, feature extraction and classification. OCR systems involve several techniques:

1. Optical scanning.

2. Location segmentation.

3. Pre-processing.

4. Segmentation.

5. Representation.

6. Feature extraction.

7. Training and recognition.

8. Post-processing.
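Two of the stages listed above, pre-processing and segmentation, can be sketched in a few lines. This is a simplified illustration under stated assumptions: the image is a small grayscale grid held in nested lists, the threshold of 128 is arbitrary, and the function names `binarize` and `segment_columns` are hypothetical. It is not the segmentation method of any particular OCR engine.

```python
def binarize(gray, threshold=128):
    """Pre-processing: mark dark pixels (ink) as 1 and light pixels as 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment_columns(binary):
    """Segmentation: split the image at columns that contain no ink.

    Returns a list of (start, end) column ranges, with end exclusive.
    """
    width = len(binary[0])
    ink_per_column = [sum(row[x] for row in binary) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(ink_per_column):
        if ink and start is None:
            start = x                    # a character begins here
        elif not ink and start is not None:
            segments.append((start, x))  # the character ended before x
            start = None
    if start is not None:
        segments.append((start, width))  # character touches the right edge
    return segments


# A tiny 3x7 grayscale "scan": two dark strokes separated by white space.
scan = [
    [255, 10, 20, 255, 255, 30, 255],
    [255, 15, 255, 255, 255, 25, 255],
    [255, 12, 18, 255, 255, 35, 255],
]
segments = segment_columns(binarize(scan))
```

Because this approach relies on blank columns between characters, it would treat touching characters as a single segment, so a practical system needs a more robust segmentation step.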

Arindam Chaudhuri, Krupa Mandaviya, Pratixa Badelia, Soumya K Ghosh (auth.) – “Optical Character Recognition Systems for Different Languages with Soft Comp”

OCR addresses several issues of the techniques mentioned above for automatic identification. OCR systems are required when the information must be readable by both humans and machines. They have carved out a niche in pattern recognition. Their uniqueness lies in the fact that they do not require control of the process that produces the information. OCR deals with the problem of recognizing optically processed characters.

Optical recognition is performed offline, after the writing or printing has been completed, whereas online recognition is achieved when the computer recognizes the characters as they are drawn. Both hand-printed and printed characters may be recognized, but the performance depends directly on the quality of the input documents. The more constrained the input is, the better the performance of the OCR system [3].

Figure 3

The different areas of character recognition [3]

The main concept of optical character recognition is first to teach the machine which classes of patterns may occur and what they look like. This is done by showing the machine examples of characters from all the different classes.
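The idea of teaching by example can be sketched with a nearest-template classifier. This is a deliberately simple stand-in, assuming 3x3 binary glyphs and a Hamming-distance comparison; the class name `TemplateClassifier` and the glyph bitmaps are hypothetical, and the proposal itself does not commit to this classifier.

```python
def hamming(a, b):
    """Number of pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


class TemplateClassifier:
    """Recognizes a glyph as the label of its closest taught example."""

    def __init__(self):
        self.examples = []  # list of (label, bitmap) pairs

    def teach(self, label, bitmap):
        # Show the machine one example of a character class.
        self.examples.append((label, bitmap))

    def recognize(self, bitmap):
        # Pick the taught example with the fewest differing pixels.
        label, _ = min(self.examples, key=lambda ex: hamming(ex[1], bitmap))
        return label


GLYPH_I = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
GLYPH_L = [[1, 0, 0], [1, 0, 0], [1, 1, 1]]
GLYPH_T = [[1, 1, 1], [0, 1, 0], [0, 1, 0]]

clf = TemplateClassifier()
for label, bitmap in (("I", GLYPH_I), ("L", GLYPH_L), ("T", GLYPH_T)):
    clf.teach(label, bitmap)

# A "T" with one missing pixel, as might happen with faint print.
noisy_t = [[1, 1, 1], [0, 1, 0], [0, 0, 0]]
result = clf.recognize(noisy_t)
```

The noisy glyph differs from the taught "T" in one pixel, from "I" in three and from "L" in seven, so the closest class is still "T".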

Imaging defect errors occur in OCR when neighboring characters are joined or fused due to heavy print (Figure 4) or when the print is light (Figure 5). [3]

Figure 4

Heavy Print

Figure 5

Light Print

To improve the accuracy of OCR, we need to improve image processing, adapt to the current document, use multi-character recognition and increase the use of linguistic context.

1.3. Research Gap

As mentioned above, during the literature review we found that similar systems have already been created for invoice handling, but those systems have several drawbacks.

Most invoice handling systems only manage computer-generated invoices. In practice, however, companies receive a lot of handwritten invoices. We therefore plan to handle handwritten invoices (English only) along with computer-generated ones. To do that, we intend to use a machine learning algorithm.

Invoices are mainly handled for accounting purposes. However, alternative invoice handling systems do not provide a built-in accounting function; they only summarize the data in the invoices. We plan to summarize the data from invoices and enter that data into the proper accounting equations. We intend to output those accounting equations in a common format (XML or JSON) so that they can be used with whatever accounting software the company uses for its accounting purposes.
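As a rough sketch of what such an output could look like, the example below turns one purchase invoice into a balanced double-entry journal record and serializes it as JSON. The field names, the account names "Purchases" and "Accounts Payable", and the function `purchase_invoice_to_journal` are assumptions made for illustration; the actual schema would depend on the accounting software being targeted.

```python
import json


def purchase_invoice_to_journal(invoice):
    """Build a balanced journal entry (debits equal credits) for a purchase."""
    total = round(
        sum(item["quantity"] * item["unit_price"] for item in invoice["items"]),
        2,
    )
    entry = {
        "invoice_no": invoice["invoice_no"],
        "date": invoice["date"],
        "lines": [
            # Assumed accounts: the purchase is debited, the supplier credited.
            {"account": "Purchases", "debit": total, "credit": 0.0},
            {"account": "Accounts Payable", "debit": 0.0, "credit": total},
        ],
    }
    debits = sum(line["debit"] for line in entry["lines"])
    credits = sum(line["credit"] for line in entry["lines"])
    if debits != credits:
        raise ValueError("journal entry does not balance")
    return entry


# Hypothetical invoice data, as it might look after extraction.
invoice = {
    "invoice_no": "INV-001",
    "date": "2019-03-05",
    "items": [
        {"description": "A4 paper", "quantity": 10, "unit_price": 2.5},
        {"description": "Toner", "quantity": 2, "unit_price": 40.0},
    ],
}
journal = purchase_invoice_to_journal(invoice)
journal_json = json.dumps(journal, indent=2)
```

Keeping the balance check inside the conversion step means an extraction error that corrupts an amount is caught before the record reaches the accounting software.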

Currently, the invoice handling systems we reviewed do not give predictions about the business. With predictions about purchasing, a business can make valuable decisions about its future and can easily manage its income and expenditure. We therefore plan to provide valuable predictions based on purchasing data and internal factors. To do so, we intend to use Artificial Intelligence (AI).
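A purchasing prediction can be illustrated, in its simplest form, by fitting a straight-line trend to past monthly totals and extrapolating one month ahead. This least-squares sketch is only a baseline under stated assumptions: the monthly figures are invented, the function names are hypothetical, and the proposal does not specify which AI technique will actually be used.

```python
def fit_trend(values):
    """Ordinary least-squares line through (0, v0), (1, v1), ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast_next(values):
    """Extrapolate the fitted line one step past the last observation."""
    slope, intercept = fit_trend(values)
    return slope * len(values) + intercept


# Hypothetical monthly purchase totals taken from processed invoices.
monthly_purchases = [100.0, 110.0, 120.0, 130.0]
next_month = forecast_next(monthly_purchases)
```

For this perfectly linear series the forecast continues the pattern of rising by 10 per month; real purchasing data would also need seasonality and other internal factors to be taken into account.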

1.4. Research Problem

The main goal of every business is to make profits and to maximize them over time. To do so, a business needs to increase its efficiency in all aspects. Information handling is a vital factor in almost all modern businesses, and they are all integrated with modern technologies to a certain degree. To handle business information efficiently, an automated system is the most practical option. Such a system can be more efficient at storing, retrieving, reporting on, analyzing and otherwise operating on data.

Businesses mainly process documentation related to various overhead types, such as invoices, return notes and many other documents. Processing them manually can be time consuming and inefficient. There is also the factor of human error, which cannot be eliminated. Sometimes sensitive data of the organization can become vulnerable because the data has to pass through an extra layer of the management hierarchy besides the top-level decision makers.

Almost all of the issues mentioned above could be eliminated with a tool like “Sarotis”. The efficiency of data processing is increased because the data is updated in real time in a cloud-based database. This allows users to manipulate the data as they wish in areas such as sales predictions.

Security of the information is ensured, as access to information can be controlled according to the wishes of the management or the owners.

In large-scale organizations there could be a performance bottleneck in the whole process if manual techniques are used. The labour cost of operating such a pool of workers can also be eliminated. Processing such documentation is made simple, as the only requirement to operate this system is a scanned image of the invoice or the relevant document. Businesses sometimes receive handwritten documentation apart from computer-generated documents. “Sarotis” is capable of processing handwritten documents too.

We use templates of frequently used documents to train our overhead detection mechanism extensively, increasing the accuracy and speed of the process. When it comes to analyzing and predicting future behavior, humans do well if the number of variables and the volume of data to be manipulated are comparatively low, but in most business organizations one has to deal with thousands of products and various parameters (e.g. department stores, textile shops). An integrated prediction module can be very advantageous under such circumstances.


2.1. Main Objectives

The main objective of this research is to create a smart and efficient invoice scanner and analyzer that automates the invoice handling process. This system will reduce labor cost and increase the efficiency of invoice handling. It is a user-friendly system that can be used by a business of any scale.

2.2. Specific Objectives

• Process the computer generated invoices

To process computer-generated or printed invoices, the invoices first need to be scanned with a scanner. After scanning, the details in the image are extracted using an OCR mechanism. In order to achieve maximum efficiency, we strive to extract data with high accuracy.
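Once the OCR step has produced plain text, the invoice details still have to be located in it. The sketch below shows one possible way to do this with regular expressions. The labels it searches for ("Invoice No", "Date", "Total"), the date format and the sample text are assumptions for illustration; real invoices vary widely, which is why the proposal also relies on trained templates.

```python
import json
import re

# Hypothetical patterns for three common invoice fields.
FIELD_PATTERNS = {
    "invoice_no": re.compile(
        r"Invoice\s*(?:Number|No)\.?\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "date": re.compile(r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "total": re.compile(r"Total\s*:?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


def extract_fields(ocr_text):
    """Return a dict of recognized fields; missing fields are None."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        fields[name] = match.group(1) if match else None
    if fields["total"] is not None:
        # Drop thousands separators before converting to a number.
        fields["total"] = float(fields["total"].replace(",", ""))
    return fields


# Sample text standing in for the output of an OCR engine.
ocr_text = """ACME Traders
Invoice No: INV-2019-042
Date: 2019-03-05
Total: 1,250.00
"""
fields = extract_fields(ocr_text)
as_json = json.dumps(fields)
```

Returning `None` for fields that were not found lets later stages flag incomplete invoices for manual review instead of silently storing wrong values.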

• Process the hand written invoices

Processing handwritten documents does not always produce highly accurate and reliable results. The main goal is to extract data with an accuracy exceeding 70%. The data will be formatted correctly and uploaded so that it is easy to manipulate in real time.
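The accuracy target needs a concrete measure to be checked against. One common choice, assumed here because the proposal does not fix a metric, is character-level accuracy based on the edit distance between the recognized text and a manually verified transcription. The function names and the sample strings are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_accuracy(recognized, truth):
    """1 minus the edit distance relative to the true text, floored at 0."""
    if not truth:
        return 1.0 if not recognized else 0.0
    return max(0.0, 1.0 - edit_distance(recognized, truth) / len(truth))


truth = "TOTAL 1250"
recognized = "T0TAL 1250"  # the letter O was misread as the digit 0
accuracy = character_accuracy(recognized, truth)
meets_target = accuracy > 0.70
```

Here one of ten characters is wrong, so the sample clears the 70% target; averaging this measure over a labelled test set of handwritten invoices would give the figure to compare against the objective.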
