APPLICATION OF FUZZY LOGIC TO DOCUMENT ARCHIVING
This project is concerned with the creation of an information retrieval system. Such systems search through a very large number of documents to find those that are relevant to the user's request (query). The retrieved documents are then ranked according to the query.
Various retrieval models, such as the vector space, Boolean and probabilistic models, have been studied over time.
The system presented here, “A fuzzy logic system for archiving purposes”, employs the notion of fuzzy logic, introduced by Professor Lotfi A. Zadeh, to improve the precision of information retrieval in archives.
Most information retrieval systems rely on retrieval methods such as weighted term matching or probabilistic ranking. In this work, however, fuzzy set theory and the concept of the membership function are used instead.
This strategy was chosen because archives typically contain large documents, and the user may not always have a clear idea of what he or she wants to retrieve. A tool was therefore developed that allows the user to enter just part of the name of the file or document being sought.
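The partial-name search described above can be illustrated with a small sketch. It is an assumption, not the tool built in this work: the degree to which a file name belongs to the fuzzy set "matches the query" is taken here as the length of the longest run of characters it shares with the typed fragment, divided by the fragment's length, computed with Python's standard difflib module. The function names name_membership and search, the sample file names and the 0.5 threshold are hypothetical.

```python
from difflib import SequenceMatcher

def name_membership(fragment, filename):
    """Degree (0.0 to 1.0) to which `filename` matches the typed `fragment`.

    Hypothetical measure: length of the longest common run of characters
    (case-insensitive) divided by the length of the fragment.
    """
    q, f = fragment.lower(), filename.lower()
    if not q:
        return 0.0
    match = SequenceMatcher(None, q, f).find_longest_match(0, len(q), 0, len(f))
    return match.size / len(q)

def search(fragment, filenames, threshold=0.5):
    """Return (membership, name) pairs at or above `threshold`, best first."""
    scored = [(name_membership(fragment, name), name) for name in filenames]
    hits = [pair for pair in scored if pair[0] >= threshold]
    # Stable sort: names with equal membership keep their original order.
    return sorted(hits, key=lambda pair: pair[0], reverse=True)

files = ["Budget_2019.xlsx", "budget_report.pdf", "minutes.docx"]
print(search("budget", files))
print(name_membership("budgt", "budget_report.pdf"))  # 0.8
```

A name containing the whole fragment scores 1.0, a near miss such as "budgt" against "budget_report.pdf" still scores 0.8, and unrelated names such as "minutes.docx" fall below the threshold instead of being either a hit or a miss.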
The results were then compared with those of other search and matching tools: the Lucene App, which is written in Java and does not use fuzzy logic; the Rubens App, which does use fuzzy logic; and DocFetcher, which was downloaded from the internet. All three are used to search for files and documents.
The approach presented in this work proved considerably more effective than these tools, some of which returned cluttered results or no results at all.
In its literal sense, logic can be defined as a system's ability to draw a rational conclusion; it can be seen as the theory of reasoning in decision making. Classical logical reasoning yields one of two outcomes: TRUE or FALSE, 0 or 1, ON or OFF, or any other suitable two-valued representation. This concept is known as Boolean logic.
Unfortunately, Boolean logic is limited, because it is constrained exclusively to the set {0, 1}; in other words, Boolean logic is too crisp.
This also means that a condition can only be true or false. For example, Boolean logic cannot distinguish between something that is “good” and something that is “very good.” Fuzzy logic removes this limitation.
Fuzzy logic is a subfield of logic and artificial intelligence. Although infinite-valued logics had been studied since 1920, most notably by Łukasiewicz and Tarski, the concept was fully developed in 1965 by Lotfi A. Zadeh in his seminal work on fuzzy set theory.
Fuzzy logic is a type of logic that allows for imprecise or ambiguous solutions to problems, and it serves as the foundation for computer programs that attempt to emulate human reasoning (Microsoft Encarta Encyclopaedia, 2009). Unlike Boolean logic, fuzzy logic extends the range of truth values from the set {0, 1} to the whole interval [0.0, 1.0], and a membership function assigns each element of a set a degree of membership within that interval.
It follows that fuzzy logic is more complex and less precise than Boolean logic, yielding a wider range of results for a condition. Instead of returning only true or false, fuzzy logic might return very true, true, false or very false. This idea is known as the degree of truth, where 0.0 represents total falsehood and 1.0 represents complete truth.
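To make the ideas of a membership function and degrees of truth concrete, the sketch below contrasts a crisp Boolean test with a fuzzy set for the "good" versus "very good" example. The 0 to 10 rating scale, the Boolean cut-off of 7 and the ramp from 5 to 8 are hypothetical choices. The "very" hedge squares the membership degree, which is the concentration operator Zadeh proposed for linguistic hedges.

```python
def boolean_good(score):
    # Crisp test: a rating is either good (True) or not (False).
    return score >= 7.0

def good(score):
    # Hypothetical membership function for "good" on a 0-10 scale:
    # 0 up to 5, rising linearly to 1 at 8, and 1 from there on.
    return min(1.0, max(0.0, (score - 5.0) / 3.0))

def very(mu):
    # Zadeh's concentration hedge: "very A" has membership mu squared.
    return mu ** 2

for score in (4, 6.5, 9):
    print(score, boolean_good(score), good(score), very(good(score)))
```

A rating of 6.5 is simply "not good" to the Boolean test, but it is "good" to degree 0.5 and "very good" to degree 0.25 in the fuzzy version.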
Before delving further into fuzzy logic, we first consider the concept of defuzzification. Given fuzzy sets and the corresponding membership degrees, defuzzification is the process of producing a quantifiable (crisp) result in fuzzy logic. It is commonly used in fuzzy control systems.
Such systems include a set of rules that transform a set of input variables into a fuzzy result, expressed in terms of membership in fuzzy sets. For example, rules for determining how much pressure to apply might produce “Decrease Pressure (15%), Maintain Pressure (34%), and Increase Pressure (72%).”
Defuzzification is the process of converting the fuzzy set membership degrees into a definite choice or real value.
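The pressure example can be defuzzified with a weighted average, a simple variant of the centroid method that reduces each output set to a single representative value. The representative values used below (-10 %, 0 % and +10 % pressure change) and the function name defuzzify_weighted_average are assumptions for illustration.

```python
def defuzzify_weighted_average(memberships, representative_values):
    """Collapse fuzzy output memberships into one crisp value.

    Each output set is reduced to a single representative value, and the
    result is the membership-weighted mean of those values.
    """
    total = sum(memberships.values())
    if total == 0:
        raise ValueError("no rule fired; cannot defuzzify")
    weighted = sum(mu * representative_values[name] for name, mu in memberships.items())
    return weighted / total

# Memberships from the text; representative changes (in %) are hypothetical.
memberships = {"decrease": 0.15, "maintain": 0.34, "increase": 0.72}
pressure_change = {"decrease": -10.0, "maintain": 0.0, "increase": 10.0}

print(round(defuzzify_weighted_average(memberships, pressure_change), 2))  # 4.71
```

The crisp output, an increase of roughly 4.7 %, leans towards "Increase Pressure" because that set has the strongest membership, while the smaller "Decrease" membership pulls it back.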
Fuzzy set theory defines the operations on fuzzy sets. It mirrors human decision making by using degrees of possibility across a variety of uncertain (fuzzy) categories. As a result, fuzzy logic employs IF-THEN-ELSE constructions of the following form:
IF variable IS property THEN action
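A rule of this form can be evaluated numerically: the degree to which the action applies (the rule's firing strength) is the membership degree of its condition, and when the condition combines several clauses with AND, the usual choice is the minimum of their degrees. The archiving rule, the fuzzy sets "old" and "rarely used", and their breakpoints are all hypothetical.

```python
def clamp01(x):
    # Keep a membership degree inside [0.0, 1.0].
    return min(1.0, max(0.0, x))

def old(age_years):
    # Hypothetical fuzzy set "old": 0 up to 2 years, rising linearly to 1 at 7 years.
    return clamp01((age_years - 2.0) / 5.0)

def rarely_used(accesses_per_year):
    # Hypothetical fuzzy set "rarely used": 1 at 0 accesses, falling to 0 at 20 per year.
    return clamp01(1.0 - accesses_per_year / 20.0)

def archive_degree(age_years, accesses_per_year):
    # Rule: IF age IS old AND usage IS rarely_used THEN archive.
    # Firing strength = fuzzy AND (minimum) of the condition memberships.
    return min(old(age_years), rarely_used(accesses_per_year))

print(archive_degree(4.5, 5))  # 0.5
print(archive_degree(10, 0))   # 1.0
```

A 4.5-year-old file opened five times a year satisfies the rule only to degree 0.5, so the archiving action applies partially rather than being simply triggered or not.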
The Boolean logic operators AND, OR and NOT are also used in fuzzy logic, where they are implemented by the MINIMUM, MAXIMUM and COMPLEMENT operations, respectively. These are also known as the Zadeh operators and are defined as follows:
– AND: If $X_a$ is the degree of membership of a measurable variable in set A, and $X_b$ is the degree of membership of another measurable variable in set B, the fuzzy AND is:
$$X_a \wedge X_b = \min(X_a, X_b)$$
– OR: With $X_a$ and $X_b$ defined as above, the fuzzy OR is:
$$X_a \vee X_b = \max(X_a, X_b)$$
– NOT: The fuzzy NOT (complement) of a membership degree $X_a$ is:
$$\overline{X_a} = 1 - X_a$$
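The three Zadeh operators translate directly into code. The sketch applies them element by element to two fuzzy sets defined over the same files; the file names, the set labels and the membership degrees are invented for illustration.

```python
def fuzzy_and(a, b):
    return min(a, b)  # intersection: minimum of the membership degrees

def fuzzy_or(a, b):
    return max(a, b)  # union: maximum of the membership degrees

def fuzzy_not(a):
    return 1.0 - a    # complement

# Hypothetical memberships of three files in two fuzzy sets.
A = {"report.pdf": 0.75, "memo.txt": 0.25, "plan.docx": 1.0}  # e.g. "financial"
B = {"report.pdf": 0.5, "memo.txt": 1.0, "plan.docx": 0.0}    # e.g. "recent"

A_and_B = {x: fuzzy_and(A[x], B[x]) for x in A}
A_or_B = {x: fuzzy_or(A[x], B[x]) for x in A}
not_A = {x: fuzzy_not(A[x]) for x in A}

print(A_and_B)  # {'report.pdf': 0.5, 'memo.txt': 0.25, 'plan.docx': 0.0}
print(A_or_B)   # {'report.pdf': 0.75, 'memo.txt': 1.0, 'plan.docx': 1.0}
print(not_A)    # {'report.pdf': 0.25, 'memo.txt': 0.75, 'plan.docx': 0.0}
```

When every degree is restricted to 0 or 1, these three functions reduce exactly to Boolean AND, OR and NOT, which is why fuzzy logic can be seen as a generalisation of Boolean logic.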
Fuzzy logic has been applied in a variety of fields, including medicine, technical equipment, databases and archives. Its use in archives is an application within the broader field of information retrieval systems.
Archiving is the process of moving large files or data, often in compressed form, into long-term storage. Archives commonly consist of compressed files with extensions such as .zip and .rar.
Most archives contain very old files that are needed only for reference, not for everyday processing. An archive is a collection of records containing original source materials accumulated over the lifetime of an individual or organisation.
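As a small illustration of compression for archiving, the sketch below packs files into a .zip archive using Python's standard zipfile module with DEFLATE compression. The function names archive_files and demo, the file name and the highly repetitive sample content (which compresses far better than typical documents) are assumptions.

```python
import os
import tempfile
import zipfile

def archive_files(paths, archive_path):
    """Write the given files into a DEFLATE-compressed .zip archive."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            # Store each file under its base name, without directory structure.
            zf.write(path, arcname=os.path.basename(path))

def demo():
    """Archive one sample file and report original size, archive size, contents."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "ledger_2009.txt")
        with open(src, "w") as fh:
            fh.write("archived record\n" * 10_000)  # hypothetical, very repetitive data
        dst = os.path.join(tmp, "ledger_2009.zip")
        archive_files([src], dst)
        with zipfile.ZipFile(dst) as zf:
            names = zf.namelist()
        return os.path.getsize(src), os.path.getsize(dst), names

orig_size, zip_size, names = demo()
print(names)                # ['ledger_2009.txt']
print(zip_size < orig_size)  # True
```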
Archiving provides numerous benefits, including improved performance, more available storage space and lower maintenance costs. Despite these benefits, organisations cannot archive data whenever they please.
To meet legal and regulatory obligations, an organisation must keep data in its database for a specified period before archiving it. The main benefits of data archiving are as follows:
– A good data archiving strategy can be considerably less expensive than the traditional approach of simply adding extra storage (discs) and servers.
– If a misdemeanour or illegal act is detected, data archives can be used to retrieve the relevant information at a later date. This has become increasingly important in recent years because of numerous incidents of illegal activity, such as drug sales conducted using company computer resources, and even cases involving terrorist operations.
– Data archiving systems can compress information, reducing an organisation's storage requirements.
– Data or content archiving solutions can automatically ensure that documents or records are not duplicated. Storing the same information more than once can be a significant drain on an organisation's resources.
– Mitigation of regulatory violations: implementing a data archiving system reduces the risk of violating important codes of practice and other legislation.
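Duplicate detection of the kind mentioned in the list above is often done by hashing file contents, so that identical documents produce identical digests. The sketch below, a hypothetical helper find_duplicates using SHA-256 from Python's hashlib, only catches byte-for-byte copies; near-duplicates would need a similarity measure instead. The document names and contents are invented.

```python
import hashlib
from collections import defaultdict

def find_duplicates(documents):
    """Group document names whose contents are byte-for-byte identical.

    `documents` maps a name to its raw bytes. Only exact copies are
    detected; near-duplicates hash to different digests.
    """
    by_digest = defaultdict(list)
    for name, content in documents.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(name)
    return [sorted(names) for names in by_digest.values() if len(names) > 1]

docs = {
    "minutes_2009.txt": b"Board meeting minutes, March 2009",
    "budget_2009.txt": b"Annual budget 2009",
    "minutes_2009_copy.txt": b"Board meeting minutes, March 2009",
}
print(find_duplicates(docs))  # [['minutes_2009.txt', 'minutes_2009_copy.txt']]
```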
Archived data remains accessible on request, but traditionally it must be reloaded into the online database before it can be accessed. SAP NetWeaver 2004s, however, introduced a new archiving method known as NearLine Storage.
NearLine Storage serves as a bridge between traditional archiving and the online database: it allows archived data to be retrieved without first reloading it into the online database. Archives are classified into two types:
– Online archiving: a system in which the archive is physically connected to an organisation's network at all times. It is efficient, allows quick access to stored content and makes it possible to automate the archiving process.
– Offline archiving: a system in which an IT manager must archive data from a computer network and then physically move that data to a separate system for retention.
The disadvantage is the time and labour this process requires; moreover, retrieving any archived data means carrying out the whole process in reverse.
1.1 DEFINITION OF THE PROBLEM
As previously stated, fuzzy logic for archiving is a subset of information retrieval. In general, the difficulty lies in retrieving documentary information from storage in response to search queries (G. Klir et al., 1995). This work is concerned with the storage, representation and organisation of information items and with access to them, since archiving is a relatively compact means of storing data that eases disc and space management.
The following points describe in greater detail the potential issues that can arise when using fuzzy logic in archives:
– Although memory wastage is not a major concern in archiving systems, archival storage capacity always is, because archived data is often immutable and, as stated previously, cannot be erased until its retention period expires. Diligent capacity management is therefore required to ensure that the archive does not run out of space.
– Because archives can contain hundreds of gigabytes of unique material, finding files can be arduous and time-consuming. Strong indexing and search capabilities are therefore necessary.
– Data duplication can be a particularly troublesome issue in archives, as redundant data can persist in the archive for longer than planned, leading to data inconsistency.
– The retrieved documents must be ranked in order of their relevance to the user query.
– The usefulness of a document in an archive can be difficult to determine.
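The indexing and search requirement listed above is commonly met with an inverted index that maps each term to the documents containing it. The minimal sketch below (hypothetical names tokenize, build_index and lookup, a simple lowercase alphanumeric tokeniser, and invented documents) performs a crisp Boolean AND lookup: it finds matching documents quickly but cannot rank them, which is the gap the fuzzy approach is meant to fill.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase alphanumeric runs; a deliberately simple tokeniser.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def lookup(index, query):
    """Return ids of documents containing every query term (Boolean AND)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()

docs = {
    "d1": "Annual budget report 2009",
    "d2": "Budget meeting minutes",
    "d3": "Staff leave policy",
}
index = build_index(docs)
print(lookup(index, "budget report"))  # {'d1'}
```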
1.2 GOALS AND OBJECTIVES
Because of the problems encountered in managing archives, and the difficulty of properly classifying records by their level of significance and their degree of membership, the goals of this research are as follows:
– To replace exact matching with partial matching: the degree of relevance of each document to the user query is computed from the membership values of the query words in the document representations.
– To represent the data properly, so that it is clear which data belongs to which set (archive), and to apply fuzzy logic operations to determine which items are members, partial members or non-members.
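The two objectives can be illustrated together. In the sketch below, each document is represented as a fuzzy set of terms, where a term's membership is its frequency divided by the frequency of the document's most common term (a normalisation assumed here, not taken from this work). A document's relevance to a query is the mean of the query words' memberships, a compensatory choice made so that matching only some words still yields partial relevance; the fuzzy AND (min) or OR (max) are drop-in alternatives. Documents are then ranked and labelled as members, partial members or non-members of the result set. All function names and sample documents are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def fuzzy_representation(text):
    """Document as a fuzzy set of terms: membership = tf / max tf (assumed)."""
    counts = Counter(tokenize(text))
    if not counts:
        return {}
    top = max(counts.values())
    return {term: n / top for term, n in counts.items()}

def relevance(query, representation):
    """Mean membership of the query words (compensatory partial match)."""
    terms = tokenize(query)
    if not terms:
        return 0.0
    return sum(representation.get(t, 0.0) for t in terms) / len(terms)

def classify(mu):
    if mu >= 1.0:
        return "member"
    if mu > 0.0:
        return "partial member"
    return "not a member"

def rank(query, docs):
    """Return (doc_id, relevance, label) tuples, most relevant first."""
    scored = [(relevance(query, fuzzy_representation(text)), doc_id)
              for doc_id, text in docs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(doc_id, mu, classify(mu)) for mu, doc_id in scored]

docs = {
    "d1": "Budget report budget figures",
    "d2": "Budget meeting minutes",
    "d3": "Staff leave policy",
    "d4": "Budget report",
}
for row in rank("budget report", docs):
    print(row)
```

With the query "budget report", d4 is a full member (1.0), d1 and d2 are partial members (0.75 and 0.5) and d3 is not a member, giving a ranked list rather than a simple match or no-match answer.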