Design
File-Inspector is a GUI-driven tool that reports statistics about the words and sentences of an ASCII file supplied by the user. The statistics include the number of sentences, the number of words, the number of newlines, and the most and least frequently occurring words. The tool can also plot a histogram of the word frequencies in the file.
The user can also provide a second file containing a list of keywords, and File-Inspector then shows all the sentences in the input file that contain those keywords.
This document describes the implementation details of the File-Inspector tool. It also serves as a user manual, since it explains the user interface, the buttons, and the operations users can perform with the tool.
The project is written entirely in the Python programming language.
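The counts listed above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code: the function name `basic_stats` is hypothetical, and the rules for what counts as a sentence or a word are assumptions, since this document does not define them.

```python
import re


def basic_stats(text):
    """Return sentence, word, and newline counts for a string (hypothetical helper)."""
    # Assumption: a sentence ends at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is a run of ASCII letters, digits, or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    return {
        "sentences": len(sentences),
        "words": len(words),
        "newlines": text.count("\n"),
    }


sample = "The cat sat. The dog ran!\nThe end.\n"
print(basic_stats(sample))
```

For the sample string this reports 3 sentences, 8 words, and 2 newlines.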
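A minimal sketch of such a keyword search follows. The function name `sentences_with_keywords` is hypothetical, and the matching rules (case-insensitive, whole words, a sentence matches if it contains at least one keyword) are assumptions rather than documented behaviour of the tool.

```python
import re


def sentences_with_keywords(text, keywords):
    """Return the sentences of `text` that contain at least one keyword."""
    # Assumption: sentences end at '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Assumption: matching is case-insensitive and on whole words.
    wanted = {k.lower() for k in keywords}
    matches = []
    for sentence in sentences:
        words = set(re.findall(r"[a-z0-9']+", sentence.lower()))
        if words & wanted:
            matches.append(sentence)
    return matches


text = "Python is fun. Java is verbose! Do you like python?"
print(sentences_with_keywords(text, ["python"]))
```

Here the first and third sentences are returned, because both contain the keyword once case is ignored.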
-
File Inspector is a GUI-driven file analysis tool written entirely in Python. It uses the tkinter library to render the GUI, pandas to manipulate the data, and matplotlib to plot histograms. Data structures such as maps are used for efficient searching and lower time overhead, which keeps the tool scalable to larger files. The tool does not use a database.
This document is intended for software developers and designers. Dr. Padmanabhan Rajan, the client for the software, may also read the document.
-
-

-
Input Module (loading the data): We first load the data into a pandas DataFrame, which makes it easy to read, present, and manipulate.
-
Processing Module: We analyse the data by plotting different types of graphs to visualize variation and to show relationships between variables.
-
Output Module: The results present the useful information obtained from analysing the data.
-
The project uses the following data structures:
-
Array: We used arrays to hold all the required words from the file.
Below are some advantages of arrays:
- Accessing an element by its index takes O(1) time.
- Searching an array is straightforward and takes O(n) time.
- Arrays store multiple values of a similar type efficiently.
Alternatives / different tradeoffs: Linked lists or vectors can be used in place of arrays; they are more flexible from a memory allocation point of view.
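As a small illustration of the two costs mentioned above, here is a sketch using Python's built-in list, which is a dynamic array. The sample sentence and variable names are made up for the example.

```python
# Hypothetical sample data: the words of a short sentence stored in a list.
words = "the quick brown fox jumps over the lazy dog".split()

# Index access is O(1): the element is read directly from its position.
third = words[2]

# Membership tests and index() scan the list from the start, so they are O(n).
has_fox = "fox" in words
first_the = words.index("the")

print(third, has_fox, first_the)
```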
-
Map: We used maps to store the words and their frequencies, which lets us find the most and the least frequent words.
-
A map is a fairly well-rounded dictionary-type container that provides several advantages over std::list (linked lists) and std::vector (arrays).
-
Lookup time: A map maintains reasonable lookup performance (O(log n)), and each entry stores only two items, a key and a value. A map also supports lookup on any key type that defines a < operator, or for which a comparison is supplied to the map through a template argument. This gives reasonable lookup performance on, for example, maps from strings to another value.
Alternatives / different tradeoffs:
- multimap is like map but allows non-unique keys.
- unordered_map is a map that does not store items in order, but can provide better lookup performance if a good hash function is provided.
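In Python, the closest built-in counterpart is `dict`, a hash table that behaves like the unordered_map alternative (average O(1) lookup) rather than the ordered map. The sketch below shows how a word-to-count map can give the most and least frequent words. The function names are hypothetical and this is not necessarily how the tool implements it.

```python
def word_frequencies(words):
    """Build a word -> count map from a list of words."""
    counts = {}
    for word in words:
        # get() returns 0 for a word seen for the first time.
        counts[word] = counts.get(word, 0) + 1
    return counts


def most_and_least_frequent(counts):
    """Return (most frequent, least frequent); ties go to the word seen first."""
    most = max(counts, key=counts.get)
    least = min(counts, key=counts.get)
    return most, least


words = ["to", "be", "or", "not", "to", "be", "to"]
counts = word_frequencies(words)
print(counts)
print(most_and_least_frequent(counts))
```

Building the map takes one pass over the words, and finding the extremes takes one pass over the distinct words.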
-
-
-
The project is scripted entirely in Python and relies on several libraries, including:
-
tkinter: The standard GUI library for the Python programming language, which makes it possible to create GUI applications. We have used this library to design our tool and to obtain its various controls, such as buttons, labels, and text boxes.
-
statistics: A Python library for calculating mathematical statistics of numeric (real-valued) data, such as mean, standard deviation, variance, and mode.
-
matplotlib: A plotting library for the Python programming language. We have used this library to plot the histogram for word-frequency.
-
NumPy: A Python library that provides efficient operations, especially on arrays. We have used this library to store the list of words and to find the set of unique elements among them.
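As a brief illustration of the statistics library listed above, the sketch below computes a few summary values over word lengths. The choice of word lengths as the input is an assumption made for the example; this document does not say which values the tool passes to the library.

```python
import statistics

# Hypothetical input: the lengths of a few sample words.
sample_words = ["to", "be", "or", "not", "to", "be"]
lengths = [len(w) for w in sample_words]

mean_length = statistics.mean(lengths)      # arithmetic mean
median_length = statistics.median(lengths)  # middle value
mode_length = statistics.mode(lengths)      # most common value
spread = statistics.stdev(lengths)          # sample standard deviation

print(mean_length, median_length, mode_length, spread)
```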
-