The goal of this project is to measure the similarity between two sentences in any language, whether a natural language or a programming language. The first step is to parse each sentence and obtain its tree structure. We then apply Zhang and Shasha's tree edit distance algorithm to the parsed trees. We modified the algorithm, because the pure version was not appropriate here: each tree is divided into a forest according to part of speech, a tree distance is calculated for each part, and each part-of-speech structure is assigned a different weight. For example, compare "Ram is playing." with "Sita is playing.": changing the noun leaves the meaning of the sentence largely intact. Now compare "Ram is playing." with "Ram is eating.": changing the verb completely changes the meaning.
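The idea of splitting a parse tree by part of speech and weighting the per-part distances can be sketched as follows. This is only an illustration under several assumptions: it uses a simple memoized recursion for unit-cost ordered tree edit distance (the same distance Zhang and Shasha's algorithm computes more efficiently), not the modified algorithm from this repository; the parse trees are written by hand rather than produced by a parser; and the helper names (`node`, `forest_distance`, `split_by_category`, `weighted_distance`) and the weights in `WEIGHTS` are hypothetical.

```python
from functools import lru_cache


def node(label, *children):
    """Build a tree as a nested tuple: (label, (child, child, ...))."""
    return (label, tuple(children))


@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests (tuples of trees)."""
    if not f1 and not f2:
        return 0
    if not f1:
        _, kids = f2[-1]
        return forest_distance(f1, f2[:-1] + kids) + 1  # insert a node
    if not f2:
        _, kids = f1[-1]
        return forest_distance(f1[:-1] + kids, f2) + 1  # delete a node
    (l1, k1), (l2, k2) = f1[-1], f2[-1]
    return min(
        forest_distance(f1[:-1] + k1, f2) + 1,           # delete rightmost root of f1
        forest_distance(f1, f2[:-1] + k2) + 1,           # insert rightmost root of f2
        forest_distance(f1[:-1], f2[:-1])                # match the two rightmost roots,
        + forest_distance(k1, k2) + (l1 != l2),          # relabeling if they differ
    )


def split_by_category(tree, category):
    """Collect the topmost subtrees whose root label equals `category`."""
    label, kids = tree
    if label == category:
        return (tree,)
    found = ()
    for kid in kids:
        found += split_by_category(kid, category)
    return found


def weighted_distance(t1, t2, weights):
    """Weighted sum of per-category forest distances."""
    return sum(
        w * forest_distance(split_by_category(t1, c), split_by_category(t2, c))
        for c, w in weights.items()
    )


# Hypothetical weights, chosen by hand for illustration only.
WEIGHTS = {"NP": 0.2, "VP": 1.0}


def sentence(subject, verb):
    """Hand-written parse tree for '<subject> is <verb>' (an assumption)."""
    return node("S",
                node("NP", node("NNP", node(subject))),
                node("VP", node("VBZ", node("is")),
                     node("VP", node("VBG", node(verb)))))


ram_playing = sentence("Ram", "playing")
sita_playing = sentence("Sita", "playing")
ram_eating = sentence("Ram", "eating")

noun_change = weighted_distance(ram_playing, sita_playing, WEIGHTS)
verb_change = weighted_distance(ram_playing, ram_eating, WEIGHTS)
```

With these illustrative weights, `noun_change` comes out smaller than `verb_change`, which matches the intuition in the example sentences: a different noun costs less than a different verb.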

So clearly, different parts of a sentence need different weights when computing similarity. We used regression techniques to find these weights.
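As a rough illustration of how such weights could be fitted, the sketch below runs ordinary least squares by batch gradient descent on per-part-of-speech distances. The README only says "regression techniques", so the specific method is an assumption, and the function name `fit_weights`, the toy feature rows, and the target scores are all hypothetical rather than taken from the project's data.

```python
def fit_weights(features, targets, lr=0.05, epochs=2000):
    """Fit linear weights (no intercept) by minimizing mean squared error."""
    n, k = len(features), len(features[0])
    w = [0.0] * k
    for _ in range(epochs):
        grad = [0.0] * k
        for x, y in zip(features, targets):
            # Prediction error for this sentence pair.
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j in range(k):
                grad[j] += 2 * err * x[j] / n
        # Step against the gradient of the mean squared error.
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


# Toy data (made up): each row is [noun-phrase distance, verb-phrase distance]
# for one sentence pair; each target is a dissimilarity score for that pair.
toy_features = [[1, 0], [0, 1], [1, 1], [2, 1], [0, 2]]
toy_targets = [0.2, 1.0, 1.2, 1.4, 2.0]

noun_weight, verb_weight = fit_weights(toy_features, toy_targets)
```

On this toy data the fitted verb weight ends up larger than the noun weight, because the targets were constructed so that verb-phrase differences contribute more to dissimilarity.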

Repository contents:

.settings
bin/ZangShasha
src/ZangShasha
.classpath
.project
README.md
ass2.jar
ass3.jar
ass4.jar
dataset.txt
en-parser-chunking.bin
msr-test-mini.txt
msr-test.txt
msr-train-mini.txt
msr-train.txt
msr-val.txt
pom.xml
predict.txt
weights.txt
ython.py
