Added new method to get speakers #1
…has more than 4 words
| flag_3=0 | ||
| firm='' | ||
| for c in range(name_index+len(name.split())-position,len(w)): | ||
| if ((w[c][0].isupper() or w[c]=='individual') and flag_2==0 and w[c]!='Sir,' and w[c]!='Mr.'): |
If logic is repeated between passes, let's factor it out to reduce duplication.
There would be too many variables involved to write a meaningful function. Also, some conditions vary from pass to pass.
| speakers1=_get_speakers_capitals(pdf_name) | ||
| for s in speakers: | ||
| names.append(s[0]) | ||
| for s in speakers1: |
We could add capitals as another algorithm type and return only the results from that algorithm.
After getting speaker names and firms from the capitals method, I keep only the names that both methods found, which reduces the chance of false positives.
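For reference, the "common names" step described above could look roughly like the sketch below. It is not the code in this PR: `common_speakers` is a hypothetical helper, and it assumes each speaker is a `(name, firm)` pair (the diff reads the name as `s[0]`) and that names should be compared case-insensitively.

```python
from typing import List, Optional, Tuple

# Assumed shape of one result: (name, firm). The diff only shows s[0] as the name.
SpeakerPair = Tuple[str, Optional[str]]


def common_speakers(
    primary: List[SpeakerPair], secondary: List[SpeakerPair]
) -> List[SpeakerPair]:
    """Keep entries of `primary` whose name also appears in `secondary`.

    Hypothetical helper: names are compared case-insensitively and the
    firm is taken from `primary`.
    """
    secondary_names = {name.strip().lower() for name, _ in secondary}
    return [
        (name, firm)
        for name, firm in primary
        if name.strip().lower() in secondary_names
    ]


primary = [
    ("Moderator", None),
    ("Sayam Pokharna", "The Investment Lab"),
    ("Thank You", None),
]
secondary = [("sayam pokharna", "Investment Lab"), ("Moderator", None)]
filtered = common_speakers(primary, secondary)
```

Here `"Thank You"` is dropped because only one method reported it.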
| if count/len(name.split())>=0.5 and count>=count_max: | ||
| # flag_2 is used to check if we've found any word starting with a capital | ||
| # flag_3 is used to check if we've reached the end of the conversation while adding names to the firm | ||
| flag_2=0 |
nit: We can use more meaningful names instead of flag.
The meaning is explained in the comments, and the variables are not used that often.
| if count/len(name.split())>=0.33 and count>=count_max: | ||
| # flag_2 is used to check if we've found any word starting with a capital | ||
| # flag_3 is used to check if we've reached the end of the conversation while adding names to the firm | ||
| flag_2=0 |
If code blocks are repeated between passes, we can group them into a function.
Too many variables are involved to write a logical function.
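One possible middle ground, as a sketch only: the two passes shown in the diff differ in the ratio threshold (0.5 vs 0.33), so that value could become a parameter while the rest stays shared. Everything below is hypothetical. In particular, `best_match` is not a function in this PR, and it assumes `count` is the number of words of `name` that also appear in a candidate string, which the diff does not confirm.

```python
from typing import Iterable, Optional


def best_match(name: str, candidates: Iterable[str], min_ratio: float) -> Optional[str]:
    """Return the candidate sharing the most words with `name`.

    A candidate qualifies only if the matched fraction of `name`'s words
    is at least `min_ratio`. ASSUMPTION: this mirrors the
    `count/len(name.split()) >= threshold and count >= count_max` check
    from the diff, with `count` meaning matched words.
    """
    name_words = name.split()
    if not name_words:
        return None
    best = None
    count_max = 0
    for candidate in candidates:
        candidate_words = set(candidate.split())
        count = sum(1 for word in name_words if word in candidate_words)
        if count > 0 and count / len(name_words) >= min_ratio and count >= count_max:
            best, count_max = candidate, count
    return best


# The passes would then differ only in the threshold they pass in.
strict = best_match("A B C", ["A x"], min_ratio=0.5)    # 1/3 < 0.5, no match
lenient = best_match("A B C", ["A x"], min_ratio=0.33)  # 1/3 >= 0.33, match
```

Conditions that genuinely vary from pass to pass would still need their own arguments or stay inline, so this may only cover part of the duplication.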
| if not speakers and (algorithm == "auto" or algorithm == "plain"): | ||
| speakers = _get_speakers_from_text(doc) | ||
| return speakers | ||
| speakers_in_capital=_get_speakers_capitals(pdf_name) |
What if we just want to process based on bold only?
I've tried to address this partially, but we need to come up with a better way to integrate all the algorithms.
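One way the integration could be structured, offered as a sketch under assumptions rather than a proposal for the exact API: keep a mapping from algorithm name to extractor, run only the requested one, and let "auto" fall through the extractors in order. The names `get_speakers`, `extractors`, `"bold"`, and `"capitals"` are illustrative; only `"auto"` and `"plain"` appear in the diff.

```python
from typing import Callable, Dict, List, Optional, Tuple

SpeakerPair = Tuple[str, Optional[str]]   # assumed (name, firm) shape
Extractor = Callable[[], List[SpeakerPair]]


def get_speakers(extractors: Dict[str, Extractor], algorithm: str = "auto") -> List[SpeakerPair]:
    """Run one named extractor, or try each in order when algorithm is "auto".

    Hypothetical dispatcher: `extractors` maps an algorithm name to a
    zero-argument callable returning (name, firm) pairs.
    """
    if algorithm != "auto":
        if algorithm not in extractors:
            raise ValueError(f"unknown algorithm: {algorithm!r}")
        # An explicit choice runs only that extractor, even if it finds nothing.
        return extractors[algorithm]()
    # "auto": return the first non-empty result, in insertion order.
    for extractor in extractors.values():
        speakers = extractor()
        if speakers:
            return speakers
    return []


# Stub extractors standing in for the real bold/plain/capitals methods.
extractors = {
    "bold": lambda: [],
    "plain": lambda: [("Moderator", None)],
    "capitals": lambda: [("Sayam Pokharna", "The Investment Lab")],
}
bold_only = get_speakers(extractors, "bold")
auto_result = get_speakers(extractors, "auto")
```

With this shape, asking for bold only never triggers the capitals pass, and any cross-checking between methods can be added as its own named entry.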
| expected = [ | ||
| Speaker(name="Moderator", firm=None), | ||
| Speaker(name="Sayam Pokharna", firm="The Investment Lab"), | ||
| Speaker(name="Moderator", firm=None, is_management="No"), |
Change is_management to a boolean (True/False).
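A minimal sketch of what the boolean version might look like. The real `Speaker` definition is not shown in this thread, so the dataclass below is an assumption, including the `False` default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Speaker:
    """Hypothetical stand-in for the project's Speaker type."""
    name: str
    firm: Optional[str] = None
    is_management: bool = False  # assumed default; was the string "No" in the test


expected = [
    Speaker(name="Moderator", firm=None),
    Speaker(name="Sayam Pokharna", firm="The Investment Lab"),
    Speaker(name="Moderator", firm=None, is_management=False),
]
```

A real boolean avoids the pitfall that the string `"No"` is truthy in Python, so `if speaker.is_management:` would behave as intended.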
Added a new method that identifies names and firms using the heuristic that proper nouns start with capital letters.
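To illustrate the heuristic in isolation, here is a small sketch that groups consecutive capitalized words into candidate names. It is not the implementation in this PR: `capitalized_runs` is a hypothetical function, and the skip list only borrows the two tokens (`Sir,` and `Mr.`) that appear in the diff.

```python
from typing import List, Sequence


def capitalized_runs(text: str, skip: Sequence[str] = ("Sir,", "Mr.")) -> List[str]:
    """Group consecutive capitalized words into candidate proper nouns.

    Illustrative only: words in `skip` and lowercase words end the
    current run.
    """
    runs: List[str] = []
    current: List[str] = []
    for word in text.split():
        if word[0].isupper() and word not in skip:
            current.append(word)
        else:
            if current:
                runs.append(" ".join(current))
            current = []
    if current:
        runs.append(" ".join(current))
    return runs


candidates = capitalized_runs(
    "We have Mr. Sayam Pokharna from The Investment Lab joining"
)
# candidates == ["We", "Sayam Pokharna", "The Investment Lab"]
```

The sentence-initial "We" is picked up as well, which shows why a capitals-only pass tends to need some extra filtering.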