Added new method to get speakers #1

Open
adityaladhad wants to merge 8 commits into Mittal-Analytics:main from adityaladhad:speakers

Conversation

@adityaladhad

Added a new method that identifies speaker names and firms, based on the core idea that proper nouns start with capital letters.
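
The heuristic described above can be illustrated with a minimal sketch. It assumes the transcript text is already available as a plain string; the function name `capitalized_runs` and the `TITLES` set are hypothetical and are not taken from the PR.

```python
# Hypothetical set of honorifics to skip; the PR's own loop checks 'Sir,' and 'Mr.'.
TITLES = {"Mr.", "Mrs.", "Ms.", "Dr.", "Sir", "Sir,"}


def capitalized_runs(text):
    """Return runs of consecutive capitalized words as candidate proper nouns."""
    runs, current = [], []
    for word in text.split():
        token = word.strip(",.:;")
        if token and token[0].isupper() and word not in TITLES:
            current.append(token)
        else:
            # A lowercase word or a title ends the current run.
            if current:
                runs.append(" ".join(current))
            current = []
    if current:
        runs.append(" ".join(current))
    return runs
```

A sentence-initial word such as "We" would also be picked up by this sketch, which is the kind of false positive the discussion further down tries to filter out.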

Comment thread src/speakers/extraction.py Outdated
flag_3=0
firm=''
for c in range(name_index+len(name.split())-position,len(w)):
    if ((w[c][0].isupper() or w[c]=='individual') and flag_2==0 and w[c]!='Sir,' and w[c]!='Mr.'):

If logic is repeated between passes, let's factor it out to avoid rewriting it.

Author

Too many variables would be involved to write a meaningful function. Also, some conditions vary from pass to pass.

Comment thread src/concall_tools.py Outdated
speakers1=_get_speakers_capitals(pdf_name)
for s in speakers:
    names.append(s[0])
for s in speakers1:

We can add another algorithm type, "capitals", and maybe return only the results from that algorithm.

Author

After getting speaker names and firms from the capitals method, I find the names common to both result sets and return only those, to reduce the possibility of false positives.
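
As a rough sketch of the filtering described in this reply: keep only the speakers whose names appear in both result sets. The helper name `common_speakers` is hypothetical, and the `(name, firm)` tuple shape is an assumption based on the `s[0]` indexing in the quoted diff.

```python
def common_speakers(primary, capitals):
    """Keep speakers from `primary` whose name also appears in `capitals`.

    Both arguments are assumed to be lists of (name, firm) tuples.
    Names are compared case-insensitively after trimming whitespace.
    """
    capital_names = {name.strip().lower() for name, _firm in capitals}
    return [s for s in primary if s[0].strip().lower() in capital_names]
```

Entries found by only one method are dropped, so this trades some recall for fewer false positives.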

@manujagobind left a comment

Good work!

Comment thread src/speakers/extraction.py Outdated
if count/len(name.split())>=0.5 and count>=count_max:
    #flag_2 is used to check if we've found any word starting with a capital
    #flag_3 is used to check if we've reached the end of the conversation while adding names to the firm
    flag_2=0

nit: We can use more meaningful names than flag_2 and flag_3.

Author

The meaning is explained in the comments, and the variables are not used that often.
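
For illustration only, here is how the nit could look if the flags carried their meaning in their names. The helper `collect_firm` and its stopping rule are assumptions, a guess at the intent of the quoted loop rather than a copy of it.

```python
def collect_firm(words, start):
    """Collect a firm name from `words`, scanning from index `start`.

    Hypothetical helper: capitalized words (or 'individual') are gathered
    until a lowercase word or trailing punctuation ends the firm name.
    """
    found_capitalized_word = False  # loosely plays the role of flag_2
    firm_complete = False           # loosely plays the role of flag_3
    firm_words = []
    for word in words[start:]:
        if firm_complete:
            break
        is_candidate = (word[:1].isupper() or word == "individual") and word not in ("Sir,", "Mr.")
        if is_candidate:
            found_capitalized_word = True
            firm_words.append(word.rstrip(".,"))
            # Assumption: trailing punctuation marks the end of the firm name.
            if word[-1] in ".,":
                firm_complete = True
        elif found_capitalized_word:
            firm_complete = True
    return " ".join(firm_words) or None
```

With names like these, the two explanatory comments in the diff become largely unnecessary.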

Comment thread src/speakers/extraction.py Outdated
if count/len(name.split())>=0.33 and count>=count_max:
    #flag_2 is used to check if we've found any word starting with a capital
    #flag_3 is used to check if we've reached the end of the conversation while adding names to the firm
    flag_2=0

If code blocks repeat between passes, we can group them into a function.

Author

Too many variables are involved to write a sensible function.
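
One possible shape for the refactor under discussion, sketched under assumptions: the two quoted passes differ only in their threshold (0.5 versus 0.33), and `count` is the number of words of the name found in a line. The function names and the list-of-lines input are hypothetical.

```python
def best_match(name, lines, min_ratio):
    """Return the index of the line that best matches `name`, or None.

    A line qualifies when the fraction of the name's words found in it
    is at least `min_ratio`; among qualifying lines, later ties win.
    """
    name_words = name.split()
    if not name_words:
        return None
    best_index, count_max = None, 0
    for index, line in enumerate(lines):
        line_words = set(line.split())
        count = sum(1 for word in name_words if word in line_words)
        if count / len(name_words) >= min_ratio and count >= count_max:
            best_index, count_max = index, count
    return best_index


def match_in_passes(name, lines, ratios=(0.5, 0.33)):
    """Try each threshold in turn, strictest first, and stop at the first hit."""
    for min_ratio in ratios:
        index = best_match(name, lines, min_ratio)
        if index is not None:
            return index
    return None
```

Conditions that genuinely vary between passes could be passed in the same way as `min_ratio`, which keeps the shared loop in one place.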

Comment thread src/concall_tools.py Outdated
if not speakers and (algorithm == "auto" or algorithm == "plain"):
    speakers = _get_speakers_from_text(doc)
return speakers
speakers_in_capital=_get_speakers_capitals(pdf_name)

What if we want to process based on bold text only?

Author

I've partly addressed this, but we need to come up with a better way to integrate all the algorithms.
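
One way the algorithms might be integrated is a small dispatcher, sketched below. It assumes each algorithm can be wrapped as a callable that takes the same input; the `extractors` mapping and the lambda stand-ins are hypothetical placeholders, not the project's real functions.

```python
def get_speakers(source, extractors, algorithm="auto"):
    """Run one named extractor, or for "auto" return the first non-empty result.

    `extractors` maps an algorithm name to a callable that takes `source`
    and returns a list of speakers. Dict order decides the "auto" order.
    """
    if algorithm == "auto":
        for extract in extractors.values():
            speakers = extract(source)
            if speakers:
                return speakers
        return []
    if algorithm not in extractors:
        raise ValueError(f"unknown algorithm: {algorithm!r}")
    return extractors[algorithm](source)


# Stand-in extractors for demonstration only.
extractors = {
    "bold": lambda source: [],
    "plain": lambda source: [("Moderator", None)],
    "capitals": lambda source: [("Sayam Pokharna", "The Investment Lab")],
}
```

With this shape, requesting `algorithm="bold"` runs only the bold extractor, which addresses the question raised above.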

expected = [
    Speaker(name="Moderator", firm=None),
    Speaker(name="Sayam Pokharna", firm="The Investment Lab"),
    Speaker(name="Moderator", firm=None, is_management="No"),

Change is_management to a boolean (True/False).
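
A minimal sketch of what the requested change could look like. The field names follow the quoted test, but the dataclass definition, its defaults, and the `parse_is_management` helper are assumptions; the project's actual `Speaker` type may be defined differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Speaker:
    # Hypothetical definition; defaults are assumed, not taken from the PR.
    name: str
    firm: Optional[str] = None
    is_management: bool = False


def parse_is_management(value):
    """Convert "Yes"/"No" style strings to a bool; pass bools through."""
    if isinstance(value, bool):
        return value
    normalized = str(value).strip().lower()
    if normalized in ("yes", "true"):
        return True
    if normalized in ("no", "false"):
        return False
    raise ValueError(f"cannot interpret {value!r} as a boolean")
```

The test entry would then read `Speaker(name="Moderator", firm=None, is_management=False)`.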

2 participants