Hi,
I'm having trouble setting up the environment for this. I'm using a conda environment on Windows and get the same problem with Python 3.9, 3.10, and 3.11. I also made sure to pip install from the requirements.txt here before running pip install newspaper4k.
The first error I hit is on import:
  File "c:\Users...\scrape_from_urls.py", line 1, in <module>
    import newspaper
  File "C:\Users...\site-packages\newspaper\__init__.py", line 17, in <module>
    from .api import (
  File "C:\Users...\site-packages\newspaper\api.py", line 11, in <module>
    from newspaper.article import Article
  File "C:\Users...\site-packages\newspaper\article.py", line 28, in <module>
    from .extractors import ContentExtractor
  File "C:\Users...\site-packages\newspaper\extractors\__init__.py", line 8, in <module>
    from newspaper.extractors.content_extractor import ContentExtractor
  File "C:\Users...\site-packages\newspaper\extractors\content_extractor.py", line 8, in <module>
    from newspaper.extractors.articlebody_extractor import ArticleBodyExtractor
  File "C:\Users...\site-packages\newspaper\extractors\articlebody_extractor.py", line 8, in <module>
    import newspaper.extractors.defines as defines
  File "C:\Users...\site-packages\newspaper\extractors\defines.py", line 2, in <module>
    from typing_extensions import TypedDict, NotRequired
ModuleNotFoundError: No module named 'typing_extensions'
No biggie: running pip install typing-extensions fixes the import. But then I get another error when I call newspaper.article with any URL:
  File "c:\Users...\scrape_from_urls.py", line 7, in <module>
    article = newspaper.article(url)
  File "C:\Users...\site-packages\newspaper\__init__.py", line 61, in article
    a = Article(url, language=language, **kwargs)
  File "C:\Users...\site-packages\newspaper\article.py", line 195, in __init__
    scheme = urls.get_scheme(url)
  File "C:\Users...\site-packages\newspaper\urls.py", line 370, in get_scheme
    return urlparse(abs_url, **kwargs).scheme
  File "c:\Users...\lib\urllib\parse.py", line 399, in urlparse
    url, scheme, _coerce_result = _coerce_args(url, scheme)
  File "c:\Users...\lib\urllib\parse.py", line 136, in _coerce_args
    return _decode_args(args) + (_encode_result,)
  File "c:\Users...\lib\urllib\parse.py", line 120, in _decode_args
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
  File "c:\Users...\lib\urllib\parse.py", line 120, in <genexpr>
    return tuple(x.decode(encoding, errors) if x else '' for x in args)
AttributeError: 'builtin_function_or_method' object has no attribute 'decode'
I also tried newspaper3k and get a similar AttributeError, so I'm wondering if I should be using a different urllib version (urllib3==1.26.18).
It would be great if typing-extensions, and any urllib3 pin that turns out to be needed, could be added to the requirements.txt. Thank you.
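For reference, the last frames of the second traceback are in the standard-library urllib\parse.py rather than in urllib3, and the message says the value that reached urlparse was a built-in function or method, not a string. A minimal sketch that produces the same message with only the standard library is below. It assumes the URL variable was assigned something like a method without the call parentheses (for example line.strip instead of line.strip()); the names raw and scheme_of and the example URL are hypothetical, and I can't confirm this is what my script actually does.

```python
from urllib.parse import urlparse


def scheme_of(url):
    # Same call shape as the urlparse(...).scheme frame in the traceback.
    return urlparse(url).scheme


# Hypothetical line read from a file of URLs, with surrounding whitespace.
raw = "  https://example.com/story  "

# Calling the method yields a str, which urlparse handles normally.
good = scheme_of(raw.strip())

# Passing the method object itself (missing parentheses) is not a str, so
# urlparse treats it as bytes-like and tries to call .decode() on it.
try:
    scheme_of(raw.strip)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(good)
print(error_message)
```

If this is the cause, printing type(url) just before the newspaper.article(url) call should show something other than str, and changing the urllib3 version would not make a difference.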