South Korea dataset module by cgomez9 · Pull Request #45 · CoronaWhy/task-geo

cgomez9 · 2020-04-07T01:20:38Z

Extraction and formatting of the dataset of confirmed cases, deaths and recovered per patient in South Korea #20

ManuelAlvarezC

audit.md and datapackage.json are missing.

Also, on your PR template, there was no checklist?

ManuelAlvarezC · 2020-04-07T12:00:15Z

.gitignore

+### PyCharm ###
+# Covers JetBrains IDEs: IntelliJ, RubyMine, PhpStorm, AppCode, PyCharm, CLion, Android Studio and WebStorm
+# Reference: https://intellij-support.jetbrains.com/hc/en-us/articles/206544839


I'm totally fine with you updating the .gitignore with required files, but adding your personal IDE files is considered bad practice. You can see how to move these contents into a global .gitignore file for your local installation here.

.gitignore has been updated.
Also, audit.md and datapackage.json will be added too.
Also, I didn't notice the PULL_REQUEST_TEMPLATE.md, so I'll take a look into it. Thanks!

This is awesome, I didn't know you could set a global gitignore. Thank you, @ManuelAlvarezC!

@ManuelAlvarezC Could you elaborate on the audit.md and the datapackage.json? Just these two are left.

Although it is not as complete as it should be, the documentation will help you get a good grasp of it.

@ManuelAlvarezC Thank you, I'll take a look at it, finalize the edit and make the push.

ManuelAlvarezC · 2020-04-07T12:25:53Z

task_geo/data_sources/covid/south_korea/__init__.py

@@ -0,0 +1,3 @@
+from task_geo.data_sources.covid.south_korea.south_korea_patients import south_korea_patients
+
+__all__ = ['south_korea_patients']


Rename your data source to south_korea or, even better, use the ISO code, something like kr_covid.

ManuelAlvarezC · 2020-04-07T15:29:39Z

task_geo/data_sources/covid/south_korea/south_korea_patients.py

+        'infected_by', 'contact_number'
+    ]
+    df = df.reindex(columns=cols_ordered)
+    df['confirmed_date'] = pd.to_datetime(df.confirmed_date)


This cast can be done in two lines with:

date_columns = [...]
df[date_columns] = df[date_columns].apply(pd.to_datetime)
# This was written originally as pd.to_datetime(df[columns]) which crashes.

@ManuelAlvarezC Could you elaborate on this?

Sure thing.

On lines 37-41 (I only selected line 37 to comment on, a blunder of mine) you are casting columns to datetime and reassigning them in several separate statements. This approach has two drawbacks:

More lines of code to read and write, which makes it easier to miss details and introduce errors.

It is slower. Casting and assigning all the columns at once is in fact much faster. This is called vectorization, and pandas (and numpy too) is designed so that vectorized operations run much faster than regular iteration in Python.

Also, passing the date format to to_datetime will further improve its performance.
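To make the suggestion concrete, here is a minimal sketch of the cast with an explicit format. Only confirmed_date appears in the diff above; the other column names, the sample rows and the "%Y-%m-%d" format are assumptions for illustration and should be adapted to the real data. As far as I know, pd.to_datetime called on a whole DataFrame expects columns named year, month and day, which would explain why the earlier pd.to_datetime(df[columns]) version crashed.

```python
import pandas as pd

# Hypothetical sample rows; only 'confirmed_date' is taken from the PR diff.
df = pd.DataFrame({
    "confirmed_date": ["2020-01-20", "2020-01-30"],
    "released_date": ["2020-02-06", None],
})

# Assumed list of date columns; replace with the real ones in the module.
date_columns = ["confirmed_date", "released_date"]

# DataFrame.apply forwards extra keyword arguments to the function, so the
# format string reaches pd.to_datetime for every column. Missing values
# become NaT.
df[date_columns] = df[date_columns].apply(pd.to_datetime, format="%Y-%m-%d")

print(df.dtypes)
```

Keeping the column names in one list also means a new date column only has to be added in one place.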

ManuelAlvarezC · 2020-04-07T15:30:39Z

task_geo/data_sources/covid/south_korea/south_korea_patients.py

+import requests
+
+
+def south_korea_patients_connector(*args, **kwargs):


I'm not sure why the functions are declared with *args and **kwargs if you are only expecting one argument.
Also, this argument is a URL. Does this data source work with any URL? I think this connector should take no arguments. Can you please update the signature for both the connector and the data source?

Done. It takes no arguments now, and the URL has been set.
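For reference, a connector with a fixed source could look roughly like the sketch below. The URL constant is a placeholder, not the real dataset location, and returning a DataFrame parsed from CSV is an assumption about what the connector should produce.

```python
import io

import pandas as pd
import requests

# Placeholder URL (assumption); the real one lives in the module itself.
URL = "https://example.com/south_korea_patients.csv"


def south_korea_patients_connector():
    """Download the raw patient data from the fixed source URL.

    Takes no arguments because the data source only works with one URL.
    """
    response = requests.get(URL)
    response.raise_for_status()  # fail loudly on HTTP errors
    return pd.read_csv(io.StringIO(response.text))
```

With the URL kept as a module-level constant, callers cannot point the connector at a source it was never written for.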

KrSuma · 2020-04-14T03:40:35Z

Update: audit.md is done, and some changes were made to comply with the flake8 lint (checked during commit).

Only the datapackage.json is left to do.

Once this is done, I will make another commit.

South Korea dataset module

85b25ef

ManuelAlvarezC requested changes Apr 7, 2020


without audit.md and datapackage.json

4274adb

added audit(unfinished), datapackage left

cdc16de

ManuelAlvarezC added the ci-fail and waiting-review-changes labels Apr 19, 2020

