A place for exploratory and micro data science projects for Nesta's ASF mission.
This repository is for work that is small and self-contained, not part of a specific ASF project, or exploratory in nature. Examples of what might be included are exploring the potential of a new dataset, testing out a modelling approach or handling an ad-hoc request for data analysis.
Each exploration is in its own sub-directory, which can be navigated using the Directory below. Contributing guidelines are below.
Exploration of RECC complaints data: exploratory data analysis of 3 years of RECC complaints.
< Project name (with link to sub-directory) > < Short description of project and links to any key resources >
- Use the directory above to find the exploration that you are looking for.
- Clone this repository.
- Create the environment for that exploration. This will probably involve running `conda create -n <exploration>_<name> python=3` and installing some requirements specified in the sub-directory. Each exploration should have its own installation instructions.
To add a new exploration to this repository:
- Determine whether this is the right place for the work. As a rule of thumb it should be: a. Executed with a fairly small codebase (e.g. 1 notebook and a utils module). b. Exploratory in nature (e.g. a small analysis for internal project scoping). c. Not contributing to a significant external output for the organisation.
- Create an issue for the exploration. For example, Exploration of RECC complaints data.
- Create a new branch from `dev` and check it out with `git checkout -b <issue number>_<short>_<exploration>_<name>`.
- Create a sub-directory for this exploration.
- Write and commit all code, data and documentation inside this sub-directory. Make sure to add a `README.md`.
- Once development is finished, add the name and description of the exploration to the Directory above.
- Create a pull request into `dev`.
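
The branch naming pattern in the steps above can be sketched in Python. This is only an illustration: `branch_name` and `checkout_command` are hypothetical helpers, not part of this repository, and the sketch assumes that `<short>_<exploration>_<name>` means a lower-case, underscore-separated version of the exploration name. The issue number used below is made up.

```python
import re


def branch_name(issue_number: int, exploration_name: str) -> str:
    """Build a branch name of the form <issue number>_<short>_<exploration>_<name>."""
    # Lower-case the name and collapse every run of non-alphanumeric
    # characters into a single underscore (assumed convention).
    slug = re.sub(r"[^a-z0-9]+", "_", exploration_name.lower()).strip("_")
    if not slug:
        raise ValueError("exploration_name must contain a letter or digit")
    return f"{issue_number}_{slug}"


def checkout_command(issue_number: int, exploration_name: str) -> str:
    """Return the git command that creates and checks out the new branch."""
    return f"git checkout -b {branch_name(issue_number, exploration_name)}"


if __name__ == "__main__":
    # Issue number 12 is a made-up example.
    print(checkout_command(12, "RECC complaints"))
    # prints: git checkout -b 12_recc_complaints
```

Run the printed command from an up-to-date checkout of `dev` so that the new branch starts from the right place.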
When writing your exploration, the following requirements must be met.
👍 Development standards
- Adhere to the Google Python Style Guide.
- All requirements to run exploratory projects should be independent and defined within the sub-directory (e.g. in a `requirements.txt` or `environment.yml`). A user should be able to create an environment from scratch to run the code.
- The code must run from start to finish without error.
- Modularised and refactored code is preferred over notebooks. If using notebooks, use `jupytext`.
- There should be no imports from other explorations.
- Each exploration should contain its own `README.md` that gives a comprehensive description of the work.
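
Some of these standards can be checked mechanically before opening a pull request. The sketch below is a hypothetical helper, not an existing tool in this repository. It assumes each exploration is a top-level directory of the repository and treats any absolute import whose first component matches a sibling directory name as an import from another exploration. It does not check style or whether the code runs.

```python
import ast
from pathlib import Path
from typing import List


def check_exploration(repo_root, name: str) -> List[str]:
    """Return a list of problems found in the exploration sub-directory `name`."""
    root = Path(repo_root)
    exploration = root / name
    problems = []

    if not (exploration / "README.md").is_file():
        problems.append("missing README.md")

    requirement_files = ("requirements.txt", "environment.yml")
    if not any((exploration / f).is_file() for f in requirement_files):
        problems.append("missing requirements.txt or environment.yml")

    # Assumption: every other non-hidden top-level directory is an exploration.
    siblings = {
        p.name
        for p in root.iterdir()
        if p.is_dir() and p.name != name and not p.name.startswith(".")
    }

    for py_file in sorted(exploration.rglob("*.py")):
        tree = ast.parse(py_file.read_text(encoding="utf-8"))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
                modules = [node.module]
            else:
                continue  # not an import, or a relative import within the package
            for module in modules:
                if module.split(".")[0] in siblings:
                    rel = py_file.relative_to(root).as_posix()
                    problems.append(f"{rel} imports from {module}")
    return problems
```

An empty list means none of the three checks failed. Notebooks paired to `.py` files with `jupytext` are covered by the import check, while `.ipynb` files are not parsed.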
🔀 Workflow
- Make pull requests and request code reviews.
- Use the Nesta Git/GitHub guidelines. If your code is simple and short, then you may work in one branch and merge into `dev` with one PR. If there are a few development steps then consider breaking the code down into smaller issues, branches and PRs.
💾 Data
- It is preferable that data are stored on S3.
- Very small datasets (~1 MB) can be stored in the repository.
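
To spot files that are probably too large to commit, something like the following could be run over an exploration before pushing. It is a minimal sketch: `oversized_files` and `MAX_BYTES` are hypothetical names, and the 1,000,000 byte threshold is an assumed reading of "roughly 1 MB" rather than a hard limit set by the team.

```python
from pathlib import Path
from typing import List

# Assumed threshold for "very small"; adjust to whatever the team agrees on.
MAX_BYTES = 1_000_000


def oversized_files(exploration_dir, max_bytes: int = MAX_BYTES) -> List[Path]:
    """Return files under `exploration_dir` larger than `max_bytes`, sorted by path."""
    return sorted(
        p
        for p in Path(exploration_dir).rglob("*")
        if p.is_file() and p.stat().st_size > max_bytes
    )
```

Any file it returns is a candidate for S3 rather than the repository.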
If you find that you are revisiting a project here, or beginning to do significant development, you may want to reconsider whether the work should move to its own repository.
Many of these guidelines require judgement on your part. If in doubt, chat to someone else in the team 🙂