diff --git a/.gitignore b/.gitignore
index 89fae39..115fd97 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,4 +1,5 @@
 site/
 .cache/
 .DS_Store
+.venv/
diff --git a/docs/how-zooniverse-works/index.md b/docs/how-zooniverse-works/index.md
new file mode 100644
index 0000000..f744813
--- /dev/null
+++ b/docs/how-zooniverse-works/index.md
@@ -0,0 +1,236 @@
+---
+title: How Zooniverse Works
+description: A visual guide to understanding whether Zooniverse is right for your research
+---
+
+# How Zooniverse Works
+
+A researcher's guide to evaluating whether Zooniverse can support your research goals. This page provides visual decision frameworks to help you understand the platform's capabilities and decide whether citizen science is a good fit for your project.
+
+!!! info "About these diagrams"
+    Where applicable, each concept is presented in more than one visual style, so you can find the representation that best matches how you think about the problem. Compare them and use whichever resonates with your team.
+
+---
+
+## 1. Core Data Flow
+
+This is the fundamental transformation Zooniverse enables: turning your raw data into volunteer-generated annotations.
+
+![Core Data Flow - Horizontal](../img/how-zooniverse-works/01a-core-data-flow-horizontal.svg)
+
+![Core Data Flow - Vertical](../img/how-zooniverse-works/01b-core-data-flow-vertical.svg)
+
+**Key takeaway:** Zooniverse transforms your **Subjects** (data) through **Workflows** (tasks) into **Annotations** (results). Everything else is an optimization of this core flow.
+
+---
+
+## 2. Is Zooniverse Right For Me?
+
+Answer this question before you invest time in project setup. Use these decision frameworks to evaluate fit.
+
+![Is Zooniverse Right - Flowchart](../img/how-zooniverse-works/02a-is-zooniverse-right-flowchart.svg)
+
+![Is Zooniverse Right - Decision Tree](../img/how-zooniverse-works/02b-is-zooniverse-right-tree.svg)
+
+**Key questions to consider:**
+
+1. **Dataset size:** Do you have enough items (typically 1,000+) to justify the setup overhead?
+2. **Task suitability:** Can non-experts meaningfully contribute to your analysis?
+3. **Team capacity:** Do you have the resources to engage with the volunteer community?
+4. **Timeline:** Can you accommodate weeks to months of data collection?
+
+---
+
+## 3. Dataset Evaluation Matrix
+
+Not all datasets are equally suited to crowdsourcing. This matrix helps you see where your data falls on the suitability spectrum.
+
+![Dataset Evaluation Matrix](../img/how-zooniverse-works/03-dataset-evaluation-matrix.svg)
+
+**How to read this:**
+
+- **Rows:** Your dataset size (number of items to classify)
+- **Columns:** How complex your task is
+- **Colors:** Fit on the five-level scale shown in the legend (ideal, great, possible, challenging, poor). Green marks an ideal fit, yellow a possible fit with caveats, and red a likely poor fit.
+
+**Common patterns:**
+
+- Large datasets with simple tasks: *Perfect for Zooniverse*
+- Small datasets with complex tasks: *Consider expert annotation instead*
+- Medium datasets with moderate tasks: *The sweet spot for most projects*
+
+---
+
+## 4. Task Decomposition
+
+Breaking complex tasks into sequences of simpler ones is one of the most powerful optimizations available.
+
+![Task Decomposition](../img/how-zooniverse-works/04-task-decomposition.svg)
+
+**Why decompose?**
+
+- **Faster classification:** Simple yes/no questions take seconds, not minutes
+- **Higher accuracy:** Focused tasks produce better volunteer agreement
+- **Early filtering:** Remove "blank" or uninteresting subjects before expensive analysis
+- **Parallel processing:** Different volunteers can work on different stages simultaneously
+
+**Example decomposition:**
+
+| Complex Task | Decomposed Tasks |
+|--------------|------------------|
+| "Identify all species, count them, note behaviors" | WF1: "Any animals?" → WF2: "What species?" → WF3: "How many?" |
+
+---
+
+## 5. Who's Responsible for What?
+
+This section sets clear expectations for what you need to provide and what the platform handles.
+
+![Responsibilities](../img/how-zooniverse-works/05-responsibilities.svg)
+
+**Your critical responsibilities:**
+
+1. **Data preparation:** Format subjects, create manifests, ensure image quality
+2. **Workflow design:** Define tasks clearly, write unambiguous instructions
+3. **Community engagement:** Respond on Talk, answer questions, show appreciation
+4. **Data analysis:** Process exports, aggregate results, publish findings
+
+**What Zooniverse provides:**
+
+- Hosting infrastructure and access to the volunteer community
+- Project Builder tools and the classification interface
+- Data storage, export, and (optionally) aggregation
+- Discussion boards and community tools
+
+---
+
+## 6. Caesar: Real-time Data Processing
+
+Caesar is Zooniverse's real-time processing service. It is for projects that need automated actions without relying on machine learning.
+
+![Caesar Flow](../img/how-zooniverse-works/06-caesar-flow.svg)
+
+**What Caesar enables:**
+
+- **Consensus-based retirement:** Stop showing a subject once enough volunteers agree on it
+- **Workflow advancement:** Move interesting subjects to more detailed analysis
+- **Webhook notifications:** Trigger your own systems when rules are met
+- **No ML required:** Rules are pure logic based on volunteer consensus
+
+**When to use Caesar:**
+
+- You want subjects retired automatically (not manually)
+- You need to move subjects between workflows based on results
+- You want to integrate Zooniverse with external systems
+- You don't want to download and process all data manually
+
+---
+
+## 7. External Service Integration
+
+Zooniverse doesn't have to be an island. Here's how your systems can connect to it.
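One way your own systems can connect is by reproducing Caesar's extract → reduce → rule logic offline, on exported classifications. This is the "Custom Aggregator" path in the integration diagram. The Python sketch below is a minimal illustration only. It is not Caesar's implementation or API.

The sketch makes these assumptions:

- Each classification has already been parsed into a hypothetical simplified record such as `{"user": "A", "annotations": {"T0": "Yes"}}`. Real classification exports store annotations as JSON strings that you would parse first.
- The function names and default thresholds are hypothetical. The thresholds were chosen to mirror the "consensus ≥ 80% AND count ≥ 3" rule in the Caesar example in section 6.

```python
from collections import Counter

# Assumed (hypothetical) record shape, not the real export format:
# {"user": "A", "annotations": {"T0": "Yes"}}


def extract_answers(classifications: list[dict], task: str = "T0") -> list[str]:
    """Extract step: pull one question task's answer out of each classification."""
    return [c["annotations"][task] for c in classifications if task in c["annotations"]]


def reduce_answers(answers: list[str]) -> dict:
    """Reduce step: combine all volunteers' answers for one subject.

    Returns the most common answer, the fraction of volunteers who gave it,
    and how many answers were counted. Ties go to the answer seen first.
    """
    if not answers:
        return {"consensus": None, "agreement": 0.0, "count": 0}
    consensus, votes = Counter(answers).most_common(1)[0]
    return {"consensus": consensus, "agreement": votes / len(answers), "count": len(answers)}


def should_retire(reduction: dict, min_agreement: float = 0.8, min_count: int = 3) -> bool:
    """Rule step: retire only when BOTH thresholds are met."""
    return reduction["count"] >= min_count and reduction["agreement"] >= min_agreement


if __name__ == "__main__":
    unanimous = [{"user": u, "annotations": {"T0": "Yes"}} for u in ("A", "B", "C")]
    split = [
        {"user": "A", "annotations": {"T0": "Yes"}},
        {"user": "B", "annotations": {"T0": "Yes"}},
        {"user": "C", "annotations": {"T0": "No"}},
    ]
    for name, subject in (("unanimous", unanimous), ("split", split)):
        result = reduce_answers(extract_answers(subject))
        print(name, result, "retire" if should_retire(result) else "keep")
```

Run as a script, the unanimous subject meets both conditions and is retired. The 2-of-3 split (agreement 2/3, shown as 0.67 in the aggregated example in section 10) stays active. The extract, reduce, and rule steps are separate functions, mirroring Caesar's pipeline, so you can swap in a different reducer without touching the rule.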
+
+![External Integration](../img/how-zooniverse-works/07-external-integration.svg)
+
+**Data In (your systems → Zooniverse):**
+
+- Host subject media on your own S3/cloud storage
+- Use the Panoptes API to upload subjects programmatically
+- Pre-process data with ML before sending it to volunteers
+
+**Data Out (Zooniverse → your systems):**
+
+- Webhooks for real-time notifications
+- API access for automated export retrieval
+- Caesar hooks for custom processing pipelines
+
+---
+
+## 8. Volunteer Engagement Cycle
+
+Sustained volunteer engagement requires intentional effort throughout the project lifecycle.
+
+![Volunteer Engagement](../img/how-zooniverse-works/08-volunteer-engagement.svg)
+
+**The engagement loop:**
+
+1. **Discover:** Volunteers find your project (Zooniverse homepage, social media, word of mouth)
+2. **Learn:** A tutorial teaches them how to contribute effectively
+3. **Classify:** They do the actual work
+4. **Discuss:** Talk boards enable community and discovery
+5. **Feedback:** You share results and progress updates
+6. **Newsletter:** Periodic updates maintain the connection
+7. **Return:** Engaged volunteers come back
+8. **Share:** Satisfied volunteers recruit others
+
+**Your role in the cycle:**
+
+- Provide clear tutorials (step 2)
+- Engage actively on Talk (steps 4-5)
+- Acknowledge contributions publicly (step 5)
+- Send updates via newsletters (step 6)
+
+---
+
+## 9. Multi-Workflow Pipelines
+
+Multi-workflow pipelines are an advanced technique for maximizing efficiency through progressive filtering.
+
+![Multi-Workflow Pipeline](../img/how-zooniverse-works/09-multi-workflow-pipeline.svg)
+
+**How it works:**
+
+1. **Triage workflow:** A quick binary question filters out uninteresting subjects
+2. **Identification workflow:** Only interesting subjects get detailed analysis
+3. **Expert workflow:** Only rare or unusual items get the most attention
+
+**Benefits:**
+
+- Reduces total classification effort (in the example diagram, 70% of subjects are filtered out at step 1)
+- Focuses expensive tasks on valuable subjects
+- Enables different retirement rules per workflow
+- Creates natural subject prioritization
+
+---
+
+## 10. Output Options: Raw vs. Aggregated
+
+This section explains what data you'll get and what each form is best used for.
+
+![Output Options](../img/how-zooniverse-works/10-output-options.svg)
+
+**Raw classifications:**
+
+- Every individual volunteer response
+- Best for: ML training, research papers, auditing, custom aggregation
+- Tradeoff: Larger files, requires processing
+
+**Aggregated/Consensus:**
+
+- One combined result per subject (via Caesar or offline processing)
+- Best for: Quick analysis, catalogs, immediate use
+- Tradeoff: Less flexibility, fixed aggregation algorithm
+
+**Questions to consider:**
+
+- Do I need to train machine learning models? → Raw data
+- Do I need an audit trail of individual responses? → Raw data
+- Do I just need the final answer per subject? → Aggregated
+- Am I building a catalog or database? → Aggregated
+
+---
+
+## Next Steps
+
+Based on your evaluation:
+
+| If... | Then... |
+|-------|---------|
+| Zooniverse is a good fit | Start with [Getting Started](../getting-started/index.md) |
+| You need transcription | See the [Transcription Project Guide](../transcription-project-guide/index.md) |
+| You're ready to build | Go to [Project Builder](https://www.zooniverse.org/lab) |
+| You need help deciding | [Contact the Zooniverse team](https://zooniverse.org/about/contact) |
+
+---
+
+*This guide was developed from brainstorming sessions with Lucy and Travis on helping researchers evaluate Zooniverse fit from a heuristic decision-tree perspective.*
diff --git a/docs/img/how-zooniverse-works/01a-core-data-flow-horizontal.svg b/docs/img/how-zooniverse-works/01a-core-data-flow-horizontal.svg
new file mode 100644
index 0000000..abf5538
--- /dev/null
+++ b/docs/img/how-zooniverse-works/01a-core-data-flow-horizontal.svg
@@ -0,0 +1,60 @@
[SVG markup not preserved. Title: "Core Data Flow (Horizontal)". Subjects ("Your research data") → Project Workflow ("Tasks volunteers perform on your data") → Annotations ("Volunteer responses"); bands: Data In, Zooniverse, Output.]
\ No newline at end of file
diff --git a/docs/img/how-zooniverse-works/01b-core-data-flow-vertical.svg b/docs/img/how-zooniverse-works/01b-core-data-flow-vertical.svg
new file mode 100644
index 0000000..6eb93cc
--- /dev/null
+++ b/docs/img/how-zooniverse-works/01b-core-data-flow-vertical.svg
@@ -0,0 +1,58 @@
[SVG markup not preserved. Title: "Core Data Flow (Vertical)". INPUT / PROCESS / OUTPUT: 1 Subjects ("Images, audio, video, documents") → 2 Project Workflow ("Define what volunteers should do with your data") → 3 Annotations ("Classifications from volunteers").]
\ No newline at end of file
diff --git a/docs/img/how-zooniverse-works/02a-is-zooniverse-right-flowchart.svg b/docs/img/how-zooniverse-works/02a-is-zooniverse-right-flowchart.svg
new file mode 100644
index 0000000..44579ab
--- /dev/null
+++ b/docs/img/how-zooniverse-works/02a-is-zooniverse-right-flowchart.svg
@@ -0,0 +1,99 @@
[SVG markup not preserved. Title: "Is Zooniverse Right For Me? (Flowchart Style)". Start: "I have a dataset". Yes/No checks in order: "Large enough for crowdsourcing?", "Task suitable for non-experts?", "Team capacity to engage volunteers?", "Timeline allows for volunteer ramp-up?". Outcomes: "Zooniverse is a fit!" and "Consider alternatives".]
\ No newline at end of file
diff --git a/docs/img/how-zooniverse-works/02b-is-zooniverse-right-tree.svg b/docs/img/how-zooniverse-works/02b-is-zooniverse-right-tree.svg
new file mode 100644
index 0000000..ab0803a
--- /dev/null
+++ b/docs/img/how-zooniverse-works/02b-is-zooniverse-right-tree.svg
@@ -0,0 +1,97 @@
[SVG markup not preserved. Title: "Is Zooniverse Right For Me? (Decision Tree Style)". Start ("I have data") → Dataset Size (">1000 items for meaningful stats?") → Task Type ("Can non-experts do this task?") → Team Capacity ("Can engage with volunteer community?") → Timeline ("Months to collect classifications?"), each with Y/N branches. Outcomes: "Use Zooniverse!", "Consider ML", "Hire experts", "Too small".]
\ No newline at end of file
diff --git a/docs/img/how-zooniverse-works/03-dataset-evaluation-matrix.svg b/docs/img/how-zooniverse-works/03-dataset-evaluation-matrix.svg
new file mode 100644
index 0000000..c749ce2
--- /dev/null
+++ b/docs/img/how-zooniverse-works/03-dataset-evaluation-matrix.svg
@@ -0,0 +1,129 @@
[SVG markup not preserved. Title: "Dataset Evaluation Matrix: Size vs. Task Complexity". Rows = Dataset Size; columns = Task Complexity: Simple (binary Q) | Moderate (multi-choice) | Complex (drawing) | Expert (transcription).
 100K+: Ideal (Fast results) | Ideal (High volume) | Great (Rich data) | Possible (Long timeline)
 10K-100K: Great | Ideal | Great | Possible
 1K-10K: Good | Good | Possible | Challenging
 100-1K: Possible | Possible | Challenging | Not ideal
 <100: Too small | Too small | Too small | Not viable
 Legend: Ideal fit, Great fit, Possible, Challenging, Poor fit.]
\ No newline at end of file
diff --git a/docs/img/how-zooniverse-works/04-task-decomposition.svg b/docs/img/how-zooniverse-works/04-task-decomposition.svg
new file mode 100644
index 0000000..921fe52
--- /dev/null
+++ b/docs/img/how-zooniverse-works/04-task-decomposition.svg
@@ -0,0 +1,104 @@
[SVG markup not preserved. Title: "Task Decomposition: Breaking Complex Tasks into Sequences"; subtitle: "Can I reduce my task to a sequence of simpler tasks?".
 BEFORE: one complex task ("Identify all species, count individuals, note behaviors, record interactions"; 5 min/item; high error) → DECOMPOSE.
 AFTER: Workflow 1: Detect ("Is there an animal?" Yes / No) → Workflow 2: Identify ("What species?" Select from list) → Workflow 3: Count ("How many?" 1, 2, 3, 4+).
 Benefits: faster per-task, lower error rate, better consensus, parallel processing, early filtering, easier training; "Blank" subjects filtered at step 1.
 Complex task stats: retirement 15+ per subject, agreement ~60%. Simple task stats: retirement 3-5 per subject, agreement ~90%+.]
\ No newline at end of file
diff --git a/docs/img/how-zooniverse-works/05-responsibilities.svg b/docs/img/how-zooniverse-works/05-responsibilities.svg
new file mode 100644
index 0000000..1f0182c
--- /dev/null
+++ b/docs/img/how-zooniverse-works/05-responsibilities.svg
@@ -0,0 +1,137 @@
[SVG markup not preserved. Title: "Who's Responsible for What?"; subtitle: "Clear separation of researcher work vs. platform capabilities".
 RESEARCHER DOES: Prepare subjects; Design workflows; Create tutorials; Engage community; Analyze data; Acknowledge volunteers; Iterate on design; Set retirement rules.
 ZOONIVERSE PROVIDES: Host infrastructure; Project Builder (no-code workflow creation); Volunteer community (2M+ registered users); Classification interface; Data storage; Talk discussion boards; Subject retirement (automatic, based on your rules); Data exports (CSV/JSON classification data).
 OPTIONAL / ADVANCED: Caesar aggregation; External data hooks; Machine learning (auto-retirement, pre-filtering); Multi-workflow pipelines; Custom subject selection (priority queues, weighting); Gold standard validation; Newsletters; Translations.]
\ No newline at end of file
diff --git a/docs/img/how-zooniverse-works/06-caesar-flow.svg b/docs/img/how-zooniverse-works/06-caesar-flow.svg
new file mode 100644
index 0000000..b6fd798
--- /dev/null
+++ b/docs/img/how-zooniverse-works/06-caesar-flow.svg
@@ -0,0 +1,119 @@
[SVG markup not preserved. Title: "Caesar: Real-time Data Processing (Without ML)". Input → Extract → Reduce → Rules → Actions: Classification ("Volunteer submits answers/marks") → Extractor ("Pull specific data from classification") → Reducer ("Aggregate across multiple volunteers") → "Rule Met?" → Yes: Retire / Move to WF2 / Webhook.
 Example, consensus-based retirement (no ML): 3 volunteers classify the same subject ("Is there a galaxy?" Yes, Yes, Yes) → question_extractor → [Yes, Yes, Yes] → stats_reducer → 100% agree "Yes" → rule check: consensus ≥ 80% AND count ≥ 3 → "Retire! Subject done".]
\ No newline at end of file
diff --git a/docs/img/how-zooniverse-works/07-external-integration.svg b/docs/img/how-zooniverse-works/07-external-integration.svg
new file mode 100644
index 0000000..cb626af
--- /dev/null
+++ b/docs/img/how-zooniverse-works/07-external-integration.svg
@@ -0,0 +1,118 @@
[SVG markup not preserved. Title: "External Service Integration Points". Columns: Your Systems → Zooniverse → Your Systems.
 Your Data Sources: S3 / Cloud Storage (host your own media); Database / API (subject metadata); ML Pipeline (pre-processed subjects).
 Zooniverse Platform: Subjects (can reference external URLs); Workflows (define tasks); Caesar (processing); Panoptes API (REST endpoints for automation); Webhooks (event-driven notifications).
 Your Processing: Custom Aggregator (your own reduction logic); ML Training (labels for models); Database Sync (update your records); Dashboard (real-time monitoring).]
\ No newline at end of file
diff --git a/docs/img/how-zooniverse-works/08-volunteer-engagement.svg b/docs/img/how-zooniverse-works/08-volunteer-engagement.svg
new file mode 100644
index 0000000..91ce590
--- /dev/null
+++ b/docs/img/how-zooniverse-works/08-volunteer-engagement.svg
@@ -0,0 +1,109 @@
[SVG markup not preserved. Title: "Volunteer Engagement Cycle", centered on "Sustained Engagement". 1 Discover Project (Zooniverse homepage, social); 2 Tutorial (learn how to contribute); 3 Classify (do the work); 4 Talk (discuss with community); 5 Results & Feedback (researcher shares progress); 6 Newsletter (updates & recognition); 7 Return (continue classifying); 8 Share (tell friends, recruit others). Legend: Platform, Volunteer, Researcher.]
\ No newline at end of file
diff --git a/docs/img/how-zooniverse-works/09-multi-workflow-pipeline.svg b/docs/img/how-zooniverse-works/09-multi-workflow-pipeline.svg
new file mode 100644
index 0000000..2f60d07
--- /dev/null
+++ b/docs/img/how-zooniverse-works/09-multi-workflow-pipeline.svg
@@ -0,0 +1,118 @@
[SVG markup not preserved. Title: "Multi-Workflow Pipeline: Progressive Filtering".
 All Subjects: 10,000 → Workflow 1 Triage ("Is there something interesting here?"; quick binary task, 3 volunteers each, 30,000 clicks) → Nothing: 70% (7,000 filtered) / Interesting: 30% (3,000 to WF2).
 Workflow 2 Identify ("What species is this?"; multi-choice task, 5 volunteers each, 15,000 clicks) → Common: 80% (2,400 done) / Rare: 20% (600 to WF3).
 Efficiency Gains: without pipeline, 50,000+ total clicks; with pipeline, ~45,000 clicks (10% saved); plus detailed data only where needed (600 rare items get expert attention).]
\ No newline at end of file
diff --git a/docs/img/how-zooniverse-works/10-output-options.svg b/docs/img/how-zooniverse-works/10-output-options.svg
new file mode 100644
index 0000000..0acbcea
--- /dev/null
+++ b/docs/img/how-zooniverse-works/10-output-options.svg
@@ -0,0 +1,95 @@
[SVG markup not preserved. Title: "Output Options: Raw vs. Aggregated Data"; subtitle: "What level of accuracy do I need?". Classifications (raw volunteer data) feed two panels:
 Raw Classifications (every individual response), e.g. subject_1: [{user: A, answer: "Yes"}, {user: B, answer: "Yes"}, {user: C, answer: "No"}]. Pros: full transparency, custom aggregation, ML training data, audit trail. Cons: large files, requires processing, more complex analysis. Best for: ML, research, auditing.
 Aggregated / Consensus (combined result per subject), e.g. subject_1: {consensus: "Yes", agreement: 0.67, count: 3}. Pros: ready to use, smaller files, direct answers, confidence scores. Cons: less flexibility, fixed algorithm, no individual data. Best for: quick analysis, catalogs.]
\ No newline at end of file
diff --git a/mkdocs.yml b/mkdocs.yml
index d6ade74..7f66c85 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -25,6 +25,8 @@ plugins:
   - search
 
 markdown_extensions:
+  - admonition
+  - pymdownx.details
   - pymdownx.highlight:
       anchor_linenums: true
       line_spans: __span
@@ -64,4 +66,6 @@ nav:
       - transcription-project-guide/project-launch.md
       - transcription-project-guide/working-with-data.md
       - transcription-project-guide/project-maintenance-and-conclusion.md
-      - transcription-project-guide/acknowledgements-and-resources.md
\ No newline at end of file
+      - transcription-project-guide/acknowledgements-and-resources.md
+  - 'How Zooniverse Works':
+      - how-zooniverse-works/index.md
\ No newline at end of file