feat: new release version #108
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| name: Pre-commit | ||
| | ||
| on: | ||
| pull_request: | ||
| push: | ||
| branches: [main, development] | ||
| | ||
| jobs: | ||
| pre-commit: | ||
| runs-on: ubuntu-latest | ||
| steps: | ||
| - name: Check out repository | ||
| uses: actions/checkout@v4 | ||
| | ||
| - name: Set up Python | ||
| uses: actions/setup-python@v5 | ||
| with: | ||
| python-version: '3.10.19' | ||
| | ||
| - name: Run pre-commit | ||
| uses: pre-commit/action@v3.0.1 | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,26 @@ | ||
| repos: | ||
| - repo: https://github.com/pre-commit/pre-commit-hooks | ||
| rev: v4.5.0 | ||
| hooks: | ||
| - id: trailing-whitespace | ||
| - id: end-of-file-fixer | ||
| - id: check-yaml | ||
| exclude: ^mkdocs\.yml$ | ||
| - id: check-added-large-files | ||
| args: ['--maxkb=1000'] | ||
| - id: check-merge-conflict | ||
| - id: check-toml | ||
| - id: debug-statements | ||
| | ||
| - repo: https://github.com/astral-sh/ruff-pre-commit | ||
| rev: v0.3.4 | ||
| hooks: | ||
| - id: ruff | ||
| args: [--fix] | ||
| - id: ruff-format | ||
| | ||
| - repo: https://github.com/PyCQA/docformatter | ||
| rev: v1.7.6 | ||
| hooks: | ||
| - id: docformatter | ||
| args: [--in-place, --config, ./pyproject.toml] | ||
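
As a rough illustration of what the `trailing-whitespace` and `end-of-file-fixer` hooks configured above do to a file, the sketch below normalizes a string the same general way. It is a simplified approximation and not the hooks' actual implementation; `fix_whitespace` is a hypothetical helper introduced only for this example. Whitespace-only rewrites of this kind would be one plausible reason a diff shows before/after line pairs that look identical.

```python
def fix_whitespace(text: str) -> str:
    """Strip trailing whitespace per line and end the text with one newline.

    Hypothetical helper: approximates the combined effect of the
    trailing-whitespace and end-of-file-fixer hooks, nothing more.
    """
    lines = [line.rstrip() for line in text.splitlines()]
    # Drop blank lines at the end so the file ends with a single newline.
    while lines and not lines[-1]:
        lines.pop()
    return "\n".join(lines) + "\n" if lines else ""


before = "MatText is a framework.   \n\n\n"
print(repr(fix_whitespace(before)))  # 'MatText is a framework.\n'
```

The visible text is unchanged; only invisible trailing characters and extra blank lines at the end of the file are removed.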
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -18,27 +18,31 @@ | |||||||||
| </p> | ||||||||||
|
|
||||||||||
|
|
||||||||||
| MatText is a framework for text-based materials modeling. It supports | ||||||||||
| MatText is a framework for text-based materials modeling. It supports | ||||||||||
|
|
||||||||||
| - conversion of crystal structures in to text representations | ||||||||||
| - conversion of crystal structures in to text representations | ||||||||||
|
|
||||||||||
| - transformations of crystal structures for sensitivity analyses | ||||||||||
| - decoding of text representations to crystal structures | ||||||||||
| - tokenization of text-representation of crystal structures | ||||||||||
| - pre-training, finetuning and testing of language models on text-representations of crystal structures | ||||||||||
| - pre-training, finetuning and testing of language models on text-representations of crystal structures | ||||||||||
| - analysis of language models trained on text-representations of crystal structures | ||||||||||
|
|
||||||||||
|
|
||||||||||
|
|
||||||||||
| ## Local Installation | ||||||||||
|
|
||||||||||
| We recommend that you create a virtual conda environment on your computer in which you install the dependencies for this package. To do so head over to [Miniconda](https://docs.conda.io/en/latest/miniconda.html) and follow the installation instructions there. | ||||||||||
| **Requirements:** | ||||||||||
| - Python 3.10 or 3.11 (tested and supported) | ||||||||||
| - [uv](https://docs.astral.sh/uv/) package manager (recommended) | ||||||||||
|
|
||||||||||
| We recommend using [uv](https://docs.astral.sh/uv/) for fast and reliable Python package management. To install uv, follow the [installation instructions](https://docs.astral.sh/uv/getting-started/installation/). | ||||||||||
|
|
||||||||||
| <!-- ### Install latest release | ||||||||||
|
|
||||||||||
| ### Install latest release | ||||||||||
|
|
||||||||||
| ```bash | ||||||||||
| pip install mattext | ||||||||||
| ``` --> | ||||||||||
| uv pip install git+https://github.com/lamalab-org/mattext.git | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| ### Install development version | ||||||||||
|
|
||||||||||
|
|
@@ -49,16 +53,32 @@ git clone https://github.com/lamalab-org/mattext.git | |||||||||
| cd mattext | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| Create a virtual environment and install: | ||||||||||
|
|
||||||||||
| ```bash | ||||||||||
| uv venv --python 3.10 | ||||||||||
| source .venv/bin/activate # On Windows: .venv\Scripts\activate | ||||||||||
| uv pip install -e ".[dev]" | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| Install pre-commit hooks (optional, for development): | ||||||||||
|
|
||||||||||
| ```bash | ||||||||||
| pip install -e . | ||||||||||
| pre-commit install | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| If you want to use the Local Env representation, you will also need to install OpenBabel, e.g. using | ||||||||||
| If you want to use the Local Env representation, you will also need to install OpenBabel. You can install it via conda/mamba: | ||||||||||
|
|
||||||||||
| ```bash | ||||||||||
| ```bash | ||||||||||
| conda install openbabel -c conda-forge | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| or on Ubuntu/Debian: | ||||||||||
|
|
||||||||||
| ```bash | ||||||||||
| sudo apt-get install openbabel | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| ## Getting started | ||||||||||
|
|
||||||||||
| ### Converting crystals into text | ||||||||||
|
|
@@ -94,18 +114,18 @@ requested_text_reps = text_rep.get_requested_text_reps(requested_reps) | |||||||||
| python main.py -cn=pretrain model=pretrain_example +model.representation=composition +model.dataset_type=pretrain30k +model.context_length=32 | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| ### Running a benchmark | ||||||||||
| ### Running a benchmark | ||||||||||
|
|
||||||||||
| ```bash | ||||||||||
| python main.py -cn=benchmark model=benchmark_example +model.dataset_type=filtered +model.representation=composition +model.dataset=perovskites +model.checkpoint=path/to/checkpoint | ||||||||||
| python main.py -cn=benchmark model=benchmark_example +model.dataset_type=filtered +model.representation=composition +model.dataset=perovskites +model.checkpoint=path/to/checkpoint | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| The `+` symbol before a configuration key indicates that you are adding a new key-value pair to the configuration. This is useful when you want to specify parameters that are not part of the default configuration. | ||||||||||
|
|
||||||||||
| To override the existing default configuration, use `++`, for e.g., `++model.pretrain.training_arguments.per_device_train_batch_size=32`. Refer to the [docs](https://lamalab-org.github.io/MatText/) for more examples and advanced ways to use the configs with config groups. | ||||||||||
|
Contributor

issue (typo): Adjust "for e.g." to the standard "e.g.".

"For e.g." is nonstandard; use "e.g.," instead.

Fix redundant phrase "for e.g."

The phrase "for e.g." is redundant since "e.g." already means "for example". Use either "e.g." or "for example". Suggested fix:

    -To override the existing default configuration, use `++`, for e.g., `++model.pretrain.training_arguments.per_device_train_batch_size=32`.
    +To override the existing default configuration, use `++`, e.g., `++model.pretrain.training_arguments.per_device_train_batch_size=32`.

LanguageTool:

- [style] ~125: The phrase 'for e.g.' is a tautology ('e.g.' means 'for example'). Consider using just "e.g." or "for example". (FOR_EG_REDUNDANCY)
- [uncategorized] ~125: Do not mix variants of the same word ('pretrain' and 'pre-train') within a single text. (EN_WORD_COHERENCY)
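
The `+` and `++` prefixes described in the README lines above match the override grammar of Hydra, which this sketch assumes is what `main.py` uses (the `-cn=` flag suggests it, but the diff does not say so). The toy function below only models the semantics on a plain nested dict: no prefix changes an existing key, `+` adds a new key and fails if it already exists, and `++` adds or overrides. `apply_override` is a hypothetical helper, not part of Hydra or MatText, and unlike Hydra it keeps every value as a string.

```python
from copy import deepcopy


def apply_override(cfg: dict, override: str) -> dict:
    """Apply one Hydra-style override string to a nested dict (toy model)."""
    key, _, value = override.partition("=")
    # Count the leading '+' markers: 0 = change, 1 = add, 2 = add or change.
    plus = len(key) - len(key.lstrip("+"))
    *parents, leaf = key.lstrip("+").split(".")

    result = deepcopy(cfg)  # leave the caller's config untouched
    node = result
    for name in parents:
        node = node.setdefault(name, {})

    exists = leaf in node
    if plus == 0 and not exists:
        raise KeyError(f"{key!r} is not in the config; prefix it with '+' to add it")
    if plus == 1 and exists:
        raise KeyError(f"{key!r} already exists; use '++' to override it")
    node[leaf] = value  # values stay strings in this sketch
    return result


cfg = {"model": {"pretrain": {"training_arguments": {"per_device_train_batch_size": 64}}}}
cfg = apply_override(cfg, "+model.representation=composition")
cfg = apply_override(
    cfg, "++model.pretrain.training_arguments.per_device_train_batch_size=32"
)
print(cfg["model"]["representation"])
```

Using `++` for keys that may or may not be present in the defaults avoids both error cases, at the cost of silently masking a misspelled key.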
||||||||||
|
|
||||||||||
|
|
||||||||||
| ### Using data | ||||||||||
| ### Using data | ||||||||||
|
|
||||||||||
| The MatText datasets can be easily obtained from [HuggingFace](https://huggingface.co/datasets/n0w0f/MatText), for example | ||||||||||
|
|
||||||||||
|
|
@@ -123,19 +143,19 @@ Contributions, whether filing an issue, making a pull request, or forking, are a | |||||||||
|
|
||||||||||
| ## 👋 Attribution | ||||||||||
|
|
||||||||||
| ### Citation | ||||||||||
| ### Citation | ||||||||||
|
|
||||||||||
| If you use MatText in your work, please cite | ||||||||||
| If you use MatText in your work, please cite | ||||||||||
|
|
||||||||||
| ``` | ||||||||||
| @misc{alampara2024mattextlanguagemodelsneed, | ||||||||||
| title={MatText: Do Language Models Need More than Text & Scale for Materials Modeling?}, | ||||||||||
| title={MatText: Do Language Models Need More than Text & Scale for Materials Modeling?}, | ||||||||||
| author={Nawaf Alampara and Santiago Miret and Kevin Maik Jablonka}, | ||||||||||
| year={2024}, | ||||||||||
| eprint={2406.17295}, | ||||||||||
| archivePrefix={arXiv}, | ||||||||||
| primaryClass={cond-mat.mtrl-sci} | ||||||||||
| url={https://arxiv.org/abs/2406.17295}, | ||||||||||
| url={https://arxiv.org/abs/2406.17295}, | ||||||||||
| } | ||||||||||
|
Comment on lines 150 to 159

Fix citation block formatting.

The code block should specify a language. Suggested fix:

    -```
    +```bibtex
     @misc{alampara2024mattextlanguagemodelsneed,
     title={MatText: Do Language Models Need More than Text & Scale for Materials Modeling?},
     author={Nawaf Alampara and Santiago Miret and Kevin Maik Jablonka},
     year={2024},
     eprint={2406.17295},
     archivePrefix={arXiv},
    - primaryClass={cond-mat.mtrl-sci}
    - url={https://arxiv.org/abs/2406.17295},
    + primaryClass={cond-mat.mtrl-sci},
    + url={https://arxiv.org/abs/2406.17295}
     }

markdownlint-cli2 (0.21.0): [warning] 150-150: Fenced code blocks should have a language specified (MD040, fenced-code-language)
||||||||||
| ``` | ||||||||||
|
|
||||||||||
|
|
@@ -146,4 +166,4 @@ The code in this package is licensed under the MIT License. | |||||||||
|
|
||||||||||
| ### 💰 Funding | ||||||||||
|
|
||||||||||
| This project has been supported by the [Carl Zeiss Foundation](https://www.carl-zeiss-stiftung.de/en/) as well as Intel and Merck. | ||||||||||
| This project has been supported by the [Carl Zeiss Foundation](https://www.carl-zeiss-stiftung.de/en/) as well as Intel and Merck. | ||||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -10,4 +10,3 @@ model: | ||
| per_device_train_batch_size: 64 | ||
| path: | ||
| pretrained_checkpoint: ft_100k_mb_small | ||
| | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -10,4 +10,3 @@ model: | ||
| per_device_train_batch_size: 64 | ||
| path: | ||
| pretrained_checkpoint: ft_300k_mb_small | ||
| | ||
Remove `openbabel` from system dependencies: it's not needed for the current test suite.

The LocalEnv representation (which requires OpenBabel) is not covered by the tests (`test_imports.py` and `test_xtal2pot.py` only test basic tokenizers and structures). Since the test suite doesn't exercise this feature, installing OpenBabel adds unnecessary overhead to CI. If needed later for integration testing, add it back with a comment explaining its purpose.
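
If LocalEnv tests are added later, one way to keep CI working without OpenBabel is to skip them when it is missing. The sketch below assumes the feature relies on the Python bindings importable as `openbabel`; the diff does not confirm this, and the test class and method names are hypothetical.

```python
import importlib.util
import unittest

# Assumption: OpenBabel is used through its Python package named `openbabel`.
HAS_OPENBABEL = importlib.util.find_spec("openbabel") is not None


@unittest.skipUnless(HAS_OPENBABEL, "OpenBabel is not installed")
class LocalEnvTests(unittest.TestCase):  # hypothetical test class
    def test_openbabel_importable(self):
        # Placeholder check; real LocalEnv assertions would go here.
        import openbabel  # noqa: F401
```

With a guard like this, the default CI job can omit the system package, and a separate integration job can install it to run the skipped tests.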