8 changes: 4 additions & 4 deletions .github/workflows/pypi-publish.yml
@@ -12,13 +12,13 @@ jobs:
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v6
- name: Set up Python
uses: actions/setup-python@v5
uses: actions/setup-python@v6
with:
python-version: '3.11'
python-version: '3.12'
- name: Install uv
uses: astral-sh/setup-uv@v5
uses: astral-sh/setup-uv@v8.1.0
- name: Build
run: uv build
- name: Publish
12 changes: 7 additions & 5 deletions .github/workflows/run-pytest.yml
@@ -18,15 +18,15 @@ jobs:
python-version: ['3.11', '3.12', '3.13']

steps:
- uses: actions/checkout@v4
- uses: actions/checkout@v6
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v5
uses: actions/setup-python@v6
with:
python-version: ${{ matrix.python-version }}
- name: Install uv
uses: astral-sh/setup-uv@v5
uses: astral-sh/setup-uv@v8.1.0
- name: Install dependencies
run: uv sync --all-extras --dev
run: uv sync --dev
- name: Extract test files
run: ./.github/scripts/extract_files.sh
env:
@@ -44,8 +44,10 @@ jobs:
KFINTECH_CAS_FILE_NEW: ${{ secrets.KFINTECH_CAS_FILE_NEW }}
KFINTECH_CAS_PASSWORD: ${{ secrets.KFINTECH_CAS_PASSWORD }}
NSDL_CAS_FILE_1: ${{ secrets.NSDL_CAS_FILE_1 }}
CDSL_CAS_FILE_1: ${{ secrets.CDSL_CAS_FILE_1 }}
CDSL_CAS_PASSWORD: ${{ secrets.CDSL_CAS_PASSWORD }}
- name: Upload coverage report to codecov
uses: codecov/codecov-action@v5
uses: codecov/codecov-action@v6
with:
files: ./coverage.xml
token: ${{ secrets.CODECOV_TOKEN }}
1 change: 1 addition & 0 deletions .gitignore
@@ -133,6 +133,7 @@ dmypy.json
tests/files/**
tests/files.tar
tests/files.tar.bz2
tests/samples/**
.DS_Store

casparser.code-workspace
83 changes: 68 additions & 15 deletions CHANGELOG.md
@@ -1,7 +1,71 @@
# Changelog

## 1.0.0

Major release. The parsing backend was rewritten from scratch on
[pypdfium2](https://github.com/pypdfium2-team/pypdfium2) (Apache-2.0 /
BSD-3) and the four supported CAS issuers now each have a dedicated
parser tuned to their template family.

### Breaking changes

- **pdfminer.six and PyMuPDF backends removed.** `casparser.read_cas_pdf`
no longer dispatches between them. The `mupdf` / `fast` extras in
`pyproject.toml` are gone. The `--force-pdfminer` CLI flag and the
`force_pdfminer=` kwarg on `read_cas_pdf` are kept as no-ops; the
kwarg emits a `DeprecationWarning` and is otherwise ignored.
- **License simplified to pure MIT.** With the GPL/AGPL-licensed
PyMuPDF dependency gone, the `licenses/` directory of GPL/AGPL
copies has been removed. pypdfium2 is dual Apache-2.0 / BSD-3 and
doesn't impose any copyleft obligation on users of casparser.
- **Minimum Python is now 3.11.** 3.9 / 3.10 classifiers dropped from
`pyproject.toml`.
- **`CASData.investor_info` is now `Optional[InvestorInfo]`** (matches
the `NSDLCASData.investor_info` shape that already existed). It is
populated on every supported issuer, but consumers should still
guard against the `None` case for unfamiliar templates.
- **Internal `casparser.process` package removed.** The two helpers
downstream code still imports from it are now at
`casparser.parsers._classify` (`get_parsed_scheme_name`,
`get_transaction_type`) and `casparser.parsers._isin` (`isin_search`).
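
The "kept as a no-op, emits a `DeprecationWarning`" behaviour described for `force_pdfminer=` can be sketched as below. This is a minimal illustration, not casparser's actual implementation: `read_pdf`, its return value, and the `None` sentinel default are all hypothetical.

```python
import warnings


def read_pdf(path, password, force_pdfminer=None):
    """Hypothetical reader that accepts a retired kwarg but ignores it."""
    if force_pdfminer is not None:
        # Warn the caller, then carry on as if the kwarg had not been passed.
        warnings.warn(
            "force_pdfminer is deprecated and has no effect",
            DeprecationWarning,
            stacklevel=2,
        )
    # Only one backend remains, so the flag cannot change the outcome.
    return {"path": path, "backend": "pypdfium2"}


# Capture warnings to show that the call still succeeds.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = read_pdf("cas.pdf", "secret", force_pdfminer=True)
```

Using `stacklevel=2` points the warning at the caller's line rather than at the library, which is usually what a downstream user needs in order to find the stale argument.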

### New

- **First-class NSDL and CDSL parsers.** Drops the regex-on-text
approach the v0.8 NSDL/CDSL code used; the new parsers consume
structured `Block`/`Cell` records directly from `pypdfium2`. Several
bugs that shipped with the v0.8 NSDL/CDSL code no longer occur
(misplaced-UCC-as-folio on NSDL MF Holdings, space-merged
folio+units cells on CDSL, the silently-dropped NSDL HDFC
subaccount on CDSL multi-account statements, `Optional[Decimal]`
comma-strip miss in the `MutualFund` validator).
- **CAMS / KFin 2026 templates supported** out of the box. The newer
CAMS SUMMARY template added an ISIN column the v0.8 regex didn't
match; v1.0 parses all rows. The newer KFin SUMMARY template emits
zero-balance schemes with single-space-separated trio cells that
the v0.8 regex required `\t\t` between; v1.0 picks them up too.
- **AMC-header detection extended** to include the `Fund House`
suffix. v0.8's regex only matched `Mutual Fund` / `MF` suffixes,
so schemes from a few newer AMCs whose names end in `Fund House`
ended up bucketed under the previous AMC.
- **ISIN / AMFI enrichment has a direct-ISIN fallback** path via
`MFISINDb.direct_isin_lookup` for the case where multi-line
`Registrar:` rendering corrupts the RTA token.
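
To illustrate why a missing suffix buckets schemes under the previous AMC, here is a rough sketch of suffix-based header detection. The regex and the `bucket_schemes` helper are assumptions made for illustration; the actual pattern used by casparser is not shown in this changelog.

```python
import re

# Assumed rule: a line is an AMC header when it ends in one of these suffixes.
AMC_HEADER_RE = re.compile(
    r"^(?P<amc>.+?\s(?:Mutual Fund|MF|Fund House))\s*$", re.IGNORECASE
)


def is_amc_header(line: str) -> bool:
    return AMC_HEADER_RE.match(line.strip()) is not None


def bucket_schemes(lines):
    """Group scheme lines under the most recently seen AMC header."""
    buckets = {}
    current = None
    for line in lines:
        if is_amc_header(line):
            current = line.strip()
            buckets[current] = []
        elif current is not None:
            buckets[current].append(line.strip())
    return buckets


grouped = bucket_schemes(
    ["Alpha Mutual Fund", "Alpha Equity Scheme", "Beta Fund House", "Beta Debt Scheme"]
)
```

If `Fund House` were dropped from the alternation, `Beta Fund House` would be treated as a scheme line and both it and `Beta Debt Scheme` would land under `Alpha Mutual Fund`.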

### Fixed

- **CAMS SUMMARY `valuation.date` no longer mis-parses to year 201**
(was a column-boundary bug — the NAVDate column treated as
right-aligned with a 42pt width clipped the trailing year digit,
then Pydantic mis-coerced the `01-Jan-201` string).
- **CDSL multi-account statements** (5+ demat accounts on one PDF) are
now parsed correctly. Earlier the page-3+ scan only kicked in from
page 8, dropping holdings sections that landed on pages 4-7.
- **CDSL MF holdings** rows with `DIRECT` (or any non-`ARN-XXXX`
distribution-mode token) now correctly populate `pnl` and `return_`.

## 0.9.0 - 2026-05-22
- Add support for CDSL sttements
- Add support for CDSL statements
- Drop support for Python 3.9 and 3.10; minimum supported version is now 3.11
- Support PyMuPDF >= 1.25 (1.27.x tested). Older `<1.25` pin removed.
- Bump `casparser-isin` to `>= 2026.5.1` (new DB format v2 with
@@ -11,20 +75,9 @@
field (Python attribute `return_`) also gets the comma-stripping
treatment; previously NSDL MF folio rows with a return value of
1 lakh or more would fail Decimal validation.
- Parser robustness fixes for PyMuPDF 1.25+ text extraction quirks:
- Re-emit visual rows as separate blocks for CAMS/KFINTECH so the
table header / folio header no longer get merged when the new
block grouping collapses them into a single PyMuPDF block.
- Recover the registrar value (e.g. `KFINTECH`) when it wraps to the
next line.
- Recover the advisor value when the scheme name wraps before the
advisor closing paren.
- Pull ISIN/Advisor onto the scheme line when long scheme names wrap.
- Tax transactions (`*** Stamp Duty ***`, STT, TDS) no longer absorb
spurious units when an adjacent column wraps onto the same row.
- NSDL holdings: widen the y-band tolerance, drop the strict
multiline `$` anchoring, and accept tab-separated wrapped names so
the regexes match consistently across Python 3.11–3.14.
- Parser robustness fixes for PyMuPDF 1.25+ text extraction quirks
(all superseded in 1.0.0 by the pypdfium2 rewrite, kept here for
the historical record).

## 0.8.1 - 2025-09-21
- NSDL parser bug fixes
32 changes: 17 additions & 15 deletions README.md
@@ -6,7 +6,8 @@
[![codecov](https://codecov.io/gh/codereverser/casparser/branch/main/graph/badge.svg?token=DYZ7TXWRGI)](https://codecov.io/gh/codereverser/casparser)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/casparser)

Parse Consolidated Account Statement (CAS) PDF files generated from CAMS/KFINTECH
Parse Consolidated Account Statement (CAS) PDF files generated from
CAMS, KFintech, NSDL, and CDSL.

`casparser` also includes a command line tool with the following analysis tools
- `summary`- print portfolio summary
@@ -19,13 +20,8 @@ Parse Consolidated Account Statement (CAS) PDF files generated from CAMS/KFINTEC
pip install -U casparser
```

### with faster PyMuPDF parser
```bash
pip install -U 'casparser[fast]'
```

**Note:** Enabling this dependency could result in licensing changes. Check the
[License](#license) section for more details
Since v1.0 the parser is built on [pypdfium2](https://github.com/pypdfium2-team/pypdfium2)
(Apache-2.0 / BSD-3) — no optional PDF backends, no GPL/AGPL dependencies.


## Usage
@@ -50,7 +46,7 @@ csv_str = casparser.read_cas_pdf("/path/to/cas/file.pdf", "password", output="cs
"from": "YYYY-MMM-DD",
"to": "YYYY-MMM-DD"
},
"file_type": "CAMS/KARVY/UNKNOWN",
"file_type": "CAMS/KFINTECH/NSDL/CDSL/UNKNOWN",
"cas_type": "DETAILED/SUMMARY",
"investor_info": {
"email": "string",
@@ -122,6 +118,9 @@ Notes:
- `MISC`
- `dividend_rate` is applicable only for `DIVIDEND_PAYOUT` and
`DIVIDEND_REINVESTMENT` transactions.
- NSDL and CDSL statements return a different top-level shape with
`accounts[].equities[]` and `accounts[].mutual_funds[]` instead of
`folios[].schemes[]`. See `casparser.types.NSDLCASData` for details.
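
As a rough sketch of walking the NSDL/CDSL shape, the snippet below iterates a hand-written dictionary. Only the `accounts`, `equities`, and `mutual_funds` keys come from the note above; the `name` and `value` fields are hypothetical placeholders, so check `casparser.types.NSDLCASData` for the real field names.

```python
from decimal import Decimal

# Hand-built stand-in for parsed output; `name` and `value` are assumed fields.
sample = {
    "accounts": [
        {
            "equities": [{"name": "Example Ltd", "value": "1500.00"}],
            "mutual_funds": [{"name": "Example Fund", "value": "2500.50"}],
        },
        {
            "equities": [],
            "mutual_funds": [{"name": "Other Fund", "value": "100.25"}],
        },
    ]
}


def total_value(data: dict) -> Decimal:
    """Sum the assumed `value` field across every holding in every account."""
    total = Decimal("0")
    for account in data.get("accounts", []):
        holdings = account.get("equities", []) + account.get("mutual_funds", [])
        for holding in holdings:
            total += Decimal(holding["value"])
    return total


portfolio_total = total_value(sample)
```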

### CLI

@@ -143,8 +142,6 @@ Usage: casparser [-o output_file.json|output_file.csv] [-p password] [-s] [-a] C
--gains-112a ask|FY2020-21 Generate Capital Gains Report - 112A format for
a given financial year - Use 'ask' for a prompt
from available options (BETA)
--force-pdfminer Force PDFMiner parser even if MuPDF is
detected

--version Show the version and exit.
-h, --help Show this message and exit.
@@ -199,11 +196,16 @@ failing scheme name(s).

## License

CASParser is distributed under MIT license by default. However enabling the optional dependency
`mupdf/fast` would imply the use of [PyMuPDF](https://github.com/pymupdf/PyMuPDF) /
[MuPDF](https://mupdf.com/license.html) and hence the licenses GNU GPL v3 and GNU Affero GPL v3
would apply. Copies of all licenses have been included in this repository. - _IANAL_
CASParser is distributed under the MIT license. Up to v0.8 the optional
`mupdf` / `fast` extra pulled in [PyMuPDF](https://github.com/pymupdf/PyMuPDF) /
[MuPDF](https://mupdf.com/license.html), which would have caused GNU GPL v3
and GNU Affero GPL v3 to apply transitively. v1.0 dropped that extra
(the PyMuPDF and pdfminer.six backends are gone; the parser now runs on
[pypdfium2](https://github.com/pypdfium2-team/pypdfium2), which is dual
Apache-2.0 / BSD-3), so casparser is now pure MIT end-to-end.

## Resources
1. [CAS from CAMS](https://www.camsonline.com/Investors/Statements/Consolidated-Account-Statement)
2. [CAS from Karvy/Kfintech](https://mfs.kfintech.com/investor/General/ConsolidatedAccountStatement)
3. [NSDL Consolidated Account Statement](https://nsdlcas.nsdl.com/)
4. [CDSL Consolidated Account Statement](https://www.cdslindia.com/Investors/Cas.html)
2 changes: 1 addition & 1 deletion casparser/__init__.py
@@ -9,4 +9,4 @@
"CapitalGainsReport",
]

__version__ = "0.9.0"
__version__ = "1.0.0"
6 changes: 3 additions & 3 deletions casparser/cli.py
@@ -37,7 +37,7 @@ def formatINR(number):
else:
last3 = int_part[-3:]
rest = int_part[:-3]
groups = [rest[max(0, i - 2):i or None] for i in range(len(rest), 0, -2)][::-1]
groups = [rest[max(0, i - 2) : i or None] for i in range(len(rest), 0, -2)][::-1]
if groups and groups[0]:
r = ",".join(groups + [last3])
else:
@@ -82,7 +82,7 @@ def print_nsdl(parsed_data: NSDLCASData):
)
summary_table.add_row(Padding("File Type :", spacing), f"[bold]{data['file_type']}[/]")
# summary_table.add_row(Padding("CAS Type :", spacing), f"[bold]{data['cas_type']}[/]")
for key, value in data["investor_info"].items():
for key, value in (data.get("investor_info") or {}).items():
summary_table.add_row(
Padding(f"{key.capitalize()} :", spacing), re.sub(r"[^\S\r\n]+", " ", value)
)
@@ -208,7 +208,7 @@ def print_summary(parsed_data: CASData, output_filename=None, include_zero_folio
summary_table.add_row(Padding("File Type :", spacing), f"[bold]{data['file_type']}[/]")
summary_table.add_row(Padding("CAS Type :", spacing), f"[bold]{data['cas_type']}[/]")

for key, value in data["investor_info"].items():
for key, value in (data.get("investor_info") or {}).items():
summary_table.add_row(
Padding(f"{key.capitalize()} :", spacing), re.sub(r"[^\S\r\n]+", " ", value)
)
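
The `(data.get("investor_info") or {})` change guards both a missing key and an explicit `None`, which matters now that `investor_info` is optional. A standalone sketch of the same pattern follows; `investor_rows` is a hypothetical helper, not a function in `casparser/cli.py`.

```python
import re


def investor_rows(data: dict) -> list[tuple[str, str]]:
    """Return (label, value) pairs, or an empty list when investor_info is absent."""
    rows = []
    # `or {}` covers the key being missing as well as being present but None.
    for key, value in (data.get("investor_info") or {}).items():
        # Collapse runs of non-newline whitespace, as the CLI code above does.
        rows.append((key.capitalize(), re.sub(r"[^\S\r\n]+", " ", value)))
    return rows


rows_none = investor_rows({"investor_info": None})
rows_missing = investor_rows({})
rows_present = investor_rows({"investor_info": {"name": "A   B"}})
```

A plain `data.get("investor_info", {})` would not be enough here, because the default only applies when the key is missing, not when its value is `None`.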