Parsing OFX Data¶
ofxtools
parses OFX messages in two steps.
The first step parses serialized OFX data into a Python data structure. The
ofxtools.Parser.OFXTree
parser parser subclasses
xml.etree.ElementTree.ElementTree
, and follows the ElementTree API:
In [1]: from ofxtools.Parser import OFXTree
In [2]: parser = OFXTree()
In [3]: with open('2015-09_amtd.ofx', 'rb') as f: # N.B. need to open file in binary mode
...: parser.parse(f)
...:
In [4]: parser.parse('2015-09_amtd.ofx') # Can also use filename directly
In [5]: type(parser._root)
Out[5]: xml.etree.ElementTree.Element
In [6]: parser.find('.//STATUS')[:] # The full ElementTree API can be used, including XPath
Out[6]:
[<Element 'CODE' at 0x7f4cc0aa4868>,
<Element 'SEVERITY' at 0x7f4cc0aa49f8>,
<Element 'MESSAGE' at 0x7f4cc0aa4d68>]
At this stage, you can modify the entire Element structure arbitrarily - move branches around the tree, add or delete elements, rewrite tags and text, etc.
The second step of parsing converts the Element structure into a hierarchy of
custom class instances, namely subclasses of ofxtools.models.base.Aggregate
and ofxtools.models.Types.Element
, following the OFX specification’s
classification of nodes into either containers (“Aggregates”) or data-bearing
leaf nodes (“Elements”, not to be confused with
xml.etree.ElementTree.Element
). This parsing step validates the
deserialized OFX data against the OFX spec, and performs type conversion
(so that, for example, an OFX element specified as a monetary quantity will
be converted to an instance of decimal.Decimal
, while another element
specified as date & time will be converted to an instance of
datetime.datetime
) The original structure of the OFX data hierarchy is
preserved through this conversion.
In [7]: ofx = parser.convert()
In [8]: type(ofx)
Out[8]: ofxtools.models.ofx.OFX
Following the OFX spec , you can navigate the OFX hierarchy using normal Python dotted-attribute access, and standard slice notation for lists.
In [9]: tx = ofx.invstmtmsgsrsv1[0].invstmtrs.invtranlist[-1]
In [10]: tx.dtposted
Out[10]: datetime.datetime(2015, 9, 16, 17, 9, 48, tzinfo=<UTC>)
In [11]: tx.trnamt
Out[11]: Decimal('4.7')
While it’s obvious that INTRANLIST is a list, it’s perhaps less obvious that INVSTMTMSGSRSV1 is also a list, since OFX specifies that a single statement response wrapper can contain multiple statements.
It can get to be a real drag crawling all the way to the bottom of
deeply-nested SGML hierarchies to extract the data that you really want, so
subclasses of ofxtools.models.base.Aggregate
provide some navigational
conveniences.
First, each Aggregate
provides proxy access to the attributes of its
SubAggregates
(and its sub-subaggregates, and so on). If the data you’re
looking for is located in a.b.c.d.e.f
, you can access it more simply
as a.f
. This won’t work across lists, of course; you have to select
an item from the list. So in this example, if c
is a list type, you could
get your data from a.c[10].f
.
Second, the upper-level Aggregates
define some human-friendly aliases
for the data structures you’re really looking for. Here’s an example.
In [12]: stmts = ofx.statements # All {``STMTRS``, ``CCSTMTRS``, ``INVSTMTRS``} in the response
In [13]: txs = stmts[0].transactions # The relevant ``*TRANLIST``
In [14]: acct = stmts[0].account # The relevant ``*ACCTFROM``
In [15]: balances = stmts[0].balances # ``INVBAL`` - use ``balance`` for bank statement ``LEDGERBAL``
In [16]: securities = ofx.securities # ``SECLIST``
In [17]: len(securities)
Out[17]: 5
In [18]: len(txs)
Out[18]: 6
In [19]: tx = txs[-1]
In [20]: tx.trnamt
Out[20]: Decimal('4.7')
In [21]: tx = txs[1]
In [22]: type(tx)
Out[22]: ofxtools.models.invest.transactions.TRANSFER
In [23]: tx.invtran.dttrade # Who wants to remember where to find the trade date?
Out[23]: datetime.datetime(2015, 9, 8, 17, 14, 8, tzinfo=<UTC>)
In [24]: tx.dttrade # That's more like it
Out[24]: datetime.datetime(2015, 9, 8, 17, 14, 8, tzinfo=<UTC>)
In [25]: tx.secid.uniqueid # Yet more layers
Out[25]: '403829104'
In [26]: tx.uniqueid # Flat access is less cognitively taxing
Out[26]: '403829104'
In [27]: tx.uniqueidtype
Out[27]: 'CUSIP'
The designers of the OFX spec did a good job avoiding name collisions. However
you will need to remember that <UNIQUEID>
always refers to securities; if
you’re looking for a transaction unique identifier, you want tx.fitid
(which is a shortcut to tx.invtran.fitid
).
Deviations from the OFX specification¶
For handling multicurrency transactions per OFX section 5.2, Aggregates
that can contain ORIGCURRENCY
have an additional curtype
attribute,
which is not part of the OFX spec. curtype
yields 'CURRENCY'
if the
money amounts have not been converted to the home currency, or yields
'ORIGCURRENCY'
if they have been converted.
YIELD
elements are renamed yld
, and FROM
elements are renamed frm
,
in order to avoid name collision with Python reserved keywords.
Proprietary OFX tags (e.g. <INTU.BANKID>
) are stripped and dropped.