Pandas

Pandas is a Python library for data manipulation and analysis, and I’ve been studying it for a personal finance project.

One of the primary structures in Pandas is the DataFrame, a two-dimensional data structure with labeled axes (rows and columns) that is extremely useful for processing data.
Below is an example that creates an empty DataFrame, with the default pandas.Index replaced by a pandas.DatetimeIndex.

For simplicity I’ve only created a date index for a few days, but in my project I use much longer time periods. For example, to pull all of 2017’s historical stock data I could set the start as 2017-01-01 and the end as 2017-12-31.


import pandas as pd

start_date = '2018-01-22'
end_date = '2018-01-24'
# date_range returns a DatetimeIndex with one entry per day, endpoints included
dates = pd.date_range(start_date, end_date)
# No columns yet, only the date index
df = pd.DataFrame(index=dates)
print(df)
Empty DataFrame
Columns: []
Index: [2018-01-22 00:00:00, 2018-01-23 00:00:00, 2018-01-24 00:00:00]
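
As a rough sketch of the 2017 case mentioned earlier, the same two calls work with a full year as the range. The variable names here are only for illustration, and keep in mind that a plain daily range includes weekends and holidays, days on which there is no stock data.

```python
import pandas as pd

# Full calendar year 2017, one entry per day (weekends and holidays included).
dates_2017 = pd.date_range("2017-01-01", "2017-12-31")
df_2017 = pd.DataFrame(index=dates_2017)

print(type(df_2017.index).__name__)  # DatetimeIndex
print(len(df_2017))                  # 365 (2017 is not a leap year)
print(df_2017.empty)                 # True: there is an index but no columns yet
```

Note that the empty property is True even though the index has 365 entries, because the frame has no columns to hold data.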

Creating a DataFrame from a CSV


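import pandas as pd

dfspy = pd.read_csv("datalake/SPY.csv",
                    index_col="timestamp",   # use the timestamp column as the row index
                    parse_dates=True,        # parse the index values as dates
                    usecols=['timestamp', 'high', 'low'],  # load only these columns
                    na_values=['nan'])       # treat the string 'nan' as missing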
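
To see what those arguments do without the real file, here is a minimal sketch that reads a small in-memory CSV instead. The rows are made-up placeholder numbers, and the open and close columns are an assumption about what such a file might contain. Only the timestamp, high and low column names come from the call above.

```python
import io

import pandas as pd

# Made-up sample rows standing in for datalake/SPY.csv (placeholder values only).
sample_csv = io.StringIO(
    "timestamp,open,high,low,close\n"
    "2018-01-22,100.0,101.5,99.5,101.0\n"
    "2018-01-23,101.0,102.0,nan,101.5\n"
    "2018-01-24,101.5,103.0,100.5,102.5\n"
)

df_sample = pd.read_csv(
    sample_csv,
    index_col="timestamp",                 # timestamp becomes the row index
    parse_dates=True,                      # parse the index values as dates
    usecols=["timestamp", "high", "low"],  # open and close are skipped
    na_values=["nan"],                     # the string 'nan' becomes a missing value
)

print(df_sample.columns.tolist())       # ['high', 'low']
print(type(df_sample.index).__name__)   # DatetimeIndex
print(df_sample["low"].isna().sum())    # 1
```

The timestamp column has to be listed in usecols even though it ends up as the index rather than as a regular column; otherwise index_col would have nothing to refer to.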

The S&P 500 CSV I’m reading in has the following structure:
[Screenshot of SPY.csv, captured 2018-05-27]