How to plot data chronologically
I am using matplotlib to graph my results from a .dat file.
The data is as follows:
1145, 2021-07-17 00:00:00, bob, rome, 12.75, 65.0, 162.75
1146, 2021-07-12 00:00:00, billy larkin, italy, 93.75, 325.0, 1043.75
114, 2021-07-28 00:00:00, beatrice, rome, 1, 10, 100
29, 2021-07-25 00:00:00, Colin, italy the third, 10, 10, 50
5, 2021-07-22 00:00:00, Veronica, canada, 10, 100, 1000
1149, 1234-12-13 00:00:00, Billy Larkin, 1123, 12.75, 65.0, 162.75
I want to plot a year's worth of data (Jan to Dec) in the proper sequence and have my labels show up as the months instead of the full dates.
Here is my code:
import matplotlib.pyplot as plt
import csv
x = []
y = []
with open('Claims.dat','r') as csvfile:
    #bar = csv.reader(csvfile, delimiter=",")
    plot = csv.reader(csvfile, delimiter=",")
    for row in plot:
        x.append(str(row[1]))
        y.append(str(row[6]))
plt.plot(x,y, label="Travel Claim Totals!", color="red", marker="o")
plt.xlabel('Months', color="red", size="large")
plt.ylabel('Totals', color="red", size="large")
plt.title('Claims Data: Team Bobbyn Second Place is the First Looser', color="Blue", weight="bold", size="large")
plt.xticks(rotation=45, horizontalalignment="right", size="small")
plt.yticks(weight="bold", size="small", rotation=45)
plt.legend()
plt.subplots_adjust(left=0.2, bottom=0.40, right=0.94, top=0.90, wspace=0.2, hspace=0)
plt.show()
I think the easiest way is to re-sort the data by date, which can be constructed using the datetime package. Here is a minimal working example, based on your data:
import datetime
def isfloat(value: str):
    try:
        float(value)
        return True
    except ValueError:
        return False
def isdatetime(value: str):
    try:
        datetime.datetime.fromisoformat(value)
        return True
    except ValueError:
        return False
data = r"""1145, 2021-07-17 00:00:00, bob, rome, 12.75, 65.0, 162.75
1146, 2021-07-12 00:00:00, billy larkin, italy, 93.75, 325.0, 1043.75
114, 2021-07-28 00:00:00, beatrice, rome, 1, 10, 100
29, 2021-07-25 00:00:00, Colin, italy the third, 10, 10, 50
5, 2021-07-22 00:00:00, Veronica, canada, 10, 100, 1000
1149, 1234-12-13 00:00:00, Billy Larkin, 1123, 12.75, 65.0, 162.75"""
# split the block of text into a list with one string per row
data = data.splitlines()
for idx in range(len(data)):
    # split each row into its fields
    data[idx] = data[idx].split(', ')
    for jdx in range(len(data[idx])):
        if data[idx][jdx].isnumeric(): # Is it an integer?
            value = int(data[idx][jdx])
        elif isfloat(data[idx][jdx]): # Is it a float?
            value = float(data[idx][jdx])
        elif isdatetime(data[idx][jdx]): # Is it a date?
            value = datetime.datetime.fromisoformat(data[idx][jdx])
        else:
            value = data[idx][jdx]
        data[idx][jdx] = value
# sort the rows chronologically by the date column
data.sort(key=lambda x: x[1])
You can also sort by more specific things:
data.sort(key=lambda x: x[1].month)
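One thing to watch with a month-only key is that it ignores the year, so rows from different years get interleaved. The sketch below uses three of the sample rows (already converted, as the loop above would leave them) to compare a month key with a (year, month) key. The names rows, by_month, by_year_month and labels are illustrative only, and the '%b' month abbreviations depend on the active locale.

```python
import datetime

# Three rows from the sample data, in the shape produced by the conversion loop
rows = [
    [1145, datetime.datetime(2021, 7, 17), "bob", "rome", 12.75, 65.0, 162.75],
    [1149, datetime.datetime(1234, 12, 13), "Billy Larkin", 1123, 12.75, 65.0, 162.75],
    [5, datetime.datetime(2021, 7, 22), "Veronica", "canada", 10, 100, 1000],
]

# Month only: December 1234 lands after July 2021
by_month = sorted(rows, key=lambda r: r[1].month)

# Year first, then month: December 1234 lands before July 2021
by_year_month = sorted(rows, key=lambda r: (r[1].year, r[1].month))

# Short month names that could be used as tick labels (locale dependent)
labels = [r[1].strftime("%b") for r in by_year_month]
```

Because Python's sort is stable, rows that share a key keep their original relative order, so sorting by the full date first and by month afterwards would keep days in order within each month.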
Note: You might not need all the logic in the for-loop. The csv package does the splitting for you, but by default it returns every field as a string, so the dates and numbers still need to be converted.
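As a rough sketch of letting the csv package do the splitting, the example below reads three of the sample rows, converts only the two columns that get plotted, and sorts them by date. The io.StringIO object is a stand-in for open('Claims.dat'), and the names sample, points, when and total are made up for this illustration.

```python
import csv
import datetime
import io

# Stand-in for open('Claims.dat'): three rows copied from the sample data
sample = io.StringIO(
    "1145, 2021-07-17 00:00:00, bob, rome, 12.75, 65.0, 162.75\n"
    "1146, 2021-07-12 00:00:00, billy larkin, italy, 93.75, 325.0, 1043.75\n"
    "114, 2021-07-28 00:00:00, beatrice, rome, 1, 10, 100\n"
)

# skipinitialspace=True drops the blank that follows each comma
reader = csv.reader(sample, delimiter=",", skipinitialspace=True)

points = []
for row in reader:
    # every field arrives as a string, so convert the plotted columns by hand
    when = datetime.datetime.fromisoformat(row[1])
    total = float(row[6])
    points.append((when, total))

# sort chronologically, then split into x and y lists for plt.plot(x, y)
points.sort(key=lambda p: p[0])
x = [p[0] for p in points]
y = [p[1] for p in points]
```

Passing datetime objects and floats to plt.plot, rather than strings, lets matplotlib place the points on a real date axis instead of treating each value as a category.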
- The easiest solution is to use pandas.
- In the sample data, '1234-12-13' was changed to '2020-12-13', since the year 1234 falls outside the range pandas can store as a nanosecond-resolution timestamp (roughly 1677 to 2262).
- If you aren’t allowed to use pandas, then please see How to read, format, sort, and save a csv file, without pandas.
- Using pandas 1.3.0 and matplotlib 3.4.2
Imports and DataFrame
import pandas as pd
import matplotlib.dates as mdates # used to format the x-axis
import matplotlib.pyplot as plt
# read in the data
df = pd.read_csv('Claims.dat', header=None)
# convert the column to a datetime format, which ensures the data points will be plotted in chronological order
df[1] = pd.to_datetime(df[1], errors="coerce").dt.date
# display(df)
      0           1             2                3      4      5        6
0  1145  2021-07-17           bob             rome  12.75   65.0   162.75
1  1146  2021-07-12  billy larkin            italy  93.75  325.0  1043.75
2   114  2021-07-28      beatrice             rome   1.00   10.0   100.00
3    29  2021-07-25         Colin  italy the third  10.00   10.0    50.00
4     5  2021-07-22      Veronica           canada  10.00  100.0  1000.00
5  1149  2020-12-13  Billy Larkin             1123  12.75   65.0   162.75
Plotting the DataFrame
# plot the dataframe, which uses matplotlib as the backend
ax = df.plot(x=1, y=6, marker=".", color="r", figsize=(10, 7), label="Totals")
# format title and labels
ax.set_xlabel('Months', color="red", size="large")
ax.set_ylabel('Totals', color="red", size="large")
ax.set_title('Claims Data: Team Bobbyn Second Place is the First Looser', color="Blue", weight="bold", size="large")
# format ticks
xt = plt.xticks(rotation=45, horizontalalignment="right", size="small")
yt = plt.yticks(weight="bold", size="small", rotation=45)
# format the dates on the xaxis
myFmt = mdates.DateFormatter('%b')
ax.xaxis.set_major_formatter(myFmt)
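When the data covers a full year, it may also help to pin one tick to the start of each month, so the '%b' labels do not repeat or skip. The following is a minimal sketch of that idea using matplotlib.dates.MonthLocator. The dates and totals lists are made-up placeholder values, not data from Claims.dat, and the Agg backend is chosen only so the sketch runs without a display.

```python
import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Made-up placeholder data: one point in the middle of each month of 2021
dates = [datetime.date(2021, month, 15) for month in range(1, 13)]
totals = [100 + 10 * month for month in range(1, 13)]

fig, ax = plt.subplots(figsize=(10, 7))
ax.plot(dates, totals, marker=".", color="r", label="Totals")

# one major tick at the first day of every month, labelled with the short month name
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))

# draw the figure so the tick labels are populated, then read them back
fig.canvas.draw()
month_labels = [tick.get_text() for tick in ax.get_xticklabels()]
```

The same two set_major_locator and set_major_formatter calls should work on the ax returned by df.plot above, since it is an ordinary matplotlib Axes.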