clean

lovelyrita.clean.clean(dataframe)

Apply a series of data cleaning steps to a dataframe of raw data

Parameters:

dataframe : pandas.DataFrame

Returns:

A cleaned DataFrame
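
The reference does not list which steps clean applies. As a rough sketch only, a function of this shape can be built by chaining single-purpose steps with DataFrame.pipe. The two steps below (strip_column_names and drop_empty_rows) and clean_sketch are hypothetical illustrations, not the steps lovelyrita actually runs.

```python
import pandas as pd


def strip_column_names(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical step: trim, lowercase, and snake_case the column names."""
    return df.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))


def drop_empty_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical step: drop rows in which every value is missing."""
    return df.dropna(how="all")


def clean_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Apply each step in order; every step returns a new DataFrame."""
    return df.pipe(strip_column_names).pipe(drop_empty_rows)


raw = pd.DataFrame({" Ticket Number ": [1, None], "Fine Amount": ["$50.00", None]})
cleaned = clean_sketch(raw)
print(list(cleaned.columns))  # ['ticket_number', 'fine_amount']
print(len(cleaned))  # 1
```

Keeping every step a DataFrame-in, DataFrame-out function makes it easy to add, remove, or test steps individually.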

lovelyrita.clean.clean_voided(dataframe, add_indicator=True)

Detect voided citations

Parameters:

dataframe : pandas.DataFrame

add_indicator : bool

If True, add a column named voided to the dataframe indicating whether each ticket was voided.
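
The reference does not say how a voided citation is recognized. The sketch below assumes, purely for illustration, a hypothetical ticket_status column whose value reads "VOID" for voided tickets. The name clean_voided_sketch, the status_column parameter, and returning the boolean mask when add_indicator is False are all assumptions, not documented behavior.

```python
import pandas as pd


def clean_voided_sketch(dataframe: pd.DataFrame, add_indicator: bool = True,
                        status_column: str = "ticket_status"):
    """Flag voided tickets using a hypothetical status column."""
    # Assumed rule: a ticket is voided when its status reads "VOID",
    # ignoring case and surrounding whitespace.
    voided = dataframe[status_column].astype(str).str.strip().str.upper() == "VOID"
    if add_indicator:
        # Add the indicator column described in the reference.
        dataframe["voided"] = voided
        return dataframe
    # Assumption: without the indicator, return the boolean mask itself.
    return voided


tickets = pd.DataFrame({"ticket_status": ["PAID", " void ", None]})
print(clean_voided_sketch(tickets)["voided"].tolist())  # [False, True, False]
```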

lovelyrita.clean.convert_dollar_to_float(dollars)
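
Only the signature is documented here, and the reference does not say whether dollars is a scalar or a Series. Assuming a Series of strings such as "$1,234.50", a conversion could strip the currency symbol, thousands separators, and whitespace, then parse what remains. The name convert_dollar_to_float_sketch is hypothetical.

```python
import pandas as pd


def convert_dollar_to_float_sketch(dollars: pd.Series) -> pd.Series:
    """Turn strings such as '$1,234.50' into floats; unparseable values become NaN."""
    # Remove "$", "," and whitespace, leaving only the numeric part.
    stripped = dollars.astype(str).str.replace(r"[$,\s]", "", regex=True)
    # errors="coerce" maps anything that still is not a number to NaN.
    return pd.to_numeric(stripped, errors="coerce")


fines = pd.Series(["$50.00", "$1,234.50", None])
print(convert_dollar_to_float_sketch(fines).tolist())  # [50.0, 1234.5, nan]
```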

lovelyrita.clean.drop_null(dataframe, inplace=True)

Drop null tickets

Parameters:

dataframe : pandas.DataFrame

inplace : bool

Returns:

If inplace is False, returns the input dataframe with the null citations removed.
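
The reference does not define what makes a ticket null. The sketch below assumes a hypothetical key column, ticket_number, whose missing value marks a null ticket, and follows the usual pandas inplace convention: mutate the input and return None, or leave the input untouched and return a filtered copy. drop_null_sketch and its key parameter are illustrative names only.

```python
from typing import Optional

import pandas as pd


def drop_null_sketch(dataframe: pd.DataFrame, inplace: bool = True,
                     key: str = "ticket_number") -> Optional[pd.DataFrame]:
    """Drop rows whose (hypothetical) key column is missing."""
    if inplace:
        # Mutate the caller's DataFrame and return None, as pandas does.
        dataframe.dropna(subset=[key], inplace=True)
        return None
    # Leave the input untouched and return a filtered copy.
    return dataframe.dropna(subset=[key])


tickets = pd.DataFrame({"ticket_number": [101, None, 103], "fine": [50, 60, 70]})
kept = drop_null_sketch(tickets, inplace=False)
print(len(tickets), len(kept))  # 3 2
drop_null_sketch(tickets)
print(len(tickets))  # 2
```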

lovelyrita.clean.find_dollar_columns(dataframe, nrows=100)

Find the columns in a DataFrame that contain dollar values
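
The nrows parameter is not documented. A plausible reading is that only the first nrows rows are inspected. Under that assumption, a hypothetical find_dollar_columns_sketch could test each text column's sampled, non-null values against a dollar-amount regular expression. The pattern is itself an assumption about how the raw data formats dollar values.

```python
from typing import List

import pandas as pd

# Assumed shape of a dollar value: "$5", "$50.00", "$1,234.50".
DOLLAR_PATTERN = r"^\s*\$\s*[\d,]*\.?\d+\s*$"


def find_dollar_columns_sketch(dataframe: pd.DataFrame, nrows: int = 100) -> List[str]:
    """Return names of text columns whose sampled values all look like dollars."""
    sample = dataframe.head(nrows)  # inspect only the first nrows rows
    found = []
    for name in sample.columns:
        values = sample[name].dropna()
        # Skip empty samples and non-text (e.g. numeric) columns.
        if values.empty or values.dtype != object:
            continue
        if values.astype(str).str.match(DOLLAR_PATTERN).all():
            found.append(name)
    return found


tickets = pd.DataFrame({"fine": ["$50.00", "$1,234.50"],
                        "plate": ["7ABC123", "5XYZ999"],
                        "count": [1, 2]})
print(find_dollar_columns_sketch(tickets))  # ['fine']
```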

lovelyrita.clean.get_datetime(dataframe)

Get a datetime for each row in a DataFrame

Parameters:

dataframe : pandas.DataFrame

A dataframe with ticket_issue_date and ticket_issue_time columns

Returns:

A Series of datetime values
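
The formats of ticket_issue_date and ticket_issue_time are not given here. Assuming ISO-style dates ("2016-01-15") and 24-hour "HH:MM" times, the two columns could be joined as strings and parsed in one call, with rows that fail to parse (for example, a missing time) becoming NaT. get_datetime_sketch and the format string are assumptions.

```python
import pandas as pd


def get_datetime_sketch(dataframe: pd.DataFrame) -> pd.Series:
    """Combine date and time columns into one datetime Series (assumed formats)."""
    combined = (dataframe["ticket_issue_date"].astype(str)
                + " " + dataframe["ticket_issue_time"].astype(str))
    # errors="coerce" turns unparseable or missing values into NaT instead of raising.
    return pd.to_datetime(combined, format="%Y-%m-%d %H:%M", errors="coerce")


tickets = pd.DataFrame({"ticket_issue_date": ["2016-01-15", "2016-01-15"],
                        "ticket_issue_time": ["08:30", "17:05"]})
issued = get_datetime_sketch(tickets)
print(issued.dt.strftime("%Y-%m-%d %H:%M").tolist())  # ['2016-01-15 08:30', '2016-01-15 17:05']
```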

lovelyrita.clean.impute_missing_times(datetimes)

Fill in missing times by interpolating surrounding times

Parameters:

datetimes : pandas.Series

Returns:

The original Series with missing times replaced by interpolated times
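
One way to interpolate datetimes, sketched under the assumption that rows are ordered so that neighbouring tickets were issued close together in time: measure each time as seconds since the earliest known time, interpolate linearly by row position, and shift back. impute_missing_times_sketch is hypothetical, and leaving leading or trailing gaps unfilled (limit_area="inside") is a design choice the reference does not specify.

```python
import pandas as pd


def impute_missing_times_sketch(datetimes: pd.Series) -> pd.Series:
    """Fill NaT values by linear interpolation between neighbouring known times."""
    if datetimes.isna().all():
        return datetimes.copy()  # nothing to interpolate from
    origin = datetimes.min()  # earliest known time; min() skips NaT
    # Seconds elapsed since origin as floats; NaT becomes NaN.
    seconds = (datetimes - origin) / pd.Timedelta(seconds=1)
    # Interpolate by row position; limit_area="inside" fills only gaps with a
    # known time on both sides, leaving leading/trailing NaT untouched.
    filled = seconds.interpolate(method="linear", limit_area="inside")
    # Shift back onto the time axis (NaN -> NaT).
    return origin + pd.to_timedelta(filled, unit="s")


times = pd.Series(pd.to_datetime(["2016-01-15 08:00", None, "2016-01-15 09:00"]))
print(impute_missing_times_sketch(times).dt.strftime("%H:%M").tolist())  # ['08:00', '08:30', '09:00']
```

Working in offsets from the earliest known time keeps the float values small, so the interpolated times do not pick up rounding error from large epoch-based numbers.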

lovelyrita.clean.infer_datetime_format(dt)

Infer the datetime format for a Series

Parameters:

dt : pandas.Series

Returns:

The datetime format as a string
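
The reference does not describe how the format is inferred. A simple approach, sketched here, is to try candidate formats in order and return the first one that parses every non-null value. infer_datetime_format_sketch and CANDIDATE_FORMATS are hypothetical, and returning None when nothing matches is an assumption.

```python
from typing import Optional

import pandas as pd

# Hypothetical candidates, tried in order; more specific formats come first.
CANDIDATE_FORMATS = [
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d",
    "%m/%d/%Y",
    "%m/%d/%y",
    "%H:%M",
    "%H%M",
]


def infer_datetime_format_sketch(dt: pd.Series) -> Optional[str]:
    """Return the first candidate format that parses every non-null value, else None."""
    values = dt.dropna().astype(str)
    if values.empty:
        return None
    for fmt in CANDIDATE_FORMATS:
        # Values that do not match fmt become NaT instead of raising.
        parsed = pd.to_datetime(values, format=fmt, errors="coerce")
        if parsed.notna().all():
            return fmt
    return None


dates = pd.Series(["01/15/2016", "02/03/2016", None])
print(infer_datetime_format_sketch(dates))  # %m/%d/%Y
```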