Title: | Access Australian Census Data (2006-2021) |
---|---|
Description: | R package to interact with Australian Census Data Packs, providing an interface to extract data across multiple censuses. |
Authors: | Carlos Yáñez Santibáñez [aut, cre], Craig Alexander [ths], Kyle Walker [cph], Australian Bureau of Statistics [cph] |
Maintainer: | Carlos Yáñez Santibáñez <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.0.1.0011 |
Built: | 2024-11-13 06:16:08 UTC |
Source: | https://github.com/carlosyanez/auscensus |
Little helper function that converts a tibble into a list of vectors, the format expected by the attribute argument of get_census_summary().
attribute_tibble_to_list(df, original = colnames(df)[1], new = colnames(df)[2])
df | tibble/data.frame. The first column contains the original values, the second the new labels |
original | name of the column containing the original attribute names |
new | name of the column containing the new labels |
named list of character vectors, one per new label
## Not run:
library(tibble)
attributes <- tribble(
  ~Census_stat,              ~Group,
  "Age_years_60_males",      "60 year old male",
  "Age (Years): 60_males",   "60 year old male",
  "Age_years_60_females",    "60 year old female",
  "Age (Years): 60_females", "60 year old female"
)
attribute_tibble_to_list(attributes)
## End(Not run)
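To make the expected shape concrete, here is a minimal base-R sketch of the same conversion. The helper name tibble_to_attribute_list is hypothetical and is not part of auscensus; it assumes the conversion amounts to grouping the original names by their new label, which is what split() does.

```r
# Hypothetical sketch, not the auscensus implementation.
tibble_to_attribute_list <- function(df, original = colnames(df)[1],
                                     new = colnames(df)[2]) {
  # split() groups the original attribute names by their new label,
  # returning a named list of character vectors.
  split(as.character(df[[original]]), df[[new]])
}

attributes <- data.frame(
  Census_stat = c("Age_years_60_males", "Age (Years): 60_males",
                  "Age_years_60_females", "Age (Years): 60_females"),
  Group = c("60 year old male", "60 year old male",
            "60 year old female", "60 year old female")
)
attribute_list <- tibble_to_attribute_list(attributes)
str(attribute_list)
```

Note that split() orders the groups alphabetically by label, so the female group comes first here; the package's own function may order them differently.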
Little helper function that expresses the values in value_col as a proportion of the total, identified by the row where key_col equals key_value.
calculate_percentage( df, key_col, value_col, key_value = "Total", percentage_scale = 1 )
df | data frame |
key_col | name of the column containing the total label |
value_col | name of the column containing the values |
key_value | label identifying the total (defaults to "Total") |
percentage_scale | 1 to present percentages on a 0-1 scale, or 100 to present them on a 0-100 scale |
list object
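A minimal sketch of the calculation described above, assuming a single total row per data frame (the package may compute totals per year or per geographic unit). The helper percentage_of_total and the output column Percentage are hypothetical names, and the values are made up.

```r
# Hypothetical sketch, not the auscensus implementation.
percentage_of_total <- function(df, key_col, value_col,
                                key_value = "Total", percentage_scale = 1) {
  # Value of the row whose key_col matches the total label.
  total <- df[[value_col]][df[[key_col]] == key_value]
  # Assumption: exactly one total row.
  stopifnot(length(total) == 1)
  df$Percentage <- df[[value_col]] / total * percentage_scale
  df
}

# Made-up values for illustration only.
counts <- data.frame(Group = c("A", "B", "Total"), Value = c(30, 70, 100))
pct <- percentage_of_total(counts, "Group", "Value", percentage_scale = 100)
print(pct)
```

With percentage_scale = 1 the same call returns shares between 0 and 1 instead.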
Helper function to download data
census_datapacks()
nothing
Helper function to delete cached data files
data_census_delete(file = NULL)
file | file to delete; defaults to all of them (provide the full path, which can be obtained from data_census_info()) |
nothing
Helper function to download data
data_census_download( download_dir, census_year = NULL, download_method = "wget" )
download_dir | full path of the directory to download census files to (required) |
census_year | census year to download (defaults to all years) |
download_method | method passed to download.file() (defaults to "wget") |
nothing
Helper function to import a data file into the cache
data_census_import(file)
file | file to import into the cache |
nothing
Helper function to list information about the cached data files
data_census_info()
nothing
Helper function to find cache folder
find_census_cache()
nothing
This function extracts the table files from each data pack (for the given tables and geo structures) and collates them into a list(), which it returns. By default it saves the processed tables in the cache folder as Parquet files, which it reuses on subsequent calls.
get_census_data( census_table, geo_structure, selected_years = list_census_years(), ignore_cache = FALSE, collect_data = FALSE, attr = NULL )
census_table | list of tables, in the format of the output of list_census_tables() |
geo_structure | character vector of geo structures (e.g. "SA1", "LGA", "CED") |
selected_years | census years to include |
ignore_cache | If TRUE, cached files are ignored |
collect_data | If TRUE, returns the data itself; if FALSE (default), returns Arrow bindings to the cached files |
attr | attributes to filter on, as a character vector (e.g. c("Age_years_60_males", "Age_years_60_females")) |
list of tables (data frames, or Arrow bindings to the cached files when collect_data = FALSE)
## Not run:
data <- get_census_data(census_table = list_census_tables("04"),
                        geo_structure = "LGA")
names(data)
## End(Not run)
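The caching behaviour described above follows a compute-once pattern. The sketch below illustrates that pattern in base R; it uses saveRDS()/readRDS() in place of the Parquet files the package writes, and the names cached_table and extract_toy_table are hypothetical.

```r
# Hypothetical sketch of a compute-once file cache (RDS stands in for Parquet).
cached_table <- function(key, compute, cache_dir, ignore_cache = FALSE) {
  path <- file.path(cache_dir, paste0(key, ".rds"))
  if (!ignore_cache && file.exists(path)) {
    return(readRDS(path))  # reuse the table processed on an earlier call
  }
  result <- compute()      # the expensive extraction step
  saveRDS(result, path)    # store it for subsequent calls
  result
}

cache_dir <- file.path(tempdir(), "census_cache_sketch")
dir.create(cache_dir, showWarnings = FALSE)
unlink(file.path(cache_dir, "toy_table.rds"))  # start from an empty cache

calls <- 0
extract_toy_table <- function() {
  calls <<- calls + 1  # count how often the extraction actually runs
  data.frame(x = 1:3)
}

first  <- cached_table("toy_table", extract_toy_table, cache_dir)
second <- cached_table("toy_table", extract_toy_table, cache_dir)  # read from cache
calls  # 1: the extraction ran only once
```

Passing ignore_cache = TRUE forces a fresh extraction, mirroring the ignore_cache argument above.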
This function produces a summary of one or more statistics across censuses, presented as a simple summary table. It can present individual statistics or an aggregation of several statistics (e.g. aggregating counts by country of birth into a continental total). If the names of the statistics containing totals are provided, it can also calculate percentages (on either a 0-1 or a 0-100 scale).
get_census_summary( table_number = NULL, geo_structure = NULL, attribute, geo_unit_names = NULL, geo_unit_codes = NULL, selected_years = list_census_years(), reference_total = NULL, percentage_scale = 1, ignore_cache = FALSE, data_source = NULL, data_collected = FALSE, census_table = NULL )
table_number | number of the selected table |
geo_structure | geographic structure in which to present the statistics (e.g. "SA1", "LGA", "CED") |
attribute | list of character vectors of statistics to be summarised. The statistics in each vector are aggregated and presented under that element's name, e.g. list("60 year old male" = c("Age_years_60_males", "Age (Years): 60_males")) |
geo_unit_names | vector with the names of the geographic units to present. They need to correspond to geo_structure, e.g. if geo_structure = "LGA", acceptable values could be c("Melbourne", "Stonnington", "Yarra"). If both this and geo_unit_codes are NULL, all available units are presented. |
geo_unit_codes | vector with the ABS codes of the geographic units to present. Similar to geo_unit_names. |
selected_years | vector with the selected years to display |
reference_total | Optional. List containing the names of all statistics representing totals, e.g. list("Total" = c("Total_persons")) |
percentage_scale | 1 to present percentages on a 0-1 scale, or 100 to present them on a 0-100 scale |
ignore_cache | If TRUE, cached files are ignored |
data_source | result of get_census_data() (if provided, the other parameters are ignored) |
data_collected | TRUE if data_source is a collected dataset, FALSE if it is a database or Arrow binding |
census_table | Instead of a table number, this allows a more complex filter table, e.g. one containing several table numbers. The expected format matches the output of list_census_tables(). |
data frame containing the summary table
## Not run:
get_census_summary(
  table_number = "04",
  geo_structure = "LGA",
  attribute = list(
    "60 year old male"   = c("Age_years_60_males", "Age (Years): 60_males"),
    "60 year old female" = c("Age_years_60_females", "Age (Years): 60_females")
  ),
  geo_unit_names = c("Melbourne", "Stonnington", "Yarra"),
  reference_total = list("Total" = c("Total_persons"))
)
## End(Not run)
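The aggregation and percentage logic described for get_census_summary() can be sketched in base R as follows. This is not the package's implementation: summarise_attributes is a hypothetical helper, the long-format input (columns Year, Census_stat, Value) is an assumption about how the data is arranged, and all numbers and year assignments are made up.

```r
# Hypothetical sketch, not the auscensus implementation.
summarise_attributes <- function(data, attribute, reference_total = NULL,
                                 percentage_scale = 1) {
  # For each named element, sum all listed statistics within each year.
  # Assumes every element matches at least one row.
  parts <- lapply(names(attribute), function(label) {
    rows <- data[data$Census_stat %in% attribute[[label]], ]
    out <- aggregate(Value ~ Year, data = rows, FUN = sum)
    out$Group <- label
    out
  })
  result <- do.call(rbind, parts)

  if (!is.null(reference_total)) {
    # Assumes a single reference total: sum its statistics per year and
    # express each group as a share of it.
    total_rows <- data[data$Census_stat %in% reference_total[[1]], ]
    totals <- aggregate(Value ~ Year, data = total_rows, FUN = sum)
    names(totals)[names(totals) == "Value"] <- "Total"
    result <- merge(result, totals, by = "Year")
    result$Percentage <- result$Value / result$Total * percentage_scale
  }
  result
}

# Made-up numbers; the year in which each naming variant appears is illustrative.
toy <- data.frame(
  Year = c(2016, 2016, 2016, 2021, 2021, 2021),
  Census_stat = c("Age (Years): 60_males", "Age (Years): 60_females", "Total_persons",
                  "Age_years_60_males", "Age_years_60_females", "Total_persons"),
  Value = c(10, 12, 100, 11, 13, 110)
)
attrs <- list(
  "60 year old male"   = c("Age_years_60_males", "Age (Years): 60_males"),
  "60 year old female" = c("Age_years_60_females", "Age (Years): 60_females")
)
res <- summarise_attributes(toy, attrs,
                            reference_total = list("Total" = c("Total_persons")),
                            percentage_scale = 100)
print(res)
```

Listing both naming variants under one label lets a single group span censuses whose statistic names differ, which is the purpose of the attribute list.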
Get list of available census attributes, filterable by table number and attribute name.
list_census_attributes(number = NULL, attribute_regex = NULL)
number | vector containing one or more table numbers |
attribute_regex | string with a regular expression to filter attribute names |
tibble listing the available attributes
## Not run:
# Get list of all attributes
list_census_attributes()
## End(Not run)
Get list of available geographies, filterable by type and name.
list_census_geo(geo_types = NULL, geo_names = NULL, geo_name_regex = NULL)
geo_types | vector containing one or more geography types (e.g. "STE", "CED", "SA1"). NULL by default. |
geo_names | vector containing one or more geography names (e.g. "Melbourne", "Yarra", "Stonnington" for LGAs). NULL by default. |
geo_name_regex | string with a regular expression to filter geography names (e.g. "^M" for all names starting with M) |
tibble showing the geo types available for each year
## Not run:
# Get list of all Commonwealth electoral divisions and Local Government Areas that start with "Mel"
list_census_geo(geo_types = c("CED", "LGA"), geo_name_regex = "^Mel")
## End(Not run)
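Assuming geo_name_regex is applied with standard R regular expressions (for example via grepl()), the anchors behave as shown below; the name vector is illustrative only and is not read from the data packs.

```r
# Illustrative names only.
geo_names <- c("Melbourne", "Melton", "Moreland", "Yarra", "Stonnington")
starts_with_m   <- grepl("^M", geo_names)    # "^" anchors the start of the name
starts_with_mel <- grepl("^Mel", geo_names)
ends_with_ton   <- grepl("ton$", geo_names)  # "$" anchors the end of the name
geo_names[starts_with_mel]
```

The last line selects "Melbourne" and "Melton", the same kind of match the "^Mel" example above asks for.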
List whether a geo structure is available for a particular table in a particular year
list_census_geo_tables(year = NULL, geo = NULL, table_number = NULL)
year | vector with years |
geo | vector with geo structures |
table_number | table number |
tibble
## Not run:
# List geo structure availability across all tables and years
list_census_geo_tables()
## End(Not run)
Very simple function listing the geography types (e.g. SAx, CED) for which data packs have been imported.
list_census_geo_types()
tibble showing the geo types available for each year
## Not run:
# Get list of all imported geography types
list_census_geo_types()
## End(Not run)
Get list of available census tables, filterable by table number and name.
list_census_tables(number = NULL, table_name_regex = NULL)
number | vector containing one or more table numbers |
table_name_regex | string with a regular expression to filter table names (e.g. "Country" for all tables whose names contain Country) |
tibble listing the available tables
## Not run:
# Get list of all tables
list_census_tables()
## End(Not run)
Very simple function listing the Census years included in this package for which data packs have been imported.
list_census_years(mode = "available")
mode | either "listed" or "available" |
vector with years
## Not run:
# Get list of available census years
list_census_years()
## End(Not run)
Helper function to delete all CSV files in the cache
remove_census_cache_csv()
nothing