APPENDIX B: Metadata Table and Data Dictionary

The following metadata tables apply to the data that can be accessed both in SciServer and through using the APIs. For each API call, results can be filtered by agency designation using url parameter agency={AGENCY}. For example, https://prod.democratizing-data.tacc.utexas.edu/authors?agency=USDA

In each case below, the API returns data in the schema listed in the table, unless noted otherwise. The API endpoint can be found below each table.

agency_run: the table of runs for the different agencies and their datasets

https://prod.democratizing-data.tacc.utexas.edu/agency_runs
Column name
Description
Data type
Length
Is nullable
id
unique identifier to the agency_run table
bigint
0
NO
agency
name of the agency for which the run was performed
varchar
32
NO
version
version of the run for the agency; allows multiple versions on the same datasets, or possibly new runs for the same agency but with different input datasets
varchar
32
NO
run_date
approximate date the run was performed
date
0
YES
last_updated_date
last time the row was updated; generally the time of creation of the row
datetime
0
NO

asjc: the table with the All Science Journal Classification codes which define the research area of a journal and the articles it contains

https://prod.democratizing-data.tacc.utexas.edu/asjc A publication's research areas are defined through Elsevier's All Science Journal Classification scheme. More information about this classification system can be found here: https://service.elsevier.com/app/answers/detail/a_id/15181/supporthub/scopus/
Column name
Description
Data type
Length
Is nullable
id
unique identifier for this entry
bigint
0
NO
run_id
identifies the agency run for which this entry was determined, foreign key to agency_run.id
bigint
0
NO
code
the All Science Journal Classification code which defines the research area of a journal and the articles it contains; there may be more than one ASJC code for each journal/publication, and the 334 codes are used here to provide a relatively precise definition of research area
bigint
0
NO
label
the All Science Journal Classification name
nvarchar
-1
YES
last_updated_date
last time the row was updated; generally the time of creation of the row
datetime
0
NO

author: the table with author information

https://prod.democratizing-data.tacc.utexas.edu/authors
Column name
Description
Data type
Length
Is nullable
id
unique identifier for this entry
bigint
0
NO
run_id
identifies the agency run for which this entry was determined, foreign key to agency_run.id
bigint
0
NO
external_id
id assigned by Elsevier to the author; this allow different authors to be identified across publications, even if they used different name variations
varchar
128
YES
given_name
the unique given name of the author as determined by Elsevier
nvarchar
150
YES
family_name
the unique family name of the author as determined by Elsevier
nvarchar
150
YES
last_updated_date
last time the row was updated; generally the time of creation of the row
datetime
0
NO

author_affiliation: the table linking authors to their affiliations in a publication

https://prod.democratizing-data.tacc.utexas.edu/author_affiliations
Column name
Description
Data type
Length
Is nullable
id
unique identifier for this entry
bigint
0
NO
run_id
identifies the agency run for which this entry was determined, foreign key to agency_run.id
bigint
0
NO
publication_author_id
identifies the publication_author here linked to the publication affiliation
bigint
0
YES
publication_affiliation_id
identifies the publication_affiliation entry here linked to a publication author
bigint
0
YES
last_updated_date
last time the row was updated; generally the time of creation of the row
date
0
NO

dataset_alias: the datasets provided by an agency for a particular run and possible aliases

https://prod.democratizing-data.tacc.utexas.edu/dataset_aliases (Note that the endpoint https://prod.democratizing-data.tacc.utexas.edu/datasets joins dataset_alias and agency_run, the schema being a concatenation of the two.) (To filter dataset_alias by other parameters, use https://prod.democratizing-data.tacc.utexas.edu/topics/{topic_id}/datasets, https://prod.democratizing-data.tacc.utexas.edu/authors/{author_id}/datasets, https://prod.democratizing-data.tacc.utexas.edu/publications/{publication_id}/datasets)
Column name
Description
Data type
Length
Is nullable
id
unique identifier for this entry
bigint
0
NO
run_id
identifies the agency run for which this entry was determined, foreign key to agency_run.id
bigint
0
NO
alias_id
the alias_id as provided by Elsevier
bigint
0
NO
parent_alias_id
identifies the parent dataset entry in this table, as identified by the alias_id
bigint
0
YES
alias
the name of the data set or the alias depending on whether alias_id==parent_alias_id or not
varchar
160
YES
alias_type
indicates the alias type; options include main, alias, acronym, doi
varchar
50
YES
url
URL to information about the dataset
varchar
2048
YES
last_updated_date
last time the row was updated; generally the time of creation of the row
datetime
0
NO

dyad: the core table with dyads representing dataset references

https://prod.democratizing-data.tacc.utexas.edu/dyads
Column name
Description
Data type
Length
Is nullable
id
unique identifier for this entry
bigint
0
NO
run_id
identifies the agency run for which this entry was determined, foreign key to agency_run.id
bigint
0
NO
publication_id
foreign key the publication table's id column, identifying the publication within which the dataset reference represented by this dyad was identified
bigint
0
NO
elsevier_id
REMOVE
int
0
NO
dataset_alias_id
foreign key to the dataset_alias table's id column identifying the match made between this dyad and a dataset alias provided by an agency; if no such match was found this column has a NULL
bigint
0
YES
alias_id
the intrinsic id assigned by Elsevier to the dataset alias, corresponding to the alias_id column in the dataset_alias table
bigint
0
YES
mention_candidate
the phrase in the publication that was deemed by the algorithm to reflect a reference to a dataset
varchar
1028
NO
snippet*
snippet of text surrounding the mention_candidate, meant to provide contextual information to reviewers/validators
varchar
-1
YES
last_updated_date
last time the row was updated; generally the time of creation of the row
datetime
0
NO
is_fuzzy
column has value 1 if the matching between mention candidate and the dataset alias was performed using a fuzzy algorithm, 0 otherwise
bit
0
YES
fuzzy_score
in case the matching between mention candidate and the dataset alias was performed using a fuzzy algorithm, this column stores the score indicating how certain match was deemed to be
real
0
YES
*This column is set to NULL in the publicly available databases, and is only shown in restricted use databases.

dyad_model: the table with model scores for particular entries in the dyad table

https://prod.democratizing-data.tacc.utexas.edu/dyad_models
Column name
Description
Data type
Length
Is nullable
id
unique identifier for this entry
bigint
0
NO
run_id
identifies the agency run for which this entry was determined, foreign key to agency_run.id
bigint
0
NO
dyad_id
identifies the dyad
bigint
0
NO
model_id
identifies the model
bigint
0
NO
score
the score of this model for the dyad
real
0
YES
last_updated_date
last time the row was updated; generally the time of creation of the row
datetime
0
NO

issn: the table with ISSN/ISBN codes for the journal

https://prod.democratizing-data.tacc.utexas.edu/issns
Column name
Description
Data type
Length
Is nullable
id
unique identifier for this entry
bigint
0
NO
run_id
identifies the agency run for which this entry was determined, foreign key to agency_run.id
bigint
0
NO
journal_id
foreign key to the journal table's id column, identifying the journal for this ISSN
bigint
0
YES
ISSN
the ISSN/ISBN codes for the referenced journal/source
varchar
13
YES
last_updated_date
last time the row was updated; generally the time of creation of the row
datetime
0
NO

journal: the table linking publications to the journal in which they appeared

https://prod.democratizing-data.tacc.utexas.edu/journals
Column name
Description
Data type
Length
Is nullable
id
unique identifier and primary key for this table
bigint
0
NO
run_id
foreign key, identifier of the agency run for which this entry was determined
bigint
0
NO
publisher_id
foreign key to the publisher table, identifying the publisher for this journal at the time the agency run was executed
bigint
0
YES
external_id
the scopus ID for the journal/source
varchar
128
YES
title
the name of the journal/source that the publication was published in
varchar
1028
NO
cite_score
citescore is an Elsevier derived metric that measures the relative standing of a journal
decimal
0
YES
last_updated_date
last time the row was updated; generally the time of creation of the row
datetime
0
NO

model: the table with the Kaggle models that are run

https://prod.democratizing-data.tacc.utexas.edu/models
Column name
Description
Data type
Length
Is nullable
id
unique identifier for this entry
bigint
0
NO
name
the name of the model
varchar
32
NO
github_commit_url
the github url where the commit for this model can be found
varchar
1024
YES
description
description of the model
nvarchar
-1
YES
last_updated_date
last time the row was updated; generally the time of the creation of the row
datetime
0
NO

publication: the publications discovered in a run and their metadata

https://prod.democratizing-data.tacc.utexas.edu/publications (To filter publications using other parameters, use https://prod.democratizing-data.tacc.utexas.edu/topics/{topic_id}/publications, https://prod.democratizing-data.tacc.utexas.edu/authors/{author_id}/publications, https://prod.democratizing-data.tacc.utexas.edu/datasets/{parent_alias_id}/publications)
Column name
Description
Data type
Length
Is nullable
id
unique identifier for this entry
bigint
0
NO
run_id
identifies the agency run for which this entry was determined, foreign key to agency_run.id
bigint
0
NO
journal_id
foreign key to the journal for this publication
bigint
0
YES
external_id
the scopus ID (for Elsevier publications) of this publication
varchar
128
YES
title
title of the publication
varchar
400
YES
doi
DOI of the publication
varchar
80
YES
year
the year that the publication was published as recorded in scopus
int
0
YES
month
the month of publication, which may not be available; this will be an integer value, such as 1 = January
int
0
YES
pub_type
the type of publication, which includes: article, review, book, book chapter, letter
varchar
30
YES
citation_count
the number of times this publication is cited in scopus
int
0
YES
fw_citation_impact
the Field Weighted Citation impact (FWCI) for the publication, which is a measure for how impactful or important a publication is as measured through normalized citations; the number of times cited is divided by the expected number of citations of articles in the same year, subject and publication type, and the world average across papers is 1.0 for this measure; this measure also changes over time
float
0
YES
last_updated_date
last time the row was updated; generally the time of creation of the row
datetime
0
NO

publication_affiliation: the table with affiliations linked to a publication

https://prod.democratizing-data.tacc.utexas.edu/publication_affiliations
Column name
Description
Data type
Length
Is nullable
id
unique identifier of the affiliation table
bigint
0
NO
run_id
identifies the agency run for which this entry was determined, foreign key to agency_run.id
bigint
0
NO
external_id
id assigned by Elsevier to this affiliation
varchar
128
YES
institution_name
the name of the institution to which an author was associated
nvarchar
750
YES
address
the address of the author, most likely that of their institution
nvarchar
750
YES
country_code
the three let6ter country code associated to the affiliation, most likely of the institution
nvarchar
10
YES
last_updated_date
last time the row was updated; generally the time of creation of the row
datetime
0
NO
city
the city for the address or institute in this affiliation
nvarchar
128
YES
state
if appropriate, the state for the address or institute in this affiliation
nvarchar
128
YES
postal_code
if appropriate, the postal code for this affiliation
nvarchar
64
YES

publication_asjc: the table linking a publication to its ASJC code(s)

https://prod.democratizing-data.tacc.utexas.edu/publication_asjcs
Column name
Description
Data type
Length
Is nullable
id
unique identifier for this entry
bigint
0
NO
run_id
identifies the agency run for which this entry was determined, foreign key to agency_run.id
bigint
0
NO
publication_id
foreign key to publication table's id column, identifying the publication in this relation
bigint
0
NO
asjc_id
foreign key to the ASJC
bigint
0
NO
last_updated_date
last time the row was updated; generally the time of creation of the row
datetime
0
NO

publication_author: the table linking publication and author tables

https://prod.democratizing-data.tacc.utexas.edu/publication_authors
Column name
Description
Data type
Length
Is nullable
id
unique identifier for this entry
bigint
0
NO
run_id
identifies the agency run for which this entry was determined, foreign key to agency_run.id
bigint
0
NO
publication_id
foreign key to the publication for this author
bigint
0
NO
author_id
foreign key to the table with scopus author entries
bigint
0
NO
author_position
position of author in the list of authors on the publication
int
0
YES
last_updated_date
last time the row was updated; generally the time of creation of the row
datetime
0
NO

publication_topic: the table identifying the topic assigned to a publication

https://prod.democratizing-data.tacc.utexas.edu/publication_topics
Column name
Description
Data type
Length
Is nullable
id
unique identifier for this entry
bigint
0
NO
run_id
identifies the agency run for which this entry was determined, foreign key to agency_run.id
bigint
0
NO
publication_id
foreign key to the topic table's id column identifying the publication in this relation between publications and topics
bigint
0
NO
topic_id
foreign key to the topic table's id column identifying the topic in this relation between publications and topics
bigint
0
NO
score
TBD
real
0
YES
last_updated_date
last time the row was updated; generally the time of creation of the row
datetime
0
NO

publisher: the table with a list of publishers that can be linked to journals

https://prod.democratizing-data.tacc.utexas.edu/publishers
Column name
Description
Data type
Length
Is nullable
id
unique identifier for this entry
bigint
0
NO
run_id
identifies the agency run for which this entry was determined, foreign key to agency_run.id
bigint
0
NO
external_id
external identifier of this publisher in Elsevier's scopus repository
nvarchar
128
YES
name
name of the publisher
nvarchar
120
YES
last_updated_date
last time the row was updated; generally the time of creation of the row
datetime
0
NO

topic: the table with Topics defined by Elsevier, consistinging of topic names with three concatenated keywords

https://prod.democratizing-data.tacc.utexas.edu/topics Topics are defined through Elsevier's Topic Prominence in Science methodology. More information about these Topics can be found here: https://www.elsevier.com/solutions/scival/features/topic-prominence-in-science (To filter topic by other parameters, use https://prod.democratizing-data.tacc.utexas.edu/authors/{author_id}/topics, https://prod.democratizing-data.tacc.utexas.edu/datasets/{parent_alias_id}/topics, https://prod.democratizing-data.tacc.utexas.edu/publications/{publication_id}/topics)
Column name
Description
Data type
Length