APPENDIX B: Metadata Table and Data Dictionary
https://prod.democratizing-data.tacc.utexas.edu/agency_runs
Column name | Description | Data type | Length | Is nullable |
---|---|---|---|---|
id | unique identifier to the agency_run table | bigint | 0 | NO |
agency | name of the agency for which the run was performed | varchar | 32 | NO |
version | version of the run for the agency; allows multiple versions on the same datasets, or possibly new runs for the same agency but with different input datasets | varchar | 32 | NO |
run_date | approximate date the run was performed | date | 0 | YES |
last_updated_date | last time the row was updated; generally the time of creation of the row | datetime | 0 | NO |
https://prod.democratizing-data.tacc.utexas.edu/asjc
A publication's research areas are defined through Elsevier's All Science Journal Classification scheme. More information about this classification system can be found here: https://service.elsevier.com/app/answers/detail/a_id/15181/supporthub/scopus/
Column name | Description | Data type | Length | Is nullable |
---|---|---|---|---|
id | unique identifier for this entry | bigint | 0 | NO |
run_id | identifies the agency run for which this entry was determined, foreign key to agency_run.id | bigint | 0 | NO |
code | the All Science Journal Classification code which defines the research area of a journal and the articles it contains; there may be more than one ASJC code for each journal/publication, and the 334 codes are used here to provide a relatively precise definition of research area | bigint | 0 | NO |
label | the All Science Journal Classification name | nvarchar | -1 | YES |
last_updated_date | last time the row was updated; generally the time of creation of the row | datetime | 0 | NO |
​ | ​ | ​ | ​ |
https://prod.democratizing-data.tacc.utexas.edu/authors
Column name | Description | Data type | Length | Is nullable |
---|---|---|---|---|
id | unique identifier for this entry | bigint | 0 | NO |
run_id | identifies the agency run for which this entry was determined, foreign key to agency_run.id | bigint | 0 | NO |
external_id | id assigned by Elsevier to the author; this allow different authors to be identified across publications, even if they used different name variations | varchar | 128 | YES |
given_name | the unique given name of the author as determined by Elsevier | nvarchar | 150 | YES |
family_name | the unique family name of the author as determined by Elsevier | nvarchar | 150 | YES |
last_updated_date | last time the row was updated; generally the time of creation of the row | datetime | 0 | NO |
​ | ​ | ​ | ​ |
https://prod.democratizing-data.tacc.utexas.edu/author_affiliations
Column name | Description | Data type | Length | Is nullable |
---|---|---|---|---|
id | unique identifier for this entry | bigint | 0 | NO |
run_id | identifies the agency run for which this entry was determined, foreign key to agency_run.id | bigint | 0 | NO |
publication_author_id | identifies the publication_author here linked to the publication affiliation | bigint | 0 | YES |
publication_affiliation_id | identifies the publication_affiliation entry here linked to a publication author | bigint | 0 | YES |
last_updated_date | last time the row was updated; generally the time of creation of the row | date | 0 | NO |
​ | ​ | ​ | ​ |
https://prod.democratizing-data.tacc.utexas.edu/dataset_aliases
(Note that the endpoint https://prod.democratizing-data.tacc.utexas.edu/datasets joins dataset_alias and agency_run, the schema being a concatenation of the two.)
(To filter dataset_alias by other parameters, use https://prod.democratizing-data.tacc.utexas.edu/topics/{topic_id}/datasets, https://prod.democratizing-data.tacc.utexas.edu/authors/{author_id}/datasets, https://prod.democratizing-data.tacc.utexas.edu/publications/{publication_id}/datasets)
Column name | Description | Data type | Length | Is nullable |
---|---|---|---|---|
id | unique identifier for this entry | bigint | 0 | NO |
run_id | identifies the agency run for which this entry was determined, foreign key to agency_run.id | bigint | 0 | NO |
alias_id | the alias_id as provided by Elsevier | bigint | 0 | NO |
parent_alias_id | identifies the parent dataset entry in this table, as identified by the alias_id | bigint | 0 | YES |
alias | the name of the data set or the alias depending on whether alias_id==parent_alias_id or not | varchar | 160 | YES |
alias_type | indicates the alias type; options include main, alias, acronym, doi | varchar | 50 | YES |
url | URL to information about the dataset | varchar | 2048 | YES |
last_updated_date | last time the row was updated; generally the time of creation of the row | datetime | 0 | NO |
​ | ​ | ​ | ​ |
https://prod.democratizing-data.tacc.utexas.edu/dyads
Column name | Description | Data type | Length | Is nullable |
---|---|---|---|---|
id | unique identifier for this entry | bigint | 0 | NO |
run_id | identifies the agency run for which this entry was determined, foreign key to agency_run.id | bigint | 0 | NO |
publication_id | foreign key the publication table's id column, identifying the publication within which the dataset reference represented by this dyad was identified | bigint | 0 | NO |
elsevier_id | REMOVE | int | 0 | NO |
dataset_alias_id | foreign key to the dataset_alias table's id column identifying the match made between this dyad and a dataset alias provided by an agency; if no such match was found this column has a NULL | bigint | 0 | YES |
alias_id | the intrinsic id assigned by Elsevier to the dataset alias, corresponding to the alias_id column in the dataset_alias table | bigint | 0 | YES |
mention_candidate | the phrase in the publication that was deemed by the algorithm to reflect a reference to a dataset | varchar | 1028 | NO |
snippet* | snippet of text surrounding the mention_candidate, meant to provide contextual information to reviewers/validators | varchar | -1 | YES |
last_updated_date | last time the row was updated; generally the time of creation of the row | datetime | 0 | NO |
is_fuzzy | column has value 1 if the matching between mention candidate and the dataset alias was performed using a fuzzy algorithm, 0 otherwise | bit | 0 | YES |
fuzzy_score | in case the matching between mention candidate and the dataset alias was performed using a fuzzy algorithm, this column stores the score indicating how certain match was deemed to be | real | 0 | YES |
*This column is set to NULL in the publicly available databases, and is only shown in restricted use databases.
https://prod.democratizing-data.tacc.utexas.edu/dyad_models
Column name | Description | Data type | Length | Is nullable |
---|---|---|---|---|
id | unique identifier for this entry | bigint | 0 | NO |
run_id | identifies the agency run for which this entry was determined, foreign key to agency_run.id | bigint | 0 | NO |
dyad_id | identifies the dyad | bigint | 0 | NO |
model_id | identifies the model | bigint | 0 | NO |
score | the score of this model for the dyad | real | 0 | YES |
last_updated_date | last time the row was updated; generally the time of creation of the row | datetime | 0 | NO |
​ | ​ | ​ | ​ |
https://prod.democratizing-data.tacc.utexas.edu/issns
Column name | Description | Data type | Length | Is nullable |
---|---|---|---|---|
id | unique identifier for this entry | bigint | 0 | NO |
run_id | identifies the agency run for which this entry was determined, foreign key to agency_run.id | bigint | 0 | NO |
journal_id | foreign key to the journal table's id column, identifying the journal for this ISSN | bigint | 0 | YES |
ISSN | the ISSN/ISBN codes for the referenced journal/source | varchar | 13 | YES |
last_updated_date | last time the row was updated; generally the time of creation of the row | datetime | 0 | NO |
​ | ​ | ​ | ​ |
https://prod.democratizing-data.tacc.utexas.edu/journals
Column name | Description | Data type | Length | Is nullable |
---|---|---|---|---|
id | unique identifier and primary key for this table | bigint | 0 | NO |
run_id | foreign key, identifier of the agency run for which this entry was determined | bigint | 0 | NO |
publisher_id | foreign key to the publisher table, identifying the publisher for this journal at the time the agency run was executed | bigint | 0 | YES |
external_id | the scopus ID for the journal/source | varchar | 128 | YES |
title | the name of the journal/source that the publication was published in | varchar | 1028 | NO |
cite_score | citescore is an Elsevier derived metric that measures the relative standing of a journal | decimal | 0 | YES |
last_updated_date | last time the row was updated; generally the time of creation of the row | datetime | 0 | NO |
​ | ​ | ​ | ​ |
https://prod.democratizing-data.tacc.utexas.edu/models
Column name | Description | Data type | Length | Is nullable |
---|---|---|---|---|
id | unique identifier for this entry | bigint | 0 | NO |
name | the name of the model | varchar | 32 | NO |
github_commit_url | the github url where the commit for this model can be found | varchar | 1024 | YES |
description | description of the model | nvarchar | -1 | YES |
last_updated_date | last time the row was updated; generally the time of the creation of the row | datetime | 0 | NO |
​ | ​ | ​ | ​ |
https://prod.democratizing-data.tacc.utexas.edu/publications
(To filter publications using other parameters, use https://prod.democratizing-data.tacc.utexas.edu/topics/{topic_id}/publications, https://prod.democratizing-data.tacc.utexas.edu/authors/{author_id}/publications, https://prod.democratizing-data.tacc.utexas.edu/datasets/{parent_alias_id}/publications)
Column name | Description | Data type | Length | Is nullable |
---|---|---|---|---|
id | unique identifier for this entry | bigint | 0 | NO |
run_id | identifies the agency run for which this entry was determined, foreign key to agency_run.id | bigint | 0 | NO |
journal_id | foreign key to the journal for this publication | bigint | 0 | YES |
external_id | the scopus ID (for Elsevier publications) of this publication | varchar | 128 | YES |
title | title of the publication | varchar | 400 | YES |
doi | DOI of the publication | varchar | 80 | YES |
year | the year that the publication was published as recorded in scopus | int | 0 | YES |
month | the month of publication, which may not be available; this will be an integer value, such as 1 = January | int | 0 | YES |
pub_type | the type of publication, which includes: article, review, book, book chapter, letter | varchar | 30 | YES |
citation_count | the number of times this publication is cited in scopus | int | 0 | YES |
fw_citation_impact | the Field Weighted Citation impact (FWCI) for the publication, which is a measure for how impactful or important a publication is as measured through normalized citations; the number of times cited is divided by the expected number of citations of articles in the same year, subject and publication type, and the world average across papers is 1.0 for this measure; this measure also changes over time | float | 0 | YES |
last_updated_date | last time the row was updated; generally the time of creation of the row | datetime | 0 | NO |
​ | ​ | ​ | ​ |
https://prod.democratizing-data.tacc.utexas.edu/publication_affiliations
Column name | Description | Data type | Length | Is nullable |
---|---|---|---|---|
id | unique identifier of the affiliation table | bigint | 0 | NO |
run_id | identifies the agency run for which this entry was determined, foreign key to agency_run.id | bigint | 0 | NO |
external_id | id assigned by Elsevier to this affiliation | varchar | 128 | YES |
institution_name | the name of the institution to which an author was associated | nvarchar | 750 | YES |
address | the address of the author, most likely that of their institution | nvarchar | 750 | YES |
country_code | the three let6ter country code associated to the affiliation, most likely of the institution | nvarchar | 10 | YES |
last_updated_date | last time the row was updated; generally the time of creation of the row | datetime | 0 | NO |
city | the city for the address or institute in this affiliation | nvarchar | 128 | YES |
state | if appropriate, the state for the address or institute in this affiliation | nvarchar | 128 | YES |
postal_code | if appropriate, the postal code for this affiliation | nvarchar | 64 | YES |
​ | ​ | ​ | ​ |
https://prod.democratizing-data.tacc.utexas.edu/publication_asjcs
Column name | Description | Data type | Length | Is nullable |
---|---|---|---|---|
id | unique identifier for this entry | bigint | 0 | NO |
run_id | identifies the agency run for which this entry was determined, foreign key to agency_run.id | bigint | 0 | NO |
publication_id | foreign key to publication table's id column, identifying the publication in this relation | bigint | 0 | NO |
asjc_id | foreign key to the ASJC | bigint | 0 | NO |
last_updated_date | last time the row was updated; generally the time of creation of the row | datetime | 0 | NO |
​ | ​ | ​ | ​ |
https://prod.democratizing-data.tacc.utexas.edu/publication_authors
Column name | Description | Data type | Length | Is nullable |
---|---|---|---|---|
id | unique identifier for this entry | bigint | 0 | NO |
run_id | identifies the agency run for which this entry was determined, foreign key to agency_run.id | bigint | 0 | NO |
publication_id | foreign key to the publication for this author | bigint | 0 | NO |
author_id | foreign key to the table with scopus author entries | bigint | 0 | NO |
author_position | position of author in the list of authors on the publication | int | 0 | YES |
last_updated_date | last time the row was updated; generally the time of creation of the row | datetime | 0 | NO |
​ | ​ | ​ | ​ |
https://prod.democratizing-data.tacc.utexas.edu/publication_topics
Column name | Description | Data type | Length | Is nullable |
---|---|---|---|---|
id | unique identifier for this entry | bigint | 0 | NO |
run_id | identifies the agency run for which this entry was determined, foreign key to agency_run.id | bigint | 0 | NO |
publication_id | foreign key to the topic table's id column identifying the publication in this relation between publications and topics | bigint | 0 | NO |
topic_id | foreign key to the topic table's id column identifying the topic in this relation between publications and topics | bigint | 0 | NO |
score | TBD | real | 0 | YES |
last_updated_date | last time the row was updated; generally the time of creation of the row | datetime | 0 | NO |
​ | ​ | ​ | ​ |
https://prod.democratizing-data.tacc.utexas.edu/publishers
Column name | Description | Data type | Length | Is nullable |
---|---|---|---|---|
id | unique identifier for this entry | bigint | 0 | NO |
run_id | identifies the agency run for which this entry was determined, foreign key to agency_run.id | bigint | 0 | NO |
external_id | external identifier of this publisher in Elsevier's scopus repository | nvarchar | 128 | YES |
name | name of the publisher | nvarchar | 120 | YES |
last_updated_date | last time the row was updated; generally the time of creation of the row | datetime | 0 | NO |
​ | ​ | ​ | ​ |
https://prod.democratizing-data.tacc.utexas.edu/topics
Topics are defined through Elsevier's Topic Prominence in Science methodology. More information about these Topics can be found here: https://www.elsevier.com/solutions/scival/features/topic-prominence-in-science
(To filter topic by other parameters, use https://prod.democratizing-data.tacc.utexas.edu/authors/{author_id}/topics, https://prod.democratizing-data.tacc.utexas.edu/datasets/{parent_alias_id}/topics, https://prod.democratizing-data.tacc.utexas.edu/publications/{publication_id}/topics)
Column name | Description | Data type | Length |
---|