bootleg.symbols package — Bootleg v1.1.0dev1 documentation
Submodules¶
bootleg.symbols.constants module¶
Constants.
bootleg.symbols.constants.check_qid_exists(func)[source]¶
Check QID exists.
bootleg.symbols.constants.edit_op(func)[source]¶
Edit op.
bootleg.symbols.entity_profile module¶
Entity profile.
class bootleg.symbols.entity_profile.EntityObj(*, entity_id: str, mentions: List[Tuple[str, float]], title: str, description: str, types: Dict[str, List[str]] = None, relations: List[Dict[str, str]] = None)[source]¶
Bases: pydantic.main.BaseModel
Base entity object class to check types.
description: str¶
entity_id: str¶
mentions: List[Tuple[str, float]]¶
relations: Optional[List[Dict[str, str]]]¶
title: str¶
types: Optional[Dict[str, List[str]]]¶
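The field list above can be mirrored with a plain dataclass to see which fields are required and which default to None. This is only an illustrative stand-in: the name EntityRecord is hypothetical, and unlike the pydantic-based EntityObj a dataclass performs no type validation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class EntityRecord:
    """Hypothetical stand-in that mirrors the EntityObj fields (no validation)."""

    entity_id: str
    mentions: List[Tuple[str, float]]
    title: str
    description: str
    types: Optional[Dict[str, List[str]]] = None
    relations: Optional[List[Dict[str, str]]] = None


# Only the four required fields are given; types and relations default to None.
record = EntityRecord(
    entity_id="C000",
    mentions=[("dog", 10.0), ("dogg", 7.0)],
    title="Dog",
    description="A domesticated animal",
)
```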
class bootleg.symbols.entity_profile.EntityProfile(entity_symbols, type_systems=None, kg_symbols=None, edit_mode=False, verbose=False)[source]¶
Bases: object
Entity Profile object to handle and manage entity, type, and KG metadata.
add_entity(entity_obj)[source]¶
Add entity to our dump.
Parameters
entity_obj – JSON object of entity metadata
add_mention(qid: str, mention: str, score: float)[source]¶
Add the mention with its score to the QID.
Parameters
- qid – QID
- mention – mention
- score – score
add_relation(qid, relation, qid2)[source]¶
Add the relation triple.
Parameters
- qid – head QID
- relation – relation
- qid2 – tail QID
add_type(qid, type, type_system)[source]¶
Add type to QID for the given type system.
Parameters
- qid – QID
- type – type name
- type_system – type system
Return list of all mentions.
Returns: List of strings
Return all entity QIDs.
Returns: List of strings
get_all_types(type_system)[source]¶
Return list of all type names for a type system.
Parameters
type_system – type system
Returns: List of strings
get_all_typesystems()[source]¶
Return list of all type systems.
Returns: List of strings
Get the description of an entity QID.
Parameters
qid – entity QID
Returns: string
Get the entity EID (internal number) of an entity QID.
Parameters
qid – entity QID
Returns: integer
get_entities_of_type(typename, type_system)[source]¶
Get all entities of type typename for type system type_system.
Parameters
- typename – type name
- type_system – type system
Returns: List of QIDs
Get the mentions for the QID.
Parameters
qid – QID
Returns: List of mentions
get_mentions_with_scores(qid)[source]¶
Get the mentions with their scores associated with the QID.
Parameters
qid – QID
Returns: List of tuples [mention, score]
get_qid_cands(mention)[source]¶
Get the entity QID candidates of the mention.
Parameters
mention – mention
Returns: List of QIDs
get_qid_count_cands(mention)[source]¶
Get the entity QID candidates with their scores of the mention.
Parameters
mention – mention
Returns: List of tuples [QID, score]
get_relations_between(qid, qid2)[source]¶
Check whether two QIDs are connected in the KG and return their relation.
Parameters
- qid – QID one
- qid2 – QID two
Returns: string relation or None
get_relations_tails_for_qid(qid)[source]¶
Get dict of relation to tail qids for given qid.
Parameters
qid – QID
Returns: Dict relation to list of tail qids for that relation
Get the title of an entity QID.
Parameters
qid – entity QID
Returns: string
get_type_typeid(type, type_system)[source]¶
Get the type id for the type in the type_system system.
Parameters
- type – type
- type_system – type system
Returns: type id
get_types(qid, type_system)[source]¶
Get the type names associated with the given QID for the type_system system.
Parameters
- qid – QID
- type_system – type system
Returns: list of typename strings
classmethod load_from_cache(load_dir, edit_mode=False, verbose=False, no_kg=False, no_type=False, type_systems_to_load=None)[source]¶
Load a pre-saved profile.
Parameters
- load_dir – load directory
- edit_mode – edit mode flag, default False
- verbose – verbose flag, default False
- no_kg – load kg or not flag, default False
- no_type – load types or not flag, default False. If True, this will ignore type_systems_to_load.
- type_systems_to_load – list of type systems to load, default is None, which means all type systems
Returns: entity profile object
classmethod load_from_jsonl(profile_file, max_candidates=30, max_types=10, max_kg_connections=100, edit_mode=False)[source]¶
Load an entity profile from the raw jsonl file.
Each line is a JSON object with entity metadata.
Example object:
{"entity_id": "C000", "mentions": [["dog", 10.0], ["dogg", 7.0], ["animal", 4.0]], "title": "Dog", "types": {"hyena": ["animal"], "wiki": ["dog"]}, "relations": [{"relation": "sibling", "object": "Q345"}, {"relation": "sibling", "object": "Q567"}]}
Parameters
- profile_file – file where jsonl data lives
- max_candidates – maximum entity candidates
- max_types – maximum types per entity
- max_kg_connections – maximum KG connections per entity
- edit_mode – edit mode
Returns: entity profile object
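To make the file format concrete, here is a minimal standard-library sketch of reading such a JSONL profile: one JSON object per line, keyed by entity_id. It does not call Bootleg. The helper name read_profile_lines and the REQUIRED_KEYS set are assumptions made for this illustration, not part of the documented API.

```python
import json

# Assumption for this sketch: these keys must be present on every line.
REQUIRED_KEYS = {"entity_id", "mentions", "title"}


def read_profile_lines(lines):
    """Parse JSONL text lines into a dict keyed by entity_id (hypothetical helper)."""
    entities = {}
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        obj = json.loads(line)
        missing = REQUIRED_KEYS - obj.keys()
        if missing:
            raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
        entities[obj["entity_id"]] = obj
    return entities


sample_lines = [
    '{"entity_id": "C000", "mentions": [["dog", 10.0], ["dogg", 7.0]], '
    '"title": "Dog", "types": {"wiki": ["dog"]}, '
    '"relations": [{"relation": "sibling", "object": "Q345"}]}',
]
entities = read_profile_lines(sample_lines)
```

Note that JSON itself does not allow trailing commas, so each line must be strictly valid JSON for a parser like this to accept it.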
mention_exists(mention)[source]¶
Check if mention exists.
Parameters
mention – mention
Returns: Boolean
property num_entities_with_pad_and_nocand¶
Get the number of entities including a PAD and UNK entity.
Returns: integer
prune_to_entities(entities_to_keep)[source]¶
Remove all entities except those in entities_to_keep.
Parameters
entities_to_keep – List or Set of entities to keep
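As a rough illustration of what pruning means for the underlying mappings, the sketch below filters a title map and an alias-to-candidates map down to a set of QIDs. The function prune_mappings and the two plain dicts are hypothetical; dropping aliases that lose all their candidates is an assumption of this sketch, not documented behavior.

```python
def prune_mappings(qid2title, alias2qids, entities_to_keep):
    """Return pruned copies of both mappings (hypothetical helper)."""
    keep = set(entities_to_keep)  # accepts a list or a set
    pruned_titles = {qid: title for qid, title in qid2title.items() if qid in keep}
    pruned_aliases = {}
    for alias, cands in alias2qids.items():
        kept = [cand for cand in cands if cand[0] in keep]
        if kept:  # assumption: aliases with no remaining candidates are dropped
            pruned_aliases[alias] = kept
    return pruned_titles, pruned_aliases


qid2title = {"Q1": "Dog", "Q2": "Cat", "Q3": "Bird"}
alias2qids = {
    "pet": [["Q1", 10.0], ["Q2", 8.0]],
    "bird": [["Q3", 5.0]],
}
pruned_titles, pruned_aliases = prune_mappings(qid2title, alias2qids, ["Q1"])
```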
Check if QID exists.
Parameters
qid – entity QID
Returns: Boolean
reidentify_entity(qid, new_qid)[source]¶
Rename qid to new_qid.
Parameters
- qid – old QID
- new_qid – new QID
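Renaming a QID has to touch every place the identifier appears, both as a key and as a relation tail. The sketch below shows that idea on two plain dicts. The helper rename_qid and the data layout are assumptions for illustration and say nothing about how Bootleg implements the operation internally.

```python
def rename_qid(qid2title, qid2relations, old_qid, new_qid):
    """Rename old_qid to new_qid in place across both mappings (hypothetical helper)."""
    if new_qid in qid2title:
        raise ValueError(f"{new_qid} already exists")
    # Move the entries keyed by the old QID.
    qid2title[new_qid] = qid2title.pop(old_qid)
    if old_qid in qid2relations:
        qid2relations[new_qid] = qid2relations.pop(old_qid)
    # Rewrite the old QID wherever it appears as a tail.
    for relations in qid2relations.values():
        for tails in relations.values():
            tails[:] = [new_qid if tail == old_qid else tail for tail in tails]


qid2title = {"Q1": "Dog", "Q2": "Cat"}
qid2relations = {"Q1": {"sibling": ["Q2"]}, "Q2": {"sibling": ["Q1"]}}
rename_qid(qid2title, qid2relations, "Q1", "Q9")
```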
remove_mention(qid, mention)[source]¶
Remove the mention from being associated with the QID.
Parameters
- qid – QID
- mention – mention
remove_relation(qid, relation, qid2)[source]¶
Remove the relation triple.
Parameters
- qid – head QID
- relation – relation
- qid2 – tail QID
remove_type(qid, type, type_system)[source]¶
Remove the type from QID in the given type system.
Parameters
- qid – QID
- type – type to remove
- type_system – type system
Save the profile.
Parameters
save_dir – save directory
save_to_jsonl(profile_file)[source]¶
Dump the entity dump to jsonl format.
Parameters
profile_file – file to save the data
update_entity(entity_obj)[source]¶
Update the metadata associated with the entity.
The entity must already be in our dump to be updated.
Parameters
entity_obj – JSON of entity metadata.
bootleg.symbols.entity_symbols module¶
Entity symbols.
class bootleg.symbols.entity_symbols.EntitySymbols(alias2qids: Union[Dict[str, list], bootleg.utils.classes.nested_vocab_tries.TwoLayerVocabularyScoreTrie], qid2title: Dict[str, str], qid2desc: Optional[Dict[str, str]] = None, qid2eid: Optional[bootleg.utils.classes.nested_vocab_tries.VocabularyTrie] = None, alias2id: Optional[bootleg.utils.classes.nested_vocab_tries.VocabularyTrie] = None, max_candidates: int = 30, alias_cand_map_dir: str = 'alias2qids', alias_idx_dir: str = 'alias2id', edit_mode: Optional[bool] = False, verbose: Optional[bool] = False)[source]¶
Bases: object
Entity Symbols class for managing entity metadata.
add_entity(qid, mentions, title, desc='')[source]¶
Add entity QID to our mappings with its mentions and title.
Parameters
- qid – QID
- mentions – List of tuples [mention, score]
- title – title
- desc – description
add_mention(qid: str, mention: str, score: float)[source]¶
Add mention to QID with the associated score.
If the mention already exists for the QID, an error is thrown directing the caller to use set_score instead. If the mention already has the maximum number of candidates, the last candidate of the mention is removed and replaced by the QID.
Parameters
- qid – QID
- mention – mention
- score – score
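The capping rule described above can be sketched on a plain dict of alias to [QID, score] candidates. The helper add_candidate is hypothetical and only illustrates the stated behavior: reject a duplicate, and when the list is full, drop the last candidate to make room.

```python
def add_candidate(alias2qids, mention, qid, score, max_candidates=30):
    """Add [qid, score] to the mention's candidates (hypothetical helper)."""
    cands = alias2qids.setdefault(mention, [])
    if any(existing_qid == qid for existing_qid, _ in cands):
        raise ValueError(
            f"{qid} is already a candidate for {mention!r}; update its score instead"
        )
    if len(cands) >= max_candidates:
        cands.pop()  # remove the last candidate to make room
    cands.append([qid, score])
    return cands


alias2qids = {"dog": [["Q1", 10.0], ["Q2", 7.0]]}
# The list is already full at max_candidates=2, so Q2 is replaced by Q3.
add_candidate(alias2qids, "dog", "Q3", 1.0, max_candidates=2)
```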
Check alias existence.
Parameters
alias – alias string
Returns: boolean
get_alias2qids_dict()[source]¶
Get the alias2qids mapping.
Key is the alias; value is a list of candidate pairs of the form [QID, sort_value].
Returns: Dict alias2qids mapping
get_alias_from_idx(alias_idx)[source]¶
Get the alias from the numeric index.
Parameters
alias_idx – alias numeric index
Returns: alias string
Get the numeric index of an alias.
Parameters
alias – alias
Returns: integer representation of alias
get_all_alias_vocabtrie()[source]¶
Get a trie of all aliases.
Returns: Vocab trie of all aliases.
Get all aliases.
Returns: Dict_keys of all aliases
Get all QIDs.
Returns: Dict_keys of all QIDs
Get all QID titles.
Returns: Dict_values of all titles
Get description for QID.
Parameters
id – QID string
Returns: description string
Get the QID for the EID.
Parameters
id – EID int
Returns: QID string
get_eid_cands(alias, max_cand_pad=False)[source]¶
Get the EID candidates for an alias.
Parameters
- alias – alias
- max_cand_pad – whether to pad with -1 or not if fewer than max_candidates candidates
Returns: List of EID ints
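The effect of max_cand_pad can be pictured as right-padding the candidate list to a fixed length. A minimal sketch, assuming a hypothetical helper pad_candidates and a pad value of -1 as described above:

```python
def pad_candidates(cands, max_candidates, pad_value=-1):
    """Right-pad a candidate list to max_candidates entries (hypothetical helper)."""
    # A non-positive repeat count yields an empty list, so full lists are unchanged.
    return list(cands) + [pad_value] * (max_candidates - len(cands))


padded = pad_candidates([5, 9], max_candidates=4)
```

A fixed-length list of this kind is convenient when candidates are later stacked into a rectangular array.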
Get the mentions for the QID.
Parameters
qid – QID
Returns: List of mentions
get_mentions_with_scores(qid)[source]¶
Get the mentions and the associated score for the QID.
Parameters
qid – QID
Returns: List of tuples [mention, score]
Get the QID associated with EID.
Parameters
id – EID
Returns: QID string
Get the qid2eid mapping.
Returns: Dict qid2eid mapping
Get the qid2title mapping.
Returns: Dict qid2title mapping
get_qid_cands(alias, max_cand_pad=False)[source]¶
Get the QID candidates for an alias.
Parameters
- alias – alias
- max_cand_pad – whether to pad with ‘-1’ or not if fewer than max_candidates candidates
Returns: List of QID strings
get_qid_count_cands(alias, max_cand_pad=False)[source]¶
Get the [QID, sort_value] candidates for an alias.
Parameters
- alias – alias
- max_cand_pad – whether to pad with [‘-1’,-1] or not if fewer than max_candidates candidates
Returns: List of [QID, sort_value]
Get title for QID.
Parameters
id – QID string
Returns: title string
classmethod load_from_cache(load_dir, alias_cand_map_dir='alias2qids', alias_idx_dir='alias2id', edit_mode=False, verbose=False)[source]¶
Load entity symbols from load_dir.
Parameters
- load_dir – directory to load from
- alias_cand_map_dir – alias2qid directory
- alias_idx_dir – alias2id directory
- edit_mode – edit mode flag
- verbose – verbose flag
prune_to_entities(entities_to_keep)[source]¶
Remove all entities except those in entities_to_keep.
Parameters
entities_to_keep – Set of entities to keep
Check QID existence.
Parameters
alias – QID string
Returns: boolean
reidentify_entity(old_qid, new_qid)[source]¶
Rename old_qid to new_qid.
Parameters
- old_qid – old QID
- new_qid – new QID
remove_mention(qid, mention)[source]¶
Remove the mention from those associated with the QID.
Parameters
- qid – QID
- mention – mention to remove
Dump the entity symbols.
Parameters
save_dir – directory string to save
set_desc(qid: str, desc: str)[source]¶
Set the description for a QID.
Parameters
- qid – QID
- desc – description
set_score(qid: str, mention: str, score: float)[source]¶
Change the score of the mention-QID pair and re-sort the candidates.
Highest score is first.
Parameters
- qid – QID
- mention – mention
- score – score
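A small sketch of the update-then-re-sort behavior, using a plain dict of alias to [QID, score] lists. The helper set_candidate_score is hypothetical and is not the library's implementation.

```python
def set_candidate_score(alias2qids, mention, qid, score):
    """Update one candidate's score, then sort highest score first (hypothetical helper)."""
    cands = alias2qids[mention]
    for cand in cands:
        if cand[0] == qid:
            cand[1] = score
            break
    else:
        raise KeyError(f"{qid} is not a candidate for {mention!r}")
    cands.sort(key=lambda cand: cand[1], reverse=True)
    return cands


alias2qids = {"dog": [["Q1", 10.0], ["Q2", 7.0], ["Q3", 4.0]]}
# Raising Q3's score above the others moves it to the front.
set_candidate_score(alias2qids, "dog", "Q3", 12.0)
```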
set_title(qid: str, title: str)[source]¶
Set the title for a QID.
Parameters
- qid – QID
- title – title
bootleg.symbols.kg_symbols module¶
KG symbols class.
class bootleg.symbols.kg_symbols.KGSymbols(qid2relations: Union[Dict[str, Dict[str, List[str]]], bootleg.utils.classes.nested_vocab_tries.ThreeLayerVocabularyTrie], max_connections: Optional[int] = 50, edit_mode: Optional[bool] = False, verbose: Optional[bool] = False)[source]¶
Bases: object
KG Symbols class for managing KG metadata.
add_entity(qid, relation_dict)[source]¶
Add a new entity to our relation mapping.
Parameters
- qid – QID
- relation_dict – dictionary of relation -> list of connected other_qids by relation
add_relation(qid, relation, qid2)[source]¶
Add a relationship triple to our mapping.
If the QID already has the maximum number of connections through relation, the last other_qid is removed and replaced by qid2.
Parameters
- qid – head entity QID
- relation – relation
- qid2 – tail entity QID
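The replacement rule can be sketched on a nested dict shaped like the qid2relations constructor argument (QID to relation to list of tail QIDs). The helper add_relation_capped is hypothetical; silently ignoring an already-present triple is an assumption of this sketch.

```python
def add_relation_capped(qid2relations, qid, relation, qid2, max_connections=50):
    """Add the triple, replacing the last tail when the list is full (hypothetical helper)."""
    tails = qid2relations.setdefault(qid, {}).setdefault(relation, [])
    if qid2 in tails:
        return tails  # assumption: adding an existing triple is a no-op
    if len(tails) >= max_connections:
        tails.pop()  # remove the last tail to make room
    tails.append(qid2)
    return tails


qid2relations = {"Q1": {"sibling": ["Q2", "Q3"]}}
# With max_connections=2 the list is full, so Q3 is replaced by Q4.
add_relation_capped(qid2relations, "Q1", "sibling", "Q4", max_connections=2)
```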
Get all relations in our KG mapping.
Returns: Set
get_qid2relations_dict()[source]¶
Return a dictionary form of the relation to qid mappings object.
Returns: Dict of relation to head qid to list of tail qids
get_relations_between(qid1, qid2)[source]¶
Check whether two QIDs are connected in the KG and return the relations between them.
Parameters
- qid1 – QID one
- qid2 – QID two
Returns: string relation or empty set
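For intuition, a lookup of this kind can be written as a scan over the head entity's relations. The sketch below assumes a nested dict of QID to relation to tail list and always returns a set, which is a simplification chosen for this illustration. The helper relations_between is hypothetical.

```python
def relations_between(qid2relations, qid1, qid2):
    """Return the set of relations linking qid1 (head) to qid2 (tail)."""
    return {
        relation
        for relation, tails in qid2relations.get(qid1, {}).items()
        if qid2 in tails
    }


qid2relations = {"Q1": {"sibling": ["Q2"], "friend": ["Q2", "Q3"]}}
linked = relations_between(qid2relations, "Q1", "Q2")
unlinked = relations_between(qid2relations, "Q1", "Q9")
```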
get_relations_tails_for_qid(qid)[source]¶
Get dict of relation to tail qids for given qid.
Parameters
qid – QID
Returns: Dict relation to list of tail qids for that relation
classmethod load_from_cache(load_dir, prefix='', edit_mode=False, verbose=False)[source]¶
Load KG symbols from load_dir.
Parameters
- load_dir – directory to load from
- prefix – prefix to add to the beginning of the file name
- edit_mode – edit mode
- verbose – verbose flag
Returns: KGSymbols
prune_to_entities(entities_to_keep)[source]¶
Remove all entities except those in entities_to_keep.
Parameters
entities_to_keep – Set of entities to keep
reidentify_entity(old_qid, new_qid)[source]¶
Rename old_qid to new_qid.
Parameters
- old_qid – old QID
- new_qid – new QID
remove_relation(qid, relation, qid2)[source]¶
Remove a relation triple from our mapping.
Parameters
- qid – head entity QID
- relation – relation
- qid2 – tail entity QID
save(save_dir, prefix='')[source]¶
Dump the kg symbols.
Parameters
- save_dir – directory string to save
- prefix – prefix to add to the beginning of the file name
bootleg.symbols.type_symbols module¶
Type symbols class.
class bootleg.symbols.type_symbols.TypeSymbols(qid2typenames: Union[Dict[str, List[str]], bootleg.utils.classes.nested_vocab_tries.TwoLayerVocabularyScoreTrie], max_types: Optional[int] = 10, edit_mode: Optional[bool] = False, verbose: Optional[bool] = False)[source]¶
Bases: object
Type Symbols class for managing type metadata.
add_entity(qid, types)[source]¶
Add an entity QID with its types to our mappings.
Parameters
- qid – QID
- types – list of type names
add_type(qid, typename)[source]¶
Add the type to the QID.
If the QID already has the maximum number of types, the last type is removed and replaced by typename.
Parameters
- qid – QID
- typename – type name
Return all typenames.
get_entities_of_type(typename)[source]¶
Get all entity QIDs of type typename.
Parameters
typename – typename
Returns: List
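A reverse lookup from type name to QIDs is naturally served by an inverted index over the QID-to-typenames mapping. The sketch below builds one from a plain dict. The helper build_type_index is hypothetical, and whether the library precomputes such an index is not stated in this reference.

```python
from collections import defaultdict


def build_type_index(qid2typenames):
    """Invert a QID -> typenames mapping into typename -> QIDs (hypothetical helper)."""
    index = defaultdict(list)
    for qid, typenames in qid2typenames.items():
        for typename in typenames:
            index[typename].append(qid)
    return dict(index)


qid2typenames = {"Q1": ["animal", "pet"], "Q2": ["animal"], "Q3": ["city"]}
type_index = build_type_index(qid2typenames)
animals = type_index.get("animal", [])
```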
get_qid2typename_dict()[source]¶
Return dictionary of qid to typenames.
Returns: Dict of QID to list of typenames.
Get the type names associated with the given QID.
Parameters
qid – QID
Returns: list of typename strings
classmethod load_from_cache(load_dir, prefix='', edit_mode=False, verbose=False)[source]¶
Load type symbols from load_dir.
Parameters
- load_dir – directory to load from
- prefix – prefix to add to the beginning of the file name
- edit_mode – edit mode flag
- verbose – verbose flag
Returns: TypeSymbols
prune_to_entities(entities_to_keep)[source]¶
Remove all entities except those in entities_to_keep.
Parameters
entities_to_keep – Set of entities to keep
reidentify_entity(old_qid, new_qid)[source]¶
Rename old_qid to new_qid.
Parameters
- old_qid – old QID
- new_qid – new QID
remove_type(qid, typename)[source]¶
Remove the type from the QID.
Parameters
- qid – QID
- typename – type name to remove
save(save_dir, prefix='')[source]¶
Dump the type symbols.
Parameters
- save_dir – directory string to save
- prefix – prefix to add to the beginning of the file name
Module contents¶
Symbols init.