condor.record package

Submodules

condor.record.base module

This module describes the API for a record parser and a record iterator. The parser transforms a chunk of text into a usable dictionary; the iterator goes through a file and finds all the records.

class condor.record.base.RecordIterator(filename)

Bases: object

Iterates over the records in a file.

get_buffer()
parser_class

alias of RecordParser
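The split between an iterator that finds raw records and a parser that turns each one into a dictionary can be sketched as follows. This is a minimal illustration, not condor's code: it assumes that get_buffer() yields one raw text chunk per record and that iterating over the iterator parses each chunk with an instance of parser_class. The class names SketchRecordIterator and SketchRecordParser, the `__iter__` method, and the one-record-per-line rule are all hypothetical.

```python
from typing import Dict, Iterator


class SketchRecordParser:
    """Hypothetical stand-in for RecordParser: wraps a raw chunk in a dict."""

    def parse(self, raw: str) -> Dict[str, str]:
        return {"title": raw.strip()}


class SketchRecordIterator:
    """Hypothetical stand-in for RecordIterator; not condor's actual code."""

    # Subclasses swap in a format-specific parser, as the real classes do.
    parser_class = SketchRecordParser

    def __init__(self, filename: str) -> None:
        self.filename = filename

    def get_buffer(self) -> Iterator[str]:
        # Assumption for the sketch: one record per non-blank line.
        with open(self.filename, encoding="utf-8") as handle:
            for line in handle:
                if line.strip():
                    yield line

    def __iter__(self) -> Iterator[Dict[str, str]]:
        # Parse every raw chunk with a single parser_class instance.
        parser = self.parser_class()
        for raw in self.get_buffer():
            yield parser.parse(raw)
```

Keeping the splitting (get_buffer) apart from the parsing (parser_class) lets each format-specific subclass override only the part that differs.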

class condor.record.base.RecordParser

Bases: object

Outlines the API for parsing different kinds of records into dictionaries that are easy to use and store.

clear(field, raw)

Clears the given field out of a raw data record.

get_default(field)

Returns the default value for a given field.

get_mapping(field)

Returns the name under which a field is stored in the raw record, as given by mappings. This is useful when a field is easy to clear and only needs to be renamed.

interest_fields = ['hash', 'title', 'keywords', 'description', 'language', 'file']
list_fields = ['keywords']
mappings = {}
parse(raw)

Returns a dictionary containing the fields listed in interest_fields, taken from the metadata.
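One plausible way the pieces of this API fit together is sketched below: parse() calls clear() for every entry of interest_fields, clear() looks the value up under the name returned by get_mapping(), and get_default() fills in missing fields. This is a hypothetical reading of the documented names, not condor's implementation; the class SketchRecordParser, the empty-string and empty-list defaults, and the ";"-separated format assumed for list fields are all illustrative assumptions.

```python
from typing import Any, Dict, List


class SketchRecordParser:
    """Hypothetical reading of the RecordParser API; not condor's code."""

    interest_fields: List[str] = ["hash", "title", "keywords",
                                  "description", "language", "file"]
    list_fields: List[str] = ["keywords"]
    mappings: Dict[str, str] = {}

    def get_mapping(self, field: str) -> str:
        # Name of the field in the raw record; unmapped fields keep their name.
        return self.mappings.get(field, field)

    def get_default(self, field: str) -> Any:
        # Assumed defaults: empty list for list fields, empty string otherwise.
        return [] if field in self.list_fields else ""

    def clear(self, field: str, raw: Dict[str, str]) -> Any:
        value = raw.get(self.get_mapping(field))
        if value is None:
            return self.get_default(field)
        if field in self.list_fields:
            # Assumption: list fields arrive as ";"-separated strings.
            return [item.strip() for item in value.split(";") if item.strip()]
        return value.strip()

    def parse(self, raw: Dict[str, str]) -> Dict[str, Any]:
        # One cleaned value per interest field.
        return {field: self.clear(field, raw) for field in self.interest_fields}
```

Under this reading, a format-specific subclass often only needs to override mappings, as BibtexRecordParser does with {'description': 'abstract'}.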

condor.record.bib module

class condor.record.bib.BibtexRecordIterator(filename)

Bases: condor.record.base.RecordIterator

Iterates over BibTeX records.

get_buffer()
parser_class

alias of BibtexRecordParser
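BibtexRecordIterator.get_buffer() is not documented here. One simple way to cut a .bib file into raw entries is to start a new chunk at every line that begins with "@". The helper below, split_bibtex_entries, is hypothetical and only illustrates that idea; it is not condor's implementation and does not handle @comment blocks or field values that put "@" at the start of a line.

```python
from typing import Iterator, List


def split_bibtex_entries(text: str) -> Iterator[str]:
    """Yield raw BibTeX entries, starting a new one at each line beginning with '@'."""
    current: List[str] = []
    for line in text.splitlines(keepends=True):
        starts_entry = line.lstrip().startswith("@")
        if starts_entry and current:
            # A new entry begins: emit the one collected so far.
            yield "".join(current)
            current = []
        # Lines before the first '@' (e.g. a preamble comment) are skipped.
        if current or starts_entry:
            current.append(line)
    if current:
        yield "".join(current)
```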

class condor.record.bib.BibtexRecordParser

Bases: condor.record.base.RecordParser

accent_remover = <condor.normalize.LatexAccentRemover object>
clear(field, raw)
guesser = <condor.util.LanguageGuesser object>
mappings = {'description': 'abstract'}
parse(raw)

condor.record.froac module

class condor.record.froac.FroacRecordIterator(filename)

Bases: condor.record.base.RecordIterator

Iterates over plain-text froac records in a file.

get_buffer()
parser_class

alias of FroacRecordParser

class condor.record.froac.FroacRecordParser

Bases: condor.record.base.RecordParser

default_language = 'english'
language_key = {'es': 'spanish', 'en': 'english', 'pt': 'portuguese', 'fr': 'french', 'it': 'italian', 'de': 'german'}
parse(raw)
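The two attributes above suggest a lookup with a fallback: a two-letter code is translated through language_key, and anything unknown becomes default_language. The sketch below assumes that is how FroacRecordParser.parse uses them; language_name is a hypothetical helper, and normalizing tags such as "pt-BR" to their first two letters is an extra assumption.

```python
from typing import Dict, Optional

# Values copied from FroacRecordParser.default_language and language_key.
DEFAULT_LANGUAGE = "english"
LANGUAGE_KEY: Dict[str, str] = {
    "es": "spanish", "en": "english", "pt": "portuguese",
    "fr": "french", "it": "italian", "de": "german",
}


def language_name(code: Optional[str]) -> str:
    """Map a two-letter code to a language name, falling back to the default."""
    if not code:
        return DEFAULT_LANGUAGE
    # Assumption: case-insensitive codes, region suffixes such as "-BR" ignored.
    return LANGUAGE_KEY.get(code.strip().lower()[:2], DEFAULT_LANGUAGE)
```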

condor.record.isi module

class condor.record.isi.IsiRecordIterator(filename)

Bases: condor.record.base.RecordIterator

Iterates over a file of ISI plain-text records, yielding one record at a time.

get_buffer()

Iterates over the file by looking for lines containing the ER (end of record) mark used in ISI plain-text files.
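Splitting on the ER mark can be sketched as a generator that collects lines until it meets an "ER" line and then yields the collected text. This is a minimal illustration, not condor's code; the helper name split_isi_records is hypothetical, and discarding whatever follows the last ER (such as the EF end-of-file tag) is a design assumption. An open file object can be passed directly as `lines`.

```python
from typing import Iterable, Iterator, List


def split_isi_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield the raw text of each ISI record, using 'ER' lines as terminators."""
    current: List[str] = []
    for line in lines:
        if line.strip() == "ER":
            # ER (end of record) closes the record collected so far.
            yield "".join(current)
            current = []
        else:
            current.append(line)
    # Lines after the last ER (e.g. the EF end-of-file tag) are discarded.
```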

parser_class

alias of IsiRecordParser

class condor.record.isi.IsiRecordParser

Bases: condor.record.base.RecordParser

Represents an ISI Web of Knowledge record.

clear(field, raw)

Uses the mappings to get the entries out of the raw dictionary.

mappings = {'title': 'TI', 'description': 'AB', 'keywords': 'ID', 'language': 'LA', 'hash': 'UT'}
parse(raw)

Checks whether the input is a string; if so, transforms it into a dictionary and then runs the default parse method.
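The string-to-dictionary step can be sketched under the usual Web of Science plain-text layout, where each field line starts with a two-character tag and a space, and continuation lines start with three spaces. The helpers isi_to_dict and isi_fields are hypothetical, not condor's API; joining continuation lines with a single space and returning an empty string for missing tags are assumptions. Only ISI_MAPPINGS is copied from IsiRecordParser.mappings above.

```python
from typing import Dict, Optional

# Copied from IsiRecordParser.mappings.
ISI_MAPPINGS = {"title": "TI", "description": "AB", "keywords": "ID",
                "language": "LA", "hash": "UT"}


def isi_to_dict(raw: str) -> Dict[str, str]:
    """Turn one raw ISI record into a {tag: value} dictionary (hypothetical helper)."""
    fields: Dict[str, str] = {}
    tag: Optional[str] = None
    for line in raw.splitlines():
        if not line.strip():
            continue
        if line.startswith("   "):
            # Continuation line: append to the value of the previous tag.
            if tag is not None:
                fields[tag] += " " + line.strip()
            continue
        # New field: two-character tag, a space, then the value.
        tag = line[:2]
        fields[tag] = line[3:].strip()
    return fields


def isi_fields(raw: str) -> Dict[str, str]:
    """Pick the interesting fields out of a raw record via ISI_MAPPINGS."""
    tagged = isi_to_dict(raw)
    return {field: tagged.get(tag, "") for field, tag in ISI_MAPPINGS.items()}
```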

Module contents

condor.record.record_iterator_class(record_type)

Gets the record iterator class for a given type.

A way to abstract the construction of a record iterator class.

Parameters: record_type – the type of file, as a string
Returns: the appropriate record iterator class
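A common way to implement this kind of factory is a dictionary from type strings to classes. The sketch below is self-contained, so it defines empty placeholder classes instead of importing condor's; the accepted type strings ("bib", "froac", "isi"), the case-insensitive lookup, and raising ValueError for unknown types are assumptions, since the accepted values and error behaviour are not documented here.

```python
from typing import Dict, Type


class RecordIterator:
    """Placeholder for condor.record.base.RecordIterator."""

    def __init__(self, filename: str) -> None:
        self.filename = filename


class BibtexRecordIterator(RecordIterator):
    pass


class FroacRecordIterator(RecordIterator):
    pass


class IsiRecordIterator(RecordIterator):
    pass


# Assumed type strings; condor's accepted values may differ.
ITERATOR_CLASSES: Dict[str, Type[RecordIterator]] = {
    "bib": BibtexRecordIterator,
    "froac": FroacRecordIterator,
    "isi": IsiRecordIterator,
}


def record_iterator_class(record_type: str) -> Type[RecordIterator]:
    """Return the iterator class registered for record_type."""
    try:
        return ITERATOR_CLASSES[record_type.lower()]
    except KeyError:
        raise ValueError(f"unknown record type: {record_type!r}") from None
```

Returning the class rather than an instance leaves the caller to construct it with a filename, for example `record_iterator_class("isi")("records.txt")` with a hypothetical file name.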