
Project Documentation

Randname - A random name generator library.

This package provides functionality for generating random names from various countries. It includes utilities for generating first names, last names, and full names, with support for multiple countries and customizable options.

The package offers a simple API for generating random names that can be used in testing, data generation, or any application requiring random name generation.

Warning

This package uses the pseudo-random generators from the Python standard library (random), so it must not be used for security purposes. The base package ships with a limited dataset of names, so collisions (the same name generated more than once) are easy to produce.
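A minimal sketch of why this matters, using only the standard library and a hypothetical four-name dataset (not randname's own data). Output from random is fully determined by its seed, and drawing more names than a dataset holds must produce a duplicate. For security-sensitive values the standard secrets module is the appropriate tool.

```python
import random
import secrets

# Hypothetical tiny dataset standing in for one country's name list.
NAMES = ["Anna", "Jan", "John", "Maria"]

# Two generators with the same seed produce identical sequences,
# so output from `random` is predictable to anyone who knows the seed.
rng_a = random.Random(42)
rng_b = random.Random(42)
seq_a = [rng_a.choice(NAMES) for _ in range(5)]
seq_b = [rng_b.choice(NAMES) for _ in range(5)]

# Pigeonhole principle: drawing one more name than the dataset holds
# always yields at least one duplicate.
draws = [random.choice(NAMES) for _ in range(len(NAMES) + 1)]
has_collision = len(set(draws)) < len(draws)

# For security-sensitive values, use `secrets`, which draws from the OS CSPRNG.
token = secrets.token_hex(16)  # 16 random bytes rendered as 32 hex characters

print(seq_a == seq_b, has_collision, len(token))  # True True 32
```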

Attributes:

Name Type Description
__title__ str

The title of the package.

__version__ str

The current version of the package.

__author__ str

The author of the package.

__license__ str

The license under which the package is distributed.

Examples:

>>> import randname
>>> randname.randfirst()
'John'
>>> randname.randlast()
'Smith'
>>> randname.randfull()
'Jane Doe'
>>> randname.available_countries()
{'ES', 'PL', 'US', ...}

Modules:

Name Description
config

Configuration for logging.

core

Core functionality for generating random names.

database

Database handling for name data.

error

Custom exceptions for the randname library.

config

Configuration for logging in the randname library.

This module acts as a singleton: Python caches imported modules, so the module body runs only once and logger is always the same object.

Attributes:

Name Type Description
DEFAULT_LOGGING_LEVEL

Default logging level for the application.

LOGGING_LEVEL_MAP

Mapping of string logging levels to logging module constants.

logger

Configured logger instance.

Functions:

Name Description
set_logger

Set logger for the application.

set_logger(level_name)

Set logger for the application.

Parameters:

Name Type Description Default
level_name str

Logging level to set. Can be a string key from LOGGING_LEVEL_MAP or a logging level constant. Values not found in LOGGING_LEVEL_MAP fall back to logging.ERROR.

required

Returns:

Type Description
Logger

The configured "randname" logger.

Source code in src/randname/config.py
def set_logger(level_name: str) -> logging.Logger:
    """Set logger for the application.

    Args:
        level_name: Logging level to set. Can be a string key from LOGGING_LEVEL_MAP
            or a logging level constant. Unknown values fall back to logging.ERROR.

    Returns:
        The configured "randname" logger.
    """
    level = LOGGING_LEVEL_MAP.get(level_name, logging.ERROR)
    formatter = logging.Formatter(
        "[%(asctime)s][%(levelname)s][%(filename)s:%(funcName)s:%(lineno)d] %(message)s"
    )

    handler = logging.StreamHandler()
    handler.setLevel(level)
    handler.setFormatter(formatter)

    logger = logging.getLogger("randname")
    # The logger filters records before its handlers see them, so set its level too.
    logger.setLevel(level)
    logger.addHandler(handler)

    return logger
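A logging.Logger filters records by its own level before any handler sees them, so a handler's level alone does not enable DEBUG output, and calling a setup function repeatedly can stack duplicate handlers. The sketch below is a hypothetical variant: configure_logger and the LEVELS mapping are assumptions for illustration, not part of randname's API. It sets both levels and reuses a single handler across calls.

```python
import logging

# Assumed mapping; randname's real LOGGING_LEVEL_MAP may differ.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def configure_logger(level_name: str, name: str = "randname") -> logging.Logger:
    """Hypothetical helper: configure the named logger idempotently."""
    level = LEVELS.get(level_name.upper(), logging.ERROR)  # unknown names fall back to ERROR
    logger = logging.getLogger(name)
    logger.setLevel(level)  # the logger filters records before its handlers do

    if not logger.handlers:  # add a handler only on the first call
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter(
                "[%(asctime)s][%(levelname)s][%(filename)s:%(funcName)s:%(lineno)d] %(message)s"
            )
        )
        logger.addHandler(handler)

    for existing in logger.handlers:  # keep handler levels in sync with the logger
        existing.setLevel(level)

    return logger


log = configure_logger("debug")
log = configure_logger("debug")  # second call does not add another handler
print(len(log.handlers), log.level == logging.DEBUG)  # 1 True
```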

core

Core module for randname

Functions in this module are aliases for the methods of a single module-level instance of the Randname class. This lets users call the functions directly from the module without instantiating Randname themselves, and it simplifies the implementation, because all functions share the same internal state and resources.

To change the database path, assign a new path to the randname.core.database attribute.

Examples:

Example usage of the module:

>>> import randname
>>> randname.randfull()
'John Doe'

Attributes:

Name Type Description
database

Instance of the Database class used for name data.

Functions:

Name Description
randfirst

Generate a random first name.

randlast

Generate a random last name.

randfull

Generate a random full name.

available_countries

List available countries in the database.

show_data

Show information about the database.

Classes:

Name Description
Randname

Main class for generating random names.

Randname

Source code in src/randname/core.py
class Randname:
    PATH_TO_DATABASE = Path() / _THIS_FOLDER / "data"
    VALID_SEX_OPTIONS = ("M", "F", "N", None)

    def __init__(self, path_to_database: Path | None = None):
        if path_to_database is None:
            self._database = randname.database.Database(Randname.PATH_TO_DATABASE)
        else:
            self._database = randname.database.Database(path_to_database)

        logger.debug("Database: %s", self._database)

    @property
    def database(self) -> randname.database.Database:
        return self._database

    @database.setter
    def database(self, path: Path) -> None:
        self._database = randname.database.Database(path)
        logger.debug("Database path: %s", self._database.path)

    def randfull(
        self,
        year: int | None = None,
        sex: str | None = None,
        country: str | None = None,
        weights: bool = True,
    ) -> str:
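        """Return a random full name

        Args:
            year: Year of birth, defaults to None
            sex: Sex of the name, "M", "F" or "N", defaults to None (random).
                If the first or last names of the chosen country do not offer
                that sex, a random available sex is used for that part.
            country: Country of origin, defaults to None
            weights: Use population distribution if True, else treat all names
                with same probability, defaults to True

        Returns:
            Full name

        Raises:
            randname.error.InvalidSexArgumentError: If sex is not "M", "F", "N" or None
            randname.error.InvalidCountryNameError: If country is not one of the
                available countries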

        Examples:
            >>> randfull()
            'John Doe'
        """
        country = self._gen_country(country)
        first_name_available_sex = self._available_sex(country, "first_names")
        last_name_available_sex = self._available_sex(country, "last_names")

        first_name_sex = last_name_sex = sex

        if sex not in Randname.VALID_SEX_OPTIONS:
            raise randname.error.InvalidSexArgumentError(
                sex, Randname.VALID_SEX_OPTIONS
            )

        if sex not in first_name_available_sex:
            first_name_sex = random.choice(first_name_available_sex)

        if sex not in last_name_available_sex:
            last_name_sex = random.choice(last_name_available_sex)

        first = self.randfirst(year, first_name_sex, country, weights)
        last = self.randlast(year, last_name_sex, country, weights)
        return f"{first} {last}"

    def randlast(
        self,
        year: int | None = None,
        sex: str | None = None,
        country: str | None = None,
        weights: bool = True,
    ) -> str:
        """Return random last name

        Args:
            year: Year of birth, defaults to None
            sex: Sex of the name, "M", "F" or "N", defaults to None (random)
            country: Country of origin, defaults to None
            weights: Use population distribution if True, else treat all names
                with same probability, defaults to True

        Returns:
            Last name

        Raises:
            randname.error.InvalidSexArgumentError: If sex is not available for
                the chosen country
            randname.error.InvalidCountryNameError: If country is not one of the
                available countries

        Examples:
            >>> randlast()
            'Doe'
        """
        last_name = self._gen_name("last", year, sex, country, weights)
        return last_name

    def randfirst(
        self,
        year: int | None = None,
        sex: str | None = None,
        country: str | None = None,
        weights: bool = True,
    ) -> str:
        """Return random first name

        Args:
            year: Year of birth, defaults to None
            sex: Sex of the name, "M", "F" or "N", defaults to None (random)
            country: Country of origin, defaults to None
            weights: Use population distribution if True, else treat all names
                with same probability, defaults to True

        Returns:
            First name

        Raises:
            randname.error.InvalidSexArgumentError: If sex is not available for
                the chosen country
            randname.error.InvalidCountryNameError: If country is not one of the
                available countries

        Examples:
            >>> randfirst()
            'John'
        """
        return self._gen_name("first", year, sex, country, weights)

    def _gen_name(
        self,
        short_name: ShortConvention,
        year: int | None = None,
        sex: str | None = None,
        country: str | None = None,
        cum_weights: bool = True,
    ) -> str:
        """Private helper that returns either a first or a last name.

        Args:
            short_name: "first" or "last"
            year: Year of source database, defaults to None
            sex: Sex of the name, defaults to None
            country: Database country, defaults to None
            cum_weights: Sample using the cumulative weights stored in the
                dataset ("Totals") if True, else uniformly, defaults to True

        Returns:
            Name from database

        Raises:
            randname.error.InvalidSexArgumentError: If the provided sex is not
                available for the given database
            randname.error.InvalidCountryNameError: If country is not one of the
                available countries

        Examples:
            >>> _gen_name("first")
            'John'
            >>> _gen_name("last")
            'Doe'
        """
        long_name = Randname._map_short_to_full_convention(short_name)
        country = self._gen_country(country)
        year = self._gen_year(year, country, long_name)
        sex = self._gen_sex(sex, country, long_name)

        name_of_dataset = f"{year}_{sex}"
        path_to_dataset = self.database.path / country / long_name / name_of_dataset

        return Randname._gen_name_from_file(path_to_dataset, cum_weights)

    @staticmethod
    def _map_short_to_full_convention(
        short: ShortConvention,
    ) -> LongConvention:
        opt: dict[ShortConvention, LongConvention] = {
            "first": "first_names",
            "last": "last_names",
        }
        result = opt.get(short)

        if result is None:
            raise ValueError("Incorrect key")

        return result

    def _gen_country(self, country: str | None) -> str:
        countries = list(self.available_countries())
        if country is None:
            country = random.choice(countries)
        # TODO: if not countries
        if country not in countries:
            raise randname.error.InvalidCountryNameError(country, countries)
        return country

    def _gen_year(
        self,
        year: int | None,
        country: str,
        name_type: str,
    ) -> int:
        database_files = (self._database.path / country / name_type).iterdir()
        # Dataset files are named "<year>_<sex>"; collect the distinct years.
        data_range = sorted({int(f.name.split("_")[0]) for f in database_files})

        if not year:
            year = random.choice(data_range)

        logger.debug("Year: %s", year)

        if not min(data_range) <= year <= max(data_range):
            logger.warning("Year %s is outside the available years %s", year, data_range)

        # bisect_left returns len(d) when y is greater than every available
        # year; clamp to the last index so the newest dataset is used.
        def correct_bisect_left(d, y):
            bisect = bisect_left(d, y)
            return bisect if bisect != len(d) else bisect - 1

        year_index = correct_bisect_left(data_range, year)
        logger.debug("Year index: %s", year_index)

        return data_range[year_index]

    def _gen_sex(self, sex: str | None, country: str, name_type: str) -> str:
        available_sex = self._available_sex(country, name_type)

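        if sex is None:
            sex = random.choice(available_sex)

        # Normalize case so the returned code matches the dataset file names,
        # assumed to use upper-case codes ("M", "F", "N").
        sex = str(sex).upper()
        if sex not in available_sex:
            raise randname.error.InvalidSexArgumentError(sex, available_sex)

        return sex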

    def _available_sex(self, country: str, name_type: str):
        info = self._database.path / country / "info.json"

        with info.open("r", encoding="utf-8") as fd:
            available_sex = json.load(fd)[name_type]

        logger.debug("Available sex: %s", available_sex)

        return available_sex

    @staticmethod
    def _gen_name_from_file(path_to_dataset: Path, cum_weights: bool = True) -> str:
        with path_to_dataset.open("r", encoding="utf-8") as json_file:
            logger.debug("Opening: %s", json_file.name)
            data_set = json.load(json_file)
            name_population = data_set["Names"]
            name_cum_weights = data_set["Totals"]

        if cum_weights:
            name = random.choices(name_population, cum_weights=name_cum_weights)[0]
        else:
            name = random.choice(name_population)

        logger.debug("Name: %s", name)

        return name

    # Support functions

    def available_countries(self, path: Path | None = None) -> set[str]:
        """Return set of available countries

        Args:
            path: Path to database, defaults to the current database path

        Returns:
            Set of available countries

        Examples:
            >>> available_countries()
            {'ES', 'PL', 'US'}
        """
        if path is None:
            path = self._database.path

        return {p.name for p in path.iterdir()}

    def show_data(
        self, path: Path | None = None
    ) -> dict[str, dict[LongConvention, list[SexConvention]]]:
        """Return dictionary with information about database.

        Args:
            path: Path to the root directory of the database, defaults to the
                current database path.

        Returns:
            Information about database

        Examples:
            >>> show_data()
            {
                'ES': {'first_names': ['M'], 'last_names': ['N']},
                'PL': {'first_names': ['M', 'F'], 'last_names': ['M', 'F']},
                'US': {'first_names': ['M', 'F'], 'last_names': ['N']}
            }
        """
        result: dict[str, dict[LongConvention, list[SexConvention]]] = {}

        if path is None:
            path = self.database.path

        for country in self.available_countries(path):
            path_to_info_json = path / country / "info.json"

            with open(path_to_info_json, "r", encoding="utf-8") as info_file:
                info_dict = json.load(info_file)
                result.setdefault(
                    info_dict["country"],
                    {
                        "first_names": info_dict["first_names"],
                        "last_names": info_dict["last_names"],
                    },
                )

        return result
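_gen_year resolves a requested year against the years available on disk with bisect.bisect_left, clamping to the newest year when the request lies past the end. A standalone sketch of that rule (resolve_year and the sample years are hypothetical) shows that it picks the first available year not earlier than the request, which is not always the nearest one.

```python
from bisect import bisect_left


def resolve_year(available: list[int], requested: int) -> int:
    """Mirror of the year lookup in Randname._gen_year (hypothetical helper)."""
    years = sorted(available)
    index = bisect_left(years, requested)
    # bisect_left returns len(years) when `requested` is past the newest year;
    # clamp so the newest dataset is used instead of raising IndexError.
    if index == len(years):
        index -= 1
    return years[index]


years = [1990, 2000, 2010]
print(resolve_year(years, 1980))  # 1990 (before the range: oldest year)
print(resolve_year(years, 2001))  # 2010 (first year >= request, although 2000 is closer)
print(resolve_year(years, 2020))  # 2010 (past the range: newest year)
```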

available_countries(path=None)

Return set of available countries

Parameters:

Name Type Description Default
path Path | None

Path to database, defaults to the current database path

None

Returns:

Type Description
set[str]

Set of available countries

Examples:

>>> available_countries()
{'ES', 'PL', 'US'}
Source code in src/randname/core.py
def available_countries(self, path: Path | None = None) -> set[str]:
    """Return set of available countries

    Args:
        path: Path to database, defaults to the current database path

    Returns:
        Set of available countries

    Examples:
        >>> available_countries()
        {'ES', 'PL', 'US'}
    """
    if path is None:
        path = self._database.path

    return {p.name for p in path.iterdir()}
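available_countries returns the name of every entry directly under the database root, so each country is simply a directory such as PL or US. A self-contained sketch with a temporary directory (the country codes and README.txt are sample data) runs the same set comprehension. It also shows that a stray file in the root would be reported as a country; the is_dir() filter is a possible variant, not randname's current behavior.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for code in ("ES", "PL", "US"):
        (root / code).mkdir()  # one directory per country
    (root / "README.txt").write_text("stray file")  # not a country

    # Same expression as available_countries: every entry name counts.
    all_entries = {p.name for p in root.iterdir()}
    # Variant that keeps directories only.
    countries = {p.name for p in root.iterdir() if p.is_dir()}

print(sorted(all_entries))  # ['ES', 'PL', 'README.txt', 'US']
print(sorted(countries))    # ['ES', 'PL', 'US']
```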

randfirst(year=None, sex=None, country=None, weights=True)

Return random first name

Parameters:

Name Type Description Default
year int | None

Year of birth, defaults to None

None
sex str | None

Sex of the name, "M", "F" or "N", defaults to None (random)

None
country str | None

Country of origin, defaults to None

None
weights bool

Use population distribution if True, else treat all names with same probability, defaults to True

True

Returns:

Type Description
str

First name

Raises:

Type Description
InvalidSexArgumentError

If sex is not available for the chosen country

InvalidCountryNameError

If country is not one of the available countries

Examples:

>>> randfirst()
'John'
Source code in src/randname/core.py
def randfirst(
    self,
    year: int | None = None,
    sex: str | None = None,
    country: str | None = None,
    weights: bool = True,
) -> str:
    """Return random first name

    Args:
        year: Year of birth, defaults to None
        sex: Sex of the name, "M", "F" or "N", defaults to None (random)
        country: Country of origin, defaults to None
        weights: Use population distribution if True, else treat all names
            with same probability, defaults to True

    Returns:
        First name

    Raises:
        randname.error.InvalidSexArgumentError: If sex is not available for
            the chosen country
        randname.error.InvalidCountryNameError: If country is not one of the
            available countries

    Examples:
        >>> randfirst()
        'John'
    """
    return self._gen_name("first", year, sex, country, weights)
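With weights=True, the name is drawn by random.choices(..., cum_weights=...) using the dataset's Totals list; otherwise every name is equally likely. The sketch below mirrors that branch of _gen_name_from_file on an in-memory dataset. The names and totals are hypothetical, pick is an illustrative helper, and it assumes Totals is already cumulative (non-decreasing), as cum_weights requires.

```python
import random
from collections import Counter

# Hypothetical dataset in the {"Names": [...], "Totals": [...]} shape.
data_set = {"Names": ["Anna", "Maria", "Zofia"], "Totals": [70, 90, 100]}


def pick(data: dict, weights: bool, rng: random.Random) -> str:
    names = data["Names"]
    if weights:
        # Cumulative weights 70, 90, 100 mean probabilities 0.7, 0.2, 0.1.
        return rng.choices(names, cum_weights=data["Totals"])[0]
    return rng.choice(names)  # uniform: each of the three names has probability 1/3


rng = random.Random(0)
counts = Counter(pick(data_set, True, rng) for _ in range(10_000))
print(counts.most_common(1)[0][0])  # Anna
```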

randfull(year=None, sex=None, country=None, weights=True)

Return a random full name

Parameters:

Name Type Description Default
year int | None

Year of birth, defaults to None

None
sex str | None

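Sex of the name, "M", "F" or "N", defaults to None (random). If the first or last names of the chosen country do not offer that sex, a random available sex is used for that part.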

None
country str | None

Country of origin, defaults to None

None
weights bool

Use population distribution if True, else treat all names with same probability, defaults to True

True

Returns:

Type Description
str

Full name

Raises:

Type Description
InvalidSexArgumentError

If sex is not "M", "F", "N" or None

InvalidCountryNameError

If country is not one of the available countries

Examples:

>>> randfull()
'John Doe'
Source code in src/randname/core.py
def randfull(
    self,
    year: int | None = None,
    sex: str | None = None,
    country: str | None = None,
    weights: bool = True,
) -> str:
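    """Return a random full name

    Args:
        year: Year of birth, defaults to None
        sex: Sex of the name, "M", "F" or "N", defaults to None (random).
            If the first or last names of the chosen country do not offer
            that sex, a random available sex is used for that part.
        country: Country of origin, defaults to None
        weights: Use population distribution if True, else treat all names
            with same probability, defaults to True

    Returns:
        Full name

    Raises:
        randname.error.InvalidSexArgumentError: If sex is not "M", "F", "N" or None
        randname.error.InvalidCountryNameError: If country is not one of the
            available countries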

    Examples:
        >>> randfull()
        'John Doe'
    """
    country = self._gen_country(country)
    first_name_available_sex = self._available_sex(country, "first_names")
    last_name_available_sex = self._available_sex(country, "last_names")

    first_name_sex = last_name_sex = sex

    if sex not in Randname.VALID_SEX_OPTIONS:
        raise randname.error.InvalidSexArgumentError(
            sex, Randname.VALID_SEX_OPTIONS
        )

    if sex not in first_name_available_sex:
        first_name_sex = random.choice(first_name_available_sex)

    if sex not in last_name_available_sex:
        last_name_sex = random.choice(last_name_available_sex)

    first = self.randfirst(year, first_name_sex, country, weights)
    last = self.randlast(year, last_name_sex, country, weights)
    return f"{first} {last}"
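randfull validates sex once against the fixed options and then resolves it separately for the first and the last name: if the requested value is not offered for that part (for example, a country whose last names are only "N"), a sex is chosen at random from what is available. A standalone sketch of that rule; resolve_sex is a hypothetical helper, the availability lists are sample info.json values, and it raises ValueError where randname raises InvalidSexArgumentError.

```python
import random

VALID_SEX_OPTIONS = ("M", "F", "N", None)


def resolve_sex(sex, available: list[str], rng: random.Random) -> str:
    """Mirror of randfull's per-part sex selection (hypothetical helper)."""
    if sex not in VALID_SEX_OPTIONS:
        raise ValueError(f"{sex!r} not in {VALID_SEX_OPTIONS}")
    if sex not in available:  # also true for sex=None
        return rng.choice(available)
    return sex


rng = random.Random(1)
info = {"first_names": ["M", "F"], "last_names": ["N"]}  # e.g. a US-like entry
first = resolve_sex("F", info["first_names"], rng)
last = resolve_sex("F", info["last_names"], rng)
print(first, last)  # F N
```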

randlast(year=None, sex=None, country=None, weights=True)

Return random last name

Parameters:

Name Type Description Default
year int | None

Year of birth, defaults to None

None
sex str | None

Sex of the name, "M", "F" or "N", defaults to None (random)

None
country str | None

Country of origin, defaults to None

None
weights bool

Use population distribution if True, else treat all names with same probability, defaults to True

True

Returns:

Type Description
str

Last name

Raises:

Type Description
InvalidSexArgumentError

If sex is not available for the chosen country

InvalidCountryNameError

If country is not one of the available countries

Examples:

>>> randlast()
'Doe'
Source code in src/randname/core.py
def randlast(
    self,
    year: int | None = None,
    sex: str | None = None,
    country: str | None = None,
    weights: bool = True,
) -> str:
    """Return random last name

    Args:
        year: Year of birth, defaults to None
        sex: Sex of the name, "M", "F" or "N", defaults to None (random)
        country: Country of origin, defaults to None
        weights: Use population distribution if True, else treat all names
            with same probability, defaults to True

    Returns:
        Last name

    Raises:
        randname.error.InvalidSexArgumentError: If sex is not available for
            the chosen country
        randname.error.InvalidCountryNameError: If country is not one of the
            available countries

    Examples:
        >>> randlast()
        'Doe'
    """
    last_name = self._gen_name("last", year, sex, country, weights)
    return last_name

show_data(path)

Return dictionary with information about database.

Parameters:

Name Type Description Default
path Path | None

Path to the root directory of the database, defaults to the current database path.

None

Returns:

Type Description
dict[str, dict[LongConvention, list[SexConvention]]]

Information about database

Examples:

>>> show_data()
{
    'ES': {'first_names': ['M'], 'last_names': ['N']},
    'PL': {'first_names': ['M', 'F'], 'last_names': ['M', 'F']},
    'US': {'first_names': ['M', 'F'], 'last_names': ['N']}
}
Source code in src/randname/core.py
def show_data(
    self, path: Path | None = None
) -> dict[str, dict[LongConvention, list[SexConvention]]]:
    """Return dictionary with information about database.

    Args:
        path: Path to the root directory of the database, defaults to the
            current database path.

    Returns:
        Information about database

    Examples:
        >>> show_data()
        {
            'ES': {'first_names': ['M'], 'last_names': ['N']},
            'PL': {'first_names': ['M', 'F'], 'last_names': ['M', 'F']},
            'US': {'first_names': ['M', 'F'], 'last_names': ['N']}
        }
    """
    result: dict[str, dict[LongConvention, list[SexConvention]]] = {}

    if path is None:
        path = self.database.path

    for country in self.available_countries(path):
        path_to_info_json = path / country / "info.json"

        with open(path_to_info_json, "r", encoding="utf-8") as info_file:
            info_dict = json.load(info_file)
            result.setdefault(
                info_dict["country"],
                {
                    "first_names": info_dict["first_names"],
                    "last_names": info_dict["last_names"],
                },
            )

    return result
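show_data reads <root>/<country>/info.json for every country and keys the result by the file's country field. The sketch below builds a throwaway database in the layout implied by the paths used in core.py (info.json, first_names and last_names under each country directory) and aggregates it the same way. The SAMPLE values are hypothetical.

```python
import json
import tempfile
from pathlib import Path

SAMPLE = {
    "PL": {"country": "PL", "first_names": ["M", "F"], "last_names": ["M", "F"]},
    "US": {"country": "US", "first_names": ["M", "F"], "last_names": ["N"]},
}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for code, info in SAMPLE.items():
        country_dir = root / code
        (country_dir / "first_names").mkdir(parents=True)  # also creates country_dir
        (country_dir / "last_names").mkdir()
        (country_dir / "info.json").write_text(json.dumps(info), encoding="utf-8")

    # Same aggregation as show_data.
    result = {}
    for country in {p.name for p in root.iterdir()}:
        with open(root / country / "info.json", encoding="utf-8") as info_file:
            info_dict = json.load(info_file)
        result.setdefault(
            info_dict["country"],
            {"first_names": info_dict["first_names"], "last_names": info_dict["last_names"]},
        )

print(result["US"])  # {'first_names': ['M', 'F'], 'last_names': ['N']}
```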

database

Database module

Classes:

Name Description
Database

Database container and validator.

Database

Source code in src/randname/database.py
class Database:
    schema_info_json = {
        "type": "object",
        "title": "info.json schema",
        "description": "Schema for info.json file",
        "properties": {
            "country": {"type": "string"},
            "first_names": {
                "type": "array",
                "items": {"type": "string"},
                "minItems": 1,
            },
            "last_names": {"type": "array", "items": {"type": "string"}, "minItems": 1},
        },
        "required": ["country", "first_names", "last_names"],
        "additionalProperties": False,
    }
    schema_name_json = {
        "type": "object",
        "title": "first_names and last_names schema",
        "description": "Schema for last and first names files",
        "properties": {
            "Names": {"type": "array", "items": {"type": "string"}, "minItems": 1},
            "Totals": {"type": "array", "items": {"type": "number"}, "minItems": 1},
        },
        "required": ["Names", "Totals"],
        "additionalProperties": False,
    }

    draft_validator_info = jsonschema.Draft7Validator(schema_info_json)
    draft_validator_name = jsonschema.Draft7Validator(schema_name_json)

    def __init__(self, path_to_database: Union[Path, str]):
        """Database container.

        The database is not validated on initialization, for performance
        reasons.

        To validate it, call `Database.validate(path)`, or assign to the
        `path` property, which validates the new path before storing it.

        Args:
            path_to_database: Path to directory with database
        """
        # self.validate_database(path_to_database)
        self._path = Path(path_to_database)

    @property
    def path(self) -> Path:
        """Path to database

        Returns:
            Path to database
        """
        return self._path

    @path.setter
    def path(self, new_path: Union[Path, str]) -> None:
        new_path = Path(new_path)  # validate() calls Path methods, so convert first
        Database.validate(new_path)
        self._path = new_path

    @staticmethod
    def validate(path: Path) -> bool:
        """Check that the database has a valid structure and that its files are
        correctly formatted.

        Warning:
            Validating the database may take some time, depending on its size.

        Args:
            path: Path to database

        Raises:
            randname.error.DirectoryDoesNotExistError: Raised when the database directory, or a country's first_names or last_names directory, does not exist.
            randname.error.MissingInfoFileError: Raised when info.json is missing.
            randname.error.GenderMismatchError: Raised when info.json declares a sex that has no dataset file in the corresponding directory.
            randname.error.FileNameDoesNotMatchPatternError: Raised when a dataset file name does not match the "<year>_<sex>" naming convention.
            jsonschema.ValidationError: Raised when a JSON file does not match its schema.
        """
        invalid_name_pattern: list[Path] = []
        invalid_json_files: list[Path] = []

        if not path.is_dir():
            raise randname.error.DirectoryDoesNotExistError(path)

        # traverse directory
        for country_directory in path.iterdir():
            path_to_info_file = Path() / country_directory / "info.json"
            first_names_dir = Path() / country_directory / "first_names"
            last_names_dir = Path() / country_directory / "last_names"

            # check for required files
            if not path_to_info_file.exists():
                raise randname.error.MissingInfoFileError(path_to_info_file)
            if not first_names_dir.exists():
                raise randname.error.DirectoryDoesNotExistError(first_names_dir)
            if not last_names_dir.exists():
                raise randname.error.DirectoryDoesNotExistError(last_names_dir)

            # check info.json
            with path_to_info_file.open("r", encoding="utf-8") as info_file:
                json_file = json.load(info_file)
                first_names_sex = set(json_file["first_names"])
                last_names_sex = set(json_file["last_names"])

            try:
                Database._validate_json_schema(
                    Database.schema_info_json, path_to_info_file
                )
            except jsonschema.ValidationError:
                logger.error(f"Invalid info file: {path_to_info_file}")
                invalid_json_files.append(path_to_info_file)

            # check that the sexes declared in info.json match the files in first_names and last_names
            sex_in_first_names_dir = set(
                [path.name.split("_")[1] for path in first_names_dir.iterdir()]
            )
            sex_in_last_names_dir = set(
                [path.name.split("_")[1] for path in last_names_dir.iterdir()]
            )

            diff = first_names_sex.difference(sex_in_first_names_dir)
            if diff:
                raise randname.error.GenderMismatchError(
                    f"Info file: {path_to_info_file} defines {first_names_sex}, but the first_names directory contains {sex_in_first_names_dir}"
                )
            diff = last_names_sex.difference(sex_in_last_names_dir)
            if diff:
                raise randname.error.GenderMismatchError(
                    f"Info file: {path_to_info_file} defines {last_names_sex}, but the last_names directory contains {sex_in_last_names_dir}"
                )

            # TODO: refactor into smaller functions
            # check first_names
            glob_pattern = f"[1-9]*_[{''.join(first_names_sex)}]"
            for f in first_names_dir.iterdir():
                match = f.match(glob_pattern)
                if not match:
                    logger.error(f"Invalid name pattern: {f}")
                    invalid_name_pattern.append(f)
                try:
                    Database._validate_json_schema(Database.schema_name_json, f)
                except jsonschema.ValidationError:
                    logger.error(f"Invalid content pattern: {f}")
                    invalid_json_files.append(f)

            # check last_names
            glob_pattern = f"[1-9]*_[{''.join(last_names_sex)}]"
            for f in last_names_dir.iterdir():
                if not f.match(glob_pattern):
                    logger.error(f"Invalid name pattern: {f}")
                    invalid_name_pattern.append(f)
                try:
                    Database._validate_json_schema(Database.schema_name_json, f)
                except jsonschema.ValidationError:
                    logger.error(f"Invalid content pattern: {f}")
                    invalid_json_files.append(f)

        if invalid_json_files:
            raise jsonschema.ValidationError(str(invalid_json_files))

        if invalid_name_pattern:
            raise randname.error.FileNameDoesNotMatchPatternError(invalid_name_pattern)

        return True

    @staticmethod
    def _validate_json_schema(schema, path: Path) -> None:
        """Validate JSON schema for database files

        Args:
            schema: JSON schema to validate against
            path: Path to JSON file to validate

        Raises:
            jsonschema.ValidationError: If JSON doesn't match schema
        """
        with path.open("r", encoding="utf-8") as f:
            json_content = json.load(f)

        # Reuse the precompiled Draft 7 validators for the built-in schemas and
        # validate any other schema once, instead of validating twice.
        if schema is Database.schema_name_json:
            Database.draft_validator_name.validate(json_content)
        elif schema is Database.schema_info_json:
            Database.draft_validator_info.validate(json_content)
        else:
            jsonschema.validate(json_content, schema)
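validate checks every dataset file name against a glob built from the sexes declared in info.json, for example [1-9]*_[MF], using Path.match. A quick sketch of which names that pattern accepts; the file names are hypothetical, and PurePosixPath is used so matching is case-sensitive on every platform.

```python
from pathlib import PurePosixPath

declared_sexes = {"M", "F"}  # as read from info.json
# Sorted only to make the printed pattern deterministic; order inside [...] is irrelevant.
pattern = f"[1-9]*_[{''.join(sorted(declared_sexes))}]"

candidates = ["1990_M", "2010_F", "0990_M", "1990_N", "1990_M.json", "1990_m"]
accepted = [name for name in candidates if PurePosixPath(name).match(pattern)]
print(pattern, accepted)  # [1-9]*_[FM] ['1990_M', '2010_F']
```

Note that * matches any characters, so a name such as 1abc_M also passes; the pattern checks only the first digit and the sex suffix, not that the year is numeric.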

path property writable

Path to database

Returns:

Type Description
Path

Path to database

__init__(path_to_database)

Database container.

The database is not validated on initialization, for performance reasons.

To validate it, call Database.validate(path), or assign to the path property, which validates the new path before storing it.

Parameters:

Name Type Description Default
path_to_database Union[Path, str]

Path to directory with database

required
Source code in src/randname/database.py
def __init__(self, path_to_database: Union[Path, str]):
    """Database container.

    The database is not validated on initialization, for performance
    reasons.

    To validate it, call `Database.validate(path)`, or assign to the
    `path` property, which validates the new path before storing it.

    Args:
        path_to_database: Path to directory with database
    """
    # self.validate_database(path_to_database)
    self._path = Path(path_to_database)
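The constructor deliberately skips validation, while assigning to path validates first. A minimal sketch of that design, using a hypothetical LazyContainer whose validate only checks that the path is a directory (far less than the full Database.validate). It also converts str input to Path before validating, which is a design choice of this sketch.

```python
import tempfile
from pathlib import Path
from typing import Union


class LazyContainer:
    """Hypothetical stand-in for Database: cheap constructor, checked setter."""

    def __init__(self, path: Union[Path, str]) -> None:
        self._path = Path(path)  # no I/O here, so construction stays fast

    @staticmethod
    def validate(path: Path) -> bool:
        if not path.is_dir():
            raise NotADirectoryError(path)
        return True

    @property
    def path(self) -> Path:
        return self._path

    @path.setter
    def path(self, new_path: Union[Path, str]) -> None:
        new_path = Path(new_path)  # accept str as well as Path
        LazyContainer.validate(new_path)  # raises before any state changes
        self._path = new_path


container = LazyContainer("/does/not/exist")  # accepted: nothing is checked yet
with tempfile.TemporaryDirectory() as tmp:
    container.path = tmp  # validated on assignment
    assert container.path == Path(tmp)

previous = container.path
try:
    container.path = "/does/not/exist"
except NotADirectoryError:
    pass  # rejected; the stored path is unchanged
assert container.path == previous
```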

validate(path) staticmethod

Check that the database has a valid structure and that its files are correctly formatted.

Warning

Validating the database may take some time, depending on its size.

Parameters:

Name Type Description Default
path Path

Path to database

required

Raises:

Type Description
DirectoryDoesNotExistError

Raised when the database directory, or a country's first_names or last_names directory, does not exist.

MissingInfoFileError

Raised when info.json is missing.

GenderMismatchError

Raised when info.json declares a sex that has no dataset file in the corresponding directory.

FileNameDoesNotMatchPatternError

Raised when a dataset file name does not match the "<year>_<sex>" naming convention.

ValidationError

Raised when a JSON file does not match its schema.

Source code in src/randname/database.py
@staticmethod
def validate(path: Path) -> bool:
    """Check that the database has a valid structure and that its files are
    correctly formatted.

    Warning:
        Validating the database may take some time, depending on its size.

    Args:
        path: Path to database

    Raises:
        randname.error.DirectoryDoesNotExistError: Raised when the database directory, or a country's first_names or last_names directory, does not exist.
        randname.error.MissingInfoFileError: Raised when info.json is missing.
        randname.error.GenderMismatchError: Raised when info.json declares a sex that has no dataset file in the corresponding directory.
        randname.error.FileNameDoesNotMatchPatternError: Raised when a dataset file name does not match the "<year>_<sex>" naming convention.
        jsonschema.ValidationError: Raised when a JSON file does not match its schema.
    """
    invalid_name_pattern: list[Path] = []
    invalid_json_files: list[Path] = []

    if not path.is_dir():
        raise randname.error.DirectoryDoesNotExistError(path)

    # traverse directory
    for country_directory in path.iterdir():
        path_to_info_file = country_directory / "info.json"
        first_names_dir = country_directory / "first_names"
        last_names_dir = country_directory / "last_names"

        # check for required files
        if not path_to_info_file.exists():
            raise randname.error.MissingInfoFileError(path_to_info_file)
        if not first_names_dir.exists():
            raise randname.error.DirectoryDoesNotExistError(first_names_dir)
        if not last_names_dir.exists():
            raise randname.error.DirectoryDoesNotExistError(last_names_dir)

        # check info.json
        with path_to_info_file.open("r", encoding="utf-8") as info_file:
            json_file = json.load(info_file)
            first_names_sex = set(json_file["first_names"])
            last_names_sex = set(json_file["last_names"])

        try:
            Database._validate_json_schema(
                Database.schema_info_json, path_to_info_file
            )
        except jsonschema.ValidationError:
            logger.error(f"Invalid info file: {path_to_info_file}")
            invalid_json_files.append(path_to_info_file)

        # check that the sexes declared in info.json match the files in first_names and last_names
        sex_in_first_names_dir = {
            name_file.name.split("_")[1] for name_file in first_names_dir.iterdir()
        }
        sex_in_last_names_dir = {
            name_file.name.split("_")[1] for name_file in last_names_dir.iterdir()
        }

        diff = first_names_sex.difference(sex_in_first_names_dir)
        if diff:
            raise randname.error.GenderMismatchError(
                f"Info file: {path_to_info_file} defines {first_names_sex}, but found {sex_in_first_names_dir} in the first_names directory"
            )
        diff = last_names_sex.difference(sex_in_last_names_dir)
        if diff:
            raise randname.error.GenderMismatchError(
                f"Info file: {path_to_info_file} defines {last_names_sex}, but found {sex_in_last_names_dir} in the last_names directory"
            )

        # TODO: refactor into smaller functions
        # check first_names
        glob_pattern = f"[1-9]*_[{''.join(first_names_sex)}]"
        for f in first_names_dir.iterdir():
            if not f.match(glob_pattern):
                logger.error(f"Invalid name pattern: {f}")
                invalid_name_pattern.append(f)
            try:
                Database._validate_json_schema(Database.schema_name_json, f)
            except jsonschema.ValidationError:
                logger.error(f"Invalid content pattern: {f}")
                invalid_json_files.append(f)

        # check last_names
        glob_pattern = f"[1-9]*_[{''.join(last_names_sex)}]"
        for f in last_names_dir.iterdir():
            if not f.match(glob_pattern):
                logger.error(f"Invalid name pattern: {f}")
                invalid_name_pattern.append(f)
            try:
                Database._validate_json_schema(Database.schema_name_json, f)
            except jsonschema.ValidationError:
                logger.error(f"Invalid content pattern: {f}")
                invalid_json_files.append(f)

    if invalid_json_files:
        raise jsonschema.ValidationError(str(invalid_json_files))

    if invalid_name_pattern:
        raise randname.error.FileNameDoesNotMatchPatternError(invalid_name_pattern)

    return True
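The file-name and sex checks inside validate can be exercised in isolation. The sketch below is a standalone re-implementation, not part of randname: check_name_files is a hypothetical helper that applies the same glob, [1-9]*_[<sexes>], through pathlib.PurePath.match, and takes the sex code from the text after the first underscore, as validate does. Unlike the loop above, it skips names without an underscore instead of raising IndexError.

```python
from pathlib import PurePath


def check_name_files(
    file_names: list[str], declared_sexes: set[str]
) -> tuple[list[str], set[str]]:
    """Return (names violating the pattern, declared sexes with no matching file).

    Hypothetical helper mirroring the checks in Database.validate.
    """
    # Same glob as validate(): a leading digit 1-9, anything, "_", then one declared sex code
    pattern = f"[1-9]*_[{''.join(sorted(declared_sexes))}]"
    bad_names = [name for name in file_names if not PurePath(name).match(pattern)]

    # validate() takes the text after the first "_" as the sex code; names without "_" are skipped here
    found_sexes = {name.split("_")[1] for name in file_names if "_" in name}
    missing_sexes = declared_sexes - found_sexes
    return bad_names, missing_sexes


print(check_name_files(["1900_M", "1900_F", "names.txt"], {"M", "F"}))  # (['names.txt'], set())
print(check_name_files(["1900_M"], {"M", "F"}))  # ([], {'F'})
```

The two checks are independent, as in validate: a badly named file such as 0900_M still contributes its sex code to the gender comparison.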

error

Error module

Classes:

Name Description
RandnameError

Base exception for the randname library.

InvalidSexArgumentError

Exception for invalid sex argument.

InvalidCountryNameError

Exception for invalid country name.

DirectoryDoesNotExistError

Exception for non-existing database directory.

MissingInfoFileError

Exception for missing info.json file.

FileNameDoesNotMatchPatternError

Exception for invalid file name pattern.

GenderMismatchError

Exception for gender mismatch in database directories.
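Every class below lists RandnameError as its base, so a single except clause can handle any of them. This sketch uses local stand-in classes so it runs without randname installed; with the real package you would import them from randname.error. The load_country function is hypothetical. Note that Database.validate can also raise jsonschema.ValidationError, which is not a RandnameError subclass and needs its own except clause.

```python
class RandnameError(Exception):
    """Stand-in for randname.error.RandnameError (assumed here to subclass Exception)."""


class MissingInfoFileError(RandnameError):
    """Stand-in for randname.error.MissingInfoFileError."""


def load_country(path: str) -> str:
    # Hypothetical loader that fails the way Database.validate does when info.json is absent
    raise MissingInfoFileError(path)


def describe_failure(path: str) -> str:
    try:
        return load_country(path)
    except RandnameError as exc:  # catches every randname-specific exception subclass
        return f"{type(exc).__name__}: {exc}"


print(describe_failure("db/XX/info.json"))  # MissingInfoFileError: db/XX/info.json
```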

DirectoryDoesNotExistError

Bases: RandnameError

Exception raised when a required database directory (the database root, or a country's first_names or last_names directory) does not exist.

Source code in src/randname/error.py
class DirectoryDoesNotExistError(RandnameError):
    """Exception raised when a required database directory does not exist."""

FileNameDoesNotMatchPatternError

Bases: RandnameError

Exception raised when a name file's name does not match the expected naming pattern.

Source code in src/randname/error.py
class FileNameDoesNotMatchPatternError(RandnameError):
    """Exception raised when a name file's name does not match the expected naming pattern."""

GenderMismatchError

Bases: RandnameError

Exception raised when the genders declared in info.json do not match those found in the corresponding directories.

Source code in src/randname/error.py
class GenderMismatchError(RandnameError):
    """Exception raised when the genders declared in info.json do not match
    those found in the corresponding directories."""

InvalidCountryNameError

Bases: RandnameError

Exception raised when a country is not in the list of available countries.

Attributes:

Name Type Description
country

The invalid country name that was provided

available_countries

List of available countries

message

Explanation of the error

Source code in src/randname/error.py
class InvalidCountryNameError(RandnameError):
    """Exception raised when a country is not in the list of available countries.

    Attributes:
        country: The invalid country name that was provided
        available_countries: List of available countries
        message: Explanation of the error
    """

    def __init__(self, country: str, available_countries: list[str]):
        self.country = country
        self.available_countries = available_countries
        self.message = f"{self.country} not in {self.available_countries}"
        super().__init__(self.message)

    def __str__(self):
        return f"{self.country} -> {self.message}"
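Because message already begins with the country, str() repeats the country, while exc.args holds only the message passed to super().__init__. The class below mirrors the source above, with a stand-in RandnameError (assumed to subclass Exception) so the sketch runs without the package; XX is a hypothetical country code.

```python
class RandnameError(Exception):
    """Stand-in for randname.error.RandnameError (assumed to subclass Exception)."""


class InvalidCountryNameError(RandnameError):
    # Mirrors src/randname/error.py as shown above
    def __init__(self, country: str, available_countries: list[str]):
        self.country = country
        self.available_countries = available_countries
        self.message = f"{self.country} not in {self.available_countries}"
        super().__init__(self.message)

    def __str__(self):
        return f"{self.country} -> {self.message}"


exc = InvalidCountryNameError("XX", ["PL", "US"])
print(str(exc))  # XX -> XX not in ['PL', 'US']
print(exc.args)  # ("XX not in ['PL', 'US']",)
```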

InvalidSexArgumentError

Bases: RandnameError

Exception raised when the selected sex is not available for the chosen country.

Attributes:

Name Type Description
sex

The invalid sex value that was provided

available_sex

List of available sex values for the country

message

Explanation of the error

Source code in src/randname/error.py
class InvalidSexArgumentError(RandnameError):
    """Exception raised when the selected sex is not available for the chosen country.

    Attributes:
        sex: The invalid sex value that was provided
        available_sex: List of available sex values for the country
        message: Explanation of the error
    """

    def __init__(self, sex: str | None, available_sex: tuple[Any, ...]):
        self.sex = sex
        self.available_sex = available_sex
        self.message = f"{self.sex} not in {self.available_sex}"
        super().__init__(self.message)

    def __str__(self):
        return f"{self.sex} -> {self.message}"
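The attributes make this error recoverable: a caller can read available_sex and retry with a supported value. In this sketch, pick_sex and pick_sex_or_fallback are hypothetical helpers, not randname functions, and falling back to the first available value is only one possible policy. The exception class mirrors the source above, written with Optional instead of str | None so it also runs on Python 3.9, and uses a stand-in RandnameError.

```python
from typing import Any, Optional


class RandnameError(Exception):
    """Stand-in for randname.error.RandnameError (assumed to subclass Exception)."""


class InvalidSexArgumentError(RandnameError):
    # Mirrors src/randname/error.py as shown above
    def __init__(self, sex: Optional[str], available_sex: tuple[Any, ...]):
        self.sex = sex
        self.available_sex = available_sex
        self.message = f"{self.sex} not in {self.available_sex}"
        super().__init__(self.message)

    def __str__(self):
        return f"{self.sex} -> {self.message}"


def pick_sex(requested: Optional[str], available: tuple[str, ...]) -> str:
    # Hypothetical check that raises the way randname does for an unsupported sex
    if requested not in available:
        raise InvalidSexArgumentError(requested, available)
    return requested


def pick_sex_or_fallback(requested: Optional[str], available: tuple[str, ...]) -> str:
    try:
        return pick_sex(requested, available)
    except InvalidSexArgumentError as exc:
        # available_sex carries the valid options, so the caller can recover
        return exc.available_sex[0]


print(pick_sex_or_fallback("X", ("M", "F")))  # M
print(str(InvalidSexArgumentError(None, ("M", "F"))))  # None -> None not in ('M', 'F')
```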

MissingInfoFileError

Bases: RandnameError

Exception raised when info.json file is missing in the country directory.

Source code in src/randname/error.py
class MissingInfoFileError(RandnameError):
    """Exception raised when info.json file is missing in the country directory."""