apps.bazel_parser.cli

Parse bazel query outputs

A larger system description:

  • inputs:

    • bazel query //... --output proto > query_result.pb - the full dependency tree

    • bazel test //... --build_event_binary_file=test_all.pb followed by bazel run //utils:bep_reader < test_all.pb - the execution time of each test target

    • git_utils.get_file_commit_map_from_follow - how files have changed over time; can be used to estimate probabilities of files changing in the future

  • intermediates:

    • a representation that ties source files and bazel targets together

  • outputs:

    • test targets: likelihood of executing, expected value of runtime

    • source files: cost (in execution time) of a modification, and expected cost of a file change (probability of change * cost; see the sketch below)

    • a graph with the values above, so that for any set of changed files we can describe the cost

    • a graph from which we can identify overly depended-upon targets
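
A minimal sketch of the expected-cost idea above, with made-up file names, change probabilities, and test runtimes (the real values come from the git history, query, and BEP inputs listed as system inputs):

```
# Illustrative only: hypothetical change probabilities and the runtimes of the
# tests that depend on each file.
change_probability = {
    "lib/parser.py": 0.30,  # changed often in recent history
    "lib/util.py": 0.05,    # rarely changed
}
dependent_test_runtime_s = {
    "lib/parser.py": [12.0, 45.0],
    "lib/util.py": [12.0, 45.0, 90.0],
}

for path, prob in change_probability.items():
    modification_cost = sum(dependent_test_runtime_s[path])  # cost if the file changes
    expected_cost = prob * modification_cost                 # probability * cost
    print(f"{path}: cost={modification_cost:.1f}s, expected={expected_cost:.1f}s")
```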

XXX:
  • bazel query --keep_going --noimplicit_deps --output proto "deps(//...)" is much bigger than "//..." alone; compare what the differences are

  • git log --since="10 years ago" --name-only --pretty=format: | sort | uniq -c | sort -nr

    • this is much faster (a sketch turning these counts into change probabilities follows the parsing example below)

    • could identify renames via: git log --since="1 month ago" --name-status --pretty=format: | grep -P 'R[0-9]*\t' | awk '{print $2, "->", $3}'

      • then correct

      • can get commit association via git log --since="1 month ago" --name-status --pretty=format:"%H"

      • statuses are A, M, D, Rddd

```
import re

# Placeholder sample of `git log --name-status` output, with renames massaged
# into an "old -> new" form as in the awk command above.
git_log_output = """
M\tapps/bazel_parser/cli.py
A\tapps/bazel_parser/graph.py
R100\told_name.py -> new_name.py
"""

# Regex pattern to match the git log output
pattern = r"^([AMD])\s+(.+?)(\s*->\s*(.+))?$|^R(\d+)\s+(.+?)\s*->\s*(.+)$"

# Parse each line using the regex
for line in git_log_output.strip().split("\n"):
    match = re.match(pattern, line.strip())
    if match:
        if match.group(1):  # For A, M, D statuses
            change_type = match.group(1)
            old_file = match.group(2)
            new_file = match.group(4) if match.group(4) else None
            print(f"Change type: {change_type}, Old file: {old_file}, "
                  f"New file: {new_file}")
        elif match.group(5):  # For R status (renames)
            change_type = "R"
            similarity_index = match.group(5)
            old_file = match.group(6)
            new_file = match.group(7)
            print(f"Change type: {change_type}, Similarity index:"
                  f" {similarity_index}, Old file: {old_file}, New file:"
                  f" {new_file}")
```
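
As noted above, the --name-only pipeline gives per-file change counts. A rough sketch (not the module's implementation) of turning those counts into change probabilities for the expected-cost estimate:

```
import subprocess
from collections import Counter

# Mirrors `git log --since="10 years ago" --name-only --pretty=format: | sort | uniq -c | sort -nr`.
# Renames are ignored here; see the rename-detection notes above.
log = subprocess.run(
    ["git", "log", "--since=10 years ago", "--name-only", "--pretty=format:"],
    capture_output=True, text=True, check=True,
).stdout
change_counts = Counter(line for line in log.splitlines() if line.strip())

# Naive estimate: fraction of commits in the window that touched the file.
total_commits = int(subprocess.run(
    ["git", "rev-list", "--count", "--since=10 years ago", "HEAD"],
    capture_output=True, text=True, check=True,
).stdout.strip())
change_probability = {path: n / total_commits for path, n in change_counts.items()}

for path, prob in sorted(change_probability.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(f"{prob:.3f}  {path}")
```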

Example Script:

repo_dir=`pwd`
file_commit_pb=$repo_dir/file_commit.pb
query_pb=$repo_dir/s_result.pb
bep_pb=$repo_dir/test_all.pb
out_gml=$repo_dir/my.gml
out_csv=$repo_dir/my.csv
out_html=$repo_dir/my.html

# Prepare data
bazel query "//... - //docs/... - //third_party/bazel/..." --output proto \
  > $query_pb
bazel test //... --build_event_binary_file=$bep_pb
bazel run //apps/bazel_parser --output_groups=-mypy -- git-capture \
  --repo-dir $repo_dir --days-ago 400 --file-commit-pb $file_commit_pb

# Separate step if we want build timing data
bazel clean
bazel build --noremote_accept_cached \
  --experimental_execution_log_compact_file=exec_log.pb.zst \
  --generate_json_trace_profile --profile=example_profile_new.json //...
# Would then need to process the exec_log.pb.zst file to get timing from it and
# then add to the other timing information

# Process and visualize the data
bazel run //apps/bazel_parser --output_groups=-mypy -- process \
  --file-commit-pb $file_commit_pb --query-pb $query_pb --bep-pb $bep_pb \
  --out-gml $out_gml --out-csv $out_csv

bazel run //apps/bazel_parser --output_groups=-mypy -- visualize \
  --gml $out_gml --out-html $out_html
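
After the script runs, the outputs can be inspected outside the CLI as well. A minimal sketch, assuming the GML written by the process step loads with networkx and the CSV with pandas (node attributes and column names are not documented here, so this only looks at graph structure):

```
import networkx as nx
import pandas as pd

graph = nx.read_gml("my.gml")   # dependency graph written by the process step
table = pd.read_csv("my.csv")   # per-node metrics written alongside it

print(graph)
print(table.head())

# Highly connected nodes are candidates for the "overly depended upon"
# review mentioned in the description above.
by_degree = sorted(graph.degree(), key=lambda kv: kv[1], reverse=True)
print(by_degree[:10])
```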

Attributes

logger

PATH_TYPE

OUT_PATH_TYPE

Classes

Config

Functions

load_config(→ Config)

get_config(→ Config)

cli()

git_capture(→ None)

process(→ None)

full(→ None)

visualize(→ None)

Module Contents

apps.bazel_parser.cli.logger[source]
apps.bazel_parser.cli.PATH_TYPE[source]
apps.bazel_parser.cli.OUT_PATH_TYPE[source]
class apps.bazel_parser.cli.Config[source]

Bases: pydantic.BaseModel

model_config[source]
query_target: str[source]
test_target: str[source]
days_ago: int[source]
refinement: Config.refinement[source]
apps.bazel_parser.cli.load_config(config_yaml_path: pathlib.Path, overrides: dict) → Config[source]
Parameters:
  • config_yaml_path (pathlib.Path)

  • overrides (dict)

Return type:

Config

apps.bazel_parser.cli.get_config(config_file: pathlib.Path | None, days_ago: int | None) → Config[source]
Parameters:
  • config_file (pathlib.Path | None)

  • days_ago (int | None)

Return type:

Config
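
A minimal usage sketch of these two helpers, based only on the signatures above; the YAML keys are assumed to mirror the Config attributes (query_target, test_target, days_ago), and the override and fallback semantics are assumptions:

```
import pathlib

from apps.bazel_parser.cli import get_config, load_config

# Assumed: config.yaml contains keys mirroring Config (query_target,
# test_target, days_ago); `overrides` is assumed to replace values from the file.
config = load_config(
    config_yaml_path=pathlib.Path("config.yaml"),
    overrides={"days_ago": 400},
)

# Or build a Config without an explicit file (assumed to fall back to defaults).
config = get_config(config_file=None, days_ago=400)
print(config.query_target, config.test_target, config.days_ago)
```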

apps.bazel_parser.cli.cli()[source]
apps.bazel_parser.cli.git_capture(repo_dir: pathlib.Path, days_ago: int, file_commit_pb: pathlib.Path) → None[source]
Parameters:
  • repo_dir (pathlib.Path)

  • days_ago (int)

  • file_commit_pb (pathlib.Path)

Return type:

None

apps.bazel_parser.cli.process(query_pb: pathlib.Path, bep_pb: pathlib.Path, file_commit_pb: pathlib.Path, out_gml: pathlib.Path, out_csv: pathlib.Path, config_file: pathlib.Path | None) → None[source]
Parameters:
  • query_pb (pathlib.Path)

  • bep_pb (pathlib.Path)

  • file_commit_pb (pathlib.Path)

  • out_gml (pathlib.Path)

  • out_csv (pathlib.Path)

  • config_file (pathlib.Path | None)

Return type:

None

apps.bazel_parser.cli.full(repo_dir: pathlib.Path, days_ago: int | None, config_file: pathlib.Path | None, out_gml: pathlib.Path | None, out_csv: pathlib.Path | None) → None[source]
Parameters:
  • repo_dir (pathlib.Path)

  • days_ago (int | None)

  • config_file (pathlib.Path | None)

  • out_gml (pathlib.Path | None)

  • out_csv (pathlib.Path | None)

Return type:

None

apps.bazel_parser.cli.visualize(gml: pathlib.Path, out_html: pathlib.Path | None) → None[source]
Parameters:
  • gml (pathlib.Path)

  • out_html (pathlib.Path | None)

Return type:

None