apps.bazel_parser.cli

Parse bazel query outputs

A larger system description:

  • inputs:

    • bazel query //... --output proto > query_result.pb - the full dependency tree

    • bazel test //... --build_event_binary_file=test_all.pb followed by bazel run //utils:bep_reader < test_all.pb - the execution time of each test target

    • git_utils.get_file_commit_map_from_follow - how files have changed over time; can be used to estimate probabilities of files changing in the future

  • intermediates:

    • a representation that ties source files and bazel targets together

  • outputs:

    • test targets: likelihood of executing, expected value of runtime

    • source files: cost (in execution time) of a modification, and expected cost of a file change (probability of change * cost; see the sketch below)

    • a graph with the values above, so that for any set of changed files we can describe the cost

    • a graph from which we can identify overly depended-upon targets
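
A minimal sketch of the expected-cost idea above, with made-up file names, change probabilities, and test runtimes (the real values come from the git history, query, and BEP inputs listed as system inputs):

```
# Illustrative only: hypothetical change probabilities and the runtimes of the
# tests that depend on each file.
change_probability = {
    "lib/parser.py": 0.30,  # changed often in recent history
    "lib/util.py": 0.05,    # rarely changed
}
dependent_test_runtime_s = {
    "lib/parser.py": [12.0, 45.0],
    "lib/util.py": [12.0, 45.0, 90.0],
}

for path, prob in change_probability.items():
    modification_cost = sum(dependent_test_runtime_s[path])  # cost if the file changes
    expected_cost = prob * modification_cost                 # probability * cost
    print(f"{path}: cost={modification_cost:.1f}s, expected={expected_cost:.1f}s")
```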

XXX:
  • bazel query --keep_going --noimplicit_deps --output proto "deps(//...)" is much bigger than "//..." alone; compare what the differences are

  • git log --since="10 years ago" --name-only --pretty=format: | sort | uniq -c | sort -nr

    • this is much faster (a sketch turning these counts into change probabilities follows the parsing example below)

    • could identify renames via: git log --since="1 month ago" --name-status --pretty=format: | grep -P 'R[0-9]*\t' | awk '{print $2, "->", $3}'

      • then correct

      • can get commit association via git log --since="1 month ago" --name-status --pretty=format:"%H"

      • statuses are A, M, D, Rddd

```
import re

# Placeholder sample of `git log --name-status` output, with renames massaged
# into an "old -> new" form as in the awk command above.
git_log_output = """
M\tapps/bazel_parser/cli.py
A\tapps/bazel_parser/graph.py
R100\told_name.py -> new_name.py
"""

# Regex pattern to match the git log output
pattern = r"^([AMD])\s+(.+?)(\s*->\s*(.+))?$|^R(\d+)\s+(.+?)\s*->\s*(.+)$"

# Parse each line using the regex
for line in git_log_output.strip().split("\n"):
    match = re.match(pattern, line.strip())
    if match:
        if match.group(1):  # For A, M, D statuses
            change_type = match.group(1)
            old_file = match.group(2)
            new_file = match.group(4) if match.group(4) else None
            print(f"Change type: {change_type}, Old file: {old_file}, "
                  f"New file: {new_file}")
        elif match.group(5):  # For R status (renames)
            change_type = "R"
            similarity_index = match.group(5)
            old_file = match.group(6)
            new_file = match.group(7)
            print(f"Change type: {change_type}, Similarity index:"
                  f" {similarity_index}, Old file: {old_file}, New file:"
                  f" {new_file}")
```
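
As noted above, the --name-only pipeline gives per-file change counts. A rough sketch (not the module's implementation) of turning those counts into change probabilities for the expected-cost estimate:

```
import subprocess
from collections import Counter

# Mirrors `git log --since="10 years ago" --name-only --pretty=format: | sort | uniq -c | sort -nr`.
# Renames are ignored here; see the rename-detection notes above.
log = subprocess.run(
    ["git", "log", "--since=10 years ago", "--name-only", "--pretty=format:"],
    capture_output=True, text=True, check=True,
).stdout
change_counts = Counter(line for line in log.splitlines() if line.strip())

# Naive estimate: fraction of commits in the window that touched the file.
total_commits = int(subprocess.run(
    ["git", "rev-list", "--count", "--since=10 years ago", "HEAD"],
    capture_output=True, text=True, check=True,
).stdout.strip())
change_probability = {path: n / total_commits for path, n in change_counts.items()}

for path, prob in sorted(change_probability.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(f"{prob:.3f}  {path}")
```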

Example Script:

repo_dir=`pwd`
file_commit_pb=$repo_dir/file_commit.pb
query_pb=$repo_dir/s_result.pb
bep_pb=$repo_dir/test_all.pb
out_gml=$repo_dir/my.gml
out_csv=$repo_dir/my.csv
out_html=$repo_dir/my.html

# Prepare data
bazel query "//... - //docs/... - //third_party/bazel/..." --output proto \
  > $query_pb
bazel test //... --build_event_binary_file=$bep_pb
bazel run //apps/bazel_parser --output_groups=-mypy -- git-capture \
  --repo-dir $repo_dir --days-ago 400 --file-commit-pb $file_commit_pb

# Separate step if we want build timing data
bazel clean
bazel build --noremote_accept_cached \
  --experimental_execution_log_compact_file=exec_log.pb.zst \
  --generate_json_trace_profile --profile=example_profile_new.json //...
# Would then need to process the exec_log.pb.zst file to get timing from it and
# then add to the other timing information

# Process and visualize the data
bazel run //apps/bazel_parser --output_groups=-mypy -- process \
  --file-commit-pb $file_commit_pb --query-pb $query_pb --bep-pb $bep_pb \
  --out-gml $out_gml --out-csv $out_csv

bazel run //apps/bazel_parser --output_groups=-mypy -- visualize \
  --gml $out_gml --out-html $out_html
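
After the script runs, the outputs can be inspected outside the CLI as well. A minimal sketch, assuming the GML written by the process step loads with networkx and the CSV with pandas (node attributes and column names are not documented here, so this only looks at graph structure):

```
import networkx as nx
import pandas as pd

graph = nx.read_gml("my.gml")   # dependency graph written by the process step
table = pd.read_csv("my.csv")   # per-node metrics written alongside it

print(graph)
print(table.head())

# Highly connected nodes are candidates for the "overly depended upon"
# review mentioned in the description above.
by_degree = sorted(graph.degree(), key=lambda kv: kv[1], reverse=True)
print(by_degree[:10])
```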

Attributes

logger

PATH_TYPE

OUT_PATH_TYPE

Classes

Config

Functions

load_config(→ Config)

get_config(→ Config)

cli()

git_capture(→ None)

process(→ None)

full(→ None)

visualize(→ None)

Module Contents

apps.bazel_parser.cli.logger[source]
apps.bazel_parser.cli.PATH_TYPE[source]
apps.bazel_parser.cli.OUT_PATH_TYPE[source]
class apps.bazel_parser.cli.Config[source]

Bases: pydantic.BaseModel

model_config[source]
query_target: str[source]
test_target: str[source]
days_ago: int[source]
refinement: Config.refinement[source]
apps.bazel_parser.cli.load_config(config_yaml_path: pathlib.Path, overrides: dict) → Config[source]
Parameters:
  • config_yaml_path (pathlib.Path)

  • overrides (dict)

Return type:

Config

apps.bazel_parser.cli.get_config(config_file: pathlib.Path | None, days_ago: int | None) → Config[source]
Parameters:
  • config_file (pathlib.Path | None)

  • days_ago (int | None)

Return type:

Config
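
A minimal usage sketch of these two helpers, based only on the signatures above; the YAML keys are assumed to mirror the Config attributes (query_target, test_target, days_ago), and the override and fallback semantics are assumptions:

```
import pathlib

from apps.bazel_parser.cli import get_config, load_config

# Assumed: config.yaml contains keys mirroring Config (query_target,
# test_target, days_ago); `overrides` is assumed to replace values from the file.
config = load_config(
    config_yaml_path=pathlib.Path("config.yaml"),
    overrides={"days_ago": 400},
)

# Or build a Config without an explicit file (assumed to fall back to defaults).
config = get_config(config_file=None, days_ago=400)
print(config.query_target, config.test_target, config.days_ago)
```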

apps.bazel_parser.cli.cli()[source]
apps.bazel_parser.cli.git_capture(repo_dir: pathlib.Path, days_ago: int, file_commit_pb: pathlib.Path) → None[source]
Parameters:
  • repo_dir (pathlib.Path)

  • days_ago (int)

  • file_commit_pb (pathlib.Path)

Return type:

None

apps.bazel_parser.cli.process(query_pb: pathlib.Path, bep_pb: pathlib.Path, file_commit_pb: pathlib.Path, out_gml: pathlib.Path, out_csv: pathlib.Path, config_file: pathlib.Path | None) → None[source]
Parameters:
  • query_pb (pathlib.Path)

  • bep_pb (pathlib.Path)

  • file_commit_pb (pathlib.Path)

  • out_gml (pathlib.Path)

  • out_csv (pathlib.Path)

  • config_file (pathlib.Path | None)

Return type:

None

apps.bazel_parser.cli.full(repo_dir: pathlib.Path, days_ago: int | None, config_file: pathlib.Path | None, out_gml: pathlib.Path | None, out_csv: pathlib.Path | None) → None[source]
Parameters:
  • repo_dir (pathlib.Path)

  • days_ago (int | None)

  • config_file (pathlib.Path | None)

  • out_gml (pathlib.Path | None)

  • out_csv (pathlib.Path | None)

Return type:

None

apps.bazel_parser.cli.visualize(gml: pathlib.Path, out_html: pathlib.Path | None) → None[source]
Parameters:
  • gml (pathlib.Path)

  • out_html (pathlib.Path | None)

Return type:

None