apps.bazel_parser.cli
=====================

.. py:module:: apps.bazel_parser.cli

.. autoapi-nested-parse::

   Parse bazel query outputs.

   A larger system description:

   - inputs:

     - ``bazel query //... --output proto > query_result.pb``

       - the full dependency tree

     - ``bazel test //... --build_event_binary_file=test_all.pb``
     - ``bazel run //utils:bep_reader < test_all.pb``

       - the execution time of each test target

     - ``git_utils.get_file_commit_map_from_follow``

       - how files have changed over time; can be used to estimate the
         probability of each file changing in the future

   - intermediates:

     - a combined representation of source files and bazel targets

   - outputs:

     - test targets:

       - likelihood of executing
       - expected value of runtime

     - source files:

       - cost in execution time of a modification
       - expected cost of a file change (probability of change * cost)

     - a graph with the values above, from which we could take any set of
       changed files and describe their cost
     - a graph from which we could identify overly-depended-upon targets

   XXX:

   - ``bazel query --keep_going --noimplicit_deps --output proto "deps(//...)"``
     is much bigger than ``"//..."`` alone; compare what the differences are
   - ``git log --since="10 years ago" --name-only --pretty=format: | sort | uniq -c | sort -nr``

     - this is much faster

   - could identify renames via:

     - ``git log --since="1 month ago" --name-status --pretty=format: | grep -P 'R[0-9]*\t' | awk '{print $2, "->", $3}'``
     - then correct the file history for those renames

   - can get commit association via:

     - ``git log --since="1 month ago" --name-status --pretty=format:"%H"``

   - statuses are ``A``, ``M``, ``D``, or ``R\d\d\d``

   ```
   import re
   import subprocess

   # Output of: git log --since="1 month ago" --name-status --pretty=format:
   git_log_output = subprocess.run(
       ["git", "log", "--since=1 month ago", "--name-status", "--pretty=format:"],
       capture_output=True, text=True, check=True,
   ).stdout

   # Regex pattern to match the git log --name-status output.  A/M/D lines look
   # like "M<TAB>path"; rename lines look like "R100<TAB>old<TAB>new" (a literal
   # " -> " separator is also accepted).
   pattern = r"^([AMD])\s+(.+?)(\s*->\s*(.+))?$|^R(\d+)\s+(.+?)\s*(?:->|\t)\s*(.+)$"

   # Parse each line using the regex
   for line in git_log_output.strip().split('\n'):
       match = re.match(pattern, line.strip())
       if match:
           if match.group(1):  # For A, M, D statuses
               change_type = match.group(1)
               old_file = match.group(2)
               new_file = match.group(4) if match.group(4) else None
               print(f"Change type: {change_type}, Old file: {old_file}, "
                     f"New file: {new_file}")
           elif match.group(5):  # For R statuses (renames)
               change_type = 'R'
               similarity_index = match.group(5)
               old_file = match.group(6)
               new_file = match.group(7)
               print(f"Change type: {change_type}, Similarity index:"
                     f" {similarity_index}, Old file: {old_file}, New file:"
                     f" {new_file}")
   ```

   Example Script:

   ```
   repo_dir=`pwd`
   file_commit_pb=$repo_dir/file_commit.pb
   query_pb=$repo_dir/s_result.pb
   bep_pb=$repo_dir/test_all.pb
   out_gml=$repo_dir/my.gml
   out_csv=$repo_dir/my.csv
   out_html=$repo_dir/my.html

   # Prepare data
   bazel query "//... - //docs/... - //third_party/bazel/..." --output proto \
     > $query_pb
   bazel test //... --build_event_binary_file=$bep_pb
   bazel run //apps/bazel_parser --output_groups=-mypy -- git-capture --repo-dir \
     $repo_dir --days-ago 400 --file-commit-pb $file_commit_pb

   # Separate step if we want build timing data
   bazel clean
   bazel build --noremote_accept_cached \
     --experimental_execution_log_compact_file=exec_log.pb.zst \
     --generate_json_trace_profile --profile=example_profile_new.json \
     //...
   # Would then need to process the exec_log.pb.zst file to get timing from it
   # and then add it to the other timing information

   # Process and visualize the data
   bazel run //apps/bazel_parser --output_groups=-mypy -- process \
     --file-commit-pb $file_commit_pb --query-pb $query_pb --bep-pb \
     $bep_pb --out-gml $out_gml --out-csv $out_csv
   bazel run //apps/bazel_parser --output_groups=-mypy -- visualize \
     --gml $out_gml --out-html $out_html
   ```
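   The cost model in the outputs above can be expressed over the dependency
   graph.  A minimal sketch, assuming a networkx digraph whose edges point
   from source files to the test targets that depend on them; the attribute
   names ``change_prob`` and ``runtime_s`` are illustrative assumptions, not
   the schema this module writes to the GML file:

   ```
   import networkx as nx

   # Toy graph: source files point at the test targets that depend on them.
   # Attribute names are illustrative assumptions, not this module's schema.
   g = nx.DiGraph()
   g.add_node("lib/foo.py", change_prob=0.3)       # probability the file changes
   g.add_node("//lib:foo_test", runtime_s=12.0)    # measured test runtime
   g.add_node("//app:smoke_test", runtime_s=45.0)
   g.add_edge("lib/foo.py", "//lib:foo_test")
   g.add_edge("lib/foo.py", "//app:smoke_test")

   def expected_cost(graph: nx.DiGraph, source_file: str) -> float:
       """Expected cost of a change = P(change) * total dependent test runtime."""
       runtime = sum(
           graph.nodes[t].get("runtime_s", 0.0)
           for t in nx.descendants(graph, source_file)
       )
       return graph.nodes[source_file]["change_prob"] * runtime

   print(expected_cost(g, "lib/foo.py"))  # 0.3 * (12.0 + 45.0) = 17.1
   ```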
Attributes
----------

.. autoapisummary::

   apps.bazel_parser.cli.logger
   apps.bazel_parser.cli.PATH_TYPE
   apps.bazel_parser.cli.OUT_PATH_TYPE


Classes
-------

.. autoapisummary::

   apps.bazel_parser.cli.Config


Functions
---------

.. autoapisummary::

   apps.bazel_parser.cli.load_config
   apps.bazel_parser.cli.get_config
   apps.bazel_parser.cli.cli
   apps.bazel_parser.cli.git_capture
   apps.bazel_parser.cli.process
   apps.bazel_parser.cli.full
   apps.bazel_parser.cli.visualize


Module Contents
---------------

.. py:data:: logger

.. py:data:: PATH_TYPE

.. py:data:: OUT_PATH_TYPE

.. py:class:: Config

   Bases: :py:obj:`pydantic.BaseModel`

   .. py:attribute:: model_config

   .. py:attribute:: query_target
      :type: str

   .. py:attribute:: test_target
      :type: str

   .. py:attribute:: days_ago
      :type: int

   .. py:attribute:: refinement
      :type: Config.refinement

.. py:function:: load_config(config_yaml_path: pathlib.Path, overrides: dict) -> Config

.. py:function:: get_config(config_file: pathlib.Path | None, days_ago: int | None) -> Config

.. py:function:: cli()

.. py:function:: git_capture(repo_dir: pathlib.Path, days_ago: int, file_commit_pb: pathlib.Path) -> None

.. py:function:: process(query_pb: pathlib.Path, bep_pb: pathlib.Path, file_commit_pb: pathlib.Path, out_gml: pathlib.Path, out_csv: pathlib.Path, config_file: pathlib.Path | None) -> None

.. py:function:: full(repo_dir: pathlib.Path, days_ago: int | None, config_file: pathlib.Path | None, out_gml: pathlib.Path | None, out_csv: pathlib.Path | None) -> None

.. py:function:: visualize(gml: pathlib.Path, out_html: pathlib.Path | None) -> None
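A minimal usage sketch of ``load_config``, assuming a YAML file whose keys
mirror the ``Config`` fields listed above; the ``refinement`` section is
omitted because its schema is not documented here, and the override semantics
shown are an assumption:

```
import pathlib

from apps.bazel_parser.cli import load_config

# Hypothetical config.yaml (field names taken from Config above; the
# "refinement" section is omitted because its schema is not shown here):
#   query_target: "//... - //docs/..."
#   test_target: "//..."
#   days_ago: 400

config = load_config(
    pathlib.Path("config.yaml"),
    overrides={"days_ago": 30},  # assumed to take precedence over the YAML value
)
print(config.query_target, config.test_target, config.days_ago)
```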