《Chrome V8原理讲解》第二十篇 编译链1:语法分析,被遗忘的细节

 

1 摘要

第三、四、五三篇文章对V8编译流程的主要功能做了介绍,在基础之上,接下来的几篇文章是编译专题,讲解V8编译链,从读取Javascript源码文件开始,到字节码的生成,并结合前三篇文章,详细说明编译过程和技术细节。编译专题的知识点包括:生成Token、生成AST、生成常量池、生成Bytecode和Sharedfunction。本文讲编译的准备工作,Javascirpt源码的读取与转码(章节2);语法分析的准备工作(章节3)。

 

2 读取Javascript源码

测试源码如下:

function ignition(s) {
    this.slogan=s;
    this.start=function(){eval('console.log(this.slogan);')}
}
worker = new ignition("here we go!");
worker.start();

Javascript源码先转成V8的内部字符串,内部字符串编译后生成Sharedfunction,Sharedfunction绑定Context等信息后生成JSfunction后交给执行单元。从读取Javascript源码讲起,源码如下:

1.  bool SourceGroup::Execute(Isolate* isolate) {
2.  //............省略很多..................
3.      // Use all other arguments as names of files to load and run.
4.      HandleScope handle_scope(isolate);
5.      Local<String> file_name =
6.          String::NewFromUtf8(isolate, arg, NewStringType::kNormal)
7.              .ToLocalChecked();
8.      Local<String> source = ReadFile(isolate, arg);
9.      if (source.IsEmpty()) {
10.        printf("Error reading '%s'\n", arg);
11.        base::OS::ExitProcess(1);
12.      }
13.      Shell::set_script_executed();
14.      if (!Shell::ExecuteString(isolate, source, file_name, Shell::kNoPrintResult,
15.                                Shell::kReportExceptions,
16.                                Shell::kProcessMessageQueue)) {
17.        success = false;
18.        break;
19.      }
20.    }
21.    return success;
22.  }

我用d8做讲解,d8方便加载Javascript源码,不需要重复造轮子。代码5行file_name的值是test.js;代码8行读取文件内容,ReadFile()代码如下:

1.  Local<String> Shell::ReadFile(Isolate* isolate, const char* name) {
2.  //只保留最重要的部分...............................
3.    char* chars = static_cast<char*>(file->memory());
4.    Local<String> result;
5.    if (i::FLAG_use_external_strings && i::String::IsAscii(chars, size)) {
6.      String::ExternalOneByteStringResource* resource =
7.          new ExternalOwningOneByteStringResource(std::move(file));
8.      result = String::NewExternalOneByte(isolate, resource).ToLocalChecked();
9.    } else {
10.      result = String::NewFromUtf8(isolate, chars, NewStringType::kNormal, size)
11.                   .ToLocalChecked();
12.    }
13.    return result;
14.  }

代码3行读取文件内容;代码5行判断i::FLAG_use_external_strings和ASCII字符,代码10行返回UTF8编码的Javascript源码。
进入bool SourceGroup::Execute()代码14行,源码如下:

1.  bool Shell::ExecuteString(Isolate* isolate, Local<String> source,
2.                      Local<Value> name, PrintResult print_result,
3.                      ReportExceptions report_exceptions,
4.                      ProcessMessageQueue process_message_queue) {
5.      //省略很多............................
6.  bool success = true;
7.  {
8.    if (options.compile_options == ScriptCompiler::kConsumeCodeCache) {
9.      //省略很多............................
10.       } else if (options.stress_background_compile) {
11.      //省略很多............................
12.       } else {
13.         ScriptCompiler::Source script_source(source, origin);
14.         maybe_script = ScriptCompiler::Compile(context, &script_source,
15.                                                options.compile_options);
16.       }
17.       Local<Script> script;
18.       if (!maybe_script.ToLocal(&script)) {
19.         // Print errors that happened during compilation.
20.         if (report_exceptions) ReportException(isolate, &try_catch);
21.         return false;
22.       }
23.       if (options.code_cache_options ==
24.           ShellOptions::CodeCacheOptions::kProduceCache) {
25.      //省略很多............................
26.       }
27.       maybe_result = script->Run(realm);//这是代码执行.....................
28.  }
29.  }

省略了不执行的代码,代码13行,把Javascript源码封装成ScriptCompiler::Source;代码14行,ScriptCompiler::Compile是编译入口,开始进入编译阶段。

 

3 语法分析器初始化

编译的第一阶段是词法分析,生成Token字;第二阶段是语法分析,生成语法树;V8的编译工具链中,先启动语法分析器,它读取Token字失败时启动词法分析器工作,按照这一流程,我们先讲解语法分析器的初始化。
ScriptCompiler::Compile()方法内部调用CompileUnboundInternal()方法,源码如下:

1.  MaybeLocal<UnboundScript> ScriptCompiler::CompileUnboundInternal(
2.      Isolate* v8_isolate, Source* source, CompileOptions options,
3.      NoCacheReason no_cache_reason) {
4.  //省略很多................
5.    i::Handle<i::String> str = Utils::OpenHandle(*(source->source_string));
6.    i::Handle<i::SharedFunctionInfo> result;
7.    i::Compiler::ScriptDetails script_details = GetScriptDetails(
8.        isolate, source->resource_name, source->resource_line_offset,
9.        source->resource_column_offset, source->source_map_url,
10.        source->host_defined_options);
11.    i::MaybeHandle<i::SharedFunctionInfo> maybe_function_info =
12.        i::Compiler::GetSharedFunctionInfoForScript(
13.            isolate, str, script_details, source->resource_options, nullptr,
14.            script_data, options, no_cache_reason, i::NOT_NATIVES_CODE);
15.    if (options == kConsumeCodeCache) {
16.      source->cached_data->rejected = script_data->rejected();
17.    }
18.    delete script_data;
19.    has_pending_exception = !maybe_function_info.ToHandle(&result);
20.    RETURN_ON_FAILED_EXECUTION(UnboundScript);
21.    RETURN_ESCAPED(ToApiHandle<UnboundScript>(result));
22.  }

“Bind”(绑定)是V8中使用的语术,作用是绑定上下文(context)。“Unbound”是没有绑定上下文的函数,即Sharedfunction,类似DLL函数,使用之前要配置相关信息。代码7行,GetScriptDetails()是计算行、列偏移量等信息;代11行Sharedfunction(),从编译缓存中读取Sharedfunction,缓存缺失时启动编译器,编译源码生成并返回Sharedfunction,源码如下:

1.  MaybeHandle<SharedFunctionInfo> Compiler::GetSharedFunctionInfoForScript(
2.      Isolate* isolate, Handle<String> source,
3.      const Compiler::ScriptDetails& script_details,
4.   .................) {
5.  //省略很多.........................
6.          {
7.      maybe_result = compilation_cache->LookupScript(
8.          source, script_details.name_obj, script_details.line_offset,
9.          script_details.column_offset, origin_options, isolate->native_context(),
10.          language_mode);
11.    }
12.    if (maybe_result.is_null()) {
13.      ParseInfo parse_info(isolate);
14.      // No cache entry found compile the script.
15.      NewScript(isolate, &parse_info, source, script_details, origin_options,
16.                natives);
17.      // Compile the function and add it to the isolate cache.
18.      if (origin_options.IsModule()) parse_info.set_module();
19.      parse_info.set_extension(extension);
20.      parse_info.set_eager(compile_options == ScriptCompiler::kEagerCompile);
21.      parse_info.set_language_mode(
22.          stricter_language_mode(parse_info.language_mode(), language_mode));
23.      maybe_result = CompileToplevel(&parse_info, isolate, &is_compiled_scope);
24.      Handle<SharedFunctionInfo> result;
25.      if (extension == nullptr && maybe_result.ToHandle(&result)) {
26.        DCHECK(is_compiled_scope.is_compiled());
27.        compilation_cache->PutScript(source, isolate->native_context(),
28.                                     language_mode, result);
29.      } else if (maybe_result.is_null() && natives != EXTENSION_CODE) {
30.        isolate->ReportPendingMessages();
31.      }
32.    }
33.    return maybe_result;
34.  }

代码7行查询compilation_cache上篇文章讲过,初次查询结果为空。代码13行创建ParseInfo实例,为语法分析器(Parser)做准备工作。代码15行初始化Parser_info,源码如下:

1.  Handle<Script> NewScript(Isolate* isolate, ParseInfo* parse_info,
2.                           Handle<String> source,
3.                           Compiler::ScriptDetails script_details,
4.                           ScriptOriginOptions origin_options,
5.                           NativesFlag natives) {
6.    Handle<Script> script =
7.        parse_info->CreateScript(isolate, source, origin_options, natives);
8.    Handle<Object> script_name;
9.    if (script_details.name_obj.ToHandle(&script_name)) {
10.      script->set_name(*script_name);
11.      script->set_line_offset(script_details.line_offset);
12.      script->set_column_offset(script_details.column_offset);
13.    }
14.    Handle<Object> source_map_url;
15.    if (script_details.source_map_url.ToHandle(&source_map_url)) {
16.      script->set_source_mapping_url(*source_map_url);
17.    }
18.    Handle<FixedArray> host_defined_options;
19.    if (script_details.host_defined_options.ToHandle(&host_defined_options)) {
20.      script->set_host_defined_options(*host_defined_options);
21.    }
22.    return script;
23.  }

代码6~12行,把源码封装到Parser_info中,设置行、例偏移量信息。
回到Compiler::GetSharedFunctionInfoForScript(),代码23行,进入CompileToplevel(),源码如下:

1.  MaybeHandle<SharedFunctionInfo> CompileToplevel(
2.      ParseInfo* parse_info, Isolate* isolate,
3.      IsCompiledScope* is_compiled_scope) {
4.  //省略很多.........................
5.    if (parse_info->literal() == nullptr &&
6.        !parsing::ParseProgram(parse_info, isolate)) {
7.      return MaybeHandle<SharedFunctionInfo>();
8.    }
9.  //省略很多.........................
10.    MaybeHandle<SharedFunctionInfo> shared_info =
11.        GenerateUnoptimizedCodeForToplevel(
12.            isolate, parse_info, isolate->allocator(), is_compiled_scope);
13.    if (shared_info.is_null()) {
14.      FailWithPendingException(isolate, parse_info,
15.                               Compiler::ClearExceptionFlag::KEEP_EXCEPTION);
16.      return MaybeHandle<SharedFunctionInfo>();
17.    }
18.    FinalizeScriptCompilation(isolate, parse_info);
19.    return shared_info;
20.  }

代码5行literal()判断抽象语法树是否存在,首次执行时为空,所以进入代码6行,开始语法分析,源码如下:

1.  bool ParseProgram(ParseInfo* info, Isolate* isolate,
2.                    ReportErrorsAndStatisticsMode mode) {
3.  //省略代码..............................
4.    Parser parser(info);
5.    FunctionLiteral* result = nullptr;
6.    result = parser.ParseProgram(isolate, info);
7.    info->set_literal(result);
8.    if (result) {
9.      info->set_language_mode(info->literal()->language_mode());
10.      if (info->is_eval()) {
11.        info->set_allow_eval_cache(parser.allow_eval_cache());
12.      }
13.    }
14.    if (mode == ReportErrorsAndStatisticsMode::kYes) {
15.  //省略代码..............................
16.    }
17.    return (result != nullptr);
18.  }

代码4行,使用Parse_info信息创建Parser实例,源码如下:

1.  Parser::Parser(ParseInfo* info)
2.      : ParserBase<Parser>(info->zone(), &scanner_, info->stack_limit(),
3.                           info->extension(), info->GetOrCreateAstValueFactory(),
4.                           info->pending_error_handler(),
5.                           info->runtime_call_stats(), info->logger(),
6.                           info->script().is_null() ? -1 : info->script()->id(),
7.                           info->is_module(), true),
8.        info_(info),
9.        scanner_(info->character_stream(), info->is_module()),
10.        preparser_zone_(info->zone()->allocator(), ZONE_NAME),
11.        reusable_preparser_(nullptr),
12.        mode_(PARSE_EAGERLY),  // Lazy mode must be set explicitly.
13.        source_range_map_(info->source_range_map()),
14.        target_stack_(nullptr),
15.        total_preparse_skipped_(0),
16.        consumed_preparse_data_(info->consumed_preparse_data()),
17.        preparse_data_buffer_(),
18.        parameters_end_pos_(info->parameters_end_pos()) {
19.    bool can_compile_lazily = info->allow_lazy_compile() && !info->is_eager();
20.    set_default_eager_compile_hint(can_compile_lazily
21.                                       ? FunctionLiteral::kShouldLazyCompile
22.                                       : FunctionLiteral::kShouldEagerCompile);
23.    allow_lazy_ = info->allow_lazy_compile() && info->allow_lazy_parsing() &&
24.                  info->extension() == nullptr && can_compile_lazily;
25.    set_allow_natives(info->allow_natives_syntax());
26.    set_allow_harmony_dynamic_import(info->allow_harmony_dynamic_import());
27.    set_allow_harmony_import_meta(info->allow_harmony_import_meta());
28.    set_allow_harmony_nullish(info->allow_harmony_nullish());
29.    set_allow_harmony_optional_chaining(info->allow_harmony_optional_chaining());
30.    set_allow_harmony_private_methods(info->allow_harmony_private_methods());
31.    for (int feature = 0; feature < v8::Isolate::kUseCounterFeatureCount;
32.         ++feature) {
33.      use_counts_[feature] = 0;
34.    }
35.  }

代码8~18行,从ParserInfo中获取信息;代码19~23行是lazy compile开关,allow_lazy表示最终结果;代码25是否支持natives语法,也就是Javascript源码中是否允许使用以%开头的命令;代码26~30行是否支持私有方法等等。至此,语法分析器初始化工作完毕。
创建Paser实例后,返回bool ParseProgram(),代码6行,进行语法分析,期间还需要创建扫描器,下次讲解。
技术总结
(1) Javascript源码进入V8后需要转码;
(2) Javascript源码在V8内的表示是Source类,全称是v8::internal::source
(3) 先查编译缓存,缓存缺失时启动编译;
(4) 语法分析器先启动,Token缺失时启动词法分析器。
好了,今天到这里,下次见。

恳请读者批评指正、提出宝贵意见
微信:qq9123013 备注:v8交流 邮箱:v8blink@outlook.com

(完)